Ignored By Dinosaurs 🦕

devops

Problemspace

You've got a new site on Platform.sh that is basically at the end of its development stage, and you're preparing to go live. You've decided on Cloudflare to host your DNS. Cloudflare is a good choice for smaller sites, and I recommend it often. It has a few things going for it:

  • It has a free tier, which gives you pretty much everything you really need for a personal or small business site.
  • It has a very robust and modern global network.

One of the main features that a modern DNS provider needs to have in order to work with Platform.sh is something that's colloquially known as “CNAME flattening”. This solves an age-old problem (in internet years) – being able to point your “root domain” to a domain name rather than an IP address. This post explains it better than I can, and that's not the point of this post anyway.

Another nice feature of Cloudflare is something they call “Flexible SSL”. SSL, HTTPS, TLS, they all mean basically the same thing – that the traffic from your user's browser to your website is being encrypted. This is very important in a practical sense if you're ever on a public wifi network, for example. Setting up SSL on your website can be a bit of a headache though, involving buying a cert, generating some rather arcane cryptographic keyfiles, installing them correctly on your server, etc.

Cloudflare offers a bit of a relief from this headache with “Flexible SSL”. That means that your site can use Cloudflare's SSL cert to encrypt the traffic from user to Cloudflare (remember, Cloudflare is sitting in between your users and your website's server). Traffic from Cloudflare to your website then travels unencrypted over plain old HTTP. This is “suboptimal”, but it does alleviate some of the attack vectors on your users.

Cloudflare's "Flexible SSL" option
        HTTPS                  HTTP
User ----->----> Cloudflare ----->----> "Origin" server (your website)

The other alternatives to this are either running your site unencrypted over HTTP or using “full SSL”, in which you have to install a cert at the origin in order to encrypt the traffic between Cloudflare and your website.

Cloudflare's "Full SSL" option
        HTTPS                  HTTPS
User ----->----> Cloudflare ----->----> "Origin" server (your website)

Getting there...

So you've been in development this whole time, you're using HTTPS and redirecting HTTP traffic to HTTPS like a good net citizen, and now it's time to go live. You figure you'll just skip setting up an SSL cert and use Cloudflare's Flexible option. You immediately run into a problem though: a redirect loop. Here's why.

  • User requests website over HTTP, gets redirected to HTTPS by your application
  • HTTPS travels from user to Cloudflare, where it's decrypted and handed on to your website over HTTP
  • Your application sees a plain HTTP request and redirects it to HTTPS again
  • Repeat until the browser gives up and tells you that you have a redirect loop.
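
If the loop is hard to picture, here's a toy simulation of it. This is only a sketch: origin_app and browse are made-up names, example.com stands in for your site, and the "Cloudflare" here is nothing more than a rule about which scheme the origin gets to see.

```python
from urllib.parse import urlsplit


def origin_app(scheme_at_origin, host, path):
    """A hypothetical origin app that forces HTTPS, like the one described above."""
    if scheme_at_origin == "http":
        return 301, f"https://{host}{path}"
    return 200, "page body"


def browse(url, mode="flexible", max_hops=5):
    """Follow redirects the way a browser would and return every URL visited."""
    hops = []
    for _ in range(max_hops):
        hops.append(url)
        parts = urlsplit(url)
        # Flexible: Cloudflare always talks plain HTTP to the origin.
        # Full: Cloudflare uses the same scheme the browser used.
        scheme_at_origin = "http" if mode == "flexible" else parts.scheme
        status, result = origin_app(scheme_at_origin, parts.netloc, parts.path)
        if status != 301:
            return hops
        url = result
    hops.append("ERR_TOO_MANY_REDIRECTS")
    return hops


if __name__ == "__main__":
    print(browse("http://example.com/", mode="flexible"))
    print(browse("http://example.com/", mode="full"))
```

In flexible mode the origin never sees anything but HTTP, so it never stops redirecting. In full mode the second request arrives over HTTPS and the app finally serves the page.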

You have two options at this point – either allow HTTP traffic as well so that traffic can flow from Cloudflare to your server without being redirected, or go full SSL.

Option A will not work very well in the end. Since your app thinks it's running HTTP, all of the in-site generated links will point to the HTTP version of the pages, which means that as soon as someone clicks a link on your site, they'll be on HTTP. No good.

Solutionspace

Get a cert, install it, and go Full SSL.

Soon come

A post about using Let's Encrypt to set up certs on your Platform.sh project.

#devops #platformsh

Hello, and welcome to “Platform.sh from Scratch”. In this prologue to the series, I'll go over some of the very highest level concepts of Platform.sh so that you'll have a clearer understanding of what this product is and why it came to be.

Platform.sh is a “Platform as a Service”, commonly referred to in this age of acronyms as a “PaaS”. The platform that we provide is essentially a suite of development and hosting tools to make developing software applications a smoother end-to-end process. In order to understand what this means though, I'm going to have to go into some detail in this first sidebar. Skip this if you're comfortable with this post so far.


[!abstract]

PaaS

Everyone has heard of Salesforce. Salesforce has come to be the poster child for what is now referred to as “SaaS” – Software as a service. Prior to the SaaS era if you wanted a piece of software, be it a video game or Quickbooks or anything else, you had to drive to a store and buy a box with some disks in it. Once the internet reached a level of market penetration into people's homes though, those stores went out of business. This is an obvious evolution in hindsight. SaaS is a high level thing – it's a runnable piece of software that you'll access over the internet via a URL. You might be able to modify/config it a little bit, but it will never be your entire business. It's not your product. It's someone else's and will likely play some fractional part in your overall business plan.

Almost everyone by this point has heard of Amazon Web Services – AWS. AWS is basically what people are talking about when they say “The Cloud”. AWS is a suite of products that emerged from Amazon when they figured out that they needed a huge amount of datacenter capacity to be able to withstand massive retail events like “Black Friday” and “Cyber Monday”, and that for most of the rest of the year they had tons of excess capacity sitting around draining money from their wallet. What to do with all that excess capacity? Sell it to someone else.

This relatively simple premise has evolved over the last 10-12 years into numerous products from S3 (basically a giant, limitless hard drive in the sky) to EC2 (basically a giant, limitless hosting server in the sky) to Redshift (basically a giant, limitless database that can be used for data warehousing) to SES (a simple service that sends emails) to an ever growing host of other services that always seems to come out just before your start-up figures out that it needs them.

AWS and “the cloud” in general is often given the acronym “IaaS” – infrastructure as a service. They're selling you the low level hardware abstractions that you can assemble into an infrastructure on which to run your software and by extension your company. It requires a decent bit of specialized knowledge for how to use the individual pieces as well as how to plumb them together, but for all intents is infinitely flexible. It's this level that has had most of my interest for the past few years.

In the middle of these two is what's called “platform as a service” – PaaS. This is what Platform.sh is – a suite of software and hosting services that lets you efficiently build and develop your software application, and then deploy your software application to a hosting environment that doesn't require as much specialized knowledge on how to plumb all the pieces together. Nor does the hosting environment require you – and this is a most important detail – to set up monitoring and alerting for if something goes wrong in the public environment.

The PaaS takes elements of both IaaS and SaaS to allow you to build your software product but not have to hire an extra person just to know the low level server business.

So, back to the program. The development tool set of Platform.sh is entirely based around Git. Just in case the reader is not already familiar with Git, I should explain this a little bit.


[!abstract]

Git

Software projects are typically composed of lots of files. If you want to add a new feature, you might be required to make changes in more than one of those files. Of course, before you get started you'll want to make some kind of backup just in case. If it turns out that the change was buggy or unneeded and you want to revert back to a previous state, you'd just restore those few files back to their previous versions.

What if, however, you're working with a bunch of different people and more than one person is working on that change (an utterly common scenario)? How do you manage those backups between all those people? Saving copies of files is basically impossible to manage after a very short while, so out of this need SCM (source code management) was born. It's been through several different iterations by now, and the version of SCM that currently leads the market is called Git.

Git is pretty cool. It basically takes snapshots of your entire project whenever you tell it to. It then keeps track of all those snapshots and lets you share those snapshots among a team of developers. Any snapshot can be reverted, and you can see the full history of every change to the codebase so you can keep track of “what happened when”. But wait! There's more!

This is not an exclusive feature of Git, but it has a feature called “branching”. Branching is intuitively named, and is basically the concept of taking a specific snapshot and making changes based off of that one snapshot while other work continues on down the main code line. This is the recommended way to work if you're going to make any kind of significant change to the software, and this method of working allows you to keep the main code line (almost always referred to as the “master branch”) in 100% working order. It can be thought of as having a furniture workshop away from your house where you can work and keep the house clean for company to come over at any moment, as opposed to working in the house and risking having a wreck to present should company decide to drop by.

In essence branching is making a complete copy of your project at a point in time that you can hack on all you like without disturbing anyone else. If and when the change is ready, you “merge” the code back in to the master branch, test it out to make sure everything is still groovy and then you can release the feature or bug fix to the public.

gitGraph
   commit
   commit
   branch develop
   checkout develop
   commit
   commit
   checkout main
   merge develop
   commit
   commit

You can read more about the super basics here if you wish. For now, all you really need to know is that Git

  • Makes it easier to develop software as a team
  • Makes it very cheap and easy to try out new features without breaking anything
  • Makes it easier to manage changes to your software and to revert back to a known non-broken state

update: Hey look, a really great post explaining all this better than I did.


Platform.sh has taken this branching and merging workflow and extended it out into the entire hardware stack. When you're building a software project of any size, there are considerations beyond just the code your team is writing.

Most applications of any size connect to some kind of database in the background; this is where they save “data stuff”. User uploaded images are also a very common thing in the web app world, and if those images aren't there the app will look busted.

You can branch your code all you like, but you need these other supporting resources to really do your job. Platform.sh makes branches not just of your code, but the entire infrastructure that your project runs on. This allows you to use the common branching/merging workflow with the complete support of everything else that your application depends on.

This may seem like an obvious feature, since how can you develop a new feature without being able to run it (?), but no other service that I know of actually does this. A branch in Git triggers (for all intents) a complete copy of your production site without requiring you to set up any new servers, copy databases over, copy images and everything else, etc, etc, etc. It's a significant hassle to do all this stuff, trust me, and it slows the team down every time you have to do it. Removing this need removes a major friction point in the workflow for building new features on your software product.

But wait! There's more!!!

This is where it starts getting really, really good. In case you're not aware, there's a website called GitHub. It's where a whole lot of folks have decided to host their “git repos” – repo being short for “repository”, which is basically that series of snapshots of the state of your project/codebase back to the beginning of time. This is the repo for this blog – https://github.com/JGrubb/django-blog, and here's some of the code that just ran to generate this page you're reading – https://github.com/JGrubb/django-blog/blob/master/blog/views.py#L26-L33. Pretty cool, right? And if I were working on a project with a buddy, we could both use this same repo and work on the same project, whether I'm in Germany or New Jersey or wherever. I can pull his changes over and he can pull mine and this is basically how open source software gets written these days.

The same workflow applies though – if you want to make a new feature or even if you just want to fix a bug, you'd make a new branch and do your work and then “submit a merge request”. This basically pings the person who runs the project and says “hey, I would like to suggest making this change. Here's the code I'm changing, maybe you could look it over and if you agree with this change you can merge it in”. By way of an example, here's a list of “pull (aka merge) requests” for the codebase that comprises the documentation for Platform.

Again, this is how software gets written and it's pretty mind blowing if you think about it. We software developers are so used to it our minds cease to be amazed, but not because it's not amazing. I mean, currently participating in that list of PRs are folks from France, Chicago, Hong Kong, the UK, and so on. Amazing. It is also, however, a pain in the ass.

It's a pain in the ass because it's typically impossible to tell if something works or not just by looking at the code, so you have to pull their changes over to your computer and test them out somehow. I bet you can see where this is going! Platform has a GitHub integration (BitBucket too) that will automatically build a working version of any merge request that someone opens against your project. That lets you go visit a working copy of the project and test it out without having to do a thing. Now, I don't care how long you've been doing this, that is mind blowing. For example, here's Ori's (currently work in progress) PR for adding the Ruby runtime documentation – https://github.com/platformsh/platformsh-docs/pull/339. If you click the “show all checks” link down toward the bottom, it expands with a little link “details”. That link takes you to a complete copy of the documentation with Ori's change added to it, so you can read it like you normally would, rather than reviewing a “diff”. It's the future now!

What this means in the wider scope is that the time you used to spend setting up new things to test out new ideas, only to tear them down once the tests pass, is time that you don't have to waste anymore. You can test changes out and keep right on moving.

This GitHub integration is only one of the really cool and unique features that Platform provides, but this post has gotten absurdly long already. Fortunately, this is intended to be the prologue to this series, so I'll touch on as many of those features as I can as the series progresses.

#generaldevelopment #devops #theory #platformsh

Problemspace

We recently migrated from using a local Drupal filesystem (Gluster) to using S3 to house our uploaded site assets. This was relatively simple, and killed at least two birds for us, metaphorically speaking. Some of my findings are chronicled in the previous post linked above.

We are loving that we don't have to worry about syncing files between environments anymore, which means that when we are developing a site locally, the image sources are all pointing to their S3 URLs and so everything Just Works. The only tiny problem is that if anyone needs to upload an image in development it goes up to the same production S3 bucket. Obviously this costs us next to nothing, but it bothers my sense of cleanliness.

Solutionspace

S3 has a “lifecycle management” feature that will let you Do Stuff with your bucket assets. Do Stuff is things like delete assets after a certain period, or move them to another “storage class”, which is not in the scope of this post...

The limitations of their lifecycle mgmt are a major bummer. They can only be applied to directories within a bucket or to entire buckets themselves. They cannot be applied (simply) to individual objects. If they could, then the fix would be simple – have a hook on file uploads that adds a “delete-after” header to objects that are uploaded from anything but the production environment.

I'm starting a new job in a few weeks, and they use Ceph for managing network files. I haven't even gotten on board with this gig yet, so I don't know what's going on behind the scenes, but Ceph does have this individual object expiration feature, at least according to their bug tracker. I'm wondering if this can be brought to bear on this issue, because once you don't have to move or copy files between environments anymore, going back feels kind of anachronistic.

#devops

Yeah, the pricing on the new AWS ES service is too high for you too, huh? Well just using their service is a heck of a lot easier and possibly cheaper in dev time than trying to set it up yourself. Consider that. But possibly together we can make it over the hump.

These are the bits that I was stuck on.


Put all your nodes in the same security group

I have a group for all my EC2 instances that has the appropriate ports opened up.

#devops #elasticsearch

This one really took longer than it needed to.

If you're here, hopefully you've already been through this lesson on setting up the full ELK stack with Logstash-Forwarder thrown in to boot. For me it pretty much ran as intended from top to bottom, so hopefully you're already getting data into Elasticsearch and are flummoxed by how every single other logstash config out there to parse your syslog data doesn't seem to do the job and is still just treating it like every other syslog message.

The rest of the steps to configure Drupal/Logstash

Drupal's “syslog facility” setting

This is more or less the key. You have to dig around in Drupal, as well as your webserver, and make sure that Drupal is logging to its own log. By default it'll just go to syslog and then you'll have a hell of a time distinguishing messages from Drupal on the way in.

If you recall your Logstash-Forwarder config, you tagged the syslog watcher with a "type": "syslog" bit. This is really the only info that logstash has at the point that you're setting up your input filters/grok config.

Regardless of Linux flavor, follow this guide to set up the syslog module to point to a logfile of your choosing – https://www.drupal.org/documentation/modules/syslog. I just copied everything in here, so I now have /var/log/drupal.log and it works just fine. The only thing I haven't figured out yet is that now Drupal is logging to both syslog and drupal.log, so somebody tell me how to stop that from happening.

New Logstash-forwarder config

You'll just need to a) remove the old syslog watcher from your Logstash-Forwarder (henceforth LF) config and b) tell it to now watch the new drupal.log instead. This took the relevant bits of my LF config from this

 {
   "paths": [
     "/var/log/syslog",
     "/var/log/auth.log"
   ],
   "fields": { "type": "syslog" }
 }

to this

 {
   "paths": [
     "/var/log/drupal.log"
   ],
   "fields": { "type": "drupal" }
 }

Don't restart LF just yet, we have to config Logstash to understand what to do with input of “type: drupal” first.

New Logstash config

This is where I wasted most of my time over the last few days. I was under the mistaken impression that I could perform some kind of introspection into the fields that were parsed out and then tell Logstash to do this or that with them. As far as I can tell, you'd need to use the “Ruby” Logstash filter to do that, and I didn't feel like this was that complicated a use-case if I could just figure out the “right” way to do it.

Anyway, you've probably already stumbled across this – https://gist.github.com/Synchro/5917252, and this https://www.drupal.org/node/1761974 both of which, annoyingly, show the same useless config (for me, anyway).

My logs look like this —

Oct 4 08:52:34 690elwb07 drupal: http://www.biosciencetechnology.com|1443963154|php|162.220.4.130|http://www.biosciencetechnology.com/news/2011/04/students-prediction-points-way-hot-dense-super-earth||0||Notice: Trying to get property of non-object in abm_metadata_page_alter() (line 41 of /var/www/cascade/prod/brandsites_multi.com/htdocs/docroot/sites/all/modules/custom/abm_metadata/abm_metadata.module).

The config on that page is presumably looking for a string that begins with “http”, which this clearly does not. Here's the config for this particular sequence.

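filter {
  if [type] == "drupal" {
    grok {
      # Capture names after drupal_ip follow the fields of Drupal's default
      # syslog format (type, request_uri, referer, uid, link, message).
      match => [ "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: https?://%{HOSTNAME:drupal_vhost}?\|%{NUMBER:drupal_timestamp}\|(?<drupal_type>[^\|]*)\|%{IP:drupal_ip}\|(?<drupal_request_uri>[^\|]*)\|(?<drupal_referer>[^\|]*)\|(?<drupal_uid>[^\|]*)\|(?<drupal_link>[^\|]*)\|(?<drupal_message>.*)" ]
    }
    date {
      # Parse the timestamp that grok just extracted, so this has to run after grok.
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}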
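
If you want to sanity-check the field layout without restarting Logstash over and over, a few lines of Python will do it. This is just a sketch for poking at a log line, not part of the ELK setup: the field names are my labels for the pipe-separated slots in Drupal's syslog format, parse_drupal_syslog is a made-up helper, and the sample is the line from above with the PHP notice shortened.

```python
# Labels for the nine pipe-separated slots; the names are illustrative.
FIELDS = [
    "base_url", "timestamp", "type", "ip", "request_uri",
    "referer", "uid", "link", "message",
]

# The log line from the post, with the trailing PHP notice shortened.
SAMPLE = (
    "Oct 4 08:52:34 690elwb07 drupal: "
    "http://www.biosciencetechnology.com|1443963154|php|162.220.4.130|"
    "http://www.biosciencetechnology.com/news/2011/04/"
    "students-prediction-points-way-hot-dense-super-earth"
    "||0||Notice: Trying to get property of non-object"
)


def parse_drupal_syslog(line):
    """Split one Drupal syslog line into a dict of named fields."""
    # Everything before " drupal: " is the syslog prefix (time, host, program).
    _prefix, marker, payload = line.partition(" drupal: ")
    if not marker:
        raise ValueError("not a Drupal syslog line")
    # Limit the split so any pipes inside the message itself are left alone.
    parts = payload.split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of fields")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    for name, value in parse_drupal_syslog(SAMPLE).items():
        print(f"{name:12} {value!r}")
```

Note the two empty slots (referer and link) in the sample. Any pattern you write has to allow for empty fields, which is why the grok captures use `*` rather than `+`.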

Now restart Logstash, restart LF, and carry on.

#devops #drupal

So this is just another letter to my younger self, straightening out some mental inconsistencies with how I used to think Memcached worked. Much of this will be in the context of Drupal, since much of my work experience is in the context of Drupal. Memcached is obviously not a Drupal specific construct though.

Expository

The first time I ever installed it was at the behest of my senior dev, who suggested just installing it on my laptop and giving it 64M of memory. Drupal is slow as molasses if you don't enable caching, so out of the box it actually comes with a pretty intelligent caching story. By default however, it caches everything into the database. This is “less than ideal” for sure, but since Drupal came up a long time ago in the shared hosting days, and since you could never really know what resources were going to be on the server in the first place it made sense to use what you knew would be there – namely MySQL.

This is not ideal since you're still hitting the database – often the bottleneck with Drupal – but it lightens the load significantly from the default and will definitely keep your site up under load, to a point.

One of the first places you start looking for performance improvements is in moving that out to something a little more responsive and purpose built. Redis is a newer option, but the old reliable standby is Memcached. Drupal has a bunch of tables in the database that start with cache_, and (simplified) they basically all have the format of (cache key)/(value). Move those out into Memcached and, rather than hitting the DB, you're hitting memory. This is ideal, since looking something up in RAM is orders of magnitude faster than calling the DB.

This is why big boy sites use Memcached.

Beginning explorations

A bug appeared months ago on our websites. An editor would make a change to a piece of content, say bold a word, and upon saving the piece of content, they would frequently see the old version of the article and not see the change they just made. Obviously this is really annoying, because then they have to go and redo the change they just made, which then would usually work.

This got really interesting when I discovered that clearing the cache on the site would then make the change appear. Clearly this was an issue in the cache layer somewhere.

We used to use a big name hosting vendor who built the servers for us, and Memcached was installed on every webserver and given 512M to work with. I knew that the load balancer would route authenticated traffic to the same webserver, so this led to my mistaken notion that each webserver had its own instance of Memcached to work with and that if the editor hit a different one on saving the page, perhaps they were getting an old version of the article.

This is not how Memcached works, as it turns out.

Go to Memcached.org

So the introductory page of http://memcached.org/ says

Free & open source, high-performance, distributed memory object caching system

What that means for me is that my mental model of each webserver having its own pool and being unaware of the others was incorrect. What really happens is that each server you add to the pool adds to the overall cache size, and objects are distributed among them only once. I thought we had 4 512M instances of Memcached, but we really had 1 2G pool.

The wiki has some interesting notes on the design paradigms that are worth quoting.

The server does not care what your data looks like. Items are made up of a key, an expiration time, optional flags, and raw data.

Funny, so basically the exact same schema as the cache tables in the Drupal database. That's handy.

Memcached servers are generally unaware of each other. There is no crosstalk, no syncronization, no broadcasting. The lack of interconnections means adding more servers will usually add more capacity as you expect.

For everything it can, memcached commands are O(1).

So this means that it should basically scale infinitely with the same performance. Whether you have 32M on your laptop, or 48G across 6 servers as we have now in production, the lookup time is constant for a piece of cached data.
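
Since the servers never talk to each other, it's the client that decides which server a given key lives on. Here's a rough sketch of the idea, assuming a naive "hash the key, modulo the number of servers" scheme. The server names are made up, and real client libraries typically use something smarter (consistent hashing) so that adding a server doesn't reshuffle every key.

```python
import hashlib

# Hypothetical pool: one Memcached instance per webserver.
SERVERS = ["web1:11211", "web2:11211", "web3:11211", "web4:11211"]


def server_for(key, servers=SERVERS):
    """Pick the one server that owns this key, using only the key itself."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


if __name__ == "__main__":
    for key in ("cache_page:node/1", "cache_page:node/2", "cache_menu:main"):
        print(key, "->", server_for(key))
```

Every webserver runs the same calculation over the same server list, so they all agree on where a key lives without the Memcached instances ever comparing notes. That's how four 512M instances behave like one 2G pool.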

What about the problem?

I actually just solved it yesterday. It was this – https://www.drupal.org/node/1679344. Learned a hell of a lot about caching in Drupal and caching in general in the last 6 months before really hunkering down to figure this one out.

#devops

There are other articles on this topic around the internet, but for some reason I could never completely make the mental connection on how Drush aliases worked until recently. It's actually really simple to get started, but most other articles tend to throw all the options into their examples so it kind of muddies the waters when you're trying to set yours up. By “you/yours”, of course I mean “I/mine”.

Simple

My work is an Acquia hosting client, and we have a multisite setup. Aliases are a natural fit for multisite configs, so let's show that first.

<?php

// put this in a file at ~/.drush/local.aliases.drushrc.php

$aliases['foo'] = array(
  'root' => '/path/to/docroot',
  'uri' => 'local.foobar.com' // the local dev URL
);

This is all you need to get off the ground and start using aliases locally. If you then run a drush cache-clear drush to reset Drush's internal cache, and then a drush site-alias you should be presented with a listing of your aliases.

@none
@local
@local.foo

The key to this scheme, and something that I feel like was inadequately explained to me even after numerous tutorials, is that the name of the file itself defines the particular group of aliases that this setting will speak to. If you put this into ~/.drush/foo.aliases.drushrc.php then your list of aliases would look like this:

@none
@foo
@foo.foo
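
To make the naming rule concrete, here's a little Python model of it. To be clear, this is not how Drush is implemented, just my mental model written down: alias_names is a made-up function that takes the alias file's name and the keys of the $aliases array and produces the listing you'd expect to see.

```python
import os

SUFFIX = ".aliases.drushrc.php"


def alias_names(alias_file, keys):
    """Model of the rule: the file name is the group, each array key is a member."""
    basename = os.path.basename(alias_file)
    if not basename.endswith(SUFFIX):
        raise ValueError(f"{basename} is not named like a Drush alias file")
    group = basename[: -len(SUFFIX)]
    # @none is always listed; then the group itself, then one alias per key.
    return ["@none", f"@{group}"] + [f"@{group}.{key}" for key in keys]


if __name__ == "__main__":
    print(alias_names("~/.drush/local.aliases.drushrc.php", ["foo"]))
    print(alias_names("~/.drush/foo.aliases.drushrc.php", ["foo"]))
```

Same array key, different file name, different alias. That was the piece I kept missing.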

If you're running multisite, you'll have a few more in there —


<?php

// put this in a file at ~/.drush/local.aliases.drushrc.php

$aliases['foo'] = array(
  'root' => '/path/to/docroot',
  'uri' => 'local.foobar.com' // the local dev URL
);
$aliases['bar'] = array(
  'root' => '/path/to/docroot',
  'uri' => 'local.example.com' // the local dev URL
);
$aliases['ibd'] = array(
  'root' => '/path/to/docroot',
  'uri' => 'local.ignoredbydinosaurs.com' // the local dev URL
);


$ drush sa

@none
@local
@local.foo
@local.bar
@local.ibd

Ok, whoop-tee-do, what do you do with that?

Try clearing the cache on one of those sites from anywhere in your file system with drush @local.foo cc all, or clear all the caches on every site in that file with drush @local cc all. This is helpful out of the box even without multisite since you don't have to be in the drupal file tree to call drush and not get yelled at for “not having a high enough bootstrap level”, but this becomes a major time saver in multisite, since the alternative would be cding around constantly to effect commands from different directories in sites/*.

Nice and simple. Ready to kick it up a notch?

Remote servers

Let's run drush commands on a remote server without having to log in!

<?php

// how about we put this code into
// dev.aliases.drushrc.php

$aliases['foo'] = array(
  'root' => '/var/www/path/to/docroot',
  'uri' => 'dev.foobar.com',
  'remote-host' => 'devbox.example.com',
  'remote-user' => 'ssh_username'
);

$aliases['bar'] = array(
  'root' => '/var/www/path/to/docroot',
  'uri' => 'dev.example.com',
  'remote-host' => 'devbox.example.com',
  'remote-user' => 'ssh_username'
);

This would grow your list of aliases thusly —

$ drush sa

@none
@local
@local.foo
@local.bar
@local.ibd
@dev
@dev.foo
@dev.bar

...and would let you run any old Drush command you want without having to even be bothered with logging in to that server!

Lots more examples and info out there, but this should get you started.

#drupal #devops

I remember being very confused by this one early on. There were boatloads of tutorials on how to change your $PATH, but what that even means in the first place I just kinda had to figure out over the course of it all. It's actually pretty simple. Here's my attempt.

If you're coming from a Windows background, and you were in the habit of being really fussy about where you installed software on the hard drive, you may have just known how to fire up any old piece of software on your system. You navigated to the application in Windows Explorer and double clicked on it. It was really simple. That icon that you actually clicked on was the “executable”, which is to say the file that starts the whole show.

Unix, Linux, and Mac systems also have executables. On a Mac, it's (represented by) the icon you click on to start the app. When you start getting deeper into development and start using the command line more, you're eventually going to come across some installation instructions that advise you to “update your path” for some reason. They usually give you a copy and paste thing to go along with it. But what does it mean?

Let me give you an example. Here's my path on this laptop right here —

MacPro-JGrubb 福 /usr/local/etc/ansible ➤ e0db473|master✓
10165 ± : echo $PATH [23h53m]
/usr/local/heroku/bin:/Users/jgrubb/.rbenv/bin:/Users/jgrubb/.rbenv/shims:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11/bin:/Users/jgrubb/.composer/vendor/bin

Now, never mind that there's a ton of garbage in there, and why is Heroku at the beginning of the path like that? I don't even remember doing that. Anyway, when you first start your computer up, or when you first start a shell session, your environment fires up and loads a bunch of configs. One of the configs that it loads is the list of places to look for the aforementioned executables.

In a nutshell, your computer is going to look down that path from left to right and run the first match it finds. The different locations are separated by :, so that's a list of different locations on the computer that will be scanned to see whether “that thing” is installed there. For instance – the Macintosh comes preinstalled with Vim and Ruby. By default, stuff that comes with the OS is installed in /usr/bin. But, if you want a more recent version of Ruby, you might install it into /usr/local/bin (this is where Homebrew puts much of its stuff). If your path did not have /usr/local/bin before /usr/bin, you'd still be executing the system version of Ruby instead of the one you want. It's really simple, but again – took me a while.
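
Here's roughly what the shell is doing, sketched in Python. find_executable is a made-up name and this skips plenty of details a real shell handles, but the "first directory that has it wins" behaviour is the point. The demo builds two throwaway directories that each contain a fake ruby, standing in for /usr/local/bin and /usr/bin.

```python
import os
import tempfile


def find_executable(name, path=None):
    """Return the first executable called `name` found on the path, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Directories are checked in order, left to right; the first hit wins.
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def make_fake(directory, name):
    """Create an empty executable file so the demo has something to find."""
    target = os.path.join(directory, name)
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(target, 0o755)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local_bin, \
            tempfile.TemporaryDirectory() as system_bin:
        make_fake(local_bin, "ruby")
        make_fake(system_bin, "ruby")
        # Whichever directory comes first in the path is the one that gets used.
        print(find_executable("ruby", os.pathsep.join([local_bin, system_bin])))
        print(find_executable("ruby", os.pathsep.join([system_bin, local_bin])))
```

Python's standard library already ships this as shutil.which, so use that in real code. Writing it out by hand just makes the ordering rule visible.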

So, how do you change it? Presuming you use Bash for a shell, you probably have a file called ~/.bashrc or possibly ~/.bash_profile. If you don't, you can safely create either of those files and put in a line like this —

export PATH=/usr/local/bin:/or/whatever/usually/bin:$PATH

This just says “hey, whatever my path is now, add those two directories to the beginning of it, and then assign that to be my new $PATH”.

Questions?

#devops #generaldevelopment

Prelude

Fastly is a CDN (content delivery network). A CDN makes your site faster by acting as a caching layer between the world wide web and your webserver. It does this by having a globally distributed network of servers and by using some DNS trickery to make sure that when someone puts in the address of your website they're actually requesting that page from the nearest server in that network.

$ host www.ecnmag.com
www.ecnmag.com is an alias for global.prod.fastly.net.
global.prod.fastly.net is an alias for global-ssl.fastly.net.
global-ssl.fastly.net is an alias for fallback.global-ssl.fastly.net.
fallback.global-ssl.fastly.net has address 23.235.39.184
fallback.global-ssl.fastly.net has address 199.27.76.185

If they don't have a copy of the requested page, they'll get it from your webserver and save it for the next time. Next time, they serve the “cached version”, which is way faster for your users and lightens the load on your webserver (since the request never even makes it to your webserver). Excellent writeup here.

There are many different CDN vendors out there – Akamai being the oldest and most expensive that you may have heard of. A new entrant into the market is a company called Fastly. Fastly has decided on using Varnish as the core of their system. They have some heavyweight Varnish talent on the team and have added a few extremely cool features to “vanilla” Varnish that I'll get to in a moment.

Fastly's being built on top of Varnish is cool, mainly because every CDN out there has some sort of configuration language and to throw your hat in with any of them is also to throw your hat in with their particular configuration language. Varnish has a well known config format called VCL (Varnish configuration language) which, on top of having plenty of documentation and users out there already, is also portable to other installations of Varnish so that learning it is time well spent. This is the killer Fastly feature that first drew me in.


(you can skip this – backstory, not technical)

Prior to using the CDN as our front line to the rest of the internet, we'd been on a traditional “n-tier” web setup. This meant that any request to one of our sites from anywhere in the world would have to travel to a single point – our load balancer in Ashburn, Virginia in this case – and then travel all the way back to wherever. In addition to this obvious global networking performance suck, we use a managed hosting vendor, so they actually own and control our load balancer. Any changes that we'd want to have made to our VCL – the front line of defense against the WWW – would have to go through a support-ticket-and-review process. This was a bottleneck in the event of DDoS situations, or any change to our caching setup for any reason.

Taking control of our caching front line was a necessary step. This became the second killer Fastly feature once we started piloting a few of our sites on Fastly.


The killer-est killer feature of all has only just become clear to me. Fastly makes use of a feature called “Surrogate Keys” to improve the typical time-based expiration strategy that we'd been using for years now. They have a wonderful pair of blog posts on the topic here and here.

Varnish is basically a big, fast key-value store. How keys are generated and subsequently looked up, as well as how their values are stored, are all subject to alteration by VCL, so you have a wonderful amount of control over the default methodology. By default it's essentially URLs as keys and server responses as values, and this will get you pretty far down the line, but you bump into the limits as soon as you start pondering that each response has but one key that references it. Conversely, each key references only one object. By default...

Real life example – I work for a publishing company. Our websites are not super complicated IA-wise. We have pieces of content and listing pages of that content, organized mostly by some sort of topic. A piece of content can have any number of topics attached to it, and that piece of content (hereafter referred to as a “node”) should show up on the listing pages for any one of those topics.

Out of the box, Fastly/Drupal works really well for individual nodes. Drupal has a module for Fastly that communicates with their API to purge content when it's updated, so if an editor changes something on the node they won't have to wait at all for their changes to be reflected to unauthenticated users. The same is not true for listing pages. Since these pages are collections of content and have no deeper awareness of the individual members of the collection, they function on a typical time-based expiration strategy.

My strategy for the months since we launched this across all of our sites has been to set TTLs (time to live, basically the length of time something will be cached) as high as I can until an editor complains that content isn't showing up where they want it to. I recently had an editor start riding me about this, so I lowered the TTLs to values so low that I knew we weren't getting much benefit from even having caching in the first place. I'd known about this Surrogate Key feature and decided to start having a deeper look.


The ideal caching scenario would purge not only the node when an editor updates it, but also any listing page when a piece of content is published that should show up on that listing. This is where Surrogate Keys come into play. The Surrogate-Key HTTP header is a “space delimited collection of cache keys that pertain to an object in the cache”. If a purge request is sent to Fastly's API to purge “test-key”, anything with “test-key” in the Surrogate-Key header should fall out of cache and be regenerated.

In essence, what this means is that you can associate an arbitrary key with more than one object in the cache. You could tag anything on the route “api/mobile” with a surrogate-key “mobile” and when you want to purge your mobile endpoints, purge them all with one call rather than having to loop through every endpoint individually. On those topic listing pages you could use the topic or topic ID as a surrogate-key, and then any time a piece of content with that topic is added or updated, you can send a purge to that topic ID and have that listing page dropped. And only that listing page dropped.
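To make the "one key, many objects" idea concrete, here's a toy model of it in shell. This is not how Varnish or Fastly store anything; the index file and the cache_put, cache_purge, and cache_urls helpers are all invented for illustration. Each cached URL just gets a line listing the surrogate keys it carries, and a purge drops every line that mentions the key.

```shell
CACHE_DIR=$(mktemp -d)   # scratch space standing in for the cache

# cache_put URL "KEY1 KEY2 ...": remember an object and its surrogate keys
cache_put() {
  printf '%s\t %s \n' "$1" "$2" >> "$CACHE_DIR/index"
}

# cache_purge KEY: drop every object whose key list contains KEY
cache_purge() {
  grep -vF -- " $1 " "$CACHE_DIR/index" > "$CACHE_DIR/index.tmp" || true
  mv "$CACHE_DIR/index.tmp" "$CACHE_DIR/index"
}

# cache_urls: list what is still cached
cache_urls() { cut -f1 "$CACHE_DIR/index"; }

cache_put /topics/widgets      "3979 3980 3779"
cache_put /topics/gadgets      "4001"
cache_put /api/mobile/articles "mobile"
cache_put /api/mobile/topics   "mobile"

cache_purge mobile   # one call drops both mobile endpoints
cache_purge 3980     # a node tagged with topic 3980 was saved
cache_urls           # prints only /topics/gadgets
```

The URLs are still the lookup keys; the surrogate keys are just extra labels you can purge by.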

// the basic algorithm, NOT functional Drupal code

if ($listing_page->type == "topic") {
	$keys = [];
	
	// Topics can have children, so fetch them.
	// pretend this returns a perfect array of topic IDs
	$topics = get_term_children($listing_page->topic);
	// Push the parent topic into the array as well.
	$topics[] = $listing_page->topic;
	
	foreach($topics as $topic) {
		$keys[] = $topic;
	}
	
	$key = implode(" ", $keys);
	add_http_header('Surrogate-Key', $key);
}

This results in a topic listing page getting a header like this -

# the parent topic ID as well as any child topic IDs
Surrogate-Key: 3979 3980 3779

Then, upon the update or creation of a node you do something like this -

// this would be something like hook_node_insert if you're a Drupalist
function example_node_save_subscriber($node) {
  $fastly = new Fastly(FASTLY_SERVICE_ID, API_KEY);
  foreach($node->topics as $topic_id) {
    $fastly->purgeKey($topic_id);
  }
}

This fires off a Fastly API call for each topic on that node that would cause anything with that surrogate key, aka topic ID, to be purged. This would be any topic listing page with this topic ID on it. Obviously if there are 500 topics on any piece of content you'll probably want to move this to a background job so you don't kill something, but you get the idea.
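If you want to poke at this from the command line instead of from Drupal, the same purge can be sketched with curl. Treat this as a sketch under assumptions: the endpoint is Fastly's purge-by-surrogate-key call as I understand it (check their API docs before relying on it), the FASTLY_SERVICE_ID and FASTLY_API_TOKEN variable names are my own, and DRY_RUN is a made-up switch so you can see what would be sent without sending it.

```shell
# Placeholder so the dry run below prints something readable when unconfigured.
: "${FASTLY_SERVICE_ID:=YOUR_SERVICE_ID}"

# purge_key KEY: ask Fastly to purge everything tagged with surrogate key KEY.
# Set DRY_RUN to anything non-empty to print the request instead of sending it.
purge_key() {
  local url="https://api.fastly.com/service/${FASTLY_SERVICE_ID}/purge/$1"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "POST $url"
  else
    curl -fsS -X POST -H "Fastly-Key: ${FASTLY_API_TOKEN}" "$url"
  fi
}

# One call per topic ID on the node, same as the PHP loop.
for topic_id in 3979 3980 3779; do
  DRY_RUN=1 purge_key "$topic_id"
done
```

Drop the DRY_RUN=1 once the service ID and token are real.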


This is sort of like chasing the holy grail of caching. In theory this means that you are turning the caching TTLs up to maximum and only expiring something when it actually needs to be expired based on user action and intent, not based on some arbitrary time that I decide on based on my lust for having everything as fast as possible. The marvelous side effect of this is that (again in theory) everything should load even faster since there's almost no superfluous generation of pages at all.

I just released the code on Friday morning, and the editor who was previously riding me about this topic had only positive feedback for me, meaning – so far, so good.


FYI – the holy grail actually looks more like this -

#braindump #varnish #drupal #devops

GUIs change, but the command line is eternal. Memorize these 5 commands and a long and happy life awaits.


$cd
change directory. This is how you move 
around the file system.

$ls
List, or tell me what's in this directory. This has a 
huge list of useful modifying flags, such as -l (long, 
tell me the size, ownership, and permissions on each 
thing in here too), or -a (all, as in, show me hidden 
dotfiles as well)

$mv
Move, this is how you move something from here 
to there. This is also how you rename something 
even if it's in the right place.

$cp
Copy. Add -r to make it recursive, else it 
won't copy directories because it won't 
descend into them.

$pwd
Print working directory. This 
is how you get it to tell you where you are 
in the file system.
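Here's a short session that strings all five together. It runs in a throwaway directory from mktemp, so nothing of yours gets touched; the file and directory names are just examples.

```shell
cd "$(mktemp -d)"            # move into a fresh scratch directory
pwd                          # print where we are now
mkdir notes                  # not one of the five, but we need something to look at
echo "hello" > notes/a.txt
ls -la                       # long listing, hidden dotfiles included
cp -r notes backup           # copy the directory and everything in it
mv backup archive            # rename "backup" to "archive" in place
ls archive                   # prints a.txt
```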

#bareminimum #devops