Ignored By Dinosaurs 🦕

I work with a guy. He's incredibly smart. He's the seniormost developer here, and if you need to learn something new and get something large done, he's the guy to do it. We basically dropped him off in the AWS jungle and told him to learn Hadoop and the entire Hadoop ecosystem for a data warehouse project and he did it.

I work with another guy. He's also incredibly smart. But he asks me for the answer before attempting to find it on his own more often than not. He's got a point when he says “it's a lot faster for me to just ask you rather than spend time trying to find it on my own”, because he's here to do a job after all. I get that. But the best analogy I can come up with is a spin on the old adage -

You can give a man a fish, and he eats for a day. You can teach a man to fish and he eats for a lifetime.

There's a third kind of person, though – the person who goes out and finds out about fishing on their own and then teaches themselves how to fish. This person will be your boss, and will always be employed.

#generaldevelopment #life

Because sometimes you need to roll out a bunch of taxonomy terms across 26 sites, and you just don't feel like clicking those buttons.


$terms = [
  'iReport',
  'Infographic',
  'Video',
  'Case Study',
  'Application Note',
  'Data Sheet',
];

$vocab = taxonomy_vocabulary_machine_name_load('vocab_machine_name');

foreach($terms as $term) {
  $t = new stdClass;
  $t->name = $term;
  $t->vid = $vocab->vid;
  taxonomy_term_save($t);
}

Save this to something like create_terms.php and then run it with drush!

$ drush @site scr path/to/create_terms.php

#drupal

Problemspace

  • You're working with a managed hosting provider and have begun to run out of room on the local/networked filesystem. Vendor wants to upsize your storage for a charge.
  • You've got several different environments that you're working with (local|dev|prod|etc) and syncing the filesystem between them is an annoying chore in 2016. The various methods out there of proxying these requests don't really excite you.

Solutionspace

How about moving the Drupal filesystem up to The Cloud? One of Amazon's earliest products in AWS was the Simple Storage Service (S3), and one of its core use cases is serving public files like images for websites, removing the need to store the assets yourself along with the (admittedly minimal) compute resources to serve them.

We had both of the issues outlined above and have just completed a migration of all our files up to S3, so I thought I'd write down some discoveries.


s3fs module

This is a rather unfortunately named module, since there is another open source project out there with the exact same name. I knew of it first and so assumed that this module had something to do with that, but that's not the case. Btw, I learned this in Steven Merrill's excellent session at DrupalCon in May, check out the video if you're still with me.

In a nutshell, s3fs hijacks the Drupal filesystem. You can put both your public and private filesystems up there, simple to do since S3 has a very rich permissions feature set. Just don't deliberately make your private filesystem “public” and you're set. It then rewrites any URLs that would've been to assets on your local filesystem to point to their new location in S3.
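As an illustration of what that rewrite amounts to (bucket and path names here are invented for the sketch, not the module's actual config or API), think of it as a prefix swap:

```python
# Illustrative only: the real module builds these URLs from its settings form.
BUCKET_URL = "https://NAME_OF_BUCKET.s3.amazonaws.com/nameofsite.com/s3fs-public"
LOCAL_PREFIX = "/sites/default/files/"

def rewrite(url):
    """Point public:// asset URLs at S3 instead of the local filesystem."""
    if url.startswith(LOCAL_PREFIX):
        return BUCKET_URL + "/" + url[len(LOCAL_PREFIX):]
    return url
```

Anything outside the public filesystem is left alone, which matters for the multisite wrinkles below.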

The setup is pretty straightforward, so just a few observations.

  • For multisite, you need to override a few things, namely the default setting for “S3 root folder”. For our install we needed to separate each site's assets into site-specific folders within the same S3 bucket, so we filled that setting in with a string unique to the site, something like “nameofsite.com”.
  • There are UI buttons for moving your local files up to S3, but the AWS CLI works *WAY* faster. There are a wealth of well-documented options to pass, but the gist is this —
aws s3 sync . s3://NAME_OF_BUCKET/nameofsite.com/s3fs-public/ --acl public-read
  • After moving the files up to S3, the module needs to be made aware of what files exist up there, so you'll need to refresh the file cache. If you forget to do this part and flip the switch to have s3fs take over the public filesystem, you'll see bad things.
  • With a multisite setup, I found it much easier to flip the switch that says “Do *not* rewrite JS/CSS URLs”. The downside is that I have to make sure that random assets in the Drupal filesystem (i.e. not within public://) also exist in S3, since so many CSS and JS files refer to assets by root-relative paths. This is a hack, but that's life sometimes.
// from Drupal docroot
$ rsync -av --prune-empty-dirs --include='*/' \
  --include='*.jpg' --include='*.png' --include='*.svg' \
  --include='*.js' --include='*.css' --include='*.gif' \
  --include='*.woff' --include='*.ttf' --include='*.map' \
  --exclude='*' . ~/some/destination/dir

This says “gimme all those file types in the whole Drupal file tree and move them over to some other dir” that I can then use AWS CLI to sync up to S3. You should take the opportunity before running this to delete all your local public:// files, because they'll get sucked up in this command as well. You won't need them anymore after you do this migration anyway.

// from ~/some/destination/dir
$ aws s3 sync . s3://NAME_OF_BUCKET --acl public-read
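If rsync isn't handy, the same collect-and-copy filter can be sketched in Python (the extension list mirrors the rsync includes above; paths and names are illustrative):

```python
import shutil
from pathlib import Path

# File types that CSS/JS commonly reference by root-relative path.
ASSET_EXTS = {".jpg", ".png", ".svg", ".js", ".css", ".gif", ".woff", ".ttf", ".map"}

def collect_assets(docroot, dest):
    """Copy every matching asset out of the Drupal tree, preserving relative paths."""
    docroot, dest = Path(docroot), Path(dest)
    for f in docroot.rglob("*"):
        if f.is_file() and f.suffix.lower() in ASSET_EXTS:
            target = dest / f.relative_to(docroot)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
```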

All in all fairly simple, and in theory makes our setup much more portable between environments as well as vendors. Another excellent writeup of this module can be found here.

#drupal

How I very deliberately ended up in tech

This journey is fairly well documented on this blog, but I'll distill it down to a post, since I've answered this question a few times lately.

Early years

I started playing guitar and bass as a teenager, and upon arriving at college had to declare a major. I chose music since that's the only thing I could see devoting the majority of my time toward studying, and picked the Recording and Production track in the Music Industries Studies dept at Appalachian State University. Only later did I realize that my affinity for recording and production was due as much to my love for computers as my sterling ears.

After college I joined a band and spent 7 years on the road, traveling all over North America and occasionally the world. I did not have a desire to spend my life on the road playing music, however. I always pictured moving from a music performance career into a music business career, as the industry hadn't completely imploded yet. It was actually at the same party described in this post that I had the vision of where I wanted to be later in life, and it was in “the business”.

The road

So years went by and I had more or less learned everything I cared to know about the music business when I had another epiphany. This post chronicles that one, and it was about technology. It was on a long car ride that I discovered, truly discovered, the economy that was about to come. “Oh! This is going to be a thing!” I thought to myself that first day of the iPhone App Store. “A thing that if I learned it, I could continue to be my own boss, continue having a creative rewarding career, and probably even make enough money to feed my kids!”

So I went for it. I bought myself a Mac laptop in July of 2008 with the goal of teaching myself iPhone app programming. I had no idea what I was doing or even how to learn.

After the road

I got truly sick of my previous career in early 2009 and spent the rest of that year planning my exit. After quitting at the end of 2009 I had nothing but time on my hands to spend about 100 hours a week banging this stuff into my brain.

I had a contact who threw me some work, and out of that came the first Drupal site I ever built. In 2010. 2010 was a rough year by every metric, but we got through it, and by 2011 I had a new music gig and a full funnel of contract work. Life was looking up.

Child #3 and beyond

Life got turned upside down again with the arrival of son #3 in 2012. At this point I had no time to fill the contract funnel anymore and decided to take the dreaded “straight job”. I'd only intended to stay for about 6 months until the waves calmed down at home, but as it turned out I really like working with people, and the people at the job were really cool for the most part.

I've been here since, almost 4 years now. I've moved from mostly front end developer to mostly Chief Architect of this entire joint, since almost everything interests me on some level and my boss and I have a really symbiotic working relationship. I also (in my opinion) excel at seeing the big picture of the system and figuring out how to get it done.

#life

Because sometimes you need to roll out an image style across 26 websites, and dammit you just don't feel like dealing with Features.


/**
 * Adds mobile_content_image style
 *
 * @param $sandbox
 * @return bool
 */
function hook_update_N(&$sandbox) {
  $style = image_style_load('mobile_content_image');
  if (!$style) {
    $style = image_style_save([
      'name' => 'mobile_content_image',
      'label' => 'Mobile Content Image (500 x 250)',
    ]);
    $effect = [
      'name' => 'image_scale_and_crop',
      'data' => [
        'width' => '500',
        'height' => '250',
      ],
      'isid' => $style['isid'], // image_style_save() returns the style with its new isid
    ];
    image_effect_save($effect);
  }
  return TRUE;
}

#drupal

Problemspace

I want to be able to link a set of posts together in an order. If there is a next post relative to the one I'm on, I want a button to show up that says “next post” and links to it. If there is a previous post relative to the one that I'm on, I want a button that says “previous post” and links back to it. Pretty simple, conceptually. Basically I want to reproduce parts of the Drupal book.module as minimally as possible.

So my first naive attempt was to add 2 ForeignKey fields to the Post model – “previous” and “next”.

class Post(models.Model):

	title = models.CharField(max_length=255)
	body = models.TextField()
	summary = models.TextField(null=True, blank=True)
	slug = models.SlugField(max_length=255)
	pub_date = models.DateTimeField('Published at')
	published = models.BooleanField()
	tags = models.ManyToManyField(Tag)
	created = models.DateTimeField(auto_now_add=True)
	updated = models.DateTimeField(auto_now=True)
	previous_post = models.ForeignKey(
		'self',
		related_name='previous_post',
		blank=True,
		null=True,
		on_delete=models.SET_NULL
	)
	next_post = models.ForeignKey(
		'self',
		related_name='next_post',
		blank=True,
		null=True,
		on_delete=models.SET_NULL
	)

This worked on the front end but immediately raised a stink alarm, for a couple of reasons.

  • You'd have to go and save this info twice for it to really work. Once on the current post and again on the referred post to link it back. == Workflow suck

  • The truth about this ordering would be stored in two places, so it'd be really easy to mess something up and get out of sync.

This is essentially a doubly-linked list if you're keeping score, with the attendant maintenance problems.

So I thought about overriding the save() method to hook into the operation and automatically populate the correct field on the referred item. But then, of course, I'd have to do all kinds of gymnastics to watch for that field being removed at some point and remove the corresponding field on the referred item, etc. I mean, it's a blog, who gives a shit, but I've been doing this long enough now that I can't help myself.

Another option in this same vein is to use the Django “signals” subsystem to hook into the same functionality, but the smell remains.

After coming home from DrupalCon it occurred to me that really all I need is the one pointer, since I should be able to derive the pointer back. I just had to figure out how to do it...

This is a pretty obvious use case – automatically deriving any pointers back to the current item. It just requires one extra DB query to ask “give me any items whose previous_id is this item's id”.

The key is the related_name argument to the field.

I think this is set automatically for a normal ForeignKey, but where the foreign key points back to the same model it's required. Going from the docs, I was trying all manner of post.post_set and so on, but it's actually just post.previous_post, which is counter-intuitive since what you actually get back from it is the “next” post. I chose to keep the “previous” field since you can just add the previous post as you're authoring the current one.

Current post model looks like this —

class Post(models.Model):

	title = models.CharField(max_length=255)
	body = models.TextField()
	summary = models.TextField(null=True, blank=True)
	slug = models.SlugField(max_length=255)
	pub_date = models.DateTimeField('Published at')
	published = models.BooleanField()
	tags = models.ManyToManyField(Tag)
	created = models.DateTimeField(auto_now_add=True)
	updated = models.DateTimeField(auto_now=True)
	previous = models.OneToOneField(
		'self',
		related_name='previous_post',
		blank=True,
		null=True,
		on_delete=models.SET_NULL
	)

And the prev/next fields look like this —

{% if post.previous %}
  <a href="{% url … %}">← previous: {{ post.previous.title }}</a>
{% endif %}
{% with next_post=post.previous_post %}
  {% if next_post %}
    <a href="{% url … %}">next: {{ next_post.title }} →</a>
  {% endif %}
{% endwith %}

note

This might not technically be a linked list in the strictest sense, since a singly-linked list has pointers to the next node in the chain. I've implemented it here as a “previous” pointer, since it makes more sense in the edit workflow. Since it makes more sense, hopefully we'll make more cents!
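A plain-Python sketch of the scheme, outside the ORM (in Django terms the lookup is roughly Post.objects.filter(previous=post), or the previous_post reverse accessor; the data here is illustrative):

```python
# Each post stores only a pointer to its previous post; "next" is derived
# by the one extra query mentioned above.
posts = [
    {"id": 1, "title": "Part one", "previous_id": None},
    {"id": 2, "title": "Part two", "previous_id": 1},
    {"id": 3, "title": "Part three", "previous_id": 2},
]

def next_post(post, all_posts):
    """Find the post whose `previous` pointer refers back to this one."""
    for p in all_posts:
        if p["previous_id"] == post["id"]:
            return p
    return None
```

Only one place stores the truth about ordering, so the two-places-out-of-sync problem disappears.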

Stay tuned for the next episode where I decide that I'd like to have a Table of Contents and rip this whole thing out and do it over again.

#generaldevelopment #django

This is an interweaving of Four Kitchens' Varnish 3 VCL and this generic Varnish 4 VCL.


vcl 4.0;
# Based on: https://github.com/mattiasgeniar/varnish-4.0-configuration-templates/blob/master/default.vcl

import std;
import directors;

backend server1 { # Define one backend
 .host = "127.0.0.1"; # IP or Hostname of backend
 .port = "8080"; # Port Apache or whatever is listening
 .max_connections = 300; # That's it

 .probe = {
 #.url = "/"; # short easy way (GET /)
 # We prefer to only do a HEAD /
 .request =
 "HEAD / HTTP/1.1"
 "Host: localhost"
 "Connection: close"
 "User-Agent: Varnish Health Probe";

 .interval = 5s; # check the health of each backend every 5 seconds
 .timeout = 1s; # timing out after 1 second.
 .window = 5; # Look at the last 5 polls
 .threshold = 3; # If 3 out of the last 5 polls succeeded the backend is considered healthy, otherwise it will be marked as sick
 }

 .first_byte_timeout = 300s; # How long to wait before we receive a first byte from our backend?
 .connect_timeout = 5s; # How long to wait for a backend connection?
 .between_bytes_timeout = 2s; # How long to wait between bytes received from our backend?
}

/*acl purge {
 # ACL we'll use later to allow purges
 "localhost";
 "127.0.0.1";
 "::1";
}*/


/*acl editors {
 # ACL to honor the "Cache-Control: no-cache" header to force a refresh but only from selected IPs
 "localhost";
 "127.0.0.1";
 "::1";
}*/

sub vcl_init {
 # Called when VCL is loaded, before any requests pass through it.
 # Typically used to initialize VMODs.

 new vdir = directors.round_robin();
 vdir.add_backend(server1);
 # vdir.add_backend(server...);
 # vdir.add_backend(servern);
}

sub vcl_recv {
 # Called at the beginning of a request, after the complete request has been received and parsed.
 # Its purpose is to decide whether or not to serve the request, how to do it, and, if applicable,
 # which backend to use.
 # also used to modify the request
 call ban_list;
 #set req.url = std.tolower(req.url);

 set req.backend_hint = vdir.backend(); # send all traffic to the vdir director

 # Normalize the header, remove the port (in case you're testing this on various TCP ports)
 set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");

 # Normalize the query arguments
 set req.url = std.querysort(req.url);

 # Allow purging
 if (req.method == "PURGE") {
/* if (!std.ip(req.http.X-Forwarded-For, "0.0.0.0") ~ purge) { # purge is the ACL defined at the beginning
 # Not from an allowed IP? Then die with an error.
 return (synth(405, "This IP - " + std.ip(req.http.X-Forwarded-For, "0.0.0.0") + " is not allowed to send PURGE requests."));
 }*/
 # If you got this stage (and didn't error out above), purge the cached result
 return (purge);
 }

 # Only deal with "normal" types
 if (req.method != "GET" &&
 req.method != "HEAD" &&
 req.method != "PUT" &&
 req.method != "POST" &&
 req.method != "TRACE" &&
 req.method != "OPTIONS" &&
 req.method != "PATCH" &&
 req.method != "DELETE") {

 return (pipe);
 }

 if (req.url ~ "^/status\.php$" ||
 req.url ~ "^/update\.php$" ||
 req.url ~ "^/admin$" ||
 req.url ~ "^/admin/.*$" ||
 req.url ~ "^/flag/.*$" ||
 req.url ~ "^.*/ajax/.*$" ||
 req.url ~ "^.*/ahah/.*$") {
 return (pass);
 }

 # Implementing websocket support (https://www.varnish-cache.org/docs/4.0/users-guide/vcl-example-websockets.html)
 if (req.http.Upgrade ~ "(?i)websocket") {
 return (pipe);
 }

 # Drupal's batch mode will behave in a funky manner since all cookies except
 # for the session get stripped out below. This makes batch fall into 
 # op=do_nojs mode, which isn't really needed. Just get Varnish out of the way.
 if (req.url ~ "(^/batch)") {
 return (pipe);
 }

 # Only cache GET or HEAD requests. This makes sure the POST requests are always passed.
 if (req.method != "GET" && req.method != "HEAD") {
 return (pass);
 }

 # Strip hash, server doesn't need it.
 if (req.url ~ "\#") {
 set req.url = regsub(req.url, "\#.*$", "");
 }

 # Strip a trailing ? if it exists
 if (req.url ~ "\?$") {
 set req.url = regsub(req.url, "\?$", "");
 }

 # Some generic cookie manipulation, useful for all templates that follow
 # Remove the "has_js" cookie

 if (req.http.Cookie) {
 # 1. Append a semi-colon to the front of the cookie string.
 # 2. Remove all spaces that appear after semi-colons.
 # 3. Match the cookies we want to keep, adding the space we removed
 # previously back. (\1) is first matching group in the regsuball.
 # 4. Remove all other cookies, identifying them by the fact that they have
 # no space after the preceding semi-colon.
 # 5. Remove all spaces and semi-colons from the beginning and end of the
 # cookie string.
 set req.http.Cookie = ";" + req.http.Cookie;
 set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";"); 
 set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|SSESS[a-z0-9]+|NO_CACHE)=", "; \1=");
 set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
 set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
 
 if (req.http.Cookie == "") {
 # If there are no remaining cookies, remove the cookie header. If there
 # aren't any cookie headers, Varnish's default behavior will be to cache
 # the page.
 unset req.http.Cookie;
 }
 else {
 # If there is any cookies left (a session or NO_CACHE cookie), do not
 # cache the page. Pass it on to Apache directly.
 return (pass);
 }
 }

 if (req.http.Cache-Control ~ "(?i)no-cache") {
 #if (req.http.Cache-Control ~ "(?i)no-cache" && client.ip ~ editors) { # create the acl editors if you want to restrict the Ctrl-F5
 # http://varnish.projects.linpro.no/wiki/VCLExampleEnableForceRefresh
 # Ignore requests via proxy caches and badly behaved crawlers
 # like msnbot that send no-cache with every request.
 if (! (req.http.Via || req.http.User-Agent ~ "(?i)bot" || req.http.X-Purge)) {
 #set req.hash_always_miss = true; # Doesn't seems to refresh the object in the cache
 return(purge); # Couple this with restart in vcl_purge and X-Purge header to avoid loops
 }
 }

 # Large static files are delivered directly to the end-user without
 # waiting for Varnish to fully read the file first.
 # Varnish 4 fully supports Streaming, so set do_stream in vcl_backend_response()
 if (req.url ~ "^[^?]*\.(7z|avi|bz2|flac|flv|gz|mka|mkv|mov|mp3|mp4|mpeg|mpg|ogg|ogm|opus|rar|tar|tgz|tbz|txz|wav|webm|xz|zip)(\?.*)?$") {
 unset req.http.Cookie;
 return (hash);
 }

 # Remove all cookies for static files
 # A valid discussion could be held on this line: do you really need to cache static files that don't cause load? Only if you have memory left.
 # Sure, there's disk I/O, but chances are your OS will already have these files in their buffers (thus memory).
 # Before you blindly enable this, have a read here: https://ma.ttias.be/stop-caching-static-files/
 if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|otf|ogg|ogm|opus|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
 unset req.http.Cookie;
 return (hash);
 }

 # Send Surrogate-Capability headers to announce ESI support to backend
 set req.http.Surrogate-Capability = "key=ESI/1.0";

 if (req.http.Authorization) {
 # Not cacheable by default
 return (pass);
 }

 return (hash);
}

sub vcl_pipe {
 # Called upon entering pipe mode.
 # In this mode, the request is passed on to the backend, and any further data from both the client
 # and backend is passed on unaltered until either end closes the connection. Basically, Varnish will
 # degrade into a simple TCP proxy, shuffling bytes back and forth. For a connection in pipe mode,
 # no other VCL subroutine will ever get called after vcl_pipe.

 # Note that only the first request to the backend will have
 # X-Forwarded-For set. If you use X-Forwarded-For and want to
 # have it set for all requests, make sure to have:
 # set bereq.http.connection = "close";
 # here. It is not set by default as it might break some broken web
 # applications, like IIS with NTLM authentication.

 # set bereq.http.Connection = "Close";

 # Implementing websocket support (https://www.varnish-cache.org/docs/4.0/users-guide/vcl-example-websockets.html)
 if (req.http.upgrade) {
 set bereq.http.upgrade = req.http.upgrade;
 }

 return (pipe);
}

sub vcl_pass {
 # Called upon entering pass mode. In this mode, the request is passed on to the backend, and the
 # backend's response is passed on to the client, but is not entered into the cache. Subsequent
 # requests submitted over the same client connection are handled normally.

 # return (pass);
}

# The data on which the hashing will take place
sub vcl_hash {
 # Called after vcl_recv to create a hash value for the request. This is used as a key
 # to look up the object in Varnish.

 hash_data(req.url);

 if (req.http.host) {
 hash_data(req.http.host);
 } else {
 hash_data(server.ip);
 }

 # hash cookies for requests that have them
 if (req.http.Cookie) {
 hash_data(req.http.Cookie);
 }
}

sub vcl_hit {
 # Called when a cache lookup is successful.

 if (obj.ttl >= 0s) {
 # A pure unadultered hit, deliver it
 return (deliver);
 }

 # https://www.varnish-cache.org/docs/trunk/users-guide/vcl-grace.html
 # When several clients are requesting the same page Varnish will send one request to the backend and place the others on hold while fetching one copy from the backend. In some products this is called request coalescing and Varnish does this automatically.
 # If you are serving thousands of hits per second the queue of waiting requests can get huge. There are two potential problems - one is a thundering herd problem - suddenly releasing a thousand threads to serve content might send the load sky high. Secondly - nobody likes to wait. To deal with this we can instruct Varnish to keep the objects in cache beyond their TTL and to serve the waiting requests somewhat stale content.

# if (!std.healthy(req.backend_hint) && (obj.ttl + obj.grace > 0s)) {
# return (deliver);
# } else {
# return (miss);
# }

 # We have no fresh fish. Lets look at the stale ones.
 if (std.healthy(req.backend_hint)) {
 # Backend is healthy. Limit age to 10s.
 if (obj.ttl + 10s > 0s) {
 #set req.http.grace = "normal(limited)";
 return (deliver);
 } else {
 # No candidate for grace. Fetch a fresh object.
 return(miss);
 }
 } else {
 # backend is sick - use full grace
 if (obj.ttl + obj.grace > 0s) {
 #set req.http.grace = "full";
 return (deliver);
 } else {
 # no graced object.
 return (miss);
 }
 }

 # fetch & deliver once we get the result
 return (miss); # Dead code, keep as a safeguard
}

sub vcl_miss {
 # Called after a cache lookup if the requested document was not found in the cache. Its purpose
 # is to decide whether or not to attempt to retrieve the document from the backend, and which
 # backend to use.

 return (fetch);
}

# Handle the HTTP request coming from our backend
sub vcl_backend_response {
 # Called after the response headers has been successfully retrieved from the backend.
 # set beresp.http.X-Backend = beresp.backend.name;

 # Pause ESI request and remove Surrogate-Control header
 if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
 unset beresp.http.Surrogate-Control;
 set beresp.do_esi = true;
 }

 # Enable cache for all static files
 # The same argument as the static caches from above: monitor your cache size, if you get data nuked out of it, consider giving up the static file cache.
 # Before you blindly enable this, have a read here: https://ma.ttias.be/stop-caching-static-files/
 if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|otf|ogg|ogm|opus|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
 unset beresp.http.set-cookie;
 }

 # Large static files are delivered directly to the end-user without
 # waiting for Varnish to fully read the file first.
 # Varnish 4 fully supports Streaming, so use streaming here to avoid locking.
 if (bereq.url ~ "^[^?]*\.(7z|avi|bz2|flac|flv|gz|mka|mkv|mov|mp3|mp4|mpeg|mpg|ogg|ogm|opus|rar|tar|tgz|tbz|txz|wav|webm|xz|zip|csv)(\?.*)?$") {
 unset beresp.http.set-cookie;
 set beresp.do_stream = true; # Check memory usage it'll grow in fetch_chunksize blocks (128k by default) if the backend doesn't send a Content-Length header, so only enable it for big objects
 set beresp.do_gzip = false; # Don't try to compress it for storage
 }

 # Sometimes, a 301 or 302 redirect formed via Apache's mod_rewrite can mess with the HTTP port that is being passed along.
 # This often happens with simple rewrite rules in a scenario where Varnish runs on :80 and Apache on :8080 on the same box.
 # A redirect can then often redirect the end-user to a URL on :8080, where it should be :80.
 # This may need finetuning on your setup.
 #
 # To prevent accidental replace, we only filter the 301/302 redirects for now.
 if (beresp.status == 301 || beresp.status == 302) {
 set beresp.http.Location = regsub(beresp.http.Location, ":[0-9]+", "");
 }

 # Set 2min cache if unset for static files
 if (beresp.ttl <= 0s || beresp.http.Set-Cookie || beresp.http.Vary == "\*") {
 set beresp.ttl = 120s; # Important, you shouldn't rely on this, SET YOUR HEADERS in the backend
 set beresp.uncacheable = true;
 return (deliver);
 }

 # Don't cache 50x responses
 if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503 || beresp.status == 504) {
 return (abandon);
 }

 # Allow stale content, in case the backend goes down.
 # make Varnish keep all objects for 6 hours beyond their TTL
 set beresp.grace = 6h;

 return (deliver);
}

# The routine when we deliver the HTTP request to the user
# Last chance to modify headers that are sent to the client
sub vcl_deliver {
 # Called before a cached object is delivered to the client.

 if (obj.hits > 0) { # Add debug header to see if it's a HIT/MISS and the number of hits, disable when not needed
 set resp.http.X-Cache = "HIT";
 } else {
 set resp.http.X-Cache = "MISS";
 }

 # Please note that obj.hits behaviour changed in 4.0, now it counts per objecthead, not per object
 # and obj.hits may not be reset in some cases where bans are in use. See bug 1492 for details.
 # So take hits with a grain of salt
 set resp.http.X-Cache-Hits = obj.hits;

 # Remove some headers: PHP version
 unset resp.http.X-Powered-By;

 # Remove some headers: Apache version & OS
 unset resp.http.Server;
 unset resp.http.X-Drupal-Cache;
 unset resp.http.X-Varnish;
 unset resp.http.Via;
 unset resp.http.Link;
 unset resp.http.X-Generator;

 return (deliver);
}

sub vcl_purge {
 # Only handle actual PURGE HTTP methods, everything else is discarded
 if (req.method != "PURGE") {
 # restart request
 set req.http.X-Purge = "Yes";
 return(restart);
 }
}

sub vcl_synth {
 if (resp.status == 720) {
 # We use this special error status 720 to force redirects with 301 (permanent) redirects
 # To use this, call the following from anywhere in vcl_recv: return (synth(720, "http://host/new.html"));
 set resp.http.Location = resp.reason;
 set resp.status = 301;
 return (deliver);
 } elseif (resp.status == 721) {
 # And we use error status 721 to force redirects with a 302 (temporary) redirect
 # To use this, call the following from anywhere in vcl_recv: return (synth(721, "http://host/new.html"));
 set resp.http.Location = resp.reason;
 set resp.status = 302;
 return (deliver);
 }

 return (deliver);
}


sub vcl_fini {
 # Called when VCL is discarded only after all requests have exited the VCL.
 # Typically used to clean up VMODs.

 return (ok);
}
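The cookie-stripping chain in vcl_recv above can be sanity-checked outside Varnish. Here's a rough Python translation of those five regsuball steps (assuming Python's re and Varnish's regex semantics line up for these particular patterns):

```python
import re

def strip_cookies(cookie_header):
    """Keep only Drupal session (SESS*/SSESS*) and NO_CACHE cookies."""
    c = ";" + cookie_header
    c = re.sub(r"; +", ";", c)                  # remove spaces after semicolons
    c = re.sub(r";(SESS[a-z0-9]+|SSESS[a-z0-9]+|NO_CACHE)=", r"; \1=", c)  # mark keepers
    c = re.sub(r";[^ ][^;]*", "", c)            # drop everything unmarked
    return re.sub(r"^[; ]+|[; ]+$", "", c)      # trim leading/trailing separators
```

An empty result is the case where Varnish unsets the header and caches; anything left over (a session cookie) is what triggers the pass.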

#varnish

So I gave a presentation at Drupaldelphia a few weeks ago about the Paragraphs module.

The Paragraphs module is my favorite Drupal module that I've come across in probably the last 5 years. It's basically Drupal's implementation of the concept of “structured content” – one of those terms that sounds so abstract that you probably feel an unconscious repulsion to even learning more about the idea, but hopefully I can help get you over that.


The problem

The problem is the dreaded *body field*. The body field is (historically) basically the dumping ground for everything that is going into a piece of content on the website. For sites like this blog, made up of 99.8% text, it works fabulously well and I suspect that in the early days of blogging and the internet most content that went into some kind of CMS was modeled in this way. You're reading a body field right now. There were undoubtedly some images placed in with the text, but anything really fancy or custom was most likely coded by hand, outside of the CMS.

Things went this way for a number of years and as CMSs like Drupal and Wordpress continued to gain popularity and more and more people began to use them to run their websites, more and more “things” began to wander into the body field. I'd very much like to add some images to this post, for example, but it's actually kind of a PITA to do it in a reliable way.

One day some dudes invented a website where the whole world could post and share videos, then they let you embed those videos into other web pages. So now the body field has to accommodate text, images, and video embeds.

The slideshow was born. “Why can't I put a slideshow into my article?!” became a battle cry from legions of downtrodden District 12 editors. “Imgur lets me create slideshows!”

“Data journalism” comes along, and with it a thousand fancy infographics from your internal production teams and 3rd party tools alike, distributed via iframes and js snippets and holy shit letting our users embed javascript is suicide, right??

The Twitter card embed. I'll stop there.

Soundcloud. Every other media site with their own custom video player. Imgur. Flickr. Hubspot. Disqus.

Some (crappy) solutions

This is a problem for a number of reasons. The most immediate issue that this causes is that unless all your editors know how to write perfect HTML, you're going to be stuck with The Wysiwyg. Wysiwygs have come a pretty long way in the last couple years (a few of them anyway), but I don't know of any serious Wysiwyg solution out there that is able to keep pace with the number of new “things” showing up on the internet. Our editors want to put these things in their content in a way that will effectively keep them from breaking the site, and it's our job to give that to them somehow.

The most evolved solution to this problem is the one that Wordpress came up with, and Jeff Eaton espoused last year in his DrupalCon Talk “The Battle for the Body Field”, basically – shortcodes. This approach allows for a lot of editor creativity which should be a primary goal of our solution, but puts some guardrails up so that we're not constantly fielding tickets about a broken article.
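For readers who haven't seen them, shortcodes are little bracketed tokens that the CMS expands into real markup at render time. These two are WordPress's own caption and gallery shortcodes, though the attribute values here are made up for illustration:

```
[caption width="300"]<img src="cat.jpg" /> A cat, captioned[/caption]

[gallery ids="12,34,56"]
```

The editor never touches the figure or slideshow markup; the CMS owns it, which is exactly the guardrail being described.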

So to recap, here are the most commonly employed solutions to this problem:

  • Don't let them put anything in there (Markdown)
  • Let them put everything in there (HTML)
  • Let them put almost anything in there, but try and keep them from blowing our leg off (shortcodes)

And yet

None of these addresses a fundamental thing that we should care about – reuse. Once you put something in the body field, it's essentially in the content roach motel, and it's never checking out. Your system can't have any awareness of what's inside that field, so unless someone manages to get to exactly that article where you used that image or that tweet, it's never going to be seen again.

There is another way though. Imagine being able to create a feed of images that were used in articles on your site that day. Imagine being able to grab all the twitter cards that were used in articles that were tagged to Cats. Or being able to easily add rich, multi-field captions to images without having to bend over backwards.

Structured Content

So if you take a step back and think about it, a piece of content on your website is often a fairly unstructured piece of work, but it can be broken down into a collection of pieces that are themselves very structured.

Take an image with caption. Trying to do this in the Wysiwyg frequently involves adding the caption to either the title or alt attribute and then using javascript to pull that out, build a DOM element out of it, and insert it somewhere in the vicinity of the image. What happens if you also need an attribution field in addition to the caption, though? That's the instant things start getting weird, and often we give the editor some unsatisfactory answer and they slink off to solve the issue in some unsatisfactory way.

But really, what if we treat that image/caption as its own entity? Then you have an entity with an image field and a caption field. If you want to add an attribution field, that's very easy in this model – you just add an attribution field. Or a URL. Or a date.

Something with a few more moving parts – how about that image gallery? Well, another entity for starters, but make it so you can add any number of images to the entity and presto. Since our system is aware of the kind of entity that you're using here, it's trivial to wrap it in the CSS classes needed to pull off an image gallery.

So essentially, rather than your content being something like Title/Summary/Body/Image for this/Image for that you end up with something more like Title/Summary/Collection of individual entities that make up the body of the article. Those individual entities are pretty easy to manage in themselves, since they're highly predictable. You just need some mechanism for relating them into the article that they live in and making sure they display in the right order. Once you do that though, you're not bound strictly by the article model anymore. You can use those entities in other ways as well.
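To make that concrete, here's a minimal sketch in plain PHP – not Drupal API code, and every class and field name here is hypothetical – of an article built as Title/Summary plus an ordered collection of structured components:

```php
<?php

// Hypothetical sketch: each "thing" that used to be dumped into the
// body field becomes its own small entity with predictable fields.
class CaptionedImage {
  public $path;
  public $caption;
  public $attribution; // need attribution too? just add a field

  public function __construct($path, $caption, $attribution = '') {
    $this->path = $path;
    $this->caption = $caption;
    $this->attribution = $attribution;
  }
}

// A gallery is just an entity that holds any number of images.
class Gallery {
  public $images;

  public function __construct(array $images) {
    $this->images = $images;
  }
}

// The article is Title/Summary plus an *ordered* list of components.
class Article {
  public $title;
  public $summary;
  public $components;

  public function __construct($title, $summary, array $components = []) {
    $this->title = $title;
    $this->summary = $summary;
    $this->components = $components;
  }
}

$article = new Article(
  'A structured article',
  'A body built from reusable pieces.',
  [
    new CaptionedImage('cat.jpg', 'A cat', 'Photo: someone'),
    new Gallery([
      new CaptionedImage('a.jpg', 'First'),
      new CaptionedImage('b.jpg', 'Second'),
    ]),
  ]
);

// Because the system knows what each component is, reuse is a simple
// filter instead of scraping raw HTML out of a body field.
$standalone_images = array_filter($article->components, function ($c) {
  return $c instanceof CaptionedImage;
});
```

The payoff is in that last filter: the same components could just as easily feed an image feed or an “all tweets tagged Cats” listing, because they were never flattened into markup.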

This article got waaaay longer than I intended, so I'll get into Drupal's answer to this issue in the next one. As far as I know, this concept has existed in the CMS world for a very long time, but Drupal is the only platform that I know of that actually has an implementation of this concept in the Paragraphs module. Until then.

#workflow #drupal

See the previous chapter on installing Drupal.


Hi there, and congrats on making it this far. You should be looking at a screen that looks like this —

Drupal Welcome Screen

Congrats, you've just set up a website with arguably the most advanced CMS in the world!


Creating Content

This is what a Content Management System is for after all. If you followed the “standard install”, then you have a couple of different choices. In Drupal parlance, these are called “Content Types” and you have two of them so far – Basic Page and Article.

So what are content types?

I like to think of content types as the “things” on your website. These can be articles on a blog site, items in your catalog on an e-commerce site, pets for adoption on the local SPCA site, or really anything that you want to post on your site. In Drupal parlance we call them “Nodes”. Node was chosen for its deliberate vagueness; just go with it for now. Those different types of content that you want to put on your site – blog posts, pets – are appropriately known as “content types”. You can make as many of them as you want; they're just one way to categorize your content into the same kind of “thing”.

So let's create a new article. Feel free to explore the admin menu, which should be visible at this point, or you can just head to http://localhost:8888/node/add. That's where we get started. I'm assuming no prior web development experience, but I am assuming that you've probably at least uploaded an image or two, and filled in some forms on the web before. That's all you have to do here.

Drupal 8's got some nice new options with some nice new polish for folks who are creating the content, but we're not going to get into that now. Let's get our hands dirty.

The Drupalnoobs conference

This conference will be for newcomers to Drupal and will feature lots of sessions about all aspects of getting started with Drupal. The sessions will be led by experienced and established Drupal developers who will present some pretty awesome material. The sessions will be recorded and put up on YouTube somewhere, so after the conference is done we'd like to put the videos up on the website. We'd like to group all the sessions into different tracks like “design” and “business” and “completely new to all of this”.

Naturally, we need a website to hold all of this stuff, so that's what we're going to build, and hopefully we'll come out on the other side with a bit more grounding in how to get things done with Drupal.

#drupal

See the previous post in this series on getting started with Drupal


So welcome back, this is actually the most challenging part of this tutorial – installing Drupal. I'm assuming no prior web development experience, so the first part will be installing something to run your tutorial Drupal site on.

For this we'll be using a project called MAMP. MAMP is a package of software that makes it easy to set up Drupal (but not just Drupal) on your computer. I'll skip the deeper details for now, but head over to MAMP and download it. You can stick with the free version for now.

note: There are many basically equivalent packages out there for this purpose – XAMPP, Acquia Dev Desktop. If you'd prefer one of those feel free, but I'll be using MAMP because it's very simple and is what I got started with.

What is MAMP?

This section can be safely skipped if you don't care.

MAMP is basically the same package of software that runs on any webhosting server, think Dreamhost or GoDaddy. It stands for (M)ac, (A)pache, (M)ySQL, and (P)HP. The last three are the things that you need to make Drupal work.

Apache is a “webserver”. The webserver piece is the one that sits there and listens for incoming requests from someone's browser (or bot). When a request comes in from somewhere, Apache figures out what that request is asking for and routes it to the correct item. It could be an image, which is very simple because it's just a file sitting on a disk. In this case, Apache grabs the file and returns it. This is called the “Request/Response” cycle, and is pretty much the slab that the internet was built on.

Sometimes, however, the request is asking for a webpage in Drupal. This case is a bit more complex, and that's when PHP and MySQL come into play. PHP is a programming language. You might know that already, but Drupal is mostly written in PHP. That's why you need it for this tutorial. When you create a new page on your new site, the content of that page gets stored not as a file on a disk, but as a row in a database. MySQL is an exceptionally popular database, and the one that we'll use for this tutorial.

On to the show

Hopefully MAMP has finished downloading by this point. Go through the installer, and when you get done you should be able to open MAMP in the same way that you'd open any other application. Once it starts up, you should have a screen that looks basically like this —

Mamp opening screen

Once you click “Start Servers” you've done it! You've built your first webserver stack!

Click into “Preferences”.

Under the preferences option, you'll get some settings to twiddle with. Don't twiddle with them, at least not yet. Option names will vary, but you're looking for the rightmost tab; it's either called “Apache” or “Webserver” in the most recent versions. Under that tab will be a most pertinent piece of information – the “Document Root”.

[!info] Document Root – where the webserver will look for the files that it's trying to serve.

In a nutshell, once we download Drupal, we're going to put all its files in that directory, so make a note of where that directory is!

Downloading Drupal!

Head on over to Drupal.org and download Drupal! At the time of this writing, that giant green button takes you to another screen where you are presented with a choice. I was hoping to shield my readers from this, but if you're going to learn Drupal I guess now's as good a time as any to explain why this choice exists at all.

You may skip all of this.


An interlude

Drupal has been around for nigh 12 years at this point. It was started in a Belgian kid's dorm room as more or less a message board for that dorm. Early in life it embraced the open source model for development, which means that other kids in his dorm were able to hack on it and add to it and improve on it and make it better for everybody.

Many years later Drupal was running some of the largest websites on the internet, and while it had been added on to and improved by thousands of developers by that point you could still find some of that 12 year old dorm code if you looked in the right places. Many people, your author included, felt disbelief that such code could be responsible for so much, yet at the same time took great comfort and pride that really anyone could learn this stuff just by following this code around. There truly was nothing really fancy about Drupal's codebase for a great many years. A few really smart patterns up front, followed diligently for years, and the rest is early internet history.

But time marches on, and with it evolution. Standards in computer engineering, common patterns for solving common problems, and much more complex needs on the web necessitated engaging with the wider PHP ecosystem. After all, Easter Island was once a thriving community, yet in time it thrived itself right out of existence. Drupal wanted to avoid such a fate, so a decision was made in 2011 to replace some key pieces of Drupal's internal code with more modern code from a well known PHP framework – Symfony.

This made a heck of a lot of sense. Much of Drupal's aforementioned dorm code had very interesting, almost paleological qualities about the way that it solved problems, as if “this was how our ancestors built a fire before we had matches”, and newcomers to Drupal that *did* have a background in software development were often left scratching their heads at some of the decisions. In a nutshell, learning Drupal was easier for newcomers to web development than it was for established developers.

Thus the rather controversial decision was made to standardize some of the very deepest parts of Drupal – those dealing with the “request/response” cycle.

Thus began a process that took 5 years and involved an almost complete rewrite of Drupal. This is both shocking and obvious in hindsight, since a complete rewrite is something you never, ever, ever want to do with a software project, yet once you modernize a piece of a system, the rest of the system looks that much more archaic.

The good part – Drupal 8 is a modern and really impressive piece of software engineering, and its standard install includes many more of the features you're going to want on your site than previous versions did. It's much more “batteries included” than older versions, which required you to download and install lots of add-ons to get them to do the things you really wanted.

The bad part – much of the code that has been written by folks like you and me over the last decade doesn't work anymore. This is kinda brutal, but such is evolution. It also opens up something of a goldmine for new development opportunities within the Drupal ecosystem, but it also means that learning to code for Drupal 8 will be a much different experience if you are new to building software. It'll require you to know what you're doing, which I most certainly didn't when I was learning Drupal (6).

The other good part – this entire tutorial can be done now with Drupal 8.

So go ahead and download Drupal 8, but once you decide that Drupal is, in fact, for you, you'll probably revisit this topic.

Back to Drupal

So you've downloaded Drupal 8 – unzip it. You'll have a bunch of files and folders that look like this inside the newly unzipped directory -

Downloads/drupal-8.0.5 [ tree -L 1 ] 4:50 PM
.
├── LICENSE.txt
├── README.txt
├── autoload.php
├── composer.json
├── composer.lock
├── core
├── example.gitignore
├── index.php
├── modules
├── profiles
├── robots.txt
├── sites
├── themes
├── update.php
├── vendor
└── web.config

6 directories, 10 files

All those files go in “The Docroot” – which is the path that you noted earlier in your MAMP preferences under Apache/Webserver/whatever. It'll end in htdocs, so something like /Applications/MAMP/htdocs if you're on a Mac, or whatever that screen says if you're not.
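If you prefer the terminal, the move can be done in one command. This assumes the default MAMP docroot on a Mac and the exact download folder shown above; adjust both paths to match your setup:

```shell
# Copy everything (including dotfiles like example.gitignore, thanks
# to the trailing /.) from the unzipped folder into the MAMP docroot.
cp -R ~/Downloads/drupal-8.0.5/. /Applications/MAMP/htdocs/

# Sanity check: index.php should now be sitting in the docroot.
ls /Applications/MAMP/htdocs/index.php
```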

The big payoff

Something always goes funny with people's computers, but at this point you should be able to navigate your browser to localhost:8888 and be greeted with the Drupal installation screen.

Drupal 8 install screen

We're going to be choosing all the defaults for this tutorial. Click through the language selection; the next option is for “installation profile”, so just choose Standard.

The next screen – “System Requirements” – is the tricky one. Ask below in the comments and we'll try to debug it together if you aren't allowed through. MAMP should have all this sorted out for you already, though, so soldier on.

The next and basically final step is to give Drupal the connection credentials to your MySQL database. Those can be found on the welcome webpage if you click that middle button in MAMP. That'll take you to a screen that tells you for sure, but it should be something like

user: root
pass: root
host: localhost
(open up the advanced options)
port: 8889
(leave the table prefix empty)
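If you'd like to confirm those credentials before handing them to the installer, you can do it from the terminal. This assumes MAMP's default credentials and port from above, plus the path where MAMP typically keeps its bundled MySQL client; any of these may differ on your machine:

```shell
# Connect using MAMP's defaults; if this prints a 1, Drupal's
# installer will be able to connect with the same credentials.
/Applications/MAMP/Library/bin/mysql -u root -proot \
  --host=127.0.0.1 --port=8889 -e 'SELECT 1;'
```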

At this point, you're in. You've installed Drupal. There is one more configuration screen that you can plug all the answers into on your own.

Save and continue on to the fun part of the tutorial!

#drupal