drupal

Migrating the Drupal public filesystem to S3

June 15, 2016

Problemspace

You're working with a managed hosting provider and have begun to run out of room on the local/networked filesystem. The vendor wants to upsize your storage, for a charge.
You've got several different environments that you're working with (local|dev|prod|etc) and syncing the filesystem between them is an annoying chore in 2016. The various methods out there of proxying these requests don't really excite you.

Solutionspace

How about moving the Drupal filesystem up to The Cloud? One of Amazon's earliest products in AWS was the Simple Storage Service (S3), and one of its core use cases is serving public files like images for websites. That removes the need to store those assets yourself, as well as the (admittedly minimal) compute resources to serve them.

We had both of the issues outlined above and have just completed a migration of all our files up to S3, so I thought I'd write down some discoveries.

s3fs module

This is a rather unfortunately named module, since there is another open source project out there with the exact same name. I knew of it first and so assumed that this module had something to do with that, but that's not the case. Btw, I learned this in Steven Merrill's excellent session at DrupalCon in May, check out the video if you're still with me.

In a nutshell, s3fs hijacks the Drupal filesystem. You can put both your public and private filesystems up there, which is simple to do since S3 has a very rich permissions feature set. Just don't deliberately make your private filesystem “public” and you're set. It then rewrites any URLs that would've pointed to assets on your local filesystem so they point to their new location in S3.

The setup is pretty straightforward, so just a few observations.

For multisite, you need to override a few things, namely the default setting for “S3 root folder”. For our install we needed to separate each site's assets into site-specific folders within the same S3 bucket, so we filled that setting in with a string unique to the site, something like “nameofsite.com”.
There are UI buttons for moving your local files up to S3, but the AWS CLI works *WAY* faster. There are a wealth of well-documented options to pass, but the gist is this —

aws s3 sync . s3://NAME_OF_BUCKET/nameofsite.com/s3fs-public/ --acl public-read

After moving the files up to S3, the module needs to be made aware of what files exist up there, so you'll need to refresh the file cache. If you forget to do this part and flip the switch to have s3fs take over the public filesystem, you'll see bad things.
With a multisite setup, I found it much easier to flip the switch that says “Do *not* rewrite JS/CSS URLs”. The downside of this is that I have to make sure that random assets in the Drupal file tree (i.e. not within public://) also exist in S3, since so many CSS and JS files refer to assets by root-relative paths. This is a hack, but that's life sometimes.
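To make the multisite layout concrete, here's a small Python sketch of how a public:// file lines up with an object key in the bucket, assuming the layout from the sync command above: the site's “S3 root folder”, then s3fs-public/, then the file's path. The helper name and the second site name are made up for illustration; neither is part of the s3fs module.

```python
# Hypothetical helper (not part of s3fs): maps a Drupal public:// URI to the
# S3 object key produced by the `aws s3 sync` command above.
PUBLIC_PREFIX = "s3fs-public"


def s3_key_for(root_folder: str, uri: str) -> str:
    """Return the S3 key for a public:// URI under a site's root folder."""
    scheme = "public://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a public:// URI: {uri!r}")
    relative_path = uri[len(scheme):].lstrip("/")
    # Each site gets its own root folder inside the shared bucket, so two
    # sites can both have images/logo.png without colliding.
    parts = [root_folder.strip("/"), PUBLIC_PREFIX, relative_path]
    return "/".join(part for part in parts if part)


if __name__ == "__main__":
    for site in ("nameofsite.com", "othersite.com"):
        print(s3_key_for(site, "public://images/logo.png"))
```

Running it prints nameofsite.com/s3fs-public/images/logo.png and othersite.com/s3fs-public/images/logo.png: same file name, separate folders, one bucket.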

// from Drupal docroot
$ rsync -av --prune-empty-dirs --include='*/' \
    --include='*.jpg' --include='*.png' --include='*.svg' \
    --include='*.js' --include='*.css' --include='*.gif' \
    --include='*.woff' --include='*.ttf' --include='*.map' \
    --exclude='*' . ~/some/destination/dir

This says “gimme all those file types in the whole Drupal file tree and move them over to some other dir” that I can then use AWS CLI to sync up to S3. You should take the opportunity before running this to delete all your local public:// files, because they'll get sucked up in this command as well. You won't need them anymore after you do this migration anyway.
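If rsync's include/exclude ordering is hard to reason about, here's roughly what that command does, as a Python sketch using only the standard library. It copies files with the listed extensions into a destination while keeping their relative paths, and it only creates a directory when a file actually lands in it, which is what --prune-empty-dirs buys you. The extension set mirrors the command; the function name is made up.

```python
import shutil
from pathlib import Path
from typing import List

# Same extensions as the --include patterns in the rsync command.
ASSET_EXTENSIONS = {".jpg", ".png", ".svg", ".js", ".css", ".gif",
                    ".woff", ".ttf", ".map"}


def collect_assets(docroot: Path, destination: Path) -> List[Path]:
    """Copy matching assets from docroot to destination, preserving paths.

    Directories are only created when a matching file is copied into them,
    so no empty directories appear (like rsync's --prune-empty-dirs).
    Returns the copied paths relative to docroot.
    """
    copied = []
    # sorted() materializes the listing before any copying starts.
    for source in sorted(docroot.rglob("*")):
        # Suffix match is case-sensitive, like rsync's glob patterns.
        if not source.is_file() or source.suffix not in ASSET_EXTENSIONS:
            continue
        relative = source.relative_to(docroot)
        target = destination / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(relative)
    return copied
```

Either way, the result is a directory tree you can hand to aws s3 sync.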

// from ~/some/destination/dir
$ aws s3 sync . s3://NAME_OF_BUCKET --acl public-read

All in all fairly simple, and in theory makes our setup much more portable between environments as well as vendors. Another excellent writeup of this module can be found here.

#drupal

Programmatically creating taxonomy terms in Drupal

June 15, 2016

Because sometimes you need to roll out a bunch of taxonomy terms across 26 sites, and you just don't feel like clicking those buttons.


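<?php

// Names of the terms to create.
$terms = [
  'iReport',
  'Infographic',
  'Video',
  'Case Study',
  'Application Note',
  'Data Sheet',
];

// Swap in the machine name of the target vocabulary.
$vocab = taxonomy_vocabulary_machine_name_load('vocab_machine_name');

foreach ($terms as $term) {
  // taxonomy_term_save() takes a term object with at least a name and a vid.
  $t = new stdClass();
  $t->name = $term;
  $t->vid = $vocab->vid;
  taxonomy_term_save($t);
}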

Save this to something like create_terms.php and then run it with drush!

$ drush @site scr path/to/create_terms.php

#drupal

Programmatically creating image styles in Drupal

June 7, 2016

Because sometimes you need to roll out an image style across 26 websites, and dammit you just don't feel like dealing with Features.

/**
 * Adds the mobile_content_image image style.
 *
 * @param $sandbox
 * @return bool
 */
function hook_update_N(&$sandbox) {
  $style = image_style_load('mobile_content_image');
  // Only create the style if this site doesn't have it yet.
  if (!$style) {
    $style = image_style_save([
      'name' => 'mobile_content_image',
      'label' => 'Mobile Content Image (500 x 250)',
    ]);
    $effect = [
      'name' => 'image_scale_and_crop',
      'data' => [
        'width' => '500',
        'height' => '250',
      ],
      // image_style_save() returns the saved style, including its new isid.
      'isid' => $style['isid'],
    ];
    image_effect_save($effect);
  }
  return TRUE;
}

#drupal

Drupal Paragraphs – what is structured content?

April 26, 2016

So I gave a presentation at Drupaldelphia a few weeks ago about the Paragraphs module.

The Paragraphs module is my favorite Drupal module that I've come across in probably the last 5 years. It's basically Drupal's implementation of the concept of “structured content” – one of those terms that sounds so abstract that you probably feel an unconscious repulsion to even learning more about the idea, but hopefully I can help get you over that.

The problem

The problem is the dreaded *body field*. The body field is (historically) basically the dumping ground for everything that is going into a piece of content on the website. For sites like this blog, made up of 99.8% text, it works fabulously well and I suspect that in the early days of blogging and the internet most content that went into some kind of CMS was modeled in this way. You're reading a body field right now. There were undoubtedly some images placed in with the text, but anything really fancy or custom was most likely coded by hand, outside of the CMS.

Things went this way for a number of years and as CMSs like Drupal and Wordpress continued to gain popularity and more and more people began to use them to run their websites, more and more “things” began to wander into the body field. I'd very much like to add some images to this post, for example, but it's actually kind of a PITA to do it in a reliable way.

One day some dudes invented a website where the whole world could post and share videos, then they let you embed those videos into other web pages. So now the body field has to accommodate text, images, and video embeds.

The slideshow was born. “Why can't I put a slideshow into my article?!” became a battle cry from legions of downtrodden District 12 editors. “Imgur lets me create slideshows!”

“Data journalism” comes along, and with it a thousand fancy infographics from your internal production teams and 3rd party tools alike, distributed via iframes and js snippets and holy shit letting our users embed javascript is suicide, right??

The Twitter card embed. I'll stop there.

Soundcloud. Every other media site with their own custom video player. Imgur. Flickr. Hubspot. Disqus.

Some (crappy) solutions

This is a problem for a number of reasons. The most immediate issue that this causes is that unless all your editors know how to write perfect HTML, you're going to be stuck with The Wysiwyg. Wysiwygs have come a pretty long way in the last couple years (a few of them anyway), but I don't know of any serious Wysiwyg solution out there that is able to keep pace with the number of new “things” showing up on the internet. Our editors want to put these things in their content in a way that will effectively keep them from breaking the site, and it's our job to give that to them somehow.

The most evolved solution to this problem is the one that Wordpress came up with, and Jeff Eaton espoused last year in his DrupalCon Talk “The Battle for the Body Field”, basically – shortcodes. This approach allows for a lot of editor creativity which should be a primary goal of our solution, but puts some guardrails up so that we're not constantly fielding tickets about a broken article.

So to recap, here are the most commonly employed solutions to this problem:

Don't let them put anything in there (Markdown)
Let them put everything in there (HTML)
Let them put almost anything in there, but try and keep them from blowing our leg off (shortcodes)

And yet

None of these addresses a fundamental thing that we should care about – reuse. Once you put something in the body field, it's essentially in the content roach motel, and it's never checking out. Your system can't have any awareness of what's inside that field, so unless someone manages to get to exactly that article where you used that image or that tweet, it's never going to be seen again.

There is another way though. Imagine being able to create a feed of images that were used in articles on your site that day. Imagine being able to grab all the twitter cards that were used in articles that were tagged to Cats. Or being able to easily add rich, multi-field captions to images without having to bend over backwards.

Structured Content

So if you take a step back and think about it, a piece of content on your website is often a fairly unstructured piece of work, but it can be broken down into a collection of pieces that are themselves very structured.

Take an image with caption. Trying to do this in the Wysiwyg frequently involves adding the caption to either the title or alt attribute and then using javascript to pull that out, build a DOM element out of it, and insert it somewhere in the vicinity of the image. What happens if you also need an attribution field in addition to the caption, though? That's the instant things start getting weird, and often we give the editor some unsatisfactory answer and they slink off to solve the issue in some unsatisfactory way.

But really, what if we treat that image/caption as its own entity? Then you have an entity with an image field and a caption field. If you want to add an attribution field, that's very easy in this model – you just add an attribution field. Or a URL. Or a date.

Something with a few more moving parts – how about that image gallery? Well, another entity for starters, but make it so you can add any number of images to the entity and presto. Since our system is aware of the kind of entity that you're using here, it's trivial to wrap it in the CSS classes needed to pull off an image gallery.

So essentially, rather than your content being something like Title/Summary/Body/Image for this/Image for that you end up with something more like Title/Summary/Collection of individual entities that make up the body of the article. Those individual entities are pretty easy to manage in themselves, since they're highly predictable. You just need some mechanism for relating them into the article that they live in and making sure they display in the right order. Once you do that though, you're not bound strictly by the article model anymore. You can use those entities in other ways as well.
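Here is a minimal Python sketch of that model, just to make the shape concrete: an article holds an ordered list of typed components, and because each component is structured, “every image in this article” or “every tweet in articles tagged Cats” becomes a simple loop rather than HTML scraping. The class and field names are illustrative only, not anything the Paragraphs module defines.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class CaptionedImage:
    src: str
    caption: str
    # Adding attribution is just another field; no Wysiwyg/JS tricks needed.
    attribution: Optional[str] = None


@dataclass
class Gallery:
    images: List[CaptionedImage]


@dataclass
class TweetEmbed:
    url: str


Component = Union[TextBlock, CaptionedImage, Gallery, TweetEmbed]


@dataclass
class Article:
    title: str
    summary: str
    tags: List[str] = field(default_factory=list)
    # Order matters: this list *is* the body, top to bottom.
    body: List[Component] = field(default_factory=list)


def images_in(article: Article) -> List[CaptionedImage]:
    """Every image in an article, including those inside galleries."""
    found: List[CaptionedImage] = []
    for component in article.body:
        if isinstance(component, CaptionedImage):
            found.append(component)
        elif isinstance(component, Gallery):
            found.extend(component.images)
    return found


def tweets_tagged(articles: List[Article], tag: str) -> List[TweetEmbed]:
    """All tweet embeds used in articles carrying the given tag."""
    return [c for a in articles if tag in a.tags
            for c in a.body if isinstance(c, TweetEmbed)]
```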

This article got waaaay longer than I intended, so I'll get into Drupal's answer to this issue in the next one. As far as I know, this concept has existed in the CMS world for a very long time, but Drupal is the only platform that I know of that actually has an implementation of this concept in the Paragraphs module. Until then.

#workflow #drupal

First steps with Drupal – content types

April 7, 2016

See the previous chapter on installing Drupal.

Hi there, and congrats on making it this far. You should be looking at a screen that looks like this —

Drupal Welcome Screen

Congrats, you've just set up a website with arguably the most advanced CMS in the world!

Creating Content

This is what a Content Management System is for, after all. If you followed the “standard install”, then you have a couple of different choices. In Drupal parlance, these are called “Content Types” and you have two of them so far – Basic Page and Article.

So what are content types?

I like to think of content as the “things” on your website. These can be articles on a blog site, items in your catalog on an e-commerce site, pets for adoption on the local SPCA site, or really anything that you want to post on your site. In Drupal parlance we call them “Nodes”. Node was chosen for its deliberate vagueness; just go with it for now. Those different types of content that you want to put on your site – blog posts, pets – are appropriately known as “content types”. You can make as many of them as you want; they're just one way to categorize your content into the same kind of “thing”.

So let's create a new article. Feel free to explore the admin menu, which should be visible at this point, or you can just head to http://localhost:8888/node/add. That's where we get started. I'm assuming no prior web development experience, but I am assuming that you've probably at least uploaded an image or two, and filled in some forms on the web before. That's all you have to do here.

Drupal 8's got some nice new options with some nice new polish for folks who are creating the content, but we're not going to get into that now. Let's get our hands dirty.

The Drupalnoobs conference

This conference will be for newcomers to Drupal and will feature lots of sessions about all aspects of getting started with Drupal. The sessions will be led by experienced and established Drupal developers who will present some pretty awesome material. The sessions will be recorded and put up on YouTube somewhere, so after the conference is done we'd like to put the videos up on the website. We'd like to group all the sessions into different tracks like “design” and “business” and “completely new to all of this”.

Naturally, we need a website to hold all of this stuff so that's what we're going to build and hopefully come out on the other side with a bit more grounding in how to get things done with Drupal.

#drupal

Installing Drupal

March 29, 2016

See the previous post in this series on getting started with Drupal

So welcome back, this is actually the most challenging part of this tutorial – installing Drupal. I'm assuming no prior web development experience, so the first part will be installing something to run your tutorial Drupal site on.

For this we'll be using a project called MAMP. MAMP is a package of software that makes it easy to set up Drupal (but not just Drupal) on your computer. I'll skip the deeper details for now, but head over to MAMP and download it. You can stick with the free version for now.

note: There are many basically equivalent packages out there for this purpose – XAMPP, Acquia Dev Desktop. If you'd prefer one of those feel free, but I'll be using MAMP because it's very simple and is what I got started with.

What is MAMP?

This section can be safely skipped if you don't care.

MAMP is basically the same package of software that runs on any webhosting server, think Dreamhost or GoDaddy. It stands for (M)y (A)pache, (M)ySQL, and (P)HP. These are the three things that you need to make Drupal work.

Apache is a “webserver”. The webserver piece is the one that sits there and listens for incoming requests from someone's browser (or bot). When a request comes in from somewhere, Apache figures out what that request is asking for and routes it to the correct item. It could be an image, which is very simple because it's just a file sitting on a disk. In this case, Apache grabs the file and returns it. This is called “the Request/Response” cycle, and is pretty much the slab that the internet was built on.

Sometimes, however, the request is asking for a webpage in Drupal. This case is a bit more complex, and that's when PHP and MySQL come into play. PHP is a programming language. You might know that already, but Drupal is mostly written in PHP. That's why you need it for this tutorial. When you create a new page on your new site, the content of that page gets stored not as a file on a disk, but as a row in a database. MySQL is an exceptionally popular database, and the one that we'll use for this tutorial.
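Here's that decision as a tiny Python sketch: if the requested path is a real file under the Document Root, send it back as-is; otherwise hand the request to Drupal's index.php, which builds the page from the database. This is roughly what the rewrite rules in Drupal's bundled .htaccess tell Apache to do, but it's a simplified illustration, and the function name and return labels are made up.

```python
from pathlib import Path
from typing import Tuple


def route(docroot: Path, url_path: str) -> Tuple[str, Path]:
    """Decide how a webserver in front of Drupal handles a request path.

    Returns ("static", file) for a real, non-PHP file under docroot, and
    ("php", script) when PHP has to run: either the requested .php file,
    or Drupal's index.php for everything that isn't a file on disk.
    """
    root = docroot.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse paths like /../../etc/passwd that escape the document root.
    inside_root = candidate == root or root in candidate.parents
    if inside_root and candidate.is_file():
        kind = "php" if candidate.suffix == ".php" else "static"
        return kind, candidate
    return "php", root / "index.php"
```

So /sites/default/files/cat.jpg comes straight off the disk, while /node/1 has no file behind it and ends up at index.php.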

On to the show

Hopefully MAMP has finished downloading by this point. Go through the installer, and when you get done you should be able to open MAMP in the same way that you'd open any other application. Once it starts up, you should have a screen that looks basically like this —

Mamp opening screen

Once you click “Start Servers” you've done it! You've built your first webserver stack!

Click into “Preferences”.

Under the preferences option, you'll get some options to twiddle with. Don't twiddle with them, at least not yet. Option names will vary, but you're looking for the rightmost tab, it's either called “Apache” or “Webserver” in the most recent versions. Under that tab will be a most pertinent piece of information – the “Document Root”.

[!info] Document Root – where the webserver will look for the files that it's trying to serve.

In a nutshell, once we download Drupal, we're going to put all its files in that directory, so make a note of where that directory is!

Downloading Drupal!

Head on over to Drupal.org and download Drupal! At the time of this writing, that giant green button takes you to another screen where you are presented with a choice. I was hoping to shield my readers from this, but if you're going to learn Drupal I guess now's as good a time as any to explain why this choice exists at all.

You may skip all of this.

An interlude

Drupal has been around for nigh on 15 years at this point. It was started in a Belgian kid's dorm room as more or less a message board for that dorm. Early in life it embraced the open source model for development, which means that other kids in his dorm were able to hack on it and add to it and improve on it and make it better for everybody.

Many years later Drupal was running some of the largest websites on the internet, and while it had been added on to and improved by thousands of developers by that point you could still find some of that 12 year old dorm code if you looked in the right places. Many people, your author included, felt disbelief that such code could be responsible for so much, yet at the same time took great comfort and pride that really anyone could learn this stuff just by following this code around. There truly was nothing really fancy about Drupal's codebase for a great many years. A few really smart patterns up front, followed diligently for years, and the rest is early internet history.

But time marches on, and with it evolution. Standards in computer engineering, common patterns for solving common problems, and much more complex needs on the web necessitated engaging with the wider PHP ecosystem. After all, Easter Island was once home to a thriving community, yet over time it thrived itself right out of existence. Drupal wanted to avoid such a fate, so a decision was made in 2011 to replace some key pieces of Drupal's internal code with more modern code from a well known PHP framework – Symfony.

This made a heck of a lot of sense. Much of Drupal's aforementioned dorm code had very interesting, almost paleological qualities about the way that it solved problems as if “this was how our ancestors built a fire before we had matches”, and newcomers to Drupal that *did* have a background in software development were often left scratching their heads to some of the decisions. In a nutshell, learning Drupal was easier for newcomers to web development than it was for established developers.

Thus the rather controversial decision was made to standardize some of the very deepest parts of Drupal – those dealing with the “request/response” cycle.

Thus began a process that took 5 years and involved an almost complete rewrite of Drupal. This is both shocking and obvious in hindsight, since a complete rewrite is something you never, ever, ever want to do with a software project, yet once you modernize a piece of a system, the rest of the system looks that much more archaic.

The good part – Drupal 8 is a modern and really impressive piece of software engineering, and includes many more of the features you're going to want on your site in the standard install than previous versions did. It's much more “batteries included” than older versions that required you to download and install lots of add-ons to get it to do the things you really wanted it to do.

The bad part – much of the code that has been written by folks like you and me over the last decade doesn't work anymore. This is kinda brutal, but such is evolution. It also opens up something of a goldmine for new development opportunities within the Drupal ecosystem, but with that comes that learning to code for Drupal 8 will be a much different experience if you are new to building software. It'll require you to know what you're doing, which I most certainly didn't when I was learning Drupal (6).

The other good part – this entire tutorial can be done now with Drupal 8.

So go ahead and download Drupal 8, but once you decide that Drupal is, in fact, for you, you'll probably revisit this topic.

Back to Drupal

So you've downloaded Drupal 8 – unzip it. You'll have a bunch of files and folders that look like this inside the newly unzipped directory -

Downloads/drupal-8.0.5 $ tree -L 1
.
├── LICENSE.txt
├── README.txt
├── autoload.php
├── composer.json
├── composer.lock
├── core
├── example.gitignore
├── index.php
├── modules
├── profiles
├── robots.txt
├── sites
├── themes
├── update.php
├── vendor
└── web.config

6 directories, 10 files

All those files go in “The Docroot” – which is the path that you noted earlier in your MAMP preferences under Apache/Webserver/whatever. It'll end in htdocs, so something like /Applications/MAMP/htdocs if you're on a Mac, or whatever that screen says if you're not.

The big payoff

Something always goes funny with people's computers, but at this point you should be able to navigate your browser to localhost:8888 and be greeted with the Drupal installation screen.

Drupal 8 install screen

We're going to be choosing all the defaults for this tutorial, click through the language and the next option is for “installation profile”, just choose Standard.

The next screen – “System Requirements” – is the tricky one. Ask below in the comments and we'll try to debug it together if you aren't allowed through. MAMP should have all this sorted out for you already, though, so soldier on.

The next and basically final step is to give Drupal the connection credentials to your MySQL database. Those can be found on the welcome webpage if you click that middle button in MAMP. That'll take you to a screen that tells you for sure, but it should be something like

user: root
pass: root
host: localhost
(open up the advanced options)
port: 8889
(leave the table prefix empty)

At this point, you're in. You've installed Drupal. There is one more configuration screen that you can plug all the answers into on your own.

Save and continue on to the fun part of the tutorial!

#drupal

Why Drupal?

March 26, 2016

It's an interesting time to be building things for the web. It's matured to a place where much of our day-to-day lives is conducted through sites that we access online. I personally buy books on paper, diapers, any piece to repair a broken appliance. I search for clues on how to do my job. I read up on the latest state of the technology scene. Every one of these places I go has largely the same stuff on it – some kind of login, some kind of way to get to “my stuff”, some kind of way to make sure that others can't get to my stuff.

Building this stuff is not exactly easy, and to do it in a way that makes it a lot harder for other people to break (or break into) is downright terrifying. So what do we do? We team up. How do we do it? Well, in this day and age it's called “open source software”.

Drupal is just such a piece of software. Right out of the box it comes with all that standard stuff – user accounts, a mechanism for posting stuff online – either publicly or privately, and mechanisms for doing many of the other things that you could think of. Lots of folks use Drupal and have been using Drupal for many years now, which means that for the vast majority of things that you want your site to do someone else has wanted their site to do that too. Even better, they've already solved the problem and given the solution back to us all, so you don't have to go solve it again. This is the essence of open source – folks all over the world working together, and a huge part of why I use Drupal.

What about Tool X?

To be sure, Tool X is excellent. It has a great user community and many of these same problems have been solved in a different way inside it, but it actually requires you to write code, which Drupal doesn't require for you to get started.

Tool W is another project much like Drupal, but it isn't nearly as flexible as Drupal, it stays within a smaller set of lines. It does what it does within those lines excellently, so definitely evaluate if you can solve your case with Tool W before coming to Drupal, as Drupal is a bigger piece of machinery with a steeper learning curve.

Who uses Drupal?

Lots of folks. Right now the best use case for Drupal is for sites that have a lot of content. These include publishers and their sites, government agencies and their sites, and educational institutions and their sites. Lots of times these folks have specific needs for how they present their content to the world and how they allow the world to access it, and Drupal fits that use case like a glove.

Turns out that most things that go on the web aren't that far away from this specific use case, so extending Drupal to get there is often a pretty straightforward and well trodden path. This series hopes to make it even more straightforward for newcomers.

Why not Drupal?

Drupal is not good at everything, however. Don't use it to ingest real time trade data from the NYSE, for example. You'll want a leaner and more purpose built system for that, but chances are if you're building something like that you already know this.

Now, on to installing Drupal.

#drupal

Drupal is uncool because

March 20, 2016

So before the pitchforks come out, this is where this blog post came from.

I need someone to submit “Why is drupal uncool?” to https://t.co/WkZUojaBkS #DrupalCon

— Cathy YesCT (@YesCT) February 22, 2016

YesCT started riffing on great ideas for DrupalCon sessions. Alas, I didn't submit any of them; I submitted my session about how the Paragraphs module got my mojo back. It was not accepted, which is kind of a relief.

Anyway, I have a few ideas on this topic. Let's see what falls out.

It's PHP

So yes, PHP. AFAIK in all of its 20+ year history PHP has never ever been the Hot New Thing. It has things to recommend it, namely the deployment story which enables folks to get started building things immediately and not have to worry about setting up anything on a server. If there were any other option that were this simple, PHP probably would not be the thing that it is today, but for that horse a kingdom has been built.

Just Google “why PHP sucks” for more on this topic. In a nutshell it's maddeningly inconsistent. I've been working with Python for all of 4 weeks and I already know it better than I know PHP in 8 years of working with it. “The API is consistent”, which is a fancy way of saying that you don't have to look up the order of the arguments for a function every single time you want to use that function. Quick, array_push() – is it haystack/needle or needle/haystack?

Drupal suffers from this spillover, as do basically all PHP frameworks. Until D8, Drupal didn't even have the benefit of object orientation (by benefit I mean, it makes it cooler).

It (historically) tries way too hard

Remember this – http://certifiedtorock.com/? Sadly, this is still a thing on the internet.

Or what about this one?

webchick world tour

I'm super, duper sorry for dredging this up, because I have nothing but respect for Webchick. Seems like everything she writes is so dead on and the respect that she has in the community probably surpasses what even Dries gets. But I'm not sure who thought this was a good idea.

In this regard, Drupal is like me at 15 or so. I was not cool, but I went and bought Doc Martens anyway because the cool kids wore em. I think both of these are around the same era – the time that Rails was absolutely the White Hottest New Thing and everyone else was trying to keep up.

The documentation sucks

And this is the real point here. I started working with Django last week. The documentation is complete, organized, and located in one indexed portion of the website. You can download a PDF of the entire thing and it's better than any O'Reilly book you could possibly buy about Django. If you land on a page for an old version of the framework, it lets you know.

The same thing goes for Postgres.

The same thing goes for Symfony.

The same thing goes for Rails.

The same thing goes for React.

These are open source projects that want to be used. They put forth as much effort into the documentation for the things they've built as they put into the things themselves. It's impossible as a Drupal dev to see documentation like this and not feel like “wow, this is a toolkit that takes itself seriously and respects folks who want to use it”. It's impossible as a Drupal dev to see documentation like this and not feel like “WTF is Acquia doing?”

Drupal 8 will not succeed until it has documentation like this. I have contributed to the Drupal 8 User Guide, but good luck finding it. You can find the project page, but the actual documentation is nowhere on the front page results of that search.

[Rather large rant about the JS framework in core discussion redacted.]

#drupal

Why Drupal 8 will fail

March 20, 2016

I started working with Django last week. The documentation is complete, organized, and located in one indexed portion of the website. You can download a PDF of the entire thing and it's better than any O'Reilly book you could possibly buy about Django. If you land on a page for an old version of the framework, it lets you know.

The same thing goes for Postgres.

The same thing goes for Symfony.

The same thing goes for Rails.

The same thing goes for React.

These are tools that want to be used. It's obvious from the onboarding tutorials in each of these that they want to make the process easy for noobs.

Contrast this with Drupal. I had been poking at and trying to figure out Drupal for almost a year (getting actual work done with Wordpress in the meantime) before I picked up a book that finally cleared it up for me. Oh! Drupal isn't supposed to do anything! You have to go module shopping to make it do simple things! And you have to go buy a book to tell you that!

And the situation has only gotten worse now that Acquia has decided to throw away over a decade of community knowledge about how to build Drupal sites. Where's the simple onboarding tutorial in here (?), because I can't find it.

I'm not saying that Drupal 8 is going to fail – god knows it is a ginormous step forward in SO many ways – but if it does it'll be because the Drupal project takes building things far more seriously than it does anything else, especially teaching others how to use those things. The smartest thing that Acquia could do at this point for the future of Drupal would be to put a complete moratorium on any new features until the currently existing features are covered with this level of official documentation.

#drupal

Drupal logstash syslog config

October 4, 2015

This one really took longer than it needed to.

If you're here, hopefully you've already been through this lesson on setting up the full ELK stack with Logstash-Forwarder thrown in to boot. For me it pretty much ran as intended from top to bottom, so hopefully you're already getting data into Elasticsearch and are flummoxed by how every single other logstash config out there to parse your syslog data doesn't seem to do the job and is still just treating it like every other syslog message.

The rest of the steps to configure Drupal/Logstash

Drupal's “syslog facility” setting

This is more or less the key. You have to dig around in Drupal, as well as your webserver, and make sure that Drupal is logging to its own log. By default it'll just go to syslog and then you'll have a hell of a time distinguishing messages from Drupal on the way in.

If you recall your Logstash-Forwarder config, you tagged the syslog watcher with a "type": "syslog" bit. This is really the only info that logstash has at the point that you're setting up your input filters/grok config.

Regardless of Linux flavor, follow this guide to set up the syslog module to point to a logfile of your choosing – https://www.drupal.org/documentation/modules/syslog. I just copied everything in here, so I now have /var/log/drupal.log and it works just fine. The only thing I haven't figured out yet is that now Drupal is logging to both syslog and drupal.log, so somebody tell me how to stop that from happening.
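Why the facility setting matters: every syslog message carries a priority number (PRI) that packs facility and severity together as facility * 8 + severity (RFC 5424), and rsyslog routes on those two parts. That's how a selector like local0.* /var/log/drupal.log can peel Drupal's messages off into their own file, assuming Drupal is set to the local0 facility and your box runs rsyslog. Here's a small Python sketch of that encoding; the tables cover only a few facility codes, and the function names are my own.

```python
from typing import Tuple

# Subset of the facility and severity codes from RFC 5424, section 6.2.1.
FACILITIES = {"kern": 0, "user": 1, "daemon": 3, "auth": 4, "syslog": 5,
              **{f"local{n}": 16 + n for n in range(8)}}
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info",
              "debug"]


def encode_pri(facility: str, severity: str) -> int:
    """PRI value that goes in the <NN> header of a syslog message."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def decode_pri(pri: int) -> Tuple[str, str]:
    """Split a PRI value back into (facility, severity) names."""
    facility_code, severity_code = divmod(pri, 8)
    names = {code: name for name, code in FACILITIES.items()}
    return names[facility_code], SEVERITIES[severity_code]


def goes_to_drupal_log(pri: int) -> bool:
    """Mimic an rsyslog selector like local0.*: any severity on local0."""
    return decode_pri(pri)[0] == "local0"
```

This probably also explains the double logging: a catch-all selector such as *.*;auth,authpriv.none -/var/log/syslog (Ubuntu's default) matches local0 too, so adding local0.none to that line should keep Drupal's messages out of syslog. Worth checking against your distro's rsyslog config.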

New Logstash-forwarder config

You'll just need to a) remove the old syslog watcher from your Logstash-Forwarder (henceforth LF) config and b) tell it to now watch the new drupal.log instead. This took the relevant bits of my LF config from this

{
  "paths": [
    "/var/log/syslog",
    "/var/log/auth.log"
  ],
  "fields": { "type": "syslog" }
}

to this

{
  "paths": [
    "/var/log/drupal.log"
  ],
  "fields": { "type": "drupal" }
}

Don't restart LF just yet, we have to config Logstash to understand what to do with input of “type: drupal” first.

New Logstash config

This is where I wasted most of my time over the last few days. I was under the mistaken impression that I could perform some kind of introspection into the fields that were parsed out and then tell Logstash to do this or that with them. As far as I can tell, you'd need the “Ruby” Logstash filter to do that, and this didn't feel like a complicated enough use case to need it, if I could just figure out the “right” way to do it.

Anyway, you've probably already stumbled across this – https://gist.github.com/Synchro/5917252, and this https://www.drupal.org/node/1761974 both of which, annoyingly, show the same useless config (for me, anyway).

My logs look like this —

Oct 4 08:52:34 690elwb07 drupal: http://www.biosciencetechnology.com|1443963154|php|162.220.4.130|http://www.biosciencetechnology.com/news/2011/04/students-prediction-points-way-hot-dense-super-earth||0||Notice: Trying to get property of non-object in abm_metadata_page_alter() (line 41 of /var/www/cascade/prod/brandsites_multi.com/htdocs/docroot/sites/all/modules/custom/abm_metadata/abm_metadata.module).

The config on that page is presumably looking for a string that begins with “http”, which this clearly does not. Here's the config for this particular sequence.

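filter {
  if [type] == "drupal" {
    grok {
      # The (?<name>...) capture names are assumed; they follow the field order
      # of Drupal's syslog format: base_url|timestamp|type|ip|request_uri|referer|uid|link|message
      match => [ "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: https?://%{HOSTNAME:drupal_vhost}?\|%{NUMBER:drupal_timestamp}\|(?<drupal_type>[^\|]*)\|%{IP:drupal_ip}\|(?<drupal_request_uri>[^\|]*)\|(?<drupal_referer>[^\|]*)\|(?<drupal_uid>[^\|]*)\|(?<drupal_link>[^\|]*)\|(?<drupal_message>.*)" ]
    }
    # date runs after grok, so syslog_timestamp exists by the time it's read.
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}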
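Before restarting anything, it can save a few days to check a pattern against a real line. Here's a rough Python translation of that grok expression, run against the sample line above. The grok macros (SYSLOGTIMESTAMP, SYSLOGHOST, IP and friends) are swapped for simplified regexes, so this is an approximation of what Logstash compiles, not the real thing, and the field names after drupal_timestamp are my own, following Drupal's syslog field order.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# Simplified stand-ins for the grok macros; not the exact grok definitions.
DRUPAL_SYSLOG = re.compile(
    r"^(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>[^:\[\s]+)(?:\[(?P<syslog_pid>\d+)\])?: "
    r"https?://(?P<drupal_vhost>[^|]*)\|"
    r"(?P<drupal_timestamp>\d+)\|"
    r"(?P<drupal_type>[^|]*)\|"
    r"(?P<drupal_ip>[^|]*)\|"
    r"(?P<drupal_request_uri>[^|]*)\|"
    r"(?P<drupal_referer>[^|]*)\|"
    r"(?P<drupal_uid>[^|]*)\|"
    r"(?P<drupal_link>[^|]*)\|"
    r"(?P<drupal_message>.*)$"
)


def parse_drupal_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named fields of a Drupal syslog line, or None if no match."""
    match = DRUPAL_SYSLOG.match(line)
    return match.groupdict() if match else None


def drupal_time_utc(fields: Dict[str, Optional[str]]) -> datetime:
    """Drupal's own timestamp is Unix epoch seconds, so it has year and zone."""
    return datetime.fromtimestamp(int(fields["drupal_timestamp"]), tz=timezone.utc)
```

On the sample line this gives drupal_type php, drupal_uid 0 and an empty referer, and drupal_time_utc() returns 2015-10-04 12:52:34 UTC. Since that field carries the year and time zone the syslog prefix lacks, it may be the better one to feed into Logstash's date filter with its UNIX format.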

Now restart Logstash, restart LF, and carry on.

#devops #drupal