theory

Platform.sh from scratch, part 0 – explaining Platform as simply as possible.

August 17, 2016

Hello, and welcome to “Platform.sh from Scratch”. In this prologue to the series, I'll go over some of the very highest level concepts of Platform.sh so that you'll have a clearer understanding of what this product is and why it came to be.

Platform.sh is a “Platform as a Service”, commonly referred to in this age of acronyms as a “PaaS”. The platform that we provide is essentially a suite of development and hosting tools to make developing software applications a smoother end-to-end process. In order to understand what this means though, I'm going to have to go into some detail in this first sidebar. Skip this if you're comfortable with this post so far.

[!abstract]

PaaS

Everyone has heard of Salesforce. Salesforce has come to be the poster child for what is now referred to as “SaaS” – Software as a service. Prior to the SaaS era if you wanted a piece of software, be it a video game or Quickbooks or anything else, you had to drive to a store and buy a box with some disks in it. Once the internet reached a level of market penetration into people's homes though, those stores went out of business. This is an obvious evolution in hindsight. SaaS is a high level thing – it's a runnable piece of software that you'll access over the internet via a URL. You might be able to modify/config it a little bit, but will never be your entire business. It's not your product. It's someone else's and will likely play some fractional part in your overall business plan.

Almost everyone by this point has heard of Amazon Web Services – AWS. AWS is basically what people are talking about when they say “The Cloud”. AWS is a suite of products that emerged from Amazon when they figured out that they needed a huge amount of datacenter capacity to be able to withstand massive retail events like “Black Friday” and “Cyber Monday”, and that for most of the rest of the year they had tons of excess capacity sitting around draining money from their wallet. What to do with all that excess capacity? Sell it to someone else.

This relatively simple premise has evolved over the last 10-12 years into numerous products from S3 (basically a giant, limitless hard drive in the sky) to EC2 (basically a giant, limitless hosting server in the sky) to Redshift (basically a giant, limitless database that can be used for data warehousing) to SES (a simple service that sends emails) to an ever growing host of other services that always seems to come out just before your start-up figures out that it needs them.

AWS and “the cloud” in general is often given the acronym “IaaS” – infrastructure as a service. They're selling you the low level hardware abstractions that you can assemble into an infrastructure on which to run your software and by extension your company. It requires a decent bit of specialized knowledge for how to use the individual pieces as well as how to plumb them together, but for all intents is infinitely flexible. It's this level that has had most of my interest for the past few years.

In the middle of these two is what's called “platform as a service” – PaaS. This is what Platform.sh is – a suite of software and hosting services that lets you efficiently build and develop your software application, and then deploy your software application to a hosting environment that doesn't require as much specialized knowledge on how to plumb all the pieces together. Nor does the hosting environment require you – and this is a most important detail – to set up monitoring and alerting for if something goes wrong in the public environment.

The PaaS takes elements of both IaaS and SaaS to allow you to build your software product but not have to hire an extra person just to know the low level server business.

So, back to the program. The development tool set of Platform.sh is entirely based around Git. Just in case the reader is not already familiar with Git, I should explain this a little bit.

[!abstract]

Git

Software projects are typically composed of lots of files. If you want to add a new feature, you might be required to make changes in more than one of those files. Of course, before you get started you'll want to make some kind of backup just in case. If it turns out that the change was buggy or unneeded and you want to revert back to a previous state, you'd just restore those few files back to their previous versions.

What if, however, you're working with a bunch of different people and more than one person is working on that change (an utterly common scenario)? How do you manage those backups between all those people? Saving copies of files is basically impossible to manage after a very short while, so out of this need SCM (source code management) was born. It's been through several different iterations by this point, and at this point in time the version of SCM that is leading the market is called Git.

Git is pretty cool. It basically takes snapshots of your entire project whenever you tell it to. It then keeps track of all those snapshots and lets you share those snapshots among a team of developers. Any snapshot can be reverted, and you can see the full history of every change to the codebase so you can keep track of “what happened when”. But wait! There's more!

This is not an exclusive feature of Git, but it has a feature called “branching”. Branching is intuitively named, and is basically the concept of taking a specific snapshot and making changes based off of that one snapshot while other work continues on down the main code line. This is the recommended way to work if you're going to make any kind of significant change to the software, and this method of working allows you to keep the main code line (almost always referred to as the “master branch”) in 100% working order. It can be thought of as having a furniture workshop away from your house where you can work and keep the house clean for company to come over at any moment, as opposed to working in the house and risking having a wreck to present should company decide to drop by.

In essence branching is making a complete copy of your project at a point in time that you can hack on all you like without disturbing anyone else. If and when the change is ready, you “merge” the code back in to the master branch, test it out to make sure everything is still groovy and then you can release the feature or bug fix to the public.
gitGraph
   commit
   commit
   branch develop
   checkout develop
   commit
   commit
   checkout main
   merge develop
   commit
   commit

You can read more about the super basics here if you wish. For now, all you really need to know is that Git

Makes it easier to develop software as a team
Makes it very cheap and easy to try out new features without breaking anything
Makes it easier to manage changes to your software and to revert back to a known non-broken state

update: Hey look, a really great post explaining all this better than I did.

Platform.sh has taken this branching and merging workflow and extended it out into the entire hardware stack. When you're building a software project of any size, there are considerations beyond just the code your team is writing.

Most applications of any size connect to some kind of database in the background, this is where they save “data stuff”. User uploaded images are a very common thing in the web app world, so if those images aren't there the app will look busted.

You can branch your code all you like, but you need these other supporting resources to really do your job. Platform.sh makes branches not just of your code, but the entire infrastructure that your project runs on. This allows you to use the common branching/merging workflow with the complete support of everything else that your application depends on.

This may seem like an obvious feature, since how can you develop a new feature without being able to run it (?), but no other service that I know of actually does this. A branch in Git triggers (for all intents) a complete copy of your production site without requiring you to set up any new servers, copy databases over, copy images and everything else, etc, etc, etc. It's a significant hassle to do all this stuff, trust me, and it slows the team down every time you have to do it. Removing this need removes a major friction point in the workflow for building new features on your software product.

But wait! There's more!!!

This is where it starts getting really, really good. In case you're not aware, there's a website called GitHub. It's where a whole lot of folks have decided to host their “git repos” – repo being short for “repository”, which is basically that series of snapshots of the state of your project/codebase back to the beginning of time. This is the repo for this blog – https://github.com/JGrubb/django-blog, and here's some of the code that just ran to generate this page you're reading – https://github.com/JGrubb/django-blog/blob/master/blog/views.py#L26-L33. Pretty cool, right? And if I were working on a project with a buddy, we could both use this same repo and work on the same project, whether I'm in Germany or New Jersey or wherever. I can pull his changes over and he can pull mine and this is basically how open source software gets written these days.

The same workflow applies though – if you want to make a new feature or even if you just want to fix a bug, you'd make a new branch and do your work and then “submit a merge request”. This basically pings the person who runs the project and says “hey, I would like to suggest making this change. Here's the code I'm changing, maybe you could look it over and if you agree with this change you can merge it in”. By way of an example, here's a list of “pull (aka merge) requests” for the codebase that comprises the documentation for Platform.

Again, this is how software gets written and it's pretty mind blowing if you think about it. We software developers are so used to it our minds cease to be amazed, but not because it's not amazing. I mean, currently participating in that list of PRs are folks from France, Chicago, Hong Kong, the UK, and so on. Amazing. It is also, however, a pain in the ass.

It's a pain in the ass because it's typically impossible to tell if something works or not just by looking at the code, so you have to pull their changes over to your computer and test them out somehow. I bet you can see where this is going! Platform has a GitHub integration (BitBucket too) that will automatically build a working version of any merge request that someone opens against your project. That let's you go visit a working copy of the project and test it out without having to do a thing. Now, I don't care how long you've been doing this, that is mind blowing. For example, here's Ori's (currently work in progress) PR for adding the Ruby runtime documentation – https://github.com/platformsh/platformsh-docs/pull/339. If you click the “show all checks” link down toward the bottom, it expands with a little link “details”. That link takes you to a complete copy of the documentation with Ori's change added to it, so you can read it like you normally would, rather than reviewing a “diff”. It's the future now!

What this means in the wider scope is that your time to set up new things to test out new ideas, only to have to tear it down once the tests pass is time that you don't have to waste anymore. You can test changes out and keep right on moving.

This GitHub integration is only one of the really cool and unique features that Platform provides, but this post has gotten absurdly long already. Fortunately, this is intended to be the prologue to this series, so I'll touch on as many of those features as I can as the series progresses.

#generaldevelopment #devops #theory #platformsh

Building a drop down menu from scratch

February 13, 2014

Dropdown menus may seem like something that falls under the “solved problem” category, but they can be surprisingly tricky. Tutorials that you find online will usually walk you through a very simple example that assumes markup that you never have. This will not be that. We're going to talk about the theory behind building a drop down so that you can better reason your way through the mess of markup that you're given.

If you're working with Drupal and your requirements are outside the scope of what Nice Menus can give you (which happens as soon as you hear the word “responsive”), this tutorial is for you.

[!note] Be advised that some parent themes do not render the submenus even if you set the parents to “expanded” in the menu admin. I'm not sure what the logic is for that, but it's a feature you should be aware of in some base themes.

Beginning

Your menu markup is going to look something like this —

* Item 1
* Item 2
* Item 3
	+ Sub-item 1
	+ Sub-item 2
	+ Sub-item 3
* Last item

Out of the box that'll render something like this

Item 1
Item 2
Item 3
- Sub-item 1
- Sub-item 2
- Sub-item 3
Last item

If you are working with Drupal, you're going to have to dig through a lot of layers of wrapper divs, and there will be a lot more classes added to each item, but the general structure is the same. One early gotcha is that all the submenu uls are also given a class of .menu, which is annoying at best.

* Item 1
* Item 2
* Item 3
	+ Sub-item 1
	+ Sub-item 2
	+ Sub-item 3
* Last item

Ignoring all that, the general idea is to hide the submenu ul, get all the top level items to line up next to each other, and show the submenus when you hover over a parent item. How?

ul {
	float: left;
	margin: 0;
	padding: 0;
}
ul li {
	position: relative;
	float: left;
	list-style: none;
	list-style-image: none; /\* DRUPAL!! \*/
	padding: .5em 1em;
	margin: 0;
}
ul li ul {
	display: none;
	position: absolute;
	width: 10em;
	left: 0;
	top: 1em;
}

ul li:hover ul {
	display: block;
}

JSBin here

Play by play

I'll assume that the left floating stuff is understandable. The real action with a drop down happens with the display:none; position:absolute set on the submenus and the position:relative set on the parent -s. position: relative means nothing much to the item on which it's set (unless you start adding positioning rules to it as well). It's importance here is because any child elements/nodes that are absolutely positioned inside of it will now be positioned as if they exist inside that element. Without position:relative on that item, the absolutely positioned elements inside of it will position themselves relative to the body element, or the first containing ancestor that is positioned relatively. See here for an example.

As an aside, these two ALA articles are required reading if this part makes your eyes cross.

The rest of this is hopefully understandable. display: none on the submenu hides it from view, until you hover over it's parent -, at which point it gets the display property of block, which makes it show up in your browser. Since it's absolutely positioned, it'll need a width specified. You'll need something narrow enough to prevent short item from floating next to each other, but wide enough to keep longer items from breaking to too many lines.

On Superfish, Nice Menus, javascript, etc

Presumably, you might have heard of Superfish. It was the defacto JS solution to drop downs for many years, most of them in IE6/pre-jQuery era. IE6 has a (ahem) “feature” where only certain elements properly respond to the :hover pseudo-selector. That meant for a great many years that the only real solution was to patch this behavior with javascript. Fortunately, you only have to deal with this issue now if you still support IE6.

The other, definitely legitimate issue, is that using CSS only means that the instant you leave the zone of the parent item (don't forget the the parent - is wrapped around the entire submenu), your submenu will disappear. This means either judicious placement of your submenu, or utilizing some javascript to make your menu behave a bit more smoothly. Both are good solutions, imo.

Here's an updated JSBin. Note in the collapsed CSS column I've commented out this part —

ul li:hover ul {
	/*display: block;*/
}

This means we'll be hiding and showing the dropdown with javascript (jQuery in this example). I've added a class of expanded to the parent * to make selector targeting easier. Here's the full javascript -

jQuery(function($){
	var timerOut;

	$('.expanded').on('mouseover', function(){
		clearTimeout(timerOut);
		var self = this;
		setTimeout(function(){
		$('ul', self).show();
	}, 300);

	}).on('mouseout', function() {
		var self = this;
		timerOut = setTimeout(function() {
		$('ul', self).hide();
	}, 300);
	});
});

So, setTimeout returns a numeric timer id that you can use to cancel out the setTimeout callback if you need to. Since we're going to need access to one event handler's timeout in another event handler, we're going to declare the variable for the timerId outside the scope of both of them – var timerOut in the outer function.

Any time you use jQuery on(), the element that is triggering the handler is this inside the callback function (the function after 'mouseover'). We'll assign that to var self; since we're going to enter another context once we enter the setTimeout() callback. By the way, all of this gobbledygook about scope and this is THE trick to Javascript. Understand function scope in Javascript and you'll be highly paid. I'm still getting there myself.

So anyway, discounting that bit, it's very simple. When you mouseover the parent, show the submenu. When you mouseout of the parent, hide the submenu. All we're doing is adding a delay to those actions firing. The trick is cancelling that hide() call if the user decides within 300ms that they didn't mean to wander out of the submenu. That's where clearTimout() works it's magic in the mouseover function. If there is a mouseout timer still ticking, it's ID will be assigned to timerOut and it'll get cleared. If it's already hidden the submenu, no harm and no foul.

Note that if $('ul', self) looks weird, what that means is the item in the context of self is what we're trying to find. Omit the context and it implicitly becomes the whole window. Add the context and is almost the same as saying $('li.expanded ul'). I say “almost” because the second, longer example will actually grab *any* ul inside of *any* li.expanded, which is not what you want. That's why specifying the context not only shortens your code and improves performance since the whole DOM doesn't need to be searched each time, but also scopes your selector dynamically based on which element triggered the handler. I know this is total babble, and I'm sorry.

Final gotcha

Drupal's version of jQuery is so dated that on() isn't available. If you have the option of jQuery_updating to 1.7, you can enjoy the modern era. If your site breaks, as is often the case, and you're stuck with lt 1.7, you'll need to use bind() instead. It works more or less the same in these use cases, but being familiar with event delegation is another JS Jedi trick, and the one promoted by modern Javascript authors.

In closing

This got longer than I wanted, but it's not the easiest thing in the world to build the ubiquitous drop down menu. My first one took me at least a week, and I think I eventually stumbled on Nice Menus to actually get the job done. Luckily, modern browser environments are much more predictable than they used to be, so knowing how to fish on your own is much easier these days, and the taste of a fish you caught on your own is always superior to something bought at the store, right?

This post touched on the word “responsive” at the top, and I'll follow up with how to work with that. If you've come this far, you've set yourself up nicely for an easier mobile menu job without having to fight against a bunch of other people's code.

#ui #css #javascript #theory

Explaining Non-Relational Databases To My Mom

May 17, 2013

I was on the phone with Mom yesterday, and we got to talking about technology – a thing that actually happens fairly frequently. Being an only kid, she's genuinely interested in everything that I do and it's been helpful to have someone who's mostly non-technical to bounce explanations off of when I'm getting my head around a new piece of gear.

The piece of gear that I was explaining the other day was something called Mongo DB. Mongo's parent company is called 10gen, and they landed on the startup scene about 5 years ago or so with their flagship product, Mongo DB. Mongo is currently the pre-eminent player in the “NoSQL” database market. The NoSQL flavor of databases has come en vogue in the last few years in certain technology sectors, primarily ones that are evolving so quickly that having to slow down to put forethought into your data store and how it's going to be structured might literally be the difference between your whole company suceeding or not. That whole sentence will make more sense in a minute, but first it'd be helpful to understand how a traditional, “relational” database works.

The Relational model

The relational model of storing data has been around for more than 40 years. Wikipedia, of course, has a great article – http://en.wikipedia.org/wiki/Relational_database – giving a technical overview, and I wrote my own article 4 years ago while trying to get my own head around the concept – http://www.ignoredbydinosaurs.com/2009/04/chapter2-databases. Wow, that's hilarious reading that article actually, since my “Railroad Earth setlist database” example is well underway – http://www.hobocompanion.org/. So yeah, read that second one for the idiot's guide.

The classic example I gave to my mom was that of a common blog. You have a table for all your posts. Each post in that table has a row, and each row has something like a numeric ID, and the text of that post. When you want to read a blog post, the URL in that address bar of your says something to the effect of “give me post #1”. Whatever blog hosting software your using then turns around to the database behind it and says “select everything from the posts table where the ID of the post is 1”. The database hands back the post, the software renders some HTML out of it and sends it back to your browser. This is the simplest example of your interacting with a database.

The relational model typically comes into play when you visit a blog that has comments. Now you don't just have a Posts table, but you also have a Comments table. That Comments table will most likely have the same ID column, and a column called “body” or something like that for storing the text of the comment. However, this table will also have a column called something like “postid”, and what gets stuffed in that column is the (you guessed it) ID of the blog post that this comment “relates” to. So now when your reader comes by, the blog software turns around to the database and asks for two things this time – “select everything from the posts table where the ID of the post is 1”, and then “select everything from the comments table where the postid is 1”. This second “query”, if you're lucky enough to write something that people respond to, will return a list of comments, an array if you will, that your blog software will then convert to HTML and append to your blog post in the form of the comments section. Pretty simple, or is it?

Issues with the relational model

For the purposes of this simplistic example, this hopefully isn't that hard to get your head around. “But”, you might be wondering, “does that really make sense to store blog posts over here and comments over there? They're clearly related. When are you ever going to be examining a post's comments without wanting that post also?”

Very good. And this is a prime example of when a “non-relational” or “NoSQL” database might make a lot of sense.

The non-relational model

Let's stick with the same example, the blog post and comments, but let's think about how to “model” this in a non-relational way. By the way, if you're still with me, you have a deeper technical understanding than 99% of everybody around you.

The non-relational data model would look more like a sheet of paper. In fact, the concept of one entity and all the data that pertains to that one entity is known in Mongo as a “document”, so truly this is a decent way to think about it.

The way that your blogging software interacts with this blog post is theoretically a lot simpler in a non-relational database. Request comes in, blogging software turns around to Mongo and says “Please give me back this specific post and everything related to it”, in this case a listing of comments. BUT, that's not all. Since we're not forced to be too uptight about having to define how the data is structured beforehand, what if we want to tag that post with an arbitrary number of categories? No problem, stick them on the same document and when blog software says “gimme everything on Post #1”, the tags, the comments, and any other randomly associated data come back with it. Your software doesn't need to know ahead of time, you don't need to know ahead of time. It all just kind of works, provided you know what you're doing on the software level.

This amount of flexibility is what makes Mongo a very popular “data store” with quickly moving techonology companies that might be chasing an idea that changes from week to week or day to day. You don't have to rearchitect your entire system when you discover this particular use case down the road where you need to stick some “extra data” on this particular “thing”.

Issues with the non-relational model

It seems obvious that for this example a non-relational database makes a lot of sense, but like any useful tool, it has it's limits.

For this example, let's use a grocery list. You can model the grocery list in a lot the same way – a piece of paper, you write the items you want on there. But let's say, for the purposes of this example, you figure out after a couple of months that you want to keep track of how many loaves of bread you actually bought last year. Well, with the non-relational model you might literally have to go through every list and count each loaf individually (*simplistic example alert*), whereas had you modeled this in a relational way, you could get that count back almost instantly. Yes, it'd be a bit more work on the front end – essentially you'd have a Lists table and an Items table. But, modelled correctly, you'd also have a table in the middle called a “join table” that allows you to associate any number of items with any number of lists. After a while, were you truly the hacker, you could probably write some code that'd predict what you need to put on that list without even having to put it on there yourself, based strictly off of patterns that are easily discernable in a relational database.

That's why the Hobo Companion has a relational database behind it. It's super easy to count how many and which shows each user was at (2000+ shows tagged between about 30 users so far). It's super easy to count the number of times they've played Mighty River. It's super easy to count the number of times that I played Mighty River (somewhere around 260 times). It's super easy to figure out where in the setlist they typically put Dandelion Wine (Either first, or at the end of the first set). These calculations are far from impossible in a non-relational database, but they are also pretty far from trivial.

In situations where you absolutely want to be really uptight about your data and how it's structured – think bank accounts as the classic example – a relational database is absolutely required, not just because it's theoretically better suited for the job, but it's also a “mature technology” that's older than I am.

The wonderful point here is that we actually have decent options depending on what it is we are actually trying to do. Startup hackers love Mongo precisely because it lets you move fast. You can rock out an application in a weekend without having to spend most of one day setting up the equipment in the studio first.

Thanks for sticking with me.

#databases #theory