Immutable Delivery

This article proposes a design pattern modeled after “Immutable Infrastructure”, something I call “Immutable Delivery”.  There has been a lot of debate and discussion about the usage of the term “Immutable” lately. Let me clearly say that there is no such thing as an immutable server or immutable infrastructure. I know of no example in my 35 years of working with IT infrastructure of a system or infrastructure that is  completely immutable. A system changes and continues to change the minute it is powered on. Things like logs, dynamic tables  and memory are constantly changing during a system’s lifecycle.

However, I think it’s okay to use the phrases “Immutable Infrastructure” or “Immutable Delivery” in the context of an system or infrastructure delivery pattern. In fact, I propose we think of it as a metaphor for a kind of full stack stateless application delivery pattern.  I’ve had mathematicians actually yell at me after a presentation on Infrastructure as Code and my use of the term idempotency.  When confronted in this sort of battle, I would always retreat with saying “It’s a term used to describe a certain type of operation we do in configuration management”. Henceforth; I suggest the same idea for the use of the phrases “Immutable Infrastructure” and “Immutable Delivery”.

First Things First

Let’s describe what an “Immutable Infrastructure” model might look like. Basically, it is a model where the complete infrastructure is delivered intact, for example as a  set of immutable virtual machines or as a set of immutable Linux containers. The idea is, by design, to never touch or change the running system. In most cases, the running system is the production system; but in some recent examples with containers, this model is also used in integration and testing structures. I like to say no CRUD for applications, middleware configuration files and operating systems. The model is that when something needs to be changed in the infrastructure,  it is done as a new deploy from the most recent versioned artifact (i.e., the complete system). If there needs to be a rollback, the same process is still a redeploy, except when in this case, the artifact is the older version. One caveat in this model is relational databases. It is very difficult, maybe impossible, to have immutable relational databases. However, some companies I have talked to sort of do what I call an “No RUD” for databases. In that, they create new records but do not replace, update or delete existing ones. Mileage always varies for all of this.

Order Matters

Back in 2010, I was given an interesting paper written by Steve Traugott called “Order Matters:  Turing Equivalence in Automated Systems Administration” (2002).  Taugott’s article described systems that were either divergent, convergent or congruent.  At the time, I was working at Chef and desired state configuration and building systems through convergent models was what I was currently evangelizing. At that time, I felt that, the “Order Matters” paper described the differentiation between how Chef and Puppet worked.  The short version here is that Chef used a prescriptive Ruby-based DSL that was executed on the local host in an order specific manner based on how you wrote the DSL code.  Whereas Puppet used a clever dependency graph on the Puppet server to determine some of the ordering at deployment time. In some cases, this made a difference for certain types of organizations (Note 1).  Traugott’s paper does an excellent job describing a thesis on why this “could” be so. What fascinated me was Traugott’s explanation of congruent systems.  At that time, there didn’t seem to be a commodified way to deliver this form of infrastructure, at least from my perspective.  

Building With Legos

A year later Netflix wrote an interesting blog post  called “Building With Legos”.  At the time, there was a lot of outrage (myself included) regarding this post. At first glance, it looked like Netflix was suggesting that they were advocating a model of “Golden Image” delivery infrastructure. Part of my initial reservation was that I had seen this pattern twice in my career with dire consequences. The first time was back in the late 90’s where companies would use a product called “Ghost” to build Windows servers. This was another one of those ideas where it sounded like a good idea at the time until you had, in some cases, thousands of poorly cataloged images and wrong image deploys caused major outages. Fast forward to around 2008 and here again, organizations were starting to make the same mistakes all over again with “cloud” images,  specifically in the form of Amazon AMI’s. I believe that sometime around 2009 the phrase “Image Sprawl” became a popular phrase for doing “cloud” wrong.  In fact, I remember a horrifying story in the early days of cloud, where the Public Broadcast Service (PBS)  accidentally made a proprietary AMI image public and it took them a couple of days to clean up all the viral versions of their private keys.  So at first glance of the Netflix blog post,  you could see how many were thinking they were suggesting a mode of bad configuration management.  However, on a closer read they were much smarter than this.  What they were indeed saying in the blog post was that they were treating the AMI’s like JAR or WAR files in that the AMI images were holistic artifacts built through a continuous integration/continuous delivery (CI/CD) process.  The AMI’s would be pulled at deploy time to be launched into production similar to the way a JAR or WAR would be pulled from an artifact repository.  Only in this example, it  included all the infrastructure (OS, Middleware, application and most of the application configuration files).  I like to use the phrase “Java Lied’ in many of my presentations.  They told us “Write once, run anywhere”.  What they forgot to say is this is true unless you have an incompatible runtime environment.  Netflix at the time of the blog post didn’t refer to that process as “Immutable Infrastructure” and of course it was not completely immutable.  They had to use a set of open source tools to discover the services and converge the deployed services, so hence, their systems were not immutable.  However, their service delivery pattern was indeed a model of an immutable delivery pattern. Some time later, Netflix did start to refer to their architecture as “Immutable Infrastructure”.   

Trash Your Servers and Burn Your Code: Immutable Infrastructure

June 2013 was the first time I had heard the phrase “Immutable Infrastructure”.  Chad Fowler wrote a blog post “Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components”. In the blog, Fowler proposes his idea born from functional programming techniques that offer immutable data structures. The belief was that if somehow we could deliver the complete infrastructure, for example a server, with all the infrastructure needed  for the application, then in his words, “ it would be easier to reason about and harder to screw up”.  Imagine delivering a server the same way we deliver an application, for example, like a WAR file. Fowler’s main point was that systems grow warts, especially when you are fire fighting.  For example, a system might be well defined through an configuration management  tool, but during an outage, changes may be made on the system directly and then never put back into the configuration management recipe or manifest later.  The list of possible human or system entropy examples goes on.  Fowler also points out that sometimes application code is deployed outside of “normal straight-from-source-control process.” Operating system patching or source repositories sometimes change in flight between testing and production. There are many other examples of possible mismatches in the flow you can find in the wild. All of this could be put in a bucket called “bad configuration management hygiene”; however, just like I have never seen a perfect immutable system, I have also never seen in my 35 years a “perfect system”.  I mean ”system” the way Dr. Deming would describe it, in that, all systems include humans.  

Docker and the Three Ways of Devops

When I first arrived at Docker back in February 2015,  I reviewed a Gartner paper called “Become More Agile and Get Ready for DevOps by Using Docker in Your Continuous Integration Environments” and it set me down a course of thinking. The author, Sean Kenefick, had a strong background in release engineering and wrote an excellent paper of how Gartner would suggest using Docker.  As I read the paper,  the first thing it reminded me of was Steve Traugott’s paper about order and why it mattered and the value of congruent systems.  I decided to write a blog post called “Docker and the Three Ways of Devops”.  During my research, I talked to a number of Docker users that were doing what Fowler described as Immutable Deployments, using Docker images as the immutable artifacts. This process was similar to what Netflix was doing with two major differences. One, the services were container images not virtual machines, and two, they were being delivered immutably from the developer’s perspective. After the container images were compiled and tested the “service” would be pushed to the CI process for service level integration testing. Most of the organizations using this model had already crossed over to a microservices architecture. The flow would go something like this:

  • The developer would test their service as a container, typically on a virtual machine running on their laptop.    
  • They would also load the other primary services in their service-oriented architecture, possibly owned by other teams, into their same virtual machine on their laptop.  
  • They would continue to compile, load and test their service sometimes on their laptop and other times through a sort of first-pass CI server.  
  • When testing was complete, they would typically check in their service as a container (binary) image with a meta file to describe the rest of the pipeline flow (CI/CD).  

All of the people I talked to agreed that the benefit of this process was that the developer was in control not only of the application code they wrote but also of any middleware and basic operating systems behavior (note 2 & 3). The benefits of an immutable delivery process, like the ones described by Netflix with their AMI flow, are that you can increase speed and decrease resources and possible variation with containers. Containers instantiate in around 500 milliseconds whereas virtual machines instantiate well over a minute. In a microservices architecture, many of the containers are around 100 megabytes whereas virtual machines could still be as large as 2 gigabytes. I like to say that containers are the killer app for microservices. With this mode the developer can test all of the other dependent services from their laptop.  Werner Vogels the CTO of Amazon is often quoted as saying, “You build it, you run it”. In Devops we like to say, “developers should wear pagers”. There is a reason why developers like Docker so much. When they build it, own it and get paged in the middle of the night, they know that for the most part the bits that they tested are the same (i.e., congruent ) bits that are running in production.  

In 2014 at Dockercon US, Michael Bryzek, of Gilt, gave a fantastic presentation “Immutable Infrastructure with Docker and EC2”.  In this presentation, he describes a process where the developers check in a set of binary container images with one single meta file.  I have personally transcribed what he says starting at 28:03 in his presentation:

“This is how we run our infrastructure. One of the things that developers have to do is provide the commands to start the Docker container, and that’s it. This is kind of amazing right?  Any EC2 instance that we spin up now, we don’t care if you’re running Node, Ruby, Scala, Java or if you made up your own programming language. It’s absolutely amazing how nice this is.  When we compare this to the way we did this in the past, we had one repository that had all of the different scripts to know how to build all of the different applications at Gilt. We have 1000 Git repos and over 300 different applications. We are 7 years old which means we have like 7 different ways of producing applications. There’s 25 different ways that we build software at Gilt and it’s all captured in a central repo.  That was at conflict with where we are going in terms of teams really owning their software and being able to deploy their own services.”

I have talked to a number of companies over the past year and many of them are moving to an “Immutable Delivery” process driven by microservices implemented by containers. Capital One at the Devops Enterprise Summit in October 2015 (DOES15)  gave a presentation called “Banking on Innovation & DevOps”. They said in that presentation that they are using Docker in production. They have also said that they are delivering software in an immutable delivery pattern. This model is not just for web scale anymore.  

In the end, “Immutable Infrastructure” or what I have coined as “Immutable Delivery”  is just a model with many variants. No large organization uses a single model to manage their infrastructure. Over the next few years, I look forward to working with all sorts of products, old and new, to find the correct balance of service delivery. My only goal is to be an evangelist of a model that Josh Corman, CTO at Sonatype, and I describe as “Immutable Awesomeness”.  This is  a presentation we did at DOES15.  We borrowed many of our ideas from the book “Toyota Supply Chain Management: A Strategic Approach to Toyota’s Renowned System”.  In this book, they describe the 4 V’s.  Increase Variety, Velocity, and Visibility and decrease Variability.  In short whatever works, works…

John Willis
Director of Ecosystem Development, Docker Inc.
@botchagalupe

This article is part of our Docker and the Future of Configuration Management blog roundup running this November.  If you have an opinion or experience on the topic you can contribute as well

Notes:

  1. To be clear, Puppet today allows for both models and this particular differentiation, in my opinion, no longer exists. In fact, both products today have relative parity with regards to ordering.
  2. For the nit-pickers, mileage varies on operating system immutability. Containers run on a host operating system and share the kernel.  Bad hygiene on the host will definitely cause “immutable” woes.  
  3. This is, by the way, a great area for co-existence between Infrastructure as Code products like Chef and Puppet and containerization products like Docker

References:

Why Order Matters: Turing Equivalence in Automated Systems Administration

http://www.infrastructures.org/papers/turing/turing.html

Building with Legos

http://techblog.netflix.com/2011/08/building-with-legos.html

VM Image Sprawl in Real Life

http://www.cloudscaling.com/blog/cloud-computing/vm-image-sprawl-in-real-life/

Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components

http://chadfowler.com/blog/2013/06/23/immutable-deployments/

Become More Agile and Get Ready for DevOps by Using Docker in Your Continuous Integration Environments

https://www.gartner.com/doc/3016317/agile-ready-devops-using-docker

Docker and the Three Ways of Devops

https://blog.docker.com/2015/05/docker-three-ways-devops/

A conversation with Werner Vogels

http://queue.acm.org/detail.cfm?id=1142065

Immutable Infrastructure with Docker and EC2”

http://tech.gilt.com/2014/07/02/immutable-infrastructure-with-docker-and-ec2/

Banking on Innovation & DevOps

http://devopsenterprise.io/sessions/shortening-the-feedback-loop-devops-dashboard/

Toyota Supply Chain Management: A Strategic Approach to Toyota’s Renowned System

http://www.amazon.com/Toyota-Supply-Chain-Management-Strategic/dp/0071615490

Immutable Awesomeness

https://www.youtube.com/watch?v=-S8-lrm3iV4

 

2 Comments

Filed under DevOps

2 responses to “Immutable Delivery

  1. Pingback: Top 10 links for the week of Nov 23 - HighOps

  2. When you say immutable infrastructure I assume you are talking about app code + everything you need to run it (example, php fpm, node, modules, etc). I’m working on a project where I’m going to have 100+ containers. When I told my client that a single line of code update will require a complete rebuild and deploy of all of them the said WTF!!. I understand the benefits, but sometimes it’s difficult to convince people because the benefits appear in the long term.

    The question is what’s the right option. Pragmatic vs dogmatic. Implement something in the middle like using the code from a volume in the host. Or just do it right. How to convince them about the right approach.

    This is more a braindump than a question but I’d like to hear other experiences, thoughts, opinions.

    Thanks!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s