Links on Bridging Security and DevOps

If you remember, I (@wickett) said I would be doing more blogging for Signal Sciences in the new year. We're still in January, and I'm glad to say that so far, so good. Here are a couple of highlights from recent posts:

That’s all for now.  Happy Friday everyone!


In the New Year, resolve to bring Security to the DevOps party

Happy New Year!  May this be your year of much successful DevOps.

Last year I wasn’t too vocal about my work over at Signal Sciences. Mostly because I was too busy helping to rapidly build a NextGen Web Application Firewall as a SaaS from the ground up. This year you will be hearing a bit more as I am regularly contributing to the Signal Sciences blog (Signal Sciences Labs) over at Medium (sorry WordPress!).

I will occasionally link to some of my posts over there from The Agile Admin, on topics like:

  • The challenges we faced building a modern security product
  • Bridging the gap between Security and DevOps
  • Attack Driven Operations
  • and other Rugged DevOps topics…

Which brings me to the point of this post…

Bring Security to the DevOps party!

I am making it a personal goal this year to bring security engineers, auditors, penetration testers and even those forensics folks to the DevOps party.  I have my sights mostly set on DevOps Days Austin as the event to physically bring people to (watch out, Austin Security people!), but I am already crafting blog posts and many cunning tweets to bring them over as well.  Can you join me this year in trying to bridge this gap?

Last month I had the opportunity to do a Sec Casts panel with these fine folks (all of whom you should follow) on topics around DevOps and security:

 

If you don’t want to hear us go on for about an hour, you can read the write-up here. I mention this panel specifically because I think the topics it raised bear directly on the goal of bridging security and DevOps.  Maybe it will give you some ideas on how to bridge the gap in your own organization.

Happy New Year, and let's make this the year that Security is finally brought into the DevOps fold.


StackEngine Webinar – Docker and the Future of Configuration Management

I hope you’ve been enjoying our Docker and the Future of Configuration Management blog roundup!  I’m joining Jon Reeve of StackEngine, who’s sponsoring the roundup with prizes and such, in a Webinar this week to discuss the various points of view we’ve seen covered.

The Webinar will be on Wednesday Dec 09, 2015 at 11:00 AM CST. Register now at: https://attendee.gotowebinar.com/register/5726672543793290498

In this webinar, we'll explore how Docker and containers are impacting the future of configuration management. Is true "Golden Image" management now a reality? We'll explore different points of view and the pros and cons of Docker's impact.

We’ll also review StackEngine’s approach to Docker and container management and how it is benefiting DevOps and Operations teams.


Using Docker To Deliver Open Source Security Tools

Security tools are confusing to use, but they are even worse to install. You often get stuck installing development packages and loads of dependencies just to get one working. Most of the tools are written by a single person trying to get something out the door fast. And most security tools want elevated privileges so they can craft packets with ease or install `-dev` packages themselves.

The traditional answer to this was either to install all the things and just accept the sprawl and privilege escalation, or install them in a VM to segregate them. VMs are great for some use cases but they feel non-native and not developer friendly as part of an open source toolchain.

I am intimately familiar with this problem, as I am on the core team for Gauntlt. Gauntlt is an open source security tool that–are you ready?–runs other security tools. In addition to needing its own dependencies like Ruby and a handful of Ruby gems, Gauntlt also needs other security attack tooling installed to provide value. For people just getting started with Gauntlt, we have happily bundled all this together in a nice VirtualBox VM (see gauntlt-starter-kit) that gets provisioned via Chef. This has been a great option for us, as it lets first-timers download a completely functioning lab. When teaching Gauntlt workshops and training classes, we use the gauntlt starter kit.

The problem with the VM lifestyle is that while it’s great for a canned demo, it doesn’t expand to the real world so nicely.

Let’s Invoke Docker

While working with Gauntlt, we have learned that it is fun for development, but it works best when it sits in your continuous integration stack or your delivery pipeline. If you are familiar with Docker, you know this is one thing it particularly excels at. So let's try using Docker as the delivery mechanism for our configuration management challenge.

In this article, we will walk through the relatively simple process of building a Docker container with Gauntlt and show how to run Gauntlt in a Docker world. Before we get into Dockerizing Gauntlt, let's dig a bit deeper into how Gauntlt works.

Intro to Gauntlt

Gauntlt was born out of a desire to “be mean to your code” and add ruggedization to your development process. Ruggedization may be an odd way to phrase this, so let me explain. Years ago I found the Rugged Software movement and was excited. The goal has been to stop thinking about security in terms of compliance and a post-development process, but instead to foster creation of rugged code throughout the entire development process. To that end, I wanted to have a way to harness actual attack tooling into the development process and build pipeline.

Additionally, Gauntlt hopes to provide a simple language that developers, security and operations can all use to collaborate. We realize that in the effort for everyone to do “all the things,” no single person is able to cross these groups meaningfully without a shared framework. Chef and Puppet crossed the chasm for dev and ops by adding a DSL, and Gauntlt is an attempt to do the same thing for security.

[Image: Gauntlt flow diagram]

How Gauntlt Works

Gauntlt runs attacks against your code. It harnesses attack tools and runs them against your application to look for things like XSS or SQL Injection or even insecure configurations.

Gauntlt provides simple primitives to wrap attack tooling and parse the output. All of that logic is contained in what Gauntlt calls attack files. Gauntlt runs these files and exits with a pass/fail and returns a meaningful exit code. This makes Gauntlt a prime candidate for chaining into your CI/CD pipeline.
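
As a quick illustration of that chaining (mine, not from the Gauntlt docs), here is a minimal bash sketch of gating a CI job on Gauntlt's exit code; it assumes gauntlt and your attack tools are already installed on the CI worker, and the attack file path is hypothetical:

#!/usr/bin/env bash
# minimal sketch: fail the CI job if any Gauntlt attack fails
set -e                         # abort on any non-zero exit code
gauntlt ./attacks/xss.attack   # exits non-zero when a scenario fails
echo "All gauntlt attacks passed"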

Anatomy of an Attack File

Attack files owe their heritage to the Cucumber testing framework and its Gherkin language syntax. In fact, Gauntlt is built on top of Cucumber, so if you are familiar with it, Gauntlt will be really easy to grok. To get a feel for what an attack file looks like, let's do a simple attack and check for XSS in an application.

Feature: Look for cross site scripting (xss) using arachni against example.com
Scenario: Using arachni, look for cross site scripting and verify no issues are found
 Given "arachni" is installed
 And the following profile:
 | name | value |
 | url | http://example.com |
 When I launch an "arachni" attack with:
 """
 arachni --checks=xss --scope-directory-depth-limit=1
 """
 Then the output should contain "0 issues were detected."

Feature is the top-level description of what we are testing; Scenario is the actual execution block that gets run. Below that is the Given-When-Then structure, Gherkin's plain-English approach. If you are interested, you can see lots of examples of how to use Gauntlt in gauntlt/gauntlt-demo inside the examples directory.

For even more examples, we (@mattjay and @wickett) did a two-hour workshop at SXSW this year on using Gauntlt, and here are the slides from that.

Downsides to Gauntlt

Gauntlt is a Ruby application, and the downside of using it is that sometimes you don't have Ruby installed or you get gem conflicts. If you have used Ruby (or Python) then you know what I mean… it can be a major hassle. Additionally, installing all the attack tools and maintaining them takes time. This makes Dockerizing Gauntlt a no-brainer: you decrease the effort to get up and running and start seeing real benefits sooner.

Dockerizing an Application Is Surprisingly Easy

In the past, I used Docker like I used virtual machines. In retrospect this was a bit naive, I know. But at the time it was really convenient to think of Docker containers as mini VMs. I have found the real benefit (especially for the Gauntlt use case) is using containers to take an operating system and treat it like you would treat an application.

My goal is to be able to build the container and then mount my local directory from my host to run my attack files (*.attack) and return exit status to me.

To get started, here is a working Dockerfile that installs gauntlt and the arachni attack tool (you can also get this and all other code examples at gauntlt/gauntlt-docker):

FROM ubuntu:14.04
MAINTAINER james@gauntlt.org

# Install Ruby
RUN echo "deb http://ppa.launchpad.net/brightbox/ruby-ng/ubuntu trusty main" > /etc/apt/sources.list.d/ruby.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C3173AA6
RUN \
  apt-get update && \
  apt-get install -y build-essential \
    ca-certificates \
    curl \
    wget \
    zlib1g-dev \
    libxml2-dev \
    libxslt1-dev \
    ruby2.0 \
    ruby2.0-dev && \
  rm -rf /var/lib/apt/lists/*

# Install Gauntlt
RUN gem install gauntlt --no-rdoc --no-ri

# Install Attack tools
RUN gem install arachni --no-rdoc --no-ri

ENTRYPOINT [ "/usr/local/bin/gauntlt" ]

Build that and tag it with docker build -t gauntlt . or use the build-gauntlt.sh script in gauntlt/gauntlt-docker. This is a pretty standard Dockerfile, with the exception of the last line and the usage of ENTRYPOINT(1). One of the nice things about using ENTRYPOINT is that it passes any parameters or arguments into the container. So anything after the container's name, gauntlt, gets handled inside the container by /usr/local/bin/gauntlt.
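
As a small illustration (my own, not from the gauntlt-docker repo), once the image is built you can pass ordinary Gauntlt flags straight through the container:

# anything after the image name is handed to /usr/local/bin/gauntlt inside the container
docker run --rm gauntlt --help
# running actual attack files still needs the volume mount shown in the stub below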

I decided to create a simple stub that I could put in /usr/local/bin so that I could invoke the container from wherever. Yes, there is no error handling, and yes, since it does a volume mount, this is certainly a bad idea. But, hey!

Here is a simple bash stub that calls the container and passes Gauntlt arguments to it.

#!/usr/bin/env bash

# usage:
# gauntlt-docker --help
# gauntlt-docker ./path/to/attacks --tags @your_tag

docker run -t --rm=true -v "$(pwd)":/working -w /working gauntlt "$@"

Putting It All Together

Let's run our new container using our stub and pass in an attack we want it to run. You can run it without the .attack file argument, since Gauntlt searches all subdirectories for anything with that extension, but let's go ahead and be explicit. Running $ gauntlt-docker ./examples/xss.attack generates this passing output:

@slow
Feature: Look for cross site scripting (xss) using arachni against scanme.nmap.org

  Scenario: Using arachni, look for cross site scripting and verify no issues are found # ./examples/xss.attack:4
    Given "arachni" is installed                                                        # gauntlt-1.0.10/lib/gauntlt/attack_adapters/arachni.rb:1
    And the following profile:                                                          # gauntlt-1.0.10/lib/gauntlt/attack_adapters/gauntlt.rb:9
      | name | value                  |
      | url  | http://scanme.nmap.org |
    When I launch an "arachni" attack with:                                             # gauntlt-1.0.10/lib/gauntlt/attack_adapters/arachni.rb:5
      """
      arachni --checks=xss --scope-directory-depth-limit=1 
      """
    Then the output should contain "0 issues were detected."                            # aruba-0.5.4/lib/aruba/cucumber.rb:131

1 scenario (1 passed)
4 steps (4 passed)
0m1.538s

Now you can take this container and put it into your software build pipeline. I run this in Jenkins and it works great. Once you get this running and have confidence in the testing, you can start adding additional Gauntlt attacks. Ping me at james@gauntlt.org if you need some help getting this running or have suggestions to make it better.
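
For reference, here is a rough sketch of what that can look like as a Jenkins "Execute shell" build step; the example attack path is an assumption, and your job layout will differ:

#!/usr/bin/env bash
# hypothetical Jenkins build step: rebuild the gauntlt image and run the attacks
set -e
docker build -t gauntlt .      # or ./build-gauntlt.sh from gauntlt/gauntlt-docker
docker run -t --rm=true -v "$(pwd)":/working -w /working gauntlt ./examples/xss.attack
# Jenkins marks the build as failed on any non-zero exit code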

Summary

If you have installed a Ruby app, you know the process of managing dependencies: it works fine until it doesn't. Using a Docker container to keep all the dependencies in one spot makes a ton of sense. Not having to alter your flow by resorting to VMs makes even more sense. I hope that through this example of running Gauntlt in a container you can see how easy it is to abstract an app and run it almost like you would on the command line, keeping its dependencies separate from the code you're building and other software on the box, yet accessible enough to use as if it were just an installed piece of software.

Refs

1. Thanks to my buddy Marcus Barczak for the docker tip on entrypoint.

This article is part of our Docker and the Future of Configuration Management blog roundup.  If you have an opinion or experience on the topic, you can contribute as well.


Heterogeneous Hardware Management and Benchmarking via Docker

We do a lot of benchmarking and functionality testing here at Bitfusion.io to continuously evaluate our application acceleration technology across different CPU architectures, storage and memory configurations, and accelerators such as GPUs and FPGAs. As a startup, we have various high performance systems in-house, but there are always newer and more exciting hardware configurations for us to test and develop on. To compound the problem further, more often than not different cloud and hardware providers offer access to different types of hardware.

For example, Amazon has a nice selection of systems when it comes to Intel Sandy Bridge and Haswell processors, but their GPU selection of Nvidia K520 GRID GPUs in the g2.2xlarge and g2.8xlarge instances is rather dated by today's standards. To get access to machines with multiple state-of-the-art Nvidia K80 GPUs, we needed to quickly deploy our software on other cloud providers such as Softlayer, where K80s are available. To test on AMD CPUs or to access some of our FPGAs, we needed an easy way to do the same on our bare-metal servers in Rackspace. And finally, as mentioned before, we have several local systems which also needed to be part of the test infrastructure. I think you get the idea.

To solve the problem we needed three things. First, the ability to provision systems quickly on demand and bring them back down; a startup is always on the clock and on a budget. Second, we needed an easy way to deploy our benchmarks and testing across any infrastructure transparently and reliably. Third, we needed a way to get performance and monitoring data for our runs and to quickly visualize it; the tricky part here was that we were shooting for high-granularity data collection without interfering with the actual performance itself.

To solve these problems we adopted the following high-level architecture. To manage the systems inside our office and across the different cloud providers we utilize SaltStack exclusively. SaltStack manages all our machine images, created with Packer, for the various systems and cloud providers. While in general the images are fairly similar, different hardware occasionally needs a specific run-time environment and drivers on the host systems, especially when it comes to accelerators. Using SaltStack we then create and manage dynamic pools of machines, where machines can be added or removed at any time.

All our builds, applications, and benchmarks live within Docker containers, along with various performance monitoring software. We can monitor performance from inside containers as well as from outside containers, depending on exactly what we are looking for; some tools we utilize here include collectl and sysdig. Any streaming performance data we pipe out to Elasticsearch, taking care to cluster the streaming data periodically so that we only minimally disturb any resource measurements or I/O activity. Once the data is in Elasticsearch, the visualization is just a simple exercise in data querying and JavaScript. Below is one possible visualization of this data from our free Profiler tool, which we discuss in more detail toward the end of this post:

[Image: example visualization from the Bitfusion Profiler]

Since we have a use case which differs quite considerably from the usual Docker case of many small web sites or microservices, we have many more Docker container images and they are much larger than usual. In order to manage these we run our own instance of Docker Trusted Registry, so that we keep the data traffic local when pushing or pulling container images to and from the registry. As a result of our obsession with performance, we run overlayfs as the Docker graph backend. The overlayfs driver is the component of Docker which manages all the layers of the files in the container images. While there are several options here, overlayfs is the preferred, and fastest, choice going forward. It does have some caveats: we had to move to Linux kernel 4.2+ and Docker 1.8.1 in order to achieve a stable configuration, and the underlying file system on which overlayfs is layered requires a very large number of "inodes", the data element which records where files are stored on the disk. This means the underlying file system must be specially formatted with a much larger number of inodes than normal, which can be done using a command such as

# mke2fs -N <desired inodes> -t ext4 ….
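
As a purely illustrative example (the inode count, device, and mount point are hypothetical, not Bitfusion's actual values), formatting a dedicated Docker volume and starting the daemon with the overlay driver might look like this:

# hypothetical: format a volume with far more inodes than the default,
# mount it where Docker keeps its data, and use the overlay storage driver
mke2fs -N 20000000 -t ext4 /dev/xvdb
mount /dev/xvdb /var/lib/docker
docker daemon --storage-driver=overlay   # Docker 1.8-era daemon syntax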

One major advantage of using SaltStack to manage our Docker containers is the Docker event stream. Events generated whenever a Docker container is started, stopped or otherwise changes state can be trivially exposed on the SaltStack event bus, where SaltStack Reactors can listen and respond to them by triggering SaltStack actions or sending messages to other systems.

All Docker containers across our pools are managed via an asynchronous event API which we wrote in-house. We wrote this API ourselves because it allows us to do run-time profiling of entire tool chain flows in diverse fields such as genomics, semiconductors, and machine learning, just to name a few. Using our proprietary Boost technology we then automatically optimize these tool pipelines for maximum performance across heterogeneous compute clusters. The high-level diagram below illustrates one possible use case where thin container clients leverage heterogeneous containers to obtain compute acceleration:

[Image: thin container clients leveraging heterogeneous containers for compute acceleration]

One quick note regarding accelerators and device pass-through: as of SaltStack version 2015.8.5, Docker device pass-through is not natively supported. This can be solved with a small patch to the SaltStack Docker driver, which we wrote internally and which can be found here:

https://github.com/bitfusionio/saltpatches

For day-to-day use, all of the above is triggered from our Jenkins build servers. However, for those interested in taking a lightweight version of our profiling architecture for a spin, we put a web front-end on this flow, which can be found at http://profiler.bitfusionlabs.com. There you can build any type of Docker-based application benchmark directly from a browser-exposed terminal and get quick performance results from various instance types across multiple cloud providers. Feel free to share the benchmarks with your friends and colleagues, or make them public so that others can benefit from them as well.

This article is part of our Docker and the Future of Configuration Management blog roundup running this November.  If you have an opinion or experience on the topic, you can contribute as well.


Immutable Delivery

This article proposes a design pattern modeled after “Immutable Infrastructure”, something I call “Immutable Delivery”.  There has been a lot of debate and discussion about the usage of the term “Immutable” lately. Let me clearly say that there is no such thing as an immutable server or immutable infrastructure. I know of no example in my 35 years of working with IT infrastructure of a system or infrastructure that is  completely immutable. A system changes and continues to change the minute it is powered on. Things like logs, dynamic tables  and memory are constantly changing during a system’s lifecycle.

However, I think it’s okay to use the phrases “Immutable Infrastructure” or “Immutable Delivery” in the context of a system or infrastructure delivery pattern. In fact, I propose we think of it as a metaphor for a kind of full stack stateless application delivery pattern.  I’ve had mathematicians actually yell at me after a presentation on Infrastructure as Code over my use of the term idempotency.  When confronted in this sort of battle, I would always retreat by saying “It’s a term used to describe a certain type of operation we do in configuration management”. Henceforth, I suggest the same idea for the use of the phrases “Immutable Infrastructure” and “Immutable Delivery”.

First Things First

Let’s describe what an “Immutable Infrastructure” model might look like. Basically, it is a model where the complete infrastructure is delivered intact, for example as a set of immutable virtual machines or a set of immutable Linux containers. The idea is, by design, to never touch or change the running system. In most cases, the running system is the production system, but in some recent examples with containers this model is also used in integration and testing structures. I like to say no CRUD for applications, middleware configuration files and operating systems. The model is that when something needs to be changed in the infrastructure, it is done as a new deploy from the most recent versioned artifact (i.e., the complete system). If there needs to be a rollback, the same process still applies: it is a redeploy, except in this case the artifact is the older version. One caveat in this model is relational databases. It is very difficult, maybe impossible, to have immutable relational databases. However, some companies I have talked to do what I call “No RUD” for databases: they create new records but do not replace, update or delete existing ones. Mileage always varies for all of this.

Order Matters

Back in 2010, I was given an interesting paper written by Steve Traugott called “Why Order Matters: Turing Equivalence in Automated Systems Administration” (2002).  Traugott’s article described systems that were either divergent, convergent or congruent.  At the time, I was working at Chef, and desired-state configuration and building systems through convergent models was what I was evangelizing. I felt then that the “Order Matters” paper described the differentiation between how Chef and Puppet worked.  The short version is that Chef used a prescriptive Ruby-based DSL that was executed on the local host in an order-specific manner based on how you wrote the DSL code, whereas Puppet used a clever dependency graph on the Puppet server to determine some of the ordering at deployment time. In some cases, this made a difference for certain types of organizations (Note 1).  Traugott’s paper does an excellent job laying out a thesis on why this “could” be so. What fascinated me was Traugott’s explanation of congruent systems.  At that time, there didn’t seem to be a commodified way to deliver this form of infrastructure, at least from my perspective.

Building With Legos

A year later, Netflix wrote an interesting blog post called “Building With Legos”.  At the time, there was a lot of outrage (myself included) regarding this post. At first glance, it looked like Netflix was advocating a model of “Golden Image” delivery infrastructure. Part of my initial reservation was that I had seen this pattern twice in my career with dire consequences. The first time was back in the late ’90s, when companies would use a product called “Ghost” to build Windows servers. This was another one of those ideas that sounded good at the time, until you had, in some cases, thousands of poorly cataloged images, and wrong image deploys caused major outages. Fast forward to around 2008, and organizations were starting to make the same mistakes all over again with “cloud” images, specifically in the form of Amazon AMIs. I believe that sometime around 2009, “Image Sprawl” became a popular phrase for doing “cloud” wrong.  In fact, I remember a horrifying story in the early days of cloud where the Public Broadcasting Service (PBS) accidentally made a proprietary AMI public, and it took them a couple of days to clean up all the viral versions of their private keys.  So at first glance of the Netflix blog post, you could see why many thought Netflix was suggesting a mode of bad configuration management.  However, on a closer read, they were much smarter than this.  What they were actually saying was that they were treating AMIs like JAR or WAR files, in that the AMI images were holistic artifacts built through a continuous integration/continuous delivery (CI/CD) process.  The AMIs would be pulled at deploy time and launched into production, similar to the way a JAR or WAR would be pulled from an artifact repository.  Only in this example, the artifact included all the infrastructure (OS, middleware, application and most of the application configuration files).  I like to use the phrase “Java Lied” in many of my presentations.  They told us “Write once, run anywhere”.  What they forgot to say is that this is true unless you have an incompatible runtime environment.  Netflix at the time of the blog post didn’t refer to this process as “Immutable Infrastructure” and of course it was not completely immutable.  They had to use a set of open source tools to discover and converge the deployed services, so their systems were not immutable.  However, their service delivery was indeed a model of an immutable delivery pattern. Some time later, Netflix did start to refer to their architecture as “Immutable Infrastructure”.

Trash Your Servers and Burn Your Code: Immutable Infrastructure

June 2013 was the first time I heard the phrase “Immutable Infrastructure”.  Chad Fowler wrote a blog post, “Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components”. In it, Fowler proposes an idea born from functional programming techniques that offer immutable data structures. The belief was that if somehow we could deliver the complete infrastructure, for example a server, with all the infrastructure needed for the application, then, in his words, “it would be easier to reason about and harder to screw up”.  Imagine delivering a server the same way we deliver an application, for example, like a WAR file. Fowler’s main point was that systems grow warts, especially when you are firefighting.  For example, a system might be well defined through a configuration management tool, but during an outage, changes may be made on the system directly and then never put back into the configuration management recipe or manifest later.  The list of possible human or system entropy examples goes on.  Fowler also points out that sometimes application code is deployed outside of the “normal straight-from-source-control process.” Operating system patching or source repositories sometimes change in flight between testing and production. There are many other examples of possible mismatches in the flow you can find in the wild. All of this could be put in a bucket called “bad configuration management hygiene”; however, just as I have never seen a perfectly immutable system, I have also never seen in my 35 years a “perfect system”.  I mean “system” the way Dr. Deming would describe it, in that all systems include humans.

Docker and the Three Ways of Devops

When I first arrived at Docker back in February 2015, I reviewed a Gartner paper called “Become More Agile and Get Ready for DevOps by Using Docker in Your Continuous Integration Environments”, and it set me down a course of thinking. The author, Sean Kenefick, had a strong background in release engineering and wrote an excellent paper on how Gartner would suggest using Docker.  As I read the paper, the first thing it reminded me of was Steve Traugott’s paper about order, why it matters, and the value of congruent systems.  I decided to write a blog post called “Docker and the Three Ways of Devops”.  During my research, I talked to a number of Docker users who were doing what Fowler described as immutable deployments, using Docker images as the immutable artifacts. This process was similar to what Netflix was doing, with two major differences: one, the services were container images, not virtual machines; and two, they were being delivered immutably from the developer’s perspective. After the container images were compiled and tested, the “service” would be pushed to the CI process for service-level integration testing. Most of the organizations using this model had already crossed over to a microservices architecture. The flow would go something like this:

  • The developer would test their service as a container, typically on a virtual machine running on their laptop.
  • They would also load the other primary services in their service-oriented architecture, possibly owned by other teams, into that same virtual machine on their laptop.
  • They would continue to compile, load and test their service, sometimes on their laptop and other times through a sort of first-pass CI server.
  • When testing was complete, they would typically check in their service as a container (binary) image with a meta file to describe the rest of the pipeline flow (CI/CD), as sketched below.
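
As a very rough sketch of that flow (the image name, tag, and commands are placeholders of mine, not from any of the teams described):

# hypothetical developer-driven flow: build, test, and publish an immutable service image
docker build -t myorg/payments-svc:1.4.2 .                 # OS, middleware, and app baked into one artifact
docker run --rm myorg/payments-svc:1.4.2 ./run-tests.sh    # test the same bits that will ship
docker push myorg/payments-svc:1.4.2                       # push the immutable artifact to a registry
git commit -am "promote payments-svc 1.4.2"                # meta file hands the rest off to CI/CD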

All of the people I talked to agreed that the benefit of this process was that the developer was in control not only of the application code they wrote but also of any middleware and basic operating system behavior (notes 2 & 3). The benefits of an immutable delivery process, like the one Netflix describes with their AMI flow, are increased speed and decreased resources and possible variation, and containers amplify them. Containers instantiate in around 500 milliseconds, whereas virtual machines take well over a minute. In a microservices architecture, many of the containers are around 100 megabytes, whereas virtual machines could still be as large as 2 gigabytes. I like to say that containers are the killer app for microservices. With this model, the developer can test all of the other dependent services from their laptop.  Werner Vogels, the CTO of Amazon, is often quoted as saying, “You build it, you run it”. In DevOps we like to say, “developers should wear pagers”. There is a reason why developers like Docker so much. When they build it, own it and get paged in the middle of the night, they know that for the most part the bits they tested are the same (i.e., congruent) bits running in production.

In 2014 at DockerCon US, Michael Bryzek of Gilt gave a fantastic presentation, “Immutable Infrastructure with Docker and EC2”.  In it, he describes a process where developers check in a set of binary container images with a single meta file.  I have personally transcribed what he says starting at 28:03 in his presentation:

“This is how we run our infrastructure. One of the things that developers have to do is provide the commands to start the Docker container, and that’s it. This is kind of amazing right?  Any EC2 instance that we spin up now, we don’t care if you’re running Node, Ruby, Scala, Java or if you made up your own programming language. It’s absolutely amazing how nice this is.  When we compare this to the way we did this in the past, we had one repository that had all of the different scripts to know how to build all of the different applications at Gilt. We have 1000 Git repos and over 300 different applications. We are 7 years old which means we have like 7 different ways of producing applications. There’s 25 different ways that we build software at Gilt and it’s all captured in a central repo.  That was at conflict with where we are going in terms of teams really owning their software and being able to deploy their own services.”

I have talked to a number of companies over the past year, and many of them are moving to an “Immutable Delivery” process driven by microservices implemented as containers. Capital One, at the DevOps Enterprise Summit in October 2015 (DOES15), gave a presentation called “Banking on Innovation & DevOps”. They said in that presentation that they are using Docker in production, and that they are delivering software in an immutable delivery pattern. This model is not just for web scale anymore.

In the end, “Immutable Infrastructure”, or what I have coined “Immutable Delivery”, is just a model with many variants. No large organization uses a single model to manage its infrastructure. Over the next few years, I look forward to working with all sorts of products, old and new, to find the correct balance of service delivery. My only goal is to be an evangelist of a model that Josh Corman, CTO at Sonatype, and I describe as “Immutable Awesomeness”, a presentation we did at DOES15.  We borrowed many of our ideas from the book “Toyota Supply Chain Management: A Strategic Approach to Toyota’s Renowned System”.  In this book, they describe the 4 V’s: increase Variety, Velocity, and Visibility, and decrease Variability.  In short, whatever works, works…

John Willis
Director of Ecosystem Development, Docker Inc.
@botchagalupe

This article is part of our Docker and the Future of Configuration Management blog roundup running this November.  If you have an opinion or experience on the topic, you can contribute as well.

Notes:

  1. To be clear, Puppet today allows for both models and this particular differentiation, in my opinion, no longer exists. In fact, both products today have relative parity with regards to ordering.
  2. For the nit-pickers, mileage varies on operating system immutability. Containers run on a host operating system and share the kernel.  Bad hygiene on the host will definitely cause “immutable” woes.  
  3. This is, by the way, a great area for co-existence between Infrastructure as Code products like Chef and Puppet and containerization products like Docker.

References:

Why Order Matters: Turing Equivalence in Automated Systems Administration

http://www.infrastructures.org/papers/turing/turing.html

Building with Legos

http://techblog.netflix.com/2011/08/building-with-legos.html

VM Image Sprawl in Real Life

http://www.cloudscaling.com/blog/cloud-computing/vm-image-sprawl-in-real-life/

Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components

http://chadfowler.com/blog/2013/06/23/immutable-deployments/

Become More Agile and Get Ready for DevOps by Using Docker in Your Continuous Integration Environments

https://www.gartner.com/doc/3016317/agile-ready-devops-using-docker

Docker and the Three Ways of Devops

https://blog.docker.com/2015/05/docker-three-ways-devops/

A conversation with Werner Vogels

http://queue.acm.org/detail.cfm?id=1142065

Immutable Infrastructure with Docker and EC2

http://tech.gilt.com/2014/07/02/immutable-infrastructure-with-docker-and-ec2/

Banking on Innovation & DevOps

http://devopsenterprise.io/sessions/shortening-the-feedback-loop-devops-dashboard/

Toyota Supply Chain Management: A Strategic Approach to Toyota’s Renowned System

http://www.amazon.com/Toyota-Supply-Chain-Management-Strategic/dp/0071615490

Immutable Awesomeness

https://www.youtube.com/watch?v=-S8-lrm3iV4

 


Container Automation: Building a Brickyard

My name is Nathaniel Eliot, and I’ve worked extensively in software deployment over the last several years. I have worked on two automation frameworks around Chef: Ironfan, an open-source cluster management system from Infochimps, and Elzar, a (sadly closed-source) blue-green framework based on Spiceweasel. I currently work at Bazaarvoice, where I’m building out a Flynn.io installation.

Barnyard Blues

There is a catch-phrase in DevOps: “cattle, not pets”. It’s intended to describe the step forward that configuration management (CM, e.g. Chef, Puppet, Ansible, Salt, etc.) tools provide. Instead of building and maintaining systems by hand, DevOps-savvy engineers aim to build them via automated, repeatable systems. This has revolutionized system deployment, and resulted in vast improvements in the stability and manageability of large, complicated systems.

But while cattle are an important part of civilizing your software, they have drawbacks. As anybody who’s worked a farm will tell you, cattle management is hard work. Long-lived systems (which most barnyard-style deployments still are) decay with age, as their surrounding environment changes, and as faults are triggered in software components; upgrades to fix these issues can be complicated and fragile. New hosts for these systems can also suffer from unintended evolution, as external resources referenced by the CM build process change. System builds are often lengthy affairs, and often heavily intertwined, such that singular failures can block updates on unrelated resources.

These issues mean that failures are often addressed by “reaching into the cow”: SSH logins to affected hosts. As the phrasing implies, this should be considered a little gross. Your team’s collective understanding of a system is based on it being built in predictable ways from visible source code; an SSH login undermines that understanding.

Building a Brickyard

The phrase I like for container automation (CA, e.g. Flynn, Mesos+Docker, etc.) is “brickyard, not barnyard”. Bricks are more uniform, quicker to make, and easier to transport than cows: CA provides greater immutability of product, faster cycle time, and easier migration than CM.

Because everything is baked to the base image, the danger of environmental changes altering or breaking your existing architecture is far lower. Instead, those changes break things during the build step, which is decoupled from the deployment itself. If you expand this immutability by providing architecturally identical container hosts, your code is also less vulnerable to “works in dev” issues, where special development configuration is lacking on production machines.

Rapid cycle time is the next great advantage that CA provides, and arguably the largest from a business perspective. By simplifying and automating build and deployment processes, CA encourages developers to commit and test regularly. This improves both development velocity and MTTR (mean time to repair), by providing safe and simple ways to test, deploy, and roll back changes. Ultimately, a brick is less work to produce than a fully functioning cow.

Because CA produces immutable results, those results can easily be transported. The underlying CA tools must be installed in the new environment, but the resulting platform looks the same to the images started on it. This gives you a flexibility in migration and deployment that may be harder to achieve in the CM world.

These benefits are theoretically achievable with configuration management; Ironfan is a good example of many of these principles at work in the CM world. However, they aren’t first class goals of the underlying tools, and so systems that achieve them do so by amalgamating a larger collection of more generic tools. Each of those tools makes choices based on the more generic set of situations it’s in, and the net result is a lot of integration pain and fragility.

Bricks or Burgers

So when should you use CM, and when should you use CA? You can’t eat bricks, and you can’t make skyscrapers from beef; obviously there are trade-offs.

Configuration management works best at smoothing the gaps between the manually deployed world that most of our software was designed in, and the fully automated world we’re inching toward. It can automate pretty much any installation you can do from a command line, handling the wide array of configuration options and install requirements that various legacy software packages expect.

Container automation currently works best for microservices: 12-factor applications that you own the code for. In existing architectures, those often live either in overly spacious (and fallible) single servers, or in messy shared systems that become managerial black holes. This makes them an easy first target, providing greater stability, management, and isolation than their existing setups.

However, that’s as things stand currently. Civilization may depend on both, and the cattle came first, but ultimately it’s easier to build with bricks. As frameworks like Flynn expand their features (adding volume management, deploy pipelines, etc), and as their users build experience with more ambitious uses, I believe CM is slowly going to be trumped (or absorbed) by the better CA frameworks out there.

This article is part of our Docker and the Future of Configuration Management blog roundup running this November.  If you have an opinion or experience on the topic, you can contribute as well.
