Tag Archives: chef

Puppet and Chef do only half the job

Our first guest post on theagileadmin is by Schlomo Schapiro, Systems Architect and Open Source Evangelist at ImmobilienScout24. I met Schlomo and his colleagues at DevOpsDays and they piqued my interest with their YADT deployment tool they’ve open sourced.  Welcome, Schlomo!

“How do you update your system with OS patches” was one of my favourite questions last week at the Velocity Web Performance and Operations Conference in Santa Clara and at the devopsdays in Mountain View. I tried to ask as many people as would be ready to talk to me about their deployment solutions, most of whom where using one of puppet or chef.

The answers I got ranged from “we build a new image” through “somebody else is doing that” to “I don’t know”. So apparently many people see the OS stack different from their software stack. Personally, I find this very worrying because I strongly believe that one must see the entire stack (OS, software & configuration) as one and also validate it as one.

We see again and again that there is a strong influence from the OS level into the application level. Probably everybody already had to suffer from NFS and autofs and had seen their share of blocked servers. Or maybe how a changed behaviour in threading suddenly makes a server behave completely different under load. While some things like the recent leap second issue are really hard to test, most OS/application interactions can be tested quite well.

For such tests to be really trustworthy the version must be the same between test and production. Unfortunately even a very small difference in versions can be devastating. A specific example we recently suffered from is autofs in RHEL5 and RHEL6 which would die on restart up to the most recent patch. It took us awhile to find out that the autofs in testing was indeed just this little bit newer than in production to actually matter.

If you are using images and not adding any OS patches between image updates, then you are probably on the safe side. If you add patches on top of the image, then you also run a chance that your versions will deviate.

So back to the initial question: If you are using chef, puppet or any other similar tool: How do you manage OS patches? How do you make sure that OS patches and upgrades are tested exactly the same as you test changes in your application and configuration? How do you make sure that you stage them the same? Or use the same rollout process?

For us at ImmobilienScout24 the answer is simple: We treat all changes to our servers exactly the same way without discrimination. The basis for that is that we package all software and configuration into RPM packages and roll them out via YUM channels. In that context it is of course easy to do the same with OS patches and upgrades, they just come from different YUM channels. But the actual change process is exactly the same: Put systems with outstanding RPM updates into maintenance mode, do yum upgrade, start services, run some tests, put systems back into production.

I am not saying that everybody should work that way. For us it works well and we think that it is a very simple way how to deal with the configuration management and deployment issue. What I would ask everybody is to ask yourself how you plan to treat all changes in a server the same way so that the same tests and validation can be applied. I believe that using the same tool to manage all changes makes it much simpler to treat all changes equal than using different tools for different layers of the stack. And if only because a single tool makes it much easier to correlate such changes. Maybe it should be explored more how to use puppet and chef to do the entire job and manage the “lower” layers of the system stack as well as the upper layers.

Are you “doing DevOps”? Then maybe you can look at it like this: If you manage all the stuff on a server the same way it will help you to get everybody onto the same page with regard to OS patches. No more “surprise updates” that catch the developers cold because they are part of all the updates.

Hopefully at the next Velocity somebody will give a talk about how to simplify operations by treating all changes equal.


Filed under Conferences, DevOps

DevOps: It’s Not Chef And Puppet

There’s a discussion on the devops Google group about how people are increasingly defining DevOps as “chef and/or puppet users” and that all DevOps is, is using one of these tools.  This is both incorrect and ignorant.

Chef and puppet are individual tools that you can use to implement specific parts of an overall DevOps strategy if you want to use them – but that’s it.  They are fine tools, but do not “solve DevOps” for you, nor are they even the only correct thing to do within their provisioning niche. (One recent poster got raked over the coals for insisting that he wanted to do things a different way than use these tools…)

This confusion isn’t unexpected, people often don’t want to think too deeply about  a new idea and just want the silver bullet. Let’s take a relevant analogy.

Agile.  I see a lot of people implement Scrum blindly with no real understanding of anything in the Agile Manifesto, and then religiously defend every one of Scrum’s default implementation details regardless of its suitability to their environment.  Although that’s better than those that even half ass it more and just say “we’ve gone to our devs coding in sprints now, we must be agile, woot!” Of course they didn’t set up any of the guard rails, and use it as an exclude to eliminate architecture, design, and project planning, and are confused when colossal failure results.

DevOps.  “You must use chef or puppet, that is DevOps, now let’s spend the rest of our time fighting over which is better!” That’s really equivalent to the lowest level of sophistication there in Agile.  It’s human nature, there are people that can’t/don’t want to engage in higher order thought about problems, they want to grab and go.  I kinda wish we could come up with a little bit more of a playbook, like Scrum is for Agile, where at least someone who doesn’t like to think will have a little more guidance about what a bundle of best practices *might* look like, at least it gives hints about what to do outside the world of yum repos.  Maybe Gene Kim/John Willis/etc’s new DevOps Cookbook coming out soon(?) will help with that.

My own personal stab at “What is DevOps” tries to divide up principles, methods, and practices and uses agile as the analogy to show how you have to treat it.  Understand the principles, establish a method, choose practices.  If you start by grabbing a single practice, it might make something better – but it also might make something worse.

Back at NI, the UNIX group established cfengine management of our Web boxes. But they didn’t want us, the Web Ops group, to use it, on the grounds that it would be work and a hassle. But then if we installed software that needed, say, an init script (or really anything outside of /opt) they would freak out because their lovely configurations were out of sync and yell at us. Our response was of course “these servers are here to, you know, run software, not just happily hum along in silence.” Automation tools can make things much worse, not just better.

At this week’s Agile Austin DevOps SIG, we had ~30 folks doing openspaces, and I saw some of this.  “I have this problem.”  “You can do that in chef or puppet!” “Really?”  “Well… I mean, you could implement it yourself… Kinda using data bags or something…” “So when you say I can implement it in chef, you’re saying that in the same sense as ‘I could implement it in Java?'”  “Uh… yeah.” “Thanks kid, you’ve been a big help.”

If someone has a systems problem, and you say that the answer to that problem is “chef” or “puppet,” you understand neither the problem nor the tools. It’s “when you have a hammer, everything looks like a nail – and beyond that, every part of a construction job should be solved by nailing shit together.”

We also do need to circle back up and do a better job of defining DevOps.  We backed off that early on, and as a result we have people as notable as Adrian Cockroft saying “What’s DevOps?  I see a bunch of conflicting blog posts, whatever, I’ll coin my own *Ops term.” That’s on us for not getting our act together. I have yet to see a good concise DevOps definition that is unique if you remove the word DevOps and insert something else (“DevOps helps you bring value to software!  DevOps is about delivering a service to a customer!” s/DevOps/100 other things/).

At DevOpsDays, some folks contended that some folks “get” DevOps and others don’t and we should leave them to their shame and just do our work. But I feel like we have some responsibility to the industry in general, so it’s not just “the elite people doing the elite things.” But everyone agreed the tool focus is getting too much – John Willis even proposed a moratorium on chef/puppet/tooling talks at DevOpsDays Mountain View because people are deviating from the real point of DevOps in favor of the knickknacks too much.


Filed under DevOps

GeekAustin DevOps #1 – Puppet, Chef, bcfg2, but no Dev

All three Agile Admins were at this Austin event last Saturday; GeekAustin had a set of back to back presentations on puppet, chef, and bcfg2 downtown at Elysium. A good crowd was there, maybe 50 people.

Matt Ray presented on chef, and has posted his slides. Jeff McCune spoke on Puppet and Sol Jerome spoke on bcfg2, I’ll link their slides here whenever I become aware of them being posted.

I had to leave before bcfg2, but the chef and puppet presentations were interesting.  One of the main audience comments was “these are pretty similar,” which is true. Both are real good.

Probably the one problem with the event was that it wasn’t really DevOps.  It was just tool presentations for sysadmins.  Our developers took one look and didn’t come; we met one or two developers at the event that weren’t getting a lot out of it.

It’s fine to have an event looking at these tools – but I want to caution the DevOps community that the value of DevOps is the different approach to life. I think the excellent dev2ops post on “DevOps is not a technology problem, DevOps is a business problem” explains this perfectly. PEOPLE over PROCESS over TOOLS. If you don’t have engagement from the developers, you don’t have DevOps, no matter what whizbang gadgets you have.  Frankly I’d like to see the tool vendors address this in their presentations. A little bit of “you’re a sysadmin, this saves you work, isn’t it cool” is fine but how about talking about how this can be used to collaborate with the developers?  How does that model work?

At DevOpsDays US I warned about adopting new tools without adopting new tactics, using the example of the deployment of the Minie ball to muskets used in the American Civil War.  Greater accuracy and range, but everyone still just stood their in lines and charged in formation, and the slaughter was profound.  You may have a shiny new weapon, but if you are just using it back in your dark sysadmin cave, the problems that beset you will never go away. You’ll still be the bottom of the food chain, only taken seriously when someone can’t get their email.

I know the guys at Puppet Labs, Opscode, etc. are all big into DevOps – but I guess I’d like to see the tool presentations and value props speak to developers as well somehow. What would that look like?  Ideas?


Filed under DevOps

Velocity 2010: Infrastructure Automation with Chef

After a lovely lunch of sammiches, we kick into the second half of Workshop Day at Velocity 2010.  Peco and I (and Jeff and Robert, also from NI) went to Infrastructure Automation with Chef, presented by Adam Jacob, Christopher Brown, and Joshua Timberman of Opscode.  My comments in italics.

Chef is a library for configuration management, and a system written on top of it.  It’s also a systems integration platform, as we will see later.  And it’s an API for your infrastructure.

In the beginning there was cfengine.  Then came puppet.  Then came chef.  It’s the latest in open source UNIXey config management automation.

  • Chef is idempotent, which means you can rerun it and get the same result, and it does minimal work to get there.
  • Chef is reasonable, and has sane defaults, which you can easily change.  You can change its mind about anything.
  • Chef is open source and you can hack it easily.  “There’s more than one way to do it” is its mantra.

A lot of the tools out there (meaning HP/IBM/CA kinds of things) are heavy and don’t understand how quickly the world changes, so they end up being artifacts of “how I should have built my system 10 years ago.”

It’s based on Ruby.  You really need a third gen language to do this effectively; if they created their own config structure it would grow into an even less standard third gen language.  If you’re a sysadmin, you do indeed program, and people that say you’re not are lying to you.  Apache config is a programming language. Chef uses small composable primitives.

You manage configuration as idempotent resources, which are put together in recipes, and tracked like source code with the end goal of configuring your servers.

Infrastructure as Code

The devops mantra.  Infrastructure is code and should be managed with the same rigor.  Source control, etc.  Chef enables this approach.  Can you reconstruct your business from source code, data backup, and bare metal?  Well, you can get there.

When you talk about constraints that affect design, one of the largest and almost unstated assumptions nowadays is that it’s really hard to recover from failure.   Many aspects of technology and the thinking of technologists is built around that.  Infrastructure as code makes that not so true, and is extremely disruptive to existing thought in the field.

Your automation can only be measured by the final solution.  No one cares about your tools, they care about what you make with them.

Chef Basics

There is a chef client that runs on each server, using recipes to configure stuff.  There’s a chef server they can talk to – or not, and run standalone.  They call each system a “node.”

They get a bunch of data points, or attributes, off the nodes and you can search them on the server, like “what version of Perl are you running.”  “knife” is the command line tool you use to do that.

Nodes have a “run list.”  That’s what roles or recipes to apply to a node, in order.

Nodes have “roles.”  A role is a description of what a node should be, like “you’re a Web server.”  A role has a run list of its own, and attributes to modify them – like “base, apache2, modssl” and “maxchildren=50”.

Chef manages resources on nodes.  Resources are declarative descriptions of state.  Resources are of type package or service; basically software install and running software.  Install software at a given version; run a service that supports certain commands.  There’s also a template resource.

Resources take action through providers.  A provider is what knows how to actually do the thing (like install a package, it knows to use apt-get or yum or whatever).

Think about it as resources go through a platform to pick a provider.

Recipes apply resources in order.  Order of execution is determined by the order they’re listed, which is pretty intuitive.  Also, systems that fail within a recipe should generally fail in the same state.  Hooray, structured programming!

Recipes can include other recipes.  They’re just Ruby.  (Everything in Chef is Ruby or JSON). No support for asynchronous actions – you can figure out a way to do it (for file transfers, for example) but that’s really bad for system packages etc.

Cookbooks are packages for recipes.  Like “Apache.”  They have recipes, assets (like the software itself), and attributes.  Assets include files, templates (evaluated with a templating language called ERB), and attributes files (config or properties files).  They try to do some sane smart config defaults (like in nginx, workers = number of cores in the box).  Cookbooks also have definitions, libraries, resources, providers…

Cookbooks are sharablehttp://cookbooks.opscode.com/ FTW! They want the cookbook repo to be like CPAN – no enforced taxonomy.

Data bags store arbitrary data.  It’s kinda like S3 keyed with JSON objects .  “Who all plays D&D?  It’s like a Bag of Holding!”  They’re searchable.  You can e.g. put a mess of users in one.  Then you can execute stuff on them.  And say use it instead of Active Directory to send users out to all your systems.  “That’s bad ass!” yells a guy from the crowd.

Working with Chef

  1. Install it.
  2. Create a chef repo.  Like by git cloning their stock one.
  3. Configure knife with a .chef/knife.rb file.  There’s a Web UI too but it’s for feebs.
  4. Download some cookbooks.  “knife cookbook site vendor rails -d” gets the ruby cookbook and makes a “vendor branch” for it and merges it in.
  5. Read the recipes.  It runs as root, don’t be a fool with your life.
  6. Upload them to the server.
  7. Build a role (knife role create rails).
  8. Add cloud credentials to knife – it knows AWS, Rackspace, Terremark.
  9. Launch a new rails server (knife ec2 server create ‘role[rails]’) – can also bootstrap
  10. Run it!
  11. Verify it!  knife ssh does parallel ssh and does command, or even screen/tmux/macterm
  12. Change it by altering your recipe and running again.

Live Demo

This was a little confusing.  He started out with a data bag, and it has a bunch of stuff configured in it, but a lot of the stuff in it I thought would be in a recipe or something.  I thought I was staying with the presentation well, but apparently not.

The demo goal is good – configure nagios and put in all the hosts without doing manual config.

Well, this workshop was excellent up to here – though I could have used them taking a little more time in “Working with Chef” – but now he’s just flipping from chef file to chef file and they’re all full of stuff that I can’t identify immediately because I’m, you know, not super familiar with Chef.  THey really could have used a more “hello world”y demo or at least stepped through all the pieces and explained them (ideally in the same order as the “working with chef” spiel).

Chef 0.8 introduced the “chef shell,” shef.    You can run recipes line by line in it.

And then there was a fire alarm!  We all evacuate.  End of session.

Afterwards, in the gaggle, Adam mentioned some interesting bits, like there is Windows support in the new version.  And it does cloud stuff automatically by using the “fog” library.  And unicorn, a server for people that know about 200% more about Rails than me.  That’s the biggest thing about chef – if you don’t do any other Ruby work it’s a pretty  high adoption bar.

One more workshop left for Day 1!


Filed under Conferences, DevOps

OpsCamp Debrief

I went to OpsCamp this last weekend here in Austin, a get-togther for Web operations folks specifically focusing on the cloud, and it was a great time!  Here’s my after action report.

The event invite said it was in the Spider House, a cool local coffee bar/normal bar.  I hadn’t been there before, but other people that had said “That’s insane!  They’ll never fit that many people!  There’s outside seating but it’s freezing out!”  That gave me some degree of trepidation, but I still racked out in time to get downtown by 8 AM on a Saturday (sigh!).  Happily, it turned out that the event was really in the adjacent music/whatnot venue also owned by Spider House, the United States Art Authority, which they kindly allowed us to use for free!  There were a lot of people there; we weren’t overfilling the place but it was definitely at capacity, there were near 100 people there.

I had just hears of OpsCamp through word of mouth, and figured it was just going to be a gathering of local Austin Web ops types.  Which would be entertaining enough, certainly.  But as I looked around the room I started recognizing a lot of guys from Velocity and other major shows; CEOs and other high ranked guys from various Web ops related tool companies.  Sponsors included John Willis and Adam Jacob (creator of Chef) from Opscode , Luke Kanies from Reductive Labs (creator of Puppet), Damon Edwards and Alex Honor from DTO Solutions (formerly ControlTier), Mark Hinkle and Matt Ray from Zenoss, Dave Nielsen (CloudCamp), Michael Coté (Redmonk), Bitnami, Spiceworks, and Rackspace Cloud.  Other than that, there were a lot of random Austinites and some guys from big local outfits (Dell, IBM).

You can read all the tweets about the event if you swing that way.

OpsCamp kinda grew out of an earlier thing, BarCampESM, also in Austin two years ago.  I never heard about that, wish I had.

How It Went

I had never been to an “unconference” before.  Basically there’s no set agenda, it’s self-emergent.  It worked pretty well.  I’ll describe the process a bit for other noobs.

First, there was a round of lightning talks.  Brett from Rackspace noted that “size matters,” Bill from Zenoss said “monitoring is important,” and Luke from Reductive claimed that “in 2-4 years ‘cloud’ won’t be a big deal, it’ll just be how people are doing things – unless you’re a jackass.”

Then it was time for sessions.  People got up and wrote a proposed session name on a piece of paper and then went in front of the group and pitched it, a hand-count of “how many people find this interesting” was taken.

Candidates included:

  • service level to resolution
  • physical access to your cloud assets
  • autodiscovery of systems
  • decompose monitoring into tool chain
  • tool chain for automatic provisioning
  • monitoring from the cloud
  • monitoring in the cloud – widely dispersed components
  • agent based monitoring evolution
  • devops is the debil – change to the role of sysadmins
  • And more

We decided that so many of these touched on two major topics that we should do group discussions on them before going to sessions.  They were:

  • monitoring in the cloud
  • config mgmt in the cloud

This seemed like a good idea; these are indeed the two major areas of concern when trying to move to the cloud.

Sadly, the whole-group discussions, especially the monitoring one, were unfruitful.  For a long ass time people threw out brilliant quips about “Why would you bother monitoring a server anyway” and other such high-theory wonkery.  I got zero value out of these, which was sad because the topics were crucially interesting – just too unfocused; you had people coming at the problem 100 different ways in sound bytes.  The only note I bothered to write down was that “monitoring porn” (too many metrics) makes it hard to do correlation.  We had that problem here, and invested in a (horrors) non open-source tool, Opnet Panorama, that has an advanced analytics and correlation engine that can make some sense of tens of thousands of metrics for exactly that reason.


There were three sessions.  I didn’t take many notes in the first one because, being a Web ops guy, I was having to work a release simultaneously with attending OpsCamp 😛

Continue reading


Filed under DevOps, Uncategorized