Our first guest post on theagileadmin is by Schlomo Schapiro, Systems Architect and Open Source Evangelist at ImmobilienScout24. I met Schlomo and his colleagues at DevOpsDays, and they piqued my interest with YADT, the deployment tool they’ve open sourced. Welcome, Schlomo!
“How do you update your system with OS patches?” was one of my favourite questions last week at the Velocity Web Performance and Operations Conference in Santa Clara and at the devopsdays in Mountain View. I asked as many people as were willing to talk to me about their deployment solutions, most of whom were using either Puppet or Chef.
The answers I got ranged from “we build a new image” through “somebody else is doing that” to “I don’t know”. So apparently many people see the OS stack as separate from their software stack. Personally, I find this very worrying, because I strongly believe that one must see the entire stack (OS, software & configuration) as one and also validate it as one.
We see again and again that the OS level strongly influences the application level. Probably everybody has already had to suffer through NFS and autofs and has seen their share of blocked servers. Or seen how a change in threading behaviour suddenly makes a server behave completely differently under load. While some things like the recent leap second issue are really hard to test, most OS/application interactions can be tested quite well.
For such tests to be really trustworthy, the versions must be the same between test and production. Unfortunately, even a very small difference in versions can be devastating. A specific example we recently suffered from is autofs in RHEL5 and RHEL6, which would die on restart up to the most recent patch. It took us a while to find out that the autofs in testing was just that little bit newer than the one in production, and that difference was enough to matter.
If you are using images and not adding any OS patches between image updates, then you are probably on the safe side. If you add patches on top of the image, then you run the risk that your versions will deviate.
So back to the initial question: if you are using Chef, Puppet or any other similar tool, how do you manage OS patches? How do you make sure that OS patches and upgrades are tested exactly the same way as you test changes in your application and configuration? How do you make sure that you stage them the same way? Or use the same rollout process?
For us at ImmobilienScout24 the answer is simple: we treat all changes to our servers exactly the same way, without discrimination. The basis for that is that we package all software and configuration into RPM packages and roll them out via YUM channels. In that context it is of course easy to do the same with OS patches and upgrades; they just come from different YUM channels. But the actual change process is exactly the same: put systems with outstanding RPM updates into maintenance mode, run yum upgrade, start services, run some tests, put systems back into production.
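To make that cycle concrete, here is a minimal sketch of one pass over a group of hosts. This is not YADT or ImmobilienScout24’s actual tooling; the host names, service names, maintenance-mode flag and health check below are all hypothetical:

```python
#!/usr/bin/env python
"""Minimal sketch of a uniform update cycle: every change -- OS patch,
application RPM or configuration RPM -- goes through the same steps."""
import subprocess

HOSTS = ["web01", "web02"]      # hypothetical hosts with outstanding RPM updates
SERVICES = ["httpd", "myapp"]   # hypothetical services running on those hosts

def ssh(host, *cmd):
    """Run a command on a remote host, raising an error if it fails."""
    subprocess.check_call(["ssh", host] + list(cmd))

for host in HOSTS:
    ssh(host, "touch", "/etc/maintenance")         # hypothetical maintenance-mode flag
    for svc in SERVICES:
        ssh(host, "service", svc, "stop")          # stop services before upgrading
    ssh(host, "yum", "-y", "upgrade")              # one upgrade for OS *and* app RPMs
    for svc in SERVICES:
        ssh(host, "service", svc, "start")         # start services again
    ssh(host, "curl", "-fs", "http://localhost/health")  # hypothetical smoke test
    ssh(host, "rm", "-f", "/etc/maintenance")      # back into production
```

The point is less the tooling than the fact that the same few steps apply no matter which YUM channel a change comes from.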
I am not saying that everybody should work that way. For us it works well, and we think that it is a very simple way to deal with the configuration management and deployment issue. What I would ask of everybody is to consider how you plan to treat all changes to a server the same way, so that the same tests and validation can be applied. I believe that using the same tool to manage all changes makes it much simpler to treat all changes equally than using different tools for different layers of the stack, if only because a single tool makes it much easier to correlate changes. Maybe we should explore further how to use Puppet and Chef to do the entire job and manage the “lower” layers of the system stack as well as the upper ones.
Are you “doing DevOps”? Then maybe you can look at it like this: if you manage all the stuff on a server the same way, it will help you get everybody onto the same page with regard to OS patches. No more “surprise updates” that catch the developers cold, because OS patches are just part of the same stream of updates as everything else.
Hopefully at the next Velocity somebody will give a talk about how to simplify operations by treating all changes equally.
I’m *strongly* in favor of this approach. We’re not there yet at Infochimps, but we’re laying the groundwork as quickly as we can.
For RHEL this might not be the best idea, but I didn’t see a mention of Spacewalk in this article. You can use that for hosting patches, seeing what’s in sync, and keeping everything in-house.
Is there a common pattern in Chef for the maintenance mode + update cycle on servers? I am writing a utility cookbook internally that will set the update repository on servers so we can test new packages on a control server and then update the rest of the servers later.
But it’s the act of updating the other servers that I am having trouble with. Some are running application code, some are running Apache or other processes. How do you shut down all processes, perform a “yum update” and then restart everything across lots of different cookbooks (some being 3rd-party ones)?
Or do you manage that through just “sudo init 2” / “yum update” / “sudo init 3”?
This has little to do with automation tools, and everything to do with managing package repos. If you’re going to blindly install updates to some packages while testing others, especially if you aren’t consistent, you will eventually have problems.
Anyone who cares about their system should already have a complete list of packages installed on a system, including the “OS” packages. Then it’s just package version management, which any tool that knows about package versions can handle. I prefer CFEngine, but you can skin the same “ensure that this version of this package is installed” cat in a number of ways.
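As a rough illustration of that idea (a generic sketch, not CFEngine syntax; the manifest and versions below are invented), pinning every package on the box, OS packages included, to an explicit version could look like this:

```python
import subprocess

# Hypothetical manifest: every package on the system, OS packages included,
# pinned to an explicit version-release.
MANIFEST = {
    "autofs": "5.0.5-109.el6",
    "httpd": "2.2.15-60.el6",
}

def installed_version(pkg):
    """Return the installed version-release of a package, or None if absent."""
    try:
        out = subprocess.check_output(
            ["rpm", "-q", "--qf", "%{VERSION}-%{RELEASE}", pkg])
        return out.decode().strip()
    except subprocess.CalledProcessError:
        return None

for pkg, version in MANIFEST.items():
    if installed_version(pkg) != version:
        # yum resolves "name-version-release" to exactly that build from the
        # configured channels, so test and production converge on one list.
        subprocess.check_call(["yum", "-y", "install", "%s-%s" % (pkg, version)])
```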