DevOps: It’s Not Chef And Puppet

There’s a discussion on the devops Google group about how people are increasingly defining DevOps as “chef and/or puppet users” – as if all DevOps is, is using one of these tools. This is both incorrect and ignorant.

Chef and puppet are individual tools that you can use to implement specific parts of an overall DevOps strategy, if you want to use them – but that’s it. They are fine tools, but they do not “solve DevOps” for you, nor are they even the only correct thing to do within their provisioning niche. (One recent poster got raked over the coals for insisting that he wanted to do things some way other than use these tools…)

This confusion isn’t unexpected; people often don’t want to think too deeply about a new idea and just want the silver bullet. Let’s take a relevant analogy.

Agile. I see a lot of people implement Scrum blindly, with no real understanding of anything in the Agile Manifesto, and then religiously defend every one of Scrum’s default implementation details regardless of their suitability to the environment. Although that’s better than those who half-ass it even more and just say “we’ve gone to our devs coding in sprints now, we must be agile, woot!” Of course they didn’t set up any of the guard rails, they use it as an excuse to eliminate architecture, design, and project planning, and then they’re confused when colossal failure results.

DevOps. “You must use chef or puppet, that is DevOps, now let’s spend the rest of our time fighting over which is better!” That’s really equivalent to the lowest level of sophistication in Agile. It’s human nature; there are people who can’t or don’t want to engage in higher-order thought about problems, they want to grab and go. I kinda wish we could come up with a little more of a playbook – like Scrum is for Agile – so that at least someone who doesn’t like to think has a little more guidance about what a bundle of best practices *might* look like; at least it would give hints about what to do outside the world of yum repos. Maybe Gene Kim/John Willis/etc.’s new DevOps Cookbook, coming out soon(?), will help with that.

My own personal stab at “What is DevOps” tries to divide up principles, methods, and practices and uses agile as the analogy to show how you have to treat it.  Understand the principles, establish a method, choose practices.  If you start by grabbing a single practice, it might make something better – but it also might make something worse.

Back at NI, the UNIX group established cfengine management of our Web boxes. But they didn’t want us, the Web Ops group, to use it, on the grounds that it would be work and a hassle. But then if we installed software that needed, say, an init script (or really anything outside of /opt), they would freak out because their lovely configurations were out of sync, and yell at us. Our response was of course “these servers are here to, you know, run software, not just happily hum along in silence.” Automation tools can make things much worse, not just better.

At this week’s Agile Austin DevOps SIG, we had ~30 folks doing openspaces, and I saw some of this.  “I have this problem.”  “You can do that in chef or puppet!” “Really?”  “Well… I mean, you could implement it yourself… Kinda using data bags or something…” “So when you say I can implement it in chef, you’re saying that in the same sense as ‘I could implement it in Java?'”  “Uh… yeah.” “Thanks kid, you’ve been a big help.”

If someone has a systems problem, and you say that the answer to that problem is “chef” or “puppet,” you understand neither the problem nor the tools. It’s “when you have a hammer, everything looks like a nail – and beyond that, every part of a construction job should be solved by nailing shit together.”

We also do need to circle back and do a better job of defining DevOps. We backed off that early on, and as a result we have people as notable as Adrian Cockcroft saying “What’s DevOps? I see a bunch of conflicting blog posts; whatever, I’ll coin my own *Ops term.” That’s on us for not getting our act together. I have yet to see a good, concise DevOps definition that stays unique if you remove the word DevOps and insert something else (“DevOps helps you bring value to software! DevOps is about delivering a service to a customer!” s/DevOps/100 other things/).

At DevOpsDays, some folks contended that some people “get” DevOps and others don’t, and that we should leave the latter to their shame and just do our work. But I feel like we have some responsibility to the industry in general, so it’s not just “the elite people doing the elite things.” Everyone agreed, though, that the tool focus is getting to be too much – John Willis even proposed a moratorium on chef/puppet/tooling talks at DevOpsDays Mountain View, because people are deviating from the real point of DevOps in favor of the knickknacks.


Why A HTTP Sniffer Is Awesome

While looking at Petit for my post on log management tools, I was thrilled to see it link to a sniffer that generates Web-type logs, called Justniffer. Isn’t that a pretty fringe thing, you might ask? Well, settle in while I tell you why it’s bad ass.

We used to run a Web analytics product here called NetGenesis. Like all very old Web analytics products, it relied on you to gather together all your log files for it to parse, resulting in error-prone nightly cron job nonsense. So they came out with a network sniffer that logged in Apache format, like Justniffer apparently does. It worked great and got the info in real time (as long as the network admins didn’t mess up our network taps, which did happen from time to time).

I quickly realized this sniffer was way better than log aggregation, especially because my environment had all kinds of weird crap like Domino Web servers and IIS 5 that don’t log in a civilized manner. And since it sat between the Web servers and the clients, it could log “client time” and “server time,” and had a special “900” error code for client aborts/timeouts. I self-implemented what would be a predecessor to today’s RUM tools like Tealeaf and Coradiant on it. We used it to do real-time traffic analysis and cross-site reporting, and even used it for load testing, as we’d transform and replay the captured logs against test servers. Using it also helped us understand the value of the Steve Souders front-end performance stuff when he came around.
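To give a concrete flavor of the kind of analysis that log enabled, here’s a minimal Python sketch. The trailing server-time/client-time fields and their layout are my assumption for illustration – I can’t reproduce the real NetGenesis format from memory – but the “900” pseudo-status for client aborts is as described above.

```python
import re

# Assumed layout: an Apache-style line with two extra trailing
# fields for server time and client time in milliseconds.
# This layout is hypothetical -- adjust the regex to your log.
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
    r' (?P<server_ms>\d+) (?P<client_ms>\d+)\s*$'
)

def summarize(lines):
    """Count client aborts (the sniffer's 900 pseudo-status) and
    requests where the client experience was slow even though the
    server answered quickly."""
    aborts = slow_clients = 0
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        if m.group("status") == "900":
            aborts += 1
        elif int(m.group("client_ms")) - int(m.group("server_ms")) > 2000:
            slow_clients += 1  # 2s network/render gap; arbitrary threshold
    return aborts, slow_clients

with open("sniffer.log") as f:  # hypothetical file name
    print(summarize(f))
```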

Eventually our BI folks moved to a JavaScript page-tag-based system, which is the modern preference in Web analytics. Besides the fact that these schemes only see pages that can execute JS, and not all the images and other assets, we discovered that they were noticeably flawed and were losing about 10% of the traffic that we were seeing in the network sniffer log. After a long and painful couple of months, we determined that the lost traffic had no single identifiable source and happened with other page-tag-based systems too (Google Analytics, etc.), not just this supplier’s tool, and the BI folks finally just said “Well… It gives us pretty clickstreams and stuff, let’s go ahead with it.” Sadly that sunset our use of the NetGenesis network sniffer, and there wasn’t another like it in the open source realm (I looked). Eventually we bought a Coradiant to do RUM (the sales rep kept trying to explain this “new network RUM concept” to us and kept being taken aback at how advanced our questions were), but I missed the accessibility of my sniffer log… Big log aggregators like Splunk help fill that gap somewhat, but sometimes you really want to grep|cut|sort|uniq the raw stuff.
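Finding that 10% gap was basically a big diff of per-URL counts between the two sources. Here’s a rough sketch of that comparison, assuming you can export “count url” lines from both the sniffer log and the page-tag tool (the file names are hypothetical):

```python
from collections import Counter

def load_counts(path):
    """Read lines of the form '<count> <url>' into a Counter."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            count, url = line.split(None, 1)
            counts[url.strip()] += int(count)
    return counts

sniffer = load_counts("sniffer_counts.txt")   # hypothetical export
tagged = load_counts("pagetag_counts.txt")    # hypothetical export

gap = 1 - sum(tagged.values()) / sum(sniffer.values())
print(f"page tags missed {gap:.1%} of sniffer-observed traffic")

# The URLs with the biggest shortfall are where to start digging.
shortfall = Counter({url: n - tagged[url] for url, n in sniffer.items()})
for url, missing in shortfall.most_common(10):
    print(f"{missing:6d}  {url}")
```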

On the related topic of log replayers, we have really wanted one for a long time. No one has anything decent. We’ve bugged every supplier that we deal with on any related product, from RUM to load testing to whatever. Recording a specific transaction and using that is fine, but nothing compares to the demented diversity of real Internet traffic. We wrote a custom replayer for our sniffer log, although it didn’t do POSTs (we didn’t capture payloads – looks like Justniffer can, though!), and got a lot of mileage out of it. Found a lot of app bugs before going to production with that baby. Anyway, none of the suppliers can figure it out (Oracle just put together a DB traffic version of this in their new version 12, though). Now that there’s a sniffer we can use, and we already have a decent replayer, we’re back in business! So I’m excited; it’s a blast from the past but also one of those core little things that you can’t believe there isn’t one of, and that empowers someone to do a whole lot of cool stuff.
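A basic replayer really is a small thing, which is part of why it’s maddening that no supplier has one. Here’s a minimal Python sketch in the spirit of ours, with the same GET-only limitation (no captured payloads); the log format assumption and target server are placeholders:

```python
import re
import sys
import urllib.request

# Pull the request line out of an Apache-style access log entry.
LINE_RE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) \S+"')

def replay(logfile, target):
    """Re-issue every GET from the log against a test server."""
    with open(logfile) as f:
        for line in f:
            m = LINE_RE.search(line)
            if not m or m.group("method") != "GET":
                continue  # no payloads were captured, so skip POSTs etc.
            url = target.rstrip("/") + m.group("path")
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    print(resp.status, url)
            except Exception as e:
                print("ERR", url, e)

if __name__ == "__main__":
    # e.g. python replay.py access.log http://test-server
    replay(sys.argv[1], sys.argv[2])
```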


Log Management Tools

We’re researching all kinds of tools as we set up our new cloud environment, I figure I may as well share for the benefit of the public…

Most recently, we’re looking at log management – that is, a tool to aggregate and analyze the log files from across your infrastructure. We love Splunk and it’s been our tool of choice in the past, but it has two major drawbacks. One, it’s quite expensive. In our new environment, where we’re using a lot of open source and other new-format vendors, Splunk is a comparatively big line item for a comparatively small part of an overall systems management portfolio.

Two, which is somewhat related: it’s licensed by the volume of logs it processes per day. That’s a problem, because when something goes wrong in our systems, it tends to cause logging levels to spike. In our old environment, we kept having to play this game where an app would get rolled to production with debug logging on (accidentally or deliberately), or would just be logging too much, or would have a problem causing it to log too much, and then we’d have to blacklist it in Splunk so it didn’t run us over our license and cause the whole damn installation to shut off. It took an annoying amount of micromanagement for this reason.

Other than that, Splunk is the gold standard; it pulls anything in, graphs it, has Google-like search, dashboards, reports, alerts, and even crazier capabilities.

Now on the “low end” there are really simple log watchers like swatch or logwatch. But we’d really like something that will aggregate ALL our logs – not just the syslog stuff you can ship with syslog-ng, but app server logs, application logs, etc. – ideally from both UNIX and Windows systems, and make them usefully searchable. Trying to make everything and everyone log via syslog is an ever-receding goal; it’s a fool’s errand.
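For reference, the swatch/logwatch tier is essentially “tail a file and alert on a pattern” – something like this minimal Python sketch (the file path and pattern are placeholders) – which is exactly why it doesn’t get you aggregation or search:

```python
import re
import time

def tail_and_alert(path, pattern):
    """Follow a log file like `tail -f` and flag matching lines."""
    rx = re.compile(pattern)
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)  # wait for new lines to arrive
                continue
            if rx.search(line):
                print("ALERT:", line.rstrip())  # swap in mail/pager here

if __name__ == "__main__":
    tail_and_alert("/var/log/messages", r"(?i)error|panic")
```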

There are the big appliance vendors on the “high end” like LogLogic and LogRhythm, but we looked at them when we looked at Splunk, and they are not only expensive but also seem to be “write-only solutions” – they aggregate your logs to meet compliance requirements and do some limited pattern matching, but they don’t put your logs to work to help you in your actual work of application administration the dozen ways Splunk does. At best they are SIEMs – security information and event managers – that alert on naughty intruders. But with Splunk I can do everything from generating a report of 404s for our designers (so they can fix their bad links and missing images) to graphing site traffic to building dashboards for specific applications for their developers to review. Plus, as we’re doing this in the cloud, appliances need not apply. (Ooo, that’s a catchy phrase, I’ll have to use that for a separate post!)
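That 404 report is a one-liner in Splunk; against raw Apache-style access logs it’s basically grep|cut|sort|uniq, or in Python, something like this sketch:

```python
import re
import sys
from collections import Counter

# Request path and status from an Apache-style access log line.
LINE_RE = re.compile(r'"\S+ (?P<path>\S+) \S+" (?P<status>\d{3})')

counts = Counter()
for line in sys.stdin:
    m = LINE_RE.search(line)
    if m and m.group("status") == "404":
        counts[m.group("path")] += 1

# Top 20 broken links/missing images to hand to the designers.
for path, n in counts.most_common(20):
    print(f"{n:6d}  {path}")
```

Run it as python report_404.py < access.log (the script name is hypothetical).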

I came across three other tools that seem promising:

  • Logscape from Liquidlabs – does graphing and dashboards like Splunk does. And “live tail” – Splunk mysteriously took this out when they revved from version 3 to 4! Internet rumor is that it’s a lot cheaper. Seems like a smaller, less expensive Splunk, which is a nice thing to be, all things considered.
  • Octopussy – open source and Perl based (might work on Windows but I wouldn’t put money on it).  Does alerting and reporting.  Much more basic, but you can’t beat the price.  Don’t think it’ll meet our needs though.
  • Xpolog – seems nice and kinda like Splunk. Most of the info I can find on it, though, is “What about xpolog, is good!” comments appended to every forum thread/blog post about Splunk I can find, which is usually a warning sign – that kind of guerrilla marketing gets old quick IMO. One article mentions looking into it and finding it more expensive, but with some nice features like autodiscovery, though not as open as Splunk.

Anyone have anything to add?  Used any of these?  We’ve gotten kind of addicted to having our logs be immediately accessible, converted into metrics, etc.  I probably wouldn’t even begrudge Splunk the money if it weren’t for all the micromanagement you have to put into running it.  It’s like telling the fire department “you’re licensed for a maximum of three fires at a time” – it verges on irresponsible.
