Tag Archives: DevOps

What’s a “DevOp?”

I ran across an interesting post by Dmitriy Samovskiy about the difference between a DevOp and a Sysadmin and it brought up some thoughts I’ve had about the classification of different kinds of sysadmin types and the confusion that the “Ops” part of “DevOps” sometimes causes.

I think that using “DevOp” as a job or role name isn’t a good idea, and that really what the term indicates is that there are two major classes of technical role,

  • Devs – people who work with code mostly
  • Ops – people who work with systems mostly

You could say a “DevOp” is someone who does some of both, but I think the preferred usage is that DevOps, like agile, is about a methodology of collaboration.  A given person having both skill sets or fulfilling both roles doesn’t require a special term; it’s like anyone else in IT with multiple hats.

Of course, inside each of these two areas is a wide variety of skills and specialized roles.  Many of the people talking about “DevOps” are in five-person Web shops, in which case “Ops” is an adequate descriptor of “all the infrastructure crap guy #5 does.”

But in larger shops, you start to realize how many different roles there are.  In the dev world, you get specialization, from UI developers to service developers to embedded developers to algorithm developers.  I’d tend to say that even Web design/development (HTML/CSS/JS) and QA are often considered part of the “dev side of the house.”  It’s the same in Ops.

Now, traditionally many “systems” teams, also known as “infrastructure” teams, have been divided up by technology silo only.  You have a list of teams of types UNIX, Windows, storage, network, database, security, etc.  This approach has its strengths but also has critical weaknesses – which is why ITIL, for example, has been urging people to reorganize around “services you are delivering” lines.

In the dev world, you don’t usually see tech silos like that.  “Here’s the C programmer department, the Java programmer department, the SQL programmer department…  Hand your specs to all those departments and hope you get a working app out of it!”  No, everyone knows intuitively that’s insane.  But largely we still do the same thing in traditional systems teams (“Ops” umbrella).

So historically, the first solution that emerged was a separate kind of group.  Here at NI, the first was a “Web Ops” group called the Web Admins, which was formed ten years ago when it became clear that running a successful Web site cannot be done by bringing together fractional effort from various tech silos.  The Web Admins work with the developers and the other systems teams – the systems teams do OS builds, networking, rack-and-jack, storage/data center, etc. and the Web Admins do the software (app servers, collab systems, search, content management, etc.), SaaS, load balancing, operational support, release management, etc.  Our Web Admin team ended up expanding very strongly into the application performance management and Web security areas because no one else was filling them.

In more dotcommey companies, you see the split between their “IT group” and their “Engineering” or “Operations” group that is “support for their products,” as two entirely different beasts.

Anyway, the success of this team spawned others, so now there are several teams we call “App Admins” here at NI, that perform this same role with respect to sitting between the developers and the “system admins.”  To make it more complicated, even some of the apps (“Dev”) teams are also spawning “App Ops” teams that handle CI work and production issue escalation, freeing up the core dev teams for more large-scale projects.  Our dev teams are organized around line of business (ecommerce, community, support, etc.) so they find that helpful. (I’ll note that the interface between line of business organization and technology silo organization is not an easy one.)

Which of these teams are the “DevOps?”  None of them.  Naturally, the teams that are more in the middle feel the need for it more, which is why I as a previous manager of the Web Admins am the primary evangelist for DevOps in our organization.  The “App Admins” and the new “App Ops” teams work a lot more closely together on “operational” issues.

But this is where the term “Ops” has bad connotations – in my mind, “operations”, as closely related to “support”, is about the recurring activities around the runtime operation of our systems and apps.  In fact, we split the Web Admin team into two sub-teams – an “operations” team handling requests, monitoring, releases, and other interrupt driven activity, and a “systems” team that does systems engineering.  The interface between systems engineering and core dev teams is just as important as the interface around runtime, even more so I would say, and is where a lot of the agile development/agile infrastructure methodology bears the most fruit.  Our system engineering team is involved in projects alongside the developers from their initiation, and influences the overall design of the app/system (side note, I wish there was a word that captured “both app and system” well; when you say system people sometimes take that to mean both and sometimes to just mean the infrastructure).  And *that’s* DevOps.

Heck, our DBA team is split up even more – at one point they had a “production support” team, a “release” team, an “architecture” team, and a “projects” team.

But even on the back end systems teams, there are those that have more of a culture of collaboration – “DevOps” you might call it – and they are more of a pleasure to interface with, and then there are those who do not, who focus on process over people, you might say.  I am down with the “DevOps” term just because it has the branding buzz around it, but I think it really is just a sexier way to say “Agile systems administration.”

On a related note, I’ve started to see job postings go by for “DevOps Engineers” and other such.  I think that’s OK to some degree, because it does differentiate the likely kind of operating environment of those jobs from all the noise posted as “UNIX Engineer III”, but if you are using “DevOps” as a job description you need to be pretty clear in your posting what you mean in terms of exact skills because of this confusion.  Do you mean you just want a jack of all trades who can write Java/C# code as well as do your sysadmin work because you’re cheap?  Or do you want a sysadmin who can script and automate stuff? Or do you want someone who will be embedded on project teams and understand their business requirements and help them to accomplish them?  Those are all different things that have different skill sets behind them.

What do you think?  It seems to me we don’t really have a good understanding of the taxonomy of the different kinds of roles within Ops, and thus that confuses the discussion of DevOps significantly.  Is it a name for, or a description of, or a prescription for, some specific sub-team?  Which is it for – production support, systems engineering, does IT count or is it just “product” support orgs for SaaS?

Filed under DevOps

DevOps “From the Trenches” Report – HomeAway

We were out at HomeAway for a technical discussion, and DevOps reared its head as it does so frequently nowadays.  In the context of talking about their preparation to scale up for their big Chevy Chase Super Bowl commercial, they were doing all kinds of stuff.  One of the things they noted was that the traditional dev and ops headbutting changed due to the long hours of work they had to put in together.  They tried going off and doing “their parts” separately – ops doing network, servers, load balancers, and hosting and developers doing coding, caching, tuning, and testing – but the time pressure, importance, and complexity of the project forced them together into a room, and once they started to collaborate they just stayed there, working in close proximity, for the duration.  When asked about the big takeaways from the entire project, the developers noted that “Learning how everything interacts has changed how we build things” – for example, doing “pull the plug” fault testing has made for more resilient architectures and higher confidence and quality of life for both the dev and ops teams!  They didn’t describe it as “DevOps,” but that’s what it boils down to.

The more I talk to other successful Austin tech companies – HomeAway, BazaarVoice, Pervasive – the more that I hear DevOps concepts mentioned as keys to their success – and they didn’t do them because they “wanted to do this cool DevOps thing,” but they did what was needed to succeed and it turns out that a part of that is bringing development and operational concerns together into a whole.  It reminds me of the story behind the Visible Ops book, where the authors researched what high performing IT shops had in common and then realized those successful behaviors all mapped to certain ITIL areas (mainly change management).  That is a compelling validation of its efficacy.

Anyway, I urged them to consider doing that presentation in public venues; it really was a great story and hit on many of the best practices that have been emerging from the ops and performance world over the last few years.  They must be doing something right because they’re growing like gangbusters – if you want to take a vacation and rent someone else’s house/condo instead of going to a hotel, go try out homeaway.com!

Filed under DevOps

DevOps Cafe Podcast

Damon Edwards and John Willis run the DevOps Cafe Podcast.  It’s a great listen, and they have a lot of people on talking about exciting advances in the ops world (including Allspaw, John Kim, kaChing, Shopzilla).  And for this last one, they interviewed me! Apparently we’re on the cutting edge of doing DevOps in a traditional type organization as opposed to a lil’ Web startup.

So if you want to hear me natter on about DevOps and the lessons I’ve learned over my career that have brought me to it for 40 minutes or so, here you go.

Filed under DevOps

DevOps and Security

I remember some complaints about DevOps from a couple folks (most notably Rational Survivability) saying “what about security!  And networking!  They’re excluded from DevOps!”  Well, I think that in the agile collaboration world, people are only excluded to the extent that they refuse to work with the agile paradigm.  Ops used to be “excluded” from agile, not because the devs hated them, but because the ops folks themselves didn’t willingly go collaborate with the devs and understand their process and work in that way.  As an ops person, it was hard to go through the process of letting go of my niche of expertise and my comfortable waterfall process, but once I got closer to the devs, understood what they did, and refactored my work to happen in an agile manner, I was as welcome as anyone to the collaborative party, and voila – DevOps.

Frankly, the security and network arenas are less incorporated into the agile team because they don’t understand how to be (or in many cases, don’t want to be).  I’ve done security work and work with a lot of InfoSec folks – we host the Austin OWASP chapter here at NI – and the average security person’s approach embodies most of what agile was created to remove from the development process.  As with any technical niche there’s a lot of elitism and authoritarianism that doesn’t mesh well with agile.

But this week, I saw a great presentation at the Austin OWASP chapter by Andre Gironda (aka “dre”) called Application Assessments Reloaded that covered a lot of ground, but part of it was the first coherent statement I’ve seen about what agile security would look like.  I especially like his term for the security person on the agile team – the “Security Buddy!”  Who can not like their security buddy?  They can hate the hell out of their “InfoSec Compliance Officer,” though.

Anyway, he has a bunch of controversial thoughts (he’s known for that) but the real breakthroughs are acknowledging the agile process, embedding a security “buddy” on the team, and leveraging existing unit test frameworks and QA behavior to perform security testing as well.  I think it’s a great presentation, go check it out!
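
To make the “leverage your existing unit test frameworks” idea a bit more concrete, here is a minimal sketch of a security check written as a plain old unit test.  This is my own illustration, not something from the presentation: `render_comment` is a hypothetical view helper, and the attack strings are just a couple of common cross-site scripting probes.

```python
import html
import unittest


def render_comment(author: str, body: str) -> str:
    """Hypothetical view helper: returns an HTML fragment for a user comment."""
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(body))


class CommentSecurityTest(unittest.TestCase):
    """Security checks written as ordinary unit tests, run with the rest of the suite."""

    # Illustrative probes only; a real suite would pull from a larger list.
    ATTACK_STRINGS = [
        "<script>alert(1)</script>",
        '"><img src=x onerror=alert(1)>',
    ]

    def test_body_is_escaped(self):
        for attack in self.ATTACK_STRINGS:
            with self.subTest(attack=attack):
                rendered = render_comment("alice", attack)
                # No raw tag from the attack string may survive rendering.
                self.assertNotIn("<script>", rendered)
                self.assertNotIn("<img", rendered)

    def test_author_is_escaped(self):
        rendered = render_comment("<script>", "hello")
        self.assertNotIn("<script>", rendered)
```

Run it with `python -m unittest` like any other test module.  The point is that the security buddy’s checks live in the same suite the devs and QA already run on every build, instead of in a separate audit at the end.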

Filed under DevOps, Security

Give Me An API Or Give Me Death

Catchy phrase courtesy #meatcloud…   But it’s very true.  I am continuously surprised by the chasm between the “old generation” of software that jealously demands its priests stay inside the temple, and the “new generation” that lets you do things via API easily.  As we’ve been building up a new highly dynamic cloud-based system, we’ve been forced to strongly evaluate our toolset and toss out products with strong “functionality” that can’t be managed well in an automated infrastructure.

Let me say this.  If your product requires either a) manual GUI operations or b) a config file alteration and restart, it is not suitable for the new millennium.  That’s just a fact.

We needed an LDAP server to hold our auth information.  It’s been a while since I’ve done that, so of course OpenLDAP immediately came to mind.  So we tried it.  But what happens when you want to dynamically add a new replication slave?  Oh, you edit a bunch of config files and restart.  Well, sure, I’d like my auth system to be offline all the time, but…  So we tried OpenDS.  The most polished thing in the world?  No.  Does it have all the huge amount of weird functionality I probably won’t use anyway of OpenLDAP?  No.  But it does have an administration interface that you can issue directives to and have them take hold in realtime.  “Hey dude start replicating with that new box over there OK?”  “Sir, yes sir.”  “Outstanding.”  And since it’s Java, I can deploy it easily to targets in an automated fashion.  And even though the docs aren’t all up to date and sometimes you have to go through their interactive command line interface to do something – once you do it, the interface can be told to spit out the command-line version of that so you can automate it.  Sold!

The monitoring world is like this too.  Oh, we need an open source monitoring system?  Like everyone else, Nagios comes first to mind.  But then you try to manage a dynamic environment with it.  Again, their “solution” is to edit config files and restart parts of the system.  I don’t know about you, but my monitoring systems tend to be running a LOT of tests at any given time and hiccups in that make Baby Jesus (and frequently whoever is on call) cry.  So we start looking at other options.  “Well, you just come here in the UI and click to add!” the sales rep says proudly.  “Click,” goes the phone.  We end up looking at stuff like Zabbix, Zenoss, etc.  In fact, at least for the short term, we are using Cloudkick.  In terms of the depth of monitoring, it supports 1/100 of what most monitoring solutions do.  System stats mostly; there are plugins for LDAP and MySQL but that’s about it, the rest is “here’s where you can plug in your own custom agent plugin…”  But, as my systems come up they get added to their interface automatically, tagged with my custom namespace.  And I’d rather have my systems IN a monitoring system that will give me 10 metrics than OUTSIDE a monitoring system that would give me 1000.

It’s also about agility.  We are trying to get these products to market way fast.  We don’t have time to become high priests of the “OpenLDAP way of doing things” or the “Nagios way of doing things.”  We want something that works upon install, that you can make a call to (ideally REST-based, though command line is acceptable in a pinch, and if there’s an iPhone app for it you get extra credit) in order to tell it what to do.  Each of these items is about 1/100 of everything that needs to go into a full working system, and so if I have to spend more than a week to get you working and integrate with you – it’s a dealbreaker.  You got away with that back when there weren’t other choices, but now in just about every sector there’s someone who’s figured out that ease of access and REST API for integration plus basic functionality is as valuable as loads of “function points” plus being hellishly crufty.

Heck, we ended up developing our own cloud management stuff because when we looked at the RightScales and whatnot of the world, they did a great job of managing the cloud providers’ direct APIs for you but didn’t then offer an API in return…  And that was a dealbreaker.  You can’t automate end to end if you come smacking up against a GUI.  (Since then, RightScale has put out their own API in beta.  Good work guys!)

More and more, people are seeing that they need and want the “API way.”  If you don’t provide that, then you are effectively obsolete.  If I can’t roll up a new system – either with your software or something your software needs to be looking at/managing – and have it join in with the overall system with a couple simple API commands, you’re doing it wrong.
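
Here is roughly what “a couple simple API commands” looks like from the new system’s side.  This is only a sketch under loud assumptions: the `/nodes` endpoint, the JSON fields, and the `monitoring.example.com` host are all hypothetical and do not describe Cloudkick’s or anyone else’s real API.

```python
import json
import urllib.request


def build_register_request(base_url: str, hostname: str, tags: list) -> urllib.request.Request:
    """Build (but do not send) a POST that registers a new node.

    The /nodes endpoint and the JSON fields are hypothetical, not a real product's API.
    """
    payload = json.dumps({"hostname": hostname, "tags": tags}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/nodes",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(base_url: str, hostname: str, tags: list) -> int:
    """Send the registration and return the HTTP status code."""
    request = build_register_request(base_url, hostname, tags)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Called from a boot script, a new system announces itself with no GUI clicks
# and no config-file edit or restart on the monitoring side, e.g.:
# register("https://monitoring.example.com/api", "web-042", ["prod", "web"])
```

Building the request separately from sending it is a deliberate choice: the boot script can be tested without a live monitoring server on the other end.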

Filed under Cloud, DevOps

DevOps Time!

All right!  After the last three days of Velocity 2010, we’ve talked a lot about ops and even hinted at devops, although often in a “recycled from previous Velocity” fashion.  But today it’s time to mainline it with DevOpsDays!

I’m going to be too busy actually participating to do full writeups like I did from Velocity, but I’ll distill down the best takeaways and bring them here as soon as I can.  If you just can’t wait and aren’t here, follow along on twitter at #devopsdays!

Filed under DevOps

Velocity 2010 – Getting Fast

The first session in Day 1’s afternoon is Getting Fast: Moving Towards a Toolchain for Automated Operations.  Peco, Jeff, and I all chose to attend it.  Lee Thompson and Alex Honor of dto gave it.

I have specific investment in this one, as a member of the devops-toolchain effort, so was jazzed to see one of its first outputs!

A toolchain is a set of tools you use for a purpose.  Choosing the specific tools should be the last thing you do.  They have a “people over process over tools” methodology that indicates the order of approach.

Back in their ControlTier days they wrote a paper on fully automated provisioning.  Then the devops-toolchain Google group and OpsCamp stuff was created to promote collaboration around the space.

Discussion on the Google group has been around a variety of topics, from CM to log management to DevOps group sizing/hiring.

Ideas borrowed in deriving the devops toolchain concept:

  • Brent Chapman’s Incident Command System (this is boss; I wrote up the session on it from Velocity 2008)
  • Industrial control automation; it’s physical but works similarly to virtual and tends to be layered and toolchain oriented.  Layers include runbook automation, control, eventing, charting, measurement instrumentation, and the system itself.  Statistical process control FTW.
  • The UNIX toolchain as a study in modularity and composition; it’s one of the most durable best practices approaches ever.  Douglas McIlroy FTW!
  • Eric Raymond (The Art of UNIX Programming, The Cathedral and the Bazaar)
  • Interchangeable parts – as Honoré Blanc pioneered with firearms, and as lean manufacturing and lean startup concepts apply today.

In modern manufacturing automation thought, you don’t make the product; you make the robots that make the product.

Why toolchains?

  • Projects are failing due to handoff issues, and automation and tools reduce that.
  • Software operation – including release and operations – is a critical nonfunctional requirement of the development process.
  • Composite apps mean way more little bits under management
  • Cloud computing means you can’t slack off and sit around with server racking being the critical path

Integrated tools are less flexible – integratable tools can be joined together to address a specific problem (but it’s more complex).

Commercial bundled software is integrated.  It has a big financial commitment and if one aspect of it is weak, you can’t replace it.  It’s a black box/silo solution that weds you to their end to end process.

Open source software is lots of independent integratable parts.  It may leave gaps, and done wrong it’s confused and complicated.  But the iterative approach aligns well with it.

They showed some devops guys’ approaches to automated infrastructure – including ours!  Woot!

KaChing’s continuous deployment is a great example of a toolchain in action.  They have an awesome build/monitor/deploy GUI-faced app for deploy and rollback.

Toolchains

Then they showed a new cut at the generalized architecture, with Control, Provisioning, Release, Model, Monitoring, and Sources as the major areas.

Release management became a huge area, with subcomponents of repository, artifact, build, SCM, and issue tracker.

In monitoring and control, they identified Runbook Automation, Op Console/Control, Alarm Management, Charting/History/SPC, and Measurement Instrumentation.

Provisioning consists of Application Service Orchestration, System Configuration, Cloud/VM or OS install.

This is all great stuff.  All these have open source tools named; I’ll link to wherever these diagrams are as soon as I find them!  I must not have been paying enough attention to the toolchain wiki!

Hot Tips

  • Tool projects fail if the people and process aren’t aligned.
  • Design the toolchain for interoperability
  • Adopt an SDLC for any tool you develop
  • Separate the dev release process from the package process
  • Need better interchange boundaries (the UNIX pipe equivalent)
  • No one size fits all – different tools are OK
  • Communication is your #1 ingredient for success
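
To illustrate the “integratable tools” and “UNIX pipe equivalent” points above, here is a tiny sketch of my own (not from the talk) where each tool is a small stage that consumes and emits the same simple record format, so stages can be swapped or rearranged freely.  The stage names and the `host status` input format are made up for illustration.

```python
from functools import reduce
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def parse(lines: Iterable[str]) -> Iterator[Record]:
    """Turn 'host status' text lines into records (the shared interchange format)."""
    for line in lines:
        host, status = line.split()
        yield {"host": host, "status": status}


def only_failed(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only the records whose status is not 'ok'."""
    return (r for r in records if r["status"] != "ok")


def add_action(records: Iterable[Record]) -> Iterator[Record]:
    """Annotate each record with a follow-up action for the next tool."""
    for r in records:
        yield {**r, "action": "restart"}


def pipeline(*stages):
    """Compose stages left to right, like `a | b | c` in a shell."""
    return lambda source: reduce(lambda data, stage: stage(data), stages, source)


check = pipeline(parse, only_failed, add_action)
result = list(check(["web1 ok", "web2 down", "db1 ok"]))
print(result)  # [{'host': 'web2', 'status': 'down', 'action': 'restart'}]
```

Because every stage agrees on the record format at the boundary, replacing `only_failed` with a different filter doesn’t touch the stages on either side, which is the whole appeal of the toolchain approach over a single integrated black box.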

All in all an awesome recap of the DevOps toolchain effort!  Great work to everyone who’s done stuff on it, and I know this talk inspired me to put more time into it – I think this is a super important effort that can advance the state of our discipline!  And everyone is welcome to join up and join in.

Filed under Conferences, DevOps

F5 On DevOps and WordPress Outages

Lori MacVittie has written a very interesting post on the F5 blog entitled “Devops: Controlling Application Release Cycles to Avoid the WordPress Effect.”

In it, she analyzes a recent WordPress outage and how “feathered” releases can help mitigate impact in multitenant environments.  She specifically talks about how DevOps is one of the keys to accomplishing these kinds of schemes, which require both apps and systems to honor them.

“Organizations that encourage the development of a devops role and discipline will inevitably realize greater benefits from virtualization and cloud computing because the discipline encourages a broader view of applications, extending the demesne of application architecture to include the application delivery tier.”

Nice!  In my previous shop we didn’t use F5s, we used Netscalers, but there was the same interesting divide in that though they were an integral part of the application, they were seen as “Infrastructure’s thing.”  Apps weren’t cognizant of them and whenever functionality needed to be written against them (like cache invalidation when new content was published) it fell to us, the ops team.  And to be honest we discouraged devs from messing with them, because they always wanted some ill-advised new configuration applied when they did. “Can’t we just set all the timeouts to 30 minutes?”

But in the newer friendlier world of DevOps coordination, traditionally “infrastructure” tools like app delivery stuff, monitoring, provisioning, etc. need to be a collaboration area, where code needs to touch them (though in a way Ops can support…)  Anyway, a great article, go check it out.
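
For the curious, here is one way a “feathered” release could be sketched, assuming it means turning a new version on for a growing slice of tenants rather than everyone at once.  That reading, the hashing scheme, and the `blog-N` tenant names are my assumptions, not details from the F5 article.

```python
import hashlib


def rollout_bucket(tenant_id: str) -> int:
    """Map a tenant to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def gets_new_release(tenant_id: str, percent_enabled: int) -> bool:
    """True if this tenant falls inside the currently enabled slice."""
    return rollout_bucket(tenant_id) < percent_enabled


# Hypothetical tenant names; widen the slice step by step and count who is in it.
tenants = ["blog-{}".format(n) for n in range(1000)]
for percent in (0, 5, 25, 100):
    enabled = sum(gets_new_release(t, percent) for t in tenants)
    print(percent, enabled)
```

The bucket is derived from the tenant ID rather than stored anywhere, so the app code and the delivery tier can each compute the same answer independently, and a tenant that got the new release at 5% still has it at 25%.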

Filed under DevOps

Good DevOps Discussions

An interesting point and great discussion on “what is DevOps”, including a critique about it not including other traditional Infrastructure roles well, on Rational Survivability (heh, we’re using the same blog theme.  I feel like a girl in the same dress as another at a party.).  It seems to me that some of the complaints about DevOps – only a little here, but a lot more from Andi Mann, Ubergeek – seem to think DevOps is some kind of developer power play to take over operations.  At least from my point of view (an ops guy driving a devops implementation in a large organization) that is absolutely not the case.  Seems to me to be a case of over-touchiness based on the explicit and implicit critique of existing Infrastructure processes that DevOps represents.  Which is natural; agile development had/has the exact same challenge.

Note that DevOps is starting to get more press; here’s a cnet article talking about DevOps and the cloud (two great tastes that taste great together…).

And here’s a bonus slideshare presentation on “From Agile Development to Agile Operations” that is really good.

Filed under DevOps

Velocity and DevOpsDays!

A double threat is coming your way.  Velocity 2010, the Web performance and operations conference, is June 22-24 in Santa Clara, CA.  As one of the very few conventions targeted at our discipline, we’ve been attending since the first one in 2008.  And this time, there’s dessert – the day after it ends, a new DevOps unconference, DevOpsDays 2010, will be held nearby in Mountain View!

OpsCamp Austin kicked ass, and I’m sure this will be even better.  So come double up on Ops knowledge and meet other right-thinking individuals.

If you want to read all my musings from the previous Velocity conferences, you can do that too!

Filed under DevOps