Scrum for Operations: Fitting In As An Ops Engineer

So far in this series, I’ve introduced the basics of Scrum as it generally is used and explained the practices that make it extremely successful. But that’s for developers, right? If you are in operations, what does this mean to you? How do you fit in? For an ops person, the major challenges are mental – you have to reorient your way of thinking, and then things drop into place very well.

I’m writing from the perspective of a Web operations guy, though over time I’ve also done more traditional sysadmin work and managed infrastructure (and dev) teams, and I started off as a developer many years ago. Some of my terminology is oriented toward creating a product and keeping a Web site up, but you should be able to substitute your own kind of system, just as all kinds of developers, not just Web developers, use and benefit from agile.

The Team

First, “DevOps.” Get an Ops person assigned to the dev team. This is fundamental – if it’s an externalized relationship, where the dev team is making requests of your “Infrastructure org”, you will not be seen as part of the team and your effectiveness will be severely diminished. You need to be more or less dedicated to this project, not handling it from some shared work queue. This reinforces the fundamental values of Agile: you join the team, and you dedicate yourself to the overall success of the product you are working on. It is this integration, and the trust that arises from shared goals, that will remove many of the traditional roadblocks you are used to facing when dealing with a dev team. A real agile team should have similarly embedded product, QA, and UX folks; it’s not a new idea.

You are not “a UNIX guy” or “a DBA” anymore. You are “a member of the Ratings and Reviews team,” and you happen to have a technical specialty. This may seem like sophistry, but it’s actually one of the most critical parts of this cultural transformation.

The Backlog

For the backlog, start thinking of tasks in terms of customer-facing features. For example, no one but you wants to hear about “configuring the SAN”; they want to know that at the end of the sprint “customers will be able to save files to persistent storage.” If what you’re doing doesn’t have any benefit to the end customer – why are you doing it again? You shouldn’t be.

Figure out how to state operational concerns like performance, maintainability, and availability as benefits in the backlog. Some infrastructure work belongs in the backlog; other parts belong in team standards (e.g. the team Definition of Done now states that a new service must have monitoring…). The product manager and dev team aren’t dumb; they will understand that performance, availability, security, the ability to release their software, and so on are important goals that merit a place in the backlog. The typical story lingo is “As an X, I want Y so I can do Z.” For example: “As a client, I want my data backed up so that in the case of a disaster, I am minimally affected.” “As an engineer, I want the uptime state of my services monitored so I can ensure customers are being served.”
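To make the story template and the idea of a team-wide Definition of Done a little more concrete, here is a minimal Python sketch. The `Story` class, the `DEFINITION_OF_DONE` set, and every item in it other than monitoring are hypothetical illustrations, not a prescribed tool; most teams keep this in their tracker rather than in code.

```python
from dataclasses import dataclass, field

# Hypothetical team Definition of Done. "monitoring in place" mirrors the example
# in the text; the other items are placeholders for whatever your team agrees on.
DEFINITION_OF_DONE = frozenset({"code reviewed", "monitoring in place", "runbook updated"})


@dataclass
class Story:
    """A backlog item in the "As an X, I want Y so I can do Z" form."""
    role: str     # X, including its article, e.g. "an engineer"
    want: str     # Y: what they want
    benefit: str  # Z: why they want it, starting with "I can ..." or "that ..."
    completed: set[str] = field(default_factory=set)  # DoD items met so far

    def __str__(self) -> str:
        return f"As {self.role}, I want {self.want} so {self.benefit}."

    def is_done(self) -> bool:
        # Done only when every team-wide DoD item is met, so operational
        # standards such as monitoring apply to every story automatically.
        return DEFINITION_OF_DONE <= self.completed


backup = Story("a client", "my data backed up",
               "that in the case of a disaster, I am minimally affected")
print(backup)
print(backup.is_done())  # False: nothing from the Definition of Done is met yet
```

The design choice worth noting is that the operational standards live in one shared set instead of being copied into each story, which is the same split the paragraph above draws between backlog items and team standards.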

You will be challenged (and this is good) on items that are “monkey work.”  “I need to go delete log files off that server, so it doesn’t crash.” Hey, why are we doing that?  Why is it manual? Should we have a story for proper log rotation? Need a developer to help? You will see a virtuous cycle develop to “fix things right.” Most of the devs haven’t seen a lot of the demeaning stuff you’re asked to do, and they’ll try to help fix it.
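As one example of “fixing it right” instead of deleting log files by hand, here is a minimal sketch of size-based log rotation. It assumes the service is written in Python (or that you control its logging) and uses the standard library’s `logging.handlers.RotatingFileHandler`; on many hosts the equivalent story would instead be a system-level rotation tool. The function name and the default sizes are illustrative assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_rotating_logger(log_path: Path, max_bytes: int = 10 * 1024 * 1024,
                         backup_count: int = 5) -> logging.Logger:
    """Return a logger that writes to log_path and rotates it by size.

    When the file would exceed max_bytes, it is renamed to <name>.1, older
    backups shift up (.1 -> .2, ...), and anything beyond backup_count is
    deleted, so disk use stays bounded without anyone deleting files by hand.
    """
    logger = logging.getLogger(str(log_path))
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = RotatingFileHandler(log_path, maxBytes=max_bytes,
                                      backupCount=backup_count)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With the defaults, the log directory holds at most about 60 MB for this file (the live file plus five 10 MB backups), which is the kind of guarantee that turns a recurring manual chore into a one-time backlog story.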

The Sprint

I’ll be honest: the first time I was confronted with the prospect of breaking up systems work into sprints, I thought it was very unlikely it could be done. “Things are either short interrupts or long projects, right? That doesn’t make any sense.” And then I did it, and the scales fell from my eyes. Remember refactoring. Developers doing agile are used to refactoring, while we are used to having only “one bite at the apple” – if we don’t get the systems 100% right before we unleash the developers on them, then we won’t be able to change them later, right? Wrong!

In a certain sense, sprint planning takes a big load off compared to traditional planning. Infrastructure folks are used to being asked for a granular task breakdown and timeline covering six months’ worth of work for some big-bang implementation. Then, when reality causes the plan to deviate, everyone freaks out. Agile takes horizon planning and institutionalizes it: you only need to be able to plan your next two (or so) weeks specifically, and if you can’t do that, you need to try harder. What can you implement in two weeks that has some kind of value? Get a Tomcat running in sprint 1, tune it in sprint 2, then monitor it in sprint 3; don’t bundle everything up into one huge mass.

Testing

Figure out what unit tests mean for the things you are implementing. “Nothing” is the wrong answer. If you’re making a network change, for example, there is something you can do to test it short of “waiting for people to complain.” If you are installing Tomcat on a server with a framework like Chef or Puppet, it will have testing options available, but even if not, there are things you can do to verify that it works instead of passing it on and causing lost time and rework when someone else finds out it’s not working right.
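For the two cases above, a smoke test can be as small as “can I open a TCP connection?” after a network change and “does it answer HTTP 200?” after installing Tomcat. The sketch below uses only the Python standard library; the function names are hypothetical, and if you use Chef or Puppet you would more likely express the same checks in that tool’s own testing setup.

```python
import http.client
import socket
import urllib.request


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds (e.g. after a firewall or routing change)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if url answers with HTTP 200 (e.g. a freshly installed Tomcat's front page)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # urllib's URLError and HTTPError (raised for 4xx/5xx) are OSError subclasses
        return False


def run_checks(checks: dict[str, bool]) -> bool:
    """Print PASS/FAIL for each named check and return True only if all passed."""
    for name, ok in checks.items():
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
    return all(checks.values())
```

For a new Tomcat on its default HTTP connector port, something like `run_checks({"port 8080 open": port_open(host, 8080), "front page returns 200": http_ok(f"http://{host}:8080/")})`, where `host` is the server you just built, gives a pass/fail answer before you hand the server over to anyone else.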

More to come, meditate upon those truths for a bit – ask questions in the comments!


2 responses to “Scrum for Operations: Fitting In As An Ops Engineer”

  1. These are good ideas and thoughts. We implemented Scrum in our infrastructure and operations org. It worked out OK, but here are some things we changed or could not do:

    1.) First, it’s very unlikely that any org has enough resources to assign someone to every dev team. As great as that sounds, if you have 20 dev teams and 8 network engineers, the math doesn’t add up. Instead, we built cross-functional infra/ops teams (someone from network, server, storage, database, etc.) to take on projects vs. the traditional silo -> hand-off method.

    2.) We did start out scrumming/sprinting these teams. Unfortunately, it did not work out as planned, for a few reasons. Many of our projects involved hardware orders and installation. We could not sprint the delivery and installation, as that was out of our hands (shipping time, receiving, unboxing, etc.). Also, some projects depended on third-party vendors to complete the work. So, if we had a story called “vendor to finish implementation,” it didn’t always line up with the sprint.

    3.) What we started doing instead was using Kanban for these stories/tasks. With Kanban, there is no timebox. This removes the issue of constantly carrying stories over that could not be completed due to the reasons listed in #2 above. It also let us lay out the stories top to bottom, like “order hardware”, “install hardware”, “configure hardware”, “set up network” etc.

    The two best things we implemented, which worked wonders, were a cross-skillset team (taking someone from each silo and creating a project team) and daily stand-ups (sometimes we went to 3 stand-ups a week, especially if a lot of hardware configuration was involved).

    Peace

    • Thanks for your thoughts!

      Definitely, a staffing model with enough ops people to cover all the teams is important. It’s always possible to “play zone” – but I’ve found the throughput and results aren’t the same. I encourage modern orgs to get with the program and not try to cut costs so much that they cut their own throats by not letting each project team move at its own pace.

      And yeah, traditional hardware is hard to sprint to… I guess. Was it easy to do in waterfall? Kinda, because you had to plan ahead. Is the delivery time from your vendor so random? I’d kinda speak to my vendor about that. You set up items “top to bottom” in Scrum too, that’s the backlog.

      Kanban is great and I think it’s the end goal – just in most places “kanban” means “nothing, we just use a task board.” Getting a high cadence of delivery in kanban requires a lot of individual discipline that’s unlikely for people to develop when they just move over to agile for the first time.

      Anyway, that’s why I tend to start with Scrum, but if Kanban is working for you that’s great!
