Tag Archives: agile

by Ernest Mueller | January 30, 2025 · 10:56 am

What Is A Blocker?

A recent project delay at work put me in mind of this recurring issue I’ve seen with a lot of agile teams. That is, reluctance to call something a blocker. Karthik and I are working on a new revision of our LinkedIn Learning courser “DevOps Foundations: Lean and Agile” so I decided to dig into this a bit.

I think many engineers believe a blocker is “something that prevents me from doing any work whatsoever on this entire project.” This in my experience leads to a lot of project delays and unaddressed issues because something was not identified and communicated widely enough to be swiftly resolved.

This is unfortunately encouraged by some Agile wonks who start hairsplitting with terms. “Well, there’s a blocker and then there’s an impediment,” they say. As I google this, it turns out you can differentiate between “delays, impediments, blockers, and roadblocks. Oh, and dependencies.” And boards, reports, etc. have to do/in progress/done and blockers, not 10 other categories.

Here’s the deal. Most teams out there are not formally trained on PM or agile and have essentially figured out what they know via osmosis. And there’s one term they even vaguely understand, which is “blocker”. (I have never seen a team distinguish formally between blockers and impediments in decades of doing this.).

While I love wordplay as much as the next person, I don’t think this attempt to categorize bad things on the infinite spectrum of bad things is practical, and best belongs as an organic explanation of the impact. Does it prevent you from proceeding on that piece of work? On any work? You can proceed on it but it can’t become done until the thing is resolved? Or it doesn’t technically stop work but it does put the project a week behind? Sure, say it. It’s important to know the impact but it does not change the nature of the existence of an issue and the need to swarm on or escalate it.

The practical definition of a blocker from a team member level is “anything not entirely in my control that is stopping or delaying work now or in the very near future.”

The practical definition of a blocker from a management point of view is “anything that is getting in the way of the team that I need to know about or do something about.”

We have to go back to the entire reason to have a term like “blocker”, which is to allow the team, or failing that their management escalation, to resolve issues that prevent the continued timely flow of work. Period, end of story, if process definition hairsplitting isn’t serving that core goal then do what does.

Definition people love to say something’s not “technically a blocker – yet.” “Well not having a cloud accout to use as the required test fixture isn’t technically a blocker because I won’t need it for another two days, even though there’s been no obvious headway on the request we made to IT for it.” Can anyone seriously contend that’s not a blocker? It’s a problem that is clearly visible on the road ahead, you don’t have to run into it first like my cheap Roomba does in order to escalate it, and doing so is antithetical to the overall agile goal of ensuring smooth and continuous flow. I don’t tolerate “technically true” when it becomes “wilfully dumb.”

Underlying this seems to be some unstated assumption that blockers are “bad” and you are bad for having one or reporting one. And I get it, there’s plenty of bad scrum masters/managers/etc. out there that operate unthinkingly on some Neanderthal level and react as “person say thing I don’t like, person is bad.” (Or the modern tech bro Neanderthal who has some variation of this like “well I need to discourage people from reporting blockers to make them use their masculine energy to pull themselves up by their bootstraps blah blah.”) Sure, toxic people can drive any process off track, that shouldn’t be the default however.

I believe in a healthy Agile environment team members should be encouraged to bring up anything threatening to slow or stop work. It can be a small thing, but it creates an opening for help from the team. Even “I haven’t used this tool before and it’s taking a little longer and I am not sure I’ll get this task done by sprint end” – that’s an opportunity for someone to hop on for a half hour to pair with you or train you.

If you’re off schedule there’s some blocker around, whether it can be handled in the team or needs escalation. You don’t have to escalate everything, though even if the team handled it, schedule or other impacts need to be communicated. For escalations, make it clear it’s an escalation and who to, don’t just assume everyone who gets a status report will seize on all the blocker lines as to dos. “Blocker: No headway on Azure text fixture, it’s needed to complete our work this sprint and will delay us if it’s not in place in 2 days – @ernest we need your help with this one” is perfect.

As someone providing oversight for a lot of sprint teams working on consulting engagements often with client-prescribed milestone deliverables, I keep getting into situations where a sprint full of reporting “no blockers” suddenly turns into “well but of course we won’t have any of the deliverables at sprint end tomorrow.” That makes everyone unhappy, especially me if it’s something I could have urged the client or an external team to provide for the team more promptly. Give people the opportunity to intervene to keep you on track!

I’m going to start adding “definition of blocker” right after “definition of done” in kickoff discussions because of how chronic this issue is – I venture to say I’ve seen it everywhere, it’s just more tolerated in environments where schedules aren’t taken too seriously.

Let me know how you handle this issue, if you encourage a wide definition of blocker, and your experiences on this!

Leave a comment

Filed under Agile, Management

Tagged as agile, blocker, business, development, impediment, process, project-management, scrum, scrum-master

by Ernest Mueller | October 9, 2017 · 1:10 pm

DevOps Foundations: Lean and Agile

Well you’re in for a treat – we’re getting all of the Agile Admins in on making DevOps courses, and Karthik and I did a course that’s just released today – DevOps Foundations: Lean and Agile.

It’s available both on LinkedIn Learning:
https://www.linkedin.com/learning/devops-foundations-lean-and-agile

and on Lynda.com:
https://www.lynda.com/JIRA-tutorials/DevOps-Foundations-Lean-Agile/622078-2.html

After James and I did DevOps Foundations, the “101” course, we were focused on building out courses for the three major practice areas of DevOps – Continuous Deployment, Infrastructure Automation, and Site Reliability Engineering (in progress now). But our lynda.com content manager said there was interest in us also expanding on the use of Agile and Lean especially as it relates to DevOps.

Karthik is our agile admin Agile expert; he’s presented at several Agile conferences and the like, so he and I decided to take it on. But how would we bring a DevOps specific take to it? We started outlining a course and realized it could turn into a giant boring encyclopedia of every Lean and Agile term ever. Most of what we have to add isn’t reading definitions, it’s sharing our experiences actually doing this (my Scrum for Operations series on this blog is perennially popular).

So we decided to take a tip from both Eliahu Goldratt’s The Goal and Gene Kim’s The Phoenix Project by framing the course as a fictional story! By stitching a narrative together of a Lean, Agile, DevOps transformation of a hypothetical company out of our real world stories from a variety of implementations, we figured we could explain the concepts in context and make them more interesting. Let us know what you think!

Lynda Course Description:

By applying lean and agile principles, engineering teams can deliver better systems and better business outcomes—both of which are crucial to the success of DevOps. In this course, instructors Ernest Mueller and Karthik Gaekwad discuss the theories, techniques, and benefits of agile and lean. Learn how they can be applied to operations teams to create a more effective flow from development into operations and accelerate your path of “concept to cash.” In addition to key concepts, you can hear in-the-trenches examples of implementing lean and agile in real-world software organizations.

Topics Include:
What is agile?
What is lean?
Measuring success
Learning and adapting
Building a culture of metrics
Continuous learning
Advanced concepts

Duration:
1h 26m

Leave a comment

Filed under Agile

Tagged as agile, course, DevOps, lean, lynda

by Ernest Mueller | July 24, 2017 · 5:19 pm

New DevOps Courses In The Can!

Well it was busy week last week – James, Karthik, and I were all in lovely Carpenteria, CA at the lynda.com/LinkedIn Learning HQ to film some new DevOps Foundations courses!

We have two out already – James and I did DevOps Foundations, which lays out the basics of DevOps from culture to containers. It’s a three hour course, and should suffice to orient someone in all the ways of DevOps and defines Continuous Delivery, Infrastructure Automation, and Reliability Engineering as its three practice areas. (There’s a course handout under the Exercise Files that has links and bibliography, as well.) It’s DevOps 101 if you could use that!

And then we started to flesh out the major DevOps practice areas we defined in that course as 200-level courses. These focus on concepts but illustrate with tool demos. So we filmed DevOps Foundations: Infrastructure Automation in March, which released the end of April. It’s two hours, and covers infrastructure as code concepts and the basics of creating infrastructure from specs with e.g. CloudFormation, provisioning systems with e.g. Chef, and going immutable with Docker.

But now we have an irresistible urge to do more, so in a double shot that took about a year off my life, last week James and I recorded DevOps Foundations: Continuous Delivery, which goes over continuous integration and delivery and shows you how to build a delivery pipeline – we used Jenkins/Nexus/Chef/go/abao/Robot Framework but again we focused on concepts and did just enough implementation to illustrate it.

James went home mid-week and Karthik came out, and we also recorded DevOps Foundations: Lean and Agile! Lean and Agile are integrally related to DevOps and especially to being successful at DevOps. Our content manager actually asked us to do this one; we were kinda bulling ahead on our three main practice areas, but we said sure! We cover some Agile and Lean basics, and then we take a tip from The Goal and The Phoenix Project, and the bulk of the course is a fictional implementation stitched together from real experiences we’ve both had doing these at various companies. It was fun! Here’s a look at behind the scenes.

Both of these should drop in about 5 weeks, so keep an eye out.

Leave a comment

Filed under DevOps

Tagged as agile, continuous delivery, course, lean, linkedin, lynda

Link

The first scientific survey I’ve seen of the benefits of Lean in software is an ACM-IEEE paper done in the Finnish software industry summarized in this presentation. Enjoy!

Leave a comment

by Ernest Mueller | November 28, 2016 · 9:31 pm

by Ernest Mueller | October 22, 2015 · 10:00 am

Agile Organization: Project Based Organization

This is the fourth in the series of deeper-dive articles that are part of Agile Organization Incorporating Various Disciplines.

The Project Based Organization Model

You all know this one. You pick all the resources needed to accomplish a project (phase of development on a product), have them do it, then reassign them!

Benefits Of The Project Based Model

The beancounters love it. You are assigning the minimum needed resource to something “only as long as it’s needed.”

Drawbacks Of The Project Based Model

Where to even begin?

First of all, teams don’t do well if not allowed to go through the Tuckman’s stages of development (forming/storming/norming/performing); engineer satisfaction plummets.
Long term ownership isn’t the team’s responsibility so there is a tendency to make decisions that have long term consequences – especially bearing on stability, performance, scalability – because it’s clear that will be someone else’s problem. Even when there is a “handoff” planned, it’s usually rushed as the project team tries to ‘get out of there’ from due date or expenditure pressures. More often there is a massive generation of “orphans” – services no one owns. This is immensely toxic – it’s a problem with shipping software, but with a live service it’s awful, as even if there’s some “NOC” type ops org somewhere that can get it running again if there’s an issue, chronic issues can’t be fixed and problems cause more and more load on service consumers, support, and NOC staff.
Mentoring, personnel development, etc. are hard and tend to just be informal (e.g. “take more classes from our LMS”).

Experience With The Project Based Model

At Bazaarvoice, we got to where we were getting close to doing this with continued reorganization to gerrymander just the right number of people onto the projects with need every month. Engineer satisfaction tanked to the degree that it became an internal management crisis we had to do all kinds of stuff to dig ourselves back out of.

Of course, many consulting relationships work this way. It’s the core behind many of the issues people have with outsourced development. There are a lot of mitigations for this, most of which are “try not to do it” – like I’ve worked with outsourcers trying to ensure lower churn on outsource teams, try to keep teams stable and working on the same thing longer.

It does have the merit of working if you just don’t care about long term viability. Developing free giveaway tools, for example – as long as they’re not so bad they reflect poorly on your company, they can be problematic and unowned in the long term.

Otherwise, this model is pretty terrible from a quality results perspective and it’s really only useful when there’s hard financial limitations in place and not a strong culture of responsibility otherwise. It’s not very friendly to agile concepts or devops, but I am including it here because it’s a prevalent model.

1 Comment

Filed under Agile, DevOps

Tagged as agile, DevOps, formation, Management, org, organization, structure, team

by Ernest Mueller | October 20, 2015 · 10:00 am

Agile Organization: Fully Integrated Service Teams

This is the third article in the series of deeper-dive articles that are part of Agile Organization Incorporating Various Disciplines.

The Fully Integrated Service Team Model

The next step along the continuum of decentralization is complete integration of the disciplines into one service team. You simply have an engineering manager, and devs, operations staff, QA engineers, etc. all report to them. It’s similar to the Embedded Crossfunctional Team model but you do away with the per-discipline reporting structure altogether.

Benefits Of Integrated Service Teams

This has the distinct benefit of end to end ownership. Engineers of every discipline have ownership for the overall product. It allows them to break out of their single-discipline shell, as well – if you are good at regression testing but also can code, or are a developer but strong in operations, great! There’s no fence saying whose job is whose, you all pull tasks off the same backlog. In general you get the same benefits as the Crossfunctional Team model.

Drawbacks of Integrated Service Teams

This is theoretical nirvana, but has a number of challenges.

First, a given team manager may not have the knowledge or experience in each of those areas. While you don’t need deep expertise in every area to manage a team, it can be easy to not understand how to evaluate or develop people from another discipline. I have seen dev managers, having been handed ops engineers, fail to understand what they really do or they value, and lose them as a result.

Even more dangerous is when that happens and the manager figures they didn’t need that discipline in the first place and just backfills with what they are comfortable with. For a team to really own a service from initiation to maintenance, the rest of the team has to understand what is involved. It’s very easy to slip back into the old habits of considering different teams first class vs second class vs third class citizens, just making classes of engineer within your team. And obviously, disenfranchising people works directly against energizing them and giving them ownership and responsibility.

Mitigations for that include:

Time – over time, a team learns the basics of the other branches and what is required of them.
Discipline “user groups” (aka “guilds”) – having a venue for people from a horizontal discipline to meet and share best practices and support each other. When we did this with our ops team we always intended to set up a “DevOps user group” but between turnover and competing priorities, it never happened – which reduced the level of success.

A second issue is scaling. Moving from “zone” to “man” coverage, as this demands, is more resource intensive. If you have nine product teams but five operations engineers, then it seems like either you can’t do this or you can but have to “share” between several teams. Such sharing works but directly degrades the benefits of ownership and impedance matching that you intend to gain from this scheme. In fact, if you want to take the prudent step of having more than one person on a team know how to do something – which you probably should – then you’d need 18 and not just nine ops engineers.

Mitigations for this include:

Do the math again. If the lack of close integration with that discipline is holding back your rate of progress, then you’re losing profits to reduce expenditures – a bad bet for all but the most late-stage companies.
Crosstraining. You may have one ops, or QA, or security expert, but that doesn’t (and, to be opinionated, shouldn’t) mean that they are the only ones who know how to perform that function. When doing this I always used the rule “if you know how to do it, you’re one of the people that should pull that task – and you should learn how to do it.” This can be as simple as when someone wants the QA or ops or whatever engineer to do something, to instead walk the requestor through how to do it.

Experience with Integrated Service Teams

Our SaaS team at NI was fully integrated. That worked great, with experienced and motivated people in a single team, and multiple representatives of each discipline to help reinforce each other and keep developing.

We also fully integrated DevOps into the engineering teams at Bazaarvoice. That didn’t work as well, we saw attrition from those ops engineers from the drawbacks I went over above (managers not knowing what to do with/how to recruit, retain, develop ops engineers). In retrospect we should not have done it and should have stayed with an embedded crossfunctional team in that environment – the QA team did so and while collaboration on the team was slightly impeded they didn’t see the losses the ops side did.

Leave a comment

Filed under Agile, DevOps

Tagged as agile, DevOps, formation, Management, org, organization, structure, team

by Ernest Mueller | October 15, 2015 · 10:00 am

Agile Organization: Embedded Crossfunctional Service Teams

This is the second in the series of deeper-dive articles that are part of Agile Organization Incorporating Various Disciplines. Previously I wrote about the “traditional” siloed model as Agile Organization: Separate Teams By Discipline.

Basic agile doctrine, combined with ITSM thinking, strongly promulgate a model where a team owns a business service or product. I’ll call this a “service team” for the sake of argument, but it applies to products as well.

The Embedded Crossfunctional Team Model

Simply enough, a service team has specialists from other groups – product, QA, Ops, whatever – assigned to it on an exclusive basis. They go sit with (where possible) and participate with the group in their process. The team then contains all the resources required to bring their service or product to completion (in production, for services). The mix of required skills may vary – in this diagram, product #4 is probably a pure back end service and product #3 is probably a very JavaScriptey UI product, for example.

Benefits of Embedded Crossfunctional Teams

This model has a lot of benefits.

Since team members are semi-permanently assigned into the team, they don’t have conflicting work and priorities. They learn to understand what the needs are of that specific product and collaborate much more effectively with all the others working on it.
This leads to removal of many bottlenecks and delays as the team, if it has all the components it needs to deliver its service, can organically assign work in an optimal way.
It is amazing how much faster this model is in practice than the separate teams by discipline model in terms of time to delivery.
Production uptime and performance are improved because the team is “eating its own dog food” and is aware of production issues directly, not as some ticket from some other team that gets routed, ignored, finger-pointed…

Drawbacks of Embedded Crossfunctional Teams

Multiple masters is always an issue for an engineer. Are you trying to please the service team’s manager or your “real” manager, especially when their values seem to conflict? You have to do some political problem-solving then, and engineers hate doing that. It also provides some temptation to double-resource or otherwise have the embedded engineer “do something else” the ops team needs done, violating the single service focus.
It’s more expensive. Yes, if you do this, you need at least one specialist per service team. You have to play man-to-man, not zone, to get the benefits of the approach. This should make you think about how you create your service teams; if you define a service as one specific microservice and you have teams with e.g. 2 devs on them, then embedding specialists is way more expensive. Consider basing things on the business concept of a service and having those devs working on more than one widget, targeting the 2-pizza team size (also those devs will be supporting/sustaining whichever services aren’t brand new). (Note that this is only “more expensive” in the sense that doesn’t bother with ROI and just takes raw costs as king – you get stuff out and start making revenue faster, so it’s really less expensive from a whole-company POV, but from the IT beancounter perspective “what’s profit?”)
While you get benefits of crosstraining across the disciplines, the e.g. ops folks don’t have regular all-day contact with other ops folks and so you need to take care to set up opportunities for the “ops team” to get together, share info, mentor each other, etc. as well. Folks call these “tribes” or “guilds” or similar.

Experience with Crossfunctional Teams

It can be hard at the beginning, when teams don’t understand each others’ discipline or even language yet. I had a lengthy discussion with an application architect on one team – he felt that having ops people in the design reviews he was holding was confusing and derailing. The ops people spoke some weird moon-man language to him and it made the reviews go longer and require a lot more explanation. I said “Yes, they do. But we have two choices – keep doing it together, have people learn each others’ concerns and language, and start a virtuous cycle of collaboration, or split them apart and propagate the vicious cycle we know all too well where we have difficulty working together.” So we powered through that and stayed together, and it all worked out well in the end.

When we piloted this at Bazaarvoice, one of the first ops to embed got in there, worked with the devs, and put all his work into their JIRA project. The devs got sticker shock very quickly once they saw how much work there was in delivering a reliable service – and when they dug into those tickets, they realized that they weren’t BS or busywork, but that when they thought about it they said “yeah… We certainly do need that, all this is for real!” The devs then started pulling tickets on monitoring, backups, provisioning, etc. because they realized all that workload on one person would put their delivery date behind. It was nice to see devs realize all the work that really went into doing ops – “out of sight, out of mind,” and too often devs assume ops don’t do anything except move their files to production on occasion. The embedding allowed them to rally to control their own delivery date instead of just “be blocked on the ops team.”

No one approach is “best,” but in general my experiences so far lead me to consider this one of the better models to use if you can get the organizational buy-in to the fundamental “you built it, you run it” concept.

Leave a comment

Filed under Agile, DevOps

Tagged as agile, DevOps, formation, Management, org, organization, structure, team

by karthequian | June 9, 2015 · 11:21 am

Devops State of the Union 2015

James, Karthik, and Ernest did a Webcast on Devops State of the Union 2015 talk for the BrightTalk Cloud Summit. It went well! We had 187 attendees on the live feed. In this blog post we’ll add resources discussed during the talk and we will seed the comments below with all the questions we received during the webcast and answer them here – you’re all welcome to join in the discussion.

The talk was intended to be an overview of DevOps, with a bunch of blurbs on current and developing trends in DevOps – we don’t go super deep into any one of them (this was only 40 minutes long!). If you didn’t understand something, we’ve added resource links (we got some questions like “what is a container” and “what is a 12-factor app,” we didn’t have time to go into that in great detail so check some of the links below for more.

Resources:

DevOps State of the Union 2015 Slides on SlideShare
What Is DevOps?
The 2014 State of DevOps Report (Puppet, Thoughtworks, IT Revolution Press)
CAMS (John Willis)
The Three Ways (Gene Kim)
The Twelve-Factor App
Docker Questions (Common questions we got during QA at the BrightTalk summit)
Advanced topic: Factorish and the 12-Fakter App, by Paul Czarkoswki
Gartner quotes on DevOps – 20% of G2000, 90% fail without culture shift
Rightscale 2015 State of the Cloud Report
Donkey and Unicorn graphic by Mireille Rae
Knit monkey by Su Ami

22 Comments

Filed under DevOps

Tagged as agile, chatops, DevOps, docker, iot, Security, webcast

by Ernest Mueller | February 27, 2015 · 7:25 am

Awesome Upcoming Austin Techie Events

We’re entering cool event season… I thought I’d mention a bunch of the upcoming major events you may want to know about!

Product Camp Austin, Saturday March 7 (free), for product management types
Puppet Camp Austin, Tuesday March 10 ($50), for Puppet users
SXSW Interactive, March 13-17 ($1295)
Container Days Austin, Friday evening March 27 and Saturday March 28 ($60), for docker, CoreOS, and containerization fanatics.
DevOpsDays Austin 2015, Monday May 4-Tuesday May 5 ($120), for lovers of DevOps. Sponsorships and call for presenters are also open.
Keep Austin Agile 2015 – Friday May 8 ($285), for Agilists of all stripes.

In terms of repeating meetings you should be going to,

CloudAustin – Evening meeting every 3rd Tuesday at Rackspace for cloud and related stuff aficionados! Large group, usually presentations with some discussion.
Agile Austin DevOps SIG – Lunchtime discussion, Lean Coffee style, at BancVue about DevOps. Sometimes fourth Wednesdays, sometimes not. There are a lot of other Agile Austin SIGs and meetings as well.
Austin DevOps – Evening meetup all about DevOps. Day and location vary.
Docker Austin – First Thursday evenings at Rackspace, all about docker.
Product Austin – Usually early in the month at Capital Factory. Product management!

3 Comments

Filed under DevOps

Tagged as agile, austin, Cloud, Conferences, DevOps, devopsdays, docker, product

by Ernest Mueller | September 30, 2014 · 4:16 pm

Scrum for Operations: Just Add DevOps

OK, so it’s been a while since the last installment in this series. I had talked about how we’d brought Scrum to our operations team, which worked fine, and then added in the developers as well, which didn’t. Our first attempt at dividing the whole product up into four integrated DevOps service teams collapsed quickly. Here’s how we got past that to do it again and succeed, with a fully integrated DevOps setup managing both proactive/project/feature work and reactive/support/tactical work in an effective way.

The first challenge we had was just that I couldn’t manage four new scrum teams totally on my own. We had gotten the ops team working well on Scrum but the development team hadn’t been. We didn’t have any scrum masters other than me, and were low on people designated as tech leads as well. So step one, we just mushed everyone into a 30+-person Scrum team, which sucked but was the least of the evils, and I immediately worked on mentoring people up as both Scrum masters and tech leads. I basically led by example and asked for volunteers for who was interested in Scrum mastering and/or being a tech lead for each of the planned sub-teams.

This was interesting – people self-selected in ways I would not have predicted. Some of the willing were employees and others were contractors – fine, as far as I’m concerned. “From each according to his ability, to each according to his need.” I then maximally had them start to lead standups and take ownership over sprints and such, coaching them in the background. In that way, it only took a little while to re-form up into those four teams. This gave some organic burn-in time as well, so that the ops folks got more used to not being “the ops team” and leaning on people that were supposed to be on other sprint teams. As I empowered the new leaders this became self-correcting (“Stop working with that other guy on their stuff, we have things that need doing for our service!”).

The second and largest problem was managing the work.

Part 1: “Planned” Work

Product Management

We had practically zero product manager support at the beginning, so it was up to us to try to balance planned work from many sectors – feature requests from PM, tooling requests from Customer Service and Technical Support, security stuff from Security, random requests from Legal and Compliance, brainwaves from other products’ engineering managers. It quickly began to take a full person worth of time to handle, because if not managed proactively it turned into attending meetings with people complaining as a reactive result. I went to the PM team and said “come on man, give us more PM support,” and once we got one, I worked with her on being able to manage the whole overall package.

One of the chronic product manager problems in a SaaS environment is the “not my domain” syndrome. PMs want to talk about features, but when it comes to balancing stakeholders like Security and internal operational projects, they wash their hands of it. At best you get a “Well… You guys get 50% of your time to figure out what to do with that stuff and then 50% of your bandwidth will be my features, OK?”

For the record, that’s not OK. As a product manager, you need to be able to balance all the work against all the work. Maybe you don’t have an ops background, that’s fine – you probably didn’t have a <business domain here> background when you came to work either. Learn. A lot of the success of a SaaS product is in the balancing of features against stability/scalability work against compliance work… If you want to take the “I’m the CEO of the product” role, then you need to step up and own all of it, otherwise you’re just that product’s Director of Wishful Thinking.

Anyway, luckily our PM, even though she didn’t have experience doing that, was willing to learn and take it on. So we could reason about spending a month putting in a new cluster versus adding a new display feature – they all went into the same roadmap.

Work Management in JIRA

We managed that in the standard, simple way of a quarterly spreadsheet roadmap correlating to epics in an Agile rapid board in JIRA; we’d do stakeholder meetings and stack rank them and then move them into the team backlogs and flesh them out when they were ready for execution. (It was important to keep the clutter out of the backlogs till then – even if it’s supposed to be future stuff, the more items sitting in a team’s backlog, the worse their focus is.)

We kept each service as one JIRA project but combined them into rapid boards as necessary – a given team might own a couple (like the “Workbench and CMS Team” had those two and a smaller tooling one). This way when we transferred a piece of tech around we could just move the JIRA project as well, and incorporate it into the target team’s rapid board.

Portfolio Management

Some people say that the ideal world is each team owning one microservice. I don’t agree with this – we had a number of teams under other parts of the org that were only like 2 people because they owned some little service. This was difficult to sustain and transition; when things like Black Friday came up that required 24×7 support from each team for a week it was brutal on them, and even worse once development eased up on a service it just got orphaned.

If you don’t keep a service portfolio and tightly manage who is tasked with supporting each one, you are opening yourself up to a world of hurt. And that’s where we started. Departmentwide, we’d have teams work on something and then wander off and if there was a problem they’d call on the dev who was somewhere working on some other deadline now. This worked terribly. I got the brunt of this earlier when I had joined the company as a release manager and asked “So, those core services, who owns those?” “Well… No one. I mean everyone!”

So for my teams, we put together a tracking list, used both as a service registry but also for cross-training. It was a simple three-level outline, with major services, service components, and low level elements. We had the whole team work together on filling out the list – since we were managing a 7-year-old legacy system we ended up with a list of 275 or so leaf items. Every one had to have an owning team (team, not individual), and unless you retired a service, there was no dropping it on the floor. You owned it, you retired or transitioned it, and I didn’t care if “the dev that worked on that moved to some other team.” Everything was included, from end user facing services like “the user portal” to internal services like our CI servers.

Team Management

This transitions into how we managed the teams. Teams were standard “two-pizza size” – a team of 5-7 people is optimal and we would combine services up till there was enough work for a team of that size. This avoided the poor coverage of mini-teams for micro-services.

Knowledge Management

Then we also used the service registry as the “merit badge system.” We had a simple qualification procedure – to get qualified in an element, you had to roll out code on it and you could sign off for yourself to say that you were qualified on one of those leaf elements. To get your “merit badge” in a service component, you needed an existing subject matter expert (SME) to sign off that you knew what you were doing, and you needed to understand how the service was written, deployed, tested, monitored. To become a SME in a service, the existing SMEs would have a board of review. SMEs were then understood to be qualified to make architectural changes on their own initiative, those with merit badges were understood to be qualified to make code changes with nothing more than the usual code review and CI testing.

This was very important for us because we were starting in a place where people had been allowed to specialize too much. Only one person was the SME on any given service and if the person who didn’t understand the Workbench was out, or quit, or whatever, suddenly no one knew a whole lot about it. That’s a terrible velocity burden even without attrition and it’s a clear and present danger to your service’s viability when there is. We started tracking the “merit badges” and had engineers set goals around how many they’d earn (or how many they’d review, for the more experienced). We used a lot of contract programmers and I told the contractor manager that I wanted to use that to rate the expertise of people on our account and that I wanted to see the numbers rise on each person over time.

Part 2 – “Unplanned” Work

Our team was only doing planned work 40% of the time, however. Since we were integrated DevOps teams working on a service with thousands of paying end customers, and that service was custom-integrated with customer Web sites and import/export of customer data, there was a continuous load of work coming in from Customer Support and from our own operations (alerts, etc.). All this “flow” work was interrupt-driven, and urgent to one degree or another.

The usual techniques for handling this have a lot of problems. “Let’s make a separate sustaining team!” Well, this turns into a set of second class developers. Thus you get worse devs there on that team. And those devs are more utilized one week, less the next; when things are quiet they get seconded to other efforts by people that haven’t read the Principles of Product Development Flow and think having everyone highly utilized is valuable, and then the load ramps up and something gives… Plans emerge to make it a training ground for new devs, till you realize that even new devs don’t want to put up with that and just quit… I’ve seen this happen in several places. I am a firm believer in dog fooding – if you are writing a service, you need to handle its problems. If there are a lot of problems, fix it!!! If you are writing the next version of the service – well, doing that in isolation from the previous service dooms you to making the same errors, adding in some new ones as well. No sustaining teams, only evergreen teams.

So we had the rule that each integrated team of devs and ops handled their own dirty laundry. And at 60% of the workload, that was a lot of laundry. How did we do it? Everyone was worried about how we could deliver decent feature velocity while overwhelmed by flow work. Therefore…

Triage

In addition to the teams’ daily standups, we had a daily triage meeting where the engineering manager, team leads, PM, and representatives from Support, Sales, and whoever had a crisis item that they felt needed to be handled on an expedited basis (not waiting on the next sprint, in other words) would come to. Each new intake would be reviewed. In the beginning we had to kick a lot of requests back for insufficient detail, but that corrected itself fast. We’d all agree on priority – “that’s affecting one customer but they’re one of our top customers, we’ll work it as P2” or the like.

For customer reported issues, we had SLAs we committed to in contracts (1 day for P1, etc). Ops issues could be super urgent – “P1, one of our services is down!” – or doable later on. So what we did was to create a separate Kanban board in JIRA for all these kinds of issues. Anything that really could wait or was large/long term would get migrated into the Scrum backlogs, but anything where “we need someone to do this stat” went in here. It served the same purpose as an “expedite lane,” it was just a jumbo four-lane highway of an expedite lane because so much of the work was interrupt driven.

But does this mean “give up on Scrum?” No. Without the sprint cadence it was hard to hit a release cadence, and also it was easy for engineers to just get lost in the soup and stop delivering at a good rate, frankly. So we still ran sprints, and here was the rules of engagement we provided the engineers.

Need More Work?

Pull things for your team/service off the triage queue
If there’s nothing in the triage queue, pull the next item from the sprint backlog
If there’s nothing in the sprint backlog or the triage queue, pull something off the top of the backlog. Or relax, either one.

Then for standups, we had a master Agile board that contained everything – all projects, the triage board, everything. So when you looked at a given engineer’s swimlane, you could see the sprint work they had, the flow work they had, and anything they were working on from someone else’s project (“Hey, why are you doing that?”). Again, via JIRA agile board composition that’s easy to do. Sometimes teams would try to do standups just looking at swimlanes containing “their” projects and it always ended up with things falling in the gap, so each time that happened I reiterated the value of seeing everything that person is assigned, not just what you think/hope they are assigned, since they are exclusively attached to that one team.

At first, everyone fretted about the conflict between the flow work and the sprint work. “What if there’s a lot of support work today?!?” But as we went on sprint by sprint, it became evident that over the course of a two week sprint, the amount of support and operations work evened out. Sprint velocity was regular despite the team working flow work as well. Having full-sized sprint teams and two-week iterations meant that even if one day there was a big production issue and whoever grabbed it was slammed – over the rest of the time and people it evened out. This shouldn’t be too surprising, it’s called “flow” for a reason – there are small ebbs and surges but in general the amount over time at scale is the same. Was it as perfectly “efficient” as having some people working flow and others working sprints? No, there is definitely a % overhead that incurs. But maximum utilization of each person’s time, as I mentioned before, is a golden calf. Lean principles show us that we want the best overall outcome – and this worked.

Flow work was addressed quickly, and our customer ticket SLA attainment percentage whipped up from a starting level of less than 50% over several quarters to sit at 100%, meaning every single support ticket that came in was addressed within its advertised SLA. Once that number hit 100% the support ticket time to live started to fall as well.

At the same time, sprint velocity for each of the four sprint teams went up over time – that is, just the story points they were delivering out of the feature backlog improved, in addition to the improvements in flow work. We’d modify the process based on engineer feedback and try alterations for a sprint to see how they panned out, but in general by keeping the overall frame in place long enough that people got used to it, they became more and more productive. Flow work and planned work both improved at the same time, not at each others’ expense.

The Dynamic Duo

This scheme had two issues. One was that engineers were sometimes confused about what they should be working on, flow tickets with SLAs or their sprint tasks. Our Scrum masters were engineers on the teams too, they weren’t full time PMs that could afford to be manually managing everyone’s work. The second was that operational issues that came in and required sub-day response time couldn’t wait for triage and ended up with either frantic searches for anyone who could help (which often became “everyone sitting in the team area”) or missed items.

I have always been inspired by Tom Limoncelli’s Time Management for System Administrators. He advocates an “interrupt shield” where someone is designated as the person to handle walkups and crises. At NI I had instituted this in process, at BV in the previous Ops team there had been a “The Dude” role (complete with Jeff Bridges bobblehead) that had been that. Thus the Dynamic Duo were born.

The teams each had one dev and one DevOps on call at a given time; we managed the schedule in PagerDuty. Whoever was on call that week became “on the Dynamic Duo” during the days as well. When we went into a sprint, they would not pull sprint tasks and would be dedicated to operational and urgent support issues. It was the Dynamic Duo because we needed someone with ops and someone with dev expertise on this – one without the other meant problems were not effectively solved. I even made a cool wiki page with Batman and Robin stuff all over it and we got Bat-phones painted red. I evangelized this inside the company. “Don’t walk up and grab a dev with your question. Don’t chat your favorite engineer to get them to work on something. Come turn on the Bat-signal, or call the Bat-phone, and the Dynamic Duo will come help you.”

This was good because it blunted the sharp tip of very urgent requests. The remaining flow work (2 people couldn’t handle the 60% of the load that was interrupt driven) was easier for the sprinting devs to pull in as they finished sprint tasks without worrying about timeliness – the real crises were handled by the Dynamic Duo. The Duo also ran the triage meeting and even if they weren’t working on all the triage work, they bird dogged it as a kind of scrum/kanban/project manager over the flow work. In the rare case there wasn’t anything for them to jump on, they could slack – time to read, learn, etc. both as compensation for the oncall and adrenaline rushes that week but also because it’s hard to fit time for that into sprints… And as we know from the Principles of Product Development Flow, running teams at too high of a utilization is actually counterproductive.

Conclusion

That’s the short form of how we did it – I wanted to do a lot more but I realized that since it’s been a year since I intended to write this, I’d better shake and bake and get it out and then I’m happy to follow up with stuff any of you are curious about!

This got us to a pretty happy place. As time went on we tweaked the sprint process, the triage process, and the oncall/Duo process but for a set of teams of our size with our kind of workload it was close to an optimal solution. With largely the same team on the same product, the results of these process changes were:

Flow work improved as measured by customer ticket SLA attainment and other metrics
Sprint work velocity improved as measured by JIRA reports
Engineering satisfaction improved as measured by internal NPS surveys

Improvement of all these factors was not slight, but was instead 50% or more in all cases.

Feel free and ask me about parts of this you find interesting and I’ll try to expand on them. It wasn’t as simple as “add Agile” or “add DevOps,” it definitely took some custom wrangling to balance our specific SaaS service’s needs in the best manner.

3 Comments

Filed under Agile, DevOps

Tagged as agile, development, DevOps, Management, Operations, scrum, scrum4ops, system administration, systems

Tag Archives: agile

What Is A Blocker?

DevOps Foundations: Lean and Agile

New DevOps Courses In The Can!

Agile Organization: Project Based Organization

The Project Based Organization Model

Benefits Of The Project Based Model

Drawbacks Of The Project Based Model

Experience With The Project Based Model

Agile Organization: Fully Integrated Service Teams

The Fully Integrated Service Team Model

Benefits Of Integrated Service Teams

Drawbacks of Integrated Service Teams

Experience with Integrated Service Teams

Agile Organization: Embedded Crossfunctional Service Teams

The Embedded Crossfunctional Team Model

Benefits of Embedded Crossfunctional Teams

Drawbacks of Embedded Crossfunctional Teams

Experience with Crossfunctional Teams

Devops State of the Union 2015

Scrum for Operations: Just Add DevOps

Part 1: “Planned” Work

Product Management

Work Management in JIRA

Portfolio Management

Team Management

Knowledge Management

Part 2 – “Unplanned” Work

Triage

The Dynamic Duo

Conclusion

Subscribe

Recent Comments

Recent Posts

Austinites

Cloud

DevOps

Archives