The other day I came across two interesting articles that showcase two facets of one problem (and more notably, a problem that I have been working on myself). Read the two articles, they are:
- Can Sustaining Engineering Be Agile and Engaging For Team Members by Matt Roberts
- Legacy Application Strangulation by Paul Hammant
I manage a large mostly-sustaining team here at Bazaarvoice that I’ve moved to Agile and DevOps. As Matt points out, sustaining teams are problematic in theory. The strangulation approach, especially the airline booking app “single trunk” approach, is better from a number of perspectives. But, our org made the decision to put all the legacy work with a sustaining team so that the many teams of new-product devs would be able to get maximum speed. It did allow for greater speed of new development to not have to do support at the same time. However, it also provided significant challenges – initially underestimating the effort needed to sustain, new teams not having the benefit of the lessons old team already learned from running at scale, sustaining teams feeling like second class citizens, other teams being tempted to shed even newer work to the sustaining team (even though it’s technically just for the one product). I can’t prove that taking the strangulation vs sustaining approach would have been better, but in retrospect, I would want to try that instead. We are strangling old vs new product from a customer-facing point of view in terms of dialing up new products/dialing down old ones instead of doing “big bang” upgrades, but we’re not doing it inside a single team/single trunk model like Matt mentions and it seems like that could mitigate many of these issues.
We are making the best of the sustaining gig on our team, however. It’s not light work or lacking in innovation. We run support requests through a kanban and then have two scrum-type sprint teams for CI and occasional feature work, plus lots of infrastructure work. We dole out up to a billion hits a day and have a reach of 400M+ users, with traffic and data volume doubling year over year, so we are in the interesting position of being largely frozen in terms of features, how most product managers understand them (pretty buttons!), but having to innovate and rearchitect quite aggressively on all our “nonfunctional” areas (performance, availability, security, etc.). When people tell us we’re “feature frozen” I tell them they have a poor understanding of the word “feature,” or maybe they should think about “changes” rather than “features.” This is one of the key DevOps culture change points many orgs have to face, and educating PMs and upper management on a more holistic definition of “feature” that includes managing nonfunctional requirements is a key success factor.
We’re also doing a number of the things Matt’s article encourages to make sustaining work engaging. We push hard on customer satisfaction (we are riding at 99% of customer tickets fulfilled within SLA; we have a big dashboard with leaderboards that promote that), empower the team to perform continuous improvement to make the system better, and consult with the “next gen” teams on their work. As a result we have really good results, really good relationships with the Implementation and Support groups outside Engineering, and pretty good team morale. Of course, general recognition and stuff like that so that everyone sees and appreciates the team’s work helps.
Though in the end, we are also trying to outsource the sustaining work so that our engineers aren’t all sad from having to do it. (Our current team soldiers on because they know the company depends on them, but other engineers in the company don’t want to do move over to do sustaining work,.) So… There’s that. Our job is to juggle the desire of employees to move off sustaining plus the desire of other teams to get those employees and developing the outsourcers’ expertise with the needs of maintaining the legacy app with the excellence required.
From what I’ve learned from this, I believe a solid product renewal plan would involve:
a) Teams that own services, not apps or projects (ITSM/ITIL 101).
b) Those teams own the design, development, sustaining, operations, deployment, and whatever other task you want to apply to the product – from conception to delivery.
c) Every app and library and service and tool has to be owned by an appropriate service team, regardless of what engineer moved to what team or corporate reprioritization happened or whatever completely-legitimate corporate sob story you have.
d) Then if you need to make a major sea change, you employ the strangulation method to transition effort on a team, not using a separate sustaining team.
The risk with this approach is that a team gets filled up with sustaining work. But that is a chance for them to eat their own dog food. Go fix whatever’s causing that sustaining work! Retire the stuff that doesn’t make sense any more! Passing completed items off into a black hole for “sustaining” chews up just as much resources and time, it just provides the convenient fiction that since you can’t see it, it must not be affecting your velocity.
What do you think? How have you approached this problem? Am I on crack? Let me know.