Tag Archives: DevOps

Loose Lips Sink Ships: Precision in Language

We talk a lot about communication in the DevOps space. Often the discussion is about how to “talk nice” and be collaborative, but I wanted to take a minute to talk about something even more important – the precision of your communication in the first place.

I was reminded of this because over the holidays I’ve been doing some nonfiction reading – first, Thomas J. Cutler’s The Battle of Leyte Gulf, and after that, Cornelius Ryan’s classic A Bridge Too Far.

Both books are rife with examples of how imprecise language – often at the highest levels – caused terrible errors and loss of life.

The example I’ll unpack is the infamous (to WWII wonks, at least) case of Admiral “Bull” Halsey’s naval charge northward, which left the landing forces at Leyte Gulf in the Philippines completely unprotected. He had previously sent a message to his superiors indicating his intent to split his forces from three task groups into four. When he later sent a message that he was proceeding north with “three task forces” to pursue Japanese aircraft carriers, Admiral Nimitz and the other commanders assumed that the fourth group had already been formed and was being left behind to defend the landing forces. But this was not the case – his first message had meant “I plan to reform into four groups, you know, sometime in the future, but not now.” So he went north with his entire fleet, leaving no one behind to defend the landing forces.

By the time the others began to suss out that something was wrong, it was too late – Halsey was too far north to return in time when a large Japanese fleet came upon the much lighter landing forces and forced them into a lengthy fighting retreat. Many American ships sacrificed themselves against an overwhelming opponent to buy time for the doomed task force, until, for reasons still not made clear to posterity, Japanese Admiral Kurita broke off the attack. (Extensive historical analysis hasn’t clearly determined whether he had bad intel, suffered from battle fatigue, or simply lost his nerve. The after-action report of Admiral Sprague of the landing forces simply stated that “The failure of the enemy… to wipe out all vessels of this task unit can be attributed to our successful smoke-screen, our torpedo counterattack, continuous harassment of the enemy by bomb, torpedo, and strafing air attacks, timely maneuvers, and the definite partiality of Almighty God.”) A happier ending than it could have been, but still, ships sank and men died due to a lack of clear and precise communication.

The same kind of miscommunication happens at work all the time. A recent real example – I was incident commander on a production incident where a customer-facing server couldn’t be reached from a support server used to perform interactive customer debugging sessions. The response was being coordinated through HipChat. One engineer was looking at the problem from the customer-facing server. One of the engineers who administers the support server came on to chat and a discussion ensued. The support engineer indicated that there were hundreds of interactive customer support sessions in progress on that support server. Another senior engineer, an expert on the customer servers, declared, “Well, let’s try rebooting the server. It’s the only thing to do.” “Really?” asked the support engineer. “Yes, let’s do it,” said the senior engineer.

Of course he meant the customer-facing server, which we knew didn’t have any traffic at the moment, and not the support server. But the support engineer assumed he meant the support server, and began to reboot it – at this point I intervened and had to say “STOP, HE IS NOT TALKING ABOUT YOUR SERVER.” But it was a reasonable mistake – if you talk about “the server,” then everyone is going to interpret that as “the server most important to me at this time.” Anyway, we stopped the support engineer from disrupting the hundreds of other customer support sessions going on via his server, and we restarted the customer-facing server instead. (Which didn’t help, of course – “reboot the server” as a fix went out with Windows NT; it’s a desperate attempt to avoid quality troubleshooting – but that’s material for another post.)

If at any time you find yourself talking to someone else about something there’s clearly more than one of:

  • “the” server – or even “the” support server or “the” bastion server
  • “the” incident/problem
  • “that” ticket
  • “that” email I sent you
  • “the” bug
  • “the” build

You are being a dumbass. I can’t be blunt enough about this. You are deliberately courting miscommunication that in the best case adds time and friction, and in the worst case causes major errors to be made. I have seen costly errors happen because of real-world exchanges just like these:

“Are you working on the problem?”  “Yes.”  (And by this they mean the other problem the asker doesn’t know about.)

“It’s critical we fix that big bug from this morning!” “Yes, someone’s on it!” (And by this they mean the bug that got ranked as high criticality in the backlog, but the asker means their own special favorite bug.)

“Is the server back up?”  “Yes.” “OK I will now begin a lengthy operation… Hey what the hell?!?”  (Oh, you meant another one of the 5 servers being restarted this hour.)

Everyone should know, in a sober moment, that in even a moderately sized environment there are often multiple incidents going on in a day (and even at the same time), that there are multiple servers and multiple tickets (and apps and environments and builds and developers and customers…). But they forget and default back to “the world revolves around me” and assume you share that context.

An example. If you hit me up in chat and say “the build broke, what could be wrong,” and our build server has somewhere around 100 builds in it, all of which are breaking and being fixed continuously by a large group of developers – at best you are being super inconsiderate and forcing me to do diagnostic work that could be forestalled by you cutting and pasting a URL or whatnot from your browser, and at worst I’m going to go look at some other build instead and you’ll get the help you need much later than you wanted it.

Even better, if you’ve sent me 10 emails that morning, asking me “did you get that last email I sent you” is pointless verbal masturbation of an excessively time-wasting variety.  Either I’m going to assume “yes” and potentially give you the wrong answer, or I have to spend time trying to elicit identifying information from you. Don’t do it. Use your words.

Be Specific

I expect crisp and exact communication out of my engineers, and it’s as simple as hearkening back to grade school and using proper nouns.

You are working on bug BUG-123, “Subject line of the bug.”   You are logged into server i-123456 as ec2-user. You are working on incident I-23, “Customer X’s system is down.” You sent me an email around 10 AM with the subject line “Emergency audit response needed.” SAY THAT.

Being in chat vs email vs in person/phone/video call is not an excuse.  This is not a “tone formality” issue, it is a precision and correctness issue.

Having people with different primary languages is not an excuse. Unless you’re from El-Adrel, your language has proper nouns and you know how to use them, and you know which server you’re logged into. (Well, OK, you might not, but that’s a different problem.) Slang can be a language problem, idiom can be a language problem, but being able to clearly state the proper noun at hand is NOT a language problem. Proper nouns are the one thing that never needs translating!

Detect Fuzzy Language And Make It Specific

But what if you’re not the one initiating the discussion? What if someone else is already giving you unclear direction or questions? Well, I’m going to let you in on a little secret – managers are often terrible at this. As with Admiral Halsey, being “hard charging” sometimes comes packaged with a lack of clarity.

Sometimes it seems like the higher the level of management someone’s in, the more they think the general rules of discourse and process don’t apply to them. You must not be “in tune with the business needs” if you aren’t sure exactly which thing they are vaguely referring to. Frustrating, but you just need to manage up by starting to be precise yourself. Don’t compound the possible error by making an assumption; if there’s not a proper noun in play, you need to state one yourself to ensure precision.

“Customer X is angry, are you working on that bug?!?”  Obviously “yes” or “no” is probably a bad answer that, if you’re not on the same page already, will continue to cause confusion. Reply instead, “I am working on bug BUG-123, ‘Subject line of the bug.’  Is that the one you’re talking about?”

“Did you get that last email, I expect a response!” says the call or text. Well, just saying “yes” because I responded to an email from that person an hour ago isn’t right – it might not be the one they mean. Get clarification.

Whatever happens, try to verify the information you’re being given. “Reboot the server!” “To confirm, you want me to reboot server i-123456, the customer support server, right?”

Ask for clarification quickly.  In the example above, Nimitz and the other commanders just let the imprecision slide, thinking “Well…  I’m sure he meant the fourth task group is staying…  I don’t want to look dumb by asking or make him look bad by having to clarify.  It’ll all be fine.”  And by the time they got nervous enough to ask him directly it was too late for him to return in time and people died as a result.

Now sometimes issues are urgent and the person sending the request/demand doesn’t respond to your request for clarification in a timely manner, even if you respond immediately.  What do you do?  Well, you have to do what you think is best/safest, but try to get things back on the right path ASAP by using all the communication paths at hand to clarify. If the request was unclear, do not let your response be unclear.  “Restart the server!”  “Done.”  You’re both culpable at this point. “I restarted server i-123456 at 10:50 AM” puts you in the right.

Then afterwards, send the person this article so they can understand how important precision is. Luckily, in IT it is usually less likely to cause deaths than in the military, but poor communication can and has cost many dollars, hours, and jobs.


DevOps Fundamentals Going Strong!

Our LinkedIn Learning content manager, Jeff Kellum, tells James and me that our DevOps Fundamentals course is “the third most popular IT course in our library right now”! You can start a free trial period at Lynda by going to http://www.lynda.com/trial/ErnestMueller. We’ll post more about the experience of making the course – it was a lot of fun and we learned a lot about going in front of the camera!


A New Way To Learn DevOps!

[Photo: James Wickett and Ernest Mueller on the set for DevOps Fundamentals]

We’re really excited to announce that James Wickett and I (Ernest Mueller) from The Agile Admin have put together a comprehensive DevOps Fundamentals course for lynda.com – a three-hour course covering everything from DevOps’ Agile and Lean roots, to DevOps culture, to book recommendations, and we even cover the future of DevOps.

As you know we here at The Agile Admin have spent a lot of time trying to help people learn DevOps – for a variety of reasons, many of the original DevOps practitioners were reluctant to even define the term, and were against a lot of the “DevOps” training/certification programs that sprung up because they weren’t really a good reflection of the real scope of the movement. While we understand those factors and agree with some of the specific critiques, we think it’s frankly been criminally difficult to learn DevOps with the available resources to date (best answer: go to a variety of events, crawl some random blogs and twitters, try to piece it together yourself over time, read some kinda-related books…).  The unicorns don’t need any more than that, but all four of the Agile Admins have worked in corporate IT before and have a lot of sympathy for all the folks out there that *don’t* work for Etsy or Netflix and are trying to figure out how this new world can make their work and life better.

In the course, we go into what we consider to be the three primary practice areas of DevOps – continuous delivery, infrastructure automation, and reliability engineering. lynda.com has a free trial period, so feel free to give it a look and see if it could help you!

To give you an idea of what is included in the course, here’s the course outline. Even in a three-hour class there’s no way to comprehensively cover these topics, so we tried to point you at other resources as we go, and we have a whole section on great DevOps learning resources.

  • Introduction
  • DevOps Basics
    • What Is DevOps?
    • DevOps Core Values: CAMS
    • DevOps Principles: The Three Ways
    • Your DevOps Playbook
    • 10 Practices For DevOps Success (Part 1)
    • 10 Practices for DevOps Success (Part 2)
    • DevOps Tools: The Cart Or The Horse?
  • DevOps: A Culture Problem
    • The IT Crowd and the Coming Storm
    • Use Your Words
    • Do Unto Others
    • Throwing Things Over Walls
    • Kaizen: Continuous Improvement
  • The Building Blocks of DevOps
    • DevOps Building Block: Agile
    • DevOps Building Block: Lean
    • ITIL, ITSM, and the SDLC
  • Infrastructure Automation
    • Infrastructure As Code
    • Golden Image to Foil Ball
    • Immutable Deployment
    • Your Infrastructure Toolchain
  • Continuous Deployment
    • Small + Fast = Better
    • Continuous Integration Practices
    • The Continuous Delivery Pipeline
    • The Role of QA
    • Your CI Toolchain
  • Reliability Engineering
    • Engineering Doesn’t End With Deployment
    • Design for Operation: Theory
    • Design for Operation: Practice
    • Operate for Design: Metrics and Monitoring
    • Operate for Design: Logging
    • Your SRE Toolchain
  • Additional DevOps Resources
    • Unicorns, Horses, and Donkeys, Oh My
    • The 10 Best DevOps Books You Need To Read
    • Navigating the Series of Tubes
  • The Future of DevOps
    • Cloud to Containers to Serverless
    • The Rugged Frontier of DevOps: Security
  • Conclusion
    • Next Steps: Am I a DevOp Now?

We worked long and hard on the course and we think it represents all the must-know aspects of DevOps and can get you started down the path of implementation with a good foundation.  Check it out and let us know what you think!


Three Upcoming DevOps Events You Should Attend

I wanted to mention a couple of Austin-area events folks should be aware of – and one international one! November is full of DevOps goodness, so come to some or all of these…

The international one is All Day DevOps on Tuesday, November 15, 2016 – one long day (AMER and EMEA hours) of a free, three-track online conference. It has all the heavy-hitter presenters you’d expect from going to Velocity or a DevOpsDays or whatnot, but streaming free to all. Sign up and figure out what you want to watch in what slot now! James, Karthik, and I are curating and hosting the Infrastructure track so, you know, err on that side 🙂 There are nearly 5,000 people signed up already, so it should be lively!

Then there’s CD Summit Austin 2016. There’s a regional IT conference called Innotech, and devops.com came up with the great idea of running a DevOps event alongside it. It’s Wednesday, November 16 (workshops) and Thursday, November 17 (conference) in the Austin Convention Center. All four of the Agile Admins will be doing a panel on “The Evolution of Agility” at 11:20 on Thursday, so come on out! It’s cheap – even both days together are like $179.

But before all that – the best little application security convention in Texas (or frankly anywhere, for my money) – LASCON is next week! Tues and Wed, Nov 1-2, are workshop days, and Thu-Fri, Nov 3-4, are the conference days. On Friday I’m doing the Lean Security talk I did at RSA last fall, and James is speaking on serverless on Thursday. $299 for the two conference days.

Loads of great stuff for all this month!


The DevOps Handbook


I haz it!

It, of course, is the new DevOps Handbook, in which luminaries Gene Kim, Patrick Debois, John Willis, John Allspaw, and Jez Humble put together a single coherent guide to understanding and implementing DevOps. Most of the “DevOps” books to date have really just nibbled around the edges of DevOps instead of addressing its entire scope head on. This book does so, and will become the standard reference in anyone’s DevOps library.  Get it on Amazon or elsewhere!


Lean Security

James and I have been talking lately about the conjunction of Lean and Security.  The InfoSec world is changing rapidly, and just as DevOps has incorporated Lean techniques into the systems world, we feel that security has a lot to gain from doing the same.

We did a 20-minute talk on the subject at RSA; you can check out the slides and/or watch the video:

While we were there we were interviewed by Derek Weeks.  Read his blog post with a transcript of the interview, and/or watch the interview video!

Back here in Austin, I did an hour-long extended version of the talk for the local OWASP chapter.  Here’s a blog writeup from Kate Brew, and the slides and video:

We’ll be writing more about it here, but we wanted to get a content dump out to those who want it!


DevOps Enterprise Summit Videos Are Up

There’s a crop of great talks from this event, check them out here. If you look really hard you can see my talk too!


Docker: Service Management Trumps Configuration Management

Docker – It’s Lighter, Is That Really Why It’s So Awesome?

When Docker started to hit it big, I have to admit I initially wondered why. The first thing people would say when they wanted to talk about its value was “There’s slightly less overhead than virtualization!” Uh… great? But chroot jails and the like have existed for a long time – since back when I got started on UNIX – and they fell out of use for a reason. Nor had there been a high-pressure rush of virtualization and cloud vendors trying to keep up with demand for “leaner” VMs – there was some work to that end, but it clearly wasn’t a huge customer driver. If you cared that much about the overhead, you had the other extreme of just jamming multiple apps onto one box, old school style. Now, you don’t want to do that – I worked in an environment where that was the policy, and I developed my architectural doctrine of “sharing is the devil” as a result. While running apps on bare metal is fast and cost effective, the reliability, security, and manageability impediments are significant. But “here’s another option on the spectrum of hardware to VM” doesn’t seem that transformative on its face.

OK, so Docker is a process wrapper that hits the middle ground between a larger, slower VM and running unprotected on hardware. But it’s more than that. Docker also lets you easily create packaged, portable, ready-to-run applications.

The Problems With Configuration Management Today

The value proposition of Docker started to become clearer once the topic of provisioning and deployment came up. Managing systems, applications, and application deployments has been at worst a complicated muddle of manual installation, and at best a mix of very complex configuration management systems and baked images (VMs or AMIs). Engineers skilled in Chef or Puppet are rare, and developers want faster turnaround to deploy their applications. I’ve worked in various places where the CM system did app deploys, but the developers really, really wanted to bypass that via something like Capistrano or a direct drop-in to Tomcat, and there were continuous discussions over whether there should be dual tooling – a CM system for system configuration and an app deployment system for app deploys. Having two different kinds of tooling controlling your configuration (especially when, frankly, the apps are the most important part) leads to a lot of conflict, confusion, and problems in the long term.

And many folks don’t need the more complex CM functionality. Many modern workloads don’t need a lot of OS and OS software – enterprise apps often do, but many new apps are completely self-contained, even to the point of running their own node.js or Jetty, meaning a lot of CM’s complexity is unnecessary if you’re just going to drop a file onto a vanilla box and run it. And then there’s the question of orchestration. Most CM systems like to put bits on boxes, but once there’s a running, interconnected system, they are more difficult to deal with. Many discussions about orchestration over the years were frankly rebuffed by the major CM vendors with “well, then integrate with something (mcollective, rundeck).” In fact, this led to newer products like Ansible and Salt arising – they are simpler and more orchestration-focused.

But putting bits on boxes is only the first step.  Being able to operate a running system is more important.

Service Management

Back when all of the agile admins were working at National Instruments, we were starting a new cloud project and wanted to automate everything from first principles. We looked at Chef and Puppet, but first, we needed Windows support (this was back in 2008, and their Windows support was minimal), and second, we had the realization that a running cloud, REST-services-type system is composed of various interdependent services, and we wanted to model that explicitly. We wanted more than configuration management – we wanted service management.

What does it look like when you draw out your systems?  A box and line diagram, right? Here’s part of such a diagram from our systems back then.

[Diagram: box-and-line physical/logical diagram of part of our systems]

Well, not to oversimplify, but when you use something like CloudFormation, you get the yellow boxes (hardware, VMs, cloud instances). When you use something like chef or puppet, you get the white boxes (software on the box). But what about the lines? The point of all those bits is to create services, which are called by customers and/or other services, and being able to address those services and the relationships between them is super important. And trying to change any of the yellow or white boxes without intelligent orchestration to handle the lines – what makes your services actually work – is folly.

In our case, we built the Programmable Infrastructure Environment – we modeled the above using XML files and then added a ZooKeeper-based service registry to handle the connections, so that we could literally replace a database server and have all the other services dependent on it detect that, automatically re-parse their configurations, restart themselves if necessary, and connect to the new one.
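
To make that pattern concrete, here’s a minimal sketch of the registry idea in Python using the kazoo ZooKeeper client – this is not our actual PIE code (which was XML-driven), and the hosts, znode paths, and service names are hypothetical:

# Minimal service-registry sketch using kazoo, a Python ZooKeeper client.
# Hosts, paths, and service names are hypothetical illustrations.
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181")
zk.start()

# A database server registers itself with an ephemeral znode; if the
# server dies or is replaced, the znode disappears automatically.
zk.ensure_path("/services/customer-db")
zk.create("/services/customer-db/instance",
          b"db-new.example.com:5432", ephemeral=True, sequence=True)

# A dependent service watches the registry; whenever the set of
# instances changes, it re-reads the endpoints and reconfigures itself.
@zk.ChildrenWatch("/services/customer-db")
def on_db_change(children):
    for child in children:
        data, _ = zk.get("/services/customer-db/" + child)
        print("reconfiguring to use database at", data.decode())

The ephemeral-node trick is what makes “replace a database server and have the dependents notice” automatic, rather than something a CM run has to discover after the fact.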

This revolutionized the way we ran systems. It was very successful, and it was night-and-day different from the usual methods of provisioning and, more importantly, of controlling production systems in the face of both planned and unplanned changes. It allowed us to instantiate truly identical environments, conduct complex deployments without downtime, and collaborate easily between developers and operations staff on a single model in source control that dictated all parts of the system, from system software to third-party software to custom applications.

That, in conjunction with ephemeral cloud systems, also made our need for CM a lot simpler – not like a university lab where you want everything to always be converging to whatever new yum update is out, but creating specifically tooled parts for one use, then making new ones and throwing the old ones away as needed. Since we worked at National Instruments, this struck us as the same spectrum as going from hand-built hardware boards to FPGAs to custom chips – the last is faster and cheaper, and you basically throw it away for a new one when you need a change, though the others are necessary steps along the path to creating a solid chip.

We kept wondering when the service management approach would catch on.  Ubuntu’s Juju works in this way, but stayed limited to Ubuntu for a long time (though it got better recently!) and hasn’t gotten much reach as a result.

Once Docker came out – lo and behold, we started to see that pattern again!

Docker and Service Management

Dockerfiles are simple CM systems that pull some packages, install some software, and open some ports. Here’s an example of a Dockerfile for HAProxy:

#
# Haproxy Dockerfile
#
# https://github.com/dockerfile/haproxy
#

# Pull base image.
FROM dockerfile/ubuntu

# Install Haproxy.
RUN \
  sed -i 's/^# \(.*-backports\s\)/\1/g' /etc/apt/sources.list && \
  apt-get update && \
  apt-get install -y haproxy=1.5.3-1~ubuntu14.04.1 && \
  sed -i 's/^ENABLED=.*/ENABLED=1/' /etc/default/haproxy && \
  rm -rf /var/lib/apt/lists/*

# Add files.
ADD haproxy.cfg /etc/haproxy/haproxy.cfg
ADD start.bash /haproxy-start

# Define mountable directories.
VOLUME ["/haproxy-override"]

# Define working directory.
WORKDIR /etc/haproxy

# Define default command.
CMD ["bash", "/haproxy-start"]

# Expose ports.
EXPOSE 80
EXPOSE 443

Pretty simple, right? And you can then copy that image (like an AMI or VM image) instead of re-configuring every time. Now, there are arguments against using pre-baked images – see Golden Image or Foil Ball. But at scale, what’s the value of conducting the same operations 1,000 times in parallel, except contributing to the heat death of the universe? And potentially failing by overwhelming the same Maven or Artifactory server or whatever when massive scaling is required? There’s a reason Netflix went to an AMI “baking” model rather than relying on config management to reprovision every node from scratch. And Docker containers don’t each have a mess of complex packages to handle dependencies for; they tend to be lean and mean.
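
As a small sketch of that bake-once, run-many pattern – assuming the Dockerfile above sits in the current directory and using the Docker SDK for Python (the tag, container names, and ports here are hypothetical):

# Bake the image once, then stamp out identical containers from it,
# instead of re-running configuration management on every node.
import docker

client = docker.from_env()

# Build ("bake") the image from the Dockerfile in the current directory.
image, _build_logs = client.images.build(path=".", tag="example/haproxy:1.5.3")

# Run several identical copies; no per-node package installation happens.
for i in range(3):
    client.containers.run(image, detach=True, name="haproxy-%d" % i,
                          ports={"80/tcp": 8080 + i})

Every container is the same artifact bit for bit, which is exactly the property that makes the baked-image model testable and scalable.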

But the pressure of the dynamic nature of these microservices has meant that actual service dependencies have to be modeled. Bits of software like etcd and docker compose are tightly integrated into the container ecosystem to empower this. With tools like these you can define a multi-service environment and then register and programmatically control those services when they run.

Here’s a docker compose file:

web:
  build: .
  ports:
    - "5000:5000"
  volumes:
    - .:/code
  links:
    - redis
redis:
  image: redis
It maps the web server’s port 5000 to host port 5000 and creates a link to the “redis” service. This seems like a small thing, but it’s the “lines” on your box-and-line diagram, and it opens up your entire running system to programmatic control. Pure CM just lets you change the software; the rest is largely done by inference, not explicit modeling. (I’m sure you could build something of the sort in Chef data bags or whatnot, but that’s close to saying “you could code it yourself,” really.)
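
To illustrate what that programmatic control buys you – a sketch, assuming the compose file above is up and using the Docker SDK for Python – you can address containers by the service they implement rather than hunting for hostnames:

# Compose labels each container with the service it implements, so a
# script can operate on "the redis service" explicitly.
import docker

client = docker.from_env()

for container in client.containers.list(
        filters={"label": "com.docker.compose.service=redis"}):
    print(container.name, container.status)
    container.restart()  # e.g., bounce the redis service deliberately

That’s service management in miniature: the unit you act on is a service from your model, not an anonymous box.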

This approach was useful even in the VM and cloud world, but the need just wasn’t acute enough for it to emerge. It’s like CM in general – it existed before VMs and cloud, but it was always an “if we have time” afterthought; the scale of these new technologies pushed it into being a first-order consideration, and then even people not using dynamic technology prioritized it. I believe service management is the same way – it didn’t “catch on” because people were not conceptually ready for it, but now that containers are forcing the issue, people will start to use this approach and understand its benefits.

CM vs. App Deployment?

In addition, managing your entire infrastructure like a build pipeline is easier and better aligned with how you manage the important part – the applications and services. It’s a lot harder to do a good job of testing your releases when changes are coming from different places; in a production system, you really don’t want to set servers out there and roll changes to them one way while rolling changes to your applications a different way. New code and new package versions should roll through the same pipeline and get the same tests applied to them. While it is possible to do this with normal CM, Docker lends itself to it by default. However, Docker doesn’t address this for the core operating system, which is an area of concern and a place where well-thought-out CM integration is important.

Conclusion

The future of configuration management is in being able to manage your services and their dependencies directly, not by inference. The more self-contained those services are, the simpler the job of CM becomes – just as the move into the cloud reduced the need for fiddling with hardware. Time spent messing with kernel configs and installing software has dropped sharply as we have started to abstract systems at a higher level and use more small, reusable bits. Similarly, many people are looking at complex config management and asking “why do I need this any more?” I think there are cases where you need it, but you should start instead with modeling your services and managing those as the first order of concern, backing them with just enough tooling for your use case.

Just like the cloud forced the issue with CM and it finally became a standard practice instead of an “advanced topic,” my prediction is that containers will force the issue with service management and cause it to become a first-class concern for CM tools, even back in cloud/VM/hardware environments.

This article is part of our Docker and the Future of Configuration Management blog roundup running this November. If you have an opinion or experience on the topic, you can contribute as well!


Docker and the Future of Configuration Management – Coming In November!

All right! We have a bunch of authors lined up for our blog roundup on Docker and the future of CM. We’ll be publishing them throughout the month of November. But it’s not too late to get in on the action – speak up and you can get a guest post too! And have a chance to win that sweet LEGO Millennium Falcon…

To get you in the mood, here are some good Docker and CM related posts I’ve read lately:

And a video, Immutable Awesomeness – I saw John and Josh present this at DevOps Enterprise Summit and it’s a must-watch! Using containers to create immutable infrastructure for DevOps and security wins.


Agile Organization: Project Based Organization

This is the fourth in the series of deeper-dive articles that are part of Agile Organization Incorporating Various Disciplines.

The Project Based Organization Model

You all know this one.  You pick all the resources needed to accomplish a project (phase of development on a product), have them do it, then reassign them!

[Diagram: the project-based organization model]

Benefits Of The Project Based Model

  • The beancounters love it. You are assigning the minimum needed resource to something “only as long as it’s needed.”

Drawbacks Of The Project Based Model

Where to even begin?

  • First of all, teams don’t do well if they’re not allowed to go through Tuckman’s stages of group development (forming/storming/norming/performing); engineer satisfaction plummets.
  • Long-term ownership isn’t the team’s responsibility, so there is a tendency to make decisions with bad long-term consequences – especially bearing on stability, performance, and scalability – because it’s clear those will be someone else’s problem. Even when there is a “handoff” planned, it’s usually rushed as the project team tries to “get out of there” under due-date or expenditure pressures. More often there is a massive generation of “orphans” – services no one owns. This is immensely toxic – it’s a problem with shipped software, but with a live service it’s awful: even if there’s some “NOC”-type ops org somewhere that can get it running again when there’s an issue, chronic issues can’t be fixed, and problems cause more and more load on service consumers, support, and NOC staff.
  • Mentoring, personnel development, etc. are hard and tend to just be informal (e.g. “take more classes from our LMS”).

Experience With The Project Based Model

At Bazaarvoice, we got close to this model through continual reorganization, gerrymandering just the right number of people onto the projects with need every month. Engineer satisfaction tanked to the degree that it became an internal management crisis we had to do all kinds of stuff to dig ourselves back out of.

Of course, many consulting relationships work this way. It’s the core behind many of the issues people have with outsourced development. There are a lot of mitigations for this, most of which amount to “try not to do it” – for example, I’ve worked with outsourcers to ensure lower churn on outsourced teams and to keep teams stable and working on the same thing longer.

It does have the merit of working if you just don’t care about long-term viability. Developing free giveaway tools, for example – as long as they’re not so bad that they reflect poorly on your company, it may be acceptable for them to be problematic and unowned in the long term.

Otherwise, this model is pretty terrible from a quality-of-results perspective, and it’s really only useful when there are hard financial limitations in place and no strong culture of responsibility otherwise. It’s not very friendly to agile concepts or DevOps, but I’m including it here because it’s a prevalent model.
