
DevOpsDays Austin 2022 Retrospective

DevOpsDays Austin was back in person in 2022 for our 10-year anniversary! So how'd it go? We are committed to transparency and metrics here, so enjoy our retrospective.

Financially it went well: we sold 317 tickets and had 296 people show up (an almost unheard-of 93% show rate), we were able to donate $28,000 to support the LGBT+ members of our community via Out Youth and The Trevor Project (bringing us to $100,000 donated to charity over the history of the event), and we still ended up with a $1,000 increase in our bank account.

We also send out retrospective surveys to our attendees, speakers, sponsors, volunteers, and even our fellow organizers to find out how we’re doing and get an idea of what we should do better on (or keep doing) in the future!

Here are the deeper details if you want them: DevOpsDays 2022 Retrospective

To sum up, however, it's looking good! Our last surveys were from 2019, from our last pre-pandemic DevOpsDays Austin, so we have previous numbers to compare against.

Our attendee NPS was up to 77 (44 responses) from 62 (50 responses). And the things people loved were, basically, the personal interaction. Community, people, discussions, and openspaces were the most cited positives by far. We knew people had been missing that for a couple of years now, so our retrospective format and event plan were specifically designed to promote small-group interactions.

The gripes were more varied. The primary one was the food, which is fair enough. We don't intend to change from a boxed lunch format – it leaves so much more time for the actual conference, which is why we left fancy catered lunches behind long ago – but we were forced to use the venue caterer, and they ran out of food, especially veggie options. We had also asked for breakfast tacos both mornings, and on day two the breakfast was what I can only call "leftover meat bits." So there's room for improvement, with the understanding that boxed lunches are here to stay; we'll definitely see what we can do about better options, and especially about making sure there are no shortages for anyone's dietary needs.

The other leading concern we’re just plain going to ignore, and here’s why. It was “the retro format – but what about technical talks, what about content for newbies?”

At DevOpsDays Austin we explicitly reject the assumption that all events must be the same generic thing every time. We specifically change our format every year. We've had the Monsters of DevOps, where we flew in big keynoters (including all the authors of the DevOps Handbook) and did everything up huge; we've had DevOps Unplugged, where talks were voted in on site and there were no sponsor tables. This year we had a "class reunion" format, with talks limited to 20-minute "retrospectives" on what the speakers had learned over their time in DevOps (some speakers were experienced, others were new voices). We very, very clearly explained this format on our website, social media, and emails to attendees and sponsors. In the end, people just don't read, and there's nothing really to do about that. And we won't be doing the retro format two years in a row – we'll keep mixing it up!

Organizer feedback was good (we have 20 organizers), up slightly, with everyone enjoying working with the group, and some concerns about unclear roles and roles already being taken. That's always a challenge – we have a lot of organizers, but not all of them are up to actually leading something. We have people volunteer to own roles and then encourage them to reach out to the others (and the others to chip in) when we need something, but that doesn't always go well, which is frustrating for everyone. In the end, most roles need someone who can commit to consistent participation over the planning period (there are a couple of specialty roles, like making signage, that can be backloaded, but not many). But we want to be inclusive, and we don't want to tell people, "take the year off if you can't put in a couple of hours a week, make the organizer calls, and truly own something." We've wrestled with this for 10 years, and no clear answer is in sight.

Speakers love speaking at the event. NPS was 92, slightly up from 90 in 2019. They love the audience, how supportive and welcoming it is, and how low-stress and chill the experience is. There are always some AV issues as a fly in the ointment – we do AV checks, but not everyone shows up for them.

Volunteers have a good experience too. NPS was 88, slightly down from 94 but still good; we try to make sure that the load isn't too much on any given volunteer so they can also enjoy the event. Posting the openspace topics is always a challenge each year: we tweet out photos and then desperately type them into Sessionize, but a bunch of attendees aren't on social media, so it's hard to get the schedule to everyone. There aren't a lot of options, though, given that openspaces are predicated on building the agenda immediately beforehand; I'm not sure more time would help, short of printing out copies or having live monitors everywhere.

Sponsor feedback was down from 60 to 50 NPS. They do like the audience and the authentic content. The main problem was that the new venue and unclear flow meant the platinum sponsor rooms were more out of the way than we'd planned (we gave them tables in the gold area as well once this became clear). And then there were the general sponsor gripes about it not being a good lead gen event. We always tell sponsors this is a good participation event, not a good lead gen event – no badge scanners, no sponsor list, etc. – but as previously mentioned, people don't read, plus the teams being sent out aren't the people buying the sponsorships and often just assume they'll be getting a standard conference experience. We sell out every year, so I'll worry about it when that stops.

There’s one other thing worth mentioning, which is that we did require masking at the event and asked people to either be vaccinated or test prior to the event.

On the one hand, a couple of sponsors and attendees griped about the masking.

On the other hand, despite other events resulting in superspreading (Kubecon EU, RSA, even some DevOpsDays events), we only know of one attendee who came down with COVID in the week after ours.

So, with all due respect, we are very happy with our choice and that we had a safe event. No one likes wearing masks. If you dislike it enough that you won't come – don't come. Hopefully it won't be necessary next year.

Everything was pretty good! There was one issue, though – in all the survey sub-questions, there was a drop in the perceived friendliness of the organizer team, so we’re going to make some changes there – stay tuned to hear what!


DevOpsDays Austin is Back For 2022!

Your DevOpsDays Austin 2022 Organizers!

Well, we had to skip two years in a row due to the pandemic, but we were finally able to have DevOpsDays Austin in person again in 2022! It's the tenth anniversary of DoD Austin; we had the first one at National Instruments back in 2012, one of the early DevOpsDays in the US.

We had to move to a new venue, and used the beautiful University of Texas Etter-Harbin Alumni Center (our site lead Bill is a UT alum which makes it half price!). The Etter-Harbin Center is right across from the stadium where we had DoDA in the years leading up to the hiatus. It has plenty of great outdoor spaces, which we used for lunches and happy hours, as well as a great main ballroom with views of the outside. It worked great for our target of 350 attendees, and we think we could make it work for more in the future.

While brainstorming, we came across the perfect theme – it's our tenth anniversary, we're just back from the pandemic, and DevOps is also just a little more than ten years old and at a weird inflection point that has people asking "is DevOps dead? Where does DevOps go from here?" So we decided that, since we were also in the Alumni Center, the obvious theme was our 10 Year Class Reunion!

We don't take our themes lightly at DevOpsDays Austin. We settled on a new format for our talks – instead of the normal CFP for whatever technical and culture topics, we required all talks to follow a retrospective format – reflecting on what you've learned over the years of DevOps and what you think the future holds. We had lots of great speakers, many of whom are long-time members of the DoD Austin community, both locals like Rob Hirschfeld, Christa Meck, Ross Dickey, and Victor Trac, as well as folks from other parts of the Earth like Patrick Debois, Damon Edwards, J. Paul Reed, Pete Cheslock, and Michael Cote, who all frequently come to Austin to share with us.

And we printed a yearbook, with pics of speakers from all the events, our t-shirts over the years, and more! Very snazzy, and we had people sign each other's yearbooks to add some fun to the hallway track. In fact, you can view the yearbook online and buy a hardcopy here if you want!

The DevOpsDays 2022 Yearbook

We did require COVID protocols – masking inside and (honor system) vaccination or a test – and while it is a bummer to not see each other's faces, it also resulted in only one person I know of getting COVID the week after, so it was well worth it.

We didn't have to worry about sponsor interest! We sold out quickly. Here are the ones I got snaps of!

Everything went great, and it was super to finally get back together and interact with our local DevOps community. J. Paul Reed led a great session where a retro was done on DevOps in general!

And one of the best things is that we managed to carry on our tradition of giving our excess proceeds to charity! I’ll do a separate post on that, but the short form is that we contributed $28,000 to LGBT-supporting charities, half to The Trevor Project and half to Out Youth here in Austin, bringing us to $100,000 given to charity over our 10 years in existence! Stay tuned for more details on that…


Why Blaming “Human Error” Is Wrong

I’ve been writing a LinkedIn Learning course on postmortems lately and digging into all the fun research on the topic (Dekker, Hollnagel, and so on), and deepening my knowledge on the things I hope most of you know (root cause is a myth, blaming “human error” is wrong…).

I came across an example that really brought it home to me why the continuous blaming of human error is wrong – not “mean,” not “unenlightened,” but just plain logically ineffective.

One of the classic examples of design choices contributing to aviation accidents is the similarity and close placement of the landing gear and flap controls in airplane cockpits. Pilots lower the landing gear, then once they've landed they retract the flaps – but a small miss has them retract the landing gear instead, and the plane pancakes in.

In fact, the US Air Force did a study at the close of World War II where they looked back at all kinds of "pilot error" crashes and identified a bunch of design problems that contributed, and the flap/landing gear confusion was #2 on the list, making up 16% of the incidents studied:

"Analysis of Factors Contributing to 460 'Pilot-Error' Experiences in Operating Aircraft Controls," by P.M. Fitts and R.E. Jones, USAF Aero Medical Laboratory, Memorandum Report, July 1947.

Then, 20 years later, there was a major study on the same topic:

Aircraft Design-Induced Pilot Error, National Transportation Safety Board, Department of Transportation, Washington, D. C., PB # 175 629, July 1967.

And then, another 13 years later, they suddenly realized the same thing applied to small craft.

Well, very simple fixes for this problem were mandated early on – the FAA now requires, for example, that the landing gear control be shaped like a wheel and the flap control be shaped like a flap, so basic visual and tactile feedback is available to distinguish the two (especially at night, under stress, etc.).


There are other simple tricks that greatly reduce these accidents, like putting a catch on the landing gear retraction. Suddenly the "human error" goes away (leading to the reasonable question of how broadly we should define "human error" in the first place).

So why did this "known" problem persist – damaging planes and killing people, I might add – for 33 years? In fact, it still happens; here's a lovely writeup from 2015.

Basically, because all of the accidents where this happened continued to be declared "pilot error." When there's no significant further inquiry (which, to be fair, would bother people like airlines and the government and aircraft manufacturers and people with money), just saying it's pilot error gets the problem over with by a minor sacrifice (a pilot) instead of doing any harder work.

At my new job, we’re doing risk analysis and commercial insurance for unmanned autonomous vehicles (drones). It’s interesting to now be working in a space that’s actually closer than tech to all this safety research. And you can see the same things happening.

Sure, actual research shows that it's technical problems, not really human error, that are the problem most of the time – see:

News article: More drone crashes caused by technical glitches, not human error, study shows.

Study: Exploring Civil Drone Accidents and Incidents to Help Prevent Potential Air Disasters

It cites technical problems as the cause 64% of the time, and frankly it doesn't really distinguish human error from design-induced human error.

But of course you can still just blame human error. I smelled something questionable in the recent news reports of how the crew was to blame for a UK Watchkeeper drone crash. Oh sure, the drone failed to land correctly and the crew intervened, so it's their fault. I suspect that if they had not intervened and it had continued to malfunction, it would also be "their fault." Here's the full Ministry of Defence writeup, which falls back on the "loss of situational awareness" synonym for human error. And here's a later Register article with more details, like "The most appropriate [flight reference card] drill …stated: 'If UA [unmanned aircraft] not maintaining centreline axis: Engine cut……..Command'." and the fact that they were under the supervision of contractors from the drone manufacturer. Apparently following the actual designated drill recommendation of cutting the engine still makes it your fault.


Of course, “Five drones – almost 10% of a 54-strong fleet bought from French firm Thales – have been wrecked in mid Wales crashes.”  Apparently they have a lot of navigation problems.  So when one goes to land in a populated area and is clearly not navigating properly and goes off the runway and the crew cuts the engine…  The crash is ‘their fault.’  Riiiiiight.

It’s actually a fascinating question – when drone operations are more and more autonomous, how long can we just hold the “crew” responsible for anything that goes wrong? It’s cheaper and less embarrassing, so I’m betting a good while.

For us in tech this opens up an interesting discussion, beyond the obvious statement of “if you do an incident postmortem and simply write it off to developer or operator error you aren’t doing your job”.

We can’t always fix all design issues immediately.  At what point, though, does not prioritizing a better-than-bandaid fix become negligence? “Thirty years,” like with the flap/landing gear thing?

There’s a lot of legislation that tries to protect the powerful from lawsuits etc. – but as autonomy becomes more common, how long will that last for us? Technology firms have managed to get out of being held responsible for endemic security flaws (largely thanks to Microsoft) for decades.

You can see this beginning to crumble in aviation with things like the recent Boeing 737 Max crashes. "It's human error!" declares the Boeing CEO. But people aren't that dumb, and the Internet helps information get out that was previously inaccessible. So the next tack is to blame the software, but that's also buck-passing… The software was the band-aid fix on top of the design issues.

When will our lovely layer of insulation finally be whittled away in tech? Soon, I'd bet… "Oh sure, let's crank out some self-driving cars. I'm sure it'll be fine, and when our crappy design ends up killing some soccer mom, we can just give the standard 'what, me worry?' face we use nowadays when we mess up a software patch."

In the end, if you are motivated by actual safety, or uptime, or security instead of the CYA game of who to blame, you have to push beyond the nearest human, or the outermost band-aid on your Rube Goldberg system, to improve the system. You're going to have to consider how the design and interfaces of your software, and of the tooling you use to operate it, contribute. You're going to have to use facts and numbers, not soothing opinions, to say "you know what? That goes wrong more than our other systems – there's something wrong with it, and we have to dig in and figure out what."


DevOpsDays Austin 2019 Retrospective

As mentioned, DevOpsDays Austin 2019 went off great! And after the event, we sent out extensive surveys to attendees, sponsors, volunteers, speakers, and even the organizers to learn and improve. (Thanks to everyone who gave their feedback, we appreciate it!)

Last year we also did an extensive retrospective to figure out how we wanted this year to go, and this year’s event was driven by that feedback and our vision to make DoD Austin the place for practitioners to come, learn from each other, and build the local community.

Let me share this year’s retro with you – some of the numbers and sentiments are below with my thoughts. If you want the full details, sure, here you go!

Full DevOpsDays Austin 2019 Retrospective (pdf)

If you're not familiar with an NPS score, it's used to measure sentiment on a scale from -100 to +100. When you get asked "would you recommend" something on a 1-10 scale, generally they're taking that number and bucketing it: 1-6 are detractors (counted as negative), 7-8 are neutral, and 9-10 are promoters (counted as positive). The score is then the percentage of promoters minus the percentage of detractors. Above 0 is "good," above 50 is "excellent." See more about NPS scores here.
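To make the math concrete, here's a tiny sketch of that calculation in Python (illustrative only – the function name and the example response mix are made up, not our actual survey data):

    def nps(scores):
        """Net Promoter Score: % promoters (9-10) minus % detractors (6 and below)."""
        promoters = sum(1 for s in scores if s >= 9)
        detractors = sum(1 for s in scores if s <= 6)
        return round(100 * (promoters - detractors) / len(scores))

    # Hypothetical example: 30 promoters, 12 passives, 2 detractors out of 44 responses
    print(nps([10] * 30 + [8] * 12 + [5] * 2))  # -> 64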

Sorry about the quality of the pics; these are basically ones I snapped myself on my iPhone. But hopefully they show some of what happened at the event!

Attendee Feedback (62 NPS, 50 responses)


Damon Edwards

“Informative, laid back, friendly, humorous event. My favorite conference for a couple of years now.” 84% of attendees said they were likely to return.

The things people liked the most as measured by the freeform comments were the openspaces (9 comments), the speakers/talks, especially their diversity (8 votes), the culture/atmosphere of the event (5 votes), and the community and people (5 votes).

This makes me happy. DevOpsDays isn’t just “a conference,” it really focuses on building community – people meeting each other in a friendly and collaborative environment. The content is nice but it’s not the primary value of the event.


Mandy Whaley

The concerns people raised most were "Nothing/great job" (10 votes), difficulty with travel and parking at the venue, including handicap access (6 votes), the talks (6 votes), and wanting better lunches (4 votes).

Read on for more but we’re probably changing venues next year and will keep access in mind.  Now on the lunches – we used to have fancy lunches and they were a significant time and effort sink, with long lines, lots of time spent, and so on.  We moved to box lunches and now lunch goes fast and easy and leaves everyone more time to interact with each other.  We do not plan to ever change back from that, but we will see if we can get a BBQ place or something to do a nice lunch box.

(There were more likes and dislikes and we are evaluating action on all of them, but dang this post is going to be long already so I’m focusing on the top line items.)

Speaker Feedback (90 NPS, 10 responses)


Pete Cheslock

  • “Everyone was really positive; welcoming, low-pressure environment.”
  • Experience – 50% excellent, 50% very good
  • Organization – 40% extremely, 50% very organized
  • Friendliness – 90% extremely, 10% very friendly

Likes: No tech problems/helpful techs/setup organized (x4), Supportive/welcoming (x3), Engaged audience (x3).  Dislikes: Chromebook support problem, schedule slippage, openspaces competing with Conversations talks.

Great overall, with some things for us to tweak! After several years in the same venue and buying a lot of gear, our crack AV team has the tech end of it pretty much down pat.


Jon Loyens

Organizer Feedback (88 NPS, 8 responses)

  • “Just [wanted] to say how much I enjoy working with the crew and watching it all come together to put on a great event for the community. I get a lot out of doing it each year and see my contribution as an important way to give back.”
  • Time spent – 62.5% just right, 12.5% little long, 12.5% little short, 12.5% way too short
  • 93% likely to return (the one who isn't likely to return pleaded a heavy year at work coming up)

Major likes included working together (x3), inclusion (x2), and the opportunity to give back (x2). Dislikes included some stressing out and looking for problems, and speaker notification happening late. There was good discussion about explaining openspaces more especially for the newer folks.

It’s important to me that our organizers have a good time too – my assigned domain on the organizer team is “Organizers” – besides working the master budget and schedule for folks, I facilitate and try to ensure that this volunteer gig is not onerous, and I’m happy we seem to be there.


Deborah Hawkins

Volunteer Feedback (94 NPS, 17 responses)

  • Experience: 72.7% excellent, 27.78% very good
  • How much time you spend – 83% about right, 11% too much, 6% too little
  • 93% likely to return

We have a lot of volunteers from the community that come to slave away working the event for a free ticket and a couple meals, basically.  It’s very important to all of us that they have a good experience – these are the future organizers, and community members going above and beyond to give back to the community.  Boyd and Daria and the other organizers did a great job both organizing the work and making sure the volunteers had time to participate in the event and have a good experience – even given the storm-nightmare loadout at the end of the event. Thanks to all our great volunteers!

Sponsor Feedback (60 NPS, 10 responses)

  • “A++ highly recommend, etc. Y’all did a bang-up job putting this together, and the community is certainly a testament to your hard work and continuous efforts. I’ve told everyone at HQ that we need to learn from you.”
  • Experience – 70% excellent, 20% very good, 10% good
  • Liked: “Always a great event – excellent sessions, great opportunities to meet with customers and prospects.” Vendor area good. Friendly people and networking.
  • Disliked: Platinum sponsors were upstairs. Water bottles ran out. We want badge scanners. No day-before setup. Only a 1-minute blurb. Schedule off track. When will courtesy shipping be picked up?

So… Sponsors. For a number of years we kept expanding our sponsor offerings. Then we realized the event had become too much of a traditional conference and we were spending lots of space, time, and effort on sponsors, when to be honest we don't really need all that much money to put on the event. Two years ago, after a bunch of sponsor problems and everyone working themselves to the bone to provide professional conference services, I did away with sponsor tables altogether. We let them back this year but really wanted to make the event not be about that. We also warn the sponsors up front that this isn't a "churn the leads" event; we want sponsors who are going to send technical people to engage with the community.

Did it work out that way?  Kinda. There’s too much expectation set up about what “conferences are like” and “DevOpsDays are like” and between the person purchasing the sponsorship and the people actually sent on site there’s a lot of room for expectations to drift.


Tristan Slominski

I feel like there are plenty of big conferences for that kind of sponsor engagement. DevOpsDayses didn't use to be like that, but as time goes on and they all grow, it's tempting to "improve" by making them more sponsor-focused. We love sponsors who engage with the community, but we consciously balance their participation in the event.

Funny story… Like I said, we only let sponsor tables back on a limited basis this year. But there was a run on them; we quickly sold out of the ones we needed to fund the event and still had a bunch of sponsors wanting to participate, including ones who had participated for years. So we extended the sponsor room, just to let them participate, because we felt bad about excluding them. Anyway, we always sell out, so that's probably a sign that we're doing fine there.

And we got to sponsor a house for the homeless with the spare money, so that’s spiffy.

Recruiter Feedback (-50 NPS, 2 responses)

This was a new addition that didn't work out so well. We had imagined a big recruiter speed-dating thing, but few recruiters and attendees signed up for it, so we pivoted to a recruiter fair. It was during happy hour, but half the attendees leave before that. We had them by the bar, but the DevOps Trivia during the happy hour was also a big draw.

While all the recruiters rated their experience "good," they had low traffic.

So, sorry that didn’t work out. But I stressed to the organizers that this wasn’t a failure – if we don’t try new things that don’t work out sometimes, we’re not trying hard enough.

We’re one of the great grand-daddy DevOps events. We have years of experience, ample funding, and a big community.  Smaller DoDs, especially ones getting off the ground, often need to hew close to the “standard format” for a safe launch and to pay their bills.  We can afford to experiment, so I strongly urge the team every year to try different things.  It’s OK if we appeal to different sets of the community each year.  It’s OK to not do something again (even if it went well) and it’s OK to try new things as stretch goals. I kinda like putting how we run our event where our DevOps mouth is, so to speak.

This lets us try things out first. We were the first DoD with a multi-content track. We created the new “Conversations” talk format this year. We keep innovating, and sometimes there’s just not a fit given the constraints of venue, time, people, and so on. So this one didn’t go off great, but to me that just means we’re legitimately experimenting hard enough.

Ernest’s Retrospective Thoughts

Overall it went great!  Smooth, excellent execution by everyone involved. I feel like the Austin tech community is stronger for our event existing and that’s what I want out of it.

My main challenge personally this year was with the talks.

We really went into this year with an intent to curate the talks to a pretty specific practitioner format. DoD Austin has a bunch of years behind it, so we don't necessarily need the DevOps "talk circuit" talks to fill slots. We feel like we can be very specific about the experience we want to curate – no repeat talks from other events (go watch them on the Internet, everyone posts videos!), some preference for local speakers, encouraging diversity both in speakers and in content… But we didn't execute on that well. We started using Papercall this year, and it makes it easy for people to mass-submit to multiple events – a great feature, but somewhat antithetical to our needs. We had 200 submissions for 20 slots, so we had a lot of weeding to do and had to turn away a lot of folks. And while we had good talks, they didn't necessarily fit our proposed theme.

We also selected talks late, to the point where people whose talks were declined risked not being able to attend at all because we had sold out our attendee cap.

The second challenge was with openspaces. In general, the larger the event, the harder it is to make openspaces work. Once there are more than 25 people in an openspace, the format collapses and it's just "2-3 people talking to each other and everyone else straining to hear," basically a super crap panel talk. Putting them in the luxury boxes in the stadium worked really well, because only so many people can fit into one, so it was a forcing function to keep them small enough to work. So they went well overall.

But some folks didn't like them. Each year we get some feedback from folks more used to traditional content. "Maybe we should get the openspace topics submitted before the conference so they're already on the schedule!" No offense, but over my dead body. That's not what openspaces are about, and openspaces are the heart of DevOpsDays. They are for what the actual attendees want to talk about right then; the entire point is that they're not programmed content. Early DevOpsDays were a couple of talks and then pretty much all openspaces. My general attitude is "if you don't want to participate in openspaces, this is not the event for you." We do need to explain openspaces more ahead of time, though, to seed ideas and help new people understand the format. Our experiment with mini-talks and then linked openspaces worked out great; I went to two of them and got high value out of them.

Next Year

A couple big changes are coming next year.

First of all, we’re probably changing venue.  We’ve enjoyed the stadium a lot, and love the staff there, but we’ve probably done as much as we can with the event in that particular form factor.

We're considering going entirely to the new 20-minute talk format. The talks were well received – if you really have more content than fits in 20 minutes, a linked openspace is probably the best venue to explore it with highly engaged attendees! It'll also keep people from just submitting their "same talk" as much, and we can get more speakers in!

Also, we know it's a bummer that we've been capping attendance and sponsors and that people who want to attend get turned away. So far we've felt like we have had to, both because of venue capacity and to keep openspaces good and preserve the great atmosphere, community, and opportunities for engagement that make our event distinct.

Now that we have enough experience, we think we might be able to go bigger and still keep the small group and one-on-one interaction. We’ve all been to a bunch of conferences and seen other things – 1-1 mentoring table signups, for example, and other formats that facilitate it.  We’re also thinking about adding some “working groups” – opportunities to do something, produce position papers, whatnot, give the experts a really neat thing to do at the event.

And maybe even add on a third day, with all unstructured content. On a Saturday so people could bring their kids and stuff.

I wanted to just blaze big next year; the rest of the team loved the vision but reminded me how much burn-in there is on a new venue – getting A/V figured out, all the rough spots of a year one… So we may iterate into it: get a new venue, go slightly larger, and try out new engagement ideas next year, and then the year after say "Big tent! All are welcome! Fly in for this one, no attendee or sponsor caps!" and make it a heroically sized event.

There’s no one right format for DevOpsDays – I encourage other organizers to keep experimenting as well.  Your event doesn’t have to be the same year to year; you can target different goals and audiences and sizes and such each time.

If anyone read this far, feel free to comment with your thoughts below! (Obligatory disclaimer: don't tell me "well, this isn't right for my DevOpsDays" – that's fine, none of this is to declare the "right" way to do an event, it's just what is working for us in our community with our particular goals.)


Assigning Fault To Human Error Is A Human Error


We all know from DevOps blameless retrospective wisdom that there is no such thing as a single “root cause.”  One of the most common root causes people like to assign blame to is “human error”.  Not to mince words, this is usually political, buck-passing CYA of the highest order.

I just read a great article on the recent U.S. Navy ship collision issues that I wanted to pass on. If you have been keeping up with the news, there has been a rash of Navy ships colliding with other ships, causing fatalities. When you go Google it up, you see a whole bunch of "Navy attributes it to human error…"

But now go read this article, Something's Wrong In The Surface Fleet And We're Not Talking About It. It's written by Capt. Michael Junge, an experienced Naval officer. The TL;DR is that you can say "human error" all you want, fire someone, and call it case closed, but these accidents stem from systemic understaffing of Naval surface ships and massive shortfalls in training and maintenance – a leading indicator of even worse to come should an actual wartime deployment be necessary.

Even in engineering, we are tempted to push the problem down onto the person that made a mistake.  Fully engaging with the system that caused the need for the action that caused the mistake, the lack of validation that makes mistakes possible, and so on is hard thinkin’.  It is threatening when people point out flaws in processes and systems and code you had a hand in.  But the only way to actually improve your situation is to soberly assess what the actual contributors to issues are, and work towards fixing them.


Velocity 2013 Day 3 Liveblog: How to Run a Post-Mortem With Humans (Not Robots)


Got here a little late – not enough time in these breaks!!!

Dan Milstein (@danmil) of Hut 8 talking on how to build a learn-from-failure friendly culture.

1. and 2. – missed ’em!

3. Relish the absurdities of your system.  Don’t be embarrassed when you get a new hire and you show them your sucky deployment.  Own it, enjoy it.

Axioms to follow to have a good postmortem:

  • Everyone involved acted in good faith
  • Everyone involved is competent
  • We’re doing this to find improvements

Human error is the question, not the answer. Restate the problem to include time to recovery. “Why” is fine but look at time to detection, time to resolution. Why so long?

“Which of these is the root cause?” That’s a stupid and irrelevant question. Usually there’s not one, it’s a conjunction of factors blee blee. Look for the “broadest fix.” [Ed: Need to get a “Root cause is a myth” shirt to go with my “Private cloud is a myth” one.]

Corrective actions/remediations/fixes

Incrementalism or you’re fired! You can’t boil the ocean and “replace it wholesale.” Engineers love to say “it’s so terrible we just can’t touch it we have to replace it.” No. You have 4 hours to do the simplest thing to make it better, go.
“Well… OK I guess we could put a wrapper script around it…” OK, great! [Ed: We need to do that with all our database-affecting command line tools… Wrapper script that checks replication lag and also logs who ran it… Done and done!]
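[Ed: Here's a rough sketch of what a wrapper like that might look like – purely illustrative Python, with a stubbed-out lag check you'd replace with a real query against your replica; the threshold, log file name, and "db-wrap" name are all made up.]

    #!/usr/bin/env python3
    """Illustrative wrapper for a database-affecting CLI tool (sketch, not production code)."""
    import getpass
    import logging
    import subprocess
    import sys

    MAX_LAG_SECONDS = 30  # assumed threshold; tune for your environment

    logging.basicConfig(filename="db-tool-wrapper.log",
                        format="%(asctime)s %(message)s", level=logging.INFO)

    def replication_lag_seconds():
        """Stub: replace with a real check, e.g. reading Seconds_Behind_Master from your replica."""
        return 0

    def main():
        user = getpass.getuser()
        cmd = sys.argv[1:]
        if not cmd:
            print("usage: db-wrap <tool> [args...]", file=sys.stderr)
            return 2
        lag = replication_lag_seconds()
        if lag > MAX_LAG_SECONDS:
            logging.warning("blocked user=%s cmd=%s lag=%ss", user, cmd, lag)
            print(f"Refusing to run: replication lag is {lag}s", file=sys.stderr)
            return 1
        logging.info("user=%s ran cmd=%s lag=%ss", user, cmd, lag)
        return subprocess.call(cmd)  # run the wrapped tool and pass through its exit code

    if __name__ == "__main__":
        sys.exit(main())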

Don’t think about automation, think about tools. People think that computers are perfectly reliable and we should remove the humans.  Evidence shows this doesn’t work well. Skynet syndrome – lots of power, often written by those who don’t do the job.  Tools -> humans solve the problem, iterate on giving them better tools. Not everyone brings this baggage with automation but many do. “Do the routine task” – automate.  “Should I do this task and how should it be done?” – human.

Things are in partial failure mode all the time.  [Ed: Peco calls this “near miss” syndrome from the way they make flying safer – learn from near misses, not just crashes.]

To get started:

  • Elect a postmortem boss
  • Look for a Goldilocks incident
  • Expect awkwardness (use some humor to defuse)
  • THERE MUST BE FIXES
  • incrementally improve the incremental improvements (and this process)

Reading list!  Dang get the slides.
