We all know from DevOps blameless retrospective wisdom that there is no such thing as a single “root cause.” One of the most common root causes people like to assign blame to is “human error”. Not to mince words, this is usually political, buck-passing CYA of the highest order.
I just read a great article on the recent U.S. Navy ship collisions that I wanted to pass on. If you have been keeping up with the news, there has been a rash of Navy ships colliding with other ships, causing fatalities. When you go Google it up, you see a whole bunch of “Navy attributes it to human error…”
But now go read this article, Something’s Wrong In The Surface Fleet And We’re Not Talking About It. It’s written by Capt. Michael Junge, an experienced Naval officer. The TL;DR is that you can say “human error” all you want, fire someone, and call it case closed, but these accidents are a symptom of systemic understaffing of Naval surface ships and massive shortfalls in training and maintenance. That is a leading indicator of even worse to come should an actual wartime deployment become necessary.
Even in engineering, we are tempted to push the problem down onto the person that made a mistake. Fully engaging with the system that caused the need for the action that caused the mistake, the lack of validation that makes mistakes possible, and so on is hard thinkin’. It is threatening when people point out flaws in processes and systems and code you had a hand in. But the only way to actually improve your situation is to soberly assess what the actual contributors to issues are, and work towards fixing them.
“We all know from DevOps blameless retrospective wisdom that there is no such thing as a single ‘root cause.’”
No, we certainly don’t “know that.” A much more accurate statement would be, “Many times, when people report a ‘root cause’, they are over-simplifying.” (Just like the statement from the blog I quoted!)
And it is somewhat ironic that a post about the silliness of assigning blame contains a line like “Not to mince words, this is usually political, buck-passing CYA of the highest order.”
If you look at Blameless Post Mortems in the SRE world, it’s fairly well known that what OP is talking about is true. Usually, there is political retribution in blame.
No, if I look at “blameless postmortems,” I see that they *assume* it is true. The fact that this dogma is widespread certainly does not prove it is true!
In fact, as I tried to point out, *this very post* IS “assigning fault to human error”!