We all know from DevOps blameless retrospective wisdom that there is no such thing as a single “root cause.” Even so, one of the most common things people like to pin the blame on is “human error.” Not to mince words, this is usually political, buck-passing CYA of the highest order.
I just read a great article on the recent U.S. Navy ship collisions that I wanted to pass on. If you have been keeping up with the news, there has been a rash of Navy ships colliding with other ships, with fatalities as a result. When you go Google it, you see a whole bunch of “Navy attributes it to human error…”
But now go read this article, “Something’s Wrong In The Surface Fleet And We’re Not Talking About It.” It’s written by Capt. Michael Junge, an experienced Naval officer. The TL;DR is that you can say “human error” all you want, fire someone, and call it case closed, but these accidents are symptoms of systemic understaffing of Naval surface ships and massive shortfalls in training and maintenance. That is a leading indicator of even worse to come should an actual wartime deployment be necessary.
Even in engineering, we are tempted to push the problem down onto the person who made a mistake. Fully engaging with the system that created the need for the action that led to the mistake, the lack of validation that made the mistake possible, and so on is hard thinkin’. It is threatening when people point out flaws in processes and systems and code you had a hand in. But the only way to actually improve your situation is to soberly assess what the actual contributors to issues are, and work towards fixing them.