It’s about what broke, not who broke it

Wednesday, March 28, 2018

A "blameless" post-mortem after an outage isn't easy. It takes time to learn to look beyond the person and focus on what actually went wrong.

The important thing at the time was that we cared about what broke, not who broke it. Who broke it is frequently just a roll of the dice: who got that particular task, bug, or ticket assigned to them, and happened to run this valid command instead of that also-valid command? Why would you ever assign blame based on that?

If anything, you'd want to find out the general pattern of what had broken and then go scour the code base to see if it had happened anywhere else. Chances are, someone didn't just come up with that particular string-mangling "wizardry" out of thin air, and they either picked it up from some other part of the code, or someone else later copied from them and did the same stuff in their own code.
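As a rough illustration of that kind of scouring, here is a minimal sketch in Python. The original doesn't say what the string-mangling actually looked like, so the function name `find_pattern`, the file-suffix filter, and the example regex below are all assumptions made up for illustration.

```python
import re
from pathlib import Path


def find_pattern(root, pattern, suffixes=(".py",)):
    """Return (path, line_number, line) for each line under root that matches pattern.

    Hypothetical helper: it only looks at files whose suffix is in `suffixes`.
    """
    regex = re.compile(pattern)
    hits = []
    # Sort so the report is stable from one run to the next.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going past files with odd bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling something like `find_pattern("src", r"\.split\([^)]*\)\[\d+\]")` would list every place that splits a string and indexes straight into the result, which is one plausible shape for a copied-around trick. The output says where the pattern lives, not who wrote it, and that is the point.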

Source: It's about what broke, not who broke it