In “We Selected the Simple Solution. The Complex Solution Became a Worldwide Standard,” Eugen Oetringer identifies a key issue: “In an increasingly complex world, root cause analysis changed from unrestricted analysis to analysis within defined boundaries, such as the boundary of an organization or a best practice.” When the solution set is limited in advance, it’s not surprising that the fix to a complex problem lies outside your restricted box. In fact, if the “best practices” checklists actually contained a solution to a wicked real-world problem, you wouldn’t be experiencing that problem in the first place.
Let me say that clearly again. Organizations make massive efforts to execute the limited “best practices” checklists. When a problem does occur and you need a true root cause analysis, it doesn’t make sense to start with the existing checklists. The odds are high that the root cause is either something NOT in the current solution set OR a wicked interaction effect between system components that the checklist never modeled. This is particularly true when some relatively minor change sets off cascading failures.
From on-the-ground experience doing true root cause analyses on wicked problems, the most significant challenge is convincing management to put everything on the table, because we often surface systemic and political issues they’d rather not hear. Almost as important is preventing the analysis from becoming a witch hunt. We start our sessions with the following: “This is nobody’s fault, AND everybody here is responsible for a solution.” We work hard with a few of the most respected senior staff to model admitting their own mistakes, even on small issues. When people hear the smartest guys in the room say “oops, I probably shouldn’t have done that” (usually about a shortcut), it changes the dynamic and allows everybody to be more open about their own contributions to wicked problems.
The other thing we’ve learned is not to stop at the first easy cause. We ask again and again, “What else might be going on with the system?” When systems become too complex to draw, it’s hard for even the best people to hold an accurate mental map of what’s out there. The more parts of the system we explore as a group, the more accurate our shared understanding of what drives the system becomes, and the less likely we are to make ignorant mistakes because we didn’t know that one part affected so many places.