Despite the different approaches among the various schools of root-cause analysis and the specifics of each application domain, RCA generally follows the same steps.
Identification and description: Effective problem statements and event descriptions (of failures, for example) are helpful and usually required to ensure that an appropriate root-cause analysis is carried out. The problem statement is the North Star of the RCA: it keeps the team focused on what they are investigating and prevents them from going astray.
Gathering, organizing and analyzing information: Most RCAs begin with a fact-finding session to gather available information, such as witness statements, the chronology of events, and the applicable requirements for the evolutions that were taking place at the time of the event. The information can be used to establish a sequence of events or timeline for the event, and to identify the lines of defense that should have prevented it (i.e., the administrative requirements and the physical and cyber barriers). Available databases, such as corrective action program and safety program databases, should also be queried and analyzed with data analysis tools such as Pareto charts, process maps, fault trees, and other tools that provide insights into performance gaps. Any number of data analysis tools can be brought to bear, including tools from Lean Six Sigma, statistical analysis tools, and others such as hierarchical clustering and data-mining solutions (for example, graph-theory-based data mining). Another approach is to compare the situation under investigation with past situations stored in case libraries, using case-based reasoning tools. The analysis can also include change analysis, comparative timeline analysis, and task analysis.
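As an illustration of the data analysis step, the following is a minimal sketch of a Pareto table built from cause codes. The cause codes and the `pareto_table` helper are hypothetical and exist only for this example; a real corrective action program database would supply its own categories.

```python
from collections import Counter


def pareto_table(categories):
    """Return (category, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Hypothetical cause codes, as might be exported from a corrective action database.
events = [
    "procedure", "procedure", "training", "procedure", "equipment",
    "training", "procedure", "labeling", "procedure", "equipment",
]

for row in pareto_table(events):
    print(row)
```

The cumulative percentage column shows how few categories account for most events, which is the insight a Pareto chart is meant to provide when choosing where to direct lines of inquiry.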
Analysis of defenses: After identifying the defenses in place that should have prevented the event or accident, an analysis of defenses (traditionally called Barrier Analysis) can be conducted. This applies in every case, including non-RCA investigations. One method is to list the defenses on a chart or a virtual whiteboard, then review the gathered information and data for evidence of each defense's effectiveness, looking for deficiencies or gaps in performance where the administrative requirements were not met or where the physical or cyber barriers were bypassed. These initial gaps in performance are merely symptoms of deeper-seated causes. They are used to develop lines of inquiry questions, as outlined below, that pursue the symptoms back to their points of origin (i.e., the root causes) using cause-and-effect analysis.
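The listing method described above can be sketched as a small data structure. This is only an illustrative record format, not a standard one: the `Defense` class, its fields, and the example entries are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Defense:
    name: str
    kind: str   # "administrative", "physical", or "cyber"
    held: bool  # False if the requirement was not met or the barrier was bypassed
    evidence: list = field(default_factory=list)  # findings from the gathered information


def performance_gaps(defenses):
    """Return the defenses that failed. Each one is a symptom to trace back, not a root cause."""
    return [d for d in defenses if not d.held]


# Hypothetical defenses for an illustrative event.
defenses = [
    Defense("Lockout/tagout procedure", "administrative", held=False,
            evidence=["Tag log shows no entry for the affected valve"]),
    Defense("Machine guard", "physical", held=True,
            evidence=["Guard found in place after the event"]),
    Defense("Badge-controlled access", "cyber", held=False,
            evidence=["Access log shows a shared credential"]),
]

for gap in performance_gaps(defenses):
    print(f"{gap.kind}: {gap.name} -> {'; '.join(gap.evidence)}")
```

Keeping the evidence next to each defense makes it straightforward to tie every later question to a specific defense and a specific finding.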
Generating focused, unbiased lines of inquiry questions: After the available information has been gathered and organized into charts with timelines and other data, the available data has been analyzed, and an analysis of defenses has been conducted, the resulting insights are used to generate questions. These questions become lines of inquiry for cause-and-effect analysis. The questions must be unbiased. To prevent bias on the part of the RCA team from tainting the investigation, each question should be tied to a specific defense or to a specific insight from the data analysis (e.g., from Pareto charts, process maps, fault trees, control charts, or other tools that provide insights into performance gaps). There should be no curiosity questions, no questions that reflect "confirmation bias" (i.e., leading questions that steer respondents toward what the RCA team already thinks are the causes), and no accusatory questions that will cause those helping the investigation to close down and withdraw.
Cause-and-effect analysis: Once a robust set of lines of inquiry questions has been developed from the factual evidence collected, the applicable requirements, and an analysis of the available data, those questions can be taken to the organization's subject matter experts. This begins the process of cause-and-effect analysis. Once a question is posed to the affected organization, the answer is used to pose a follow-up Socratic question. Socratic questions keep the investigation flowing down to the next deeper causal factor until the organization runs out of answers or the last causal factor is beyond the organization's control. Conducting an effective cause-and-effect analysis involves many skills, including facilitation, communication, and Socratic questioning. When conducted properly, this will take the RCA down to the deepest-seated root causes. A word of caution: the Ishikawa (Fishbone) diagram and the 5-Whys method are not rigorous enough for conducting a root-cause analysis. The Fishbone diagram dates from the 1940s and the 5-Whys from the 1930s, and much more advanced methods are available. Look for methods developed in this century (the year 2000 and later), as they are more likely to account for the dynamics of modern sociotechnical work environments.
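The answers collected during cause-and-effect analysis can be recorded as a branching structure in which each factor may have several deeper causes. The sketch below is only a bookkeeping aid under that assumption. It does not replace the facilitated questioning described above, and the `causal_chains` helper and the example factors are hypothetical.

```python
def causal_chains(causes_of, symptom):
    """Return every chain from the symptom down to a factor with no recorded deeper cause."""
    chains = []

    def walk(factor, path):
        path = path + [factor]
        # Skip any cause already on this path so a circular record cannot loop forever.
        deeper = [c for c in causes_of.get(factor, []) if c not in path]
        if not deeper:
            chains.append(path)
            return
        for cause in deeper:
            walk(cause, path)

    walk(symptom, [])
    return chains


# Hypothetical answers: each key is an effect, each value lists the causes given for it.
causes_of = {
    "Valve left open": ["Step skipped in procedure"],
    "Step skipped in procedure": ["Procedure step ambiguous", "Shift handover rushed"],
    "Shift handover rushed": ["Staffing below plan"],
}

for chain in causal_chains(causes_of, "Valve left open"):
    print(" <- ".join(chain))
```

The last factor in each chain is where the recorded answers ran out. Whether it is a root cause or simply lies beyond the organization's control remains a judgment for the investigation team.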
Charting the results of the RCA: The best way to chart the results of an RCA investigation is to begin populating the final chart at the outset of the investigation. This has become much easier with the advent of virtual whiteboards. A single virtual whiteboard can display the timelines, lines of defense, data analysis, lines of inquiry questions, cause-and-effect analysis, root causes, and corrective action plan.
Corrective actions to prevent recurrence: From a management perspective, the RCA effort is not complete without a comprehensive corrective action plan that addresses the root causes, the contributing factors, and the "Extent of Cause." The corrective action plan should be developed by the issue owners and does not require participation by the RCA team, although the team is an excellent source of guidance for the issue owners. Extent of Cause reviews are conducted to determine the extent of the damage or impact that the root causes and contributing factors had on humans, equipment, or facilities. These reviews are an Achilles' heel in the vast majority of organizations and a primary reason why RCAs and corrective action plans fail to prevent recurrence. Care must also be taken to avoid corrective action plans that simply add more administrative requirements and more training to the organization. To avoid this, use the Hierarchy of Hazard Controls and Lean Mistake Proofing as guidelines for developing effective corrective actions that have a much higher likelihood of preventing recurrence.
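One way to check a draft plan against the hierarchy is to tag each action with its control level and flag plans that rely only on the weakest levels. The sketch below assumes the five-level hierarchy as it is commonly presented (elimination, substitution, engineering controls, administrative controls, personal protective equipment) and treats training as an administrative control. The function names and example actions are hypothetical.

```python
# Control levels as commonly presented, ordered from most to least effective.
HIERARCHY = ["elimination", "substitution", "engineering", "administrative", "ppe"]


def rank_actions(actions):
    """Sort (description, level) pairs so the most effective control types come first."""
    return sorted(actions, key=lambda action: HIERARCHY.index(action[1]))


def relies_only_on_weak_controls(actions):
    """True if the plan has actions but none stronger than administrative controls."""
    weakest_allowed = HIERARCHY.index("administrative")
    return bool(actions) and all(
        HIERARCHY.index(level) >= weakest_allowed for _, level in actions
    )


# Hypothetical draft corrective action plan.
plan = [
    ("Retrain operators on the valve line-up", "administrative"),
    ("Add a sign-off step to the procedure", "administrative"),
    ("Install an interlock that blocks start-up with the valve open", "engineering"),
]

for description, level in rank_actions(plan):
    print(f"{level}: {description}")
print(relies_only_on_weak_controls(plan))
```

A plan for which `relies_only_on_weak_controls` returns `True` is the kind the text warns against: more requirements and more training, with nothing that removes or engineers out the hazard.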
Effectiveness reviews: A predetermined period after the corrective action plan has been implemented, an effectiveness review is conducted to evaluate those corrective actions. This requires specifying a set of metrics or indicators that are monitored before and after the corrective actions are implemented, so that their impact can be measured. If the desired results (in most cases a significant reduction in the magnitude or frequency of the event or problem) are not achieved, the RCA must be reopened, as it was not effective.
To be effective, root-cause analysis must be performed systematically, which reduces the chance of missing important details. A team effort is typically required, and ideally all persons involved should arrive at the same conclusion. In aircraft accident analyses, for example, the conclusions of the investigation and the root causes that are identified must be backed up by documented evidence.
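The before-and-after comparison in an effectiveness review can be sketched as a simple rate calculation. The text does not define what counts as a significant reduction, so the 50% threshold here is purely an assumption, as are the function name and the example counts.

```python
def effectiveness(before_events, before_months, after_events, after_months,
                  required_reduction=0.5):
    """Compare monthly event rates and report whether the reduction meets the threshold.

    required_reduction is an assumed acceptance criterion, not a standard value.
    Returns (fractional_reduction, is_effective).
    """
    if before_months <= 0 or after_months <= 0:
        raise ValueError("observation periods must be positive")
    before_rate = before_events / before_months
    after_rate = after_events / after_months
    if before_rate == 0:
        raise ValueError("no baseline events to compare against")
    reduction = 1 - after_rate / before_rate
    return reduction, reduction >= required_reduction


# Hypothetical counts: 12 events in the 12 months before, 3 in the 12 months after.
reduction, effective = effectiveness(12, 12, 3, 12)
print(f"reduction: {reduction:.0%}, effective: {effective}")
```

If the second value is `False`, the review outcome corresponds to the case described above in which the RCA must be reopened. With small event counts a raw rate comparison can be misleading, so a real review would likely also apply a statistical test or a control chart.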
Transition to corrective actions: The goal of RCA is to identify the root cause of the problem with the intent of stopping the problem from recurring or worsening. The next step is to trigger long-term corrective actions that address the root cause identified during RCA and ensure that the problem does not resurface. Correcting a problem is not formally part of RCA, however; identification and correction are different steps in a problem-solving process known as fault management in IT and telecommunications, repair in engineering, remediation in aviation, environmental remediation in ecology, therapy in medicine, etc.

==Application domains==