What happens if we fail to learn from our near-misses?

by Steve Smith

Originally published in 2019, revisited in 2024

“The day soldiers stop bringing you their problems

is the day you have stopped leading them.

They have either lost confidence that you can help them

or concluded that you do not care.

Either case is a failure of leadership.” - Gen. Colin Powell

Author's Note: In light of recent events involving Boeing 737 MAX planes, we revisited this related article from 2019 and are recirculating it now.

At Experiential Consulting, LLC, we have focused on the importance of learning from near-misses for many years, and have helped clients integrate near-miss reporting into their organizational culture. We believe that sharing the learning from near-misses is the gateway for organizations to develop a culture of openness, feedback, problem solving, and continuous learning. Experts debate whether the things that cause near-misses are the things that ultimately lead to catastrophes or fatalities, but in the outdoor programs we work with, we find that a near-miss can serve as an accident precursor, and that there is much to be gained by learning to talk about our near-misses. We have written about this concept extensively and presented on it at conferences.

Several recent events (in early 2019) led us to revisit this topic today. The most newsworthy (and obvious) example can be found in the tragic crashes involving Boeing's 737 MAX, and the subsequent global grounding of those planes. As the news continues to come in, we see some themes here that are worth highlighting, including systems thinking, learning from near-misses, the fallacy of blaming human error, and ultimately, organizational culture.

What happened to the 737 MAX airplanes? To summarize the Boeing crashes, which are correlated and connected to each other according to FAA Administrator Daniel Elwell, the pilots struggled to maintain control of the planes during takeoff. This may be attributed to a new technological feature on the planes called the Maneuvering Characteristics Augmentation System, or MCAS, a safety mechanism that automatically corrects for a plane entering a stall. If the plane loses lift under its wings during takeoff and the nose begins to point too far upward, the MCAS kicks in and automatically forces the nose back down. If functioning correctly, this can help to prevent the plane from stalling (and eliminate the human error of taking off at too steep an angle). In the case of the first crash, the MCAS kicked in and forced the nose of the plane abruptly down during takeoff at a critical and irrecoverable time. At the time this blog was published, more and more evidence was coming in connecting the factors between the two crashes, though the investigations were ongoing.

Systems thinking: It's easy to just say that the planes crashed due to operator (cockpit) error. Or we can back up another step and blame it on the training the pilots did or didn't receive, or even on the plane's manual, which has been called "criminally insufficient" by some pilots. If we keep going, we find a software problem that was discovered in the wake of the first crash in October 2018 (Lion Air). Boeing and the FAA were reportedly in the midst of resolving this software issue when the United States government shut down for 35 days, stalling the fix. Backing up even further, the FAA has been led by an interim (acting) administrator for the past two years, as no permanent administrator has been successfully appointed.

The captain who questioned the 737 Max 8's flight manual had this to add: "The fact that this airplane requires such jury rigging to fly is a red flag. Now we know the systems employed are error-prone — even if the pilots aren't sure what those systems are, what redundancies are in place and failure modes. I am left to wonder: what else don't I know?"

So, what caused the accidents? Was it operator error? Lack of training? Poor instructions? A software problem? The federal government shutdown? Leading safety experts are learning to resist the natural human desire to isolate single causes, and to look at incidents like this in more complex and interconnected ways, taking a holistic view. Applying root cause analysis (RCA) would lead us to isolate a single problem or two that we can fix, but experts believe this approach satisfies our need for optics at the expense of actual learning, making us more prone to recurrence. As safety author Charles Perrow has written, accidents are caused by complex factors tightly coupled together, not single ones that we can isolate and simply fix. When we do try to isolate root causes, power dynamics and biases often lead us to focus on front-line elements like workers, operator error, or even training instead of looking at the bigger system within which those humans, errors, and trainings operate. Safety author Dr. Sidney Dekker has said that there are no root causes for why an accident occurs, in the same way there are no root causes for why an accident doesn't occur. Rather than focus on blaming, retraining, and other simple fixes, we are better served by asking ourselves: what in the work environment made that error possible, and why did it make sense to the frontline worker at the time?

Taking it a step further, safety expert Dr. Todd Conklin states it more bluntly: "When investigating an accident, don't limit yourself to human error or non-compliance -- you will always find both." Error is normal, and so commonplace that it is present not only in the small number of events that go catastrophically wrong, but in almost all of the other ones too. In most cases, despite our mistakes, things don't go wrong -- which can lead us to learn the wrong lessons (breeding complacency, as recreation law attorney Charles "Reb" Gregg writes). However, if we are diligent and focus our attention on why things go right, we can learn deeper lessons. We can aim our efforts towards resilience so that when errors are inevitably made, they don't turn into tragedies.

Learning from near-misses: One of the ways we can build resilience in an organization is to develop systems to learn from each other. OSHA refers to near-misses as "accident precursors" and advises organizations to develop tools to report, analyze, and learn from their near-misses. We have referred to near-misses as "cheap lessons" in our presentations and emphasized the opportunity they present for learning, without the corresponding cost or trauma that comes from actual incidents or injuries. Although near-misses can be misused or ignored, or the wrong lessons can be learned from them, we believe that far more good can come from near-miss reporting if the organizational system and culture see them as a pathway for continuous learning and prevention.

In the case of the Boeing planes' problems, we see some unfortunate examples of missed opportunities for learning. There is a database designed for pilots to be able to publicly and openly report concerns or incidents that occur, without fear of retribution. As it turns out, there is a clear and specific pattern of at least 11 reports going back to October 2018 documenting recurring problems with the autopilot. One such report, from 2018, warned that "the aircraft pitched nose down after engaging autopilot during departure. Autopilot was disconnected and flight continued to destination." A quick search of the database reveals several other examples with similar cautionary reports.

A near-miss report does no good if we fail to act on it, or to investigate what can be done to correct the problem. Furthermore, failing to act on near-miss reports does little to encourage future reports, further compounding the blind spot and reducing opportunities to learn from these reports. In addition, compiling a backlog of near-miss reports with no corresponding action to correct the problem(s) is a legal nightmare, according to Gregg: "From a legal standpoint, hardly anything is more harmful to a defense than a failure to properly react (and record that reaction) to a prior similar incident."

We have described near-misses using the metaphor of an iceberg, where the critical incidents are the obvious ones above the surface, while the near-misses, unsafe conditions, and unsafe acts are often hidden beneath the surface, out of organizational leadership's view. We advise our clients to look beneath the surface, and not just focus on the tip of the iceberg that they can see. The near-misses may be hidden, but they are hidden opportunities for learning and prevention.

Sadly, this Boeing case is not the only recent example of an industry failing to listen to near-miss reports, with tragic consequences for the public. A recent article details how the Food and Drug Administration (FDA) created a private database, outside of the publicly visible one, where companies granted a special exemption could report failures of their medical equipment. This functionally hides, from both the general public and the doctors who rely on that equipment, any reports of problems, equipment failures, or even serious injuries pertaining to that equipment. According to the article, "The FDA has built and expanded a vast and hidden repository of reports on device-related injuries and malfunctions, a Kaiser Health News investigation shows. Since 2016, at least 1.1 million incidents have flowed into the internal 'alternative summary reporting' repository, instead of being described individually in the widely scrutinized public database known as MAUDE, which medical experts trust to identify problems that could put patients in jeopardy."

Takeaways: Creating a culture of continuous learning

As the FDA and FAA examples show, failure to pay attention to safety concerns voiced from the front lines can make an incident more likely. We can never eliminate human error, so focusing on people as the problem is a simple solution that fails to address the system problems that endure. In fact, safety strategies that see people as problems (like the MCAS in the Boeing planes) can make matters worse. Dekker advises us to see people not as problems to be managed, but as solutions to our safety problems. Similarly, Conklin urges safety leadership to "fix the work, not the worker" and has identified the following five principles in his most recent book, The Five Principles of Human Performance:

1. People make mistakes

2. Blame fixes nothing

3. Learning and improving is vital

4. Context drives behavior

5. How leadership responds to failure matters

Josh Cole (IFMGA Guide and Experiential Consulting Associate Consultant) points out an additional wrinkle in the FAA story, with implications for the outdoor industry: the conflicting interests of FAA individual designees. Much of the inspection system relies upon commercial employees (e.g., Boeing employees) who act occasionally in the interest of the FAA while still in the employ of the company that the FAA oversees. Cole points out a similar (potentially conflicting) relationship that exists in our industry when certified guides, instructors, or trainers from one program conduct audits or accreditation reviews for their peers in similar programs. Cole adds, "These people often know and work with the people that they are assessing and personal bias is unavoidable to some degree or another. I don't think that this system (for the FAA or the outdoor industry) is inherently flawed, but it requires a level of trust and presence of backup systems in order to operate appropriately."

The barriers to reporting near-misses are well known and hard to tackle all at once. If you want to create an environment where leadership can see the same things that frontline workers see, start by building effective mechanisms for near-miss reporting, reward people for taking the time to submit those reports, and most importantly, do something useful with the information you receive as a result. In so doing, we can create a culture of continuous learning, and even establish "habits of excellence," as James O'Neill described his safety culture movement in a corporate environment. If we fail to do so, we run the risk Gen. Colin Powell described in this article's opening quotation.

Latest update (May 2019): As news continues to come in about the contributing factors, it is now even clearer that Boeing had multiple opportunities to intervene in ways that could have helped prevent these incidents. The most recent reports indicate Boeing had known about the issues with the software the year prior to the first (Lion Air) crash, but had chosen not to act on those reports.

Takeaway for outdoor education programs: It's not enough just to create near-miss and incident reporting systems. We have a responsibility to act on the information we receive, and to document those actions so as to best protect ourselves, and demonstrate our commitment to continuous learning.

References:

Cole, Josh. Personal correspondence

Conklin, Todd. The Five Principles of Human Performance (2019)

Dekker, Sidney. Safety Differently (2014)

Gregg, Reb. Fred C. Church Blog (2019) and personal correspondence with the author

Perrow, Charles. Normal Accidents: Living with High Risk Technology (1984)

Various News Sources linked directly within the blog post

https://www.bloomberg.com/news/features/2021-11-16/are-boeing-planes-unsafe-pilots-blamed-for-corporate-errors-in-max-737-crash

https://www.seattletimes.com/business/boeing-aerospace/final-report-on-boeing-737-max-crash-disputed-agencies-note-pilot-error-as-a-factor/#comments