I will never forget watching the TV news on January 28, 1986. That was the day the space shuttle Challenger broke apart only seventy-three seconds after launch, killing all seven crew members.

On February 1, 2003, the space shuttle Columbia broke apart over Texas on re-entry, again resulting in the deaths of all seven crew members.

These tragedies pointed to serious issues within NASA. What led to these catastrophic events?

That was the question everyone was asking, and one of the main tools used to uncover the answers was a Fault Tree Analysis (FTA). This tool was developed in 1962 at Bell Laboratories for the aerospace industry. It has since been found useful for risk analysis regardless of industry. An FTA can be used for any identified undesirable event to effectively capture all contributing factors.

This ability to flush out risk by reviewing a chain of causes and effects makes Fault Tree Analysis a powerful tool for driving safety. It is one you need in your risk assessment tool kit.

What is a Fault Tree Analysis?

A Fault Tree Analysis is a diagram that shows, from top to bottom, an undesired event and all of its contributing factors. At the very top of the map is one undesirable event, such as the brakes failing on a car. For the undesirable event you could list “car can’t stop.” Under the event you would list any factors that could lead to the car’s brakes failing, such as:

  • A faulty master cylinder
  • Low brake fluid
  • Worn pads

For each of the items listed above, you would then list anything that could lead to that problem. For example, for low brake fluid you might list:

  • Broken pipe
  • Leaking cylinder
  • Loose bleed screw

When drawn out, the map gives you a visual representation of the event and contributing factors.
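As a rough illustration of that structure, here is a minimal Python sketch that stores the brake example as nested data and prints it as an indented outline. The nested-dictionary layout and the `render` helper are assumptions made for this example, not part of any standard FTA notation or software.

```python
# Hypothetical nested representation of the brake example from the text.
# Each key is an event; its value holds the factors that could cause it.
fault_tree = {
    "Car can't stop": {
        "Faulty master cylinder": {},
        "Low brake fluid": {
            "Broken pipe": {},
            "Leaking cylinder": {},
            "Loose bleed screw": {},
        },
        "Worn pads": {},
    }
}


def render(tree, depth=0):
    """Return the tree as a list of indented lines, top event first."""
    lines = []
    for event, causes in tree.items():
        lines.append("  " * depth + event)
        lines.extend(render(causes, depth + 1))
    return lines


print("\n".join(render(fault_tree)))
```

Each level of indentation in the printout corresponds to one step down the tree, from the undesirable event to its contributing factors.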

FTAs use a standard set of icons, which makes the method a nice brainstorming tool that is easy to draw and read. The icons represent “gates” and “events.”

The two most commonly used gates are the “or” gate and the “and” gate. An “or” gate represents any identified factor that could cause a failure just by itself. If we go back to our car brake failure, the listed items (low brake fluid, faulty master cylinder and worn brake pads) would all be drawn under an “or” gate, because any one of them could cause a car’s brakes to fail.

An “and” gate would be used for any items that need to occur together in order to cause a failure of the car brakes. Where “or” items can cause a failure by themselves, “and” items cause a failure only in combination with one another.
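The difference between the two gates can be sketched in a few lines of Python. This is only an illustration: the `Event` class and the `occurs` function are names invented for this example, and the parking brake branch is a hypothetical addition so that there is an “and” gate to show.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    gate: str = "basic"  # "basic", "or", or "and"
    children: list = field(default_factory=list)


def occurs(event, happened):
    """Return True if `event` occurs, given the set of basic events that happened."""
    if event.gate == "basic":
        return event.name in happened
    results = [occurs(child, happened) for child in event.children]
    if event.gate == "or":
        return any(results)   # any single cause is enough
    if event.gate == "and":
        return all(results)   # every cause must occur together
    raise ValueError(f"unknown gate: {event.gate}")


# "or" gate: any one of these makes the service brakes fail.
service_brakes = Event("Service brakes fail", "or", [
    Event("Low brake fluid"),
    Event("Faulty master cylinder"),
    Event("Worn pads"),
])

# "and" gate (hypothetical): the car can't stop only if both systems fail.
cant_stop = Event("Car can't stop", "and", [
    service_brakes,
    Event("Parking brake fails"),
])

print(occurs(service_brakes, {"Worn pads"}))
print(occurs(cant_stop, {"Worn pads"}))
print(occurs(cant_stop, {"Worn pads", "Parking brake fails"}))
```

In this sketch, worn pads alone are enough to trigger the “or” gate, but the “and” gate at the top only triggers once the parking brake has failed as well.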

Register for my free webinar: How Effective Hazard Assessments Improve Your Safety Culture

Fault Tree Analysis for Safety

Most people do not realize the lengths engineers go to in order to build safety into their design methods. From the creation of manufacturing equipment to something as complex as the space shuttle, safety must be factored into each design step. This is a major component of any project management system. For each component and all systems involved, risk assessments are done with the aid of a Failure Mode and Effects Analysis (FMEA) and an FTA. This gives a structured approach to analyzing all components and systems that could lead to a failure.

The next time you’re flying in a commercial jet, be glad this process was used to ensure the safety of the jet’s design and to identify potential problems before they could occur in real life.

The use of tools like Fault Tree Analysis is crucial for flushing out potential system failures in advance, with the goal of eliminating those failures altogether. This tool enables a proactive approach to safety at the design phase. As more data is gained through testing or from actual product history, you can assign a probability to each event and predict how likely a failure is and how reliable a design will be.
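To show how such numbers combine, here is a small Python sketch. It assumes the basic events are independent of one another, which real systems often are not, and the probabilities are invented purely for illustration. Under that assumption, an “and” gate multiplies the probabilities of its inputs, and an “or” gate is one minus the chance that none of its inputs occur.

```python
import math


def p_and(probs):
    """Probability that all independent input events occur."""
    return math.prod(probs)


def p_or(probs):
    """Probability that at least one independent input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Made-up yearly probabilities for the brake example (illustrative only).
p_low_fluid = 0.01
p_master_cylinder = 0.02
p_worn_pads = 0.05

# "or" gate: any one of the three factors causes brake failure.
p_brakes_fail = p_or([p_low_fluid, p_master_cylinder, p_worn_pads])

# "and" gate with a hypothetical backup system that fails 10% of the time.
p_cant_stop = p_and([p_brakes_fail, 0.10])

print(f"Brakes fail:    {p_brakes_fail:.5f}")
print(f"Car can't stop: {p_cant_stop:.6f}")
```

The second figure comes out much smaller than the first, which is why redundant or backup systems are such a common control measure.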

A Fault Tree Analysis can be used effectively for many different potential hazards, from missile guidance system failures to cyber hacking. As you work your way down the fault tree, you continually ask yourself “how.” How can this fail?

When complete, reading the tree from bottom to top gives a step-by-step guide to how the hazard can occur. For example, if having your car stolen is the top undesirable event, then read from bottom to top the tree becomes a “how-to guide” for stealing a car.
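One way to make the bottom-to-top reading concrete is to list every smallest combination of basic events that produces the top event; fault tree practitioners call these minimal cut sets. The Python sketch below does this for a hypothetical car theft tree. The tree contents, the tuple-based layout and the `cut_sets` function are all assumptions made for this example.

```python
from itertools import product


def cut_sets(node):
    """Return the minimal sets of basic events that cause `node`.

    A node is either a string (a basic event) or a (gate, children) tuple.
    """
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "or":
        # Any child's cut set is enough on its own.
        combined = [s for sets in child_sets for s in sets]
    elif gate == "and":
        # Need one cut set from every child, merged together.
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate: {gate}")
    unique = set(combined)
    # Drop any set that strictly contains another, smaller cut set.
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=sorted)


# Hypothetical tree: a thief must get in AND get the engine started.
car_stolen = ("and", [
    ("or", ["Door left unlocked", "Window broken"]),
    ("or", ["Key left in car", "Ignition hot-wired"]),
])

for combo in cut_sets(car_stolen):
    print(" + ".join(sorted(combo)))
```

Each printed line is one complete path to the top event, so each line is also a place where a control measure could break the chain.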

If you have sufficient data based on past failures, you can predict probability of failure based on time and conditions. The goal of this effort is to prevent the identified undesired event from happening. This means as you work your way down the map, you list components, systems, subsystems and the potential problems that could arise with each. For each identified problem a control measure is sought out, just as with an identified hazard. This is the whole purpose of the diagram. Find potential problems before they occur and put control measures in place. Control measures may include redundant or backup systems. If there is a likelihood a failure could occur, let’s find a way to prepare for it in advance.

Space Shuttle Challenger Disaster

Fault Tree Analysis was one of the tools used when investigating the space shuttle Challenger tragedy. Seventy-three seconds after launch, the shuttle broke apart. The investigation found that the right solid rocket booster separated, causing damage to the external tank. This led to the destruction of the shuttle by aerodynamic forces.

The top item on the fault tree is solid rocket booster separation. Working down the tree, the cause of the separation was an O-ring joint failure. The O-rings sealed a field joint between two segments of the solid rocket booster. Both the primary and secondary O-rings failed, allowing hot gases and flames to escape and make contact with the external tank, causing a structural failure.

Two main factors were uncovered:

  • Technical – The O-ring joint had already been identified as inadequate and a new design was underway. Previous flights had shown that O-ring erosion was taking place, which meant the secondary O-ring could not be relied on as a backup.
  • Organizational – Cold temperatures on the morning of the launch had engineers concerned. Ice had formed on the shuttle, and the O-rings might not perform well in the cold. NASA management decided that the risk was acceptable, and the launch was given the go-ahead.

This horrible disaster could have been avoided. I am not trying to criticize NASA; I know there were other factors, such as budget cuts and pressure to meet set targets. (Sounds like most businesses.)

Look at Some Examples on the Web

You can find lots of great examples of Fault Tree Analysis maps in Google images, or by using any online search engine. Since this is a very visual tool, I would suggest checking some examples out. This tool can be simple and quite helpful. It can also become quite complicated involving software and lots of data. It is worth getting familiar with.

This is a tool that should be used in the design phase to identify potential risks and assign control measures. It should also be used after an accident or “undesirable event” to help identify all contributing factors leading to the event.

When it comes to flushing out risk and determining reliability, FTA is a powerful tool. I have given you a brief understanding of its potential. Check out some examples online. It is worth your time and attention to become familiar with it. It will give you another structured approach for your risk analysis tool kit.
