The difference between Failure Modes and Failure Mechanisms

Carl Kirstein, Pr.Eng, CMRP
Asset Decision Engineering at CMRP

There seems to be some confusion between Failure Modes and Failure Mechanisms, both in the
literature and in general practice. Some sources treat them as the same thing, others let them
overlap, and others avoid the subject entirely. If you do Reliability Centred Maintenance (RCM)
or Root Cause Analyses (RCAs), though, you will be confronted by them and will probably waste
hours discussing or arguing about them. I'll venture my opinion on the matter to help ease the pain.

Failure Modes are associated with deviant function or behaviour.

Failure Mechanisms are associated with deviant physical condition or physical state.

Failure Modes are events described by verbs and adverbs.

Failure Mechanisms are states or conditions described by nouns and adjectives.

A Failure Mode is the direct effect of a Failure Mechanism, or

A Failure Mechanism is a direct cause of a Failure Mode.

Confusion between Modes and Mechanisms usually happens when you detect a Failure
Mechanism before it causes a Failure Mode. For example, you find a bridged-out
control or a corroded liner during FFT inspections before they can cause skew
running or ore spilling. You would raise a work order or notification to correct the Failure
Mechanism, but that does not make it a Failure Mode.
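
To make the distinction concrete, here is a minimal Python sketch. It is not a standard RCM data model: the class and field names are my own assumptions, and so is the pairing of "corroded liner" with "ore spilling", since the text lists the conditions and the behaviours without linking them one to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMechanism:
    """Deviant physical condition or state (noun + adjective)."""
    condition: str


@dataclass(frozen=True)
class FailureMode:
    """Deviant function or behaviour (verb + adverb)."""
    behaviour: str


@dataclass(frozen=True)
class Finding:
    """An inspection result: a mechanism and the mode it could cause."""
    mechanism: FailureMechanism
    potential_mode: FailureMode
    mode_has_occurred: bool = False

    def describe(self) -> str:
        if self.mode_has_occurred:
            return (f"Failure Mode '{self.potential_mode.behaviour}' "
                    f"caused by '{self.mechanism.condition}'")
        # A work order raised here corrects the mechanism; no mode has occurred.
        return (f"Failure Mechanism '{self.mechanism.condition}' found before "
                f"it could cause '{self.potential_mode.behaviour}'")


# Assumed pairing, for illustration only.
liner = Finding(FailureMechanism("corroded liner"), FailureMode("ore spilling"))
print(liner.describe())
```

Keeping the condition and the behaviour in separate types means a work order raised against a Finding never has to be recorded as a Failure Mode just because it triggered corrective work.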

Confusion could also arise because the Failure Mode of a component could be the Failure
Mechanism of the next higher assembly level. If Modes and Mechanisms are distinguished by
behaviour vs condition, this confusion is more manageable, but it will remain contentious. The
best way to manage it is to establish what the item under scrutiny is: are you looking at the
pump system, the pump, or the motor driving the pump? If everyone is at least
focussing on the same object, the confusion and contention drop.

Recently on the RCM2 LinkedIn Group, an interesting discussion emerged on
the differences between root cause and failure mode. Worth sharing…

Adhen Utomo, Mechanical Engineer at Kaltim Prima Coal in Indonesia, asked
this question: “What is the difference between root cause and failure mode?”

Denis Marshment, long-time RCM2 practitioner and Director of Asset
Dynamics Asia, stated that “Root cause and failure modes are essentially the
same thing. Both are causes of failure. The only difference is in the
techniques that we use to identify them. With Root Cause Analysis (RCA) we
are studying a failure event that has already occurred with the aim of
preventing its re-occurrence. To do this we must understand all the
contributing factors that led to the failure and identify the likely causes. We
stop listing failure modes (causes) when it becomes possible to implement a
suitable failure management policy – this is the “Root Cause”. RCA is
concerned with a single failure event and is applied after the failure has
occurred, so in this sense its scope is limited and it is a reactive approach.

The other technique that uses root causes/failure modes is the RCM
approach. With RCM we try to identify all the likely causes of failure and their
consequences for an asset or system and identify suitable failure
management policies to address each failure mode. RCM is a proactive
approach that is hopefully applied before the asset has failed with the
objective being to mitigate or prevent the failure consequences. The output of
RCM is a maintenance plan for the asset or system that covers all the likely
failure modes.

Both approaches are extremely useful and quite complementary.”

Steve Turner of OMCS added that he has a different understanding. “To me, if
we are talking assets, there are failure modes which by definition from a
dictionary, means the “manner or form” by which things fail. The word
“cause” can mean the same thing. However when someone adds the word
root to either failure mode or cause, they are meaning the failure mode or
cause that was at the root of the problem. My understanding of causal
relationships is that they are a continuum and they can have multiple trees.
You can keep asking why forever and so you can never get to the root cause
in reality…. there is no such thing. There are numbers of causes and events
that end up causing failure. So with Denis's definition, his root cause is when
we can put in a suitable failure management policy… Under his definition,
Root Causes is a post failure thing. My definition is root cause can be
established ahead of failures… RCA can be done in the future tense but it is
not called by that name in futuristic studies.
I don’t want to add to the confusion, but some people use the term failure
mechanism too and they say this term differs from failure mode. Maintenance
is full of words that people use differently. You can see that Denis has a
different understanding to me… If I get into a technical conversation I often
ask what they mean by the words they use so I understand the definitions
being applied.”

Adhen Utomo, who asked the question, realized that his company needs to
decide on and develop a maintenance dictionary, since there is a lot of
confusion and debate over maintenance terms. He agreed with Denis because
“We have the same information from John Moubray, but there are a lot that
might be we must search and digging for details of this issue in maintenance.”

Matt Thompson of Rio Tinto added, “The only thing I would like to add to the
discussion is that although failure modes (from an equipment-centric point of
view) can often be thought of as causes, not all causes can be thought of as
failure modes. For example administrative causes or failures of procedures or
management policies can be root causes of an RCA cause and effect chain but
are not equipment or component failure modes. Root cause and failure mode
are classified as two separate things.”

Matt had an example (however simple it is) to illustrate the differences:

EQUIPMENT: e.g. furnace tube boiler

FAILURE: (what happened) e.g. Catastrophic failure of the welded joint
between the furnace tube and tube plate.

FAILURE MODE: (by definition is what the equipment or component failed
from) e.g. Corrosion fatigue.

ROOT CAUSE/S: (by definition, what caused the failure mode to occur AND
what can be changed to prevent re-occurrence. Remember there can be more
than one!!)
e.g. Poor feed water treatment accelerated corrosion; Rapid firing, particularly
from cold, increased thermal stress on the boiler; Over pressurization and
temperature cycles.
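
Matt's example can also be written down as a simple structured record, which is one way a site might start the maintenance dictionary Adhen mentions. This is only a sketch: the class name RcaRecord and its field names are assumptions that mirror Matt's headings, and the values are taken from his example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RcaRecord:
    """One failure event, using the headings from Matt's example."""
    equipment: str
    failure: str                 # what happened
    failure_mode: str            # what the component failed from
    root_causes: List[str] = field(default_factory=list)  # can be more than one


boiler = RcaRecord(
    equipment="furnace tube boiler",
    failure=("catastrophic failure of the welded joint between "
             "the furnace tube and tube plate"),
    failure_mode="corrosion fatigue",
    root_causes=[
        "poor feed water treatment accelerated corrosion",
        "rapid firing, particularly from cold, increased thermal stress on the boiler",
        "over pressurization and temperature cycles",
    ],
)

# One failure mode, several root causes that can each be changed.
print(f"{boiler.failure_mode}: {len(boiler.root_causes)} root causes")
```

Holding the root causes in a list, separate from the single failure mode, reflects Matt's point that the two are classified as separate things and that there can be more than one root cause.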