Comment
Events_MA: Ident, Date, WorkingAge, Event, Comment
EventsDescription_MA: EventName, P, Comment
VarDescription_MA: VariableName, MeasureUnit, WarnLimit1, WarnLimit2, Comment
CovariatesOnEvent_: Event, StartingDate, EndingDate, Covariate1Name, Covariate2Name, Comment
The above tables are identical (except for the _MA table name
suffix) to their counterparts in the analysis of simple items. The
following additional tables, however, are required for complex
items. Each component or failure mode in a complex item will
Page 49
Optimal Maintenance Decisions (OMDEC) Inc 2004
behave according to its own model. The supplementary tables
shown below relate a model to an Ident (a unit of a fleet), an
event, and a variable (covariate):
IdentToModel: ModelName, IdentName, Date
EventToModel: ModelName, InputEventName, OutputEventName, InputP, OutputP
VarToModel: ModelName, InputVariableName, OutputVariableName, VariableDataType, MeasureUnit, WarnLimit1, WarnLimit2
Notice the prefixes Input and Output in several field names of
the last two tables. We use these fields to map database values
from the component to the model. The model itself must use the
keywords B, EF, and ES, but in the database we may use any labels
for these events. The EventToModel table tells a model which
events in the database to use as its beginning, failure, and
suspension events. For example, the database may hold two types
of beginning events, B1 and B2, corresponding to two components
that can begin their lives at different times. Similarly, it may
hold two types of failure events, EF1 and EF2, and two types of
suspension events, ES1 and ES2. Each model (of a particular
failure mode or component) must be told which event records to
use: for example, whether records with the value ES1 or ES2 in
the database serve as the suspension event for the failure mode
currently being modeled or predicted. That is, we tell the model
for GearOne to use the events B1, EF1, and ES1 as its beginning,
failure, and suspension events B, EF, and ES. We perform this
mapping in the EventToModel table.
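The event mapping can be sketched in a few lines of Python. This is only an illustration of the idea, not EXAKT's internal logic: the rows of EventToModel are held in a plain list, the helper name events_for_model is hypothetical, and the sample event records (Unit_01 and its working ages) are invented for the example.

```python
# Hypothetical in-memory stand-in for rows of the EventToModel table.
# Field names follow the table definition; the values are illustrative.
EVENT_TO_MODEL = [
    {"ModelName": "GearOne", "InputEventName": "B1", "OutputEventName": "B"},
    {"ModelName": "GearOne", "InputEventName": "EF1", "OutputEventName": "EF"},
    {"ModelName": "GearOne", "InputEventName": "ES1", "OutputEventName": "ES"},
    {"ModelName": "GearTwo", "InputEventName": "B2", "OutputEventName": "B"},
    {"ModelName": "GearTwo", "InputEventName": "EF2", "OutputEventName": "EF"},
    {"ModelName": "GearTwo", "InputEventName": "ES2", "OutputEventName": "ES"},
]


def events_for_model(model_name, event_records, mapping=EVENT_TO_MODEL):
    """Keep only the records mapped to this model, relabelled as B, EF or ES."""
    lookup = {
        row["InputEventName"]: row["OutputEventName"]
        for row in mapping
        if row["ModelName"] == model_name
    }
    return [
        dict(record, Event=lookup[record["Event"]])
        for record in event_records
        if record["Event"] in lookup
    ]


# Invented event records using database labels, for illustration only.
records = [
    {"Ident": "Unit_01", "WorkingAge": 0, "Event": "B1"},
    {"Ident": "Unit_01", "WorkingAge": 0, "Event": "B2"},
    {"Ident": "Unit_01", "WorkingAge": 1200, "Event": "EF1"},
    {"Ident": "Unit_01", "WorkingAge": 1500, "Event": "ES2"},
]

gear_one_events = events_for_model("GearOne", records)
gear_two_events = events_for_model("GearTwo", records)
print(gear_one_events)
print(gear_two_events)
```

Each model sees only its own records, already relabelled with the keywords it requires.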
For convenience and flexibility, we use the same mapping
technique in the VarToModel table to change the names of the
original variables in the database to other names in the particular
model. For example, we might call the fault growth parameter for
GearOne, FGP1, in the database. But in the model for GearOne,
we might like to call it simply FGP or just Health_Indicator
(since we already know that this model is for GearOne and it is
indicating health degradation). When the intelligent agent is
deployed it will run through each model of the complex item and
provide a decision and remaining useful life corresponding to that
component or failure mode.
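A minimal sketch of the variable renaming follows. It assumes the VarToModel rows are available as a list of dictionaries; the helper name variables_for_model, the key fields, the variable FGP2, and the inspection values are all hypothetical and serve only to show the mapping.

```python
# Hypothetical stand-in for one row of the VarToModel table.
VAR_TO_MODEL = [
    {
        "ModelName": "GearOne",
        "InputVariableName": "FGP1",
        "OutputVariableName": "Health_Indicator",
    },
]

# Assumed non-variable columns that are passed through unchanged.
KEY_FIELDS = ("Ident", "WorkingAge")


def variables_for_model(model_name, inspection, mapping=VAR_TO_MODEL):
    """Return the key fields plus the mapped variables under their model names."""
    lookup = {
        row["InputVariableName"]: row["OutputVariableName"]
        for row in mapping
        if row["ModelName"] == model_name
    }
    renamed = {key: inspection[key] for key in KEY_FIELDS if key in inspection}
    for name, value in inspection.items():
        if name in lookup:
            renamed[lookup[name]] = value
    return renamed


# Invented inspection record; FGP2 stands for a variable of another component.
inspection = {"Ident": "Unit_01", "WorkingAge": 500, "FGP1": 0.12, "FGP2": 0.30}
gear_one_view = variables_for_model("GearOne", inspection)
print(gear_one_view)
```

Variables that are not mapped to the model are simply left out of its view of the record.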
The table IdentToModel allows us to include selected individual
units (from a fleet) in the model. We may wish to do this when a
particular component is present in some units of the fleet but not in
others. For example, we would not model a failure mode
associated with a turbocharger on those engines that are not
equipped with one.
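The selection of units can be sketched as a simple filter. The model name Turbocharger, the engine names, and the helper idents_for_model are assumptions made for this illustration, and the Date field of IdentToModel is left out because its role is not described here.

```python
# Hypothetical stand-in for rows of the IdentToModel table.
IDENT_TO_MODEL = [
    {"ModelName": "Turbocharger", "IdentName": "Engine_01"},
    {"ModelName": "Turbocharger", "IdentName": "Engine_03"},
]


def idents_for_model(model_name, mapping=IDENT_TO_MODEL):
    """Return the set of units that a given model applies to."""
    return {row["IdentName"] for row in mapping if row["ModelName"] == model_name}


# Invented fleet: Engine_02 has no turbocharger and is therefore not listed.
fleet = ["Engine_01", "Engine_02", "Engine_03"]
selected = idents_for_model("Turbocharger")
modeled_units = [unit for unit in fleet if unit in selected]
print(modeled_units)
```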
Let us look then at the Events_MA table (Figure 9-29) for the
Gearbox under analysis.
Figure 9-29: Events_MA table for a gearbox with two failure modes.
Note that, in Figure 9-29, there is no B1 or B2 to designate the
beginnings of the individual components. That is because, in the
particular case being modeled, when one gear fails, both are
replaced. Therefore we have decided to use the event B to mark
the life beginnings of either component. We have chosen to use
EF to designate the failure of GearOne and EF2 for that of
GearTwo. The suffix, _nn, of the Ident (e.g. GearOne_01)
designates one component life. GearOne had 11 lives, and so did
GearTwo. GearOne failed 8 times, and GearTwo failed twice.
Figure 9-30 is a view of two extracts from the Inspections_MA
table.
Figure 9-30: Partial views of Inspections_MA table
Once the decision models have been built and deployed (following
the general methods presented in Example 1, page 82), a typical
optimized CBM decision at a point in time might resemble that of
Figure 9-31.
Figure 9-31: EXAKT output for two failure modes, GearOne and GearTwo
The above results for the decision models of a complex system
may be achieved by the following steps using the EXAKT
program.
Detailed Explanation Steps to follow
1 Start, Exakt for Modeling
2 File, Open, navigate to , ComplexItemsDemo_WMOD.mdb, Open
3 Modeling (on the Menu bar), Data setup, type in the attachment script (actually it is already keyed in for you), Execute, Save
4 Data Preparation, Enter General Data, Project Title: Complex Items Demo, CBM Model: Gear One, Description: Vibration Analysis for Gear One, Time Unit: Hrs., OK
5 Marginal Analysis
6 Idents Selection, GearOne_01 to GearOne_11: check, Events Selection, B, Select Event: B, Precedence: 5, Apply, EF, Select Event: EF, Precedence: 2, Apply, ES, Select Event: ES, Precedence: 3, Apply, Variable Selection, Health_Indicator, Select Variable: Health_Indicator, Apply, OK.
7 A. Modeling, Weibull PHM, Select Covariates, sub-model Name: HI, Health_Indicator, , OK, X
  B. Select Covariates, sub-model Name: HI_1, Fix shape parameter=1: check, OK, OK, X
8 Transition Probability Model, Transition Rates, OK, Decision Model, Decision Model Parameters, Replacement (C): 1000, Failure (C+K): 6000, Cost Unit: $, Inspection Interval: 500, OK, X
9 Decisions, GearOne_01, shift+GearOne_11, Report, Full Report Icon , PgDn, PgDn, PgDn
For the second model of our complex system, please repeat
steps 4-9 with the following substitutions.
Step | Change this: | To:
4 | Model: Gear One | Model: Gear Two
4 | Description: Vibration Analysis for Gear One | Description: Vibration Analysis for Gear Two
6 | GearOne_01 to GearOne_11: check | GearTwo_01 to GearTwo_11: check
6 | EF, Select Event: EF | EF1, Select Event: EF
7* | sub-model Name: HI | sub-model Name: HI
8* | Replacement (C): 1000, Failure (C+K): 6000, Cost Unit: $, Inspection Interval: 500 | Replacement (C): 1000, Failure (C+K): 6000, Cost Unit: $, Inspection Interval: 500
9 | GearOne_01, shift+GearOne_11 | GearTwo_01, shift+GearTwo_11
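The idea of deriving the second model from the first by substitution can be sketched as follows. This is not an EXAKT interface: the ModelSetup class and its field names are hypothetical, and only the values entered in steps 4, 6, 8 and 9 are carried over. The event selection of step 6 is left out of the sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelSetup:
    """Hypothetical record of the entries keyed in for one decision model."""
    cbm_model: str
    description: str
    idents: tuple
    replacement_cost: float      # Replacement (C), in $
    failure_cost: float          # Failure (C+K), in $
    inspection_interval: float   # in the project's time unit (Hrs.)


def ident_range(prefix, count=11):
    """Build names such as GearOne_01 ... GearOne_11."""
    return tuple(f"{prefix}_{n:02d}" for n in range(1, count + 1))


gear_one = ModelSetup(
    cbm_model="Gear One",
    description="Vibration Analysis for Gear One",
    idents=ident_range("GearOne"),
    replacement_cost=1000,
    failure_cost=6000,
    inspection_interval=500,
)

# Repeat the setup for the second model, changing only the substituted entries.
gear_two = replace(
    gear_one,
    cbm_model="Gear Two",
    description="Vibration Analysis for Gear Two",
    idents=ident_range("GearTwo"),
)

# The additional cost of a failure, K, follows from the two cost entries.
extra_failure_cost = gear_two.failure_cost - gear_two.replacement_cost
print(gear_two.cbm_model, gear_two.idents[0], gear_two.idents[-1], extra_failure_cost)
```

With C = 1000 and C+K = 6000, the additional cost of a failure over a preventive replacement is K = 5000 for both models.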
Summary
Chapter 9 extended the usefulness of CBM by providing a systematic
methodology for declaring a potential failure and for determining the
remaining useful life (or P-F interval) of an item. Example 1, Creating an
intelligent agent (page 82), provided a tutorial to familiarize you with the
functionality of the EXAKT software for the automated interpretation of
condition data. Example 2, Data validation (page 88), was a case study
from a mining application showing realizable savings from CBM
optimization and providing several techniques for data cleaning. The
step-by-step tutorial that reproduces in EXAKT many of the lessons of
Example 2 is provided in Appendix 11 on page 189. Example 3, Complex
Items (page 103), extended the methodology to items having multiple
failure modes. Exercise 4, data smoothing, is provided in Appendix 11
on page 193.
Part 4. Reliability Centered Maintenance
Chapter 10. Pillars of RCM
Introduction
Chapter 11. Failure Modes and Effects Analysis
Question 1 Functional Analysis
The process
Example 1
Function | Functional Failure | Failure mode | Failure effects
2 To insulate passengers from shocks caused by crossing rail joints, bumps and to minimize transient oscillations after crossing such bumps.
3 To insulate passengers from jerks during acceleration and braking
4 To control the roll angle of the car body relative to the truck
5 To ensure that the carriage floor is level with the platforms when train stops at a station
6 To assist in stopping the train at up to 0.88 m/s²
7 To prevent direct contact between axle box and truck frame under severe bounce conditions
8 To permit the truck to be lifted and/or the car to be towed easily
9 To ensure that wheel sets remain attached to truck while truck is being lifted
10 To insulate the car from shocks to some extent if the air bag fails
11 To limit lateral movement of car relative to truck
12 To prevent traction link retaining nut from coming undone
Example 2
Item description: Pack delivers temperature-controlled air to conditioned-air distribution ducts of airplane. Major assemblies are heat exchanger, air-cycle machine, anti-ice valve, water separator, and bulkhead check valve.
Redundancies and protective features (include instrumentation): The three packs are completely independent. Each pack has a check valve to prevent loss of cabin pressure in case of duct failure in unpressurized nose-wheel compartment. Flow to each pack is modulated by a flow-control valve which provides automatic over-temperature protection backed up by an over-temperature trip off. Full cockpit instrumentation for each pack includes indicators for pack flow, turbine inlet temperature, pack-temperature valve position, and pack discharge temperature.
Reliability data:
Built-in test equipment (described): none
Can aircraft be dispatched with item inoperative? If so list any limitations which must be observed: Yes. No operating restrictions with one pack inoperative.
Hidden functions: Yes
Functions | Functional failures | Failure modes | Failure effects
1 To supply air to conditioned air distribution ducts at the temperature called for by pack temperature controller
2 To be capable of preventing loss of cabin pressure by backflow if the duct fails in unpressurized nose-wheel compartment
Example 3
Item description: Distributed control system (DCS)
Redundancies and protective features (include instrumentation):
Built-in test equipment (describe):
Operating context: Continuous process. Unionized. 500 employees. See business plan. Biggest product:
ethylene. Can also produce gasoline. Two lines: 1. Material flow 2. Olefins. Raw material is safely stored at
high pressure (6000 MPa) in underground storage caverns. It is pipelined to production facilities. Ethylene is
converted to polyethylene. There is a "hot side" and a "cold side". Raw material undergoes cracking
(breaking carbon chains) and becomes ethylene. The plant extends over several acres (a square kilometer).
The DCS (distributed control system) is integral to the entire production line. There are 3 different types of
DCS. Recently there has been a benzene spill. Environmental excursions occur occasionally. Installed in
1996. Capital expenditures have been curtailed recently. Individual heaters can be shut down for
maintenance.
Hidden functions: Yes, UPS
Functions | Functional failures | Failure modes | Failure effects
To provide safe, secure, uninterrupted, redundant, cost effective, continuous process control and monitoring according to the target product of the day, within the parameters specified by product specification and by current environmental regulations, in the presence of a UPS (uninterruptible power supply)
To alarm on abnormal conditions in the process in real time
To allow manual intervention
To interface with other control systems
To graphically present the process to the operators
To exchange data with other control systems
To capture historical data
To provide the means to alter control logic
To backup/restore configuration data
To execute batch recipes within the continuous process, for example cleaning cycles
To provide safe shutdown in the event of a hardware failure.
To alert the operator, in real time, when some part of the DCS hardware or a field device fails.
To be immune from physical, electromagnetic, electronic, environmental intrusion
To be ergonomic
To conform to NEMA standards
Example 4
Functions | Functional failures | Failure modes | Failure effects
1 To provide a renewable surface that protects the carcass of the tire so that it can be retreaded
Question 2 Failure analysis
The process
Example 1
Ctrl. No. | Function Statements (Quantitative Performance Requirements) | Failed States (Ways Performance is Lost) | Failure Causes
1 | To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide support |
5 | | Unable to support the car on the rails at 120 kph |
16 | | Fails to provide rolling support |
21 | | Fails to provide a smooth ride |
Example 2
Functions | Functional failures | Failure modes | Failure effects
1 To supply air to conditioned air distribution ducts at the temperature called for by pack temperature controller | A Conditioned air is not supplied at called-for temperature | |
2 To be capable of preventing loss of cabin pressure by backflow if the duct fails in unpressurized nose-wheel compartment | A No protection against backflow | |
Example 3
Item description: Distributed control system (DCS)
Functions | Functional failures | Failure modes | Failure effects
To provide safe, secure, uninterrupted, redundant, cost effective, continuous process control and monitoring according to the target product of the day, within the parameters specified by product specification and by current environmental regulations, in the presence of a UPS (uninterruptible power supply) | Fails to provide security | Unauthorized usage of console either when unattended or if password stolen |
 | Unable to log in | Password forgotten |
 | Unable to protect against loss of control | UPS has failed |
 | Control lost | Complete loss of communication with ring |
 | | Complete loss of communication with controller node |
 | | All consoles fail |
 | | Complete loss of communication on module bus |
 | | Complete loss of communication on slave bus |
 | | Console LAN fails |
 | Redundancy lost | Console hardware or software fails |
 | | Controller hardware or software fails |
Question 3 Failure modes analysis
The process
Why? Why? Why? Why? Why? Why? Why? Why?
Ventilation system fails
  Fan fails
    Motor fails
      Motor trips
        Airways clogged with dirt
          Inadequate design
        Defective sensor
      Bearing seized
        Lubricant allowed to run dry
        Wrong lubricant
          Improperly labeled
          Stores error
          Label misread
            Inattention
            Insufficient training
    Power drive fails
      Belts failed
        Incorrectly installed
          Insufficient training
            Employee turnover
              Poor working conditions
          Missing documentation
            Inadequate document control
          Inadequate tools
        Incorrectly specified
  Distribution system fails
    Duct fails
      Duct clogged
      Duct pierced
    Damper failed
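The repeated "Why?" questioning can be pictured as a tree in which each level answers the question for the level above it. The sketch below holds part of the ventilation example as nested dictionaries and lists every chain from the top-level failure down to a cause with no further answer. The nesting shown is one reading of the chart, and the function name causal_chains is hypothetical.

```python
# Part of the ventilation example as a nested dictionary.
# Each key is a failure or cause; its value holds the answers to "Why?".
# The nesting is an assumed reading of the chart, not a definitive one.
WHY_TREE = {
    "Ventilation system fails": {
        "Fan fails": {
            "Motor fails": {
                "Motor trips": {
                    "Airways clogged with dirt": {"Inadequate design": {}},
                },
            },
        },
        "Distribution system fails": {
            "Duct fails": {"Duct clogged": {}, "Duct pierced": {}},
        },
    },
}


def causal_chains(tree, path=()):
    """Yield every chain from the top failure to a cause with no deeper answer."""
    for cause, deeper in tree.items():
        chain = path + (cause,)
        if deeper:
            yield from causal_chains(deeper, chain)
        else:
            yield chain


chains = list(causal_chains(WHY_TREE))
for chain in chains:
    print(" <- why? <- ".join(reversed(chain)))
```

How far down a chain the analysis should go is a judgment call: the level at which a failure mode is recorded should be the level at which a maintenance or redesign decision can sensibly be made.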
Example 1
Functions | Functional failures | Failure modes | Failure effects
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide support | Weld in frame fails due to fatigue |
 | | Wheel collapses due to fatigue |
 | | Axle fails due to fatigue |
 | | Truck frame component fails due to fatigue |
Functions | Functional failures | Failure modes | Failure effects
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Differential wear of steel treads on the same axle |
 | | Spalling on wheel tread |
 | | Wheel flange shears off |
 | | Chevron rubber shears |
 | | Tie bar rod axle rod slackens off |
 | | Chevron rubber settles |
 | | Chevron rubber elastically yields |
 | | Traction link bolt comes adrift |
 | | Traction link falls off due to fatigue |
Functions | Functional failures | Failure modes | Failure effects
To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Bearing collapses due to fatigue failure of cage, rollers, spacer or inner or outer race |
 | | Bearing collapses due to excessive clearance in housing |
 | | Bearing collapses due to bumpy rails |
 | | Bearing fails due to under lubrication |
 | | Plug falls out of axle box cover |
 | | Bearing fails due to over lubrication |
 | | Moisture in lubricant causes bearing to fail |
Functions | Functional failures | Failure modes | Failure effects
To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide a smooth ride | Flats worn on wheel tread |
Example 2
Functions | Functional failures | Failure modes | Failure effects
1 To supply air to conditioned air distribution ducts at the temperature called for by pack temperature controller | A Conditioned air is not supplied at called-for temperature | air-cycle machine seized |
 | | ram-air passages in heat exchanger blocked |
 | | anti-ice valve fails |
 | | water separator fails |
2 To prevent loss of cabin pressure by backflow if the duct fails in unpressurized nose-wheel compartment | A No protection against backflow | bulkhead check valve fails |
Example 3
Item description: Distributed control system (DCS)
Functions | Functional failures | Failure modes | Failure effects
To provide safe, secure, uninterrupted, redundant, cost effective, continuous process control and monitoring according to the target product of the day, within the parameters specified by product specification and by current environmental regulations, in the presence of a UPS (uninterruptible power supply) | Fails to provide security | Unauthorized usage of console either when unattended or if password stolen |
 | Unable to log in | |
 | Unable to protect against loss of control | |
 | Control lost | |
 | Redundancy lost | |
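The worksheets in these examples share one structure: a function has functional failures, and each functional failure has failure modes, each of which later receives a description of its effects. A minimal sketch of that structure is given below, using entries from Example 1. The class and function names are hypothetical and are not part of any RCM software.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    description: str
    effects: str = ""  # left blank until the effects are analyzed


@dataclass
class FunctionalFailure:
    description: str
    modes: list = field(default_factory=list)


@dataclass
class Function:
    number: int
    statement: str
    failures: list = field(default_factory=list)


def worksheet_rows(function):
    """Flatten the hierarchy into one worksheet row per failure mode."""
    for failure in function.failures:
        for mode in failure.modes:
            yield (function.number, failure.description, mode.description, mode.effects)


truck_support = Function(
    1,
    "To provide smooth rolling support for half the weight of a passenger car "
    "(up to 26.5 tons) on the rails at speeds up to 120 kph",
    [
        FunctionalFailure(
            "Fails to provide support",
            [
                FailureMode("Weld in frame fails due to fatigue"),
                FailureMode("Wheel collapses due to fatigue"),
                FailureMode("Axle fails due to fatigue"),
                FailureMode("Truck frame component fails due to fatigue"),
            ],
        ),
        FunctionalFailure(
            "Fails to provide a smooth ride",
            [FailureMode("Flats worn on wheel tread")],
        ),
    ],
)

rows = list(worksheet_rows(truck_support))
for row in rows:
    print(row)
```

Keeping the hierarchy explicit avoids retyping the function statement in every row, while the flattened rows match the layout of the worksheet.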
Question 4 Effects analysis
The process
Example 1
Function Statement | Failure | Failure mode | Effects
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide support | Weld in frame fails due to fatigue | The truck as a whole collapses. This is most likely to occur when the car is most heavily loaded - in other words when it is full of passengers, and probably while the train is going round a corner. As a result, it would almost certainly be derailed. At present, the truck is replaced when a crack longer than 100 mm is found. (Such a crack would be found during the course of other inspections that occur often enough to detect it.) Downtime to replace the truck on its own is 16 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide support | Wheel collapses due to fatigue | The truck as a whole collapses. This is most likely to occur when the car is most heavily loaded - in other words when it is full of passengers, and probably while the train is going round a corner. As a result, it would almost certainly be derailed. Only one cracked wheel has been found to date. It takes 8 hours to replace a wheel.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide support | Axle fails due to fatigue | The truck as a whole collapses. This is most likely to occur when the car is most heavily loaded - in other words when it is full of passengers, and probably while the train is going round a corner. As a result, it would almost certainly be derailed. No axles have failed so far.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide support | Truck frame component fails due to fatigue | Initial cracking is likely to lead to frame distortion, which could make the truck unstable enough to derail the train. As before, this is most likely to happen when heavily loaded - in other words, when it is full of passengers, and probably while the train is going round a corner. So far, the only frame component which has shown signs of failing has been the transom, which cracked and has since been reinforced with a steel plate. Downtime to replace a truck is 16 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Differential wear of steel treads on the same axle | If the difference between wheel diameters is greater than 2 mm, the possibility of derailment at speeds near 120 kph increases. Downtime to re-profile a pair of wheels is 3 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Spalling on wheel tread | This could lead to differential wear. If the difference between wheel diameters is greater than 2 mm, the possibility of derailment at speeds near 120 kph increases. Downtime to re-profile a pair of wheels is 3 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Wheel flange shears off | This failure is only likely to occur on a flange which has been weakened by excessive wear. It is most likely to happen on a heavily loaded train going round a corner at high speed, which would almost certainly lead to a derailment. Downtime to replace a set of wheels is 3 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Chevron rubber shears | Truck frame rests directly on the axle box bump stop. Wheel loading is unevenly distributed and wheels are prevented from moving off-axis during curving - both of these conditions may cause derailment under adverse conditions of load and speed. Downtime to replace the chevron rubber is about 16 hours. (The clearance between the bump stop and the truck frame should be 30 +1/-0 mm.)
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Tie bar rod axle rod slackens off | Wheel arch could distort and chevron rubber could shear. Truck frame rests directly on the axle box bump stop. Wheel loading is unevenly distributed and wheels are prevented from moving off-axis during curving - both of these conditions may cause derailment under adverse conditions of load and speed. Time to tighten axle rod nut in Depot is 15 minutes.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Chevron rubber settles | Settling could cause excessive contact between vertical bump stop and wheel arch. This would restrict wheel set movement during curving, and could cause derailment under severely adverse conditions of load and speed. Clearance should be 30 +1/-0 mm. Time to replace chevron rubber 4 hours. See also function 2.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Chevron rubber elastically yields | Settling could cause excessive contact between vertical bump stop and wheel arch. This would restrict wheel set movement during curving, and could cause derailment under severely adverse conditions of load and speed. Clearance should be 30 +1/-0 mm. Time to replace chevron rubber 4 hours. See also function 2.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Traction link bolt comes adrift | The traction link falls off at one end, so the traction center is connected to the truck by only one link. Asymmetric load on the remaining link damages the bushes, interfering with ride comfort and possibly twisting the link mounting plates. This in turn causes the second traction link to shear off, which would mean that the truck is only connected to the car by the air bags. A twisted mounting could also restrict truck movement during curving, which may lead to derailment under adverse conditions of load and speed. One end of the traction link could also hit the ground in such a way that the truck frame or traction center has to vault over it, causing a spectacularly nasty derailment. Time to replace a traction link bolt is two hours. (Note that the nuts on the traction link bolts are held in place by split pins, which means that this failure should not occur if the split pin is in place - see also function 11.)
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Unable to support the car on the rails at 120 kph | Traction link falls off due to fatigue | The traction link falls off at one end, so the traction center is connected to the truck by only one link. Asymmetric load on the remaining link damages the bushes, interfering with ride comfort and possibly twisting the link mounting plates. This in turn causes the second traction link to shear off, which would mean that the truck is only connected to the car by the air bags. A twisted mounting could also restrict truck movement during curving, which may lead to derailment under adverse conditions of load and speed. One end of the traction link could also hit the ground in such a way that the truck frame or traction center has to vault over it, causing a spectacularly nasty derailment. Time to replace a traction link is five hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Bearing collapses due to fatigue failure of cage, rollers, spacer or inner or outer race | Collapsed bearing causes a "hot box", and train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. Time to replace a wheel set complete with bearing and axle box 8 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Bearing collapses due to excessive clearance in housing | If the axle box liner bore exceeds the bearing outer race external diameter by more than 0.6 mm, relative movement between the liner and outer race causes excessive vibration and collapse of the bearing. This causes a hot box, and train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. Time to replace a wheel set complete with bearing and axle box 8 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Bearing collapses due to bumpy rails | Excessive interaction between railhead and wheel sets applies shock loads to bearings, leading to either fracture of bearing components or accelerated fatigue failure. This causes a hot box, and train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. Time to replace a wheel set complete with bearing and axle box 8 hours. Rails to be analyzed separately.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Bearing fails due to under lubrication | Seized bearing causes a hot box, and train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. Time to grease an axle box 30 mins.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Plug falls out of axle box cover | Lubricant drains out, causing bearing to seize resulting in a hot box. Train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. Wheel set would be replaced if plug was found to be missing. Time required to do so 8 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Bearing fails due to over lubrication | Over-lubrication leads to excessive churning and eventual breakdown of lubricant, causing bearing to seize resulting in a hot box. Train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. It is felt that this failure is unlikely to occur because the amount of lubricant is controlled.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide rolling support | Moisture in lubricant causes bearing to fail | Moisture in lubricant reduces its lubricating effectiveness and may also cause the bearing to corrode, in both cases leading to bearing failure resulting in a hot box. Train must stop at the next station to evacuate passengers which causes a traffic delay of 20-60 minutes. It is also possible that a failed bearing could cause a derailment. The hot box melts the chevron causing it to emit smoke. The chevron also collapses, damaging the tie-bar and axle. Time to replace wheel set is 8 hours.
1 To provide smooth rolling support for half the weight of a passenger car (up to 26.5 tons) on the rails at speeds up to 120 kph | Fails to provide a smooth ride | Flats worn on wheel tread | A wheel flat longer than 40 mm is likely to affect ride comfort. It will also damage the railhead. The noise and vibration caused by a flat wheel tread is usually detected quickly by Operations. Time to re-profile a wheel set on the under floor lathe is 3 hours.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Air bag leaks
via top plate
of car bolster
faster than it
can be
pumped in
Air bag deflates, so forces are transmitted between
truck and car through the layer and emergency
springs only. This causes a sharper ride, but train
does not have to be withdrawn from service
immediately. Time to replace air bag 8 hours. See
also function 5.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Steel wire
inside airbag
fails
Air bag fabric cannot contain the air pressure on its
own, so bag bursts causing forces to be transmitted
through layer and emergency springs only. This
causes a sharper ride, but train does not have to be
withdrawn from service immediately. Time to
replace air bag 8 hours. See also 44 and 45.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Chevron
spring rubber
settles
Reduced clearance causes more frequent contact
between vertical bump stop and wheel arch over
bumps. This reduces ride quality and increases
stresses on all truck components. See also 10 above.
Time to replace chevron 8 hours.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Chevron
elastically
yields
Reduced clearance causes more frequent contact
between vertical bump stop and wheel arch over
bumps. This reduces ride quality and increases
stresses on all truck components. See also 11 above.
Time to replace chevron 8 hours.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Damper non-
return valve
fails in
closed
position
Damper "seizes" and transmits shocks directly from
truck frame to underside of car (in the case of the
vertical damper) or to traction center (in the case of
the horizontal damper). This reduces ride quality
and increases stresses on all truck components.
Time to replace a defective damper in Depot 1
hour.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Damper oil
viscosity
increased by
dirt or
oxidation
Damper becomes steadily stiffer until it eventually
seizes altogether, transmitting shocks directly from
truck frame to underside of car (in the case of the
vertical damper) or to traction center (in the case of
the horizontal damper). This reduces ride quality
and increases stresses on all truck components.
Time to replace a defective damper in Depot 1
hour.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Excessive
metal-to-
metal contact
between
damper
piston and
cylinder
Damper becomes steadily stiffer until it eventually
seizes altogether, transmitting shocks directly from
truck frame to underside of car (in the case of the
vertical damper) or to traction center (in the case of
the horizontal damper). This reduces ride quality
and increases stresses on all truck components.
Time to replace a defective damper in Depot 1
hour.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Layer spring
stiffness
decreases
Serious loss of stiffness means that secondary
suspension is provided by the air bag only. This
reduces ride comfort and increases shock loads
especially on the air bag itself. Time to replace
layer spring at Depot 8 hours. See also 45.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to insulate
passengers
adequately
Air bag,
layer spring,
and
emergency
spring all fail
Car has no secondary suspension at all, so all
shocks which pass through the primary suspension
are transmitted directly to the car. Ride becomes
very rough and stresses on local truck components
are severely increased. Replacement of the three
suspension components takes 8 hours at the Depot.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to
minimize
oscillations
Oil leaks out
of damper
seals
(vertical or
horizontal
damper)
In the case of the vertical damper, full damping
capability would have to be provided by the damper
opposite, which might not be able to cope and
hence which might also fail rapidly itself. Even if
the opposite damper did not fail, damping
efficiency is impaired so oscillations are not
effectively damped, which could cause discomfort
on longer journeys. There is only one horizontal
damper, so the effect of loss of this damper is
immediate. Under damping also increases cyclic
stresses on other suspension components, especially
the torsion bar, which could shorten the life of these
components. Time to replace a defective damper in
Depot 1 hour.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to
minimize
oscillations
Damper non-
return valve
fails in open
position
In the case of the vertical damper, full damping
capability would have to be provided by the damper
opposite, which might not be able to cope and
hence which might also fail rapidly itself. Even if
the opposite damper did not fail, damping
efficiency is impaired so oscillations are not
effectively damped, which could cause discomfort
on longer journeys. There is only one horizontal
damper, so the effect of loss of this damper is
immediate. Under damping also increases cyclic
stresses on other suspension components, especially
the torsion bar, which could shorten the life of these
components. Time to replace a defective damper in
Depot 1 hour.
2 To insulate passengers
from shocks caused by
crossing rail joints,
bumps and to minimize
transient oscillations
after crossing such
bumps.
Fails to
minimize
oscillations
Damper
mounting
bolts become
detached
Dampers come adrift and oscillations are not
effectively damped, which causes discomfort and
may induce motion sickness on longer journeys.
Horizontal damper could be dragged along a rail. It
may also drop off in front of a wheel, possibly
leading to derailment. Time to replace a defective
damper in Depot 1 hour.
3 To insulate passengers
from jerks during
acceleration and braking
Fails to insulate
passengers
from jerky
stops and starts
Compound
spring
retaining nut
fails, leading
to
dislocation
of the
compound
spring
The car body is still supported by the secondary
suspension, but the center pivot crashes back and
forth against the traction center when starting and
stopping. This causes a jerky ride and considerably
increases shock loads on the truck and local car
components (especially the center pivot, traction
center and air bags). A dislocated spring could also
prevent the truck from curving correctly, which
may lead to a derailment under adverse conditions
of load and speed. Time to rectify this defect 2
hours at the Depot. (Note that the retaining nut is
held in place by the split pin, so this failure would
not occur if the split pin is in place)
3 To insulate passengers
from jerks during
acceleration and braking
Fails to insulate
passengers
from jerky
stops and starts
Compound
spring rubber
deteriorates
The car body is still supported by the secondary
suspension, but the center pivot crashes back and
forth against the traction center when starting and
stopping. This causes a jerky ride and considerably
increases shock loads on the truck and local car
components (especially the center pivot, traction
center and air bags). A dislocated spring could also
prevent the truck from curving correctly, which
may lead to a derailment under adverse conditions
of load and speed. Time to rectify this defect 2
hours at the Depot.
3 To insulate passengers
from jerks during
acceleration and braking
Fails to insulate
passengers
from jerky
stops and starts
Traction link
rubber bush
fails
Starting and stopping forces are damped only by the
compound spring, which leads to a jerky ride and a
general increase in shock loads. Time to replace
bush 2 hours.
4 To control the roll
angle of the car body
relative to the truck
Fails to control
the roll angle of
the car body at
all
Torsion bar
shears
If the torsion bar shears, one end of the car body
lurches from side to side during cornering. This
could disturb and possibly frighten passengers. The
car also becomes highly unstable and the resulting
loss of balance could lead to derailment, especially
if a heavily loaded car was going at high speed
round a corner. Time to replace the torsion bar in
Depot 4 hours.
4 To control the roll
angle of the car body
relative to the truck
Fails to control
the roll angle of
the car body at
all
Torsion bar
retaining key
fails
The torsion bar would rotate by itself and cause
noise and vibration. However, the torsion bar would
not be sheared, so derailment is unlikely to occur.
Time to replace the torsion bar in Depot 4 hours.
4 To control the roll
angle of the car body
relative to the truck
Fails to control
the roll angle of
the car body at
all
Torsion bar
turnbuckle
fastening
comes
undone
Torsion bar has nothing to act against, causing one
end of the car to lurch from side to side during
cornering, disturbing and possibly frightening
passengers. The car also becomes highly unstable
and the resulting loss of balance could lead to
derailment, especially if a heavily loaded car was
going at high speed round a corner. Time to
reconnect the turnbuckle in Depot 4 hours.
4 To control the roll
angle of the car body
relative to the truck
Fails to control
the roll angle of
the car body at
all
Torsion bar
bearing worn
due to lack
of
lubrication
Excessive clearance means that the torsion bar rests
directly on the edge of the bearing housing. The
resulting point load on the torsion bar greatly
increases the chances of the bar shearing, causing
instability and a possible derailment. Time to
replace this bearing at Depot 4 hours.
5 To ensure that the
carriage floor is level
with the platforms when
train stops at a station
Unable to
ensure that
carriage floor is
level with the
platform
Air bag leaks
via top plate
of car bolster
faster than it
can be
pumped in
If the step is not level with the platform, a
passenger could trip and fall. Time to replace air
bag at Depot 8 hours. See also 22 above.
5 To ensure that the
carriage floor is level
with the platforms when
train stops at a station
Unable to
ensure that
carriage floor is
level with the
platform
Air bag
bursts
If the step is not level with the platform, a
passenger could trip and fall. Time to replace air
bag at Depot 8 hours.
5 To ensure that the
carriage floor is level
with the platforms when
train stops at a station
Unable to
ensure that
carriage floor is
level with the
platform
Leveling
valve
turnbuckle
loose
Air bag cannot be charged efficiently so carriage
floor cannot be aligned with platform before
passengers start moving on and off the train. This
means that a passenger could trip and fall. This
failure occurred quite often in the past, but the
locknut and spring washer were replaced by a nylon
washer, and it has not happened for a year.
5 To ensure that the
carriage floor is level
with the platforms when
train stops at a station
Unable to
ensure that
carriage floor is
level with the
platform
Layer spring
stiffness
decreases
Car body sags, which can be compensated for
initially by adding adjustment shims. Serious loss
of stiffness means that shims can no longer
compensate. Time to replace layer spring at depot 8
hours.
6 To assist in stopping
the train at up to 0.88
m/s²
Completely
unable to assist
in stopping the
train
Brake pad
worn more
than 10 mm
One worn pad is unlikely to affect the stopping
performance of the whole train, but a number of
worn pads could do so. Pads are usually replaced
when wear exceeds 7 mm and it takes 20 minutes to
repair a pad in the Depot.
6 To assist in stopping
the train at up to 0.88
m/s²
Completely
unable to assist
in stopping the
train
Brake disk
wear exceeds
2.5 mm
One worn disk would not have a significant
effect on the stopping performance of the whole
train, but several worn disks would do so. Disks are
re-profiled on the under floor wheel lathe when
wear exceeds 2 mm. This takes 2 hours.
6 To assist in stopping
the train at up to 0.88
m/s²
Completely
unable to assist
in stopping the
train
Brake pad
falls off
Brake pad holder scratches the disk, so the disk has
to be re-profiled (2 hours) and brake pad replaced
(20 minutes). One worn disk would not have a
significant effect on the braking performance but
several worn disks would do so.
7 To prevent direct
contact between axle box
and truck frame under
severe bounce conditions
Unable to
prevent contact
between axle
box and truck
under severe
bounce
conditions
Vertical
bump stop
missing
The axle box could hammer against the truck frame
when passing over bumps, leading to deformation
of the axle box and possible accelerated failure of
the axle bearings. Time to replace the bump stop in
Depot up to 8 hours.
8 To permit the truck to
be lifted and/or the car to
be towed easily
Truck cannot
be lifted or car
towed easily
Lifting point
fails due to
wear or
corrosion
This failure could occur while the truck is
suspended in mid-air, which means that it could fall
onto somebody. Time to repair eye by welding 3
hours.
8 To permit the truck to
be lifted and/or the car to
be towed easily
Truck cannot
be lifted or car
towed easily
Lifting point
damaged by
external
force
Eye could be weakened or the truck could be
improperly secured for lifting, causing a suspended
truck to fall, possibly onto somebody. Time to fit
new eye 3 hours.
8 To permit the truck to
be lifted and/or the car to
be towed easily
Truck cannot
be lifted or car
towed easily
Lifting point
sheared off
by external
force
Truck could not be lifted at all using the eye, so
alternative arrangements would have to be made.
9 To ensure that wheel
sets remain attached to
truck while truck is
being lifted
Wheel set falls
off truck while
truck is being
lifted
Tie bar
fractures
Wheel set could drop onto somebody while the
truck is suspended in mid-air. Time to replace the
tie bar up to 8 hours in the Depot.
10 To insulate the car
from shocks to some
extent if the air bag fails
Incapable of
insulating the
car if the air
bag fails
Emergency
spring fails
This failure on its own has no effect. If the air bag
and the emergency spring both fail, secondary
suspension has to be provided by the layer spring
on its own. 30 above explains what happens if air
bag, layer spring and emergency spring all fail.
Time to replace the emergency spring at Depot 8
hours.
11 To limit lateral
movement of car relative
to truck
Unable to limit
lateral
movement of
car relative to
truck
Lateral bump
stop rubbers
worn away
Under extreme conditions of lateral load, car bolster
stool could hit truck frame, reducing ride comfort
and generally increasing shock loads. Time to
replace lateral bump stop rubber at Depot 8 hours.
11 To limit lateral
movement of car relative
to truck
Unable to limit
lateral
movement of
car relative to
truck
Lateral bump
stop falls off
Under extreme conditions of lateral load, car bolster
stool could hit truck frame, reducing ride comfort
and generally increasing shock loads. Time to
replace lateral bump stop rubber at Depot 8 hours.
12 To prevent traction
link retaining nut from
coming undone
Unable to
prevent traction
link retaining
nut from falling
off bolt
Split pin falls
out
This failure only matters if the retaining nut starts
coming loose. If the retaining bolt falls out, effects
are described in 12 above. Time to replace split pin
at Depot 1 hour.
13 To prevent compound
spring retaining nut from
coming undone
Unable to
prevent the
compound
spring retaining
nut from falling
off
Split pin falls
out
This failure only matters if the retaining nut starts
coming loose. If the retaining nut falls off, the
compound spring would fall off. Large clearance
between the center pivot and the center plate would
cause fierce vibrations in the car compartment and
further damage to the bolster stool. Time to replace
split pin in Depot 1 hour.
Example 2
Functions Functional failures Failure modes Failure effects
1 To supply air to
conditioned air
distribution ducts at the
temperature called for by
pack temperature
controller
A conditioned air is not
supplied at called-for
temperature
1 air-cycle machine
seized
Reduced pack flow,
anomalous readings on
pack-flow indicator and
other instruments
2 blocked ram-air
passages in heat
exchanger
High turbine-inlet
temperature and partial
closure of flow-control
valve by over-temperature
protection, with resulting
reduction in pack airflow
3 failure of anti-ice
valve
If valve fails in open
position, increase in
pack discharge
temperature; if valve
fails in closed position,
reduced pack airflow
4 failure of water
separator
Condensation (water
drops, fog, or ice
crystals) in cabin
2 To be able to prevent
loss of cabin pressure by
backflow if the duct
fails in unpressurized
nose-wheel compartment
A No protection against
backflow
1 failure of bulkhead
check valve
None (hidden function);
if duct and/or connectors
fail in pack bay, loss of
cabin pressure by
backflow, and airplane
must descend to lower
altitude
Item description: Distributed control system (DCS)
Functions Functional
failures
Failure modes Failure effects
To provide safe, secure,
uninterrupted, redundant, cost
effective, continuous process
control and monitoring
according to the target
product of the day, within the
parameters specified by
product specification and by
current environmental
regulations, in the presence of
a UPS (uninterruptible power
supply)
Fails to
provide
security
Unauthorized
usage of
console either
when
unattended or
if password
stolen
An unauthorized and untrained person
gains access to an operating console or an
engineering console. This may lead to a
condition where loss of life or
environmental disaster can occur. In this
eventuality legal or civil proceedings
will likely be brought against the
Company.
Unable to
log in
Password
forgotten
Operator unable to control the plant.
Operator would look for another console
that is logged in. In a worst case
scenario all consoles would be locked
out and emergency shutdown would be
initiated if the operator suspects
abnormal operation at that particular
time.
Unable to
protect
against loss
of control
UPS has failed
Under normal conditions this failure
would be noticed by the operator who
checks the alarms in the normal
execution of his daily tasks.
Control lost
Complete loss
of
communication
with ring
Unreliable or no data shown on console.
Operator loses ability to control the
plant. Emergency shutdown initiated.
The most common cause of this failure
in the past has been contractors
inadvertently cutting cables. This is
likely to take between 2 hours and one day
to fix, entailing a loss of production. This
failure mode is considered to be a rare
event.
Complete loss
of
communication
with controller
node
One node goes off line. This could be
preceded by any of the following: dirt
fouling of the fan, moisture penetration,
RF interference, or electronic component
failure. Partial or complete shutdown
depending on importance of node.
Unreliable or no data shown on console.
Operator loses ability to control the
plant. Emergency shutdown initiated.
The most common cause of this failure
in the past has been contractors
inadvertently cutting cables. This is
likely to take between 2 hours and one
day to fix, entailing a loss of
production. This has happened
occasionally in the past.
Chapter 12. Decision Algorithm
Questions 5, 6, and 7
The process
4. Nonoperational maintenance (M) consequences, which
involve only the direct cost of repair
Example 1 shows several of the records from the full analysis of
the rail passenger car Truck. In the columns H, S, O, and M we decide,
from the effects description, whether the consequences are hidden (H),
safety or environmental (S), operational, that is production (O), or
maintenance, that is nonoperational (M). We test each of the four
possible consequences in this order, and we stop as soon as we
ascertain that the circumstances (effects) of the failure mode provoke
the consequence being tested.
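The ordered test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_consequence` and the dictionary of yes/no answers are hypothetical, and falling back to M when no test is answered yes is an assumption of the sketch rather than a rule stated in the text.

```python
from typing import Dict

# Order in which the four consequence categories are tested (from the text).
CONSEQUENCE_ORDER = ["H", "S", "O", "M"]


def classify_consequence(answers: Dict[str, bool]) -> str:
    """Return the first category, in H-S-O-M order, whose test is answered yes.

    `answers` is a hypothetical structure mapping a category letter to the
    analyst's yes/no judgment taken from the effects description.
    """
    for category in CONSEQUENCE_ORDER:
        if answers.get(category, False):
            return category  # stop at the first consequence that applies
    # Assumption: a failure with no other consequence still costs a repair.
    return "M"


# The frame weld failure is evident to the crew (not hidden) but would
# almost certainly derail the train, so the safety test is the first "yes".
weld_in_frame = {"H": False, "S": True, "O": True, "M": True}
print(classify_consequence(weld_in_frame))  # S
```

Because the loop returns at the first match, a failure that affects both safety and production is recorded once, under S.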
Example 1
Column headings: Function Statements; Failed States; Failure Causes;
Local and Global Effects from the Failure Cause; decision matrix;
Maintenance Tasks; Interval; By
Decision matrix in the worksheet heading:
H C T D M
S C T 2 M
O C T N M
M C T N M
1 2 3 4
To provide smooth
rolling support for
half the weight of a
passenger car (up
to 26.5 tons) on the
rails at speeds up to
120 kph
Fails to provide
support
Weld in
frame
fails due
to fatigue
The truck as a whole collapses. This is most
likely to occur when the car is most heavily
loaded - in other words when it is full of
passengers, and probably while the train is
going round a corner. As a result, it would
almost certainly be derailed. At present, the
truck is replaced when a crack longer than
100 mm is found. (Such a crack would be
found during course of other inspections
that occur often enough to detect it).
Downtime to replace truck on its own 16
hours.
S C Inspect frame for
cracks greater than
100 mm
To be included
with other
scheduled
tasks
The RCM decision algorithm is represented by the matrix of
Figure 12-1, which also appears in the heading of the decision half
of the RCM worksheet.
H C T D R
S C T 2 R
O C T N R
M C T N R
Figure 12-1 RCM Decision Diagram. Redesign, R, is mandatory in rows
H and S if no proactive task reduces the consequences of failure to a
tolerable level.
We execute the RCM decision logic by beginning at the top left
of Figure 12-1. We descend the first column until we reach the
consequence that applies, and then work to the right along that
row. The letter in each cell of the matrix represents a
question (step) in the RCM decision algorithm. The full text of the
questions (below) should be explicitly recited as the decision
diagram is being traversed. Avoid the tendency to abbreviate the
questions so much that their meaning is lost or distorted.
Full text of decision diagram questions
H. Is the function's failed state hidden? That is, will the
failure go unnoticed until another function fails or some
extraordinary event occurs?
S. Does the failure affect safety, health, or the
environment?
O. Can the failure provoke operational (production)
consequences? These include cost, quality, and customer
service.
M. Are the only consequences those that affect
maintenance or the maintenance budget?
C. Is a condition based maintenance (CBM) task
applicable? Can it reliably detect the 'failing' state early
enough to reduce the multiple failure's probability and/or
its consequences to a tolerable level? Is it effective? Does it
make economic sense to perform this task at the frequency
required?
T. Is a time based maintenance (TBM) task applicable? Is there an
age (useful life) at which the probability of failure due to
this failure mode increases rapidly, and do most items
survive to this age? Is it effective? Can a routine (TBM) task
reduce the multiple failure's probability and/or its
consequences to a tolerable level? Two types of time based
tasks are considered under this heading: 1) Scheduled
Overhaul, and 2) Scheduled Discard, the latter being
mandatory for a safe-life item (footnote 104).
D. Is a detection task applicable? Will it reduce the
multiple failure's probability to a tolerable level? Is it
effective? Is it practical to do the task at the required
interval?
2. Can a combination of 2 or more TBM and CBM tasks be
applicable (avoid or reduce the safety consequences to a
tolerable level)? Are they effective (practical)?
N. Neither time based nor condition based activities need be
scheduled.
R. A hardware, software, or procedural modification that
will reduce the failure's probability and/or its consequences
to a tolerable level is mandatory (H or S) or may be
desirable (O or M).
For the failure mode (cause) "Weld in frame fails due to
fatigue" we ask whether the failure is hidden. Since the
failure's direct effects will be clearly visible (probably
catastrophic) to operating personnel, this failure is not
hidden. We therefore descend to row S and ask whether the
failure affects safety. A derailment clearly does, so we
proceed to the next cell to the right and ask whether there
is a CBM task that is applicable and effective. We need
search no further than the effects description to learn that
it is entirely feasible to detect a crack at the potential
failure stage of 100 mm length. The task will be effective
(economically feasible) because there will be ample
opportunity to perform this inspection often enough during
other routine work (to be described in subsequent rows of
the analysis). Hence we stop at that point and enter S and C
in the decision columns of the worksheet.
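The traversal of Figure 12-1 can be sketched as follows. This is a minimal illustration, not the worksheet software itself: the names `DECISION_MATRIX`, `TERMINAL`, and `traverse` are hypothetical, and treating N and R as outcomes that end the traversal without a further question is an interpretation of the question list above.

```python
from typing import Callable, Dict, List, Tuple

# Rows of Figure 12-1, keyed by consequence letter. Each row lists the
# steps considered from left to right once that consequence applies.
DECISION_MATRIX: Dict[str, List[str]] = {
    "H": ["C", "T", "D", "R"],
    "S": ["C", "T", "2", "R"],
    "O": ["C", "T", "N", "R"],
    "M": ["C", "T", "N", "R"],
}

# Assumption: N and R are statements rather than questions, so reaching
# either one ends the traversal.
TERMINAL = {"N", "R"}


def traverse(ask: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (consequence row, selected step) for one failure mode.

    `ask` is a hypothetical callback that answers yes or no to the full
    text of the question identified by the given letter.
    """
    for row, steps in DECISION_MATRIX.items():  # rows are tested in H, S, O, M order
        if not ask(row):
            continue  # consequence does not apply, descend to the next row
        for step in steps:
            if step in TERMINAL or ask(step):
                return row, step
    raise ValueError("no consequence category was answered yes")


# Answers for "Weld in frame fails due to fatigue": not hidden, affects
# safety, and a CBM task (crack inspection) is applicable and effective.
weld_answers = {"H": False, "S": True, "C": True}
print(traverse(lambda letter: weld_answers.get(letter, False)))  # ('S', 'C')
```

In rows O and M the sketch stops at N when neither C nor T applies, so R is never reached there; this mirrors the text's statement that redesign is only optional for those rows.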
Example 2
H C T D M
S C T 2 M
O C T N M
M C T N M
Example 4
Figure 12-2 The shock-strut assembly on the main landing gear of the
Douglas DC-10. The outer cylinder is a structurally significant item.
Structures Worksheet: type of Aircraft Douglas DC-10-10
Item Number: 101 No. per aircraft: 2
Item Name: Shock-strut outer cylinder Major area: main landing gear
Vendor part/model no: PN ARG 7002-505 Zones: 144, 145
Design criterion:
Damage tolerant element: __
Safe-life element: Yes
Inspection access:
Description/location details:
Shock-strut assembly is located on main landing gear; SSI
consists of outer cylinder (both faces)
Internal: Yes
External: Yes
Material (include manufacturer's trade name): Steel alloy
4330 MOD (Douglas TRICENT 300 M)
Redundancy and external
detectability:
No redundancies; only one cylinder
each landing gear, left and right
wings. No external detectability of
internal corrosion.
Fatigue-test data Is element inspected via a
related SSI? If so, list SSI no.: No
Expected fatigue life:
Classification of item
(significant/nonsignificant):
significant
Crack propagation:
Established safe-life: 46,800 landings 70,200 oper. hours
Design conversion ratio: 1.5 operating hours/flight cycle
Column headings: Residual strength; Fatigue life; Crack growth;
Corrosion; Accidental damage; Class no.; Controlling factor;
Inspection (int./ext.); Proposed task; Initial interval
Ratings (residual strength, fatigue life, crack growth, corrosion,
accidental damage): -, -, -, 1, 4
Class no.: 1
Controlling factor: Corrosion
Internal: Magnetic-particle inspection for cracking and detailed
visual inspection for corrosion. Initial interval: sample at 6000 to
9000 hours and at 12000 to 15000 hours to establish best interval.
External: General inspection of outer surface. Initial interval:
during preflight walkarounds and at A checks.
External: Detailed visual inspection for corrosion and cracking.
Initial interval: not to exceed 1,000 hours (C check).
Remove and discard at life limit. Initial interval: 34,800 hours.
Figure 12-3 RCM Worksheet for structurally significant items
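The worksheet states the safe-life limit both in landings and in operating hours, linked by the design conversion ratio. The short sketch below reproduces that arithmetic; the function name `landings_to_operating_hours` is hypothetical and chosen only for this illustration.

```python
def landings_to_operating_hours(landings: int, hours_per_cycle: float) -> float:
    """Convert a life limit in landings (flight cycles) to operating hours."""
    return landings * hours_per_cycle


# Figures from the worksheet: 46,800 landings at 1.5 operating hours per
# flight cycle.
print(landings_to_operating_hours(46_800, 1.5))  # 70200.0
```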
The worksheet of Figure 12-3 differs from that of the previous
examples. This form applies to the analysis of structurally
significant items. All structurally significant items fall into one of
two categories:
1. Damage-tolerant item: A monolithic or multiple load path
item in which a crack or complete failure of an element will not
reduce residual strength below the safety level prior to detection,
or
2. Safe-life item: A structurally significant item whose
potential failure is not reliably detectable.
Table 12-1 explains the rating system for the first 5 columns of
Figure 12-3. The analysis shows the treatment of a safe-life item in
an airline context. Because the shock-strut outer cylinder on the
main landing gear of the Douglas DC-10 has been classified a safe-
life item it must be discarded before a fatigue crack is expected to
occur. Hence it is not rated for residual strength, fatigue life, or
crack propagation characteristics (the first three columns of Figure
12-3). The Class Number of column 6 is set to the minimum of the
ratings in columns 1 to 5. The controlling factor is the factor whose
rating is that minimum.
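The rule for the class number and controlling factor can be sketched as below. This is an illustrative sketch only: the function name `class_number` and the dictionary layout are assumptions, and unrated columns (shown as dashes on the worksheet) are represented here by `None`.

```python
from typing import Dict, Optional, Tuple


def class_number(ratings: Dict[str, Optional[int]]) -> Tuple[int, str]:
    """Return (class number, controlling factor) for one item.

    The class number is the lowest rating among the rated factors, and the
    controlling factor is the factor that holds it. If two factors tie,
    the one listed first is reported (an assumption of this sketch).
    """
    rated = {name: value for name, value in ratings.items() if value is not None}
    if not rated:
        raise ValueError("at least one factor must be rated")
    factor = min(rated, key=lambda name: rated[name])
    return rated[factor], factor


# Ratings of the DC-10 shock-strut outer cylinder as given in the text:
# a safe-life item is not rated for the first three factors.
shock_strut = {
    "residual strength": None,
    "fatigue life": None,
    "crack growth": None,
    "corrosion": 1,
    "accidental damage": 4,
}
print(class_number(shock_strut))  # (1, 'corrosion')
```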
Safe-life limits are only effective, however, if nothing prevents the
item from reaching them. In the case of structural items, there are
two factors that introduce this possibility: corrosion and
accidental damage. Experience has shown that landing-gear
cylinders of this type are subject to two corrosion problems. First,
the outer cylinder is susceptible to corrosion from moisture that
enters the joints at which other components are attached; second,
high-strength steels such as 4330 MOD are subject to stress
corrosion in some of the same areas. The item is therefore given a
corrosion rating of 1, which results in an overall class number of 1.
In addition to the corrosion rating, the shock-strut cylinder is rated
for susceptibility to accidental damage. The cylinder is exposed to
relatively infrequent damage from rocks and other debris thrown
up by the wheels. The material is also hard enough to resist most
such damage. Its susceptibility is therefore very low, and the rating
is 4. However, because the damage is random and cannot be
predicted, a general check of the outer cylinder, along with the
other landing-gear parts, is included in the walkaround inspections
and the A check, with a detailed inspection of the outer cylinder
scheduled at the C-check interval.
Table 12-1
Column groups: Reduction in residual strength; Fatigue life of
element; Crack-propagation rate; Susceptibility to corrosion;
Susceptibility to accidental damage
Column headings: No. of elements that can fail without reducing
strength below damage tolerant level; rating; Ratio of fatigue life to
design goal; Ratio of interval to fatigue-life design goal; rating;
Ratio of corrosion-free age to fatigue-life design goal; rating;
Exposure as a result of location; rating
One 1 1/8 1 1/8 1 High 1
Two or more (footnote 106) 2 2 2 Moderate 2
Two or more (footnote 107) 3 3/8 3 3/8 3 Low 3
Two or more (footnote 108) 4 4 4 Very low 4
Chapter 13. Can RCM and Streamlined RCM
peacefully co-exist?
Introduction
Religious or political zealots often confront one another not on the basis
of the mores of their respective doctrines, but rather over superficial
differences in the details surrounding each other's cultural reference
points. Mathematicians take pride in their ability to adopt a new set of
definitions and symbols as effortlessly as they would don a fresh suit of
clothes. Thus they proceed, unfettered by prior points of view, to build
new theorems upon old. The world of maintenance has, not dissimilarly,
spawned a multitude of cultures and languages for formulating solutions to
real problems.
In the preceding chapters we conducted RCM on several diverse item
types. We systematically answered each of the seven RCM questions
about the item, in the order stipulated by the SAE JA-1011 standard:
1) functions, 2) failures, 3) failure modes, 4) failure effects, 5)
consequences, 6) scheduled tasks, and 7) default tasks. We entered
the answers to the questions in an electronic spreadsheet (for example, MS
Excel or a database form) formatted as the RCM Worksheet illustrated in
Figure 10-2 on page 111.
This chapter explores streamlined RCM software. We begin with an
examination of what is meant by "streamlining". We illustrate the
streamlined approach by describing a popular, representative RCM
software package called RCM Turbo [109]. We set up a cross-reference
dictionary of terms describing similar-sounding but sometimes
differently applied concepts in the two languages. Finally, we summarize
the relative advantages and potential drawbacks of the streamlined RCM
and the RCM processes. Through this process, we discover how the
juxtaposition of the two approaches may benefit the proponents of both.
Why streamline RCM?
Chapter 10 (page 110) cited the SAE standard "Evaluation Criteria for
Reliability-Centered Maintenance (RCM) Processes", which defines RCM
as:
"a specific process used to identify the policies which must be
implemented to manage the failure modes which could cause the
functional failure of any physical asset in a given operating context."
It goes on to define the process by adding:
"Any RCM process shall ensure that all the following seven questions
are answered satisfactorily and are answered in the sequence shown as
follows:
a. What are the functions and associated desired standards of
performance of the asset in its present operating context (functions)?
b. In what ways can it fail to fulfill its functions (functional failures)?
c. What causes each functional failure (failure modes)?
d. What happens when each failure occurs (failure effects)?
e. In what way does each failure matter (failure consequences)?
f. What should be done to predict or prevent each failure (proactive tasks
and task intervals)?
g. What should be done if a suitable proactive task cannot be found
(default actions)?
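The sequencing requirement above can be made concrete with a small data structure. The following is a minimal sketch, not part of the SAE standard: the names `RcmQuestion`, `first_unanswered`, and `answered_in_sequence` are hypothetical, and for simplicity the sketch treats an answer as any non-empty text.

```python
from enum import IntEnum


class RcmQuestion(IntEnum):
    """The seven questions, numbered in the sequence the standard stipulates."""
    FUNCTIONS = 1
    FUNCTIONAL_FAILURES = 2
    FAILURE_MODES = 3
    FAILURE_EFFECTS = 4
    FAILURE_CONSEQUENCES = 5
    PROACTIVE_TASKS = 6
    DEFAULT_ACTIONS = 7


def first_unanswered(answers):
    """Return the earliest question without a non-empty answer, or None."""
    for question in RcmQuestion:
        if not answers.get(question, "").strip():
            return question
    return None


def answered_in_sequence(answers):
    """True if no later question is answered while an earlier one is blank."""
    gap = first_unanswered(answers)
    if gap is None:
        return True
    return all(
        not answers.get(question, "").strip()
        for question in RcmQuestion
        if question > gap
    )


answers = {
    RcmQuestion.FUNCTIONS: "To feed material 24 hours/day",
    RcmQuestion.FUNCTIONAL_FAILURES: "Does not feed at all",
}
print(first_unanswered(answers).name, answered_in_sequence(answers))
```

A worksheet tool could use such a check to flag, rather than forbid, out-of-sequence entries.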
Were we to consider the process (of answering the 7 RCM questions in the
sequence stipulated) unacceptably resource intensive, then,
understandably, we would seek to replace it with a process that consumes
less time and fewer resources, yet provides a no less responsible
(sufficiently rigorous) analysis. We emphasize that the SAE JA1011
standard stipulates a minimal set of criteria for a process to be
called RCM. Therefore, it is to be expected that most commercially
packaged RCM software systems and methodologies will add a
considerable number of features that enhance and facilitate the
experience.
The original [110] as well as the various streamlined RCM methods all
demand that the assembled team of analysts (operational, process, and
maintenance specialists) possess, collectively, the knowledge necessary to
make informed decisions regarding the maintenance characteristics of the
item under scrutiny. The process chosen (either original or streamlined)
must, therefore, encourage the maximum contribution by each participant
so that RCM decisions carry the force of all the knowledge and
experience available on the team. The success of any RCM
methodology therefore depends heavily on its ability to gain true
consensus throughout every stage of the analysis. The group, guided by a
well-trained facilitator, exercises its best judgment when visualizing the
typical worst case scenario (TWCS) surrounding each functional failure
analyzed.
With these objectives in mind, we compare the two processes, starting
with a dictionary of some of their respective terms of reference.
RCM/RCM Turbo dictionary
Table 13-1 Relationship between RCM and RCM Turbo terminology
Columns: RCM, RCM Turbo
RCM: Item: a collection of parts or systems that is convenient to analyze as a
group. It is selected at a high enough level of indenture that its failure may
easily be related to that of the equipment as a whole, but at a low enough
level that the analysis is of manageable size (i.e. has a manageable number of
failure modes).
RCM Turbo: Maintainable item (MI): same meaning.
RCM: No equivalent terminology is specified by the RCM minimum criteria
standard. (Any convenient or existing equipment hierarchy naming system may
be used.) Operating context is recorded in a flexible structure at the top of
the RCM worksheet.
RCM Turbo: Productive unit (PU): a system that includes several maintainable
items. A convenient place to record the operating context of the MI. A
productive unit belongs to a Major Unit, and a Plant is the highest level in
the RCM Turbo hierarchy.
RCM: Worksheet: a document (conveniently an electronic spreadsheet or simple
database application) onto which the answers to the 7 RCM questions are
recorded during the RCM team session.
RCM Turbo: The RCM Turbo software product is not meant to be populated during
the sessions, but afterwards by the facilitator or another person trained in
the use of the software. An MS Excel form (Figure 13-2, page 163) is provided
for use during the sessions.
RCM: The RCM minimum criteria standard does not specify a criticality or
priority scale with which to schedule the order of items to be analyzed.
Nowlan and Heap developed a simple priority system for the aviation industry
that has only two criticality ratings: 1) significant item [111], and
2) non-significant item. This classification system has proved useful in a
variety of other industries. For structurally significant items (SSI) Nowlan
and Heap apply a further rating of one to four in each of five categories:
1) Residual strength after failure, 2) Fatigue life, 3) Crack growth,
4) Corrosion, and 5) Accidental damage. The lowest of the five ratings
determines task frequency. There are two kinds of SSI: 1) Damage-tolerant and
2) Safe-life. Categories 1 to 5 apply to damage-tolerant items, but only
categories 4 and 5 apply to safe-life items. (See Example 4 of Chapter 12 on
page 150.)
RCM Turbo: Criticality/Priority: values used to set priorities for PUs and
MIs. They are derived by question and answer sessions driven by the program.
(Criticality calculations in no way detract from RCM. They merely add another
dimension to the analysis.)
RCM: Failure: describes the way in which a specified function no longer
performs as required. It distinguishes (for example) "full" from "partial"
failure of a function. The RCM Worksheet enforces a one-to-many integrity
constraint between Function and Failure.
RCM Turbo: Failure: same basic definition. However, RCM Turbo does not enforce
a one-to-many relationship between Function and Failure.
RCM: Failure Mode: a reasonably likely cause of a specified failure. Consists
of a noun, a verb (active or passive form) and a phrase such as "due to ...".
For example, "bolt cracks due to stress corrosion fatigue". The number of
failure modes to list and their depth of causality depend on operating
context. RCM enforces a one-to-many integrity constraint between failure and
failure mode. RCM Turbo does not.
RCM Turbo: Failure Mode: a superset of the RCM definition, structured in 3
parts as follows: 1) a component reference, 2) a Failure Mode & Effect field
(a single field that includes both RCM concepts, Failure Mode and Failure
Effects), and 3) a Root cause reference. An example of an RCM Turbo failure
mode is: Bearings + wear between rolling elements and races leading to
increased vibration levels, localized heating and eventual seizure and total
stoppage of process due to + normal wear and tear.
RCM: Failure Mode: in RCM, the terms "Root Cause", "Failure Mode", "Failure
Mechanism", "Failure Reason", etc. are all equivalent and represented by the
term "Failure Mode". It is an event in the causality chain that leads to the
failed state. The link in the causality chain selected as the Failure Mode is
the one that the organization can manage effectively and practically by
whichever means (proactive, detective, or redesign).
RCM Turbo: Root cause: related to Failure Mode. Same definition. That is,
Root Cause in RCM Turbo is equivalent to Failure Mode in RCM.
RCM: Failure Effects: text answering the following:
- What sequence of events (considering a TWCS [112] in the component, in the
  system, organization wide, and in the external world) could be touched off
  by the failure mode?
- How does the failure make itself known? What observable events lead up to
  the failure?
- How is safety or the environment impacted? (without mentioning the words
  "safety" or "environment")
- How is production impacted? (quality, cost, customer service)
- Is there any additional damage caused by the failure?
- How long will it take and what actions must be accomplished to correct the
  failure?
- How does the likelihood of this failure depend on deeper causes? Has it
  happened before? How often? Under what circumstances?
RCM Turbo: Same definition, but it is structurally embedded in the Failure
Mode & Effect field. In addition, the following Failure Mode fields (with
sample data) contribute to the Effects narrative:
- Unit Output Reduction: Total stoppage
- PU Downtime Cost: $11,390 / hour
- MI Downtime Cost: $11,390 / hour
- F/mode&Effects: Shaft failure - Chemical corrosion, overtorque, indicated by
  cracks, increase in vibration leading to shutdown of Brownstock washer
- Characteristic: Definitive life / wear out characteristics
- Measurability: Moderately easy to monitor
- Category: Normal Operation
- Typical Warn Time: 4 Weeks
- Root cause: Normal wear & tear
- MTBF: 5 years
- Consequence: Total stoppage
- Strategy: CBM
RCM: Hidden Function: a function whose failure will not be detected under
normal circumstances. Identified by RCM during functional analysis when
examining each component (from schematics, P&IDs, photographs, and physical
walkaround) and listing the functions they suggest. Code phrases (such as
"able to", "in the presence of", etc.) are used to point out that a function
is hidden or protected by a hidden function. Subsequent questions address the
hidden function. The hidden consequence supplants all other possible failure
consequences in the RCM logic for determining a mitigating task.
RCM Turbo: Hidden Failure Mode: same meaning. Consists of the fields:
Component, Failure Mode & Effects, Task Description, Frequency, Duration,
Initiate Date, Job Group ID, Service Period, No. of Units in Service, No. of
failures, and MTBF of the protective device (calculated).
RCM: RCM records this information in the free text answer to question 4,
"Failure Effects".
RCM Turbo: MTBF: related to the Failure Mode.
RCM: RCM records this information in the answer to question 6, "Tasks", when
following one of the four branches (H, S, O, N) in the RCM decision logic
tree.
RCM Turbo: Strategy: related to Failure Mode. Takes one of three possible
values: 1) fixed time maintenance, 2) condition based maintenance, or
3) operate to failure.
RCM: Same definition. RCM records this information in the free text answer to
question 4, "Failure Effects".
RCM Turbo: P-F Interval: related to Failure Mode. Estimated interval (measured
in working age units) between the appearance of a potential failure and a
functional failure.
RCM: Potential failure: an indicator that a failure mode has initiated.
RCM Turbo: S/A (secondary action) Indicator: same meaning.
RCM: No equivalent concept in RCM. If a failure mode is due to design,
lubrication, overload, or maintenance practices, each would constitute a
separate failure mode, and this information would be included in the failure
mode description itself. The word "Safety" or "Environment" is not mentioned
until the consequence phase of the RCM logic diagram.
RCM Turbo: Category: related to Failure Mode. Takes one of six possible
values: 1) Design, 2) Lubrication, 3) Normal Operation, 4) Overload Condition,
5) Maintenance practices, or 6) Safety.
RCM: RCM records this information in the free text answer to question 4,
"Failure Effects".
RCM Turbo: Characteristic: related to Failure Mode. Takes one of three
possible values: 1) Definitive life/wearout, 2) General degradation, and
3) Random.
RCM: Consequences: Question 5. Takes one of four possible values: 1) Hidden,
2) Safety/Environmental, 3) Operational, and 4) Non-operational. RCM records
RCM Turbo's "Consequence" in the free text answer to question 4, "Failure
Effects", and in the third or fourth option of question 5, "Consequences".
RCM Turbo: Consequence: related to Failure Mode. Takes one of four possible
values: 1) Total stoppage, 2) Partial stoppage/quality, 3) No immediate
effect, or 4) No effect.
RCM: RCM records this information in the free text answer to question 4,
"Failure Effects", and in the answer to question 6, "Tasks". Question 6 asks
whether there is an applicable CBM task. Once a (CBM or other) task is found
to be applicable (practical), RCM then asks whether it will be effective. That
is, will it sufficiently reduce or entirely avoid the consequences of failure
at acceptable cost?
RCM Turbo: Measurability: related to Failure Mode. Takes one of three possible
values: 1) Easy, 2) Moderate, or 3) Impossible.
RCM: Redesign: RCM records this information in the free text answer to
question 7, "Default Tasks". Differs from RCM Turbo only in the sequence in
which this question appears (i.e. following a determination that no proactive
or failure-finding task adequately mitigates the consequences of the failure).
RCM Turbo: Design Notes: related to the Failure Mode. Records the decision or
recommendation to design out the failure mode. (Strictly speaking, it is
presented out of RCM sequence.)
RCM: RCM provides no specific field for this information, leaving its
provision up to the implementer or commercial packager.
RCM Turbo: Strategy Notes: related to Failure Mode. A free text field used to
store comments or notes on the chosen maintenance strategy. Useful where a
second or alternative strategy has been considered and rejected.
RCM: RCM records this information in the free text answer to question 4,
"Failure Effects", but in far less detail.
RCM Turbo: Breakdown Action: related to Failure Mode. Describes what must be
done to repair the functional failure. Also has the specific fields: Work
Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to
six steps), Assumptions, Materials, Spares.
RCM: RCM records this information in the free text answer to question 6,
"Tasks", but in far less detail.
RCM Turbo: Primary Action: related to the Failure Mode. Describes what should
be done to prevent the failure mode. Also has the specific fields: Work Order
No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six
steps), Assumptions, Materials, Spares.
RCM: RCM records this information in the free text answer to question 6,
"Tasks", but in far less detail.
RCM Turbo: Secondary Action: related to Failure Mode. Describes what must be
done following the detection of a potential failure. Also has the specific
fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator,
Resources (up to six steps), Assumptions, Materials, Spares.
RCM: RCM records this information in the free text answer to question 4,
"Failure Effects", but in far less detail.
RCM Turbo: Overhaul Action: related to Failure Mode. Records overhaul
maintenance actions, for example where the Secondary Action was the change-out
of a rotable item which itself requires subsequent overhaul. Also has the
specific fields: Work Order No., SOP, Duration, Downtime, MI Status, O/H
Venue, S/A Initiator, Resources (for up to six steps), Assumptions, Materials,
Spares.
RCM: Not called a library. However, the records are equally accessible
(structured as answers to the seven questions) in the RCM worksheets
comprising the global RCM table. No corporate harmonizing process need be
applied because every record is a one-off development. However, tools,
training, supervision and support are required to validate and maintain the
knowledge base. Templating of an entire item is, nonetheless, possible by
copying any or all records of an item after carefully comparing their
respective operating context descriptions.
RCM Turbo: Failure Data Library: a table of 3-part failure modes referenced by
Machine Type. An administration process is used to control the quality of data
from multiple sites and harmonize it for the purpose of providing templates,
where applicable, in future analyses of other MIs or PUs. The ease of
templating justifies the appellation "Streamlined" in the case of RCM Turbo.
We may conclude from Table 13-1 that, although RCM Turbo refers to
itself as a "streamlined" process and some of its terminology differs
from that of RCM, it does not omit any vital knowledge element specified
by the SAE RCM minimum criteria standard. RCM Turbo does deviate
from the sequence stipulated in the standard. In practice, however, as
pointed out in Chapter 10 (page 110), RCM is not a strictly sequential
process. RCM analysts anticipate the answers to subsequent questions while
working on the current question. Furthermore, the RCM process is iterative.
That is, the analysts often return to a previous answer and adjust it in the
light of revelations further on in the process. The iterative and
non-sequential nature of the RCM process tends to render the differences
between the two approaches less important.
The terminology comparisons of Table 13-1 show that RCM Turbo
extends the information elements of RCM into greater structural detail.
Such data structuring facilitates the post-RCM processes (included in the
RCM Turbo software package) of workload smoothing, frequency
calculations, and CMMS integration, as well as integration with an
optional spares optimization package.
Figure 13-1 of Example 1 shows how the RCM Worksheet of Chapter 10
(Figure 10-2, page 111) may be combined with the extended data fields of
RCM Turbo.
Example 1
PU Code: Repulper, MI Code: Repulper screw
Function Statement: To feed material 24 hours/day
Failure: Does not feed at all
Failure mode: Shaft fails
Effects:
- Unit Output Reduction: Total stoppage
- PU Downtime Cost: $11,390 / hour
- MI Downtime Cost: $11,390 / hour
- F/mode&Effects: Shaft failure - Chemical corrosion, overtorque, indicated by
  cracks, increase in vibration leading to shutdown of Brownstock washer
- Characteristic: Definitive life / wear out characteristics
- Measurability: Moderately easy to monitor
- Category: Normal Operation
- Typical Warn Time: 4 Weeks
- Root cause: Normal wear & tear
- MTBF: 5 years
- Consequence: Total stoppage
- Strategy: CBM
Figure 13-1 RCM Worksheet applied to a RCM Turbo example
In the RCM worksheet of Figure 13-1 we note that most of the
RCM Turbo failure mode fields (in bold) fall quite readily into
the RCM Effects column, with the possible exception of the field
"Strategy". The latter appears to preempt the RCM decision logic
of question 6. We view this, nonetheless, as an insignificant
departure from RCM, given that RCM analysts consider the
mitigating task in the normal course of describing the effects of
failure. It is essential that the RCM consequences (H, S, O, or N)
be determined and the complete decision logic of RCM (Figure
12-1 on page 143) be applied immediately following this RCM
Turbo step.
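The combined worksheet of Figure 13-1 can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class names, the `effects` dictionary, and the helper `modes_awaiting_consequence` are hypothetical, and neither the RCM Worksheet nor RCM Turbo is known to be implemented this way. It uses the one-to-many relationships described in Table 13-1, holds a subset of the RCM Turbo fields of Example 1 in the Effects column, and flags failure modes whose RCM consequence (one of the four branches H, S, O, N) has yet to be determined.

```python
from dataclasses import dataclass, field

RCM_CONSEQUENCE_CODES = ("H", "S", "O", "N")  # the four decision-tree branches


@dataclass
class FailureMode:
    description: str
    effects: dict = field(default_factory=dict)  # RCM Turbo fields in the Effects column
    strategy: str = ""          # RCM Turbo field that anticipates question 6
    consequence_code: str = ""  # RCM consequence, set by the decision logic


@dataclass
class FunctionalFailure:
    description: str
    failure_modes: list = field(default_factory=list)  # one failure, many modes


@dataclass
class Function:
    statement: str
    failures: list = field(default_factory=list)  # one function, many failures


def modes_awaiting_consequence(function):
    """List failure modes whose RCM consequence has not yet been determined."""
    return [
        mode.description
        for failure in function.failures
        for mode in failure.failure_modes
        if mode.consequence_code not in RCM_CONSEQUENCE_CODES
    ]


# Sample data taken from Example 1 (a subset of the fields).
shaft_fails = FailureMode(
    description="Shaft fails",
    effects={
        "Unit Output Reduction": "Total stoppage",
        "PU Downtime Cost": "$11,390 / hour",
        "Typical Warn Time": "4 Weeks",
        "MTBF": "5 years",
        "Consequence": "Total stoppage",
    },
    strategy="CBM",
)
feed = Function(
    statement="To feed material 24 hours/day",
    failures=[FunctionalFailure("Does not feed at all", [shaft_fails])],
)
print(modes_awaiting_consequence(feed))
```

Keeping `strategy` outside the `effects` dictionary mirrors the observation above that this field belongs to question 6 rather than to the effects narrative.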
RCM Turbo facilitates data entry with a convenient Visual Basic
MS Excel form illustrated in Figure 13-2.
Figure 13-2 MS Excel failure mode entry form in RCM Turbo
RCM Turbo will then perform a primary (i.e. a CBM) task
frequency calculation and display the result: 14 days (i.e. half
the warning interval) is the recommended task interval.
RCM Turbo has blended risk and cost (in much the same way as
described in Chapter 9, page 78) to estimate an optimal CBM
inspection frequency. It performs analogous calculations for time-
based and failure-finding tasks. The complete set of RCM Turbo's
data fields is given in Appendix 12 on page 195.
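The arithmetic behind the 14-day figure can be sketched as follows. This is only the "half the warning interval" relationship applied to the 4-week Typical Warn Time of Example 1; it is not RCM Turbo's actual calculation, which blends risk and cost. The function name `cbm_interval_days` and the `fraction` parameter are hypothetical.

```python
def cbm_interval_days(warn_time_weeks, fraction=0.5):
    """Inspection interval in days as a fraction of the warning interval.

    Assumption: the interval is a fixed fraction (default one half) of the
    typical warning time. Real tools may weigh risk and cost as well.
    """
    if warn_time_weeks <= 0:
        raise ValueError("warning time must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return warn_time_weeks * 7 * fraction


print(cbm_interval_days(4))  # 4 weeks = 28 days, and half of 28 is 14.0
```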
Conclusions
1. Table 13-1 illustrates that streamlined RCM (as it is embodied in
RCM Turbo) is not "streamlined" in the sense of being abridged or
reduced. Rather, it encompasses the principles of RCM, adding features
that address CMMS integration, quantitative reliability assessment and
task frequency calculations, spares, workload scheduling and balancing,
and other considerations.
2. RCM Turbo does address the 7 RCM questions, although not in
the sequence stipulated by the RCM standard.
3. The RCM Turbo software expands the 7 information elements of
RCM into multiple database fields. For example, MTBF, P-F Interval,
Repair time, etc. are all explicit fields related to a Failure Mode.
4. Combining the RCM Worksheet with RCM Turbo can reduce the
workload in RCM Turbo. For example, if a failure mode has Safety
Consequences, there is no need to fill in the Turbo field
"Consequence".
5. In this author's view (which some proponents of both camps may
challenge), an RCM Worksheet along the lines described in Figure 10-2 on
page 111 provides excellent team focus regardless of the methodology
adopted. If populated (perhaps adapted as in Figure 13-1, page 162) with
RCM Turbo's needs (see Turbo's data sheet) in mind, the worksheet
will benefit both streamlined and original RCM users.
6. Both RCM and RCM Turbo demand that the persons (primarily
maintainers and operators) directly impacted by the decisions flowing
from either process participate fully in the process. Indeed, they must
drive it. External consultants can only teach the principles and techniques
of RCM. The organization must select its analysts from among its most
experienced and competent operators and maintainers. It must choose a
facilitator who will learn the process fluently, and who will elicit and
faithfully record the technical knowledge of the analysts.
References:
1. RCM Turbo Maintenance Plan Development System Quick
Reference Guide
2. RCM Turbo V9.2 User Guide
3. RCM Turbo V9 desktop guide rev 2
4. RCMT92 Installation Instructions
Chapter 14. Appendices
Appendix 1.
The role of the RCM Facilitator
1. Administration 2. Animation 3. Clarity 4. Time Management
5. Focus
The quality and success of the RCM analysis will depend
on how well the facilitator has mastered and exercises the
skills outlined in Table 14-1: RCM facilitator's checklist.
The facilitator's skill and vigilance will prevent the analysis
from being dangerously superficial or, conversely, from
becoming bogged down and stalled in unnecessary detail.
The novice facilitator should refer often to this scorecard
throughout the RCM project, and continually self-evaluate
his or her performance (initially under the watchful eye of
an experienced RCM practitioner) with respect to each of
the items in Table 14-1.
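One way a facilitator might tally such a self-evaluation is sketched below. The source prescribes no tallying method, so averaging the 1-to-5 scores within each of the five skill areas is purely an assumption, and the function name `summarize_scores` and the sample scores are hypothetical.

```python
from statistics import mean


def summarize_scores(scores):
    """Return (per-category averages, lowest-scoring category).

    `scores` maps a skill area to the list of 1-to-5 scores circled for
    its checklist items.
    """
    averages = {}
    for category, values in scores.items():
        if not values:
            raise ValueError(f"{category}: no scores recorded")
        if any(v not in (1, 2, 3, 4, 5) for v in values):
            raise ValueError(f"{category}: scores must be integers from 1 to 5")
        averages[category] = mean(values)
    weakest = min(averages, key=averages.get)
    return averages, weakest


# Invented sample scores, for illustration only.
sample_scores = {
    "Administration": [4, 3, 5],
    "Animation": [2, 3],
    "Clarity": [5],
    "Time Management": [3, 3],
    "Focus": [4, 4],
}
averages, weakest = summarize_scores(sample_scores)
print(weakest, averages[weakest])
```

The lowest-scoring area suggests where the novice facilitator might first seek guidance from the experienced practitioner.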
Table 14-1: RCM facilitator's checklist
1.0 Administration Score
Shortly after the RCM analysis has been completed, assemble the
worksheets and supporting documentation (drawings, photographs) into
a coherent, readable dossier for review and authorization by a
designated auditor.
1 2 3 4 5
In the planning phase, before an RCM analysis begins, ensure that
potentially useful documentation (drawings, schematics, etc.) is readily
accessible to the team. Discuss the general RCM objectives beforehand
with resource people outside the team, so they may respond quickly
when called upon to provide clarification or information in
the course of the analysis.
1 2 3 4 5
Assist in the selection of the appropriately skilled RCM team members. 1 2 3 4 5
Assist in the initial decomposition of the asset/plant into manageable
significant items for individual RCM analyses. Position each item's
boundaries so that it can be analyzed in 6 to 14 three-hour sessions. Ensure
that an item has not been defined at too low a level of indenture, where
failure modes would be difficult to relate to the failure of the equipment
as a whole. During the analysis, decide how the failure modes of a
subsystem should be handled: whether to 1) break out the subsystem
for more convenient, separate analysis later, or 2) consider each of the
subsystem's failure modes as part of the main analysis, or 3) consider
the subsystem's failure modes as a single failure mode, or 4) consider
(as part of the main analysis) each of the subsystem's dominant failure
mode(s) singly and the other failure modes lumped under the title
"others".
1 2 3 4 5
Assist in the development of the item's operating context. 1 2 3 4 5
Assist in the scheduling of the RCM sessions 1 2 3 4 5
Report regularly on progress to the RCM sponsor. Call upon him or her
for help in resolving technical, organizational, or human issues as they
arise.
1 2 3 4 5
Assist in the preparation of the presentation (by a team member) to
management at the end of the analysis
1 2 3 4 5
Provide team members access to the evolving RCM worksheet as the
analysis unfolds from session to session.
1 2 3 4 5
2.0 Animation Score
Recognize and be sensitive to each personality type. Help each team
member contribute fully to the RCM process by using one or more of
these techniques: Gently discourage the extrovert from monopolizing
the floor by (following a tirade) putting a question to another team
member ("George, what do you think about that?"). Encourage the
introvert by asking him or her questions and by assigning short research
tasks between sessions on unclear issues (calling a vendor, checking a
log sheet, etc.). Ask him or her to report on the findings at the
beginning of the next meeting. Be careful not to harass him or her.
1 2 3 4 5
Recognize when true consensus is achieved. Never permit a vote. Keep
in mind that a lone dissenter may be right. Record his or her position and
ask him or her to agree to disagree until further elucidating information
comes along.
1 2 3 4 5
Sustain the morale of the group by summarizing progress at the
beginning of each session, and by always being positive about the
process. Express praise and gratitude when someone makes a
noteworthy contribution to the analysis.
1 2 3 4 5
At the beginning of the first session of the RCM analysis, help the team
set and agree upon the ground rules (smoking, punctuality, etc)
1 2 3 4 5
Recognize when the team simply "does not know" (about some aspect
of the asset) by being alert to statements beginning with "I think ..." or "I
believe ...". Assign short research tasks to team members to find out.
1 2 3 4 5
Remind participants of the objectives and importance of the analysis and
that they have been chosen to participate because of their knowledge
and experience.
1 2 3 4 5
With an inexperienced team, be alert to misunderstandings of the process
and the meanings of questions. Use timeouts to clarify points of RCM
procedure when required. Common misunderstandings are a) mixing up
failed states and failure modes, b) mixing up average life (MTBF), useful
life, Bn life, etc., c) distinguishing potential failure from functional
failure, and d) recognizing the difference between a failure-finding task and
an on-condition task.
1 2 3 4 5
Be alert to the team answering the wrong question. This could occur at any
time throughout the RCM process. An example is the raising of an
operational consequence when the process has moved on to the safety
and environmental branch of the decision diagram.
1 2 3 4 5
Safeguard the self-esteem of each team member. Persons considered
knowledgeable may fear a loss of face. Under all circumstances
emphasize (in timeouts and anecdotes) that RCM is, above all else, a
learning forum to bridge the discontinuities in the knowledge of
individuals by gaining advantage from the collective perspectives of the
team.
1 2 3 4 5
3.0 Clarity Score
Input the answers to the RCM questions into the RCM worksheet. 1 2 3 4 5
While entering the answers, retain team members' wording as much as
possible. Occasionally, when necessary, suggest ways of expressing the
answers more succinctly in written form. Revise and correct the text
outside the meeting without altering what was said and meant during the
session. When in doubt, obtain approval from the team for extensive
word-smithing. Avoid jargon. Ensure that the technical terms used on
the worksheet will be understood by everyone on the site.
1 2 3 4 5
4.0 Time Management Score
Following an RCM decision to modify an asset or operating procedure,
resist the enticement to redesign the asset (or operating procedure)
during the RCM meeting. Allow the team to go only so far as to
elaborate the redesign requirement. Do NOT embark on a design
brainstorming process at this time.
1 2 3 4 5
Remind the team of the time allotted to the current analysis and the rate
of progress necessary to attain that goal.
1 2 3 4 5
Keep the pace of analysis (all 7 steps) at an average rate of 6 failure
modes per hour.
1 2 3 4 5
Indicate that about 1/3 of the time will be dedicated to defining the
functions, 1/3 on failures, modes, and effects (FMEA), and 1/3 on
consequences, decisions, and task definition and assignment.
1 2 3 4 5
5.0 Focus on the process Score
Ask the RCM questions. Never answer them. (If the team may have
made a technical error or omission, rephrase the questions to probe in a
particular direction, or ask that a particular point be checked between
sessions.)
1 2 3 4 5
Call a timeout when necessary to explain pertinent points of the RCM process. 1 2 3 4 5
Elaborate the asset's operating context at the beginning of the analysis.
Keep it in the team's mind throughout the analysis.
1 2 3 4 5
Ensure that the 7 RCM questions are asked completely, in the manner
and the order prescribed by SAE JA1011 [113].
1 2 3 4 5
Resist the tendency to skip questions, or parts of questions by taking
their answers for granted. In particular ask, explicitly, each question
(page 143) along the appropriate logic branch of the decision diagram.
The RCM process must be performed rigorously. In spite of the
repetitious nature of the process do not abbreviate the questions so much
that their meaning is lost or distorted.
1 2 3 4 5
Pay strict attention to the following issues with respect to each of the
SAE JA1011 RCM questions (5.1 to 5.7)...
1 2 3 4 5
5.1 What are the functions and associated desired standards of
performance of the asset in its present operating context
(functions)?
Ask the team to uncover the primary functions, the secondary functions,
including all hidden functions. Afterwards invoke the PEACHES
mnemonic to double check that all functions have been listed.
1 2 3 4 5
Direct the team to include as many quantitative performance
requirements as practical in each function statement to fully describe the
user's (owner's, society's) objectives for the asset. The function statement
usually begins with "To" or "Not to". Avoid the use of "and"
between two verbs.
1 2 3 4 5
Simplify the function list by deciding when certain functions may be
more conveniently included as a failure mode of another functional
failure. For example, the function "Not to trip when the liquid level is
below 100 hectoliters" preferably should be included as the failure mode
"pump trips due to grounded electrical contact" of the primary function
"To pump x liters ... "
1 2 3 4 5
Have the team use code phrases to imply a hidden function (e.g. "to be
capable of", "to be able to", "to heat to 140C in the presence of a standby
heater").
1 2 3 4 5
5.2 In what ways can it fail to fulfill its functions (functional failures)?
Ensure that each quantitative performance requirement within an
individual function statement is addressed. Separate partial and total loss
with respect to each requirement.
1 2 3 4 5
5.3 What causes each functional failure (failure modes)?
Pay close attention to the number of failure modes to be included
and to their depth of causality. The list should be tempered by the
reasonable likelihood of occurrence and by the gravity of the
consequences (always keeping the operating context in mind). More
serious consequences would tend to lengthen the list of failure modes to
be addressed. The depth of causality (the number of times to ask "why") at
which to specify a failure mode is likewise operating context sensitive. The
depth should be that at which the organization can do something about
the failure or its consequences.
1 2 3 4 5
5.4 What happens when each failure occurs (failure effects)?
Extract from the team the sequence of events (internally and
organization-wide) that could be touched off by the failure mode. Also
describe:
How does the failure make itself known?
How is safety or the environment impacted? (without mentioning the
words "safety" or "environment")
How is production impacted? (quality, cost, customer service)
Is there any additional damage caused by the failure?
How long will it take and what actions must be accomplished to correct
the failure?
How does the likelihood of this failure depend on deeper causes? Has
it happened before? Under what circumstances?
1 2 3 4 5
5.5 In what way does each failure matter (failure consequences)?
Carefully examine the failure effects as elaborated in 5.4 above and
select one of the four possible consequences.
1 2 3 4 5
5.6 What should be done to predict or prevent each failure (proactive
tasks and task intervals)?
For CBM tasks, explore alternative technologies, and expose the true
costs of the proposed program. For all proactive tasks consider the
long-run costs of the task and those of the failure consequences it is
designed to reduce or prevent.
1 2 3 4 5
Set the proactive task intervals. For CBM, estimate the P-F interval by
consensus, or apply a risk-based non-deterministic approach such as
EXAKT when the failure mechanism is not clearly understood. For
TBM, estimate the useful life with respect to the failure mode in question.
1 2 3 4 5
5.7 What should be done if a suitable proactive task cannot be found
(default actions)?
The three possible default actions: run-to-failure, failure detection, and
redesign must be considered when so directed by the decision diagram.
For hidden failures, the detection interval must consider the acceptable
level of risk of a multiple failure.
1 2 3 4 5
Ensure that the group has considered all practical aspects of the task that
has been selected. The task descriptions must contain the necessary
detail to ensure that no misunderstanding is possible when it is
transcribed into the maintenance system.
1 2 3 4 5
Appendix 2.
Sizing the analysis
The RCM facilitator, at the outset, makes a most important
decision: defining the boundaries of the item being analyzed.
RCM can be applied at almost any level of the asset hierarchy.
However, Figure 14-1 implies that there are
compromises that we must weigh when selecting the level at which to
define our item. The advantage of a high level is that the item's
functions and functional failures are more clearly related to the
performance requirements of the equipment as a whole, a
desirable characteristic.
Figure 14-1
Time is the facilitator's prime consideration. The more failure
modes that need to be considered, the longer the analysis will take.
Experience tells us that we should size the item so that it may be
analyzed in 5 to 15 three-hour sessions. A well-run analysis
averages 6 failure modes per hour. Hence a small analysis would
contain about 90 failure modes (5 x 3 x 6), while a large one would
analyze about 270 (15 x 3 x 6). These figures make it apparent that the
facilitator must carefully control the process, lest it founder by not
completing the target item analysis in the allotted time. This could
jeopardize the entire RCM initiative.
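The sizing arithmetic can be checked with a few lines of Python. This is only a sketch: the three-hour session length and the rate of 6 failure modes per hour are the rules of thumb quoted above, and the function names are hypothetical.

```python
import math

HOURS_PER_SESSION = 3        # session length quoted in the text
FAILURE_MODES_PER_HOUR = 6   # average pace of a well-run analysis (from the text)


def failure_modes_covered(sessions):
    """Number of failure modes a team can expect to analyze in the given sessions."""
    return sessions * HOURS_PER_SESSION * FAILURE_MODES_PER_HOUR


def sessions_needed(failure_modes):
    """Whole sessions required to cover an estimated number of failure modes."""
    per_session = HOURS_PER_SESSION * FAILURE_MODES_PER_HOUR
    return math.ceil(failure_modes / per_session)


if __name__ == "__main__":
    for sessions in (5, 15):
        print(sessions, "sessions ->", failure_modes_covered(sessions), "failure modes")
    print("100 failure modes ->", sessions_needed(100), "sessions")
```

If the estimated number of sessions falls well outside the 5 to 15 range, that suggests redrawing the item boundary at a lower or higher level of the asset hierarchy.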
Selecting the significant items
Figure 14-2: Selecting the significant items for analysis
Figure 14-2 depicts the initial significant item selection process.
The criteria of significance and hiddenness dictate which items
need to be analyzed within the RCM project. Prioritization of the
analyses lies outside the scope of RCM because it varies according
to industry. Variants of RCM (such as Turbo RCM, see Chapter 13,
page 156) provide structured priority systems. Whatever priority
sequence has been chosen, the analyses are scheduled and team
members assigned, taking into account operational and personnel
constraints. The schedule provides a concrete set of objectives and
milestones for the RCM project.
Appendix 3.
Failure finding intervals for complex items (multiple failure
modes and devices)
Failure finding interval for devices with more than one failure
mode.
\[
I_{ff} = \frac{2\,M_{pf}}{M_{mf}\left(\dfrac{1}{M_{sd1}} + \dfrac{1}{M_{sd2}} + \dfrac{1}{M_{sd3}}\right)}
\]
where:
$I_{ff}$ = failure finding interval
$M_{pf}$ = reliability (mean time between failures) of the protected function
$M_{mf}$ = tolerable mean time between multiple failures
$M_{sd1}$ = mean time between failures due to failure mode 1 of the safety device
$M_{sd2}$ = mean time between failures due to failure mode 2 of the safety device
$M_{sd3}$ = mean time between failures due to failure mode 3 of the safety device
Failure finding interval for redundant devices (based on the
linear approximation).
\[
I_{ff} = M_{sd}\left[\frac{(n+1)\,M_{pf}}{M_{mf}}\right]^{\frac{1}{n}}
\]
where:
n = number of redundant devices of the same kind.
Failure finding interval for voting systems.
\[
I_{ff} = M_{sd}\left[\frac{(r+1)!\,(n-r)!\,M_{pf}}{n!\,M_{mf}}\right]^{\frac{1}{r}}
\]
Voting systems are usually called "k out of n" systems,
where:
n = number of sensors in parallel
k = number of sensors needed to activate the safety action
r = number of sensors which must be failed for the safety
system to fail
so: r = n - k + 1
Optimal failure finding interval for parallel redundant devices
where only cost is a factor
\[
I_{off} = \left[\frac{(n+1)\,M_{sd}^{\,n}\,M_{pd}\,C_{ff}}{n\,C_{mf}}\right]^{\frac{1}{n+1}}
\]
where:
$C_{mf}$ = average cost of a multiple failure
n = number of redundant safety devices of the same kind.
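As a rough numerical check, the first three interval formulas of this appendix can be sketched in Python. The sketch assumes the formulas read as follows: I_ff = 2 M_pf / (M_mf (1/M_sd1 + 1/M_sd2 + ...)) for several failure modes, I_ff = M_sd ((n+1) M_pf / M_mf)^(1/n) for redundant devices, and the voting formula written with the binomial coefficient C(n, r). The function names and the sample values are hypothetical; all mean times must be in the same unit.

```python
from math import comb


def ffi_multi_mode(m_pf, m_mf, m_sd_modes):
    """Failure finding interval for one device with several failure modes."""
    combined_rate = sum(1.0 / m for m in m_sd_modes)  # failure rates of the modes add up
    return 2.0 * m_pf / (m_mf * combined_rate)


def ffi_redundant(m_sd, m_pf, m_mf, n):
    """Failure finding interval for n redundant devices (linear approximation)."""
    return m_sd * ((n + 1) * m_pf / m_mf) ** (1.0 / n)


def ffi_voting(m_sd, m_pf, m_mf, n, k):
    """Failure finding interval for a k-out-of-n voting system."""
    r = n - k + 1  # number of failed sensors that defeats the safety system
    return m_sd * ((r + 1) * m_pf / (comb(n, r) * m_mf)) ** (1.0 / r)


if __name__ == "__main__":
    # Hypothetical values, all in years.
    print(ffi_multi_mode(m_pf=10, m_mf=1000, m_sd_modes=[20, 40, 40]))
    print(ffi_redundant(m_sd=20, m_pf=10, m_mf=1000, n=2))
    print(ffi_voting(m_sd=20, m_pf=10, m_mf=1000, n=3, k=2))
```

Under this reading the three formulas agree in their common special cases: with a single failure mode and n = 1 the first two coincide, and a voting system with r = n reduces to the redundant-device formula.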
Appendix 4.
Truck description
Appendix 5.
Terminology used:
Appendix 6.
Relationship between hazard, reliability, and density
functions
Appendix 8.
Inherent reliability characteristics
Inherent reliability characteristic | Impact on PM applicability and effectiveness
Failure consequences | Determine the significance of items for scheduled maintenance; establish the definition of task effectiveness; determine default strategy when no applicable and effective PM task can be found
Visibility of functional failure to operating crew under normal circumstances | Determines the need for a failure-finding task to ensure that failure is detected
Ability to measure/detect reduced resistance to failure | Determines applicability of on-condition tasks
Rate at which failure resistance decreases with operating age once a potential failure occurs | Determines interval for on-condition tasks
Age-reliability relationship | Determines applicability of rework and discard tasks
Age-reliability-covariate relationship | Determines the key risk factors for interpreting on-condition data
Cost of corrective maintenance | Helps establish PM task effectiveness (except for safety and environment impacting failures)
Cost of preventive maintenance | Helps establish PM task effectiveness (except for safety and environment impacting failures)
Need for safe-life limits to prevent safety or environment failures | Determines applicability and interval of safe-life discard tasks
Need for servicing and lubrication | Determines applicability and interval of servicing and lubrication tasks
Appendix 9.
Failure mode depth of causality
Each further level of indentation answers one more "Why?".
Ventilation system fails
    Fan fails
        Motor fails
            Motor trips
                Airways clogged with dirt
                Inadequate design
                Defective sensor
            Bearing seized
                Lubricant allowed to run dry
                Wrong lubricant
                    Improperly labeled
                        Stores error
                    Label misread
                        Inattention
                        Insufficient training
        Power drive fails
            Belts failed
                Incorrectly installed
                Incorrectly specified
    Distribution system fails
        Duct fails
            Duct clogged
            Duct pierced
        Damper failed
Appendix 10.
Expected failure time
Appendix 11.
Exercise (Example 2: Data validation)
Detailed Explanation Steps to follow
1
In this exercise we will examine some of the
data validation tools in EXAKT.
Download the wheelmotor oil analysis data
from
www.omdec.com/reliability/wheelmotor.zip.
2
Check for logical (chronological sequencing)
errors. Examine the Data Check report. It will
give you an overall picture of the sample, and
indicate errors such as missing beginning or
ending events.
Start EXAKT for Modeling, Maximize
EXAKT Modeling window, File, Open,
Navigate to locate the file
Mar2004CRC_WMOD, Modeling, Select
Current Model, CBM Model: PHM(no OC),
OK, Activate Left pane (Database explorer
pane), Edit, Check Database, Data, Look at
the report, Reduce and Close the Report
3
Executing the instructions on the right should
give you a screen that looks like Figure 9-16
on page 91.
A) Left pane (Database explorer pane),
Open DataCheck table, View, Inspections,
Include Events View, OK
B) Arrange windows and panes so that the
Inspections and Events window covers the
top two-thirds of the screen and the
DataCheck window the bottom third.
Spread the windows so that they span the
entire width of the screen. Spread the
Description column of the DataCheck
window so that you can see as much of it
as possible. (It could take a few tries as the
edge of the column seems to stick and
spring back, so do it slowly.) The top
window should have four panes.
4
The tables and views are all in automatic
synchronization. This makes it easy to find and
correct errors, as we shall see in subsequent
steps.
EXAKT has no way of distinguishing between
missing ending events and temporary
suspensions. Therefore you will see many
requests to "Check whether this history is
temporary suspended or "EF/ES" is missing."
The user must make sure that all such indicated
records correspond to units that are operating
currently. EXAKT will then assume that they are
indeed temporary suspensions. Otherwise the
message means that you are missing an ending
event, either an EF or an ES, and you must
manually add the missing record. If the lifetime
corresponding to the message is in fact
ongoing at the moment, then you may ignore this
message.
DataCheck Window
5
The 5th record of the DataCheck table has the
description "This record can't be properly
identified. It has the same Ident, Date, WAge,
and Event as the previous record: Id=5503R 2,
Date=..."
DataCheck window, Record 5
6
Note that the synchronized corresponding
record (819) is flagged in the Inspections
window and the Events window likewise has its
pointer positioned at record 404.
Inspections window, widen the Date
column so the full date is visible, scroll up 1
row on the scroll bar so that record 818 is
visible
7
Note that record 818 corresponds to an oil
sample taken on the same equipment on the
same day. EXAKT is suspicious about this and is
asking you to verify the dates and working ages
for these two. Maintenance planning personnel
tell us that record 819 must be an error.
Therefore we may delete it.
Delete record 819 (by selecting it and
hitting del).
8
Here is a similar type of problem. But in this
case two samples have the same working age
but different calendar dates. EXAKT is not
pleased with this situation and is asking you to
do something about it.
DataCheck window, record 6, Inspections
window, scroll up one row so that records
822 and 823 are visible.
9
In this way one goes systematically through the
database records, as indicated by the
DataCheck table, correcting the anomalies that
are pointed out by EXAKT.
Do not bother making any more corrections
for purposes of this exercise. Close the
Inspections, and DataCheck windows.
10
After following the instructions on the right you
will have reproduced Figure 9-17 on page 92.
View, Cross Graph, maximize window,
Table: Inspections, Horizontal:
WorkingAge, Vertical: SI, Condition:
Si<1000, Show
11
After following the instructions on the right you
will have reproduced Figure 9-18 on page 93.
Horizontal: Fe, Vertical Si, delete
Si<1000, Show, reduce, X
12
Examine the OutputVarScript. It uses a succinct
data query language to conveniently transform
combinations of existing covariates into new
covariates for building and testing risk models.
The "*(condition)", shown on several lines of
this program, is read "where condition is true".
The statement of interest is the next to last:
CorrSi=Si*(Si<>900)+1.2*Fe*(Si=900);
It is telling the program to return the actual
value of Si where Si<>900 and to use 1.2*Fe
where Si=900.
Database explorer pane (left pane),
OutputVarScript, X
13
After following the instructions on the right you
will have reproduced Figure 9-19 on page 93.
Modelling (on menu bar), Create Model
Input tables, Complete data, View, Cross
Graph, Table: C_Inspections, Horizontal:
Fe, Vertical CorrSI, Show, reduce, X
14
EXAKT handles events (such as oil changes,
adjustments, alignments, calibrations and other
minor maintenance) that impact condition data
in a correct manner. The instructions on the
right will display Figure 9-21 on page 95. It is
often useful to display the events and
inspections in a single table. Note the regularity
of the oil change events.
Modeling (on menu bar), Select Current
Model, CBM Model: PHM(with OC), OK,
Activate Left pane (Database explorer
pane), Modelling (on menu bar), Create
Model Input tables, Complete data,
Database pane, C_Inspections, Scroll to
record 345, reduce and close the
C_Inspections table
15
Executing the instructions on the right will
display a graph similar to that of Figure 9-22
on page 96
Modeling (on menu bar), Select Current
Model, CBM Model: PHM(noHistExcl),
Submodel: FeCorrSed, OK, Procedures
panel, Modeling, Weibull PHM, In Order of
Appearance, close the graph
16
Follow the instructions on the right and when
we scroll down to the last row, we see the
history number of the offending history in
Figure 9-22. The number is found to be 64.
Database pane, Residuals:
PHM(noHistExcl)(FeCorrSed) #1, click on
the Residual column header to order the
records by Residual, scroll down to last
row, note the History Number of 64, close
the table
17
We must identify which history of which unit is
the offending one. Following the instructions on
the right, we find that it is the 2nd
history of unit 5509R.
Procedures panel, Decisions, All Histories,
Select History 5501L[1] (That is the first
lifetime of the left wheelmotor of haul truck
5501), hit the DnArrow key 63 times, Close
We need to examine the cause of the offending
history. The instructions on the right reproduce
Figure 9-23 on page 97. From this figure, we
observe that the cause of the offending history is
the unusually high values of Fe and Si, which are not
explained by a failure event. A reasonable
way to obtain a better-fitting model is to
assume that a maintenance event was not
properly recorded and to exclude this history
from the model.
Database pane, Inspections, scroll down to
row 2768, X
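Two ideas from this exercise can be imitated outside EXAKT with a short Python sketch. The first function mimics the kind of checks described in steps 5 to 8 (a record identical to the previous one, and two samples with the same working age but different dates). The second reproduces the arithmetic of the CorrSi statement from step 12, relying on the fact that a Python comparison evaluates to 1 or 0 inside an arithmetic expression. The record layout, function names, and sample values are hypothetical, and this is not EXAKT's own implementation.

```python
def flag_suspect_records(records):
    """Flag records that look like the anomalies described in the exercise.

    records: list of (ident, date, working_age) tuples, sorted by ident and date.
    Returns a list of (index, reason) pairs.
    """
    flagged = []
    for i in range(1, len(records)):
        prev, cur = records[i - 1], records[i]
        if cur[0] != prev[0]:
            continue  # a different unit: nothing to compare
        if cur == prev:
            flagged.append((i, "duplicate of previous record"))
        elif cur[2] == prev[2] and cur[1] != prev[1]:
            flagged.append((i, "same working age, different date"))
    return flagged


def corr_si(si, fe):
    """Python equivalent of CorrSi=Si*(Si<>900)+1.2*Fe*(Si=900)."""
    # Each comparison is True/False, which counts as 1/0 in arithmetic.
    return si * (si != 900) + 1.2 * fe * (si == 900)


if __name__ == "__main__":
    sample = [
        ("5503R", "2003-01-10", 1200),
        ("5503R", "2003-01-10", 1200),   # same as the previous record
        ("5503R", "2003-02-11", 1450),
        ("5503R", "2003-03-01", 1450),   # same working age, later date
        ("5509R", "2003-03-01", 1450),
    ]
    for index, reason in flag_suspect_records(sample):
        print(index, reason)
    print(corr_si(450, 30), corr_si(900, 50))
```

As in the exercise, a flagged record is only a prompt to check with maintenance planning personnel; the script cannot decide which of two conflicting records is the erroneous one.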
Exercise 4: data smoothing and fixing the shape factor to 1
Random fluctuation of monitored condition data characterizes
many otherwise straightforward CBM applications. In this
exercise we use monitored pressure test data, which reflect the
deterioration of a sealing system in a nuclear fuel rod manipulating
mechanism. For additional background and details on this
application, you may refer to the document
www.omdec.com/articles/p_paperCandu.html.
1
Download the database files from
www.omdec.com/publications/reliabilty/candu.zip
or from the OMDEC CD.
2
Start EXAKT for Modeling, Maximize EXAKT
Modeling window, File, Open, Navigate to locate
the file candu_WMOD, Data
3
Note the randomness yet increasing
nature (generally rising slope) of the data.
Although it is obvious that the item ages
in a fairly linear fashion, how does one
make a decision at any given inspection if
the data is so erratic? How do we know if
a high reading is due to noise or to a
deteriorating failure mode? The following
steps in EXAKT provide a solution to this
problem.
Activate left (database explorer view) pane,
View, Inspections, OK, Ident drop down list, hit
various idents and observe their corresponding
sets of inspection data, reduce the inspections
window, close (X) the inspections window.
4
We wish to get rid of any randomness
that is irrelevant to risk of failure. EXAKT
provides a way to perform smoothing
transformations of the data. In the
OutputVarScript window you will see a
small program that transforms the
original variable LeakRate into the
transformed variables leakSmooth and
leakSmoothAve. EXAKT's programming
language provides several smoothing
functions. Smooth() and SmoothAve() are
smoothing functions that take parameters
to adjust the way in which they transform
the variables.
Database pane, OutputVarScript, X
(Note that we have defined 4 new variables from
the original LeakRate and WorkingAge variables:
leakSmooth0, leakSmooth, leakSmoothAve0, and
leakSmoothAve.
By studying, in the Guide and Manual, the
definitions of the various EXAKT transformation
functions such as Smooth(), SmoothAve(), Last()
and NonDecr(), you will soon come
to understand how this transformation works.)
5
The instructions on the right generate the
decision graphs of the model built directly
on the original (untransformed) data.
Observe how much randomness there is
in the inspection data. Such randomness
may bias the model and may make it
difficult to clearly apply an optimal
decision.
A) Modeling (on menu bar), Select Current
Model, CBM Model: Seals, Submodel: LR_b1, OK,
Procedures panel, Decisions, Select Ident: 5EH1,
scroll down to last row, shift+8WH4, Report,
Close, maximize the decision graph window, click
full report icon, PageDown or PageUp, reduce
the decision graph window, X
B) Modeling (on Procedures panel), Weibull PHM,
Select Covariates, (note the variable used for
this model LR_b1 is LeakRate), Cancel, Seals
(LR_b1):3, X, Seals (LR_b1):2 X
6
The model LR_Smooth0 uses a variable
that has been smoothed by the Smooth()
function in EXAKT. On the decision
graphs, we observe that we have
eliminated the randomness of the
previous submodel. But we have another
problem. We observe a drooping
artifact at the end of every history. This
causes a poor model and a poor decision
recommendation because the current
value of the condition indicator
leakSmooth0 is erroneously low! In step 7
we will correct this problem with a further
transformation.
Repeat Step 5A but select the submodel
LR_Smooth0 instead of LR_b1.
Repeat Step 5B but note that the variable used for
this model LR_Smooth0 is leakSmooth0, Cancel,
Seals (LR_b1):3, X, Seals (LR_b1):2 X
7
The adjusted smoothed variable produces
a better model and a better decision
recommendation. Note that the
randomness of the data is further reduced
and the drooping artifact has been
corrected.
Repeat Step 5A but this time use the submodel
LR_Smooth
Repeat Step 5B but this time note that the
variable used in the submodel LR_Smooth is
leakSmooth
9
Now that we have seen some techniques
for pre-processing data to eliminate
confusing noise, we may look more
closely at the model itself. You may be
wondering about the naming convention
we use for the model LR_Smooth_b1. The
"b1" part of the name indicates that we
have fixed Beta, the shape factor, to 1.
We will proceed to learn why we did this.
Activate left (explorer) pane, Modeling (on menu
bar), Select Current Model, LR_Smooth, OK
10
We note, in carrying out the steps on the
right, that this submodel LR_Smooth
uses the transformed variable leakSmooth
and that the "Fix shape factor to 1"
checkbox is unchecked.
Modeling (on Procedures panel), Weibull PHM,
Select Covariates, Cancel
11
Upon executing the steps at the right, we
note that the model is rejected by the
Kolmogorov-Smirnov test. The test is
telling us that the hypothesis that the
model is good (fits the data) must be
rejected.
Residual Analysis, Summary Report, expand and
scroll down. (note that the goodness of fit
hypothesis is rejected), reduce window, X
13
EXAKT has told us in step 8 that working
age is not significant. In fact it is highly
significant, so much so that it correlates
closely with the LeakRate. Thus EXAKT is
really telling us that the LeakRate itself
contains all the information we need, to
establish a good predictive model, and it
is telling us that we should remove the
WorkingAge factor from the model by
setting Shape to 1.
Modeling (on menu bar), Select Current Model,
LR_Smooth_b1, Modeling (on Procedures panel),
Weibull PHM, (note that the shape parameter has
been fixed to 1 for this submodel), Cancel
Residual Analysis, Summary Report, expand and
scroll down. (note that the goodness of fit
hypothesis is not rejected), reduce window, X
14
Similar results can be found for the models
LR_SmoothAve0_b1 and
LR_SmoothAve_b1. You may go ahead and
examine these models using the techniques
you have learned in this exercise.
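The two ideas of this exercise can be illustrated with a small Python sketch. The smoothing functions below are generic stand-ins (a trailing moving average and a running maximum); they are assumptions made for illustration and are not the definitions of EXAKT's Smooth(), SmoothAve() or NonDecr(), which are given in the Guide and Manual. The hazard function uses the usual Weibull proportional hazards form h(t, z) = (beta/eta) (t/eta)^(beta-1) exp(gamma z), with a single covariate z and hypothetical parameter values.

```python
import math


def moving_average(values, window):
    """Trailing moving average: each point is the mean of the last `window` readings."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def non_decreasing(values):
    """Running maximum: the series is never allowed to fall below an earlier reading."""
    result, peak = [], float("-inf")
    for v in values:
        peak = max(peak, v)
        result.append(peak)
    return result


def weibull_phm_hazard(t, z, beta, eta, gamma):
    """Weibull PHM hazard with one covariate z."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(gamma * z)


if __name__ == "__main__":
    leak_rate = [1.0, 3.0, 2.0, 6.0, 5.0]          # hypothetical noisy readings
    print(moving_average(leak_rate, window=2))
    print(non_decreasing(leak_rate))
    # With beta fixed to 1 the hazard no longer depends on working age t:
    print(weibull_phm_hazard(100, 2.0, beta=1, eta=500, gamma=0.4))
    print(weibull_phm_hazard(5000, 2.0, beta=1, eta=500, gamma=0.4))
```

The last two printed values are equal, which is the sense in which fixing the shape factor to 1 removes working age from the model and leaves the covariate to carry all the predictive information.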
Appendix 12.
Data for RCM Turbo
Table 14-4 RCM Turbo FAILURE MODE ANALYSIS DATA SHEET
Productive Unit Code:
Name:
Maintainable Item Code:
Name:
Unit Output Reduction: Total Stoppage Partial Stoppage/Quality No Immediate Effect
No Effect
FM #
Component/Part:
Failure Mode & Effect is:
Root Cause is:
MTBF = Confident Warning Time (98%) = Early Warning Time (70%) =
Life/Wear: Early Life Mid Life End Life PA Effectiveness (%) =
Category: Design Lubrication Normal Operation Overload
Condition
Review Maintenance Practice Safety/Environmental
Consequence: Total Stoppage Partial Stoppage/Quality No Immediate Effect
No Effect
Characteristic: Definitive Life/Wear Out General Degradation Random-
Constant Probability
Measurability: Easy-Can monitor in fail degrade time Moderately Easy
Impossible
Strategy: CBM FTM OTF
Primary Action Description:
Job Duration: Downtime: Secondary Action Initiator:
Maintainable Item Status: Running Stop Downday
Res: Hrs: Crew size: Res: Hrs: Crew size: Res: Hrs: Crew size:
Res: Hrs: Crew size: Res: Hrs: Crew size: Res: Hrs: Crew size:
Material Cost: Consequential Damage Cost:
Estimated Cost of Downtime (if any):
Secondary Action Description:
Job Duration: Downtime:
Maintainable Item Status: Running Stop Downday
Res: Hrs: Crew size: Res: Hrs: Crew size: Res: Hrs: Crew size:
Res: Hrs: Crew size: Res: Hrs: Crew size: Res: Hrs: Crew size:
Material Cost: Consequential Damage Cost:
Estimated Cost of Downtime (if any):
Breakdown Action Description:
Job Duration: Downtime:
Maintainable Item Status: Running Stop Downday
Res: Hrs: Crew size: Res: Hrs: Crew size: Res: Hrs: Crew size:
Res: Hrs: Crew size: Res: Hrs: Crew size: Res: Hrs: Crew size:
Material Cost: Consequential Damage Cost:
Estimated Cost of Downtime (if any):
Spares (PA, SA, O/H & BD):
Strategy Notes:
Design Notes:
Materials Notes (PA, SA, O/H & BD):
Maintenance Actions/Assumptions (PA, SA, O/H & BD):
Appendix 13.
Default decision diagram answers in the absence of
operating experience
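Table 14-5 The default answer to be used in developing an initial
scheduled-maintenance program in the absence of data from actual
operating experience.
For each decision question the table gives: the default answer to be used
in case of uncertainty; the stage at which the question can be answered
(initial program with default, ongoing program with operating data); the
possible adverse consequences of the default condition; and whether the
default consequences are eliminated with subsequent operating information.

IDENTIFICATION OF SIGNIFICANT ITEMS

Decision question: Is the item clearly nonsignificant?
Default answer: No: classify item as significant.
Stage at which question can be answered: initial program (with default) X; ongoing program (operating data) X
Possible adverse consequences of default condition: Unnecessary analysis
Default consequences eliminated with subsequent operating information: No

EVALUATION OF FAILURE CONSEQUENCES

Decision question: Is the occurrence of a failure evident to the operating crew during performance of normal duties?
Default answer: No (except for critical secondary damage): classify function as hidden.
Stage at which question can be answered: initial program (with default) X; ongoing program (operating data) X
Possible adverse consequences of default condition: Unnecessary inspections that are not cost-effective
Default consequences eliminated with subsequent operating information: Yes

Decision question: Does the failure cause a loss of function or secondary damage that could have a direct adverse effect on operating safety and the environment?
Default answer: Yes: classify consequences as critical.
Stage at which question can be answered: initial program (with default) X; ongoing program (operating data) X
Possible adverse consequences of default condition: Unnecessary redesign or scheduled maintenance that is not cost-effective
Default consequences eliminated with subsequent operating information: No for the redesign; yes for scheduled maintenance

Decision question: Does the failure have a direct adverse effect on operational capability?
Default answer: Yes: classify consequences as operational (production).
Stage at which question can be answered: initial program (with default) X; ongoing program (operating data) X
Possible adverse consequences of default condition: Scheduled maintenance that is not cost-effective
Default consequences eliminated with subsequent operating information: Yes

EVALUATION OF PROPOSED TASKS

Decision question: Is an on-condition task to detect potential failures technically feasible?
Default answer: Yes: include on-condition task in the program.
Stage at which question can be answered: initial program (with default) X; ongoing program (operating data) X
Possible adverse consequences of default condition: Scheduled maintenance that is not cost-effective
Default consequences eliminated with subsequent operating information: Yes

Decision question: If an on-condition task is technically feasible (effective), is it worthwhile?
Default answer: Yes: assign inspection intervals short enough to make the task effective.
Stage at which question can be answered: initial program (with default) X; ongoing program (operating data) X
Possible adverse consequences of default condition: Scheduled maintenance that is not cost-effective
Default consequences eliminated with subsequent operating information: Yes

Decision question: Is a rework task to reduce the failure rate applicable?
Default answer: No (unless there are real and applicable data): assign item to no scheduled maintenance.
Stage at which question can be answered: initial program (with default) --; ongoing program (operating data) X
Possible adverse consequences of default condition: Delay in exploiting opportunity to reduce costs
Default consequences eliminated with subsequent operating information: Yes

Decision question: If a rework task is applicable, is it effective?
Default answer: No (unless there are real and applicable data): assign item to no scheduled maintenance.
Stage at which question can be answered: initial program (with default) --; ongoing program (operating data) X
Possible adverse consequences of default condition: Unnecessary redesign (safety) or delay in exploiting opportunity
Default consequences eliminated with subsequent operating information: No for redesign; yes for scheduled maintenance

Decision question: Is a discard task to avoid failures or reduce the failure rate applicable?
Default answer: No (except for safe-life items): assign item to no scheduled maintenance.
Stage at which question can be answered: initial program (with default) X (safe life only); ongoing program (operating data) X (economic life)
Possible adverse consequences of default condition: Delay in exploiting opportunity to reduce costs
Default consequences eliminated with subsequent operating information: Yes

Decision question: If a discard task is applicable, is it effective?
Default answer: No (except for safe-life items): assign item to no scheduled maintenance.
Stage at which question can be answered: initial program (with default) X (safe life only); ongoing program (operating data) X (economic life)
Possible adverse consequences of default condition: Delay in exploiting opportunity to reduce costs
Default consequences eliminated with subsequent operating information: Yes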
Appendix 14.
Additional Relcode examples
Exercise 3
The cloth filter on a sugar centrifuge is currently replaced
on a preventive basis if a suitable opportunity occurs and
the cloth has been in use for at least 20 hours. The cloth is
also replaced on failure. The following data are available
for 10-hour time intervals of cloth life.
Age in Hours    Failure Replacements    Preventive Replacements
0-9.99          14                      0
10-19.99        5                       0
20-29.99        2                       4
30-39.99        1                       8
Figure 14-12: Relcode data entry for cloth filters
Figure 14-13
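Before entering the cloth filter data into Relcode, it can be instructive to tabulate it by hand. The Python sketch below builds a simple actuarial life table from the Exercise 3 figures, treating preventive replacements as suspensions. It is not Relcode's method, which this section does not describe. It also assumes that the table accounts for the whole population of cloths (34 in total), and the function name is hypothetical.

```python
def life_table(intervals):
    """Actuarial life table from grouped replacement data.

    intervals: list of (label, failures, preventive) tuples in age order.
    Returns rows of (label, at_risk, conditional_failure_prob, survival).
    """
    at_risk = sum(f + p for _, f, p in intervals)  # assumes the table covers every unit
    survival = 1.0
    rows = []
    for label, failures, preventive in intervals:
        # Units replaced preventively are counted as exposed for half the interval.
        exposed = at_risk - preventive / 2.0
        q = failures / exposed
        survival *= 1.0 - q
        rows.append((label, at_risk, q, survival))
        at_risk -= failures + preventive
    return rows


if __name__ == "__main__":
    cloth_data = [
        ("0-9.99", 14, 0),
        ("10-19.99", 5, 0),
        ("20-29.99", 2, 4),
        ("30-39.99", 1, 8),
    ]
    for label, n, q, s in life_table(cloth_data):
        print(f"{label:>9}  at risk={n:2d}  q={q:.3f}  survival={s:.3f}")
```

Comparing the conditional failure probability across the age intervals gives a first impression of whether failures are concentrated early or late in cloth life, which is the question the preventive replacement policy depends on.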
Exercise 4
A metropolitan transport company operates a fleet of
similar buses. Engine failures necessitating replacement
have occurred in the kilometer ranges shown in the
following table which also shows the number of engines
currently running in each age range.
Age Range (Kilometers)    Failure Replacements    Survivors
0-49,999                  2                       35
50,000-99,999             8                       27
100,000-149,999           33                      12
150,000-199,999           44                      62
Figure 14-14: Relcode data entry for engines
Figure 14-15
Exercise 5
A new type of car has recently been released and is subject
to warranty. An analysis of warranty claims shows several
alternator failures, although, as a proportion of the whole
population the numbers are quite small.
The available data are as follows:
Age Range (Kilometers)    Failure Replacements    Survivors
0-49,999                  1                       48
50,000-99,999             2                       123
100,000-149,999           3                       56
150,000-199,999           4                       44
Figure 14-16: Relcode data entry for alternator failure warranties
Figure 14-17
5.