
Journal of Loss Prevention in the Process Industries 16 (2003) 561–573
www.elsevier.com/locate/jlp
Risk-based maintenance (RBM): a quantitative approach for maintenance/inspection scheduling and planning
Faisal I. Khan ∗, Mahmoud M. Haddara
Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John’s, Nfld, Canada A1B 3X5
Abstract
The overall objective of the maintenance process is to increase the profitability of the operation and optimize the total life cycle
cost without compromising safety or environmental issues. Risk assessment integrates reliability with safety and environmental
issues and therefore can be used as a decision tool for preventive maintenance planning. Maintenance planning based on risk
analysis minimizes the probability of system failure and its consequences (related to safety, economic, and environment). It helps
management in making correct decisions concerning investment in maintenance or related field. This will, in turn, result in better
asset and capital utilization.
This paper presents a new methodology for risk-based maintenance. The proposed methodology is comprehensive and quantitative.
It comprises three main modules: risk estimation module, risk evaluation module, and maintenance planning module. Details of
the three modules are given. A case study, which illustrates the application of the methodology to a heating, ventilation and air-conditioning
(HVAC) system, is also discussed.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Maintenance; Risk assessment; Risk-based maintenance; Risk-based inspection; Maintenance planning
∗ Corresponding author. Tel.: +1-709-737-8939/7652; fax: +1-709-737-4042.
E-mail address: fkhan@engr.mun.ca (F.I. Khan).
0950-4230/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jlp.2003.08.011

Nomenclature
A: system performance loss factor (dimensionless)
B: financial loss factor (dimensionless)
C: human health loss factor (dimensionless)
D: environmental loss factor (dimensionless)
AR: area under the damage radius (m²)
AD: asset density in the vicinity of the event (up to ~500 m radius) ($/m²)
PDI: population density in the vicinity of the event (up to ~500 m radius) (persons/m²)
IM: importance factor, which can be derived from Fig. 4 (dimensionless)
i: number of events, fire, explosion, toxic release, etc.
t: failure time (h)
η: characteristic life of the component (scale of Weibull distribution) (h)
β: slope or shape factor of Weibull distribution
F(t): failure probability function
PDF1: population distribution factor (dimensionless)

1. Introduction

The last two decades witnessed major progress in the development of new maintenance strategies. Progress in the maintenance area has been motivated by the increase in the number, size, complexity, and variety of physical assets; growing awareness of the impact of maintenance on the environment, safety of personnel, the profitability of the business, and quality of the products.

Unexpected failures usually have adverse effects on the environment and may result in major accidents. Studies by Kletz (1994), Khan and Abbasi (1998), and Kumar (1998) show the close relationship between maintenance practices and the occurrence of major accidents. Profitability is closely related to availability and reliability of the equipment, while product quality is very much dependent on equipment condition. The major challenge for a maintenance engineer is to implement a maintenance strategy which maximizes availability and efficiency of the equipment; controls the rate of equipment deterioration; ensures a safe and environmentally friendly operation; and minimizes the total cost of the operation. This can only be achieved by adopting a structured approach to the study of equipment failure and the design of an optimum strategy for inspection and maintenance.

Maintenance management techniques have gone through a major process of metamorphosis, from focusing on periodic overhauls to the use of condition monitoring, reliability-centered maintenance, and expert systems. Most recently, risk-based maintenance methodologies started to emerge.

Chen and Toyoda (1990) proposed a strategy for maintenance scheduling based on equalizing incremental risk. The risk-based inspection and maintenance strategy developed by the American Society of Mechanical Engineers (1991) was used as a basis for developing a “base resource document on risk-based inspection” by the American Petroleum Institute, API (1995).

Work by Aller, Horowitz, Reynolds, and Weber (1995) and Reynolds (1995) constituted the basis for the development of a risk-based inspection policy for equipment owned by Brunei Shell Petroleum (Hagemeijer and Kerkveld (1998)).

A risk-based approach has been applied successfully to the maintenance of oil pipelines. Dey, Ogunlana, Gupta, and Tabucanon (1998) discussed a simple risk-based model for the maintenance of a cross-country pipeline. Nessim and Stephens (1998) proposed a quantitative risk analysis model, and recently Dey (2001) described a more general model for risk-based inspection and maintenance of cross-country pipelines.

The use of a risk-based policy in the maintenance of medical devices has been tackled by Capuano and Koritko (1996) and Ridgway (2001).

The review of the literature indicates that there is a new trend to use the level of risk as a criterion to plan maintenance tasks. However, most of the previous studies focused on a particular equipment type. It seems that there is a need for a more generalized methodology that can be applied to all types of assets irrespective of their characteristics.

There is also a need for a more realistic quantification of risk factors. The quantitative description of risk is affected by the quality of the consequence study and the accuracy of the estimates of the probability of failure. This study will focus, among other things, on these two factors. It is hoped that this study will lead to a mathematical model that can be used to develop an optimum maintenance strategy.

1.1. Concept of risk and its relevance in maintenance

One of the main objectives of a sound maintenance strategy is the minimization of hazards, both to humans and to the environment, caused by the unexpected failure of the equipment. In addition, the strategy has to be cost effective. Using a risk-based approach ensures a strategy that meets these objectives. Such an approach uses information obtained from the study of failure modes and their economic consequences.

Risk analysis is a technique for identifying, characterizing, quantifying, and evaluating the loss from an event. The risk analysis approach integrates probability and consequence analysis at various stages of the analysis and attempts to answer the following questions:

• What can go wrong that could lead to a system failure?
• How can it go wrong?
• How likely is its occurrence?
• What would be the consequences if it happens?

In this context, risk can be defined qualitatively/quantitatively as the following set of duplets for a particular failure scenario.

Risk = probability of failure × consequence of the failure

Risk assessment can be quantitative or qualitative. The output of a quantitative risk assessment will typically be a number, such as cost impact ($) per unit time. The number could be used to prioritize a series of items that have been risk assessed. Quantitative risk assessment requires a great deal of data both for the assessment of probabilities and assessment of consequences. Fault trees or decision trees are often used to determine the probability that a certain sequence of events will result in a certain consequence. Qualitative risk assessment is less rigorous and the results are often shown in the form of a simple risk matrix where one axis of the matrix represents the probability and the other represents the consequences. If a value is given to each of the probability and the consequence, a relative value for risk can be calculated. It is important to recognize that the qualitative risk value is a relative number that has little meaning outside the framework of the matrix. Within the framework of the matrix, it provides a natural prioritization of items assessed using the matrix. However, as these risk values are subjective, prioritizations based on these values are always debatable.

The proposed risk-based maintenance (RBM) strategy aims at reducing the overall risk of failure of the operating facilities. In areas of high and medium risk, a focused maintenance effort is required, whereas in areas of low risk, the effort is minimized to reduce the total scope of work and cost of the maintenance program in a structured and justifiable way. The quantitative value of the risk is used to prioritize inspection and maintenance activities. RBM suggests a set of recommendations on how many preventive tasks (including the type, means, and timing) are to be performed. The implementation of RBM will reduce the likelihood of an unexpected failure. A detailed description of the methodology is presented in subsequent sections.

2. Risk-based maintenance methodology

The risk-based maintenance methodology is broken down into three main modules, see Fig. 1:

(1) risk determination, which consists of risk identification and estimation,
(2) risk evaluation, which consists of risk aversion and risk acceptance analysis, and
(3) maintenance planning considering risk factors.

Fig. 1. Architecture of RBM methodology.

3. Module I: risk estimation

This module comprises four steps, which are logically linked as shown in Fig. 2. A detailed description of each step is presented below.

Fig. 2. Description of risk estimation module.

3.1. Step I.1. Failure scenario development

A failure scenario is a description of a series of events which may lead to a system failure. It may contain a single event or a combination of sequential events. Usually, a system failure occurs as a result of an interacting sequence of events. The expectation of a scenario does not mean it will indeed occur, but that there is a reasonable probability that it would occur. A failure scenario is the basis of the risk study; it tells us what may happen
so that we can devise ways and means of preventing or minimizing the possibility of its occurrence. Such scenarios are generated based on the operational characteristics of the system; physical conditions under which operation occurs; geometry of the system, and safety arrangements, etc. Recently, Khan (2001) has proposed a systematic procedure, maximum credible accident scenario (MCAS), to evaluate failure (accident) scenarios in a process system. The procedure introduces the concept of maximum credible scenarios as an alternative to the current methodology based on the worst-case scenario as recommended by many regulatory agencies.

The developed failure scenarios are then screened to short list the ones that are more relevant to the system at hand. MCAS provides the criteria to form this short list.

3.2. Step I.2. Consequence assessment

The objective here is to prioritize equipment and their components on the basis of their contribution to a system failure. For example, in the case of a pressure containment, a pinhole leak on a process line may not lead to a total loss of production. This is in contrast to a failure of a pipe valve which may cause a shut down of the line.

Consequence analysis involves assessment of likely consequences if a failure scenario does materialize. Initially, consequences are quantified in terms of damage radii (the radius of the area in which the damage would readily occur), damage to property (shattering of window panes, caving of buildings) and toxic effects (chronic/acute toxicity, mortality). The calculated damage radii are later used to assess the effect on human health, and environmental and production losses. Fig. 3 illustrates the procedure for this step. The assessment of consequences involves a wide variety of mathematical models. For example, source models are used to predict the rate of release of hazardous material, the degree of flashing, and the rate of evaporation. The models for explosions and fires are used to predict the characteristics of explosions and fires. The impact intensity models are used to predict the damage zones due to fires, explosion and toxic load. Lastly, toxic gas models are used to predict human response to different levels of exposures to toxic chemicals. There are many tools available to conduct this analysis such as WHAZAN, MAXCRED, RISKIT, etc. (Khan & Abbasi, 1999a). MAXCRED is one of the recent tools that is built upon the latest models of fires, explosions and toxic release and dispersion (Khan & Abbasi, 1999a). The total consequence assessment is a combination of four major categories as described below.

Fig. 3. Consequence assessment chart.

3.2.1. System performance loss

Factor A accounts for the system’s performance loss due to component/unit failure. This is estimated semi-qualitatively based on the expert’s opinion. In this work, the following procedure is suggested for determining the value of this parameter:

\[ A_i = \text{function (performance)} \tag{1} \]

Details of the function are given in Table 1.

Table 1
Quantification scheme for system performance function used in Eq. (1)
Class | Description | Function (operation)
I | Very important for system operation; failure would cause system to stop functioning | 8–10
II | Important for good operation; failure would cause impaired performance and adverse consequences | 6–8
III | Required for good operation; failure may affect the performance and may lead to subsequent failure of the system | 4–6
IV | Optional for good performance; failure may not affect the performance immediately but prolonged failure may cause system to fail | 2–4
V | Optional for operation; failure may not affect the system’s performance | 0–2

3.2.2. Financial loss

Factor B accounts for the damage to the property or assets and may be estimated for each accident scenario using the following relations:

\[ B_i = (\mathrm{AR})_i \times (\mathrm{AD})_i / \mathrm{UFL} \tag{2} \]

\[ B = \sum_{i=1}^{n} B_i \tag{3} \]

where i denotes the number of events, i.e. fire, explosion, toxic release, etc. The UFL in Eq. (2) signifies the level of an unacceptable loss. In the present study, we will use a value of 1000 for UFL. This value is subjective and may change from case to case as per an organization’s criterion.

3.2.3. Human health loss

A fatality factor is estimated for each accident scenario using the following equations:

\[ \mathrm{PD1} = \mathrm{PDI} \times \mathrm{PDF1} \tag{4} \]

\[ C_i = (\mathrm{AR})_i \times (\mathrm{PD1})_i / \mathrm{UFR} \tag{5} \]

\[ C = \sum_{i=1}^{n} C_i \tag{6} \]

where UFR denotes an unacceptable fatality rate. The suggested value for UFR is 10⁻³ (subjective value and may change from case to case). The PDF1 defines the population distribution factor, which reflects heterogeneity of the population distribution. If the population is uniformly distributed in the region of study (~500 m radius), the factor is assigned a value of 1; if the population is localized and away from the point of accident the lowest value 0.2 is assigned. Values for this parameter have been adapted from the latest work of Hirst and Carter (2000).

3.2.4. Environment and/or ecological loss

The factor D signifies damage to the ecosystem, which can be estimated as:

\[ D_i = (\mathrm{AR})_i \times (\mathrm{IM})_i / \mathrm{UDA} \tag{7} \]

\[ D = \sum_{i=1}^{n} D_i \tag{8} \]

UDA indicates a level for the unacceptable damaging area; the suggested value for this parameter is 1000 m² (subjective value and may change from case to case). IM denotes importance factor. IM is unity if the damage radius is higher than the distance between an accident and the location of the ecosystem. This parameter is quantified using Fig. 4, see Khan and Abbasi (1997).

Fig. 4. Quantification of importance factor (IM).

Finally, these four factors are combined together to yield the factor Con.

\[ \mathrm{Con} = [0.25A^2 + 0.25B^2 + 0.25C^2 + 0.25D^2]^{0.5} \tag{9} \]

3.3. Step I.3. Probabilistic failure analysis

Probabilistic failure analysis is conducted using fault tree analysis (FTA). The use of FTA, together with components’ failure data and human reliability data, enables the determination of the frequency of occurrence of an accident. Developing probabilistic fault trees is made easier using a methodology called “analytical simulation”, see Khan and Abbasi (2000).

The key features of this step are:

(1) Fault tree development: The top event is identified based on the detailed study of the process, control arrangement, and behavior of components of the unit/plant. A logical dependency between the causes leading to the top event (failure) is developed.
(2) Boolean matrix creation: The fault tree developed is transformed to a Boolean matrix. If the dimension of the Boolean matrix is too large to be handled by the available computer, a structural moduling technique may be applied (Shafaghi, 1988; Yllera, 1988). This technique proposes moduling of the fault tree into a number of smaller submodules with dependency relations among them. This reduces the memory allocation problem as well as makes the computation faster.
(3) Finding of minimum cutsets and optimization: Minimum cutsets are determined from the Boolean matrix (Greenberg & Slater, 1992). If the problem has been structurally moduled, then each module is solved independently, and the results are combined. The minimum cutsets are then optimized using an appropriate technique. Optimization is necessary in order to eliminate the unimportant paths (cutsets).
(4) Probability analysis: The optimized minimum cutsets are used to estimate probabilities. The present authors recommend the use of Monte–Carlo simulation method (Rauzy, 1993; Soon, Joo, & Myung, 1985) for this purpose. The simulation methods not
only give the probability of the top event but they also provide information on the sensitivity of the results. In addition, simulation is helpful in studying the impact of each of the initiating events. To increase the accuracy of the computations and reduce the margin of error due to inaccuracies involved in the reliability data of the basic events (initiating events), we recommend the use of a fuzzy probability set (Dubois & Prade, 1980; Lai, Shenoi, & Fan, 1988; Noma, Tankara, & Asai, 1981; Tanaka, Fan, Lai, & Toguchi, 1983). Fuzzy probability set theory is used in the analytical simulation algorithm and coded in PROFAT software (Khan & Abbasi, 1999b).
(5) Improvement index estimation: The improvement index provides a measure of the impact of each root cause on the final failure event. Improvement indices are estimated using the simulation results. To estimate the impact of a root cause, the simulation is carried out twice: with and without the cause. The “improvement index” is then obtained as a measure of the change in the probability of occurrence of the final event.

3.4. Step I.4. Risk estimation

The results of the consequence and the probabilistic failure analyses are then used to estimate the risk that may result from the failure of each unit. In the next module, we will show how the estimated risk is evaluated against the acceptance criteria.

4. Module II: risk evaluation

The objective of this module is to evaluate the estimated risk using the methodology explained above. The algorithm used is shown in Fig. 5. This evaluation algorithm comprises two steps as detailed below.

4.1. Step II.1. Setting up the acceptance criteria

In this step, we identify the specific risk acceptance criteria to be used in our study. To allow for different criteria for the acceptable level of risk depending on the system nature and type, an open-ended methodology has been used in this study. Different acceptance risk criteria are available in the literature, see ALARP (as low as reasonably practicable), Dutch acceptance criteria, and USEPA acceptance criteria (Lees, 1996).

4.2. Step II.2. Risk comparison against acceptance criteria

In this step, we apply the acceptance criteria to the estimated risk for each unit in the system. Units whose estimated risk exceeds the acceptance criteria are identified. These are the units that should have an improved maintenance plan.

5. Module III: maintenance planning

Units whose level of estimated risk exceeds the acceptance criteria are studied in detail with the objective of reducing the level of risk through a better maintenance plan. The details of this analysis are given below, see Fig. 6.

5.1. Step III.1. Estimation of optimal maintenance duration

The individual failure causes are studied to determine which one affects the probability of failure adversely. A reverse fault analysis is carried out to determine the required value of the probability of failure of the root event. A maintenance plan is then completed.

5.2. Step III.2. Re-estimation and re-evaluation of risk

The last step in this methodology aims at verifying that the maintenance plan developed produces an acceptable total risk level for the system.

6. Case study: a maintenance plan for an HVAC system

6.1. System description

Heating, ventilation and air-conditioning (HVAC) systems control the temperature, humidity, and total air quality in residential, commercial, and industrial buildings. Efficient and failure free operation of an HVAC system is critical for the safety of patients. A typical HVAC system consists of various mechanical, electrical, and electronic components such as motors, compressors, pumps, fans, ducts, pipes, thermostats, and switches. A simplified block diagram of an HVAC system is shown in Fig. 7. Maintaining an uninterrupted operation of an HVAC system requires a plan for early correction of anticipated problems. Further, planned maintenance ensures conservation, recovery, and recycling of chlorofluorocarbon (CFC) and hydrochlorofluorocarbon (HCFC) refrigerants used in systems. The release of CFCs and HCFCs contributes to the depletion of the stratospheric ozone layer, which protects plant and animal life from ultraviolet radiation.

The present case study deals with the analysis of an HVAC system (Wong, 2000) and the development of a maintenance plan to provide efficient and failure free
operation. The process flow diagram of the HVAC is shown in Fig. 7.

Fig. 5. Description of risk evaluation module.

Fig. 6. Description of maintenance planning module.

6.2. Risk estimation

6.2.1. Failure scenarios

The complete HVAC system has been divided into 10 different functional units according to their operational characteristics (Table 2). The two most probable failure scenarios have been developed for most of the units and listed in Table 2. These failure scenarios have been subjected to consequence analysis.

6.2.2. Consequence analysis

Consequence analysis has been carried out for envisaged failure scenarios for each of the 10 units. The operation of an HVAC system does not involve the processing of chemicals and the effect of its stoppage cannot be measured in terms of the lost production. Thus, the consequences related to financial and to human health loss have been ignored. The focus in this study is on the consequences related to system performance and the effect on the environment. The consequences for these two major classes are combined by applying Eq. (9). They are then normalized on a scale of 1–10 for each failure scenario. The results of consequence analysis for envisaged failure scenarios of the different units of the system are presented in Table 3. It is evident from Table 3 that the highest impact on the system performance results from three units: the air supply fan, the EP relay, and the freeze protection system.

6.2.3. Probabilistic failure analysis

An analysis to determine the failure probability distribution for each unit is listed in Table 3. For units having more than one failure scenario, the scenarios that have the maximum consequences are selected for subsequent analysis.

Memorial University facility management division has been maintaining performance data of all of the components of the HVAC system. In the present study, 5 years of data (1993–1998) have been used (Wong, 2000). These data have been used to verify various failure functions (distributions) and it has been observed that a two-parameter Weibull distribution describes them best (Eq. (10)).

\[ F(t) = 1 - \exp\left\{ -\left( \frac{t}{\eta} \right)^{\beta} \right\} \tag{10} \]

The two parameters, η (characteristic life, the Weibull scale) and β (the Weibull shape factor), are estimated for the different units and their subcomponents and are presented in Tables 4 and 5 (Wong, 2000).

Subsequently, fault trees have been developed for the envisaged failure scenarios of the different units. Fig. 8 depicts the fault tree for the whole system, while Fig. 9 shows the fault tree for the air supply fan. These fault trees are used to estimate the probability of occurrence of failure according to the different scenarios. The results of this analysis are shown in Tables 4 and 5. It is evident from Table 4 that the failure of the air supply fan, the humidifier, the EP relay system, and the damper motor are the most probable causes for the failure of the HVAC system.

6.2.4. Risk estimation

The results of the consequence and the probabilistic analyses are combined to quantify the risk factors. Tables 4 and 5 provide the values estimated for the risk factors of the different units of the HVAC system.

The risk for the HVAC system failure is estimated at 1.01 (for 1 year duration), which is far above the acceptance level of 1.0E-02.

6.3. Risk evaluation

The results in Table 4 show that in order to reduce the risk of the HVAC system failure, we need to reduce the probability of failure of the air supply fan, the humidifier, the EP relay system, and the damper motor. This will be dealt with through the use of a more effective maintenance program. For illustration purposes, stepwise detailed risk calculations for the damper motor are shown in Appendix A.
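The chain of calculations in Sections 6.2.3 to 6.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the paper uses fault trees and the PROFAT software). It evaluates Eq. (10) at one year with the Weibull parameters of Tables 4 and 5, multiplies by the consequence scores of Table 3, and compares the result with the acceptance level of 1.0E-02. Two assumptions are made here and are not stated in the text: one year is taken as 8760 operating hours, and the air supply fan is treated as an OR gate over independent component failures. All function and variable names are hypothetical.

```python
import math

# Assumption (not stated in the paper): "1 year" is 365 days of continuous operation.
HOURS_PER_YEAR = 8760.0
ACCEPTANCE_LEVEL = 1.0e-2  # acceptance level quoted in Section 6.2.4


def weibull_failure_probability(t_hours, eta, beta):
    """Eq. (10): F(t) = 1 - exp(-(t/eta)**beta)."""
    return -math.expm1(-((t_hours / eta) ** beta))


def or_gate(probabilities):
    """Top-event probability of an OR gate, assuming independent basic events."""
    survival = 1.0
    for p in probabilities:
        survival *= 1.0 - p
    return 1.0 - survival


def risk_factor(probability, consequence):
    """Risk = probability of failure x consequence of the failure."""
    return probability * consequence


# (eta in hours, beta, consequence score): parameters from Table 4, scores from Table 3
UNITS = {
    "Damper motor": (51996.6, 3.85, 6),
    "Humidification unit": (57608.6, 2.99, 5),
    "EP relay unit": (69366.5, 3.04, 8),
}

# (eta in hours, beta) for the air supply fan components, from Table 5
FAN_COMPONENTS = {
    "Fan belt": (20464.3, 2.146),
    "Vortex vanes": (68043.1, 1.638),
    "Fan bearing": (62328.5, 2.466),
    "Fan assembly": (121417.4, 2.035),
    "Fan motor": (132780.2, 1.712),
}
FAN_CONSEQUENCE = 8  # Table 3

unit_probability = {
    name: weibull_failure_probability(HOURS_PER_YEAR, eta, beta)
    for name, (eta, beta, _) in UNITS.items()
}
# Assumed OR-gate structure for the fan; Fig. 9 is not reproduced in the text.
unit_probability["Air supply fan"] = or_gate(
    weibull_failure_probability(HOURS_PER_YEAR, eta, beta)
    for eta, beta in FAN_COMPONENTS.values()
)

consequence = {name: con for name, (_, _, con) in UNITS.items()}
consequence["Air supply fan"] = FAN_CONSEQUENCE

unit_risk = {
    name: risk_factor(p, consequence[name]) for name, p in unit_probability.items()
}

for name, risk in unit_risk.items():
    verdict = "exceeds" if risk > ACCEPTANCE_LEVEL else "is within"
    print(f"{name}: P(1 yr) = {unit_probability[name]:.3e}, "
          f"risk = {risk:.3e} ({verdict} the acceptance level)")
```

Under these assumptions the computed one-year probabilities and risk factors agree with the tabulated values to within rounding, which suggests that the tabulated probabilities are Eq. (10) evaluated at roughly 8760 h.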
Fig. 7. Simplified block diagram of a typical HVAC system.

Table 2
Units in a typical HVAC system
Unit number in Fig. 8 | Unit name | Failure scenarios
1 | Outdoor louver | • Louver is blocked by foreign material • Louver is damaged or removed
2 | Damper motor | • Failed to allow fresh air intake during system operation • Failed to stop fresh air to HVAC during system shut down
3 | Air filter unit | Filter failed to remove particles from intake air: • Pre-filter failed • Main filter failed
4 | Freeze protection unit | • Unit failed to operate on demand
5 | Heating unit | • Failed to provide adequate heating • Provided excess heating
6 | Cooling unit | • Cooling coil failed to provide adequate cooling • Cooling coil provided excess cooling
7 | Humidification unit | • Failed to supply adequate moisture • Supplied excess moisture
8 | EP relay unit | • Failed to provide enough control air • Failed to energize the final control element (electric control system)
9 | Computer control unit | • Control system failed
10 | Air supply fan | • Failed to supply adequate conditioned air with acceptable noise level
Table 3
Results of consequence analysis for different accident scenarios
Unit name | Failure scenario | Consequence analysis result
Outdoor louver | Louver is blocked by foreign material | 6
Outdoor louver | Louver is damaged or removed | 4
Damper motor | Failed to allow fresh air intake during system operation | 6
Damper motor | Failed to stop fresh air to HVAC during system shut down | 4
Air filter unit (filter failed to remove objects/particles from intake air) | Pre-filter failed | 4
Air filter unit (filter failed to remove objects/particles from intake air) | Main filter failed | 4
Freeze protection unit | Unit failed to operate on demand | 8
Heating unit | Failed to provide adequate heating | 4
Heating unit | Provided excess heating | 3
Cooling unit | Cooling coil failed to provide adequate cooling | 4
Cooling unit | Cooling coil provided excess cooling | 3
Humidification unit | Failed to supply adequate moisture | 5
Humidification unit | Supplied excess moisture | 4
EP relay unit | Failed to provide enough control air | 6
EP relay unit | Failed to energize the final control element (electric control system) | 8
Computer control unit | Control system failed | 6
Air supply fan | Failed to supply adequate conditioned air with acceptable level of noise | 8
HVAC system | Fail to perform as desired | 5

Table 4
Results of risk estimation module; units in italics exceed the acceptance level
Unit name | η (hours) | β | Probability of failure in 1 year | Risk factor
Outdoor louver (a) | Not available | Not available | 1.0E-04 | 6.0E-04
Damper motor | 51,996.6 | 3.85 | 1.05E-03 | 6.3E-03
Air filter unit (a) | Not available | Not available | 1.0E-04 | 5.0E-04
Freeze protection unit (a) | Not available | Not available | 1.0E-04 | 6.0E-04
Heating unit (a) | Not available | Not available | 1.0E-04 | 4.0E-04
Cooling unit (a) | Not available | Not available | 1.0E-04 | 4.0E-04
Humidification unit | 57,608.6 | 2.99 | 3.58E-03 | 1.8E-02
EP relay unit | 69,366.5 | 3.04 | 1.86E-03 | 1.5E-02
Computer control unit (a) | Not available | Not available | 1.0E-04 | 8.0E-04
Air supply fan | See Table 5 | See Table 5 | 0.1965 | 1.57
Overall HVAC system failure as per Fig. 8 | | | 0.2021 | 1.01
(a) Failure data for these units were not available, as they never failed in operation; the failure probability for these units is adopted from the literature (Lees, 1996).

Table 5
Details of air supply fan failure
Component number used in Fig. 9 | Unit name | η (hours) | β | Probability of failure in 1 year | Risk factor
1 | Fan belt failure | 20,464.3 | 2.146 | 1.49E-01 | 1.192
2 | Vortex vanes failed | 68,043.1 | 1.638 | 3.42E-02 | 2.73E-01
3 | Fan bearing failed | 62,328.5 | 2.466 | 7.88E-03 | 6.3E-02
4 | Fan assembly failed | 121,417.4 | 2.035 | 4.74E-03 | 3.8E-02
5 | Fan motor failed | 132,780.2 | 1.712 | 9.47E-03 | 7.6E-02
Air supply fan failure as per Fig. 9 | | | | 0.1965 | 1.572

Fig. 8. Fault tree for HVAC system failure scenario; numbers are explained in Table 2.

6.4. Maintenance planning

One of the objectives of this study is to develop a technique to design maintenance plans for reducing the level of risk resulting from the failure of a system. The approach suggested in this work is explained briefly in this section using the air supply fan unit as an example.

• A value of the probability of the top event on the fault tree of the unit is determined. This value is chosen such that the resulting risk meets the risk acceptance criterion. In the case of the air supply fan system, the value of risk is ~1.0E-03.
• Using the value of the probability of failure of the top event, a reverse fault tree analysis is conducted to determine the required probabilities of the root events. The probability of failure of a root event is then used to estimate the time interval between consecutive inspection/maintenance tasks. Using this analysis, we were able to determine the values for the time intervals between consecutive maintenance tasks of 41 days for external accessories such as fan belts, etc., and 75 days for internal accessories such as fan bearing, etc.

This exercise was repeated for the other units of the system and the estimates determined for the maintenance intervals are given in Table 6.

These values are then used to develop a maintenance plan using the RBM methodology as shown in Table 6.

7. Summary and conclusion

Maintenance is aimed at increasing the availability of any system taking account of safety or environment issues and optimizing total life cycle cost. Risk assessment integrates reliability analysis with safety and environmental issues. Risk-based maintenance attempts to answer five important questions related to integrity and fault free operation of the system:

• What can cause the system to fail?
• How can it cause the system to fail?
• What would be the consequences if it fails?
• How probable is it to occur?
• How frequent an inspection/maintenance of what components would avert such failure?
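The last of these questions is the one Section 6.4 answers by turning a required failure probability into an inspection interval. A minimal sketch of that conversion follows, assuming the two-parameter Weibull form of Eq. (10), so that the interval is t = η(−ln(1 − F))^(1/β). The function names are hypothetical, and the sketch does not reproduce the authors' reverse fault tree analysis: how the top-event probability is allocated among root events is not given in the text. The example only checks that the inversion is consistent, using the fan belt parameters of Table 5 and the 41-day interval quoted in Section 6.4.

```python
import math


def weibull_failure_probability(t_hours, eta, beta):
    """Eq. (10): F(t) = 1 - exp(-(t/eta)**beta)."""
    return -math.expm1(-((t_hours / eta) ** beta))


def interval_for_target_probability(target_probability, eta, beta):
    """Invert Eq. (10): longest interval t (hours) with F(t) <= target."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    return eta * (-math.log1p(-target_probability)) ** (1.0 / beta)


# Fan belt parameters from Table 5
FAN_BELT_ETA = 20464.3   # hours
FAN_BELT_BETA = 2.146

# Forward: failure probability accumulated over the 41-day interval of Section 6.4
interval_hours = 41 * 24
p_41_days = weibull_failure_probability(interval_hours, FAN_BELT_ETA, FAN_BELT_BETA)

# Backward: recover the interval from that probability
recovered_hours = interval_for_target_probability(p_41_days, FAN_BELT_ETA, FAN_BELT_BETA)

print(f"P(fan belt fails within 41 days) = {p_41_days:.3e}")
print(f"Interval recovered from that probability = {recovered_hours / 24:.1f} days")
```

In practice the target probability for each root event would come from the reverse fault tree analysis; once it is known, the inversion above yields the corresponding maintenance interval directly.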
Fig. 9. Fault tree for air supply fan failure scenario; numbers are explained in Table 5.

Having known the answers to these five questions, it is safe to say that maintenance planning based on risk analysis is expected to provide cost effective maintenance, which minimizes the consequences (related to safety, economic, and environment) of a system outage/failure. This will, in turn, result in better asset and capital utilization. Risk-based maintenance strategies can be used to improve the existing maintenance policies through optimal decision procedures in different phases of the life cycle of a system.

This paper presents a new methodology for risk-based maintenance. The proposed methodology is more comprehensive and quantitative. It comprises three main modules: (i) risk estimation module, (ii) risk evaluation module, and (iii) maintenance planning module. Each module consists of many steps, e.g. the risk estimation module involves: (i) failure scenario development, (ii) consequence assessment, (iii) probabilistic failure analysis, and (iv) risk estimation.

This paper illustrates the applicability of the proposed methodology by applying it to an HVAC system. Initially, the complete HVAC system is divided into 10 different units. Among these units, four (damper motor, freeze protection unit, EP relay unit, and supply air fan) were identified to be most risky, and contributing the maximum in the overall risk of HVAC failure. An inspection/maintenance schedule has been worked out for all four units. It is further demonstrated analytically that the implementation of this inspection/maintenance schedule would bring down the high level of unacceptable risk to an acceptable level.

Appendix A. Detailed calculations for damper motor unit of HVAC system

A.1. Failure scenarios

Two scenarios are envisaged for this unit:

• Scenario 1: failed to allow fresh air
• Scenario 2: failed to stop fresh air to HVAC during shut down.

A.2. Consequence analysis

A.2.1. Scenario 1

System performance loss is 100%, A = 10
No financial loss, B = 0
Due to non availability of fresh air serious human health effects, C = 6
Table 6
Results of optimal maintenance duration computations
Unit name Optimal maintenance duration Revised frequency of failure (year⫺1) Un-revised risk factor Revised risk factor
(days)
Damper motor 132 2.1E⫺05 6.2E⫺03 1.26E⫺04

Humidification unit 172 3.77E⫺04 1.8E⫺02 1.88E⫺03
EP relay unit 81 1.91E⫺05 1.5E⫺02 1.57E⫺04
Air supply fan
쎲 Fan belt and vortex 41 3.42E⫺03 1.57 2.73E⫺02
쎲 Fan bearing, etc. 75
Overall HVAC risk prior to implementing maintenance plan 1.01

Overall HVAC risk after implementing maintenance plan 2.2E⫺02
No environmental and ecological loss, D = 0
$\mathrm{Con} = [0.25 \times 10^{2} + 0.25 \times 6^{2}]^{0.5} = 5.83 \approx 6$

A.2.2. Scenario 2

Significant loss of system performance, A = 8
No financial loss, B = 0
Moderately serious human health effects, C = 4
No environmental and ecological loss, D = 0
$\mathrm{Con} = [0.25 \times 8^{2} + 0.25 \times 4^{2}]^{0.5} = 4.47 \approx 4$

Final consequence result = maximum of 4 and 6 = 6

A.3. Probabilistic failure analysis

Failure probability in 1 year of operation:
$\text{Failure probability} = 1 - e^{-(t/\eta)^{\beta}} = 1 - e^{-(365 \times 24 / 51966.6)^{3.85}} = 1.05 \times 10^{-3}$

A.4. Risk estimation

Risk factor due to damper motor $= 6 \times 1.05 \times 10^{-3} = 6.3 \times 10^{-3}$
Total calculated risk of the HVAC system = 1.01

A.5. Risk evaluation and maintenance planning

HVAC target risk $= 2.2 \times 10^{-2}$
Target risk calculated for damper motor based on HVAC target risk and reverse fault tree analysis $= 2.1 \times 10^{-5}$
Based on target risk, preventive maintenance time = 132 days.

References

Aller, J. E., Horowitz, N. C., Reynolds, J. T., & Weber, B. J. (1995). Risk based inspection for petrochemical industry. In Risk and safety assessment where is the balance? New York: American Society of Mechanical Engineers.
API (1995). Base resource document on risk based inspection for API committee on refinery equipment. Washington, DC: American Petroleum Institute.
ASME (1991). Research task force on risk based inspection guidelines, risk based inspection development of guidelines. In General document CRTD 20-1. Washington, DC: American Society of Mechanical Engineers.
Capuano, M., & Koritko, S. (1996). Risk oriented maintenance. Biomedical Instrumentation and Technology, January/February, 25–37.
Chen, L. N., & Toyoda, J. (1990). Maintenance scheduling based on two level hierarchical structure to equalize incremental risk. IEEE Transactions on Power Systems, 5(4), 1510–1561.
Dey, P. M. (2001). A risk-based model for inspection and maintenance of cross-country petroleum pipeline. Journal of Quality in Maintenance Engineering, 7(1), 25–41.
Dey, K. P., Ogunlana, S. O., Gupta, S. S., & Tabucanon, M. T. (1998). A risk-based maintenance model for cross-country pipelines. Cost Engineering, 40(4), 24–31.
Dubois, D., & Prade, H. (1980). Fuzzy sets and systems: Theory and applications. New York: Academic Press.
Greenberg, H. R., & Slater, B. B. (1992). Fault tree and event tree analysis. New York: Van Nostrand Reinhold.
Hagemeijer, P. M., & Kerkveld, G. (1998). A methodology for risk-based inspection of pressurized systems. Proceedings of the Institute of Mechanical Engineers, Part E, 212, 37–47.
Hirst, I. L., & Carter, D. A. (2000). A “Worst Case” methodology for risk assessment of major accident installations. Process Safety Progress, 19(2), 78–82.
Khan, F. I. (2001). Maximum credible accident scenario for realistic and reliable risk assessment. Chemical Engineering Progress, November, 55–67.
Khan, F. I., & Abbasi, S. A. (1997). Accident hazard index: A multi-attribute scheme for process industry hazard rating. Institution of Chemical Engineers (IChemE) of UK (Environmental Protection and Safety), IChemE, UK, 75B, 217.
Khan, F. I., & Abbasi, S. A. (1998). Safe maintenance practice. Chemical Industry Digest, March, 91–105.
Khan, F. I., & Abbasi, S. A. (1999a). MAXCRED – a new software package for rapid risk assessment in chemical process industries. Environment Modeling and Software, 14, 11–25.
Khan, F. I., & Abbasi, S. A. (1999b). PROFAT: A user-friendly system for probabilistic fault tree analysis. Process Safety Progress, 18(1), 42–49.
Khan, F. I., & Abbasi, S. A. (2000). Analytical simulation and PROFAT II: A new methodology and a computer automated tool for fault tree analysis in chemical process industries. Journal of Hazardous Materials, 75, 1–27.
Kletz, T. A. (1994). What went wrong. Houston, TX: Gulf Publication House.
Kumar, U. (1998). Maintenance strategies for mechanized and automated mining systems: a reliability and risk analysis based approach. Journal of Mines, Metals and Fuels, Annual review, 343–347.
Lai, F. S., Shenoi, S., & Fan, L. T. (1988). Fuzzy fault tree analysis theory and applications. In Kandel, & Avni (Eds.), (pp. 139–167). Engineering risk and hazard assessment, vol. 1. Florida: CRC Press Inc.
Lees, F. P. (1996). Loss prevention in chemical process industries, vol. 1. London: Butterworths.
Nessim, M., & Stephens, M. (1998). Quantitative risk-analysis model guides maintenance budgeting. Pipe Line and Gas Industry, 81(6), 1–33.
Noma, K., Tankara, H., & Asai, K. (1981). Fault tree analysis with fuzzy probability. Journal of Ergonomics, 17, 291–297.
Rauzy, A. (1993). New algorithms for fault tree analysis. Reliability Engineering and System Safety, 40, 203–211.
Reynolds, J. T. (1995). Risk based inspection improves safety of pressure equipment. Oil and Gas Journal, special 16 January issue.
Ridgway, M. (2001). Classifying medical devices according to their maintenance sensitivity: A practical, risk-based approach to PM program management. Biomedical Instrumentation and Technology, May/June, 167–176.
Shafaghi, A. (1988). Structure modeling of process systems for risk and reliability analysis. In Kandel, & Avni (Eds.), (pp. 45–64). Engineering risk and hazard assessment, vol. 2. Florida: CRC Press Inc.
Soon, H. C., Joo, Y. P., & Myung, K. K. (1985). The Monte–Carlo method without sorting for uncertainty propagation analysis in PRA. Reliability Engineering, 10, 233.
Tanaka, H., Fan, L. T., Lai, F. S., & Toguchi, K. (1983). Fault tree analysis by fuzzy probability. IEEE Transactions on Reliability, R-32, 453–456.
Wong, D. (2000). A knowledge-based decision support system in reliability-centered maintenance of HVAC systems. PhD thesis, Memorial University of Newfoundland, St. John’s, Canada.
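The Appendix A calculations for the damper motor can be retraced in a few lines. The sketch below is illustrative only: the function names are hypothetical, the general form of the consequence expression (equal weights of 0.25 on A, B, C, and D) is inferred from the two worked scenarios, and the maintenance interval is obtained by inverting the Weibull expression at the target probability. That inversion is one reading that happens to reproduce the 132 days quoted in A.5; it is not a documented step of the reverse fault tree analysis.

```python
import math

HOURS_PER_YEAR = 365 * 24


def consequence(a, b, c, d):
    """ASSUMED general form: Con = [0.25 * (A^2 + B^2 + C^2 + D^2)]^0.5."""
    return math.sqrt(0.25 * (a**2 + b**2 + c**2 + d**2))


def weibull_failure_probability(t_hours, eta, beta):
    """Probability of failure by time t for a two-parameter Weibull model."""
    return 1.0 - math.exp(-((t_hours / eta) ** beta))


def time_to_reach_probability(p_target, eta, beta):
    """Operating hours at which the Weibull failure probability equals p_target."""
    return eta * (-math.log1p(-p_target)) ** (1.0 / beta)


# Damper motor parameters as printed in A.3 (Table 4 lists eta as 51,996.6).
ETA, BETA = 51966.6, 3.85

# A.2: consequence of each scenario, rounded, then the larger value is kept.
con_scenario_1 = consequence(10, 0, 6, 0)
con_scenario_2 = consequence(8, 0, 4, 0)
final_consequence = max(round(con_scenario_1), round(con_scenario_2))

# A.3 and A.4: one-year failure probability and the resulting risk factor.
p_fail = weibull_failure_probability(HOURS_PER_YEAR, ETA, BETA)
risk_factor = final_consequence * p_fail

# A.5: ASSUMED reading of the maintenance interval, using the target 2.1e-5.
interval_days = time_to_reach_probability(2.1e-5, ETA, BETA) / 24

print(f"Con (scenario 1, scenario 2): {con_scenario_1:.2f}, {con_scenario_2:.2f}")
print(f"Failure probability in 1 year: {p_fail:.3e}")
print(f"Risk factor: {risk_factor:.2e}")
print(f"Maintenance interval: {interval_days:.0f} days")
```

Applying the same inversion to the Table 4 parameters and Table 6 targets for the humidification unit and the EP relay unit gives intervals close to the tabulated 172 and 81 days, which lends some support to this reading.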
