Multi-state System
Reliability Analysis and
Optimization for Engineers
and Industrial Managers
Anatoly Lisnianski, PhD
The Israel Electric Corporation Ltd
Planning, Development and Technology Division
The System Reliability Department
New Office Building, Bialik/Basel Sts.
St. Nativ haor 1
Haifa, P.O. Box 10
Israel
anatoly-l@iec.co.il
lisnians@zahav.net.il

Ilia Frenkel, PhD
Shamoon College of Engineering
Industrial Engineering and Management Department
Center for Reliability and Risk Management
Beer Sheva 84100
Israel
iliaf@sce.ac.il

Yi Ding, PhD
Nanyang Technological University
School of Electrical and Electronic Engineering
Division of Power Engineering
Singapore
dingyi@ntu.edu.sg
MATLAB is a registered trademark of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA,
01760-2098 USA, www.mathworks.com
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be
reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of
the publishers, or in the case of reprographic reproduction in accordance with the terms of licences
issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms
should be sent to the publishers.
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of
a specific statement, that such names are exempt from the relevant laws and regulations and therefore
free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the
information contained in this book and cannot accept any legal responsibility or liability for any errors
or omissions that may be made.
To my wife Tania
Ilia Frenkel
Most books on reliability theory are devoted to traditional binary reliability mod-
els allowing for only two possible states for a system and for its components: per-
fect functionality (up) and complete failure (down). Many real-world systems are
composed of multi-state components that have different performance levels and
several failure modes with various effects on the entire system performance. Such
systems are called multi-state systems (MSSs). Examples of MSS are power sys-
tems, communication systems, and computer systems where the system perform-
ance is characterized by generating capacity, communication, or data processing
speed, respectively. In real-world problems of MSS reliability analysis, the great
number of system states that need to be evaluated makes it difficult to use
traditional binary reliability techniques. Since the mid-1970s and up to the present
day, numerous research studies focusing on MSS reliability have been published.
This book is the second devoted to MSS reliability. The first book on MSS
reliability and optimization, Multi-State System Reliability: Assessment,
Optimization and Applications by A. Lisnianski and G. Levitin, was published in
2003 by World Scientific. Almost seven years have passed, and the MSS extension
of classical binary-state reliability theory has been intensively developed during
this time. More than 100 new scientific papers in the field have been published
since then; special sessions devoted to MSS reliability have been organized at
international reliability conferences (Mathematical Methods in Reliability (MMR),
the European Safety and Reliability Conference (ESREL), etc.). Additional
experience has also been gathered from industrial settings. Thus, MSS reliability
has recently emerged as a valid field not only for scientists and researchers, but
also for engineers and industrial managers.
The aim of this book is to provide a comprehensive, up-to-date presentation of
MSS reliability theory based on current achievements in this field and to present a
variety of significant case studies that are interesting for both engineers and indus-
trial managers.
New theoretical issues not presented previously, including combined
random-process methods and the universal generating function technique,
statistical data processing for MSS, reliability analysis of aging MSS, methods for
calculating the reliability-associated cost of MSS, fuzzy MSS, etc., are described.
The book presents important practical problems such as life cycle cost analysis
and optimal decision making (redundancy and maintenance optimization, optimal
We would like to express our sincere appreciation to our teachers and friends
Prof. Igor Ushakov, founder of the international group on reliability, the Gnedenko
e-Forum, and Prof. Eliyahu Gertsbakh from Ben-Gurion University, Israel. Their
works and ideas have had a great impact on our book. We would like to thank our
colleagues Dr. L. Khvatskin from SCE (Shamoon College of Engineering), Israel;
Dr. G. Levitin, Dr. D. Elmakis, Dr. H. Ben-Haim, and Dr. D. Laredo from the
Israel Electric Corporation; Prof. M. Zuo from the University of Alberta, Canada;
and Prof. L. Goel and Prof. P. Wang from Nanyang Technological University,
Singapore, for their friendly support and for the discussions from which this book
benefited.
We would also like to thank SCE (Shamoon College of Engineering), Israel, and
its president, Prof. J. Haddad, as well as the SCE Industrial Engineering and
Management Department and its dean, Prof. Z. Laslo, for providing a supportive
and intellectually stimulating environment. We also thank the Internal Funding
Program of SCE for partially supporting our research work.
It was a pleasure working with the Springer senior editorial assistant, Ms.
Claire Protherough.
Anatoly Lisnianski
Israel Electric Corporation Limited, Haifa, Israel
Ilia Frenkel
SCE (Shamoon College of Engineering), Beer Sheva, Israel
Yi Ding
Nanyang Technological University, Singapore
December 2009
1 Multi-state Systems in Nature and in Engineering

1.1 Multi-state Systems in the Real World: General Concepts
All systems are designed to perform their intended tasks in a given environment.
Some systems can perform their tasks with various distinctive levels of efficiency
usually referred to as performance rates. A system that can have a finite number
of performance rates is called a multi-state system (MSS). Usually a MSS is com-
posed of elements that in their turn can be multi-state. Actually, a binary system is
the simplest case of a MSS having two distinctive states (perfect functioning and
complete failure).
The basic concepts of MSS reliability were first introduced in the mid-1970s by
Murchland (1975), El-Neweihi et al. (1978), Barlow and Wu (1978), and Ross
(1979). Natvig (1982), Block and Savits (1982), and Hudson and Kapur (1982)
extended the results obtained in these works. Since that time, MSS reliability
theory has undergone intensive development. The essential achievements attained
up to the mid-1980s are reflected in Natvig (1985) and in El-Neweihi and
Proschan (1984), where the state of the art in the field of MSS reliability at that
stage can be found. Readers interested in the history of ideas in MSS reliability
theory at later stages can find the corresponding overviews in Lisnianski and
Levitin (2003) and Natvig (2007).
In practice there are many different situations in which a system should be con-
sidered a MSS:
Any system consisting of different binary-state units that have a cumulative
effect on the entire system performance has to be considered a MSS. Indeed, the
performance rate of such a system depends on the availability of its units, as
different numbers of available units can provide different levels of task
performance. The simplest example of such a situation is the well-known
k-out-of-n system. These systems consist of n identical binary units and can have
n + 1 states depending on the number of available units. The system performance
rate is assumed to be proportional to the number of available units, and
performance rates corresponding to more than k − 1 available units are assumed to
be acceptable. When the contributions of different units to the cumulative system
performance rate are different, the number of possible MSS states grows
dramatically, as different combinations of k available units can provide different
performance rates for the entire system.
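The k-out-of-n performance distribution just described can be sketched in a few lines. The code below is an illustrative sketch, not from the book: it assumes n identical units, each available with probability p and each contributing g_unit to the cumulative performance, so the number of available units is binomially distributed over the n + 1 system states.

```python
from math import comb

def k_out_of_n_distribution(n, p, g_unit):
    """Performance distribution of an n-unit system whose identical binary
    units are each available with probability p and contribute g_unit to
    the cumulative performance: the number of available units is binomial,
    giving n + 1 system states."""
    return {m * g_unit: comb(n, m) * p**m * (1 - p)**(n - m)
            for m in range(n + 1)}

def acceptability(dist, k, g_unit):
    """Probability that the performance corresponds to at least k
    available units (the acceptable states of a k-out-of-n system)."""
    return sum(pr for perf, pr in dist.items() if perf >= k * g_unit - 1e-12)

dist = k_out_of_n_distribution(n=3, p=0.9, g_unit=1.0)
print(dist)                                # states 0.0, 1.0, 2.0, 3.0
print(acceptability(dist, k=2, g_unit=1.0))
```

With n = 3 and p = 0.9 this reproduces the 2-out-of-3 case used later in Example 1.1: four system states, with acceptable states (at least two units up) having total probability 0.972.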
The performance rate of elements composing a system can also vary as a result
of their deterioration (fatigue, partial failures) or because of variable ambient
conditions. Element failures can lead to the degradation of the entire MSS per-
formance.
In general, the performance rate of any element can range from perfect func-
tioning up to complete failure. The failures that lead to a decrease in the element
performance are called partial failures. After partial failure, elements continue to
operate at reduced performance rates, and after complete failure the elements are
totally unable to perform their tasks.
Consider the following examples of MSSs:
1. In a power supply system consisting of generating and transmitting facilities,
each generating unit can function at different levels of capacity. Generating
units are complex assemblies of many parts. The failures of different parts may
lead to situations in which the generating unit continues to operate, but at a re-
duced capacity. This can occur during outages of several auxiliaries such as
pulverizers, water pumps, fans, etc. For example, Billinton and Allan (1996)
describe a three-state 50 MW generating unit. The performance rates (generat-
ing capacity) corresponding to these states and probabilities of the states are
presented in Table 1.1.
2. Recently, multi-state models have also been used in medicine (Giard et al. 2002;
van den Hout and Matthews 2008; Marshall and Jones 2007; Putter et al. 2007).
Van den Hout and Matthews (2008) consider cognitive ability during old age. An
illness-death model is presented in order to describe the progression of an illness
over time. The model considers three states: the healthy state, an illness state, and
the death state. The model is used to derive the probability of a transition from one
state to another within a specified time interval.
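The illness-death model can be sketched as a discrete-time Markov chain. The transition probabilities below are purely hypothetical (the cited papers use continuous-time models fitted to data); the sketch only illustrates how the probability of a transition within a specified interval follows from matrix powers.

```python
import numpy as np

# Three states of the illness-death model: 0 = healthy, 1 = ill, 2 = dead.
# These monthly transition probabilities are purely illustrative.
P = np.array([[0.94, 0.05, 0.01],
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])   # death is an absorbing state

def transition_prob(P, start, end, steps):
    """Probability of moving from `start` to `end` within `steps` time
    units, read off the `steps`-step transition matrix P^steps."""
    return np.linalg.matrix_power(P, steps)[start, end]

# Probability that a healthy subject has died within 12 months:
print(transition_prob(P, 0, 2, 12))
```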
3. As a next example, consider a wireless communication system consisting of
transmission stations. The state of each station is defined by the number of sub-
sequent stations covered in its range. This number depends not only on the
availability of station amplifiers, but also on the conditions for signal propaga-
tion that depend on weather, solar activity, etc.
The amount of coal supplied to the boilers at each time unit proceeds consecu-
tively through each element. The feeders and the stacker-reclaimer can have
two states: working with nominal throughput and total failure. The throughput
of the sets of conveyors (primary and secondary) can vary depending on the
availability of individual two-state conveyors.
5. Another category of the MSS is a task processing system for which the per-
formance measure is characterized by an operation time (processing speed).
This category may include control systems, information or data processing sys-
tems, manufacturing systems with constrained operation time, etc. The opera-
tion of these systems is associated with consecutive discrete actions performed
by the ordered line of elements. The total system operation time is equal to the
sum of the operation times of all of its elements. When one measures the ele-
ment (system) performance in terms of processing speed (reciprocal to the op-
eration time), the total failure corresponds to a performance rate of 0. If at least
one system element is in a state of total failure, the entire system also fails
completely. Indeed, the total failure of the element corresponds to its process-
ing speed equal to 0, which is equivalent to an infinite operation time. In this
case, the operation time of the entire system is also infinite. An example of the
task processing series system (Lisnianski and Levitin 2003) is a manipulator
control system (Figure 1.2) consisting of:
The system performance is measured by the speed of its response to the occur-
ring events. This speed is determined by the sum of the times needed for each
element to perform its task (from initial detection of the event to the comple-
tion of the manipulator actuators performance). The time of data transmission
also depends on the availability of channels, and the time of data processing
depends on the availability of the processors as well as on the complexity of
the image. The system reliability is defined as its ability to react within a
specified time during an operation period.
6. Consider the local power supply system presented in Figure 1.3 (Lisnianski and
Levitin 2003). The system is aimed at supplying a common load. It consists of
two spatially separated components containing generators and two spatially
separated components containing transformers. Generators and transformers of
different capacities within each component are connected by a common bus
bar. To provide interchangeability of the components, bus bars of the genera-
tors are connected by a group of cables. The system output capacity (perform-
ance) must be no less than a specified load level (demand).
8. The most commonly used refrigeration system for supermarkets today is the
multiplex direct expansion system (Baxter 2002, IEA Annex 26 2003). All dis-
play cases and cold storerooms use direct-expansion air-refrigerant coils that
are connected to the system compressors in a remote machine room located in
the back or on the roof of the store. Heat rejection is usually done with air-
cooled condensers with simultaneously working axial blowers mounted outside.
Evaporative condensers can be used as well and will reduce condensing tem-
perature and system energy consumption. Figure 1.5 shows the major elements
of a multiplex refrigeration system. Multiple compressors operating at the same
saturated suction temperature are mounted on a skid, or rack, and are piped
with common suction and discharge refrigeration lines.
Using multiple compressors in parallel provides a means of capacity control,
since the compressors can be selected and cycled as needed to meet the
refrigeration load. A fault in a single unit or item of machinery cannot have
detrimental effects on the entire store; it only decreases the system cooling
capacity. Failure of a compressor or an axial condenser blower leads to partial
system failure (reduced output cooling capacity) rather than to complete failure of
the system. We can therefore treat a refrigeration system as a MSS, where the
system has a finite number of states.
Consider the refrigeration system used in a supermarket. The system consists
of four compressors situated in the machine room and two main axial condenser
blowers. It is possible to add one reserve blower. The reserve blower begins to
work only when one of the main blowers has failed.
So, the entire refrigerating system has the following output performance levels:
• Full performance: 10.5 × 10⁹ BTU per year.
• When one of the compressors fails, the performance is reduced to
7.9 × 10⁹ BTU per year.
• When two compressors fail, the performance is reduced to
5.2 × 10⁹ BTU per year.
• When three compressors fail, the performance is reduced to
2.6 × 10⁹ BTU per year.
• Failure of one blower reduces the performance to 5.2 × 10⁹ BTU per year.
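These levels can be reproduced with a small model. The mapping below is a sketch under stated assumptions: each of the four compressors is assumed to carry a quarter of the full 10.5 × 10⁹ BTU/year capacity, and a single working blower is assumed to cap the output at 5.2 × 10⁹; neither rule is stated explicitly in the text.

```python
def cooling_capacity(compressors_up, main_blowers_up, reserve_ok=True):
    """Output cooling capacity (in 10^9 BTU/year) of the supermarket
    refrigeration system: 4 compressors, 2 main axial blowers, and one
    reserve blower that starts only when a main blower has failed."""
    FULL = 10.5
    # The reserve blower (if operable) replaces one failed main blower.
    blowers = main_blowers_up + (1 if reserve_ok and main_blowers_up < 2 else 0)
    if blowers >= 2:
        blower_cap = FULL        # full heat rejection available
    elif blowers == 1:
        blower_cap = 5.2         # assumed cap with a single blower
    else:
        blower_cap = 0.0
    # Each compressor is assumed to carry a quarter of the full load.
    return round(min(compressors_up / 4 * FULL, blower_cap), 1)

print(cooling_capacity(4, 2))                    # 10.5: full performance
print(cooling_capacity(3, 2))                    # 7.9: one compressor failed
print(cooling_capacity(4, 1, reserve_ok=False))  # 5.2: one blower lost
```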
It consists of two identical stations: one of them covers the sector from 0° to
110° and the other covers the sector from 70° to 180°. The MSS performance
measure is the probability of successfully revealing a target. The probability of
revealing a target by one station is psuc = 0.9. In the overlapping zone the
probability is

Psuc = 1 − (1 − psuc)² = 0.99.

Thus, the entire airport radar system has the following performance levels. If
both radars are available, the MSS output performance is
g2 = (40/180) × 0.99 + (140/180) × 0.9 = 0.92. If only one radar is available, the
MSS performance is g1 = (110/180) × 0.9 = 0.55. If both radars are unavailable,
the MSS performance is g0 = 0.
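These performance levels can be checked numerically; the sector widths (the 40° overlap between 70° and 110°, and the 110° single-station coverage) and psuc = 0.9 are taken from the example.

```python
def radar_performance(stations_up):
    """Sector-averaged probability of revealing a target for the
    two-station airport radar example (180-degree sector, 40-degree
    overlap, 110 degrees covered by a single station)."""
    p_suc = 0.9
    P_overlap = 1 - (1 - p_suc) ** 2       # both stations see the target
    if stations_up == 2:
        return (40 / 180) * P_overlap + (140 / 180) * p_suc
    if stations_up == 1:
        return (110 / 180) * p_suc
    return 0.0

for n in (2, 1, 0):
    print('g%d = %.2f' % (n, radar_performance(n)))
```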
Additional interesting examples can be found in Natvig and Mørch (2003),
Levitin (2005), and Kuo and Zuo (2003). Natvig and Mørch (2003) presented a
detailed investigation of a gas pipeline network. Levitin (2005), Kuo and Zuo
(2003), Nordmann and Pham (1999), and Zuo and Liang (1994) considered special
types of MSS such as weighted voting systems, multi-state consecutively
connected systems, and sliding window systems. Kolowrocki (2004) describes
some types of communication lines and rope transportation systems.
1.2 Main Definitions and Properties

In order to analyze MSS behavior one has to know the characteristics of its
elements. Any system element j can have kj different states corresponding to its
performance rates, represented by the set

gj = {gj1, gj2, …, gjkj},

where gji is the performance rate of element j in state i, i ∈ {1, 2, …, kj}.
The performance rate Gj(t) of element j at any instant t ≥ 0 is a random
variable that takes its values from gj: Gj(t) ∈ gj. Therefore, for the time interval
[0, T], where T is the MSS operation period, the performance rate of element j is
defined as a stochastic process.
In some cases the element performance cannot be measured by a single value;
more complex mathematical objects, usually vectors, are used. In these cases the
element performance is defined as a vector stochastic process Gj(t).
The probabilities associated with the different states (performance rates) of the
system element j at any instant t can be represented by the set

pj(t) = {pj1(t), pj2(t), …, pjkj(t)},   (1.1)

where

pji(t) = Pr{Gj(t) = gji}.   (1.2)
1.2 Main Definitions and Properties 9
Note that since the element states compose a complete group of mutually
exclusive events (meaning that element j can always be in one and only one of its
kj states),

∑_{i=1}^{kj} pji(t) = 1, for any t: 0 ≤ t ≤ T.
Expression (1.2) defines the probability mass function (pmf) for the discrete
random variable Gj(t) at any instant t. The collection of pairs (gji, pji(t)),
i = 1, 2, …, kj, completely determines the probability distribution of the
performance of element j at any instant t.
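A performance distribution at a fixed instant t can be stored as two parallel lists of performance rates and state probabilities. The three-state element below is hypothetical; the snippet only illustrates the complete-group property and what the pairs (gji, pji(t)) determine.

```python
# Hypothetical three-state element: performance rates g_ji and the state
# probabilities p_ji(t) at some fixed instant t (e.g., tons per minute).
g = [0.0, 1.8, 4.0]      # g_j1, g_j2, g_j3
p = [0.05, 0.25, 0.70]   # p_j1(t), p_j2(t), p_j3(t)

# The states form a complete group of mutually exclusive events:
assert abs(sum(p) - 1.0) < 1e-12

# The pairs (g_ji, p_ji(t)) fully determine the pmf of G_j(t); for
# instance, they give the expected performance at instant t:
mean_performance = sum(gi * pi for gi, pi in zip(g, p))
print(mean_performance)
```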
Observe that the behavior of binary elements (elements with only total failures)
can also be represented by a performance distribution (PD). Indeed, consider a
binary element b with a nominal performance g* (the performance rate
corresponding to the fully operable state) and the probability p(t) that the element
is in the fully operable state. Assuming that the performance rate of the element in
the state of complete failure is 0, one obtains its PD as follows: gb = {0, g*},
pb(t) = {1 − p(t), p(t)}.
Fig. 1.7 Cumulative performance curves for steady-state behavior of multi-state elements
Definition 1.1 Let Ln = {g11, …, g1k1} × {g21, …, g2k2} × … × {gn1, …, gnkn} be
the space of possible combinations of performance rates for all of the MSS
elements and M = {g1, …, gK} be the space of possible values of the performance
rate for the entire system.
The transform f(G1(t), …, Gn(t)): Ln → M, which maps the space of the
elements' performance rates into the space of the system's performance rates, is
called the MSS structure function.
Note that the MSS structure function is an extension of the binary structure
function. The only difference is in the definition of the state spaces: the binary
structure function maps {0, 1}^n → {0, 1}, while in the MSS one deals with much
more complex spaces.
Now we can define a generic model of the MSS.
This generic MSS model should include models of the performance stochastic
processes

Gj(t), j = 1, 2, …, n,   (1.3)

for each system element j and of the system structure function that produces the
stochastic process corresponding to the output performance of the entire MSS:

G(t) = f(G1(t), …, Gn(t)).   (1.4)

The model can thus be defined by the performance distributions of the
individual elements

gj, pj(t), 1 ≤ j ≤ n,   (1.5)

and the system structure function

G(t) = f(G1(t), …, Gn(t)).   (1.6)
It also does not matter how the structure function is defined. It can be
represented in a table, in analytical form, or as an algorithm for unambiguously
determining the system performance G(t) for any given set {G1(t), …, Gn(t)}.
Below we will consider examples of some possible representations of MSS
structure functions.
Example 1.1 Consider a 2-out-of-3 MSS. This system consists of three binary
elements with performance rates Gi(t) ∈ {gi1, gi2} = {0, 1}, i = 1, 2, 3, where
gi1 = 0 if element i is in a state of complete failure, and gi2 = 1 if element i
functions perfectly.
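The structure function of Example 1.1 is not written out above; a natural choice, used here as an assumption, is that the system is up (output 1) if and only if at least two of the three binary elements are up.

```python
from itertools import product

def f_2_of_3(g1, g2, g3):
    """Structure function of the 2-out-of-3 MSS (sketch): the system is up
    (output 1) iff at least two of the three binary elements are up."""
    return 1 if g1 + g2 + g3 >= 2 else 0

# Tabulate the structure function over all 2^3 element-state combinations.
for state in product((0, 1), repeat=3):
    print(state, '->', f_2_of_3(*state))
```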
Example 1.2 Consider a flow transmission system [Figure 1.8 (a)] consisting of
three pipes (Lisnianski and Levitin 2003).
Fig. 1.8 Two different MSSs with identical structure functions
The oil flow is transmitted from point C to point E. The pipes' performance is
measured by their transmission capacity (tons per minute). Elements 1 and 2 are
binary. A state of total failure for both elements corresponds to a transmission
capacity of 0, and the operational states correspond to capacities of 1.5 and 2 tons
per minute, respectively, so that G1(t) ∈ {0, 1.5}, G2(t) ∈ {0, 2}. Element 3 can
be in one of three states: a state of total failure corresponding to a capacity of 0, a
state of partial failure corresponding to a capacity of 1.8 tons per minute, and a
fully operational state with a capacity of 4 tons per minute, so that
G3(t) ∈ {0, 1.8, 4}. The system output performance rate is defined as the
maximum flow that can be transmitted from C to E.
The total flow between points C and D through parallel pipes 1 and 2 is equal
to the sum of the flows through each of these pipes. The flow from point D to
point E is limited by the transmission capacity of element 3. On the other hand,
this flow cannot be greater than the flow between points C and D. Therefore, the
flow between points C and E (the system performance) is

G(t) = f(G1(t), G2(t), G3(t)) = min{G1(t) + G2(t), G3(t)}.
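Enumerating the 2 × 2 × 3 element-state combinations of this flow system yields its distinct performance levels:

```python
from itertools import product

# Element state spaces of the flow transmission system (tons per minute).
g1, g2, g3 = [0, 1.5], [0, 2], [0, 1.8, 4]

def f(x1, x2, x3):
    """Structure function: the flow C->E is the parallel flow through
    pipes 1 and 2, limited by the capacity of pipe 3."""
    return min(x1 + x2, x3)

# Enumerate all 2 * 2 * 3 = 12 element-state combinations.
levels = sorted({f(*state) for state in product(g1, g2, g3)})
print(levels)   # [0, 1.5, 1.8, 2, 3.5]
```

Twelve combinations of element states collapse into only five distinct system performance levels, which is typical of how a MSS state space is built from element states.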
Example 1.3 Consider a data transmission system [Figure 1.8 (b)] consisting of
three fully reliable network servers and three data transmission channels (ele-
ments). The data can be transmitted from server C to server E through server D or
directly. The time of data transmission between the servers depends on the state of
the corresponding channel and is considered to be the channel performance rate.
This time is measured in seconds.
Elements 1 and 2 are binary. They may be in a state of total failure in which
data transmission is impossible; in this case the data transmission time is formally
defined as ∞. They may also be in a fully operational state, in which they provide
data transmission in 1.5 s and 2 s, respectively: G1(t) ∈ {∞, 1.5}, G2(t) ∈ {∞, 2}.
Element 3 can be in one of three states: a state of total failure, a state of partial
failure with data transmission in 4 s, and a fully operational state with data
transmission in 1.8 s: G3(t) ∈ {∞, 4, 1.8}. The system performance rate is defined
as the total time in which the data can be transmitted from server C to server E.
When the data are transmitted through server D, the total transmission time is
equal to the sum of the times G1(t) and G2(t) it takes to transmit them from server
C to server D and from server D to server E, respectively. If either element 1 or 2
is in a state of total failure, data transmission through server D is impossible. For
this case we formally state that (∞ + 2) = ∞ and (∞ + 1.5) = ∞. When the data are
transmitted from server C to server E directly, the transmission time is G3(t). The
minimum time needed to transmit the data from C to E, either directly or through
D, determines the system transmission time. Therefore, the MSS structure
function takes the form

G(t) = f(G1(t), G2(t), G3(t)) = min{G1(t) + G2(t), G3(t)}.
Note that the different technical systems in Examples 1.2 and 1.3, even though
they have different reliability block diagrams [Figure 1.8 (a) and (b)], correspond
to identical MSS structure functions.
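The formal rules (∞ + 2) = ∞ and (∞ + 1.5) = ∞ of Example 1.3 are exactly how IEEE-754 infinity behaves, so the same one-line structure function serves both examples:

```python
import math

def f(x1, x2, x3):
    """Structure function shared by Examples 1.2 and 1.3."""
    return min(x1 + x2, x3)

inf = math.inf   # transmission time of a failed channel

print(f(1.5, 2, 1.8))  # all channels up: route via D takes 3.5 s, direct 1.8 s
print(f(inf, 2, 1.8))  # element 1 down: (inf + 2) = inf, direct route only
print(f(inf, 2, inf))  # elements 1 and 3 down: transmission impossible (inf)
```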
1.2.2.1 Relevancy

In the binary context, the relevancy of a system element means that in some
conditions the state of the entire system completely depends on the state of this
element. In terms of the system structure function, the relevancy of element j
means that there exist G1(t), …, Gn(t) for which a change in the state of element j
changes the entire system state. In terms of the MSS structure function, the
relevancy of element j means that there exist G1(t), …, Gn(t) such that for some
gjk ≠ gjm

f(G1(t), …, Gj−1(t), gjk, Gj+1(t), …, Gn(t)) ≠ f(G1(t), …, Gj−1(t), gjm,
Gj+1(t), …, Gn(t)).   (1.8)
Table 1.4 Possible delays of switches and entire circuit disconnection times
1.2.2.2 Coherency
When the system is operating, no repair or addition of elements can cause the
system to fail.
For MSSs these requirements are met in systems with monotonic structure
functions.
1.2.2.3 Homogeneity
The MSS is homogeneous if all of its elements and the entire system itself have
the same number of distinguished states. One can easily see that all binary-state
systems are homogeneous.
For example, consider a system of switches connected in series (Figure 1.9).
Assume that all the switches are identical and have the same number of states. The
total failure of a switch corresponds to an infinite delay. Since the time of circuit
closing is equal to the closing time of its fastest element, and since the elements
are identical, the entire system delay can only be equal to the delay of one of its
elements. The possible system delays are the same as the delays of a single
element. This means that the system is homogeneous.
Despite the fact that homogeneous MSSs are intensively studied, in real
applications most systems do not possess this property. Indeed, even when
considering the same MSS of series-connected switches and allowing different
switches to have different operational delays, one obtains a MSS in which the
number of system states is not equal to the number of states of its elements (see
the examples in Table 1.5).
Table 1.5 Possible delays of switches and entire circuit disconnection times
1.3 Multi-state System Reliability and Its Measures

MSS behavior is characterized by its evolution in the space of states. The entire
set of possible system states can be divided into two disjoint subsets
corresponding to acceptable and unacceptable system functioning. The system's
entrance into the subset of unacceptable states constitutes a failure. MSS
reliability can be defined as the system's ability to remain in acceptable states
during the operation period.
Since the system functioning is characterized by its output performance G(t),
the state acceptability at any instant t ≥ 0 depends on this value. In some cases
this dependency can be expressed by an acceptability function F(G(t)) that takes
non-negative values if and only if the MSS functioning is acceptable. This takes
place when the efficiency of the system functioning is completely determined by
its internal state. For example, only those states where a network preserves its con-
nectivity are acceptable. In such cases, a particular set of MSS states is of interest
to the customer. Usually these states are interpreted as system failure states,
which, when reached, imply that the system should be repaired or discarded.
Much more frequently, the system state acceptability depends on the relation
between the MSS performance and the desired level of this performance (demand)
that is determined outside of the system. In general, the demand W(t) is also a ran-
dom process. Below we shall consider such a case when the demand can take dis-
crete values from the set w = {w1 ,, wM } . Often the desired relation between the
system performance and the demand can be expressed by the acceptability func-
tion F ( G ( t ) , W ( t ) ) . The acceptable system states correspond to
1.3 Multi-state System Reliability and Its Measures 17
as some arbitrary function. For a power system, where G(t) and W(t) are treated as
respectively, generating capacity and load (demand, which is required by consum-
ers), functional J is interpreted as an energy not supplied to consumers, where
() is defined as follows: (t ) = W (t ) G (t ), if W ( t ) G ( t ) 0 , and
(t ) 0, if W ( t ) G ( t ) < 0. Such a functional J is called a failure criteria func-
tional.
In Section 1.2.2 the MSS relevancy was considered as a property of the
structure function representing the system performance. When the MSS is
considered from the reliability viewpoint, the system demand should be taken into
account too. The demand value is of interest as well as the system performance
value. In this context, an element is relevant if changes in its state, without
changes in the states of the remaining elements, cause changes in the system's
reliability. The relevancy is now treated not as an internal property of the system,
but as one associated with
the system's ability to perform a task, which is defined outside the system. In this
context, the relevancy of element j means that there exist G1(t), …, Gn(t) for
which, for some gjk ≠ gjm,

J{f(G1(t), …, Gj−1(t), gjk, Gj+1(t), …, Gn(t)), W} = 0,   (1.10)

while

J{f(G1(t), …, Gj−1(t), gjm, Gj+1(t), …, Gn(t)), W} > 0.
Note that this condition is tougher than condition (1.8). Indeed, a relevant ele-
ment according to expression (1.8) can be irrelevant according to (1.10).
For example, consider a system of switches connected in series (Figure 1.9)
and assume that the switches are binary elements with the switching delays
presented in the last row of Table 1.5. Assume that the system disconnection time
must not be greater than a constant W, i.e., the acceptability condition is
W − G(t) ≥ 0. Observe that for W ≥ 0.6 the second switch is relevant, since when
the first and third switches do not work, the system's success depends on the state
of the second switch. For W < 0.6 the second switch is irrelevant, since when the
first and third switches do not work, the system fails to meet the demand
independently of the state of the second switch. (According to expression (1.8)
the second switch is always relevant.)
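The relevancy argument can be checked numerically. Since Table 1.5 itself was lost in extraction, the delays below are hypothetical, except that the second switch's delay is taken as 0.6 s, as the W ≥ 0.6 threshold in the text implies; a failed switch is modelled by an infinite delay, and the series circuit disconnects as soon as its fastest working switch opens.

```python
import math

FAILED = math.inf   # a failed switch never opens: infinite delay

def disconnection_time(delays):
    """A series circuit is disconnected as soon as its fastest working
    switch opens, so the system delay is the minimum element delay."""
    return min(delays)

def acceptable(delays, W):
    """Acceptability condition W - G(t) >= 0: the circuit must disconnect
    within the constant time W."""
    return disconnection_time(delays) <= W

# Switches 1 and 3 have failed; switch 2 (delay 0.6 s) may work or not.
print(acceptable([FAILED, 0.6, FAILED], W=0.7))    # switch 2 decides success
print(acceptable([FAILED, FAILED, FAILED], W=0.7)) # system fails
print(acceptable([FAILED, 0.6, FAILED], W=0.5))    # switch 2 irrelevant here
```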
Using the acceptability function, one can also give a definition of system
coherency that is more closely related to the one given for binary systems. Indeed,
the definition of coherency for binary systems operates with the notions of fault
and normal operation, whereas when applied to a MSS all that is required is
monotonic behavior of the structure function. In the context of reliability, MSS
coherency means that an improvement in the performance of the system elements
cannot cause the entire system to transition from an acceptable state to an
unacceptable one.
Note that in a steady state the distribution of the variable demand can be
represented (by analogy with the distribution of MSS performance) by two
vectors (w, q), where w = {w1, …, wM} is the vector of possible demand levels
wj, j = 1, …, M, and q = {q1, …, qM} is the vector of steady-state probabilities of
the corresponding demand levels, qj = Pr{W = wj}, j = 1, …, M.
When one considers MSS evolution in the space of states during the system
operation period T, the following random variables can be of interest:
• Time to failure, Tf, is the time from the beginning of the system's life up
to the instant when the system first enters the subset of unacceptable states.
• Time between failures, Tb, is the time between two consecutive transitions
from the subset of acceptable states to the subset of unacceptable states.
• Number of failures, NT, is the number of times the system enters the subset
of unacceptable states during the time interval [0, T].
In Figure 1.10, one can see an example of random realizations of the two
stochastic processes G(t) and W(t). Assume that the system performance value
should not be less than the demand: F(G(t), W(t)) = G(t) − W(t) ≥ 0. In this case,
the first time at which the process G(t) down-crosses the level of demand W(t)
determines the time to MSS failure. This time is designated as T_f. The random
variable T_f is characterized by the following indices:
The probability of a failure-free operation, or reliability function R(t), is
the probability that the time to failure is not less than t, given that the
system is in an acceptable state at t = 0:

R(t) = Pr{T_f ≥ t | F(G(0), W(0)) ≥ 0}. (1.10)

Mean time to failure (MTTF) is the mean time up to the instant when the
system enters the subset of unacceptable states for the first time:

MTTF = E{T_f}. (1.11)

From now on E{·} is used as an expectation symbol.
The same two indices can be defined for the random variable T_b:

The probability that the time between failures is greater than or equal to t:

Pr{T_b ≥ t}. (1.12)

Mean time between failures (MTBF):

MTBF = E{T_b}. (1.13)

The mean number of system failures in the interval [0, T]:

E{N_T}. (1.15)

Measures in expressions (1.14) and (1.15) are often important when logistics
problems related to MSS operations are considered (for example, determining the
required number of spare parts).
MSS instantaneous (point) availability A(t, w) is the probability that the MSS at
instant t > 0 is in an acceptable state:

A(t, w) = Pr{F(G(t), W(t)) ≥ 0}. (1.16)

The MSS average availability over the interval [0, T] is defined as

A_T = (1/T) ∫_0^T 1{F[G(t), W(t)] ≥ 0} dt, (1.17)

where

1{F[G(t), W(t)] ≥ 0} = 1, if F[G(t), W(t)] ≥ 0; 0, if F[G(t), W(t)] < 0.
The random variable A_T represents the portion of time when the MSS output
performance rate is in an acceptable area. For example, in Figure 1.10,
A_T = (T − T_1 − T_2)/T. This index characterizes the portion of time when the MSS
output performance rate is not less than the demand.
The expected value of AT is often used and is called demand availability (Aven
and Jensen 1999):
AD = E { AT } . (1.18)
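As an illustration of how (1.17) is evaluated for one concrete realization, the following sketch computes A_T for a piecewise-constant sample path of G(t) under a constant demand w. The segment durations and performance values are hypothetical, not taken from the book:

```python
# Average availability A_T (1.17) for one piecewise-constant realization of G(t).
# The segments below are illustrative values, not data from the book.
# Each segment is (duration, performance rate); the demand w is constant.

def average_availability(segments, w):
    """Fraction of total time with F(G(t), w) = G(t) - w >= 0."""
    total = sum(d for d, _ in segments)
    acceptable = sum(d for d, g in segments if g - w >= 0)
    return acceptable / total

segments = [(2.0, 100.0), (1.0, 40.0), (5.0, 80.0), (2.0, 0.0)]  # (hours, output)
w = 50.0
A_T = average_availability(segments, w)
print(A_T)  # 7 acceptable hours out of 10 -> 0.7
```

Averaging this quantity over many simulated realizations would estimate the demand availability A_D of (1.18).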
For large t (t → ∞), the system initial state has practically no influence on its
availability. Therefore, the steady-state (stationary or long-term) MSS availability
A_∞(w) for the constant demand level W(t) = w can be determined on the basis
of the system steady-state performance distribution:

A_∞(w) = Σ_{k=1}^{K} p_k 1(F(g_k, w) ≥ 0), (1.19)

where

1(F(g_k, w) ≥ 0) = 1, if F(g_k, w) ≥ 0; 0, if F(g_k, w) < 0,

and p_k = lim_{t→∞} p_k(t) is the steady-state probability of the MSS state k with the
corresponding output performance rate g_k. In particular, when F(G(t), w) = G(t) − w,

A_∞(w) = Σ_{k=1}^{K} p_k 1(g_k ≥ w) = Σ_{g_k ≥ w} p_k. (1.20)
For the variable demand represented by the pair of vectors (w, q), the
steady-state MSS availability is

A_∞(w, q) = Σ_{m=1}^{M} A_∞(w_m) q_m = Σ_{m=1}^{M} q_m Σ_{k=1}^{K} p_k 1(F(g_k, w_m) ≥ 0), (1.21)

where

q_m = T_m / Σ_{m=1}^{M} T_m, m = 1, 2, …, M. (1.22)
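These steady-state formulas translate directly into code. The sketch below (with illustrative numbers and the assumption F(g, w) = g − w) evaluates A_∞(w) by (1.20) and A_∞(w, q) by (1.21):

```python
# Steady-state MSS availability for constant demand (1.20) and variable demand (1.21),
# assuming the acceptability function F(g, w) = g - w. Numbers are illustrative.

def steady_state_availability(g, p, w):
    """A_inf(w): sum of steady-state probabilities over states with g_k >= w."""
    return sum(pk for gk, pk in zip(g, p) if gk >= w)

def steady_state_availability_var(g, p, w_levels, q):
    """A_inf(w, q) = sum_m q_m * A_inf(w_m)."""
    return sum(qm * steady_state_availability(g, p, wm)
               for wm, qm in zip(w_levels, q))

g = [0.0, 0.4, 0.8, 1.0]      # state performance rates
p = [0.05, 0.25, 0.3, 0.4]    # steady-state probabilities
print(steady_state_availability(g, p, 0.5))            # p_3 + p_4
print(steady_state_availability_var(g, p, [0.5, 0.3], [0.6, 0.4]))
```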
The MSS mean instantaneous performance at instant t is the expected value of the
instantaneous performance:

G_mean(t) = E{G(t)}. (1.22)

The mean steady-state MSS performance is

G_∞ = Σ_{k=1}^{K} p_k g_k. (1.23)

The average MSS expected output performance for a fixed time interval [0, T]
is defined as

G_T = (1/T) ∫_0^T G_mean(t) dt. (1.24)

Observe that the mean MSS performance does not depend on demand.
In some cases a conditional expected performance is used. This index represents
the mean performance of the MSS on the condition that it is in an acceptable
state. In the steady state it takes the form

G* = [Σ_{k=1}^{K} g_k p_k 1(F(g_k, W) ≥ 0)] / [Σ_{k=1}^{K} p_k 1(F(g_k, W) ≥ 0)]. (1.25)
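A short sketch of (1.23) and (1.25), again assuming F(g, W) = g − W and using illustrative numbers:

```python
# Steady-state mean performance (1.23) and conditional expected performance (1.25),
# assuming acceptability F(g, w) = g - w. Values are illustrative.

def mean_performance(g, p):
    return sum(gk * pk for gk, pk in zip(g, p))

def conditional_mean_performance(g, p, w):
    num = sum(gk * pk for gk, pk in zip(g, p) if gk - w >= 0)
    den = sum(pk for gk, pk in zip(g, p) if gk - w >= 0)
    return num / den

g = [0.0, 0.6, 1.0]
p = [0.1, 0.6, 0.3]
print(mean_performance(g, p))                   # unconditional mean
print(conditional_mean_performance(g, p, 0.5))  # mean given an acceptable state
```

Note that G* is always at least as large as G_∞, since the failed (low-performance) states are excluded from the conditional average.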
The MSS performance deficiency D(t, w) is characterized by the following indices:

the probability that the performance deficiency at instant t does not exceed some
specified level d:

Pr{D(t, w) ≤ d}; (1.27)

the mean performance deficiency at instant t:

D_m(t, w) = E{D(t, w)}. (1.28)

In the steady state, for the constant demand w, the expected performance deficiency is

D_∞ = Σ_{k=1}^{K} p_k max(w − g_k, 0), (1.29)

and for the variable demand represented by the pair of vectors (w, q)

D_∞(w, q) = Σ_{m=1}^{M} Σ_{k=1}^{K} p_k q_m max(w_m − g_k, 0). (1.30)

The average MSS expected performance deficiency for a fixed time interval
[0, T] is defined as follows:

D_T = (1/T) ∫_0^T D(t, w) dt. (1.31)

The accumulated performance deficiency for a fixed time interval [0, T] is

D̃_T = ∫_0^T D(t, w) dt. (1.32)

If the demand exceeds the system performance over the entire interval [0, T], then

D̃_T = ∫_0^T (W(t) − G(t)) dt = ∫_0^T W(t) dt − ∫_0^T G(t) dt. (1.33)

This measure can be interpreted as the expected amount of the product not supplied
to consumers during the interval [0, T]:

D̃_m = E{D̃_T}. (1.35)
four relative capacity levels that characterize the performance of the second
generator: g_21 = 0.0, g_22 = 40/100 = 0.4, g_23 = 80/100 = 0.8, g_24 = 100/100 = 1.0.
Assume that the corresponding steady-state probabilities are as follows:
p_11 = 0.1, p_12 = 0.6, p_13 = 0.3 for the first generator and p_21 = 0.05, p_22 = 0.25,
p_23 = 0.3, p_24 = 0.4 for the second generator.
The required capacity level is 50 MW, which corresponds to w = 50/100 = 0.5.
The MSS stationary availability is

A_1∞(w) = A_1∞(0.5) = Σ_{g_1k ≥ 0.5} p_1k = 0.6 + 0.3 = 0.9,
A_2∞(w) = A_2∞(0.5) = Σ_{g_2k ≥ 0.5} p_2k = 0.3 + 0.4 = 0.7.

The mean steady-state performance (1.23) is

G_1∞ = Σ_{k=1}^{3} p_1k g_1k = 0.1·0 + 0.6·0.6 + 0.3·1.0 = 0.66,

which means 66% of the nominal generating capacity for the first generator, and

G_2∞ = Σ_{k=1}^{4} p_2k g_2k = 0.05·0 + 0.25·0.4 + 0.3·0.8 + 0.4·1.0 = 0.74,

which means 74% of the nominal generating capacity for the second generator.
The steady-state performance deficiency (1.30) is

D_1∞(0.5) = Σ_{g_1k − w < 0} p_1k (w − g_1k) = 0.1·(0.5 − 0.0) = 0.05,
D_2∞(0.5) = Σ_{g_2k − w < 0} p_2k (w − g_2k) = 0.05·(0.5 − 0.0) + 0.25·(0.5 − 0.4) = 0.05.
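The numbers in this example are easy to check programmatically. The following sketch recomputes the stationary availability, mean steady-state performance, and performance deficiency of both generators:

```python
# Recomputing the two-generator example: stationary availability (1.20),
# mean steady-state performance (1.23), and performance deficiency (1.29).

def availability(g, p, w):
    return sum(pk for gk, pk in zip(g, p) if gk >= w)

def mean_performance(g, p):
    return sum(gk * pk for gk, pk in zip(g, p))

def deficiency(g, p, w):
    return sum(pk * max(w - gk, 0.0) for gk, pk in zip(g, p))

w = 0.5                                     # 50 MW of 100 MW nominal capacity
g1, p1 = [0.0, 0.6, 1.0], [0.1, 0.6, 0.3]
g2, p2 = [0.0, 0.4, 0.8, 1.0], [0.05, 0.25, 0.3, 0.4]

print(round(availability(g1, p1, w), 10), round(availability(g2, p2, w), 10))      # 0.9 0.7
print(round(mean_performance(g1, p1), 10), round(mean_performance(g2, p2), 10))    # 0.66 0.74
print(round(deficiency(g1, p1, w), 10), round(deficiency(g2, p2, w), 10))          # 0.05 0.05
```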
Additional useful information about MSS can be found in the book by Lisnianski and
Levitin (2003), which is completely devoted to MSS reliability, as well as in Aven
and Jensen (1999), Levitin (2005), and Xie et al. (2004), where special chapters
are devoted to MSS reliability.
References

Aven T (1993) On performance measures for multistate monotone systems. Reliab Eng Syst Saf 41:259–266
Aven T, Jensen U (1999) Stochastic models in reliability. Springer, New York
Barlow RE, Wu AS (1978) Coherent systems with multi-state components. Math Operat Res 3:275–281
Baxter V (2002) Advances in supermarket refrigeration systems. In: Proceedings of the 7th International Energy Agency heat pump conference, Beijing, China
Billinton R, Allan R (1996) Reliability evaluation of power systems. Plenum, New York
Block H, Savits T (1982) A decomposition of multistate monotone system. J Appl Prob 19:391–402
Doulliez P, Jamoulle E (1972) Transportation networks with random arc capacities. RAIRO 3:45–60
El-Neweihi E, Proschan F (1984) Degradable systems: a survey of multistate system theory. Commun Stat Theory Methods 13:405–432
Giard N, Lichtenstein P, Yashin A (2002) A multi-state model for genetic analysis of the aging process. Stat Med 21:2511–2526
Hudson JC, Kapur KC (1982) Reliability theory for multistate systems with multistate components. Microelectron Reliab 22:1–7
IEA Annex 26: Advanced supermarket refrigeration/heat recovery systems, Final Report, Volume 1. Oak Ridge National Laboratory, Oak Ridge, TN, 2003
Kolowrocki K (2004) Reliability of large systems. Elsevier, Amsterdam
Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Malinowski J, Preuss W (1995) Reliability of circular consecutively connected systems with multistate components. IEEE Trans Reliab 44:532–534
Marshall G, Jones R (2007) Multi-state models in diabetic retinopathy. Stat Med 14(18):1975–1983
Murchland J (1975) Fundamental concepts and relations for reliability analysis of multistate systems. In: Barlow RE, Fussell JB, Singpurwalla N (eds) Reliability and fault tree analysis: theoretical and applied aspects of system reliability. SIAM, Philadelphia, pp 581–618
Natvig B (1982) Two suggestions of how to define a multistate coherent system. Adv Appl Probab 14:434–455
Natvig B (1985) Multi-state coherent systems. In: Jonson N, Kotz S (eds) Encyclopedia of statistical sciences, vol 5. Wiley, New York, pp 732–735
Natvig B, Morch H (2003) An application of multistate reliability theory to an offshore gas pipeline network. Int J Reliab Qual Saf Eng 10(4):361–381
Natvig B (2007) Multi-state reliability theory. In: Ruggeri F, Kenett R, Faltin FW (eds) Encyclopedia of statistics in quality and reliability. Wiley, New York, pp 1160–1164
Nordmann L, Pham H (1999) Weighted voting systems. IEEE Trans Reliab 48:42–49
Putter H, Fiocco M, Geskus B (2007) Tutorial in biostatistics: competing risk and multi-state models. Stat Med 26(11):2389–2430
Ross SM (1979) Multivalued state component systems. Ann Prob 7:379–383
Ushakov I (ed) (1994) Handbook of reliability engineering. Wiley, New York
Van den Hout A, Matthews F (2008) Multi-state analysis of cognitive ability data. Stat Med, published online, Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/3360
Xie M, Dai YS, Poh KL (2004) Computing system reliability: models and analysis. Kluwer/Plenum, New York
Zuo M, Liang M (1994) Reliability of multistate consecutively connected systems. Reliab Eng Syst Saf 44:173–176
2 Modern Stochastic Process Methods for
Multi-state System Reliability Assessment
A stochastic or random process is, essentially, a set of random variables where the
variables are ordered in a given sequence. For example, the daily maximum tem-
peratures at a weather station form a sequence of random variables, and this or-
dered sequence can be considered as a stochastic process. Another example is the
sequence formed by the continuously changing number of people waiting in line at
the ticket window of a railway station.
More formally, the sequence of random variables in a process can be denoted
by X ( t ) , where t is the index of the process.
The values assumed by the random variable X(t) are called states, and the set
of all possible values forms the state space of the process. So, a stochastic process
is a sequence of random variables {X(t), t ∈ T}, defined on a given probability
space, indexed by the parameter t, where t varies over an index set T. In this book,
we mainly deal with stochastic processes where t represents time.
A random variable X can be considered as a rule for assigning to every outcome
ω of an experiment the value X(ω). A stochastic process is a rule for assigning
to every ω the function X(t, ω). Thus, a stochastic process is a family of
time functions depending on the parameter ω or, equivalently, a function of t
and ω. The domain of ω is the set of all the possible experimental outcomes and
the domain of t is a set of non-negative real numbers.
For example, the instantaneous speed of a car during its trip from point A to
point B is a stochastic process. The speed on each trip can be considered as an
experimental outcome ω, and each trip has its own speed X(t, ω) that characterizes
the instantaneous speed of this trip as a function of time. This function will
differ from the corresponding functions of other trips because of the influence of
many random factors (such as wind, road conditions, etc.). In Figure 2.1 one can
see three different speed functions for three trips that can be treated as three
different realizations of the stochastic process. It should be noticed that the cut of
this stochastic process at time instant t_1 represents a random variable with mean
V_m. In real-world systems many parameters, such as temperature, voltage,
frequency, etc., may be considered stochastic processes.
The time may be discrete or continuous. A discrete time may have a finite or
infinite number of values; continuous time obviously has only an infinite number
of values. The values taken by the random variables constitute the state space.
This state space, in its turn, may be discrete or continuous. Therefore, stochastic
processes may be classified into four categories according to whether their state
spaces and time are continuous or discrete. If the state space of a stochastic proc-
ess is discrete, then it is called a discrete-state process, often referred to as a chain.
For a process with independent values, the n-dimensional cumulative distribution
function is the product of the one-dimensional ones:

F(x_1, x_2, …, x_n; t_1, t_2, …, t_n) = Π_{i=1}^{n} F(x_i; t_i) = Π_{i=1}^{n} Pr{X(t_i) ≤ x_i}. (2.4)

For a Poisson process with parameter λ, the number of points N(t_1, t_2) falling in
the interval (t_1, t_2) of length t = t_2 − t_1 has the Poisson distribution

Pr{N(t_1, t_2) = k} = e^{−λt} (λt)^k / k!. (2.7)
If the intervals ( t1 , t2 ) and ( t3 , t4 ) are not overlapping, then the random vari-
ables N ( t1 , t2 ) and N ( t3 , t4 ) are independent. Using the points ti one can form
the stochastic process X ( t ) = N ( 0, t ) .
The Poisson process plays a special role in reliability analysis, comparable to the
role of the normal distribution in probability theory. Many real physical situations
can be successfully described with the help of Poisson processes.
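A quick numerical check of the Poisson property: the sketch below simulates many realizations of a Poisson process from exponential inter-arrival times and compares the empirical distribution of N(t1, t2) with formula (2.7). The rate and interval values are arbitrary:

```python
import math
import random

# Empirical check of Pr{N(t1, t2) = k} = exp(-lam*t) * (lam*t)**k / k!, t = t2 - t1.
# Event times are generated from i.i.d. exponential gaps; parameters are arbitrary.

def count_events(lam, t1, t2):
    """Number of Poisson(lam) events falling in (t1, t2]."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > t2:
            return n
        if t > t1:
            n += 1

def poisson_pmf(lam, t, k):
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

random.seed(1)
lam, t1, t2, trials = 2.0, 1.0, 2.5, 20000
counts = [count_events(lam, t1, t2) for _ in range(trials)]
for k in range(4):
    emp = counts.count(k) / trials
    print(k, round(emp, 3), round(poisson_pmf(lam, t2 - t1, k), 3))
```

Only the interval length t2 − t1 matters, which mirrors the independence of counts over non-overlapping intervals stated above.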
A well-known type of point process is the so-called renewal process. This
process can be described as a sequence of events, the intervals between which are
independent and identically distributed random variables. In reliability theory, this
kind of mathematical model is used to describe the flow of failures in time.
To every point process t_i one can associate a sequence of random variables y_n
such that y_1 = t_1, y_2 = t_2 − t_1, …, y_n = t_n − t_{n−1}, where t_1 is the first random point to
the right of the origin. This sequence is called a renewal process. An example is
the life history of items that are replaced as soon as they fail. In this case, yi is the
total time the ith item is in operation and ti is the time of its failure.
One can see a correspondence among the following three processes:
a point process ti ;
a discrete-state stochastic process X ( t ) increasing (or decreasing) by 1 at
points ti ; and
Pr{X_n = x_n | X_0 = x_0, X_1 = x_1, …, X_{n−1} = x_{n−1}} = Pr{X_n = x_n | X_{n−1} = x_{n−1}}. (2.8)
As in the case of a general Markov process, Equation 2.8 implies that chain be-
havior in the future depends only on its present state and does not depend on its
behavior in the past.
We designate the probability that at step n the chain will be in state j as p_j(n).
Thus, we can write p_j(n) = Pr{X_n = j}.
We also define the probability p_ij(m, n) that the chain makes a transition to
state j at step n if at step m it was in state i. This probability is a conditional
probability, and we can write it as p_ij(m, n) = Pr{X_n = j | X_m = i}.
The probability mass function of the random variable X(0) is called the initial
probability row-vector

p(0) = [p_0(0), p_1(0), …, p_M(0)]. (2.14)

The problem being considered here consists in obtaining an expression for evaluating
the n-step transition probability p_ij(n) from the one-step transition probabilities
p_ij = p_ij(1). Recall that, according to expression (2.11), for a homogeneous Markov
chain the transition probabilities depend only on the number of steps: p_ij(m, m + n) = p_ij(n).
Let us consider the transition probability p_ij(m + n) that the process goes to
state j at the (m + n)th step, given that at step 0 it is in state i. In order to reach state
j at the (m + n)th step, the process first reaches some intermediate state k at step m
with probability p_ik(m) and then moves from k to j during the remaining n steps with
probability p_kj(n). The Markov property implies that these two events are
independent. Then, using the theorem of total probability, we obtain
p_ij(m + n) = Σ_k p_ik(m) p_kj(n). In matrix notation this yields

P(n) = P·P(n − 1) = P^n, (2.16)

and, for the state probability row-vector,

p(n) = p(0)·P^n, (2.18)

where p(0) and p(n) are the row-vectors of the state probabilities initially (at step
n = 0) and after the nth step, respectively.
This implies that the unconditional state probabilities of a homogeneous Markov
chain are completely determined by the one-step transition probability matrix P
and the initial probability vector p(0).
To illustrate the presented approach, we consider the following example.
Example 2.1 (Bhat and Miller 2002). Assume a two-state Markov chain with the
states denoted by 0 and 1 (Figure 2.2). The one-step transition probability matrix
is determined by the probabilities p_01 and p_10,
since it must hold that p_00 + p_01 = 1 and p_10 + p_11 = 1. Assume that p_01 = α and
p_10 = β. Then

P = [[1 − α, α], [β, 1 − β]].
The probability p_00(n) satisfies the recurrence p_00(n) = (1 − α) p_00(n − 1) + β p_01(n − 1),
where p_01(n − 1) = 1 − p_00(n − 1). Applying this recurrence successively, one obtains

p_00(1) = 1 − α,
p_00(2) = β + (1 − α − β)(1 − α),
p_00(3) = β + (1 − α − β)β + (1 − α − β)^2 (1 − α),
…
p_00(n) = β + (1 − α − β)β + (1 − α − β)^2 β + … + (1 − α − β)^{n−2} β + (1 − α − β)^{n−1} (1 − α)
        = β Σ_{k=0}^{n−2} (1 − α − β)^k + (1 − α − β)^{n−1} (1 − α).

Based on the formula for the sum of a finite geometric series, we can write

Σ_{k=0}^{n−2} (1 − α − β)^k = [1 − (1 − α − β)^{n−1}] / [1 − (1 − α − β)] = [1 − (1 − α − β)^{n−1}] / (α + β).

Therefore, the expression for p_00(n) can be rewritten in the following form:

p_00(n) = β/(α + β) + α(1 − α − β)^n/(α + β),

and

p_01(n) = 1 − p_00(n) = α/(α + β) − α(1 − α − β)^n/(α + β).
Expressions for the two remaining entries p_10(n) and p_11(n) can be found in a
similar way. (Readers can do it themselves as an exercise.)
Thus, the n-step transition probability matrix can be written as

P(n) = P^n = (1/(α + β)) [[β + α(1 − α − β)^n, α − α(1 − α − β)^n],
                          [β − β(1 − α − β)^n, α + β(1 − α − β)^n]].

Based on this n-step transition probability matrix and on the given initial state
probability row-vector p(0), one can find the state probabilities after the nth step by
using Equation 2.18:

p(n) = p(0) P^n = [a, 1 − a] P^n
     = [(β + (1 − α − β)^n [a(α + β) − β])/(α + β), (α − (1 − α − β)^n [a(α + β) − β])/(α + β)].

Therefore, the state probabilities after the nth step are as follows:

p_0(n) = β/(α + β) + ((1 − α − β)^n/(α + β)) [a(α + β) − β],
p_1(n) = α/(α + β) − ((1 − α − β)^n/(α + β)) [a(α + β) − β].
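The closed-form matrix P(n) above can be verified numerically by comparing it with a directly computed matrix power. The values of α, β, and n below are arbitrary:

```python
# Verify the closed-form n-step transition matrix of the two-state chain
# against repeated matrix multiplication. alpha, beta, n are arbitrary.

def closed_form(alpha, beta, n):
    s, x = alpha + beta, (1.0 - alpha - beta) ** n
    return [[(beta + alpha * x) / s, (alpha - alpha * x) / s],
            [(beta - beta * x) / s, (alpha + beta * x) / s]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(P, n):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

alpha, beta, n = 0.3, 0.1, 7
P = [[1 - alpha, alpha], [beta, 1 - beta]]
direct, formula = mat_pow(P, n), closed_form(alpha, beta, n)
print(all(abs(direct[i][j] - formula[i][j]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```

As n grows, (1 − α − β)^n vanishes and both rows of P(n) approach the stationary distribution [β/(α + β), α/(α + β)].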
The transition probabilities of the chain are defined as

Pr{X(t + Δt) = i | X(t) = j} = π_ji(t, t + Δt). (2.20)

For a homogeneous process, this probability depends only on the length Δt of the
time interval: π_ji(t, t + Δt) = π_ji(Δt), where

π_ji(0) = 1, if j = i; 0, otherwise. (2.21)

Taking into account (2.21), one can define for each j a non-negative continuous
function a_j(t):

a_j(t) = lim_{Δt→0} [π_jj(t, t) − π_jj(t, t + Δt)]/Δt = lim_{Δt→0} [1 − π_jj(t, t + Δt)]/Δt, (2.22)

and, for each pair i ≠ j, the transition intensities

a_ji(t) = lim_{Δt→0} [π_ji(t, t + Δt) − π_ji(t, t)]/Δt = lim_{Δt→0} π_ji(t, t + Δt)/Δt. (2.23)

Since the process must be in some state at any time,

π_jj(Δt) + Σ_{i≠j} π_ji(Δt) = 1, (2.24)

and therefore

a_jj = −a_j = −lim_{Δt→0} (1/Δt) Σ_{i≠j} π_ji(Δt) = −Σ_{i≠j} a_ji. (2.25)
p_i(t) = Pr{X(t) = i}, i = 1, …, K; t ≥ 0. (2.26)

Expression (2.26) defines the probability mass function (pmf) of X(t) at time t.
Since at any given time the process must be in one of K states,

Σ_{i=1}^{K} p_i(t) = 1 (2.27)

for any t ≥ 0.
By using the theorem of total probability, for given t > t_1, we can express the
pmf of X(t) in terms of the transition probabilities π_ij(t_1, t) and the pmf of X(t_1).
For t_1 < t_2 < t,

Pr{X(t) = j | X(t_1) = i} = Σ_{k∈S} Pr{X(t) = j | X(t_2) = k, X(t_1) = i} Pr{X(t_2) = k | X(t_1) = i}. (2.31)

For a small time increment Δt, the state probabilities satisfy

p_j(t + Δt) = p_j(t)[1 − Σ_{i≠j} a_ji Δt] + Σ_{i≠j} p_i(t) a_ij Δt, j = 1, …, K. (2.32)

This equation accounts for two possibilities:
1. At instant t the process is already in state j and during Δt it does not leave
this state. These events have probabilities p_j(t) and 1 − Σ_{i≠j} a_ji Δt, respectively.
2. At instant t the process may be in one of the states i ≠ j and during time Δt
transits from state i to state j. These events have probabilities p_i(t) and a_ij Δt,
respectively. These probabilities should be multiplied and summed over all
i ≠ j because the process can reach state j from any state i.
Now one can rewrite (2.32) by using (2.29) and obtain

p_j(t + Δt) − p_j(t) = Σ_{i=1, i≠j}^{K} p_i(t) a_ij Δt + p_j(t) a_jj Δt, (2.33)

or

p_j(t + Δt) − p_j(t) = Σ_{i=1, i≠j}^{K} p_i(t) a_ij Δt − p_j(t) Σ_{i=1, i≠j}^{K} a_ji Δt. (2.34)

Dividing both sides by Δt and letting Δt → 0 yields

dp_j(t)/dt = Σ_{i=1, i≠j}^{K} p_i(t) a_ij − p_j(t) Σ_{i=1, i≠j}^{K} a_ji, j = 1, 2, …, K. (2.35)
The system of differential equations (2.35) is used for finding the state
probabilities p_j(t), j = 1, …, K, of the homogeneous Markov process when the initial
conditions are given as

p_j(0) = α_j, j = 1, …, K. (2.36)

In matrix notation, the system (2.35) can be written as

dp(t)/dt = p(t) a, (2.38)

where p(t) = [p_1(t), …, p_K(t)] is the row-vector of state probabilities and a is the
K×K transition intensity matrix. Note that the sum of the matrix elements in each
row equals 0: Σ_{j=1}^{K} a_ij = 0 for each i (1 ≤ i ≤ K).
When the system state transitions are caused by failures and repairs of its
elements, the corresponding transition intensities are expressed by the elements'
failure and repair rates.
An element's failure rate λ(t) is the instantaneous conditional density of the
probability of failure of an initially operational element at time t, given that the
element has not failed up to time t. Briefly, one can say that λ(t) is the time-to-failure
conditional probability density function (pdf). It expresses a hazard of failure
at time instant t under the condition that there was no failure up to time t. The
failure rate of an element at time t is defined as

λ(t) = lim_{Δt→0} (1/Δt) [F(t + Δt) − F(t)]/R(t) = f(t)/R(t), (2.39)

where F(t) is the cumulative distribution function of the time to failure, f(t) is the
corresponding pdf, and R(t) = 1 − F(t) is the reliability function. For a homogeneous
Markov process, the failure rate does not depend on t and can be expressed as

λ = MTTF^{−1}, (2.40)

where MTTF is the mean time to failure. Similarly, the repair rate μ(t) is the
time-to-repair conditional pdf. For homogeneous Markov processes a repair rate
does not depend on t and can be expressed as

μ = MTTR^{−1}, (2.41)

where MTTR is the mean time to repair.
If the steady-state probabilities (the limits of the state probabilities as t → ∞)
exist and are independent of the initial state j ∈ S, the process is called ergodic.
For the steady-state probabilities, the computations become simpler. The set of
differential equations (2.35) is reduced to a set of K algebraic linear equations
because for the constant probabilities all time-derivatives are equal to zero, so
dp_i(t)/dt = 0, i = 1, …, K.
Let the steady-state probabilities p_i = lim_{t→∞} p_i(t) exist. In this case all
derivatives of the state probabilities on the left-hand side of (2.35) will be zeroes.
So, in order to find the long-run probabilities, the following system of algebraic
linear equations should be solved:

0 = Σ_{i=1, i≠j}^{K} p_i a_ij − p_j Σ_{i=1, i≠j}^{K} a_ji, j = 1, 2, …, K. (2.43)
The K equations in (2.43) are not linearly independent (the determinant of the
system is zero). An additional independent equation is provided by the simple
fact that the sum of the state probabilities is equal to 1 at any time:

Σ_{i=1}^{K} p_i = 1. (2.44)
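The sketch below solves (2.43) together with the normalization (2.44) for a small illustrative transition-intensity matrix: one balance equation is replaced by the normalization condition, and a plain Gaussian elimination is used to avoid external dependencies. The 3-state birth-death matrix and its rates are assumptions for the demonstration:

```python
# Steady-state probabilities of a CTMC: solve (2.43) with normalization (2.44).
# One balance equation is replaced by sum(p) = 1.
# The 3-state intensity matrix below is illustrative (each row sums to zero).

def steady_state(a):
    k = len(a)
    # Equations: sum_i p_i * a[i][j] = 0 for j = 0..k-2, plus sum_i p_i = 1.
    rows = [[a[i][j] for i in range(k)] + [0.0] for j in range(k - 1)]
    rows.append([1.0] * k + [1.0])
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(rows[r][c]))
        rows[c], rows[piv] = rows[piv], rows[c]
        for r in range(k):
            if r != c and rows[r][c] != 0.0:
                f = rows[r][c] / rows[c][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [rows[i][k] / rows[i][i] for i in range(k)]

lam, mu = 1.0, 5.0            # illustrative failure and repair rates
a = [[-mu, mu, 0.0],
     [lam, -(lam + mu), mu],
     [0.0, lam, -lam]]        # state order: worst -> best
p = steady_state(a)
print([round(x, 4) for x in p])  # [0.0323, 0.1613, 0.8065]
```

For this birth-death structure the exact answer is p = [1, 5, 25]/31, which the printed values reproduce.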
From the definition of the state frequency it follows that, in the long run, f_i
equals the reciprocal of the mean cycle time T̄_ci:

f_i = 1/T̄_ci. (2.46)

Multiplying both sides of (2.46) by the mean duration T̄_i of one stay in state i gives

T̄_i f_i = T̄_i/T̄_ci = p_i. (2.47)

Therefore,

f_i = p_i/T̄_i. (2.48)
This is a fundamental equation, which provides the relation between the three
state parameters in the steady state.
The unconditional random time T_i of staying in state i is the minimum of all the
random times T_ij that characterize the conditional time of staying in state i when
the transition is performed from state i to a given state j ≠ i:

T_i = min{T_ij, j ≠ i}. (2.49)

All conditional times T_ij are distributed exponentially with the cumulative
distribution functions F_ij(t) = Pr{T_ij ≤ t} = 1 − e^{−a_ij t}. All transitions from state i
are independent and, therefore, the cumulative distribution function of the
unconditional time T_i of staying in state i can be computed as follows:

F_i(t) = Pr{T_i ≤ t} = 1 − Π_{j≠i} [1 − F_ij(t)] = 1 − Π_{j≠i} e^{−a_ij t} = 1 − e^{−(Σ_{j≠i} a_ij) t}. (2.50)

Hence, T_i is exponentially distributed and its mean value is

T̄_i = 1/Σ_{j≠i} a_ij. (2.51)

Substituting (2.51) into (2.48), one obtains

f_i = p_i Σ_{j≠i} a_ij. (2.52)
According to the generic MSS model (Chapter 1), any system element j can have
k_j different states corresponding to the performance rates, represented by the set
g_j = {g_j1, …, g_jk_j}. The current state of the element j and, therefore, the current
value of the element performance rate G_j(t) at any instant t are random variables.
G_j(t) takes values from g_j: G_j(t) ∈ g_j. Therefore, for the time interval [0, T],
where T is the MSS operation period, the performance rate of element j is defined
as a stochastic process. Note that we consider only the Markov process, where the
state probabilities at a future instant do not depend on the states occupied in the
past.
In this subsection, when we deal with a single multi-state element, we omit
the index j in the designation of the set of the element's performance rates. Thus, this
set is denoted as g = {g_1, …, g_k}. We also assume that this set is ordered so that
g_{i+1} ≥ g_i for any i.
The elements can be divided into two groups. Those elements that are observed
only until they fail belong to the first group. These elements either cannot be
repaired, or their repair is uneconomical, or only the life history up to the first failure
is of interest. Those elements that are repaired upon failure and whose life
histories consist of operating and repair periods belong to the second group. In the
following subsections, both groups are discussed.
As mentioned above, the lifetime of a non-repairable element lasts until its first
entrance into the subset of unacceptable states. In general, the acceptability of an
element's state depends on the relation between the element's performance and
the desired level of this performance (demand). The demand W(t) is also a random
process that takes discrete values from the set w = {w_1, …, w_M}. The desired
relation between the system performance and the demand can be expressed by the
acceptability function F(G(t), W(t)).
First consider a multi-state element with only minor failures, defined as failures
that cause an element transition from state i to the adjacent state i − 1. In other
words, a minor failure causes minimal degradation of the element performance. The
state-space diagram for such an element is presented in Figure 2.3.
The element evolution in the state space is pure performance degradation,
characterized by the stochastic process {G(t), t ≥ 0}. The transition intensity
for any transition from state i to state i − 1 is λ_{i,i−1}, i = 2, …, k.
Fig. 2.3 State-transition diagram for non-repairable element with minor failures
When the sojourn time in any state i (or, in other words, the time up to a minor
failure in state i) is exponentially distributed with parameter λ_{i,i−1}, the process is a
continuous-time Markov chain. Moreover, it is the widely known pure death
process (Trivedi 2002). Let us define the auxiliary discrete-state continuous-time
stochastic process {X(t), t ≥ 0}, where X(t) ∈ {1, …, k}. This process is strictly
associated with the stochastic process {G(t), t ≥ 0}. When X(t) = i, the
corresponding performance rate of a multi-state element is g_i: G(t) = g_i. The process
X(t) is a discrete-state stochastic process decreasing by 1 at the points t_i, i = 1, …, k,
when the corresponding transitions occur. The state probabilities of X(t) are

p_i(t) = Pr{X(t) = i}, i = 1, …, k. (2.53)

Note that

Σ_{i=1}^{k} p_i(t) = 1 (2.54)

for any t ≥ 0, since at any given time the process must be in some state.
According to the system (2.35), the following differential equations can be
written in order to find state probabilities for the Markov process presented in
Figure 2.3:
dp_k(t)/dt = −λ_{k,k−1} p_k(t),
dp_i(t)/dt = λ_{i+1,i} p_{i+1}(t) − λ_{i,i−1} p_i(t), i = 2, 3, …, k − 1, (2.55)
dp_1(t)/dt = λ_{2,1} p_2(t).
One can see that in state k there is only one transition, from this state to the state
k − 1, with the intensity λ_{k,k−1}, and there are no transitions into state k. In each state
i, i = 2, 3, …, k − 1, there is one transition into this state from the previous state i + 1,
with the intensity λ_{i+1,i}, and one transition from this state to state i − 1, with
the intensity λ_{i,i−1}. Observe that there are no transitions from state 1. This means
that if the process enters this state, it never leaves it. State 1 for non-repairable
multi-state elements is the absorbing state.
We assume that the process begins from the best state k with a maximal
element performance rate of g_k. Hence, the initial conditions are

p_k(0) = 1, p_{k−1}(0) = … = p_1(0) = 0. (2.56)
Using widely available software tools, one can obtain the numerical solution of
the system of differential equations (2.55) under the initial conditions (2.56), even
for large k. The system (2.55) can also be solved analytically by using the
Laplace–Stieltjes transform (Gnedenko and Ushakov 1995). Using this transform and
taking into account the initial conditions (2.56), one can represent (2.55) in the form
of linear algebraic equations:

s p̃_k(s) − 1 = −λ_{k,k−1} p̃_k(s),
s p̃_i(s) = λ_{i+1,i} p̃_{i+1}(s) − λ_{i,i−1} p̃_i(s), i = 2, 3, …, k − 1, (2.57)
s p̃_1(s) = λ_{2,1} p̃_2(s),

where p̃_k(s) = L{p_k(t)} = ∫_0^∞ e^{−st} p_k(t) dt is the Laplace–Stieltjes transform of a
function p_k(t) and L{dp_k(t)/dt} = s p̃_k(s) − p_k(0) is the Laplace–Stieltjes transform of
the derivative of the function p_k(t).
The system (2.57) may be rewritten in the following form:

p̃_k(s) = 1/(s + λ_{k,k−1}),
p̃_i(s) = [λ_{i+1,i}/(s + λ_{i,i−1})] p̃_{i+1}(s), i = 2, 3, …, k − 1, (2.58)
p̃_1(s) = (λ_{2,1}/s) p̃_2(s).
Starting to solve this system from the first equation and sequentially substituting
the obtained results into the next equation, one obtains

p̃_k(s) = 1/(s + λ_{k,k−1}),
p̃_i(s) = λ_{i+1,i} λ_{i+2,i+1} ⋯ λ_{k,k−1} / [(s + λ_{i,i−1})(s + λ_{i+1,i}) ⋯ (s + λ_{k−1,k−2})(s + λ_{k,k−1})], i = 2, 3, …, k − 1, (2.59)
p̃_1(s) = (λ_{2,1}/s) · λ_{3,2} λ_{4,3} ⋯ λ_{k,k−1} / [(s + λ_{2,1})(s + λ_{3,2}) ⋯ (s + λ_{k−1,k−2})(s + λ_{k,k−1})].
After inverting the transforms, one obtains the state probabilities p_i(t), i = 1, …, k.
For the constant demand level w, g_1 < w ≤ g_2, only state 1 is unacceptable and the
reliability function of the element is

R_1(t) = 1 − p_1(t). (2.60)

In the general case, for the constant demand level w, g_i < w ≤ g_{i+1}, the states
1, …, i are unacceptable and

R_i(t) = 1 − Σ_{j=1}^{i} p_j(t). (2.61)
The mean time up to multi-state element failure for this constant demand level
can be interpreted as the mean time up to the process entering state i. It can be
calculated as the sum of the mean times during which the process remains in each
state j > i. Since the process begins from the best state k with the maximal
element performance rate g_k [the initial conditions (2.56)], we have

MTTF_i = Σ_{j=i+1}^{k} 1/λ_{j,j−1}, i = 1, 2, …, k − 1. (2.62)

According to (1.23), one can obtain the element mean instantaneous performance
at time t as

E_t = Σ_{i=1}^{k} g_i p_i(t). (2.63)

The element mean instantaneous performance deficiency for the constant
demand w, according to (1.29), is

D_t = Σ_{i=1}^{k} p_i(t) max(w − g_i, 0). (2.64)
For k = 4, the system (2.55) takes the form

dp_4(t)/dt = −λ_{4,3} p_4(t),
dp_3(t)/dt = λ_{4,3} p_4(t) − λ_{3,2} p_3(t),
dp_2(t)/dt = λ_{3,2} p_3(t) − λ_{2,1} p_2(t),
dp_1(t)/dt = λ_{2,1} p_2(t),

and the Laplace–Stieltjes transforms of the state probabilities are

p̃_4(s) = 1/(s + λ_{4,3}),
p̃_3(s) = λ_{4,3}/[(s + λ_{3,2})(s + λ_{4,3})],
p̃_2(s) = λ_{3,2} λ_{4,3}/[(s + λ_{2,1})(s + λ_{3,2})(s + λ_{4,3})],
p̃_1(s) = λ_{2,1} λ_{3,2} λ_{4,3}/[s(s + λ_{2,1})(s + λ_{3,2})(s + λ_{4,3})].

Inverting the transforms, one obtains

p_4(t) = e^{−λ_{4,3} t},
p_3(t) = [λ_{4,3}/(λ_{4,3} − λ_{3,2})] (e^{−λ_{3,2} t} − e^{−λ_{4,3} t}),
p_2(t) = λ_{3,2} λ_{4,3} [(λ_{4,3} − λ_{3,2}) e^{−λ_{2,1} t} + (λ_{2,1} − λ_{4,3}) e^{−λ_{3,2} t} + (λ_{3,2} − λ_{2,1}) e^{−λ_{4,3} t}] / [(λ_{3,2} − λ_{2,1})(λ_{4,3} − λ_{3,2})(λ_{4,3} − λ_{2,1})],
p_1(t) = 1 − p_2(t) − p_3(t) − p_4(t).

The reliability functions for different constant demand levels are

R_1(t) = 1 − p_1(t) for g_1 < w ≤ g_2,
R_2(t) = 1 − p_1(t) − p_2(t) for g_2 < w ≤ g_3,
R_3(t) = 1 − p_1(t) − p_2(t) − p_3(t) = p_4(t) for g_3 < w ≤ g_4.
Fig. 2.4 State probabilities and reliability measures for non-repairable element with minor failures
According to (2.63), the element mean instantaneous performance is

E_t = Σ_{i=1}^{4} g_i p_i(t) = 10 p_4(t) + 8 p_3(t) + 5 p_2(t) + 0·p_1(t).

The demand is constant during the flight and w = 6 kW. Therefore, according
to (2.64), the element mean instantaneous performance deficiency is

D_t = Σ_{i=1}^{4} p_i(t) max(w − g_i, 0) = 1·p_2(t) + 6·p_1(t).

The expected energy not supplied during the service time T_service is

EENS = ∫_0^{T_service} D_t dt ≈ 0.547 kWh.
Fig. 2.5 Mean instantaneous performance and mean instantaneous performance deficiency for non-repairable element with minor failures
According to (2.62), the mean times to failure for the different demand levels are

MTTF_1 = 1/λ_{4,3} + 1/λ_{3,2} + 1/λ_{2,1} = 2.93 years for g_1 < w ≤ g_2,
MTTF_2 = 1/λ_{4,3} + 1/λ_{3,2} = 1.5 years for g_2 < w ≤ g_3,
MTTF_3 = 1/λ_{4,3} = 0.5 year for g_3 < w ≤ g_4.

For the constant demand w = 6 kW, the mean time to failure is equal
to MTTF_2 = 1.5 years. The probability that this failure (decreasing the generating
capacity below the demand level of 6 kW) will not occur during the service
time can be read from the graph of R_2(t) in Figure 2.4.
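The analytic expressions of this example can be cross-checked numerically. The rates used below (λ_{4,3} = 2, λ_{3,2} = 1, λ_{2,1} = 0.7 per year) are inferred from the MTTF values quoted above and should be treated as assumptions:

```python
import math

# Pure death process with k = 4: state probabilities p_i(t) and MTTF_i (2.62).
# The rates are inferred from the MTTF values in the text (an assumption):
l43, l32, l21 = 2.0, 1.0, 0.7   # transitions 4->3, 3->2, 2->1, per year

def probs(t):
    p4 = math.exp(-l43 * t)
    p3 = l43 / (l43 - l32) * (math.exp(-l32 * t) - math.exp(-l43 * t))
    p2 = (l32 * l43 * ((l43 - l32) * math.exp(-l21 * t)
                       + (l21 - l43) * math.exp(-l32 * t)
                       + (l32 - l21) * math.exp(-l43 * t))
          / ((l32 - l21) * (l43 - l32) * (l43 - l21)))
    return 1.0 - p2 - p3 - p4, p2, p3, p4   # (p1, p2, p3, p4)

mttf1 = 1 / l43 + 1 / l32 + 1 / l21
mttf2 = 1 / l43 + 1 / l32
mttf3 = 1 / l43
print(round(mttf1, 2), mttf2, mttf3)   # 2.93 1.5 0.5
p1, p2, p3, p4 = probs(2.0)
print(round(p1 + p2 + p3 + p4, 12))    # 1.0
```

A direct numerical integration of the four differential equations gives the same p_i(t), which confirms the inverted-transform expressions.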
Now consider a non-repairable multi-state element that can have both minor
and major failures (a major failure is a failure that causes the element transition
from state i to state j, j < i − 1). The state-space diagram for such an element,
representing transitions corresponding to both minor and major failures, is presented
in Figure 2.6.
Fig. 2.6 State-transition diagram for non-repairable element with minor and major failures
For such an element, the following system of differential equations can be written
for the state probabilities:

dp_k(t)/dt = −p_k(t) Σ_{e=1}^{k−1} λ_{k,e},
dp_i(t)/dt = Σ_{e=i+1}^{k} λ_{e,i} p_e(t) − p_i(t) Σ_{e=1}^{i−1} λ_{i,e}, i = 2, 3, …, k − 1, (2.65)
dp_1(t)/dt = Σ_{e=2}^{k} λ_{e,1} p_e(t).
The more general model of a multi-state element is the model with repair. The re-
pairs can also be both minor and major. A minor repair returns an element from
state j to state j + 1 while a major repair returns it from state j to state i, where
i > j + 1.
The special case of the repairable multi-state element is an element with only
minor failures and minor repairs. The stochastic process corresponding to such an
element is called the birth and death process. The state-space diagram of this
process is presented in Figure 2.7 (a).
Fig. 2.7 State-transition diagrams for repairable element with minor failures and repairs (a) and
for repairable element with minor and major failures and repairs (b)
The state-space diagram for the general case of the repairable multi-state ele-
ment with minor and major failures and repairs is presented in Figure 2.7 (b). The
following system of differential equations can be written for the state probabilities
of such elements:
$$\frac{dp_k(t)}{dt} = \sum_{e=1}^{k-1}\mu_{e,k}\,p_e(t) - p_k(t)\sum_{e=1}^{k-1}\lambda_{k,e},$$

$$\frac{dp_i(t)}{dt} = \sum_{e=i+1}^{k}\lambda_{e,i}\,p_e(t) + \sum_{e=1}^{i-1}\mu_{e,i}\,p_e(t) - p_i(t)\left(\sum_{e=1}^{i-1}\lambda_{i,e} + \sum_{e=i+1}^{k}\mu_{i,e}\right), \quad i = 2,3,\dots,k-1, \qquad (2.66)$$

$$\frac{dp_1(t)}{dt} = \sum_{e=2}^{k}\lambda_{e,1}\,p_e(t) - p_1(t)\sum_{e=2}^{k}\mu_{1,e},$$
with the initial conditions (2.56). Solving this system one obtains the state
probabilities p_i(t), i = 1, …, k.
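As an illustration of how a system like (2.66) can be solved numerically, the following sketch integrates the equations for a small three-state repairable element with only minor failures and repairs. The rates are illustrative assumptions, and a production calculation would use a proper ODE solver rather than the plain Euler steps shown:

```python
# Sketch: numerical solution of the state-probability equations for a
# three-state repairable element. The rates below (per year) are
# illustrative assumptions, not values from the text.
lam = {(3, 2): 10.0, (2, 1): 7.0}    # failure rates lambda_{i,e}
mu = {(1, 2): 120.0, (2, 3): 110.0}  # repair rates mu_{i,e}
K = 3

def deriv(p):
    dp = [0.0] * (K + 1)  # index 0 unused
    for i in range(1, K + 1):
        out_rate = sum(r for (a, b), r in lam.items() if a == i) \
                 + sum(r for (a, b), r in mu.items() if a == i)
        dp[i] -= out_rate * p[i]
        for (a, b), r in list(lam.items()) + list(mu.items()):
            if b == i:
                dp[i] += r * p[a]
    return dp

# Initial conditions: the element starts in the best state K.
p = [0.0, 0.0, 0.0, 1.0]
h, T = 1e-4, 0.5
for _ in range(int(T / h)):  # simple Euler step
    d = deriv(p)
    p = [pi + h * di for pi, di in zip(p, d)]

print(sum(p[1:]))          # probabilities stay normalized (≈ 1)
print(p[3] > p[2] > p[1])  # near steady state the best state dominates
```

With these rates the process relaxes to its steady state well before half a year, so the printed probabilities essentially coincide with the stationary ones.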
When F(g_i, w) = g_i − w for the constant demand level g_i < w ≤ g_{i+1}, the
acceptable states, where the element performance is above level g_i, are i+1, …, k.
Thus, the instantaneous availability is

$$A_i(t) = \sum_{e=i+1}^{k} p_e(t). \qquad (2.67)$$
the repairable element. As was said above, if the steady-state probabilities exist,
the process is called ergodic. For the steady-state probabilities the computations
become simpler. The set of differential equations (2.66) is reduced to a set of k
algebraic linear equations because for the constant probabilities all time-derivatives
are equal to zero: dp_i(t)/dt = 0, i = 1, …, k.
Let the steady-state probabilities $p_i = \lim_{t\to\infty} p_i(t)$ exist. In order to find these
probabilities, the following system of algebraic linear equations should be solved:

$$0 = \sum_{e=1}^{k-1}\mu_{e,k}\,p_e - p_k\sum_{e=1}^{k-1}\lambda_{k,e},$$

$$0 = \sum_{e=i+1}^{k}\lambda_{e,i}\,p_e + \sum_{e=1}^{i-1}\mu_{e,i}\,p_e - p_i\left(\sum_{e=1}^{i-1}\lambda_{i,e} + \sum_{e=i+1}^{k}\mu_{i,e}\right), \quad i = 2,3,\dots,k-1, \qquad (2.68)$$

$$0 = \sum_{e=2}^{k}\lambda_{e,1}\,p_e - p_1\sum_{e=2}^{k}\mu_{1,e}.$$
The k equations in (2.68) are not linearly independent (the determinant of the
system is zero). An additional independent equation can be provided by the simple
fact that the sum of the state probabilities is equal to 1 at any time:
$$\sum_{i=1}^{k} p_i = 1. \qquad (2.69)$$
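A minimal sketch of this steady-state computation for a three-state birth-and-death element follows; the rates are illustrative assumptions, and the last balance equation is replaced by the normalization (2.69):

```python
# Sketch: steady-state probabilities of a three-state birth-and-death
# element via (2.68) plus the normalization (2.69). Rates per year are
# illustrative assumptions.
lam32, lam21 = 10.0, 7.0     # failure rates
mu12, mu23 = 120.0, 110.0    # repair rates

# Unknowns ordered as (p1, p2, p3). The balance equation for state 1 is
# replaced by the normalization, since the k balance equations are
# linearly dependent.
A = [
    [0.0, mu23, -lam32],             # state 3: mu23*p2 - lam32*p3 = 0
    [mu12, -(lam21 + mu23), lam32],  # state 2: inflow from 1 and 3
    [1.0, 1.0, 1.0],                 # normalization: p1 + p2 + p3 = 1
]
b = [0.0, 0.0, 1.0]

# Tiny Gaussian elimination with partial pivoting.
n = 3
for c in range(n):
    piv = max(range(c, n), key=lambda r: abs(A[r][c]))
    A[c], A[piv] = A[piv], A[c]
    b[c], b[piv] = b[piv], b[c]
    for r in range(c + 1, n):
        f = A[r][c] / A[c][c]
        for cc in range(c, n):
            A[r][cc] -= f * A[c][cc]
        b[r] -= f * b[c]
p = [0.0] * n
for r in range(n - 1, -1, -1):
    p[r] = (b[r] - sum(A[r][cc] * p[cc] for cc in range(r + 1, n))) / A[r][r]

p1, p2, p3 = p
print(round(p3, 4))  # 0.9122 -- probability of the best state
```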
Fig. 2.8 State-transition diagram for determination of the reliability function R_i(t) for a repairable element (for a constant demand w: g_i < w ≤ g_{i+1})
In order to find the element reliability function R_i(t) for the constant demand
w (g_i < w ≤ g_{i+1}), an additional Markov model should be built. All states 1, 2, …, i
of the element corresponding to the performance rates that are lower than the
demand w should be united in one absorbing state. This absorbing state can be
considered now as state 0, and all repairs that return the element from this state back
to the set of acceptable states should be forbidden. This corresponds to zeroing all
the transition intensities μ_{0,m} for m = i+1, …, k. The transition rate λ_{m,0} from
any acceptable state m (m > i) to the united absorbing state 0 is equal to the sum
of the transition rates from state m to all the unacceptable states (states 1, 2, …, i):
$$\lambda_{m,0} = \sum_{j=1}^{i}\lambda_{m,j}, \quad m = k, k-1, \dots, i+1. \qquad (2.70)$$
The system of differential equations for the state probabilities takes the form

$$\frac{dp_k(t)}{dt} = \sum_{e=i+1}^{k-1}\mu_{e,k}\,p_e(t) - p_k(t)\left(\sum_{e=i+1}^{k-1}\lambda_{k,e} + \lambda_{k,0}\right),$$

$$\frac{dp_j(t)}{dt} = \sum_{e=j+1}^{k}\lambda_{e,j}\,p_e(t) + \sum_{e=i+1}^{j-1}\mu_{e,j}\,p_e(t) - p_j(t)\left(\sum_{e=i+1}^{j-1}\lambda_{j,e} + \lambda_{j,0} + \sum_{e=j+1}^{k}\mu_{j,e}\right), \quad i < j < k, \qquad (2.71)$$

$$\frac{dp_0(t)}{dt} = \sum_{e=i+1}^{k}\lambda_{e,0}\,p_e(t),$$

with the initial conditions

$$p_k(0) = 1, \quad p_{k-1}(0) = \dots = p_{i+1}(0) = p_0(0) = 0.$$

The reliability function is obtained as

$$R_i(t) = 1 - p_0(t) = \sum_{j=i+1}^{k} p_j(t). \qquad (2.72)$$
Obviously, the final state probabilities for system (2.71) are as follows:

$$p_k = p_{k-1} = \dots = p_{i+1} = 0, \quad p_0 = 1.$$

Based on the computed reliability function $R_i(t) = \sum_{j=i+1}^{k} p_j(t)$ one can find the
mean time to first failure, when the element performance drops for the first time
below the demand level w, where g_i < w ≤ g_{i+1}:

$$MTTF_i = \int_0^{\infty} R_i(t)\,dt. \qquad (2.73)$$
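The relation (2.73) can be checked numerically. For a non-repairable element with only minor failures and demand g_1 < w ≤ g_2, the MTTF computed from the absorbing-state model should reproduce the sum 1/λ4,3 + 1/λ3,2 + 1/λ2,1 used earlier; the rates below are illustrative assumptions:

```python
# Sketch: checking (2.73) numerically. All failed states collapse into
# the absorbing state 0, R(t) = 1 - p0(t), and MTTF = integral of R(t).
# Rates (per year) are illustrative assumptions.
lam43, lam32, lam21 = 2.0, 1.0, 0.7

# States 4 (best), 3, 2 acceptable; 0 absorbing (performance below w).
p4, p3, p2, p0 = 1.0, 0.0, 0.0, 0.0
h, T = 1e-3, 60.0
mttf = 0.0
for _ in range(int(T / h)):
    R = p4 + p3 + p2          # reliability function R1(t) = 1 - p0(t)
    mttf += R * h             # rectangle rule for the integral (2.73)
    d4 = -lam43 * p4
    d3 = lam43 * p4 - lam32 * p3
    d2 = lam32 * p3 - lam21 * p2
    d0 = lam21 * p2
    p4 += h * d4; p3 += h * d3; p2 += h * d2; p0 += h * d0

exact = 1/lam43 + 1/lam32 + 1/lam21
print(round(mttf, 2), round(exact, 2))
```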
$$\frac{dp_4(t)}{dt} = -(\lambda_{4,3}+\lambda_{4,2}+\lambda_{4,1})p_4(t) + \mu_{3,4}p_3(t) + \mu_{2,4}p_2(t) + \mu_{1,4}p_1(t),$$

$$\frac{dp_3(t)}{dt} = \lambda_{4,3}p_4(t) - (\lambda_{3,2}+\lambda_{3,1}+\mu_{3,4})p_3(t) + \mu_{1,3}p_1(t) + \mu_{2,3}p_2(t),$$

$$\frac{dp_2(t)}{dt} = \lambda_{4,2}p_4(t) + \lambda_{3,2}p_3(t) - (\lambda_{2,1}+\mu_{2,3}+\mu_{2,4})p_2(t) + \mu_{1,2}p_1(t),$$

$$\frac{dp_1(t)}{dt} = \lambda_{4,1}p_4(t) + \lambda_{3,1}p_3(t) + \lambda_{2,1}p_2(t) - (\mu_{1,2}+\mu_{1,3}+\mu_{1,4})p_1(t).$$
Fig. 2.9 State-transition diagrams for a four-state element with minor and major failures and repairs
$$A_3(t) = p_4(t), \quad \text{for } g_3 < w \le g_4,$$

$$A_2(t) = p_4(t) + p_3(t), \quad \text{for } g_2 < w \le g_3,$$

$$A_1(t) = p_4(t) + p_3(t) + p_2(t) = 1 - p_1(t), \quad \text{for } g_1 < w \le g_2.$$
Fig. 2.10 Instantaneous availability A1(t), A2(t), and A3(t) of the four-state element (time in years)
The mean instantaneous performance of the element is

$$E_t = \sum_{k=1}^{4} g_k\,p_k(t) = 100 p_4(t) + 80 p_3(t) + 50 p_2(t) + 0\cdot p_1(t).$$

For the given demand, the availability is $A_w(t) = A_2(t)$, and the mean instantaneous performance deficiency is

$$D_t = \sum_{k=1}^{4} p_k(t)\max(w - g_k, 0) = 10 p_2(t) + 60 p_1(t).$$
The indices Dt and Et, as functions of time, are presented in Figure 2.11.
Fig. 2.11 Instantaneous mean performance E_t and mean performance deficiency D_t of the four-state element (time in years)
If one wants to find only the final state probabilities, this can be done without
solving the system of differential equations. As was shown above, the final state
probabilities can be found by solving the system of linear algebraic equations (2.68) in
which one of the equations is replaced with Equation 2.69. In our example, the
system of linear algebraic equations that should be solved takes the form
where

$$A = p_4 + p_3,$$

$$E = \sum_{k=1}^{4} g_k\,p_k = 100 p_4 + 80 p_3 + 50 p_2 + 0\cdot p_1,$$

$$D = \sum_{k=1}^{4} p_k \max(w - g_k, 0) = 10 p_2 + 60 p_1.$$
As one can see in Figures 2.10 and 2.11, the steady-state values of the state
probabilities are achieved during a short time period. After 0.07 years, the process
becomes stationary. For this reason, only the final solution is important
in many practical cases. This is especially so for elements with a relatively long
lifetime, as in our example if the element lifetime is at least several
years. However, if one deals with highly responsible components and takes into
account even small information losses at the beginning of the process, an analysis
based on the system of differential equations should be performed.
In order to find the element reliability function R_w(t) for the constant demand
w = 60 s^{-1} (g_2 < w ≤ g_3), an additional Markov model should be built. States 1
and 2 corresponding to performance rates that are lower than the demand w should
be united in one absorbing state. This absorbing state can be considered now as
state 0 and all repairs that return the element from this state back to the set of
acceptable states should be forbidden. This corresponds to zeroing the transition
intensities μ_{0,3} and μ_{0,4}. The transition rates from the acceptable states 3 and 4 to the
united absorbing state 0 are equal to the sum of the corresponding transition rates
from these states to the unacceptable states 1 and 2. According to (2.70) we obtain

$$\lambda_{4,0} = \lambda_{4,1} + \lambda_{4,2}, \qquad \lambda_{3,0} = \lambda_{3,1} + \lambda_{3,2}.$$
The state-space diagram for computation of the reliability function R_w(t) is
presented in Figure 2.9 (b). For this state-space diagram, the state probability p_0(t)
determines the reliability function of the element, because after the first entrance
into the absorbing state 0 the element never leaves it.
The system of differential equations for determining the reliability function of
the element takes the form
$$\frac{dp_4(t)}{dt} = -(\lambda_{4,3}+\lambda_{4,2}+\lambda_{4,1})p_4(t) + \mu_{3,4}p_3(t),$$

$$\frac{dp_3(t)}{dt} = \lambda_{4,3}p_4(t) - (\lambda_{3,2}+\lambda_{3,1}+\mu_{3,4})p_3(t),$$

$$\frac{dp_0(t)}{dt} = (\lambda_{4,1}+\lambda_{4,2})p_4(t) + (\lambda_{3,1}+\lambda_{3,2})p_3(t).$$
Solving this system under initial conditions p4 (0) = 1, p3 (0) = p0 (0) = 0 we ob-
tain the reliability function as Rw (t ) = 1 p0 (t ) . This function is presented in Fig-
ure 2.12.
When the reliability function is known, the mean time to first failure (the element's
capacity dropping below the demand w = 60 s^{-1}) can be found by using (2.73):

$$MTTF_w = \int_0^{\infty} R_w(t)\,dt \approx 2.3\ \text{years}.$$
Fig. 2.12 Reliability R_w(t) of a four-state element (time in years)
the current value of the element performance rate G_j(t) at any instant t are
random variables. G_j(t) takes values from the set g_j: G_j(t) ∈ g_j. The performance
rate of any element j is defined as a continuous-time Markov chain in the time
interval [0, T], where T is the MSS operation period. Such models for different
types of MSS elements were studied in the previous section.
According to the generic MSS model, we assume that
space of the performance rates of the elements into the space of the system per-
formance rates at any instant t, defines the system structure function. Therefore,
by using the structure function, the entire MSS performance rate can be computed
for any combination of performance rates of system elements. The current state of
the entire MSS and, therefore, the current value of the system output performance
rate G(t) at any instant t are random variables. G(t) is a continuous-time Markov
chain that takes values from g : G ( t ) g = { g1 , g 2 , , g K }.
We suppose that Markov processes for different elements are independent and
that there are no simultaneous state transitions of any different elements. In other
words, there may be only one failure or one repair in a system at any instant t.
The traditional application of the Markov technique to MSS reliability evaluation
consists of two stages: development of the state-space diagram for the entire
system and evaluation of the system's reliability based on solving a system of
differential equations corresponding to the diagram.
The proper design of the state-transition diagram is a critical task in Markov
analysis, especially for the MSS. The explosion of the number of states when the
modeled system is large enough is still a major problem. In such cases a state-
space diagram representation in its pictorial form often becomes impossible. One
of the possible solutions is to use a formalized description of the system. When
such a description is used, the state-space diagram is not actually presented in
pictorial form, but knowledge of the rules that govern the MSS evolution enables us to
explore the state-space graph systematically by using a computer. In addition, it is
important to understand that the state-space diagram plays only an auxiliary role.
The main aim here is to determine the transition intensity matrix a that defines the
system of differential equations (2.38) and hence the corresponding Markov
model. Therefore, in this context we speak about the formalized generation of the
transition intensity matrix and, therefore, about the Markov model generation.
Based on this idea, efficient algorithms are built for the reliability evaluation. One
possible algorithm for Markov model generation for the MSS is as follows.
$$\left\{\lambda^{(j)}_{k_j,k_j-1},\ \lambda^{(j)}_{k_j,k_j-2},\dots,\lambda^{(j)}_{k_j,1},\ \lambda^{(j)}_{k_j-1,k_j-2},\ \lambda^{(j)}_{k_j-1,k_j-3},\dots,\lambda^{(j)}_{k_j-1,1},\dots,\lambda^{(j)}_{3,2},\ \lambda^{(j)}_{3,1},\ \lambda^{(j)}_{2,1}\right\}$$

and the ordered set of repair rates

$$\left\{\mu^{(j)}_{1,2},\dots,\mu^{(j)}_{1,k_j-1},\ \mu^{(j)}_{1,k_j},\ \mu^{(j)}_{2,3},\dots,\mu^{(j)}_{2,k_j-1},\ \mu^{(j)}_{2,k_j},\dots,\mu^{(j)}_{k_j-2,k_j-1},\ \mu^{(j)}_{k_j-2,k_j},\ \mu^{(j)}_{k_j-1,k_j}\right\}.$$
If for element j there is no failure that causes a decrease in the element
performance from level g_{jm} to level g_{jn} (n < m), the corresponding failure rate
$\lambda^{(j)}_{m,n}$ is set equal to zero in the failure rate set. In the same manner, if there is
no repair that returns the performance of element j from level g_{jn} to level
g_{jm}, the corresponding repair rate $\mu^{(j)}_{n,m}$ is set equal to zero in the repair rate
set.
3. Enumeration of the system states and computation of the MSS output
performance.
All system states should be enumerated. For computer-based algorithms the
enumeration order is not important. What is really important is the correspondence
among the state number n_s (n_s ∈ [1, K]), the set of performance
rates of the elements in this state {g_{1i}, …, g_{nh}}, and the MSS output performance
rate g_{n_s} in this state, which is determined by the MSS structure function.
A transition between two system states corresponds to a change in the state of
exactly one element j:

$$\{g_{1i},\dots,g_{jm},\dots,g_{nh}\} \to \{g_{1i},\dots,g_{jf},\dots,g_{nh}\}, \quad m \neq f,\ 1 \le j \le n.$$
The transition in which f < m corresponds to an element failure (with transition
intensity $\lambda^{(j)}_{m,f}$), and the transition in which f > m corresponds to an element
repair (with transition intensity $\mu^{(j)}_{m,f}$). If the MSS transits from state n1 to state n2
because of a failure with intensity $\lambda^{(j)}_{m,f}$ of an arbitrary element j, then the element
a_{n1n2} of transition matrix a located in the intersection of row n1 and column n2 is

$$a_{n_1 n_2} = \lambda^{(j)}_{m,f}. \qquad (2.74)$$
If the MSS transits from state n1 to state n2 because of a repair with intensity
$\mu^{(j)}_{m,f}$ (f > m) of an arbitrary element j, then the element a_{n1n2} of transition
matrix a located in the intersection of row n1 and column n2 is

$$a_{n_1 n_2} = \mu^{(j)}_{m,f}. \qquad (2.75)$$

If the transition from state n1 to state n2 does not exist, then the element
a_{n1n2} of transition matrix a located in the intersection of row n1 and column
n2 is zero:

$$a_{n_1 n_2} = 0. \qquad (2.76)$$
The diagonal elements are determined so that the sum of the elements of each row equals zero:

$$a_{ii} = -\sum_{\substack{n=1 \\ n \neq i}}^{K} a_{in}, \quad i = 1,\dots,K. \qquad (2.77)$$
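A sketch of steps (2.74)–(2.77) for a single four-state element follows; the intensities and the dictionary-based representation are illustrative assumptions, not the book's notation:

```python
# Sketch of (2.74)-(2.77): assembling a transition intensity matrix from
# a list of transitions. The four-state transitions below (per year) are
# illustrative assumptions.
K = 4
transitions = {  # (from_state, to_state) -> intensity
    (4, 3): 2.0, (3, 2): 1.0, (2, 1): 0.7,       # failures (2.74)
    (1, 2): 80.0, (2, 3): 90.0, (3, 4): 100.0,   # repairs (2.75)
}

# Off-diagonal elements: given intensity, or zero if no transition (2.76).
a = [[transitions.get((i, j), 0.0) for j in range(1, K + 1)]
     for i in range(1, K + 1)]

# Diagonal elements (2.77): minus the sum of the other entries in the row.
for i in range(K):
    a[i][i] = -sum(a[i][j] for j in range(K) if j != i)

assert all(abs(sum(row)) < 1e-12 for row in a)  # every row sums to zero
print(a[3][3])  # -2.0: state 4 leaves only via the failure rate 2.0
```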
The algorithm described above is general. It can build a Markov model for
quite complex MSS and reduces the risk of errors and misrepresentations.
MSS reliability indices such as instantaneous availability, instantaneous expected
performance, and instantaneous performance deficiency can be found in the
same way as was demonstrated for a multi-state element (the only difference is the
greater order of the system of differential equations).
At first, the system of differential equations must be solved and the probabilities
p_i(t) must be found for all system states i = 1, …, K.
For the constant demand level w the MSS instantaneous availability can be ob-
tained as the sum of probabilities of all acceptable states (the states where MSS
output performance is greater than or equal to w). Therefore, MSS instantaneous
availability can be defined as
$$A(t) = \sum_{i=1}^{K} p_i(t)\,\mathbf{1}(g_i \ge w), \qquad (2.78)$$

the MSS mean instantaneous performance as

$$E_t = \sum_{i=1}^{K} g_i\,p_i(t), \qquad (2.79)$$

and the MSS mean instantaneous performance deficiency as

$$D_t = \sum_{i=1}^{K} p_i(t)\max(w - g_i, 0).$$
In order to find the MSS reliability function R_i(t) for the constant demand w,
g_i < w ≤ g_{i+1}, the Markov model should be changed. All the system states from
the unacceptable area, where the performance rate is lower than demand w, should
be united in one absorbing state with the number 0. Transitions from state 0 to any
acceptable state should be forbidden. The transition rate from any acceptable state
j to the absorbing state should be determined as the sum of the transition rates
from state j to all the unacceptable states. After performing these changes, we
obtain the new transition intensity matrix. By solving the differential equations (2.38)
with this matrix one obtains the probability p_0(t) of state 0 and determines the
system reliability function as R(t) = 1 − p_0(t).
Example 2.4 (Lisnianski and Levitin 2003). Consider the flow transmission sys-
tem from Example 1.2 (Chapter 1) that was presented in Figure 1.8 (a). It consists
of three elements (pipes). The oil flow is transmitted from point C to point E. The
performance of the pipes is measured by their transmission capacity (tons per min-
ute). Elements 1 and 2 are repairable and each has two possible states. A state of
2.3 Markov Models: Continuous-time Markov Chains 71
total failure for both elements corresponds to a transmission capacity of 0 and the
operational state corresponds to capacities of 1.5 and 2 tons per minute, respec-
tively, so that G1 ( t ) { g11 , g12 } = {0,1.5} and G2 ( t ) { g 21 , g 22 } = {0, 2} .
The failure and repair rates corresponding to these two elements are

$$\lambda^{(1)}_{2,1} = 7\ \text{year}^{-1}, \quad \mu^{(1)}_{1,2} = 100\ \text{year}^{-1} \quad \text{for element 1},$$

$$\lambda^{(2)}_{2,1} = 10\ \text{year}^{-1}, \quad \mu^{(2)}_{1,2} = 80\ \text{year}^{-1} \quad \text{for element 2}.$$
Element 3 is a multi-state element with only minor failures and minor repairs.
It can be in one of three states: a state of total failure corresponding to a capacity
of 0, a state of partial failure corresponding to a capacity of 1.8 tons per minute,
and a fully operational state with a capacity of 4 tons per minute. Therefore,
G3 ( t ) { g31 , g32 , g33 } = {0,1.8, 4} .
The failure and repair rates corresponding to element 3 are

$$\lambda^{(3)}_{3,2} = 10\ \text{year}^{-1}, \quad \lambda^{(3)}_{2,1} = 7\ \text{year}^{-1},$$

$$\mu^{(3)}_{1,2} = 120\ \text{year}^{-1}, \quad \mu^{(3)}_{2,3} = 110\ \text{year}^{-1}.$$
The system output performance rate is defined as the maximum flow that can
be transmitted from C to E. As was shown in Example 1.2, the MSS structure
function is
Gs ( t ) = f ( G1 ( t ) , G2 ( t ) , G3 ( t ) ) = min {G1 ( t ) + G2 ( t ) , G3 ( t )} .
In order to derive the system of differential equations for the MSS we apply the
algorithm described above:
1. The failure and repair rate sets for the system elements are:
$$\text{element 1: } \{\lambda^{(1)}_{2,1}\},\ \{\mu^{(1)}_{1,2}\}; \qquad \text{element 2: } \{\lambda^{(2)}_{2,1}\},\ \{\mu^{(2)}_{1,2}\};$$

$$\text{element 3: } \{\lambda^{(3)}_{3,2},\ \lambda^{(3)}_{3,1} = 0,\ \lambda^{(3)}_{2,1}\},\ \{\mu^{(3)}_{1,2},\ \mu^{(3)}_{1,3} = 0,\ \mu^{(3)}_{2,3}\}.$$
2. All the system states are generated as combinations of all possible states of the
system elements (characterized by their performance levels). The total number of
different system states is K = k_1 k_2 k_3 = 2 × 2 × 3 = 12.
3. A unique number is assigned to each system state. All the system states with
their numbers n_s and corresponding performance rates are presented in columns 1 to
5 of Table 2.1. For every state, the system output performance rate is computed
based on the MSS structure function. For example, in state 1 we have
G_1(t) = g_{12} = 1.5, G_2(t) = g_{22} = 2.0, G_3(t) = g_{33} = 4.0. Using the system
structure function, we obtain the entire MSS output performance in state 1 as

$$G(t) = g_1 = f(g_{12}, g_{22}, g_{33}) = \min\{g_{12} + g_{22}, g_{33}\} = \min\{1.5 + 2.0, 4.0\} = 3.5.$$
4. The state transition analysis is performed for all pairs of system states. For
example, for state number 2, where the states of the elements are {g_{11}, g_{22},
g_{33}} = {0, 2.0, 4.0}, the transitions to states 1, 5, and 6 exist with the intensities
$\mu^{(1)}_{1,2}$, $\lambda^{(2)}_{2,1}$, and $\lambda^{(3)}_{3,2}$, respectively. All the existing transitions and corresponding
transition intensities are also presented in Table 2.1. Based on Table 2.1 one can
easily find the non-diagonal elements of the transition intensity matrix that
describes the evolution of the MSS in the state space (the elements of the matrix
corresponding to the absence of transitions should be zeroed).
2.3 Markov Models: Continuous-time Markov Chains 73
Performance ns
ns G1 G2 G3 G 1 2 3 4 5 6 7 8 9 10 11 12
1 1.5 2.0 4.0 3.5 (1)
2,1 ( 2)
2,1 (3)
3,2
5 0 0 4.0 0 1,2
(2)
1,2
(1)
3,2
(3)
9 0 0 4 0 2,3
(3)
1,2
(2)
1,2
(1)
2,1
(3)
10 0 2.0 0 0 1,2
(3)
1,2
(1)
2,1
( 2)
11 1.5 0 0 0 1,2
(3)
1,2
(2)
2,1
(1)
12 0 0 0 0 1,2
(3)
1,2
(2)
1,2
(1)
5. The diagonal elements of the transition intensity matrix are determined in such
a way that the sum of elements of each row of the matrix equals zero. These di-
agonal elements are as follows:
$$a_{11} = -\left(\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right), \qquad a_{77} = -\left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{2,1}\right),$$

$$a_{22} = -\left(\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right), \qquad a_{88} = -\left(\mu^{(3)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}\right),$$

$$a_{33} = -\left(\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{3,2}\right), \qquad a_{99} = -\left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(3)}_{2,1}\right),$$

$$a_{44} = -\left(\mu^{(3)}_{2,3}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right), \qquad a_{10,10} = -\left(\mu^{(3)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}\right),$$

$$a_{55} = -\left(\mu^{(1)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(3)}_{3,2}\right), \qquad a_{11,11} = -\left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}\right),$$

$$a_{66} = -\left(\mu^{(3)}_{2,3}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right), \qquad a_{12,12} = -\left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}\right).$$
The state-space diagram of the system is presented in Figure 2.15 (in this
diagram the corresponding system performance is presented in the lower part of each
circle).
The system of differential equations for the state probabilities is as follows:

$$\frac{dp_1(t)}{dt} = -\left(\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_1(t) + \mu^{(1)}_{1,2}p_2(t) + \mu^{(2)}_{1,2}p_3(t) + \mu^{(3)}_{2,3}p_4(t),$$

$$\frac{dp_2(t)}{dt} = \lambda^{(1)}_{2,1}p_1(t) - \left(\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_2(t) + \mu^{(2)}_{1,2}p_5(t) + \mu^{(3)}_{2,3}p_6(t),$$

$$\frac{dp_3(t)}{dt} = \lambda^{(2)}_{2,1}p_1(t) - \left(\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_3(t) + \mu^{(1)}_{1,2}p_5(t) + \mu^{(3)}_{2,3}p_7(t),$$

$$\frac{dp_4(t)}{dt} = \lambda^{(3)}_{3,2}p_1(t) - \left(\mu^{(3)}_{2,3}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_4(t) + \mu^{(1)}_{1,2}p_6(t) + \mu^{(2)}_{1,2}p_7(t) + \mu^{(3)}_{1,2}p_8(t),$$

$$\frac{dp_5(t)}{dt} = \lambda^{(2)}_{2,1}p_2(t) + \lambda^{(1)}_{2,1}p_3(t) - \left(\mu^{(1)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(3)}_{3,2}\right)p_5(t) + \mu^{(3)}_{2,3}p_9(t),$$

$$\frac{dp_6(t)}{dt} = \lambda^{(3)}_{3,2}p_2(t) + \lambda^{(1)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_6(t) + \mu^{(2)}_{1,2}p_9(t) + \mu^{(3)}_{1,2}p_{10}(t),$$

$$\frac{dp_7(t)}{dt} = \lambda^{(3)}_{3,2}p_3(t) + \lambda^{(2)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_7(t) + \mu^{(1)}_{1,2}p_9(t) + \mu^{(3)}_{1,2}p_{11}(t),$$

$$\frac{dp_8(t)}{dt} = \lambda^{(3)}_{2,1}p_4(t) - \left(\mu^{(3)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}\right)p_8(t) + \mu^{(1)}_{1,2}p_{10}(t) + \mu^{(2)}_{1,2}p_{11}(t),$$

$$\frac{dp_9(t)}{dt} = \lambda^{(3)}_{3,2}p_5(t) + \lambda^{(2)}_{2,1}p_6(t) + \lambda^{(1)}_{2,1}p_7(t) - \left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(3)}_{2,1}\right)p_9(t) + \mu^{(3)}_{1,2}p_{12}(t),$$

$$\frac{dp_{10}(t)}{dt} = \lambda^{(3)}_{2,1}p_6(t) + \lambda^{(1)}_{2,1}p_8(t) - \left(\mu^{(3)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}\right)p_{10}(t) + \mu^{(2)}_{1,2}p_{12}(t),$$

$$\frac{dp_{11}(t)}{dt} = \lambda^{(3)}_{2,1}p_7(t) + \lambda^{(2)}_{2,1}p_8(t) - \left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}\right)p_{11}(t) + \mu^{(1)}_{1,2}p_{12}(t),$$

$$\frac{dp_{12}(t)}{dt} = \lambda^{(3)}_{2,1}p_9(t) + \lambda^{(2)}_{2,1}p_{10}(t) + \lambda^{(1)}_{2,1}p_{11}(t) - \left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}\right)p_{12}(t).$$
Solving this system with the initial conditions p_1(0) = 1, p_i(0) = 0 for
2 ≤ i ≤ 12, one obtains the probability of each state at time t.
According to Table 2.1, in different states the MSS has the following performance
rates: in state 1, g_1 = 3.5; in state 2, g_2 = 2.0; in states 4 and 6, g_4 = g_6 = 1.8;
in states 3 and 7, g_3 = g_7 = 1.5; and in states 5, 8, 9, 10, 11, and 12,
g_5 = g_8 = g_9 = g_{10} = g_{11} = g_{12} = 0. Therefore,
Pr {G = 3.5} = p1 ( t ) ,
Pr {G = 2.0} = p2 ( t ) ,
Pr {G = 1.5} = p3 ( t ) + p7 ( t ) ,
Pr {G = 1.8} = p4 ( t ) + p6 ( t ) ,
Pr {G = 0} = p5 ( t ) + p8 ( t ) + p9 ( t ) + p10 ( t ) + p11 ( t ) + p12 ( t ) .
For the constant demand level w = 1 one obtains the MSS instantaneous
availability as the sum of the state probabilities where the MSS output performance
is greater than or equal to 1. States 1, 2, 3, 4, 6, and 7 are acceptable. Hence

$$A(t) = p_1(t) + p_2(t) + p_3(t) + p_4(t) + p_6(t) + p_7(t).$$
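As a cross-check of the long-run value of this availability, the steady-state probabilities of the independent elements can be combined by enumerating all twelve states. This enumeration shortcut (instead of solving the twelve differential equations) is our sketch and is valid only in the steady state:

```python
# Sketch: steady-state availability of the flow transmission MSS for
# demand w = 1, by enumerating all 12 element-state combinations.
# Element rates are those of Example 2.4; exploiting the independence of
# the elements is a steady-state shortcut.
# Elements 1 and 2: two states, p_up = mu / (lam + mu).
p1 = {0.0: 7/107, 1.5: 100/107}    # lam = 7, mu = 100
p2 = {0.0: 10/90, 2.0: 80/90}      # lam = 10, mu = 80
# Element 3: birth-and-death chain, lam32 = 10, lam21 = 7,
# mu12 = 120, mu23 = 110 (balance: pi2 = pi3*10/110, pi1 = pi2*7/120).
pi3 = 1 / (1 + 10/110 + (10/110)*(7/120))
p3 = {4.0: pi3, 1.8: pi3*10/110, 0.0: pi3*(10/110)*(7/120)}

w = 1.0
A = 0.0
for g1, q1 in p1.items():
    for g2, q2 in p2.items():
        for g3, q3 in p3.items():
            if min(g1 + g2, g3) >= w:   # MSS structure function
                A += q1 * q2 * q3

print(round(A, 4))  # ≈ 0.9879
```

This agrees with the plateau that the availability curve A(t) reaches after the short transient period.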
Fig. 2.16 Instantaneous availability and probabilities of different MSS performance levels
$$E_t = \sum_{i=1}^{12} p_i(t)\,g_i,$$

$$D_t = \sum_{i=1}^{12} p_i(t)\max(w - g_i, 0).$$
Fig. 2.17 Reliability indices of the flow transmission MSS: (a) instantaneous mean output performance, and (b) instantaneous mean performance deficiency (time in years)
Fig. 2.18 State-space diagram of the flow transmission MSS with all unacceptable states united into one absorbing state
In this state-space diagram all unacceptable states (with MSS output performance
lower than w = 1) are united into one absorbing state 0. As was described
above, the system unreliability at time t is treated as the probability that the MSS
has entered the unacceptable area for the first time by time instant t.
The system of differential equations for determining the reliability function takes
the form

$$\frac{dp_1(t)}{dt} = -\left(\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_1(t) + \mu^{(1)}_{1,2}p_2(t) + \mu^{(2)}_{1,2}p_3(t) + \mu^{(3)}_{2,3}p_4(t),$$

$$\frac{dp_2(t)}{dt} = \lambda^{(1)}_{2,1}p_1(t) - \left(\lambda^{(2)}_{2,1}+\mu^{(1)}_{1,2}+\lambda^{(3)}_{3,2}\right)p_2(t) + \mu^{(3)}_{2,3}p_6(t),$$

$$\frac{dp_3(t)}{dt} = \lambda^{(2)}_{2,1}p_1(t) - \left(\lambda^{(1)}_{2,1}+\mu^{(2)}_{1,2}+\lambda^{(3)}_{3,2}\right)p_3(t) + \mu^{(3)}_{2,3}p_7(t),$$

$$\frac{dp_4(t)}{dt} = \lambda^{(3)}_{3,2}p_1(t) - \left(\lambda^{(1)}_{2,1}+\mu^{(3)}_{2,3}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_4(t) + \mu^{(1)}_{1,2}p_6(t) + \mu^{(2)}_{1,2}p_7(t),$$

$$\frac{dp_6(t)}{dt} = \lambda^{(3)}_{3,2}p_2(t) + \lambda^{(1)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}+\mu^{(1)}_{1,2}\right)p_6(t),$$

$$\frac{dp_7(t)}{dt} = \lambda^{(3)}_{3,2}p_3(t) + \lambda^{(2)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\lambda^{(3)}_{2,1}+\lambda^{(1)}_{2,1}+\mu^{(2)}_{1,2}\right)p_7(t),$$

$$\frac{dp_0(t)}{dt} = \lambda^{(2)}_{2,1}p_2(t) + \lambda^{(1)}_{2,1}p_3(t) + \lambda^{(3)}_{2,1}p_4(t) + \left(\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_6(t) + \left(\lambda^{(1)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_7(t).$$
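The whole procedure — enumerating the element states, merging the unacceptable states into the absorbing state 0, and integrating the resulting equations — can be sketched as follows. The element data are those of Example 2.4; the plain Euler integration and the dictionary-based encoding are our assumptions (a production calculation would use a proper ODE solver):

```python
from itertools import product

# Sketch: reliability function R(t) = 1 - p0(t) of the flow transmission
# MSS (Example 2.4) for constant demand w = 1, built automatically from
# the element models. Rates are per year.
elements = [
    ({1: 0.0, 2: 1.5}, {(2, 1): 7.0, (1, 2): 100.0}),   # element 1
    ({1: 0.0, 2: 2.0}, {(2, 1): 10.0, (1, 2): 80.0}),   # element 2
    ({1: 0.0, 2: 1.8, 3: 4.0},                           # element 3
     {(3, 2): 10.0, (2, 1): 7.0, (1, 2): 120.0, (2, 3): 110.0}),
]
w = 1.0

def perf(s):  # MSS structure function for combined state s = (s1, s2, s3)
    g = [elements[j][0][s[j]] for j in range(3)]
    return min(g[0] + g[1], g[2])

acc = [s for s in product(*(e[0] for e in elements)) if perf(s) >= w]
idx = {s: i for i, s in enumerate(acc)}
n = len(acc) + 1                  # last index is the absorbing state 0

# Transition intensity matrix: transitions into the unacceptable area are
# redirected to the absorbing state; no transitions leave it.
A = [[0.0] * n for _ in range(n)]
for s in acc:
    for j, (_, rates) in enumerate(elements):
        for (u, v), r in rates.items():
            if s[j] == u:
                t = list(s); t[j] = v; t = tuple(t)
                dest = idx.get(t, n - 1)
                A[idx[s]][dest] += r
                A[idx[s]][idx[s]] -= r

p = [0.0] * n
p[idx[(2, 2, 3)]] = 1.0           # start with all elements fully up
h = 1e-4
R_half = None
for step in range(int(2.0 / h)):  # Euler integration over two years
    dp = [sum(p[i] * A[i][k] for i in range(n)) for k in range(n)]
    p = [pi + h * di for pi, di in zip(p, dp)]
    if step == int(1.0 / h) - 1:
        R_half = 1.0 - p[n - 1]   # R(1)
R_end = 1.0 - p[n - 1]            # R(2)
print(R_half > R_end > 0.0)       # reliability decreases over time
```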
Fig. 2.19 Reliability function of flow transmission MSS
2.4 Markov Reward Models 79
In the preceding subsections, it was shown how some important MSSs' reliability
indices can be found by using the Markov technique. Here we consider additional
indices such as states frequencies and the mean number of system failures during
an operating period. It is also very important that the Markov reward models con-
sidered here are very useful for MSS life cycle cost analysis and reliability-
associated cost computation. Here we describe the common computational
method, which is based on the general Markov reward model that was primarily
introduced by Howard (1960) and then was essentially extended for different ap-
plications in Mine and Osaki (1970) and many other research works. The corre-
sponding overview can be found in Reibman et al. (1989).
This model considers a continuous-time Markov chain with a set of states
{1, …, K} and transition intensity matrix a = [a_ij], i, j = 1, …, K. It is suggested
that if the process stays in any state i during a time unit, a certain amount of
money r_ii should be paid. It is also suggested that each time the process transits
from state i to state j a certain amount of money r_ij should be paid. These
amounts r_ii and r_ij are called rewards (a reward may also be negative when it
characterizes losses or penalties). Rewards may also be considered in other senses,
not only as money: for example, the energy of a power generating system, the
information quantity of a communications system, the productivity of a production
line, etc. A Markov process with rewards associated with its states and transitions
is called a Markov process with rewards. For these processes, an additional matrix
r = [r_ij], i, j = 1, …, K, of rewards is determined. If all rewards are zero, the
process reduces to the ordinary continuous-time discrete-state Markov process.
Note that the rewards rii and rij have different dimensions. For example, if rij is
measured in cost units, reward rii is measured in cost units per time unit. The value
that is of interest is the total expected reward accumulated up to time instant t
under specified initial conditions.
Let Vi (t ) be the total expected reward accumulated up to time t, given the ini-
tial state of the process at time instant t = 0 is state i. According to Howard
(1960), the following system of differential equations must be solved under speci-
fied initial conditions in order to find the total expected rewards:
$$\frac{dV_i(t)}{dt} = r_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij} r_{ij} + \sum_{j=1}^{K} a_{ij} V_j(t), \quad i = 1,\dots,K. \qquad (2.80)$$
System (2.80) can be obtained in the following manner. Assume that at time
instant t = 0 the process is in state i. During the time increment Δt, the process can
remain in this state or transit to some other state j. If it remains in state i during
time Δt, the expected reward accumulated during this time is r_ii Δt. Since at the
beginning of the time interval [Δt, t + Δt] the process is still in state i, the
expected reward during this interval is V_i(t), and the expected reward during the
entire interval [0, t + Δt] is V_i(t + Δt) = r_ii Δt + V_i(t). The probability that the process
will remain in state i during the time interval Δt equals 1 minus the probability
that it will transit to any other state j ≠ i during this interval:

$$\pi_{ii}(0, \Delta t) = 1 - \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij}\,\Delta t = 1 + a_{ii}\,\Delta t. \qquad (2.81)$$
On the other hand, during time Δt the process can transit to some other state
j ≠ i with the probability π_ij(0, Δt) = a_ij Δt. In this case the expected reward
accumulated during the time interval [0, Δt] is r_ij. At the beginning of the time
interval [Δt, t + Δt] the process is in state j. Therefore, the expected reward during
this interval is V_j(t), and the expected reward during the interval [0, t + Δt] is
V_i(t + Δt) = r_ij + V_j(t).

In order to obtain the total expected reward one must sum the products
of rewards and corresponding probabilities over all of the states. Thus, for small Δt
one has

$$V_i(t + \Delta t) \approx (1 + a_{ii}\Delta t)\left[r_{ii}\Delta t + V_i(t)\right] + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij}\Delta t\left[r_{ij} + V_j(t)\right], \quad i = 1,\dots,K. \qquad (2.82)$$
Neglecting the terms of order greater than Δt, one can rewrite the last
expression as follows:

$$\frac{V_i(t + \Delta t) - V_i(t)}{\Delta t} = r_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij} r_{ij} + \sum_{j=1}^{K} a_{ij} V_j(t), \quad i = 1,\dots,K. \qquad (2.83)$$
Defining the column vector of total expected rewards V(t) with components
V_1(t), …, V_K(t) and the column vector u with components

$$u_i = r_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij} r_{ij}, \quad i = 1,\dots,K, \qquad (2.84)$$

one can rewrite system (2.80) in matrix form:

$$\frac{d}{dt}\mathbf{V}(t) = \mathbf{u} + \mathbf{a}\mathbf{V}(t). \qquad (2.85)$$

$$0 = \mathbf{u} + \mathbf{a}\mathbf{V}(t), \qquad (2.86)$$
The reward associated with the transition from state 1 to state 2 is r_12 = c_r. There is no reward associated with the
transition from state 2 to state 1, so r_21 = 0.
$$\mathbf{r} = [r_{ij}] = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} = \begin{pmatrix} c_p N_{ic} & c_r \\ 0 & r_{prf} \end{pmatrix},$$

$$\mathbf{a} = [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} -\mu & \mu \\ \lambda & -\lambda \end{pmatrix}.$$
$$\frac{dV_1(t)}{dt} = c_p N_{ic} + c_r - \mu V_1(t) + \mu V_2(t),$$

$$\frac{dV_2(t)}{dt} = r_{prf} + \lambda V_1(t) - \lambda V_2(t).$$
The total expected reward R_T associated with the production line operating
during the time interval [0, T] is equal to the expected reward V_2(T) accumulated up
to time T, given the initial state of the process at time instant t = 0 is state 2.
Using the Laplace–Stieltjes transform under the initial conditions
V_1(0) = V_2(0) = 0, we transform the system of differential equations into the
following system of linear algebraic equations:
$$s\,v_1(s) = \frac{c_p L + c_r}{s} - \mu v_1(s) + \mu v_2(s),$$

$$s\,v_2(s) = \lambda v_1(s) - \lambda v_2(s),$$

from which

$$v_2(s) = \frac{\lambda(c_p L + c_r)}{s^2(s + \lambda + \mu)}.$$
Applying the inverse Laplace transform, one obtains

$$V_2(t) = L^{-1}\{v_2(s)\} = \frac{\lambda(c_p L + c_r)}{(\lambda+\mu)^2}\left[e^{-(\lambda+\mu)t} + (\lambda+\mu)t - 1\right],$$

and the total expected cost accumulated during [0, T] is

$$C_T = V_2(T) = \frac{\lambda(c_p L + c_r)}{(\lambda+\mu)^2}\left[e^{-(\lambda+\mu)T} + (\lambda+\mu)T - 1\right].$$
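The closed-form V2(t) can be checked against direct integration of the reward equations. The parameter values below are illustrative assumptions, with C standing for the combined reward constant c_pL + c_r and r_prf = 0, as in the cost analysis:

```python
import math

# Sketch: verifying the inverse-transform result numerically. lam, mu,
# and C (= c_p*L + c_r) are illustrative assumptions; r_prf = 0.
lam, mu, C = 2.0, 50.0, 1000.0

# Euler integration of dV1/dt = C - mu*V1 + mu*V2, dV2/dt = lam*(V1 - V2)
V1 = V2 = 0.0
h, T = 1e-5, 1.0
for _ in range(int(T / h)):
    d1 = C - mu * V1 + mu * V2
    d2 = lam * (V1 - V2)
    V1 += h * d1
    V2 += h * d2

a = lam + mu
closed = lam * C / a**2 * (math.exp(-a * T) + a * T - 1)
print(abs(V2 - closed) < 1e-2 * closed)  # True: the two values agree
```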
( + )2
For relatively large T the term e ( + )T can be neglected and the following ap-
proximation can be used:
( c p L + cr )
CT T.
+
Therefore, for large T, the total expected reward is a linear function of time and
the coefficient

$$c_{un} = \frac{\lambda(c_p L + c_r)}{\lambda + \mu}$$

defines the annual expected cost associated with production line unreliability. For
the data given in the example, c_un = \$13.14 × 10^6 year^{-1}.
In its general form the Markov reward model was intended to provide economic
and financial calculations. From the preceding subsection and Example 2.5 it is
clear that the Markov reward model is a very useful tool for life cycle cost analy-
sis, and corresponding case studies will be presented in Chapters 6 and 7. How-
ever, it was shown by Lubkov and Stepanyans (1978) and Volik et al. (1988) that
this tool may also be very suitable for reliability analysis and important reliability
measures could be easily found by the corresponding determination of the rewards
in matrix r. In these works it was suggested that demand w is constant. The
method was extended by Lisnianski (2007) to MSSs with variable demand, where
the demand is assumed to be a continuous-time Markov chain with m different
possible states (levels) w_1, …, w_m and corresponding constant transition intensities
given by a matrix b = [b_ij], i, j = 1, 2, …, m. Here we apply this method for MSS
reliability analysis.
In the previous section, the MSS was considered under constant demand. In
practice, this is often not so. An MSS can fall into the set of unacceptable states in
two ways: either through a performance decrease because of failures or through an
increase in demand.
For example, consider the demand variation that is typical for power systems. Usually demand can be represented by a daily demand curve. This curve is cyclic in nature, with a maximum level (peak) during the day and a minimum level at night (Endrenyi 1979; Billinton and Allan 1996). Another example is the number of telephone calls arriving at a telephone station per unit time. In the simplest and most frequently used model, the cyclic demand variation can be approximated by a two-level demand curve, as shown in Figure 2.21 (a).
In this model, the demand is represented as a continuous-time Markov chain with two states: $\mathbf{w} = \{w_1, w_2\}$ [Figure 2.21 (b)], where $w_2$ is the peak level of demand and $w_1$ is the low level. When the cycle time $T_c$ and the mean duration of the peak $t_p$ are known (usually $T_c = 24$ h), the transition intensities of the model can be obtained as

$$\lambda_p = \frac{1}{T_c - t_p}, \qquad \lambda_l = \frac{1}{t_p}, \tag{2.87}$$
where $\lambda_p$ is the transition intensity from the low demand level to the peak level and $\lambda_l$ is the transition intensity from the peak demand level to the low level.
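As a quick numerical check of (2.87): with the typical values $T_c = 24$ h and a mean peak duration of 8 h, the two intensities follow directly. A minimal sketch (the function name is illustrative):

```python
def demand_intensities(T_c, t_p):
    """Transition intensities of the two-level demand model, Equation 2.87.

    T_c : cycle time in hours; t_p : mean duration of the peak in hours.
    Returns (lambda_p, lambda_l) -- the low-to-peak and peak-to-low
    transition intensities, in 1/h.
    """
    lam_p = 1.0 / (T_c - t_p)   # mean duration of the low level is T_c - t_p
    lam_l = 1.0 / t_p
    return lam_p, lam_l

lam_p, lam_l = demand_intensities(T_c=24.0, t_p=8.0)
```

For the 24-h cycle with an 8-h peak this gives $\lambda_p = 1/16$ h$^{-1}$ and $\lambda_l = 1/8$ h$^{-1}$.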
Fig. 2.21 Two-level demand model: (a) approximation of actual demand curve, and (b) state-
transition diagram
In a further extension of the variable demand model, the demand process can be approximated by defining a set of discrete values $\{w_1, w_2, \ldots, w_m\}$ representing different possible demand levels and determining the transition intensities between each pair of demand levels (usually derived from the demand statistics). A realization of the stochastic demand process for a specified period and the corresponding state-space diagram are shown in Figure 2.22. Here $b_{ij}$ is the transition intensity from demand level $w_i$ to demand level $w_j$.
Fig. 2.22 Discrete variable demand: (a) realization of general Markov demand process, and (b)
state-transition diagram for general Markov demand process
So, for the general case we assume that the demand W(t) is also a random process that can take discrete values from the set $\mathbf{w} = \{w_1, \ldots, w_m\}$. The desired relation between the MSS output performance and the demand at any time instant t can be expressed by the acceptability function $\Phi(G(t), W(t))$. The acceptable system states correspond to $\Phi(G(t), W(t)) \ge 0$ and the unacceptable states correspond to $\Phi(G(t), W(t)) < 0$. The last inequality defines the system failure criterion. Usually in power systems the system generating capacity should be equal to or exceed the demand. Therefore, in such cases the acceptability function takes the following form:

$$\Phi(G(t), W(t)) = G(t) - W(t), \tag{2.88}$$

and the acceptable states are those where

$$\Phi(G(t), W(t)) = G(t) - W(t) \ge 0. \tag{2.89}$$
Below we present a general method that has proved very useful for the computation of system reliability measures when MSS output performance and demand are independent discrete-state continuous-time Markov processes.

The performance and demand models can be combined based on the independence of events in these two models: the probabilities of transitions in each model are not affected by the events that occur in the other one. The state-space diagram for the combined m-state demand model and K-state output capacity model is shown in Figure 2.25. Each state in the diagram is labeled by two indices indicating the demand level $w \in \{w_1, \ldots, w_m\}$ and the element performance rate $g \in \{g_1, g_2, \ldots, g_K\}$.
These indices for each state are presented in the lower part of the corresponding circle. The combined model has mK states. Each state corresponds to a unique combination of demand level $w_i$ and element performance $g_j$ and is numbered according to the following rule:

$$z = (i - 1)K + j, \tag{2.90}$$
$$z \sim \{w_i, g_j\}. \tag{2.91}$$

In Figure 2.25 the number of each state is shown in the upper part of the corresponding circle.
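The numbering rule (2.90) and its inverse are straightforward to implement. A minimal sketch using 1-based indices as in the text (function names are illustrative):

```python
def combined_state(i, j, K):
    """Combined state number z for demand level w_i and performance g_j (2.90)."""
    return (i - 1) * K + j

def demand_perf_indices(z, K):
    """Inverse of rule (2.90): recover the pair (i, j) from state number z."""
    return (z - 1) // K + 1, (z - 1) % K + 1
```

For m = K = 3 this reproduces the numbering used in Example 2.6 below, e.g., state 3 is $\{w_1, g_3\}$ and state 7 is $\{w_3, g_1\}$.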
In addition to transitions between states with different performance levels, there are transitions between states with the same performance levels but with different demand levels. All intensities of horizontal transitions are defined by the transition intensities $b_{i,j}$, $i, j = 1, \ldots, m$, of the Markov demand model Ch2, and all intensities of vertical transitions are defined by the transition intensities $a_{i,j}$, $i, j = 1, \ldots, K$, of the performance model Ch1. All other (diagonal) transitions are forbidden. We designate the transition intensity matrix for the combined performance-demand model as $\mathbf{c} = [c_{ij}]$, $i, j = 1, 2, \ldots, mK$.

Thus, the algorithm for building the combined performance-demand model from the separate performance and demand models Ch1 and Ch2 can be presented by the following steps.
Fig. 2.25 State-space diagram of the combined performance-demand model
Algorithm
1. The state-space diagram of a combined performance-demand model is shown in
Figure 2.25, where the nodes represent system states and the arcs represent cor-
responding transitions.
2. The graph consists of mK nodes that should be ordered in K rows and m col-
umns.
3. Each state (node) should be numbered according to rule (2.90).
4. Intensities of all horizontal transitions, i.e., transitions between states $z_1 \sim \{w_i, g_j\}$ and $z_2 \sim \{w_s, g_j\}$ with the same performance level, are defined by the demand model:
$$c_{z_1, z_2} = b_{i,s}. \tag{2.92}$$
5. Intensities of all vertical transitions, i.e., transitions between states $z_1 \sim \{w_i, g_j\}$ and $z_3 \sim \{w_i, g_t\}$ with the same demand level, are defined by the performance model:
$$c_{z_1, z_3} = a_{j,t}. \tag{2.93}$$
6. All diagonal transitions are forbidden, so the corresponding transition intensities in matrix c are zeroed.
The following system of differential equations should be solved under specified initial conditions in order to find the total expected rewards for the combined performance-demand model:

$$\frac{dV_i(t)}{dt} = r_{ii} + \sum_{\substack{j=1 \\ j \ne i}}^{mK} c_{ij} r_{ij} + \sum_{j=1}^{mK} c_{ij} V_j(t), \quad i = 1, \ldots, mK. \tag{2.95}$$
In the most common case, the MSS begins to accumulate rewards after time instant t = 0; therefore, the initial conditions are

$$V_i(0) = 0, \quad i = 1, \ldots, mK. \tag{2.96}$$
If, for example, the state number K (Figure 2.25) with the highest performance
level and the lowest demand level is defined as the initial state, the value VK(t)
should be found as a solution of system (2.95).
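System (2.95) with initial conditions (2.96) is a linear ODE system $dV/dt = u + \mathbf{c}V$, where $u_i = r_{ii} + \sum_{j \ne i} c_{ij} r_{ij}$, so a standard solver handles it directly. The sketch below checks the numerical solution against the closed-form $V_2(T)$ obtained for the two-state production line at the beginning of this section; the rates and cost rate are illustrative numbers:

```python
import numpy as np
from scipy.integrate import solve_ivp

def total_expected_rewards(c, r, T):
    """Solve system (2.95) with initial conditions (2.96): dV/dt = u + c V.

    c : (n, n) transition intensity matrix (zero row sums);
    r : (n, n) reward matrix (r[i, i] per unit time in state i,
        r[i, j] per transition i -> j).
    Returns the vector V(T) of total expected rewards for each initial state.
    """
    n = c.shape[0]
    off = c * r                      # c_ij * r_ij terms
    np.fill_diagonal(off, 0.0)
    u = np.diag(r).astype(float) + off.sum(axis=1)
    sol = solve_ivp(lambda t, V: u + c @ V, (0.0, T), np.zeros(n),
                    rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

# Two-state production line: state 1 down (repair rate mu), state 2 up
# (failure rate lam); the cost rate cpL + cr is accrued while down.
lam, mu, cost, T = 2.0, 20.0, 5.0, 3.0
c = np.array([[-mu, mu], [lam, -lam]])
r = np.array([[cost, 0.0], [0.0, 0.0]])
V = total_expected_rewards(c, r, T)

a = lam + mu
V2_closed = cost * lam / a**2 * (np.exp(-a * T) + a * T - 1.0)
```

Here `V[1]` (the up state as initial state) matches the closed-form expression for $V_2(T)$ derived above.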
In order to find reliability measures for a MSS the specific reward matrix r
should be defined for each measure. Based on the combined performance-demand
model, the theory of the Markov reward processes can be applied for computation
of reliability measures for Markov MSS. As was said above, we assume that de-
mand W(t) and MSS output performance G(t) are mutually independent continu-
ous-time Markov chains.
The MSS average availability $\bar{A}(T)$ is defined as the mean fraction of time when the system resides in the set of acceptable states during the time interval [0, T]:

$$\bar{A}(T) = \frac{1}{T}\int_0^T A(t)\,dt, \tag{2.97}$$

where A(t) is the instantaneous (point) availability, i.e., the probability that the MSS at instant t > 0 is in one of the acceptable states.
As was shown in the previous section, A(t) can be found by solving differential equations (2.35) and summing the probabilities corresponding to all acceptable states. However, based on the Markov reward model, the MSS average availability $\bar{A}(T)$ may be found more easily, without using expression (2.97). For this purpose the rewards in matrix r for the combined performance-demand model should be determined in the following manner:
1. The rewards associated with all acceptable states should be defined as 1.
2. The rewards associated with all unacceptable states should be zeroed, as should all the rewards associated with the transitions.
The mean reward Vi(T) accumulated during the interval [0,T] defines how long
the power system will be in the set of acceptable states in the case where state i is
the initial state. This reward should be found as a solution of system (2.95) under
initial conditions (2.96). After solving (2.95) and finding $V_i(t)$, the MSS average availability can be obtained for each initial state $i = 1, 2, \ldots, mK$:

$$\bar{A}_i(T) = \frac{V_i(T)}{T}. \tag{2.99}$$
Usually the state K with the greatest performance level and minimum demand
level is determined as an initial state.
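This reward recipe is easy to verify on a two-state element, where the average availability has a well-known closed form: with reward 1 in the single acceptable (up) state, $V_{up}(T)/T$ from (2.95) must reproduce it. A sketch with illustrative rates:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Two-state element: state 0 down (repair rate mu), state 1 up (failure
# rate lam).  Reward 1 per unit time in the single acceptable state.
lam, mu, T = 0.5, 4.0, 10.0
c = np.array([[-mu, mu], [lam, -lam]])
u = np.array([0.0, 1.0])             # reward rate 1 in the up state only

sol = solve_ivp(lambda t, V: u + c @ V, (0.0, T), [0.0, 0.0],
                rtol=1e-10, atol=1e-12)
A_avg = sol.y[1, -1] / T             # average availability, as in (2.99)

# Classical closed form of (1/T) * integral of A(t) over [0, T]
a = lam + mu
A_ref = mu / a + lam * (1.0 - np.exp(-a * T)) / (a**2 * T)
```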
The mean number Nfi(T) of MSS failures during the time interval [0, T], if state i
is the initial state, can be treated as a mean number of MSS entrances into the set
of unacceptable states during the time interval [0,T]. For its computation, the re-
wards associated with each transition from the set of acceptable states to the set of
unacceptable states should be defined as 1. All other rewards should be zeroed.
In this case the mean accumulated reward $V_i(T)$, obtained by solving (2.95), provides the mean number of entrances into the unacceptable area during the time interval [0, T]:

$$N_{fi}(T) = V_i(T). \tag{2.100}$$
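The failure-counting rewards can be verified the same way on a two-state element, for which the expected number of failures in [0, T] has a classical closed form. A sketch with illustrative rates:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Two-state element: state 0 down (repair rate mu), state 1 up (failure
# rate lam).  Reward 1 is attached to each up -> down transition, i.e.,
# each entrance into the unacceptable area, so u_up = lam * 1.
lam, mu, T = 0.5, 4.0, 10.0
c = np.array([[-mu, mu], [lam, -lam]])
u = np.array([0.0, lam])

sol = solve_ivp(lambda t, V: u + c @ V, (0.0, T), [0.0, 0.0],
                rtol=1e-10, atol=1e-12)
N_f = sol.y[1, -1]                   # mean number of failures, as in (2.100)

# Closed form: integral of lam * P(up at t) dt for a process starting up
a = lam + mu
N_ref = lam * mu * T / a + (lam / a)**2 * (1.0 - np.exp(-a * T))
```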
When the mean number of system failures is computed, the corresponding frequency of failures, or frequency of entrances into the set of unacceptable states, can be found:

$$f_{fi}(T) = \frac{1}{N_{fi}(T)}. \tag{2.101}$$

The expected accumulated performance deficiency, in the case where state i is the initial state, is obtained as the corresponding mean accumulated reward:

$$EAPD_i = V_i(T) = \int_0^T E\big(W(t) - G(t)\big)\,dt. \tag{2.102}$$
Mean time to failure (MTTF) is the mean time up to the instant when the system enters the subset of unacceptable states for the first time. For its computation the combined performance-demand model should be transformed: all unacceptable states are united into one absorbing state, and the rewards are defined correspondingly. In particular, the reliability function, when state i is the initial state, can be found as

$$R_i(T) = 1 - V_i(T), \quad i = 1, \ldots, K. \tag{2.103}$$
Example 2.6 Consider reliability evaluation for a power system, whose output
generating capacity is represented by a continuous-time Markov chain with three
states. The corresponding capacity levels for states 1, 2, and 3 are $g_1 = 0$, $g_2 = 70$ MW, and $g_3 = 100$ MW, respectively, and the transition intensity matrix is as follows:

$$\mathbf{a} = [a_{ij}] = \begin{pmatrix} -500 & 0 & 500 \\ 0 & -1000 & 1000 \\ 1 & 10 & -11 \end{pmatrix}.$$
Daily peaks w2 and w3 occur twice a week and five times a week, respectively,
and the mean duration of the daily peak is $T_p = 8$ h. The mean duration of the low demand level $w_1 = 0$ is defined as $T_L = 24 - 8 = 16$ h.
According to the approach presented in Endrenyi (1979), which is justified for power systems, the peak duration and the low-level duration are assumed to be exponentially distributed random variables.
The acceptability function is given as $\Phi(G(t), W(t)) = G(t) - W(t)$. Therefore, a failure is treated as an entrance into a state where the acceptability function is negative, i.e., where $G(t) < W(t)$.
Find the mean number of generator entrances into the set of unacceptable states
during the time interval [0,T].
Solution. Markov performance model Ch1 corresponding to the given capacity
levels g1 = 0, g 2 = 70, g3 = 100 and transition intensity matrix a is graphically
shown in Figure 2.27 (a).
Markov demand model Ch2 is shown in Figure 2.27 (b). States 1, 2, and 3 rep-
resent the corresponding demand levels w1 , w2 , and w3 . Transition intensities are
as follows:

$$b_{21} = b_{31} = \frac{1}{T_p} = \frac{1}{8}\ \mathrm{h}^{-1} = 1110\ \mathrm{year}^{-1},$$
$$b_{12} = \frac{2}{7} \cdot \frac{1}{T_L} = \frac{2}{7} \cdot \frac{1}{16} = 0.0179\ \mathrm{h}^{-1} = 156\ \mathrm{year}^{-1},$$
$$b_{13} = \frac{5}{7} \cdot \frac{1}{T_L} = \frac{5}{7} \cdot \frac{1}{16} = 0.0446\ \mathrm{h}^{-1} = 391\ \mathrm{year}^{-1}.$$
Fig. 2.27 Output performance model (a) and demand model (b)
All intensities of horizontal transitions from state $z_1 \sim \{w_i, g_j\}$ to state $z_2 \sim \{w_s, g_j\}$ are defined by the demand transition intensity matrix b:

$$c_{z_1 z_2} = b_{i,s}.$$
All intensities of vertical transitions from state $z_1 \sim \{w_i, g_j\}$ to state $z_3 \sim \{w_i, g_t\}$, $i = 1, 2, 3$, $j, t = 1, 2, 3$, are defined by the capacity transition intensity matrix a:

$$c_{z_1 z_3} = a_{j,t}.$$
The transition intensity matrix of the combined performance-demand model is then

$$\mathbf{c} = [c_{ij}] = \begin{pmatrix}
x_1 & 0 & a_{1,3} & 0 & 0 & 0 & b_{1,3} & 0 & 0 \\
0 & x_2 & a_{2,3} & 0 & 0 & 0 & 0 & b_{1,3} & 0 \\
a_{3,1} & a_{3,2} & x_3 & 0 & 0 & 0 & 0 & 0 & b_{1,3} \\
0 & 0 & 0 & x_4 & 0 & a_{1,3} & b_{2,3} & 0 & 0 \\
0 & 0 & 0 & 0 & x_5 & a_{2,3} & 0 & b_{2,3} & 0 \\
0 & 0 & 0 & a_{3,1} & a_{3,2} & x_6 & 0 & 0 & b_{2,3} \\
b_{3,1} & 0 & 0 & b_{3,2} & 0 & 0 & x_7 & 0 & a_{1,3} \\
0 & b_{3,1} & 0 & 0 & b_{3,2} & 0 & 0 & x_8 & a_{2,3} \\
0 & 0 & b_{3,1} & 0 & 0 & b_{3,2} & a_{3,1} & a_{3,2} & x_9
\end{pmatrix},$$

where the diagonal elements are defined so that each row sums to zero:

$$x_1 = -(a_{1,3} + b_{1,3}), \quad x_2 = -(a_{2,3} + b_{1,3}), \quad x_3 = -(a_{3,1} + a_{3,2} + b_{1,3}),$$
$$x_4 = -(a_{1,3} + b_{2,3}), \quad x_5 = -(a_{2,3} + b_{2,3}), \quad x_6 = -(a_{3,1} + a_{3,2} + b_{2,3}),$$
$$x_7 = -(a_{1,3} + b_{3,1} + b_{3,2}), \quad x_8 = -(a_{2,3} + b_{3,1} + b_{3,2}), \quad x_9 = -(a_{3,1} + a_{3,2} + b_{3,1} + b_{3,2}).$$
The state with the maximum performance $g_3 = 100$ MW and the minimum demand $w_1 = 0$ (state 3) is given as the initial state. In states 2, 5, and 8 the MSS performance is 70 MW; in states 3, 6, and 9 it is 100 MW; and in states 1, 4, and 7 it is 0. In states 4, 7, and 8 the MSS performance is lower than the demand. These states are unacceptable and have a performance deficiency:

$$D_4 = w_2 - g_1 = 60\ \mathrm{MW}, \quad D_7 = w_3 - g_1 = 90\ \mathrm{MW}, \quad D_8 = w_3 - g_2 = 20\ \mathrm{MW}.$$

States 1, 2, 3, 5, 6, and 9 constitute the set of acceptable states.
In order to find the mean number of failures, the reward matrix should be defined according to the suggested method. Each reward associated with a transition from the set of acceptable states to the set of unacceptable states should be defined as 1. All other rewards should be zeroed. Therefore, in the reward matrix $r_{17} = r_{28} = r_{58} = r_{64} = r_{97} = r_{98} = 1$ and all other rewards are zeros. So, reward matrix r is obtained:
$$\mathbf{r} = [r_{ij}] = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}.$$
The corresponding system of differential equations is as follows:
$$\frac{dV_1(t)}{dt} = b_{1,3} - (a_{1,3} + b_{1,3})V_1(t) + a_{1,3}V_3(t) + b_{1,3}V_7(t),$$
$$\frac{dV_2(t)}{dt} = b_{1,3} - (a_{2,3} + b_{1,3})V_2(t) + a_{2,3}V_3(t) + b_{1,3}V_8(t),$$
$$\frac{dV_3(t)}{dt} = a_{3,1}V_1(t) + a_{3,2}V_2(t) - (a_{3,1} + a_{3,2} + b_{1,3})V_3(t) + b_{1,3}V_9(t),$$
$$\frac{dV_4(t)}{dt} = -(a_{1,3} + b_{2,3})V_4(t) + a_{1,3}V_6(t) + b_{2,3}V_7(t),$$
$$\frac{dV_5(t)}{dt} = b_{2,3} - (a_{2,3} + b_{2,3})V_5(t) + a_{2,3}V_6(t) + b_{2,3}V_8(t),$$
$$\frac{dV_6(t)}{dt} = a_{3,1} + a_{3,1}V_4(t) + a_{3,2}V_5(t) - (a_{3,1} + a_{3,2} + b_{2,3})V_6(t) + b_{2,3}V_9(t),$$
$$\frac{dV_7(t)}{dt} = b_{3,1}V_1(t) + b_{3,2}V_4(t) - (a_{1,3} + b_{3,1} + b_{3,2})V_7(t) + a_{1,3}V_9(t),$$
$$\frac{dV_8(t)}{dt} = b_{3,1}V_2(t) + b_{3,2}V_5(t) - (a_{2,3} + b_{3,1} + b_{3,2})V_8(t) + a_{2,3}V_9(t),$$
$$\frac{dV_9(t)}{dt} = a_{3,1} + a_{3,2} + b_{3,1}V_3(t) + b_{3,2}V_6(t) + a_{3,1}V_7(t) + a_{3,2}V_8(t) - (a_{3,1} + a_{3,2} + b_{3,1} + b_{3,2})V_9(t).$$
By solving this system of differential equations under the initial conditions $V_i(0) = 0$, $i = 1, \ldots, 9$, all expected rewards $V_i(t)$, $i = 1, \ldots, 9$, can be found as functions of time t.
The state K = 3, in which the system has a maximum capacity level and a
minimum demand, is given as the initial state. Then, according to expression
(2.100) the value V3(T) is treated as the mean number of system entrances into the area of unacceptable states, or the mean number of power system failures, during the time interval [0, T]. The function $N_{f3}(t) = V_3(t)$ is graphically presented in Figure 2.29, where $N_{f3}(t)$ is the mean number of system failures when state 3 is the initial state.

The function $N_{f1}(t) = V_1(t)$ characterizes the mean number of system failures when state 1 is given as the initial state. It is also presented in this figure. As shown, $N_{f3}(t) < N_{f1}(t)$, because state 1 is closer to the set of unacceptable states: it has a direct transition into the unacceptable area, while state 3 does not. Therefore, at the beginning of the process the system's entrance into the set of unacceptable states is more likely from state 1 than from state 3. Figure 2.29 (a) graphically represents the number of power system failures for a short period of only 8 d. After this short period the function $N_{f3}(t)$ grows almost linearly with time [Figure 2.29 (b)].
2.5 Semi-Markov Models 99
Fig. 2.29 Mean number of generator entrances to the set of unacceptable states: (a) short time
period, and (b) 1 year time period
According to (2.101), the frequency of the power system failures can be obtained:

$$f_{f3} = \frac{1}{N_{f3}} = 0.0076\ \mathrm{year}^{-1}.$$
In order to define a semi-Markov process, consider a system that at any time instant $t \ge 0$ can be in one of various possible states $g_1, g_2, \ldots, g_K$. The system behavior is defined by the discrete-state continuous-time stochastic performance process $G(t) \in \{g_1, g_2, \ldots, g_K\}$. We assume that the initial state i of the system and the one-step transition probabilities are given as follows:

$$G(0) = g_i, \quad i \in \{1, \ldots, K\},$$
$$\theta_{jk} = \Pr\left\{G(t_m) = g_k \mid G(t_{m-1}) = g_j\right\}, \quad j, k \in \{1, \ldots, K\}. \tag{2.104}$$

Here $\theta_{jk}$ is the probability that the system will transit from state j with performance rate $g_j$ to state k with performance rate $g_k$. The probabilities $\theta_{jk}$, $j, k \in \{1, \ldots, K\}$, define the one-step transition probability matrix $\boldsymbol{\theta} = [\theta_{jk}]$ for the discrete-time chain $G(t_m)$, where transitions from one state to another may happen only at discrete time moments $t_1, t_2, \ldots, t_{m-1}, t_m, \ldots$. Such a Markov chain $G(t_m)$ is called a Markov chain embedded in the stochastic process G(t), or an embedded Markov chain for short.
To each $\theta_{jk} \ne 0$ there corresponds a random variable $T^*_{jk}$ with cumulative distribution function $F^*_{jk}(t)$ and probability density function $f^*_{jk}(t)$. This random variable is called a conditional sojourn time in state j and characterizes the system sojourn time in state j under the condition that the system transits from state j to state k.
The graphical interpretation of a possible realization of the considered process is shown in Figure 2.30. At the initial time instant $G(0) = g_i$. The process transits to state j (with performance rate $g_j$) from the initial state i with probability $\theta_{ij}$. Therefore, if the next state is state j, the process remains in state i during a random time $T^*_{ij}$ with cdf $F^*_{ij}(t)$. When the process transits to state j, the probability of the transition from this state to any state k is $\theta_{jk}$. If the system transits from state j to state k, it remains in state j during a random time $T^*_{jk}$ with cdf $F^*_{jk}(t)$ up to the transition to state k.
When the sojourn times in different states are taken into account, the process does not have Markov properties. (It remains a Markov process only if all the sojourn times are distributed exponentially.) Therefore, the process can be considered a Markov process only at the time instants of transitions. This explains why the process is named semi-Markov.
The most general definition of the semi-Markov process is based on the kernel matrix Q(t). Each element $Q_{ij}(t)$ of this matrix determines the probability that a one-step transition from state i to state j occurs during the time interval [0, t]. Using the kernel matrix, the one-step transition probabilities for the embedded Markov chain can be obtained as

$$\theta_{ij} = \lim_{t \to \infty} Q_{ij}(t), \tag{2.106}$$

and the cdf $F^*_{ij}(t)$ of the conditional sojourn time in state i can be obtained as

$$F^*_{ij}(t) = \frac{1}{\theta_{ij}}\, Q_{ij}(t). \tag{2.107}$$
The cdf of the unconditional sojourn time $T_i$ in state i is then

$$F_i(t) = \sum_{j=1}^{K} Q_{ij}(t) = \sum_{j=1}^{K} \theta_{ij} F^*_{ij}(t). \tag{2.108}$$
Hence, for the pdf of the unconditional sojourn time in state i with performance rate $g_i$, we can write

$$f_i(t) = \frac{d}{dt} F_i(t) = \sum_{j=1}^{K} \theta_{ij} f^*_{ij}(t). \tag{2.109}$$
Based on (2.109), the mean unconditional sojourn time in state i can be obtained as

$$T_i = \int_0^{\infty} t f_i(t)\,dt = \sum_{j=1}^{K} \theta_{ij} T^*_{ij}, \tag{2.110}$$

where $T^*_{ij}$ is the mean conditional sojourn time in state i given that the system transits from state i to state j.
Kernel matrix Q(t) and the initial state completely define the stochastic behav-
ior of a semi-Markov process.
In practice, when MSS reliability is studied, in order to find the kernel matrix
for a semi-Markov process, one can use the following considerations (Lisnianski
and Yeager 2000). Transitions between different states are usually executed as
consequences of such events as failures, repairs, inspections, etc. For every type of
event, the cdf of time between them is known. The transition is realized according
to the event that occurs first in a competition among the events.
In Figure 2.31, one can see a state-transition diagram for the simplest semi-
Markov process with three possible transitions from initial state 0. The process
will transit from state 0 to states 1, 2, and 3 when events of types 1, 2, and 3, respectively, occur. The time between events of type 1 is a random variable T0,1 distributed according to cdf F0,1(t). If an event of type 1 occurs first, the
process transits from state 0 to state 1. The random variable T0,2 that defines the
time between events of type 2 is distributed according to cdf F0,2(t). If an event of
type 2 occurs earlier than other events, the process transits from state 0 to state 2.
The time between events of type 3 is random variable T0,3 distributed according
to cdf F0,3(t). If an event of type 3 occurs first, the process transits from state 0 to
state 3.
The probability $Q_{01}(t)$ that the process will transit from state 0 to state 1 up to time t (the initial time is t = 0) may be determined as the probability that the random variable $T_{0,1}$ satisfies $T_{0,1} \le t$ and is less than the variables $T_{0,2}$ and $T_{0,3}$. Hence, we have

$$Q_{01}(t) = \int_0^t \left[1 - F_{0,2}(u)\right]\left[1 - F_{0,3}(u)\right] dF_{0,1}(u), \tag{2.111}$$
$$Q_{02}(t) = \int_0^t \left[1 - F_{0,1}(u)\right]\left[1 - F_{0,3}(u)\right] dF_{0,2}(u), \tag{2.112}$$
$$Q_{03}(t) = \int_0^t \left[1 - F_{0,1}(u)\right]\left[1 - F_{0,2}(u)\right] dF_{0,3}(u). \tag{2.113}$$
Example 2.7 Assume that the times between events of types 1 and 2 are distributed exponentially, so that $F_{0,1}(t) = 1 - e^{-\lambda_{0,1} t}$ and $F_{0,2}(t) = 1 - e^{-\lambda_{0,2} t}$, and that

$$F_{0,3}(t) = \begin{cases} 0, & \text{if } t < T_c, \\ 1, & \text{if } t \ge T_c \end{cases}$$

(such a cdf corresponds to the arrival of events with constant period $T_c$).
Find:
1. one-step transition probabilities Q01(t), Q02(t), Q03(t) for the kernel matrix;
2. cumulative distribution function for unconditional sojourn time T0 in state 0;
3. one-step transition probabilities for the embedded Markov chain.
Solution. Using (2.111)–(2.113) we obtain the one-step probabilities for the kernel matrix:
$$Q_{01}(t) = \begin{cases} \dfrac{\lambda_{0,1}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})t}\right], & \text{if } t < T_c, \\[2mm] \dfrac{\lambda_{0,1}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right], & \text{if } t \ge T_c, \end{cases}$$

$$Q_{02}(t) = \begin{cases} \dfrac{\lambda_{0,2}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})t}\right], & \text{if } t < T_c, \\[2mm] \dfrac{\lambda_{0,2}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right], & \text{if } t \ge T_c, \end{cases}$$

$$Q_{03}(t) = \begin{cases} 0, & \text{if } t < T_c, \\ e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}, & \text{if } t \ge T_c. \end{cases}$$

The cdf of the unconditional sojourn time $T_0$ in state 0 is

$$F_0(t) = \sum_{j=1}^{3} Q_{0j}(t) = \begin{cases} 1 - e^{-(\lambda_{0,1} + \lambda_{0,2})t}, & \text{if } t < T_c, \\ 1, & \text{if } t \ge T_c. \end{cases}$$
The one-step transition probabilities for the embedded Markov chain are defined according to (2.106):

$$\theta_{01} = \frac{\lambda_{0,1}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right],$$
$$\theta_{02} = \frac{\lambda_{0,2}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right],$$
$$\theta_{03} = e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}.$$
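The one-step probabilities above can be cross-checked by evaluating the integral (2.111) numerically; for $t \ge T_c$ the integral over $[0, T_c)$ must reproduce the closed form. A sketch with illustrative rates:

```python
import numpy as np

# Illustrative rates and period (not taken from the text)
lam01, lam02, Tc = 0.3, 0.7, 2.0

# Numerical evaluation of (2.111) for t >= Tc: on [0, Tc) we have
# 1 - F02(u) = exp(-lam02 u), 1 - F03(u) = 1, and
# dF01(u) = lam01 exp(-lam01 u) du.  Trapezoidal rule on a fine grid.
u = np.linspace(0.0, Tc, 20001)
integrand = np.exp(-lam02 * u) * lam01 * np.exp(-lam01 * u)
du = np.diff(u)
theta01_num = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * du))

# Closed form obtained above
s = lam01 + lam02
theta01 = lam01 / s * (1.0 - np.exp(-s * Tc))
```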
In order to find the MSS reliability indices, the system state-space diagram should
be built as was done in previous sections for Markov processes. The only differ-
ence is that, in the case of the semi-Markov model, the transition times may be
distributed arbitrarily. Based on transition time distributions Fi, j (t ), the kernel
matrix Q(t) should be defined according to the method presented in the previous
section.
106 2 Modern Stochastic Process Methods for Multi-state System Reliability Assessment
The main problem of semi-Markov process analysis is to find the state probabilities. Let $\pi_{ij}(t)$ be the probability that the process that starts in initial state i at instant t = 0 will be in state j at instant t. It was shown that the probabilities $\pi_{ij}(t)$, $i, j \in \{1, \ldots, K\}$, can be found from the solution of the following system of integral equations:

$$\pi_{ij}(t) = \delta_{ij}\left[1 - F_i(t)\right] + \sum_{k=1}^{K} \int_0^t q_{ik}(\tau)\,\pi_{kj}(t - \tau)\,d\tau, \tag{2.115}$$

where

$$q_{ik}(\tau) = \frac{dQ_{ik}(\tau)}{d\tau}, \tag{2.116}$$

$$F_i(t) = \sum_{j=1}^{K} Q_{ij}(t), \tag{2.117}$$

$$\delta_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \ne j. \end{cases} \tag{2.118}$$
The system of linear integral equations (2.115) is the main system in the theory of semi-Markov processes. By solving this system, one can find all the probabilities $\pi_{ij}(t)$, $i, j \in \{1, \ldots, K\}$, for a semi-Markov process with a given kernel matrix $\mathbf{Q}(t) = [Q_{ij}(t)]$ and a given initial state.
Based on the probabilities $\pi_{ij}(t)$, $i, j \in \{1, \ldots, K\}$, important reliability indices can easily be found. Suppose that the system states are ordered according to their performance rates, $g_K \ge g_{K-1} \ge \ldots \ge g_2 \ge g_1$, and the constant demand satisfies $g_m \ge w > g_{m-1}$. State K with performance rate $g_K$ is the initial state. In this case the system instantaneous availability is treated as the probability that a system starting at instant t = 0 from state K will be at instant $t \ge 0$ in any of the states $g_K, \ldots, g_m$. Hence, we obtain

$$A(t, w) = \sum_{i=m}^{K} \pi_{Ki}(t). \tag{2.119}$$
The mean system instantaneous output performance and the mean instantaneous performance deficiency can be obtained, respectively, as

$$E_t = \sum_{i=1}^{K} g_i \pi_{Ki}(t) \tag{2.120}$$

and

$$D_t(w) = \sum_{i=1}^{m-1} (w - g_i)\,\pi_{Ki}(t)\,\mathbf{1}(w > g_i). \tag{2.121}$$
In the general case, the system of integral equations (2.115) can be solved only by numerical methods. For some of the simplest cases the method of the Laplace–Stieltjes transform can be applied in order to derive an analytical solution of the system. As was done for Markov models, we designate the Laplace–Stieltjes transform of a function f(x) as

$$\tilde{f}(s) = L\{f(x)\} = \int_0^{\infty} e^{-sx} f(x)\,dx. \tag{2.122}$$
Applying this transform to system (2.115), we obtain

$$\tilde{\pi}_{ij}(s) = \delta_{ij}\tilde{h}_i(s) + \sum_{k=1}^{K} \theta_{ik}\tilde{f}^*_{ik}(s)\,\tilde{\pi}_{kj}(s), \quad 1 \le i, j \le K, \tag{2.123}$$

where

$$h_i(t) = 1 - F_i(t) = \int_t^{\infty} f_i(u)\,du = \Pr\{T_i > t\} \tag{2.124}$$

and, therefore,

$$\tilde{h}_i(s) = \frac{1}{s}\left[1 - \tilde{f}_i(s)\right]. \tag{2.125}$$
As $t \to \infty$, the probabilities $\pi_{ij}(t)$ tend to steady-state probabilities that do not depend on the initial state i, so for their designation one can use only one index: $\pi_j$. It is proven that
$$\pi_j = \frac{p_j T_j}{\sum_{j=1}^{K} p_j T_j}, \tag{2.126}$$

where $p_j$, $j = 1, \ldots, K$, are the steady-state probabilities of the embedded Markov chain, which can be found from the system

$$p_j = \sum_{i=1}^{K} p_i \theta_{ij}, \quad j = 1, \ldots, K, \qquad \sum_{i=1}^{K} p_i = 1. \tag{2.127}$$
Note that the first K equations in (2.127) are linearly dependent, and we cannot solve the system without the last equation $\sum_{i=1}^{K} p_i = 1$.
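Solving (2.126)–(2.127) is a small linear-algebra task: replace one of the (linearly dependent) balance equations by the normalization condition. A sketch with an illustrative two-state embedded chain:

```python
import numpy as np

def semi_markov_steady_state(theta, T):
    """Steady-state probabilities pi_j from (2.126)-(2.127).

    theta : (K, K) one-step matrix of the embedded Markov chain;
    T     : (K,) mean unconditional sojourn times.
    """
    K = theta.shape[0]
    # Stationary equations p = p theta, with the last equation replaced
    # by the normalization condition sum(p) = 1.
    A = (np.eye(K) - theta).T
    A[-1, :] = 1.0
    rhs = np.zeros(K)
    rhs[-1] = 1.0
    p = np.linalg.solve(A, rhs)
    w = p * np.asarray(T, dtype=float)
    return w / w.sum()

# Illustrative alternating two-state chain with sojourn times 1 h and 3 h
theta = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
pi = semi_markov_steady_state(theta, [1.0, 3.0])
```

Here the element spends 1/4 of the time in state 1 and 3/4 in state 2, as (2.126) predicts.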
In order to find the reliability function, an additional semi-Markov model should be built in analogy with the corresponding Markov models: all states corresponding to performance rates lower than the constant demand w should be united in one absorbing state with the number 0. All transitions that return the system from this absorbing state should be forbidden. The reliability function is obtained from this new model as $R(w, t) = 1 - \pi_{K0}(t)$.
Example 2.8 (Lisnianski and Levitin 2003). Consider an electric generator that has four possible performance (generating capacity) levels: $g_4 = 100$ MW, $g_3 = 70$ MW, $g_2 = 50$ MW, and $g_1 = 0$. The constant demand is w = 60 MW. The best state, with performance rate $g_4 = 100$ MW, is the initial state. Only minor failures and minor repairs are possible. Times to failures are distributed exponentially with parameters $\lambda_{3,2} = 5 \cdot 10^{-4}$ h$^{-1}$ and $\lambda_{2,1} = 2 \cdot 10^{-4}$ h$^{-1}$. Hence, the times to failures $T_{4,3}$, $T_{3,2}$, $T_{2,1}$ are random variables distributed according to the corresponding cdf:

$$F_{4,3}(t) = 1 - e^{-\lambda_{4,3} t}, \quad F_{3,2}(t) = 1 - e^{-\lambda_{3,2} t}, \quad F_{2,1}(t) = 1 - e^{-\lambda_{2,1} t}.$$
Repair times are normally distributed: $T_{3,4}$ has a mean time to repair $\bar{T}_{3,4} = 240$ h and a standard deviation $\sigma_{3,4} = 16$ h; $T_{2,3}$ has a mean time to repair $\bar{T}_{2,3} = 480$ h and standard deviation $\sigma_{2,3} = 48$ h; $T_{1,2}$ has a mean time to repair $\bar{T}_{1,2} = 720$ h and standard deviation $\sigma_{1,2} = 120$ h. Hence, the cdf of the random variables $T_{3,4}$, $T_{2,3}$, and $T_{1,2}$ are, respectively:
$$F_{3,4}(t) = \frac{1}{\sqrt{2\pi}\,\sigma_{3,4}} \int_0^t \exp\left\{-\frac{(u - \bar{T}_{3,4})^2}{2\sigma_{3,4}^2}\right\} du,$$

$$F_{2,3}(t) = \frac{1}{\sqrt{2\pi}\,\sigma_{2,3}} \int_0^t \exp\left\{-\frac{(u - \bar{T}_{2,3})^2}{2\sigma_{2,3}^2}\right\} du,$$

$$F_{1,2}(t) = \frac{1}{\sqrt{2\pi}\,\sigma_{1,2}} \int_0^t \exp\left\{-\frac{(u - \bar{T}_{1,2})^2}{2\sigma_{1,2}^2}\right\} du.$$
Fig. 2.32 Generator representation by stochastic process: (a) generator evolution in the state
space, and (b) semi-Markov model
State 4 is an initial state with generating capacity g4. After the failure, which
occurs according to distribution F4,3(t), the generator transits from state 4 to state 3
with reduced generating capacity g3.
If a random repair time in state 3, which is distributed according to CDF
F3,4(t), is lower than the time up to the failure in state 3, which is distributed ac-
cording to F3,2(t), the generator will come back to state 4. If the repair time is
greater than the time up to the failure in state 3, the generator will fall down to
state 2 with generating capacity g2.
If the random repair time in state 2, which is distributed according to cdf F2,3(t), is lower than the time up to the failure in state 2, which is distributed according to F2,1(t), the generator will come back to state 3. If the repair time is
greater than the time up to the failure in state 2, the generator will fall down to
state 1 with generating capacity g1.
In state 1 after repair time, which is distributed according to F1,2 ( t ) , the genera-
tor will come back to state 2.
Based on (2.111)–(2.113), we obtain the following kernel matrix $\mathbf{Q}(t) = [Q_{ij}(t)]$, $i, j = 1, 2, 3, 4$:

$$\mathbf{Q}(t) = \begin{pmatrix} 0 & Q_{12}(t) & 0 & 0 \\ Q_{21}(t) & 0 & Q_{23}(t) & 0 \\ 0 & Q_{32}(t) & 0 & Q_{34}(t) \\ 0 & 0 & Q_{43}(t) & 0 \end{pmatrix},$$

in which

$$Q_{12}(t) = F_{1,2}(t), \qquad Q_{21}(t) = \int_0^t \left[1 - F_{2,3}(u)\right] dF_{2,1}(u),$$
$$Q_{23}(t) = \int_0^t \left[1 - F_{2,1}(u)\right] dF_{2,3}(u), \qquad Q_{32}(t) = \int_0^t \left[1 - F_{3,4}(u)\right] dF_{3,2}(u),$$
$$Q_{34}(t) = \int_0^t \left[1 - F_{3,2}(u)\right] dF_{3,4}(u), \qquad Q_{43}(t) = F_{4,3}(t).$$
According to (2.106), the one-step transition probabilities of the embedded Markov chain are

$$\theta_{12} = F_{1,2}(\infty) = 1, \quad \theta_{21} = \int_0^{\infty} \left[1 - F_{2,3}(u)\right] dF_{2,1}(u), \quad \theta_{23} = \int_0^{\infty} \left[1 - F_{2,1}(u)\right] dF_{2,3}(u),$$
$$\theta_{32} = \int_0^{\infty} \left[1 - F_{3,4}(u)\right] dF_{3,2}(u), \quad \theta_{34} = \int_0^{\infty} \left[1 - F_{3,2}(u)\right] dF_{3,4}(u), \quad \theta_{43} = F_{4,3}(\infty) = 1.$$
Hence, the one-step transition probability matrix of the embedded Markov chain is

$$\boldsymbol{\theta} = \lim_{t \to \infty} \mathbf{Q}(t) = \begin{pmatrix} 0 & \theta_{12} & 0 & 0 \\ \theta_{21} & 0 & \theta_{23} & 0 \\ 0 & \theta_{32} & 0 & \theta_{34} \\ 0 & 0 & \theta_{43} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0.0910 & 0 & 0.9090 & 0 \\ 0 & 0.1131 & 0 & 0.8869 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
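The numerical entries of this matrix can be reproduced by evaluating the integrals directly; for instance, $\theta_{34} = \int_0^{\infty} [1 - F_{3,2}(t)]\,dF_{3,4}(t)$ with the exponential $F_{3,2}$ and the normal repair-time cdf $F_{3,4}$ given above. A sketch using a plain trapezoidal rule:

```python
import numpy as np

lam32, mean34, sd34 = 5e-4, 240.0, 16.0   # lambda_3,2 (1/h); repair mean, sd (h)

# theta_34 = integral over t of [1 - F32(t)] * f34(t) dt, with f34 the
# normal pdf of the repair time T_34; the normal mass below t = 0 and
# beyond mean + 10 sd is negligible here.
t = np.linspace(0.0, mean34 + 10 * sd34, 40001)
f34 = np.exp(-0.5 * ((t - mean34) / sd34) ** 2) / (np.sqrt(2 * np.pi) * sd34)
integrand = np.exp(-lam32 * t) * f34
dt = np.diff(t)
theta34 = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dt))
theta32 = 1.0 - theta34
```

This reproduces the printed values $\theta_{34} = 0.8869$ and $\theta_{32} = 0.1131$.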
According to (2.127), the steady-state probabilities of the embedded Markov chain satisfy

$$p_1 = \theta_{21} p_2, \quad p_2 = \theta_{12} p_1 + \theta_{32} p_3, \quad p_3 = \theta_{23} p_2 + \theta_{43} p_4, \quad p_4 = \theta_{34} p_3,$$
$$p_1 + p_2 + p_3 + p_4 = 1.$$
Solving this system and applying (2.126), we obtain the steady-state probabilities

$$\pi_1 = \frac{p_1 T_1}{\sum_{j=1}^{4} p_j T_j} = 0.0069, \qquad \pi_2 = \frac{p_2 T_2}{\sum_{j=1}^{4} p_j T_j} = 0.0484,$$
$$\pi_3 = \frac{p_3 T_3}{\sum_{j=1}^{4} p_j T_j} = 0.1919, \qquad \pi_4 = \frac{p_4 T_4}{\sum_{j=1}^{4} p_j T_j} = 0.7528.$$
The steady-state availability of the generator for the given constant demand is

$$A(w) = \pi_3 + \pi_4 = 0.9447.$$
The mean steady-state output performance is

$$E = \sum_{k=1}^{4} g_k \pi_k = 91.13\ \mathrm{MW},$$

and the mean steady-state performance deficiency is

$$D = (w - g_2)\pi_2 + (w - g_1)\pi_1 = 0.50\ \mathrm{MW}.$$
In order to find the reliability function for the given constant demand
w = 60 MW, we unite states 1 and 2 into one absorbing state 0. The modified
graphical representation of the system evolution in the state space for this case is
shown in Figure 2.33 (a). Figure 2.33 (b) shows the state-space diagram for the corresponding semi-Markov process.
Fig. 2.33 State-transition diagrams for evaluating reliability function of generator: (a) evolution
in modified state space, and (b) semi-Markov model
As in the previous case, we define the kernel matrix for the corresponding semi-Markov process based on expressions (2.111)–(2.113):

$$\mathbf{Q}(t) = \begin{pmatrix} 0 & 0 & 0 \\ Q_{30}(t) & 0 & Q_{34}(t) \\ 0 & Q_{43}(t) & 0 \end{pmatrix},$$

where

$$Q_{30}(t) = \int_0^t \left[1 - F_{3,4}(u)\right] dF_{3,1}(u), \quad Q_{34}(t) = \int_0^t \left[1 - F_{3,1}(u)\right] dF_{3,4}(u), \quad Q_{43}(t) = F_{4,3}(t).$$
The reliability function can be found from this model as $R(w, t) = 1 - \pi_{40}(t)$, where the probabilities satisfy the following system of integral equations obtained from (2.115):
$$\pi_{40}(t) = \int_0^t q_{43}(\tau)\,\pi_{30}(t - \tau)\,d\tau,$$
$$\pi_{30}(t) = \int_0^t q_{34}(\tau)\,\pi_{40}(t - \tau)\,d\tau + \int_0^t q_{30}(\tau)\,\pi_{00}(t - \tau)\,d\tau,$$
$$\pi_{00}(t) = 1.$$
Fig. 2.34 Reliability function of generator
References
Trivedi K (2002) Probability and statistics with reliability, queuing and computer science appli-
cations. Wiley, New York
Volik B, Buyanov B, Lubkov N, Maximov V, Stepanyants A (1988) Methods of analysis and
synthesis of control systems structures. Energoatomizdat, Moscow (in Russian)
3 Statistical Analysis of Reliability Data for
Multi-state Systems
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < +\infty,$$

the parameter set is $\{\theta_1, \theta_2\} = \{\mu, \sigma\}$.
There will then always be an infinite number of functions of the sample values, called statistics, which may be proposed to estimate one or more of the parameters. Formally, a statistic $S = S(X)$ is any function of X. The statistic (as a function) is called an estimator, while its numerical value is called an estimate.

Evidently the best estimate would be one that falls nearest to the true value of the parameter to be estimated. In other words, the statistic whose distribution concentrates as closely as possible near the true value of the parameter may be regarded as the best estimate. Hence, the basic problem of estimation in the above case can be formulated as follows: determine the functions of the sample observations such that their distribution is concentrated as closely as possible near the true value of the parameter. The estimating functions are then referred to as estimators.

Several properties of estimators are of interest to engineers. The concepts that are widely used, and sometimes misunderstood, include consistency, unbiasedness, efficiency, and sufficiency.
An estimator $\hat{\theta}_n$ is called consistent if it converges in probability to the true value of the parameter:

$$\lim_{n \to \infty} \Pr\left\{\left|\hat{\theta}_n - \theta\right| > \varepsilon\right\} = 0 \quad \text{for any } \varepsilon > 0. \tag{3.1}$$

An estimator is called unbiased if

$$E\left\{\hat{\theta}_n\right\} = \theta, \tag{3.2}$$

and the bias of an estimator is defined as

$$b = E\left\{\hat{\theta}_n\right\} - \theta. \tag{3.3}$$
An estimator $\hat{\theta}_1$ is said to be more efficient than another estimator, $\hat{\theta}_2$, if $\mathrm{Var}\{\hat{\theta}_1\} < \mathrm{Var}\{\hat{\theta}_2\}$, where $\mathrm{Var}\{\cdot\}$ is the variance. If in a class of consistent estimators for a parameter there exists one whose sampling variance is less than that of any other estimator, it is called the most efficient estimator. Whenever such an estimator exists, it provides a criterion for the measurement of the efficiency of the other estimators.
If $\hat{\theta}_1$ is the most efficient estimator, with variance $V_1$, and $\hat{\theta}_2$ is any other estimator, with variance $V_2$, then the efficiency of $\hat{\theta}_2$ is defined as

$$E = \frac{V_1}{V_2}. \tag{3.4}$$
An estimator is called sufficient if no other estimator computed from the same sample can provide additional information about the parameter.
Point and interval estimation are the two basic kinds of estimation procedures considered in statistics. Point estimation provides a single number, obtained on the basis of a data set (a sample), that represents a parameter of the distribution function or another characteristic of the underlying random variable of interest. A point estimate does not provide any information about its accuracy. As opposed to point estimation, interval estimation is expressed in terms of confidence intervals, and a confidence interval includes the true value of the parameter with a specified confidence probability.
Several methods of point estimation are considered in mathematical statistics.
In this subsection, two of the most common methods, i.e., the method of moments
and the method of maximum likelihood, are briefly described.
The method of moments is an estimation procedure based on empirically estimated moments (sample moments) of the random variable. We assume that the sample $\{x_1, x_2, ..., x_n\}$ was obtained by n observations of a continuous random variable X. Naturally, one can define the sample mean and sample variance (the first and the second moments) as the respective expected values of the sample of size n as follows:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad (3.5)$$

and

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \quad (3.6)$$
Then $\bar{x}$ and $S^2$ can be used as the point estimates of the distribution mean $\mu$ and variance $\sigma^2$. It should be mentioned that the estimator of variance (3.6) is biased, since $\bar{x}$ is estimated from the same sample. However, this bias can be removed by multiplying it by $n/(n-1)$:
3.1 Basic Concepts of Statistical Estimation Theory 121
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \quad (3.7)$$
Then, according to the method of moments, the sample moments are equated to
the corresponding distribution moments. The solutions of the equations obtained
provide the estimators of the distribution parameters. Estimates obtained by the
method of moments are always consistent, but they may not be efficient.
In order to illustrate the method of moments we consider the following example.
Example 3.1 We assume there is a sample $\{x_1, x_2, ..., x_n\}$ that was taken from the uniform distribution whose density function is given by

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

The mean and variance of this distribution are

$$\mu = \frac{b+a}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}.$$
On the other hand, based on the sample { x1 , x2 ,..., xn } its mean and variance
can be estimated by using (3.5) and (3.7):
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2.$$
Thus, according to the method of moments, one will have the following two equations:
122 3 Statistical Analysis of Reliability Data for Multi-state Systems
$$\hat{\mu} = \frac{b+a}{2}, \qquad \hat{\sigma}^2 = \frac{(b-a)^2}{12}.$$

Solving these equations with respect to a and b yields

$$\hat{a} = \hat{\mu} - \sqrt{3}\,\hat{\sigma}, \qquad \hat{b} = \hat{\mu} + \sqrt{3}\,\hat{\sigma},$$

or, after substituting (3.5) and (3.7),

$$\hat{a} = \frac{1}{n} \sum_{i=1}^{n} x_i - \sqrt{\frac{3}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{b} = \frac{1}{n} \sum_{i=1}^{n} x_i + \sqrt{\frac{3}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
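The closed-form estimates $\hat{a}$ and $\hat{b}$ are easy to compute; the sketch below is our own illustration (the function name and the sample values are not from the text):

```python
import math

def uniform_mom_estimates(sample):
    """Method-of-moments estimates of the bounds (a, b) of a uniform
    distribution, following Example 3.1: a = mu - sqrt(3)*sigma,
    b = mu + sqrt(3)*sigma, with mu and sigma^2 taken from (3.5) and (3.7)."""
    n = len(sample)
    mu = sum(sample) / n                                 # sample mean (3.5)
    var = sum((x - mu) ** 2 for x in sample) / (n - 1)   # unbiased variance (3.7)
    sigma = math.sqrt(var)
    return mu - math.sqrt(3) * sigma, mu + math.sqrt(3) * sigma

# A small check with a symmetric sample on [0, 1]:
a_hat, b_hat = uniform_mom_estimates([0.1, 0.3, 0.5, 0.7, 0.9])
```

Note that the estimated interval is always centered at the sample mean, as the two formulas suggest.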
The maximum-likelihood method is one of the most widely used methods of estimation. This method is based on the principle of calculating the values of parameters that maximize the probability of obtaining a particular sample.
Consider a continuous random variable, X, with probability density function $f(X, \theta)$, where $\theta$ is a parameter. Assume that we have a sample $\{x_1, x_2, ..., x_n\}$ of size n from the distribution of random variable X. Under the maximum-likelihood approach, the estimate $\hat{\theta}$ of $\theta$ is found as the value that provides the highest (or most likely) probability density of observing the particular set $\{x_1, x_2, ..., x_n\}$. The likelihood of the sample is the total probability of drawing each item of the sample.
Generally speaking, the definition of the likelihood function is based on the probability (for a discrete random variable) or on the probability density function (for a continuous random variable) of the joint occurrence of n events (observations), $X = x_1, ..., X = x_n$. For independent events the total probability is the product of all the individual item probabilities. Thus, the likelihood function for a continuous distribution is introduced as

$$L(x_1, x_2, ..., x_n, \theta) = \prod_{i=1}^{n} f(x_i, \theta).$$

The maximum-likelihood estimate $\hat{\theta}$ is the value of $\theta$ that maximizes this likelihood, and is found as a solution of

$$\frac{\partial L(x_1, x_2, ..., x_n, \theta)}{\partial \theta} = 0, \quad (3.10)$$

or, equivalently, since the logarithm is a monotone function,

$$\frac{\partial \ln L(x_1, x_2, ..., x_n, \theta)}{\partial \theta} = 0. \quad (3.11)$$
Example 3.2 Consider a sample $\{t_1, t_2, ..., t_n\}$ of times to failure taken from the exponential distribution with parameter $\lambda$. The likelihood function is

$$L(t, \lambda) = \prod_{i=1}^{n} \lambda \exp(-\lambda t_i) = \lambda^n \exp\left( -\lambda \sum_{i=1}^{n} t_i \right),$$

and its logarithm is

$$\ln L(t, \lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} t_i.$$
Equating the derivative of the log-likelihood to zero,

$$\frac{\partial \ln L(t, \lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0.$$

Solving this equation, the maximum likelihood estimate for $\lambda$ can be obtained:

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i}. \quad (3.12)$$
It should be noted that the estimate $\hat{\lambda}$ is indeed the maximum likelihood estimate, because we have the following second-order condition:

$$\frac{\partial^2 \ln L}{\partial \lambda^2} = -\frac{n}{\lambda^2} < 0.$$
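The estimate of Example 3.2 can be verified numerically; in the sketch below (our own illustration, with invented failure times) the log-likelihood at $\hat{\lambda} = n / \sum t_i$ is no smaller than at nearby values:

```python
import math

def exp_mle(times):
    """Maximum-likelihood estimate of the exponential failure rate
    (Example 3.2): lambda-hat = n / sum(t_i)."""
    return len(times) / sum(times)

def log_likelihood(lam, times):
    """ln L(t, lambda) = n*ln(lambda) - lambda*sum(t_i)."""
    return len(times) * math.log(lam) - lam * sum(times)

times = [120.0, 80.0, 200.0, 150.0, 50.0]   # illustrative failure times, h
lam_hat = exp_mle(times)                    # 5 / 600 h^-1

# The log-likelihood at lam_hat dominates nearby candidate values:
assert all(log_likelihood(lam_hat, times) >= log_likelihood(lam, times)
           for lam in (0.5 * lam_hat, 0.9 * lam_hat, 1.1 * lam_hat, 2 * lam_hat))
```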
Example 3.3 Consider the test where only the number of tested items and the
number of failures are known. The measure that can be estimated on the basis of
such a sample is the failure probability, q, within the test period. Find the estimate $\hat{q}$ of this probability by using the maximum-likelihood method.
Solution. Let a series of Bernoulli trials have n failures in N trials. Then the likelihood function is
$$L(q) = \binom{N}{n} q^n (1-q)^{N-n}.$$

Taking the logarithm,

$$\ln L(q) = \ln \binom{N}{n} + n \ln q + (N-n) \ln(1-q),$$

and differentiating with respect to q,

$$\frac{\partial \ln L(q)}{\partial q} = \frac{n}{q} - \frac{N-n}{1-q}.$$
Therefore the equation for the estimate $\hat{q}$ of parameter q takes the following form:

$$\frac{n}{\hat{q}} - \frac{N-n}{1-\hat{q}} = 0,$$

whence

$$\hat{q} = \frac{n}{N}.$$
The result seems natural: in order to estimate the failure probability, one should calculate the ratio between the number of failed items and the number of all tested items. It was shown that this estimate is unbiased and efficient.
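A quick numerical check of Example 3.3 (our own illustration with invented counts): scanning the log-likelihood over a grid of q values shows the maximum at n/N:

```python
import math

def binom_log_likelihood(q, n_failures, n_trials):
    """ln L(q) up to the constant ln C(N, n) (Example 3.3)."""
    return n_failures * math.log(q) + (n_trials - n_failures) * math.log(1 - q)

N, n = 50, 7                 # illustrative: 7 failures observed in 50 trials
q_hat = n / N                # maximum-likelihood estimate = 0.14

grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda q: binom_log_likelihood(q, n, N))
# best lands on the grid point q_hat = 0.14
```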
We will use the results from Examples 3.2 and 3.3 for transition intensity point estimation for MSSs.
The constants $c_1$ and $c_2$ are called confidence limits, and the interval $[c_1, c_2]$, within which the unknown value of the population parameter is expected to lie, is called the confidence interval; $(1-\alpha)$ is called the confidence coefficient. Below we consider an example illustrating the basic idea of confidence limit construction.
Solution. It was shown that the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$, considered as a statistic, has the normal distribution $N(\mu, \sigma^2 / n)$. We introduce a new random variable

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},$$
which has the standard normal distribution N(0, 1) with mean 0 and standard deviation 1. Using this distribution one can write

$$\Pr\left\{ -z_{1-\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{1-\alpha/2} \right\} = 1 - \alpha,$$

which after rearranging gives

$$\Pr\left\{ \bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha.$$

In particular, for $\alpha = 0.05$ one has $z_{0.975} = 1.96$, so that

$$\Pr\left\{ \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right\} = 0.95.$$
3.2 Classical Parametric Estimation for Binary-state System 127
This means that $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ are 95% confidence limits for the unknown population mean (parameter $\mu$). The interval $\left[ \bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n} \right]$ is called the 95% confidence interval.
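A minimal sketch of this interval computation, assuming a known population standard deviation (the sample values and function name are ours):

```python
import math

def normal_mean_ci(sample, sigma, z=1.96):
    """Two-sided confidence interval for the mean of a normal population
    with known standard deviation sigma (95% level for z = 1.96)."""
    n = len(sample)
    xbar = sum(sample) / n
    half = z * sigma / math.sqrt(n)   # half-width z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = normal_mean_ci([9.8, 10.2, 10.1, 9.9], sigma=0.2)
# the interval is centered at the sample mean 10.0
```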
In this section we briefly consider statistical methods for estimating the reliability model parameter, such as $\lambda$ of the exponential distribution, for binary-state systems. Our goal is to find the point estimate and confidence interval for this parameter.
Generally, estimation of parameters can be based on field data as well as on
data obtained from a special reliability or life test. In reliability testing, a sample
of components is placed on test under those environmental conditions in which the
components are expected to function. All times to failure are recorded. There are
two major types of tests. The first is testing with replacement of the failed items
where each item should be replaced after its failure by a new one, and the second
is testing without replacement. A sequence of recorded times to failure is considered as a given sample for further analysis. A complete sample is one in which all
items have failed during a test for a given observation period, and all the failure
times are known. Note that the likelihood function for a complete sample was introduced in Section 3.1 (Example 3.2). But in the real world, obtaining a complete
sample of observations is often impracticable. Usually we stop the test either at a
prescribed time or after observing a prescribed number of failed items. Otherwise,
the test becomes too time consuming or too costly. Thus, for some items the lifetime is censored, i.e., our information about it has the form "the lifetime exceeds some value t". Modern products are usually reliable enough so that a complete
sample is a rarity. Therefore, generally, reliability data are incomplete and we are
dealing with censored samples.
Let N be the number of items placed on the test, and assume that all items are tested simultaneously.
If during the test period, T, only r items have failed, the failure times being known, and the failed items are not replaced, the sample is called singly censored on the right at T. In this case, for the N − r unfailed items we know only that their failure times are greater than the test period T. According to Lawless (2002), the analysis of such data can be based on the total accumulated time on test. For a test with replacement that is terminated at a prescribed time $T_s$, the total time on test is
$$T = N T_s. \quad (3.13)$$
If r failures have been observed up to time $T_s$, then the maximum likelihood point estimate of the component failure rate can be found in a similar way to Example 3.2:
$$\hat{\lambda} = \frac{r}{T}. \quad (3.14)$$
Thus, the corresponding estimate of the component's mean time to failure can be obtained:

$$\widehat{MTTF} = \frac{T}{r}. \quad (3.15)$$
It should be noticed that the number of units tested during the test, ntest, is
$$n_{test} = N + r. \quad (3.16)$$
If only one item is placed on test ( N = 1) and r failures were recorded during
the test time T, then we obtain
$$\hat{\lambda} = \frac{r}{T} = \frac{r}{T_s}. \quad (3.17)$$
Recall that expressions (3.14)–(3.17) are true under the assumption that replacement times are negligibly small. If this is not so, then the total accumulated replacement time, $T_R$, should be calculated, and the failure rate may be estimated by the following expression:

$$\hat{\lambda} = \frac{r}{T_s - T_R}. \quad (3.18)$$
For a time-terminated test without replacement, the total time on test, T, is obtained by

$$T = (N-r) T_s + \sum_{i=1}^{r} t_i, \quad (3.19)$$

where $t_i$ is the recorded time to failure of failed item i, and $\sum_{i=1}^{r} t_i$ is the accumulated time on test of the r failed items.
For a failure-terminated test with replacement, terminated at the time $T_r$ of the rth failure, the total time on test is

$$T = N T_r. \quad (3.20)$$
$$\hat{\lambda} = \frac{r}{T} = \frac{r}{N T_r}, \quad (3.21)$$

$$\widehat{MTTF} = \frac{T}{r}. \quad (3.22)$$
$$n_{test} = N + r - 1, \quad (3.23)$$
because the test is terminated when the last failed item fails, and so the last failed
item is not replaced.
Finally, consider a failure-terminated test without replacement: N identical items are placed on test and, when a failure occurs, the failed item is not replaced by a new one. The test is terminated at the time, $T_r$, when the rth failure has occurred.
The total time on test, T, is obtained by

$$T = (N-r) T_r + \sum_{i=1}^{r} t_i, \quad (3.24)$$

where $t_i$ is the recorded time to failure of failed item i, and $\sum_{i=1}^{r} t_i$ is the accumulated time on test of the r failed items. The failure rate estimate is

$$\hat{\lambda} = \frac{r}{T} = \frac{r}{(N-r) T_r + \sum_{i=1}^{r} t_i}. \quad (3.25)$$
$$\widehat{MTTF} = \frac{T}{r} = \frac{(N-r) T_r + \sum_{i=1}^{r} t_i}{r}, \quad (3.26)$$

$$n_{test} = N. \quad (3.27)$$
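The test plans above differ only in how the total time on test T is accumulated. A small sketch for the failure-terminated test without replacement, Equations (3.24)–(3.26) (the data values are our own illustration):

```python
def failure_terminated_no_replacement(N, failure_times):
    """Point estimates (3.24)-(3.26) for a failure-terminated test without
    replacement: N items on test, terminated at the r-th failure."""
    r = len(failure_times)
    T_r = max(failure_times)                  # time of the r-th failure
    T = (N - r) * T_r + sum(failure_times)    # total time on test (3.24)
    lam_hat = r / T                           # failure rate estimate (3.25)
    mttf_hat = T / r                          # mean-time-to-failure estimate (3.26)
    return lam_hat, mttf_hat

# 10 items on test, stopped at the 4th failure:
lam_hat, mttf_hat = failure_terminated_no_replacement(
    10, [100.0, 250.0, 400.0, 500.0])
```

Here T = 6 · 500 + 1250 = 4250 h, so the four failures give a rate estimate of 4/4250 h⁻¹.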
Epstein (1960) considered the failure-terminated test and showed that if the time to failure is exponentially distributed with parameter $\lambda$, the variable $\frac{2r\lambda}{\hat{\lambda}} = 2T\lambda$ has the $\chi^2$ distribution with 2r degrees of freedom. Therefore, one can write

$$\Pr\left\{ \chi^2_{\alpha/2;\,2r} \le \frac{2r\lambda}{\hat{\lambda}} \le \chi^2_{1-\alpha/2;\,2r} \right\} = 1 - \alpha. \quad (3.28)$$

Taking into account $\hat{\lambda} = \frac{r}{T}$, after rearranging one will have a two-sided confidence interval for the true value of $\lambda$:

$$\Pr\left\{ \frac{1}{2T} \chi^2_{\alpha/2;\,2r} \le \lambda \le \frac{1}{2T} \chi^2_{1-\alpha/2;\,2r} \right\} = 1 - \alpha. \quad (3.29)$$
So one can obtain the upper confidence limit or the one-sided confidence interval

$$\Pr\left\{ \lambda \le \frac{1}{2T} \chi^2_{1-\alpha;\,2r} \right\} = 1 - \alpha. \quad (3.30)$$
For the time-terminated test the exact confidence limits are not available. For
this case the approximate two-sided confidence interval for the failure rate, , was
obtained as
$$\Pr\left\{ \frac{1}{2T} \chi^2_{\alpha/2;\,2r} \le \lambda \le \frac{1}{2T} \chi^2_{1-\alpha/2;\,2r+2} \right\} = 1 - \alpha, \quad (3.31)$$

and the corresponding one-sided interval is

$$\Pr\left\{ \lambda \le \frac{1}{2T} \chi^2_{1-\alpha;\,2r+2} \right\} = 1 - \alpha. \quad (3.32)$$
Actually, a binary-state system is the simplest case of a MSS having two distinctive states (perfect functioning and complete failure). Point estimation for transition intensities of two-state (binary) Markov models was briefly considered in the
previous subsections. But until now there have been almost no investigations of this problem in a multi-state context, despite the fact that it is a pressing practical problem. For example, in the field of power system reliability assessment
it has been recognized (Billinton and Allan 1996) that modeling large generating
units in generating capacity adequacy assessment by simple two-state models can
yield pessimistic appraisals. In order to assess unit reliability more accurately,
many utilities now use multi-state models instead of two-state representations. In
these models steady-state probabilities of a unit residing at different generating
capacity levels are used. Usually a steady-state probability of a unit residing at a
specified capacity level is simply defined as the part of the operation time when
the unit is at this capacity level. When the short-term behavior of MSSs is studied,
the investigation cannot be based on steady-state (long-term) probabilities. The
investigation should use a general MSS model, where the transition intensities between any states of the model are known. The problem is to estimate these transition intensities from actual MSS failure (output performance derating) and repair statistics, which are represented by the observed realization of an output performance stochastic process. Below we shall present the corresponding technique for point and interval estimation of transition intensities via output performance observation. The technique was first presented in Lisnianski (2008).
3.3 Estimation of Transition Intensities via Output Performance Observations 133
A general Markov model of a MSS with minor and major failures and repairs
(Lisnianski and Levitin 2003) is presented in Figure 3.1.
There are N states in the model, where each state $i \in [1, ..., N]$ has its own assigned performance level $g_i$. Usually state N is associated with the nominal performance level and state 1 is associated with complete system failure, while all other states $i \in [2, ..., N-1]$ are associated with the corresponding reduced performance levels $g_i$. The transition intensity from state i to state j is designated as $a_{ij}$.
[Figure 3.1 State-transition diagram of the general Markov MSS model: states 1, 2, ..., N−1, N connected by transition intensities $a_{ij}$ (e.g., $a_{N-1,N}$, $a_{N,N-1}$, $a_{1,N}$, $a_{N,1}$, $a_{2,3}$, $a_{3,2}$).]
As a result, MSS output performance is known for any time instant t [0, T ] ,
where T is the total observation time, as well as the corresponding time instants of
MSS transitions from any performance level gi to level g j , i, j [1,..., N ] . The
example of a single realization of such a stochastic process is presented in Figure
3.2.
By its nature, stochastic process $G_A(t)$ is a discrete-state continuous-time process. For this stochastic process the following designations are introduced:
$T_i^{(m)}$ — the system's sojourn time during its mth residence in state i within observation time T;
$k_i$ — the accumulated number of system entrances into state i (or accumulated number of system exits from state i to any other state) during observation time T.
Thus, the reliability data for a MSS that can be derived from observation of the output performance stochastic process during time T are the following. For each state i the following are known:
1. the sample $\{T_i^{(1)}, T_i^{(2)}, ..., T_i^{(k_i)}\}$ of system sojourn times in state i during observation time T;
2. the number $k_{ij}$ of system transitions from state i to any possible state j during observation time T; and
3. the number $k_i$ of system residences in state i (or number of system exits from state i to any other possible state) during observation time T.
The problem is to estimate the transition intensities $a_{ij}$, $i, j \in [1, ..., N]$, based on a single realization of the discrete-state continuous-time stochastic process $G_A(t)$ that was observed during time T.
For the Markov model, the random time $T_{ij}$ to transition from state i to state j is exponentially distributed:

$$F_{ij}(t) = 1 - e^{-a_{ij} t}. \quad (3.33)$$

The process can be described by the kernel matrix of one-step transition probabilities

$$\mathbf{Q}(t) = \left[ Q_{ij}(t) \right]. \quad (3.34)$$
These one-step probabilities of the kernel matrix may be defined in the following way (Lisnianski and Jeager 2000). Each probability $Q_{ik}(t)$ defines the probability that the random variable $T_{ik}$ will be minimal among all other random variables $T_{ij}$, $j \ne i$, $j \ne k$, $j = 1, ..., N$, which define all possible transitions from state i to all other states. Thus, for each $k \ne i$ one will have

$$Q_{ik}(t) = \Pr\{ T_{ik} \le t,\ T_{ik} < T_{ij},\ j \ne i,\ j \ne k \}. \quad (3.35)$$
Based on (3.35) one can obtain the one-step probability $Q_{ik}(t)$ as the probability that, under the condition $T_{ik} \le t$, the random variable $T_{ik}$ will be less than all other variables $T_{ij}$, $j \ne i$, $j \ne k$, $j = 1, ..., N$.
Hence, for each $i = 1, 2, ..., N$ and $k \ne i$ the following expression can be written:

$$Q_{ik}(t) = \Pr\left\{ (T_{ik} \le t)\ \&\ (T_{i1} > T_{ik})\ \&\ \cdots\ \&\ (T_{i,k-1} > T_{ik})\ \&\ (T_{i,k+1} > T_{ik})\ \&\ \cdots\ \&\ (T_{iN} > T_{ik}) \right\}$$
$$= \int_0^t \left[ 1 - F_{i1}(u) \right] \cdots \left[ 1 - F_{i,k-1}(u) \right] \left[ 1 - F_{i,k+1}(u) \right] \cdots \left[ 1 - F_{iN}(u) \right] dF_{ik}(u). \quad (3.36)$$
By using (3.36) and taking into account expression (3.33), one obtains

$$Q_{ik}(t) = \frac{a_{ik}}{\sum_{j \ne i} a_{ij}} \left( 1 - e^{-\sum_{j \ne i} a_{ij}\, t} \right). \quad (3.37)$$
So, for a Markov model of a MSS, the unconditional sojourn time $T_i$ is an exponentially distributed random variable with mean
$$T_i^{mean} = \frac{1}{\sum_{j \ne i} a_{ij}} = \frac{1}{A}, \quad (3.39)$$

where $A = \sum_{j \ne i} a_{ij}$.
On the other hand, based on the observed sojourn times, the mean sojourn time in state i can be estimated as

$$\hat{T}_i^{mean} = \frac{\sum_{j=1}^{k_i} T_i^{(j)}}{k_i}. \quad (3.40)$$
Based on (3.39) and (3.40) one can write the following expression for estimating the sum A of intensities of all transitions that exit from state i:

$$\hat{A} = \frac{1}{\hat{T}_i^{mean}} = \frac{k_i}{\sum_{j=1}^{k_i} T_i^{(j)}}. \quad (3.41)$$
By using expression (3.41) one can estimate only the sum of intensities for all
transitions that exit from any state i. To estimate individual transition intensities,
an additional expression can be obtained in the following way.
Based on the kernel matrix Q(t) for stochastic process $G_A(t)$, one can obtain the one-step transition probabilities for the embedded Markov chain:
$$\pi_{ik} = \lim_{t \to \infty} Q_{ik}(t) = \lim_{t \to \infty} \frac{a_{ik}}{\sum_{j \ne i} a_{ij}} \left( 1 - e^{-\sum_{j \ne i} a_{ij}\, t} \right) = \frac{a_{ik}}{\sum_{j \ne i} a_{ij}} \quad (3.43)$$
or
$$a_{ik} = \pi_{ik} \sum_{j \ne i} a_{ij}. \quad (3.44)$$

The one-step probability $\pi_{ik}$ of the embedded Markov chain can be estimated as

$$\hat{\pi}_{ik} = \frac{k_{ik}}{k_i}. \quad (3.45)$$
Substituting estimates (3.41) and (3.45) into expression (3.44), the following estimate will be obtained for the transition intensity:

$$\hat{a}_{ik} = \hat{\pi}_{ik} \hat{A} = \frac{k_{ik}}{k_i} \cdot \frac{1}{\hat{T}_i^{mean}} = \frac{k_{ik}}{\sum_{j=1}^{k_i} T_i^{(j)}} = \frac{k_{ik}}{T_{\Sigma i}}, \quad i, k \in [1, ..., N],\ i \ne k, \quad (3.46)$$

where $T_{\Sigma i}$ is the system's accumulated time of residence in state i during the total observation time T.
For a Markov MSS with N states the sum $\sum_{j=1}^{N} a_{ij} = 0$; therefore,

$$a_{ii} = -\sum_{\substack{j=1 \\ j \ne i}}^{N} a_{ij}. \quad (3.47)$$
Based on the method described in the previous subsection, the following algorithm
for data processing is suggested for multi-state Markov systems with N possible
states.
1. Calculate the accumulated time of the system's residence in state i during the total observation time T:
$$T_{\Sigma i} = \sum_{m=1}^{k_i} T_i^{(m)}.$$

2. Estimate the transition intensity $a_{ij}$ from state i to each state $j \ne i$ using the following expression:

$$\hat{a}_{ij} = \frac{k_{ij}}{T_{\Sigma i}}.$$

3. Calculate the diagonal elements:

$$\hat{a}_{ii} = -\sum_{\substack{j=1 \\ j \ne i}}^{N} \hat{a}_{ij}.$$
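The three-step algorithm maps directly into code; a compact sketch (the input layout and function name are our own choices):

```python
def estimate_intensities(sojourn_times, transition_counts):
    """Estimate the transition-intensity matrix of a Markov MSS.

    sojourn_times[i]        -- list of observed sojourn times T_i^(m) in state i
    transition_counts[i][j] -- number k_ij of observed i -> j transitions
    Returns a_hat with a_hat[i][j] = k_ij / T_sigma_i for j != i, and
    diagonal entries equal to minus the off-diagonal row sum.
    """
    N = len(sojourn_times)
    T_sigma = [sum(times) for times in sojourn_times]           # step 1
    a_hat = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if j != i:
                a_hat[i][j] = transition_counts[i][j] / T_sigma[i]   # step 2
        a_hat[i][i] = -sum(a_hat[i][j] for j in range(N) if j != i)  # step 3
    return a_hat

# Aggregated data as in Table 3.1 (sojourn times lumped per state):
a_hat = estimate_intensities(
    [[480.0], [742.0], [511.0], [7027.0]],
    [[0, 0, 0, 31], [18, 0, 0, 64], [11, 0, 0, 50], [20, 43, 58, 0]])
```

With the diesel-generator data of Table 3.1 this reproduces, for instance, $\hat{a}_{14} = 31/480 \approx 0.065$ h⁻¹, and every row of the resulting matrix sums to zero.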
MSS output performance was observed during time T; therefore, in this case we are dealing with a time-terminated test. Thus, based on expression (3.31) described in Section 3.2.3, the following two-sided confidence interval for the true value of $a_{ij}$ can be written:
$$\Pr\left\{ \frac{1}{2 T_{\Sigma i}} \chi^2_{\alpha/2;\,2k_{ij}} \le a_{ij} \le \frac{1}{2 T_{\Sigma i}} \chi^2_{1-\alpha/2;\,2k_{ij}+2} \right\} = 1 - \alpha. \quad (3.48)$$

The corresponding one-sided confidence interval is

$$\Pr\left\{ a_{ij} \le \frac{1}{2 T_{\Sigma i}} \chi^2_{1-\alpha;\,2k_{ij}+2} \right\} = 1 - \alpha. \quad (3.49)$$
Table 3.1 Observed numbers of transitions $k_{ij}$ from state i (rows) to state j (columns)

State number    1     2     3     4
1               –     0     0     31
2               18    –     0     64
3               11    0     –     50
4               20    43    58    –
Find the point and interval estimations of transition intensities for a four-state
Markov model of the diesel generator.
Solution.
1. According to the given data, the accumulated times of the system's residence in states $i = 1, ..., 4$ during the total observation time are as follows:

$$T_{\Sigma 1} = 480\ \text{h}, \quad T_{\Sigma 2} = 742\ \text{h}, \quad T_{\Sigma 3} = 511\ \text{h}, \quad T_{\Sigma 4} = 7027\ \text{h}.$$
2. Transition intensities should be estimated using the following expression:

$$\hat{a}_{ij} = \frac{k_{ij}}{T_{\Sigma i}}, \quad i \ne j.$$
Therefore, based on the given $k_{ij}$ in Table 3.1, we obtain the following point estimates:
$$\hat{a}_{12} = \frac{0}{480} = 0, \quad \hat{a}_{13} = \frac{0}{480} = 0, \quad \hat{a}_{14} = \frac{31}{480} = 0.065\ \text{h}^{-1},$$
$$\hat{a}_{21} = \frac{18}{742} = 0.024\ \text{h}^{-1}, \quad \hat{a}_{23} = \frac{0}{742} = 0, \quad \hat{a}_{24} = \frac{64}{742} = 0.086\ \text{h}^{-1},$$
$$\hat{a}_{31} = \frac{11}{511} = 0.022\ \text{h}^{-1}, \quad \hat{a}_{32} = \frac{0}{511} = 0, \quad \hat{a}_{34} = \frac{50}{511} = 0.098\ \text{h}^{-1},$$
$$\hat{a}_{41} = \frac{20}{7027} = 0.003\ \text{h}^{-1}, \quad \hat{a}_{42} = \frac{43}{7027} = 0.006\ \text{h}^{-1}, \quad \hat{a}_{43} = \frac{58}{7027} = 0.008\ \text{h}^{-1}.$$
3. Calculate the diagonal elements:

$$\hat{a}_{ii} = -\sum_{\substack{j=1 \\ j \ne i}}^{4} \hat{a}_{ij}, \quad i = 1, ..., 4.$$
4. As a result, using the presented algorithm, the following matrix of point estimates of transition intensities was computed:

$$\hat{a}_{ij} = \begin{bmatrix} -0.065 & 0 & 0 & 0.065 \\ 0.024 & -0.110 & 0 & 0.086 \\ 0.022 & 0 & -0.120 & 0.098 \\ 0.003 & 0.006 & 0.008 & -0.017 \end{bmatrix}.$$
5. Now, based on expression (3.48), the two-sided confidence intervals for the true values of transition intensities can be obtained.
For example, to calculate the two-sided confidence interval for $a_{14}$ we have $k_{14} = 31$ and $T_{\Sigma 1} = 480$ h; therefore, for $\alpha = 0.1$, by using (3.48) one obtains
$$\Pr\left\{ \frac{1}{2 \cdot 480}\, \chi^2_{0.05;\,2 \cdot 31} \le a_{14} \le \frac{1}{2 \cdot 480}\, \chi^2_{0.95;\,2 \cdot 31 + 2} \right\} = 1 - 0.1 = 0.9.$$
This means that the true value of $a_{14}$ lies within the interval [0.047, 0.087] with probability 0.9. All other confidence intervals can be found in the same way, and readers can do this themselves as an exercise.
References
Ayyub B, McCuen R (2003) Probability, statistics and reliability for engineers and scientists. Chapman & Hall/CRC, London, New York
Bickel P, Doksum K (2007) Mathematical statistics. Pearson Prentice Hall, New Jersey
Billinton R, Allan R (1996) Reliability evaluation of power systems. Plenum, New York
Epstein B (1960) Estimation from life test data. Technometrics 2:447–454
Fisher R (1925) Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22:700–725
Fisher R (1934) Two new properties of mathematical likelihood. Proceedings of the Royal Society A 144:285–307
Gertsbakh I (2000) Reliability theory with application to preventive maintenance. Springer, London
Hines W, Montgomery D (1997) Probability and statistics in engineering and management science. Wiley, New York
International Standard IEC 60605-4 (2001) Procedures for determining point estimates and confidence limits for equipment reliability determination tests. International Electrotechnical Commission, Geneva, Switzerland
Korolyuk V, Swishchuk A (1995) Semi-Markov random evolutions. Kluwer, Dordrecht
Lawless J (2002) Statistical models and methods for lifetime data. Wiley, New York
Lehmann E, Casella G (2003) Theory of point estimation. Springer-Verlag, New York
Limnios N, Oprisan G (2000) Semi-Markov processes and reliability. Birkhäuser, Boston
Lisnianski A (2008) Point estimation of the transition intensities for a Markov multi-state system via output performance observation. In: Bedford T et al (eds) Advances in mathematical modeling for reliability. IOS, Amsterdam
Lisnianski A, Jeager A (2000) Time-redundant system reliability under randomly constrained time resources. Reliab Eng Syst Saf 70:157–166
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Meeker W, Escobar L (1998) Statistical methods for reliability data. Wiley, New York
Modarres M, Kaminskiy M, Krivtsov V (1999) Reliability engineering and risk analysis: a practical guide. Dekker, New York
Neyman J (1935) On the problem of confidence intervals. Ann Math Stat 6:111–116
4 Universal Generating Function Method
In recent years a specific approach called the universal generating function (UGF)
technique has been widely applied to MSS reliability analysis. The UGF technique
allows one to find the entire MSS performance distribution based on the performance distributions of its elements using algebraic procedures. This technique
(sometimes also called the method of generalized generating sequences) (Gnedenko and Ushakov 1996) generalizes the technique based on the well-known ordinary generating function. The basic ideas of the method were first introduced by I. Ushakov in the mid-1980s (Ushakov 1986, 1987). Then the method
was described in a book by Reinshke and Ushakov (1988), where one chapter was
devoted to UGF. (Unfortunately, this book was published only in German and
Russian and so remained unknown for English speakers.) Wide application of the
method to MSS reliability analysis began in the mid-1990s, when the first application was reported (Lisnianski et al. 1994) and two corresponding papers (Lisnianski et al. 1996; Levitin et al. 1998) were published. Since then, the method has been considerably expanded in numerous research papers and in the books by Lisnianski and Levitin (2003) and Levitin (2005).
Here we present the mathematical fundamentals of the method and illustrate the theory with corresponding examples in order to provide readers with the basic knowledge that is necessary for understanding the next chapters.
The UGF approach is based on intuitively simple recursive procedures and provides a systematic method for system state enumeration that can replace extremely complicated combinatorial algorithms. It is very convenient for a computerized realization of the different enumeration problems that often arise in MSS reliability analysis and optimization.
Generally, the UGF approach allows one to obtain the system's output performance distribution based on the given performance distributions of the system's elements and the system structure function. In many real-world problems this can be done by using simple algebraic operations and does not require great computational resources. The computational burden is an especially crucial factor when
one solves MSS reliability analysis and optimization problems where the performance measures have to be evaluated for a great number of possible solutions during the search procedure. This makes using traditional methods in MSS reliability analysis and optimization problematic. In contrast, the UGF technique is fast enough to be implemented in such problems and has proved to be very effective.
The UGF approach is universal enough that an analyst can use the same procedures for systems with a different physical nature of performance and different types of element interaction.
The UGF technique is based on an approach that is closely connected to the generating functions that are widely used in probability theory. Therefore, we consider these functions first.
Consider a discrete random variable X that can take values $k = 0, 1, 2, ...$ and has the following distribution (probability mass function):

$$\Pr\{X = k\} = p_k, \quad k = 0, 1, 2, .... \quad (4.1)$$

The generating function of this distribution is defined as

$$X(z) = \sum_{k=0}^{\infty} p_k z^k. \quad (4.2)$$

Example 4.1 Suppose that X is distributed according to the Poisson distribution

$$\Pr\{X = k\} = p_k = \frac{a^k}{k!} e^{-a}, \quad k = 0, 1, 2, ....$$

Then its generating function is

$$X(z) = \sum_{k=0}^{\infty} p_k z^k = \sum_{k=0}^{\infty} \frac{a^k}{k!} e^{-a} z^k = e^{-a} \sum_{k=0}^{\infty} \frac{(az)^k}{k!} = e^{-a} e^{az} = e^{a(z-1)}.$$
The generating function is very convenient when one deals with the summation of discrete random variables. In order to explain this fact we consider the following example.
Example 4.2 Suppose we have two discrete random variables X and Y with the following distributions (pmf):

k          0     1     2     3
Pr{X=k}    0.5   0.3   0.2   0
Pr{Y=k}    0     0.6   0     0.4

Find the distribution of the random variable Z = X + Y. Enumerating the possible values of Z directly:
Z = 1, if X = 0 and Y = 1; then Pr{Z = 1} = 0.5 · 0.6 = 0.30.
Z = 2, if X = 1 and Y = 1; then Pr{Z = 2} = 0.3 · 0.6 = 0.18.
Z = 3, if X = 2, Y = 1, or X = 0, Y = 3; then
Pr{Z = 3} = Pr{X = 2}Pr{Y = 1} + Pr{X = 0}Pr{Y = 3} = 0.2 · 0.6 + 0.5 · 0.4 = 0.32.
Z = 4, if X = 1 and Y = 3; then Pr{Z = 4} = 0.3 · 0.4 = 0.12.
Z = 5, if X = 2 and Y = 3; then Pr{Z = 5} = 0.2 · 0.4 = 0.08.
Thus, the distribution of Z is

k          1      2      3      4      5
Pr{Z=k}    0.30   0.18   0.32   0.12   0.08
Note that in order to find the Z distribution directly one should analyze all possible combinations of X and Y values. In more complex cases this may be very time-consuming work. Using generating functions can prevent such difficulties.
The second way to solve the problem is based on generating functions. Let X(z) and Y(z) be the generating functions of the respective distributions of random variables X and Y. Then, according to (4.2), we can write

$$X(z) = 0.5 + 0.3z + 0.2z^2, \qquad Y(z) = 0.6z + 0.4z^3,$$

and the generating function of Z = X + Y is their product:

$$Z(z) = X(z) Y(z) = 0.30z + 0.18z^2 + 0.32z^3 + 0.12z^4 + 0.08z^5,$$

whose coefficients reproduce the distribution of Z found above.
The cumulative probabilities can also be obtained directly from the coefficients of the generating function:

$$\Pr\{X \le k\} = \sum_{i=0}^{k} p_i. \quad (4.3)$$

This means that in order to find $\Pr\{X \le k\}$, only those coefficients of the powers of z in the generating function of random variable X that are less than or equal to k should be summed.
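The generating-function computation in Example 4.2 amounts to multiplying two coefficient arrays; a short sketch of ours:

```python
def gf_multiply(p, q):
    """Multiply two generating functions given as coefficient lists:
    p[k] = Pr{X = k}, q[k] = Pr{Y = k}.  The result holds Pr{X + Y = k}."""
    result = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj   # powers of z add, probabilities multiply
    return result

# Example 4.2: X(z) = 0.5 + 0.3z + 0.2z^2, Y(z) = 0.6z + 0.4z^3
pz = gf_multiply([0.5, 0.3, 0.2], [0.0, 0.6, 0.0, 0.4])
# pz[3] recovers Pr{Z = 3} = 0.32
```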
Furthermore, it is clear that

$$\frac{d}{dz} X(z) \Big|_{z=1} = \sum_{k=1}^{\infty} k p_k z^{k-1} \Big|_{z=1} = \sum_{k=1}^{\infty} k p_k = E\{X\}. \quad (4.4)$$

Differentiating once more,

$$\frac{d^2}{dz^2} X(z) = \sum_{k=0}^{\infty} k(k-1) p_k z^{k-2}. \quad (4.5)$$
At z = 1 this gives

$$\frac{d^2}{dz^2} X(z) \Big|_{z=1} = \sum_{k=0}^{\infty} k(k-1) p_k = \sum_{k=0}^{\infty} k^2 p_k - \sum_{k=0}^{\infty} k p_k. \quad (4.6)$$
The first sum in the last expression (4.6) is the second initial moment $\alpha_2[X]$ of random variable X, and the second sum is the expectation of random variable X. Therefore, based on the generating function of random variable X one can obtain an expression for the second initial moment $\alpha_2[X]$:

$$\alpha_2[X] = \frac{d^2}{dz^2} X(z) \Big|_{z=1} + \frac{d}{dz} X(z) \Big|_{z=1}. \quad (4.7)$$
This means that the second initial moment of the random variable can be expressed via the sum of the second and first derivatives of the generating function at z = 1.
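For a finite distribution, (4.4)–(4.7) can be checked by differentiating the coefficient polynomial directly; a small sketch (our own illustration, reusing the Z distribution of Example 4.2):

```python
def gf_derivative(p):
    """Coefficients of the derivative of X(z) = sum_k p[k] * z^k."""
    return [k * p[k] for k in range(1, len(p))]

def moments_via_gf(p):
    """E{X} and the second initial moment alpha_2[X] from (4.4) and (4.7):
    evaluate the first and second derivatives of X(z) at z = 1."""
    d1 = gf_derivative(p)
    d2 = gf_derivative(d1)
    mean = sum(d1)              # X'(1)
    alpha2 = sum(d2) + mean     # X''(1) + X'(1)
    return mean, alpha2

# Distribution of Z from Example 4.2:
mean, alpha2 = moments_via_gf([0.0, 0.30, 0.18, 0.32, 0.12, 0.08])
```

As a cross-check, the mean equals E{X} + E{Y} = 0.7 + 1.8 = 2.5 from the two original distributions.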
Example 4.3 Suppose that discrete random variable X is distributed according to
the Poisson distribution
$$\Pr\{X = k\} = p_k = \frac{a^k}{k!} e^{-a}, \quad k = 0, 1, 2, ....$$
Find the expectation E{X} of random variable X using its generating function.
Solution. In Example 4.1 the generating function of random variable X (distributed
according to a Poisson distribution) was found to be
$$X(z) = e^{a(z-1)}.$$

Differentiating,

$$\frac{d}{dz} X(z) = \frac{d}{dz} e^{a(z-1)} = a e^{a(z-1)},$$

so that

$$E\{X\} = \frac{d}{dz} X(z) \Big|_{z=1} = a e^{a(z-1)} \Big|_{z=1} = a.$$
The moment generating function of a random variable X is defined as

$$\psi(s) = E[e^{sX}] = \begin{cases} \displaystyle\sum_{x} e^{sx} p_x, & \text{if } X \text{ is discrete}, \\[2mm] \displaystyle\int_{-\infty}^{+\infty} e^{sx} f(x)\,dx, & \text{if } X \text{ is continuous}. \end{cases} \quad (4.8)$$
4.1 Mathematical Fundamentals 149
Differentiating with respect to s,

$$\frac{d}{ds} \psi(s) = \frac{d}{ds} E[e^{sX}] = E\left[ \frac{d}{ds} e^{sX} \right] = E[X e^{sX}], \quad (4.9)$$

which at s = 0 gives E[X]. Similarly,

$$\frac{d^2}{ds^2} \psi(s) = \frac{d}{ds} \frac{d}{ds} \psi(s) = \frac{d}{ds} E[X e^{sX}] = E\left[ \frac{d}{ds} \left( X e^{sX} \right) \right] = E[X^2 e^{sX}], \quad (4.10)$$

which at s = 0 gives $E[X^2]$. In general,

$$\frac{d^n \psi}{ds^n}(0) = E[X^n], \quad n \ge 1. \quad (4.11)$$
The z-transform of a discrete random variable X is defined as

$$X(z) = E[z^X] = \sum_{x} p_x z^x. \quad (4.12)$$
Example 4.4 Suppose that a discrete random variable X takes the values 0, 1.65, and 2.3. Find the moment generating function and z-transform for random variable X.
Solution. In accordance with definition (4.8), the moment generating function of random variable X is the sum of the terms $p_i e^{s x_i}$ over its three possible values $x_i \in \{0, 1.65, 2.3\}$; in accordance with definition (4.12), the z-transform of random variable X is the corresponding sum of the terms $p_i z^{x_i}$.
For independent random variables X and Y, the z-transform of their sum is the product of their z-transforms:

$$\varphi_{X+Y}(z) = E[z^{X+Y}] = E[z^X z^Y] = E[z^X] E[z^Y] = \varphi_X(z) \varphi_Y(z). \quad (4.13)$$
Assume that the pmfs of random variables X1 and X2 are represented by the
vectors
$$\mathbf{x}_1 = \{x_{11}, ..., x_{1k_1}\}, \quad \mathbf{p}_1 = \{p_{11}, ..., p_{1k_1}\} \quad (4.14)$$

and

$$\mathbf{x}_2 = \{x_{21}, ..., x_{2k_2}\}, \quad \mathbf{p}_2 = \{p_{21}, ..., p_{2k_2}\}, \quad (4.15)$$
respectively.
This means that discrete random variable $X_i$, $i = 1, 2$, can take values $\{x_{i1}, ..., x_{ik_i}\}$ with corresponding probabilities $\{p_{i1}, ..., p_{ik_i}\}$. Therefore, the z-transforms corresponding to the pmfs of random variables $X_i$ will be as follows:
$$X_i(z) = \sum_{j=1}^{k_i} p_{ij} z^{x_{ij}}.$$
Then, according to (4.13),

$$\varphi_{X_1 + X_2}(z) = X_1(z) X_2(z) = \sum_{i=1}^{k_1} p_{1i} z^{x_{1i}} \sum_{j=1}^{k_2} p_{2j} z^{x_{2j}} = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} p_{1i} p_{2j} z^{x_{1i}} z^{x_{2j}} = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} p_{1i} p_{2j} z^{(x_{1i} + x_{2j})}. \quad (4.16)$$
In the general case of n independent discrete random variables,

$$\varphi_{\sum_{j=1}^{n} X_j}(z) = \prod_{j=1}^{n} X_j(z) = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} \cdots \sum_{j_n=1}^{k_n} \left( p_{1j_1} p_{2j_2} \cdots p_{nj_n} \right) z^{(x_{1j_1} + x_{2j_2} + \cdots + x_{nj_n})}. \quad (4.17)$$
As an example, consider k independent Bernoulli trials $X_j$, each of which results in success ($X_j = 1$) with probability p or failure ($X_j = 0$) with probability $1-p$:

$$\Pr\{X_j = 1\} = p, \quad \Pr\{X_j = 0\} = 1 - p.$$

Find the z-transform for random variable $X = \sum_{i=1}^{k} X_i$ that represents the number of successes in the k trials.

The z-transform of each $X_j$ takes the form

$$X_j(z) = p z^1 + (1-p) z^0.$$

The random number of successes that occur in k trials is equal to the sum of the number of successes in each trial:

$$X = \sum_{j=1}^{k} X_j,$$

and therefore, according to (4.17),

$$X(z) = \prod_{j=1}^{k} X_j(z) = \left[ p z^1 + (1-p) z^0 \right]^k = \sum_{j=0}^{k} \binom{k}{j} p^j (1-p)^{k-j} z^j.$$

Thus, X takes the values $i = 0, 1, 2, ..., k$ with the binomial probabilities

$$p_i = \binom{k}{i} p^i (1-p)^{k-i}.$$
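The polynomial power above can be computed by repeated coefficient multiplication; a brief sketch of ours:

```python
def gf_power(p_coeffs, k):
    """Raise a z-transform (coefficient list) to the k-th power by repeated
    polynomial multiplication, giving the pmf of a sum of k independent copies."""
    result = [1.0]                      # z-transform of the constant 0
    for _ in range(k):
        new = [0.0] * (len(result) + len(p_coeffs) - 1)
        for i, a in enumerate(result):
            for j, b in enumerate(p_coeffs):
                new[i + j] += a * b     # powers add, probabilities multiply
        result = new
    return result

# Bernoulli trial with p = 0.3: X_j(z) = 0.7*z^0 + 0.3*z^1; k = 4 trials
pmf = gf_power([0.7, 0.3], 4)
# pmf[j] equals C(4, j) * 0.3**j * 0.7**(4 - j)
```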
i
Consider two independent discrete random variables $X_1$, $X_2$, and assume that each variable $X_i$, $i = 1, 2$, has a pmf represented by the vectors $\mathbf{x}_i = \{x_{i1}, ..., x_{ik_i}\}$ and $\mathbf{p}_i = \{p_{i1}, ..., p_{ik_i}\}$.
Now consider an arbitrary function of these two variables. The random variable $Y = f(X_1, X_2)$ takes

$$K = k_1 k_2 \quad (4.18)$$

possible values, with probabilities

$$q_j = \prod_{i=1}^{2} p_{ij_i} = p_{1j_1} p_{2j_2}, \quad j = 1, 2, ..., K, \quad (4.19)$$

and values

$$y_j = f(x_{1j_1}, x_{2j_2}), \quad j = 1, 2, ..., K. \quad (4.20)$$
Therefore, the z-transform of random variable Y can be written as

$$u_Y(z) = \sum_{j=1}^{K} q_j z^{y_j} = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} p_{1j_1} p_{2j_2} z^{f(x_{1j_1}, x_{2j_2})}. \quad (4.21)$$
If one compares Equation 4.21, where the z-transform for random variable $Y = f(X_1, X_2)$ was found, with expression (4.16), where the z-transform for random variable $Y = X_1 + X_2$ was found, one notices the following: instead of summing the powers of z corresponding to the values of variables $X_1$ and $X_2$, in order to find the corresponding powers of z one should calculate for them a value of the given function f. Therefore, one can see that in such an interpretation the z-transform is formally not a polynomial, because in an ordinary product of polynomials the corresponding powers of z are obtained by summing the z powers of the individual polynomials.
To define formally such an action as (4.21) over individual z-transforms, a universal generating operator (UGO) $\Omega_f$ was introduced. Application of this operator to the individual z-transforms of independent random variables $X_1$ and $X_2$ produces the z-transform of random variable $Y = f(X_1, X_2)$.
Let the functions
$u_{X_1}(z) = p_{11} z^{x_{11}} + p_{12} z^{x_{12}} + \ldots + p_{1k_1} z^{x_{1k_1}} = \sum_{i=1}^{k_1} p_{1i} z^{x_{1i}}$
and
$u_{X_2}(z) = p_{21} z^{x_{21}} + p_{22} z^{x_{22}} + \ldots + p_{2k_2} z^{x_{2k_2}} = \sum_{i=1}^{k_2} p_{2i} z^{x_{2i}}$
represent the pmfs of X1 and X2. Then
$\Omega_f\{u_{X_1}(z), u_{X_2}(z)\} = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} p_{1j_1} p_{2j_2} z^{f(x_{1j_1}, x_{2j_2})} = u_Y(z).$   (4.22)
For n independent random variables with z-transforms
$u_{X_j}(z) = \sum_{i=1}^{k_j} p_{ji} z^{x_{ji}}, \quad j = 1, 2, \ldots, n,$
the operator takes the form
$\Omega_f\{u_{X_1}(z), u_{X_2}(z), \ldots, u_{X_n}(z)\} = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} \cdots \sum_{j_n=1}^{k_n} (p_{1j_1} p_{2j_2} \cdots p_{nj_n}) z^{f(x_{1j_1}, x_{2j_2}, \ldots, x_{nj_n})}.$   (4.23)
One can see that such a definition (Definition 4.5) is very useful for MSS reliability evaluation. Each multi-state element j, j = 1, 2, ..., n, in the MSS can be represented by its individual z-transform $u_{X_j}(z)$ that characterizes the element's possible performance levels $x_{ji}$ and corresponding probabilities $p_{ji}$, where $i = 1, 2, \ldots, k_j$. The MSS, in its turn, is represented by the structure function $f(X_1, X_2, \ldots, X_n)$. In this case operator $\Omega_f$ produces the resulting z-transform of the MSS output performance or, in other words, determines the output performance levels $y_i$ and corresponding probabilities $p_i$, where $i = 1, 2, \ldots, K$. Here K is the total number of possible performance levels of the entire MSS and can be obtained as
$K = \prod_{j=1}^{n} k_j.$   (4.24)
Let the individual z-transforms
$u_{X_j}(z) = \sum_{i=1}^{k_j} p_{ji} z^{x_{ji}}, \quad j = 1, 2, \ldots, n,$
represent the pmfs of n random variables $X_j$, and let the function f represent the new random variable $Y = f(X_1, X_2, \ldots, X_n)$. These individual z-transforms are called universal generating functions (UGFs) if and only if a corresponding UGO $\Omega_f$ is defined for them. In other words, z-transforms become UGFs when a corresponding UGO $\Omega_f$ is defined over them.
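As an illustration of how the operator $\Omega_f$ differs from an ordinary polynomial product, the following Python sketch (our own illustration, with the hypothetical helper name compose) enumerates all combinations of realizations, multiplies the probabilities, and applies an arbitrary function f to obtain the powers of z; like terms are collected automatically because equal performance values share one dictionary key:

```python
from itertools import product
from math import isclose

def compose(f, *ugfs):
    """Universal generating operator (Equation 4.23): for every combination
    of realizations, multiply the probabilities and apply f to obtain the
    power of z; like terms are collected in the dictionary."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        prob = 1.0
        for _, p in combo:
            prob *= p
        out[y] = out.get(y, 0.0) + prob
    return out

u1 = {5: 0.6, 8: 0.4}
u2 = {8: 0.7, 10: 0.3}
u_min = compose(min, u1, u2)      # e.g. two flow transmission elements in series
assert isclose(u_min[5], 0.6) and isclose(u_min[8], 0.4)
```

Replacing min with any other structure function yields the z-transform of the corresponding composed variable.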
One can notice that in a computational sense, the introduction of the auxiliary
variable z permits us to separate the variables of interest: p and x. According to
156 4 Universal Generating Function Method
(4.22) and (4.23), the UGO determines different actions with probabilities p and
performance levels x. From this point of view the z-transform is only useful as a
visual presentation, not more. Based on an understanding of this fact, we introduce
a more general definition for UGO (Gnedenko and Ushakov 1995, Ushakov 1998,
2000).
Definition 4.7 Let two sequences A and B represent two pmfs of random variables
XA and XB:
$A = \{(p_{A1}, x_{A1}), (p_{A2}, x_{A2}), \ldots, (p_{Ak_A}, x_{Ak_A})\},$
$B = \{(p_{B1}, x_{B1}), (p_{B2}, x_{B2}), \ldots, (p_{Bk_B}, x_{Bk_B})\}.$
A UGO $\Omega_f$ operates on the pair of sequences A and B and produces a new sequence $C = \Omega_f\{A, B\}$ of pairs that represents a pmf of random variable $X_C = f(X_A, X_B)$ in the following manner: for each pair $(p_{Ai}, x_{Ai})$ and $(p_{Bj}, x_{Bj})$ the pair $(p_{Ai} p_{Bj}, f(x_{Ai}, x_{Bj}))$ should be computed.
As one can see, this definition is analogous to Definition 4.4, but it does not use
a z-transform at all.
Usually the resulting pairs of the obtained sequence C should be ordered in accordance with increasing values of their second components.
In addition, when two or more pairs in the newly obtained sequence C have the same value of their second components, all such pairs should be combined into a single pair. The first component of this single pair is the sum of the first components of the selected pairs, and the second component is equal to their common second component. This procedure is analogous to like-term collection in the resulting z-transform.
More formally we can write
$\Omega_f(A, B) = C$
or, since each component of sequence C is a pair of numbers, it can also be rewritten as
$\Omega_f(A, B) = \{\Omega_{fp}(A, B), \Omega_{fx}(A, B)\},$
where $\Omega_{fp}(A, B) = p_{Ai} p_{Bj}$ is a suboperator that operates on the first components of sequences A and B, and $\Omega_{fx}(A, B) = f(x_{Ai}, x_{Bj})$ is a suboperator that operates on the second components of sequences A and B.
Let n sequences $S_1, \ldots, S_n$ represent the pmfs of random variables $X_1, \ldots, X_n$:
$S_1 = \{(p_{X_1 1}, x_{X_1 1}), \ldots, (p_{X_1 k_1}, x_{X_1 k_1})\},$
$\ldots$
$S_n = \{(p_{X_n 1}, x_{X_n 1}), \ldots, (p_{X_n k_n}, x_{X_n k_n})\}.$
A UGO $\Omega_f$ operates on the set of sequences $S_1, \ldots, S_n$ and produces a new sequence $S = \Omega_f\{S_1, \ldots, S_n\}$ of pairs, which represents a pmf of random variable $Y = f(X_1, X_2, \ldots, X_n)$, in the following manner: for each possible combination of pairs
$(p_{X_1 j_1}, x_{X_1 j_1}), (p_{X_2 j_2}, x_{X_2 j_2}), \ldots, (p_{X_n j_n}, x_{X_n j_n}),$
$j_1 = 1, \ldots, k_1, \; j_2 = 1, \ldots, k_2, \; \ldots, \; j_n = 1, \ldots, k_n,$
the pair
$\left(p_{X_1 j_1} p_{X_2 j_2} \cdots p_{X_n j_n}, \; f(x_{X_1 j_1}, x_{X_2 j_2}, \ldots, x_{X_n j_n})\right)$   (4.25)
should be computed.
One can see that this definition, in a computational sense, is analogous to Definition 4.5, but it is not based on a z-transform. Therefore, it is clear that the UGO plays the central role and the z-transform serves only as a visual representation of the individual sequences Si and the resulting sequence S. This representation is convenient, and below we shall use such a z-transform representation for the pmfs of the discrete random variables that characterize the performance of an individual MSS's components and the entire MSS's output performance.
In addition, it should be noted that theoretically each sequence Si can be composed not only of pairs but, for example, of triplets:
$S_i = \{(p_{i1}, x_{i1}, v_{i1}), \ldots, (p_{ik_i}, x_{ik_i}, v_{ik_i})\}.$
In practice this corresponds to the case where performance is represented by a
vector. For example, an electrical generator can have different levels of generating
capacity (x) and energy-production costs (v) corresponding to each level. For such cases two different suboperators $\Omega_{fx}(S_1, S_2, \ldots, S_n)$ and $\Omega_{fv}(S_1, S_2, \ldots, S_n)$ for separate operations with x and v should be determined. For the z-transform representation this means that the powers of z may in general be vectors, not only scalars. This is the second reason why z-transforms in the UGF interpretation are not polynomials; the first reason was mentioned above: an operator defined over z-functions can differ from the operator of the polynomial product (unlike the ordinary z-transform, where only the product of polynomials is defined).
$u(z,t) = p_1(t) z^{g_1} + p_2(t) z^{g_2} + \ldots + p_K(t) z^{g_K}$   (4.27)
4.2 Universal Generating Function Technique 159
$U(z) = \Omega_f\left(u_1(z), u_2(z), \ldots, u_n(z)\right).$   (4.28)
Recall that in the MSS reliability interpretation the coefficients of the terms in the u-function usually represent the probabilities of states, and the corresponding performance levels are encoded by the exponents of these terms.
Straightforward computation of the pmf of the function $f(X_1, \ldots, X_n)$ using (4.23) is based on an enumerative approach, which is extremely resource consuming. Indeed, the resulting u-function U(z) associated with the structure function $f(X_1, \ldots, X_n)$ contains K terms, which requires excessive storage space. In order to obtain U(z) one has to perform $(n-1)K$ procedures of probability multiplication and K procedures of function evaluation. Fortunately, there are two effective ways to reduce the computational burden: like-term collection and a recursive procedure.
The u-functions inherit an essential property of regular polynomials: they allow for collecting like terms. Indeed, if a u-function representing the pmf of a random variable X contains the terms $p_h z^{x_h}$ and $p_m z^{x_m}$ for which $x_h = x_m$, the two terms can be replaced by the single term $(p_h + p_m) z^{x_m}$, since in this case $\Pr\{X = x_h\} = p_h + p_m$.
Example 4.6 Find the pmf of the random variable
$Y = f(X_1, \ldots, X_5) = \left(\max(X_1, X_2) + \min(X_3, X_4)\right) X_5,$
a function of five independent random variables X1, ..., X5. The probability mass functions of these variables are determined by the pairs of vectors $\mathbf{x}_i$, $\mathbf{p}_i$ ($1 \le i \le 5$):
$\{(5, 8, 12), (0.6, 0.3, 0.1)\}, \{(8, 10), (0.7, 0.3)\}, \{(0, 2), (0.6, 0.4)\},$
$\{(0, 3, 5), (0.1, 0.5, 0.4)\}, \{(1, 1.5), (0.5, 0.5)\}.$
Using the straightforward approach one can obtain the pmf of random variable Y by applying operator $\Omega_f$ (4.23) over these u-functions. Since k1 = 3, k2 = 2, k3 = 2, k4 = 3, and k5 = 2, the total number of term multiplication procedures that one has to perform using this equation is 3·2·2·3·2 = 72.
In order to demonstrate the recursive approach we introduce three auxiliary random variables X6, X7, and X8: X6 = max{X1, X2}, X7 = min{X3, X4}, X8 = X6 + X7, so that Y = X8·X5.
We can obtain the pmf of variable Y using composition operators over pairs of u-functions as follows:
$u_6(z) = \Omega_{\max}\{u_1(z), u_2(z)\} = \Omega_{\max}\{(0.6z^5 + 0.3z^8 + 0.1z^{12}), (0.7z^8 + 0.3z^{10})\}$
$= 0.42z^{\max\{5,8\}} + 0.21z^{\max\{8,8\}} + 0.07z^{\max\{12,8\}} + 0.18z^{\max\{5,10\}} + 0.09z^{\max\{8,10\}} + 0.03z^{\max\{12,10\}}$
$= 0.63z^8 + 0.27z^{10} + 0.1z^{12};$
$u_7(z) = \Omega_{\min}\{u_3(z), u_4(z)\} = \Omega_{\min}\{(0.6z^0 + 0.4z^2), (0.1z^0 + 0.5z^3 + 0.4z^5)\}$
$= 0.06z^{\min\{0,0\}} + 0.04z^{\min\{2,0\}} + 0.3z^{\min\{0,3\}} + 0.2z^{\min\{2,3\}} + 0.24z^{\min\{0,5\}} + 0.16z^{\min\{2,5\}}$
$= 0.64z^0 + 0.36z^2;$
$u_8(z) = \Omega_{+}\{u_6(z), u_7(z)\} = \Omega_{+}\{(0.63z^8 + 0.27z^{10} + 0.1z^{12}), (0.64z^0 + 0.36z^2)\}$
$= 0.4032z^{8+0} + 0.1728z^{10+0} + 0.064z^{12+0} + 0.2268z^{8+2} + 0.0972z^{10+2} + 0.036z^{12+2}$
$= 0.4032z^8 + 0.3996z^{10} + 0.1612z^{12} + 0.036z^{14};$
$U(z) = \Omega_{\times}\{u_8(z), u_5(z)\}$
$= \Omega_{\times}\{(0.4032z^8 + 0.3996z^{10} + 0.1612z^{12} + 0.036z^{14}), (0.5z^1 + 0.5z^{1.5})\}$
$= 0.2016z^{8 \cdot 1} + 0.1998z^{10 \cdot 1} + 0.0806z^{12 \cdot 1} + 0.018z^{14 \cdot 1} + 0.2016z^{8 \cdot 1.5} + 0.1998z^{10 \cdot 1.5} + 0.0806z^{12 \cdot 1.5} + 0.018z^{14 \cdot 1.5}$
$= 0.2016z^8 + 0.1998z^{10} + 0.2822z^{12} + 0.018z^{14} + 0.1998z^{15} + 0.0806z^{18} + 0.018z^{21}.$
The resulting u-function U(z) represents the pmf of Y, which takes the form $\mathbf{y} = \{8, 10, 12, 14, 15, 18, 21\}$, $\mathbf{q} = \{0.2016, 0.1998, 0.2822, 0.018, 0.1998, 0.0806, 0.018\}$.
Note that during the recursive computation of this pmf we used only 26 term multiplication procedures. This considerable reduction in computational complexity is achieved by combining the recursive approach with like-term collection in intermediate u-functions.
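Example 4.6 can be reproduced with the same dictionary representation of u-functions; the compose helper below is our own sketch of the operator $\Omega_f$, not code from the original text:

```python
from itertools import product
from math import isclose

def compose(f, *ugfs):
    """Universal generating operator over dictionaries {value: probability}."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        p = 1.0
        for _, q in combo:
            p *= q
        out[y] = out.get(y, 0.0) + p
    return out

u1 = {5: 0.6, 8: 0.3, 12: 0.1}
u2 = {8: 0.7, 10: 0.3}
u3 = {0: 0.6, 2: 0.4}
u4 = {0: 0.1, 3: 0.5, 5: 0.4}
u5 = {1: 0.5, 1.5: 0.5}

u6 = compose(max, u1, u2)                   # X6 = max{X1, X2}
u7 = compose(min, u3, u4)                   # X7 = min{X3, X4}
u8 = compose(lambda a, b: a + b, u6, u7)    # X8 = X6 + X7
U  = compose(lambda a, b: a * b, u8, u5)    # Y  = X8 * X5

expected = {8: 0.2016, 10: 0.1998, 12: 0.2822, 14: 0.018,
            15: 0.1998, 18: 0.0806, 21: 0.018}
assert all(isclose(U[y], q) for y, q in expected.items())
```

Because like terms are collected in every intermediate dictionary, the recursion performs far fewer multiplications than the 72 required by direct enumeration.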
Given the resulting UGF of the MSS output performance
$U(z,t) = \sum_{i=1}^{K} p_i(t) z^{g_i},$
one can obtain the system availability at instant t > 0 for an arbitrary constant demand w using the following operator $\delta_A$:
$A(t,w) = \delta_A(U(z,t), w) = \delta_A\left(\sum_{i=1}^{K} p_i(t) z^{g_i}, w\right) = \sum_{i=1}^{K} p_i(t)\,\mathbf{1}(F(g_i, w) \ge 0),$   (4.29)
where
$\mathbf{1}(F(g_i, w) \ge 0) = \begin{cases} 1, & \text{if } F(g_i, w) \ge 0, \\ 0, & \text{if } F(g_i, w) < 0. \end{cases}$
This means that for any time instant t > 0 the operator $\delta_A$ sums the probabilities of all acceptable states.
The MSS instantaneous expected output performance at instant t > 0 defined by (1.22) can be obtained for the given U(z,t) using the following $\delta_E$ operator:
$E(t) = \delta_E(U(z,t)) = \delta_E\left(\sum_{i=1}^{K} p_i(t) z^{g_i}\right) = \sum_{i=1}^{K} p_i(t) g_i.$   (4.30)
Equivalently, the expected performance can be computed as the first derivative of U(z,t) at z = 1:
$\delta_E\left(\sum_{i=1}^{K} p_i(t) z^{g_i}\right) = \left.\frac{dU(z,t)}{dz}\right|_{z=1} = \sum_{i=1}^{K} p_i(t) g_i.$   (4.31)
The conditional mean MSS performance [the mean performance of the MSS given that the system is in states for which $F(g_i, w) \ge 0$] defined by (1.25) can be obtained using the $\delta_{CE}$ operator:
$E^* = \delta_{CE}(U(z,t)) = \delta_{CE}\left(\sum_{i=1}^{K} p_i(t) z^{g_i}\right) = \sum_{i=1}^{K} p_i(t) g_i \mathbf{1}(F(g_i, w) \ge 0) \Big/ \sum_{i=1}^{K} p_i(t) \mathbf{1}(F(g_i, w) \ge 0).$   (4.32)
The average MSS expected output performance for a fixed time interval [0,T] is
defined according to (1.24) as follows:
$E_T = \frac{1}{T}\int_0^T E(t)\,dt = \frac{1}{T}\sum_{i=1}^{K} g_i \int_0^T p_i(t)\,dt.$   (4.33)
In order to obtain the mean instantaneous performance deficiency for the given U(z,t) and the constant demand w according to (1.30), the following $\delta_D$ operator should be used:
$D(t,w) = \delta_D(U(z,t), w) = \delta_D\left(\sum_{i=1}^{K} p_i(t) z^{g_i}, w\right) = \sum_{i=1}^{K} p_i(t)\max(w - g_i, 0).$   (4.34)
The average accumulated performance deficiency for a fixed time interval [0,T] is defined according to (1.31) as follows:
$D_T = \int_0^T D(t,w)\,dt = \sum_{i=1}^{K} \int_0^T \max(w - g_i, 0)\, p_i(t)\,dt.$   (4.35)
If the steady-state probabilities $p_i = \lim_{t \to \infty} p_i(t)$, $i = 1, \ldots, K$, exist, one can determine the MSS steady-state availability $A_\infty$, the mean steady-state performance $E_\infty$, and the mean steady-state performance deficiency $D_\infty$ by replacing pi(t) with pi in (4.29), (4.31), and (4.34), respectively.
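The three operators can be sketched directly on a steady-state u-function. In this minimal Python illustration (function names are ours) we assume the acceptability function F(g, w) = g − w used in the example that follows:

```python
from math import isclose

def availability(ugf, w):
    """delta_A: sum the probabilities of acceptable states, F(g, w) = g - w >= 0."""
    return sum(p for g, p in ugf.items() if g - w >= 0)

def expected_performance(ugf):
    """delta_E: mean performance, sum of p_i * g_i."""
    return sum(p * g for g, p in ugf.items())

def performance_deficiency(ugf, w):
    """delta_D: mean performance deficiency, sum of p_i * max(w - g_i, 0)."""
    return sum(p * max(w - g, 0) for g, p in ugf.items())

U = {0: 0.1, 20: 0.3, 40: 0.6}   # hypothetical steady-state output pmf
w = 15
assert isclose(availability(U, w), 0.9)
assert isclose(expected_performance(U), 0.3 * 20 + 0.6 * 40)
assert isclose(performance_deficiency(U, w), 0.1 * 15)
```

With time-dependent probabilities the same functions yield A(t, w), E(t), and D(t, w).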
Note that here we do not consider the application of the UGF approach to the evaluation of such reliability indices as mean time to failure and mean number of failures. An interesting method for calculating the steady-state failure frequency (or mean number of failures) was suggested by Korczak (2007, 2008), where an extension of the UGF method for simultaneous steady-state availability and failure frequency calculation was presented. The suggested method is based on dual-number algebra.
Example 4.8 Consider a multi-state element with minimal failures and repairs that has three different output performance rates: g1 = 0, g2 = 20, and g3 = 40. The corresponding transition intensities are $\lambda_{2,1}$ = 2.02 year⁻¹, $\lambda_{3,2}$ = 7.01 year⁻¹, $\lambda_{1,2}$ = 10 year⁻¹, and $\lambda_{2,3}$ = 14 year⁻¹.
A state-transition diagram of the element is presented in Figure 4.1.
The element fails if its performance falls below the required demand w = 15; therefore, its acceptability function takes the form $F(g_i, w) = g_i - 15$.
At the initial moment t = 0 the element is in the state with maximal performance g3 = 40.
Find the element's instantaneous availability, instantaneous expected output performance, average expected output performance for a fixed time interval T, and mean instantaneous performance deficiency.
Solution. The state probabilities are defined by the following system of differential equations:
$\frac{dp_1(t)}{dt} = -\lambda_{1,2} p_1(t) + \lambda_{2,1} p_2(t),$
$\frac{dp_2(t)}{dt} = \lambda_{1,2} p_1(t) - (\lambda_{2,1} + \lambda_{2,3}) p_2(t) + \lambda_{3,2} p_3(t),$
$\frac{dp_3(t)}{dt} = \lambda_{2,3} p_2(t) - \lambda_{3,2} p_3(t).$
Solving the system using the Laplace-Stieltjes transform under initial condi-
tions p1 ( 0 ) = p2 ( 0 ) = 0, p3 ( 0 ) = 1, one obtains the following probabilities:
The element can then be represented by the following UGF associated with its output performance stochastic process $G(t) \in \{g_1, g_2, g_3\}$:
$U(z,t) = \sum_{i=1}^{3} p_i(t) z^{g_i} = p_1(t) z^0 + p_2(t) z^{20} + p_3(t) z^{40}.$
The MSS fails if its performance falls below the required demand w = 15. In
accordance with (4.29) the MSS instantaneous availability is
3 3
A(t ) = A (U ( z , t ),15 ) = A pi (t ) z gi ,15 = pi (t )1( F ( gi ,15) 0)
i=1 i=1
23.478 t
= p2 (t ) + p3 (t ) = 0.043e + 0.106e 9.552t + 0.937.
In accordance with (4.30) the MSS instantaneous expected output performance is
$E(t) = \delta_E(U(z,t)) = \delta_E\left(\sum_{i=1}^{3} p_i(t) z^{g_i}\right) = \sum_{i=1}^{3} p_i(t) g_i = 20 p_2(t) + 40 p_3(t)$
$= 4.047e^{-23.478t} + 4.730e^{-9.552t} + 31.223.$
The MSS average expected output performance for a fixed time interval [0,T] is
obtained according to (4.33) as follows:
$E_T = \frac{1}{T}\int_0^T E(t)\,dt = \frac{1}{T}\sum_{i=1}^{K} g_i \int_0^T p_i(t)\,dt = \frac{1}{T}\left(20\int_0^T p_2(t)\,dt + 40\int_0^T p_3(t)\,dt\right)$
$= \frac{1}{T}\int_0^T \left(4.047e^{-23.478t} + 4.730e^{-9.552t} + 31.223\right) dt$
$= \frac{1}{T}\left(0.667 - 0.172e^{-23.478T} - 0.495e^{-9.552T}\right) + 31.223.$
In accordance with (4.34) the mean instantaneous performance deficiency is
$D(t) = \delta_D(U(z,t), 15) = \delta_D\left(\sum_{i=1}^{3} p_i(t) z^{g_i}, 15\right) = \sum_{i=1}^{3} p_i(t)\max(15 - g_i, 0)$
$= 15 p_1(t) = 0.650e^{-23.478t} - 1.597e^{-9.552t} + 0.947.$
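The numbers in Example 4.8 can be verified by numerically integrating the state equations (a simple Euler scheme, our own illustration) and applying the operators at a time large enough for the process to be near steady state:

```python
from math import isclose

# transition intensities (year^-1) from Example 4.8
l12, l23 = 10.0, 14.0         # repair rates
l21, l32 = 2.02, 7.01         # failure rates

# Euler integration of the state equations, starting in state 3 (g3 = 40)
p1, p2, p3 = 0.0, 0.0, 1.0
dt, steps = 1e-4, 40_000      # integrate to t = 4 years, near steady state
for _ in range(steps):
    d1 = -l12 * p1 + l21 * p2
    d2 = l12 * p1 - (l21 + l23) * p2 + l32 * p3
    d3 = l23 * p2 - l32 * p3
    p1, p2, p3 = p1 + d1 * dt, p2 + d2 * dt, p3 + d3 * dt

A = p2 + p3                   # states with F(g_i, 15) = g_i - 15 >= 0
E = 20 * p2 + 40 * p3         # expected output performance
D = 15 * p1                   # performance deficiency
assert isclose(A, 0.937, abs_tol=1e-3)
assert isclose(E, 31.223, abs_tol=0.05)
assert isclose(D, 0.947, abs_tol=0.005)
```

The limits of the closed-form expressions above (0.937, 31.223, and 0.947) are recovered once both exponential terms have decayed.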
If the structure function of the MSS possesses the associative property
$f(G_1, G_2, \ldots, G_j) = f\left(f(G_1, G_2, \ldots, G_{j-1}), G_j\right),$   (4.36)
then the operator determining the u-function $U_j(z)$ of the subsystem consisting of elements 1, ..., j for $2 \le j \le n$ can be obtained as
$U_j(z) = \Omega_f\left(u_1(z), u_2(z), \ldots, u_j(z)\right) = \Omega_f\left(\Omega_f\left(u_1(z), u_2(z), \ldots, u_{j-1}(z)\right), u_j(z)\right).$   (4.37)
Therefore, one can obtain the entire system UGF by assigning $U_1(z) = u_1(z)$ and applying operator $\Omega_f$ consecutively:
$U_j(z) = \Omega_f\left(U_{j-1}(z), u_j(z)\right).$   (4.38)
If the structure function satisfies the condition
$f(G_1, \ldots, G_j, G_{j+1}, \ldots, G_n) = f\left(f(G_1, \ldots, G_j), f(G_{j+1}, \ldots, G_n)\right),$   (4.39)
which means that one can consider any subset of adjacent elements as a subsystem for which a u-function can be obtained, then the subset can further be treated (Lisnianski and Levitin 2003) as a single element having this u-function. The u-functions of MSSs with structure functions meeting condition (4.39) can be obtained recursively by dividing the ordered set of elements into arbitrary subsets of adjacent elements, replacing these subsets with elements having u-functions equivalent to the u-functions of the subsets, and then applying the same aggregating procedure recursively to the reduced set of elements until the UGF of the entire system is obtained (Figure 4.2).
Fig. 4.2 Example of recursive derivation of the u-function for an MSS meeting condition (4.36)
If, in addition,
$f(G_1, \ldots, G_j, G_{j+1}, \ldots, G_n) = f(G_1, \ldots, G_{j+1}, G_j, \ldots, G_n)$
for any j, which provides the commutative property for the $\Omega_f$ operator:
$\Omega_f\left(u_1(z), \ldots, u_j(z), u_{j+1}(z), \ldots, u_n(z)\right) = \Omega_f\left(u_1(z), \ldots, u_{j+1}(z), u_j(z), \ldots, u_n(z)\right),$   (4.41)
then the order of the elements in the MSS is irrelevant, and the subsystems in the recurrent procedure described above can contain arbitrary sets of elements. This means (Lisnianski and Levitin 2003) that any subset of the system elements' u-functions can be replaced by its equivalent u-function and further treated as a single element (Figure 4.3).
Fig. 4.3 Example of recursive derivation of the u-function for an MSS meeting condition (4.39)
Representing the functions in the recursive form is beneficial from both the
derivation clarity and computation simplicity viewpoints. In many cases, the struc-
ture function of the entire MSS can be represented as the composition of the struc-
ture functions corresponding to some subsets of the system elements (MSS sub-
systems). The u-functions of the subsystems can be obtained separately and the
subsystems can be further treated as single equivalent elements with the perform-
ance pmf represented by these u-functions.
Fig. 4.4 Example of recursive derivation of the u-function for an MSS meeting conditions (4.39) and (4.41)
For a flow transmission MSS with elements connected in series, the system capacity is limited by the worst element:
$f_{ser}^{(1)}(G_1, \ldots, G_n) = \min\{G_1, \ldots, G_n\}.$   (4.42)
For a task processing MSS with elements connected in series, the total processing time is the sum of the elements' processing times:
$T = \sum_{j=1}^{n} T_j = \sum_{j=1}^{n} G_j^{-1},$   (4.43)
so that the total processing speed of the system is
$G = \frac{1}{T} = \left(\sum_{j=1}^{n} G_j^{-1}\right)^{-1}.$   (4.44)
Note that if $G_j = 0$ for any j, this equation cannot be used, but it is obvious that in this case G = 0. Therefore, one can define the structure function for the series task processing system as
$f_{ser}^{(2)}(G_1, \ldots, G_n) = \begin{cases} \left(\sum_{j=1}^{n} 1/G_j\right)^{-1}, & \text{if } \prod_{j=1}^{n} G_j \ne 0, \\ 0, & \text{if } \prod_{j=1}^{n} G_j = 0. \end{cases}$   (4.45)
One can see that the structure functions presented above are associative and commutative, i.e., they meet conditions (2.114) and (2.116). Therefore, the u-functions for any series system of the described types can be obtained recursively by consecutively determining the u-functions of arbitrary subsets of the elements. For example, the u-function of a system consisting of four elements connected in series can be determined in the following ways:
$\Omega_{f_{ser}}\left(\Omega_{f_{ser}}\left(\Omega_{f_{ser}}(u_1(z), u_2(z)), u_3(z)\right), u_4(z)\right) = \Omega_{f_{ser}}\left(\Omega_{f_{ser}}(u_1(z), u_2(z)), \Omega_{f_{ser}}(u_3(z), u_4(z))\right).$
Example 4.9 Consider an MSS of n elements with total failures connected in series, where each element j has the u-function
$u_j(z) = (1 - p_{j1}) z^0 + p_{j1} z^{g_{j1}}, \quad j = 1, \ldots, n.$
Find the UGF U(z) for the entire MSS and the steady-state reliability measures $A_\infty$, $D_\infty$, and $E_\infty$ as functions of the constant demand level w.
Solution. To find the u-function for the entire MSS, the corresponding $\Omega_{f_{ser}}$ operators should be applied. For an MSS with structure function (4.42) the system u-function takes the form
$U(z) = \Omega_{f_{ser}^{(1)}}\left(u_1(z), \ldots, u_n(z)\right) = \left(1 - \prod_{j=1}^{n} p_{j1}\right) z^0 + \prod_{j=1}^{n} p_{j1}\, z^{\min\{g_{11}, \ldots, g_{n1}\}}.$
For an MSS with structure function (4.45) the system u-function takes the form
$U(z) = \Omega_{f_{ser}^{(2)}}\left(u_1(z), \ldots, u_n(z)\right) = \left(1 - \prod_{j=1}^{n} p_{j1}\right) z^0 + \prod_{j=1}^{n} p_{j1}\, z^{\left(\sum_{j=1}^{n} g_{j1}^{-1}\right)^{-1}}.$
Since the failure of each individual element causes the failure of the entire system, the MSS can have only two states: one with a performance level of zero (failure of at least one element) and one with the performance level $g = \min\{g_{11}, \ldots, g_{n1}\}$ for the flow transmission MSS or $g = \left(\sum_{j=1}^{n} g_{j1}^{-1}\right)^{-1}$ for the task processing MSS.
The measures of the system performance $A_\infty$, $D_\infty = E(\max(w - G, 0))$, and $E_\infty$ are presented in Table 4.4.
w | $A_\infty$ | $D_\infty$ | $E_\infty$
$w > g$ | 0 | $w\left(1 - \prod_{j=1}^{n} p_{j1}\right) + (w - g)\prod_{j=1}^{n} p_{j1} = w - g\prod_{j=1}^{n} p_{j1}$ | $g\prod_{j=1}^{n} p_{j1}$
$0 < w \le g$ | $\prod_{j=1}^{n} p_{j1}$ | $w\left(1 - \prod_{j=1}^{n} p_{j1}\right)$ | $g\prod_{j=1}^{n} p_{j1}$
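These formulas can be checked numerically for a hypothetical three-element flow transmission series system (all element data below are invented for illustration, and the compose helper is our own sketch of $\Omega_f$):

```python
from itertools import product
from math import isclose, prod

def compose(f, *ugfs):
    """Universal generating operator over dictionaries {value: probability}."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        p = 1.0
        for _, q in combo:
            p *= q
        out[y] = out.get(y, 0.0) + p
    return out

gs = [10.0, 6.0, 8.0]          # nominal performances g_j1 (hypothetical)
ps = [0.9, 0.8, 0.95]          # availabilities p_j1 (hypothetical)
ugfs = [{0.0: 1 - p, g: p} for g, p in zip(gs, ps)]

U = compose(lambda *x: min(x), *ugfs)       # series flow transmission
p_all, g = prod(ps), min(gs)
assert isclose(U[g], p_all)                 # single operating state
assert isclose(U[0.0], 1 - p_all)           # like terms collected at zero

w = 4.0                                     # row 0 < w <= g of the table
A = sum(q for x, q in U.items() if x >= w)
D = sum(q * max(w - x, 0) for x, q in U.items())
assert isclose(A, p_all) and isclose(D, w * (1 - p_all))
```

Only two terms survive the like-term collection, exactly as stated above.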
In the flow transmission MSS, in which the flow can be dispersed and transferred
by parallel channels simultaneously (which provides the work sharing), the total
capacity of a subsystem containing n independent elements connected in parallel
is equal to the sum of the capacities of the individual elements. Therefore, the
structure function for such a subsystem takes the form
$f_{par}^{(1)}(G_1, \ldots, G_n) = \sum_{j=1}^{n} G_j.$   (4.46)
In some cases, only one channel out of n can be chosen for the flow transmission (no flow dispersion is allowed). This happens when the transmission is associated with the consumption of certain limited resources that do not allow the simultaneous use of more than one channel. The most effective way for such a system to function is to choose the channel with the greatest transmission capacity from the set of available channels. In this case, the structure function takes the form
$f_{par}^{(2)}(G_1, \ldots, G_n) = \max\{G_1, \ldots, G_n\}.$   (4.47)
In a task processing MSS with parallel elements and work sharing, the work x is divided among the elements, and the system processing time T is defined as the time at which the last portion of work is completed: $T = \max_{1 \le j \le n}\{x_j / G_j\}$. The minimal time of completion of the entire work is achieved if the elements share the work in proportion to their processing speeds $G_j$: $x_j = x G_j / \sum_{k=1}^{n} G_k$. The system processing time T in this case is equal to $x / \sum_{k=1}^{n} G_k$, and the total processing speed G is equal to the sum of the processing speeds of the elements. Therefore, the structure function of such a system coincides with structure function (4.46).
One can see that the structure functions presented also meet conditions (4.39)
and (4.41). Therefore, the u-functions for any parallel system of the described
types can be obtained recursively by the consecutive determination of the
u-functions of arbitrary subsets of the elements.
Example 4.10 Consider an MSS consisting of two elements with total failures connected in parallel. The elements have nominal performances g11 and g21 ($g_{11} < g_{21}$) and probabilities of operational state p11 and p21, respectively. The performance in the failure state is zero. The demand level w is constant.
Find the MSS reliability indices: steady-state availability $A_\infty$, steady-state performance deficiency $D_\infty$, and steady-state expected output performance $E_\infty$.
Solution. The system u-function is
$U(z) = \Omega_{f_{par}}\left(u_1(z), u_2(z)\right) = \Omega_{f_{par}}\left((1 - p_{11}) z^0 + p_{11} z^{g_{11}}, (1 - p_{21}) z^0 + p_{21} z^{g_{21}}\right),$
which for structure function (4.46) takes the form
$U_1(z) = (1 - p_{11})(1 - p_{21}) z^0 + p_{11}(1 - p_{21}) z^{g_{11}} + p_{21}(1 - p_{11}) z^{g_{21}} + p_{11} p_{21} z^{g_{11} + g_{21}}$
and for structure function (4.47) takes the form
$U_2(z) = (1 - p_{11})(1 - p_{21}) z^0 + p_{11}(1 - p_{21}) z^{g_{11}} + p_{21}(1 - p_{11}) z^{g_{21}} + p_{11} p_{21} z^{\max(g_{11}, g_{21})}$
$= (1 - p_{11})(1 - p_{21}) z^0 + p_{11}(1 - p_{21}) z^{g_{11}} + p_{21} z^{g_{21}}.$
The measures of the system output performance for MSSs of both types are
presented in Tables 4.5 and 4.6.
In the special case of n identical parallel elements, each having availability p and nominal performance g, the u-functions take the form
$U_1(z) = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} p^k (1 - p)^{n-k} z^{kg},$   (4.48)
$U_2(z) = (1 - p)^n z^0 + \left(1 - (1 - p)^n\right) z^g.$   (4.49)
w | $A_\infty(w)$ | $D_\infty(w)$ | $E_\infty$
$w > g_{21}$ | 0 | $w - p_{11} g_{11} - p_{21} g_{21} + p_{11} p_{21} g_{11}$ | $p_{11}(1 - p_{21}) g_{11} + p_{21} g_{21}$
$g_{11} < w \le g_{21}$ | $p_{21}$ | $(1 - p_{21})(w - g_{11} p_{11})$ | $p_{11}(1 - p_{21}) g_{11} + p_{21} g_{21}$
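The tabulated measures for the no-flow-dispersion system of Example 4.10 can be checked directly with hypothetical numbers (p11, p21, g11, g21 below are invented for illustration):

```python
from math import isclose

p11, p21 = 0.9, 0.8          # hypothetical availabilities
g11, g21 = 5.0, 10.0         # hypothetical performances, g11 < g21

# no flow dispersion (structure function 4.47): best channel only
U2 = {0.0: (1 - p11) * (1 - p21),
      g11: p11 * (1 - p21),
      g21: p21 * (1 - p11) + p11 * p21}    # like terms collected at g21
assert isclose(U2[g21], p21)

# table row g11 < w <= g21
w = 7.0
A = sum(q for g, q in U2.items() if g >= w)
D = sum(q * max(w - g, 0) for g, q in U2.items())
E = sum(q * g for g, q in U2.items())
assert isclose(A, p21)
assert isclose(D, (1 - p21) * (w - g11 * p11))
assert isclose(E, p11 * (1 - p21) * g11 + p21 * g21)
```

The same computation with performances added instead of maximized reproduces the flow-dispersion case.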
4. If the resulting MSS contains more than one element, return to step 1.
The resulting u-function corresponds to the output performance of the entire
system.
Table 4.7 Structure functions for pure series and pure parallel subsystems (columns: MSS type, description of the MSS, structure function $f_{ser}$ for series elements, and structure function $f_{par}$ for parallel elements)
The choice of the structure functions used for series and parallel subsystems
depends on the type of system. Table 4.7 presents the possible combinations of
structure functions corresponding to the different types of MSS.
In order to illustrate the presented recursive algorithm we consider the follow-
ing example.
Example 4.11 Consider a series-parallel MSS consisting of seven multi-state elements, presented in Figure 4.5 (a). For each element, the corresponding u-function $u_i(z)$, $i = 1, 2, \ldots, 7$, is given. Find the resulting u-function for the entire MSS.
Solution. First, one can find only one pure series subsystem, consisting of the elements with the u-functions u2(z), u3(z), and u4(z). Calculating the u-function $U_1(z) = \Omega_{f_{ser}}(u_2(z), u_3(z), u_4(z))$ and replacing the three elements with a single element having the u-function U1(z), one obtains a system with the structure presented in Figure 4.5 (b). This system contains a purely parallel subsystem consisting of the elements with the u-functions U1(z) and u5(z), which in their turn can be replaced by a single element with the u-function $U_2(z) = \Omega_{f_{par}}(U_1(z), u_5(z))$ (Figure 4.5 (c)). The obtained structure has three elements connected in series that can be replaced with a single element having the u-function $U_3(z) = \Omega_{f_{ser}}(u_1(z), U_2(z), u_6(z))$ (Figure 4.5 (d)). The resulting structure contains two elements connected in parallel. The u-function of this structure, representing the u-function of the entire MSS, is obtained as
$U(z) = \Omega_{f_{par}}(U_3(z), u_7(z)).$
The procedure described above obtains recursively the same MSS u-function that can be obtained directly by operator (4.23) using the following structure function:
$f(G_1, G_2, G_3, G_4, G_5, G_6, G_7) = f_{par}\left(f_{ser}\left(G_1, f_{par}(f_{ser}(G_2, G_3, G_4), G_5), G_6\right), G_7\right).$
The recursive procedure of obtaining the MSS u-function is not only more convenient than the direct one but, much more importantly, it allows one to considerably reduce the computational burden of the algorithm. Indeed, using the direct procedure (4.23) one has to evaluate the system structure function for each combination of values of the random variables G1, ..., G7 ($\prod_{j=1}^{7} k_j$ times). Using the recursive algorithm one can take advantage of the fact that some subsystems have the same performance rates in different states, which makes these states indistinguishable and reduces the total number of terms in the corresponding u-functions.
Fig. 4.5 Example of recursive determination of the MSS u-function (panels (a)-(d))
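The equivalence between the recursive reduction of Example 4.11 and the direct application of operator (4.23) can be demonstrated for a flow transmission system with flow dispersion (fser = min, fpar = sum); all element data below are hypothetical two-state elements and the compose helper is our own sketch:

```python
from itertools import product
from math import isclose

def compose(f, *ugfs):
    """Universal generating operator over dictionaries {value: probability}."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        p = 1.0
        for _, q in combo:
            p *= q
        out[y] = out.get(y, 0.0) + p
    return out

fser = lambda *g: min(g)       # flow transmission: series
fpar = lambda *g: sum(g)       # flow dispersion: parallel

# hypothetical two-state elements u1..u7
u = {j: {0.0: 0.1, 10.0 * j: 0.9} for j in range(1, 8)}

# recursive reduction of Example 4.11
U1 = compose(fser, u[2], u[3], u[4])
U2 = compose(fpar, U1, u[5])
U3 = compose(fser, u[1], U2, u[6])
U  = compose(fpar, U3, u[7])

# direct evaluation over all 2^7 state combinations
direct = compose(
    lambda g1, g2, g3, g4, g5, g6, g7: min(g1, min(g2, g3, g4) + g5, g6) + g7,
    *(u[j] for j in range(1, 8)))

assert set(U) == set(direct)
assert all(isclose(U[y], direct[y]) for y in U)
```

The recursive route touches far fewer terms because intermediate like-term collection shrinks each partial u-function before the next composition.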
The bridge structure (Figure 4.6) is an example of a complex system for which the u-function cannot be evaluated by decomposition into series and parallel subsystems. Each of the five bridge components can in its turn be a complex composition of elements. After obtaining the equivalent u-functions of these components, one should apply Equation 4.23 in order to obtain the u-function of the entire bridge (Levitin and Lisnianski 1998).
Fig. 4.6 Bridge structure: components 1-4 (u-functions U1(z)-U4(z)) connect nodes A, C, D, and B, and the diagonal component 5 (u-function U5(z)) connects nodes C and D; each component can itself be a composition of elements
To evaluate the output performance of a flow transmission MSS with flow dispersion, consider the flows through the bridge structure presented in Figure 4.6. First, there are two parallel flows through components 1,3 and 2,4. To determine the capacities of each of the parallel substructures composed of components connected in series, the function fser (4.42) should be used. The function fpar (4.46) should then be used to obtain the total capacity of the two parallel substructures. Therefore, the structure function of the bridge without the diagonal component is
$f(G_1, G_2, G_3, G_4) = \min\{G_1, G_3\} + \min\{G_2, G_4\}.$
Now consider the performance of a flow transmission MSS without flow dis-
persion. In such a system a single path between points A and B providing the
greatest flow should be chosen. There exist four possible paths consisting of
groups of components (1,3), (2,4), (1,5,4), and (2,5,3) connected in a series. The
transmission capacity of each path is equal to the minimum transmission capacity
of the elements belonging to this path. Therefore, the structure function of the en-
tire bridge takes the form
$f_{bridge}(G_1, G_2, G_3, G_4, G_5) = \max\left\{\min\{G_1, G_3\}, \min\{G_2, G_4\}, \min\{G_1, G_5, G_4\}, \min\{G_2, G_5, G_3\}\right\}.$   (4.52)
Note that the four parallel subsystems (paths) are not statistically independent, since some of them contain the same elements. Therefore, the bridge u-function cannot be obtained by system decomposition as for the series-parallel systems. Instead, one has to evaluate structure function (4.52) for each combination of states of the five independent components.
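A sketch of this enumerative evaluation for the bridge of Figure 4.6 with structure function (4.52), using hypothetical two-state components (all capacities and availabilities below are invented for illustration):

```python
from itertools import product
from math import isclose

def f_bridge(g1, g2, g3, g4, g5):
    """Structure function (4.52): best single path through the bridge."""
    return max(min(g1, g3), min(g2, g4), min(g1, g5, g4), min(g2, g5, g3))

# hypothetical two-state components 1..5, each available with probability 0.9
comps = [{0.0: 0.1, c: 0.9} for c in (3.0, 5.0, 4.0, 6.0, 2.0)]

U = {}
for combo in product(*(u.items() for u in comps)):
    y = f_bridge(*(g for g, _ in combo))
    p = 1.0
    for _, q in combo:
        p *= q
    U[y] = U.get(y, 0.0) + p

assert isclose(sum(U.values()), 1.0)       # proper pmf over the 2^5 states
# output 5.0 is reached exactly when components 2 and 4 both work
assert isclose(U[5.0], 0.9 * 0.9)
```

All $2^5 = 32$ state combinations are evaluated, with like-term collection performed by the dictionary.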
For a task processing MSS without work sharing, a single path with the greatest processing speed should be chosen, and the speeds along a path combine according to (4.45). The system processing speed therefore takes the form
$G = f(G_1, G_2, G_3, G_4, G_5) = \max\left\{\varphi(G_1, G_3), \varphi(G_2, G_4), \varphi(G_1, G_4, G_5), \varphi(G_2, G_3, G_5)\right\},$   (4.53)
where
$\varphi(G_j, G_i) = \begin{cases} \dfrac{G_j G_i}{G_j + G_i}, & \text{if } G_j G_i \ne 0, \\ 0, & \text{if } G_j G_i = 0, \end{cases}$
$\varphi(G_j, G_i, G_m) = \begin{cases} \dfrac{G_j G_i G_m}{G_j G_i + G_i G_m + G_j G_m}, & \text{if } G_j G_i G_m \ne 0, \\ 0, & \text{if } G_j G_i G_m = 0. \end{cases}$
Now consider a system with work sharing for which the same three assump-
tions that were made for the parallel system with work sharing (Section 4.2.5) are
made. There are two stages of work performance in the bridge structure. The first
stage is performed by components 1 and 2 and the second stage is performed by
components 3 and 4. The fifth component is necessary to transfer work between
nodes C and D. Following these assumptions, the decision about work sharing can
be made in the nodes of bridge A, C, or D only when the entire amount of work is
available in this node. This means that component 3 or 4 cannot start task process-
ing before both components 1 and 2 have completed their tasks and all of the work
has been gathered at node C or D.
There are two ways to complete the first stage of processing in the bridge structure, depending on the node in which the completed work is gathered. To complete it in node C, an amount of work $(1 - \alpha)x$ should be performed by component 1 with processing speed G1, and an amount of work $\alpha x$ should be performed by component 2 with processing speed G2 and then transferred from node D to node C with speed G5 ($\alpha$ is the work sharing coefficient). The time at which the work performed by component 1 appears at node C is $t_1 = (1 - \alpha)x/G_1$. The time at which the work performed by component 2 and transferred by component 5 appears at node C is $t_2 + t_5$, where $t_2 = \alpha x/G_2$ and $t_5 = \alpha x/G_5$. The total time of the first stage of processing is $T_{1C} = \max\{t_1, t_2 + t_5\}$. It can easily be seen that $T_{1C}$ is minimized when $\alpha$ is chosen to provide the equality $t_1 = t_2 + t_5$. The work sharing coefficient obtained from this equality is $\alpha = G_2 G_5/(G_1 G_2 + G_1 G_5 + G_2 G_5)$ and the minimal processing time is $T_{1C} = x(G_2 + G_5)/(G_1 G_2 + G_1 G_5 + G_2 G_5)$.
Using the same technique we can obtain the minimal processing time when the second stage of processing starts from node D. Assuming that the optimal way of performing the work can be chosen in node A, we obtain the total bridge processing time as $\min\{T_{1C} + T_{2C}, T_{1D} + T_{2D}\}$, where
$T_{1C} + T_{2C} = x\left(\frac{G_2 + G_5}{\pi_1} + \frac{G_4 + G_5}{\pi_2}\right),$
$T_{1D} + T_{2D} = x\left(\frac{G_1 + G_5}{\pi_1} + \frac{G_3 + G_5}{\pi_2}\right),$
$\pi_1 = G_1 G_2 + G_1 G_5 + G_2 G_5,$
$\pi_2 = G_3 G_4 + G_3 G_5 + G_4 G_5.$
The total processing speed of the bridge is therefore
$G = f(G_1, G_2, G_3, G_4, G_5) = \frac{\pi_1 \pi_2}{(a + G_5)\pi_1 + (e + G_5)\pi_2},$   (4.54)
where
$a = G_4, \; e = G_2$ if $(G_2 - G_1)\pi_2 \le (G_3 - G_4)\pi_1,$
$a = G_3, \; e = G_1$ if $(G_2 - G_1)\pi_2 > (G_3 - G_4)\pi_1.$
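Equation 4.54 as reconstructed here can be cross-checked against the direct computation $\min\{T_{1C} + T_{2C}, T_{1D} + T_{2D}\}$; the function names and test values below are ours, and all component speeds are assumed positive:

```python
from math import isclose

def bridge_work_sharing_speed(g1, g2, g3, g4, g5):
    """Reconstructed Equation 4.54: total processing speed with optimal
    work sharing (assumes all component speeds are positive)."""
    pi1 = g1 * g2 + g1 * g5 + g2 * g5
    pi2 = g3 * g4 + g3 * g5 + g4 * g5
    if (g2 - g1) * pi2 <= (g3 - g4) * pi1:
        a, e = g4, g2            # gathering the work at node C is faster
    else:
        a, e = g3, g1            # gathering the work at node D is faster
    return pi1 * pi2 / ((a + g5) * pi1 + (e + g5) * pi2)

def direct_speed(g1, g2, g3, g4, g5):
    """1 / min{T1C + T2C, T1D + T2D} for a unit amount of work."""
    pi1 = g1 * g2 + g1 * g5 + g2 * g5
    pi2 = g3 * g4 + g3 * g5 + g4 * g5
    t_c = (g2 + g5) / pi1 + (g4 + g5) / pi2
    t_d = (g1 + g5) / pi1 + (g3 + g5) / pi2
    return 1.0 / min(t_c, t_d)

for gs in [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (2, 2, 2, 2, 2), (7, 1, 1, 7, 3)]:
    assert isclose(bridge_work_sharing_speed(*gs), direct_speed(*gs))
```

The condition selecting a and e simply picks the gathering node with the smaller total processing time.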
Methods for evaluating the relative influence of element reliability on the reliability or availability of an entire system provide useful information about the importance of these elements. Importance evaluation is an essential step in tracing bottlenecks in systems and identifying the most important elements. It is a useful tool that helps the analyst find weaknesses in a design and suggest modifications for system upgrade. The importance index was first introduced by Birnbaum (1969). This index characterizes the rate at which the system reliability changes with respect to changes in the reliability of a given element. An improvement in the reliability of the element with the highest importance causes the greatest increase in system reliability. Several other measures of element and minimal cut set importance in coherent systems were developed by Barlow and Proschan (1974, 1975) and Fussell (1975). Useful information on the subject can be found in Ryabinin (1976).
The above importance measures have been defined for coherent binary-state systems, where elements can have only two states, total failure and perfect functioning, without any performance considerations.
In an MSS, the failure effect will be essentially different for elements with different nominal performance rates. Therefore, the performance rates of system elements should be taken into account when their importance is estimated. Some extensions of importance measures for coherent MSSs were suggested by Butler (1979), Griffith (1980), Bosche (1987), Ramirez-Marquez and Coit (2005), and Zio and Podofillini (2003).
The entire MSS availability is a complex function of demand w, which is an additional factor having a strong impact on element importance in MSSs. The reliability of a certain element may be very important for one demand level and less important for another.
For a complex system structure, where there can be a large number of demand levels, the importance evaluation for each element is a difficult problem when the straightforward Boolean or Markov approaches are used, because of the great number of logical functions for the top-event description (when one uses the logic methods) and the great number of states (when the Markov technique is used).
The method for the Birnbaum importance calculation based on the UGF technique is much simpler. It uses the same system description for complex MSSs with a different physical nature of performance and takes the demand into account. The method can easily be extended to the sensitivity analysis of additional system output performance measures, considered in Section 4.2.2.
Here we consider a system in steady state, and therefore the natural generalization of Birnbaum importance for an MSS is the rate at which the MSS availability index changes with respect to changes in the availability of a given element j. For element j in state i this importance index is defined as

I_A^(ji)(w) = ∂A(w)/∂p_ji, (4.55)
where p_ji is the probability that the jth element will be in the given state i (with performance rate g_ji), and A(w) is the steady-state availability of the entire MSS. In other words, the Birnbaum importance extension in an MSS context characterizes the influence of changing the probabilities p_ji on the entire MSS availability. Evaluating MSS reliability/availability indices using the UGF was already considered in Section 4.2.2 of the book. Based on the previously obtained resulting UGF

U(z) = Σ_{i=1}^{K} p_i z^{g_i}

of the entire MSS, the steady-state availability A(w) can be obtained for any constant demand w using expression (4.29):

A(w) = δ_A{U(z), w} = Σ_{i=1}^{K} p_i 1(F(g_i, w) ≥ 0). (4.56)

For a variable demand represented by M possible levels w_m with corresponding probabilities q_m, m = 1, ..., M, the importance index takes the form

I_A^(ji)(w, q) = Σ_{m=1}^{M} q_m I_A^(ji)(w_m). (4.57)
In a similar manner, one can obtain the sensitivity of the steady-state expected MSS output performance to the availability of the given element j at given performance level i as

I_E^(ji) = ∂E_∞/∂p_ji, (4.58)

where

E_∞ = lim_{t→∞} E(t) = Σ_{i=1}^{K} p_i g_i. (4.59)
Note that this sensitivity index I_E^(ji) does not depend on the demand level w.
The sensitivity of the expected steady-state MSS performance deficiency to the availability of the given element j at given performance level g_ji for a single constant demand w is defined as follows:

I_D^(ji)(w) = ∂D_∞(w)/∂p_ji, (4.60)

where

D_∞(w) = lim_{t→∞} D(t, w) = Σ_{i=1}^{K} p_i max(w − g_i, 0). (4.61)

For a variable demand with levels w_m and corresponding probabilities q_m, m = 1, ..., M,

I_D^(ji)(w, q) = Σ_{m=1}^{M} q_m I_D^(ji)(w_m). (4.62)
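Because the system availability is multilinear in the element state probabilities, the partial derivative in (4.55) equals the system availability computed with element j pinned in state i. The sketch below is our own illustration of this observation (not the book's code); the dictionary-based UGF representation and the series flow structure are assumptions made for the example:

```python
from functools import reduce

def compose(u1, u2, op):
    """UGF composition: probabilities multiply, performances combine via op."""
    out = {}
    for ga, pa in u1.items():
        for gb, pb in u2.items():
            g = op(ga, gb)
            out[g] = out.get(g, 0.0) + pa * pb
    return out

def availability(u, w):
    """delta_A operator: probability that performance meets demand w."""
    return sum(p for g, p in u.items() if g >= w)

def importance_A(elements, j, i, w):
    """Birnbaum-type importance dA(w)/dp_ji for a series flow MSS:
    A(w) is linear in p_ji, so the derivative equals the availability
    computed with element j fixed in its ith state (probability 1)."""
    pinned = list(elements)
    g_ji = sorted(elements[j])[i]        # states ordered by performance
    pinned[j] = {g_ji: 1.0}
    series = reduce(lambda a, b: compose(a, b, min), pinned)
    return availability(series, w)

# Two-state elements of a series flow MSS (numbers are illustrative)
elements = [{0.0: 0.1, 5.0: 0.9}, {0.0: 0.2, 3.0: 0.8}]
```

For w = 3 the system availability is 0.9 · 0.8 = 0.72, and importance_A(elements, 0, 1, 3.0) returns 0.8 = 0.72/0.9, in agreement with the series-system expression Π p_i1 / p_j1 derived in Example 4.13.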
G_par^(1) = f_par^(1)(G1, G2) = G1 + G2.

Solution. Based on the given probability distributions for the elements' output performance, we can write the individual UGFs:

for element 1: u1(z) = p11 z^0 + p12 z^{g12} + p13 z^{g13},
for element 2: u2(z) = p21 z^0 + p22 z^{g22}.

By using the composition operator we obtain the UGF for the entire MSS (the UGF corresponding to the entire MSS output performance G_par):

U(z) = f_par^(1){u1(z), u2(z)} = f_par^(1){p11 z^0 + p12 z^{g12} + p13 z^{g13}, p21 z^0 + p22 z^{g22}}
= p11 p21 z^0 + p12 p21 z^{g12} + p13 p21 z^{g13} + p11 p22 z^{g22} + p12 p22 z^{g12+g22} + p13 p22 z^{g13+g22}.

For the three demand levels one obtains

A(w1 = 1.0) = δ_A{U(z), w1} = p13 p21 + p11 p22 + p12 p22 + p13 p22,
A(w2 = 1.5) = δ_A{U(z), w2} = p12 p22 + p13 p22,
A(w3 = 2.0) = δ_A{U(z), w3} = p13 p22.
Example 4.13 Consider an MSS consisting of n elements with only total failures connected in series, described in Example 4.9. The importance and sensitivity measures I_A^(j1)(w) and I_D^(j1)(w) should be found as functions of the constant demand level w for the flow transmission MSS and for the task processing MSS.

Solution. In Example 4.9 we calculated the reliability measures of the system that were presented in Table 4.2. The corresponding importance and sensitivity indices can be obtained analytically by differentiating these measures according to (4.55), (4.60), and (4.61). The indices are presented in Table 4.5.
Recall that ĝ = min{g11, ..., gn1} for the flow transmission MSS and ĝ = 1/Σ_{j=1}^{n} g_j1^{−1} for the task processing MSS.
One can see that the element with the minimal availability has the greatest impact on the entire MSS availability. (A chain fails at its weakest link.) The index I_A^(j) in this example does not depend on element performance rates or on demand. Indices I_E^(j) and I_D^(j) also do not depend on the performance rate of the individual element j, but the performance rate g_j can influence these indices if it affects the entire MSS performance ĝ.
Table 4.5 Importance and sensitivity indices for the series MSS of Example 4.13

w           I_A^(j)                   I_D^(j)                    I_E^(j)
w > ĝ       0                         −ĝ (Π_{i=1}^{n} p_i1)/p_j1   ĝ (Π_{i=1}^{n} p_i1)/p_j1
0 < w ≤ ĝ   (Π_{i=1}^{n} p_i1)/p_j1   −w (Π_{i=1}^{n} p_i1)/p_j1   ĝ (Π_{i=1}^{n} p_i1)/p_j1
One can find more examples in Levitin and Lisnianski (1999), Lisnianski and
Levitin (2003), and Levitin (2005).
tem is known from the states of its n components. So, in accordance with Montero et al. (1990), a CSF is a mapping f: [0,1]^n → [0,1], where f(X1, ..., Xn) represents the performance of the system when each component i works at performance level Xi. Such a system is called a continuous-state system (CSS). We shall assume that f is monotonic, i.e., f(X) ≤ f(Y) if X ≤ Y in the sense that Xi ≤ Yi for every i,
where Wdem is some specified demand level. As was observed in Brunelle and Kapur (1998), for many real-world CSSs there would be a nonzero probability of being in state 0, and thus the distribution of the system state would be mixed (continuous and discrete). This case is practically important and is sometimes treated as composite performance and reliability evaluation (Trivedi et al. 1992).
Reliability evaluation for a CSS in practice is a very difficult problem and often requires enormous effort, even for a sufficiently simple system (Aven 1993). Thus, one of the most important problems in this field is to develop an engineering method for CSS reliability assessment. A method will be suitable for engineers if it is based on a formalized procedure for finding the relationship between the characteristics of the entire complex system (the entire CSS performance distribution) and the characteristics (performance distributions) of its components. Using such a method one can find the entire CSS performance distribution based only on a CSS logic diagram and the individual component performance distributions. Such a method exists for finite MSSs and is based on the UGF technique. Hence, if a finite MSS can approximate a CSS, this effective technique can be applied. Here we consider the method presented in Lisnianski (2001), which is based on a discrete approximation of a given CSS by two corresponding MSSs in order to compute lower and upper bounds of CSS reliability measures. The main advantage of the method is that it is based solely on the system logic diagram, does not require building the CSS structure function, and allows one to calculate boundary point estimates for CSS reliability measures with a previously specified accuracy.
As pointed out in the previous section, finite MSSs can be considered in order to get useful approximations for an arbitrary monotonic continuum-state system. Without loss of generality we will consider the interval [0, Xmax], where the system and component performances take their values. A discrete approximation will be defined by successive partitions of the interval [0, Xmax].
Suppose that the performance Xi of the ith system component has the cumulative distribution function Fi(x) = Pr{Xi ≤ x}. Designate as Nint the number of intervals which partition the main interval [0, Xmax]. Hence the length of one interval Δx will be as follows:

Δx = Xmax/Nint. (4.63)
The lower (upper) bound approximation for component i with continuous performance CDF Fi(x) will be represented by the component whose performance is distributed according to the following piecewise CDF Filow(x) (Fiupp(x)), respectively (Table 4.9).

Table 4.9 Lower Filow(x) and upper Fiupp(x) bound piecewise approximations for component performance Fi(x)

Xi                         Filow(x)       Fiupp(x)
[0, Δx)                    Fi(Δx)         0
[Δx, 2Δx)                  Fi(2Δx)        Fi(Δx)
...                        ...            ...
[(Nint − 1)Δx, NintΔx)     Fi(NintΔx)     Fi((Nint − 1)Δx)
NintΔx                     1              1

In Figure 4.8 one can see these CDFs. According to the definitions of Fiupp(x) and Filow(x) (Table 4.9) we can write

Fiupp(x) ≤ Fi(x) ≤ Filow(x). (4.64)
The lower and upper bounds are meant here in the sense of bounds for CSS reliability measures, not as bounds for the function Fi(x). Reliability measures for continuum-state systems were studied in Brunelle and Kapur (1998, 1999). From the set of CSS reliability measures we will use here the following two important and practical measures: (1) the mean CSS performance E and (2) the CSS mean unsupplied demand D(Wdem). Examples of the second measure are the unsupplied power in power systems and the expected output tardiness in information processing systems.
By using the definition of the Stieltjes integral (Gnedenko 1988, Gnedenko and Ushakov 1995), the mean performance for a component with CDF Fi(x), where x ∈ [0, Xmax], can be written as

Ei = ∫_0^{Ximax} x dFi(x). (4.65)
Fig. 4.8 Lower and upper piecewise approximations for component i performance distribution Fi(x)
The curve Fiupp(x) lies below (or on) the curve Fi(x); hence the area SEupp and the corresponding mean Eiupp that are calculated for the CDF Fiupp(x) will be greater than the mean performance of component i with performance CDF Fi(x). Thus, the mean Eiupp characterizes the upper bound of the mean performance Ei of continuous-state component i. The curve Filow(x) in Figure 4.8 lies above (or on) the curve Fi(x); hence the mean Eilow that is calculated for the CDF Filow(x) characterizes the lower bound of the mean performance Ei of continuous-state component i.
The mean unsupplied demand for continuous-state component i can be treated as the mean of the following random value X̃i:

X̃i = max(Wdem − Xi, 0). (4.66)

Hence

Di = ∫_0^{Ximax} x dF_X̃i(x). (4.67)

Table 4.10 Mass functions F(d)iupp(x) and F(d)ilow(x) for the upper and lower boundary points

x          F(d)iupp(x)                   F(d)ilow(x)
0          0                             Fi(Δx)
Δx         Fi(Δx)                        Fi(2Δx) − Fi(Δx)
...        ...                           ...
jΔx        Fi(jΔx) − Fi((j − 1)Δx)       Fi((j + 1)Δx) − Fi(jΔx)
...        ...                           ...
NintΔx     1 − Fi((Nint − 1)Δx)          1 − Fi(NintΔx)
Components with the discrete and the corresponding piecewise distributions have the same values of the reliability measures. The above lower and upper bounds for CSS reliability measures can be computed with a desired level of accuracy by decreasing the step Δx.
Thus, by using the above-considered approach, a CSS can be represented by two finite MSSs: MSSupp and MSSlow for the upper and lower boundary point calculation of CSS reliability measures, respectively. These MSSs have the same structure as the CSS. In MSSupp any continuous-state component i must be represented by a discrete distribution with the corresponding mass function F(d)iupp(x). In MSSlow any continuous-state component i must be represented by a discrete distribution with the corresponding mass function F(d)ilow(x). In this case the UGF technique, which has proved to be very effective for finite MSS reliability assessment, can be applied. Using the UGF technique, boundary points for CSS reliability measures may be estimated according to the following algorithm.
1. Based on the performance CDF Fi(x) for every component i, the individual u-functions for the upper and lower boundary points of the component's reliability measures must be found. For the upper bounds, according to Table 4.10, we will have for every component i

ui^(u)(z) = 0·z^0 + Fi(Δx)·z^{Δx} + Σ_{j=2}^{Nint−1} [Fi(jΔx) − Fi((j − 1)Δx)]·z^{jΔx} + [1 − Fi((Nint − 1)Δx)]·z^{NintΔx}. (4.68)

For the lower bounds,

ui^(l)(z) = Fi(Δx)·z^0 + Σ_{j=2}^{Nint} [Fi(jΔx) − Fi((j − 1)Δx)]·z^{(j−1)Δx} + [1 − Fi(NintΔx)]·z^{NintΔx}. (4.69)
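The two u-functions (4.68) and (4.69) can be generated mechanically from any continuous CDF. The sketch below is our own illustration (the uniform CDF is only a convenient test case, and the function names are ours); it builds both discrete approximations and shows that the resulting means bracket the true mean and tighten as Nint grows:

```python
def bound_ufunctions(F, x_max, n_int):
    """Discrete upper/lower approximations {performance: probability mass}
    of a continuous performance CDF F on [0, x_max], per Eqs. (4.68)-(4.69)."""
    dx = x_max / n_int                          # Eq. (4.63)
    upper = {0.0: 0.0, dx: F(dx)}
    for j in range(2, n_int):
        upper[j * dx] = F(j * dx) - F((j - 1) * dx)
    upper[n_int * dx] = 1.0 - F((n_int - 1) * dx)
    lower = {0.0: F(dx)}
    for j in range(2, n_int + 1):
        lower[(j - 1) * dx] = F(j * dx) - F((j - 1) * dx)
    lower[n_int * dx] = 1.0 - F(n_int * dx)
    return upper, lower

def mean(u):
    """Mean performance of a discrete u-function."""
    return sum(g * p for g, p in u.items())

# Uniform performance on [0, 1]: the true mean is 0.5
F = lambda x: min(x, 1.0)
up4, lo4 = bound_ufunctions(F, 1.0, 4)
up8, lo8 = bound_ufunctions(F, 1.0, 8)
```

For Nint = 4 the means are 0.625 and 0.375; for Nint = 8 they are 0.5625 and 0.4375, so the bracket around the true mean 0.5 halves when the step Δx is halved.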
2. The composition operators corresponding to the CSS structure must be applied in order to obtain the resulting u-functions US^(u)(z) and US^(l)(z) for MSSupp and MSSlow, respectively.
3. Based on these resulting u-functions, the boundary points of the CSS reliability measures are obtained. For the upper bounds,

E^(u) = dUS^(u)(z)/dz |_{z=1}, (4.70)
D^(u) = δ_D(US^(u)(z), Wdem). (4.71)

For the lower bounds,

E^(l) = dUS^(l)(z)/dz |_{z=1}, (4.72)
D^(l) = δ_D(US^(l)(z), Wdem). (4.73)
Fi(x) = 1 − Ai + Ai F_Xi(x), if x > 0,
Fi(x) = 1 − Ai, if x = 0,
Fi(x) = 0, if x < 0.
For elements 1 and 2 connected in parallel one obtains

u(1,2)^(u1)(z) = f_par^(1){u1^(u)(z), u2^(u)(z)},
u(1,2)^(u2)(z) = f_par1^(1){u1^(u)(z), u2^(u)(z)},
u(1,2)^(l1)(z) = f_par^(1){u1^(l)(z), u2^(l)(z)},
u(1,2)^(l2)(z) = f_par1^(1){u1^(l)(z), u2^(l)(z)}.

Here the structure functions f_par^(1) and f_par1^(1) are defined by expressions (4.46) and (4.47), respectively.
The u-functions for the entire system will be as follows:
for a system of type 1,

US^(u1)(z) = f_ser^(2){u(1,2)^(u1)(z), u3^(u)(z)}, US^(l1)(z) = f_ser^(2){u(1,2)^(l1)(z), u3^(l)(z)};

for a system of type 2,

US^(u2)(z) = f_ser^(2){u(1,2)^(u2)(z), u3^(u)(z)}, US^(l2)(z) = f_ser^(2){u(1,2)^(l2)(z), u3^(l)(z)}.
3. To obtain the lower and upper bounds for the CSS mean output performance E and expected unsupplied demand D, expressions (4.70)–(4.73) are used.
For the upper bounds:

E^(uj) = dUS^(uj)(z)/dz |_{z=1}, j = 1, 2;
D^(uj) = δ_D(US^(uj)(z), Wdem), j = 1, 2.

For the lower bounds:

E^(lj) = dUS^(lj)(z)/dz |_{z=1}, j = 1, 2;
D^(lj) = δ_D(US^(lj)(z), Wdem), j = 1, 2.
In Figures 4.10 and 4.11 one can see the upper and lower boundary points for the CSS mean output performance and mean unsupplied demand for systems of type 1 and type 2 as functions of the step Δx. One can see that the difference between the upper and lower bounds decreases as the step Δx decreases.

Fig. 4.10 Upper and lower boundary points for the CSS mean output performance as functions of the step Δx
Using these lower and upper bounds one can estimate the CSS reliability measures. The maximal relative error for a system of type 1 (for Δx = 1) is obtained in the same manner as for a system of type 2, for which (for Δx = 1)

errE^(2) = (24.23 − 23.83)/23.83 = 0.017 and errD^(2) = (3.45 − 3.32)/3.32 = 0.039.
Fig. 4.11 Upper and lower boundary points for the CSS mean unsupplied demand as functions of the step Δx
References
Aven T (1993) On performance measures for multistate monotone systems. Reliab Eng Syst Saf 41(3):259–266
Barlow R, Proschan F (1974) Importance of system components and fault tree analysis. Operations Research Center, vol 3, University of California, Berkeley
Barlow R, Proschan F (1975) Importance of system components and fault tree analysis. Stochastic Processes and their Applications 3(2):153–173
Baxter LA (1984) Continuum structures. I. J Appl Probab 21:802–815
Baxter LA (1986) Continuum structures II. Mathematical Proceedings of the Cambridge Philosophical Society 99:331–338
Baxter LA, Kim C (1986) Bounding the stochastic performance of continuum structure functions. J Appl Probab 23:660–669
Birnbaum ZW (1969) On the importance of different components in a multi-component system. In: Krishnaiah PR (ed) Multivariate Analysis II. Academic, New York, pp 581–592
Block HW, Savits TH (1984) Continuous multi-state structure functions. Oper Res 32:703–714
Bosche A (1987) Calculation of critical importance for multi-state components. IEEE Trans Reliab R-36:247–249
Brunelle R, Kapur KC (1998) Continuous-state system reliability: an interpolation approach. IEEE Trans Reliab 47:181–187
Butler DA (1979) A complete importance ranking for components of binary coherent systems with extensions to multi-state systems. Nav Res Logist Quart 26:565–578
Chakravarty S, Ushakov I (2000) Effectiveness analysis of GlobalstarTM gateways. In: Proceedings of the 2nd International Conference on Mathematical Methods in Reliability (MMR2000), Bordeaux, France, vol 1
Elmakias D (2008) New computational methods in power system reliability. Springer, London
Fussell JB (1975) How to hand-calculate system reliability and safety characteristics. IEEE Trans Reliab R-24(3):168–174
Gnedenko B (1969) Mathematical methods of reliability theory. Academic, Boston
Gnedenko B (1988) Course of probability theory. Nauka, Moscow (in Russian)
Gnedenko B, Ushakov I (1995) Probabilistic reliability engineering. Wiley, New York
Griffith WS (1980) Multi-state reliability models. J Appl Prob 17:735–744
Grimmett G, Stirzaker D (1992) Probability and random processes. Clarendon, Oxford
Korczak E (2007) New formulae for failure/repair frequency of multi-state monotone systems and its applications. Control Cybern 36(1):219–239
Korczak E (2008) Calculating steady state reliability indices of multi-state systems using dual number algebra. In: Martorell et al (eds) Safety, Reliability and Risk Analysis: Theory, Methods and Applications. Proceedings of the European Safety and Reliability Conference (ESREL 2008), Valencia, Spain, pp 1795–1802
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Levitin G (2008) Optimal structure of multi-state systems with uncovered failures. IEEE Trans Reliab 57(1):140–148
Levitin G, Amari S (2009) Optimal load distribution in series-parallel systems. Reliab Eng Syst Saf 94:254–260
Levitin G, Dai Y, Ben-Haim H (2006) Reliability and performance of star topology grid service with precedence constraints on subtask execution. IEEE Trans Reliab 55(3):507–515
Levitin G, Lisnianski A (1998) Structure optimization of power system with bridge topology. Electr Power Syst Res 45:201–208
Levitin G, Lisnianski A (1999) Importance and sensitivity analysis of multi-state systems using the universal generating function method. Reliab Eng Syst Saf 65:271–282
Levitin G, Lisnianski A, Ben Haim H et al (1998) Redundancy optimization for series-parallel multi-state systems. IEEE Trans Reliab 47:165–172
Lisnianski A (2001) Estimation of boundary points for continuum-state system reliability measures. Reliab Eng Syst Saf 74:81–88
Lisnianski A (2004a) Universal generating function technique and random process methods for multi-state system reliability analysis. In: Proceedings of the 2nd International Workshop in Applied Probability (IWAP2004), Piraeus, Greece, pp 237–242
Lisnianski A (2004b) Combined generating function and semi-Markov process technique for multi-state system reliability evaluation. In: Communications of the 4th International Confer-
As was described in Chapter 2, stochastic process methods are very effective tools for MSS reliability evaluation. According to these methods a state-space diagram of an MSS should be built and the transitions between all the states defined. Then the system evolution should be represented by a continuous-time discrete-state stochastic process. Based on this process all MSS reliability measures can be evaluated.
The main disadvantage of stochastic process models for MSS reliability evaluation is that they are very difficult to apply to real-world MSSs consisting of many elements with different performance levels. This is the so-called dimension curse. First, state-space diagram building or model construction for such complex MSSs is not a simple job. It is a difficult nonformalized process that may cause numerous mistakes even for relatively small MSSs. The problem of identifying all the states and transitions correctly is a very difficult task. Second, solving models with hundreds of states can challenge the available computer resources. For an MSS consisting of n different repairable elements, where every element j has kj different performance levels, one will have a model with K = Π_{j=1}^{n} kj states. This number can be very large even for a relatively small MSS.
In the general case, any element j in an MSS can have kj different states corresponding to different performances, represented by the set gj = {gj1, ..., gjkj}, where gji is the performance rate of element j in state i, i ∈ {1, 2, ..., kj}.
In the first stage, according to the suggested method, a model of a stochastic process should be built for each multi-state element in the MSS. Based on this model the state probabilities

pji(t) = Pr{Gj(t) = gji}, i ∈ {1, ..., kj},

for every MSS element j ∈ {1, ..., n} can be obtained. These probabilities define the output stochastic process Gj(t) for each element j in the MSS.
At the next stage the output performance distribution for the entire MSS at each time instant t should be defined based on the previously determined state probabilities of all elements and on the system structure function. At this stage the UGF technique provides a simple procedure based only on algebraic operations.
Without loss of generality here we consider a multi-state element with minor failures and repairs. With each state i there is an associated performance gji of element j. The states are ordered so that gj,i+1 ≥ gji for any i. Minor failures and repairs cause element transitions from state i, where 1 < i < kj, only to the adjacent states i − 1 and i + 1, respectively: the transition will be to state i − 1 if a failure occurs in state i, and to state i + 1 if the repair is finished. In state kj only a failure (with transition to state kj − 1) is possible, and in state 1 only a repair (with transition to state 2) is possible.
If all times to failure and repair times are exponentially distributed, the performance stochastic process has a Markov property and can be represented by a Markov model. Here for simplicity we omit the index j and assume that the element has k different states, as presented in Figure 5.1. For a Markov process each transition from state s to any state m (s, m = 1, ..., k) has its own associated transition intensity, which will be designated asm. In our case any transition is caused by an element's failure or repair. If m < s, then asm = λsm, where λsm is the failure rate for the failures that cause the element transition from state s to state m. If m > s, then asm = μsm, where μsm is the corresponding repair rate. The performance gs is associated with each state s.
Fig. 5.1 State-transition diagram for Markov model of a repairable multi-state element

The state probabilities are

ps(t) = Pr{G(t) = gs}, s = 1, ..., k; t ≥ 0.
The following system of differential equations can be written for the state probabilities:

dps(t)/dt = Σ_{i=1, i≠s}^{k} pi(t) ais − ps(t) Σ_{i=1, i≠s}^{k} asi. (5.1)
In our case all transitions are caused by the element's failures and repairs. Thus, the corresponding transition intensities ais are expressed by the element's failure and repair rates. Therefore, the corresponding system of differential equations may be written as

dp1(t)/dt = −μ12 p1(t) + λ21 p2(t),
dp2(t)/dt = μ12 p1(t) − (λ21 + μ23) p2(t) + λ32 p3(t),
...
dpk(t)/dt = μk−1,k pk−1(t) − λk,k−1 pk(t). (5.2)

We assume that the initial state will be state k with the best performance. Therefore, by solving the system (5.2) of differential equations under the initial conditions pk(0) = 1, pk−1(0) = ... = p2(0) = p1(0) = 0, the state probabilities ps(t), s = 1, ..., k, can be obtained.
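System (5.2) is also easy to solve numerically for any number of states. The sketch below is our own code, not the book's; the two-state rates used for the demonstration are only illustrative values. It builds the birth-death transition-intensity matrix and integrates dp/dt with a fixed-step Runge-Kutta scheme:

```python
def birth_death_generator(lam, mu):
    """Transition-intensity matrix for a k-state element with minor failures
    and repairs: lam[i] is the failure rate from state i+2 down to i+1,
    mu[i] the repair rate from state i+1 up to i+2 (states numbered 1..k)."""
    k = len(mu) + 1
    A = [[0.0] * k for _ in range(k)]
    for i in range(k - 1):
        A[i][i + 1] = mu[i]      # repair, up one state
        A[i + 1][i] = lam[i]     # failure, down one state
    for s in range(k):
        A[s][s] = -sum(A[s][m] for m in range(k) if m != s)
    return A

def solve_markov(A, p0, t, steps=2000):
    """Classical RK4 integration of system (5.1)/(5.2): dp_s/dt = sum_i p_i a_is."""
    k = len(A)
    f = lambda p: [sum(p[i] * A[i][s] for i in range(k)) for s in range(k)]
    h, p = t / steps, list(p0)
    for _ in range(steps):
        k1 = f(p)
        k2 = f([p[i] + h / 2 * k1[i] for i in range(k)])
        k3 = f([p[i] + h / 2 * k2[i] for i in range(k)])
        k4 = f([p[i] + h * k3[i] for i in range(k)])
        p = [p[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(k)]
    return p

# Two-state repairable element, starting in the best state
A = birth_death_generator(lam=[7.0], mu=[100.0])
p = solve_markov(A, [0.0, 1.0], 1.0)
```

After one year the element has practically reached steady state, so p[0] is close to λ/(λ + μ) = 7/107 ≈ 0.0654.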
element Qlm(t) of this matrix determines the probability that a transition from state l to state m will occur during the time interval [0, t].

Fig. 5.2 State-transition diagram for semi-Markov model of a repairable multi-state element

       | 0        Q12(t)   0        0   ...  0            0 |
       | Q21(t)   0        Q23(t)   0   ...  0            0 |
Q(t) = | ...      ...      ...      ... ...  ...          ...|, (5.3)
       | 0        0        0        0   ...  Qk,k−1(t)    0 |

where
...
Qk,k−1(t) = Fk,k−1(t). (5.7)
The kernel matrix (5.3) and the initial state k (with the best performance) completely define the semi-Markov process, which describes the stochastic behavior of the multi-state element.
For every element we denote by θlm(t) the probability that a semi-Markov stochastic process that starts from the initial state l at instant t = 0 will be in state m at instant t. The probabilities θlm(t), l, m = 1, 2, ..., k, can be found from the solution of the following system of integral equations:

θlm(t) = δlm [1 − Σ_{s=1}^{k} Qls(t)] + Σ_{s=1}^{k} ∫_0^t qls(τ) θsm(t − τ) dτ, l, m = 1, ..., k, (5.8)

where

qls(τ) = dQls(τ)/dτ

and

δlm = 1 if l = m, δlm = 0 if l ≠ m.

We assume that the process always starts from state k (the best state). Hence the state probabilities of a multi-state element, which should be defined based on the solution of the system of integral equations (5.8), are as follows:

pk(t) = θk,k(t), pk−1(t) = θk,k−1(t), ..., p1(t) = θk,1(t). (5.9)
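Equations (5.8) are Volterra integral equations and can be solved by simple time discretization. The sketch below is our own first-order illustration (the rates, grid size, and function names are arbitrary choices, not from the book); it treats the increments of Qls over each grid interval as jump-probability masses. With exponential sojourn times the semi-Markov element reduces to a Markov one, which provides a closed-form check:

```python
import math

def solve_semi_markov(Q, k, t_max, n):
    """First-order discretization of the Volterra system (5.8).
    Q[l][s] is the kernel CDF of the l -> s transition (a callable)."""
    dt = t_max / n
    # jump-probability mass of a first l -> s transition in [r*dt, (r+1)*dt)
    mass = [[[Q[l][s]((r + 1) * dt) - Q[l][s](r * dt) for r in range(n)]
             for s in range(k)] for l in range(k)]
    # theta[l][m] lists the values at t = 0, dt, 2*dt, ...
    theta = [[[1.0 if l == m else 0.0] for m in range(k)] for l in range(k)]
    for j in range(1, n + 1):
        t = j * dt
        for l in range(k):
            stay = 1.0 - sum(Q[l][s](t) for s in range(k) if s != l)
            for m in range(k):
                val = stay if l == m else 0.0
                for s in range(k):
                    if s != l:
                        val += sum(mass[l][s][r] * theta[s][m][j - 1 - r]
                                   for r in range(j))
                theta[l][m].append(val)
    return theta

# Two-state element with exponential sojourn times; 0-based indices,
# so index 1 below is the working state and index 0 the failed state.
lam, mu = 2.0, 10.0
Q = [[lambda t: 0.0, lambda t: 1.0 - math.exp(-mu * t)],
     [lambda t: 1.0 - math.exp(-lam * t), lambda t: 0.0]]
theta = solve_semi_markov(Q, 2, 1.0, 400)
p_fail_markov = lam / (lam + mu) * (1.0 - math.exp(-(lam + mu)))
```

theta[1][0][-1] approximates the probability of being failed at t = 1 when starting from the working state; it agrees with the Markov closed form λ/(λ + μ)(1 − e^{−(λ+μ)t}) to within the O(Δt) discretization error.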
uj(z, t) = pj1(t) z^{gj1} + pj2(t) z^{gj2} + ... + pjkj(t) z^{gjkj}. (5.10)
2. The composition operators fser (for elements connected in series), fpar (for elements connected in parallel), and fbridge (for elements connected in a bridge structure) should be applied to the UGFs of the individual elements and their combinations. These operators were defined in the previous chapter, where the corresponding recursive procedures for their computation were introduced for different types of systems. Based on these procedures the resulting UGF for the entire MSS can be obtained:

U(z, t) = Σ_{i=1}^{K} pi(t) z^{gi}, (5.11)

where K is the number of the entire system's states and gi is the entire system performance in the corresponding state i, i = 1, ..., K.
The MSS reliability indices can then be obtained from U(z, t):

A(t) = δA(U(z, t), w) = δA(Σ_{i=1}^{K} pi(t) z^{gi}, w) = Σ_{i=1}^{K} pi(t) 1(gi − w ≥ 0); (5.12)

E(t) = δE(U(z, t)) = δE(Σ_{i=1}^{K} pi(t) z^{gi}) = Σ_{i=1}^{K} pi(t) gi; (5.13)

D(t) = δD(U(z, t), w) = δD(Σ_{i=1}^{K} pi(t) z^{gi}, w) = Σ_{i=1}^{K} pi(t) max(w − gi, 0); (5.14)

DT = ∫_0^T D(t, w) dt = Σ_{i=1}^{K} max(w − gi, 0) ∫_0^T pi(t) dt. (5.15)
Example 5.1 Consider the flow transmission system presented in Figure 5.3.

Fig. 5.3 Flow transmission system: elements 1 and 2 are connected in parallel and in series with element 3; the individual u-functions u1(z, t) = p11(t) z^0 + p12(t) z^1.5, u2(z, t) = p21(t) z^0 + p22(t) z^2, and u3(z, t) = p31(t) z^0 + p32(t) z^1.8 + p33(t) z^4 are shown under the corresponding elements
The system consists of three elements (pipes). The oil flow is transmitted from left to right. The performance of the pipes is measured by their transmission capacity (tons per minute). Times to failure and times to repair are distributed exponentially for all elements. Elements 1 and 2 are repairable and each one has two possible states. A state of total failure for both elements corresponds to a transmission capacity of 0 and the operational state corresponds to capacities of 1.5 and 2 tons/min, respectively, so that g1 = {g11, g12} = {0, 1.5} and g2 = {g21, g22} = {0, 2}.
The failure rates and repair rates corresponding to these two elements are

λ21^(1) = 7 year⁻¹, μ12^(1) = 100 year⁻¹ for element 1,
λ21^(2) = 10 year⁻¹, μ12^(2) = 80 year⁻¹ for element 2.
Element 3 is a multi-state element with minor failures and minor repairs. It can be in one of three states: a state of total failure corresponding to a capacity of 0, a state of partial failure corresponding to a capacity of 1.8 tons/min, and a fully operational state with a capacity of 4 tons/min. Therefore, g3 = {g31, g32, g33} = {0, 1.8, 4}. The corresponding failure and repair rates are

λ32^(3) = 10 year⁻¹, λ21^(3) = 7 year⁻¹,
μ12^(3) = 120 year⁻¹, μ23^(3) = 110 year⁻¹.
Gs ( t ) = f ( G1 ( t ) , G2 ( t ) , G3 ( t ) ) = min {G1 ( t ) + G2 ( t ) , G3 ( t )} .
1. For each element the corresponding system of differential equations should be solved.

For element 1:

dp11(t)/dt = −μ12^(1) p11(t) + λ21^(1) p12(t),
dp12(t)/dt = −λ21^(1) p12(t) + μ12^(1) p11(t).

For element 2:

dp21(t)/dt = −μ12^(2) p21(t) + λ21^(2) p22(t),
dp22(t)/dt = −λ21^(2) p22(t) + μ12^(2) p21(t).

For element 3:

dp31(t)/dt = −μ12^(3) p31(t) + λ21^(3) p32(t),
dp32(t)/dt = λ32^(3) p33(t) − (λ21^(3) + μ23^(3)) p32(t) + μ12^(3) p31(t),
dp33(t)/dt = −λ32^(3) p33(t) + μ23^(3) p32(t).
Solving these systems under the initial conditions corresponding to the best state of each element yields the following state probabilities.

For element 1:

p11(t) = λ21^(1)/(λ21^(1) + μ12^(1)) − [λ21^(1)/(λ21^(1) + μ12^(1))] e^{−(λ21^(1) + μ12^(1)) t},
p12(t) = μ12^(1)/(λ21^(1) + μ12^(1)) + [λ21^(1)/(λ21^(1) + μ12^(1))] e^{−(λ21^(1) + μ12^(1)) t}.

For element 2:

p21(t) = λ21^(2)/(λ21^(2) + μ12^(2)) − [λ21^(2)/(λ21^(2) + μ12^(2))] e^{−(λ21^(2) + μ12^(2)) t},
p22(t) = μ12^(2)/(λ21^(2) + μ12^(2)) + [λ21^(2)/(λ21^(2) + μ12^(2))] e^{−(λ21^(2) + μ12^(2)) t}.

For element 3:

p31(t) = A1 e^{x1 t} + A2 e^{x2 t} + A3,
p32(t) = B1 e^{x1 t} + B2 e^{x2 t} + B3,
p33(t) = C1 e^{x1 t} + C2 e^{x2 t} + C3,

where

x1 = −γ/2 + (γ²/4 − η)^{1/2}, x2 = −γ/2 − (γ²/4 − η)^{1/2},

A1 = λ21^(3) λ32^(3)/[x1(x1 − x2)], A2 = λ21^(3) λ32^(3)/[x2(x2 − x1)], A3 = λ21^(3) λ32^(3)/η,

B1 = (μ12^(3) + x1) λ32^(3)/[x1(x1 − x2)], B2 = (μ12^(3) + x2) λ32^(3)/[x2(x2 − x1)], B3 = μ12^(3) λ32^(3)/η,

C1 = μ23^(3) λ32^(3) (μ12^(3) + x1)/[x1(x1 − x2)(x1 + λ32^(3))], C2 = μ23^(3) λ32^(3) (μ12^(3) + x2)/[x2(x2 − x1)(x2 + λ32^(3))], C3 = μ12^(3) μ23^(3)/η,

γ = λ21^(3) + λ32^(3) + μ12^(3) + μ23^(3), η = λ21^(3) λ32^(3) + μ12^(3) μ23^(3) + μ12^(3) λ32^(3).
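The coefficients above can be verified numerically. The sketch below is our own check (not the book's code), using the rates of Example 5.1; it evaluates the closed-form p3i(t) and confirms the initial conditions p33(0) = 1, p31(0) = p32(0) = 0, and that the probabilities always sum to one:

```python
import math

l21, l32 = 7.0, 10.0      # failure rates of element 3 (year^-1)
m12, m23 = 120.0, 110.0   # repair rates of element 3 (year^-1)

gamma = l21 + l32 + m12 + m23
eta = l21 * l32 + m12 * m23 + m12 * l32
root = math.sqrt(gamma ** 2 / 4.0 - eta)
x1, x2 = -gamma / 2.0 + root, -gamma / 2.0 - root

A1 = l21 * l32 / (x1 * (x1 - x2))
A2 = l21 * l32 / (x2 * (x2 - x1))
A3 = l21 * l32 / eta
B1 = (m12 + x1) * l32 / (x1 * (x1 - x2))
B2 = (m12 + x2) * l32 / (x2 * (x2 - x1))
B3 = m12 * l32 / eta
C1 = m23 * l32 * (m12 + x1) / (x1 * (x1 - x2) * (x1 + l32))
C2 = m23 * l32 * (m12 + x2) / (x2 * (x2 - x1) * (x2 + l32))
C3 = m12 * m23 / eta

def element3_probs(t):
    """Closed-form state probabilities (p31, p32, p33) of element 3."""
    e1, e2 = math.exp(x1 * t), math.exp(x2 * t)
    return (A1 * e1 + A2 * e2 + A3,
            B1 * e1 + B2 * e2 + B3,
            C1 * e1 + C2 * e2 + C3)
```

At t = 0 the element starts in its best state; as t grows, the probabilities approach the steady-state values (A3, B3, C3) ≈ (0.0048, 0.0829, 0.9122). The initial decay rate of p33 equals −λ32^(3) = −10 year⁻¹, as required by the last equation of the system for element 3.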
2. Having the sets gj, pj(t) for j = 1, 2, 3, one can define for each individual element j the u-function associated with the element's output performance stochastic process:

u1(z, t) = p11(t) z^0 + p12(t) z^1.5,
u2(z, t) = p21(t) z^0 + p22(t) z^2,
u3(z, t) = p31(t) z^0 + p32(t) z^1.8 + p33(t) z^4.

These u-functions are also presented in Figure 5.3 under the corresponding elements.
3. Using the composition operators f_ser^(1) and f_par^(1) for a flow transmission MSS with flow dispersion (Section 4.2), one obtains the resulting UGF for the entire series-parallel MSS.
In order to find the resulting UGF U12(z, t) for elements 1 and 2 connected in parallel, the operator f_par^(1) is applied to the individual UGFs u1(z, t) and u2(z, t):

U12(z, t) = f_par^(1)(u1(z, t), u2(z, t))
= f_par^(1)(p11(t) z^0 + p12(t) z^1.5, p21(t) z^0 + p22(t) z^2)
= p11(t) p21(t) z^0 + p12(t) p21(t) z^1.5 + p11(t) p22(t) z^2 + p12(t) p22(t) z^3.5.
In the resulting UGF U12(z, t) the powers of z are found as the sums of the powers of the corresponding terms.
In order to find the UGF for the entire MSS, where element 3 is connected in series with elements 1 and 2, which are connected in parallel, the operator f_ser^(1) should be applied:

U(z, t) = f_ser^(1)(f_par^(1)(u1(z, t), u2(z, t)), u3(z, t))
= f_ser^(1)(p31(t) z^0 + p32(t) z^1.8 + p33(t) z^4, p11(t) p21(t) z^0 + p12(t) p21(t) z^1.5 + p11(t) p22(t) z^2 + p12(t) p22(t) z^3.5)
= p31(t) p11(t) p21(t) z^0 + p31(t) p12(t) p21(t) z^0 + p31(t) p11(t) p22(t) z^0 + p31(t) p12(t) p22(t) z^0
+ p32(t) p11(t) p21(t) z^0 + p32(t) p12(t) p21(t) z^1.5 + p32(t) p11(t) p22(t) z^1.8 + p32(t) p12(t) p22(t) z^1.8
+ p33(t) p11(t) p21(t) z^0 + p33(t) p12(t) p21(t) z^1.5 + p33(t) p11(t) p22(t) z^2 + p33(t) p12(t) p22(t) z^3.5.

In the resulting UGF U(z, t) the powers of z are found as the minima of the powers of the corresponding terms.
Taking into account that p31(t) + p32(t) + p33(t) = 1, p21(t) + p22(t) = 1, and p11(t) + p12(t) = 1, one can simplify the last expression for U(z, t) and obtain the resulting UGF associated with the output performance stochastic process g, p(t) of the entire MSS in the following form:

U(z, t) = Σ_{i=1}^{5} pi(t) z^{gi},

where

g = {g1, g2, g3, g4, g5} and p(t) = {p1(t), p2(t), p3(t), p4(t), p5(t)}

completely define the output performance stochastic process for the entire MSS.
Computation of the probabilities pi(t), i = 1, 2, ..., 5, gives exactly the same results that were obtained in Example 2.4 by using the straightforward Markov method. (These results were presented in Figure 2.13.)
Based on the resulting UGF U(z,t) of the entire MSS, one can obtain the MSS
reliability indices. The instantaneous MSS availability for the constant demand
level w = 2.0 tons/min is
$$A(t) = A\bigl(U(z,t), w\bigr) = A\Bigl(\sum_{i=1}^{5} p_i(t)\, z^{g_i},\, 2\Bigr) = \sum_{i=1}^{5} p_i(t)\,\mathbf{1}\bigl(F(g_i, 2) \ge 0\bigr) = p_4(t) + p_5(t).$$

The instantaneous mean performance is

$$E(t) = E\bigl(U(z,t)\bigr) = \sum_{i=1}^{5} p_i(t)\, g_i = 1.5p_2(t) + 1.8p_3(t) + 2p_4(t) + 3.5p_5(t).$$

The instantaneous performance deficiency D(t) at any time t for the constant demand w = 2.0 tons/min is

$$D(t) = D\bigl(U(z,t), w\bigr) = \sum_{i=1}^{5} p_i(t)\max(2 - g_i,\, 0) = p_1(t)(2 - 0) + p_2(t)(2 - 1.5) + p_3(t)(2 - 1.8) = 2p_1(t) + 0.5p_2(t) + 0.2p_3(t).$$
The calculated reliability indices A(t), E(t), and D(t) are exactly the same as
those obtained in Example 2.4 by using a straightforward Markov method and
graphically presented in Figure 2.14.
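These index formulas translate directly into code. Below is a minimal sketch for reading A(t), E(t), and D(t) off the terms of a resulting UGF at one instant t; the probability values are invented for illustration:

```python
def availability(u, w):
    # sum the probabilities of states whose performance meets the demand
    return sum(p for g, p in u.items() if g >= w)

def expected_performance(u):
    # mean performance: probability-weighted sum of the powers of z
    return sum(p * g for g, p in u.items())

def deficiency(u, w):
    # expected shortfall below the demand level w
    return sum(p * max(w - g, 0.0) for g, p in u.items())

# UGF terms {g_i: p_i(t)} at some instant t (illustrative probabilities)
u = {0.0: 0.01, 1.5: 0.04, 1.8: 0.05, 2.0: 0.30, 3.5: 0.60}

print(availability(u, 2.0))     # p4 + p5
print(expected_performance(u))  # 1.5*p2 + 1.8*p3 + 2*p4 + 3.5*p5
print(deficiency(u, 2.0))       # 2*p1 + 0.5*p2 + 0.2*p3
```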
Note that instead of solving the system of K = 2 × 2 × 3 = 12 differential equations (as should be done in a straightforward Markov method), here we solve just three systems: one third-order system and two second-order systems. The further derivation of the entire system's state probabilities and reliability indices is based on simple algebraic operations.
5.2.1 Introduction
2008). However, for MSSs there is an important type of redundancy that does not exist in binary-state systems and has not been investigated until now within the framework of MSS reliability analysis.
For MSSs it is typical that after satisfying its own demand one MSS can pro-
vide its abundant resource (performance) to another MSS directly or through an
interconnection system (which can also be multi-state). In this case, the first MSS
can be called the reserve MSS and the second one the main MSS. In the general
case demand for the reserve and the main MSS can also be described by two different independent stochastic processes. Typical examples of such kinds of MSS include power generating systems, where one power station can assist another in satisfying demand, oil and gas production and transportation systems, computing systems with distributed computation resources, etc. Such a
multi-state structure with redundancy may be treated as MSSs with mutual aid or a
structure with interconnected MSSs. This type of redundancy is quite common for
MSSs. However, using existing methods it is very difficult to build a reliability
model for a complex repairable MSS taking redundancy into consideration and to
solve it for obtaining the corresponding reliability indices.
In practice each multi-state component in a MSS can have a different number of performance levels. This number may be relatively large, up to ten or more (Billinton and Allan 1996; Goldner 2006). Even for relatively small MSSs consisting of three to five repairable components, the number of states system-wide will be significantly greater (ten thousand or more). In general, for a MSS consisting of n repairable components, where each component j has $k_j$ different capacity levels, there are

$$K = \prod_{j=1}^{n} k_j$$

system states. This number may be very large and increases quickly with the number of components. Below we consider an application of the combined UGF and random process method for reliability assessment of interconnected repairable MSSs with mutual aid. Such an application was suggested in Lisnianski and Ding (2009). In Ding et al. (2009) one can find the method applied to dynamic reliability assessment in restructured power systems.
According to the generic MSS model, any system component j in a MSS can have $k_j$ different states corresponding to its performance levels, represented by the set $\mathbf{g}_j = \{g_{j1}, \ldots, g_{jk_j}\}$. The current state of component j and the corresponding value of the component performance level $G_j(t)$ at any instant t are random variables: $G_j(t)$ takes values from $\mathbf{g}_j$, i.e., $G_j(t) \in \mathbf{g}_j$. Therefore, for the time interval [0, T], where
T is the MSS operation period, the performance level of component j is defined as
a discrete-state, continuous-time stochastic process. In this chapter only Markov
processes will be considered, where the process behavior at a future instant only
depends on the current state. The general Markov model of a multi-state compo-
nent was introduced in Chapter 2, which considered minor and major fail-
ures/repairs of components.
Minor failures are failures causing component transition from state i to the adjacent state i − 1. In other words, a minor failure causes minimal degradation of component performance. A major failure is one that causes a component to transit from state i to state j, where j < i − 1. A minor repair returns a component from state j to state j + 1, while a major repair returns a component from state j to state i, where i > j + 1. In this case, for each component j its performance level $G_j(t)$ is a discrete-state, continuous-time Markov stochastic process.
A general redundancy scheme for a MSS is presented in Figure 5.4. The main
multi-state system MSSm should satisfy its demand, which is presented as a dis-
crete-state, continuous time Markov stochastic process Wm(t). MSSm consists of m
multi-state components. The performance level of each component i in MSSm at
any instant t > 0 is defined by its output Markov stochastic process $G_{mi}(t)$, $i = 1, \ldots, m$. All m components in the main MSS are included in the technical structure according to the given structure function $f_m$, which defines the main system output stochastic performance $G_m(t)$ over the stochastic processes of the system components:

$$G_m(t) = f_m\bigl(G_{m1}(t), \ldots, G_{mm}(t)\bigr).$$
The reserve multi-state system MSSr should also satisfy its own demand, which
can be represented as a stochastic process Wr(t). If the output performance
$G_r(t) > W_r(t)$, the abundant (surplus) performance $G_r(t) - W_r(t)$ can be delivered to the main multi-state system MSSm through the connecting system. In this case the stochastic process $G_c^{inp}(t)$ that represents an input of the connecting MSSc can be defined by the following structure function $f_c^{inp}$:

$$G_c^{inp}(t) = f_c^{inp}\bigl(G_r(t), W_r(t)\bigr) = \max\bigl(G_r(t) - W_r(t),\, 0\bigr).$$
Structure function f cinp defines the reserve system obligations concerning as-
sistance to the main system.
Fig. 5.4 General redundancy scheme for a MSS: the reserve MSS (demand $W_r(t)$), the connecting MSS carrying the reserve system obligations, and the main MSS (demand $W_m(t)$, output $G_{MSS}(t)$)
If the process Gcinp(t) is defined by the above expression, it means that the re-
serve MSSr will only send its abundant performance that remains after satisfying
its own demand to the input of the connecting MSSc. Generally speaking, stochas-
tic process Gcinp(t) and function f cinp can be defined in different ways. It will de-
pend on the reserve system obligation agreement. For example, if, according to the
The expression indicates that the reserve system, according to its obligation agreement, should send a specified performance $g_s$ to the connecting system even in the case where its own demand is not satisfied. When its demand is satisfied, the reserve system should send its abundant performance to the connecting system.
The connecting system can also be a MSS, which is designated as MSSc. It consists of c multi-state components, which are included in the technical structure with the given structure function $f_c$:

$$G_c(t) = f_c\bigl(G_{c1}(t), \ldots, G_{cc}(t)\bigr).$$
In the general case such redundancy can be reversible. In other words, the main
MSSm can also be used as a redundant system in order to support the MSSr.
The problem is to evaluate the reliability indices for the main MSSm that char-
acterize the degree of satisfying demand Wm(t), such as availability, expected in-
stantaneous performance deficiency, expected accumulated performance defi-
ciency, etc.
In this subsection, when dealing with a single multi-state element, we will omit index j in the designation of the set of the element's performance rates. This set is denoted as $\mathbf{g} = \{g_1, \ldots, g_k\}$. It is also assumed that this set is ordered so that $g_{i+1} \ge g_i$ for any i. Here we consider the general model of a repairable Markov multi-state element as it was described in Chapter 2.
The state-space diagram for the general model of the repairable multi-state element with minor and major failures and repairs is presented in Figure 5.5. Failures cause the element to transit from state j to state i (j > i) with corresponding transition intensity $\lambda_{ji}$. Repairs cause the element to transit from state e to state l (e < l) with corresponding transition intensity $\mu_{el}$.
$$\frac{dp_k(t)}{dt} = \sum_{e=1}^{k-1} \mu_{e,k}\, p_e(t) - p_k(t) \sum_{e=1}^{k-1} \lambda_{k,e},$$

$$\frac{dp_i(t)}{dt} = \sum_{e=i+1}^{k} \lambda_{e,i}\, p_e(t) + \sum_{e=1}^{i-1} \mu_{e,i}\, p_e(t) - p_i(t)\Bigl(\sum_{e=1}^{i-1} \lambda_{i,e} + \sum_{e=i+1}^{k} \mu_{i,e}\Bigr), \quad \text{for } 1 < i < k, \tag{5.21}$$

$$\frac{dp_1(t)}{dt} = \sum_{e=2}^{k} \lambda_{e,1}\, p_e(t) - p_1(t) \sum_{e=2}^{k} \mu_{1,e}.$$
Solving this system of differential equations, one can obtain the state probabilities $p_i(t)$, $i = 1, \ldots, k$, that define the probability that at instant t > 0 the element will be in state i.
Based on these probabilities and the given performance levels in every state i, one obtains a UGF corresponding to the element's output stochastic performance:

$$u(z,t) = p_1(t)\, z^{g_1} + p_2(t)\, z^{g_2} + \cdots + p_k(t)\, z^{g_k}. \tag{5.22}$$
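System (5.21) is linear and can be integrated numerically. Below is a minimal sketch for a three-state element using plain forward-Euler integration; the transition intensities, performance levels, and step size are illustrative assumptions, not values from the book:

```python
def transition_matrix(lam, mu, k):
    """Build the intensity matrix A for a k-state element.
    lam[(j, i)] are failure intensities (j > i), mu[(e, l)] repair
    intensities (e < l); diagonal entries balance each row to zero."""
    A = [[0.0] * k for _ in range(k)]
    for (src, dst), rate in list(lam.items()) + list(mu.items()):
        A[src][dst] += rate
        A[src][src] -= rate
    return A

def solve_state_probs(A, p0, t_end, dt=1e-4):
    """Forward-Euler integration of dp/dt = p A from initial probabilities p0."""
    p = list(p0)
    n = len(p)
    for _ in range(int(t_end / dt)):
        dp = [sum(p[s] * A[s][j] for s in range(n)) for j in range(n)]
        p = [p[j] + dt * dp[j] for j in range(n)]
    return p

# Three-state element (state 2 is the best), illustrative rates in year^-1
lam = {(2, 1): 7.0, (2, 0): 1.0, (1, 0): 7.0}  # minor/major failures
mu = {(0, 1): 100.0, (1, 2): 100.0}            # minor repairs
A = transition_matrix(lam, mu, 3)
p = solve_state_probs(A, [0.0, 0.0, 1.0], t_end=1.0)

# UGF terms u(z,t) = sum p_i(t) z^{g_i} with illustrative performance levels
u_terms = dict(zip([0.0, 1.8, 4.0], p))
print(sum(p))  # probabilities stay normalized (close to 1)
```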
As stated in the previous subsection, the main multi-state system MSSm consists of m multi-state elements. The performance of each element i in MSSm at any instant t > 0 is defined by its output Markov stochastic process $G_{mi}(t)$, $i = 1, \ldots, m$. For any element i in MSSm we assume that its output performance stochastic process has $k_i^{(m)}$ different states with corresponding performance levels $g_{ij}^{(m)}$ and state probabilities $p_{ij}^{(m)}(t)$, $i = 1, \ldots, m$; $j = 1, \ldots, k_i^{(m)}$.
After solving the corresponding system of differential equations (5.21) for element i, the following equation, which defines the individual UGF $u_{mi}(z,t)$ for the output stochastic performance of element i in MSSm, can be written:

$$u_{mi}(z,t) = \sum_{j=1}^{k_i^{(m)}} p_{ij}^{(m)}(t)\, z^{g_{ij}^{(m)}}, \quad i = 1, \ldots, m. \tag{5.23}$$
All m elements in the main MSS are included in the technical structure according to the given structure function $f_m$, which defines the main system output stochastic performance $G_m(t)$:

$$G_m(t) = f_m\bigl(G_{m1}(t), \ldots, G_{mm}(t)\bigr), \tag{5.24}$$

where $G_m(t)$ is the main system (MSSm) output performance stochastic process (a discrete-state, continuous-time Markov stochastic process with a finite number of different performance levels);
$$U_m(z,t) = \sum_{i=1}^{K_m} p_i^{(m)}(t)\, z^{g_i^{(m)}}. \tag{5.25}$$

The resulting UGF $U_m(z,t)$ can be obtained using the composition operator $\Omega_{f_m}$ over the individual UGFs of the elements:

$$U_m(z,t) = \sum_{i=1}^{K_m} p_i^{(m)}(t)\, z^{g_i^{(m)}} = \Omega_{f_m}\{u_{m1}(z,t), \ldots, u_{mm}(z,t)\}. \tag{5.26}$$

Taking into account expression (5.23) and using the general definition of the composition operator (4.23) from Chapter 4, one can obtain the following expression:

$$\Omega_{f_m}\{u_{m1}(z,t), \ldots, u_{mm}(z,t)\} = \sum_{j_1=1}^{k_1^{(m)}} \sum_{j_2=1}^{k_2^{(m)}} \cdots \sum_{j_m=1}^{k_m^{(m)}} \prod_{i=1}^{m} p_{i,j_i}^{(m)}(t)\, z^{f_m\bigl(g_{1,j_1}^{(m)}, \ldots, g_{m,j_m}^{(m)}\bigr)}. \tag{5.27}$$
The demand $W_m(t)$ is a discrete-state, continuous-time Markov process taking M possible levels $w_{mj}$, $j = 1, \ldots, M$; its UGF is

$$U_{W_m}(z,t) = \sum_{j=1}^{M} p_j^{(w)}(t)\, z^{w_{mj}}. \tag{5.28}$$
The UGF corresponding to the stochastic process $G_m(t) - W_m(t)$ can be written as

$$\hat{U}_m(z,t) = \sum_{i=1}^{M_m} \hat{p}_i^{(m)}(t)\, z^{\hat{g}_i^{(m)}}, \tag{5.29}$$

where $M_m$ is the number of possible performance levels for the stochastic process $G_m(t) - W_m(t)$ and $\hat{p}_i^{(m)}(t)$ is the probability that the stochastic process $G_m(t) - W_m(t)$ will be at level $\hat{g}_i^{(m)}$, $i = 1, \ldots, M_m$, at time instant t > 0.
$$\hat{U}_m(z,t) = \Omega_{f_{mw}}\bigl\{U_m(z,t),\, U_{W_m}(z,t)\bigr\} = \Omega_{f_{mw}}\Bigl\{\sum_{i=1}^{K_m} p_i^{(m)}(t)\, z^{g_i^{(m)}},\; \sum_{j=1}^{M} p_j^{(w)}(t)\, z^{w_{mj}}\Bigr\} = \sum_{i=1}^{K_m} \sum_{j=1}^{M} p_i^{(m)}(t)\, p_j^{(w)}(t)\, z^{g_i^{(m)} - w_{mj}}. \tag{5.30}$$
In the same manner, the individual UGF for the output stochastic performance of element i in the reserve MSSr is

$$u_{ri}(z,t) = \sum_{j=1}^{k_i^{(r)}} p_{ij}^{(r)}(t)\, z^{g_{ij}^{(r)}}, \quad i = 1, \ldots, r. \tag{5.31}$$
All r elements in the reserve MSS are included in the technical structure according to the given structure function $f_r$, which defines the reserve system output stochastic performance $G_r(t)$:

$$G_r(t) = f_r\bigl(G_{r1}(t), \ldots, G_{rr}(t)\bigr). \tag{5.32}$$

$$U_r(z,t) = \sum_{i=1}^{K_r} p_i^{(r)}(t)\, z^{g_i^{(r)}}. \tag{5.33}$$
The resulting UGF $U_r(z,t)$ for the reserve system output stochastic performance $G_r(t)$ can be obtained using the composition operator $\Omega_{f_r}$ over the individual UGFs representing the output performance of each component in the reserve MSS:

$$U_r(z,t) = \sum_{i=1}^{K_r} p_i^{(r)}(t)\, z^{g_i^{(r)}} = \Omega_{f_r}\{u_{r1}(z,t), \ldots, u_{rr}(z,t)\}. \tag{5.34}$$
Taking into account expression (5.31) and using the general definition of the composition operator, we obtain the following expression:

$$U_r(z,t) = \Omega_{f_r}\{u_{r1}(z,t), \ldots, u_{rr}(z,t)\} = \Omega_{f_r}\Bigl\{\sum_{j=1}^{k_1^{(r)}} p_{1j}^{(r)}(t)\, z^{g_{1j}^{(r)}}, \ldots, \sum_{j=1}^{k_r^{(r)}} p_{rj}^{(r)}(t)\, z^{g_{rj}^{(r)}}\Bigr\} = \sum_{j_1=1}^{k_1^{(r)}} \sum_{j_2=1}^{k_2^{(r)}} \cdots \sum_{j_r=1}^{k_r^{(r)}} \prod_{i=1}^{r} p_{i,j_i}^{(r)}(t)\, z^{f_r\bigl(g_{1,j_1}^{(r)}, \ldots, g_{r,j_r}^{(r)}\bigr)}. \tag{5.35}$$
The demand $W_r(t)$ of the reserve system, with N possible levels $w_{rj}$, is represented by the UGF

$$U_{W_r}(z,t) = \sum_{j=1}^{N} p_j^{(wr)}(t)\, z^{w_{rj}}. \tag{5.36}$$
The UGF corresponding to the stochastic process $G_r(t) - W_r(t)$, which has $N_r$ possible levels, is

$$\hat{U}_r(z,t) = \sum_{i=1}^{N_r} \hat{p}_i^{(r)}(t)\, z^{\hat{g}_i^{(r)}}. \tag{5.37}$$
$$\hat{U}_r(z,t) = \Omega_{f_{rw}}\bigl\{U_r(z,t),\, U_{W_r}(z,t)\bigr\} = \Omega_{f_{rw}}\Bigl\{\sum_{i=1}^{K_r} p_i^{(r)}(t)\, z^{g_i^{(r)}},\; \sum_{j=1}^{N} p_j^{(wr)}(t)\, z^{w_{rj}}\Bigr\} = \sum_{i=1}^{K_r} \sum_{j=1}^{N} p_i^{(r)}(t)\, p_j^{(wr)}(t)\, z^{g_i^{(r)} - w_{rj}}. \tag{5.38}$$
The reserve MSSr provides abundant resources (performance) to the main MSSm
only after satisfying its own demand.
Therefore, the stochastic process $G_c^{inp}(t)$ that represents an input for the connecting MSSc can be defined by the following structure function $f_c^{inp}$, which defines the reserve system obligation:

$$G_c^{inp}(t) = f_c^{inp}\bigl(G_r(t), W_r(t)\bigr) = \max\bigl(G_r(t) - W_r(t),\, 0\bigr). \tag{5.39}$$

If the process $G_c^{inp}(t)$ is defined by expression (5.18), it indicates that the reserve MSSr will only send to the input of the connecting MSSc the abundant performance that remains after satisfying its own demand. As stated in Section 5.2.1, the stochastic process $G_c^{inp}(t)$ and function $f_c^{inp}$ are defined by the reserve system obligation agreement.
Based on (5.16)–(5.18), the UGF $U_c^{inp}(z,t)$ corresponding to the Markov stochastic process $G_c^{inp}(t)$ can be obtained as
$$U_c^{inp}(z,t) = \Omega_{f_c^{inp}}\bigl\{\hat{U}_r(z,t),\, z^0\bigr\} = \Omega_{f_c^{inp}}\Bigl\{\sum_{i=1}^{N_r} \hat{p}_i^{(r)}(t)\, z^{\hat{g}_i^{(r)}},\; z^0\Bigr\} = \sum_{i=1}^{N_r} \hat{p}_i^{(r)}(t)\, z^{\max\{\hat{g}_i^{(r)},\, 0\}}. \tag{5.40}$$
In the general case the connecting system MSSc can also be a MSS. Its performance $G_c(t)$ is treated as the capability to transmit a certain performance $g_i^{(c)}$, $i = 1, \ldots, c$, from the reserve system MSSr to the main system MSSm. The corresponding UGF is

$$U_c(z,t) = \sum_{i=1}^{c} p_i^{(c)}(t)\, z^{g_i^{(c)}}. \tag{5.42}$$
The output stochastic process $G_c^{out}(t)$ of the connecting system MSSc can be obtained according to the following structure function:

$$G_c^{out}(t) = f_c^{out}\bigl(G_c(t), G_c^{inp}(t)\bigr) = \min\bigl(G_c(t),\, G_c^{inp}(t)\bigr). \tag{5.43}$$

The corresponding UGF is

$$U_c^{out}(z,t) = \sum_{k=1}^{C_{out}} p_k^{(cout)}(t)\, z^{g_k^{(cout)}} = \Omega_{f_c^{out}}\Bigl\{\sum_{i=1}^{c} p_i^{(c)}(t)\, z^{g_i^{(c)}},\; \sum_{j=1}^{N_r} \hat{p}_j^{(r)}(t)\, z^{\max\{\hat{g}_j^{(r)},\, 0\}}\Bigr\} = \sum_{i=1}^{c} \sum_{j=1}^{N_r} p_i^{(c)}(t)\, \hat{p}_j^{(r)}(t)\, z^{\min\{g_i^{(c)},\, \max[\hat{g}_j^{(r)},\, 0]\}}, \tag{5.44}$$
where $C_{out}$ is the number of output performance levels for the discrete-state, continuous-time stochastic process $G_c^{out}(t)$ and $p_k^{(cout)}(t)$ is the probability that the stochastic performance process $G_c^{out}(t)$ will be at level $g_k^{(cout)}$, $k = 1, \ldots, C_{out}$, at time instant t > 0.
The output performance stochastic process $G_{MSS}(t)$ of the entire MSS considering redundancy is defined by the following structure function $f_{MSS}$:

$$G_{MSS}(t) = f_{MSS}\bigl(G_m(t) - W_m(t),\, G_c^{out}(t)\bigr) = \bigl(G_m(t) - W_m(t)\bigr) + G_c^{out}(t). \tag{5.45}$$

The corresponding UGF is

$$U_{MSS}(z,t) = \sum_{j=1}^{M_{MSS}} p_j^{(MSS)}(t)\, z^{g_j^{(MSS)}} = \Omega_{f_{MSS}}\bigl\{\hat{U}_m(z,t),\, U_c^{out}(z,t)\bigr\} = \Omega_{f_{MSS}}\Bigl\{\sum_{i=1}^{M_m} \hat{p}_i^{(m)}(t)\, z^{\hat{g}_i^{(m)}},\; \sum_{k=1}^{C_{out}} p_k^{(cout)}(t)\, z^{g_k^{(cout)}}\Bigr\} = \sum_{i=1}^{M_m} \sum_{k=1}^{C_{out}} \hat{p}_i^{(m)}(t)\, p_k^{(cout)}(t)\, z^{\hat{g}_i^{(m)} + g_k^{(cout)}}, \tag{5.46}$$
where $M_{MSS}$ is the number of output performance levels for the discrete-state, continuous-time stochastic process $G_{MSS}(t)$ and $p_j^{(MSS)}(t)$ is the probability that the stochastic performance process $G_{MSS}(t)$ will be at level $g_j^{(MSS)}$, $j = 1, \ldots, M_{MSS}$, at time instant t > 0.
The procedure of UGF computation for the entire MSS considering redundancy is
graphically presented in Figure 5.6.
Fig. 5.6 Recursive procedure for resulting UGF computation for entire MSS with redundancy
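The recursive procedure of Figure 5.6 reduces to repeated pairwise compositions with the structure functions in (5.30), (5.38), (5.40), (5.44), and (5.46). A compact sketch at a fixed instant t; all probabilities and performance levels below are invented purely for illustration, and the operator implementation is our own:

```python
from collections import defaultdict

def compose(u1, u2, f):
    """Pairwise UGF composition: combine terms with structure function f."""
    out = defaultdict(float)
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            out[f(g1, g2)] += p1 * p2
    return dict(out)

# Illustrative UGFs at a fixed instant t (performance in MW)
U_m  = {0: 0.05, 200: 0.15, 450: 0.80}  # main system output
U_wm = {40: 0.4, 450: 0.6}              # main system demand
U_r  = {0: 0.1, 300: 0.9}               # reserve system output
U_wr = {20: 0.5, 250: 0.5}              # reserve system demand
U_c  = {0: 0.02, 300: 0.98}             # connecting (tie-line) capacity

Um_hat = compose(U_m, U_wm, lambda g, w: g - w)             # (5.30): G_m - W_m
Ur_hat = compose(U_r, U_wr, lambda g, w: g - w)             # (5.38): G_r - W_r
U_cinp = compose(Ur_hat, {0: 1.0}, lambda g, _: max(g, 0))  # (5.40): surplus only
U_cout = compose(U_c, U_cinp, min)                          # (5.44): tie-line limit
U_mss = compose(Um_hat, U_cout, lambda a, b: a + b)         # (5.46): aid added

availability = sum(p for g, p in U_mss.items() if g >= 0)   # (5.47)
deficiency = sum(p * -min(g, 0) for g, p in U_mss.items())  # (5.48)
print(round(availability, 5), round(deficiency, 2))
```

With these invented numbers the availability with reserve aid exceeds the availability computed from $\hat{U}_m$ alone, mirroring the effect reported in the case studies below.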
Based on the resulting UGF $U_{MSS}(z,t)$, the MSS reliability indices can be obtained. The MSS instantaneous availability is

$$A(t) = \sum_{i=1}^{M_{MSS}} p_i^{(MSS)}(t)\, \mathbf{1}\bigl(g_i^{(MSS)} \ge 0\bigr), \tag{5.47}$$

and the MSS instantaneous performance deficiency is

$$D(t) = -\sum_{i=1}^{M_{MSS}} p_i^{(MSS)}(t)\, \min\bigl(g_i^{(MSS)},\, 0\bigr). \tag{5.48}$$

The accumulated performance deficiency over the operation period [0, T] is

$$D = \int_0^T D(t)\, dt. \tag{5.49}$$
The coal, gas, and oil units have 10, 10, and 11 states, respectively. It is assumed that a tie line with a transmission capacity of 300 MW connects systems 2 and 1. The tie line is represented as a binary-state component, which has only two states: full transmission capacity and complete failure. The failure rate and the repair rate of the tie line are 0.477 year⁻¹ and 364 year⁻¹, respectively (Goldner 2006). The demands have two levels: the low-demand level and the peak-demand level. Demand $W_1(t)$ of system 1 is represented as a two-state, continuous-time Markov stochastic process that at any instant t > 0 takes discrete values from the set $\mathbf{w}_1 = \{w_{11}, w_{12}\}$, where $w_{11} = 40$ MW and $w_{12} = 800$ MW. The corresponding transition rates from the low-demand level to the peak-demand level and from the peak-demand level to the low-demand level are 621.96 year⁻¹ and 876 year⁻¹, respectively. Demand $W_2(t)$ of system 2 is also represented as a two-state, continuous-time Markov stochastic process that at any instant t > 0 takes discrete values from the set $\mathbf{w}_2 = \{w_{21}, w_{22}\}$, where $w_{21} = 20$ MW and $w_{22} = 450$ MW. The corresponding transition rates are the same as those of system 1.
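For a two-level Markov demand process like this, the long-run probability of each level follows directly from the two transition rates. A quick consistency check with the rates quoted above:

```python
# Two-state continuous-time Markov demand process:
# low -> peak at rate a, peak -> low at rate b (per year)
a = 621.96  # low-demand -> peak-demand transition rate, year^-1
b = 876.0   # peak-demand -> low-demand transition rate, year^-1

p_peak = a / (a + b)  # long-run probability of the peak level
p_low = b / (a + b)   # long-run probability of the low level

# The mean duration of one peak period is 1/b years = 8760/b hours
mean_peak_hours = 8760 / b  # → 10 h per peak period
print(round(p_peak, 4), round(mean_peak_hours, 1))
```

The resulting mean peak duration of 10 h plus a mean low-demand duration of roughly 14 h together approximate a daily 24-h demand cycle.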
Case 1

In the first case, generating system 2 is the main MSSm and generating system 1 is the corresponding reserve MSSr. The connecting system is represented by the tie line (Figure 5.8). First, the reserve assistance from MSSr to MSSm is not considered and MSSm satisfies its demand using only its own resources. Second, we consider that MSSr provides the reserve to MSSm if MSSr can satisfy its own demand. The instant availability, the instant expected performance deficiency, and the expected accumulated performance deficiency of system 2 without and with the reserve assistance are shown in Figures 5.8, 5.9, and 5.10, respectively. It can be observed from Figures 5.8 and 5.9 that the instant availability and the instant expected performance deficiency of system 2 reach steady values after about 400 h. From Figures 5.8–5.10 one can see that reserve assistance from MSSr to MSSm can greatly improve the MSSm reliability indices. For example, because of redundancy the system steady-state availability increases from 0.899 up to 0.972.
Fig. 5.8 Instant availability of system 2 with and without reserve assistance
Fig. 5.9 Instant expected performance deficiency of system 2 with and without reserve assistance
Fig. 5.10 Instant accumulated performance deficiency of system 2 with and without reserve assistance
Case 2

In the second case, generating system 1 is the main MSSm and generating system 2 is the corresponding reserve MSSr. System 2 provides reserve assistance to system 1 if system 2 can satisfy its own demand. The instant availability and the instant expected performance deficiency of system 1 evaluated by the proposed model are shown in Figures 5.11 and 5.12, respectively. It can be seen from these two figures that the instant availability and the instant expected performance deficiency of system 1 reach steady values after about 400 h. Figure 5.13 shows the expected accumulated performance deficiency for system 1.
Fig. 5.11 Instant availability of system 1 with reserve assistance
Fig. 5.12 Instant expected performance deficiency of system 1 with reserve assistance
As one can see, the method presented in this chapter is highly suitable for engineering applications, since the procedure is well formalized and based on the natural decomposition of the interconnected systems. By using this method the short-term and long-term performance of complex MSSs with redundancy can be accurately predicted.
Fig. 5.13 Instant accumulated performance deficiency of system 1 with reserve assistance
References
Billinton R, Allan R (1996) Reliability evaluation of power systems. Plenum, New York
Ding Y, Lisnianski A, Wang P et al (2009) Dynamic reliability assessment for bilateral contract electricity providers in restructured power systems. Electr Power Syst Res 79:1424–1430
Goldner Sh (2006) Markov model for a typical 360 MW coal fired generation unit. Commun Depend Qual Manag 9(1):24–29
Huang J, Zuo M, Fang Z (2003) Multi-state consecutive k-out-of-n systems. IIE Trans 35:527–534
Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Lisnianski A (2004a) Universal generating function technique and random process methods for multi-state system reliability analysis. In: Proceedings of the 2nd International Workshop in Applied Probability (IWAP2004), Piraeus, Greece, pp 237–242
Lisnianski A (2004b) Combined universal generating function and semi-Markov process technique for multi-state system reliability evaluation. In: Communication of the 4th International Conference on Mathematical Methods in Reliability, Methodology and Practice (MMR2004), 21–25 June 2004, Santa Fe, New Mexico
Lisnianski A (2007) Extended block diagram method for a multi-state system reliability assessment. Reliab Eng Syst Saf 92(12):1601–1607
Lisnianski A, Ding Y (2009) Redundancy analysis for repairable multi-state system by using combined stochastic process methods and universal generating function technique. Reliab Eng Syst Saf 94:1788–1795
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Modarres M, Kaminskiy M, Krivtsov V (1999) Reliability engineering and risk analysis: a practical guide. Dekker, New York
Tian Z, Zuo M, Huang H (2008) Reliability-redundancy allocation for multi-state series-parallel systems. IEEE Trans Reliab 57(2):303–310
Yeh W (2006) The k-out-of-n acyclic multistate-node network reliability evaluation using the universal generating function method. Reliab Eng Syst Saf 91:800–808
6 Reliability-associated Cost Assessment and Management Decisions for Multi-state Systems
The life cycle cost (LCC) of a system (product) is the total cost of acquiring and utilizing the system over its entire life span. LCC includes all costs incurred from the point at which the decision is made to acquire a system, through its operational life, to the eventual disposal of the system. In other words, LCC is the total cost of procurement and ownership. As has been shown in many studies, the ownership cost (logistics and operating cost) for repairable systems can vary from 10 to 100 times the procurement cost (Ryan 1978). The history of life cycle costing began in the mid-1960s, when a document entitled Life Cycle Costing in Equipment Procurement was published (Logistics Management Institute 1965). In 1974, Florida became the first US state to formally adopt the concept of life cycle costing, and in 1978, the US Congress passed the National Energy Conservation Policy Act (Dhillon 2000). According to this act, every new federal government building should be LCC effective. Since then, numerous works have been published in this field. A variety of approaches have been suggested for estimating the cost elements and providing inputs for establishing an LCC model for binary-state systems. The total LCC model is thus composed of subsets of cost models that are then exercised during trade-off studies. These cost models range from simple informal engineering/cost relationships to complex mathematical statements derived from empirical data. Some of these cost models were extended from binary-state models to multi-state models and will be considered in this chapter.
As is known, the total LCC is expressed in simple mathematical terms as the sum of the acquisition cost and the system utilization cost:

$$LCC = AC + SUC,$$

where LCC is the life cycle cost, AC is the acquisition cost, and SUC is the system utilization cost.
Figure 6.1 identifies the more significant cost types and shows how LCC may be distributed in terms of major cost categories over a system's life cycle (MIL-HDBK-338B).
In general, design and development costs include materials, labor, administra-
tive, overhead, handling, and transportation.
Production costs include all types of costs associated with system production.
Operation and support costs include spare parts and replacements, equipment maintenance, inventory management, support equipment, personnel training, technical data/documentation, and logistics management. In addition, there are financial losses when a system interrupts its work because of failures.
Disposal costs include all costs associated with deactivating and preparing the system for disposal through scrap or salvage programs. Disposal costs may be adjusted by the amount of value received when the disposal process is through salvage.
LCC analysis provides a meaningful basis for evaluating alternatives regarding
system acquisition and operation and support costs. Based on this analysis, devel-
opment and production goals can be established as well as an optimum required
reliability level. Figure 6.2 illustrates the relationships between reliability and cost
(MIL-HDBK-338B). The top curve is the total LCC; it is the sum of the acquisition (investment) and operation and support costs. The figure shows that in general a more reliable system has lower support costs. At the same time, acquisition costs (both development and production) increase to attain the improved reliability. In this figure one can see the point where the amount of money (investment) spent on increasing reliability and the amount saved in support costs are exactly the same. This point represents the reliability for which the total cost is minimal.
The implementation of an effective program based on proven LCC principles,
complete with mathematical models and supporting input cost data, will provide
early cost visibility and control, i.e., indicate the logistics and support cost conse-
quences of early research, development, and other subsequent acquisition deci-
sions.
There are many known advantages of the LCC approach such as making effec-
tive equipment replacement decisions, comparing the cost of competing projects
and making a selection among the competing contractors, etc. On the other hand,
providing correct LCC analysis for a real system is not a simple job. It requires
very high professional skills, first of all because of the absence of general models
recommended for LCC analysis in standards. Theoretically there are many meth-
ods (a variety of approaches as formulated in MILHDBK338B), but in prac-
tice, there is nothing for immediate use. Because of this reason LCC analysis till
now has been a bad-formalized problem and its solution is also expensive and
time consuming. Almost for any practical case it is required to provide special re-
search work.
To perform LCC analysis, the steps shown in Figure 6.3 should be executed (Dhillon 2000). Usually step 2, where all involved costs should be estimated, requires the greatest amount of time and resources. The major component of a repairable system's life cycle is its operation and support phase. For the majority of repairable MSSs the reliability-associated cost (RAC) is strongly tied to the operation and support cost, and RAC is usually the main component of LCC.
To estimate RAC correctly, a variety of special models should be developed and analyzed. There are models for inventory (spare parts) management; complex reliability models that take into account all types of redundancy, different operation modes, different failure modes, etc.; models for estimating losses caused by system failures (for example, financial losses due to the interruption of the power supply to consumers); and so on. Developing such models even for a binary-state system requires high-level professional skills.
A crucial factor for successful LCC analysis is the attitude and thinking philosophy of top-level management toward reliability (Dhillon 2000). Without the support of top management for reliability and maintainability programs, LCC analysis will not be effective. If a positive and effective attitude is generated in top-level management, then appropriate reliability and maintainability programs can be successful. Such an attitude can be created only on the basis of corresponding education in the field of reliability engineering.
Below we shall demonstrate the methods for RAC assessment and optimization
in order to emphasize their importance for management decisions in MSSs. Addi-
tional examples can be found in Lisnianski and Levitin (2003) and in Levitin
(2005).
In many practical cases, the reliability engineer has to choose the best solution out
of a number of alternatives. If this number is large, the decision should be based
on an optimization approach. If it is relatively small, the decision may be based on
a comparison analysis.
Usually it is not enough to only assess MSS reliability in order to compare the
existing alternatives. For example, in order to make a decision about system reli-
ability improvement, both the benefits from the system reliability improvement
and the investment costs associated with this improvement should be taken into
account.
In this section we consider a comparison analysis based on cost-type criteria.
Suppose that several different alternatives should be compared. The economic
losses caused by system failures (spare parts cost, payment for repairing team, fi-
nancial losses due to system staying in unacceptable states, etc.) in many cases can
be estimated based on a Markov reward model (see examples in Section 2.4). One
can find the total expected economic losses V j (t ) or RAC during the system use
time t for each alternative j.
If during each year i from the beginning of the system's use the economic losses of alternative j are $V_{ji}$, the total RAC during the entire period of system use (m years), expressed in present values, can be obtained as

$$RAC = V_j^* = \sum_{i=1}^{m} \frac{V_{ji}}{(1 + IR)^i}, \tag{6.1}$$

where IR is the interest rate.
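Equation (6.1) is an ordinary present-value sum, so comparing alternatives is straightforward once the yearly losses $V_{ji}$ are estimated. A small sketch comparing two hypothetical maintenance alternatives; all cost figures and the interest rate are invented for illustration:

```python
def rac_present_value(annual_losses, interest_rate):
    """Total RAC over the use period, discounted to present value per (6.1)."""
    return sum(v / (1.0 + interest_rate) ** i
               for i, v in enumerate(annual_losses, start=1))

# Hypothetical yearly economic losses for two maintenance alternatives
losses_a = [120, 125, 130, 140, 150]  # cheaper contract, higher losses
losses_b = [90, 92, 95, 97, 100]      # pricier contract, lower losses

ir = 0.05  # assumed annual interest rate
rac_a = rac_present_value(losses_a, ir)
rac_b = rac_present_value(losses_b, ir)

# Under the cost-type criterion, with other costs equal, the alternative
# with the lower discounted RAC is preferred
best = "B" if rac_b < rac_a else "A"
print(round(rac_a, 1), round(rac_b, 1), best)
```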
According to the cost-type criterion, the best alternative is the one that maximizes the net present value of the profit.
Consider the air conditioning system used in one Israeli hospital (Lisnianski et al. 2008). The system consists of two main online air conditioners and one air conditioner in cold reserve. The reserve conditioner begins to work only when one of the main conditioners has failed. The MSS performance is determined by the number of air conditioners working online: $G(t) \in \{0, 1, 2\}$. The air conditioner failure rates are $\lambda = 3$ year⁻¹ for a main conditioner and $\lambda^* = 10$ year⁻¹ for the conditioner in cold reserve ($\lambda^* > \lambda$, because the reserve conditioner is usually a second-hand device). The repair rates for the main and reserve conditioners are the same, $\mu = \mu^* = 100$ year⁻¹. Demand is a discrete-state, continuous-time Markov process W(t) with two levels during a daily 24-h period: peak $w_{peak}$ and low $w_{low}$. The mean duration of the peak demand period is $T_d = 7$ h. The mean duration of the low demand period is $T_N = 24 - T_d = 17$ h. In order to satisfy peak demand two air conditioners have to work together, so $w_{peak} = 2$, and in order to satisfy low demand only one air conditioner needs to work, so $w_{low} = 1$. MSS states where performance G(t) is greater than or equal to demand W(t) are defined as acceptable states. States where $G(t) - W(t) < 0$ are defined as unacceptable states, and entrance into one of these states is treated as a MSS failure.
To arrange a maintenance contract, the system owner can choose a maintenance company for repairing the air conditioners from a list of companies. Maintenance companies offer different mean repair times, ranging from 0.7 to 7.3 days. Naturally, a contract that provides a lower mean time to repair (MTTR) is more expensive. So, on the one hand, the owner is interested in a less expensive contract or, in other words, in a contract with maximal repair time. On the other hand, the repair time should meet specified reliability requirements. Below we consider three different cases of given reliability requirements:
Case 1A. The annual average availability of the MSS should not be lower than
0.999 and the mean total number of system failures during 1 year should not be
greater than one.
Case 1B. The mean time up to the first system failure during 1 year should be
greater than or equal to 0.90 years.
244 6 Reliability-associated Cost Assessment and Management Decisions for MSS
Case 1C. The probability of MSS failure-free operation during 1 year should be
greater than or equal to 0.90.
The problem is to find the maximal MTTR that meets the reliability requirements in
these three cases.
Case 1A The state-transition diagram for the MSS is presented in Figure 6.4.
Fig. 6.4 State-transitions diagram for MSS with two online conditioners and one conditioner in
cold reserve [Unacceptable states are grey]
This diagram was built in accordance with the algorithm from Section 2.4.2.2
for the combined performance-demand model.
There are 12 states. States 1 to 6 are associated with the peak demand period;
states 7 to 12 are associated with the low demand period.
In states 6 and 12, both main air conditioners are online and the reserve air
conditioner is available. The system performance is g_6 = g_12 = 2.
6.2 Reliability-associated Cost and Practical Cost-reliability Analysis 245
In states 5 and 11, one of the main air conditioners has failed and been replaced
by the reserve air conditioner. The system performance is g_5 = g_11 = 2.
In states 4 and 10, the second main air conditioner has also failed, and only the
reserve air conditioner is online. The system performance is g_4 = g_10 = 1.
In states 3 and 9, the reserve air conditioner has failed, and only one main air
conditioner is online. The system performance is g_3 = g_9 = 1.
In states 2 and 8, the reserve air conditioner has failed, and two main air
conditioners are online. The system performance is g_2 = g_8 = 2.
In states 1 and 7, the system suffers total failure. The system performance is
g_1 = g_7 = 0.
If in the peak demand period the required demand level is w_peak = 2 and in the
low demand period the required demand level is w_low = 1, then there are 8 acceptable
states: 12, 11, 10, 9, 8, 6, 5, and 2. States 7, 4, 3, and 1 are unacceptable. System
entrance into any of the unacceptable states is treated as a failure.
The transitions from state 6 to state 5, from state 2 to state 3, from state 12 to
state 11, and from state 8 to state 9 are associated with the failure of one of the
main air conditioners and have an intensity of 2λ. (This is so because either one of
the two online main conditioners can fail.) The transitions from state 5 to state 4,
from state 3 to state 1, from state 11 to state 10, and from state 9 to state 7 are
associated with failure of the second main air conditioner and have an intensity of λ.
The transitions from state 5 to state 3, from state 4 to state 1, from state 11 to
state 9, and from state 10 to state 7 are associated with failure of the reserve air
conditioner and have an intensity of λ*.
The transitions from state 4 to state 5, from state 1 to state 3, from state 10 to
state 11, and from state 7 to state 9 are associated with repair of one of the main
air conditioners and have an intensity of 2μ. The transitions from state 5 to state 6,
from state 3 to state 2, from state 11 to state 12, and from state 9 to state 8 are
associated with repair of the main air conditioner and have an intensity of μ. The
transitions from state 3 to state 5, from state 2 to state 6, from state 1 to state 4,
from state 9 to state 11, from state 8 to state 12, and from state 7 to state 10 are
associated with repair of the reserve air conditioner and have an intensity of μ*.
The transitions from state 6 to state 12, from state 5 to state 11, from state 4 to
state 10, from state 3 to state 9, from state 2 to state 8, and from state 1 to state 7
are associated with the variable demand and have an intensity of λ_d = 1/T_d. The
transitions from state 12 to state 6, from state 11 to state 5, from state 10 to state 4,
from state 9 to state 3, from state 8 to state 2, and from state 7 to state 1 are also
associated with the variable demand and have an intensity of λ_N = 1/T_N = 1/(24 − T_d).
We have now defined all transition intensities for the diagram
presented in Figure 6.4 and, therefore, determined the matrix of transition
intensities (6.5) for the corresponding Markov model.
For simplification we use in (6.5) the following designations:
    C_1 = 2μ + μ* + λ_d,      C_5 = λ + λ* + μ + λ_d,   C_9 = λ + μ + μ* + λ_N,
    C_2 = 2λ + μ* + λ_d,      C_6 = 2λ + λ_d,           C_10 = λ* + 2μ + λ_N,     (6.4)
    C_3 = λ + μ + μ* + λ_d,   C_7 = 2μ + μ* + λ_N,      C_11 = λ + λ* + μ + λ_N,
    C_4 = λ* + 2μ + λ_d,      C_8 = 2λ + μ* + λ_N,      C_12 = 2λ + λ_N.
a =
| −C_1   0      2μ     μ*     0      0      λ_d    0      0      0      0      0     |
| 0      −C_2   2λ     0      0      μ*     0      λ_d    0      0      0      0     |
| λ      μ      −C_3   0      μ*     0      0      0      λ_d    0      0      0     |
| λ*     0      0      −C_4   2μ     0      0      0      0      λ_d    0      0     |
| 0      0      λ*     λ      −C_5   μ      0      0      0      0      λ_d    0     |
| 0      0      0      0      2λ     −C_6   0      0      0      0      0      λ_d   |
| λ_N    0      0      0      0      0      −C_7   0      2μ     μ*     0      0     |   (6.5)
| 0      λ_N    0      0      0      0      0      −C_8   2λ     0      0      μ*    |
| 0      0      λ_N    0      0      0      λ      μ      −C_9   0      μ*     0     |
| 0      0      0      λ_N    0      0      λ*     0      0      −C_10  2μ     0     |
| 0      0      0      0      λ_N    0      0      0      μ*     λ      −C_11  μ     |
| 0      0      0      0      0      λ_N    0      0      0      0      2λ     −C_12 |
In order to find the MSS annual average availability A(t)|_{t = 1 year}, we should
present the reward matrix r_A in the following form (see Section 2.4.2.3 for the
determination of rewards):
r_A = [r_ij] =
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 1 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 0 0 0 0 |
| 0 0 0 0 0 1 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |   (6.6)
| 0 0 0 0 0 0 0 1 0 0 0 0 |
| 0 0 0 0 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 0 0 0 1 0 0 |
| 0 0 0 0 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 0 0 0 0 0 1 |
In this matrix, rewards associated with all acceptable states are defined as 1 and
rewards associated with all unacceptable states are zeroed, as are all rewards
associated with all transitions.
The following system of differential equations (6.7) can be written in order to
find the expected total rewards V_i(t), i = 1,…,12. The initial conditions are
V_i(0) = 0, i = 1,…,12.

    dV_1(t)/dt = −C_1V_1(t) + 2μV_3(t) + μ*V_4(t) + λ_dV_7(t),
    dV_2(t)/dt = 1 − C_2V_2(t) + 2λV_3(t) + μ*V_6(t) + λ_dV_8(t),
    dV_3(t)/dt = λV_1(t) + μV_2(t) − C_3V_3(t) + μ*V_5(t) + λ_dV_9(t),
    dV_4(t)/dt = λ*V_1(t) − C_4V_4(t) + 2μV_5(t) + λ_dV_10(t),
    dV_5(t)/dt = 1 + λ*V_3(t) + λV_4(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 1 + 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_7(t)/dt = λ_NV_1(t) − C_7V_7(t) + 2μV_9(t) + μ*V_10(t),                    (6.7)
    dV_8(t)/dt = 1 + λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),
    dV_9(t)/dt = 1 + λ_NV_3(t) + λV_7(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = 1 + λ_NV_4(t) + λ*V_7(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = 1 + λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = 1 + λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
After solving system (6.7) and finding V_i(t), the MSS annual average availability
can be obtained as A(t) = V_6(t)/t, where t = 1 year (Section 2.4.2.3).
Here the sixth state is the best state and is assumed to be the initial state, in
which the MSS was at instant t = 0.
The results of calculation are presented in Figures 6.5 and 6.6.
In Appendix C, Section 5.1 one can find MATLAB code for MSS average
availability calculations.
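The book's implementation is the MATLAB code in Appendix C; the following self-contained Python sketch (ours, with all names our own) performs the same computation by integrating system (6.7) with an explicit Euler scheme for MTTR = 3.65 d (μ = μ* = 100 year⁻¹):

```python
# Euler integration of the Markov reward equations (6.7) for the 12-state
# air-conditioning model. All rates per year; MTTR = 3.65 d <=> mu = 100.
lam, lam_s = 3.0, 10.0                # lambda (main), lambda* (reserve)
mu = mu_s = 100.0                     # mu, mu*
lam_d, lam_N = 8760 / 7, 8760 / 17    # demand-change intensities lambda_d, lambda_N

# transitions[i] = [(j, a_ij), ...], the intensities of Figure 6.4
transitions = {
    1:  [(3, 2 * mu), (4, mu_s), (7, lam_d)],
    2:  [(3, 2 * lam), (6, mu_s), (8, lam_d)],
    3:  [(1, lam), (2, mu), (5, mu_s), (9, lam_d)],
    4:  [(1, lam_s), (5, 2 * mu), (10, lam_d)],
    5:  [(3, lam_s), (4, lam), (6, mu), (11, lam_d)],
    6:  [(5, 2 * lam), (12, lam_d)],
    7:  [(1, lam_N), (9, 2 * mu), (10, mu_s)],
    8:  [(2, lam_N), (9, 2 * lam), (12, mu_s)],
    9:  [(3, lam_N), (7, lam), (8, mu), (11, mu_s)],
    10: [(4, lam_N), (7, lam_s), (11, 2 * mu)],
    11: [(5, lam_N), (9, mu_s), (10, lam), (12, mu)],
    12: [(6, lam_N), (11, 2 * lam)],
}
acceptable = {2, 5, 6, 8, 9, 10, 11, 12}   # reward r_ii = 1 in these states

def expected_rewards(state_r, trans_r, t_end=1.0, dt=5e-5):
    """dVi/dt = r_ii + sum_j a_ij * (r_ij + Vj - Vi), explicit Euler steps."""
    V = {i: 0.0 for i in transitions}
    for _ in range(round(t_end / dt)):
        dV = {i: state_r.get(i, 0.0)
                 + sum(a * (trans_r.get((i, j), 0.0) + V[j] - V[i])
                       for j, a in transitions[i])
              for i in transitions}
        for i in V:
            V[i] += dt * dV[i]
    return V

V = expected_rewards({i: 1.0 for i in acceptable}, {})
A = V[6] / 1.0    # A(t) = V6(t)/t with t = 1 year, initial (best) state 6
print(f"annual average availability: {A:.4f}")
```

The explicit scheme is stable here because dt·max_i C_i is well below 1; a stiff ODE solver would allow much larger steps.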
As one can see from the curve in Figure 6.5, the MSS average availability
(calculated for MTTR = 3.65 d) becomes constant after 1 year, and its constant
value is lower than the required value of 0.999. This means that MTTR = 3.65 d is
not appropriate for the system owner.
Fig. 6.5 The MSS average availability as a function of time (MTTR = 3.65 d)
In Figure 6.6 the constant values (stationary values after 1 year) of the MSS
average availability were calculated for MTTR ranging from 0.7 up to 7.3 d. From
this figure one can conclude that the system can provide the required average
availability level (0.999 or greater) if the MTTR is less than or equal to 3.2 d
(μ ≥ 0.3125 d⁻¹).
The curve in Figure 6.6 supports the engineering decision making and determines
the area where the first reliability requirement of case 1A for the air
conditioning system can be met. As follows from Figure 6.6, in order to provide
the required average availability level of 0.999 or greater, the MTTR should be
less than or equal to 3.2 d. Thus, one obtains the maximal MTTR that meets the
first reliability requirement for case 1A: MTTR_AV = 3.2 d.
The second reliability requirement in case 1A concerns the mean total number
of system failures during one year. This number N_f(t) should not be greater
than 1 for t = 1 year. This requirement can be written as

    N_f(t)|_{t = 1 year} ≤ 1.
Fig. 6.6 The MSS annual average availability depending on mean time to repair
In order to find the mean total number of system failures N_f(t), we should (in
accordance with Section 2.4.2.3) represent the reward matrix r_N in the following
form (6.8):
r_N = [r_ij] =
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 1 1 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |   (6.8)
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 1 0 0 0 0 0 |
| 0 0 0 1 0 0 1 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
In this matrix the rewards associated with each transition from the set of ac-
ceptable states to the set of unacceptable states should be defined as 1. All other
rewards should be zeroed.
Now the system of differential equations (6.9) can be written in order to find
the expected total rewards V_i(t), i = 1,…,12. Here C_1,…,C_12 are calculated via
expressions (6.4). The initial conditions are V_i(0) = 0, i = 1,…,12.
    dV_1(t)/dt = −C_1V_1(t) + 2μV_3(t) + μ*V_4(t) + λ_dV_7(t),
    dV_2(t)/dt = 2λ − C_2V_2(t) + 2λV_3(t) + μ*V_6(t) + λ_dV_8(t),
    dV_3(t)/dt = λV_1(t) + μV_2(t) − C_3V_3(t) + μ*V_5(t) + λ_dV_9(t),
    dV_4(t)/dt = λ*V_1(t) − C_4V_4(t) + 2μV_5(t) + λ_dV_10(t),
    dV_5(t)/dt = λ + λ* + λ*V_3(t) + λV_4(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_7(t)/dt = λ_NV_1(t) − C_7V_7(t) + 2μV_9(t) + μ*V_10(t),                    (6.9)
    dV_8(t)/dt = λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),
    dV_9(t)/dt = λ + λ_N + λ_NV_3(t) + λV_7(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = λ* + λ_N + λ_NV_4(t) + λ*V_7(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
After solving this system and finding V_i(t), the mean total number of system
failures N_f(t) can be obtained as N_f(t) = V_6(t), where the sixth state is
the best state and is assumed to be the initial state.
The results of calculation are presented in Figures 6.7 and 6.8.
In Appendix C, Section 5.2 one can find MATLAB code for mean number of
system failures calculations.
As one can see from the curve in Figure 6.7, the mean number of MSS failures
(calculated for an MTTR of 3.65 d) will be 1.5 after 1 year, and so it
will be greater than the required value of 1. Therefore, MTTR = 3.65 d is not
appropriate for the system owner.
In Figure 6.8, N_f(t) for t = 1 year was calculated for MTTR ranging from 0.7
up to 7.3 d. From this figure one can conclude that the system can provide the
required value N_f(t) ≤ 1 for t = 1 year if the MTTR is less than or equal to
MTTR_N = 2.8 d.
Fig. 6.7 Mean number of system failures as a function of time (MTTR = 3.65 d)
Fig. 6.8 Mean number of system failures depending on MTTR during 1 year
Case 1B In this case the mean time up to the first system failure (or MTTF)
should be greater than or equal to 0.90 years. In order to calculate the MTTF, the
initial model presented in Figure 6.4 should be transformed. In accordance with
the method of Chapter 2, all transitions that return the MSS from unacceptable
states should be forbidden and all unacceptable states should be united into one
absorbing state. The transformed model is shown in Figure 6.9. Here all
unacceptable states are united into the single absorbing state 0.
Fig. 6.9 Transformed state-transition diagram with absorbing state for MTTF computation [Un-
acceptable state is grey]
For the transformed model the coefficients C_i are

    C_2 = 2λ + μ* + λ_d,      C_8 = 2λ + μ* + λ_N,    C_11 = λ + λ* + μ + λ_N,
    C_5 = λ + λ* + μ + λ_d,   C_9 = λ + μ + μ* + λ_N, C_12 = 2λ + λ_N.          (6.10)
    C_6 = 2λ + λ_d,           C_10 = λ* + 2μ + λ_N,
In order to assess the MTTF for a MSS, the rewards in matrix r for the transformed
model should be determined in the following manner: the rewards associated with
all acceptable states should be defined as 1, and the rewards associated with the
unacceptable (absorbing) state should be zeroed, as should all rewards associated
with transitions.
The transition intensity matrix of the transformed model, with the states ordered
(0, 2, 5, 6, 8, 9, 10, 11, 12), is

a =
| 0        0      0      0      0      0      0      0      0     |
| 2λ       −C_2   0      μ*     λ_d    0      0      0      0     |
| λ + λ*   0      −C_5   μ      0      0      0      λ_d    0     |
| 0        0      2λ     −C_6   0      0      0      0      λ_d   |
| 0        λ_N    0      0      −C_8   2λ     0      0      μ*    |   (6.11)
| λ + λ_N  0      0      0      μ      −C_9   0      μ*     0     |
| λ* + λ_N 0      0      0      0      0      −C_10  2μ     0     |
| 0        0      λ_N    0      0      μ*     λ      −C_11  μ     |
| 0        0      0      λ_N    0      0      0      2λ     −C_12 |
The reward matrix for the system with two online conditioners and one in cold
reserve is as follows:
r = [r_ij] =
| 0 0 0 0 0 0 0 0 0 |
| 0 1 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 0 0 0 |
| 0 0 0 1 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 0 |   (6.12)
| 0 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 1 0 0 |
| 0 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 0 0 1 |
In order to find the expected total rewards V_i(t), i = 0, 2, 5, 6, 8, 9, 10, 11, 12,
the following system of differential equations can be written (the initial
conditions are V_i(0) = 0):

    dV_0(t)/dt = 0,
    dV_2(t)/dt = 1 + 2λV_0(t) − C_2V_2(t) + μ*V_6(t) + λ_dV_8(t),
    dV_5(t)/dt = 1 + (λ + λ*)V_0(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 1 + 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_8(t)/dt = 1 + λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),               (6.13)
    dV_9(t)/dt = 1 + (λ + λ_N)V_0(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = 1 + (λ* + λ_N)V_0(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = 1 + λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = 1 + λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
Fig. 6.10 Mean time to system failure as a function of time (MTTR = 3.65 d)
In Appendix C, Section 5.3 one can find MATLAB code for mean time to
system failure calculations.
As one can see from the curve in Figure 6.10, the mean time to system failure
(calculated for a mean time to repair of 3.65 d) will be 0.78 years after 1 year,
and so it will be less than the required value of 0.90 years. Therefore,
MTTR = 3.65 d is not appropriate for the system owner.
Fig. 6.11 Mean time to system failure depending on MTTR
In Figure 6.11 the MTTF during 1 year was calculated for mean times to repair
ranging from 0.7 d to 7.3 d. From this figure one can conclude that the system can
provide the required value MTTF ≥ 0.90 years for t = 1 year if the MTTR is less
than or equal to 1.65 d.
Therefore, the maximal MTTR for case 1B is 1.65 d.
Case 1C In order to solve the problem in case 1C one should find the MSS reli-
ability function R(t), which defines the probability of failure-free operation during
the period [ 0, t ] .
To calculate the system reliability function R(t), the model presented in Figure
6.9 is used. As was described in the previous case, in this model all unacceptable
states are treated as one absorbing state and all transitions that return the MSS
from unacceptable states are forbidden. But the rewards in this case should be
defined in another way. As was described in Section 2.4.2.3, all rewards
associated with transitions to the absorbing state should be defined as 1. All
other rewards should be zeroed.
Therefore, one obtains the following reward matrix for the MSS in this case:
r = [r_ij] =
| 0 0 0 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |   (6.14)
| 1 0 0 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |
The mean accumulated reward V_i(t) then defines the probability Q(t) of MSS
failure during the time interval [0, t].
The following system of differential equations (6.15) can be written in order to
find the expected total rewards V_i(t), i = 0, 2, 5, 6, 8, 9, 10, 11, 12.
The coefficients C_i, i = 2, 5, 6, 8, 9, 10, 11, 12, are calculated via formulas (6.10).
The initial conditions are V_i(0) = 0, i = 0, 2, 5, 6, 8, 9, 10, 11, 12.
    dV_0(t)/dt = 0,
    dV_2(t)/dt = 2λ + 2λV_0(t) − C_2V_2(t) + μ*V_6(t) + λ_dV_8(t),
    dV_5(t)/dt = λ + λ* + (λ + λ*)V_0(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_8(t)/dt = λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),                   (6.15)
    dV_9(t)/dt = λ + λ_N + (λ + λ_N)V_0(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = λ* + λ_N + (λ* + λ_N)V_0(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
After solving this system and finding V_i(t), the MSS reliability function
can be obtained as R(t) = 1 − V_6(t), where the sixth state is the best state and
is assumed to be the initial state.
The results of the calculation are presented in Figure 6.12.
Fig. 6.12 Probability of failure-free operation during 1 year as a function of MTTR
In Appendix C, Section 5.4 one can find MATLAB code for the probability of
failure-free operation calculations.
As one can see from Figure 6.12, in order to meet the reliability requirement
for case 1C, R(t)|_{t = 1 year} ≥ 0.90, the maximal MTTR is 0.88 d;
therefore, the MTTR should be less than or equal to 0.88 d.
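Cases 1B and 1C both rest on the transformed model of Figure 6.9; the sketch below (again ours, an Euler integration under the same assumptions as before, for MTTR = 3.65 d) computes the MTTF via unit rewards in the acceptable states, and the failure probability Q(1 year) via unit rewards on the transitions into the absorbing state 0:

```python
# Transformed 9-state model with absorbing state 0 (Figure 6.9).
lam, lam_s, mu, mu_s = 3.0, 10.0, 100.0, 100.0   # per-year rates
lam_d, lam_N = 8760 / 7, 8760 / 17
T = {   # T[i] = [(j, a_ij), ...]; returns from the absorbing state are forbidden
    0:  [],
    2:  [(0, 2 * lam), (6, mu_s), (8, lam_d)],
    5:  [(0, lam + lam_s), (6, mu), (11, lam_d)],
    6:  [(5, 2 * lam), (12, lam_d)],
    8:  [(2, lam_N), (9, 2 * lam), (12, mu_s)],
    9:  [(0, lam + lam_N), (8, mu), (11, mu_s)],
    10: [(0, lam_s + lam_N), (11, 2 * mu)],
    11: [(5, lam_N), (9, mu_s), (10, lam), (12, mu)],
    12: [(6, lam_N), (11, 2 * lam)],
}

def integrate(state_r, trans_r, t_end, dt=5e-5):
    """Explicit Euler for dVi/dt = r_ii + sum_j a_ij * (r_ij + Vj - Vi)."""
    V = {i: 0.0 for i in T}
    for _ in range(round(t_end / dt)):
        dV = {i: state_r.get(i, 0.0)
                 + sum(a * (trans_r.get((i, j), 0.0) + V[j] - V[i])
                       for j, a in T[i])
              for i in T}
        for i in V:
            V[i] += dt * dV[i]
    return V

# MTTF: unit reward per time unit in every acceptable state; V6(t) -> MTTF
mttf = integrate({i: 1.0 for i in T if i != 0}, {}, t_end=3.0)[6]
# Q(t): unit reward on every transition into the absorbing state 0
to_abs = {(i, 0): 1.0 for i in (2, 5, 9, 10)}
Q = integrate({}, to_abs, t_end=1.0)[6]
print(f"MTTF ~ {mttf:.2f} years, R(1 year) = {1 - Q:.2f}")
```

Since the process is absorbed at most once, the accumulated transition reward V_6(t) equals the absorption probability Q(t), and R(t) = 1 − Q(t).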
6.2.2 Case Study 2: Feed Water Pumps for Power Generating Unit
Consider a subsystem of feed water pumps that supply the water to the boiler in a
coal power generating unit. From a reliability point of view, the generating unit is
a series of three interconnected subsystems: feed water pump subsystem, boiler,
and turbine generator.
The generating unit should provide a nominal generating capacity of
g_nom = 100,000 kW. If the feed water pump subsystem works with water
transmission capacity g_fw = g_basic, the entire unit is able to generate capacity
G_u = g_nom. If the capacity g_fw of the feed water pump subsystem is reduced to a
level of g_fw = k·g_basic (0.5 ≤ k < 1), the unit reduces its generating capacity to
the level k·g_nom. The coal generating unit is installed in order to satisfy the
constant demand w = g_nom.
A designer has seven different possible configurations of the feed water pump
subsystem. Each configuration can be designated as n·g_p, where n is the number of
identical pumps and g_p is the nominal capacity of each pump.
The first (basic) configuration consists of one pump that provides 100% of the
unit's capacity (n = 1 and g_p = g_basic).
The six other configurations consist of two identical pumps with different
nominal capacities.
2nd configuration: g_p = k·g_basic, k = 0.5. The entire feed water subsystem
capacity is g_fw = 2g_p = 2·0.5·g_basic = g_basic.
3rd configuration: g_p = k·g_basic, k = 0.6. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.2·g_basic.
4th configuration: g_p = k·g_basic, k = 0.7. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.4·g_basic.
5th configuration: g_p = k·g_basic, k = 0.8. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.6·g_basic.
6th configuration: g_p = k·g_basic, k = 0.9. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.8·g_basic.
7th configuration: g_p = k·g_basic, k = 1.0. The entire feed water subsystem
capacity is g_fw = 2g_p = 2.0·g_basic.
Each type of pump that can be chosen has only total failures. The failure and
repair rates are the same for all of the pumps: λ = 0.0001 h⁻¹ and μ = 0.01 h⁻¹.
In the first configuration, a pump failure causes the outage of the entire
generating unit. In this case the generating capacity of the unit is reduced to
zero: G_u = 0. In the configurations with two pumps, the failure of a single pump
causes the reduction of the unit generating capacity to G_u = k·g_nom; simultaneous
failure of the two pumps causes the outage of the generating unit (G_u = 0).
In the case of a generating unit outage or capacity reduction, the generating
capacity deficiency can be partially compensated by a spinning reserve that
usually exists in power systems. The spinning reserve provides additional
generating capacity γ·g_nom, where γ varies from 0 to 1.
The power that cannot be supplied by the power system to consumers in the case
of pump failure is therefore

    D = g_nom − G_u − γ·g_nom.   (6.16)
The value D defines the part of the power system's load that must be immediately
switched off by the load-shedding system. The power D is not supplied to the
consumers until the reserve gas turbines start up and compensate the remaining
generating deficiency. The turbine startup process takes time τ = 0.25 h. Hence,
the energy not supplied (ENS) to consumers during this time is

    ENS = D·τ.   (6.17)

During the time τ the penalty c_p = 4 $/kWh must be paid for every kilowatt-hour
of non-supplied energy.
The energy supplied by the gas turbines is more expensive than that supplied
by the coal power unit. The difference in the energy cost is c = 0.1 $/kWh.
Each configuration of the feed water pump subsystem has its own investment
cost C_inv associated with pump purchase and installation. In Table 6.1 one can see
the investment costs for each configuration as well as the increase ΔC_inv of
these costs over the cost of the basic configuration 1g_basic.
One can see that a tradeoff exists between the investment costs and the costs of
losses caused by the energy not supplied and by the use of more expensive gas
turbine energy. In order to compare the configurations, one has to evaluate the
total costs associated with each configuration in net present values.
In order to obtain the cost of losses caused by system unreliability, we use
the Markov reward model described in Section 2.4.
Consider first the configuration 1g_basic. The state-space diagram of the Markov
reward model for this configuration is presented in Figure 6.13 (a).
In state 2 the feed water pump operates, providing the desired generating unit
capacity w = g_nom. If the pump fails, the MSS transits from state 2 to state 1
(with transition intensity λ). In state 1 the gas turbines work in order to supply
the energy to consumers instead of the failed generating unit.
260 6 Reliability-associated Cost Assessment and Management Decisions for M66V
The reward r_21 corresponding to the transition from state 2 to state 1 is
defined as the penalty cost for the energy not supplied before the gas turbines
start to operate:

    r_21 = c_p·D·τ.   (6.18)
Fig. 6.13 State-transition diagram of Markov reward models: (a) for configurations with n=1 and
(b) for configurations with n=2
The reward r_11 corresponding to each time unit (hour) when the MSS is in state
1 is defined as the excessive cost of the energy supplied by the gas turbines:

    r_11 = c·D.   (6.19)
For large generating units (with capacity greater than or equal to 100,000 kW)
the cost associated with normal operation of the coal generating unit (in state 2) is
negligibly small in comparison with the cost of the alternative energy produced by
the gas turbines and with the penalty cost of unsupplied energy. Therefore, reward
r_22 can be zeroed. In the same way one can neglect the cost of pump repair and
zero the reward r_12 associated with the transition from state 1 to state 2. Hence,
the reward matrix takes the form
matrix takes the form
r11 0
r = rij = . (6.20)
r21 0
a11 a12
a = aij = = . (6.21)
a21 a22
We assume that the evolution begins from the best state 2. According to Sec-
tion 2.4, in order to obtain the total expected reward V2 ( t ) during time t under ini-
tial conditions V1 (0) = V2 (0) = 0, we must solve the following system of differen-
tial equations:
dV1 (t )
dt = r11 V1 (t ) + V2 (t ),
(6.22)
dV2 (t ) = r + V (t ) V (t ).
dt 21 1 2
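A small Python sketch (ours) of system (6.22) for the configuration 1g_basic without spinning reserve; the rewards are taken as r_11 = c·D per hour and r_21 = c_p·D·τ per failure, which is our reading of the elided reward expressions:

```python
# Two-state Markov reward model (6.22): configuration 1*g_basic, gamma = 0.
# Rates per hour, costs in dollars. The reward reading r11 = c*D and
# r21 = c_p*D*tau is our assumption for the elided expressions (6.18)-(6.19).
lam, mu = 0.0001, 0.01        # pump failure / repair rates, 1/h
g_nom = 100_000               # nominal unit capacity, kW
tau, c_p, c = 0.25, 4.0, 0.1  # startup time (h), penalty and surcharge ($/kWh)
gamma = 0.0
D = (1 - gamma) * g_nom       # power not supplied while the unit is down, kW
r11 = c * D                   # $/h surcharge while gas turbines substitute
r21 = c_p * D * tau           # $ penalty per failure (energy not supplied)

V1 = V2 = 0.0
dt, T = 0.05, 8760.0          # Euler step and horizon, hours
for _ in range(round(T / dt)):
    dV1 = r11 - mu * V1 + mu * V2
    dV2 = lam * r21 + lam * V1 - lam * V2
    V1, V2 = V1 + dt * dV1, V2 + dt * dV2
print(f"V2(1 year) = ${V2:,.0f}")
```

The result lies within a few percent of the 954 (thousand dollars) listed for the configuration 1g_basic at γ = 0 in Table 6.2; the small gap comes from the transient before the process reaches its stationary regime.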
In Table 6.2, one can see the total annual expected costs V_2(T)
(T = 1 year = 8760 h) obtained for different values of the relative capacity γ of
the spinning reserve (the costs are in thousands of dollars).
Consider now the configurations with two pumps (n = 2). The state-transition
diagram of the Markov reward model for these configurations is presented in
Figure 6.13 (b).
In state 3, both feed water pumps operate and the generating unit capacity is
G_u = g_nom. The cost of normal operation is negligibly small. Therefore, the
reward r_33 associated with this state is equal to zero.
State 2 corresponds to the case where a failure has occurred in one of the pumps
and the single remaining pump continues to work. The subsystem transits from state
3 to state 2 with intensity 2λ, because the failure can occur in either of the two
pumps. The unit generating capacity in state 2 decreases and becomes equal to the
capacity provided by the single pump: G_u = k·g_nom. From the corresponding
capacity deficiency one obtains the power not supplied to consumers in state 2,
the energy not supplied before the startup of the gas turbines, and, therefore,
the reward r_32 associated with the transition from state 3 to state 2, as well as
the reward r_22 corresponding to each time unit (hour) that the MSS spends in
state 2 (the excessive cost of the energy supplied by the gas turbines).
The subsystem can return from state 2 to state 3 after repair, with intensity
μ. The reward r_23 associated with this transition is assumed to be negligible:

    r_23 = 0.   (6.27)
If in state 2 the failure of the second pump occurs before the completion of
the repair of the failed pump, the subsystem transits from state 2 to state 1. The
intensity of this transition is λ. The capacity of the generating unit in state 1
is G_u = 0, and the power not supplied to the consumers increases accordingly.
The reward r_21 corresponding to the transition from state 2 to state 1 is
defined as the penalty cost for energy not supplied before the gas turbines start
to operate, and the reward r_11 corresponding to each time unit (hour) that the
MSS is in state 1 is defined as the excessive cost of the energy supplied by the
gas turbines. As before, the repair reward is neglected:

    r_12 = 0.   (6.31)
Hence, the reward matrix takes the form

r = [r_ij] = | r_11  0     0 |
             | r_21  r_22  0 |   (6.32)
             | 0     r_32  0 |

and the subsystem transition intensity matrix corresponding to the state-space
diagram presented in Figure 6.13 (b) takes the form

a = [a_ij] = | −2μ   2μ        0   |
             | λ     −(λ + μ)  μ   |.   (6.33)
             | 0     2λ        −2λ |
As in the case with n = 1 we assume that the evolution begins from the best
state (state 3). In order to obtain the total expected reward V3(t) under initial con-
ditions V1 (0) = V2 (0) = V3 (0) = 0, we must solve the following system of differen-
tial equations:
    dV_1(t)/dt = r_11 − 2μV_1(t) + 2μV_2(t),
    dV_2(t)/dt = r_22 + λ·r_21 + λV_1(t) − (λ + μ)V_2(t) + μV_3(t),   (6.34)
    dV_3(t)/dt = 2λ·r_32 + 2λV_2(t) − 2λV_3(t).
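The same treatment applies to a two-pump configuration; the sketch below (ours, with the same assumed reward reading as before, and with deficiencies clipped at zero when the spinning reserve covers the shortfall) integrates (6.34) for 2·0.8g_basic at γ = 0.2:

```python
# Three-state Markov reward model (6.34) for the configuration 2*0.8g_basic
# with spinning reserve gamma = 0.2 (our parameter reading).
lam, mu = 0.0001, 0.01
g_nom, tau, c_p, c = 100_000, 0.25, 4.0, 0.1
k, gamma = 0.8, 0.2
D2 = max(0.0, (1 - k - gamma) * g_nom)  # deficiency with one pump down, kW
D1 = max(0.0, (1 - gamma) * g_nom)      # deficiency with both pumps down, kW
r22, r32 = c * D2, c_p * D2 * tau
r11, r21 = c * D1, c_p * D1 * tau

V1 = V2 = V3 = 0.0
dt, T = 0.05, 8760.0                    # Euler step and horizon, hours
for _ in range(round(T / dt)):
    dV1 = r11 - 2 * mu * V1 + 2 * mu * V2
    dV2 = r22 + lam * r21 + lam * V1 - (lam + mu) * V2 + mu * V3
    dV3 = 2 * lam * r32 + 2 * lam * V2 - 2 * lam * V3
    V1, V2, V3 = V1 + dt * dV1, V2 + dt * dV2, V3 + dt * dV3
print(f"V3(1 year) = ${V3:,.0f}")
```

With k = 0.8 the one-pump-down state carries no losses at γ = 0.2, so only the rare two-pump outages contribute; the order of magnitude agrees with the 8.11 entry in Table 6.2.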
In Table 6.2, one can see the total annual expected rewards V_3(T) (in thousands
of dollars) obtained for different values of the relative capacity γ of the
spinning reserve for all the considered configurations 2·k·g_basic. The total
annual expected reward for the configuration 1g_basic is also given for comparison.
Table 6.2 Total annual expected reward for different subsystem configurations

γ     1g_basic  2·0.5g_basic  2·0.6g_basic  2·0.7g_basic  2·0.8g_basic  2·0.9g_basic  2·1.0g_basic
0     954       945           758           571           384           197           10.1
0.2   763       569           382           195           8.11          8.11          8.11
0.4   572       193           6.08          6.08          6.08          6.08          6.08
0.6   382       4.06          4.06          4.06          4.06          4.06          4.06
0.8   191       2.03          2.03          2.03          2.03          2.03          2.03
1.0   0         0             0             0             0             0             0
Since the reward functions V_2(t) and V_3(t) obtained for the configurations with
n = 1 and n = 2, respectively, reach their steady states very quickly (within 2
weeks), and there is no aging in a MSS, we can assume that the annual rewards are
the same for any year from the beginning of the subsystem operation. The relative
annual reward ΔV for each configuration with n = 2 is obtained as the difference
between the annual reward of the configuration 1g_basic and the annual reward of
the given configuration 2·k·g_basic.
From Table 6.2 one can see, for example, that if the capacity of the spinning
reserve installed in the power system is 0.2w (γ = 0.2), the annual cost associated
with the unreliability of the feed water subsystem is $763,000 for the
configuration 1g_basic and about $8,100 for the configurations 2·k·g_basic with
k ≥ 0.8. Hence, if for γ = 0.2 one chooses configuration 2·0.8g_basic,
2·0.9g_basic, or 2·1.0g_basic instead of the simplest configuration 1g_basic, the
relative annual reward is ΔV = 763,000 − 8,100 = $754,900.
According to (6.1), the sum of equal relative annual rewards accumulated during
a period of m years, in present values, is

    ΔV* = ΔV · Σ_{i=1}^{m} 1/(1 + IR)^i,   (6.35)
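Equation (6.35) is an ordinary annuity, so the sum has the closed form ΔV·(1 − (1 + IR)^−m)/IR. A quick check (the horizon m = 20 years and IR = 10% are hypothetical values of ours, not from the book):

```python
def npv_equal_annuity(delta_v, ir, m):
    """Present value (6.35): m equal annual rewards delta_v discounted at ir."""
    return delta_v * sum(1 / (1 + ir) ** i for i in range(1, m + 1))

# hypothetical horizon and interest rate, applied to the Delta V computed above
pv = npv_equal_annuity(754_900, 0.10, 20)
closed = 754_900 * (1 - 1.10 ** -20) / 0.10   # ordinary-annuity closed form
print(round(pv, 2), round(closed, 2))
```

The two values agree to floating-point precision, which makes the closed form a convenient shortcut when many configurations must be compared.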
Fig. 6.14 Relative costs of different configurations for feed water pump subsystem
If there is no spinning reserve in the power system, the best configuration is
2g_basic (the maximum of the curve corresponding to γ = 0 is achieved for
g = g_basic). The configuration 2·0.5g_basic is the worst one in this case. (It is
even worse than the basic configuration, because for g = 0.5g_basic, C_N < 0.)
If the spinning reserve in the power system is unrestricted (γ = 1.0), the
simplest configuration, 1g_basic, is the best, because C_N < 0 for any
configuration 2·k·g_basic.

6.3 Practical Cost-reliability Optimization Problems for Multi-state Systems 265
sions exist for each type of element and in which analytical dependencies for ele-
ment costs are unavailable.
    C(θ) = Σ_{i=1}^{N} Σ_{b=1}^{B_i} n(i, b)·c(i, b).   (6.36)
Having the system structure defined by its components' reliability block diagram
and by the set θ, one can determine the entire MSS availability index
A(w, q) (1.21) for any given steady-state demand distribution w, q. The problem of
structure optimization for series-parallel MSSs is formulated as finding the
minimal cost system configuration θ* that provides the required availability
level A′:

    C(θ*) → min,
    subject to A(w, q, θ*) ≥ A′.   (6.37)
The natural way of encoding the solutions of the optimal assignment problem
(6.37) in a genetic algorithm (GA) is by defining a B-length integer string, where
B is the total number of versions available:

    B = Σ_{i=1}^{N} B_i.   (6.38)
6.3 Practical Cost-reliability Optimization Problems for Multi-state Systems 267
Each solution is represented by the string a = {a_1, …, a_j, …, a_B}, where the
element a_j corresponds to version b of component i with

    j = Σ_{m=1}^{i−1} B_m + b.   (6.39)
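The index mapping (6.39) can be exercised directly (the numbers of versions per component, B_i, are our reading of Table 6.3):

```python
def gene_index(B_list, i, b):
    """Position j of version b of component i in the GA string, eq. (6.39)."""
    return sum(B_list[:i - 1]) + b

# versions per component as read from Table 6.3 (our reading)
B_list = [7, 5, 4, 9, 4]
print(gene_index(B_list, 1, 4))   # primary feeder, version 4 -> position 4
print(gene_index(B_list, 3, 1))   # stacker, version 1 -> position 13
```

The last component's last version maps to position B, the total string length, which is an easy sanity check of the encoding.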
The system belongs to the type of flow transmission MSS with flow dispersion,
since its main characteristic is the transmission capacity and parallel elements
can transmit the coal simultaneously.
Each system element is an element with total failure (which means that it can
have only two states: functioning with the nominal capacity, and total failure
corresponding to capacity 0). For each type of equipment there exists a list of
products available on the market. Each version of equipment is characterized by
its nominal capacity g (in hundreds of tons per hour), availability p, and cost c
(millions of dollars). The list of available products is presented in Table 6.3.
Table 6.3 Characteristics of the available products (g in hundreds of tons per hour, c in millions of dollars)

Version  Primary feeder        Primary conveyor      Stacker               Secondary feeder      Secondary conveyor
         g     p     c         g     p     c         g     p     c         g     p     c         g     p     c
1        1.20  0.980 0.590     1.00  0.995 0.205     1.00  0.971 7.525     1.15  0.977 0.180     1.28  0.984 0.986
2        1.00  0.977 0.535     0.92  0.996 0.189     0.60  0.973 4.720     1.00  0.978 0.160     1.00  0.983 0.825
3        0.85  0.982 0.470     0.53  0.997 0.091     0.40  0.971 3.590     0.91  0.978 0.150     0.60  0.987 0.490
4        0.85  0.978 0.420     0.28  0.997 0.056     0.20  0.976 2.420     0.72  0.983 0.121     0.51  0.981 0.475
5        0.48  0.983 0.400     0.21  0.998 0.042                           0.72  0.981 0.102
6        0.31  0.920 0.180                                                 0.72  0.971 0.096
7        0.26  0.984 0.220                                                 0.55  0.983 0.071
8                                                                          0.25  0.982 0.049
9                                                                          0.25  0.97  0.044
The system should have availability not less than A ' for the given steady-state
demand distribution presented in Table 6.4.
According to the first step of the decoding procedure, the u-functions of the
chosen elements are determined as follows:
u14(z) = 0.022z^0 + 0.978z^0.85 and u17(z) = 0.016z^0 + 0.984z^0.26 (for the primary feeders);
u23(z) = 0.003z^0 + 0.997z^0.53 (for the primary conveyors);
u31(z) = 0.029z^0 + 0.971z^1.00 (for the stacker);
u47(z) = 0.017z^0 + 0.983z^0.55 (for the secondary feeders);
u52(z) = 0.016z^0 + 0.984z^1.28 (for the secondary conveyor).
According to the second step of the decoding procedure, the u-functions of the system components are obtained by the parallel composition of the u-functions of their elements:

U1(z) = (0.022z^0 + 0.978z^0.85)(0.016z^0 + 0.984z^0.26)^2,
U2(z) = (0.003z^0 + 0.997z^0.53)^2,
U3(z) = 0.029z^0 + 0.971z^1.00,
U4(z) = (0.017z^0 + 0.983z^0.55)^3,
U5(z) = 0.016z^0 + 0.984z^1.28.

The u-function of the entire system is

U(z) = fser(U1(z), U2(z), U3(z), U4(z), U5(z)),

where the function fser in the series composition operator produces the minimum of its arguments.
Having the system u-function, we obtain the steady-state availability index for
the given demand distribution using Equation 1.21 and operator (4.29): A = 0.95.
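The composition steps above can be sketched numerically. The following Python fragment (ours; the book itself works with MATLAB) represents each u-function as a map from capacity to probability, composes parallel elements by summing capacities (flow dispersion) and series components by taking the minimum, and evaluates Pr{G ≥ w} for a single illustrative demand level w = 1.0. The demand distribution of Table 6.4 is not reproduced here, so the printed value is not the A = 0.95 of the case study.

```python
from functools import reduce
from itertools import product

# u-functions of the chosen elements as {capacity: probability} maps
# (capacities in hundreds of tons per hour, as in Table 6.3).
u14 = {0.0: 0.022, 0.85: 0.978}   # primary feeder, version 4
u17 = {0.0: 0.016, 0.26: 0.984}   # primary feeder, version 7
u23 = {0.0: 0.003, 0.53: 0.997}   # primary conveyor, version 3
u31 = {0.0: 0.029, 1.00: 0.971}   # stacker, version 1
u47 = {0.0: 0.017, 0.55: 0.983}   # secondary feeder, version 7
u52 = {0.0: 0.016, 1.28: 0.984}   # secondary conveyor, version 2

def combine(u1, u2, op):
    """Compose two u-functions with the structure function op."""
    out = {}
    for (g1, p1), (g2, p2) in product(u1.items(), u2.items()):
        g = op(g1, g2)
        out[g] = out.get(g, 0.0) + p1 * p2
    return out

par = lambda x, y: x + y   # parallel: capacities add (flow dispersion)
ser = min                  # series: the minimum capacity limits the flow

U1 = reduce(lambda x, y: combine(x, y, par), [u14, u17, u17])
U2 = combine(u23, u23, par)
U3 = u31
U4 = reduce(lambda x, y: combine(x, y, par), [u47, u47, u47])
U5 = u52
U = reduce(lambda x, y: combine(x, y, ser), [U1, U2, U3, U4, U5])

def availability(U, w):
    """Pr{system capacity >= demand w}."""
    return sum(p for g, p in U.items() if g >= w - 1e-9)

print(round(availability(U, 1.0), 4))   # illustrative demand level
```

The same two composition operators, applied in the order given by the reliability block diagram, reproduce the entire u-function technique of this section.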
The total system cost, according to Equation 6.36, is
For the desired value of system availability A ' = 0.97, the fitness takes the
value
The minimal cost solutions obtained for different desired availability levels A '
are presented in Table 6.5. This table represents the cost, calculated availability,
and structure of the minimal cost solutions obtained by the GA. The structure of
each system component is represented by a string of the form n1xb1, …, nmxbm, where nj is the number of identical elements of version bj belonging to this component.
Consider, for example, the best solution obtained for A ' = 0.99. The minimal
cost system configuration that provides the system availability A=0.992 consists of
two primary feeders of version 4, one primary feeder of version 6, two primary
conveyors of version 3, two stackers of version 2, one stacker of version 3, three
secondary feeders of version 7, and three secondary conveyors of version 4. The
cost of this configuration is $15.870 million.
In practice, the designer often has to include additional elements in the existing
system. It may be necessary, for example, to modernize a system according to new
demand levels or new reliability requirements. The problem of minimal cost MSS
expansion is very similar to the problem of system structure optimization. The
only difference is that each MSS component already contains some working ele-
ments. The cost of the existing elements should not be taken into account when
the MSS expansion cost is minimized.
The initial structure of the MSS is defined as follows: each component of type i
contains B0i different subcomponents connected in parallel. Each subcomponent j
in its turn contains n0(i,j) identical elements, which are also connected in parallel.
Each element is characterized by its steady-state performance distribution {g(i,j), p(i,j)}. The entire initial system structure can therefore be defined by the set {g(i,j), p(i,j), n0(i,j)}, 1 ≤ i ≤ N, 1 ≤ j ≤ B0i, and by a reliability block diagram representing the interconnection among the components.
The optimal MSS expansion problem formulation is the same as in Section
6.3.1.1 and the GA implementation is the same as in Section 6.3.1.2. (The only
difference is that one should take into account u-functions of both the existing ele-
ments and the new elements chosen from the list.)
Case Study 4 Consider the same coal transportation system for a power station
that was considered in case study 3. The initial structure of this MSS is presented
in Table 6.6. (Each component contains a single subcomponent of identical elements: B0i = 1 for 1 ≤ i ≤ N.) All the existing elements as well as the new ones to
be included in the system (from the list of available products presented in Table
6.3) are elements with total failure characterized by their availability p and nomi-
nal transmission capacity g.
The existing structure can satisfy the demand presented in Table 6.4 with availability A(w, q) = 0.506. In order to increase the system availability to the level of A', additional elements are included. The minimal cost MSS expansion so-
lutions for different desired values of system availability A ' are presented in Ta-
ble 6.7.
Consider, for example, the best solution obtained for A ' = 0.99 (encoded by
the string 00000100000100100000001000001). The minimal cost system exten-
sion plan that provides the system availability A = 0.99 presumes the addition of
a primary feeder of version 6, a primary conveyor of version 5, a stacker of ver-
sion 3, a secondary feeder of version 7, and a secondary conveyor of version 4.
The cost of this extension plan is $4.358 million.
References

Dhillon BS (2000) Design reliability: fundamentals and applications. CRC Press, London
Goldner Sh (2006) Markov model for a typical 360 MW coal fired generation unit. Commun Depend Qual Manag 9(1):24–29
Kuo W, Prasad VR (2000) An annotated overview of system reliability optimization. IEEE Trans Reliab 49(2):487–493
Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Lisnianski A, Frenkel I, Khvatskin L, Ding Y (2008) Multi-state system reliability assessment by using the Markov reward model. In: Vonta F, Nikulin M, Limnios N, Huber-Carol C (eds) Statistical models and methods for biomedical and technical systems. Birkhäuser, Boston, pp 153–168
Logistics Management Institute (LMI) (1965) Life cycle costing in equipment procurement. Report No. LMI Task 4C5, LMI, Washington, DC
MIL-HDBK-338B (1998) Electronic reliability design handbook. US Department of Defense, Washington, DC
Ryan W (1978) Procurement views of life cycle costing. In: Proceedings of the Annual Symposium on Reliability, pp 164–168
Ushakov I (1987) Optimal standby problem and a universal generating function. Sov J Comput Syst Sci 25:61–73
7 Aging Multi-state Systems
Many technical systems are subjected during their lifetime to aging and degrada-
tion. After any failure, maintenance is performed by a repair team. This chapter
considers an aging MSS, where the system failure rate increases with time.
Maintenance and repair problems for binary-state systems have been widely investigated in the literature. Barlow and Proschan (1975), Gertsbakh (2000), Valdez-Flores and Feldman (1989), and Wang (2002) survey and summarize theoretical developments and practical applications of maintenance models. Aging is usually considered as a process that results in an age-related increase of the failure rate. The most common shapes of failure rates have been observed by Gertsbakh and Kordonsky (1969), Meeker and Escobar (1998), Bagdonavicius and Nikulin (2002), Wendt and Kahle (2006), and Finkelstein (2003). An interesting approach
was introduced by Finkelstein (2005, 2008), where it was shown that aging is not
always manifested by an increasing failure rate. For example, it can be an upside-
down bathtub shape of the failure rate that corresponds to a decreasing mean re-
maining lifetime function.
After each corrective maintenance action or repair, the aging system's failure rate can be expressed as λr(t) = qλ(0) + (1 − q)λ(t), where λr(t) denotes the failure rate after repair, q is an improvement factor that characterizes the quality of the overhauls (0 ≤ q ≤ 1), and λ(t) is the aging system's failure rate before repair (Zhang and Jardine 1998). If q = 1, then the maintenance action is perfect (the system becomes "as good as new" after repair). If q = 0, the failed system is returned to a working state by minimal repair (the system stays "as bad as old" after repair), in which case the failure rate of the system is nearly the same as before. Minimal repair is appropriate for large complex systems (consisting of many different components) where a failure occurs due to one (or a few) component(s) failing. So minimal repair is usually appropriate for MSSs, and in this chapter we usually
will deal only with MSSs under minimal repairs, where q = 0. In such situations,
the failure pattern can be described by a non-homogeneous Poisson process
(NHPP). (A detailed description of NHPP can be found in Appendix B.) Incorpo-
rating the time-varying failure intensity into existing Markov models was sug-
gested in Welke et al. (1995) for reliability modeling of hardware/software sys-
tems. More details and interesting examples can be found in Xie et al. (2004).
Here we describe an extended approach (Lisnianski and Frenkel 2009), which in-
corporates the time-varying failure intensity of aging components into a Markov
reward model that is used for general reliability measure evaluation of nonaging
MSSs. Such a model will be called a non-homogeneous Markov reward model.
7.1 Markov Model and Markov Reward Model for Increasing Failure Rate Function

As was written in Chapter 2, for a Markov MSS the transition rates (intensities) aij between states i and j are defined by the corresponding failure rates λij and repair rates μij. In MSSs, aging can be indicated by any failure rate that increases as a function of time, λij(t). A minimal repair is a corrective maintenance
action that brings the aging equipment to the condition it was in just before the
failure occurrence. An aging MSS subject to minimal repairs experiences reliabil-
ity deterioration with the operating time, i.e., there is a tendency toward more fre-
quent failures. In such situations, the failure pattern can be described by a Poisson
process whose intensity function monotonically increases with t. A Poisson proc-
ess with a nonconstant intensity is called non-homogeneous, since it does not have
stationary increments (Gertsbakh 2000). It was shown (see, for example, Xie et al.
2004) that an NHPP model can be integrated into a Markov model with time-
varying transition intensities aij (t ). Therefore, for aging MSSs, transition intensi-
ties corresponding to failures of aging components will be increasing functions of
time aij(t).
For a non-homogeneous Markov model, a system's state at time t can be described by a continuous-time Markov chain with a set of states {1, …, K} and a transition intensity matrix a = [aij(t)], i, j = 1, …, K, where each transition intensity may be a function of time t. Chapman-Kolmogorov differential equations
should be solved in order to find state probabilities for such a system (Trivedi
2002). For a non-homogeneous Markov reward model it is assumed that if the
process stays in any state i during the time unit, a certain amount of money rii is
gained. It is also assumed that each time the process transits from state i to state j
an additional amount of money rij is gained. A reward may also be negative when
it characterizes a loss or penalty. Such a reward process, associated with the states and transitions of a non-homogeneous Markov system, is called a non-homogeneous Markov process with rewards. For such processes, in addition to the transition intensity matrix a, a reward matrix r = [rij], i, j = 1, …, K, should be determined.
Let Vi(t) be the expected total reward accumulated up to time t, given that the initial state of the process at time instant t = 0 is state i. The following system of Howard differential equations must be solved in order to find Vi(t):

dVi(t)/dt = rii + Σ_{j=1, j≠i}^{K} aij(t)rij + Σ_{j=1}^{K} aij(t)Vj(t), i = 1, 2, …, K. (7.1)
In the most common case, the MSS begins to accumulate rewards after the time instant t = 0; therefore, the initial conditions are Vi(0) = 0, i = 1, …, K. If, for example, the state K with the highest performance level is defined as the initial state, the value VK(t) should be found as a solution of system (7.1).
It was shown in Lisnianski and Levitin (2003) and Lisnianski and Frenkel
(2009) that many important reliability measures for aging MSSs can be found by
determining the rewards in a corresponding reward matrix. In the following case
study we extend this approach for aging MSSs under minimal repair. We should
remark that if the repair is not minimal, the Markov properties for such MSSs are
not justified and the approach cannot be applied.
year^−1, μ34 = 446.9 year^−1. The demand is constant, w = 300 KW, and a power unit failure is treated as the generating capacity decreasing below the demand level w.
The state-transition diagram for the system is presented in Figure 7.1. Based on this state-transition diagram, we assess the MSS average availability, the mean total number of system failures, and the accumulated mean performance deficiency (in this case, since a generating system is considered, this is the expected energy not supplied to consumers).
According to the state-transition diagram in Figure 7.1, the transition intensity matrix a (7.3) can be obtained:

      | -μ14    0        0      μ14                    |
a =   |  0     -μ24      0      μ24                    |.  (7.3)
      |  0      0       -μ34    μ34                    |
      |  λ41    λ42(t)   λ43   -(λ41 + λ42(t) + λ43)   |
In order to assess the MSS average availability, rewards r33 = r44 = 1 are defined for the acceptable states 3 and 4, where the generating capacity is not less than the demand w, and all other rewards are zeroed:

            | 0  0  0  0 |
r = [rij] = | 0  0  0  0 |.  (7.4)
            | 0  0  1  0 |
            | 0  0  0  1 |
The system of differential equations (7.1) must be solved for transition intensity matrix (7.3) and reward matrix (7.4) under initial conditions Vi(0) = 0, i = 1, …, 4. The results of the calculation can be seen in Figure 7.2.
Calculation results are presented for two cases: for an aging unit with λ42(t) = 7.01 + 0.2189t^2 (dashed-dotted line) and for a non-aging unit where λ42 = 7.01 = constant (bold line).
Fig. 7.2 Calculation of the MSS average availability
As one can see from Figure 7.2, the average availability for an aging MSS is
lower than the average availability for a non-aging MSS. Aging impact increases
over time.
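The computation behind Figure 7.2 can be sketched as follows. This Python fragment (ours; the book works with MATLAB) integrates system (7.1) with a transition intensity matrix of the form (7.3) and reward matrix (7.4) by a classical fourth-order Runge-Kutta scheme. Only μ34 = 446.9 year^−1 and λ42(t) = 7.01 + 0.2189t^2 are given in the text; the remaining rates are hypothetical placeholders, so the numbers are purely illustrative.

```python
# Sketch of the non-homogeneous Markov reward model (7.1) for the
# four-state unit of Section 7.1. Only mu34 = 446.9 1/year and
# lam42(t) = 7.01 + 0.2189*t**2 come from the text; mu14, mu24,
# lam41, lam43 below are hypothetical placeholders.
mu14, mu24, mu34 = 200.0, 300.0, 446.9    # repair rates, 1/year
lam41, lam43 = 2.0, 3.0                   # failure rates, 1/year (assumed)

def lam42(t, aging):
    return 7.01 + (0.2189 * t * t if aging else 0.0)

def a_matrix(t, aging):
    l42 = lam42(t, aging)
    return [[-mu14, 0.0, 0.0, mu14],
            [0.0, -mu24, 0.0, mu24],
            [0.0, 0.0, -mu34, mu34],
            [lam41, l42, lam43, -(lam41 + l42 + lam43)]]

# Reward matrix (7.4): r33 = r44 = 1 (time spent in acceptable states).
r = [[0.0] * 4 for _ in range(4)]
r[2][2] = r[3][3] = 1.0

def dV(t, V, aging):
    a = a_matrix(t, aging)
    return [r[i][i]
            + sum(a[i][j] * r[i][j] for j in range(4) if j != i)
            + sum(a[i][j] * V[j] for j in range(4))
            for i in range(4)]

def solve(T, aging, h=5e-4):
    """Classical RK4 integration of system (7.1) from Vi(0) = 0."""
    V, t = [0.0] * 4, 0.0
    for _ in range(int(T / h)):
        k1 = dV(t, V, aging)
        k2 = dV(t + h / 2, [V[i] + h / 2 * k1[i] for i in range(4)], aging)
        k3 = dV(t + h / 2, [V[i] + h / 2 * k2[i] for i in range(4)], aging)
        k4 = dV(t + h, [V[i] + h * k3[i] for i in range(4)], aging)
        V = [V[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(4)]
        t += h
    return V

T = 5.0
A_aging = solve(T, True)[3] / T    # average availability, initial state 4
A_const = solve(T, False)[3] / T
print(round(A_const, 4), round(A_aging, 4))
```

With these placeholder rates the aging value lies below the non-aging one, reproducing the qualitative picture of Figure 7.2.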
In order to find the mean total number of system failures Nf(t), we should present the reward matrix r in the following form:

            | 0  0  0  0 |
r = [rij] = | 0  0  0  0 |.  (7.5)
            | 0  0  0  0 |
            | 1  1  0  0 |
The system of differential equations (7.1) must be solved for transition inten-
sity matrix (7.3) and reward matrix (7.5) under initial conditions
Vi(0) = 0, i = 1, …, 4. The results of calculation are presented in Figure 7.3. Calculation results are presented for two cases: for an aging MSS with λ42(t) = 7.01 + 0.2189t^2 (dashed-dotted line) and for a non-aging MSS where λ42 = 7.01 = constant (bold line).

Fig. 7.3 Mean number of system failures
In order to find the accumulated mean performance deficiency, the reward matrix r should be presented in the following form, where the diagonal rewards in the unacceptable states equal the corresponding performance deficiency w − gi:

            | 300  0   0  0 |
r = [rij] = | 0    85  0  0 |.  (7.6)
            | 0    0   0  0 |
            | 0    0   0  0 |
The system of differential equations (7.1) must be solved for transition inten-
sity matrix (7.3) and reward matrix (7.6) under initial conditions
Vi(0) = 0, i = 1, …, 4. The results of calculation are presented in Figure 7.4. Calculation results are presented for two cases: for an aging MSS with λ42(t) = 7.01 + 0.2189t^2 (dashed-dotted line) and for a corresponding non-aging MSS where λ42 = 7.01 = constant (bold line). The accumulated performance deficiency for the aging MSS is greater than the accumulated performance deficiency for the corresponding non-aging MSS.
Fig. 7.4 Accumulated performance deficiency
For computation of the mean time to failure and the probability of MSS failure during a given time interval, the state-space diagram of the generating system should be transformed: all transitions that return the system from unacceptable states should be forbidden, and all unacceptable states should be treated as one absorbing state. The resulting state-transition diagram is shown in Figure 7.5.
Fig. 7.5 State-transition diagram with an absorbing failure state (g4 = 360, g3 = 325, w = 300)
According to this state-space diagram, the transition intensity matrix a can be represented as follows:

      |  0              0      0                      |
a =   |  0             -μ34    μ34                    |.  (7.7)
      |  λ41 + λ42(t)   λ43   -(λ41 + λ42(t) + λ43)   |
In order to find the mean time to failure, we should represent the reward matrix r in the following form:

            | 0  0  0 |
r = [rij] = | 0  1  0 |.  (7.8)
            | 0  0  1 |
The system of differential equations (7.1) must be solved for transition intensity matrix (7.7) and reward matrix (7.8) under initial conditions Vi(0) = 0 for all states. The results of calculation are presented in Figure 7.6.
Fig. 7.6 Mean time to failure
In order to find the probability of MSS failure during the time interval [0, T], we should represent the reward matrix r in the following form:

            | 0  0  0 |
r = [rij] = | 0  0  0 |.  (7.9)
            | 1  0  0 |
The system of differential equations (7.1) must be solved for transition intensity matrix (7.7) and reward matrix (7.9) under initial conditions Vi(0) = 0 for all states. The results of calculating the MSS reliability function are
presented in Figure 7.7.
Fig. 7.7 MSS reliability function during the time interval [0,T]
All graphs show the age-related decrease of MSS reliability in comparison with the non-aging MSS. In the last two figures the graphs of the mean time to failure and the reliability function for the aging and non-aging units are almost the same, because the first MSS failure usually occurs within a short time (less than 0.5 years according to Figure 7.6), and the aging impact is negligibly small over such a short period. Thus, the graphs for the aging and non-aging MSS cannot be visually separated in these two cases.
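The absorbing-state computation of this section can be sketched in the same style. The Python fragment below (ours) integrates system (7.1) with the transition intensity matrix (7.7) and reward matrix (7.9), so that the accumulated reward equals the probability of MSS failure during [0, T]; λ41 and λ43 are assumed values, so only the qualitative behavior (a decreasing reliability function) is meaningful.

```python
# Sketch of the failure-probability computation via the absorbing-state
# model (7.7)-(7.9). mu34 and lam42(t) come from the text; lam41 and
# lam43 are assumed. State order: 1 = absorbing failure state,
# 2 = state 3, 3 = state 4.
mu34 = 446.9
lam41, lam43 = 2.0, 3.0            # assumed failure rates, 1/year

def a_matrix(t):
    l = lam41 + 7.01 + 0.2189 * t * t       # lam41 + lam42(t)
    return [[0.0, 0.0, 0.0],
            [0.0, -mu34, mu34],
            [l, lam43, -(l + lam43)]]

# Reward matrix (7.9): each transition into the absorbing state counts 1,
# so the accumulated reward is the probability of failure during [0, T].
r = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]

def dV(t, V):
    a = a_matrix(t)
    return [r[i][i]
            + sum(a[i][j] * r[i][j] for j in range(3) if j != i)
            + sum(a[i][j] * V[j] for j in range(3))
            for i in range(3)]

def failure_prob(T, h=1e-4):
    """RK4 integration of (7.1); failure probability starting in state 4."""
    V, t = [0.0] * 3, 0.0
    for _ in range(int(T / h)):
        k1 = dV(t, V)
        k2 = dV(t + h / 2, [V[i] + h / 2 * k1[i] for i in range(3)])
        k3 = dV(t + h / 2, [V[i] + h / 2 * k2[i] for i in range(3)])
        k4 = dV(t + h, [V[i] + h * k3[i] for i in range(3)])
        V = [V[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(3)]
        t += h
    return V[2]

R = [1.0 - failure_prob(T) for T in (0.1, 0.3, 0.5)]
print([round(x, 4) for x in R])    # decreasing reliability function
```

The reliability function is simply R(T) = 1 − V(T), where V(T) is the reward accumulated from the initial (best) state.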
7.2 Numerical Methods for Reliability Computation for Aging Multi-state System

In Section 7.1 we did not discuss a technique for numerically solving system (7.1), but in some cases this may be necessary. Generally, system (7.1) of differential equations with variable coefficients may be solved using such tools as MATLAB, MATHCAD, etc. However, even these very powerful tools sometimes solve this system with inadequate accuracy. Figure 7.8 shows a typical example of such an error.
Fig. 7.8 Example of MATLAB mistake
Figure 7.8 presents the mean time to system failure (dashed-dotted line), which
is computed using MATLAB for the multi-state generating unit (see case study
from Section 7.1). The real mean time to system failure was computed in Section
7.1 and was presented in Figure 7.6. (This computation was based on a special ap-
proximation method that will be defined below.) To compare the results this
curve is also shown in Figure 7.8 (bold line). As one can see from Figure 7.8 these
two curves are essentially different. Such inaccuracy is noticed only when we are
dealing with aging MSS, or, in other words, when we are solving system (7.1)
with non-constant failure rates. All tools such as MATLAB, MATHCAD,
MATHEMATICA, etc. are perfect for solving a system for non-aging MSS (with
constant failure rates). At present, in engineering practice, when a system with non-constant failure rates is solved, there is no way to predict whether there will be inaccuracies. Moreover, it is often not easy to discover such cases, especially when
optimization problems are solved. In order to find an optimum, a corresponding
search procedure should be organized, and usually a great number of computations
for any reliability index should be performed. It is impossible to analyze each so-
lution online. Thus, in engineering practice for reliability computation for aging
MSS we recommend a special approximation approach, which is based on solving
system (7.1) for specific constant failure rates. The approach will be presented be-
low and it will be shown that the approach can prevent inaccuracies and an engi-
neer can be sure of the results.
The ordinary Markov model and the Markov reward model were explained in detail in Chapter 2; here we briefly return to them. We suppose that the Markov model for the system was built under the assumption that the time to failure and the time to repair are distributed exponentially and that there is no aging in the system (the failure rate function is constant). We also suppose that the Markov model for the system has K states, which, together with the transitions between them, may be presented by a state-space diagram. The intensities aij, i, j = 1, …, K, of transitions from state i to state j are defined by the corresponding failure and repair rates.
Let pj(t) be the probability of state j at time t. The following system of differential equations for finding the state probabilities pj(t), j = 1, …, K, for the Markov model can be written:

dpj(t)/dt = Σ_{i=1, i≠j}^{K} pi(t)aij − pj(t) Σ_{i=1, i≠j}^{K} aji. (7.10)
For the Markov reward model construction, it is assumed that while the system is in any state i during any time unit, some money rii will be paid. It is also assumed that if there is a transition from state i to state j, the amount rij will be paid for each transition. The amounts rii and rij are called rewards. They can be negative when representing a loss or penalty. The objective is to compute the total expected reward accumulated from t = 0, when the system begins its evolution in the state space, up to the time instant T under specific initial conditions.
Let Vj(t) be the total expected reward accumulated up to time t if the system begins its evolution at time t = 0 from state j. According to Section 2.4, the following system of differential equations must be solved in order to find this reward:

dVj(t)/dt = rjj + Σ_{i=1, i≠j}^{K} aji rji + Σ_{i=1}^{K} aji Vi(t), j = 1, 2, …, K. (7.11)
The main idea of the suggested approach (Ding et al. 2009) is the partition of
system lifetime into some intervals, where for each time interval, the failure rate
may be assumed to be constant. In this case, the Markov reward model (7.11)
without aging may be applied in order to find the accumulated total expected re-
ward for each interval.
Table 7.1 Lower and upper bound approximations of the failure rate as piecewise-constant functions

Interval no.   Time interval         Lower bound λn−      Upper bound λn+
1              [0, Δt]               λ(0)                 λ(Δt)
2              [Δt, 2Δt]             λ(Δt)                λ(2Δt)
…              …                     …                    …
n              [Δt(n−1), Δt·n]       λ(Δt(n−1))           λ(Δt·n)
Denote by N the number of intervals that partition the system lifetime T. The length of each interval is Δt = T/N. The failure rate λ(t) in each time interval [Δt(n−1), Δt·n], 1 ≤ n ≤ N, can be approximated by two constant values λn− and λn+, which represent the values of the function λ(t) at the beginning and at the end of the corresponding nth time interval, respectively. Thus, we have

λn− = λ(Δt(n−1)), (7.12)

λn+ = λ(Δt·n). (7.13)
Using (7.10), the two following systems of differential equations can be used to find the state probabilities Pj^{n−} and Pj^{n+} at the end of each time interval Δt·n:

dpj^{n−}(t)/dt = Σ_{i=1, i≠j}^{K} pi^{n−}(t)aij^{n−} − pj^{n−}(t) Σ_{i=1, i≠j}^{K} aji^{n−}, (7.15)

dpj^{n+}(t)/dt = Σ_{i=1, i≠j}^{K} pi^{n+}(t)aij^{n+} − pj^{n+}(t) Σ_{i=1, i≠j}^{K} aji^{n+}, (7.16)

where i, j = 1, 2, …, K, n = 1, 2, …, N, and aij^{n−} and aij^{n+} are the intensities of transitions from state i to state j based on the lower λn− and upper λn+ bounds of the failure rates for each time interval Δt·n, respectively.
The initial state of the system is known with certainty only for the first interval. We assume that the system is in state K at time t = 0. Therefore the initial conditions for Equation 7.15 for the first time interval (n = 1) are

pK^{1−}(0) = 1, p_{K−1}^{1−}(0) = … = p1^{1−}(0) = 0, (7.17)

and the initial conditions for Equation 7.16 for the first time interval (n = 1) are

pK^{1+}(0) = 1, p_{K−1}^{1+}(0) = … = p1^{1+}(0) = 0. (7.18)

For any other time interval Δt·n, n = 2, 3, …, N, the initial conditions (initial distribution of state probabilities) are defined by the solutions (distribution of state probabilities) at the end of the previous interval, i.e., by the following recurrent formulas:

pj^{n−}(0) = pj^{(n−1)−}(Δt), j = 1, …, K, (7.19)

pj^{n+}(0) = pj^{(n−1)+}(Δt), j = 1, …, K. (7.20)
By solving the differential equations of systems (7.15) and (7.16) under initial conditions (7.17) and (7.18), respectively, we determine the corresponding state probabilities Pj^{1−} and Pj^{1+} for each state j at the end of the first time interval (t = Δt).
Therefore, the lower Aw^−(Δt) and upper Aw^+(Δt) bounds of the MSS's availability at the end of the first time interval (n = 1) can be defined for any required demand level w as follows:

Aw^−(Δt) = Σ_{gj≥w} Pj^{1+}, (7.21)

Aw^+(Δt) = Σ_{gj≥w} Pj^{1−}. (7.22)

The lower and upper bounds of the MSS's availability at the end of the nth time interval (t = nΔt, n = 2, …, N) can be defined for any required demand level w as follows:

Aw^−(nΔt) = Σ_{gj≥w} Pj^{n+}, (7.23)

Aw^+(nΔt) = Σ_{gj≥w} Pj^{n−}. (7.24)
This procedure should be repeated until the end of the last time interval. The lower Aw^−(NΔt) and upper Aw^+(NΔt) bounds of the MSS's availability at the end of the last time interval n = N (t = NΔt) can be defined for any given demand level w. It should be noted that NΔt = T. Therefore, the MSS's availability at lifetime T is within the bounds

Aw^−(NΔt) ≤ Aw(T) ≤ Aw^+(NΔt). (7.25)
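A minimal Python sketch of the bounding procedure (7.12)-(7.25), assuming the simplest case of a two-state element (up/down) whose per-interval behavior can be solved in closed form. The rates are invented for illustration. Note the crossing prescribed by (7.21)-(7.24): the probability chain computed with the upper failure rates yields the lower availability bound, and vice versa.

```python
import math

# Two-state element: failure rate lam(t) (increasing) and repair rate mu.
# Numerical values are illustrative only.
def lam(t):
    return 0.5 + 0.3 * t * t       # failure rate, 1/year

mu = 5.0                           # repair rate, 1/year

def step(A0, l, dt):
    """Exact availability after time dt for constant rates l and mu,
    starting from availability A0 (two-state Markov element)."""
    Ainf = mu / (l + mu)
    return Ainf + (A0 - Ainf) * math.exp(-(l + mu) * dt)

def bounds(T, N):
    """Lower/upper availability bounds at lifetime T with N intervals."""
    dt = T / N
    Alo = Aup = 1.0                # the element starts in the up state
    for n in range(1, N + 1):
        l_lo = lam(dt * (n - 1))   # lambda_n-  (7.12)
        l_up = lam(dt * n)         # lambda_n+  (7.13)
        Alo = step(Alo, l_up, dt)  # lower bound from upper rates (7.23)
        Aup = step(Aup, l_lo, dt)  # upper bound from lower rates (7.24)
    return Alo, Aup

for N in (4, 16, 64):
    lo, up = bounds(5.0, N)
    print(N, round(lo, 6), round(up, 6))   # the bounds tighten as N grows
```

Refining the partition narrows the gap between the two bounds, which is exactly how an engineer can control the accuracy of the approximation.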
Solving the system of differential equations (7.11) for these two values of the function λ(t) (λn− and λn+), we can determine the lower and upper bounds of the rewards, Vi^{n−} and Vi^{n+}, accumulated during each time interval [Δt(n−1), Δt·n]:

dVi^{n−}(t)/dt = rii + Σ_{j=1, j≠i}^{K} aij^{n−} rij + Σ_{j=1}^{K} aij^{n−} Vj^{n−}(t), i = 1, 2, …, K, n = 1, …, N, (7.26)

dVi^{n+}(t)/dt = rii + Σ_{j=1, j≠i}^{K} aij^{n+} rij + Σ_{j=1}^{K} aij^{n+} Vj^{n+}(t), i = 1, 2, …, K, n = 1, …, N, (7.27)

where t ∈ [0, Δt]. The failure rates λn− and λn+ for each time interval [Δt(n−1), Δt·n] determine the corresponding elements aij^{n−} and aij^{n+} of the transition intensity matrices in (7.26) and (7.27).
The initial reward values are zeroed for any state k and for any time interval n:

Vk^{n−}(0) = Vk^{n+}(0) = 0, k = 1, …, K, n = 1, …, N. (7.28)
Thus, by solving (7.26) and (7.27) we obtain the lower Vi^{n−}(t) and upper Vi^{n+}(t) bounds of the expected reward for any time interval n, under the condition that the system begins its evolution at initial time t = 0 from any state i = 1, 2, …, K. In other words, by solving systems (7.26) and (7.27) N times (once for each time interval n), we obtain the rewards Vi^{n−}(t) accumulated during each time interval n if we use for this interval the lower bounds λn− of the failure rate function, and the rewards Vi^{n+}(t) accumulated during each time interval n if we use for this interval the upper bounds λn+ of the failure rate function. Therefore, as a result we get for any time interval n = 1, 2, …, N two vector columns:

{V1^{n−}(t), V2^{n−}(t), …, VK^{n−}(t)}, {V1^{n+}(t), V2^{n+}(t), …, VK^{n+}(t)}. (7.29)
As one can see, the expected reward for each time interval depends strictly on the initial state i ∈ {1, …, K}. The initial state of the system is known with certainty only for the first interval. For any other time interval n we can find only the probability distribution Pi^n, i = 1, 2, …, K, of the initial states. If the probability distribution of the initial states is known for each time interval, the mean reward accumulated during this interval can be defined as the sum of the rewards Vi^{n−}(Δt) (Vi^{n+}(Δt)), i = 1, …, K, weighted according to the corresponding probabilities Pi^n of the initial states.
Based on the system of differential equations (7.10), we can find these distributions for each time interval n for the lower and upper bounds of the function λ(t). We designate these distributions as Pj^{n−}(t) and Pj^{n+}(t), respectively. For the first time interval [0, Δt] (n = 1) these distributions are known. Without loss of generality we assume that the system is in state K at time t = 0, so PK^{1−}(0) = 1, P_{K−1}^{1−}(0) = … = P1^{1−}(0) = 0, and PK^{1+}(0) = 1, P_{K−1}^{1+}(0) = … = P1^{1+}(0) = 0. The system of differential equations to determine the state probabilities Pj^{n−}(t) and Pj^{n+}(t), j = 1, 2, …, K, for each time t ∈ [Δt(n−1), Δt·n], 1 ≤ n ≤ N, can be written in the following manner:

dPj^{n−}(t)/dt = Σ_{i=1, i≠j}^{K} Pi^{n−}(t)aij^{n−} − Pj^{n−}(t) Σ_{i=1, i≠j}^{K} aji^{n−}, j = 1, 2, …, K, n = 1, 2, …, N, (7.30)

dPj^{n+}(t)/dt = Σ_{i=1, i≠j}^{K} Pi^{n+}(t)aij^{n+} − Pj^{n+}(t) Σ_{i=1, i≠j}^{K} aji^{n+}, j = 1, 2, …, K, n = 1, 2, …, N. (7.31)
The initial conditions for the system of differential equations (7.30) were defined above for the first time interval. For any other time interval [Δt(n−1), Δt·n], n = 2, 3, …, N, the initial conditions are defined by the following recurrent formula:

Pj^{n−}(0) = Pj^{(n−1)−}(Δt), j = 1, …, K. (7.32)

This means that the initial conditions (initial distribution of state probabilities) for the next interval are defined by the solutions (distribution of state probabilities) at the end of the previous interval.
The initial conditions for the system of differential equations (7.31) are defined in the same manner. For the first time interval [0, Δt] (n = 1) the initial conditions were defined above: PK^{1+}(0) = 1 and P_{K−1}^{1+}(0) = … = P1^{1+}(0) = 0. For any other time interval [Δt(n−1), Δt·n], n = 2, 3, …, N, the initial conditions are defined by the following recurrent formula:

Pj^{n+}(0) = Pj^{(n−1)+}(Δt), j = 1, …, K. (7.33)

The mean reward accumulated during time interval n can now be obtained as the sum of the rewards Vi^{n−}(Δt) (Vi^{n+}(Δt)), i = 1, …, K, corresponding to this interval, weighted according to the initial state probabilities Pj^{n−}(Δt(n−1)) (Pj^{n+}(Δt(n−1))) found as the solution of Equation 7.30 (or 7.31) for the previous interval. Therefore, we obtain
V^{n−} = Σ_{i=1}^{K} Vi^{n−}(Δt) Pi^{n−}(Δt(n−1)), n = 1, …, N, (7.34)

V^{n+} = Σ_{i=1}^{K} Vi^{n+}(Δt) Pi^{n+}(Δt(n−1)), n = 1, …, N. (7.35)
Now the lower (upper) bound of the total expected reward (TER) accumulated during the system lifetime T = Δt·N can be obtained as the sum of the mean rewards V^{n−} (V^{n+}) over all N intervals:

TER^− = Σ_{n=1}^{N} V^{n−}, (7.36)

TER^+ = Σ_{n=1}^{N} V^{n+}. (7.37)
Repeating the calculations of TER^− and TER^+ for increasing N, one can estimate the TER with an assigned level of accuracy.
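The complete TER bounding procedure (7.26)-(7.37) can be sketched in Python for a two-state repairable element. The reward structure (a penalty rate while down, a cost per failure, a cost per repair) and all numerical values are illustrative assumptions, not taken from the text; since all rewards here are costs, the lower-rate chain yields the lower TER bound.

```python
# Two-state repairable element, state 0 = down, state 1 = up. All rates
# and rewards are illustrative assumptions.
def lam(t):
    return 0.5 + 0.3 * t * t            # increasing failure rate, 1/year

mu = 5.0                                # repair rate, 1/year
r = [[100.0, 250.0],                    # r00: penalty rate while down; r01: cost per repair
     [500.0, 0.0]]                      # r10: cost per failure; r11 = 0

def derivs(V, P, a):
    dV = [r[i][i]
          + sum(a[i][j] * r[i][j] for j in range(2) if j != i)
          + sum(a[i][j] * V[j] for j in range(2))
          for i in range(2)]
    dP = [sum(P[i] * a[i][j] for i in range(2) if i != j)
          - P[j] * sum(a[j][i] for i in range(2) if i != j)
          for j in range(2)]
    return dV, dP

def interval(P0, l, dt, steps=100):
    """Solve (7.26)/(7.30) over one interval with constant failure rate l."""
    a = [[-mu, mu], [l, -l]]
    V, P, h = [0.0, 0.0], list(P0), dt / steps
    for _ in range(steps):              # midpoint (RK2) integration
        dV1, dP1 = derivs(V, P, a)
        Vm = [V[i] + h / 2 * dV1[i] for i in range(2)]
        Pm = [P[i] + h / 2 * dP1[i] for i in range(2)]
        dV2, dP2 = derivs(Vm, Pm, a)
        V = [V[i] + h * dV2[i] for i in range(2)]
        P = [P[i] + h * dP2[i] for i in range(2)]
    return V, P

def ter_bounds(T, N):
    dt = T / N
    Plo, Pup = [0.0, 1.0], [0.0, 1.0]   # the system starts in the up state
    ter_lo = ter_up = 0.0
    for n in range(1, N + 1):
        l_lo, l_up = lam(dt * (n - 1)), lam(dt * n)      # (7.12), (7.13)
        Vlo, Plo_end = interval(Plo, l_lo, dt)           # lower-rate chain
        Vup, Pup_end = interval(Pup, l_up, dt)           # upper-rate chain
        ter_lo += sum(Vlo[i] * Plo[i] for i in range(2)) # (7.34)
        ter_up += sum(Vup[i] * Pup[i] for i in range(2)) # (7.35)
        Plo, Pup = Plo_end, Pup_end
    return ter_lo, ter_up                                # (7.36), (7.37)

for N in (4, 16, 64):
    lo, up = ter_bounds(5.0, N)
    print(N, round(lo, 2), round(up, 2))
```

As in the availability case, refining the partition narrows the gap between TER^− and TER^+, which gives the engineer direct control over the accuracy of the estimate.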
7.3 Reliability-associated Cost Assessment for Aging Multi-state System
Most technical systems are repairable. For many kinds of industrial systems, it is
very important to avoid failures or reduce their occurrences and duration in order
to improve system reliability and reduce the corresponding costs.
With the increasing complexity of systems, only specially trained staff with
specialized equipment can provide system service. In this case, maintenance ser-
vice is provided by an external agent and the owner is considered a customer of
the agent for maintenance service. In the literature, different aspects of mainte-
nance service have been investigated (Almeida 2001; Murthy and Asgharizadeh
1999; Asgharizadeh and Murthy 2000).
Usually there are a number of different companies that provide maintenance for
a technical system. From this point of view, a service market offers a customer
different types of maintenance contracts. Such contracts have different parameters,
related to the conditions of the services provided. The main parameters are response time, service time, and costs (Almeida 2001). Response time depends mainly on customer location. Service time depends on the repair team's professional skills and the required equipment. Generally, a faster response and a more qualified repair team provide more expensive services. We shall say that these parameters determine the maintenance contract level.
On the one hand, it is better for the customer to choose a contract with minimal repair costs; on the other hand, it should be taken into account that if the repair time increases, the losses due to system failures will be greater too. These losses are defined by the corresponding penalties paid when the system has failed. In addition, in order to make a decision, the customer should take into account the corresponding cost of system functioning, the operation cost. This cost is defined by the fuel, electric energy, etc. needed for system functioning. When the system or some of its parts fail, the operation cost will change. The sum of the operation costs, repair costs, and penalty costs accumulated during the system life span defines the RAC. The best decision for the customer leads to the contract that corresponds to a minimum of RAC.
In this section, a general approach is suggested for computing the RAC accumulated during the aging system's lifespan. The approach is based on a piecewise approximation of the increasing failure rate function over different time intervals and on consecutive applications of the Markov reward model. A special iterative computational procedure is suggested for estimating the RAC by determining its lower and upper bounds. The main advantage of the suggested approach is that it can easily be implemented in practice by reliability engineers, since it is based solely on ordinary Markov methods.
We will define RAC as the total cost incurred by the user in the operation and maintenance of a system during its lifetime. Thus, RAC will comprise the operations cost, the cost of repair, and the penalty cost accumulated during the system lifespan. Therefore,
RAC = OC + RC + PC , (7.39)
where
OC is the system operations cost accumulated during the systems lifetime. It
may be, for example, the cost of primary fuel for an electrical generator, the
cost of consuming electrical energy for an air conditioning system, and so on.
Introducing redundant elements usually requires additional operating cost. When the system or some of its elements have failed, the operation cost can decrease;
RC is the repair cost incurred by the user in operating and maintaining the sys-
tem during its lifetime; and
PC is the penalty cost accumulated during the system lifetime that was paid
when the system failed.
Suppose that T is the system lifetime. During this time the system may be in
acceptable states (system functioning) or in unacceptable states (system failure).
After any failure, a corresponding repair action is performed and the system returns to one of the previously acceptable states. Every entrance of the system into the set of unacceptable states (system failure), as well as the system's residence in unacceptable states, is associated with a penalty.
A maintenance contract is an agreement between the repair team and the sys-
tem's owner that guarantees a specific level of services being delivered. The main-
tenance contract defines some important parameters that determine the service
level and corresponding costs. The main time parameters are mean response time
and mean repair time. Without loss of generality, here we will deal with only one parameter, the mean repair time $T_r^m$, where $m$ ($m = 1, 2, \dots, M$) is a possible maintenance contract level and $M$ is the number of such levels.
The repair cost $c_r^m$ for an individual repair action depends on the repair time, and so it corresponds to the maintenance contract level $m$. It usually ranges between the most expensive repair, where the repair must be completed within the minimal specified time $T_r^{\min}$ after the failure occurrence, and the cheapest repair, where the repair must be completed within the maximal specified time $T_r^{\max}$ after the failure occurrence. Thus, $T_r^{\min} \le T_r^m \le T_r^{\max}$.
The problem is to find the expected RAC corresponding to each maintenance
contract. According to the suggested approach, this cost is represented by the total
expected reward, calculated via a specially developed Markov reward model.
Consider the air conditioning system used in hospitals (Lisnianski et al. 2008).
The system consists of two main online air conditioners and one air conditioner in
cold reserve. The operating schedule of the system is such that the reserve air conditioner comes online only when one of the main air conditioners has failed.
In the numerical calculation we used the following data. The increasing failure rates of both air conditioners are described by a Weibull distribution with failure rate function $\lambda(t) = \beta \alpha^{\beta} t^{\beta-1}$ and parameters $\alpha = 1.5849$, $\beta = 1.5021$ for the main air conditioners and $\alpha = 4.1865$, $\beta = 1.3821$ for the reserve air conditioner. So, for the main air conditioner $\lambda(t) = 3t^{0.5021}$ and for the reserve air conditioner $\lambda^*(t) = 10t^{0.3821}$.
The repair rates for the main and reserve air conditioners are the same, $\mu_m = \mu_m^*$, and may change from 7.7 day⁻¹ up to 6 h⁻¹, according to the maintenance contract level $m$. The repair cost $c_r^m$ also depends on the maintenance contract level. There are ten levels of maintenance contracts available on the market. They are characterized by the repair rate (MTTR⁻¹, where MTTR is the mean time to repair) and the corresponding repair cost per repair, as shown in Table 7.2. The operation cost $c_{op}$ is equal to $400 per year.
Using this method, we shall find the best maintenance contract level $m$ that provides a minimum of RAC during the system lifetime T = 10 years.
First, we build an ordinary Markov model for this system and a Markov reward model under the assumption that the failure rates are constant.
Fig. 7.10 State-transition diagram for the system with two online air conditioners and one air
conditioner in cold reserve
There are six states in the state-space diagram. In state 6 both main air condi-
tioners are online and the reserve air conditioner is available. In state 5 one of the
main air conditioners failed and was replaced by the reserve air conditioner. In
state 4 the second main air conditioner failed; only the reserve air conditioner is
online. In state 3 the reserve air conditioner failed, and only one main air condi-
tioner is online. In state 2 the reserve air conditioner failed, and two main air con-
ditioners are online. In state 1 the system suffers complete failure.
According to the technical requirements, two online air conditioners are needed, and so there are three acceptable states (states 2, 5, and 6) and three unacceptable states (states 1, 3, and 4). Any entrance into the set of unacceptable states is associated with a penalty cost $c_p$ equal to $1000 for each entrance.
Transitions from state 4 to state 5 and from state 1 to state 3 are associated with the repair of one of the main air conditioners and have an intensity of $2\mu$. Transitions from state 5 to state 6 and from state 3 to state 2 are associated with the repair of the main air conditioner and have an intensity of $\mu$. Transitions from state 3 to state 5, from state 2 to state 6, and from state 1 to state 4 are associated with the repair of the reserve air conditioner and have an intensity of $\mu^*$.
Thus, the transition intensity matrix for MSS with two online air conditioners
and one air conditioner in cold reserve is as follows:
$$\mathbf{a} = \begin{pmatrix}
-(2\mu+\mu^*) & 0 & 2\mu & \mu^* & 0 & 0 \\
0 & -(2\lambda^n+\mu^*) & 2\lambda^n & 0 & 0 & \mu^* \\
\lambda^n & \mu & -(\lambda^n+\mu+\mu^*) & 0 & \mu^* & 0 \\
\lambda^{*n} & 0 & 0 & -(\lambda^{*n}+2\mu) & 2\mu & 0 \\
0 & 0 & \lambda^{*n} & \lambda^n & -(\lambda^n+\lambda^{*n}+\mu) & \mu \\
0 & 0 & 0 & 0 & 2\lambda^n & -2\lambda^n
\end{pmatrix}. \qquad (7.40)$$
$$\begin{aligned}
\frac{dp_1(t)}{dt} &= \lambda^n p_3(t) + \lambda^{*n} p_4(t) - (2\mu + \mu^*)\, p_1(t),\\
\frac{dp_2(t)}{dt} &= \mu\, p_3(t) - (2\lambda^n + \mu^*)\, p_2(t),\\
\frac{dp_3(t)}{dt} &= \lambda^{*n} p_5(t) + 2\lambda^n p_2(t) + 2\mu\, p_1(t) - (\lambda^n + \mu + \mu^*)\, p_3(t),\\
\frac{dp_4(t)}{dt} &= \lambda^n p_5(t) + \mu^* p_1(t) - (\lambda^{*n} + 2\mu)\, p_4(t),\\
\frac{dp_5(t)}{dt} &= 2\lambda^n p_6(t) + 2\mu\, p_4(t) + \mu^* p_3(t) - (\lambda^n + \lambda^{*n} + \mu)\, p_5(t),\\
\frac{dp_6(t)}{dt} &= \mu\, p_5(t) + \mu^* p_2(t) - 2\lambda^n p_6(t).
\end{aligned} \qquad (7.41)$$
The system of differential equations (7.41) defines the ordinary Markov model
for the air conditioning system under the assumption that all failure rates are con-
stant.
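As a rough numerical sketch of this step, one can build the transition intensity matrix from the transitions described above and integrate the Chapman–Kolmogorov equations with a simple forward Euler scheme. The function names and all numeric rates below are illustrative assumptions, not the data of Table 7.2; a production implementation would use a stiff ODE solver.

```python
# Sketch: integrate dp/dt = p * a for the six-state model by forward Euler,
# with failure rates held constant inside a time interval. Rates (per year)
# are illustrative placeholders only.

def intensity_matrix(lam, lam_r, mu, mu_r):
    """Build matrix a (states 1..6 -> indices 0..5) from the transitions
    described in the text; lam/lam_r: main/reserve failure rates,
    mu/mu_r: main/reserve repair rates."""
    a = [[0.0] * 6 for _ in range(6)]
    a[0][2], a[0][3] = 2 * mu, mu_r             # 1->3, 1->4 (repairs)
    a[1][2], a[1][5] = 2 * lam, mu_r            # 2->3 (failure), 2->6 (repair)
    a[2][0], a[2][1], a[2][4] = lam, mu, mu_r   # 3->1, 3->2, 3->5
    a[3][0], a[3][4] = lam_r, 2 * mu            # 4->1, 4->5
    a[4][2], a[4][3], a[4][5] = lam_r, lam, mu  # 5->3, 5->4, 5->6
    a[5][4] = 2 * lam                           # 6->5 (one main fails)
    for i in range(6):
        a[i][i] = -sum(a[i])                    # diagonal closes each row
    return a

def state_probabilities(a, p0, t_end, steps=5000):
    """Forward Euler: p(t+h) = p(t) + h * (p(t) @ a)."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        p = [p[i] + h * sum(p[k] * a[k][i] for k in range(6))
             for i in range(6)]
    return p

a = intensity_matrix(lam=3.0, lam_r=10.0, mu=300.0, mu_r=300.0)
p = state_probabilities(a, p0=[0, 0, 0, 0, 0, 1.0], t_end=1.0)  # start in state 6
```

Because each row of the intensity matrix sums to zero, the probabilities stay normalized throughout the integration.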
In the second step, we build a Markov reward model for the given system under the assumption that the failure rates are constant. To calculate the total expected reward, the reward matrix for the system with two online air conditioners and one air conditioner in cold reserve is built in the following manner.
If the system is in state 6, 5, or 2, the costs associated with the use of two air conditioners (operation cost) must be paid during any time unit: $r_{66} = r_{55} = r_{22} = 2c_{op}$. If the system is in state 4 or 3, the reward associated with the use of only one air conditioner must be paid during any time unit: $r_{44} = r_{33} = c_{op}$. State 1 of the system is unacceptable, so there is no reward associated with this state: $r_{11} = 0$.
Transitions from state 5 to state 3 or 4, and from state 2 to state 3, are associated with the failure of one of the air conditioners, and the rewards associated with these transitions are a penalty: $r_{23} = r_{53} = r_{54} = c_p$. Transitions from state 4 or 3 to state 1 are associated with complete system failure. The rewards associated with these transitions are zero: $r_{31} = r_{41} = 0$. Transitions from state 1 to state 3 or 4, from state 2 to state 6, from state 3 to state 2 or 5, from state 4 to state 5, and from state 5 to state 6 are associated with the repair of an air conditioner, and the reward associated with each of these transitions is the mean cost of repair: $r_{13} = r_{14} = r_{26} = r_{32} = r_{35} = r_{45} = r_{56} = c_r^m$.
The reward matrix for the system with two online air conditioners and one air conditioner in cold reserve is as follows:

$$\mathbf{r} = \left[ r_{ij} \right] = \begin{pmatrix}
0 & 0 & c_r^m & c_r^m & 0 & 0 \\
0 & 2c_{op} & c_p & 0 & 0 & c_r^m \\
0 & c_r^m & c_{op} & 0 & c_r^m & 0 \\
0 & 0 & 0 & c_{op} & c_r^m & 0 \\
0 & 0 & c_p & c_p & 2c_{op} & c_r^m \\
0 & 0 & 0 & 0 & 0 & 2c_{op}
\end{pmatrix}. \qquad (7.42)$$
Taking into consideration transition intensity matrix (7.40), the system of dif-
ferential equations for the calculation of the total expected reward may be written
in the following manner (7.43).
The system of differential equations (7.43) defines the Markov reward model
for the air conditioning system under the assumption that all failure rates are con-
stant.
$$\begin{aligned}
\frac{dV_1^n(t)}{dt} &= c_r^m(2\mu + \mu^*) - (2\mu + \mu^*)V_1^n(t) + 2\mu V_3^n(t) + \mu^* V_4^n(t),\\
\frac{dV_2^n(t)}{dt} &= 2c_{op} + 2c_p\lambda^n + c_r^m\mu^* - (2\lambda^n + \mu^*)V_2^n(t) + 2\lambda^n V_3^n(t) + \mu^* V_6^n(t),\\
\frac{dV_3^n(t)}{dt} &= c_{op} + c_r^m(\mu + \mu^*) + \lambda^n V_1^n(t) + \mu V_2^n(t) - (\lambda^n + \mu + \mu^*)V_3^n(t) + \mu^* V_5^n(t),\\
\frac{dV_4^n(t)}{dt} &= c_{op} + 2c_r^m\mu + \lambda^{*n} V_1^n(t) - (\lambda^{*n} + 2\mu)V_4^n(t) + 2\mu V_5^n(t),\\
\frac{dV_5^n(t)}{dt} &= 2c_{op} + c_p(\lambda^n + \lambda^{*n}) + c_r^m\mu + \lambda^{*n} V_3^n(t) + \lambda^n V_4^n(t) - (\lambda^n + \lambda^{*n} + \mu)V_5^n(t) + \mu V_6^n(t),\\
\frac{dV_6^n(t)}{dt} &= 2c_{op} + 2\lambda^n V_5^n(t) - 2\lambda^n V_6^n(t).
\end{aligned} \qquad (7.43)$$
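System (7.43) follows the general Markov reward pattern $dV_i/dt = r_{ii} + \sum_{j\ne i} a_{ij} r_{ij} + \sum_j a_{ij} V_j(t)$ with $V_i(0) = 0$. The sketch below integrates that pattern by forward Euler for a deliberately tiny two-state example; all names and numbers are hypothetical, chosen only to show how an intensity matrix and a reward matrix combine into a total expected reward.

```python
# Generic Markov reward integration (forward Euler), following the pattern
# of the reward ODEs in the text. a: intensity matrix, r: reward matrix
# (r[i][i] = reward per unit time in state i, r[i][j] = reward on i->j).

def total_expected_reward(a, r, t_end, steps=10000):
    n = len(a)
    # constant part: u_i = r_ii + sum_{j != i} a_ij * r_ij
    u = [r[i][i] + sum(a[i][j] * r[i][j] for j in range(n) if j != i)
         for i in range(n)]
    h = t_end / steps
    v = [0.0] * n
    for _ in range(steps):
        v = [v[i] + h * (u[i] + sum(a[i][j] * v[j] for j in range(n)))
             for i in range(n)]
    return v

# Toy example: state 0 = working (operation cost 400/yr), state 1 = failed;
# a repair (rate 50/yr) carries a mean repair cost of 100 per repair.
lam, mu = 2.0, 50.0
a = [[-lam, lam], [mu, -mu]]
r = [[400.0, 0.0], [100.0, 0.0]]
v = total_expected_reward(a, r, t_end=1.0)
```

Here `v[i]` is the expected cost accumulated over one year when starting from state `i`; starting from the failed state is more expensive because a repair fee is incurred almost immediately.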
Now, using the method presented in Section 7.2, the RAC can be calculated for any maintenance contract level m = 1, 2, ..., 10 (Table 7.2) by performing the following steps:
1. Define the system lifetime T = 10 years and the number of time intervals N = 10.
Figure 7.11 shows the lower and upper bounds of the expected RAC for T = 10
years and N = 10 ( t = 1 year ) as a function of the MTTR.
Fig. 7.11 The lower and upper bounds and the exact value of the total expected reward (RAC) versus MTTR
The MTTR which provides the minimal expected RAC for the system is 1.2 days. Choosing a more expensive maintenance contract level, we make an additional payment to the repair team; choosing a less expensive one, we pay more in penalties because of more entrances into unacceptable states.
Decreasing the length of the interval $\Delta t$ decreases the difference between the lower and upper bounds of the expected reliability-associated cost. For example, if $\Delta t$ = 1 year, the lower and upper bounds of the expected RAC for MTTR = 1.2 days are $19,372 and $21,388, respectively, and the difference is 10.4%. If $\Delta t$ = 0.01 year, this difference is only 0.093%, and if $\Delta t$ = 0.001 year, the difference is negligible, and the value $20,324 may be accepted as the exact value of the expected RAC. In Figure 7.11 the results of the calculation for $\Delta t$ = 0.001 year are presented as functions of the MTTR. Because of the very small difference, the corresponding curves in Figure 7.11 are presented as a single curve, "Exact Value".
7.4 Optimal Corrective Maintenance Contract Planning for Aging Multi-state System
corresponding costs. The main time parameters are mean response time and mean repair time. Without loss of generality, here we will deal with only one parameter, the mean repair time $T_r^m$, where $m$ ($m = 1, 2, \dots, M$) is a possible maintenance contract level and $M$ is the number of such levels. In addition, it should be taken into account that each maintenance contract has a fixed expiration date. For example, it may be an agreement for 1 year only, after which the system owner can choose another maintenance contract from the contracts available on the market. Therefore, for the entire system lifetime $T$, a sequence of maintenance contracts $m_1, m_2, \dots, m_L$ will define the MSS maintenance, where $L$ is the number of different contract periods.
A repair cost $c_r^m$ for an individual repair action depends on the repair time, and so it corresponds to a maintenance contract level $m$. It usually ranges between the most expensive repair, where the repair should be completed within the minimal specified time $T_r^{\min}$ after failure occurrence, and the lowest cost, where the repair should be completed within the maximal specified time $T_r^{\max}$ after failure occurrence. Thus, $T_r^{\min} \le T_r^m \le T_r^{\max}$.
According to the generic multi-state model, the system or system components can have different states corresponding to various performance levels, represented by the set $g = \{g_1, \dots, g_K\}$. The set is ordered so that $g_i \ge g_{i-1}$. The failure rate is defined as the transition intensity of the system or components for any transition from state $i$ to state $j$, $i > j$ ($g_i \ge g_j$). In this section we are dealing only with minimal repairs, so after each repair the failed system is returned to its working state with the failure rate unchanged (the reliability remains "as bad as old" after repair).
The repair cost $c_r^m$ corresponding to the repair time $T_r^m$ depends on the maintenance contract level $m$. Therefore, the system total expected cost also depends on the maintenance contract level $m$ and can be designated as $E[C_{TC}^m]$, where $E$ is the expectation symbol.
The MSS availability $A_w(t)$, according to Lisnianski and Levitin (2003), is treated as the probability that the MSS at instant $t > 0$ will be in one of the acceptable states, where the system performance is greater than or equal to the required demand level $w$.
The problem is to find a maintenance contract from the sequence of maintenance contracts $m_1, m_2, \dots, m_L$ that minimizes the total expected cost accumulated during the system lifetime $T$ and provides the desired system availability level, that is, a system availability $A_w(T)$ at lifetime $T$ larger than a pre-defined value $A_{w0}(T)$.
Thus, mathematically the problem can be formulated as follows. Find

$$\min_{m_1, m_2, \dots, m_L} E\left[ C_{TC}^{m} \right], \qquad (7.44)$$

subject to

$$A_w(T) \ge A_{w0}(T). \qquad (7.45)$$
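For intuition about the size of this optimization problem: with $M$ contract levels and $L$ contract periods there are $M^L$ candidate sequences, so exhaustive search is possible only for very small cases (hence the genetic algorithm used later in this section). Below is a toy sketch on a tiny instance, in which `toy_cost` and `toy_availability` are artificial stand-ins for the Markov-reward evaluations of $E[C_{TC}^m]$ and $A_w(T)$.

```python
# Brute-force illustration of problem (7.44)-(7.45) on a tiny instance.
from itertools import product

M_LEVELS, L_PERIODS = 3, 4

def toy_cost(seq):
    # more reliable (higher-level) contracts cost more per period
    return sum(10 + 2 * m for m in seq)

def toy_availability(seq):
    # availability drops as levels fall below the top level M_LEVELS - 1
    return 1.0 - 0.02 * sum(M_LEVELS - 1 - m for m in seq)

def best_sequence(a_req):
    feasible = [s for s in product(range(M_LEVELS), repeat=L_PERIODS)
                if toy_availability(s) >= a_req]
    return min(feasible, key=toy_cost)

best = best_sequence(a_req=0.97)
```

With these toy functions, the availability constraint forces nearly all periods to the top contract level, and the optimizer saves money by downgrading exactly one period.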
The ordinary Markov model for MSSs was built under the assumption that time
to failure and time to repair are distributed exponentially and there is no aging in
the system (failure rate function is constant in the lifetime), which cannot be di-
rectly used in the availability estimation for aging MSSs. However, using the
technique proposed in Section 7.2, the failure rates can be assumed to be constant
values for a specific time interval but vary with different time intervals. Therefore,
an ordinary Markov model can be used iteratively in different time intervals to
calculate the corresponding system availability.
The suggested algorithm for the calculation of the total expected cost and avail-
ability for any maintenance contract level m includes the following steps:
1. Set the system lifetime T years and number of time intervals N.
2. Calculate the length of each time interval, $\Delta t = T/N$.
3. Calculate $\lambda^{n-}$ and $\lambda^{n+}$ for every interval $n$, $n = 1, \dots, N$, according to formulas (7.12) and (7.13).
4. Calculate the state probabilities $P_j^{n-}$ and $P_j^{n+}$, $j = 1, 2, \dots, K$, at the end of each time interval $n$ as described in Section 7.2.2.
5. Calculate the lower and upper bounds for the MSS availability during the system lifetime as described in Section 7.2.2. The MSS availability lies within these bounds.
6. Calculate the lower and upper bounds of the rewards $V_i^{n-}$ and $V_i^{n+}$ accumulated during each time interval $n$ as described in Section 7.2.3.
7. Calculate the lower and upper bounds for the expected rewards $V^{n-}$ and $V^{n+}$ for each time interval $n$ via expressions (7.34) and (7.35) as the weighted sums.
8. Calculate the lower and upper bounds for the total expected reward accumulated during the system lifetime via formulas (7.36)–(7.38). The system total expected cost lies within these bounds.
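Steps 2 and 3 above are easy to make concrete for a non-decreasing failure rate function: on each interval the lower bound is the value at the left endpoint and the upper bound the value at the right endpoint (this is the role formulas (7.12)–(7.13) play; their exact definitions may differ). The helper `interval_bounds` below is a hypothetical name, and the Weibull-type rate of the Section 7.3 example is reused purely as an illustration.

```python
# Sketch of steps 2-3: piecewise-constant lower/upper bounds of an
# increasing failure rate function on the N intervals of [0, T].

def interval_bounds(lam_t, T, N):
    dt = T / N                                     # step 2: interval length
    lam_minus = [lam_t(dt * (n - 1)) for n in range(1, N + 1)]  # left ends
    lam_plus = [lam_t(dt * n) for n in range(1, N + 1)]         # right ends
    return lam_minus, lam_plus

# Weibull-type rate of the main air conditioner from Section 7.3:
lam = lambda t: 3.0 * t ** 0.5021
lo, hi = interval_bounds(lam, T=10.0, N=10)
```

Each pair `(lo[n], hi[n])` then feeds steps 4–8 twice, once per bound, producing the bracketing values for availability and total expected cost.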
Consider an air conditioning system used in hospitals that consists of two inde-
pendent air conditioners. Each air conditioner has three states: a perfectly func-
tioning state, a deteriorating state, and a complete failure state. We consider two
types of MSS failures in the model: major failure and minor failure. A major fail-
ure causes the air conditioner to transition from the perfectly functioning state to
the complete failure state, while a minor failure causes the air conditioner to tran-
sition from the perfectly functioning state to a deteriorating state or from the dete-
riorating state to the complete failure state. A major repair returns the air condi-
tioner from the complete failure state to the perfectly functioning state, while a
minor repair returns the air conditioner from the deteriorating state to the perfectly
functioning state. The state-space diagrams for the first and second air condition-
ers are shown in Figures 7.13 and 7.14, respectively.
Fig. 7.13 State-transition diagram for Fig. 7.14 State-transition diagram for condi-
conditioner 1 tioner 2
The cooling capacities (performance levels) of the first air conditioner are $g_3^1 = 233$ kW, $g_2^1 = 150$ kW, and $g_1^1 = 0$ for states 3, 2, and 1, respectively. The cooling capacities of the second air conditioner are $g_3^2 = 220$ kW, $g_2^2 = 130$ kW, and $g_1^2 = 0$ for states 3, 2, and 1, respectively.
The state-space diagram of the system is presented in Figure 7.15. All the sys-
tem states are generated as combinations of all possible states of air conditioners.
There are nine system states: state 9 is a perfectly functioning state, state 1 is a to-
tal failure state, and other states are deteriorating states. The cooling capacities of
the system states are $g_3^1 + g_3^2 = 453$ kW, $g_2^1 + g_3^2 = 370$ kW, $g_3^1 + g_2^2 = 363$ kW, $g_2^1 + g_2^2 = 280$ kW, $g_3^1 + g_1^2 = 233$ kW, $g_1^1 + g_3^2 = 220$ kW, $g_2^1 + g_1^2 = 150$ kW, $g_1^1 + g_2^2 = 130$ kW, and $g_1^1 + g_1^2 = 0$.
The increasing failure rates of the system are described as linear functions: $\lambda_{g_3^1, g_1^1}(t) = 10 + 0.4t$, $\lambda_{g_2^1, g_1^1}(t) = 10 + 0.2t$, $\lambda_{g_3^1, g_2^1}(t) = 12 + 0.6t$, …
There are eight repair contracts available on the market. Each contract is characterized by repair rates (ranging from $1/T_r^{\max}$ to $1/T_r^{\min}$) and repair costs for different kinds of failures, as shown in Table 7.4. The system owner may select a repair contract for each year.
The system cooling load $w$ is 300 kW. The operational cost is $C_{op} = \$0.06$/kWh. In states 1–6, the system cooling capacity is lower than the demand; these states constitute the set of unacceptable states. For each entrance into the set of unacceptable states, a penalty cost of $C_p = \$500$ must be paid. States 7–9 constitute the set of acceptable states.
The system availability is $A_w(t) = p_7(t) + p_8(t) + p_9(t)$. The problem is to find the optimal sequence of repair contracts for each year that minimizes the system total expected cost accumulated during a lifetime of T = 10 years and provides the required availability $A_{w0}(T) = 0.97$.
For a specific time interval $n$, the lower ($\lambda^{n-}$) or upper ($\lambda^{n+}$) bounds of $\lambda(t)$ are used to represent the failure rates in Equations 7.15 and 7.16. By solving these systems of differential equations, we can determine the state probabilities $P_j^{n-} = p_j^{n-}(\Delta t\, n)$ and $P_j^{n+} = p_j^{n+}(\Delta t\, n)$ at the end of each time interval $t_n = [\Delta t(n-1), \Delta t\, n]$, $1 \le n \le N$. The probability $P_j^{n-}$ is the probability of state $j$ at the end of time interval $n$ if the failure rates during this interval are constant and equal to the lower bounds $\lambda^{n-}$ of $\lambda(t)$:
$$\begin{aligned}
\frac{dp_1^n(t)}{dt} ={}& -\big(\mu_{g_1^1,g_3^1} + \mu_{g_1^2,g_3^2}\big)\, p_1^n(t) + \lambda^n_{g_2^2,g_1^2}\, p_2^n(t) + \lambda^n_{g_2^1,g_1^1}\, p_3^n(t) + \lambda^n_{g_3^2,g_1^2}\, p_4^n(t) + \lambda^n_{g_3^1,g_1^1}\, p_5^n(t),\\
\frac{dp_2^n(t)}{dt} ={}& -\big(\mu_{g_2^2,g_3^2} + \mu_{g_1^1,g_3^1} + \lambda^n_{g_2^2,g_1^2}\big)\, p_2^n(t) + \lambda^n_{g_3^2,g_2^2}\, p_4^n(t) + \lambda^n_{g_2^1,g_1^1}\, p_6^n(t) + \lambda^n_{g_3^1,g_1^1}\, p_7^n(t),\\
\frac{dp_3^n(t)}{dt} ={}& -\big(\mu_{g_2^1,g_3^1} + \mu_{g_1^2,g_3^2} + \lambda^n_{g_2^1,g_1^1}\big)\, p_3^n(t) + \lambda^n_{g_3^1,g_2^1}\, p_5^n(t) + \lambda^n_{g_2^2,g_1^2}\, p_6^n(t) + \lambda^n_{g_3^2,g_1^2}\, p_8^n(t),\\
\frac{dp_4^n(t)}{dt} ={}& \mu_{g_1^2,g_3^2}\, p_1^n(t) + \mu_{g_2^2,g_3^2}\, p_2^n(t) - \big(\lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^2,g_1^2} + \mu_{g_1^1,g_3^1}\big)\, p_4^n(t) + \lambda^n_{g_2^1,g_1^1}\, p_8^n(t) + \lambda^n_{g_3^1,g_1^1}\, p_9^n(t),\\
\frac{dp_5^n(t)}{dt} ={}& \mu_{g_1^1,g_3^1}\, p_1^n(t) + \mu_{g_2^1,g_3^1}\, p_3^n(t) - \big(\lambda^n_{g_3^1,g_2^1} + \lambda^n_{g_3^1,g_1^1} + \mu_{g_1^2,g_3^2}\big)\, p_5^n(t) + \lambda^n_{g_2^2,g_1^2}\, p_7^n(t) + \lambda^n_{g_3^2,g_1^2}\, p_9^n(t),\\
\frac{dp_6^n(t)}{dt} ={}& -\big(\mu_{g_2^1,g_3^1} + \mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_2^1,g_1^1}\big)\, p_6^n(t) + \lambda^n_{g_3^1,g_2^1}\, p_7^n(t) + \lambda^n_{g_3^2,g_2^2}\, p_8^n(t),\\
\frac{dp_7^n(t)}{dt} ={}& \mu_{g_1^1,g_3^1}\, p_2^n(t) + \mu_{g_2^1,g_3^1}\, p_6^n(t) - \big(\mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^1,g_2^1}\big)\, p_7^n(t) + \lambda^n_{g_3^2,g_2^2}\, p_9^n(t),\\
\frac{dp_8^n(t)}{dt} ={}& \mu_{g_1^2,g_3^2}\, p_3^n(t) + \mu_{g_2^2,g_3^2}\, p_6^n(t) - \big(\mu_{g_2^1,g_3^1} + \lambda^n_{g_2^1,g_1^1} + \lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^2,g_2^2}\big)\, p_8^n(t) + \lambda^n_{g_3^1,g_2^1}\, p_9^n(t),\\
\frac{dp_9^n(t)}{dt} ={}& \mu_{g_1^1,g_3^1}\, p_4^n(t) + \mu_{g_1^2,g_3^2}\, p_5^n(t) + \mu_{g_2^2,g_3^2}\, p_7^n(t) + \mu_{g_2^1,g_3^1}\, p_8^n(t) - \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^1,g_2^1}\big)\, p_9^n(t).
\end{aligned}$$
A similar system of differential equations for the calculation of the state probabilities $P_j^{n+}$ can be obtained if the upper bounds $\lambda^{n+}$ of $\lambda(t)$ are used to represent the failure rates of the system.
For a specific time interval $n$, the lower ($\lambda^{n-}$) or upper ($\lambda^{n+}$) bounds of $\lambda(t)$ are used to represent the failure rates in Equations 7.26 and 7.27. By solving these systems of differential equations, we can determine the rewards $V_i^{n-}$ and $V_i^{n+}$ accumulated during each time interval:
$$\begin{aligned}
\frac{dV_1^n(t)}{dt} ={}& 0 \cdot c_{op} + \mu_{g_1^1,g_3^1} c^{r,m}_{g_1^1,g_3^1} + \mu_{g_1^2,g_3^2} c^{r,m}_{g_1^2,g_3^2} - \big(\mu_{g_1^1,g_3^1} + \mu_{g_1^2,g_3^2}\big) V_1^n(t) + \mu_{g_1^2,g_3^2} V_4^n(t) + \mu_{g_1^1,g_3^1} V_5^n(t),\\
\frac{dV_2^n(t)}{dt} ={}& 130 c_{op} + \mu_{g_2^2,g_3^2} c^{r,m}_{g_2^2,g_3^2} + \mu_{g_1^1,g_3^1} c^{r,m}_{g_1^1,g_3^1} + \lambda^n_{g_2^2,g_1^2} V_1^n(t) - \big(\lambda^n_{g_2^2,g_1^2} + \mu_{g_2^2,g_3^2} + \mu_{g_1^1,g_3^1}\big) V_2^n(t) + \mu_{g_2^2,g_3^2} V_4^n(t) + \mu_{g_1^1,g_3^1} V_7^n(t),\\
\frac{dV_3^n(t)}{dt} ={}& 150 c_{op} + \mu_{g_2^1,g_3^1} c^{r,m}_{g_2^1,g_3^1} + \mu_{g_1^2,g_3^2} c^{r,m}_{g_1^2,g_3^2} + \lambda^n_{g_2^1,g_1^1} V_1^n(t) - \big(\mu_{g_2^1,g_3^1} + \mu_{g_1^2,g_3^2} + \lambda^n_{g_2^1,g_1^1}\big) V_3^n(t) + \mu_{g_2^1,g_3^1} V_5^n(t) + \mu_{g_1^2,g_3^2} V_8^n(t),\\
\frac{dV_4^n(t)}{dt} ={}& 220 c_{op} + \mu_{g_1^1,g_3^1} c^{r,m}_{g_1^1,g_3^1} + \lambda^n_{g_3^2,g_1^2} V_1^n(t) + \lambda^n_{g_3^2,g_2^2} V_2^n(t) - \big(\mu_{g_1^1,g_3^1} + \lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^2,g_1^2}\big) V_4^n(t) + \mu_{g_1^1,g_3^1} V_9^n(t),\\
\frac{dV_5^n(t)}{dt} ={}& 233 c_{op} + \mu_{g_1^2,g_3^2} c^{r,m}_{g_1^2,g_3^2} + \lambda^n_{g_3^1,g_1^1} V_1^n(t) + \lambda^n_{g_3^1,g_2^1} V_3^n(t) - \big(\mu_{g_1^2,g_3^2} + \lambda^n_{g_3^1,g_2^1} + \lambda^n_{g_3^1,g_1^1}\big) V_5^n(t) + \mu_{g_1^2,g_3^2} V_9^n(t),\\
\frac{dV_6^n(t)}{dt} ={}& 280 c_{op} + \mu_{g_2^1,g_3^1} c^{r,m}_{g_2^1,g_3^1} + \mu_{g_2^2,g_3^2} c^{r,m}_{g_2^2,g_3^2} + \lambda^n_{g_2^1,g_1^1} V_2^n(t) + \lambda^n_{g_2^2,g_1^2} V_3^n(t) - \big(\mu_{g_2^1,g_3^1} + \mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_2^1,g_1^1}\big) V_6^n(t) + \mu_{g_2^1,g_3^1} V_7^n(t) + \mu_{g_2^2,g_3^2} V_8^n(t),\\
\frac{dV_7^n(t)}{dt} ={}& 300 c_{op} + \big(\lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^1,g_2^1}\big) c_p + \mu_{g_2^2,g_3^2} c^{r,m}_{g_2^2,g_3^2} + \lambda^n_{g_3^1,g_1^1} V_2^n(t) + \lambda^n_{g_2^2,g_1^2} V_5^n(t) + \lambda^n_{g_3^1,g_2^1} V_6^n(t)\\
& - \big(\mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^1,g_2^1}\big) V_7^n(t) + \mu_{g_2^2,g_3^2} V_9^n(t),\\
\frac{dV_8^n(t)}{dt} ={}& 300 c_{op} + \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_2^1,g_1^1} + \lambda^n_{g_3^2,g_2^2}\big) c_p + \mu_{g_2^1,g_3^1} c^{r,m}_{g_2^1,g_3^1} + \lambda^n_{g_3^2,g_1^2} V_3^n(t) + \lambda^n_{g_2^1,g_1^1} V_4^n(t) + \lambda^n_{g_3^2,g_2^2} V_6^n(t)\\
& - \big(\mu_{g_2^1,g_3^1} + \lambda^n_{g_2^1,g_1^1} + \lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^2,g_2^2}\big) V_8^n(t) + \mu_{g_2^1,g_3^1} V_9^n(t),\\
\frac{dV_9^n(t)}{dt} ={}& 300 c_{op} + \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^1,g_1^1}\big) c_p + \lambda^n_{g_3^1,g_1^1} V_4^n(t) + \lambda^n_{g_3^2,g_1^2} V_5^n(t) + \lambda^n_{g_3^2,g_2^2} V_7^n(t) + \lambda^n_{g_3^1,g_2^1} V_8^n(t)\\
& - \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^1,g_2^1}\big) V_9^n(t).
\end{aligned}$$
A similar system of differential equations for the rewards $V_i^{n+}$ can also be obtained if the upper bounds $\lambda^{n+}$ of $\lambda(t)$ are used to represent the failure rates of the system.
Ten years have been divided into 120 intervals, each interval being 1 month long. The failure rate has been approximated by its lower and upper bounds on each of these intervals. The proposed GA has been used to determine the optimal maintenance contract schedule. The stopping criterion requires that at least 120 genetic cycles be performed and that there be at least five consecutive genetic cycles without improvement of the solution performance. The population size in the GA is 40.
The convergence characteristics of the proposed GA using the lower bound ap-
proximation of failure rates are shown in Figure 7.16.
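The GA itself is not listed in this section; the toy sketch below only shows the general shape of such a search: integer chromosomes (one contract level per period), truncation selection, one-point crossover, and random mutation. The cost function is a deliberately artificial stand-in for the Markov-reward evaluation, and all GA parameters besides the population size of 40 and the 120 cycles are invented.

```python
# Toy GA sketch for the contract-scheduling idea.
import random

L_PERIODS, M_LEVELS = 10, 8

def toy_cost(chromosome):
    # cheap contracts (higher index) save fees but raise penalty-like costs
    return sum((m + 1) * 10 + (M_LEVELS - m) * 12 for m in chromosome)

def evolve(pop_size=40, cycles=120, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randrange(M_LEVELS) for _ in range(L_PERIODS)]
           for _ in range(pop_size)]
    for _ in range(cycles):
        pop.sort(key=toy_cost)                  # best (lowest cost) first
        survivors = pop[: pop_size // 2]        # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, L_PERIODS)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:              # mutation
                child[rng.randrange(L_PERIODS)] = rng.randrange(M_LEVELS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=toy_cost)

best = evolve()
```

In the method of this section, `toy_cost` would be replaced by the lower-bound (or upper-bound) total expected cost computed from the Markov reward model, with an availability constraint handled by a penalty term.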
Table 7.5 Lower and upper bounds of system total expected cost
Coit and Smith 1996). The failure rate function obtained from experts can be represented in tabular form (van Noortwijk et al. 1992). For an MR whose duration is relatively small compared to the time between failures, the expected number of failures is equal to the expected number of repairs in any time interval. Thus, it is possible to obtain the renewal function of each element, that is, the expected number of repairs in the time interval [0, t). This expected number of element failures/repairs $N(t_i)$ can be estimated for the different time intervals $[0, t_i)$ between consecutive PRs.
In this section, we consider the determination of the optimal schedule of cyclic
PRs for MSS with a given series-parallel configuration and two-state elements.
Each element of this system is characterized by its nominal performance and re-
newal function, obtained from experimental data or elicited from expert opinion.
The times and costs of the two types of maintenance activity (PR and MR) are
also available for each system element. The objective is to provide the desired sys-
tem availability at a minimal total maintenance cost and penalty costs caused by
system mission losses (performance deficiency).
The presented method presumes independence between replacement and repair
activities for different system elements. Such an assumption is justified, for exam-
ple, in complex distributed systems (power systems, computer networks, etc.)
where the information about system element repairs and replacements may be in-
accessible for the maintenance staff servicing the given element. In the general
case, the method, which assumes independence of maintenance actions in the sys-
tem, gives the worst estimation of system availability.
Another important assumption is that repair and replacement times are much
smaller than time between failures. In this case, the probability of replacement and
repair event coincidences may be neglected.
In systems with cyclic variable demand (double-shift job-shop production,
power or water supply, etc.), the PR can be performed in periods of low demand
even if the repairs of some of the system elements are not finished. For example,
in power generation systems some important elements may be replaced at night
when the power demand is much lower than the nominal demand. In these cases,
the replacement time may be neglected and all the maintenance actions may be
considered as independent.
system life cycle is $(x_j + 1)\, N_j\big(T/(x_j+1)\big)\, c_j$.

Under the formulated assumptions, the expected time that the jth system element is unavailable can be estimated by the following expression:

$$(x_j + 1)\, N_j\big(T/(x_j+1)\big)\, \tau_{cj} + x_j \tau_{pj}, \qquad (7.50)$$

so that the stationary availability of the jth element is

$$p_j = \frac{T - (x_j + 1)\, N_j\big(T/(x_j+1)\big)\, \tau_{cj} - x_j \tau_{pj}}{T}, \qquad (7.51)$$

the total expected maintenance time $\tau_{tot}$ during the system life cycle is

$$\tau_{tot} = \sum_{j=1}^{n} \left[ (x_j + 1)\, N_j\big(T/(x_j+1)\big)\, \tau_{cj} + x_j \tau_{pj} \right], \qquad (7.52)$$

and the expected maintenance cost $C_m$ during the system life cycle is

$$C_m = \sum_{j=1}^{n} \left[ (x_j + 1)\, N_j\big(T/(x_j+1)\big)\, c_{cj} + x_j c_{pj} \right], \qquad (7.53)$$
where ccj and cpj are corrective and preventive maintenance costs, respectively.
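Expressions (7.50)–(7.53) can be sketched directly in code for a single element. The renewal function `N_lin` and every numeric input below are illustrative placeholders; in the method, each $N_j$ comes from experimental data or expert elicitation.

```python
# Direct sketch of expressions (7.50)-(7.53) for one element with x
# preventive replacements over life cycle T.

def element_unavailable_time(x, N, T, tau_c, tau_p):
    """Eq. 7.50: expected time element j is unavailable."""
    return (x + 1) * N(T / (x + 1)) * tau_c + x * tau_p

def element_availability(x, N, T, tau_c, tau_p):
    """Eq. 7.51: p_j = (T - unavailable time) / T."""
    return (T - element_unavailable_time(x, N, T, tau_c, tau_p)) / T

def element_maintenance_cost(x, N, T, c_c, c_p):
    """Summand of Eq. 7.53 for one element."""
    return (x + 1) * N(T / (x + 1)) * c_c + x * c_p

N_lin = lambda t: 0.1 * t   # toy renewal function: 0.1 expected repairs/month
u = element_unavailable_time(x=5, N=N_lin, T=120, tau_c=0.4, tau_p=0.0007)
c = element_maintenance_cost(x=5, N=N_lin, T=120, c_c=0.02, c_p=3.0)
```

Summing the per-element time and cost over all elements reproduces $\tau_{tot}$ of (7.52) and $C_m$ of (7.53).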
Having the steady-state performance distribution of each system element $j$, $g_j = \{0, g_j\}$, $p_j = \{1 - p_j,\; p_j\}$, one can obtain the entire system steady-state output performance distribution using the UGF method (Chapter 4), and for the given steady-state demand distribution $\mathbf{w}, \mathbf{q}$ one can obtain the system steady-state reliability indices: the availability $A$ and the expected performance deficiency $D$.
The total unsupplied demand cost during the system life cycle T can be esti-
mated as
$$C_{ud} = T\, c_u\, D. \qquad (7.54)$$
subject to

$$A(x^*) \ge A', \qquad \tau_{tot}(x^*) \le \tau'. \qquad (7.56)$$
Formulation 2 Find the system replacement policy x* that minimizes the total
maintenance and unsupplied demand cost while the total maintenance time does
not exceed a prespecified limitation:
subject to

$$A(x^*) \ge A', \qquad \tau_{tot}(x^*) \le \tau'. \qquad (7.60)$$
Different elements can have different possible numbers of PR actions during the system lifetime. The possible maintenance alternatives (numbers of PR actions) for each system element $j$ can be ordered in a vector $Y_j = \{y_{j1}, \dots, y_{jK}\}$, where $y_{ji}$ is the number of preventive maintenance actions corresponding to alternative $i$ for system element $j$. The same number $K$ of possible alternatives (the length of the vectors $Y_j$) can be defined for each element. If, in practical problems, the number of alternatives differs among elements, some entries of the shorter vectors $Y_j$ can be duplicated to equalize the vectors' lengths.
Each solution is represented by an integer string $a = \{a_1, \dots, a_n\}$, where $a_j$ ($1 \le a_j \le K$) represents the number of the maintenance alternative applied to element $j$. Hence, the vector $x$ for the given solution, represented by string $a$, is $x = \{y_{1 a_1}, \dots, y_{n a_n}\}$. For example, for a problem with $n = 5$, $K = 4$, $Y_1 = Y_2 = Y_3 = \{2, 3, 4, 5\}$, and $Y_4 = Y_5 = \{20, 45, 100, 100\}$, the string $a = \{1, 4, 4, 3, 2\}$ represents a solution with $x = \{2, 5, 5, 100, 45\}$. Any arbitrary integer string with elements belonging to the interval $[1, K]$ represents a feasible solution.
For each given string $a$, the decoding procedure first obtains the vector $x$ and estimates $N(x_j)$ for all the system elements $1 \le j \le n$; then it calculates the availability indices of each two-state system element using expression (7.51) and determines the entire system steady-state output performance distribution using the UGF method, in accordance with the specified system structure and the given steady-state performance distributions of the elements. It also determines $\tau_{tot}$ and $C_m$ using expressions (7.52) and (7.53). After obtaining the entire system steady-state output performance distribution, the procedure evaluates $A$ and $C_{ud}$ using expressions (4.29), (1.21), (4.34), (1.31), and (7.54).
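The first part of the decoding step can be sketched with the text's own small example (n = 5, K = 4); `decode` is a hypothetical helper name for the procedure described above.

```python
# Decoding an integer string a into the replacement vector x.

def decode(a, Y):
    """Map string a = (a_1,...,a_n), 1 <= a_j <= K, to x = (y_{1,a_1},...)."""
    return [Y[j][a_j - 1] for j, a_j in enumerate(a)]

Y = [[2, 3, 4, 5], [2, 3, 4, 5], [2, 3, 4, 5],
     [20, 45, 100, 100], [20, 45, 100, 100]]
x = decode([1, 4, 4, 3, 2], Y)   # -> [2, 5, 5, 100, 45], as in the text
```

The rest of the procedure (availability via (7.51), UGF composition, and cost evaluation) then operates on this vector `x`.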
In order to let the GA look for the solution with the minimal total cost, with A not less than the required value A′ and τtot not exceeding τ′, the solution fitness is evaluated as follows:
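The exact fitness expression (7.60) is defined in the book; a common penalization scheme in the same spirit can be sketched as follows. M, P, and all names here are illustrative assumptions, not the book's exact form.

```python
# Hedged sketch of a penalized fitness for the constrained GA search
# described above; the exact form of (7.60) may differ. M and P are
# illustrative penalty coefficients.

def fitness(C_total, A, tau_tot, A_req, tau_req, M=5000.0, P=2000.0):
    """Total cost plus penalties for violating the availability and
    total-maintenance-time constraints; the GA minimizes this value."""
    penalty = M * max(0.0, A_req - A) + P * max(0.0, tau_tot - tau_req)
    return C_total + penalty

# A solution satisfying both constraints pays no penalty:
print(fitness(263.1, 0.9606, 9.2, A_req=0.96, tau_req=10.0))  # 263.1
```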
All the replacement times in the system considered are equal to 0.5 h (0.0007
month). The corrective maintenance includes fault location search and turning of
the elements, so it takes much more time than preventive replacement, but repairs
are much cheaper than replacements.
316 7 Aging Multi-state Systems
Element  g            cp     cc     c (for x = 5, 10, 15, 20, 25, 30; t = 24, 12, 8, 6, 4.8, 4)
1        0.40  3.01  0.019  0.002   25, 10.0, 5.0, 2.0, 1.00, 0.50
2        0.30  2.21  0.049  0.004   26, 9.0, 2.0, 0.6, 0.20, 0.05
3        0.60  2.85  0.023  0.008   20, 4.0, 1.0, 0.3, 0.08, 0.01
4        0.15  2.08  0.017  0.005   36, 14.0, 9.0, 6.0, 4.00, 3.00
5        0.15  1.91  0.029  0.003   55, 15.0, 7.0, 4.0, 0.32, 0.30
6        0.25  0.95  0.031  0.009   31, 9.5, 5.6, 4.0, 2.70, 2.00
7        1.00  5.27  0.050  0.002   13, 3.2, 1.4, 0.8, 0.50, 0.10
8        0.70  4.41  0.072  0.005   5, 2.0, 1.0, 0.4, 0.10, 0.01
The demand distribution is presented in Table 7.7. The total life cycle T is 120
months and the cost of 1% of unsupplied demand for 1 month is cu = 10 conven-
tional units.
For the sake of simplicity, we use in this example the same vector of replacement frequency alternatives for all the elements. The possible number of replacements during the system life cycle varies from 5 to 30 with step 5. The chosen parameters of the fitness function (7.60) are M = 5000 and 2000. The solutions for the first formulation of the problem, in which unsupplied demand cost is not considered, were obtained first. (Three different solutions are presented in Table 7.8.) The total maintenance time and cost as functions of system availability are shown in Figures 7.18 and 7.19. Note that each point of the graphs corresponds to an optimal solution.
Then the unsupplied demand cost was introduced and the problem was solved
in its second formulation. The solutions corresponding to the minimal and maxi-
mal possible system availability (minimal and maximal maintenance cost) are pre-
sented in Table 7.8, as is the optimal solution, which minimizes the total cost. One
can see that the optimal maintenance solution allows about 50% total cost reduc-
tion to be achieved in comparison with minimal Cm and minimal Cud solutions.
7.5 Optimal Preventive Replacement Policy for Aging Multi-state Systems 317
Table 7.8

Formulation 1
            Solution                                Cud    Cm      Cm+Cud  τtot   A
A′ = 0.96   {5,5,5,5,5,5,5,5,10,10,10,10,5,5}      0.0    263.1   263.1   9.2    0.9606
A′ = 0.97   {5,5,5,5,10,10,5,5,10,10,25,25,5,5}    0.0    296.6   296.6   7.7    0.9700
A′ = 0.98   {5,5,5,5,15,15,10,10,10,25,25,25,5,5}  0.0    384.4   384.4   5.85   0.9800

Formulation 2
Minimal Cm               {5,5,5,5,5,5,5,5,5,5,5,5,5,5}                1029.5  249.1   1278.6  11.61  0.9490
Minimal Cud (maximal A)  {30,30,30,30,30,30,30,30,25,25,30,30,30,30}  156.4   1060.3  1216.7  2.47   0.9880
Minimal Cm+Cud           {5,5,5,5,20,20,10,10,10,10,30,30,5,5}        256.2   397.4   653.5   6.02   0.9800

General formulation
Minimal Cm+Cud, τ′ = 3   {10,20,20,20,25,20,30,30,25,25,30,30,10,5}   181.7   674.7   856.4   2.99   0.9877
Fig. 7.18 Total maintenance cost as function of system availability

Fig. 7.19 Total maintenance time as function of system availability
Fig. 7.20 System cost (Cud+Cm, Cm, Cud) under maintenance time limitations

Fig. 7.21 Steady-state availability under maintenance time limitations
References
Almeida de AT (2001) Multicriteria decision making on maintenance: spares and contract planning. Eur J Oper Res 129:235–241
Asgharizadeh E, Murthy DNP (2000) Service contracts: a stochastic model. Math Comp Model 31:11–20
Bagdonavicius V, Nikulin M (2002) Accelerated life models. Chapman & Hall/CRC, Boca Raton, FL
Barlow R, Proschan F (1975) Statistical theory of reliability and life testing. Holt, Rinehart and Winston, New York
Coit D, Smith A (1996) Reliability optimization of series-parallel systems using genetic algorithm. IEEE Trans Reliab 45(2):254–266
Ding Y, Lisnianski A, Frenkel I et al (2009) Optimal corrective maintenance contract planning for aging multi-state system. Appl Stoch Models Bus Ind 25(5):612–631
Finkelstein M (2003) A model of aging and a shape of the observed force of mortality. Lifetime Data Anal 9:93–109
Finkelstein M (2005) On some reliability approaches to human aging. Int J Reliab Qual Saf Eng 12(4):337–346
Finkelstein M (2008) Failure rate modelling for reliability and risk. Springer, London
Gertsbakh IB (2000) Reliability theory with applications to preventive maintenance. Springer, Berlin
Gertsbakh IB, Kordonsky KB (1969) Models of failure. Springer, New York
Howard R (1960) Dynamic programming and Markov processes. MIT Press, Cambridge, MA
Jackson C, Pascual R (2008) Optimal maintenance service contract negotiation with aging equipment. Eur J Oper Res 189:387–398
Kececioglu D (1991) Reliability engineering handbook, part I and II. Prentice Hall, Englewood Cliffs, NJ
Kuo W, Prasad V (2000) An annotated overview of system-reliability optimization. IEEE Trans Reliab 49(2):176–187
Lisnianski A, Frenkel I (2009) Non-homogeneous Markov reward model for aging multi-state system under minimal repair. Int J Performab Eng 5(4):303–312
Lisnianski A, Frenkel I, Khvatskin L et al (2008) Maintenance contract assessment for aging systems. Qual Reliab Eng Int 24:519–531
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Martorell S, Sanchez A, Serradell V (1999) Age-dependent reliability model considering effects of maintenance and working conditions. Reliab Eng Sys Saf 64:19–31
Meeker W, Escobar L (1998) Statistical methods for reliability data. Wiley, New York
Monga A, Zuo M (1998) Optimal system design considering maintenance and warranty. Comp Oper Res 25:691–705
Munoz A, Martorell S, Serradell V (1997) Genetic algorithms in optimizing surveillance and maintenance of components. Reliab Eng Sys Saf 57:107–120
Murthy DNP, Asgharizadeh E (1999) Optimal decision making in a maintenance service operation. Eur J Oper Res 116:259–273
Murthy DNP, Atrens A, Eccleston JA (2002) Strategic maintenance management. J Qual Maint 8(4):287–305
Murthy DNP, Yeung V (1995) Modelling and analysis of maintenance service contracts. Math Comp Model 22:219–225
Trivedi K (2002) Probability and statistics with reliability, queuing and computer science applications. Wiley, New York
Valdez-Flores C, Feldman RM (1989) A survey of preventive maintenance models for stochastically deteriorating single-unit systems. Naval Res Logist 36:419–446
Van Noortwijk J, Dekker R, Cooke R et al (1992) Expert judgment in maintenance optimization. IEEE Trans Reliab 41:427–432
Wang H (2002) A survey of maintenance policies of deteriorating systems. Eur J Oper Res 139:469–489
Welke S, Johnson B, Aylor J (1995) Reliability modeling of hardware/software systems. IEEE Trans Reliab 44(3):413–418
Wendt H, Kahle W (2006) Statistical analysis of some parametric degradation models. In: Nikulin M, Commenges D, Huber-Carol C (eds) Probability, statistics and modelling in public health. Springer Science+Business Media, Berlin, pp 266–279
Xie M, Poh KL, Dai YS (2004) Computing system reliability: models and analysis. Kluwer/Plenum, New York
Zhang F, Jardine AKS (1998) Optimal maintenance models with minimal repair, periodic overhaul and complete renewal. IIE Trans 30:1109–1119
8 Fuzzy Multi-state System: General Definition
and Reliability Assessment
8.1 Introduction
In conventional multi-state theory, it is assumed that the exact probability and per-
formance level of each component state are given. However, with the progress in
modern industrial technologies, the product development cycle has become shorter
and shorter while the lifetime of products has become longer and longer (Huang et
al. 2006). In many highly reliable applications, there may be only a few available
observations. Therefore, it is difficult to obtain sufficient data to estimate the pre-
cise values of these probabilities and performance levels in these systems. More-
over, inaccuracy in system models that is caused by human error is difficult to
deal with solely by means of conventional reliability theory (Huang et al. 2004).
In some cases, in order to reduce the computational burden, a simplified model is used to represent a complex system and an MSS model is used to characterize a continuous-state system, which can reduce computational accuracy. New techniques and theories are needed to solve these fundamental problems.
Fuzzy set theory provides a useful tool to complement conventional reliability theories. Cai (1996), Singer (1990), Guan and Wu (2006), Misra and Weber (1990), Utkin and Gurov (1996), Chen (1994), and Cheng and Mon (1993) attempted to define and evaluate system reliabilities in terms of fuzzy set theory and techniques, i.e., "probist" reliability theory, "posbist" reliability theory, "profust" reliability theory, and fuzzy fault tree analysis. In some recent research, posbist fault tree analysis of coherent systems was discussed (Huang et al. 2004). Huang et al. (2006) proposed a Bayesian reliability analysis for fuzzy lifetime data.
There are few works focusing on reliability assessment of MSS using fuzzy set theory. Ding et al. (2008) have made an attempt at this problem. The fuzzy universal generating function (FUGF) was developed to extend the UGF with crisp sets (Ding and Lisnianski 2008), which is widely used in the reliability evaluation of conventional MSS (Lisnianski and Levitin 2003). The basic definition of a fuzzy multi-state system (FMSS) model is also given: the state probability and the state performance of each state are represented as fuzzy values.

8.2 Key Definitions and Concepts of a Fuzzy Multi-state System

In this section, key definitions and concepts of FMSS are introduced and developed. The natural extension of the crisp definition for conventional MSS to the fuzzy set definition for FMSS is that the state probabilities and state performances of a component are represented as fuzzy values.
R(A, k) = (1/2)[Rr(A, k) + Rl(A, k)].   (8.1)
The first criterion, therefore, is set as a comparison of the removals of two dif-
ferent fuzzy numbers with respect to k (Kaufmann 1988). Relative to k = 0, the
removal number R( A, k ) is equivalent to an ordinary representative of the fuzzy
number. If fuzzy number A is represented by a triplet ( a1 , a2 , a3 ) , then the ordinary
representative is given by
Â = (a1 + 2a2 + a3)/4.   (8.2)
Fig. 8.1 Removals with respect to k for a fuzzy number A
A1 = (4, 6, 7):  Â1 = (4 + 2·6 + 7)/4 = 5.75,
A2 = (4, 5, 9):  Â2 = (4 + 2·5 + 9)/4 = 5.75,
A3 = (3, 5, 10): Â3 = (3 + 2·5 + 10)/4 = 5.75,
A4 = (0, 0, 0):  Â4 = (0 + 2·0 + 0)/4 = 0.
Therefore, A4 < A1 , A2 , A3 .
Next, the second criterion (the mode) is used to order A1, A2, and A3:

A1 = (4, 6, 7):  mode = 6,
A2 = (4, 5, 9):  mode = 5,
A3 = (3, 5, 10): mode = 5.

Therefore, A1 > A2, A3.
Finally, the third criterion (the divergence) is used to order A2 and A3:

A2 = (4, 5, 9):  divergence = 9 − 4 = 5,
A3 = (3, 5, 10): divergence = 10 − 3 = 7.

Therefore, A3 > A2.
We obtain the linear order, A1 > A3 > A2 > A4 .
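The three criteria can be sketched as a lexicographic sort key: the ordinary representative (the removal at k = 0, Equation 8.2), then the mode, then the divergence. Function and variable names are illustrative.

```python
# Sketch of the three ordering criteria for triangular fuzzy numbers
# (a1, a2, a3), applied lexicographically as in the text.

def removal(t):
    a1, a2, a3 = t
    return (a1 + 2 * a2 + a3) / 4  # Equation 8.2, the ordinary representative

def rank_key(t):
    # larger removal first, then larger mode, then larger divergence
    return (removal(t), t[1], t[2] - t[0])

A1, A2, A3, A4 = (4, 6, 7), (4, 5, 9), (3, 5, 10), (0, 0, 0)
order = sorted([('A1', A1), ('A2', A2), ('A3', A3), ('A4', A4)],
               key=lambda kv: rank_key(kv[1]), reverse=True)
print([name for name, _ in order])  # ['A1', 'A3', 'A2', 'A4']
```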
Suppose the performance set of a component is represented
by G = {G0 , G1 , G2 , G3} , where G3 = A1 , G2 = A3 , G1 = A2 and G0 = A4 .
We propose the following definitions and examples that determine and illustrate important FMSS properties (Ding et al. 2008):

Definition 8.1 A FMSS is in state j or above if the system performance level is greater than or equal to kj, a predefined fuzzy or crisp value. Let φ(G) represent the system structure function, which maps the space of the components' fuzzy performance levels into the space of the system's fuzzy performance levels, and let φj represent the state of the system. Then we have

Pr{(φ(G) − kj) ≥ 0} = Pr{φj}.
The minimum requirement to ensure that the system is in state j or above is set as crisp values but represented as triplets, with kj = (0, 0, 0), (1.5, 1.5, 1.5), (2, 2, 2) for j = 0, 1, 2, respectively. Suppose that two components are both in state 1. The system performance level can be evaluated as

φ(g11, g21) = g11 + g21 = (0.65, 0.7, 0.75) + (0.75, 0.8, 0.85) = (1.4, 1.5, 1.6) = (a1, a2, a3).
As shown in Figure 8.2, for a1 > k0 the FMSS is definitely in state 0 or above;
for a3 < k2 , the FMSS is definitely not in state 2.
However, for a1 < k1 < a3 there exists the uncertainty of FMSS being in state 1.
φj(G) = 1, if φ(G) ≥ kj, with possibility Ω(φj(G) = 1);
φj(G) = 0, if φ(G) < kj, with possibility Ω(φj(G) = 0).
Some new parameters defined in Ding et al. (2008) are supplemented and used
to evaluate the possibility.
Fig. 8.2 Fuzzy performance level
The adequacy index for system state j, which determines the relation between the system performance level and the state performance requirement kj, is defined as

rj = Φ − Kj = {rj, μrj(rj) | rj = φ − kj, φ ∈ Φ, kj ∈ Kj},   (8.3)

μrj(rj) = sup over rj = φ − kj of min{μΦ(φ), μKj(kj)}.   (8.4)

The cardinality of the fuzzy set rj is

|rj| = Σ over rj ∈ Rj of μrj(rj).   (8.5)

If the membership function of rj is continuous, the cardinality of the fuzzy set rj is

|rj| = ∫ over rj ∈ Rj of μrj(rj) drj.   (8.6)
SRj = {rj ∈ Rj | rj ≥ 0}.   (8.7)
And let

|srj| = Σ over srj ∈ SRj of μsrj(srj).   (8.9)

If the membership function is continuous,

|srj| = ∫ over srj ∈ SRj of μsrj(srj) dsrj.   (8.10)
Pr{(φ(G) − kj) ≥ 0} = Pr{φ(G), Ω(φj(G)) = 1}
                    = Pr{φ(G)} × (srj)rel,   (8.12)

where Pr{φ(G)} is the probability that the system has performance level φ(G).
|sr1| = ∫ over sr1 ∈ SR1 of μsr1(sr1) dsr1 = 0.5 × 1 × (0.1 − 0) = 0.05,

Pr{φ(g11, g21)} = Pr(g11) × Pr(g21)
               = (0.095, 0.1, 0.105) × (0.195, 0.2, 0.205)
               = (0.018525, 0.02, 0.021525),

Pr{φ(g11, g21) ≥ k1} = Pr{φ(g11, g21)} × (sr1)rel
                     = (0.018525, 0.02, 0.021525) × 0.5
                     = (0.0092625, 0.01, 0.0107625).
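The calculation above can be sketched with component-wise arithmetic on triangular triplets, the approximation used throughout this section; all names are illustrative.

```python
# Sketch of the possibility-weighted state probability computed above,
# using component-wise arithmetic on triangular triplets (a, b, c).

def tri_mul(x, y):
    return tuple(xi * yi for xi, yi in zip(x, y))

def tri_scale(x, s):
    return tuple(xi * s for xi in x)

p11 = (0.095, 0.1, 0.105)   # Pr(g11)
p21 = (0.195, 0.2, 0.205)   # Pr(g21)
p_state = tri_mul(p11, p21)
p_ge_k1 = tri_scale(p_state, 0.5)   # multiply by (sr1)rel = 0.5
print(tuple(round(v, 7) for v in p_state))   # (0.018525, 0.02, 0.021525)
print(tuple(round(v, 7) for v in p_ge_k1))   # (0.0092625, 0.01, 0.0107625)
```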
component i and the system have the same number of states. If all the components are fuzzy strongly relevant to the system, the FMSS must be homogenous.
The following example illustrates the definition.
Example 8.3 Consider a FMSS with two components. Each component and the system have three states. The components' performance levels for different states are shown in Table 8.1. The performance levels of the derated state (state 1) of the components are represented as fuzzy values. The FMSS structure function is φ(G) = min{g1j, g2j}. The minimum requirement to ensure that the system is in state j or above is represented as crisp values, with kj = 0, 0.8, 1 for j = 0, 1, 2, respectively.

j     0    1     2
kj    0    0.8   1
Given the system performance level, the possibility of the system's staying in or above a state can be evaluated by Equation 8.11. The following example illustrates the definition.

Example 8.4 Consider a FMSS with two components. Each component and the system have three states. The components' performance levels for different states are shown in Table 8.2. The performance levels of the degraded state (state 1) of the components are represented as fuzzy values. The FMSS structure function is φ(G) = g1j + g2j. The minimum requirement to ensure that the system is in state j or above is represented as crisp values, with kj = 0, 0.5, 1.1 for j = 0, 1, 2, respectively.

j      0    1                    2
g1j    0    (0.45, 0.5, 0.55)    1
kj     0    0.5                  1.1
where 1 ≤ i ≤ n.
Comparing this definition with Definition 8.3, we only require that at least one state of a fuzzy weakly relevant component have a possible nontrivial influence on the state of the system; the corresponding change of the system state is only possible, not certain. The following example illustrates this concept.
Example 8.5 Consider a FMSS with two components. Each component and the system have three states. The components' performance levels for different states are shown in Table 8.3. The FMSS structure function is φ(G) = min{g1j, g2j}. The minimum requirement to ensure that the system is in state j or above is represented as crisp values, with kj = 0, 0.85, 1 for j = 0, 1, 2, respectively.

j     0    1      2
kj    0    0.85   1
When component 2 is in state 2, φ(g11, g22) = min{g11, g22} = (0.8, 0.85, 0.9), so (sr0)rel = 1, (sr1)rel = 0.5, and (sr2)rel = 0.
It can be seen that under the condition that component 1 is in state 1, when component 2 changes its state from 0 to 1, the possibility of the system's staying in state 1 or above does not change ((sr1)rel = 0); when component 2 changes its state from 1 to 2, the possibility of the system's staying in state 1 or above increases from 0 to 0.5 ((sr1)rel = 0.5). Thus, we conclude that component 2 is fuzzy weakly relevant to the system structure based on Definition 8.4.
From the above definitions we notice that a fuzzy strongly relevant component
satisfies the requirements for fuzzy relevant and fuzzy weakly relevant compo-
nents; and a fuzzy relevant component satisfies the requirements for fuzzy weakly
relevant components. Moreover, a fuzzy weakly relevant component may be a
fuzzy relevant component or a fuzzy strongly relevant component; and a fuzzy
relevant component can be a fuzzy strongly relevant component.
Definition 8.5 Let φ be a function with domain ∏ over i = 1 to n of (gi0, gi1, ..., giMi). The structural function φ represents a fuzzy multi-state monotone system if:

1. φ(gij, G) does not provide a lower system fuzzy performance level than φ(gil, G) for j ≥ l, 0 ≤ j ≤ Mi, 1 ≤ l ≤ Mi.
2. φ(g1,0, ..., gi,0, ..., gn,0) = kmin for 1 ≤ i ≤ n, where kmin is the lowest system fuzzy performance level.
3. φ(g1M1, ..., giMi, ..., gnMn) = kmax for 1 ≤ i ≤ n, where kmax is the greatest system fuzzy performance level.
4. If the FMSS is homogenous, the possibility of φ(g1j, ..., gij, ..., gnj) being larger than or equal to kj is larger than 0 for 1 ≤ j ≤ M and 1 ≤ i ≤ n.
Based on this definition, we can say that the increase of the state of any system
components will not degrade the system fuzzy performance level. In addition
when all components are working perfectly the greatest system fuzzy performance
level is achieved; and when all components have completely failed, the lowest
system fuzzy performance level is achieved. However, for a homogenous system
the condition that when all components are in the same state the system is also
definitely in the same state is relaxed. We only require that when all components
are in the same state the system have a nontrivial possibility of being in the same
state.
For example, in Example 8.2, φ(g10, g20) = 0, φ(g11, g21) = (1.4, 1.5, 1.6), and φ(g12, g22) = 2. Obviously, φ(g12, g22) > φ(g11, g21) > φ(g10, g20), φ(g10, g20) = kmin, and φ(g12, g22) = kmax. The possibilities of φ(g12, g22), φ(g11, g21), and φ(g10, g20) being larger than or equal to kj = 2, 1.5, 0 for j = 2, 1, 0 are 1, 0.5, and 1, respectively. Therefore, in Example 8.2 the structural function represents a fuzzy multi-state monotone system.
Definition 8.7 Two component fuzzy performance vectors G and G* are said to be equivalent if and only if φ(G) − φ(G*) = 0. We use the notation G ≈ G* to indicate that these two vectors are equivalent.
The following example illustrates this concept.
Example 8.6 Consider two performance vectors G = (G1, G2, G3) = {(0, 0.5, 1), (0.5, 1, 1.5), (1.5, 2, 2.5)} and G* = (G*1, G*2, G*3). Thus, φ(G) = G1 + G2 + G3 = (2, 3.5, 5), φ(G*) = G*1 + G*2 + G*3 = (2, 3.5, 5), and φ(G) − φ(G*) = (0, 0, 0). Therefore these two vectors are equivalent.
a difficult condition to satisfy in the fuzzy domain. Small deviations of fuzzy values will change the conclusion.

Example 8.7 Suppose G*1 = (0.6, 0.8, 1) and the other variables are the same as in Example 8.6 (see Figure 8.3). The definition of equivalence alone is not sufficient for evaluating the property of fuzzy performance vectors.
Definition 8.8 Two component fuzzy performance vectors G and G* are said to be approximately equivalent within a degree ε if and only if S(φ(G), φ(G*)) ≤ ε. We use the notation G ≈ε G* to indicate that these two vectors are approximately equivalent.

Fig. 8.3 Fuzzy performance levels φ(G) and φ(G*)
8.3 Reliability Evaluation of Fuzzy Multi-state Systems

The UGF method is the primary approach for reliability evaluation of MSSs. The fuzzy universal generating functions (FUGFs) developed in Ding and Lisnianski (2008) can be used to evaluate the defined FMSS, as summarized in the following subsections.
The fuzzy performance distribution (PD) gi = {gi1, ..., giMi}, pi = {pi1, ..., piMi} of component i can be represented in the following form:

ui(z) = Σ over ji = 1 to Mi of piji z^giji,   (8.14)

where gi and pi are, respectively, the performance set and probability set represented by fuzzy sets for component i.
To obtain the fuzzy PD of a FMSS with an arbitrary structure, a general fuzzy composition operator Ωφ is used over the z-transform fuzzy representations of the n system components:

U(z) = Ωφ( Σ over j1 = 1 to M1 of p1j1 z^g1j1, ..., Σ over jn = 1 to Mn of pnjn z^gnjn )
     = Σ(j1 = 1 to M1) Σ(j2 = 1 to M2) ... Σ(jn = 1 to Mn) ( pj z^φ(g1j1, ..., gnjn) )   (8.15)
     = Σ(j1) Σ(j2) ... Σ(jn) ( pj z^gj ),
where pj and gj can be evaluated using Equations 8.16 and 8.17, respectively. The probability of system state j represented by a fuzzy set can be calculated as

pj = {pj, μpj(pj) | pj = ∏(i = 1 to n) piji, piji ∈ Piji},   (8.16)

gj = φ(g1j1, ..., giji, ..., gnjn) = {gj, μgj(gj) | gj = φ(g1j1, ..., giji, ..., gnjn), giji ∈ Giji},   (8.17)

where μgj(gj) = sup over φ(g1j1, ..., giji, ..., gnjn) = gj of min{μg1j1, ..., μgnjn}, and φ(g1j1, ..., giji, ..., gnjn) is the system structure function. If the structure function is the sum of the components' performances,

gj = {gj, μgj(gj) | gj = Σ(i = 1 to n) giji, giji ∈ Giji};   (8.18)

if it is the minimum of the components' performances,

gj = {gj, μgj(gj) | gj = min(g1j1, ..., giji, ..., gnjn), giji ∈ Giji},   (8.19)

where μgj(gj) = sup over gj = min(g1j1, ..., giji, ..., gnjn) of min{μg1j1, ..., μgnjn}.
The suggested approach is called the FUGF technique.
U(z) = Σ over j = 1 to Ms of pj z^gj.   (8.20)
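The composition (8.15) can be sketched for two components whose state probabilities and performances are triangular triplets; here the structure operator is the parallel (sum) one of Equations 8.23 and 8.24, and the data and names are illustrative assumptions.

```python
from itertools import product

# Minimal FUGF composition sketch for two independent components; each
# state carries a triangular probability triplet and performance triplet.

def tri_add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def tri_mul(x, y):
    return tuple(a * b for a, b in zip(x, y))

def compose(u1, u2, phi=tri_add):
    """u1, u2: lists of (p, g) pairs; returns the combined fuzzy PD
    before like terms are collected."""
    return [(tri_mul(p1, p2), phi(g1, g2))
            for (p1, g1), (p2, g2) in product(u1, u2)]

u1 = [((0.095, 0.1, 0.105), (0.65, 0.7, 0.75)),
      ((0.895, 0.9, 0.905), (1.0, 1.0, 1.0))]
u2 = [((0.195, 0.2, 0.205), (0.75, 0.8, 0.85)),
      ((0.795, 0.8, 0.805), (1.0, 1.0, 1.0))]
U = compose(u1, u2)
print(len(U))  # 4 system states before collecting like terms
```

Passing a component-wise min as phi gives the series operator in the triplet approximation used later in this section.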
The system fuzzy availability A can be evaluated using the following operator ΩA:

A(w) = ΩA(U(z), w) = ΩA( Σ over j = 1 to Ms of pj z^gj, w )
     = ΩA{ ..., (pj, (srj)rel), ... }
     = {A, μA(A) | A = Σ over j = 1 to Ms of pj (srj)rel, pj ∈ Pj},   (8.21)

where μA(A) = sup over A = Σ(j = 1 to Ms) pj (srj)rel of min{μp1, ..., μpj, ..., μpMs}.
From Equation 8.21, the operator ΩA uses the following procedures to obtain the system fuzzy availability:
1. Obtain the system FUGF as shown in Equation 8.20.
2. For a given demand w, evaluate srj and (srj)rel for system state j using Equations 8.3-8.10.
The series-parallel system is one of the most important MSSs. A gas transmission system is a typical example of such a system. In order to obtain the FUGF of a FMSS, the composition operators are used recursively to obtain the FUGFs of the intermediate series or parallel subsystems.
Consider a series-parallel system with fuzzy values of the performance rates (levels) and probabilities, where the components are statistically independent. The performance rates and probabilities of the components are assumed to be triangular fuzzy numbers, represented as triplets (a, b, c); this is one of the most important classes of fuzzy numbers and is used in many practical situations because of its simplicity in mathematical calculations (Kaufmann and Gupta 1988). The membership function is defined as

μX(x) = 0 for x < a;
μX(x) = (x − a)/(b − a) for a ≤ x ≤ b;
μX(x) = (c − x)/(c − b) for b ≤ x ≤ c;
μX(x) = 0 for x > c.   (8.22)
gj = ΩP(g1j1, ..., giji, ..., gnjn) = ( Σ(i = 1 to n) aiji, Σ(i = 1 to n) biji, Σ(i = 1 to n) ciji ),   (8.23)

where ΩP is the fuzzy parallel operator and the component performance giji is represented as the triplet (aiji, biji, ciji).
According to 8.16 and the fuzzy arithmetic operations on triangular fuzzy numbers (Cai 1996; Chen and Mon 1994), the subsystem probability pj can be obtained as

pj = ( ∏(i = 1 to n) aiji, ∏(i = 1 to n) biji, ∏(i = 1 to n) ciji ),   (8.24)
where ΩS is the fuzzy series operator and g_j^l is the l-cut of the fuzzy set gj, which contains all elements with a degree of membership greater than or equal to l, g_j^l = {gj | μgj(gj) ≥ l}; g_j^l is expressed as an interval [a_j^l, c_j^l] as shown in Figure 8.4.

Fig. 8.4 The l-cut interval [a_j^l, c_j^l] of a fuzzy performance level
Let [a1j1^l, c1j1^l] and [a2j2^l, c2j2^l] be, respectively, the confidence intervals at level l of g1j1 and g2j2, as shown in Figure 8.5.

Fig. 8.5 g1j1 and g2j2 with l-cut
It is assumed that a1j1 ≤ a2j2; therefore, there are four possibilities for the result of 8.26.

Case 1: b1j1 ≤ b2j2 and c1j1 ≤ c2j2. Obviously, in this case g1j1 is definitely less than or equal to g2j2, and ΩS(g1j1, g2j2) can be represented by the triplet (a1j1, b1j1, c1j1).

Case 2: b1j1 ≥ b2j2 and c1j1 ≥ c2j2, as shown in Figure 8.4. The membership function of ΩS(g1j1, g2j2), represented by the solid line, is
μΩS(g1j1, g2j2)(x) =
  0                            for x < a1j1,
  (x − a1j1)/(b1j1 − a1j1)     for a1j1 ≤ x ≤ x*,
  (x − a2j2)/(b2j2 − a2j2)     for x* ≤ x ≤ b2j2,
  (c2j2 − x)/(c2j2 − b2j2)     for b2j2 ≤ x ≤ c2j2,
  0                            for x > c2j2,   (8.27)

where x* = (a1j1 b2j2 − a2j2 b1j1)/(b2j2 − b1j1 − a2j2 + a1j1).
Case 3: b1j1 ≥ b2j2 and c1j1 ≤ c2j2. The membership function of ΩS(g1j1, g2j2) is

μΩS(g1j1, g2j2)(x) =
  0                            for x < a1j1,
  (x − a1j1)/(b1j1 − a1j1)     for a1j1 ≤ x ≤ x*,
  (x − a2j2)/(b2j2 − a2j2)     for x* ≤ x ≤ b2j2,
  (c2j2 − x)/(c2j2 − b2j2)     for b2j2 ≤ x ≤ y*,
  (c1j1 − x)/(c1j1 − b1j1)     for y* ≤ x ≤ c1j1,
  0                            for x > c1j1,   (8.28)

where x* = (a1j1 b2j2 − a2j2 b1j1)/(b2j2 − b1j1 − a2j2 + a1j1) and y* = (c2j2 b1j1 − c1j1 b2j2)/(b1j1 − c1j1 − b2j2 + c2j2).
Case 4: b1j1 ≤ b2j2 and c1j1 ≥ c2j2. The membership function of ΩS(g1j1, g2j2) is

μΩS(g1j1, g2j2)(x) =
  0                            for x < a1j1,
  (x − a1j1)/(b1j1 − a1j1)     for a1j1 ≤ x ≤ b1j1,
  (c1j1 − x)/(c1j1 − b1j1)     for b1j1 ≤ x ≤ y*,
  (c2j2 − x)/(c2j2 − b2j2)     for y* ≤ x ≤ c2j2,
  0                            for x > c2j2,   (8.29)

where y* = (c2j2 b1j1 − c1j1 b2j2)/(b1j1 − c1j1 − b2j2 + c2j2).
ΩS(g1j1, g2j2) = ∪ over l of l [a_j^l, c_j^l] ≈ ( min(a1j1, a2j2), min(b1j1, b2j2), min(c1j1, c2j2) ).
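A sketch contrasting the l-cut construction of the fuzzy min with the triplet approximation stated above; the two triangular numbers g1 and g2, like all names, are illustrative assumptions.

```python
# Sketch of the fuzzy series (min) operator: build the result from l-cuts
# [min(a1^l, a2^l), min(c1^l, c2^l)] on a grid of l values, and compare
# with the triplet approximation (min a, min b, min c) given above.

def lcut(t, l):
    a, b, c = t
    return (a + l * (b - a), c - l * (c - b))

def fuzzy_min_cuts(t1, t2, grid=11):
    cuts = []
    for i in range(grid):
        l = i / (grid - 1)
        (lo1, hi1), (lo2, hi2) = lcut(t1, l), lcut(t2, l)
        cuts.append((l, min(lo1, lo2), min(hi1, hi2)))
    return cuts

def fuzzy_min_triplet(t1, t2):
    return tuple(min(x, y) for x, y in zip(t1, t2))

g1, g2 = (1.0, 1.5, 2.0), (1.2, 1.4, 2.5)
print(fuzzy_min_triplet(g1, g2))  # (1.0, 1.4, 2.0)
# At l = 1 the cut collapses to the minimum of the modes:
print(tuple(round(v, 6) for v in fuzzy_min_cuts(g1, g2)[-1]))  # (1.0, 1.4, 1.4)
```

The triplet form is exact at the endpoints of the 0-cut and at the mode (1-cut), which is why it serves as the practical approximation here.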
j      1                      2                      3
pj1    (0.795, 0.8, 0.805)    (0.695, 0.7, 0.703)    (0.958, 0.96, 0.965)
gj1    1.5                    2                      4
gj2    0                      0                      0
In order to find the FUGF for components 1 and 2 connected in parallel, the operator ΩP is applied to u1(z) and u2(z); Expressions 8.23 and 8.24 are used. ΩP(u1(z), u2(z)) is computed first, and then

ΩS( ΩP(u1(z), u2(z)), u3(z) ) = Σ over j = 1 to 9 of pj z^gj.
After collecting the terms with the same capacity rates, there are nine system states.
For states j = 1, ..., 5 and 7, w < gj definitely, so (srj)rel = 1. These states are successful states.
For states j = 8 and 9, gj < w definitely, so (srj)rel = 0. These states are failure states.
For state j = 6, rj = (1.4, 1.5, 1.7) + (−1.5, −1.4, −1.3) = (−0.1, 0.1, 0.4). Because rj is represented as a triangular fuzzy value, |rj| = 0.5 × 1 × (0.4 − (−0.1)) = 0.25, |srj| = |rj| − 0.5 × 0.5 × (0 − (−0.1)) = 0.225, and (srj)rel = |srj|/|rj| = 0.9.
A(w) = ΩA(U(z), w)
     = (0.52932, 0.5376, 0.54611) + (0.14851, 0.1536, 0.15925)
     + (0.063252, 0.0672, 0.071231) + (0.017747, 0.0192, 0.020772)
     + (0.063918, 0.0672, 0.069196) + (0.017934, 0.0192, 0.020178) × 0.9
     + (0.068545, 0.0768, 0.085451) + 0 + 0
     = (0.90743, 0.93888, 0.97017).
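The aggregation above can be checked numerically with a short sketch using the state-probability triplets listed in the text; function names are illustrative.

```python
# Reproducing the fuzzy availability aggregation above: sum the state
# probability triplets, weighting state 6 by its (sr)rel = 0.9.

def tri_add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def tri_scale(x, s):
    return tuple(a * s for a in x)

terms = [(0.52932, 0.5376, 0.54611),
         (0.14851, 0.1536, 0.15925),
         (0.063252, 0.0672, 0.071231),
         (0.017747, 0.0192, 0.020772),
         (0.063918, 0.0672, 0.069196),
         tri_scale((0.017934, 0.0192, 0.020178), 0.9),  # state 6
         (0.068545, 0.0768, 0.085451)]                  # failure states add 0

A = (0.0, 0.0, 0.0)
for t in terms:
    A = tri_add(A, t)
print(tuple(round(v, 5) for v in A))  # (0.90743, 0.93888, 0.97017)
```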
Suppose that the system safety standard requires that system operation satisfy a required level of system availability, which is set as 0.9. After evaluation, the above system design, considering fuzzy uncertainties, can satisfactorily meet the system availability requirement, which guarantees that the system works in a relatively safe mode.
References
A.1 Introduction
There are many optimization methods available for use on various reliability op-
timization problems (Lisnianski and Levitin 2003). The applied algorithms can be
classified into two categories: heuristics and exact techniques based on the modi-
fications of dynamic programming and nonlinear programming. Most of the exact
techniques are strongly problem oriented. This means that since they are designed
for solving certain optimization problems, they cannot be easily adapted for solving other problems. Recently, most research works have focused on developing general heuristic techniques for solving reliability optimization problems that are based on artificial intelligence and stochastic techniques to direct the search. The important advantage of these techniques is that they do not require any information about the objective function besides its values corresponding to the points visited in the solution space. All heuristic techniques use the idea of randomness when performing a search, but they also use past knowledge in order to direct the search. Such search algorithms are known as randomized search techniques.
appendix includes and updates the reports related to the heuristic algorithms by
Lisnianski and Levitin (2003) and some further discussion and examples.
Based on the classification by Lisnianski and Levitin (2003) and some recent research, the heuristic techniques include simulated annealing, ant colony, tabu search, genetic algorithm (GA), and particle swarm optimization (PSO).
Kirkpatrick et al. (1983) first presented the simulated annealing algorithm. The idea was inspired by the metallurgical procedure called the annealing process. The simulated annealing algorithm can not only improve the objective value of a local search but can also allow a move to some solutions with higher costs (Lisnianski and Levitin 2003). This algorithm can therefore obtain global solutions rather than local ones.
The ant colony algorithm was first introduced by Dorigo and Gambardella (1997). The inspiration for the algorithm came from the behavior of natural ant colonies (Lisnianski and Levitin 2003): by leaving different amounts of pheromone along their paths, an ant colony is capable of finding the shortest path from its nest to a food source and is also able to adapt to changes in the environment.
Tabu search was first described by Glover (1989). This search uses previously obtained information to restrict the next search direction. The technique is intelligent and guides the search toward globally optimal solutions.
PSO was first described by Kennedy and Eberhart (1995). The inspiration for PSO was the behavior of bird flocks. There is some similarity between GAs and PSO: a stochastic heuristic search is conducted by operating on a population of solutions. However, there are no evolution operators such as crossover and mutation in PSO (PSO Tutorial). Note that the information-sharing mechanisms of GA and PSO are totally different (PSO Tutorial): in GA the whole population of solutions moves relatively uniformly toward the optimal area because solution chromosomes share information with each other; in PSO only the solution parameters (gbest and pbest) send out information, which is one-way information sharing.
The procedure to solve the optimization problem of the PSO includes the fol-
lowing steps (Parket et al. 2005):
Step 1: Generate an initial population of solutions randomly in the search
space. A particle is represented by a solution vector.
Step 2: Evaluate the fitness of each particle.
Step 3: Calculate the position and velocity for each particle in the swarm using
the following equations:
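A minimal sketch of the standard PSO velocity and position update for Step 3, assuming the usual inertia-weight form (after Kennedy and Eberhart 1995); the coefficients w, c1, c2 and all names here are illustrative assumptions.

```python
import random

# Hedged sketch of a standard PSO velocity/position update: each particle
# is pulled toward its own best position (pbest) and the swarm best (gbest).

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random.random):
    """One update of a single particle; x, v, pbest, gbest are lists."""
    new_v = [w * vi + c1 * rng() * (pb - xi) + c2 * rng() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

random.seed(0)
x, v = [0.0, 0.0], [0.1, -0.1]
x, v = pso_step(x, v, pbest=[1.0, 1.0], gbest=[2.0, 2.0])
print(len(x), len(v))  # 2 2
```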
The GA is the most widely used heuristic technique. It was inspired by the optimization procedures in the biological phenomenon of evolution. In a GA, a new population of solutions comes from the optimal selection of offspring solutions generated by the previous population. Crossover and mutation operators are applied to parents to produce their offspring. The survival of offspring is determined by their adaptation to the environment. GAs are the most popular heuristic algorithms for solving different kinds of reliability optimization problems. The detailed descriptions of GA in the later sections of this appendix include and update the reports by Lisnianski and Levitin (2003). The advantages of GAs include the following (Goldberg 1989; Lisnianski and Levitin 2003):
They can be relatively easily implemented to solve different problems includ-
ing constrained optimization problems.
A population of solutions is used to conduct the optimal search in GAs.
GAs are stochastic in nature.
GAs are parallel and can produce good quality solutions simultaneously.
The GA was first introduced by Holland (1975). Holland was impressed by the ease with which biological organisms could perform tasks that eluded even the most powerful computers. He also noted that very few artificial systems have the most remarkable characteristics of biological systems: robustness and flexibility. Unlike technical systems, biological ones have means of self-guidance, self-repair, and reproduction of these features. Holland's biologically inspired approach to optimization is based on the following analogies:
As in nature, where there are many organisms, there are many possible solu-
tions to a given problem.
As in nature, where an organism contains many genes defining its properties,
each solution is defined by many interacting variables (parameters).
As in nature, where groups of organisms live together in a population and some
organisms in the population are more fit than others, a group of possible solu-
tions can be stored together in computer memory and some of them will be
closer to the optimum than others.
As in nature, where organisms that are fitter have more chances of mating and
having offspring, solutions that are closer to the optimum can be selected more
often to combine their parameters to form new solutions.
As in nature, where organisms produced by good parents are more likely to be
better adapted than the average organism because they received good genes, the
offspring of good solutions are more likely to be better than a random guess,
since they are composed of better parameters.
As in nature, where survival of the fittest ensures that the successful traits con-
tinue to get passed along to subsequent generations and are refined as the popu-
lation evolves, the survival-of-the-fittest rule ensures that the composition of
the parameters corresponding to best guesses continually get refined.
350 Appendix A
All a GA needs to continue searching for the optimum is some measure of fitness about a point in the search space.
GAs are probabilistic in nature, not deterministic. This is a direct result of the
randomization techniques used by GAs.
GAs are inherently parallel; this is one of their most powerful features. By their nature, GAs deal with a large number of solutions simultaneously. Using schemata theory, Holland estimated that a GA processing n strings at each generation in reality processes n³ useful substrings (Goldberg 1989).
Two of the most common GA implementations are generational and steady
state. The steady-state technique has received increased attention (Kinnear 1993)
because it can offer a substantial reduction in the memory requirements in compu-
tation: the technique abolishes the need to maintain more than one population dur-
ing the evolutionary process, which is necessary in a generational GA. In this way,
genetic systems have greater portability for a variety of computer environments
because of the reduced memory overhead. Another reason for the increased inter-
est in steady-state techniques is that, in many cases, a steady-state GA has been
shown to be more effective than a generational GA (Syswerda 1991; Vavak and
Fogarty 1996). This improved performance can be attributed to factors such as the
diversity of the population and the immediate availability of superior individuals.
Detailed descriptions of a generational GA are given in Goldberg (1989); therefore, only the structure of a steady-state GA is introduced here.
The steady-state GA proceeds as follows (Whitley 1989), as shown in Figure A.1. First, we generate randomly or heuristically an initial population of solutions.
Within this population, new solutions are obtained during the genetic cycle using a
crossover operator. This operator produces an offspring from a randomly selected
pair of parent solutions (the parent solutions are selected with a probability pro-
portional to their relative fitness), facilitating the inheritance of some basic proper-
ties from the parents to the offspring. The newly obtained offspring undergoes
mutation with probability Pmut.
Each new solution is decoded and its objective function (fitness) values are es-
timated. These values, which are a measure of quality, are used to compare differ-
ent solutions. The comparison is accomplished by a selection procedure that de-
termines which solution is better: the newly obtained solution or the worst solution
in the population. The better solution joins the population, while the other is dis-
carded. If the population contains equivalent solutions following selection, then
redundancies are eliminated and the population size decreases as a result. A ge-
netic cycle terminates when N rep new solutions are produced or when the number
of solutions in the population reaches a specified level. Then, new randomly con-
structed solutions are generated to replenish the shrunken population, and a new
genetic cycle begins. The whole GA is terminated when its termination condition
is satisfied. This condition can be specified in the same way as in a generational
GA. The steady-state GA can also be expressed in pseudocode format (Lisnianski and Levitin 2003).
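The genetic cycle just described can be sketched in Python; the helper names and the parameter defaults (n_rep, p_mut) are illustrative, not the authors':

```python
import random

def steady_state_cycle(population, fitness, crossover, mutate,
                       n_rep=10, p_mut=0.1):
    """One genetic cycle of a steady-state GA: produce n_rep offspring,
    each replacing the worst member of the population if it is better."""
    for _ in range(n_rep):
        # Fitness-proportional selection of two parent solutions.
        weights = [fitness(s) for s in population]
        p1, p2 = random.choices(population, weights=weights, k=2)
        child = crossover(p1, p2)
        if random.random() < p_mut:
            child = mutate(child)
        # The better of (offspring, worst member) stays in the population.
        worst = min(range(len(population)),
                    key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    # Eliminate equivalent solutions, as in the description above.
    unique = []
    for s in population:
        if s not in unique:
            unique.append(s)
    return unique
```

After each cycle the shrunken population would be replenished with new random solutions before the next cycle begins, as described in the text.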
Example A.1 (Lisnianski and Levitin 2003). In this example we present several initial stages of a steady-state GA that maximizes a function of six integer variables x1, x2, …, x6 taking the form

    f(x1, …, x6) = … + (x4 − 3.1)² + (x5 − 2.8)² + (x6 − 8.8)².
The variables can take values from 1 to 9. The initial population, consisting of five solutions ordered according to their fitness (the value of the function f), is

No.  x1  x2  x3  x4  x5  x6   f(x1, …, x6)
 1    4   2   4   1   2   5   297.8
 2    3   7   7   7   2   7   213.8
 3    7   5   3   5   3   9   204.2
 4    2   7   4   2   1   4   142.5
 5    8   2   3   1   1   4   135.2
Using the random generator that produces the numbers of the solutions, the GA
chooses the first and third strings, i.e., (4 2 4 1 2 5) and (7 5 3 5 3 9), respectively.
From these strings, it produces a new one by applying a crossover procedure that
takes the first three numbers from the better parent string and the last three numbers from the inferior parent string. The resulting string is (4 2 4 5 3 9). The fitness of this new solution is f(x1, …, x6) = 562.4. The new solution enters the population, replacing the one with the lowest fitness. The new population is now

No.  x1  x2  x3  x4  x5  x6   f(x1, …, x6)
 1    4   2   4   5   3   9   562.4
 2    4   2   4   1   2   5   297.8
 3    3   7   7   7   2   7   213.8
 4    7   5   3   5   3   9   204.2
 5    2   7   4   2   1   4   142.5
At the next stage a new solution (3 7 7 4 3 9), with fitness 349.9, was obtained and entered the population, which became

No.  x1  x2  x3  x4  x5  x6   f(x1, …, x6)
 1    4   2   4   5   3   9   562.4
 2    3   7   7   4   3   9   349.9
 3    4   2   4   1   2   5   297.8
 4    3   7   7   7   2   7   213.8
 5    7   5   3   5   3   9   204.2
Note that the mutation procedure is not applied to all the solutions obtained by the crossover. It is used with some prespecified probability Pmut. In our example, only the second and third newly obtained solutions underwent mutation.
Actual GAs operate with much larger populations and produce thousands of new solutions using the crossover and mutation procedures. The steady-state GA with a population size of 100 obtained the optimal solution of the problem presented after producing about 3000 new solutions. Note that the total number of possible solutions is 9⁶ = 531441. The GA thus managed to find the optimal solution by exploring less than 0.6% of the entire solution space.
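The crossover rule used in this example, the first three elements from the better parent and the rest from the inferior one, can be reproduced directly:

```python
def crossover(better_parent, inferior_parent, cut=3):
    """Single-point crossover of Example A.1: the offspring takes the first
    `cut` elements from the better parent, the rest from the other parent."""
    return better_parent[:cut] + inferior_parent[cut:]

offspring = crossover([4, 2, 4, 1, 2, 5], [7, 5, 3, 5, 3, 9])
```

which yields the string (4 2 4 5 3 9) obtained in the example.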
Both types of GA are based on the crossover and mutation procedures, which
depend strongly on the solution encoding technique. These procedures should pre-
serve the feasibility of the solutions and provide the inheritance of their essential
properties.
There are three basic steps in applying a GA to a specific problem. In the first
step, one defines the solution representation (encoding in the form of a string of
symbols) and determines the decoding procedure, which evaluates the fitness of
the solution represented by the arbitrary string.
In the second step, one has to adapt the crossover and mutation procedures to the given representation in order to provide the feasibility of the new solutions produced by these procedures, as well as the inheritance of the basic properties of the parent solutions by their offspring.
In the third step, one has to choose the basic GA parameters, such as the popu-
lation size, the mutation probability, the crossover probability (generational GA),
or the number of crossovers per genetic cycle (steady-state GA), and formulate the
termination condition in order to provide the greatest possible GA efficiency
(convergence speed).
The strings representing GA solutions are randomly generated by the popula-
tion generation procedure, modified by the crossover and mutation procedures,
and decoded by the fitness evaluation procedure. Therefore, the solution represen-
tation in the GA should meet the following requirements:
It should be easily generated (the complex solution generation procedures re-
duce the GA speed).
It should be as compact as possible (using very long strings requires excessive
computational resources and slows the GA convergence).
It should be unambiguous (i.e., different solutions should be represented by dif-
ferent strings).
It should represent feasible solutions (if no randomly generated string repre-
sents a feasible solution, then the feasibility should be provided by simple
string transformation).
It should provide feasibility inheritance of new solutions obtained from feasible
ones by the crossover and mutation operators.
The field of reliability optimization includes problems of finding optimal pa-
rameters, optimal allocation and assignment of different elements into a system,
and optimal sequencing of the elements. Many of these problems are combinato-
rial by nature. The most suitable symbol alphabet for this class of problems is in-
teger numbers. A finite string of integer numbers can be easily generated and
stored. The random generator produces integer numbers for each element of the
string in a specified range. This range should be the same for each element in or-
der to make the string generation procedure simple and fast. If for some reason
different string elements belong to different ranges, then the string should be
transformed to provide solution feasibility.
In the following subsections we show how integer strings of GAs can be inter-
preted for solving different kinds of optimization problems.
Consider the problem of determining H parameters, where each parameter X_j can vary within a specified range:

    X_j^min ≤ X_j ≤ X_j^max,  1 ≤ j ≤ H.  (A.1)

Each parameter can be represented by an integer a_j in the range (0, N) and decoded as

    X_j = X_j^min + a_j (X_j^max − X_j^min)/N.  (A.2)
Note that the space of the integer strings only approximately maps the space of the real-valued parameters. The number N determines the precision of the search. The search resolution for the jth parameter is (X_j^max − X_j^min)/N.
Example A.2 Consider seven parameters with the ranges shown below and N = 100. A random integer string is decoded using Equation A.2 as follows:

No. of variable           1     2     3     4     5     6     7
x_j^min                 0.0   0.0   1.0   1.0   1.0   0.0   0.0
x_j^max                 3.0   3.0   5.0   5.0   5.0   5.0   5.0
Random integer string    21     4     0   100    72    98     0
Decoded variable       0.63  0.12   1.0   5.0  3.88   4.9     0
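The decoding of Equation A.2 can be verified against the table above:

```python
def decode(a, x_min, x_max, n=100):
    """Equation A.2: map an integer a in (0, N) onto the range (x_min, x_max)."""
    return x_min + a * (x_max - x_min) / n

string = [21, 4, 0, 100, 72, 98, 0]
x_min = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
x_max = [3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0]
decoded = [decode(a, lo, hi) for a, lo, hi in zip(string, x_min, x_max)]
```

which reproduces the "Decoded variable" row of the table.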
Consider next the problem of partitioning a set Φ of Y items into K mutually disjoint subsets Φ_1, …, Φ_K such that

    ∪_{i=1}^{K} Φ_i = Φ,  Φ_i ∩ Φ_j = ∅,  i ≠ j.  (A.3)

Each set can contain from 0 to Y items. The partition of the set Φ can be represented by the Y-length string a = (a_1 a_2 … a_{Y−1} a_Y), in which a_j is the number of the
set to which item j belongs. Note that in the strings representing feasible solutions
of the partition problem, each element can take a value in the range (1, K).
Now consider a more complicated allocation problem in which the number of
items is not specified. Assume that there are H types of different items with an
unlimited number of items for each type h. The number of items of each type allo-
cated in each subset can vary. To represent an allocation of the variable number of
items in K subsets, one can use the string encoding

    a = (a_11 a_12 … a_1K a_21 … a_2K … a_H1 … a_HK),

in which a_ij corresponds to the number of items of type i belonging to subset j. Observe that different subsets can contain identical elements.
Example A.3 Consider the problem of allocating three types of transformers char-
acterized by different nominal power and different availability at two substations
in a power transmission system. In this problem, H = 3 and K = 2. Any possible
allocation can be represented by an integer string using the encoding described
above. For example, the string (2 1 0 1 1 1) encodes the solution in which two
type 1 transformers are allocated in the first substation and one in the second sub-
station, one transformer of type 2 is allocated in the second substation, and one
transformer of type 3 is allocated in each of the two substations.
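The string of Example A.3 can be unpacked into an allocation table (rows are item types, columns are subsets); the row-major layout is inferred from the example:

```python
def unpack(a, n_types, n_subsets):
    """Decode a flat allocation string into rows a_i = [a_i1, ..., a_iK],
    where a_ij is the number of items of type i allocated to subset j."""
    return [a[i * n_subsets:(i + 1) * n_subsets] for i in range(n_types)]

allocation = unpack([2, 1, 0, 1, 1, 1], n_types=3, n_subsets=2)
```

Here allocation[0] == [2, 1] recovers "two type 1 transformers at the first substation and one at the second", and so on for the other rows.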
When K = 1, one has an assignment problem in which a number of items should be chosen from a list containing an unlimited number of items of H different types. Any solution of the assignment problem can be represented by the string a = (a_1 a_2 … a_H), in which a_j corresponds to the number of chosen items of type j.
The range of variance of string elements for both allocation and assignment
problems can be specified based on the preliminary estimation of the characteris-
tics of the optimal solution (maximal possible number of elements of the same
type included into the single subset). The greater the range, the greater the solution
space to be explored (note that the minimal possible value of the string element is
always zero in order to provide the possibility of not choosing any element of the
given type for the given subset). In many practical applications, the total number
of items belonging to each subset is also limited. In this case, any string represent-
ing a solution in which this constraint is not met should be transformed in the fol-
lowing way:
    a*_ij = ⌊ a_ij N_j / Σ_{h=1}^{H} a_hj ⌋   if N_j < Σ_{h=1}^{H} a_hj,
    a*_ij = a_ij                               otherwise,            (A.4)
where N_j is the maximal allowed total number of items in subset j.
Example A.4 Consider the case in which the transformers of three types should be
allocated to two substations. Assume that it is prohibited to allocate more than five
transformers of each type to the same substation. The GA should produce strings
with elements ranging from 0 to 5. An example of such a string is (4 2 5 1 0 2).
Assume that for some reason the total number of transformers in the first and
second substations is restricted to seven and six, respectively. In order to obtain a
feasible solution, one has to apply transform (A.4) in which
    N_1 = 7,  N_2 = 6,  Σ_{h=1}^{3} a_h1 = 4 + 5 + 0 = 9,  Σ_{h=1}^{3} a_h2 = 2 + 1 + 2 = 5.
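A sketch of transform (A.4) applied to this example; rounding the proportionally scaled elements down to integers is our assumption about the bracket in (A.4):

```python
def enforce_subset_limits(a, n_types, limits):
    """Transform (A.4): if the total allocated to subset j exceeds N_j,
    scale that subset's elements down proportionally (rounded down)."""
    n_subsets = len(limits)
    out = list(a)
    for j, n_j in enumerate(limits):
        total = sum(a[i * n_subsets + j] for i in range(n_types))
        if total > n_j:
            for i in range(n_types):
                k = i * n_subsets + j
                out[k] = a[k] * n_j // total  # floor of a_ij * N_j / total
    return out

feasible = enforce_subset_limits([4, 2, 5, 1, 0, 2], n_types=3, limits=[7, 6])
```

For the first substation (total 9 > 7) the elements 4 and 5 shrink to 3 and 3, while the second substation (total 5 ≤ 6) is left unchanged.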
In the mixed partition and parameter determination problem, each unit j is represented by two adjacent string elements. Each odd element of the string is transformed as

    1 + mod_K(a_j)  (A.5)

in order to obtain the class number in the range (1, K). The even elements of the string should be transformed as

    mod_{N+1}(a_j)  (A.6)

in order to obtain the parameter value encoded by an integer number in the range (0, N). The value of the parameter is then obtained using Equation A.2.
Example A.5 Consider a weighted voting system in which seven voting units
(N = 7) should be allocated to three separate subsets (K = 3) and a value of a pa-
rameter (weight) associated with each unit should be chosen. The solution should
encode both units distribution among the subsets and the parameters (weights).
Let the range of the string elements be (0, 100) (N = 100). The string (99 21 22 4 75 0 14 100 29 72 60 98 1 0), in which the odd elements correspond to the numbers of the subsets, represents the solution presented in Table A.2. The values corresponding to the numbers of the groups are obtained using Equation A.5 as 1 + mod_3(99) = 1, 1 + mod_3(22) = 2, 1 + mod_3(75) = 1, and so on. Observe that, in this solution, units 1, 3, and 6 belong to the first subset, units 2 and 7 belong to the second subset, and units 4 and 5 belong to the third subset. The parameters are identical to those in Example A.2.
Table A.2 Example of the solution encoding for the mixed partition and parameter determination
problem
No. of unit                       1   2   3    4   5   6   7
No. of subset                     1   2   1    3   3   1   2
Integer code of parameter value  21   4   0  100  72  98   0
Alternatively, each unit can be represented by a single string element in the range (0, (N + 1)K − 1). The number of the subset to which the jth unit belongs should be obtained as

    1 + ⌊a_j / (N + 1)⌋  (A.7)

and the number corresponding to the value of the jth parameter should be obtained as

    mod_{N+1}(a_j).  (A.8)

Consider the example presented above with K = 3 and N = 100. The range of the string elements should be (0, 302). The string (21 105 0 302 274 98 101) corresponds to the same solution as the strings in the previous example (Table A.2).
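The two encodings can be decoded and compared in a few lines; the transforms assumed here are 1 + mod_K(a) for subset numbers, mod_{N+1}(a) for parameter codes, and 1 + ⌊a/(N + 1)⌋ for the single-element variant:

```python
K, N = 3, 100

def decode_pairs(a):
    """Two elements per unit: odd positions give the subset number
    (1 + a mod K), even positions give the integer parameter code."""
    subsets = [1 + a[2 * i] % K for i in range(len(a) // 2)]
    codes = [a[2 * i + 1] % (N + 1) for i in range(len(a) // 2)]
    return subsets, codes

def decode_single(a):
    """One element per unit in the range (0, (N+1)K - 1): the subset is
    1 + a_j // (N + 1), the parameter code is a_j mod (N + 1)."""
    return [1 + x // (N + 1) for x in a], [x % (N + 1) for x in a]

s1 = decode_pairs([99, 21, 22, 4, 75, 0, 14, 100, 29, 72, 60, 98, 1, 0])
s2 = decode_single([21, 105, 0, 302, 274, 98, 101])
```

Both strings decode to the same solution, the one given in Table A.2.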
Like the generation procedures for the partition problem, this one also requires
the generation of Y random numbers.
Example A.6 In a restructured power system, a generating system can plan its
own reserve and can also share the reserve with other generating systems accord-
ing to their reserve contracts. The reserve structure of a generating system should
be determined based on the balance between the required reliability and the re-
serve cost, which is an optimization problem. A GA with a special encoding
scheme that considers the structure of reserve capacity and the reserve utilization order is developed for this optimization problem. A mixed numerical and binary string of length Y + Σ_{j=1}^{Y} D_j is used to encode a solution (Ding et al. 2006).
The first sequence of Y numerical items represents Y reserve providers and their
reserve utilization order in a contingency state. The initial sequence of the first Y
items is generated randomly and should be a permutation of Y integer numbers,
i.e., it should contain all the numbers from 1 to Y and each number in the string
should be unique. The sequence of items can be represented by a Y-length string (a_1 a_2 … a_Y), in which a_j is the number of the set to which item j belongs. The above procedure is used for generating a random string permutation.
The next Σ_{j=1}^{Y} D_j binary bits represent the contracted reserve capacity of the Y reserve providers, where D_j is the number of binary bits encoding the amount of the contracted reserve capacity from reserve provider j. Encoding is performed using different numbers of binary bits for each contracted reserve amount, depending on the desired accuracy.
Using this encoding algorithm, the solutions for the reserve utilization order remain within the feasible space. As shown in Figure A.2, reserve provider 2 is used first, reserve provider 1 is used second, and so on, up to the point where, in a contingency state, either the load is met or the available reserve is used up.
Having a solution represented in the GA by an integer string a, one then has to es-
timate the quality of this solution (or, in terms of the evolution process, the fitness
of the individual). The GA seeks solutions with the greatest possible fitness.
Therefore, the fitness should be defined in such a way that its greatest values cor-
respond to the best solutions. For example, when optimizing the system reliability
R (which is a function of some of the parameters represented by a) one can define
the solution fitness equal to this index, since one wants to maximize it. By contrast, when minimizing the system cost C, one has to define the solution fitness as M − C, where M is a constant. In this case, the maximal solution fitness corresponds to the minimal cost.
In the majority of optimization problems, the optimal solution should satisfy
some constraints. There are three different approaches to handling the constraints
in GA (Michalewicz 1996). One of these uses penalty functions as an adjustment
to the fitness function; two other approaches use decoder or repair algorithms
to avoid building illegal solutions or repair them, respectively. The decoder and
repair approaches suffer from the disadvantage of being tailored to the specific
problems and thus are not sufficiently general to handle a variety of problems. On
the other hand, the penalty approach based on generating potential solutions with-
out considering the constraints and on decreasing the fitness of solutions, violating
the constraints, is suitable for problems with a relatively small number of con-
straints. For heavily constrained problems, the penalty approach causes the GA to
spend most of its time evaluating solutions violating the constraints. Fortunately,
reliability optimization problems usually deal with few constraints.
Using the penalty approach one transforms a constrained problem into an un-
constrained one by associating a penalty with all constraint violations. The penalty
is incorporated into the fitness function. Thus, the original problem of maximizing
a function f (a ) is transformed into the maximization of the function:
    f(a) − Σ_{j=1}^{J} π_j,  (A.9)

where π_j is the penalty associated with violating the jth of the J constraints.
For example, when minimizing the system cost C(a) under the constraint that the system reliability is no less than a required level R′, the fitness can be defined as

    M − C(a) − π(R′, a),  (A.11)

where π(R′, a) is the penalty for violating the reliability constraint.
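A sketch of the penalty approach for the cost-minimization case; the linear penalty form and the constants M and penalty_weight are illustrative choices, not values from the text:

```python
def fitness(cost, reliability, r_required, m=10_000.0, penalty_weight=1_000.0):
    """Penalized fitness M - C - pi(R', a): the shortfall below the required
    reliability is penalized linearly; feasible solutions pay no penalty."""
    shortfall = max(0.0, r_required - reliability)
    return m - cost - penalty_weight * shortfall
```

With this definition a feasible solution always outranks an equally priced solution that violates the reliability constraint, which is exactly what the penalty approach requires.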
The crossover procedures create a new solution as the offspring of a pair of exist-
ing ones (parent solutions). The offspring should inherit some useful properties of
both parents in order to facilitate their propagation throughout the population. The
mutation procedure is applied to the offspring solution. It introduces slight
changes into the solution encoding string by modifying some of the string ele-
ments. Both of these procedures should be developed in such a way as to provide
the feasibility of the offspring solutions given that parent solutions are feasible.
When applied to parameter determination, partition, and assignment problems,
the solution feasibility means that the values of all of the string elements belong to
a specified range. The most commonly used crossover procedures for these prob-
lems generate offspring in which every position is occupied by a corresponding
element from one of the parents. This property of the offspring solution provides
its feasibility. For example, in single-point crossover all the elements located to
the left of a randomly chosen position are copied from the first parent and the rest
of the elements are copied from the second parent.
The commonly used mutation procedure changes the value of a randomly se-
lected string element by 1 (increasing or decreasing this value with equal probabil-
ity). If after the mutation the element is out of the specified range, it takes the
minimal or maximal allowed value.
When applied to sequencing problems, the crossover and mutation operators
should produce offspring that preserve the form of permutations. This means that
the offspring string should contain all of the elements that appear in the initial
strings and each element should appear in the offspring only once. Any omission
or duplication of the element constitutes an error. The mutation procedure that
preserves the permutation feasibility swaps two string elements initially located in
two randomly chosen positions.
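Both mutation variants described above, the ±1 change with clamping for integer strings and the element swap for permutations, can be sketched as:

```python
import random

def mutate_integer(a, lo, hi):
    """Change a randomly chosen element by +/-1, clamped to [lo, hi]."""
    b = list(a)
    i = random.randrange(len(b))
    b[i] = min(hi, max(lo, b[i] + random.choice([-1, 1])))
    return b

def mutate_permutation(a):
    """Swap the elements at two randomly chosen positions, preserving
    the permutation property of the string."""
    b = list(a)
    i, j = random.sample(range(len(b)), 2)
    b[i], b[j] = b[j], b[i]
    return b
```

The first operator never leaves the specified range; the second never omits or duplicates an element, so both preserve feasibility as required.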
There are no general rules for choosing the values of the basic GA parameters for solving specific optimization problems. The best way to determine a proper combination of these values is an experimental comparison of GAs with different parameters. The GAs should solve a set of test problems, and when solving each problem, the different GAs should start from the same initial population.
References
Cheng S (1998) Topological optimization of a reliable communication network. IEEE Trans Reliab 47:23–31
Coit D, Smith A (1996) Reliability optimization of series-parallel systems using genetic algorithm. IEEE Trans Reliab 45:254–266
Deeter D, Smith A (1998) Economic design of reliable networks. IIE Trans 30:1161–1174
Ding Y, Wang P, Lisnianski A (2006) Optimal reserve management for restructured power generating systems. Reliab Eng Syst Saf 91:792–799
Dorigo M, Gambardella L (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1:53–66
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization. Wiley, New York
Glover F (1989) Tabu search, part I. ORSA J Comput 1(3):190–206
Goldberg D (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, MA
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor, MI
Hsieh Y, Chen T, Bricker D (1998) Genetic algorithm for reliability design problems. Microelectron Reliab 38:1599–1605
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, 4:1942–1948
Kinnear K (1993) Generality and difficulty in genetic programming: evolving a sort. In: Forrest S (ed) Proceedings of the 5th International Conference on Genetic Algorithms. Morgan Kaufmann, San Francisco, pp 287–294
Kirkpatrick S, Gelatt CD Jr, Vecchi MP (1983) Optimization by simulated annealing. Science 220:671–680
Kumar A, Pathak R, Gupta Y (1995) Genetic algorithms-based reliability optimization for computer network expansion. IEEE Trans Reliab 44:63–72
Levitin G, Lisnianski A, Ben-Haim H et al (1998) Redundancy optimization for series-parallel multi-state systems. IEEE Trans Reliab 47:165–172
Levitin G, Lisnianski A (1999) Joint redundancy and maintenance optimization for multi-state series-parallel systems. Reliab Eng Syst Saf 64:33–42
Levitin G, Lisnianski A (2000) Optimization of imperfect preventive maintenance for multi-state systems. Reliab Eng Syst Saf 67:193–203
Levitin G (2001) Redundancy optimization for multi-state systems with fixed resource-requirements and unreliability sources. IEEE Trans Reliab 50:52–59
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs. Springer, Berlin
A counting process {N(t), t ≥ 0} is an HPP with rate λ > 0 if
N(0) = 0,
the process has independent increments, and
the number of failures in any interval of length t has a Poisson distribution with parameter λt.

The number of events in (t_1, t_2] therefore has a Poisson distribution with parameter λ(t_2 − t_1), so the probability mass function is

    P{N(t_2) − N(t_1) = x} = [λ(t_2 − t_1)]^x e^{−λ(t_2 − t_1)} / x!,  x = 0, 1, 2, ….  (B.1)

For an NHPP, the number of events in (0, t] has a Poisson distribution with mean μ(t):

    P{N(t) = n} = [μ(t)]^n e^{−μ(t)} / n!,  n = 0, 1, 2, …,  (B.2)

where μ(t) is the mean value function; it describes the expected cumulative number of failures over time.
The underlying assumptions of the NHPP are as follows:
N(0) = 0;
{N(t), t ≥ 0} has independent increments;
P{N(t + h) − N(t) = 1} = λ(t)h + o(h); and
P{N(t + h) − N(t) ≥ 2} = o(h).

Here o(h) denotes a quantity that tends to zero faster than h as h → 0. The function λ(t) is the failure intensity. Given λ(t), the mean value function μ(t) = E[N(t)] satisfies

    μ(t) = ∫_0^t λ(v) dv,  (B.3)

and, conversely,

    λ(t) = dμ(t)/dt.  (B.4)
Appendix B 369
The number of failures in an arbitrary interval (a, b] has a Poisson distribution with mean ∫_a^b λ(t) dt:

    P{N(b) − N(a) = n} = [∫_a^b λ(t) dt]^n e^{−∫_a^b λ(t) dt} / n!,  n = 0, 1, 2, ….  (B.5)
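Equation B.5 is easy to evaluate once the intensity can be integrated; the sketch below assumes the power-law mean value function μ(t) = α t^β as an example:

```python
from math import exp, factorial

def interval_prob(n, a, b, alpha, beta):
    """Equation B.5 for a power-law NHPP: the number of failures in (a, b]
    is Poisson with mean mu(b) - mu(a), where mu(t) = alpha * t**beta."""
    m = alpha * (b ** beta - a ** beta)  # integral of the intensity over (a, b]
    return m ** n * exp(-m) / factorial(n)
```

With β = 1 this reduces to the homogeneous case: the count in (a, b] is Poisson with mean α(b − a).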
The first form is the log-linear form with failure intensity

    λ_1(t) = e^{α_1 + β_1 t},  (B.6)

and the second form is the Weibull or power form with failure intensity

    λ_2(t) = α_2 β_2 t^{β_2 − 1}.  (B.7)
If the failure process follows the log-linear model and the testing data are truncated at the nth failure, the likelihood function is

    L(t_1, t_2, …, t_n; α_1, β_1) = exp(n α_1 + β_1 Σ_{i=1}^{n} t_i) exp{−(e^{α_1}/β_1)(e^{β_1 t_n} − 1)}.  (B.8)

The maximum-likelihood estimates α̂_1 and β̂_1 satisfy

    e^{α̂_1} = n β̂_1 / (e^{β̂_1 t_n} − 1),
    Σ_{i=1}^{n} t_i + n/β̂_1 = n t_n / (1 − e^{−β̂_1 t_n}).  (B.9)
If the failure process follows the Weibull process and testing data are truncated
at the nth failure, with 0 < t1 < t2 < ... < tn denoting the successive failure times,
the likelihood function is
    L(t_1, t_2, …, t_n; α_2, β_2) = α_2^n β_2^n exp(−α_2 t_n^{β_2}) Π_{i=1}^{n} t_i^{β_2 − 1}.  (B.10)
The maximum-likelihood estimates are

    β̂_2 = n / Σ_{i=1}^{n} ln(t_n/t_i),
    α̂_2 = n / t_n^{β̂_2}.  (B.11)
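The failure-truncated estimates of Equation B.11 can be computed directly (the failure times below are illustrative, not data from the text):

```python
from math import log

def weibull_mle(times):
    """Equation B.11: MLEs of the power-law (Weibull process) parameters
    from successive failure times truncated at the n-th failure."""
    n = len(times)
    t_n = times[-1]
    beta = n / sum(log(t_n / t_i) for t_i in times)  # the i = n term is zero
    alpha = n / t_n ** beta
    return alpha, beta

alpha_hat, beta_hat = weibull_mle([1.0, 2.0, 4.0, 8.0])
```

By construction the fitted mean value function satisfies μ̂(t_n) = α̂ t_n^β̂ = n, i.e., the model reproduces the observed number of failures at the truncation point.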
If the testing data are truncated at time T, the likelihood function is

    L(t_1, t_2, …, t_n; α_2, β_2) = α_2^n β_2^n exp(−α_2 T^{β_2}) Π_{i=1}^{n} t_i^{β_2 − 1},  (B.12)
and the maximum-likelihood estimates are

    β̂_2 = n / Σ_{i=1}^{n} ln(T/t_i),
    α̂_2 = n / T^{β̂_2}.  (B.13)
The Laplace trend test is a test for the null hypothesis of an HPP vs. the alternative
of a monotonic trend (Cox and Lewis 1966; Ascher and Feingold 1984).
The test statistic is

    U = [ (1/(n − 1)) Σ_{i=1}^{n−1} t_i − t_n/2 ] / [ t_n √(1/(12(n − 1))) ].  (B.14)
The null hypothesis "the process is an HPP" is rejected for too small or too large values of U: U < −z_{α/2} or U > z_{α/2}. Moreover, U > 0 indicates a deteriorating system, while U < 0 indicates an improving one.
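The Laplace statistic of Equation B.14 can be computed as follows (failure-truncated case):

```python
from math import sqrt

def laplace_u(times):
    """Equation B.14: Laplace trend statistic for failure times t_1 < ... < t_n
    truncated at the n-th failure; U is approximately N(0, 1) under the HPP
    hypothesis."""
    n = len(times)
    t_n = times[-1]
    mean_early = sum(times[:-1]) / (n - 1)
    return (mean_early - t_n / 2) / (t_n * sqrt(1 / (12 * (n - 1))))
```

Evenly spread failure times give U near zero, while times clustered late in the observation period (deterioration) give U > 0.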
The MIL-HDBK test statistic is

    V = 2 Σ_{i=1}^{n−1} ln(T_n/T_i).  (B.15)
    U_LR = U / CV.  (B.16)
    W² = Σ_{i=1}^{n−1} [U_i − (2i − 1)/(2(n − 1))]² + 1/(12(n − 1)).  (B.17)
    A² = −[ Σ_{i=1}^{n−1} (2i − 1)(ln U_i + ln(1 − U_{n−i})) ] / (n − 1) − n + 1.  (B.18)

Critical values of this goodness-of-fit statistic are calculated by Park and Kim (1992).
The Hartley test is based on results of Hartley (1950). The test uses the ratio of the maximum to the minimum of the time intervals between failures:

    h(n) = max{Δ_i} / min{Δ_i}.  (B.19)

The null hypothesis is rejected if h(n) > h_{1−α}(n). The critical values for this statistic are given in Gnedenko et al. (1969); for large n (n > 12) they may be calculated using Monte Carlo simulation.
Consider an NHPP with a log-linear or power-form intensity function. Parameter estimation is carried out by the maximum-likelihood method.

For the case of a known intensity function, testing the hypothesis that a given sample path is a realization of an NHPP can be carried out on the basis of the following well-known fact: under the NHPP model, the values of the mean value function computed at the ordered failure times are the failure times of an HPP with constant intensity 1. Therefore, the intervals between events of this HPP form a sample of i.i.d. standard exponential random variables, and one can use the goodness-of-fit tests discussed in the previous paragraph to check the exponentiality of the process.
Define the transformed event times

    W_i = ∫_0^{t_i} λ(t) dt,  i = 1, …, n.  (B.20)

In other words, events in the transformed time occur at the instants W_1, W_2, …, W_n.
The following fact is very important. Denote

    Δ_1 = W_1, Δ_2 = W_2 − W_1, …, Δ_n = W_n − W_{n−1}.

Then Δ_1, Δ_2, …, Δ_n are i.i.d. random variables with the standard exponential distribution. Hence, the NHPP in the transformed time becomes a Poisson process with intensity 1.
The above-mentioned fact may be used for testing the hypothesis that a given process is an NHPP with a known intensity function λ(t). Consider the interevent intervals Δ_1, Δ_2, …, Δ_n in the transformed time and check the hypothesis H*_0 that they are i.i.d. exponential random variables with parameter 1.

How does one check the hypothesis that the given process is an NHPP when the intensity function is not known in advance?
Suppose we observe a counting process {N(t), t > 0} in the interval [0, t_n], with events appearing at the times t_1, t_2, …, t_n. Carry out the estimation of λ(t) by the maximum-likelihood method, assuming either the log-linear or the power-law form of λ(t). We choose a suitable intensity function λ̂(t) according to the minimum of

    D = sup_{t>0} |μ̂(t) − μ*(t)|,  (B.21)
where

    μ̂(t) = ∫_0^t λ̂(v) dv  and  μ*(t) = { 0 for t < t_1;  i for t_i ≤ t < t_{i+1}, i = 1, …, n − 1;  n for t_n ≤ t }.  (B.22)
Step 1: Set j := 1.
Step 2: Simulate a sample path with n events of the NHPP with intensity function λ̂(t).
Step 3: Carry out the time transformation (B.20).
Step 4: Compute the values of the test statistics S_1, S_2, …, S_k, described in the previous paragraph, for this realization. Denote them as S_1^(j), S_2^(j), …, S_k^(j).
Step 5: Set j := j + 1. If j ≤ M, return to Step 2.

As a result, for each statistic we obtain M simulated values:

    {S_1^(1), S_1^(2), …, S_1^(M)}, {S_2^(1), S_2^(2), …, S_2^(M)}, …, {S_k^(1), S_k^(2), …, S_k^(M)}.
Determine the upper and lower α-critical values for these statistics. Denote them as S_1(α), S_1(1 − α); S_2(α), S_2(1 − α); …; S_k(α), S_k(1 − α).

For the given counting process {N(t), t ≥ 0} whose events occur at the instants (t_1, t_2, …, t_n), we perform the following operations:

1. Estimate λ(t).
2. Carry out the time transformation V_i = ∫_0^{t_i} λ̂(t) dt, i = 1, …, n, and compute the intervals

    Δ_1 = V_1, Δ_2 = V_2 − V_1, …, Δ_n = V_n − V_{n−1}.  (B.23)
3. For the sample (Δ_1, …, Δ_n), compute the values S_1*, S_2*, …, S_k* of the statistics S_1, S_2, …, S_k.
4. Compare S_1*, S_2*, …, S_k* to the upper and lower critical values calculated above.
5. Reject H*_0 if one of the statistics S_1*, S_2*, …, S_k* falls outside the corresponding interval [S_i(α), S_i(1 − α)].

Let us now consider well-known failure data and compare different authors' conclusions with the results obtained via our procedure.
We illustrate the presented methodology using data on the time intervals be-
tween successive failures of the air conditioning system of the Boeing 720 jet se-
ries 7912 (Proschan 1963). These data were analyzed by many researchers (in-
cluding Park and Kim 1991; Gaudoin et al. 2003). All authors claim that failure
data came indeed from an NHPP with a power-law intensity function. We came to
a similar conclusion. All test statistic values fall inside the corresponding [0.05,
0.95] simulated intervals for all of our statistics. Therefore, we would claim that
the data do not contradict the NHPP with power-law intensity function.
Crowder et al. (1991) give data on failures of an engine of USS Halfbeak. The data were fitted using log-linear and power-law intensity functions. Using the Laplace test statistic and the MIL-HDBK test statistic, the authors express doubts that the data set comes from an NHPP. Our tests reveal the following: using the power-law intensity function, three of eight statistics fall outside the corresponding [0.01, 0.99] simulated intervals. Using a log-linear intensity function, none of our criteria contradicts the NHPP hypothesis. Our conclusion is that the NHPP hypothesis is questionable.
The following data (Frenkel et al. 2004, 2005) summarize the time intervals in operating hours between failures of the Schlosser vibration machine, collected from operation reports dated from 1999 to 2002 at the Yeroham Construction Materials Facility (Israel): 240, 4032, 288, 1224, 624, 552, 2352, 168, 480, 1400, 408, 528, 888, 768, 336, 528, 72, 96, 88, 268, 84, 86, 96, 103, 456, 24, 120. The machine was observed for 16309 h and 27 failures were identified.
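For the power-law form λ(t) = αβt^(β−1), the time-terminated maximum-likelihood estimates have a well-known closed form (Crow 1974). The following Python sketch applies them to the failure times above purely for illustration; note that the book fits the log-linear form to these data, so this is not the authors' computation:

```python
import numpy as np

# inter-failure times (operating hours) from the operation reports
gaps = [240, 4032, 288, 1224, 624, 552, 2352, 168, 480, 1400, 408,
        528, 888, 768, 336, 528, 72, 96, 88, 268, 84, 86, 96, 103,
        456, 24, 120]
t = np.cumsum(gaps)          # cumulative failure times; t[-1] = 16309
T, n = 16309.0, len(gaps)    # observation time (h) and number of failures

# closed-form time-terminated ML estimates for the power-law intensity
# lambda(t) = alpha*beta*t**(beta - 1)
beta_hat = n / np.sum(np.log(T / t))
alpha_hat = n / T ** beta_hat    # beta_hat > 1 indicates deterioration
```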
The estimated intensity function is assumed to be log-linear; the maximum-likelihood estimates of its two parameters are 1.7992 and 2.4979. We then applied the testing procedure described above. All test statistic values fell inside the corresponding simulated intervals; therefore, we claim that the data do not contradict an NHPP with a log-linear intensity function.
References
Ascher H, Feingold H (1984) Repairable systems reliability. Marcel Dekker, New York
Cox DR, Lewis PAW (1966) The statistical analysis of series of events. Chapman and Hall, London
Crow L (1974) Reliability analysis for complex, repairable systems. In: Proschan F, Serfling RJ (eds) Reliability and biometry. SIAM, Philadelphia, pp 379–410
Crowder MJ, Kimber AC, Smith RL, Sweeting TJ (1991) Statistical analysis of reliability data. Chapman and Hall/CRC, Boca Raton, Florida
Frenkel IB, Gertsbakh IB, Khvatskin LV (2003) Parameter estimation and hypotheses testing for nonhomogeneous Poisson process. Transport and Telecommunication 4(2):9–17
Frenkel IB, Gertsbakh IB, Khvatskin LV (2004) Parameter estimation and hypotheses testing for nonhomogeneous Poisson process. Part 2. Numerical examples. Transport and Telecommunication 5(1):116–129
Frenkel IB, Gertsbakh IB, Khvatskin LV (2005) On the simulation approach to hypotheses testing for nonhomogeneous Poisson process. In: Book of abstracts of the International Workshop on Statistical Modelling and Inference in Life Sciences, September 14, 2005, Potsdam, Germany, pp 35–39
Gaudoin O, Yang B, Xie M (2003) A simple goodness-of-fit test for the power-law process, based on the Duane plot. IEEE Trans Reliab 52(1):69–74
Gertsbakh IB (2000) Reliability theory with applications to preventive maintenance. Springer, Berlin
Gnedenko BV, Belyaev YuK, Solovyev AD (1969) Mathematical methods of reliability theory. Academic Press, San Diego
Hartley HO (1950) The maximum F-ratio as a short-cut test of heterogeneity of variance. Biometrika 37:308–312
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Park WJ, Kim YG (1992) Goodness-of-fit tests for the power-law process. IEEE Trans Reliab 41(1):107–111
Proschan F (1963) Theoretical explanation of observed decreasing failure rate. Technometrics 5(3):375–383
Appendix C
MATLAB Codes for Examples and Case Study Calculation

The systems of differential equations in the examples are solved with the MATLAB ODE solver ode45, called as

[t,p]=ode45(@funcpdot,tspan,p0),

where funcpdot is the name of a function written to describe the system of differential equations, the vector tspan contains the starting and ending values of the independent variable t, and p0 is a vector of the initial values of the variables in the system of differential equations.
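For orientation, the same kind of computation can be sketched outside MATLAB with SciPy's solve_ivp; the two-state (up/down) Markov model and the rates below are hypothetical, chosen only to show the shape of the call:

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical two-state Markov model: state 1 = up, state 2 = down
lam, mu = 0.5, 2.0   # illustrative failure and repair rates, 1/h

def funcpdot(t, p):
    """Right-hand side of the state-probability equations,
    analogous to the func* files passed to ode45."""
    p1, p2 = p
    return [-lam * p1 + mu * p2,
            lam * p1 - mu * p2]

# integrate over [0, 8] h, starting from the "up" state
sol = solve_ivp(funcpdot, [0, 8], [1.0, 0.0])
A = sol.y[0]   # instantaneous availability A(t) = p1(t)
```

At t = 8 h the solution is essentially at the steady-state availability μ/(λ + μ).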
Solver Ex2_2
clear all;
p0=[0 0 0 1];
[t,p]=ode45(@funcEx2_2, [0 8], p0);
R1=1-p(:,1); R2=1-p(:,1)-p(:,2);
plot(t,p(:,1),'k-',t,p(:,2),'k--',t,p(:,3),'k-.', ...
    t,p(:,4),'k.',t,R1,'k*',t,R2,'kx');
figure (2);
Et=10*p(:,4)+8*p(:,3)+5*p(:,2)+0*p(:,1);
Dt=1*p(:,2)+6*p(:,1);
plot(t,Et,'k-',t,Dt,'k--');
Solver Ex2_3A
clear all;
p0=[0 0 0 1];
[t,p]=ode45(@funcEx2_3A, [0 0.1], p0);
A1=1-p(:,1); A2=1-p(:,1)-p(:,2);A3=p(:,4);
Et=100*p(:,4)+80*p(:,3)+50*p(:,2)+0*p(:,1);
Dt=10*p(:,2)+60*p(:,1);
plot(t,A1,'k-',t,A2,'k--',t,A3,'k-.');
figure (2);
plot(t,Et,'k-');
figure (3);
plot(t,Dt,'k-');
Solver Ex2_3B
clear all;
p0=[0 0 1];
[t,p]=ode45(@funcEx2_3B, [0 8], p0);
Rw=1-p(:,1);
plot(t,Rw);
-(Mu1_2_2+Mu1_2_1+Lambda3_2_3)*p(5)+Mu2_3_3*p(9);
f(6)=Lambda3_2_3*p(2)+Lambda2_1_1*p(4) ...
    -(Mu2_3_3+Mu1_2_1+Lambda2_1_2+Lambda2_1_3)*p(6) ...
    +Mu1_2_2*p(9)+Mu1_2_3*p(10);
f(7)=Lambda3_2_3*p(3)+Lambda2_1_2*p(4) ...
    -(Mu2_3_3+Mu1_2_2+Lambda2_1_1+Lambda2_1_3)*p(7) ...
    +Mu1_2_1*p(9)+Mu2_3_3*p(11);
f(8)=Lambda2_1_3*p(4)-(Mu1_2_3+Lambda2_1_1 ...
    +Lambda2_1_2)*p(8)+Mu1_2_1*p(10)+Mu1_2_2*p(11);
f(9)=Lambda3_2_3*p(5)+Lambda2_1_2*p(6) ...
    +Lambda2_1_1*p(7)-(Mu2_3_3+Mu1_2_2+Mu1_2_1 ...
    +Lambda2_1_3)*p(9)+Mu1_2_3*p(12);
f(10)=Lambda2_1_3*p(6)+Lambda2_1_1*p(8)- ...
    (Mu1_2_3+Mu1_2_1+Lambda2_1_2)*p(10)+Mu1_2_2*p(12);
f(11)=Lambda2_1_3*p(7)+Lambda2_1_2*p(8)- ...
    (Mu1_2_3+Mu1_2_2+Lambda2_1_1)*p(11)+Mu1_2_1*p(12);
f(12)=Lambda2_1_3*p(9)+Lambda2_1_2*p(10) ...
    +Lambda2_1_1*p(11)-(Mu1_2_3+Mu1_2_2+Mu1_2_1)*p(12);
Solver Ex2_4A
clear all;
p0=[1 0 0 0 0 0 0 0 0 0 0 0];
[t,p]=ode45(@funcEx2_4A, [0 0.2], p0);
PrG0=p(:,5)+p(:,8)+p(:,9)+p(:,10)+p(:,11)+p(:,12);
PrG1_5=p(:,3)+p(:,7); PrG1_8=p(:,4)+p(:,6);
PrG2_0=p(:,2); PrG3_5=p(:,1);
A=p(:,1)+p(:,2)+p(:,3)+p(:,4)+p(:,6)+p(:,7);
Et=3.5*p(:,1)+2*p(:,2)+1.5*p(:,3)+1.8*p(:,4)
+1.8*p(:,6)+1.5*p(:,7);
Dt=p(:,5)+p(:,8)+p(:,9)+p(:,10)+p(:,11)+p(:,12);
plot(t,Et,'k-');
figure(2);
plot(t,Dt,'k-');
f(1)=Lambda2_1_2*p(3)+Lambda2_1_1*p(4) ...
    +Lambda2_1_3*p(5)+(Lambda2_1_2+Lambda2_1_3)*p(6) ...
    +(Lambda2_1_1+Lambda2_1_3)*p(7);
f(2)=-(Lambda2_1_1+Lambda2_1_2+Lambda3_2_3)*p(2) ...
    +Mu1_2_1*p(3)+Mu1_2_2*p(4)+Mu2_3_3*p(5);
f(3)=Lambda2_1_1*p(2)-(Mu1_2_1+Lambda2_1_2 ...
    +Lambda3_2_3)*p(3)+Mu2_3_3*p(6);
f(4)=Lambda2_1_2*p(2)-(Mu1_2_2+Lambda2_1_1 ...
    +Lambda3_2_3)*p(4)+Mu2_3_3*p(7);
f(5)=Lambda3_2_3*p(2)-(Mu2_3_3+Lambda2_1_1+Lambda2_1_2 ...
    +Lambda2_1_3)*p(5)+Mu1_2_1*p(6)+Mu1_2_2*p(7);
f(6)=Lambda3_2_3*p(3)+Lambda2_1_1*p(5) ...
    -(Mu2_3_3+Mu1_2_1+Lambda2_1_2+Lambda2_1_3)*p(6);
f(7)=Lambda3_2_3*p(4)+Lambda2_1_2*p(5) ...
    -(Mu2_3_3+Mu1_2_2+Lambda2_1_1+Lambda2_1_3)*p(7);
Solver Ex2_4B
clear all;
p0=[0 1 0 0 0 0 0];
[t,p]=ode45(@funcEx2_4B, [0 1], p0);
R=1-p(:,1);
plot(t,R,'k-');
(Lambda_star+2*Mu+Lambda_d)*V(4)+2*Mu*V(5) ...
    +Lambda_d*V(10);
f(5)=1+Lambda_star*V(3)+Lambda*V(4)- ...
    (Lambda+Lambda_star+Mu+Lambda_d)*V(5)+ ...
    Mu*V(6)+Lambda_d*V(11);
f(6)=1+2*Lambda*V(5)-(2*Lambda+Lambda_d)*V(6)+ ...
    Lambda_d*V(12);
f(7)=Lambda_N*V(1)-(2*Mu+Mu_star+Lambda_N)*V(7)+ ...
    2*Mu*V(9)+Mu_star*V(10);
f(8)=1+Lambda_N*V(2)-(2*Lambda+Mu_star+Lambda_N)*V(8)+ ...
    2*Lambda*V(9)+Mu_star*V(12);
f(9)=1+Lambda_N*V(3)+Lambda*V(7)+Mu*V(8)- ...
    (Lambda+Mu+Mu_star+Lambda_N)*V(9)+Mu_star*V(11);
f(10)=1+Lambda_N*V(4)+Lambda_star*V(7)- ...
    (Lambda_star+2*Mu+Lambda_N)*V(10)+2*Mu*V(11);
f(11)=1+Lambda_N*V(5)+Lambda_star*V(9)+Lambda*V(10)- ...
    (Lambda+Lambda_star+Mu+Lambda_N)*V(11)+Mu*V(12);
f(12)=1+Lambda_N*V(6)+2*Lambda*V(11)- ...
    (2*Lambda+Lambda_N)*V(12);
b_A=a_A(:,1);
c_A(i)=A(b_A);
i_A(i)=i;
Av(i)=0.999;
i_Av(i)=i;
end
plot(i_A,c_A,'k-',i_Av, Av,'k--');
Lambda_d*V(9);
f(5)=1+Lambda_N*V(2)-(2*Lambda+Mu_star+Lambda_N)*V(5)+ ...
    2*Lambda*V(6)+Mu_star*V(9);
f(6)=1+(Lambda+Lambda_N)*V(1)+Mu*V(5)- ...
    (Lambda+Mu+Mu_star+Lambda_N)*V(6)+Mu_star*V(8);
f(7)=1+(Lambda_star+Lambda_N)*V(1)- ...
    (Lambda_star+2*Mu+Lambda_N)*V(7)+2*Mu*V(8);
f(8)=1+Lambda_N*V(3)+Lambda_star*V(6)+Lambda*V(7)- ...
    (Lambda+Lambda_star+Mu+Lambda_N)*V(8)+Mu*V(9);
f(9)=1+Lambda_N*V(4)+2*Lambda*V(8)- ...
    (2*Lambda+Lambda_N)*V(9);
Solver CondRELINT
clear all;
global Lambda Lambda_star Mu Mu_star Lambda_d Lambda_N;
Lambda=3; Lambda_star=10;
Lambda_d=1251; Lambda_N=515.3;
i=0;
for Mu=500:-50:50
Mu_star=Mu;
i=i+1;
V0=[0 0 0 0 0 0 0 0 0];
[t,V]=ode45(@funcCondRELINT,[0 1],V0);
R=1-V(:,9);
a_R=size(R);
b_R=a_R(:,1);
c_R(i)=R(b_R);
i_R(i)=i;
end
plot(i_R,c_R,'k-');
f(1)=0;
f(2)=-Mu34*V(2)+Mu34*V(3);
f(3)=Lambda41+Lambda42NH+(Lambda41+Lambda42NH)*V(1)+ ...
    Lambda43*V(2)-(Lambda41+Lambda42NH+Lambda43)*V(3);
Index

A
Absorbing state, 45
Acceptability function, 16, 17
Acceptable and unacceptable states, 16, 17
Accumulated performance deficiency, 279
Aging multi-state systems, 273
Anderson–Darling test, 372
Average accumulated performance deficiency, 164
Average expected output performance, 164

C
Chapman–Kolmogorov equation, 36, 42
Chromosome structure, 302
Coherency, 15
Combined performance-demand model, 86
Composition operator, 155, 167
Confidence coefficient, 126
Confidence interval, 120, 126
Confidence limits, 126
Connection of elements
  bridge, 178
  parallel, 173
  series, 170
  series-parallel, 175
Consistency, 118
Cramér–von Mises test, 371

D
Demand availability, 21
Discrete-state process, 30
Discrete-time Markov chains, 34
Distribution of the stochastic process
  first-order, 31
  nth-order joint, 31

E
Efficiency, 119
Embedded Markov chain, 100, 135, 137
Ergodic process, 46
Estimate, 118
Estimator, 118
Expected accumulated performance deficiency, 92

F
Failure criteria, 16, 17
Failure-terminated test, 128
  with replacement, 128
  without replacement, 129
Failure time
  censored on the left, 128
  censored on the right, 127
First-order distribution, 31
Flow transmission MSS, 178
Frequency of failures, 92
Fuzzy multi-state monotone system, 333
Fuzzy multi-state system, 321
Fuzzy UGF, 336

G
Generic MSS model, 10, 48, 67
Generalized universal generating

L
Laplace–Stieltjes transform, 50
Laplace trend test, 370
Lewis–Robinson test for trend, 371
Life cycle cost (LCC), 238
Loss of load probability (LOLP), 23

M
Maintenance contract, 291, 292
Maintenance optimization, 310, 312
Markov model for multi-state element, 203
Markov process, 32
Markov reward model, 79
Maximum-likelihood method, 122
Mean accumulated reward, 92
Mean conditional sojourn time, 102
Mean time between failures (MTBF), 20
Mean time of staying in state, 46, 47
Mean time to failure (MTTF), 92, 280
Mean total number of system failures, 277
Mean unconditional sojourn time, 102

O
ODE solvers, 377
One-step transition probabilities, 35
One-step transition probabilities for embedded Markov chain, 102
Optimal corrective maintenance contract planning, 299, 302
Optimal preventive replacement policy, 310
Optimization, 302

P
Performance rate, 1, 8, 10
Point estimation, 120
Point process, 33
Poisson process, 33
Probability of system failure, 93, 280
Property of estimators, 118
Property of invariance, 32

R
Redundancy, 214
Relevancy of system elements, 14
Reliability-associated cost, 242
Reliability function, 45, 59
Reliability indices, 70, 79, 90, 105
Reliability measures, 18
Renewal process, 33
Repairable multi-state element, 48, 57, 59
Reward, 79

S
Semi-Markov model, 99, 204
Sequencing problems, 361
State frequency, 46
State probabilities, 32, 36, 42
Statistical estimation theory, 118
Stochastic matrix, 35
Sufficiency, 119
System availability, 163
System sojourn time, 135

T
Task-processing MSS, 178
Time between failures, 20
Time-terminated test, 128
  with replacement, 128
  without replacement, 129
Time to failure, 19
Transition intensity, 41
Transition probability function, 35

U
u-function, 159
UGF of parallel systems, 173
UGF of series systems, 170
UGF of series-parallel systems, 175
UGF of systems with bridge structure, 178
Unbiasedness, 119
Universal generating function (UGF), 155
Universal generating operator, 154

Z
z-transform, 149, 151, 158