Multi-state System
Reliability Analysis and
Optimization for Engineers
and Industrial Managers
Anatoly Lisnianski, PhD
The Israel Electric Corporation Ltd
Planning, Development and Technology Division
The System Reliability Department
New Office Building, Bialik/Basel Sts.
St. Nativ haor 1
Haifa, P.O. Box 10
Israel
anatoly-l@iec.co.il
lisnians@zahav.net.il

Ilia Frenkel, PhD
Shamoon College of Engineering
Industrial Engineering and Management Department
Center for Reliability and Risk Management
Beer Sheva 84100
Israel
iliaf@sce.ac.il

Yi Ding, PhD
Nanyang Technological University
School of Electrical and Electronic Engineering
Division of Power Engineering
Singapore
dingyi@ntu.edu.sg
MATLAB is a registered trademark of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA,
01760-2098 USA, www.mathworks.com
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be
reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of
the publishers, or in the case of reprographic reproduction in accordance with the terms of licences
issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms
should be sent to the publishers.
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of
a specific statement, that such names are exempt from the relevant laws and regulations and therefore
free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the
information contained in this book and cannot accept any legal responsibility or liability for any errors
or omissions that may be made.
To my wife Tania
Ilia Frenkel
Most books on reliability theory are devoted to traditional binary reliability mod-
els allowing for only two possible states for a system and for its components: per-
fect functionality (up) and complete failure (down). Many real-world systems are
composed of multi-state components that have different performance levels and
several failure modes with various effects on the entire system performance. Such
systems are called multi-state systems (MSSs). Examples of MSS are power sys-
tems, communication systems, and computer systems where the system perform-
ance is characterized by generating capacity, communication, or data processing
speed, respectively. In real-world problems of MSS reliability analysis, the great
number of system states that need to be evaluated makes it difficult to use
traditional binary reliability techniques. Since the mid-1970s and up to the present
day, numerous research studies focusing on MSS reliability have been published.
This book is the second devoted to MSS reliability. The first book on MSS
reliability and optimization, Multi-State System Reliability: Assessment,
Optimization and Applications by A. Lisnianski and G. Levitin, was published in
2003 by World Scientific. Almost seven years have passed, and the MSS extension
of classical binary-state reliability theory has been intensively developed during
this time. More than 100 new scientific papers in the field have been published
since then; special sessions devoted to MSS reliability have been organized at
international reliability conferences (Mathematical Methods in Reliability (MMR),
the European Safety and Reliability Conference (ESREL), etc.). Additional
experience has also been gathered from industrial settings. Thus, MSS reliability
has recently emerged as a valid field not only for scientists and researchers, but
also for engineers and industrial managers.
The aim of this book is to provide a comprehensive, up-to-date presentation of
MSS reliability theory based on current achievements in this field and to present a
variety of significant case studies that are interesting for both engineers and indus-
trial managers.
New theoretical issues not presented previously, including combined
random-process methods and the universal generating function technique,
statistical data processing for MSS, reliability analysis of aging MSS, methods for
calculating the reliability-associated cost of MSS, fuzzy MSS, etc., are described.
The book presents important practical problems such as life cycle cost analysis
and optimal decision making (redundancy and maintenance optimization, optimal
We would like to express our sincere appreciation to our teachers and friends
Prof. Igor Ushakov, founder of the international group on reliability, the Gnedenko
e-Forum, and Prof. Eliyahu Gertsbakh from Ben-Gurion University, Israel. Their
works and ideas have had a great impact on our book. We would like to thank our
colleagues Dr. L. Khvatskin from SCE (Shamoon College of Engineering), Israel;
Dr. G. Levitin, Dr. D. Elmakis, Dr. H. Ben-Haim, and Dr. D. Laredo from the
Israel Electric Corporation; Prof. M. Zuo from the University of Alberta, Canada;
and Prof. L. Goel and Prof. P. Wang from Nanyang Technological University,
Singapore, for their friendly support and for the discussions from which this book
benefited.
We would also like to thank SCE (Shamoon College of Engineering), Israel, and
its president, Prof. J. Haddad, as well as the SCE Industrial Engineering and
Management Department and its dean, Prof. Z. Laslo, for providing a supportive
and intellectually stimulating environment. We also thank the Internal Funding
Program of SCE for partially supporting our research work.
It was a pleasure working with the Springer senior editorial assistant, Ms.
Claire Protherough.
Anatoly Lisnianski
Israel Electric Corporation Limited, Haifa, Israel
Ilia Frenkel
SCE (Shamoon College of Engineering), Beer Sheva, Israel
Yi Ding
Nanyang Technological University, Singapore
December 2009
1 Multi-state Systems in Nature and in Engineering

1.1 Multi-state Systems in the Real World: General Concepts
All systems are designed to perform their intended tasks in a given environment.
Some systems can perform their tasks with various distinctive levels of efficiency
usually referred to as performance rates. A system that can have a finite number
of performance rates is called a multi-state system (MSS). Usually a MSS is com-
posed of elements that in their turn can be multi-state. Actually, a binary system is
the simplest case of a MSS having two distinctive states (perfect functioning and
complete failure).
The basic concepts of MSS reliability were first introduced in the mid-1970s by
Murchland (1975), El-Neweihi et al. (1978), Barlow and Wu (1978), and Ross
(1979). Natvig (1982), Block and Savits (1982), and Hudson and Kapur (1982)
extended the results obtained in these works. Since that time, MSS reliability
theory has undergone intensive development. The essential achievements attained
up to the mid-1980s are reflected in Natvig (1985) and in El-Neweihi and
Proschan (1984), where the state of the art in the field of MSS reliability at that
stage can be found. Readers interested in the history of ideas in MSS reliability
theory at later stages can find the corresponding overviews in Lisnianski and
Levitin (2003) and Natvig (2007).
In practice there are many different situations in which a system should be con-
sidered a MSS:
Any system consisting of different binary-state units that have a cumulative
effect on the entire system performance has to be considered a MSS. Indeed, the
performance rate of such a system depends on the availability of its units, as
different numbers of available units can provide different levels of task
performance. The simplest example of such a situation is the well-known
k-out-of-n system. These systems consist of n identical binary units and can have
n + 1 states depending on the number of available units. The system performance
rate is assumed to be proportional to the number of available units, and
performance rates corresponding to more than k − 1 available units are assumed to
be acceptable. When the contributions of different units to the cumulative system
performance rate are different, the number of possible MSS states grows
dramatically, as different combinations of k available units can provide different
performance rates for the entire system.
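The k-out-of-n performance distribution just described can be sketched in a few lines. The code below is an illustrative sketch, not from the book: it assumes n identical units, each available with probability p and each contributing g_unit to the cumulative performance, so the number of available units is binomially distributed over the n + 1 system states.

```python
from math import comb

def k_out_of_n_distribution(n, p, g_unit):
    """Performance distribution of an n-unit system whose identical binary
    units are each available with probability p and contribute g_unit to
    the cumulative performance: the number of available units is binomial,
    giving n + 1 system states."""
    return {m * g_unit: comb(n, m) * p**m * (1 - p)**(n - m)
            for m in range(n + 1)}

def acceptability(dist, k, g_unit):
    """Probability that the performance corresponds to at least k
    available units (the acceptable states of a k-out-of-n system)."""
    return sum(pr for perf, pr in dist.items() if perf >= k * g_unit - 1e-12)

dist = k_out_of_n_distribution(n=3, p=0.9, g_unit=1.0)
print(dist)                                # states 0.0, 1.0, 2.0, 3.0
print(acceptability(dist, k=2, g_unit=1.0))
```

With n = 3 and p = 0.9 this reproduces the 2-out-of-3 case used later in Example 1.1: four system states, with acceptable states (at least two units up) having total probability 0.972.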
The performance rate of elements composing a system can also vary as a result
of their deterioration (fatigue, partial failures) or because of variable ambient
conditions. Element failures can lead to the degradation of the entire MSS per-
formance.
In general, the performance rate of any element can range from perfect func-
tioning up to complete failure. The failures that lead to a decrease in the element
performance are called partial failures. After partial failure, elements continue to
operate at reduced performance rates, and after complete failure the elements are
totally unable to perform their tasks.
Consider the following examples of MSSs:
1. In a power supply system consisting of generating and transmitting facilities,
each generating unit can function at different levels of capacity. Generating
units are complex assemblies of many parts. The failures of different parts may
lead to situations in which the generating unit continues to operate, but at a re-
duced capacity. This can occur during outages of several auxiliaries such as
pulverizers, water pumps, fans, etc. For example, Billinton and Allan (1996)
describe a three-state 50 MW generating unit. The performance rates (generat-
ing capacity) corresponding to these states and probabilities of the states are
presented in Table 1.1.
2. Recently, multi-state models have also been used in medicine (Giard et al. 2002;
van den Hout and Matthews 2008; Marshall and Jones 2007; Putter et al. 2007).
Van den Hout and Matthews (2008) consider cognitive ability during old age. An
illness-death model is presented in order to describe the progression of an illness
over time. The model considers three states: the healthy state, an illness state, and
the death state. The model is used to derive the probability of a transition from one
state to another within a specified time interval.
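The illness-death model can be sketched as a discrete-time Markov chain. The transition probabilities below are purely hypothetical (the cited papers use continuous-time models fitted to data); the sketch only illustrates how the probability of a transition within a specified interval follows from matrix powers.

```python
import numpy as np

# Three states of the illness-death model: 0 = healthy, 1 = ill, 2 = dead.
# These monthly transition probabilities are purely illustrative.
P = np.array([[0.94, 0.05, 0.01],
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])   # death is an absorbing state

def transition_prob(P, start, end, steps):
    """Probability of moving from `start` to `end` within `steps` time
    units, read off the `steps`-step transition matrix P^steps."""
    return np.linalg.matrix_power(P, steps)[start, end]

# Probability that a healthy subject has died within 12 months:
print(transition_prob(P, 0, 2, 12))
```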
3. As a next example, consider a wireless communication system consisting of
transmission stations. The state of each station is defined by the number of sub-
sequent stations covered in its range. This number depends not only on the
availability of station amplifiers, but also on the conditions for signal propaga-
tion that depend on weather, solar activity, etc.
The amount of coal supplied to the boilers at each time unit proceeds consecu-
tively through each element. The feeders and the stacker-reclaimer can have
two states: working with nominal throughput and total failure. The throughput
of the sets of conveyors (primary and secondary) can vary depending on the
availability of individual two-state conveyors.
5. Another category of the MSS is a task processing system for which the per-
formance measure is characterized by an operation time (processing speed).
This category may include control systems, information or data processing sys-
tems, manufacturing systems with constrained operation time, etc. The opera-
tion of these systems is associated with consecutive discrete actions performed
by the ordered line of elements. The total system operation time is equal to the
sum of the operation times of all of its elements. When one measures the ele-
ment (system) performance in terms of processing speed (reciprocal to the op-
eration time), the total failure corresponds to a performance rate of 0. If at least
one system element is in a state of total failure, the entire system also fails
completely. Indeed, the total failure of the element corresponds to its process-
ing speed equal to 0, which is equivalent to an infinite operation time. In this
case, the operation time of the entire system is also infinite. An example of the
task processing series system (Lisnianski and Levitin 2003) is a manipulator
control system (Figure 1.2) consisting of:
The system performance is measured by the speed of its response to the occur-
ring events. This speed is determined by the sum of the times needed for each
element to perform its task (from initial detection of the event to the comple-
tion of the manipulator actuators performance). The time of data transmission
also depends on the availability of channels, and the time of data processing
depends on the availability of the processors as well as on the complexity of
the image. The system reliability is defined as its ability to react within a
specified time during an operation period.
6. Consider the local power supply system presented in Figure 1.3 (Lisnianski and
Levitin 2003). The system is aimed at supplying a common load. It consists of
two spatially separated components containing generators and two spatially
separated components containing transformers. Generators and transformers of
different capacities within each component are connected by a common bus
bar. To provide interchangeability of the components, bus bars of the genera-
tors are connected by a group of cables. The system output capacity (perform-
ance) must be no less than a specified load level (demand).
8. The most commonly used refrigeration system for supermarkets today is the
multiplex direct expansion system (Baxter 2002, IEA Annex 26 2003). All dis-
play cases and cold storerooms use direct-expansion air-refrigerant coils that
are connected to the system compressors in a remote machine room located in
the back or on the roof of the store. Heat rejection is usually done with air-
cooled condensers with simultaneously working axial blowers mounted outside.
Evaporative condensers can be used as well and will reduce condensing tem-
perature and system energy consumption. Figure 1.5 shows the major elements
of a multiplex refrigeration system. Multiple compressors operating at the same
saturated suction temperature are mounted on a skid, or rack, and are piped
with common suction and discharge refrigeration lines.
Using multiple compressors in parallel provides a means of capacity control,
since the compressors can be selected and cycled as needed to meet the
refrigeration load. A fault in a single unit or item of machinery cannot have
detrimental effects on the entire store; it only decreases the system cooling
capacity. Failure of a compressor or an axial condenser blower leads to partial
system failure (reduced output cooling capacity) rather than to complete failure of
the system. We can therefore treat a refrigeration system as a MSS, where the
system has a finite number of states.
Consider the refrigeration system used in a supermarket. The system consists
of four compressors situated in the machine room and two main axial condenser
blowers. It is possible to add one reserve blower. The reserve blower begins to
work only when one of the main blowers has failed.
So, the entire refrigerating system has the following output performance levels:
• Full performance: 10.5 × 10⁹ BTU per year.
• When one of the compressors fails, the performance is reduced to
7.9 × 10⁹ BTU per year.
• When two compressors fail, the performance is reduced to
5.2 × 10⁹ BTU per year.
• When three compressors fail, the performance is reduced to
2.6 × 10⁹ BTU per year.
• Failure of one blower reduces the performance to 5.2 × 10⁹ BTU per year.
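These levels can be reproduced with a small model. The mapping below is a sketch under stated assumptions: each of the four compressors is assumed to carry a quarter of the full 10.5 × 10⁹ BTU/year capacity, and a single working blower is assumed to cap the output at 5.2 × 10⁹; neither rule is stated explicitly in the text.

```python
def cooling_capacity(compressors_up, main_blowers_up, reserve_ok=True):
    """Output cooling capacity (in 10^9 BTU/year) of the supermarket
    refrigeration system: 4 compressors, 2 main axial blowers, and one
    reserve blower that starts only when a main blower has failed."""
    FULL = 10.5
    # The reserve blower (if operable) replaces one failed main blower.
    blowers = main_blowers_up + (1 if reserve_ok and main_blowers_up < 2 else 0)
    if blowers >= 2:
        blower_cap = FULL        # full heat rejection available
    elif blowers == 1:
        blower_cap = 5.2         # assumed cap with a single blower
    else:
        blower_cap = 0.0
    # Each compressor is assumed to carry a quarter of the full load.
    return round(min(compressors_up / 4 * FULL, blower_cap), 1)

print(cooling_capacity(4, 2))                    # 10.5: full performance
print(cooling_capacity(3, 2))                    # 7.9: one compressor failed
print(cooling_capacity(4, 1, reserve_ok=False))  # 5.2: one blower lost
```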
It consists of two identical stations: one of them covers the sector from 0° to
110° and the other covers the sector from 70° to 180°. The MSS performance
measure is the probability of successfully revealing a target. The probability of
revealing a target by one station is psuc = 0.9. In the overlapping zone the
probability is

Psuc = 1 − (1 − psuc)² = 0.99.

Thus, the entire airport radar system has the following performance levels. If
both radars are available, the MSS output performance is
g2 = (40/180) × 0.99 + (140/180) × 0.9 = 0.92. If only one radar is available, the
MSS performance is g1 = (110/180) × 0.9 = 0.55. If both radars are unavailable,
the MSS performance is g0 = 0.
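These performance levels can be checked numerically; the sector widths (the 40° overlap between 70° and 110°, and the 110° single-station coverage) and psuc = 0.9 are taken from the example.

```python
def radar_performance(stations_up):
    """Sector-averaged probability of revealing a target for the
    two-station airport radar example (180-degree sector, 40-degree
    overlap, 110 degrees covered by a single station)."""
    p_suc = 0.9
    P_overlap = 1 - (1 - p_suc) ** 2       # both stations see the target
    if stations_up == 2:
        return (40 / 180) * P_overlap + (140 / 180) * p_suc
    if stations_up == 1:
        return (110 / 180) * p_suc
    return 0.0

for n in (2, 1, 0):
    print('g%d = %.2f' % (n, radar_performance(n)))
```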
Additional interesting examples can be found in Natvig and Mørch (2003),
Levitin (2005), and Kuo and Zuo (2003). Natvig and Mørch (2003) presented a
detailed investigation of a gas pipeline network. Levitin (2005), Kuo and Zuo
(2003), Nordmann and Pham (1999), and Zuo and Liang (1994) considered special
types of MSS such as weighted voting systems, multi-state consecutively
connected systems, and sliding window systems. Kolowrocki (2004) describes
some types of communication lines and rope transportation systems.
1.2 Main Definitions and Properties

In order to analyze MSS behavior one has to know the characteristics of its
elements. Any system element j can have kj different states corresponding to its
performance rates, represented by the set

gj = {gj1, gj2, …, gjkj},

where gji is the performance rate of element j in state i, i ∈ {1, 2, …, kj}.
The performance rate Gj(t) of element j at any instant t ≥ 0 is a random
variable that takes its values from gj: Gj(t) ∈ gj. Therefore, for the time interval
[0, T], where T is the MSS operation period, the performance rate of element j is
defined as a stochastic process.
In some cases the element performance cannot be measured by a single value;
more complex mathematical objects, usually vectors, are used. In these cases the
element performance is defined as a vector stochastic process Gj(t).
The probabilities associated with the different states (performance rates) of the
system element j at any instant t can be represented by the set

pj(t) = {pj1(t), pj2(t), …, pjkj(t)},   (1.1)

where

pji(t) = Pr{Gj(t) = gji}.   (1.2)
1.2 Main Definitions and Properties 9
Note that since the element states compose a complete group of mutually
exclusive events (meaning that element j can always be in one and only one of its
kj states),

∑_{i=1}^{kj} pji(t) = 1, for any t: 0 ≤ t ≤ T.
Expression (1.2) defines the probability mass function (pmf) for the discrete
random variable Gj(t) at any instant t. The collection of pairs (gji, pji(t)),
i = 1, 2, …, kj, completely determines the probability distribution of the
performance of element j at any instant t.
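A performance distribution at a fixed instant t can be stored as two parallel lists of performance rates and state probabilities. The three-state element below is hypothetical; the snippet only illustrates the complete-group property and what the pairs (gji, pji(t)) determine.

```python
# Hypothetical three-state element: performance rates g_ji and the state
# probabilities p_ji(t) at some fixed instant t (e.g., tons per minute).
g = [0.0, 1.8, 4.0]      # g_j1, g_j2, g_j3
p = [0.05, 0.25, 0.70]   # p_j1(t), p_j2(t), p_j3(t)

# The states form a complete group of mutually exclusive events:
assert abs(sum(p) - 1.0) < 1e-12

# The pairs (g_ji, p_ji(t)) fully determine the pmf of G_j(t); for
# instance, they give the expected performance at instant t:
mean_performance = sum(gi * pi for gi, pi in zip(g, p))
print(mean_performance)
```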
Observe that the behavior of binary elements (elements with only total failures)
can also be represented by a performance distribution (PD). Indeed, consider a
binary element b with a nominal performance g* (the performance rate
corresponding to the fully operable state) and the probability p(t) that the element
is in the fully operable state. Assuming that the performance rate of the element in
the state of complete failure is 0, one obtains its PD as follows: gb = {0, g*},
pb(t) = {1 − p(t), p(t)}.
Fig. 1.7 Cumulative performance curves for steady-state behavior of multi-state elements
Definition 1.1 Let Ln = {g11, …, g1k1} × {g21, …, g2k2} × … × {gn1, …, gnkn} be
the space of possible combinations of performance rates for all of the MSS
elements and M = {g1, …, gK} be the space of possible values of the performance
rate for the entire system.
The transform f(G1(t), …, Gn(t)): Ln → M, which maps the space of the
elements' performance rates into the space of the system's performance rates, is
called the MSS structure function.
Note that the MSS structure function is an extension of the binary structure
function. The only difference is in the definition of the state spaces: the binary
structure function maps {0, 1}^n → {0, 1}, while in the MSS one deals with much
more complex spaces.
Now we can define a generic model of the MSS.
This generic MSS model should include models of the performance stochastic
processes

Gj(t), j = 1, 2, …, n,   (1.3)

for each system element j and of the system structure function that produces the
stochastic process corresponding to the output performance of the entire MSS:

G(t) = f(G1(t), …, Gn(t)).   (1.4)

The model can thus be defined by the performance distributions of the
individual elements

gj, pj(t), 1 ≤ j ≤ n,   (1.5)

and the system structure function

G(t) = f(G1(t), …, Gn(t)).   (1.6)
It also does not matter how the structure function is defined. It can be
represented in a table, in analytical form, or as an algorithm for unambiguously
determining the system performance G(t) for any given set {G1(t), …, Gn(t)}.
Below we will consider examples of some possible representations of MSS
structure functions.
Example 1.1 Consider a 2-out-of-3 MSS. This system consists of three binary
elements with performance rates Gi(t) ∈ {gi1, gi2} = {0, 1}, i = 1, 2, 3, where
gi1 = 0 if element i is in a state of complete failure, and gi2 = 1 if element i
functions perfectly.
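The structure function of Example 1.1 is not written out above; a natural choice, used here as an assumption, is that the system is up (output 1) if and only if at least two of the three binary elements are up.

```python
from itertools import product

def f_2_of_3(g1, g2, g3):
    """Structure function of the 2-out-of-3 MSS (sketch): the system is up
    (output 1) iff at least two of the three binary elements are up."""
    return 1 if g1 + g2 + g3 >= 2 else 0

# Tabulate the structure function over all 2^3 element-state combinations.
for state in product((0, 1), repeat=3):
    print(state, '->', f_2_of_3(*state))
```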
Example 1.2 Consider a flow transmission system [Figure 1.8 (a)] consisting of
three pipes (Lisnianski and Levitin 2003).
Fig. 1.8 Two different MSSs with identical structure functions
The oil flow is transmitted from point C to point E. The pipes' performance is
measured by their transmission capacity (tons per minute). Elements 1 and 2 are
binary. A state of total failure for both elements corresponds to a transmission
capacity of 0, and the operational states correspond to capacities of 1.5 and 2 tons
per minute, respectively, so that G1(t) ∈ {0, 1.5}, G2(t) ∈ {0, 2}. Element 3 can
be in one of three states: a state of total failure corresponding to a capacity of 0, a
state of partial failure corresponding to a capacity of 1.8 tons per minute, and a
fully operational state with a capacity of 4 tons per minute, so that
G3(t) ∈ {0, 1.8, 4}. The system output performance rate is defined as the
maximum flow that can be transmitted from C to E.
The total flow between points C and D through parallel pipes 1 and 2 is equal
to the sum of the flows through each of these pipes. The flow from point D to
point E is limited by the transmission capacity of element 3. On the other hand,
this flow cannot be greater than the flow between points C and D. Therefore, the
flow between points C and E (the system performance) is

G(t) = f(G1(t), G2(t), G3(t)) = min{G1(t) + G2(t), G3(t)}.
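Enumerating the 2 × 2 × 3 element-state combinations of this flow system yields its distinct performance levels:

```python
from itertools import product

# Element state spaces of the flow transmission system (tons per minute).
g1, g2, g3 = [0, 1.5], [0, 2], [0, 1.8, 4]

def f(x1, x2, x3):
    """Structure function: the flow C->E is the parallel flow through
    pipes 1 and 2, limited by the capacity of pipe 3."""
    return min(x1 + x2, x3)

# Enumerate all 2 * 2 * 3 = 12 element-state combinations.
levels = sorted({f(*state) for state in product(g1, g2, g3)})
print(levels)   # [0, 1.5, 1.8, 2, 3.5]
```

Twelve combinations of element states collapse into only five distinct system performance levels, which is typical of how a MSS state space is built from element states.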
Example 1.3 Consider a data transmission system [Figure 1.8 (b)] consisting of
three fully reliable network servers and three data transmission channels (ele-
ments). The data can be transmitted from server C to server E through server D or
directly. The time of data transmission between the servers depends on the state of
the corresponding channel and is considered to be the channel performance rate.
This time is measured in seconds.
Elements 1 and 2 are binary. They may be in a state of total failure in which
data transmission is impossible; in this case the data transmission time is formally
defined as ∞. They may also be in a fully operational state, in which they provide
data transmission in 1.5 s and 2 s, respectively: G1(t) ∈ {∞, 1.5}, G2(t) ∈ {∞, 2}.
Element 3 can be in one of three states: a state of total failure, a state of partial
failure with data transmission in 4 s, and a fully operational state with data
transmission in 1.8 s: G3(t) ∈ {∞, 4, 1.8}. The system performance rate is defined
as the total time in which the data can be transmitted from server C to server E.
When the data are transmitted through server D, the total transmission time is
equal to the sum of the times G1(t) and G2(t) it takes to transmit them from server
C to server D and from server D to server E, respectively. If either element 1 or 2
is in a state of total failure, data transmission through server D is impossible. For
this case we formally state that (∞ + 2) = ∞ and (∞ + 1.5) = ∞. When the data are
transmitted from server C to server E directly, the transmission time is G3(t). The
minimum time needed to transmit the data from C to E, either directly or through
D, determines the system transmission time. Therefore, the MSS structure
function takes the form

G(t) = f(G1(t), G2(t), G3(t)) = min{G1(t) + G2(t), G3(t)}.
Note that the different technical systems in Examples 1.2 and 1.3, even though
they have different reliability block diagrams [Figure 1.8 (a) and (b)], correspond
to identical MSS structure functions.
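The formal rules (∞ + 2) = ∞ and (∞ + 1.5) = ∞ of Example 1.3 are exactly how IEEE-754 infinity behaves, so the same one-line structure function serves both examples:

```python
import math

def f(x1, x2, x3):
    """Structure function shared by Examples 1.2 and 1.3."""
    return min(x1 + x2, x3)

inf = math.inf   # transmission time of a failed channel

print(f(1.5, 2, 1.8))  # all channels up: route via D takes 3.5 s, direct 1.8 s
print(f(inf, 2, 1.8))  # element 1 down: (inf + 2) = inf, direct route only
print(f(inf, 2, inf))  # elements 1 and 3 down: transmission impossible (inf)
```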
1.2.2.1 Relevancy

In the binary context, the relevancy of a system element means that in some
conditions the state of the entire system completely depends on the state of this
element. In terms of the system structure function, the relevancy of element j
means that there exist G1(t), …, Gn(t) for which a change in the state of element j
changes the entire system state. In terms of the MSS structure function, the
relevancy of element j means that there exist G1(t), …, Gn(t) such that for some
gjk ≠ gjm

f(G1(t), …, Gj−1(t), gjk, Gj+1(t), …, Gn(t)) ≠ f(G1(t), …, Gj−1(t), gjm,
Gj+1(t), …, Gn(t)).   (1.8)
Table 1.4 Possible delays of switches and entire circuit disconnection times
1.2.2.2 Coherency
When the system is operating, no repair or addition of elements can cause the
system to fail.
For MSSs these requirements are met in systems with monotonic structure
functions.
1.2.2.3 Homogeneity
The MSS is homogeneous if all of its elements and the entire system itself have
the same number of distinguished states. One can easily see that all binary-state
systems are homogeneous.
For example, consider a system of switches connected in series (Figure 1.9).
Assume that all the switches are identical and have the same number of states. The
total failure of a switch corresponds to an infinite delay. Since the time of circuit
closing is equal to the closing time of its fastest element, and since the elements
are identical, the entire system delay can only be equal to the delay of one of its
elements. The possible system delays are the same as the delays of a single
element. This means that the system is homogeneous.
Despite the fact that homogeneous MSSs are intensively studied, in real
applications most systems do not possess this property. Indeed, even when
considering the same MSS of series-connected switches and allowing different
switches to have different operational delays, one obtains a MSS in which the
number of system states is not equal to the number of states of its elements (see
the examples in Table 1.5).
Table 1.5 Possible delays of switches and entire circuit disconnection times
1.3 Multi-state System Reliability and Its Measures

MSS behavior is characterized by its evolution in the space of states. The entire
set of possible system states can be divided into two disjoint subsets
corresponding to acceptable and unacceptable system functioning. The system's
entrance into the subset of unacceptable states constitutes a failure. MSS
reliability can be defined as the system's ability to remain in acceptable states
during the operation period.
Since the system functioning is characterized by its output performance G(t),
the state acceptability at any instant t ≥ 0 depends on this value. In some cases
this dependency can be expressed by an acceptability function F(G(t)) that takes
non-negative values if and only if the MSS functioning is acceptable. This takes
place when the efficiency of the system functioning is completely determined by
its internal state. For example, only those states where a network preserves its con-
nectivity are acceptable. In such cases, a particular set of MSS states is of interest
to the customer. Usually these states are interpreted as system failure states,
which, when reached, imply that the system should be repaired or discarded.
Much more frequently, the system state acceptability depends on the relation
between the MSS performance and the desired level of this performance (demand)
that is determined outside of the system. In general, the demand W(t) is also a ran-
dom process. Below we shall consider such a case when the demand can take dis-
crete values from the set w = {w1 ,, wM } . Often the desired relation between the
system performance and the demand can be expressed by the acceptability func-
tion F ( G ( t ) , W ( t ) ) . The acceptable system states correspond to
1.3 Multi-state System Reliability and Its Measures 17
as some arbitrary function. For a power system, where G(t) and W(t) are treated as
respectively, generating capacity and load (demand, which is required by consum-
ers), functional J is interpreted as an energy not supplied to consumers, where
() is defined as follows: (t ) = W (t ) G (t ), if W ( t ) G ( t ) 0 , and
(t ) 0, if W ( t ) G ( t ) < 0. Such a functional J is called a failure criteria func-
tional.
In Section 1.2.2 the MSS relevancy was considered as a property of the
structure function representing the system performance. When the MSS is
considered from the reliability viewpoint, the system demand should be taken into
account too. The demand value is of interest as well as the system performance
value. In this context, an element is relevant if changes in its state, without
changes in the states of the remaining elements, cause changes in the system's
reliability. The relevancy is now treated not as an internal property of the system,
but as one associated with
the system's ability to perform a task, which is defined outside the system. In this
context, the relevancy of element j means that there exist G1(t), …, Gn(t) for
which, for some gjk ≠ gjm,

J{f(G1(t), …, Gj−1(t), gjk, Gj+1(t), …, Gn(t)), W} = 0,   (1.10)

while

J{f(G1(t), …, Gj−1(t), gjm, Gj+1(t), …, Gn(t)), W} > 0.
Note that this condition is tougher than condition (1.8). Indeed, a relevant ele-
ment according to expression (1.8) can be irrelevant according to (1.10).
For example, consider a system of switches connected in series (Figure 1.9)
and assume that the switches are binary elements with the switching delays
presented in the last row of Table 1.5. Assume that the system disconnection time
must not be greater than a constant W, i.e., the acceptability condition is
W − G(t) ≥ 0. Observe that for W ≥ 0.6 the second switch is relevant, since when
the first and third switches do not work, the system's success depends on the state
of the second switch. For W < 0.6 the second switch is irrelevant, since when the
first and third switches do not work, the system fails to meet the demand
independently of the state of the second switch. (According to expression (1.8)
the second switch is always relevant.)
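The relevancy argument can be checked numerically. Since Table 1.5 itself was lost in extraction, the delays below are hypothetical, except that the second switch's delay is taken as 0.6 s, as the W ≥ 0.6 threshold in the text implies; a failed switch is modelled by an infinite delay, and the series circuit disconnects as soon as its fastest working switch opens.

```python
import math

FAILED = math.inf   # a failed switch never opens: infinite delay

def disconnection_time(delays):
    """A series circuit is disconnected as soon as its fastest working
    switch opens, so the system delay is the minimum element delay."""
    return min(delays)

def acceptable(delays, W):
    """Acceptability condition W - G(t) >= 0: the circuit must disconnect
    within the constant time W."""
    return disconnection_time(delays) <= W

# Switches 1 and 3 have failed; switch 2 (delay 0.6 s) may work or not.
print(acceptable([FAILED, 0.6, FAILED], W=0.7))    # switch 2 decides success
print(acceptable([FAILED, FAILED, FAILED], W=0.7)) # system fails
print(acceptable([FAILED, 0.6, FAILED], W=0.5))    # switch 2 irrelevant here
```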
Using the acceptability function, one can also give a definition of system
coherency that is more closely related to the one given for binary systems. Indeed,
the definition of coherency for binary systems operates with the notions of fault
and normal operation, whereas when applied to a MSS all that is required is
monotonic behavior of the structure function. In the context of reliability, MSS
coherency means that an improvement in the performance of the system elements
cannot cause the entire system to transition from an acceptable state to an
unacceptable one.
Note that in a steady state the distribution of the variable demand can be
represented (by analogy with the distribution of MSS performance) by two
vectors (w, q), where w = {w1, …, wM} is the vector of possible demand levels
wj, j = 1, …, M, and q = {q1, …, qM} is the vector of steady-state probabilities of
the corresponding demand levels, qj = Pr{W = wj}, j = 1, …, M.
When one considers MSS evolution in the space of states during the system
operation period T, the following random variables can be of interest:
• Time to failure, Tf, is the time from the beginning of the system's life up
to the instant when the system first enters the subset of unacceptable states.
• Time between failures, Tb, is the time between two consecutive transitions
from the subset of acceptable states to the subset of unacceptable states.
• Number of failures, NT, is the number of times the system enters the subset
of unacceptable states during the time interval [0, T].
In Figure 1.10, one can see an example of random realizations of the two
stochastic processes G(t) and W(t). Assume that the system performance value
should not be less than the demand: F(G(t), W(t)) = G(t) − W(t) ≥ 0. In this case,
the first time at which the process G(t) down-crosses the level of demand W(t)
determines the time to MSS failure. This time is designated as T_f. The random
variable T_f is characterized by the following indices:
The probability of a failure-free operation, or reliability function R(t), is
the probability that the time to failure is not less than t, given that the
system is in an acceptable state at t = 0:

R(t) = Pr{T_f ≥ t | F(G(0), W(0)) ≥ 0}. (1.10)

Mean time to failure (MTTF) is the mean time up to the instant when the
system enters the subset of unacceptable states for the first time:

MTTF = E{T_f}. (1.11)

From now on E{·} is used as an expectation symbol.
The same two indices can be defined for the random variable T_b:

The probability that the time between failures is greater than or equal to t:

Pr{T_b ≥ t}. (1.12)

Mean time between failures (MTBF):

MTBF = E{T_b}. (1.13)

The mean number of system failures in the interval [0, T]:

E{N_T}. (1.15)

Measures in expressions (1.14) and (1.15) are often important when logistics
problems related to MSS operations are considered (for example, determining the
required number of spare parts).
MSS instantaneous (point) availability A(t, w) is the probability that the MSS at
instant t > 0 is in an acceptable state:

A(t, w) = Pr{F(G(t), W(t)) ≥ 0}. (1.16)

The MSS average availability over the interval [0, T] is defined as

A_T = (1/T) ∫_0^T 1{F[G(t), W(t)] ≥ 0} dt, (1.17)

where

1{F[G(t), W(t)] ≥ 0} = 1, if F[G(t), W(t)] ≥ 0; 0, if F[G(t), W(t)] < 0.
The random variable A_T represents the portion of time when the MSS output
performance rate is in an acceptable area. For example, in Figure 1.10,
A_T = (T − T_1 − T_2)/T. This index characterizes the portion of time when the MSS
output performance rate is not less than the demand.
The expected value of AT is often used and is called demand availability (Aven
and Jensen 1999):
AD = E { AT } . (1.18)
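As an illustration of how (1.17) is evaluated for one concrete realization, the following sketch computes A_T for a piecewise-constant sample path of G(t) under a constant demand w. The segment durations and performance values are hypothetical, not taken from the book:

```python
# Average availability A_T (1.17) for one piecewise-constant realization of G(t).
# The segments below are illustrative values, not data from the book.
# Each segment is (duration, performance rate); the demand w is constant.

def average_availability(segments, w):
    """Fraction of total time with F(G(t), w) = G(t) - w >= 0."""
    total = sum(d for d, _ in segments)
    acceptable = sum(d for d, g in segments if g - w >= 0)
    return acceptable / total

segments = [(2.0, 100.0), (1.0, 40.0), (5.0, 80.0), (2.0, 0.0)]  # (hours, output)
w = 50.0
A_T = average_availability(segments, w)
print(A_T)  # 7 acceptable hours out of 10 -> 0.7
```

Averaging this quantity over many simulated realizations would estimate the demand availability A_D of (1.18).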
For large t (t → ∞), the system initial state has practically no influence on its
availability. Therefore, the steady-state (stationary or long-term) MSS availability
A_∞(w) for the constant demand level W(t) = w can be determined on the basis
of the system steady-state performance distribution:

A_∞(w) = Σ_{k=1}^{K} p_k 1(F(g_k, w) ≥ 0), (1.19)

where

1(F(g_k, w) ≥ 0) = 1, if F(g_k, w) ≥ 0; 0, if F(g_k, w) < 0,

and p_k = lim_{t→∞} p_k(t) is the steady-state probability of the MSS state k with the
corresponding output performance rate g_k. In particular, when F(G(t), w) = G(t) − w,

A_∞(w) = Σ_{k=1}^{K} p_k 1(g_k ≥ w) = Σ_{g_k ≥ w} p_k. (1.20)
For the variable demand represented by the pair of vectors (w, q), the
steady-state MSS availability is

A_∞(w, q) = Σ_{m=1}^{M} A_∞(w_m) q_m = Σ_{m=1}^{M} q_m Σ_{k=1}^{K} p_k 1(F(g_k, w_m) ≥ 0), (1.21)

where

q_m = T_m / Σ_{m=1}^{M} T_m, m = 1, 2, …, M. (1.22)
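These steady-state formulas translate directly into code. The sketch below (with illustrative numbers and the assumption F(g, w) = g − w) evaluates A_∞(w) by (1.20) and A_∞(w, q) by (1.21):

```python
# Steady-state MSS availability for constant demand (1.20) and variable demand (1.21),
# assuming the acceptability function F(g, w) = g - w. Numbers are illustrative.

def steady_state_availability(g, p, w):
    """A_inf(w): sum of steady-state probabilities over states with g_k >= w."""
    return sum(pk for gk, pk in zip(g, p) if gk >= w)

def steady_state_availability_var(g, p, w_levels, q):
    """A_inf(w, q) = sum_m q_m * A_inf(w_m)."""
    return sum(qm * steady_state_availability(g, p, wm)
               for wm, qm in zip(w_levels, q))

g = [0.0, 0.4, 0.8, 1.0]      # state performance rates
p = [0.05, 0.25, 0.3, 0.4]    # steady-state probabilities
print(steady_state_availability(g, p, 0.5))            # p_3 + p_4
print(steady_state_availability_var(g, p, [0.5, 0.3], [0.6, 0.4]))
```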
The MSS mean instantaneous performance at instant t is the expected value of the
instantaneous performance:

G_mean(t) = E{G(t)}. (1.22)

The mean steady-state MSS performance is

G_∞ = Σ_{k=1}^{K} p_k g_k. (1.23)

The average MSS expected output performance for a fixed time interval [0, T]
is defined as

G_T = (1/T) ∫_0^T G_mean(t) dt. (1.24)

Observe that the mean MSS performance does not depend on demand.
In some cases a conditional expected performance is used. This index represents
the mean performance of the MSS on the condition that it is in an acceptable
state. In the steady state it takes the form

G* = [Σ_{k=1}^{K} g_k p_k 1(F(g_k, W) ≥ 0)] / [Σ_{k=1}^{K} p_k 1(F(g_k, W) ≥ 0)]. (1.25)
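A short sketch of (1.23) and (1.25), again assuming F(g, W) = g − W and using illustrative numbers:

```python
# Steady-state mean performance (1.23) and conditional expected performance (1.25),
# assuming acceptability F(g, w) = g - w. Values are illustrative.

def mean_performance(g, p):
    return sum(gk * pk for gk, pk in zip(g, p))

def conditional_mean_performance(g, p, w):
    num = sum(gk * pk for gk, pk in zip(g, p) if gk - w >= 0)
    den = sum(pk for gk, pk in zip(g, p) if gk - w >= 0)
    return num / den

g = [0.0, 0.6, 1.0]
p = [0.1, 0.6, 0.3]
print(mean_performance(g, p))                   # unconditional mean
print(conditional_mean_performance(g, p, 0.5))  # mean given an acceptable state
```

Note that G* is always at least as large as G_∞, since the failed (low-performance) states are excluded from the conditional average.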
The MSS performance deficiency D(t, w) is characterized by the following indices:

the probability that the performance deficiency at instant t does not exceed some
specified level d:

Pr{D(t, w) ≤ d}; (1.27)

the mean performance deficiency at instant t:

D_m(t, w) = E{D(t, w)}. (1.28)

In the steady state, for the constant demand w, the expected performance deficiency is

D_∞ = Σ_{k=1}^{K} p_k max(w − g_k, 0), (1.29)

and for the variable demand represented by the pair of vectors (w, q)

D_∞(w, q) = Σ_{m=1}^{M} Σ_{k=1}^{K} p_k q_m max(w_m − g_k, 0). (1.30)

The average MSS expected performance deficiency for a fixed time interval
[0, T] is defined as follows:

D_T = (1/T) ∫_0^T D(t, w) dt. (1.31)

The accumulated performance deficiency for a fixed time interval [0, T] is

D̃_T = ∫_0^T D(t, w) dt. (1.32)

If the demand exceeds the system performance over the entire interval [0, T], then

D̃_T = ∫_0^T (W(t) − G(t)) dt = ∫_0^T W(t) dt − ∫_0^T G(t) dt. (1.33)

This measure can be interpreted as the expected amount of the product not supplied
to consumers during the interval [0, T]:

D̃_m = E{D̃_T}. (1.35)
four relative capacity levels that characterize the performance of the second
generator: g_21 = 0.0, g_22 = 40/100 = 0.4, g_23 = 80/100 = 0.8, g_24 = 100/100 = 1.0.
Assume that the corresponding steady-state probabilities are as follows:
p_11 = 0.1, p_12 = 0.6, p_13 = 0.3 for the first generator and p_21 = 0.05, p_22 = 0.25,
p_23 = 0.3, p_24 = 0.4 for the second generator.
The required capacity level is 50 MW, which corresponds to w = 50/100 = 0.5.
The MSS stationary availability is

A_1∞(w) = A_1∞(0.5) = Σ_{g_1k ≥ 0.5} p_1k = 0.6 + 0.3 = 0.9,
A_2∞(w) = A_2∞(0.5) = Σ_{g_2k ≥ 0.5} p_2k = 0.3 + 0.4 = 0.7.

The mean steady-state performance (1.23) is

G_1∞ = Σ_{k=1}^{3} p_1k g_1k = 0.1·0 + 0.6·0.6 + 0.3·1.0 = 0.66,

which means 66% of the nominal generating capacity for the first generator, and

G_2∞ = Σ_{k=1}^{4} p_2k g_2k = 0.05·0 + 0.25·0.4 + 0.3·0.8 + 0.4·1.0 = 0.74,

which means 74% of the nominal generating capacity for the second generator.
The steady-state performance deficiency (1.30) is

D_1∞(0.5) = Σ_{g_1k − w < 0} p_1k (w − g_1k) = 0.1·(0.5 − 0.0) = 0.05,
D_2∞(0.5) = Σ_{g_2k − w < 0} p_2k (w − g_2k) = 0.05·(0.5 − 0.0) + 0.25·(0.5 − 0.4) = 0.05.
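The numbers in this example are easy to check programmatically. The following sketch recomputes the stationary availability, mean steady-state performance, and performance deficiency of both generators:

```python
# Recomputing the two-generator example: stationary availability (1.20),
# mean steady-state performance (1.23), and performance deficiency (1.29).

def availability(g, p, w):
    return sum(pk for gk, pk in zip(g, p) if gk >= w)

def mean_performance(g, p):
    return sum(gk * pk for gk, pk in zip(g, p))

def deficiency(g, p, w):
    return sum(pk * max(w - gk, 0.0) for gk, pk in zip(g, p))

w = 0.5                                     # 50 MW of 100 MW nominal capacity
g1, p1 = [0.0, 0.6, 1.0], [0.1, 0.6, 0.3]
g2, p2 = [0.0, 0.4, 0.8, 1.0], [0.05, 0.25, 0.3, 0.4]

print(round(availability(g1, p1, w), 10), round(availability(g2, p2, w), 10))      # 0.9 0.7
print(round(mean_performance(g1, p1), 10), round(mean_performance(g2, p2), 10))    # 0.66 0.74
print(round(deficiency(g1, p1, w), 10), round(deficiency(g2, p2, w), 10))          # 0.05 0.05
```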
Additional useful information about MSS can be found in the book by Lisnianski and
Levitin (2003), which is completely devoted to MSS reliability, as well as in Aven
and Jensen (1999), Levitin (2005), and Xie et al. (2004), where special chapters
are devoted to MSS reliability.
References

Aven T (1993) On performance measures for multistate monotone systems. Reliab Eng Syst Saf 41:259–266
Aven T, Jensen U (1999) Stochastic models in reliability. Springer, New York
Barlow RE, Wu AS (1978) Coherent systems with multi-state components. Math Operat Res 3:275–281
Baxter V (2002) Advances in supermarket refrigeration systems. In: Proceedings of the 7th International Energy Agency heat pump conference, Beijing, China
Billinton R, Allan R (1996) Reliability evaluation of power systems. Plenum, New York
Block H, Savits T (1982) A decomposition of multistate monotone system. J Appl Prob 19:391–402
Doulliez P, Jamoulle E (1972) Transportation networks with random arc capacities. RAIRO 3:45–60
El-Neweihi E, Proschan F (1984) Degradable systems: a survey of multistate system theory. Commun Stat Theory Methods 13:405–432
Giard N, Lichtenstein P, Yashin A (2002) A multi-state model for genetic analysis of the aging process. Stat Med 21:2511–2526
Hudson JC, Kapur KC (1982) Reliability theory for multistate systems with multistate components. Microelectron Reliab 22:1–7
IEA Annex 26: Advanced supermarket refrigeration/heat recovery systems, Final Report, Volume 1. Oak Ridge National Laboratory, Oak Ridge, TN, 2003
Kolowrocki K (2004) Reliability of large systems. Elsevier, Amsterdam
Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Malinowski J, Preuss W (1995) Reliability of circular consecutively connected systems with multistate components. IEEE Trans Reliab 44:532–534
Marshall G, Jones R (2007) Multi-state models in diabetic retinopathy. Stat Med 14(18):1975–1983
Murchland J (1975) Fundamental concepts and relations for reliability analysis of multistate systems. In: Barlow RE, Fussell JB, Singpurwalla N (eds) Reliability and fault tree analysis: theoretical and applied aspects of system reliability. SIAM, Philadelphia, pp 581–618
Natvig B (1982) Two suggestions of how to define a multistate coherent system. Adv Appl Probab 14:434–455
Natvig B (1985) Multi-state coherent systems. In: Jonson N, Kotz S (eds) Encyclopedia of statistical sciences, vol 5. Wiley, New York, pp 732–735
Natvig B, Morch H (2003) An application of multistate reliability theory to an offshore gas pipeline network. Int J Reliab Qual Saf Eng 10(4):361–381
Natvig B (2007) Multi-state reliability theory. In: Ruggeri F, Kenett R, Faltin FW (eds) Encyclopedia of statistics in quality and reliability. Wiley, New York, pp 1160–1164
Nordmann L, Pham H (1999) Weighted voting systems. IEEE Trans Reliab 48:42–49
Putter H, Fiocco M, Geskus B (2007) Tutorial in biostatistics: competing risk and multi-state models. Stat Med 26(11):2389–2430
Ross SM (1979) Multivalued state component systems. Ann Prob 7:379–383
Ushakov I (ed) (1994) Handbook of reliability engineering. Wiley, New York
Van den Hout A, Matthews F (2008) Multi-state analysis of cognitive ability data. Stat Med, published online, Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/3360
Xie M, Dai YS, Poh KL (2004) Computing system reliability: models and analysis. Kluwer/Plenum, New York
Zuo M, Liang M (1994) Reliability of multistate consecutively connected systems. Reliab Eng Syst Saf 44:173–176
2 Modern Stochastic Process Methods for
Multi-state System Reliability Assessment
A stochastic or random process is, essentially, a set of random variables where the
variables are ordered in a given sequence. For example, the daily maximum tem-
peratures at a weather station form a sequence of random variables, and this or-
dered sequence can be considered as a stochastic process. Another example is the
sequence formed by the continuously changing number of people waiting in line at
the ticket window of a railway station.
More formally, the sequence of random variables in a process can be denoted
by X ( t ) , where t is the index of the process.
The values assumed by the random variable X(t) are called states, and the set
of all possible values forms the state space of the process. So, a stochastic process
is a sequence of random variables {X(t), t ∈ T}, defined on a given probability
space, indexed by the parameter t, where t varies over an index set T. In this book,
we mainly deal with stochastic processes where t represents time.
A random variable X can be considered as a rule for assigning to every outcome
ω of an experiment the value X(ω). A stochastic process is a rule for assigning
to every ω the function X(t, ω). Thus, a stochastic process is a family of
time functions depending on the parameter ω or, equivalently, a function of t
and ω. The domain of ω is the set of all the possible experimental outcomes and
the domain of t is a set of non-negative real numbers.
For example, the instantaneous speed of a car during its trip from point A to
point B is a stochastic process. The speed on each trip can be considered as an
experimental outcome ω, and each trip has its own speed X(t, ω) that characterizes
the instantaneous speed of this trip as a function of time. This function will
differ from the corresponding functions of other trips because of the influence of
many random factors (such as wind, road conditions, etc.). In Figure 2.1 one can
see three different speed functions for three trips that can be treated as three
different realizations of the stochastic process. It should be noticed that the cut of
this stochastic process at time instant t_1 represents a random variable with mean
V_m. In real-world systems many parameters, such as temperature, voltage,
frequency, etc., may be considered stochastic processes.
The time may be discrete or continuous. A discrete time may have a finite or
infinite number of values; continuous time obviously has only an infinite number
of values. The values taken by the random variables constitute the state space.
This state space, in its turn, may be discrete or continuous. Therefore, stochastic
processes may be classified into four categories according to whether their state
spaces and time are continuous or discrete. If the state space of a stochastic proc-
ess is discrete, then it is called a discrete-state process, often referred to as a chain.
For a process with independent values, the n-dimensional cumulative distribution
function is the product of the one-dimensional ones:

F(x_1, x_2, …, x_n; t_1, t_2, …, t_n) = Π_{i=1}^{n} F(x_i; t_i) = Π_{i=1}^{n} Pr{X(t_i) ≤ x_i}. (2.4)

For a Poisson process with parameter λ, the number of points N(t_1, t_2) falling in
the interval (t_1, t_2) of length t = t_2 − t_1 has the Poisson distribution

Pr{N(t_1, t_2) = k} = e^{−λt} (λt)^k / k!. (2.7)
If the intervals ( t1 , t2 ) and ( t3 , t4 ) are not overlapping, then the random vari-
ables N ( t1 , t2 ) and N ( t3 , t4 ) are independent. Using the points ti one can form
the stochastic process X ( t ) = N ( 0, t ) .
The Poisson process plays a special role in reliability analysis, comparable to the
role of the normal distribution in probability theory. Many real physical situations
can be successfully described with the help of Poisson processes.
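A quick numerical check of the Poisson property: the sketch below simulates many realizations of a Poisson process from exponential inter-arrival times and compares the empirical distribution of N(t1, t2) with formula (2.7). The rate and interval values are arbitrary:

```python
import math
import random

# Empirical check of Pr{N(t1, t2) = k} = exp(-lam*t) * (lam*t)**k / k!, t = t2 - t1.
# Event times are generated from i.i.d. exponential gaps; parameters are arbitrary.

def count_events(lam, t1, t2):
    """Number of Poisson(lam) events falling in (t1, t2]."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > t2:
            return n
        if t > t1:
            n += 1

def poisson_pmf(lam, t, k):
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

random.seed(1)
lam, t1, t2, trials = 2.0, 1.0, 2.5, 20000
counts = [count_events(lam, t1, t2) for _ in range(trials)]
for k in range(4):
    emp = counts.count(k) / trials
    print(k, round(emp, 3), round(poisson_pmf(lam, t2 - t1, k), 3))
```

Only the interval length t2 − t1 matters, which mirrors the independence of counts over non-overlapping intervals stated above.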
A well-known type of point process is the so-called renewal process. This
process can be described as a sequence of events, the intervals between which are
independent and identically distributed random variables. In reliability theory, this
kind of mathematical model is used to describe the flow of failures in time.
To every point process t_i one can associate a sequence of random variables y_n
such that y_1 = t_1, y_2 = t_2 − t_1, …, y_n = t_n − t_{n−1}, where t_1 is the first random point to
the right of the origin. This sequence is called a renewal process. An example is
the life history of items that are replaced as soon as they fail. In this case, yi is the
total time the ith item is in operation and ti is the time of its failure.
One can see a correspondence among the following three processes:
a point process ti ;
a discrete-state stochastic process X ( t ) increasing (or decreasing) by 1 at
points ti ; and
Pr{X_n = x_n | X_0 = x_0, X_1 = x_1, …, X_{n−1} = x_{n−1}} = Pr{X_n = x_n | X_{n−1} = x_{n−1}}. (2.8)
As in the case of a general Markov process, Equation 2.8 implies that chain be-
havior in the future depends only on its present state and does not depend on its
behavior in the past.
We designate the probability that at step n the chain will be in state j as p_j(n).
Thus, we can write p_j(n) = Pr{X_n = j}.
We also define the probability p_ij(m, n) that the chain makes a transition to
state j at step n if at step m it was in state i. This probability is a conditional
probability, and we can write it as p_ij(m, n) = Pr{X_n = j | X_m = i}.
The probability mass function of the random variable X(0) is called the initial
probability row-vector

p(0) = [p_0(0), p_1(0), …, p_M(0)]. (2.14)

The problem being considered here consists in obtaining an expression for evaluating
the n-step transition probability p_ij(n) from the one-step transition probabilities
p_ij = p_ij(1). Recall that, according to expression (2.11), for a homogeneous Markov
chain the transition probabilities depend only on the number of steps: p_ij(m, m + n) = p_ij(n).
Let us consider the transition probability p_ij(m + n) that the process goes to
state j at the (m + n)th step, given that at step 0 it is in state i. In order to reach state
j at the (m + n)th step, the process first reaches some intermediate state k at step m
with probability p_ik(m) and then moves from k to j during the remaining n steps with
probability p_kj(n). The Markov property implies that these two events are
independent. Then, using the theorem of total probability, we obtain
p_ij(m + n) = Σ_k p_ik(m) p_kj(n). In matrix notation this yields

P(n) = P·P(n − 1) = P^n, (2.16)

and, for the state probability row-vector,

p(n) = p(0)·P^n, (2.18)

where p(0) and p(n) are the row-vectors of the state probabilities initially (at step
n = 0) and after the nth step, respectively.
This implies that the unconditional state probabilities of a homogeneous Markov
chain are completely determined by the one-step transition probability matrix P
and the initial probability vector p(0).
To illustrate the presented approach, we consider the following example.
Example 2.1 (Bhat and Miller 2002). Assume a two-state Markov chain with the
states denoted by 0 and 1 (Figure 2.2). The one-step transition probability matrix
is determined by the probabilities p_01 and p_10,
since it must hold that p_00 + p_01 = 1 and p_10 + p_11 = 1. Assume that p_01 = α and
p_10 = β. Then

P = [[1 − α, α], [β, 1 − β]].
The probability p_00(n) satisfies the recurrence p_00(n) = (1 − α) p_00(n − 1) + β p_01(n − 1),
where p_01(n − 1) = 1 − p_00(n − 1). Applying this recurrence successively, one obtains

p_00(1) = 1 − α,
p_00(2) = β + (1 − α − β)(1 − α),
p_00(3) = β + (1 − α − β)β + (1 − α − β)^2 (1 − α),
…
p_00(n) = β + (1 − α − β)β + (1 − α − β)^2 β + … + (1 − α − β)^{n−2} β + (1 − α − β)^{n−1} (1 − α)
        = β Σ_{k=0}^{n−2} (1 − α − β)^k + (1 − α − β)^{n−1} (1 − α).

Based on the formula for the sum of a finite geometric series, we can write

Σ_{k=0}^{n−2} (1 − α − β)^k = [1 − (1 − α − β)^{n−1}] / [1 − (1 − α − β)] = [1 − (1 − α − β)^{n−1}] / (α + β).

Therefore, the expression for p_00(n) can be rewritten in the following form:

p_00(n) = β/(α + β) + α(1 − α − β)^n/(α + β),

and

p_01(n) = 1 − p_00(n) = α/(α + β) − α(1 − α − β)^n/(α + β).
Expressions for the two remaining entries p_10(n) and p_11(n) can be found in a
similar way. (Readers can do it themselves as an exercise.)
Thus, the n-step transition probability matrix can be written as

P(n) = P^n = (1/(α + β)) [[β + α(1 − α − β)^n, α − α(1 − α − β)^n],
                          [β − β(1 − α − β)^n, α + β(1 − α − β)^n]].

Based on this n-step transition probability matrix and on the given initial state
probability row-vector p(0), one can find the state probabilities after the nth step by
using Equation 2.18:

p(n) = p(0) P^n = [a, 1 − a] P^n
     = [(β + (1 − α − β)^n [a(α + β) − β])/(α + β), (α − (1 − α − β)^n [a(α + β) − β])/(α + β)].

Therefore, the state probabilities after the nth step are as follows:

p_0(n) = β/(α + β) + ((1 − α − β)^n/(α + β)) [a(α + β) − β],
p_1(n) = α/(α + β) − ((1 − α − β)^n/(α + β)) [a(α + β) − β].
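The closed-form matrix P(n) above can be verified numerically by comparing it with a directly computed matrix power. The values of α, β, and n below are arbitrary:

```python
# Verify the closed-form n-step transition matrix of the two-state chain
# against repeated matrix multiplication. alpha, beta, n are arbitrary.

def closed_form(alpha, beta, n):
    s, x = alpha + beta, (1.0 - alpha - beta) ** n
    return [[(beta + alpha * x) / s, (alpha - alpha * x) / s],
            [(beta - beta * x) / s, (alpha + beta * x) / s]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(P, n):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

alpha, beta, n = 0.3, 0.1, 7
P = [[1 - alpha, alpha], [beta, 1 - beta]]
direct, formula = mat_pow(P, n), closed_form(alpha, beta, n)
print(all(abs(direct[i][j] - formula[i][j]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```

As n grows, (1 − α − β)^n vanishes and both rows of P(n) approach the stationary distribution [β/(α + β), α/(α + β)].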
The transition probabilities of the chain are defined as

Pr{X(t + Δt) = i | X(t) = j} = π_ji(t, t + Δt). (2.20)

For a homogeneous process, this probability depends only on the length Δt of the
time interval: π_ji(t, t + Δt) = π_ji(Δt), where

π_ji(0) = 1, if j = i; 0, otherwise. (2.21)

Taking into account (2.21), one can define for each j a non-negative continuous
function a_j(t):

a_j(t) = lim_{Δt→0} [π_jj(t, t) − π_jj(t, t + Δt)]/Δt = lim_{Δt→0} [1 − π_jj(t, t + Δt)]/Δt, (2.22)

and, for each pair i ≠ j, the transition intensities

a_ji(t) = lim_{Δt→0} [π_ji(t, t + Δt) − π_ji(t, t)]/Δt = lim_{Δt→0} π_ji(t, t + Δt)/Δt. (2.23)

Since the process must be in some state at any time,

π_jj(Δt) + Σ_{i≠j} π_ji(Δt) = 1, (2.24)

and therefore

a_jj = −a_j = −lim_{Δt→0} (1/Δt) Σ_{i≠j} π_ji(Δt) = −Σ_{i≠j} a_ji. (2.25)
p_i(t) = Pr{X(t) = i}, i = 1, …, K; t ≥ 0. (2.26)

Expression (2.26) defines the probability mass function (pmf) of X(t) at time t.
Since at any given time the process must be in one of K states,

Σ_{i=1}^{K} p_i(t) = 1 (2.27)

for any t ≥ 0.
By using the theorem of total probability, for given t > t_1, we can express the
pmf of X(t) in terms of the transition probabilities π_ij(t_1, t) and the pmf of X(t_1).
For t_1 < t_2 < t,

Pr{X(t) = j | X(t_1) = i} = Σ_{k∈S} Pr{X(t) = j | X(t_2) = k, X(t_1) = i} Pr{X(t_2) = k | X(t_1) = i}. (2.31)

For a small time increment Δt, the state probabilities satisfy

p_j(t + Δt) = p_j(t)[1 − Σ_{i≠j} a_ji Δt] + Σ_{i≠j} p_i(t) a_ij Δt, j = 1, …, K. (2.32)

This equation accounts for two possibilities:
1. At instant t the process is already in state j and during Δt it does not leave
this state. These events have probabilities p_j(t) and 1 − Σ_{i≠j} a_ji Δt, respectively.
2. At instant t the process may be in one of the states i ≠ j and during time Δt
transits from state i to state j. These events have probabilities p_i(t) and a_ij Δt,
respectively. These probabilities should be multiplied and summed over all
i ≠ j because the process can reach state j from any state i.
Now one can rewrite (2.32) by using (2.29) and obtain

p_j(t + Δt) − p_j(t) = Σ_{i=1, i≠j}^{K} p_i(t) a_ij Δt + p_j(t) a_jj Δt, (2.33)

or

p_j(t + Δt) − p_j(t) = Σ_{i=1, i≠j}^{K} p_i(t) a_ij Δt − p_j(t) Σ_{i=1, i≠j}^{K} a_ji Δt. (2.34)

Dividing both sides by Δt and letting Δt → 0 yields

dp_j(t)/dt = Σ_{i=1, i≠j}^{K} p_i(t) a_ij − p_j(t) Σ_{i=1, i≠j}^{K} a_ji, j = 1, 2, …, K. (2.35)
The system of differential equations (2.35) is used for finding the state
probabilities p_j(t), j = 1, …, K, of the homogeneous Markov process when the initial
conditions are given as

p_j(0) = α_j, j = 1, …, K. (2.36)

In matrix notation, the system (2.35) can be written as

dp(t)/dt = p(t) a, (2.38)

where p(t) = [p_1(t), …, p_K(t)] is the row-vector of state probabilities and a is the
K×K transition intensity matrix. Note that the sum of the matrix elements in each
row equals 0: Σ_{j=1}^{K} a_ij = 0 for each i (1 ≤ i ≤ K).
When the system state transitions are caused by failures and repairs of its
elements, the corresponding transition intensities are expressed by the elements'
failure and repair rates.
An element's failure rate λ(t) is the instantaneous conditional density of the
probability of failure of an initially operational element at time t, given that the
element has not failed up to time t. Briefly, one can say that λ(t) is the time-to-failure
conditional probability density function (pdf). It expresses a hazard of failure
at time instant t under the condition that there was no failure up to time t. The
failure rate of an element at time t is defined as

λ(t) = lim_{Δt→0} (1/Δt) [F(t + Δt) − F(t)]/R(t) = f(t)/R(t), (2.39)

where F(t) is the cumulative distribution function of the time to failure, f(t) is the
corresponding pdf, and R(t) = 1 − F(t) is the reliability function. For a homogeneous
Markov process, the failure rate does not depend on t and can be expressed as

λ = MTTF^{−1}, (2.40)

where MTTF is the mean time to failure. Similarly, the repair rate μ(t) is the
time-to-repair conditional pdf. For homogeneous Markov processes a repair rate
does not depend on t and can be expressed as

μ = MTTR^{−1}, (2.41)

where MTTR is the mean time to repair.
If the steady-state probabilities (the limits of the state probabilities as t → ∞)
exist and are independent of the initial state j ∈ S, the process is called ergodic.
For the steady-state probabilities, the computations become simpler. The set of
differential equations (2.35) is reduced to a set of K algebraic linear equations
because for the constant probabilities all time-derivatives are equal to zero, so
dp_i(t)/dt = 0, i = 1, …, K.
Let the steady-state probabilities p_i = lim_{t→∞} p_i(t) exist. In this case all
derivatives of the state probabilities on the left-hand side of (2.35) will be zeroes.
So, in order to find the long-run probabilities, the following system of algebraic
linear equations should be solved:

0 = Σ_{i=1, i≠j}^{K} p_i a_ij − p_j Σ_{i=1, i≠j}^{K} a_ji, j = 1, 2, …, K. (2.43)
The K equations in (2.43) are not linearly independent (the determinant of the
system is zero). An additional independent equation is provided by the simple
fact that the sum of the state probabilities is equal to 1 at any time:

Σ_{i=1}^{K} p_i = 1. (2.44)
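The sketch below solves (2.43) together with the normalization (2.44) for a small illustrative transition-intensity matrix: one balance equation is replaced by the normalization condition, and a plain Gaussian elimination is used to avoid external dependencies. The 3-state birth-death matrix and its rates are assumptions for the demonstration:

```python
# Steady-state probabilities of a CTMC: solve (2.43) with normalization (2.44).
# One balance equation is replaced by sum(p) = 1.
# The 3-state intensity matrix below is illustrative (each row sums to zero).

def steady_state(a):
    k = len(a)
    # Equations: sum_i p_i * a[i][j] = 0 for j = 0..k-2, plus sum_i p_i = 1.
    rows = [[a[i][j] for i in range(k)] + [0.0] for j in range(k - 1)]
    rows.append([1.0] * k + [1.0])
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(rows[r][c]))
        rows[c], rows[piv] = rows[piv], rows[c]
        for r in range(k):
            if r != c and rows[r][c] != 0.0:
                f = rows[r][c] / rows[c][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [rows[i][k] / rows[i][i] for i in range(k)]

lam, mu = 1.0, 5.0            # illustrative failure and repair rates
a = [[-mu, mu, 0.0],
     [lam, -(lam + mu), mu],
     [0.0, lam, -lam]]        # state order: worst -> best
p = steady_state(a)
print([round(x, 4) for x in p])  # [0.0323, 0.1613, 0.8065]
```

For this birth-death structure the exact answer is p = [1, 5, 25]/31, which the printed values reproduce.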
From the definition of the state frequency it follows that, in the long run, f_i
equals the reciprocal of the mean cycle time T̄_ci:

f_i = 1/T̄_ci. (2.46)

Multiplying both sides of (2.46) by the mean duration T̄_i of one stay in state i gives

T̄_i f_i = T̄_i/T̄_ci = p_i. (2.47)

Therefore,

f_i = p_i/T̄_i. (2.48)
This is a fundamental equation, which provides the relation between the three
state parameters in the steady state.
The unconditional random time T_i of staying in state i is the minimum of all the
random times T_ij that characterize the conditional time of staying in state i when
the transition is performed from state i to a given state j ≠ i:

T_i = min{T_ij, j ≠ i}. (2.49)

All conditional times T_ij are distributed exponentially with the cumulative
distribution functions F_ij(t) = Pr{T_ij ≤ t} = 1 − e^{−a_ij t}. All transitions from state i
are independent and, therefore, the cumulative distribution function of the
unconditional time T_i of staying in state i can be computed as follows:

F_i(t) = Pr{T_i ≤ t} = 1 − Π_{j≠i} [1 − F_ij(t)] = 1 − Π_{j≠i} e^{−a_ij t} = 1 − e^{−(Σ_{j≠i} a_ij) t}. (2.50)

Hence, T_i is exponentially distributed and its mean value is

T̄_i = 1/Σ_{j≠i} a_ij. (2.51)

Substituting (2.51) into (2.48), one obtains

f_i = p_i Σ_{j≠i} a_ij. (2.52)
According to the generic MSS model (Chapter 1), any system element j can have
k_j different states corresponding to the performance rates, represented by the set
g_j = {g_j1, …, g_jk_j}. The current state of the element j and, therefore, the current
value of the element performance rate G_j(t) at any instant t are random variables.
G_j(t) takes values from g_j: G_j(t) ∈ g_j. Therefore, for the time interval [0, T],
where T is the MSS operation period, the performance rate of element j is defined
as a stochastic process. Note that we consider only the Markov process, where the
state probabilities at a future instant do not depend on the states occupied in the
past.
In this subsection, when we deal with a single multi-state element, we omit
the index j in the designation of the set of the element's performance rates. Thus, this
set is denoted as g = {g_1, …, g_k}. We also assume that this set is ordered so that
g_{i+1} ≥ g_i for any i.
The elements can be divided into two groups. Those elements that are observed
only until they fail belong to the first group. These elements either cannot be
repaired, or their repair is uneconomical, or only the life history up to the first failure
is of interest. Those elements that are repaired upon failure and whose life
histories consist of operating and repair periods belong to the second group. In the
following subsections, both groups are discussed.
As mentioned above, the lifetime of a non-repairable element lasts until its first
entrance into the subset of unacceptable states. In general, the acceptability of an
element's state depends on the relation between the element's performance and
the desired level of this performance (demand). The demand W(t) is also a random
process that takes discrete values from the set w = {w_1, …, w_M}. The desired
relation between the system performance and the demand can be expressed by the
acceptability function F(G(t), W(t)).
First consider a multi-state element with only minor failures, defined as failures
that cause an element transition from state i to the adjacent state i − 1. In other
words, a minor failure causes minimal degradation of the element performance. The
state-space diagram for such an element is presented in Figure 2.3.
The element evolution in the state space is pure performance degradation,
characterized by the stochastic process {G(t), t ≥ 0}. The transition intensity
for any transition from state i to state i − 1 is λ_{i,i−1}, i = 2, …, k.
Fig. 2.3 State-transition diagram for non-repairable element with minor failures
When the sojourn time in any state i (or, in other words, the time up to a minor
failure in state i) is exponentially distributed with parameter λ_{i,i−1}, the process is a
continuous-time Markov chain. Moreover, it is the widely known pure death
process (Trivedi 2002). Let us define the auxiliary discrete-state continuous-time
stochastic process {X(t), t ≥ 0}, where X(t) ∈ {1, …, k}. This process is strictly
associated with the stochastic process {G(t), t ≥ 0}. When X(t) = i, the
corresponding performance rate of a multi-state element is g_i: G(t) = g_i. The process
X(t) is a discrete-state stochastic process decreasing by 1 at the points t_i, i = 1, …, k,
when the corresponding transitions occur. The state probabilities of X(t) are

p_i(t) = Pr{X(t) = i}, i = 1, …, k. (2.53)

Note that

Σ_{i=1}^{k} p_i(t) = 1 (2.54)

for any t ≥ 0, since at any given time the process must be in some state.
According to the system (2.35), the following differential equations can be
written in order to find state probabilities for the Markov process presented in
Figure 2.3:
dp_k(t)/dt = −λ_{k,k−1} p_k(t),
dp_i(t)/dt = λ_{i+1,i} p_{i+1}(t) − λ_{i,i−1} p_i(t), i = 2, 3, …, k − 1, (2.55)
dp_1(t)/dt = λ_{2,1} p_2(t).
One can see that in state k there is only one transition, from this state to the state
k − 1, with the intensity λ_{k,k−1}, and there are no transitions into state k. In each state
i, i = 2, 3, …, k − 1, there is one transition into this state from the previous state i + 1,
with the intensity λ_{i+1,i}, and one transition from this state to state i − 1, with
the intensity λ_{i,i−1}. Observe that there are no transitions from state 1. This means
that if the process enters this state, it never leaves it. State 1 for non-repairable
multi-state elements is the absorbing state.
We assume that the process begins from the best state k with a maximal
element performance rate of g_k. Hence, the initial conditions are

p_k(0) = 1, p_{k−1}(0) = … = p_1(0) = 0. (2.56)
Using widely available software tools, one can obtain the numerical solution of
the system of differential equations (2.55) under the initial conditions (2.56), even
for large k. The system (2.55) can also be solved analytically by using the
Laplace–Stieltjes transform (Gnedenko and Ushakov 1995). Using this transform and
taking into account the initial conditions (2.56), one can represent (2.55) in the form
of linear algebraic equations:

s p̃_k(s) − 1 = −λ_{k,k−1} p̃_k(s),
s p̃_i(s) = λ_{i+1,i} p̃_{i+1}(s) − λ_{i,i−1} p̃_i(s), i = 2, 3, …, k − 1, (2.57)
s p̃_1(s) = λ_{2,1} p̃_2(s),

where p̃_k(s) = L{p_k(t)} = ∫_0^∞ e^{−st} p_k(t) dt is the Laplace–Stieltjes transform of a
function p_k(t) and L{dp_k(t)/dt} = s p̃_k(s) − p_k(0) is the Laplace–Stieltjes transform of
the derivative of the function p_k(t).
The system (2.57) may be rewritten in the following form:

p̃_k(s) = 1/(s + λ_{k,k−1}),
p̃_i(s) = [λ_{i+1,i}/(s + λ_{i,i−1})] p̃_{i+1}(s), i = 2, 3, …, k − 1, (2.58)
p̃_1(s) = (λ_{2,1}/s) p̃_2(s).
Starting to solve this system from the first equation and sequentially substituting
the obtained results into the next equation, one obtains

p̃_k(s) = 1/(s + λ_{k,k−1}),
p̃_i(s) = λ_{i+1,i} λ_{i+2,i+1} ⋯ λ_{k,k−1} / [(s + λ_{i,i−1})(s + λ_{i+1,i}) ⋯ (s + λ_{k−1,k−2})(s + λ_{k,k−1})], i = 2, 3, …, k − 1, (2.59)
p̃_1(s) = (λ_{2,1}/s) · λ_{3,2} λ_{4,3} ⋯ λ_{k,k−1} / [(s + λ_{2,1})(s + λ_{3,2}) ⋯ (s + λ_{k−1,k−2})(s + λ_{k,k−1})].
After inverting the transforms, one obtains the state probabilities p_i(t), i = 1, …, k.
For the constant demand level w, g_1 < w ≤ g_2, only state 1 is unacceptable and the
reliability function of the element is

R_1(t) = 1 − p_1(t). (2.60)

In the general case, for the constant demand level w, g_i < w ≤ g_{i+1}, the states
1, …, i are unacceptable and

R_i(t) = 1 − Σ_{j=1}^{i} p_j(t). (2.61)
The mean time up to multi-state element failure for this constant demand level
can be interpreted as the mean time up to the process entering state i. It can be
calculated as the sum of the mean times during which the process remains in each
state j > i. Since the process begins from the best state k with the maximal
element performance rate g_k [the initial conditions (2.56)], we have

MTTF_i = Σ_{j=i+1}^{k} 1/λ_{j,j−1}, i = 1, 2, …, k − 1. (2.62)

According to (1.23), one can obtain the element mean instantaneous performance
at time t as

E_t = Σ_{i=1}^{k} g_i p_i(t). (2.63)

The element mean instantaneous performance deficiency for the constant
demand w, according to (1.29), is

D_t = Σ_{i=1}^{k} p_i(t) max(w − g_i, 0). (2.64)
For k = 4, the system (2.55) takes the form

dp_4(t)/dt = −λ_{4,3} p_4(t),
dp_3(t)/dt = λ_{4,3} p_4(t) − λ_{3,2} p_3(t),
dp_2(t)/dt = λ_{3,2} p_3(t) − λ_{2,1} p_2(t),
dp_1(t)/dt = λ_{2,1} p_2(t),

and the Laplace–Stieltjes transforms of the state probabilities are

p̃_4(s) = 1/(s + λ_{4,3}),
p̃_3(s) = λ_{4,3}/[(s + λ_{3,2})(s + λ_{4,3})],
p̃_2(s) = λ_{3,2} λ_{4,3}/[(s + λ_{2,1})(s + λ_{3,2})(s + λ_{4,3})],
p̃_1(s) = λ_{2,1} λ_{3,2} λ_{4,3}/[s(s + λ_{2,1})(s + λ_{3,2})(s + λ_{4,3})].

Inverting the transforms, one obtains

p_4(t) = e^{−λ_{4,3} t},
p_3(t) = [λ_{4,3}/(λ_{4,3} − λ_{3,2})] (e^{−λ_{3,2} t} − e^{−λ_{4,3} t}),
p_2(t) = λ_{3,2} λ_{4,3} [(λ_{4,3} − λ_{3,2}) e^{−λ_{2,1} t} + (λ_{2,1} − λ_{4,3}) e^{−λ_{3,2} t} + (λ_{3,2} − λ_{2,1}) e^{−λ_{4,3} t}] / [(λ_{3,2} − λ_{2,1})(λ_{4,3} − λ_{3,2})(λ_{4,3} − λ_{2,1})],
p_1(t) = 1 − p_2(t) − p_3(t) − p_4(t).

The reliability functions for different constant demand levels are

R_1(t) = 1 − p_1(t) for g_1 < w ≤ g_2,
R_2(t) = 1 − p_1(t) − p_2(t) for g_2 < w ≤ g_3,
R_3(t) = 1 − p_1(t) − p_2(t) − p_3(t) = p_4(t) for g_3 < w ≤ g_4.
Fig. 2.4 State probabilities and reliability measures for non-repairable element with minor failures
According to (2.63), the element mean instantaneous performance is

E_t = Σ_{i=1}^{4} g_i p_i(t) = 10 p_4(t) + 8 p_3(t) + 5 p_2(t) + 0·p_1(t).

The demand is constant during the flight and w = 6 kW. Therefore, according
to (2.64), the element mean instantaneous performance deficiency is

D_t = Σ_{i=1}^{4} p_i(t) max(w − g_i, 0) = 1·p_2(t) + 6·p_1(t).

The expected energy not supplied during the service time T_service is

EENS = ∫_0^{T_service} D_t dt ≈ 0.547 kWh.
Fig. 2.5 Mean instantaneous performance and mean instantaneous performance deficiency for non-repairable element with minor failures
According to (2.62), the mean times to failure for the different demand levels are

MTTF_1 = 1/λ_{4,3} + 1/λ_{3,2} + 1/λ_{2,1} = 2.93 years for g_1 < w ≤ g_2,
MTTF_2 = 1/λ_{4,3} + 1/λ_{3,2} = 1.5 years for g_2 < w ≤ g_3,
MTTF_3 = 1/λ_{4,3} = 0.5 year for g_3 < w ≤ g_4.

For the constant demand w = 6 kW, the mean time to failure is equal
to MTTF_2 = 1.5 years. The probability that this failure (decreasing the generating
capacity below the demand level of 6 kW) will not occur during the service
time can be read from the graph of R_2(t) in Figure 2.4.
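The analytic expressions of this example can be cross-checked numerically. The rates used below (λ_{4,3} = 2, λ_{3,2} = 1, λ_{2,1} = 0.7 per year) are inferred from the MTTF values quoted above and should be treated as assumptions:

```python
import math

# Pure death process with k = 4: state probabilities p_i(t) and MTTF_i (2.62).
# The rates are inferred from the MTTF values in the text (an assumption):
l43, l32, l21 = 2.0, 1.0, 0.7   # transitions 4->3, 3->2, 2->1, per year

def probs(t):
    p4 = math.exp(-l43 * t)
    p3 = l43 / (l43 - l32) * (math.exp(-l32 * t) - math.exp(-l43 * t))
    p2 = (l32 * l43 * ((l43 - l32) * math.exp(-l21 * t)
                       + (l21 - l43) * math.exp(-l32 * t)
                       + (l32 - l21) * math.exp(-l43 * t))
          / ((l32 - l21) * (l43 - l32) * (l43 - l21)))
    return 1.0 - p2 - p3 - p4, p2, p3, p4   # (p1, p2, p3, p4)

mttf1 = 1 / l43 + 1 / l32 + 1 / l21
mttf2 = 1 / l43 + 1 / l32
mttf3 = 1 / l43
print(round(mttf1, 2), mttf2, mttf3)   # 2.93 1.5 0.5
p1, p2, p3, p4 = probs(2.0)
print(round(p1 + p2 + p3 + p4, 12))    # 1.0
```

A direct numerical integration of the four differential equations gives the same p_i(t), which confirms the inverted-transform expressions.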
Now consider a non-repairable multi-state element that can have both minor
and major failures (a major failure is a failure that causes the element transition
from state i to state j, j < i − 1). The state-space diagram for such an element,
representing transitions corresponding to both minor and major failures, is presented
in Figure 2.6.
Fig. 2.6 State-transition diagram for non-repairable element with minor and major failures
For such an element, the following system of differential equations can be written
for the state probabilities:

dp_k(t)/dt = −p_k(t) Σ_{e=1}^{k−1} λ_{k,e},
dp_i(t)/dt = Σ_{e=i+1}^{k} λ_{e,i} p_e(t) − p_i(t) Σ_{e=1}^{i−1} λ_{i,e}, i = 2, 3, …, k − 1, (2.65)
dp_1(t)/dt = Σ_{e=2}^{k} λ_{e,1} p_e(t).
The more general model of a multi-state element is the model with repair. The re-
pairs can also be both minor and major. A minor repair returns an element from
state j to state j + 1 while a major repair returns it from state j to state i, where
i > j + 1.
The special case of the repairable multi-state element is an element with only
minor failures and minor repairs. The stochastic process corresponding to such an
element is called the birth and death process. The state-space diagram of this
process is presented in Figure 2.7 (a).
Fig. 2.7 State-transition diagrams for repairable element with minor failures and repairs (a) and
for repairable element with minor and major failures and repairs (b)
The state-space diagram for the general case of the repairable multi-state ele-
ment with minor and major failures and repairs is presented in Figure 2.7 (b). The
following system of differential equations can be written for the state probabilities
of such elements:
$$\frac{dp_k(t)}{dt} = \sum_{e=1}^{k-1}\mu_{e,k}\,p_e(t) - p_k(t)\sum_{e=1}^{k-1}\lambda_{k,e},$$

$$\frac{dp_i(t)}{dt} = \sum_{e=i+1}^{k}\lambda_{e,i}\,p_e(t) + \sum_{e=1}^{i-1}\mu_{e,i}\,p_e(t) - p_i(t)\left(\sum_{e=1}^{i-1}\lambda_{i,e} + \sum_{e=i+1}^{k}\mu_{i,e}\right), \quad i = 2,3,\dots,k-1, \qquad (2.66)$$

$$\frac{dp_1(t)}{dt} = \sum_{e=2}^{k}\lambda_{e,1}\,p_e(t) - p_1(t)\sum_{e=2}^{k}\mu_{1,e},$$
with the initial conditions (2.56). Solving this system one obtains the state
probabilities p_i(t), i = 1, …, k.
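As an illustration of how a system like (2.66) can be solved numerically, the following sketch integrates the equations for a small three-state repairable element with only minor failures and repairs. The rates are illustrative assumptions, and a production calculation would use a proper ODE solver rather than the plain Euler steps shown:

```python
# Sketch: numerical solution of the state-probability equations for a
# three-state repairable element. The rates below (per year) are
# illustrative assumptions, not values from the text.
lam = {(3, 2): 10.0, (2, 1): 7.0}    # failure rates lambda_{i,e}
mu = {(1, 2): 120.0, (2, 3): 110.0}  # repair rates mu_{i,e}
K = 3

def deriv(p):
    dp = [0.0] * (K + 1)  # index 0 unused
    for i in range(1, K + 1):
        out_rate = sum(r for (a, b), r in lam.items() if a == i) \
                 + sum(r for (a, b), r in mu.items() if a == i)
        dp[i] -= out_rate * p[i]
        for (a, b), r in list(lam.items()) + list(mu.items()):
            if b == i:
                dp[i] += r * p[a]
    return dp

# Initial conditions: the element starts in the best state K.
p = [0.0, 0.0, 0.0, 1.0]
h, T = 1e-4, 0.5
for _ in range(int(T / h)):  # simple Euler step
    d = deriv(p)
    p = [pi + h * di for pi, di in zip(p, d)]

print(sum(p[1:]))          # probabilities stay normalized (≈ 1)
print(p[3] > p[2] > p[1])  # near steady state the best state dominates
```

With these rates the process relaxes to its steady state well before half a year, so the printed probabilities essentially coincide with the stationary ones.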
When F(g_i, w) = g_i − w for the constant demand level g_i < w ≤ g_{i+1}, the
acceptable states, where the element performance is above level g_i, are i+1, …, k.
Thus, the instantaneous availability is

$$A_i(t) = \sum_{e=i+1}^{k} p_e(t). \qquad (2.67)$$
the repairable element. As was said above, if the steady-state probabilities exist,
the process is called ergodic. For the steady-state probabilities the computations
become simpler. The set of differential equations (2.66) is reduced to a set of k
algebraic linear equations because for the constant probabilities all time-derivatives
are equal to zero: dp_i(t)/dt = 0, i = 1, …, k.
Let the steady-state probabilities $p_i = \lim_{t\to\infty} p_i(t)$ exist. In order to find these
probabilities, the following system of algebraic linear equations should be solved:

$$0 = \sum_{e=1}^{k-1}\mu_{e,k}\,p_e - p_k\sum_{e=1}^{k-1}\lambda_{k,e},$$

$$0 = \sum_{e=i+1}^{k}\lambda_{e,i}\,p_e + \sum_{e=1}^{i-1}\mu_{e,i}\,p_e - p_i\left(\sum_{e=1}^{i-1}\lambda_{i,e} + \sum_{e=i+1}^{k}\mu_{i,e}\right), \quad i = 2,3,\dots,k-1, \qquad (2.68)$$

$$0 = \sum_{e=2}^{k}\lambda_{e,1}\,p_e - p_1\sum_{e=2}^{k}\mu_{1,e}.$$
The k equations in (2.68) are not linearly independent (the determinant of the
system is zero). An additional independent equation can be provided by the simple
fact that the sum of the state probabilities is equal to 1 at any time:
$$\sum_{i=1}^{k} p_i = 1. \qquad (2.69)$$
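A minimal sketch of this steady-state computation for a three-state birth-and-death element follows; the rates are illustrative assumptions, and the last balance equation is replaced by the normalization (2.69):

```python
# Sketch: steady-state probabilities of a three-state birth-and-death
# element via (2.68) plus the normalization (2.69). Rates per year are
# illustrative assumptions.
lam32, lam21 = 10.0, 7.0     # failure rates
mu12, mu23 = 120.0, 110.0    # repair rates

# Unknowns ordered as (p1, p2, p3). The balance equation for state 1 is
# replaced by the normalization, since the k balance equations are
# linearly dependent.
A = [
    [0.0, mu23, -lam32],             # state 3: mu23*p2 - lam32*p3 = 0
    [mu12, -(lam21 + mu23), lam32],  # state 2: inflow from 1 and 3
    [1.0, 1.0, 1.0],                 # normalization: p1 + p2 + p3 = 1
]
b = [0.0, 0.0, 1.0]

# Tiny Gaussian elimination with partial pivoting.
n = 3
for c in range(n):
    piv = max(range(c, n), key=lambda r: abs(A[r][c]))
    A[c], A[piv] = A[piv], A[c]
    b[c], b[piv] = b[piv], b[c]
    for r in range(c + 1, n):
        f = A[r][c] / A[c][c]
        for cc in range(c, n):
            A[r][cc] -= f * A[c][cc]
        b[r] -= f * b[c]
p = [0.0] * n
for r in range(n - 1, -1, -1):
    p[r] = (b[r] - sum(A[r][cc] * p[cc] for cc in range(r + 1, n))) / A[r][r]

p1, p2, p3 = p
print(round(p3, 4))  # 0.9122 -- probability of the best state
```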
Fig. 2.8 State-transition diagram for determination of the reliability function R_i(t) for a repairable element (for a constant demand w: g_i < w ≤ g_{i+1})
In order to find the element reliability function R_i(t) for the constant demand
w (g_i < w ≤ g_{i+1}), an additional Markov model should be built. All states 1, 2, …, i
of the element corresponding to the performance rates that are lower than the
demand w should be united in one absorbing state. This absorbing state can be
considered now as state 0, and all repairs that return the element from this state back
to the set of acceptable states should be forbidden. This corresponds to zeroing all
the transition intensities μ_{0,m} for m = i+1, …, k. The transition rate λ_{m,0} from
any acceptable state m (m > i) to the united absorbing state 0 is equal to the sum
of the transition rates from state m to all the unacceptable states (states 1, 2, …, i):
$$\lambda_{m,0} = \sum_{j=1}^{i}\lambda_{m,j}, \quad m = k, k-1, \dots, i+1. \qquad (2.70)$$
The system of differential equations for the state probabilities takes the form

$$\frac{dp_k(t)}{dt} = \sum_{e=i+1}^{k-1}\mu_{e,k}\,p_e(t) - p_k(t)\left(\sum_{e=i+1}^{k-1}\lambda_{k,e} + \lambda_{k,0}\right),$$

$$\frac{dp_j(t)}{dt} = \sum_{e=j+1}^{k}\lambda_{e,j}\,p_e(t) + \sum_{e=i+1}^{j-1}\mu_{e,j}\,p_e(t) - p_j(t)\left(\sum_{e=i+1}^{j-1}\lambda_{j,e} + \lambda_{j,0} + \sum_{e=j+1}^{k}\mu_{j,e}\right), \quad i < j < k, \qquad (2.71)$$

$$\frac{dp_0(t)}{dt} = \sum_{e=i+1}^{k}\lambda_{e,0}\,p_e(t),$$

with the initial conditions

$$p_k(0) = 1, \quad p_{k-1}(0) = \dots = p_{i+1}(0) = p_0(0) = 0.$$

The reliability function is obtained as

$$R_i(t) = 1 - p_0(t) = \sum_{j=i+1}^{k} p_j(t). \qquad (2.72)$$
Obviously, the final state probabilities for system (2.71) are as follows:

$$p_k = p_{k-1} = \dots = p_{i+1} = 0, \quad p_0 = 1.$$

Based on the computed reliability function $R_i(t) = \sum_{j=i+1}^{k} p_j(t)$ one can find the
mean time to first failure, when the element performance drops for the first time
below the demand level w, where g_i < w ≤ g_{i+1}:

$$MTTF_i = \int_0^{\infty} R_i(t)\,dt. \qquad (2.73)$$
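The relation (2.73) can be checked numerically. For a non-repairable element with only minor failures and demand g_1 < w ≤ g_2, the MTTF computed from the absorbing-state model should reproduce the sum 1/λ4,3 + 1/λ3,2 + 1/λ2,1 used earlier; the rates below are illustrative assumptions:

```python
# Sketch: checking (2.73) numerically. All failed states collapse into
# the absorbing state 0, R(t) = 1 - p0(t), and MTTF = integral of R(t).
# Rates (per year) are illustrative assumptions.
lam43, lam32, lam21 = 2.0, 1.0, 0.7

# States 4 (best), 3, 2 acceptable; 0 absorbing (performance below w).
p4, p3, p2, p0 = 1.0, 0.0, 0.0, 0.0
h, T = 1e-3, 60.0
mttf = 0.0
for _ in range(int(T / h)):
    R = p4 + p3 + p2          # reliability function R1(t) = 1 - p0(t)
    mttf += R * h             # rectangle rule for the integral (2.73)
    d4 = -lam43 * p4
    d3 = lam43 * p4 - lam32 * p3
    d2 = lam32 * p3 - lam21 * p2
    d0 = lam21 * p2
    p4 += h * d4; p3 += h * d3; p2 += h * d2; p0 += h * d0

exact = 1/lam43 + 1/lam32 + 1/lam21
print(round(mttf, 2), round(exact, 2))
```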
$$\frac{dp_4(t)}{dt} = -(\lambda_{4,3}+\lambda_{4,2}+\lambda_{4,1})p_4(t) + \mu_{3,4}p_3(t) + \mu_{2,4}p_2(t) + \mu_{1,4}p_1(t),$$

$$\frac{dp_3(t)}{dt} = \lambda_{4,3}p_4(t) - (\lambda_{3,2}+\lambda_{3,1}+\mu_{3,4})p_3(t) + \mu_{1,3}p_1(t) + \mu_{2,3}p_2(t),$$

$$\frac{dp_2(t)}{dt} = \lambda_{4,2}p_4(t) + \lambda_{3,2}p_3(t) - (\lambda_{2,1}+\mu_{2,3}+\mu_{2,4})p_2(t) + \mu_{1,2}p_1(t),$$

$$\frac{dp_1(t)}{dt} = \lambda_{4,1}p_4(t) + \lambda_{3,1}p_3(t) + \lambda_{2,1}p_2(t) - (\mu_{1,2}+\mu_{1,3}+\mu_{1,4})p_1(t).$$
Fig. 2.9 State-transition diagrams for a four-state element with minor and major failures and repairs
$$A_3(t) = p_4(t), \quad \text{for } g_3 < w \le g_4,$$

$$A_2(t) = p_4(t) + p_3(t), \quad \text{for } g_2 < w \le g_3,$$

$$A_1(t) = p_4(t) + p_3(t) + p_2(t) = 1 - p_1(t), \quad \text{for } g_1 < w \le g_2.$$
Fig. 2.10 Instantaneous availability A1(t), A2(t), and A3(t) of the four-state element (time in years)
The mean instantaneous performance of the element is

$$E_t = \sum_{k=1}^{4} g_k\,p_k(t) = 100 p_4(t) + 80 p_3(t) + 50 p_2(t) + 0\cdot p_1(t).$$

For the given demand, the availability is $A_w(t) = A_2(t)$, and the mean instantaneous performance deficiency is

$$D_t = \sum_{k=1}^{4} p_k(t)\max(w - g_k, 0) = 10 p_2(t) + 60 p_1(t).$$
The indices Dt and Et, as functions of time, are presented in Figure 2.11.
Fig. 2.11 Instantaneous mean performance E_t and mean performance deficiency D_t of the four-state element (time in years)
If one wants to find only the final state probabilities, this can be done without
solving the system of differential equations. As was shown above, the final state
probabilities can be found by solving the system of linear algebraic equations (2.68) in
which one of the equations is replaced with Equation 2.69. In our example, the
system of linear algebraic equations that should be solved takes the form
where

$$A = p_4 + p_3,$$

$$E = \sum_{k=1}^{4} g_k\,p_k = 100 p_4 + 80 p_3 + 50 p_2 + 0\cdot p_1,$$

$$D = \sum_{k=1}^{4} p_k \max(w - g_k, 0) = 10 p_2 + 60 p_1.$$
As one can see in Figures 2.10 and 2.11, the steady-state values of the state
probabilities are achieved during a short time period. After 0.07 years, the process
becomes stationary. For this reason, only the final solution is important
in many practical cases. This is especially so for elements with a relatively long
lifetime, as in our example if the element lifetime is at least several
years. However, if one deals with highly responsible components and takes into
account even small information losses at the beginning of the process, an analysis
based on the system of differential equations should be performed.
In order to find the element reliability function R_w(t) for the constant demand
w = 60 s^{-1} (g_2 < w ≤ g_3), an additional Markov model should be built. States 1
and 2 corresponding to performance rates that are lower than the demand w should
be united in one absorbing state. This absorbing state can be considered now as
state 0 and all repairs that return the element from this state back to the set of
acceptable states should be forbidden. This corresponds to zeroing the transition
intensities μ_{0,3} and μ_{0,4}. The transition rates from the acceptable states 3 and 4 to the
united absorbing state 0 are equal to the sum of the corresponding transition rates
from these states to the unacceptable states 1 and 2. According to (2.70) we obtain

$$\lambda_{4,0} = \lambda_{4,1} + \lambda_{4,2}, \qquad \lambda_{3,0} = \lambda_{3,1} + \lambda_{3,2}.$$
The state-space diagram for computation of the reliability function R_w(t) is
presented in Figure 2.9 (b). For this state-space diagram, the state probability p_0(t)
determines the reliability function of the element, because after the first entrance
into the absorbing state 0 the element never leaves it.
The system of differential equations for determining the reliability function of
the element takes the form
$$\frac{dp_4(t)}{dt} = -(\lambda_{4,3}+\lambda_{4,2}+\lambda_{4,1})p_4(t) + \mu_{3,4}p_3(t),$$

$$\frac{dp_3(t)}{dt} = \lambda_{4,3}p_4(t) - (\lambda_{3,2}+\lambda_{3,1}+\mu_{3,4})p_3(t),$$

$$\frac{dp_0(t)}{dt} = (\lambda_{4,1}+\lambda_{4,2})p_4(t) + (\lambda_{3,1}+\lambda_{3,2})p_3(t).$$
Solving this system under initial conditions p4 (0) = 1, p3 (0) = p0 (0) = 0 we ob-
tain the reliability function as Rw (t ) = 1 p0 (t ) . This function is presented in Fig-
ure 2.12.
When the reliability function is known, the mean time to first failure (the element's
capacity dropping below the demand w = 60 s^{-1}) can be found by using (2.73):

$$MTTF_w = \int_0^{\infty} R_w(t)\,dt \approx 2.3\ \text{years}.$$
Fig. 2.12 Reliability R_w(t) of a four-state element (time in years)
the current value of the element performance rate G_j(t) at any instant t are
random variables. G_j(t) takes values from the set g_j: G_j(t) ∈ g_j. The performance
rate of any element j is defined as a continuous-time Markov chain in the time
interval [0, T], where T is the MSS operation period. Such models for different
types of MSS elements were studied in the previous section.
According to the generic MSS model, we assume that
space of the performance rates of the elements into the space of the system per-
formance rates at any instant t, defines the system structure function. Therefore,
by using the structure function, the entire MSS performance rate can be computed
for any combination of performance rates of system elements. The current state of
the entire MSS and, therefore, the current value of the system output performance
rate G(t) at any instant t are random variables. G(t) is a continuous-time Markov
chain that takes values from g : G ( t ) g = { g1 , g 2 , , g K }.
We suppose that Markov processes for different elements are independent and
that there are no simultaneous state transitions of any different elements. In other
words, there may be only one failure or one repair in a system at any instant t.
The traditional application of the Markov technique to MSS reliability evaluation
consists of two stages: development of the state-space diagram for the entire
system and evaluation of the system's reliability based on solving a system of
differential equations corresponding to the diagram.
The proper design of the state-transition diagram is a critical task in Markov
analysis, especially for the MSS. The explosion of the number of states when the
modeled system is large enough is still a major problem. In such cases a state-
space diagram representation in its pictorial form often becomes impossible. One
of the possible solutions is to use a formalized description of the system. When
such a description is used, the state-space diagram is not actually presented in
pictorial form, but knowledge of the rules that govern the MSS evolution enables us to
explore the state-space graph systematically by using a computer. In addition, it is
important to understand that the state-space diagram plays only an auxiliary role.
The main aim here is to determine the transition intensity matrix a that defines the
system of differential equations (2.38) and hence the corresponding Markov
model. Therefore, in this context we speak about the formalized generation of the
transition intensity matrix and, therefore, about the Markov model generation.
Based on this idea, efficient algorithms are built for the reliability evaluation. One
possible algorithm for Markov model generation for the MSS is as follows.
$$\left\{\lambda^{(j)}_{k_j,k_j-1},\ \lambda^{(j)}_{k_j,k_j-2},\dots,\lambda^{(j)}_{k_j,1},\ \lambda^{(j)}_{k_j-1,k_j-2},\ \lambda^{(j)}_{k_j-1,k_j-3},\dots,\lambda^{(j)}_{k_j-1,1},\dots,\lambda^{(j)}_{3,2},\ \lambda^{(j)}_{3,1},\ \lambda^{(j)}_{2,1}\right\}$$

and the ordered set of repair rates

$$\left\{\mu^{(j)}_{1,2},\dots,\mu^{(j)}_{1,k_j-1},\ \mu^{(j)}_{1,k_j},\ \mu^{(j)}_{2,3},\dots,\mu^{(j)}_{2,k_j-1},\ \mu^{(j)}_{2,k_j},\dots,\mu^{(j)}_{k_j-2,k_j-1},\ \mu^{(j)}_{k_j-2,k_j},\ \mu^{(j)}_{k_j-1,k_j}\right\}.$$
If for element j there is no failure that causes a decrease in the element
performance from level g_{jm} to level g_{jn} (n < m), the corresponding failure rate
$\lambda^{(j)}_{m,n}$ is set equal to zero in the failure rate set. In the same manner, if there is
no repair that returns the performance of element j from level g_{jn} to level
g_{jm}, the corresponding repair rate $\mu^{(j)}_{n,m}$ is set equal to zero in the repair rate
set.
3. Enumeration of the system states and computation of the MSS output
performance.
All system states should be enumerated. For computer-based algorithms the
enumeration order is not important. What is really important is the correspondence
among the state number n_s (n_s ∈ [1, K]), the set of performance
rates of the elements in this state {g_{1i}, …, g_{nh}}, and the MSS output performance
rate g_{n_s} in this state, which is determined by the MSS structure function.
A transition between two system states corresponds to a change in the state of
exactly one element j:

$$\{g_{1i},\dots,g_{jm},\dots,g_{nh}\} \to \{g_{1i},\dots,g_{jf},\dots,g_{nh}\}, \quad m \neq f,\ 1 \le j \le n.$$
The transition in which f < m corresponds to an element failure (with transition
intensity $\lambda^{(j)}_{m,f}$), and the transition in which f > m corresponds to an element
repair (with transition intensity $\mu^{(j)}_{m,f}$). If the MSS transits from state n1 to state n2
because of a failure with intensity $\lambda^{(j)}_{m,f}$ of an arbitrary element j, then the element
a_{n1n2} of transition matrix a located in the intersection of row n1 and column n2 is

$$a_{n_1 n_2} = \lambda^{(j)}_{m,f}. \qquad (2.74)$$
If the MSS transits from state n1 to state n2 because of a repair with intensity
$\mu^{(j)}_{m,f}$ (f > m) of an arbitrary element j, then the element a_{n1n2} of transition
matrix a located in the intersection of row n1 and column n2 is

$$a_{n_1 n_2} = \mu^{(j)}_{m,f}. \qquad (2.75)$$

If the transition from state n1 to state n2 does not exist, then the element
a_{n1n2} of transition matrix a located in the intersection of row n1 and column
n2 is zero:

$$a_{n_1 n_2} = 0. \qquad (2.76)$$
The diagonal elements are determined so that the sum of the elements of each row equals zero:

$$a_{ii} = -\sum_{\substack{n=1 \\ n \neq i}}^{K} a_{in}, \quad i = 1,\dots,K. \qquad (2.77)$$
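A sketch of steps (2.74)–(2.77) for a single four-state element follows; the intensities and the dictionary-based representation are illustrative assumptions, not the book's notation:

```python
# Sketch of (2.74)-(2.77): assembling a transition intensity matrix from
# a list of transitions. The four-state transitions below (per year) are
# illustrative assumptions.
K = 4
transitions = {  # (from_state, to_state) -> intensity
    (4, 3): 2.0, (3, 2): 1.0, (2, 1): 0.7,       # failures (2.74)
    (1, 2): 80.0, (2, 3): 90.0, (3, 4): 100.0,   # repairs (2.75)
}

# Off-diagonal elements: given intensity, or zero if no transition (2.76).
a = [[transitions.get((i, j), 0.0) for j in range(1, K + 1)]
     for i in range(1, K + 1)]

# Diagonal elements (2.77): minus the sum of the other entries in the row.
for i in range(K):
    a[i][i] = -sum(a[i][j] for j in range(K) if j != i)

assert all(abs(sum(row)) < 1e-12 for row in a)  # every row sums to zero
print(a[3][3])  # -2.0: state 4 leaves only via the failure rate 2.0
```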
The algorithm described above is general. It can build a Markov model for
quite complex MSS and reduces the risk of errors and misrepresentations.
MSS reliability indices such as instantaneous availability, instantaneous expected
performance, and instantaneous performance deficiency can be found in the
same way as was demonstrated for a multi-state element (the only difference is the
greater order of the system of differential equations).
At first, the system of differential equations must be solved and the probabilities
p_i(t) must be found for all system states i = 1, …, K.
For the constant demand level w the MSS instantaneous availability can be ob-
tained as the sum of probabilities of all acceptable states (the states where MSS
output performance is greater than or equal to w). Therefore, MSS instantaneous
availability can be defined as
$$A(t) = \sum_{i=1}^{K} p_i(t)\,\mathbf{1}(g_i \ge w), \qquad (2.78)$$

the MSS mean instantaneous performance as

$$E_t = \sum_{i=1}^{K} g_i\,p_i(t), \qquad (2.79)$$

and the MSS mean instantaneous performance deficiency as

$$D_t = \sum_{i=1}^{K} p_i(t)\max(w - g_i, 0).$$
In order to find the MSS reliability function R_i(t) for the constant demand w,
g_i < w ≤ g_{i+1}, the Markov model should be changed. All the system states from
the unacceptable area, where the performance rate is lower than demand w, should
be united in one absorbing state with the number 0. Transitions from state 0 to any
acceptable state should be forbidden. The transition rate from any acceptable state
j to the absorbing state should be determined as the sum of the transition rates
from state j to all the unacceptable states. After performing these changes, we
obtain the new transition intensity matrix. By solving the differential equations (2.38)
with this matrix one obtains the probability p_0(t) of state 0 and determines the
system reliability function as R(t) = 1 − p_0(t).
Example 2.4 (Lisnianski and Levitin 2003). Consider the flow transmission sys-
tem from Example 1.2 (Chapter 1) that was presented in Figure 1.8 (a). It consists
of three elements (pipes). The oil flow is transmitted from point C to point E. The
performance of the pipes is measured by their transmission capacity (tons per min-
ute). Elements 1 and 2 are repairable and each has two possible states. A state of
2.3 Markov Models: Continuous-time Markov Chains 71
total failure for both elements corresponds to a transmission capacity of 0 and the
operational state corresponds to capacities of 1.5 and 2 tons per minute, respec-
tively, so that G1 ( t ) { g11 , g12 } = {0,1.5} and G2 ( t ) { g 21 , g 22 } = {0, 2} .
The failure and repair rates corresponding to these two elements are

$$\lambda^{(1)}_{2,1} = 7\ \text{year}^{-1}, \quad \mu^{(1)}_{1,2} = 100\ \text{year}^{-1} \quad \text{for element 1},$$

$$\lambda^{(2)}_{2,1} = 10\ \text{year}^{-1}, \quad \mu^{(2)}_{1,2} = 80\ \text{year}^{-1} \quad \text{for element 2}.$$
Element 3 is a multi-state element with only minor failures and minor repairs.
It can be in one of three states: a state of total failure corresponding to a capacity
of 0, a state of partial failure corresponding to a capacity of 1.8 tons per minute,
and a fully operational state with a capacity of 4 tons per minute. Therefore,
G3 ( t ) { g31 , g32 , g33 } = {0,1.8, 4} .
The failure and repair rates corresponding to element 3 are

$$\lambda^{(3)}_{3,2} = 10\ \text{year}^{-1}, \quad \lambda^{(3)}_{2,1} = 7\ \text{year}^{-1},$$

$$\mu^{(3)}_{1,2} = 120\ \text{year}^{-1}, \quad \mu^{(3)}_{2,3} = 110\ \text{year}^{-1}.$$
The system output performance rate is defined as the maximum flow that can
be transmitted from C to E. As was shown in Example 1.2, the MSS structure
function is
Gs ( t ) = f ( G1 ( t ) , G2 ( t ) , G3 ( t ) ) = min {G1 ( t ) + G2 ( t ) , G3 ( t )} .
In order to derive the system of differential equations for the MSS we apply the
algorithm described above:
1. The failure and repair rate sets for the system elements are:
$$\text{element 1: } \{\lambda^{(1)}_{2,1}\},\ \{\mu^{(1)}_{1,2}\}; \qquad \text{element 2: } \{\lambda^{(2)}_{2,1}\},\ \{\mu^{(2)}_{1,2}\};$$

$$\text{element 3: } \{\lambda^{(3)}_{3,2},\ \lambda^{(3)}_{3,1} = 0,\ \lambda^{(3)}_{2,1}\},\ \{\mu^{(3)}_{1,2},\ \mu^{(3)}_{1,3} = 0,\ \mu^{(3)}_{2,3}\}.$$
2. All the system states are generated as combinations of all possible states of the
system elements (characterized by their performance levels). The total number of
different system states is K = k_1 k_2 k_3 = 2 × 2 × 3 = 12.
3. A unique number is assigned to each system state. All the system states with
their numbers n_s and corresponding performance rates are presented in columns 1 to
5 of Table 2.1. For every state, the system output performance rate is computed
based on the MSS structure function. For example, in state 1 we have
G_1(t) = g_{12} = 1.5, G_2(t) = g_{22} = 2.0, G_3(t) = g_{33} = 4.0. Using the system
structure function, we obtain the entire MSS output performance in state 1 as

$$G(t) = g_1 = f(g_{12}, g_{22}, g_{33}) = \min\{g_{12} + g_{22}, g_{33}\} = \min\{1.5 + 2.0, 4.0\} = 3.5.$$
4. The state transition analysis is performed for all pairs of system states. For
example, for state number 2, where the states of the elements are {g_{11}, g_{22},
g_{33}} = {0, 2.0, 4.0}, the transitions to states 1, 5, and 6 exist with the intensities
$\mu^{(1)}_{1,2}$, $\lambda^{(2)}_{2,1}$, and $\lambda^{(3)}_{3,2}$, respectively. All the existing transitions and corresponding
transition intensities are also presented in Table 2.1. Based on Table 2.1 one can
easily find the non-diagonal elements of the transition intensity matrix that
describes the evolution of the MSS in the state space (the elements of the matrix
corresponding to the absence of transitions should be zeroed).
2.3 Markov Models: Continuous-time Markov Chains 73
Performance ns
ns G1 G2 G3 G 1 2 3 4 5 6 7 8 9 10 11 12
1 1.5 2.0 4.0 3.5 (1)
2,1 ( 2)
2,1 (3)
3,2
5 0 0 4.0 0 1,2
(2)
1,2
(1)
3,2
(3)
9 0 0 4 0 2,3
(3)
1,2
(2)
1,2
(1)
2,1
(3)
10 0 2.0 0 0 1,2
(3)
1,2
(1)
2,1
( 2)
11 1.5 0 0 0 1,2
(3)
1,2
(2)
2,1
(1)
12 0 0 0 0 1,2
(3)
1,2
(2)
1,2
(1)
5. The diagonal elements of the transition intensity matrix are determined in such
a way that the sum of elements of each row of the matrix equals zero. These di-
agonal elements are as follows:
$$a_{11} = -\left(\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right), \qquad a_{77} = -\left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{2,1}\right),$$

$$a_{22} = -\left(\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right), \qquad a_{88} = -\left(\mu^{(3)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}\right),$$

$$a_{33} = -\left(\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{3,2}\right), \qquad a_{99} = -\left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(3)}_{2,1}\right),$$

$$a_{44} = -\left(\mu^{(3)}_{2,3}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right), \qquad a_{10,10} = -\left(\mu^{(3)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}\right),$$

$$a_{55} = -\left(\mu^{(1)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(3)}_{3,2}\right), \qquad a_{11,11} = -\left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}\right),$$

$$a_{66} = -\left(\mu^{(3)}_{2,3}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right), \qquad a_{12,12} = -\left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}\right).$$
The state-space diagram of the system is presented in Figure 2.15 (in this
diagram the corresponding system performance is presented in the lower part of each
circle).
The system of differential equations for the state probabilities is as follows:

$$\frac{dp_1(t)}{dt} = -\left(\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_1(t) + \mu^{(1)}_{1,2}p_2(t) + \mu^{(2)}_{1,2}p_3(t) + \mu^{(3)}_{2,3}p_4(t),$$

$$\frac{dp_2(t)}{dt} = \lambda^{(1)}_{2,1}p_1(t) - \left(\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_2(t) + \mu^{(2)}_{1,2}p_5(t) + \mu^{(3)}_{2,3}p_6(t),$$

$$\frac{dp_3(t)}{dt} = \lambda^{(2)}_{2,1}p_1(t) - \left(\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_3(t) + \mu^{(1)}_{1,2}p_5(t) + \mu^{(3)}_{2,3}p_7(t),$$

$$\frac{dp_4(t)}{dt} = \lambda^{(3)}_{3,2}p_1(t) - \left(\mu^{(3)}_{2,3}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_4(t) + \mu^{(1)}_{1,2}p_6(t) + \mu^{(2)}_{1,2}p_7(t) + \mu^{(3)}_{1,2}p_8(t),$$

$$\frac{dp_5(t)}{dt} = \lambda^{(2)}_{2,1}p_2(t) + \lambda^{(1)}_{2,1}p_3(t) - \left(\mu^{(1)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(3)}_{3,2}\right)p_5(t) + \mu^{(3)}_{2,3}p_9(t),$$

$$\frac{dp_6(t)}{dt} = \lambda^{(3)}_{3,2}p_2(t) + \lambda^{(1)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_6(t) + \mu^{(2)}_{1,2}p_9(t) + \mu^{(3)}_{1,2}p_{10}(t),$$

$$\frac{dp_7(t)}{dt} = \lambda^{(3)}_{3,2}p_3(t) + \lambda^{(2)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_7(t) + \mu^{(1)}_{1,2}p_9(t) + \mu^{(3)}_{1,2}p_{11}(t),$$

$$\frac{dp_8(t)}{dt} = \lambda^{(3)}_{2,1}p_4(t) - \left(\mu^{(3)}_{1,2}+\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}\right)p_8(t) + \mu^{(1)}_{1,2}p_{10}(t) + \mu^{(2)}_{1,2}p_{11}(t),$$

$$\frac{dp_9(t)}{dt} = \lambda^{(3)}_{3,2}p_5(t) + \lambda^{(2)}_{2,1}p_6(t) + \lambda^{(1)}_{2,1}p_7(t) - \left(\mu^{(3)}_{2,3}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(3)}_{2,1}\right)p_9(t) + \mu^{(3)}_{1,2}p_{12}(t),$$

$$\frac{dp_{10}(t)}{dt} = \lambda^{(3)}_{2,1}p_6(t) + \lambda^{(1)}_{2,1}p_8(t) - \left(\mu^{(3)}_{1,2}+\mu^{(1)}_{1,2}+\lambda^{(2)}_{2,1}\right)p_{10}(t) + \mu^{(2)}_{1,2}p_{12}(t),$$

$$\frac{dp_{11}(t)}{dt} = \lambda^{(3)}_{2,1}p_7(t) + \lambda^{(2)}_{2,1}p_8(t) - \left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\lambda^{(1)}_{2,1}\right)p_{11}(t) + \mu^{(1)}_{1,2}p_{12}(t),$$

$$\frac{dp_{12}(t)}{dt} = \lambda^{(3)}_{2,1}p_9(t) + \lambda^{(2)}_{2,1}p_{10}(t) + \lambda^{(1)}_{2,1}p_{11}(t) - \left(\mu^{(3)}_{1,2}+\mu^{(2)}_{1,2}+\mu^{(1)}_{1,2}\right)p_{12}(t).$$
Solving this system with the initial conditions p_1(0) = 1, p_i(0) = 0 for
2 ≤ i ≤ 12, one obtains the probability of each state at time t.
According to Table 2.1, in different states the MSS has the following performance
rates: in state 1, g_1 = 3.5; in state 2, g_2 = 2.0; in states 4 and 6, g_4 = g_6 = 1.8;
in states 3 and 7, g_3 = g_7 = 1.5; and in states 5, 8, 9, 10, 11, and 12,
g_5 = g_8 = g_9 = g_{10} = g_{11} = g_{12} = 0. Therefore,
Pr {G = 3.5} = p1 ( t ) ,
Pr {G = 2.0} = p2 ( t ) ,
Pr {G = 1.5} = p3 ( t ) + p7 ( t ) ,
Pr {G = 1.8} = p4 ( t ) + p6 ( t ) ,
Pr {G = 0} = p5 ( t ) + p8 ( t ) + p9 ( t ) + p10 ( t ) + p11 ( t ) + p12 ( t ) .
For the constant demand level w = 1 one obtains the MSS instantaneous
availability as the sum of the state probabilities where the MSS output performance
is greater than or equal to 1. States 1, 2, 3, 4, 6, and 7 are acceptable. Hence

$$A(t) = p_1(t) + p_2(t) + p_3(t) + p_4(t) + p_6(t) + p_7(t).$$
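As a cross-check of the long-run value of this availability, the steady-state probabilities of the independent elements can be combined by enumerating all twelve states. This enumeration shortcut (instead of solving the twelve differential equations) is our sketch and is valid only in the steady state:

```python
# Sketch: steady-state availability of the flow transmission MSS for
# demand w = 1, by enumerating all 12 element-state combinations.
# Element rates are those of Example 2.4; exploiting the independence of
# the elements is a steady-state shortcut.
# Elements 1 and 2: two states, p_up = mu / (lam + mu).
p1 = {0.0: 7/107, 1.5: 100/107}    # lam = 7, mu = 100
p2 = {0.0: 10/90, 2.0: 80/90}      # lam = 10, mu = 80
# Element 3: birth-and-death chain, lam32 = 10, lam21 = 7,
# mu12 = 120, mu23 = 110 (balance: pi2 = pi3*10/110, pi1 = pi2*7/120).
pi3 = 1 / (1 + 10/110 + (10/110)*(7/120))
p3 = {4.0: pi3, 1.8: pi3*10/110, 0.0: pi3*(10/110)*(7/120)}

w = 1.0
A = 0.0
for g1, q1 in p1.items():
    for g2, q2 in p2.items():
        for g3, q3 in p3.items():
            if min(g1 + g2, g3) >= w:   # MSS structure function
                A += q1 * q2 * q3

print(round(A, 4))  # ≈ 0.9879
```

This agrees with the plateau that the availability curve A(t) reaches after the short transient period.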
Fig. 2.16 Instantaneous availability and probabilities of different MSS performance levels
$$E_t = \sum_{i=1}^{12} p_i(t)\,g_i,$$

$$D_t = \sum_{i=1}^{12} p_i(t)\max(w - g_i, 0).$$
Fig. 2.17 Reliability indices of the flow transmission MSS: (a) instantaneous mean output performance, and (b) instantaneous mean performance deficiency (time in years)
Fig. 2.18 State-space diagram of the flow transmission MSS with all unacceptable states united into one absorbing state
In this state-space diagram all unacceptable states (with MSS output performance
lower than w = 1) are united into one absorbing state 0. As was described
above, the system unreliability at time t is treated as the probability that the MSS
has entered the unacceptable area for the first time by time instant t.
The system of differential equations for determining the reliability function takes
the form

$$\frac{dp_1(t)}{dt} = -\left(\lambda^{(1)}_{2,1}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{3,2}\right)p_1(t) + \mu^{(1)}_{1,2}p_2(t) + \mu^{(2)}_{1,2}p_3(t) + \mu^{(3)}_{2,3}p_4(t),$$

$$\frac{dp_2(t)}{dt} = \lambda^{(1)}_{2,1}p_1(t) - \left(\lambda^{(2)}_{2,1}+\mu^{(1)}_{1,2}+\lambda^{(3)}_{3,2}\right)p_2(t) + \mu^{(3)}_{2,3}p_6(t),$$

$$\frac{dp_3(t)}{dt} = \lambda^{(2)}_{2,1}p_1(t) - \left(\lambda^{(1)}_{2,1}+\mu^{(2)}_{1,2}+\lambda^{(3)}_{3,2}\right)p_3(t) + \mu^{(3)}_{2,3}p_7(t),$$

$$\frac{dp_4(t)}{dt} = \lambda^{(3)}_{3,2}p_1(t) - \left(\lambda^{(1)}_{2,1}+\mu^{(3)}_{2,3}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_4(t) + \mu^{(1)}_{1,2}p_6(t) + \mu^{(2)}_{1,2}p_7(t),$$

$$\frac{dp_6(t)}{dt} = \lambda^{(3)}_{3,2}p_2(t) + \lambda^{(1)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}+\mu^{(1)}_{1,2}\right)p_6(t),$$

$$\frac{dp_7(t)}{dt} = \lambda^{(3)}_{3,2}p_3(t) + \lambda^{(2)}_{2,1}p_4(t) - \left(\mu^{(3)}_{2,3}+\lambda^{(3)}_{2,1}+\lambda^{(1)}_{2,1}+\mu^{(2)}_{1,2}\right)p_7(t),$$

$$\frac{dp_0(t)}{dt} = \lambda^{(2)}_{2,1}p_2(t) + \lambda^{(1)}_{2,1}p_3(t) + \lambda^{(3)}_{2,1}p_4(t) + \left(\lambda^{(2)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_6(t) + \left(\lambda^{(1)}_{2,1}+\lambda^{(3)}_{2,1}\right)p_7(t).$$
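The whole procedure — enumerating the element states, merging the unacceptable states into the absorbing state 0, and integrating the resulting equations — can be sketched as follows. The element data are those of Example 2.4; the plain Euler integration and the dictionary-based encoding are our assumptions (a production calculation would use a proper ODE solver):

```python
from itertools import product

# Sketch: reliability function R(t) = 1 - p0(t) of the flow transmission
# MSS (Example 2.4) for constant demand w = 1, built automatically from
# the element models. Rates are per year.
elements = [
    ({1: 0.0, 2: 1.5}, {(2, 1): 7.0, (1, 2): 100.0}),   # element 1
    ({1: 0.0, 2: 2.0}, {(2, 1): 10.0, (1, 2): 80.0}),   # element 2
    ({1: 0.0, 2: 1.8, 3: 4.0},                           # element 3
     {(3, 2): 10.0, (2, 1): 7.0, (1, 2): 120.0, (2, 3): 110.0}),
]
w = 1.0

def perf(s):  # MSS structure function for combined state s = (s1, s2, s3)
    g = [elements[j][0][s[j]] for j in range(3)]
    return min(g[0] + g[1], g[2])

acc = [s for s in product(*(e[0] for e in elements)) if perf(s) >= w]
idx = {s: i for i, s in enumerate(acc)}
n = len(acc) + 1                  # last index is the absorbing state 0

# Transition intensity matrix: transitions into the unacceptable area are
# redirected to the absorbing state; no transitions leave it.
A = [[0.0] * n for _ in range(n)]
for s in acc:
    for j, (_, rates) in enumerate(elements):
        for (u, v), r in rates.items():
            if s[j] == u:
                t = list(s); t[j] = v; t = tuple(t)
                dest = idx.get(t, n - 1)
                A[idx[s]][dest] += r
                A[idx[s]][idx[s]] -= r

p = [0.0] * n
p[idx[(2, 2, 3)]] = 1.0           # start with all elements fully up
h = 1e-4
R_half = None
for step in range(int(2.0 / h)):  # Euler integration over two years
    dp = [sum(p[i] * A[i][k] for i in range(n)) for k in range(n)]
    p = [pi + h * di for pi, di in zip(p, dp)]
    if step == int(1.0 / h) - 1:
        R_half = 1.0 - p[n - 1]   # R(1)
R_end = 1.0 - p[n - 1]            # R(2)
print(R_half > R_end > 0.0)       # reliability decreases over time
```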
Fig. 2.19 Reliability function of flow transmission MSS
2.4 Markov Reward Models 79
In the preceding subsections, it was shown how some important MSSs' reliability
indices can be found by using the Markov technique. Here we consider additional
indices such as states frequencies and the mean number of system failures during
an operating period. It is also very important that the Markov reward models con-
sidered here are very useful for MSS life cycle cost analysis and reliability-
associated cost computation. Here we describe the common computational
method, which is based on the general Markov reward model that was primarily
introduced by Howard (1960) and then was essentially extended for different ap-
plications in Mine and Osaki (1970) and many other research works. The corre-
sponding overview can be found in Reibman et al. (1989).
This model considers a continuous-time Markov chain with a set of states
{1, …, K} and transition intensity matrix a = [a_ij], i, j = 1, …, K. It is suggested
that if the process stays in any state i during a time unit, a certain amount of
money r_ii should be paid. It is also suggested that each time the process transits
from state i to state j a certain amount of money r_ij should be paid. These
amounts r_ii and r_ij are called rewards (a reward may also be negative when it
characterizes losses or penalties). Rewards may also be considered in other senses,
not only as money: for example, the energy of a power generating system, the
information quantity of a communications system, the productivity of a production
line, etc. A Markov process with rewards associated with its states and transitions
is called a Markov process with rewards. For these processes, an additional matrix
r = [r_ij], i, j = 1, …, K, of rewards is determined. If all rewards are zero, the
process reduces to the ordinary continuous-time discrete-state Markov process.
Note that the rewards rii and rij have different dimensions. For example, if rij is
measured in cost units, reward rii is measured in cost units per time unit. The value
that is of interest is the total expected reward accumulated up to time instant t
under specified initial conditions.
Let Vi (t ) be the total expected reward accumulated up to time t, given the ini-
tial state of the process at time instant t = 0 is state i. According to Howard
(1960), the following system of differential equations must be solved under speci-
fied initial conditions in order to find the total expected rewards:
$$\frac{dV_i(t)}{dt} = r_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij} r_{ij} + \sum_{j=1}^{K} a_{ij} V_j(t), \quad i = 1,\dots,K. \qquad (2.80)$$
System (2.80) can be obtained in the following manner. Assume that at time
instant t = 0 the process is in state i. During the time increment Δt, the process can
remain in this state or transit to some other state j. If it remains in state i during
time Δt, the expected reward accumulated during this time is r_ii Δt. Since at the
beginning of the time interval [Δt, t + Δt] the process is still in state i, the
expected reward during this interval is V_i(t), and the expected reward during the
entire interval [0, t + Δt] is V_i(t + Δt) = r_ii Δt + V_i(t). The probability that the process
will remain in state i during the time interval Δt equals 1 minus the probability
that it will transit to any other state j ≠ i during this interval:

$$\pi_{ii}(0, \Delta t) = 1 - \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij}\,\Delta t = 1 + a_{ii}\,\Delta t. \qquad (2.81)$$
On the other hand, during time Δt the process can transit to some other state
j ≠ i with the probability π_ij(0, Δt) = a_ij Δt. In this case the expected reward
accumulated during the time interval [0, Δt] is r_ij. At the beginning of the time
interval [Δt, t + Δt] the process is in state j. Therefore, the expected reward during
this interval is V_j(t), and the expected reward during the interval [0, t + Δt] is
V_i(t + Δt) = r_ij + V_j(t).

In order to obtain the total expected reward one must sum the products
of rewards and corresponding probabilities over all of the states. Thus, for small Δt
one has

$$V_i(t + \Delta t) \approx (1 + a_{ii}\Delta t)\left[r_{ii}\Delta t + V_i(t)\right] + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij}\Delta t\left[r_{ij} + V_j(t)\right], \quad i = 1,\dots,K. \qquad (2.82)$$
Neglecting the terms of order greater than Δt, one can rewrite the last
expression as follows:

$$\frac{V_i(t + \Delta t) - V_i(t)}{\Delta t} = r_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij} r_{ij} + \sum_{j=1}^{K} a_{ij} V_j(t), \quad i = 1,\dots,K. \qquad (2.83)$$
Defining the column vector of total expected rewards V(t) with components
V_1(t), …, V_K(t) and the column vector u with components

$$u_i = r_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{K} a_{ij} r_{ij}, \quad i = 1,\dots,K, \qquad (2.84)$$

one can rewrite system (2.80) in matrix form:

$$\frac{d}{dt}\mathbf{V}(t) = \mathbf{u} + \mathbf{a}\mathbf{V}(t). \qquad (2.85)$$

$$0 = \mathbf{u} + \mathbf{a}\mathbf{V}(t), \qquad (2.86)$$
The reward associated with the transition from state 1 to state 2 is r_12 = c_r. There is no reward associated with the
transition from state 2 to state 1, so r_21 = 0.
$$\mathbf{r} = [r_{ij}] = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} = \begin{pmatrix} c_p N_{ic} & c_r \\ 0 & r_{prf} \end{pmatrix},$$

$$\mathbf{a} = [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} -\mu & \mu \\ \lambda & -\lambda \end{pmatrix}.$$
$$\frac{dV_1(t)}{dt} = c_p N_{ic} + c_r - \mu V_1(t) + \mu V_2(t),$$

$$\frac{dV_2(t)}{dt} = r_{prf} + \lambda V_1(t) - \lambda V_2(t).$$
The total expected reward R_T associated with the production line operating
during the time interval [0, T] is equal to the expected reward V_2(T) accumulated up
to time T, given the initial state of the process at time instant t = 0 is state 2.
Using the Laplace–Stieltjes transform under the initial conditions
V_1(0) = V_2(0) = 0, we transform the system of differential equations into the
following system of linear algebraic equations:
$$s\,v_1(s) = \frac{c_p L + c_r}{s} - \mu v_1(s) + \mu v_2(s),$$

$$s\,v_2(s) = \lambda v_1(s) - \lambda v_2(s),$$

from which

$$v_2(s) = \frac{\lambda(c_p L + c_r)}{s^2(s + \lambda + \mu)}.$$
Applying the inverse Laplace transform, one obtains

$$V_2(t) = L^{-1}\{v_2(s)\} = \frac{\lambda(c_p L + c_r)}{(\lambda+\mu)^2}\left[e^{-(\lambda+\mu)t} + (\lambda+\mu)t - 1\right],$$

and the total expected cost accumulated during [0, T] is

$$C_T = V_2(T) = \frac{\lambda(c_p L + c_r)}{(\lambda+\mu)^2}\left[e^{-(\lambda+\mu)T} + (\lambda+\mu)T - 1\right].$$
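The closed-form V2(t) can be checked against direct integration of the reward equations. The parameter values below are illustrative assumptions, with C standing for the combined reward constant c_pL + c_r and r_prf = 0, as in the cost analysis:

```python
import math

# Sketch: verifying the inverse-transform result numerically. lam, mu,
# and C (= c_p*L + c_r) are illustrative assumptions; r_prf = 0.
lam, mu, C = 2.0, 50.0, 1000.0

# Euler integration of dV1/dt = C - mu*V1 + mu*V2, dV2/dt = lam*(V1 - V2)
V1 = V2 = 0.0
h, T = 1e-5, 1.0
for _ in range(int(T / h)):
    d1 = C - mu * V1 + mu * V2
    d2 = lam * (V1 - V2)
    V1 += h * d1
    V2 += h * d2

a = lam + mu
closed = lam * C / a**2 * (math.exp(-a * T) + a * T - 1)
print(abs(V2 - closed) < 1e-2 * closed)  # True: the two values agree
```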
( + )2
For relatively large T the term e ( + )T can be neglected and the following ap-
proximation can be used:
( c p L + cr )
CT T.
+
Therefore, for large T, the total expected reward is a linear function of time and
the coefficient

$$c_{un} = \frac{\lambda(c_p L + c_r)}{\lambda + \mu}$$

defines the annual expected cost associated with production line unreliability. For
the data given in the example, c_un = \$13.14 × 10^6 year^{-1}.
In its general form the Markov reward model was intended to provide economic
and financial calculations. From the preceding subsection and Example 2.5 it is
clear that the Markov reward model is a very useful tool for life cycle cost analy-
sis, and corresponding case studies will be presented in Chapters 6 and 7. How-
ever, it was shown by Lubkov and Stepanyans (1978) and Volik et al. (1988) that
this tool may also be very suitable for reliability analysis and important reliability
measures could be easily found by the corresponding determination of the rewards
in matrix r. In these works it was suggested that demand w is constant. The
method was extended by Lisnianski (2007) to MSSs with variable demand, where
the demand is assumed to be a continuous-time Markov chain with m different
possible states (levels) w_1, …, w_m and corresponding constant transition intensities
given by a matrix b = [b_ij], i, j = 1, 2, …, m. Here we apply this method for MSS
reliability analysis.
In the previous section, the MSS was considered under constant demand. In
practice, this is often not so. An MSS can fall into the set of unacceptable states in
two ways: either through a performance decrease because of failures or through an
increase in demand.
For example, consider the demand variation that is typical for power systems. Usually demand can be represented by a daily demand curve. This curve is cyclic in nature, with a maximum level (peak) during the day and a minimum level at night (Endrenyi 1979; Billinton and Allan 1996). Another example is the number of telephone calls arriving at a telephone station per unit time. In the simplest and most frequently used model, the cyclic demand variation can be approximated by a two-level demand curve, as shown in Figure 2.21 (a).
In this model, the demand is represented as a continuous-time Markov chain with two states: $\mathbf{w} = \{w_1, w_2\}$ [Figure 2.21 (b)], where $w_2$ is the peak level of demand and $w_1$ is the low level. When the cycle time $T_c$ and the mean duration of the peak $t_p$ are known (usually $T_c = 24$ h), the transition intensities of the model can be obtained as

$$\lambda_p = \frac{1}{T_c - t_p}, \qquad \lambda_l = \frac{1}{t_p}, \tag{2.87}$$
where $\lambda_p$ is the transition intensity from the low demand level to the peak level and $\lambda_l$ is the transition intensity from the peak demand level to the low level.
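As a quick numerical check of (2.87): with the typical values $T_c = 24$ h and a mean peak duration of 8 h, the two intensities follow directly. A minimal sketch (the function name is illustrative):

```python
def demand_intensities(T_c, t_p):
    """Transition intensities of the two-level demand model, Equation 2.87.

    T_c : cycle time in hours; t_p : mean duration of the peak in hours.
    Returns (lambda_p, lambda_l) -- the low-to-peak and peak-to-low
    transition intensities, in 1/h.
    """
    lam_p = 1.0 / (T_c - t_p)   # mean duration of the low level is T_c - t_p
    lam_l = 1.0 / t_p
    return lam_p, lam_l

lam_p, lam_l = demand_intensities(T_c=24.0, t_p=8.0)
```

For the 24-h cycle with an 8-h peak this gives $\lambda_p = 1/16$ h$^{-1}$ and $\lambda_l = 1/8$ h$^{-1}$.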
Fig. 2.21 Two-level demand model: (a) approximation of actual demand curve, and (b) state-
transition diagram
In a further extension of the variable demand model, the demand process can be approximated by defining a set of discrete values $\{w_1, w_2, \ldots, w_m\}$ representing different possible demand levels and determining the transition intensities between each pair of demand levels (usually derived from the demand statistics). A realization of the stochastic demand process for a specified period and the corresponding state-space diagram are shown in Figure 2.22. Here $b_{ij}$ is the transition intensity from demand level $w_i$ to demand level $w_j$.
Fig. 2.22 Discrete variable demand: (a) realization of general Markov demand process, and (b)
state-transition diagram for general Markov demand process
So, for the general case we assume that the demand W(t) is also a random process that can take discrete values from the set $\mathbf{w} = \{w_1, \ldots, w_m\}$. The desired relation between the MSS output performance and the demand at any time instant t can be expressed by the acceptability function $\Phi(G(t), W(t))$. The acceptable system states correspond to $\Phi(G(t), W(t)) \ge 0$ and the unacceptable states correspond to $\Phi(G(t), W(t)) < 0$. The last inequality defines the system failure criterion. Usually in power systems the system generating capacity should be equal to or exceed the demand. Therefore, in such cases the acceptability function takes the following form:

$$\Phi(G(t), W(t)) = G(t) - W(t), \tag{2.88}$$

and the acceptable states are those where

$$\Phi(G(t), W(t)) = G(t) - W(t) \ge 0. \tag{2.89}$$
Below we present a general method that has proved very useful for the computation of system reliability measures when MSS output performance and demand are independent discrete-state continuous-time Markov processes.

The performance and demand models can be combined based on the independence of events in these two models: the probabilities of transitions in each model are not affected by the events that occur in the other one. The state-space diagram for the combined m-state demand model and K-state output capacity model is shown in Figure 2.25. Each state in the diagram is labeled by two indices indicating the demand level $w \in \{w_1, \ldots, w_m\}$ and the element performance rate $g \in \{g_1, g_2, \ldots, g_K\}$.
These indices for each state are presented in the lower part of the corresponding circle. The combined model has mK states. Each state corresponds to a unique combination of demand level $w_i$ and element performance $g_j$ and is numbered according to the following rule:

$$z = (i - 1)K + j, \tag{2.90}$$
$$z \sim \{w_i, g_j\}. \tag{2.91}$$

In Figure 2.25 the number of each state is shown in the upper part of the corresponding circle.
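The numbering rule (2.90) and its inverse are straightforward to implement. A minimal sketch using 1-based indices as in the text (function names are illustrative):

```python
def combined_state(i, j, K):
    """Combined state number z for demand level w_i and performance g_j (2.90)."""
    return (i - 1) * K + j

def demand_perf_indices(z, K):
    """Inverse of rule (2.90): recover the pair (i, j) from state number z."""
    return (z - 1) // K + 1, (z - 1) % K + 1
```

For m = K = 3 this reproduces the numbering used in Example 2.6 below, e.g., state 3 is $\{w_1, g_3\}$ and state 7 is $\{w_3, g_1\}$.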
In addition to transitions between states with different performance levels, there are transitions between states with the same performance levels but with different demand levels. All intensities of horizontal transitions are defined by the transition intensities $b_{i,j}$, $i, j = 1, \ldots, m$, of the Markov demand model Ch2, and all intensities of vertical transitions are defined by the transition intensities $a_{i,j}$, $i, j = 1, \ldots, K$, of the performance model Ch1. All other (diagonal) transitions are forbidden. We designate the transition intensity matrix for the combined performance-demand model as $\mathbf{c} = [c_{ij}]$, $i, j = 1, 2, \ldots, mK$.

Thus, the algorithm for building the combined performance-demand model from the separate performance and demand models Ch1 and Ch2 can be presented by the following steps.
Fig. 2.25 State-space diagram of the combined performance-demand model
Algorithm
1. The state-space diagram of a combined performance-demand model is shown in
Figure 2.25, where the nodes represent system states and the arcs represent cor-
responding transitions.
2. The graph consists of mK nodes that should be ordered in K rows and m col-
umns.
3. Each state (node) should be numbered according to rule (2.90).
4. Intensities of all horizontal transitions, i.e., transitions between states $z_1 \sim \{w_i, g_j\}$ and $z_2 \sim \{w_s, g_j\}$ with the same performance level, are defined by the demand model:
$$c_{z_1, z_2} = b_{i,s}. \tag{2.92}$$
5. Intensities of all vertical transitions, i.e., transitions between states $z_1 \sim \{w_i, g_j\}$ and $z_3 \sim \{w_i, g_t\}$ with the same demand level, are defined by the performance model:
$$c_{z_1, z_3} = a_{j,t}. \tag{2.93}$$
6. All diagonal transitions are forbidden, so the corresponding transition intensities in matrix c are zeroed.
The following system of differential equations should be solved under specified initial conditions in order to find the total expected rewards for the combined performance-demand model:

$$\frac{dV_i(t)}{dt} = r_{ii} + \sum_{\substack{j=1 \\ j \ne i}}^{mK} c_{ij} r_{ij} + \sum_{j=1}^{mK} c_{ij} V_j(t), \quad i = 1, \ldots, mK. \tag{2.95}$$
In the most common case, the MSS begins to accumulate rewards after time instant t = 0; therefore, the initial conditions are

$$V_i(0) = 0, \quad i = 1, \ldots, mK. \tag{2.96}$$
If, for example, the state number K (Figure 2.25) with the highest performance
level and the lowest demand level is defined as the initial state, the value VK(t)
should be found as a solution of system (2.95).
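System (2.95) with initial conditions (2.96) is a linear ODE system $dV/dt = u + \mathbf{c}V$, where $u_i = r_{ii} + \sum_{j \ne i} c_{ij} r_{ij}$, so a standard solver handles it directly. The sketch below checks the numerical solution against the closed-form $V_2(T)$ obtained for the two-state production line at the beginning of this section; the rates and cost rate are illustrative numbers:

```python
import numpy as np
from scipy.integrate import solve_ivp

def total_expected_rewards(c, r, T):
    """Solve system (2.95) with initial conditions (2.96): dV/dt = u + c V.

    c : (n, n) transition intensity matrix (zero row sums);
    r : (n, n) reward matrix (r[i, i] per unit time in state i,
        r[i, j] per transition i -> j).
    Returns the vector V(T) of total expected rewards for each initial state.
    """
    n = c.shape[0]
    off = c * r                      # c_ij * r_ij terms
    np.fill_diagonal(off, 0.0)
    u = np.diag(r).astype(float) + off.sum(axis=1)
    sol = solve_ivp(lambda t, V: u + c @ V, (0.0, T), np.zeros(n),
                    rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

# Two-state production line: state 1 down (repair rate mu), state 2 up
# (failure rate lam); the cost rate cpL + cr is accrued while down.
lam, mu, cost, T = 2.0, 20.0, 5.0, 3.0
c = np.array([[-mu, mu], [lam, -lam]])
r = np.array([[cost, 0.0], [0.0, 0.0]])
V = total_expected_rewards(c, r, T)

a = lam + mu
V2_closed = cost * lam / a**2 * (np.exp(-a * T) + a * T - 1.0)
```

Here `V[1]` (the up state as initial state) matches the closed-form expression for $V_2(T)$ derived above.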
In order to find reliability measures for a MSS the specific reward matrix r
should be defined for each measure. Based on the combined performance-demand
model, the theory of the Markov reward processes can be applied for computation
of reliability measures for Markov MSS. As was said above, we assume that de-
mand W(t) and MSS output performance G(t) are mutually independent continu-
ous-time Markov chains.
The MSS average availability $\bar{A}(T)$ is defined as the mean fraction of time when the system resides in the set of acceptable states during the time interval [0, T]:

$$\bar{A}(T) = \frac{1}{T}\int_0^T A(t)\,dt, \tag{2.97}$$

where A(t) is the instantaneous (point) availability, i.e., the probability that the MSS at instant t > 0 is in one of the acceptable states.
As was shown in the previous section, A(t) can be found by solving differential equations (2.35) and summing the probabilities corresponding to all acceptable states. However, based on the Markov reward model, the MSS average availability $\bar{A}(T)$ may be found more easily, without using expression (2.97). For this purpose the rewards in matrix r for the combined performance-demand model should be determined in the following manner:
1. The rewards associated with all acceptable states should be defined as 1.
2. The rewards associated with all unacceptable states should be zeroed, as should all the rewards associated with the transitions.
The mean reward Vi(T) accumulated during the interval [0,T] defines how long
the power system will be in the set of acceptable states in the case where state i is
the initial state. This reward should be found as a solution of system (2.95) under
initial conditions (2.96). After solving (2.95) and finding $V_i(t)$, the MSS average availability can be obtained for each initial state $i = 1, 2, \ldots, mK$:

$$\bar{A}_i(T) = \frac{V_i(T)}{T}. \tag{2.99}$$
Usually the state K with the greatest performance level and minimum demand
level is determined as an initial state.
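This reward recipe is easy to verify on a two-state element, where the average availability has a well-known closed form: with reward 1 in the single acceptable (up) state, $V_{up}(T)/T$ from (2.95) must reproduce it. A sketch with illustrative rates:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Two-state element: state 0 down (repair rate mu), state 1 up (failure
# rate lam).  Reward 1 per unit time in the single acceptable state.
lam, mu, T = 0.5, 4.0, 10.0
c = np.array([[-mu, mu], [lam, -lam]])
u = np.array([0.0, 1.0])             # reward rate 1 in the up state only

sol = solve_ivp(lambda t, V: u + c @ V, (0.0, T), [0.0, 0.0],
                rtol=1e-10, atol=1e-12)
A_avg = sol.y[1, -1] / T             # average availability, as in (2.99)

# Classical closed form of (1/T) * integral of A(t) over [0, T]
a = lam + mu
A_ref = mu / a + lam * (1.0 - np.exp(-a * T)) / (a**2 * T)
```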
The mean number Nfi(T) of MSS failures during the time interval [0, T], if state i
is the initial state, can be treated as a mean number of MSS entrances into the set
of unacceptable states during the time interval [0,T]. For its computation, the re-
wards associated with each transition from the set of acceptable states to the set of
unacceptable states should be defined as 1. All other rewards should be zeroed.
In this case the mean accumulated reward $V_i(T)$, obtained by solving (2.95), provides the mean number of entrances into the unacceptable area during the time interval [0, T]:

$$N_{fi}(T) = V_i(T). \tag{2.100}$$
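The failure-counting rewards can be verified the same way on a two-state element, for which the expected number of failures in [0, T] has a classical closed form. A sketch with illustrative rates:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Two-state element: state 0 down (repair rate mu), state 1 up (failure
# rate lam).  Reward 1 is attached to each up -> down transition, i.e.,
# each entrance into the unacceptable area, so u_up = lam * 1.
lam, mu, T = 0.5, 4.0, 10.0
c = np.array([[-mu, mu], [lam, -lam]])
u = np.array([0.0, lam])

sol = solve_ivp(lambda t, V: u + c @ V, (0.0, T), [0.0, 0.0],
                rtol=1e-10, atol=1e-12)
N_f = sol.y[1, -1]                   # mean number of failures, as in (2.100)

# Closed form: integral of lam * P(up at t) dt for a process starting up
a = lam + mu
N_ref = lam * mu * T / a + (lam / a)**2 * (1.0 - np.exp(-a * T))
```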
When the mean number of system failures is computed, the corresponding frequency of failures, or frequency of entrances into the set of unacceptable states, can be found:

$$f_{fi}(T) = \frac{1}{N_{fi}(T)}. \tag{2.101}$$

The expected accumulated performance deficiency, in the case where state i is the initial state, is obtained as the corresponding mean accumulated reward:

$$EAPD_i = V_i(T) = \int_0^T E\big(W(t) - G(t)\big)\,dt. \tag{2.102}$$
Mean time to failure (MTTF) is the mean time up to the instant when the system enters the subset of unacceptable states for the first time. For its computation the combined performance-demand model should be transformed: all unacceptable states are united into one absorbing state, and the rewards are defined correspondingly. In particular, the reliability function, when state i is the initial state, can be found as

$$R_i(T) = 1 - V_i(T), \quad i = 1, \ldots, K. \tag{2.103}$$
Example 2.6 Consider reliability evaluation for a power system, whose output
generating capacity is represented by a continuous-time Markov chain with three
states. The corresponding capacity levels for states 1, 2, and 3 are $g_1 = 0$, $g_2 = 70$ MW, and $g_3 = 100$ MW, respectively, and the transition intensity matrix is as follows:

$$\mathbf{a} = [a_{ij}] = \begin{pmatrix} -500 & 0 & 500 \\ 0 & -1000 & 1000 \\ 1 & 10 & -11 \end{pmatrix}.$$
Daily peaks w2 and w3 occur twice a week and five times a week, respectively,
and the mean duration of the daily peak is $T_p = 8$ h. The mean duration of the low demand level $w_1 = 0$ is defined as $T_L = 24 - 8 = 16$ h.
According to the approach presented in Endrenyi (1979), which is justified for power systems, the peak duration and the low-level duration are assumed to be exponentially distributed random variables.
The acceptability function is given as $\Phi(G(t), W(t)) = G(t) - W(t)$. Therefore, a failure is treated as an entrance into a state where the acceptability function is negative, i.e., where $G(t) < W(t)$.
Find the mean number of generator entrances into the set of unacceptable states
during the time interval [0,T].
Solution. Markov performance model Ch1 corresponding to the given capacity
levels g1 = 0, g 2 = 70, g3 = 100 and transition intensity matrix a is graphically
shown in Figure 2.27 (a).
Markov demand model Ch2 is shown in Figure 2.27 (b). States 1, 2, and 3 rep-
resent the corresponding demand levels w1 , w2 , and w3 . Transition intensities are
as follows:

$$b_{21} = b_{31} = \frac{1}{T_p} = \frac{1}{8}\ \mathrm{h}^{-1} = 1110\ \mathrm{year}^{-1},$$
$$b_{12} = \frac{2}{7} \cdot \frac{1}{T_L} = \frac{2}{7} \cdot \frac{1}{16} = 0.0179\ \mathrm{h}^{-1} = 156\ \mathrm{year}^{-1},$$
$$b_{13} = \frac{5}{7} \cdot \frac{1}{T_L} = \frac{5}{7} \cdot \frac{1}{16} = 0.0446\ \mathrm{h}^{-1} = 391\ \mathrm{year}^{-1}.$$
Fig. 2.27 Output performance model (a) and demand model (b)
All intensities of horizontal transitions from state $z_1 \sim \{w_i, g_j\}$ to state $z_2 \sim \{w_s, g_j\}$ are defined by the demand transition intensity matrix b:

$$c_{z_1 z_2} = b_{i,s}.$$
All intensities of vertical transitions from state $z_1 \sim \{w_i, g_j\}$ to state $z_3 \sim \{w_i, g_t\}$, $i = 1, 2, 3$, $j, t = 1, 2, 3$, are defined by the capacity transition intensity matrix a:

$$c_{z_1 z_3} = a_{j,t}.$$
The transition intensity matrix of the combined performance-demand model is then

$$\mathbf{c} = [c_{ij}] = \begin{pmatrix}
x_1 & 0 & a_{1,3} & 0 & 0 & 0 & b_{1,3} & 0 & 0 \\
0 & x_2 & a_{2,3} & 0 & 0 & 0 & 0 & b_{1,3} & 0 \\
a_{3,1} & a_{3,2} & x_3 & 0 & 0 & 0 & 0 & 0 & b_{1,3} \\
0 & 0 & 0 & x_4 & 0 & a_{1,3} & b_{2,3} & 0 & 0 \\
0 & 0 & 0 & 0 & x_5 & a_{2,3} & 0 & b_{2,3} & 0 \\
0 & 0 & 0 & a_{3,1} & a_{3,2} & x_6 & 0 & 0 & b_{2,3} \\
b_{3,1} & 0 & 0 & b_{3,2} & 0 & 0 & x_7 & 0 & a_{1,3} \\
0 & b_{3,1} & 0 & 0 & b_{3,2} & 0 & 0 & x_8 & a_{2,3} \\
0 & 0 & b_{3,1} & 0 & 0 & b_{3,2} & a_{3,1} & a_{3,2} & x_9
\end{pmatrix},$$

where the diagonal elements are defined so that each row sums to zero:

$$x_1 = -(a_{1,3} + b_{1,3}), \quad x_2 = -(a_{2,3} + b_{1,3}), \quad x_3 = -(a_{3,1} + a_{3,2} + b_{1,3}),$$
$$x_4 = -(a_{1,3} + b_{2,3}), \quad x_5 = -(a_{2,3} + b_{2,3}), \quad x_6 = -(a_{3,1} + a_{3,2} + b_{2,3}),$$
$$x_7 = -(a_{1,3} + b_{3,1} + b_{3,2}), \quad x_8 = -(a_{2,3} + b_{3,1} + b_{3,2}), \quad x_9 = -(a_{3,1} + a_{3,2} + b_{3,1} + b_{3,2}).$$
The state with the maximum performance $g_3 = 100$ MW and the minimum demand $w_1 = 0$ (state 3) is given as the initial state. In states 2, 5, and 8 the MSS performance is 70 MW; in states 3, 6, and 9 it is 100 MW; and in states 1, 4, and 7 it is 0. In states 4, 7, and 8 the MSS performance is lower than the demand. These states are unacceptable and have a performance deficiency:

$$D_4 = w_2 - g_1 = 60\ \mathrm{MW}, \quad D_7 = w_3 - g_1 = 90\ \mathrm{MW}, \quad D_8 = w_3 - g_2 = 20\ \mathrm{MW}.$$

States 1, 2, 3, 5, 6, and 9 constitute the set of acceptable states.
In order to find the mean number of failures, the reward matrix should be defined according to the suggested method. Each reward associated with a transition from the set of acceptable states to the set of unacceptable states should be defined as 1. All other rewards should be zeroed. Therefore, in the reward matrix $r_{17} = r_{28} = r_{58} = r_{64} = r_{97} = r_{98} = 1$ and all other rewards are zeros. So, reward matrix r is obtained:
$$\mathbf{r} = [r_{ij}] = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}.$$
The corresponding system of differential equations is as follows:
$$\frac{dV_1(t)}{dt} = b_{1,3} - (a_{1,3} + b_{1,3})V_1(t) + a_{1,3}V_3(t) + b_{1,3}V_7(t),$$
$$\frac{dV_2(t)}{dt} = b_{1,3} - (a_{2,3} + b_{1,3})V_2(t) + a_{2,3}V_3(t) + b_{1,3}V_8(t),$$
$$\frac{dV_3(t)}{dt} = a_{3,1}V_1(t) + a_{3,2}V_2(t) - (a_{3,1} + a_{3,2} + b_{1,3})V_3(t) + b_{1,3}V_9(t),$$
$$\frac{dV_4(t)}{dt} = -(a_{1,3} + b_{2,3})V_4(t) + a_{1,3}V_6(t) + b_{2,3}V_7(t),$$
$$\frac{dV_5(t)}{dt} = b_{2,3} - (a_{2,3} + b_{2,3})V_5(t) + a_{2,3}V_6(t) + b_{2,3}V_8(t),$$
$$\frac{dV_6(t)}{dt} = a_{3,1} + a_{3,1}V_4(t) + a_{3,2}V_5(t) - (a_{3,1} + a_{3,2} + b_{2,3})V_6(t) + b_{2,3}V_9(t),$$
$$\frac{dV_7(t)}{dt} = b_{3,1}V_1(t) + b_{3,2}V_4(t) - (a_{1,3} + b_{3,1} + b_{3,2})V_7(t) + a_{1,3}V_9(t),$$
$$\frac{dV_8(t)}{dt} = b_{3,1}V_2(t) + b_{3,2}V_5(t) - (a_{2,3} + b_{3,1} + b_{3,2})V_8(t) + a_{2,3}V_9(t),$$
$$\frac{dV_9(t)}{dt} = a_{3,1} + a_{3,2} + b_{3,1}V_3(t) + b_{3,2}V_6(t) + a_{3,1}V_7(t) + a_{3,2}V_8(t) - (a_{3,1} + a_{3,2} + b_{3,1} + b_{3,2})V_9(t).$$
By solving this system of differential equations under the initial conditions $V_i(0) = 0$, $i = 1, \ldots, 9$, all expected rewards $V_i(t)$, $i = 1, \ldots, 9$, can be found as functions of time t.
The state K = 3, in which the system has a maximum capacity level and a
minimum demand, is given as the initial state. Then, according to expression
(2.100) the value V3(T) is treated as the mean number of system entrances into the area of unacceptable states, or the mean number of power system failures, during the time interval [0, T]. The function $N_{f3}(t) = V_3(t)$ is graphically presented in Figure 2.29, where $N_{f3}(t)$ is the mean number of system failures when state 3 is the initial state.

The function $N_{f1}(t) = V_1(t)$ characterizes the mean number of system failures when state 1 is given as the initial state. It is also presented in this figure. As shown, $N_{f3}(t) < N_{f1}(t)$, because state 1 is closer to the set of unacceptable states: it has a direct transition into the unacceptable area, while state 3 does not. Therefore, at the beginning of the process the system's entrance into the set of unacceptable states is more likely from state 1 than from state 3. Figure 2.29 (a) graphically represents the number of power system failures for a short period of only 8 d. After this short period the function $N_{f3}(t)$ grows almost linearly with time [Figure 2.29 (b)].
2.5 Semi-Markov Models 99
Fig. 2.29 Mean number of generator entrances to the set of unacceptable states: (a) short time
period, and (b) 1 year time period
According to (2.101), the frequency of the power system failures can be obtained:

$$f_{f3} = \frac{1}{N_{f3}} = 0.0076\ \mathrm{year}^{-1}.$$
In order to define a semi-Markov process, consider a system that at any time instant $t \ge 0$ can be in one of various possible states $g_1, g_2, \ldots, g_K$. The system behavior is defined by the discrete-state continuous-time stochastic performance process $G(t) \in \{g_1, g_2, \ldots, g_K\}$. We assume that the initial state i of the system and the one-step transition probabilities are given as follows:

$$G(0) = g_i, \quad i \in \{1, \ldots, K\},$$
$$\theta_{jk} = \Pr\left\{G(t_m) = g_k \mid G(t_{m-1}) = g_j\right\}, \quad j, k \in \{1, \ldots, K\}. \tag{2.104}$$

Here $\theta_{jk}$ is the probability that the system will transit from state j with performance rate $g_j$ to state k with performance rate $g_k$. The probabilities $\theta_{jk}$, $j, k \in \{1, \ldots, K\}$, define the one-step transition probability matrix $\boldsymbol{\theta} = [\theta_{jk}]$ for the discrete-time chain $G(t_m)$, where transitions from one state to another may happen only at discrete time moments $t_1, t_2, \ldots, t_{m-1}, t_m, \ldots$. Such a Markov chain $G(t_m)$ is called a Markov chain embedded in the stochastic process G(t), or an embedded Markov chain for short.
To each $\theta_{jk} \ne 0$ there corresponds a random variable $T^*_{jk}$ with cumulative distribution function $F^*_{jk}(t)$ and probability density function $f^*_{jk}(t)$. This random variable is called a conditional sojourn time in state j and characterizes the system sojourn time in state j under the condition that the system transits from state j to state k.
The graphical interpretation of a possible realization of the considered process is shown in Figure 2.30. At the initial time instant $G(0) = g_i$. The process transits to state j (with performance rate $g_j$) from the initial state i with probability $\theta_{ij}$. Therefore, if the next state is state j, the process remains in state i during a random time $T^*_{ij}$ with cdf $F^*_{ij}(t)$. When the process transits to state j, the probability of the transition from this state to any state k is $\theta_{jk}$. If the system transits from state j to state k, it remains in state j during a random time $T^*_{jk}$ with cdf $F^*_{jk}(t)$ up to the transition to state k.
When the sojourn times in different states are taken into account, the process does not have Markov properties. (It remains a Markov process only if all the sojourn times are distributed exponentially.) Therefore, the process can be considered a Markov process only at the time instants of transitions. This explains why the process is named semi-Markov.
The most general definition of the semi-Markov process is based on the kernel matrix Q(t). Each element $Q_{ij}(t)$ of this matrix determines the probability that a one-step transition from state i to state j occurs during the time interval [0, t]. Using the kernel matrix, the one-step transition probabilities for the embedded Markov chain can be obtained as

$$\theta_{ij} = \lim_{t \to \infty} Q_{ij}(t), \tag{2.106}$$

and the cdf $F^*_{ij}(t)$ of the conditional sojourn time in state i can be obtained as

$$F^*_{ij}(t) = \frac{1}{\theta_{ij}}\, Q_{ij}(t). \tag{2.107}$$
The cdf of the unconditional sojourn time $T_i$ in state i is then

$$F_i(t) = \sum_{j=1}^{K} Q_{ij}(t) = \sum_{j=1}^{K} \theta_{ij} F^*_{ij}(t). \tag{2.108}$$
Hence, for the pdf of the unconditional sojourn time in state i with performance rate $g_i$, we can write

$$f_i(t) = \frac{d}{dt} F_i(t) = \sum_{j=1}^{K} \theta_{ij} f^*_{ij}(t). \tag{2.109}$$
Based on (2.109), the mean unconditional sojourn time in state i can be obtained as

$$T_i = \int_0^{\infty} t f_i(t)\,dt = \sum_{j=1}^{K} \theta_{ij} T^*_{ij}, \tag{2.110}$$

where $T^*_{ij}$ is the mean conditional sojourn time in state i given that the system transits from state i to state j.
Kernel matrix Q(t) and the initial state completely define the stochastic behav-
ior of a semi-Markov process.
In practice, when MSS reliability is studied, in order to find the kernel matrix
for a semi-Markov process, one can use the following considerations (Lisnianski
and Yeager 2000). Transitions between different states are usually executed as
consequences of such events as failures, repairs, inspections, etc. For every type of
event, the cdf of time between them is known. The transition is realized according
to the event that occurs first in a competition among the events.
In Figure 2.31, one can see a state-transition diagram for the simplest semi-
Markov process with three possible transitions from initial state 0. The process
will transit from state 0 to states 1, 2, and 3 when events of types 1, 2, and 3, respectively, occur. The time between events of type 1 is a random variable T0,1 distributed according to cdf F0,1(t). If an event of type 1 occurs first, the
process transits from state 0 to state 1. The random variable T0,2 that defines the
time between events of type 2 is distributed according to cdf F0,2(t). If an event of
type 2 occurs earlier than other events, the process transits from state 0 to state 2.
The time between events of type 3 is random variable T0,3 distributed according
to cdf F0,3(t). If an event of type 3 occurs first, the process transits from state 0 to
state 3.
The probability $Q_{01}(t)$ that the process will transit from state 0 to state 1 up to time t (the initial time is t = 0) may be determined as the probability that the random variable $T_{0,1}$ satisfies $T_{0,1} \le t$ and is less than the variables $T_{0,2}$ and $T_{0,3}$. Hence, we have

$$Q_{01}(t) = \int_0^t \left[1 - F_{0,2}(u)\right]\left[1 - F_{0,3}(u)\right] dF_{0,1}(u), \tag{2.111}$$
$$Q_{02}(t) = \int_0^t \left[1 - F_{0,1}(u)\right]\left[1 - F_{0,3}(u)\right] dF_{0,2}(u), \tag{2.112}$$
$$Q_{03}(t) = \int_0^t \left[1 - F_{0,1}(u)\right]\left[1 - F_{0,2}(u)\right] dF_{0,3}(u). \tag{2.113}$$
Example 2.7 Assume that the times between events of types 1 and 2 are distributed exponentially, so that $F_{0,1}(t) = 1 - e^{-\lambda_{0,1} t}$ and $F_{0,2}(t) = 1 - e^{-\lambda_{0,2} t}$, and that

$$F_{0,3}(t) = \begin{cases} 0, & \text{if } t < T_c, \\ 1, & \text{if } t \ge T_c \end{cases}$$

(such a cdf corresponds to the arrival of events with constant period $T_c$).
Find:
1. one-step transition probabilities Q01(t), Q02(t), Q03(t) for the kernel matrix;
2. cumulative distribution function for unconditional sojourn time T0 in state 0;
3. one-step transition probabilities for the embedded Markov chain.
Solution. Using (2.111)–(2.113) we obtain the one-step probabilities for the kernel matrix:
$$Q_{01}(t) = \begin{cases} \dfrac{\lambda_{0,1}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})t}\right], & \text{if } t < T_c, \\[2mm] \dfrac{\lambda_{0,1}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right], & \text{if } t \ge T_c, \end{cases}$$

$$Q_{02}(t) = \begin{cases} \dfrac{\lambda_{0,2}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})t}\right], & \text{if } t < T_c, \\[2mm] \dfrac{\lambda_{0,2}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right], & \text{if } t \ge T_c, \end{cases}$$

$$Q_{03}(t) = \begin{cases} 0, & \text{if } t < T_c, \\ e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}, & \text{if } t \ge T_c. \end{cases}$$

The cdf of the unconditional sojourn time $T_0$ in state 0 is

$$F_0(t) = \sum_{j=1}^{3} Q_{0j}(t) = \begin{cases} 1 - e^{-(\lambda_{0,1} + \lambda_{0,2})t}, & \text{if } t < T_c, \\ 1, & \text{if } t \ge T_c. \end{cases}$$
The one-step transition probabilities for the embedded Markov chain are defined according to (2.106):

$$\theta_{01} = \frac{\lambda_{0,1}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right],$$
$$\theta_{02} = \frac{\lambda_{0,2}}{\lambda_{0,1} + \lambda_{0,2}}\left[1 - e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}\right],$$
$$\theta_{03} = e^{-(\lambda_{0,1} + \lambda_{0,2})T_c}.$$
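The one-step probabilities above can be cross-checked by evaluating the integral (2.111) numerically; for $t \ge T_c$ the integral over $[0, T_c)$ must reproduce the closed form. A sketch with illustrative rates:

```python
import numpy as np

# Illustrative rates and period (not taken from the text)
lam01, lam02, Tc = 0.3, 0.7, 2.0

# Numerical evaluation of (2.111) for t >= Tc: on [0, Tc) we have
# 1 - F02(u) = exp(-lam02 u), 1 - F03(u) = 1, and
# dF01(u) = lam01 exp(-lam01 u) du.  Trapezoidal rule on a fine grid.
u = np.linspace(0.0, Tc, 20001)
integrand = np.exp(-lam02 * u) * lam01 * np.exp(-lam01 * u)
du = np.diff(u)
theta01_num = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * du))

# Closed form obtained above
s = lam01 + lam02
theta01 = lam01 / s * (1.0 - np.exp(-s * Tc))
```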
In order to find the MSS reliability indices, the system state-space diagram should
be built as was done in previous sections for Markov processes. The only differ-
ence is that, in the case of the semi-Markov model, the transition times may be
distributed arbitrarily. Based on transition time distributions Fi, j (t ), the kernel
matrix Q(t) should be defined according to the method presented in the previous
section.
106 2 Modern Stochastic Process Methods for Multi-state System Reliability Assessment
The main problem of semi-Markov process analysis is to find the state probabilities. Let $\pi_{ij}(t)$ be the probability that the process that starts in initial state i at instant t = 0 will be in state j at instant t. It was shown that the probabilities $\pi_{ij}(t)$, $i, j \in \{1, \ldots, K\}$, can be found from the solution of the following system of integral equations:

$$\pi_{ij}(t) = \delta_{ij}\left[1 - F_i(t)\right] + \sum_{k=1}^{K} \int_0^t q_{ik}(\tau)\,\pi_{kj}(t - \tau)\,d\tau, \tag{2.115}$$

where

$$q_{ik}(\tau) = \frac{dQ_{ik}(\tau)}{d\tau}, \tag{2.116}$$

$$F_i(t) = \sum_{j=1}^{K} Q_{ij}(t), \tag{2.117}$$

$$\delta_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \ne j. \end{cases} \tag{2.118}$$
The system of linear integral equations (2.115) is the main system in the theory of semi-Markov processes. By solving this system, one can find all the probabilities $\pi_{ij}(t)$, $i, j \in \{1, \ldots, K\}$, for a semi-Markov process with a given kernel matrix $\mathbf{Q}(t) = [Q_{ij}(t)]$ and a given initial state.
Based on the probabilities $\pi_{ij}(t)$, $i, j \in \{1, \ldots, K\}$, important reliability indices can easily be found. Suppose that the system states are ordered according to their performance rates, $g_K \ge g_{K-1} \ge \ldots \ge g_2 \ge g_1$, and the constant demand satisfies $g_m \ge w > g_{m-1}$. State K with performance rate $g_K$ is the initial state. In this case the system instantaneous availability is treated as the probability that a system starting at instant t = 0 from state K will be at instant $t \ge 0$ in any of the states $g_K, \ldots, g_m$. Hence, we obtain

$$A(t, w) = \sum_{i=m}^{K} \pi_{Ki}(t). \tag{2.119}$$
The mean system instantaneous output performance and the mean instantaneous performance deficiency can be obtained, respectively, as

$$E_t = \sum_{i=1}^{K} g_i \pi_{Ki}(t) \tag{2.120}$$

and

$$D_t(w) = \sum_{i=1}^{m-1} (w - g_i)\,\pi_{Ki}(t)\,\mathbf{1}(w > g_i). \tag{2.121}$$
In the general case, the system of integral equations (2.115) can be solved only by numerical methods. For some of the simplest cases the method of the Laplace–Stieltjes transform can be applied in order to derive an analytical solution of the system. As was done for Markov models, we designate the Laplace–Stieltjes transform of a function f(x) as

$$\tilde{f}(s) = L\{f(x)\} = \int_0^{\infty} e^{-sx} f(x)\,dx. \tag{2.122}$$
Applying this transform to system (2.115), we obtain

$$\tilde{\pi}_{ij}(s) = \delta_{ij}\tilde{h}_i(s) + \sum_{k=1}^{K} \theta_{ik}\tilde{f}^*_{ik}(s)\,\tilde{\pi}_{kj}(s), \quad 1 \le i, j \le K, \tag{2.123}$$

where

$$h_i(t) = 1 - F_i(t) = \int_t^{\infty} f_i(u)\,du = \Pr\{T_i > t\} \tag{2.124}$$

and, therefore,

$$\tilde{h}_i(s) = \frac{1}{s}\left[1 - \tilde{f}_i(s)\right]. \tag{2.125}$$
As $t \to \infty$, the probabilities $\pi_{ij}(t)$ tend to steady-state probabilities that do not depend on the initial state i, so for their designation one can use only one index: $\pi_j$. It is proven that
$$\pi_j = \frac{p_j T_j}{\sum_{j=1}^{K} p_j T_j}, \tag{2.126}$$

where $p_j$, $j = 1, \ldots, K$, are the steady-state probabilities of the embedded Markov chain, which can be found from the system

$$p_j = \sum_{i=1}^{K} p_i \theta_{ij}, \quad j = 1, \ldots, K, \qquad \sum_{i=1}^{K} p_i = 1. \tag{2.127}$$
Note that the first K equations in (2.127) are linearly dependent, and we cannot solve the system without the last equation $\sum_{i=1}^{K} p_i = 1$.
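Solving (2.126)–(2.127) is a small linear-algebra task: replace one of the (linearly dependent) balance equations by the normalization condition. A sketch with an illustrative two-state embedded chain:

```python
import numpy as np

def semi_markov_steady_state(theta, T):
    """Steady-state probabilities pi_j from (2.126)-(2.127).

    theta : (K, K) one-step matrix of the embedded Markov chain;
    T     : (K,) mean unconditional sojourn times.
    """
    K = theta.shape[0]
    # Stationary equations p = p theta, with the last equation replaced
    # by the normalization condition sum(p) = 1.
    A = (np.eye(K) - theta).T
    A[-1, :] = 1.0
    rhs = np.zeros(K)
    rhs[-1] = 1.0
    p = np.linalg.solve(A, rhs)
    w = p * np.asarray(T, dtype=float)
    return w / w.sum()

# Illustrative alternating two-state chain with sojourn times 1 h and 3 h
theta = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
pi = semi_markov_steady_state(theta, [1.0, 3.0])
```

Here the element spends 1/4 of the time in state 1 and 3/4 in state 2, as (2.126) predicts.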
In order to find the reliability function, an additional semi-Markov model should be built in analogy with the corresponding Markov models: all states corresponding to performance rates lower than the constant demand w should be united in one absorbing state with the number 0. All transitions that return the system from this absorbing state should be forbidden. The reliability function is obtained from this new model as $R(w, t) = 1 - \pi_{K0}(t)$.
Example 2.8 (Lisnianski and Levitin 2003). Consider an electric generator that has four possible performance (generating capacity) levels: $g_4 = 100$ MW, $g_3 = 70$ MW, $g_2 = 50$ MW, and $g_1 = 0$. The constant demand is w = 60 MW. The best state, with performance rate $g_4 = 100$ MW, is the initial state. Only minor failures and minor repairs are possible. Times to failures are distributed exponentially with parameters $\lambda_{3,2} = 5 \cdot 10^{-4}$ h$^{-1}$ and $\lambda_{2,1} = 2 \cdot 10^{-4}$ h$^{-1}$. Hence, the times to failures $T_{4,3}$, $T_{3,2}$, $T_{2,1}$ are random variables distributed according to the corresponding cdf:

$$F_{4,3}(t) = 1 - e^{-\lambda_{4,3} t}, \quad F_{3,2}(t) = 1 - e^{-\lambda_{3,2} t}, \quad F_{2,1}(t) = 1 - e^{-\lambda_{2,1} t}.$$
Repair times are normally distributed: $T_{3,4}$ has a mean time to repair $\bar{T}_{3,4} = 240$ h and a standard deviation $\sigma_{3,4} = 16$ h; $T_{2,3}$ has a mean time to repair $\bar{T}_{2,3} = 480$ h and standard deviation $\sigma_{2,3} = 48$ h; $T_{1,2}$ has a mean time to repair $\bar{T}_{1,2} = 720$ h and standard deviation $\sigma_{1,2} = 120$ h. Hence, the cdf of the random variables $T_{3,4}$, $T_{2,3}$, and $T_{1,2}$ are, respectively:
$$F_{3,4}(t) = \frac{1}{\sqrt{2\pi}\,\sigma_{3,4}} \int_0^t \exp\left\{-\frac{(u - \bar{T}_{3,4})^2}{2\sigma_{3,4}^2}\right\} du,$$

$$F_{2,3}(t) = \frac{1}{\sqrt{2\pi}\,\sigma_{2,3}} \int_0^t \exp\left\{-\frac{(u - \bar{T}_{2,3})^2}{2\sigma_{2,3}^2}\right\} du,$$

$$F_{1,2}(t) = \frac{1}{\sqrt{2\pi}\,\sigma_{1,2}} \int_0^t \exp\left\{-\frac{(u - \bar{T}_{1,2})^2}{2\sigma_{1,2}^2}\right\} du.$$
Fig. 2.32 Generator representation by stochastic process: (a) generator evolution in the state
space, and (b) semi-Markov model
State 4 is an initial state with generating capacity g4. After the failure, which
occurs according to distribution F4,3(t), the generator transits from state 4 to state 3
with reduced generating capacity g3.
If a random repair time in state 3, which is distributed according to CDF
F3,4(t), is lower than the time up to the failure in state 3, which is distributed ac-
cording to F3,2(t), the generator will come back to state 4. If the repair time is
greater than the time up to the failure in state 3, the generator will fall down to
state 2 with generating capacity g2.
If the random repair time in state 2, which is distributed according to cdf F2,3(t), is lower than the time up to the failure in state 2, which is distributed according to F2,1(t), the generator will come back to state 3. If the repair time is
greater than the time up to the failure in state 2, the generator will fall down to
state 1 with generating capacity g1.
In state 1 after repair time, which is distributed according to F1,2 ( t ) , the genera-
tor will come back to state 2.
Based on (2.111)–(2.113), we obtain the following kernel matrix $\mathbf{Q}(t) = [Q_{ij}(t)]$, $i, j = 1, 2, 3, 4$:

$$\mathbf{Q}(t) = \begin{pmatrix} 0 & Q_{12}(t) & 0 & 0 \\ Q_{21}(t) & 0 & Q_{23}(t) & 0 \\ 0 & Q_{32}(t) & 0 & Q_{34}(t) \\ 0 & 0 & Q_{43}(t) & 0 \end{pmatrix},$$

in which

$$Q_{12}(t) = F_{1,2}(t), \qquad Q_{21}(t) = \int_0^t \left[1 - F_{2,3}(u)\right] dF_{2,1}(u),$$
$$Q_{23}(t) = \int_0^t \left[1 - F_{2,1}(u)\right] dF_{2,3}(u), \qquad Q_{32}(t) = \int_0^t \left[1 - F_{3,4}(u)\right] dF_{3,2}(u),$$
$$Q_{34}(t) = \int_0^t \left[1 - F_{3,2}(u)\right] dF_{3,4}(u), \qquad Q_{43}(t) = F_{4,3}(t).$$
According to (2.106), the one-step transition probabilities of the embedded Markov chain are

$$\theta_{12} = F_{1,2}(\infty) = 1, \quad \theta_{21} = \int_0^{\infty} \left[1 - F_{2,3}(u)\right] dF_{2,1}(u), \quad \theta_{23} = \int_0^{\infty} \left[1 - F_{2,1}(u)\right] dF_{2,3}(u),$$
$$\theta_{32} = \int_0^{\infty} \left[1 - F_{3,4}(u)\right] dF_{3,2}(u), \quad \theta_{34} = \int_0^{\infty} \left[1 - F_{3,2}(u)\right] dF_{3,4}(u), \quad \theta_{43} = F_{4,3}(\infty) = 1.$$
Hence, the one-step transition probability matrix of the embedded Markov chain is

$$\boldsymbol{\theta} = \lim_{t \to \infty} \mathbf{Q}(t) = \begin{pmatrix} 0 & \theta_{12} & 0 & 0 \\ \theta_{21} & 0 & \theta_{23} & 0 \\ 0 & \theta_{32} & 0 & \theta_{34} \\ 0 & 0 & \theta_{43} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0.0910 & 0 & 0.9090 & 0 \\ 0 & 0.1131 & 0 & 0.8869 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
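The numerical entries of this matrix can be reproduced by evaluating the integrals directly; for instance, $\theta_{34} = \int_0^{\infty} [1 - F_{3,2}(t)]\,dF_{3,4}(t)$ with the exponential $F_{3,2}$ and the normal repair-time cdf $F_{3,4}$ given above. A sketch using a plain trapezoidal rule:

```python
import numpy as np

lam32, mean34, sd34 = 5e-4, 240.0, 16.0   # lambda_3,2 (1/h); repair mean, sd (h)

# theta_34 = integral over t of [1 - F32(t)] * f34(t) dt, with f34 the
# normal pdf of the repair time T_34; the normal mass below t = 0 and
# beyond mean + 10 sd is negligible here.
t = np.linspace(0.0, mean34 + 10 * sd34, 40001)
f34 = np.exp(-0.5 * ((t - mean34) / sd34) ** 2) / (np.sqrt(2 * np.pi) * sd34)
integrand = np.exp(-lam32 * t) * f34
dt = np.diff(t)
theta34 = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dt))
theta32 = 1.0 - theta34
```

This reproduces the printed values $\theta_{34} = 0.8869$ and $\theta_{32} = 0.1131$.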
According to (2.127), the steady-state probabilities of the embedded Markov chain satisfy

$$p_1 = \theta_{21} p_2, \quad p_2 = \theta_{12} p_1 + \theta_{32} p_3, \quad p_3 = \theta_{23} p_2 + \theta_{43} p_4, \quad p_4 = \theta_{34} p_3,$$
$$p_1 + p_2 + p_3 + p_4 = 1.$$
Solving this system and applying (2.126), we obtain the steady-state probabilities

$$\pi_1 = \frac{p_1 T_1}{\sum_{j=1}^{4} p_j T_j} = 0.0069, \qquad \pi_2 = \frac{p_2 T_2}{\sum_{j=1}^{4} p_j T_j} = 0.0484,$$
$$\pi_3 = \frac{p_3 T_3}{\sum_{j=1}^{4} p_j T_j} = 0.1919, \qquad \pi_4 = \frac{p_4 T_4}{\sum_{j=1}^{4} p_j T_j} = 0.7528.$$
The steady-state availability of the generator for the given constant demand is

$$A(w) = \pi_3 + \pi_4 = 0.9447.$$
The mean steady-state output performance is

$$E = \sum_{k=1}^{4} g_k \pi_k = 91.13\ \mathrm{MW},$$

and the mean steady-state performance deficiency is

$$D = (w - g_2)\pi_2 + (w - g_1)\pi_1 = 0.50\ \mathrm{MW}.$$
In order to find the reliability function for the given constant demand
w = 60 MW, we unite states 1 and 2 into one absorbing state 0. The modified
graphical representation of the system evolution in the state space for this case is
shown in Figure 2.33 (a). Figure 2.33 (b) shows the state-space diagram for the corresponding semi-Markov process.
Fig. 2.33 State-transition diagrams for evaluating reliability function of generator: (a) evolution
in modified state space, and (b) semi-Markov model
As in the previous case, we define the kernel matrix for the corresponding semi-Markov process based on expressions (2.111)–(2.113):

$$\mathbf{Q}(t) = \begin{pmatrix} 0 & 0 & 0 \\ Q_{30}(t) & 0 & Q_{34}(t) \\ 0 & Q_{43}(t) & 0 \end{pmatrix},$$

where

$$Q_{30}(t) = \int_0^t \left[1 - F_{3,4}(u)\right] dF_{3,1}(u), \quad Q_{34}(t) = \int_0^t \left[1 - F_{3,1}(u)\right] dF_{3,4}(u), \quad Q_{43}(t) = F_{4,3}(t).$$
The reliability function can be found from this model as $R(w, t) = 1 - \pi_{40}(t)$, where the probabilities satisfy the following system of integral equations obtained from (2.115):
$$\pi_{40}(t) = \int_0^t q_{43}(\tau)\,\pi_{30}(t - \tau)\,d\tau,$$
$$\pi_{30}(t) = \int_0^t q_{34}(\tau)\,\pi_{40}(t - \tau)\,d\tau + \int_0^t q_{30}(\tau)\,\pi_{00}(t - \tau)\,d\tau,$$
$$\pi_{00}(t) = 1.$$
Fig. 2.34 Reliability function of generator
References
Trivedi K (2002) Probability and statistics with reliability, queuing and computer science appli-
cations. Wiley, New York
Volik B, Buyanov B, Lubkov N, Maximov V, Stepanyants A (1988) Methods of analysis and
synthesis of control systems structures. Energoatomizdat, Moscow (in Russian)
3 Statistical Analysis of Reliability Data for
Multi-state Systems
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < +\infty,$$

the parameter set is $\{\theta_1, \theta_2\} = \{\mu, \sigma\}$.
There will then always be an infinite number of functions of the sample values, called statistics, which may be proposed to estimate one or more of the parameters. Formally, a statistic $S = S(X)$ is any function of X. The statistic (as a function) is called an estimator, while its numerical value is called an estimate.

Evidently the best estimate would be one that falls nearest to the true value of the parameter to be estimated. In other words, the statistic whose distribution concentrates as closely as possible near the true value of the parameter may be regarded as the best estimate. Hence, the basic problem of estimation in the above case can be formulated as follows: determine the functions of the sample observations such that their distribution is concentrated as closely as possible near the true value of the parameter. The estimating functions are then referred to as estimators.

Several properties of estimators are of interest to engineers. The concepts that are widely used, and sometimes misunderstood, include consistency, unbiasedness, efficiency, and sufficiency.
An estimator $\hat{\theta}_n$ is called consistent if it converges in probability to the true value of the parameter:

$$\lim_{n \to \infty} \Pr\left\{\left|\hat{\theta}_n - \theta\right| > \varepsilon\right\} = 0 \quad \text{for any } \varepsilon > 0. \tag{3.1}$$

An estimator is called unbiased if

$$E\left\{\hat{\theta}_n\right\} = \theta, \tag{3.2}$$

and the bias of an estimator is defined as

$$b = E\left\{\hat{\theta}_n\right\} - \theta. \tag{3.3}$$
An estimator $\hat{\theta}_1$ is said to be more efficient than another estimator, $\hat{\theta}_2$, if $\mathrm{Var}\{\hat{\theta}_1\} < \mathrm{Var}\{\hat{\theta}_2\}$, where $\mathrm{Var}\{\cdot\}$ is the variance. If in a class of consistent estimators for a parameter there exists one whose sampling variance is less than that of any other estimator, it is called the most efficient estimator. Whenever such an estimator exists, it provides a criterion for the measurement of the efficiency of the other estimators.
If $\hat{\theta}_1$ is the most efficient estimator, with variance $V_1$, and $\hat{\theta}_2$ is any other estimator, with variance $V_2$, then the efficiency of $\hat{\theta}_2$ is defined as

$$E = \frac{V_1}{V_2}. \tag{3.4}$$
An estimator is called sufficient if no other estimator computed from the same sample can provide additional information about the parameter.
Point and interval estimation are the two basic kinds of estimation procedures considered in statistics. Point estimation provides a single number, obtained on the basis of a data set (a sample), that represents a parameter of the distribution function or another characteristic of the underlying random variable of interest. A point estimate does not provide any information about its accuracy. As opposed to point estimation, interval estimation is expressed in terms of confidence intervals, and a confidence interval includes the true value of the parameter with a specified confidence probability.
Several methods of point estimation are considered in mathematical statistics.
In this subsection, two of the most common methods, i.e., the method of moments
and the method of maximum likelihood, are briefly described.
The method of moments is an estimation procedure based on empirically estimated moments (sample moments) of the random variable. We assume that the sample $\{x_1, x_2, ..., x_n\}$ was obtained by n observations of a continuous random variable X. Naturally, one can define the sample mean and sample variance (the first and the second moments) as the respective expected values of the sample of size n as follows:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad (3.5)$$

and

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \quad (3.6)$$
Then $\bar{x}$ and $S^2$ can be used as the point estimates of the distribution mean $\mu$ and variance $\sigma^2$. It should be mentioned that the estimator of variance (3.6) is biased, since $\bar{x}$ is estimated from the same sample. However, this bias can be removed by multiplying it by $n/(n-1)$:
3.1 Basic Concepts of Statistical Estimation Theory 121
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \quad (3.7)$$
Then, according to the method of moments, the sample moments are equated to
the corresponding distribution moments. The solutions of the equations obtained
provide the estimators of the distribution parameters. Estimates obtained by the
method of moments are always consistent, but they may not be efficient.
In order to illustrate the method of moments we consider the following example.
Example 3.1 We assume there is a sample $\{x_1, x_2, ..., x_n\}$ that was taken from the uniform distribution whose density function is given by

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

The mean and variance of this distribution are

$$\mu = \frac{b+a}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}.$$
On the other hand, based on the sample { x1 , x2 ,..., xn } its mean and variance
can be estimated by using (3.5) and (3.7):
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2.$$
Thus, according to the method of moments, one will have the following two equations:
122 3 Statistical Analysis of Reliability Data for Multi-state Systems
$$\hat{\mu} = \frac{b+a}{2}, \qquad \hat{\sigma}^2 = \frac{(b-a)^2}{12}.$$

Solving these equations with respect to a and b yields

$$\hat{a} = \hat{\mu} - \sqrt{3}\,\hat{\sigma}, \qquad \hat{b} = \hat{\mu} + \sqrt{3}\,\hat{\sigma},$$

or, after substituting (3.5) and (3.7),

$$\hat{a} = \frac{1}{n} \sum_{i=1}^{n} x_i - \sqrt{\frac{3}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{b} = \frac{1}{n} \sum_{i=1}^{n} x_i + \sqrt{\frac{3}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
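The closed-form estimates $\hat{a}$ and $\hat{b}$ are easy to compute; the sketch below is our own illustration (the function name and the sample values are not from the text):

```python
import math

def uniform_mom_estimates(sample):
    """Method-of-moments estimates of the bounds (a, b) of a uniform
    distribution, following Example 3.1: a = mu - sqrt(3)*sigma,
    b = mu + sqrt(3)*sigma, with mu and sigma^2 taken from (3.5) and (3.7)."""
    n = len(sample)
    mu = sum(sample) / n                                 # sample mean (3.5)
    var = sum((x - mu) ** 2 for x in sample) / (n - 1)   # unbiased variance (3.7)
    sigma = math.sqrt(var)
    return mu - math.sqrt(3) * sigma, mu + math.sqrt(3) * sigma

# A small check with a symmetric sample on [0, 1]:
a_hat, b_hat = uniform_mom_estimates([0.1, 0.3, 0.5, 0.7, 0.9])
```

Note that the estimated interval is always centered at the sample mean, as the two formulas suggest.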
The maximum-likelihood method is one of the most widely used methods of estimation. This method is based on the principle of calculating the values of parameters that maximize the probability of obtaining a particular sample.
Consider a continuous random variable, X, with probability density function $f(X, \theta)$, where $\theta$ is a parameter. Assume that we have a sample $\{x_1, x_2, ..., x_n\}$ of size n from the distribution of random variable X. Under the maximum-likelihood approach, the estimate $\hat{\theta}$ of $\theta$ is found as the value that provides the highest (or most likely) probability density of observing the particular set $\{x_1, x_2, ..., x_n\}$. The likelihood of the sample is the total probability of drawing each item of the sample.
Generally speaking, the definition of the likelihood function is based on the probability (for a discrete random variable) or on the probability density function (for a continuous random variable) of the joint occurrence of n events (observations), $X = x_1, ..., X = x_n$. For independent events the total probability is the product of all the individual item probabilities. Thus, the likelihood function for a continuous distribution is introduced as

$$L(x_1, x_2, ..., x_n, \theta) = \prod_{i=1}^{n} f(x_i, \theta).$$

The maximum-likelihood estimate $\hat{\theta}$ is the value of $\theta$ that maximizes this likelihood, and is found as a solution of

$$\frac{\partial L(x_1, x_2, ..., x_n, \theta)}{\partial \theta} = 0, \quad (3.10)$$

or, equivalently, since the logarithm is a monotone function,

$$\frac{\partial \ln L(x_1, x_2, ..., x_n, \theta)}{\partial \theta} = 0. \quad (3.11)$$
Example 3.2 Consider a sample $\{t_1, t_2, ..., t_n\}$ of times to failure taken from the exponential distribution with parameter $\lambda$. The likelihood function is

$$L(t, \lambda) = \prod_{i=1}^{n} \lambda \exp(-\lambda t_i) = \lambda^n \exp\left( -\lambda \sum_{i=1}^{n} t_i \right),$$

and its logarithm is

$$\ln L(t, \lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} t_i.$$
Equating the derivative of the log-likelihood to zero,

$$\frac{\partial \ln L(t, \lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0.$$

Solving this equation, the maximum likelihood estimate for $\lambda$ can be obtained:

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i}. \quad (3.12)$$
It should be noted that the estimate $\hat{\lambda}$ is indeed the maximum likelihood estimate, because we have the following second-order condition:

$$\frac{\partial^2 \ln L}{\partial \lambda^2} = -\frac{n}{\lambda^2} < 0.$$
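The estimate of Example 3.2 can be verified numerically; in the sketch below (our own illustration, with invented failure times) the log-likelihood at $\hat{\lambda} = n / \sum t_i$ is no smaller than at nearby values:

```python
import math

def exp_mle(times):
    """Maximum-likelihood estimate of the exponential failure rate
    (Example 3.2): lambda-hat = n / sum(t_i)."""
    return len(times) / sum(times)

def log_likelihood(lam, times):
    """ln L(t, lambda) = n*ln(lambda) - lambda*sum(t_i)."""
    return len(times) * math.log(lam) - lam * sum(times)

times = [120.0, 80.0, 200.0, 150.0, 50.0]   # illustrative failure times, h
lam_hat = exp_mle(times)                    # 5 / 600 h^-1

# The log-likelihood at lam_hat dominates nearby candidate values:
assert all(log_likelihood(lam_hat, times) >= log_likelihood(lam, times)
           for lam in (0.5 * lam_hat, 0.9 * lam_hat, 1.1 * lam_hat, 2 * lam_hat))
```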
Example 3.3 Consider the test where only the number of tested items and the
number of failures are known. The measure that can be estimated on the basis of
such a sample is the failure probability, q, within the test period. Find the estimate $\hat{q}$ of this probability by using the maximum-likelihood method.
Solution. Let a series of Bernoulli trials have n failures in N trials. Then the likelihood function is
$$L(q) = \binom{N}{n} q^n (1-q)^{N-n}.$$

Taking the logarithm,

$$\ln L(q) = \ln \binom{N}{n} + n \ln q + (N-n) \ln(1-q),$$

and differentiating with respect to q,

$$\frac{\partial \ln L(q)}{\partial q} = \frac{n}{q} - \frac{N-n}{1-q}.$$
Therefore the equation for the estimate $\hat{q}$ of parameter q takes the following form:

$$\frac{n}{\hat{q}} - \frac{N-n}{1-\hat{q}} = 0,$$

whence

$$\hat{q} = \frac{n}{N}.$$
The result seems natural: in order to estimate the failure probability, one should calculate the ratio between the number of failed items and the number of all tested items. It was shown that this estimate is unbiased and efficient.
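A quick numerical check of Example 3.3 (our own illustration with invented counts): scanning the log-likelihood over a grid of q values shows the maximum at n/N:

```python
import math

def binom_log_likelihood(q, n_failures, n_trials):
    """ln L(q) up to the constant ln C(N, n) (Example 3.3)."""
    return n_failures * math.log(q) + (n_trials - n_failures) * math.log(1 - q)

N, n = 50, 7                 # illustrative: 7 failures observed in 50 trials
q_hat = n / N                # maximum-likelihood estimate = 0.14

grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda q: binom_log_likelihood(q, n, N))
# best lands on the grid point q_hat = 0.14
```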
We will use the results from Examples 3.2 and 3.3 for transition intensity point estimation for MSSs.
The constants $c_1$ and $c_2$ are called confidence limits, and the interval $[c_1, c_2]$, within which the unknown value of the population parameter is expected to lie, is called the confidence interval; $(1-\alpha)$ is called the confidence coefficient. Below we consider an example illustrating the basic idea of confidence limit construction.
Solution. It was shown that the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$, considered as a statistic, has the normal distribution $N(\mu, \sigma^2 / n)$. We introduce a new random variable

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},$$
which has the standard normal distribution N(0, 1) with mean 0 and standard deviation 1. Using this distribution one can write

$$\Pr\left\{ -z_{1-\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{1-\alpha/2} \right\} = 1 - \alpha,$$

which after rearranging gives

$$\Pr\left\{ \bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha.$$

In particular, for $\alpha = 0.05$ one has $z_{0.975} = 1.96$, so that

$$\Pr\left\{ \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right\} = 0.95.$$
3.2 Classical Parametric Estimation for Binary-state System 127
This means that $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ are 95% confidence limits for the unknown population mean (parameter $\mu$). The interval $\left[ \bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n} \right]$ is called the 95% confidence interval.
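A minimal sketch of this interval computation, assuming a known population standard deviation (the sample values and function name are ours):

```python
import math

def normal_mean_ci(sample, sigma, z=1.96):
    """Two-sided confidence interval for the mean of a normal population
    with known standard deviation sigma (95% level for z = 1.96)."""
    n = len(sample)
    xbar = sum(sample) / n
    half = z * sigma / math.sqrt(n)   # half-width z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = normal_mean_ci([9.8, 10.2, 10.1, 9.9], sigma=0.2)
# the interval is centered at the sample mean 10.0
```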
In this section we briefly consider statistical methods for estimating the reliability model parameter, such as $\lambda$ of the exponential distribution, for binary-state systems. Our goal is to find the point estimate and confidence interval for this parameter.
Generally, estimation of parameters can be based on field data as well as on
data obtained from a special reliability or life test. In reliability testing, a sample
of components is placed on test under those environmental conditions in which the
components are expected to function. All times to failure are recorded. There are
two major types of tests. The first is testing with replacement of the failed items
where each item should be replaced after its failure by a new one, and the second
is testing without replacement. A sequence of recorded times to failure is considered as a given sample for further analysis. A complete sample is one in which all
items have failed during a test for a given observation period, and all the failure
times are known. Note that the likelihood function for a complete sample was introduced in Section 3.1 (Example 3.2). But in the real world, obtaining a complete
sample of observations is often impracticable. Usually we stop the test either at a
prescribed time or after observing a prescribed number of failed items. Otherwise,
the test becomes too time consuming or too costly. Thus, for some items the lifetime is censored, i.e., our information about it has the form "the lifetime exceeds some value t". Modern products are usually reliable enough so that a complete
sample is a rarity. Therefore, generally, reliability data are incomplete and we are
dealing with censored samples.
Let N be the number of items placed on the test, and assume that all items are tested simultaneously.
If during the test period, T, only r items have failed, the failure times being known, and the failed items are not replaced, the sample is called singly censored on the right at T. In this case, for the N − r unfailed items we know only that their failure times are greater than the test period T. According to Lawless (2002), the analysis of such data can be based on the total accumulated time on test. For a test with replacement that is terminated at a prescribed time $T_s$, the total time on test is
$$T = N T_s. \quad (3.13)$$
If r failures have been observed up to time $T_s$, then the maximum likelihood point estimate of the component failure rate can be found in a similar way to Example 3.2:
$$\hat{\lambda} = \frac{r}{T}. \quad (3.14)$$
Thus, the corresponding estimate of the component's mean time to failure can be obtained:

$$\widehat{MTTF} = \frac{T}{r}. \quad (3.15)$$
It should be noticed that the number of units tested during the test, ntest, is
$$n_{test} = N + r. \quad (3.16)$$
If only one item is placed on test ( N = 1) and r failures were recorded during
the test time T, then we obtain
$$\hat{\lambda} = \frac{r}{T} = \frac{r}{T_s}. \quad (3.17)$$
Recall that expressions (3.14)–(3.17) are true under the assumption that replacement times are negligibly small. If this is not so, then the total accumulated replacement time, $T_R$, should be calculated, and the failure rate may be estimated by the following expression:

$$\hat{\lambda} = \frac{r}{T_s - T_R}. \quad (3.18)$$
For a time-terminated test without replacement, the total time on test, T, is obtained by

$$T = (N-r) T_s + \sum_{i=1}^{r} t_i, \quad (3.19)$$

where $t_i$ is the recorded time to failure of failed item i, and $\sum_{i=1}^{r} t_i$ is the accumulated time on test of the r failed items.
For a failure-terminated test with replacement, terminated at the time $T_r$ of the rth failure, the total time on test is

$$T = N T_r. \quad (3.20)$$
$$\hat{\lambda} = \frac{r}{T} = \frac{r}{N T_r}, \quad (3.21)$$

$$\widehat{MTTF} = \frac{T}{r}. \quad (3.22)$$
$$n_{test} = N + r - 1, \quad (3.23)$$
because the test is terminated when the last failed item fails, and so the last failed
item is not replaced.
Finally, consider a failure-terminated test without replacement: N identical items are placed on test and, when a failure occurs, the failed item is not replaced by a new one. The test is terminated at the time, $T_r$, when the rth failure has occurred.
The total time on test, T, is obtained by

$$T = (N-r) T_r + \sum_{i=1}^{r} t_i, \quad (3.24)$$

where $t_i$ is the recorded time to failure of failed item i, and $\sum_{i=1}^{r} t_i$ is the accumulated time on test of the r failed items. The failure rate estimate is

$$\hat{\lambda} = \frac{r}{T} = \frac{r}{(N-r) T_r + \sum_{i=1}^{r} t_i}. \quad (3.25)$$
$$\widehat{MTTF} = \frac{T}{r} = \frac{(N-r) T_r + \sum_{i=1}^{r} t_i}{r}, \quad (3.26)$$

$$n_{test} = N. \quad (3.27)$$
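The test plans above differ only in how the total time on test T is accumulated. A small sketch for the failure-terminated test without replacement, Equations (3.24)–(3.26) (the data values are our own illustration):

```python
def failure_terminated_no_replacement(N, failure_times):
    """Point estimates (3.24)-(3.26) for a failure-terminated test without
    replacement: N items on test, terminated at the r-th failure."""
    r = len(failure_times)
    T_r = max(failure_times)                  # time of the r-th failure
    T = (N - r) * T_r + sum(failure_times)    # total time on test (3.24)
    lam_hat = r / T                           # failure rate estimate (3.25)
    mttf_hat = T / r                          # mean-time-to-failure estimate (3.26)
    return lam_hat, mttf_hat

# 10 items on test, stopped at the 4th failure:
lam_hat, mttf_hat = failure_terminated_no_replacement(
    10, [100.0, 250.0, 400.0, 500.0])
```

Here T = 6 · 500 + 1250 = 4250 h, so the four failures give a rate estimate of 4/4250 h⁻¹.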
Epstein (1960) considered the failure-terminated test and showed that if the time to failure is exponentially distributed with parameter $\lambda$, the variable $\frac{2r\lambda}{\hat{\lambda}} = 2T\lambda$ has the $\chi^2$ distribution with 2r degrees of freedom. Therefore, one can write

$$\Pr\left\{ \chi^2_{\alpha/2;\,2r} \le \frac{2r\lambda}{\hat{\lambda}} \le \chi^2_{1-\alpha/2;\,2r} \right\} = 1 - \alpha. \quad (3.28)$$

Taking into account $\hat{\lambda} = \frac{r}{T}$, after rearranging one will have a two-sided confidence interval for the true value of $\lambda$:

$$\Pr\left\{ \frac{1}{2T} \chi^2_{\alpha/2;\,2r} \le \lambda \le \frac{1}{2T} \chi^2_{1-\alpha/2;\,2r} \right\} = 1 - \alpha. \quad (3.29)$$
So one can obtain the upper confidence limit or the one-sided confidence interval

$$\Pr\left\{ \lambda \le \frac{1}{2T} \chi^2_{1-\alpha;\,2r} \right\} = 1 - \alpha. \quad (3.30)$$
For the time-terminated test the exact confidence limits are not available. For
this case the approximate two-sided confidence interval for the failure rate, , was
obtained as
$$\Pr\left\{ \frac{1}{2T} \chi^2_{\alpha/2;\,2r} \le \lambda \le \frac{1}{2T} \chi^2_{1-\alpha/2;\,2r+2} \right\} = 1 - \alpha, \quad (3.31)$$

and the corresponding one-sided interval is

$$\Pr\left\{ \lambda \le \frac{1}{2T} \chi^2_{1-\alpha;\,2r+2} \right\} = 1 - \alpha. \quad (3.32)$$
Actually, a binary-state system is the simplest case of a MSS having two distinctive states (perfect functioning and complete failure). Point estimation for transition intensities of two-state (binary) Markov models was briefly considered in the
previous subsections. But until now there have been almost no investigations of this problem in a multi-state context, despite the fact that it is a pressing practical problem. For example, in the field of power system reliability assessment
it has been recognized (Billinton and Allan 1996) that modeling large generating
units in generating capacity adequacy assessment by simple two-state models can
yield pessimistic appraisals. In order to assess unit reliability more accurately,
many utilities now use multi-state models instead of two-state representations. In
these models steady-state probabilities of a unit residing at different generating
capacity levels are used. Usually a steady-state probability of a unit residing at a
specified capacity level is simply defined as the part of the operation time when
the unit is at this capacity level. When the short-term behavior of MSSs is studied,
the investigation cannot be based on steady-state (long-term) probabilities. The
investigation should use a general MSS model, where the transition intensities between any states of the model are known. The problem is to estimate these transition intensities from actual MSS failure (output performance derating) and repair statistics, which are represented by the observed realization of an output performance stochastic process. Below we shall present the corresponding technique for point and interval estimation of transition intensities via output performance observation. The technique was first presented in Lisnianski (2008).
3.3 Estimation of Transition Intensities via Output Performance Observations 133
A general Markov model of a MSS with minor and major failures and repairs
(Lisnianski and Levitin 2003) is presented in Figure 3.1.
There are N states in the model, where each state $i \in [1, ..., N]$ has its own assigned performance level $g_i$. Usually state N is associated with the nominal performance level and state 1 is associated with complete system failure, while all other states $i \in [2, ..., N-1]$ are associated with the corresponding reduced performance levels $g_i$. The transition intensity from state i to state j is designated as $a_{ij}$.
[Figure 3.1 State-transition diagram of the general Markov MSS model: states 1, 2, ..., N−1, N connected by transition intensities $a_{ij}$ (e.g., $a_{N-1,N}$, $a_{N,N-1}$, $a_{1,N}$, $a_{N,1}$, $a_{2,3}$, $a_{3,2}$).]
As a result, MSS output performance is known for any time instant t [0, T ] ,
where T is the total observation time, as well as the corresponding time instants of
MSS transitions from any performance level gi to level g j , i, j [1,..., N ] . The
example of a single realization of such a stochastic process is presented in Figure
3.2.
By its nature, stochastic process $G_A(t)$ is a discrete-state continuous-time process. For this stochastic process the following designations are introduced:
$T_i^{(m)}$ — the system's sojourn time during its mth residence in state i within observation time T;
$k_i$ — the accumulated number of system entrances into state i (or accumulated number of system exits from state i to any other state) during observation time T.
Thus, the reliability data for a MSS that can be derived from observation of the output performance stochastic process during time T are the following. For each state i the following are known:
1. the sample $\{T_i^{(1)}, T_i^{(2)}, ..., T_i^{(k_i)}\}$ of system sojourn times in state i during observation time T;
2. the number $k_{ij}$ of system transitions from state i to any possible state j during observation time T; and
3. the number $k_i$ of system residences in state i (or number of system exits from state i to any other possible state) during observation time T.
The problem is to estimate the transition intensities $a_{ij}$, $i, j \in [1, ..., N]$, based on a single realization of the discrete-state continuous-time stochastic process $G_A(t)$ that was observed during time T.
For the Markov model, the random time $T_{ij}$ to transition from state i to state j is exponentially distributed:

$$F_{ij}(t) = 1 - e^{-a_{ij} t}. \quad (3.33)$$

The process can be described by the kernel matrix of one-step transition probabilities

$$\mathbf{Q}(t) = \left[ Q_{ij}(t) \right]. \quad (3.34)$$
These one-step probabilities of the kernel matrix may be defined in the following way (Lisnianski and Jeager 2000). Each probability $Q_{ik}(t)$ defines the probability that the random variable $T_{ik}$ will be minimal among all other random variables $T_{ij}$, $j \ne i$, $j \ne k$, $j = 1, ..., N$, which define all possible transitions from state i to all other states. Thus, for each $k \ne i$ one will have

$$Q_{ik}(t) = \Pr\{ T_{ik} \le t,\ T_{ik} < T_{ij},\ j \ne i,\ j \ne k \}. \quad (3.35)$$
Based on (3.35) one can obtain the one-step probability $Q_{ik}(t)$ as the probability that, under the condition $T_{ik} \le t$, the random variable $T_{ik}$ will be less than all other variables $T_{ij}$, $j \ne i$, $j \ne k$, $j = 1, ..., N$.
Hence, for each $i = 1, 2, ..., N$ and $k \ne i$ the following expression can be written:

$$Q_{ik}(t) = \Pr\left\{ (T_{ik} \le t)\ \&\ (T_{i1} > T_{ik})\ \&\ \cdots\ \&\ (T_{i,k-1} > T_{ik})\ \&\ (T_{i,k+1} > T_{ik})\ \&\ \cdots\ \&\ (T_{iN} > T_{ik}) \right\}$$
$$= \int_0^t \left[ 1 - F_{i1}(u) \right] \cdots \left[ 1 - F_{i,k-1}(u) \right] \left[ 1 - F_{i,k+1}(u) \right] \cdots \left[ 1 - F_{iN}(u) \right] dF_{ik}(u). \quad (3.36)$$
By using (3.36) and taking into account expression (3.33), one obtains

$$Q_{ik}(t) = \frac{a_{ik}}{\sum_{j \ne i} a_{ij}} \left( 1 - e^{-\sum_{j \ne i} a_{ij}\, t} \right). \quad (3.37)$$
So, for a Markov model of a MSS, the unconditional sojourn time $T_i$ is an exponentially distributed random variable with mean
$$T_i^{mean} = \frac{1}{\sum_{j \ne i} a_{ij}} = \frac{1}{A}, \quad (3.39)$$

where $A = \sum_{j \ne i} a_{ij}$.
On the other hand, based on the observed sojourn times, the mean sojourn time in state i can be estimated as

$$\hat{T}_i^{mean} = \frac{\sum_{j=1}^{k_i} T_i^{(j)}}{k_i}. \quad (3.40)$$
Based on (3.39) and (3.40) one can write the following expression for estimating the sum A of intensities of all transitions that exit from state i:

$$\hat{A} = \frac{1}{\hat{T}_i^{mean}} = \frac{k_i}{\sum_{j=1}^{k_i} T_i^{(j)}}. \quad (3.41)$$
By using expression (3.41) one can estimate only the sum of intensities for all
transitions that exit from any state i. To estimate individual transition intensities,
an additional expression can be obtained in the following way.
Based on the kernel matrix Q(t) for stochastic process $G_A(t)$, one can obtain the one-step transition probabilities for the embedded Markov chain:
$$\pi_{ik} = \lim_{t \to \infty} Q_{ik}(t) = \lim_{t \to \infty} \frac{a_{ik}}{\sum_{j \ne i} a_{ij}} \left( 1 - e^{-\sum_{j \ne i} a_{ij}\, t} \right) = \frac{a_{ik}}{\sum_{j \ne i} a_{ij}} \quad (3.43)$$
or
$$a_{ik} = \pi_{ik} \sum_{j \ne i} a_{ij}. \quad (3.44)$$

The one-step probability $\pi_{ik}$ of the embedded Markov chain can be estimated as

$$\hat{\pi}_{ik} = \frac{k_{ik}}{k_i}. \quad (3.45)$$
Substituting estimates (3.41) and (3.45) into expression (3.44), the following estimate will be obtained for the transition intensity:

$$\hat{a}_{ik} = \hat{\pi}_{ik} \hat{A} = \frac{k_{ik}}{k_i} \cdot \frac{1}{\hat{T}_i^{mean}} = \frac{k_{ik}}{\sum_{j=1}^{k_i} T_i^{(j)}} = \frac{k_{ik}}{T_{\Sigma i}}, \quad i, k \in [1, ..., N],\ i \ne k, \quad (3.46)$$

where $T_{\Sigma i}$ is the system's accumulated time of residence in state i during the total observation time T.
For a Markov MSS with N states the sum $\sum_{j=1}^{N} a_{ij} = 0$; therefore,

$$a_{ii} = -\sum_{\substack{j=1 \\ j \ne i}}^{N} a_{ij}. \quad (3.47)$$
Based on the method described in the previous subsection, the following algorithm
for data processing is suggested for multi-state Markov systems with N possible
states.
1. Calculate the accumulated time of the system's residence in state i during the total observation time T:
$$T_{\Sigma i} = \sum_{m=1}^{k_i} T_i^{(m)}.$$

2. Estimate the transition intensity $a_{ij}$ from state i to each state $j \ne i$ using the following expression:

$$\hat{a}_{ij} = \frac{k_{ij}}{T_{\Sigma i}}.$$

3. Calculate the diagonal elements:

$$\hat{a}_{ii} = -\sum_{\substack{j=1 \\ j \ne i}}^{N} \hat{a}_{ij}.$$
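The three-step algorithm maps directly into code; a compact sketch (the input layout and function name are our own choices):

```python
def estimate_intensities(sojourn_times, transition_counts):
    """Estimate the transition-intensity matrix of a Markov MSS.

    sojourn_times[i]        -- list of observed sojourn times T_i^(m) in state i
    transition_counts[i][j] -- number k_ij of observed i -> j transitions
    Returns a_hat with a_hat[i][j] = k_ij / T_sigma_i for j != i, and
    diagonal entries equal to minus the off-diagonal row sum.
    """
    N = len(sojourn_times)
    T_sigma = [sum(times) for times in sojourn_times]           # step 1
    a_hat = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if j != i:
                a_hat[i][j] = transition_counts[i][j] / T_sigma[i]   # step 2
        a_hat[i][i] = -sum(a_hat[i][j] for j in range(N) if j != i)  # step 3
    return a_hat

# Aggregated data as in Table 3.1 (sojourn times lumped per state):
a_hat = estimate_intensities(
    [[480.0], [742.0], [511.0], [7027.0]],
    [[0, 0, 0, 31], [18, 0, 0, 64], [11, 0, 0, 50], [20, 43, 58, 0]])
```

With the diesel-generator data of Table 3.1 this reproduces, for instance, $\hat{a}_{14} = 31/480 \approx 0.065$ h⁻¹, and every row of the resulting matrix sums to zero.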
MSS output performance was observed during time T; therefore, in this case we are dealing with a time-terminated test. Thus, based on expression (3.31) described in Section 3.2.3, the following two-sided confidence interval for the true value of $a_{ij}$ can be written:
$$\Pr\left\{ \frac{1}{2 T_{\Sigma i}} \chi^2_{\alpha/2;\,2k_{ij}} \le a_{ij} \le \frac{1}{2 T_{\Sigma i}} \chi^2_{1-\alpha/2;\,2k_{ij}+2} \right\} = 1 - \alpha. \quad (3.48)$$

The corresponding one-sided confidence interval is

$$\Pr\left\{ a_{ij} \le \frac{1}{2 T_{\Sigma i}} \chi^2_{1-\alpha;\,2k_{ij}+2} \right\} = 1 - \alpha. \quad (3.49)$$
Table 3.1 Observed numbers of transitions $k_{ij}$ from state i (rows) to state j (columns)

State number    1     2     3     4
1               –     0     0     31
2               18    –     0     64
3               11    0     –     50
4               20    43    58    –
Find the point and interval estimations of transition intensities for a four-state
Markov model of the diesel generator.
Solution.
1. According to the given data, the accumulated times of the system's residence in states $i = 1, ..., 4$ during the total observation time are as follows:

$$T_{\Sigma 1} = 480\ \text{h}, \quad T_{\Sigma 2} = 742\ \text{h}, \quad T_{\Sigma 3} = 511\ \text{h}, \quad T_{\Sigma 4} = 7027\ \text{h}.$$
2. Transition intensities should be estimated using the following expression:

$$\hat{a}_{ij} = \frac{k_{ij}}{T_{\Sigma i}}, \quad i \ne j.$$
Therefore, based on the given $k_{ij}$ in Table 3.1, we obtain the following point estimates:
$$\hat{a}_{12} = \frac{0}{480} = 0, \quad \hat{a}_{13} = \frac{0}{480} = 0, \quad \hat{a}_{14} = \frac{31}{480} = 0.065\ \text{h}^{-1},$$
$$\hat{a}_{21} = \frac{18}{742} = 0.024\ \text{h}^{-1}, \quad \hat{a}_{23} = \frac{0}{742} = 0, \quad \hat{a}_{24} = \frac{64}{742} = 0.086\ \text{h}^{-1},$$
$$\hat{a}_{31} = \frac{11}{511} = 0.022\ \text{h}^{-1}, \quad \hat{a}_{32} = \frac{0}{511} = 0, \quad \hat{a}_{34} = \frac{50}{511} = 0.098\ \text{h}^{-1},$$
$$\hat{a}_{41} = \frac{20}{7027} = 0.003\ \text{h}^{-1}, \quad \hat{a}_{42} = \frac{43}{7027} = 0.006\ \text{h}^{-1}, \quad \hat{a}_{43} = \frac{58}{7027} = 0.008\ \text{h}^{-1}.$$
3. Calculate the diagonal elements:

$$\hat{a}_{ii} = -\sum_{\substack{j=1 \\ j \ne i}}^{4} \hat{a}_{ij}, \quad i = 1, ..., 4.$$
4. As a result, using the presented algorithm, the following matrix of point estimates of transition intensities was computed:

$$\hat{a}_{ij} = \begin{bmatrix} -0.065 & 0 & 0 & 0.065 \\ 0.024 & -0.110 & 0 & 0.086 \\ 0.022 & 0 & -0.120 & 0.098 \\ 0.003 & 0.006 & 0.008 & -0.017 \end{bmatrix}.$$
5. Now, based on expression (3.48), the two-sided confidence intervals for the true values of transition intensities can be obtained.
For example, to calculate the two-sided confidence interval for $a_{14}$ we have $k_{14} = 31$ and $T_{\Sigma 1} = 480$ h; therefore, for $\alpha = 0.1$, by using (3.48) one obtains
$$\Pr\left\{ \frac{1}{2 \cdot 480}\, \chi^2_{0.05;\,2 \cdot 31} \le a_{14} \le \frac{1}{2 \cdot 480}\, \chi^2_{0.95;\,2 \cdot 31 + 2} \right\} = 1 - 0.1 = 0.9.$$
This means that the true value of $a_{14}$ lies within the interval [0.047, 0.087] with probability 0.9. All other confidence intervals can be found in the same way, and readers can do this themselves as an exercise.
References
Ayyub B, McCuen R (2003) Probability, statistics and reliability for engineers and scientists. Chapman & Hall/CRC, London, New York
Bickel P, Doksum K (2007) Mathematical statistics. Pearson Prentice Hall, New Jersey
Billinton R, Allan R (1996) Reliability evaluation of power systems. Plenum, New York
Epstein B (1960) Estimation from life test data. Technometrics 2:447–454
Fisher R (1925) Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22:700–725
Fisher R (1934) Two new properties of mathematical likelihood. Proceedings of the Royal Society A 144:285–307
Gertsbakh I (2000) Reliability theory with application to preventive maintenance. Springer, London
Hines W, Montgomery D (1997) Probability and statistics in engineering and management science. Wiley, New York
International Standard IEC 60605-4 (2001) Procedures for determining point estimates and confidence limits for equipment reliability determination tests. International Electrotechnical Commission, Geneva, Switzerland
Korolyuk V, Swishchuk A (1995) Semi-Markov random evolutions. Kluwer, Dordrecht
Lawless J (2002) Statistical models and methods for lifetime data. Wiley, New York
Lehmann E, Casella G (2003) Theory of point estimation. Springer-Verlag, New York
Limnios N, Oprisan G (2000) Semi-Markov processes and reliability. Birkhäuser, Boston
Lisnianski A (2008) Point estimation of the transition intensities for a Markov multi-state system via output performance observation. In: Bedford T et al (eds) Advances in mathematical modeling for reliability. IOS, Amsterdam
Lisnianski A, Jeager A (2000) Time-redundant system reliability under randomly constrained time resources. Reliab Eng Syst Saf 70:157–166
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Meeker W, Escobar L (1998) Statistical methods for reliability data. Wiley, New York
Modarres M, Kaminskiy M, Krivtsov V (1999) Reliability engineering and risk analysis: a practical guide. Dekker, New York
Neyman J (1935) On the problem of confidence intervals. Ann Math Stat 6:111–116
4 Universal Generating Function Method
In recent years a specific approach called the universal generating function (UGF)
technique has been widely applied to MSS reliability analysis. The UGF technique
allows one to find the entire MSS performance distribution based on the performance distributions of its elements using algebraic procedures. This technique
(sometimes also called the method of generalized generating sequences) (Gnedenko and Ushakov 1996) generalizes the technique based on the well-known ordinary generating function. The basic ideas of the method were first introduced by I. Ushakov in the mid-1980s (Ushakov 1986, 1987). Then the method
was described in a book by Reinshke and Ushakov (1988), where one chapter was
devoted to UGF. (Unfortunately, this book was published only in German and
Russian and so remained unknown for English speakers.) Wide application of the
method to MSS reliability analysis began in the mid-1990s, when the first application was reported (Lisnianski et al. 1994) and two corresponding papers (Lisnianski et al. 1996; Levitin et al. 1998) were published. Since then, the method has been considerably expanded in numerous research papers and in the books by Lisnianski and Levitin (2003) and Levitin (2005).
Here we present the mathematical fundamentals of the method and illustrate the theory with corresponding examples in order to provide readers with the basic knowledge that is necessary for understanding the next chapters.
The UGF approach is based on intuitively simple recursive procedures and provides a systematic method for system state enumeration that can replace extremely complicated combinatorial algorithms. It is very convenient for a computerized realization of the different enumeration problems that often arise in MSS reliability analysis and optimization.
Generally, the UGF approach allows one to obtain the system's output performance distribution based on the given performance distributions of the system's elements and the system structure function. In many real-world problems this can be done by using simple algebraic operations and does not require great computational resources. The computational burden is an especially crucial factor when
one solves MSS reliability analysis and optimization problems where the performance measures have to be evaluated for a great number of possible solutions during the search procedure. This makes using traditional methods in MSS reliability analysis and optimization problematic. In contrast, the UGF technique is fast enough to be implemented in such problems and has proved to be very effective.
The UGF approach is universal enough that an analyst can use the same procedures for systems with a different physical nature of performance and different types of element interaction.
The UGF technique is based on an approach that is closely connected to the generating functions that are widely used in probability theory. Therefore, we consider these functions first.
Consider a discrete random variable X that can take values $k = 0, 1, 2, ...$ and has the following distribution (probability mass function):

$$\Pr\{X = k\} = p_k, \quad k = 0, 1, 2, .... \quad (4.1)$$

The generating function of this distribution is defined as

$$X(z) = \sum_{k=0}^{\infty} p_k z^k. \quad (4.2)$$

Example 4.1 Suppose that X is distributed according to the Poisson distribution

$$\Pr\{X = k\} = p_k = \frac{a^k}{k!} e^{-a}, \quad k = 0, 1, 2, ....$$

Then its generating function is

$$X(z) = \sum_{k=0}^{\infty} p_k z^k = \sum_{k=0}^{\infty} \frac{a^k}{k!} e^{-a} z^k = e^{-a} \sum_{k=0}^{\infty} \frac{(az)^k}{k!} = e^{-a} e^{az} = e^{a(z-1)}.$$
The generating function is very convenient when one deals with the summation of discrete random variables. In order to explain this fact we consider the following example.
Example 4.2 Suppose we have two discrete random variables X and Y with the following distributions (pmf):

k          0     1     2     3
Pr{X=k}    0.5   0.3   0.2   0
Pr{Y=k}    0     0.6   0     0.4

Find the distribution of the random variable Z = X + Y. Enumerating the possible values of Z directly:
Z = 1, if X = 0 and Y = 1; then Pr{Z = 1} = 0.5 · 0.6 = 0.30.
Z = 2, if X = 1 and Y = 1; then Pr{Z = 2} = 0.3 · 0.6 = 0.18.
Z = 3, if X = 2, Y = 1, or X = 0, Y = 3; then
Pr{Z = 3} = Pr{X = 2}Pr{Y = 1} + Pr{X = 0}Pr{Y = 3} = 0.2 · 0.6 + 0.5 · 0.4 = 0.32.
Z = 4, if X = 1 and Y = 3; then Pr{Z = 4} = 0.3 · 0.4 = 0.12.
Z = 5, if X = 2 and Y = 3; then Pr{Z = 5} = 0.2 · 0.4 = 0.08.
Thus, the distribution of Z is

k          1      2      3      4      5
Pr{Z=k}    0.30   0.18   0.32   0.12   0.08
Note that in order to find the Z distribution directly one should analyze all possible combinations of X and Y values. In more complex cases this may be very time-consuming work. Using generating functions can prevent such difficulties.
The second way to solve the problem is based on generating functions. Let X(z) and Y(z) be the generating functions of the respective distributions of random variables X and Y. Then, according to (4.2), we can write

$$X(z) = 0.5 + 0.3z + 0.2z^2, \qquad Y(z) = 0.6z + 0.4z^3,$$

and the generating function of Z = X + Y is their product:

$$Z(z) = X(z) Y(z) = 0.30z + 0.18z^2 + 0.32z^3 + 0.12z^4 + 0.08z^5,$$

whose coefficients reproduce the distribution of Z found above.
The cumulative probabilities can also be obtained directly from the coefficients of the generating function:

$$\Pr\{X \le k\} = \sum_{i=0}^{k} p_i. \quad (4.3)$$

This means that in order to find $\Pr\{X \le k\}$, only those coefficients of the powers of z in the generating function of random variable X that are less than or equal to k should be summed.
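The generating-function computation in Example 4.2 amounts to multiplying two coefficient arrays; a short sketch of ours:

```python
def gf_multiply(p, q):
    """Multiply two generating functions given as coefficient lists:
    p[k] = Pr{X = k}, q[k] = Pr{Y = k}.  The result holds Pr{X + Y = k}."""
    result = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj   # powers of z add, probabilities multiply
    return result

# Example 4.2: X(z) = 0.5 + 0.3z + 0.2z^2, Y(z) = 0.6z + 0.4z^3
pz = gf_multiply([0.5, 0.3, 0.2], [0.0, 0.6, 0.0, 0.4])
# pz[3] recovers Pr{Z = 3} = 0.32
```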
Furthermore, it is clear that

$$\frac{d}{dz} X(z) \Big|_{z=1} = \sum_{k=1}^{\infty} k p_k z^{k-1} \Big|_{z=1} = \sum_{k=1}^{\infty} k p_k = E\{X\}. \quad (4.4)$$

Differentiating once more,

$$\frac{d^2}{dz^2} X(z) = \sum_{k=0}^{\infty} k(k-1) p_k z^{k-2}. \quad (4.5)$$
At z = 1 this gives

$$\frac{d^2}{dz^2} X(z) \Big|_{z=1} = \sum_{k=0}^{\infty} k(k-1) p_k = \sum_{k=0}^{\infty} k^2 p_k - \sum_{k=0}^{\infty} k p_k. \quad (4.6)$$
The first sum in the last expression (4.6) is the second initial moment $\alpha_2[X]$ of random variable X, and the second sum is the expectation of random variable X. Therefore, based on the generating function of random variable X one can obtain an expression for the second initial moment $\alpha_2[X]$:

$$\alpha_2[X] = \frac{d^2}{dz^2} X(z) \Big|_{z=1} + \frac{d}{dz} X(z) \Big|_{z=1}. \quad (4.7)$$
This means that the second initial moment of the random variable can be expressed via the sum of the second and first derivatives of the generating function at z = 1.
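For a finite distribution, (4.4)–(4.7) can be checked by differentiating the coefficient polynomial directly; a small sketch (our own illustration, reusing the Z distribution of Example 4.2):

```python
def gf_derivative(p):
    """Coefficients of the derivative of X(z) = sum_k p[k] * z^k."""
    return [k * p[k] for k in range(1, len(p))]

def moments_via_gf(p):
    """E{X} and the second initial moment alpha_2[X] from (4.4) and (4.7):
    evaluate the first and second derivatives of X(z) at z = 1."""
    d1 = gf_derivative(p)
    d2 = gf_derivative(d1)
    mean = sum(d1)              # X'(1)
    alpha2 = sum(d2) + mean     # X''(1) + X'(1)
    return mean, alpha2

# Distribution of Z from Example 4.2:
mean, alpha2 = moments_via_gf([0.0, 0.30, 0.18, 0.32, 0.12, 0.08])
```

As a cross-check, the mean equals E{X} + E{Y} = 0.7 + 1.8 = 2.5 from the two original distributions.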
Example 4.3 Suppose that discrete random variable X is distributed according to
the Poisson distribution
$$\Pr\{X = k\} = p_k = \frac{a^k}{k!} e^{-a}, \quad k = 0, 1, 2, ....$$
Find the expectation E{X} of random variable X using its generating function.
Solution. In Example 4.1 the generating function of random variable X (distributed
according to a Poisson distribution) was found to be
$$X(z) = e^{a(z-1)}.$$

Differentiating,

$$\frac{d}{dz} X(z) = \frac{d}{dz} e^{a(z-1)} = a e^{a(z-1)},$$

so that

$$E\{X\} = \frac{d}{dz} X(z) \Big|_{z=1} = a e^{a(z-1)} \Big|_{z=1} = a.$$
The moment generating function of a random variable X is defined as

$$\psi(s) = E[e^{sX}] = \begin{cases} \displaystyle\sum_{x} e^{sx} p_x, & \text{if } X \text{ is discrete}, \\[2mm] \displaystyle\int_{-\infty}^{+\infty} e^{sx} f(x)\,dx, & \text{if } X \text{ is continuous}. \end{cases} \quad (4.8)$$
4.1 Mathematical Fundamentals 149
Differentiating with respect to s,

$$\frac{d}{ds} \psi(s) = \frac{d}{ds} E[e^{sX}] = E\left[ \frac{d}{ds} e^{sX} \right] = E[X e^{sX}], \quad (4.9)$$

which at s = 0 gives E[X]. Similarly,

$$\frac{d^2}{ds^2} \psi(s) = \frac{d}{ds} \frac{d}{ds} \psi(s) = \frac{d}{ds} E[X e^{sX}] = E\left[ \frac{d}{ds} \left( X e^{sX} \right) \right] = E[X^2 e^{sX}], \quad (4.10)$$

which at s = 0 gives $E[X^2]$. In general,

$$\frac{d^n \psi}{ds^n}(0) = E[X^n], \quad n \ge 1. \quad (4.11)$$
The z-transform of a discrete random variable X is defined as

$$X(z) = E[z^X] = \sum_{x} p_x z^x. \quad (4.12)$$
Example 4.4 Suppose that a discrete random variable X takes the values 0, 1.65, and 2.3. Find the moment generating function and z-transform for random variable X.
Solution. In accordance with definition (4.8), the moment generating function of random variable X is the sum of the terms $p_i e^{s x_i}$ over its three possible values $x_i \in \{0, 1.65, 2.3\}$; in accordance with definition (4.12), the z-transform of random variable X is the corresponding sum of the terms $p_i z^{x_i}$.
For independent random variables X and Y, the z-transform of their sum is the product of their z-transforms:

$$\varphi_{X+Y}(z) = E[z^{X+Y}] = E[z^X z^Y] = E[z^X] E[z^Y] = \varphi_X(z) \varphi_Y(z). \quad (4.13)$$
Assume that the pmfs of random variables X1 and X2 are represented by the
vectors
$$\mathbf{x}_1 = \{x_{11}, ..., x_{1k_1}\}, \quad \mathbf{p}_1 = \{p_{11}, ..., p_{1k_1}\} \quad (4.14)$$

and

$$\mathbf{x}_2 = \{x_{21}, ..., x_{2k_2}\}, \quad \mathbf{p}_2 = \{p_{21}, ..., p_{2k_2}\}, \quad (4.15)$$
respectively.
This means that discrete random variable $X_i$, $i = 1, 2$, can take values $\{x_{i1}, ..., x_{ik_i}\}$ with corresponding probabilities $\{p_{i1}, ..., p_{ik_i}\}$. Therefore, the z-transforms corresponding to the pmfs of random variables $X_i$ will be as follows:
$$X_i(z) = \sum_{j=1}^{k_i} p_{ij} z^{x_{ij}}.$$
Then, according to (4.13),

$$\varphi_{X_1 + X_2}(z) = X_1(z) X_2(z) = \sum_{i=1}^{k_1} p_{1i} z^{x_{1i}} \sum_{j=1}^{k_2} p_{2j} z^{x_{2j}} = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} p_{1i} p_{2j} z^{x_{1i}} z^{x_{2j}} = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} p_{1i} p_{2j} z^{(x_{1i} + x_{2j})}. \quad (4.16)$$
In the general case of n independent discrete random variables,

$$\varphi_{\sum_{j=1}^{n} X_j}(z) = \prod_{j=1}^{n} X_j(z) = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} \cdots \sum_{j_n=1}^{k_n} \left( p_{1j_1} p_{2j_2} \cdots p_{nj_n} \right) z^{(x_{1j_1} + x_{2j_2} + \cdots + x_{nj_n})}. \quad (4.17)$$
As an example, consider k independent Bernoulli trials $X_j$, each of which results in success ($X_j = 1$) with probability p or failure ($X_j = 0$) with probability $1-p$:

$$\Pr\{X_j = 1\} = p, \quad \Pr\{X_j = 0\} = 1 - p.$$

Find the z-transform for random variable $X = \sum_{i=1}^{k} X_i$ that represents the number of successes in the k trials.

The z-transform of each $X_j$ takes the form

$$X_j(z) = p z^1 + (1-p) z^0.$$

The random number of successes that occur in k trials is equal to the sum of the number of successes in each trial:

$$X = \sum_{j=1}^{k} X_j,$$

and therefore, according to (4.17),

$$X(z) = \prod_{j=1}^{k} X_j(z) = \left[ p z^1 + (1-p) z^0 \right]^k = \sum_{j=0}^{k} \binom{k}{j} p^j (1-p)^{k-j} z^j.$$

Thus, X takes the values $i = 0, 1, 2, ..., k$ with the binomial probabilities

$$p_i = \binom{k}{i} p^i (1-p)^{k-i}.$$
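The polynomial power above can be computed by repeated coefficient multiplication; a brief sketch of ours:

```python
def gf_power(p_coeffs, k):
    """Raise a z-transform (coefficient list) to the k-th power by repeated
    polynomial multiplication, giving the pmf of a sum of k independent copies."""
    result = [1.0]                      # z-transform of the constant 0
    for _ in range(k):
        new = [0.0] * (len(result) + len(p_coeffs) - 1)
        for i, a in enumerate(result):
            for j, b in enumerate(p_coeffs):
                new[i + j] += a * b     # powers add, probabilities multiply
        result = new
    return result

# Bernoulli trial with p = 0.3: X_j(z) = 0.7*z^0 + 0.3*z^1; k = 4 trials
pmf = gf_power([0.7, 0.3], 4)
# pmf[j] equals C(4, j) * 0.3**j * 0.7**(4 - j)
```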
i
Consider two independent discrete random variables $X_1$, $X_2$, and assume that each variable $X_i$, $i = 1, 2$, has a pmf represented by the vectors $\mathbf{x}_i = \{x_{i1}, ..., x_{ik_i}\}$ and $\mathbf{p}_i = \{p_{i1}, ..., p_{ik_i}\}$.
Now consider an arbitrary function of these two variables. The random variable $Y = f(X_1, X_2)$ takes

$$K = k_1 k_2 \quad (4.18)$$

possible values, with probabilities

$$q_j = \prod_{i=1}^{2} p_{ij_i} = p_{1j_1} p_{2j_2}, \quad j = 1, 2, ..., K, \quad (4.19)$$

and values

$$y_j = f(x_{1j_1}, x_{2j_2}), \quad j = 1, 2, ..., K. \quad (4.20)$$
Therefore, the z-transform of random variable Y can be written as

$$u_Y(z) = \sum_{j=1}^{K} q_j z^{y_j} = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} p_{1j_1} p_{2j_2} z^{f(x_{1j_1}, x_{2j_2})}. \quad (4.21)$$
If one compares Equation 4.21, where the z-transform for random variable $Y = f(X_1, X_2)$ was found, with expression (4.16), where the z-transform for random variable $Y = X_1 + X_2$ was found, one notices the following: instead of summing the powers of z corresponding to the values of variables $X_1$ and $X_2$, in order to find the corresponding powers of z one should calculate for them a value of the given function f. Therefore, one can see that in such an interpretation the z-transform is formally not a polynomial, because in an ordinary product of polynomials the corresponding powers of z are obtained by summing the z powers of the individual polynomials.
To define formally such an action as (4.21) over individual z-transforms, a universal generating operator (UGO) $\Omega_f$ was introduced. Application of this operator to the individual z-transforms of independent random variables $X_1$ and $X_2$ produces the z-transform of random variable $Y = f(X_1, X_2)$.
Let the functions
$u_{X_1}(z) = p_{11} z^{x_{11}} + p_{12} z^{x_{12}} + \ldots + p_{1k_1} z^{x_{1k_1}} = \sum_{i=1}^{k_1} p_{1i} z^{x_{1i}}$
and
$u_{X_2}(z) = p_{21} z^{x_{21}} + p_{22} z^{x_{22}} + \ldots + p_{2k_2} z^{x_{2k_2}} = \sum_{i=1}^{k_2} p_{2i} z^{x_{2i}}$
represent the pmfs of X1 and X2. Then
$\Omega_f\{u_{X_1}(z), u_{X_2}(z)\} = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} p_{1j_1} p_{2j_2} z^{f(x_{1j_1}, x_{2j_2})} = u_Y(z).$   (4.22)
For n independent random variables with z-transforms
$u_{X_j}(z) = \sum_{i=1}^{k_j} p_{ji} z^{x_{ji}}, \quad j = 1, 2, \ldots, n,$
the operator takes the form
$\Omega_f\{u_{X_1}(z), u_{X_2}(z), \ldots, u_{X_n}(z)\} = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} \cdots \sum_{j_n=1}^{k_n} (p_{1j_1} p_{2j_2} \cdots p_{nj_n}) z^{f(x_{1j_1}, x_{2j_2}, \ldots, x_{nj_n})}.$   (4.23)
One can see that such a definition (Definition 4.5) is very useful for MSS reliability evaluation. Each multi-state element j, j = 1, 2, ..., n, in the MSS can be represented by its individual z-transform $u_{X_j}(z)$ that characterizes the element's possible performance levels $x_{ji}$ and corresponding probabilities $p_{ji}$, where $i = 1, 2, \ldots, k_j$. The MSS, in its turn, is represented by the structure function $f(X_1, X_2, \ldots, X_n)$. In this case operator $\Omega_f$ produces the resulting z-transform of the MSS output performance or, in other words, determines the output performance levels $y_i$ and corresponding probabilities $p_i$, where $i = 1, 2, \ldots, K$. Here K is the total number of possible performance levels of the entire MSS and can be obtained as
$K = \prod_{j=1}^{n} k_j.$   (4.24)
Let the individual z-transforms
$u_{X_j}(z) = \sum_{i=1}^{k_j} p_{ji} z^{x_{ji}}, \quad j = 1, 2, \ldots, n,$
represent the pmfs of n random variables $X_j$, and let the function f represent the new random variable $Y = f(X_1, X_2, \ldots, X_n)$. These individual z-transforms are called universal generating functions (UGFs) if and only if a corresponding UGO $\Omega_f$ is defined for them. In other words, z-transforms become UGFs when a corresponding UGO $\Omega_f$ is defined over them.
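As an illustration of how the operator $\Omega_f$ differs from an ordinary polynomial product, the following Python sketch (our own illustration, with the hypothetical helper name compose) enumerates all combinations of realizations, multiplies the probabilities, and applies an arbitrary function f to obtain the powers of z; like terms are collected automatically because equal performance values share one dictionary key:

```python
from itertools import product
from math import isclose

def compose(f, *ugfs):
    """Universal generating operator (Equation 4.23): for every combination
    of realizations, multiply the probabilities and apply f to obtain the
    power of z; like terms are collected in the dictionary."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        prob = 1.0
        for _, p in combo:
            prob *= p
        out[y] = out.get(y, 0.0) + prob
    return out

u1 = {5: 0.6, 8: 0.4}
u2 = {8: 0.7, 10: 0.3}
u_min = compose(min, u1, u2)      # e.g. two flow transmission elements in series
assert isclose(u_min[5], 0.6) and isclose(u_min[8], 0.4)
```

Replacing min with any other structure function yields the z-transform of the corresponding composed variable.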
One can notice that in a computational sense, the introduction of the auxiliary
variable z permits us to separate the variables of interest: p and x. According to
156 4 Universal Generating Function Method
(4.22) and (4.23), the UGO determines different actions with probabilities p and
performance levels x. From this point of view the z-transform is only useful as a
visual presentation, not more. Based on an understanding of this fact, we introduce
a more general definition for UGO (Gnedenko and Ushakov 1995, Ushakov 1998,
2000).
Definition 4.7 Let two sequences A and B represent two pmfs of random variables
XA and XB:
$A = \{(p_{A1}, x_{A1}), (p_{A2}, x_{A2}), \ldots, (p_{Ak_A}, x_{Ak_A})\},$
$B = \{(p_{B1}, x_{B1}), (p_{B2}, x_{B2}), \ldots, (p_{Bk_B}, x_{Bk_B})\}.$
A UGO $\Omega_f$ operates on the pair of sequences A and B and produces a new sequence $C = \Omega_f\{A, B\}$ of pairs that represents a pmf of random variable $X_C = f(X_A, X_B)$ in the following manner: for each pair $(p_{Ai}, x_{Ai})$ and $(p_{Bj}, x_{Bj})$ the pair $(p_{Ai} p_{Bj}, f(x_{Ai}, x_{Bj}))$ should be computed.
As one can see, this definition is analogous to Definition 4.4, but it does not use
a z-transform at all.
Usually the resulting pairs of the obtained sequence C should be ordered in accordance with increasing values of their second components.
In addition, when two or more pairs in the newly obtained sequence C have the same value of their second components, all such pairs should be combined into a single pair. The first component of this single pair is the sum of the first components of the selected pairs, and the second component is equal to their common second component. This procedure is analogous to like-term collection in the resulting z-transform.
More formally we can write
$\Omega_f(A, B) = C$
or, since each component of sequence C is a pair of numbers, it can also be rewritten as
$\Omega_f(A, B) = \{\Omega_{fp}(A, B), \Omega_{fx}(A, B)\},$
where $\Omega_{fp}(A, B) = p_{Ai} p_{Bj}$ is a suboperator that operates on the first components of sequences A and B, and $\Omega_{fx}(A, B) = f(x_{Ai}, x_{Bj})$ is a suboperator that operates on the second components of sequences A and B.
Let n sequences $S_1, \ldots, S_n$ represent the pmfs of random variables $X_1, \ldots, X_n$:
$S_1 = \{(p_{X_1 1}, x_{X_1 1}), \ldots, (p_{X_1 k_1}, x_{X_1 k_1})\},$
$\ldots$
$S_n = \{(p_{X_n 1}, x_{X_n 1}), \ldots, (p_{X_n k_n}, x_{X_n k_n})\}.$
A UGO $\Omega_f$ operates on the set of sequences $S_1, \ldots, S_n$ and produces a new sequence $S = \Omega_f\{S_1, \ldots, S_n\}$ of pairs, which represents a pmf of random variable $Y = f(X_1, X_2, \ldots, X_n)$, in the following manner: for each possible combination of pairs
$(p_{X_1 j_1}, x_{X_1 j_1}), (p_{X_2 j_2}, x_{X_2 j_2}), \ldots, (p_{X_n j_n}, x_{X_n j_n}),$
$j_1 = 1, \ldots, k_1, \; j_2 = 1, \ldots, k_2, \; \ldots, \; j_n = 1, \ldots, k_n,$
the pair
$\left(p_{X_1 j_1} p_{X_2 j_2} \cdots p_{X_n j_n}, \; f(x_{X_1 j_1}, x_{X_2 j_2}, \ldots, x_{X_n j_n})\right)$   (4.25)
should be computed.
One can see that this definition, in a computational sense, is analogous to Definition 4.5, but it is not based on a z-transform. Therefore, it is clear that the UGO plays the central role and the z-transform serves only as a visual representation of the individual sequences Si and the resulting sequence S. This representation is convenient, and below we shall use such a z-transform representation for the pmfs of the discrete random variables that characterize the performance of an individual MSS's components and the entire MSS's output performance.
In addition, it should be noted that theoretically each sequence Si can be composed not only of pairs but, for example, of triplets:
$S_i = \{(p_{i1}, x_{i1}, v_{i1}), \ldots, (p_{ik_i}, x_{ik_i}, v_{ik_i})\}.$
In practice this corresponds to the case where performance is represented by a
vector. For example, an electrical generator can have different levels of generating
capacity (x) and energy-production costs (v) corresponding to each level. For such cases two different suboperators $\Omega_{fx}(S_1, S_2, \ldots, S_n)$ and $\Omega_{fv}(S_1, S_2, \ldots, S_n)$ for separate operations with x and v should be determined. For the z-transform representation this means that the powers of z may in general be vectors, not only scalars. This is the second reason why z-transforms in the UGF interpretation are not polynomials; the first reason was mentioned above: an operator defined over z-functions can differ from the operator of the polynomial product (unlike the ordinary z-transform, where only the product of polynomials is defined).
$u(z,t) = p_1(t) z^{g_1} + p_2(t) z^{g_2} + \ldots + p_K(t) z^{g_K}$   (4.27)
4.2 Universal Generating Function Technique 159
$U(z) = \Omega_f\left(u_1(z), u_2(z), \ldots, u_n(z)\right).$   (4.28)
Recall that in the MSS reliability interpretation the coefficients of the terms in the u-function usually represent the probabilities of states, and the corresponding performance levels are encoded by the exponents of these terms.
Straightforward computation of the pmf of the function $f(X_1, \ldots, X_n)$ using (4.23) is based on an enumerative approach, which is extremely resource consuming. Indeed, the resulting u-function U(z) associated with the structure function $f(X_1, \ldots, X_n)$ contains K terms, which requires excessive storage space. In order to obtain U(z) one has to perform $(n-1)K$ procedures of probability multiplication and K procedures of function evaluation. Fortunately, there are two effective ways to reduce the computational burden: like-term collection and a recursive procedure.
The u-functions inherit an essential property of regular polynomials: they allow for collecting like terms. Indeed, if a u-function representing the pmf of a random variable X contains the terms $p_h z^{x_h}$ and $p_m z^{x_m}$ for which $x_h = x_m$, the two terms can be replaced by the single term $(p_h + p_m) z^{x_m}$, since in this case $\Pr\{X = x_h\} = p_h + p_m$.
Example 4.6 Find the pmf of the random variable
$Y = f(X_1, \ldots, X_5) = \left(\max(X_1, X_2) + \min(X_3, X_4)\right) X_5,$
a function of five independent random variables X1, ..., X5. The probability mass functions of these variables are determined by the pairs of vectors $\mathbf{x}_i$, $\mathbf{p}_i$ ($1 \le i \le 5$):
$\{(5, 8, 12), (0.6, 0.3, 0.1)\}, \{(8, 10), (0.7, 0.3)\}, \{(0, 2), (0.6, 0.4)\},$
$\{(0, 3, 5), (0.1, 0.5, 0.4)\}, \{(1, 1.5), (0.5, 0.5)\}.$
Using the straightforward approach one can obtain the pmf of random variable Y by applying operator $\Omega_f$ (4.23) over these u-functions. Since k1 = 3, k2 = 2, k3 = 2, k4 = 3, and k5 = 2, the total number of term multiplication procedures that one has to perform using this equation is 3·2·2·3·2 = 72.
In order to demonstrate the recursive approach we introduce three auxiliary random variables X6, X7, and X8: X6 = max{X1, X2}, X7 = min{X3, X4}, X8 = X6 + X7, so that Y = X8·X5.
We can obtain the pmf of variable Y using composition operators over pairs of u-functions as follows:
$u_6(z) = \Omega_{\max}\{u_1(z), u_2(z)\} = \Omega_{\max}\{(0.6z^5 + 0.3z^8 + 0.1z^{12}), (0.7z^8 + 0.3z^{10})\}$
$= 0.42z^{\max\{5,8\}} + 0.21z^{\max\{8,8\}} + 0.07z^{\max\{12,8\}} + 0.18z^{\max\{5,10\}} + 0.09z^{\max\{8,10\}} + 0.03z^{\max\{12,10\}}$
$= 0.63z^8 + 0.27z^{10} + 0.1z^{12};$
$u_7(z) = \Omega_{\min}\{u_3(z), u_4(z)\} = \Omega_{\min}\{(0.6z^0 + 0.4z^2), (0.1z^0 + 0.5z^3 + 0.4z^5)\}$
$= 0.06z^{\min\{0,0\}} + 0.04z^{\min\{2,0\}} + 0.3z^{\min\{0,3\}} + 0.2z^{\min\{2,3\}} + 0.24z^{\min\{0,5\}} + 0.16z^{\min\{2,5\}}$
$= 0.64z^0 + 0.36z^2;$
$u_8(z) = \Omega_{+}\{u_6(z), u_7(z)\} = \Omega_{+}\{(0.63z^8 + 0.27z^{10} + 0.1z^{12}), (0.64z^0 + 0.36z^2)\}$
$= 0.4032z^{8+0} + 0.1728z^{10+0} + 0.064z^{12+0} + 0.2268z^{8+2} + 0.0972z^{10+2} + 0.036z^{12+2}$
$= 0.4032z^8 + 0.3996z^{10} + 0.1612z^{12} + 0.036z^{14};$
$U(z) = \Omega_{\times}\{u_8(z), u_5(z)\}$
$= \Omega_{\times}\{(0.4032z^8 + 0.3996z^{10} + 0.1612z^{12} + 0.036z^{14}), (0.5z^1 + 0.5z^{1.5})\}$
$= 0.2016z^{8 \cdot 1} + 0.1998z^{10 \cdot 1} + 0.0806z^{12 \cdot 1} + 0.018z^{14 \cdot 1} + 0.2016z^{8 \cdot 1.5} + 0.1998z^{10 \cdot 1.5} + 0.0806z^{12 \cdot 1.5} + 0.018z^{14 \cdot 1.5}$
$= 0.2016z^8 + 0.1998z^{10} + 0.2822z^{12} + 0.018z^{14} + 0.1998z^{15} + 0.0806z^{18} + 0.018z^{21}.$
The resulting u-function U(z) represents the pmf of Y, which takes the form $\mathbf{y} = \{8, 10, 12, 14, 15, 18, 21\}$, $\mathbf{q} = \{0.2016, 0.1998, 0.2822, 0.018, 0.1998, 0.0806, 0.018\}$.
Note that during the recursive computation of this pmf we used only 26 term multiplication procedures. This considerable reduction in computational complexity is achieved by combining the recursive approach with like-term collection in intermediate u-functions.
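Example 4.6 can be reproduced with the same dictionary representation of u-functions; the compose helper below is our own sketch of the operator $\Omega_f$, not code from the original text:

```python
from itertools import product
from math import isclose

def compose(f, *ugfs):
    """Universal generating operator over dictionaries {value: probability}."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        p = 1.0
        for _, q in combo:
            p *= q
        out[y] = out.get(y, 0.0) + p
    return out

u1 = {5: 0.6, 8: 0.3, 12: 0.1}
u2 = {8: 0.7, 10: 0.3}
u3 = {0: 0.6, 2: 0.4}
u4 = {0: 0.1, 3: 0.5, 5: 0.4}
u5 = {1: 0.5, 1.5: 0.5}

u6 = compose(max, u1, u2)                   # X6 = max{X1, X2}
u7 = compose(min, u3, u4)                   # X7 = min{X3, X4}
u8 = compose(lambda a, b: a + b, u6, u7)    # X8 = X6 + X7
U  = compose(lambda a, b: a * b, u8, u5)    # Y  = X8 * X5

expected = {8: 0.2016, 10: 0.1998, 12: 0.2822, 14: 0.018,
            15: 0.1998, 18: 0.0806, 21: 0.018}
assert all(isclose(U[y], q) for y, q in expected.items())
```

Because like terms are collected in every intermediate dictionary, the recursion performs far fewer multiplications than the 72 required by direct enumeration.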
Given the resulting UGF of the MSS output performance
$U(z,t) = \sum_{i=1}^{K} p_i(t) z^{g_i},$
one can obtain the system availability at instant t > 0 for an arbitrary constant demand w using the following operator $\delta_A$:
$A(t,w) = \delta_A(U(z,t), w) = \delta_A\left(\sum_{i=1}^{K} p_i(t) z^{g_i}, w\right) = \sum_{i=1}^{K} p_i(t)\,\mathbf{1}(F(g_i, w) \ge 0),$   (4.29)
where
$\mathbf{1}(F(g_i, w) \ge 0) = \begin{cases} 1, & \text{if } F(g_i, w) \ge 0, \\ 0, & \text{if } F(g_i, w) < 0. \end{cases}$
This means that for any time instant t > 0 the operator $\delta_A$ sums the probabilities of all acceptable states.
The MSS instantaneous expected output performance at instant t > 0 defined by (1.22) can be obtained for the given U(z,t) using the following $\delta_E$ operator:
$E(t) = \delta_E(U(z,t)) = \delta_E\left(\sum_{i=1}^{K} p_i(t) z^{g_i}\right) = \sum_{i=1}^{K} p_i(t) g_i.$   (4.30)
Equivalently, the expected performance can be computed as the first derivative of U(z,t) at z = 1:
$\delta_E\left(\sum_{i=1}^{K} p_i(t) z^{g_i}\right) = \left.\frac{dU(z,t)}{dz}\right|_{z=1} = \sum_{i=1}^{K} p_i(t) g_i.$   (4.31)
The conditional mean MSS performance [the mean performance of the MSS given that the system is in states for which $F(g_i, w) \ge 0$] defined by (1.25) can be obtained using the $\delta_{CE}$ operator:
$E^* = \delta_{CE}(U(z,t)) = \delta_{CE}\left(\sum_{i=1}^{K} p_i(t) z^{g_i}\right) = \sum_{i=1}^{K} p_i(t) g_i \mathbf{1}(F(g_i, w) \ge 0) \Big/ \sum_{i=1}^{K} p_i(t) \mathbf{1}(F(g_i, w) \ge 0).$   (4.32)
The average MSS expected output performance for a fixed time interval [0,T] is
defined according to (1.24) as follows:
$E_T = \frac{1}{T}\int_0^T E(t)\,dt = \frac{1}{T}\sum_{i=1}^{K} g_i \int_0^T p_i(t)\,dt.$   (4.33)
In order to obtain the mean instantaneous performance deficiency for the given U(z,t) and the constant demand w according to (1.30), the following $\delta_D$ operator should be used:
$D(t,w) = \delta_D(U(z,t), w) = \delta_D\left(\sum_{i=1}^{K} p_i(t) z^{g_i}, w\right) = \sum_{i=1}^{K} p_i(t)\max(w - g_i, 0).$   (4.34)
The average accumulated performance deficiency for a fixed time interval [0,T] is defined according to (1.31) as follows:
$D_T = \int_0^T D(t,w)\,dt = \sum_{i=1}^{K} \int_0^T \max(w - g_i, 0)\, p_i(t)\,dt.$   (4.35)
If the steady-state probabilities $p_i = \lim_{t \to \infty} p_i(t)$, $i = 1, \ldots, K$, exist, one can determine the MSS steady-state availability $A_\infty$, the mean steady-state performance $E_\infty$, and the mean steady-state performance deficiency $D_\infty$ by replacing pi(t) with pi in (4.29), (4.31), and (4.34), respectively.
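The three operators can be sketched directly on a steady-state u-function. In this minimal Python illustration (function names are ours) we assume the acceptability function F(g, w) = g − w used in the example that follows:

```python
from math import isclose

def availability(ugf, w):
    """delta_A: sum the probabilities of acceptable states, F(g, w) = g - w >= 0."""
    return sum(p for g, p in ugf.items() if g - w >= 0)

def expected_performance(ugf):
    """delta_E: mean performance, sum of p_i * g_i."""
    return sum(p * g for g, p in ugf.items())

def performance_deficiency(ugf, w):
    """delta_D: mean performance deficiency, sum of p_i * max(w - g_i, 0)."""
    return sum(p * max(w - g, 0) for g, p in ugf.items())

U = {0: 0.1, 20: 0.3, 40: 0.6}   # hypothetical steady-state output pmf
w = 15
assert isclose(availability(U, w), 0.9)
assert isclose(expected_performance(U), 0.3 * 20 + 0.6 * 40)
assert isclose(performance_deficiency(U, w), 0.1 * 15)
```

With time-dependent probabilities the same functions yield A(t, w), E(t), and D(t, w).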
Note that here we do not consider the application of the UGF approach to the evaluation of such reliability indices as mean time to failure and mean number of failures. An interesting method for calculating the steady-state failure frequency (or mean number of failures) was suggested by Korczak (2007, 2008), where an extension of the UGF method for simultaneous steady-state availability and failure frequency calculation was presented. The suggested method is based on dual-number algebra.
Example 4.8 Consider a multi-state element with minimal failures and repairs that has three different output performance rates: g1 = 0, g2 = 20, and g3 = 40. The corresponding transition intensities are $\lambda_{2,1}$ = 2.02 year⁻¹, $\lambda_{3,2}$ = 7.01 year⁻¹, $\lambda_{1,2}$ = 10 year⁻¹, and $\lambda_{2,3}$ = 14 year⁻¹.
A state-transition diagram of the element is presented in Figure 4.1.
The element fails if its performance falls below the required demand w = 15; therefore, its acceptability function takes the form $F(g_i, w) = g_i - 15$.
At the initial moment t = 0 the element is in the state with maximal performance g3 = 40.
Find the element's instantaneous availability, instantaneous expected output performance, average expected output performance for a fixed time interval T, and mean instantaneous performance deficiency.
Solution. The state probabilities are defined by the following system of differential equations:
$\frac{dp_1(t)}{dt} = -\lambda_{1,2} p_1(t) + \lambda_{2,1} p_2(t),$
$\frac{dp_2(t)}{dt} = \lambda_{1,2} p_1(t) - (\lambda_{2,1} + \lambda_{2,3}) p_2(t) + \lambda_{3,2} p_3(t),$
$\frac{dp_3(t)}{dt} = \lambda_{2,3} p_2(t) - \lambda_{3,2} p_3(t).$
Solving the system using the Laplace-Stieltjes transform under initial condi-
tions p1 ( 0 ) = p2 ( 0 ) = 0, p3 ( 0 ) = 1, one obtains the following probabilities:
The element can then be represented by the following UGF associated with its output performance stochastic process $G(t) \in \{g_1, g_2, g_3\}$:
$U(z,t) = \sum_{i=1}^{3} p_i(t) z^{g_i} = p_1(t) z^0 + p_2(t) z^{20} + p_3(t) z^{40}.$
The MSS fails if its performance falls below the required demand w = 15. In
accordance with (4.29) the MSS instantaneous availability is
3 3
A(t ) = A (U ( z , t ),15 ) = A pi (t ) z gi ,15 = pi (t )1( F ( gi ,15) 0)
i=1 i=1
23.478 t
= p2 (t ) + p3 (t ) = 0.043e + 0.106e 9.552t + 0.937.
In accordance with (4.30) the MSS instantaneous expected output performance is
$E(t) = \delta_E(U(z,t)) = \delta_E\left(\sum_{i=1}^{3} p_i(t) z^{g_i}\right) = \sum_{i=1}^{3} p_i(t) g_i = 20 p_2(t) + 40 p_3(t)$
$= 4.047e^{-23.478t} + 4.730e^{-9.552t} + 31.223.$
The MSS average expected output performance for a fixed time interval [0,T] is
obtained according to (4.33) as follows:
$E_T = \frac{1}{T}\int_0^T E(t)\,dt = \frac{1}{T}\sum_{i=1}^{K} g_i \int_0^T p_i(t)\,dt = \frac{1}{T}\left(20\int_0^T p_2(t)\,dt + 40\int_0^T p_3(t)\,dt\right)$
$= \frac{1}{T}\int_0^T \left(4.047e^{-23.478t} + 4.730e^{-9.552t} + 31.223\right) dt$
$= \frac{1}{T}\left(0.667 - 0.172e^{-23.478T} - 0.495e^{-9.552T}\right) + 31.223.$
In accordance with (4.34) the mean instantaneous performance deficiency is
$D(t) = \delta_D(U(z,t), 15) = \delta_D\left(\sum_{i=1}^{3} p_i(t) z^{g_i}, 15\right) = \sum_{i=1}^{3} p_i(t)\max(15 - g_i, 0)$
$= 15 p_1(t) = 0.650e^{-23.478t} - 1.597e^{-9.552t} + 0.947.$
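The numbers in Example 4.8 can be verified by numerically integrating the state equations (a simple Euler scheme, our own illustration) and applying the operators at a time large enough for the process to be near steady state:

```python
from math import isclose

# transition intensities (year^-1) from Example 4.8
l12, l23 = 10.0, 14.0         # repair rates
l21, l32 = 2.02, 7.01         # failure rates

# Euler integration of the state equations, starting in state 3 (g3 = 40)
p1, p2, p3 = 0.0, 0.0, 1.0
dt, steps = 1e-4, 40_000      # integrate to t = 4 years, near steady state
for _ in range(steps):
    d1 = -l12 * p1 + l21 * p2
    d2 = l12 * p1 - (l21 + l23) * p2 + l32 * p3
    d3 = l23 * p2 - l32 * p3
    p1, p2, p3 = p1 + d1 * dt, p2 + d2 * dt, p3 + d3 * dt

A = p2 + p3                   # states with F(g_i, 15) = g_i - 15 >= 0
E = 20 * p2 + 40 * p3         # expected output performance
D = 15 * p1                   # performance deficiency
assert isclose(A, 0.937, abs_tol=1e-3)
assert isclose(E, 31.223, abs_tol=0.05)
assert isclose(D, 0.947, abs_tol=0.005)
```

The limits of the closed-form expressions above (0.937, 31.223, and 0.947) are recovered once both exponential terms have decayed.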
If the structure function of the MSS possesses the associative property
$f(G_1, G_2, \ldots, G_j) = f\left(f(G_1, G_2, \ldots, G_{j-1}), G_j\right),$   (4.36)
then the operator determining the u-function $U_j(z)$ of the subsystem consisting of elements 1, ..., j for $2 \le j \le n$ can be obtained as
$U_j(z) = \Omega_f\left(u_1(z), u_2(z), \ldots, u_j(z)\right) = \Omega_f\left(\Omega_f\left(u_1(z), u_2(z), \ldots, u_{j-1}(z)\right), u_j(z)\right).$   (4.37)
Therefore, one can obtain the entire system UGF by assigning $U_1(z) = u_1(z)$ and applying operator $\Omega_f$ consecutively:
$U_j(z) = \Omega_f\left(U_{j-1}(z), u_j(z)\right).$   (4.38)
If the structure function satisfies the condition
$f(G_1, \ldots, G_j, G_{j+1}, \ldots, G_n) = f\left(f(G_1, \ldots, G_j), f(G_{j+1}, \ldots, G_n)\right),$   (4.39)
which means that one can consider any subset of adjacent elements as a subsystem for which a u-function can be obtained, then the subset can further be treated (Lisnianski and Levitin 2003) as a single element having this u-function. The u-functions of MSSs with structure functions meeting condition (4.39) can be obtained recursively by dividing the ordered set of elements into arbitrary subsets of adjacent elements, replacing these subsets with elements having u-functions equivalent to the u-functions of the subsets, and then applying the same aggregating procedure recursively to the reduced set of elements until the UGF of the entire system is obtained (Figure 4.2).
Fig. 4.2 Example of recursive derivation of the u-function for an MSS meeting condition (4.36)
If, in addition,
$f(G_1, \ldots, G_j, G_{j+1}, \ldots, G_n) = f(G_1, \ldots, G_{j+1}, G_j, \ldots, G_n)$
for any j, which provides the commutative property for the $\Omega_f$ operator:
$\Omega_f\left(u_1(z), \ldots, u_j(z), u_{j+1}(z), \ldots, u_n(z)\right) = \Omega_f\left(u_1(z), \ldots, u_{j+1}(z), u_j(z), \ldots, u_n(z)\right),$   (4.41)
then the order of the elements in the MSS is irrelevant, and the subsystems in the recurrent procedure described above can contain arbitrary sets of elements. This means (Lisnianski and Levitin 2003) that any subset of the system elements' u-functions can be replaced by its equivalent u-function and further treated as a single element (Figure 4.3).
Fig. 4.3 Example of recursive derivation of the u-function for an MSS meeting condition (4.39)
Representing the functions in the recursive form is beneficial from both the
derivation clarity and computation simplicity viewpoints. In many cases, the struc-
ture function of the entire MSS can be represented as the composition of the struc-
ture functions corresponding to some subsets of the system elements (MSS sub-
systems). The u-functions of the subsystems can be obtained separately and the
subsystems can be further treated as single equivalent elements with the perform-
ance pmf represented by these u-functions.
Fig. 4.4 Example of recursive derivation of the u-function for an MSS meeting conditions (4.39) and (4.41)
For a flow transmission MSS with elements connected in series, the system capacity is limited by the worst element:
$f_{ser}^{(1)}(G_1, \ldots, G_n) = \min\{G_1, \ldots, G_n\}.$   (4.42)
For a task processing MSS with elements connected in series, the total processing time is the sum of the elements' processing times:
$T = \sum_{j=1}^{n} T_j = \sum_{j=1}^{n} G_j^{-1},$   (4.43)
so that the total processing speed of the system is
$G = \frac{1}{T} = \left(\sum_{j=1}^{n} G_j^{-1}\right)^{-1}.$   (4.44)
Note that if $G_j = 0$ for any j, this equation cannot be used, but it is obvious that in this case G = 0. Therefore, one can define the structure function for the series task processing system as
$f_{ser}^{(2)}(G_1, \ldots, G_n) = \begin{cases} \left(\sum_{j=1}^{n} 1/G_j\right)^{-1}, & \text{if } \prod_{j=1}^{n} G_j \ne 0, \\ 0, & \text{if } \prod_{j=1}^{n} G_j = 0. \end{cases}$   (4.45)
One can see that the structure functions presented above are associative and commutative, i.e., they meet conditions (2.114) and (2.116). Therefore, the u-functions for any series system of the described types can be obtained recursively by consecutively determining the u-functions of arbitrary subsets of the elements. For example, the u-function of a system consisting of four elements connected in series can be determined in the following ways:
$\Omega_{f_{ser}}\left(\Omega_{f_{ser}}\left(\Omega_{f_{ser}}(u_1(z), u_2(z)), u_3(z)\right), u_4(z)\right) = \Omega_{f_{ser}}\left(\Omega_{f_{ser}}(u_1(z), u_2(z)), \Omega_{f_{ser}}(u_3(z), u_4(z))\right).$
Example 4.9 Consider an MSS of n elements with total failures connected in series, where each element j has the u-function
$u_j(z) = (1 - p_{j1}) z^0 + p_{j1} z^{g_{j1}}, \quad j = 1, \ldots, n.$
Find the UGF U(z) for the entire MSS and the steady-state reliability measures $A_\infty$, $D_\infty$, and $E_\infty$ as functions of the constant demand level w.
Solution. To find the u-function for the entire MSS, the corresponding $\Omega_{f_{ser}}$ operators should be applied. For an MSS with structure function (4.42) the system u-function takes the form
$U(z) = \Omega_{f_{ser}^{(1)}}\left(u_1(z), \ldots, u_n(z)\right) = \left(1 - \prod_{j=1}^{n} p_{j1}\right) z^0 + \prod_{j=1}^{n} p_{j1}\, z^{\min\{g_{11}, \ldots, g_{n1}\}}.$
For an MSS with structure function (4.45) the system u-function takes the form
$U(z) = \Omega_{f_{ser}^{(2)}}\left(u_1(z), \ldots, u_n(z)\right) = \left(1 - \prod_{j=1}^{n} p_{j1}\right) z^0 + \prod_{j=1}^{n} p_{j1}\, z^{\left(\sum_{j=1}^{n} g_{j1}^{-1}\right)^{-1}}.$
Since the failure of each individual element causes the failure of the entire system, the MSS can have only two states: one with a performance level of zero (failure of at least one element) and one with the performance level $g = \min\{g_{11}, \ldots, g_{n1}\}$ for the flow transmission MSS or $g = \left(\sum_{j=1}^{n} g_{j1}^{-1}\right)^{-1}$ for the task processing MSS.
The measures of the system performance $A_\infty$, $D_\infty = E(\max(w - G, 0))$, and $E_\infty$ are presented in Table 4.4.
w | $A_\infty$ | $D_\infty$ | $E_\infty$
$w > g$ | 0 | $w\left(1 - \prod_{j=1}^{n} p_{j1}\right) + (w - g)\prod_{j=1}^{n} p_{j1} = w - g\prod_{j=1}^{n} p_{j1}$ | $g\prod_{j=1}^{n} p_{j1}$
$0 < w \le g$ | $\prod_{j=1}^{n} p_{j1}$ | $w\left(1 - \prod_{j=1}^{n} p_{j1}\right)$ | $g\prod_{j=1}^{n} p_{j1}$
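These formulas can be checked numerically for a hypothetical three-element flow transmission series system (all element data below are invented for illustration, and the compose helper is our own sketch of $\Omega_f$):

```python
from itertools import product
from math import isclose, prod

def compose(f, *ugfs):
    """Universal generating operator over dictionaries {value: probability}."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        p = 1.0
        for _, q in combo:
            p *= q
        out[y] = out.get(y, 0.0) + p
    return out

gs = [10.0, 6.0, 8.0]          # nominal performances g_j1 (hypothetical)
ps = [0.9, 0.8, 0.95]          # availabilities p_j1 (hypothetical)
ugfs = [{0.0: 1 - p, g: p} for g, p in zip(gs, ps)]

U = compose(lambda *x: min(x), *ugfs)       # series flow transmission
p_all, g = prod(ps), min(gs)
assert isclose(U[g], p_all)                 # single operating state
assert isclose(U[0.0], 1 - p_all)           # like terms collected at zero

w = 4.0                                     # row 0 < w <= g of the table
A = sum(q for x, q in U.items() if x >= w)
D = sum(q * max(w - x, 0) for x, q in U.items())
assert isclose(A, p_all) and isclose(D, w * (1 - p_all))
```

Only two terms survive the like-term collection, exactly as stated above.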
In the flow transmission MSS, in which the flow can be dispersed and transferred
by parallel channels simultaneously (which provides the work sharing), the total
capacity of a subsystem containing n independent elements connected in parallel
is equal to the sum of the capacities of the individual elements. Therefore, the
structure function for such a subsystem takes the form
$f_{par}^{(1)}(G_1, \ldots, G_n) = \sum_{j=1}^{n} G_j.$   (4.46)
In some cases, only one channel out of n can be chosen for the flow transmission (no flow dispersion is allowed). This happens when the transmission is associated with the consumption of certain limited resources that do not allow the simultaneous use of more than one channel. The most effective way for such a system to function is to choose the channel with the greatest transmission capacity from the set of available channels. In this case, the structure function takes the form
$f_{par}^{(2)}(G_1, \ldots, G_n) = \max\{G_1, \ldots, G_n\}.$   (4.47)
In a task processing MSS with parallel elements and work sharing, the work x is divided among the elements, and the system processing time T is defined as the time at which the last portion of work is completed: $T = \max_{1 \le j \le n}\{x_j / G_j\}$. The minimal time of completion of the entire work is achieved if the elements share the work in proportion to their processing speeds $G_j$: $x_j = x G_j / \sum_{k=1}^{n} G_k$. The system processing time T in this case is equal to $x / \sum_{k=1}^{n} G_k$, and the total processing speed G is equal to the sum of the processing speeds of the elements. Therefore, the structure function of such a system coincides with structure function (4.46).
One can see that the structure functions presented also meet conditions (4.39)
and (4.41). Therefore, the u-functions for any parallel system of the described
types can be obtained recursively by the consecutive determination of the
u-functions of arbitrary subsets of the elements.
Example 4.10 Consider an MSS consisting of two elements with total failures connected in parallel. The elements have nominal performances g11 and g21 ($g_{11} < g_{21}$) and probabilities of operational state p11 and p21, respectively. The performance in the failure state is zero. The demand level w is constant.
Find the MSS reliability indices: steady-state availability $A_\infty$, steady-state performance deficiency $D_\infty$, and steady-state expected output performance $E_\infty$.
Solution. The system u-function is
$U(z) = \Omega_{f_{par}}\left(u_1(z), u_2(z)\right) = \Omega_{f_{par}}\left((1 - p_{11}) z^0 + p_{11} z^{g_{11}}, (1 - p_{21}) z^0 + p_{21} z^{g_{21}}\right),$
which for structure function (4.46) takes the form
$U_1(z) = (1 - p_{11})(1 - p_{21}) z^0 + p_{11}(1 - p_{21}) z^{g_{11}} + p_{21}(1 - p_{11}) z^{g_{21}} + p_{11} p_{21} z^{g_{11} + g_{21}}$
and for structure function (4.47) takes the form
$U_2(z) = (1 - p_{11})(1 - p_{21}) z^0 + p_{11}(1 - p_{21}) z^{g_{11}} + p_{21}(1 - p_{11}) z^{g_{21}} + p_{11} p_{21} z^{\max(g_{11}, g_{21})}$
$= (1 - p_{11})(1 - p_{21}) z^0 + p_{11}(1 - p_{21}) z^{g_{11}} + p_{21} z^{g_{21}}.$
The measures of the system output performance for MSSs of both types are
presented in Tables 4.5 and 4.6.
In the special case of n identical parallel elements, each having availability p and nominal performance g, the u-functions take the form
$U_1(z) = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} p^k (1 - p)^{n-k} z^{kg},$   (4.48)
$U_2(z) = (1 - p)^n z^0 + \left(1 - (1 - p)^n\right) z^g.$   (4.49)
w | $A_\infty(w)$ | $D_\infty(w)$ | $E_\infty$
$w > g_{21}$ | 0 | $w - p_{11} g_{11} - p_{21} g_{21} + p_{11} p_{21} g_{11}$ | $p_{11}(1 - p_{21}) g_{11} + p_{21} g_{21}$
$g_{11} < w \le g_{21}$ | $p_{21}$ | $(1 - p_{21})(w - g_{11} p_{11})$ | $p_{11}(1 - p_{21}) g_{11} + p_{21} g_{21}$
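The tabulated measures for the no-flow-dispersion system of Example 4.10 can be checked directly with hypothetical numbers (p11, p21, g11, g21 below are invented for illustration):

```python
from math import isclose

p11, p21 = 0.9, 0.8          # hypothetical availabilities
g11, g21 = 5.0, 10.0         # hypothetical performances, g11 < g21

# no flow dispersion (structure function 4.47): best channel only
U2 = {0.0: (1 - p11) * (1 - p21),
      g11: p11 * (1 - p21),
      g21: p21 * (1 - p11) + p11 * p21}    # like terms collected at g21
assert isclose(U2[g21], p21)

# table row g11 < w <= g21
w = 7.0
A = sum(q for g, q in U2.items() if g >= w)
D = sum(q * max(w - g, 0) for g, q in U2.items())
E = sum(q * g for g, q in U2.items())
assert isclose(A, p21)
assert isclose(D, (1 - p21) * (w - g11 * p11))
assert isclose(E, p11 * (1 - p21) * g11 + p21 * g21)
```

The same computation with performances added instead of maximized reproduces the flow-dispersion case.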
4. If the resulting MSS contains more than one element, return to step 1.
The resulting u-function corresponds to the output performance of the entire
system.
Table 4.7 Structure functions for pure series and pure parallel subsystems (columns: MSS type, description of the MSS, structure function $f_{ser}$ for series elements, and structure function $f_{par}$ for parallel elements)
The choice of the structure functions used for series and parallel subsystems
depends on the type of system. Table 4.7 presents the possible combinations of
structure functions corresponding to the different types of MSS.
In order to illustrate the presented recursive algorithm we consider the follow-
ing example.
Example 4.11 Consider a series-parallel MSS consisting of seven multi-state elements, presented in Figure 4.5 (a). For each element, the corresponding u-function $u_i(z)$, $i = 1, 2, \ldots, 7$, is given. Find the resulting u-function for the entire MSS.
Solution. First, one can find only one pure series subsystem, consisting of the elements with the u-functions u2(z), u3(z), and u4(z). Calculating the u-function $U_1(z) = \Omega_{f_{ser}}(u_2(z), u_3(z), u_4(z))$ and replacing the three elements with a single element having the u-function U1(z), one obtains a system with the structure presented in Figure 4.5 (b). This system contains a purely parallel subsystem consisting of the elements with the u-functions U1(z) and u5(z), which in their turn can be replaced by a single element with the u-function $U_2(z) = \Omega_{f_{par}}(U_1(z), u_5(z))$ (Figure 4.5 (c)). The obtained structure has three elements connected in series that can be replaced with a single element having the u-function $U_3(z) = \Omega_{f_{ser}}(u_1(z), U_2(z), u_6(z))$ (Figure 4.5 (d)). The resulting structure contains two elements connected in parallel. The u-function of this structure, representing the u-function of the entire MSS, is obtained as
$U(z) = \Omega_{f_{par}}(U_3(z), u_7(z)).$
The procedure described above obtains recursively the same MSS u-function that can be obtained directly by operator (4.23) using the following structure function:
$f(G_1, G_2, G_3, G_4, G_5, G_6, G_7) = f_{par}\left(f_{ser}\left(G_1, f_{par}(f_{ser}(G_2, G_3, G_4), G_5), G_6\right), G_7\right).$
The recursive procedure of obtaining the MSS u-function is not only more convenient than the direct one but, much more importantly, it allows one to considerably reduce the computational burden of the algorithm. Indeed, using the direct procedure (4.23) one has to evaluate the system structure function for each combination of values of the random variables G1, ..., G7 ($\prod_{j=1}^{7} k_j$ times). Using the recursive algorithm one can take advantage of the fact that some subsystems have the same performance rates in different states, which makes these states indistinguishable and reduces the total number of terms in the corresponding u-functions.
Fig. 4.5 Example of recursive determination of the MSS u-function (panels (a)-(d))
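The equivalence between the recursive reduction of Example 4.11 and the direct application of operator (4.23) can be demonstrated for a flow transmission system with flow dispersion (fser = min, fpar = sum); all element data below are hypothetical two-state elements and the compose helper is our own sketch:

```python
from itertools import product
from math import isclose

def compose(f, *ugfs):
    """Universal generating operator over dictionaries {value: probability}."""
    out = {}
    for combo in product(*(u.items() for u in ugfs)):
        y = f(*(x for x, _ in combo))
        p = 1.0
        for _, q in combo:
            p *= q
        out[y] = out.get(y, 0.0) + p
    return out

fser = lambda *g: min(g)       # flow transmission: series
fpar = lambda *g: sum(g)       # flow dispersion: parallel

# hypothetical two-state elements u1..u7
u = {j: {0.0: 0.1, 10.0 * j: 0.9} for j in range(1, 8)}

# recursive reduction of Example 4.11
U1 = compose(fser, u[2], u[3], u[4])
U2 = compose(fpar, U1, u[5])
U3 = compose(fser, u[1], U2, u[6])
U  = compose(fpar, U3, u[7])

# direct evaluation over all 2^7 state combinations
direct = compose(
    lambda g1, g2, g3, g4, g5, g6, g7: min(g1, min(g2, g3, g4) + g5, g6) + g7,
    *(u[j] for j in range(1, 8)))

assert set(U) == set(direct)
assert all(isclose(U[y], direct[y]) for y in U)
```

The recursive route touches far fewer terms because intermediate like-term collection shrinks each partial u-function before the next composition.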
The bridge structure (Figure 4.6) is an example of a complex system for which the u-function cannot be evaluated by decomposition into series and parallel subsystems. Each of the five bridge components can in its turn be a complex composition of elements. After obtaining the equivalent u-functions of these components, one should apply Equation 4.23 in order to obtain the u-function of the entire bridge (Levitin and Lisnianski 1998).
Fig. 4.6 Bridge structure: components 1-4 (u-functions U1(z)-U4(z)) connect nodes A, C, D, and B, and the diagonal component 5 (u-function U5(z)) connects nodes C and D; each component can itself be a composition of elements
To evaluate the output performance of a flow transmission MSS with flow dispersion, consider the flows through the bridge structure presented in Figure 4.6. First, there are two parallel flows through components 1,3 and 2,4. To determine the capacities of each of the parallel substructures composed of components connected in series, the function fser (4.42) should be used. The function fpar (4.46) should then be used to obtain the total capacity of the two parallel substructures. Therefore, the structure function of the bridge without the diagonal component is
$f(G_1, G_2, G_3, G_4) = \min\{G_1, G_3\} + \min\{G_2, G_4\}.$
Now consider the performance of a flow transmission MSS without flow dis-
persion. In such a system a single path between points A and B providing the
greatest flow should be chosen. There exist four possible paths consisting of
groups of components (1,3), (2,4), (1,5,4), and (2,5,3) connected in a series. The
transmission capacity of each path is equal to the minimum transmission capacity
of the elements belonging to this path. Therefore, the structure function of the en-
tire bridge takes the form
$f_{bridge}(G_1, G_2, G_3, G_4, G_5) = \max\left\{\min\{G_1, G_3\}, \min\{G_2, G_4\}, \min\{G_1, G_5, G_4\}, \min\{G_2, G_5, G_3\}\right\}.$   (4.52)
Note that the four parallel subsystems (paths) are not statistically independent, since some of them contain the same elements. Therefore, the bridge u-function cannot be obtained by system decomposition as for the series-parallel systems. Instead, one has to evaluate structure function (4.52) for each combination of states of the five independent components.
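A sketch of this enumerative evaluation for the bridge of Figure 4.6 with structure function (4.52), using hypothetical two-state components (all capacities and availabilities below are invented for illustration):

```python
from itertools import product
from math import isclose

def f_bridge(g1, g2, g3, g4, g5):
    """Structure function (4.52): best single path through the bridge."""
    return max(min(g1, g3), min(g2, g4), min(g1, g5, g4), min(g2, g5, g3))

# hypothetical two-state components 1..5, each available with probability 0.9
comps = [{0.0: 0.1, c: 0.9} for c in (3.0, 5.0, 4.0, 6.0, 2.0)]

U = {}
for combo in product(*(u.items() for u in comps)):
    y = f_bridge(*(g for g, _ in combo))
    p = 1.0
    for _, q in combo:
        p *= q
    U[y] = U.get(y, 0.0) + p

assert isclose(sum(U.values()), 1.0)       # proper pmf over the 2^5 states
# output 5.0 is reached exactly when components 2 and 4 both work
assert isclose(U[5.0], 0.9 * 0.9)
```

All $2^5 = 32$ state combinations are evaluated, with like-term collection performed by the dictionary.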
For a task processing MSS without work sharing, a single path with the greatest processing speed should be chosen, and the speeds along a path combine according to (4.45). The system processing speed therefore takes the form
$G = f(G_1, G_2, G_3, G_4, G_5) = \max\left\{\varphi(G_1, G_3), \varphi(G_2, G_4), \varphi(G_1, G_4, G_5), \varphi(G_2, G_3, G_5)\right\},$   (4.53)
where
$\varphi(G_j, G_i) = \begin{cases} \dfrac{G_j G_i}{G_j + G_i}, & \text{if } G_j G_i \ne 0, \\ 0, & \text{if } G_j G_i = 0, \end{cases}$
$\varphi(G_j, G_i, G_m) = \begin{cases} \dfrac{G_j G_i G_m}{G_j G_i + G_i G_m + G_j G_m}, & \text{if } G_j G_i G_m \ne 0, \\ 0, & \text{if } G_j G_i G_m = 0. \end{cases}$
Now consider a system with work sharing for which the same three assump-
tions that were made for the parallel system with work sharing (Section 4.2.5) are
made. There are two stages of work performance in the bridge structure. The first
stage is performed by components 1 and 2 and the second stage is performed by
components 3 and 4. The fifth component is necessary to transfer work between
nodes C and D. Following these assumptions, the decision about work sharing can
be made in the nodes of bridge A, C, or D only when the entire amount of work is
available in this node. This means that component 3 or 4 cannot start task process-
ing before both components 1 and 2 have completed their tasks and all of the work
has been gathered at node C or D.
There are two ways to complete the first stage of processing in the bridge structure, depending on the node in which the completed work is gathered. To complete it in node C, an amount of work $(1 - \alpha)x$ should be performed by component 1 with processing speed G1, and an amount of work $\alpha x$ should be performed by component 2 with processing speed G2 and then transferred from node D to node C with speed G5 ($\alpha$ is the work sharing coefficient). The time at which the work performed by component 1 appears at node C is $t_1 = (1 - \alpha)x/G_1$. The time at which the work performed by component 2 and transferred by component 5 appears at node C is $t_2 + t_5$, where $t_2 = \alpha x/G_2$ and $t_5 = \alpha x/G_5$. The total time of the first stage of processing is $T_{1C} = \max\{t_1, t_2 + t_5\}$. It can easily be seen that $T_{1C}$ is minimized when $\alpha$ is chosen to provide the equality $t_1 = t_2 + t_5$. The work sharing coefficient obtained from this equality is $\alpha = G_2 G_5/(G_1 G_2 + G_1 G_5 + G_2 G_5)$ and the minimal processing time is $T_{1C} = x(G_2 + G_5)/(G_1 G_2 + G_1 G_5 + G_2 G_5)$.
Using the same technique we can obtain the minimal processing time when the second stage of processing starts from node D. Assuming that the optimal way of performing the work can be chosen in node A, we obtain the total bridge processing time as $\min\{T_{1C} + T_{2C}, T_{1D} + T_{2D}\}$, where
$T_{1C} + T_{2C} = x\left(\frac{G_2 + G_5}{\pi_1} + \frac{G_4 + G_5}{\pi_2}\right),$
$T_{1D} + T_{2D} = x\left(\frac{G_1 + G_5}{\pi_1} + \frac{G_3 + G_5}{\pi_2}\right),$
$\pi_1 = G_1 G_2 + G_1 G_5 + G_2 G_5,$
$\pi_2 = G_3 G_4 + G_3 G_5 + G_4 G_5.$
The total processing speed of the bridge is therefore
$G = f(G_1, G_2, G_3, G_4, G_5) = \frac{\pi_1 \pi_2}{(a + G_5)\pi_1 + (e + G_5)\pi_2},$   (4.54)
where
$a = G_4, \; e = G_2$ if $(G_2 - G_1)\pi_2 \le (G_3 - G_4)\pi_1,$
$a = G_3, \; e = G_1$ if $(G_2 - G_1)\pi_2 > (G_3 - G_4)\pi_1.$
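Equation 4.54 as reconstructed here can be cross-checked against the direct computation $\min\{T_{1C} + T_{2C}, T_{1D} + T_{2D}\}$; the function names and test values below are ours, and all component speeds are assumed positive:

```python
from math import isclose

def bridge_work_sharing_speed(g1, g2, g3, g4, g5):
    """Reconstructed Equation 4.54: total processing speed with optimal
    work sharing (assumes all component speeds are positive)."""
    pi1 = g1 * g2 + g1 * g5 + g2 * g5
    pi2 = g3 * g4 + g3 * g5 + g4 * g5
    if (g2 - g1) * pi2 <= (g3 - g4) * pi1:
        a, e = g4, g2            # gathering the work at node C is faster
    else:
        a, e = g3, g1            # gathering the work at node D is faster
    return pi1 * pi2 / ((a + g5) * pi1 + (e + g5) * pi2)

def direct_speed(g1, g2, g3, g4, g5):
    """1 / min{T1C + T2C, T1D + T2D} for a unit amount of work."""
    pi1 = g1 * g2 + g1 * g5 + g2 * g5
    pi2 = g3 * g4 + g3 * g5 + g4 * g5
    t_c = (g2 + g5) / pi1 + (g4 + g5) / pi2
    t_d = (g1 + g5) / pi1 + (g3 + g5) / pi2
    return 1.0 / min(t_c, t_d)

for gs in [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (2, 2, 2, 2, 2), (7, 1, 1, 7, 3)]:
    assert isclose(bridge_work_sharing_speed(*gs), direct_speed(*gs))
```

The condition selecting a and e simply picks the gathering node with the smaller total processing time.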
Methods for evaluating the relative influence of element reliability on the reliability or availability of an entire system provide useful information about the importance of these elements. Importance evaluation is an essential step in tracing bottlenecks in systems and identifying the most important elements. It is a useful tool that helps the analyst find weaknesses in a design and suggest modifications for system upgrade. The importance index was first introduced by Birnbaum (1969). This index characterizes the rate at which the system reliability changes with respect to changes in the reliability of a given element. An improvement in the reliability of the element with the highest importance causes the greatest increase in system reliability. Several other measures of element and minimal cut set importance in coherent systems were developed by Barlow and Proschan (1974, 1975) and Fussell (1975). Useful information on the subject can be found in Ryabinin (1976).
The above importance measures have been defined for coherent binary-state systems, where elements can have only two states, total failure and perfect functioning, without any performance considerations.
In an MSS, the failure effect will be essentially different for elements with different nominal performance rates. Therefore, the performance rates of system elements should be taken into account when their importance is estimated. Some extensions of importance measures for coherent MSSs were suggested by Butler (1979), Griffith (1980), Bosche (1987), Ramirez-Marquez and Coit (2005), and Zio and Podofillini (2003).
The entire MSS availability is a complex function of demand w, which is an additional factor having a strong impact on element importance in MSSs. The reliability of a certain element may be very important for one demand level and less important for another.
For a complex system structure, where there can be a large number of demand levels, the importance evaluation for each element is a difficult problem when the straightforward Boolean or Markov approaches are used, because of the great number of logical functions for the top-event description (when one uses the logic methods) and the great number of states (when the Markov technique is used).
The method for the Birnbaum importance calculation based on the UGF technique is much simpler. It uses the same system description for complex MSSs with a different physical nature of performance and takes the demand into account. The method can easily be extended to the sensitivity analysis of additional system output performance measures, considered in Section 4.2.2.
Here we consider a system in steady state, and therefore the natural generalization of Birnbaum importance for an MSS is the rate at which the MSS availability index changes with respect to changes in the availability of a given element j. For element j in state i this importance index is defined as

I_A^(ji)(w) = ∂A(w)/∂p_ji, (4.55)
where p_ji is the probability that the jth element will be in the given state i (with performance rate g_ji), and A(w) is the steady-state availability of the entire MSS. In other words, the Birnbaum importance extension in an MSS context characterizes the influence of changing the probabilities p_ji on the entire MSS availability. Evaluating MSS reliability/availability indices using the UGF was already considered in Section 4.2.2 of the book. Based on the previously obtained resulting UGF

U(z) = Σ_{i=1}^{K} p_i z^{g_i}

of the entire MSS, the steady-state availability A(w) can be obtained for any constant demand w using expression (4.29):

A(w) = δ_A{U(z), w} = Σ_{i=1}^{K} p_i 1(F(g_i, w) ≥ 0). (4.56)

For a variable demand represented by M possible levels w_m with corresponding probabilities q_m, m = 1, ..., M, the importance index takes the form

I_A^(ji)(w, q) = Σ_{m=1}^{M} q_m I_A^(ji)(w_m). (4.57)
In a similar manner, one can obtain the sensitivity of the steady-state expected MSS output performance to the availability of the given element j at given performance level i as

I_E^(ji) = ∂E_∞/∂p_ji, (4.58)

where

E_∞ = lim_{t→∞} E(t) = Σ_{i=1}^{K} p_i g_i. (4.59)
Note that this sensitivity index I_E^(ji) does not depend on the demand level w.
The sensitivity of the expected steady-state MSS performance deficiency to the availability of the given element j at given performance level g_ji for a single constant demand w is defined as follows:

I_D^(ji)(w) = ∂D_∞(w)/∂p_ji, (4.60)

where

D_∞(w) = lim_{t→∞} D(t, w) = Σ_{i=1}^{K} p_i max(w − g_i, 0). (4.61)

For a variable demand with levels w_m and corresponding probabilities q_m, m = 1, ..., M,

I_D^(ji)(w, q) = Σ_{m=1}^{M} q_m I_D^(ji)(w_m). (4.62)
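Because the system availability is multilinear in the element state probabilities, the partial derivative in (4.55) equals the system availability computed with element j pinned in state i. The sketch below is our own illustration of this observation (not the book's code); the dictionary-based UGF representation and the series flow structure are assumptions made for the example:

```python
from functools import reduce

def compose(u1, u2, op):
    """UGF composition: probabilities multiply, performances combine via op."""
    out = {}
    for ga, pa in u1.items():
        for gb, pb in u2.items():
            g = op(ga, gb)
            out[g] = out.get(g, 0.0) + pa * pb
    return out

def availability(u, w):
    """delta_A operator: probability that performance meets demand w."""
    return sum(p for g, p in u.items() if g >= w)

def importance_A(elements, j, i, w):
    """Birnbaum-type importance dA(w)/dp_ji for a series flow MSS:
    A(w) is linear in p_ji, so the derivative equals the availability
    computed with element j fixed in its ith state (probability 1)."""
    pinned = list(elements)
    g_ji = sorted(elements[j])[i]        # states ordered by performance
    pinned[j] = {g_ji: 1.0}
    series = reduce(lambda a, b: compose(a, b, min), pinned)
    return availability(series, w)

# Two-state elements of a series flow MSS (numbers are illustrative)
elements = [{0.0: 0.1, 5.0: 0.9}, {0.0: 0.2, 3.0: 0.8}]
```

For w = 3 the system availability is 0.9 · 0.8 = 0.72, and importance_A(elements, 0, 1, 3.0) returns 0.8 = 0.72/0.9, in agreement with the series-system expression Π p_i1 / p_j1 derived in Example 4.13.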
G_par^(1) = f_par^(1)(G1, G2) = G1 + G2.

Solution. Based on the given probability distributions for the elements' output performance, we can write the individual UGFs:

for element 1: u1(z) = p11 z^0 + p12 z^{g12} + p13 z^{g13},
for element 2: u2(z) = p21 z^0 + p22 z^{g22}.

By using the composition operator we obtain the UGF for the entire MSS (the UGF corresponding to the entire MSS output performance G_par):

U(z) = f_par^(1){u1(z), u2(z)} = f_par^(1){p11 z^0 + p12 z^{g12} + p13 z^{g13}, p21 z^0 + p22 z^{g22}}
= p11 p21 z^0 + p12 p21 z^{g12} + p13 p21 z^{g13} + p11 p22 z^{g22} + p12 p22 z^{g12+g22} + p13 p22 z^{g13+g22}.

For the three demand levels one obtains

A(w1 = 1.0) = δ_A{U(z), w1} = p13 p21 + p11 p22 + p12 p22 + p13 p22,
A(w2 = 1.5) = δ_A{U(z), w2} = p12 p22 + p13 p22,
A(w3 = 2.0) = δ_A{U(z), w3} = p13 p22.
Example 4.13 Consider an MSS consisting of n elements with only total failures connected in series, described in Example 4.9. The importance and sensitivity measures I_A^(j1)(w) and I_D^(j1)(w) should be found as functions of the constant demand level w for the flow transmission MSS and for the task processing MSS.

Solution. In Example 4.9 we calculated the reliability measures of the system that were presented in Table 4.2. The corresponding importance and sensitivity indices can be obtained analytically by differentiating these measures according to (4.55), (4.60), and (4.61). The indices are presented in Table 4.5.
Recall that ĝ = min{g11, ..., gn1} for the flow transmission MSS and ĝ = 1/Σ_{j=1}^{n} g_j1^{−1} for the task processing MSS.
One can see that the element with the minimal availability has the greatest impact on the entire MSS availability. (A chain fails at its weakest link.) The index I_A^(j) in this example does not depend on element performance rates or on demand. Indices I_E^(j) and I_D^(j) also do not depend on the performance rate of the individual element j, but the performance rate g_j can influence these indices if it affects the entire MSS performance ĝ.
Table 4.5 Importance and sensitivity indices for the series MSS of Example 4.13

w           I_A^(j)                   I_D^(j)                    I_E^(j)
w > ĝ       0                         −ĝ (Π_{i=1}^{n} p_i1)/p_j1   ĝ (Π_{i=1}^{n} p_i1)/p_j1
0 < w ≤ ĝ   (Π_{i=1}^{n} p_i1)/p_j1   −w (Π_{i=1}^{n} p_i1)/p_j1   ĝ (Π_{i=1}^{n} p_i1)/p_j1
One can find more examples in Levitin and Lisnianski (1999), Lisnianski and
Levitin (2003), and Levitin (2005).
tem is known from the states of its n components. So, in accordance with Montero et al. (1990), a CSF is a mapping f: [0,1]^n → [0,1], where f(X1, ..., Xn) represents the performance of the system when each component i works at performance level Xi. Such a system is called a continuous-state system (CSS). We shall assume that f is monotonic, i.e., f(X) ≤ f(Y) if X ≤ Y in the sense that Xi ≤ Yi for every i,
where Wdem is some specified demand level. As was observed in Brunelle and Kapur (1998), for many real-world CSSs there would be a nonzero probability of being in state 0, and thus the distribution of the system state would be mixed (continuous and discrete). This case is practically important and is sometimes treated as composite performance and reliability evaluation (Trivedi et al. 1992).
Reliability evaluation for a CSS in practice is a very difficult problem and often requires enormous effort, even for a sufficiently simple system (Aven 1993). Thus, one of the most important problems in this field is to develop an engineering method for CSS reliability assessment. A method will be suitable for engineers if it is based on a formalized procedure for finding the relationship between the characteristics of the entire complex system (the entire CSS performance distribution) and the characteristics (performance distributions) of its components. Using such a method one can find the entire CSS performance distribution based only on a CSS logic diagram and the individual component performance distributions. Such a method exists for finite MSSs and is based on the UGF technique. Hence, if a finite MSS can approximate a CSS, this effective technique can be applied. Here we consider the method presented in Lisnianski (2001), which is based on a discrete approximation of a given CSS by two corresponding MSSs in order to compute lower and upper bounds of CSS reliability measures. The main advantage of the method is that it is based solely on the system logic diagram, does not require building the CSS structure function, and allows one to calculate boundary point estimates for CSS reliability measures with a previously specified accuracy.
As pointed out in the previous section, finite MSSs can be considered in order to get useful approximations for an arbitrary monotonic continuum-state system. Without loss of generality we will consider the interval [0, Xmax], where the system and component performances take their values. A discrete approximation will be defined by successive partitions of the interval [0, Xmax].
Suppose that the performance Xi of the ith system component has the cumulative distribution function Fi(x) = Pr{Xi ≤ x}. Designate as Nint the number of intervals which partition the main interval [0, Xmax]. Hence the length of one interval Δx will be as follows:

Δx = Xmax/Nint. (4.63)
The lower (upper) bound approximation for component i with continuous performance CDF Fi(x) will be represented by the component whose performance is distributed according to the following piecewise CDF Filow(x) (Fiupp(x)), respectively (Table 4.9).

Table 4.9 Lower Filow(x) and upper Fiupp(x) bound piecewise approximations for component performance Fi(x)

Xi                         Filow(x)       Fiupp(x)
[0, Δx)                    Fi(Δx)         0
[Δx, 2Δx)                  Fi(2Δx)        Fi(Δx)
...                        ...            ...
[(Nint − 1)Δx, NintΔx)     Fi(NintΔx)     Fi((Nint − 1)Δx)
NintΔx                     1              1

In Figure 4.8 one can see these CDFs. According to the definitions of Fiupp(x) and Filow(x) (Table 4.9) we can write

Fiupp(x) ≤ Fi(x) ≤ Filow(x). (4.64)
The lower and upper bounds are meant here in the sense of bounds for CSS reliability measures, not as bounds for the function Fi(x). Reliability measures for continuum-state systems were studied in Brunelle and Kapur (1998, 1999). From the set of CSS reliability measures we will use here the following two important and practical measures: (1) the mean CSS performance E and (2) the CSS mean unsupplied demand D(Wdem). Examples of the second measure are the unsupplied power in power systems and the expected output tardiness in information processing systems.
By using the definition of the Stieltjes integral (Gnedenko 1988, Gnedenko and Ushakov 1995), the mean performance for a component with CDF Fi(x), where x ∈ [0, Xmax], can be written as

Ei = ∫_0^{Ximax} x dFi(x). (4.65)
Fig. 4.8 Lower and upper piecewise approximations for component i performance distribution Fi(x)
The curve Fiupp(x) lies below (or on) the curve Fi(x); hence the area SEupp and the corresponding mean Eiupp that are calculated for the CDF Fiupp(x) will be greater than the mean performance of component i with performance CDF Fi(x). Thus, the mean Eiupp characterizes the upper bound of the mean performance Ei of continuous-state component i. The curve Filow(x) in Figure 4.8 lies above (or on) the curve Fi(x); hence the mean Eilow that is calculated for the CDF Filow(x) characterizes the lower bound of the mean performance Ei of continuous-state component i.
The mean unsupplied demand for continuous-state component i can be treated as the mean of the following random value X̃i:

X̃i = max(Wdem − Xi, 0). (4.66)

Hence

Di = ∫_0^{Ximax} x dF_X̃i(x). (4.67)

Table 4.10 Mass functions F(d)iupp(x) and F(d)ilow(x) for the upper and lower boundary points

x          F(d)iupp(x)                   F(d)ilow(x)
0          0                             Fi(Δx)
Δx         Fi(Δx)                        Fi(2Δx) − Fi(Δx)
...        ...                           ...
jΔx        Fi(jΔx) − Fi((j − 1)Δx)       Fi((j + 1)Δx) − Fi(jΔx)
...        ...                           ...
NintΔx     1 − Fi((Nint − 1)Δx)          1 − Fi(NintΔx)
Components with the discrete and the corresponding piecewise distributions have the same values of the reliability measures. The above lower and upper bounds for CSS reliability measures can be computed with a desired level of accuracy by decreasing the step Δx.
Thus, by using the above-considered approach, a CSS can be represented by two finite MSSs: MSSupp and MSSlow for the upper and lower boundary point calculation of CSS reliability measures, respectively. These MSSs have the same structure as the CSS. In MSSupp any continuous-state component i must be represented by a discrete distribution with the corresponding mass function F(d)iupp(x). In MSSlow any continuous-state component i must be represented by a discrete distribution with the corresponding mass function F(d)ilow(x). In this case the UGF technique, which has proved to be very effective for finite MSS reliability assessment, can be applied. Using the UGF technique, boundary points for CSS reliability measures may be estimated according to the following algorithm.
1. Based on the performance CDF Fi(x) for every component i, the individual u-functions for the upper and lower boundary points of the component's reliability measures must be found. For the upper bounds, according to Table 4.10, we will have for every component i

ui^(u)(z) = 0·z^0 + Fi(Δx)·z^{Δx} + Σ_{j=2}^{Nint−1} [Fi(jΔx) − Fi((j − 1)Δx)]·z^{jΔx} + [1 − Fi((Nint − 1)Δx)]·z^{NintΔx}. (4.68)

For the lower bounds,

ui^(l)(z) = Fi(Δx)·z^0 + Σ_{j=2}^{Nint} [Fi(jΔx) − Fi((j − 1)Δx)]·z^{(j−1)Δx} + [1 − Fi(NintΔx)]·z^{NintΔx}. (4.69)
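The two u-functions (4.68) and (4.69) can be generated mechanically from any continuous CDF. The sketch below is our own illustration (the uniform CDF is only a convenient test case, and the function names are ours); it builds both discrete approximations and shows that the resulting means bracket the true mean and tighten as Nint grows:

```python
def bound_ufunctions(F, x_max, n_int):
    """Discrete upper/lower approximations {performance: probability mass}
    of a continuous performance CDF F on [0, x_max], per Eqs. (4.68)-(4.69)."""
    dx = x_max / n_int                          # Eq. (4.63)
    upper = {0.0: 0.0, dx: F(dx)}
    for j in range(2, n_int):
        upper[j * dx] = F(j * dx) - F((j - 1) * dx)
    upper[n_int * dx] = 1.0 - F((n_int - 1) * dx)
    lower = {0.0: F(dx)}
    for j in range(2, n_int + 1):
        lower[(j - 1) * dx] = F(j * dx) - F((j - 1) * dx)
    lower[n_int * dx] = 1.0 - F(n_int * dx)
    return upper, lower

def mean(u):
    """Mean performance of a discrete u-function."""
    return sum(g * p for g, p in u.items())

# Uniform performance on [0, 1]: the true mean is 0.5
F = lambda x: min(x, 1.0)
up4, lo4 = bound_ufunctions(F, 1.0, 4)
up8, lo8 = bound_ufunctions(F, 1.0, 8)
```

For Nint = 4 the means are 0.625 and 0.375; for Nint = 8 they are 0.5625 and 0.4375, so the bracket around the true mean 0.5 halves when the step Δx is halved.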
2. The composition operators corresponding to the CSS structure must be applied in order to obtain the resulting u-functions US^(u)(z) and US^(l)(z) for MSSupp and MSSlow, respectively.
3. Based on these resulting u-functions, the boundary points of the CSS reliability measures are obtained. For the upper bounds,

E^(u) = dUS^(u)(z)/dz |_{z=1}, (4.70)
D^(u) = δ_D(US^(u)(z), Wdem). (4.71)

For the lower bounds,

E^(l) = dUS^(l)(z)/dz |_{z=1}, (4.72)
D^(l) = δ_D(US^(l)(z), Wdem). (4.73)
Fi(x) = 1 − Ai + Ai F_Xi(x), if x > 0,
Fi(x) = 1 − Ai, if x = 0,
Fi(x) = 0, if x < 0.
For elements 1 and 2 connected in parallel one obtains

u(1,2)^(u1)(z) = f_par^(1){u1^(u)(z), u2^(u)(z)},
u(1,2)^(u2)(z) = f_par1^(1){u1^(u)(z), u2^(u)(z)},
u(1,2)^(l1)(z) = f_par^(1){u1^(l)(z), u2^(l)(z)},
u(1,2)^(l2)(z) = f_par1^(1){u1^(l)(z), u2^(l)(z)}.

Here the structure functions f_par^(1) and f_par1^(1) are defined by expressions (4.46) and (4.47), respectively.
The u-functions for the entire system will be as follows:
for a system of type 1,

US^(u1)(z) = f_ser^(2){u(1,2)^(u1)(z), u3^(u)(z)}, US^(l1)(z) = f_ser^(2){u(1,2)^(l1)(z), u3^(l)(z)};

for a system of type 2,

US^(u2)(z) = f_ser^(2){u(1,2)^(u2)(z), u3^(u)(z)}, US^(l2)(z) = f_ser^(2){u(1,2)^(l2)(z), u3^(l)(z)}.
3. To obtain the lower and upper bounds for the CSS mean output performance E and expected unsupplied demand D, expressions (4.70)–(4.73) are used.
For the upper bounds:

E^(uj) = dUS^(uj)(z)/dz |_{z=1}, j = 1, 2;
D^(uj) = δ_D(US^(uj)(z), Wdem), j = 1, 2.

For the lower bounds:

E^(lj) = dUS^(lj)(z)/dz |_{z=1}, j = 1, 2;
D^(lj) = δ_D(US^(lj)(z), Wdem), j = 1, 2.
In Figures 4.10 and 4.11 one can see the upper and lower boundary points for the CSS mean output performance and mean unsupplied demand for systems of type 1 and type 2 as functions of the step Δx. One can see that the difference between the upper and lower bounds decreases as the step Δx decreases.

Fig. 4.10 Upper and lower boundary points for the CSS mean output performance as functions of the step Δx
Using these lower and upper bounds one can estimate the CSS reliability measures. The maximal relative error for a system of type 1 (for Δx = 1) is obtained in the same manner as for a system of type 2, for which (for Δx = 1)

errE^(2) = (24.23 − 23.83)/23.83 = 0.017 and errD^(2) = (3.45 − 3.32)/3.32 = 0.039.
Fig. 4.11 Upper and lower boundary points for the CSS mean unsupplied demand as functions of the step Δx
References
Aven T (1993) On performance measures for multistate monotone systems. Reliab Eng Syst Saf 41(3):259–266
Barlow R, Proschan F (1974) Importance of system components and fault tree analysis. Operations Research Center, vol 3, University of California, Berkeley
Barlow R, Proschan F (1975) Importance of system components and fault tree analysis. Stochastic Processes and their Applications 3(2):153–173
Baxter LA (1984) Continuum structures. I. J Appl Probab 21:802–815
Baxter LA (1986) Continuum structures II. Mathematical Proceedings of the Cambridge Philosophical Society 99:331–338
Baxter LA, Kim C (1986) Bounding the stochastic performance of continuum structure functions. J Appl Probab 23:660–669
Birnbaum ZW (1969) On the importance of different components in a multi-component system. In: Krishnaiah PR (ed) Multivariate Analysis II. Academic, New York, pp 581–592
Block HW, Savits TH (1984) Continuous multi-state structure functions. Oper Res 32:703–714
Bosche A (1987) Calculation of critical importance for multi-state components. IEEE Trans Reliab R-36:247–249
Brunelle R, Kapur KC (1998) Continuous-state system reliability: an interpolation approach. IEEE Trans Reliab 47:181–187
Butler DA (1979) A complete importance ranking for components of binary coherent systems with extensions to multi-state systems. Nav Res Logist Quart 26:565–578
Chakravarty S, Ushakov I (2000) Effectiveness analysis of GlobalstarTM gateways. In: Proceedings of the 2nd International Conference on Mathematical Methods in Reliability (MMR2000), Bordeaux, France, vol 1
Elmakias D (2008) New computational methods in power system reliability. Springer, London
Fussell JB (1975) How to hand-calculate system reliability and safety characteristics. IEEE Trans Reliab R-24(3):168–174
Gnedenko B (1969) Mathematical methods of reliability theory. Academic, Boston
Gnedenko B (1988) Course of probability theory. Nauka, Moscow (in Russian)
Gnedenko B, Ushakov I (1995) Probabilistic reliability engineering. Wiley, New York
Griffith WS (1980) Multi-state reliability models. J Appl Prob 17:735–744
Grimmett G, Stirzaker D (1992) Probability and random processes. Clarendon, Oxford
Korczak E (2007) New formulae for failure/repair frequency of multi-state monotone systems and its applications. Control Cybern 36(1):219–239
Korczak E (2008) Calculating steady state reliability indices of multi-state systems using dual number algebra. In: Martorell et al (eds) Safety, Reliability and Risk Analysis: Theory, Methods and Applications. Proceedings of the European Safety and Reliability Conference (ESREL 2008), Valencia, Spain, pp 1795–1802
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Levitin G (2008) Optimal structure of multi-state systems with uncovered failures. IEEE Trans Reliab 57(1):140–148
Levitin G, Amari S (2009) Optimal load distribution in series-parallel systems. Reliab Eng Syst Saf 94:254–260
Levitin G, Dai Y, Ben-Haim H (2006) Reliability and performance of star topology grid service with precedence constraints on subtask execution. IEEE Trans Reliab 55(3):507–515
Levitin G, Lisnianski A (1998) Structure optimization of power system with bridge topology. Electr Power Syst Res 45:201–208
Levitin G, Lisnianski A (1999) Importance and sensitivity analysis of multi-state systems using the universal generating function method. Reliab Eng Syst Saf 65:271–282
Levitin G, Lisnianski A, Ben Haim H et al (1998) Redundancy optimization for series-parallel multi-state systems. IEEE Trans Reliab 47:165–172
Lisnianski A (2001) Estimation of boundary points for continuum-state system reliability measures. Reliab Eng Syst Saf 74:81–88
Lisnianski A (2004a) Universal generating function technique and random process methods for multi-state system reliability analysis. In: Proceedings of the 2nd International Workshop in Applied Probability (IWAP2004), Piraeus, Greece, pp 237–242
Lisnianski A (2004b) Combined generating function and semi-Markov process technique for multi-state system reliability evaluation. In: Communications of the 4th International Confer-
As was described in Chapter 2, stochastic process methods are very effective tools for MSS reliability evaluation. According to these methods a state-space diagram of an MSS should be built and the transitions between all the states defined. Then the system evolution should be represented by a continuous-time discrete-state stochastic process. Based on this process all MSS reliability measures can be evaluated.
The main disadvantage of stochastic process models for MSS reliability evaluation is that they are very difficult to apply to real-world MSSs consisting of many elements with different performance levels. This is the so-called dimension curse. First, state-space diagram building or model construction for such complex MSSs is not a simple job. It is a difficult nonformalized process that may cause numerous mistakes even for relatively small MSSs. The problem of identifying all the states and transitions correctly is a very difficult task. Second, solving models with hundreds of states can challenge the available computer resources. For an MSS consisting of n different repairable elements, where every element j has kj different performance levels, one will have a model with K = Π_{j=1}^{n} kj states. This number can be very large even for a relatively small MSS.
In the general case, any element j in an MSS can have kj different states corresponding to different performances, represented by the set gj = {gj1, ..., gjkj}, where gji is the performance rate of element j in state i, i ∈ {1, 2, ..., kj}.
In the first stage, according to the suggested method, a model of a stochastic process should be built for each multi-state element in the MSS. Based on this model the state probabilities

pji(t) = Pr{Gj(t) = gji}, i ∈ {1, ..., kj},

for every MSS element j ∈ {1, ..., n} can be obtained. These probabilities define the output stochastic process Gj(t) for each element j in the MSS.
At the next stage the output performance distribution for the entire MSS at each time instant t should be defined based on the previously determined state probabilities of all elements and on the system structure function. At this stage the UGF technique provides a simple procedure based only on algebraic operations.
Without loss of generality here we consider a multi-state element with minor failures and repairs. With each state i there is an associated performance gji of element j. The states are ordered so that gj,i+1 ≥ gji for any i. Minor failures and repairs cause element transitions from state i, where 1 < i < kj, only to the adjacent states i − 1 and i + 1, respectively: the transition will be to state i − 1 if a failure occurs in state i, and to state i + 1 if the repair is finished. In state kj only a failure (with transition to state kj − 1) is possible, and in state 1 only a repair (with transition to state 2) is possible.
If all times to failure and repair times are exponentially distributed, the performance stochastic process has a Markov property and can be represented by a Markov model. Here for simplicity we omit the index j and assume that the element has k different states, as presented in Figure 5.1. For a Markov process each transition from state s to any state m (s, m = 1, ..., k) has its own associated transition intensity, which will be designated asm. In our case any transition is caused by an element's failure or repair. If m < s, then asm = λsm, where λsm is the failure rate for the failures that cause the element transition from state s to state m. If m > s, then asm = μsm, where μsm is the corresponding repair rate. The performance gs is associated with each state s.
Fig. 5.1 State-transition diagram for Markov model of a repairable multi-state element

The state probabilities are

ps(t) = Pr{G(t) = gs}, s = 1, ..., k; t ≥ 0.
The following system of differential equations can be written for the state probabilities:

dps(t)/dt = Σ_{i=1, i≠s}^{k} pi(t) ais − ps(t) Σ_{i=1, i≠s}^{k} asi. (5.1)
In our case all transitions are caused by the element's failures and repairs. Thus, the corresponding transition intensities ais are expressed by the element's failure and repair rates. Therefore, the corresponding system of differential equations may be written as

dp1(t)/dt = −μ12 p1(t) + λ21 p2(t),
dp2(t)/dt = μ12 p1(t) − (λ21 + μ23) p2(t) + λ32 p3(t),
...
dpk(t)/dt = μk−1,k pk−1(t) − λk,k−1 pk(t). (5.2)

We assume that the initial state will be state k with the best performance. Therefore, by solving the system (5.2) of differential equations under the initial conditions pk(0) = 1, pk−1(0) = ... = p2(0) = p1(0) = 0, the state probabilities ps(t), s = 1, ..., k, can be obtained.
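System (5.2) is also easy to solve numerically for any number of states. The sketch below is our own code, not the book's; the two-state rates used for the demonstration are only illustrative values. It builds the birth-death transition-intensity matrix and integrates dp/dt with a fixed-step Runge-Kutta scheme:

```python
def birth_death_generator(lam, mu):
    """Transition-intensity matrix for a k-state element with minor failures
    and repairs: lam[i] is the failure rate from state i+2 down to i+1,
    mu[i] the repair rate from state i+1 up to i+2 (states numbered 1..k)."""
    k = len(mu) + 1
    A = [[0.0] * k for _ in range(k)]
    for i in range(k - 1):
        A[i][i + 1] = mu[i]      # repair, up one state
        A[i + 1][i] = lam[i]     # failure, down one state
    for s in range(k):
        A[s][s] = -sum(A[s][m] for m in range(k) if m != s)
    return A

def solve_markov(A, p0, t, steps=2000):
    """Classical RK4 integration of system (5.1)/(5.2): dp_s/dt = sum_i p_i a_is."""
    k = len(A)
    f = lambda p: [sum(p[i] * A[i][s] for i in range(k)) for s in range(k)]
    h, p = t / steps, list(p0)
    for _ in range(steps):
        k1 = f(p)
        k2 = f([p[i] + h / 2 * k1[i] for i in range(k)])
        k3 = f([p[i] + h / 2 * k2[i] for i in range(k)])
        k4 = f([p[i] + h * k3[i] for i in range(k)])
        p = [p[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(k)]
    return p

# Two-state repairable element, starting in the best state
A = birth_death_generator(lam=[7.0], mu=[100.0])
p = solve_markov(A, [0.0, 1.0], 1.0)
```

After one year the element has practically reached steady state, so p[0] is close to λ/(λ + μ) = 7/107 ≈ 0.0654.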
element Qlm(t) of this matrix determines the probability that a transition from state l to state m will occur during the time interval [0, t].

Fig. 5.2 State-transition diagram for semi-Markov model of a repairable multi-state element

       | 0        Q12(t)   0        0   ...  0            0 |
       | Q21(t)   0        Q23(t)   0   ...  0            0 |
Q(t) = | ...      ...      ...      ... ...  ...          ...|, (5.3)
       | 0        0        0        0   ...  Qk,k−1(t)    0 |

where
...
Qk,k−1(t) = Fk,k−1(t). (5.7)
The kernel matrix (5.3) and the initial state k (with the best performance) completely define the semi-Markov process, which describes the stochastic behavior of the multi-state element.
For every element we denote by θlm(t) the probability that a semi-Markov stochastic process that starts from the initial state l at instant t = 0 will be in state m at instant t. The probabilities θlm(t), l, m = 1, 2, ..., k, can be found from the solution of the following system of integral equations:

θlm(t) = δlm [1 − Σ_{s=1}^{k} Qls(t)] + Σ_{s=1}^{k} ∫_0^t qls(τ) θsm(t − τ) dτ, l, m = 1, ..., k, (5.8)

where

qls(τ) = dQls(τ)/dτ

and

δlm = 1 if l = m, δlm = 0 if l ≠ m.

We assume that the process always starts from state k (the best state). Hence the state probabilities of a multi-state element, which should be defined based on the solution of the system of integral equations (5.8), are as follows:

pk(t) = θk,k(t), pk−1(t) = θk,k−1(t), ..., p1(t) = θk,1(t). (5.9)
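Equations (5.8) are Volterra integral equations and can be solved by simple time discretization. The sketch below is our own first-order illustration (the rates, grid size, and function names are arbitrary choices, not from the book); it treats the increments of Qls over each grid interval as jump-probability masses. With exponential sojourn times the semi-Markov element reduces to a Markov one, which provides a closed-form check:

```python
import math

def solve_semi_markov(Q, k, t_max, n):
    """First-order discretization of the Volterra system (5.8).
    Q[l][s] is the kernel CDF of the l -> s transition (a callable)."""
    dt = t_max / n
    # jump-probability mass of a first l -> s transition in [r*dt, (r+1)*dt)
    mass = [[[Q[l][s]((r + 1) * dt) - Q[l][s](r * dt) for r in range(n)]
             for s in range(k)] for l in range(k)]
    # theta[l][m] lists the values at t = 0, dt, 2*dt, ...
    theta = [[[1.0 if l == m else 0.0] for m in range(k)] for l in range(k)]
    for j in range(1, n + 1):
        t = j * dt
        for l in range(k):
            stay = 1.0 - sum(Q[l][s](t) for s in range(k) if s != l)
            for m in range(k):
                val = stay if l == m else 0.0
                for s in range(k):
                    if s != l:
                        val += sum(mass[l][s][r] * theta[s][m][j - 1 - r]
                                   for r in range(j))
                theta[l][m].append(val)
    return theta

# Two-state element with exponential sojourn times; 0-based indices,
# so index 1 below is the working state and index 0 the failed state.
lam, mu = 2.0, 10.0
Q = [[lambda t: 0.0, lambda t: 1.0 - math.exp(-mu * t)],
     [lambda t: 1.0 - math.exp(-lam * t), lambda t: 0.0]]
theta = solve_semi_markov(Q, 2, 1.0, 400)
p_fail_markov = lam / (lam + mu) * (1.0 - math.exp(-(lam + mu)))
```

theta[1][0][-1] approximates the probability of being failed at t = 1 when starting from the working state; it agrees with the Markov closed form λ/(λ + μ)(1 − e^{−(λ+μ)t}) to within the O(Δt) discretization error.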
uj(z, t) = pj1(t) z^{gj1} + pj2(t) z^{gj2} + ... + pjkj(t) z^{gjkj}. (5.10)
2. The composition operators fser (for elements connected in series), fpar (for elements connected in parallel), and fbridge (for elements connected in a bridge structure) should be applied to the UGFs of the individual elements and their combinations. These operators were defined in the previous chapter, where the corresponding recursive procedures for their computation were introduced for different types of systems. Based on these procedures the resulting UGF for the entire MSS can be obtained:

U(z, t) = Σ_{i=1}^{K} pi(t) z^{gi}, (5.11)

where K is the number of the entire system's states and gi is the entire system performance in the corresponding state i, i = 1, ..., K.
The MSS reliability indices can then be obtained from U(z, t):

A(t) = δA(U(z, t), w) = δA(Σ_{i=1}^{K} pi(t) z^{gi}, w) = Σ_{i=1}^{K} pi(t) 1(gi − w ≥ 0); (5.12)

E(t) = δE(U(z, t)) = δE(Σ_{i=1}^{K} pi(t) z^{gi}) = Σ_{i=1}^{K} pi(t) gi; (5.13)

D(t) = δD(U(z, t), w) = δD(Σ_{i=1}^{K} pi(t) z^{gi}, w) = Σ_{i=1}^{K} pi(t) max(w − gi, 0); (5.14)

DT = ∫_0^T D(t, w) dt = Σ_{i=1}^{K} max(w − gi, 0) ∫_0^T pi(t) dt. (5.15)
Example 5.1 Consider the flow transmission system presented in Figure 5.3.

Fig. 5.3 Flow transmission system: elements 1 and 2 are connected in parallel and in series with element 3; the individual u-functions u1(z, t) = p11(t) z^0 + p12(t) z^1.5, u2(z, t) = p21(t) z^0 + p22(t) z^2, and u3(z, t) = p31(t) z^0 + p32(t) z^1.8 + p33(t) z^4 are shown under the corresponding elements
The system consists of three elements (pipes). The oil flow is transmitted from left to right. The performance of the pipes is measured by their transmission capacity (tons per minute). Times to failure and times to repair are distributed exponentially for all elements. Elements 1 and 2 are repairable and each one has two possible states. A state of total failure for both elements corresponds to a transmission capacity of 0 and the operational state corresponds to capacities of 1.5 and 2 tons/min, respectively, so that g1 = {g11, g12} = {0, 1.5} and g2 = {g21, g22} = {0, 2}.
The failure rates and repair rates corresponding to these two elements are

λ21^(1) = 7 year⁻¹, μ12^(1) = 100 year⁻¹ for element 1,
λ21^(2) = 10 year⁻¹, μ12^(2) = 80 year⁻¹ for element 2.
Element 3 is a multi-state element with minor failures and minor repairs. It can be in one of three states: a state of total failure corresponding to a capacity of 0, a state of partial failure corresponding to a capacity of 1.8 tons/min, and a fully operational state with a capacity of 4 tons/min. Therefore, g3 = {g31, g32, g33} = {0, 1.8, 4}. The corresponding failure and repair rates are

λ32^(3) = 10 year⁻¹, λ21^(3) = 7 year⁻¹,
μ12^(3) = 120 year⁻¹, μ23^(3) = 110 year⁻¹.
Gs ( t ) = f ( G1 ( t ) , G2 ( t ) , G3 ( t ) ) = min {G1 ( t ) + G2 ( t ) , G3 ( t )} .
1. For each element the corresponding system of differential equations should be solved.

For element 1:

dp11(t)/dt = −μ12^(1) p11(t) + λ21^(1) p12(t),
dp12(t)/dt = −λ21^(1) p12(t) + μ12^(1) p11(t).

For element 2:

dp21(t)/dt = −μ12^(2) p21(t) + λ21^(2) p22(t),
dp22(t)/dt = −λ21^(2) p22(t) + μ12^(2) p21(t).

For element 3:

dp31(t)/dt = −μ12^(3) p31(t) + λ21^(3) p32(t),
dp32(t)/dt = λ32^(3) p33(t) − (λ21^(3) + μ23^(3)) p32(t) + μ12^(3) p31(t),
dp33(t)/dt = −λ32^(3) p33(t) + μ23^(3) p32(t).
Solving these systems under the initial conditions corresponding to the best state of each element yields the following state probabilities.

For element 1:

p11(t) = λ21^(1)/(λ21^(1) + μ12^(1)) − [λ21^(1)/(λ21^(1) + μ12^(1))] e^{−(λ21^(1) + μ12^(1)) t},
p12(t) = μ12^(1)/(λ21^(1) + μ12^(1)) + [λ21^(1)/(λ21^(1) + μ12^(1))] e^{−(λ21^(1) + μ12^(1)) t}.

For element 2:

p21(t) = λ21^(2)/(λ21^(2) + μ12^(2)) − [λ21^(2)/(λ21^(2) + μ12^(2))] e^{−(λ21^(2) + μ12^(2)) t},
p22(t) = μ12^(2)/(λ21^(2) + μ12^(2)) + [λ21^(2)/(λ21^(2) + μ12^(2))] e^{−(λ21^(2) + μ12^(2)) t}.

For element 3:

p31(t) = A1 e^{x1 t} + A2 e^{x2 t} + A3,
p32(t) = B1 e^{x1 t} + B2 e^{x2 t} + B3,
p33(t) = C1 e^{x1 t} + C2 e^{x2 t} + C3,

where

x1 = −γ/2 + (γ²/4 − η)^{1/2}, x2 = −γ/2 − (γ²/4 − η)^{1/2},

A1 = λ21^(3) λ32^(3)/[x1(x1 − x2)], A2 = λ21^(3) λ32^(3)/[x2(x2 − x1)], A3 = λ21^(3) λ32^(3)/η,

B1 = (μ12^(3) + x1) λ32^(3)/[x1(x1 − x2)], B2 = (μ12^(3) + x2) λ32^(3)/[x2(x2 − x1)], B3 = μ12^(3) λ32^(3)/η,

C1 = μ23^(3) λ32^(3) (μ12^(3) + x1)/[x1(x1 − x2)(x1 + λ32^(3))], C2 = μ23^(3) λ32^(3) (μ12^(3) + x2)/[x2(x2 − x1)(x2 + λ32^(3))], C3 = μ12^(3) μ23^(3)/η,

γ = λ21^(3) + λ32^(3) + μ12^(3) + μ23^(3), η = λ21^(3) λ32^(3) + μ12^(3) μ23^(3) + μ12^(3) λ32^(3).
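The coefficients above can be verified numerically. The sketch below is our own check (not the book's code), using the rates of Example 5.1; it evaluates the closed-form p3i(t) and confirms the initial conditions p33(0) = 1, p31(0) = p32(0) = 0, and that the probabilities always sum to one:

```python
import math

l21, l32 = 7.0, 10.0      # failure rates of element 3 (year^-1)
m12, m23 = 120.0, 110.0   # repair rates of element 3 (year^-1)

gamma = l21 + l32 + m12 + m23
eta = l21 * l32 + m12 * m23 + m12 * l32
root = math.sqrt(gamma ** 2 / 4.0 - eta)
x1, x2 = -gamma / 2.0 + root, -gamma / 2.0 - root

A1 = l21 * l32 / (x1 * (x1 - x2))
A2 = l21 * l32 / (x2 * (x2 - x1))
A3 = l21 * l32 / eta
B1 = (m12 + x1) * l32 / (x1 * (x1 - x2))
B2 = (m12 + x2) * l32 / (x2 * (x2 - x1))
B3 = m12 * l32 / eta
C1 = m23 * l32 * (m12 + x1) / (x1 * (x1 - x2) * (x1 + l32))
C2 = m23 * l32 * (m12 + x2) / (x2 * (x2 - x1) * (x2 + l32))
C3 = m12 * m23 / eta

def element3_probs(t):
    """Closed-form state probabilities (p31, p32, p33) of element 3."""
    e1, e2 = math.exp(x1 * t), math.exp(x2 * t)
    return (A1 * e1 + A2 * e2 + A3,
            B1 * e1 + B2 * e2 + B3,
            C1 * e1 + C2 * e2 + C3)
```

At t = 0 the element starts in its best state; as t grows, the probabilities approach the steady-state values (A3, B3, C3) ≈ (0.0048, 0.0829, 0.9122). The initial decay rate of p33 equals −λ32^(3) = −10 year⁻¹, as required by the last equation of the system for element 3.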
2. Having the sets gj, pj(t) for j = 1, 2, 3, one can define for each individual element j the u-function associated with the element's output performance stochastic process:

u1(z, t) = p11(t) z^0 + p12(t) z^1.5,
u2(z, t) = p21(t) z^0 + p22(t) z^2,
u3(z, t) = p31(t) z^0 + p32(t) z^1.8 + p33(t) z^4.

These u-functions are also presented in Figure 5.3 under the corresponding elements.
3. Using the composition operators f_ser^(1) and f_par^(1) for a flow transmission MSS with flow dispersion (Section 4.2), one obtains the resulting UGF for the entire series-parallel MSS.
In order to find the resulting UGF U12(z, t) for elements 1 and 2 connected in parallel, the operator f_par^(1) is applied to the individual UGFs u1(z, t) and u2(z, t):

U12(z, t) = f_par^(1)(u1(z, t), u2(z, t))
= f_par^(1)(p11(t) z^0 + p12(t) z^1.5, p21(t) z^0 + p22(t) z^2)
= p11(t) p21(t) z^0 + p12(t) p21(t) z^1.5 + p11(t) p22(t) z^2 + p12(t) p22(t) z^3.5.
In the resulting UGF U12(z, t) the powers of z are found as the sums of the powers of the corresponding terms.
In order to find the UGF for the entire MSS, where element 3 is connected in series with elements 1 and 2, which are connected in parallel, the operator f_ser^(1) should be applied:

U(z, t) = f_ser^(1)(f_par^(1)(u1(z, t), u2(z, t)), u3(z, t))
= f_ser^(1)(p31(t) z^0 + p32(t) z^1.8 + p33(t) z^4, p11(t) p21(t) z^0 + p12(t) p21(t) z^1.5 + p11(t) p22(t) z^2 + p12(t) p22(t) z^3.5)
= p31(t) p11(t) p21(t) z^0 + p31(t) p12(t) p21(t) z^0 + p31(t) p11(t) p22(t) z^0 + p31(t) p12(t) p22(t) z^0
+ p32(t) p11(t) p21(t) z^0 + p32(t) p12(t) p21(t) z^1.5 + p32(t) p11(t) p22(t) z^1.8 + p32(t) p12(t) p22(t) z^1.8
+ p33(t) p11(t) p21(t) z^0 + p33(t) p12(t) p21(t) z^1.5 + p33(t) p11(t) p22(t) z^2 + p33(t) p12(t) p22(t) z^3.5.

In the resulting UGF U(z, t) the powers of z are found as the minima of the powers of the corresponding terms.
Taking into account that p31(t) + p32(t) + p33(t) = 1, p21(t) + p22(t) = 1, and p11(t) + p12(t) = 1, one can simplify the last expression for U(z, t) and obtain the resulting UGF associated with the output performance stochastic process g, p(t) of the entire MSS in the following form:

U(z, t) = Σ_{i=1}^{5} pi(t) z^{gi},

where

g = {g1, g2, g3, g4, g5} and p(t) = {p1(t), p2(t), p3(t), p4(t), p5(t)}

completely define the output performance stochastic process for the entire MSS.
Computation of the probabilities pi(t), i = 1, 2, ..., 5, gives exactly the same results that were obtained in Example 2.4 by using the straightforward Markov method. (These results were presented in Figure 2.13.)
Based on the resulting UGF U(z,t) of the entire MSS, one can obtain the MSS
reliability indices. The instantaneous MSS availability for the constant demand
level w = 2.0 tons/min is
$$A(t) = A\bigl(U(z,t), w\bigr) = A\Bigl(\sum_{i=1}^{5} p_i(t)\, z^{g_i},\, 2\Bigr) = \sum_{i=1}^{5} p_i(t)\,\mathbf{1}\bigl(F(g_i, 2) \ge 0\bigr) = p_4(t) + p_5(t).$$

The instantaneous mean performance is

$$E(t) = E\bigl(U(z,t)\bigr) = \sum_{i=1}^{5} p_i(t)\, g_i = 1.5p_2(t) + 1.8p_3(t) + 2p_4(t) + 3.5p_5(t).$$

The instantaneous performance deficiency D(t) at any time t for the constant demand w = 2.0 tons/min is

$$D(t) = D\bigl(U(z,t), w\bigr) = \sum_{i=1}^{5} p_i(t)\max(2 - g_i,\, 0) = p_1(t)(2 - 0) + p_2(t)(2 - 1.5) + p_3(t)(2 - 1.8) = 2p_1(t) + 0.5p_2(t) + 0.2p_3(t).$$
The calculated reliability indices A(t), E(t), and D(t) are exactly the same as
those obtained in Example 2.4 by using a straightforward Markov method and
graphically presented in Figure 2.14.
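These index formulas translate directly into code. Below is a minimal sketch for reading A(t), E(t), and D(t) off the terms of a resulting UGF at one instant t; the probability values are invented for illustration:

```python
def availability(u, w):
    # sum the probabilities of states whose performance meets the demand
    return sum(p for g, p in u.items() if g >= w)

def expected_performance(u):
    # mean performance: probability-weighted sum of the powers of z
    return sum(p * g for g, p in u.items())

def deficiency(u, w):
    # expected shortfall below the demand level w
    return sum(p * max(w - g, 0.0) for g, p in u.items())

# UGF terms {g_i: p_i(t)} at some instant t (illustrative probabilities)
u = {0.0: 0.01, 1.5: 0.04, 1.8: 0.05, 2.0: 0.30, 3.5: 0.60}

print(availability(u, 2.0))     # p4 + p5
print(expected_performance(u))  # 1.5*p2 + 1.8*p3 + 2*p4 + 3.5*p5
print(deficiency(u, 2.0))       # 2*p1 + 0.5*p2 + 0.2*p3
```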
Note that instead of solving the system of K = 2 × 2 × 3 = 12 differential equations (as should be done in a straightforward Markov method), here we solve just three systems: one third-order system and two second-order systems. The further derivation of the entire system's state probabilities and reliability indices is based on simple algebraic operations.
5.2.1 Introduction
2008). However, for MSSs there is an important type of redundancy that does not exist in binary-state systems and has not been investigated until now within the framework of MSS reliability analysis.
For MSSs it is typical that after satisfying its own demand one MSS can pro-
vide its abundant resource (performance) to another MSS directly or through an
interconnection system (which can also be multi-state). In this case, the first MSS
can be called the reserve MSS and the second one the main MSS. In the general
case demand for the reserve and the main MSS can also be described by two different independent stochastic processes. Typical examples of such kinds of MSS include power generating systems, where one power station can assist another in satisfying demand, oil and gas production and transportation systems, computing systems with distributed computation resources, etc. Such a
multi-state structure with redundancy may be treated as MSSs with mutual aid or a
structure with interconnected MSSs. This type of redundancy is quite common for
MSSs. However, using existing methods it is very difficult to build a reliability
model for a complex repairable MSS taking redundancy into consideration and to
solve it for obtaining the corresponding reliability indices.
In practice each multi-state component in a MSS can have a different number of performance levels. This number may be relatively large, up to ten or more (Billinton and Allan 1996; Goldner 2006). Even for relatively small MSSs consisting of three to five repairable components, the number of states system-wide will be significantly greater (ten thousand or more). In general, for a MSS consisting of n repairable components, where each component j has $k_j$ different capacity levels, there are

$$K = \prod_{j=1}^{n} k_j$$

system states. This number may be very large and increases quickly with the number of components. Below we consider an application of the combined UGF and random process method for reliability assessment of interconnected repairable MSSs with mutual aid. Such an application was suggested in Lisnianski and Ding (2009). In Ding et al. (2009) one can find the method applied to dynamic reliability assessment in restructured power systems.
According to the generic MSS model, any system component j in a MSS can have $k_j$ different states corresponding to its performance levels, represented by the set $\mathbf{g}_j = \{g_{j1}, \ldots, g_{jk_j}\}$. The current state of component j and the corresponding value of the component performance level $G_j(t)$ at any instant t are random variables: $G_j(t)$ takes values from $\mathbf{g}_j$, i.e., $G_j(t) \in \mathbf{g}_j$. Therefore, for the time interval [0, T], where
T is the MSS operation period, the performance level of component j is defined as
a discrete-state, continuous-time stochastic process. In this chapter only Markov
processes will be considered, where the process behavior at a future instant only
depends on the current state. The general Markov model of a multi-state compo-
nent was introduced in Chapter 2, which considered minor and major fail-
ures/repairs of components.
Minor failures are failures causing component transition from state i to the adjacent state i − 1. In other words, a minor failure causes minimal degradation of component performance. A major failure is one that causes a component to transit from state i to state j, where j < i − 1. A minor repair returns a component from state j to state j + 1, while a major repair returns a component from state j to state i, where i > j + 1. In this case, for each component j its performance level $G_j(t)$ is a discrete-state, continuous-time Markov stochastic process.
A general redundancy scheme for a MSS is presented in Figure 5.4. The main
multi-state system MSSm should satisfy its demand, which is presented as a dis-
crete-state, continuous time Markov stochastic process Wm(t). MSSm consists of m
multi-state components. The performance level of each component i in MSSm at
any instant t > 0 is defined by its output Markov stochastic process $G_{mi}(t)$, $i = 1, \ldots, m$. All m components in the main MSS are included in the technical structure according to the given structure function $f_m$, which defines the main system output stochastic performance $G_m(t)$ over the stochastic processes of the system components:

$$G_m(t) = f_m\bigl(G_{m1}(t), \ldots, G_{mm}(t)\bigr).$$
The reserve multi-state system MSSr should also satisfy its own demand, which
can be represented as a stochastic process Wr(t). If the output performance
$G_r(t) > W_r(t)$, the abundant (surplus) performance $G_r(t) - W_r(t)$ can be delivered to the main multi-state system MSSm through the connecting system. In this case the stochastic process $G_c^{inp}(t)$ that represents an input of the connecting MSSc can be defined by the following structure function $f_c^{inp}$:

$$G_c^{inp}(t) = f_c^{inp}\bigl(G_r(t), W_r(t)\bigr) = \max\bigl(G_r(t) - W_r(t),\, 0\bigr).$$
Structure function f cinp defines the reserve system obligations concerning as-
sistance to the main system.
Fig. 5.4 General redundancy scheme for a MSS: the reserve MSS (demand $W_r(t)$), the connecting MSS carrying the reserve system obligations, and the main MSS (demand $W_m(t)$, output $G_{MSS}(t)$)
If the process Gcinp(t) is defined by the above expression, it means that the re-
serve MSSr will only send its abundant performance that remains after satisfying
its own demand to the input of the connecting MSSc. Generally speaking, stochas-
tic process Gcinp(t) and function f cinp can be defined in different ways. It will de-
pend on the reserve system obligation agreement. For example, if, according to the
The expression indicates that the reserve system, according to its obligation agreement, should send a specified performance $g_s$ to the connecting system even in the case where its own demand is not satisfied. When its demand is satisfied, the reserve system should send its abundant performance to the connecting system.
The connecting system can also be a MSS, which is designated as MSSc. It consists of c multi-state components, which are included in the technical structure with the given structure function $f_c$:

$$G_c(t) = f_c\bigl(G_{c1}(t), \ldots, G_{cc}(t)\bigr).$$
In the general case such redundancy can be reversible. In other words, the main
MSSm can also be used as a redundant system in order to support the MSSr.
The problem is to evaluate the reliability indices for the main MSSm that char-
acterize the degree of satisfying demand Wm(t), such as availability, expected in-
stantaneous performance deficiency, expected accumulated performance defi-
ciency, etc.
In this subsection, when dealing with a single multi-state element, we will omit index j in the designation of the set of the element's performance rates. This set is denoted as $\mathbf{g} = \{g_1, \ldots, g_k\}$. It is also assumed that this set is ordered so that $g_{i+1} \ge g_i$ for any i. Here we consider the general model of a repairable Markov multi-state element as it was described in Chapter 2.
The state-space diagram for the general model of the repairable multi-state element with minor and major failures and repairs is presented in Figure 5.5. Failures cause the element to transit from state j to state i (j > i) with corresponding transition intensity $\lambda_{ji}$. Repairs cause the element to transit from state e to state l (e < l) with corresponding transition intensity $\mu_{el}$.
$$\frac{dp_k(t)}{dt} = \sum_{e=1}^{k-1} \mu_{e,k}\, p_e(t) - p_k(t) \sum_{e=1}^{k-1} \lambda_{k,e},$$

$$\frac{dp_i(t)}{dt} = \sum_{e=i+1}^{k} \lambda_{e,i}\, p_e(t) + \sum_{e=1}^{i-1} \mu_{e,i}\, p_e(t) - p_i(t)\Bigl(\sum_{e=1}^{i-1} \lambda_{i,e} + \sum_{e=i+1}^{k} \mu_{i,e}\Bigr), \quad \text{for } 1 < i < k, \tag{5.21}$$

$$\frac{dp_1(t)}{dt} = \sum_{e=2}^{k} \lambda_{e,1}\, p_e(t) - p_1(t) \sum_{e=2}^{k} \mu_{1,e}.$$
Solving this system of differential equations, one can obtain the state probabilities $p_i(t)$, $i = 1, \ldots, k$, that define the probability that at instant t > 0 the element will be in state i.
Based on these probabilities and the given performance levels in every state i, one obtains a UGF corresponding to the element's output stochastic performance:

$$u(z,t) = p_1(t)\, z^{g_1} + p_2(t)\, z^{g_2} + \cdots + p_k(t)\, z^{g_k}. \tag{5.22}$$
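System (5.21) is linear and can be integrated numerically. Below is a minimal sketch for a three-state element using plain forward-Euler integration; the transition intensities, performance levels, and step size are illustrative assumptions, not values from the book:

```python
def transition_matrix(lam, mu, k):
    """Build the intensity matrix A for a k-state element.
    lam[(j, i)] are failure intensities (j > i), mu[(e, l)] repair
    intensities (e < l); diagonal entries balance each row to zero."""
    A = [[0.0] * k for _ in range(k)]
    for (src, dst), rate in list(lam.items()) + list(mu.items()):
        A[src][dst] += rate
        A[src][src] -= rate
    return A

def solve_state_probs(A, p0, t_end, dt=1e-4):
    """Forward-Euler integration of dp/dt = p A from initial probabilities p0."""
    p = list(p0)
    n = len(p)
    for _ in range(int(t_end / dt)):
        dp = [sum(p[s] * A[s][j] for s in range(n)) for j in range(n)]
        p = [p[j] + dt * dp[j] for j in range(n)]
    return p

# Three-state element (state 2 is the best), illustrative rates in year^-1
lam = {(2, 1): 7.0, (2, 0): 1.0, (1, 0): 7.0}  # minor/major failures
mu = {(0, 1): 100.0, (1, 2): 100.0}            # minor repairs
A = transition_matrix(lam, mu, 3)
p = solve_state_probs(A, [0.0, 0.0, 1.0], t_end=1.0)

# UGF terms u(z,t) = sum p_i(t) z^{g_i} with illustrative performance levels
u_terms = dict(zip([0.0, 1.8, 4.0], p))
print(sum(p))  # probabilities stay normalized (close to 1)
```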
As stated in the previous subsection, the main multi-state system MSSm consists of m multi-state elements. The performance of each element i in MSSm at any instant t > 0 is defined by its output Markov stochastic process $G_{mi}(t)$, $i = 1, \ldots, m$. For any element i in MSSm we assume that its output performance stochastic process has $k_i^{(m)}$ different states with corresponding performance levels $g_{ij}^{(m)}$ and state probabilities $p_{ij}^{(m)}(t)$, $i = 1, \ldots, m$; $j = 1, \ldots, k_i^{(m)}$.
After solving the corresponding system of differential equations (5.21) for element i, the following equation, which defines the individual UGF $u_{mi}(z,t)$ for the output stochastic performance of element i in MSSm, can be written:

$$u_{mi}(z,t) = \sum_{j=1}^{k_i^{(m)}} p_{ij}^{(m)}(t)\, z^{g_{ij}^{(m)}}, \quad i = 1, \ldots, m. \tag{5.23}$$
All m elements in the main MSS are included in the technical structure according to the given structure function $f_m$, which defines the main system output stochastic performance $G_m(t)$:

$$G_m(t) = f_m\bigl(G_{m1}(t), \ldots, G_{mm}(t)\bigr), \tag{5.24}$$

where $G_m(t)$ is the main system (MSSm) output performance stochastic process (a discrete-state, continuous-time Markov stochastic process with a finite number of different performance levels);
$$U_m(z,t) = \sum_{i=1}^{K_m} p_i^{(m)}(t)\, z^{g_i^{(m)}}. \tag{5.25}$$

The resulting UGF $U_m(z,t)$ can be obtained using the composition operator $\Omega_{f_m}$ over the individual UGFs of the elements:

$$U_m(z,t) = \sum_{i=1}^{K_m} p_i^{(m)}(t)\, z^{g_i^{(m)}} = \Omega_{f_m}\{u_{m1}(z,t), \ldots, u_{mm}(z,t)\}. \tag{5.26}$$

Taking into account expression (5.23) and using the general definition of the composition operator (4.23) from Chapter 4, one can obtain the following expression:

$$\Omega_{f_m}\{u_{m1}(z,t), \ldots, u_{mm}(z,t)\} = \sum_{j_1=1}^{k_1^{(m)}} \sum_{j_2=1}^{k_2^{(m)}} \cdots \sum_{j_m=1}^{k_m^{(m)}} \prod_{i=1}^{m} p_{i,j_i}^{(m)}(t)\, z^{f_m\bigl(g_{1,j_1}^{(m)}, \ldots, g_{m,j_m}^{(m)}\bigr)}. \tag{5.27}$$
The demand $W_m(t)$ is a discrete-state, continuous-time Markov process taking M possible levels $w_{mj}$, $j = 1, \ldots, M$; its UGF is

$$U_{W_m}(z,t) = \sum_{j=1}^{M} p_j^{(w)}(t)\, z^{w_{mj}}. \tag{5.28}$$
The UGF corresponding to the stochastic process $G_m(t) - W_m(t)$ can be written as

$$\hat{U}_m(z,t) = \sum_{i=1}^{M_m} \hat{p}_i^{(m)}(t)\, z^{\hat{g}_i^{(m)}}, \tag{5.29}$$

where $M_m$ is the number of possible performance levels for the stochastic process $G_m(t) - W_m(t)$ and $\hat{p}_i^{(m)}(t)$ is the probability that the stochastic process $G_m(t) - W_m(t)$ will be at level $\hat{g}_i^{(m)}$, $i = 1, \ldots, M_m$, at time instant t > 0.
$$\hat{U}_m(z,t) = \Omega_{f_{mw}}\bigl\{U_m(z,t),\, U_{W_m}(z,t)\bigr\} = \Omega_{f_{mw}}\Bigl\{\sum_{i=1}^{K_m} p_i^{(m)}(t)\, z^{g_i^{(m)}},\; \sum_{j=1}^{M} p_j^{(w)}(t)\, z^{w_{mj}}\Bigr\} = \sum_{i=1}^{K_m} \sum_{j=1}^{M} p_i^{(m)}(t)\, p_j^{(w)}(t)\, z^{g_i^{(m)} - w_{mj}}. \tag{5.30}$$
In the same manner, the individual UGF for the output stochastic performance of element i in the reserve MSSr is

$$u_{ri}(z,t) = \sum_{j=1}^{k_i^{(r)}} p_{ij}^{(r)}(t)\, z^{g_{ij}^{(r)}}, \quad i = 1, \ldots, r. \tag{5.31}$$
All r elements in the reserve MSS are included in the technical structure according to the given structure function $f_r$, which defines the reserve system output stochastic performance $G_r(t)$:

$$G_r(t) = f_r\bigl(G_{r1}(t), \ldots, G_{rr}(t)\bigr). \tag{5.32}$$

$$U_r(z,t) = \sum_{i=1}^{K_r} p_i^{(r)}(t)\, z^{g_i^{(r)}}. \tag{5.33}$$
The resulting UGF $U_r(z,t)$ for the reserve system output stochastic performance $G_r(t)$ can be obtained using the composition operator $\Omega_{f_r}$ over the individual UGFs representing the output performance of each component in the reserve MSS:

$$U_r(z,t) = \sum_{i=1}^{K_r} p_i^{(r)}(t)\, z^{g_i^{(r)}} = \Omega_{f_r}\{u_{r1}(z,t), \ldots, u_{rr}(z,t)\}. \tag{5.34}$$
Taking into account expression (5.31) and using the general definition of the composition operator, we obtain the following expression:

$$U_r(z,t) = \Omega_{f_r}\{u_{r1}(z,t), \ldots, u_{rr}(z,t)\} = \Omega_{f_r}\Bigl\{\sum_{j=1}^{k_1^{(r)}} p_{1j}^{(r)}(t)\, z^{g_{1j}^{(r)}}, \ldots, \sum_{j=1}^{k_r^{(r)}} p_{rj}^{(r)}(t)\, z^{g_{rj}^{(r)}}\Bigr\} = \sum_{j_1=1}^{k_1^{(r)}} \sum_{j_2=1}^{k_2^{(r)}} \cdots \sum_{j_r=1}^{k_r^{(r)}} \prod_{i=1}^{r} p_{i,j_i}^{(r)}(t)\, z^{f_r\bigl(g_{1,j_1}^{(r)}, \ldots, g_{r,j_r}^{(r)}\bigr)}. \tag{5.35}$$
The demand $W_r(t)$ of the reserve system, with N possible levels $w_{rj}$, is represented by the UGF

$$U_{W_r}(z,t) = \sum_{j=1}^{N} p_j^{(wr)}(t)\, z^{w_{rj}}. \tag{5.36}$$
The UGF corresponding to the stochastic process $G_r(t) - W_r(t)$, which has $N_r$ possible levels, is

$$\hat{U}_r(z,t) = \sum_{i=1}^{N_r} \hat{p}_i^{(r)}(t)\, z^{\hat{g}_i^{(r)}}. \tag{5.37}$$
$$\hat{U}_r(z,t) = \Omega_{f_{rw}}\bigl\{U_r(z,t),\, U_{W_r}(z,t)\bigr\} = \Omega_{f_{rw}}\Bigl\{\sum_{i=1}^{K_r} p_i^{(r)}(t)\, z^{g_i^{(r)}},\; \sum_{j=1}^{N} p_j^{(wr)}(t)\, z^{w_{rj}}\Bigr\} = \sum_{i=1}^{K_r} \sum_{j=1}^{N} p_i^{(r)}(t)\, p_j^{(wr)}(t)\, z^{g_i^{(r)} - w_{rj}}. \tag{5.38}$$
The reserve MSSr provides abundant resources (performance) to the main MSSm
only after satisfying its own demand.
Therefore, the stochastic process $G_c^{inp}(t)$ that represents an input for the connecting MSSc can be defined by the following structure function $f_c^{inp}$, which defines the reserve system obligation:

$$G_c^{inp}(t) = f_c^{inp}\bigl(G_r(t), W_r(t)\bigr) = \max\bigl(G_r(t) - W_r(t),\, 0\bigr). \tag{5.39}$$

If the process $G_c^{inp}(t)$ is defined by expression (5.18), it indicates that the reserve MSSr will only send to the input of the connecting MSSc the abundant performance that remains after satisfying its own demand. As stated in Section 5.2.1, the stochastic process $G_c^{inp}(t)$ and function $f_c^{inp}$ are defined by the reserve system obligation agreement.
Based on (5.16)–(5.18), the UGF $U_c^{inp}(z,t)$ corresponding to the Markov stochastic process $G_c^{inp}(t)$ can be obtained as
$$U_c^{inp}(z,t) = \Omega_{f_c^{inp}}\bigl\{\hat{U}_r(z,t),\, z^0\bigr\} = \Omega_{f_c^{inp}}\Bigl\{\sum_{i=1}^{N_r} \hat{p}_i^{(r)}(t)\, z^{\hat{g}_i^{(r)}},\; z^0\Bigr\} = \sum_{i=1}^{N_r} \hat{p}_i^{(r)}(t)\, z^{\max\{\hat{g}_i^{(r)},\, 0\}}. \tag{5.40}$$
In the general case the connecting system MSSc can also be a MSS. Its performance $G_c(t)$ is treated as the capability to transmit a certain performance $g_i^{(c)}$, $i = 1, \ldots, c$, from the reserve system MSSr to the main system MSSm. The corresponding UGF is

$$U_c(z,t) = \sum_{i=1}^{c} p_i^{(c)}(t)\, z^{g_i^{(c)}}. \tag{5.42}$$
The output stochastic process $G_c^{out}(t)$ of the connecting system MSSc can be obtained according to the following structure function:

$$G_c^{out}(t) = f_c^{out}\bigl(G_c(t), G_c^{inp}(t)\bigr) = \min\bigl(G_c(t),\, G_c^{inp}(t)\bigr). \tag{5.43}$$

The corresponding UGF is

$$U_c^{out}(z,t) = \sum_{k=1}^{C_{out}} p_k^{(cout)}(t)\, z^{g_k^{(cout)}} = \Omega_{f_c^{out}}\Bigl\{\sum_{i=1}^{c} p_i^{(c)}(t)\, z^{g_i^{(c)}},\; \sum_{j=1}^{N_r} \hat{p}_j^{(r)}(t)\, z^{\max\{\hat{g}_j^{(r)},\, 0\}}\Bigr\} = \sum_{i=1}^{c} \sum_{j=1}^{N_r} p_i^{(c)}(t)\, \hat{p}_j^{(r)}(t)\, z^{\min\{g_i^{(c)},\, \max[\hat{g}_j^{(r)},\, 0]\}}, \tag{5.44}$$
where $C_{out}$ is the number of output performance levels for the discrete-state, continuous-time stochastic process $G_c^{out}(t)$ and $p_k^{(cout)}(t)$ is the probability that the stochastic performance process $G_c^{out}(t)$ will be at level $g_k^{(cout)}$, $k = 1, \ldots, C_{out}$, at time instant t > 0.
The output performance stochastic process $G_{MSS}(t)$ of the entire MSS considering redundancy is defined by the following structure function $f_{MSS}$:

$$G_{MSS}(t) = f_{MSS}\bigl(G_m(t) - W_m(t),\, G_c^{out}(t)\bigr) = \bigl(G_m(t) - W_m(t)\bigr) + G_c^{out}(t). \tag{5.45}$$

The corresponding UGF is

$$U_{MSS}(z,t) = \sum_{j=1}^{M_{MSS}} p_j^{(MSS)}(t)\, z^{g_j^{(MSS)}} = \Omega_{f_{MSS}}\bigl\{\hat{U}_m(z,t),\, U_c^{out}(z,t)\bigr\} = \Omega_{f_{MSS}}\Bigl\{\sum_{i=1}^{M_m} \hat{p}_i^{(m)}(t)\, z^{\hat{g}_i^{(m)}},\; \sum_{k=1}^{C_{out}} p_k^{(cout)}(t)\, z^{g_k^{(cout)}}\Bigr\} = \sum_{i=1}^{M_m} \sum_{k=1}^{C_{out}} \hat{p}_i^{(m)}(t)\, p_k^{(cout)}(t)\, z^{\hat{g}_i^{(m)} + g_k^{(cout)}}, \tag{5.46}$$
where $M_{MSS}$ is the number of output performance levels for the discrete-state, continuous-time stochastic process $G_{MSS}(t)$ and $p_j^{(MSS)}(t)$ is the probability that the stochastic performance process $G_{MSS}(t)$ will be at level $g_j^{(MSS)}$, $j = 1, \ldots, M_{MSS}$, at time instant t > 0.
The procedure of UGF computation for the entire MSS considering redundancy is
graphically presented in Figure 5.6.
Fig. 5.6 Recursive procedure for resulting UGF computation for entire MSS with redundancy
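The recursive procedure of Figure 5.6 reduces to repeated pairwise compositions with the structure functions in (5.30), (5.38), (5.40), (5.44), and (5.46). A compact sketch at a fixed instant t; all probabilities and performance levels below are invented purely for illustration, and the operator implementation is our own:

```python
from collections import defaultdict

def compose(u1, u2, f):
    """Pairwise UGF composition: combine terms with structure function f."""
    out = defaultdict(float)
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            out[f(g1, g2)] += p1 * p2
    return dict(out)

# Illustrative UGFs at a fixed instant t (performance in MW)
U_m  = {0: 0.05, 200: 0.15, 450: 0.80}  # main system output
U_wm = {40: 0.4, 450: 0.6}              # main system demand
U_r  = {0: 0.1, 300: 0.9}               # reserve system output
U_wr = {20: 0.5, 250: 0.5}              # reserve system demand
U_c  = {0: 0.02, 300: 0.98}             # connecting (tie-line) capacity

Um_hat = compose(U_m, U_wm, lambda g, w: g - w)             # (5.30): G_m - W_m
Ur_hat = compose(U_r, U_wr, lambda g, w: g - w)             # (5.38): G_r - W_r
U_cinp = compose(Ur_hat, {0: 1.0}, lambda g, _: max(g, 0))  # (5.40): surplus only
U_cout = compose(U_c, U_cinp, min)                          # (5.44): tie-line limit
U_mss = compose(Um_hat, U_cout, lambda a, b: a + b)         # (5.46): aid added

availability = sum(p for g, p in U_mss.items() if g >= 0)   # (5.47)
deficiency = sum(p * -min(g, 0) for g, p in U_mss.items())  # (5.48)
print(round(availability, 5), round(deficiency, 2))
```

With these invented numbers the availability with reserve aid exceeds the availability computed from $\hat{U}_m$ alone, mirroring the effect reported in the case studies below.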
Based on the resulting UGF $U_{MSS}(z,t)$, the MSS reliability indices can be obtained. The MSS instantaneous availability is

$$A(t) = \sum_{i=1}^{M_{MSS}} p_i^{(MSS)}(t)\, \mathbf{1}\bigl(g_i^{(MSS)} \ge 0\bigr), \tag{5.47}$$

and the MSS instantaneous performance deficiency is

$$D(t) = -\sum_{i=1}^{M_{MSS}} p_i^{(MSS)}(t)\, \min\bigl(g_i^{(MSS)},\, 0\bigr). \tag{5.48}$$

The accumulated performance deficiency over the operation period [0, T] is

$$D = \int_0^T D(t)\, dt. \tag{5.49}$$
The coal, gas, and oil units have 10, 10, and 11 states, respectively. It is assumed that a tie line with a transmission capacity of 300 MW connects systems 2 and 1. The tie line is represented as a binary-state component, which has only two states: full transmission capacity and complete failure. The failure rate and the repair rate of the tie line are 0.477 year⁻¹ and 364 year⁻¹, respectively (Goldner 2006). The demands have two levels: the low-demand level and the peak-demand level. Demand $W_1(t)$ of system 1 is represented as a two-state, continuous-time Markov stochastic process that at any instant t > 0 takes discrete values from the set $\mathbf{w}_1 = \{w_{11}, w_{12}\}$, where $w_{11} = 40$ MW and $w_{12} = 800$ MW. The corresponding transition rates from the low-demand level to the peak-demand level and from the peak-demand level to the low-demand level are 621.96 year⁻¹ and 876 year⁻¹, respectively. Demand $W_2(t)$ of system 2 is also represented as a two-state, continuous-time Markov stochastic process that at any instant t > 0 takes discrete values from the set $\mathbf{w}_2 = \{w_{21}, w_{22}\}$, where $w_{21} = 20$ MW and $w_{22} = 450$ MW. The corresponding transition rates are the same as those of system 1.
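For a two-level Markov demand process like this, the long-run probability of each level follows directly from the two transition rates. A quick consistency check with the rates quoted above:

```python
# Two-state continuous-time Markov demand process:
# low -> peak at rate a, peak -> low at rate b (per year)
a = 621.96  # low-demand -> peak-demand transition rate, year^-1
b = 876.0   # peak-demand -> low-demand transition rate, year^-1

p_peak = a / (a + b)  # long-run probability of the peak level
p_low = b / (a + b)   # long-run probability of the low level

# The mean duration of one peak period is 1/b years = 8760/b hours
mean_peak_hours = 8760 / b  # → 10 h per peak period
print(round(p_peak, 4), round(mean_peak_hours, 1))
```

The resulting mean peak duration of 10 h plus a mean low-demand duration of roughly 14 h together approximate a daily 24-h demand cycle.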
Case 1

In the first case, generating system 2 is the main MSSm and generating system 1 is the corresponding reserve MSSr. The connecting system is represented by the tie line (Figure 5.8). First, the reserve assistance from MSSr to MSSm is not considered and MSSm satisfies its demand using only its own resources. Second, we consider that MSSr provides the reserve to MSSm if MSSr can satisfy its own demand. The instant availability, the instant expected performance deficiency, and the expected accumulated performance deficiency of system 2 without and with the reserve assistance are shown in Figures 5.8, 5.9, and 5.10, respectively. It can be observed from Figures 5.8 and 5.9 that the instant availability and the instant expected performance deficiency of system 2 reach steady values after about 400 h. From Figures 5.8–5.10 one can see that reserve assistance from MSSr to MSSm can greatly improve the MSSm reliability indices. For example, because of redundancy the system steady-state availability increases from 0.899 up to 0.972.
Fig. 5.8 Instant availability of system 2 with and without reserve assistance
Fig. 5.9 Instant expected performance deficiency of system 2 with and without reserve assistance
Fig. 5.10 Instant accumulated performance deficiency of system 2 with and without reserve assistance
Case 2

In the second case, generating system 1 is the main MSSm and generating system 2 is the corresponding reserve MSSr. System 2 provides reserve assistance to system 1 if system 2 can satisfy its own demand. The instant availability and the instant expected performance deficiency of system 1 evaluated by the proposed model are shown in Figures 5.11 and 5.12, respectively. It can be seen from these two figures that the instant availability and the instant expected performance deficiency of system 1 reach steady values after about 400 h. Figure 5.13 shows the expected accumulated performance deficiency for system 1.
Fig. 5.11 Instant availability of system 1 with reserve assistance
Fig. 5.12 Instant expected performance deficiency of system 1 with reserve assistance
As one can see, the method presented in this chapter is highly suitable for engineering applications, since the procedure is well formalized and based on the natural decomposition of the interconnected systems. By using this method the short-term and long-term performance of complex MSSs with redundancy can be accurately predicted.
Fig. 5.13 Instant accumulated performance deficiency of system 1 with reserve assistance
References
Billinton R, Allan R (1996) Reliability evaluation of power systems. Plenum, New York
Ding Y, Lisnianski A, Wang P et al (2009) Dynamic reliability assessment for bilateral contract electricity providers in restructured power systems. Electr Power Syst Res 79:1424–1430
Goldner Sh (2006) Markov model for a typical 360 MW coal fired generation unit. Commun Depend Qual Manag 9(1):24–29
Huang J, Zuo M, Fang Z (2003) Multi-state consecutive k-out-of-n systems. IIE Trans 35:527–534
Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Lisnianski A (2004a) Universal generating function technique and random process methods for multi-state system reliability analysis. In: Proceedings of the 2nd International Workshop in Applied Probability (IWAP2004), Piraeus, Greece, pp 237–242
Lisnianski A (2004b) Combined universal generating function and semi-Markov process technique for multi-state system reliability evaluation. In: Communication of the 4th International Conference on Mathematical Methods in Reliability, Methodology and Practice (MMR2004), 21–25 June 2004, Santa Fe, New Mexico
Lisnianski A (2007) Extended block diagram method for a multi-state system reliability assessment. Reliab Eng Syst Saf 92(12):1601–1607
Lisnianski A, Ding Y (2009) Redundancy analysis for repairable multi-state system by using combined stochastic process methods and universal generating function technique. Reliab Eng Syst Saf 94:1788–1795
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Modarres M, Kaminskiy M, Krivtsov V (1999) Reliability engineering and risk analysis: a practical guide. Dekker, New York
Tian Z, Zuo M, Huang H (2008) Reliability-redundancy allocation for multi-state series-parallel systems. IEEE Trans Reliab 57(2):303–310
Yeh W (2006) The k-out-of-n acyclic multistate-node network reliability evaluation using the universal generating function method. Reliab Eng Syst Saf 91:800–808
6 Reliability-associated Cost Assessment and Management Decisions for Multi-state Systems
The life cycle cost (LCC) of a system (product) is the total cost of acquiring and utilizing the system over its entire life span. LCC includes all costs incurred from the point at which the decision is made to acquire a system, through its operational life, to the eventual disposal of the system. In other words, LCC is the total cost of procurement and ownership. As has been shown in many studies, the ownership cost (logistics and operating cost) for repairable systems can vary from 10 to 100 times the procurement cost (Ryan 1978). The history of life cycle costing began in the mid-1960s, when a document entitled Life Cycle Costing in Equipment Procurement was published (Logistics Management Institute 1965). In 1974, Florida became the first US state to formally adopt the concept of life cycle costing, and in 1978, the US Congress passed the National Energy Conservation Policy Act (Dhillon 2000). According to this act, every new federal government building should be LCC effective. Since then, numerous works have been published in this field. A variety of approaches have been suggested for estimating the cost elements and providing inputs for establishing an LCC model for binary-state systems. The total LCC model is thus composed of subsets of cost models that are then exercised during trade-off studies. These cost models range from simple informal engineering/cost relationships to complex mathematical statements derived from empirical data. Some of these cost models were extended from binary-state models to multi-state models and will be considered in this chapter.
As is known, the total LCC is expressed in simple mathematical terms as the sum of the acquisition cost and the system utilization cost:

$$LCC = AC + SUC,$$

where LCC is the life cycle cost, AC is the acquisition cost, and SUC is the system utilization cost.
Figure 6.1 identifies the more significant cost types and shows how LCC may be distributed in terms of major cost categories over a system's life cycle (MIL-HDBK-338B).
In general, design and development costs include materials, labor, administra-
tive, overhead, handling, and transportation.
Production costs include all types of costs associated with system production.
Operation and support costs include spare parts and replacements, equipment maintenance, inventory management, support equipment, personnel training, technical data/documentation, and logistics management. In addition, there are financial losses when a system interrupts its work because of failures.
Disposal costs include all costs associated with deactivating and preparing the system for disposal through scrap or salvage programs. Disposal costs may be adjusted by the amount of value received when the disposal process is through salvage.
LCC analysis provides a meaningful basis for evaluating alternatives regarding
system acquisition and operation and support costs. Based on this analysis, devel-
opment and production goals can be established as well as an optimum required
reliability level. Figure 6.2 illustrates the relationships between reliability and cost
(MIL-HDBK-338B). The top curve is the total LCC; it is the sum of the acquisition (investment) and operation and support costs. The figure shows that in general a more reliable system has lower support costs. At the same time, acquisition costs (both development and production) increase to attain the improved reliability. In this figure one can see the point where the amount of money (investment) spent on increasing reliability and the amount saved in support costs are exactly the same. This point represents the reliability for which the total cost is minimal.
The implementation of an effective program based on proven LCC principles,
complete with mathematical models and supporting input cost data, will provide
early cost visibility and control, i.e., indicate the logistics and support cost conse-
quences of early research, development, and other subsequent acquisition deci-
sions.
There are many known advantages of the LCC approach such as making effec-
tive equipment replacement decisions, comparing the cost of competing projects
and making a selection among the competing contractors, etc. On the other hand,
providing correct LCC analysis for a real system is not a simple job. It requires
very high professional skills, first of all because of the absence of general models
recommended for LCC analysis in standards. Theoretically there are many meth-
ods (a variety of approaches as formulated in MILHDBK338B), but in prac-
tice, there is nothing for immediate use. Because of this reason LCC analysis till
now has been a bad-formalized problem and its solution is also expensive and
time consuming. Almost for any practical case it is required to provide special re-
search work.
To perform LCC analysis, the steps shown in Figure 6.3 should be executed (Dhillon 2000). Usually step 2, where all involved costs should be estimated, requires the greatest amount of time and resources. The major component of a repairable system's life cycle is its operation and support phase. For the majority of repairable MSSs the reliability-associated cost (RAC) is strongly tied to the operation and support cost, and RAC is usually the main component of LCC.
To estimate RAC correctly, a variety of special models should be developed and analyzed. There are models for inventory (spare parts) management; complex reliability models that take into account all types of redundancy, different operation modes, different failure modes, etc.; models for estimating losses caused by system failures (for example, financial losses due to the interruption of the power supply to consumers); and so on. Developing such models even for a binary-state system requires high-level professional skills.
A crucial factor for successful LCC analysis is the attitude and thinking philosophy of top-level management toward reliability (Dhillon 2000). Without the support of top management for reliability and maintainability programs, LCC analysis will not be effective. If a positive and effective attitude is generated in top-level management, then appropriate reliability and maintainability programs can be successful. Such an attitude can be created only on the basis of corresponding education in the field of reliability engineering.
Below we shall demonstrate the methods for RAC assessment and optimization
in order to emphasize their importance for management decisions in MSSs. Addi-
tional examples can be found in Lisnianski and Levitin (2003) and in Levitin
(2005).
In many practical cases, the reliability engineer has to choose the best solution out
of a number of alternatives. If this number is large, the decision should be based
on an optimization approach. If it is relatively small, the decision may be based on
a comparison analysis.
Usually it is not enough to only assess MSS reliability in order to compare the
existing alternatives. For example, in order to make a decision about system reli-
ability improvement, both the benefits from the system reliability improvement
and the investment costs associated with this improvement should be taken into
account.
In this section we consider a comparison analysis based on cost-type criteria.
Suppose that several different alternatives should be compared. The economic
losses caused by system failures (spare parts cost, payment for repairing team, fi-
nancial losses due to system staying in unacceptable states, etc.) in many cases can
be estimated based on a Markov reward model (see examples in Section 2.4). One
can find the total expected economic losses V j (t ) or RAC during the system use
time t for each alternative j.
If during each year i from the beginning of the system's use the economic losses of alternative j are $V_{ji}$, the total RAC during the entire period of system use (m years), expressed in present values, can be obtained as

$$RAC = V_j^* = \sum_{i=1}^{m} \frac{V_{ji}}{(1 + IR)^i}, \tag{6.1}$$

where IR is the interest rate.
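Equation (6.1) is an ordinary present-value sum, so comparing alternatives is straightforward once the yearly losses $V_{ji}$ are estimated. A small sketch comparing two hypothetical maintenance alternatives; all cost figures and the interest rate are invented for illustration:

```python
def rac_present_value(annual_losses, interest_rate):
    """Total RAC over the use period, discounted to present value per (6.1)."""
    return sum(v / (1.0 + interest_rate) ** i
               for i, v in enumerate(annual_losses, start=1))

# Hypothetical yearly economic losses for two maintenance alternatives
losses_a = [120, 125, 130, 140, 150]  # cheaper contract, higher losses
losses_b = [90, 92, 95, 97, 100]      # pricier contract, lower losses

ir = 0.05  # assumed annual interest rate
rac_a = rac_present_value(losses_a, ir)
rac_b = rac_present_value(losses_b, ir)

# Under the cost-type criterion, with other costs equal, the alternative
# with the lower discounted RAC is preferred
best = "B" if rac_b < rac_a else "A"
print(round(rac_a, 1), round(rac_b, 1), best)
```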
According to the cost-type criterion, the best alternative is the one that maximizes the net present value of the profit.
Consider the air conditioning system used in one Israeli hospital (Lisnianski et al. 2008). The system consists of two main online air conditioners and one air conditioner in cold reserve. The reserve conditioner begins to work only when one of the main conditioners has failed. The MSS performance is determined by the number of air conditioners working online: $G(t) \in \{0, 1, 2\}$. The air conditioner failure rates are $\lambda = 3$ year⁻¹ for a main conditioner and $\lambda^* = 10$ year⁻¹ for the conditioner in cold reserve ($\lambda^* > \lambda$, because the reserve conditioner is usually a second-hand device). The repair rates for the main and reserve conditioners are the same, $\mu = \mu^* = 100$ year⁻¹. Demand is a discrete-state, continuous-time Markov process W(t) with two levels during a daily 24-h period: peak $w_{peak}$ and low $w_{low}$. The mean duration of the peak demand period is $T_d = 7$ h. The mean duration of the low demand period is $T_N = 24 - T_d = 17$ h. In order to satisfy peak demand two air conditioners have to work together, so $w_{peak} = 2$, and in order to satisfy low demand only one air conditioner needs to work, so $w_{low} = 1$. MSS states where performance G(t) is greater than or equal to demand W(t) are defined as acceptable states. States where $G(t) - W(t) < 0$ are defined as unacceptable states, and entrance into one of these states is treated as a MSS failure.
To arrange a maintenance contract, the system owner can choose a maintenance company for repairing the air conditioners from a list of companies. Maintenance companies offer different mean repair times, ranging from 0.7 to 7.3 days. Naturally, a contract that provides a lower mean time to repair (MTTR) is more expensive. So, on the one hand, the owner is interested in a less expensive contract or, in other words, in a contract with maximal repair time. On the other hand, the repair time should meet specified reliability requirements. Below we consider three different cases of given reliability requirements:
Case 1A. The annual average availability of the MSS should not be lower than
0.999 and the mean total number of system failures during 1 year should not be
greater than one.
Case 1B. The mean time up to the first system failure during 1 year should be
greater than or equal to 0.90 years.
244 6 Reliability-associated Cost Assessment and Management Decisions for MSS
Case 1C. The probability of MSS failure-free operation during 1 year should be
greater than or equal to 0.90.
The problem is to find the maximal MTTR that meets the reliability requirements in
these three cases.
Case 1A The state-transition diagram for the MSS is presented in Figure 6.4.
Fig. 6.4 State-transitions diagram for MSS with two online conditioners and one conditioner in
cold reserve [Unacceptable states are grey]
This diagram was built in accordance with the algorithm from Section 2.4.2.2
for the combined performance-demand model.
There are 12 states. States 1 to 6 are associated with the peak demand period;
states 7 to 12 are associated with the low demand period.
In states 6 and 12, both main air conditioners are online and the reserve air
conditioner is available. The system performance is g_6 = g_12 = 2.
6.2 Reliability-associated Cost and Practical Cost-reliability Analysis 245
In states 5 and 11, one of the main air conditioners has failed and been replaced
by the reserve air conditioner. The system performance is g_5 = g_11 = 2.
In states 4 and 10, the second main air conditioner has also failed, and only the
reserve air conditioner is online. The system performance is g_4 = g_10 = 1.
In states 3 and 9, the reserve air conditioner has failed, and only one main air
conditioner is online. The system performance is g_3 = g_9 = 1.
In states 2 and 8, the reserve air conditioner has failed, and two main air
conditioners are online. The system performance is g_2 = g_8 = 2.
In states 1 and 7, the system suffers total failure. The system performance is
g_1 = g_7 = 0.
If in the peak demand period the required demand level is w_peak = 2 and in the
low demand period the required demand level is w_low = 1, then there are 8 acceptable
states: 12, 11, 10, 9, 8, 6, 5, and 2. States 7, 4, 3, and 1 are unacceptable. System
entrance into any of the unacceptable states is treated as a failure.
The transitions from state 6 to state 5, from state 2 to state 3, from state 12 to
state 11, and from state 8 to state 9 are associated with the failure of one of the
main air conditioners and have an intensity of 2λ. (This is so because either one of
the two online main conditioners can fail.) The transitions from state 5 to state 4,
from state 3 to state 1, from state 11 to state 10, and from state 9 to state 7 are
associated with failure of the second main air conditioner and have an intensity of λ.
The transitions from state 5 to state 3, from state 4 to state 1, from state 11 to
state 9, and from state 10 to state 7 are associated with failure of the reserve air
conditioner and have an intensity of λ*.
The transitions from state 4 to state 5, from state 1 to state 3, from state 10 to
state 11, and from state 7 to state 9 are associated with repair of one of the main
air conditioners and have an intensity of 2μ. The transitions from state 5 to state 6,
from state 3 to state 2, from state 11 to state 12, and from state 9 to state 8 are
associated with repair of the main air conditioner and have an intensity of μ. The
transitions from state 3 to state 5, from state 2 to state 6, from state 1 to state 4,
from state 9 to state 11, from state 8 to state 12, and from state 7 to state 10 are
associated with repair of the reserve air conditioner and have an intensity of μ*.
The transitions from state 6 to state 12, from state 5 to state 11, from state 4 to
state 10, from state 3 to state 9, from state 2 to state 8, and from state 1 to state 7
are associated with the variable demand and have an intensity of λ_d = 1/T_d. The
transitions from state 12 to state 6, from state 11 to state 5, from state 10 to state 4,
from state 9 to state 3, from state 8 to state 2, and from state 7 to state 1 are also
associated with the variable demand and have an intensity of λ_N = 1/T_N = 1/(24 − T_d).
We have now defined all transition intensities for the diagram
presented in Figure 6.4 and, therefore, determined the matrix of transition
intensities (6.5) for the corresponding Markov model.
For simplification we use in (6.5) the following designations:
    C_1 = 2μ + μ* + λ_d,      C_5 = λ + λ* + μ + λ_d,   C_9 = λ + μ + μ* + λ_N,
    C_2 = 2λ + μ* + λ_d,      C_6 = 2λ + λ_d,           C_10 = λ* + 2μ + λ_N,     (6.4)
    C_3 = λ + μ + μ* + λ_d,   C_7 = 2μ + μ* + λ_N,      C_11 = λ + λ* + μ + λ_N,
    C_4 = λ* + 2μ + λ_d,      C_8 = 2λ + μ* + λ_N,      C_12 = 2λ + λ_N.
a =
| −C_1   0      2μ     μ*     0      0      λ_d    0      0      0      0      0     |
| 0      −C_2   2λ     0      0      μ*     0      λ_d    0      0      0      0     |
| λ      μ      −C_3   0      μ*     0      0      0      λ_d    0      0      0     |
| λ*     0      0      −C_4   2μ     0      0      0      0      λ_d    0      0     |
| 0      0      λ*     λ      −C_5   μ      0      0      0      0      λ_d    0     |
| 0      0      0      0      2λ     −C_6   0      0      0      0      0      λ_d   |
| λ_N    0      0      0      0      0      −C_7   0      2μ     μ*     0      0     |   (6.5)
| 0      λ_N    0      0      0      0      0      −C_8   2λ     0      0      μ*    |
| 0      0      λ_N    0      0      0      λ      μ      −C_9   0      μ*     0     |
| 0      0      0      λ_N    0      0      λ*     0      0      −C_10  2μ     0     |
| 0      0      0      0      λ_N    0      0      0      μ*     λ      −C_11  μ     |
| 0      0      0      0      0      λ_N    0      0      0      0      2λ     −C_12 |
In order to find the MSS annual average availability A(t)|_{t = 1 year}, we should
present the reward matrix r_A in the following form (see Section 2.4.2.3 for the
determination of rewards):
r_A = [r_ij] =
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 1 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 0 0 0 0 |
| 0 0 0 0 0 1 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |   (6.6)
| 0 0 0 0 0 0 0 1 0 0 0 0 |
| 0 0 0 0 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 0 0 0 1 0 0 |
| 0 0 0 0 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 0 0 0 0 0 1 |
In this matrix, rewards associated with all acceptable states are defined as 1 and
rewards associated with all unacceptable states are zeroed, as are all rewards
associated with all transitions.
The following system of differential equations (6.7) can be written in order to
find the expected total rewards V_i(t), i = 1,…,12. The initial conditions are
V_i(0) = 0, i = 1,…,12.

    dV_1(t)/dt = −C_1V_1(t) + 2μV_3(t) + μ*V_4(t) + λ_dV_7(t),
    dV_2(t)/dt = 1 − C_2V_2(t) + 2λV_3(t) + μ*V_6(t) + λ_dV_8(t),
    dV_3(t)/dt = λV_1(t) + μV_2(t) − C_3V_3(t) + μ*V_5(t) + λ_dV_9(t),
    dV_4(t)/dt = λ*V_1(t) − C_4V_4(t) + 2μV_5(t) + λ_dV_10(t),
    dV_5(t)/dt = 1 + λ*V_3(t) + λV_4(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 1 + 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_7(t)/dt = λ_NV_1(t) − C_7V_7(t) + 2μV_9(t) + μ*V_10(t),                    (6.7)
    dV_8(t)/dt = 1 + λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),
    dV_9(t)/dt = 1 + λ_NV_3(t) + λV_7(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = 1 + λ_NV_4(t) + λ*V_7(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = 1 + λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = 1 + λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
After solving system (6.7) and finding V_i(t), the MSS annual average availability
can be obtained as A(t) = V_6(t)/t, where t = 1 year (Section 2.4.2.3).
Here the sixth state is the best state and is assumed to be the initial state, in
which the MSS was at instant t = 0.
The results of calculation are presented in Figures 6.5 and 6.6.
In Appendix C, Section 5.1 one can find MATLAB code for MSS average
availability calculations.
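The book's implementation is the MATLAB code in Appendix C; the following self-contained Python sketch (ours, with all names our own) performs the same computation by integrating system (6.7) with an explicit Euler scheme for MTTR = 3.65 d (μ = μ* = 100 year⁻¹):

```python
# Euler integration of the Markov reward equations (6.7) for the 12-state
# air-conditioning model. All rates per year; MTTR = 3.65 d <=> mu = 100.
lam, lam_s = 3.0, 10.0                # lambda (main), lambda* (reserve)
mu = mu_s = 100.0                     # mu, mu*
lam_d, lam_N = 8760 / 7, 8760 / 17    # demand-change intensities lambda_d, lambda_N

# transitions[i] = [(j, a_ij), ...], the intensities of Figure 6.4
transitions = {
    1:  [(3, 2 * mu), (4, mu_s), (7, lam_d)],
    2:  [(3, 2 * lam), (6, mu_s), (8, lam_d)],
    3:  [(1, lam), (2, mu), (5, mu_s), (9, lam_d)],
    4:  [(1, lam_s), (5, 2 * mu), (10, lam_d)],
    5:  [(3, lam_s), (4, lam), (6, mu), (11, lam_d)],
    6:  [(5, 2 * lam), (12, lam_d)],
    7:  [(1, lam_N), (9, 2 * mu), (10, mu_s)],
    8:  [(2, lam_N), (9, 2 * lam), (12, mu_s)],
    9:  [(3, lam_N), (7, lam), (8, mu), (11, mu_s)],
    10: [(4, lam_N), (7, lam_s), (11, 2 * mu)],
    11: [(5, lam_N), (9, mu_s), (10, lam), (12, mu)],
    12: [(6, lam_N), (11, 2 * lam)],
}
acceptable = {2, 5, 6, 8, 9, 10, 11, 12}   # reward r_ii = 1 in these states

def expected_rewards(state_r, trans_r, t_end=1.0, dt=5e-5):
    """dVi/dt = r_ii + sum_j a_ij * (r_ij + Vj - Vi), explicit Euler steps."""
    V = {i: 0.0 for i in transitions}
    for _ in range(round(t_end / dt)):
        dV = {i: state_r.get(i, 0.0)
                 + sum(a * (trans_r.get((i, j), 0.0) + V[j] - V[i])
                       for j, a in transitions[i])
              for i in transitions}
        for i in V:
            V[i] += dt * dV[i]
    return V

V = expected_rewards({i: 1.0 for i in acceptable}, {})
A = V[6] / 1.0    # A(t) = V6(t)/t with t = 1 year, initial (best) state 6
print(f"annual average availability: {A:.4f}")
```

The explicit scheme is stable here because dt·max_i C_i is well below 1; a stiff ODE solver would allow much larger steps.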
As one can see from the curve in Figure 6.5, the MSS average availability
(calculated for MTTR = 3.65 d) becomes constant after 1 year, and its constant
value is lower than the required value of 0.999. This means that MTTR = 3.65 d is
not appropriate for the system owner.
Fig. 6.5 The MSS average availability as a function of time (MTTR = 3.65 d)
In Figure 6.6 the constant values (stationary values after 1 year) of the MSS
average availability were calculated for MTTR ranging from 0.7 up to 7.3 d. From
this figure one can conclude that the system can provide the required average
availability level (0.999 or greater) if the MTTR is less than or equal to 3.2 d
(μ ≥ 0.3125 d⁻¹).
The curve in Figure 6.6 supports the engineering decision making and determines
the area where the first reliability requirement of case 1A for the air
conditioning system can be met. As follows from Figure 6.6, in order to provide
the required average availability level of 0.999 or greater, the MTTR should be
less than or equal to 3.2 d. Thus, one obtains the maximal MTTR that meets the
first reliability requirement for case 1A: MTTR_AV = 3.2 d.
The second reliability requirement in case 1A concerns the mean total number
of system failures during one year. This number N_f(t) should not be greater
than 1 for t = 1 year. This requirement can be written as

    N_f(t)|_{t = 1 year} ≤ 1.
Fig. 6.6 The MSS annual average availability depending on mean time to repair
In order to find the mean total number of system failures N_f(t), we should (in
accordance with Section 2.4.2.3) represent the reward matrix r_N in the following
form (6.8):
r_N = [r_ij] =
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 1 1 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |   (6.8)
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 1 0 0 0 0 0 |
| 0 0 0 1 0 0 1 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 0 0 0 |
In this matrix the rewards associated with each transition from the set of ac-
ceptable states to the set of unacceptable states should be defined as 1. All other
rewards should be zeroed.
Now the system of differential equations (6.9) can be written in order to find
the expected total rewards V_i(t), i = 1,…,12. Here C_1,…,C_12 are calculated via
expressions (6.4). The initial conditions are V_i(0) = 0, i = 1,…,12.
    dV_1(t)/dt = −C_1V_1(t) + 2μV_3(t) + μ*V_4(t) + λ_dV_7(t),
    dV_2(t)/dt = 2λ − C_2V_2(t) + 2λV_3(t) + μ*V_6(t) + λ_dV_8(t),
    dV_3(t)/dt = λV_1(t) + μV_2(t) − C_3V_3(t) + μ*V_5(t) + λ_dV_9(t),
    dV_4(t)/dt = λ*V_1(t) − C_4V_4(t) + 2μV_5(t) + λ_dV_10(t),
    dV_5(t)/dt = λ + λ* + λ*V_3(t) + λV_4(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_7(t)/dt = λ_NV_1(t) − C_7V_7(t) + 2μV_9(t) + μ*V_10(t),                    (6.9)
    dV_8(t)/dt = λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),
    dV_9(t)/dt = λ + λ_N + λ_NV_3(t) + λV_7(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = λ* + λ_N + λ_NV_4(t) + λ*V_7(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
After solving this system and finding V_i(t), the mean total number of system
failures N_f(t) can be obtained as N_f(t) = V_6(t), where the sixth state is
the best state and is assumed to be the initial state.
The results of calculation are presented in Figures 6.7 and 6.8.
In Appendix C, Section 5.2 one can find MATLAB code for mean number of
system failures calculations.
As one can see from the curve in Figure 6.7, the mean number of MSS failures
(calculated for an MTTR of 3.65 d) will be 1.5 after 1 year, and so it
will be greater than the required value of 1. Therefore, MTTR = 3.65 d is not
appropriate for the system owner.
In Figure 6.8, N_f(t) for t = 1 year was calculated for MTTR ranging from 0.7
up to 7.3 d. From this figure one can conclude that the system can provide the
required value N_f(t) ≤ 1 for t = 1 year if the MTTR is less than or equal to
MTTR_N = 2.8 d.
Fig. 6.7 Mean number of system failures as a function of time (MTTR = 3.65 d)
Fig. 6.8 Mean number of system failures depending on MTTR during 1 year
Case 1B In this case the mean time up to the first system failure (or MTTF)
should be greater than or equal to 0.90 years. In order to calculate the MTTF, the
initial model presented in Figure 6.4 should be transformed. In accordance with
the method of Chapter 2, all transitions that return the MSS from unacceptable
states should be forbidden and all unacceptable states should be united into one
absorbing state. The transformed model is shown in Figure 6.9. Here all
unacceptable states are united into the single absorbing state 0.
Fig. 6.9 Transformed state-transition diagram with absorbing state for MTTF computation [Un-
acceptable state is grey]
For the transformed model the coefficients C_i are

    C_2 = 2λ + μ* + λ_d,      C_8 = 2λ + μ* + λ_N,    C_11 = λ + λ* + μ + λ_N,
    C_5 = λ + λ* + μ + λ_d,   C_9 = λ + μ + μ* + λ_N, C_12 = 2λ + λ_N.          (6.10)
    C_6 = 2λ + λ_d,           C_10 = λ* + 2μ + λ_N,
In order to assess the MTTF for a MSS, the rewards in matrix r for the transformed
model should be determined in the following manner: the rewards associated with
all acceptable states should be defined as 1, and the rewards associated with the
unacceptable (absorbing) state should be zeroed, as should all rewards associated
with transitions.
The transition intensity matrix of the transformed model, with the states ordered
(0, 2, 5, 6, 8, 9, 10, 11, 12), is

a =
| 0        0      0      0      0      0      0      0      0     |
| 2λ       −C_2   0      μ*     λ_d    0      0      0      0     |
| λ + λ*   0      −C_5   μ      0      0      0      λ_d    0     |
| 0        0      2λ     −C_6   0      0      0      0      λ_d   |
| 0        λ_N    0      0      −C_8   2λ     0      0      μ*    |   (6.11)
| λ + λ_N  0      0      0      μ      −C_9   0      μ*     0     |
| λ* + λ_N 0      0      0      0      0      −C_10  2μ     0     |
| 0        0      λ_N    0      0      μ*     λ      −C_11  μ     |
| 0        0      0      λ_N    0      0      0      2λ     −C_12 |
The reward matrix for the system with two online conditioners and one in cold
reserve is as follows:
r = [r_ij] =
| 0 0 0 0 0 0 0 0 0 |
| 0 1 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 0 0 0 |
| 0 0 0 1 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 0 |   (6.12)
| 0 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 1 0 0 |
| 0 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 0 0 1 |
In order to find the expected total rewards V_i(t), i = 0, 2, 5, 6, 8, 9, 10, 11, 12,
the following system of differential equations can be written (the initial
conditions are V_i(0) = 0):

    dV_0(t)/dt = 0,
    dV_2(t)/dt = 1 + 2λV_0(t) − C_2V_2(t) + μ*V_6(t) + λ_dV_8(t),
    dV_5(t)/dt = 1 + (λ + λ*)V_0(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 1 + 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_8(t)/dt = 1 + λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),               (6.13)
    dV_9(t)/dt = 1 + (λ + λ_N)V_0(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = 1 + (λ* + λ_N)V_0(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = 1 + λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = 1 + λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
Fig. 6.10 Mean time to system failure as a function of time (MTTR = 3.65 d)
In Appendix C, Section 5.3 one can find MATLAB code for mean time to
system failure calculations.
As one can see from the curve in Figure 6.10, the mean time to system failure
(calculated for a mean time to repair of 3.65 d) will be 0.78 years after 1 year,
and so it will be less than the required value of 0.90 years. Therefore,
MTTR = 3.65 d is not appropriate for the system owner.
Fig. 6.11 Mean time to system failure depending on MTTR
In Figure 6.11 the MTTF during 1 year was calculated for mean times to repair
ranging from 0.7 d to 7.3 d. From this figure one can conclude that the system can
provide the required value MTTF ≥ 0.90 years for t = 1 year if the MTTR is less
than or equal to 1.65 d.
Therefore, the maximal MTTR for case 1B is 1.65 d.
Case 1C In order to solve the problem in case 1C one should find the MSS reli-
ability function R(t), which defines the probability of failure-free operation during
the period [ 0, t ] .
To calculate the system reliability function R(t), the model presented in Figure
6.9 is used. As was described in the previous case, in this model all unacceptable
states are treated as one absorbing state and all transitions that return the MSS
from unacceptable states are forbidden. But the rewards in this case should be
defined in another way. As was described in Section 2.4.2.3, all rewards
associated with transitions to the absorbing state should be defined as 1. All
other rewards should be zeroed.
Therefore, one obtains the following reward matrix for the MSS in this case:
r = [r_ij] =
| 0 0 0 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |   (6.14)
| 1 0 0 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |
| 0 0 0 0 0 0 0 0 0 |
The mean accumulated reward V_i(t) then defines the probability Q(t) of MSS
failure during the time interval [0, t].
The following system of differential equations (6.15) can be written in order to
find the expected total rewards V_i(t), i = 0, 2, 5, 6, 8, 9, 10, 11, 12.
The coefficients C_i, i = 2, 5, 6, 8, 9, 10, 11, 12, are calculated via formulas (6.10).
The initial conditions are V_i(0) = 0, i = 0, 2, 5, 6, 8, 9, 10, 11, 12.
    dV_0(t)/dt = 0,
    dV_2(t)/dt = 2λ + 2λV_0(t) − C_2V_2(t) + μ*V_6(t) + λ_dV_8(t),
    dV_5(t)/dt = λ + λ* + (λ + λ*)V_0(t) − C_5V_5(t) + μV_6(t) + λ_dV_11(t),
    dV_6(t)/dt = 2λV_5(t) − C_6V_6(t) + λ_dV_12(t),
    dV_8(t)/dt = λ_NV_2(t) − C_8V_8(t) + 2λV_9(t) + μ*V_12(t),                   (6.15)
    dV_9(t)/dt = λ + λ_N + (λ + λ_N)V_0(t) + μV_8(t) − C_9V_9(t) + μ*V_11(t),
    dV_10(t)/dt = λ* + λ_N + (λ* + λ_N)V_0(t) − C_10V_10(t) + 2μV_11(t),
    dV_11(t)/dt = λ_NV_5(t) + μ*V_9(t) + λV_10(t) − C_11V_11(t) + μV_12(t),
    dV_12(t)/dt = λ_NV_6(t) + 2λV_11(t) − C_12V_12(t).
After solving this system and finding V_i(t), the MSS reliability function
can be obtained as R(t) = 1 − V_6(t), where the sixth state is the best state and
is assumed to be the initial state.
The results of the calculation are presented in Figure 6.12.
Fig. 6.12 Probability of failure-free operation during 1 year as a function of MTTR
In Appendix C, Section 5.4 one can find MATLAB code for the probability of
failure-free operation calculations.
As one can see from Figure 6.12, in order to meet the reliability requirement
for case 1C, R(t)|_{t = 1 year} ≥ 0.90, the maximal MTTR is 0.88 d;
therefore, the MTTR should be less than or equal to 0.88 d.
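Cases 1B and 1C both rest on the transformed model of Figure 6.9; the sketch below (again ours, an Euler integration under the same assumptions as before, for MTTR = 3.65 d) computes the MTTF via unit rewards in the acceptable states, and the failure probability Q(1 year) via unit rewards on the transitions into the absorbing state 0:

```python
# Transformed 9-state model with absorbing state 0 (Figure 6.9).
lam, lam_s, mu, mu_s = 3.0, 10.0, 100.0, 100.0   # per-year rates
lam_d, lam_N = 8760 / 7, 8760 / 17
T = {   # T[i] = [(j, a_ij), ...]; returns from the absorbing state are forbidden
    0:  [],
    2:  [(0, 2 * lam), (6, mu_s), (8, lam_d)],
    5:  [(0, lam + lam_s), (6, mu), (11, lam_d)],
    6:  [(5, 2 * lam), (12, lam_d)],
    8:  [(2, lam_N), (9, 2 * lam), (12, mu_s)],
    9:  [(0, lam + lam_N), (8, mu), (11, mu_s)],
    10: [(0, lam_s + lam_N), (11, 2 * mu)],
    11: [(5, lam_N), (9, mu_s), (10, lam), (12, mu)],
    12: [(6, lam_N), (11, 2 * lam)],
}

def integrate(state_r, trans_r, t_end, dt=5e-5):
    """Explicit Euler for dVi/dt = r_ii + sum_j a_ij * (r_ij + Vj - Vi)."""
    V = {i: 0.0 for i in T}
    for _ in range(round(t_end / dt)):
        dV = {i: state_r.get(i, 0.0)
                 + sum(a * (trans_r.get((i, j), 0.0) + V[j] - V[i])
                       for j, a in T[i])
              for i in T}
        for i in V:
            V[i] += dt * dV[i]
    return V

# MTTF: unit reward per time unit in every acceptable state; V6(t) -> MTTF
mttf = integrate({i: 1.0 for i in T if i != 0}, {}, t_end=3.0)[6]
# Q(t): unit reward on every transition into the absorbing state 0
to_abs = {(i, 0): 1.0 for i in (2, 5, 9, 10)}
Q = integrate({}, to_abs, t_end=1.0)[6]
print(f"MTTF ~ {mttf:.2f} years, R(1 year) = {1 - Q:.2f}")
```

Since the process is absorbed at most once, the accumulated transition reward V_6(t) equals the absorption probability Q(t), and R(t) = 1 − Q(t).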
6.2.2 Case Study 2: Feed Water Pumps for Power Generating Unit
Consider a subsystem of feed water pumps that supply the water to the boiler in a
coal power generating unit. From a reliability point of view, the generating unit is
a series of three interconnected subsystems: feed water pump subsystem, boiler,
and turbine generator.
The generating unit should provide a nominal generating capacity of
g_nom = 100,000 kW. If the feed water pump subsystem works with water
transmission capacity g_fw = g_basic, the entire unit is able to generate capacity
G_u = g_nom. If the capacity g_fw of the feed water pump subsystem is reduced to a
level of g_fw = k·g_basic (0.5 ≤ k < 1), the unit reduces its generating capacity to
the level k·g_nom. The coal generating unit is installed in order to satisfy the
constant demand w = g_nom.
A designer has seven different possible configurations of the feed water pump
subsystem. Each configuration can be designated as n·g_p, where n is the number of
identical pumps and g_p is the nominal capacity of each pump.
The first (basic) configuration consists of one pump that provides 100% of the
unit's capacity (n = 1 and g_p = g_basic).
The six other configurations consist of two identical pumps with different
nominal capacities.
2nd configuration: g_p = k·g_basic, k = 0.5. The entire feed water subsystem
capacity is g_fw = 2g_p = 2·0.5·g_basic = g_basic.
3rd configuration: g_p = k·g_basic, k = 0.6. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.2·g_basic.
4th configuration: g_p = k·g_basic, k = 0.7. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.4·g_basic.
5th configuration: g_p = k·g_basic, k = 0.8. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.6·g_basic.
6th configuration: g_p = k·g_basic, k = 0.9. The entire feed water subsystem
capacity is g_fw = 2g_p = 1.8·g_basic.
7th configuration: g_p = k·g_basic, k = 1.0. The entire feed water subsystem
capacity is g_fw = 2g_p = 2.0·g_basic.
Each type of pump that can be chosen has only total failures. The failure and
repair rates are the same for all of the pumps: λ = 0.0001 h⁻¹ and μ = 0.01 h⁻¹.
In the first configuration, a pump failure causes the outage of the entire
generating unit. In this case the generating capacity of the unit is reduced to
zero: G_u = 0. In the configurations with two pumps, the failure of a single pump
causes the reduction of the unit generating capacity to G_u = k·g_nom; simultaneous
failure of the two pumps causes the outage of the generating unit (G_u = 0).
In the case of a generating unit outage or capacity reduction, the generating
capacity deficiency can be partially compensated by a spinning reserve that
usually exists in power systems. The spinning reserve provides additional
generating capacity γ·g_nom, where γ varies from 0 to 1.
The power that cannot be supplied by the power system to consumers in the case
of pump failure is therefore

    D = g_nom − G_u − γ·g_nom.   (6.16)
The value D defines the part of the power system's load that must be immediately
switched off by the load-shedding system. The power D is not supplied to the
consumers until the reserve gas turbines start up and compensate the remaining
generating deficiency. The turbine startup process takes time τ = 0.25 h. Hence,
the energy not supplied (ENS) to consumers during this time is

    ENS = D·τ.   (6.17)

During the time τ the penalty c_p = 4 $/kWh must be paid for every kilowatt-hour
of non-supplied energy.
The energy supplied by the gas turbines is more expensive than that supplied
by the coal power unit. The difference in the energy cost is c = 0.1 $/kWh.
Each configuration of the feed water pump subsystem has its own investment
cost C_inv associated with pump purchase and installation. In Table 6.1 one can see
the investment costs for each configuration as well as the increase ΔC_inv of
these costs over the cost of the basic configuration 1g_basic.
One can see that a tradeoff exists between the investment costs and the costs of
losses caused by the energy not supplied and by the use of more expensive gas
turbine energy. In order to compare the configurations, one has to evaluate the
total costs associated with each configuration in net present values.
In order to obtain the cost of losses caused by system unreliability, we use
the Markov reward model described in Section 2.4.
Consider first the configuration 1g_basic. The state-space diagram of the Markov
reward model for this configuration is presented in Figure 6.13 (a).
In state 2 the feed water pump operates, providing the desired generating unit
capacity w = g_nom. If the pump fails, the MSS transits from state 2 to state 1
(with transition intensity λ). In state 1 the gas turbines work in order to supply
the energy to consumers instead of the failed generating unit.
260 6 Reliability-associated Cost Assessment and Management Decisions for M66V
The reward r_21 corresponding to the transition from state 2 to state 1 is
defined as the penalty cost for the energy not supplied before the gas turbines
start to operate:

    r_21 = c_p·D·τ.   (6.18)
Fig. 6.13 State-transition diagram of Markov reward models: (a) for configurations with n=1 and
(b) for configurations with n=2
The reward r_11 corresponding to each time unit (hour) when the MSS is in state
1 is defined as the excessive cost of the energy supplied by the gas turbines:

    r_11 = c·D.   (6.19)
For large generating units (with capacity greater than or equal to 100,000 kW)
the cost associated with normal operation of the coal generating unit (in state 2) is
negligibly small in comparison with the cost of the alternative energy produced by
the gas turbines and with the penalty cost of unsupplied energy. Therefore, reward
r_22 can be zeroed. In the same way one can neglect the cost of pump repair and
zero the reward r_12 associated with the transition from state 1 to state 2. Hence,
the reward matrix takes the form
matrix takes the form
r11 0
r = rij = . (6.20)
r21 0
a11 a12
a = aij = = . (6.21)
a21 a22
We assume that the evolution begins from the best state 2. According to Sec-
tion 2.4, in order to obtain the total expected reward V2 ( t ) during time t under ini-
tial conditions V1 (0) = V2 (0) = 0, we must solve the following system of differen-
tial equations:
dV1 (t )
dt = r11 V1 (t ) + V2 (t ),
(6.22)
dV2 (t ) = r + V (t ) V (t ).
dt 21 1 2
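A small Python sketch (ours) of system (6.22) for the configuration 1g_basic without spinning reserve; the rewards are taken as r_11 = c·D per hour and r_21 = c_p·D·τ per failure, which is our reading of the elided reward expressions:

```python
# Two-state Markov reward model (6.22): configuration 1*g_basic, gamma = 0.
# Rates per hour, costs in dollars. The reward reading r11 = c*D and
# r21 = c_p*D*tau is our assumption for the elided expressions (6.18)-(6.19).
lam, mu = 0.0001, 0.01        # pump failure / repair rates, 1/h
g_nom = 100_000               # nominal unit capacity, kW
tau, c_p, c = 0.25, 4.0, 0.1  # startup time (h), penalty and surcharge ($/kWh)
gamma = 0.0
D = (1 - gamma) * g_nom       # power not supplied while the unit is down, kW
r11 = c * D                   # $/h surcharge while gas turbines substitute
r21 = c_p * D * tau           # $ penalty per failure (energy not supplied)

V1 = V2 = 0.0
dt, T = 0.05, 8760.0          # Euler step and horizon, hours
for _ in range(round(T / dt)):
    dV1 = r11 - mu * V1 + mu * V2
    dV2 = lam * r21 + lam * V1 - lam * V2
    V1, V2 = V1 + dt * dV1, V2 + dt * dV2
print(f"V2(1 year) = ${V2:,.0f}")
```

The result lies within a few percent of the 954 (thousand dollars) listed for the configuration 1g_basic at γ = 0 in Table 6.2; the small gap comes from the transient before the process reaches its stationary regime.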
In Table 6.2, one can see the total annual expected costs V_2(T)
(T = 1 year = 8760 h) obtained for different values of the relative capacity γ of
the spinning reserve (the costs are in thousands of dollars).
Consider now the configurations with two pumps (n = 2). The state-transition
diagram of the Markov reward model for these configurations is presented in
Figure 6.13 (b).
In state 3, both feed water pumps operate and the generating unit capacity is
G_u = g_nom. The cost of normal operation is negligibly small. Therefore, the
reward r_33 associated with this state is equal to zero.
State 2 corresponds to the case where a failure has occurred in one of the pumps
and the single remaining pump continues to work. The subsystem transits from state
3 to state 2 with intensity 2λ, because the failure can occur in either of the two
pumps. The unit generating capacity in state 2 decreases and becomes equal to the
capacity provided by the single pump: G_u = k·g_nom. From the corresponding
capacity deficiency one obtains the power not supplied to consumers in state 2,
the energy not supplied before the startup of the gas turbines, and, therefore,
the reward r_32 associated with the transition from state 3 to state 2, as well as
the reward r_22 corresponding to each time unit (hour) that the MSS spends in
state 2 (the excessive cost of the energy supplied by the gas turbines).
The subsystem can return from state 2 to state 3 after repair, with intensity
μ. The reward r_23 associated with this transition is assumed to be negligible:

    r_23 = 0.   (6.27)
If in state 2 the failure of the second pump occurs before the completion of
the repair of the failed pump, the subsystem transits from state 2 to state 1. The
intensity of this transition is λ. The capacity of the generating unit in state 1
is G_u = 0, and the power not supplied to the consumers increases accordingly.
The reward r_21 corresponding to the transition from state 2 to state 1 is
defined as the penalty cost for energy not supplied before the gas turbines start
to operate, and the reward r_11 corresponding to each time unit (hour) that the
MSS is in state 1 is defined as the excessive cost of the energy supplied by the
gas turbines. As before, the repair reward is neglected:

    r_12 = 0.   (6.31)
Hence, the reward matrix takes the form

r = [r_ij] = | r_11  0     0 |
             | r_21  r_22  0 |   (6.32)
             | 0     r_32  0 |

and the subsystem transition intensity matrix corresponding to the state-space
diagram presented in Figure 6.13 (b) takes the form

a = [a_ij] = | −2μ   2μ        0   |
             | λ     −(λ + μ)  μ   |.   (6.33)
             | 0     2λ        −2λ |
As in the case with n = 1 we assume that the evolution begins from the best
state (state 3). In order to obtain the total expected reward V3(t) under initial con-
ditions V1 (0) = V2 (0) = V3 (0) = 0, we must solve the following system of differen-
tial equations:
    dV_1(t)/dt = r_11 − 2μV_1(t) + 2μV_2(t),
    dV_2(t)/dt = r_22 + λ·r_21 + λV_1(t) − (λ + μ)V_2(t) + μV_3(t),   (6.34)
    dV_3(t)/dt = 2λ·r_32 + 2λV_2(t) − 2λV_3(t).
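The same treatment applies to a two-pump configuration; the sketch below (ours, with the same assumed reward reading as before, and with deficiencies clipped at zero when the spinning reserve covers the shortfall) integrates (6.34) for 2·0.8g_basic at γ = 0.2:

```python
# Three-state Markov reward model (6.34) for the configuration 2*0.8g_basic
# with spinning reserve gamma = 0.2 (our parameter reading).
lam, mu = 0.0001, 0.01
g_nom, tau, c_p, c = 100_000, 0.25, 4.0, 0.1
k, gamma = 0.8, 0.2
D2 = max(0.0, (1 - k - gamma) * g_nom)  # deficiency with one pump down, kW
D1 = max(0.0, (1 - gamma) * g_nom)      # deficiency with both pumps down, kW
r22, r32 = c * D2, c_p * D2 * tau
r11, r21 = c * D1, c_p * D1 * tau

V1 = V2 = V3 = 0.0
dt, T = 0.05, 8760.0                    # Euler step and horizon, hours
for _ in range(round(T / dt)):
    dV1 = r11 - 2 * mu * V1 + 2 * mu * V2
    dV2 = r22 + lam * r21 + lam * V1 - (lam + mu) * V2 + mu * V3
    dV3 = 2 * lam * r32 + 2 * lam * V2 - 2 * lam * V3
    V1, V2, V3 = V1 + dt * dV1, V2 + dt * dV2, V3 + dt * dV3
print(f"V3(1 year) = ${V3:,.0f}")
```

With k = 0.8 the one-pump-down state carries no losses at γ = 0.2, so only the rare two-pump outages contribute; the order of magnitude agrees with the 8.11 entry in Table 6.2.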
In Table 6.2, one can see the total annual expected rewards V_3(T) (in thousands
of dollars) obtained for different values of the relative capacity γ of the
spinning reserve for all the considered configurations 2·k·g_basic. The total
annual expected reward for the configuration 1g_basic is also given for comparison.
Table 6.2 Total annual expected reward for different subsystem configurations

γ     1g_basic  2·0.5g_basic  2·0.6g_basic  2·0.7g_basic  2·0.8g_basic  2·0.9g_basic  2·1.0g_basic
0     954       945           758           571           384           197           10.1
0.2   763       569           382           195           8.11          8.11          8.11
0.4   572       193           6.08          6.08          6.08          6.08          6.08
0.6   382       4.06          4.06          4.06          4.06          4.06          4.06
0.8   191       2.03          2.03          2.03          2.03          2.03          2.03
1.0   0         0             0             0             0             0             0
Since the reward functions V_2(t) and V_3(t) obtained for the configurations with
n = 1 and n = 2, respectively, reach their steady states very quickly (within 2
weeks), and there is no aging in a MSS, we can assume that the annual rewards are
the same for any year from the beginning of the subsystem operation. The relative
annual reward ΔV for each configuration with n = 2 is obtained as the difference
between the annual reward of the configuration 1g_basic and the annual reward of
the given configuration 2·k·g_basic.
From Table 6.2 one can see, for example, that if the capacity of the spinning
reserve installed in the power system is 0.2w (γ = 0.2), the annual cost associated
with the unreliability of the feed water subsystem is $763,000 for the
configuration 1g_basic and about $8,100 for the configurations 2·k·g_basic with
k ≥ 0.8. Hence, if for γ = 0.2 one chooses configuration 2·0.8g_basic,
2·0.9g_basic, or 2·1.0g_basic instead of the simplest configuration 1g_basic, the
relative annual reward is ΔV = 763,000 − 8,100 = $754,900.
According to (6.1), the sum of equal relative annual rewards accumulated during
a period of m years, in present values, is

    ΔV* = ΔV · Σ_{i=1}^{m} 1/(1 + IR)^i,   (6.35)
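Equation (6.35) is an ordinary annuity, so the sum has the closed form ΔV·(1 − (1 + IR)^−m)/IR. A quick check (the horizon m = 20 years and IR = 10% are hypothetical values of ours, not from the book):

```python
def npv_equal_annuity(delta_v, ir, m):
    """Present value (6.35): m equal annual rewards delta_v discounted at ir."""
    return delta_v * sum(1 / (1 + ir) ** i for i in range(1, m + 1))

# hypothetical horizon and interest rate, applied to the Delta V computed above
pv = npv_equal_annuity(754_900, 0.10, 20)
closed = 754_900 * (1 - 1.10 ** -20) / 0.10   # ordinary-annuity closed form
print(round(pv, 2), round(closed, 2))
```

The two values agree to floating-point precision, which makes the closed form a convenient shortcut when many configurations must be compared.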
Fig. 6.14 Relative costs of different configurations for feed water pump subsystem
If there is no spinning reserve in the power system, the best configuration is
2g_basic (the maximum of the curve corresponding to γ = 0 is achieved for
g = g_basic). The configuration 2·0.5g_basic is the worst one in this case. (It is
even worse than the basic configuration, because for g = 0.5g_basic, C_N < 0.)
If the spinning reserve in the power system is unrestricted (γ = 1.0), the
simplest configuration, 1g_basic, is the best, because C_N < 0 for any
configuration 2·k·g_basic.

6.3 Practical Cost-reliability Optimization Problems for Multi-state Systems 265
sions exist for each type of element and in which analytical dependencies for ele-
ment costs are unavailable.
    C(θ) = Σ_{i=1}^{N} Σ_{b=1}^{B_i} n(i, b)·c(i, b).   (6.36)
Having the system structure defined by its components' reliability block diagram
and by the set θ, one can determine the entire MSS availability index
A(w, q) (1.21) for any given steady-state demand distribution w, q. The problem of
structure optimization for series-parallel MSSs is formulated as finding the
minimal cost system configuration θ* that provides the required availability
level A′:

    C(θ*) → min,
    subject to A(w, q, θ*) ≥ A′.   (6.37)
The natural way of encoding the solutions of the optimal assignment problem
(6.37) in a genetic algorithm (GA) is by defining a B-length integer string, where
B is the total number of versions available:

    B = Σ_{i=1}^{N} B_i.   (6.38)
6.3 Practical Cost-reliability Optimization Problems for Multi-state Systems 267
Each solution is represented by the string a = {a_1, …, a_j, …, a_B}, where the
element a_j corresponds to version b of component i with

    j = Σ_{m=1}^{i−1} B_m + b.   (6.39)
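The index mapping (6.39) can be exercised directly (the numbers of versions per component, B_i, are our reading of Table 6.3):

```python
def gene_index(B_list, i, b):
    """Position j of version b of component i in the GA string, eq. (6.39)."""
    return sum(B_list[:i - 1]) + b

# versions per component as read from Table 6.3 (our reading)
B_list = [7, 5, 4, 9, 4]
print(gene_index(B_list, 1, 4))   # primary feeder, version 4 -> position 4
print(gene_index(B_list, 3, 1))   # stacker, version 1 -> position 13
```

The last component's last version maps to position B, the total string length, which is an easy sanity check of the encoding.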
The system belongs to the type of flow transmission MSS with flow dispersion,
since its main characteristic is the transmission capacity and parallel elements
can transmit the coal simultaneously.
Each system element is an element with total failure (which means that it can
have only two states: functioning with the nominal capacity, and total failure
corresponding to capacity 0). For each type of equipment there exists a list of
products available on the market. Each version of equipment is characterized by
its nominal capacity g (in hundreds of tons per hour), availability p, and cost c
(millions of dollars). The list of available products is presented in Table 6.3.
Table 6.3 Characteristics of the available products (g in hundreds of tons per hour, c in millions of dollars)

Version  Primary feeder        Primary conveyor      Stacker               Secondary feeder      Secondary conveyor
         g     p     c         g     p     c         g     p     c         g     p     c         g     p     c
1        1.20  0.980 0.590     1.00  0.995 0.205     1.00  0.971 7.525     1.15  0.977 0.180     1.28  0.984 0.986
2        1.00  0.977 0.535     0.92  0.996 0.189     0.60  0.973 4.720     1.00  0.978 0.160     1.00  0.983 0.825
3        0.85  0.982 0.470     0.53  0.997 0.091     0.40  0.971 3.590     0.91  0.978 0.150     0.60  0.987 0.490
4        0.85  0.978 0.420     0.28  0.997 0.056     0.20  0.976 2.420     0.72  0.983 0.121     0.51  0.981 0.475
5        0.48  0.983 0.400     0.21  0.998 0.042                           0.72  0.981 0.102
6        0.31  0.920 0.180                                                 0.72  0.971 0.096
7        0.26  0.984 0.220                                                 0.55  0.983 0.071
8                                                                          0.25  0.982 0.049
9                                                                          0.25  0.97  0.044
The system should have availability not less than A ' for the given steady-state
demand distribution presented in Table 6.4.
According to the first step of the decoding procedure, the u-functions of the
chosen elements are determined as follows:
u14(z) = 0.022z^0 + 0.978z^0.85 and u17(z) = 0.016z^0 + 0.984z^0.26 (for the primary feeders);
u23(z) = 0.003z^0 + 0.997z^0.53 (for the primary conveyors);
u31(z) = 0.029z^0 + 0.971z^1.00 (for the stacker);
u47(z) = 0.017z^0 + 0.983z^0.55 (for the secondary feeders);
u52(z) = 0.016z^0 + 0.984z^1.28 (for the secondary conveyor).
According to the second step of the decoding procedure, the u-functions of the system components are obtained by the parallel composition of the u-functions of their elements:

U1(z) = (0.022z^0 + 0.978z^0.85)(0.016z^0 + 0.984z^0.26)^2,
U2(z) = (0.003z^0 + 0.997z^0.53)^2,
U3(z) = 0.029z^0 + 0.971z^1.00,
U4(z) = (0.017z^0 + 0.983z^0.55)^3,
U5(z) = 0.016z^0 + 0.984z^1.28.

The u-function of the entire system is

U(z) = fser(U1(z), U2(z), U3(z), U4(z), U5(z)),

where the function fser in the series composition operator produces the minimum of its arguments.
Having the system u-function, we obtain the steady-state availability index for
the given demand distribution using Equation 1.21 and operator (4.29): A = 0.95.
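The composition steps above can be sketched numerically. The following Python fragment (ours; the book itself works with MATLAB) represents each u-function as a map from capacity to probability, composes parallel elements by summing capacities (flow dispersion) and series components by taking the minimum, and evaluates Pr{G ≥ w} for a single illustrative demand level w = 1.0. The demand distribution of Table 6.4 is not reproduced here, so the printed value is not the A = 0.95 of the case study.

```python
from functools import reduce
from itertools import product

# u-functions of the chosen elements as {capacity: probability} maps
# (capacities in hundreds of tons per hour, as in Table 6.3).
u14 = {0.0: 0.022, 0.85: 0.978}   # primary feeder, version 4
u17 = {0.0: 0.016, 0.26: 0.984}   # primary feeder, version 7
u23 = {0.0: 0.003, 0.53: 0.997}   # primary conveyor, version 3
u31 = {0.0: 0.029, 1.00: 0.971}   # stacker, version 1
u47 = {0.0: 0.017, 0.55: 0.983}   # secondary feeder, version 7
u52 = {0.0: 0.016, 1.28: 0.984}   # secondary conveyor, version 2

def combine(u1, u2, op):
    """Compose two u-functions with the structure function op."""
    out = {}
    for (g1, p1), (g2, p2) in product(u1.items(), u2.items()):
        g = op(g1, g2)
        out[g] = out.get(g, 0.0) + p1 * p2
    return out

par = lambda x, y: x + y   # parallel: capacities add (flow dispersion)
ser = min                  # series: the minimum capacity limits the flow

U1 = reduce(lambda x, y: combine(x, y, par), [u14, u17, u17])
U2 = combine(u23, u23, par)
U3 = u31
U4 = reduce(lambda x, y: combine(x, y, par), [u47, u47, u47])
U5 = u52
U = reduce(lambda x, y: combine(x, y, ser), [U1, U2, U3, U4, U5])

def availability(U, w):
    """Pr{system capacity >= demand w}."""
    return sum(p for g, p in U.items() if g >= w - 1e-9)

print(round(availability(U, 1.0), 4))   # illustrative demand level
```

The same two composition operators, applied in the order given by the reliability block diagram, reproduce the entire u-function technique of this section.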
The total system cost, according to Equation 6.36, is
For the desired value of system availability A ' = 0.97, the fitness takes the
value
The minimal cost solutions obtained for different desired availability levels A '
are presented in Table 6.5. This table represents the cost, calculated availability,
and structure of the minimal cost solutions obtained by the GA. The structure of
each system component is represented by a string of the form n1xb1, …, nmxbm, where nj is the number of identical elements of version bj belonging to this component.
Consider, for example, the best solution obtained for A ' = 0.99. The minimal
cost system configuration that provides the system availability A=0.992 consists of
two primary feeders of version 4, one primary feeder of version 6, two primary
conveyors of version 3, two stackers of version 2, one stacker of version 3, three
secondary feeders of version 7, and three secondary conveyors of version 4. The
cost of this configuration is $15.870 million.
In practice, the designer often has to include additional elements in the existing
system. It may be necessary, for example, to modernize a system according to new
demand levels or new reliability requirements. The problem of minimal cost MSS
expansion is very similar to the problem of system structure optimization. The
only difference is that each MSS component already contains some working ele-
ments. The cost of the existing elements should not be taken into account when
the MSS expansion cost is minimized.
The initial structure of the MSS is defined as follows: each component of type i
contains B0i different subcomponents connected in parallel. Each subcomponent j
in its turn contains n0(i,j) identical elements, which are also connected in parallel.
Each element is characterized by its steady-state performance distribution {g(i,j), p(i,j)}. The entire initial system structure can therefore be defined by the set {g(i,j), p(i,j), n0(i,j)}, 1 ≤ i ≤ N, 1 ≤ j ≤ B0i, and by a reliability block diagram representing the interconnection among the components.
The optimal MSS expansion problem formulation is the same as in Section
6.3.1.1 and the GA implementation is the same as in Section 6.3.1.2. (The only
difference is that one should take into account u-functions of both the existing ele-
ments and the new elements chosen from the list.)
Case Study 4 Consider the same coal transportation system for a power station
that was considered in case study 3. The initial structure of this MSS is presented
in Table 6.6. (Each component contains a single subcomponent of identical elements: B0i = 1 for 1 ≤ i ≤ N.) All the existing elements as well as the new ones to
be included in the system (from the list of available products presented in Table
6.3) are elements with total failure characterized by their availability p and nomi-
nal transmission capacity g.
The existing structure can satisfy the demand presented in Table 6.4 with availability A(w, q) = 0.506. In order to increase the system availability to the level of A', additional elements are included. The minimal cost MSS expansion so-
lutions for different desired values of system availability A ' are presented in Ta-
ble 6.7.
Consider, for example, the best solution obtained for A ' = 0.99 (encoded by
the string 00000100000100100000001000001). The minimal cost system exten-
sion plan that provides the system availability A = 0.99 presumes the addition of
a primary feeder of version 6, a primary conveyor of version 5, a stacker of ver-
sion 3, a secondary feeder of version 7, and a secondary conveyor of version 4.
The cost of this extension plan is $4.358 million.
References

Dhillon BS (2000) Design reliability: fundamentals and applications. CRC Press, London
Goldner Sh (2006) Markov model for a typical 360 MW coal fired generation unit. Commun Depend Qual Manag 9(1):24–29
Kuo W, Prasad VR (2000) An annotated overview of system reliability optimization. IEEE Trans Reliab 49(2):487–493
Kuo W, Zuo M (2003) Optimal reliability modeling: principles and applications. Wiley, New York
Levitin G (2005) Universal generating function in reliability analysis and optimization. Springer, London
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Lisnianski A, Frenkel I, Khvatskin L, Ding Y (2008) Multi-state system reliability assessment by using the Markov reward model. In: Vonta F, Nikulin M, Limnios N, Huber-Carol C (eds) Statistical models and methods for biomedical and technical systems. Birkhäuser, Boston, pp 153–168
Logistics Management Institute (LMI) (1965) Life cycle costing in equipment procurement. Report No. LMI Task 4C5, LMI, Washington, DC
MIL-HDBK-338B (1998) Electronic reliability design handbook. US Department of Defense, Washington, DC
Ryan W (1978) Procurement views of life cycle costing. In: Proceedings of the Annual Symposium on Reliability, pp 164–168
Ushakov I (1987) Optimal standby problem and a universal generating function. Sov J Comput Syst Sci 25:61–73
7 Aging Multi-state Systems
Many technical systems are subjected during their lifetime to aging and degrada-
tion. After any failure, maintenance is performed by a repair team. This chapter
considers an aging MSS, where the system failure rate increases with time.
Maintenance and repair problems for binary-state systems have been widely investigated in the literature. Barlow and Proschan (1975), Gertsbakh (2000), Valdez-Flores and Feldman (1989), and Wang (2002) survey and summarize theoretical developments and practical applications of maintenance models. Aging is usually considered as a process that results in an age-related increase of the failure rate. The most common shapes of failure rates have been observed by Gertsbakh and Kordonsky (1969), Meeker and Escobar (1998), Bagdonavicius and Nikulin (2002), Wendt and Kahle (2006), and Finkelstein (2003). An interesting approach
was introduced by Finkelstein (2005, 2008), where it was shown that aging is not
always manifested by an increasing failure rate. For example, it can be an upside-
down bathtub shape of the failure rate that corresponds to a decreasing mean re-
maining lifetime function.
After each corrective maintenance action or repair, the aging system's failure rate can be expressed as λr(t) = qλ(0) + (1 − q)λ(t), where λr(t) denotes the failure rate after repair, q is an improvement factor that characterizes the quality of the overhauls (0 ≤ q ≤ 1), and λ(t) is the aging system's failure rate before repair (Zhang and Jardine 1998). If q = 1, then the maintenance action is perfect (the system becomes "as good as new" after repair). If q = 0, the failed system is returned to a working state by minimal repair (the system stays "as bad as old" after repair), in which case the failure rate of the system is nearly the same as before. Minimal repair is appropriate for large complex systems (consisting of many different components) where a failure occurs due to one (or a few) component(s) failing. So minimal repair is usually appropriate for MSSs, and in this chapter we usually
will deal only with MSSs under minimal repairs, where q = 0. In such situations,
the failure pattern can be described by a non-homogeneous Poisson process
(NHPP). (A detailed description of NHPP can be found in Appendix B.) Incorpo-
rating the time-varying failure intensity into existing Markov models was sug-
gested in Welke et al. (1995) for reliability modeling of hardware/software sys-
tems. More details and interesting examples can be found in Xie et al. (2004).
Here we describe an extended approach (Lisnianski and Frenkel 2009), which in-
corporates the time-varying failure intensity of aging components into a Markov
reward model that is used for general reliability measure evaluation of nonaging
MSSs. Such a model will be called a non-homogeneous Markov reward model.
7.1 Markov Model and Markov Reward Model for Increasing Failure Rate Function

As was written in Chapter 2, for a Markov MSS the transition rates (intensities) aij between states i and j are defined by the corresponding failure rates λij and repair rates μij. In MSSs, aging can be indicated by any failure rate that increases as a function of time, λij(t). A minimal repair is a corrective maintenance
action that brings the aging equipment to the condition it was in just before the
failure occurrence. An aging MSS subject to minimal repairs experiences reliabil-
ity deterioration with the operating time, i.e., there is a tendency toward more fre-
quent failures. In such situations, the failure pattern can be described by a Poisson
process whose intensity function monotonically increases with t. A Poisson proc-
ess with a nonconstant intensity is called non-homogeneous, since it does not have
stationary increments (Gertsbakh 2000). It was shown (see, for example, Xie et al.
2004) that an NHPP model can be integrated into a Markov model with time-
varying transition intensities aij (t ). Therefore, for aging MSSs, transition intensi-
ties corresponding to failures of aging components will be increasing functions of
time aij(t).
For a non-homogeneous Markov model, a system's state at time t can be described by a continuous-time Markov chain with a set of states {1, …, K} and a transition intensity matrix a = [aij(t)], i, j = 1, …, K, where each transition intensity may be a function of time t. Chapman-Kolmogorov differential equations
should be solved in order to find state probabilities for such a system (Trivedi
2002). For a non-homogeneous Markov reward model it is assumed that if the
process stays in any state i during the time unit, a certain amount of money rii is
gained. It is also assumed that each time the process transits from state i to state j
an additional amount of money rij is gained. A reward may also be negative when
it characterizes a loss or penalty. Such a reward process, associated with the states and transitions of a non-homogeneous Markov system, is called a non-homogeneous Markov process with rewards. For such processes, in addition to the transition intensity matrix a, a reward matrix r = [rij], i, j = 1, …, K, should be determined.
Let Vi(t) be the expected total reward accumulated up to time t, given that the initial state of the process at time instant t = 0 is state i. The following system of Howard differential equations must be solved in order to find Vi(t):

dVi(t)/dt = rii + Σ_{j=1, j≠i}^{K} aij(t)rij + Σ_{j=1}^{K} aij(t)Vj(t), i = 1, 2, …, K. (7.1)
In the most common case, the MSS begins to accumulate rewards after the time instant t = 0; therefore, the initial conditions are Vi(0) = 0, i = 1, …, K. If, for example, the state K with the highest performance level is defined as the initial state, the value VK(t) should be found as a solution of system (7.1).
It was shown in Lisnianski and Levitin (2003) and Lisnianski and Frenkel
(2009) that many important reliability measures for aging MSSs can be found by
determining the rewards in a corresponding reward matrix. In the following case
study we extend this approach for aging MSSs under minimal repair. We should
remark that if the repair is not minimal, the Markov properties for such MSSs are
not justified and the approach cannot be applied.
year^−1, μ34 = 446.9 year^−1. The demand is constant, w = 300 KW, and a power unit failure is treated as the generating capacity decreasing below the demand level w.
The state-transition diagram for the system is presented in Figure 7.1. Based on this state-transition diagram, we assess the MSS average availability, the mean total number of system failures, and the accumulated mean performance deficiency (in this case, since a generating system is considered, this is the expected energy not supplied to consumers).
According to the state-transition diagram in Figure 7.1, the transition intensity matrix a (7.3) can be obtained:

      | -μ14    0        0      μ14                    |
a =   |  0     -μ24      0      μ24                    |.  (7.3)
      |  0      0       -μ34    μ34                    |
      |  λ41    λ42(t)   λ43   -(λ41 + λ42(t) + λ43)   |
In order to assess the MSS average availability, rewards r33 = r44 = 1 are defined for the acceptable states 3 and 4, where the generating capacity is not less than the demand w, and all other rewards are zeroed:

            | 0  0  0  0 |
r = [rij] = | 0  0  0  0 |.  (7.4)
            | 0  0  1  0 |
            | 0  0  0  1 |
The system of differential equations (7.1) must be solved for transition intensity matrix (7.3) and reward matrix (7.4) under initial conditions Vi(0) = 0, i = 1, …, 4. The results of the calculation can be seen in Figure 7.2.
Calculation results are presented for two cases: for an aging unit with λ42(t) = 7.01 + 0.2189t^2 (dashed-dotted line) and for a non-aging unit where λ42 = 7.01 = constant (bold line).
Fig. 7.2 Calculation of the MSS average availability
As one can see from Figure 7.2, the average availability for an aging MSS is
lower than the average availability for a non-aging MSS. Aging impact increases
over time.
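The computation behind Figure 7.2 can be sketched as follows. This Python fragment (ours; the book works with MATLAB) integrates system (7.1) with a transition intensity matrix of the form (7.3) and reward matrix (7.4) by a classical fourth-order Runge-Kutta scheme. Only μ34 = 446.9 year^−1 and λ42(t) = 7.01 + 0.2189t^2 are given in the text; the remaining rates are hypothetical placeholders, so the numbers are purely illustrative.

```python
# Sketch of the non-homogeneous Markov reward model (7.1) for the
# four-state unit of Section 7.1. Only mu34 = 446.9 1/year and
# lam42(t) = 7.01 + 0.2189*t**2 come from the text; mu14, mu24,
# lam41, lam43 below are hypothetical placeholders.
mu14, mu24, mu34 = 200.0, 300.0, 446.9    # repair rates, 1/year
lam41, lam43 = 2.0, 3.0                   # failure rates, 1/year (assumed)

def lam42(t, aging):
    return 7.01 + (0.2189 * t * t if aging else 0.0)

def a_matrix(t, aging):
    l42 = lam42(t, aging)
    return [[-mu14, 0.0, 0.0, mu14],
            [0.0, -mu24, 0.0, mu24],
            [0.0, 0.0, -mu34, mu34],
            [lam41, l42, lam43, -(lam41 + l42 + lam43)]]

# Reward matrix (7.4): r33 = r44 = 1 (time spent in acceptable states).
r = [[0.0] * 4 for _ in range(4)]
r[2][2] = r[3][3] = 1.0

def dV(t, V, aging):
    a = a_matrix(t, aging)
    return [r[i][i]
            + sum(a[i][j] * r[i][j] for j in range(4) if j != i)
            + sum(a[i][j] * V[j] for j in range(4))
            for i in range(4)]

def solve(T, aging, h=5e-4):
    """Classical RK4 integration of system (7.1) from Vi(0) = 0."""
    V, t = [0.0] * 4, 0.0
    for _ in range(int(T / h)):
        k1 = dV(t, V, aging)
        k2 = dV(t + h / 2, [V[i] + h / 2 * k1[i] for i in range(4)], aging)
        k3 = dV(t + h / 2, [V[i] + h / 2 * k2[i] for i in range(4)], aging)
        k4 = dV(t + h, [V[i] + h * k3[i] for i in range(4)], aging)
        V = [V[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(4)]
        t += h
    return V

T = 5.0
A_aging = solve(T, True)[3] / T    # average availability, initial state 4
A_const = solve(T, False)[3] / T
print(round(A_const, 4), round(A_aging, 4))
```

With these placeholder rates the aging value lies below the non-aging one, reproducing the qualitative picture of Figure 7.2.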
In order to find the mean total number of system failures Nf(t), we should present the reward matrix r in the following form:

            | 0  0  0  0 |
r = [rij] = | 0  0  0  0 |.  (7.5)
            | 0  0  0  0 |
            | 1  1  0  0 |
The system of differential equations (7.1) must be solved for transition inten-
sity matrix (7.3) and reward matrix (7.5) under initial conditions
Vi(0) = 0, i = 1, …, 4. The results of calculation are presented in Figure 7.3. Calculation results are presented for two cases: for an aging MSS with λ42(t) = 7.01 + 0.2189t^2 (dashed-dotted line) and for a non-aging MSS where λ42 = 7.01 = constant (bold line).

Fig. 7.3 Mean number of system failures
In order to find the accumulated mean performance deficiency, the reward matrix r should be presented in the following form, where the diagonal rewards in the unacceptable states equal the corresponding performance deficiency w − gi:

            | 300  0   0  0 |
r = [rij] = | 0    85  0  0 |.  (7.6)
            | 0    0   0  0 |
            | 0    0   0  0 |
The system of differential equations (7.1) must be solved for transition inten-
sity matrix (7.3) and reward matrix (7.6) under initial conditions
Vi(0) = 0, i = 1, …, 4. The results of calculation are presented in Figure 7.4. Calculation results are presented for two cases: for an aging MSS with λ42(t) = 7.01 + 0.2189t^2 (dashed-dotted line) and for a corresponding non-aging MSS where λ42 = 7.01 = constant (bold line). The accumulated performance deficiency for the aging MSS is greater than the accumulated performance deficiency for the corresponding non-aging MSS.
Fig. 7.4 Accumulated performance deficiency
For computation of the mean time to failure and the probability of MSS failure during a given time interval, the state-space diagram of the generating system should be transformed: all transitions that return the system from unacceptable states should be forbidden, and all unacceptable states should be treated as one absorbing state. The resulting state-transition diagram is shown in Figure 7.5.
Fig. 7.5 State-transition diagram with an absorbing failure state (g4 = 360, g3 = 325, w = 300)
According to this state-space diagram, the transition intensity matrix a can be represented as follows:

      |  0              0      0                      |
a =   |  0             -μ34    μ34                    |.  (7.7)
      |  λ41 + λ42(t)   λ43   -(λ41 + λ42(t) + λ43)   |
In order to find the mean time to failure, we should represent the reward matrix r in the following form:

            | 0  0  0 |
r = [rij] = | 0  1  0 |.  (7.8)
            | 0  0  1 |
The system of differential equations (7.1) must be solved for transition intensity matrix (7.7) and reward matrix (7.8) under initial conditions Vi(0) = 0 for all states. The results of calculation are presented in Figure 7.6.
Fig. 7.6 Mean time to failure
In order to find the probability of MSS failure during the time interval [0, T], we should represent the reward matrix r in the following form:

            | 0  0  0 |
r = [rij] = | 0  0  0 |.  (7.9)
            | 1  0  0 |
The system of differential equations (7.1) must be solved for transition intensity matrix (7.7) and reward matrix (7.9) under initial conditions Vi(0) = 0 for all states. The results of calculating the MSS reliability function are
presented in Figure 7.7.
Fig. 7.7 MSS reliability function during the time interval [0,T]
All graphs show the age-related decrease of MSS reliability in comparison with the non-aging MSS. In the last two figures the graphs of the mean time to failure and the reliability function for the aging and non-aging units are almost the same, because the first MSS failure usually occurs within a short time (less than 0.5 years according to Figure 7.6), and the aging impact is negligibly small over such a short period. Thus, the graphs for the aging and non-aging MSS cannot be visually separated in these two cases.
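The absorbing-state computation of this section can be sketched in the same style. The Python fragment below (ours) integrates system (7.1) with the transition intensity matrix (7.7) and reward matrix (7.9), so that the accumulated reward equals the probability of MSS failure during [0, T]; λ41 and λ43 are assumed values, so only the qualitative behavior (a decreasing reliability function) is meaningful.

```python
# Sketch of the failure-probability computation via the absorbing-state
# model (7.7)-(7.9). mu34 and lam42(t) come from the text; lam41 and
# lam43 are assumed. State order: 1 = absorbing failure state,
# 2 = state 3, 3 = state 4.
mu34 = 446.9
lam41, lam43 = 2.0, 3.0            # assumed failure rates, 1/year

def a_matrix(t):
    l = lam41 + 7.01 + 0.2189 * t * t       # lam41 + lam42(t)
    return [[0.0, 0.0, 0.0],
            [0.0, -mu34, mu34],
            [l, lam43, -(l + lam43)]]

# Reward matrix (7.9): each transition into the absorbing state counts 1,
# so the accumulated reward is the probability of failure during [0, T].
r = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]

def dV(t, V):
    a = a_matrix(t)
    return [r[i][i]
            + sum(a[i][j] * r[i][j] for j in range(3) if j != i)
            + sum(a[i][j] * V[j] for j in range(3))
            for i in range(3)]

def failure_prob(T, h=1e-4):
    """RK4 integration of (7.1); failure probability starting in state 4."""
    V, t = [0.0] * 3, 0.0
    for _ in range(int(T / h)):
        k1 = dV(t, V)
        k2 = dV(t + h / 2, [V[i] + h / 2 * k1[i] for i in range(3)])
        k3 = dV(t + h / 2, [V[i] + h / 2 * k2[i] for i in range(3)])
        k4 = dV(t + h, [V[i] + h * k3[i] for i in range(3)])
        V = [V[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(3)]
        t += h
    return V[2]

R = [1.0 - failure_prob(T) for T in (0.1, 0.3, 0.5)]
print([round(x, 4) for x in R])    # decreasing reliability function
```

The reliability function is simply R(T) = 1 − V(T), where V(T) is the reward accumulated from the initial (best) state.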
7.2 Numerical Methods for Reliability Computation for Aging Multi-state System

In Section 7.1 we did not discuss a technique for numerically solving system (7.1), but in some cases this may be necessary. Generally, system (7.1) of differential equations with variable coefficients may be solved using such tools as MATLAB, MATHCAD, etc. However, even these very powerful tools sometimes solve this system with inadequate accuracy. Figure 7.8 shows a typical example of such an error.
Fig. 7.8 Example of MATLAB mistake
Figure 7.8 presents the mean time to system failure (dashed-dotted line), which
is computed using MATLAB for the multi-state generating unit (see case study
from Section 7.1). The real mean time to system failure was computed in Section
7.1 and was presented in Figure 7.6. (This computation was based on a special ap-
proximation method that will be defined below.) To compare the results this
curve is also shown in Figure 7.8 (bold line). As one can see from Figure 7.8 these
two curves are essentially different. Such inaccuracy is noticed only when we are
dealing with aging MSS, or, in other words, when we are solving system (7.1)
with non-constant failure rates. All tools such as MATLAB, MATHCAD,
MATHEMATICA, etc. are perfect for solving a system for non-aging MSS (with
constant failure rates). At present, in engineering practice, when a system with non-constant failure rates is solved, there is no way to predict whether there will be inaccuracies. Moreover, it is often not easy to discover such cases, especially when
optimization problems are solved. In order to find an optimum, a corresponding
search procedure should be organized, and usually a great number of computations
for any reliability index should be performed. It is impossible to analyze each so-
lution online. Thus, in engineering practice for reliability computation for aging
MSS we recommend a special approximation approach, which is based on solving
system (7.1) for specific constant failure rates. The approach will be presented be-
low and it will be shown that the approach can prevent inaccuracies and an engi-
neer can be sure of the results.
The ordinary Markov model and the Markov reward model were explained in detail in Chapter 2; here we briefly return to them. We suppose that the Markov model for the system was built under the assumption that the time to failure and the time to repair are distributed exponentially and that there is no aging in the system (the failure rate function is constant). We also suppose that the Markov model for the system has K states, which, together with the transitions between them, may be presented by a state-space diagram. The intensities aij, i, j = 1, …, K, of transitions from state i to state j are defined by the corresponding failure and repair rates.
Let pj(t) be the probability of state j at time t. The following system of differential equations for finding the state probabilities pj(t), j = 1, …, K, for the Markov model can be written:

dpj(t)/dt = Σ_{i=1, i≠j}^{K} pi(t)aij − pj(t) Σ_{i=1, i≠j}^{K} aji. (7.10)
For the Markov reward model construction, it is assumed that while the system is in any state i during any time unit, some money rii will be paid. It is also assumed that if there is a transition from state i to state j, the amount rij will be paid for each transition. The amounts rii and rij are called rewards. They can be negative when representing a loss or penalty. The objective is to compute the total expected reward accumulated from t = 0, when the system begins its evolution in the state space, up to the time instant T under specific initial conditions.
Let Vj(t) be the total expected reward accumulated up to time t if the system begins its evolution at time t = 0 from state j. According to Section 2.4, the following system of differential equations must be solved in order to find this reward:

dVj(t)/dt = rjj + Σ_{i=1, i≠j}^{K} aji rji + Σ_{i=1}^{K} aji Vi(t), j = 1, 2, …, K. (7.11)
The main idea of the suggested approach (Ding et al. 2009) is the partition of
system lifetime into some intervals, where for each time interval, the failure rate
may be assumed to be constant. In this case, the Markov reward model (7.11)
without aging may be applied in order to find the accumulated total expected re-
ward for each interval.
Table 7.1 Lower and upper bound approximations of the failure rate as piecewise-constant functions

Interval no.   Time interval         Lower bound λn−      Upper bound λn+
1              [0, Δt]               λ(0)                 λ(Δt)
2              [Δt, 2Δt]             λ(Δt)                λ(2Δt)
…              …                     …                    …
n              [Δt(n−1), Δt·n]       λ(Δt(n−1))           λ(Δt·n)
Denote by N the number of intervals that partition the system lifetime T. The length of each interval is Δt = T/N. The failure rate λ(t) in each time interval [Δt(n−1), Δt·n], 1 ≤ n ≤ N, can be approximated by two constant values λn− and λn+, which represent the values of the function λ(t) at the beginning and at the end of the corresponding nth time interval, respectively. Thus, we have

λn− = λ(Δt(n−1)), (7.12)

λn+ = λ(Δt·n). (7.13)
Using (7.10), the two following systems of differential equations can be used to find the state probabilities Pj^{n−} and Pj^{n+} at the end of each time interval Δt·n:

dpj^{n−}(t)/dt = Σ_{i=1, i≠j}^{K} pi^{n−}(t)aij^{n−} − pj^{n−}(t) Σ_{i=1, i≠j}^{K} aji^{n−}, (7.15)

dpj^{n+}(t)/dt = Σ_{i=1, i≠j}^{K} pi^{n+}(t)aij^{n+} − pj^{n+}(t) Σ_{i=1, i≠j}^{K} aji^{n+}, (7.16)

where i, j = 1, 2, …, K, n = 1, 2, …, N, and aij^{n−} and aij^{n+} are the intensities of transitions from state i to state j based on the lower λn− and upper λn+ bounds of the failure rates for each time interval Δt·n, respectively.
The initial state of the system is known with certainty only for the first interval. We assume that the system is in state K at time t = 0. Therefore the initial conditions for Equation 7.15 for the first time interval (n = 1) are

pK^{1−}(0) = 1, p_{K−1}^{1−}(0) = … = p1^{1−}(0) = 0, (7.17)

and the initial conditions for Equation 7.16 for the first time interval (n = 1) are

pK^{1+}(0) = 1, p_{K−1}^{1+}(0) = … = p1^{1+}(0) = 0. (7.18)

For any other time interval Δt·n, n = 2, 3, …, N, the initial conditions (initial distribution of state probabilities) are defined by the solutions (distribution of state probabilities) at the end of the previous interval, i.e., by the following recurrent formulas:

pj^{n−}(0) = pj^{(n−1)−}(Δt), j = 1, …, K, (7.19)

pj^{n+}(0) = pj^{(n−1)+}(Δt), j = 1, …, K. (7.20)
By solving the differential equations of systems (7.15) and (7.16) under initial conditions (7.17) and (7.18), respectively, we determine the corresponding state probabilities Pj^{1−} and Pj^{1+} for each state j at the end of the first time interval (t = Δt).
Therefore, the lower Aw^−(Δt) and upper Aw^+(Δt) bounds of the MSS's availability at the end of the first time interval (n = 1) can be defined for any required demand level w as follows:

Aw^−(Δt) = Σ_{gj≥w} Pj^{1+}, (7.21)

Aw^+(Δt) = Σ_{gj≥w} Pj^{1−}. (7.22)

The lower and upper bounds of the MSS's availability at the end of the nth time interval (t = nΔt, n = 2, …, N) can be defined for any required demand level w as follows:

Aw^−(nΔt) = Σ_{gj≥w} Pj^{n+}, (7.23)

Aw^+(nΔt) = Σ_{gj≥w} Pj^{n−}. (7.24)
This procedure should be repeated until the end of the last time interval. The lower Aw^−(NΔt) and upper Aw^+(NΔt) bounds of the MSS's availability at the end of the last time interval n = N (t = NΔt) can be defined for any given demand level w. It should be noted that NΔt = T. Therefore, the MSS's availability at lifetime T is within the bounds

Aw^−(NΔt) ≤ Aw(T) ≤ Aw^+(NΔt). (7.25)
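A minimal Python sketch of the bounding procedure (7.12)-(7.25), assuming the simplest case of a two-state element (up/down) whose per-interval behavior can be solved in closed form. The rates are invented for illustration. Note the crossing prescribed by (7.21)-(7.24): the probability chain computed with the upper failure rates yields the lower availability bound, and vice versa.

```python
import math

# Two-state element: failure rate lam(t) (increasing) and repair rate mu.
# Numerical values are illustrative only.
def lam(t):
    return 0.5 + 0.3 * t * t       # failure rate, 1/year

mu = 5.0                           # repair rate, 1/year

def step(A0, l, dt):
    """Exact availability after time dt for constant rates l and mu,
    starting from availability A0 (two-state Markov element)."""
    Ainf = mu / (l + mu)
    return Ainf + (A0 - Ainf) * math.exp(-(l + mu) * dt)

def bounds(T, N):
    """Lower/upper availability bounds at lifetime T with N intervals."""
    dt = T / N
    Alo = Aup = 1.0                # the element starts in the up state
    for n in range(1, N + 1):
        l_lo = lam(dt * (n - 1))   # lambda_n-  (7.12)
        l_up = lam(dt * n)         # lambda_n+  (7.13)
        Alo = step(Alo, l_up, dt)  # lower bound from upper rates (7.23)
        Aup = step(Aup, l_lo, dt)  # upper bound from lower rates (7.24)
    return Alo, Aup

for N in (4, 16, 64):
    lo, up = bounds(5.0, N)
    print(N, round(lo, 6), round(up, 6))   # the bounds tighten as N grows
```

Refining the partition narrows the gap between the two bounds, which is exactly how an engineer can control the accuracy of the approximation.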
Solving the system of differential equations (7.11) for these two values of the function λ(t) (λn− and λn+), we can determine the lower and upper bounds of the rewards, Vi^{n−} and Vi^{n+}, accumulated during each time interval [Δt(n−1), Δt·n]:

dVi^{n−}(t)/dt = rii + Σ_{j=1, j≠i}^{K} aij^{n−} rij + Σ_{j=1}^{K} aij^{n−} Vj^{n−}(t), i = 1, 2, …, K, n = 1, …, N, (7.26)

dVi^{n+}(t)/dt = rii + Σ_{j=1, j≠i}^{K} aij^{n+} rij + Σ_{j=1}^{K} aij^{n+} Vj^{n+}(t), i = 1, 2, …, K, n = 1, …, N, (7.27)

where t ∈ [0, Δt]. The failure rates λn− and λn+ for each time interval [Δt(n−1), Δt·n] determine the corresponding elements aij^{n−} and aij^{n+} of the transition intensity matrices in (7.26) and (7.27).
The initial reward values are zeroed for any state k and for any time interval n:

Vk^{n−}(0) = Vk^{n+}(0) = 0, k = 1, …, K, n = 1, …, N. (7.28)
Thus, by solving (7.26) and (7.27) we obtain the lower Vi^{n−}(t) and upper Vi^{n+}(t) bounds of the expected reward for any time interval n, under the condition that the system begins its evolution at initial time t = 0 from any state i = 1, 2, …, K. In other words, by solving systems (7.26) and (7.27) N times (once for each time interval n), we obtain the rewards Vi^{n−}(t) accumulated during each time interval n if we use for this interval the lower bounds λn− of the failure rate function, and the rewards Vi^{n+}(t) accumulated during each time interval n if we use for this interval the upper bounds λn+ of the failure rate function. Therefore, as a result we get for any time interval n = 1, 2, …, N two vector columns:

{V1^{n−}(t), V2^{n−}(t), …, VK^{n−}(t)}, {V1^{n+}(t), V2^{n+}(t), …, VK^{n+}(t)}. (7.29)
As one can see, the expected reward for each time interval depends strictly on the initial state i ∈ {1, …, K}. The initial state of the system is known with certainty only for the first interval. For any other time interval n we can find only the probability distribution Pi^n, i = 1, 2, …, K, of the initial states. If the probability distribution of the initial states is known for each time interval, the mean reward accumulated during this interval can be defined as the sum of the rewards Vi^{n−}(Δt) (Vi^{n+}(Δt)), i = 1, …, K, weighted according to the corresponding probabilities Pi^n of the initial states.
Based on the system of differential equations (7.10), we can find these distributions for each time interval n for the lower and upper bounds of the function λ(t). We designate these distributions as Pj^{n−}(t) and Pj^{n+}(t), respectively. For the first time interval [0, Δt] (n = 1) these distributions are known. Without loss of generality we assume that the system is in state K at time t = 0, so PK^{1−}(0) = 1, P_{K−1}^{1−}(0) = … = P1^{1−}(0) = 0, and PK^{1+}(0) = 1, P_{K−1}^{1+}(0) = … = P1^{1+}(0) = 0. The system of differential equations to determine the state probabilities Pj^{n−}(t) and Pj^{n+}(t), j = 1, 2, …, K, for each time t ∈ [Δt(n−1), Δt·n], 1 ≤ n ≤ N, can be written in the following manner:

dPj^{n−}(t)/dt = Σ_{i=1, i≠j}^{K} Pi^{n−}(t)aij^{n−} − Pj^{n−}(t) Σ_{i=1, i≠j}^{K} aji^{n−}, j = 1, 2, …, K, n = 1, 2, …, N, (7.30)

dPj^{n+}(t)/dt = Σ_{i=1, i≠j}^{K} Pi^{n+}(t)aij^{n+} − Pj^{n+}(t) Σ_{i=1, i≠j}^{K} aji^{n+}, j = 1, 2, …, K, n = 1, 2, …, N. (7.31)
The initial conditions for the system of differential equations (7.30) were defined above for the first time interval. For any other time interval [Δt(n−1), Δt·n], n = 2, 3, …, N, the initial conditions are defined by the following recurrent formula:

Pj^{n−}(0) = Pj^{(n−1)−}(Δt), j = 1, …, K. (7.32)

This means that the initial conditions (initial distribution of state probabilities) for the next interval are defined by the solutions (distribution of state probabilities) at the end of the previous interval.
The initial conditions for the system of differential equations (7.31) are defined in the same manner. For the first time interval [0, Δt] (n = 1) the initial conditions were defined above: PK^{1+}(0) = 1 and P_{K−1}^{1+}(0) = … = P1^{1+}(0) = 0. For any other time interval [Δt(n−1), Δt·n], n = 2, 3, …, N, the initial conditions are defined by the following recurrent formula:

Pj^{n+}(0) = Pj^{(n−1)+}(Δt), j = 1, …, K. (7.33)

The mean reward accumulated during time interval n can now be obtained as the sum of the rewards Vi^{n−}(Δt) (Vi^{n+}(Δt)), i = 1, …, K, corresponding to this interval, weighted according to the initial state probabilities Pj^{n−}(Δt(n−1)) (Pj^{n+}(Δt(n−1))) found as the solution of Equation 7.30 (or 7.31) for the previous interval. Therefore, we obtain
V^{n−} = Σ_{i=1}^{K} Vi^{n−}(Δt) Pi^{n−}(Δt(n−1)), n = 1, …, N, (7.34)

V^{n+} = Σ_{i=1}^{K} Vi^{n+}(Δt) Pi^{n+}(Δt(n−1)), n = 1, …, N. (7.35)
Now the lower (upper) bound of the total expected reward (TER) accumulated during the system lifetime T = Δt·N can be obtained as the sum of the mean rewards V^{n−} (V^{n+}) over all N intervals:

TER^− = Σ_{n=1}^{N} V^{n−}, (7.36)

TER^+ = Σ_{n=1}^{N} V^{n+}. (7.37)
Repeating the calculations of TER^− and TER^+ for increasing N, one can estimate the TER with an assigned level of accuracy.
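The complete TER bounding procedure (7.26)-(7.37) can be sketched in Python for a two-state repairable element. The reward structure (a penalty rate while down, a cost per failure, a cost per repair) and all numerical values are illustrative assumptions, not taken from the text; since all rewards here are costs, the lower-rate chain yields the lower TER bound.

```python
# Two-state repairable element, state 0 = down, state 1 = up. All rates
# and rewards are illustrative assumptions.
def lam(t):
    return 0.5 + 0.3 * t * t            # increasing failure rate, 1/year

mu = 5.0                                # repair rate, 1/year
r = [[100.0, 250.0],                    # r00: penalty rate while down; r01: cost per repair
     [500.0, 0.0]]                      # r10: cost per failure; r11 = 0

def derivs(V, P, a):
    dV = [r[i][i]
          + sum(a[i][j] * r[i][j] for j in range(2) if j != i)
          + sum(a[i][j] * V[j] for j in range(2))
          for i in range(2)]
    dP = [sum(P[i] * a[i][j] for i in range(2) if i != j)
          - P[j] * sum(a[j][i] for i in range(2) if i != j)
          for j in range(2)]
    return dV, dP

def interval(P0, l, dt, steps=100):
    """Solve (7.26)/(7.30) over one interval with constant failure rate l."""
    a = [[-mu, mu], [l, -l]]
    V, P, h = [0.0, 0.0], list(P0), dt / steps
    for _ in range(steps):              # midpoint (RK2) integration
        dV1, dP1 = derivs(V, P, a)
        Vm = [V[i] + h / 2 * dV1[i] for i in range(2)]
        Pm = [P[i] + h / 2 * dP1[i] for i in range(2)]
        dV2, dP2 = derivs(Vm, Pm, a)
        V = [V[i] + h * dV2[i] for i in range(2)]
        P = [P[i] + h * dP2[i] for i in range(2)]
    return V, P

def ter_bounds(T, N):
    dt = T / N
    Plo, Pup = [0.0, 1.0], [0.0, 1.0]   # the system starts in the up state
    ter_lo = ter_up = 0.0
    for n in range(1, N + 1):
        l_lo, l_up = lam(dt * (n - 1)), lam(dt * n)      # (7.12), (7.13)
        Vlo, Plo_end = interval(Plo, l_lo, dt)           # lower-rate chain
        Vup, Pup_end = interval(Pup, l_up, dt)           # upper-rate chain
        ter_lo += sum(Vlo[i] * Plo[i] for i in range(2)) # (7.34)
        ter_up += sum(Vup[i] * Pup[i] for i in range(2)) # (7.35)
        Plo, Pup = Plo_end, Pup_end
    return ter_lo, ter_up                                # (7.36), (7.37)

for N in (4, 16, 64):
    lo, up = ter_bounds(5.0, N)
    print(N, round(lo, 2), round(up, 2))
```

As in the availability case, refining the partition narrows the gap between TER^− and TER^+, which gives the engineer direct control over the accuracy of the estimate.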
7.3 Reliability-associated Cost Assessment for Aging Multi-state System
Most technical systems are repairable. For many kinds of industrial systems, it is
very important to avoid failures or reduce their occurrences and duration in order
to improve system reliability and reduce the corresponding costs.
With the increasing complexity of systems, only specially trained staff with
specialized equipment can provide system service. In this case, maintenance ser-
vice is provided by an external agent and the owner is considered a customer of
the agent for maintenance service. In the literature, different aspects of mainte-
nance service have been investigated (Almeida 2001; Murthy and Asgharizadeh
1999; Asgharizadeh and Murthy 2000).
Usually there are a number of different companies that provide maintenance for
a technical system. From this point of view, a service market offers a customer
different types of maintenance contracts. Such contracts have different parameters,
related to the conditions of the services provided. The main parameters are response time, service time, and costs (Almeida 2001). Response time depends mainly on customer location. Service time depends on the repair team's professional skills and the required equipment. Generally, a faster response and a more qualified repair team provide more expensive services. We shall say that these parameters determine the maintenance contract level.
On the one hand, it is better for the customer to choose a contract with minimal repair costs; on the other hand, it should be taken into account that if the repair time increases, the losses due to system failures will be greater too. These losses are defined by the corresponding penalties paid when the system has failed. In addition, in order to make a decision, the customer should take into account the corresponding cost of system functioning, the operation cost. This cost is defined by the fuel, electric energy, etc. needed for system functioning. When the system or some of its parts fail, the operation cost will change. The sum of the operation costs, repair costs, and penalty costs accumulated during the system life span defines the RAC. The best decision for the customer leads to the contract that corresponds to a minimum of RAC.
In this section, a general approach is suggested for computing the RAC accumulated during the aging system's lifespan. The approach is based on a piecewise approximation of the increasing failure rate function over different time intervals and on consecutive applications of the Markov reward model. A special iterative computational procedure is suggested for estimating the RAC by determining its lower and upper bounds. The main advantage of the suggested approach is that it can easily be implemented in practice by reliability engineers, since it is based solely on ordinary Markov methods.
We will define RAC as the total cost incurred by the user in the operation and maintenance of a system during its lifetime. Thus, RAC will comprise the operations cost, the cost of repair, and the penalty cost accumulated during the system lifespan. Therefore,
RAC = OC + RC + PC , (7.39)
where
OC is the system operations cost accumulated during the systems lifetime. It
may be, for example, the cost of primary fuel for an electrical generator, the
cost of consuming electrical energy for an air conditioning system, and so on.
Introducing redundant elements usually requires additional operating cost. When the system or some of its elements have failed, the operation cost can decrease;
RC is the repair cost incurred by the user in operating and maintaining the sys-
tem during its lifetime; and
PC is the penalty cost accumulated during the system lifetime that was paid
when the system failed.
Suppose that T is the system lifetime. During this time the system may be in
acceptable states (system functioning) or in unacceptable states (system failure).
After any failure, a corresponding repair action is performed and the system returns to one of the previously acceptable states. Every entrance of the system into the set of unacceptable states (system failure), as well as the system's residence in unacceptable states, is associated with a penalty.
A maintenance contract is an agreement between the repair team and the sys-
tem's owner that guarantees a specific level of services being delivered. The main-
tenance contract defines some important parameters that determine the service
level and corresponding costs. The main time parameters are mean response time
and mean repair time. Without loss of generality, here we will deal with only one parameter, the mean repair time $T_r^m$, where $m$ ($m = 1, 2, \dots, M$) is a possible maintenance contract level and $M$ is the number of such levels.
The repair cost $c_r^m$ for an individual repair action depends on the repair time, and so it corresponds to the maintenance contract level $m$. It usually ranges between the most expensive repair, where the repair must be completed within the minimal specified time $T_r^{\min}$ after the failure occurrence, and the cheapest repair, where the repair must be completed within the maximal specified time $T_r^{\max}$ after the failure occurrence. Thus, $T_r^{\min} \le T_r^m \le T_r^{\max}$.
The problem is to find the expected RAC corresponding to each maintenance
contract. According to the suggested approach, this cost is represented by the total
expected reward, calculated via a specially developed Markov reward model.
Consider the air conditioning system used in hospitals (Lisnianski et al. 2008).
The system consists of two main online air conditioners and one air conditioner in
cold reserve. The operating schedule of the system is such that the reserve air conditioner comes online only when one of the main air conditioners has failed.
In the numerical calculation we used the following data. The increasing failure rates of both air conditioners are described by a Weibull distribution with failure rate function $\lambda(t) = \beta \alpha^{\beta} t^{\beta-1}$ and parameters $\alpha = 1.5849$, $\beta = 1.5021$ for the main air conditioners and $\alpha = 4.1865$, $\beta = 1.3821$ for the reserve air conditioner. So, for the main air conditioner $\lambda(t) = 3t^{0.5021}$ and for the reserve air conditioner $\lambda^*(t) = 10t^{0.3821}$.
The repair rates for the main and reserve air conditioners are the same, $\mu_m = \mu_m^*$, and may change from 7.7 day⁻¹ up to 6 h⁻¹, according to the maintenance contract level $m$. The repair cost $c_r^m$ also depends on the maintenance contract level. There are ten levels of maintenance contracts available on the market. They are characterized by the repair rate (MTTR⁻¹, where MTTR is the mean time to repair) and the corresponding repair cost per repair, as shown in Table 7.2. The operation cost $c_{op}$ is equal to $400 per year.
Using this method, we shall find the best maintenance contract level $m$ that provides a minimum of RAC during the system lifetime T = 10 years.
First, we build an ordinary Markov model for this system and a Markov reward model under the assumption that the failure rates are constant.
Fig. 7.10 State-transition diagram for the system with two online air conditioners and one air
conditioner in cold reserve
There are six states in the state-space diagram. In state 6 both main air condi-
tioners are online and the reserve air conditioner is available. In state 5 one of the
main air conditioners failed and was replaced by the reserve air conditioner. In
state 4 the second main air conditioner failed; only the reserve air conditioner is
online. In state 3 the reserve air conditioner failed, and only one main air condi-
tioner is online. In state 2 the reserve air conditioner failed, and two main air con-
ditioners are online. In state 1 the system suffers complete failure.
According to the technical requirements, two online air conditioners are needed, and so there are three acceptable states (states 2, 5, and 6) and three unacceptable states (states 1, 3, and 4). Any entrance into the set of unacceptable states is associated with a penalty cost $c_p$ equal to $1000 for each entrance.
Transitions from state 4 to state 5 and from state 1 to state 3 are associated with the repair of one of the main air conditioners and have an intensity of $2\mu$. Transitions from state 5 to state 6 and from state 3 to state 2 are associated with the repair of the main air conditioner and have an intensity of $\mu$. Transitions from state 3 to state 5, from state 2 to state 6, and from state 1 to state 4 are associated with the repair of the reserve air conditioner and have an intensity of $\mu^*$.
Thus, the transition intensity matrix for MSS with two online air conditioners
and one air conditioner in cold reserve is as follows:
$$\mathbf{a} = \begin{pmatrix}
-(2\mu+\mu^*) & 0 & 2\mu & \mu^* & 0 & 0 \\
0 & -(2\lambda^n+\mu^*) & 2\lambda^n & 0 & 0 & \mu^* \\
\lambda^n & \mu & -(\lambda^n+\mu+\mu^*) & 0 & \mu^* & 0 \\
\lambda^{*n} & 0 & 0 & -(\lambda^{*n}+2\mu) & 2\mu & 0 \\
0 & 0 & \lambda^{*n} & \lambda^n & -(\lambda^n+\lambda^{*n}+\mu) & \mu \\
0 & 0 & 0 & 0 & 2\lambda^n & -2\lambda^n
\end{pmatrix}. \qquad (7.40)$$
$$\begin{aligned}
\frac{dp_1(t)}{dt} &= \lambda^n p_3(t) + \lambda^{*n} p_4(t) - (2\mu + \mu^*)\, p_1(t),\\
\frac{dp_2(t)}{dt} &= \mu\, p_3(t) - (2\lambda^n + \mu^*)\, p_2(t),\\
\frac{dp_3(t)}{dt} &= \lambda^{*n} p_5(t) + 2\lambda^n p_2(t) + 2\mu\, p_1(t) - (\lambda^n + \mu + \mu^*)\, p_3(t),\\
\frac{dp_4(t)}{dt} &= \lambda^n p_5(t) + \mu^* p_1(t) - (\lambda^{*n} + 2\mu)\, p_4(t),\\
\frac{dp_5(t)}{dt} &= 2\lambda^n p_6(t) + 2\mu\, p_4(t) + \mu^* p_3(t) - (\lambda^n + \lambda^{*n} + \mu)\, p_5(t),\\
\frac{dp_6(t)}{dt} &= \mu\, p_5(t) + \mu^* p_2(t) - 2\lambda^n p_6(t).
\end{aligned} \qquad (7.41)$$
The system of differential equations (7.41) defines the ordinary Markov model
for the air conditioning system under the assumption that all failure rates are con-
stant.
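As a rough numerical sketch of this step, one can build the transition intensity matrix from the transitions described above and integrate the Chapman–Kolmogorov equations with a simple forward Euler scheme. The function names and all numeric rates below are illustrative assumptions, not the data of Table 7.2; a production implementation would use a stiff ODE solver.

```python
# Sketch: integrate dp/dt = p * a for the six-state model by forward Euler,
# with failure rates held constant inside a time interval. Rates (per year)
# are illustrative placeholders only.

def intensity_matrix(lam, lam_r, mu, mu_r):
    """Build matrix a (states 1..6 -> indices 0..5) from the transitions
    described in the text; lam/lam_r: main/reserve failure rates,
    mu/mu_r: main/reserve repair rates."""
    a = [[0.0] * 6 for _ in range(6)]
    a[0][2], a[0][3] = 2 * mu, mu_r             # 1->3, 1->4 (repairs)
    a[1][2], a[1][5] = 2 * lam, mu_r            # 2->3 (failure), 2->6 (repair)
    a[2][0], a[2][1], a[2][4] = lam, mu, mu_r   # 3->1, 3->2, 3->5
    a[3][0], a[3][4] = lam_r, 2 * mu            # 4->1, 4->5
    a[4][2], a[4][3], a[4][5] = lam_r, lam, mu  # 5->3, 5->4, 5->6
    a[5][4] = 2 * lam                           # 6->5 (one main fails)
    for i in range(6):
        a[i][i] = -sum(a[i])                    # diagonal closes each row
    return a

def state_probabilities(a, p0, t_end, steps=5000):
    """Forward Euler: p(t+h) = p(t) + h * (p(t) @ a)."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        p = [p[i] + h * sum(p[k] * a[k][i] for k in range(6))
             for i in range(6)]
    return p

a = intensity_matrix(lam=3.0, lam_r=10.0, mu=300.0, mu_r=300.0)
p = state_probabilities(a, p0=[0, 0, 0, 0, 0, 1.0], t_end=1.0)  # start in state 6
```

Because each row of the intensity matrix sums to zero, the probabilities stay normalized throughout the integration.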
In the second step, we build a Markov reward model for the given system under the assumption that the failure rates are constant. To calculate the total expected reward, the reward matrix for the system with two online air conditioners and one air conditioner in cold reserve is built in the following manner.
If the system is in state 6, 5, or 2, the costs associated with the use of two air conditioners (operation cost) must be paid during any time unit: $r_{66} = r_{55} = r_{22} = 2c_{op}$. If the system is in state 4 or 3, the reward associated with the use of only one air conditioner must be paid during any time unit: $r_{44} = r_{33} = c_{op}$. State 1 of the system is unacceptable, so there is no reward associated with this state: $r_{11} = 0$.
Transitions from state 5 to state 3 or 4, and from state 2 to state 3, are associated with the failure of one of the air conditioners, and the rewards associated with these transitions are a penalty: $r_{23} = r_{53} = r_{54} = c_p$. Transitions from state 4 or 3 to state 1 are associated with complete system failure. The rewards associated with these transitions are zero: $r_{31} = r_{41} = 0$. Transitions from state 1 to state 3 or 4, from state 2 to state 6, from state 3 to state 2 or 5, from state 4 to state 5, and from state 5 to state 6 are associated with the repair of an air conditioner, and the reward associated with each of these transitions is the mean cost of repair: $r_{13} = r_{14} = r_{26} = r_{32} = r_{35} = r_{45} = r_{56} = c_r^m$.
The reward matrix for the system with two online air conditioners and one air conditioner in cold reserve is as follows:

$$\mathbf{r} = \left[ r_{ij} \right] = \begin{pmatrix}
0 & 0 & c_r^m & c_r^m & 0 & 0 \\
0 & 2c_{op} & c_p & 0 & 0 & c_r^m \\
0 & c_r^m & c_{op} & 0 & c_r^m & 0 \\
0 & 0 & 0 & c_{op} & c_r^m & 0 \\
0 & 0 & c_p & c_p & 2c_{op} & c_r^m \\
0 & 0 & 0 & 0 & 0 & 2c_{op}
\end{pmatrix}. \qquad (7.42)$$
Taking into consideration transition intensity matrix (7.40), the system of dif-
ferential equations for the calculation of the total expected reward may be written
in the following manner (7.43).
The system of differential equations (7.43) defines the Markov reward model
for the air conditioning system under the assumption that all failure rates are con-
stant.
$$\begin{aligned}
\frac{dV_1^n(t)}{dt} &= c_r^m(2\mu + \mu^*) - (2\mu + \mu^*)V_1^n(t) + 2\mu V_3^n(t) + \mu^* V_4^n(t),\\
\frac{dV_2^n(t)}{dt} &= 2c_{op} + 2c_p\lambda^n + c_r^m\mu^* - (2\lambda^n + \mu^*)V_2^n(t) + 2\lambda^n V_3^n(t) + \mu^* V_6^n(t),\\
\frac{dV_3^n(t)}{dt} &= c_{op} + c_r^m(\mu + \mu^*) + \lambda^n V_1^n(t) + \mu V_2^n(t) - (\lambda^n + \mu + \mu^*)V_3^n(t) + \mu^* V_5^n(t),\\
\frac{dV_4^n(t)}{dt} &= c_{op} + 2c_r^m\mu + \lambda^{*n} V_1^n(t) - (\lambda^{*n} + 2\mu)V_4^n(t) + 2\mu V_5^n(t),\\
\frac{dV_5^n(t)}{dt} &= 2c_{op} + c_p(\lambda^n + \lambda^{*n}) + c_r^m\mu + \lambda^{*n} V_3^n(t) + \lambda^n V_4^n(t) - (\lambda^n + \lambda^{*n} + \mu)V_5^n(t) + \mu V_6^n(t),\\
\frac{dV_6^n(t)}{dt} &= 2c_{op} + 2\lambda^n V_5^n(t) - 2\lambda^n V_6^n(t).
\end{aligned} \qquad (7.43)$$
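System (7.43) follows the general Markov reward pattern $dV_i/dt = r_{ii} + \sum_{j\ne i} a_{ij} r_{ij} + \sum_j a_{ij} V_j(t)$ with $V_i(0) = 0$. The sketch below integrates that pattern by forward Euler for a deliberately tiny two-state example; all names and numbers are hypothetical, chosen only to show how an intensity matrix and a reward matrix combine into a total expected reward.

```python
# Generic Markov reward integration (forward Euler), following the pattern
# of the reward ODEs in the text. a: intensity matrix, r: reward matrix
# (r[i][i] = reward per unit time in state i, r[i][j] = reward on i->j).

def total_expected_reward(a, r, t_end, steps=10000):
    n = len(a)
    # constant part: u_i = r_ii + sum_{j != i} a_ij * r_ij
    u = [r[i][i] + sum(a[i][j] * r[i][j] for j in range(n) if j != i)
         for i in range(n)]
    h = t_end / steps
    v = [0.0] * n
    for _ in range(steps):
        v = [v[i] + h * (u[i] + sum(a[i][j] * v[j] for j in range(n)))
             for i in range(n)]
    return v

# Toy example: state 0 = working (operation cost 400/yr), state 1 = failed;
# a repair (rate 50/yr) carries a mean repair cost of 100 per repair.
lam, mu = 2.0, 50.0
a = [[-lam, lam], [mu, -mu]]
r = [[400.0, 0.0], [100.0, 0.0]]
v = total_expected_reward(a, r, t_end=1.0)
```

Here `v[i]` is the expected cost accumulated over one year when starting from state `i`; starting from the failed state is more expensive because a repair fee is incurred almost immediately.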
Now, using the method presented in Section 7.2, the RAC can be calculated for any maintenance contract level m = 1, 2, ..., 10 (Table 7.2) by performing the following steps:
1. Define the system lifetime T = 10 years and the number of time intervals N = 10.
Figure 7.11 shows the lower and upper bounds of the expected RAC for T = 10
years and N = 10 ( t = 1 year ) as a function of the MTTR.
Fig. 7.11 The lower and upper bounds and the exact value of the total expected reward (RAC) versus MTTR
The MTTR which provides the minimal expected RAC for the system is 1.2 days. Choosing a more expensive maintenance contract level, we make an additional payment to the repair team; choosing a less expensive one, we pay more in penalties because of more entrances into unacceptable states.
Decreasing the length of the interval $\Delta t$ decreases the difference between the lower and upper bounds of the expected reliability-associated cost. For example, if $\Delta t$ = 1 year, the lower and upper bounds of the expected RAC for MTTR = 1.2 days are $19,372 and $21,388, respectively, and the difference is 10.4%. If $\Delta t$ = 0.01 year, this difference is only 0.093%, and if $\Delta t$ = 0.001 year, the difference is negligible, and the value $20,324 may be accepted as the exact value of the expected RAC. In Figure 7.11 the results of the calculation for $\Delta t$ = 0.001 year are presented as functions of the MTTR. Because of the very small difference, the corresponding curves in Figure 7.11 are presented as a single curve, "Exact Value".
7.4 Optimal Corrective Maintenance Contract Planning for Aging Multi-state System
corresponding costs. The main time parameters are mean response time and mean repair time. Without loss of generality, here we will deal with only one parameter, the mean repair time $T_r^m$, where $m$ ($m = 1, 2, \dots, M$) is a possible maintenance contract level and $M$ is the number of such levels. In addition, it should be taken into account that each maintenance contract has a fixed expiration date. For example, it may be an agreement for 1 year only, after which the system owner can choose another maintenance contract from the contracts available on the market. Therefore, for the entire system lifetime $T$, a sequence of maintenance contracts $m_1, m_2, \dots, m_L$ will define the MSS maintenance, where $L$ is the number of different contract periods.
A repair cost $c_r^m$ for an individual repair action depends on the repair time, and so it corresponds to a maintenance contract level $m$. It usually ranges between the most expensive repair, where the repair should be completed within the minimal specified time $T_r^{\min}$ after failure occurrence, and the lowest cost, where the repair should be completed within the maximal specified time $T_r^{\max}$ after failure occurrence. Thus, $T_r^{\min} \le T_r^m \le T_r^{\max}$.
According to the generic multi-state model, the system or system components can have different states corresponding to various performance levels, represented by the set $g = \{g_1, \dots, g_K\}$. The set is ordered so that $g_i \ge g_{i-1}$. The failure rate is defined as the transition intensity of the system or components for any transition from state $i$ to state $j$, $i > j$ ($g_i \ge g_j$). In this section we are dealing only with minimal repairs, so after each repair the failed system is returned to its working state with the failure rate unchanged (the reliability remains "as bad as old" after repair).
The repair cost $c_r^m$ corresponding to the repair time $T_r^m$ depends on the maintenance contract level $m$. Therefore, the system total expected cost also depends on the maintenance contract level $m$ and can be designated as $E[C_{TC}^m]$, where $E$ is the expectation symbol.
The MSS availability $A_w(t)$, according to Lisnianski and Levitin (2003), is treated as the probability that the MSS at instant $t > 0$ will be in one of the acceptable states, where the system performance is greater than or equal to the required demand level $w$.
The problem is to find a maintenance contract from the sequence of maintenance contracts $m_1, m_2, \dots, m_L$ that minimizes the total expected cost accumulated during the system lifetime $T$ and provides the desired system availability level, that is, a system availability $A_w(T)$ at lifetime $T$ larger than a pre-defined value $A_{w0}(T)$.
Thus, mathematically the problem can be formulated as follows. Find

$$\min_{m_1, m_2, \dots, m_L} E\left[ C_{TC}^{m} \right], \qquad (7.44)$$

subject to

$$A_w(T) \ge A_{w0}(T). \qquad (7.45)$$
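For intuition about the size of this optimization problem: with $M$ contract levels and $L$ contract periods there are $M^L$ candidate sequences, so exhaustive search is possible only for very small cases (hence the genetic algorithm used later in this section). Below is a toy sketch on a tiny instance, in which `toy_cost` and `toy_availability` are artificial stand-ins for the Markov-reward evaluations of $E[C_{TC}^m]$ and $A_w(T)$.

```python
# Brute-force illustration of problem (7.44)-(7.45) on a tiny instance.
from itertools import product

M_LEVELS, L_PERIODS = 3, 4

def toy_cost(seq):
    # more reliable (higher-level) contracts cost more per period
    return sum(10 + 2 * m for m in seq)

def toy_availability(seq):
    # availability drops as levels fall below the top level M_LEVELS - 1
    return 1.0 - 0.02 * sum(M_LEVELS - 1 - m for m in seq)

def best_sequence(a_req):
    feasible = [s for s in product(range(M_LEVELS), repeat=L_PERIODS)
                if toy_availability(s) >= a_req]
    return min(feasible, key=toy_cost)

best = best_sequence(a_req=0.97)
```

With these toy functions, the availability constraint forces nearly all periods to the top contract level, and the optimizer saves money by downgrading exactly one period.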
The ordinary Markov model for MSSs was built under the assumption that time
to failure and time to repair are distributed exponentially and there is no aging in
the system (failure rate function is constant in the lifetime), which cannot be di-
rectly used in the availability estimation for aging MSSs. However, using the
technique proposed in Section 7.2, the failure rates can be assumed to be constant
values for a specific time interval but vary with different time intervals. Therefore,
an ordinary Markov model can be used iteratively in different time intervals to
calculate the corresponding system availability.
The suggested algorithm for the calculation of the total expected cost and avail-
ability for any maintenance contract level m includes the following steps:
1. Set the system lifetime T years and number of time intervals N.
2. Calculate the length of each time interval, $\Delta t = T/N$.
3. Calculate $\lambda^{n-}$ and $\lambda^{n+}$ for every interval $n$, $n = 1, \dots, N$, according to formulas (7.12) and (7.13).
4. Calculate the state probabilities $P_j^{n-}$ and $P_j^{n+}$, $j = 1, 2, \dots, K$, at the end of each time interval $n$ as described in Section 7.2.2.
5. Calculate the lower and upper bounds for the MSS availability during the system lifetime as described in Section 7.2.2. The MSS availability lies within these bounds.
6. Calculate the lower and upper bounds of the rewards $V_i^{n-}$ and $V_i^{n+}$ accumulated during each time interval $n$ as described in Section 7.2.3.
7. Calculate the lower and upper bounds for the expected rewards $V^{n-}$ and $V^{n+}$ for each time interval $n$ via expressions (7.34) and (7.35) as the weighted sums.
8. Calculate the lower and upper bounds for the total expected reward accumulated during the system lifetime via formulas (7.36)–(7.38). The system total expected cost lies within these bounds.
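Steps 2 and 3 above are easy to make concrete for a non-decreasing failure rate function: on each interval the lower bound is the value at the left endpoint and the upper bound the value at the right endpoint (this is the role formulas (7.12)–(7.13) play; their exact definitions may differ). The helper `interval_bounds` below is a hypothetical name, and the Weibull-type rate of the Section 7.3 example is reused purely as an illustration.

```python
# Sketch of steps 2-3: piecewise-constant lower/upper bounds of an
# increasing failure rate function on the N intervals of [0, T].

def interval_bounds(lam_t, T, N):
    dt = T / N                                     # step 2: interval length
    lam_minus = [lam_t(dt * (n - 1)) for n in range(1, N + 1)]  # left ends
    lam_plus = [lam_t(dt * n) for n in range(1, N + 1)]         # right ends
    return lam_minus, lam_plus

# Weibull-type rate of the main air conditioner from Section 7.3:
lam = lambda t: 3.0 * t ** 0.5021
lo, hi = interval_bounds(lam, T=10.0, N=10)
```

Each pair `(lo[n], hi[n])` then feeds steps 4–8 twice, once per bound, producing the bracketing values for availability and total expected cost.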
Consider an air conditioning system used in hospitals that consists of two inde-
pendent air conditioners. Each air conditioner has three states: a perfectly func-
tioning state, a deteriorating state, and a complete failure state. We consider two
types of MSS failures in the model: major failure and minor failure. A major fail-
ure causes the air conditioner to transition from the perfectly functioning state to
the complete failure state, while a minor failure causes the air conditioner to tran-
sition from the perfectly functioning state to a deteriorating state or from the dete-
riorating state to the complete failure state. A major repair returns the air condi-
tioner from the complete failure state to the perfectly functioning state, while a
minor repair returns the air conditioner from the deteriorating state to the perfectly
functioning state. The state-space diagrams for the first and second air condition-
ers are shown in Figures 7.13 and 7.14, respectively.
Fig. 7.13 State-transition diagram for Fig. 7.14 State-transition diagram for condi-
conditioner 1 tioner 2
The cooling capacities (performance levels) of the first air conditioner are $g_3^1 = 233$ kW, $g_2^1 = 150$ kW, and $g_1^1 = 0$ for states 3, 2, and 1, respectively. The cooling capacities of the second air conditioner are $g_3^2 = 220$ kW, $g_2^2 = 130$ kW, and $g_1^2 = 0$ for states 3, 2, and 1, respectively.
The state-space diagram of the system is presented in Figure 7.15. All the sys-
tem states are generated as combinations of all possible states of air conditioners.
There are nine system states: state 9 is a perfectly functioning state, state 1 is a to-
tal failure state, and other states are deteriorating states. The cooling capacities of
the system states are $g_3^1 + g_3^2 = 453$ kW, $g_2^1 + g_3^2 = 370$ kW, $g_3^1 + g_2^2 = 363$ kW, $g_2^1 + g_2^2 = 280$ kW, $g_3^1 + g_1^2 = 233$ kW, $g_1^1 + g_3^2 = 220$ kW, $g_2^1 + g_1^2 = 150$ kW, $g_1^1 + g_2^2 = 130$ kW, and $g_1^1 + g_1^2 = 0$.
The increasing failure rates of the system are described as linear functions: $\lambda_{g_3^1, g_1^1}(t) = 10 + 0.4t$, $\lambda_{g_2^1, g_1^1}(t) = 10 + 0.2t$, $\lambda_{g_3^1, g_2^1}(t) = 12 + 0.6t$, …
There are eight repair contracts available on the market. Each contract is characterized by repair rates (ranging from $1/T_r^{\max}$ to $1/T_r^{\min}$) and repair costs for different kinds of failures, as shown in Table 7.4. The system owner may select a repair contract for each year.
The system cooling load $w$ is 300 kW. The operational cost is $C_{op} = \$0.06$/kWh. In states 1–6, the system cooling capacity is lower than the demand; these states constitute the set of unacceptable states. For each entrance into the set of unacceptable states, a penalty cost of $C_p = \$500$ must be paid. States 7–9 constitute the set of acceptable states.
The system availability is $A_w(t) = p_7(t) + p_8(t) + p_9(t)$. The problem is to find the optimal sequence of repair contracts for each year that minimizes the system total expected cost accumulated during a lifetime of T = 10 years and provides the required availability $A_{w0}(T) = 0.97$.
For a specific time interval $n$, the lower ($\lambda^{n-}$) or upper ($\lambda^{n+}$) bounds of $\lambda(t)$ are used to represent the failure rates in Equations 7.15 and 7.16. By solving these systems of differential equations, we can determine the state probabilities $P_j^{n-} = p_j^{n-}(\Delta t\, n)$ and $P_j^{n+} = p_j^{n+}(\Delta t\, n)$ at the end of each time interval $t_n = [\Delta t(n-1), \Delta t\, n]$, $1 \le n \le N$. The probability $P_j^{n-}$ is the probability of state $j$ at the end of time interval $n$ if the failure rates during this interval are constant and equal to the lower bounds $\lambda^{n-}$ of $\lambda(t)$:
$$\begin{aligned}
\frac{dp_1^n(t)}{dt} ={}& -\big(\mu_{g_1^1,g_3^1} + \mu_{g_1^2,g_3^2}\big)\, p_1^n(t) + \lambda^n_{g_2^2,g_1^2}\, p_2^n(t) + \lambda^n_{g_2^1,g_1^1}\, p_3^n(t) + \lambda^n_{g_3^2,g_1^2}\, p_4^n(t) + \lambda^n_{g_3^1,g_1^1}\, p_5^n(t),\\
\frac{dp_2^n(t)}{dt} ={}& -\big(\mu_{g_2^2,g_3^2} + \mu_{g_1^1,g_3^1} + \lambda^n_{g_2^2,g_1^2}\big)\, p_2^n(t) + \lambda^n_{g_3^2,g_2^2}\, p_4^n(t) + \lambda^n_{g_2^1,g_1^1}\, p_6^n(t) + \lambda^n_{g_3^1,g_1^1}\, p_7^n(t),\\
\frac{dp_3^n(t)}{dt} ={}& -\big(\mu_{g_2^1,g_3^1} + \mu_{g_1^2,g_3^2} + \lambda^n_{g_2^1,g_1^1}\big)\, p_3^n(t) + \lambda^n_{g_3^1,g_2^1}\, p_5^n(t) + \lambda^n_{g_2^2,g_1^2}\, p_6^n(t) + \lambda^n_{g_3^2,g_1^2}\, p_8^n(t),\\
\frac{dp_4^n(t)}{dt} ={}& \mu_{g_1^2,g_3^2}\, p_1^n(t) + \mu_{g_2^2,g_3^2}\, p_2^n(t) - \big(\lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^2,g_1^2} + \mu_{g_1^1,g_3^1}\big)\, p_4^n(t) + \lambda^n_{g_2^1,g_1^1}\, p_8^n(t) + \lambda^n_{g_3^1,g_1^1}\, p_9^n(t),\\
\frac{dp_5^n(t)}{dt} ={}& \mu_{g_1^1,g_3^1}\, p_1^n(t) + \mu_{g_2^1,g_3^1}\, p_3^n(t) - \big(\lambda^n_{g_3^1,g_2^1} + \lambda^n_{g_3^1,g_1^1} + \mu_{g_1^2,g_3^2}\big)\, p_5^n(t) + \lambda^n_{g_2^2,g_1^2}\, p_7^n(t) + \lambda^n_{g_3^2,g_1^2}\, p_9^n(t),\\
\frac{dp_6^n(t)}{dt} ={}& -\big(\mu_{g_2^1,g_3^1} + \mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_2^1,g_1^1}\big)\, p_6^n(t) + \lambda^n_{g_3^1,g_2^1}\, p_7^n(t) + \lambda^n_{g_3^2,g_2^2}\, p_8^n(t),\\
\frac{dp_7^n(t)}{dt} ={}& \mu_{g_1^1,g_3^1}\, p_2^n(t) + \mu_{g_2^1,g_3^1}\, p_6^n(t) - \big(\mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^1,g_2^1}\big)\, p_7^n(t) + \lambda^n_{g_3^2,g_2^2}\, p_9^n(t),\\
\frac{dp_8^n(t)}{dt} ={}& \mu_{g_1^2,g_3^2}\, p_3^n(t) + \mu_{g_2^2,g_3^2}\, p_6^n(t) - \big(\mu_{g_2^1,g_3^1} + \lambda^n_{g_2^1,g_1^1} + \lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^2,g_2^2}\big)\, p_8^n(t) + \lambda^n_{g_3^1,g_2^1}\, p_9^n(t),\\
\frac{dp_9^n(t)}{dt} ={}& \mu_{g_1^1,g_3^1}\, p_4^n(t) + \mu_{g_1^2,g_3^2}\, p_5^n(t) + \mu_{g_2^2,g_3^2}\, p_7^n(t) + \mu_{g_2^1,g_3^1}\, p_8^n(t) - \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^1,g_2^1}\big)\, p_9^n(t).
\end{aligned}$$
A similar system of differential equations for the calculation of the state probabilities $P_j^{n+}$ can be obtained if the upper bounds $\lambda^{n+}$ of $\lambda(t)$ are used to represent the failure rates of the system.
For a specific time interval $n$, the lower ($\lambda^{n-}$) or upper ($\lambda^{n+}$) bounds of $\lambda(t)$ are used to represent the failure rates in Equations 7.26 and 7.27. By solving these systems of differential equations, we can determine the rewards $V_i^{n-}$ and $V_i^{n+}$ accumulated during each time interval:
$$\begin{aligned}
\frac{dV_1^n(t)}{dt} ={}& 0 \cdot c_{op} + \mu_{g_1^1,g_3^1} c^{r,m}_{g_1^1,g_3^1} + \mu_{g_1^2,g_3^2} c^{r,m}_{g_1^2,g_3^2} - \big(\mu_{g_1^1,g_3^1} + \mu_{g_1^2,g_3^2}\big) V_1^n(t) + \mu_{g_1^2,g_3^2} V_4^n(t) + \mu_{g_1^1,g_3^1} V_5^n(t),\\
\frac{dV_2^n(t)}{dt} ={}& 130 c_{op} + \mu_{g_2^2,g_3^2} c^{r,m}_{g_2^2,g_3^2} + \mu_{g_1^1,g_3^1} c^{r,m}_{g_1^1,g_3^1} + \lambda^n_{g_2^2,g_1^2} V_1^n(t) - \big(\lambda^n_{g_2^2,g_1^2} + \mu_{g_2^2,g_3^2} + \mu_{g_1^1,g_3^1}\big) V_2^n(t) + \mu_{g_2^2,g_3^2} V_4^n(t) + \mu_{g_1^1,g_3^1} V_7^n(t),\\
\frac{dV_3^n(t)}{dt} ={}& 150 c_{op} + \mu_{g_2^1,g_3^1} c^{r,m}_{g_2^1,g_3^1} + \mu_{g_1^2,g_3^2} c^{r,m}_{g_1^2,g_3^2} + \lambda^n_{g_2^1,g_1^1} V_1^n(t) - \big(\mu_{g_2^1,g_3^1} + \mu_{g_1^2,g_3^2} + \lambda^n_{g_2^1,g_1^1}\big) V_3^n(t) + \mu_{g_2^1,g_3^1} V_5^n(t) + \mu_{g_1^2,g_3^2} V_8^n(t),\\
\frac{dV_4^n(t)}{dt} ={}& 220 c_{op} + \mu_{g_1^1,g_3^1} c^{r,m}_{g_1^1,g_3^1} + \lambda^n_{g_3^2,g_1^2} V_1^n(t) + \lambda^n_{g_3^2,g_2^2} V_2^n(t) - \big(\mu_{g_1^1,g_3^1} + \lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^2,g_1^2}\big) V_4^n(t) + \mu_{g_1^1,g_3^1} V_9^n(t),\\
\frac{dV_5^n(t)}{dt} ={}& 233 c_{op} + \mu_{g_1^2,g_3^2} c^{r,m}_{g_1^2,g_3^2} + \lambda^n_{g_3^1,g_1^1} V_1^n(t) + \lambda^n_{g_3^1,g_2^1} V_3^n(t) - \big(\mu_{g_1^2,g_3^2} + \lambda^n_{g_3^1,g_2^1} + \lambda^n_{g_3^1,g_1^1}\big) V_5^n(t) + \mu_{g_1^2,g_3^2} V_9^n(t),\\
\frac{dV_6^n(t)}{dt} ={}& 280 c_{op} + \mu_{g_2^1,g_3^1} c^{r,m}_{g_2^1,g_3^1} + \mu_{g_2^2,g_3^2} c^{r,m}_{g_2^2,g_3^2} + \lambda^n_{g_2^1,g_1^1} V_2^n(t) + \lambda^n_{g_2^2,g_1^2} V_3^n(t) - \big(\mu_{g_2^1,g_3^1} + \mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_2^1,g_1^1}\big) V_6^n(t) + \mu_{g_2^1,g_3^1} V_7^n(t) + \mu_{g_2^2,g_3^2} V_8^n(t),\\
\frac{dV_7^n(t)}{dt} ={}& 300 c_{op} + \big(\lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^1,g_2^1}\big) c_p + \mu_{g_2^2,g_3^2} c^{r,m}_{g_2^2,g_3^2} + \lambda^n_{g_3^1,g_1^1} V_2^n(t) + \lambda^n_{g_2^2,g_1^2} V_5^n(t) + \lambda^n_{g_3^1,g_2^1} V_6^n(t)\\
& - \big(\mu_{g_2^2,g_3^2} + \lambda^n_{g_2^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^1,g_2^1}\big) V_7^n(t) + \mu_{g_2^2,g_3^2} V_9^n(t),\\
\frac{dV_8^n(t)}{dt} ={}& 300 c_{op} + \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_2^1,g_1^1} + \lambda^n_{g_3^2,g_2^2}\big) c_p + \mu_{g_2^1,g_3^1} c^{r,m}_{g_2^1,g_3^1} + \lambda^n_{g_3^2,g_1^2} V_3^n(t) + \lambda^n_{g_2^1,g_1^1} V_4^n(t) + \lambda^n_{g_3^2,g_2^2} V_6^n(t)\\
& - \big(\mu_{g_2^1,g_3^1} + \lambda^n_{g_2^1,g_1^1} + \lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^2,g_2^2}\big) V_8^n(t) + \mu_{g_2^1,g_3^1} V_9^n(t),\\
\frac{dV_9^n(t)}{dt} ={}& 300 c_{op} + \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^1,g_1^1}\big) c_p + \lambda^n_{g_3^1,g_1^1} V_4^n(t) + \lambda^n_{g_3^2,g_1^2} V_5^n(t) + \lambda^n_{g_3^2,g_2^2} V_7^n(t) + \lambda^n_{g_3^1,g_2^1} V_8^n(t)\\
& - \big(\lambda^n_{g_3^2,g_1^2} + \lambda^n_{g_3^1,g_1^1} + \lambda^n_{g_3^2,g_2^2} + \lambda^n_{g_3^1,g_2^1}\big) V_9^n(t).
\end{aligned}$$
A similar system of differential equations for the rewards $V_i^{n+}$ can also be obtained if the upper bounds $\lambda^{n+}$ of $\lambda(t)$ are used to represent the failure rates of the system.
Ten years have been divided into 120 intervals, each interval being 1 month long. The failure rate has been approximated by its lower and upper bounds on each of these intervals. The proposed GA has been used to determine the optimal maintenance contract schedule. The stopping criterion requires that at least 120 genetic cycles be performed and that there be at least five consecutive genetic cycles without improvement of the solution performance. The population size in the GA is 40.
The convergence characteristics of the proposed GA using the lower bound ap-
proximation of failure rates are shown in Figure 7.16.
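The GA itself is not listed in this section; the toy sketch below only shows the general shape of such a search: integer chromosomes (one contract level per period), truncation selection, one-point crossover, and random mutation. The cost function is a deliberately artificial stand-in for the Markov-reward evaluation, and all GA parameters besides the population size of 40 and the 120 cycles are invented.

```python
# Toy GA sketch for the contract-scheduling idea.
import random

L_PERIODS, M_LEVELS = 10, 8

def toy_cost(chromosome):
    # cheap contracts (higher index) save fees but raise penalty-like costs
    return sum((m + 1) * 10 + (M_LEVELS - m) * 12 for m in chromosome)

def evolve(pop_size=40, cycles=120, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randrange(M_LEVELS) for _ in range(L_PERIODS)]
           for _ in range(pop_size)]
    for _ in range(cycles):
        pop.sort(key=toy_cost)                  # best (lowest cost) first
        survivors = pop[: pop_size // 2]        # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, L_PERIODS)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:              # mutation
                child[rng.randrange(L_PERIODS)] = rng.randrange(M_LEVELS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=toy_cost)

best = evolve()
```

In the method of this section, `toy_cost` would be replaced by the lower-bound (or upper-bound) total expected cost computed from the Markov reward model, with an availability constraint handled by a penalty term.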
Table 7.5 Lower and upper bounds of system total expected cost
Coit and Smith 1996). The failure rate function obtained from experts can be represented in tabular form (van Noortwijk et al. 1992). For an MR whose duration is relatively small compared to the time between failures, the expected number of failures is equal to the expected number of repairs in any time interval. Thus, it is possible to obtain the renewal function of each element, that is, the expected number of repairs in the time interval [0, t). This expected number of element failures/repairs $N(t_i)$ can be estimated for the different time intervals $[0, t_i)$ between consecutive PRs.
In this section, we consider the determination of the optimal schedule of cyclic
PRs for MSS with a given series-parallel configuration and two-state elements.
Each element of this system is characterized by its nominal performance and re-
newal function, obtained from experimental data or elicited from expert opinion.
The times and costs of the two types of maintenance activity (PR and MR) are
also available for each system element. The objective is to provide the desired sys-
tem availability at a minimal total maintenance cost and penalty costs caused by
system mission losses (performance deficiency).
The presented method presumes independence between replacement and repair
activities for different system elements. Such an assumption is justified, for exam-
ple, in complex distributed systems (power systems, computer networks, etc.)
where the information about system element repairs and replacements may be in-
accessible for the maintenance staff servicing the given element. In the general
case, the method, which assumes independence of maintenance actions in the sys-
tem, gives the worst estimation of system availability.
Another important assumption is that repair and replacement times are much
smaller than time between failures. In this case, the probability of replacement and
repair event coincidences may be neglected.
In systems with cyclic variable demand (double-shift job-shop production,
power or water supply, etc.), the PR can be performed in periods of low demand
even if the repairs of some of the system elements are not finished. For example,
in power generation systems some important elements may be replaced at night
when the power demand is much lower than the nominal demand. In these cases,
the replacement time may be neglected and all the maintenance actions may be
considered as independent.
system life cycle is $(x_j + 1)\, N_j\big(T/(x_j+1)\big)\, c_j$.

Under the formulated assumptions, the expected time that the jth system element is unavailable can be estimated by the following expression:

$$(x_j + 1)\, N_j\big(T/(x_j+1)\big)\, \tau_{cj} + x_j \tau_{pj}, \qquad (7.50)$$

so that the stationary availability of the jth element is

$$p_j = \frac{T - (x_j + 1)\, N_j\big(T/(x_j+1)\big)\, \tau_{cj} - x_j \tau_{pj}}{T}, \qquad (7.51)$$

the total expected maintenance time $\tau_{tot}$ during the system life cycle is

$$\tau_{tot} = \sum_{j=1}^{n} \left[ (x_j + 1)\, N_j\big(T/(x_j+1)\big)\, \tau_{cj} + x_j \tau_{pj} \right], \qquad (7.52)$$

and the expected maintenance cost $C_m$ during the system life cycle is

$$C_m = \sum_{j=1}^{n} \left[ (x_j + 1)\, N_j\big(T/(x_j+1)\big)\, c_{cj} + x_j c_{pj} \right], \qquad (7.53)$$
where ccj and cpj are corrective and preventive maintenance costs, respectively.
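Expressions (7.50)–(7.53) can be sketched directly in code for a single element. The renewal function `N_lin` and every numeric input below are illustrative placeholders; in the method, each $N_j$ comes from experimental data or expert elicitation.

```python
# Direct sketch of expressions (7.50)-(7.53) for one element with x
# preventive replacements over life cycle T.

def element_unavailable_time(x, N, T, tau_c, tau_p):
    """Eq. 7.50: expected time element j is unavailable."""
    return (x + 1) * N(T / (x + 1)) * tau_c + x * tau_p

def element_availability(x, N, T, tau_c, tau_p):
    """Eq. 7.51: p_j = (T - unavailable time) / T."""
    return (T - element_unavailable_time(x, N, T, tau_c, tau_p)) / T

def element_maintenance_cost(x, N, T, c_c, c_p):
    """Summand of Eq. 7.53 for one element."""
    return (x + 1) * N(T / (x + 1)) * c_c + x * c_p

N_lin = lambda t: 0.1 * t   # toy renewal function: 0.1 expected repairs/month
u = element_unavailable_time(x=5, N=N_lin, T=120, tau_c=0.4, tau_p=0.0007)
c = element_maintenance_cost(x=5, N=N_lin, T=120, c_c=0.02, c_p=3.0)
```

Summing the per-element time and cost over all elements reproduces $\tau_{tot}$ of (7.52) and $C_m$ of (7.53).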
Having the steady-state performance distribution of each system element $j$, $g_j = \{0, g_j\}$, $p_j = \{1 - p_j,\; p_j\}$, one can obtain the entire system steady-state output performance distribution using the UGF method (Chapter 4), and for the given steady-state demand distribution $\mathbf{w}, \mathbf{q}$ one can obtain the system steady-state reliability indices: the availability $A$ and the expected performance deficiency $D$.
The total unsupplied demand cost during the system life cycle T can be esti-
mated as
$$C_{ud} = T\, c_u\, D. \qquad (7.54)$$
subject to

$$A(x^*) \ge A', \qquad \tau_{tot}(x^*) \le \tau'. \qquad (7.56)$$
Formulation 2 Find the system replacement policy x* that minimizes the total
maintenance and unsupplied demand cost while the total maintenance time does
not exceed a prespecified limitation:
subject to

$$A(x^*) \ge A', \qquad \tau_{tot}(x^*) \le \tau'. \qquad (7.60)$$
Different elements can have different possible numbers of PR actions during the system lifetime. The possible maintenance alternatives (numbers of PR actions) for each system element $j$ can be ordered in a vector $Y_j = \{y_{j1}, \dots, y_{jK}\}$, where $y_{ji}$ is the number of preventive maintenance actions corresponding to alternative $i$ for system element $j$. The same number $K$ of possible alternatives (the length of the vectors $Y_j$) can be defined for each element. If, in practical problems, the number of alternatives differs among elements, some entries of the shorter vectors $Y_j$ can be duplicated to equalize the vectors' lengths.
Each solution is represented by an integer string $a = \{a_1, \dots, a_n\}$, where $a_j$ ($1 \le a_j \le K$) represents the number of the maintenance alternative applied to element $j$. Hence, the vector $x$ for the given solution, represented by string $a$, is $x = \{y_{1 a_1}, \dots, y_{n a_n}\}$. For example, for a problem with $n = 5$, $K = 4$, $Y_1 = Y_2 = Y_3 = \{2, 3, 4, 5\}$, and $Y_4 = Y_5 = \{20, 45, 100, 100\}$, the string $a = \{1, 4, 4, 3, 2\}$ represents a solution with $x = \{2, 5, 5, 100, 45\}$. Any arbitrary integer string with elements belonging to the interval $[1, K]$ represents a feasible solution.
For each given string $a$, the decoding procedure first obtains the vector $x$ and estimates $N(x_j)$ for all the system elements $1 \le j \le n$; then it calculates the availability indices of each two-state system element using expression (7.51) and determines the entire system steady-state output performance distribution using the UGF method, in accordance with the specified system structure and the given steady-state performance distributions of the elements. It also determines $\tau_{tot}$ and $C_m$ using expressions (7.52) and (7.53). After obtaining the entire system steady-state output performance distribution, the procedure evaluates $A$ and $C_{ud}$ using expressions (4.29), (1.21), (4.34), (1.31), and (7.54).
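The first part of the decoding step can be sketched with the text's own small example (n = 5, K = 4); `decode` is a hypothetical helper name for the procedure described above.

```python
# Decoding an integer string a into the replacement vector x.

def decode(a, Y):
    """Map string a = (a_1,...,a_n), 1 <= a_j <= K, to x = (y_{1,a_1},...)."""
    return [Y[j][a_j - 1] for j, a_j in enumerate(a)]

Y = [[2, 3, 4, 5], [2, 3, 4, 5], [2, 3, 4, 5],
     [20, 45, 100, 100], [20, 45, 100, 100]]
x = decode([1, 4, 4, 3, 2], Y)   # -> [2, 5, 5, 100, 45], as in the text
```

The rest of the procedure (availability via (7.51), UGF composition, and cost evaluation) then operates on this vector `x`.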
In order to let the GA look for the solution with the minimal total cost, with A not less than the required value A′ and τtot not exceeding τ′, the solution fitness is evaluated as follows:
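The exact fitness expression (7.60) is defined in the book; a common penalization scheme in the same spirit can be sketched as follows. M, P, and all names here are illustrative assumptions, not the book's exact form.

```python
# Hedged sketch of a penalized fitness for the constrained GA search
# described above; the exact form of (7.60) may differ. M and P are
# illustrative penalty coefficients.

def fitness(C_total, A, tau_tot, A_req, tau_req, M=5000.0, P=2000.0):
    """Total cost plus penalties for violating the availability and
    total-maintenance-time constraints; the GA minimizes this value."""
    penalty = M * max(0.0, A_req - A) + P * max(0.0, tau_tot - tau_req)
    return C_total + penalty

# A solution satisfying both constraints pays no penalty:
print(fitness(263.1, 0.9606, 9.2, A_req=0.96, tau_req=10.0))  # 263.1
```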
All the replacement times in the system considered are equal to 0.5 h (0.0007
month). The corrective maintenance includes fault location search and turning of
the elements, so it takes much more time than preventive replacement, but repairs
are much cheaper than replacements.
316 7 Aging Multi-state Systems
Element  g            cp     cc     c (for x = 5, 10, 15, 20, 25, 30; t = 24, 12, 8, 6, 4.8, 4)
1        0.40  3.01  0.019  0.002   25, 10.0, 5.0, 2.0, 1.00, 0.50
2        0.30  2.21  0.049  0.004   26, 9.0, 2.0, 0.6, 0.20, 0.05
3        0.60  2.85  0.023  0.008   20, 4.0, 1.0, 0.3, 0.08, 0.01
4        0.15  2.08  0.017  0.005   36, 14.0, 9.0, 6.0, 4.00, 3.00
5        0.15  1.91  0.029  0.003   55, 15.0, 7.0, 4.0, 0.32, 0.30
6        0.25  0.95  0.031  0.009   31, 9.5, 5.6, 4.0, 2.70, 2.00
7        1.00  5.27  0.050  0.002   13, 3.2, 1.4, 0.8, 0.50, 0.10
8        0.70  4.41  0.072  0.005   5, 2.0, 1.0, 0.4, 0.10, 0.01
The demand distribution is presented in Table 7.7. The total life cycle T is 120
months and the cost of 1% of unsupplied demand for 1 month is cu = 10 conven-
tional units.
For the sake of simplicity, we use in this example the same vector of replacement frequency alternatives for all the elements. The possible number of replacements during the system life cycle varies from 5 to 30 with step 5. The chosen parameters of the fitness function (7.60) are M = 5000 and 2000. The solutions for the first formulation of the problem, in which unsupplied demand cost is not considered, were obtained first. (Three different solutions are presented in Table 7.8.) The total maintenance time and cost as functions of system availability are shown in Figures 7.18 and 7.19. Note that each point of the graphs corresponds to an optimal solution.
Then the unsupplied demand cost was introduced and the problem was solved
in its second formulation. The solutions corresponding to the minimal and maxi-
mal possible system availability (minimal and maximal maintenance cost) are pre-
sented in Table 7.8, as is the optimal solution, which minimizes the total cost. One
can see that the optimal maintenance solution allows about 50% total cost reduc-
tion to be achieved in comparison with minimal Cm and minimal Cud solutions.
7.5 Optimal Preventive Replacement Policy for Aging Multi-state Systems 317
Table 7.8

Formulation 1
            Solution                                Cud    Cm      Cm+Cud  τtot   A
A′ = 0.96   {5,5,5,5,5,5,5,5,10,10,10,10,5,5}      0.0    263.1   263.1   9.2    0.9606
A′ = 0.97   {5,5,5,5,10,10,5,5,10,10,25,25,5,5}    0.0    296.6   296.6   7.7    0.9700
A′ = 0.98   {5,5,5,5,15,15,10,10,10,25,25,25,5,5}  0.0    384.4   384.4   5.85   0.9800

Formulation 2
Minimal Cm               {5,5,5,5,5,5,5,5,5,5,5,5,5,5}                1029.5  249.1   1278.6  11.61  0.9490
Minimal Cud (maximal A)  {30,30,30,30,30,30,30,30,25,25,30,30,30,30}  156.4   1060.3  1216.7  2.47   0.9880
Minimal Cm+Cud           {5,5,5,5,20,20,10,10,10,10,30,30,5,5}        256.2   397.4   653.5   6.02   0.9800

General formulation
Minimal Cm+Cud, τ′ = 3   {10,20,20,20,25,20,30,30,25,25,30,30,10,5}   181.7   674.7   856.4   2.99   0.9877
Fig. 7.18 Total maintenance cost as function of system availability

Fig. 7.19 Total maintenance time as function of system availability
Fig. 7.20 System cost (Cud+Cm, Cm, Cud) under maintenance time limitations

Fig. 7.21 Steady-state availability under maintenance time limitations
References
Almeida de AT (2001) Multicriteria decision making on maintenance: spares and contract planning. Eur J Oper Res 129:235–241
Asgharizadeh E, Murthy DNP (2000) Service contracts: a stochastic model. Math Comp Model 31:11–20
Bagdonavicius V, Nikulin M (2002) Accelerated life models. Chapman & Hall/CRC, Boca Raton, FL
Barlow R, Proschan F (1975) Statistical theory of reliability and life testing. Holt, Rinehart and Winston, New York
Coit D, Smith A (1996) Reliability optimization of series-parallel systems using genetic algorithm. IEEE Trans Reliab 45(2):254–266
Ding Y, Lisnianski A, Frenkel I et al (2009) Optimal corrective maintenance contract planning for aging multi-state system. Appl Stoch Models Bus Ind 25(5):612–631
Finkelstein M (2003) A model of aging and a shape of the observed force of mortality. Lifetime Data Anal 9:93–109
Finkelstein M (2005) On some reliability approaches to human aging. Int J Reliab Qual Saf Eng 12(4):337–346
Finkelstein M (2008) Failure rate modelling for reliability and risk. Springer, London
Gertsbakh IB (2000) Reliability theory with applications to preventive maintenance. Springer, Berlin
Gertsbakh IB, Kordonsky KB (1969) Models of failure. Springer, New York
Howard R (1960) Dynamic programming and Markov processes. MIT Press, Cambridge, MA
Jackson C, Pascual R (2008) Optimal maintenance service contract negotiation with aging equipment. Eur J Oper Res 189:387–398
Kececioglu D (1991) Reliability engineering handbook, part I and II. Prentice Hall, Englewood Cliffs, NJ
Kuo W, Prasad V (2000) An annotated overview of system-reliability optimization. IEEE Trans Reliab 49(2):176–187
Lisnianski A, Frenkel I (2009) Non-homogeneous Markov reward model for aging multi-state system under minimal repair. Int J Performab Eng 5(4):303–312
Lisnianski A, Frenkel I, Khvatskin L et al (2008) Maintenance contract assessment for aging systems. Qual Reliab Eng Int 24:519–531
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Martorell S, Sanchez A, Serradell V (1999) Age-dependent reliability model considering effects of maintenance and working conditions. Reliab Eng Sys Saf 64:19–31
Meeker W, Escobar L (1998) Statistical methods for reliability data. Wiley, New York
Monga A, Zuo M (1998) Optimal system design considering maintenance and warranty. Comp Oper Res 25:691–705
Munoz A, Martorell S, Serradell V (1997) Genetic algorithms in optimizing surveillance and maintenance of components. Reliab Eng Sys Saf 57:107–120
Murthy DNP, Asgharizadeh E (1999) Optimal decision making in a maintenance service operation. Eur J Oper Res 116:259–273
Murthy DNP, Atrens A, Eccleston JA (2002) Strategic maintenance management. J Qual Maint 8(4):287–305
Murthy DNP, Yeung V (1995) Modelling and analysis of maintenance service contracts. Math Comp Model 22:219–225
Trivedi K (2002) Probability and statistics with reliability, queuing and computer science applications. Wiley, New York
Valdez-Flores C, Feldman RM (1989) A survey of preventive maintenance models for stochastically deteriorating single-unit systems. Naval Res Logist 36:419–446
Van Noortwijk J, Dekker R, Cooke R et al (1992) Expert judgment in maintenance optimization. IEEE Trans Reliab 41:427–432
Wang H (2002) A survey of maintenance policies of deteriorating systems. Eur J Oper Res 139:469–489
Welke S, Johnson B, Aylor J (1995) Reliability modeling of hardware/software systems. IEEE Trans Reliab 44(3):413–418
Wendt H, Kahle W (2006) Statistical analysis of some parametric degradation models. In: Nikulin M, Commenges D, Huber-Carol C (eds) Probability, statistics and modelling in public health. Springer Science+Business Media, Berlin, pp 266–279
Xie M, Poh KL, Dai YS (2004) Computing system reliability: models and analysis. Kluwer/Plenum, New York
Zhang F, Jardine AKS (1998) Optimal maintenance models with minimal repair, periodic overhaul and complete renewal. IIE Trans 30:1109–1119
8 Fuzzy Multi-state System: General Definition
and Reliability Assessment
8.1 Introduction
In conventional multi-state theory, it is assumed that the exact probability and per-
formance level of each component state are given. However, with the progress in
modern industrial technologies, the product development cycle has become shorter
and shorter while the lifetime of products has become longer and longer (Huang et
al. 2006). In many highly reliable applications, there may be only a few available
observations. Therefore, it is difficult to obtain sufficient data to estimate the pre-
cise values of these probabilities and performance levels in these systems. More-
over, inaccuracy in system models that is caused by human error is difficult to
deal with solely by means of conventional reliability theory (Huang et al. 2004).
In some cases, in order to reduce the computational burden, a simplified model is used to represent a complex system and an MSS model is used to characterize a continuous-state system, which can reduce computational accuracy. New techniques and theories are needed to solve these fundamental problems.
Fuzzy set theory provides a useful tool to complement conventional reliability theories. Cai (1996), Singer (1990), Guan and Wu (2006), Misra and Weber (1990), Utkin and Gurov (1996), Chen (1994), and Cheng and Mon (1993) attempted to define and evaluate system reliabilities in terms of fuzzy set theory and techniques, i.e., "probist" reliability theory, "posbist" reliability theory, "profust" reliability theory, and fuzzy fault tree analysis. In some recent research, posbist fault tree analysis of coherent systems was discussed (Huang et al. 2004). Huang et al. (2006) proposed a Bayesian reliability analysis for fuzzy lifetime data.
There are few works focusing on reliability assessment of MSS using fuzzy set theory. Ding et al. (2008) have made an attempt at this problem. The fuzzy universal generating function (FUGF) was developed to extend the UGF with crisp sets (Ding and Lisnianski 2008), which is widely used in the reliability evaluation of conventional MSS (Lisnianski and Levitin 2003). The basic definition of a fuzzy multi-state system (FMSS) model is also given: the state probability and the state performance of each state are represented as fuzzy values.

8.2 Key Definitions and Concepts of a Fuzzy Multi-state System

In this section, key definitions and concepts of FMSS are introduced and developed. The natural extension of the crisp definition for conventional MSS to the fuzzy set definition for FMSS is that the state probabilities and state performances of a component are represented as fuzzy values.
R(A, k) = (1/2)[Rr(A, k) + Rl(A, k)].   (8.1)
The first criterion, therefore, is set as a comparison of the removals of two dif-
ferent fuzzy numbers with respect to k (Kaufmann 1988). Relative to k = 0, the
removal number R( A, k ) is equivalent to an ordinary representative of the fuzzy
number. If fuzzy number A is represented by a triplet ( a1 , a2 , a3 ) , then the ordinary
representative is given by
Â = (a1 + 2a2 + a3)/4.   (8.2)
Fig. 8.1 Removals with respect to k for a fuzzy number A
A1 = (4, 6, 7):  Â1 = (4 + 2·6 + 7)/4 = 5.75,
A2 = (4, 5, 9):  Â2 = (4 + 2·5 + 9)/4 = 5.75,
A3 = (3, 5, 10): Â3 = (3 + 2·5 + 10)/4 = 5.75,
A4 = (0, 0, 0):  Â4 = (0 + 2·0 + 0)/4 = 0.
Therefore, A4 < A1 , A2 , A3 .
Next, the second criterion (the mode) is used to order A1, A2, and A3:

A1 = (4, 6, 7):  mode = 6,
A2 = (4, 5, 9):  mode = 5,
A3 = (3, 5, 10): mode = 5.

Therefore, A1 > A2, A3.
Finally, the third criterion (the divergence) is used to order A2 and A3:

A2 = (4, 5, 9):  divergence = 9 − 4 = 5,
A3 = (3, 5, 10): divergence = 10 − 3 = 7.

Therefore, A3 > A2.
We obtain the linear order, A1 > A3 > A2 > A4 .
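The three criteria can be sketched as a lexicographic sort key: the ordinary representative (the removal at k = 0, Equation 8.2), then the mode, then the divergence. Function and variable names are illustrative.

```python
# Sketch of the three ordering criteria for triangular fuzzy numbers
# (a1, a2, a3), applied lexicographically as in the text.

def removal(t):
    a1, a2, a3 = t
    return (a1 + 2 * a2 + a3) / 4  # Equation 8.2, the ordinary representative

def rank_key(t):
    # larger removal first, then larger mode, then larger divergence
    return (removal(t), t[1], t[2] - t[0])

A1, A2, A3, A4 = (4, 6, 7), (4, 5, 9), (3, 5, 10), (0, 0, 0)
order = sorted([('A1', A1), ('A2', A2), ('A3', A3), ('A4', A4)],
               key=lambda kv: rank_key(kv[1]), reverse=True)
print([name for name, _ in order])  # ['A1', 'A3', 'A2', 'A4']
```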
Suppose the performance set of a component is represented
by G = {G0 , G1 , G2 , G3} , where G3 = A1 , G2 = A3 , G1 = A2 and G0 = A4 .
We propose the following definitions and examples that determine and illustrate important FMSS properties (Ding et al. 2008):

Definition 8.1 A FMSS is in state j or above if the system performance level is greater than or equal to kj, a predefined fuzzy or crisp value. Let φ(G) represent the system structure function, which maps the space of the components' fuzzy performance levels into the space of the system's fuzzy performance levels, and let φj represent the state of the system. Then we have

Pr{(φ(G) − kj) ≥ 0} = Pr{φj}.
The minimum requirement to ensure that the system is in state j or above is set as crisp values but represented as triplets, with kj = (0, 0, 0), (1.5, 1.5, 1.5), (2, 2, 2) for j = 0, 1, 2, respectively. Suppose that two components are both in state 1. The system performance level can be evaluated as

φ(g11, g21) = g11 + g21 = (0.65, 0.7, 0.75) + (0.75, 0.8, 0.85) = (1.4, 1.5, 1.6) = (a1, a2, a3).
As shown in Figure 8.2, for a1 > k0 the FMSS is definitely in state 0 or above;
for a3 < k2 , the FMSS is definitely not in state 2.
However, for a1 < k1 < a3 there exists the uncertainty of FMSS being in state 1.
φj(G) = 1, if φ(G) ≥ kj, with possibility Ω(φj(G) = 1);
φj(G) = 0, if φ(G) < kj, with possibility Ω(φj(G) = 0).
Some new parameters defined in Ding et al. (2008) are supplemented and used
to evaluate the possibility.
Fig. 8.2 Fuzzy performance level
The adequacy index for system state j, which determines the relation between the system performance level and the state performance requirement kj, is defined as

rj = Φ − Kj = {rj, μrj(rj) | rj = φ − kj, φ ∈ Φ, kj ∈ Kj},   (8.3)

μrj(rj) = sup over rj = φ − kj of min{μΦ(φ), μKj(kj)}.   (8.4)

The cardinality of the fuzzy set rj is

|rj| = Σ over rj ∈ Rj of μrj(rj).   (8.5)

If the membership function of rj is continuous, the cardinality of the fuzzy set rj is

|rj| = ∫ over rj ∈ Rj of μrj(rj) drj.   (8.6)
SRj = {rj ∈ Rj | rj ≥ 0}.   (8.7)
And let

|srj| = Σ over srj ∈ SRj of μsrj(srj).   (8.9)

If the membership function is continuous,

|srj| = ∫ over srj ∈ SRj of μsrj(srj) dsrj.   (8.10)
Pr{(φ(G) − kj) ≥ 0} = Pr{φ(G), Ω(φj(G)) = 1}
                    = Pr{φ(G)} × (srj)rel,   (8.12)

where Pr{φ(G)} is the probability that the system has performance level φ(G).
|sr1| = ∫ over sr1 ∈ SR1 of μsr1(sr1) dsr1 = 0.5 × 1 × (0.1 − 0) = 0.05,

Pr{φ(g11, g21)} = Pr(g11) × Pr(g21)
               = (0.095, 0.1, 0.105) × (0.195, 0.2, 0.205)
               = (0.018525, 0.02, 0.021525),

Pr{φ(g11, g21) ≥ k1} = Pr{φ(g11, g21)} × (sr1)rel
                     = (0.018525, 0.02, 0.021525) × 0.5
                     = (0.0092625, 0.01, 0.0107625).
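The calculation above can be sketched with component-wise arithmetic on triangular triplets, the approximation used throughout this section; all names are illustrative.

```python
# Sketch of the possibility-weighted state probability computed above,
# using component-wise arithmetic on triangular triplets (a, b, c).

def tri_mul(x, y):
    return tuple(xi * yi for xi, yi in zip(x, y))

def tri_scale(x, s):
    return tuple(xi * s for xi in x)

p11 = (0.095, 0.1, 0.105)   # Pr(g11)
p21 = (0.195, 0.2, 0.205)   # Pr(g21)
p_state = tri_mul(p11, p21)
p_ge_k1 = tri_scale(p_state, 0.5)   # multiply by (sr1)rel = 0.5
print(tuple(round(v, 7) for v in p_state))   # (0.018525, 0.02, 0.021525)
print(tuple(round(v, 7) for v in p_ge_k1))   # (0.0092625, 0.01, 0.0107625)
```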
component i and the system have the same number of states. If all the components are fuzzy strongly relevant to the system, the FMSS must be homogenous.
The following example illustrates the definition.
Example 8.3 Consider a FMSS with two components. Each component and the system have three states. The components' performance levels for different states are shown in Table 8.1. The performance levels of the derated state (state 1) of the components are represented as fuzzy values. The FMSS structure function is φ(G) = min{g1j, g2j}. The minimum requirement to ensure that the system is in state j or above is represented as crisp values, with kj = 0, 0.8, 1 for j = 0, 1, 2, respectively.

j     0    1     2
kj    0    0.8   1
Given the system performance level, the possibility of the system's staying in or above a state can be evaluated by Equation 8.11. The following example illustrates the definition.

Example 8.4 Consider a FMSS with two components. Each component and the system have three states. The components' performance levels for different states are shown in Table 8.2. The performance levels of the degraded state (state 1) of the components are represented as fuzzy values. The FMSS structure function is φ(G) = g1j + g2j. The minimum requirement to ensure that the system is in state j or above is represented as crisp values, with kj = 0, 0.5, 1.1 for j = 0, 1, 2, respectively.

j      0    1                    2
g1j    0    (0.45, 0.5, 0.55)    1
kj     0    0.5                  1.1
where 1 ≤ i ≤ n.
Comparing this definition with Definition 8.3, we only require that at least one state of a fuzzy weakly relevant component have a possible nontrivial influence on the state of the system; the corresponding change of the system state is only possible, not certain. The following example illustrates this concept.
Example 8.5 Consider a FMSS with two components. Each component and the system have three states. The components' performance levels for different states are shown in Table 8.3. The FMSS structure function is φ(G) = min{g1j, g2j}. The minimum requirement to ensure that the system is in state j or above is represented as crisp values, with kj = 0, 0.85, 1 for j = 0, 1, 2, respectively.

j     0    1      2
kj    0    0.85   1
When component 2 is in state 2, φ(g11, g22) = min{g11, g22} = (0.8, 0.85, 0.9), so (sr0)rel = 1, (sr1)rel = 0.5, and (sr2)rel = 0.
It can be seen that under the condition that component 1 is in state 1, when component 2 changes its state from 0 to 1, the possibility of the system's staying in state 1 or above does not change ((sr1)rel = 0); when component 2 changes its state from 1 to 2, the possibility of the system's staying in state 1 or above increases from 0 to 0.5 ((sr1)rel = 0.5). Thus, we conclude that component 2 is fuzzy weakly relevant to the system structure based on Definition 8.4.
From the above definitions we notice that a fuzzy strongly relevant component
satisfies the requirements for fuzzy relevant and fuzzy weakly relevant compo-
nents; and a fuzzy relevant component satisfies the requirements for fuzzy weakly
relevant components. Moreover, a fuzzy weakly relevant component may be a
fuzzy relevant component or a fuzzy strongly relevant component; and a fuzzy
relevant component can be a fuzzy strongly relevant component.
Definition 8.5 Let φ be a function with domain ∏ over i = 1 to n of (gi0, gi1, ..., giMi). The structural function φ represents a fuzzy multi-state monotone system if:

1. φ(gij, G) does not provide a lower system fuzzy performance level than φ(gil, G) for j ≥ l, 0 ≤ j ≤ Mi, 1 ≤ l ≤ Mi.
2. φ(g1,0, ..., gi,0, ..., gn,0) = kmin for 1 ≤ i ≤ n, where kmin is the lowest system fuzzy performance level.
3. φ(g1M1, ..., giMi, ..., gnMn) = kmax for 1 ≤ i ≤ n, where kmax is the greatest system fuzzy performance level.
4. If the FMSS is homogenous, the possibility of φ(g1j, ..., gij, ..., gnj) being larger than or equal to kj is larger than 0 for 1 ≤ j ≤ M and 1 ≤ i ≤ n.
Based on this definition, we can say that the increase of the state of any system
components will not degrade the system fuzzy performance level. In addition
when all components are working perfectly the greatest system fuzzy performance
level is achieved; and when all components have completely failed, the lowest
system fuzzy performance level is achieved. However, for a homogenous system
the condition that when all components are in the same state the system is also
definitely in the same state is relaxed. We only require that when all components
are in the same state the system have a nontrivial possibility of being in the same
state.
For example, in Example 8.2, φ(g10, g20) = 0, φ(g11, g21) = (1.4, 1.5, 1.6), and φ(g12, g22) = 2. Obviously, φ(g12, g22) > φ(g11, g21) > φ(g10, g20), φ(g10, g20) = kmin, and φ(g12, g22) = kmax. The possibilities of φ(g12, g22), φ(g11, g21), and φ(g10, g20) being larger than or equal to kj = 2, 1.5, 0 for j = 2, 1, 0 are 1, 0.5, and 1, respectively. Therefore, in Example 8.2 the structural function represents a fuzzy multi-state monotone system.
Definition 8.7 Two component fuzzy performance vectors G and G* are said to be equivalent if and only if φ(G) − φ(G*) = 0. We use the notation G ≈ G* to indicate that these two vectors are equivalent.
The following example illustrates this concept.
Example 8.6 Consider two performance vectors G = (G1, G2, G3) = {(0, 0.5, 1), (0.5, 1, 1.5), (1.5, 2, 2.5)} and G* = (G*1, G*2, G*3). Thus, φ(G) = G1 + G2 + G3 = (2, 3.5, 5), φ(G*) = G*1 + G*2 + G*3 = (2, 3.5, 5), and φ(G) − φ(G*) = (0, 0, 0). Therefore these two vectors are equivalent.
a difficult condition to satisfy in the fuzzy domain. Small deviations of fuzzy values will change the conclusion.

Example 8.7 Suppose G*1 = (0.6, 0.8, 1) and the other variables are the same as in Example 8.6 (see Figure 8.3). The definition of equivalence alone is not sufficient for evaluating the property of fuzzy performance vectors.
Definition 8.8 Two component fuzzy performance vectors G and G* are said to be approximately equivalent within a degree ε if and only if S(φ(G), φ(G*)) ≤ ε. We use the notation G ≈ε G* to indicate that these two vectors are approximately equivalent.

Fig. 8.3 Fuzzy performance levels φ(G) and φ(G*)
8.3 Reliability Evaluation of Fuzzy Multi-state Systems

The UGF method is the primary approach for reliability evaluation of MSSs. The fuzzy universal generating functions (FUGFs) developed in Ding and Lisnianski (2008) can be used to evaluate the defined FMSS, as summarized in the following subsections.
The fuzzy performance distribution (PD) gi = {gi1, ..., giMi}, pi = {pi1, ..., piMi} of component i can be represented in the following form:

ui(z) = Σ over ji = 1 to Mi of piji z^giji,   (8.14)

where gi and pi are, respectively, the performance set and probability set represented by fuzzy sets for component i.
To obtain the fuzzy PD of a FMSS with an arbitrary structure, a general fuzzy composition operator Ωφ is used over the z-transform fuzzy representations of the n system components:

U(z) = Ωφ( Σ over j1 = 1 to M1 of p1j1 z^g1j1, ..., Σ over jn = 1 to Mn of pnjn z^gnjn )
     = Σ(j1 = 1 to M1) Σ(j2 = 1 to M2) ... Σ(jn = 1 to Mn) ( pj z^φ(g1j1, ..., gnjn) )   (8.15)
     = Σ(j1) Σ(j2) ... Σ(jn) ( pj z^gj ),
where pj and gj can be evaluated using Equations 8.16 and 8.17, respectively. The probability of system state j represented by a fuzzy set can be calculated as

pj = {pj, μpj(pj) | pj = ∏(i = 1 to n) piji, piji ∈ Piji},   (8.16)

gj = φ(g1j1, ..., giji, ..., gnjn) = {gj, μgj(gj) | gj = φ(g1j1, ..., giji, ..., gnjn), giji ∈ Giji},   (8.17)

where μgj(gj) = sup over φ(g1j1, ..., giji, ..., gnjn) = gj of min{μg1j1, ..., μgnjn}, and φ(g1j1, ..., giji, ..., gnjn) is the system structure function. If the structure function is the sum of the components' performances,

gj = {gj, μgj(gj) | gj = Σ(i = 1 to n) giji, giji ∈ Giji};   (8.18)

if it is the minimum of the components' performances,

gj = {gj, μgj(gj) | gj = min(g1j1, ..., giji, ..., gnjn), giji ∈ Giji},   (8.19)

where μgj(gj) = sup over gj = min(g1j1, ..., giji, ..., gnjn) of min{μg1j1, ..., μgnjn}.
The suggested approach is called the FUGF technique.
U(z) = Σ over j = 1 to Ms of pj z^gj.   (8.20)
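The composition (8.15) can be sketched for two components whose state probabilities and performances are triangular triplets; here the structure operator is the parallel (sum) one of Equations 8.23 and 8.24, and the data and names are illustrative assumptions.

```python
from itertools import product

# Minimal FUGF composition sketch for two independent components; each
# state carries a triangular probability triplet and performance triplet.

def tri_add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def tri_mul(x, y):
    return tuple(a * b for a, b in zip(x, y))

def compose(u1, u2, phi=tri_add):
    """u1, u2: lists of (p, g) pairs; returns the combined fuzzy PD
    before like terms are collected."""
    return [(tri_mul(p1, p2), phi(g1, g2))
            for (p1, g1), (p2, g2) in product(u1, u2)]

u1 = [((0.095, 0.1, 0.105), (0.65, 0.7, 0.75)),
      ((0.895, 0.9, 0.905), (1.0, 1.0, 1.0))]
u2 = [((0.195, 0.2, 0.205), (0.75, 0.8, 0.85)),
      ((0.795, 0.8, 0.805), (1.0, 1.0, 1.0))]
U = compose(u1, u2)
print(len(U))  # 4 system states before collecting like terms
```

Passing a component-wise min as phi gives the series operator in the triplet approximation used later in this section.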
The system fuzzy availability A can be evaluated using the following operator ΩA:

A(w) = ΩA(U(z), w) = ΩA( Σ over j = 1 to Ms of pj z^gj, w )
     = ΩA{ ..., (pj, (srj)rel), ... }
     = {A, μA(A) | A = Σ over j = 1 to Ms of pj (srj)rel, pj ∈ Pj},   (8.21)

where μA(A) = sup over A = Σ(j = 1 to Ms) pj (srj)rel of min{μp1, ..., μpj, ..., μpMs}.
From Equation 8.21, the operator ΩA uses the following procedures to obtain the system fuzzy availability:
1. Obtain the system FUGF as shown in Equation 8.20.
2. For a given demand w, evaluate srj and (srj)rel for system state j using Equations 8.3-8.10.
The series-parallel system is one of the most important MSSs. A gas transmission system is a typical example of such a system. In order to obtain the FUGF of a FMSS, the composition operators are used recursively to obtain the FUGFs of the intermediate series or parallel subsystems.
Consider a series-parallel system with fuzzy values of the performance rates (levels) and probabilities, where the components are statistically independent. The performance rates and probabilities of the components are assumed to be triangular fuzzy numbers, represented as triplets (a, b, c); this is one of the most important classes of fuzzy numbers and is used in many practical situations because of its simplicity in mathematical calculations (Kaufmann and Gupta 1988). The membership function is defined as

μX(x) = 0 for x < a;
μX(x) = (x − a)/(b − a) for a ≤ x ≤ b;
μX(x) = (c − x)/(c − b) for b ≤ x ≤ c;
μX(x) = 0 for x > c.   (8.22)
gj = ΩP(g1j1, ..., giji, ..., gnjn) = ( Σ(i = 1 to n) aiji, Σ(i = 1 to n) biji, Σ(i = 1 to n) ciji ),   (8.23)

where ΩP is the fuzzy parallel operator and the component performance giji is represented as the triplet (aiji, biji, ciji).
According to 8.16 and the fuzzy arithmetic operations on triangular fuzzy numbers (Cai 1996; Chen and Mon 1994), the subsystem probability pj can be obtained as

pj = ( ∏(i = 1 to n) aiji, ∏(i = 1 to n) biji, ∏(i = 1 to n) ciji ),   (8.24)
where ΩS is the fuzzy series operator and g_j^l is the l-cut of the fuzzy set gj, which contains all elements with a degree of membership greater than or equal to l, g_j^l = {gj | μgj(gj) ≥ l}; g_j^l is expressed as an interval [a_j^l, c_j^l] as shown in Figure 8.4.

Fig. 8.4 The l-cut interval [a_j^l, c_j^l] of a fuzzy performance level
Let [a1j1^l, c1j1^l] and [a2j2^l, c2j2^l] be, respectively, the confidence intervals at level l of g1j1 and g2j2, as shown in Figure 8.5.

Fig. 8.5 g1j1 and g2j2 with l-cut
It is assumed that a1j1 ≤ a2j2; therefore, there are four possibilities for the result of 8.26.

Case 1: b1j1 ≤ b2j2 and c1j1 ≤ c2j2. Obviously, in this case g1j1 is definitely less than or equal to g2j2, and ΩS(g1j1, g2j2) can be represented by the triplet (a1j1, b1j1, c1j1).

Case 2: b1j1 ≥ b2j2 and c1j1 ≥ c2j2, as shown in Figure 8.4. The membership function of ΩS(g1j1, g2j2), represented by the solid line, is
μΩS(g1j1, g2j2)(x) =
  0                            for x < a1j1,
  (x − a1j1)/(b1j1 − a1j1)     for a1j1 ≤ x ≤ x*,
  (x − a2j2)/(b2j2 − a2j2)     for x* ≤ x ≤ b2j2,
  (c2j2 − x)/(c2j2 − b2j2)     for b2j2 ≤ x ≤ c2j2,
  0                            for x > c2j2,   (8.27)

where x* = (a1j1 b2j2 − a2j2 b1j1)/(b2j2 − b1j1 − a2j2 + a1j1).
Case 3: b1j1 ≥ b2j2 and c1j1 ≤ c2j2. The membership function of ΩS(g1j1, g2j2) is

μΩS(g1j1, g2j2)(x) =
  0                            for x < a1j1,
  (x − a1j1)/(b1j1 − a1j1)     for a1j1 ≤ x ≤ x*,
  (x − a2j2)/(b2j2 − a2j2)     for x* ≤ x ≤ b2j2,
  (c2j2 − x)/(c2j2 − b2j2)     for b2j2 ≤ x ≤ y*,
  (c1j1 − x)/(c1j1 − b1j1)     for y* ≤ x ≤ c1j1,
  0                            for x > c1j1,   (8.28)

where x* = (a1j1 b2j2 − a2j2 b1j1)/(b2j2 − b1j1 − a2j2 + a1j1) and y* = (c2j2 b1j1 − c1j1 b2j2)/(b1j1 − c1j1 − b2j2 + c2j2).
Case 4: b1j1 ≤ b2j2 and c1j1 ≥ c2j2. The membership function of ΩS(g1j1, g2j2) is

μΩS(g1j1, g2j2)(x) =
  0                            for x < a1j1,
  (x − a1j1)/(b1j1 − a1j1)     for a1j1 ≤ x ≤ b1j1,
  (c1j1 − x)/(c1j1 − b1j1)     for b1j1 ≤ x ≤ y*,
  (c2j2 − x)/(c2j2 − b2j2)     for y* ≤ x ≤ c2j2,
  0                            for x > c2j2,   (8.29)

where y* = (c2j2 b1j1 − c1j1 b2j2)/(b1j1 − c1j1 − b2j2 + c2j2).
ΩS(g1j1, g2j2) = ∪ over l of l [a_j^l, c_j^l] ≈ ( min(a1j1, a2j2), min(b1j1, b2j2), min(c1j1, c2j2) ).
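A sketch contrasting the l-cut construction of the fuzzy min with the triplet approximation stated above; the two triangular numbers g1 and g2, like all names, are illustrative assumptions.

```python
# Sketch of the fuzzy series (min) operator: build the result from l-cuts
# [min(a1^l, a2^l), min(c1^l, c2^l)] on a grid of l values, and compare
# with the triplet approximation (min a, min b, min c) given above.

def lcut(t, l):
    a, b, c = t
    return (a + l * (b - a), c - l * (c - b))

def fuzzy_min_cuts(t1, t2, grid=11):
    cuts = []
    for i in range(grid):
        l = i / (grid - 1)
        (lo1, hi1), (lo2, hi2) = lcut(t1, l), lcut(t2, l)
        cuts.append((l, min(lo1, lo2), min(hi1, hi2)))
    return cuts

def fuzzy_min_triplet(t1, t2):
    return tuple(min(x, y) for x, y in zip(t1, t2))

g1, g2 = (1.0, 1.5, 2.0), (1.2, 1.4, 2.5)
print(fuzzy_min_triplet(g1, g2))  # (1.0, 1.4, 2.0)
# At l = 1 the cut collapses to the minimum of the modes:
print(tuple(round(v, 6) for v in fuzzy_min_cuts(g1, g2)[-1]))  # (1.0, 1.4, 1.4)
```

The triplet form is exact at the endpoints of the 0-cut and at the mode (1-cut), which is why it serves as the practical approximation here.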
j      1                      2                      3
pj1    (0.795, 0.8, 0.805)    (0.695, 0.7, 0.703)    (0.958, 0.96, 0.965)
gj1    1.5                    2                      4
gj2    0                      0                      0
In order to find the FUGF for components 1 and 2 connected in parallel, the operator ΩP is applied to u1(z) and u2(z); Expressions 8.23 and 8.24 are used. ΩP(u1(z), u2(z)) is computed first, and then

ΩS( ΩP(u1(z), u2(z)), u3(z) ) = Σ over j = 1 to 9 of pj z^gj.
After collecting the terms with the same capacity rates, there are nine system states.
For states j = 1, ..., 5 and 7, w < gj definitely, so (srj)rel = 1. These states are successful states.
For states j = 8 and 9, gj < w definitely, so (srj)rel = 0. These states are failure states.
For state j = 6, rj = (1.4, 1.5, 1.7) + (−1.5, −1.4, −1.3) = (−0.1, 0.1, 0.4). Because rj is represented as a triangular fuzzy value, |rj| = 0.5 × 1 × (0.4 − (−0.1)) = 0.25, |srj| = |rj| − 0.5 × 0.5 × (0 − (−0.1)) = 0.225, and (srj)rel = |srj|/|rj| = 0.9.
A(w) = ΩA(U(z), w)
     = (0.52932, 0.5376, 0.54611) + (0.14851, 0.1536, 0.15925)
     + (0.063252, 0.0672, 0.071231) + (0.017747, 0.0192, 0.020772)
     + (0.063918, 0.0672, 0.069196) + (0.017934, 0.0192, 0.020178) × 0.9
     + (0.068545, 0.0768, 0.085451) + 0 + 0
     = (0.90743, 0.93888, 0.97017).
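The aggregation above can be checked numerically with a short sketch using the state-probability triplets listed in the text; function names are illustrative.

```python
# Reproducing the fuzzy availability aggregation above: sum the state
# probability triplets, weighting state 6 by its (sr)rel = 0.9.

def tri_add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def tri_scale(x, s):
    return tuple(a * s for a in x)

terms = [(0.52932, 0.5376, 0.54611),
         (0.14851, 0.1536, 0.15925),
         (0.063252, 0.0672, 0.071231),
         (0.017747, 0.0192, 0.020772),
         (0.063918, 0.0672, 0.069196),
         tri_scale((0.017934, 0.0192, 0.020178), 0.9),  # state 6
         (0.068545, 0.0768, 0.085451)]                  # failure states add 0

A = (0.0, 0.0, 0.0)
for t in terms:
    A = tri_add(A, t)
print(tuple(round(v, 5) for v in A))  # (0.90743, 0.93888, 0.97017)
```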
Suppose that the system safety standard requires that system operation satisfy a required level of system availability, which is set as 0.9. After evaluation, the above system design, considering fuzzy uncertainties, can satisfactorily meet the system availability requirement, which guarantees that the system works in a relatively safe mode.
References
A.1 Introduction
There are many optimization methods available for use on various reliability op-
timization problems (Lisnianski and Levitin 2003). The applied algorithms can be
classified into two categories: heuristics and exact techniques based on the modi-
fications of dynamic programming and nonlinear programming. Most of the exact
techniques are strongly problem oriented. This means that since they are designed
for solving certain optimization problems, they cannot be easily adapted for solving other problems. Recently, most research works have focused on developing general heuristic techniques for solving reliability optimization problems that are based on artificial intelligence and stochastic techniques to direct the search. The important advantage of these techniques is that they do not require any information about the objective function besides its values corresponding to the points visited in the solution space. All heuristic techniques use the idea of randomness when performing a search, but they also use past knowledge in order to direct the search. Such search algorithms are known as randomized search techniques.
appendix includes and updates the reports related to the heuristic algorithms by
Lisnianski and Levitin (2003) and some further discussion and examples.
Based on the classification by Lisnianski and Levitin (2003) and some recent research, the heuristic techniques include simulated annealing, ant colony, tabu search, genetic algorithm (GA), and particle swarm optimization (PSO).
Kirkpatrick et al. (1983) first presented the simulated annealing algorithm. The idea was inspired by the metallurgical procedure called the annealing process. The simulated annealing algorithm can not only improve the objective value of a local search but can also allow a move to some solutions with higher costs (Lisnianski and Levitin 2003). This algorithm can therefore obtain global solutions rather than local ones.
The ant colony algorithm was first introduced by Dorigo and Gambardella (1997). The inspiration for the algorithm came from the behavior of natural ant colonies (Lisnianski and Levitin 2003): by leaving different amounts of pheromone along their paths, an ant colony is capable of finding the shortest path from its nest to a food source and is also able to adapt to changes in the environment.
Tabu search was first described by Glover (1989). This search uses previously obtained information to restrict the next search direction. The technique is intelligent and guides the search toward globally optimal solutions.
PSO was first described by Kennedy and Eberhart (1995). The inspiration for PSO was the behavior of bird flocks. There is some similarity between GAs and PSO: a stochastic heuristic search is conducted by operating on a population of solutions. However, there are no evolution operators such as crossover and mutation in PSO (PSO Tutorial). Note that the information-sharing mechanisms of GA and PSO are totally different (PSO Tutorial): in GA the whole population of solutions moves relatively uniformly toward the optimal area because solution chromosomes share information with each other; in PSO only the solution parameters (gbest and pbest) send out information, which is one-way information sharing.
The procedure to solve the optimization problem of the PSO includes the fol-
lowing steps (Parket et al. 2005):
Step 1: Generate an initial population of solutions randomly in the search
space. A particle is represented by a solution vector.
Step 2: Evaluate the fitness of each particle.
Step 3: Calculate the position and velocity for each particle in the swarm using
the following equations:
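A minimal sketch of the standard PSO velocity and position update for Step 3, assuming the usual inertia-weight form (after Kennedy and Eberhart 1995); the coefficients w, c1, c2 and all names here are illustrative assumptions.

```python
import random

# Hedged sketch of a standard PSO velocity/position update: each particle
# is pulled toward its own best position (pbest) and the swarm best (gbest).

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random.random):
    """One update of a single particle; x, v, pbest, gbest are lists."""
    new_v = [w * vi + c1 * rng() * (pb - xi) + c2 * rng() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

random.seed(0)
x, v = [0.0, 0.0], [0.1, -0.1]
x, v = pso_step(x, v, pbest=[1.0, 1.0], gbest=[2.0, 2.0])
print(len(x), len(v))  # 2 2
```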
The GA is the most widely used heuristic technique. It was inspired by the optimization procedures in the biological phenomenon of evolution. In a GA, a new population of solutions comes from the optimal selection of offspring solutions generated by the previous population. Crossover and mutation operators are applied to parents to produce their offspring. The survival of offspring is determined by their adaptation to the environment. GAs are the most popular heuristic algorithms for solving different kinds of reliability optimization problems. The detailed descriptions of GA in the later sections of this appendix include and update the reports by Lisnianski and Levitin (2003). The advantages of GAs include the following (Goldberg 1989; Lisnianski and Levitin 2003):
They can be relatively easily implemented to solve different problems includ-
ing constrained optimization problems.
A population of solutions is used to conduct the optimal search in GAs.
GAs are stochastic in nature.
GAs are parallel and can produce good quality solutions simultaneously.
The GA was first introduced by Holland (1975). Holland was impressed by the ease with which biological organisms could perform tasks that eluded even the most powerful computers. He also noted that very few artificial systems have the most remarkable characteristics of biological systems: robustness and flexibility. Unlike technical systems, biological ones have means of self-guidance, self-repair, and reproduction of these features. Holland's biologically inspired approach to optimization is based on the following analogies:
As in nature, where there are many organisms, there are many possible solu-
tions to a given problem.
As in nature, where an organism contains many genes defining its properties,
each solution is defined by many interacting variables (parameters).
As in nature, where groups of organisms live together in a population and some
organisms in the population are more fit than others, a group of possible solu-
tions can be stored together in computer memory and some of them will be
closer to the optimum than others.
As in nature, where organisms that are fitter have more chances of mating and
having offspring, solutions that are closer to the optimum can be selected more
often to combine their parameters to form new solutions.
As in nature, where organisms produced by good parents are more likely to be
better adapted than the average organism because they received good genes, the
offspring of good solutions are more likely to be better than a random guess,
since they are composed of better parameters.
As in nature, where survival of the fittest ensures that the successful traits con-
tinue to get passed along to subsequent generations and are refined as the popu-
lation evolves, the survival-of-the-fittest rule ensures that the composition of
the parameters corresponding to best guesses continually get refined.
350 Appendix A
All a GA needs to continue searching for the optimum is some measure of fitness about a point in the search space.
GAs are probabilistic in nature, not deterministic. This is a direct result of the
randomization techniques used by GAs.
GAs are inherently parallel; this is one of their most powerful features. By their nature, GAs deal with a large number of solutions simultaneously. Using schemata theory, Holland estimated that a GA processing n strings at each generation in reality processes n³ useful substrings (Goldberg 1989).
Two of the most common GA implementations are generational and steady
state. The steady-state technique has received increased attention (Kinnear 1993)
because it can offer a substantial reduction in the memory requirements in compu-
tation: the technique abolishes the need to maintain more than one population dur-
ing the evolutionary process, which is necessary in a generational GA. In this way,
genetic systems have greater portability for a variety of computer environments
because of the reduced memory overhead. Another reason for the increased inter-
est in steady-state techniques is that, in many cases, a steady-state GA has been
shown to be more effective than a generational GA (Syswerda 1991; Vavak and
Fogarty 1996). This improved performance can be attributed to factors such as the
diversity of the population and the immediate availability of superior individuals.
Detailed descriptions of a generational GA are given in Goldberg (1989); therefore, only the structure of a steady-state GA is introduced here.
The steady-state GA proceeds as follows (Whitley 1989), as shown in Figure A.1. First, we generate randomly or heuristically an initial population of solutions.
Within this population, new solutions are obtained during the genetic cycle using a
crossover operator. This operator produces an offspring from a randomly selected
pair of parent solutions (the parent solutions are selected with a probability pro-
portional to their relative fitness), facilitating the inheritance of some basic proper-
ties from the parents to the offspring. The newly obtained offspring undergoes
mutation with probability Pmut.
Each new solution is decoded and its objective function (fitness) values are es-
timated. These values, which are a measure of quality, are used to compare differ-
ent solutions. The comparison is accomplished by a selection procedure that de-
termines which solution is better: the newly obtained solution or the worst solution
in the population. The better solution joins the population, while the other is dis-
carded. If the population contains equivalent solutions following selection, then
redundancies are eliminated and the population size decreases as a result. A ge-
netic cycle terminates when N rep new solutions are produced or when the number
of solutions in the population reaches a specified level. Then, new randomly con-
structed solutions are generated to replenish the shrunken population, and a new
genetic cycle begins. The whole GA is terminated when its termination condition
is satisfied. This condition can be specified in the same way as in a generational
GA. The steady-state GA can also be expressed in pseudocode format (Lisnianski and Levitin 2003).
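The genetic cycle just described can be sketched in Python; the helper names and the parameter defaults (n_rep, p_mut) are illustrative, not the authors':

```python
import random

def steady_state_cycle(population, fitness, crossover, mutate,
                       n_rep=10, p_mut=0.1):
    """One genetic cycle of a steady-state GA: produce n_rep offspring,
    each replacing the worst member of the population if it is better."""
    for _ in range(n_rep):
        # Fitness-proportional selection of two parent solutions.
        weights = [fitness(s) for s in population]
        p1, p2 = random.choices(population, weights=weights, k=2)
        child = crossover(p1, p2)
        if random.random() < p_mut:
            child = mutate(child)
        # The better of (offspring, worst member) stays in the population.
        worst = min(range(len(population)),
                    key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    # Eliminate equivalent solutions, as in the description above.
    unique = []
    for s in population:
        if s not in unique:
            unique.append(s)
    return unique
```

After each cycle the shrunken population would be replenished with new random solutions before the next cycle begins, as described in the text.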
Example A.1 (Lisnianski and Levitin 2003). In this example we present several initial stages of a steady-state GA that maximizes a function of six integer variables x1, x2, …, x6 taking the form

    f(x1, …, x6) = … + (x4 − 3.1)² + (x5 − 2.8)² + (x6 − 8.8)².
The variables can take values from 1 to 9. The initial population, consisting of five solutions ordered according to their fitness (the value of the function f), is

No.  x1  x2  x3  x4  x5  x6   f(x1, …, x6)
 1    4   2   4   1   2   5   297.8
 2    3   7   7   7   2   7   213.8
 3    7   5   3   5   3   9   204.2
 4    2   7   4   2   1   4   142.5
 5    8   2   3   1   1   4   135.2
Using the random generator that produces the numbers of the solutions, the GA
chooses the first and third strings, i.e., (4 2 4 1 2 5) and (7 5 3 5 3 9), respectively.
From these strings, it produces a new one by applying a crossover procedure that
takes the first three numbers from the better parent string and the last three numbers from the inferior parent string. The resulting string is (4 2 4 5 3 9). The fitness of this new solution is f(x1, …, x6) = 562.4. The new solution enters the population, replacing the one with the lowest fitness. The new population is now

No.  x1  x2  x3  x4  x5  x6   f(x1, …, x6)
 1    4   2   4   5   3   9   562.4
 2    4   2   4   1   2   5   297.8
 3    3   7   7   7   2   7   213.8
 4    7   5   3   5   3   9   204.2
 5    2   7   4   2   1   4   142.5
At the next stage a new solution (3 7 7 4 3 9), with fitness 349.9, was obtained and entered the population, which became

No.  x1  x2  x3  x4  x5  x6   f(x1, …, x6)
 1    4   2   4   5   3   9   562.4
 2    3   7   7   4   3   9   349.9
 3    4   2   4   1   2   5   297.8
 4    3   7   7   7   2   7   213.8
 5    7   5   3   5   3   9   204.2
Note that the mutation procedure is not applied to all the solutions obtained by the crossover. It is used with some prespecified probability Pmut. In our example, only the second and third newly obtained solutions underwent mutation.
Actual GAs operate with much larger populations and produce thousands of new solutions using the crossover and mutation procedures. The steady-state GA with a population size of 100 obtained the optimal solution of the problem presented after producing about 3000 new solutions. Note that the total number of possible solutions is 9⁶ = 531441. The GA thus managed to find the optimal solution by exploring less than 0.6% of the entire solution space.
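The crossover rule used in this example, the first three elements from the better parent and the rest from the inferior one, can be reproduced directly:

```python
def crossover(better_parent, inferior_parent, cut=3):
    """Single-point crossover of Example A.1: the offspring takes the first
    `cut` elements from the better parent, the rest from the other parent."""
    return better_parent[:cut] + inferior_parent[cut:]

offspring = crossover([4, 2, 4, 1, 2, 5], [7, 5, 3, 5, 3, 9])
```

which yields the string (4 2 4 5 3 9) obtained in the example.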
Both types of GA are based on the crossover and mutation procedures, which
depend strongly on the solution encoding technique. These procedures should pre-
serve the feasibility of the solutions and provide the inheritance of their essential
properties.
There are three basic steps in applying a GA to a specific problem. In the first
step, one defines the solution representation (encoding in the form of a string of
symbols) and determines the decoding procedure, which evaluates the fitness of
the solution represented by the arbitrary string.
In the second step, one has to adapt the crossover and mutation procedures to the given representation in order to provide the feasibility of the new solutions produced by these procedures, as well as the inheritance of the basic properties of the parent solutions by their offspring.
In the third step, one has to choose the basic GA parameters, such as the popu-
lation size, the mutation probability, the crossover probability (generational GA),
or the number of crossovers per genetic cycle (steady-state GA), and formulate the
termination condition in order to provide the greatest possible GA efficiency
(convergence speed).
The strings representing GA solutions are randomly generated by the popula-
tion generation procedure, modified by the crossover and mutation procedures,
and decoded by the fitness evaluation procedure. Therefore, the solution represen-
tation in the GA should meet the following requirements:
It should be easily generated (the complex solution generation procedures re-
duce the GA speed).
It should be as compact as possible (using very long strings requires excessive
computational resources and slows the GA convergence).
It should be unambiguous (i.e., different solutions should be represented by dif-
ferent strings).
It should represent feasible solutions (if no randomly generated string repre-
sents a feasible solution, then the feasibility should be provided by simple
string transformation).
It should provide feasibility inheritance of new solutions obtained from feasible
ones by the crossover and mutation operators.
The field of reliability optimization includes problems of finding optimal pa-
rameters, optimal allocation and assignment of different elements into a system,
and optimal sequencing of the elements. Many of these problems are combinato-
rial by nature. The most suitable symbol alphabet for this class of problems is in-
teger numbers. A finite string of integer numbers can be easily generated and
stored. The random generator produces integer numbers for each element of the
string in a specified range. This range should be the same for each element in or-
der to make the string generation procedure simple and fast. If for some reason
different string elements belong to different ranges, then the string should be
transformed to provide solution feasibility.
In the following subsections we show how integer strings of GAs can be inter-
preted for solving different kinds of optimization problems.
Consider the problem of determining H parameters, where each parameter X_j can vary within a specified range:

    X_j^min ≤ X_j ≤ X_j^max,  1 ≤ j ≤ H.  (A.1)

Each parameter can be represented by an integer a_j in the range (0, N) and decoded as

    X_j = X_j^min + a_j (X_j^max − X_j^min)/N.  (A.2)
Note that the space of the integer strings only approximately maps the space of the real-valued parameters. The number N determines the precision of the search. The search resolution for the jth parameter is (X_j^max − X_j^min)/N.
Example A.2 Consider seven parameters with the ranges shown below and N = 100. A random integer string is decoded using Equation A.2 as follows:

No. of variable           1     2     3     4     5     6     7
x_j^min                 0.0   0.0   1.0   1.0   1.0   0.0   0.0
x_j^max                 3.0   3.0   5.0   5.0   5.0   5.0   5.0
Random integer string    21     4     0   100    72    98     0
Decoded variable       0.63  0.12   1.0   5.0  3.88   4.9     0
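The decoding of Equation A.2 can be verified against the table above:

```python
def decode(a, x_min, x_max, n=100):
    """Equation A.2: map an integer a in (0, N) onto the range (x_min, x_max)."""
    return x_min + a * (x_max - x_min) / n

string = [21, 4, 0, 100, 72, 98, 0]
x_min = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
x_max = [3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0]
decoded = [decode(a, lo, hi) for a, lo, hi in zip(string, x_min, x_max)]
```

which reproduces the "Decoded variable" row of the table.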
Consider next the problem of partitioning a set Φ of Y items into K mutually disjoint subsets Φ_1, …, Φ_K such that

    ∪_{i=1}^{K} Φ_i = Φ,  Φ_i ∩ Φ_j = ∅,  i ≠ j.  (A.3)

Each set can contain from 0 to Y items. The partition of the set Φ can be represented by the Y-length string a = (a_1 a_2 … a_{Y−1} a_Y), in which a_j is the number of the
set to which item j belongs. Note that in the strings representing feasible solutions
of the partition problem, each element can take a value in the range (1, K).
Now consider a more complicated allocation problem in which the number of
items is not specified. Assume that there are H types of different items with an
unlimited number of items for each type h. The number of items of each type allo-
cated in each subset can vary. To represent an allocation of the variable number of
items in K subsets, one can use the string encoding

    a = (a_11 a_12 … a_1K a_21 … a_2K … a_H1 … a_HK),

in which a_ij corresponds to the number of items of type i belonging to subset j. Observe that different subsets can contain identical elements.
Example A.3 Consider the problem of allocating three types of transformers char-
acterized by different nominal power and different availability at two substations
in a power transmission system. In this problem, H = 3 and K = 2. Any possible
allocation can be represented by an integer string using the encoding described
above. For example, the string (2 1 0 1 1 1) encodes the solution in which two
type 1 transformers are allocated in the first substation and one in the second sub-
station, one transformer of type 2 is allocated in the second substation, and one
transformer of type 3 is allocated in each of the two substations.
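The string of Example A.3 can be unpacked into an allocation table (rows are item types, columns are subsets); the row-major layout is inferred from the example:

```python
def unpack(a, n_types, n_subsets):
    """Decode a flat allocation string into rows a_i = [a_i1, ..., a_iK],
    where a_ij is the number of items of type i allocated to subset j."""
    return [a[i * n_subsets:(i + 1) * n_subsets] for i in range(n_types)]

allocation = unpack([2, 1, 0, 1, 1, 1], n_types=3, n_subsets=2)
```

Here allocation[0] == [2, 1] recovers "two type 1 transformers at the first substation and one at the second", and so on for the other rows.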
When K = 1, one has an assignment problem in which a number of items should be chosen from a list containing an unlimited number of items of H different types. Any solution of the assignment problem can be represented by the string a = (a_1 a_2 … a_H), in which a_j corresponds to the number of chosen items of type j.
The range of variance of string elements for both allocation and assignment
problems can be specified based on the preliminary estimation of the characteris-
tics of the optimal solution (maximal possible number of elements of the same
type included into the single subset). The greater the range, the greater the solution
space to be explored (note that the minimal possible value of the string element is
always zero in order to provide the possibility of not choosing any element of the
given type for the given subset). In many practical applications, the total number
of items belonging to each subset is also limited. In this case, any string represent-
ing a solution in which this constraint is not met should be transformed in the fol-
lowing way:
    a*_ij = ⌊ a_ij N_j / Σ_{h=1}^{H} a_hj ⌋   if N_j < Σ_{h=1}^{H} a_hj,
    a*_ij = a_ij                               otherwise,            (A.4)
where N_j is the maximal allowed total number of items in subset j.
Example A.4 Consider the case in which the transformers of three types should be
allocated to two substations. Assume that it is prohibited to allocate more than five
transformers of each type to the same substation. The GA should produce strings
with elements ranging from 0 to 5. An example of such a string is (4 2 5 1 0 2).
Assume that for some reason the total number of transformers in the first and
second substations is restricted to seven and six, respectively. In order to obtain a
feasible solution, one has to apply transform (A.4) in which
    N_1 = 7,  N_2 = 6,  Σ_{h=1}^{3} a_h1 = 4 + 5 + 0 = 9,  Σ_{h=1}^{3} a_h2 = 2 + 1 + 2 = 5.
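A sketch of transform (A.4) applied to this example; rounding the proportionally scaled elements down to integers is our assumption about the bracket in (A.4):

```python
def enforce_subset_limits(a, n_types, limits):
    """Transform (A.4): if the total allocated to subset j exceeds N_j,
    scale that subset's elements down proportionally (rounded down)."""
    n_subsets = len(limits)
    out = list(a)
    for j, n_j in enumerate(limits):
        total = sum(a[i * n_subsets + j] for i in range(n_types))
        if total > n_j:
            for i in range(n_types):
                k = i * n_subsets + j
                out[k] = a[k] * n_j // total  # floor of a_ij * N_j / total
    return out

feasible = enforce_subset_limits([4, 2, 5, 1, 0, 2], n_types=3, limits=[7, 6])
```

For the first substation (total 9 > 7) the elements 4 and 5 shrink to 3 and 3, while the second substation (total 5 ≤ 6) is left unchanged.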
In the mixed partition and parameter determination problem, each unit j is represented by two adjacent string elements. Each odd element of the string is transformed as

    1 + mod_K(a_j)  (A.5)

in order to obtain the class number in the range (1, K). The even elements of the string should be transformed as

    mod_{N+1}(a_j)  (A.6)

in order to obtain the parameter value encoded by an integer number in the range (0, N). The value of the parameter is then obtained using Equation A.2.
Example A.5 Consider a weighted voting system in which seven voting units
(N = 7) should be allocated to three separate subsets (K = 3) and a value of a pa-
rameter (weight) associated with each unit should be chosen. The solution should
encode both units distribution among the subsets and the parameters (weights).
Let the range of the string elements be (0, 100) (N = 100). The string (99 21 22 4 75 0 14 100 29 72 60 98 1 0), in which the odd elements correspond to the numbers of the subsets, represents the solution presented in Table A.2. The values corresponding to the numbers of the groups are obtained using Equation A.5 as 1 + mod_3(99) = 1, 1 + mod_3(22) = 2, 1 + mod_3(75) = 1, and so on. Observe that, in this solution, units 1, 3, and 6 belong to the first subset, units 2 and 7 belong to the second subset, and units 4 and 5 belong to the third subset. The parameters are identical to those in Example A.2.
Table A.2 Example of the solution encoding for the mixed partition and parameter determination
problem
No. of unit                       1   2   3    4   5   6   7
No. of subset                     1   2   1    3   3   1   2
Integer code of parameter value  21   4   0  100  72  98   0
Alternatively, each unit can be represented by a single string element in the range (0, (N + 1)K − 1). The number of the subset to which the jth unit belongs should be obtained as

    1 + ⌊a_j / (N + 1)⌋  (A.7)

and the number corresponding to the value of the jth parameter should be obtained as

    mod_{N+1}(a_j).  (A.8)

Consider the example presented above with K = 3 and N = 100. The range of the string elements should be (0, 302). The string (21 105 0 302 274 98 101) corresponds to the same solution as the strings in the previous example (Table A.2).
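The two encodings can be decoded and compared in a few lines; the transforms assumed here are 1 + mod_K(a) for subset numbers, mod_{N+1}(a) for parameter codes, and 1 + ⌊a/(N + 1)⌋ for the single-element variant:

```python
K, N = 3, 100

def decode_pairs(a):
    """Two elements per unit: odd positions give the subset number
    (1 + a mod K), even positions give the integer parameter code."""
    subsets = [1 + a[2 * i] % K for i in range(len(a) // 2)]
    codes = [a[2 * i + 1] % (N + 1) for i in range(len(a) // 2)]
    return subsets, codes

def decode_single(a):
    """One element per unit in the range (0, (N+1)K - 1): the subset is
    1 + a_j // (N + 1), the parameter code is a_j mod (N + 1)."""
    return [1 + x // (N + 1) for x in a], [x % (N + 1) for x in a]

s1 = decode_pairs([99, 21, 22, 4, 75, 0, 14, 100, 29, 72, 60, 98, 1, 0])
s2 = decode_single([21, 105, 0, 302, 274, 98, 101])
```

Both strings decode to the same solution, the one given in Table A.2.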
Like the generation procedures for the partition problem, this one also requires
the generation of Y random numbers.
Example A.6 In a restructured power system, a generating system can plan its
own reserve and can also share the reserve with other generating systems accord-
ing to their reserve contracts. The reserve structure of a generating system should
be determined based on the balance between the required reliability and the re-
serve cost, which is an optimization problem. A GA with a special encoding
scheme that considers the structure of reserve capacity and the reserve utilization order is developed for this optimization problem. A mixed numerical and binary string of length Y + Σ_{j=1}^{Y} D_j is used to encode a solution (Ding et al. 2006).
The first sequence of Y numerical items represents Y reserve providers and their
reserve utilization order in a contingency state. The initial sequence of the first Y
items is generated randomly and should be a permutation of Y integer numbers,
i.e., it should contain all the numbers from 1 to Y and each number in the string
should be unique. The sequence of items can be represented by a Y-length string (a_1 a_2 … a_Y), in which a_j is the number of the set to which item j belongs. The above procedure is used for generating a random string permutation.
The next Σ_{j=1}^{Y} D_j binary bits represent the contracted reserve capacity of the Y reserve providers, where D_j is the number of binary bits encoding the amount of the contracted reserve capacity from reserve provider j. Encoding is performed using different numbers of binary bits for each contracted reserve amount, depending on the desired accuracy.
Using this encoding algorithm, the solutions for the reserve utilization order remain within the feasible space. As shown in Figure A.2, reserve provider 2 is used first, reserve provider 1 is used second, and so on, up to the point where, in a contingency state, either the load is met or the available reserve is used up.
Having a solution represented in the GA by an integer string a, one then has to es-
timate the quality of this solution (or, in terms of the evolution process, the fitness
of the individual). The GA seeks solutions with the greatest possible fitness.
Therefore, the fitness should be defined in such a way that its greatest values cor-
respond to the best solutions. For example, when optimizing the system reliability
R (which is a function of some of the parameters represented by a) one can define
the solution fitness equal to this index, since one wants to maximize it. By contrast, when minimizing the system cost C, one has to define the solution fitness as M − C, where M is a constant. In this case, the maximal solution fitness corresponds to the minimal cost.
In the majority of optimization problems, the optimal solution should satisfy
some constraints. There are three different approaches to handling the constraints
in GA (Michalewicz 1996). One of these uses penalty functions as an adjustment
to the fitness function; two other approaches use decoder or repair algorithms
to avoid building illegal solutions or repair them, respectively. The decoder and
repair approaches suffer from the disadvantage of being tailored to the specific
problems and thus are not sufficiently general to handle a variety of problems. On
the other hand, the penalty approach based on generating potential solutions with-
out considering the constraints and on decreasing the fitness of solutions, violating
the constraints, is suitable for problems with a relatively small number of con-
straints. For heavily constrained problems, the penalty approach causes the GA to
spend most of its time evaluating solutions violating the constraints. Fortunately,
reliability optimization problems usually deal with few constraints.
Using the penalty approach one transforms a constrained problem into an un-
constrained one by associating a penalty with all constraint violations. The penalty
is incorporated into the fitness function. Thus, the original problem of maximizing
a function f (a ) is transformed into the maximization of the function:
    f(a) − Σ_{j=1}^{J} π_j,  (A.9)

where π_j is the penalty associated with violating the jth of the J constraints.
For example, when minimizing the system cost C(a) under the constraint that the system reliability is no less than a required level R′, the fitness can be defined as

    M − C(a) − π(R′, a),  (A.11)

where π(R′, a) is the penalty for violating the reliability constraint.
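A sketch of the penalty approach for the cost-minimization case; the linear penalty form and the constants M and penalty_weight are illustrative choices, not values from the text:

```python
def fitness(cost, reliability, r_required, m=10_000.0, penalty_weight=1_000.0):
    """Penalized fitness M - C - pi(R', a): the shortfall below the required
    reliability is penalized linearly; feasible solutions pay no penalty."""
    shortfall = max(0.0, r_required - reliability)
    return m - cost - penalty_weight * shortfall
```

With this definition a feasible solution always outranks an equally priced solution that violates the reliability constraint, which is exactly what the penalty approach requires.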
The crossover procedures create a new solution as the offspring of a pair of exist-
ing ones (parent solutions). The offspring should inherit some useful properties of
both parents in order to facilitate their propagation throughout the population. The
mutation procedure is applied to the offspring solution. It introduces slight
changes into the solution encoding string by modifying some of the string ele-
ments. Both of these procedures should be developed in such a way as to provide
the feasibility of the offspring solutions given that parent solutions are feasible.
When applied to parameter determination, partition, and assignment problems,
the solution feasibility means that the values of all of the string elements belong to
a specified range. The most commonly used crossover procedures for these prob-
lems generate offspring in which every position is occupied by a corresponding
element from one of the parents. This property of the offspring solution provides
its feasibility. For example, in single-point crossover all the elements located to
the left of a randomly chosen position are copied from the first parent and the rest
of the elements are copied from the second parent.
The commonly used mutation procedure changes the value of a randomly se-
lected string element by 1 (increasing or decreasing this value with equal probabil-
ity). If after the mutation the element is out of the specified range, it takes the
minimal or maximal allowed value.
When applied to sequencing problems, the crossover and mutation operators
should produce offspring that preserve the form of permutations. This means that
the offspring string should contain all of the elements that appear in the initial
strings and each element should appear in the offspring only once. Any omission
or duplication of the element constitutes an error. The mutation procedure that
preserves the permutation feasibility swaps two string elements initially located in
two randomly chosen positions.
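Both mutation variants described above, the ±1 change with clamping for integer strings and the element swap for permutations, can be sketched as:

```python
import random

def mutate_integer(a, lo, hi):
    """Change a randomly chosen element by +/-1, clamped to [lo, hi]."""
    b = list(a)
    i = random.randrange(len(b))
    b[i] = min(hi, max(lo, b[i] + random.choice([-1, 1])))
    return b

def mutate_permutation(a):
    """Swap the elements at two randomly chosen positions, preserving
    the permutation property of the string."""
    b = list(a)
    i, j = random.sample(range(len(b)), 2)
    b[i], b[j] = b[j], b[i]
    return b
```

The first operator never leaves the specified range; the second never omits or duplicates an element, so both preserve feasibility as required.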
There are no general rules for choosing the values of the basic GA parameters for solving specific optimization problems. The best way to determine a proper combination of these values is an experimental comparison of GAs with different parameters. The GAs should solve a set of test problems, and when solving each problem, the different GAs should start from the same initial population.
References
Cheng S (1998) Topological optimization of a reliable communication network. IEEE Trans Reliab 47:23–31
Coit D, Smith A (1996) Reliability optimization of series-parallel systems using genetic algorithm. IEEE Trans Reliab 45:254–266
Deeter D, Smith A (1998) Economic design of reliable networks. IIE Trans 30:1161–1174
Ding Y, Wang P, Lisnianski A (2006) Optimal reserve management for restructured power generating systems. Reliab Eng Syst Saf 91:792–799
Dorigo M, Gambardella L (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1:53–66
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization. Wiley, New York
Glover F (1989) Tabu search, part I. ORSA J Comput 1(3):190–206
Goldberg D (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, MA
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor, MI
Hsieh Y, Chen T, Bricker D (1998) Genetic algorithm for reliability design problems. Microelectron Reliab 38:1599–1605
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, 4:1942–1948
Kinnear K (1993) Generality and difficulty in genetic programming: evolving a sort. In: Forrest S (ed) Proceedings of the 5th International Conference on Genetic Algorithms. Morgan Kaufmann, San Francisco, pp 287–294
Kirkpatrick S, Gelatt CD Jr, Vecchi MP (1983) Optimization by simulated annealing. Science 220:671–680
Kumar A, Pathak R, Gupta Y (1995) Genetic algorithms-based reliability optimization for computer network expansion. IEEE Trans Reliab 44:63–72
Levitin G, Lisnianski A, Ben-Haim H et al (1998) Redundancy optimization for series-parallel multi-state systems. IEEE Trans Reliab 47:165–172
Levitin G, Lisnianski A (1999) Joint redundancy and maintenance optimization for multi-state series-parallel systems. Reliab Eng Syst Saf 64:33–42
Levitin G, Lisnianski A (2000) Optimization of imperfect preventive maintenance for multi-state systems. Reliab Eng Syst Saf 67:193–203
Levitin G (2001) Redundancy optimization for multi-state systems with fixed resource-requirements and unreliability sources. IEEE Trans Reliab 50:52–59
Lisnianski A, Levitin G (2003) Multi-state system reliability: assessment, optimization and applications. World Scientific, Singapore
Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs. Springer, Berlin
A counting process {N(t), t ≥ 0} is an HPP with rate λ > 0 if
N(0) = 0,
the process has independent increments, and
the number of failures in any interval of length t has a Poisson distribution with parameter λt.

The number of events in (t_1, t_2] therefore has a Poisson distribution with parameter λ(t_2 − t_1), so the probability mass function is

    P{N(t_2) − N(t_1) = x} = [λ(t_2 − t_1)]^x e^{−λ(t_2 − t_1)} / x!,  x = 0, 1, 2, ….  (B.1)

For an NHPP, the number of events in (0, t] has a Poisson distribution with mean μ(t):

    P{N(t) = n} = [μ(t)]^n e^{−μ(t)} / n!,  n = 0, 1, 2, …,  (B.2)

where μ(t) is the mean value function; it describes the expected cumulative number of failures over time.
The underlying assumptions of the NHPP are as follows:
N(0) = 0;
{N(t), t ≥ 0} has independent increments;
P{N(t + h) − N(t) = 1} = λ(t)h + o(h); and
P{N(t + h) − N(t) ≥ 2} = o(h).

Here o(h) denotes a quantity that tends to zero faster than h as h → 0. The function λ(t) is the failure intensity. Given λ(t), the mean value function μ(t) = E[N(t)] satisfies

    μ(t) = ∫_0^t λ(v) dv,  (B.3)

and, conversely,

    λ(t) = dμ(t)/dt.  (B.4)
Appendix B 369
The number of failures in an arbitrary interval (a, b] has a Poisson distribution with mean ∫_a^b λ(t) dt:

    P{N(b) − N(a) = n} = [∫_a^b λ(t) dt]^n e^{−∫_a^b λ(t) dt} / n!,  n = 0, 1, 2, ….  (B.5)
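Equation B.5 is easy to evaluate once the intensity can be integrated; the sketch below assumes the power-law mean value function μ(t) = α t^β as an example:

```python
from math import exp, factorial

def interval_prob(n, a, b, alpha, beta):
    """Equation B.5 for a power-law NHPP: the number of failures in (a, b]
    is Poisson with mean mu(b) - mu(a), where mu(t) = alpha * t**beta."""
    m = alpha * (b ** beta - a ** beta)  # integral of the intensity over (a, b]
    return m ** n * exp(-m) / factorial(n)
```

With β = 1 this reduces to the homogeneous case: the count in (a, b] is Poisson with mean α(b − a).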
The first form is the log-linear form with failure intensity

    λ_1(t) = e^{α_1 + β_1 t},  (B.6)

and the second form is the Weibull or power form with failure intensity

    λ_2(t) = α_2 β_2 t^{β_2 − 1}.  (B.7)
If the failure process follows the log-linear model and the testing data are truncated at the nth failure, the likelihood function is

    L(t_1, t_2, …, t_n; α_1, β_1) = exp(n α_1 + β_1 Σ_{i=1}^{n} t_i) exp{−(e^{α_1}/β_1)(e^{β_1 t_n} − 1)}.  (B.8)

The maximum-likelihood estimates α̂_1 and β̂_1 satisfy

    e^{α̂_1} = n β̂_1 / (e^{β̂_1 t_n} − 1),
    Σ_{i=1}^{n} t_i + n/β̂_1 = n t_n / (1 − e^{−β̂_1 t_n}).  (B.9)
If the failure process follows the Weibull process and testing data are truncated
at the nth failure, with 0 < t1 < t2 < ... < tn denoting the successive failure times,
the likelihood function is
    L(t_1, t_2, …, t_n; α_2, β_2) = α_2^n β_2^n exp(−α_2 t_n^{β_2}) Π_{i=1}^{n} t_i^{β_2 − 1}.  (B.10)
The maximum-likelihood estimates are

    β̂_2 = n / Σ_{i=1}^{n} ln(t_n/t_i),
    α̂_2 = n / t_n^{β̂_2}.  (B.11)
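The failure-truncated estimates of Equation B.11 can be computed directly (the failure times below are illustrative, not data from the text):

```python
from math import log

def weibull_mle(times):
    """Equation B.11: MLEs of the power-law (Weibull process) parameters
    from successive failure times truncated at the n-th failure."""
    n = len(times)
    t_n = times[-1]
    beta = n / sum(log(t_n / t_i) for t_i in times)  # the i = n term is zero
    alpha = n / t_n ** beta
    return alpha, beta

alpha_hat, beta_hat = weibull_mle([1.0, 2.0, 4.0, 8.0])
```

By construction the fitted mean value function satisfies μ̂(t_n) = α̂ t_n^β̂ = n, i.e., the model reproduces the observed number of failures at the truncation point.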
If the testing data are truncated at time T, the likelihood function is

    L(t_1, t_2, …, t_n; α_2, β_2) = α_2^n β_2^n exp(−α_2 T^{β_2}) Π_{i=1}^{n} t_i^{β_2 − 1},  (B.12)
and the maximum-likelihood estimates are

    β̂_2 = n / Σ_{i=1}^{n} ln(T/t_i),
    α̂_2 = n / T^{β̂_2}.  (B.13)
The Laplace trend test is a test for the null hypothesis of an HPP vs. the alternative
of a monotonic trend (Cox and Lewis 1966; Ascher and Feingold 1984).
The test statistic is

    U = [ (1/(n − 1)) Σ_{i=1}^{n−1} t_i − t_n/2 ] / [ t_n √(1/(12(n − 1))) ].  (B.14)
The null hypothesis "the process is an HPP" is rejected for too small or too large values of U: U < −z_{α/2} or U > z_{α/2}. Moreover, U > 0 indicates a deteriorating system, while U < 0 indicates an improving one.
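The Laplace statistic of Equation B.14 can be computed as follows (failure-truncated case):

```python
from math import sqrt

def laplace_u(times):
    """Equation B.14: Laplace trend statistic for failure times t_1 < ... < t_n
    truncated at the n-th failure; U is approximately N(0, 1) under the HPP
    hypothesis."""
    n = len(times)
    t_n = times[-1]
    mean_early = sum(times[:-1]) / (n - 1)
    return (mean_early - t_n / 2) / (t_n * sqrt(1 / (12 * (n - 1))))
```

Evenly spread failure times give U near zero, while times clustered late in the observation period (deterioration) give U > 0.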
The MIL-HDBK test statistic is

    V = 2 Σ_{i=1}^{n−1} ln(T_n/T_i).  (B.15)
    U_LR = U / CV.  (B.16)
    W² = Σ_{i=1}^{n−1} [U_i − (2i − 1)/(2(n − 1))]² + 1/(12(n − 1)).  (B.17)
    A² = −[ Σ_{i=1}^{n−1} (2i − 1)(ln U_i + ln(1 − U_{n−i})) ] / (n − 1) − n + 1.  (B.18)

Critical values of this goodness-of-fit statistic are calculated by Park and Kim (1992).
The Hartley test is based on results of Hartley (1950). The test uses the ratio of the maximum to the minimum of the time intervals between failures:

    h(n) = max{Δ_i} / min{Δ_i}.  (B.19)

The null hypothesis is rejected if h(n) > h_{1−α}(n). The critical values for this statistic are given in Gnedenko et al. (1969); for large n (n > 12) they may be calculated using Monte Carlo simulation.
Consider an NHPP with a log-linear or power-form intensity function. Parameter estimation is carried out by the maximum-likelihood method.

For the case of a known intensity function, testing the hypothesis that a given sample path is a realization of an NHPP can be carried out on the basis of the following well-known fact: under the NHPP model, the values of the mean value function computed at the ordered failure times are the failure times of an HPP with constant intensity 1. Therefore, the intervals between events of this HPP form a sample of i.i.d. standard exponential random variables, and one can use the goodness-of-fit tests discussed in the previous paragraph to check the exponentiality of the process.
Define the transformed event times

    W_i = ∫_0^{t_i} λ(t) dt,  i = 1, …, n.  (B.20)

In other words, events in the transformed time occur at the instants W_1, W_2, …, W_n.
The following fact is very important. Denote

    Δ_1 = W_1, Δ_2 = W_2 − W_1, …, Δ_n = W_n − W_{n−1}.

Then Δ_1, Δ_2, …, Δ_n are i.i.d. random variables with the standard exponential distribution. Hence, the NHPP in the transformed time becomes a Poisson process with intensity 1.
The above-mentioned fact may be used for testing the hypothesis that a given process is an NHPP with a known intensity function λ(t). Consider the interevent intervals Δ_1, Δ_2, …, Δ_n in the transformed time and check the hypothesis H*_0 that they are i.i.d. exponential random variables with parameter 1.

How does one check the hypothesis that the given process is an NHPP when the intensity function is not known in advance?
Suppose we observe a counting process {N(t), t > 0} in the interval [0, t_n], with events appearing at the times t_1, t_2, …, t_n. Carry out the estimation of λ(t) by the maximum-likelihood method, assuming either the log-linear or the power-law form of λ(t). We choose a suitable intensity function λ̂(t) according to the minimum of

    D = sup_{t>0} |μ̂(t) − μ*(t)|,  (B.21)
where

    μ̂(t) = ∫_0^t λ̂(v) dv  and  μ*(t) = { 0 for t < t_1;  i for t_i ≤ t < t_{i+1}, i = 1, …, n − 1;  n for t_n ≤ t }.  (B.22)
Step 1: Set j := 1.
Step 2: Simulate a sample path with n events of the NHPP with intensity function λ̂(t).
Step 3: Carry out the time transformation (B.20).
Step 4: Compute the values of the test statistics S_1, S_2, …, S_k, described in the previous paragraph, for this realization. Denote them as S_1^(j), S_2^(j), …, S_k^(j).
Step 5: Set j := j + 1. If j ≤ M, return to Step 2.

As a result, for each statistic we obtain M simulated values:

    {S_1^(1), S_1^(2), …, S_1^(M)}, {S_2^(1), S_2^(2), …, S_2^(M)}, …, {S_k^(1), S_k^(2), …, S_k^(M)}.
Determine the upper and lower α-critical values for these statistics. Denote them as S_1(α), S_1(1 − α); S_2(α), S_2(1 − α); …; S_k(α), S_k(1 − α).

For the given counting process {N(t), t ≥ 0} whose events occur at the instants (t_1, t_2, …, t_n), we perform the following operations:

1. Estimate λ(t).
2. Carry out the time transformation V_i = ∫_0^{t_i} λ̂(t) dt, i = 1, …, n, and compute the intervals

    Δ_1 = V_1, Δ_2 = V_2 − V_1, …, Δ_n = V_n − V_{n−1}.  (B.23)
3. For the sample (Δ_1, …, Δ_n), compute the values S_1*, S_2*, …, S_k* of the statistics S_1, S_2, …, S_k.
4. Compare S_1*, S_2*, …, S_k* to the upper and lower critical values calculated above.
5. Reject H*_0 if one of the statistics S_1*, S_2*, …, S_k* falls outside the corresponding interval [S_i(α), S_i(1 − α)].

Let us now consider well-known failure data and compare different authors' conclusions with the results obtained via our procedure.
We illustrate the presented methodology using data on the time intervals be-
tween successive failures of the air conditioning system of the Boeing 720 jet se-
ries 7912 (Proschan 1963). These data were analyzed by many researchers (in-
cluding Park and Kim 1991; Gaudoin et al. 2003). All authors claim that failure
data came indeed from an NHPP with a power-law intensity function. We came to
a similar conclusion. All test statistic values fall inside the corresponding [0.05,
0.95] simulated intervals for all of our statistics. Therefore, we would claim that
the data do not contradict the NHPP with power-law intensity function.
Crowder et al. (1991) give data on failures of an engine of USS Halfbeak. The data were fitted using log-linear and power-law intensity functions. Using the Laplace test statistic and the MIL-HDBK test statistic, the authors express doubts that the data set comes from an NHPP. Our tests reveal the following: using the power-law intensity function, three of eight statistics fall outside the corresponding [0.01, 0.99] simulated intervals. Using a log-linear intensity function, none of our criteria contradicts the NHPP hypothesis. Our conclusion is that the NHPP hypothesis is questionable.
The following data (Frenkel et al. 2004, 2005) summarize the time intervals in operating hours between failures of the Schlosser vibration machine, collected from operation reports dated from 1999 to 2002 at the Yeroham Construction Materials Facility (Israel): 240, 4032, 288, 1224, 624, 552, 2352, 168, 480, 1400, 408, 528, 888, 768, 336, 528, 72, 96, 88, 268, 84, 86, 96, 103, 456, 24, 120. The machine was observed for 16309 h and 27 failures were identified.
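For the power-law form λ(t) = αβt^(β−1), the time-terminated maximum-likelihood estimates have a well-known closed form (Crow 1974). The following Python sketch applies them to the failure times above purely for illustration; note that the book fits the log-linear form to these data, so this is not the authors' computation:

```python
import numpy as np

# inter-failure times (operating hours) from the operation reports
gaps = [240, 4032, 288, 1224, 624, 552, 2352, 168, 480, 1400, 408,
        528, 888, 768, 336, 528, 72, 96, 88, 268, 84, 86, 96, 103,
        456, 24, 120]
t = np.cumsum(gaps)          # cumulative failure times; t[-1] = 16309
T, n = 16309.0, len(gaps)    # observation time (h) and number of failures

# closed-form time-terminated ML estimates for the power-law intensity
# lambda(t) = alpha*beta*t**(beta - 1)
beta_hat = n / np.sum(np.log(T / t))
alpha_hat = n / T ** beta_hat    # beta_hat > 1 indicates deterioration
```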
The estimated intensity function is assumed to be log-linear; the maximum-likelihood estimates of its two parameters are 1.7992 and 2.4979. We then applied the testing procedure described above. All test statistic values fell inside the corresponding simulated intervals; therefore, we claim that the data do not contradict an NHPP with a log-linear intensity function.
References
Ascher H, Feingold H (1984) Repairable systems reliability. Marcel Dekker, New York
Cox DR, Lewis PAW (1966) The statistical analysis of series of events. Chapman and Hall, London
Crow L (1974) Reliability analysis for complex, repairable systems. In: Proschan F, Serfling RJ (eds) Reliability and biometry. SIAM, Philadelphia, pp 379–410
Crowder MJ, Kimber AC, Smith RL, Sweeting TJ (1991) Statistical analysis of reliability data. Chapman and Hall/CRC, Boca Raton, Florida
Frenkel IB, Gertsbakh IB, Khvatskin LV (2003) Parameter estimation and hypotheses testing for nonhomogeneous Poisson process. Transport and Telecommunication 4(2):9–17
Frenkel IB, Gertsbakh IB, Khvatskin LV (2004) Parameter estimation and hypotheses testing for nonhomogeneous Poisson process. Part 2. Numerical examples. Transport and Telecommunication 5(1):116–129
Frenkel IB, Gertsbakh IB, Khvatskin LV (2005) On the simulation approach to hypotheses testing for nonhomogeneous Poisson process. In: Book of abstracts of the International Workshop on Statistical Modelling and Inference in Life Sciences, September 14, 2005, Potsdam, Germany, pp 35–39
Gaudoin O, Yang B, Xie M (2003) A simple goodness-of-fit test for the power-law process, based on the Duane plot. IEEE Trans Reliab 52(1):69–74
Gertsbakh IB (2000) Reliability theory with applications to preventive maintenance. Springer, Berlin
Gnedenko BV, Belyaev YuK, Solovyev AD (1969) Mathematical methods of reliability theory. Academic Press, San Diego
Hartley HO (1950) The maximum F-ratio as a short-cut test of heterogeneity of variance. Biometrika 37:308–312
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Park WJ, Kim YG (1992) Goodness-of-fit tests for the power-law process. IEEE Trans Reliab 41(1):107–111
Proschan F (1963) Theoretical explanation of observed decreasing failure rate. Technometrics 5(3):375–383
Appendix C
MATLAB Codes for Examples and Case Study Calculation

The systems of differential equations in the examples are solved with the MATLAB ODE solver ode45, called as

[t,p]=ode45(@funcpdot,tspan,p0),

where funcpdot is the name of a function written to describe the system of differential equations, the vector tspan contains the starting and ending values of the independent variable t, and p0 is a vector of the initial values of the variables in the system of differential equations.
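For orientation, the same kind of computation can be sketched outside MATLAB with SciPy's solve_ivp; the two-state (up/down) Markov model and the rates below are hypothetical, chosen only to show the shape of the call:

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical two-state Markov model: state 1 = up, state 2 = down
lam, mu = 0.5, 2.0   # illustrative failure and repair rates, 1/h

def funcpdot(t, p):
    """Right-hand side of the state-probability equations,
    analogous to the func* files passed to ode45."""
    p1, p2 = p
    return [-lam * p1 + mu * p2,
            lam * p1 - mu * p2]

# integrate over [0, 8] h, starting from the "up" state
sol = solve_ivp(funcpdot, [0, 8], [1.0, 0.0])
A = sol.y[0]   # instantaneous availability A(t) = p1(t)
```

At t = 8 h the solution is essentially at the steady-state availability μ/(λ + μ).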
Solver Ex2_2
clear all;
p0=[0 0 0 1];
[t,p]=ode45(@funcEx2_2, [0 8], p0);
R1=1-p(:,1); R2=1-p(:,1)-p(:,2);
plot(t,p(:,1),'k-',t,p(:,2),'k--',t,p(:,3),'k-.', ...
    t,p(:,4),'k.',t,R1,'k*',t,R2,'kx');
figure (2);
Et=10*p(:,4)+8*p(:,3)+5*p(:,2)+0*p(:,1);
Dt=1*p(:,2)+6*p(:,1);
plot(t,Et,'k-',t,Dt,'k--');
Solver Ex2_3A
clear all;
p0=[0 0 0 1];
[t,p]=ode45(@funcEx2_3A, [0 0.1], p0);
A1=1-p(:,1); A2=1-p(:,1)-p(:,2);A3=p(:,4);
Et=100*p(:,4)+80*p(:,3)+50*p(:,2)+0*p(:,1);
Dt=10*p(:,2)+60*p(:,1);
plot(t,A1,'k-',t,A2,'k--',t,A3,'k-.');
figure (2);
plot(t,Et,'k-');
figure (3);
plot(t,Dt,'k-');
Solver Ex2_3B
clear all;
p0=[0 0 1];
[t,p]=ode45(@funcEx2_3B, [0 8], p0);
Rw=1-p(:,1);
plot(t,Rw);
-(Mu1_2_2+Mu1_2_1+Lambda3_2_3)*p(5)+Mu2_3_3*p(9);
f(6)=Lambda3_2_3*p(2)+Lambda2_1_1*p(4) ...
    -(Mu2_3_3+Mu1_2_1+Lambda2_1_2+Lambda2_1_3)*p(6) ...
    +Mu1_2_2*p(9)+Mu1_2_3*p(10);
f(7)=Lambda3_2_3*p(3)+Lambda2_1_2*p(4) ...
    -(Mu2_3_3+Mu1_2_2+Lambda2_1_1+Lambda2_1_3)*p(7) ...
    +Mu1_2_1*p(9)+Mu2_3_3*p(11);
f(8)=Lambda2_1_3*p(4)-(Mu1_2_3+Lambda2_1_1 ...
    +Lambda2_1_2)*p(8)+Mu1_2_1*p(10)+Mu1_2_2*p(11);
f(9)=Lambda3_2_3*p(5)+Lambda2_1_2*p(6) ...
    +Lambda2_1_1*p(7)-(Mu2_3_3+Mu1_2_2+Mu1_2_1 ...
    +Lambda2_1_3)*p(9)+Mu1_2_3*p(12);
f(10)=Lambda2_1_3*p(6)+Lambda2_1_1*p(8)- ...
    (Mu1_2_3+Mu1_2_1+Lambda2_1_2)*p(10)+Mu1_2_2*p(12);
f(11)=Lambda2_1_3*p(7)+Lambda2_1_2*p(8)- ...
    (Mu1_2_3+Mu1_2_2+Lambda2_1_1)*p(11)+Mu1_2_1*p(12);
f(12)=Lambda2_1_3*p(9)+Lambda2_1_2*p(10) ...
    +Lambda2_1_1*p(11)-(Mu1_2_3+Mu1_2_2+Mu1_2_1)*p(12);
Solver Ex2_4A
clear all;
p0=[1 0 0 0 0 0 0 0 0 0 0 0];
[t,p]=ode45(@funcEx2_4A, [0 0.2], p0);
PrG0=p(:,5)+p(:,8)+p(:,9)+p(:,10)+p(:,11)+p(:,12);
PrG1_5=p(:,3)+p(:,7); PrG1_8=p(:,4)+p(:,6);
PrG2_0=p(:,2); PrG3_5=p(:,1);
A=p(:,1)+p(:,2)+p(:,3)+p(:,4)+p(:,6)+p(:,7);
Et=3.5*p(:,1)+2*p(:,2)+1.5*p(:,3)+1.8*p(:,4)
+1.8*p(:,6)+1.5*p(:,7);
Dt=p(:,5)+p(:,8)+p(:,9)+p(:,10)+p(:,11)+p(:,12);
plot(t,Et,'k-');
figure(2);
plot(t,Dt,'k-');
f(1)=Lambda2_1_2*p(3)+Lambda2_1_1*p(4) ...
    +Lambda2_1_3*p(5)+(Lambda2_1_2+Lambda2_1_3)*p(6) ...
    +(Lambda2_1_1+Lambda2_1_3)*p(7);
f(2)=-(Lambda2_1_1+Lambda2_1_2+Lambda3_2_3)*p(2) ...
    +Mu1_2_1*p(3)+Mu1_2_2*p(4)+Mu2_3_3*p(5);
f(3)=Lambda2_1_1*p(2)-(Mu1_2_1+Lambda2_1_2 ...
    +Lambda3_2_3)*p(3)+Mu2_3_3*p(6);
f(4)=Lambda2_1_2*p(2)-(Mu1_2_2+Lambda2_1_1 ...
    +Lambda3_2_3)*p(4)+Mu2_3_3*p(7);
f(5)=Lambda3_2_3*p(2)-(Mu2_3_3+Lambda2_1_1+Lambda2_1_2 ...
    +Lambda2_1_3)*p(5)+Mu1_2_1*p(6)+Mu1_2_2*p(7);
f(6)=Lambda3_2_3*p(3)+Lambda2_1_1*p(5) ...
    -(Mu2_3_3+Mu1_2_1+Lambda2_1_2+Lambda2_1_3)*p(6);
f(7)=Lambda3_2_3*p(4)+Lambda2_1_2*p(5) ...
    -(Mu2_3_3+Mu1_2_2+Lambda2_1_1+Lambda2_1_3)*p(7);
Solver Ex2_4B
clear all;
p0=[0 1 0 0 0 0 0];
[t,p]=ode45(@funcEx2_4B, [0 1], p0);
R=1-p(:,1);
plot(t,R,'k-');
(Lambda_star+2*Mu+Lambda_d)*V(4)+2*Mu*V(5) ...
    +Lambda_d*V(10);
f(5)=1+Lambda_star*V(3)+Lambda*V(4)- ...
    (Lambda+Lambda_star+Mu+Lambda_d)*V(5)+ ...
    Mu*V(6)+Lambda_d*V(11);
f(6)=1+2*Lambda*V(5)-(2*Lambda+Lambda_d)*V(6)+ ...
    Lambda_d*V(12);
f(7)=Lambda_N*V(1)-(2*Mu+Mu_star+Lambda_N)*V(7)+ ...
    2*Mu*V(9)+Mu_star*V(10);
f(8)=1+Lambda_N*V(2)-(2*Lambda+Mu_star+Lambda_N)*V(8)+ ...
    2*Lambda*V(9)+Mu_star*V(12);
f(9)=1+Lambda_N*V(3)+Lambda*V(7)+Mu*V(8)- ...
    (Lambda+Mu+Mu_star+Lambda_N)*V(9)+Mu_star*V(11);
f(10)=1+Lambda_N*V(4)+Lambda_star*V(7)- ...
    (Lambda_star+2*Mu+Lambda_N)*V(10)+2*Mu*V(11);
f(11)=1+Lambda_N*V(5)+Lambda_star*V(9)+Lambda*V(10)- ...
    (Lambda+Lambda_star+Mu+Lambda_N)*V(11)+Mu*V(12);
f(12)=1+Lambda_N*V(6)+2*Lambda*V(11)- ...
    (2*Lambda+Lambda_N)*V(12);
b_A=a_A(:,1);
c_A(i)=A(b_A);
i_A(i)=i;
Av(i)=0.999;
i_Av(i)=i;
end
plot(i_A,c_A,'k-',i_Av, Av,'k--');
Lambda_d*V(9);
f(5)=1+Lambda_N*V(2)-(2*Lambda+Mu_star+Lambda_N)*V(5)+ ...
    2*Lambda*V(6)+Mu_star*V(9);
f(6)=1+(Lambda+Lambda_N)*V(1)+Mu*V(5)- ...
    (Lambda+Mu+Mu_star+Lambda_N)*V(6)+Mu_star*V(8);
f(7)=1+(Lambda_star+Lambda_N)*V(1)- ...
    (Lambda_star+2*Mu+Lambda_N)*V(7)+2*Mu*V(8);
f(8)=1+Lambda_N*V(3)+Lambda_star*V(6)+Lambda*V(7)- ...
    (Lambda+Lambda_star+Mu+Lambda_N)*V(8)+Mu*V(9);
f(9)=1+Lambda_N*V(4)+2*Lambda*V(8)- ...
    (2*Lambda+Lambda_N)*V(9);
Solver CondRELINT
clear all;
global Lambda Lambda_star Mu Mu_star Lambda_d Lambda_N;
Lambda=3; Lambda_star=10;
Lambda_d=1251; Lambda_N=515.3;
i=0;
for Mu=500:-50:50
Mu_star=Mu;
i=i+1;
V0=[0 0 0 0 0 0 0 0 0];
[t,V]=ode45(@funcCondRELINT,[0 1],V0);
R=1-V(:,9);
a_R=size(R);
b_R=a_R(:,1);
c_R(i)=R(b_R);
i_R(i)=i;
end
plot(i_R,c_R,'k-');
f(1)=0;
f(2)=-Mu34*V(2)+Mu34*V(3);
f(3)=Lambda41+Lambda42NH+(Lambda41+Lambda42NH)*V(1)+ ...
    Lambda43*V(2)-(Lambda41+Lambda42NH+Lambda43)*V(3);
Index

A
Absorbing state, 45
Acceptability function, 16, 17
Acceptable and unacceptable states, 16, 17
Accumulated performance deficiency, 279
Aging multi-state systems, 273
Anderson–Darling test, 372
Average accumulated performance deficiency, 164
Average expected output performance, 164

C
Chapman–Kolmogorov equation, 36, 42
Chromosome structure, 302
Coherency, 15
Combined performance-demand model, 86
Composition operator, 155, 167
Confidence coefficient, 126
Confidence interval, 120, 126
Confidence limits, 126
Connection of elements
  bridge, 178
  parallel, 173
  series, 170
  series-parallel, 175
Consistency, 118
Cramér–von Mises test, 371

D
Demand availability, 21
Discrete-state process, 30
Discrete-time Markov chains, 34
Distribution of the stochastic process
  first-order, 31
  nth-order joint, 31

E
Efficiency, 119
Embedded Markov chain, 100, 135, 137
Ergodic process, 46
Estimate, 118
Estimator, 118
Expected accumulated performance deficiency, 92

F
Failure criteria, 16, 17
Failure-terminated test, 128
  with replacement, 128
  without replacement, 129
Failure time
  censored on the left, 128
  censored on the right, 127
First-order distribution, 31
Flow transmission MSS, 178
Frequency of failures, 92
Fuzzy multi-state monotone system, 333
Fuzzy multi-state system, 321
Fuzzy UGF, 336

G
Generic MSS model, 10, 48, 67
Generalized universal generating

L
Laplace–Stieltjes transform, 50
Laplace trend test, 370
Lewis–Robinson test for trend, 371
Life cycle cost (LCC), 238
Loss of load probability (LOLP), 23

M
Maintenance contract, 291, 292
Maintenance optimization, 310, 312
Markov model for multi-state element, 203
Markov process, 32
Markov reward model, 79
Maximum-likelihood method, 122
Mean accumulated reward, 92
Mean conditional sojourn time, 102
Mean time between failures (MTBF), 20
Mean time of staying in state, 46, 47
Mean time to failure (MTTF), 92, 280
Mean total number of system failures, 277
Mean unconditional sojourn time, 102

O
ODE solvers, 377
One-step transition probabilities, 35
One-step transition probabilities for embedded Markov chain, 102
Optimal corrective maintenance contract planning, 299, 302
Optimal preventive replacement policy, 310
Optimization, 302

P
Performance rate, 1, 8, 10
Point estimation, 120
Point process, 33
Poisson process, 33
Probability of system failure, 93, 280
Property of estimators, 118
Property of invariance, 32

R
Redundancy, 214
Relevancy of system elements, 14
Reliability-associated cost, 242
Reliability function, 45, 59
Reliability indices, 70, 79, 90, 105
Reliability measures, 18
Renewal process, 33
Repairable multi-state element, 48, 57, 59
Reward, 79

S
Semi-Markov model, 99, 204
Sequencing problems, 361
State frequency, 46
State probabilities, 32, 36, 42
Statistical estimation theory, 118
Stochastic matrix, 35
Sufficiency, 119
System availability, 163
System sojourn time, 135

T
Task-processing MSS, 178
Time between failures, 20
Time-terminated test, 128
  with replacement, 128
  without replacement, 129
Time to failure, 19
Transition intensity, 41
Transition probability function, 35

U
u-function, 159
UGF of parallel systems, 173
UGF of series systems, 170
UGF of series-parallel systems, 175
UGF of systems with bridge structure, 178
Unbiasedness, 119
Universal generating function (UGF), 155
Universal generating operator, 154

Z
z-transform, 149, 151, 158