
Chapter 3

BASIC RELIABILITY
MATHEMATICS

This Chapter introduces the terms reliability R(t), unreliability F (t), time to
failure density f (t), failure rate function f r(t), hazard h(t) and cumulative
hazard H(t) functions as well as their interrelationships. Other terms relating
to mean life are also introduced.
It contains mathematical definitions and relationships necessary to under-
stand each of the chapters which follow. These definitions and relationships
are the building blocks of reliability engineering. It introduces the four fun-
damental failure distributions (densities) of reliability engineering. It also
explains how we can estimate the percent of the population which will fail
by a certain time simply by using the sample data order number and number
in the sample. This provides the basis for probability plotting, discussed in
Chapter 8.
Many of the developments in this chapter have their origin in the math-
ematics of actuarial science, developed for over 200 years before they were
applied to electro-mechanical devices. Also, there are statistics and biostatis-
tics courses in “survival analysis” which focus on many of the same topics as
do reliability engineering courses.

Glossary of terms and symbols:

Bathtub Curve: A plot of h(t), the hazard function over time, t. So-
called because its shape resembles the profile of a bathtub.


Conditional Reliability: The probability of no failure in an interval,
given no failure (survival) from time zero until the starting time of the
interval.

Cumulative Hazard Function H(t): The area under the hazard function from 0
to t. H(t) is not a probability.

Hazard Function h(t): The instantaneous conditional probability of failure
in a small interval (t, t + dt) divided by the width of the interval.

Failure Rate Function: A function depicting the number of failures per unit
of time at a particular time. The failure rate function f r(t) is related to
the hazard function in that its plot over time has the same shape. Only the
Y axis values differ.

Non-parametric: Means “distribution-free” and refers in this chapter to
estimates of functions such as the unreliability F(t) which are made without
reference to the underlying failure distribution (Weibull, normal, etc.).

Reliability R(t): The probability that a device or system will perform its
intended function for a given interval of time under specified operating
conditions.

Time-to-Failure Density Function f(t): A probability density function
describing the failure behavior of a system over time.

Unreliability F(t) = 1 − R(t): The probability that a device or system will
not perform its intended function for a given interval of time under
specified operating conditions. The unreliability is identical to the
cumulative distribution function (cdf) in probability theory.

3.1 Definition of reliability


Most definitions of reliability have five elements. Consider the definition
proposed by the Advisory Group on Reliability of Electronic Equipment (AGREE)
in 1952 and reported in AGREE (1957).

Definition: Reliability is the probability of performing without failure
a specific function under given conditions for a specified period of time.

The five elements are:

1) Probability: Reliability is a probability, a probability of performing
without failure; thus, a reliability is a number between zero and one.

2) Failure: What constitutes a failure must be agreed upon in advance
of the testing and use of the component or system under study. For example,
if the function of a pump is to deliver at least 200 gallons of fluid per
minute and it is now delivering 150 gallons per minute, the pump has failed,
by this definition.

3) Function: The device whose reliability is in question must perform
a specific function. For example, if I use my gasoline-powered lawnmower to
trim my hedges and a blade breaks, this should not be charged as a failure.

4) Conditions: The device must perform its function under given con-
ditions. For example, if my company builds and sells small gasoline-powered
electrical generators intended for use in ambient temperatures of 0-120 de-
grees Fahrenheit and several are brought to Nome, Alaska and fail to operate
in the winter, we should not charge failures to these units.

5) Time: The device must perform for a period of time. One should
never cite a reliability figure without specifying the time in question. The
exception to this rule is for one-shot devices such as munitions, rockets, au-
tomobile air-bags, and the like. In this case we think of the reliability as the
probability that the device will operate properly (once) when deployed or
used. Or equivalently one-shot reliability may be thought of as the propor-
tion of all identical devices which will operate properly (once) when deployed
or used. In reliability, unless otherwise specified time begins at zero. We
treat conditional probability of failure and conditional reliability separately
and call them as such.

Elements 2, 3 and 4 are important to the reliability of a device, but
they differ in different situations; elements 1 and 5 are more basic. Since
reliability is a probability, the theory outlined in Section 1 of this chapter
is available for use in reliability theory and also the methods of probability
assignment discussed in Section 1 are important for reliability studies. The
probability element of reliability also allows one to calculate reliabilities
in a quantitative way, that is, the assessment of a reliability can be done
probabilistically so that the quantity given to the reliability has the
meaning and structure of probability for its manipulation and interpretation.

The time element is also basic in reliability. In fact, the same publication
in which the AGREE definition of reliability appeared proposes that the
basic distinction between reliability and quality control is related to this
element. In this way of comparing reliability and quality control, quality
control studies failure at a given time whereas reliability studies failure over
time.
In a sense, this comparison introduces a new definition of reliability, that
is, a study of failure over time. Also the term failure is introduced and to be
consistent, it is important to define failure. Thus, a failure is defined as any
functioning of the device or component which is not considered within the
prescribed limits of satisfactory functioning.

Since the element of time is so basic to reliability, it is quite natural
that the primary random variable in reliability studies is time and that the
focus of such studies is often life length. When life length is the focus of
a reliability study, the study is often referred to as a life test, and this
terminology is often used to describe the reliability study. With these
points in mind, one can imagine the kinds of interesting discussions and
arguments one may observe when design engineers, manufacturing engineers,
electrical, mechanical and quality engineers get together in design review
or failure analysis meetings and discuss things such as:

1) Was it a failure or not? and

2) Was it electrical or mechanical, or was it really mechanical caused by
electrical (or vice versa), or was it caused by software, or was it caused by
"those guys over there"?

Thus, there will be some finger-pointing. This makes for interesting
meetings. To minimize these problems, you must classify potential failures,
define what a failure is and have some kind of meeting of the minds before
you actually see the failures.

3.2 Mathematical definition of reliability


The life of a device under reliability study follows a sequence that results in
an observable time to failure. A new device is put into service, it functions
acceptably for a period of time and then it fails to function satisfactorily.
The observed time to failure is a value of the random variable T, which
represents the lifetime of the device. T takes its values in an interval of
the real numbers, R, most often in the interval [0, ∞). Since the lifetime
of a device is represented by a random variable T, there is a probability
distribution function (cdf) of T,

\[
F_T(t) = P(T \le t), \qquad 0 < t. \tag{3.1}
\]

$F_T(t)$ is usually called the unreliability at time t. It represents the
probability of failure in the interval [0, t]. The probability of failure in
the interval (t1, t2] equals F(t2) − F(t1).

Definition: The reliability function is:

\[
R_T(t) = P(T > t) = 1 - F_T(t) \tag{3.2}
\]

Thus, reliability is the probability of no failures in the interval [0, t]
or, equivalently, the probability of failure after time t. Sometimes T will
take on only a countable number of values in R. This case, called the
discrete case, occurs when T is a number of cycles, for example, or when the
failure time can occur at only discrete points.
Most of the time, however, T will be a continuous random variable and its
distribution $F_T(t)$ will be a continuous distribution having a density
$f_T(t)$.

3.2.1 Reliability With Continuous Random Variables


Assume T is a continuous random variable, taking values in (0, ∞) and with
density function $f_T(t)$. The reliability function $R_T(t)$ is:

\[
R_T(t) = \int_t^{\infty} f_T(x)\,dx = 1 - \int_0^{t} f_T(x)\,dx = 1 - F_T(t) \tag{3.3}
\]

where:

\[
f_T(t) \ge 0 \quad \text{and} \quad \int_0^{\infty} f_T(x)\,dx = 1.
\]

Note that,

\[
f_T(t) = -\frac{dR_T(t)}{dt} \tag{3.4}
\]

It is also worth noting that the probability that the failure time T occurs
in an interval (t1, t2) can be written:

\[
P(t_1 < T < t_2) = F_T(t_2) - F_T(t_1) = R_T(t_1) - R_T(t_2) \tag{3.5}
\]


At this point, we abandon the notation fT (t), RT (t) and FT (t) and for
simplicity use f(t), R(t) and F(t), respectively. Figure 3.1 presents the rela-
tionship between f(t), F(t) and R(t) graphically.

Figure 3.1: Relationship between f(t),F(t) and R(t)

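Example: The exponential time-to-failure density is given by

\[
f(t) = \frac{1}{\theta} \exp\left(-\frac{t}{\theta}\right), \qquad t > 0.
\]

Using the above relationships,

\[
F(t) = 1 - \exp\left(-\frac{t}{\theta}\right) \quad \text{and} \quad R(t) = \exp\left(-\frac{t}{\theta}\right).
\]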
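The relationships (3.3) and (3.4) can be checked numerically for the exponential example. The following is a minimal sketch, not a definitive implementation: the value θ = 500 hours, the time t = 200 hours and the helper `integrate` (a plain trapezoidal rule) are assumptions chosen only for illustration.

```python
import math

def exp_pdf(t, theta):
    """Exponential time-to-failure density f(t) = (1/theta) exp(-t/theta)."""
    return math.exp(-t / theta) / theta

def exp_unreliability(t, theta):
    """F(t) = 1 - exp(-t/theta)."""
    return 1.0 - math.exp(-t / theta)

def exp_reliability(t, theta):
    """R(t) = exp(-t/theta)."""
    return math.exp(-t / theta)

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n panels (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

theta = 500.0  # assumed mean life in hours (hypothetical value)
t = 200.0      # assumed time of interest in hours (hypothetical value)

# Area under f from 0 to t should agree with the closed-form F(t).
area = integrate(lambda x: exp_pdf(x, theta), 0.0, t)
print("integral of f on [0, t]:", area)
print("closed-form F(t):       ", exp_unreliability(t, theta))
print("R(t) + F(t):            ", exp_reliability(t, theta) + exp_unreliability(t, theta))
```

The integral of the density and the closed-form F(t) should agree to several decimal places, and R(t) + F(t) should equal one.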
One selects a time-to-failure (TTF) density, f(t), by collecting failure data
and either doing a goodness of fit test if sufficient data exist, or by making
a probability plot if there is very little data. Probability plotting
procedures are discussed in Chapter 8. We illustrate the use of TTF densities
with exponential and Weibull densities, which are discussed in much more
detail in Chapter 4.
Figure 3.2 below illustrates a histogram of 1000 data points with an
exponential density curve overlaid. Figure 3.3 represents a probability plot
on exponential paper of 10 TTF points. The data points are reasonably close to
the fitted line and hence we may conclude that the exponential distribution
is an appropriate choice for f(t).

Figure 3.2: Histogram for Exponential Distribution
Figure 3.3: Probability plot for Exponential Distribution

Figure 3.4 represents plots of Weibull density functions with various
parameters. The general form of this density function is given by

\[
f(t) = \frac{\beta\, t^{\beta-1}}{\theta^{\beta}}\, e^{-(t/\theta)^{\beta}}, \qquad t > 0
\]

where θ is a scale parameter called the characteristic value and β is called
the shape parameter. More on the Weibull distribution will be presented
throughout the book, beginning with Chapter 4. Some densities in Figure
3.4 have a positive skewness ("ski-slope" to the right) which indicates that
most failures occur in the early part of life. Figure 3.5 represents F(t) vs.
t for those distributions. Note that as time increases, the cumulative
probability of failure (failure on or before time t) increases and ultimately
reaches one.
Next, Figure 3.6 presents the hazard function for the same set of Weibull
random variables. Compare these plots to the ones in Figure 3.4 and notice
how the shapes of both depend on the parameter β.
Example:
Suppose that the TTF density of a pick and place machine used in printed
circuit surface mount technology is given by a Weibull density with
parameters θ = 40 and β = 2. Thus,

\[
f(t) = \frac{2\,t^{2-1}}{40^{2}}\, e^{-(t/40)^{2}}.
\]

Hence,

\[
R(t) = \int_t^{\infty} \frac{2x}{1600}\, e^{-(x/40)^{2}}\,dx, \quad \text{or} \quad R(t) = e^{-(t/40)^{2}}.
\]

The probability of surviving the interval (0, 40) is R(40) = exp(−1) = 0.368.
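The pick and place example can be reproduced numerically. The sketch below compares the closed-form Weibull reliability with a trapezoidal integration of the density from t to a large upper limit. The helper `integrate` and the cutoff of 10θ (beyond which the remaining tail is negligible for these parameters) are assumptions made for illustration.

```python
import math

def weibull_pdf(t, theta, beta):
    """Weibull density f(t) = (beta t^(beta-1) / theta^beta) exp(-(t/theta)^beta)."""
    return (beta / theta) * (t / theta) ** (beta - 1) * math.exp(-((t / theta) ** beta))

def weibull_reliability(t, theta, beta):
    """Weibull reliability R(t) = exp(-(t/theta)^beta)."""
    return math.exp(-((t / theta) ** beta))

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n panels (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

theta, beta = 40.0, 2.0  # parameters from the example in the text

closed_form = weibull_reliability(40.0, theta, beta)
# Integrate f from 40 up to 10*theta; the tail beyond that is negligible here.
numeric = integrate(lambda x: weibull_pdf(x, theta, beta), 40.0, 10.0 * theta)

print("R(40), closed form:", round(closed_form, 3))
print("R(40), numeric:    ", round(numeric, 3))
```

Both values should round to the 0.368 quoted in the example.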

3.2.2 Reliability With Discrete Random Variables:


Suppose now T is discrete, taking values $0 = t_0 < t_1 < t_2 < \ldots$ with
probability function:

\[
p(t_i) = P(T = t_i), \qquad i = 0, 1, 2, \ldots \tag{3.6}
\]

Figure 3.4: Weibull Density Function (f(t) versus t for scale = 1 and shape = 0.5, 1, 2, 3.5 and 8)

In practice, as with cycles or discrete time periods (like minutes), these
points $t_i$ may be taken to be equally spaced with $t_i = i$ in suitable
units. It is not necessary but it is often convenient. If $t_i = i$, then:

\[
R_T(t_i) = P(T > t_i) = p(i+1) + p(i+2) + \ldots \tag{3.7}
\]

In general,

\[
R_T(t) = P\{T > t\} = \sum_{i:\, t_i > t} p(t_i) \tag{3.8}
\]

Notice that:

\[
p(t_i) = R(t_{i-1}) - R(t_i) \tag{3.9}
\]
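The discrete relationships (3.7) and (3.9) are easy to illustrate with a small probability function. The sketch below uses a made-up pmf on the equally spaced points t_i = i; the numbers are hypothetical and serve only to show the bookkeeping.

```python
def reliability_from_pmf(pmf):
    """Return [R(t_0), R(t_1), ...] where R(t_i) = P(T > t_i) and pmf[i] = P(T = t_i)."""
    rel = []
    tail = sum(pmf)  # total probability, expected to be 1
    for p in pmf:
        tail -= p    # remove the mass at t_i, leaving P(T > t_i)
        rel.append(tail)
    return rel

# Hypothetical pmf on t_i = 0, 1, 2, 3, 4 cycles (values are illustrative only).
pmf = [0.0, 0.1, 0.2, 0.3, 0.4]
rel = reliability_from_pmf(pmf)

for i in range(1, len(pmf)):
    # Equation (3.9): p(t_i) = R(t_{i-1}) - R(t_i)
    print(i, pmf[i], rel[i - 1] - rel[i])
```

Each printed difference R(t_{i-1}) − R(t_i) should match p(t_i) up to floating-point rounding.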

Figure 3.5: Weibull Cumulative Distribution Function (F(t) versus t for scale = 1 and shape = 0.5, 1, 2, 3.5 and 8)

3.2.3 Conditional Reliability and Unreliability


We first define conditional reliability. Using Bayes Rule (2.10),

\[
P[\text{no failure}\,(t, t+T) \mid \text{no failure}\,(0, t)]
= \frac{P[\text{no failure}\,(t, t+T) \cap \text{no failure}\,(0, t)]}{P[\text{no failure}\,(0, t)]}
= \frac{R(t+T)}{R(t)} \tag{3.10}
\]

Example: For the pick and place machine of the previous example, the
probability of surviving the interval (40,50) given survival (0,40) is

\[
\frac{R(50)}{R(40)} = \frac{e^{-(50/40)^{2}}}{e^{-(40/40)^{2}}} = \frac{e^{-1.5625}}{e^{-1}} = 0.5698
\]

Figure 3.6: Weibull Hazard Function (h(t) versus t for scale = 1 and shape = 0.5, 1, 2, 3.5 and 8)

Conditional reliability is always calculated as the ratio of the reliability
at the end of the interval to the reliability at the beginning of the
interval. Conditional unreliability is given by

\[
P[\text{failure}\,(t, t+T) \mid \text{no failure}\,(0, t)]
= \frac{P[\text{failure}\,(t, t+T) \cap \text{no failure}\,(0, t)]}{P[\text{no failure}\,(0, t)]}
= \frac{F(t+T) - F(t)}{R(t)} = \frac{R(t) - R(t+T)}{R(t)} \tag{3.11}
\]

Figure 3.8 illustrates the regions of interest.

Figure 3.7: Region of interest for Reliability and Unreliability

Figure 3.8: Region of interest for Conditional Reliability and Unreliability

Example. The probability that the pick and place machine will fail in
the interval (40,50) given survival (0,40) is

\[
\frac{F(50) - F(40)}{R(40)} = \frac{1 - e^{-(50/40)^{2}} - (1 - e^{-1})}{e^{-(40/40)^{2}}}
= \frac{e^{-1} - e^{-1.5625}}{e^{-1}} = 0.4302
\]

Note that for this problem, in which the interval and TTF density are the
same as in the previous example, the conditional unreliability could have
been obtained by subtracting the conditional reliability from one.
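The two pick and place examples for (3.10) and (3.11) can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative and the Weibull parameters θ = 40 and β = 2 are those of the example.

```python
import math

def weibull_reliability(t, theta=40.0, beta=2.0):
    """Weibull reliability R(t) = exp(-(t/theta)^beta) for the pick and place machine."""
    return math.exp(-((t / theta) ** beta))

def conditional_reliability(rel, t, duration):
    """Equation (3.10): R(t + duration) / R(t)."""
    return rel(t + duration) / rel(t)

def conditional_unreliability(rel, t, duration):
    """Equation (3.11): [R(t) - R(t + duration)] / R(t)."""
    return (rel(t) - rel(t + duration)) / rel(t)

cr = conditional_reliability(weibull_reliability, 40.0, 10.0)
cu = conditional_unreliability(weibull_reliability, 40.0, 10.0)

print("P[survive (40,50) | survive (0,40)] =", round(cr, 4))
print("P[fail in (40,50) | survive (0,40)] =", round(cu, 4))
print("sum =", cr + cu)
```

The two conditional probabilities are complementary, so their sum should be one, in line with the closing remark of the example.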

3.3 Hazard functions (continuous)


Sometimes it is difficult to specify the distribution function of T directly
from the physical information that is available. A function found useful in
clarifying the relationship between physical modes of failure and the
probability distribution of T is the conditional density function h(t),
called the hazard function or failure rate. Consider the probability that a
failure will occur in the small interval of time (t, t + dt):

\[
P\{t \le T < t + dt\} = P\{T \ge t\}\, P\{T < t + dt \mid T \ge t\},
\]

which is true by the multiplication rule of probability. Further, if
R(t) = P(T > t) is positive,

\[
P(T < t + dt \mid T > t) = \frac{P(t < T < t + dt)}{R(t)} \tag{3.12}
\]

Now the conditional rate of failure for the interval (t, t + dt) is the
conditional probability of failure in the interval (given that the life of
the device has reached t) divided by the length of the interval. Thus, the
conditional interval failure rate is given by:

\[
\frac{P(t < T < t + dt \mid T > t)}{dt} = \frac{P(t < T < t + dt)}{R(t)\,dt} = \frac{R(t) - R(t + dt)}{R(t)\,dt} \tag{3.13}
\]

The instantaneous failure rate, or the hazard rate, is the limit of the above
equations as dt → 0. That is,

\[
h_T(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt}
= \lim_{dt \to 0} \frac{R(t) - R(t + dt)}{R(t)\,dt} = \frac{-R'(t)}{R(t)} \tag{3.14}
\]
For simplicity we replace $h_T(t)$ with h(t). The function h(t) is usually
referred to in reliability as the hazard rate. Above, it is also called the
instantaneous failure rate and elsewhere the failure rate function. In
actuarial statistics it is called the force of mortality and, in other
places, the intensity function. In economics, the reciprocal of h(t) is
called Mills' ratio. Note that if both sides of the first equality in
equation (3.14) are multiplied by dt then h(t)dt is the instantaneous
conditional probability of failure, i.e., the probability of failure in the
decreasingly small interval (t, t + dt) given no failure in (0, t).
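The limiting process in (3.13) and (3.14) can be watched numerically. The sketch below evaluates the conditional interval failure rate for shrinking dt, using the Weibull reliability of the pick and place example (θ = 40, β = 2) as an assumed test case. The closed-form hazard (β/θ)(t/θ)^(β−1) used for comparison follows from h(t) = f(t)/R(t) applied to the Weibull density given earlier.

```python
import math

def weibull_reliability(t, theta=40.0, beta=2.0):
    """Weibull reliability R(t) = exp(-(t/theta)^beta)."""
    return math.exp(-((t / theta) ** beta))

def weibull_hazard(t, theta=40.0, beta=2.0):
    """Weibull hazard f(t)/R(t) = (beta/theta) (t/theta)^(beta-1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

def interval_failure_rate(rel, t, dt):
    """Equation (3.13): [R(t) - R(t + dt)] / (R(t) dt)."""
    return (rel(t) - rel(t + dt)) / (rel(t) * dt)

t = 40.0
for dt in (10.0, 1.0, 0.1, 0.001):
    print(dt, interval_failure_rate(weibull_reliability, t, dt))

print("hazard h(40):", weibull_hazard(t))
```

As dt decreases, the interval failure rate should approach the hazard value at t = 40.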

3.3.1 The Bathtub Curve


With respect to both human and electromechanical failures, the shape of the
hazard function over the lifetime appears to take on a shape somewhat like
a bathtub. We are of the opinion that the bathtub curves for electrical and
mechanical devices are different and furthermore each has been changed since
attention to quality became a strategic issue in America. Figures 3.9 and 3.10
depict hypothetical bathtub curves for electrical and mechanical devices.

Figure 3.9: Bathtub curve for Electrical devices

Prior to the quality movement, a lifetime was thought of as being comprised
of three failure regions. The first, where the hazard rate is decreasing, is
called the infancy hazard rate or the burn-in hazard rate. The second region,
which was represented by a rather constant hazard rate, is the region where
failure is usually attributed to a chance occurrence. The third region is
where the lifetime has reached the stage where the device is beginning to
wear out and the hazard rate begins to increase. We feel that today’s plots
of h(t) vs. t are not like the traditional bathtub shapes.

Figure 3.10: Bathtub curve for Mechanical devices

A few arguments for the demise of the “bathtub” curve will now be presented.
The initial portion, called the “burn-in” period for electronic devices
and the period of early failures for mechanical components, was due to defects
present in the raw materials or subassemblies, errors in workmanship, early
manufacturing problems and the like. That is, the early part of the bathtub
curve was primarily associated with poor quality. As defects were identified
and removed, quality improved, and the hazard function began to steadily
decrease. With TQM and more attention to supplier quality, nurturing of
suppliers, supplier evaluation and certification and with attention to
eliminating and removing the root causes of defects in manufacturing, quality
has vastly improved. There is very little poor quality to improve upon and
hence few reasons to expect a downward slope to the curve.
Many manufacturers of consumer products with substantial electronic
circuitry, e.g., appliances, are now forgoing the “burn-in” period
altogether. The reason for “burn-in” (high temperatures, and sometimes
vibration) is to allow substandard components (e.g., a bad capacitor) and
faulty processes (e.g., poor solderability) to identify themselves by failing
under the increased stress(es). Replacements or repairs would be made before
assembly of components to the printed circuit board and/or before boards are
placed in the cabinet or housing. These manufacturers feel that quality has
improved to the extent that it is more likely that “burn-in” will cause
latent defects than it will identify substandard components or processes.
This is somewhat analogous to the cessation of polio immunizations in the
United States, with the belief that it is more likely that the polio shot
will cause the disease than prevent it since it is now so rare in the U.S.
population.
Since the exponential distribution, as will be shown in a later section,
has a constant hazard rate, the hazard rate function is useful for comparing
distributions to the exponential. In addition, the empirical hazard function
(based on data alone) has been shown to be convenient for comparing groups
of devices. Other strengths of the use of the hazard function relate to its
facility and stability when there is censoring of some of the data and when
there are several modes of failure present in the failure process.

3.3.2 Considerations in Selecting a TTF Density Function

If we examine the simplified plots of Figure 3.11 below, we observe three
hazard functions: A, B and C. A is monotonically decreasing, B is constant
and C is monotonically increasing. Relationship A implies that as time
goes by the instantaneous conditional probability of failure decreases. This
is unrealistic beyond small values of t for nearly all cases. Hence, most
reliability studies use relationship B or relationship C. As we shall soon
see, B represents the constant hazard function of the exponential
distribution, implying that the instantaneous conditional probability of
failure does not change over time. B is often selected for use in modeling
TTF for electronic and electrical devices. C is most common for mechanical
and electromechanical failures.
A is usually observed only for a brief initial period after manufacture or
processing. If devices behaved according to A, the more they were used,
the better they would get, and paradoxically as we shall see in Chapter
12, the more they are repaired, the worse they get. Hence, we recommend
modeling TTF with a random variable whose hazard function is, for the
most part, either relatively constant or increasing in nature, although such
an increase may not be strictly monotone. This is the way most things in
life behave; the more we use them, the worse they get (the more likely they
are to fail). Even for the exponential random variable with the constant
hazard function, wearout occurs. Failure eventually happens. It’s just that
with the exponential, the conditional probability of failure in a fixed interval
is independent of where the interval begins (how long the device has been
operating).

Figure 3.11: Three Hazard Functions

3.4 Relationship of h(t), f (t) and R(t)


Because of the relationship on the right-hand side of (3.14), it also follows
that

\[
h(t) = -\frac{d(\ln R(t))}{dt} \tag{3.15}
\]

and from that, integrating both sides and using R(0) = 1, we have

\[
R(t) = \exp\left(-\int_0^t h(x)\,dx\right) \tag{3.16}
\]

Now, from (3.14), $h(t)R(t) = -\frac{dR(t)}{dt}$. However, from (3.4),
$f(t) = -\frac{dR(t)}{dt}$, or,

\[
h(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)} \tag{3.17}
\]
and the relationship among h(t), f(t) and R(t) is established. Note that if any
one of the three functions is known, the others are known. Thus knowledge
of the hazard rate is equivalent to knowledge of the distribution.
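The statement that the hazard rate determines the distribution can be illustrated by starting from h(t) alone. The sketch below recovers R(t) through (3.16) and then f(t) through (3.17). It assumes the Weibull hazard with θ = 40 and β = 2 from the earlier example, and uses a simple trapezoidal helper `integrate` that is not part of the text.

```python
import math

def integrate(func, a, b, n=10000):
    """Composite trapezoidal rule on [a, b] with n panels (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

def weibull_hazard(t, theta=40.0, beta=2.0):
    """Assumed hazard function: (beta/theta) (t/theta)^(beta-1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

def reliability_from_hazard(hazard, t):
    """Equation (3.16): R(t) = exp(-integral of h from 0 to t)."""
    return math.exp(-integrate(hazard, 0.0, t))

def density_from_hazard(hazard, t):
    """Rearranged (3.17): f(t) = h(t) R(t)."""
    return hazard(t) * reliability_from_hazard(hazard, t)

r40 = reliability_from_hazard(weibull_hazard, 40.0)
f40 = density_from_hazard(weibull_hazard, 40.0)

print("R(40) recovered from h:", r40)
print("f(40) recovered from h:", f40)
```

The recovered R(40) should agree with the value exp(−1) obtained directly in the pick and place example.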

3.5 The cumulative hazard rate


In some situations there is interest in the function

\[
H(t) = \int_0^t h(x)\,dx = \int_0^t \frac{f(x)}{1 - F(x)}\,dx \tag{3.18}
\]

which is called the cumulative hazard rate. Using (3.16), it is seen that

\[
R(t) = e^{-H(t)} \quad \text{and} \quad H(t) = -\ln\{R(t)\} \tag{3.19}
\]

Notice that the condition that R(t) ≤ 1 indicates that H(t) ≥ 0. It can
easily be shown that $-2 \ln U \sim \chi^2_2$, where U is distributed uniformly
on the unit interval. Since F(T), the cdf evaluated at the random lifetime T,
is uniform over (0,1), so is R(T). Thus, we can write

\[
-2 \ln\{R(T)\} \sim \chi^2_2,
\]

or from (3.19),

\[
-2 \ln\left(e^{-H(T)}\right) \sim \chi^2_2
\]

Thus

\[
2 H(T) \sim \chi^2_2 \tag{3.20}
\]

Equation (3.20) is the basis for a test of hypotheses to be introduced
in Chapter 4. The cumulative hazard has been proposed (see, for example,
Nelson (1972) or Nelson (1982)) as an effective characteristic to use as a
basis for the determination of the failure distribution through the use of
plotting techniques.
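The chi-square claim for −2 ln U used in this section takes only one line to verify. The short derivation below is a sketch that relies on two standard facts: P(U < u) = u for a uniform variable on the unit interval, and the survival function of the chi-square distribution with 2 degrees of freedom is exp(−x/2).

```latex
\begin{align*}
P(-2\ln U > x) &= P\!\left(\ln U < -\tfrac{x}{2}\right)
                = P\!\left(U < e^{-x/2}\right)
                = e^{-x/2}, \qquad x \ge 0, \\
P\!\left(\chi^2_2 > x\right) &= \int_x^{\infty} \tfrac{1}{2}\, e^{-y/2}\,dy = e^{-x/2}.
\end{align*}
```

Since the two survival functions coincide for every x ≥ 0, −2 ln U has the chi-square distribution with 2 degrees of freedom.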

3.5.1 Explanatory variables or regression models


In the early reliability analyses, it was usual to assume that the population of
devices under study was sufficiently homogeneous so that the lifetimes of the
devices could be considered independent and identically distributed random
variables. However, in many applications, it is not possible or instructive
to obtain devices from homogeneous populations and thus the devices under
study or available may differ in their intrinsic properties or in the conditions
under which they operate. These differing conditions make it important to
consider and add explanatory variables or covariates to the reliability model.
These explanatory variables are variables that are associated with each device
and are believed to affect the lifetime of the device. These variables may be
continuous, as in the case of temperature or voltage, or discrete, as in the
case of a particular material used in the device or the presence or absence
of a particular factor in the device. These variables can also be classified as
constant over time or as time dependent.
The relationship of these explanatory variables to the lifetime of the de-
vice is usually studied by means of a regression model in which the lifetime
of the device has a distribution that depends on the explanatory variables.
If the amount of information about the lifetime distribution that is available
is minimal, then there are appropriate non-parametric analyses that can be
used.
In the following sub-sections, two possible models are outlined where
the model allows one to include the effect of explanatory variables on the
lifetime of a device. When the effect of the explanatory variables is applied
to the hazard function as a multiplicative factor, the resulting model is the
proportional hazards model, developed by D. R. Cox (1972). When the effect
of the explanatory variables is applied to the time scale as a multiplicative
factor, the resulting model is the accelerated life model. These models and
the related methods will be considered in more detail in Chapter 9.

3.5.2 Proportional hazards models


One successful method of including explanatory variables in the model is
to allow a function of the explanatory variables to affect the hazard function
of the lifetimes as a multiplicative factor. Thus, a standard or use condition
or baseline hazard function is multiplied by a function of the explanatory
variables, resulting in a new hazard function which is now a function of the
parameters associated with the explanatory variables. Thus:

\[
h(t; x) = \Psi(x)\, h(t; x = 0) \tag{3.21}
\]

where h(t; x = 0) is the standard or baseline hazard function and Ψ(x)
is the function of the vector of explanatory variables x with an associated
vector of parameters β. Since it is required that Ψ(x) be positive and that
Ψ(0) = 1, it is usual to define Ψ(x) as:

\[
\Psi(x) = e^{(x_1 \beta_1 + x_2 \beta_2 + \cdots + x_r \beta_r)},
\]

when there are r explanatory variables.

Also, since $R(t) = e^{-\int_0^t h(u)\,du}$, the relationship of the
reliability functions in the proportional hazards model is:

\[
R(t; x) = [R(t; x = 0)]^{\Psi(x)}
\]

In many applications, the assumption that a change in one or more conditions
under which a device is tested or used has a multiplicative effect on the
hazard rate of the device is a reasonable and effective assumption. In
addition, the techniques associated with the proportional hazards model
accommodate censored data, tied values and failure times of zero. These
situations, which occur regularly in reliability studies, can cause
difficulty in some analyses, but pose no problem in the use of proportional
hazards techniques. General non-parametric techniques are also available so
that estimation of the reliability can be achieved without an assumption as
to the underlying failure time distribution.
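The proportional hazards relationships (3.21) and R(t; x) = [R(t; x = 0)]^Ψ(x) can be sketched in code as follows. The exponential baseline with mean 500 hours, the covariate values and the coefficients are all hypothetical; they are chosen only so the example runs and are not taken from any data.

```python
import math

def psi(x, beta):
    """Psi(x) = exp(x_1 beta_1 + ... + x_r beta_r); equals 1 when x is the zero vector."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def ph_hazard(baseline_hazard, t, x, beta):
    """Equation (3.21): h(t; x) = Psi(x) h(t; x = 0)."""
    return psi(x, beta) * baseline_hazard(t)

def ph_reliability(baseline_reliability, t, x, beta):
    """R(t; x) = [R(t; x = 0)]^Psi(x)."""
    return baseline_reliability(t) ** psi(x, beta)

# Hypothetical exponential baseline with mean 500 hours.
def baseline_hazard(t):
    return 1.0 / 500.0

def baseline_reliability(t):
    return math.exp(-t / 500.0)

x = [1.0, 0.5]       # hypothetical covariate values
beta = [0.3, -0.2]   # hypothetical regression coefficients

print("Psi(x):       ", psi(x, beta))
print("h(100; x):    ", ph_hazard(baseline_hazard, 100.0, x, beta))
print("R(100; x = 0):", baseline_reliability(100.0))
print("R(100; x):    ", ph_reliability(baseline_reliability, 100.0, x, beta))
```

With Ψ(x) greater than one the hazard is scaled up and the reliability at every t falls below the baseline; with Ψ(x) less than one the opposite holds.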

Example 1
A CMOS integrated circuit memory device is such that its time to failure
is assumed to follow the exponential distribution and its failure rate λ is
a function of temperature according to the Arrhenius model, that is,

\[
\lambda = K e^{-A/T} = e^{\ln K - A\left(\frac{1}{T}\right)},
\]

where λ is the failure rate, K is the proportionality constant, A is a
constant (in the usual Arrhenius form, the activation energy divided by
Boltzmann's constant) and T is the temperature in kelvin. Choose the
baseline proportionality constant so that $\ln K - A\left(\frac{1}{T_0}\right) = 0$,
that is, the baseline failure rate is 1. Then for the proportional hazards
model:

\[
\phi(x) = e^{\ln K - A\left(\frac{1}{T}\right)} = e^{a + bx}, \qquad
h\left(t; x = \frac{1}{T}\right) = \phi(x)\, h(t; x_0),
\]

with a = ln K and b = −A, that is, the Arrhenius model is a special case of
the proportional hazards model.
The Arrhenius model was developed as an accelerated life model, which
it is (see the next section and Chapter **) and it will be seen that for the
Weibull distribution, of which the exponential distribution is a member, the
accelerated life model and the proportional hazards model are equivalent. In
the case of this example, note that the accelerated life model is such that:

\[
R_a(t) = R_u(\phi(x)\,t) = e^{-\lambda(T)\,t} = e^{-e^{\ln K - A\left(\frac{1}{T}\right)}\,t} = e^{-e^{(a + bx)}\,t}
\]

3.6 Mean time to failure (MTTF), and mean time between failures (MTBF)

It is important to distinguish between the concepts Mean Time To Failure
(MTTF) and Mean Time Between Failures (MTBF). The MTTF is the ex-
pected time to failure of a component or system. That is, the mean of the
time to failure (TTF) for that component or system. The MTBF is the ex-
pected time to failure after a failure and repair of the component or system.
With the MTBF, it is easily seen that some assumptions are necessary as to
the state of the component after its repair. The terms, Time Between Fail-
ures and Mean Time Between Failures are usually reserved for the study of
repairable systems. Although many practitioners assign the same meaning to
the symbols TTF and TBF as well as treating MTTF and MTBF identically,
the practice is discouraged. We will study TBFs and MTBFs extensively in
Chapter 11. Throughout this book, it will be assumed that when we use the
symbols TTF or MTTF, we are referring to operating time until failure, for
both repairable and non-repairable components and systems. We will use the
symbols TBF and MTBF only when referring to down times for repairable
components and systems.
Suppose the random variable (lifetime) T has density f(t) and reliability
function R(t). The MTTF is:

\[
MTTF = E(T) = \int_0^{\infty} t\, f(t)\,dt = \int_0^{\infty} t \left(\frac{-dR(t)}{dt}\right) dt
= -t R(t)\Big|_0^{\infty} + \int_0^{\infty} R(t)\,dt = \int_0^{\infty} R(t)\,dt \tag{3.22}
\]

if $\lim_{t \to \infty} t\,R(t) = 0$, which is true for distributions whose
mean exists, particularly those of interest in reliability practice. For many
of the popular densities of reliability, it will not be necessary to perform
integration to determine the mean as it is well-known.
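Equation (3.22) says the MTTF can be computed either as the integral of t f(t) or as the area under R(t). The sketch below checks both routes for the Weibull density with θ = 40 and β = 2 used earlier. The comparison value θ Γ(1 + 1/β) is the standard closed form for the Weibull mean and is quoted here as an outside fact, not taken from this section. The helper `integrate` and the upper limit of 10θ are illustrative assumptions.

```python
import math

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n panels (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

theta, beta = 40.0, 2.0  # Weibull parameters from the pick and place example

def pdf(t):
    return (beta / theta) * (t / theta) ** (beta - 1) * math.exp(-((t / theta) ** beta))

def reliability(t):
    return math.exp(-((t / theta) ** beta))

upper = 10.0 * theta  # the tail beyond this point is negligible for these parameters

mttf_from_density = integrate(lambda t: t * pdf(t), 0.0, upper)
mttf_from_reliability = integrate(reliability, 0.0, upper)
mttf_closed_form = theta * math.gamma(1.0 + 1.0 / beta)  # standard Weibull mean

print(mttf_from_density, mttf_from_reliability, mttf_closed_form)
```

All three numbers should agree to several decimal places.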

3.7 Variance of the TTF


The variance of the TTF is given by

\[
VAR(T) = \int_0^{\infty} t^2 f(t)\,dt - MTTF^2 \tag{3.23}
\]

Once again, it will often not be necessary to perform the above integration
since, for the most part, one will be dealing with well-known TTF densities
whose variances are well-established.
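When a closed form is not at hand, (3.23) can be evaluated numerically. The sketch below does so for the Weibull density with θ = 40 and β = 2, and compares the result with the standard Weibull variance θ²[Γ(1 + 2/β) − Γ(1 + 1/β)²], which is quoted as a known result rather than derived here. The helper `integrate` and the truncation point are assumptions.

```python
import math

def integrate(func, a, b, n=40000):
    """Composite trapezoidal rule on [a, b] with n panels (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

theta, beta = 40.0, 2.0  # Weibull parameters from the pick and place example

def pdf(t):
    return (beta / theta) * (t / theta) ** (beta - 1) * math.exp(-((t / theta) ** beta))

upper = 10.0 * theta  # truncation point; the neglected tail is negligible here

mttf = integrate(lambda t: t * pdf(t), 0.0, upper)
second_moment = integrate(lambda t: t * t * pdf(t), 0.0, upper)
variance = second_moment - mttf ** 2  # equation (3.23)

closed_form = theta ** 2 * (math.gamma(1.0 + 2.0 / beta) - math.gamma(1.0 + 1.0 / beta) ** 2)
print(variance, closed_form)
```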

3.8 Mean residual life (MRL)


The MRL is the mean remaining lifetime of a component given that it has
reached age t. It is quite important in the study of systems in which
censoring occurs. Censoring is discussed in section 3.9. The MRL is defined
as:

\[
r(t) = E(T - t \mid T \ge t) = \frac{\int_t^{\infty} (u - t) f(u)\,du}{R(t)}
= \frac{1}{R(t)} \left\{ \Big[-(u - t) R(u)\Big]_t^{\infty} + \int_t^{\infty} R(u)\,du \right\}
= \frac{\int_t^{\infty} R(u)\,du}{R(t)} \tag{3.24}
\]

Note that r(0) = E(T) and if the life has a constant hazard rate λ, then
r(t) = 1/λ. This is further evidence of the memoryless property of the
exponential distribution, discussed in Chapter 4. It also follows that:

\[
R(t) = \frac{r(0)}{r(t)} \exp\left(-\int_0^t \frac{du}{r(u)}\right) \tag{3.25}
\]

Thus the time to failure distribution is completely specified by the MRL.


EXAMPLE: Consider a linear mean residual life given by
$r(t) = \frac{1 + mt}{m(k - 1)}$.
Using (3.25), one finds that the reliability function R(t) is:

$$
\begin{aligned}
R(t) &= \frac{1 / [m(k-1)]}{(1 + mt) / [m(k-1)]}
        \exp\left( -\int_0^t \frac{(k-1)\, m\, du}{1 + mu} \right) \\
     &= \frac{1}{1 + mt} \exp\left( -(k-1) \ln(1 + mu) \Big|_0^t \right)
      = \frac{1}{1 + mt}\, e^{\ln (1 + mt)^{-k+1}}
      = \frac{1}{(1 + mt)^k},
\end{aligned}
$$
which is the reliability function for the Burr distribution (see Chapter 6)
with parameter c = 1.
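
The inversion from MRL to reliability can also be checked numerically. The
sketch below takes the linear MRL in the form $r(t) = (1 + mt) / [m(k - 1)]$,
which is the form consistent with the integrand $(k-1)m/(1 + mu)$ used in the
derivation, and compares the numerically evaluated right-hand side of the
MRL-to-reliability formula with $1/(1 + mt)^k$. The values m = 0.01 and k = 3,
the helper `integrate` and the chosen time points are assumptions made only
for illustration.

```python
import math

m = 0.01  # assumed scale constant, per hour
k = 3.0   # assumed shape constant (k > 1 so that the MRL is finite)


def mean_residual_life(t):
    """Linear MRL r(t) = (1 + m t) / (m (k - 1))."""
    return (1.0 + m * t) / (m * (k - 1.0))


def integrate(func, lower, upper, steps=20_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


def reliability_from_mrl(t):
    """R(t) = r(0)/r(t) * exp(-integral from 0 to t of du / r(u))."""
    exponent = integrate(lambda u: 1.0 / mean_residual_life(u), 0.0, t)
    ratio = mean_residual_life(0.0) / mean_residual_life(t)
    return ratio * math.exp(-exponent)


def reliability_closed_form(t):
    """R(t) = (1 + m t)^(-k), the result derived in the example."""
    return (1.0 + m * t) ** (-k)


checks = [
    (t, reliability_from_mrl(t), reliability_closed_form(t))
    for t in (10.0, 50.0, 200.0)
]
for t, numeric, exact in checks:
    print(f"t = {t:6.1f}  from MRL: {numeric:.6f}  closed form: {exact:.6f}")
```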

3.9 Mean life with censoring (MLC)


In (3.24) and (3.25) above, it is important to note that the MRL is
measured from time t, the lifetime already achieved without failure. An
expression similar to the mean residual life, called the mean life with
censoring (MLC), combines the lifetime already achieved with the expected
remaining life. Thus MLC = MRL + t.
$$
MLC(\text{at time } t) = \frac{\int_t^\infty u\, f(u)\, du}{R(t)} \tag{3.26}
$$
The MLC (at time t) is simply the conditional expectation of the entire
life given survival until time t.

Example: Suppose that a device with an exponential TTF with mean 500 hours
is removed from service after 200 hours. What is the MLC at 200 hours?
Recall from Section 3.2.1 that the density of the exponential is given by
$f(t) = \frac{1}{\theta} \exp(-t/\theta)$, $t > 0$, where $\theta$ is the mean
of the exponential.

$$
MLC(200) = \frac{\int_{200}^\infty t\, \frac{1}{500}\, e^{-t/500}\, dt}{e^{-200/500}}
         = \frac{469.224}{0.67032} = 700
$$
Thus a unit removed from service after 200 hours will have a total expected
life of 700 hours. This is 500 hours after censoring (removal from service).
The MRL(200), from (3.24), is

$$
MRL(200) = \frac{\int_{200}^\infty e^{-t/500}\, dt}{e^{-200/500}}
         = \frac{335.16}{0.67032} = 500
$$

In general, for the exponential, $MRL(t) = \theta$ and $MLC(t) = \theta + t$.
Also, we have verified through this example that MLC(t) = MRL(t) + t. More
will be said about the MRL and MLC in Chapter 4.
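
The two quotients in this example can be reproduced with a few lines of
numerical integration. This is only a sketch: the trapezoidal helper
`integrate` and the truncation of the infinite upper limit are assumptions of
the illustration, while the mean of 500 hours and the age of 200 hours come
from the example.

```python
import math

theta = 500.0  # mean of the exponential TTF, in hours (from the example)
age = 200.0    # time already survived, in hours (from the example)


def density(t):
    """Exponential density f(t) = (1/theta) exp(-t/theta)."""
    return math.exp(-t / theta) / theta


def reliability(t):
    """Exponential reliability R(t) = exp(-t/theta)."""
    return math.exp(-t / theta)


def integrate(func, lower, upper, steps=200_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


upper = age + 50 * theta  # truncation point for the infinite integrals

# MLC: conditional expectation of the entire life given survival to `age`.
mlc = integrate(lambda u: u * density(u), age, upper) / reliability(age)

# MRL: expected remaining life given survival to `age`.
mrl = integrate(reliability, age, upper) / reliability(age)

print(f"MLC({age:.0f}) = {mlc:.2f} hours")
print(f"MRL({age:.0f}) = {mrl:.2f} hours")
print(f"MRL + t  = {mrl + age:.2f} hours")
```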

3.10 Life testing


3.10.1 Introduction
Life testing is concerned with measuring the pertinent characteristics of the
life of the unit under study. Often this is accomplished by making statistical
inferences about probability distributions or their parameters.
In general, units are put on test and observed, and the times of failure are
recorded as they occur. For example, a group of similar components is placed
on test and the failure times are observed. Obviously, the times at which
individual units fail will vary. Sometimes, assignable causes can be found
that contribute to that variation. For example, components tested in a
high-temperature environment may fail sooner than those tested at ambient
temperature. However, the components at the high temperature will still have
different failure times, even when no assignable causes are in operation. It
is therefore always assumed that the failure time of a component has some
random element and can be treated as a random variable with a probability
distribution.
To make statistical inferences about the probability distribution of the
failure time random variable, one uses the failure times that have been ob-
served from a life test, ideally a test that has been statistically designed
for the purpose of the study. If the failure times of a particular component
under a given set of conditions can be adequately described by a probability
distribution, there are considerable practical benefits. The failure times
can then be used to estimate the parameters of the distribution and to per-
haps study the relationship of these parameters to associated explanatory
variables. The estimates can be used to make predictions, determine com-
ponent configurations in systems, determine replacement procedures, specify
guarantee periods and make other decisions about the use of the component.

3.10.2 Failure Times


Before a study of the effects of a group of failure times is begun, it must be
determined precisely what these data values involve. There must be agree-
ment among participating parties about certain characteristics of the failure
data. That is, the start of the time measurement, the scale of the time mea-
surement and the definition of a failure are not always consistent in life test
situations and must be precisely specified in a given study.


The time origin in some studies is obvious. In some other studies, how-
ever, there is enough confusion about the origin of time measurements that
some agreement as to the origin must be reached before the study begins.
For example, in some studies the unit under test may have undergone earlier
testing in development studies and some agreement must be reached as to
whether to include the earlier times on test as running times for the present
study.
The same is true of the time scale. Usually the scale is clock time but
other measures may also be used, such as the number of cycles, the mileage
to the first puncture of a tire, etc.
There may also be differing definitions of what constitutes a failure. It is
important that one definition be specified or that different modes of failure
be recognized and allowed as failures. It is usually informative in the data
analysis if the differing modes of failure are distinguished and recorded in the
test results. For many components, failure is catastrophic and the definition
of a failure is obvious. But for some components, the performance slowly
degrades and the amount of degradation to be judged a failure must be
defined.

3.10.3 Censoring of Data


One of the circumstances that has traditionally caused concern and some
difficulty in statistical studies has been the occurrence of missing
observations. Although techniques have been proposed for accommodating
missing observations in most types of statistical analyses, missing or
incomplete observations do not seem to occur as often in other fields as they
do in modern reliability studies. With highly reliable components, it is unusual if
all the components have failed by the end of the time allotted for the test.
In human survival studies and in some engineering studies, some of the units
on test may be withdrawn from the test for various reasons or may fail due
to a cause that is not under study. Such incomplete data observations in
reliability studies are called censored items. Although the failure time infor-
mation on such an item is incomplete, there is usually still some information
in the time data that is available in the item and so the censoring time should
always be recorded in a study.
Censoring is often distinguished according to type and order. The type
of censoring reflects the rule for censoring and influences which variables in
the study are random. A consideration of which variables are random
affects the distributional assumptions of estimates and will be discussed later.
Type I censoring is the rule that specifies that the testing is terminated
at a specific, fixed time $t_c$. In this case, the time $t_c$ is a fixed value and the
number of units which are censored in a study is a random variable. Type
I censoring is the most common type of censoring used in practice because
it is the easiest to implement since the duration of the study is determined
and fixed beforehand. However, it is not the most convenient in terms of the
distributional considerations.
Type II censoring is the rule that specifies that the testing is terminated
when a pre-set number of units, say r, have failed. In the case of Type II
censoring the time at which the test is stopped is a random variable, that
is, the time at which the rth failure occurred. This type of censoring is
less practical because it does not allow an upper bound on the total time
duration. It does, however, result in a more convenient theory.
The order of censoring indicates whether there is a single or there are
multiple rules for censoring in a test. Multiply censored data are made up of
failure times and a mixture of censored times.
For example, n units are on test:
a) The test is terminated at $t_c = 100$ hours and there are r failures. The
number of failures, R, is a random variable, as is the total test time, TT:

$$
TT = \sum_{i=1}^{r} t_i + (n - r)\, t_c \qquad \text{(Type I, single)}
$$

b) The test is terminated when the rth, say 10th, failure occurs, which is at
time $t_{(10)}$. The stopping time $t_{(10)}$ is a random variable, as is the
total test time:

$$
TT = \sum_{i=1}^{10} t_i + (n - 10)\, t_{(10)} \qquad \text{(Type II, single)}
$$

c) The test is terminated at $t_c = 100$ hours and there are r failures. In
addition, two units have been removed while still functioning at 50 hours.
(Type I, multiple)
More generally, for the ith unit from a sample of n on life test, one could
record the observation $(x_i, d_i)$, where $x_i$ is the failure time if the
indicator variable $d_i = 1$ and $x_i$ is the censored time if $d_i = 0$. In
Type I censoring, all the $x_i$ values are equal to $t_c$ when $d_i = 0$ and,
when $d_i = 1$, the $x_i$ values have the values $t_i$, which are observations
of the random failure variable T. In Type II censoring, the censoring time is
a random variable, the rth order statistic $T_{(r)}$, if the test is stopped
at the time of the rth failure.
The $(x_i, d_i)$ notation can handle multiple censoring mechanisms also, and
will be particularly useful in the maximum likelihood derivations of
estimators.
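
The $(x_i, d_i)$ encoding can be sketched in a few lines. In the illustration
below, the lifetimes are hypothetical values invented for the example, the
function names `censor_type_i` and `censor_type_ii` are assumptions, and the
total time on test is taken to be the sum of all recorded $x_i$ (failure times
plus censoring times). In a real test the lifetimes of censored units are of
course never observed; they appear here only so that the censoring rules can
be applied. Distinct lifetimes (no ties) are assumed.

```python
def censor_type_i(lifetimes, t_c):
    """Type I: stop at fixed time t_c; survivors are censored at t_c."""
    return [(min(t, t_c), 1 if t <= t_c else 0) for t in lifetimes]


def censor_type_ii(lifetimes, r):
    """Type II: stop at the r-th failure; survivors are censored at t_(r)."""
    t_r = sorted(lifetimes)[r - 1]  # r-th order statistic
    return [(min(t, t_r), 1 if t <= t_r else 0) for t in lifetimes]


def summarize(observations):
    """Return (number of failures, total time on test) for (x_i, d_i) pairs."""
    failures = sum(d for _, d in observations)
    total_time = sum(x for x, _ in observations)
    return failures, total_time


# Hypothetical lifetimes in hours for n = 8 units.
lifetimes = [23.0, 41.0, 58.0, 77.0, 96.0, 120.0, 155.0, 210.0]

type_i = censor_type_i(lifetimes, t_c=100.0)
type_ii = censor_type_ii(lifetimes, r=3)

failures_i, total_i = summarize(type_i)
failures_ii, total_ii = summarize(type_ii)

print("Type I :", type_i)
print(f"  failures = {failures_i}, total time on test = {total_i}")
print("Type II:", type_ii)
print(f"  failures = {failures_ii}, total time on test = {total_ii}")
```

Under Type I the stopping time is fixed and the failure count varies from
sample to sample; under Type II the failure count is fixed and the stopping
time varies.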
It is important that the censoring mechanism remain independent of the
failure mechanism. It would be impossible to obtain meaningful data if units
were censored when they appeared to have a high probability of failure at the
time of censoring. Any unit censored at a time $t_c$ should be representative
of all the units under the same test conditions at time $t_c$.

3.11 Reliability data from the field


After release of a product, most of the data provided to the reliability engineer
come from the field (from actual use conditions). In this case, the data are
almost always multiply-censored. This means there is a mixture of failure
times and non-failure running times – i.e., there is no particular order in
which the failed and non-failed units occur. They are completely intermixed.
Many of the topics in this book deal with multiply-censored data.

3.12 Reliable life


Sometimes, instead of computing the reliability at time t, it is of interest
to compute the time for which the reliability is α. This value is called the
reliable life for reliability α. Reliable life is a useful way of allowing
engineers to specify reliability goals or targets as well as specifying
intolerable reliability values. More will be said about this in Chapter 4. It
gives the time at which 100α% of the components in question are still
functioning and is equivalent to determining the 100(1 − α)th percentile of
the time to failure distribution. Estimates of the reliable life can be
obtained from one-sided
tolerance limits. If the reliability function is known, the reliable life can be
obtained by inverting the reliability function at the appropriate value.

EXAMPLE: Suppose the reliability function for the life of a particular
component is given by:

$$
R(t) = e^{-0.01 t^2},
$$

where t is in hours. The reliable life for reliability R = 0.90 is:

$$
t_{0.90} = \sqrt{\frac{-\ln(0.90)}{0.01}} = \sqrt{10.536} = 3.246 \text{ hours.}
$$
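
When the reliability function cannot be inverted in closed form, the reliable
life can be found numerically. The sketch below applies a simple bisection
search to the reliability function of this example and compares the result
with the closed-form value. The function name `reliable_life`, the tolerance
and the bracketing strategy are assumptions of the illustration, and the
search relies on R(t) being decreasing in t.

```python
import math


def reliability(t):
    """Reliability function of the example, R(t) = exp(-0.01 t^2)."""
    return math.exp(-0.01 * t ** 2)


def reliable_life(rel_func, alpha, upper=1.0, tol=1e-10):
    """Find t with rel_func(t) = alpha by bisection (rel_func decreasing)."""
    # Expand the bracket until the reliability drops below alpha.
    while rel_func(upper) > alpha:
        upper *= 2.0
    lower = 0.0
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if rel_func(mid) > alpha:
            lower = mid  # reliability still too high: root lies to the right
        else:
            upper = mid
    return 0.5 * (lower + upper)


alpha = 0.90
t_numeric = reliable_life(reliability, alpha)
t_closed = math.sqrt(-math.log(alpha) / 0.01)

print(f"bisection:   t_0.90 = {t_numeric:.4f} hours")
print(f"closed form: t_0.90 = {t_closed:.4f} hours")
```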

3.13 Other responses


Although the failure time of a unit on life test is the primary response to be
studied, other responses with associated probability distributions can occur.
In some test situations, the time of failure is not relevant, only whether the
unit fails or not is of importance. This situation results in what is called
attribute life test data. It is often more economical and straight-forward
to run an attribute test and analyze attribute data but there are obvious
disadvantages in that less information is obtained than if one observed the
failure times. The analysis of attribute test data will be treated in a later
chapter but the emphasis of the procedures in this text will be the cases
where a variable, such as failure time, will be observed.
Another type of data that can arise in a life test is called quantal-response
data. Quantal-response data is observed when the failure time itself cannot
be observed and a unit is only inspected once at a certain time to see if it
has failed or not.
A more usual situation in life testing, where the failure time itself cannot
be observed, is the case of interval or grouped data responses. In this case,
the units are inspected more than once, but one only knows whether a unit
failed in an interval between inspections. Techniques for this type of data
will be discussed for some of the graphical data analysis procedures. When
interval data occurs, it is often with large data sets and in these situations
the graphical analyses do well.

3.14 Problems in reliability mathematics


Problem 3.1 If $h(t) = 3t^2 - 2t$:
a) What are f(t) and R(t)?
b) What are the restrictions on t?
c) What is R(2)?

Problem 3.2 A system has a time to failure (TTF) density
$f(t) = \frac{6t}{(1+t)^4}$. Find R(t), h(t) and H(t).

Problem 3.3 Find E(T) for a system whose TTF density is $f(t) = 16\, t\, e^{-4t}$.

Problem 3.4 For a component having the Rayleigh TTF density, i.e.,
$f(t) = \frac{t}{a^2} \exp\left( -\frac{t^2}{2a^2} \right)$,
a) find E(T); b) find R(t) and h(t); c) find the reliable life $t_{0.90}$.

Problem 3.5 If $f(t) = \frac{t}{25}\, e^{-0.2t}$,
a) what is the probability that this part fails during the first 10 hours
of life?
b) what is the probability that it fails during the interval (10, 20)?
c) what is the conditional probability that it fails during the interval
(10, 20), given that it has survived until 10 hours?
d) what is the MRL at t = 10?

Problem 3.6 If $h(t) = t^{-1/2}$,
a) what is H(t)?
b) what is R(3)?

Problem 3.7 A component has TTF density given by $f(t) = k t^4 e^{-5t}$, $t > 0$. Find:
a) k
b) R(t)
c) h(t)
d) MTTF

Problem 3.8 Suppose that $R(t) = t^2 \exp(-9t^2)$, $t > 0$.
a) What is the MTTF? (Numerical answer required)
b) What is the expression for f(t)?
c) Is the random variable, T, a time-to-failure random variable?
Check by evaluating F(0) and F(∞).
d) What is H(t)?

Problem 3.9 Find the mean time-to-failure of the time-to-failure density given by

$$
f(t) = \frac{t}{4}\, e^{-t^2/8}, \qquad t > 0.
$$

Numerical answer required. Note that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.

Problem 3.10 Suppose that $f(t) = \frac{t^3 e^{-t/4}}{1536}$, $t > 0$.
a) What is the MTTF?
b) What is the probability of failure before t = 100?

Problem 3.11 If T is discrete, say cycles, with numbers small enough so that a
continuous approximation is not valid, and
$P\{T = t_i = i\} = \frac{\lambda^i}{i!}\, e^{-\lambda}$, $i = 0, 1, \ldots$
a) plot $h(t_i)$, $R(t_i)$ and $H(t_i)$ for i = 0, 1, ..., 10 and λ = 2,
b) show that $h(t_i)$ is monotone increasing for any λ.

Problem 3.12 $f_T(t) = \frac{1}{t_u}$, $0 \le t \le t_u$, that is, f(t) is
uniform in the time interval $[0, t_u]$. Find:
a) R(t)
b) h(t)
c) MTTF
d) MTTF using the right hand side of (3.22)
e) MRL.
f) Is the process represented by f(t) an aging process?

Problem 3.13 Consider a process where the components are replaced at a set time
$t_r$, or replaced at failure, if failure occurs before $t_r$. What is the
mean life of a component of this type, in terms of the reliability function?

Problem 3.14 In problem 13, the cost of replacement of such a component at failure
is $C_f$ and at replacement, $C_r$. What is the average cost per unit time
per component?

Problem 3.15 If the components in problem 14 have constant hazard, show that the
best strategy is to replace on failure only. Is this also true for
components with decreasing hazard rate?

Problem 3.16 If the components in problem 15 have f(t) as in problem 12 with
$t_u = 50$ hours and $t_r = 30$ hours, find the cost per unit time per
component. If $2C_r = C_f$, can you find a better $t_r$?

Problem 3.17 What is the mean of the random variable with the following TTF
density?

$$
f(t) = \frac{a^b}{\Gamma(b)} \left( \frac{1}{t} \right)^{b+1} e^{-a/t}, \qquad t > 0.
$$

Problem 3.18 Suppose that $R(t) = t^2 \exp(-9t^2)$. What is the MTTF?

Problem 3.19 The time to failure density for a particular component is given by
$f(t) = \frac{1}{124416}\, t^3 \exp(-t/12)$, $t > 0$. What is the probability
of failure before 120 hours?

Problem 3.20 For an exponential distribution with a mean of 500 hours, find
a) P[failure (300, 400) | no failure (0, 300)].
b) P[failure (600, 700) | no failure (0, 600)].

Problem 3.21 Write expressions for H(t) for each of the following densities:
a) exponential b) Weibull c) normal d) lognormal e) gamma

Problem 3.22 Estimate the parameters of the lognormal distribution fitted to
the following:

Data
11.0 23.5 7.6 5.2 10.6
25.8 28.6 3.5 6.7 8.1
18.7 5.7 4.3 6.9 3.3
10.4 4.4 3.5 6.5 6.3
23.6 2.0 7.4 9.4 17.8
8.3 8.8 1.9 10.4 13.2

3.15 References
Advisory Group on Reliability of Electronic Equipment (AGREE) (1957),
"Reliability of Military Electronic Equipment," Task Group 9 Report,
Washington, DC, US Government Printing Office, June.

Nelson, Wayne (1972), "Theory and Application of Hazard Plotting for
Censored Failure Data," Technometrics 14, pp. 945-966.

Nelson, Wayne (1982), Applied Life Data Analysis, John Wiley & Sons,
New York.
