CHAPTER 3
BASIC RELIABILITY MATHEMATICS
This chapter introduces the terms reliability R(t), unreliability F(t), time-to-failure density f(t), failure rate function fr(t), hazard h(t), and cumulative hazard H(t), as well as their interrelationships. Other terms relating to mean life are also introduced.
It contains mathematical definitions and relationships necessary to understand each of the chapters which follow. These definitions and relationships are the building blocks of reliability engineering. It introduces the four fundamental failure distributions (densities) of reliability engineering. It also explains how we can estimate the percent of the population which will fail by a certain time simply by using the sample data order number and the number in the sample. This provides the basis for probability plotting, discussed in Chapter 8.
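The estimate promised here — the fraction of the population failed by the i-th ordered failure, from only the order number i and the sample size n — can be sketched numerically. The median-rank (Benard) approximation below is one common convention for this estimate; Chapter 8 develops the topic properly, and the function name is ours.

```python
# Estimate the fraction of the population failing by the i-th ordered
# failure time, using only the order number i and the sample size n.
# Benard's median-rank approximation (i - 0.3)/(n + 0.4) is one common
# convention; others (e.g., i/(n + 1)) are also used.

def median_rank(i: int, n: int) -> float:
    """Approximate median rank of the i-th ordered failure in a sample of n."""
    return (i - 0.3) / (n + 0.4)

if __name__ == "__main__":
    n = 10
    for i in range(1, n + 1):
        print(f"order {i:2d}: about {100 * median_rank(i, n):5.1f}% failed")
```

Note that the middle order number of an odd-sized sample lands at 50%, as a median-rank estimate should.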
Many of the developments in this chapter have their origin in the mathematics of actuarial science, developed for over 200 years before they were applied to electro-mechanical devices. Also, there are statistics and biostatistics courses in "survival analysis" which focus on many of the same topics as do reliability engineering courses.
Bathtub Curve: A plot of h(t), the hazard function, over time t; so-called because its shape resembles the profile of a bathtub.
Cumulative Hazard Function H(t): The area under the hazard function from 0 to t. H(t) is not a probability.
4) Conditions: The device must perform its function under given conditions. For example, if my company builds and sells small gasoline-powered electrical generators intended for use in ambient temperatures of 0-120 degrees Fahrenheit, and several are brought to Nome, Alaska and fail to operate in the winter, we should not charge failures to these units.
5) Time: The device must perform for a period of time. One should never cite a reliability figure without specifying the time in question. The exception to this rule is for one-shot devices such as munitions, rockets, automobile air-bags, and the like. In this case we think of the reliability as the probability that the device will operate properly (once) when deployed or used. Equivalently, one-shot reliability may be thought of as the proportion of all identical devices which will operate properly (once) when deployed or used. In reliability, unless otherwise specified, time begins at zero. We treat conditional probability of failure and conditional reliability separately and label them as such.
The elements 2, 3, and 4 are important to the reliability of a device, but they differ in different situations; elements 1 and 5 are more basic.
The time element is also basic in reliability. In fact, the same publication
in which the AGREE definition of reliability appeared proposes that the
basic distinction between reliability and quality control is related to this
element. In this way of comparing reliability and quality control, quality
control studies failure at a given time whereas reliability studies failure over
time.
In a sense, this comparison introduces a new definition of reliability, that is, a study of failure over time. The term failure has now been introduced, and to be consistent it is important to define it. Thus, a failure is defined as any functioning of the device or component which is not considered within the prescribed limits of satisfactory functioning.
where

fT(t) ≥ 0  and  ∫₀^∞ fT(x) dx = 1.
Note that

fT(t) = −dRT(t)/dt.    (3.4)
It is also worth noting that the probability that the failure time T occurs in an interval (t1, t2) can be written

P{t1 < T ≤ t2} = F(t2) − F(t1) = R(t1) − R(t2).
Two of the most common time-to-failure models are the exponential and Weibull densities, which are discussed in much more detail in Chapter 4.
Figure 3.2 below illustrates a histogram of 1000 data points with an ex-
ponential density curve overlaid. Figure 3.3 represents a probability plot on
exponential paper of 10 TTF points. The data points are reasonably close to
the fitted line and hence we may conclude that the exponential distribution
is an appropriate choice for f(t).
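The fitting idea behind Figures 3.2 and 3.3 can be sketched as follows: simulate exponential TTF data, estimate the rate from the sample mean, and compare the empirical fraction failed with the fitted F(t). The rate 0.01 and the time point below are arbitrary illustrative choices, not values from the figures.

```python
import math
import random

# Sketch of the check behind Figures 3.2-3.3: draw TTF data from an
# exponential model, estimate the rate from the sample mean, and compare
# the empirical fraction failed by time t with the fitted F(t) = 1 - e^(-λt).
# The rate 0.01 is an arbitrary illustrative value.

def empirical_F(data, t):
    """Fraction of observations at or below t (empirical CDF)."""
    return sum(1 for x in data if x <= t) / len(data)

random.seed(1)
rate = 0.01                                   # hypothetical true failure rate
data = [random.expovariate(rate) for _ in range(1000)]

rate_hat = 1 / (sum(data) / len(data))        # exponential MLE: 1 / sample mean
t = 100.0
model_F = 1 - math.exp(-rate_hat * t)         # fitted unreliability at t

print(f"empirical F({t}) = {empirical_F(data, t):.3f}, fitted = {model_F:.3f}")
```

When the empirical and fitted values track each other over a range of t, the exponential is a plausible choice for f(t), which is the informal judgment a probability plot makes visual.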
Figure 3.4 represents plots of Weibull density functions with various pa-
rameters. The general form of this density function is given by
f(t) = (β t^(β−1) / θ^β) e^(−(t/θ)^β),  t > 0
where θ is a scale parameter called the characteristic value and β is called
the shape parameter. More on the Weibull distribution will be presented
throughout the book, beginning with Chapter 4. Some densities in Figure 3.4 have a positive skewness ("ski-slope" to the right), which indicates that most failures occur in the early part of life. Figure 3.5 represents F(t) vs. t for these same Weibull densities.
R(t) = ∫ₜ^∞ (2x/1600) e^(−(x/40)²) dx, or R(t) = e^(−(t/40)². The probability of surviving the interval (0, 40) is R(40) = exp(−1) = 0.368.
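The closed form in this example can be confirmed by direct numerical integration of the density; a minimal sketch using the midpoint rule, truncating the negligible tail beyond t = 400:

```python
import math

# Numerical check of the worked example: for f(t) = (2t/1600) e^(-(t/40)^2),
# R(40) = ∫_40^∞ f(t) dt should equal e^(-1) ≈ 0.368.
# Simple midpoint-rule integration over a finite range; the tail beyond
# t = 400 is negligible here.

def f(t):
    return (2 * t / 1600) * math.exp(-(t / 40) ** 2)

def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

R_40 = integrate(f, 40.0, 400.0)
print(f"R(40) ≈ {R_40:.3f}")   # about 0.368
```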
p(ti) = P(T = ti),  i = 0, 1, 2, …    (3.6)
Figure 3.4: Weibull density functions f(t) with scale θ = 1 and shape β = 0.5, 1, 2, 3.5, and 8.
In general,

RT(t) = P{T > t} = Σ_{i: ti > t} p(ti).    (3.8)
Notice that:

p(ti) = R(ti−1) − R(ti).    (3.9)
Figure 3.5: Weibull cumulative distribution functions F(t) with scale θ = 1 and shape β = 0.5, 1, 2, 3.5, and 8.
[Figure: Weibull hazard functions h(t) with scale θ = 1 and shape β = 0.5, 1, 2, 3.5, and 8.]
Note that for this problem, in which the interval and the TTF density are the same as in the previous example, the conditional unreliability could have been obtained by subtracting the conditional reliability from one.
If both sides of the first equality in equation (3.15) are multiplied by dt, then h(t)dt is the instantaneous conditional probability of failure, i.e., the probability of failure in the vanishingly small interval (t, t + dt) given no failure in (0, t).
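This interpretation of h(t)dt can be checked numerically. A sketch using a Weibull with scale 1 and shape 2, so that h(t) = 2t; the parameter choice is ours, for illustration:

```python
import math

# Numerical illustration of h(t)dt as the conditional probability of failure
# in (t, t + dt) given survival to t, using a Weibull with scale 1 and
# shape 2 (so h(t) = 2t).

def R(t, beta=2.0, theta=1.0):
    return math.exp(-((t / theta) ** beta))

def h(t, beta=2.0, theta=1.0):
    return (beta / theta) * (t / theta) ** (beta - 1)

t, dt = 0.5, 1e-5
cond_prob = (R(t) - R(t + dt)) / R(t)   # P{fail in (t, t+dt) | survive to t}
print(cond_prob, h(t) * dt)             # nearly equal
```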
The first region of the bathtub curve is often referred to as the infancy hazard rate or the burn-in hazard rate. The second region, represented by a rather constant hazard rate, is the region where failure is usually attributed to a chance occurrence. The third region is where the lifetime has reached the stage at which the device is beginning to wear out and the hazard rate begins to increase. We feel that today's plots of h(t) vs. t are not like the traditional bathtub shapes.
A few arguments for the demise of the "bathtub" curve will now be presented. The initial portion, called the "burn-in" period for electronic devices and the period of early failures for mechanical components, was due to defects present in the raw materials or subassemblies, errors in workmanship, early manufacturing problems, and the like. That is, the early part of the bathtub curve was primarily associated with poor quality. As defects were identified
and removed, quality improved, and the hazard function began to steadily
decrease. With TQM and more attention to supplier quality, nurturing of
suppliers, supplier evaluation and certification, and with attention to eliminating and removing the root causes of defects in manufacturing, quality has vastly improved. There is very little poor quality to improve upon and hence
few reasons to expect a downward slope to the curve.
Many manufacturers of consumer products with substantial electronic circuitry, e.g., appliances, are now foregoing the "burn-in" period; that is, no "burn-in" at all. The reason for "burn-in" (high temperatures, and sometimes vibration) is to allow substandard components (e.g., a bad capacitor) and faulty processes (e.g., poor solderability) to identify themselves by failing under the increased stress(es). Replacements or repairs would be made before assembly of components to the printed circuit board and/or before boards are placed in the cabinet or housing. These manufacturers feel that quality has improved to the extent that "burn-in" is more likely to cause latent defects than to identify substandard components or processes. This is somewhat
analogous to the cessation of polio immunizations in the United States, with
the belief that it is more likely that the polio shot will cause the disease than
prevent it since it is now so rare in the U.S. population.
Since the exponential distribution, as will be shown in a later section,
has a constant hazard rate, the hazard rate function is useful for comparing
distributions to the exponential. In addition, the empirical hazard function
(based on data alone) has been shown to be convenient for comparing groups
of devices. Other strengths of the use of the hazard function relate to its
facility and stability when there is censoring of some of the data and when
there are several modes of failure present in the failure process.
A is usually observed only for a brief initial period after manufacture or
processing. If devices behaved according to A, the more they were used,
the better they would get, and paradoxically as we shall see in Chapter
12, the more they are repaired, the worse they get. Hence, we recommend
modeling TTF with a random variable whose hazard function is, for the
most part, either relatively constant or increasing in nature, although such
an increase may not be strictly monotone. This is the way most things in
life behave; the more we use them, the worse they get (the more likely they
are to fail). Even for the exponential random variable with the constant
hazard function, wearout occurs. Failure eventually happens. It’s just that
with the exponential, the conditional probability of failure in a fixed interval
is independent of where the interval begins (how long the device has been
operating).
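The memoryless property described here is easy to verify directly; the rate λ = 0.2 below is an arbitrary illustrative value:

```python
import math

# The exponential's "no wearout signature": the conditional probability of
# failure in a fixed-length interval does not depend on where the interval
# begins. Sketch with an arbitrary rate λ = 0.2.

lam = 0.2

def R(t):
    return math.exp(-lam * t)

def cond_fail(t, s):
    """P{fail in (t, t+s) | no failure in (0, t)}."""
    return (R(t) - R(t + s)) / R(t)

print(cond_fail(0.0, 5.0), cond_fail(100.0, 5.0))   # identical: 1 - e^(-1)
```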
and from that, integrating both sides and using R(0) = 1, we have

R(t) = exp{−∫₀ᵗ h(x) dx}.    (3.16)
Now, from (3.15), h(t)R(t) = −dR(t)/dt. However, from (3.4), f(t) = −dR(t)/dt, or

h(t) = f(t)/R(t) = f(t)/(1 − F(t)),    (3.17)
and the relationship among h(t), f(t) and R(t) is established. Note that if any
one of the three functions is known, the others are known. Thus knowledge
of the hazard rate is equivalent to knowledge of the distribution.
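As a sketch of "knowledge of the hazard rate is equivalent to knowledge of the distribution": start from h(t) = 2t alone (our illustrative choice), recover R(t) = exp(−∫₀ᵗ h(x) dx) numerically via (3.16), and compare with the closed form e^(−t²):

```python
import math

# Equations (3.16)-(3.17) say any one of h, f, R determines the others.
# Sketch: start from the hazard h(t) = 2t alone, recover R(t) = exp(-H(t)),
# and check it against the closed form e^(-t^2) (a Weibull with shape 2).

def h(t):
    return 2 * t

def R_from_h(t, n=10_000):
    # midpoint-rule cumulative hazard H(t), then R(t) = exp(-H(t))
    dt = t / n
    H = sum(h((i + 0.5) * dt) for i in range(n)) * dt
    return math.exp(-H)

t = 1.3
print(R_from_h(t), math.exp(-t * t))   # agree closely
```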
−2 ln{R(t)} ∼ χ²₂,

or, from (3.18),

−2 ln(e^(−H(t))) ∼ χ²₂.

Thus

2H(t) ∼ χ²₂.    (3.20)
Equation (3.20) is the basis for a test of hypotheses to be introduced
in Chapter 4. The cumulative hazard has been proposed (see, for example,
Nelson (1972) or Nelson (1982)) as an effective characteristic to use as a basis
for the determination of the failure distribution through the use of plotting
techniques.
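Result (3.20) can be illustrated by simulation: since R(T) is uniform on (0, 1) for a continuous TTF distribution, 2H(T) = −2 ln R(T) should behave like a χ² variate with 2 degrees of freedom (mean 2, variance 4). A minimal Monte Carlo sketch:

```python
import math
import random

# Monte Carlo sketch of (3.20): for any continuous TTF distribution,
# R(T) is Uniform(0,1), so 2H(T) = -2 ln R(T) follows a chi-square
# distribution with 2 degrees of freedom (mean 2).

random.seed(7)
n = 100_000
samples = [-2 * math.log(random.random()) for _ in range(n)]

mean = sum(samples) / n
print(f"mean of 2H(T) over {n} draws: {mean:.3f}")   # near 2
```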
where h(t; x = 0) is the standard or baseline hazard function and Ψ(x) is the function of the vector of explanatory variables x with an associated vector of parameters β. Since it is required that Ψ(x) be positive and that Ψ(0) = 1, it is usual to define Ψ(x) as:

Ψ(x) = e^(β′x).
Example 1
A CMOS integrated circuit memory device is such that its time to failure is assumed to follow the exponential distribution and its failure rate λ is a function of temperature according to the Arrhenius model, that is, λ = K e^(−A/T) = e^(ln K − A(1/T)), where λ is the failure rate, K is the proportionality constant, A is Boltzmann's constant, and T is the temperature in °K. Choose the baseline proportionality constant so that ln K − A(1/T₀) = 0, that is, the baseline failure rate is 1. Then for the proportional hazards model:

φ(x) = e^(ln K − A(1/T)) = e^(a+bx),

h(t; x = 1/T) = φ(x) h(t; x₀),
that is, the Arrhenius model is a special case of the proportional hazards
model.
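Under the Arrhenius model of Example 1, the ratio of failure rates at two temperatures follows directly from λ = K e^(−A/T). A sketch, with A = 5000 K an arbitrary illustrative constant, not a value from the text:

```python
import math

# Sketch of the Arrhenius failure-rate ratio from Example 1:
# λ(T) = K e^(-A/T), so the acceleration factor between an operating
# temperature T1 and a stress temperature T2 is
# λ(T2)/λ(T1) = exp(A (1/T1 - 1/T2)).  A = 5000 K is an arbitrary
# illustrative constant.

def acceleration_factor(A, T1, T2):
    """Failure-rate ratio λ(T2)/λ(T1) under the Arrhenius model (T in kelvin)."""
    return math.exp(A * (1 / T1 - 1 / T2))

af = acceleration_factor(5000.0, 328.0, 398.0)   # 55 °C vs 125 °C
print(f"acceleration factor: {af:.1f}")
```

Raising the stress temperature multiplies the exponential failure rate by this factor, which is exactly the proportional-hazards multiplier φ(x) of the example.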
The Arrhenius model was developed as an accelerated life model, which
it is (see the next section and Chapter **) and it will be seen that for the
Weibull distribution, of which the exponential distribution is a member, the
accelerated life model and the proportional hazards model are equivalent. In
the case of this example, note that the accelerated life model is such that:
Ra(t) = Ru(φ(x)t) = e^(−λ(T)t) = e^(−e^(ln K − A(1/T)) t) = e^(−e^(a+bx) t)
if lim_{t→∞} t R(t) = 0, which is true for distributions whose mean exists, particularly those of interest in reliability practice. For many of the popular densities of reliability, it will not be necessary to perform integration to determine the mean as it is well-known.
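When a closed form is not at hand, the mean can still be obtained numerically from the reliability function via MTTF = ∫₀^∞ R(t) dt, the integration-by-parts identity that the limit condition above supports. A sketch for the exponential with rate 0.5 (our illustrative choice), whose mean is 2:

```python
import math

# The limit condition lim t·R(t) = 0 justifies the identity
# MTTF = ∫_0^∞ R(t) dt.  Quick numerical check for the exponential
# with rate λ = 0.5, whose mean is 1/λ = 2.

lam = 0.5

def R(t):
    return math.exp(-lam * t)

def mttf(R, upper=60.0, n=120_000):
    # midpoint rule; the truncated tail beyond `upper` is negligible here
    h = upper / n
    return sum(R((i + 0.5) * h) for i in range(n)) * h

print(mttf(R))   # close to 2.0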
Once again, it will often not be necessary to perform the above integration
since, for the most part, one will be dealing with well-known TTF densities
whose variances are well-established.
R(t) = [1/(1+mt)] exp{−∫₀ᵗ (k−1)m/(1+mu) du}
     = [1/(1+mt)] exp{−(k−1) ln(1+mu)|₀ᵗ}
     = [1/(1+mt)] e^(ln(1+mt)^(−k+1))
     = [1/(1+mt)] · [1/(1+mt)^(k−1)]
     = 1/(1+mt)^k,

which is the reliability function for the Burr distribution (see Chapter 6) with parameter c = 1.
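The derivation above can be double-checked numerically: integrating the hazard term and multiplying by 1/(1+mt) should reproduce (1+mt)^(−k). The values m = 0.5 and k = 3 are arbitrary test choices:

```python
import math

# Numerical check of the Burr (c = 1) derivation: multiplying 1/(1+mt)
# by exp(-∫_0^t (k-1)m/(1+mu) du) should reproduce (1+mt)^(-k).
# m and k values are arbitrary test choices.

m, k = 0.5, 3.0

def integrand(u):
    return (k - 1) * m / (1 + m * u)

def cum(t, n=100_000):
    # midpoint-rule approximation of the integral from 0 to t
    h = t / n
    return sum(integrand((i + 0.5) * h) for i in range(n)) * h

t = 2.0
lhs = (1 / (1 + m * t)) * math.exp(-cum(t))
rhs = (1 + m * t) ** (-k)
print(lhs, rhs)   # agree closely
```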
the study that are random. A consideration of which variables are random affects the distributional assumptions of estimates and will be discussed later.
Type I censoring is the rule that specifies that the testing is terminated
at a specific, fixed time tc . In this case, the time tc is a fixed value and the
number of units which are censored in a study is a random variable. Type
I censoring is the most common type of censoring used in practice because
it is the easiest to implement since the duration of the study is determined
and fixed beforehand. However, it is not the most convenient in terms of the
distributional considerations.
Type II censoring is the rule that specifies that the testing is terminated
when a pre-set number of units, say r, have failed. In the case of Type II
censoring the time at which the test is stopped is a random variable, that
is, the time at which the rth failure occurred. This type of censoring is
less practical because it does not allow an upper bound on the total time
duration. It does, however, result in a more convenient theory.
The order of censoring indicates whether there is a single or there are
multiple rules for censoring in a test. Multiply censored data are made up of
failure times and a mixture of censored times.
For example, n units are on test:
a) The test is terminated at tc = 100 hours and there are r failures. The number of failures, R, is a random variable, as is the total test time, TT:

TT = Σ_{i=1}^{r} ti + (n − r)tc    (Type I, single)

b) The test is terminated when the rth (say, 10th) failure occurs, at time t(10). The stopping time is a random variable, as is the total test time:

TT = Σ_{i=1}^{10} ti + (n − 10)t(10)    (Type II, single)
The observed failure times have the values ti, which are observations of the random failure variable T. In Type II censoring, the censoring time is a random variable, the rth order statistic T(r), if the test is stopped at the time of the rth failure.
The (xi, di) notation can handle multiple censoring mechanisms also, and will be particularly useful in the maximum likelihood derivations of estimators.
It is important that the censoring mechanism remain independent of the
failure mechanism. It would be impossible to obtain meaningful data if units
were censored when they appeared to have a high probability of failure at the
time of censoring. Any unit censored at a time tc should be representative
of all the units under the same test conditions at time tc .
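Type I censoring and the (xi, di) notation can be sketched in a few lines: each unit reports min(ti, tc) together with a failure/censoring indicator, and the total test time follows. Exponential lifetimes with an arbitrary rate are used purely for illustration:

```python
import random

# Sketch of Type I (time-terminated) censoring: n units run until a fixed
# cutoff tc; each unit yields (x_i, d_i), where x_i = min(t_i, tc) and
# d_i = 1 for an observed failure, 0 for a censored unit.

random.seed(3)
n, tc, rate = 20, 100.0, 0.01                 # illustrative values

lifetimes = [random.expovariate(rate) for _ in range(n)]
observations = [(min(t, tc), 1 if t <= tc else 0) for t in lifetimes]

r = sum(d for _, d in observations)           # number of failures (random)
TT = sum(x for x, _ in observations)          # total test time: Σt_i + (n-r)tc
print(f"{r} failures out of {n}; total test time {TT:.1f}")
```

Note that censoring here depends only on the clock, never on how a unit "looks", so the independence requirement above is respected.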
Problem 3.3 Find E(T) for a system whose TTF density is f(t) = 16 t e^(−4t).
Problem 3.4 For a component having the Rayleigh TTF density, i.e., f(t) = (t/a²) e^(−t²/(2a²)),
a) find E(T); b) find R(t) and h(t); c) find the reliable life t0.90.
Problem 3.7 A component has TTF density given by f(t) = k t⁴ e^(−5t), t > 0. Find:
a) k
b) R(t)
c) h(t)
d) MTTF
Problem 3.9 Find the mean time-to-failure of the time-to-failure density given by f(t) = (t/4) e^(−t²/8), t > 0. Numerical answer required. Note that Γ(1/2) = √π.
Problem 3.10 Suppose that f(t) = (t³/1536) e^(−t/4), t > 0.
a) What is the MTTF?
b) What is the probability of failure before t = 100?
Problem 3.11 If T is discrete, say cycles, with numbers small enough so that a continuous approximation is not valid, and P{T = ti = i} = (λ^i / i!) e^(−λ), i = 0, 1, …,
a) plot h(ti), R(ti) and H(ti) for i = 0, 1, ..., 10 and λ = 2;
b) show h(ti) is monotone increasing, for any λ.
Problem 3.12 fT(t) = 1/tu, 0 ≤ t ≤ tu; that is, f(t) is uniform in the time interval [0, tu]. Find:
a) R(t)
b) h(t)
c) MTTF
d) MTTF using the right hand side of (3.28)
e) MRL
f) Is the process represented by f(t) an aging process?
Problem 3.13 Consider a process where the components are replaced at a set time tr ,
or replaced at failure, if failure occurs before tr . What is the mean life
of a component of this type, in terms of the reliability function?
Problem 3.14 In problem 13, the cost of replacement of such a component at failure
is Cf and at replacement, Cr . What is the average cost per unit time
per component?
Problem 3.15 If the components in problem 14 have constant hazard, show that the
best strategy is to replace on failure only. Is this also true for compo-
nents with decreasing hazard rate?
Problem 3.17 What is the mean of the random variable with the following TTF density:

f(t) = (a^b / Γ(b)) (1/t)^(b+1) e^(−a/t), t > 0.
Problem 3.19 The time to failure density for a particular component is given by f(t) = (t³/124416) exp(−t/12), t > 0. What is the probability of failure before 120 hours?
Problem 3.20 For an exponential distribution with a mean of 500 hours, find
a) P[failure in (300, 400) | no failure in (0, 300)];
b) P[failure in (600, 700) | no failure in (0, 600)].
Problem 3.21 Write expressions for H(t) for each of the following densities:
a) exponential b) Weibull c) normal d) lognormal e) gamma
Data
11.0 23.5 7.6 5.2 10.6
25.8 28.6 3.5 6.7 8.1
18.7 5.7 4.3 6.9 3.3
10.4 4.4 3.5 6.5 6.3
23.6 2.0 7.4 9.4 17.8
8.3 8.8 1.9 10.4 13.2
3.15 References
Advisory Group on Reliability of Electronic Equipment (AGREE) (1957), "Reliability of Military Electronic Equipment", Task Group 9 Report, Washington, DC, US Government Printing Office, June.
Nelson, Wayne (1982), Applied Life Data Analysis, John Wiley & Sons,
New York.