
Most people will have some concept of what reliability is from everyday life. For example, people may discuss how reliable their washing machine has been over the length of time they have owned it. Similarly, a car that doesn't need to go to the garage for repairs often during its lifetime would be said to have been reliable. It can be said that reliability is quality over time. Quality is associated with workmanship and manufacturing, and therefore if a product doesn't work, or breaks as soon as you buy it, you would consider the product to have poor quality. However, if over time parts of the product wear out before you expect them to, then this would be termed poor reliability. The difference between quality and reliability is therefore concerned with time, and more specifically with product life time.

Reliability is the probability of performing a specific function, without failure, under given conditions, for a specified period of time.

The five elements are:

i. Probability

ii. Failure

iii. Function

iv. Conditions

v. Time

Probability: Reliability is a number between zero and one.

Failure: What constitutes a failure must be agreed upon in advance of the testing and use of the component or system under study. For example, if the function of a pump is to deliver at least 200 gallons of fluid per minute and it is now delivering 150 gallons per minute, the pump has failed, by this definition.

The main reasons why failures occur include: the product is not fit for purpose, or more specifically the design is inherently incapable; the item may be overstressed in some way; wear-out; variation; wrong specifications; misuse of the item; and operation outside the specified environment, since items are designed for a specific operating environment and failure can occur if they are used outside it.

Function: The device whose reliability is in question must perform a specific function. For example, if I use my gasoline-powered lawn mower to trim my hedges and a blade breaks, this should not be charged as a failure.

Conditions: The device must perform its function under given conditions. For example, if

my company builds and sells small gasoline-powered electrical generators intended for

use in ambient temperatures of 0-120 degrees Fahrenheit and several are brought to

Nome, Alaska and fail to operate in the winter, we should not charge failures to these

units.

Time: The device must perform for a period of time. One should never cite a reliability

figure without specifying the time in question. The exception to this rule is for one-shot

devices such as munitions, rockets, automobile air-bags, and the like. In this case we

think of the reliability as the probability that the device will operate properly (once) when

deployed or used. Equivalently, one-shot reliability may be thought of as the proportion of all identical devices that will operate properly (once) when deployed or used. In reliability, unless otherwise specified, time begins at zero. We treat conditional probability of failure and conditional reliability separately and call them as such.

In designing a component, the load it will carry and its strength can be estimated; however, there will always be an element of uncertainty. The actual strength values of any population of components will vary; there will be some that are relatively strong, others that are relatively weak, but most will be of nearly average strength. Similarly, there will be some loads greater than others, but mostly they will be average. Figure 1 shows the load-strength relationship with no overlap.

However, if, as shown in Figure 2, there is an overlap of the two distributions, then failures will occur. There therefore needs to be a safety margin to ensure that there is no overlap of these distributions.

It is clear that to ensure good reliability the causes of failure need to be identified and

eliminated. Indeed, the objectives of reliability engineering are:

To prevent, or to reduce the likelihood or frequency, of failures;

To identify and correct the causes of failures that do occur;

To determine ways of coping with failures that do occur;

To apply methods of estimating the likely reliability of new designs, and for analysing reliability data.

The so-called bath-tub curve represents the pattern of failure for many products

especially complex products such as cars and washing machines. The vertical axis in the

figure is the failure rate at each point in time. Higher values here indicate higher probabilities

of failure.

The life of a product or a population of units can be divided into three distinct periods.

Figure 3 shows the reliability bathtub curve which models the cradle to grave instantaneous

failure rates vs. time. If we follow the slope from the start to where it begins to flatten out this

can be considered the first period. The first period is characterized by a decreasing failure

rate. It is what occurs during the early life of a product or population of units. The weaker units die off, leaving a population that is more robust. This first period is also called the infant mortality period. The next period is the flat portion of the graph; it is called the normal life. Failures occur more or less randomly during this time. It is difficult to predict which failure mode will manifest, but the rate of failures is predictable. Notice the constant (flat) slope.

The third period begins at the point where the slope begins to increase and extends to the end

of the graph. This is what happens when units become old and begin to fail at an increasing

rate.

Infant Mortality: This stage is also called early failure or debugging stage. The failure

rate is high but decreases gradually with time. During this period, failures occur because

engineering did not test products or systems or devices sufficiently, or manufacturing

made some defective products. Therefore the failure rate at the beginning of infant

mortality stage is high and then it decreases with time after early failures are removed by

burn-in or other stress screening methods. Some of the typical early failures are: poor welds, poor connections, contamination on surfaces or in materials, incorrect positioning of parts, etc.

Useful Life Period: As the product matures, the weaker units die off, the failure rate

becomes nearly constant, and modules have entered what is considered the normal life

period. This period is characterized by a relatively constant failure rate. The length of this

period is referred to as the system life of a product or component. It is during this period

of time that the lowest failure rate occurs. Notice how the amplitude on the bathtub curve

is at its lowest during this time. The useful life period is the most common time frame for

making reliability predictions.

Wear-out Period: This is the final stage where the failure rate increases as the products

begin to wear out because of age or lack of maintenance. When the failure rate becomes

high, repair, replacement of parts etc., should be done.

RELIABILITY MEASURES

Reliability is the probability that a product or part will operate properly for a specified

period of time (design life) under the design operating conditions (such as temperature, volt,

etc.) without failure. In other words, reliability may be used as a measure of the system's success in providing its function properly. Reliability is one of the quality characteristics that

consumers require from the manufacturer of products.

Many mathematical concepts apply to reliability engineering, particularly from the

areas of probability and statistics. Likewise, many mathematical distributions can be used for

various purposes, including the Gaussian (normal) distribution, the log-normal distribution,

the exponential distribution, the Weibull distribution and a host of others.

Failure rate: The purpose of quantitative reliability measurements is to define the rate of failure relative to time and to model that failure rate in a mathematical distribution for the purpose of understanding the quantitative aspects of failure. The most basic building block is the failure rate, which is estimated using the following equation:

λ = r / T

Where:

λ = Failure rate (sometimes referred to as the hazard rate)

T = Total running time/cycles/miles/etc. during an investigation period, for both failed and non-failed items.

r = The total number of failures occurring during the investigation period.

For example, if five electric motors operate for a collective total time of 50 years with five functional failures during the period, then the failure rate, λ, is 5/50 = 0.1 failures per year.

Another very basic concept is the mean time between/to failure (MTBF/MTTF). The

only difference between MTBF and MTTF is that we employ MTBF when referring to items

that are repaired when they fail. For items that are simply thrown away and replaced, we use

the term MTTF. The computations are the same.

The basic calculation to estimate mean time between failure (MTBF) and mean time to failure (MTTF), both measures of central tendency, is simply the reciprocal of the failure rate function. It is calculated using the following equation:

θ = T / r

Where:

θ = Mean time between/to failure

T = Total running time/cycles/miles/etc. during an investigation period, for both failed and non-failed items.

r = The total number of failures occurring during the investigation period.

The MTBF for the industrial electric motor mentioned in the previous example is 10

years, which is the reciprocal of the failure rate for the motors. Incidentally, we would

estimate MTBF for electric motors that are rebuilt upon failure. For smaller motors that are

considered disposable, we would state the measure of central tendency as MTTF. The failure

rate is a basic component of many more complex reliability calculations. Depending upon the

mechanical/electrical design, operating context, environment and/or maintenance

effectiveness, a machine's failure rate as a function of time may decline, remain constant,

increase linearly or increase geometrically.
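As a rough illustration, these point estimates can be reproduced with a few lines of Python (the helper names are illustrative, not from any particular library):

def failure_rate(total_time, failures):
    # Estimated failure (hazard) rate: lambda = r / T
    return failures / total_time

def mtbf(total_time, failures):
    # Estimated mean time between/to failure: theta = T / r = 1 / lambda
    return total_time / failures

T, r = 50.0, 5             # five motors, 50 years of collective running time, 5 failures
print(failure_rate(T, r))  # 0.1 failures per year
print(mtbf(T, r))          # 10.0 years, the reciprocal of the failure rate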

Failure rate calculations are based on complex models which include factors using

specific component data such as temperature, environment, and stress. In the prediction

model, assembled components are structured serially. Thus, calculated failure rates for

assemblies are a sum of the individual failure rates for components within the assembly.

There are three common basic categories of failure rates:

a) Mean Time Between Failures (MTBF): MTBF is a basic measure of reliability for

repairable items. MTBF can be described as the time passed before a component,

assembly, or system fails, under the condition of a constant failure rate. Another way of

stating MTBF is the expected value of time between two consecutive failures, for

repairable systems. It is a commonly used variable in reliability and maintainability

analyses.

MTBF can be calculated as the inverse of the failure rate, λ, for constant failure rate systems. For example, for a component with a failure rate of 2 failures per million hours, the MTBF would be the inverse of that failure rate, i.e. MTBF = 1/(2 × 10⁻⁶) = 500,000 hours:

(λ = 1/MTBF)

(θ = T/R)

θ = MTBF; T = total time; R = number of failures

MTBF = MTTF + MTTR (see Figure 5 below)

b) Mean time to failure (MTTF): MTTF is a basic measure of reliability for non-repairable

systems. It is the mean time expected until the first failure of a piece of equipment. MTTF

is a statistical value and is intended to be the mean over a long period of time and with a

large number of units. For constant failure rate systems, MTTF is the inverse of the failure rate, λ. If the failure rate, λ, is in failures per million hours, MTTF = 1,000,000 / failure rate for components with exponential distributions.

MTTF is the total number of hours of service of all devices divided by the number of devices. It is only when all the parts fail with the same failure mode that MTBF converges to MTTF.

MTTF = 1/λ

θ = T/N

(θ = MTTF; T = total time; N = number of units under test.)

For example, if an item fails, on average, once every 4000 hours, the probability of failure for each hour is 1/4000 = 0.00025. This depends on the failure rate being constant, which is the condition for the exponential distribution.

This equation can also be written the other way round:

MTBF (or MTTF) = 1/λ

For example, if the failure rate is 0.00025 failures per hour, then

MTBF (or MTTF) = 1/0.00025 = 4,000 hours.

c) Mean Time to Repair (MTTR): Mean time to repair (MTTR) is defined as the total

amount of time spent performing all corrective or preventative maintenance repairs

divided by the total number of those repairs. It is the expected span of time from a failure

(or shut down) to the repair or maintenance completion. This term is typically only used

with repairable systems.

If you take a large number of measurements you can draw a histogram to show how the measurements vary. A more useful diagram, for continuous data, is the probability density function. The y-axis shows the percentage of measurements falling in a range (shown on the x-axis), rather than the frequency as in a histogram. If you reduce the ranges (or intervals), the histogram becomes a curve which describes the distribution of the measurements or values. This distribution is the probability density function, or PDF.

i. NORMAL DISTRIBUTION

In reliability engineering, the normal distribution primarily applies to measurements of product susceptibility and external stress. This two-parameter distribution is used to describe systems in which a failure results from some wear-out effect, as in many mechanical systems. Normal distributions are applied to single-variable continuous data (e.g. heights of plants, weights of lambs, lengths of time, etc.). The normal distribution is the most important distribution in statistics, since it arises naturally in numerous applications. The key reason is that large sums of (small) random variables often turn out to be normally distributed.

The normal distribution takes the well-known bell shape. This distribution is symmetrical about the mean and the spread is measured by the variance: the larger the variance, the flatter the distribution. The pdf is given by

f(t) = (1/(σ√(2π))) exp[ -(t - μ)²/(2σ²) ],   -∞ < t < ∞

where μ is the mean value and σ is the standard deviation. The cumulative distribution function (cdf) is

F(t) = ∫ from -∞ to t of f(s) ds   (the dummy variable of integration may be written as either s or x)

The reliability function is

R(t) = 1 - F(t) = ∫ from t to ∞ of f(s) ds

There is no closed-form solution for the above equation. However, tables for the standard normal density function are readily available and can be used to find probabilities for any normal distribution. If

φ(z) = (1/√(2π)) exp(-z²/2)

this is the so-called standard normal pdf, with a mean value of 0 and a standard deviation of 1. The standardized cdf is given by

Φ(z) = ∫ from -∞ to z of φ(u) du

so that, for a normal random variable T with mean μ and standard deviation σ,

F(t) = Φ((t - μ)/σ)   and   R(t) = 1 - Φ((t - μ)/σ),

where z = (t - μ)/σ yields the relationship necessary if standard normal tables are to be used.

The hazard function for a normal distribution is a monotonically increasing function of t. This can be shown by proving that h′(t) ≥ 0 for all t, where h(t) = f(t)/R(t).

The normal distribution is flexible enough to make it a very useful empirical model. It

can be theoretically derived under assumptions matching many failure mechanisms. Some of

these are corrosion, migration, crack growth, and in general, failures resulting from chemical

reactions or processes. That does not mean that the normal is always the correct model for

these mechanisms, but it does perhaps explain why it has been empirically successful in so

many of these cases.

Example: A component has a normal distribution of failure times with μ = 2000 hours and σ = 100 hours. Find the reliability of the component and the hazard function at 1900 hours.

Solution: The reliability function is related to the standard normal deviate z by

R(t) = P(z > (t - μ)/σ), so R(1900) = P(z > (1900 - 2000)/100) = P(z > -1) = Φ(1) = 0.8413

The value of the hazard function is found from the relationship

h(t) = f(t)/R(t) = φ(z)/(σ R(t)) = 0.2420/(100 × 0.8413) ≈ 0.0029 failures per hour

Example: A part has a normal distribution of failure times with μ = 40,000 cycles and σ = 2000 cycles. Find the reliability of the part at 38,000 cycles.

Solution: The reliability at 38,000 cycles is

R(38,000) = P(z > (38,000 - 40,000)/2000) = P(z > -1) = Φ(1) = 0.8413
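These standard-normal calculations are easy to check numerically. A minimal sketch using scipy.stats (assumed here purely for illustration; the text itself uses standard normal tables):

from scipy.stats import norm

mu, sigma, t = 2000.0, 100.0, 1900.0
R = norm.sf(t, loc=mu, scale=sigma)        # survival function = reliability, ~0.8413
h = norm.pdf(t, loc=mu, scale=sigma) / R   # hazard h(t) = f(t)/R(t), ~0.0029 per hour
print(R, h)

# Second example: mu = 40000 cycles, sigma = 2000 cycles, t = 38000 cycles
print(norm.sf(38000, loc=40000, scale=2000))   # ~0.8413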

ii. EXPONENTIAL DISTRIBUTION

The exponential distribution, the most basic and widely used reliability prediction formula, models machines with a constant failure rate, or the flat section of the bathtub curve. Most industrial machines spend most of their lives in the constant failure rate region, so it is widely applicable. Below is the basic equation for estimating the reliability of a machine that follows the exponential distribution, where the failure rate is constant as a function of time:

R(t) = e^(-λt)

Where:

R(t) = Reliability estimate for a period of time, cycles, miles, etc. (t)

e = Base of the natural logarithms (2.718281828)

λ = Failure rate (1/MTBF, or 1/MTTF)

F(t) = Unreliability (the probability that the component or system experiences the first failure, or has failed one or more times, during the time interval zero to time t, given that it was operating or repaired to a like-new condition at time zero; R(t) + F(t) = 1)

i.e. the PDF, CDF and survival (reliability) functions are given as:

f(t) = λe^(-λt),   F(t) = 1 - e^(-λt),   R(t) = e^(-λt)

In the electric motor example, if you assume a constant failure rate, the likelihood of running a motor for six years without a failure, or the projected reliability, is 55 percent. This is calculated as follows:

R(6) = e^(-0.1 × 6) = e^(-0.6) = 0.5488 ≈ 55%

In other words, after six years, about 45% of the population of identical motors operating in an identical application can probabilistically be expected to fail. It is worth reiterating at this point that these calculations project the probability for a population. Any given individual from the population could fail on the first day of operation while another individual could last 30 years. That is the nature of probabilistic reliability projections.

A characteristic of the exponential distribution is that the MTBF occurs at the point at which the calculated reliability is 36.78%, or the point at which 63.22% of the machines have

already failed. In our motor example, after 10 years, 63.22% of the motors from a population

of identical motors serving in identical applications can be expected to fail. In other words,

the survival rate is 36.78% of the population.

The probability density function (pdf), or life distribution, is a mathematical equation

that approximates the failure frequency distribution. It is the pdf, or life frequency

distribution, that yields the familiar bell-shaped curve in the Gaussian, or normal,

distribution. Below is the pdf for the exponential distribution.

f(t) = λe^(-λt)

Where:

f(t) = Life frequency distribution for a given time (t) (failure density)

e = Base of the natural logarithms (2.718281828)

λ = Failure rate

In our electric motor example, the actual likelihood of failure at three years is calculated as follows:

f(t) = λe^(-λt)

f(3) = 0.1 × e^(-0.1 × 3) = 0.07408 ≈ 7.4% per year
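A short Python sketch of these exponential-distribution quantities for the motor example (standard library only; the function names are illustrative):

import math

lam = 0.1   # failure rate, failures per year, from the motor example

def reliability(t):
    # Exponential reliability R(t) = exp(-lambda * t)
    return math.exp(-lam * t)

def failure_density(t):
    # Exponential pdf f(t) = lambda * exp(-lambda * t)
    return lam * math.exp(-lam * t)

print(reliability(6))        # ~0.549 -> about 55% survive six years
print(failure_density(3))    # ~0.0741 -> about 7.4% per year at three years
print(reliability(1 / lam))  # ~0.3679 -> 36.78% survive to the MTBF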

In the example, if we assume a constant failure rate, which follows the exponential

distribution, the life distribution, or pdf for the industrial electric motors, is expressed in

Figure 6. The failure rate is constant, but the pdf mathematically assumes failure without

replacement, so the population from which failures can occur is continuously reducing

asymptotically approaching zero.

The cumulative distribution function (cdf) is simply the cumulative number of failures

one might expect over a period of time. For the exponential distribution, the failure rate is

constant, so the relative rate at which failed components are added to the cdf remains

constant. However, as the population declines as a result of failure, the actual number of

mathematically estimated failures decreases as a function of the declining population. Much

like the pdf asymptotically approaches zero, the cdf asymptotically approaches one.

The declining failure rate portion of the bathtub curve, which is often called the infant

mortality region, and the wear out region will be discussed in the following section

addressing the versatile Weibull distribution.

Hazard Rate: Sometimes it is difficult to specify the distribution function of T directly from

the physical information that is available. A function found useful in clarifying the

relationship between physical modes of failure and the probability distribution of T is the

conditional density function h(t), called the hazard function or failure rate. The hazard

function for the exponential distribution is given as:

h(t) = f(t)/R(t) = λe^(-λt) / e^(-λt) = λ

For a constant failure rate, the hazard rate is also constant and equal to the failure rate:

h(t) = λ

Notice that the hazard function is not a function of time; it is in fact a constant equal to λ.

iii. WEIBULL DISTRIBUTION

The Weibull distribution is a continuous probability distribution. It is named after Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin & Rammler (1933) to describe a particle size distribution.

Weibull analysis is easily the most versatile distribution employed by reliability engineers. While it is called a distribution, it is actually a tool that enables the reliability engineer to characterize the probability density function (failure frequency distribution) of a set of failure data and so classify the failures as early life, constant (exponential) or wear-out (Gaussian or lognormal). This is done by plotting the times/cycles/miles to failure on special plotting paper, with the times to failure on a log-scaled X-axis versus the cumulative percent of the population represented by each failure on a log-log scaled Y-axis.

Once plotted, the linear slope of the resultant curve is an important variable, called the shape parameter, represented by β, which is used to adjust the exponential distribution to fit a wide number of failure distributions. In general, if the β coefficient, or shape parameter, is less than 1, the distribution exhibits early life, or infant mortality, failures. If the shape parameter exceeds about 3.5, the data are time dependent and indicate wear-out failures. Such a data set typically assumes the Gaussian, or normal, distribution. As the β coefficient increases above 3.5, the bell-shaped distribution tightens, exhibiting increasing kurtosis (peakedness at the top of the curve) and a smaller standard deviation.

Many data sets will exhibit two or even three distinct regions. It is common for reliability engineers to plot, for example, one curve representing the shape parameter during run-in (the infant mortality period), another curve representing the constant or gradually increasing failure rate, and a third distinct linear slope identifying the third shape, the wear-out region. In these instances, the pdf of the failure data does in fact assume the familiar bathtub curve shape.

The 3-parameter Weibull pdf is given by:

f(t) = (β/η) × ((t - γ)/η)^(β-1) × exp[ -((t - γ)/η)^β ]

where

f(t) ≥ 0;   t ≥ γ;   β > 0;   η > 0;   -∞ < γ < +∞

η : scale parameter, or characteristic life

β : shape parameter (or slope)

γ : location parameter (or failure-free life)

The 2-parameter Weibull pdf is obtained by setting γ = 0, and is given by:

f(t) = (β/η) × (t/η)^(β-1) × exp[ -(t/η)^β ]

Frequently, the location parameter is not used, and the value for this parameter can be set to zero.

There is also a form of the Weibull distribution known as the 1-parameter Weibull

distribution. This in fact takes the same form as the 2-parameter Weibull pdf, the only

difference being that the value of β is assumed to be known beforehand. This assumption means that only the scale parameter η needs to be estimated, allowing for analysis of small data sets. It is recommended that the analyst have a very good and justifiable estimate for β before using the 1-parameter Weibull distribution for analysis.

Weibull reliability and CDF functions are:

R(t) = exp[ -(t/η)^β ]

F(t) = 1 - exp[ -(t/η)^β ]

When t = η, F(t) = 1 - e^(-1) ≈ 0.632, i.e. by the characteristic life η, 63.2% of the population will have failed.

Figure 8: bath tub curve and the Weibull distribution

When β = 1, the hazard function is constant, and therefore the data can be modeled by an exponential distribution with λ = 1/η.

When β < 1, we get a decreasing hazard function, and when β > 1, we get an increasing hazard function.

Figure 8 shows the Weibull shape parameters superimposed on the bath-tub curve.

The Weibull cumulative hazard function is H(t) = (t/η)^β.

Example: The failure time of a component follows a Weibull distribution with shape parameter β = 1.5 and scale parameter η = 10,000 h. When should the component be replaced if the minimum required reliability for the component is 0.95?

Solution:

R(t) = exp[ -(t/η)^β ]

0.95 = exp[ -(t/10,000)^1.5 ]

Solving for t gives t = 1380.38 h

Example: The failure time of a certain component has a Weibull distribution with β = 4, η = 2000, and γ = 1000. Find the reliability of the component and the hazard rate for an operating time of 1500 hours.

Solution: A direct substitution into the 3-parameter Weibull equations yields

R(1500) = exp[ -((1500 - 1000)/2000)^4 ] = 0.9961

h(1500) = (β/η)((1500 - γ)/η)^(β-1) = (4/2000)(0.25)^3 = 3.125 × 10⁻⁵ failures per hour
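A small Python sketch of the 2- and 3-parameter Weibull reliability and hazard calculations used in these examples (standard library only; function names are illustrative):

import math

def weibull_reliability(t, beta, eta, gamma=0.0):
    # R(t) = exp(-((t - gamma)/eta)**beta) for t >= gamma
    return math.exp(-(((t - gamma) / eta) ** beta))

def weibull_hazard(t, beta, eta, gamma=0.0):
    # h(t) = (beta/eta) * ((t - gamma)/eta)**(beta - 1)
    return (beta / eta) * (((t - gamma) / eta) ** (beta - 1))

# Replacement time for R = 0.95 with beta = 1.5, eta = 10000 h (first example):
print(10000 * (-math.log(0.95)) ** (1 / 1.5))    # ~1380.4 h

# Second example: beta = 4, eta = 2000, gamma = 1000, t = 1500 h
print(weibull_reliability(1500, 4, 2000, 1000))  # ~0.9961
print(weibull_hazard(1500, 4, 2000, 1000))       # ~3.1e-05 failures per hour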

Note that the Rayleigh and exponential distributions are special cases of the Weibull distribution at β = 2, γ = 0 and β = 1, γ = 0, respectively. For example, when β = 1 and γ = 0, the reliability of the Weibull distribution function reduces to

R(t) = exp(-t/η)

and the hazard function reduces to 1/η, a constant. Thus, the exponential is a special case of the Weibull distribution. Similarly, when γ = 0 and β = 2, the Weibull probability density function becomes the Rayleigh density function, that is

f(t) = (2t/η²) exp[ -(t/η)² ]

iv. GAMMA DISTRIBUTION

The gamma distribution can be used as a failure probability function for components whose distribution is skewed. The failure density function for a gamma distribution, with shape parameter α and scale parameter θ, is

f(t) = t^(α-1) e^(-t/θ) / (Γ(α) θ^α),   t ≥ 0

and the reliability function is

R(t) = ∫ from t to ∞ of f(s) ds

The gamma density function has shapes that are very similar to the Weibull distribution. At α = 1, the gamma distribution becomes the exponential distribution with the constant failure rate 1/θ. The gamma distribution can also be used to model the time to the nth failure of a system if the underlying failure distribution is exponential. Thus, if Xi is exponentially distributed with parameter λ = 1/θ, then T = X1 + X2 + ... + Xn is gamma distributed with parameters θ and n.

The gamma model is a flexible lifetime model that may offer a good fit to some sets

of failure data. It is not, however, widely used as a lifetime distribution model for common

failure mechanisms. A common use of the gamma lifetime model occurs in Bayesian

reliability applications.

Example: The time to failure of a component has a gamma distribution with α = 3 and θ = 5. Determine the reliability of the component and the hazard rate at 10 time-units.

Solution: Using R(t) = e^(-t/θ) [1 + (t/θ) + (t/θ)²/2] for α = 3, we compute

R(10) = e^(-2) (1 + 2 + 2) = 0.6767

h(10) = f(10)/R(10) = [10² e^(-2) / (2 × 5³)] / 0.6767 ≈ 0.080 failures per time-unit

The other form of the gamma probability density function can be written as follows:

f(t) = (β^α / Γ(α)) t^(α-1) e^(-βt),   t ≥ 0

This pdf is characterized by two parameters: shape parameter α and scale parameter β. When 0 < α < 1, the failure rate monotonically decreases; when α > 1, the failure rate monotonically increases; when α = 1 the failure rate is constant. The mean, variance and reliability of the density function in the above equation are, respectively,

Mean = α/β,   Variance = α/β²,   R(t) = ∫ from t to ∞ of (β^α / Γ(α)) s^(α-1) e^(-βs) ds

Example: A mechanical system's time to failure is gamma distributed with α = 3 and 1/β = 120. Find the system reliability at 280 hours.

Solution: The system reliability at 280 hours is given by

R(280) = e^(-280/120) [1 + (280/120) + (280/120)²/2] ≈ 0.587
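The gamma reliability values above can be checked with scipy.stats.gamma (assumed here for illustration; note that reading the first example as shape 3 and scale 5 is itself an assumption):

from scipy.stats import gamma

# Second example: shape alpha = 3, scale 1/beta = 120, reliability at 280 hours
print(gamma.sf(280, a=3, scale=120))   # ~0.587

# First example, read as shape 3 and scale 5, reliability at t = 10
print(gamma.sf(10, a=3, scale=5))      # ~0.677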

v. LOG NORMAL DISTRIBUTION

The log normal lifetime distribution is a very flexible model that can empirically fit many

types of failure data. This distribution, with its applications in maintainability engineering, is

able to model failure probabilities of repairable systems and to model the uncertainty in

failure rate information. The log normal density function is given by

f(t) = (1/(σ t √(2π))) exp[ -(ln t - μ)² / (2σ²) ],   t > 0

where μ and σ are parameters such that -∞ < μ < ∞ and σ > 0. Note that μ and σ are not the mean and standard deviation of the distribution, as they are for the normal distribution.

The relationship to the normal distribution (just take natural logarithms of all the data and time points and you have normal data) makes it easy to work with, and many good software analysis programs are available to treat normal data.

Mathematically, if a random variable X is defined as X = ln T, then X is normally distributed with a mean of μ and a variance of σ². That is,

E(X) = E(ln T) = μ   and   V(X) = V(ln T) = σ²

Since T = e^X, the mean of the log normal distribution can be found by using the normal distribution; it works out to E(T) = exp(μ + σ²/2).

The log normal lifetime model, like the normal, is flexible enough to make it a very

useful empirical model. The figure above shows the reliability of the log normal vs. time. It can be theoretically derived under assumptions matching many failure mechanisms. Some of

these are: corrosion and crack growth, and in general, failures resulting from chemical

reactions or processes.

Example: The failure time of a certain component is log normal distributed with μ = 5 and σ = 1. Find the reliability of the component and the hazard rate for a life of 50 time units.

Solution: Substituting the numerical values of μ, σ, and t into the equations, we compute

R(50) = 1 - Φ((ln 50 - 5)/1) = 1 - Φ(-1.09) = 0.862

h(50) = f(50)/R(50) = [φ(-1.09)/(1 × 50)] / 0.862 ≈ 0.0051 failures per time unit

Thus, values for the log normal distribution are easily computed by using the standard normal tables.

Example: The failure time of a part is log normal distributed with μ = 6 and σ = 2. Find the part reliability for a life of 200 time units.

Solution: The reliability of the part at 200 time units is

R(200) = 1 - Φ((ln 200 - 6)/2) = 1 - Φ(-0.35) = 0.637
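Again, a quick numerical check with scipy (assumed here for illustration); scipy's lognorm uses s = σ and scale = exp(μ):

import math
from scipy.stats import lognorm

print(lognorm.sf(50, s=1, scale=math.exp(5)))    # mu = 5, sigma = 1: ~0.862
print(lognorm.sf(200, s=2, scale=math.exp(6)))   # mu = 6, sigma = 2: ~0.637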

The Availability, A(t), of a component or system is defined as the probability that the

component or system is operating at time t, given that it was operating at time zero.

The Unavailability, Q(t), of a component or system is defined as the probability that

the component or system is not operating at time t, given that it was operating at time zero.

A(t) + Q(t) = 1

Maintainability

Maintainability is defined as the probability that a device will be restored to its

operational effectiveness within the given period when maintenance action is performed in

accordance with the prescribed procedure. Maintenance action is the prescribed operation to

correct an equipment failure.

Repairable and Non-repairable Items

It is important to distinguish between repairable and non-repairable items when predicting or

measuring reliability.

Non-repairable Items: Non-repairable items are those that are discarded when they fail, for example a light bulb, transistor, rocket motor, etc. Their reliability is the survival probability over the item's expected life, or over a specific period of time during its life, when only one failure can occur. During the component's or system's life, the instantaneous probability of the first and only failure is called the hazard rate or failure rate. Life values such as MTTF are used to define non-repairable items.

Repairable Items: For repairable items, reliability is the probability that failure will

not occur in the time period of interest; or when more than one failure can occur,

reliability can be expressed as the failure rate, λ, or the rate of occurrence of failures described above, but only under the condition of a constant failure rate.

Some systems are considered both repairable and non-repairable, such as a missile. It is

repairable while under test on the ground; but becomes a non-repairable system when

fired.

FAILURE PATTERNS

Failure Patterns (Non-repairable Items)

There are three patterns of failures for non-repairable items, which can change with time. The

failure rate (hazard rate) may be decreasing, increasing or constant.

i. Decreasing Failure Rate (Non-repairable Items): A decreasing failure rate (DFR) can be caused by an item which becomes less likely to fail as the survival time increases. This is demonstrated by electronic equipment during its early life, or the burn-in period, and by the first half of the traditional bath-tub curve for electronic components or equipment, where the failure rate is decreasing during the early life period.

ii. Constant Failure Rate (Non-repairable Items): A constant failure rate (CFR) can be caused by the application of loads at a constant average rate in excess of the design specifications or strength. These are typically externally induced failures.

iii. Increasing Failure Rate (Non-repairable Items): An increasing failure rate (IFR) can be caused by material fatigue or by strength deterioration due to cyclic loading. The failure mode does not occur for a finite time, and then exhibits an increasing probability of occurrence.

Failure Patterns (Repairable Items)

There are three patterns of failures for repairable items, which can change with time. The

failure rate (hazard rate) may be decreasing, increasing or constant.

i. Decreasing Failure Rate (Repairable Items): Progressive repair and/or burn-in can cause a decreasing failure rate (DFR) pattern.

ii. Constant Failure Rate (Repairable Items): A constant failure rate (CFR) is indicative

of externally induced failures as in the constant failure rate of non-repairable items.

This is typical of complex systems subject to repair and overhaul.

iii. Increasing Failure Rate (Repairable Items): This increasing failure rate (IFR) pattern

is demonstrated by repairable equipment when wear out modes begin to predominate

or electronic equipment that has aged beyond its useful life (right hand side of the

bath tub curve) and the failure rate is increasing with time.

Operating Characteristic (OC) curves are powerful tools in the field of quality control,

as they display the discriminatory power of a sampling plan. In quality control, the OC curve

plots the probability of accepting the lot on the Y-axis versus the lot fraction or percent

defectives (p) on the X-axis. Based on the number of defectives in a sample, the quality

engineer can decide to accept the lot, to reject the lot or even, for multiple or sequential

sampling schemes, to take another sample and then repeat the decision process.

In reliability engineering, the OC curve shows the probability of acceptance (i.e. the

probability of passing the test) versus a chosen test parameter. This parameter can be the true

or designed in mean life (MTTF) or the reliability (R), as shown in the figure below. Program

Managers, Evaluators, Testers, and other key acquisition personnel need to know the

probability of acceptance for a test plan to design appropriate test plans which will ensure

demonstration of reliability requirement at the desired confidence level. The most commonly

used tool for this purpose is the Operating Characteristic (OC) Curve. Figure below provides

a sample OC Curve. This OC curve is generated for a fixed configuration test and displays

the relationship between the probability of acceptance and MTBF based on test duration and

acceptable number of failures. The OC curve is a tool to determine the probability of

acceptance of a test plan corresponding to a given reliability requirement. The OC curve is

used to quantify the consumer risk and producer risk associated with a given MTBF value for

the associated test plan.

Reliability Risks: There are two types of decision risks which are of significant importance

during the demonstration of reliability requirements. These risks are called Consumer Risk

and Producer Risk.

i. Consumer risk: The probability that a level of system reliability at or below the

requirement will be found to be acceptable due to statistical chance. This is depicted

on the operational characteristic curve. We should endeavor to quantify and manage

consumer risk because reliability below the requirement results in reduced mission

reliability and increased support costs.

ii. Producer risk: The probability that a level of system reliability that meets or exceeds

the reliability goal will be deemed unacceptable due to statistical chance. This risk is

also depicted in the figure above. If the system is incorrectly deemed unsuitable,

major cost and schedule impacts to the acquisition program may result.

An appropriate balance between the consumer risk and the producer risk is important to

determine test duration/number of trials. If the consumer risk and producer risk are not

balanced appropriately, the test duration/number of trials may be too short/small or too

long/large. If the test duration/number of trials is too short/small, the reliability goal (target)

for the test will be higher (test reliability requirement is inversely proportional to the test

duration/number of trials). For short/small test duration/number of trials, one or both risks

may be too high. If the test duration/number of trials is too long/large, it may be very costly

to perform the test. The cost factor may lead to an unacceptable program burden.

The probability of acceptance, P(A), can be represented by the cumulative binomial distribution:

P(A) = Σ from f=0 to c of C(n, f) (1 - R)^f R^(n-f),   where C(n, f) = n! / (f!(n - f)!)

This gives the probability that the number of failures observed during the test, f, is less than or equal to the acceptance number, c, which is the number of allowable failures in n trials. Each trial has a probability of succeeding of R, where R is the reliability of each unit under test. The reliability OC curve is developed by evaluating the above equation for various values of R.

The Poisson distribution can be used for large values of n:

Pa = P(X ≤ c) = Σ from x=0 to c of (m^x e^(-m)) / x!

i.e., if c = 2, then Pa = P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2) for the corresponding expected number of failures, m.

Here m = λT is the expected number of failures during the test (T: number of hours of test). This is analogous to acceptance sampling.

The OC curve represents the probability of acceptance for a given mean life. An OC curve may be constructed showing the probability of acceptance as a function of average life, θ. In this case, the sampling plan may be defined with:

a number of hours of test, T, and

an acceptance number, c.

A major assumption is that a failed item will be replaced by a good item.

Consider a sampling plan with a test duration, T, and an acceptance number, c.

For each average life, θ:

Compute the failure rate per hour, λ = 1/θ;

Compute the expected number of failures during the test, m = λT;

Compute Pa = P(c or fewer failures) = 1 - P(c+1 or more failures when the mean number of failures is m). This can be obtained using the Poisson equation or the table from a statistical data book.

Example: In one of the plans, 10 items were to be tested for 5000 hours with replacement and

with an acceptance number of 1. Plot an OC curve showing probability of acceptance as a

function of average life.

Solution: Given:

Duration of the test, T = 5000

c=1

Step 1: Create a column for mean life, θ:

Mean Life (θ): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000

(You can also assume R(t) values for the first column instead, e.g. 0.05, 0.10, 0.15, etc., up to 8 to 10 rows.)

Step 2: Calculate λ = 1/θ:

Mean Life (θ)    Failure Rate, λ
1000             0.001
2000             0.0005
3000             0.0003
4000             0.00025
5000             0.0002
6000             0.00017
7000             0.00014
8000             0.00012

Step 3: Calculate the expected number of failures, m = λT:

Mean Life (θ)    Failure Rate, λ    Expected average no. of failures, m
1000             0.001              5
2000             0.0005             2.5
3000             0.0003             1.5
4000             0.00025            1.25
5000             0.0002             1
6000             0.00017            0.85
7000             0.00014            0.7
8000             0.00012            0.6

Step 4: Calculate Pa using the Poisson distribution for c = 1:

Pa = P(X ≤ 1) = P(X = 0) + P(X = 1) = (m⁰ e^(-m))/0! + (m¹ e^(-m))/1!

For example, when θ = 1000, m = 5:

Pa = (5⁰ e^(-5))/0! + (5¹ e^(-5))/1! = e^(-5)(1 + 5) = 0.041 = 4.1%

Mean Life (θ)    Failure Rate, λ    Expected average no. of failures, m    Probability of acceptance, Pa
1000             0.001              5                                      0.041
2000             0.0005             2.5                                    0.287
3000             0.0003             1.5                                    0.558
4000             0.00025            1.25                                   0.644
5000             0.0002             1                                      0.736
6000             0.00017            0.85                                   0.790
7000             0.00014            0.7                                    0.845
8000             0.00012            0.6                                    0.878

Step 5: Plot the graph, Y-axis: Pa and X-axis: θ. The probability of acceptance increases (approaching 100%) as the mean life increases.
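A short Python sketch of this OC-curve calculation (Poisson approximation, acceptance number c = 1, test time T = 5000 h; names are illustrative):

import math

def prob_acceptance(theta, T=5000.0, c=1):
    # Pa = P(c or fewer failures) when the expected number of failures is m = T/theta
    m = T / theta
    return sum(m ** x * math.exp(-m) / math.factorial(x) for x in range(c + 1))

for theta in (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000):
    print(theta, round(prob_acceptance(theta), 3))
# 0.04, 0.287, 0.504, 0.645, 0.736, 0.797, 0.839, 0.87 - close to the table above;
# the small differences come from the rounded failure rates used in the worked table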

MODELLING SYSTEM RELIABILITY

Usually multiple components make up a system, and we often want to know the reliability of a system that uses more than one component. How the components are connected together determines what type of system reliability model is used.

There are different types of system reliability models, and these are typically used to analyse items such as an aircraft completing its flight successfully. Once the reliability of components or machines has been established relative to the operating context and required mission time, plant engineers must assess the reliability of a system or process.

Series systems: The simplest reliability model is the serial model, where all the components must be working for the system to be successful. To calculate the system reliability for a serial process, you only need to multiply the estimated reliability of Subsystem A at time (t) by the estimated reliability of Subsystem B at time (t), and so on. The basic equation for calculating the system reliability of a simple series system is:

RS(t) = RA(t) × RB(t) × ... × RZ(t)

The failure rate of the system is calculated by adding the individual failure rates together, i.e.

λS = λA + λB + ... + λZ

Example: So, for a simple system with three subsystems, or sub-functions, each having an estimated reliability of 0.90 (90%) at time (t), the system reliability is calculated as 0.90 × 0.90 × 0.90 = 0.729, or about 73%.

Parallel systems: The simplest form of redundancy is the parallel reliability model, where two independent items are operating but the system can operate successfully as long as one of them is working. To calculate the reliability of an active parallel system, where both machines are running, use the following simple equation:

Rs(t) = 1 - [1 - R1(t)] × [1 - R2(t)] × ... × [1 - Rn(t)]

Where:

Rs(t) = System reliability for given time (t)

Rn(t) = Subsystem or sub-function reliability for given time (t)

m-out-of-n systems: In some configurations, m out of the n items may be required to be working for the system to function. The reliability of an m-out-of-n system, with n identical independent items each of reliability R, is given by:

R(m out of n) = Σ from i=m to n of [ n! / (i!(n-i)!) ] R^i (1-R)^(n-i)
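A minimal Python sketch of these three system models (series, active parallel, and m-out-of-n); the function names are illustrative:

from math import comb

def series(reliabilities):
    # Series system: product of the subsystem reliabilities
    r = 1.0
    for ri in reliabilities:
        r *= ri
    return r

def parallel(reliabilities):
    # Active parallel system: 1 minus the product of the unreliabilities
    q = 1.0
    for ri in reliabilities:
        q *= (1.0 - ri)
    return 1.0 - q

def m_out_of_n(m, n, r):
    # m-out-of-n system of identical, independent items with reliability r
    return sum(comb(n, i) * r ** i * (1 - r) ** (n - i) for i in range(m, n + 1))

print(series([0.90, 0.90, 0.90]))   # 0.729, the three-subsystem example above
print(parallel([0.99, 0.99]))       # 0.9999, as in Problem 4 below
print(m_out_of_n(2, 3, 0.90))       # 0.972 for a 2-out-of-3 system of 0.90 items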

Problem 1: A certain type of electronic component has a uniform failure rate of 0.00001 per

hour. What is the reliability for a specified period of service of 10000 hours?

Solution:

Given:

λ = 0.00001 per hour

t = 10000 hours

R(t) = e^(-λt) = e^(-0.00001 × 10000) = e^(-0.1) = 0.90483 = 90.483%

Problem 2: Given a mean time to failure (MTTF, θ) of 5000 hours and a uniform failure rate, what is the reliability associated with a specified service period of 200 hours?

Solution:

Given:

θ = 5000 hours

t = 200 hours

λ = 1/θ = 1/5000 = 0.0002 per hour

R(t) = e^(-λt) = e^(-0.0002 × 200) = e^(-0.04) = 0.96079 = 96.079%

Problem 3: The following reliability requirements have been set on the sub-systems of a

communication system:

Sub-System        Reliability (for a 4-hour period)
Receiver          0.970
Control system    0.989
Power supply      0.995
Antenna           0.996

What is the expected reliability of the overall system?

Solution: R(system) = R1 × R2 × R3 × R4 = 0.970 × 0.989 × 0.995 × 0.996 = 0.950 (95%)

The chance that the overall system will perform its function without failure for

a 4 hour period is 95%.

Problem 4: A unit has a reliability of 0.99 for a specified mission time. If 2 identical units are used in parallel redundancy, what overall reliability will be obtained?

Solution:

Rs(t) = 1 - {1 - R1(t)}^n = 1 - {1 - 0.99}² = 1 - 0.0001 = 0.9999, or 99.99%

Problem 5: An industrial machine compresses natural gas into an interstate gas pipeline. The

compressor is on line 24 hours a day. (If the machine is down, a gas field has to be shutdown

until the natural gas can be compressed, so down time is very expensive.) The vendor knows

that the compressor has a constant failure rate of 0.000001 failures/hr. What is the operational

reliability after 2500 hours of continuous service?

Solution:

The compressor has a constant failure rate and therefore the reliability follows

the exponential distribution: Rt = e-t

Given:

Failure rate = 0.000001 f/hr

Operational time t = 2500 hours

Reliability = e^(-0.000001 × 2500) = e^(-0.0025) = 0.9975, or 99.75%

Problem 6: Suppose that a component we wish to model has a constant failure rate with a mean time between failures of 25 hours. Find: (a) the reliability function; (b) the reliability of the item at 30 hours.

Solution:

Since the failure rate is constant, we will use the exponential distribution. Also, the MTBF = 25 hours. We know that, for an exponential distribution, MTBF = 1/λ.

Therefore λ = 1/25 = 0.04 per hour.

(a) The reliability function is given by: R(t) = e^(-λt) = e^(-0.04t)

(b) The reliability of the item at 30 hours = e^(-0.04 × 30) = 0.3012

Problem 7: A certain electronic component has an exponential failure time with a mean of 50

hours.

(a) What is the failure rate of this component?

(b) What is the reliability of this component at 100 hours?

(c) What is the minimum number of these components that should be placed in parallel if we

desire a reliability of 0.90 at 100 hours? (The idea of placing extra components in parallel is

to provide a backup if the first component fails.)

Solution:

(a) λ = 1/50 = 0.02 per hour

(b) R(100) = e^(-0.02 × 100) = e^(-2) = 0.1353 (which is not very good)

(c) The parallel system will only fail if all components fail. The probability of each component failing by 100 hours is 1 - 0.1353 = 0.8647.

If n parallel components are needed,

1 - 0.8647^n = 0.9

0.8647^n = 0.1

By trial and error, n = 16, so we need 16 components in parallel.
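A brief Python check of Problems 6 and 7 (illustrative only):

import math

# Problem 6: MTBF = 25 h, constant failure rate
lam = 1 / 25
print(math.exp(-lam * 30))              # ~0.3012 at 30 hours

# Problem 7: mean life 50 h, target reliability 0.90 at 100 hours
r_single = math.exp(-(1 / 50) * 100)    # ~0.1353 for one component
n = 1
while 1 - (1 - r_single) ** n < 0.90:   # parallel reliability: 1 - (1 - R)^n
    n += 1
print(n)                                # 16 components in parallel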

RELIABILITY TOOLS AND TECHNIQUES

Some of the tools that are useful during the design stage can be thought of as tools for

fault avoidance. They fall into two general methods: bottom-up and top-down.

I. Top-down method

Undesirable single event or system success at the highest level of interest (the top event)

should be defined.

Contributory causes of that event at all levels are then identified and analysed.

Start at the highest level of interest and work down to successively lower levels.

Event-oriented method

Useful during the early conceptual phase of system design

Used for evaluating multiple failures, including sequentially related failures and common-cause events.

Some examples of top-down methods include: Fault tree analysis (FTA) & Reliability

block diagram (RBD)

a. Fault tree analysis

Fault tree analysis is a systematic way of identifying all possible faults that could lead

to system fail-danger failure. The FTA provides a concise description of the various

combinations of possible occurrences within the system that can result in predetermined

critical output events. The FTA helps identify and evaluate critical components, fault paths,

and possible errors. It is both a reliability and safety engineering task, and it is a critical data

item that is submitted to the customer for their approval and their use in their higher-level

FTA and safety analysis. The key elements of a FTA include:

Gates represent the outcome

Events represent input to the gates

Cut sets are groups of events that would cause a system to fail

FTA is used to identify a system failure, its modes and causes, and to quantify their contribution to system unreliability in the course of product design.

FTA can be done qualitatively by drawing the tree and identifying all the basic events. However, to identify the probability of the top event, probabilities or reliability figures must be input for the basic events. Using logic, the probabilities are worked up to give a probability that the top event will occur. Often the data from an FMEA are used in conjunction with an FTA.
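As a rough illustration of how the basic-event probabilities are worked up through the gates, consider a small hypothetical tree (not an example from the text): the top event occurs if both pumps fail (AND gate) or the controller fails (OR gate).

def and_gate(probs):
    # AND gate: the output event occurs only if all inputs occur (independent events)
    p = 1.0
    for pi in probs:
        p *= pi
    return p

def or_gate(probs):
    # OR gate: the output event occurs if any input occurs (independent events)
    p_none = 1.0
    for pi in probs:
        p_none *= (1.0 - pi)
    return 1.0 - p_none

# Hypothetical basic-event probabilities
p_top = or_gate([and_gate([0.01, 0.02]), 0.001])
print(p_top)   # ~0.0012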

The following table shows the flowchart symbols that are used in fault tree analysis in order to aid the correct reading of the fault tree.

Rectangle:        signifies a fault or undesired event caused by one or more preceding causes acting through logic gates.

Circle:           signifies a primary failure or basic fault that requires no further development.

Diamond:          denotes a secondary failure or undesired event that is not developed further.

AND gate:         denotes that a failure will occur if all inputs fail (parallel redundancy).

OR gate:          denotes that a failure will occur if any input fails (series reliability).

Transfer symbol:  denotes a transfer event (the tree continues elsewhere).

(Figure: FTA example.)

b. Reliability block diagram (RBD)

The RBD is discussed and shown in the section on modelling system reliability above. It is, however, among the first tasks to be completed. It models system success and gives results for the total system. It deals with different system configurations, including parallel, redundant, standby and alternative functional paths. It doesn't provide any fault analysis and uses probabilistic measures to calculate system reliability.

II. Bottom-up method

Identify fault modes at the component level.

For each fault mode the corresponding effect on performance is deduced for the

next higher system level.

The resulting fault effect becomes the fault mode at the next higher system level,

and so on.

Successive iterations result in the eventual identification of the fault effects at all

functional levels up to the system level.

Initially may be qualitative.

Some examples of bottom-up methods include: Event tree analysis (ETA); FMEA and

Hazard and operability study (HAZOP).

a. Event tree analysis

Considers a number of possible consequences of an initiating event or a system

failure.

May be combined with a fault tree.

Used when it is essential to investigate all possible paths of consequent events and their sequence.

Analysis can become very involved and complicated when analysing larger systems.

Example:

Failure mode and effect analysis (FMEA) is a bottom-up, qualitative dependability

analysis method, which is particularly suited to the study of material, component and

equipment failures and their effects on the next higher functional system level. Iterations of

this step (identification of single failure modes and the evaluation of their effects on the next

higher system level) result in the eventual identification of all the system single failure

modes. FMEA lends itself to the analysis of systems of different technologies (electrical,

mechanical, hydraulic, software, etc.) with simple functional structures. FMECA extends the

FMEA to include criticality analysis by quantifying failure effects in terms of probability of

occurrence and the severity of any effects. The severity of effects is assessed by reference to

a specified scale.

FMEAs or FMECAs are generally done where a level of risk is anticipated in a

program early in product or process development. Factors that may be considered are new

technology, new processes, new designs, or changes in the environment, loads, or regulations.

FMEAs or FMECAs can be done on components or systems that make up products,

processes, or manufacturing equipment. They can also be done on software systems. An FMEA or FMECA typically involves:

Identification of potential failure modes, effects, and causes;

Identification of risk related to failure modes and effects;

Identification of recommended actions to eliminate or reduce the risk;

Follow-up actions to close out the recommended actions.

Benefits include:

Gives an initial indication of those failure modes that are likely to be

critical, especially single failures that may propagate.

Identifies outcomes arising from specific causes or initiating events that

are believed to be important.

Provides a framework for identification of measures to mitigate risk.

Useful in the preliminary analysis of new or untried systems or processes.

Limitations include:

The output data may be large even for relatively simple systems.

May become complicated and unmanageable unless there is a fairly direct (or "single-chain") relationship between cause and effect; may not easily deal with time sequences, restoration processes, environmental conditions, maintenance aspects, etc.

Prioritizing mode criticality is complicated by competing factors involved.

Hazard and Operability Analysis (HAZOP) is a structured and systematic technique

for system examination and risk management. In particular, HAZOP is often used as a

technique for identifying potential hazards in a system and identifying operability

problems likely to lead to nonconforming products.

HAZOP is based on a theory that assumes risk events are caused by deviations from

design or operating intentions. Identification of such deviations is facilitated by using sets

of guide words as a systematic list of deviation perspectives. This approach is a unique

feature of the HAZOP methodology that helps stimulate the imagination of team members when exploring potential deviations.

As a risk assessment tool, HAZOP is often described as:

A brainstorming technique

A qualitative risk assessment tool

An inductive risk assessment tool, meaning that it is a bottom-up risk

identification approach, where success relies on the ability of subject matter

experts (SMEs) to predict deviations based on past experiences and general

subject matter expertise

HAZOP is a powerful communication tool. Once the HAZOP analysis is complete,

the study outputs and conclusions should be documented commensurate with the nature of

risks assessed in the study and per individual company documentation policies. As part of

closure for the HAZOP analysis, it should be verified that a process exists to ensure that

assigned actions are closed in a satisfactory manner.

Life testing is concerned with measuring the pertinent characteristics of the life of the unit

under study. Often this is accomplished by making statistical inferences about probability

distributions or their parameters.

In general, units are put on test, observed and the times of failure recorded as they occur.

For example, a group of similar components are placed on test and the failure times observed.

Obviously, the times at which individual units fail will vary. Sometimes, assignable causes

can be found that contribute to that variation. Suppose some components have been subjected

to testing at a high temperature environment and it is possible that such components will fail

sooner than those tested at an ambient temperature environment. However, the components at

the high temperature will still have different failure times; and, if there are no assignable

causes in operation, these components will still have different failure times. That is, it is always assumed that the failure times of the components have some random element, and they are treated as a random variable with a probability distribution.

To make statistical inferences about the probability distribution of the failure time random

variable, one uses the failure times that have been observed from a life test, ideally a test that

has been statistically designed for the purpose of the study. If the failure times of a particular

component under a given set of conditions, can be adequately described by a probability

distribution, there are considerable practical benefits. The failure times can then be used to

estimate the parameters of the distribution and to perhaps study the relationship of these

parameters to associated explanatory variables. The estimates can be used to make

predictions, determine component configurations in systems, determine replacement

procedures, specify guarantee periods and make other decisions about the use of the

component.

1) Accelerated life testing

The concept of accelerated testing is to compress time and accelerate the failure

mechanisms in a reasonable test period so that product reliability can be assessed. The only

way to accelerate time is to stress potential failure modes. These include electrical and

mechanical failures. Failure occurs when the stress exceeds the product's strength. In a product's population, the strength is generally distributed and usually degrades over time.

Applying stress simply simulates aging. Increasing stress increases the unreliability and

improves the chances for failure occurring in a shorter period of time. This also means that a

smaller sample population of devices can be tested with an increased probability of finding

failure. Stress testing amplifies unreliability so failure can be detected sooner. Accelerated

life tests are also used extensively to help make predictions. Predictions can be limited when

testing small sample sizes. Predictions can be erroneously based on the assumption that life-test results are representative of the entire population. Therefore, it can be difficult to design

an efficient experiment that yields enough failures so that the measures of uncertainty in the

predictions are not too large. Stresses can also be unrealistic. Fortunately, it is generally rare

for an increased stress to cause anomalous failures, especially if common sense guidelines are

observed.

Anomalous testing failures can occur when testing pushes the limits of the material out of

the region of the intended design capability. The natural question to ask is: What should the

guidelines be for designing proper accelerated tests and evaluating failures? The answer is:

Judgment is required by management and engineering staff to make the correct decisions in

this regard. To aid such decisions, the following guidelines are provided:

Always refer to the literature to see what has been done in the area of accelerated

testing.

Avoid accelerated stresses that cause nonlinearities, unless such stresses are

plausible in product-use conditions. Anomalous failures occur when accelerated stress

causes nonlinearities in the product. For example, material changing phases from

solid to liquid, as in a chemical nonlinear phase transition (e.g., solder melting,

inter-metallic changes, etc.); an electric spark in a material is an electrical

nonlinearity; material breakage compared to material flexing is a mechanical

nonlinearity.

Tests can be designed in two ways: by avoiding high stresses or by allowing them,

which may or may not cause nonlinear stresses. In the latter test design, a concurrent

engineering design team reviews all failures and decides if a failure is anomalous or

not. Then a decision is made whether or not to fix the problem. Conservative

decisions may result in fixing some anomalous failures. This is not a concern when

time and money permit fixing all problems. The problem occurs when normal failures

are labeled incorrectly as anomalous and no corrective action is taken.

Accelerated life testing is normally done early in the design process as a method for

testing for fit for purpose. It can be done at the component level or the sub-assembly level

but is rarely done at a system level as there are usually too many parts and factors that can

cause failures and these can be difficult to control and monitor.

Step-Stress Testing is an alternative test; it usually involves a small sample of devices

exposed to a series of successively higher and higher steps of stress. At the end of each

stress level, measurements are made to assess the effects on the device. The measurements could be simply to determine whether a catastrophic failure has occurred or to measure the resulting parameter shift due to the step stress. Constant time periods are commonly used for

each step-stress period. This provides for simpler data analysis. There are a number of

reasons for performing a step-stress test, including:

Aging information can be obtained in a relatively short period of time. Common step-stress tests take about 1 to 2 weeks, depending on the objective.

Step-stress tests establish a baseline for future tests. For example, if a process

changes, quick comparisons can be made between the old process and the new

process. Accuracy can be enhanced when parametric change can be used as a measure

for comparison. Otherwise, catastrophic information is used.

Failure mechanisms and design weaknesses can be identified along with material

limitations. Failure-mode information can provide opportunities for reliability growth.

Fixes can then be put back on test and compared to previous test results to assess fix

effectiveness.

Data analysis can provide accurate information on the stress distribution, from which the median-failure stress and the stress standard deviation can be obtained; a minimal sketch of such an analysis follows this list.
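The sketch below illustrates the stress-distribution analysis mentioned in the last point, assuming the stress level at which each device failed has been recorded and that a normal strength distribution is an adequate first model; both the data and the distribution choice are illustrative assumptions.

# Minimal sketch: estimating the median-failure stress and stress standard
# deviation from step-stress results. The failure stresses are illustrative.
import numpy as np
from scipy import stats

# Stress level (e.g., degrees C) at which each of 10 devices failed
failure_stress = np.array([140, 150, 150, 160, 160, 160, 170, 170, 180, 190])

# With a normal strength model, the fitted mean is the median-failure stress
# and the fitted sigma is the stress standard deviation mentioned above.
median_stress, stress_sigma = stats.norm.fit(failure_stress)

print(f"median-failure stress ~ {median_stress:.0f}, sigma ~ {stress_sigma:.0f}")

# Probability that a device fails at or below a 150-degree step under this model
print(f"P(fail by 150) ~ {stats.norm.cdf(150, median_stress, stress_sigma):.2f}")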

The goal of Reliability enhancement testing (RET) is to identify any potential failure

modes that are inherent in a design early in the design process. Identifying the root cause of

the failure mode and then incorporating a fix to the design can achieve reliability growth.

This is accomplished by designing out the possibility of potential failure modes occurring

with the customer and reducing the inherent risk associated with new product development.

RET at the unit or subassembly level utilizes step-stress testing as its primary test method. It

should be noted that Highly Accelerated Life Testing (HALT) is not meant to be a simulation

of the real world but a rapid way to stimulate failure modes. These methods commonly

employ sequential testing, such as step-stressing the units with temperature and then

vibration. These two stresses can be combined so that temperature and vibration are applied

simultaneously. This speeds up testing, and if an interactive vibration/temperature failure

mode is present, this combined testing may be the only way to find it. Other stresses used

may be power step-stress, power cycling, package preconditioning with infrared (IR) reflow,

electrostatic-discharge (ESD) simulation, and so forth. The choice depends on the intended

type of unit under test and the unit's potential failure modes.

HALT is primarily for assemblies and subassemblies. The HALT test method utilizes a

HALT chamber. Today, these multi-stress environmental systems are produced by a large

number of suppliers. The chamber is unique and can perform both temperature and vibration

step-stress testing.
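For concreteness, a combined temperature/vibration step-stress profile of the kind described above can be written down as simple data before it is programmed into the chamber. All numeric levels in the sketch below are assumed, illustrative values, not recommended limits.

# Minimal sketch: a combined temperature/vibration HALT-style step-stress
# profile expressed as plain data. All numeric levels are illustrative.
from dataclasses import dataclass

@dataclass
class StressStep:
    temperature_c: float    # chamber temperature for this step
    vibration_grms: float   # random vibration level applied simultaneously
    dwell_minutes: int      # constant dwell time, kept equal for simpler analysis

profile = [
    StressStep(20, 5, 10),
    StressStep(40, 10, 10),
    StressStep(60, 15, 10),
    StressStep(80, 20, 10),
    StressStep(100, 25, 10),  # later steps may exceed the equipment-specified limits
]

for step in profile:
    print(f"{step.temperature_c:>5} C, {step.vibration_grms:>4} Grms, {step.dwell_minutes} min")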

3) Demonstration testing

Demonstration of reliability may be required as part of a development and production

contract, or prior to release to production, to ensure that the requirements have been met.

Two basic forms of reliability measurement are used:

a. A sample of units may be subjected to a formal reliability test, with

conditions specified in detail.

b. Reliability may be monitored during development and use.

The first method has been shown to be problematic and subject to severe limitations and practical problems. The limitations include:

It assumes a constant failure rate, i.e. an exponential reliability function;

It implies that MTBF is an inherent parameter of a system;

It is extremely costly;

It is an acceptance test;

The objective is to have no or very few failures.

It has been shown that a well-managed reliability growth programme, as discussed earlier, would avoid the need for demonstration testing, because such programmes concentrate on how to improve products. It has also been argued that the benefit to the product, in terms of improved reliability, from PRST methods is sometimes questionable.

If all processes were under complete control, product screening or monitoring would be

unnecessary. If products were perfect, there would be no field returns or infant mortality

problems, and customers would be satisfied with product reliability and quality. However, in

the real world, unacceptable process and material variations exist. Product flaws need to be

anticipated before customers receive final products and use them. This is the primary reason

that a good screening and monitoring program is needed to provide high quality products.

Screening and monitoring programs are a major factor in achieving customer satisfaction.

Parts are screened in the early production stage until the process is under control and any

material problems have been resolved. Once this occurs, a monitoring program can ensure

that the process has not changed and that any deviations have been stabilized. Here, the term

screening implies 100% product testing while monitoring indicates a sample test. Screens

are based upon a product's potential failure modes. Screening may be simple, such as on-off

cycling of the unit, or it may be more involved, requiring one or more powered

environmental stress screens. Usually, screens that power up the unit, compared with non-powered screens, provide the best opportunity to precipitate failure-mode problems. Screens

are constantly reviewed and may be modified based on screening yield results. For example,

if field returns are unacceptable while screen yields are high (near 100 percent), the screen is not precipitating the failure modes seen in the field and should be changed to find them. If yields are high with acceptable part per million (PPM)

field returns, then a monitoring program will replace the screen. In general, monitoring is

preferred for low-cost/high-volume jobs. A major caution for selecting the correct screening

program is to ensure that the process of screening out early life failures does not remove too

much of a product's useful life. Manufacturers have noted that, in the attempt to drive out

early life failure, the useful life of some products can become reduced. If this occurs,

customers will find wear-out failure mechanisms during early field use.
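The review logic described above can be sketched as a simple decision rule. The yield and field-return thresholds below are assumed, illustrative values; a real programme would set them from product history and customer requirements.

# Minimal sketch of the screen-review logic described in the text.
# Thresholds are assumed, illustrative values.
def review_screen(screen_yield: float, field_return_ppm: float,
                  yield_threshold: float = 0.995, ppm_limit: float = 100.0) -> str:
    if screen_yield >= yield_threshold and field_return_ppm <= ppm_limit:
        # Screen finds almost nothing and the field agrees: sample monitoring is enough.
        return "replace 100% screening with a sample monitoring programme"
    if screen_yield >= yield_threshold and field_return_ppm > ppm_limit:
        # Screen finds almost nothing but customers still see failures:
        # the screen is not exercising the right failure modes.
        return "change the screen so it precipitates the failure modes seen in the field"
    return "keep 100% screening until the process and materials are under control"

print(review_screen(screen_yield=0.999, field_return_ppm=40))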

5) Reliability Growth/Enhancement Planning

Traditionally, the need for Reliability Growth planning has been for large subsystems or

systems. This is simply because of the greater risk in new product development at that level

compared to the component level. Also, in programs where one wishes to push mature

products or complex systems to new reliability milestones, inadequate strategies will be

costly. A program manager must know if Reliability Growth can be achieved under required

time and cost constraints. A plan of attack is required for each major subsystem so that

system-level reliability goals can be met. However Reliability Growth planning is

recommended for all new platforms, whether they are complex subsystems or simple

components. In a commercial environment with numerous product types, the emphasis must

be on platforms rather than products. Often there may be little time to validate, let alone

assess, reliability. Yet, without some method of assessment, platforms could be jeopardized.

Accelerated testing is, without question, the featured Reliability Growth tool for industry. It is

important to devise reliability planning during development that incorporates the most time

and cost effective testing techniques available.

Reliability growth can occur at the design and development stage of a project but most of

the growth should occur in the first accelerated testing stage, early in design. Generally, there

are two basic kinds of Reliability Growth test methods used: constant stress testing and step-stress testing. Constant stress testing applies an elevated stress maintained at a particular

level over time, such as isothermal aging, in which parts are subjected to the same

temperature for the entire test (similar to a burn-in). Step-stress testing can apply to such

stresses as temperature, shock, vibration, and Highly Accelerated Life Test (HALT). These

tests stimulate potential failure modes, and Reliability Growth occurs when failure modes are

fixed. No matter what the method, Reliability Growth planning is essential to avoid wasting

time and money when accelerated testing is attempted without an organized program plan.
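One common way to check that fixes are actually producing growth during such testing is to plot cumulative MTBF against cumulative test time. The sketch below fits a Duane-style growth line to such data; the cumulative hours and failure counts, and the choice of the Duane model itself, are illustrative assumptions rather than anything prescribed here.

# Minimal sketch: tracking reliability growth with a Duane-style log-log fit.
# Cumulative hours and failure counts are assumed, illustrative values.
import numpy as np

cum_hours = np.array([100, 300, 700, 1500, 3000, 6000], dtype=float)
cum_failures = np.array([5, 10, 16, 22, 27, 31], dtype=float)

# Cumulative MTBF at each point
cum_mtbf = cum_hours / cum_failures

# Duane model: log(cumulative MTBF) is linear in log(cumulative hours);
# the slope is the growth rate alpha (values around 0.3-0.5 are often
# quoted for a healthy, actively corrected programme).
alpha, intercept = np.polyfit(np.log(cum_hours), np.log(cum_mtbf), 1)
print(f"estimated growth rate alpha ~ {alpha:.2f}")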

The table below summarizes how the different accelerated tests or methods fit into the product life cycle; for each, the stage of the product life cycle is given in parentheses.

Reliability Growth or Reliability Enhancement (design and development). Reliability Growth is the positive improvement in a reliability parameter over a period of time due to changes in product design or the manufacturing process. A Reliability Growth program is commonly established to help systematically plan for reliability achievement over a program's duration so that resources and reliability risks can be managed.

HALT (Highly Accelerated Life Test) (design and development). HALT is a type of step-stress test that often combines two stresses, such as temperature and vibration. This highly accelerated stress test is used for finding failure modes as fast as possible and assessing product risks. Frequently it exceeds the equipment-specified limits.

Step-Stress Test (design and development of units or components). Exposing small samples of product to a series of successively higher steps of a stress (such as temperature), with a measurement of failures after each step. This test is used to find failures in a short period of time and to perform risk studies.

Failure-Free Test or demonstration test (post design). This is also termed zero failure testing. It is a statistically significant reliability test used to demonstrate that a particular reliability objective can be met at a certain level of confidence. For example, the reliability objective may be 1000 FITs (1 million hours MTTF) at the 90 percent confidence level. The most efficient statistical sample size is calculated when no failures are expected during the test period, hence the name.

ESS (Environmental Stress Screening) (production). This is an environmental screening test or tests used in production to weed out latent and infant mortality failures.

HASS (Highly Accelerated Stress Screen) (production). This is a screening test or tests used in production to weed out infant mortality failures. It is an aggressive test since it implements stresses that are higher than common ESS screens. When aggressive levels are used, the screening should be established in HALT testing.
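As a numerical illustration of the failure-free (zero failure) test in the table, and assuming exponentially distributed failure times, the total device-hours that must be accumulated with no failures to demonstrate a target MTTF at confidence C is MTTF x ln(1/(1 - C)). The sketch below reproduces the 1000 FIT / 90 percent confidence example; the split of the device-hours into a unit count and test length is an illustrative choice.

# Minimal sketch: sizing a failure-free (zero failure) demonstration test,
# assuming exponentially distributed failure times.
import math

def zero_failure_device_hours(mttf_hours: float, confidence: float) -> float:
    """Device-hours with zero failures needed to demonstrate `mttf_hours`
    at the given confidence: T = MTTF * ln(1 / (1 - confidence))."""
    return mttf_hours * math.log(1.0 / (1.0 - confidence))

# 1000 FITs corresponds to 1e6 hours MTTF, demonstrated at 90 percent confidence
total = zero_failure_device_hours(mttf_hours=1e6, confidence=0.90)
print(f"required device-hours: {total:.2e}")          # about 2.3 million

# One way to realize this: about 2303 units on test for 1000 hours, no failures
print(f"units needed for a 1000-hour test: {math.ceil(total / 1000)}")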
