
Basic Concepts

Reliability, MTTF, Availability, etc.

CprE 545: Fault Tolerant Systems (G. Manimaran) 1


Definitions
• Reliability of a system is defined to be the probability
that the given system will perform its required function
under specified conditions for a specified period of
time.

• MTBF (Mean Time Between Failures): Average time a
system will run between failures. The MTBF is usually
expressed in hours. This metric is more useful to the
user than the reliability measure.



Approaches to increase the reliability of a system

Increasing reliability of a system, two approaches:

First approach:
1. Worst case design
2. Using high quality components
3. Strict quality control procedures

Second approach:
1. Redundancy
2. Typically employed
3. Less expensive



Reliability expressions
• Exponential Failure Law:

• Reliability of a system is often modeled as:

– R(t) = exp(-λt)
• where λ is the failure rate, expressed as
percentage failures per 1000 hours or as failures
per hour.

– When the product “λt” is small (λt << 1),
• R(t) ≈ 1 - λt
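
The exponential failure law and its small-λt approximation can be compared numerically. The following is a minimal sketch; the function names and the sample failure rate are assumptions chosen for illustration, not values from the slides.

```python
import math

def reliability(failure_rate: float, t: float) -> float:
    """Exponential failure law: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)

def reliability_linear(failure_rate: float, t: float) -> float:
    """First-order approximation R(t) ~ 1 - lambda * t (only good for small lambda * t)."""
    return 1.0 - failure_rate * t

lam = 1e-4  # assumed failure rate in failures per hour (illustrative only)
for hours in (10, 100, 1000, 10000):
    exact = reliability(lam, hours)
    approx = reliability_linear(lam, hours)
    print(f"t = {hours:>5} h  exact = {exact:.4f}  linear = {approx:.4f}")
```

The two columns agree closely while λt is well below 1 and drift apart as λt approaches 1, which is why the linear form should be used only for short periods relative to 1/λ.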



Relation between MTBF and the Failure rate
• MTBF is the average time a system will run between
failures and is given by:

– MTBF = ∫_0^∞ R(t) dt = ∫_0^∞ exp(-λt) dt = 1 / λ

– In other words, the MTBF of a system is the
reciprocal of the failure rate.

– If “λ” is the number of failures per hour, the MTBF
is expressed in hours.
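
As a sanity check on MTBF = 1 / λ, the integral of R(t) can be evaluated numerically. This is a sketch only: the trapezoidal rule, the truncation point, and the function name `mtbf_numeric` are assumptions made for illustration.

```python
import math

def mtbf_numeric(failure_rate: float, steps: int = 200_000) -> float:
    """Trapezoidal estimate of the integral of exp(-lambda * t) from 0 to infinity."""
    # exp(-40) is negligible, so the infinite upper limit is truncated at 40 / lambda.
    upper = 40.0 / failure_rate
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-failure_rate * upper))  # the two end points
    for i in range(1, steps):
        total += math.exp(-failure_rate * i * h)
    return total * h

lam = 8e-4  # failures per hour (illustrative value)
print(mtbf_numeric(lam), 1.0 / lam)
```

The two printed numbers should agree to several significant digits.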



A simple example
• A system has 4000 components, each with a failure rate of
0.02% per 1000 hours. Calculate λ and MTBF.

• λ = (0.02 / 100) * (1 / 1000) * 4000 = 8 * 10^-4
failures/hour

• MTBF = 1 / (8 * 10^-4) = 1250 hours
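
The same arithmetic can be written as a short script. This is a minimal sketch: the helper name `system_failure_rate` is hypothetical, and it assumes the component failure rates simply add, as in the calculation above.

```python
def system_failure_rate(n_components: int, percent_per_1000h: float) -> float:
    """Failures per hour for n identical components whose failure rates add up."""
    per_component = (percent_per_1000h / 100.0) / 1000.0  # percent per 1000 h -> per hour
    return n_components * per_component

lam = system_failure_rate(4000, 0.02)
mtbf = 1.0 / lam
print(f"lambda = {lam:.1e} failures/hour, MTBF = {mtbf:.0f} hours")
```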



Relation between Reliability and MTBF

• For small λt, R(t) ≈ (1 – λt) = (1 – t / MTBF)

• Therefore,
– MTBF ≈ t / (1 – R(t))

[Figure: reliability R(t) (vertical axis, 0 to 1.0) plotted against time t (horizontal axis, marked at 1 MTBF and 2 MTBF); the value 0.36 is marked on the curve.]
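
Because MTBF = t / (1 – R(t)) comes from the linear approximation, it is only accurate when t is short compared with the MTBF. The sketch below illustrates this by generating R(t) from the exponential law and then recovering the MTBF from it; the true MTBF of 1250 hours and the function name are assumptions for illustration.

```python
import math

def mtbf_estimate(t: float, r: float) -> float:
    """MTBF recovered from the linear approximation: MTBF = t / (1 - R(t))."""
    return t / (1.0 - r)

true_mtbf = 1250.0  # hours, assumed for illustration
for fraction in (0.01, 0.1, 0.5, 1.0):
    t = fraction * true_mtbf
    r = math.exp(-t / true_mtbf)  # exact exponential reliability at time t
    print(f"t = {fraction:>4} MTBF  R = {r:.3f}  estimated MTBF = {mtbf_estimate(t, r):.0f} h")
```

The estimate stays close to the true value for small t and overshoots increasingly as t approaches the MTBF.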



An example

• A first generation computer contains 10000
components each with λ = 0.5%/(1000 hours). What is
the period of 99% reliability?

• MTBF = t / (1 – R(t)) = t / (1 – 0.99)
– t = MTBF * 0.01 = 0.01 / λav
– Where λav is the average failure rate
– N = No. of components = 10000
– λ = failure rate of a component
• = 0.5% / (1000 hours) = 0.005/1000 = 5 * 10^-6 per hour

• Therefore, λav = N λ = 10000 * 5 * 10^-6 = 5 * 10^-2 per hour

• Therefore, t = 0.01 / (5 * 10^-2) = 0.2 hours = 12 minutes
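
The example can be checked in code, together with the result obtained from the exact exponential law instead of the linear approximation. This is a sketch; both function names are hypothetical, and the exact form t = -ln(R) / λ follows from solving R(t) = exp(-λt) for t.

```python
import math

def mission_time_linear(target_r: float, lam: float) -> float:
    """Time to reach target reliability using R(t) ~ 1 - lambda * t."""
    return (1.0 - target_r) / lam

def mission_time_exact(target_r: float, lam: float) -> float:
    """Time to reach target reliability using R(t) = exp(-lambda * t)."""
    return -math.log(target_r) / lam

n_components = 10_000
lam_component = (0.5 / 100.0) / 1000.0    # 0.5% per 1000 hours, in failures per hour
lam_system = n_components * lam_component  # failure rates of all components add up

t_linear = mission_time_linear(0.99, lam_system)
t_exact = mission_time_exact(0.99, lam_system)
print(f"linear: {t_linear * 60:.2f} min, exact: {t_exact * 60:.2f} min")
```

At 99% reliability the product λt is about 0.01, so the two results differ only slightly.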



Reliability for different configurations

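1. Series Configuration
[Figure: N components, numbered 1 to N, connected in series; each has reliability R.]
Overall reliability = Ro = R * R * R * ... * R = R^N

2. Parallel Configuration
[Figure: N components, numbered 1 to N, connected in parallel; each has reliability R.]
Ro = 1 – (probability that all of the components fail)
Ro = 1 – (1 - R)^N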
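
The series and parallel formulae translate directly into code. This is a minimal sketch assuming identical, independently failing components; the function names and sample values are illustrative.

```python
def series_reliability(r: float, n: int) -> float:
    """All n components must work: Ro = R^n."""
    return r ** n

def parallel_reliability(r: float, n: int) -> float:
    """At least one of n components must work: Ro = 1 - (1 - R)^n."""
    return 1.0 - (1.0 - r) ** n

r = 0.9  # assumed component reliability
for n in (1, 2, 3, 5):
    print(f"n = {n}  series = {series_reliability(r, n):.4f}  "
          f"parallel = {parallel_reliability(r, n):.4f}")
```

Adding components lowers the reliability of a series chain and raises that of a parallel group.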



Reliability for different configurations

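3. Hybrid Configuration
[Figure: N components (1, 2, ..., N) in series, connected to a group of M components (1, 2, ..., M) in parallel; each component has reliability R.]

Overall reliability = Ro = ?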
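
One way to approach the question is to treat the parallel group as a single block and place it in series with the chain. The sketch below assumes the diagram shows N series components followed by one group of M parallel components, all independent with the same reliability R. That reading of the figure is an assumption, and the function name is hypothetical.

```python
def hybrid_reliability(r: float, n_series: int, m_parallel: int) -> float:
    """N series components followed by a parallel group of M components (assumed layout)."""
    series_part = r ** n_series                       # R^N
    parallel_part = 1.0 - (1.0 - r) ** m_parallel     # 1 - (1 - R)^M
    return series_part * parallel_part                # the two parts are in series

print(hybrid_reliability(0.9, n_series=2, m_parallel=2))
```

Under a different layout the blocks would be combined differently, but the same reduce-to-blocks idea applies.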



Reliability for different configurations

4. Triple Modular Redundancy (TMR)

[Figure: three redundant modules, each with reliability R, feeding a voting element.]

Overall reliability = Ro = [3C2 * R^2 * (1-R)] + [R^3]
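
The TMR expression can be evaluated for a few module reliabilities to see when triplication helps. This sketch assumes a perfectly reliable voter, which the formula implies but the slide does not state explicitly; the function name is illustrative.

```python
from math import comb

def tmr_reliability(r: float) -> float:
    """Exactly two of three modules work, or all three work (voter assumed perfect)."""
    return comb(3, 2) * r**2 * (1.0 - r) + r**3

for r in (0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"R = {r:.2f}  TMR = {tmr_reliability(r):.4f}")
```

With this formula TMR improves on a single module only when R is above 0.5; at R = 0.5 the two are equal, and below that TMR is worse.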



Reliability calculation – a more complicated example
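[Figure: a system built from components A, B, C, D, E and F is reduced step by step by conditioning on one component at a time.]

• Conditioning on component C:
– R = Rc Rs2 + (1-Rc) Rs1
– S1 is the system assuming C is faulty. Rs1 can be calculated using the parallel and series formulae.
– S2 is the system assuming C is fault free. It needs further reduction.

• Conditioning on component E in S2:
– Rs2 = RE Rs3 + (1-RE) Rs4
– S4 is S2 assuming E is faulty.
– S3 is S2 assuming E is fault free.


Maintainability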
• Maintainability of a system is the probability of
isolating and repairing a “fault” in the system within a
given time.
• Maintainability is given by:
– M(t) = 1 – exp(-µt)
– Where µ is the repair rate
– And t is the permissible time constraint for the
maintenance action

– µ = 1/(Mean Time To Repair) = 1/MTTR

– M(t) = 1 – exp(-t/MTTR)
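
The maintainability expression can be evaluated for a few time limits. This is a minimal sketch; the MTTR of 2 hours and the function name are assumptions for illustration.

```python
import math

def maintainability(t: float, mttr: float) -> float:
    """Probability that a repair completes within time t: M(t) = 1 - exp(-t / MTTR)."""
    return 1.0 - math.exp(-t / mttr)

mttr = 2.0  # hours, assumed for illustration
for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"t = {t:>4} h  M(t) = {maintainability(t, mttr):.3f}")
```

Under this model, allowing a time equal to the MTTR gives only about a 63% chance that the repair is finished in time.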



Availability
• Availability of a system is the probability that the system will be
functioning according to expectations at any time during its
scheduled working period.

• Availability = System up-time / (System up-time + System down-time)

• System down-time = No. of failures * MTTR

• System down-time = System up-time * λ * MTTR

• Therefore,

– Availability = System up-time / (System up-time + (System up-time * λ * MTTR))
• = 1 / (1 + (λ * MTTR))

– Since λ = 1 / MTBF, Availability = MTBF / (MTBF + MTTR)
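
The two forms of the availability expression can be checked against each other. This is a sketch; the MTBF of 1250 hours and MTTR of 2 hours are assumed sample values, and the function names are hypothetical.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

def availability_from_rate(lam: float, mttr: float) -> float:
    """Equivalent form in terms of the failure rate: 1 / (1 + lambda * MTTR)."""
    return 1.0 / (1.0 + lam * mttr)

mtbf, mttr = 1250.0, 2.0  # hours, assumed for illustration
print(f"{availability(mtbf, mttr):.6f}")
print(f"{availability_from_rate(1.0 / mtbf, mttr):.6f}")
```

Both lines should print the same value, since the second form is the first with λ substituted for 1 / MTBF.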

