Академический Документы
Профессиональный Документы
Культура Документы
Reliability
The failure of computerized controls in some applications can have catastrophic results. This lecture describes how
faults can cause errors which in turn can lead to failures.
After this lecture you should be able to differentiate between availability and reliability. You should also be a able
to read a description of a system failure and identify and classify the fault, identify the resulting error and describe
strategies for reducing the likelihood of the faults.
Introduction tween faults, errors and failures.
lec16.tex 1
can range from the design stage (the earliest time) to detect hardware (or software) that is not perform-
through to operation (the latest time). Failures can ing as expected, and detecting accesses to protected
also be introduced during the prototyping, manufac- or non-existent memory.
turing, and test times. Masking Redundancy includes redundant circuitry
that hides failures. This can include N-module re-
Neumann (1995) dundancy (NMR) with voting to select the output
that is most likely correct, error-correcting coding
It’s also possible to classify most faults into a num- and logic design that uses additional gates to correct
ber of common categories. For example, among the for partial failures.
causes listed in (Neumann 1995) are: Dynamic Redundancy includes replacing failed
circuitry with backup circuits. This can include de-
System Analysis - incorrect assumptions or
tecting failures and selecting one of N modules, and
poor models used in the original system design
designing for reduced performance in the case of
Requirements Definition - an important require- some failures.
ment left out or incorrectly specified
Software
Design Flaws - errors in the design
The above strategies can also be used to minimize
Implementation Errors - includes faulty con- software failures.
struction, or buggy software Fault avoidance includes employing good soft-
System use - operator errors ware engineering principles such as modularity, data
hiding, sanity checks, regression testing, and bug
Hardware malfunction - hardware failures in the tracking.
field Fault detection and recovery can involve using
multiple independently-written programs to perform
Environmental problems - heat, flooding, etc
the same task and methods to recover from errors
Evolution and Maintenance - poorly-tested up- once they have been detected.
grades or faulty maintenance
Readings: Computer System Failures
Failure-Prevention Strategies
The handout contains examples taken from Neu-
Having identified possible causes of failures, we mann (1995) and includes examples of failures in
need to develop strategies to minimize their impacts. the areas of space exploration, trains, control systems
Different approaches are used to reduce hardware and robotics.
and software failures. Table 1 identifies some of the Exercise: Choose several examples that you find interesting.
techniques. Try to identify the fault and the resulting error(s). Localize the fault
(level and time) and see if it fits into one of Neumann’s categories.
Hardware Identify any of the above fault-reduction strategies might have
helped avoid the failure.
Figure 1 summarizes the various strategies used to
reduce the likelihood of hardware failures.
Fault Avoidance includes: conservative design References
(operation in a benign (low-temperature, clean
power, etc) environment, use of reliable components, Computer Related Risks, Peter G Neumann, ACM
reduced complexity, etc). Press, 1995. This is an easy-to-read book that is full
Fault Detection includes: performing the same op- of examples of the risks involved in computer-based
eration twice and comparing the results, using error- systems. Some of the material isn’t relevant to com-
detecting codes such as parity or CRCs, using timers puter control systems but it’s all interesting.
2
Table 1: Approaches used to reduce hardware and software failures (from Siewiorek and Swarz, 1992).
3
Figure 1: Strategies used to reduce the likelihood of hardware failures (Siewiorek and Swarz).