R = \prod_{i=1}^{n} R_i \qquad (2.1)
This is the simplest basic model; parts-count reliability prediction is based on it.
In real applications, the failure logic model of the system is more complex when
there are redundant subsystems or components.
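Equation 2.1 can be sketched directly in code. This is a minimal illustration; the module reliabilities used below are hypothetical values, not figures from the text:

```python
# Series-system reliability per Equation 2.1: R = prod(R_i).
# The system works only if every module works.
from math import prod

def series_reliability(reliabilities):
    """Reliability of a series system of s-independent modules."""
    return prod(reliabilities)

# Hypothetical module reliabilities for illustration.
print(series_reliability([0.99, 0.95, 0.98]))  # ≈ 0.92169
```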
The Markov model is a popular method for analyzing system status. The
Markov model provides a systematic method for analysis of a system which
consists of many modules and adopts a complex monitoring mechanism. The
Markov model is especially useful when transitions among the system states are
more complicated or when repair of the system must be modeled. A set of states and probabilities that a
system will move from one state to another must be specified to build a Markov model.
28 H.G. Kang
Markov states represent all possible conditions the system can exist in. The
system can only be in one state at a time. A Markov model of the series system is
shown in Figure 2.1(b). State S0 is an initial state. States S1 and S2 represent the
state of module 1 failure and module 2 failure, respectively. Both are defined as
hazard states.
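The series-system Markov model described above can be sketched as a discrete-time approximation. The failure rates and time step below are assumed values for illustration; state S0 drains into the absorbing hazard states S1 and S2, so its occupancy should track e^{-(λ1+λ2)t}:

```python
# Discrete-time sketch of the series-system Markov model of Figure 2.1(b):
# S0 = both modules working; S1, S2 = module 1 or 2 failed (absorbing).
import math

lam1, lam2 = 1e-3, 2e-3   # assumed failure rates (per hour)
dt, steps = 0.01, 100_000  # time step and number of steps (t = 1000 h)

p = [1.0, 0.0, 0.0]  # probabilities of being in S0, S1, S2
for _ in range(steps):
    p0, p1, p2 = p
    p = [p0 * (1 - (lam1 + lam2) * dt),  # remain in S0
         p1 + p0 * lam1 * dt,            # module 1 fails
         p2 + p0 * lam2 * dt]            # module 2 fails

t = steps * dt
print(p[0], math.exp(-(lam1 + lam2) * t))  # both ≈ 0.0498
```

The agreement between the simulated S0 occupancy and the closed-form exponential is the standard sanity check for such a transition model.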
Fault tree modeling is the most familiar tool for analysis staff; its logical
structure makes it easy for system design engineers to understand the models. A fault
tree is a top-down symbolic logic model generated in the failure domain. That is, a
fault tree represents the pathways of system failure. Fault tree analysis is also a
powerful diagnostic tool for the analysis of complex systems and is used as an aid for
design improvement.
[Figure 2.1. Reliability models of a series system: (b) Markov model with initial state S0, failure states S1 and S2, and transition rates λ1 and λ2; fault tree top event FAILURE OF FUNCTION]
The analyst repeatedly asks, "What will cause a given failure to occur?" when
using backwards logic to build a fault tree model. The analyst views the system
from a top-down perspective. This means he starts by looking at a high-level
system failure and proceeds down into the system to trace failure paths. Fault trees
are generated in the failure domain, while reliability diagrams are generated in the
success domain. Probabilities are propagated through the logic models to
determine the probability that a system will fail or the probability the system will
operate successfully (i.e., the reliability). Probability data may be derived from
available empirical data or found in handbooks.
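Propagating probabilities through the logic model reduces, for s-independent basic events, to simple gate formulas: an AND gate multiplies event probabilities, while an OR gate applies inclusion-exclusion. A minimal sketch, with hypothetical basic-event probabilities:

```python
# Probability propagation through fault tree gates (s-independent events).
from math import prod

def gate_and(probs):
    """AND gate: all input events must occur."""
    return prod(probs)

def gate_or(probs):
    """OR gate: at least one input event occurs (complement of none)."""
    return 1 - prod(1 - p for p in probs)

# Hypothetical basic-event probabilities.
p_series = gate_or([1e-3, 2e-3])     # series-like failure logic
p_parallel = gate_and([1e-3, 2e-3])  # redundant pair fails only if both fail
print(p_series, p_parallel)
```

Nesting these two functions over the tree structure yields the top-event probability for any coherent fault tree with independent basic events.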
Fault tree analysis (FTA) is applicable both to hardware and non-hardware
systems and allows probabilistic assessment of system risk as well as prioritization
of the effort based upon root cause evaluation. An FTA provides several
advantages [6].
The probability of failure (P) for a given event is defined as the number of failures
per number of attempts, which is the probability of a basic event in a fault tree. The
sum of reliability and failure probability equals unity. This relationship for a series
system can be expressed as:
P = P_1 + P_2 - P_1 P_2
  = (1 - R_1) + (1 - R_2) - (1 - R_1)(1 - R_2)
  = 1 - R_1 R_2
  = 1 - R \qquad (2.2)
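The identity in Equation 2.2 is easy to verify numerically. The reliabilities below are assumed values chosen only for the check:

```python
# Numeric check of Equation 2.2 for a two-module series system.
R1, R2 = 0.95, 0.90          # assumed module reliabilities
P1, P2 = 1 - R1, 1 - R2      # module failure probabilities

# Union of the two failure events (inclusion-exclusion) ...
P = P1 + P2 - P1 * P2
# ... equals one minus the series reliability R1*R2.
assert abs(P - (1 - R1 * R2)) < 1e-12
print(P)  # ≈ 0.145
```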
The reliability model for a dual redundant system is expressed in Figure 2.2. Two
s-independent redundant modules with reliabilities R1 and R2 will successfully
perform a system function if one out of the two modules is working. The
reliability of the dual redundant system, which equals the probability that one of
modules 1 or 2 survives, is expressed as:
R = R_1 + R_2 - R_1 R_2 = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2) t} \qquad (2.3)
R = 1 - (1 - R_1)(1 - R_2) \qquad (2.4)

1 - R = (1 - R_1)(1 - R_2) \qquad (2.5)

R = 1 - \prod_{i=1}^{n} (1 - R_i) \qquad (2.6)
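Equation 2.6 generalizes the dual redundant case to n parallel modules. A minimal sketch, with hypothetical module reliabilities:

```python
# Parallel (redundant) system reliability per Equation 2.6:
# R = 1 - prod(1 - R_i); the system fails only if every module fails.
from math import prod

def parallel_reliability(reliabilities):
    """Reliability of a parallel system of s-independent modules."""
    return 1 - prod(1 - r for r in reliabilities)

# Hypothetical reliabilities: two 0.9 modules in parallel give 0.99.
print(parallel_reliability([0.9, 0.9]))  # ≈ 0.99
```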
[Figure 2.2. Markov model of the dual redundant system: initial state S0, single-failure states S1 and S2, system-failure state S3, transition rates λ1 and λ2; fault tree top event FAILURE OF FUNCTION]
Not all systems can be modeled with simple RBDs; some complex systems cannot
be modeled with true series and parallel branches. In a more complicated system,
module 2 monitors status information from module 1 and automatically takes over
the system function when an erroneous status of module 1 is detected. The system
is conceptually illustrated in Figure 2.3.
1 - R = (1 - R_1)\{(1 - R_2) + (1 - m) - (1 - R_2)(1 - m)\}
      = (1 - R_1)\{(1 - R_2) m + (1 - m)\} \qquad (2.7)
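Equation 2.7 says the system fails when module 1 fails and either module 2 also fails or the takeover itself fails (coverage m). The values below are assumed for illustration, and the code checks that the two forms of the equation agree:

```python
# Equation 2.7: unreliability of the standby system with monitoring coverage m.
R1, R2, m = 0.95, 0.90, 0.99   # assumed reliabilities and coverage factor

# Expanded form: module 2 fails OR takeover fails (inclusion-exclusion).
Q = (1 - R1) * ((1 - R2) + (1 - m) - (1 - R2) * (1 - m))
# Simplified form from the second line of Equation 2.7.
Q_simplified = (1 - R1) * ((1 - R2) * m + (1 - m))
assert abs(Q - Q_simplified) < 1e-12
print(Q)  # ≈ 0.00545
```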
The Markov model is shown in Figure 2.4. A fault tree is shown in Figure 2.5.
[Markov diagram: states S0 through S4 with transition rates λ1m, λ1(1-m), λ1, and λ2]
Figure 2.4. Markov model for standby and automatic takeover system
[Fault tree with top event FAILURE OF FUNCTION and contributing event MONITORING FAILURE]
Figure 2.5. Fault tree for standby and automatic takeover system
[(a) Typical process of signal processing using conventional analog circuits: process parameters feed signal processing channels A through D and hard-wired logic, which drive the actuators. (b) Digital counterpart: process parameters feed a digital signal processing unit whose output module drives the actuators]
Figure 2.6. Schematic diagram of signal processing using analog circuit and digital
processor unit
[Fault trees: top event FAILURE OF SYSTEM over FAILURE OF CH A and FAILURE OF CH B; top event FAILURE OF SYSTEM over FAILURE OF CH C and FAILURE OF CH D]
Figure 2.7. The fault trees for the systems shown in Figure 2.6
Issues in System Reliability and Risk Model 35
[Fault tree: FAILURE OF SYSTEM over FAILURE OF INPUT TO PROCESS UNIT and FAILURE OF PROCESS UNITS; a 2/3 voting gate combines PROCESS TRAIN INDEPENDENT FAILURE and PROCESS TRAIN CCF]
Figure 2.8. The fault tree model of a three-train signal-processing system which performs 2-out-of-3 auctioneering
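The 2-out-of-3 voting logic of Figure 2.8 can be sketched with a simple beta-factor split between independent and common-cause failures. This is a hedged illustration: the per-train failure probability and beta value below are assumptions, not figures from the text:

```python
# Sketch of 2-out-of-3 voting with a beta-factor common-cause model.
from math import comb

q_total = 1e-3   # assumed per-train failure probability
beta = 0.05      # assumed fraction of failures that are common-cause
q_ccf = beta * q_total
q_ind = (1 - beta) * q_total

# Voting logic fails if 2 or more trains fail independently.
q_vote = sum(comb(3, k) * q_ind**k * (1 - q_ind)**(3 - k) for k in (2, 3))
# Combine with CCF (rare-event union of the two contributions).
q_system = q_vote + q_ccf - q_vote * q_ccf
print(q_system)
```

With these assumed numbers the common-cause term dominates the independent-failure term by more than an order of magnitude, which is why CCF modeling matters so much for voted redundant trains.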
Static modeling techniques, such as a classical event tree and a fault tree, do not
simulate the real world without considerable assumptions, since the real world is
dynamic. Dynamic modeling techniques, such as a dynamic fault tree model,
accommodate multi-tasking of digital systems [7], but are not very familiar to
designers.
To build a sophisticated model with the classical static modeling techniques, it is
very important to estimate how many parameters will trigger the output signals
within the specific time limit for a specific kind of accident. Several assumptions,
such as the time limit and the severity of standard accidents, are required, and
parameters for several important standard cases should be defined. For example, in
the case of a steam line break accident in nuclear power units, a reactor protection
system should complete its actuation within 2 hours, and the accident should be
detected through changes in several parameters, such as low steam generator
pressure, low pressurizer pressure, and low steam generator level. The digital
system also provides signals for human operators; in some cases the processor
module generates signals for both the automated system and the human operator.
The effect of digital system failure on human operator action is addressed in Section 2.6.
C = \Pr(T \le U) = \sum_{t=1}^{U} p (1 - p)^{t-1} = p \, \frac{1 - (1 - p)^U}{1 - (1 - p)} = 1 - (1 - p)^U \qquad (2.8)
The failure probability is denoted p. This equation can be solved for U as:
U = \frac{\ln(1 - C)}{\ln(1 - p)} \qquad (2.9)
An impractical number of test cases may be required for some ultra-highly reliable
systems. A failure probability lower than 10^{-6} at the 90% confidence level
implies the need to test the software for more than 2.3 \times 10^6 cases without failure.
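As a quick check of Equation 2.9, the required number of failure-free tests can be computed directly:

```python
# Equation 2.9: number of failure-free test cases U needed to claim
# failure probability p at confidence level C.
import math

def required_tests(p, C):
    return math.log(1 - C) / math.log(1 - p)

# p = 1e-6 at 90% confidence requires roughly 2.3 million tests.
print(required_tests(1e-6, 0.9))  # ≈ 2.3e6
```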
Test automation and parallel testing can reduce the test burden in some cases,
such as for sequential processing software which has no feedback interaction with
users or other systems. The validity of test-based evaluation depends on the
coverage of the test cases, which should represent the inputs encountered in
actual use. This issue is addressed by the concept of reliability allocation [11]: the
required software reliability is calculated from the target reliability of the total system.
Equations 2.8 and 2.9 cover the cases where no failure is observed during testing.
Test stopping rules are also available for the cases where testing restarts after error
fixing [11]. The number of test cases needed for each subsequent round of testing
is discussed in more detail in Chapter 4.
[Figure 2.9 (schematic): a power supply and two microprocessors of processing units, each monitored by a watchdog timer; the watchdog timers operate a relay on the output signal path]
[Fault tree: for each channel, PROCESSOR FAILURE (p) combined with WATCHDOG FAILURE; WATCHDOG FAILURE comprises FAILURE OF WATCHDOG SWITCH (w) and failure of the watchdog to detect processor failure (1 - c)]
Figure 2.10. Fault tree model of the watchdog timer application in Figure 2.9 (p: the
probability of processor failure, c: the coverage factor, w: the probability of watchdog timer
switch failure)
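Reading the cut sets of Figure 2.10 loosely, one hedged sketch of a single channel's unavailability is: the processor fails and either the failure is not covered (probability 1 - c) or it is covered but the watchdog switch fails (probability c·w). The values of p and w below are assumptions, not figures from the text:

```python
# Hypothetical single-channel unavailability suggested by Figure 2.10.
p = 1e-4   # assumed probability of processor failure
w = 1e-3   # assumed probability of watchdog timer switch failure

def channel_unavailability(c):
    """Undetected failure plus detected-but-unswitched failure."""
    return p * ((1 - c) + c * w)

for c in (0.5, 0.9, 0.99):
    print(c, channel_unavailability(c))
```

Unavailability falls roughly linearly as the coverage factor c rises, which matches the downward trend shown in Figure 2.11.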
[Plot: system unavailability, from 0 to 3.0E-07, versus coverage factor, from 0.5 to 1.0]
Figure 2.11. System unavailability along the coverage factor of the watchdog timer in Figure 2.9
[Fault tree: FAILURE OF SAFETY FUNCTION over ALARM GENERATION FAILURE, DISPLAY/ACTUATION DEVICE FAILURE, and INSTRUMENTATION SENSOR FAILURE]
Figure 2.12. The schematic of the concept of the safety function failure mechanism [22]
References
[1] Kang HG, Jang SC, Ha JJ (2002) Evaluation of the impact of the digital safety-critical
I&C systems, ISOFIC2002, Seoul, Korea, November 2002
[2] Sancaktar S, Schulz T (2003) Development of the PRA for the AP1000, ICAPP '03,
Cordoba, Spain, May 2003
[3] Hisamochi K, Suzuki H, Oda S (2002) Importance evaluation for digital control
systems of ABWR Plant, The 7th Korea-Japan PSA Workshop, Jeju, Korea, May
2002
[4] HSE (1998) The use of computers in safety-critical applications, London, HSE books
[5] Kang HG, et al. (2003) Survey of the advanced designs of safety-critical digital
systems from the PSA viewpoint, Korea Atomic Energy Research Institute,
KAERI/AR-00669/2003
[6] Goldberg BE, Everhart K, Stevens R, Babbitt N III, Clemens P, Stout L (1994)
System engineering Toolbox for design-oriented engineers, NASA Reference
Publication 1358
[7] Meshkat L, Dugan JB, Andrews JD (2000) Analysis of safety systems with on-
demand and dynamic failure modes, Proceedings of 2000 RM
[8] White RM, Boettcher DB (1994) Putting Sizewell B digital protection in context,
Nuclear Engineering International, pp. 41-43
46 H.G. Kang
[9] Parnas DL, Asmis GJK, Madey J (1991) Assessment of safety-critical software in
nuclear power plants, Nuclear Safety, Vol. 32, No. 2
[10] Butler RW, Finelli GB (1993) The infeasibility of quantifying the reliability of life-
critical real-time software, IEEE Transactions on Software Engineering, Vol. 19, No.
1
[11] Kang HG, Sung T, et al. (2000) Determination of the number of software tests using
probabilistic safety assessment, Proceedings of the Korean Nuclear Society
Conference, Taejon, Korea
[12] Littlewood B, Wright D (1997) Some conservative stopping rules for the operational
testing of safety-critical software, IEEE Trans. Software Engineering, Vol. 23, No. 11,
pp. 673-685
[13] Saiedian H (1996) An Invitation to formal methods, Computer
[14] Rushby J (1993) Formal methods and the certification of critical systems, SRI-CSL-
93-07, Computer Science Laboratory, SRI International, Menlo Park
[15] Welbourne D (1997) Safety critical software in nuclear power, The GEC Journal of
Technology, Vol. 14, No. 1
[16] Dahll G (1998) The use of Bayesian belief nets in safety assessment of software based
system, HWP-527, Halden Project
[17] Eom HS, et al. (2001) Survey of Bayesian belief nets for quantitative reliability
assessment of safety critical software used in nuclear power plants, Korea Atomic
Energy Research Institute, KAERI/AR-594-2001, 2001
[18] Littlewood B, Popov P, Strigini L (1999) A note on estimation of functionally diverse
system, Reliability Engineering and System Safety, Vol. 66, No. 1, pp. 93-95
[19] Bastl W, Bock HW (1998) German qualification and assessment of digital I&C
systems important to safety, Reliability Engineering and System Safety, Vol. 59, pp.
163-170
[20] Choi JG, Seong PH (2001) Dependability estimation of a digital system with
consideration of software masking effects on hardware faults, Reliability Engineering
and System Safety, Vol. 71, pp. 45-55
[21] Bayrak T, Grabowski MR (2002) Safety-critical wide area network performance
evaluation, ECIS 2002, June 6-8, Gdańsk, Poland
[22] Kang HG, Jang SC (2006) Application of condition-based HRA method for a manual
actuation of the safety features in a nuclear power plant, Reliability Engineering &
System Safety, Vol. 91
[23] Kauffmann JV, Lanik GT, Spence RA, Trager EA (1992) Operating experience
feedback report human performance in operating events, USNRC, NUREG-1257,
Vol. 8, Washington DC
[24] Decortis F (1993) Operator strategies in a dynamic environment in relation to an
operator model, Ergonomics, Vol. 36, No. 11
[25] Park J, Jung W (2003) The requisite characteristics for diagnosis procedures based on
the empirical findings of the operators' behavior under emergency situations,
Reliability Engineering & System Safety, Volume 81, Issue 2
[26] Julius JA, Jorgenson EJ, Parry GW, Mosleh AM (1996) Procedure for the analysis of
errors of commission during non-power mode of nuclear power plant operation,
Reliability Engineering & System Safety, Vol. 53
[27] OECD/NEA Committee on the safety of nuclear installations, 1999, ICDE project
report on collection and analysis of common-cause failures of centrifugal pumps,
NEA/CSNI/R(99)2
[28] OECD/NEA Committee on the safety of nuclear installations, 2003, ICDE project
report: Collection and analysis of common-cause failures of check valves,
NEA/CSNI/R(2003)15