Академический Документы
Профессиональный Документы
Культура Документы
Nancy G. Leveson
VENT
LA
GEARBOX
LC
CONDENSER
CATALYST
COOLING
VAPOR
WATER
REFLUX
REACTOR
COMPUTER
Types of Accidents
Component Failure Accidents
Single or multiple component failures
Usually assume random failure
System Accidents
Arise in interactions among components
No components may have "failed"
c
Safety
Reliability
c
c
All failures are faults but not all faults are failures.
c
c
'
c
SoftwareRelated Accidents
Are usually caused by flawed requirements
Incomplete or wrong assumptions about operation of
controlled system or required operation of computer.
Unhandled controlledsystem states and environmental
conditions.
c
A Possible Solution
Enforce discipline and control complexity
Limits have changed from structural integrity and physical
constraints of materials to intellectual limits
'
System Safety
A planned, disciplined, and systematic approach to
preventing or reducing accidents throughout the life
cycle of a system.
Organized common sense (Mueller, 1968)
Primary concern is the management of hazards:
Hazard
identification
evaluation
elimination
control
through
analysis
design
management
MILSTD882
'
'
Design
Development
Operations
Hazard identification
Hazard resolution
Verification
Change analysis
Operational feedback
Management
c
Process Steps
1. Perform a Preliminary Hazard Analysis
Produces hazard list
'
'
'
'
'
'
c
0
Human Factors
L
System Safety
A
]
Hazard List
Fault Tree Analysis
e
Completeness/Consistency
Analysis
>
Simulation/Experiments
J
Usability Analysis
P
Q
U
Safety Verification
Q
Operational Analysis
Safety Testing
Software FTA
A
A
X
Performance Monitoring
Operational Analysis
Change Analysis
Incident and accident analysis
Periodic audits
Periodic audits
c
V
Q
]
Change Analysis
Performance Monitoring
<
.
.
.
c
y
DESIGN CRITERION
REQUIREMENTS/CONSTRAINTS
1a. ATC shall provide advisories that
maintain safe separation between
aircraft.
1b. ATC shall provide conflict alerts.
REQUIREMENTS/CONSTRAINTS
HAZARDS
6. Loss of controlled flight or loss
of airframe integrity.
REQUIREMENTS/CONSTRAINTS
6a. ATC must not issue advisories outside
the safe performance envelope of the
aircraft.
6b. ATC advisories must not distract
or disrupt the crew from maintaining
safety of flight.
6c. ATC must not issue advisories that
the pilot or aircraft cannot fly or that
degrade the continued safe flight of
the aircraft.
6d. ATC must not provide advisories
that cause an aircraft to fall below
the standard glidepath or intersect
it at the wrong place.
y
SEVERITY
I
II
Catastrophic Critical
III
Marginal
IV
Negligible
A Frequent
IA
IIA
IIIA
IVA
B Moderate
IB
IIB
IIIB
IVB
C Occasional
IC
IIC
IIIC
IVC
D Remote
ID
IID
IIID
IVD
E Unlikely
IE
IIE
IIIE
IVE
F Impossible
IF
IIF
IIIF
IVF
LIKELIHOOD
y
C
Occasional
B
Probable
D
Remote
Catastrophic
I
E
F
Improbable Impossible
Critical
4
|
12
II
12
12
10
12
12
12
12
12
Marginal
4
|
III
Negligible
IV
10
11
12
Initiating
Events
Final
States
Initiating
Events
Final
States
nonhazard
nonhazard
HAZARD
HAZARD
nonhazard
nonhazard
nonhazard
nonhazard
Forward Search
Backward Search
Pressure
too high
Valve
failure
Relief valve 1
does not open
Relief valve 2
does not open
or
or
Computer does
not open
valve 1
Valve
failure
or
Sensor Computer
Failure
output
too late
Operator does
not know to
open valve 2
Operator
inattentive
and
Computer
does not issue
command to
open valve 1
Valve 1
Position
Indicator
fails on
Open
Indicator
Light fails
on
Requirements Completeness
Most softwarerelated accidents involve software requirements
deficiencies.
Accidents often result from unhandled and unspecified cases.
We have defined a set of criteria to determine whether a
requirements specification is complete.
Derived from accidents and basic engineering principles.
Validated (at JPL) and used on industrial projects.
Completeness: Requirements are sufficient to distinguish
the desired behavior of the software from
that of any other undesired program that
might be designed.
c
Startup, shutdown
Mode transitions
Inputs and outputs
Value and timing
Load and capacity
Environment capacity
Failure states and transitions
Humancomputer interface
Robustness
Data age
Latency
Feedback
Reversibility
Preemption
Path Robustness
Requirements Analysis
Model Execution, Animation, and Visualization
Completeness
State Machine Hazard Analysis (backwards reachability)
Software Deviation Analysis
Human Error Analysis
Test Coverage Analysis and Test Case Generation
Automatic code generation?
HAZARD REDUCTION
Design for controllability
Barriers
Lockins, Lockouts, Interlocks
Failure Minimization
Safety Factors and Margins
Redundancy
HAZARD CONTROL
Reducing exposure
Isolation and containment
Protection systems and failsafe design
DAMAGE REDUCTION
Decreasing cost
Increasing effectiveness
..
STAMP
(Systems Theory Accident Modeling and Processes)
ChainofEvents Models
ChainofEvents Example
Equipment
damaged
Operating
pressure
.
#
J
&
&
8
I
'
'
8
I
&
'
M
8
'
&
Fragments
projected
Tank
rupture
1
8
I
&
'
G
'
&
'
'
&
)
>
#
D
/
+
&
&
&
'
'
'
'
&
'
>
'
'
&
'
'
Personnel
injured
'
+
'
'
'
OR
Weakened
metal
Corrosion
>
AND
Moisture
&
'
'
'
>
'
>
#
)
'
>
'
'
&
'
<
>
'
>
c
X
System accidents
Software error
Adaptation
Major accidents involve systematic migration of organizational
behavior under pressure toward cost effectiveness in an
aggressive, competitive environment.
Vessel
Design
Harbor
Design
Cargo
Management
Design
Stability Analysis
Shipyard
Equipment
load added
Impaired
stability
Truck companies
Excess load routines
Passenger management
Excess numbers
Berth design
Calais
Passenger
Management
Traffic
Scheduling
Vessel
Operation
Operations management
Standing orders
Docking
procedure
Capsizing
Change of
docking
procedure
Berth design
Zeebrugge
Unsafe
heuristics
Operations management
Transfer of Herald
to Zeebrugge
Captains planning
Operations management
Time pressure
Crew working
patterns
Accident Analysis:
Combinatorial structure
of possible accidents
can easily be identified.
Systems Theory
Developed for biology (Bertalanffly) and cybernetics (Norbert Weiner)
For systems too complex for complete analysis
Separation into noninteracting subsystems distorts results
Most important properties are emergent.
and too organized for statistical analysis
Concentrates on analysis and design of whole as distinct from parts
(basis of system engineering)
Some properties can only be treated adequately in their entirety,
taking into account all social and technical aspects.
These properties derive from relationships between the parts of
systems how they interact and fit together.
STAMP
STAMP (2)
Views accidents as a control problem
e.g., Oring did not control propellant gas release by
sealing gap in field joint
Software did not adequately control descent speed of
Mars Polar Lander.
SYSTEM DEVELOPMENT
SYSTEM OPERATIONS
Government Reports
Lobbying
Hearings and open meetings
Accidents
Legislation
Government Reports
Lobbying
Hearings and open meetings
Accidents
Legislation
Certification Info.
Change reports
Whistleblowers
Accidents and incidents
Regulations
Standards
Certification
Legal penalties
Case Law
Company
Management
Policy, stds.
Company
Management
Status Reports
Risk Assessments
Incident Reports
Safety Policy
Standards
Resources
Safety Policy
Standards
Resources
Project
Management
Operations
Management
Hazard Analyses
Safety Standards
Hazard Analyses
Progress Reports
SafetyRelated Changes
Progress Reports
Problem reports
Operating Assumptions
Operating Procedures
Test reports
Hazard Analyses
Review Results
Revised
operating procedures
Safety
Reports
Hazard Analyses
Work
Procedures
safety reports
audits
work logs
inspections
Manufacturing
Operating Process
Human Controller(s)
Implementation
and assurance
Manufacturing
Management
Change requests
Audit reports
Work Instructions
Design,
Documentation
Safety Constraints
Standards
Test Requirements
Operations Reports
Software revisions
Hardware replacements
Documentation
Actuator(s)
Physical
Process
Design Rationale
Maintenance
and Evolution
Automated
Controller
Problem Reports
Incidents
Change Requests
Performance Audits
Sensor(s)
Note:
Does not imply need for a "controller"
Component failures may be controlled through design
e.g., redundancy, interlocks, failsafe design
or through process
manufacturing processes and procedures
maintenance procedures
But does imply the need to enforce the safety constraints
in some way.
New model includes what do now and more
Process 1
Controller 2
Process 2
Process
Process Models
Model of
Process
Model of
Automation
Sensors
Human Supervisor
(Controller)
Measured
variables
Process
inputs
Automated Controller
Controls
Displays
Model of
Process
Controlled
Process
Model of
Interfaces
Disturbances
Process
outputs
Actuators
Controlled
variables
development process
physical laws
etc.
In preventing accidents
Hazard analysis
2. Dynamic structure
Shows how the safety control structure changed over time
3. Behavioral dynamics
Dynamic processes behind the changes, i.e., why the
system changes
ARIANE 5 LAUNCHER
OBC
Executes flight program;
Controls nozzles of solid
boosters and Vulcain
cryogenic engine
Diagnostic and
flight information
D
Nozzle
command
Main engine
command
Booster
Nozzles
Main engine
Nozzle
SRI
Measures attitude of
launcher and its
movements in space
C
Horizontal
velocity
Strapdown inertial
platform
Backup SRI
Measures attitude of
launcher and its
movements in space;
Takes over if SRI unable
to send guidance info
C
Horizontal velocity
Ariane 5: A rapid change in attitude and high aerodynamic loads stemming from a
high angle of attack create aerodynamic forces that cause the launcher
to disintegrate at 39 seconds after command for main engine ignition (H0).
Nozzles: Full nozzle deflections of solid boosters and main engine lead to angle
of attack of more than 20 degrees.
ARIANE 5 LAUNCHER
OBC
Executes flight program;
Controls nozzles of solid
boosters and Vulcain
cryogenic engine
Diagnostic and
flight information
D
Nozzle
command
Main engine
command
Booster
Nozzles
Main engine
Nozzle
SRI
Measures attitude of
launcher and its
movements in space
C
Horizontal
velocity
Strapdown inertial
platform
Backup SRI
Measures attitude of
launcher and its
movements in space;
Takes over if SRI unable
to send guidance info
C
Horizontal velocity
DEVELOPMENT
Titan 4/Centaur/Milstar
OPERATIONS
Defense Contract
Management Command
contract administration
software surveillance
oversee the process
LMA System
Engineering
LMA Quality
Assurance
IV&V
Analex
Software Design
and Development
AnalexCleveland
verify design
LMA
Analex Denver
Ground Operations
(CCAS)
Honeywell
IMS software
Aerospace
Monitor software
development and test
.
5
System Hazard:
System Safety Constraints:
f
E
P
c
W
f
h
M
^
9
F
?
4
m
'
Design flaw:
Shallow location
Design flaw:
No chlorinator
Porous bedrock
Minimal overburden
Heavy rains
Dynamic Structure
M
^
F
?
0
]
m
4
'
Design flaw:
No chlorinator
Design flaw:
Shallow location
Porous bedrock
Minimal overburden
Heavy rains
J
c
s
t
w
}
t
z
u
}
u
}
u
}
u
z
u
}
t
t
u
y
u
}
Budget
cuts
Budget cuts
directed toward
safety
Rate of
safety increase
Complacency
External
Pressure
Rate of increase
in complacency
Performance
Pressure
Expectations
Perceived
safety
Risk
Launch Rate
Success
Success Rate
Accident Rate
Safety
System
safety
efforts
Priority of
safety programs
1. Identify
System hazards
System safety constraints and requirements
Control structure in place to enforce constraints
2. Model dynamic aspects of accident:
Changes to static safety control structure over time
Dynamic processes in effect that led to changes
3. Create the overall explanation for the accident
Inadequate control actions and decisions
Context in which decisions made
Mental model flaws
Control flaws (e.g., missing feedback loops)
Coordination flaws
TCAS Hazards
1. A near midair collision (NMAC)
(a pair of controlled aircraft violate minimum separation
standards)
2. A controlled maneuver into the ground
3. Loss of control of aircraft
4. Interference with other safetyrelated aircraft systems
5. Interference with groundbased ATC system
6. Interference with ATC safetyrelated advisory
c
Displays
Aural Alerts
Airline
Ops
Mgmt.
Pilot
Operating
Mode
TCAS
Advisories
FAA
Local
ATC
Ops
Mgmt.
Air Traffic
Aircraft
Radio
Controller
Advisories
Flight Data
Processor
Airline
Ops
Mgmt.
Aircraft
Pilot
Operating
Mode
Aural Alerts
Displays
Radar
TCAS
Step 4a: Augment control structure with process models for each
control component
Step 4b: For each of inadequate control actions, examine parts of
control loop to see if could cause it.
Guided by set of generic control loop flaws
Where human or organization involved must evaluate:
Context in which decisions made
Behaviorshaping mechanisms (influences)
Step 4c: Consider how designed controls could degrade over time
TCAS
Climb Inhibit
Inhibited
Not Inhibited
Unknown
On
System Start
Fault Detected
Descent Inhibit
Inhibited
Not Inhibited
Unknown
Sensivity Level
1
2
3
4
5
6
7
Current RA Sense
None
Climb
Descend
Unknown
Current RA Level
Status
Altitude Reporting
On ground
Airborne
Unknown
Inhibited
Not Inhibited
Unknown
Classification
Inhibited
Not Inhibited
Unknown
Yes
No
Lost
Unknown
RA Sense
Other Traffic
Proximate Traffic
Potential Threat
Threat
Unknown
None
Climb
Descend
None
Unknown
Altitude Layer
RA Strength
Layer 1
Layer 2
Layer 3
Layer 4
Unknown
Crossing
Not Selected
VSL 0
VSL 500
VSL 1000
VSL 2000
Nominal 1500
Increase 2500
None
VSL 0
VSL 500
VSL 1000
VSL 2000
Unknown
NonCrossing
IntCrossing
OwnCross
Unknown
Reversal
Reversed
Not Reversed
Unknown
Unknown
Other Bearing
Other Bearing Valid
Other Altitude
Other Altitude Valid
Range
Mode S Address
Sensitivity Level
Equippage
Sensors
Human Supervisor
(Controller)
Model of
Process
Model of
Automation
Measured
variables
Process
inputs
Automated Controller
Controls
Displays
Model of
Process
Model of
Interfaces
Controlled
Process
Disturbances
Process
outputs
Actuators
Controlled
variables