The De Havilland Comet
31 March 2004
(Courtesy of Marc Schaeffer; find this photo at the following website: www.geocities.com/CapeCanaveral/Lab/8803/)
Comet recounted
• First jet airliner – speed and comfort
• 3 crashes between May 1953 and April
1954
• Extensive testing
• Catastrophic cracking from metal fatigue
• Fixes – rounded corners, reinforcing plates
• New understanding of metal fatigue
Ariane 5
(Photographic source is ESA/CNES. You can find these photos at the following website:
www.mssl.ucl.ac.uk/www_plasma/missions/cluster/about_cluster/cluster1/cluster1_images.html)
Ariane 5 recounted
• Dual-redundant processors
• 3 unprotected variables that overflowed
• Processors reset on overflow, no graceful
recovery
• Used in Ariane 4, no check of flight dynamics
• Ariane 5 had greater horizontal drift velocities than Ariane 4
• Reuse is tricky, end-to-end system test
necessary
• Find report at:
www.esa.int/export/esaLA/Pr_33_1996_p_EN.html
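The overflow bullets above can be illustrated with a minimal sketch. The inquiry report describes the failing operation as a conversion from a 64-bit floating-point value to a 16-bit signed integer. The flight code was not Python, and the function names below are hypothetical; the sketch only contrasts an unprotected conversion, which raises an error on overflow, with a protected one that saturates and reports the fault so the caller can recover gracefully.

```python
from typing import Tuple

# Range of a 16-bit signed integer.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value: float) -> int:
    """Convert with no recovery path: an out-of-range value raises an error.

    Hypothetical stand-in for an unprotected conversion; in a flight system
    an unhandled error of this kind can take the processor down.
    """
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> Tuple[int, bool]:
    """Saturate to the 16-bit range and report whether clamping occurred."""
    truncated = int(value)
    clamped = max(INT16_MIN, min(INT16_MAX, truncated))
    return clamped, clamped != truncated


if __name__ == "__main__":
    # A value inside the range passes through; a larger one is clamped.
    print(to_int16_protected(1200.5))
    print(to_int16_protected(40000.0))
```

Saturation is only one possible policy. The point is that the range assumption, valid for the earlier vehicle, becomes an explicit check that an end-to-end test with the new flight dynamics would exercise.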
Therac 25
• Medical linear accelerator for treating tumors
• Overdosed six patients in the mid-1980s
• Problems
– Quick editing by operator caused race condition
– Cryptic error messages ignored
– No explanation of error codes in the user's manual
– 50 times full dose but displayed “no dose given”
– No mechanical interlocks
– No software reviews or audits, little
documentation
Therac 25 Lessons
• Need general plan for system development
• The operator interface must be clear, intuitive,
and explained
• Hardware safeguards must limit software faults
• Good design, not testing, makes a safe system
• See Appendix A – Medical Devices: The Therac-
25 from Nancy Leveson, Safeware: System
Safety and Computers, Addison-Wesley, 1995.
Chernobyl
Chernobyl recounted
• Chernobyl reactor 4 exploded, April 1986
• Released clouds of radioactive material for
10 days
• 100 times the radiation exposure of the Hiroshima bomb
• Background
– Graphite block reactors unstable at low
reactivity
– Safety rules require power > 20% capacity at
all times
Chernobyl events
– Experiment called for by engineers in Moscow
– Manual shutdown, automatic control turned off
– Power dropped to 1% capacity
– Removed more control rods
– Power crept up to 7%
– Turned on more water to produce more steam
– Water cooled reactor, dropping steam and reactivity
– Removed even more control rods
– Steam production rose until 1:22 a.m. when operators
shut off water flow
– Heat built up quickly, control rod sleeves bent
– Could not insert control rods
– Steam explosion
Chernobyl Lessons
• Theoretical knowledge vs. hands-on
• Humans “over-steer” dynamic systems
• Humans don’t handle interacting,
nonlinear problems well
• “Groupthink”
• Understand human nature
– Clarity of function
– Reduce confounding problems
– Accommodate in system design
Apple Lisa
(Part of the computer collection of Giorgio Ungarelli, photograph used with permission.)
Apple Lisa Lessons
• Prohibitive price for unappreciated
capability
• Cost-effective solutions rely on users’
understanding
• Failure falls into business/political arena –
difficult to predict and avoid
Navy Terrier/LEAP
Terrier LEAP outline
LEAP Target
(Photograph courtesy
of Raytheon, Inc.)
LEAP General Operation
• High-resolution radars at Wallops Island track
target (shipboard radars insufficient)
• Wallops Island processor collected data from the
radars, filtered the target track with a six-state
Kalman filter, and transmitted the track to the
ship.
• Sent target tracks to ship via redundant telephone
landlines and Inmarsat satellite links
• Ship processor received the data, predicted the
intercept time and point, and indicated when to
launch the interceptor missile.
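The tracking and prediction steps above can be sketched as follows. The slides do not give the filter model, so this assumes the six states are position and velocity on three axes, with a constant-velocity target and position-only measurements. Under that assumption the axes decouple, and three two-state filters behave like one six-state filter with block-diagonal matrices. All class names and noise values are hypothetical, and a real ballistic target would need gravity and drag in the model.

```python
class AxisFilter:
    """Two-state (position, velocity) Kalman filter for a single axis."""

    def __init__(self, meas_var=1.0, accel_psd=0.01):
        self.p, self.v = 0.0, 0.0
        # Large initial covariance: nothing is known about the target yet.
        self.P = [[1e6, 0.0], [0.0, 1e6]]
        self.r = meas_var      # assumed measurement noise variance
        self.q = accel_psd     # assumed process noise intensity

    def step(self, z, dt):
        """Predict forward by dt, then update with position measurement z."""
        q, P = self.q, self.P
        p_pred = self.p + self.v * dt
        v_pred = self.v
        # Covariance prediction: F P F^T + Q for F = [[1, dt], [0, 1]].
        P00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q * dt**3 / 3
        P01 = P[0][1] + dt * P[1][1] + q * dt**2 / 2
        P10 = P[1][0] + dt * P[1][1] + q * dt**2 / 2
        P11 = P[1][1] + q * dt
        # Measurement update with H = [1, 0].
        s = P00 + self.r
        k0, k1 = P00 / s, P10 / s
        y = z - p_pred
        self.p = p_pred + k0 * y
        self.v = v_pred + k1 * y
        self.P = [[(1 - k0) * P00, (1 - k0) * P01],
                  [P10 - k1 * P00, P11 - k1 * P01]]


class TrackFilter:
    """Three independent axis filters, six states in total."""

    def __init__(self):
        self.axes = [AxisFilter() for _ in range(3)]

    def step(self, xyz, dt):
        for axis, z in zip(self.axes, xyz):
            axis.step(z, dt)

    def extrapolate(self, t_ahead):
        """Predict the target position t_ahead seconds past the last update."""
        return tuple(a.p + a.v * t_ahead for a in self.axes)
```

A shipboard processor of the kind described could call `extrapolate` over a range of look-ahead times to search for a feasible intercept point; that search is not shown here.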
LEAP Testing Finds Problems
• End-to-end tests of the system
– simulated a target launch,
– transmitted the simulated data through the entire
system to the ship,
– calculated an intercept as if we were at sea.
• Redundant landlines – switch maintenance in
New Jersey cut off early test
• Separate landlines
– one through New Jersey
– other through Pennsylvania
Testing Finds Problems (cont’d.)
• Two shipboard radars caused problems
– SPS-49 jammed the Inmarsat receivers
– SPS-20 jammed the GPS receivers
• Inmarsat situated on port and starboard
bridge to reduce superstructure blockage
• Too many dropouts with commercial
modems, switched to cell phone modems
LEAP: Lessons Learned
• Technical failure
• Simple, human error can interrupt the best
designs
• Careful development and thorough testing
necessary
• All components must be tested within the
system to uncover interactions
Aegis LEAP
• A success story
• Three successful intercepts in 2002, more
in 2003
• Carefully planned development
Aegis LEAP Flight Profile
(Figure courtesy of the Johns Hopkins University Applied Physics Laboratory.)
Kinetic Kill Vehicle and Target
Image
Thorough Ground Test Program
• Separation tests – squibs, batteries, explosive bolts
• KW hover test for the closed loop pointing
• Air bearing tests of maneuvers: pitch-to-ditch, IR seeker
calibration, and pointing before separation
• Hardware-in-the-loop simulation and test of avionics
• KW tests for the IR seeker characterization, stabilization,
third stage interfaces
• Vacuum tests – PCB delamination, arcing, and outgassing
• Aerothermal testing in a hypersonic wind tunnel for
nosecone heating and outgassing, seeker shield function,
strake heating and insulation
Types of Failure
Examples: Product Recalls
• [. . .] recalled 45,000 heaters for defective thermostats that
were improperly positioned, which could lead to
overheating.
• [. . .] recalled 3.1 million dishwashers. The slide switch (the
lever that selects between heat drying and energy saving)
can melt and ignite over time, posing a fire hazard.
• [. . .] recalled 5,500 toy flashlights because the batteries
may overheat or leak and children can suffer burns from
the leaking battery.
• [. . .] recalled upright vacuum cleaners because the power
cord may break inside of the handle posing electrical
shock and burn injury hazards.
• http://www.matthewslawfirm.com
Examples: More Automotive
Recalls
• [. . .] recalled 263,000 1995-97 [vehicles] . . . The airbag
electronic control module (AECM) could corrode from
water or road salt and then accidentally fire the driver side
airbag.
• [. . .] recalled 757,000 1992-97 [vehicles] because higher
than specified electrical load through accessory power
feed circuit may cause a short circuit and allow current to
flow through ground wiring. This could cause overheating
and an electrical fire.
• [. . .] recalled 1995-97 [vehicles] because improperly
routed wire harness for the air-conditioner may permit
wires to rub together and short circuit, resulting in a blown
fuse, dead battery, or fire.
• http://www.matthewslawfirm.com
Elements of Unintended
Consequences in Previous Examples
• Passage of time – usually fielded units
• Nonobvious or obscure causes
• Environmental interactions, e.g., corrosion,
overheating
• Failure modes with significant effects, e.g.,
fire or injury
• Confounding complexity
– unforeseen circumstances
– multiple causes
• Human error
– nonobviousness to user
– improper use
– design oversight – even if it appears to be
a manufacturing problem
Example: Complexity or Oversight?
• September 2003, Hurricane Isabel
• Power outages – trees down on power lines.
• NIST experienced 180 VAC for 20 minutes that
destroyed 1000s of fluorescent lamp ballasts
• Protective mechanisms for AC power were
controlled over telephone lines.
• Guess what was also knocked down by wind-
blown trees?
Remedies
• Truth in advertising – expertise, schedule estimation,
management style/employee responses
• Work hard to develop reasonable schedules
– review and testing
– plan for contingencies
• Continuous learning
– lessons learned, your own experience
– others’ experiences
• Reduce complexity
– understand and define interactions
– do not “reinvent the wheel”
– limit features
• Teamwork
Integrity
• The “Big Picture”
• Truth in advertising (your capability and
skills)
• Estimation and scheduling
• Plan for the long term
– your success and reputation
– your product’s viability
– your company’s reputation
Failure and How to Handle It
• Types of failure (a progression toward less control)
– technical
– professional
– political/societal
• Embrace failure
– admit and accept responsibility
– understand and learn
– put past behind you because others won’t
– forgive others’ failures; help them to rebound
Personal Examples
Technical Failure
• Ultraviolet satellite camera with image
intensifier
• Automatic gain control for image intensifier
• Nonlinear control problem
• First version – blooming/collapsing picture
• Second version – unreliable transmission
of gain value
(Figure: AGC block diagram. Labeled blocks and signals: image intensifier, camera, video signal, frame sync reset, hi-threshold comparator, pixel clock, up-down counter with Up and Dn inputs, DAC.)
Technical Failure – 1st Version
• Problem: blooming/collapsing picture
• Background:
– Discrete logic, up-down counters
– Unstable for bright objects
– Not fully simulated or analyzed
– Short development time (flew breadboards)
• Should’a: analyzed/simulated expected
scenes during design
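The kind of scene simulation the last bullet calls for can be sketched with a toy, frame-level model. This is not the flown circuit: it assumes gain is linear in the counter value, one counter step per frame, and a single brightness number per scene. All names and constants are hypothetical. Even so, it shows how a fixed counter step produces a much larger brightness swing for a bright scene than for a dim one.

```python
def simulate_agc(scene_brightness, frames=400, step=1, threshold=1.0,
                 gain_per_count=0.01, max_count=255):
    """Return the displayed brightness per frame for a constant scene.

    Toy model (assumption): displayed brightness = scene * count * gain_per_count.
    """
    count = max_count // 2
    history = []
    for _ in range(frames):
        brightness = scene_brightness * count * gain_per_count
        history.append(brightness)
        # Comparator: step the counter down when too bright, up otherwise.
        if brightness > threshold:
            count = max(0, count - step)
        else:
            count = min(max_count, count + step)
    return history


def ripple(history, settle=200):
    """Peak-to-peak brightness swing after the initial transient."""
    tail = history[settle:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    for scene in (1.0, 50.0):
        print(scene, ripple(simulate_agc(scene)))
```

In this model the loop never settles, because the counter always steps one way or the other, and the size of the resulting swing scales with scene brightness. Running expected scenes through even a crude model like this would flag the bright-object case before hardware is built.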
Technical Failure – 2nd Version
• Problem: unreliable transmission of gain value
• Background:
– Microcontroller implementation of AGC
– AGC stable for all scenes
– Readout of gain by ground equipment unreliable
– Analog encoding of gain into video frame
• Should’a:
– Used digital encoding into video frame for noise margin
– Better understood the noise environment
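The noise-margin argument can be made concrete with a small sketch. It assumes an 8-bit gain value, a video level normalized to the range 0 to 1, and bounded additive noise. None of these details come from the slides, and the function names are hypothetical. The analog scheme writes the gain as one pixel level; the digital scheme writes one black or white pixel per bit and decodes against a mid-level threshold.

```python
import random


def encode_analog(gain):
    """One pixel whose level is proportional to the 8-bit gain value."""
    return [gain / 255.0]


def decode_analog(pixels):
    return round(pixels[0] * 255.0)


def encode_digital(gain):
    """Eight pixels, least significant bit first: white for 1, black for 0."""
    return [1.0 if (gain >> bit) & 1 else 0.0 for bit in range(8)]


def decode_digital(pixels):
    # Threshold at mid-level: any noise smaller than 0.5 is rejected.
    return sum(1 << bit for bit, level in enumerate(pixels) if level > 0.5)


def add_noise(pixels, amplitude, rng):
    """Add bounded uniform noise to each pixel (assumed noise model)."""
    return [p + rng.uniform(-amplitude, amplitude) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    gain = 137
    print(decode_analog(add_noise(encode_analog(gain), 0.2, rng)))
    print(decode_digital(add_noise(encode_digital(gain), 0.2, rng)))
```

With noise bounded below half the full-scale level, the digital readout recovers the gain exactly, while the analog readout can be off by many counts. The cost is more pixels per value.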
Professional Failure
• Asked to finish programming effort while
original designer moved on to other
projects
• False starts and procrastination
• Finally removed myself from project
Professional Failure
• Problem: did not complete assignment
• Background:
– Mounds of documentation to plow through
– Early realization of no-win situation
• Lost motivation
• No real recognition of work obvious to me
• Should’a:
– Either not taken the job in the first place
– Or if no choice, plow through assignment while finding
another job (setting precedence)
Professional/Business Failure
• Business deal
• My personal performance
– Technical excellence
– Professional excellence
– Maintained integrity
• Accused of bad stuff, which I did not do
• Deal fell through
Professional/Business Failure
• Problem: business politics outside my control
• Background:
– Interesting proposition and product
– Long-term relationships
– Unknown quantities introduced early in deal
– Weirdnesses grew
• Should’a:
– Either not made the deal in the first place
– Or left earlier before weirdness got out of hand
• Note: always deal with integrity or don’t deal
Political Failure
• Satellite subsystem
• Team’s performance
– Technical excellence
– Professional excellence
• NASA sponsor pulled project in-house
Political Failure
• Problem: politics outside my company’s control
• Background:
– 6-month long set of trade studies to define architecture
– Thorough studies and review
– Schedule well understood, team prepared to build
system
– Groups at NASA out of work
– NASA pulled project in-house to feed their own
• Should’a:
– None, politics happen
A Success Story
The Sidewinder Missile – A Success
Story
Sidewinder recounted
• Goal: simple, sturdy, cheap missile
• Small development team, 1949 – 1953
• Simple, clever combination of ideas
– Rollerons: simple but important control
– Proportional navigation simplified circuitry
– Torque-balance servo for maneuvering
– Canard control fins reduced wiring and connectors
– Simple data acquisition equipment
• Extensive testing and prototyping
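Proportional navigation, named in the list above, can be illustrated with the textbook form of the law: commanded lateral acceleration equals a navigation gain times closing speed times line-of-sight rate. The sketch below is a planar simulation with hypothetical speeds, geometry, and gain, an ideal missile with no lag or acceleration limit, and a non-maneuvering target. It says nothing about how the Sidewinder implemented the law in hardware.

```python
import math


def simulate_pn(nav_gain=4.0, dt=0.001, max_time=30.0):
    """Return the closest sampled missile-target range in metres (2-D model)."""
    # Hypothetical initial conditions.
    mx, my, speed, heading = 0.0, 0.0, 600.0, math.radians(30.0)
    tx, ty, tvx, tvy = 5000.0, 3000.0, -250.0, 0.0
    min_range = math.hypot(tx - mx, ty - my)
    t = 0.0
    while t < max_time:
        rx, ry = tx - mx, ty - my
        rng = math.hypot(rx, ry)
        min_range = min(min_range, rng)
        mvx, mvy = speed * math.cos(heading), speed * math.sin(heading)
        vx, vy = tvx - mvx, tvy - mvy              # relative velocity
        los_rate = (rx * vy - ry * vx) / (rng * rng)
        closing = -(rx * vx + ry * vy) / rng
        if closing <= 0.0:
            break                                  # range is opening: past closest approach
        accel = nav_gain * closing * los_rate      # proportional navigation law
        heading += accel / speed * dt              # lateral acceleration turns the velocity
        mx += mvx * dt
        my += mvy * dt
        tx += tvx * dt
        ty += tvy * dt
        t += dt
    return min_range


if __name__ == "__main__":
    print("guided miss distance:", simulate_pn(4.0))
    print("unguided miss distance:", simulate_pn(0.0))
```

The law needs only the line-of-sight rate, which a seeker can measure directly, and not the target's range or position. That is one reason it is considered simple to mechanize.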
Sidewinder Lessons
• Breakthroughs require vision
• Small teams facilitate commitment and
communications
• Simple and robust design
• Careful, thorough, and extensive testing
and integration