Академический Документы
Профессиональный Документы
Культура Документы
Reliability
Customer
Cost Schedule
Satisfaction
Performance
(v0.1) 3
The Need for Design for Reliability
(v0.1) 4
Design for Reliability
(v0.1) 5
Different Views of Reliability
◈ Customers +
View reliability as a system-level issue, Electrical
with minimal concern placed on the Reliability
distinction into sub-domains.
System
(v0.1) 6
Reliability vs. Cost
(v0.1) 7
Reliability vs. Cost, continued
1. Choose the best tools from all of the tools available and
apply these tools at the proper phases of the product life
cycle.
(v0.1) 8
Hardware Reliability and Cost Reduction Strategy
TOTAL COST
CURVE
OPTIMUM
COST POINT RELIABILITY
PROGRAM COSTS
COST
ELEC / MECH
WARRANTY &
IN-SERVICE COSTS
RELIABILITY
TOTAL COST
CURVE
OPTIMUM
COST POINT RELIABILITY
PROGRAM COSTS
COST
ELEC / MECH
WARRANTY &
IN-SERVICE COSTS
RELIABILITY
The software
◈ A product-driven focus on reliability in order to reduce impact on overall
product hardware and mechanical warranty and in-service warranty costs is
costs
minimal at best?
◈ This emphasis results in some increase in development and
manufacturing costs Why?
◈ A good reliability program will strive to optimize the total Life
(v0.1) Cycle Costs 10
Software Reliability and Cost
analysability
changeability
Portability stability
testability
adaptability
Maintainab installability
co-existence
ility replaceability
(v0.1) 13
Software Reliability Definitions
The
The customer
customer perception
perception of
of
the
the software’s
software’s ability
ability to
to deliver
deliver the
the expected
expected functionality
functionality
in
in the
the target
target environment
environment
without
without failing.
failing.
Software
Software reliability
reliability isis aa measure
measure of of
the
the software
software failures
failures that
that are
are visible
visible to
to aa customer
customer and
and
prevent
prevent aa system
system from
from delivering
delivering essential
essential functionality.
functionality.
(v0.1) 14
Terminology - Defect
(v0.1) 16
Terminology - Failure
NOTE: for the remainder of this presentation, the term “failure” will
refer only to failure of essential functionality, unless otherwise stated.
(v0.1) 17
Summary of Defects and Failures
(v0.1) 18
Software Availability
where,
MTTF is Mean Time To [next] Failure
MTTR (Mean Time To [operational] Restoration) is still the duration of the
outage, but without the notion of a “repair time.” Instead, it is the time until
the same system is restored to an operational state via a system reboot or
some level of software restart.
(v0.1) 19
Software Availability (continued)
(v0.1) 20
System Availability Timeframes
(v0.1) 21
Software Availability
1) Software Failure Detection a) Slow Recovery Action
a) Fast Recovery Action
Mechanism b) Recovery Speed b) Recovery Speed
2) Implementation Example c) MTTR c) MTTR
d) Mission failure rate
3) Failure Detection Speed d) Mission failure rate
(v0.1) 23
Software Design for Reliability (DfR)
◈ Formal Methods
Methodologies that utilize mathematical modeling of a system’s requirements
and/or design in order to prove correctness
Primarily used for safety-critical systems that require very high degrees of
◘ Confidence in expected system performance
◘ Quality audit information
◘ Targets of low or zero failure rates
(v0.1) 24
Hardware Reliability Practices
(v0.1) 25
Hardware Reliability Practices
◈ (continued)
Extending software designs after product deployment is commonplace
◘ Software updates are the preferred avenue for product extensions and
customizations
◘ Software updates provide fast development turnaround and have little or no
manufacturing or distribution costs
Hardware failure analysis techniques (FMEAs and FTAs) are rarely used with
software
◘ Software engineers find it difficult to adapt these techniques below the
system level
(v0.1) 26
Software Process Control Methodologies
◈ Software process control assumes a correlation between
development process maturity and latent defect density in the
final software [Jones-1]
Inherent Delivered
CMM Level
Defects/KSLOC Defects/KSLOC
5 7.8 0.39
4 15.6 1.09
3 31.2 2.1
2 62.4 3.4
1 124.8 5.8
◈ If the current process level does not yield the desired software
reliability, process audits and stricter process controls are
implemented
Reliability improvement under this type of controlled methodology may increase
very slowly
(v0.1) 27
Defect Removal Potential of “Best Practices”
(v0.1) [Jones-2] 28
Impact of Integrating Multiple “Best Practices”
Design
Inspections n n n n Y n Y Y
/ Reviews
Code
Inspections n n n Y n n Y Y
/ Reviews
Formal SQA
n Y n n n Y n Y
Processes
Formal
n n Y n n Y n Y
Testing
Median
40 45 53 57 60 65 85 99
Defect
% % % % % % % %
Efficiency
(v0.1) [Jones-2] 29
“Best in Class” Results by Application
“Best in Class”
Application Type
Defect Removal Efficiency
IT Software 73%
(v0.1) [Jones-2] 30
Phase Containment of Defects and Failures
Typical Behavior
Defect
Discovery Reqts Design Coding Testing Maintenance
Surprise!
Defect
Discovery Reqts Design Coding Testing Maintenance
(v0.1) 31
Typical Defect Reduction Behavior
200
150
100
50
SysBld SysBldSysBldSysBldSysBld
#1 #2 #3 #4 #5
(v0.1)
System Test 32
DfR Phase Containment Behavior
200
Goals are to:
(1) Reduce field failure by finding most of the failures in-house
150
100
50
Req Design Code Unit SysBld SysBld SysBld SysBld SysBld Field
Test #1 #2 #3 #4 #5 Failures
(v0.1)
Development System Test Deployment 33
Requirements Phase DfR Practices
(v0.1) 34
Design Phase DfR Practices
◈ The software designs should evolve using a multi-tiered approach
1. System Architecture
Identify all essential system-level functionality that requires software
Identify the role of software in detecting and handling hardware failure modes by
performing system-level failure mode analysis
◈ For software engineers, the highest ROI for defect and failure
detection and removal is low level design (LLD)
LLD defines sufficient module logic and flow control details to allow analysis based
on common failure categories and vulnerable portions of the design
◘ Failure handling behavior can be examined in sufficient detail
LLD bridges gap between traditional design specs and source code
◘ Most design defects that were previously caught during code reviews will
now be caught in the LLD review
◘ More likely to correctly fix design defects since the defect is caught in the
design phase
– Most design defects found after the design phase are not properly fixed
because the scheduling costs are too high
– Design defects require returning to the design phase to correct and
review the design and then correcting, re-reviewing and unit testing the
code!
(v0.1) 36
Code Phase DfR Practices
Software failure analysis will focus on the robustness of the exception handling
behavior
◘ Software failure analysis should be performed as a separate code inspection
once the code has undergone initial testing
(v0.1) 37
Test Phase DfR Practices
(v0.1) 38
Reliability Measurements and Metrics
◈ Definitions
Measurements – data collected for tracking or to calculate meta-data (metrics)
◘ Ex: defect counts by phase, defect insertion phase, defect detection phase
Metrics – information derived from measurements (meta-data)
◘ Ex: failure rate, defect removal efficiency, defect density
(v0.1) 39
Conclusion
Conclusions
(v0.1) 41
References
[Jones-1], Jones, C., “Conflict and Litigation Between Software Clients and
Developers”, March 4, 1996
[Jones-2], Jones, C., “Software Quality in 2002: A Survey of the State of the
Art”, July 23, 2002
(v0.1) 42