

IEEE Standards

IEEE Std 1413.1-2002™

IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413

IEEE Standards Coordinating Committee 37 on Reliability Prediction

Published by
The Institute of Electrical and Electronics Engineers, Inc.
3 Park Avenue, New York, NY 10016-5997, USA
19 February 2003

Copyright The Institute of Electrical and Electronics Engineers, Inc.


Provided by IHS under license with IEEE
No reproduction or networking permitted without license from IHS

Not for Resale

Print: SH95020
PDF: SS95020

Sponsor

IEEE Standards Coordinating Committee 37 on Reliability Prediction

Approved 12 September 2002

IEEE-SA Standards Board

Abstract: A framework for reliability prediction procedures for electronic equipment at all levels is
provided in this guide.
Keywords: baseline, classic reliability, constant failure rate, estimation, failure, goal, item, operating environment, reliability prediction, requirement, system life cycle

The Institute of Electrical and Electronics Engineers, Inc.


3 Park Avenue, New York, NY 10016-5997, USA
Copyright 2003 by the Institute of Electrical and Electronics Engineers, Inc.
All rights reserved. Published 19 February 2003. Printed in the United States of America.
IEEE is a registered trademark in the U.S. Patent & Trademark Office, owned by the Institute of Electrical and Electronics Engineers, Incorporated.
Print: ISBN 0-7381-3363-9 SH95020
PDF: ISBN 0-7381-3364-7 SS95020
No part of this publication may be reproduced in any form, in an electronic retrieval system or otherwise, without the prior
written permission of the publisher.



IEEE Standards documents are developed within the IEEE Societies and the Standards Coordinating Committees of the
IEEE Standards Association (IEEE-SA) Standards Board. The IEEE develops its standards through a consensus development process, approved by the American National Standards Institute, which brings together volunteers representing varied
viewpoints and interests to achieve the final product. Volunteers are not necessarily members of the Institute and serve without compensation. While the IEEE administers the process and establishes rules to promote fairness in the consensus development process, the IEEE does not independently evaluate, test, or verify the accuracy of any of the information contained
in its standards.
Use of an IEEE Standard is wholly voluntary. The IEEE disclaims liability for any personal injury, property or other damage, of any nature whatsoever, whether special, indirect, consequential, or compensatory, directly or indirectly resulting
from the publication, use of, or reliance upon this, or any other IEEE Standard document.
The IEEE does not warrant or represent the accuracy or content of the material contained herein, and expressly disclaims
any express or implied warranty, including any implied warranty of merchantability or fitness for a specific purpose, or that
the use of the material contained herein is free from patent infringement. IEEE Standards documents are supplied AS IS.
The existence of an IEEE Standard does not imply that there are no other ways to produce, test, measure, purchase, market,
or provide other goods and services related to the scope of the IEEE Standard. Furthermore, the viewpoint expressed at the
time a standard is approved and issued is subject to change brought about through developments in the state of the art and
comments received from users of the standard. Every IEEE Standard is subjected to review at least every five years for revision or reaffirmation. When a document is more than five years old and has not been reaffirmed, it is reasonable to conclude
that its contents, although still of some value, do not wholly reflect the present state of the art. Users are cautioned to check
to determine that they have the latest edition of any IEEE Standard.
In publishing and making this document available, the IEEE is not suggesting or rendering professional or other services
for, or on behalf of, any person or entity. Nor is the IEEE undertaking to perform any duty owed by any other person or
entity to another. Any person utilizing this, and any other IEEE Standards document, should rely upon the advice of a competent professional in determining the exercise of reasonable care in any given circumstances.
Interpretations: Occasionally questions may arise regarding the meaning of portions of standards as they relate to specific
applications. When the need for interpretations is brought to the attention of IEEE, the Institute will initiate action to prepare
appropriate responses. Since IEEE Standards represent a consensus of concerned interests, it is important to ensure that any
interpretation has also received the concurrence of a balance of interests. For this reason, IEEE and the members of its societies and Standards Coordinating Committees are not able to provide an instant response to interpretation requests except in
those cases where the matter has previously received formal consideration.
Comments for revision of IEEE Standards are welcome from any interested party, regardless of membership affiliation with
IEEE. Suggestions for changes in documents should be in the form of a proposed change of text, together with appropriate
supporting comments. Comments on standards and requests for interpretations should be addressed to:
Secretary, IEEE-SA Standards Board
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
USA
Note: Attention is called to the possibility that implementation of this standard may require use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or
validity of any patent rights in connection therewith. The IEEE shall not be responsible for identifying patents
for which a license may be required by an IEEE standard or for conducting inquiries into the legal validity or
scope of those patents that are brought to its attention.
Authorization to photocopy portions of any individual standard for internal or personal use is granted by the Institute of
Electrical and Electronics Engineers, Inc., provided that the appropriate fee is paid to Copyright Clearance Center. To
arrange for payment of licensing fee, please contact Copyright Clearance Center, Customer Service, 222 Rosewood Drive,
Danvers, MA 01923 USA; +1 978 750 8400. Permission to photocopy portions of any individual standard for educational
classroom use can also be obtained through the Copyright Clearance Center.


Introduction
(This introduction is not part of IEEE Std 1413.1-2002, IEEE Guide for Selecting and Using Reliability Predictions
Based on IEEE 1413.)

IEEE Std 1413-1998, IEEE Standard Methodology for Reliability Predictions and Assessment for Electronic
Systems and Equipment, provides a framework for reliability prediction procedures for electronic equipment
at all levels. This guide is a supporting document for IEEE Std 1413-1998 and describes a wide variety of hardware reliability prediction methodologies.
The scope of this guide is processes and methodologies for conducting reliability predictions for electronic
systems and equipment. This guide focuses on hardware reliability prediction methodologies, and specifically excludes software reliability, availability and maintainability, human reliability, and proprietary reliability prediction data and methodologies. These topics may be the subjects for future IEEE 1413 guides.
The purpose of this guide is to assist in the selection and use of reliability prediction methodologies satisfying IEEE Std 1413. The guide also describes the appropriate factors and criteria to consider when selecting
reliability prediction methodologies.

Participants
At the time this standard was completed, the Reliability Prediction Standard Development Working Group
had the following membership:
Michael Pecht, Chair
Gary Buchanan
Jerry L. Cartwright
Dr. Victor Chien
Dr. Vladimir Crk
Dr. Diganta Das
Dan N. Donahoe
Jon G. Elerath
Lou Gullo
Jeff W. Harms
Harold L. Hart
Tyrone Jackson
Dr. Aridaman Jain
Yvonne Lord
Jack Sherman
Thomas J. Stadterman
Dr. Alan Wood

Other contributors who aided in the development of this standard by providing direction and attending meetings were as follows:
Dr. Glenn Blackwell
Jens Braband
Bill F. Carpenter
Helen Cheung
Lloyd Condra
Dr. Michael J. Cushing
Dr. Krishna Darbha
Dr. Abhijit Dasgupta
Tony DiVenti
Sheri Elliott
Dr. Ralph Evans
Diego Gutierre
Edward B. Hakim
Patrick Hetherington
Zhenya Huang
Nino Ingegneri
Margaret Jackson
Dr. Samuel Keene
Dr. Dingjun Li
Stephen Magee
Dr. Michael Osterman
Arun Ramakrishnan
Jack Remez
Mathew Samuel
Kevin Silke
John W. Sullivan
Ricky Valentin
Nancy Neeld Youens

The following members of the balloting committee voted on this standard. Balloters may have voted for
approval, disapproval, or abstention.
Dr. Vladimir Crk
Dr. Michael J. Cushing
Dr. Diganta Das
Richard L. Doyle
Jon G. Elerath
Harold L. Hart
Dennis R. Hoffman
Dr. Aridaman Jain
Jack Sherman
Thomas J. Stadterman
Ricky Valentin
Dr. Alan Wood

When the IEEE-SA Standards Board approved this standard on 12 September 2002, it had the following
membership:
James T. Carlo, Chair
James H. Gurney, Vice Chair
Judith Gorman, Secretary
Sid Bennett
H. Stephen Berger
Clyde R. Camp
Richard DeBlasio
Harold E. Epstein
Julian Forster*
Howard M. Frazier
Toshio Fukuda
Arnold M. Greenspan
Raymond Hapeman
Donald M. Heirman
Richard H. Hulett
Lowell G. Johnson
Joseph L. Koepfinger*
Peter H. Lips
Nader Mehravari
Daleep C. Mohla
William J. Moylan
Malcolm V. Thaden
Geoffrey O. Thompson
Howard L. Wolfman
Don Wright

*Member Emeritus

Also included are the following nonvoting IEEE-SA Standards Board liaisons:
Alan Cookson, NIST Representative
Satish K. Aggarwal, NRC Representative

Andrew Ickowicz
IEEE Standards Project Editor


Contents

1. Overview
   1.1 Scope
   1.2 Purpose
   1.3 Glossary
   1.4 Contents
2. References
3. Definitions, abbreviations, and acronyms
   3.1 Definitions
   3.2 Abbreviations and acronyms
4. Background
   4.1 Basic concepts and definitions
   4.2 Reliability prediction uses and timing
   4.3 Considerations for selecting reliability prediction methods
5. Reliability prediction methods
   5.1 Engineering information assessment
   5.2 Predictions based on field data
   5.3 Predictions based on test data
   5.4 Reliability predictions based on stress and damage models
   5.5 Reliability prediction based on handbooks
   5.6 Assessment of reliability prediction methodologies based on IEEE 1413 criteria
6. System reliability models
   6.1 Reliability block diagram
   6.2 Fault-tree analysis (FTA)
   6.3 Reliability of repairable systems
   6.4 Monte Carlo simulation
Annex A (informative) Statistical data analysis
Annex B (informative) Bibliography


IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413

1. Overview
IEEE Std 1413-1998 [B5]1 provides a framework for reliability prediction procedures for electronic equipment at all levels. This guide is a supporting document for IEEE Std 1413-1998. This guide describes a wide
variety of hardware reliability prediction methodologies.

1.1 Scope
The scope of this guide is processes and methodologies for conducting reliability predictions for electronic
systems and equipment. This guide focuses on hardware reliability prediction methodologies, and specifically excludes software reliability, availability and maintainability, human reliability, and proprietary
reliability prediction data and methodologies. These topics may be the subjects for additional future IEEE
guides supporting IEEE Std 1413-1998.

1.2 Purpose
The purpose of this guide is to assist in the selection and use of reliability prediction methodologies satisfying IEEE Std 1413-1998. The guide accomplishes this purpose by briefly describing a wide variety of
hardware reliability prediction methodologies. The guide also describes the appropriate factors and criteria
to consider when selecting reliability prediction methodologies.

1.3 Glossary

Many of the terms used to describe reliability prediction methodologies have multiple meanings. For example, the term reliability has a specific mathematical meaning, but the word is also used to mean an entire field of engineering study. Clause 3 contains definitions of the terms that are used in this document, taken primarily from The Authoritative Dictionary of IEEE Standards Terms, Seventh Edition [B3]. The terms reliability and failure are discussed in more detail in Clause 4.
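To make the distinction concrete, here is a minimal sketch (ours, not part of the guide): in the mathematical sense, and under the constant failure rate assumption that appears among this guide's keywords, reliability is the probability of surviving to time t, R(t) = exp(-λt), and the mean time between failures is 1/λ. The function name and example numbers are illustrative only.

```python
import math

def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t): probability of surviving to time t
    under a constant failure rate lambda (failures per hour)."""
    return math.exp(-failure_rate * t)

# Hypothetical example: 2 failures per million hours, 10,000-hour mission.
lam = 2e-6
print(round(reliability(lam, 10_000), 4))  # 0.9802
print(1 / lam)                             # MTBF = 500000.0 hours
```

Note that a single number such as "0.9802 over 10,000 hours" only has meaning once the failure-rate model and operating environment behind it are stated, which is exactly the selection problem this guide addresses.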

1.4 Contents
Clause 4 provides background information for reliability prediction methodologies. This background information includes basic reliability concepts and denitions, reliability prediction uses, reliability prediction
relationship with a system life cycle, and factors to consider when selecting reliability prediction methodologies. Clause 5 describes reliability prediction methodology inputs and reliability prediction methodologies
for components, assemblies, or subsystems. These methodologies include reliability predictions based on
field data, test data, damage simulation, and handbooks. Clause 6 describes methodologies for combining
the predictions in Clause 5 to develop system level reliability predictions. These methodologies include reliability block diagrams, fault trees, repairable system techniques, and simulation.
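The Clause 6 idea of combining component-level predictions into a system-level prediction can be previewed with a short sketch (ours, not part of the guide): in a reliability block diagram, series blocks multiply, and a redundant (parallel) group fails only if every block in it fails. The function names are illustrative only.

```python
from functools import reduce

def series(*r: float) -> float:
    """Series RBD: the system works only if every block works."""
    return reduce(lambda a, b: a * b, r, 1.0)

def parallel(*r: float) -> float:
    """Parallel (redundant) RBD: the group fails only if all blocks fail."""
    return 1.0 - reduce(lambda a, b: a * (1.0 - b), r, 1.0)

# Hypothetical system: a redundant pair (0.9 each) in series with a 0.99 block.
print(round(parallel(0.9, 0.9), 6))                # 0.99
print(round(series(parallel(0.9, 0.9), 0.99), 6))  # 0.9801
```

Fault trees, repairable-system techniques, and Monte Carlo simulation address the cases this simple algebra cannot, such as repair, shared-cause failures, and time-varying rates.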
1 The numbers in brackets correspond to those of the bibliography in Annex B.


2. References

This standard shall be used in conjunction with the following publications. When the following specifications are superseded by an approved revision, the revision shall apply.

ABS Group, Inc., Root Cause Analysis Handbook: A Guide to Effective Incident Investigation, Risk & Reliability Division, Rockville, MD, 1999.

Ascher, H. and Feingold, H., Repairable Systems Reliability: Modeling, Inference, Misconceptions and Their Causes, Lecture Notes in Statistics, Volume 7, Marcel Dekker, New York, 1984.

Baxter, L. A. and Tortorella, M., Dealing With Real Field Reliability Data: Circumventing Incompleteness by Modeling & Iteration, Proceedings of the Annual RAMS Symposium, pp. 255–262, 1994.

Bhagat, W., R&M through Avionics/Electronics Integrity Program, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 216–219, 1989.

Black, J. R., Physics of Electromigration, Proceedings of the IEEE International Reliability Physics Symposium, pp. 142–149, 1983.2

Bowles, J. B., A Survey of Reliability-Prediction Procedures for Microelectronic Devices, IEEE Transactions on Reliability, Vol. 41, No. 1, pp. 2–12, March 1992.

Braun, E. and MacDonald, S., History and Impact of Semiconductor Electronics, Cambridge University Press, Cambridge, 1977.

British Telecom, Handbook of Reliability Data for Components Used in Telecommunication Systems, Issue 4, January 1987.

Cox, D. R., Renewal Theory, Methuen, London, 1962.

Cunningham, J., Valentin, R., Hillman, C., Dasgupta, A., and Osterman, M., A Demonstration of Virtual Qualification for the Design of Electronic Hardware, Proceedings of ESTECH 2001, IEST, Phoenix, AZ, April 2001.

Cushing, M. J., Krolewski, J. G., Stadterman, T. J., and Hum, B. T., U.S. Army Reliability Standardization Improvement Policy and Its Impact, IEEE Transactions on Components, Packaging, and Manufacturing Technology, Part A, Vol. 19, No. 2, pp. 277–278, June 1996.

Cushing, M. J., Mortin, D. E., Stadterman, T. J., and Malhotra, A., Comparison of Electronics-Reliability Assessment Approaches, IEEE Transactions on Reliability, Vol. 42, No. 4, pp. 542–546, December 1993.

Dasgupta, A., Failure Mechanism Models for Cyclic Fatigue, IEEE Transactions on Reliability, Vol. 42, No. 4, pp. 548–555, December 1993.

Dasgupta, A., Oyan, C., Barker, D., and Pecht, M., Solder Creep-Fatigue Analysis by an Energy-Partitioning Approach, ASME Transactions on Electronic Packaging, Vol. 114, pp. 152–160, 1992.3

2 Information on IEEE documents may be obtained by contacting the Institute of Electrical and Electronics Engineers, Inc., at http://www.ieee.org.
3 ASME publications are available from the American Society of Mechanical Engineers, 3 Park Avenue, New York, NY 10016-5990, USA (http://www.asme.org/).


Alvarez, M. and Jackson, T., Quantifying the Effects of Commercial Processes on Availability of Small Manned-Spacecraft, Proceedings of the 2000 Annual Reliability and Maintainability Symposium (RAMS), pp. 305–310, January 2000.


Dasgupta, A. and Pecht, M., Failure Mechanisms and Damage Models, IEEE Transactions on Reliability, Vol. 40, No. 5, pp. 531–536, 1991.

Decker, Gilbert F., Assistant Secretary of the Army (Research, Development, and Acquisition), Memorandum for Commander, U.S. Army Materiel Command, Program Executive Officers, and Program Managers, 15 February 1996.

Denson, W., A Tutorial: PRISM, RAC Journal, pp. 1–6, 3rd Quarter 1999.

Denson, W., Keene, S., and Caroli, J., A New System-Reliability Assessment Methodology, Proceedings of the 1998 Annual Reliability and Maintainability Symposium, pp. 413–420, January 1998.

Denson, W. and Priore, M., Automotive Electronic Reliability Prediction, SAE Paper 870050.4

Dew, John R., In Search of the Root Cause, Quality Progress, pp. 97–102, March 1991.

Elerath, J., Wood, A., Christiansen, D., and Hurst-Hopf, M., Reliability Management and Engineering in a Commercial Computer Environment, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 323–329, Washington, D.C., January 18–21, 1999.

Gullo, L., In-Service Reliability Assessment and Top-Down Approach Provides Alternative Reliability Prediction Method, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 365–377, Washington, D.C., January 18–21, 1999.

Hahn, Gerald J. and Shapiro, Samuel S., Statistical Models in Engineering, John Wiley and Sons, Inc., New York, New York, 1967.

Hakim, E. B., Reliability Prediction: Is Arrhenius Erroneous?, Solid State Technology, Vol. 33, No. 8, p. 57, August 1990.

Hallberg, Ö., Hardware Reliability Assurance and Field Experience in a Telecom Environment, Quality and Reliability Engineering International, Vol. 10, No. 3, pp. 195–200, 1994.

Hallberg, Ö. and Löfberg, J., A Time Dependent Field Return Model for Telecommunication Hardware, Advances in Electronic Packaging 1999: Proceedings of the Pacific Rim/ASME International Intersociety Electronic and Photonic Packaging Conference (InterPACK '99), pp. 1769–1774, The American Society of Mechanical Engineers, New York, 1999.

Hu, J. M., Physics-of-Failure Based Reliability Qualification of Automotive Electronics, SAE Communications in RMS Journal, pp. 21–33, 1994.

Hughes, J. A., Practical Assessment of Current Plastic Encapsulated Microelectronic Devices, Quality and Reliability Engineering International, Vol. 5, No. 2, pp. 125–129, 1989.

Jackson, T., Integration of Sneak Circuit Analysis with FMEA, Proceedings of the 1986 Annual Reliability and Maintainability Symposium, pp. 408–414, 1986.

Jensen, Finn, Electronic Component Reliability, John Wiley and Sons, Inc., New York, New York, 1995.

4 SAE publications are available from the Society of Automotive Engineers, 400 Commonwealth Drive, Warrendale, PA 15096, USA (http://www.sae.org/).


Engelmaier, W., Fatigue Life of Leadless Chip Carrier Solder Joints During Power Cycling, IEEE Transactions on Components, Hybrids, and Manufacturing Technology, Vol. CHMT-6, pp. 232–237, September 1983.


Johnson, B. G. and Gullo, L., Improvements in Reliability Assessment and Prediction Methodology, Proceedings of the 2000 Annual Reliability and Maintainability Symposium (RAMS), pp. 181–187, January 2000.

Jones, J. and Hayes, J., A Comparison of Electronic-Reliability Prediction Models, IEEE Transactions on Reliability, Vol. 48, No. 2, pp. 127–134, June 1999.

Jordan, J., Pecht, M., and Fink, J., How Burn-In Can Reduce Quality and Reliability, International Journal of Microcircuits, Vol. 20, No. 1, pp. 36–40, First Quarter 1997.

Kececioglu, B. D., Reliability Engineering Handbook, Vols. 1 and 2, Prentice Hall, Englewood Cliffs, NJ 07632, 1991.

Kervarrec, G., Monfort, M. L., Riaudel, A., Klimonda, P. Y., Coudrin, J. R., Le Razavet, D., Boulaire, J. Y., Jeanpierre, P., Perie, D., Meister, R., Casassa, S., Haumont, J. L., and Liagre, A., Universal Reliability Prediction Model for SMD Integrated Circuits Based on Field Failures, Microelectronics Reliability, Vol. 39, No. 6-7, pp. 765–771, June-July 1999.

Klion, J., Practical Electronic Reliability Engineering, Van Nostrand Reinhold, New York, New York, 1992.

Knowles, I., Is It Time For a New Approach?, IEEE Transactions on Reliability, Vol. 42, No. 1, p. 3, March 1993.

Lall, P., Pecht, M., and Hakim, E. B., Influence of Temperature on Microelectronics and System Reliability: A Physics of Failure Approach, CRC Press, New York, New York, 1997.

Latino, R. L. and Latino, K. C., Root Cause Analysis: Improving Performance for Bottom Line Results, CRC Press, Boca Raton, Florida, 1999.

Leonard, C. T., Failure Prediction Methodology Calculations Can Mislead: Use Them Wisely, Not Blindly, Proceedings of the National Aerospace and Electronics Conference (NAECON), Vol. 4, pp. 1248–1253, May 1989.

Leonard, C. T., How Failure Prediction Methodology Affects Electronic Equipment Design, Quality and Reliability Engineering International, Vol. 6, No. 4, pp. 243–249, 1993.

Leonard, C. T., Mechanical Engineering Issues and Electronic Equipment Reliability: Incurred Costs Without Compensating Benefits, IEEE Transactions on Components, Hybrids, and Manufacturing Technology, Vol. 13, pp. 895–902, 1990.

Leonard, C. T., On US MIL-HDBK-217 and Reliability Prediction, IEEE Transactions on Reliability, Vol. 37, pp. 450–451, 1988.

Leonard, C. T., Passive Cooling for Avionics Can Improve Airplane Efficiency and Reliability, Proceedings of the IEEE 1989 National Aerospace and Electronics Conference (NAECON), Vol. 2102, pp. 1887–1892, 1989.

Lewis, E. E., Introduction to Reliability Engineering, John Wiley and Sons, Inc., New York, New York, 1996.

Luthra, P., MIL-HDBK-217: What Is Wrong with It?, IEEE Transactions on Reliability, Vol. 39, p. 518, 1990.

Lycoudes, N. and Childers, C. G., Semiconductor Instability Failure Mechanism Review, IEEE Transactions on Reliability, Vol. 29, pp. 237–247, 1980.



Meyer, Paul L., Introductory Probability and Statistical Applications, Addison-Wesley, Menlo Park, pp. 328–335, 1970.

MIL-HDBK-217F, Reliability Prediction of Electronic Equipment, Version F, U.S. Department of Defense, U.S. Government Printing Office, February 28, 1995.5

Miner, M. A., Cumulative Damage in Fatigue, Journal of Applied Mechanics, A-159, 1945.

Mobley, R. K., Root Cause Failure Analysis (Plant Engineering Maintenance Series), Butterworth-Heinemann, Woburn, Massachusetts, 1999.

Montgomery, D. C. and Runger, G. C., Applied Statistics and Probability for Engineers, John Wiley and Sons, Inc., New York, New York, 1994.

Morris, S. F., Use and Application of MIL-HDBK-217, Solid State Technology, pp. 65–69, August 1990.

Nash, F. R., Estimating Device Reliability: Assessment of Credibility, Kluwer Academic Publishers, Boston, MA, 1993.

Nelson, Wayne, Accelerated Testing, John Wiley and Sons, Inc., New York, New York, pp. 71–107, 1990.

O'Connor, P. D. T., Commentary: Reliability - Past, Present, and Future, IEEE Transactions on Reliability, Vol. 49, No. 4, pp. 335–341, December 2000.

O'Connor, P. D. T., Reliability: Measurement or Management?, Quality Assurance, Vol. 12, No. 2, pp. 46–50, 1986.

O'Connor, P. D. T., Reliability Prediction: A State-of-the-Art Review, IEE Proceedings A, Vol. 133, No. 4, pp. 202–216, 1986.

O'Connor, P. D. T., Reliability Prediction for Microelectronic Systems, Reliability Engineering, Vol. 10, No. 3, pp. 129–140, 1985.

O'Connor, P. D. T., Reliability Prediction: Help or Hoax?, Solid State Technology, Vol. 33, pp. 59–61, 1991.

O'Connor, P. D. T., Statistics in Quality and Reliability: Lessons from the Past, and Future Opportunities, Reliability Engineering & System Safety, Vol. 34, No. 1, pp. 23–33, 1991.

O'Connor, P. D. T., Quantifying Uncertainty in Reliability and Safety Studies, Microelectronics and Reliability, Vol. 35, No. 9-10, pp. 1347–1356, 1995.

O'Connor, P. D. T., Undue Faith in US MIL-HDBK-217 for Reliability Prediction, IEEE Transactions on Reliability, Vol. 37, p. 468, 1988.

Osterman, M. and Stadterman, T., Failure-Assessment Software for Circuit-Card Assemblies, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 269–276, January 1999.

Pease, R., What's All This MIL-HDBK-217 Stuff, Anyhow?, Electronic Design, pp. 82–84, October 24, 1991.

Pecht, J. and Pecht, M., Long-Term Non-Operating Reliability of Electronic Products, CRC Press, Boca Raton, FL, 1995.

5 MIL publications are available from Customer Service, Defense Printing Service, 700 Robbins Ave., Bldg. 4D, Philadelphia, PA 19111-5094.



Pecht, M., Integrated Circuit, Hybrid, and Multichip Module Package Design Guidelines, John Wiley and Sons, Inc., New York, NY, 1993.
Pecht, M., Dasgupta, A., Barker, D., and Leonard, C. T., "The Reliability Physics Approach to Failure Prediction Modelling (sic)," Quality and Reliability Engineering International, Vol. 6, pp. 267–273, 1990.
Pecht, M. and Ko, W., "A Corrosion Rate Equation For Microelectronic Die Metallization," The Journal of the International Society of Hybrid Microelectronics, Vol. 13, No. 2, pp. 41–52, June 1990.
Pecht, M. and Nash, F., "Predicting the Reliability of Electronic Equipment," Proceedings of the IEEE, Vol. 82, No. 7, pp. 992–1004, July 1994.
Pecht, M., Nguyen, L. T., and Hakim, E. B., Plastic Encapsulated Microelectronics, John Wiley and Sons, Inc., New York, NY, 1994.
Pecht, M. and Ramappan, V., "Are Components Still the Major Problem: A Review of Electronic System and Device Field Failure Returns," IEEE Transactions on Components, Hybrids, and Manufacturing Technology, Vol. 15, pp. 1160–1164, December 1992.
Raheja, D., "Death of a Reliability Engineer," Reliability Review, Vol. 10, March 1990.
Rao, S. S., Reliability-Based Design, McGraw-Hill, Inc., New York, NY, pp. 505–543, 1992.
Reliability Assessment Center, PRISM, Version 1.3, System Reliability Assessment Software, Reliability Assessment Center, Rome, NY, June 2001.
Rome Air Development Center, NONOP-1: Non-Operating Reliability Databook, Rome Air Development Center, 1987.
Rome Air Development Center, RADC-TR-73-248: Dormancy and Power On-Off Cycle Effects on Electronic Equipment and Part Reliability, Rome Air Development Center, August 1973.
Rome Air Development Center, RADC-TR-80-136: Nonoperating Failure Rates for Avionics Study, Rome Air Development Center, April 1980.
Rome Air Development Center, RADC-TR-85-91: Impact of Nonoperating Periods on Equipment Reliability, Rome Air Development Center, May 1985.
Rooney, J. P., "Storage Reliability," 1989 Proceedings of the Annual Reliability and Maintainability Symposium, pp. 178–182, January 1989.
Ross, S., Stochastic Processes, John Wiley and Sons, Inc., New York, NY, 1983.
SAE G-11 Committee, Aerospace Information Report on Reliability Prediction Methodologies for Electronic Equipment AIR5286, Draft Report, January 1998.
Shetty, S., Lehtinen, V., Dasgupta, A., Halkola, V., and Reinikainen, T., "Fatigue of Chip-Scale Package Interconnects due to Cyclic Bending," ASME Transactions in Electronic Packaging, Vol. 123, No. 3, pp. 302–308, Sept. 2001.


Siemens AG, Siemens Company Standard SN29500, Version 6.0, Failure Rates of Electronic Components,
Siemens Technical Liaison and Standardization, November 9, 1999.


Stadterman, T., Cushing, M., Hum, B., Malhotra, A., and Pecht, M., "The Transition from Statistical-Field Failure Based Models to Physics-of-Failure Based Models for Reliability Assessment of Electronic Packages," Proceedings of INTERpack '95, Lahaina, Maui, HI, pp. 619–625, March 26–30, 1995.
Stensrud, A. C., "Fear of Reform," Military & Aerospace Electronics, pp. 12–19, December 1994.
Telcordia Technologies, Special Report SR-332: Reliability Prediction Procedure for Electronic Equipment, Issue 1, Telcordia Customer Service, Piscataway, NJ, May 2001.
Tummala, R. and Rymaszewski, E., Microelectronics Packaging Handbook, Van Nostrand Reinhold, New York, NY, 1989.
Union Technique de l'Electricité, Recueil de données de fiabilité : RDF 2000, Modèle universel pour le calcul de la fiabilité prévisionnelle des composants, cartes et équipements électroniques (Reliability Data Handbook: RDF 2000, a universal model for reliability prediction of electronic components, PCBs, and equipment), July 2000.
Upadhyayula, K. and Dasgupta, A., "An Incremental Damage Superposition Approach for Interconnect Reliability Under Combined Accelerated Stresses," ASME International Mechanical Engineering Congress & Exposition, Dallas, TX, 1997.
U.S. Army MIRADCOM, LC-78-1: Missile Material Reliability Prediction Handbook, U.S. Army MIRADCOM, Redstone Arsenal, February 1978.
Watson, G. F., "MIL Reliability: A New Approach," IEEE Spectrum, Vol. 29, pp. 46–49, 1992.
Wilson, P. D., Dell, L. D., and Anderson, G. F., Root Cause Analysis: A Tool for Total Quality Management, ASQC Quality Press, Milwaukee, WI, 1993.
Witzmann, S. and Giroux, Y., "Mechanical Integrity of the IC Device Package: A Key Factor in Achieving Failure Free Product Performance," Transactions of the First International High Temperature Electronics Conference, Albuquerque, NM, pp. 137–142, June 1991.
Wong, K. L., "A Change in Direction for Reliability Engineering is Long Overdue," IEEE Transactions on Reliability, Vol. 42, p. 261, 1993.
Wong, K. L., "The Bathtub Curve and Flat Earth Society," IEEE Transactions on Reliability, Vol. 38, pp. 403–404, 1989.
Wong, K. L., "What Is Wrong with the Existing Reliability Prediction Methods?" Quality and Reliability Engineering International, Vol. 6, No. 4, pp. 251–257, 1990.
Wong, K. L. and Lindstrom, D. L., "Off the Bathtub onto the Roller-Coaster Curve (Electronic Equipment Failure)," Proceedings of the Annual Reliability and Maintainability Symposium, pp. 356–363, 1988.


Wong, K. L., Quart, I., Kallis, J. M., and Burkhard, A. H., "Culprits Causing Avionic Equipment Failures," Proceedings of the Annual Reliability and Maintainability Symposium, pp. 416–421, 1987.


3. Definitions, abbreviations, and acronyms

3.1 Definitions

3.1.1 baseline: The set of data values selected as a reference for comparing other similar sets of future data values.

3.1.2 Bx life (e.g., B10): Time until a specified percentage of a device population will have experienced a failure (B10 means the time by which 10% of a device population will have experienced a failure).

3.1.3 Cx confidence level (e.g., C90): Confidence level (C90 means a confidence of 90%).

3.1.4 classic reliability: The probability that an item will perform its intended function for a specified interval under stated conditions.

3.1.5 constant failure rate: A hazard rate (see 3.1.14) that is constant, or independent of time (applies only to the exponentially distributed failure assumption; see 4.1.4.1).

3.1.6 estimation: A systematic procedure for deriving an approximation to the true value of a population parameter.

3.1.7 failure: The termination of the ability of an item to perform a required function.

3.1.8 failure cause (root cause): The circumstances during design, manufacture, or use that have led to a failure.

3.1.9 failure criticality: The combined effect of the qualified consequences of a failure mode and its probability of occurrence. Syn: risk.

3.1.10 failure mechanism: The physical, chemical, or other process that results in failure.
NOTE—The circumstance that induces or activates the process is termed the root cause of the failure.

3.1.11 failure mode: The effect by which a failure is observed to occur.

3.1.12 failure site: The specific location where a failure mechanism occurs.

3.1.13 goal: An objective that is desirable to meet but not mandatory.

3.1.14 hazard rate: The instantaneous rate of failure of a product.

3.1.15 item: An all-inclusive term to denote any level of hardware (or system) assembly.

3.1.16 item characteristics: The set of technical parameters that comprehensively define an item, including its goals and performance.

3.1.17 Lx life (e.g., L60): Time until a specified percentage of a device population will have experienced a failure (L60 means the time by which 60% of a device population will have experienced a failure). Lx is the same as Bx.

3.1.18 operating environment: The natural or induced environmental conditions, anticipated system interfaces, and user interactions within which the system is expected to be operated.

3.1.19 operating profile: A set of functional requirements that are expected to apply to the system during its operational life.

3.1.20 requirement: A condition or capability that must be met or possessed by a system or system component to satisfy a contract, standard, specification, or other formally imposed document.

3.1.21 system life cycle: The period of time that begins when a system is conceived and ends when the system is no longer available for use.

3.2 Abbreviations and acronyms

AFR           annualized failure rate
ASIC          application specific integrated circuits
BOM           bill of materials
CCA           circuit card assembly
CDF           cumulative distribution function
CNET          Centre National d'Etudes des Telecommunications
CSP           chip scale package
CTE           coefficient of thermal expansion
DOA           dead on arrival
DRAM          dynamic random access memory
DSIC          Defense Standards Improvement Council
EEPROM        electrically erasable programmable read-only memory
EOS           electrical overstress
EPROM         erasable programmable read-only memory
ESD           electrostatic discharge
FAIT          fabrication, assembly, integration, and test
FFOP          failure-free operating period
FITs          failures per billion hours
FMEA          failure modes and effects analysis
FMECA         failure modes, effects and criticality analysis
FPMH          failures per million hours
FRACAS        Failure Reporting and Corrective Action System
FTA           fault tree analysis
HALT          highly accelerated life tests
LCC           leadless ceramic capacitor
LCR           leadless ceramic resistor
MCBF or MCTF  mean-cycles/miles-between/before-failure
MFOP          maintenance-free operating period
MIRADCOM      U.S. Army Missile Research and Development Command
MLE           maximum likelihood estimation
MTBF          mean-time-before/between-failure
MTBR          mean-time-between-return/repair/replacement
MTBSC         mean-time-between-service call
MTBSI         mean-time-between-service interruption
MTBWC         mean-time-between-warranty claim
MTTF          mean time to failure
NFF           no failure found
NTT           Nippon Telegraph and Telephone Corporation
ORT           ongoing reliability tests
OST           over-stress tests
PCB           printed circuit board
PLCC          plastic leaded chip carrier
PROM          programmable read-only memory
RAC           Reliability Analysis Center
RBOC          Regional Bell Operating Companies
RDT           reliability demonstration tests
ROM           read-only memory
SAE           Society of Automotive Engineers
SDDV          stress driven diffusive voiding
SRAM          static random access memory
TDDB          time-dependent dielectric breakdown
UTE           Union Technique de l'Electricité

4. Background

This clause provides background information on basic reliability concepts and definitions. Subclause 4.1 presents basic reliability concepts and definitions commonly used in reliability engineering, such as failure and hazard rate, the bathtub curve, statistical distributions, and reliability metrics; it also introduces repairable and non-repairable system concepts. Subclause 4.2 describes some of the uses of reliability predictions and how reliability predictions fit into the system life cycle. Subclause 4.3 describes factors that should be considered when selecting a reliability prediction method.

4.1 Basic concepts and definitions

This subclause provides background information on basic reliability concepts and definitions. Subclause 4.1.1 discusses common usage and definitions of the terms reliability and failure and associated concepts. Subclause 4.1.2 describes the bathtub curve. Subclause 4.1.3 presents the characteristics of statistical distributions commonly used in reliability prediction. Subclause 4.1.4 describes reliability metrics. Subclause 4.1.5 briefly discusses the concepts of repairable and non-repairable system reliability.

4.1.1 Reliability and failure

The word reliability is used in many different ways. It can be used to broadly describe an engineering discipline or to narrowly define a specific performance metric. Classic reliability is the probability that an item will perform its intended function for a specified interval under stated conditions (see Pecht [B11]). In classical reliability, a single product, unit, or component is generally considered to have two states, operational and failed. The assumption is made that the product can only transition from the operational state to the failed state (no repair). The state of the product is represented by a random variable that takes a value of one when the product is operational and zero when it has failed. Classic reliability, R(t), is the probability that the product is in the operational state up to time t. At time 0, the product is assumed to be good, and the product must eventually fail, so R(0) = 1, R(∞) = 0, and R(t) is a non-increasing function. If there is a mission with duration T, the classic reliability for that mission is R(T).

A failure occurs when an item does not perform a required function. However, in practice, the word failure is often used to mean whatever the customer considers a failure. There is also the concept of transient or intermittent failures, in which an item does not provide the specified performance level for a period and then once again provides the specified performance level, without repair of the item and sometimes without any intervention.
Although reliability theory is based on the concept of failures, the definitions of a failure may be very different from different perspectives. To a hardware engineer, a failure means a component replacement and verification of the replaced component failure. To a manufacturing repair depot, a failure is a returned component. To the finance department, a failure is a warranty claim. To a service organization, a failure is a service call for corrective maintenance (as opposed to planned or preventive maintenance). To a customer, a failure is a degradation of service or capability of the system (a failure that is tolerated without service interruption is considered a degradation of the product capability, since the system is then less able to tolerate future failures). These various definitions of failure lead to various types of metrics, such as replacement rate or service call rate, in addition to the classic reliability metric of constant failure rate. For reliability predictions and discussions, it is important to make clear the system hierarchy level to which a failure applies. For example, a component failure may not cause a system failure, particularly in systems that include redundancy or fault tolerance.

When testing or tracking a subsystem, assembly, or component, the definition of failure becomes important. For a simple unit under test with a binary response, defining failure is relatively easy: either the unit operates or it doesn't. However, even digital devices may experience degraded modes, such as changes in the timing of critical signals or signal bounce under certain switching conditions, which will affect the system performance. Similarly, analog devices may show a slowed response or excessive drift from nominal. How far can a device drift before it is considered failed? If a pull-up resistor on a digital circuit drifts from 10,000 ohms to 7,000 ohms, it may still provide the intended function even though the magnitude of the change is a 30% decrease in resistance. For an electronic control system, the failure criteria may be the number of milliseconds of delay in system response. Acceptable levels of drift or change should be established.

4.1.2 Bathtub curve

The idealized bathtub curve, shown in Figure 1, represents three phases of product hazard rates. The vertical axis is the hazard rate, and the horizontal axis is time from initial product operation. The first phase, often called "infant mortality," represents the early life of the product, when manufacturing imperfections or other initial failure mechanisms may appear. The hazard rate decreases during this time as the product becomes less likely to experience failure from one of these mechanisms. The second phase, sometimes called the "useful life," represents the majority of the product operating time. During this period, the hazard rate of the product appears to be constant, i.e., a constant failure rate. The third phase, often called "wear-out," occurs near the end of the expected product life and often represents failure mechanisms caused by cumulative damage. Electronic components are subject to wear-out due to electromigration, material degradation, and other mechanisms. Weibull, lognormal, or other statistical distributions can be used to describe hazard rates during both infant mortality and wear-out. Mathematically, the idealized bathtub curve is actually the composite of three distinct distributions representing the three product behavior phases: an initial decreasing hazard rate distribution in early operation, an exponential distribution (constant failure rate) during the useful life, and an increasing hazard rate distribution for end of life. However, in practice, the bathtub curve will not be so simple; it may be a combination of many distributions representing many different failure modes, and it may not have the characteristic shape shown in Figure 1 (see Wong, K. L., and Lindstrom, D. L., "Off the Bathtub onto the Roller-Coaster Curve (Electronic Equipment Failure)," Proceedings of the Annual Reliability and Maintainability Symposium; information on references can be found in Clause 2).


Figure 1—Idealized bathtub curve (hazard rate versus time, showing a decreasing hazard rate (infant mortality), a constant hazard rate (random failures), and an increasing hazard rate (wearout))
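The composite behavior behind Figure 1 can be sketched numerically. The following sketch sums three Weibull hazard rates, one per phase; all parameter values are illustrative assumptions, not taken from the guide:

```python
# Sketch of an idealized bathtub hazard rate as the superposition of three
# Weibull hazards, h(t) = (beta/eta) * (t/eta)**(beta - 1).
# All parameter values below are hypothetical, chosen only for illustration.

def weibull_hazard(t, beta, eta):
    """Instantaneous hazard rate of a Weibull distribution at time t (hours)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def bathtub_hazard(t):
    """Composite hazard: infant mortality + constant rate + wear-out."""
    infant = weibull_hazard(t, beta=0.5, eta=2000.0)     # shape < 1: decreasing
    useful = weibull_hazard(t, beta=1.0, eta=100000.0)   # shape = 1: constant (1/eta)
    wearout = weibull_hazard(t, beta=3.0, eta=50000.0)   # shape > 1: increasing
    return infant + useful + wearout

for hours in (10, 1000, 20000, 60000):
    print(f"h({hours} h) = {bathtub_hazard(hours):.2e} per hour")
```

With these assumed parameters the composite rate falls during early operation and rises again late in life, tracing the bathtub shape.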
4.1.3 Statistical distributions

Many different statistical distributions are used in reliability prediction (see Table 1). These statistical distributions are sometimes called life distributions and usually represent the probability that an item is operating at a particular time. Classic reliability, hazard rate, mean life, and other reliability metrics can be calculated from these distributions. A more detailed discussion of these distributions can be found in several textbooks (e.g., Montgomery, D. C. and Runger, G. C., Applied Statistics and Probability for Engineers, and Pecht, M., Nguyen, L. T., and Hakim, E. B., Plastic Encapsulated Microelectronics).
Table 1—Example distributions used in developing reliability predictions
[see Alvarez, M. and Jackson, T., Quantifying the Effects of Commercial Processes on Availability of Small Manned-Spacecraft]

Binomial
  f(x) = C(n;x) p^x q^(n-x), where n is the number of trials, x ranges from 0 to n, p is the probability of success, and q is 1 - p.

Exponential
  f(t) = λ exp(-λt), where λ is the constant failure rate and the inverse of the MTBF. Applies to the middle section of the idealized bathtub curve (constant failure rate).

Gamma
  f(t) = t^α exp(-t/β) / (α! β^(α+1)), where β is the scale parameter and α is the shape parameter.

Lognormal
  f(t) = (2π t^2 σ^2)^(-1/2) exp{-[(ln(t) - μ)/σ]^2 / 2}, where μ is the mean and σ is the standard deviation of ln(t).

Normal
  f(t) = (2π σ^2)^(-1/2) exp{-[(t - μ)/σ]^2 / 2}, where μ is the mean and σ is the standard deviation.

Poisson
  f(x) = (λt)^x exp(-λt) / x!, where x is the number of failures and λ is the constant failure rate. Appropriate distribution for the number of failures from a device population in a time period when the devices have an exponential distribution and are replaced upon failure.

Weibull
  f(t) = (β/η)(t/η)^(β-1) exp(-(t/η)^β), where η is the scale parameter and β is the shape parameter. Infant mortality (shape parameter < 1); wear-out (shape parameter > 1); constant failure rate (shape parameter = 1).
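As a cross-check of Table 1, the Weibull reliability function with shape parameter 1 reduces to the exponential reliability function. A short numerical sketch (the rate and time values here are arbitrary assumptions):

```python
import math

# Reliability functions for two of the life distributions in Table 1.
# Parameter values are illustrative only.

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta) for a Weibull life distribution."""
    return math.exp(-((t / eta) ** beta))

def exponential_reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

lam = 2.0e-6             # assumed constant failure rate, per hour
t = 50000.0              # hours
# With shape beta = 1 and scale eta = 1/lam, the two forms coincide.
r_weibull = weibull_reliability(t, beta=1.0, eta=1.0 / lam)
r_expon = exponential_reliability(t, lam)
print(r_weibull, r_expon)
```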


4.1.4 Measuring reliability

The classical definition of reliability is the probability of providing a specified performance level for a specified duration in a specified environment. This probability is a useful metric for mission-oriented, low-volume products such as spacecraft. However, reliability metrics for most high-volume products measure the reliability of a product population rather than the performance of a single system or a mission. Specifying a single value such as MTBF is not sufficient for a product that exhibits a time-dependent hazard rate (i.e., a non-constant failure rate). In this case, a more appropriate metric is the probability of mission success. This metric may be time dependent, e.g., the probability of mission success may vary depending on the length of the mission, or cycle dependent, e.g., the probability of mission success may vary depending on the number of uses. For one-shot devices, where the mission is a single event such as a warhead detonation, the probability of success will be a single number. Constant rate metrics are discussed in 4.1.4.1. Probability of success metrics are described in 4.1.4.2.

A useful reliability function is the cumulative hazard function, H(t), which can be derived from the equation H(t) = -ln(R(t)). The derivative of the cumulative hazard function is the hazard rate, h(t) (see Pecht [B11]).
4.1.4.1 Constant rate reliability metrics

The hazard rate is the instantaneous rate of failure of the product. When the hazard rate is constant, or independent of time, it is usually designated by the parameter λ. Since

H(t) = ∫0^t h(τ) dτ = λt

for a constant failure rate, the previous equation becomes the familiar R(t) = exp(-λt), the exponential distribution. The constant parameter λ is usually called the constant failure rate, although sometimes the function h(t) is also called the failure rate, and there are many references in the literature to increasing or decreasing failure rates.7

A constant failure rate has many useful properties; one of them is that the mean value of the product's life distribution is 1/λ. This mean value represents the statistically expected length of time until product failure and is commonly called the mean life, or mean-time-before/between-failure (MTBF). Another useful property of the constant failure rate is that it can be estimated from a population as the number of failures divided by time, without having to fit a distribution to failure times. However, it should be noted that the exponential distribution is the only distribution for which the hazard rate is a constant and that the mean life is not 1/h(t) when the hazard rate is not a constant.

MTBF is sometimes misunderstood to be the life of the product rather than an expression of the constant failure rate.8 If a product has an MTBF of 1,000,000 hours, it does not mean that the product will last that long (longer than the average human lifetime). Rather, it means that, on average, one of the products will fail for every 1,000,000 hours of product operation, i.e., if there are 1,000,000 products in the field, one of them will fail in one hour on average. In this case, if product failures are truly exponentially distributed, then 63% of the products will have failed after 1,000,000 hours of operation. Products with truly exponentially distributed failures over their entire lifetime almost never occur in practice, but a constant failure rate and MTBF may be a good approximation of product failure behavior.
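The 63% figure above follows directly from the exponential distribution, since the fraction failed by t = MTBF is F(MTBF) = 1 - exp(-1) ≈ 0.632. A quick numerical check of both statements in the paragraph:

```python
import math

# For an exponential life distribution, the fraction of a population that
# has failed by time t is F(t) = 1 - exp(-t/MTBF).
mtbf = 1_000_000.0  # hours

failed_at_mtbf = 1.0 - math.exp(-mtbf / mtbf)
print(f"Failed by t = MTBF: {failed_at_mtbf:.1%}")

# Expected failures per hour across a fielded population of 1,000,000 units,
# each with constant failure rate 1/MTBF:
population = 1_000_000
print(f"Expected failures per hour: {population / mtbf:.1f}")
```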
7 Since failure rate is so often implicitly interpreted as a constant parameter, the term constant failure rate is used throughout this guide to mean the constant parameter λ of the exponential distribution. The term hazard rate is used whenever the derivative of the hazard function varies with time, e.g., decreasing hazard rate or increasing hazard rate.
8 The use of mean time to failure (MTTF) and MTBF is not standard in either reliability literature or industry practice. In some contexts, MTTF is used for non-repairable items, and MTBF is used for repairable items. In some contexts, either or both MTTF and MTBF are implicitly assumed to imply a constant failure rate. For convenience and to help minimize confusion, in this guide MTTF is used in conjunction with non-repairable items, MTBF is used in conjunction with repairable items, and both are used only in conjunction with a constant failure rate. When the hazard rate is not a constant, the mean value of the reliability distribution is referred to as the mean life rather than the MTBF or MTTF.


If the constant rate is represented by the parameter λ, the mean value of the exponential distribution is 1/λ, as discussed in the preceding subclause. Therefore, constant rate metrics can be described either as a rate or as a mean life, e.g., constant failure rate or MTBF. Constant rate reliability metrics are approximations based on the assumption of the exponential distribution. Predictions for these alternative types of reliability metrics are discussed in 5.2. The term constant rate is used throughout this document to refer to this collection of metrics that includes, but is not limited to, constant failure rate. Constant rate metrics other than constant failure rate are collectively referred to as "non-failure" metrics. Table 2 contains a list of constant rate metrics and their equivalent mean life inverses, along with an indication of when these metrics might be appropriate.
Table 2—Example constant rate reliability metrics
NOTE—Equivalent non-constant versions of these metrics can also be used, e.g., hazard rate in place of constant failure rate.

Constant failure rate
  Mean life equivalent: Mean-Time-Between/Before-Failure (MTBF) or Mean-Time-To-Failure (MTTF).
  Definition: Total failures divided by total population operating time; can be expressed as failures per year (annualized failure rate, AFR), failures per billion hours (FITs), or failures per million hours (FPMH).
  Use: Standard metric for reliability predictions; measure of inherent system hardware reliability.

Constant failure rate using cycles or distance instead of time
  Mean life equivalent: Mean-Cycles/Miles-Between/Before-Failure (MCBF) or MCTF.
  Definition: Total failures divided by total population number of cycles or distance, e.g., miles.
  Use: Standard metric for reliability predictions when usage is more relevant than time. These metrics are sometimes converted to time-based metrics by specifying an operating profile.

Constant return/repair rate
  Mean life equivalent: MTBR (Mean Time Between Return/Repair).
  Definition: Total returns/repairs divided by total population operating time.
  Use: Useful for sizing a repair depot or manufacturing repair line.

Constant replacement rate
  Mean life equivalent: MTBR (Mean Time Between Replacement).
  Definition: Total replacements divided by total population operating time.
  Use: Used as a surrogate for constant failure rate when no failure analysis is available; useful for warranty analysis.

Constant service or customer call rate
  Mean life equivalent: MTBSC (Mean Time Between Service Call).
  Definition: Total service/customer calls divided by total population operating time.
  Use: Customer perception of constant failure rate; useful for sizing support requirements.

Constant warranty claim rate
  Mean life equivalent: MTBWC (Mean Time Between Warranty Claim).
  Definition: Total warranty claims divided by warranted population operating time.
  Use: Useful for pricing warranties and setting warranty reserves.

Constant service interruption rate
  Mean life equivalent: MTBSI (Mean Time Between Service Interruption).
  Definition: Total service interruptions divided by total population operating time.
  Use: Customer perception of constant failure rate; may be an availability metric.

There are several equivalent ways of expressing the constant rate metrics in Table 2. For example, a constant failure rate of 1% per year is equivalent to 0.01 failures per unit per year, 1.1 failures per million hours, 1100 FITs, and 10 failures per 1000 products per year (assuming replacement; 9.95 failures per 1000 products per year without replacement).
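These equivalences can be checked with a few lines of arithmetic (8760 hours per year assumed; the printed values agree with the rounded figures in the text):

```python
import math

HOURS_PER_YEAR = 8760.0

rate_per_year = 0.01                      # 1% per year, per unit
rate_per_hour = rate_per_year / HOURS_PER_YEAR

fpmh = rate_per_hour * 1.0e6              # failures per million hours
fits = rate_per_hour * 1.0e9              # failures per billion hours (FITs)

# Expected failures per 1000 units in one year:
with_replacement = 1000 * rate_per_year                      # failed units replaced
without_replacement = 1000 * (1 - math.exp(-rate_per_year))  # attrition only

print(f"{fpmh:.1f} FPMH, {fits:.0f} FITs")
print(f"{with_replacement:.0f} vs {without_replacement:.2f} failures per 1000 per year")
```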
Constant rate reliability metrics may be used as surrogates for time-dependent metrics by specifying different constant failure rates for different periods of time, such as the infant mortality phase. For example, a constant failure rate goal might be stated as 3λ for the first 3 months of production or operation, 2λ for the next 3 months, and λ after 6 months, where λ is the constant failure rate during the useful life phase. Another approach for replacing a time-dependent metric with a constant failure rate is to determine the expected number of failures for a certain time period and to specify a constant failure rate during that time period. For example, a product that follows the idealized bathtub curve might be expected to have 50 failures in a population of 1,000 in the first year, 30 failures in each of the next 3 years, and 60 failures in the 5th year. This product's reliability may be approximated with an average constant failure rate of 4.6 failures per million hours (4,600 FITs) for 5 years, equivalent to 200 failures in a population of 1,000 in 5 years.
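The averaging in the example above is simply total failures divided by total population operating time (ignoring attrition of failed units, as the text does). A sketch of the arithmetic:

```python
# Failures per year for the hypothetical bathtub-shaped product in the text:
failures_by_year = [50, 30, 30, 30, 60]   # 200 failures total over 5 years
population = 1000
hours_per_unit = len(failures_by_year) * 8760  # 5 years of operation per unit

total_failures = sum(failures_by_year)
avg_rate_per_hour = total_failures / (population * hours_per_unit)

print(f"{avg_rate_per_hour * 1e6:.1f} failures per million hours")
print(f"{avg_rate_per_hour * 1e9:.0f} FITs")
```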
4.1.4.2 Probability of success metrics

For non-exponential life distributions, i.e., when the failure rate is not constant, metrics other than failure rates may be more meaningful. An easier metric to manipulate for non-exponential distributions is classic reliability, i.e., the probability of success. This metric can be a point on one of the reliability distributions shown in Table 1 or can be a convolution of many distributions.

Another way of expressing probability of success is the percentage of a population that survives a specified duration. For this metric, percentiles of the distribution may be used. These percentiles may be stated as a "B" life or an "L" life. For example, an L10 life of 300,000 hours means that 10% of the product population would have experienced a failure by 300,000 hours of operation, and a B50 life of 5 years means that 50% of the product population would have experienced a failure by 5 years of operation. Some metrics may be stated with a confidence level, e.g., R96C90 in the automotive industry means a reliability of 96% with 90% confidence. Table 3 contains a list of probability of success metrics along with their definitions and uses.
Table 3—Example probability of success metrics

Metric: Classic reliability
  Definition: Probability that a product performs a required function under stated conditions for a stated period of time.
  Example use: Classic reliability metric.

Metric: Lx life, e.g., L10 life
  Definition: Time until 10% of a device population will have experienced a failure.
  Example use: Mechanical items, e.g., fans. Electronic components with wearout, e.g., electrolytic capacitors.

Metric: Bx life, e.g., B10 life
  Definition: Same as Lx life.
  Example use: Automobile industry.

Metric: Failure-Free Operating Period (FFOP)
  Definition: An operating period where a product performs a required function under stated conditions without failure.
  Example use: Application where a period of time without failure is required.

Metric: Maintenance-Free Operating Period (MFOP)
  Definition: An operating period where a product performs a required function under stated conditions without a failure or maintenance action.
  Example use: Applications where no failure or maintenance is allowed or possible.

Metric: Mean mission duration
  Definition: Integral of the reliability function R(t) from 0 to the specified design life.
  Example use: Spacecraft.
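The Lx/Bx and mean mission duration metrics above can be computed directly once a life distribution has been chosen. The sketch below assumes a two-parameter Weibull distribution; the shape and scale values are illustrative, not taken from this guide:

```python
import math

def weibull_reliability(t, beta, eta):
    """Weibull survival function R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def lx_life(x_percent, beta, eta):
    """Lx (or Bx) life: time by which x% of the population has failed.
    Solves R(t) = 1 - x/100 for t."""
    return eta * (-math.log(1.0 - x_percent / 100.0)) ** (1.0 / beta)

def mean_mission_duration(design_life, beta, eta, steps=10_000):
    """Integral of R(t) from 0 to the specified design life (trapezoidal rule)."""
    dt = design_life / steps
    return sum(0.5 * (weibull_reliability(i * dt, beta, eta)
                      + weibull_reliability((i + 1) * dt, beta, eta)) * dt
               for i in range(steps))

# Illustrative wearout distribution: shape beta = 2.0, scale eta = 900,000 h.
l10 = lx_life(10, beta=2.0, eta=900_000)  # time by which 10% have failed
```

With these assumed parameters the L10 life is roughly 292,000 hours, i.e., 10% of the population would have experienced a failure by then; setting beta = 1 reduces the model to the constant failure rate (exponential) case.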


4.1.5 Repairable and non-repairable system concepts


A repairable item is one that can be restored to functionality after failure, e.g., a system can be restored by repair or replacement of a component(s) either physically or functionally. Examples of repairable items are cars and computers. A non-repairable item is one that cannot be restored to functionality after failure (or one for which an organization's maintenance policy does not restore it after failure). Examples of non-repairable items are light bulbs and fuses. Note that a repairable item may contain non-repairable lower-level items. For example, if a circuit card assembly is repaired by replacing a resistor that is thrown away, the circuit card assembly is repairable while the resistor is non-repairable.

Repair can affect the reliability prediction. Some redundant systems permit repair of a failed component that restores the component to an operational state while the system continues to operate. Reliability distributions are calculated from multiple failures of the same item in field or test data. If repair of the item does not return it to a "good as new" condition, these multiple failures represent different life distributions, and they cannot be used to estimate the parameters of the original life distribution. When the repaired component has the same statistical life distribution as the original component, it may be considered "good as new."

Examples of techniques for analyzing repairable systems are Poisson processes (homogeneous and non-homogeneous), renewal theory, and Markov analysis. Repairable system concepts are described briefly in Clause 6. If additional information is desired, sources such as Ascher, H., and Feingold, H., Repairable Systems Reliability: Modeling, Inference, Misconceptions and Their Causes may be consulted.
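Under the "good as new" assumption with exponentially distributed times between failures, a repairable item's failure count over an interval follows a homogeneous Poisson process. A minimal simulation sketch, where the rate and horizon are illustrative assumptions rather than values from this guide:

```python
import random

def expected_failures_hpp(failure_rate, horizon, n_runs=20_000, seed=1):
    """Estimate the expected number of failures of a repairable item over
    [0, horizon] when each repair restores it to 'good as new' and times
    between failures are exponential (a homogeneous Poisson process)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        t = rng.expovariate(failure_rate)
        while t <= horizon:
            total += 1
            t += rng.expovariate(failure_rate)
    return total / n_runs

# For a homogeneous Poisson process the exact answer is rate * horizon.
estimate = expected_failures_hpp(failure_rate=0.002, horizon=1000.0)
```

The simulated estimate should be close to the analytical value 0.002 × 1000 = 2 expected failures; a non-homogeneous Poisson process would instead use a time-varying rate.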

4.2 Reliability prediction uses and timing


As stated in 1.2, the purpose of this guide is to assist in the selection and use of reliability prediction methodologies satisfying IEEE Std 1413-1998. However, before selecting a reliability prediction method, the desired uses of the prediction (why), the appropriate time in the system life cycle to perform the prediction (when), the item(s) for which the reliability prediction is to be performed (what), and the factors that should be considered in selecting the appropriate reliability prediction method (how) should be considered. Subclause 4.2.1 describes some of the uses for a reliability prediction to help define why a prediction is being performed. Subclause 4.2.2 explains how reliability predictions fit into the product life cycle, which helps to identify the appropriate time to use a certain reliability prediction method. The factors that should be considered in selecting the appropriate reliability prediction method are discussed in 4.3 and 5.1.

Predictions can be made using data obtained during engineering development phases. Not only are the raw data useful for determining reliability, but the rate of change in reliability can also be used to show additional improvement. Reliability improvement (growth) models available in the literature include the Gompertz, Lloyd-Lipow, Logistic Reliability Growth, Duane, and AMSAA models for non-homogeneous Poisson processes (see Kececioglu, B. D., Reliability Engineering Handbook, Vols. 1 and 2).
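As one illustration of the growth models mentioned above, the Duane postulate states that the cumulative MTBF grows as a power law of cumulative test time, so it can be fitted by least squares on a log-log scale. The data below are idealized values generated from an exact power law, not data from this guide:

```python
import math

def duane_fit(cum_times, cum_failures):
    """Fit the Duane growth model theta_c(t) = b * t**alpha, where theta_c
    is the cumulative MTBF (cumulative time / cumulative failures), by
    least squares on log(theta_c) versus log(t)."""
    xs = [math.log(t) for t in cum_times]
    ys = [math.log(t / n) for t, n in zip(cum_times, cum_failures)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return alpha, math.exp(my - alpha * mx)  # growth slope, scale

# Idealized data generated from theta_c(t) = 2 * t**0.4:
times = [100.0, 300.0, 1000.0, 3000.0]
failures = [t ** 0.6 / 2 for t in times]  # implied cumulative failure counts
alpha, b = duane_fit(times, failures)
```

A positive slope alpha indicates reliability growth; the AMSAA model treats the same power law as the intensity of a non-homogeneous Poisson process and fits it by maximum likelihood instead.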
4.2.1 Reliability prediction uses
Subclause 1.3 of IEEE Std 1413-1998 lists the uses of a reliability prediction. Those uses are also listed below, followed by text that expands upon the usage and describes the associated value.

- Reliability goal assessment: Reliability predictions are used to help assess if the system will satisfy its reliability goals.
- Comparisons of designs and products: Most systems have design implementation options. Trade-offs must be made among the various options, and reliability prediction is an important input in these trade-offs. These options may even affect the system architecture, e.g., the amount and level of redundancy. Since trade-offs must often be made early in the design process, the reliability prediction may be very preliminary. However, it is still useful, since the important information may be the relative reliability and ranking of design choices rather than a precise quantitative value.


- Method to identify potential reliability improvement opportunities: Reliability improvement activities should generally focus on the areas with the greatest opportunity for improvement. A reliability prediction quantifies the opportunity by identifying the relative reliability of various units and by predicting the reliability improvement obtained from a reliability improvement activity.
- Logistics support: Reliability predictions are a key input into spare parts provisioning and calculation of warranty and life cycle costs.
- Reliability and system safety analyses: Design and process analyses such as failure modes, effects and criticality analysis (FMECA), event tree analysis, and fault tree analysis (FTA) may be performed to uncover items associated with reliability or safety related risk.
- Mission reliability estimation: Missions may have multiple phases with different equipment configurations, and system reliability models can be used to predict reliability for the entire mission.
- Prediction of field reliability performance: A reliability prediction provides an estimate of how reliably a product will perform in future field usage based on how it performed in past field usage. This information may impact operational concepts, contingency planning, and other support planning activities.

4.2.2 Reliability predictions in the system life cycle


There are five system life cycle phases defined in IEEE Std 1220-1998 [B4]. These phases are listed below along with the appropriate reliability prediction output of each phase:

a) System Definition Phase—Defines the system requirements, including system interface specifications. In the requirements definition phase, existing reliability data can be used for a reliability prediction based on similarity with existing design(s) or in-service product(s). Although this reliability prediction will only be a rough estimate, it may be useful to help establish reliability goals, make reliability allocations, use in trade-off studies, and/or aid in defining the high-level system architecture. The output of this phase is reliability metrics and goals.

b) Preliminary Design Phase—Defines the subsystem requirements and functional architecture. The preliminary reliability prediction, which is the output of this phase, is based on a well-defined functional description and a somewhat less well-defined physical description of the system.

c) Detailed Design Phase—Completes the design and requirement allocation to the lowest level. The design reliability prediction, which is the output of this phase, is more precise than the earlier ones because it is based on documentation that defines the ready-to-manufacture system, such as design and performance specifications, parts and materials lists, and circuit and layout drawings.

d) Fabrication, Assembly, Integration, and Test (FAIT) Phase—Verifies that the system satisfies its operational requirements, which may require building prototypes or conducting tests. The operational reliability prediction, which is the output of this phase, includes the anticipated effects of the manufacturer's processes on the field reliability of the system.

e) Production/Support Phase—Manufactures, ships, and supports the system in the field, including resolution of any deficiencies. The field reliability prediction, which is the output of this phase, is based on the field reliability data collected, possibly combined with other predictions. When changes are made to the system's design or manufacturing process, the field reliability prediction is updated. An example of a production change that may affect the field reliability prediction is parts obsolescence. Parts that are available in the Detailed Design phase may not be available in the Production/Support phase, and the subsequent change in the Bill of Materials (BOM) may significantly affect the field reliability of the system.

The reliability prediction methodologies that are described in Clause 5 of this guide may be used in any phase of the system life cycle, as long as the required engineering information is available. However, because of the progressive nature of the system life cycle, there may be times when certain reliability prediction methods are preferred due to the type and quality of the available engineering information. For example, the field data necessary for a reliability prediction based on field data usually becomes available in the Production/Support Phase. However, field data from similar in-service systems can be used for reliability predictions earlier in the life cycle. Similarly, the test data necessary for a reliability prediction based on test data


usually becomes available in the FAIT Phase. The engineering information necessary for each type of reliability prediction is explained in more detail in Clause 5.

4.3 Considerations for selecting reliability prediction methods


Subclause 5.1 describes the characteristics and required input data for reliability prediction methods. These characteristics and input data are important criteria for selecting appropriate reliability prediction methods, but there are also other factors that may influence the choice of a reliability prediction method. These additional factors include product technology, consequences of system failure, failure criticality, available resources, and external influences.


- Product technology: Product technology may influence the selection of a reliability prediction method in several ways. If the product technology is similar to that used in previous products, reliability prediction methods that make use of historical data or analyses may be appropriate. If the product technology is new, it may be necessary to develop new models.
- Consequences of system failure: The desired reliability prediction precision is a function of the social or business consequences of a system failure. In general, the higher the risk, the higher the desire for accurate predictions, where risk includes both business risk and social risk. The risks include financial losses caused by delays in certification, fines emanating from regulatory requirements, delay in time-to-market, loss of customer confidence, costs and results of litigation, safety, and information privacy and security. Social risk refers to the potential for human injury or environmental disruption.
- Failure criticality: Failure of an item contained in a system does not necessarily imply system failure. The consequences of each item's failure modes can be variable, ranging from system failure to unnoticeable. The probability of occurrence of each failure mode can also be variable. It may be important to spend more resources evaluating those failure modes with the most severe consequences of failure and/or the highest probability of occurrence.
- Available resources: The choice of reliability prediction method may be affected by available resources, including time, budget, and information. Some reliability prediction methods may require engineering information or data that is unavailable, e.g., historical or test data. Time or budget limitations may prevent necessary data from being gathered. The available personnel's skill levels and familiarity with certain prediction types may influence reliability prediction method selection.
- External influences: External influences may impact the selection of a reliability prediction. An organization may have a specified reliability prediction method used for all products or all products of a certain type. Customers and regulators may dictate the type of reliability prediction method used or may require a precision that can only be obtained by certain methods. In addition, a bias for or against certain types of prediction methods on the part of the customers or development organization may influence reliability prediction method selection. The available information on operating environment and profile may limit the applicable reliability prediction methods. The selection of a reliability metric may also limit the applicable reliability prediction methods, since some methods are useful for only certain types of metrics, e.g., constant failure rate. The engineering information available from a vendor may only support certain types of reliability prediction methods, or a vendor may only have the capability to perform certain types of reliability prediction methods.

5. Reliability prediction methods


This clause provides information for selecting and using reliability prediction methods. Subclause 5.1 discusses the engineering information that should be considered in selecting a reliability prediction method and performing a reliability prediction. Subclauses 5.2 and 5.3 describe the prediction methods that are based on field data and test data, respectively. Subclause 5.4 discusses reliability predictions based on stress and damage models. Subclause 5.5 describes reliability prediction methods based on reliability handbooks. Subclause 5.6 describes the assessment of different reliability prediction methodologies based on IEEE Std 1413-1998.


5.1 Engineering information assessment


The engineering information used in developing a reliability prediction may be collected from many different sources, including the manufacturer's database, the customer's database, an in-house database, or public domain literature and databases. It is necessary to consider the available resources, in terms of time, money, and labor, for collecting this information and using it to develop a reliability prediction with the desired degree of precision. Engineering information includes the following:

- Reliability requirements
- System architecture
- Operating environment
- Operating profile
- Failure modes, mechanisms, and causes

Prior to the selection of a reliability prediction method(s), the type and quality of the engineering information that is available should be examined. Generally, the level of confidence in the engineering information is commensurate with the life cycle maturity of the system for which it applies (see 4.2).

5.1.1 Reliability requirements and goals


Reliability requirements define the specified system reliability, and reliability goals define the desired system reliability. Reliability requirements and goals may be stated either as a single value, e.g., a minimal level of reliability below which the predicted system reliability is unacceptable, or a range of values, e.g., the 3-sigma values for a normal distribution that is centered at a specific time. Different reliability requirements or goals may be defined for different system functions, e.g., when a system is in a high-power mode or a low-power mode. Reliability predictions may be useful in defining reliability requirements and goals, and conversely, reliability requirements and goals may impact the selection of reliability prediction methods.

A single reliability goal may be sufficient for a simple system. However, for a complex system, it is useful to allocate the system-level reliability goal to lower levels, such as the subsystem, assembly, or component level. These lower-level goals provide guidance for design engineers who are responsible for a specific portion of the system, and provide reliability input to supplier specifications. The way in which lower-level goals are allocated depends on the metric. If the metric is classical reliability for a series system, then the product of the reliabilities of lower-level units must be greater than or equal to the system goal. If the metric is constant failure rate, then the sum of the lower-level unit constant failure rates must be less than or equal to the system goal.
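The two allocation rules just described can be checked mechanically. This sketch uses illustrative unit values that are not from this guide:

```python
def series_reliability_meets_goal(unit_reliabilities, system_goal):
    """Classic reliability, series system: the product of lower-level unit
    reliabilities must be greater than or equal to the system goal."""
    product = 1.0
    for r in unit_reliabilities:
        product *= r
    return product, product >= system_goal

def failure_rate_meets_goal(unit_rates, system_goal_rate):
    """Constant failure rate: the sum of lower-level unit failure rates
    must be less than or equal to the system goal."""
    total = sum(unit_rates)
    return total, total <= system_goal_rate

# Illustrative allocations (hypothetical values):
r_sys, r_ok = series_reliability_meets_goal([0.99, 0.995, 0.98], 0.96)
lam, lam_ok = failure_rate_meets_goal([2e-6, 3e-6, 1e-6], 1e-5)  # failures/hour
```

Here the three-unit series reliability is about 0.965, which meets the 0.96 goal, and the summed failure rate of 6e-6 failures/hour is within the 1e-5 budget.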
5.1.2 System architecture
The system architecture consists of both the physical and logical system hierarchy structure. There are several different levels in the system hierarchy structure, such as component, assembly, and subsystem. A component or assembly in one application may be a system in another application. The system hierarchy from IEEE Std 1220-1998 [B4] is used throughout this document:

- System
- Product
- Subsystem
- Assembly
- Component
- Subassembly (optional)
- Subcomponent
- Part


The physical architecture defines the basic system hierarchy and the association of lower-level parts and components with higher-level assemblies and systems. A set of bills of materials (BOM), also called indentured parts lists, is an example of a physical architecture. The level of physical architecture detail that is available may influence the selection of the reliability prediction method. Some reliability prediction methods require only indentured parts lists, while others require the interconnection of the parts as well.

The logical architecture describes the partitioning of physical units into functional units and is often depicted as a functional block diagram. The reliability prediction may include a transformation of the system logical architecture into a reliability model such as a reliability block diagram or a fault tree. This reliability model defines the system success/failure criteria and the level and type of redundancy. It may also identify the failure modes for components and/or assemblies and the effects of those failure modes on system operation. Some systems have missions with multiple phases and different logical architectures for each phase. Therefore, the reliability model should define all the different logical architectures for the system.
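A reliability block diagram of the kind described above can be evaluated by composing series and parallel structures. The block reliabilities below are illustrative assumptions, not values from this guide:

```python
def series(block_reliabilities):
    """Series structure: the system works only if every block works."""
    r = 1.0
    for ri in block_reliabilities:
        r *= ri
    return r

def parallel(block_reliabilities):
    """Parallel (redundant) structure: the system works if any block works."""
    q = 1.0
    for ri in block_reliabilities:
        q *= 1.0 - ri
    return 1.0 - q

# Illustrative RBD: block A in series with a redundant pair of B blocks.
r_system = series([0.99, parallel([0.95, 0.95])])
```

For missions with multiple phases, each phase's logical architecture gets its own diagram, and the phase results are combined over the mission profile.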
5.1.3 Operating environment
The operating environment consists of both the physical environment and the human interaction with the system. The physical environment describes the operating conditions and loads under which the system operates. It includes temperature, humidity, shock and vibration, voltage, radiation, power, contaminants, and so forth. It also includes loads applied in packaging, handling, storage, and transportation. The human interaction with the system includes human interfaces, skill level of the operators, opportunities for corrective and preventive maintenance, and so forth. If a system has a mission with multiple phases, the operating environment for each phase needs to be identified. These data are used to estimate the reliability of the system.
5.1.4 Operating profile

The operating profile (referred to as the mission profile by some industries) is a set of functional requirements that are expected to apply to the system during its operating life. A system may have multiple operating profiles with different functional requirements and operating environments. For example, a reusable spacecraft may have operational phases with different requirements and environments, such as launch, flight, re-entry, and landing. Electronic equipment may have a non-operating profile (such as shipping, transportation, and storage) as part of its operating profile. Sometimes system failures can occur during the transition from one operating profile to another (see Jackson, T., Integration of Sneak Circuit Analysis with FMEA), e.g., a reverse current path that initiates an unwanted function. The reliability prediction should incorporate the operating profile for the system.
5.1.5 Failures, modes, mechanisms, and causes
IEEE Std 1413-1998 specifies the need to identify the failure site, failure mode, failure mechanism, and failure cause(s). The definitions of these terms are given in Clause 3. For every failure mode, there is an initiating failure mechanism or process that is itself initiated by a root cause. In general, the failure mode at one level of the system becomes the failure cause for the next higher level. This bottom-up failure flow process applies all the way to the system level, as illustrated in Figure 2. This is a generalized and simplified picture assuming a linear hierarchical nature of connections between system elements. In reality, the source of a system failure can be at interactions between various system elements.

Failures can be described by their relation to failure precipitation.

- Overstress failure: A failure that arises as a result of a single load (stress) condition. Examples of load conditions that can cause overstress failures are shock, temperature extremes, and electrical overstress.
- Wearout failure: A failure that arises as a result of cumulative load (stress) conditions. Examples of load conditions that cause cumulative damage are temperature cycling, abrasion, and material aging.


- System functional failure: A failure that arises as a result of an anomalous condition of the system output. Examples of anomalous conditions that cause system functional failure are under-voltage input signals, mismatched switching speeds, and sneak paths.

The root cause is the most basic causal factor or factors that, if corrected or removed, will prevent the recurrence of the failure (see ABS Group, Inc., Root Cause Analysis Handbook: A Guide to Effective Incident Investigation). One of the purposes of determining the root cause(s) is to fix the problem at its most basic source so it doesn't occur again, even in other products, as opposed to merely fixing a failure symptom. Identifying root causes is the key to preventing similar occurrences in the future. Another purpose of determining the root cause(s) is to predict the probability of occurrence of the failure. Examples of sources of root causes of failures are given as follows:

- Design process induced failure causes: Such as design rule violations, design errors resulting from overstressed parts, timing faults, reverse current paths, mechanical interference, software coding errors, documentation or procedural errors, and non-tested or latent failures.
- Manufacturing process induced failure causes: Such as workmanship defects caused by manual or automatic assembly or rework operations, test errors, and test equipment faults.
- Environment induced failure causes: Such as excessive operating temperature, humidity, or vibration; external electromagnetic threshold exceeded; and foreign object damage or mishandling damage.
- Operator or maintenance induced failure causes: Such as operator errors, incorrectly calibrated instruments, false system operating status, and maintenance errors or faulty maintenance equipment.

Many methods are used for root cause analysis, including the cause and effect diagram (Ishikawa diagram, or fishbone analysis), failure modes and effects analysis (FMEA), the cause factor chart, fault tree analysis (FTA), and the Pareto chart. Readers are referred to the following references: ABS Group, Inc., Root Cause Analysis Handbook: A Guide to Effective Incident Investigation; Dew, John R., In Search of the Root Cause; Latino, R. L., and Latino, K. C., Root Cause Analysis: Improving Performance for Bottom Line Results; Mobley, R. K., Root Cause Failure Analysis (Plant Engineering Maintenance Series); and Wilson, P. D., Dell, L. D., and Anderson, G. F., Root Cause Analysis: A Tool for Total Quality Management; as well as EIA/JEP131 [B6] and MIL-STD-1629A [B8], for more information on these methods. The reliability prediction process should use these methods for improved product development.
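Of the methods listed above, the Pareto chart is the simplest to sketch in code: rank observed failure causes by frequency and accumulate percentages so that root cause analysis effort can focus on the dominant causes. The cause names below are hypothetical:

```python
from collections import Counter

def pareto_ranking(observed_causes):
    """Return (cause, count, cumulative %) tuples, most frequent first,
    as would be plotted in a Pareto chart."""
    counts = Counter(observed_causes).most_common()
    total = sum(count for _cause, count in counts)
    ranking, cumulative = [], 0
    for cause, count in counts:
        cumulative += count
        ranking.append((cause, count, 100.0 * cumulative / total))
    return ranking

# Hypothetical observed failure causes from field returns:
ranking = pareto_ranking(["solder joint", "connector", "solder joint",
                          "capacitor", "solder joint", "connector"])
```

In this toy data set, half of the failures trace to a single cause, which is where improvement effort would be focused first.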


[Figure 2 depicts the failure process flow through the system hierarchy: at each level (part, component, assembly, subsystem, system), a cause initiates a failure mechanism, which produces a failure mode, and that failure mode becomes the failure cause at the next higher level.]

Figure 2—Failure Process Flow
5.1.6 Engineering information quality


The quality of a reliability prediction is generally dependent on the quality of the available engineering information. It is advantageous to determine the quality of the input data as early as possible in the planning stages of the reliability prediction. The following is a list of characteristics that can be used to evaluate the information quality:

- Accuracy: Representing the data value with error bounds.
- Applicability: Suitability of data for its intended use.
- Completeness: Degree to which all needed attributes are present in the data.
- Consistency: Agreement or logical coherence among data that measure the same quantity.
- Precision: Degree of exactness with which the data or error bounds are stated.
- Relevance: Agreement or logical coherence of the data attributes to the quantity of interest.
- Timeliness: Data item or multiple items that are provided at the time required or specified for the data to be valid.
- Trustworthiness: Degree of confidence in the source of the data.
- Uniqueness: Data values that are constrained to a set of distinct entries where each value is the only one of its kind.



- Utility: Degree to which the data is available in a form that facilitates using it in the manner desired.
- Validity: Conformance of data values to known facts or sound reasoning.
- Verifiability: Degree to which the data can be independently verified.

5.2 Predictions based on field data

Field data represents the actual reliability performance of an item in its actual operational environment. Thus, a reliability prediction based on field data is appropriate for an item already in service (e.g., for logistics planning, warranty reserve, repair department sizing, or future corrective action). Field data is also used when comparing reliability predictions based on test data or analysis with the actual reliability performance of the equipment. The type and quality of field data can range from simple factory ship and return data to sophisticated tracking of installation times, operating hours, and failure times for every unit in service. The ideal data to use for an item's reliability prediction is the field reliability data for that item in the same operating environment. If some information is missing, similar items or similar environments may be found for reliability predictions.

This subclause describes the collection and analysis of field reliability data for reliability predictions. Subclause 5.2.1 describes types of field reliability data, including approximations and adjustments to the data. Subclause 5.2.2 describes field reliability data collection. Field reliability data analysis is the subject of 5.2.3. Subclause 5.2.4 describes how to use field reliability data from existing systems to predict the reliability of new designs. The use of non-failure field reliability data, such as replacements or returns, is described in 5.2.5. Subclause 5.2.6 contains an example of using field reliability data to perform a reliability prediction.
5.2.1 Field reliability data


Reliability predictions based on field data require an estimate of the operating time before failure for failed items and the accumulated operating time for all items that have not failed. This implies that three things are known for each unit: 1) initial operation time, 2) life cycle history and operating profile (along with the operating environment), and 3) failure time (or current time if the item has not failed).9 Field reliability data usually consists of failed units with different failure times intermingled with non-failed units, all of which may have had different installation times.10 In addition, the failures may be due to a number of different causes. This situation is shown in Figure 3. In this figure, time 0 is the starting time for the operation of unit 1. Units 1 and 5 fail at different operating times due to failure cause X. Unit 3 fails due to failure cause Y. Units 2 and 4 are still operating at the current time, although their initial operating times are different. Unit 4 also is in a non-operating condition for a period of time, as shown by the dashed line in the figure.

9 If the aggregate operating time on installed units and the number of failures are the only data available, then an exponential distribution (constant failure rate) is the distribution that must be applied.
10 This type of data is called multiply censored. Units that have not failed are often called suspended items. See Nelson [B10] for more information on censored data.


[Figure 3 plots five units against time: units 1 and 5 fail at different operating times due to failure cause X; unit 3 fails due to failure cause Y; units 2 and 4 are still operating at the current time; a dashed line marks unit 4's non-operating period.]

Figure 3—Example: field reliability data


5.2.1.1 Data approximations
The initial operating time, failure time, operating profile, and failure cause data shown in Figure 3 are critical for accurate predictions but may be unavailable, so approximations for those data may need to be made. Example approximations are shown in Table 4. Approximations for initial operation time, such as installation time or shipment date, are events that occur before initial operation, so estimated delay times may be required to account for the difference between the approximated time and the true time. Approximations for failure time, such as return or replacement time, are events that occur after the failure and may require estimated delay times. When shipment quantities or the number of returns is used as an approximation, shipment or return dates need to be assigned. For example, uniform shipment rates could be assumed (i.e., the same number of items ship every day of the month).

Operating profile approximations may define cyclical operational periods, possibly in different operating environments, or may simply specify equipment operation some percentage of the time, e.g., continuous operation is 100%. Data analysis may provide a statistical distribution for each failure cause or observed failure mode, or may provide only a single statistical distribution. When a single distribution is utilized, the implicit (but usually unstated) assumption is that there is a single failure cause or that a single distribution can adequately represent the observed set of failure modes or underlying set of failure causes.
Table 4—Example approximations for field reliability data

Each desired field reliability data item is listed with its alternative approximate data, the data adjustment required, and notes.

Initial operation time
    Installation time. Adjustment: delay from installation to initial operation. A vendor may know when they deliver and/or install equipment but may not know when the customer actually begins to operate it.
    Shipment date. Adjustment: delay from shipment to initial operation. A vendor may know when they shipped a unit but not when it is installed or initially operated.
    Shipment quantities. Adjustment: estimated shipment dates; delay from shipment to initial operation. A vendor may know the number of units shipped per month but may not track them individually. This type of data should be used only for constant failure rate prediction.

Failure time
    Return or repair data. Adjustment: some returns may not be failures (see 5.2.1.2). If failure analysis is not available, see 5.2.4 for non-failure metric calculations.
    Return or replacement time. Adjustment: delay from failure to return or replacement. A vendor will probably know when the equipment is returned but may not know the time it actually failed.
    Number of returns. Adjustment: estimated return date and delay from failure to return. A vendor may know the number of units returned per month but may not track them individually. This type of data should be used only for constant failure rate (or return/replacement rate) prediction.
    Number of failures or returns for higher-level items. Adjustment: apportion higher-level failures or returns. If failure analysis is not available down to the level of the item being analyzed, it may be necessary to apportion failures or returns at higher levels in the system hierarchy to the system hierarchy level of the item being analyzed.

Operating profile
    Continuous operation. This may be reasonable if equipment is left powered on most of the time.
    Standard operating profile. Businesses may have standard operating hours or there may be standard mission profiles available.
    Duty cycles. Adjustment: apply each duty cycle to the appropriate percent of the population. Electromechanical devices such as printers often have specified duty cycles with different reliability predictions for each duty cycle.

Failure causes (or modes)
    Combine all failure causes. Adjustment: assume a single statistical distribution fits the data. This is reasonable only if a single distribution can statistically represent the combination of all failure causes/modes.

The approximations in Table 4 must be applied with caution. They will cause statistical estimation of distribution parameters to be less precise because the data is approximated rather than actual. For example, the estimated delay times in Table 4 are averages, so the initial operating time will be the installation time plus an average delay rather than the actual delay for each unit. The variability in the input data may be understated by the approximations, so a distribution fit to these data does not exactly represent the field reliability. On the other hand, field reliability is often measured using the same approximations as the predictions.
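The delay-time adjustments described above can be sketched as follows; the delay values, dates, and function names are hypothetical assumptions, not values from this guide:

```python
# Approximating unit timelines from shipment/return records (Table 4 sketch).
# Initial operation time ~ shipment date + average shipment-to-operation delay;
# failure time ~ return date - average failure-to-return delay.
from datetime import date, timedelta

SHIP_TO_OPERATION_DELAY = timedelta(days=14)   # assumed average delay
FAILURE_TO_RETURN_DELAY = timedelta(days=21)   # assumed average delay

def approximate_operating_hours(shipped: date, returned: date) -> float:
    """Approximate operating hours between initial operation and failure."""
    start = shipped + SHIP_TO_OPERATION_DELAY   # approximate initial operation
    fail = returned - FAILURE_TO_RETURN_DELAY   # approximate failure time
    days = (fail - start).days
    return max(days, 0) * 24.0

hours = approximate_operating_hours(date(2002, 1, 7), date(2002, 9, 30))
print(f"approximate operating hours: {hours:.0f}")
```

Because the delays are population averages, the per-unit times carry the understated variability noted above.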


5.2.1.2 Data adjustments


There are a number of adjustments that may have to be made to field reliability data before using it in reliability predictions. If one of the approximations described in the previous subclause is used, there may be adjustments to the initial operating time or to the failure time that need to be made, e.g., shipping delays. If units are shipped to a spare parts inventory rather than an operational system, adjustments may need to be made to shipping, installation time, and quantity. In addition, there are a number of cases for which replacements or failures may be categorized and possibly discounted:

Dead on arrival (DOA): A unit that is unable to function immediately following installation.11 Although DOAs should be analyzed, they are often difficult to include in a reliability distribution because their failure time is 0. Approximations can be made to fit DOAs into the infant mortality period, or different metrics such as DOA rate may be used to account for these types of failures as manufacturing or shipping problems.

Physical or cosmetic damage: A unit that is returned because there is physical or cosmetic damage rather than a functional failure. This may be indicative of a problem with packaging, handling, or transportation. These units might be included in a reliability distribution if the damage causes a functional failure, or they may be accounted for in other metrics.

No failure found (NFF): A unit that is returned but passes all the diagnostic tests. For an NFF it is necessary to determine whether the diagnostic tests are insufficient to reveal the failure, whether a transient or intermittent failure occurred, or whether the unit really is fully operational (and probably should not have been replaced). In the latter case, the unit might not be counted as a failure, and the unit's operational hours could be included in a field reliability calculation as a suspended item (see Annex A). Care should be exercised in removing the item from the failure data since it is often very difficult to diagnose transient or intermittent failures. An NFF can also be treated as a failure in the diagnostics or service manual documentation.

Inability to troubleshoot: An inability to determine the root cause of the failure. If multiple parts are replaced in a single repair action, then care must be exercised to ensure that a single system-level failure does not end up being counted multiple times. For more information and examples, see Gullo, L., "In-Service Reliability Assessment and Top-Down Approach Provides Alternative Reliability Prediction Method."
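A screening step that applies these categories might look like the following sketch; the record fields and decision rules are illustrative assumptions, not prescribed by this guide:

```python
# Screening return records into the categories of 5.2.1.2 (sketch).
# The dict keys and the precedence of the rules are hypothetical.

def classify_return(record: dict) -> str:
    """Classify a returned unit so non-failures are not fit as failures."""
    if record.get("failed_at_install"):
        return "DOA"        # failure time 0; track via DOA rate instead
    if record.get("physical_damage"):
        return "damage"     # possible packaging/handling/transport problem
    if record.get("passed_diagnostics"):
        return "NFF"        # may be counted as a suspended item
    return "failure"        # include in the failure distribution

returns = [
    {"failed_at_install": True},
    {"physical_damage": True},
    {"passed_diagnostics": True},
    {"passed_diagnostics": False},
]
counts = {}
for r in returns:
    cat = classify_return(r)
    counts[cat] = counts.get(cat, 0) + 1
print(counts)
```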

5.2.2 Field data collection

Regardless of the type or use of field data, a field failure tracking and reporting system, along with a field failure database, is essential for providing field data statistics. In addition to the failure reporting, records of initial operating time, operating profile, operating environment, and failure time for each unit should be stored in a database. An example of the type of failure information that needs to be kept is shown in Figure 4. Data on maintenance actions, replacements, and returns should be kept in the Failure Reporting Database to assist in predictions and to aid in corrective action. Replacements include functional restoration (e.g., switching to a backup assembly in a satellite). Returns include detailed failure event data used for diagnostics in lieu of having the failed item to examine. The failure causes in the Failure Reporting Database should be as detailed as possible to allow future design analysis and corrective action as well as reliability predictions. The Failure Reporting Database is often part of a failure reporting and corrective action system (FRACAS). It may also contain inspection and test failure data for analysis or predictions.

11 The definition of DOA varies. For some items, it may be as simple as a power indicator turning on or not. For others, there may be a complex set of tests that must be passed before the unit is declared operational. DOAs may also be extended to cover a time period, e.g., before the warranty starts or before the system is declared ready for customer use. This type of a measurement may also be called out-of-box quality.
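A minimal per-unit record for such a database might look like the following sketch; the field names are illustrative assumptions, and a real FRACAS database would add site, build, environment details, and corrective-action links:

```python
# Minimal failure-reporting record sketch for 5.2.2 (hypothetical fields).
from dataclasses import dataclass
from typing import Optional

@dataclass
class UnitRecord:
    serial_number: str
    initial_operation_hours: float        # when the unit entered service
    operating_profile: str                # e.g., "continuous"
    operating_environment: str            # e.g., "ground benign"
    failure_hours: Optional[float] = None # None => still operating (suspended)
    failure_cause: Optional[str] = None   # as detailed as analysis allows

    @property
    def is_suspension(self) -> bool:
        """Suspended (still-operating) units carry hours but no failure."""
        return self.failure_hours is None

unit = UnitRecord("SN-001", 0.0, "continuous", "ground benign",
                  failure_hours=4200.0, failure_cause="solder joint fatigue")
print(unit.is_suspension)
```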


[Figure: flow chart. Service records/maintenance requests are screened; non-hardware and non-maintenance actions (e.g., administrative, operator comments) are set aside. Hardware maintenance actions are analyzed into non-replacements (e.g., repair, realign, reboot) and replacements; replacements are analyzed into non-returns (e.g., throw-away, lost, inaccessible) and returns; returns are analyzed into possible non-failures (e.g., NFF, damage) and failures; failure analysis yields the failure cause, which is recorded in the Failure Reporting Database.]

Figure 4—Example of field failure reporting database
5.2.3 Analysis of field data

After the field data has been collected, statistical analysis tools may be used to help determine trends and identify problems. The first step in any field reliability data analysis is to plot the failure data. If individual unit operating and failure times are available, the data can be used to determine the appropriate statistical distribution.12 There may be several different statistical distributions that represent different failure causes and modes within a single set of field data. If failure analysis results are available and if sufficient data exists, each failure mode and mechanism should be separately analyzed.13 Annex A contains examples of probability plots, hazard plots, and other methods for plotting data and determining the appropriate statistical distribution.

12 It should never be assumed that the data follows an exponential distribution. Plots and goodness-of-fit type tests can be used to determine if the exponential distribution is appropriate. However, if the units are not individually tracked, the exponential distribution is the only statistical distribution that can be applied. Even when an exponential distribution is assumed, there may be a significant amount of data analysis and adjustment required to accurately plot field reliability data.
13 When analyzing modes and mechanisms separately, units that fail due to other failure modes and mechanisms may be treated as suspended items, i.e., items that are still operating at their time of failure.

It is important to correlate failures with manufacturing builds, process changes, and design changes. Design and process changes are often made to improve reliability.14 Failure data and failure modes and mechanisms can be correlated with:

Process changes: To help identify process-induced failures.

Manufacturing builds: To help identify bad lots, problems with specific date codes, or manufacturing process-induced failures. Note that it may be possible to approximate manufacturing builds by using month of manufacture (or day, week, etc.).

Design changes: To demonstrate the effect on reliability of design modifications.

Operating environment/profile changes: To help identify environment-induced failures or operations/maintenance-induced failures.

Time: To help identify infant mortality and wear out and specific patterns such as seasonal variations.

The correlation may also show that a perceived reliability distribution is really a combination of different reliability distributions. For example, the plot on the left of Figure 5 shows the hazard rate of an entire field population over time. In this plot, it appears that the hazard rate remains constant. The plot on the right shows that this apparent hazard rate behavior is an artifact of the data, and that there actually are three different hazard rate curves for three different populations of about the same size that fail by three different failure modes and mechanisms.

[Figure: two hazard rate plots versus time. The left plot shows the apparently constant hazard rate of the entire population; the right plot shows three distinct hazard rate curves, h1, h2, and h3, one for each subpopulation.]

Figure 5—Example hazard rate based on field data
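The pooling effect illustrated in Figure 5 can be reproduced numerically: the population hazard is the total failure density divided by the total survival, and a mixture of Weibull subpopulations can look roughly constant. The three parameter pairs below are assumed for illustration only:

```python
# Pooled hazard of three equal subpopulations (Figure 5 sketch).
import math

def weibull_sf(t, beta, eta):
    """Weibull survival function."""
    return math.exp(-((t / eta) ** beta))

def weibull_pdf(t, beta, eta):
    """Weibull probability density function (t > 0)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * weibull_sf(t, beta, eta)

# Assumed parameters: infant mortality, random, and wearout modes.
modes = [(0.5, 2000.0), (1.0, 1500.0), (3.0, 3000.0)]

def pooled_hazard(t):
    """Hazard of the equal-weight mixture: mixture pdf / mixture survival."""
    pdf = sum(weibull_pdf(t, b, e) for b, e in modes) / len(modes)
    sf = sum(weibull_sf(t, b, e) for b, e in modes) / len(modes)
    return pdf / sf

for t in (200.0, 1000.0, 2000.0):
    print(f"t={t:6.0f} h  pooled hazard = {pooled_hazard(t):.2e}/h")
```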


5.2.4 Similarity analysis

There are many ways in which a new system may be similar to an in-service system. For example, when the new system is compared with an in-service system, it may have only a minor design change, similar technology, and a similar operating environment. These similarities permit comparisons that can be used to develop a reliability estimate. When there are similarities between new and in-service systems, reliability may be assessed based on field data for the in-service systems, comparisons of items in the new system with similar items in the in-service systems, and other prediction methods for unique items in the new system.
5.2.4.1 Similarity analysis process
The similarity analysis process is shown in Figure 6. There are six steps:

Step 1. Select an in-service item that has similarities with the item of interest. Determine the items that have sufficient similarities with existing items to make them candidates for similarity analysis. This includes an examination of the physical and functional characteristics of the items. The appropriate system hierarchy level at which to make a comparison of new and in-service items is selected in this step. This may vary depending on the item and the engineering information available. A close design and operational similarity will improve reliability prediction accuracy.

14 The reliability distributions derived from a series of reliability improvements can be used for reliability growth modeling.
Step 2. Analyze failure modes, mechanisms, and root causes of new and in-service items. After examining and comparing the engineering information available for the new and in-service products, the next step is to define and compare the failure modes, mechanisms, and root causes for the selected new and in-service items. This information may come from an FMEA or other similar analysis. The level of analysis detail depends on the available engineering information. New item failure modes with low criticality may be aggregated or approximated. The mechanisms and root causes for new item failure modes with high criticality should be examined in detail. Failure modes, mechanisms, and causes that are not similar between the new and in-service items are followed by Step 3, while failure modes/mechanisms/causes that are similar between the new and in-service items are followed by Step 4.

Step 3. Select an appropriate reliability prediction method. For failure modes, mechanisms, and root causes that are not similar between the new and in-service items, similarity analysis does not apply, and one of the other reliability prediction methods described in this guide may be applied.

Step 4. Determine the field reliability prediction of new and in-service items. For the failure modes, mechanisms, and root causes that are similar between the new and in-service items, a field reliability prediction is performed for the in-service item. If the failure modes, mechanisms, and root causes are identical, then the field reliability prediction for the in-service item may be used as the field reliability prediction for the new item. If they are similar but not identical, the field reliability prediction may be adjusted as described in Step 5.

Step 5. Adjust the field reliability prediction based on similarity between new and in-service items. This step distinguishes similarity analysis from other prediction methods and is described in 5.2.4.2.

Step 6. Combine reliability predictions to create the new item reliability prediction. In this step, the reliability predictions from similarity analysis are combined with the reliability predictions from other methods.

[Figure: flow chart of the six steps. (1) Select in-service item that has similarities with new item of interest; (2) analyze failure modes/mechanisms/causes of new and in-service items; for non-similar failure modes/mechanisms/causes, (3) select appropriate reliability prediction method; for similar failure modes/mechanisms/causes, (4) determine field reliability prediction of new and in-service items and (5) adjust field reliability prediction based on similarity between new and in-service items; (6) combine reliability predictions to create new item reliability prediction.]

Figure 6—Similarity analysis process flow


5.2.4.2 In-service item reliability prediction adjustment

The distinguishing characteristic of similarity analysis is adjusting the in-service item reliability prediction to account for the differences between the in-service item and the new item. A failure mode in an in-service item that is eliminated or reduced in frequency in a new item has the effect of reducing the failure probability of the new item. Table 5, which is a modified version of the Generic Failure Modes Table found in the IEC FMEA standard (see IEC 812 [B2]), gives examples of factors that should cause the reliability of a new item to increase in comparison to an in-service item. The reliability should decrease if the factors are opposite to those shown in Table 5.
Table 5—Failure causes and increased reliability

Each failure cause is listed with the characteristic of a new item that should increase reliability in comparison to an in-service item.

    Contamination: Fewer parts, fewer foreign objects, or better processing.
    Mistimed command: Less complexity in timing circuitry or software commands.
    Excessive vibration: Sturdier mounting, fewer parts, or less volatile environment.
    Open (electrical): Lower powered circuitry.
    Short (electrical): Less dense circuit boards.
    Intermittent operation: Less intense or less frequent electrical transients.
    Over-temperature: Less heat-sensitivity, better insulation, or cooler operating environment.
    Excessive temperature cycling: Lower number of temperature cycles.
    Unwanted functions: More testing or greater testability.
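The adjustment in this subclause can be sketched as a per-failure-cause multiplier applied to the in-service item's rates; all causes, rates, and factors below are hypothetical engineering judgments, not values from this guide:

```python
# Similarity-analysis adjustment sketch (5.2.4.2): scale each in-service
# failure-cause rate by a factor for the new item's differences.
# A factor < 1 means the new item should be more reliable for that cause.

in_service_rates = {              # failures/year per cause (hypothetical)
    "contamination": 0.010,
    "excessive vibration": 0.006,
    "short (electrical)": 0.004,
}
adjustment_factors = {            # hypothetical judgments per Table 5
    "contamination": 0.5,         # better processing
    "excessive vibration": 0.8,   # sturdier mounting
    "short (electrical)": 1.2,    # denser board: reliability decreases
}

new_item_rates = {cause: rate * adjustment_factors.get(cause, 1.0)
                  for cause, rate in in_service_rates.items()}
total = sum(new_item_rates.values())
print(f"new item predicted rate: {total:.4f} failures/year")
```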

5.2.5 Reliability prediction for non-failure metrics

The term "non-failure metrics" refers to metrics such as the service call rate or part replacement rate described in 4.1.4.1. These predictions are usually based on field experience. The field data can be used, either directly or in combination with failure distributions derived from the other prediction methods, to predict the metrics. The use of field data and similarity analysis as described in the previous subclause still applies to non-failure metrics. Note that field data may be available for metrics such as warranty return rate, service call rate, or part replacement rate, so it may be possible to use field data for non-failure metrics either directly or via similarity analysis.

The following discussion is an example of predicting some non-failure metrics given a Weibull failure distribution.

Figure 7 shows examples of rate metrics: return rate, warranty claim rate, replacement rate, corrective maintenance rate, and hardware problem call rate. Based on a given failure distribution, an example prediction for each of these metrics is derived as follows. The failure distribution could come from field data or any of the other methods described in this guide.


Return rate: Predicted using the failure distribution and repair depot and logistics data. From repair depot data, the no failure found (NFF) rate for an item is determined to be 50%. However, from logistics data, only 90% of defective items are returned (the others could be lost in transit or damaged and deemed unrepairable), so predicted return rate = 0.9 × predicted hazard rate / (1 − 0.5) = 1.8 × predicted hazard rate. Note that if items were not returned because some customers do not have service contracts or use other service providers, this would need to be factored in.

Warranty claim rate: A prediction for a warranty claim rate depends on how the warranty works. A vendor may choose to warranty all maintenance actions, all replacements, only returns (meaning a replacement has to be returned for warranty credit), or only failures (meaning they charge for returns that are no defect found). A warranty usually covers only a period of time after product shipment or receipt by the customer, e.g., a 1-year warranty. Assume that a warranty is based on returns as shown in Figure 7 and that the warranty period is 1 year. Then the predicted warranty claim rate is the same as the predicted return rate (1.8 × predicted hazard rate from above) for 1 year. If the failure distribution predicted 0.05 failures for the first year, the warranty claim rate would be 0.09 warranty claims (1.8 × 0.05) for the first year, usually quoted as 9% for warranty reserve.

Replacement rate: Predicted using the failure distribution and repair depot data. Based on the no failure found (NFF) rate of 50%, predicted replacement rate = predicted hazard rate / (1 − 0.5) = 2.0 × predicted hazard rate.

Corrective maintenance rate: Predicted using the replacement rate prediction and field support data. From field support data, 70% of corrective maintenance actions are single part replacements, 10% are double part replacements, and 20% are adjustments (no part replacement). Therefore, the ratio of replacements to corrective maintenance actions is 0.9 (1 × 70% + 2 × 10% + 0 × 20%), and predicted corrective maintenance rate = predicted part replacement rate / 0.9 = 2.2 × predicted hazard rate.

Hardware problem call rate: Predicted using the corrective maintenance rate prediction and call center data. Corrective maintenance actions are determined from call center data to be 80% of hardware problem calls. Therefore, predicted hardware problem call rate = predicted corrective maintenance rate / 0.8 = 2.75 × predicted hazard rate.
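The chain of multipliers above can be computed directly. Note that the guide's 2.2 and 2.75 figures use a rounded intermediate (2.0/0.9 ≈ 2.2); the unrounded values are ≈2.22 and ≈2.78. Illustrative numbers only:

```python
# Non-failure metric multipliers from 5.2.5 (illustrative numbers).
hazard_rate = 0.05          # predicted failures in the first year

nff_fraction = 0.5          # no-failure-found rate from repair depot data
return_fraction = 0.9       # fraction of defective items actually returned
replacements_per_cm = 0.9   # 1*70% + 2*10% + 0*20% from field support data
cm_per_hw_call = 0.8        # corrective maintenance share of hardware calls

replacement_rate = hazard_rate / (1 - nff_fraction)        # 2.0 x hazard
return_rate = return_fraction * replacement_rate           # 1.8 x hazard
corrective_maintenance_rate = replacement_rate / replacements_per_cm
hw_problem_call_rate = corrective_maintenance_rate / cm_per_hw_call

for name, value in [("return rate", return_rate),
                    ("replacement rate", replacement_rate),
                    ("corrective maintenance rate", corrective_maintenance_rate),
                    ("hardware problem call rate", hw_problem_call_rate)]:
    print(f"{name:28s} = {value / hazard_rate:.2f} x predicted hazard rate")
```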

[Figure: stacked comparison of the rate metrics. Return rate and warranty claim rate each comprise verified failures plus no failure found. Replacement rate adds throw-away items; corrective maintenance rate further adds adjust/align actions; hardware problem call rate further adds calls that are not a problem, not hardware, etc.]

Figure 7—Example non-failure metrics


Note: Numbers in the preceding examples are for illustrative purposes only.


5.2.6 Example of reliability prediction using field data and similarity analysis

There are many ways to use field data for reliability predictions of similar items and many different ways to adjust and combine reliability predictions (see Gullo, L., "In-Service Reliability Assessment and Top-Down Approach Provides Alternative Reliability Prediction Method"; Johnson, B. G., and Gullo, L., "Improvements in Reliability Assessment and Prediction Methodology"; and Alvarez, M., and Jackson, T., "Quantifying the Effects of Commercial Processes on Availability of Small Manned-Spacecraft"). Constant failure rates for components and subassemblies can simply be summed together, and rates derived by one method can be adjusted by multiplicative factors derived from other methods. The remainder of this subclause provides an example of using field data to adjust predicted constant failure rates.

In this example, an initial reliability prediction from a handbook or other method is updated using field data and similarity analysis. The process for combining constant failure rate prediction methods is shown in Figure 8 (see Elerath, J., Wood, A., Christiansen, D., and Hurst-Hopf, M., "Reliability Management and Engineering in a Commercial Computer Environment"). The steps in the process are as follows:

a) New reliability prediction: This is created using one of the methods described in 5.3 through 5.5.

b) Determine field/prediction factors: These factors are developed from ratios of previous product field reliability to previous product predictions. In this example, the new circuit board's predicted constant failure rate = 12,556 FITs = 0.110 failures/year from a constant failure rate handbook. The latest 6 months of field data for a similar board is 8,760,000 hours and 10 removals = 0.010 failures/year. The similar board's predicted constant failure rate from a handbook was 6,283 FITs (0.055 failures/year); therefore the field/prediction factor = 0.010/0.055 = 0.182.

c) Updated reliability prediction: This is created by combining the information from steps a) and b). The updated reliability prediction is created by multiplying the handbook prediction from step a) by the field/prediction factor from step b) to get 0.110 × 0.182 = 0.020 failures/year. For this product, the prediction did not meet the goal of 0.010 failures/year, so process improvements were defined to improve the reliability.

d) Quantify characteristic differences: Using similarity analysis [steps c) through e)], determine the reliability impact of process changes. The distribution of the failure causes for the previous product is as follows: assembly, 40%; solder, 12%; and components, 48% (distributed between DRAM, microprocessor, ASIC, SRAM, and miscellaneous components). Manufacturing process improvements were defined, and it was estimated that these improvements would reduce assembly errors and solder defects by a factor of 3.

e) Updated reliability prediction: This is created by combining the information from steps c) and d). It is possible to continue this process by updating the prediction with test data and field data.
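The arithmetic in steps b) and c) can be reproduced directly (FITs are failures per 10^9 device-hours; numbers are from the example above and are illustrative only):

```python
# Field/prediction factor example from 5.2.6, steps b) and c).
HOURS_PER_YEAR = 8760.0

def fits_to_failures_per_year(fits: float) -> float:
    """Convert FITs (failures per 1e9 device-hours) to failures/year."""
    return fits * 1e-9 * HOURS_PER_YEAR

new_prediction = fits_to_failures_per_year(12_556)      # ~0.110 failures/yr
similar_prediction = fits_to_failures_per_year(6_283)   # ~0.055 failures/yr
similar_field = 10 / 8_760_000 * HOURS_PER_YEAR         # ~0.010 failures/yr

factor = similar_field / similar_prediction             # ~0.182
updated = new_prediction * factor                       # ~0.020 failures/yr
print(f"field/prediction factor = {factor:.3f}")
print(f"updated prediction      = {updated:.3f} failures/year")
```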

[Figure: process flow. Previous product field data feeds similarity analysis, which drives (2) determine field/prediction factor and (4) quantify characteristic differences; (1) new reliability prediction combined with the field/prediction factor gives (3) updated prediction, which combined with the characteristic differences gives (5) updated prediction.]

Figure 8—Example reliability prediction based on field data


Table 6 shows all the reliability prediction calculations. The new board is assumed to have the same failure cause distribution as the old board, so the predicted constant failure rate is allocated accordingly. The updated reliability prediction is derived by multiplying by the field/prediction factor as described in step c). This reliability prediction is then adjusted to account for the process improvements described in step d). With the process improvement, the new board just meets the goal of 0.01 failures per year.
Table 6—Constant failure rate prediction example

Failure cause      % allocation  Constant      Previous product  Updated      Process      Failure rate
                   per failure   failure rate  field/prediction  reliability  improvement  prediction with
                   cause         per cause     factor            prediction   factor       process improvement
Assembly defects   40%           0.0440        0.182             0.0080       0.33 (3x)    0.0026
Misc. components   22%           0.0242        0.182             0.0044       1.00         0.0044
Solder defects     12%           0.0132        0.182             0.0024       0.33 (3x)    0.0008
DRAM               10%           0.0110        0.182             0.0020       1.00         0.0020
Microprocessor      8%           0.0088        0.182             0.0016       1.00         0.0016
ASIC                4%           0.0044        0.182             0.0008       1.00         0.0008
SRAM                4%           0.0044        0.182             0.0008       1.00         0.0008
Board totals      100%           0.1100        0.182             0.0200       0.50 (2x)    0.0100
Board goal                                                                                 0.01

5.3 Predictions based on test data

The benefits of reliability predictions based on test data are that they include actual equipment operational experience (albeit in a test environment), and that the time required to observe failures can be accelerated to increase the amount of data available. Test data can be used in combination with, or as a validation of, other methods.

One of the most critical aspects of all reliability tests is careful planning. Tests may be constructed so that they either demonstrate a specific reliability at a specific confidence level or generate valid test hours for general data accumulation. Tests are often conducted to determine or demonstrate the reliability at the component, assembly, subsystem, or system level. Reliability test data at lower levels may be combined to infer reliability at the next higher system hierarchy level, if failure results from interaction are negligible. The value of test data depends on how well the test environment can be related to the actual use environment. A test should be conducted in a typical operating environment to include failures from sources such as human intervention, thermal environment, electromagnetic disturbances, and humidity, and to avoid other failures that are not typical of the operating environment.

Subclause 5.3.1 describes some test considerations that apply to all reliability-related tests. Subclauses 5.3.2 and 5.3.3 provide guidance for using data from non-accelerated and accelerated life tests, respectively, in reliability predictions. Subclause 5.3.4 provides an example of merging test data and damage simulation results.
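As one common planning sketch for demonstrating reliability at a confidence level, the zero-failure test length under the exponential model is T = −m·ln(1 − C); this recipe is standard industry practice rather than something prescribed by this guide:

```python
# Zero-failure demonstration test planning (exponential model sketch).
# To demonstrate MTBF m at confidence C with no failures allowed, the
# required unit-hours are T = -m * ln(1 - C) (chi-square with 2 dof).
import math

def zero_failure_test_hours(mtbf_goal: float, confidence: float) -> float:
    """Unit-hours of failure-free testing needed to demonstrate the MTBF."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return -mtbf_goal * math.log(1.0 - confidence)

hours = zero_failure_test_hours(mtbf_goal=10_000.0, confidence=0.90)
print(f"required unit-hours with zero failures: {hours:.0f}")
```

The total can be split across units (e.g., many units tested in parallel for a shorter calendar time), subject to the constant-failure-rate assumption.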


5.3.1 Test data considerations


A structured system for collection and storage of data gathered during any test phase is highly desirable. The database should include test start and stop times and dates, as well as test environmental conditions, transients, transient durations, unit responses, etc. If a failure occurs, the database should also store the results of the root cause analysis and identify corrective actions and design changes. This information is useful when determining whether a failure occurred, whether or not it is chargeable, and the test time and conditions prior to failure. At an absolute minimum, the database should contain individual unit test and failure times. The test time should not be aggregated until the analysis of the data confirms that the failure distribution permits it.
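The record-keeping described above can be sketched as a minimal data structure; the field names below are illustrative, not prescribed by this guide:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TestRecord:
    """One unit's history in a reliability test (illustrative fields only)."""
    unit_id: str
    test_start: str                          # ISO date/time of test start
    test_stop: Optional[str] = None          # ISO date/time of test stop
    environment: str = ""                    # e.g., "25 C, 50% RH"
    transients: List[str] = field(default_factory=list)  # condition excursions
    failed: bool = False
    failure_time_h: Optional[float] = None   # hours into test when failed
    root_cause: Optional[str] = None         # result of root cause analysis
    chargeable: Optional[bool] = None        # counts against reliability?
    corrective_action: Optional[str] = None  # design change, if any

# Example record for a failed unit
r = TestRecord("U001", "2002-06-01T08:00")
r.failed = True
r.failure_time_h = 412.0
r.root_cause = "solder joint fatigue"
r.chargeable = True
```

Keeping individual unit times (rather than pooled hours) in such records preserves the option to fit a failure distribution later.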
Test times may be collected at the assembly level and used to determine failure distributions, hazard rates, or reliability metrics for the constituent elements that comprise the assembly. For example, tests of a computer assembly may be broken down to the capacitor, resistor, microprocessor, driver, etc., and the results put in the hazard rate database for future predictions at the assembly level.
Some failures may be excluded from the results when analyzing the test data. However, exclusion should be done only after rigorous failure analysis of the failed unit under test is completed and the failure cause can truly be ascribed to the test fixture (hardware), test software, or environmental conditions that will not be present in the actual use environment. Some examples that may justify exclusion include:


- Test fixture failure: The test fixture can make the test unit appear to have failed. For example, if a wire or connector in the fixture fails, signals to or from the test unit or power to the test unit may be lost. If the power supply in the fixture loses regulation, the test fixture may subject the unit to voltages outside the design limits, causing the unit to fail.
- Runaway temperature: The unit under test may experience temperatures outside the test limits, or may go into a runaway (high) thermal condition that damages the unit.
- Physical damage due to overstress: If the unit is subjected to an overstress condition, it may stop functioning properly. Shock, excessive vibration, high levels of electrostatic discharge (ESD), and excessive voltage are possible overstress conditions.
- Data recording or storage system out of calibration: If a data recorder is used, it may go out of calibration, causing an erroneous signal that implies the unit under test has failed. If the test unit is connected to a controller using active feedback, the controller may provide incorrect responses that induce failure. Alternatively, the controller itself may be misprogrammed, causing conditions that exceed the test limits of the unit under test and cause failure.

Multiple failures due to the same cause or exhibiting the same mode must not be consolidated. That is, multiple failures due to the same single cause, or exhibiting the same single mode or mechanism, must all be counted as separate individual failures and not counted as a single failure. The (erroneous) rationale for consolidating failures is that there is only one underlying cause, so it should be counted as only one failure. For example, in the testing of Winchester disk drives, thermal asperities are a significant failure mode. If there were 10 thermal asperities in a reliability demonstration test, they should all be counted separately, resulting in 10 failures. They should not be consolidated and counted as only one failure.
5.3.2 Non-accelerated test data

Non-accelerated tests are conducted at nominal load (stress) conditions (e.g., temperature, power, humidity) within the specification bounds. In these tests, there is no attempt to relate the test temperature, humidity, voltage, or other environmental stimulus to an additional stress level that would increase the hazard rate, thereby reducing the test time.


When analyzing test data, it is important to determine a distribution that provides a good fit to the data and is representative of the failure mechanism type. A good fit is important so that the distribution parameters can be used to extrapolate behavior, or predict the reliability, beyond the period over which the data was generated.15 If the data does not fit any of the more commonly used distributions (see 4.1.3), the ability to predict is severely limited.16
Reliability demonstration tests (RDTs) stay within the design specification environments and may be performed on pre-production or regular production level products. Demonstration tests are often done only once, or twice if the first was unsuccessful, before or near the beginning of production. In a reliability demonstration test, a statistical basis is identified and documented in a test plan.

Some publications provide guidance for setting up a time-terminated, failure-terminated, or accelerated life test for various distributions.17 If there is more than one unknown distribution parameter, interpreting the test is very difficult and requires simulation or prior knowledge. An alternative is to simply run a large quantity of systems to accumulate sufficient failures that a distribution can be fit. Then, determine the parameters of the distribution and calculate (predict) the probability of failure for the time interval of interest. In essence, making statements about product reliability for a time interval beyond the test interval is a prediction based on test data.
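Fitting a distribution and extrapolating beyond the test interval can be sketched as follows, here with a lognormal fit obtained from the log-sample statistics; the failure times are invented for illustration:

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_fit(times):
    """Estimate lognormal parameters (log-mean, log-std) from failure times."""
    logs = [math.log(t) for t in times]
    return mean(logs), stdev(logs)

def prob_fail_by(t, mu, sigma):
    """Cumulative probability of failure by time t for the fitted lognormal."""
    return NormalDist(mu, sigma).cdf(math.log(t))

# Invented failure times (hours) from a non-accelerated test run out to 1000 h
times = [120.0, 340.0, 560.0, 610.0, 800.0, 950.0]
mu, sigma = lognormal_fit(times)

# Extrapolation beyond the test interval is a prediction; it is valid only
# if the underlying failure causes and mechanisms remain the same.
p_1000 = prob_fail_by(1000.0, mu, sigma)
p_2000 = prob_fail_by(2000.0, mu, sigma)
```

With censored data (units still running at test end), a simple moment fit like this is biased; maximum-likelihood or hazard-paper techniques would be used instead.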
5.3.2.1 Manufacturing tests

Several types of data from manufacturing processes can be used for predicting reliability of the same or similar units. The underlying failure distribution must be carefully considered when using this data. Burn-in, run-in, and on-going reliability tests (ORT) are often sources for non-accelerated test data.

Generally, burn-in and run-in tests are performed on 100% of the production line. They are essentially screening tests, designed to remove early failures and marginal product from the production line. Run-in tests are most often conducted at the nominal operating temperature, whereas burn-in tests may be conducted at an elevated temperature, making burn-in an accelerated test. If a failure distribution is fitted to the test data, then the probability of surviving time intervals beyond the burn-in or run-in times can be estimated and used in a prediction, assuming that the failure distribution does not change.

ORT is usually conducted in the manufacturing facility, testing units from the current production line. The test assesses whether there are any significant changes in product quality or reliability. ORT will usually divert a fraction (sample) of the production on a periodic basis for testing. The test usually runs longer than either the burn-in or run-in tests, but is still short enough that the product is still considered new. The failure modes being sought should be identified and the test constructed to stress the weaknesses of the product for those modes. If the test is too benign, the test results may be overly optimistic. However, if the test environment is too severe, it may become an accelerated test rather than an ORT. The ORT data collected and accumulated are usually the number of failures and the total number of hours. If time-to-failure data is collected, the data can be used to determine the underlying failure distribution. The environmental conditions during test are usually the same as for product use, so there is no acceleration factor. It is most common to perform ORT at non-accelerated conditions. However, if the test is conducted at elevated temperatures or is otherwise accelerated, acceleration factors may be employed per 5.3.3. Table 7 shows an example of ORT in which a rolling set of 80 units is kept under test.

15 For example, suppose the time to failure is available on 1000 units that were run for 1000 hours each under representative environments and conditions. Further assume that the data is plotted on lognormal hazard paper using the techniques shown in Annex A and that the distribution's parameters, log-mean and log-standard deviation, are calculated. The cumulative probability of failure (unreliability) can be determined for any time interval, even those beyond the 1000-hour test time, if the underlying causes and mechanisms of failure remain the same and it is reasonable to assume the lognormal distribution is still appropriate.

16 Assume the same 1000 units and 1000 hours of test time data is available, as previously discussed. If the distribution is unknown, a cumulative probability of failure can be determined for the first 1000 hours, but because no unit was tested beyond 1000 hours and the distribution is unknown, the expected behavior beyond 1000 hours is unknown. Therefore, estimates of reliability within the first 1000 hours can be made but predictions of failure (probability of failure) beyond 1000 hours cannot. This latter situation severely limits the ability to predict (extrapolate) reliability at a future point in time.

17 If an exponential failure distribution is assumed, the test can be time terminated or failure terminated and based on the chi-squared statistic. The details of this are documented in numerous reliability texts. For these tests, it is important to make sure that the underlying failure distribution really has a constant failure rate and that early-life failures have been resolved prior to starting the test. Attempts to demonstrate very low constant failure rates (high MTBFs) may not be practical with these tests.
Table 7—Example of ORT testing: in this test, 20 samples are swapped with new ones every four weeks, keeping the maximum number under test at eighty

Product taken from                    QTY on test
production in:      Week 1  Week 2  Week 3  Week 4  Week 5  Week 6
Week 1                20      20      20      20      -       -
Week 2                -       20      20      20      20      -
Week 3                -       -       20      20      20      20
Week 4                -       -       -       20      20      20
Week 5                -       -       -       -       20      20
Week 6                -       -       -       -       -       20
Cumulative hours      20      40      60      80      80      80

5.3.2.2 Actual usage tests

Testing units in their actual use environment is an excellent source of reliability data for a prediction. These tests are often conducted by the company that designed the system (alpha tests) or at a friendly customer site (beta tests). In either case, it must be understood that the reliability may be lower than desired if the product has not been released for production. In the computer industry, for example, a manufacturer may build and use its own computers to run its internal e-mail. During this test, failures are tracked, analyzed, and corrected. The data can be used later in calculating a constant failure rate or other reliability metric.

Alpha and beta tests usually occur just before a system is ready to ship. Therefore, alpha and beta test data is probably the most applicable to reliability predictions because it is most representative of the final system. Often, failures found in alpha and beta tests are corrected before full production begins, in which case the failures found in these tests may be excluded from the data used to create a prediction.
5.3.3 Accelerated tests

The purpose of accelerated testing is to verify the life-cycle reliability of the product within a short period of time. Thus, the goal in accelerated testing is to accelerate the damage accumulation rate for relevant wearout failure mechanisms (a relevant failure mechanism is one that is expected to occur under life-cycle conditions). The extent of acceleration, usually termed the acceleration factor, is defined as the ratio of the life under life-cycle conditions to that under the accelerated test conditions. This acceleration factor is needed to quantitatively extrapolate reliability metrics (such as time-to-failure and hazard rates) from the accelerated environment to the usage environment with some reasonable degree of assurance. The acceleration factor depends on the hardware parameters (e.g., material properties, product architecture) of the unit under test (UUT), life-cycle stress conditions, accelerated stress test conditions, and the relevant failure mechanism. Thus, each relevant wearout failure mechanism in the UUT has its own acceleration factor, and the test conditions (e.g., duty cycle, stress levels, stress history, test duration) must be tailored based on these acceleration factors.


Accelerated life tests attempt to compress the time it takes to observe failures. In some cases, it is possible to do this without actually changing the hazard rate. However, if the hazard function changes, it is termed a proportional hazards model. Mathematically, the differences between these two can be seen in the following two equations for a Weibull distribution, in which HAL(t) is the cumulative hazard function for accelerated life, HPH(t) is the cumulative hazard function for the proportional hazards model, AF is an acceleration factor due to some sort of stimulus, and (t/η)^β is the unmodified cumulative hazard rate for a Weibull distribution (t = time, η = characteristic life, and β = shape parameter).

HAL(t) = (AF · t / η)^β

HPH(t) = AF · (t / η)^β

In HAL(t), time is a linear function of the acceleration factor. In HPH(t), the hazard function itself is being modified. By rearranging the equation for HPH(t), it can be seen that time is a non-linear function of the AF; that is, time is multiplied by (AF)^(1/β). The difference between these two types of accelerated tests is that HAL(t) requires knowledge only of the ratio of the actual test time to calendar time (non-accelerated time) caused by the applied environmental stimulus, whereas HPH(t) requires knowledge of the manner in which the AF changes as a function of the parameter β. Both of these are discussed in detail in Leemis [B7]. For the Weibull distribution, of which the exponential is a special case, the resultant distribution for either of these two conditions is still a Weibull distribution.
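A small numeric check of the two Weibull forms above (the values of AF, η, and β are arbitrary): the AL form scales time by AF, while the PH form scales time by AF^(1/β).

```python
def h_al(t, af, eta, beta):
    """Cumulative hazard, accelerated-life form: (AF*t/eta)**beta."""
    return (af * t / eta) ** beta

def h_ph(t, af, eta, beta):
    """Cumulative hazard, proportional-hazards form: AF*(t/eta)**beta."""
    return af * (t / eta) ** beta

af, eta, beta, t = 3.0, 1000.0, 2.0, 100.0

# AL: testing for t/AF hours accumulates the same hazard as t hours
# of unaccelerated time -- time is scaled linearly by AF.
assert abs(h_al(t / af, af, eta, beta) - (t / eta) ** beta) < 1e-12

# PH: the equivalent time scaling is AF**(1/beta), not AF.
t_equiv = af ** (1.0 / beta) * t
assert abs(h_ph(t, af, eta, beta) - (t_equiv / eta) ** beta) < 1e-12
```

Both forms leave the distribution Weibull, as the text notes; only the effective time scale differs.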
The two most common forms of accelerated life testing are 1) eliminating dead-time by compressing the duty cycle and 2) reducing the time-to-failure by increasing the stress levels beyond what is expected in the life cycle. Interested readers are referred to the following references: Nelson, Wayne, Accelerated Testing; Jensen, Finn, Electronic Component Reliability; and Lall, P., Pecht, M., and Hakim, E. B., Influence of Temperature on Microelectronics and System Reliability: A Physics of Failure Approach.

Dead-time elimination is accomplished by compressing the duty cycle. A good example of duty cycle compression is when a test runs for 24 hours per day, whereas the product in its actual use environment runs for only 8 hours per day. This results in a time compression of 3: each day of test time is equal to 3 days of actual use time. Test data analysis must account for failure modes or mechanisms introduced to reduce dead time. For example, a ball bearing designed for intermittent use may fail due to fretting in its actual expected use environment. However, if an accelerated test is developed that uses the bearing 100% of the time, the fretting mode may not be found due to the lack of sufficient corrosion. Furthermore, new failure modes may be introduced due to increased levels of heat created during the test.
Accelerated stress tests of the second type can be run by enhancing a variety of loads, such as thermal loads (e.g., temperature, temperature cycling, and rates of temperature change); chemical loads (e.g., humidity, corrosive chemicals like acids and salt); electrical loads (e.g., steady-state or transient voltage, current, power); and mechanical loads (e.g., quasi-static cyclic mechanical deformations, vibration, and shock/impulse/impact). The accelerated environment may include a combination of these loads. Interpretation of results for combined loads and extrapolation of the results to the life-cycle conditions requires a quantitative understanding of the relative interactions of the different test stresses and the contribution of each stress type to the overall damage. The stress and damage method discussed in 5.4 provides a basis to interpret the test results.
5.3.3.1 Example

There are numerous models relating the accelerated life to steady-state temperature. The Arrhenius relationship is used here for illustrative purposes. The relationship is as follows:


AF = exp[(EA/k)(1/T1 − 1/T2)]

where

AF is the acceleration factor
EA is the activation energy
k is Boltzmann's constant = 8.617 × 10^−5 eV/K
T1 is the base temperature, the temperature of the expected use environment, in K
T2 is the accelerated temperature, which is usually greater than T1, in K

There is no substitute for experimentally determining the activation energy. However, obtaining this information requires conducting at least two sets of tests at different temperatures. In the absence of experimental data, many activation energies are available in the public literature. Be sure to use the approximation that best applies to the anticipated failure mechanism. Table 8 gives examples of other acceleration factor models.
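As a quick numeric sketch of the Arrhenius relationship above (the activation energy and temperatures below are invented for illustration, not recommended values):

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant, eV/K

def arrhenius_af(ea_ev, t_use_k, t_test_k):
    """AF = exp[(EA/k)(1/T1 - 1/T2)], T1 = use and T2 = test temperature (K)."""
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_test_k))

# Assumed EA of 0.7 eV, 55 degC use environment, 125 degC accelerated test
af = arrhenius_af(0.7, 55.0 + 273.15, 125.0 + 273.15)

# The AF converts test time into equivalent use time
test_hours = 1000.0
equivalent_use_hours = af * test_hours
```

Note how sensitive the result is to EA: with these temperatures the AF is on the order of tens, and a small change in the assumed activation energy moves it substantially, which is why experimentally determined values are preferred.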

Table 8—Examples of models that can be used to derive acceleration factors

Coffin-Manson inverse power law:
    AF = Nuse / Ntest = [ΔTtest / ΔTuse]^n
        Simplified acceleration factor for temperature cycling fatigue testing.
    AF = Nuse / Ntest = [Gtest / Guse]^n
        Acceleration factor for vibration based on Grms, for similar product responses. Do not use when product responses are different from one level to the next.

Rudra inverse power law model for conductive filament formation (CFF) failures in printed wiring boards:
    tf = a f (1000 Leff)^n / [V^m (M − Mt)]
        tf = time to failure (hours)
        a = filament formation acceleration factor
        f = multilayer correction factor
        Leff = effective length between conductors (inches)
        V = voltage (DC)
        M = moisture absorbed
        Mt = threshold moisture
        n = geometry acceleration factor
        m = voltage acceleration factor
        (Note: no temperature dependence up to 50 °C; most CFF occurs below 50 °C)

Peck's model for temperature-humidity (Eyring form):
    AF = (Muse / Mtest)^(−n) exp[(Ea/k){(1/Tuse) − (1/Ttest)}]
        AF = acceleration factor
        Muse = moisture level in service
        Mtest = moisture level in test
        Tuse = temperature in service use, K
        Ttest = temperature in test, K
        Ea = activation energy for damage mechanism and material
        k = Boltzmann's constant = 8.617 × 10^−5 eV/K
        n = a material constant
        (For aluminum corrosion, n = 3 and Ea = 0.90)

Kemeny model for accelerated voltage testing:
    Constant failure rate = [exp(C0 − Ea/kTj)] [exp C1(Vcb/Vcbmax)]
        Tj = junction temperature
        Vcb = collector-base voltage
        Vcbmax = maximum collector-base voltage before breakdown
        Ea = activation energy for damage mechanism and material
        k = Boltzmann's constant = 8.617 × 10^−5 eV/K
        C0, C1 = material constants

5.3.3.2 Cautions on accelerated tests

Accelerated testing should begin by identifying all the possible overstress and wearout failure mechanisms under the operating environment. The load parameters that most directly cause the time-dependent failure should be selected as the acceleration parameters. When increasing the stress levels, care should be taken to avoid overstress failure mechanisms and non-representative material behavior.

Failure due to a particular mechanism can be induced by several acceleration parameters. For example, corrosion can be accelerated by both temperature and humidity, and creep can be accelerated by both mechanical stress and temperature. Furthermore, a single acceleration stress can induce failure by several wearout mechanisms simultaneously. For example, temperature can accelerate wearout damage accumulation not only by electromigration, but also by corrosion, creep, and so on. Failure mechanisms that dominate under usual operating conditions may lose their dominance as the stress is elevated. Conversely, failure mechanisms that are dormant under normal use conditions may contribute to device failure under accelerated conditions. Thus, accelerated tests require careful planning in order to represent the actual usage environments and operating conditions without introducing extraneous failure mechanisms or non-representative physical or material behavior.
In order for an accelerated test to be meaningful, it should not only duplicate the failure mechanisms expected under life-cycle conditions, but it should also be possible to estimate the corresponding acceleration factors. Without acceleration factors, there is no basis for estimating the meaning or relevance of the test. The stress and damage method discussed in 5.4 provides the basis for determining acceleration factors, determining the test period and load levels, as well as the relevance of individual failures.

When conducting accelerated testing, stress sensors should be used in key locations (preferably close to expected failure sites) so that the response of the UUT to the test environment can be quantitatively recorded. The same sensor can be used to verify the specimen response at the same location under life-cycle loading conditions. These responses must be quantitatively known, either through sensor-based measurements or computer simulations, in order to obtain an accurate estimate of the acceleration factors. For example, when conducting accelerated vibration tests of circuit card assemblies, accelerometers and strain gages should be used at key locations to measure the in-plane accelerations and out-of-plane flexure, both under accelerated excitation and under life-cycle vibration levels. Furthermore, the spectral content of the life-cycle vibration condition should be approximately preserved during the accelerated stress test. Suitable physics-of-failure models (e.g., fatigue models) should then be used to estimate the acceleration factors at the critical failure sites that experience fatigue failures.

Many wearout failure mechanisms are manifested initially as intermittent functional failures. Therefore, failure monitoring during accelerated testing should be conducted in real time while the product is being stressed. Otherwise, the initiation of failure can often be missed and the results will contain non-conservative errors.
5.3.3.3 Overstress tests

Overstress tests (OST) or highly accelerated life tests (HALT) are conducted during the design process with the intent to stress the product to failure, learn as much as possible about design robustness and weaknesses through rigorous root cause analyses, and then redesign the product to improve resistance to the environments. The purpose of the test is to identify design weaknesses that, due to variability, will eventually show up as failures when larger quantities of the product are used within the design bounds. There is no attempt to predict reliability.

OST and HALT usually include some combination of temperature, vibration, voltage margining, humidity, and on-off power cycling. The initial stress and the order of increase or decrease for the various stress levels (stress profiles) are product dependent. Digital electronics may need a different profile than analog devices or inductive devices such as motors. A common process is to begin well within the design envelope and


cycle temperature from high to low while simultaneously inducing 6-axis vibration. Additionally, at various times within the cycle, electric power is turned off and then back on. The temperature extremes and vibration levels are gradually increased until a failure occurs. This type of test is most effective when conducted on assemblies, not individual electronic parts such as ICs. OST and HALT are sound reliability engineering practices directed at improving product reliability. However, since the relationship between the stress level and the hazard rate is unknown, the time to failure should not be used as a source of data for a prediction. Certainly, the failures from these tests should not be extrapolated back down to operating conditions to determine reliability within the specifications.
5.3.4 Example of reliability assessment from accelerated test data

This subclause provides an example of assessing the durability of chip scale package (CSP) interconnects under quasi-static bending loads caused by keys on a keypad being pressed in a handheld electronic product. The details of this study can be found in Shetty, S., Lehtinen, V., Dasgupta, A., Halkola, V., and Reinikainen, T., Fatigue of Chip-Scale Package Interconnects due to Cyclic Bending; only the key features of the study are summarized here for illustration. The hardware configuration is shown in Figure 9. The life-cycle application condition involves a deflection of 0.1 mm of the printed circuit board (PCB) under the keypad. The loading history is assumed to be a triangular ramp-up and ramp-down over a 1-second period. Taking the life-cycle usage profile (duty cycle) of the equipment into consideration, it is estimated that this bending cycle occurs about 21 times each day.

[Figure 9 shows a cross-sectional view of the CSP assembly: the die and die attach under the molding compound, copper and insulator layers, copper bondpads, a compliant interposer (polyamide), solder interconnects, and the PCB.]

Figure 9—Hardware configuration of the CSP assembly


The accelerated test is intended to verify the life-cycle durability of the solder joints that connect the CSP to
the printed circuit board. The failure mechanism of interest is fatigue of the interconnect solder joints due to
the cyclic bending of the PCB. The test planning, execution and data analysis was done in tandem with stress
and damage analysis in accordance with the caveats described in 5.3.3.1. To accelerate the fatigue of the solder joints, the amplitude of the bending deformation is increased in the test. First, the overstress limits for
bending are determined. The overstress limits can arise either due to instantaneous fracture of the solder
interconnect or due to a failure mechanism other than cyclic solder fatigue (examples include bond-pad
delamination, failures in the CSP, or failures in the PCB), or due to any non-representative material response.
A test specimen, consisting of 33 daisy-chained CSPs per circuit card, was fabricated. All the interconnects
were monitored throughout the test sequence to check for intermittent opens, by using the daisy-chains and
suitable connectors. A 3-point bend test setup was created to apply bending load in a quasi-static manner.
Based on an overstress step-test, it was determined that a maximum displacement of 15 mm could be applied
at the center of the test specimen, over a period of one second. A fatigue bending test was performed and
failure data was recorded. A failure analysis was done to ensure that the failure mode was relevant (solder
joint fatigue). Due to the 3-point bending, each row of CSPs on the specimen experienced a different amount
of exure (measured with strain gages mounted on the specimen) and thus each set of solder joints of the
assembly experienced a different level of fatigue load. The cycles to failure for all the components are normalized with respect to the component at the center which experiences the highest bending curvature . The
normalized bending load (curvature multiplied by half the PWB thickness t/2) versus the cycles to failure


(N) is plotted in Figure 10. The test setup and the life-cycle loading condition were both simulated using finite element models of the hardware, in order to estimate the acceleration factor as a function of the flexure experienced by the circuit card at the site of each CSP. As an example, the acceleration factor for the row of CSPs at the center of the 3-point bend specimen was calculated to be approximately 10. The interconnects for this row lasted for approximately 8300 cycles, which simulates approximately 83,000 cycles (10 years) under life-cycle conditions. Because of the reasonably large sample size tested here, the variability of the measured lifetime could also be determined, as shown in Figure 10. The subscript on N indicates the failure percentile based on the observed variability.
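The extrapolation arithmetic in this example can be checked directly; the 21 cycles per day comes from the duty-cycle estimate given earlier in 5.3.4:

```python
# Center-row CSPs: ~8300 cycles survived in the accelerated test,
# at an estimated acceleration factor of ~10
test_cycles = 8300
acceleration_factor = 10

life_cycle_cycles = test_cycles * acceleration_factor  # ~83,000 field cycles
years = life_cycle_cycles / 21 / 365.0                 # ~21 key presses/day

assert life_cycle_cycles == 83_000
assert 10.0 < years < 11.0  # roughly the 10 years quoted in the study
```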

[Figure 10 is a log-log plot of normalized bending load (curvature × 0.5t, spanning roughly 1.00E−4 to 1.00E−3) versus cycles to failure N. It shows the accelerated test load and the life-cycle load, with N50%, N1%, and N0.1% curves calibrated with experiments; extrapolating to the life-cycle load predicts 83,000 cycles to failure (acceleration factor = 10, approximately 10 years).]

Figure 10—Reliability prediction based on test data

5.4 Reliability predictions based on stress and damage models

The objective of reliability predictions based on stress and damage models is to assess the time to failure and its distribution for a system and its components, by evaluating individual failure sites that can be identified and modeled based on the construction of the system and its anticipated life cycle. However, since simulation techniques continue to improve and models for assessing new and known failure mechanisms continue to be developed, this subclause does not attempt, nor is it intended, to provide a complete reference to all of the models that can be used to conduct a reliability prediction based on stress and damage models. Practitioners of this reliability prediction method need to document the acceptance and validity of the simulation and modeling techniques used and be aware of their limitations.

Reliability predictions based on stress and damage models rely on understanding the modes by which a system will fail, the mechanisms that will induce the identified failures, the loading conditions that can produce the failure mechanisms, and the sites18 that are vulnerable to the failure mechanisms. The methodology makes use of the geometry and material construction of the system, its operational requirements (e.g., electrical connectivity), and the anticipated operating (e.g., internal heat dissipation, voltage, current) and environmental (e.g., ambient temperature, vibration, relative humidity) conditions in the anticipated application. The method may be limited by the availability and accuracy of models for quantifying the time to failure of the system. It may also be limited by the ability to combine the results of multiple failure models for a single failure site and the ability to combine results of the same model for multiple stress conditions. However, there are recognized methods for addressing these issues, and research continues to produce improvements.

18 For example, an interconnection. It is essential to know the failure site for predicting failure with this technique.

In this approach, reliability predictions depend on the development of a representative model (or models) of the system, from which the system's response to anticipated operating and environmental conditions can be assessed. Stresses19 (responses within the structure to the applied conditions) determined from the simulation are then used as inputs to the damage models to quantify the likelihood of failure at individual failure sites. The number of failure mechanisms and sites addressed in this method is governed by the availability of stress and damage models, and the quality of the prediction is governed by the quality of the models and their input parameters. The method can be used for identifying and ranking failure sites, determining the design margins and intrinsic (ideal) durability of the system, developing qualification tests for the system, and determining acceleration transforms between test and use conditions. It is recommended that the method be used in combination with physical testing to verify that the modeling has adequately captured the system failures.
Research into failure mechanisms found in electronic systems is actively pursued, and models exist for the majority of failures (see Dasgupta, A., and Pecht, M., "Failure Mechanisms and Damage Models"; Tummala, R., and Rymaszewski, E., Microelectronics Packaging Handbook; Pecht, M., Integrated Circuit, Hybrid, and Multichip Module Package Design Guidelines; and Upadhyayula, K., and Dasgupta, A., "An Incremental Damage Superposition Approach for Interconnect Reliability Under Combined Accelerated Stresses"). However, it should be recognized that models may not exist for all possible failures, and users of this approach should clearly state that the assessment only covers failures that have been modeled. In many cases, the stress and damage models are combined to form a single model, sometimes referred to as a failure model. The stress input is usually derived for a particular condition to which a system is exposed, using an appropriate stress analysis approach. A review of the sensitivity of the assessment to geometric and material inputs, as well as applied loading conditions, should be conducted, and the limits of each model and its appropriateness for each assessment situation should be considered. Examples of conditions that are known to induce failure include, but are not limited to, a temperature cycle, a sustained temperature exposure, a repetitive dynamic mechanical load, a sustained electrical bias, a sustained humidity exposure, and an exposure to ionic contamination. Examples of damage include exceeding a material strength, reduction in material strength, removal of material, change in material properties, growth of a conductive path, or separation of joined conductors.

The variability of the time to failure can be assessed by considering the distribution of the input data in a systematic approach (e.g., Monte Carlo analysis). Alternatively, one could directly apply known distributions, obtained from test or field results that correspond to each specific failure mechanism, to the results of the failure model. Figure 11 depicts time to failure distributions, and their means, arising from three competing failure sources. A deterministic assessment would consider the single time to failure for each source (i.e., t1, t2, and t3), while a probabilistic assessment would consider the entire time to failure distribution of each source. For the illustration presented in Figure 11, the first failure could be attributed to any of the three failure sources; however, failure source number one is clearly dominant. A failure source is a failure that occurs at a specific site.
19 In this discussion, stress refers to the local condition at the failure site, in response to an applied load. For example, vibration (loading) can produce mechanical stresses at an interconnect, or power cycling (load) can produce temperature transients (stress) at an IC gate.
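The competing-source view of Figure 11 can be illustrated numerically: draw a time to failure for each source and take the earliest draw as the system failure. The sketch below uses hypothetical lognormal parameters chosen for illustration only (they are not taken from Figure 11), and the function and variable names are ours.

```python
import math
import random

random.seed(1)

# Hypothetical (mu, sigma) of ln(time to failure) for three competing sources.
sources = {
    "source 1": (math.log(1500.0), 0.25),
    "source 2": (math.log(4000.0), 0.30),
    "source 3": (math.log(9000.0), 0.35),
}

n = 20_000
first_failure_count = {name: 0 for name in sources}
system_ttf = []

for _ in range(n):
    # Draw a time to failure for each competing source ...
    draws = {name: random.lognormvariate(mu, sigma)
             for name, (mu, sigma) in sources.items()}
    # ... the system fails at the earliest of the three.
    winner = min(draws, key=draws.get)
    first_failure_count[winner] += 1
    system_ttf.append(draws[winner])

for name in sources:
    share = first_failure_count[name] / n
    print(f"{name} causes first failure in {share:.1%} of trials")
print(f"mean system time to first failure: {sum(system_ttf) / n:.0f}")
```

With these illustrative parameters, the source with the earliest median causes the first failure in nearly all trials, mirroring the dominance of failure source number one in Figure 11.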

Failure models may be classified as overstress or wearout. Models for overstress calculate whether failure will occur based on a single exposure to a defined stress condition. For an overstress model, the simplest formulation is a comparison of an induced stress versus the strength of the material that must sustain that stress. Die fracture, popcorning, seal fracture, and electrical overstress are examples of overstress failures. Models for wearout failures calculate an exposure time required to induce failure based on a defined stress condition. Fatigue, crack growth, creep rupture, stress driven diffusive voiding (SDDV), time dependent dielectric breakdown (TDDB), metallization corrosion, and electromigration are examples of wearout mechanisms. In the case of wearout failures, damage is accumulated over a period until the item is no longer able to withstand the applied load. Therefore, an appropriate method for combining multiple conditions must be determined for assessing the time to failure. Sometimes, the damage due to the individual loading conditions may be analyzed separately, and the failure assessment results may be combined in a cumulative manner. Invariably, the prediction based on stress and damage models relies on failure models that are documented with assumptions and limitations and comparisons to experimental and/or field data.
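The overstress/wearout distinction can be stated compactly in code. This sketch is illustrative only: the numeric values and function names are invented, and real overstress and wearout analyses use the material- and mechanism-specific models cited above.

```python
# Overstress: a single exposure either does or does not exceed the
# strength of the material at the failure site.
def overstress_failure(induced_stress_mpa, strength_mpa):
    """Single-exposure criterion: fail if the induced stress exceeds strength."""
    return induced_stress_mpa > strength_mpa

# Wearout: damage accumulates with each exposure until the accumulated
# fraction reaches 1 (cf. Miner's rule for fatigue).
def cycles_to_wearout(damage_per_cycle):
    """Number of exposures needed for accumulated damage to reach failure."""
    return 1.0 / damage_per_cycle

print(overstress_failure(120.0, 90.0))  # stress exceeds strength -> True
print(cycles_to_wearout(2.0e-4))        # exposures until damage reaches 1
```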

Figure 11: Time to failure distributions for three competing failure sources


5.4.1 Stress and damage model reliability prediction method process

A flowchart of the prediction methodology using stress and damage models is depicted in Figure 12. The methodology involves:

- Reviewing the geometry and materials of the system (e.g., component interconnects, board metallization, and solder joints), their distributions, and potential manufacturing flaws and defects;
- Reviewing the environmental and operating loading conditions (e.g., voltage, current, relative humidity, temperature, or vibration, and their variability) to which the system will be subjected, based on an anticipated operational profile for the system;
- Identifying the modes by which the system can fail (e.g., electrical shorts, opens, or parametric shifts resulting from degradation), the locations or sites where failure would occur, and the mechanisms (e.g., fatigue, fracture, corrosion, voiding, wear, or change in material property) that produce the failure.

From all possible failure mechanisms identified for the system and components, only a subset of these failure mechanisms will compete to cause the first failure. The following methods can be used to identify the dominant failure mechanisms:

- Field or test data from the system or similar systems;
- Highly accelerated life tests (HALT);
- Vibration/shock/thermal or other environmental data collected on the actual system or similar systems; and
- Engineering judgment.

The determination of dominant failure mechanisms could be based on a combination of the above methods. Test and field data could show which failure mechanisms occur during testing conditions or actual field conditions. HALT is testing that uses high stress levels to identify what fails first in a system,

although there is no relation to operational life. Environmental data collected on the actual system, and system-level stress analysis, could be used to determine the severity of different environmental stresses.

- Identifying physics-of-failure models (e.g., the Coffin-Manson fatigue model, crack initiation and Paris fatigue crack growth power law models, creep rupture models, Arrhenius, Eyring) for evaluating the time to failure for the identified failure mechanisms;
- Estimating the time to failure and the variation of the time to failure based on distributions in the inputs to the failure models;
- Ranking the susceptibilities of the system to specific failure mechanisms based on an assessment of the times to failure, their variation, and their acceptable confidence level.

Figure 12: Generic process of estimating the reliability of an electronic system


5.4.2 Stress and damage model reliability prediction method example

This subclause presents a simple step-by-step example (following the flowchart presented in Figure 12) of applying the stress and damage model approach to reliability prediction, based on an assumed failure mechanism at a few selected failure sites. Examples of evaluating multiple sites and mechanisms at a system level and ranking the times to failure can be found in Engelmaier, W., "Fatigue Life of Leadless Chip Carrier Solder Joints During Power Cycling," IEEE Transactions on Components, Hybrids, and Manufacturing Technology, and in Dasgupta, A., Oyan, C., Barker, D., and Pecht, M., "Solder Creep-Fatigue Analysis by an Energy-Partitioning Approach."

Step 1: Review the geometry and materials of the system

Consider a circuit card assembly (CCA) composed of a leadless ceramic capacitor (LCC), a leadless ceramic resistor (LCR), and a 68-pin plastic leaded chip carrier (PLCC) on a printed wiring board. The printed wiring board is 1.5 mm thick, has 4 signal layers, and is constructed of FR4. The CCA is expected to operate for 3 years in a relatively benign environment where it will be powered on once a day. The ambient environment temperature is expected to be 22 °C.

This description provides a general discussion of the geometry and material construction of the electronic hardware, its operation, and its anticipated application life cycle. Details of the parts may be obtained from vendor data sheets and related design documentation. For this example, the relevant physical properties of the parts are defined in Table 9. The solder joint height for all parts is 0.1 mm, and the CTE of the board is 17.5 ppm/°C.

Table 9: Part geometry and coefficient of thermal expansion (CTE)

    Part       Length (mm)   Width (mm)   CTE (ppm/°C)
    68 PLCC    24.0          24.0         22
    LCR        6.3           3.2          -
    LCC        3.2           1.6          -

Step 2: Review the load conditions to which the system will be subjected to define its anticipated operational profile

While there are a variety of conditions that can result in failure of the circuit card, this example is restricted to failure arising from temperature cycling. Temperature cycling is of particular concern, since temperature variations in the assembly, combined with differences in rates of thermal expansion, tend to fatigue material interfaces. Since the temperatures of the parts and the printed wiring board will vary with operation, the temperature that each part is likely to reach during its anticipated use must be evaluated. This can be accomplished through simulation or measurement. For this example, the measured daily temperature history of the parts is depicted in Figure 13, and the powered and unpowered part temperatures are provided in Table 10.

Figure 13: Daily temperature history of parts (temperature in °C versus time in minutes over a 1440-minute day, for the LCC, LCR, and PLCC)

Table 10: Part temperatures

    Part                 Operational (°C)   Power-off (°C)
    68 PLCC (24 x 24)    80                 22
    LCR (6.3 x 3.2)      60                 22
    LCC (3.2 x 1.6)      60                 22

Step 3: Identify potential failure modes, sites, and mechanisms based on expected conditions

Methods for identifying potential failure modes, sites, and mechanisms include using global stress analysis (e.g., coarse finite element analysis), accelerated tests to failure (e.g., HALT or elephant tests), or engineering judgment. Failure modes can include opens or shorts in the electrical circuits as well as operational drift. Failure sites may include the individual parts, metallization on the printed wiring board, and part interconnects. Examples of common failures are failure of the capacitor to maintain a charge (see Cunningham, J., Valentin, R., Hillman, C., Dasgupta, A., and Osterman, M., "A Demonstration of Virtual Qualification for the Design of Electronic Hardware"), failure of the semiconductor in the 68-pin PLCC due to electrical opens (see Pecht, M., and Ko, W., "A Corrosion Rate Equation for Microelectronic Die Metallization"), shorting in the PLCC (see Black, J. R., "Physics of Electromigration"), and electrical opens of the solder interconnects (see Engelmaier, W., "Fatigue Life of Leadless Chip Carrier Solder Joints During Power Cycling," and Dasgupta, A., Oyan, C., Barker, D., and Pecht, M., "Solder Creep-Fatigue Analysis by an Energy-Partitioning Approach"). For the purposes of illustration, opens in the circuit at the solder interconnects due to low cycle fatigue will be evaluated.

Step 4: Identify appropriate failure models and their associated inputs based on the identified failure mode, site, and mechanism


There are a variety of failure models that can be used to estimate the time to failure for low cycle fatigue. Two of the most common are the total inelastic strain range (Coffin-Manson) and the energy-partitioning approaches (see Engelmaier, W., "Fatigue Life of Leadless Chip Carrier Solder Joints During Power Cycling," and Dasgupta, A., Oyan, C., Barker, D., and Pecht, M., "Solder Creep-Fatigue Analysis by an Energy-Partitioning Approach").
For this example, the number of cycles to failure under the temperature cycling condition will be evaluated with the Coffin-Manson low cycle fatigue relationship defined in Equation (1):

    N_f = (1/2) * (Δγ / (2 ε_f))^(1/c)                                    (1)

where c and ε_f are material properties of the joint and Δγ is the strain range of the joint under a cyclic loading condition. Assuming that a eutectic tin-lead solder was used to form the interconnect, the damage model properties in Equation (1) are defined as

    ε_f = 0.325

and

    c = -0.442 - 0.0006 T_s + 0.0172 ln(1 + 360/t_d)                      (2)

where T_s is the mean temperature of the solder joint and t_d is the dwell time in minutes (see Engelmaier, W., "Fatigue Life of Leadless Chip Carrier Solder Joints During Power Cycling"). Constants for other materials can be found in engineering handbooks, and test methods for experimentally determining the constants are well established.
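Equations (1) and (2) can be combined into a single routine. This is a sketch under our own naming; the inputs (strain range, mean joint temperature, dwell time) are placeholders to be supplied by the stress analysis of the following steps.

```python
import math

def coffin_manson_cycles(strain_range, t_mean_c, dwell_min):
    """Cycles to failure per Equation (1) for eutectic tin-lead solder.

    strain_range : cyclic shear strain range of the joint (dimensionless)
    t_mean_c     : mean solder joint temperature T_s (degC)
    dwell_min    : dwell time t_d (minutes)
    """
    eps_f = 0.325  # fatigue ductility coefficient for eutectic SnPb solder
    c = -0.442 - 0.0006 * t_mean_c + 0.0172 * math.log(1.0 + 360.0 / dwell_min)
    return 0.5 * (strain_range / (2.0 * eps_f)) ** (1.0 / c)

# Illustrative call: 1.6% strain range, 51 degC mean joint temperature,
# 12-hour dwell (values assumed for demonstration).
print(round(coffin_manson_cycles(0.016, 51.0, 720.0)), "cycles")
```

Because the exponent 1/c is negative, a larger strain range yields fewer cycles to failure, as expected for low cycle fatigue.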
To quantify the time to failure (cycles to failure), it is necessary to evaluate the strain range. The strain range can be evaluated by a number of simulation techniques. Figure 14 provides a simplified representation of the dimensional changes in a part-board assembly under temperature cycling.

Figure 14: Schematic of solder interconnect under temperature cycle


Based on Figure 14, the strain range may be approximated (see Engelmaier, W., "Fatigue Life of Leadless Chip Carrier Solder Joints During Power Cycling") as

    Δγ = F L_d (Δα) ΔT / (2h)                                             (3)

where L_d is half of the length of the package, Δα is the difference in coefficients of thermal expansion (CTE) between the package and the board, ΔT is the temperature difference between the powered-off state and the operational state, h is the distance from the board to the bottom of the package, and F is a model calibration factor.
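Applying Equation (3) to the 68 PLCC with the values from Table 9 and Table 10 gives a deterministic strain range. The calibration factor F is taken as 1.0 here purely for illustration; in practice it must be calibrated.

```python
# Equation (3) for the 68 PLCC (values from Table 9 and Table 10).
F = 1.0                              # calibration factor (assumed 1.0 here)
L_d = 24.0 / 2.0                     # half the package length, mm
delta_alpha = (22.0 - 17.5) * 1e-6   # CTE mismatch, package vs. board, 1/degC
delta_T = 80.0 - 22.0                # operational minus power-off temperature, degC
h = 0.1                              # solder joint height, mm

strain_range = F * L_d * delta_alpha * delta_T / (2.0 * h)
print(f"strain range = {strain_range:.5f}")  # dimensionless shear strain
```

With these inputs the strain range evaluates to roughly 0.0157, which can then be supplied to the fatigue relationship of Equation (1).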
Step 5: Estimate the time to failure (and its variability) using relevant failure models

Using the failure model given in Equation (1), the time to failure is estimated for each package. To determine the variation of the time to failure for the individual failure sites, there are two methods: 1) using the variation of the input parameters to determine the variation of the outputs (e.g., Monte Carlo techniques); and 2) applying variability to the deterministic result using previous knowledge of what statistical distribution represents the variability (e.g., using the calculated time to failure as the mean/median of a lognormal distribution of time to failure). For this case, if the operational temperature varies by 5 °C, the coefficients of thermal expansion of the package and board vary by 1 ppm/°C, and the physical dimensions of the package and interconnect vary by 0.1 mm, then the cycles to failure can be evaluated by randomly sampling the associated input parameters through a Monte Carlo analysis. Using this process, the distribution of time to failure can be simulated. The results of 1000 runs are presented in Table 11.
Table 11: Results of Monte Carlo analysis of 1000 runs

    Part       Time to failure   Time to failure standard   Minimum observed time
               mean (days)       deviation (days)           to failure (days)
    68 PLCC    50 000            1000                       46 000
    LCR        1400              15                         1350
    LCC        8600              600                        7200
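The Monte Carlo procedure of Step 5 can be sketched for a single site as follows. The nominal inputs follow the PLCC numbers above, but the uniform spread assumptions, the 12-hour dwell, and the joint-height tolerance are our own; the run is illustrative only and is not tuned to reproduce Table 11.

```python
import math
import random

random.seed(0)

def cycles_to_failure(L_d, d_alpha, d_T, h, t_mean, dwell_min, F=1.0):
    """Equations (1)-(3) chained: strain range -> Coffin-Manson cycles."""
    strain = F * L_d * d_alpha * d_T / (2.0 * h)
    c = -0.442 - 0.0006 * t_mean + 0.0172 * math.log(1.0 + 360.0 / dwell_min)
    return 0.5 * (strain / (2.0 * 0.325)) ** (1.0 / c)

samples = []
for _ in range(1000):
    d_T = random.uniform(53.0, 63.0)          # 58 degC nominal, +/- 5 degC
    d_alpha = random.uniform(3.5e-6, 5.5e-6)  # 4.5 ppm/degC nominal, +/- 1 ppm/degC
    L_d = random.uniform(11.9, 12.1)          # half-length, +/- 0.1 mm
    h = random.uniform(0.09, 0.11)            # joint height, +/- 0.01 mm (assumed)
    t_mean = (22.0 + (22.0 + d_T)) / 2.0      # mean of power-off and operational temps
    samples.append(cycles_to_failure(L_d, d_alpha, d_T, h, t_mean, 720.0))

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
print(f"mean cycles to failure: {mean:.0f}")
print(f"standard deviation:     {std:.0f}")
print(f"minimum observed:       {min(samples):.0f}")
```

At one temperature cycle per day, cycles to failure translate directly into days to failure.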

In some cases, test data may be sufficient to quantify the distribution of failures that arise for a specific failure mechanism. Assuming the distribution is valid over the range of stresses being evaluated, the distribution may be applied directly to the model results. This provides an alternative to Monte Carlo analysis.
Step 6: More failure models and/or sites

When reviewing the system, the appropriateness of the failure modeling approach used to quantify the likelihood of failure must be evaluated. For certain stress conditions, multiple failure mechanisms may be active. For instance, electromigration of metal, driven by operating voltage and temperature, may cause the integrated circuit in the 68-pin PLCC to fail. Alternatively, the temperature may also accelerate metallization corrosion at the bond pads of the device, driven by the ingress of moisture and the availability of mobile ions. In cases of multiple failure mechanisms, the dominant failure mechanism is the mechanism that is most likely to cause the first failure (i.e., the mechanism with the lowest predicted time to failure) in the system. If there are a few failure mechanisms that have similar predicted times to failure, these mechanisms will compete for failure of the system. This concept of competing failure mechanisms can be addressed probabilistically. In other cases, other stress conditions may produce additional damage to the same site. For


instance, vibration may induce further damage to the solder interconnect presented in this example. In this case, a method for combining the damage induced at the interconnect site by temperature cycling and vibration must be developed. Both Miner's rule (see Miner, M. A., "Cumulative Damage in Fatigue") and the incremental damage superposition method (see Upadhyayula, K., and Dasgupta, A., "An Incremental Damage Superposition Approach for Interconnect Reliability Under Combined Accelerated Stresses") have been used to accomplish this goal. Clearly, the accuracy of the method depends on the ability to identify and model the failures.
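Miner's rule combines the damage fractions linearly: failure is predicted when the sum of applied cycles over cycles-to-failure, across all loading conditions, reaches 1. A sketch with hypothetical vibration numbers alongside the thermal-cycling result of this example:

```python
# Miner's rule for combined thermal-cycling and vibration damage at one
# solder joint. The vibration figures are hypothetical.
n_thermal_per_day = 1.0   # one power (temperature) cycle per day
N_f_thermal = 1400.0      # cycles to failure, temperature cycling alone
n_vib_per_day = 200.0     # vibration stress reversals per day (hypothetical)
N_f_vib = 2.0e6           # cycles to failure, vibration alone (hypothetical)

# Damage accumulated per day; failure is predicted when total damage reaches 1.
daily_damage = n_thermal_per_day / N_f_thermal + n_vib_per_day / N_f_vib
days_to_failure = 1.0 / daily_damage
print(f"days to failure under combined loading: {days_to_failure:.0f}")
```

The combined life (about 1228 days here) is necessarily shorter than the life under either loading alone.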
Step 7: Rank failures based on time to failure and determine the failure site with the minimum time to failure

Ignoring other failure mechanisms for simplicity, the results of the analysis presented in Table 11 indicate that the most likely failure site is the interconnect of the LCR part. Based on the Monte Carlo analysis, the mean time to failure for the system is 1400 days, with the LCR identified as the failure site.
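The ranking step amounts to sorting the sites by predicted life. Using the mean values reported in Table 11:

```python
# Rank failure sites by mean time to failure (means taken from Table 11).
mean_ttf_days = {"68 PLCC": 50000.0, "LCR": 1400.0, "LCC": 8600.0}
ranked = sorted(mean_ttf_days.items(), key=lambda item: item[1])
for part, ttf in ranked:
    print(f"{part}: {ttf:.0f} days")
# The first entry (the LCR interconnect) is the dominant failure site.
```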

5.5 Reliability prediction based on handbooks


Handbook prediction methods are appropriate only for predicting the reliability of electronic and electrical components and systems that exhibit constant failure rates. All handbook prediction methods contain one or more of the following types of prediction:

a) Tables of operating and/or non-operating constant failure rate values arranged by part type;
b) Multiplicative factors for different environmental parameters20 to calculate the operating or non-operating constant failure rate; and
c) Multiplicative factors that are applied to a base operating constant failure rate to obtain the non-operating21 constant failure rate.

Reliability prediction for electronic equipment using handbooks can be traced back to MIL-HDBK-217, published in 1960, which was based on curve fitting a mathematical model to historical field failure data to determine the constant failure rate of parts. Several companies and organizations, such as the Society of Automotive Engineers (SAE) (see SAE G-11 Committee, Aerospace Information Report on Reliability Prediction Methodologies for Electronic Equipment AIR5286), Bell Communications Research (now Telcordia) (see Telcordia Technologies, Special Report SR-332: Reliability Prediction Procedure for Electronic Equipment, Issue 1), the Reliability Analysis Center (RAC) (see Denson, W., A Tutorial: PRISM), the French National Center for Telecommunication Studies (CNET, now France Telecom R&D) (see Union Technique de l'Electricité, Recueil de données de fiabilité: RDF 2000), Siemens AG (see Siemens AG, Siemens Company Standard SN29500), Nippon Telegraph and Telephone Corporation (NTT), and British Telecom (see British Telecom, Handbook of Reliability Data for Components Used in Telecommunication Systems, Issue 4), decided that it was more appropriate to develop their own application-specific prediction handbooks for their products and systems.22 In most cases, they adapted the MIL-HDBK-217 philosophy of curve-fitting field failure data to some model of the form given in Equation (4).
    λ_P = f(λ_G, π_i)                                                     (4)

where λ_P is the calculated constant part failure rate, λ_G is an assumed (generic) constant part failure rate, and π_i is a set of adjustment factors for the assumed constant failure rates. What all of these handbook methods have in common is that they either provide or calculate a constant failure rate. The handbook methods that calculate constant failure rates use one or more multiplicative factors (which may include factors for part quality, temperature, design, environment, and so on) to modify a given constant base failure rate.
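The structure of Equation (4) can be sketched as follows. The base rates and π factors below are invented for illustration; they are not taken from any handbook.

```python
def part_failure_rate(lambda_g, pi_factors):
    """lambda_P = lambda_G multiplied by its adjustment (pi) factors."""
    rate = lambda_g
    for pi in pi_factors:
        rate *= pi
    return rate

# Failures per 10^6 hours; each part gets (for example) quality,
# temperature, and environment factors. All values are hypothetical.
parts = [
    part_failure_rate(0.010, [1.0, 1.8, 2.0]),  # e.g., an IC
    part_failure_rate(0.002, [3.0, 1.2, 2.0]),  # e.g., a resistor
    part_failure_rate(0.005, [1.0, 2.5, 2.0]),  # e.g., a capacitor
]

# Under the constant failure rate assumption, part rates simply add.
system_rate = sum(parts)
print(f"system failure rate: {system_rate:.4f} per 10^6 h")
print(f"system MTBF: {1e6 / system_rate:.0f} h")
```

Summing part rates into a system rate is itself a constant failure rate assumption; as noted in 5.5.1, MIL-HDBK-217 proper only provides results for parts.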
20 Many handbook prediction methods mistakenly name this method "stress modeling." The handbooks do not model the stress at points of failure but only assume values of multiplicative factors to relate to different environmental and operational conditions.
21 Non-operating reliability predictions are made for one of the following conditions: (a) stored prior to operation [see Pecht, J., and Pecht, M., Long-Term Non-Operating Reliability of Electronic Products] or (b) dormant while in a normal operating environment.
22 Several of these handbooks are effectively unavailable, as the sponsoring organizations have stopped updating the handbooks and associated databases.


The constant failure rate models used in some of the handbooks are reportedly obtained by performing a linear regression analysis on the field data.23 The aim of the regression analysis is to quantify the expected theoretical relationship between the constant part failure rate and the independent variables. The first step in the analysis is to examine the correlation matrix for all variables, which shows the correlation between the dependent variable (the constant failure rate) and each independent variable. The independent variables used in the regression analysis include factors such as the device type, package type, screening level, ambient temperature, and the application stresses. The second step is to apply stepwise multiple linear regression to the data, which expresses the constant failure rate as a function of the relevant independent variables and their respective coefficients. The constant failure rate is then calculated using the regression formula and the input parameters.
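The curve-fitting step can be illustrated with a deliberately small example: a simple log-linear regression of failure rate against a single independent variable (ambient temperature), on invented data. Real handbook development used stepwise multiple regression over many factors.

```python
import math

# Invented (temperature degC, failures per 10^6 h) field data points.
data = [(25.0, 0.020), (40.0, 0.035), (55.0, 0.060), (70.0, 0.110)]

# Fit ln(lambda) = a + b * T by ordinary least squares.
xs = [t for t, _ in data]
ys = [math.log(lam) for _, lam in data]
n = len(data)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# The fitted model then supplies a constant failure rate for a given input.
predicted = math.exp(a + b * 60.0)
print(f"predicted constant failure rate at 60 degC: {predicted:.3f} per 10^6 h")
```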


The regression analysis does not eliminate data entries lacking essential information, since the scarcity of data necessitates that all data be utilized. To accommodate such data entries in the regression analysis, a separate "unknown" category may be constructed for each potential factor for which the required information is not available. A regression factor can be calculated for each unknown category, treating it as a unique operational condition. If the coefficient for the unknown category is significantly smaller than that of the next lower category or larger than that of the next higher category, it can be concluded that the factor in question cannot be quantified by the available data and that additional data are required before the factor can be fully evaluated.
A constant failure rate model for the non-operating condition can be extrapolated by eliminating from the handbook prediction models all operation-related stresses, such as temperature rise or electrical stress ratio. However, non-operating components were not included in the field data used to derive the models. Therefore, using handbooks such as MIL-HDBK-217F to calculate constant non-operating failure rates is essentially an extrapolation of the empirical relationship of the source field data beyond the range in which it was gathered.
Some of the concerns regarding the technical assumptions associated with the development of the handbook-based methodology are: the limitation to constant failure rate assumptions (see O'Connor, P. D. T., "Statistics in Quality and Reliability: Lessons from the Past, and Future Opportunities"); the emphasis on steady-state temperature-dependent failure mechanisms (see Hakim, E. B., "Reliability Prediction: Is Arrhenius Erroneous"); factors based on burn-in and screening tests that predetermine superior reliability of ceramic/metal package types over plastic packages (see MIL-HDBK-217F); and the assumption of higher constant failure rates for newer technologies (see Pease, R., "What's All This MIL-HDBK-217 Stuff, Anyhow?"). Users of the handbook methods need to consider these concerns and decide how they affect their reliability predictions.

The handbook prediction methods described in this subclause are MIL-HDBK-217F, SAE's Reliability Prediction Method, Telcordia SR-332, the CNET Reliability Prediction Method, and PRISM.
5.5.1 MIL-HDBK-217F

The MIL-HDBK-217 reliability prediction methodology was developed under the preparing activity of the Rome Air Development Center (now Rome Laboratory). The last version of the methodology was MIL-HDBK-217 Revision F Notice 2, which was released on February 28, 1995. The last issue of this handbook prohibits its use as a requirement. In 2001, the Office of the U.S. Secretary of Defense stated that "... the Defense Standards Improvement Council (DSIC) made a decision several years ago to let MIL-HDBK-217 die a natural death. This is still the current OSD position, i.e., we will not support any updates/revisions to MIL-HDBK-217." (See Desiderio, George, FW: 56/755/NP/ Proposed MIL Std 217 Replacement.)
The stated purpose of MIL-HDBK-217 was to establish and maintain consistent and uniform methods for estimating the inherent reliability (i.e., the reliability of a mature design) of military electronic equipment and systems. The methodology provided a common basis for reliability predictions during acquisition programs for military electronic systems and equipment. It also established a common basis for comparing and evaluating reliability predictions of related competitive designs. The handbook was intended to be used as a tool to increase the reliability of the equipment being designed. (See MIL-HDBK-217F.)

23 It is not known if all the different handbooks use this form of regression analysis.
MIL-HDBK-217 provides two constant failure rate prediction methods: parts count and parts stress. The MIL-HDBK-217F parts stress method provides constant failure rate models for electronic parts based on curve-fitting the empirical data obtained from field operation and test. The models have a constant base failure rate modified by environmental, temperature, electrical stress, quality, and other factors. Both methods use the formulation of Equation (4), but one method assumes there are no modifiers to the generic constant failure rate. The MIL-HDBK-217 methodology only provides results for parts, and not for equipment or systems.

5.5.2 SAE reliability prediction method

The SAE reliability prediction methodology was developed by the Reliability Standards Committee of the Society of Automotive Engineers (SAE) and was implemented through a software package known as PREL. The last version of the software (PREL 5.0) was released in 1990.24 The stated purpose of this methodology was to estimate the number of warranty failures of electronic parts used in automotive applications as a function of the pertinent component, assembly, and design variables. (See SAE G-11 Committee, Aerospace Information Report on Reliability Prediction Methodologies for Electronic Equipment AIR5286.)

The methodology was developed using MIL-HDBK-217 data combined with automotive field data and empirical analyses of automotive data collected by the SAE Reliability Standards Committee. The methodology's database included information on part type, screening level, package type, and location in the vehicle (see SAE G-11 Committee, Aerospace Information Report on Reliability Prediction Methodologies for Electronic Equipment AIR5286), but the actual data sources are kept anonymous. The component constant failure rates were predominantly derived from the warranty records of the participating companies. It was also assumed that all vehicles are operated 400 hours per year when calculating time to failure from warranty return time.
It was reported (see Denson, W., and Priore, M., Automotive Electronic Reliability Prediction) that the modifying factors for the constant failure rates were obtained through regression analysis, followed by model validation through residual analysis, examination of outliers, and examination against (constant) zero failure rates. Some factor values were later modified manually by the developers to account for being intuitively incorrect. This method provides what it calls a "first approximation" of the non-operating effect, with only component type as the independent variable.
5.5.3 CNET reliability prediction method

The development of the CNET reliability prediction methodology was led by the Centre National d'Etudes des Telecommunications (CNET) of France (now France Telecom R&D), which carried out this work in conjunction with the work at the Institut de Sûreté de Fonctionnement. The most recent version of this methodology is RDF 2000, which was released in July 2000.
The RDF 2000 methodology is available from the Union Technique de l'Electricité (UTE) of France as standard UTE C80-810, which targets surface mounted parts. The UTE C80-810 standard has been developed using field failure data for parts operating in ground equipment or employed in commercial aircraft, spread out over the period 1990 to 1998 (1990 to 1992 for the avionics data). The data is extrapolated to cover military, space, and automotive applications. The data is mainly taken from electronic equipment operating in ground, stationary (or fixed); ground, non-stationary; and airborne, inhabited environments (see Kervarrec, G., Monfort, M. L., Riaudel, A., Klimonda, P. Y., Coudrin, J. R., Razavet, D. Le, Boulaire, J. Y., Jeanpierre, P., Perie, D., Meister, R., Casassa, S., Haumont, J. L., and Liagre, A., Universal Reliability Prediction Model for SMD Integrated Circuits Based on Field Failures).
24 SAE has now discontinued the use of PREL due to difficulties in maintaining the database needed for the software.

5.5.4 Telcordia SR-332


Telcordia (previously known as Bellcore) SR-332 is a reliability prediction methodology developed by Bell Communications Research (Bellcore) primarily for telecommunications companies (see Telcordia Technologies, Special Report SR-332). Bellcore, which previously was the telecommunications research arm of the Regional Bell Operating Companies (RBOCs), is now known as Telcordia Technologies. The most recent revision of the methodology is dated May 2001.

The stated purpose of Telcordia SR-332 is to document the recommended methods for predicting device and unit hardware reliability and for predicting serial system hardware reliability (see Telcordia Technologies, Special Report SR-332). The methodology is based on empirical statistical modeling of commercial telecommunication systems whose physical design, manufacture, installation, and reliability assurance practices meet the appropriate Telcordia (or equivalent) generic and system-specific requirements (see Hughes, J. A., Practical Assessment of Current Plastic Encapsulated Microelectronic Devices). In general, Telcordia SR-332 adapts the equations in MIL-HDBK-217 to represent what telecommunications equipment experiences in the field. Results are provided as a constant failure rate, and the handbook provides the upper 90% confidence-level point estimate for the constant failure rate.

5.5.5 PRISM

PRISM is a reliability assessment method developed by the Reliability Analysis Center (RAC) (see Denson, W., Keene, S., and Caroli, J., A New System-Reliability Assessment Methodology; and Reliability Assessment Center, PRISM, Version 1.3). The method is available only as software; the latest version of the software is Version 1.3, released in June 2001.
PRISM combines users' empirical data with the built-in database using Bayesian techniques. In this technique, new data is combined by weighted averaging, but no new regression analysis is performed. PRISM includes some non-part factors such as interface, software, and mechanical problems.
PRISM calculates assembly- and system-level constant failure rates in accordance with similarity analysis (see 5.2.4), which is an assessment method that compares the actual life cycle characteristics of a system with predefined process grading criteria, from which an estimated constant failure rate is obtained. The component models used in PRISM are called RACRates models and are based on historical field data acquired from a variety of sources over time and under various undefined levels of statistical control and verification.25
Unlike the other handbook constant failure rate models, the RACRates models do not have a separate factor for part quality level. Quality level is implicitly accounted for by a method known as process grading. Process grades address factors such as design, manufacturing, part procurement, and system management, which are intended to capture the extent to which measures have been taken to minimize the occurrence of system failures.

25 Motorola has evaluated the software and the data, and Pradeep Lall of Motorola reports that much of the data comes from MIL-HDBK-217.


The main concepts in MIL-HDBK-217 and Telcordia SR-332 are similar, but Telcordia SR-332 also has the ability to incorporate burn-in, field, and laboratory test data, using a Bayesian analysis. For example, Telcordia SR-332 contains a table of the first-year multiplier, which is the predicted ratio of the number of failures of the part in the first year of operation in the field to the number of failures of the part in another one year of (steady-state) operation. This table contains the first-year multiplier for each value of the part's burn-in time in the factory. Here, the part's total burn-in time can be obtained as the sum of the burn-in times at the part, unit, and system level.
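The first-year multiplier lookup described above can be sketched as follows. This is an illustrative sketch only: the multiplier table, part data, and burn-in times below are invented and are not taken from SR-332.

```python
# Hypothetical sketch of applying a Telcordia-style first-year multiplier.
# The table is keyed by total burn-in time (part + unit + system, in hours);
# the multiplier shrinks as burn-in removes early-life failures.
FIRST_YEAR_MULTIPLIER = {0: 4.0, 50: 2.5, 150: 1.5, 500: 1.0}

def first_year_failures(steady_state_failures_per_year, total_burn_in_hours):
    """Predicted first-year failures = multiplier x steady-state yearly failures."""
    # Use the largest tabulated burn-in time not exceeding the actual total.
    key = max(t for t in FIRST_YEAR_MULTIPLIER if t <= total_burn_in_hours)
    return FIRST_YEAR_MULTIPLIER[key] * steady_state_failures_per_year

# Part burn-in of 40 h, unit burn-in of 48 h, system burn-in of 72 h:
print(first_year_failures(2.0, 40 + 48 + 72))  # 1.5 * 2.0 = 3.0
```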


The RACRates models consider separately the following five contributions to the total component constant failure rate: 1) operating conditions, 2) non-operating conditions, 3) temperature cycling, 4) solder joint reliability, and 5) electrical overstress (EOS). It should be noted that the solder joint failures are combined without consideration of the board material or solder material. These five factors are not independent; for example, solder joint failures depend on the temperature cycling parameters. A constant failure rate is calculated for solder joint reliability, although solder joint failures are primarily a wearout failure mechanism due to cyclic fatigue (see Dasgupta, A., Failure Mechanism Models for Cyclic Fatigue).
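Schematically, a RACRates-style component rate is the sum of the five contributions listed above. The sketch below only illustrates that additive structure; the numbers are invented, and the actual models derive each term from stress, duty cycle, and environment inputs.

```python
# Schematic sum of the five RACRates contributions to a component's
# constant failure rate (all values in FITs; numbers are hypothetical).
def component_failure_rate(operating, non_operating, temp_cycling,
                           solder_joint, eos):
    """Total constant failure rate as the sum of the five contributions."""
    return operating + non_operating + temp_cycling + solder_joint + eos

print(component_failure_rate(12.0, 1.5, 4.0, 2.5, 0.5))  # 20.5 FITs
```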
PRISM calculates non-operating constant failure rates with the following assumptions. The daily or seasonal temperature cycling high and low values that are assumed to occur during storage or dormancy represent the major contribution to the non-operating constant failure rate value. The solder joints' contribution to the non-operating constant failure rate value is represented by reducing the internal part temperature rise to zero for each part in the system. Lastly, the contribution of electrical overstress (EOS) or electrostatic discharge (ESD) to the non-operating constant failure rate value is represented by the assumption that the EOS constant failure rate is independent of the duty cycle. This accounts for parts in storage affected by this failure mode due to handling and transportation.
5.5.6 Non-operating constant failure rate predictions

MIL-HDBK-217 did not have specific methods or data related to the non-operational failure of electronic parts and systems. Several different methods were proposed in the 1970s and 1980s to estimate non-operating constant failure rates. The first methods used multiplicative factors applied to the operating constant failure rates obtained using other handbook methods. Reported values of such multiplicative factors are 0.03 or 0.1. The first value, 0.03, was reportedly obtained from an unpublished study of satellite clock failure data from 23 failures. The value of 0.1 is based on a RADC study from 1980 (see Rome Air Development Center, RADC-TR-80-136). RAC followed up these efforts with the RADC-TR-85-91 method (see Rome Air Development Center, RADC-TR-85-91). This method was projected as an equivalent of MIL-HDBK-217 for non-operating conditions and contained the same number of environmental factors and the same types of quality factors as the MIL-HDBK-217 document current at the time of development of the method (see Rooney, J. P., Storage Reliability). Some other non-operating constant failure rate tables from the 1970 to 1980 period include MIRADCOM Report LC-78-1, RADC-TR-73-248, and NONOP-1.26
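The early multiplicative-factor approach amounts to a one-line scaling of the operating rate. A sketch using the two reported factor values (0.1 and 0.03):

```python
# Non-operating constant failure rate estimated as a fixed fraction of
# the operating rate, per the early multiplicative-factor methods above.
def non_operating_rate(operating_rate_fits, factor=0.1):
    """factor = 0.1 (RADC-TR-80-136) or 0.03 (satellite clock study)."""
    return factor * operating_rate_fits

print(non_operating_rate(100.0))               # 10.0 FITs with the 0.1 factor
print(non_operating_rate(100.0, factor=0.03))  # 3.0 FITs with the 0.03 factor
```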
5.5.7 Examples of constant failure rate predictions

This subclause gives examples of handbook reliability predictions; the examples are taken from the Telcordia methods. In the first example, the unit constant failure rate in the Telcordia parts count method is given by the formula:

    λSS = πE Σ(i=1 to n) λGi πQi πSi πTi Ni    (5)

where

λGi  is the generic constant failure rate for the ith device type,
πQi  is the quality factor for the ith device type,
πSi  is the stress factor for the ith device type,
πTi  is the temperature factor for the ith device type,
Ni   is the number of type i devices in the unit,
n    is the total number of device types, and
πE   is the unit environmental factor.

26 The U.S. Army Missile Research and Development Command (MIRADCOM) Report LC-78-1, published in 1978, contains non-operating constant failure rate data by part type along with a 90% upper confidence limit. RADC-TR-73-248, published in 1973, contains non-operating constant failure rates that were developed by Martin-Marietta under a RADC contract. NONOP-1, published in 1987, is based on non-operating field and test data for electronic and mechanical diodes.

5.5.7.1 Example 1: Parts count method

To illustrate the application, an example of a circuit card that consists of three types of components is used. Table 12 gives the assumed generic constant failure rates and the factors associated with each part type, along with the total constant failure rate for the circuit card (the environment factor πE is equal to 1). The predicted constant failure rate for the circuit card using the parts count method is thus 120 FIT.
Table 12 – Example of part count method

Part type i | Number of components of type i | λGi | πQi | πSi | πTi | Total failure rate for type i
1           | 10                             | 1.2 |     |     |     | 60
2           | 15                             | 0.8 |     |     |     | 24
3           | 30                             | 1.5 | 0.8 |     |     | 36
Total       |                                |     |     |     |     | 120
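The summation in Equation (5) is straightforward to mechanize. The sketch below is illustrative only: the part values are hypothetical (one row loosely mirrors the third row of Table 12, with the remaining factors simply set to 1) and it does not reproduce the table's full data.

```python
# Sketch of the Telcordia parts count summation, Equation (5):
#   lambda_SS = pi_E * sum_i(lambda_Gi * pi_Qi * pi_Si * pi_Ti * N_i)
# All part values below are hypothetical.
def parts_count_failure_rate(parts, pi_E=1.0):
    """Unit constant failure rate in FITs for a list of part types.

    Each part type is a dict with keys: N (device count), lam_G (generic
    constant failure rate in FITs), and the pi_Q, pi_S, pi_T factors.
    """
    return pi_E * sum(p["lam_G"] * p["pi_Q"] * p["pi_S"] * p["pi_T"] * p["N"]
                      for p in parts)

card = [
    {"N": 10, "lam_G": 1.2, "pi_Q": 1.0, "pi_S": 1.0, "pi_T": 1.0},
    {"N": 15, "lam_G": 0.8, "pi_Q": 1.0, "pi_S": 1.0, "pi_T": 1.0},
    {"N": 30, "lam_G": 1.5, "pi_Q": 0.8, "pi_S": 1.0, "pi_T": 1.0},
]
print(round(parts_count_failure_rate(card), 6))  # 12 + 12 + 36 = 60.0 FITs
```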

5.5.7.2 Example 2: Combining laboratory data with parts count data

The constant failure rate for combining laboratory data with parts count or parts stress data is given by the formula:

    λSS = πE [ w λG + (1 - w) λLAB ]    (6)

where

λG    is the generic constant failure rate,
λLAB  is the laboratory constant failure rate incorporating the effective burn-in time, if applicable,
πE    is the environment factor, and
w     is the weight assigned to the generic constant failure rate.27

The constant failure rate for combining field tracking data is obtained by replacing λLAB in Equation (6) with λFIELD.
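A minimal sketch of the weighted combination in Equation (6), with invented numbers; the scheme SR-332 uses to set the weight w (see the footnote) is not reproduced here.

```python
# Sketch of the Telcordia weighted combination, Equation (6):
#   lambda_SS = pi_E * (w * lambda_G + (1 - w) * lambda_LAB)
# Passing a field-tracking rate instead of a laboratory rate gives the
# lambda_FIELD variant. All numbers are hypothetical.
def combined_failure_rate(lam_generic, lam_observed, w, pi_E=1.0):
    """Blend the generic rate with a laboratory (or field) rate, in FITs."""
    return pi_E * (w * lam_generic + (1.0 - w) * lam_observed)

# Generic rate 100 FITs, laboratory rate 40 FITs, weight 0.25 on generic:
print(combined_failure_rate(100.0, 40.0, w=0.25))  # 0.25*100 + 0.75*40 = 55.0
```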

27 The weight assigned to the generic failure rate for any part of interest is based on the assumption that the generic failure rate of the part is computed such that 2 failures were observed during the normal operating hours of the part of interest.



5.5.8 An example of non-operating constant failure rate predictions

In this example, dormant constant failure rates are calculated for various types of resistors using RADC-TR-85-91, the MIL-HDBK-217F (Notice 2) parts count method, and PRISM. The following assumptions apply: 1) the environment is ground benign, 2) the duty cycle is 0% in PRISM, and 3) the temperature rise is 0 °C in PRISM. The predicted non-operating constant failure rates in FITs for each method are shown in Table 13. As shown in Table 13, the predicted non-operating constant failure rates vary by large factors amongst the prediction methods. Similarly, the predicted non-operating constant failure rates vary by orders of magnitude amongst component types, especially for PRISM and RADC-TR-85-91.
Table 13 – Predicted non-operating constant failure rate values from handbook methods

Part type                          | PRISM | RADC-TR-85-91 | RAC tool-kit | 10% rule-of-thumb | 3% rule-of-thumb | MIRADCOM LC-78-1 | RADC-TR-73-248
Fixed, carbon composition (RC/RCR) |  2.07 | 0.15          | 1.32         | 0.66              | 0.20             | < 0.06           | 0.07
Fixed, film (RN)                   |  1.10 | 0.24          | 2.22         | 1.11              | 0.33             | 0.02             | 3.00
Fixed, network, film (RZ)          |  2.49 | 1.03          | 0.96         | 0.48              | 0.14             | < 909.10         | N/A
Fixed, wirewound, power (RW)       |  1.63 | 1.37          | 3.90         | 1.95              | 0.59             | 1.49             | 0.50
Fixed, thermistor (RTH)            |  6.75 | 6.48          | 0.84         | 0.42              | 0.13             | 16.90            | 30.00
Variable, wirewound (RT)           | 12.10 | 2.38          | 1.44         | 0.72              | 0.22             | 3.79             | 50.00
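To make the spread in Table 13 concrete, the following sketch computes the ratio between the largest and smallest predicted rates for one part type, using the values from the table itself (the bounded value "< 0.06" from MIRADCOM LC-78-1 is omitted since it is an upper bound rather than a point value):

```python
# Spread of predicted non-operating failure rates (FITs) for fixed
# carbon composition resistors (RC/RCR), taken from Table 13.
rates = {
    "PRISM": 2.07,
    "RADC-TR-85-91": 0.15,
    "RAC tool-kit": 1.32,
    "10% rule-of-thumb": 0.66,
    "3% rule-of-thumb": 0.20,
    "RADC-TR-73-248": 0.07,
}
spread = max(rates.values()) / min(rates.values())
print(round(spread, 1))  # 2.07 / 0.07, a factor of about 29.6 between methods
```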

5.5.9 Comparison of handbook methods for predicting operating reliability

Handbook prediction methodologies have been extensively studied and compared, and results are readily available in the literature (see SAE G-11 Committee, Aerospace Information Report on Reliability Prediction Methodologies for Electronic Equipment AIR5286; Kervarrec, G., Monfort, M. L., Riaudel, A., Klimonda, P. Y., Coudrin, J. R., Razavet, D. Le, Boulaire, J. Y., Jeanpierre, P., Perie, D., Meister, R., Casassa, S., Haumont, J. L., and Liagre, A., Universal Reliability Prediction Model for SMD Integrated Circuits Based on Field Failures; Bowles, J. B., A Survey of Reliability-Prediction Procedures for Microelectronic Devices; Jones, J. and Hayes, J., A Comparison of Electronic-Reliability Prediction Models; Pecht, M. and Nash, F., Predicting the Reliability of Electronic Equipment; Cushing, M. J., Mortin, D. E., Stadterman, T. J., and Malhotra, A., Comparison of Electronics-Reliability Assessment Approaches; Leonard, C. T., How Failure Prediction Methodology Affects Electronic Equipment Design; O'Connor, P. D. T., Reliability Prediction: A State-Of-The-Art Review; O'Connor, P. D. T., Undue Faith in US MIL-HDBK-217 for Reliability Prediction; and O'Connor, P. D. T., Reliability Prediction: Help or Hoax). The handbook reliability prediction methods described above are compared with other reliability prediction methods in 5.6 with respect to the IEEE 1413 criteria.

5.6 Assessment of reliability prediction methodologies based on IEEE 1413 criteria

Reliability prediction method selection should be based on how well the prediction satisfies the user's objectives. IEEE Std 1413-1998 was developed to identify the key required elements for an understandable and credible reliability prediction, and to provide its users with sufficient information to evaluate prediction
methodologies and to effectively use their results. A prediction made according to this standard includes sufficient information regarding the inputs, assumptions, and uncertainties associated with the methodology used to make the prediction, enabling the user to understand the risks associated with the methodology. IEEE Std 1413-1998 was formulated to enable industry to capitalize on the positive aspects of the available prediction methodologies and to benefit from the flexibility of using various methodologies, as appropriate, during equipment development and use.
5.6.1 IEEE 1413 compliance

IEEE Std 1413-1998 identifies the framework for the reliability prediction process for electronic systems (products) and equipment. Since the reasons for performing a reliability prediction vary (e.g., feasibility evaluation, comparing competing designs, spares provisioning, safety analysis, warranties, and cost assessment), a clear statement of the intended use of prediction results obtained from an IEEE 1413-compliant method is required to be included with the final prediction report. Thus, an IEEE 1413-compliant reliability prediction report must include:

- Reasons why the reliability predictions were performed
- The intended use of the reliability prediction results
- Cautions as to how the reliability prediction results must not be used
- Where precautions are necessary

An IEEE 1413-compliant reliability prediction report should also identify the method used for the prediction and identify the approach, rationale, and references to where the method is documented. In addition, the prediction report should include:

- Definition of failures and failure criteria
- Predicted failure modes
- Predicted failure mechanisms
- Description of the process to develop the prediction
- Assumptions made in the assessment
- Methods and models
- Source of data
- Required prediction format
- Prediction metrics
- Confidence level

Further, IEEE Std 1413-1998 specifically identifies inputs that must be addressed with respect to the extent to which they are known (and can be verified) or unknown for a prediction to be conducted. These include, but are not limited to, usage, environment, lifetime, temperature, shock and vibration, airborne contaminants, humidity, voltage, radiation, power, packaging, handling, transportation, storage, manufacturing, duty cycles, maintenance, prediction metrics, confidence levels, design criteria, derating, material selection, design of printed circuit boards, box and system design parameters, previous reliability data and experience, and limitations of the inputs and other assumptions in the prediction method.

Besides prediction outputs, the prediction results should also contain conclusions, recommendations, system figures of merit, and confidence levels. The report should indicate how the conclusions follow from the outputs and justify the recommendations, where the recommendations are stated in terms of specific engineering and logistic support actions. Since the uncertainty (or the confidence level) is affected by the assumptions regarding the model inputs, the limitations of the model, and the repeatability of the prediction, these should be presented with the reliability prediction results and included in the report.



In summary, a reliability prediction report complying with IEEE Std 1413-1998 provides documentation of the prediction results, the intended use of prediction results, the method(s) used for the prediction, a list of inputs required to conduct the prediction, the extent to which each input is known, sources of known input data, assumptions used for unknown input data, figures of merit, confidence in the prediction, sources of uncertainty in the prediction results, limitations of the results, and a measure of the repeatability of the prediction results. Thus, any reliability prediction methodology can comply with IEEE Std 1413-1998.

In order to assist the user in the selection and use of a particular reliability prediction methodology complying with IEEE Std 1413-1998, a list of criteria is provided. The criteria consist of questions that concern the inputs, assumptions, and uncertainties associated with each methodology, enabling the risk associated with the methodology to be identified. Table 14 provides the assessment of various reliability prediction methodologies according to the IEEE 1413 criteria (the first eleven questions in the table). Other considerations may be included when selecting the methodology, and these are also included in Table 14 following the IEEE 1413 criteria.


Table 14 – Comparison of reliability prediction methodologies

1) Does the methodology identify the sources used to develop the prediction methodology and describe the extent to which the source is known?
   Field data: Yes. Test data: Yes. Stress and damage models: Yes.
   MIL-HDBK-217F: No. RAC's PRISM: Yes.(a) SAE's HDBK: No. Telcordia SR-332: No. CNET's HDBK: No.

2) Are assumptions used to conduct the prediction according to the methodology identified, including those used for the unknown data?
   Field data: Yes. Test data: Yes. Stress and damage models: Yes.
   MIL-HDBK-217F: No. RAC's PRISM: Yes. SAE's HDBK: Yes. Telcordia SR-332: Yes. CNET's HDBK: No.

3) Are sources of uncertainty in the prediction results identified?
   Field data: Can be. Test data: Can be. Stress and damage models: Can be.
   All five handbook methods: No.

4) Are limitations of the prediction results identified?
   Yes for all methodologies.

5) Are failure modes identified?
   Field data: Can be. Test data: Can be. Stress and damage models: Yes.
   All five handbook methods: No.

6) Are failure mechanisms identified?
   Field data: Can be. Test data: Can be. Stress and damage models: Yes.
   All five handbook methods: No.

7) Are confidence levels for the prediction results identified?
   Field data: Yes. Test data: Yes. Stress and damage models: Yes.
   All five handbook methods: No.

8) Does the methodology account for life cycle environmental conditions(b), including those encountered during a) product usage (including power and voltage conditions), b) packaging, c) handling, d) storage, e) transportation, and f) maintenance conditions?
   Field data: Can be, if field data is collected in the same or a similar environment that accounts for all the life cycle conditions.
   Test data: Can be; it can consider them through the design of the tests used to assess product reliability.
   Stress and damage models: Yes, as input to physics-of-failure based models for the failure mechanisms.
   MIL-HDBK-217F: No; it does not consider the different aspects of environment. There is a temperature factor πT and an environment factor πE in the prediction equation.
   RAC's PRISM: No; it does not consider the different aspects of environment. Environmental inputs include operating and dormant temperatures, relative humidity, vibration, duty cycle, cycling rate, and power and voltage conditions.
   SAE's HDBK: No; it does not consider the different aspects of environment. Ambient temperature, application stresses, and duty cycle are used as factors in the prediction equation.
   Telcordia SR-332: No; it does not consider the different aspects of environment. Ambient temperature, vibration and shock, and power and voltage conditions are used as factors in the prediction equation.
   CNET's HDBK: No; it does not consider the different aspects of environment. It requires a range of parameter values that define each environmental category; parameters include vibration, noise, dust, pressure, relative humidity, and shock.

9) Does the methodology account for materials, geometry, and architectures that comprise the parts?
   Field data: Can be. Test data: Can be. Stress and damage models: Yes.
   All five handbook methods: No.

10) Does the methodology account for part quality?(c)
   Field data: Can be; not explicitly considered, but implicitly used from the quality of the parts in the system.
   Test data: Can be; not explicitly considered, but implicitly used from the quality of the parts in the system.
   Stress and damage models: Yes; considered through the design and manufacturing data.
   MIL-HDBK-217F: Quality levels are derived from specific part-dependent data and the number of manufacturer screens the part goes through.
   RAC's PRISM: There is no part quality factor in the RACRates models; part quality level is implicitly addressed by process grading factors and the growth factor, πG.
   SAE's HDBK: There is no part quality factor in the SAE constant failure rate model.
   Telcordia SR-332: Four quality levels that are based on generalities regarding the origin and screening of parts.
   CNET's HDBK: Seven main levels of quality and several subclasses based on screening and part origin are used.

11) Does the methodology allow incorporation of reliability data and experience?
   Field data: Yes. Test data: Yes. Stress and damage models: Yes.
   MIL-HDBK-217F: No. RAC's PRISM: Yes, through a Bayesian method of weighted averaging. SAE's HDBK: No. Telcordia SR-332: Yes, through a Bayesian method of weighted averaging. CNET's HDBK: No.

Input data required for the analysis:
   Field data: Information on initial operating time, failure time, and operating profile (or approximations) for all units.
   Test data: Detailed test plan and results, including information on operating stresses, failure time(s), and expected application environments.
   Stress and damage models: Information on materials, architectures, design and manufacturing processes, and operating stresses.
   Handbook methods: Information on part count and operational conditions (e.g., temperature, voltage; specifics depend on the handbook used).

Other requirements for performing the analysis:
   Field data: The effort required in creating and maintaining a field data collection system might be high.
   Test data: The analysis typically involves designing and conducting tests.
   Stress and damage models: The analysis typically involves stress and damage simulations that can be performed through commercially available software.
   Handbook methods: The effort required is relatively small and is limited to obtaining the handbook.

What is the coverage of electronic parts?
   Field data, test data, and stress and damage models: Not limited to a particular set of parts.
   MIL-HDBK-217F: Extensive.(d)
   RAC's PRISM: Extensive constant failure rate databases are included in the methodology.
   SAE's HDBK: Five generic part categories (microcircuits, diodes, transistors, capacitors, and resistors).
   Telcordia SR-332: Extensive.(e)
   CNET's HDBK: Extensive.(f)

What failure probability distributions are supported?
   Field data: Not limited to a specific distribution; statistical techniques are used to fit a distribution to the field data.
   Test data: Not limited to a specific distribution; statistical techniques are used to fit a distribution to the test data.
   Stress and damage models: Not limited to a specific distribution; the users choose to input and interpret the data in a manner that suits the physical situation.
   Handbook methods: Exponential.(g)

What reliability metrics are supported?
   Field data: Many, including time to failure, number of cycles to failure, failure probability distribution, failure percentile, confidence levels, failure free operating period, and non-failure metrics.
   Test data: Many, including time to failure, number of cycles to failure, failure probability distribution, failure percentile, failure free operating period, and confidence levels.
   Stress and damage models: Many, including time to failure, number of cycles to failure, failure probability distribution, failure free operating period, failure percentile, and confidence levels.
   Handbook methods: MTBF (mean time between failures) and constant failure rate.

Can it provide a reliability prediction for non-operational conditions?
   Field data: Yes, if field data is collected for non-operational conditions.
   Test data: Yes, if storage and dormant condition loads are used in the tests.
   Stress and damage models: Yes; non-operational conditions can be part of the environmental and operational profile.
   MIL-HDBK-217F: No.(h) RAC's PRISM: Yes. SAE's HDBK: No. Telcordia SR-332: No. CNET's HDBK: No.

Last revision as of guidebook publication date:
   Field data, test data, and stress and damage models: Not applicable.
   MIL-HDBK-217F: Version F Notice 2, released in February 1995.
   RAC's PRISM: Version 1.3, released in June 2001.
   SAE's HDBK: Version 5.0, released in 1990.
   Telcordia SR-332: Issue 7, released in May 2001.
   CNET's HDBK: RDF 2000, released in July 2000.

(a) Some data sources are included in the accompanying database.
(b) The life cycle of a product describes the assembly, storage, handling, and scenario for the use of the product, as well as the expected severity and duration of these environments. Specific conditions include temperature, temperature cycles, temperature gradients, humidity, pressure, vibration or shock loads, chemically aggressive or inert environments, electromagnetic radiation, airborne contaminants, and application-induced stresses caused by current, voltage, power, and duty cycles.
(c) Quality is defined as a measure of a part's ability to meet the workmanship criteria of the manufacturer. Quality levels for parts used by some of the handbook methods are different from quality of the parts. Quality levels are assigned based on the part source and level of screening the part goes through. The concept of quality level comes from the belief that screening improves part quality.
(d) MIL-HDBK-217F covers microcircuits, discrete semiconductors, tubes, lasers, capacitors, resistors, inductive devices, rotating devices, relays, switches, connectors, interconnection assemblies, meters, quartz crystals, lamps, electronic filters, fuses, and miscellaneous parts.
(e) Telcordia SR-332 covers integrated circuits (analog and digital), microprocessors, SRAMs, DRAMs, gate arrays, ROMs, PROMs, EPROMs, optoelectronic devices, displays, LEDs, transistors, diodes, thermistors, resistors, capacitors, inductors, connectors, switches, relays, rotating devices, gyroscopes, batteries, heaters, coolers, oscillators, fuses, lamps, circuit breakers, and computer systems.
(f) UTE C80-810 (or CNET RDF 2000) covers standard circuits (e.g., ROMs, DRAMs, flash memories, EPROMs, and EEPROMs), application-specific integrated circuits (ASICs), bipolar circuits, BiCMOS circuits, and gallium arsenide devices.
(g) Telcordia SR-332 can cover a non-constant failure rate using the first-year multiplier.
(h) Some users use this handbook's predictions under zero stress to calculate non-operational reliability.


5.6.2 Reliability prediction methodology assessment per IEEE 1413

The next four subclauses describe the major categories of reliability prediction methodologies. Additional discussion is provided pertaining to the assessment according to the IEEE 1413 criteria and other information in Table 14. This subclause explains the "Yes," "No," and "Can be" entries in Table 14.
5.6.2.1 Assessment of reliability predictions using eld data

--``,-`-`,,`,,`,`,,`---

Field data can be used for a reliability prediction of an item already in service or of a similar item. The methodology is best suited to high-volume applications, from which sufficient data can be obtained to establish statistical confidence. It can also be used to adjust predictions based on other methods by comparing previous reliability predictions based on those methods with the actual field reliability performance of the item. Field data is applicable to many non-failure metrics (see 5.2.5). The answers to the 1413 assessment criteria questions may vary depending on the quality of the data and the level of analysis, both statistical and physical.

The data sources and assumptions used to conduct a reliability prediction based on field data depend on the field data and the data collection methods. The sources of uncertainty and the limitations of the prediction are determined by the quality of the available data and the similarity of the item to other items for which field data is available. Since field data analysis usually consists of fitting a statistical distribution to the data, confidence levels for the prediction can be developed from those distributions.
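The fitting step described above can be sketched numerically. The failure times below are invented, and the choice of a two-parameter Weibull distribution fitted by median-rank regression is one common option, not a method prescribed by this guide:

```python
import math

# Hypothetical field times-to-failure in hours (invented for illustration).
times = sorted([1200, 1900, 2400, 3100, 3600, 4400, 5000, 5800, 6900, 8200])
n = len(times)

# Median-rank estimate of the cumulative failure probability for each ordered failure.
x = [math.log(t) for t in times]
y = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]

# Least-squares line y = beta*x + b gives the Weibull shape (beta) and scale (eta),
# since ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta).
xm, ym = sum(x) / n, sum(y) / n
beta = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum((xi - xm) ** 2 for xi in x)
eta = math.exp(xm - ym / beta)

# Predicted reliability at 2000 h from the fitted distribution.
r = math.exp(-((2000.0 / eta) ** beta))
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, R(2000 h) = {r:.2f}")
```

Confidence levels would then be attached by bounding the fitted parameters, as the text notes.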
The type and quality of field data is highly variable, ranging from simple factory ship-and-return data to detailed tracking and failure analysis for every unit built. The answers to several of the questions in Table 14 are "can be"; however, the answer can be "yes" if sufficiently detailed data is available and used for the analysis. For example, if failure analysis is available, a separate field reliability prediction can be performed for each failure mode, failure mechanism, or failure cause. Since it represents equipment in its actual operating conditions, field data implicitly accounts for life cycle environmental conditions, including non-operational environments such as storage and transportation, if suitable records are kept. Field data also implicitly accounts for part materials, geometry, architecture, and quality if it is used to predict the reliability of an item already in service or an item with similar part materials, geometry, architecture, and quality. Field data can explicitly account for these factors if sufficient data is available, e.g., if the field reliability of a design change that modified part quality is tracked separately from the field reliability of the original design. Although a prediction based on field data includes the impact of the life cycle environment on product reliability, such as the conditions encountered during product operation (including power and voltage conditions), it may not allow the user to discriminate between the effects of the individual components of the environment on the observed failures. In this case, the methodology will not account for failure modes and mechanisms. In addition, it is often difficult to use field data during the design stage to predict and compare the effect of changes in part characteristics for theoretical designs, e.g., to determine the effect of a part geometry change on the reliability prediction.
The primary strength of field data is that it represents the actual reliability performance of an item in its actual operational environment rather than simulating or estimating that environment. However, an accurate reliability prediction based on field data requires extensive data collection on the same or a similar item for a length of time sufficient to fit a statistical distribution with measurable confidence.
5.6.2.2 Assessment of reliability predictions using test data
The process of designing a test, setting it up, and conducting it can provide in-depth knowledge of the test
data, the assumptions, approximations, uncertainties, errors, environments, and stresses. Therefore, a reliability prediction based on test data may implicitly satisfy many of the criteria in Table 14.


A test report should clearly identify the source of the data (who conducted the test) and the extent to which the source is known (details surrounding the test). Such a report should also identify all electrical, mechanical, environmental, and statistical bases and assumptions. Knowing the bases upon which the test was developed provides an understanding of the sources of uncertainty and the limitations on the use of the results and the data. Such bases and assumptions include, but are not limited to, mechanical, electrical, statistical, environmental, or other modeling assumptions.
Knowing the processes that went into creating the test articles provides an understanding of the test articles' quality. This permits extrapolating the test results from the test-article quality to the final-product quality. The rigor with which the test is conducted provides insight into the quality of the data itself. Similarly, the materials, geometry, and system architecture can be reproduced to whatever level of accuracy is desired. Regardless of the exactness, the level of precision should be stated explicitly.
When failures occur, the modes can be recorded. If failure analysis is performed, failure mechanisms can be identified and, possibly, root causes determined. However, it is sometimes not possible to determine the unequivocal cause of failure, especially for intermittent failures or destructive tests.
To derive a reliable result from an accelerated test, it is required that 1) the failure mechanism active during product operation is also dominant during accelerated testing, and 2) the acceleration of this failure mechanism by some appropriate change from the operating conditions to the test conditions can be expressed in the form of an acceleration transform. Hence, a successful accelerated test should ensure that the failure modes generated in the test result from the same mechanisms as those that occur in the actual use of the product in the field.
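One widely used acceleration transform for temperature-driven mechanisms is the Arrhenius relationship; the activation energy and temperatures below are illustrative assumptions, not values taken from this guide:

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Acceleration factor between use and test temperatures (Arrhenius model)."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_test))

# Assumed values: 0.7 eV activation energy, 55 C use, 125 C test.
af = arrhenius_af(0.7, 55.0, 125.0)
# 1000 h at 125 C then corresponds to roughly af * 1000 h at 55 C,
# valid only if the same mechanism dominates at both temperatures.
print(f"acceleration factor = {af:.0f}")
```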
A source of error in a test is the inability to duplicate or model the actual use environment. In addition, extrapolation beyond the actual test time may not be possible, depending on the failure distribution or on a lack of data from which to accurately deduce the failure distribution. The utility of the prediction also depends on the extent to which the electronics are tested (whether all circuit cards and open slots were tested) and whether or not non-operational tests were conducted.
The quality and type of reliability predictions are predicated on the extent of the statistical analyses of the data. Such analyses can include determining the underlying failure distribution, calculating confidence limits, and expressing the reliability. These may be limited by the amount of failure data available. Too few failures may prevent accurate estimation of the failure distribution and, therefore, determination of confidence limits (which depend on the distribution). Knowing the level of data quality desired allows one to identify the types of inputs required.
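As an illustration of confidence limits that depend on the assumed distribution, the sketch below computes a one-sided upper confidence bound on a constant (exponential) failure rate from a time-truncated test, using a standard chi-square bound with an approximate quantile; the failure count and device-hours are invented:

```python
import math

def chi2_quantile(z, dof):
    """Wilson-Hilferty approximation to the chi-square quantile;
    z is the standard-normal quantile for the desired probability."""
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3

# Invented test result: 3 failures in 10,000 cumulative device-hours (time-truncated).
failures, hours = 3, 10_000.0
z95 = 1.645  # standard-normal 95th percentile

# One-sided upper 95% confidence limit on the constant failure rate:
# lambda_upper = chi2(0.95; 2r + 2) / (2T) for a time-truncated test.
lam_upper = chi2_quantile(z95, 2 * failures + 2) / (2.0 * hours)
print(f"lambda_upper = {lam_upper:.2e} failures/hour")
print(f"MTBF lower bound = {1.0 / lam_upper:.0f} hours")
```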
5.6.2.3 Assessment of reliability predictions using stress and damage models
For the stress and damage model method, the prediction is based on the evaluation of documented models and their variability, allowing a ranking of potential failure sites and a determination of a minimum time to failure. The use of documented models provides an identification of the sources used to make the prediction and describes the extent to which the sources are known. Since the failure models are documented, the literature may be referenced for details of their development, including the underlying assumptions and limitations. Therefore, reliability predictions based on stress and damage models satisfy most of the criteria of Table 14.
The accuracy of a stress and damage model prediction depends on the models used and the inputs to those models. The sources of uncertainty can be identified because all the parameters used in the models are associated with physical properties (e.g., material properties, geometry values, and environmental parameters), and the variations in these parameters can be considered to account for the uncertainty of the model results. These failure models are for particular failure mechanisms, and the models predict the mode in which the failure would manifest. These failure models utilize environmental and operational usage profile conditions as inputs, including power and voltage conditions, environmental exposures, duration and duty cycles at various temperatures, exposure to airborne contaminants, shock and vibration, humidity, radiation, maintenance, packaging, handling, storage, and transportation conditions. Hence, this methodology identifies failure modes and mechanisms and accounts for life cycle environmental conditions.
Using numerical methods such as Monte Carlo simulation, the distribution of the time to failure may be developed by considering the range of variability of the input parameters to the failure models. From the calculated distribution, a confidence level can be calculated for a given time-to-failure interval.
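A minimal sketch of this approach might look like the following; the damage model and the parameter spreads are invented for illustration and do not come from this guide:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def time_to_failure(thickness_um, delta_t_c):
    """Illustrative (invented) damage model: life grows with a geometric
    parameter and falls with the square of the thermal cycle range."""
    return 5.0e6 * thickness_um / (delta_t_c ** 2)

# Sample the input-parameter variability (assumed normal spreads).
samples = [
    time_to_failure(random.gauss(100.0, 10.0),   # thickness, um
                    random.gauss(40.0, 5.0))     # thermal cycle range, C
    for _ in range(20_000)
]

samples.sort()
median = statistics.median(samples)
t_1pct = samples[len(samples) // 100]  # ~1st percentile, a "minimum" life estimate
print(f"median life ~ {median:.0f} h, 1st-percentile life ~ {t_1pct:.0f} h")
```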
The stress and damage method considers part quality in terms of the variation of material properties and structural geometries. This variation may be accounted for by using worst-case parameter values or by conducting a sensitivity study of the effect of variation in the input parameters of the failure models. Again, Monte Carlo simulation or other numerical methods may be used to model the effect of variation in these properties.

The feasibility of reliability prediction based on stress and damage models is governed by the availability and accuracy of the models as well as the input data for those models. Because the method requires a priori knowledge of the relevant failure modes and mechanisms, it is better suited to products for which the dominant failure mechanisms and sites are known. Failure mechanisms in systems are the subject of extensive and active study by industry, professional organizations, research institutes, and governments, and there is extensive literature documenting failure mechanisms as well as simulation techniques that can be used for their assessment. Since these studies are documented in the open literature and subjected to peer review, the accuracy of the simulation techniques is reviewed and continues to improve.
The stress and damage model approach considers the compatibility of the part with the next level of assembly. In fact, the approach uses the next level of assembly to perform failure mechanism modeling (e.g., a circuit card assembly (CCA) thermal analysis uses box-level thermal characteristics, and a CCA vibration analysis uses the structural response of the box and the system).
The stress and damage method incorporates reliability into the design process by establishing a credible basis for evaluating new materials, structures, and electronics technologies. The models are updated with the results of additional tests and observations as they become available. This method focuses on the root-cause failure mechanisms and sites, which is central to good design and manufacturing.

5.6.2.4 Assessment of handbook prediction methodologies

This subclause assesses the five handbook prediction methodologies described in 5.5, namely MIL-HDBK-217, the Reliability Analysis Center's (RAC) PRISM, the Society of Automotive Engineers' (SAE) PREL, Telcordia SR-332, and the CNET Reliability Prediction Standard, according to the criteria derived from IEEE Std 1413-1998. This method can be used only if the handbook under consideration covers the hardware of interest. The reader may also refer to Bhagat, W., "R&M through Avionics/Electronics Integrity Program"; Bowles, J. B., "A Survey of Reliability-Prediction Procedures for Microelectronic Devices"; Lall, P., Pecht, M., and Hakim, E. B., "Influence of Temperature on Microelectronics and System Reliability: A Physics of Failure Approach"; Leonard, C. T., "On US MIL-HDBK-217 and Reliability Prediction"; Leonard, C. T., "How Failure Prediction Methodology Affects Electronic Equipment Design"; O'Connor, P. D. T., "Reliability Prediction for Microelectronic Systems"; O'Connor, P. D. T., "Reliability Prediction: A State-Of-The-Art Review"; O'Connor, P. D. T., "Undue Faith in US MIL-HDBK-217 for Reliability Prediction"; O'Connor, P. D. T., "Reliability Prediction: Help or Hoax?"; O'Connor, P. D. T., "Statistics in Quality and Reliability. Lessons from the Past, and Future Opportunities"; Wong, K. L., "What Is Wrong with the Existing Reliability Prediction Methods?"; Wong, K. L., "A Change in Direction for Reliability Engineering is Long Overdue"; Wong, K. L., "The Bathtub Curve and Flat Earth Society"; O'Connor, P. D. T., "Reliability: Measurement or Management?"; Nash, F. R., "Estimating Device Reliability: Assessment of Credibility"; and Hallberg, Ö., "Hardware Reliability Assurance and Field Experience in a Telecom Environment" for other assessments of handbook prediction methodologies. All the handbook prediction methods are easy to apply and do not require failure data collection or specific design assessment. However, none of them identifies failure modes and mechanisms, and thus they offer limited insight into reliability issues. Hence, these methods can potentially misguide efforts to design reliable electronic equipment (see Cushing, M. J., Krolewski, J. G., Stadterman, T. J., and Hum, B. T., "U.S. Army Reliability Standardization Improvement Policy and Its Impact"; Hallberg, Ö. and Löfberg, J., "A Time Dependent Field Return Model for Telecommunication Hardware"; Leonard, C. T., "Mechanical Engineering Issues and Electronic Equipment Reliability: Incurred Costs Without Compensating Benefits"; Leonard, C. T., "Passive Cooling for Avionics Can Improve Airplane Efficiency and Reliability"; Pease, R., "What's All This MIL-HDBK-217 Stuff Anyhow?"; Wattson, G. F., "MIL Reliability: A New Approach"; Pecht, M. and Nash, F., "Predicting the Reliability of Electronic Equipment"; and Knowles, I., "Is It Time For a New Approach?").
5.6.2.4.1 MIL-HDBK-217
MIL-HDBK-217F was developed through the collection and analysis of historical field failure data, and the constant failure rate prediction models used in the methodology were based on data acquired from various sources over time. However, information regarding the sources of this data, the levels of statistical control and verification, and the data processing used to derive the constant failure rates and adjustment factors was not specified in the document, although a bibliography was provided.
The methodology did not specify the assumptions used to predict reliability or the reasons behind the uncertainties in the results, but it identified the limitations of the predicted results. For example, the first limitation cited in the document is that the models for predicting the constant failure rate are valid only for the conditions under which the data was obtained and for the devices covered. However, the handbook did not provide any information about these conditions or the specific devices for which the data was collected.
MIL-HDBK-217F does not predict specific field failures or failures due to environmental conditions such as vibration, humidity, and temperature cycling (except for steady-state temperature), but only the number of failures over time. The methodology assumes an exponential failure distribution, irrespective of the hazard rates, failure modes, and failure mechanisms.28
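The exponential assumption means that reliability over time follows R(t) = exp(-lambda*t) regardless of failure mechanism, with part failure rates simply summed in a series parts count. The rates below are invented placeholders, not MIL-HDBK-217F values:

```python
import math

# Invented per-part constant failure rates in failures per 10^6 hours
# (placeholders only, NOT MIL-HDBK-217F values).
part_lambdas_fpmh = {
    "microcircuit": 0.12,
    "resistor": 0.002,
    "capacitor": 0.015,
    "connector": 0.05,
}

# Series parts-count sum, converted to failures/hour.
lam = sum(part_lambdas_fpmh.values()) / 1.0e6
mtbf = 1.0 / lam

# Under the exponential assumption, the same lambda applies at every age.
r_1yr = math.exp(-lam * 8760.0)
print(f"MTBF = {mtbf:.3e} h, R(1 year) = {r_1yr:.4f}")
```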
The impact of failure mechanisms on the failure distribution was not considered, and confidence levels for the prediction results were not addressed. Models were lumped together and used at the package level, and factors were used without determining the relative dominance of the failure mechanisms (i.e., without verifying that the same failure mechanism is being accelerated in the new environment).
MIL-HDBK-217 stated that the applications of systems could be significantly different, even when used in similar environments (for example, two computers may be operating in the same environment, but one may be used more frequently than the other). In other words, the methodology acknowledged that the reliability of a system depends on both its environment and the operational loads to which it is subjected. However, although the prediction methodology covers fourteen environments, it does not account for the actual life cycle environment, which includes temperature cycles, temperature gradients, humidity, pressure, vibration, chemically aggressive or inert environments, radiation, airborne contaminants, and application-induced stresses caused by voltage, power, or duty cycles. The methodology also does not account for the impact of assembly, handling, storage, maintenance, and transportation on reliability. Consequently, the United States Department of Defense (DoD) stated that a reliability prediction should never be assumed to represent the expected field reliability as measured by the user (see Lycoudes, N., and Childers, C. G., "Semiconductor Instability Failure Mechanism Review"). Another indication of the effectiveness of the methodology's results was the statement: "MIL-HDBK-217 is not intended to predict field reliability and, in general, does not do a very good job when so applied" (see Morris, S. F., "Use and Application of MIL-HDBK-217").
The methodology assumed that the stress conditions and the predicted life were independent of material properties and geometries, and consequently it did not account for variabilities in part materials and geometries.28 Parts within the same class and application are assumed to have the same constant failure rate, even when made of different materials and having different geometries.

28For example, MIL-HDBK-217F assumes that solder joint failures, which are known to be wearout failures, can be modeled by a constant failure rate.
The methodology does not consider quality to be a function of manufacturer process control or part variability, but simply a function of the number of tests to which the part is subjected. In other words, the greater the number of screens a part is subjected to, the higher its quality is assumed to be, irrespective of the damage caused by faulty design, manufacturing, assembly, or screening procedures. The preparing activity recognized this shortcoming, stating: "Poor equipment design, production, and testing facilities can degrade part quality. It would make little sense to procure high quality parts only to have the equipment production procedures damage the parts or introduce latent defects" (see Morris, S. F., "Use and Application of MIL-HDBK-217").
The methodology does not address reliability growth, and it adds little value for current, new, or future technologies. The handbook stated: "evolutionary changes (in technology) may be handled by extrapolation from existing models; revolutionary changes may defy analysis" (see Lycoudes, N., and Childers, C. G., "Semiconductor Instability Failure Mechanism Review").
5.6.2.4.2 SAE's reliability prediction methodology

The SAE reliability prediction methodology (implemented through a software package known as PREL) was developed using MIL-HDBK-217 data combined with automotive environments and other automotive data. Since the source of the MIL-HDBK-217 data was not defined, the extent to which the SAE data is known is also undefined.

Although the methodology does not identify the information used to develop the models, the assumptions used to develop the models and the limitations of the models are identified. The methodology states that it only predicts failures due to common causes and not special causes, and that since "(the models) were derived from statistical analysis of (constant) failure rate information from a wide variety of manufacturers and module types, the resultant reliability predictions are representative of industry averages. Therefore, predictions for a specific set of conditions will be estimates rather than true values" (see Denson, W. and Priore, M., "Automotive Electronic Reliability Prediction").

Some of the other limitations stated in the methodology are that there is always uncertainty in the failure data collection process due to the difficulty of deciding whether a failure is inherent or event-related, that the failure models used do not account for materials, geometry, and manufacturing variations, and that the data used to develop the failure models was obtained from 1982-1985 and is not representative of today's parts. Failure modes and failure mechanisms were also not identified. Confidence levels for the prediction results were addressed by providing a predicted-to-observed constant failure rate ratio.

The models used in PREL included duty cycles for operating, dormant, and non-operating conditions, as well as factors for the evaluation of system-level infant mortality failures, allowing reliability to be predicted as a function of any operating scenario and duty cycle. Although actual automotive field data was used to develop the models, and predominant environmental parameters (e.g., ambient temperature, duty cycles, power, voltage, and current) were inputs to the failure models, the true part life cycle environment, including the effects of packaging, handling, transportation, and maintenance, was not accounted for.

Although there was no distinct part quality factor in the SAE constant failure rate model, quality was implicitly considered in the regression analysis by a screening factor. In other words, the SAE reliability prediction methodology also considered quality to be only a function of the number of tests to which the part was subjected, and not of manufacturer process control, assembly, or part variability. The effects of materials, part geometry, and part architecture on the final part reliability were also not addressed.


5.6.2.4.3 CNET reliability prediction method


The CNET methodology (see RDF 2000) states that the reliability data was taken mainly from field data concerning electronic equipment operating in three kinds of environments: 1) ground, stationary; 2) ground, non-stationary; and 3) civilian airborne (see Kervarrec, G., Monfort, M. L., Riaudel, A., Klimonda, P. Y., Coudrin, J. R., Le Razavet, D., Boulaire, J. Y., Jeanpierre, P., Perie, D., Meister, R., Casassa, S., Haumont, J. L., and Liagre, A., "Universal Reliability Prediction Model for SMD Integrated Circuits Based on Field Failures"). It does not identify the actual data sources or the extent, quantity, or quality to which the data is known. As with other handbook prediction methodologies, the CNET RDF 2000 methodology does not identify failure modes, failure sites, and failure mechanisms. The constant failure rate is calculated by adding the contributions of the die, the package, and electrical overstresses (EOS). The methodology provides some temperature, humidity, and chemical exposure information, but the actual life cycle environment is not considered. Packaging, handling, storage, transportation, and maintenance conditions are also not considered. Environmental conditions are specified by selecting an adjustment factor from an available list, which does not incorporate the actual life cycle environmental conditions.

The methodology identifies certain assumptions made in the reliability prediction, such as failure rates being constant and vibration and shock not being considered to generate significant failures for the selected environment, but it does not describe the assumptions made for unknown data. Although the limitations of the prediction results are identified, the sources of uncertainty in the prediction results are not. The methodology states that the predictions are based only on the intrinsic reliability of parts; they therefore do not account for external overload conditions, design errors, incorrect use of parts, or the risks involved in using parts with poor reliability. Materials, part geometry, and part architecture are also not taken into account.
The methodology does not consider the impact of competing failure mechanisms; for example, environments in which severe mechanical stresses are present are not represented realistically by a model in which only thermal fatigue is taken into account. Although a quality factor is included in the methodology, this factor is largely based on criteria such as the time-in-production of the part, the qualification or supervision procedures followed by the manufacturer, and the conformance of the part to various certifying authorities (such as the IECQ or CECC), and not on the damage caused by faulty design, manufacturing, assembly, or screening procedures.
This prediction methodology does not allow the incorporation of reliability data and experience. The user
may modify the adjustment factors according to experience, but no venue exists to incorporate reliability
data and experience into the structure of the methodology.
5.6.2.4.4 Telcordia SR-332
Telcordia SR-332 primarily uses data from the computer and telecommunications industries, but the sources of the data used to develop the prediction methodology and the extent, quantity, quality, and time frame of the data are not identified. Assumptions and limitations in the methodology are identified (e.g., failure rates being held constant). However, the sources of uncertainty in the prediction results are unknown. Furthermore, failure modes and mechanisms are not identified.
Telcordia SR-332 provides the upper 90% confidence-level point estimate for the generic constant failure rate, along with tables of multiplication factors. However, although these factors represent certain application conditions (e.g., steady-state temperature, electrical stress), they do not account for the effects of temperature cycling, vibration, humidity, and other life cycle conditions (except steady-state temperature) on product reliability. They also do not account for conditions encountered during storage, handling, packaging, transportation, and maintenance. The methodology aims to account for the uncertainty in the parameters contributing to reliability, but it does not address the uncertainty in the failure models used.
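Schematically, a handbook prediction of this kind multiplies a generic base failure rate by such multiplication factors. The structure below is a simplified sketch, and the factor names and values are invented placeholders, not SR-332 table entries:

```python
# Schematic handbook-style calculation: generic rate times adjustment factors.
# All values are invented placeholders, NOT Telcordia SR-332 table entries.
lambda_generic = 10.0   # generic base rate in failures per 10^9 hours (FITs), assumed
pi_quality = 1.0        # quality-level factor (assumed)
pi_stress = 1.1         # electrical-stress factor (assumed)
pi_temp = 1.5           # steady-state temperature factor (assumed)

# Note that no factor here reflects temperature cycling, vibration, or humidity,
# which is the limitation discussed in the text.
lambda_ss = lambda_generic * pi_quality * pi_stress * pi_temp
print(f"predicted steady-state rate = {lambda_ss:.1f} FITs")
```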
The methodology accounts for quality as a function of the origin and screening of parts, but it does not account for materials, part geometry, and part architecture. Four standard quality levels are defined; they are identical for all part types and are based on criteria regarding the origin and screening of the parts. The effect of faulty design, manufacturing, and assembly procedures on part quality is not addressed. A methodology for combining laboratory test data and/or field-tracking data with parts-count data is provided, but a constant failure rate must be assumed for that data. The methodology allows for the incorporation of reliability growth and experience in the form of laboratory and field data without additional regression analysis.
5.6.2.4.5 RAC's PRISM

The constant failure rate calculation models used in PRISM for electrical, electronic, and electromechanical devices are reported to be based on historical field data acquired from a variety of sources over time and under various levels of statistical control and verification, but no details on the quantity, quality, and time frames of the data, or references, are provided. There is no presentation or justification of the algorithm used for finding the constant failure rate, the levels of statistical control and verification, or the data manipulation used to derive the generic constant failure rates and adjustment factors. The methodology predicts constant failure rates.

PRISM specifies the assumptions used to predict reliability (e.g., failure rates are assumed constant) and identifies the limitations of the predicted results, such as applicability only to parts similar to those used in developing the failure rate models. Although the methodology accounts for the uncertainty in the parameters contributing to reliability (e.g., design and manufacturing), it does not account for the uncertainty in its failure models. Confidence levels are not specified at all, but the software allows integration of the prediction results with Bayesian statistics.

Some environmental conditions are direct inputs to the failure models used in PRISM. However, the models used in the software are only based on an estimate of the influence of these parameters because the RAC databases do not consider failure modes and mechanisms. Hence, the methodology attempts to account for the effects of life cycle processes on end-item operational reliability without accounting for the actual failure modes, failure sites, and failure mechanisms induced by those processes. If a process is suspected of having an effect on a part's operational reliability, PRISM requires the user to have a working knowledge of the process and to answer a process-related questionnaire that PRISM uses to estimate a modifying factor.

PRISM quality factors are obtained by a method called process grading, which produces factors that modify a generic constant failure rate. These factors include design, manufacturing, part procurement, and system management, and they are intended to capture the extent to which measures have been taken to minimize the occurrence of system failures. If these grades are not calculated for a part or a subsystem, the model defaults to assuming a typical manufacturing process and does not adjust the predicted constant failure rate any further. The effect of materials, geometries, and part architecture on the final quality and reliability of a part is not accounted for. PRISM models do have a growth factor that is meant to address improvements in part reliability and manufacturing. As information about a part (e.g., field or test data) becomes available, PRISM allows the user to update the initial reliability prediction, but this estimate is not based on an analysis of the complete database, so its statistical accuracy is unknown.

6. System reliability models

System reliability prediction is based on the system architecture and the reliability of the lower-level components and assemblies. Clause 5 provides methods for predicting the reliability of the lower-level components and assemblies. This clause describes how to combine these lower-level reliability predictions using the system architecture to create a system reliability prediction. The combinatorial methods described in this clause can also be used for combining reliability predictions at lower levels of the system hierarchy, e.g., for combining component reliability predictions to create an assembly reliability prediction.

IEEE
Std 1413.1-2002

IEEE GUIDE FOR SELECTING AND USING

Subclause 6.1 describes how reliability block diagrams can be used to represent the logical system architecture and develop system reliability predictions. Subclause 6.2 explains how fault trees can be used to combine lower-level reliability predictions. Subclause 6.3 describes Markov models, which are especially useful
for repairable system reliability prediction. Other techniques for repairable system reliability prediction are
briey mentioned at the end of 6.3. Subclause 6.4 describes the use of Monte Carlo simulation to create
system reliability predictions. There are many texts that describe how to combine reliability predictions and
distributions in much more detail than in this guide (see Kececioglu, B. D., Reliability Engineering Handbook, Vols. 1 and 2; Lewis, E. E., Introduction to Reliability Engineering; and Klion, J., Practical Electronic
Reliability Engineering).

6.1 Reliability block diagram


A reliability block diagram presents the logical relationship of the system components. Series systems are described in 6.1.1, parallel systems in 6.1.2, stand-by systems in 6.1.3, (k, n), or k-out-of-n, systems in 6.1.4, and complex systems in 6.1.5. All of these system configurations are analyzed using the principles of probability theory.

6.1.1 Series system

In a series system all subsystems must operate successfully for the system to function. This implies that the failure of any subsystem causes the system to fail. The reliability block diagram of a series system is represented by Figure 15.

Figure 15—Series system


The units need not be physically connected in series for the system to be called a series system. The system reliability can be derived from the basic principles of probability theory: the system fails if any of the subsystems or components fails, and the system survives the mission time t only if all the units survive to time t. Then,

R_s(t) = R_1(t) R_2(t) \cdots R_n(t) = \prod_{i=1}^{n} R_i(t)    (7)

In general, each unit can have a different failure distribution. Reliability metrics such as hazard rates and mean life can be derived for the system from the individual component or subsystem failure distributions.

Assuming that the time-to-failure distribution for all units is exponential with constant failure rate \lambda_i, the unit reliability is

R_i(t) = e^{-\lambda_i t}    (8)


then the system reliability is given by

R_s(t) = \prod_{i=1}^{n} R_i(t) = \prod_{i=1}^{n} e^{-\lambda_i t} = e^{-t \sum_{i=1}^{n} \lambda_i}    (9)

The constant system failure rate is

\lambda_s = \sum_{i=1}^{n} \lambda_i    (10)

and the system mean life is

MTBF = \frac{1}{\lambda_s} = \frac{1}{\sum_{i=1}^{n} \lambda_i}    (11)
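As a numeric illustration of Equations (9) through (11), the following sketch computes the series-system reliability and MTBF for constant failure rates. The rate values are hypothetical, chosen only for the example.

```python
import math

def series_reliability(failure_rates, t):
    """Equation (9): R_s(t) = exp(-t * sum of constant failure rates)."""
    return math.exp(-t * sum(failure_rates))

def series_mtbf(failure_rates):
    """Equation (11): MTBF = 1 / sum of constant failure rates."""
    return 1.0 / sum(failure_rates)

# Hypothetical unit failure rates, in failures per hour
rates = [2e-6, 5e-6, 3e-6]
print(series_reliability(rates, 1000.0))  # exp(-0.01), about 0.990
print(series_mtbf(rates))                 # 100000 hours
```

Note that adding any unit to a series system can only lower R_s(t) and the MTBF, consistent with Equation (7).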

6.1.2 Parallel system


A parallel system is a system that is not considered failed unless all of its components have failed. A parallel system is sometimes called a (1, n), or 1-out-of-n, system, which means that only one of the n subsystems needs to operate for the system to be in an operational, non-failed state. The reliability block diagram of a parallel system is given in Figure 16.

Figure 16—Parallel system

The units need not be physically connected in parallel for the system to be called a parallel system. The system fails if all of the subsystems or components fail by time t, and it survives the mission time t if at least one of the units survives to time t. Then, the system reliability can be expressed as

R_s(t) = 1 - F_s(t)    (12)

where F_s(t) is the probability of system failure, or

F_s(t) = [1 - R_1(t)][1 - R_2(t)] \cdots [1 - R_n(t)] = \prod_{i=1}^{n} [1 - R_i(t)]    (13)


and the system reliability for a mission time t is

R_s(t) = 1 - \prod_{i=1}^{n} [1 - R_i(t)]    (14)

In general, each unit can have a different failure distribution. The system hazard rate is given by

\lambda_s(t) = \frac{f_s(t)}{R_s(t)}    (15)

where f_s(t) is the system time-to-failure pdf (probability density function).

The mean life, m, of the system can be determined from

m = \int_0^{\infty} R_s(t) \, dt = \int_0^{\infty} \left\{ 1 - \prod_{i=1}^{n} [1 - R_i(t)] \right\} dt    (16)

For example, if the system consists of two units (n = 2) with exponential failure distributions and constant failure rates \lambda_1 and \lambda_2, the system mean life is given by

m = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} - \frac{1}{\lambda_1 + \lambda_2}    (17)

Note that the system mean life is not equal to the reciprocal of the sum of the component constant failure rates, and the system hazard rate is not constant over time even though the individual unit failure rates are constant.
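The parallel-system formulas can be illustrated with a short sketch of Equations (14) and (17). The failure rate and mission time below are hypothetical.

```python
import math

def parallel_reliability(unit_reliabilities):
    """Equation (14): R_s(t) = 1 - product of (1 - R_i(t))."""
    q = 1.0
    for r in unit_reliabilities:
        q *= 1.0 - r
    return 1.0 - q

def two_unit_parallel_mean_life(lam1, lam2):
    """Equation (17): mean life of two exponential units in parallel."""
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)

lam = 1e-4                      # hypothetical failure rate, failures/hour
t = 10000.0                     # mission time, hours
r = math.exp(-lam * t)          # Equation (8): unit reliability, about 0.368
print(parallel_reliability([r, r]))            # about 0.600
print(two_unit_parallel_mean_life(lam, lam))   # 15000.0 hours = 1.5 / lam
```

The mean-life result shows the point made above: 15000 hours, not the 5000 hours that 1/(\lambda_1 + \lambda_2) would give.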
6.1.3 Stand-by system
A standby system consists of an active unit or subsystem and one or more inactive units, which become active in the event of a failure of the functioning unit. These dormant units may be in quiescent, non-operating, or warm-up modes. Failures of the active unit are signaled by a sensing subsystem, and the standby unit is brought into action by a switching subsystem. The simplest standby configuration is the two-unit system shown in Figure 17. In the general case, there are N units with (N-1) of them in standby.

Figure 17—Stand-by system


The following assumptions for standby redundancy are generally made:

a) Switching is in one direction only.
b) Standby non-operating units cannot fail if not energized.
c) Switching devices should respond only when directed to switch by the monitor; a false switching operation (static failure) is interpreted by the monitor as a path failure, and switching is initiated.
d) Switching devices do not fail if not energized.
e) Monitor failure includes both dynamic (failure to switch when the active path fails) and static (switching when not required) failures.

When the active and the standby units have equal constant failure rates, \lambda, and the switching and sensing units are perfect (\lambda_{sw} = 0), the reliability function for such a system is

R(t) = e^{-\lambda t} (1 + \lambda t)    (18)
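A numeric sketch of Equation (18), comparing two-unit standby redundancy with two-unit active parallel redundancy. The rate value is hypothetical, and perfect sensing and switching are assumed, as in the equation.

```python
import math

def standby_two_unit_reliability(lam, t):
    """Equation (18): two-unit standby, equal rates, perfect switching."""
    return math.exp(-lam * t) * (1.0 + lam * t)

lam, t = 1e-3, 1000.0       # hypothetical values, so that lam * t = 1
standby = standby_two_unit_reliability(lam, t)      # 2/e, about 0.736
active = 1.0 - (1.0 - math.exp(-lam * t)) ** 2      # Equation (14), about 0.600
print(standby, active)
```

With ideal switching, the standby arrangement is the more reliable of the two because the second unit accumulates no failure exposure while dormant.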

6.1.4 (k, n) Systems


A system consisting of n components is called a (k, n), or k-out-of-n, system if it operates only when at least k components are in the operating state. The reliability block diagram for the (k, n) system is the same as for the parallel system, but at least k items need to be operating for the system to be functional. The parallel system described in 6.1.2 is a special case of the (k, n) system with k = 1.

Figure 18—k-out-of-n system
The reliability function for the system is very complex when the components have different failure distributions. Assuming that all components have the same failure distribution, F(t), the system reliability can be determined using the binomial distribution; i.e.,

R_s(t) = \sum_{i=k}^{n} \binom{n}{i} [1 - F(t)]^i [F(t)]^{n-i}    (19)

and the probability of system failure is then

F_s(t) = 1 - R_s(t) = 1 - \sum_{i=k}^{n} \binom{n}{i} [1 - F(t)]^i [F(t)]^{n-i} = \sum_{i=0}^{k-1} \binom{n}{i} [1 - F(t)]^i [F(t)]^{n-i}    (20)



The probability density function can be determined from

f_s(t) = \frac{dF_s(t)}{dt} = \frac{n!}{(n-k)! \, (k-1)!} [1 - F(t)]^{k-1} [F(t)]^{n-k} f(t)    (21)

and the hazard rate is given by

\lambda_s(t) = \frac{f_s(t)}{R_s(t)}    (22)
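Equation (19) can be evaluated directly. The sketch below also checks the two limiting cases noted above, k = 1 (parallel) and k = n (series); the unit reliability value is hypothetical.

```python
import math

def k_out_of_n_reliability(k, n, unit_reliability):
    """Equation (19) for identical units: sum over i = k..n of
    C(n, i) * R^i * (1 - R)^(n - i), where R = 1 - F(t)."""
    r, f = unit_reliability, 1.0 - unit_reliability
    return sum(math.comb(n, i) * r**i * f**(n - i) for i in range(k, n + 1))

r = 0.9                                   # hypothetical unit reliability
print(k_out_of_n_reliability(2, 3, r))    # 2-out-of-3: 0.972
print(k_out_of_n_reliability(1, 3, r))    # parallel limit: 1 - 0.1**3 = 0.999
print(k_out_of_n_reliability(3, 3, r))    # series limit: 0.9**3 = 0.729
```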

6.1.5 Complex system

If the system architecture cannot be decomposed into series-parallel structures, it is deemed a complex system. Subclauses 6.1.5.1 through 6.1.5.3 describe three methods for reliability analysis of a complex system, using Figure 19 as an example.

Figure 19—A complex system

6.1.5.1 Complete enumeration method

The complete enumeration method is based on a list of all possible combinations of unit failures. Table 15 contains all possible states of the system given in Figure 19. The symbol O stands for the system in an operating state and F for the system in a failed state. Uppercase letters denote a unit in an operating state, and lowercase letters denote a unit in a failed state.

Table 15—Complete enumeration example

System description               System condition    System status
All components operable          ABCDE               O
One unit in failed state         aBCDE               O
                                 AbCDE               O
                                 ABcDE               O
                                 ABCdE               O
                                 ABCDe               O
Two units in failed state        abCDE               F
                                 aBcDE               O
                                 aBCdE               O
                                 aBCDe               O
                                 AbcDE               F
                                 AbCdE               O
                                 AbCDe               O
                                 ABcdE               O
                                 ABcDe               O
                                 ABCde               O
Three units in failed state      abcDE               F
                                 abCdE               F
                                 abCDe               F
                                 aBcdE               O
                                 aBcDe               O
                                 aBCde               O
                                 AbcdE               F
                                 AbcDe               F
                                 AbCde               O
                                 ABcde               F
Four units in failed state       Abcde               F
                                 aBcde               F
                                 abCde               F
                                 abcDe               F
                                 abcdE               F
All five units in failed state   abcde               F

NOTE—The O/F entries follow from the success logic of the Figure 19 system (see 6.1.5.2): with unit C operating, the system requires A or B; with unit C failed, it requires B and at least one of D and E.

Each combination representing a system state can be written as a product of the probabilities of the units being in the given states; e.g., combination 2 can be written as (1 - R_A) R_B R_C R_D R_E, where (1 - R_A) denotes the probability of failure of unit A by time t. The system reliability can be written as the sum over all combinations for which the system is in the operating state, O; i.e.,

R_s = R_A R_B R_C R_D R_E + (1 - R_A) R_B R_C R_D R_E + R_A (1 - R_B) R_C R_D R_E
      + R_A R_B (1 - R_C) R_D R_E + R_A R_B R_C (1 - R_D) R_E + R_A R_B R_C R_D (1 - R_E)
      + (1 - R_A) R_B (1 - R_C) R_D R_E + (1 - R_A) R_B R_C (1 - R_D) R_E + (1 - R_A) R_B R_C R_D (1 - R_E)
      + \cdots
      + (1 - R_A) R_B (1 - R_C) (1 - R_D) R_E    (23)

After simplification the system reliability is given by

R_s = R_A R_C + R_B R_C + R_B R_D + R_B R_E - R_A R_B R_C - R_B R_C R_D
      - R_B R_C R_E - R_B R_D R_E + R_B R_C R_D R_E    (24)
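The complete enumeration method is easy to mechanize. The sketch below encodes the success logic of the Figure 19 example as implied by the conditional-probability analysis in 6.1.5.2 (with C operating, the system needs A or B; with C failed, it needs B and at least one of D and E), enumerates all 32 states, and checks the result against Equation (24). The unit reliability values are hypothetical.

```python
from itertools import product

def system_up(a, b, c, d, e):
    """Success logic of the Figure 19 example (see 6.1.5.2)."""
    return (c and (a or b)) or ((not c) and b and (d or e))

def enumeration_reliability(R):
    """Complete enumeration: sum P(state) over all operating states."""
    names = "ABCDE"
    total = 0.0
    for state in product((True, False), repeat=5):
        if system_up(*state):
            p = 1.0
            for name, up in zip(names, state):
                p *= R[name] if up else 1.0 - R[name]
            total += p
    return total

R = dict(A=0.9, B=0.8, C=0.7, D=0.6, E=0.5)   # hypothetical unit reliabilities
# Equation (24), for comparison:
poly = (R['A']*R['C'] + R['B']*R['C'] + R['B']*R['D'] + R['B']*R['E']
        - R['A']*R['B']*R['C'] - R['B']*R['C']*R['D'] - R['B']*R['C']*R['E']
        - R['B']*R['D']*R['E'] + R['B']*R['C']*R['D']*R['E'])
print(enumeration_reliability(R), poly)   # both 0.878: the two agree
```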

6.1.5.2 Conditional probability method (the law of total probability)

This method is based on the law of total probability, which allows system decomposition on a selected unit and its state at time t. For example, the system reliability is equal to the reliability of the system given that unit A is in an operating state at time t, denoted R_{S|A_G}, times the reliability of unit A, plus the reliability of the system given that unit A is in a failed state at time t, R_{S|A_B}, times the unreliability of unit A, or


R_S = R_{S|A_G} R_A + R_{S|A_B} Q_A    (25)

where Q_A = 1 - R_A.

This decomposition process continues until each term is written in terms of the reliabilities and unreliabilities of all the units. As an example of the application of this methodology, consider the system given in Figure 19 and decompose it on unit C. Then the system reliability can be written as

R_S = R_{S|C_G} R_C + R_{S|C_B} Q_C    (26)

If unit C is in an operating state at time t, the system reduces to the configuration shown in Figure 20.


Figure 20—System reduction when unit C is operating

Therefore, the system reliability, given that unit C is in an operating state at time t, is equal to the series-parallel combination shown above, or

R_{S|C_G} = 1 - (1 - R_A)(1 - R_B)    (27)

If unit C is in a failed state at time t, the system reduces to the configuration given in Figure 21.

Figure 21—System reduction when unit C fails

Then the system reliability, given that unit C is in a failed state, is given by

R_{S|C_B} = R_B [1 - (1 - R_D)(1 - R_E)]    (28)

The system reliability is obtained by substituting Equation (27) and Equation (28) into Equation (26):

R_S = R_{S|C_G} R_C + R_{S|C_B} Q_C
    = [1 - (1 - R_A)(1 - R_B)] R_C + R_B [1 - (1 - R_D)(1 - R_E)] (1 - R_C)    (29)

The system reliability is thus expressed in terms of the reliabilities of its components. Simplification of Equation (29) gives the same expression as Equation (24). The component reliabilities can be obtained using the methodologies presented in the preceding subclauses.
6.1.5.3 Cut-sets methodology

A cut set is a set of components with the property that failure of all the components in the set causes the system to fail. A minimal cut set is a cut set containing the minimum number of components that causes the system to fail: if a single unit is removed (not failed) from a minimal cut set, failure of the remaining units alone does not cause system failure. This implies that all the units in a minimal cut set must fail for the system to fail. The procedure for system reliability calculation using minimal cut sets is as follows:

a) Identify the minimal cut sets for the given system.
b) Model the components in each cut set as a parallel configuration.
c) Model all minimal cut sets as a series configuration.
d) Model the system reliability as a series combination of the cut sets, with a parallel combination of the components in each cut set.

Following a), the cut sets for the system in Figure 19 can be identified as


C_1 = {A, B}
C_2 = {B, C}
C_3 = {C, D, E}    (30)

Following b) and c), the system block diagram in terms of minimal cut sets is given in Figure 22.

Figure 22—System block diagram in terms of minimal cut sets

Using the methodologies for series and parallel systems, the system reliability is

R_S = [1 - (1 - R_A)(1 - R_B)] [1 - (1 - R_B)(1 - R_C)] [1 - (1 - R_C)(1 - R_D)(1 - R_E)]    (31)


and upon simplification (using the idempotency of the unit reliability indicators in the Boolean expansion, e.g., R_B \cdot R_B \to R_B)

R_S = R_A R_C + R_B R_C + R_B R_D + R_B R_E - R_A R_B R_C - R_B R_D R_E
      - R_B R_C R_D - R_B R_C R_E + R_B R_C R_D R_E    (32)

which is the same as given by Equation (24).
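The cut-set product of Equation (31) is straightforward to evaluate numerically. Note, however, that when units are shared between cut sets (B and C here), the direct numeric product is only an approximation, a lower bound on reliability; the exact Equation (32) is recovered only through the symbolic Boolean simplification. The sketch and values below are illustrative.

```python
def cut_set_reliability(R, cut_sets):
    """Equation (31)-style evaluation: series combination of the minimal
    cut sets, each modeled as a parallel group.  With shared units this
    direct numeric product is a lower bound on the exact reliability."""
    rs = 1.0
    for cut in cut_sets:
        q = 1.0
        for unit in cut:
            q *= 1.0 - R[unit]
        rs *= 1.0 - q
    return rs

R = dict(A=0.9, B=0.8, C=0.7, D=0.6, E=0.5)       # hypothetical values
cuts = [("A", "B"), ("B", "C"), ("C", "D", "E")]  # Equation (30)
approx = cut_set_reliability(R, cuts)             # about 0.866 (lower bound)
exact = (R['A']*R['C'] + R['B']*R['C'] + R['B']*R['D'] + R['B']*R['E']
         - R['A']*R['B']*R['C'] - R['B']*R['D']*R['E'] - R['B']*R['C']*R['D']
         - R['B']*R['C']*R['E'] + R['B']*R['C']*R['D']*R['E'])  # Eq. (32): 0.878
print(approx, exact)
```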

6.2 Fault-tree analysis (FTA)

Fault tree analysis is a technique that graphically and logically connects the various combinations of possible events occurring in a system. The events, usually failures of components or subsystems, lead to the top undesired event, which may be system failure or malfunction. The primary benefits of fault tree analysis are as follows:

a) It provides a methodology for tracking down system failures deductively.
b) It addresses system design aspects while dealing with the failures of interest.
c) It provides a graphical tool for describing system functions, as well as insight into system behavior.
d) It provides system analysis by considering one failure at a time.
e) It provides qualitative and quantitative reliability analyses of the system of interest.

There are three phases in fault tree analysis:

1) Develop a logic block diagram, or fault tree, using the elements of the fault tree. This phase requires complete system definition and understanding of its operation. Every possible cause and effect of each failure condition should be investigated and related to the top event.
2) Apply Boolean algebra to the logic diagram and develop algebraic relationships between events. If possible, simplify the expressions using Boolean algebra.
3) Apply probabilistic methods to determine the probabilities of each intermediate event and of the top event. The probability of occurrence of each event has to be known; i.e., the reliability of each component or subsystem for every possible failure mode has to be considered.

The graphical symbols used to construct the fault tree fall into two categories: gate symbols and event symbols. The basic gate symbols are the AND, OR, k-out-of-n voting, priority AND, exclusive OR, and inhibit gates. The basic event symbols are the basic event, undeveloped event, conditional event, trigger event, resultant event, transfer-in, and transfer-out symbols. For a complete list of symbols and their graphical representation, see Rao, S. S., Reliability-Based Design; Kececioglu, B. D., Reliability Engineering Handbook, Vols. 1 and 2; and Lewis, E. E., Introduction to Reliability Engineering. Quantitative evaluation of the fault tree includes calculation of the probability of occurrence of the top event, based on the Boolean expressions for the interaction of the tree events. There are several methodologies for quantitative evaluation of fault trees, some of which are as follows:

a) Minimal cut set algorithms. A cut set is a set of basic events whose occurrence causes the top event to occur. A minimal cut set is a set that satisfies the following: if any basic event is removed from the set, the remaining events are no longer a cut set. See 6.1.5.3 for more details. The MOCUS algorithm (see Kececioglu, B. D., Reliability Engineering Handbook, Vols. 1 and 2) can be used to determine the minimal cut sets for a given fault tree.
b) Dual trees and minimal path sets. A path set is the dual of a cut set: a set of basic events of the fault tree for which the top event is guaranteed not to occur if none of the events in the set occurs. A path set is a minimal path set if, when any of the basic events is removed from the set, the remaining set is no longer a path set. The dual tree of a given fault tree is the tree obtained by replacing the OR gates with AND gates and the AND gates with OR gates in the original tree. The cut sets obtained from the dual tree are the path sets of the original fault tree.


The ultimate goal of fault tree analysis is to compute the probability of occurrence of the top event. Knowing the minimal cut sets for the tree of interest, the probability of occurrence of the top event can be obtained using the structure function methodology. Assuming that each basic event has only two states, occurring or not occurring, a binary indicator variable

z_j = 1 if the basic event occurs, 0 if the basic event does not occur    (33)

is assigned to basic event j, j = 1, 2, ..., n, where n is the number of events (components) in the system. The structure function of the ith minimal cut set is

\phi_i(Z) = \prod_{j=1}^{n_i} z_{ij}    (34)

where i = 1, 2, ..., m, m is the number of minimal cut sets, and n_i is the number of basic events in the ith minimal cut set. Then, the structure function for the top event in terms of the minimal cut sets is

\phi(Z) = 1 - \prod_{i=1}^{m} [1 - \phi_i(Z)]    (35)

The probability of occurrence of the top event is calculated from

P(TE) = E[\phi(Z)]    (36)
In order to calculate the expectation given in Equation (36), the probability of occurrence of every basic event must be known. If the basic events are component failures, then the probability of a basic event is the probability of component (or subsystem) failure. These probabilities can be calculated using the methodologies presented in the previous clauses of this document. The probability of occurrence of the top event is then the probability of system failure.
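The structure-function calculation of Equations (33) through (36) can be sketched as follows, computing E[\phi(Z)] by exact enumeration over the outcomes of independent basic events. The event probabilities and cut sets below are hypothetical.

```python
from itertools import product

def top_event_probability(event_probs, minimal_cut_sets):
    """P(TE) = E[phi(Z)] (Equation 36), with phi built from the minimal
    cut sets per Equations (34) and (35); basic events are independent."""
    n = len(event_probs)
    total = 0.0
    for z in product((1, 0), repeat=n):
        # phi(Z) = 1 - prod_i [1 - prod_{j in cut i} z_j]
        prod_term = 1
        for cut in minimal_cut_sets:
            phi_i = 1
            for j in cut:
                phi_i *= z[j]
            prod_term *= 1 - phi_i
        if 1 - prod_term:                      # top event occurs for this Z
            p = 1.0
            for pj, zj in zip(event_probs, z):
                p *= pj if zj else 1.0 - pj
            total += p
    return total

# Hypothetical example: three basic events, minimal cut sets {0, 1} and {2}
print(top_event_probability([0.1, 0.2, 0.05], [(0, 1), (2,)]))  # 0.069
```

Exact enumeration is exponential in the number of basic events; for large trees, cut-set bounds or Monte Carlo estimation of the same expectation are used instead.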

6.3 Reliability of repairable systems

This subclause presents reliability analysis models for systems that can be repaired during operation, i.e., during the mission. A redundant system that contains two or more units can be repaired as long as at least one of the units is functioning while another is being repaired. Reliability analysis of repairable systems with redundancy includes several techniques, among them Markov processes. For a parallel system of N identical units, N+1 states of system existence can be identified: state N implies that all N units are operable; state N-1 implies that one unit has failed and is under repair while N-1 units are operable; ...; state 1 implies that one unit is operable and the rest are under repair, either one at a time (single repair) or more than one at a time (multiple repairs); and state 0 implies that all units have failed and the system has failed as well. For Markov process theory to apply, the unit failure rate, \lambda, and the unit repair rate, \mu, must be constant (although non-constant rates can be approximated by combinations of constant rates). A general procedure for determining the reliability of repairable systems is as follows:
a) Identify all states of system existence.
b) Determine the probability of the system being in each state at time t, considering only the transition rates one state above and one state below the state of interest. Write down the system of differential equations describing all system states.
c) Solve the system of differential equations and define the system reliability from

R_{\lambda,\mu}(t) = 1 - P_0(t)    (37)

where P_0(t) is the probability that the system is in the failed state (state 0) at time t.


There are three methods for completing the above steps that result in the differential equations for a given
system. They are the following:
1)
2)
3)

System States Analysis Method.


States Transition Matrix.
Markov Graph Method.

For illustration, consider a redundant system consisting of two identical units in parallel as shown in the following gure.

Figure 23Parallel system with repair


The constant failure rate, , and the constant repair rate, , are identical for both units. The states of the
system existence are:

State 2: Both units are operable. It is assumed that the system is in state 2 at t=0.
State 1: One unit is operable, the other is in a failed state and is undergoing repairs.
State 0: Both units are in a failed state, and the system has failed.

At any point in time, the system must be in one of the states, but it can not be in two states at the same time;
i.e., the states are mutually exclusive. The state transition matrix is given in Figure 24:

States at t

P=

States at t+ t
2
1

1 2

1
0

1 ( + )
0
1

Figure 24Markov transition matrix for the two-unit parallel system


The Markov graph for this system is given in Figure 25.

Figure 25—Markov graph for the two-unit parallel system

The system states, or graph nodes, are identified by S2, S1, and S0, and the transition probabilities are written on each branch. The description of the Markov graph for this system is as follows: the system is in state 2 (S2) at time t + \Delta t if neither of the units fails, or if a unit that had failed by time t is repaired so that both units are operational at time t + \Delta t. The system is in state 1 (S1) at time t + \Delta t if one of the units is in a failed state at time t, has not been repaired by time t + \Delta t, and the operational unit has not failed. The system is in state 0 (S0) if the second unit fails by time t + \Delta t as well. In state 0, the system has failed. The system of differential equations for the system is

P_2'(t) = -2\lambda P_2(t) + \mu P_1(t)
P_1'(t) = 2\lambda P_2(t) - (\lambda + \mu) P_1(t)    (38)
P_0'(t) = \lambda P_1(t)

Upon solving this system and using Equation (37), the reliability of the system with two units in parallel with repair is given by

R_{\lambda,\mu}(t) = \frac{s_1 e^{s_2 t} - s_2 e^{s_1 t}}{s_1 - s_2}    (39)

where

s_1 = \frac{-(3\lambda + \mu) + \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}}{2}    (40)

and

s_2 = \frac{-(3\lambda + \mu) - \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}}{2}    (41)

The methodology can be extended to more complex system configurations that include unequal failure and repair rates for the parallel units, stand-by units with perfect or imperfect sensing and switching, systems with single or multiple repairs, etc. However, application of the Markov process implies that the failure and repair rates are constant, which in some instances may not be true. Non-constant failure or repair rates lead to semi-Markov or non-Markov processes, whose application is much more complex from a computational point of view (see Ross, S., Stochastic Processes). Another approach to studying the reliability and availability of repairable systems is the application of renewal theory, which is based on the assumption that each system repair, or renewal, restores the system to an as-good-as-new condition. For more information on renewal theory, see Cox, D. R., Renewal Theory.


6.4 Monte Carlo simulation

Monte Carlo simulation is a powerful tool that enables engineers to study the behavior and performance of a complex system. It is particularly effective for systems whose performance is difficult to analyze analytically, whose performance evaluation by experiment would be long and costly, and whose component and subsystem performances are known in terms of the random variables that describe them. If the system performance parameters are known to follow certain probability distributions, the system behavior can be studied by considering many possible values of those parameters generated from the corresponding distributions. The reliability of a system can be calculated by simulating the system performance using random number generation and determining the percentage of successful system performance outcomes.

All Monte Carlo simulation calculations are based on the substitution of the random variables that represent a quantity of interest by a sequence of numbers having the statistical properties of those variables. These numbers are called random numbers. In general, random variables can be classified as continuous, discrete, and correlated random variables, and corresponding methodologies have been developed for their generation.

The first step in predicting complex system reliability using Monte Carlo simulation is to consider as many of the parameters influencing the performance as possible. The next step is to determine the random parameters (variables) and to estimate the corresponding distributions. A sample of system performance is created by generating all random performance parameters and then comparing the sample performance with the performance requirement. If the sample performance meets the requirement, it is considered successful. The process continues until a predetermined number of performance samples have been generated. The system reliability is then calculated using

R = (number of successful "experiments") / (total number of "experiments", N)    (42)

The flow diagram in Figure 26 presents the Monte Carlo simulation technique as applied to reliability estimation of a complex system.

The steps in the flow diagram are as follows:

a) Identify the random parameters of the system.
b) Assume appropriate distributions for the parameters (random variables).
c) Define the total number of experiments to be conducted, N, and initialize the experiment counter, i = 1.
d) Generate a uniformly distributed number for each random variable, and from it generate each random variable according to its distribution.
e) Using the set of generated random variables (parameters), evaluate the performance of the system.
f) Examine the system performance and determine whether the experiment is a success or a failure.
g) Set i = i + 1; if i < N, repeat from d).
h) When i = N, calculate the system reliability using Equation (42).

Figure 26—Monte Carlo simulation technique
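The flow of Figure 26 can be sketched in a few lines. The example below estimates, per Equation (42), the reliability of a two-unit parallel system with exponential unit lifetimes and compares the estimate with the analytic value. The rate, mission time, and sample size are hypothetical.

```python
import math
import random

def simulate_parallel_reliability(lam, t, n_experiments=100000, seed=1):
    """Monte Carlo estimate per Equation (42): draw both unit lifetimes,
    and count the experiment a success if at least one unit outlives t."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_experiments):
        if max(rng.expovariate(lam), rng.expovariate(lam)) > t:
            successes += 1
    return successes / n_experiments

lam, t = 1e-3, 1000.0
estimate = simulate_parallel_reliability(lam, t)
analytic = 1.0 - (1.0 - math.exp(-lam * t)) ** 2    # Equation (14), about 0.600
print(estimate, analytic)
```

The standard error of the estimate shrinks as 1/sqrt(N), which illustrates the first disadvantage listed below: each run yields only an estimate for the chosen inputs.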


There are several advantages of the Monte Carlo simulation technique for system reliability prediction. Monte Carlo simulation provides evaluation of complex, real-world systems with stochastic elements that cannot be evaluated analytically. Simulation allows one to estimate system performance under a projected set of operating conditions. Alternative system designs can be compared via simulation to see which best meets a specified requirement. Better control over experimental conditions can be obtained than when experimenting with a real system.

The disadvantages of the Monte Carlo simulation technique for system reliability prediction include the following:

— Each run of a stochastic simulation produces only estimates of the model's true characteristics for a particular set of input parameters.
— Simulation is not well suited for optimization, although it is good for comparing different system designs.
— Simulation of complex systems can be expensive and time-consuming to develop.
— The validity of the model is critical: if the model is not valid, the simulation results are useless.


Annex A
(informative)

Statistical data analysis


This annex contains several standard statistical methodologies that can be used for the analysis of reliability
data. The methods contained herein are brief synopses. Details of the methods and a greater discussion of the
theoretic bases for each are found in the references.

The concepts of complete, singly censored, and multiply censored data are germane to all life data
analysis techniques. Some techniques are more difficult than others to apply to specific data types. Complete data means all units were run to failure. Singly censored data means there are survivors and all the
survivors have the same operating (test) time on them. Multiply censored means each surviving unit may
have accumulated a different operating/test time; that is, there are multiple censoring times on the surviving
units. Field data is often multiply censored because units are installed at different times, so the survivors
accumulate different amounts of usage time.
Subclause A.1 describes graphical plotting techniques, including both probability plotting and hazard
plotting. Subclause A.2 presents an analytic technique, the maximum likelihood method, for determining distribution parameters (not a plotting technique). Goodness-of-fit techniques and tests for determining, statistically, whether the data really fits the assumed distribution are not covered, but references are provided.

A.1 Graphical techniques


Graphical plotting techniques are used to determine the parameters of the underlying failure distribution.
Once the parameters are determined, the probability of survival or failure can be estimated for various time
intervals. Making a statement about reliability that requires extrapolation beyond the longest point in time for
which data has been generated, whether failure or censoring, is a prediction. It is predictive because it estimates
the reliability while assuming that the current distribution continues for the rest of the life of the product and
that no new failure mechanisms (distributions) will occur.
Graphical methods consist of plotting data points on paper developed for a specific distribution. Linear
regression techniques are often used to fit a straight line to the plotted data. Statistical goodness-of-fit tests
are used to determine whether or not the data can be modeled using the assumed distribution; that is, a
goodness-of-fit test can help determine whether the data really should be modeled by the assumed distribution. Goodness-of-fit tests are not discussed herein, but can be found in Nelson [B10]. The advantages of the
graphical methods are simplicity and speed (ease of plotting). A disadvantage is a less precise estimate of the
distribution parameters.
When fitting data to distributions, the process is trial and error, using educated guesses as to the best candidate distributions. Specific distributions are selected as candidates based on the type of data collected. For
example, candidate distributions for time-based data (time to failure) include the Weibull and exponential. For
dimensional measurements, the normal is often tried first. For go/no-go (Bernoulli trials) data, the binomial is a
good start. The lognormal is often a good starting point for cyclic mechanisms such as metal fatigue. Other candidate distributions can be found in Table 4-3 of Leemis [B7]. If the first distribution does not fit the data,
another is assumed and evaluated until an appropriate distribution is determined. Sometimes no standard distribution adequately models the data. While more complicated and elaborate models can be developed, the
mathematics required frequently becomes very difficult. In these cases, non-parametric analysis techniques
may prove helpful.


The two types of plotting techniques discussed are hazard plotting and probability plotting. Both types of
plotting depend on the linear dependency of two variables, e.g.,

y = ax + b

(A.1)

To arrive at a linear equation as shown in Equation (A.1), the distribution must undergo a mathematical
transformation. The cumulative distribution function (CDF), which models the cumulative probability of
failure as a function of time, is linearized through this transformation. The plot then consists of plotting time
(or a function of time) on one axis and the transformed CDF on the other. Having the plotted data follow a
straight line verifies that the underlying failure distribution used to generate the axis transform is valid.

Once the plot is made, the distribution's parameters can be read directly off the chart. The transforms are different
for every distribution and are different for probability and hazard plots. Some of the transformations
are discussed briefly in the following subclauses. Equations for other distributions are available in most textbooks, such as Nelson [B10].

A.1.1 Probability plotting


A probability plot consists of the cumulative probability of failure plotted on the vertical axis and time (or a
function of time) plotted on the horizontal axis. The plotting paper usually has additional graphical scales to
permit reading the distribution's parameters directly from the plot. Probability plots are easiest to apply to
complete data. They can be applied to singly and multiply censored data, but special software tools are
often required to do the analysis.
In probability plots, the plotting positions are determined based on their rank (order of occurrence). The
most commonly used plotting position, because of its higher accuracy, is the median rank. A commonly used
estimate of the median rank is shown in Equation (A.2):

F(ti) = MRi = [(i − 0.3) / (n + 0.4)] × 100

(A.2)

where

i is the rank, and
n is the number of data points.
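Equation (A.2) is straightforward to compute in a spreadsheet or in a few lines of code. The following Python sketch (function name illustrative) returns the median rank plotting positions, in percent, for a complete sample of n ordered failure times:

```python
def median_ranks(n):
    """Median rank plotting positions from Equation (A.2), Benard's
    approximation, for a complete sample of n ordered failure times.
    Returned values are cumulative percent failed, F(t_i)."""
    return [100.0 * (i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
```

For example, with n = 5 the first plotting position is 100 × (1 − 0.3)/(5 + 0.4) ≈ 12.96%.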

The transforms for several different distributions are shown in A.1.1.1 through A.1.1.3.
A.1.1.1 Weibull distribution
The two-parameter Weibull distribution is given by

F(t) = 1 − e^(−(t/η)^β)

where

β is the shape parameter, and
η is the scale parameter or characteristic life.

The linear transform is determined by taking the logarithm of both sides twice. This results in the
following:

log ln{1 / [1 − F(t)]} = β log(t) − β log(η)

(A.3)

which is of the form

y = mx + b

This linear transform is built into the axes of Weibull probability paper, so it is necessary only to determine
the median ranks and plot them versus the time for each data point. An example is shown in Figure
A.1.

Figure A.1—Example Weibull probability plot


The parameter η can be read off the graph as the time that corresponds to a probability of failure of 63.2%,
as illustrated in Figure A.1. The shape parameter β can be calculated as the slope of the fitted line; i.e.,

β = (log ln{1 / [1 − F(t2)]} − log ln{1 / [1 − F(t1)]}) / (log(t2) − log(t1))

(A.4)

where

t2 is not equal to t1.
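When probability paper is not at hand, the same estimates can be obtained numerically by regressing the transformed median ranks of Equation (A.3) on log time. The following Python sketch (function name illustrative) assumes complete data and median ranks from Equation (A.2); the slope of the fitted line is β, and η is recovered from the intercept:

```python
import math

def fit_weibull(times):
    """Least-squares fit of Equation (A.3): regress
    y = log10 ln[1/(1 - F(t))] on x = log10 t.  The slope is the shape
    parameter beta; the characteristic life eta is the time at which the
    fitted line crosses F = 63.2% (i.e., y = 0)."""
    times = sorted(times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)          # median rank, Equation (A.2)
        xs.append(math.log10(t))
        ys.append(math.log10(math.log(1.0 / (1.0 - f))))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sxy / sxx
    # From y = beta*x - beta*log10(eta):  log10(eta) = xbar - ybar/beta
    eta = 10.0 ** (xbar - ybar / beta)
    return beta, eta
```

A statistical goodness-of-fit test, as noted above, should still accompany any such fit.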


The mean life for a Weibull is not the characteristic life, η. For a Weibull the mean life is

m = η Γ(1/β + 1)

(A.5)

where Γ(·) is the gamma function. The reliability function is given by

R(t) = 1 − F(t) = e^(−(t/η)^β)

(A.6)
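Equation (A.5) can be evaluated directly with a standard gamma function, as in the following Python sketch (function name illustrative). For β = 1 the mean life reduces to η, as expected for the exponential special case.

```python
import math

def weibull_mean_life(beta, eta):
    """Mean life (MTTF) of a two-parameter Weibull distribution,
    Equation (A.5): m = eta * Gamma(1/beta + 1)."""
    return eta * math.gamma(1.0 / beta + 1.0)
```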

A.1.1.2 Exponential distribution

The CDF for the exponential distribution is given by

F(t) = 1 − e^(−λt)

(A.7)

Taking the logarithm of both sides yields

ln[1 − F(t)] = −λt

(A.8)

which indicates that ln[1 − F(t)] varies linearly with time, t. Thus, the probability paper for the exponential
distribution is constructed by plotting the values of 1 − F(ti) on a logarithmic scale against the values of ti on
a linear scale. The process for complete data is as follows:
a)	Arrange the times for the n failed units in increasing order, so that t1 ≤ t2 ≤ … ≤ ti ≤ … ≤ tn.
b)	Determine the median rank for each failure using Equation (A.2).
c)	Select plotting paper for the assumed distribution (the axes are already transformed).
d)	Plot the points as shown in Figure A.2.
e)	Draw a straight line through the data points. Least squares can be used to determine the best-fit line.
f)	Determine the goodness-of-fit to be sure that the underlying distribution (exponential in this case) is
	appropriate.

To estimate the distribution parameter, the constant failure rate λ, draw a horizontal line at 63.2% that intersects the fitted line and then draw a vertical line that intersects the x-axis. The value at the intersection with
the x-axis is the MTTF, and the constant failure rate is the reciprocal of that value.

In the example shown in Figure A.2, the MTTF is 1000 hours and λ = 1/1000 = 0.001/hr. The reliability function is given by:

R(t) = 1 − F(t) = e^(−λt)

(A.9)
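The graphical estimate can be reproduced numerically by fitting the transform of Equation (A.8) with least squares through the origin. The following Python sketch (function name illustrative; complete data and median ranks assumed) returns both the constant failure rate and the MTTF:

```python
import math

def fit_exponential(times):
    """Least-squares estimate of the constant failure rate from complete
    data via the transform of Equation (A.8): regress y = -ln[1 - F(t)]
    on t through the origin, using median ranks for F(t).
    Returns (failure rate, MTTF)."""
    times = sorted(times)
    n = len(times)
    num = den = 0.0
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)     # median rank, Equation (A.2)
        y = -math.log(1.0 - f)        # transformed CDF value
        num += t * y
        den += t * t
    rate = num / den                  # slope of y = (lambda) * t
    return rate, 1.0 / rate
```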



Figure A.2—Example probability plot


The predicted probability of failure by a specific time can be read directly off the plot. For example, the
probability of failing by 4000 hours is 98%.

A.1.1.3 Other distributions

The normal and lognormal probability plotting techniques are performed analogously to the methods
described for the exponential and Weibull distributions. Several commercial software packages that use probability-plotting techniques are available.

A.1.2 Hazard analyses


The advantage of hazard plots is that they are easily used for complete or incomplete data, including multiply censored data. Therefore, they are appropriate for analyzing test data in which all units have failed, tests
in which only a fraction of the units have failed, and field data in which only a fraction of the units have
failed and the remaining functional units may each have a different total number of operational hours on them.
Furthermore, it is often easy to identify multiple failure mechanisms (each having its own hazard rate or
parameters) directly off the plot. As with probability plots, each distribution requires its own paper. Exponential hazard paper is semi-log, whereas Weibull paper is log-log. This is a result of the transformation of
variables described in A.1.1.

Hazard plotting does not require specialized computer software and can be performed with a simple spreadsheet application. The recommended process (see Nelson [B10]) is included here. Plotting can be done in the
spreadsheet or by hand.

a)	Order the times to failure or censoring from shortest to longest.
b)	Number the failure and censoring times from 1 to n, where n is the total number of data points (test
	articles or field units).
c)	Now reverse the order of the numbers, so that the shortest test time is numbered n and the longest is
	numbered 1.
d)	Calculate the inverse of each reversed number, h(t), ranging from 1/n to 1/1. These are the
	instantaneous hazard rates.
e)	Using only the failures, maintain a running sum of the instantaneous hazard rates. This is the
	cumulative hazard rate, H(t).

Using appropriate hazard paper, plot the time of failure on the ordinate and the cumulative hazard percent
on the abscissa for each point. If the plot approximates a straight line, then the data fit the assumed distribution. Linear regression can be used to determine the best-fit line.

If plotted on Weibull paper, the slope of the line is the shape parameter, β, and the intercept at 100% cumulative hazard is the characteristic life, η. If the shape parameter is 1.0, then the distribution is exponential and
the characteristic life is the MTBF. If the slope is greater than 1.0, then the hazard rate is increasing. If the
slope is less than 1.0, the distribution has a decreasing hazard rate. An example set of test data is shown in
Table A.1 and plotted in Figure A.3. These data show a decreasing hazard rate situation. The distribution
parameters can be read directly off the plot. The characteristic life is 300,000 hours, corresponding to the
100% value of the cumulative hazard function. For details on the theory of hazard plotting, see Nelson
[B10].
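The steps above map directly onto a short program. The following Python sketch (function name illustrative) computes the cumulative hazard points for multiply censored data supplied as (time, failed) pairs:

```python
def cumulative_hazard(data):
    """Steps a) through e) of the hazard-plotting procedure for possibly
    multiply censored data.  `data` is a list of (time, failed) pairs;
    returns the (time, cumulative hazard %) points to plot for the
    failures only."""
    ordered = sorted(data, key=lambda pair: pair[0])   # step a)
    n = len(ordered)                                   # step b)
    running_sum = 0.0
    points = []
    for k, (time, failed) in enumerate(ordered):
        reverse_rank = n - k                           # step c)
        if failed:                                     # step e): failures only
            running_sum += 1.0 / reverse_rank          # step d): h(t) = 1/rank
            points.append((time, 100.0 * running_sum))
    return points
```

For example, with units failing at 10 h and 30 h and one unit censored at 20 h, the failures plot at cumulative hazards of 33.3% and 133.3%.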

Figure A.3—Hazard plot (fitted line: β = 0.67, η = 300,000 hrs; log-log scales,
cumulative hazard 0.10% to 1000% versus time 1 h to 1,000,000 h)

Table A.1—Data for Weibull Plot

Unit	Time	Fail?	h(t)	H(t)
1	10	yes	1/1000	1/1000
2	16	yes	2/1000	3/1000
3	30	yes	3/1000	6/1000
4	50	yes	5/1000	11/1000
5	100	yes	6/1000	17/1000
6	135	yes	7/1000	24/1000
7	148	no	
8	150	yes	9/1000	33/1000
9	175	yes	11/1000	44/1000
10	200	no	
11	210	no	
12	250	no	
13	260	yes	13/1000	57/1000
14	500	yes	14/1000	71/1000
15	1000	no	

A.2 Analytical techniques


Maximum Likelihood Estimation (MLE) is frequently used as an analytic technique to estimate distribution
parameters. The advantage of analytical methods such as MLE is the accuracy of the parameter estimates.
The disadvantage is computational complexity. In the case of incomplete data, such as suspended units, left or
right censoring, or interval data, MLE is the best analytical tool.

The MLE method can accommodate both complete and incomplete data and can provide confidence limits
for the model parameters and for functions of those parameters. The MLE technique is computationally
very intensive and requires complex numerical routines. However, many commercial software packages can
perform the standard linear regression or MLE calculations and provide analytical estimates of the model
parameters. The theoretic bases of the MLE method are not presented here but can be found in Nelson
[B10].

If the data is incomplete in a non-traditional way, such as the inclusion of non-operating time periods, not
observing the effect of environmental factors, or if failure analysis indicates only the failure of a higher-level
assembly, the Expectation-Maximization (EM) algorithm can be used to calculate the MLE parameters (see
Albert and Baxter [B1]).
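For the special case of a constant failure rate (exponential distribution), the MLE has a simple closed form even with multiple censoring: the number of failures divided by the total accumulated unit-time. The following Python sketch (function name illustrative) implements that case; Weibull and lognormal MLEs generally require the numerical routines mentioned above.

```python
def exponential_mle(data):
    """Closed-form MLE of a constant failure rate from possibly multiply
    censored data: lambda-hat = failures / total accumulated unit-time.
    `data` is a list of (time, failed) pairs; censored units contribute
    their survival time to the denominator only."""
    failures = sum(1 for _, failed in data if failed)
    if failures == 0:
        raise ValueError("no failures observed; the rate cannot be estimated")
    total_time = sum(t for t, _ in data)
    return failures / total_time
```

For example, two failures at 100 h and 200 h plus one unit censored at 700 h give λ-hat = 2/1000 = 0.002 per hour.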


Annex B
(informative)

Bibliography
[B1] Albert, J. R. G., and Baxter, L. A., "Application of the EM Algorithm to the analysis of life length data,"
Applied Statistics, vol. 44, no. 3, pp. 323–341, 1995.

[B2] IEC 60812 (1985): Analysis Techniques for System Reliability–Procedure for Failure Mode and
Effects Analysis (FMEA), pp. 29–35.29

[B3] IEEE 100, The Authoritative Dictionary of IEEE Standards Terms, Seventh Edition.30

[B4] IEEE Std 1220-1998, IEEE Standard for Application and Management of the Systems Engineering
Process.31

[B5] IEEE Std 1413-1998, IEEE Standard Methodology for Reliability Predictions and Assessment for
Electronic Systems and Equipment.

[B6] JEDEC JEP 131-1998: Process Failure Mode and Effects Analysis (FMEA).32

[B7] Leemis, Lawrence M., Reliability: Probabilistic Models and Statistical Methods, Prentice-Hall, Upper
Saddle River, New Jersey, pp. 230–247, 1995.

[B8] MIL-STD-1629A: Procedures for Performing a Failure Mode, Effects and Criticality Analysis (Canceled), 1980.33

[B9] Modarres, M., What Every Engineer Should Know About Reliability and Risk Analysis, Marcel Dekker, 1993.

[B10] Nelson, Wayne, Applied Life Data Analysis, John Wiley and Sons, New York, 1982.

[B11] Pecht, M., Product Reliability, Maintainability, and Supportability Handbook, CRC Press, New
York, New York, 1995.

[B12] U.S. Department of Commerce, Questions and Answers: Quality System Registration ISO 9000 Standard Series, U.S. Department of Commerce, Washington, D.C., 1992.

29 IEC publications are available from the Sales Department of the International Electrotechnical Commission, Case Postale 131, 3, rue
de Varembé, CH-1211, Genève 20, Switzerland/Suisse (http://www.iec.ch/). IEC publications are also available in the United States
from the Sales Department, American National Standards Institute, 11 West 42nd Street, 13th Floor, New York, NY 10036, USA.
30 The IEEE products referred to in this standard are trademarks belonging to the Institute of Electrical and Electronics Engineers, Inc.
31 IEEE publications are available from the Institute of Electrical and Electronics Engineers, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331, USA (http://standards.ieee.org/).
32 JEDEC publications are available from JEDEC, 2001 I Street NW, Washington, DC 20006, USA.
33 MIL publications are available from Customer Service, Defense Printing Service, 700 Robbins Ave., Bldg. 4D, Philadelphia, PA
19111-5094.
