
PROCEEDINGS OF THE IEEE, VOL. 62, NO. 2, FEBRUARY 1974, p. 185

The Reliability of Semiconductor Devices in the Bell System


D. STEWART PECK, FELLOW, IEEE, AND CONRAD H. ZIERDT, JR., FELLOW, IEEE

Invited Paper

Abstract-The history of reliability of semiconductor devices in the Bell System is a story of sequential application of principles: first, of reliability-enhancing processing principles, which are applied in the absence of completely knowledgeable design and testing control; second, of testing-control principles, which follow growing knowledge of process defects and their susceptibility to exposure by test; third, of design-control principles, which derive from mature processing and the essentially automatic elimination of workmanship defects. These principles were generally applied to the design and reliability technologies of successive decades of semiconductor-device manufacture. The 1950s saw the development of new device designs with no really severe reliability requirements; a history of this period shows, in fact, that semiconductor devices were better than reasonable testing specifications could prove they were, although one of the reliability advances of that period was promoted by Bell Laboratories: the use of a sampling technique which provided the user a statistical guarantee of the reliability he was buying. The 1960s saw the implementation of, first, the manufacture of diffused devices capable of high-stress testing and, second, the use of high-stress burn-in and life-testing controls which were rigorously relatable to expected reliability at the low stress of application. The motivating forces behind this high-stress testing were the need for low-failure-rate devices and the recognition that such failure rates, of the order of 0.001 percent per thousand hours, could not be proved by life tests at the stresses of the application. The technology of the 1970s is that of the beam-lead sealed-junction device, representing elimination by design of the major failure mechanisms of the devices of the 1960s. Although the elimination of the need for a hermetic seal raised the additional need for demonstration of the ability of a device and its plastic coating to survive in a high-humidity environment, this demonstration was also found to be amenable to physically based relationships between humidity and temperature conditions and device life. The successful development of a relationship between effective device test requirements and system life objectives is attributed to the Bell System interrelationships by which the device designer is responsible both to the user (the system designer), with whom he has agreed upon a reliability objective, and to the device manufacturing organization, to which he owes the most economical combination of processes, process controls, and final testing requirements to meet the system objective.

Manuscript received June 25, 1973; revised August 15, 1973.
The authors are with Bell Laboratories, Allentown, Pa. 18103.

I. INTRODUCTION

THE RECENT 25th anniversary of the invention of the transistor has emphasized the many advances in electronics made possible by semiconductor devices, and the many complex systems made viable through the small size, cost, and weight, and the high reliability of these products. The Bell System itself has certainly shared in making use of these advantages and, since the introduction of the transistor into manufacture, some two billion transistors, diodes, and integrated circuits have gone into operation in the Bell System, and they have a major position in determining the performance and reliability of today's telephone system.

While all the performance advantages of these devices serve the various Bell System equipment designs in varying degrees, the reliability factor becomes more critical than it is for many device users. The size and complexity of some of the equipments such as, for example, a large Electronic Switching System, is such that the system design and the device reliability are clearly interrelated. The system design must provide the necessary features to assure acceptable overall customer service while tolerating the part failures expected for a known part reliability. In such cases subsystem, circuit, or part redundancy may be required in order to maintain continuous functioning, and the necessary fault indication and correction capability must be provided to keep repair time sufficiently low and thereby have an acceptably low probability of failure of a system or circuit which is temporarily without its redundant back-up. In other Bell equipment, temporary outages may be tolerable, and the reliability consideration may emphasize overall maintenance economics. In the other extreme, such as that of a communications satellite or a submarine cable, repair is either impossible or extremely expensive and time-consuming, redundancy may not be feasible because of either economic or technical considerations, and an extremely low failure rate for the component parts must be obtained to achieve a viable system. For such cases, many organizations concerned with communications in earth orbit or with data transmission from deep space face severe device reliability problems for (usually) relatively small numbers of devices. Nearly as difficult reliability requirements exist, however, in many telephone systems because of the large numbers of devices used.

Because of the factor of equipment size and complexity, a second feature for consideration is the importance of tradeoffs between device reliability and device manufacturing cost, and between system tolerance for device failure and system manufacturing and operating cost. With the multiplying factor of large device usage, device cost is obviously a critical consideration, but so also is the cost of system repairs due to device failures. After the circuit designer has done his best to provide performance margins and has taken other precautions to minimize stress on the devices, he may still be faced with a probable failure rate which is too high for acceptable performance. Now his ability to decide whether he can afford (or, indeed, can even get) higher device reliability, or whether he must make other changes in circuitry or system configuration, can be improved greatly by the close organizational relationship between equipment design and device design in Bell Telephone Laboratories, and device manufacture in Western Electric Company, which exists in the Bell System. The additional association with the operating companies which must maintain the equipment provides for automatic consideration of the total life-cycle cost of a device (including replacement cost and the impact, if any, of the cost of equipment outage) in the tradeoff between device cost and reliability.

The third factor, which is implied in the preceding discussion, is the necessity of knowing what the device failure rate


in the system will be and, most important, of knowing it at the time of the system design. The following advantages are apparent:

1) The system design can be optimized with respect to the need for redundancy, margin testing, repairability features, alarms, and all such features relevant to overall system availability.

2) The planned device cost will include the marginal costs related to process controls, environmental screens, and life tests necessary for knowing, hence guaranteeing, the device reliability. Thus the system cost estimates will be consistent, from the beginning, with its reliability objective, and valid judgments can be made regarding the economic viability of the system.

3) For large systems, many years may elapse during completion of design, prototype building and testing, and building of production systems before, finally, the initiation of customer service. Only in the final service environment can the eventual reliability result be measured, and in the meantime large numbers of devices and much equipment will have been built. If it is discovered at this late date that reliability is inadequate, the correction can be very, very expensive. For example, commitments were made as to the reliability of the devices for the #1 Electronic Switching System (ESS) (using of the order of 50 000 transistors and 100 000 diodes per system) in 1960; the first system was cut over into service in March, 1966, and it was a year later before removal data were available to confirm that the semiconductor device failure rates were as good as (actually better than) expected. During this interval, some 50 million transistors and diodes were manufactured for use in this system.

Since, as in the case cited, the devices for new large systems are frequently at the state of the art in performance and are untried, there are no field reliability data available at the time of system design as guides for estimating the expected performance. The absolute importance of the need to have a guaranteed maximum failure rate, however, and to know how to improve it when necessary, is in no way diminished for new device designs, and the present status of the reliability of semiconductor devices in the Bell System has depended upon the development and application of relevant tests for early and accurate evaluation of new designs, upon studies of failure mechanisms and corresponding production controls, and, fundamentally, upon design to avoid uncontrollable failure mechanisms.

The Bell System has, since its inception, placed strong emphasis on component-part reliability and on equipment-design disciplines so that customer service could be provided essentially without interruption or malfunction. In earlier years, the selection and evolution of design and manufacturing practices were largely based on experience: those which had a history of good past performance were continued, and innovations were introduced only after exhaustive laboratory testing and field trials which frequently required several years for completion. This methodology depended, in large measure, on the retention of part design and processing experience in Bell Laboratories and the concomitant manufacturing experience in the Western Electric Company. Two of the several factors which have worked to obsolete this slow-but-sure procedure are as follows:

1) The extremely rapid pace of technology changes, requiring rapid accommodation to maintain economy in new services required to meet the growing demand.

2) The cost of error when the necessarily larger new systems are committed to design, manufacture, and installation with increasingly shorter time periods and increasingly newer device technologies.

The extent of the problem underlying the concerns over device reliability previously indicated can be put into focus by taking a brief look at some numbers. The earliest transistors, such as the point-contact and grown-junction devices, were being compared to electron tubes with average lives of perhaps 10 000 h, so that a failure rate even as high as 2 percent/1000 h (an average life of 50 000 h) would still look quite good. By 1960, however, the needs for large systems, using diffused silicon devices, appeared much different. For the #1 ESS [1], for example, in order to achieve an acceptable mean time for repair (with low probability of concurrent failure of redundant subsystems) it was required that the semiconductor devices have failure rates between 0.0001 and 0.001 percent/1000 h (from 1 to 10 failure units, or FITs).¹ At the same time, in another smaller switching system it was found that, although 200 FITs were originally assumed for the transistor failure rate, a reduction in the guaranteed failure rate to 10 FITs simplified planned maintenance procedures and significantly reduced the cost of the alarm and test facilities. (The severity of a 10-FIT requirement is illustrated by the fact that it represents only one failure per month in a population of approximately 140 000 devices.)

¹ The possibility of error or confusion becomes greater when dealing with small fractions like 0.001 percent, and the accepted failure-rate term in the Bell System has become the failure unit, or FIT, equal to one failure in 10^9 device-hours. Where a distinction is made between removals and actual failures, the removal rate is stated in removal units or RITs: one removal in 10^9 device-hours.

At one extreme from these large-system requirements are the small equipments and power supplies where the transistors or integrated circuits (ICs) (frequently purchased from outside suppliers) have reliability requirements of 200 and up to 2000 FITs or more. At the other extreme are the submarine-cable transistors and diodes for which, because of extremely high system reliability requirements and very stringent limits on performance over very long periods of time, failures can hardly be contemplated and failure-rate limits can hardly be stated. This special subject is covered in a companion paper by Miller.*

* This issue, pp. 230-244.

If system-complexity requirements had remained static, the introduction of ICs would have allowed a higher failure-rate objective for each silicon chip used, since each IC chip replaced many discrete devices. Further, early accelerated tests indicated that moderately complex IC chips were capable of reliability approaching that which could safely be predicted for a single discrete transistor. This welcome confirmation of reliability judgment, based on failure mechanisms, opened new system-complexity horizons (albeit with substantially different function-partitioning concepts) and very expanded IC size and complexity as well as communication system objectives and the number of system functions. Present-day failure-rate requirements for many IC chips, therefore, are still found to be in the 10- to 30-FIT range, with a strong motivation toward the order of 1 FIT for next-generation systems.

The range of realizable IC functions is considerably extended by interconnecting silicon chips (containing primarily active devices such as transistors and diodes) with passive components (resistors, capacitors, and inductors) made of


other materials, in hybrid ICs. System density and high-frequency performance can also frequently be increased by interconnecting a number of silicon chips on a single substrate mounting to form multichip ICs. In either hybrid or multichip devices, reliable interconnecting technology which will permit short interchip conductor runs is necessary; further design freedom is obtained if these interchip conductors can be made with electrically insulated crossovers or multilayer techniques. It has, therefore, been essential to most efficient system design that the passive-part, interconnection, and substrate technologies be developed to a degree of reliability consistent with that of which the silicon chips are capable.

In Section II of this paper some of the history of early device designs of the 1950s is reviewed briefly, providing some background for appreciation of the modern reliability requirements. Section III considers the device technology, largely of the 1960s, when the 10-FIT reliability needs were met by the development of accelerated-stress testing, with the concurrent emphasis on control of failure mechanisms. Section IV describes the success of design for reliability as demonstrated in the beam-lead sealed-junction technology, and Section V attempts to evaluate the impact of organizational relationship on the advances seen in semiconductor device reliability.

II. THE 1950s: EVOLVING DESIGNS AND TECHNIQUES

Active product development of transistors and junction diodes, within the Bell System and in other organizations, started very early in the 1950-1960 decade. By late 1951 point-contact, grown-junction, and alloy-junction structures had been disclosed; soon thereafter the surface-barrier (SBT) structure entered the field, and in 1955, Bell Laboratories announced diffused-junction devices. One of the earliest applications of transistors in the Bell System, however, was that of a phototransistor and a point-contact amplifier transistor driving a cold-cathode gas tube in equipment used for customer identification for toll calls.

In the mid-1950s the alloy and SBT performance capabilities were established in the industry, and semiconductor system complexity had increased to levels which made device reliability an increasingly important consideration. This reliability interest received strong stimulus from the concurrent discovery by government organizations that electronic equipment was not, in general, very dependable and that its maintenance costs far outstripped the initial purchase costs; requirements for reliability demonstration began to appear in equipment contracts. The Bell System, with its built-in incentive of maintaining its own equipment, was proceeding rapidly in advancing device reliability and technology. The development of semiconductor-device reliability improvement took the parallel paths of:

1) improving device reliability by design and testing;
2) searching for ways to measure and to prove either absolute or relative device reliability.

A primary reason for device instability was found to be

The first devices made were encapsulated in a variety of plastic materials; humidity testing (and even shelf life) soon indicated that these materials could not adequately control the device-surface conditions, and solder-sealed metal cans with glass-insulated feed-through terminals were first employed to protect the devices from ambient atmospheres. Difficulty in obtaining dependable hermetic sealing by soldering led to the evolution of precision glass-to-metal headers on which transistor elements were mounted, with final hermetic sealing accomplished by resistance-welding of a cap to the header. At the same time, the terminal configuration was changed from the in-line arrangement inherited from subminiature electron tubes to the now-familiar triangular configuration. During the same period, diodes and rectifiers underwent similar changes.

Life testing still produced large changes in device characteristics, even when they were truly hermetically sealed. Studies [2] of germanium surfaces exposed to various atmospheres revealed the significant effect of water vapor on device performance, and the cause of life instability was traced to the adsorbed water film on the device housings, which could not be sufficiently removed by heating before sealing because of the low melting points of the alloy-device elements. Some moisture level was desirable in order to provide optimum performance, and pretreatment of moisture getters (e.g., porous quartz, calcium or aluminum silicate zeolites, barium carbonate) was optimized, yielding at last (in the late 1950s) devices which were relatively stable and with good performance.

The recorded reliability history of the devices of this era results from the early point-contact types and from an extensive trial of alloy germanium transistors which took place in the later 1950s at Morris, Ill., in a prototype model of the present #1 ESS. Table I summarizes some of this early experience within the limitations of available records. It can be seen that, although these failure rates are high by present standards, they are low compared to the then-common electron tube failure rates, and hence show early evidence of the reliability potential of this new semiconductor technology.

In contrast to this early picture, a 1965 field study of a p-n-p alloy germanium transistor disclosed a 20-FIT failure rate, and a more recent review of several applications of this transistor in a digital transmission system shows a range of removal rates for the different circuits in which the transistors are used, up to several hundred RITs, but with 50 percent of the applications below 30 RITs and 25 percent below 10 RITs. The effect of variations in the severity of the applications within one system is clear. Since the failure rate is normally much lower than the removal rate, as discussed in detail in a later section, it is strongly suggested that the reliability of products of this early technology has been improved significantly in subsequent years, largely due to continued improvements in processing.

With respect to diodes, unpassivated diffused-silicon diodes were being introduced during the latter 1950s into various Bell System applications. Removal rates were observed to range from 10 to 200 RITs depending upon the severity of the

TABLE I
FAILURE RATES (IN FITs) OF EARLY SEMICONDUCTOR DEVICES AT TIMES OF OBSERVATION IN SYSTEM USE
DEVICE TYPE
YEAR OF OBSERVATION: 1952 1954 1957 1958 1959 1961
170


PHOTOTRANSISTOR 1900

640

POINT CONTACT
TRANSISTOR

1000

650

ALLOY GERMANIUM TRANSISTOR pnp

- LOGIC - OTHER npn - LOGIC - OTHER

1200 600 200 170


2000

360

1000 400
2000 1200

use condition. Analysis of the failures disclosed that they usually came from low-strength portions of bimodal failure distributions. Manufacturing development devoted to product improvement of these devices, even into the 1970s, has resulted in a gradual elimination of the low-strength portion of the life distribution. Studies of failure rates of such devices in a variety of apparatus have disclosed removal rates ranging from 1 to 30 RITs depending on the use condition.

In addition to the development of processing improvements, developments in testing to assure reliability were pursued. A contribution by Bell Laboratories was the concept [3] that a life-test sampling plan should give the user a high degree of assurance that the specific lot of product which he buys will not have a higher failure rate than that specified (for the given life-test condition). This was a departure from the type of sampling traditionally used for electron tubes which, while it tended to control the average quality level, could allow rather wide variations in reliability in individual lots. The concept was incorporated into the basic military specifications, at the request of device users who recognized its benefit to their interest. As an example of this new concept, for a 10-FIT product, the life test should demonstrate that the failure rate would not exceed 10 FITs with a 90-percent statistical confidence, if the life test is run at the stress condition at which that failure rate is required. In this case, 23 000 devices would have to be life tested under typical use conditions for 10 000 h (or 230 000 for 1000 h) with no failures, to meet the 10-FIT requirement. Even a 1000-FIT requirement would require 2300 devices to be tested for 1000 h; this is still a much larger sample size than is considered practical for a lot-by-lot test.

These sobering figures indicated that an approach other than use-condition life testing was necessary if reliability was to be assured at reasonable cost, and some approaches toward assurance of high reliability, in the industry, turned toward the following methods:

a) Prediction models based on the Weibull distribution which would predict lower failure rates at later times than the life test time, reflecting the consensus that failure rates did or should decrease with time, at least in early life.

b) Application of assumed or data-based acceleration or derating factors to give low failure-rate estimates for lower stress use conditions based on a much higher failure-rate limit invoked by specification for life testing of a relatively small sample at the maximum rated stress condition. Requirements at ratings were typically in the range of 50 000 to 100 000 FITs, although one as low as 5000 FITs was used, with a minimum sample size of 1065 units (allowing two defectives).

Although both of these methods are still in wide use today for equipment reliability calculations, they offer no assurance in themselves that a given batch of devices will react consistently with the models. Even when data-based rather than assumed, they are historical in nature and therefore inapplicable to new products.

In general, then, the reliability assurance techniques available for the products prevalent up to the end of the 1950s could not be practically used to provide guaranteed (rather than estimated) reliability of the level which those products have demonstrated in actual use.

III. THE 1960s: HIGHER RELIABILITY ASSURANCE

The changes in the general status of semiconductor device reliability from the 1950s to the 1960s stemmed from three major factors which evolved through the late 1950s. These were

1) the need for increasingly higher reliability;
2) the development of structures capable of being tested at high stress levels, with respect to temperature and mechanical stresses;
3) the increasing evidence that, through improvements in device design, materials, and process control, transistors could be extremely reliable if workmanship defects could be controlled.

Encouraged by this growing evidence of semiconductor-device reliability and the relatively small size and low cost of semiconductor systems, system planners thought more and more in terms of large systems, using large numbers of active devices, with a sufficiently low failure rate requirement for the devices to provide a satisfactory system availability. As Bell System plans developed, it became apparent that a 10-FIT reliability requirement was to become the objective of this period, at least for large systems for telecommunication switching or data handling applications. A data processor with 50 000 transistors, for example, would build up to nearly 5 × 10^8 device-hours in one year, so that a 10-FIT transistor failure rate would cause approximately

$$\left(\frac{10}{10^{9}}\ \text{failures/device-hour}\right) \times \left(5 \times 10^{8}\ \text{device-hours}\right) = 5\ \text{failures/year}.$$

While this could be considered acceptable, a failure rate of 100 FITs (in comparison) would result in nearly 50 failures per year or nearly one per week, due to only one component type out of many. Depending on the system redundancy provisions (to allow repair without interrupting service), the 100-FIT rate could easily be found to be unacceptable, and generally was. Certainly the 10-FIT reliability level was of the order required.

The second factor of importance was the development of diffused transistor structures, which alleviated the earlier limitations on the temperature to which devices could be exposed without structural damage or change in state of materials or material interfaces. This led first to diffused germanium mesa structures and later to silicon mesa and planar devices. A part of this achievement was also the development


of thermocompression⁴ bonding [4] of small wires to metal contacts on the transistor structure to provide connections to the external leads through the encapsulation.

⁴ Thermocompression bonding consists of applying pressure (sufficient to cause severe plastic deformation in the materials) at elevated temperature to a junction of ductile metals, to obtain a molecular bond between them. When well made, such bonds are as strong as the metals.

The third evolutionary factor was the dramatic success of development and manufacturing-engineering work in improving device reliability through better material and process choices and better control of processing. Numerous problems of control of silicon-crystal structure, chemical purity and cleanliness, metallurgical microstructure, thermocompression bonding, etc., were overcome despite the fact that the necessary measurements could not be made directly (available equipment was not sufficiently sensitive) and variations could only be detected by their effects on device characteristics. Meticulous bootstrap development in each material and process area, plus the interchange of information stimulated by the "physics of failure"⁵ concept, yielded device designs and manufacturing practices which dramatically improved reliability.

⁵ The "physics of failure" concept describes a feedback-loop discipline whereby failed devices are analyzed to establish the causes of failure (failure mechanisms) and this information is fed back to generate suitable controls for each failure mechanism by design changes, or process control (e.g., process monitoring, inspection, operator training, and motivation, or special tests on completed product).

As the inherent causes of device unreliability were isolated and controlled, another factor began to assume a dominant role in determining the actual reliability. In increasing proportion, device failures were found to have been initiated by workmanship defects, which originated basically from the stringent demands of new designs on the capabilities of human workers and the then-available machines. Mechanically, the minute size of the device elements and the slow evolution of new machinery for such things as chip-handling, photolithography, and wire-bonding left devices vulnerable to a variety of workmanship defects. Those which were related to yield were recognized quickly; those related to reliability required growth of process-physics studies relating failure analysis back to process defects. Such studies initiated the concept, and led to the growth, of reliability-physics work. While the reliability-physics approach engendered many reliability improvements via better designs, processes, and measuring techniques, it could not practically solve the problem of defects which occurred infrequently and were of random rather than systematic origin or nature. The final result, for this era of devices, was the recognition that, no matter how stringent the process control sampling and the extent of efforts to build reliability into the product, it was sometimes more economical and frequently more effective to test (to screen) the final product to eliminate workmanship defects than to attempt to eliminate them from the process. The judgment for each type of defect depended on the state of the art of either approach at the time. For example, even to this date there is no process-control technique equal to the effectiveness of the long-established leak-testing techniques for 100-percent assurance of hermetic seals.

These factors provided the impetus and the opportunity for the development of a completely different approach to reliability assurance, not limited to massive test programs as required for brute-force demonstration of low failure rates at low-stress conditions, and not limited to the long waiting time necessary to determine, in actual application, the effect of a presumed improvement, as was the problem with electron tubes. The way was now clear for an examination of the effect of life testing at higher and higher stress levels to determine how much a given failure mechanism of interest could be accelerated while maintaining a physically consistent and mathematically definable relationship among results.

Fig. 1. Life test data on diffused germanium transistors at high temperatures. In this plot of logarithm of time-to-failure versus a normal probability scale, a straight line represents a log-normal failure distribution. (Axis label: cumulative failure, percent.)

Early Accelerated-Stress Tests

The early control of reliability of diffused germanium mesa devices was achieved largely through tight control of those process steps which were understood to have an effect on probable failure mechanisms. These included items such as:

1) cleanliness of rinse of the final emitter etch;
2) thermocompression-bond strength;
3) control of water vapor in the final encapsulation.

Details of these efforts will not be discussed here, since more of our attention is drawn later to similar efforts on silicon, which is more to the point in the 1960 era. The development of 10-FIT reliability requirements, however, for a diffused-germanium transistor brought about the decision to examine at least the temperature-acceleratable failure mechanisms by supplementing the specified 100°C temperature-storage life testing with additional tests at 150°C and 200°C [5]. At about the same time, the development of step-stress testing [6] indicated that the distribution of time-to-failure at constant stress for semiconductor devices should be log-normal, i.e., Gaussian in logarithm of time-to-failure. (The log-normal distribution, accelerated-stress testing, including step-stress tests, and failure rate calculations are discussed in Appendix A.) This provided the technique for handling the test results, which are shown in Fig. 1. These tests were made on a production basis and in large numbers and, although a limited number of test periods were used, the fit to the log-normal distribution is very good for the 150°C and 200°C data. (With less than 1-percent failures in the 100°C test and only two datum points, the question of a distribution fit could not be resolved, but a distribution similar to that of the other data was assumed.) It


was recognized that such accelerated-stress data could be used for comparing one process with another, with respect to expected reliability, and Fig. 2 shows such a comparison for these diffused germanium transistors. In all cases the transistors were tested to electrical-characteristic requirements which matched the circuit application, so that the extrapolation would be most applicable. It was observed that the failure mode was consistent throughout the temperature range, indicating a valid life acceleration. Failures in subsequent equipment operation were also of the same type and showed a degree of consistency with the accelerated-test data, in that product of the ungettered process had a 17-FIT failure rate as measured over the initial 2-year period, and, in later equipment, product of the gettered type showed a 3.5-FIT failure rate in a similar operating period. The equipment operation was observed for quite limited times and probably reflected some infant mortality in the product, but at least demonstrated the same sort of reliability comparison as was indicated by the high-stress tests. Other observers [7] have seen differences between failure modes of diffused germanium transistors under temperature stress or power stress, as has also been seen in alloy germanium transistors. This was not observed in this case, and an examination of a number of these transistors after some ten years in service showed the same type of parametric changes indicated by the high-temperature tests.

Fig. 2. Comparison of diffused-germanium transistor processes by means of Arrhenius plots of median lives from accelerated-stress data. See Appendix A for a comparison of constant-stress and step-stress tests and for a discussion of the Arrhenius plot.

Hermetically Sealed Silicon Devices and Their Failure Mechanisms

While they were in very active development and limited use during the late 1950s, silicon devices moved into the technological forefront through the early 1960s. Their inherently higher operating-temperature capabilities, the stability of the silicon-dioxide insulating layers which could be readily formed on their surfaces, and a number of simplified processing factors (e.g., oxide masking of diffusants) drew development effort away from germanium devices toward silicon, although material problems had to be overcome in the process. Initial work was done on mesa structures,6 and although the planar structures replaced them for a number of applications, it was found that similar reliability results could be obtained with either design.

6 In the mesa design the collector-base junction terminates at the edge of the chip where a SiO2 insulating layer is not readily formed. In planar devices all junctions terminate on the top surface of the chip, where protective SiO2 layers are readily formed and are not disturbed during the slice-separation operation.

Laboratory testing disclosed a host of failure mechanisms to which the silicon diffused device technology was heir. The following were the most significant:

1) Surface Instability: The mesa silicon devices were found to be quite as prone to surface-originated electrical characteristic instabilities as germanium devices had been, and the early opinion that planar silicon devices would be completely surface passivated by their SiO2 layers over junctions, and therefore immune to their surroundings, was soon proven incorrect. The planar stability problem was shown to be caused by contamination of the silicon-oxide passivation by chemicals which, by providing mobile ions, caused the silicon surfaces to conduct because the ions inverted the bulk-silicon conductivity just beneath the oxide.

2) Housings: Although glass-to-metal sealed housing quality was steadily improved, close control of seal and cap welding quality was required to prevent air leaks. Plastic encapsulations were eventually introduced and will be discussed later.

3) Chip Bonds: Bonding of the chip body to the housing required both a good mechanical bond and a nonrectifying low-resistance electrical contact, because the chip bond also provided the electrical contact to the collector region. Industrial suppliers experimented with hard solder, soft solder, conducting plastics, and similar materials, but all had mechanical limitations (such as bond fatigue) except, perhaps, in unique applications. The best known process, eutectic Au-Si bonding to a gold-plated header surface, served best but still required close control of contact wetting through the monitoring of thermal resistance between chip and housing.

4) Wire Bonds: Thermocompression-bonded wires were used to connect the active chip regions to the terminals of the device housings. With a complete wedge bond in which the wire is nearly flattened with the bonding tool point, a multitude of factors (e.g., improper temperature or pressure, tool wear or improper form) produced weak bonds which later failed in service. When gold wire was bonded to aluminum metallization, gold-aluminum intermetallic compounds which were brittle or electrically resistive were formed and, with poorly controlled bonds, open-circuit early failures were caused.

5) Particulate Contamination: Control of particulate contamination within the very small device housings could be particularly difficult: particles of practically every material used in or near the devices found their way into sealed devices, causing permanent or intermittent short-circuits.

6) Aluminum Metallizations: In the early era of diffused devices terminal wires were connected to the chip by bonding them to evaporated-aluminum contact areas placed directly over the active regions. The development of higher frequency (hence smaller junction) designs led to placing the terminal pads over the SiO2 layer on the chip surface to obtain adequate bonding areas, and connecting them to the active device regions via narrow evaporated-aluminum conductors. It was then found that open-circuit failures in the aluminum conductors occurred because of electromigration of the metal at high current densities: this problem was compounded by the fact that simple aluminum-evaporation technique frequently covered the edges of steps in the underlying SiO2 layers very poorly, causing excessively high current densities in these

areas although the designed metal cross sections may have been adequate. Poor adhesion of the aluminum metallization to the SiO2 surfaces was also frequently encountered, contributing to electromigration. The aluminum metallization system was also found to be prone to galvanic corrosion in the presence of very minute amounts of water vapor enclosed in (or leaking into) device housings and of very small amounts of certain chemical contaminants such as chlorine.

This listing only suggests the major areas of processing and material usage where opportunities for device failure are present. In most of these areas there are many variations of the class of defect, depending on specific design or on processing techniques, and these variations tended to grow as the devices became more complex. Comprehensive reviews are available in the literature [8].

Reliability Control by Design and Process Control

The principle of controlling reliability of devices for the Bell System, prior to the application of accelerated-stress testing, was by using designs and process controls aimed at minimizing the probability of occurrence of the known failure mechanisms. The Western Electric devices of the early 1960s, then, included the following special features:

1) Thermally bonded glass-metal housings, with 100-percent Radiflo7 leak testing for protection against lack of adequate manufacturing process controls on housing leaks.

2) Use of a constrained wedge bond, by which only the edges of the wire are flattened, the center of the bonding tip having a notch which permits most of the wire to retain nearly its full thickness while still providing the necessary pressure on the juncture of the wire and the metal contact on the chip to obtain a high-strength thermocompression bond. Tests had made it clear that a bond of sufficient strength could withstand the growth of gold-aluminum compounds, unless they sufficiently weakened the heel of an overly crushed plain wedge bond. With the added heel strength of the full diameter provided by the constrained wedge bond the heel weakness was eliminated. As was demonstrated later, bond failures at any temperature stress were then a small proportion of the total failures.

The refinement of the ball-bonding technique provided an adequate alternative process. Here the end of the gold connecting wire is melted and balled by an intense flame, providing a relatively larger surface which can then be pressed against the bonding pad to form the thermocompression bond. This avoided the weak heel of the wedge bond and provided the capability of reliability comparable to that of the constrained wedge bond.

3) Although not required in the normal Bell System application, devices intended for equipment which was subject to vibration were coated (after wire bonding) with deposited SiO2, in order to insulate against shorts from metallic particles.

4) The possibility of leaving moisture in the sealed device was initially avoided by vacuum-pumping through a tubulation on the top of the transistor envelope, with appropriate high-temperature baking on the pumps, but this was soon replaced by welding the enclosure, after a drying bake, in a nitrogen atmosphere with tight controls on humidity (<20 ppm) and contaminating gases. This controlled the influence of humidity on silicon surfaces as well as on the aluminum metallization.

7 Radiflo leak-testing involves exposing devices to a pressurized radioactive gas, then examining each device for the presence of radioactivity which would indicate a leak in the housing. It can be considerably more sensitive than other usual methods depending on the pressure and time of exposure.

Through such design and process control means most of the obvious device weaknesses were counteracted. Since the major applications were in electronic switching or other low-power uses, the problems of temperature cycling8 which tend to arise in larger structures were not of significance. Electromigration in aluminum conductors was not a problem with lower frequency devices, in which wires were bonded directly to the chip contact areas. For higher frequency planar transistors, the possibility of thin metallization at steps in the silicon oxide surface had been foreseen, and deposition of aluminum from multiple sources, to give good oxide-step coverage, was the standard production technique used in addition to generous conductor area design. Corrosion of aluminum by water vapor and chlorine or other residues was largely controlled by process control of the final sealing atmosphere, but corrosion still occasionally appeared, and this failure mechanism remained a concern.

8 When large-area devices are temperature-cycled (either externally or by cyclic power-dissipating operation) the chip bonds tend to be fatigued by stress induced by thermal-expansion-coefficient mismatch between the chip and the mounting surface. Careful design and manufacture are necessary to eliminate this problem.

As in the case of the transistor, the design of diodes for the Bell System in the decade of the 1960s was strongly influenced by the necessity of avoiding known device failure mechanisms. For example, a particular diode for high-speed switching application in the #1 ESS was designed during the early 1960s when the understanding of the effect of alkali ions on device stability was beginning to emerge. Considerations influencing choices of various design options at that time were the needs for a high degree of structural integrity and for minimization of alkali-ion contamination. Thus a specially formulated low-alkali glass, which also sealed well to, and had a temperature coefficient of expansion consistent with that of, the diode subassembly was chosen for the diode envelope. The resulting structure was not only capable of withstanding automatic assembly in cordwood circuit construction without suffering breakage or intermittent open circuits, but provided margin for the high-stress tests necessary to assure the desired level of reliability.

During the 1960s, a greater awareness on the part of device and circuit designers of the impact of use conditions on the reliability performance of semiconductor devices was also being developed. This is particularly true for semiconductor-diode applications. The high-speed switch discussed earlier operates in a particularly benign environment. Voltages and currents are low and well regulated. Duty cycles are low and surge protection was provided.

From the beginning, however, it was recognized that reliability control by intuitively designing and processing to minimize failure mechanisms was not enough, and that more explicit testing control was needed, to be based on knowledge of the significance of the test with relation to the expected life conditions in application.

Accelerated-Stress Testing Technology-Surface Degradation Failures

The initial results of high-temperature storage tests on germanium transistors, as shown in Figs. 1 and 2, encouraged


a similar approach to silicon transistors [9], with the result shown in Fig. 3, representing preproduction-model devices of about 1960 vintage. These tests were conducted using temperature storage alone, but by 1962 similar data were available using power to achieve the accelerated junction temperature. From 1960 to 1962 the median life of Western Electric transistors had been improved by a factor of 20, and the standard deviation of the life distribution had reduced by half, so that (for example) the first 1-percent failures would occur at one-tenth of the median life, rather than at one-hundredth of the median.

Fig. 3. Arrhenius plot of several life tests of 1960 silicon transistors, indicating an apparent activation energy of 1.02 eV. The lower percentiles of failures are calculated using the average σ of the several tests. The σ of the step-stress test includes a conversion to equivalent constant-stress conditions.

Several other pertinent features in such tests on silicon devices are as follows:

1) In silicon devices, more frequently than was experienced with germanium, the application of power (or temperature with reverse junction bias) may cause higher failure rates than will temperature storage, for the same junction temperature. For a well-made product, the results from the two conditions would be the same; for poor-quality devices, life under power would be much lower than that under temperature.

2) The failure mechanism was the same over the stress range tested; the observed parametric degradation indicated the presence of surface inversion layers due to surface charges. In tests of a 400-mW-rated silicon transistor, the failure modes resulting from aging different samples at power levels from 20 to 700 mW were compared (at the failure percentage seen at the 20-mW level): no trend toward different modes occurring at the different power levels was evident in the distribution of failed parameters. This observation indicated that a basic precept of useful accelerated testing (similarity of the failure mechanisms) was satisfied by power acceleration, and thus tended to allay, for these devices, the fear of unrealistic failure acceleration as expressed by others [10].

3) The failure activation energy of 1.02 eV (Fig. 3) is common to silicon mesa and planar transistors, and the surface-inversion failure mechanism has shown this same activation energy for all transistors using aluminum metallization and SiO2 surface protection (from any manufacturer) on which we have performed accelerated life tests. It has also been seen for ICs made with that technology, and should be expected when an IC failure is attributable to the surface-inversion failure of a transistor in its circuit.

4) In all such tests, some level of freak failures appears: these are defined as those units which become defective in a shorter time than would be expected, for that percentage, from the main distribution. In many cases, these freaks are similar in failure mode, and appear to have a thermal activation energy quite similar, to that of the main population.

Another general class of freaks which is frequently observed includes opens, shorts, and other (generally workmanship-related) early failures. Because of their random natures and origins, no consistent failure-activation energy has been found or would be expected for these failures.

9 Appendix B treats the recognition of thermally activated freaks and the requirements of screening to eliminate them.

The surface-inversion failure mechanism was certainly a significant failure mechanism of the semiconductor devices of the 1960s. There is considerable evidence, in addition, that no manufacturer, using the technology of the 1960s, has process control sufficiently tight, even now, to assure freedom from some level of contamination which could cause inversion in either freak or major quantities; high reliability of such devices cannot be guaranteed without 1) the elimination of such freaks, and 2) life-testing of the remaining product at adequate stress levels. Median lives have been found, in recently tested commercial devices, ranging from 30 h at 200°C to 2000 h at 300°C. This range of quality (of over 3 orders of magnitude, when normalized as in Fig. 3) would generally not be distinguished by testing at 125°C, as is typical of current commercial IC practice, or at 150-185°C as commonly used for commercial transistors. Appendix A treats the rigorous relationship between failure-rate requirements at low stress, and the necessity of life testing at or near 300°C junction temperatures in order to disclose undesirably short surface-inversion life distributions in economically feasible life-testing times.

The tests at high junction temperature also disclose the screening time and temperature necessary to remove the thermally accelerated freaks in order to make the life-test control effective (see Appendix B). As with life testing, it is seen that the typical screening conditions of high-reliability commercial specifications are inadequate for removing the freak units exhibiting such failure mechanisms, and yet freak percentages of from less than 1 percent to over 40 percent have been observed in higher stress tests. In life tests at relatively low stresses near commercial ratings, the failures observed are typically freaks due to several failure mechanisms, some of which are thermally accelerated and some perhaps not; the low and randomly varying frequency of failures and the mixing of failure mechanisms seen in such tests give rise to estimates of acceleration or activation energy which are not repeatable and, therefore, cannot be used with confidence for prediction.

The basic technique for controlling this significant surface-inversion mechanism of the 1960s in Western Electric products for the Bell System was as follows:

1) Required maximum failure rates in system operation were translated into acceptable life distributions at the junction temperature of operation.

2) By means of the established activation energy, a corresponding acceptable life distribution at high stress was established from which a choice could be made of life test time and allowable percentage of failures.

3) In order to assure the validity of the above relationship and thereby the value of the life test control, the thermally accelerated freaks were removed by a high-stress screen.

In actual practice, in effect through most of the 1960s, a life test at 300°C junction temperature for 48 h, allowing 15-


percent failures (originally for 100 h, allowing 25-percent failures) became the basic life-distribution control used to protect against the surface-inversion failure mechanism. This was complemented by a preceding 100-percent screen at high power for 2 h; this power burn-in was carried on for less time than would be required for complete freak elimination, in order to minimize cost. A 2-h (300°C) life test was also performed on a sample from each production lot, to assure that the remaining freak percentage was in control at a very low level; if this test indicated an unacceptable freak proportion remaining, the burn-in was repeated as necessary before shipment of the lot. In every case of a lot which failed this latter test, extension of the burn-in reduced the proportion of failures seen in a retest, demonstrating the time dependence of the freak population. Since much equipment can accept a somewhat higher failure rate during early check-out and operation than is expected later, this technique appears quite economical.

It should be recognized that this practice may not identify a possible failure mechanism which has a thermal activation energy substantially less than 1.0 eV, and occurs in time such that it could not be recognized in a high-stress test, but could cause significant failure at long times at low stress. No such mechanism has been observed as a reliability problem in the field in over ten years of application of devices controlled by high-stress life tests. Furthermore, it is even less likely that a low-stress test of practical extent would have shown such a mechanism, had it existed.

With respect to ICs, at least those using the 1960s bipolar-transistor technology, it can be seen that ionic contamination, that would cause surface inversion, would probably exist over a whole chip (in fact, probably a whole wafer) rather than over isolated junctions within a circuit. Therefore, the chip failure distribution will be the same as that for a discrete transistor, with respect to this failure mechanism. This was expected at the beginning of the development of integrated circuits, and the expectation has been confirmed in many tests, including those of recent commercial ICs using that processing technology. We have also observed, as would be expected, the same kinds and magnitudes of workmanship-related freaks in these ICs as were seen in 1960s transistors.

In summary, with respect to a common device failure mechanism, surface inversion, of the 1960s, a rigorous technique was available for rapid evaluation of product quality and for the establishment of practical life tests which would control the reliability of silicon transistors in service.

Accelerated-Stress Testing Technology-Au-Al Bond Failures

A critical concern in all accelerated-stress testing is the possibility of a failure mechanism being introduced which is not relevant to the failures expected in actual use. A particularly pertinent concern resulted from the observation that, for some products of the 1960 era, exposure to 300°C temperature would cause not only surface degradation but also open circuits in the bonds of gold wires to aluminum contacts. Open contacts, in this context, must include those which show high impedance10 at low currents and voltages. The surface-degradation and open-circuit failures could, of course, be easily separated by review of the test data, followed if necessary by a simple failure analysis. Separate distributions and separate activation energies could then be established for the two mechanisms, by tests at different temperatures.

Fig. 4. Log-normal life distributions for two failure mechanisms in a common test population. The similarity of response of the two mechanisms is quite apparent at this stress level.

Fig. 5. Log-normal life distributions in the same population as that of Fig. 4, but at lower power level and junction temperature. Note the similarity of response to that of Fig. 4 for both failure mechanisms. (Plot annotation: NPN silicon transistors, junction temperature 275°C; bond failures and degradation failures plotted against cumulative failure, percent.)

A comparison between bond life distributions and surface-degradation life distributions was carried out and reported in 1962 [11]. Here in a common population, tested at three temperatures (shown partly in Figs. 4 and 5) we find life distributions due to bond failures which are very similar in shape (as shown by the slopes of the lines) to those due to surface degradation. In Figs. 6 and 7 are shown similar comparisons

when making other parametric tests, would result in a temporary welding of the contact, a resulting apparently good contact which would be subject to early failure under further steady operating conditions.

The activation energy of the bond failure mechanism is established to be essentially the same as that of the surface-degradation mechanism as shown in the Arrhenius plot of Fig. 8, for the product of Figs. 4 and 5. This activation energy, about 1.04 eV, was confirmed by metallurgical tests by Schnable and Keen [12] down to 150°C, and other work [13] with Au and Al films indicates that interactions occur with essentially the same activation energy at temperatures as low as 20°C.

Fig. 6. Log-normal life distributions similar to those of Figs. 4 and 5, but on different product, showing again the similarity in failure distribution between bond failures and surface degradation failures.

Fig. 7. Log-normal life distributions of the product of Fig. 6 but at a lower temperature. Similarity of the two failure mechanisms again is suggested. (Plot annotation: NPN silicon transistor, junction temperature 230°C.)

Fig. 8. Arrhenius plot of the two failure mechanisms of the product of Figs. 4 and 5, with supporting data. The similarity of activation energies is apparent. (Plot legend: bond failures, degradation failures; horizontal axis: median life, hours.)

It can be assumed, therefore (since a wide distribution of initial bond strength is found), that devices completed without some screening process for the bond weakening caused by aluminum-gold intermetallic compounds will be subject to failures due to this mechanism during exposure to their normal life temperature, with the weaker initial bonds failing earlier. The failure rate is calculable through recognition of the log-normal failure distribution at high temperature and of the activation energy (see Appendix A). Since bond failures and surface degradation show the same activation energy, then any accelerated-stress test will predict lower stress results equally well for either mechanism, and a high-stress life test will be equally effective regardless of which failure mechanism dominates.

The technique for elimination of low-strength freak bonds is to expose all completed bonds (in completed encapsulations) to a high temperature (the 300°C condition necessary to remove surface-degradation freaks, or a comparable temperature bake, should be adequate) and then to eliminate the too-weak bonds. For this purpose, a 20 000-g centrifuge on 0.7-1.0-mil (17-25-µm) gold wires has been adequate for ground-based controlled-temperature environments.

After such treatment, the subsequent degradation rate under use conditions will be negligible because diffusion of new material to form additional intermetallic compound will be extremely slow. At the indicated activation energy, a 20-h 300°C condition is the equivalent of, for example, over 10^6 h at 100°C, and the subsequent growth of Al-Au compounds will cause negligible failures in 50 years of service (4.4 × 10^5 h); current experience does not deny this prediction, since no bond failures have been observed in those transistors, removed from service, which were available for analysis. Restricting processing and screening conditions to low temperatures in order to minimize stress on Au-Al bonds fails to recognize the validity of this thermal activation. The real answer to Au-Al intermetallic problems lies in providing bonds which will withstand high-stress exposure and still pass a pull-strength test, all within the final atmosphere of the encapsulated device.

Thus the use of Au-Al bonds was continued for Bell System devices, including those purchased from commercial suppliers when necessary. The use of Al-Al bonds was discouraged because there was no practical way to test the bond strength after sealing and completion of all processes, such as by the centrifuge on Au wires. (The density of Al wire is so low compared to that of Au that practical centrifuge levels provided negligible pull on the bond.) Process control can be established, of course, by using a pull test directly on the sample of the Al-Al bonds, but a complete elimination of marginal bonds would require a 100-percent nondestructive pull to an established safe level. This is feasible but very time-consuming compared to centrifuge implementation. Further, if an aluminum wire is used in order to achieve a single-metal bond at the chip, it is additionally expensive to avoid an Al-Au bond at the end of the wire. The use of a completely single-metal bond system has obvious advantages, but it remained for the technology of the 1970s (see Section IV) to provide the economic system.
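The equivalence quoted in this section (a 20-h exposure at 300°C against more than 10^6 h at 100°C) and the remark that the failure rate is calculable from the log-normal distribution and the activation energy can be checked with a short calculation. The following Python sketch assumes a simple Arrhenius temperature dependence with the 1.04-eV activation energy given in the text. The function names, the 100-h median life at 300°C, and the σ of 1.0 (in natural-log units) are hypothetical values chosen only for illustration; they are not taken from the test data of Figs. 4 to 8.

```python
import math
from statistics import NormalDist

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius ratio of time at the use temperature to time at the stress temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))


def lognormal_fraction_failed(hours, median_hours, sigma):
    """Cumulative fraction failed by `hours` for a log-normal life distribution.

    sigma is the standard deviation of the natural logarithm of time-to-failure.
    """
    return NormalDist().cdf(math.log(hours / median_hours) / sigma)


EA_EV = 1.04             # activation energy quoted in the text
USE_TEMP_C = 100.0       # use temperature of the example in the text
STRESS_TEMP_C = 300.0    # bake / life-test temperature in the text

af = acceleration_factor(EA_EV, USE_TEMP_C, STRESS_TEMP_C)

# Equivalent time at 100 C of the 20-h exposure at 300 C.
bake_equivalent_hours = 20.0 * af
print(f"acceleration factor 300 C -> 100 C: {af:.3g}")
print(f"20 h at 300 C is equivalent to about {bake_equivalent_hours:.3g} h at 100 C")

# Hypothetical distribution (illustrative values only, not from the paper).
median_at_stress_hours = 100.0
sigma = 1.0

# Same sigma is assumed at both temperatures; only the median scales.
median_at_use_hours = median_at_stress_hours * af
service_hours = 50 * 8760  # 50 years, about 4.4e5 h
fraction_failed = lognormal_fraction_failed(service_hours, median_at_use_hours, sigma)
print(f"fraction failed after 50 years at 100 C: {fraction_failed:.2e}")
```

Under these assumptions the 20-h bake corresponds to somewhat more than 10^6 h at 100°C, in line with the figure quoted in the text. Scaling only the median while holding σ fixed reflects the observation that the distributions at different temperatures have similar slopes; a product whose σ changed with stress would need a different treatment.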


Accelerated-Stress Testing Technology-Other Mechanisms

Types of Failure

Many types of failures arising from mechanisms other than surface degradation and Au-Al bond degradation may also be controlled by stress-accelerated screening tests; quite comprehensive listings of such tests and their applicability are available [16]. The following discussions briefly describe those which are frequently applied to silicon devices.

1) Hermeticity Tests (for hermetically sealed devices): Very effective elimination of those units with housing leaks is achieved by testing separately for fine leaks and for gross leaks, either of which might not cause device failure during the normal testing cycle. Leak detection is not temperature-accelerated, but can be accelerated by increased gas or liquid pressures (using detectable materials) in order to provide viable tests in short intervals. The Radiflo test described earlier or equivalent mass-spectrometer tests using pressurized helium serve as tests for fine leaks, and tests using colored dyes or observation of escaping gas bubbles serve for gross leaks.

2) Current and Voltage Capability Tests: Particularly in power transistors, it may be necessary to apply above-normal operating conditions to assure the ability of each device to withstand surge conditions without developing transient runaway failure not predictable from normal operation. Such tests will also tend to detect devices which have poor chip bonds to the headers.

3) Tests for Loose Conductive Particles: In applications where the devices are subject to vibration, loose electrically conductive particles of sufficient size can short-circuit operating devices. These can be detected by shaking the device while monitoring for momentary short circuits between its terminals. It is necessary that this test be continued for some time because of the relatively small probability that the particle(s) will contact a sensitive area during their random motion. With appropriate biasing and detection circuitry, this test can also detect intermittent open circuits caused by poor internal bonds.

4) Current-Pulse Tests for Metallization Defects: When both ends of a metallized path are accessible (for instance, base and emitter stripes of overlay-contact transistors, in series with the forward-biased E-B junction) from the external terminals, current pulses can be used to selectively burn out the metallization in areas which have significantly reduced cross sections because of voids or mechanical damage. The pulse length and current level which will burn out reduced sections, without damaging the sound conductor, are ordinarily determined experimentally; pulse lengths from 1 ms to 1 s have been found effective for several silicon signal transistors having various conductor geometries and loci of defects. The possibility of weakening sound conductors by such pulsing is evaluated by life-testing previously pulsed devices, and by repetitive pulsing of the same conductor.

Field Experience

A complete field history of any class of devices in the Bell System is not practically obtainable because of the widespread usage, both in physical location and (generally) in variety of application, and because of the expense and impracticality of reporting which is complete and also accurate. In general it is not considered necessary; complaint and investigation procedures are available when any of the using companies feel that maintenance of equipment, or the replacement rate of a part, is excessive, and the close liaison between equipment design and operating requirements minimizes these problems. In addition, concentrated field studies are established to sample the reliability history of systems (particularly those using new classes of parts), to obtain accurate reports of part removals and associated conditions, and to analyze the removals sufficiently to determine if any corrective action of any kind is indicated. This practice is judged to be a more economical, more accurate, and quite satisfactory alternate to complete field reporting.

In such field studies on discrete transistors it is typically found that only about 10 percent of removed devices show degradation of electrical characteristics, with the others divided more or less equally between good devices (removed in the process of locating some other fault) and those with evaporated emitter wires (damaged by high-current pulses). We thus have much evidence that maintenance practices, vulnerability of the equipment to lightning pulses or other disturbances, or even device misapplication in the circuit, can contribute to a large difference between the removal rate (in RITs) and the inherent device failure rate (in FITs) as measured to a specific set of end-of-life characteristic limits.

[Fig. 9 plot; horizontal axis: percent less than or equal to ordinate.]

Fig. 9. Distribution of removal rates observed in a number of applications of a low-power silicon transistor in a digital carrier system. The range of rates suggests the variations of environmental stress and circuit margin which exist in this equipment.

A modification of such field studies results when defective apparatus (printed wiring boards, etc.) from a system is returned to only a few repair centers, so that a count of removed parts by circuit position is available, but failure analysis is typically not performed. One such result is shown in Fig. 9, giving the distribution of removal rates for one small-signal silicon transistor type from several circuit applications in a digital data transmission system. The range of removal rates is believed to be indicative of 1) variations in stress among the different circuit applications, and 2) variations in margin between the designed device-parameter allowances in the different circuits and the single established life-test end points of the transistor specification. Applying the typical 10:1 ratio between transistor removals and transistor failures which has been observed in this type of


TABLE II
REMOVAL RATES OF BELL SYSTEM SILICON DEVICES OF THE 1960's TECHNOLOGY (RIT's)

Column headings: SYSTEM AND DATE OF STUDY; TRANSISTORS (LOW POWER, MEDIUM POWER); DIODES (LOW POWER, MEDIUM POWER)
Row labels: MULTIPLEX EQUIPMENT (1970); CARRIER SYSTEM; SIGNALING SYSTEM; #1 ESS (18 OFFICES)
Remaining study dates as extracted: (1970) (1971) (1971)
Entries as extracted: 19 16 40 26 (4)* 251 (17)* 18 8 0.2 2.5
*ASSUMING ONE FAILURE WHERE NONE OCCURRED

[Fig. 10 plot; horizontal axis: life, hours.]

Fig. 10. Arrhenius plots of early failures of certain diodes, showing the effect of voltage stress on life.
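As a rough illustration of how an Arrhenius plot such as Fig. 10 is read, the sketch below computes the ratio of median lives at two junction temperatures for a given activation energy, and converts counts and removal rates into rates per 10^9 device-hours (the unit underlying both RITs and FITs). The function names, the 50°C and 150°C example temperatures, and the assumption of a single thermally activated mechanism are illustrative choices, not values from the paper; the 1.14-eV activation energy, the 2.5-RIT removal rate, and the 10:1 removal-to-failure ratio are those quoted in the text.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(ea_ev, use_temp_c, stress_temp_c):
    """Ratio of median life at use_temp_c to median life at stress_temp_c.

    Assumes one thermally activated mechanism with activation energy ea_ev,
    so that median life is proportional to exp(ea_ev / (k * T)).
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def events_per_1e9_hours(events, device_hours):
    """Events per 10^9 device-hours: FITs for failures, RITs for removals."""
    return events / device_hours * 1e9


def failure_rate_from_removals(removal_rate_rit, removals_per_failure=10.0):
    """Estimate FITs from RITs using an assumed removal-to-failure ratio."""
    return removal_rate_rit / removals_per_failure


if __name__ == "__main__":
    # Hypothetical temperatures; 1.14 eV is the activation energy quoted for Fig. 10.
    factor = arrhenius_acceleration(1.14, use_temp_c=50.0, stress_temp_c=150.0)
    print(f"life at 50 C / life at 150 C: {factor:.3g}")
    # 2.5 RITs with the typical 10:1 ratio corresponds to a fraction of a FIT.
    print(f"estimated failure rate: {failure_rate_from_removals(2.5)} FITs")
```

Because the exponent scales with the activation energy, a mechanism with a higher activation energy is accelerated more strongly by the same temperature step, which is why the slope of the regression line matters when high-stress results are extrapolated to use conditions.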

system, the average of the results shown in Fig. 9 appears quite consistent with the general 10-FIT reliability objective for such transistors.

A specific example of the effect of variations in stress is that of Fig. 10, showing the effect of reverse-bias voltage stress on the temperature-life regression line, on a particular diode type subject to a charge-movement type of failure mechanism. The activation energy shown is 1.14 eV, close to the 1.02 eV of charge motion on transistor surfaces; for transistors, however, there is usually a wide range of voltage and current within which the same life distribution will be obtained for a given junction temperature [14], although adverse effects can be obtained at extremes of either condition. For the diode represented in Fig. 10, the field history of 0.2-FIT reliability (from Table II) in a low-voltage switching usage is of a level to be expected from the extrapolation, and higher removal rates, not clearly documented, are being found in circuits which impose higher voltage stress.

A more precise comparison between results and objectives may be found in the case of the logic switching transistor in #1 ESS, where the transistor is used nearly entirely in a single logic-circuit design and the specified life-test end points match the requirements of that circuit. Here the removal rate pattern [15] for this circuit is shown in Fig. 11, giving the removal rates for two prototype systems which were used to check software (and hence subject to many hardware changes) and for the second system put into actual service. The latest interval between removals reported in this study, for a system in service, corresponded to a removal rate of 2.5 RIT's (which is perhaps still reducing), suggesting an actual transistor failure rate of a fraction of a FIT. The screening and life-test results of the 1962-1963 period when these transistors were made, extrapolated to the approximately 30°C junction-temperature condition of the application, would provide a prediction of approximately 0.01-FIT failure rate due to either surface inversion or bond failures. Since none of the removed transistors which were analyzed showed either failure mechanism, a degree of consistency is exhibited which is hard to quantify because of the small number of failures. Other field studies provide results as in Table II.

[Fig. 11 plot; horizontal axis: operating time, months.]

Fig. 11. Removal rates versus operating time for a logic transistor in #1 ESS. The high early rates reflect hardware prove-in and system changes. The rapid reduction after "cutover" to customer service may reflect the effect of reduced system handling.

These results, together with the general Bell System operating history of silicon transistors of approximately 10 years without evidence of development of wear-out failure mechanisms, lend some credibility to the concept of 100-percent screening, at suitable high-stress levels, to eliminate those low percentages of units which, given time and stress, would have become early failures with an unacceptable failure rate. Unquestionably, there are failure mechanisms not subject to such control; the corrosion of aluminum metallization, for example, in the presence of humidity and chemical contamination, is not appreciably subject to acceleration by temperature, and a tight control on humidity in the sealing environment is required (in addition to the 100-percent leak test) to protect against such a problem. This type of control can be instrumented, however, while the occasional presence of contamination in the silicon oxide, providing the potential for surface inversion, cannot be detected by instrumentation or by economic sampling, and can only be practically controlled by a 100-percent burn-in screening. For the Bell System in the 1960's, such screens were understood and applied with sufficient rigor that temperature-dependent failure mechanisms never became a significant factor in semiconductor-device reliability.

The same accelerated-stress qualification and acceptance-testing techniques were applied to those devices which were not made in the Bell System but were purchased from commercial suppliers [16]. Rapid evaluation of the product capability was possible, and burn-in conditions and short-term life testing requirements were established which were consistent with the specific reliability requirement for that device (with due consideration to other, temperature-independent, failure mechanisms). Since the reliability of most commercial products was limited by freak proportions rather than by the main population, the application of these techniques was generally


successful and more effective, both with respect to the time required for burn-in and testing and assurance of meeting the objective, than the typical lower stress longer time industry approach to hi-rel product. In the long run, then, this practice provided a lower cost production-oriented reliability improvement, assuming that the original product was not limited by structure or materials to low-stress treatment. Experience with commercial devices procured to specifications which apply these techniques showed that adequate reliability control was generally achieved, but relatively frequent specification changes were needed because of design and processing changes which were made by the device manufacturer without consideration of the continued applicability of the testing procedures.

As the decade saw more and more cost competition, however, it became more difficult to locate products which did not have stress restrictions built in by cost-reducing design. For example, the general avoidance in industry of encapsulant plastics which could withstand 300°C burn-in prevented the necessary screening for freak devices and hence prevented consideration of devices using these materials for even modest-reliability applications.

The Telstar Experiment

In late 1960 it was agreed that Bell Telephone Laboratories would design, build, and put into orbit a communications satellite intended to operate as an experiment for only two years, at which time the communications transmitter was to be disabled. Since the initial launch was scheduled for 1962, very little time was available to select the device types and to guarantee the reliability of individual devices to be flown. Through close cooperation between the circuit designers and the device specialists, some 18 diode types and 23 transistor types (from 8 manufacturers) were selected [17] for suitable structure and performance and were given qualification tests which included temperature cycling, shock, centrifuge and vibration, and step-stress tests of 2-, 16-, and 48-h durations. Except for one transistor code which had to be made especially for this program, the devices were taken off the shelf.

The individual selection of flyable devices was made after 100-percent environmental screening which included a bake of 150°C to 250°C (depending on the structure), centrifuge, temperature-humidity cycling, shock or vibration, temperature cycling, and X-ray inspection and an extended low-stress power operation at, or somewhat above, the most severe usage condition for that device in the satellite (generally around 25 percent of maximum power or current rating). During this final operation, the objective was to obtain ten or more measurement periods in order to distinguish any degrading trend in individual device electrical characteristics. In some cases, because of lateness of selection of device types, only one month of aging was obtained although the objective was a minimum of six months.

When required for the satellite assembly, individual devices were selected on the basis of stability of characteristics through all the screening and aging procedures. Through the mechanical screens, an average of 1.7 percent of the off-the-shelf devices failed, although only 0.3 percent of those specifically made for the program failed. During the extended aging process, the quite typical result was a failure rate of 10 000 FITs in the first 1 or 2 weeks, dropping to 500 FITs at about six months. When the most stable devices were selected and put in the rather benign satellite environment the expected failure-rate improvement resulted, for there were no detectable diode or transistor failures, except for those reversible failures due to ionizing radiation, in over 5.8 × 10^7 device-hours of operation in orbit. One failure would have resulted in a failure rate of about 17 FITs. Since only 94 of the total 2640 semiconductor devices in each satellite were manufactured with any special care, the result speaks well for the inherent reliability of semiconductor products as made in 1960. It also speaks well for the capability of a 100-percent screening and selection process for eliminating those units which could be early failures and cause a high failure rate.

The screening process prior to the long-term aging did not include a period of high junction temperature under bias since experience with that type of stress had not developed, at that time, to a level where it could be used with confidence. Later studies have indicated the greater effectiveness of such a screen, but the total time of only about one year available for the whole program, up to providing the first selected units for assembly, would not allow the use of any but well-known procedures.

A little-known problem developed, however: the degradation of silicon mesa transistors under bias and in a field of ionizing radiation [18], [19]. This effect was discovered at about the time the first transistors were being delivered for assembly, having completed their selection aging. The full effect of this discovery was not realized until the first Telstar satellite started to fail to respond to command after barely more than one month in orbit. Radiation-measuring equipment on the spacecraft had indicated that it was being exposed to a higher-than-expected radiation level, resulting from recent high-altitude nuclear testing; the failure and subsequent performance indicated the effect of this radiation on the spacecraft. The reliability challenge was posed, however, with the first evidence of the effect of radiation and electrical bias, and it was recognized in December, 1961, that a technique should be found, if possible, for determining which devices, or which device types, would have longer life in this environment. Here again was shown the value of testing at a stress level where failures can be seen. By testing device response at high dose rates of radiation, several results were quickly obtained:

1) The surface effect was dependent largely on total dose, with some temporary variation due to dose rate, and, for these mesa transistors, was not seen when the device was not electrically biased.

2) The effect was related to the ionization of gases or other materials at the silicon surface. A mesa transistor in a vacuum enclosure would show no effect until radiation sufficient to cause bulk damage had accumulated.

3) For gas-filled mesa transistors, the surface effect annealed with time (when either radiation or electrical bias was removed), after which the response of an individual device would, with a repeated radiation dose, repeat its previous response.

4) A moderate total dose level was found, at which the more stable devices could be distinguished from those likely to change the most.

The result of this work was that, within a few months from the discovery of the problem, the failure mechanism was at least pragmatically understood, a technique was available to screen devices (selecting those which would be sufficiently stable at the expected dose), and the previously aged transistors were radiation-screened, annealed to restore them to


their preradiation condition, and provided for satellite assembly. Certain devices (largely diodes) were not screened because of their relatively high resistance to ionizing-radiation effects, according to qualification tests.

In the second Telstar satellite, the redundant command decoders (which had contained the critical transistors in the first satellite) were made with radiation-resistant transistors. One contained vacuum-sealed types, and the other contained those selected by radiation screening. The effectiveness of the screen is attested to by the fact that the second satellite saw a total estimated dose of about 3 × 10^6 rads with all indications of proper operation of both decoders, whereas the first satellite showed trouble at an estimated 8 × 10^5 rads.

This radiation experience in the Telstar experiment is not specifically relevant to the subject of reliability in the normal Bell System environment. It is another example, however, of how testing at stress levels which quickly induce failures can provide useful information about the related failure mechanism without the loss of time usually associated with using only the normal stress. In this case, it led to the rapid development of a nondestructive testing technique for eliminating the undesirable potential early failures, whereas reliability improvement by design change or by long-term testing at normal stress levels would not have provided a solution to the immediate problem.

Summary

The reliability history of the products of the 1960s in the Bell System indicates that the failure rates due to semiconductor devices were, in general, better than those assumed in the equipment designs. The early evolution of process controls on such features as quality of the sealed-in atmosphere and 100-percent tests on hermeticity allowed attention to be directed to the temperature-acceleratable failure mechanisms. This resulted in the use of accelerated-stress (at or near 300°C) life tests and processing burn-ins. A consistent Arrhenius relationship was shown between junction temperatures and life distributions for major mechanisms of the devices of that period and a rigorous technique was developed for prediction and control of these mechanisms, to complement a thorough application of experience-based techniques for other failure mechanisms. Application of these techniques further restricted the choice of structure and materials to those which would withstand the required stresses, and would not introduce new uncontrolled failure mechanisms. They also allowed rapid evaluation of products not made within the Bell System (for which purchase specifications were required), and allowed the specifications to be tailored by these same techniques to the reliability requirements of the application.

This fundamental work and experience provided the background which allowed the Bell System to enter the era of the 1970s, with its new device-design and technology challenges, with confidence that new processing technologies could rapidly be evaluated in order to meet tightening reliability requirements expeditiously and economically.

IV. THE 1970s-BLSJ INTEGRATED CIRCUITS

The Beam-Lead Sealed-Junction (BLSJ) Concept

While the reliability-assurance technology for conventional (SiO2-passivated aluminum-metallized) Bell System devices was being exercised in the 1960s, the development of an entirely new hardware-design concept was being vigorously prosecuted. This venture supplanted well-known printed-circuit-board cordwood and modular equipment designs, which used individually housed semiconductor devices, with a conceptually simpler arrangement of much smaller devices which solved, by design, many of the problems contributing to the possibility of unreliability in the conventional structures and which was most dramatically capable of providing reliability assurance for the era of ICs.

This concept was originally developed largely as a cost reduction. It was recognized that a significant part of the cost of the typical silicon device is in providing a package to protect the silicon surface from contaminating atmospheres and in providing electrical connections from the device contacts to the external circuit. If the package protection could be brought back from an external can to the device surface, and connections provided directly from the device metallization, much cost saving could be achieved and the total number of connections reduced, giving an inherent reliability improvement.

Hence the beam-lead sealed-junction (BLSJ) concept [20], the basic features of which are shown in the cross-section view of Fig. 12. A deposition of silicon nitride, Si3N4, over the usual thermally grown silicon-oxide surface, in conjunction with extension of the contact metal over the edges of the insulator, provides a seal for the semiconductor junction [21] and a barrier to the introduction of contaminating ions, particularly the dangerously mobile and commonly available sodium ion. For insulated-gate field-effect devices it has been found advantageous to use Al2O3 as the deposited layer; it offers a similar junction seal to that of Si3N4 and has less effect on the gate threshold voltage.

[Fig. 12 drawing; labeled layers: beam lead, silicon nitride, silicon dioxide, platinum silicide, silicon.]

Fig. 12. Cross section showing the basic features of the beam-lead sealed-junction (BLSJ) device design. Seal of the silicon dioxide edge at the contact is obtained by the metallization, as shown here, or by extending the silicon nitride over the edge into the edge of the contact area.

For beam-lead devices the ohmic contact to the silicon is of platinum silicide, formed in the contact areas by high-temperature treatment of deposited platinum; the three-layer beam-lead metallization [22] is formed over these contacts and over the insulator. An initial deposition of titanium provides adhesion to the insulator surface, and the final gold layer provides the necessary electrical conductivity. A film of platinum (for bipolar devices) or of palladium (for insulated-gate field-effect devices) is applied after the titanium and before the gold to inhibit diffusion of gold into the titanium (and thence through the contact into the silicon) and to serve as an electroplating base for the gold. Evaporated Pd has been used for IGFET devices because the commonly used electron-gun Pt deposition process causes surface changes which result in higher-than-desired IGFET threshold voltages.

Connections to this structure are provided by beams, which are extensions of the gold metallization, plated (prior to removal of the underlying silicon for separation of the chips in a wafer) to a thickness of about 0.0005 in (12.5 µm) and extending beyond the edge of the separated chip.

This plated-gold metallization system, developed for beam-lead devices (which is now also used for discrete transistors in Western Electric manufacture, either in molded plastic or in sealed designs), avoids electrochemical corrosion problems and, with proper application in sufficient thickness, the likelihood of thin metallization over insulation steps. Although gold is subject to electromigration, comparative tests show the life of Au metallization to be one to two orders of magnitude longer than that of Al, even at high current densities (e.g., 10 A/cm²) and at high temperature (200°C-300°C). Bulk gold has a higher activation energy of self-diffusion than has bulk aluminum [23], [24], and this ratio should also apply to the relative rates of electromigration in thin films of these materials given comparable grain size in the films. In addition, the greater thickness typically used for gold metallization (3 µm rather than the typical 1 µm for deposited aluminum) permits the designed use of higher currents or narrower metal stripes in LSI¹ devices with confidence of freedom from electromigration.

¹ The abbreviation LSI is applied to large-scale integrated circuits, containing generally more than 100 gate circuits or the equivalent in other circuit functions. Because of this high component density, it is desirable that narrower metallized conductor paths be used to conserve space on the chip, thus aggravating the electromigration problem.

The chip is mounted face-down on a ceramic substrate, with the beams thermocompression-bonded to thin-film gold interconnection patterns. Bonding techniques [25], [26] are well established, and the bonds are made with nearly 100-percent yields. Many chips can be bonded to a single substrate (which may also include thin-film or appliqued passive circuit elements), thus achieving very high packing density with high interconnection reliability as well as with low values of parasitic circuit elements. Each chip may have from 2 beams (for a discrete diode) to a common 16 beams (for a small integrated circuit) to perhaps 100 beams (for an LSI circuit). The total number of beams bonded to a common substrate can easily be of the order of 1000, without concern for the field reliability of the interconnections, in this single-metal system.

Although the junction seal protects the silicon surface against a contaminating environment, the metallization itself must be protected against conducting particles and the effects of moisture. The presence of water vapor affects the conductivity of the insulator surface between two metallized conductors, as a function of the relative humidity. With applied voltage, the resulting electrolytic current flow is proportional to the voltage and inversely proportional to the spacing between metals; this current flow then transfers metal, increasing the leakage current, eventually to an unacceptable value, or corroding one electrode to an open-circuited condition. This failure mechanism could be of particular concern on exposed IC chips because of the very narrow spacings between metallized conductors on the surface. The same electrolysis can also occur at the exterior terminals of a packaged device, but the larger spacings make it of less concern.

It is therefore necessary that the chip surface (and other surfaces with narrow metallization spacings, such as the substrate for the chip mounting) be protected against the adhesion of moisture. Although no plastic is impervious to water vapor, one which can maintain very close contact (preferably via a physical bond) with the device surface exposes the surface only to that level of water which is absorbed in the plastic, restricting electrolytic current growth. (It must, of course, be capable of withstanding subsequent processing and testing conditions, as well as life, without losing its protective properties or causing chemical or mechanical degradation of the silicon chip.) Thin layers of silicone resins [27] or silicone rubbers have been found most suitable for application to BLSJ devices mounted on their substrates.

The typical molded-plastic encapsulation developed for discrete transistors, and extended to ICs, suffers from problems which are also solved to a large degree by the BLSJ structure whether using beam leads or simply the same contact metallurgy with bonded wires. The major remaining problems are 1) mechanical stresses on bonds and beams due to a mismatch of thermal coefficients of materials and 2) possible voids in the encapsulation, allowing moisture to cause leakage currents and electrolytic corrosion between metals.

The mechanical stresses can be evaluated by an accelerated temperature cycling such as that of the typical MIL specification (-65°C to 200°C). The humidity effect can be evaluated with a humidity test [28]. With these types of controls, the Bell System has recently been using silicone-molded discrete transistors with reliability comparable to that of the sealed prototypes. This structure has also been extended to IC chips and to multiple combinations of chips on ceramic in a molded plastic dual-in-line package.

Reliability of BLSJ Devices

Distributions of life, or time-to-failure, can be established for BLSJ devices, just as for conventional devices, by applying high power, combinations of power and temperature, or high temperature with reverse bias, as appropriate for the device being tested. The life data are again found to fit a log-normal distribution, although the dominant failure mechanism is now that of metal (platinum) diffusion from the contact into the silicon, resulting eventually in a short-circuited junction.

The mechanism of surface channeling due to penetration of ions from the outer surface through the oxide is eliminated by the Si3N4 or Al2O3 junction seal, together with the metal seal at the contacts to the silicon surface. Surface channeling arising from other causes such as ion-contaminated oxide due to poor processing or field inversion due to poor design is not inhibited by the Si3N4 or Al2O3 layers.

Fig. 13 shows the comparison of the activation energy of the metal-penetration failure mechanism of Ti-Pt-Au beam-lead devices with that of the surface degradation mechanism (or Au-Al bond failure mechanism) characteristic of earlier silicon devices [29]. Here is shown the range of median lives expected for high-quality processing of each type of device (conventional and BLSJ), although poor processing can degrade the median life as defined by either mechanism by perhaps 2 decades. This amount of degradation of metal-penetration-limited life, with the indicated activation energy of about 1.8 eV, should have little or no effect on reliability at use conditions, whereas the same degree of degradation of surface-degradation-limited life, with its apparent activation energy of about 1 eV, would be catastrophic. (Note that, as indicated in Appendix A, a median life of the order of 10^7-10^8 h

200
450 400 350

PROCEEDINGS OF THE IEEE, FEBRUARY

1974

300
250

200

150

too
100

101

102 103 104 105 MEDIAN LIFE ,HOURS

106

107

tests indicate no long-term reliability problems with the silicone coating materials currently being used with sealedjunction devices. A recent comparison [28] of device-test results with d a t a relating the surface conductivity of an insulating material to the ambient relative humidity indicates a more method precise for relating accelerated-humidity test results to expected life in field operating conditions. Details are shown in Appendix C. providesmuch This a more satisfactory technique for establishing minimum requirements encapsulating on materials i n relation to the severity the expected environment. of I t has allowed the establishmknt of a biased humidity test of only 48- to 96-h durationforacceptance of deviceswhich should have satisfactory life in the Bell System.

Fig. 13. Arrhenius plots of the expected ranges of median lives of wellprocessed conventional SiOrpassivated Al-metallized devices limited by surface inversion and of BLSJ devices limited by metal diffusion. It is apparent that the BLSJ failure mechanism is relativelyunimportant at any useful operating junction temperature.

is required to assure a 10-FIT failure rate at the junction temperature of the application.)

In addition to the advantage achieved by elimination of surface degradation, several other gains are achieved in the BLSJ design, all contributing to high reliability:

1) Such freaks as have been observed in high-temperature life tests have rarely occurred in less than 100 h at 300°C. With a 1.8-eV activation energy, this will have an insignificant effect on field reliability, and for most applications, screening or burn-in is not required.
2) Since the IC gold intraconnections are much thicker than deposited aluminum and are electroplated, there are no discontinuities or thin regions at steps in the oxide surface. Hence electromigration failures and stress-induced opens, sometimes troublesome with aluminum, are eliminated.
3) The metallization system is appreciably less sensitive than aluminum to chemical corrosion by humidity and residual salts on the device surface.
4) The bonding of a gold beam to a gold conductor on the substrate eliminates potential bond failures due to intermetallic compounds. Similar gold-gold thermocompression bonds at the outer edge of the ceramic substrate provide gold-plated connecting leads, similar to those of a dual-in-line package, for connection to external circuitry.

These features solve, it may be noted, many of the major problems experienced with commercial plastic-packaged transistors or integrated circuits (silicon surface degradation, open bonds, corroded metallization, and electromigration) and provide the basis for molded plastic devices as well as for those mounted by beam leads on substrates, and coated. The long-term reliability of both structures is tested by electrically biased tests in accelerated humidity conditions [typically 85°C, 85-percent relative humidity (RH)] in order to evaluate the quality of the protective coating. Since there have been, in the past, few data to relate a high-humidity test result to expected field performance, the initial test objective was to establish that coated or molded devices at least equaled the life of hermetically sealed devices (e.g., a 500- to 5000-h median life of a low-voltage-biased TO-18 transistor package at 85°C, 85-percent RH). This procedure was judged suitable in view of the long operating history accumulated with such hermetically sealed devices, without significant failure attributed to electrolytic corrosion of the external leads. Such

Field Experience

The limited Bell System experience to date with the BLSJ devices of the 1970s has been obtained from the early experience in a few systems using a mixture of ICs and discrete devices, all beam-leaded. It is felt, because of the commonality of design rules and bonding and coating procedures, that these results are indicative of the future reliability of all such devices.

A call-store subsystem for #1 ESS uses about 15 000 BLSJ chips of various sizes: ICs, transistors, and diode arrays. Their reliability can be measured since circuits removed from these stores are returned to one location for repair. As of August 1972, some 177 stores had been in service in 32 different locations for a total of 8.75 × 10^8 device-hours without a single removal known to be due to a defective BLSJ device.

In development of a modified ESS system, prototype subsystems were tested using BLSJ ICs having exceptionally high switching speeds. Of approximately 37 000 devices operating for approximately six months, there was no known failure of a silicon chip, giving a total of 1.6 × 10^8 failure-free device-hours.

Supporting the preceding two examples, which operate in air-conditioned space, there is the experience in a field trial in Louisiana of pole-mounted apparatus using 2640 BLSJ ICs. From August 1971 to August 1972 (the latest report date) there had been no failures of a BLSJ chip in that trial, which provided only 2.35 × 10^7 device-hours of service, but under rather severe environmental conditions.

In total, from these known situations, over 10^9 device-hours have been observed with no failure of a BLSJ device or a beam-lead bond. Although there is no record of what failures may have occurred during assembly, test, and installation, this first evidence of performance in service is unusually good. Typically, failure rates of active devices decrease linearly in a log-log plot of failure rate versus operating time [15]. These BLSJ devices have a weighted average operating time of about 3000 h, at which time the first ESS transistor in the mid-1960s had a failure rate of 200 FITs in system testing, or 15-20 FITs in operating service. If one failure had occurred in the BLSJ devices, this would result in a failure rate of 1 FIT, a very satisfying comparison.

It is reasonable to assume, of course, that some problem area might develop in the future. A review of potential problems would therefore be in order. First, analysis of failures of early production devices shows that metallization defects can provide near-failures which later can become actual failures. Care in metallization definition is therefore certainly called for, although visual inspection at 100X magnification should


be effective in eliminating defective devices. Maintaining minimum 10-µm spacings between adjacent metallized paths will avoid probable trouble, and process development has a high probability of reducing this spacing guide for general application. Even now the BLSJ technology has been used for a line of microwave transistors where 2.5-µm linewidths and spacings are used. For such unique production and for relatively simple structures, more extensive inspection can be economically applied to avoid potential failures. These transistors have been extensively evaluated at high stress and have been found to match the median failure expectation for BLSJ devices as shown in Fig. 13.

Defects of the nature of a silicon-nitride pinhole over a silicon-oxide pinhole, if under a metallization path, could cause a malfunction which could take some time period to show up, but this type of defect could be considered very unlikely.

The process of successfully bonding a chip and coating it for humidity protection involves suitable thermal-expansion matches between the chip, the substrate, and the coating material, involving the beam flexibility, the beam anchor strength at each end, and other details depending on the design and the application. In general, however, any defects which could occur would tend to be apparent early in life, if not in testing itself, so that these factors, while possibly of concern in manufacture, would tend not to influence reliability after a period of infant mortality.

MIS Field-Effect Devices¹²

Reliability evaluation of MIS field-effect transistors and ICs differs from those of bipolar circuits or devices in that, while the BLSJ bipolar devices tend to fail catastrophically due to junction shorts, the life of MIS devices depends critically upon the stability of their threshold voltage $V_T$, as well as upon the diode characteristics of the drain and source junctions. Failure of an MIS-LSI circuit can result from one MIS transistor in the circuit exceeding its designed end-of-life limits on $V_T$. $V_T$ changes in an MIS transistor can result from several causes, their relative importance varying with the type and magnitude of conductivity of the gate channel, the gate voltage polarity and magnitude, the materials of the gate structure, and the quality of the gate insulation.

They will fail catastrophically if the breakdown voltage of the insulation is exceeded, even for very short time periods; it is thus necessary that stringent design and handling precautions be taken to prevent application of excessive voltages to their terminals.

Since the performance and stability of these devices depend critically upon very high gate-insulation resistance and upon charge stability in the insulator layers and at the silicon-insulator interface, relatively small material or processing variations or contaminant levels can cause substantial changes in their characteristics, particularly $V_T$ and leakage currents. They are also most frequently applied in large-scale arrays so that individual-device characteristics cannot be measured unequivocally; although the arrays may be functionally tested, such testing does not give information about the nature of failure of a buried device which causes a functional failure of the array. This inaccessibility also inhibits measurement of initial device characteristics, so that the margin allowed for degradation is not readily established.

Functional (operating) life testing of large-scale arrays is difficult and costly, as is parametric measurement or functional testing of their performance, in the present state of the art. As an example, several hours of rapid sequential automatic testing are required in order to partially functionally test some large-scale memory-array devices.

Thus a different approach to reliability control [30], [31] is presently applied to MIS arrays. It depends on combined assessment of the initial distribution of the $V_T$ values, and the observed distribution of change during an accelerated life test, to limit the failure rate due to $V_T$ shift to an acceptable value. It is applied either to accessible devices in an array (when the array organization permits) or to devices made on small simple test chips which are processed simultaneously with the array chips. This procedure can give good control of main-population reliability, but does little to isolate any freak devices which may exist.

As measurement and functional test capabilities evolve, reliability evaluation by determining failure distributions during accelerated tests will become feasible for larger scale integrated devices. Failure-rate estimates based on such distributions should be more precise for the circuit than an estimate made by combining discrete device failure rates based on specified parameter limits.

¹² Metal-Insulator-Semiconductor.

Summary

The reliability problems of the conventional devices of the 1960s tended to be those associated with inherent material weaknesses or related to general contamination or processing defects, so that large areas of device material were affected and, except for the number of bonds involved, a device of one size would tend to have the same probability of failure as a device of another size, regardless of the number of junctions involved.

To the extent that use of that technology is continued into the 1970s, the testing and screening necessary to avoid or eliminate those types of failures will still be required. Long times at low stress, or high stress for short times, are required to reveal these time-dependent failures.

For the Bell System, the BLSJ technology eliminates these types of failures and leaves as sources of unreliability largely: 1) mechanical defects due to bonding or coating; and 2) point defects in the insulating or metallizing systems. Either of these types of life failures will be likely to be associated with a related number of manufacturing rejects of the same type. Hence, probably for the first time in the history of semiconductor technology, reliability will truly be related to production yield; this has been a popular theory (not supported by data) for many years.

V. RELIABILITY VIA THE INTERRELATION OF DEVICE DESIGNERS AND SYSTEM DESIGNERS

A presentation of the active-device reliability status of a particular segment of the electronics industry such as the Bell System is of interest largely because of the relatively unique position of an organization which maintains the equipment which it builds. This paper is not concerned with the overall structure by which the system and device design organizations (Bell Telephone Laboratories, Incorporated), the manufacturing organization (Western Electric Company), and the Operating Companies interrelate; it is concerned largely with the motivations of those responsible for the design of systems and of devices and with the impact of reliability considerations on their design thinking. Proper motivations in these people are essential in organizations which are concerned with the life-cycle cost of equipments; they can be attained under appropriate management conditions.

The Role of the System Designer

The overall objectives of performance, cost, and reliability of the telecommunications system, as derived by the Operating Companies from their assessments of customer needs, must be considered by the system engineer in his design work. These objectives are eventually translated into design requirements for electronic components, with the semiconductor devices sometimes being the most critical items with respect to reliability.

Here the systems engineer must make a cost-versus-reliability decision, just as the designer of any system or equipment, except that in the Bell System he has the incentive and the information to make his decision based on the total life-cycle cost of each component (including the equivalent initial cost of future replacement operations), as well as on the impact of the device failure rate on system availability for customer performance. At this point he is already in a different position than that of his counterpart in many other company situations: if the final customer is not satisfied with the equipment reliability, he, the designer, must eventually account to his management for his choice in each cost-versus-reliability decision. On the other hand, he must also account for these decisions at the earlier time at which the initial cost of that system is determined.

It is easy to see that he develops a critical attitude toward the cost-versus-reliability question, regarding the devices he accepts for his system design. At the same time he develops an appreciation that a few pennies paid now to assure a needed level of reliability may save much more total expense over the system lifetime. Where, then, does he get the information by which to make his decisions?

The Role of the Device Designer

In many company situations, the system designer goes to the marketplace to hear what competing vendors claim they will give him regarding the reliability of available devices. In the Bell System, the system designer has available to him, under the same general management and committed to the same customer-satisfaction objectives, a device design expert who must both 1) agree with the system engineer on what performance and reliability objectives can be met, and also 2) provide the necessary design information, process control requirements, and final-device reliability-assurance requirements to the Western Electric manufacturing organization to satisfy the system design assumption of device reliability.

The device design organization, then, is committed not to design the device which has the lowest initial cost, but to design the lowest cost device for the total life cycle of the equipment. Since it has control of the total design, including materials, processes, and process control requirements, it has the opportunity to choose the least expensive technique for either controlling or eliminating a failure mechanism.

As an example, consider the classic problem of thin aluminum metallization at oxide steps, with known vulnerability to electromigration-caused failures. The device designer, committed to his management to provide reliability such that the probability of this failure mechanism must be essentially zero, is in a position to specify a) multiple evaporation sources, b) sloped oxide openings, c) SEM examinations of each slice, d) replacement of aluminum with plated gold, or e) any other control (by design, process specification, or test) against this failure mechanism; he can minimize the cost of this control by making an optimum choice.

Making the device designer a party to both the reliability-versus-cost decision (where he has the best knowledge of available tradeoffs) and the execution of the necessary controls in manufacturing practice (where he has the responsibility) puts the pressure for final reliability performance in the place where it is most effective.

Many management options can help to improve device reliability in many stages. Each can be in some degree effective by itself, or, in conjunction with the responsibility assignment previously discussed, can assist greatly in making the job easier. A few of such available techniques are

1) statistical quality control of manufacturing processes;
2) user incoming inspection;
3) independent quality assurance audits on the manufacturing processes and tests;
4) failure analysis of removals during equipment assembly or operation, with feedback to device design or manufacture.

It is felt that these or other similar types of aids, or combinations of them, cannot match the effectiveness of having device reliability commitment, device design control, and reliability testing control all in the same organization with those responsible for the system design and performance.

The impact of this combined commitment and authority is realized when it is compared with the situation of purchasing devices from an outside supplier. Such purchasing must be done by the vendor's commercial type number, a standardized improved-reliability specification such as a MIL specification, or by a specification prepared by the purchaser.

If the reliability requirements are sufficiently relaxed and the device sufficiently standardized, the system design engineers can specify a commercial type number. For higher reliability requirements, the systems engineer can benefit from the advice of a specialist, who can be either 1) a specification writer who has some knowledge of the usual specification techniques, and who has developed rapport with the suppliers, or 2) a device-design expert who is knowledgeable of failure mechanisms and can prepare specifications which will as nearly as possible guarantee the required reliability. Unfortunately, even this expert never has available to him the complete control of the product that he would have if the processing were also under his control. If he has to test final product for every possible failure mechanism, testing must require large samples for destructive tests and will be very expensive. Arrangements to participate in supplier's process controls, such as pull tests on bonds (perhaps 100 percent), can also be very expensive. And even assuming the best of good faith and capability on the part of the supplier, the purchaser can never have the assurance of meeting his reliability objectives that he would have if he were in control of process design and controls.

If the only devices commercially available are limited by structure or materials from the application of high-stress screening or life-testing, the buyer is additionally restricted. If he is working on a small-production program, his ability to


obtain acceptance of special specification requirements by the supplier is also limited. Changes in device design and processing by the supplier, with or without warning, are a continual cause of friction. Although the supplier may be most sympathetic to the need of a specific customer or program, the pressure of his total commitments to all of his customers must dilute his attention to the small-volume high-reliability customer. The net result of this buyer-seller relationship, in which the parties are necessarily adversaries (however amiable) in cost-versus-reliability decisions, is generally either higher cost for equally reliable product, or a lower-than-desired reliability level at equal cost.

Where the device design organization and the manufacturing organization are equally dedicated to providing the system the required devices of the required reliability, an optimum total life cycle cost, as well as an optimum supply availability, must be achievable. The close interrelationship of research, systems engineering, systems design, and semiconductor design and manufacture to obtain high reliability through process control and high-stress screening or life testing has consistently provided the Bell System with the reliability required for advancing needs, in advance of the first operation of a given system.
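As a worked check on the field-experience figures quoted earlier for BLSJ devices, the following minimal Python sketch adds up the three reported device-hour totals and converts a hypothetical single failure into FITs. It assumes the usual definition of 1 FIT as one failure per 10^9 device-hours; the dictionary keys and function names are illustrative and do not come from the paper.

```python
# Device-hours reported with zero BLSJ failures (values quoted in the text).
DEVICE_HOURS = {
    "call_store_subsystem": 8.75e8,   # 177 stores, as of August 1972
    "prototype_subsystems": 1.6e8,    # about 37 000 devices, about six months
    "louisiana_field_trial": 2.35e7,  # 2640 ICs, August 1971 to August 1972
}

HOURS_PER_FIT_UNIT = 1e9  # assumption: 1 FIT = 1 failure per 10**9 device-hours


def fit_rate(failures: float, device_hours: float) -> float:
    """Point-estimate failure rate in FITs."""
    return failures / device_hours * HOURS_PER_FIT_UNIT


def percent_per_1000h_to_fit(percent: float) -> float:
    """Convert 'percent per thousand hours' to FITs (1 %/1000 h = 10**4 FIT)."""
    return percent * 1e4


total_hours = sum(DEVICE_HOURS.values())
one_failure_fit = fit_rate(1, total_hours)

print(f"total device-hours: {total_hours:.4e}")
print(f"FIT if one failure had occurred: {one_failure_fit:.2f}")
print(f"0.001 %/1000 h in FIT: {percent_per_1000h_to_fit(0.001):.1f}")
```

The total comes out slightly above 10^9 device-hours, so a single failure would have corresponded to a little under 1 FIT, in line with the comparison drawn in the text. The second helper shows that the 0.001 percent per thousand hours mentioned in the abstract is the same quantity as 10 FITs.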

APPENDIX A

LIFE DISTRIBUTIONS AND FAILURE RATES: THEIR DEPENDENCE ON STRESS

General

As it is recognized that life-testing at the low stress level of normal use is not practical for reliability assurance because of the time and large sample size involved, there arises a need to determine if failure can be accelerated in such a way that a fixed relationship is maintained between the test results and expected life in normal use. Accelerated-stress tests are not uncommon; essentially all of the common mechanical or environmental tests such as shock, acceleration, and temperature cycling involve stress levels far greater than any expected in actual use but which are

1) used only to indicate adequate design or assembly quality;
2) easily met with normal design techniques; and therefore,
3) accepted by common usage.

Long history of testing has resulted in the use of certain commonly accepted stress levels for such tests for quality control, but without defined relationship to normal use.

Even the maximum rating levels of power dissipation or temperature are much higher than those of typical use, and therefore should result in accelerated failures; conversely, the user who wants high reliability typically derates the device, or applies it at stress levels well below maximum ratings, in recognition of the relatively higher failure rate expected at the rated stress level. Since the ratings are established only to prevent users from suffering unacceptable failure rates, and usually do not relate to immediate device failure, an obvious area of exploration of stress levels for testing is that of using the rated stress types, but beyond the rated levels.

In evaluating the usefulness of a stress level for testing, several requirements should be considered:

1) The stress should be that type of stress found in actual application. A stress such as temperature is, of course, always present; mechanical shock, on the other hand, could presumably be essentially eliminated in a specific application and would not be a stress of concern for that case. A stress could meet this criterion but still not cause a significant failure rate in normal use, in which case the test may be valid but not useful.
2) The failure mechanism or mechanisms should be the same as that or those which could occur at the normal stress. In some cases it would be acceptable to have some percentage of abnormal failures, which could be ignored statistically in making a decision about the results, but the undesirable factor in this situation is that the abnormal failures could get out of control and make the test meaningless or very difficult to interpret.
3) All samples of a given type should respond similarly to a change in stress; i.e., the device which fails first at one stress level should have failed first if it had been exposed to a different stress level. One cannot generally make failed devices recover so that they can be retested at a different stress level for comparison, but if the distribution of time-to-failure retains the same shape at different stress levels for similar samples, there is a strong inference that all devices in the population are responding similarly to variation in the stress.
4) The failure mechanism should be sufficiently understood that the test data could be handled in a physically meaningful way. This, too, might not be necessary if enough test and field data were available to establish a statistically valid relationship between test results at high stress and results in the application. Since we generally, however, must extrapolate to the low-stress level on a physical rather than a statistical basis, it is desirable to know the physically correct means of extrapolation.

An important implication is that a test which is established to accelerate a particular failure mechanism will only provide information about that mechanism, or those which are similarly accelerated. It will not tell anything about mechanisms which are not accelerated by that stress but which may still be important in the application. Thus for final reliability evaluation, it may be necessary to consider failure rates due to several stresses, or combinations of stresses.

Semiconductor Devices: The Life Distribution

Many mathematical models have been proposed for establishing a relationship between the failure rate, $\lambda$, of a device and various stresses of the electrical and mechanical environment [32]. These implicitly assume a constant failure rate, although all life testing and field experience invariably demonstrate a decreasing failure rate in early life. What is really important is a knowledge of what the life distribution, or the distribution of time-to-failure, really is, and how changes in various stresses affect that distribution. Are the early failures in the distribution at one stress level affected more or less than the later ones? Does the failure mode (hence probably the failure mechanism) remain the same throughout the distribution? These, and similar questions, must be answered before one can make predictions of future reliability, either from accelerated distributions or from the early reliability of a product as suggested by a life test within normal-use ratings.

One of the obviously important stresses of concern in the reliability of semiconductor devices is temperature, and life distributions at various temperatures have been found to be log-normal, as is shown for germanium devices in Fig. 1. Examples of additional evidence that the log-normal distribution is applicable to semiconductor-device life are shown in Figs. 14


[Fig. 16 plot not reproduced. Panel labels: Process A, Process B, Process C, with years 1960, 1961, and 1962. Horizontal axis: extrapolated life at 100°C, hours (logarithmic).]

Fig. 16. Normalized (to 100°C temperature) failure-frequency distribution plots of three consecutive production processes of a diffused-germanium transistor, showing how changes in process can improve the mode of the distribution, and how experience with a single process (B) narrows the distribution dispersion.

[Fig. 14 plot not reproduced. Axis: cumulative failure, percent.]

Fig. 14. Log-normal life distribution of a sample of about 2000 n-p-n germanium alloy transistors.

[Fig. 15 plot not reproduced. Axis: cumulative failure, percent.]

Fig. 15. Log-normal life distribution of a small sample of diffused germanium transistors, showing the applicability of this distribution throughout the major portion of the sample.
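To make the log-normal plots of Figs. 14 and 15 concrete, here is a minimal Python sketch of the cumulative failure fraction for a log-normal life distribution. The median life and dispersion used in the example are hypothetical numbers chosen only for illustration, not data from the figures; the function names are likewise assumptions.

```python
import math


def lognormal_cdf(t_hours: float, median_hours: float, sigma: float) -> float:
    """Fraction of devices failed by time t for a log-normal life distribution.

    sigma is the standard deviation of ln(time to failure); the median is the
    time at which 50 percent of the devices have failed.
    """
    z = (math.log(t_hours) - math.log(median_hours)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical parameters, for illustration only.
MEDIAN_HOURS = 1.0e5
SIGMA = 1.5

# One sigma below the median in ln(time) is close to the 16-percent point.
t_minus_one_sigma = MEDIAN_HOURS * math.exp(-SIGMA)

print(f"failed at median:          {lognormal_cdf(MEDIAN_HOURS, MEDIAN_HOURS, SIGMA):.3f}")
print(f"failed one sigma earlier:  {lognormal_cdf(t_minus_one_sigma, MEDIAN_HOURS, SIGMA):.3f}")
```

On probability paper with a logarithmic time axis, such a distribution plots as a straight line, which is the form in which the figures present the test data.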

and 15, where Fig. 14 shows the applicability of this distribution in a large sample of about 2000 transistors, while Fig. 15 shows, in a much smaller sample, the applicability of the log-normal through the major portion of the product life. The log-normal distribution can be recognized in all cases of sufficiently large and sufficiently accelerated tests of semiconductor products and therefore is now assumed in Bell System studies, even on small-sample test results, where the form of the distribution could not otherwise be recognized.

The log-normal life distribution is not unexpected for semiconductor products; if the normal distribution results from the additive effects of a combination of randomly distributed variables, then the log-normal distribution results from the multiplicative effects [33] of combinations of such variables. Examples of these variables might be as follows:

1) Varying amounts of ionic contaminants in encapsulant materials or in semiconductor surface oxides; these would be subject to diffusion at varying temperatures (therefore varying rates) throughout the structure.
2) Varying crystal structures in materials, altering overall apparent diffusion constants.
3) Varying dopant levels at semiconductor surfaces, altering their sensitivity to surface charges.
4) Varying amounts of H2O in the encapsulating insulators or atmosphere.

Combinations of such variables, with their multiplicative effects, make the log-normal distribution a reasonable one to expect, and in fact it has been recognized in tests of semiconductor devices, electron tubes, filamentary lamps, and other devices which are subject to thermally activated failure mechanisms, and which naturally involve multiplicative variables. Other distributions, such as the exponential or the Weibull, will frequently fit portions of the life data, but will invariably fail to fit the entire sample if the test can be continued long enough to show the complete distribution [34].

An ancillary advantage to the use of the log-normal distribution is the engineer's familiarity with the normal distribution and the significance of an observed $\sigma$ (standard deviation) to the concept of process control. Fig. 16 shows the frequency of failures of the early diffused germanium transistors in a normal distribution in ln (time) for three manufacturing processes at four time periods. For processes B and C, the effect of poor initial control is seen in the large dispersion, and the effect of improved process control gained through production experience is evidenced in a smaller dispersion of the later product of process B. The effect of a process change in improving the mode of the life distribution is also evident. These features are familiar in engineering experience and make the meaningful interpretation of distribution plots somewhat easier.

Time-Temperature Relationship

In a thermally accelerated physical process it would be expected that the process rate would be dependent upon temperature according to the Arrhenius equation

$$R = R_0 \exp\left(-\frac{E_A}{k T_k}\right)$$

where

$R$ reaction rate
$R_0$ a constant
$E_A$ activation energy (eV)
$k$ Boltzmann's constant ($8.63 \times 10^{-5}$ eV/K)
$T_k$ absolute temperature (in K).

The reaction rate will be related to failure time through some undefined function but, regardless of that function, the failure time should follow the temperature dependence of the Arrhenius equation. It must be remembered, however, that the apparent activation energy of the process of reaching some defined end-point limit of a parameter, as measured by the effect of temperature on the failure distribution, is not necessarily the same as the activation energy of the process itself by which the parameter changes.

Consider, for example, the case where the parameter $\phi$ changes according to the equation¹³

$$\phi = A\, t^{n} \exp\left(-\frac{E_A}{k T_k}\right)$$

The value of this parameter at the point of device failure, $t_p$ (in a constant-temperature test), is $\phi_p$, whence

$$\phi_p = A\, t_p^{\,n} \exp\left(-\frac{E_A}{k T_k}\right)$$

Then, since $\phi_p$ is an established (implicitly or explicitly) constant

$$C = n \ln t_p - \frac{E_A}{k T_k}$$

and

$$\ln t_p = C' + \frac{E_A}{n k T_k}$$

(the basic Arrhenius plot of time-to-failure percentage versus temperature, in $\ln t$ versus $1/T_k$), where the apparent activation energy is $E_A/n$, or $1/n$ times the activation energy of the actual mechanism causing the change in the device parameter $\phi$. (A specific recent example of this is the change of threshold voltage $V_T$ of an MOS transistor, where $\Delta V_T$ at a constant test temperature varies as $t^n$ [30], [31], with n observed as 5 or 1 in different gate-insulator structures.)

In Fig. 2, for early diffused-germanium transistors, median-life estimates from the various tests are plotted according to the Arrhenius equation, with the ordinate scale linear in the inverse of temperature (in Kelvins). The median values are plotted because the median is the location parameter of the log-normal distribution and is the one most accurately estimated from sample data. The median-life values fit a straight line very well in this plot, indicating either the presence of a single failure mechanism or, if several are present, the similarity of their activation energies.

The dispersion of the failure distribution is described by $\sigma$ which, for the log-normal distribution, is the standard deviation of the logarithm of time to failure, and can be estimated as¹⁴

$$\sigma = \ln(\text{time-to-50-percent failure}) - \ln(\text{time-to-16-percent failure})$$

or

$$\sigma = \ln\left(\frac{\text{time-to-50-percent failure}}{\text{time-to-16-percent failure}}\right)$$

In Fig. 3, the $\sigma$ estimates of the tests are averaged, and the lines of lower (<median) percentage failures are calculated from the average $\sigma$. In these data there is no apparent trend in the value of $\sigma$ over the range of temperature of the tests, giving evidence that the life distribution is not distorted in this stress acceleration. This consistency of $\sigma$ is a desirable factor in the development of an accelerated life evaluation, and has also been observed in our test of silicon mesa and planar transistors and diodes, and of ICs.

Figs. 1 and 2 need to be considered together since there is some measure of interaction. This results from the fact that the transistors were given a 100-percent high-temperature screening (100 h at 100°C) in the manufacturing process after sealing, and the corresponding life-test distributions and Arrhenius plot become consistent when the effect of this screening condition (i.e., the percentage of failures observed and the equivalent time at the life-test temperature) is taken into account. For example, in Fig. 1, the 100-h screening time is added to the 500- and 1000-h life-test measurement times at 100°C, and the low percentage of failures during the screen is included in the cumulative percentage failures at these time periods. At 150°C, however, according to Fig. 2, the equivalent of 100 h at 100°C is only 0.5 h, a negligible addition to the test measurement times, just as the low failure percentage during screening is a negligible addition to the measured failures. (Of course, for a product which might have a significant loss during screening, there results a shift in both time and failure percentages. The failure percentage of the screen remains constant for all other conditions, but the equivalent time changes according to the life-test temperature and the determined activation energy.)

This leads to the reasonable inference that the "life" of these transistors started when they were sealed, and all subsequent processing operations became a part of their life history. (As will be seen later, any reasonable thermal-processing times and stresses are so short relative to inherent germanium or silicon life capabilities that the effect of "shortening" life is negligible, and the only factor of concern is whether the failure rate of the remaining product will be improved from that of the total product before such processing.)

The purpose of an accelerated-test program is to develop enough information to enable a prediction of the expected life distribution at low stress. With as much data as is shown in Fig. 3, for example, a regression line of best fit can readily be estimated by eye. A best fit line could also be calculated statistically (converting both the ordinate and abscissa to linear scales), if this degree of precision were desired; since this technique treats only one failure mechanism at a time, it is doubtful that obtaining statistical precision and, further, the establishment of confidence intervals¹⁶ are worthwhile. The extrapolation is certainly least subject to argument if it is used only to indicate whether the temperature-accelerated mechanism is going to provide an undesirably high failure rate, or will be essentially unseen at the stress level of the application.

One feature that is helpful, according to observations to

¹³ The parameter $\phi$ may be an electrically measurable characteristic of the device, or it may be some parameter such as rate of charge motion, which requires progressing to some critical degree before it exhibits itself in a change in an electrically measurable characteristic.

¹⁴ The Naperian-base logarithm is used for $\sigma$ throughout this work.

¹⁶ Statistical evaluations are necessary, on the other hand, if a comparison is being made between two tests at the same stress levels in order to choose between two products, having the same apparent activation energy of failure times. Here statistical-inference techniques developed for the normal distribution can be adapted to the log-normal for such purposes.

TABLE III
APPARENT ACTIVATION ENERGIES OF FAILURES FOR SEVERAL DEVICE TYPES

DEVICE TYPE                                              APPARENT ACTIVATION ENERGY (eV)
TRANSISTORS
  GERMANIUM, UNGETTERED                                  0.88
  GERMANIUM, GETTERED WITH VYCOR OR MOLECULAR SIEVE      1.24
  SILICON, BIPOLAR, WITH SURFACE-INVERSION FAILURES      1.02
  SILICON, BIPOLAR, WITH Au-Al BOND FAILURES             1.02-1.04
  SILICON, BIPOLAR, WITH METAL PENETRATION INTO Si       1.77
  SILICON, p-n-p-n                                       1.65
DIODES
  SILICON, p-n-p-n                                       1.41
  SILICON, VARACTORS                                     2.31-2.38
  SILICON, OTHERS                                        1.13-2.77

Fig. 17. Three-dimensional view of the frequency distribution of failure in time, as affected by stress. For semiconductor devices with failure mechanisms as discussed, this distribution extrapolation becomes linear when the stress is temperature, plotted as linear in the inverse of absolute temperature. [Figure labels: "failure surface" extrapolation; time at stress (log scale).]
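The Arrhenius extrapolation of median life and the $\sigma$ estimate described in this appendix can be sketched numerically. The following is only a minimal illustration, not the authors' procedure: it fits $\ln(\text{median life})$ against $1/T$ by ordinary least squares and estimates $\sigma$ from the 50- and 16-percent failure times. The function names, the test temperatures, and the median lives are hypothetical; the median lives are generated from an assumed 1.02-eV mechanism, so the fit simply recovers that value.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def sigma_from_percentiles(t50_hours, t16_hours):
    """Log-normal sigma estimated as ln(t50) - ln(t16)."""
    return math.log(t50_hours) - math.log(t16_hours)


def fit_arrhenius(temps_c, median_lives_h):
    """Least-squares fit of ln(median life) = c + (E/k) * (1/T).

    Returns (apparent activation energy in eV, intercept c).
    """
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(m) for m in median_lives_h]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope * K_BOLTZMANN_EV, y_mean - slope * x_mean


def median_life_at(temp_c, activation_ev, intercept):
    """Median life (hours) read off the fitted Arrhenius line."""
    return math.exp(intercept + activation_ev / (K_BOLTZMANN_EV * (temp_c + 273.15)))


# Hypothetical high-stress results: 1000-h median life at 300 C, with the
# other points generated from an ASSUMED 1.02-eV activation energy.
ASSUMED_EA_EV = 1.02
TEST_TEMPS_C = [200.0, 250.0, 300.0]
MEDIAN_LIVES_H = [
    1000.0 * math.exp(ASSUMED_EA_EV / K_BOLTZMANN_EV * (1.0 / (t + 273.15) - 1.0 / 573.15))
    for t in TEST_TEMPS_C
]

activation_ev, intercept = fit_arrhenius(TEST_TEMPS_C, MEDIAN_LIVES_H)
life_at_80c = median_life_at(80.0, activation_ev, intercept)
sigma = sigma_from_percentiles(1000.0, 135.0)  # hypothetical t50 and t16

print(f"apparent activation energy: {activation_ev:.2f} eV")
print(f"extrapolated median life at 80 C: {life_at_80c:.2e} h")
print(f"sigma: {sigma:.2f}")
```

With real data the points would scatter about the line, and the caution in the text applies: the fit describes one failure mechanism at a time.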

date, is that temperature-accelerated failures show a common temperature-regression characteristic in all transistors of a class, regardless of manufacture. That is, all conventional silicon SiO₂-surface Al-metallized transistors failing by surface inversion have shown the temperature regression of Fig. 3. This suggests that a compilation of apparent activation energies for time-to-failure for various products would be useful, at least as a guide to a life-testing program on a specific device structure. Table III lists the activation energies of time-to-failure for a number of device types (and failure modes) as they have been observed to date. The range of activation energies shown for diodes reflects a lack of the consistency which has been seen in transistors. The review of a number of tests on a variety of diode types, of different structures and materials, shows no observable pattern; even diodes which fail because of surface channels show higher activation energy than the 1.02 eV of transistors. Further study would be required here for an understanding of the factors which determine the activation energy of failure in a semiconductor diode. It is at least generally high, supporting the general observation that failure rates of diodes in service are lower than those of transistors, a common level of processing technology being implied.

Fig. 18. Pictorial representation of the frequency distribution of failures in increasing stress as stress steps of length $t_i$ are sequentially applied to a given sample of devices. [Axes: number of devices failing at this step versus increasing number of stress steps.]

Step-Stress Tests

The acceleration of life distributions with stress as shown in Figs. 1, 2, and 3 can be represented in the three-dimensional sketch of Fig. 17, showing the relative frequency of failures as a function of stress and time. Lines 1 and 2 represent the frequency distribution of failures which would result in constant-stress life tests at the corresponding stress levels. It should also be possible to obtain frequency distributions of failures resulting from increasing stress as suggested by lines 3 and 4 [6]. Obtaining these latter distributions is complicated by two factors:

1) It is generally necessary to measure several device characteristics to determine when a "failure" occurs, i.e., when a device exceeds its parametric limits. Such measurements are generally difficult to make while the device is being stressed.

2) At any point in increasing stress there is a residual effect of all the cumulative exposure of the sample to earlier stress.

Ignoring the second factor for the moment, the first is answered by using discrete steps of stress, rather than a continuously increasing stress, so that the devices can be removed

Fig. 19. Typical step-stress plot of stress temperature (plotted as linear in the inverse of absolute temperature) versus cumulative failures (on a normal probability scale).

from stress and measured after each step. With steps of equal time periods a cumulative failure distribution results, as exemplified in Fig. 18. Fig. 19 shows actual results of plotting the cumulative failure percentage for step-stress testing of a logic diode, with the stress scale that of junction temperature linear in $1/T_k$, to be consistent with the Arrhenius plot. If the stress (temperature) differences are sufficiently large, the residual effect of the previous step is negligible. The determination of what is sufficiently large requires knowledge


of the activation energy and hence the equivalent time at the new stress level of the time at previous stress levels. Therefore, the step-stress technique cannot be used conveniently for initial determination of the activation energy of a failure mechanism. It is most useful as an initial indicator, particularly for an unknown product, of the prevalent failure modes and of where in the stress-time domain they occur. This provides the basis for establishing the stress levels for constant-stress tests which will provide the desired information in the least time, or which will identify different failure mechanisms most readily.

Corrections to Step-Stress Test Results

To improve the accuracy of step-stress tests, and to resolve the second factor complicating the use of step-stress test results, a very simple correction is available, using graphical techniques, with an assumption of, or knowledge of, the activation energy. If the activation energy is known, or can be reasonably assumed, the technique is straightforward; if it is not known or cannot be assumed, then one must use raw step-stress data from tests at several test periods to obtain a first-order estimate of activation energy. This can be used to correct all step-stress median-failure levels, providing an adjusted estimate of activation energy, which can then be used again on the raw data to provide refined distributions. Since the corrections tend to be relatively small, few iterations should be required.

The correction technique can be explained with reference to Fig. 20, as an example, in the following sequence:

1) The first stress step was 16 h at 180°C, and the failure percentage which occurred would be plotted at that temperature for the 16-h step plot.

2) The second step was at 210°C, but the equivalent of the first step is 2 h at 210°C, according to the acceleration line shown across the center of the plot. This means that, after the 16-h exposure at 210°C, the total cumulative time was 18 h which, extrapolated back to the 16-h point, gives an equivalent 16-h exposure temperature of 212°C, at which temperature the new cumulative percentage of failure is now plotted.

3) Continuing this extrapolation back to the next stress temperature of 240°C, the equivalent is nearly 4 h which, added to the new 16-h exposure, gives a cumulative 20 h, or the equivalent of 244°C at 16 h.

4) This process is repeated at each step, giving a more accurate plot of the step-stress response to a 16-h time cut through the failure frequency distribution.

Fig. 20. Arrhenius plot showing how the residual effect of earlier step-stress steps can be accounted for in plotting test results. The assumed activation-energy line is shown, and also a table of the actual step temperature and that at which the observed failure percentage should be plotted, for the given step time.

As shown in Fig. 3 and in plots of other devices [6], [9], such step-stress results can be a useful adjunct to constant-stress results, or they can serve merely to indicate quickly the range of stress and time which is of interest for a particular failure mechanism in a particular processing and construction technology. A significant advantage of the step-stress test is that the time interval for obtaining information (positive or negative) is defined once the tester has chosen the step time and the number of steps. If it develops that insufficient failures have occurred, at the last planned step, to provide useful information, the test could be continued at that level as a constant-stress test.

Calculation of Failure Rates

Given an extrapolated estimate of the median-life value at a low-stress level of application interest (from high-stress life data), and an estimate of $\sigma$ from the available tests, a calculation of instantaneous failure rate can be made for any time in the low-stress application. Fig. 21 [35] shows failure rate versus time in normalized terms of (failure rate, FITs) × (median life, hours) versus (real time, hours) ÷ (median life, hours). For example, at 10 000 h for a device with an extrapolated median life of $10^7$ h, $t = 10^{-3}$ and, if the $\sigma$ were estimated at 2.0, the term (failure rate) × (median life) would equal approximately $5 \times 10^8$ (from the ordinate axis), whereby the failure rate at 10 000 h is 50 FITs for the failure mechanism under study. A curve of failure rate versus time in application can thus be developed which (with the application of some judgment to take into account the amount of data taken, confidence in the assumption of activation energy, etc.) can be a guide as to whether this failure mechanism will or will not be a factor in meeting the required device reliability. (See Appendix B regarding the effect of screening or burn-in.)

The reverse process can also be carried out: i.e., given a failure-rate objective and enough history to suggest the failure mechanism (or activation energy) and an expected $\sigma$, one can develop an appropriate high-stress life test. For example, note that the curves for all $\sigma$'s up to 2.0 (a reasonable upper limit for shaken-down semiconductor processes) peak at an ordinate value of $10^9$, so that a median life of $10^8$ h would guarantee 10-FIT reliability at any time. Further study of the range of $t$ throughout service life (see Appendix B) could refine the median life requirement to a time less than $10^8$ h, generally not less than $10^7$ h. The corresponding median life requirement at a high-stress level can then be established, using Fig. 3 or its equivalent, and with an estimate of $\sigma$, a choice of life-test time and allowed failure percentage can be made. Table IV shows some examples of life-test selections possible for a few chosen reliability requirements for these temperature-acceleratable failure mechanisms.

Results

In many cases, particularly with digital devices operating at average junction temperatures not far above ambient temperature, calculations of expected reliability in the field from accelerated-stress studies would predict failure rates well below 1 FIT due to the usual temperature-accelerated failure mechanism. It is very difficult to confirm such a low failure rate, although the field data of #1 ESS logic transistors (quoted in the text) tend to confirm a device failure rate not significantly higher than the prediction, with a removal rate down to 2.5 RITs and a typical 10:1 difference between removals and device failures due to degradation.
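The graphical correction sequence given under "Corrections to Step-Stress Test Results" can also be sketched numerically. The following is a rough illustration under stated assumptions, not a reproduction of Fig. 20: the activation energy behind that figure is not given in the text, so a value of 1.3 eV is assumed here purely because it makes the first step (16 h at 180°C) equivalent to about 2 h at 210°C. The function names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def kelvin(temp_c):
    return temp_c + 273.15


def equivalent_hours(hours, from_c, to_c, activation_ev):
    """Hours at to_c producing the same aging as `hours` at from_c."""
    exponent = (activation_ev / K_BOLTZMANN_EV) * (1.0 / kelvin(to_c) - 1.0 / kelvin(from_c))
    return hours * math.exp(exponent)


def corrected_step_temperatures(step_temps_c, step_hours, activation_ev):
    """Temperature at which each cumulative failure percentage should be plotted.

    For every step, all earlier exposure is converted to equivalent hours at
    the current step temperature, the new step time is added, and the total
    is expressed as a single exposure of `step_hours` at a higher temperature.
    """
    corrected = []
    cumulative_h = 0.0
    previous_c = None
    for temp_c in step_temps_c:
        if previous_c is not None:
            cumulative_h = equivalent_hours(cumulative_h, previous_c, temp_c, activation_ev)
        cumulative_h += step_hours
        # Solve: step_hours at T' ages the device as much as cumulative_h at temp_c.
        inverse_kelvin = 1.0 / kelvin(temp_c) - (K_BOLTZMANN_EV / activation_ev) * math.log(
            cumulative_h / step_hours
        )
        corrected.append(1.0 / inverse_kelvin - 273.15)
        previous_c = temp_c
    return corrected


ASSUMED_EA_EV = 1.3  # assumption for illustration; not stated in the text
STEP_TEMPS_C = [180.0, 210.0, 240.0]
plot_temps_c = corrected_step_temperatures(STEP_TEMPS_C, 16.0, ASSUMED_EA_EV)

for nominal, plotted in zip(STEP_TEMPS_C, plot_temps_c):
    print(f"{nominal:.0f} C step -> plot at {plotted:.1f} C")
```

The corrected temperatures come out a few degrees above the nominal step temperatures, in line with the remark that the corrections tend to be relatively small. Values read graphically from Fig. 20 need not agree exactly with this calculation.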

Fig. 21. Normalized plot of failure rate versus time for log-normal distributions having various $\sigma$'s. [Axes: (failure rate, FITs) × (median life, hours) versus $t$ = (time) ÷ (median life).]
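The normalized curves of Fig. 21 follow from the hazard function of the log-normal distribution, so the worked example in the text (10 000 h, median life $10^7$ h, $\sigma = 2.0$) can be checked numerically. The sketch below is an illustration only; the function names are hypothetical. It also evaluates the "reverse process" for the first row of Table IV, assuming $\sigma = 2.0$ (the upper limit quoted in the text; the value actually used for the table is not stated).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def lognormal_failure_rate_fits(time_h, median_life_h, sigma):
    """Instantaneous failure rate in FITs (failures per 1e9 device-hours)."""
    z = math.log(time_h / median_life_h) / sigma
    density = STD_NORMAL.pdf(z) / (sigma * time_h)  # log-normal pdf at time_h
    surviving = 1.0 - STD_NORMAL.cdf(z)             # fraction not yet failed
    return 1e9 * density / surviving


def allowed_failure_fraction(test_hours, median_life_h, sigma):
    """Cumulative fraction expected to fail by test_hours."""
    return STD_NORMAL.cdf(math.log(test_hours / median_life_h) / sigma)


# Worked example from the text: t = 1e4 h, median life 1e7 h, sigma 2.0.
rate_fits = lognormal_failure_rate_fits(1e4, 1e7, 2.0)

# First row of Table IV: 500-h test against a 5000-h median life,
# with sigma = 2.0 ASSUMED.
fraction = allowed_failure_fraction(500.0, 5000.0, 2.0)

print(f"failure rate at 10 000 h: {rate_fits:.0f} FITs")
print(f"expected failures in the 500-h test: {100 * fraction:.1f} %")
```

The first result agrees with the roughly 50 FITs read from the ordinate of Fig. 21, and the second is close to the 13 percent allowed in Table IV.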

TABLE IV
EXAMPLES OF LIFE-TEST REQUIREMENTS FOR CONVENTIONAL SILICON TRANSISTORS

FAILURE-RATE           CHOSEN TEST        REQUIRED MEDIAN LIFE,        LIFE-TEST REQUIREMENT
REQUIREMENT, FITs      TEMPERATURE, °C    HOURS AT TEST TEMPERATURE    TIME, HOURS    FAILURES ALLOWED
100 at Tj = 70°C       175                5000                         500            13%
100 at Tj = 125°C      250                12 000                       1000           12%
10 at Tj = 70°C        250                450                          100            23%
10 at Tj = 125°C       300                10 000                       1000           13%
1 at Tj = 70°C         250                12 000                       1000           12%

In the case of a high-frequency diffused germanium transistor operating at 50°C junction temperature, the predicted failure rate, from step-stress data at 4- to 100-h steps, was 60 FITs at a 3000-h operating time. Actual field data showed an average failure rate of 52 FITs in 4000 h of operation.

A silicon power diode was subjected to 2-, 6-, and 24-h step-stress tests, with median failure temperatures in the range of 250°C to 280°C. The corresponding prediction of failures in a 1000-h power life test at rated power (a junction temperature of 150°C) was within a factor of two of the actual life-test results.

Similar results have been obtained in military applications where the type of failure predicted by high-temperature tests, with and without dc or RF operation, confirmed the type of failure and the failure rate occurring in system operation. Although care was required to consider the failure mechanisms involved, there were not that many mechanisms of concern for discrete devices that accelerated-stress data could not be used effectively. As devices become more complex and use more stress-limiting materials and structures, it must be recognized that the power of accelerated-stress tests is diminished and one's ability to predict reliability becomes more limited; reliability assurance suffers.

APPENDIX B
RECOGNITION AND CONTROL OF FREAK DEVICES

Recognition of Freak Percentages

The failure rate calculations of Appendix A assume perfect log-normal distributions, although Figs. 4-7 show evidence of bimodal distributions (i.e., the early failure percentages occur at shorter times than would be expected from an extension of what subsequently appears to be the major distribution). This same pattern is shown more dramatically in Fig. 22, showing the results of two accelerated power life tests on a power transistor. Each test represents a sample of only 20 units, so it is expected that the percentage of the early-failure distribution (the freaks¹⁶) might vary, as suggested by the different slopes of the freak portion of the plotted curves.

Fig. 23 shows the form of an expected result of combining a small percentage of a freak population with a normal life distribution. The early-life part of the combined distribution varies greatly depending on the percentage of the freak distribution and its actual parameters, and the actual test representation of that portion of the curve will depend also on the sample size and the choice of time intervals for measurements of the sample. It is difficult to obtain enough data to accurately describe the shape of the freak distribution, and the assumption of a log-normal distribution in Fig. 23 is only convenient because of the expectation of the log-normal for the main distribution. The S-shape curve has been observed, however, in

¹⁶ As has been discussed, early failures can result from incipient opens, shorts, etc., resulting from mechanically defective units. In the sense discussed here, these failures are removed for separate consideration, and the "freaks" of concern are those which are temperature-accelerated in essentially the same manner as the failures from the main distribution.

TABLE V
CONDITIONS NECESSARY FOR FREAK REMOVAL

Tj, °C     TIME, HOURS
300        20
250        150
200        1500
180        (not legible)
125        (not legible)

Fig. 22. Log-normal plots of power-transistor life tests showing the distinction between freak portions and main portions of the distributions. [Axes: life, hours, versus cumulative failure, percent.]

Fig. 23. Typical main and freak distributions of surface-degradation failures, as could be expected at 300°C junction temperature. [Abscissa: cumulative failure, percent.]

Fig. 24. Plots of expected failure rate versus operating life (at 100°C junction temperature) of a power transistor with and without screening to eliminate freak units. [Abscissa: life, hours.]
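The S-shaped curve that results from combining a small freak population with a main population, as in Fig. 23, can be illustrated with a simple mixture of two log-normal distributions. This is a minimal sketch; the freak fraction, medians, and $\sigma$ values below are hypothetical and are not taken from Figs. 22 or 23. They are chosen only so that the two populations are well separated in time.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_cdf(time_h, median_h, sigma):
    """Fraction of a log-normal population failed by time_h."""
    return STD_NORMAL.cdf(math.log(time_h / median_h) / sigma)


def combined_cdf(time_h, freak_fraction, freak_median_h, freak_sigma,
                 main_median_h, main_sigma):
    """Cumulative failure fraction of a freak-plus-main mixture."""
    freak_part = freak_fraction * lognormal_cdf(time_h, freak_median_h, freak_sigma)
    main_part = (1.0 - freak_fraction) * lognormal_cdf(time_h, main_median_h, main_sigma)
    return freak_part + main_part


# Hypothetical populations (assumed values for illustration only).
FREAK_FRACTION = 0.05
FREAK_MEDIAN_H, FREAK_SIGMA = 2.0, 0.7
MAIN_MEDIAN_H, MAIN_SIGMA = 1000.0, 0.7

for hours in (1, 2, 5, 20, 100, 1000, 5000):
    failed = combined_cdf(hours, FREAK_FRACTION, FREAK_MEDIAN_H, FREAK_SIGMA,
                          MAIN_MEDIAN_H, MAIN_SIGMA)
    print(f"{hours:>5} h  {100 * failed:6.2f} % failed")
```

Between the two populations the cumulative percentage flattens near the freak fraction (5 percent in this example), which is the inflection point discussed in the text. When the populations overlap, the plateau is less distinct.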

many life tests where the sample size was large enough and the test times were short enough to reveal the early distribution.

Elimination of Freak Units

It can be recognized, by looking at the combined distributions from a variety of freak distributions, that the inflection point of the S-shaped curve indicates the percentage of freak units in the sample; correspondingly, it would also indicate the time (at the stress level of that test) at which all of the freak units would be eliminated, if it can be assumed that all the freaks fail before failures start from the main distribution. Although this may not always be the case, a review of a large number of accelerated-stress tests (in the 250-300°C range) shows a considerable similarity of the observable inflection points where surface inversion is the failure mechanism of the freak population as well as of the main population. There is some suggestion from limited data that the freak population may have somewhat lower activation energy than the 1.02 eV of the main population of surface-inversion failures, suggesting a different ion involvement or a slightly different mechanism of ion motion in the freak units. It would seem conservative, however, to assume the 1.02-eV activation energy, in order not to anticipate freak removal too early at lower stress. Table V indicates the times necessary (from data in the 250-300°C range, and assuming 1.02 eV) to eliminate the freak units from the population at various junction temperatures. While failure-rate improvement is likely to result from any screen, or burn-in, at lesser conditions of time or temperature, extrapolations and failure-rate calculations based only on main distributions, without complete elimination of freak units from the product, could be misleading.

Failure Rates Including Freak Units

Failure rates can be calculated for product which includes freaks, such as in Fig. 22, regardless of the exact form of the freak distribution, but assuming the same activation energy as that of the main population. Even then, as seen in this figure, considerable judgment may be necessary; for the example in this figure, an average slope in the freak region may have to be assumed, and an average temperature assumed. (In Figs. 4-7, the larger sample would permit more precise calculations.) Using such an estimate of the distribution in the freak region, a median life and $\sigma$ can be estimated to represent the total product for that portion of its life, extrapolated to the use condition according to the activation energy. Fig. 24 shows the resulting failure rate expectations (of the combined distribution and of only the main distribution) of the product of Fig. 22 but at a junction temperature of 100°C.

Further Effects of Screens or Burn-In

Although the intent of a burn-in is to remove freak units, it also has an equivalent effect on the main population. For example, a 20-h burn-in at 300°C on a conventional silicon transistor has the effect of nearly $10^7$ h at 80°C (see Fig. 3), so that this would represent the beginning of service life, at that stress, of product so burned-in (with respect to surface


inversion and Au-Al intermetallic growth). Now ten years of service life represents only an addition of approximately one one-hundredth of the life already passed, providing an essentially constant failure rate, according to Fig. 21. Even though this life may turn out to be in the increasing-failure-rate portion of the curve, the overall failure rate has been improved through elimination of the freak population, and the resultant failure rate is essentially constant for this failure mechanism. If a burn-in is done with less than a total elimination of the freaks, then the estimate of what failure rate remains is very difficult since it involves knowing the exact freak-failure distribution. It can only be said that the failure rate would be somewhat improved from that which would have been expected without burn-in.

Fig. 25. Arrhenius plot of the relative life of semiconductor devices (at constant relative humidity) in varying temperature, normalized to the common humidity test condition of 85°C. [Abscissa: relative life factor.]
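The equivalence between a burn-in and service time quoted above can be checked with the Arrhenius acceleration factor. The sketch below is illustrative only: it assumes that the regression of Fig. 3 corresponds to the 1.02-eV activation energy quoted for surface-inversion failures, and it takes the 20-h, 300°C burn-in as the reference condition when scaling to other temperatures. The function name is hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K
HOURS_PER_YEAR = 8760.0
ASSUMED_EA_EV = 1.02       # surface-inversion value quoted in the text


def acceleration_factor(stress_c, use_c, activation_ev):
    """Hours at use_c that one hour at stress_c is worth."""
    inverse_use = 1.0 / (use_c + 273.15)
    inverse_stress = 1.0 / (stress_c + 273.15)
    return math.exp(activation_ev / K_BOLTZMANN_EV * (inverse_use - inverse_stress))


# A 20-h burn-in at 300 C expressed as hours at 80 C.
burn_in_equivalent_h = 20.0 * acceleration_factor(300.0, 80.0, ASSUMED_EA_EV)

# Ten years of service at 80 C as a fraction of that equivalent life.
ten_year_fraction = 10.0 * HOURS_PER_YEAR / burn_in_equivalent_h

print(f"20 h at 300 C is equivalent to {burn_in_equivalent_h:.2e} h at 80 C")
print(f"ten years adds {ten_year_fraction:.3f} of the life already passed")

# The same exposure expressed at other junction temperatures.
for junction_c in (250.0, 200.0, 150.0):
    hours = 20.0 * acceleration_factor(300.0, junction_c, ASSUMED_EA_EV)
    print(f"{junction_c:.0f} C: about {hours:.0f} h")
```

Under these assumptions the burn-in is worth somewhat under $10^7$ h at 80°C, and ten years of service adds about one one-hundredth of that, consistent with the statements in the text.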

APPENDIX C
ACCELERATED TEMPERATURE-HUMIDITY TESTS


Testing devices in high humidity and at high temperature in order to accelerate surface-finish degradation has been a long-established practice for parts that are to be used in exposed locations. The use of electrically biased metal conductors at very close spacings, as on the surfaces of ICs, adds an additional factor to the question of part acceptability in humid environments: that of electrolytic corrosion of the metals. An appropriate humidity-life test, therefore, for modern semiconductor devices with only plastic protection for the surface metallization, now must include electrical bias in order to make the test meaningful for this failure mechanism as well as for others due to humidity alone. This testing bias should be at the level of normal operation, in order not to introduce an additional acceleration factor unless such an acceleration can be confirmed independently.

Data regarding life of plastic-encapsulated devices in humidity have been obtained by many sources, but without establishing acceleration factors for either temperature or humidity independently, and without physically based consideration of their possible interaction. Recently, data [28] have come to light regarding changes in surface resistivity of insulators due to the effect of humidity. Such surface resistivity could be expected to be related inversely to electrolytic current flow between biased conductors and, therefore, directly to device time-to-failure due to this mechanism. Comparison of these resistivity data and device life data shows remarkable similarity between surface resistivity measurements and device failures due to electrolysis in terms of their relative dependence on temperature and humidity.

Fig. 25 shows the resulting relative relationship between device life and temperature developed from data from 50°C to 150°C; this relationship is shown to be independent of relative humidity, at least above 10 percent, according to the resistivity measurements. Fig. 26 shows the expected relationship between life and relative humidity, which is independent of temperature at least from 50°C to 150°C. Since the temperature-life and humidity-life accelerations are independent, they can be multiplied to determine the total acceleration of time-to-failure between the expected device environment and a short-term temperature-humidity test condition. This acceleration factor may then be used, with a chosen acceptable limit on field failures, to derive an accelerated temperature-humidity test which may be used to qualify device lots for shipment or to qualify materials and coating procedures.

Fig. 26. Plot of the relative life of semiconductor devices (at constant temperature) in varying relative-humidity percentage (on a log scale), normalized to the common relative-humidity test condition of 85 percent.

The determination of these acceleration factors requires knowing the temperature and relative humidity at the critical device surface on which the process of electrolysis may cause failure. This surface may be expected to be at some temperature higher than the environmental ambient, due to general temperature rise in the equipment, and to specific temperature rise of the device over the equipment ambient. This temperature rise is significant since, if there should be a void between the surface and the plastic, the water vapor pressure in the void will be the same as that in the ambient environment (due to permeability of the plastic) and the relative humidity at the surface will be reduced to a degree related to the temperature rise. Fig. 27 provides the data by which one can translate from a given ambient environment of temperature and relative humidity to the relative humidity at the (warmer) device surface, using a constant absolute humidity as the basis for translation.

For most device and equipment designs and typical operating conditions, it seems reasonable to assume that: 1) the chip face will be the critical surface; and 2) the average chip face temperature will be very close to the average chip junction temperature. If another surface (such as uncoated chip-mounting substrate surface) should be found to be more critical for given circumstances, its temperature rise and resulting relative-humidity conditions are those which should be considered.

When the combined acceleration factor between a proposed temperature-humidity test condition and the application condition has been determined, then either the acceptable

maximum cumulative percentage of failures or the acceptable maximum failure rate in the application can be treated as in Appendices A and B to establish a suitable combination of time and allowable percentage of failures at the test condition, to assure acceptable performance in the system. This treatment is appropriate since accelerated temperature-humidity data can satisfactorily be fitted to a log-normal time-to-failure distribution, which can be treated as in Appendices A and B. Determination of the minimum allowable median life corresponding to the maximum allowable failure rate would require testing at accelerated conditions to determine the expected $\sigma$ of the log-normal distribution. Because of the wide variation in encapsulating techniques and materials, the $\sigma$ may vary significantly between products, and must be determined for each case.

Fig. 27. Psychrometric plot showing the relationship between absolute humidity and temperature for a constant relative humidity. The relative humidity at a given surface can be determined by translating from a given ambient relative humidity and temperature, at constant absolute humidity, to the temperature of the surface of interest. [Abscissa: temperature, °C.]

ACKNOWLEDGMENT

The authors would like to thank for their assistance all of those who have participated in the many tests which have provided data for these results, and particularly the Quality Assurance Center for providing data on overall failure rates, L. E. Miller for participation regarding diodes, and J. Godfrey and F. L. Howland for their expert review.

REFERENCES

[1] Bell Syst. Tech. J., vol. 43, Sept. 1964.
[2] A. J. Wahl and J. J. Kleimack, "Factors affecting reliability of alloy junction transistors," Proc. IRE, vol. 44, pp. 494-502, Apr. 1956.
[3] M. Sobel and J. A. Tischendorf, "Acceptance sampling with new life test objectives," in Proc. 5th Nat. Symp. on Reliability and Quality Control (Philadelphia, Pa., Jan. 12-14, 1959), pp. 108-118.
[4] O. L. Anderson, H. Christensen, and P. Andreatch, "Technique for connecting electrical leads to semiconductors," J. Appl. Phys., vol. 28, p. 923ff, Aug. 1957.
[5] D. S. Peck, "A mesa transistor reliability program," Solid State J., Nov.-Dec. 1960.
[6] G. A. Dodson and B. T. Howard, "High stress aging to failure of semiconductor devices," in Proc. 7th Nat. Symp. on Reliability and Quality Control (Philadelphia, Pa., 1961), pp. 262-272.
[7] J. Partridge, "On the extrapolation of accelerated stress conditions to normal stress conditions of germanium transistors," presented at the Physics of Failures in Electronics Symp., Illinois Inst. Technol. Res. Inst., Chicago, Ill., Sept. 26, 1963.
[8] G. L. Schnable and R. S. Keen, "On failure mechanisms in large-scale integrated circuits," in Advances in Electronics and Electron Physics, vol. 30. New York: Academic Press, 1971, pp. 79-138.
[9] D. S. Peck, "Semiconductor reliability predictions from life distribution data," in Semiconductor Reliability, Schwop and Sullivan, Eds. New York: Reinhold, 1961, pp. 5:-67.
[10] J. Partridge and L. D. Hanley, "The impact of the flight specifications on semiconductor failure rates," in Proc. 6th Annu. Reliability Physics Symp. (Los Angeles, Calif., Nov. 6-8, 1967), pp. 20-30 (IEEE Cat. no. 7-15C58).
[11] D. S. Peck, "Transistor failure studies at accelerated stress levels," in Proc. 5th Annu. Conf. on Basic Failure Mechanisms and Reliability in Electronics (Metropolitan New York Section of IEEE, Newark College of Engineering, June 1964).
[12] R. S. Keen, L. R. Loewenstern, and G. L. Schnable, "Mechanisms of contact failure in semiconductor devices," in Proc. 6th Annu. Reliability Physics Symp. (Los Angeles, Calif., Nov. 6-8, 1967), pp. 216-233 (IEEE Cat. no. 7-15C58).
[13] C. Weaver and L. Brown, "Diffusion in evaporated films of gold-aluminum," Phil. Mag., vol. 7, pp. 1-16, 1962.
[14] G. C. Sikora and L. E. Miller, "Application of power step-stress technique to transistor life predictions," in Proc. Physics of Failure in Electronics Symp., 1965 (published by IIT Res. Inst. and Rome Air Develop. Cent.), pp. 30-42.
[15] D. S. Peck, "Semiconductor device life and system removal rates," in Proc. 1968 Symp. on Reliability (Boston, Mass., Jan. 1968), pp. 593-601 (IEEE Cat. no. 68C-33-R).
[16] C. H. Zierdt, Jr., "Procurement specification techniques for high reliability transistors," in Proc. 1967 Symp. on Reliability (Washington, D.C., Jan. 10-12, 1967), pp. 388-407 (IEEE Cat. no. 7C50).
[17] D. S. Peck and M. C. Wooley, "Component design, construction and evaluation for satellites," Bell Syst. Tech. J., vol. 42, pp. 1665-1686, July 1963.
[18] D. S. Peck, R. R. Blair, W. L. Brown, and F. M. Smits, "Surface effects of radiation on transistors," Bell Syst. Tech. J., vol. 42, pp. 95-130, Jan. 1963.
[19] D. S. Peck and E. R. Schmid, "Effects of radiation on transistors in the first Telstar satellite," Nature, vol. 199, pp. 741-744, Aug. 24, 1963.
[20] M. P. Lepselter, "Beam-lead sealed-junction technology," Bell Lab. Rec., pp. 299-303, Oct.-Nov. 1966.
[21] G. H. Schneer, W. van Gelder, V. E. Hauser, and P. F. Schmidt, "A metal-insulator-silicon junction seal," IEEE Trans. Electron Devices, vol. ED-15, pp. 290-293, May 1968.
[22] M. P. Lepselter, "Beam lead technology," Bell Syst. Tech. J., vol. 45, pp. 233-253, Feb. 1966.
[23] R. V. Penney, "Current-induced mass transport in aluminum," J. Phys. Chem. Solids, vol. 25, pp. 335-342, 1964.
[24] H. B. Huntington and A. R. Grove, "Current-induced marker motion in gold wire," J. Phys. Chem. Solids, vol. 20, pp. 76-87, 1961.
[25] J. E. Clark, "Wobble table for thermo-compression bonding beam-lead silicon integrated circuits," in Proc. 1968 Int. Electronic Circuit Packaging Symp. (Los Angeles, Calif., Aug. 19-20, 1968).
[26] M. P. Eleftherion, "Assembling beam-lead sealed-junction integrated-circuit packages," Western Electric Engineer, vol. XI, no. 4, Dec. 1967.
[27] M. L. White, "Encapsulation of integrated circuits," Proc. IEEE, vol. 57, pp. 1610-1615, Sept. 1969.
[28] D. S. Peck and C. H. Zierdt, Jr., "Temperature-humidity acceleration of metal-electrolysis failure in semiconductor devices," in Proc. 1973 Reliability Physics Symp. (Las Vegas, Nev., Apr. 2-5, 1973) (IEEE Cat. no. 73 CHO 755-9 PHY).
[29] D. S. Peck, "Reliability of beam-lead sealed-junction devices," in Proc. 1969 Annu. Reliability Symp. (Chicago, Ill., Jan. 1969), pp. 191-201 (IEEE Cat. no. 69CS-R).
[30] F. H. Reynolds, "The response of the threshold voltage of the transistors in simple MOS circuits to tests at elevated temperatures," in Proc. 1971 Reliability Physics Symp. (Mar. 31-Apr. 2, 1971), pp. 46-56 (IEEE Cat. no. 71-C-9-PHY).
[31] E. E. Lampi and E. F. Labuda, "A reliability study of insulated gate field transistors with an Al₂O₃ gate structure," in Proc. 1972 Reliability Physics Symp. (Apr. 5-7, 1972), pp. 112-119 (IEEE Cat. no. 72 CHO 628-S-PHY).
[32] C. M. Ryerson, "Mathematical modeling for predicting failure rates of component parts," in Proc. 6th Annu. Reliability Physics Symp. (Los Angeles, Calif., Nov. 6-8, 1967), pp. 10-15 (IEEE Cat. no. 7-15C58).
[33] J. S. Smith and J. Vaccaro, "Failure mechanisms and device reliability," in Proc. 6th Annu. Reliability Physics Symp. (Los Angeles, Calif., Nov. 6-8, 1967), pp. 1-9 (IEEE Cat. no. 7-15C58).
[34] D. S. Peck, "The uses of semiconductor life distributions," in Semiconductor Reliability, vol. 2, W. H. Von Alven, Ed. New York: Reinhold, 1962, pp. 10-28.
[35] L. R. Goldthwaite, "Failure rate study for the log-normal lifetime model," in Proc. 7th Nat. Symp. on Reliability and Quality Control, pp. 208-213, Jan. 1961.
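The translation described in Appendix C, from an ambient temperature and relative humidity to the relative humidity at a warmer device surface with the water vapor pressure held constant, can be approximated numerically instead of being read from Fig. 27. The sketch below is an illustration under stated assumptions: it uses the Magnus approximation for the saturation vapor pressure of water, which is not taken from the paper and loses accuracy well above ordinary ambient temperatures. The function names and the example temperatures and factors are hypothetical.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation for saturation vapor pressure over water (hPa).

    ASSUMPTION: this formula is not from the paper; it stands in for the
    psychrometric chart of Fig. 27.
    """
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def surface_relative_humidity(ambient_c, ambient_rh_percent, surface_c):
    """Relative humidity at a surface when vapor pressure equals the ambient's."""
    vapor_pressure = ambient_rh_percent / 100.0 * saturation_vapor_pressure_hpa(ambient_c)
    return 100.0 * vapor_pressure / saturation_vapor_pressure_hpa(surface_c)


def total_acceleration(temperature_factor, humidity_factor):
    """Independent temperature and humidity accelerations multiply."""
    return temperature_factor * humidity_factor


# Hypothetical case: 25 C ambient at 85 percent RH, chip face 10 C warmer.
rh_at_chip = surface_relative_humidity(25.0, 85.0, 35.0)
print(f"relative humidity at the chip face: {rh_at_chip:.0f} %")

# Hypothetical factors, as would be read from Figs. 25 and 26.
print(f"combined acceleration: {total_acceleration(30.0, 4.0):.0f}x")
```

A modest temperature rise at the chip face lowers the local relative humidity substantially, which is why the text stresses using the surface condition rather than the ambient one when the acceleration factor is determined.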
