Abstract: This paper presents a low hardware overhead test pattern generator (TPG) for scan-based built-in self-test (BIST) that can reduce switching activity in circuits under test (CUTs) during BIST and also achieve very high fault coverage with reasonable lengths of test sequences. The proposed BIST TPG decreases transitions that occur at scan inputs during scan shift operations and hence reduces switching activity in the CUT. The proposed BIST is comprised of two TPGs: LT-RTPG and 3-weight WRBIST. Test patterns generated by the LT-RTPG detect easy-to-detect faults and test patterns generated by the 3-weight WRBIST detect faults that remain undetected after LT-RTPG patterns are applied. The proposed BIST TPG does not require modification of mission logics, which can lead to performance degradation. Experimental results for ISCAS’89 benchmark circuits demonstrate that the proposed BIST can significantly reduce switching activity during BIST while achieving 100% fault coverage for all ISCAS’89 benchmark circuits. Larger reduction in switching activity is achieved in large circuits. Experimental results also show that the proposed BIST can be implemented with low area overhead.

Index Terms: Built-in self-test (BIST), heat dissipation during test application, low power testing, power dissipation during test application, random pattern testing.

Manuscript received May 24, 2006; revised December 22, 2006.
The author is with NEC Laboratories, America, Princeton, NJ 08540 USA.
Digital Object Identifier 10.1109/TVLSI.2007.899234

I. INTRODUCTION

Since in built-in self-test (BIST), test patterns are generated and applied to the circuit-under-test (CUT) by on-chip hardware, minimizing hardware overhead is a major concern of BIST implementation. Unlike stored pattern BIST, which requires high hardware overhead due to memory devices required to store precomputed test patterns, pseudorandom BIST, where test patterns are generated by pseudorandom pattern generators such as linear feedback shift registers (LFSRs) and cellular automata (CA), requires very little hardware overhead. However, achieving high fault coverage for CUTs that contain many random pattern resistant faults (RPRFs) only with (pseudo) random patterns generated by an LFSR or CA often requires unacceptably long test sequences thereby resulting in prohibitively long test time. The random pattern test length required to achieve high fault coverage is often determined by only a few RPRFs [1].

Several techniques have been proposed to address this problem. Reseedable and/or reconfigurable LFSRs are proposed in [2]–[4]. In [5] and [6], random patterns that do not detect any new faults are mapped into deterministic tests for RPRFs. In test point insertion (TPI) techniques [7], [8], control and observation points are inserted at selected gates to improve detection probabilities of RPRFs. In weighted random pattern testing [1], [9]–[11], the outputs of test pattern generator (TPG) are biased to generate test sequences that have nonuniform signal probabilities to increase detection probabilities of RPRFs that escape pseudorandom test sequences, which have a uniform signal probability of 0.5. Random pattern generators proposed in [12] and [13] use Markov sources to exploit spatial correlation between state inputs that are consecutively located in the scan chain. A 3-weight weighted random BIST (3-weight WRBIST) can be classified as an extreme case of conventional weighted random pattern testing BIST. However, in contrast to conventional weighted random pattern testing BIST where various weights, e.g., 0, 0.25, 0.5, 0.75, 1.0, can be assigned to outputs of TPGs, in 3-weight WRBIST, only three weights, 0, 0.5, and 1, are assigned. Since only three weights are used, circuitry to generate weights is simple; weight 1 (0) is obtained by fixing a signal to a 1 (0) and weight 0.5 by driving a signal by an output of a pseudorandom pattern generator, such as an LFSR. Weight sets are calculated from test cubes for RPRFs.

Though the attainment of high fault coverage with practical lengths of test sequences is still one major concern of BIST techniques, reducing switching activity has become another important objective. It has been observed that switching activity during test application is often significantly higher than that during normal operation [14]. The correlation between consecutive random patterns generated by an LFSR is low; this is a well-known property of LFSR generated patterns [1], [15]. On the other hand, significant correlation exists between consecutive patterns during the normal operation of a circuit. Hence, switching activity in a circuit can be significantly higher during BIST than that during its normal operation. Finite-state machines are often implemented in such a manner that vectors representing successive states are highly correlated to reduce power dissipation [16]. However, use of design-for-testability (DFT) techniques such as scan significantly decreases the correlation between consecutive state vectors. Use of scan allows to apply patterns that cannot appear during normal operation to the state inputs of the CUT during test application. Furthermore, the values applied at the state inputs of the CUT during scan shift operations represent shifted values of test vectors and circuit responses and have no particular temporal correlation. Excessive switching activity due to low correlation between consecutive test patterns can cause several problems [14], [17]–[19].

Since heat dissipation in a CMOS circuit is proportional to switching activity, a CUT can be permanently damaged due to excessive heat dissipation if switching activity in the circuit during test application is much higher than that during its normal operation. Heat dissipated during test application is already influencing the design of test methodologies for practical circuits [14], [19], [20].
Metal migration (electromigration) causes erosion of conductors and subsequent failure of circuits [21]. Since temperature and current density are major factors that determine electromigration rate, elevated temperature and current density caused by excessive switching activity during test application will severely decrease reliability of CUTs. This is even more severe in circuits equipped with BIST since such circuits are tested frequently.

To test a bare die, power must be supplied during the period of test through probes, which typically have higher inductances than power and ground pins of the circuit package. Hence, the bare die under test will experience higher power/ground noise, which is given by $L(di/dt)$, where $L$ is the inductance of power and ground line and $di/dt$ is the rate of change of current flowing in power and ground lines. Excessive power/ground noise can erroneously change the logic state of circuit lines causing some good dies to fail the test, leading to unnecessary loss of yield.

In this paper, we propose a low hardware overhead scan-based BIST technique that can achieve very high fault coverage without the risk of damaging CUTs due to excessive switching activity during BIST. Recently, techniques to reduce switching activity during BIST have been proposed in [17] and [22]–[25]. A straightforward solution is to reduce the speed of the test clock during scan shift operations. However, since most test application time of scan-based BIST is spent for scan shift operations (typically $n/(n+1)$, where $n$ is the number of scan flip-flops in the longest scan chain), this will increase test application time by about a factor of $k$ if scan flip-flops are clocked at $1/k$ speed during scan shift operations. Furthermore, reducing the clock speed does not solve high power/ground noise that is caused by a large number of simultaneous transitions in the circuit. A technique that uses enhanced flip-flops to isolate mission logics of the CUT from scan chains is proposed [17]. Major disadvantages of this technique are performance degradation and area overhead entailed by adding extra logics to isolate mission logics from scan chains. Techniques to schedule tests under power constraints are proposed in [14], [19], and [20]. Test scheduling techniques can reduce overall chip power dissipation. However, these techniques cannot solve the hot spot problem that is caused by temperature being excessively elevated at a small area of the chip.

Though techniques to reduce switching activity during scan BIST have been extensively studied, very few papers simultaneously address both excessive switching activity and fault coverage. A BIST TPG that can achieve high fault coverage and also reduce switching activity during BIST is proposed for single scan chain designs in [26], which augments the LT-RTPG [23] with the serial fixing 3-weight WRBIST [27]. It is shown that the 3-weight WRBIST can achieve very high fault coverage with low hardware overhead [27]. The LT-RTPG proposed in [23] generates correlated test patterns that can reduce transitions at state inputs during scan shift operations. The serial fixing 3-weight WRBIST can also generate test patterns that cause less switching activity during BIST. This paper is a significant extension of [26]; in particular, a technique to optimize TPGs for multiple scan chain designs is proposed.

The rest of this paper is organized as follows. Section II briefly introduces the serial fixing 3-weight WRBIST [27]. The techniques that are used in this paper to reduce switching activity during BIST are illustrated in Section III. The architecture of the proposed TPG and the outline of algorithm used to design the proposed BIST TPG are described in Section IV. A technique to minimize hardware overhead for implementing the proposed BIST TPG for circuits with multiple scan chains is presented in Section V. Section VI reports experimental results. Finally, Section VII presents the conclusions.

II. 3-WEIGHT WRBIST

A. Generator

In this paper, we assume that the sequential CUT has primary and state inputs, and employs full-scan. Even though the proposed BIST TPG is applicable to scan designs with multiple scan chains, we assume that all primary and state inputs are driven by a single scan chain unless stated otherwise (application to multiple scan chains is discussed separately in Section V) only for clarity and convenience of illustration. A test cube is a test pattern that has unspecified inputs. The detection probability of a fault is defined as the probability that a randomly generated test pattern detects the fault [1]. In the 3-weight WRBIST scheme, fault coverage for a random pattern resistant circuit is enhanced by improving detection probabilities of RPRFs; the detection probability of an RPRF is improved by fixing some inputs of the CUT to the values specified in a deterministic test cube for the RPRF. A generator or weight set is a vector that represents weights that are assigned to inputs of the circuit during 3-weight WRBIST. Inputs that are assigned weight 1 (0) are fixed to 1 (0) and inputs that are assigned weight 0.5 are driven by outputs of the pseudorandom pattern generator, such as an LFSR and a CA. A generator is calculated from a set of deterministic test cubes for RPRFs.

Consider a sequential circuit that has $m$ primary and state inputs. $C = \{c^1, c^2, \ldots, c^d\}$ denotes a set of test cubes for RPRFs in the CUT, where $c^j = \{c^j_1, c^j_2, \ldots, c^j_m\}$ is an $m$-bit test cube, where $c^j_k \in \{0, 1, X\}$, where $X$ is a don’t care. Fig. 1 shows test cube set $C$, which consists of four test cubes, $c^1$, $c^2$, $c^3$, and $c^4$. Generator $gen(C)$ for the circuit with $m$ inputs is denoted as an $m$-bit tuple, where $gen(C) = \{g_1, g_2, \ldots, g_m\}$ and $g_k \in \{0, 1, X, U\}$.

If input $p_k$ is assigned only a 1 (0) or an $X$ in every test cube and assigned a 1 (0) at least in one test cube in $C$, then input $p_k$ is assigned a 1 (0) in the generator, i.e., $g_k = 1$ (0). If input $p_k$ is assigned a 1 in a test cube $c^a$, i.e., $c^a_k = 1$, and assigned a 0 in another test cube $c^b$, i.e., $c^b_k = 0$, then input $p_k$ is assigned a $U$ in the generator, i.e., $g_k = U$. Otherwise ($p_k$ is assigned an $X$ in every test cube), input $p_k$ is assigned an $X$ in the generator, i.e., $g_k = X$. In summary, $g_k$ is defined as follows:

$$
g_k =
\begin{cases}
1 & \text{if } c^j_k = 1 \text{ or } X \text{ in } \forall c^j \in C \text{ and at least one } c^j_k = 1 \\
0 & \text{if } c^j_k = 0 \text{ or } X \text{ in } \forall c^j \in C \text{ and at least one } c^j_k = 0 \\
U & \text{if } c^a_k = 1 \text{ and } c^b_k = 0, \text{ where } c^a, c^b \in C \\
X & \text{otherwise}
\end{cases}
\tag{1}
$$

Inputs that are assigned $U$’s in a generator are called conflicting inputs of the generator.
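The generator rules summarized in (1) can be illustrated with a short sketch. The function names, the string encoding of test cubes over `0`, `1`, `X`, and the example cubes are assumptions made for illustration only. In particular, the sketch treats both `U` and `X` positions as weight-0.5 inputs when drawing a pattern, and Python's `random` module merely stands in for the LFSR or CA of the actual hardware.

```python
import random
from typing import List


def compute_generator(cubes: List[str]) -> str:
    """Derive the generator (weight set) of a test cube set.

    Each cube is a string over '0', '1', 'X' (X = don't care).
    The result is a string over '0', '1', 'X', 'U' (U = conflicting input).
    """
    if not cubes:
        raise ValueError("test cube set must not be empty")
    width = len(cubes[0])
    for cube in cubes:
        if len(cube) != width or set(cube) - set("01X"):
            raise ValueError(f"malformed test cube: {cube!r}")

    generator = []
    for k in range(width):
        # Care values (0 or 1) that appear at input k across all cubes.
        care_values = {cube[k] for cube in cubes} - {"X"}
        if care_values == {"0", "1"}:
            generator.append("U")          # some cube needs 1, another needs 0
        elif care_values:
            generator.append(next(iter(care_values)))  # only 1s (or only 0s) and Xs
        else:
            generator.append("X")          # never specified
    return "".join(generator)


def sample_pattern(generator: str, rng: random.Random) -> str:
    """Draw one pattern: fixed inputs keep their value, the rest are random.

    Assumption of this sketch: U and X inputs are both driven with weight 0.5.
    """
    return "".join(g if g in "01" else rng.choice("01") for g in generator)


# Hypothetical test cube set with four inputs.
cubes = ["1X0X", "10XX", "1X1X"]
gen = compute_generator(cubes)
print(gen)             # 10UX
print(gen.count("U"))  # 1 conflicting input
print(sample_pattern(gen, random.Random(0)))
```

The count of `U` positions is the quantity that the test cube set construction in the following subsections tries to keep below a threshold.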
WANG: BIST TPG FOR LOW POWER DISSIPATION AND HIGH FAULT COVERAGE 779
Fig. 1. Example test cube sets. (a) Testcube set C . (b) Testcube subsets C ;C .
cube set. Test cubes are added into the current test cube set until placing any more test cube into it makes the number of conflicting inputs, i.e., U’s, in the generator greater than a predefined threshold. Whenever a test cube is placed into the current test cube set, its generator is updated according to (1). Upon the completion of generating a test cube set, a new current test cube set is created and the test cubes generated later are placed into the new test cube set.

Since each generator requires a different sequence of control bits at the output of the decoding logic, hardware overhead for the decoding logic is determined also by the number of generators. A special automatic test pattern generation (ATPG) is used to generate deterministic test cubes for RPRFs that are suitable to minimize the number of generators. In order to minimize the number of generators (placing more test cubes into each test cube set will result in smaller number of generators), the proposed ATPG generates each test cube taking all test cubes existing in the current test cube set into consideration. At the heart of the ATPG technique are three cost functions: controllability, observability, and test generation cost function, which are obtained by modifying the traditional SCOAP-like [29] cost functions. The test generation process of the ATPG, which is based on PODEM [30], is guided by the cost functions to generate suitable test cubes.

The controllability cost of assigning a value $v$ to an input is defined by considering the generator of the current test cube set as follows:

$$
\text{controllability cost} =
\begin{cases}
0 & \text{if the generator assigns } U \text{ to the input} \\
0 & \text{if the generator assigns } v \text{ to the input} \\
\text{high cost} & \text{if the generator assigns } \bar{v} \text{ to the input} \\
1 & \text{if the generator assigns } X \text{ to the input}
\end{cases}
\tag{2}
$$

where $v$ is a binary value 0 or 1.

The purpose of the controllability cost is to estimate the number of conflicting inputs that would be caused by adding into the current test cube set a test cube where the input is assigned a binary value $v$. If the generator assigns U to the input, the current test cube set already contains test cubes that conflict at the input (at least one test cube in the current test cube set is assigned a 1 at the input and another test cube is assigned a 0 at the input). Hence, assigning any binary value to the input does not cause any more adverse effect, and the cost is 0. When the generator assigns 1 (0) to the input, adding a test cube whose input is assigned a 1 (0) causes no conflict with any test cube in the current test cube set. Hence, the cost is again 0. If the generator assigns 1 (0) to the input, all test cubes existing in the current test cube set are assigned only 1 (0) or X at the input. Hence, adding a test cube whose input is assigned the opposite value 0 (1) causes a conflict at the input with other test cubes in the current test cube set. Hence, a high cost is given. Finally, if the generator assigns X to the input (the input is not specified in any test cube in the current test cube set), adding a test cube that is assigned $v$ at the input may increase one minterm in the on-set for the function of the decoding logic. Hence, a small cost 1 is given.

The controllability costs for internal circuit lines in the circuit, which are computed in a similar manner to the testability measures used in [29], are given by

$$
\begin{cases}
\ldots & \text{if } \ldots \\
\ldots & \text{otherwise}
\end{cases}
\tag{3}
$$

where the symbols denote, respectively, the inputs and the output of a gate with a given controlling value and inversion. The controlling value of a gate is the binary value that, when applied to any input of a gate, determines the output value of that gate independent of the values applied at the other inputs of the gate. The controllability cost functions guide the ATPG to select the backtrace paths that require the minimum cost (number of conflicting inputs), whenever there is a choice of several paths to backtrace from the target line to the inputs.

The observability cost functions are recursively computed from primary outputs to primary inputs. The observability cost of a line $l$ is given by

$$
\begin{cases}
\ldots & \text{if } l \text{ is a stem with branches } \ldots \\
\ldots & \text{otherwise}
\end{cases}
\tag{4}
$$

where, in the latter case, the cost is expressed through the output of the gate that has $l$ as an input and through all inputs of that gate other than $l$. The observability cost functions are used to guide the objective selection.

The proposed ATPG will now be described for the stuck-at fault model. In order to generate a test cube to detect a stuck-at-$v$ fault at line $l$ ($l$ s-a-$v$), first the fault should be activated by setting line $l$ to $\bar{v}$. The cost to activate $l$ s-a-$v$ is the controllability cost of setting $l$ to $\bar{v}$. Then, the activated fault effect should be propagated to one or more outputs. The cost to propagate the activated fault effect at line $l$ is the observability cost of $l$. Hence, the test generation cost to generate a test cube for $l$ s-a-$v$ is defined as the sum of two cost functions

$$
\text{test generation cost of } l \text{ s-a-}v = (\text{controllability cost of setting } l \text{ to } \bar{v}) + (\text{observability cost of } l)
\tag{5}
$$

The test generation cost is used to select a best target fault from the fault list.

Since test cubes generated by the proposed ATPG are often over-specified, a few bits that are assigned binary values by the proposed ATPG can be relaxed to don’t cares while ensuring the detection of targeted faults. Test cubes with fewer specified inputs have fewer conflicting inputs with test cubes already in the current test cube set so that more test cubes can be placed in the current test cube set. Whenever a test cube is generated, inputs that are assigned binary values are ordered according to the cost incurred by assigning each input to its binary value. The binary value assigned to each of these inputs is flipped in this order. If all the targeted faults can still be detected even after an input is flipped to its opposite value, the value assigned to the input is relaxed to a don’t care.

If a circuit has any reconvergent fanout, an input assignment required to satisfy some objectives may conflict with that required to satisfy other objectives, causing the proposed ATPG to select an objective or backtrace path with a high cost. Hence, in circuits with reconvergent fanouts, the actual cost of the test cube generated for a fault may be much higher than the cost of the fault given by the estimate test generation cost function shown in (5). To prevent adding such test cubes to the current test cube set, if the actual cost of a generated test cube is higher by a certain number (say, 100) than the estimate test generation cost of the fault, the generated test cube is discarded. Test generation is then carried out for alternative target faults until a test cube is found for a fault whose actual cost is close to the estimate test generation cost of the fault or all faults in the fault
782 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 7, JULY 2007
list are tried. Even in the worst case where all faults in the fault
list need to be tried, generating test cubes is required only for a
few faults. The estimate test generation cost of a fault is always
an optimistic approximate of the actual cost for the fault in the
sense that the actual cost of any test cube for the fault cannot be
less than the estimate test generation cost. Hence, if the estimate
test generation cost of a fault is greater than the actual cost of
the test cube that has the minimum actual cost among the test
cubes that have been generated but discarded due to high actual
cost, then the actual cost of any test cube for the fault cannot
be smaller than the current minimum actual cost. Hence, we do
not need to generate a test cube for the fault. If test cubes for
all faults in the fault list have very high actual cost, then the test
cube that has the minimum actual cost is chosen to be a new
member of the current test cube set.

Fig. 5. Transitions at scan chain input.
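The discard-and-retry procedure described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: `estimate_cost` and `generate_cube` are hypothetical callables standing in for the estimate test generation cost and the ATPG run, faults are tried in increasing order of estimated cost, and a cube is discarded when its actual cost exceeds the estimate by at least `slack` (the text only says "by a certain number (say, 100)").

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple


def select_test_cube(
    faults: Iterable[Hashable],
    estimate_cost: Callable[[Hashable], int],
    generate_cube: Callable[[Hashable], Optional[Tuple[str, int]]],
    slack: int = 100,
) -> Optional[Tuple[Hashable, str]]:
    """Pick the next (fault, cube) pair for the current test cube set.

    generate_cube(fault) returns (cube, actual_cost), or None if no cube exists.
    """
    best_discarded = None  # (actual_cost, fault, cube) with the lowest actual cost

    for fault in sorted(faults, key=estimate_cost):
        estimate = estimate_cost(fault)
        # The estimate is a lower bound on the actual cost, so this fault
        # cannot beat the cheapest cube that was already discarded.
        if best_discarded is not None and estimate > best_discarded[0]:
            continue
        result = generate_cube(fault)
        if result is None:
            continue
        cube, actual = result
        if actual - estimate < slack:
            return fault, cube  # actual cost is close to the estimate: accept
        if best_discarded is None or actual < best_discarded[0]:
            best_discarded = (actual, fault, cube)

    # Every cube was too expensive: fall back to the cheapest discarded one.
    if best_discarded is None:
        return None
    return best_discarded[1], best_discarded[2]


# Hypothetical fault list: estimates and (cube, actual cost) pairs are made up.
estimates = {"a": 1, "b": 2, "c": 500}
cubes = {"a": ("1XX0", 300), "b": ("X10X", 250), "c": ("0XX1", 510)}
attempted = []


def run_atpg(fault):
    attempted.append(fault)
    return cubes[fault]


print(select_test_cube(estimates, estimates.get, run_atpg))  # ('b', 'X10X')
print(attempted)                                             # ['a', 'b']
```

In the example, fault `c` is never handed to the ATPG because its estimated cost already exceeds the actual cost of the cheapest discarded cube.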
assignments. This implies that RPRFs that escape LT-RTPG test sequences can be effectively detected by fixing selected inputs to binary values specified in deterministic test cubes for these RPRFs and applying random patterns to the rest of inputs. This technique is used in the 3-weight WRBIST to achieve high fault coverage for random pattern resistant circuits. In this paper we demonstrate that augmenting the LT-RTPG with the serial fixing 3-weight WRBIST proposed in [27] can attain high fault coverage without excessive switching activity or large area overhead even for circuits that have large numbers of RPRFs.

B. Property of 3-Weight WRBIST to Reduce Switching Activity

If a large set of scan inputs that are consecutively located in the scan chain are assigned identical values (X is identical to any binary value 0 or 1) in a generator, then the flip-flops and of 3-Weight WRBIST (see Fig. 3) stay at the same state for many scan shift cycles. While holds a 1, the output of the OR gate in the fixing logic is set to a 1 and 1’s are continuously scanned into the scan chain and no transitions occur at the input of the scan chain. Likewise, while holds a 1, random pattern values generated by the LFSR are blocked at the AND gate and no transition occurs at the input of scan chain provided that the other flip-flop does not toggle. Hence, in order to significantly reduce the number of transitions at the input of scan chain, either or should be assigned a 1 and stays at the 1 for long periods of scan shift cycles.

Typically, the majority of scan inputs are assigned X’s (don’t cares) in every generator. Since all the faults that are targeted by a generator can be detected independent of the scan values applied to the scan inputs that are assigned X’s in the generator, the values of and for those scan inputs are assigned such that the number of minterms in the on-sets of the functions for and is minimized (to minimize hardware overhead for the decoding logic) and either or stays at 1 for long periods of scan shift cycles (to minimize the number of transitions at the input of the scan chain). In order to minimize the number of minterms in the on-sets of functions for and and for the scan inputs that are assigned X’s in a generator should be assigned 0. Note that does not toggle (holds its previous state) when the state of is 0.

Assume that test patterns are currently generated by generator . Also, assume that scan input is assigned an X while its predecessor is assigned a care bit (0, 1, or U) in . Let be the first scan input in the scan chain that is assigned a care bit in after , i.e., . In other words, consecutive scan inputs , which are located between and , are assigned X’s in . and , respectively, denote the states of and in a scan shift cycle when a value for scan input is scanned into the scan chain. In this paper, the decoding logic is designed such that the states of and for the scan inputs are always the same as those of and for input , i.e., and values for the inputs are always 0 to minimize the number of minterms in the on-sets of the function for the decoding logic. and values for input are determined by considering values assigned at and in generator . If either or is 1, i.e., is assigned a binary value 0 or 1 in , then both and for are assigned 0 to minimize the number of minterms in the on-sets of the decoding logic function. Note that this minimizes also the number of transitions since when either or is set to 1 for scan shift cycles for inputs , no transitions occur during the scan shift cycles for these scan inputs. On the other hand, if both and are 0, i.e., , then we check , i.e., the value assigned at in , to determine the values of and for scan input (recall that both and values for scan inputs are always assigned 0). If , then we assign for input to 1 to make and ( and ). This adds one minterm in the on-set of for all the scan inputs, . In this case, a transition can occur at the input of the scan chain only in the scan shift cycles when a value for scan input is scanned in among all scan shift cycles for inputs . As the last case, if both and , i.e., both inputs that flank the consecutive scan inputs are assigned in , then we determine the values of and for scan input based on the number of scan inputs between and , i.e., . If , where is a predefined natural number, then we arbitrarily select either or and assign it to a 1 for input to suppress transitions at the input of scan chains. Otherwise, we assign both and for scan input to 0 to minimize the number of minterms in the on-sets of functions for and . If is large, then transitions can occur at the input of the scan chain in many scan shift cycles. Hence adding one more minterm in the on-set of functions for the decoding logic to suppress large number of transitions is worthy.

For example, consider the set of generators shown in Fig. 4(a). Consecutive scan inputs and are assigned X’s and input , which precedes in the scan chain, is assigned a in . Since is assigned a and hence both , we check input , which is the first scan input that is assigned a care bit (0, 1, or U) in after the consecutive scan inputs , and , which are assigned X’s in . Since is assigned a 0 in for is assigned a 1 to toggle the state of in the scan shift cycles for input . On the other hand, in generator , consecutive inputs , and are assigned X’s and inputs and , which flank the inputs , and , are assigned both . Hence, the and values for the scan input are determined by considering the predefined number and the number of consecutive scan inputs that are assigned X’s between and , i.e., 3. If is used, then for input is assigned a 1 to toggle the state of to 1 to suppress transitions. If is used, then both and for input are assigned 0’s to minimize the number of minterms in the on-sets of functions for the decoding logic.

IV. PROPOSED TEST PATTERN GENERATOR

The proposed BIST is comprised of two TPGs: an LT-RTPG [23] and a 3-weight WRBIST [27] (see Fig. 7). The multiplexer, which drives the input of scan chain, selects a test pattern source between the LT-RTPG and the 3-weight WRBIST. In the first test session, test patterns generated by the LT-RTPG are selected
split the longer scan chain into two shorter scan chains keeping
the order of scan flip-flops in the original scan chain. The values
required at , and , where 1 and 2, for the two scan
chain version of the circuit are shown in Fig. 9(b) (assume that
the two pairs of flip-flops / and
are initialized both to 1 and 0, respectively, before a scan shift
operation starts). The function of the decoding logic for the
two scan chain version also has four minterms on its on-set.
(However, the two scan chain version will require one more
fixing logic, which can however be implemented with very little
hardware.) As this example shows, the number of minterms in
the on-sets of functions for the decoding logic does not change
substantially unless there are drastic changes in the order of
scan flip-flops, even if long scan chains are split into shorter
scan chains.
In the previous paragraph, we assume that the decoding logic is designed such that a separate pair of outputs and are assigned for each scan chain , where . However, it is not necessary to assign a separate pair of outputs to each scan chain. In the following, we present a method, which is based on compatibility analysis, to reduce the number of outputs of the decoding logic for circuits with multiple scan chains to reduce hardware overhead.
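The compatibility-based merging developed in this section can be sketched in a few lines. The sketch below is a simple greedy grouping, not the clique formulation used in the paper, and the data layout is an assumption: `chains[h][i]` is a string over `0`, `1`, `X`, `U` holding the values assigned to scan chain `h` in generator `i`. Two values are treated as identical when they are equal or when either is a don't care `X`, and a merged position takes the care value whenever one is present.

```python
from typing import List, Optional, Tuple


def merge_values(a: str, b: str) -> Optional[str]:
    """Merge two generator values; X is identical to any value."""
    if a == "X":
        return b
    if b == "X" or a == b:
        return a
    return None  # two different care values: not compatible


def merge_chains(x: List[str], y: List[str]) -> Optional[List[str]]:
    """Return the merged generators of two scan chains, or None if incompatible."""
    if len(x) != len(y):
        raise ValueError("both chains need one entry per generator")
    merged = []
    for gen_x, gen_y in zip(x, y):
        if len(gen_x) != len(gen_y):
            return None  # sketch assumption: only equal-length chains are merged
        row = []
        for a, b in zip(gen_x, gen_y):
            value = merge_values(a, b)
            if value is None:
                return None
            row.append(value)
        merged.append("".join(row))
    return merged


def group_compatible_chains(chains: List[List[str]]) -> List[Tuple[List[int], List[str]]]:
    """Greedily place each chain into the first group it is compatible with."""
    groups = []  # each entry: [member indices, merged generators]
    for index, chain in enumerate(chains):
        for group in groups:
            merged = merge_chains(group[1], chain)
            if merged is not None:
                group[0].append(index)
                group[1] = merged
                break
        else:
            groups.append([[index], list(chain)])
    return [(members, merged) for members, merged in groups]


# Hypothetical example: four scan chains, two generators, two scan inputs each.
chains = [["1X", "X0"], ["1U", "00"], ["0X", "XX"], ["XX", "X1"]]
for members, merged in group_compatible_chains(chains):
    print(members, merged)
# [0, 1] ['1U', '00']
# [2, 3] ['0X', 'X1']
```

Each resulting group would share one pair of decoding logic outputs. Because the comparison is always made against the merged generators of a group, every chain in a group is pairwise compatible with the others, but the greedy order does not guarantee the minimum number of groups.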
If the value assigned at every scan input of scan chain , where , is identical to the value assigned at the corresponding scan input of another scan chain in every generator , where , where is the number of generators, then scan chain is said to be compatible with scan chain (don’t care X is identical to any value 0, 1, and U). Otherwise, scan chains and are not compatible. For example, in Fig. 10(a), which shows a set of generators computed for a circuit with four scan chains , and , the value assigned at every scan input in scan chain , where , is identical to the value assigned at in every generator and . Scan inputs and , where , are also assigned identical values in every generator. Hence, scan chains and and also and are compatible pairwise. In Fig. 10(b), the nodes represent the scan chains and the arcs depict compatibility relationships between scan chains (if scan chain and scan chain are compatible with each other, then the node for scan chain and the node for scan chain are connected by an arc). denotes the value assigned at th scan input of scan chain in generator . denotes that is identical to while denotes that is not identical to . Scan chains and are not compatible because while input of scan chain is assigned a 0 in , i.e., of is assigned a 1 in the same generator, i.e., . and are assigned nonidentical values at other inputs too; , and .

If scan chain is compatible with scan chain , then the fixing logics for scan chains and can share a common decoding logic output pair. Note that in Fig. 10(d), the inputs of flip-flops for compatible scan chains and ( and ) are driven by the same output pair and ( and ) of the decoding logic. If a circuit has a large number of scan chains, then typically there are also a large number of compatible scan chains. Reducing the number of outputs of the decoding logic by merging compatible scan chains can reduce hardware overhead for the decoding logic. In this paper, minimizing the number of outputs of the decoding logic is achieved by finding maximal numbers of compatible scan chains, which can be formulated as the clique problem [33]. If a set of compatible scan chains are merged to be driven by the same pair of decoding logic outputs, then generators for the merged scan chains are also updated accordingly as follows. If input is assigned a care value , where , in generator , then values for all th scan inputs of the scan chains that are merged together are updated to in the new generator. Fig. 10(c) shows the new generators after generators of compatible scan chains are updated. Fig. 10(d) shows an implementation of 3-weight WRBIST for the merged generators shown in Fig. 10(c). Since the circuit has four scan chains, the 3-weight WRBIST has four fixing logics. However, since the two pairs of compatible scan chains and , and and are merged, the decoding logic has only two pairs of outputs, , and .

Fig. 10. Merging compatible scan chains. (a) Generators before merging. (b) Graph representation of compatible scan chains. (c) Generators after merging. (d) Decoding logic for merged scan chains.

VI. EXPERIMENTAL RESULTS

Table I compares results obtained by applying test sequences generated by regular LFSRs (LFSR sequences, for short) and results obtained by applying test sequences generated by the proposed TPGs (the proposed TPG sequences, for
TABLE I
COMPARISONS WITH LFSR GENERATED PATTERNS (SINGLE SCAN CHAIN)
short). Columns under the heading LFSR give results of LFSR sequences while columns under the heading proposed give results of proposed TPG sequences. The column # pat under the heading LFSR shows the number of test patterns generated by the LFSR and the column % shows fault efficiency achieved by the LFSR sequence. Fault efficiency is defined as (the number of detected faults) / (the number of total faults − the number of untestable faults). The average number of transitions per cycle is given in the column # Aver. Trans. This number includes the number of transitions caused not only by test patterns being scanned in but also by responses being scanned out. The number of transitions at signal lines is weighted by the number of fanouts in each stem. Hence, if a fanout stem that drives multiple branches has a transition, then the transition is counted as the number of branches rather than 1. Total CPU time spent to synthesize the proposed TPG is shown, in seconds, in the columns run time (sec). All experiments were run on a 1.6-GHz Sunfire V 440 workstation with 8 GB of RAM.

For each circuit, we implemented two different proposed TPGs, each of which comprises a different LT-RTPG: one with a 2-input AND gate and the other with a 3-input AND gate (see Fig. 7). Results obtained by the proposed TPG with a 2-input (3-input) AND gate LT-RTPG are shown in the columns under the heading . The same number of test patterns was generated by both the 2-input AND gate LT-RTPG and the 3-input AND gate LT-RTPG for each circuit (to drop easy-to-detect faults). The number of test patterns generated by the LT-RTPG is given in the column LT-RTG # pat. Fault efficiency achieved by the LT-RTPG sequence is shown in the columns LT-RTG FE%.

100% single stuck-at fault efficiency was achieved by the proposed TPG sequence for every benchmark circuit. In addition to stuck-at fault efficiency, we computed transition delay fault efficiency achieved by the same sequence of test patterns. Transition delay fault efficiency is given in the columns Trans dly FE%. To compute transition delay fault efficiency, test patterns generated by the proposed TPG were applied to scan chains by the skewed-load approach [34] (or the launch-off-shift). It is interesting to see that even though the proposed ATPG is not designed for the transition delay fault model, transition delay fault efficiency achieved by the proposed TPG is very close to 100% for every circuit. Columns tot. # pat. give the total number of test patterns, which includes both LT-RTPG patterns and 3-weight WRBIST patterns, generated by the proposed TPG to achieve 100% stuck-at fault efficiency. The columns # gen show the number of generators that were generated by the proposed ATPG (see Section II-C). The results on the number of test patterns clearly demonstrate that the proposed TPG can achieve high fault coverage with significantly fewer test patterns than the LFSR. The average number of transitions per test cycle is shown in the columns Switching as a fraction of the average number of transitions caused by LFSR sequences. Both LT-RTPG and 3-weight WRBIST test sequences significantly reduced switching activity in every circuit. Test sequences that are generated by the 2-input (3-input) AND gate LT-RTPG caused on average 33% (54%) fewer transitions than LFSR generated test sequences. The 3-weight WRBIST generated sequences also significantly reduced the number of transitions (by 47% on average). Note that larger reduction in the number of transitions is achieved for large circuits. These results clearly demonstrate that the proposed TPG can achieve high fault coverage and also efficiently reduce excessive switching activity that may occur during BIST. Hardware overhead for the proposed TPG is presented in Table II.

For the experiments shown in Table II, we made three different versions of each benchmark circuit, each of which has a different scan chain length: one with only a single scan chain, one with scan chain length of 128, and finally one with scan chain length of 64. Unlike [26], where scan flip-flops are reordered to minimize hardware overhead, in this paper scan flip-flops are routed in the original order (the order scan flip-flops appear in the circuit). Columns # chains give numbers of scan chains in the versions that have multiple scan chains. Numbers of scan flip-flops in scan chains are balanced in all multiple scan chain circuits. Compatible scan chains were merged together to minimize hardware overhead as described in Section V. Columns # Dec output give the number of output pairs of the decoding logic. The results for the proposed TPG with 2-input (3-input) AND gate LT-RTPG are shown in the rows (3). The decoding logic circuits were obtained by running SIS [35] for two-level circuit implementations. Area overhead of the synthesized decoding logic is reported in gate equivalents (columns labeled GE). Only NAND gates, NOR gates, and inverters were used to synthesize the decoding logics. The gate equivalents are computed in the manner suggested in [9]: 0.5n for an n-input NAND or NOR gate and 0.5 for an inverter.

If a circuit has a large number of scan chains, then typically there are many compatible scan chains that can be merged
WANG: BIST TPG FOR LOW POWER DISSIPATION AND HIGH FAULT COVERAGE 787
TABLE II
EXPERIMENTAL RESULTS FOR PROPOSED TPG WITH MULTIPLE SCAN CHAINS
TABLE III
COMPARISONS WITH PRIOR WORK
together (see Section V). Note that numbers of output pairs of decoding logics (columns # Dec output) for multiple scan chain versions of large benchmark circuits such as s38417 and s38584 are much smaller than numbers of scan chains in these circuits (versions) since many scan chains are compatible and thus merged together. Even if there are no merged scan chains (the number of scan chains is the same as that of decoding logic output pairs), gate equivalents of the decoding logic for versions that have shorter scan chains are slightly smaller than those of decoding logics for versions with longer scan chains. For example, even though no scan chains are merged in either the 128 or the 64 scan chain length version of s5378 and s9234, gate equivalents of decoding logics for single scan chains are slightly larger than those of decoding logics for 128 scan chain length versions, and gate equivalents of decoding logics for 128 scan chain length versions are in turn slightly larger than those of decoding logics for 64 scan chain length versions. Average numbers of transitions are shown in the columns under the headings #Trans.; columns LT-(3W-) give average numbers of transitions per test clock cycle caused by LT-RTPG test sequences (3-weight WRBIST sequences). In general, LT-RTPG sequences cause smaller numbers of transitions for versions that have shorter scan chains.

Table III compares the proposed method with recent prior work [12], [36]–[38]. Like the proposed TPG, the TPGs proposed in [12] and [36] also achieve 100% fault efficiency for every ISCAS’89 benchmark circuit. Unlike the proposed TPG, neither [12] nor [36] considers power dissipation during BIST. Hence, direct comparisons of the proposed TPG with [12] and [36] may not be fair. For s9234, s15850, and s38417, the gate equivalents of the proposed TPG are significantly smaller than those of [12], while the gate equivalent of [12] is much smaller than that of the proposed TPG for s13207 and s38584. Note that the gate equivalent of the proposed TPG for s38417, the largest circuit, is smaller than half the gate equivalent of [12], while the number of patterns to achieve 100% fault efficiency is even smaller than that of [12]. The gate equivalents of [36] are a little smaller than those of the proposed TPG for most circuits except s9234 and s15850. The gate equivalent of [36] for s9234 is significantly smaller than that of the proposed TPG, while the gate equivalent of [36] for s15850 is even larger than that of the proposed TPG.

The TPGs proposed in [37] and [38] can reduce switching activity during BIST. The columns % AP reduction show reduction in the average number of transitions against regular LFSRs. Fault efficiency and coverage achieved are shown in the columns FE% and FC%. Even though more test patterns were applied than by the proposed TPG, fault efficiencies (coverages) achieved by the TPGs proposed in [37] and [38] are much lower than 100%. The proposed TPG (with LT-RTPG with ) achieves even larger reduction in the average number of transitions than [36]. The TPG [37] achieves larger reduction in the average number of transitions than the proposed TPG.

Table IV shows experimental results for three industrial designs. The column grid cnt gives the size of each design in the number of grids (the grid count does not include area occupied by embedded memories) and the column # FFs gives the number of scan flip-flops in the design. The column # Init pat
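
The three bookkeeping rules used in this section (fault efficiency, fanout-weighted transition counts, and gate equivalents) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the tooling used for the experiments: the function names, the dictionary-based signal representation, and the gate labels "NAND", "NOR", and "INV" are hypothetical, and the gate-equivalent rule is read here as 0.5 per input of a NAND or NOR gate and 0.5 per inverter.

```python
from typing import Dict, Iterable, Tuple


def fault_efficiency(detected: int, total: int, untestable: int) -> float:
    """Return detected / (total - untestable) as a percentage."""
    testable = total - untestable
    if testable <= 0:
        raise ValueError("no testable faults")
    return 100.0 * detected / testable


def weighted_transitions(prev: Dict[str, int], curr: Dict[str, int],
                         branches: Dict[str, int]) -> int:
    """Count value changes between two cycles, weighted by fanout.

    A line listed in `branches` (a fanout stem) contributes its number
    of branches when it toggles; any other line contributes 1.
    """
    count = 0
    for line, value in curr.items():
        if prev[line] != value:
            count += max(1, branches.get(line, 1))
    return count


def gate_equivalents(gates: Iterable[Tuple[str, int]]) -> float:
    """Sum gate equivalents over (gate type, number of inputs) pairs."""
    total = 0.0
    for kind, n_inputs in gates:
        if kind in ("NAND", "NOR"):
            total += 0.5 * n_inputs  # assumed reading: 0.5 per gate input
        elif kind == "INV":
            total += 0.5
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return total


# Made-up numbers, for illustration only (not taken from the tables).
fe = fault_efficiency(detected=95, total=110, untestable=10)
wt = weighted_transitions({"a": 0, "b": 1, "c": 0},
                          {"a": 1, "b": 1, "c": 1},
                          branches={"a": 3})
ge = gate_equivalents([("NAND", 2), ("NOR", 3), ("INV", 1)])
```

In the made-up usage lines, line "a" is a stem with three branches, so its toggle counts three times, while the toggle on line "c" counts once.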
[20] T. Schuele and A. P. Stroele, “Test scheduling for minimal energy consumption under power constraints,” in Proc. VLSI Test. Symp., 2001, pp. 312–318.
[21] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
[22] S. Gerstendorfer and H.-J. Wunderlich, “Minimized power consumption for scan-based BIST,” in Proc. IEEE Int. Test Conf., 1999, pp. 77–84.
[23] S. Wang and S. K. Gupta, “LT-RTPG: A new test-per-scan BIST TPG for low heat dissipation,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 25, no. 8, pp. 1565–1574, Aug. 2006.
[24] S. Wang and S. K. Gupta, “DS-LFSR: A BIST TPG for low heat dissipation,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 7, pp. 842–851, Jul. 2002.
[25] F. Corno, M. Rebaudengo, M. S. Reorda, G. Squillero, and M. Violante, “Low power BIST via non-linear hybrid cellular automata,” in Proc. VLSI Test. Symp., 2000, pp. 29–34.
[26] S. Wang, “Generation of low power dissipation and high fault coverage patterns for scan-based BIST,” in Proc. IEEE Int. Test Conf., 2002, pp. 834–843.
[27] S. Wang, “Low hardware overhead scan based 3-weight weighted random BIST,” in Proc. IEEE Int. Test Conf., 2001, pp. 868–877.
[28] S. Wang, “Minimizing Heat Dissipation During Test Application,” Ph.D. dissertation, EE-Systems Dept., Univ. Southern California, Los Angeles, 1998.
[29] L. H. Goldstein and E. L. Thigpen, “SCOAP: Sandia controllability/observability analysis program,” in Proc. IEEE-ACM Des. Autom. Conf., 1980, pp. 190–196.
[30] P. Goel, “An implicit enumeration algorithm to generate tests for combinational logic circuits,” IEEE Trans. Comput., vol. C-30, no. 3, pp. 215–222, Mar. 1981.
[31] K.-H. Tsai, J. Rajski, and M. Marek-Sadowska, “Star test: The theory and its applications,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 19, no. 9, pp. 1052–1064, Sep. 2000.
[32] I. Pomeranz and S. Reddy, “3-weight pseudo-random test generation based on a deterministic test set for combinational and sequential circuits,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 12, pp. 1050–1058, Jul. 1993.
[33] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.
[34] J. Savir, “Skewed-load transition test: Part I, calculus,” in Proc. IEEE Int. Test Conf., 1992, pp. 705–713.
[35] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, “SIS: A system for sequential circuit synthesis,” Electron. Res. Lab. Memorandum, Univ. California, Berkeley, UCB/ERL M92/41, 1992.
[36] L. Li and K. Chakrabarty, “Test set embedding for deterministic BIST using a reconfigurable interconnect network,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 23, no. 9, pp. 1289–1305, Sep. 2004.
[37] N.-C. Lai, S.-J. Wang, and Y.-H. Fu, “Low-power BIST with a smoother and scan-chain reorder under optimal cluster size,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 25, no. 11, pp. 2586–2594, Nov. 2006.
[38] N. Z. Basturkmen, S. M. Reddy, and I. Pomeranz, “A low power pseudo-random BIST technique,” J. Electron. Test.: Theory Appl., vol. 19, no. 6, pp. 637–644, Dec. 2003.

Seongmoon Wang received the B.S. degree in electrical engineering from Chungbuk National University, Chungbuk, Korea, in 1988, the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 1991, and the Ph.D. degree in electrical engineering from University of Southern California, Los Angeles, in 1998.
He is currently a Senior Research staff member at NEC Laboratories America, Princeton, NJ. He was previously a Design Engineer at GoldStar Electron, Korea, and a DFT Engineer at Syntest Technologies, Sunnyvale, CA, and 3Dfx Interactive, San Jose, CA. His main research interests include design for testability and computer-aided design.