
Redundancy, repair, and test features of a 90nm embedded SRAM generator

Rob Aitken, Neeraj Dogra, Dhrumil Gandhi, Scott Becker
Artisan Components, 141 Caspian Court, Sunnyvale, CA, USA 94089
aitken@artisan.com

Abstract

Today's System on Chip (SoC) designs can use hundreds of embedded memories. Most of these are created by automated generators, which must produce a wide variety of configurations while retaining essential capabilities in area, performance, power, and testability. Redundancy, repair, and test issues in the context of memory generators are more complex than they are in the context of individual embedded memories, or even internally developed memories.

1 Background and motivation

Embedded memories are the most common cores on today's SoCs. It is common to have dozens or even hundreds of individual instances on chip, ranging from register files as small as 64 bits to larger memories of hundreds of kilobits or even megabits in size. Building such a variety of memories by hand is infeasible, and so memory generators are used to build most SRAMs, with custom design restricted to application-specific configurations or special needs (e.g. 5 write ports). A memory generator differs from other IP in that it provides the ability to create a specific block of hard IP, rather than the IP itself. This additional capability generates some unique test challenges. Because memories can occupy 50% or more of total SoC area, high quality test for them is vital to overall chip quality. Most literature in this area has concentrated on built-in self-test and repair solutions for embedded SRAM [1]-[5], or on assisting ATE [6][7], but quality tests require good design-for-testability within the RAM itself, as well as high quality test algorithms. Because SRAM circuits are extremely dense, they are more susceptible to defects than other types of circuitry such as standard cells or I/O. Many foundries report defect densities in SRAM at twice the rate of other circuitry. This paper describes some of the challenges that a provider of memory generators faces and shows how they have been addressed in a particular implementation of a generator for the 90nm process generation. The generator produces memories in the range of 1k bits to 512k bits, with word widths from 2 to 128 bits, and three different column multiplexing schemes (for aspect ratio, timing, and power tradeoffs). The paper does not address built-in self-test or repair algorithms, nor does it consider the memory design or verification process itself. Instead, specific memory features that enable defect tolerant operation are discussed, including redundancy, accelerated retention test, test multiplexers, adjustable timing margins, and error correcting codes. A test chip implementing these features is currently in fabrication.

2 Requirements for a memory generator

Memory generators have some constraints that should be addressed before looking at specific testability features. A memory generator should be robust, flexible, and easy to use. The memories it creates need to be area efficient, manufacturable, testable, reliable, and able to meet a wide variety of performance requirements in both power and timing. Finally, the generator must produce a complete set of models for the tools in the user's SoC design flow. It is desirable for embedded memory to retain the same look and feel across process generations and across product types in a given generation, keeping the same instance names, port names, generator GUI, etc. over time whenever possible. It is also desirable to support access to multiple foundries where similar processes exist. Thus, while the underlying architecture of a memory may change with each process generation, the user's view of it through the generator remains consistent. An example GUI view is shown in Figure 1.


Figure 1 GUI interface


[Figure 2 Redundancy architecture of single port RAM: left and right cell cores, two redundant rows, two redundant columns, column shifting of the jth column past the ith row, and the associated RREN, RRA, RCA, CREN[], D[], and Q[] signals.]

3 Redundancy and Repair

Memory redundancy and repair is a well-known topic. However, the relative costs and benefits of redundancy change at every process node. Current foundry recommendations suggest that repairable memory be used for chips with more than 1 to 2Mb of SRAM. The 90nm generator discussed here generates single instances only up to 512k bits, so if only one memory is used on a chip, redundancy is probably not an issue. However, most SoCs contain numerous memories, many or most of which are small single instances. This means that a generator-based repair solution must be general enough to accommodate a wide variety of repair schemes and still satisfy yield improvement goals. The 90nm generator provides two physical rows and two physical columns of redundancy, but the user is able to choose whether to use zero, one, or both of each of the redundant rows and columns. The decision to add two physical rows and columns was made in order to provide reasonable yield improvement and low area overhead over a wide variety of memory instance sizes, while accommodating physically likely defects (e.g. neighboring bit line shorts breaking two adjacent columns). The architecture is shown in Figure 2. Row redundancy is implemented using external address matching (in a provided soft macro of synthesizable RTL), together with dynamic row redundancy enable (RREN) and row redundancy address (RRA) signals. These signals cause the memory to use one of the redundant rows, independent of the row address selected on the address bus. Column redundancy is enabled by dynamically shifting columns. A column redundancy enable


signal (CREN) is provided for each column; if it is zero, then that column's data is shifted to the right (to the next-highest bit). The highest bit is shifted into the redundant column selected by the column address signal (RCA). The RTL needed to implement the address recognition and CREN signals is generated along with the memory. One might think that it would be desirable to always use the maximum amount of redundancy, but this is not necessarily the case. Controller size and complexity have some influence on the overall picture. To give a flavor of the issues involved, we will look at two case studies: multiple small RAMs, and a few large RAMs. The following section provides some review and may be skipped by readers familiar with the topic.
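To make the column-shift mechanism concrete, the following is a minimal behavioral sketch (in Python) of how data bits are remapped around a defective column. The function name, the data layout, and the treatment of RCA are our own illustrative assumptions, and column multiplexing is ignored; this is not the generator's actual logic.

def map_logical_to_physical(word_width, faulty_col=None, rca=0):
    """Illustrative model of column-shift repair (not the generator's netlist).

    Returns phys[i] = the physical column that stores logical data bit i.
    Physical columns 0..word_width-1 form the normal array; columns
    word_width and word_width+1 are the two redundant columns, selected by rca.
    """
    phys = []
    for bit in range(word_width):
        if faulty_col is None or bit < faulty_col:
            phys.append(bit)                  # unaffected columns keep their position
        elif bit + 1 < word_width:
            phys.append(bit + 1)              # CREN = 0: data shifts to the next-highest column
        else:
            phys.append(word_width + rca)     # highest bit moves into the selected redundant column
    return phys

# Example: 8-bit word, physical column 3 defective, repaired with redundant column 0
print(map_logical_to_physical(8, faulty_col=3, rca=0))
# -> [0, 1, 2, 4, 5, 6, 7, 8]; defective column 3 is bypassed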

3.1 Background: Redundancy and yield

We will use the Poisson yield model for simplicity. Other models produce similar results for the circuit sizes under consideration, since we are concerned with changes in yield rather than absolute values. See [12] for more information on yield modeling.

Y = e^{-AD}    (1)

where Y is yield, A is area, and D is the effective defect density.

Redundancy improves yield by repairing defects, or, more precisely, by eliminating failures associated with defects. In the Poisson model, the number of defects is given by the Poisson distribution, so the probability of k defects is given by

P(k) = \frac{(AD)^k e^{-AD}}{k!}    (2)

Notice that equation (2) reduces to equation (1) when k is 0. Imagine a perfect repair process that was able to repair any n defects. The yield for a chip using this perfect repair scheme would be given by

Y_{perfect} = \sum_{k=0}^{n} P(k) = \sum_{k=0}^{n} \frac{(AD)^k e^{-AD}}{k!}    (3)

In reality, repair schemes are not perfect and are not able to repair all defects. For example, a row-only repair scheme cannot repair a defective column. In general, a method, say X, will be able to repair some percentage of single defects, a different percentage of double defects, and so on. This can be represented as

Y_{repair,X} = \sum_{k=0}^{\infty} f_X(k) P(k) = \sum_{k=0}^{\infty} f_X(k) \frac{(AD)^k e^{-AD}}{k!}    (4)

where f_X(k) is the fraction of k-defect occurrences that method X can repair.
3.2 Case I: Multiple Small RAMs

While yield equations remain relatively constant over time, their application changes with each process generation. Consider a design containing 50 small memories of 64k bits each arranged as 4k words of 16 bits each, with a 16-bit column mux (this memory has 256 physical rows and 256 physical columns), for a total of about 3 megabits of memory. In order to simplify product characterization and verification, the generator


adds both redundant rows and columns whenever either is used, and ties off any unused inputs. Each resulting instance is 3.6% larger than it would be without redundancy. We will use the defect distribution in the table below to calculate yields (this is meant to serve as an example, not to represent any particular process):

Single bit errors   Paired bit (horiz.)   Paired bit (vert.)   Entire row   Entire col.   Unrepairable
50%                 5%                    5%                   10%          20%           10%

Table 1 Example Defect Distribution

With the distribution above, a double-row repair scheme can repair 70% of all single defects (all but the entire-column and unrepairable categories); similarly, a double-column scheme can repair 80% of all single defects. The coverage of double defects requires additional calculation, and works out to about 45% for the double-row scheme and 60% for the double-column scheme. Higher-order defect coverage is negligible. Assume that each memory requires its own BIST/BISR controller of 1k gates. A combined double-row and double-column repair approach can repair all repairable single defects (90%), and 81% of double defects (any pair of repairable defects), plus decreasing percentages of 3- and 4-defect cases. Assume the controller size in this case is 4k gates. Note that actual repair yields will be lower because repair algorithms are never perfect, and that in practice controllers can often be shared. The total area for these memories is 6.25 mm2. Assuming an effective defect density of 5 defects/cm2, equations (1) and (4) allow us to calculate the expected yields. Suppose a wafer yields 300 good die excluding memory, and that memory covers 1/3 of the die, so a 3% increase in memory area equals a 1% increase in die area (and thus decreases logic wafer yield to 297 die per wafer). Further assume that each 50k gates adds 1% to die area. Assuming for simplicity that logic yield does not change, we can calculate the following:

Approach                    RAM Yield   Added RAM Area   Added Logic Area   Die/Wafer (ex. RAM)   Net Die per Wafer
None                        73.9%       0                0                  300                   222
Double row                  91.2%       1.2%             1%                 293.4                 268
Double column               94.1%       1.2%             1%                 293.4                 276
Double row, double column   97.0%       1.2%             4%                 284.4                 276

Table 2 Example predicted yields and costs, many small RAMs

When the added design complexity of a row-and-column approach is factored in for a complete picture, it appears that a double-column redundancy approach may be best suited to this design. Memory redundancy improves net die per wafer by 276/222, or 24%, which should be more than enough to cover the cost of repair.

3.3 Case II: A Few Large RAMs

For larger memory elements (multiple megabits), additional redundancy measures may be employed, such as redundant banks. For this example we will assume that a 3-megabit memory is constructed from six banks of 512k bits each. We will again use the defect distribution of Table 1. For this larger memory, the redundancy overhead is 1.1% for each instance, which translates to 0.37% overall. The controller overhead is also greatly reduced (6k and 24k gates total). If we assume the same initial logic yield as before (300 die per wafer), we can calculate the expected yields shown in Table 3. In this example, having both row and column redundancy significantly improves yield over either alone, and controllers for only six memories are needed, so the additional design cost versus simply row or column redundancy may be worth it (especially if controllers can be shared, which will increase the relative benefit of the row/column approach). Again, memory redundancy is able to improve net die per wafer, in this case by 31%.

Approach                    RAM Yield   Added RAM Area   Added Logic Area   Die/Wafer (ex. RAM)   Net Die per Wafer
None                        73.9%       0                0                  300                   222
Double row                  92.9%       0.37%            0.1%               298.6                 278
Double column               95.1%       0.37%            0.1%               298.6                 284
Double row, double column   97.7%       0.37%            0.5%               297.4                 291

Table 3 Example predicted yields and costs, a few large RAMs
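The "Net Die per Wafer" entries in Tables 2 and 3 are simply the logic-limited die count scaled by the RAM yield. The short sketch below reproduces two of the table entries to within rounding; the helper name is ours, not the paper's.

def net_die_per_wafer(logic_die_per_wafer, ram_yield):
    """Good die = die that pass everything else, scaled by the fraction whose RAM is good or repairable."""
    return logic_die_per_wafer * ram_yield

# Case I (Table 2), double-column repair:        293.4 * 0.941 -> about 276
print(round(net_die_per_wafer(293.4, 0.941)))
# Case II (Table 3), double row + double column: 297.4 * 0.977 -> about 291
print(round(net_die_per_wafer(297.4, 0.977)))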

4 Accelerated Retention Test

Retention testing is performed in order to verify that the two P transistors in an SRAM bit cell are present and functioning. If one is missing, the resulting fault is called an asymmetric fault; if both are missing (e.g. due to a missing VDD contact), the fault is called symmetric [8]. It is important to check for retention faults, because a cell without P transistors can operate correctly over a short period of time due to the charge stored on node capacitance (see Figure 3). The most common retention test method is simply to load the memory with a known data value (e.g. a checkerboard), wait for some period of time (empirically determined, and potentially up to hundreds of milliseconds), read the data from the memory, and then repeat with the inverse value. If the chip is structured such that all retention testing can take place simultaneously, this approach works well. In some cases, however, a dedicated retention test is desirable; examples include high-volume production, where reductions of fractions of a second in test time translate into significant cost savings, and quality/performance requirements for weak-P fault coverage [8][9].
[Figure 3 Six-transistor memory cell: VDD, the word line (WL), the bit lines BL and ~BL, and the charge stored within the cell are labeled; an open at one location causes an asymmetric fault (one P missing), while an open at another causes a symmetric fault (both Ps missing).]
It might be argued that increasing background leakage currents should allow for reduced retention test time as technology advances, but the data do not support this. Table 4 shows how core cell leakage values change across technologies for a standard logic process (data are taken at a comparable voltage/temperature corner for each generation). The relative core cell capacitor size is also shown for comparison. While the fast-process leakage increases dramatically at each node, the typical leakage increases more slowly, and the slow-process leakage actually decreases from 130nm to 90nm. IDDQ testing must consider the fast process corner, but retention testing needs to focus more on the slow corner. The ratio between fast and slow is increasing dramatically as transistors shrink, and while this makes memory design more complex, it cannot be used to reduce data retention testing. This effect is particularly striking when viewed graphically, as in Figure 4.


                 180 nm   130 nm   90 nm
slow             1.0      3.8      2.0
typ              2.5      11.6     31.8
fast             9.1      53.1     253.1
fast/slow        9        14       124
capacitor size   1.00     0.54     0.43

Table 4 Relative core cell leakage across process generations

[Figure 4 Relative core cell leakage: the leakage data of Table 4 plotted against process corner (slow, typ, fast) and process generation (180, 130, 90 nm).]

Because some designs require a faster retention test, the memory generator optionally provides customized circuitry to help detect data retention faults without the extended wait period. The architecture of the test circuit is based on a modified write operation, which operates as follows (to write a 1): drive both bit lines to zero to clear the cell contents; release the drive on the positive bit line (BL) while continuing to drive the inverse bit line (~BL) to zero; the P transistor in the cell under test (the right-hand PFET in Figure 3) must now pull BL to 1 to successfully write a 1 into the cell; a subsequent read of the cell determines whether the write was successful.

Since the test uses the small PMOS devices in the bit cell during the modified write operation, the modified write must be performed at a slower speed, approximately 25% of the maximum speed supported by the SRAM during normal operation. Since each bit cell has two PMOS devices that need to be tested for faults, each memory bit should be tested for retention of both 0 and 1 values just as in the case of the retention test using wait periods. It is recommended that the data programmed into the rest of the memory bits in the same physical column as the


bit under test should have equal numbers of 0s and 1s while testing for either of the two values (e.g. a checkerboard). This eliminates the potential influence of leakage current from the other bit cells in the case where the bit cell under test has faults in both PMOS devices (a symmetric fault), since the memory sense amp may be sensitive to the net leakage current in the column in the absence of direct read current from the bit under test.
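As a summary of the procedure just described, the following Python sketch models the intent of the accelerated retention test for a single cell. It is a toy behavioral model under stated assumptions: the class, its methods, and the fault representation are ours; the slowed-down (about 25% speed) modified write, the checkerboard background pattern, and all analog leakage effects are not modeled.

class CellArrayModel:
    """Toy model: each cell records whether its two pull-up PMOS devices are present."""
    def __init__(self, rows, cols, missing_pmos=()):
        self.data = [[0] * cols for _ in range(rows)]
        self.missing = set(missing_pmos)       # entries are (row, col, value) for a missing PMOS

    def read(self, r, c):
        return self.data[r][c]

    def modified_write(self, r, c, v):
        # Both bit lines are first driven low to clear the cell; then only one bit line
        # is driven, so the cell's own PMOS on the released side must pull its node high.
        # If that PMOS is missing, the write fails.
        self.data[r][c] = 1 - v                # cell is cleared / left at the opposite value
        if (r, c, v) not in self.missing:
            self.data[r][c] = v                # a healthy PMOS completes the write
        return self.read(r, c) == v

def retention_test_bit(mem, r, c):
    """Both data values must be tested, since each cell has two PMOS devices."""
    return mem.modified_write(r, c, 0) and mem.modified_write(r, c, 1)

mem = CellArrayModel(4, 4, missing_pmos={(1, 2, 1)})   # cell (1,2) cannot hold a 1
print(retention_test_bit(mem, 0, 0))   # True:  healthy cell passes
print(retention_test_bit(mem, 1, 2))   # False: asymmetric fault detected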

5 Extra Margin Adjustment

As a consequence of increasing chip performance requirements and the inherent complexity of integrating design IP from multiple sources, achieving timing closure is the dominant problem in SoC design. In addition, measuring timing accurately requires consideration of signal integrity effects such as crosstalk, IR drop, and glitches (e.g. [10]). Finally, there is an issue with built-in self-repair with respect to marginal timing: if a memory is tested at start-up, its temperature will be lower than later during operation (e.g. 25C versus 100C). Timing defects could be present, but not active, at the lower initial temperature. These issues can be addressed, for both debug and normal operation, by adjusting memory timing. The 90nm memory generator includes an option to provide variable read timing for the memory. This works by delaying the self-timing path in the sense amp circuitry, which allows for a more robust read at the expense of additional delay in circuit operation. The bit cells have additional time to discharge either the bit line or its complement, resulting in a higher differential voltage and a more robust read, even for a weakened cell. Note that the delay is expressed in terms of maximum memory operating speed, so for lower-frequency operation (e.g. 100 MHz) the added delay is effectively zero. Table 5 describes the option in more detail.

EMA value   Result
00          Default, normal operation
01          Adds 25% additional delay
10          Adds 60% additional delay
11          Maximum delay, tolerates about 40% variation in read current

Table 5 Extra Margin Adjustment values

As an example use, a BISR algorithm might run in 00 mode at start-up, presumably at room temperature, and reconfigure/repair the memory based on these results. It could then set EMA to 01, providing a 25% guardband to protect against any problems resulting from higher temperature operation after the device warms up. Similarly, a low-frequency design might be tested at room temperature with the minimum guardband but operated with added margin as security against signal integrity issues. Since the EMA option does not increase the area of the SRAM, the decision to use it or not depends only on performance needs and overall design robustness.
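A minimal sketch of the start-up flow just described, assuming a hypothetical BIST/BISR controller interface (set_ema, run_bist, and apply_repair are illustrative names, not the generator's or any controller's actual API):

def startup_repair_flow(ctrl):
    """Illustrative power-on sequence using EMA (controller API is hypothetical)."""
    ctrl.set_ema("00")             # test with default timing at start-up (e.g. ~25C)
    failures = ctrl.run_bist()     # locate failing rows and columns
    ctrl.apply_repair(failures)    # drive RREN/RRA and CREN/RCA through the repair soft macro
    ctrl.set_ema("01")             # add a 25% read guardband for hot operation (e.g. ~100C)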

6 Error Correcting Code

Soft error detection and correction is becoming increasingly important in many system designs. While the complexity and variety of potential approaches to soft errors (e.g. latency, rollback, multiple-word combination) is beyond the scope of the generator, the ability to provide basic error correction is available. This option produces RTL implementing a single-bit-correcting, double-bit-error-detecting Hsiao code [11] on one memory word at a time. Ordinarily, the largest number of physical columns generated is 1024 (1026 with redundancy), but with this option additional columns are generated to accommodate the code bits (log2 n + 2 for single correct, double detect, where n is the number of bits in the word). The combined extra memory and logic typically increase area by 20-30%, while cycle time typically doubles. This should be adequate for many applications. For those designs where higher performance or lower area is necessary, other approaches that make use of the basic memory generator are available commercially.
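The column overhead of the SEC-DED option follows directly from the check-bit count given above (log2 n + 2). The sketch below computes it for several word widths; it is a rough estimate only, ignoring redundancy columns, column multiplexing, and the accompanying logic area.

import math

def secded_check_bits(data_bits):
    """Check bits for a SEC-DED (e.g. Hsiao) code: ceil(log2 n) + 2."""
    return math.ceil(math.log2(data_bits)) + 2

for n in (16, 32, 64, 128):
    r = secded_check_bits(n)
    print(f"{n:3d} data bits -> {r} check bits ({100 * r / n:.1f}% extra columns)")
# 16 -> 6 (37.5%), 32 -> 7 (21.9%), 64 -> 8 (12.5%), 128 -> 9 (7.0%)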


7 Conclusions

As process technologies advance and the amount of embedded memory on chip continues to rise, repairable memory is increasingly becoming a fact of life on complex SoCs. Memory generators are necessary to enable efficient SoC design, and DFT features are required on generated memories to enable high quality chip testing. Several important DFT features have been outlined and their implementation in a 90nm embedded SRAM generator described.

8 References

[1] B. F. Fitzgerald and E. P. Thoma, "Circuit Implementation of Fusible Redundant Addresses of RAMs for Productivity Enhancement," IBM J. Res. Development, Vol. 24, pp. 291-298, 1980.
[2] A. J. van de Goor, Testing Semiconductor Memories: Theory and Practice, John Wiley and Sons, 1991.
[3] D. K. Bhavsar, "An Algorithm for Row-Column Self Repair of RAMs and Its Implementation in the Alpha 21264," Proc. International Test Conference, pp. 311-318, 1999.
[4] K. Zarrineh et al., "Self Test Architecture for Testing Complex Memory Structures," Proc. International Test Conference, pp. 547-556, 2000.
[5] R. D. Adams, High Performance Memory Testing: Design Principles, Fault Modeling, and Self-Test, Kluwer, 2002.
[6] O. Hirabayashi et al., "DFT Techniques for Wafer-Level At-Speed Testing of High Speed SRAMs," Proc. International Test Conference, pp. 164-169, 2002.
[7] J. Jayabalan and J. Povazanec, "Integration of SRAM Redundancy into Production Test," Proc. International Test Conference, pp. 187-193, 2002.
[8] A. Meixner and J. Banik, "Weak Write Test Mode: An SRAM Cell Stability Design for Test Technique," Proc. International Test Conference, pp. 309-318, 1996.
[9] J. Brauch and J. Fleischman, "Design of Cache Test Hardware on the HP PA8500," Proc. International Test Conference, pp. 286-293, 1997.
[10] M. R. Becer et al., "Early Probabilistic Noise Estimation for Capacitively Coupled Interconnects," Proc. International Workshop on System-Level Interconnect Prediction, 2002, http://doi.acm.org/10.1145/505348.505365
[11] M. Y. Hsiao, "A Class of Optimal Minimum Odd-Weight Column SEC-DED Codes," IBM J. Res. Develop., Vol. 14, pp. 395-401, 1970.
[12] R. Ross and N. Atchison, "Yield Modeling," pp. 851-868 in Handbook of Semiconductor Manufacturing Technology, Y. Nishi and R. Doering, ed., Marcel Dekker, 2000.
[13] IEEE P1500 draft standard, see http://grouper.ieee.org/groups/1500/

