
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 2, FEBRUARY 2007

A New Single-Ended SRAM Cell With Write-Assist


Richard F. Hobson
Abstract: A 6T static random access memory (SRAM) cell with a new write-assist (WA) feature is presented. The WA technique reduces the problem of writing a one through an nMOS pass device, thereby making a single-ended bit line more attractive. Both active power and leakage power can be significantly reduced. Leakage charge can be pooled to help precharge bit lines. Cell area and performance are competitive with traditional SRAM cell area and performance.

Index Terms: Leakage powered bit lines, low leakage static random access memory (SRAM), low power memory, single-ended 6T SRAM cell.

I. INTRODUCTION

Static random access memory (SRAM) is a vital part of most system-on-chip (SoC) applications. SRAM power consumption in both the active and standby states is an important concern, especially for microprocessor cache memories [1]-[4], [18]-[22]. The standard 6T SRAM cell (STDcell) uses a pair of differential bit lines for input-output (I/O) via a pair of nMOS pass devices, as shown in Fig. 1. All of the cited work shows that careful choice of transistor threshold voltages can substantially reduce leakage power. Other techniques may include dynamic voltage adjustments or substrate bias voltage adjustments [20]. Asymmetric cell design, preferentially reducing the leakage of dominant 0 bits, has also been proposed [4], [21]. Adding more transistors and/or additional access ports has also been shown to effectively reduce power [18].

Single-ended I/O (SEIO) bit line variations on the 6T structure have been proposed but not widely adopted, e.g., [5]-[7]. Reducing to a 5T cell is attractive due to the potential for cell area reduction. The main problem with SEIO is that writing a one (write-one) through an nMOS pass device poses a difficult design challenge. However, SEIO also has considerable potential for active and standby power reduction, even if the number of transistors in the cell is not reduced.

This paper presents a new SRAM cell design with a write-assist (WA) technique that facilitates SEIO (WAcell). Techniques for substantial active and leakage power reduction are also introduced, based on the use of WAcell. All results have been obtained using UMC BSIM3v3.2 130-nm CMOS Hspice models at 110 °C.

This work has its origin in a series of experiments by the author involving the SRAM cell shown in Fig. 2. In this schematic, transistor P3 replaces transistor N4 of the traditional Fig. 1 schematic. Writing is performed over a dedicated WRITE

Fig. 1. STDcell schematic.

Fig. 2. 6T SRAM cell with dedicated SEIO READ/WRITE busses.

bus (WB), with a decoded WRITE select signal (W). Reading is performed over a dedicated READ bus (RB) with a decoded active-low READ select signal (RZ). The read bus is generally precharged and is pulled up conditionally, depending on the stored value. Several test chips with a typical memory size of 2 K × 32 bits were successfully prototyped in CMOS technologies ranging from 800 down to 180 nm [8]. Although these memories tended to be slower than commercially available memories (e.g., 200 MHz in 180-nm CMOS), they have potential for lower power consumption. The new cell introduced in Section II has potential for both high speed and low power.

Sections II and III introduce WAcell and some leakage suppression techniques. Sections IV and V compare writing and reading with STDcell. Section VI expands on WAcell READ stability. Section VII concludes this paper.

II. GENERAL WACELL FEATURES

Fig. 3 shows a 6T SRAM cell schematic with the proposed WA feature [10]. Instead of having separate READ and WRITE busses, as in Fig. 2, the READ bus has been replaced by a

Manuscript received March 2, 2006; revised September 3, 2006. The author is with the School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada (e-mail: rick@cs.sfu.ca).

1063-8210/$25.00 © 2007 IEEE


TABLE I SAMPLE OF TRANSISTOR THRESHOLDS, E.G., [9]

Fig. 3. Proposed 6T SRAM cell with WA (WAcell).

Fig. 5. Typical WAcell (A) READ timing and (B) WRITE timing.

Fig. 4. Typical WAcell layout style. As drawn, metal 1 appears to be below polysilicon.

ground (or floating ground, FGND) connection. Both reading and writing take place over a common SEIO BIT line. Fig. 4 shows a possible layout that is the same area (1.3 µm × 1.8 µm) as the conventional cell shown in [1], using similar design rules. WAcell uses metal-2 for RW and WAZ (running vertically) and metal-3 for BIT and the power lines (running horizontally). Metal-2 and -3 are removed for clarity. There is also room for a global-BIT (GBIT) line (horizontal) and an additional vertical wire if necessary. The STDcell may not need three layers of metal for small memories, as the WORD line is in metal 1 and the BIT lines are in metal 2 [1]. An alternative STDcell layout with a metal-3 BIT line option is shown in [16].

Advanced CMOS technology offers a designer several choices for transistors with various thresholds, as shown, for example, in Table I (130-nm CMOS). High speed (HS) devices use low thresholds to gain performance, while low leakage (LL) devices have higher thresholds, trading performance for lower power. Standard (or medium) performance (SP) is in between. Careful selection of transistors is important to power consumption, performance, and noise margin [1]-[3].

Fig. 5 shows typical access timing. Reading consists of a decode/precharge stage followed by a READ/sense stage, then a capture/output stage. This is discussed more in Section V.

During a WRITE, decoding and data are activated, followed by RW, WAZ, and FGND. WAZ and FGND should be pulsed to save power (Section IV). If node QZ is initially high, transistor P3 lowers its voltage enough to weaken N2 (inside inverter 2), as shown in Fig. 6. This significantly improves write-one performance. The ratio of the voltage on QZ to the trip voltage (the voltage below which the cell changes state) can be controlled to a safe level by choosing the widths and lengths of P1 and P3 appropriately, as shown in Fig. 6.

The worst process corner case for WRITE is slow-N, slow-P (SS). However, the widths and lengths of P1 and P3 must be chosen to be safe over all corner cases. Fig. 6 shows both the SS and FF cases. The QZ ratios are nearly the same for both cases. The N3 current is higher for FF, as one would expect. Also, P3 reduces the N3 trip current more for the FF case than the SS case. Minor statistical variations to the process parameters of P1 and P3 (and the other transistors) can be tolerated if they are carefully sized to start with.

Note that during WRITE, RW is active before, during, and after WAZ is activated. If BIT is 0, Q is held firmly at that value until after WAZ is released. So WAZ does not play a significant role in writing a 0. If BIT is 1, Q is pulled up past the trip point and driven high by the regenerative action of the cell. At this point, neither N3 nor P3 plays any further role, as they shut off. If P3 could be made strong enough to flip the cell, writing would still work, but the cell could have to flip back (to 0) after the WAZ pulse. This uses more time and power, so it is best not to make P3 stronger than necessary.
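The WRITE sequencing described above can be summarized in a small logic-level sketch. It is an idealized behavioral model only, not the circuit: the `Pulse` type, the function names, and the example timings are hypothetical, and the rule that a write-one without a properly enclosed assist pulse leaves the cell unchanged is a simplifying assumption for the nominal sizing discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    """Active window of a control signal in nanoseconds (hypothetical values)."""
    start_ns: float
    end_ns: float


def rw_encloses_waz(rw: Pulse, waz: Pulse) -> bool:
    """RW must be active before, during, and after the WAZ pulse."""
    return rw.start_ns < waz.start_ns and waz.end_ns < rw.end_ns


def write_cell(q_old: int, bit: int, rw: Pulse, waz: Pulse) -> int:
    """Idealized state of node Q after one WRITE cycle."""
    if bit == 0:
        # N3 alone holds Q at 0; WAZ plays no significant role.
        return 0
    if rw_encloses_waz(rw, waz):
        # P3 weakens N2, N3 lifts Q past the trip point, the cell regenerates.
        return 1
    # Assumption: without a properly timed assist, write-one does not flip the cell.
    return q_old


rw = Pulse(0.2, 1.0)
waz = Pulse(0.4, 0.6)  # short assist pulse inside the RW window
print(write_cell(q_old=0, bit=1, rw=rw, waz=waz))  # 1
```

The model deliberately ignores analog effects such as the strength of P3 relative to P1, which the sizing discussion above treats as the real design constraint.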

HOBSON: A NEW SINGLE-ENDED SRAM CELL WITH WRITE-ASSIST


TABLE II STDCELL LEAKAGE WEIGHTS WITH CLAMPED (PRECHARGED) BIT LINES. TRANSISTOR NAMES N1, N2, ETC. ARE DEFINED IN FIG. 1. TRANSISTOR STRENGTHS (LL, SP, HS) ARE DEFINED IN TABLE I.

Fig. 6. Transistor P3 pulls node QZ down to weaken transistor N2. Without P3, the N3 current occurs at 9.5 on the normalized Y-axis (SS case). Beta P1 = 1.07. Minimum WRITE trip current through N3 is reduced considerably.

TABLE III WACELL LEAKAGE WEIGHTS WITH CLAMPED BIT AND FGND. TRANSISTOR NAMES N1, N2, ETC. ARE DEFINED IN FIG. 3

Fig. 7. SNM butterfly voltage curves for reading (upper) and writing (lower). Beta N3 = 1.23, Beta N2 = 1.58, Beta P3 = 1.67, Beta P1 = 1.07.

Fig. 8. Transistors N1 (N3 in Fig. 3) come from memory cells with Q = 1 while N0 have Q = 0. The N-leaker node connects to one or more BIT lines through a transistor. Similarly, P1 (P3 in Fig. 3) is for a memory cell with QZ = 1, and P0 has QZ = 0. The P-leaker node connects to one or more FGND lines through a transistor. The configuration shown implies an equal number of 0s and 1s stored in the memory.

Beta-READ is defined as the ratio of W/L for transistor N2 to W/L for transistor N3 (Fig. 3). In STDcell, Beta-READ is usually required to be larger than 1.3 for READ stability [1]. Relaxed constraints on Beta-READ can improve READ performance. With WAcell, Beta-READ values comparable to 1.3 or less are achievable (see Section VI).

Cell stability is often demonstrated by static noise margin (SNM) butterfly curves [2]. Fig. 7 shows simulated SNM butterfly curves for transistor sizes corresponding to the layout in Fig. 4. Traditionally, READ curves are generated by shorting the BIT line and RW to VDD, while QZ is swept between 0 and VDD. The equivalent for WAcell WRITE curves is generated by shorting FGND and WAZ to ground, while Q is swept between 0 and VDD. Transistor threshold plays a significant role in the SNM [1]-[3].

III. LEAKAGE SUPPRESSION

Transistor leakage power increases primarily with reductions in threshold voltage and channel length [1]-[3]. Consequently, it has become an important SoC concern. STDcell leaks internally as well as to one BIT line or the other, due to symmetry. Table II shows the major components of cell leakage for STDcell with busses held at VDD, a Beta-READ of 1.3, and transistors of the indicated type. These factors can be varied considerably by selecting transistors with different properties. For example, if all

high-speed transistors are used, the relative leakage jumps from 4.3 to 16.2. Table III shows some corresponding leakage factors for WAcell. Because WAcell is asymmetric, its leakage contributions are quite different. The Q = 0 case leaks more than STDcell because both sides of the cell are leaking maximally. However, when Q = 1, the total leakage is much less than for STDcell because both sides of the cell are leaking minimally. Under these conditions, the average leakage power in WAcell with an equal number of 0s and 1s is 72% of STDcell leakage.

Some memory applications, such as computer caches, are known to have more zeros than ones stored in them. In this case, the polarity of data on the BIT line should be inverted (i.e., BITZ in Fig. 3) in favor of writing 1s for the low leakage case [4], [18], [20]. Fortunately, this polarity is the same for reading, writing, and leakage. Under these assumptions, WAcell leakage reduces to only 53% of STDcell leakage. Further reductions are possible if BIT and FGND are not clamped at VDD and ground, respectively.

Fig. 8 introduces pairs of memory cell leakage circuits. N-leakers are connected to local-BIT lines, which are shown at a common node voltage. If there are an equal number of 0s and 1s stored along the BIT line, there will be an equal number of N1 and N0


Fig. 10. Connecting FGND to the charge pool.

Fig. 9. Subthreshold current behavior. The arrows show minimum power points for N- and P-leaker pairs as presented in Fig. 8. Transistors are from the FF corner.

devices. The mix can be adjusted according to the percentage of 0s and 1s. Similarly, P-leakers are connected to FGND lines, which are shown at a second node voltage. All devices are in the off-state.

A reduction of 1.7× or more in BIT line leakage power was reported in [3], [15] for BIT lines that are only precharged when necessary (hence, leakage-biased bit-lines). This reduction depends upon the time between access cycles as well as the threshold voltage of the pass device and the number of cells contributing to the leakage.

Fig. 9 shows the Ids leakage current for various output voltages of the individual leaker transistors of Fig. 8 (with weights shown in Table III). The minimum power point for a pair of leakers (Fig. 8) is shown with arrows in Fig. 9, one for N and one for P. If the BIT and FGND nodes are clamped to ground, the N1 and P1 transistors leak maximally while the N0 and P0 transistors do not leak at all. Similarly, if they are clamped to VDD, N0 and P0 leak maximally. The maximum difference is over 10× for P and nearly 30× for N. Thus, it is easy to see why BIT lines clamped to a fixed precharge voltage may cause serious leakage.

Now suppose that a BIT line has been precharged (typically to 0.6-0.7 V). This would place the N-arrow of Fig. 9 near the right end of the voltage axis, where N0 devices are leaking near maximum and N1 devices are leaking near 0. Since power is measured relative to the N1 devices, N-leakage power has dropped to near 0. Of course, the price for precharging, in terms of power, is paid right away. The point is that if charge that has been paid for (e.g., through a precharge) can later be diverted to a pool that acts to reduce leakage, then the cost of the precharge may be partially or fully recovered through reduced leakage.

P-leakage can also be put to good use. In WAcell (Fig. 3), the P3 supply line can be made to float (FGND) with little or no impact on cell layout area (Fig. 4).
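The way a floating line loaded by off-state leaker pairs (Figs. 8 and 9) settles to its own bias point can be illustrated numerically. The sketch below is not the BSIM3 simulation used for the results in this paper: it assumes a textbook subthreshold current expression with DIBL, ignores the body effect, and uses hypothetical values for the supply voltage, slope factor, and DIBL coefficient. It bisects for the BIT voltage at which the total pull-up leakage through N1 devices equals the total pull-down leakage through N0 devices.

```python
import math

VDD = 1.2          # supply voltage in volts (assumed for illustration)
V_THERMAL = 0.033  # kT/q near 110 C, in volts
N_SLOPE = 1.5      # subthreshold slope factor (assumed)
ETA = 0.1          # DIBL coefficient (assumed)


def subthreshold(vgs: float, vds: float) -> float:
    """Normalized off-state current of one nMOS leaker (textbook model)."""
    if vds <= 0.0:
        return 0.0
    gate_term = math.exp((vgs + ETA * vds) / (N_SLOPE * V_THERMAL))
    return gate_term * (1.0 - math.exp(-vds / V_THERMAL))


def pull_up(v_bit: float) -> float:
    """N1 leaker: the cell stores Q = 1, so its source is the BIT line."""
    return subthreshold(vgs=-v_bit, vds=VDD - v_bit)


def pull_down(v_bit: float) -> float:
    """N0 leaker: the cell stores Q = 0, so its drain is the BIT line."""
    return subthreshold(vgs=0.0, vds=v_bit)


def self_bias_voltage(n_ones: int, n_zeros: int) -> float:
    """Bisect for the BIT voltage where pull-up and pull-down leakage balance."""
    lo, hi = 0.0, VDD
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        net = n_ones * pull_up(mid) - n_zeros * pull_down(mid)
        if net > 0.0:
            lo = mid  # line is still charging, equilibrium lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


v_eq = self_bias_voltage(n_ones=16, n_zeros=16)
print(f"floating BIT line settles near {v_eq:.3f} V")
```

In this simplified model the N1 current vanishes when the line is clamped to VDD and the N0 current vanishes when it is clamped to ground, matching the qualitative behavior described for Fig. 9; the absolute voltages it produces should not be read as results for the 130-nm process.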
When FGND floats, all of the cells in a word that have a 1 stored at node QZ will gradually pull FGND up towards VDD via the P3 leakage path. When words are not accessed, FGND will self bias to near the P-arrow point of Fig. 9, permitting leakage power to drop by 2× to 7×, depending upon the properties of the P3 transistor (the SS corner leaks much less than the FF corner). By connecting FGND to a common charge pool, this charge pool can be used to dynamically switch active FGND wires and precharge BIT lines. Fig. 10 shows one way

Fig. 11. Circuit for demonstrating leakage-based precharge. For experimental purposes, set Cb = 18 fF, Cp = 103 Cb, WNp = 1.75 × min, WNs  3.5 × WNp, WP1 = WP0 = 1.25 × min.

to switch an FGND line with N memory cells (P3 connections) on it (e.g., 1 word of 32 bits) to the charge pool with M FGNDs (e.g., M words), when in a standby state (controlled by Write Pulse). An n-channel transistor can be used to subsidize the charge pool to guarantee that the pool voltage will not drop below a safe value during a precharge operation. The subsidization device is not necessary for FGND-only predischarge.

To see how leakage power can be harnessed for precharging BIT lines, consider Fig. 11. With Np off and no subsidization (Ns off), the pool voltage will be at a steady state similar to the P-arrow point in Fig. 9. When Np is turned on, if the BIT line voltage starts at 0 V, it will rise to a voltage that depends on the relative sizes of Cp and Cb, as well as the pull-up strength of both the P-leakers and the Ns subsidize device. The contribution of transistor Ns depends on its width and its threshold. For an HS nMOS as in Table I, it starts to contribute when the pool voltage is below 0.95 V.

Fig. 12 shows some specific results for the circuit of Fig. 11. The pool voltage was initialized to 0.9 V and the BIT line voltage to 0 V. Cb was charged to 0.75 V over 0.6 ns out of a period of 2 ns. The pool voltage is required to end up no lower than 0.9 V after 2 ns in order to be able to sustain the precharge operation for multiple cycles. Precharge power with no P-leakers is normalized to 1 (subsidization only). The top trace of Fig. 12 shows power as P-leaker pairs are added without using them for precharge. The bottom trace shows P-leaker power only. The top trace is simply the sum of the precharge power plus the leakage power. The middle trace, Together, shows power with both P-leakers and the subsidization device contributing charge, as in Fig. 11. Up to about 32-K P-pairs (64 Kb) the leakage power has dropped to a negligible level. At 128-K P-pairs (256 Kb) the leakage power portion (in


Fig. 12. Several combinations of leakage and precharge power.

Fig. 13. Write-1 (W1) effectiveness as a function of the strength of N3. Beta N2 = 1.58; Beta P3 = 1.67. Without P3, Beta N3 has to be 4.6 to flip Q from 0 to 1. SS flip time with a Beta N3 of 1.23 is 0.1 ns.

Together) is 37% of the Leak Only value. Starting at 48-K pairs, the subsidization device is not necessary to maintain the pool voltage at or above 0.9 V with Cb precharged to 0.75 V. The trace Nopullup (Ns off) shows further power reduction. Leakage has been reduced to 27% of the Leak Only value. Total power with 128 Kb and Nopullup is about half of the separated precharge and leakage power.

Fig. 12 was generated with FF (fast-fast) process parameters and a 50-50 split of 0s and 1s. Under these conditions leakage may be higher than under typical or slow conditions. Because the subsidization device automatically turns on when the pool voltage drops, the circuit of Fig. 11 will work effectively under all leakage conditions. In a more advanced technology where leakage is higher, the previously mentioned techniques should also work very well. For any technology, the subsidization device, Ns, should be chosen to compensate for fluctuations in leakage.

The strongest leakers are the N0 and P1 devices of Figs. 8 and 9 because they leak over a much wider voltage range than N1 and P0. Thus, even if over 90% of the cells have Q = 1 (N1 case) and QZ = 0 (P0 case), the other 10% will still maintain the BIT and FGND steady-state values within 0.1 V of the arrows in Fig. 9. With inverted BIT lines, this is actually 90% data 0. Under such skewed conditions, more subsidization will be used for BIT line precharge.

IV. WACELL WRITE EFFECTIVENESS

During an STDcell WRITE operation, differential BIT lines are driven to complementary data values, so one side of the symmetrical cell can be forced to zero. With WAcell, the data value (or its complement) must be driven onto a single bit line. When this value is zero, there is no problem flipping the state of a memory cell without assistance. When it is one, some form of WRITE assistance is required. One type of WRITE assistance is to skew the layout lengths and widths until WRITE becomes stable [7]. For example, the cell in Fig. 2 requires a fairly long channel for N2 so it can be overpowered by N3.
When reading and writing both occur through the same port, it is difficult to size transistors to account for all process parameter variations without losing some noise margin.

With the technique shown in Fig. 3, transistor P3 is activated by WAZ, and pulls node QZ (which is at VDD for this case) down by a predetermined amount. This weakens transistor

Fig. 14. Write-1 performance variation with reduced VDD. Beta N2 = 1.6.

N2, thereby permitting node Q to be easily pulled up above the WRITE threshold by transistor N3. Fig. 13 shows how cell flip-time varies with the strength of N3. These were measured from 50% of WRITE select to the point where the cell flips. Fig. 13 also shows how strong the N3 transistor is relative to the minimum required to trip the cell.

Another sensitivity of interest is the tolerance to variations in VDD. Fig. 14 shows variations in cell WRITE time as VDD is reduced. Tolerance well beyond the normal 10% is possible. This means that VDD could be lowered as an additional viable method to conserve power. Other SEIO cells, e.g., Fig. 2, are not so tolerant to reductions in VDD. This is also shown in Fig. 14.

WRITE power depends upon many choices from the cell level up to the system level. Since this paper's focus is mainly cell oriented, only limited system level details are included. One important point is that local-BIT line partitioning is facilitated by the global-BIT line that runs in parallel to the local-BIT line through the WAcell. Thus, local-BIT lines in a WAcell memory may be kept short relative to typical STDcell memories, cf. Fig. 15. There are typically 2 to 8 times as many STDcells on local-BIT lines as WAcells on local-BIT lines. Shorter local-BIT lines help to speed up the READ operation and reduce active power consumption.

Since STDcell (Fig. 1) has complementary BIT lines, it is assumed for comparison that one BIT line must always be driven during a WRITE operation. Also, both BIT lines get precharged to VDD before either a READ or a WRITE (but one of them will already be at VDD). WAcell has a local-BIT line that operates in conjunction with the global-BIT line (GBIT),


TABLE IV NORMALIZED WRITE POWER FOR VARIOUS W0 AND W1 OPERATIONS RELATIVE TO STDCELL

Fig. 15. BIT-line structure of (A) WAcell and (B) STDcell. Sense/WRITE (precharge) circuits interface GBIT to local-BIT lines. Global I/O connects to the next level.

Fig. 16. WAcell local-BIT line setup for writing. Data is driven onto GBIT by the WRITE driver, then onto BIT through the transmission gate. BIT is parked at the pool voltage at the end of a cycle. Sensing and precharge are not used during a WRITE. Precharge can be done through the transmission gate during a READ.

as shown in Fig. 16. There are some differences from STDcell, as follows. It is not necessary to precharge WAcell's BIT line before writing. The correct data value may be directly written onto BIT through a transmission gate (to GBIT). According to [3], [15], BIT line leakage is minimized if BIT lines are permitted to self-bias to a steady state. As described in Section III, considerable additional power can be saved if BIT lines are parked to a common pool voltage, as shown in Fig. 16. GBIT lines can be directly driven to their correct starting voltage from the final voltage at the end of their last active cycle. When writing a 1, GBIT will remain unchanged if the last value was also a 1. Similarly, when writing a 0, GBIT will remain unchanged if the last value was a 0. WAcell BIT wires do not couple with adjacent cell BIT wires because they are shielded by power lines on the boundary between cells. BIT wires do couple with GBIT wires. Both move in the same direction during READs and WRITEs, so the coupling is not destructive. References [3] and [15] do not discuss wire coupling effects on floating BIT lines. Parking helps to prevent spurious voltage excursions on floating BIT lines. WAcell has a small dc power component during a WRITE, due to the pull-down of transistor P1 by transistor P3 (Fig. 3). To limit this power, WAZ and FGND should be activated with a short pulse. Some timing techniques are discussed in [13] and [14].

The Park transistor in Fig. 16 connects a BIT line to the charge pool. This pool extends to the other three local-BIT

lines that are also parked. Thus, any charge that is pooled to the BIT-line charge pool has already been paid for powerwise, but may be reused to reduce or eliminate N-leakage power. In addition, if the pool voltage rises because charge is supplied faster than it is removed by N0 leakers, the cost of future precharging is also reduced because the BIT line voltage swing is reduced. This is demonstrated by the write-1 (W1) sequence in rows 4-6 of Table IV (and is discussed in the next few paragraphs). The extent of the BIT-line and FGND charge pools is a system level consideration. For example, if several adjacent BIT lines are pooled together, some will be adding charge to the pools while others are removing charge from the pools.

Table IV shows relative WRITE power for STDcell and WAcell for various initial conditions and data values. It is assumed that there is a single column of 128 STDcells per local BIT line in the conventional architecture and four sets of 32 WAcells per global-BIT line in the partitioned architecture. GBIT is long enough to span three sets of 32 WAcells, as shown in Fig. 15. BIT wire capacitance per micron was taken to be the same for both cells (STD and WA), as they both have the same pitch (1.3 µm) and wire density (three wires per cell). The power is for leakage (50% 0/1), bus, and cell state change over a 1.5-ns interval with a WAZ and FGND pulse of 0.2 ns (increasing this pulse to 0.3 ns affects WRITE power by less than 2%).

STDcell architecture uses the same amount of power for writing 0 or 1, which comes from two BIT line transitions, plus STDcell state change power, plus leakage power (64 pairs of N-leakers per BIT line). It is normalized to 1. Leakage contributes about 6%. The lowest partitioned architecture power is for W0 when GBIT and BIT are already near 0. This is less than 1/16th of the standard architecture power. W1 uses the most power when GBIT and BIT are initially near 0, but this is still less than half of STDcell.
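One reason WRITE power depends on the initial conditions is that GBIT only has to move when the new data value differs from the last one, whereas the STDcell WRITE involves two BIT line transitions every time. The following sketch simply counts those transitions for a data sequence. It is a rough proxy under stated assumptions, not a power model: the function names are hypothetical, and local-BIT parking, cell state change, and leakage (all included in Table IV) are ignored.

```python
def gbit_transitions(bits: str, initial: str = "0") -> int:
    """Count GBIT level changes for a sequence of written data values."""
    count = 0
    last = initial
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        if bit != last:
            count += 1  # GBIT must be driven to the opposite level
        last = bit
    return count


def stdcell_transitions(bits: str) -> int:
    """STDcell WRITE: two BIT line transitions per operation, regardless of data."""
    return 2 * len(bits)


for seq in ("1111", "1010", "0000"):
    print(seq, gbit_transitions(seq), stdcell_transitions(seq))
```

Repeated values cost no GBIT transition at all in this count, which is consistent with the low power reported when GBIT and BIT already hold the value being written.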
The other values are near one-third and one-fourth of the standard architecture power. Table IV rows 2 and 4-6 show that W1 power reduces to less than 1/12th of the standard power when a sequence of 1s is written (approaching W0 performance). This is because the pool voltage gradually rises after each park operation (see the voltage column of Table IV).

As previously mentioned, computer caches are known to WRITE more 0s than 1s to memory [4], [18], [20] (e.g., 70%), so the polarity of data on BIT and GBIT should be chosen to take advantage of this. Writing a sequence 110110 uses an average power of 27.6% of the standard power, while the sequence 001001 uses 27.8% of the standard WRITE power. This implies that BIT line polarity should be the complement of


Fig. 17. Worst case BIT line leakage.

Fig. 19. STDcell READ-sense circuit for comparison. A BIT line swings up VDD/2 during precharge and down VDD/2 during a READ. It is assumed that there are half 0s and half 1s for leakage purposes.

TABLE V NORMALIZED READ POWER FOR VARIOUS CONFIGURATIONS, R0, AND R1 RELATIVE TO STDCELL

Fig. 18. READ-sense circuit with precharge and select. GBIT spans 96 WAcells as shown in Fig. 15(a). An output circuit connects to GBIT.

the data, as was also found in Section III. Stronger arguments for this are presented under READ conditions in Section V.

In [18], an extra dedicated WRITE port was added to the STDcell. For cases when their WRITE-BIT line did not require a transition, power consumption was less than 1/30th of the STDcell power (350-nm CMOS). This is better than row 2, column W0 in Table IV, mainly because of the larger voltage swings in 350-nm CMOS. With a WRITE-BIT line transition, power was over 94.5% of the STDcell power in [18]. This is higher than similar cases in Table IV.

V. WACELL READ EFFECTIVENESS

In advanced technologies, BIT line leakage due to nonaccessed cells can cause BIT lines to drift low after precharge [1]-[3], [11]. This is an asymmetric phenomenon, where many N3 (N4) transistors with VDD across source and drain (N0-leakers) can pull a BIT line down from a full precharge of VDD. This is more detrimental to differential bus sensing methods than to single-ended bus sensing methods [11].

There is a lower limit to this leakage defined by Fig. 17. In this case, a single SRAM cell with Q = 1 is selected. The memory cell's N-pass device will hold the bus at an intermediate voltage value. If all of the other cells on the bit line are leaking to ground (perhaps 31 or 63 of them), the resulting voltage will be slightly less than the nMOS pull-up value (a threshold drop below VDD). One can take advantage of this and precharge to a similar reference voltage. With fewer cells on a local-BIT line (e.g., 32), pulldown leakage is correspondingly less.

Fig. 18 shows a possible READ-sense circuit. To operate with a suitable switching point, the inverter structure and feedback mechanism need to be appropriately sized. Also, devices with different thresholds can be used, for example, to increase speed. For comparisons, [11] shows both dynamic and static NAND-style sense amps for a full precharge, [13] shows a latch technique for differential sensing,

while [12] shows a current mode sense amp for a lower voltage precharge.

To make use of the previously mentioned precharge scheme, the BIT line pull-up (Pre) could connect directly to the precharge voltage through either an nMOS or a pMOS device, or indirectly via a transmission gate to GBIT, as in Fig. 16.

READ simulations were performed to compare the circuit shown in Fig. 18 with the more standard circuit shown in Fig. 19, cf. [16] and [17]. Table V shows various power results. The STDcell READ was limited to a voltage swing of VDD/2, and normalized to 1 (this was 54% of the STDcell WRITE power). The worst case WAcell READ (R0) is 82% of the STDcell READ (GBIT initially 0). Reading a 1 starting with GBIT at 0.6 V uses only 23% of the STDcell READ. Reading a sequence of 1s, as in rows 2 and 4-6, shows that READ power gradually drops to 1/16th or less of the standard architecture power due to the increased pool voltage (voltage swings on the local-BIT line as well as GBIT are reduced). Reading a series 110110 uses an average power of only 34% of the standard, while the series 001001 uses 62% of the standard value. Thus, cache-like situations where there are more 0s than 1s should again store the complement of the data for minimal power. Reading alternating values 101010 uses 48% of the standard architecture power. Mixing READ and WRITE in the sequence w1r1r0r1w1r0r1r1w0 uses 35% of the standard architecture power.

Word select to output time for Fig. 18 is 0.55 ns, while for Fig. 19 it is 0.50 ns. The sense circuit in Fig. 19 has the advantage of needing only a threshold drop from VDD before


Fig. 20. Stability of READ reduces gradually as Beta N3 increases. Beta N2 = 1.58.

being turned on. However, aggressive sense timing can lead to more sense power.

VI. READ CELL STABILITY

During a memory READ operation, BIT lines are precharged to an initial state, which is traditionally VDD, but could be lower. After precharge, a memory cell is engaged to either pull a BIT line down, or hold it close to (possibly raising it above) its initial value. When the state of node Q is initially 0, the READ operation causes a maximum disturbance voltage on Q. If this voltage is too high (above the trip voltage), the cell state changes spuriously from 0 to 1. The ability of a cell to withstand this operation could be defined by a ratio of these two voltages. Fig. 20 shows how READ stability is affected by varying Beta N3, and hence, Beta-READ. The worst case process corner for READ stability was fast-N, fast-P (FF). READ stability increases a small amount if a lower precharge voltage is used [7]. A good choice for Beta N3 would lie between 1.2 and 1.4 in Fig. 20.

Other authors have used a ratio of currents: the test current that must be forced through N2, injected at node Q, to change the state of the cell, relative to the maximum READ current through a pass device (N3) [3]. This is also shown in Fig. 20. The current ratio tracks the voltage ratio in a similar way.

VII. CONCLUSION

A new 6T SRAM cell with a single BIT line architecture has been presented, featuring several power saving techniques. Standby leakage power can be cut in half (depending upon the distribution of 0s and 1s) compared with the STDcell. Large savings, 2× to 7×, are possible (for either cell) when BIT or FGND are near the optimal points shown in Fig. 9, rather than being clamped at a high (low for FGND) precharge voltage.

Active leakage power may be reduced to less than 27% of the standard architecture by pooling BIT line charge. This depends on process parameters as well as memory size. The BIT-line charge pool receives BIT line charge after a cycle (parking), while the FGND charge pool is used to precharge BIT lines for reading.
Standby leakage is also reduced by this technique, as the pool time constants could be hundreds to thousands of clock cycles. Thus, after a burst of activity, whatever increased voltage is left on the BIT-line charge pool (or decreased voltage on the FGND charge pool) has already been paid for powerwise, so standby leakage continues to be reduced as the pool voltages decay back to their self-biased values.

The combination of partitioned BIT line, charge pooling, and single BIT line can cut WRITE power from one-half to 1/16th of the standard architecture, with typical values around one-fourth. READ power is similarly reduced, with typical values of one-third of the standard architecture.

Local-BIT line parking could increase memory access time a small amount, as this is initiated at the end of a cycle where precharge is traditionally located. However, precharge should only be done on an as-needed basis at the beginning of a READ cycle to avoid high clamped BIT line leakage power. Relocating precharge should leave adequate time for parking. Since GBIT lines are not parked, some timing overlap is possible.

Future work could investigate optimal ways to join the BIT-line and FGND charge pools together across multiple memory blocks. For example, larger pool voltages reduce READ/WRITE power, but joining many memory blocks together will reduce the voltage increments because the capacitance also increases. If leakage power reduction is high enough, joining multiple memory blocks together will be beneficial.

As on-chip memory size increases into multimegabits, one could look beyond the memory for opportunities to utilize leakage power. Any use of the charge pools for functions that would otherwise get power straight from VDD and flush it to ground could help cut leakage power (e.g., precharged buses).

Extensive transistor threshold adjustments can be made to speed up a memory access, or further reduce leakage power [1]-[3]. These adjustments generally apply to WAcell as well as to STDcell.

ACKNOWLEDGMENT

The Canadian Microelectronics Corporation arranged access to 130-nm CMOS process information under nondisclosure. The CMC and the Natural Sciences and Engineering Research Council of Canada have also facilitated several generations of SRAM prototyping.

REFERENCES
[1] R. W. Mann et al., "Ultralow-power SRAM technology," IBM J. Res. Dev., vol. 47, no. 5/6, pp. 553–566, Sep./Nov. 2003.
[2] T. B. Hook, M. Breitwisch, J. Brown, P. Cottrell, D. Hoyniak, C. Lam, and R. Mann, "Noise margin and leakage in ultra-low leakage SRAM cell design," IEEE Trans. Electron Devices, vol. 49, no. 8, pp. 1499–1501, Aug. 2002.
[3] F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, "Analysis of dual-Vt SRAM cells with full-swing single-ended bit line sensing for on-chip cache," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 2, pp. 91–95, Apr. 2002.
[4] N. Azizi, A. Moshovos, and F. Najm, "Low-leakage asymmetric-cell SRAM," in Proc. ISLPED, 2002, pp. 48–51.
[5] H. Tran, "Demonstration of 5T SRAM and 6T dual-port RAM cell arrays," in Proc. IEEE Symp. VLSI Circuits, 1996, pp. 68–69.
[6] C. Wang, C. Wu, R. Hwang, and C. Kao, "Single-ended SRAM with high test coverage and short test time," IEEE J. Solid-State Circuits, vol. 35, no. 1, pp. 114–118, Jan. 2000.
[7] I. Carlson, "Design and evaluation of high density 5T SRAM cache for advanced microprocessors," M.S. thesis, Dept. Electr. Eng., Linköping Univ., Linköping, Sweden, 2004.
[8] R. Hobson, "A compact multiport static random access memory cell," U.S. Patent No. 5 754 468, May 19, 1998.
[9] UMC, Taiwan, "SoC solutions," 2005 [Online]. Available: www.umc.com/English/process/b.asp
[10] R. Hobson, "Write-Assisted SRAM Bit Cell," U.S. Patent No. 6 804 143, Oct. 12, 2004.

[11] F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, "Dual-Vt SRAM cells with full-swing single-ended bit line sensing for high-performance on-chip cache in 0.13 µm technology generation," in Proc. ISLPED, 2000, pp. 15–19.
[12] H. Kondoh, H. Yamanaka, M. Ishiwaki, Y. Matsuda, and M. Nakaya, "An efficient self-timed queue architecture," in Proc. IEEE CICC, 1994, pp. 637–640.
[13] K. Mai, T. Mori, B. S. Amrutur, R. Ho, B. Wilburn, M. Horowitz, I. Fukushi, T. Izawa, and S. Mitarai, "Low-power SRAM design using half-swing pulse-mode techniques," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1659–1671, Nov. 1998.
[14] B. Amrutur and M. Horowitz, "Fast low-power decoders for RAMs," IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1506–1515, Oct. 2001.
[15] S. Heo, K. Barr, M. Hampton, and K. Asanovic, "Dynamic fine-grain leakage reduction using leakage-biased bitlines," presented at the ISCA-29, Anchorage, AK, May 2002.
[16] J. Greason, D. Buehler, J. Kolousek, Y. Ng, K. Sarkez, P. Shay, and A. Waizman, "A 4.5 Megabit, 560 MHz, 4.5 GByte/s high bandwidth SRAM," in Proc. Symp. VLSI Circuits, 1997, pp. 15–16.
[17] R. Singh and N. Bhat, "An offset compensation technique for latch type sense amplifiers in high-speed low-power SRAMs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 6, pp. 652–657, Jun. 2004.
[18] Y. Chang, F. Lai, and C. Yang, "Zero-aware asymmetric SRAM cell for reducing cache power in writing zero," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 8, pp. 827–836, Aug. 2004.
[19] C. Kim, J. Kim, S. Mukhopadhyay, and K. Roy, "A forward body-biased low-leakage SRAM cache: Device, circuit and architecture considerations," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 3, pp. 349–357, Mar. 2005.
[20] A. Moshovos, B. Falsafi, F. Najm, and N. Azizi, "A case for asymmetric-cell cache memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 877–881, Jul. 2005.
[21] M. Kandemir, M. Irwin, G. Chen, and I. Kolcu, "Compiler-guided leakage optimization for banked scratch-pad memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 10, pp. 1136–1146, Oct. 2005.
[22] N. Kim, D. Blaauw, and T. Mudge, "Quantitative analysis and optimization techniques for on-chip cache leakage power," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 10, pp. 1147–1155, Oct. 2005.

Richard F. Hobson received the B.Sc. degree from the University of British Columbia (UBC), Vancouver, Canada, in 1967, and the Ph.D. degree from the University of Waterloo, Waterloo, ON, Canada, in 1972. He has held various appointments with the Simon Fraser University Schools of Computing Science and Engineering Science, Burnaby, BC, Canada, since 1974. His research interests involve low power memory, embedded processor design, digital signal processing, parallel systems-on-chip, and computer hardware acceleration. Challenging real-time embedded software applications are also of interest. In 1999, he co-founded Cogent ChipWare Inc., Burnaby, BC, Canada, and became Chief Technical Officer.
