IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 2, FEBRUARY 2007

A New Single-Ended SRAM Cell With Write-Assist

Richard F. Hobson

Abstract—A 6T static random access memory (SRAM) cell with a new write-assist (WA) feature is presented. The WA technique reduces the problem of writing a “one” through an nMOS pass device, thereby making a single-ended bit line more attractive. Both active power and leakage power can be significantly reduced. Leakage charge can be pooled to help precharge bit lines. Cell area and performance are competitive with traditional SRAM cell area and performance.

Index Terms—Leakage powered bit lines, low leakage static random access memory (SRAM), low power memory, single-ended 6T SRAM cell.

I. INTRODUCTION

STATIC random access memory (SRAM) is a vital part of most system-on-chip (SoC) applications. SRAM power consumption in both the active and standby states is an important concern, especially for microprocessor cache memories [1]–[4], [18]–[22]. The standard 6T SRAM cell (STDcell) uses a pair of differential bit lines for input-output (I/O) via a pair of nMOS pass devices, as shown in Fig. 1. All of the cited work shows that careful choice of transistor threshold voltages can substantially reduce leakage power. Other techniques may include dynamic voltage adjustments, or substrate bias voltage adjustments [20]. Asymmetric cell design, preferentially reducing the leakage of dominant 0 bits, has also been proposed [4], [21]. Adding more transistors and/or additional access ports has also been shown to effectively reduce power [18].

Single-ended I/O (SEIO) bit line variations on the 6T structure have been proposed but not widely adopted, e.g., [5]–[7]. Reducing to a 5T cell is attractive due to the potential for cell area reduction. The main problem with SEIO is that writing a “one” (write-one) through an nMOS pass device poses a difficult design challenge. However, SEIO also has considerable potential for active and standby power reduction, even if the number of transistors in the cell is not reduced.

This paper presents a new SRAM cell design with a write-assist (WA) technique that facilitates SEIO (WAcell). Techniques for substantial active and leakage power reduction are also introduced, based on the use of WAcell. All results have been obtained using UMC BSIM3v3.2 130-nm CMOS Hspice models at 110 °C.

This work has its origin in a series of experiments by the author involving the SRAM cell shown in Fig. 2. In this schematic, transistor P3 replaces transistor N4 of the traditional Fig. 1 schematic. Writing is performed over a dedicated WRITE bus (WB), with a decoded WRITE select signal (W). Reading is performed over a dedicated READ bus (RB) with a decoded active low READ select signal (RZ). The read bus is generally precharged to and pulled up if . Several test chips with typical memory size of 2 K × 32 bits were successfully prototyped in CMOS technologies ranging from 800 down to 180 nm [8].

Fig. 1. STDcell schematic.

Fig. 2. 6T SRAM cell with dedicated SEIO READ/WRITE busses.

Although these memories tended to be slower than commercially available memories (e.g., 200 MHz in 180-nm CMOS), they have potential for lower power consumption. The new cell introduced in Section II has potential for both high speed and low power.

Sections II and III introduce WAcell and some leakage suppression techniques. Sections IV and V compare writing and reading with STDcell. Section VI expands on WAcell READ stability. Section VII concludes this paper.

Manuscript received March 2, 2006; revised September 3, 2006. The author is with the School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada (e-mail: rick@cs.sfu.ca).

II. GENERAL WACELL FEATURES

Fig. 3 shows a 6T SRAM cell schematic with the proposed WA feature [10]. Instead of having separate READ and WRITE busses, as in Fig. 2, the READ bus has been replaced by a ground (or floating ground, FGND) connection. Both reading and writing take place over a common SEIO BIT line. Fig. 4 shows a possible layout that is the same area (1.3 µm × 1.8 µm) as the conventional cell shown in [1], using similar design rules. WAcell uses metal-2 for RW and WAZ (running vertically) and metal-3 for BIT, , and (running horizontally). Metal-2 and -3 are removed for clarity. There is also room for a global-BIT (GBIT) line (horizontal) and an additional vertical wire if necessary. The STDcell may not need three layers of metal for small memories, as the WORD line is in metal 1 and the BIT lines are in metal 2 [1]. An alternative STDcell layout with a metal-3 BIT line option is shown in [16].

Fig. 3. Proposed 6T SRAM cell with WA (WAcell).

Fig. 4. Typical WAcell layout style. As drawn, metal 1 appears to be below polysilicon.

Advanced CMOS technology offers a designer several choices for transistors with various thresholds, as shown, for example, in Table I (130-nm CMOS). High speed (HS) devices use low thresholds to gain performance while low leakage (LL) devices have higher thresholds, trading performance for lower power. Standard (or medium) performance (SP) is in-between. Careful selection of transistors is important to power consumption, performance, and noise margin [1]–[3].

Fig. 5 shows typical access timing. Reading consists of a decode/precharge stage followed by a READ/sense stage, then a capture/output stage. This is discussed more in Section V.

1063-8210/$25.00 © 2007 IEEE

TABLE I
SAMPLE OF TRANSISTOR THRESHOLDS, E.G., [9]

Fig. 5. Typical WAcell (A) READ timing and (B) WRITE timing.

During a WRITE, decoding and data are activated followed by RW, WAZ, and FGND. WAZ and FGND should be pulsed to save power (Section IV). If node QZ is initially high ( ), transistor P3 lowers the voltage enough to weaken N2 (inside inverter 2), e.g., , as shown in Fig. 6. This significantly improves write-one performance. The ratio of QZs to , where is the voltage below which the cell changes state, can be controlled to a safe level by choosing the widths and lengths of P1 and P3 appropriately, as shown in Fig. 6. The worst process corner case for WRITE is slow-N, slow-P (SS). However, the widths and lengths of P1 and P3 must be chosen to be safe over all corner cases. Fig. 6 shows both the SS and FF cases. The QZ ratios are nearly the same for both cases. N3 is higher for FF, as one would expect. Also, P3 reduces more for the FF case than the SS case. Minor statistical variations to the process parameters of P1 and P3 (and the other transistors) can be tolerated if they are carefully sized to start with.

Note that during WRITE, RW is active before, during, and after WAZ is activated. If BIT is 0, Q is held firmly at that value until after WAZ is released. So WAZ does not play a significant role in writing a 0. If BIT is 1 ( ), Q is pulled up past the trip point and driven high by the regenerative action of the cell. At this point, neither N3 nor P3 plays any further role, as they shut off. If P3 could be made strong enough to flip the cell, writing would still work but the cell could have to flip back (to 0) after the WAZ pulse. This uses more time and power so it is best not to make P3 stronger than necessary.
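The WRITE behavior described above can be summarized with a small logic-level sketch. This is only a behavioral abstraction, not a circuit simulation: the class name `WACellModel`, the `waz_pulsed` flag, and the assumption that an unassisted write-one always fails to flip the cell are illustrative choices, not part of the design itself.

```python
from dataclasses import dataclass


@dataclass
class WACellModel:
    """Toy logic-level model of the WAcell storage node Q (QZ is its complement)."""

    q: int = 0

    def write(self, bit: int, waz_pulsed: bool = True) -> bool:
        """Apply one WRITE with RW active; return True if the cell ends up holding `bit`."""
        if bit not in (0, 1):
            raise ValueError("bit must be 0 or 1")
        if bit == 0:
            # A 0 on BIT is passed strongly by the nMOS device N3: no assist needed.
            self.q = 0
        elif self.q == 0 and waz_pulsed:
            # Write-one over a stored 0: the WAZ pulse lets P3 pull QZ down,
            # weakening N2 so that N3 can lift Q past the trip point.
            self.q = 1
        # Otherwise Q keeps its value: either it already holds 1, or (by assumption)
        # an unassisted write-one is too weak to flip the cell.
        return self.q == bit


cell = WACellModel(q=0)
print(cell.write(1, waz_pulsed=False))  # unassisted write-one is assumed to fail
print(cell.write(1))                    # assisted write-one
print(cell.write(0, waz_pulsed=False))  # write-zero needs no assist
```

The model deliberately ignores timing, so it cannot show the cost of an over-strong P3 (the flip and flip-back case); it only captures which writes depend on the WAZ pulse.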


HOBSON: A NEW SINGLE-ENDED SRAM CELL WITH WRITE-ASSIST

Fig. 6. Transistor P3 pulls node QZ down to weaken transistor N2. Minimum WRITE trip current through N3 is reduced considerably. Without P3, N3 occurs at 9.5 on the normalized -axis (SS case). Beta P .

Fig. 7. SNM butterfly voltage curves for reading (upper) and writing (lower). Beta N , Beta N , Beta P , Beta P .

Beta-READ ( ) is defined as the ratio of for transistor N2 ( ) to for transistor N3 ( ) (Fig. 3). In STDcell, is usually required to be larger than 1.3 for READ stability [1]. Relaxed constraints on can improve READ performance. With WAcell, values comparable to 1.3 or less are achievable (see Section VI).

Cell stability is often demonstrated by static noise margin (SNM) “butterfly” curves [2]. Fig. 7 shows simulated SNM butterfly curves for transistor sizes corresponding to the layout in Fig. 4. Traditionally, READ curves are generated by shorting the BIT line and RW to , while QZ is swept between 0 and . The equivalent for WAcell WRITE curves is generated by shorting FGND and WAZ to while Q is swept between 0 and . Transistor threshold plays a significant role in the SNM [1]–[3].
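As a small worked illustration of the Beta-READ ratio, the sketch below assumes the conventional definition used for SRAM cell ratios, namely the width-to-length ratio (W/L) of the pull-down device N2 divided by the W/L of the pass device N3. The function name and the transistor dimensions are hypothetical and are not taken from Fig. 4.

```python
def beta_read(w_n2: float, l_n2: float, w_n3: float, l_n3: float) -> float:
    """Ratio of the pull-down (N2) W/L to the pass-device (N3) W/L."""
    if min(w_n2, l_n2, w_n3, l_n3) <= 0:
        raise ValueError("dimensions must be positive")
    return (w_n2 / l_n2) / (w_n3 / l_n3)


# Hypothetical drawn dimensions in micrometres (illustrative only).
ratio = beta_read(w_n2=0.26, l_n2=0.13, w_n3=0.20, l_n3=0.13)
print(f"beta-READ = {ratio:.2f}")
```

With equal channel lengths the ratio reduces to the width ratio, so these example dimensions sit right at the 1.3 level quoted for STDcell READ stability.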


III. LEAKAGE SUPPRESSION

Transistor leakage power increases primarily with reductions in threshold voltage and channel length [1]–[3]. Consequently, it has become an important SoC concern. STDcell leaks internally as well as to one BIT line or the other, due to symmetry. Table II shows the major components of cell leakage for STDcell with busses held at , a of 1.3, and transistors of the indicated type. These factors can be varied considerably by selecting transistors with different properties. For example, if all high-speed transistors are used, the relative leakage jumps from 4.3 to 16.2.

TABLE II
STDCELL LEAKAGE WEIGHTS WITH CLAMPED (PRECHARGED) BIT LINES. TRANSISTOR NAMES N1, N2, ETC. ARE DEFINED IN FIG. 1. TRANSISTOR STRENGTHS (LL, SP, HS) ARE DEFINED IN TABLE I

TABLE III
WACELL LEAKAGE WEIGHTS WITH CLAMPED BIT AND FGND. TRANSISTOR NAMES N1, N2, ETC. ARE DEFINED IN FIG. 3

Table III shows some corresponding leakage factors for WAcell. Because WAcell is asymmetric, its leakage contributions are quite different. The case leaks more than STDcell because both sides of the cell are leaking maximally. However, when , the total leakage is much less than for STDcell because both sides of the cell are leaking minimally. Under these conditions, the average leakage power in WAcell with an equal number of 0's and 1's is 72% of STDcell leakage. Some memory applications, such as computer caches, are known to have more zeros than ones stored in them. In this case, the polarity of data on the BIT line should be inverted (i.e., BITZ in Fig. 3) in favor of writing 1's for the low leakage case [4], [18], [20]. Fortunately, this polarity is the same for reading, writing, and leakage. Under these assumptions, WAcell leakage reduces to only 53% of STDcell leakage. Further reductions are possible if BIT and FGND are not clamped at and , respectively.

Fig. 8 introduces pairs of memory cell leakage circuits. N-leakers are connected to local-BIT lines which are shown at voltage . If there are an equal number of 0's and 1's stored along the BIT line, there will be an equal number of N1 and N0 devices. The mix can be adjusted according to the percentage of 0's and 1's. Similarly, P-leakers are connected to FGND lines which are shown at voltage . All devices are in the off-state.

Fig. 8. Transistors N1 (N3 in Fig. 3) come from memory cells with while N0 have . is connected to one or more BIT lines through a transistor. Similarly, P1 (P3 in Fig. 3) is for a memory cell with , and P0 has . connects to one or more FGND lines through a transistor. The configuration shown implies an equal number of 0's and 1's stored in the memory.

A reduction of 1.7× or more in BIT line leakage power was reported in [3], [15] for BIT lines that are only precharged when necessary (hence, leakage-biased bit-lines). This reduction depends upon the time between access cycles as well as the threshold voltage of the pass device and the number of cells contributing to the leakage.

Fig. 9 shows the Ids leakage current for various output voltages of the individual leaker transistors of Fig. 8 (with weights shown in Table III). The minimum power point for a “pair” of leakers (Fig. 8) is close to for N, and close to for P (shown with arrows in Fig. 9). If and are clamped to , the N1 and P1 transistors leak maximally while the N0 and P0 transistors do not leak at all. Similarly, if and are clamped to , N0 and P0 leak maximally. The maximum difference is over 10× for P and nearly 30× for N. Thus, it is easy to see why BIT lines clamped to a fixed precharge voltage may cause serious leakage.

Fig. 9. Subthreshold current behavior. The arrows show minimum power points for N- and P-leaker pairs as presented in Fig. 8. Transistors are from the FF corner.

Now suppose that a BIT line has been precharged to a voltage either or (typically 0.6–0.7 ). This would place the N-arrow of Fig. 9 near the right end of the -axis where N0 devices are leaking near maximum and N1 devices are leaking near 0. Since power is measured relative to the N1 devices, N-leakage power has dropped to near 0. Of course, the price for precharging, in terms of power, is paid right away. The point is that if charge that has been paid for (e.g., through a precharge) can later be diverted to a pool that acts to reduce leakage, then the cost of the precharge may be partially or fully recovered through reduced leakage.

P-leakage can also be put to good use. In WAcell (Fig. 3), the P3 supply line can be made to float (FGND) with little or no impact on cell layout area (Fig. 4). In this case, all of the cells in a word that have a 1 ( ) stored at node QZ, will gradually pull the P3 FGND up towards via the leakage path. When words are not accessed, FGND will self bias to near the P-arrow point of Fig. 9, permitting leakage power to drop by 2× to 7×, depending upon the properties of the P3 transistor (the SS corner leaks much less than the FF corner).

By connecting FGND to a common charge pool with voltage, , this charge pool can be used to dynamically switch active FGND wires and precharge BIT lines. Fig. 10 shows one way to switch an FGND line with N-memory cells (P3 connections) on it (e.g., 1 word of 32 bits) to the charge pool with M FGNDs (e.g., M words), when in a standby state (Write Pulse ). An n-channel transistor can be used to “subsidize” the charge pool to guarantee that the voltage will not drop below a safe value during a precharge operation. The subsidization device is not necessary for FGND only predischarge.

Fig. 10. Connecting FGND to charge pool at .

To see how leakage power can be harnessed for precharging BIT lines, consider Fig. 11. With Np off and no subsidization (Ns off), voltage will be at a steady state similar to point V in Fig. 9. When Np is turned on, if starts at 0 V, it will rise to a voltage that depends on the relative sizes of Cp and Cb, as well as the pull-up strength of both the P-leakers and the Ns subsidize device. The contribution of transistor Ns depends on its width and its threshold. For an HS nMOS as in Table I, it starts to contribute to below 0.95 V.

Fig. 11. Circuit for demonstrating leakage-based precharge. For experimental purposes, set 18 fF, Cb, , , .

Fig. 12 shows some specific results for the circuit of Fig. 11. Voltage was initialized to 0.9 V and to 0 V. Cb was charged to 0.75 V over 0.6 ns out of a period of 2 ns. is required to end up no lower than 0.9 V after 2 ns in order to be able to sustain the precharge operation for multiple cycles.

Precharge power with no P-leakers is normalized to 1 (subsidization only). The top trace of Fig. 12 shows power as P-leaker pairs are added without using them for precharge. The bottom trace shows P-leaker power only. The top trace is simply the sum of the precharge power plus the leakage power. The middle trace, “Together,” shows power with both P-leakers and subsidization device contributing charge, as in Fig. 11. Up to about 32-K P-pairs (64 Kb) the leakage power has dropped to a negligible level. At 128-K P-pairs (256 Kb) the leakage power portion (in “Together”) is 37% of the “Leak Only” value. Starting at 48-K pairs, the subsidization device is not necessary to maintain at or above 0.9 V with precharged to 0.75 V. The trace “Nopullup” (Ns off) shows further power reduction. Leakage has been reduced to 27% of the “Leak Only” value. Total power with 128 Kb and “Nopullup” is about half of the separated precharge and leakage power.

Fig. 12. Several combinations of leakage and precharge power.

Fig. 12 was generated with FF (fast-fast) process parameters and a 50–50 split of 0's and 1's. Under these conditions leakage may be higher than under typical or slow conditions. Because the subsidization device automatically turns on when drops, the circuit of Fig. 11 will work effectively under all leakage conditions. In a more advanced technology where leakage is higher, the previously mentioned techniques should also work very well. For any technology, the subsidization device Ns, should be chosen to compensate for fluctuations in leakage.

The strongest leakers are the N0 and P1 devices of Figs. 8 and 9 because they leak over a much wider voltage range than N1 and P0. Thus, even if there are over 90% cells with (N1 case), (P0 case), the other 10% will still maintain and steady-state values within 0.1 V of the arrows in Fig. 9. With inverted BIT lines, this is actually 90% data . Under such skewed conditions, more subsidization will be used for BIT line precharge.
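The way the average WAcell leakage quoted in this section (72% for balanced data, 53% for favorably skewed data) depends on the stored data mix can be sketched as a data-weighted mean of two per-state leakage levels. The per-state weights below are hypothetical placeholders rather than Table III values, and the function name and the 70/30 split are assumptions chosen only to show the trend.

```python
def average_leakage(frac_low_state: float, leak_low: float, leak_high: float) -> float:
    """Mean cell leakage (relative to STDcell) when `frac_low_state` of the
    cells sit in the low-leakage state and the rest in the high-leakage state."""
    if not 0.0 <= frac_low_state <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return frac_low_state * leak_low + (1.0 - frac_low_state) * leak_high


# Hypothetical per-state leakage relative to STDcell (NOT taken from Table III).
LEAK_LOW, LEAK_HIGH = 0.3, 1.2

balanced = average_leakage(0.5, LEAK_LOW, LEAK_HIGH)  # equal numbers of 0's and 1's
skewed = average_leakage(0.7, LEAK_LOW, LEAK_HIGH)    # polarity chosen to favor the low-leakage state
print(f"balanced: {balanced:.2f}, skewed: {skewed:.2f}")
```

Because the cell is asymmetric, choosing the BIT polarity so that the dominant data value maps to the low-leakage state lowers the mean, which is the effect the text describes.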

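The statement that the precharged BIT-line voltage depends on the relative sizes of Cp and Cb can be illustrated with the ideal charge-sharing relation. This is a first-order sketch under stated assumptions: it ignores the P-leakers and the Ns subsidize device entirely, the function names are ours, and the bit-line capacitance is an assumed illustrative value. Only the 0.9 V pool voltage and the 0.75 V target come from the text.

```python
def shared_voltage(c_pool: float, v_pool: float, c_bit: float, v_bit: float = 0.0) -> float:
    """Common voltage after two capacitors are connected (charge is conserved)."""
    return (c_pool * v_pool + c_bit * v_bit) / (c_pool + c_bit)


def min_pool_capacitance(c_bit: float, v_pool: float, v_target: float) -> float:
    """Pool capacitance needed to lift a discharged bit line to v_target by sharing alone."""
    if not 0.0 < v_target < v_pool:
        raise ValueError("need 0 < v_target < v_pool")
    return c_bit * v_target / (v_pool - v_target)


C_BIT_FF = 18.0               # assumed bit-line capacitance in fF (illustrative)
V_POOL, V_TARGET = 0.9, 0.75  # pool start voltage and bit-line target from the text

c_pool_ff = min_pool_capacitance(C_BIT_FF, V_POOL, V_TARGET)
v_final = shared_voltage(c_pool_ff, V_POOL, C_BIT_FF)
print(f"pool of about {c_pool_ff:.0f} fF gives a shared voltage of {v_final:.2f} V")
```

In this idealized picture the pool itself also ends at the shared voltage, so something must restore it before the next cycle. In the text that role is played by the P-leakers and, when needed, the Ns device.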

IV. WACELL WRITE EFFECTIVENESS

During an STDcell WRITE operation, differential BIT lines are driven to complementary data values, so one side of the symmetrical cell can be forced to zero. With WAcell, the data value (or its complement) must be driven onto a single bit line. When this value is zero ( ), there is no problem flipping the state of a memory cell without assistance. When it is one ( ), some form of WRITE assistance is required. One type of WRITE assistance is to skew the layout lengths and widths until WRITE becomes stable [7]. For example, the cell in Fig. 2 requires a fairly long channel for N2 so it can be overpowered by N3. When reading and writing both occur through the same port, it is difficult to size transistors to account for all process parameter variations, including , without losing some noise margin.

With the technique shown in Fig. 3, transistor P3 is activated by WAZ, and pulls node QZ (which is at for this case) down by a predetermined amount. This weakens transistor N2, thereby permitting node Q to be easily pulled up above the WRITE threshold by transistor N3. Fig. 13 shows how cell flip-time varies with the strength of N3. These were measured from 50% of WRITE select to the point where . Fig. 13 also shows how “strong” the N3 transistor is relative to the minimum required to trip the cell.

Fig. 13. Write-1 (W1) effectiveness as a function of the strength of N3. Beta N ; Beta P . Without P3, Beta N3 has to be to flip Q from 0 to 1. SS flip time with a Beta N3 of 1.23 is 0.1 ns.

Another sensitivity of interest is the tolerance to variations in . Fig. 14 shows variations in cell WRITE time as is reduced. Tolerance well beyond the normal 10% is possible. This means that could be lowered as an additional viable method to conserve power. Other SEIO cells, e.g., Fig. 2, are not so tolerant to reductions in . This is also shown in Fig. 14.

Fig. 14. Write-1 performance variation with reduced . Beta N .

WRITE power depends upon many choices from the cell level up to the system level. Since this paper's focus is mainly cell oriented, only limited system level details are included. One important point is that local-BIT line partitioning is facilitated by the global-BIT line that runs in parallel to the local-BIT line through the WAcell. Thus, local-BIT lines in a WAcell memory may be kept short relative to typical STDcell memories, cf. Fig. 15. There are typically 2–8 times as many STDcells on local-BIT lines as WAcells on local-BIT lines. Shorter local-BIT lines help to speed up the READ operation and reduce active power consumption.

Fig. 15. BIT-line structure of (A) WAcell and (B) STDcell. Sense/WRITE (precharge) circuits interface GBIT to local-BIT lines. Global I/O connects to the next level.

Fig. 16. WAcell local-BIT line setup for writing. Data is driven onto GBIT by the WRITE driver, then onto BIT through the transmission gate. BIT is parked at voltage at the end of a cycle. Sensing and precharge are not used during a WRITE. Precharge can be done through the transmission gate during a READ.

Since STDcell (Fig. 1) has complementary BIT lines, it is assumed for comparison that one BIT line must always be driven to during a WRITE operation. Also, both BIT lines get precharged to before either a READ or a WRITE (but one of them will already be at ). WAcell has a local-BIT line that operates in conjunction with the global-BIT line (GBIT), as shown in Fig. 16. There are some differences from STDcell that follow:

It is not necessary to precharge WAcell's BIT line before writing. The correct data value may be directly written onto BIT through a transmission gate (to GBIT). According to [3], [15], BIT line leakage is minimized if they are permitted to “self-bias” to a steady state. As described in Section III, considerable additional power can be saved if BIT lines are “parked” to a common voltage , as shown in Fig. 16.

GBIT lines can be directly driven to their correct starting voltage from the final voltage at the end of their last active cycle. When writing a 1, GBIT will remain unchanged if the last value was also a 1. Similarly when writing a 0, GBIT will remain unchanged if the last value was a 0.

WAcell BIT wires do not couple with adjacent cell BIT wires because they are shielded by power lines on the boundary between cells. BIT wires do couple with GBIT wires. Both move in the same direction during READs and WRITEs, so the coupling is not destructive. References [3] and [15] do not discuss wire coupling effects on floating BIT lines. Parking helps to prevent spurious voltage excursions on floating BIT lines.

WAcell has a small dc power component during a WRITE, due to the pull-down of transistor P1 by transistor P3 (Fig. 3). To limit this power feature, WAZ and FGND should be activated with a short pulse. Some timing techniques are discussed in [13] and [14].

The “Park” transistor in Fig. 16, connects a BIT line to the charge pool. This pool extends to the other three local-BIT lines that are also parked. Thus, any charge that is pooled to has already been paid for powerwise, but may be reused to reduce or eliminate N-leakage power. In addition, if the voltage rises because charge is supplied faster than it is removed by N0 leakers, the cost of future precharging is also reduced because the BIT line voltage swing is reduced. This is demonstrated by the write-1 (W1) sequence in rows 4–6 of Table IV (and is discussed in the next few paragraphs). The extent of the and charge pools is a system level consideration. For example, if several adjacent BIT lines are pooled together, some will be adding charge to the pools while others are removing charge from the pools.

TABLE IV
NORMALIZED WRITE POWER FOR VARIOUS W0 AND W1 OPERATIONS RELATIVE TO STDCELL

Table IV shows relative WRITE power for STDcell and WAcell for various initial conditions and data values. It is assumed that there is a single “column” of 128 STDcells per local BIT line in the conventional architecture and four sets of 32 WAcells per global-BIT line in the partitioned architecture. GBIT is long enough to span three sets of 32-WAcells, as shown in Fig. 15. BIT wire capacitance per micron was taken to be the same for both cells (STD and WA), as they both have the same pitch (1.3 µm) and wire density (three wires per cell). The power is for leakage (50% 0/1), bus, and cell state change over a 1.5-ns interval with a WAZ and FGND pulse of 0.2 ns (increasing this pulse to 0.3 ns affects WRITE power by less than 2%).

STDcell architecture uses the same amount of power for writing 0 or 1, which comes from two BIT line transitions, plus STDcell state change power, plus leakage power (64 pairs of N-leakers per BIT line). It is normalized to 1. Leakage contributes about 6%. The lowest partitioned architecture power is for W0 when GBIT, and BIT are already near 0. This is less than 1/16th of the standard architecture power. W1 uses the most power when GBIT and BIT are initially near 0, but this is still less than half of STDcell.
The other values are near one-third and one-fourth of the standard architecture power. Table IV rows 2, and 46 show that W1 power reduces to less than 1/12th of the standard power when a sequence of 1s are written (approaching W0 performance). This is because the voltage gradually rises after each park operation ( column). As previously mentioned, computer caches are known to WRITE more 0s than 1s to memory [4], [18], [20] (e.g., 70%), so the polarity of data on BIT and GBIT should be chosen to take advantage of this. Writing a sequence 110110 uses an average power of 27.6% of the standard power while the se- quence 001001 uses 27.8% of the standard WRITE power. This implies that BIT line polarity should be the complement of


Fig. 17. Worst case BIT line leakage.

Fig. 18. READ-sense circuit with precharge and select. GBIT spans 96 WAcells as shown in Fig. 15(a). An output circuit connects to GBIT.

the data, as was also found in Section III. Stronger arguments for this are presented under READ conditions in Section V.

In [18], an extra dedicated WRITE port was added to the STDcell. For cases when their WRITE-BIT line did not require a transition, power consumption was less than 1/30th of the STDcell power (350-nm CMOS). This is better than row 2, column W0 in Table IV, mainly because of the larger voltage swings in 350-nm CMOS. With a WRITE-BIT line transition, power was over 94.5% of the STDcell power in [18]. This is higher than similar cases in Table IV.
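The wire assumptions behind the Table IV comparison can be checked with simple arithmetic. The following is only a sketch: the cell counts and the 1.3 µm pitch come from the text, but treating that pitch as the cell dimension along the BIT line is an assumption made here, and only wire capacitance is considered (device capacitance on the lines is ignored). The names `CELL_PITCH_UM` and `wire_length_um` are illustrative.

```python
# Sketch only: cell counts and pitch are taken from the text; treating the
# 1.3 um pitch as the cell dimension along the BIT line is an assumption.
CELL_PITCH_UM = 1.3


def wire_length_um(cells_spanned, pitch_um=CELL_PITCH_UM):
    """Length of a wire that runs past `cells_spanned` cells."""
    return cells_spanned * pitch_um


std_bit = wire_length_um(128)    # conventional local BIT line (128 STDcells)
local_bit = wire_length_um(32)   # partitioned local BIT line (32 WAcells)
gbit = wire_length_um(3 * 32)    # GBIT spans three sets of 32 WAcells

# With equal wire capacitance per micron, capacitance ratios equal length ratios.
local_ratio = local_bit / std_bit
gbit_ratio = gbit / std_bit

print(f"local BIT / STD BIT wire capacitance: {local_ratio:.2f}")
print(f"GBIT / STD BIT wire capacitance:      {gbit_ratio:.2f}")
```

Under these assumptions a local BIT line carries about one quarter of the wire capacitance of a 128-cell BIT line, and GBIT about three quarters, which is one reason the single-ended partitioned arrangement can spend less power per transition.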

V. WACELL READ EFFECTIVENESS

In advanced technologies, BIT line leakage due to nonaccessed cells can cause BIT lines to drift low after precharge [1]–[3], [11]. This is an asymmetric phenomenon, where many N3 (N4) transistors with across source and drain (N0-leakers) can pull a BIT line down from a full precharge of . This is more detrimental to differential bus sensing methods than to single-ended bus sensing methods [11]. There is a lower limit to this leakage defined by Fig. 17. In this case, a single SRAM cell with is selected. The memory cell's N-pass device will hold the bus at an intermediate voltage value. If all of the other cells on the bit line are leaking to (perhaps 31 or 63 of them) the resulting voltage will be slightly less than the nMOS pull-up value ( ). One can take advantage of this and precharge to a similar reference voltage. With fewer cells on a local-BIT line (e.g., 32), pull-down leakage is correspondingly less.

Fig. 18 shows a possible READ-sense circuit. To operate with a suitable switching point, the inverter structure and feedback mechanism need to be appropriately sized. Also, devices with different thresholds can be used, for example, to increase speed. For comparison, [11] shows both dynamic and static NAND-style sense amps for a full precharge, [13] shows a latch-technique for differential sensing, and precharge,


Fig. 19. STDcell READ-sense circuit for comparison. A BIT line swings up during precharge and down during a READ. It is assumed that there are half 0s and half 1s for leakage purposes.


TABLE V

NORMALIZED READ POWER FOR VARIOUS CONFIGURATIONS, R0, AND R1 RELATIVE TO STDCELL


while [12] shows a current mode sense amp for a lower voltage precharge. To make use of the previously mentioned precharge scheme, the BIT line pull-up (Pre) could be directly to through either an nMOS or a pMOS device, or indirectly to via a transmission gate to GBIT, as in Fig. 16.

READ simulations were performed to compare the circuit shown in Fig. 18 with the more standard circuit shown in Fig. 19, cf. [16] and [17]. Table V shows various power results. The STDcell READ was limited to a voltage swing of , and normalized to 1 (this was 54% of the STDcell WRITE power). The worst case WAcell READ (R0) is 82% of the STDcell READ (GBIT initially 0, and V). Reading a 1 starting with GBIT at 0.6 uses only 23% of the STDcell READ. Reading a sequence of 1s, as in rows 2 and 4–6, shows that READ power gradually drops to 1/16th or less of the standard architecture power due to the increased voltage (voltage swings on the local-BIT line as well as GBIT are reduced). Reading a series “110110” uses an average power of only 34% of the standard, while the series “001001” uses 62% of the standard value. Thus, cache-like situations where there are more 0s than 1s should again store the complement of the data for minimal power. Reading alternating values “101010” uses 48% of the standard architecture power. Mixing READ and WRITE in the sequence w1r1r0r1w1r0r1r1w0 uses 35% of the standard architecture power.

Word select to output time for Fig. 18 is 0.55 ns, while for Fig. 19 it is 0.50 ns. The sense circuit in Fig. 19 has the advantage of needing only a threshold drop from before


Fig. 20. Stability of READ reduces gradually as Beta N3 increases.

being turned on. However, aggressive sense timing can lead to more sense power.
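The polarity argument in this section can be illustrated with a rough estimate. The sketch below takes the three reported READ figures (34% for “110110”, 62% for “001001”, 48% for “101010”) and assumes, purely for illustration, that relative READ power varies linearly with the fraction of 1s read; the three reported points happen to lie on such a line, but the paper does not state this model, and values outside the one-third to two-thirds range are extrapolations. The function names are hypothetical.

```python
# Reported in the text: relative READ power (percent of the STDcell READ).
REPORTED = {"110110": 34.0, "001001": 62.0, "101010": 48.0}


def fraction_of_ones(bits):
    """Fraction of '1' characters in a bit string."""
    return bits.count("1") / len(bits)


def read_power_pct(frac_ones):
    """Assumed linear model through the '001001' and '110110' data points."""
    x0, y0 = fraction_of_ones("001001"), REPORTED["001001"]
    x1, y1 = fraction_of_ones("110110"), REPORTED["110110"]
    return y0 + (y1 - y0) * (frac_ones - x0) / (x1 - x0)


def best_polarity(frac_zeros_in_data):
    """Compare storing the data as-is against storing its complement."""
    as_is = read_power_pct(1.0 - frac_zeros_in_data)
    complemented = read_power_pct(frac_zeros_in_data)
    choice = "complement" if complemented < as_is else "as-is"
    return choice, as_is, complemented


# Example: cache-like data with 70% zeros (a slight extrapolation of the model).
choice, as_is, complemented = best_polarity(0.7)
print(f"store {choice}: as-is ~{as_is:.1f}%, complemented ~{complemented:.1f}%")
```

Within this assumed model, data that is mostly 0s is cheaper to read when stored complemented, which matches the recommendation in the text.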

VI. READ CELL STABILITY

During a memory READ operation, BIT lines are precharged to an initial state, which is traditionally , but could be as low as . After precharge, a memory cell is engaged to either pull a BIT line down, or hold it close to (possibly raising it above) its initial value. When the state of node Q is initially at , the READ operation causes a maximum disturbance voltage on Q, . If is too high ( ) the cell state changes spuriously from 0 to 1. The ability of a cell to withstand this operation could be defined as . Fig. 20 shows how READ stability is affected by varying Beta N3, and hence, . The worst case process corner for READ stability was fast-N, fast-P (FF). READ stability increases a small amount if a lower precharge voltage is used [7]. A good choice for Beta N3 would lie between 1.2 and 1.4 in Fig. 20. Other authors have used , where is the required amount of test current forced through N2, injected at node Q, to change the state of the cell. is the maximum READ current through a pass device (N3) [3]. This is also shown in Fig. 20. The ratio tracks in a similar way to .

VII. CONCLUSION

A new 6T SRAM cell with a single BIT line architecture has been presented, featuring several power saving techniques. Standby leakage power can be cut in half (depending upon the distribution of 0s and 1s) compared with the STDcell. Large savings (2–7×) are possible (for either cell) when BIT or FGND are near the optimal points shown in Fig. 9, rather than being clamped at a high (low for FGND) precharge voltage.

Active leakage power may be reduced to less than 27% of the standard architecture by pooling BIT line charge. This depends on process parameters as well as memory size. The pool receives BIT line charge after a cycle (parking), while the pool is used to precharge BIT lines for reading. Standby leakage is also reduced by this technique, as the and time constants could be hundreds to thousands of clock cycles. Thus, after a burst of activity, whatever increased voltage is left on (or decreased voltage on ) has already been paid for powerwise, so standby leakage continues to be reduced as and decay back to their self-biased value.


The combination of partitioned BIT line, charge pooling, and single BIT line can cut WRITE power to between one-half and 1/16th of the standard architecture, with typical values around one-fourth. READ power is similarly reduced, with typical values of one-third of the standard architecture.

Local-BIT line parking could increase memory access time a small amount, as this is initiated at the end of a cycle where precharge is traditionally located. However, precharge should only be done on an as-needed basis at the beginning of a READ cycle to avoid high clamped BIT line leakage power. Relocating precharge should leave adequate time for parking. Since GBIT lines are not parked, some timing overlap is possible.

Future work could investigate optimal ways to join and pools together across multiple memory blocks. For example, larger values reduce READ/WRITE power but joining many memory blocks together will reduce the voltage increments because the capacitance also increases. If leakage power reduction is high enough, joining multiple memory blocks together will be beneficial. As on-chip memory size increases into multimegabits, one could look beyond the memory for opportunities to utilize leakage power. Any use of the and pools for functions that would otherwise get power straight from and flush it to , could help cut leakage power (e.g., precharged buses).

Extensive transistor threshold adjustments can be made to speed up a memory access, or further reduce leakage power [1]–[3]. These adjustments generally apply to WAcell as well as to STDcell.
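The remark that joining many memory blocks reduces the voltage increments can be illustrated with ideal charge sharing between two capacitors. This is a minimal sketch, not the paper's simulation: every capacitance and voltage below is a hypothetical placeholder, leakage is ignored, and the pool is modeled as a single lumped capacitor whose size scales with the number of joined blocks.

```python
def share_charge(c_line_f, v_line, c_pool_f, v_pool):
    """Common voltage after a parked BIT line is connected to a pool.

    Follows from charge conservation between two ideal capacitors.
    """
    total_charge = c_line_f * v_line + c_pool_f * v_pool
    return total_charge / (c_line_f + c_pool_f)


# Hypothetical values, for illustration only (not taken from the paper).
C_LINE_F = 10e-15             # one parked local BIT line
C_POOL_PER_BLOCK_F = 200e-15  # pool capacitance contributed by one block
V_LINE = 1.2                  # BIT line voltage before parking
V_POOL = 0.4                  # pool voltage before parking


def pool_increment(blocks_joined):
    """Rise in pool voltage caused by parking one BIT line into the pool."""
    c_pool = blocks_joined * C_POOL_PER_BLOCK_F
    return share_charge(C_LINE_F, V_LINE, c_pool, V_POOL) - V_POOL


for blocks in (1, 4, 16):
    rise_mv = pool_increment(blocks) * 1e3
    print(f"{blocks:2d} block(s) joined: pool rises by {rise_mv:.2f} mV per park")
```

In this idealized picture each park operation raises a larger pool by a smaller step, so the benefit of joining blocks depends on whether the shared pool still reaches a useful voltage, which is the trade-off raised above.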


ACKNOWLEDGMENT

The Canadian Microelectronics Corporation arranged access to 130-nm CMOS process information under nondisclosure. The CMC and the Natural Sciences and Engineering Research Council of Canada have also facilitated several generations of SRAM prototyping.

REFERENCES

[1] R. W. Mann et al., “Ultralow-power SRAM technology,” IBM J. Res. Dev., vol. 47, no. 5/6, pp. 553–566, Sep./Nov. 2003.

[2] T. B. Hook, M. Breitwisch, J. Brown, P. Cottrell, D. Hoyniak, C. Lam, and R. Mann, “Noise margin and leakage in ultra-low leakage SRAM cell design,” IEEE Trans. Electron Devices, vol. 49, no. 8, pp. 1499–1501, Aug. 2002.

[3] F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, “Analysis of dual-VT SRAM cells with full-swing single-ended bit line sensing for on-chip cache,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 2, pp. 91–95, Apr. 2002.

[4] N. Azizi, A. Moshovos, and F. Najm, “Low-leakage asymmetric-cell SRAM,” in Proc. ISLPED, 2002, pp. 48–51.

[5] H. Tran, “Demonstration of 5T SRAM and 6T dual-port RAM cell arrays,” in Proc. IEEE Symp. VLSI Circuits, 1996, pp. 68–69.

[6] C. Wang, C. Wu, R. Hwang, and C. Kao, “Single-ended SRAM with high test coverage and short test time,” IEEE J. Solid-State Circuits, vol. 35, no. 1, pp. 114–118, Jan. 2000.

[7] I. Carlson, “Design and evaluation of high density 5T SRAM cache for advanced microprocessors,” M.S. thesis, Dept. Electr. Eng., Linköping Univ., Linköping, Sweden, 2004.

[8] R. Hobson, “A compact multiport static random access memory cell,” U.S. Patent No. 5 754 468, May 19, 1998.

[9] UMC, Taiwan, “SoC solutions,” 2005 [Online]. Available: www.umc.com/English/process/b.asp

[10] R. Hobson, “Write-Assisted SRAM Bit Cell,” U.S. Patent No. 6 804 143, Oct. 12, 2004.

[11] F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, “Dual-VT SRAM cells with full-swing single-ended bit line sensing for high-performance on-chip cache in 0.13 µm technology generation,” in Proc. ISLPED, 2000, pp. 15–19.

[12] H. Kondoh, H. Yamanaka, M. Ishiwaki, Y. Matsuda, and M. Nakaya, “An efficient self-timed queue architecture,” in Proc. IEEE CICC, 1994, pp. 637–640.

[13] K. Mai, T. Mori, B. S. Amrutur, R. Ho, B. Wilburn, M. Horowitz, I. Fukushi, T. Izawa, and S. Mitarai, “Low-power SRAM design using half-swing pulse-mode techniques,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1659–1671, Nov. 1998.

[14] B. Amrutur and M. Horowitz, “Fast low-power decoders for RAMs,” IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1506–1515, Oct. 2001.

[15] S. Heo, K. Barr, M. Hampton, and K. Asanovic, “Dynamic fine-grain leakage reduction using leakage-biased bitlines,” presented at the ISCA-29, Anchorage, AK, May 2002.

[16] J. Greason, D. Buehler, J. Kolousek, Y. Ng, K. Sarkez, P. Shay, and A. Waizman, “A 4.5 Megabit, 560 MHz, 4.5 GByte/s high bandwidth SRAM,” in Proc. Symp. VLSI Circuits, 1997, pp. 15–16.

[17] R. Singh and N. Bhat, “An offset compensation technique for latch type sense amplifiers in high-speed low-power SRAMs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 6, pp. 652–657, Jun. 2004.

[18] Y. Chang, F. Lai, and C. Yang, “Zero-aware asymmetric SRAM cell for reducing cache power in writing zero,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 8, pp. 827–836, Aug. 2004.

[19] C. Kim, J. Kim, S. Mukhopadhyay, and K. Roy, “A forward body-biased low-leakage SRAM cache: Device, circuit and architecture considerations,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 3, pp. 349–357, Mar. 2005.

[20] A. Moshovos, B. Falsafi, F. Najm, and N. Azizi, “A case for asymmetric-cell cache memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 877–881, Jul. 2005.

[21] M. Kandemir, M. Irwin, G. Chen, and I. Kolcu, “Compiler-guided leakage optimization for banked scratch-pad memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 10, pp. 1136–1146, Oct. 2005.

[22] N. Kim, D. Blaauw, and T. Mudge, “Quantitative analysis and optimization techniques for on-chip cache leakage power,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 10, pp. 1147–1155, Oct. 2005.


Richard F. Hobson received the B.Sc. degree from the University of British Columbia (UBC), Vancouver, Canada, in 1967, and the Ph.D. degree from the University of Waterloo, Waterloo, ON, Canada, in 1972. He has held various appointments with the Simon Fraser University Schools of Computing Science and Engineering Science, Burnaby, BC, Canada, since 1974. His research interests involve low power memory, embedded processor design, digital signal processing, parallel systems-on-chip, and computer hardware acceleration. Challenging real-time embedded software applications are also of interest. In 1999, he co-founded Cogent ChipWare Inc., Burnaby, BC, Canada, and became Chief Technical Officer.