
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 5, MAY 2011

A Novel Column-Decoupled 8T Cell for Low-Power Differential and Domino-Based SRAM Design
Rajiv V. Joshi, Fellow, IEEE, Rouwaida Kanj, and Vinod Ramadurai
Abstract: We present a novel half-select-disturb-free 8-transistor (8T) SRAM cell. The cell is 6T based and uses decoupling logic. It employs gated-inverter SRAM cells to decouple the column-select read disturb in half-selected columns, which is one of the impediments to lowering the cell voltage. Furthermore, this design eliminates the false read-before-write operation that is common to conventional 6T designs and is caused by the timing mismatch between bit-select and wordline. Two design styles are studied to account for the emerging needs of technology scaling as designs migrate from the 90-nm to the 65-nm PD/SOI technology node: a 90-nm PD/SOI sense-amp-based design and a 65-nm PD/SOI domino-read-based design. For the sense-amp-based design, read disturbs to the fully selected cell can be further minimized by a read-assist array architecture that discharges the bitline (BL) capacitance to GND during a read operation. Together with the elimination of half-select disturbs, this enhances the low-voltage operability of the overall array and hence reduces power consumption by 20% to 30%. The domino-read-based SRAM design also exploits the proposed cell to enhance cell stability while reducing the overall power consumption by more than 30%, by combining a dynamic dual-supply technique with cell design and peripheral circuitry. Because half-selected columns/cells are inherently protected by the proposed scheme, the dynamic supply High voltage is applied only to read-selected columns/cells, while the dynamic supply Low is used in all other situations, thereby reducing the overall design power. A short bitline loading of 16 cells/BL is adopted to achieve high-performance, low-power operation and to lower the bitline capacitance for improved stability. A newly developed fast Monte Carlo-based statistical method is used to analyze this unique cell, and 65-nm design simulations are carried out at 5 GHz.
The feasibility of the cell and its sensitivity to sense-amp timing have been proved by fabricating a 32-kb array in a 90-nm PD/SOI technology. Hardware experiments and simulation results show improvements in cell Vmin over traditional 6T cells of more than 150 mV for 90-nm PD/SOI technology. Experimental results from fabricated 65-nm PD/SOI hardware (1.6 kb/site × 80 sites) also confirm half-select disturb elimination and hence the ability to enable significant power savings. The performance and speed are shown to be comparable with the conventional 6T design.

Fig. 1. SRAM cell scaling (dashed line) is limited due to process variation effect on cell yield.


Index Terms: Column-decoupled, differential/domino read, half-select, low-power 8T, SRAM, stability.

I. INTRODUCTION

DEVICE miniaturization and the rapidly growing demand for mobile or power-aware systems have resulted in an urgent need to reduce the power supply voltage (Vdd). However, voltage reduction along with device scaling is associated with decreasing signal charge. Furthermore, increasing intra-die process parameter variations, particularly random dopant threshold voltage variations, can lead to a large number of fails in memory designs with extremely small channel areas. Due to their small size and large numbers on chip, SRAM cells are adversely affected. This trend is expected to grow significantly as designs are scaled further with each technology generation [1]. In particular, it conflicts with the need to maintain a high signal-to-noise ratio, or high noise margins, in SRAMs and is one of the major impediments to producing a stable cell at low voltage. When combined with other effects such as narrow-width effects, soft error rate (SER), temperature, process variations, and parasitic transistor resistance, the scaling of SRAMs becomes increasingly difficult due to reduced margins [2]. Fig. 1 illustrates the saturation in the scaling trend (dashed line) of SRAM cells across technology generations. The plot indicates that SRAM area scaling drops below 50% for 32-nm technology and beyond. Furthermore, voltage scaling is virtually nullified: fail probabilities rise as the voltage is scaled, and low-voltage operation is becoming problematic because higher supply voltages are required to overcome these process variations. To address these challenges, recent industry trends have leaned towards exploring larger cells and more exotic SRAM circuit styles in scaled technologies. Examples are write-assist designs [3], read-modify-write [4], read-assist designs [5], and the 8T register file cell [6], [7]. Conventional 6T cells used in conjunction with these techniques do not yield power savings because they remain exposed to the half-select condition [3], [4]. Column select/half-select is very commonly used in SRAMs to provide SER protection and to enable area-efficient utilization and wiring of the macro. Nevertheless, the use of column select introduces a read disturb condition for the unselected cells along a row (half-selected cells), potentially destabilizing them.

Manuscript received September 09, 2009; revised December 10, 2009. First published March 29, 2010; current version published April 27, 2011. R. V. Joshi is with IBM T.J. Watson Research Labs, Yorktown Heights, NY 10598 USA (e-mail: rvjoshi@us.ibm.com). R. Kanj is with IBM Austin Research Labs, Austin, TX 78758 USA (e-mail: rouwaida@us.ibm.com). V. Ramadurai is with IBM Systems and Technology Group, Austin, TX 78758 USA (e-mail: ramadu@us.ibm.com). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2010.2042086

1063-8210/$26.00 © 2010 IEEE

Fig. 2. SRAM half-select stability failure. When WL rises, node R in the half-selected cell rises to an intermediate voltage, as T2 and T4 form a resistive voltage divider. This voltage increases the subthreshold leakage on T3. A high enough voltage on node R will destabilize the cell (dashed lines indicate flipping nodes).

In this paper we present a new column-decoupled 6T-based SRAM cell in which read disturb is eliminated for half-selected cells [5], [8]. The decoupling logic uses two additional devices, and henceforth we refer to the cell as the 8T column-decoupled cell (8T-CDC). We study the cell in the presence of two design styles: sense-amp-based read peripheral circuitry, which was typical for the 90-nm node, and domino read peripheral circuitry [9] for 65 nm and beyond. In a sense-amp-based read design, the read disturb condition is further minimized for the selected cells by a sense-amp architecture that actively discharges the BL of the selected cell(s) to GND, thereby eliminating the source of disturb. Through a combination of accurate simulations and hardware (HW) data acquired from a 32-kb SRAM macro, a path towards low-voltage SRAM operation of the cells is shown, and the design is shown to enhance read stability and to eliminate half-select stability problems, thereby enabling an improved Vmin.

However, process variations increasingly affect sense-amp designs in PD/SOI technologies, and it is natural to converge to domino-read designs [9]. In domino-read-based designs, the column-decoupled cell still guards against half-select cell disturbs. However, because domino designs lack the read-assist feature, we need to account for the read disturb on fully selected cells. For this, we propose a dynamic dual-supply header design that leverages the benefits of the column-decoupled cell design and helps save power. As with traditional dual-supply techniques, the proposed header design maintains a separate cell supply (Vcs) and logic supply (Vdd). However, unlike traditional techniques, the dynamic cell supply changes based on the column selection status.
The new header design 1) sets the selected cell columns at a supply voltage higher than the logic supply for improved read stability, and 2) maintains a low supply for half-selected cells, since half-select disturbs are not an issue for this design. Hence, we rely on the column-decoupled cell to enable a simplified, low-power, high-performance column-decoupled domino-read-based design. We implement the design using simplified bit-select logic and dynamic supply headers with shorter bitlines. In what follows, we provide a thorough analysis of the design modifications compared with traditional 6T dynamic supply designs. We also highlight the advantages this methodology brings in terms of lower power and yield improvements.

This paper is organized as follows. Section II provides a review of column-select disturbs. The cell is introduced in Section III. In Section IV, we study the sense-amplifier-based design, and in Section V we present yield, area, and power simulations and hardware results for the sense-amp-based design. Domino-based designs for 65-nm technology are studied in Section VI. Analysis and results for the dynamic domino-based design are presented in Section VII. Conclusions are presented in Section VIII.

II. BACKGROUND: COLUMN SELECT (HALF-SELECT) AND MEMORY DESIGNS

Fig. 2 shows a typical array topology that employs a two-way column select. In this topology, the wordline (WL) activates both the selected and half-selected cells along the decoded row. However, only the read/write data from the selected cell is allowed to pass to/from the peripheral logic, while the half-selected cell is isolated. When the wordline is activated during a selected read or a half-select condition, the pass-gate (PG) transistor and the pull-down (PD) device (transistors T2 and T4 in Fig. 2) form a resistive voltage divider between the BL and the storage node of the cell. This causes the "0" node of the cell (node R in Fig. 2) to bump up to some intermediate voltage, which subsequently increases subthreshold leakage (on transistor T3 in Fig. 2) and discharges the "1" node of the cell, thereby destabilizing it. The read disturb to the selected cell keeps diminishing as the cell read current discharges the BL capacitance. The half-selected cell sees the maximum disturb potential when the BLs are clamped to Vdd. This is why there has been an industry-wide trend towards shorter BL heights, thin cell designs, and unclamping (floating) the bitlines of half-selected cells [2], [9] for 65 nm and beyond; in prior technologies, such as 90 nm, the general trend was to use clamped bitlines. If the area under the storage-node disturb voltage curve is used as a measure of the read disturb witnessed by the cell, unclamping the BLs results in a mere 12% reduction relative to clamping for 128 cells/BL. While this benefit increases to 25% for 32 cells/BL, a significant area penalty is paid to achieve it. In the following section, we present an 8T-CDC that achieves a larger reduction of the area under the disturb curve for the half-selected cell. This design, together with a special sensing technique or special dynamic headers, can lead to improved yield for both selected and half-selected cells and lower operating voltages for the overall design.

Fig. 3. Column-select decoupled 8T-CDC cell (in dashed rectangle) eliminates the half-select condition. The selected column's LWLE0 is high. The half-selected column's LWLE1 stays low because GWLE is ANDed with BDC1.

III. 8T COLUMN DECOUPLED CELL

A. Proposed 8T-CDC

Fig. 3 illustrates a new 8T-CDC SRAM cell (inside the dashed rectangle) with a gated wordline that decouples the column/half-select condition [5], hence eliminating half-select stability fails. A localized gated inverter consisting of two additional transistors, T1 and T2, effectively performs a logical AND operation between the column select signal (BDT0) and the decoded row, or global wordline, GWLE. The output of the inverter is the local wordline signal (LWLE0). The local wordline is ON only when both the column and the row are selected (i.e., for fully selected cells only); hence, as illustrated in the waveforms of Fig. 3, LWLE0 of the selected column turns ON while LWLE1 of the half-selected column remains low.
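To make the gating concrete, the following is a minimal Python sketch, not the authors' circuit, of one decoded row with two-way column select. It models the local wordline as LWLE = GWLE AND column select for the 8T-CDC and as the shared row wordline for the 6T, and it estimates the storage-node bump of every cell whose wordline turns on with the first-order PG/PD resistive divider described in Section II. The class and function names and the resistance values are hypothetical placeholders, not extracted device data.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    column_select: bool  # bit-select signal for this column (hypothetical model of BDT/BDC)


def local_wordline(gwle: bool, col: Column, decoupled: bool) -> bool:
    """Wordline seen by the cell in this column.

    6T: the row wordline is shared, so every cell on the row is opened.
    8T-CDC: the gated inverter acts as AND(GWLE, column select).
    """
    return (gwle and col.column_select) if decoupled else gwle


def node_bump(vbl: float, r_pg: float, r_pd: float) -> float:
    """First-order DC voltage on the '0' storage node: V_BL * R_PD / (R_PG + R_PD)."""
    return vbl * r_pd / (r_pg + r_pd)


def disturbed_cells(columns, gwle, decoupled, vbl=1.0, r_pg=20e3, r_pd=5e3):
    """Return {column name: bump voltage} for cells whose wordline turns on.

    r_pg and r_pd are placeholder on-resistances (PD stronger than PG).
    """
    result = {}
    for col in columns:
        if local_wordline(gwle, col, decoupled):
            result[col.name] = node_bump(vbl, r_pg, r_pd)
    return result


# Two-way column select: col0 is selected, col1 is half-selected.
row = [Column("col0", column_select=True), Column("col1", column_select=False)]
print("6T    :", disturbed_cells(row, gwle=True, decoupled=False))
print("8T-CDC:", disturbed_cells(row, gwle=True, decoupled=True))
```

In the 6T case both the selected and the half-selected cell see the node bump; in the 8T-CDC case only the selected cell does, which is the disturb that the read-assist and dynamic-supply techniques in later sections address.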
This ensures that only the local wordline of the selected cells is activated, thereby effectively protecting the half-selected SRAM cells from the read disturb scenario that exists in the 6T cell due to wordline sharing. Alternatively, it is possible to swap the input and supply pairs of the gated inverter; however, this comes at the cost of an extra delay stage and extra power.

The advantages of the 8T-CDC cell are as follows: 1) it conforms with traditional 6T requirements in terms of (a) allowing the designer to integrate it in a column-select fashion and (b) offering/maintaining SER protection, while 2) maximizing array efficiency, 3) eliminating the read disturb to the unselected cells, and 4) reducing power through simplification of the peripheral logic.

Fig. 4(a) shows a layout view of the 8T column-decoupled cell in a 90-nm PD/SOI technology. The two extra devices are integrated on top of an existing 6T cell to allow for easy cell mirroring and integration into an array topology. The addition of the two new transistors results in a cell area increase of 40%, all in one dimension. Because the column decode (BDC) signal is wired in higher-level metal, the orthogonal cell dimension was not impacted. The increase in the cell dimension along the bitline causes a proportionate increase in the BL metal capacitance, while the original diffusion capacitance contributed by the 6T cell is maintained. The area penalty can be further reduced to 30% by 6T thin-cell integration, as in Fig. 4(b), without degrading the bitline capacitance; further reduction can be achieved by using non-DRC-clean devices. Fig. 4(b) and (c) present the front-end-of-line (FEOL) and back-end-of-line (BEOL) layout views of a 2 × 2 8T-CDC thin-cell array and illustrate how the recessed oxide (ROX) and power buses are shared.

B. Timing Advantages: Elimination of False Read Before Write

During the write operation in a conventional 6T SRAM, when the wordline rises ahead of the column select, the cell starts reading out its data [8]. When the bitline droops, a false read before write occurs [see Fig. 5(a)].
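The layout discussion implies that only the metal part of the bitline capacitance grows with the taller cell, while the diffusion part stays at its 6T value. The Python sketch below illustrates that split as a first-order model. The per-cell capacitance values are hypothetical placeholders chosen only to show the arithmetic, and the 1.4 height factor assumes the 40% area growth lies entirely along the bitline.

```python
def bitline_capacitance(cells_per_bl, c_diff_per_cell, c_metal_per_cell, height_scale=1.0):
    """First-order BL capacitance: diffusion term fixed, metal term scales with cell height."""
    return cells_per_bl * (c_diff_per_cell + c_metal_per_cell * height_scale)


C_DIFF = 0.10e-15   # hypothetical diffusion capacitance per cell (F)
C_METAL = 0.08e-15  # hypothetical BL metal capacitance per cell (F)

for n in (32, 128):
    c6 = bitline_capacitance(n, C_DIFF, C_METAL)                    # baseline 6T cell
    c8 = bitline_capacitance(n, C_DIFF, C_METAL, height_scale=1.4)  # 40% taller 8T-CDC
    print(f"{n} cells/BL: 6T {c6 * 1e15:.2f} fF, 8T-CDC {c8 * 1e15:.2f} fF, ratio {c8 / c6:.3f}")
```

Because the diffusion term does not scale, the total bitline capacitance in this model grows by less than the 40% cell-area increase; the actual split depends on the technology and layout.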
This is a disadvantage of the conventional 6T SRAM. This drawback is overcome by the proposed technique, as illustrated in Fig. 5(b): if the wordline arrives earlier than the column select, it is gated by the column select, and thus a false read before write does not ripple through the bitlines to the evaluation logic.

Fig. 4. Layout view of the new 8T-CDC SRAM cell: (a) typical cell, (b) 2 × 2 thin-cell front-end-of-line layout view, and (c) back-end-of-line 2 × 2 layout view showing ROX and GND sharing.

Fig. 5. (a) For a conventional 6T SRAM during write, when the wordline rises ahead of the column select, the cell starts reading out its data [8]. When the bitline droops, a false read before write occurs. (b) This drawback is overcome by the 8T-CDC cell; the early wordline (GWLE, dashed) is gated by the column select, and thus a false read before write does not happen.
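The timing argument reduces to a small interval calculation. The Python sketch below is a hypothetical model, not the authors' timing analysis. It treats a cell as driving its bitlines while its (local) wordline is high and computes how long the cell is connected before the column select arrives. For the 6T the wordline edge alone opens the pass gates, while for the 8T-CDC the local wordline cannot rise before the column select. Times are in arbitrary units.

```python
def false_read_window(t_wl_rise: float, t_col_select: float, decoupled: bool) -> float:
    """Time during which the cell drives the bitlines before column select arrives.

    6T: pass gates open at the wordline edge, so an early wordline exposes
        the cell until the column select (and the write data) arrive.
    8T-CDC: the local wordline is AND(GWLE, column select), so it rises no
        earlier than the column select.
    """
    t_cell_on = max(t_wl_rise, t_col_select) if decoupled else t_wl_rise
    return max(0.0, t_col_select - t_cell_on)


for decoupled in (False, True):
    w = false_read_window(t_wl_rise=10.0, t_col_select=25.0, decoupled=decoupled)
    print("8T-CDC" if decoupled else "6T    ", "false-read window:", w)
```

A positive window means the cell can begin drooping a bitline, as in Fig. 5(a), before the write data arrives; in the decoupled case the window is zero by construction, whatever the wordline skew.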


Fig. 6. Wordline logic (a) for the traditional 6T cell and (b) for the 8T-CDC cell. The wordline driver (inverting function) is eliminated because it is already accounted for inside the gated cells.

C. Logic and Circuit Requirements

The difference between the 8T-CDC and 6T array designs can be highlighted in terms of distributed versus global wordline drivers. For the 8T-CDC cell, the wordline driver (inverting function) is eliminated because it is already accounted for inside the gated cells. This helps reduce the area overhead, as discussed in detail later. Fig. 6(a) and (b) illustrate the wordline logic for the 6T design and the 8T-CDC design, respectively. The transistor sizes of the gated inverters are comparable to those of the SRAM cell. In the presence of large distributed loads, it may be desirable to optimize the NAND gate nFETs further. This is feasible with minimal area penalty because the gated inverter sizes leave room for NAND gate optimization. Furthermore, the gated inverters improve the local wordline slews, as opposed to conventional 6T wordline drivers with large pass-gate loads, where the slew rates are wire limited.

IV. SENSE AMP BASED DESIGN

The 8T-CDC cell together with read-assist sense amp designs [5] can mitigate the read disturb problem for both selected and half-selected cells.

A. Read-Assist Sense Amp-Based Design

Fig. 7 illustrates the 8T-CDC cell design combined with a read-assist sense amp. The sense amplifier is shared among multiple columns. In a typical sense amp scenario, the bit switch (BDC) and the WL of the selected cells' columns are turned off once enough margin has developed for the sense amplifier to accurately resolve the BL differential. This is done to save ac power (it prevents discharge of the BL to GND) and to speed up the sense time (smaller capacitance for the sense amplifier to discharge). In this scenario, only the PFET bit switch exists (solid PFET in Fig. 7), and it is turned off during sense to save power and perform a faster sense. In a read-assist scenario, the bit-switch PFET is converted to a complementary NFET and PFET bit-switch pair (dashed line). The pair is kept conducting during the entire WL active phase.
Consequently, the sense-amp and the cell discharge the BL completely during a sense-read operation [5].
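A rough first-order view of the ac cost of read-assist: restoring a bitline that swung by ΔV draws about C_BL·Vdd·ΔV from the supply. The Python sketch below compares a hypothetical bank in which every column on the accessed row develops a partial swing (shared 6T wordline) with one in which only the 1-of-m selected columns are fully discharged (8T-CDC with read-assist). The 1024-column, mux-8 organization is the one quoted in Section V for the 32-kb macro. The 10% swing, the capacitance, and the function name are placeholders, and the model ignores sense-amp, driver, and leakage power.

```python
def bl_restore_energy(c_bl: float, vdd: float, swing_fraction: float, n_columns: int) -> float:
    """Energy drawn from Vdd to precharge n_columns bitlines back after a swing.

    First-order: E = C_BL * Vdd * dV per bitline, with dV = swing_fraction * Vdd.
    """
    return n_columns * c_bl * vdd * (swing_fraction * vdd)


C_BL = 50e-15  # hypothetical bitline capacitance (F)
VDD = 1.0      # V
N_COLS = 1024  # bitline columns (32 WL x 1024 BL organization)
MUX = 8        # column-decode factor

e_6t = bl_restore_energy(C_BL, VDD, swing_fraction=0.10, n_columns=N_COLS)
e_cdc = bl_restore_energy(C_BL, VDD, swing_fraction=1.0, n_columns=N_COLS // MUX)
print(f"columns discharged with read-assist: {N_COLS // MUX}")
print(f"6T (all columns, 10% swing): {e_6t * 1e12:.2f} pJ")
print(f"8T-CDC (1/{MUX} columns, full swing): {e_cdc * 1e12:.2f} pJ")
```

In this model the break-even 6T swing is Vdd/m (12.5% for mux 8), which shows why fully discharging only one of every m columns keeps the read-assist penalty bounded. The measured comparison for the actual design is the simulated data in Table II.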

Fig. 7. Gated 8T-CDC cell design combined with read-assist sense amp [5]. In a typical scenario, the PFET bit switch is turned off during sense to save power and perform a faster sense. In a read-assist scenario, the true/complement NFET and PFET bit-switch pair (dashed line) is kept conducting during sense. Hence the sense amp sees the BL capacitance; it discharges the capacitance to GND, and the cell data is written back. This helps minimize the amount of read disturb charge.

Hence, the sense amplifier sees the full BL capacitance during a read operation; it discharges the capacitance to GND, and the cell data is written back. This helps minimize the amount of read disturb charge injected into the cell from the bitlines. The waveforms in Fig. 8 show that completely discharging the BL capacitance to GND during a read operation dramatically reduces the disturb curve. A typical sense-amplifier scenario for a weak cell flipping during read disturb under process variation is illustrated in Fig. 8(a); the storage nodes of the cell flip due to noise injected from the bitlines (L dashed and R dotted). Fig. 8(b) illustrates how fully discharging BLT0 in the read-assist scenario enables recovery of the storage nodes of the same cell; when SET turns ON, BLT0 is discharged and nodes L and R restore their state. Note that the benefit strongly depends on: 1) the duration of the LWLE pulse after SET fires; 2) the LWLE-to-SET timing (the amount of bitline margin allowed to develop); 3) the BL height; and 4) the time taken by the sense amp to discharge the BL capacitance to GND. Nevertheless, our studies have shown a significant reduction in read upset noise (27% to 50%) with the read-assist scenario compared with the clamped condition. Finally, this scheme yields no benefit compared with sense-amp architectures that shut off the wordline immediately after SET activation. However, such architectures are extremely challenging to achieve, because the minimum wordline pulse width is determined by the worst-case cell write window (considerably larger) and not by the read time.
Moreover, the LWLE and consequently the storage nodes of the half-selected cells are not activated or disturbed in the 8T-CDC scenario. Hence, for the 8T-CDC read-assist design, read disturb is minimized in the selected mode and eliminated in the half-select mode; this improves the yield and lowers the overall design operating voltage, as we will see in the following section.
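The yield comparisons in the following section express each design as a cell "sigma to fail", found by a binary search over Vt skew, and they use a five-sigma target. The Python sketch below illustrates both steps under stated assumptions. The first converts a sigma target to a one-sided Gaussian tail probability, showing why five sigma corresponds to less than one fail per million cells. The second is a bisection for the smallest skew at which a cell fails, where the failure check is a hypothetical linear margin model standing in for the weighted-sensitivity model of [11]. Only the search structure reflects the method described, not the numbers.

```python
import math
from typing import Callable


def tail_probability(sigma: float) -> float:
    """One-sided Gaussian tail P(X > sigma) for a standard normal X."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def sigma_to_fail(fails: Callable[[float], bool], lo: float = 0.0, hi: float = 10.0,
                  tol: float = 1e-4) -> float:
    """Bisection for the smallest Vt skew (in sigma) at which the cell fails.

    Assumes failure is monotonic in sigma, with fails(lo) False and fails(hi) True.
    """
    if fails(lo) or not fails(hi):
        raise ValueError("failure must be bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def toy_cell_fails(vdd: float) -> Callable[[float], bool]:
    """Hypothetical stand-in: the stability margin shrinks linearly with Vt skew."""
    margin0 = 0.25 * vdd    # nominal margin (V), placeholder
    loss_per_sigma = 0.03   # margin lost per sigma of skew (V), placeholder
    return lambda s: margin0 - loss_per_sigma * s <= 0.0


print(f"P(fail) at 5 sigma: {tail_probability(5.0):.2e}")
for vdd in (0.6, 0.8, 1.0):
    print(f"Vdd={vdd:.1f} V -> sigma to fail ~ {sigma_to_fail(toy_cell_fails(vdd)):.2f}")
```

Sweeping Vdd and reading off where the sigma to fail crosses 5 gives Vmin in the same way that Figs. 9 to 11 are read.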


Fig. 8. 8T-CDC cell (a) without read-assist, for a weak cell: nodes R and L can flip contents due to charge flowing from BLT0. (b) With read-assist, when a drop in BLT0 is sensed, the SET signal arrives and discharges the bitline BLT0 to ground, minimizing the risk of erroneous node flipping. Both nodes L and R are restored to their original values. This also enables a faster read operation.

V. SENSE-AMP APPROACH ANALYSIS AND RESULTS

To effectively evaluate the 8T-CDC cell, it was compared with two versions of a 6T cell within the same read-disturb-mitigating system. The first was a default 6T cell (106 cell) and the second a 40% larger 6T cell (149 cell); the latter is intended to compare functionality gains under similar design area constraints for the 6T and the 8T-CDC. Note that the devices within the 8T-CDC cell (PG, PD, PU) are identical to those of the 106 cell, while the 149 cell has devices that are 40% larger than those of the 106 cell. For each cell type, simulations were run in a 90-nm PD/SOI technology to determine the cell's sigma to fail as a function of voltage. Simulations were also performed to investigate the effect of SET timing and BL height on each cell type. Finally, area and power tradeoffs were studied to determine optimum design points. A chip was also fabricated, and the hardware results correlate well with the simulations.

A. Simulation Methodology

Although certain implementation specifics such as redundancy must be considered, typical SRAM cells need to function within a five-sigma process distribution to guarantee a fail rate of less than 1 per million. Hence, a statistical analysis that takes into consideration the Vt distributions of the six transistors of the SRAM cell is required. For the 90-nm design, the analysis was performed using a model that sensitizes and assigns weights to each transistor in the cell based on a failure metric [11]. For a given operating condition, a binary search algorithm determines the sigma value that causes cell failure. For the 65-nm design, as technology trends become more complex, we introduce and rely on a more sophisticated fast Monte Carlo-based methodology [12]. Because the SRAM cells in question were designed and fabricated in a PD/SOI process, a dynamic stability margin metric (DSM), which includes the floating-body dependence, was used instead of the traditional static noise margin metric [1]. Also, a write-before-read operation is performed to invoke the history effect of the floating-body SOI transistors and to determine the minimum WL pulse width needed to ensure that a 5-sigma skewed cell passes write. The latter ensures that the WL is not kept active longer than necessary. A functional cell was defined as one that can overwrite its existing data state with the opposite data and maintain this new data state during a subsequent read operation. For the following simulations, the voltage was varied from 0.3 to 1.1 V. The individual threshold voltage distributions were extrapolated from hardware.

B. Simulation Results

In the following analysis, the cell and logic supplies are assumed to be the same. Vmin is the minimum supply needed to maintain the desired cell yield. Fig. 9 shows the cell yield in sigma for the three cell options. For a BL height of 128 cells, a clamped half-select condition, and a SET timing of 10% of Vdd BL differential, half-select stability fails dominate in the 6T cells. The 8T-CDC cell shows a marked Vmin improvement of 200 mV compared with the 6T (106 cell) and 80 mV compared with the 6T (149 cell). The comparison was performed at the 5-sigma cell yield point. For an unclamped (floating BL) half-select, the 8T-CDC curve remains unchanged: half-select is not an issue for the 8T-CDC cell, and its read stability graph remains the same. A small Vmin improvement (about 30 mV) is noticed for the two 6T cells due to the relaxed half-select conditions. This improvement increases for shorter BL heights (about 50 mV for 32 cells/BL) [5].

Fig. 9. Cell yield in sigma versus Vdd. Clamped bitlines; load of 128 cells/bitline. Half-select stability fails dominate in 6T. Even the sized-up 6T (6T-149) requires a Vdd increase of 80 mV, and the regular 6T requires an increase of 200 mV.

Fig. 10. It is possible to further lower the required Vdd of the 8T-CDC with earlier SET arrival (due to the lowered bitline-drop voltage margin criterion).

The effect of SET timing (for the 8T-CDC) on yield sigma was investigated by advancing the SET signal earlier in the read cycle. Fig. 10 depicts this data for three different SET timings (10%, 7%, and 5% of the supply as BL differential). Relative to the 10% margin, 8T-CDC Vmin improvements of 70 and 130 mV were observed for the 7% and 5% margins, respectively; again, we assumed the 5-sigma yield point as the target for Vmin. Advancing the SET timing has no effect on Vmin for the 6T versions, as the half-selected cells derive no benefit from the read-disturb-mitigating topology. Finally, the dependence of cell Vmin on BL height for the unclamped case was investigated, and the results are plotted in Fig. 11. To achieve a cell Vmin of 0.6 V (with 5-sigma yield), the 6T-106 cell cannot be used and the 6T-149 cell offers only one design option (32 cells/BL), while the 8T-CDC cell offers several options to the designer (32 to 128 cells/BL with 10% to 7% BL margin SET timings).

Fig. 11. Unclamped bitlines: the half-select problem still dominates in the sized-up 6T-149 cell. For a target Vmin of 0.6 V, the 6T(149) must operate with 32 cells/bitline, whereas the 8T-CDC offers multiple bitline height options.

C. Area and Power Tradeoffs

Table I shows the area comparison of the 8T-CDC cell system with the two 6T options for a 32-kb macro. The 8T-CDC system is 7% larger than the 149 cell option for an equivalent access time. This occurs because: 1) the 8T-CDC bit-pitch circuitry (sense amp and BDC devices) needs to be sized larger to enable effective discharge of the BL capacitance to GND, and 2) a new BDC driver is integrated to route the BDC signal to the cells on a per-column basis. Furthermore, the periphery of the 149 cell macro (WL driver, decode, pitch circuits) can be detuned to save area, accounting for the additional read current/performance delivered by the larger cell devices. The WL driver area for the 8T-CDC macro is 0, as the decoder now drives transistors T1 and T2 in the cell directly (these devices act as a distributed local WL driver to the cell). A detailed comparison of the different units is available in [5].

TABLE I AREA OVERHEAD AND SAVING FOR THE 8T-CDC VERSUS 6T(106) AND 6T(149) IN THE READ-ASSIST SENSE AMP TOPOLOGY

Table II compares the normalized total power consumption (ac and dc) of the 8T-CDC cell macro with the two 6T options, 6T(106) and 6T(149), the latter being a 40% larger cell than 6T(106). The power was simulated at two conditions: (a) 1.0 V with matched performance at a 1-GHz cycle time and 125 °C, and (b) Vdd = Vmin. A 50% read and 50% write pattern was assumed. The power is for the mux-8 column-select option. The primary ac power penalty for the 8T-CDC system arises because: 1) the BL capacitance is completely discharged to GND during a read operation, and 2) the new BDC signal needs to be activated for selected columns. However, this penalty is mitigated by the fact that only one of every m columns is discharged. For instance, in a 32-kb macro (organized as 32 WL × 1024 BL, mux-8 option), only 128 columns are discharged to GND [5]. Furthermore, the 8T-CDC compares very favorably with the 6T options for short BL heights, because the voltage droop of a 6T cell approaches GND as the BL capacitance decreases. For the same supply conditions, an 18% ac power improvement is seen for the 8T-CDC, and the dc power of the 8T-CDC macro is only marginally higher than that of the 6T (106 cell) macro, as the penalty of the two additional transistors in the 8T-CDC cell (which act as local WL drivers) is offset by the WL drivers of the 6T cell. The 6T (149 cell) macro, however, experiences significantly higher power consumption due to its larger cell device widths and associated driver circuitry. For the low-power operation mode, when we take into consideration that the 8T-CDC operates at a lower Vmin due to improved yield, we find that the 8T-CDC cell enables up to 50% power improvement for the same yield target compared with the 6T cell.

TABLE II NORMALIZED POWER CONSUMPTION WHEN OPERATING AT MATCHED PERFORMANCE: (a) Vdd = 1.0 V, BASED ON [5]; (b) Vdd = Vmin FOR LOW-POWER LOW-FREQUENCY OPERATION, USING THE LOWEST ALLOWABLE CELL OPERATING VOLTAGE. ALL VALUES ARE FOR THE DECODE MUX-8 CONDITION AND INCLUDE THE 32-kb MEMORY AND PERIPHERAL LOGIC. NOTE THAT WE MEASURE THE TOTAL POWER IN TERMS OF DC AND AC POWER. BY GUARDING AGAINST HALF-SELECT UPSETS, 8T-CDC CELLS ENABLE OPERATION AT LOWER Vdd VALUES AND HENCE SIGNIFICANT POWER SAVINGS. THE WIDER 6T(149) DEVICES IMPACT POWER CONSUMPTION SIGNIFICANTLY.

D. Hardware Corroboration

A 32-kb macro using the 8T-CDC cell was fabricated and tested in a 90-nm PD/SOI technology. Fig. 12 shows a die photo of the final macro. Fig. 13(a) shows the experimental Vdd shmoo measured from the test chip. The experiment was conducted using a single supply and a clamped half-select condition. The comparison was made between the 8T-CDC cell and the default 6T cell (106). A 150-mV cell Vmin improvement was observed, correlating closely with the predicted improvement of 200 mV from Fig. 9. Hardware access time measurements were also obtained, and they correlate well with access time simulations, as illustrated in Fig. 13(b) and (c). The bitline capacitance increase is not large enough to impact the sense amp differential developed for sensing; as a result, the access time degradation is negligible.

Fig. 12. Chip layout of 32-kb 8T-CDC macro. Actual die photo.

Fig. 13. (a) Vdd shmoo for the 32-kb macro; cycle time 5 ns. (b) Chip hardware-based access time measurements from clock to the pad output correlate well with (c) simulated access time measurements.

VI. DOMINO READ-BASED DESIGN

In the following sections, we discuss the advantages of the proposed 8T-CDC design in the presence of domino-read-based architectures, as well as the rationale behind these architectures.


TABLE III DYNAMIC SUPPLY SETTINGS FOR COLUMN-DECOUPLED CELLS VERSUS STANDARD 6T DESIGNS

B. Dynamic Supply Technique for Column-Decoupled Topologies

Decoupling the half-selected columns eliminates half-select fails. Combined with dynamic supply techniques, read disturbs can be minimized while enabling significant power savings. Dynamic supply techniques for 6T designs were proposed in [3]. The architecture proposed in [3] involved 128 × 128 banks with 8 interleaved columns. During a read cycle, all the columns of the selected bank (i.e., the selected-cell column along with the half-selected cell columns, regardless of which column is actually being read) are switched to the high supply. This enhances both the cell read performance and the cell (half-)select stability. During write, the selected column is pulled down to the low supply, while the remaining half-selected columns are maintained at the high supply. For the proposed column-decoupled scheme, only the columns of the selected cells need to be raised to the high supply during a read operation. Owing to the decoupled design, the half-selected columns can be maintained at the low supply. Likewise, during write, both the selected and the half-selected columns can be supplied with the low supply. Table III summarizes the dual-supply requirements for the two cells: a typical dynamic-supply-based 6T cell and the proposed 8T-CDC cell.

In general, in an N × M accessed set (N is the number of rows, M is the total number of columns), the columns are combined into M/k groups, k being the number of decoded columns per group. There are M/k selected columns and M − M/k half-selected columns. Hence, for the proposed 8T-CDC dynamic header design, the bank access power can be minimized significantly. Fig. 16 illustrates the different conditions of a given cell during a set (bank) access. For the 8T-CDC design, standby cells sharing the half-selected columns are held at the low supply and are referred to as stand2 in Fig. 16, whereas stand1 cells share the same supply as the accessed cell (the high supply during read and the low supply during write).
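The supply rules summarized in Table III can be written as a small lookup. The Python sketch below encodes the rules as described in the text above, reading the two levels as the dynamic supply High and Low. The function names and string arguments are hypothetical. The sketch counts how many columns of an accessed set must be raised to the high supply during a read, using a 128-column, 8-way interleaved bank as in the architecture of [3].

```python
from enum import Enum


class Supply(Enum):
    HIGH = "dynamic supply High"
    LOW = "dynamic supply Low"


def column_supply(design: str, op: str, selected: bool) -> Supply:
    """Cell supply of a column in the accessed bank, per the rules described for Table III."""
    if design == "6T":
        if op == "read":
            return Supply.HIGH                              # whole bank raised on read
        return Supply.LOW if selected else Supply.HIGH      # write: only selected column lowered
    if design == "8T-CDC":
        if op == "read":
            return Supply.HIGH if selected else Supply.LOW  # half-selected columns stay low
        return Supply.LOW                                   # write: all columns low
    raise ValueError(f"unknown design {design!r}")


def high_columns_on_read(design: str, n_columns: int, k: int) -> int:
    """Columns at the high supply during a read; n_columns // k columns are selected."""
    n_selected = n_columns // k
    n_half = n_columns - n_selected
    return (n_selected * (column_supply(design, "read", True) is Supply.HIGH)
            + n_half * (column_supply(design, "read", False) is Supply.HIGH))


for design in ("6T", "8T-CDC"):
    print(design, high_columns_on_read(design, n_columns=128, k=8))
```

For the 128-column, k = 8 bank the 6T raises all 128 columns on a read while the 8T-CDC raises only the 16 selected ones. This count, not a power figure, is what the sketch captures; the power breakdown itself is given in Table IV.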
and Table IV categorizes and compares power contributions by the different cell types in the selected bank for the 6T and 8T-CDC designs assuming matched area for the global and distributed drivers. It is important to highlight the following power advantages. 1) Halfselect cells in 8T-CDC design have their pass gates off and no ON currents ow through these transistors (unlike 6T cell during charge ow from bitlines onto the cell) and thus their power is equivalent to that of a standby cell. 2) Distributed driver power can be justied by the absence of the large wordline driver. Besides, distributed drivers in unselected columns consume no power because their supply is grounded. This is opposed to the 6T where the equivalent lumped driver is always comsuming signicant power. 3) All

Fig. 14. 8T-CDC-decoupled cell memory cross-section for domino read designs.

As technology scales, sense-amp devices suffer from Vt mismatch, and scaling becomes difficult, particularly for PD/SOI designs, because of hysteretic Vt variation. Large-signal domino read circuitry [9] is therefore preferred. During a domino read, the dual-rail signals from the cell are amplified to full rail by skewed inverters. This eliminates the dependence on the bitline differential, which can be highly sensitive to Vt mismatch; we refer the reader to [9] and the references therein for a detailed overview of domino-based read designs. However, SRAM cell read disturbs and half-select problems remain critical in a domino read design. In what follows, we study the advantages of combining a column-decoupled (half-select-free) cell design with dynamic supply techniques for a 65-nm PD/SOI domino read-based design. Our goal is to exploit the elimination of half-select disturbs together with dynamic supply techniques for optimal yield and power. For this purpose, we propose new header designs for the dynamic supply suitable for the 8T-CDC cell. An overview of the targeted domino-read memory cross-section is illustrated in Fig. 14. Next, we revisit the traditional circuit and peripheral logic for 6T domino designs and propose simplifications and modifications, as well as novel dynamic header designs suitable for low-power 8T-CDC cell design.
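As an illustration of the supply requirements that the new headers must implement, the sketch below encodes the column supply rules described for the 6T dynamic supply scheme of [3] and for the 8T-CDC cell (Table III), writing Vcs_hi and Vcs_lo for the high and low cell supplies, and counts how many columns of an accessed N × 2M, K-interleaved bank sit at Vcs_hi. It is a behavioral sketch under stated assumptions: the names `supply_6t`, `supply_8t_cdc`, and `Bank` are hypothetical, exactly one column per K-column group is assumed to be selected, and no voltages, timing, or electrical behavior are modeled.

```python
from dataclasses import dataclass
from typing import Callable

HI, LO = "Vcs_hi", "Vcs_lo"  # supply labels only; voltages are not modeled

# A supply rule maps (is the column selected?, is this a write?) to a supply label.
SupplyRule = Callable[[bool, bool], str]


def supply_6t(selected: bool, write: bool) -> str:
    """6T dynamic supply as described for [3] (hypothetical helper)."""
    if write and selected:
        return LO  # the column being written is pulled down to Vcs_lo
    return HI      # read: every column of the bank; write: half-selected columns


def supply_8t_cdc(selected: bool, write: bool) -> str:
    """8T-CDC rule: only a selected column being read is raised to Vcs_hi."""
    return HI if (selected and not write) else LO


@dataclass
class Bank:
    rows: int        # N (not needed for the column count, kept for clarity)
    columns: int     # 2M
    interleave: int  # K, decoded columns per group; one selected per group (assumed)

    @property
    def selected_columns(self) -> int:
        return self.columns // self.interleave

    @property
    def half_selected_columns(self) -> int:
        return self.columns - self.selected_columns

    def columns_at_hi(self, rule: SupplyRule, write: bool) -> int:
        """Number of columns of the accessed bank held at Vcs_hi."""
        count = 0
        if rule(True, write) == HI:
            count += self.selected_columns
        if rule(False, write) == HI:
            count += self.half_selected_columns
        return count


if __name__ == "__main__":
    bank = Bank(rows=128, columns=128, interleave=8)  # bank organization quoted for [3]
    for write in (False, True):
        op = "write" if write else "read"
        print(op,
              "6T:", bank.columns_at_hi(supply_6t, write),
              "8T-CDC:", bank.columns_at_hi(supply_8t_cdc, write))
```

For the 128 × 128, 8-interleaved bank of [3], the sketch gives 128 columns at Vcs_hi during a read for the 6T scheme versus 16 for the 8T-CDC scheme, and 112 versus 0 during a write, which is the column-count argument behind the bank access power reduction.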

A. Domino Circuit Logic Circuit and Area Requirements
In the 8T-CDC cell, the bit-select function is ANDed with the global wordline by the local gated inverter. Consequently, gating the write control signal and the footer devices with the bit-select signal in the column-select circuitry, as required in traditional domino read circuits, is no longer necessary. Fig. 15(a) illustrates a traditional domino read and bit-select circuit; the simplified design eliminates the devices inscribed in circles. The simplified 8T-CDC-specific domino circuit is shown in Fig. 15(b). These logic savings help reduce the total area overhead to below 22% using the thin-cell approach [9], unlike [5], even though the cell area increases by 30%. Furthermore, the distributed wordline drivers occupy less total area than one large global driver. This leads to power savings and enhanced stability, as will be seen in the results section.
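A minimal logic-level sketch of the column decoupling described above follows. It assumes the local gated inverter behaves as an ideal AND of the global wordline (GWL) and the column's bit-select, and it ignores all analog effects; the names `pass_gate_6t`, `pass_gate_8t_cdc`, and `exposed_half_selected` are hypothetical. It counts, within one K-column group of the accessed row, the half-selected cells whose pass gates open.

```python
from typing import Callable

# A pass-gate model maps (global wordline, column bit-select) to "pass gate on?".
PassGate = Callable[[bool, bool], bool]


def pass_gate_6t(gwl: bool, bit_select: bool) -> bool:
    """Conventional 6T row: the wordline reaches every cell of the row,
    so bit-select does not affect the pass gates."""
    return gwl


def pass_gate_8t_cdc(gwl: bool, bit_select: bool) -> bool:
    """8T-CDC (idealized): the local gated inverter ANDs the global
    wordline with the column's bit-select."""
    return gwl and bit_select


def exposed_half_selected(pass_gate: PassGate, k: int, selected: int = 0) -> int:
    """Count cells in one K-column group of the accessed row (GWL high) whose
    pass gates are on although their column is not selected."""
    return sum(
        1
        for col in range(k)
        if col != selected and pass_gate(True, col == selected)
    )


if __name__ == "__main__":
    for k in (4, 8):
        print(f"K={k}: 6T exposes {exposed_half_selected(pass_gate_6t, k)}, "
              f"8T-CDC exposes {exposed_half_selected(pass_gate_8t_cdc, k)}")
```

In the 6T model every one of the K − 1 half-selected cells in the group sees an open pass gate, while the AND in the 8T-CDC model keeps them all closed, which is why the half-selected cells behave like standby cells.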



Fig. 15. Domino read and bit-select circuit for (a) traditional 6T [9] and (b) gated/column-decoupled 8T-CDC cell. Transistors in the dashed circle are eliminated for the 8T-CDC design. Node WRITE is a local version of the write enable signal (complement of WC).

TABLE IV 6T AND 8T-CDC CELL POWER CONTRIBUTION COMPARISON ASSUMING MATCHED DISTRIBUTED AND GLOBAL DRIVERS. BANK IS N × 2M, K-INTERLEAVED. HALF-SELECT POWER AND STAND2 CELL POWER ARE MUCH LESS IN 8T-CDC DESIGN. SUBSCRIPT LO IMPLIES OPERATING AT Vcs_lo

Fig. 16. Terminology for an N × 2M, K-interleaved bank. Cell: the accessed cell in the selected column. Stand1: the standby cells in the selected column. Half: the half-selected cells. Stand2: the standby cells in the half-selected columns.

half-select columns have their supply at Vcs_lo versus Vcs_hi for the 6T approach.
To enable the 8T-CDC-specific dynamic supply requirements, we designed the headers in Figs. 17 and 18. Fig. 17 illustrates a typical dynamic supply header design. The header relies on two PFETs to select the cell supply. The PFETs of the jth column are gated by a column control signal and its complement; to generate these two gating signals, we rely on a NAND gate and an inverter. The cell supply can be set to either Vcs_hi or Vcs_lo (usually the difference Vcs_hi − Vcs_lo is set to 0.1–0.15 V). The logic is maintained at Vdd, and the wordline decoder supply is maintained at Vcs_hi. A column is selected when BDT is high. The write control (WC) signal triggers writing when it goes low. Vcs_hi is selected only during read. The alternative header embodiment in Fig. 18 uses an NFET and a PFET for the virtual cell supply selection. It eliminates the need for Vcs_hi to be explicitly available in the header; instead, the virtual cell supply is conditionally boosted by relying on SOI body coupling mechanisms [13]. This header therefore provides a dual column supply from a single supply, thereby reducing power. Hence, both FETs share the same supply Vcs_lo, and the two FETs are gated by the same



Fig. 19. 8T-CDC design simulation at 5 GHz. Vcs boosting during read for the header in Fig. 18 is illustrated. For node information, refer to Figs. 15 and 17.
Fig. 17. Dynamic supply header design for the 8T-CDC decoupled design. Vcs_lo = Vdd and Vcs_hi = Vdd + 0.15 V, where Vdd is the logic supply; the wordline decoder supply is maintained at Vcs_hi. A column is selected when BDT is high. Write mode corresponds to the WC signal being low. Vcs_hi is selected only during read.
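The header behavior summarized in the Fig. 17 caption (column selected when BDT is high, write when WC is low, Vcs_hi only during read) can be written as a small truth table. This is a sketch, not the actual netlist: it assumes that the NAND of BDT and WC drives the gate of the PFET connecting Vcs_hi and that the inverted NAND output drives the PFET connecting Vcs_lo, since the text states only that the two PFETs are gated by a NAND-generated signal and its complement. The function name `header_fig17` is hypothetical.

```python
from itertools import product


def header_fig17(bdt: bool, wc: bool) -> str:
    """Behavioral sketch of the two-PFET dynamic supply header.

    Assumed wiring: NAND(BDT, WC) gates the Vcs_hi PFET and its inverted
    output gates the Vcs_lo PFET. A PFET conducts when its gate is low.
    """
    nand_out = not (bdt and wc)  # NAND gate output
    inv_out = not nand_out       # inverter output (complement)
    hi_pfet_on = not nand_out    # gate low -> conducting
    lo_pfet_on = not inv_out
    # Exactly one PFET conducts, so the column supply is never floating or shorted.
    assert hi_pfet_on != lo_pfet_on
    return "Vcs_hi" if hi_pfet_on else "Vcs_lo"


if __name__ == "__main__":
    for bdt, wc in product((False, True), repeat=2):
        print(f"BDT={int(bdt)} WC={int(wc)} -> {header_fig17(bdt, wc)}")
```

Only the combination BDT = 1 (column selected) and WC = 1 (not writing, i.e., read) yields Vcs_hi; half-selected columns, written columns, and unaccessed columns all receive Vcs_lo.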

voltages for high-performance applications and lower Vdd operability; the cell Vts can be further optimized because bitline leakage is minimized owing to the elimination of half-select disturbs.

VII. DOMINO-BASED APPROACH ANALYSIS AND RESULTS
In this section, we compare and analyze the 8T-CDC cell and the traditional 6T thin cell with dynamic dual supply using domino-based read sensing.
A. Simulation Environment and Methodology
Simulation environment conditions are similar to those of Section V-A. This time, the cell under study is built in 65-nm PD/SOI technology. The threshold voltages of the neighboring transistors of the SRAM cell are again treated as Gaussian random variables whose standard deviations are extrapolated from hardware. For the proposed column-decoupled schemes, the local gated-inverter transistors are also subjected to threshold voltage variation, because their dimensions are comparable to those of the cell transistors and they undergo more variation in 65-nm technology. We therefore employ a more sophisticated fast Monte Carlo statistical simulation methodology [12] to study the read, write, and stability yield of the 65-nm design. A cell is deemed stable if its contents are not flipped during a read 0. It is deemed writable if, at the operating frequency, its contents can be flipped within one cycle. Unlike sense-amp-based circuitry, where a cell is judged readable based on a desired bitline differential, in large-signal domino read designs a cell is readable when the output of the domino circuit (RDC in Fig. 15) fully rises within a specified cycle time. For domino sensing, a cell topology with shorter bitlines (16 cells/bitline) and a dynamic power supply is used, unlike the sense-amp-based dynamic power supply design with many cells per bitline in bulk technology [3]. For the simulations, Vdd is varied between 0.4 and 1.0 V.
B. Simulation Analysis and Experimental Results
For purposes of our analysis, we set Vcs_lo = Vdd and Vcs_hi = Vdd + 0.15 V.
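To illustrate why a fast Monte Carlo method is needed and how a failure probability maps to the equivalent sigma used in the yield plots, the sketch below estimates a rare failure probability with plain mean-shifted importance sampling and converts it to sigma. It is not the mixture importance sampling method of [12]. The failure criterion is a hypothetical linear margin model over six normalized Vt deviations, chosen because its exact failure probability, 1 − Φ(t), is known and serves as a check; the function names are likewise hypothetical.

```python
import math
import random
from statistics import NormalDist


def fails(x, a, t):
    """Hypothetical linear margin model: the cell fails when the weighted sum
    of normalized Vt deviations x exceeds t (a is a unit-norm sensitivity vector)."""
    return sum(ai * xi for ai, xi in zip(a, x)) > t


def importance_sampling_pfail(a, t, n=20_000, seed=1):
    """Mean-shifted importance sampling estimate of P[fail] for x ~ N(0, I).

    Samples are drawn from N(mu, I) with mu = t * a, the most likely failure
    point of the linear model, and re-weighted by the likelihood ratio
    phi(y) / phi(y - mu) = exp(-mu.y + |mu|^2 / 2).
    """
    rng = random.Random(seed)
    mu = [t * ai for ai in a]
    mu_sq = sum(m * m for m in mu)
    total = 0.0
    for _ in range(n):
        y = [rng.gauss(m, 1.0) for m in mu]
        if fails(y, a, t):
            total += math.exp(-sum(m * yi for m, yi in zip(mu, y)) + 0.5 * mu_sq)
    return total / n


def equivalent_sigma(p_fail):
    """Express a failure probability as the equivalent one-sided sigma z,
    i.e., the z for which P[X > z] = p_fail with X standard normal."""
    return -NormalDist().inv_cdf(p_fail)


if __name__ == "__main__":
    a = [1 / math.sqrt(6)] * 6  # six transistors, equal sensitivity (assumed)
    t = 5.0
    p_is = importance_sampling_pfail(a, t)
    p_exact = NormalDist().cdf(-t)
    print(f"IS estimate {p_is:.3e}, exact {p_exact:.3e}, "
          f"sigma {equivalent_sigma(p_is):.3f}")
```

At t = 5 the exact failure probability is about 2.9 × 10^-7, so naive Monte Carlo would need on the order of 10^8 samples to observe a few tens of failures, whereas the shifted sampler places about half of its samples in the failure region and recovers the probability, and hence a sigma close to 5, from 2 × 10^4 samples.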
We also define four supply modes of

Fig. 18. Enhanced dynamic supply header design that exploits SOI body effects (the body coupling effect can lead to Vcs boosting [13]). Suffix j corresponds to the jth column. Low-Vt PFETs are used in the design. Vcs_lo = Vdd. Δ1 is usually appreciable, and Vcs_lo + Δ1 is comparable to Vcs_hi.

signal. Prior to the read cycle, the source and the drain of the NFET device are at Vcs_lo. When the gate of the NFET device switches, gate-to-body coupling boosts the nMOS source node to Vcs_lo + Δ1, as seen in the simulations of Fig. 19. Another novel feature of this header is that the AND function is built from two inverters using four transistors, as opposed to the traditional six-transistor logic AND gate. We rely on this concept to further simplify our dual supply header and obtain the circuit shown in Fig. 18. The table in Fig. 18 summarizes the dynamic supply header functionality; the first row is a don't-care state. For our designs, we adopt a shorter bitline load of 16 cells/BL and size our banks to be 16 × 96 so that the dynamic supply can switch faster. The simulations of Fig. 19 illustrate the ability of the design to operate at 5 GHz for read and write operations. Finally, it is worth mentioning that one can trade off the power reduction in the 8T-CDC by relaxing the cell device threshold



Fig. 20. Dual supply helps improve the cell stability Vdd_min by 220 mV.

Fig. 21. Dynamic dual supply helps reduce writability Vdd_min by 200 mV compared to simple dual supply.

Fig. 22. Vdd_min of the dynamic dual supply 8T-CDC (and 6T) design is determined by read stability. Half-select stability yield is ideal in the 8T-CDC, enabling a Vcs drop for the 8T-CDC half-selected cell. Dashed line indicates the desired yield.

Fig. 23. Half-select cell power is negligible in the 8T-CDC cell. The cell wordline is off, and the cell supply is lower than in the traditional 6T cell.

Fig. 24. Total power of the selected bank and the corresponding peripheral logic at 3 GHz. Both the selected bank power and the peripheral logic power are much less than for 6T (both designs assume dynamic supply). Peripheral logic (e.g., decoder, bit-select, dynamic headers, . . .).

Fig. 25. Fabricated chip layout and the chip photograph.

operation. The fourth mode represents the proposed dynamic supply design as set by the header logic.

1) Single supply: cell at Vdd; wordline and BDT at Vdd; logic at Vdd.
2) Dual supply: cell at Vcs_hi; wordline and BDT at Vcs_hi; logic at Vdd.
3) 6T dynamic dual supply: cell at Vcs_hi (read, half-select) and Vcs_lo (write); wordline and BDT at Vcs_hi; logic at Vdd.



4) 8T-CDC dynamic dual supply: cell at Vcs_hi (read) and Vcs_lo (write, half-select); wordline and BDT at Vcs_hi; logic at Vdd.
First, we study the yield improvement of the 8T-CDC design due to adopting the proposed header design (mode 4). As in previous sections, the cell yield is represented in terms of the equivalent sigma of a standard normal distribution. Fig. 20 illustrates the improvement in the readability and stability yields of cells in selected columns when a (dynamic) dual supply design (modes 2 and 4) is adopted instead of a traditional single supply design (mode 1). A gain of 220 mV in Vdd_min (due to modes 2 and 4) is seen for a target yield of 5 sigma. Fig. 21 demonstrates the writability yield improvement for the cells in selected columns due to mode 4 compared to modes 1 and 2. We note a 120 mV improvement in Vdd_min due to simple dual supply techniques (mode 2) compared to single supply (mode 1). We also note an additional 190 mV improvement in Vdd_min when dynamic dual supply techniques (mode 4) are employed for the 8T-CDC cell write operation, compared to simple dual supply (mode 2, which fixes the cell supply at Vcs_hi regardless of read or write operation). Fig. 22 highlights the advantages and summarizes the yield trends of the different design metrics (readability, writability, read stability, and half-select stability) for the proposed dynamic dual supply design (mode 4). We conclude from Fig. 22 that Vdd_min for the 8T-CDC design is delimited by the read stability. However, and most importantly, the half-select stability yield for the 8T-CDC is found to be ideal despite operating the half-selected cells at Vcs_lo. In fact, the decoupled design relaxes the supply constraints on the half-selected cells, and we can maintain them at even lower supplies. This in turn enables significant power savings, as demonstrated in the following subsection. Finally, it is worth noting that the 6T design with the dynamic dual supply header (mode 3) shares the same read stability, readability, and writability yields as the 8T-CDC design; however, the half-select stability yield of the 6T design is similar to its read stability (half-selected cells are exposed to node upset in the 6T cell), and hence it cannot achieve the additional power savings available to the 8T-CDC cell with the specialized header design.
C. Area and Power Tradeoffs
Figs. 23 and 24 present the simulated power gains of the dynamic 8T-CDC design compared to the dynamic 6T design. As shown in Fig. 23, the half-select power is almost eliminated in the 8T-CDC design (100× reduction); the cell wordline is off and the cell supply is lower than in the traditional 6T, so the cell behaves like a standby cell, only with a lower supply. Fig. 24 compares the normalized power of the logic and memory parts of the 8T-CDC and 6T cells in a 4-interleaved 96 × 16 bank, with the frequency set at 3 GHz. The 8T-CDC peripheral logic power and accessed memory bank power are reduced by 30%–40% over the plotted Vdd range (compared to the traditional 6T), even though bit-select operates at Vcs_hi; note that the peripheral logic power accounts for the dynamic headers, bit-select, and decoders. Power savings can be even larger for an 8-interleaved bank. Finally, as discussed in Section VI-A, for a thin cell-based design and thanks to additional logic simplifications, we obtained a 22% area overhead for a 16-row, 1.6 kb 8T-CDC column-decoupled memory design.

Fig. 26. Hardware data schmoo corresponding to a 1.6 kb array for half-select fails shows improvement of the 8T-CDC due to the elimination of half-select disturbs. The cells are subject to half-select conditions at lower voltages; read is performed at higher voltage.

Fig. 27. Hardware-based access time screenshots for the 6T and 8T-CDC cells indicate negligible difference. The bitline height is not impacted in the thin cell architecture, and the wordline is optimized based on the distributed drivers. ADDR3 and ADDR4 are address lines. DATA and ENABLE signals are also plotted. Global reset (RS) and bitline select (BL_RS) are also presented. OUT4 and OUT5 are the far-end and near-end outputs, respectively. Access time is measured from the falling edge of the clock signal, CLK_WORD, to OUT4(5). Worst-case far-end delays are measured as 190 ps for the 6T cell and 194 ps for the 8T-CDC cell. Vdd = 0.95 V.

D. Hardware Corroboration
A 1.6 kb macro using the 8T-CDC cell was fabricated and tested in a 65-nm PD/SOI technology. Fig. 25 shows the fabricated chip layout and the corresponding chip photograph. Fig. 26 shows the hardware data schmoo corresponding to a 1.6 kb array. Stress conditions measuring half-select disturb show improvement of the 8T-CDC due to the elimination of half-select disturbs, using multiple write/read patterns for 0 and 1. Bumped voltage patterns are used: the cells are written at a higher voltage, half-selected at lower voltages, and then read at a higher voltage to make sure the cell content is intact. Finally, Fig. 27 illustrates hardware-measured waveforms for a Write



cycle followed by a Read cycle. Access times for both the far-end and near-end outputs are measured. We find a negligible difference, around 4 ps, in the measured access times of the 6T and 8T-CDC designs. The simulations shown in Fig. 19 also illustrate that the 8T-CDC-based design can function at 5 GHz (a 200 ps cycle time), maintaining speeds similar to those of 6T.
VIII. CONCLUSION
We studied a novel 8T-CDC column-decoupled SRAM design. The half-select-free design enables enhanced voltage scaling and a 30%–40% power reduction in comparison to standard 6T techniques. This study involved a 90-nm read-assist-based sense-amp design and a 65-nm domino read-based design with dynamic supply capabilities. The 8T-CDC cell enables significant power savings in terms of Vdd_min reduction for the read-assist design and half-select column power reduction in dynamic dual supply domino read designs, with the aid of new header designs. New simplified local evaluation logic and shorter bitlines are employed for the domino read-based design. Simulations showed high performance for the proposed design using shorter bitlines and the dynamic header circuit. Measured hardware data from fabricated chips in 90- and 65-nm PD/SOI technology show improved stability, yield, and voltage scalability due to the elimination of half-select disturbs, with access times comparable to those of 6T-based designs.
REFERENCES
[1] R. Joshi, S. Mukhopadhyay, D. W. Plass, Y. H. Chan, C.-T. Chan, and A. Devgan, "Variability analysis for sub-100 nm PD/SOI CMOS SRAM cell," in Proc. 30th Eur. Solid-State Circuits Conf., Sep. 2004, pp. 211–214.
[2] L. Itoh, K. Osada, and T. Kawahara, "Reviews and future prospects of low voltage embedded RAMs," in Proc. IEEE Custom Integr. Circuits Conf., 2004, pp. 339–344.
[3] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "3-GHz 70 MB SRAM in 65 nm CMOS technology with integrated column-based dynamic power supply," in ISSCC Dig. Tech. Papers, Feb. 2005, pp. 474–475.
[4] M. Kellah, Y. Yibin, S. K. Nam, D. Somasekhar, G. Pandya, A. Farhang, K. Zhang, C. Webb, and V. De, "Wordline & bitline pulsing schemes for improving SRAM cell stability in low-Vcc 65 nm CMOS designs," in Proc. VLSI Circuits Symp., 2006, pp. 9–10.
[5] V. Ramadurai, R. Joshi, and R. Kanj, "A disturb decoupled column select 8T SRAM cell," in Proc. CICC, 2007, pp. 25–28.
[6] W. Henkels, W. Hwang, R. Joshi, and A. Williams, "Provably correct storage arrays," U.S. Patent 6 279 144, Aug. 21, 2001.
[7] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch, "Stable SRAM cell design for the 32 nm node and beyond," in Proc. IEEE Symp. VLSI Technol., 2005, pp. 128–129.
[8] R. Joshi, "Random access memory with stability enhancement and early ready elimination," U.S. Patent Appl. A1/20060250860, Nov. 9, 2006.
[9] R. Joshi, Y. Chan, D. Plass, T. Charest, R. Freese, R. Sautter, W. Huott, U. Srinivasan, D. Rodko, P. Patel, P. Shephard, and T. Werner, "A low power and high performance SOI SRAM circuit design with improved cell stability," in Proc. SOI Conf., Oct. 2006, pp. 4–7.

[10] H. Pilo, J. Barwin, G. Braceras, C. Browning, S. Burns, J. Gabric, S. Lamphier, M. Miller, A. Roberts, and F. Towler, "An SRAM design in 65 nm and 45 nm technology nodes featuring read and write-assist circuits to expand operating voltage," in Proc. Symp. VLSI Circuits, Jun. 2006, pp. 15–16.
[11] C. Wann, R. Wong, D. J. Frank, R. Mann, S.-B. Ko, P. Croce, D. Lea, D. Hoyniak, Y.-M. Lee, J. Toomey, M. Weybright, and J. Sudijono, "SRAM cell design for stability methodology," in Proc. IEEE VLSI-TSA Int. Symp. VLSI Technol., 2005, pp. 21–22.
[12] R. Kanj, R. Joshi, and S. Nassif, "Mixture importance sampling and its application to the analysis of SRAM designs in the presence of rare failure events," in Proc. Des. Autom. Conf., Jul. 2006, pp. 69–72.
[13] R. Joshi, "A floating-body dynamic supply boosting technique for low-voltage SRAM in nanoscale PD/SOI CMOS technologies," in Proc. ISLPED, pp. 8–13.

Rajiv V. Joshi (F'02) received the B.Tech. degree from the Indian Institute of Technology, Bombay, India, the M.S. degree from the Massachusetts Institute of Technology, Cambridge, MA, and the Doctorate degree in engineering science from Columbia University, New York. He is a Research Staff Member with the T. J. Watson Research Center, IBM, Austin, TX. He joined IBM in November 1983 and worked on VLSI technology (NMOS, CMOS, sub-0.5-μm CMOS logic, DRAM, and SRAM technologies). He developed novel interconnect processes and structures for aluminum, tungsten, and copper technologies, which are widely used in IBM for various sub-0.5-μm memory and logic technologies as well as across the globe. His circuit and CAD work is used in IBM mainframe and PowerPC processors. He has authored and coauthored over 140 research papers. He holds 140 U.S. patents in addition to several pending patents. Dr. Joshi was a recipient of three corporate and two Outstanding Technical Achievement Awards from IBM. He also received 48 Invention Plateau Awards from IBM.
He has presented several invited and keynote talks and tutorials at IEEE SOI, SSDM, ICCAD, CICC, ASYNC, and AMC, and coauthored tutorials at ISSCC and DAC. He received the Lewis Winner Award in 1992 for an outstanding paper he coauthored at the International Solid-State Circuits Conference. He is an ISQED Fellow. He received the Distinguished Alumnus Award from IIT Bombay in 2008 and the IEEE/ACM William J. McCalla ICCAD Best Paper Award in 2009. He has served on the program committees of IEEE ISLPED (International Symposium on Low Power Electronics and Design), IEEE VLSI Design, the IEEE International SOI Conference (2000–2003), and ISQED. He was the General Chair of the 2004 ISLPED Conference.

Rouwaida Kanj received the B.Eng. degree (with high distinction) from the American University of Beirut in 1998, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois, Urbana-Champaign, in 2000 and 2004, respectively. She is currently with the Silicon Analytics Team, IBM Austin Research Labs, Austin, TX. Prior to this position, she held multiple internships with the IBM EDA Group in Fishkill. She worked on modeling SOI effects, noise characterization of CMOS circuits, and library characterization of novel circuit technologies, and is currently involved in variability-driven SRAM analysis. She is the author of several technical papers and holds two issued patents and several pending patents. Dr. Kanj was a recipient of three IBM Ph.D. Fellowships, an Outstanding Technical Achievement Award, five Invention Plateau Awards from IBM, and the IEEE/ACM William J. McCalla ICCAD Best Paper Award in 2009.

Vinod Ramadurai received the Bachelor's degree in electrical engineering from The University of Arizona, Tempe, in 1999 and the Master's degree in electrical engineering from Cornell University, Ithaca, NY, in 2001. He is an Advisory Engineer with the Systems and Technology Group, IBM, Austin, TX. He joined IBM in 2001 and has been a Memory Circuit Designer on the PowerPC 750 and 970 products for Apple's G3 and G5 chips; he is currently working on 32 nm SRAM IP development for IBM ASICs. His interests lie in the area of high-performance and low-power circuit design. He has coauthored five conference papers and two journal papers, and holds ten issued patents and six pending patents.
