
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE JOURNAL OF SOLID-STATE CIRCUITS 1

A 28 nm 2 Mbit 6 T SRAM With Highly Configurable Low-Voltage Write-Ability Assist Implementation and Capacitor-Based Sense-Amplifier Input Offset Compensation
Mahmut E. Sinangil, Member, IEEE, John W. Poulton, Fellow, IEEE, Matthew R. Fojtik, Thomas H. Greer III,
Stephen G. Tell, Andreas J. Gotterba, Member, IEEE, Jesse Wang, Member, IEEE, Jason Golbus,
Brian Zimmer, Member, IEEE, William J. Dally, Fellow, IEEE, and C. Thomas Gray, Member, IEEE

Abstract—This paper presents a highly configurable low-voltage write-ability assist implementation along with a sense-amplifier offset reduction technique to improve SRAM read performance. Write-assist implementation combines negative bit-line (BL) and VDD collapse schemes in an efficient way to maximize Vmin improvements while saving on area and energy overhead of these assists. Relative delay and pulse width of assist control signals are also designed with configurability to provide tuning of assist strengths. Sense-amplifier offset compensation scheme uses capacitors to store and negate threshold mismatch of input transistors. A test chip fabricated in 28 nm HP CMOS process demonstrates operation down to 0.5 V with write assists and more than 10% reduction in word-line pulsewidth with the offset compensated sense amplifiers.

Index Terms—CMOS, low-voltage SRAM, offset compensation, SRAM assist.

Manuscript received April 30, 2015; revised October 01, 2015; accepted October 25, 2015. This paper was approved by Associate Editor Hideto Hidaka. This research was developed, in part, with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings contained in this article/presentation are those of the author(s)/presenter(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Distribution Statement A (Approved for Public Release, Distribution Unlimited).
M. E. Sinangil was with NVIDIA, Santa Clara, CA 95050 USA. He is now with TSMC North America, San Jose, CA 95134 USA (e-mail: sinangil@alum.mit.edu).
J. W. Poulton, M. R. Fojtik, T. H. Greer III, S. G. Tell, and C. T. Gray are with NVIDIA, Durham, NC 27713 USA.
A. J. Gotterba, J. Wang, J. Golbus, B. Zimmer, and W. J. Dally are with NVIDIA, Santa Clara, CA 95050 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2015.2498302
0018-9200 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THE PROCESSING capabilities packed into a single die have been driven by Moore’s law and have been continuously increasing. Fig. 1 shows the 16 bit floating point precision (FP16) performance of NVIDIA Tegra SoC designs [1]. Over the course of 5 years, from Tegra 2 to Tegra X1, process technology scaled from 40 to 20 nm and FP16 performance increased by roughly two orders of magnitude. Driven by the integration of high-end graphics processor cores with multicore CPUs, today’s high-end SoC platforms can deliver more than 1 TFLOPs of FP16 performance and provide an enhanced user experience. However, the SoCs in Fig. 1 also need to be extremely energy efficient since the end product is generally battery powered.

SRAMs, the most common type of on-chip memories, are extremely important, as the contribution of SRAM area and power to total chip power has also been continuously increasing over the past years [2]. This is due to the trend of increased levels of parallelism to improve performance in the realm of multicore and multithreaded computing platforms. Refs. [3]–[5] are three examples of recently published state-of-the-art microprocessors featuring 24, 37.5, and 64 MB of total SRAM cache on die. Consequently, achieving energy efficiency for SRAMs is one of the key components for energy-efficient systems.

Dynamic voltage and frequency scaling (DVFS) is one of the most effective methods to lower energy consumption in digital circuits [6], [7]. While DVFS presents various complexities for logic in terms of power delivery, clock generation and distribution, timing validation and so forth, there is generally not a functionality problem with standard CMOS static logic operating as long as the operating voltage is not in the deep subthreshold region. In contrast, the concept of operating SRAMs over a large voltage range is very challenging. This is because conventional SRAM bit-cells are generally designed to operate robustly at the nominal supply voltage. Because of the ratioed design of the conventional 6 T SRAM bit-cells, when the operating conditions move away from the nominal point, operating margins that are required for functionality start to erode quickly. Moreover, local and global transistor variation coupled with aging effects restricts the design space for SRAM functionality, and consequently it is becoming increasingly difficult to make SRAMs operational at lower supply voltages.

One method to utilize DVFS for a system consisting of logic gates and SRAM macros is decoupling the supply voltages. This dual-rail option enables separate control and scaling of logic voltage and frequency and hence provides energy savings for the logic while the SRAM supply voltage is kept at a higher voltage to satisfy SRAM functionality requirements. Although effective, this approach presents various challenges at the system level. First, introduction of a new rail requires allocation of metal resources for robust power delivery. Creation

and regulation of this new supply level and its routing through package and die generally increase system cost. Second, signals crossing the power domains need to be carefully controlled. Specifically, level conversion might be necessary for all input and control signals crossing power domains and this results in an area overhead for the macro. If explicit level converters are avoided to conserve area, the logic supply and SRAM supply cannot be controlled completely independently. Next, the timing of the dual-rail design needs to be considered carefully. As the two supply domains are separated, they can be subject to different amounts of voltage noise that can potentially vary in opposite directions as well. Hence the characterization and validation of the macro for all the timing paths arriving or departing the SRAM macros, as well as timing arcs inside the macro, need to consider worst-case noise events for both domains. This results in an increase in the complexity of the macro and logic area surrounding the macro.

Fig. 1. FP16 performance of NVIDIA Tegra system-on-chip (SoC) designs over the past 5 years.

An alternate way of addressing the Vmin problem in SRAMs is using assist circuits. These circuits modulate various voltage levels inside the SRAM macro and consequently enhance or weaken bit-cell devices to improve functional margins [8]. The most pronounced failure mechanisms in conventional 6 T SRAMs can be divided into three groups: write-ability, read stability, and read-ability. Fig. 2 shows how Vmin is affected by these three failure mechanisms for a 28 nm CMOS technology across different process corners. Vmin is defined as the point where failure probability of a bit-cell exceeds 10⁻⁹, and the WL pulsewidth is selected to be 32 fan-out-of-four inverter delays at the corresponding voltage level. From Fig. 2, it can be seen that write-ability is the main limiter of Vmin, while read-ability is the next. Hence, a write-ability improvement assist solution is necessary for this process along with a read-ability improvement technique to enable low Vmin operation. It should also be noted that without assists, Vmin changes drastically across process corners. A write-assist implementation targeting the worst-case corner can be wasteful at other corners, as the same amount of Vmin improvement, or indeed any Vmin improvement, might not be necessary. Moreover, in a system with fine-grain DVFS, there can be time windows during which the requested voltage level is only slightly lower than SRAM Vmin and a moderate Vmin improvement with a weaker assist can be preferred over a full strength assist. Hence, assist circuits should be designed with built-in configurability to minimize energy overhead.

Fig. 2. SRAM Vmin due to the three failure mechanisms across different process corners in 28 nm CMOS technology.

A. Overview of Write-Ability Assist Techniques

Previous approaches to improve write-ability focus on using circuit techniques to either weaken the pull-up devices or strengthen pass-gate devices in the bit-cell. Although the strengthening/weakening can be done in the row or column direction, column-wise approaches are more compatible with features such as column multiplexing and write masking. The most widely used column-wise write-assist techniques either lower the supply voltage terminal or bring the low BL terminal below ground level. The work in [9] modulates the column-wise supply voltage to aid write-ability problems by selecting the lower of the two available supply voltages for columns performing a write operation. This design uses 8-to-1 column multiplexing and the remaining half-selected columns’ supply voltages are connected to the higher voltage level during a write operation. Similarly, the work in [10] lets the column-wise supply voltage float during a write operation to weaken the bit-cell pull-up devices. The floated supply nodes are charged back to nominal supply voltage level at the end of the write operation. A recent work [11] proposes three different methods to modulate column-wise supply voltage: charge sharing with another node to lower voltage level, pull-down to ground through a strong path and pull-down to ground through a pulsed path. The work in [11] has also shown that the optimum assist technique is different for low and high ends of the voltage range.

For the assists involving BL voltage level, the work in [12] proposes a boosting circuit to create a negative voltage to pull the low BL voltage to negative voltage levels. Similarly, the work in [13] creates a negative voltage using a boost capacitor along with a self-timed replica path to time the start of the boost operation. A recent work in [14] also uses a negative BL operation to improve write-ability in 14 nm technology. The work in [15] uses both techniques to improve write-ability in 16 nm technology. A negative BL voltage is generated and also column-wise supply voltage is reduced to enhance write-ability and to provide 300 mV Vmin improvement.

B. Overview of Sense-Amplifier Offset Reduction Techniques

Previous work in the literature proposed different techniques to improve sense-amplifier offset. Ref. [16] analyzes the offset of the latch type sense amplifiers and proposes using a lower


common-mode voltage for the BLs to improve sense-amplifier offset. Ref. [17] uses a closed-loop decision mechanism to detect and compensate sense-amplifier offset through body-biasing. Another work in [18] proposes using capacitors to auto-zero sense-amplifier offset with a two-step sensing procedure. A recent paper in [19] also proposes using capacitors to store the input transistor threshold mismatch voltage and then compensate it when the sense amplifier is energized. The technique proposed in [19] is similar to ours. However, our work proposes an improved implementation and a more detailed analysis as described in Section III, along with measurement results from a test chip in Section IV.

C. Energy-Per-Access and Area Overhead of Assist Techniques

Although write assists are effective in improving write-ability, these techniques also result in an increase in switched capacitance-per-cycle. Fig. 3 shows a scenario where write assists are used to improve Vmin down to 0.5 V. For this purpose, negative BL voltage as well as supply voltage collapse techniques are utilized. Because of the increased switching capacitance, these assist circuits present an increase in energy-per-access. At the SF corner where Vmin is severely limited by write-ability, Vmin improvement provides energy-per-access savings. However, at the FS corner where Vmin without a write assist is already low, assist circuits result in an increase in energy-per-access. So, it is more energy efficient to turn OFF or tune assist circuits for the FS corner so that the overhead of assists is lower.

Fig. 3. Energy-per-access cost of assist circuits needs to be considered for different operating conditions as assists increase energy-per-access.

In this work, we have implemented negative BL and VDD collapse, the two most widely used write-ability assist techniques, to provide an aggressive and configurable write-assist solution. Compared to a single assist implementation, our proposed scheme is more area and energy efficient. For negative BL, a boosting capacitor needs to be placed in the SRAM macro to create negative voltage levels. Larger capacitor size results in a larger magnitude of negative BL voltage and a more aggressive write assist. However, increasing capacitor size provides diminishing returns as discussed in Section II. Moreover, creating a larger magnitude negative voltage at the BL can create disturbance on the rows that are not accessed. Similarly, for the VDD collapse technique, collapsing the column-wise supply voltage deeper can improve write-ability for a more aggressive write assist. However, this introduces a larger overhead for energy/access since the column-wise supply node generally has a much larger capacitance than bit-lines (BLs). Moreover, collapsing the column-wise supply deeper also increases the disturbance on the cells on other rows that are not accessed.

In this paper, we are proposing a highly configurable write-assist scheme combining negative BL and VDD collapse in an efficient way so that energy and area overhead can be amortized by sharing devices between both techniques and an aggressive write-assist scheme can be created with highest efficiency. Reconfigurability built into the proposed design can provide cycle-by-cycle activation/deactivation or tuning of the strength of the write assists depending on changes in operating conditions. To address read performance, an offset compensated sense amplifier is also proposed. These features are implemented as part of a 128 kbit SRAM macro. This paper is structured as follows. Section II presents the implementation of the highly reconfigurable write-assist scheme combining negative BL and VDD collapse. Section III describes the sense-amplifier offset compensation technique. Section IV discusses the test chip architecture along with measurement results. Finally, Section V concludes the paper.

II. HIGHLY CONFIGURABLE WRITE ASSIST COMBINING NEGATIVE BL AND VDD COLLAPSE

Fig. 4. Implementation of the proposed write-assist circuit.

Fig. 4 shows the implementation of the proposed write assist circuit. The design features a 4-to-1 column-multiplexing ratio such that four columns each having 256 memory cells interface with a BL multiplexer that is composed of NMOS pass gates. Bit-cell supply voltage terminal (VCOL) is shared across

256 cells on a single column. A header transistor (P0) is inserted
for every column of memory cells to decouple VCOL from VDD
during voltage collapse. Another PMOS device (P1) is placed at
every column between VCOL and the boost capacitor’s (CBOOST )
left terminal (NCL) to charge CBOOST for negative BL opera-
tion. Pull-down device N0 is used to boost the charge across
CBOOST to create a negative voltage at NCR node which is then
transferred to BL or BLB of the selected column through two
levels of multiplexing. The first level of multiplexing is done
by NMOS devices N3 and N4 controlled by DSEL/DBSEL sig-
nals and the second level of multiplexing is done by BL MUX
block to select the active column undergoing a write operation.
Finally, N1 and N2 pull-down devices constitute the conven-
tional write driver to discharge one of the BLs at the beginning
of the write access depending on the polarity of input data.
Although not shown on the figure, cross-coupled PMOS keep-
ers are placed between BL/BLB pairs of every column to keep
the high BL at VDD during the write operation. To allow neg-
ative BL operation, DGATE/DGATEB signals are pulsed and
both N1 and N2 are turned OFF before the application of nega-
tive BL operation. It should be noted that CBOOST and devices
N0–N4 are shared across four columns of memory cells.
The design in this work allows applying negative BL and
VDD collapse write assists separately or at the same time.
Fig. 5(a) shows waveforms of control signals for a single col-
umn with respect to the WL pulse. Along with these signals,
BL and VCOL node voltages are also shown for the same column
undergoing the write operation. In conjunction with Fig. 5(a),
Fig. 5(b) shows step-by-step assist operation when both nega-
tive BL and VDD collapse are performed. This case corresponds
to the last access shown in Fig. 5(a) and will be explained in
detail in the next paragraph.
Before the beginning of assist operation, one of the BLs
is pulled low and WL is asserted. At this point, the column-
wise supply node VCOL is at VDD and there is no charge across
CBOOST . To perform negative BL, CBOOST needs to be charged.
Since the voltage on VCOL is also going to be collapsed, we
perform charge sharing between VCOL and CBOOST . This allows
the charge to be reused and improves energy efficiency of the
assist scheme. To enable charge sharing, P0 is first turned OFF
and then P1 is turned ON. At the end of the charge sharing
phase, the voltage of VCOL gets lower and CBOOST is charged. Once charged, CBOOST can be used to perform the negative BL operation by first turning OFF N1/N2 and then turning ON the N0 device. The charge across the capacitor is transferred to the low BL to further pull its voltage level below the ground level. Turning ON N0 also creates a path from VCOL to ground through the P1 switch, further reducing VCOL’s voltage level. Since P1 is a PMOS device, it prevents VCOL from being discharged to very low voltages and keeps VCOL at least one PMOS threshold above ground. The assist operation is designed to be self-timed and finishes before the end of the WL pulse by pulling VCOL back to the VDD level. This allows the internal storage nodes in the cell to recover to full-swing before the end of the WL pulsewidth. Fig. 5(a) also shows the control signals when assists are turned OFF and when only negative BL operation is performed. For the operation with only negative BL, P0 is not turned OFF during the charging of CBOOST. This allows CBOOST to be charged to VDD through P1. Before negative BL is applied by N0 turning ON, P1 is turned OFF first to ensure there is not a path from VDD to ground.

Fig. 5. (a) Waveforms for different assist modes and (b) step-by-step operation of assists when both negative BL and VDD collapse are used. In this figure, VCS is the voltage after charge sharing and VVC is the collapsed VCOL voltage.

Charge sharing between VCOL and CBOOST provides energy savings, but comes at the expense of reduced negative BL voltage because CBOOST cannot be fully charged to VDD. To analyze the effect of this, Fig. 6(a) shows a comparison between the conventional negative BL implementation and the proposed charge sharing scheme. The corresponding negative BL voltage achieved in both cases is shown in Fig. 6(b). Without charge sharing, CBOOST is precharged from VDD and consequently stores a larger amount of energy and creates a larger

negative BL voltage for the same CBOOST size. With charge sharing from VCOL, CBOOST is precharged to a lower voltage. When the VCOL capacitance is much larger than CBOOST, the difference in the final negative BL voltage is small. However, as the size of CBOOST increases, the magnitude of negative BL voltage provides diminishing returns. Although CBOOST increases, the energy that can be stored on this capacitor is limited by the initial charge sharing with VCOL. For our target negative BL voltage of −130 mV, which is chosen for the worst-case PVT conditions for write-ability, the charge-sharing technique results in only a modest cost, about 13% smaller negative BL voltage boost. However, it should be noted that our main goal is not creating the largest magnitude of negative BL boost, but improving write-ability. The charge sharing with VCOL results in a lower VCOL voltage, which also improves write-ability.

Fig. 6. (a) Charging of capacitor with and without charge sharing and (b) corresponding negative BL voltage achieved.

Another important consideration is the implementation of CBOOST. Our goal was to get the largest capacitance per unit area and use a capacitor that provides the highest level of integration with the rest of the column circuit. Although enhancement mode varactors provide higher capacitance per unit area, they require additional spacing with core transistors and the area lost for these spacing requirements in layout prohibits their usage for this work. A core transistor, on the other hand, can easily be connected with the rest of the column circuit transistors. Between NMOS and PMOS devices, we chose to use PMOS since it provides around 10% larger effective capacitance per unit area during negative BL operation. To maximize capacitance, we also chose to use low-threshold and larger than minimum gate length transistors for the implementation of CBOOST. To increase capacitance further, lower level metals on top of the capacitor-connected device are also utilized. A metal mesh is created to utilize metal-to-metal parasitic capacitance. This parasitic capacitance increases total boosting capacitance by around 15% and comes at no additional area cost and with minimal change to the signal routing in the column circuit.

Fig. 7. Digitally controlled delay lines along with a voltage divider at the gate of pull-down device N0 are used to tune the strength of assists.

Fig. 8. Simulated Vmin improvement versus energy overhead of assists scatter plot. Configurability and tuning of assists provide a larger space for Vmin versus energy overhead tradeoff.

Our proposed write-assist implementation provides configurability in terms of which assist is selected to be used on a cycle-by-cycle basis. To increase the configurability range in this work, we can also change the relative timing and pulsewidth of assist control signals to vary the strength of the applied assists. For example, by allowing a shorter amount of charge sharing time between VCOL and CBOOST, the magnitude of negative BL voltage as well as voltage collapse on VCOL can be reduced. Alternatively, by creating longer pulsewidth for control signals, assist magnitudes can be increased. To adjust the pulsewidth of control signals, we use digitally controlled delay lines as shown in Fig. 7. Since these signals are global, the area and energy overhead of the delay lines are negligible. A weak PMOS transistor is also placed at the gate of N0 to create a voltage divider with the driver of the pullCap signal. When this weak PMOS is turned ON, the gate voltage of N0 cannot reach VDD and consequently N0 slows down, and the strengths of both assists are reduced.
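The charge-sharing step between VCOL and CBOOST, and the way its duration tunes assist strength, can be illustrated with a first-order model. The sketch below is not taken from the paper: it assumes P1 behaves as an ideal resistor, VCOL is floating at VDD when sharing starts, and CBOOST starts with no charge. The function name `charge_share` and all component values are hypothetical, chosen only to show the trend; BL capacitance and the subsequent boost step are not modeled.

```python
import math


def charge_share(vdd, c_col, c_boost, r_on, t_share):
    """Return (v_col, v_boost) after sharing charge for t_share seconds.

    Assumptions (illustrative, not from the paper): VCOL starts at vdd with
    P0 off, CBOOST starts discharged, and P1 is an ideal resistor r_on.
    """
    # Fully settled voltage follows from charge conservation.
    v_final = vdd * c_col / (c_col + c_boost)
    # Two capacitors exchanging charge through a resistor see their series capacitance.
    tau = r_on * c_col * c_boost / (c_col + c_boost)
    decay = math.exp(-t_share / tau)
    v_col = v_final + (vdd - v_final) * decay
    v_boost = v_final * (1.0 - decay)
    return v_col, v_boost


if __name__ == "__main__":
    # Hypothetical values: 0.5 V supply, 40 fF column supply node, 2 kOhm switch.
    for c_boost in (5e-15, 10e-15, 20e-15, 40e-15):
        for t_share in (8e-12, 1e-9):
            v_col, v_boost = charge_share(0.5, 40e-15, c_boost, 2e3, t_share)
            print(f"CBOOST={c_boost:.0e} F, t={t_share:.0e} s: "
                  f"VCOL={v_col:.3f} V, V(CBOOST)={v_boost:.3f} V")
```

In this simplified picture, a shorter sharing window leaves VCOL higher and CBOOST less charged, which corresponds to a weaker assist, while a larger CBOOST lowers the voltage it can be precharged to, consistent with the diminishing returns described above.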

Fig. 8 shows a scatter plot of the energy overhead of assists versus Vmin improvement at SF corner, 0.5 V and −25 °C. Vmin improvement is calculated for a target failure probability of 1 × 10⁻⁹. When both assists are turned ON, at a weak setting, 150 mV of Vmin improvement is achieved with 18% energy overhead. Alternatively, when only negative BL is turned ON, at a strong setting, Vmin improvement is nearly 200 mV. Finally, when both assists are turned ON, at a strong setting, a Vmin improvement of more than 250 mV can be provided with 37% energy overhead. At the strongest assist setting, negative BL and VDD collapse magnitudes are 23% and 36%, respectively, and at the weak setting, assist magnitudes are nearly half of the strong setting. The modular design of the write assist provides a Vmin improvement versus energy overhead tradeoff by providing a range of assist magnitudes. In a system applying fine-grain DVFS, the modularity of assists can provide SRAM operation at the target voltage with minimum energy overhead from assists.

To demonstrate the effect of PVT conditions on the effectiveness of assists, Fig. 9 shows negative BL and VDD collapse magnitude as a percentage of supply voltage at two different voltages (0.5 and 0.9 V) and three different process corners (TT, SF, and FS). Both negative BL and VDD collapse magnitudes for assists stay relatively flat across these PVT conditions. It should be noted that these write assists are most needed at SF corner and low-voltage levels and can be turned OFF at other PVT conditions.

Fig. 9. Simulated negative BL and VDD collapse magnitude for assists plotted as a percentage of supply voltage level.

The application of negative BL and VDD collapse increases the stress on the cells that are not accessed but share the same BL and VCOL. Retention margin analysis is performed by using dc static noise margin (SNM) simulations at 0.5 V to quantify the amount of additional stress due to the applied assists. Fig. 10 shows the retention margin for a 5 sigma cell with varying assist strengths. Because of the collapsed supply voltage and positive gate-to-source voltage of access transistors during negative BL operation, retention margin degrades with increasing assist strength. However, it should be noted that in this analysis, the disturbance on the cells is exacerbated because of the dc simulation. In reality, transient collapse of VCOL and transient application of negative BL have a smaller effect on retention SNM. In silicon, we did not observe any retention failures down to 0.5 V with the available die from TT and SF process corners, which supports the dc SNM simulation results.

Fig. 10. 5 sigma retention margin characterized with dc SNM simulations with varying assist strengths at 0.5 V.

It is, of course, desirable to minimize the area overhead for assist circuits. Our configurable assist circuits incur only 8% area overhead at the 128 kbit macro level. Area overhead can be reduced if some of the configurability is not implemented. Most of the overhead is due to CBOOST (more than 4%) and header transistors P0 and P1 (around 2%). To minimize area spent on CBOOST and its integration with the rest of the circuits, as discussed above, a core PMOS transistor along with a metal mesh is used to implement CBOOST. If only the negative BL assist were implemented, the area overhead is estimated to be around 5%. However, it should be noted that this estimate assumes the same CBOOST size that is used in this design. To achieve the same Vmin as this design while using only negative BL, CBOOST needs to be larger along with its driver circuits and the area overhead in this case will be larger. Furthermore, if configurability of assists is necessary, a negative-BL-only design needs to be designed with this requirement, whereas our proposed design allows a wide configuration range for the activation of write assists by design.

III. CAPACITOR-BASED THRESHOLD MATCHING CIRCUIT FOR SENSE AMPLIFIERS

With the proposed assist circuits discussed in Section II, write-ability limited Vmin can be improved significantly. As shown in Fig. 2, read-ability is the second Vmin limiter in most of the process corners. Hence, we present an offset compensation scheme to improve read-ability Vmin and further improve overall SRAM Vmin across process corners.

The conventional sense amplifier used in 6 T SRAMs consists of two cross-coupled inverters with a footer device placed as a current source to enable the sense amplifier. The voltage differential on BLs is passed to the input/output terminals of the cross-coupled inverters, and then the sense amplifier is enabled. NMOS devices of the cross-coupled inverters are critical to amplify the initial differential. After this amplification, PMOS devices of the cross-coupled inverters turn ON and further amplify the differential to rail-to-rail voltage levels. Sense-amplifier offset is mostly affected by the threshold voltage mismatches between the NMOS devices of the cross-coupled inverters [20]. In this work, we are using capacitors to measure and store the threshold voltage information of the critical input transistors and then use the voltage across these capacitors to compensate for the threshold mismatches.

Fig. 11 shows the offset compensated sense-amplifier design. C0 and C1 are NMOS devices connected as capacitors at the source of input devices N0 and N1, respectively. When the sense amplifier is reset (pchgb = “0”), input/output nodes of the sense amplifier are precharged to VDD. During this time,

into the WL pulse and input differential is passed to the input of the sense amplifier after threshold information is stored on C0 and C1. It should be noted that as the ct0 and ct1 voltages approach VDD − VT, input transistors N0 and N1 go into the subthreshold region of operation and the current through these transistors gets smaller. Consequently, at the end of Phase A, the voltages of ct0 and ct1 will not be exactly VDD − VT,N0 and VDD − VT,N1, but they will be close to these voltage levels and will track the threshold voltages of the input transistors.
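The Phase A behavior described above, where ct0 and ct1 settle near VDD − VT without stopping exactly there, can be explored with a small numerical sketch. Everything in it is an assumption made for illustration rather than the authors' model: the drain current uses a generic smooth interpolation between subthreshold and strong inversion, the gate is held at VDD, the capacitor starts discharged, and the parameter values (`k`, `n`, `c_store`, `t_phase_a`) and function names are hypothetical.

```python
import math


def drain_current(overdrive, k=1e-3, n=1.5, v_thermal=0.026):
    """Smooth current model versus overdrive = V_GS - V_T (illustrative only).

    Exponential for overdrive << 0 (subthreshold), quadratic for overdrive >> 0.
    """
    u = overdrive / (2.0 * n * v_thermal)
    # Numerically safe ln(1 + exp(u)).
    softplus = u + math.log1p(math.exp(-u)) if u > 0 else math.log1p(math.exp(u))
    return 2.0 * n * k * (v_thermal * softplus) ** 2


def precharge_ct(v_t, vdd=0.9, c_store=5e-15, t_phase_a=1e-9, dt=1e-13):
    """Forward-Euler integration of the capacitor top-plate voltage during Phase A."""
    ct = 0.0  # assume the storage capacitor starts fully discharged
    for _ in range(int(round(t_phase_a / dt))):
        overdrive = (vdd - ct) - v_t  # gate at VDD, source at ct
        ct += drain_current(overdrive) * dt / c_store
    return ct


if __name__ == "__main__":
    vt_n0, vt_n1 = 0.40, 0.37  # hypothetical mismatched thresholds
    ct0, ct1 = precharge_ct(vt_n0), precharge_ct(vt_n1)
    print(f"ct0 = {ct0:.4f} V, ct1 = {ct1:.4f} V")
    print(f"stored difference = {(ct1 - ct0) * 1e3:.2f} mV "
          f"for a threshold mismatch of {(vt_n0 - vt_n1) * 1e3:.2f} mV")
```

Under these assumptions, the absolute level of each node depends on how long Phase A lasts, since subthreshold current keeps charging the capacitor slowly, but the difference between the two nodes stays very close to the threshold mismatch, which is the quantity the compensation relies on.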
Fig. 11. Sense-amplifier design using capacitors for input device threshold matching.

After Phase A, the en signal is asserted to start the sensing operation (Phase B). Assertion of the en signal pulls the nsrc low, which also pulls the ct0 and ct1 nodes low through capacitors C0 and C1. This causes the N0 and N1 devices to turn ON and amplify the input differential. However, the charge stored on C0 and C1 cannot provide enough current for the sense-amplifier outputs to reach rail-to-rail levels. After a delay set by the delay elements shown in Fig. 11, devices N3, N4, and N5 are turned ON and amplify the input/output nodes to rail-to-rail (Phase C). During this time, the top and bottom terminals of capacitors C0 and C1 are both driven to ground and completely discharged. This ensures that there is no residual charge left on C0 or C1 for the next sensing operation. Although not shown in Fig. 11, a latch is connected to the input/output terminals of the sense amplifier and stores the sense-amplifier output until the next access.
The work in [19] proposes two separate global signals for
their proposed sense-amplifier offset compensation (sae and
cap). In our design, we use a single sen signal which is simi-
lar to a conventional design. Each sense amplifier generates the
signal to end the offset compensation internally. This imple-
mentation not only avoids routing of an extra global signal
for sense-amplifier operation but also makes the timing of off-
set compensation independent and self-timed for each sense
amplifier. This relaxes the tracking requirements for the global
Fig. 12. Waveforms showing the operation of the sense amplifier during back- sen signal in our design. Our design also proposes to use a
to-back read operations. local PMOS device embedded into sense-amplifier layout to
precharge nsrc node. In [19], the precharging of bottom plates
the top terminals of capacitors (ct0 and ct1 in Fig. 11) are of capacitors is done by a global signal which need to drive two
precharged to VDD − VT,N 0 and VDD − VT,N 1 , respectively, capacitors for every sense amplifier along with parasitic loading
where VT,N 0 and VT,N 1 are threshold voltages of N0 and N1. across column i/o circuit. This large load can be significantly
In other words, N0 and N1 charge C0 and C1 to the point where large especially in designs with wide i/o and introduce a large
both N0 and N1 are on the verge of turning ON. This ensures delay for Phase A. A balancing NMOS device is also included
that driving strength of N0 and N1 will be mainly determined in our design between the top plates of capacitors. This device
by the input voltages at the gates of N0 and N1, not by their ensures that the top plates of the capacitors are pulled down
threshold voltages. together and the effect of a mismatch between N3 and N4 can
Fig. 12 shows the waveforms of various signals during the be minimized during offset compensation.
operation of the sense amplifier. During the phase denoted by In the operation of the offset compensated sense amplifier,
the letter A, read operation (by the assertion of WL signal) the duration of Phase B is important. If the length of Phase
as well as the sense-amplifier offset compensation operation B is very short, there might not be enough amplification from
start. During this phase, the sense amplifier is in reset mode N0 and N1, and offset compensation will be reduced. On the
and ct0 and ct1 are charged to VDD − VT for both N0 and N1. other hand, if Phase B is very long, the voltage on the float-
Also during this time, a differential is building between the ing input/output nodes can be partially or fully lost due to
BLs since WL is asserted. When column-select transistors are leakage paths. In this work, we chose this duration to be five
turned ON, this differential is passed to the input/output termi- back-to-back inverter delays such that overall offset reduction
nals of the sense amplifier. If the column-select transistors are is maximized. Another important consideration is the rise time
turned ON before ct0 and ct1 reach VDD − VT , input differen- of the node connected to the gates of N3–N5. The mismatch
tials will affect the sampling of threshold voltages. To prevent between the N3–N5 devices can adversely affect the differen-
this, column-select transistors are on-purposely turned ON later tial voltage at the input/output nodes, and a slower rise time
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

8 IEEE JOURNAL OF SOLID-STATE CIRCUITS

Fig. 13. Input referred offset voltage improvement with the proposed compen- Fig. 14. Die photograph of the test chip fabricated in 28 nm HP CMOS process.
sation scheme across SF and TT corners with VDD = 0.5 and temperature at
85 C.

can exacerbate this effect. Hence, a fast rising signal is ensured


at the input of devices N3–N5 to minimize the effect of their
variation on sense-amplifier offset.
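The threshold cancellation described for Phases A and B can be written out as an idealized first-order sketch. The symbols below are assumptions introduced for illustration and do not appear in the paper: $V_{in0}$ and $V_{in1}$ are the gate voltages of N0 and N1 at sensing time, $\Delta V$ is the drop coupled onto ct0 and ct1 when nsrc is pulled low (assumed equal on both sides), and $V_{ov}$ is the gate overdrive. Subthreshold conduction at the end of Phase A, noted in the text, is ignored here.

```latex
\begin{align}
  % End of Phase A: each capacitor top plate stores its own device threshold
  V_{ct0}^{A} &\approx V_{DD} - V_{T,N0}, &
  V_{ct1}^{A} &\approx V_{DD} - V_{T,N1} \\
  % Phase B: nsrc is pulled low and couples both top plates down by \Delta V
  V_{ct0}^{B} &\approx V_{DD} - V_{T,N0} - \Delta V, &
  V_{ct1}^{B} &\approx V_{DD} - V_{T,N1} - \Delta V \\
  % Gate overdrive of each input device: the threshold term cancels
  V_{ov,N0} &= V_{in0} - V_{ct0}^{B} - V_{T,N0} \approx V_{in0} - V_{DD} + \Delta V \\
  V_{ov,N1} &= V_{in1} - V_{ct1}^{B} - V_{T,N1} \approx V_{in1} - V_{DD} + \Delta V \\
  % The difference in drive depends only on the input differential
  V_{ov,N0} - V_{ov,N1} &\approx V_{in0} - V_{in1}
\end{align}
```

Under these assumptions the difference in drive between N0 and N1 is set by the input differential alone, which matches the qualitative claim in the text. Any residual offset would come from effects this sketch leaves out, such as incomplete settling in Phase A and mismatch in devices other than the input pair.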
Another consideration for the effectiveness of the offset com-
pensation is the mismatch between the capacitors due to process
nonidealities. It should be noted that the absolute value of the
capacitors is not critical in the proposed scheme. If the capacitor
sizes are large enough to provide the initial amplification of
the input differential and start to turn ON PMOS load devices,
the cross-coupled inverters will start to latch to the correct out-
put state. Hence, mismatch between the capacitors will not be
very critical as long as the size of the capacitors is chosen large
enough to provide the initial amplification.
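As a rough illustration of why cancelling the input-pair threshold mismatch reduces the offset without removing it entirely, the toy Monte Carlo below may help. It is not the postlayout simulation used in the paper. The Gaussian model, the two sigma values, and the `cancel_fraction` parameter are all assumptions chosen for illustration, and the numbers it produces are not meant to reproduce the reported improvements.

```python
import random
import statistics


def offset_sigma_mv(n_trials, sigma_input_mv, sigma_other_mv,
                    cancel_fraction, seed=1):
    """Standard deviation of a toy input-referred offset, in mV.

    sigma_input_mv : assumed sigma of the N0/N1 threshold mismatch
    sigma_other_mv : assumed sigma lumping every other mismatch source
    cancel_fraction: assumed share of the input-pair mismatch removed
                     by the stored capacitor voltage (0.0 = no compensation)
    """
    rng = random.Random(seed)
    offsets = []
    for _ in range(n_trials):
        dvt_input = rng.gauss(0.0, sigma_input_mv)
        dv_other = rng.gauss(0.0, sigma_other_mv)
        # Only the input-pair term is attenuated by the compensation.
        offsets.append((1.0 - cancel_fraction) * dvt_input + dv_other)
    return statistics.pstdev(offsets)


# Hypothetical sigma values, for illustration only.
SIGMA_INPUT_MV = 20.0
SIGMA_OTHER_MV = 10.0

conventional = offset_sigma_mv(20000, SIGMA_INPUT_MV, SIGMA_OTHER_MV, 0.0)
compensated = offset_sigma_mv(20000, SIGMA_INPUT_MV, SIGMA_OTHER_MV, 0.9)
print(f"conventional sigma: {conventional:.2f} mV")
print(f"compensated sigma:  {compensated:.2f} mV")
```

In this model the compensated offset is bounded below by the uncancelled sources, which is consistent with the attention the text gives to N3–N5 mismatch and capacitor sizing.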
Fig. 13 shows the offset reduction with the proposed
scheme for two different process corners. Sense-amplifier
input-referred offset is calculated based on Monte Carlo sim-
ulations using a postlayout extracted netlist and quoted as the
improvement with respect to the input-referred offset of a con-
ventional sense-amplifier calculated in the same manner. For
this figure, the supply voltage is at 0.5 V and temperature is at
85 °C. Offset compensation provides 48% and 49% lower input-
referred offset at SF and TT corners, respectively. It should
be noted that the proposed offset compensated sense-amplifier
increases the delay of the sense-amplifier stage because of the initial and final amplification phases. Specifically, because of the delay between Phases B and C, output resolution takes around 30% longer than in the conventional sense-amplifier design. However, because of the reduced offset, sense amplifiers can be enabled earlier into the WL pulse, which can reduce the overall delay from the input clock to the output data. Moreover, it should also be noted that the smaller swing on the BLs provides reduced BL switching energy across the macro. The capacitors and other devices added to the sense amplifier for offset compensation result in only a 3.2% area overhead for a 128 kbit memory macro. If the same area were used to increase the size of the sense-amplifier devices, the offset improvement would be less than half of the improvement achieved with the proposed offset compensation scheme.

IV. TEST CHIP OVERVIEW AND MEASUREMENT RESULTS

Configurable write-assist and sense-amplifier offset compensation ideas are demonstrated on a test chip fabricated in a 28 nm HP CMOS process (Fig. 14). Fig. 15 shows the architecture of the test chip. On each die, 2 Mbits of conventional SRAMs are placed along with 2 Mbits of SRAMs featuring the write assist and sense-amplifier offset compensation ideas. SRAM bits are partitioned into 16 macros of 256 rows by 520 columns (including two redundant columns) for both conventional and assisted RAMs. Along with SRAM macros, an on-chip sampling scope, MBIST engine, and JTAG interface are also placed on the test chip. The on-chip sampling scope is used to monitor WL pulsewidths, while the MBIST engine creates various standard test patterns used for product testing. Finally, the JTAG interface is used for off-chip communication.

Fig. 15. High-level architecture of the test chip. 2 Mbit of conventional and 2 Mbit of assisted RAMs are placed on the chip.

SRAM macros on the test chip are designed to be phase based, where the low phase of the clock is used to assert the WLs. In order to stress write-ability and read-ability problems without pushing operating frequency very high, an additional pin (wlg) is inserted for each macro that causes the WL pulsewidth to be shorter than the low phase of the clock. The on-chip sampling scope placed on the test chip is used to measure the WL pulsewidth observed on die by connecting it to one WL from the conventional and assisted SRAM macros.

Fig. 16. Oscilloscope screenshot when observing WL pulsewidth. Because of the small frequency difference between clk and sclk, a time expanded view of the WL pulsewidth can be displayed on the oscilloscope screen.
Fig. 17. Measured shmoo plots of the (a) conventional and (b) assisted SRAM macros showing improvement of write-ability errors with assists.

During the scope operation, MBIST is placed in a mode to continuously hit the address corresponding to the observed WL. The on-chip sampling scope uses a comparator clocked with a separately controlled sampling clock, sclk. By intentionally creating a slight frequency difference between clk and sclk from an off-chip instrument, sclk starts to “slide” through the WL pulse and effectively samples the WL voltage with small time steps. The entire WL pulse is sampled across many cycles, but as MBIST is placed in a continuous mode, WL is guaranteed to turn ON every cycle. At the output, this provides a time expanded view of the WL pulsewidth as shown in the oscilloscope screenshot in Fig. 16. The conversion factor between the displayed pulsewidth on the oscilloscope screen and the actual pulse on the chip is set by the frequency difference between sclk and clk. For the example in Fig. 16, this conversion factor is 1.418 ns/ms and, with clk running at 30 MHz, wlg is used to shorten the WL pulsewidth from half of the clock period (16.67 ns) to 4.57 ns.
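The time expansion described above can be sketched numerically. The model below is an assumption rather than the paper's stated method: it treats the scope as plain equivalent-time sampling with sclk close to clk, so each sample advances through the WL pulse by the difference of the two clock periods. The frequency offset `delta_f_hz` is back-calculated from the quoted 1.418 ns/ms under that model and is not a value given in the paper. The function names are hypothetical.

```python
def conversion_factor(f_clk_hz, f_sclk_hz):
    """On-chip time swept per unit of displayed (real) time.

    Each sample advances by |1/f_sclk - 1/f_clk| on chip and takes 1/f_sclk
    of real time, which reduces to |f_clk - f_sclk| / f_clk. Dimensionless.
    """
    return abs(f_clk_hz - f_sclk_hz) / f_clk_hz


def on_chip_width_s(displayed_s, factor):
    """Convert a pulsewidth read on the oscilloscope to on-chip time."""
    return displayed_s * factor


def half_period_s(f_clk_hz):
    """Low-phase duration of a 50% duty-cycle clock."""
    return 0.5 / f_clk_hz


F_CLK_HZ = 30e6                  # clk frequency quoted in the text
QUOTED_FACTOR = 1.418e-9 / 1e-3  # 1.418 ns/ms expressed as a pure ratio

# Assumed frequency offset that would give the quoted factor in this model.
delta_f_hz = QUOTED_FACTOR * F_CLK_HZ
factor = conversion_factor(F_CLK_HZ, F_CLK_HZ - delta_f_hz)

# Displayed width that corresponds to the 4.57 ns on-chip WL pulse.
displayed_s = 4.57e-9 / factor

print(f"assumed clk/sclk offset: {delta_f_hz:.2f} Hz")
print(f"half clock period: {half_period_s(F_CLK_HZ) * 1e9:.2f} ns")
print(f"displayed width for 4.57 ns on chip: {displayed_s * 1e3:.3f} ms")
```

Under this model, a frequency offset of a few tens of hertz at 30 MHz stretches a nanosecond-scale pulse to milliseconds on the oscilloscope, which is why the WL pulse can be observed with an off-chip instrument.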
Fig. 18. Measured WL pulsewidth versus correct sensing probability for the conventional and proposed sense-amplifier designs.

SRAMs from TT and SF process corners were tested. Fig. 17(a) and (b) show shmoo plots without and with assists, respectively. Without write assists, macros begin to see write failures as the supply voltage is scaled down. Turning ON both negative BL and VDD collapse assists results in more than 25% WL pulsewidth reduction at 0.5 V, enabling operation down to 0.5 V. The proposed write assists can provide even higher Vmin savings for larger die with a higher number of bits, as these will have more bit-cell samples and a longer tail for the Vmin distribution of cells.

Fig. 18 shows the measured reduction of WL pulsewidth by using the proposed capacitor-based input threshold matching sense amplifiers. With a smaller sense-amplifier offset, the WL pulsewidth can be shorter and a smaller differential between BL and BLB can be resolved correctly. For the experiment in Fig. 18, the WL pulsewidth is set to a very narrow pulse, and then it is incremented in small steps. For each WL pulsewidth, correct and erroneous bits are recorded, and the fraction of bits that are sensed correctly is reported as the correct sensing probability. For the same correct sensing probability of 99%, the proposed sense amplifiers provide more than 10% shorter WL pulsewidths at 0.5 V. These results agree with the simulated sense-amplifier offset improvement discussed in this paper. Because of the independent clock tree for clk and wlg signals and the skew introduced by them at each macro interface, a direct measurement of offset compensation across all macros is not possible. The plot in Fig. 18 shows the WL pulsewidth improvement of each macro independently. Finally, Table I summarizes specifications of the test chip.
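The pulsewidth sweep described for Fig. 18 can be post-processed along the lines of the sketch below. The function names and the sweep data are hypothetical placeholders, not measurements from the test chip. The sketch only shows how a minimum pulsewidth at a target correct sensing probability, and the relative reduction between two designs, might be extracted from such a sweep.

```python
def min_pulsewidth_ns(sweep, target=0.99):
    """Smallest WL pulsewidth whose correct sensing probability meets target.

    sweep is an iterable of (width_ns, n_correct, n_total) tuples, one per
    pulsewidth step. Returns None if no step reaches the target.
    """
    for width_ns, n_correct, n_total in sorted(sweep):
        if n_correct / n_total >= target:
            return width_ns
    return None


def percent_reduction(baseline_ns, improved_ns):
    """Relative pulsewidth reduction of improved_ns versus baseline_ns."""
    return 100.0 * (baseline_ns - improved_ns) / baseline_ns


# Placeholder sweep data for one macro (made up for illustration).
conventional_sweep = [(3.0, 9000, 10000), (3.5, 9700, 10000),
                      (4.0, 9920, 10000), (4.5, 9990, 10000)]
proposed_sweep = [(3.0, 9500, 10000), (3.5, 9910, 10000),
                  (4.0, 9980, 10000), (4.5, 9999, 10000)]

conv_ns = min_pulsewidth_ns(conventional_sweep)
prop_ns = min_pulsewidth_ns(proposed_sweep)
print(f"conventional: {conv_ns} ns, proposed: {prop_ns} ns")
print(f"reduction: {percent_reduction(conv_ns, prop_ns):.1f}%")
```

Because the text notes that clock-tree skew prevents a direct comparison across macros, a comparison of this kind would be applied per macro rather than to data pooled from all macros.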
TABLE I
CHIP SPECIFICATIONS

V. CONCLUSION

This work presents a highly configurable write-assist implementation along with a sense-amplifier offset compensation scheme in a 28 nm HP CMOS process. The assisted design extends SRAM Vmin down to 0.5 V with sense-amplifier offset compensation providing 10% reduction in WL pulsewidth. The configurable nature of the assists allows for easy selection of assist techniques to be used, and also enables adjustment of the strength of the assist circuits. These settings can be altered even on a cycle-by-cycle basis to provide maximum flexibility. As the process, voltage, and temperature (PVT) conditions create a multidimensional design space across which SRAM functionality needs to be assured, the energy cost of assist circuits can be minimized with these configurable assist techniques. Along with global process monitors and local temperature and supply voltage monitoring circuits, a system-level hierarchical control scheme can be created to decide whether assists are necessary or not for the current PVT conditions. Moreover, a programmable lookup table can be created to decide which PVT conditions require which assists to be turned ON and with what strength. This information can be transmitted to each macro from the local monitoring and control circuits synchronously, and assist schemes allowing cycle-by-cycle adjustments can respond very quickly. Alternatively, a simpler approach can involve stopping accesses and making adjustments to the assist settings when system-level PVT conditions are changed. These methods can provide maximum energy efficiency at the system level by providing SRAM Vmin improvement by only the necessary amount.

REFERENCES

[1] NVIDIA Corp. (2015, Jan.). NVIDIA Tegra X1 White Paper [Online]. Available: http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf
[2] International Solid State Circuit Conference. (2013, Nov. 1). ISSCC 2014 Tech Trends [Online]. Available: http://isscc.org/doc/2014/2014_Trends.pdf
[3] R. Kan et al., “The 10th generation 16-core SPARC64 processor for mission critical UNIX server,” IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 32–40, Jan. 2014.
[4] S. Rusu et al., “Ivytown: A 22 nm 15-core enterprise Xeon processor family,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 102–103.
[5] P. Li et al., “A 20 nm 32-core 64 MB L3 cache SPARC M7 processor,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2015, pp. 72–73.
[6] T. Burd and R. Brodersen, “Design issues for dynamic voltage scaling,” in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), 2000, pp. 9–14.
[7] V. Gutnik and A. Chandrakasan, “Embedded power supply for low-power DSP,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 4, pp. 425–435, Dec. 1997.
[8] A. P. Chandrakasan et al., “Technologies for ultradynamic voltage scaling,” Proc. IEEE, vol. 98, no. 2, pp. 191–214, Feb. 2010.
[9] K. Zhang et al., “A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 146–151, Jan. 2006.
[10] M. Yamaoka et al., “Low-power embedded SRAM modules with expanded margins for writing,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 480–481.
[11] E. Karl et al., “A 0.6 V 1.5 GHz 84 Mb SRAM design in 14 nm FinFET CMOS technology,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2015, pp. 310–311.
[12] K. Nii et al., “A 45-nm single-port and dual-port SRAM family with robust read/write stabilizing circuitry under DVFS environment,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2008, pp. 212–213.
[13] Y. Fujimura et al., “A configurable SRAM with constant-negative-level write buffer for low-voltage operation with 0.149 µm² cell in 32 nm high-k metal-gate CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2010, pp. 348–349.
[14] T. Song et al., “A 14 nm FinFET 128 Mb 6T SRAM with VMIN-enhancement techniques for low-power applications,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 232–233.
[15] Y.-H. Chen et al., “A 16 nm 128 Mb SRAM in high-K metal-gate FinFET technology with write-assist circuitry for low-VMIN applications,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 238–239.
[16] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, “Yield and speed optimization of a latch-type voltage sense amplifier,” IEEE J. Solid-State Circuits, vol. 39, no. 7, pp. 1148–1158, Jul. 2004.
[17] Y. Sinangil and A. P. Chandrakasan, “A 128 kbit SRAM with an embedded energy monitoring circuit and sense amplifier offset compensation using body biasing,” IEEE J. Solid-State Circuits, vol. 49, no. 11, pp. 2730–2739, Nov. 2014.
[18] B. Giridhar, N. Pinckney, D. Sylvester, and D. Blaauw, “A reconfigurable sense amplifier with auto-zero calibration and pre-amplification in 28 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 242–243.
[19] A. Kawasumi et al., “Energy efficiency deterioration by variability in SRAM and circuit techniques for energy saving without voltage reduction,” in Proc. IEEE Int. Conf. IC Des. Technol. (ICICDT), May 2012, pp. 1–4.
[20] S. J. Lovett, G. A. Gibbs, and A. Pancholy, “Yield and matching implications for static RAM memory array sense amplifier design,” IEEE J. Solid-State Circuits, vol. 35, no. 8, pp. 1200–1204, Aug. 2000.

Mahmut E. Sinangil (S’06–M’12) received the B.Sc. degree in electrical and electronics engineering from Bogazici University, Istanbul, Turkey, in 2006, and the S.M. and Ph.D. degrees in electrical engineering and computer science from Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2008 and 2012, respectively.
From 2012 to 2015, he was a Senior Research Scientist with the Circuits Research Group, NVIDIA, Durham, NC, USA. In 2015, he joined TSMC North America where he is currently a Technical Manager. His research interests include low-power and high-density memory circuit design with a focus on low-voltage operation and application specific circuit optimizations.
Dr. Sinangil was the recipient of the Ernst A. Guillemin Thesis Award from MIT for his Master’s thesis in 2008, the 2006 Bogazici University Faculty of Engineering Special Student Award, and corecipient of the 2008 A-SSCC Outstanding Design Award.

John W. Poulton (M’85–SM’90–F’12) received the B.S. degree from Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, in 1967, the M.S. degree from the State University of New York, Stony Brook, NY, USA, in 1969, and the Ph.D. degree from the University of North Carolina, Chapel Hill (UNCCH), NC, USA, in 1980, all in physics.
From 1981 to 1999, he was a Researcher with the Department of Computer Science, UNCCH, where from 1995 he held the rank of Research Professor. From 1999 to 2003, he was a Chief Engineer with
Velio Communications, Milpitas, CA, USA. From 2003 to 2009, he was a Technical Director with Rambus, Inc., Chapel Hill, NC, USA. Currently, he is a Senior Scientist with NVIDIA, Inc., Durham, NC, USA. His research interests include VLSI-based architectures for graphics and imaging, design and construction of the pixel-planes and PixelFlow computer graphics systems.

Matthew R. Fojtik received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2008, 2010, and 2013, respectively.
He joined NVIDIA, Durham, NC, USA, as a member of the Circuits Research Group and is currently a member of NVIDIA’s ASIC/VLSI Research Group. His research interests include timing margin reduction techniques, clocking and synchronization, low power on-chip communication, and efficient VLSI methodologies.

Thomas H. Greer III received the B.S. degree in mathematics and physics from the University of the South, Sewanee, TN, USA, in 1984, and the M.S. degree in computer science from the University of North Carolina, Chapel Hill, NC, USA, in 1988.
He is currently with NVIDIA Corporation, Durham, NC, USA. His research interests include efficient movement of data and pumpkins.

Stephen G. Tell was born in New Jersey, USA, in 1967. He received the B.S.E. degree in electrical engineering from Duke University, Durham, NC, USA, in 1989, and the M.S. degree in computer science from the University of North Carolina, Chapel Hill, NC, USA, in 1991.
He was a Senior Research Associate with UNC/Chapel Hill, Chapel Hill, NC, USA, from 1991 to 1999, worked on parallel graphics systems and high speed signaling, and in 1999 joined Chip2Chip Inc., San Jose, CA, USA. From 2003 to 2009, he worked with Rambus, Inc. In 2009, he joined NVIDIA, Durham, NC, USA, as a member of the Circuits Research Group. His research interests include custom circuit design and the surrounding logic for intra- and interchip communication.

Andreas J. Gotterba (S’02–M’05) received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2003, and the M.Eng.Sc. degree in photovoltaics from the University of New South Wales, Sydney, Australia, in 2004.
From 2005 to 2009, he worked with Novelics LLC, Aliso Viejo, CA, USA. In 2009, he joined NVIDIA, Santa Clara, CA, USA. His research interests include SRAMs and other custom circuits, particularly low-power and handshaking designs.

Jesse Wang (M’14) received the B.S. degree in computer engineering from the University of California, Irvine, CA, USA, in 2006, and a graduate certificate in electronic circuits from Stanford University, Stanford, CA, USA, in 2010.
From 2005 to 2009, he worked with Novelics LLC, Aliso Viejo, CA, USA. In 2009, he joined NVIDIA Corporation, Santa Clara, CA, USA, as a member of the Custom Circuit Design Team. His research interests include driving custom implementation for CPU L2 data caches.

Jason Golbus received the B.S. degree in electrical engineering from Duke University, Durham, NC, USA, in 1996, and the M.S. degree in electrical engineering from the University of California, Berkeley, CA, USA, in 1998.
He joined NVIDIA in 2009 and has been leading the SRAM custom design team for several generations of Tegra processors.

Brian Zimmer (S’09–M’15) received the B.S. degree in electrical engineering from the University of California at Davis, Davis, CA, USA, in 2010, and the M.S. and Ph.D. degrees in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA, in 2012 and 2015, respectively.
He is currently with the Circuits Research Group, NVIDIA Corporation, Santa Clara, CA, USA. His research interests include energy-efficient digital design, with an emphasis on low-voltage SRAM design and variation tolerance.

William J. Dally (M’80–SM’01–F’02) received the B.S. degree in electrical engineering from Virginia Tech, Blacksburg, VA, USA, in 1980, the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 1981, and the Ph.D. degree in computer science from Caltech, Pasadena, CA, USA, in 1986.
He is a Chief Scientist and Senior Vice President of Research with NVIDIA Corporation, Durham, NC, USA, and a Professor (Research) and Former Chair of Department of Computer Science, Stanford University, Stanford, CA, USA. He currently leads projects on computer architecture, network architecture, circuit design, and programming systems. He has authored over 200 papers in these areas, holds over 90 issued patents, and authored textbooks Digital Design: A Systems Approach, Digital Systems Engineering, and Principles and Practices of Interconnection Networks.
Dr. Dally is a member of the National Academy of Engineering, a fellow of the ACM and American Academy of Arts and Sciences. He was the recipient of ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, and the ACM Maurice Wilkes Award.

C. Thomas Gray (M’89) received the B.S. degree in computer science and mathematics from Mississippi College, Clinton, MS, USA, in 1988, and the M.S. and Ph.D. degrees in computer engineering from North Carolina State University, Raleigh, NC, USA, in 1990 and 1993, respectively.
From 1993 to 1998, he was an Advisory Engineer with IBM, Research Triangle Park, NC, USA, working in the area of transceiver design for communication systems. From 1998 to 2004, he was a Senior Staff Design Engineer with the Analog/Mixed Signal Design Group, Cadence Design Systems, Bracknell, UK, working on SerDes system architecture. From 2004 to 2010, he was Consultant Design Engineer with Artisan/ARM, San Jose, CA, USA, and Technical Lead of SerDes architecture and design. In 2010, he joined Nethra Imaging, Cupertino, CA, USA, as a System Architect, and in 2011 he joined the Circuits Research Group, NVIDIA, Durham, NC, USA, where he currently serves as Director of Circuit Research and leads activities related to high-speed signaling, low-energy memories, variation tolerant clocking, and power delivery. His research interests include digital signal processing design and CMOS implementation of DSP blocks as well as high-speed serial link communication systems, architectures, and implementation.