(Derivation, garbled in extraction: the write-operation sizing condition, relating V_DD = 2.5 V and V_T = 0.4 V to the W/L ratios of the access and pull-up transistors.)
Shahin Nazarian/EE577A/Spring 2012
Write Operation (Cont.)
SRAM Column Write
SRAM Sizing Summary
Bit lines are high during read; they must not overpower the cell's inverters, so the nMOS pull-down transistors are made strong enough to hold the storage nodes down
During write, however, the bit lines must overpower the cell, so the pMOS pull-up transistors are made weak
(Figure: 6T cell with storage nodes A and A_b, bit and bit_b lines, and word line; pull-downs labeled strong, pull-ups weak, access transistors med.)
Typical SRAM Transistor Sizes
Transistors may be sized as follows:
nMOS pulldown: M1, M2: 6:2
pMOS pullup: M5, M6: 4:3
Access transistors: M3, M4: 4:2
All boundaries are shared, which reduces the write delay
One may also use equal-size transistors in the SRAM cell (e.g., 4:2 for all); however, this should be carefully checked, as this sizing is not conservative and may not work in all scenarios
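These ratios can be sanity-checked numerically. A minimal Python sketch; the "cell ratio" and "pull-up ratio" names are conventional design-margin terms, not taken from the slide:

```python
# W:L sizes quoted above, as (W, L) pairs.
sizes = {"pulldown": (6, 2), "pullup": (4, 3), "access": (4, 2)}
ratio = {name: w / l for name, (w, l) in sizes.items()}

# Cell ratio (pull-down vs. access strength) governs read stability;
# pull-up ratio (pull-up vs. access strength) governs write-ability.
cell_ratio = ratio["pulldown"] / ratio["access"]    # 1.5: pull-down wins a read
pullup_ratio = ratio["pullup"] / ratio["access"]    # ~0.67: access wins a write

print(ratio, cell_ratio, pullup_ratio)
```

The equal-size (4:2) variant collapses both ratios to 1.0, which is why the slide warns it is not conservative.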
(Figure: cell schematic with WL, Bit Line, and Bit_bar Line, showing transistors M1-M5 and the access devices; caption: "Yet a different layout".)
Example Design of a 256Kbit SRAM Array
2 macro-blocks (aka 2 banks), each with 256 rows and 512 columns (total 2^18 bits)
Want to access a double-word (2^6 = 64 bits) at a time
Need 12 address lines: A0, ..., A11
The 4 LSBs (A0, ..., A3) are used for column addressing, while the other 8 MSBs (A4, ..., A11) are used for row addressing
(Block diagram: the row Decoder drives the SRAM Cells of Macro#1 and Macro#2; 1024 bit lines feed a Read Multiplexer into 64 Sense Amplifiers, an Output Buffer, and a DFF; a Write Multiplexer and Write Circuit handle writes; a 4:16 Column Decoder and a Control Circuit generate wldummy, precharge, Read_write, Write_en, and Sense_en from Addr and clk.)
Need 8:256 row decoder, 4:16 column decoder, and 16:1
Read and Write Multiplexers
Use 64 sense amplifiers
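The organization above can be verified with a few lines of Python; this is only a sketch of the slide's bookkeeping:

```python
import math

# Organization of the 256 Kbit example array (values from the slide).
banks, rows, cols = 2, 256, 512
total_bits = banks * rows * cols            # 2**18 = 256 Kbit

word_bits = 64                              # one double-word (2**6 bits) per access
words = total_bits // word_bits             # addressable double-words
addr_lines = int(math.log2(words))          # 12 address lines A0..A11

col_addr_bits = 4                           # A0..A3 -> 4:16 column decoder
row_addr_bits = addr_lines - col_addr_bits  # A4..A11 -> 8:256 row decoder

total_cols = banks * cols                   # 1024 bit-line pairs across both macros
mux_ratio = 2 ** col_addr_bits              # 16:1 read/write multiplexers
sense_amps = total_cols // mux_ratio        # one sense amplifier per output bit

print(total_bits, addr_lines, row_addr_bits, sense_amps)  # 262144 12 8 64
```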
Sense Amplifier
The bit line capacitance is significant for large arrays
If each cell contributes 2 fF, with 256 cells per column we get 512 fF plus wire capacitance. Pull-down resistance is about 15 kOhm, so the RC delay would be 5.3 ns if we waited for a full swing (dV = V_DD)
We cannot easily change R, C, or V_DD, but we can change dV!
It is possible to reliably sense a dV as small as 50 mV
With margin for noise, most SRAMs sense bit-line swings between 100 and 300 mV
For writes, we still need to drive the bit line to full swing; only one driver needs to be this big
Use SPICE sweep function to
optimize transistor sizes
(typically, Q0, Q5, Q6 are
minimum-size transistors)
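A rough Python estimate of why reduced-swing sensing pays off. The linear-discharge approximation for the small swing is an assumption of this sketch, not a claim from the slide:

```python
# Bit-line numbers from the slide: 256 cells x 2 fF, ~15 kOhm pull-down.
C_bit = 256 * 2e-15      # 512 fF (wire capacitance ignored)
R_pd = 15e3              # cell pull-down resistance, ohms
V_DD = 2.5

# Waiting for a full swing costs an RC delay of about 5.3 ns:
t_full = 0.69 * R_pd * C_bit

# For a small swing dV the discharge current is roughly V_DD / R_pd,
# so developing dV takes only about C_bit * dV / (V_DD / R_pd):
dV = 0.2                 # 200 mV, inside the 100-300 mV sensing range
t_sense = C_bit * dV / (V_DD / R_pd)

print(f"full swing {t_full*1e9:.1f} ns, 200 mV swing {t_sense*1e9:.2f} ns")
```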
(Schematic: sense amplifier with pMOS isolation transistors Q1 and Q2 between bit/bit_bar and the regenerative amplifier Q3, Q4, Q5, Q6 with tail device Q0; outputs out and out_bar; all gated by sense_en.)
Sense Amplifier Waveforms
(Waveform plot: bit, bit_bar, out, out_bar, and sense_en during a read.)
Comments about the Sense Amp Design
Isolation transistors must be pMOS: the bit lines stay within 0.2 V of V_DD, which is not enough to turn an nMOS transistor ON
The loads on the outputs of the regenerative amplifier must be equal
Need to precharge the sense amp before opening the isolation transistors, to avoid discharging the bit lines
(Schematic: cross-coupled NAND latch with inputs Out and Out_bar and outputs Data and Data_bar.)
Both outputs go high during precharge, so the regenerative amplifier is usually followed by a cross-coupled NAND latch
Requires 3 timing phases; typically self-timed
Precharge and Write Circuitry
Recall that M3 and M5 denote the nMOS access transistor and the pMOS pullup transistor inside the SRAM cell (on the bit line side)
For a successful write operation, R3 + R9 + R7 should be < R5
Let R* denote the resistance of a 2:2 nMOS transistor, and let mu_n/mu_p = 2
If M3 = 4:2 and M5 = 4:3, then R5 = 1.5 R* and R3 = 0.5 R*; therefore, R9 + R7 should stay below R*
Making M9 and M7 16:2 each gives R9 + R7 = 0.25 R*, which satisfies the condition with margin
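Because the extracted numbers on this slide are partly garbled, the following Python sketch rederives the relative resistances from the stated assumptions only: R proportional to L/W, normalized to a 2:2 nMOS, with a pMOS of equal geometry twice as resistive (mu_n/mu_p = 2):

```python
# Relative resistances normalized to R* (a 2:2 nMOS).
def r_nmos(w, l):
    return l / w            # 2:2 -> 1.0, i.e., one R*

def r_pmos(w, l):
    return 2 * l / w        # mobility penalty of 2

R3 = r_nmos(4, 2)           # access transistor M3 (4:2)      -> 0.5 R*
R5 = r_pmos(4, 3)           # pull-up M5 (4:3)                -> 1.5 R*
R9 = R7 = r_nmos(16, 2)     # write-path devices M9, M7 (16:2) -> 0.125 R* each

# Write condition from the slide: the pull-down path must beat the pull-up.
assert R3 + R9 + R7 < R5    # 0.75 R* < 1.5 R*
print(R3, R5, R7 + R9)
```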
Row Decoder
Another example of pre-decoding: decode the address in octal groups
One-level decoding of a 9-bit address (A8 A7 ... A0) requires 512 nine-input AND gates
Instead, predecode (A2 A1 A0), (A5 A4 A3), and (A8 A7 A6) using 3 x 2^3 = 24 three-input NAND gates, followed by 8^3 = 512 three-input NOR gates
Two implementations of a 4:16 decoder:
One requires 16 four-input AND gates
The other requires 8 + 16 = 24 two-input NAND/NOR gates
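The gate-count trade-off can be tabulated in a few lines of Python, as a sketch of the counts quoted above:

```python
# Gate counts for one-level vs. predecoded row decoding.
def one_level(n_bits):
    return 2 ** n_bits                 # one n-input AND gate per output

# Predecode the 9-bit address in octal groups (A2A1A0), (A5A4A3), (A8A7A6):
nand3_count = 3 * 2 ** 3               # 24 three-input NAND predecoders
nor3_count = 8 ** 3                    # 512 three-input NOR combiners

print(one_level(9))                    # 512 nine-input AND gates
print(nand3_count, nor3_count)         # 24 + 512 gates, but all only 3-input

# 4:16 decoder: either 16 four-input ANDs, or 8 + 16 = 24 two-input gates.
assert one_level(4) == 16
```

The predecoded version wins not on raw gate count but on fan-in: no gate needs more than three inputs.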
4-to-1 Tree Multiplexer for Read Circuitry
(Schematic: 4-to-1 tree multiplexer built from pMOS pass transistors selecting among BL0, BL1, BL2, and BL3.)
(Schematic: one column slice of the array. Precharge devices sit on the bit lines above an SRAM Cell; a Write-mux transistor (gated by Write-en, driven by data) and a Read-mux transistor (feeding the Sense-en'd sense amplifier) connect to the column; the row Decoder (8:256, inputs A4...A11) and the 4:16 column Decoder (inputs A0...A3) generate the selects; the 15 remaining Write_mux and 15 remaining Read_mux transistors of the 16:1 mux are indicated, with device widths annotated on each transistor.)
Column Mux
We have 16 read_mux and 16 write_mux transistors in parallel
During a read operation, one of the read muxes is selected according to the values of A0, ..., A3; that column is connected to the sense amplifier, and the corresponding SRAM cell value appears at the sense amplifier output
During a write operation, the desired SRAM cell is selected and the data is written into it
The drawing must be replicated for the bit_bar side
A total of 64 such structures is needed, forming a 16:1, 64-bit-wide column mux
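A quick consistency check of the column-mux bookkeeping in Python:

```python
# Column-mux bookkeeping for the 256 Kbit array.
mux_inputs = 16                       # 16:1 read mux and 16:1 write mux
output_bits = 64                      # double-word access width
bit_pairs = mux_inputs * output_bits  # bit/bit_bar pairs served

# Matches the array: 2 macros x 512 columns = 1024 bit-line pairs.
assert bit_pairs == 2 * 512
# Everything is duplicated for the bit_bar side, doubling the device count.
print(bit_pairs, 2 * bit_pairs)
```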
SRAM Array Floor plan
(Floor plan: the Decoder and Control block sit between SRAM Cells Macro 1 and SRAM Cells Macro 2, each r rows x c columns; below them are Precharge strips, the 16:1 Column Mux, the Sense Amplifier, Output buffer 1, Output buffer 2, and the Output Flip-Flop, producing the 64-bit output. Annotated dimensions include 10240, 20480, 10000, 1900, 670, 270, 160, 128, 82, 40, 30, and 20; the unit label was lost in extraction.)
Example Read Delay Calculation for an SRAM Array
Consider a 256x512 SRAM core. Bit lines are precharged to V_DD = 2.5 V before each read operation. A read operation is complete when the bit line has discharged by 0.25 V. A memory cell can provide 0.25 mA of pull-down current to discharge the bit line. Assume the word line resistance is 2 Ohm per memory cell and the word line capacitance is 20 fF per memory cell, while the bit line capacitance is 12 fF per cell. Ignore the bit-line resistance and the read-mux transistor. Calculate the worst-case read delay for this SRAM. Assume row decoding takes 3 ns, while the sense amplifier and output buffer together take 1 ns.
Solution: Each word line drives N = 512 SRAM cells. The RC (Elmore) delay for driving the furthest cell is:

t_row = 0.69 x Sum_{k=1..N} (Sum_{j=1..k} R_cell) x C_cell = 0.69 x R_cell x C_cell x N(N+1)/2
      = 0.69 x 2 Ohm x 20 fF x (512 x 513 / 2) = 3.7 ns

The time needed to discharge the bit or bit_bar line by 250 mV is:

t_col = C_col x dV / I_col = (256 x 12 fF x 0.25 V) / 0.25 mA = 3.1 ns

t_access = t_dec + t_row + t_col + t_sen+buf = 3 + 3.7 + 3.1 + 1 = 10.8 ns
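The arithmetic can be replayed in Python. Note the slide rounds the word-line term to 3.7 ns, so its total is 10.8 ns, while the unrounded sum is about 10.7 ns:

```python
# Replaying the read-delay example.
N = 512                  # cells per word line
R_cell, C_cell = 2.0, 20e-15

# Elmore delay of the distributed word line to the furthest cell:
t_row = 0.69 * R_cell * C_cell * N * (N + 1) / 2   # ~3.6 ns (slide rounds to 3.7)

C_col = 256 * 12e-15     # bit-line capacitance
dV, I = 0.25, 0.25e-3    # required swing (V) and cell pull-down current (A)
t_col = C_col * dV / I   # ~3.1 ns

t_dec, t_sen_buf = 3e-9, 1e-9
t_access = t_dec + t_row + t_col + t_sen_buf       # ~10.7 ns (slide: 10.8 ns)
print(f"{t_access * 1e9:.1f} ns")
```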
SRAM Scaling Challenges
For cell stability, separate power rails for the cell array vs. the word line driver may be needed (bad for leakage)
Reduced read and write margins as we scale voltages
Increased transistor leakage (high-k gate dielectric)
Introduction of various power management modes: reduced V_DD, raised V_SS
Soft error immunity
Low standby power
Optional: Leakage Currents in the SRAM Cell
(Schematic: 6T cell M1-M6 with one storage node at 0 and the other at V_DD, showing the leakage components I_gate1, I_sub2, I_sub3, and I_sub5.)

I_leak = I_gate1 + I_sub3 + I_sub2 + I_sub5
I_sub3 + (I_sub2 + I_sub5) = I_bitline,leak + I_cell,leak
Note that I_leak is dominated by the drain-source leakage in 90 nm CMOS technology (i.e., we may ignore gate leakage and other leakage mechanisms, which are small compared to the sub-threshold conduction currents).
Optional: Bitcell Stability Failures
Read Access Failure
The WL activation period is too short for a pre-specified dV to develop between the bit and bit_bar lines, so the sense amplifier cannot be triggered correctly during read
This may occur due to an increase in V_t of the pass-gate or pull-down transistors
Read Stability Failure
The cell may flip if the "0" storage node rises above the trip voltage of the other inverter during a read
To quantify the bitcell's robustness against this failure, SNM is the most commonly used metric
Notice that a read stability failure can occur any time the WL is enabled, even if the bitcell is not being accessed for a read or write operation
SNM-related failures are the limiter for V_DD scaling, especially after accounting for device degradation due to hot-electron effects and negative bias temperature instability
Optional: Read Stability Failure in the SRAM Cell
A read stability (a.k.a. hold) failure occurs when stored data flips during the memory standby mode (while WL is enabled)
A cell's VTC is composed of the two inverters' VTCs, which enclose two regions
The cell's hold stability is characterized by the static noise margin (SNM), measured as the diagonal length of the largest square that fits in the enclosed region (derivation omitted)
(Plot: SNM butterfly curves formed by the ideal and actual VTCs of the two inverters, axes V_in,L/V_out,L and V_in,R/V_out,R from 0 to 1 V, with the SNM squares marked.)
The SNM butterfly curves must be analyzed for different process corners (FS: fast NMOS, slow PMOS; SF: slow NMOS, fast PMOS) and different temperatures
Optional: Bitcell Stability Failures (Cont.)
Write Stability (or Write-ability) Failure
The internal "1" storage node may not be reduced below the trip point of the other inverter during the WL activation period
One way to quantify a cell's write stability is the write trip voltage, or write margin (WM): the maximum bit line voltage at which the bitcell flips state (assuming the bit line is pulled to GND by the line driver)
Data Retention Failure
When V_DD is reduced to the Data Retention Voltage, all six transistors in the SRAM cell operate in the subthreshold region and hence show strong sensitivity to variations
The PMOS transistor must provide enough current to compensate for leakage in the NMOS pull-down and access transistors
Due to L and V_T variations, the data retention current may not be sufficient to compensate for the leakage current
Optional: Minimum Voltage Needed to Preserve Data
The Data Retention Voltage (DRV) is defined as the minimum V_DD under which the data in an SRAM cell is still preserved
When V_DD is reduced to the DRV, all transistors are in the sub-threshold region, so SRAM data retention strongly depends on the sub-V_T current conduction behavior (i.e., leakage)
Cell leakage is greatly reduced at the DRV
This provides a highly effective leakage suppression scheme for standby mode: maximum leakage saving with minimum design overhead
(Figures: distribution of DRV in a 0.13 um CMOS process with 3-sigma variations in V_T and L; measured SRAM leakage current.)
Optional: DRV of SRAM (Cont.)
When V_DD scales down to the DRV, the voltage transfer curves (VTCs) of the internal inverters degrade to such a level that the static noise margin (SNM) of the SRAM cell is reduced to zero
The temperature coefficient of the DRV is 0.169 mV/C, which implies an increase of 12.3 mV in DRV when the temperature rises from 27 C to 100 C
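The quoted temperature shift follows directly from the coefficient:

```python
# DRV shift implied by the quoted temperature coefficient.
coeff = 0.169            # mV per degree C
dT = 100 - 27            # temperature rise in degrees C
delta_drv = coeff * dT   # mV
print(round(delta_drv, 1))   # 12.3 mV, as stated
```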
DRV Condition: DRV = V_DD when

(dV_2/dV_1)|Left inverter = (dV_2/dV_1)|Right inverter

i.e., the two inverter VTCs become tangent and the SNM collapses to zero.

(Plot: VTC_1 and VTC_2 of the SRAM cell inverters, V_1 versus V_2 over 0-0.4 V, shown for V_DD = 0.4 V and V_DD = 0.18 V.)
(Schematic: 6T cell M1-M6 with storage nodes V_1 and V_2 and the leakage current path marked.)
Optional: Soft Error Rate for the SRAM Cell
A high-energy alpha particle or an atmospheric neutron striking a capacitive node deposits charge, leading to a time-varying current injection at that node
For atmospheric neutrons, the injected current pulse is:

I(Q_s, t) = (2 Q_s / (T sqrt(pi))) x sqrt(t/T) x exp(-t/T)

If the collected charge Q_s exceeds some critical charge level Q_crit, it will upset the bit value and cause a soft error
Soft Error Rate (SER) in SRAM: SER is proportional to exp(-Q_crit / Q_s)
Q_crit is 10 fC in a 65 nm CMOS process
(Plot: injected current I(Q,t) in uA versus time in ps, over 0-200 ps with a y-axis of 0-140 uA.)
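As a sanity check on the pulse model, integrating I(Q_s, t) over all time should return exactly the collected charge Q_s. A Python sketch; the time constant T here is an illustrative value, not taken from the slide:

```python
import math

def strike_current(q_s, t, T):
    """Neutron-strike current pulse: (2 Qs / (T sqrt(pi))) sqrt(t/T) exp(-t/T)."""
    return 2 * q_s / (T * math.sqrt(math.pi)) * math.sqrt(t / T) * math.exp(-t / T)

q_s = 10e-15   # collected charge: 10 fC, the quoted 65 nm Q_crit
T = 20e-12     # pulse time constant (illustrative assumption)

# Left-Riemann integration of the pulse out to 2 ns (>> T):
dt = 1e-14
collected = sum(strike_current(q_s, k * dt, T) * dt for k in range(1, 200_000))
print(collected / q_s)   # ~1.0: the pulse delivers exactly Q_s of charge

# SER falls exponentially as Q_crit grows relative to Q_s:
ser_scale = math.exp(-10e-15 / q_s)
```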
Optional: How to Mitigate the SER Fail Rate
To mitigate soft errors, several radiation-hardening techniques can be implemented:
Process technology changes (e.g., SOI technology)
Circuit design (e.g., adding capacitance, using larger transistors, memory word interleaving)
Architecture (e.g., parity, error correction codes)
Good news: SER per bit tends to decrease with scaling