
Process Variations and Mitigation Techniques in Deep Submicron Devices

Micron technology ==> 1 um, 2 um, 3 um, etc.
Sub-micron technology ==> 0.8 um, 0.6 um, 0.35 um, 0.25 um, etc.
Deep sub-micron technology ==> 0.18 um, 0.13 um
Nanotechnology ==> 90 nm, 65 nm, etc.
Contents
Introduction

Classification of process variation

Spatial Variations & Temporal Variations

Effect of Process Variations

Mitigation Techniques

Conclusion
Introduction
Successful IC design requires a complicated optimization across several parameters, e.g. area, speed, power dissipation, and testability.

Traditional design assumes that a device's electrical and physical properties are deterministic and constant over the device lifetime.

In practice, device characteristics do not behave deterministically, and the fluctuation in device parameters becomes serious as devices move into the sub-100 nm regime (deep submicron region).

A change in feature size caused by limitations of the fabrication process is called process variation (PV), e.g. a drawn length L is fabricated as L ± ΔL.

PV arises from limitations of fabrication, chiefly sub-wavelength lithography and the etching process.
Device Scaling Trend

Figure: Scaling trend of device feature size and optical wavelength of lithography process.
Sources of Process Variations
Optical lithography has been used to fabricate ICs for the last three decades.

Below the 65 nm node, feature sizes are smaller than the wavelength of the light used, which makes it difficult to print the design accurately.

Some processes and corresponding variations are:


Wafer: topography, reflectivity
Reticle: CD error, proximity effects, defects
Stepper: lens heating, focus, dose, lens aberrations
Etch: power, pressure, flow rate
Resist: thickness, refractive index
Develop: time, temperature, rinse
Environment: humidity, pressure

These process steps cause variations in L, W, TOX, and VTH (inter-die variations) as well as in dopant atom concentration and line edge irregularities (intra-die variations).
Classification of Variations
Variations can be classified into:
Spatial variations: Variation in the device characteristics at time zero (t = 0 s). It can be:
Intra-die process variations: PV within the same die
Inter-die process variations: PV from die to die, wafer to wafer, or lot to lot

Temporal variations: Change in the device characteristics with time. It can be:
Aging-related: Variation due to the age of the device
Environmental: Fluctuation in supply voltage, temperature, etc.
Inter-die Process Variation
Variation between dies that come from different runs, lots, and wafers is called inter-die process variation.

Fluctuations in length (L), width (W), oxide thickness (tox), and flat-band condition are causes of inter-die process variation.

Because of these variations, a die from one lot may run at a different speed or consume a different amount of power than a die from another lot.

These variations are systematic in nature and shift the device characteristics.

Process variations change the nature of delay and leakage from deterministic to stochastic.
Intra-die Process Variation
Variations in device characteristics within the same die are intra-die process variations.

They are due to limitations or statistical effects in the fabrication process, e.g. variation in the doping level.

They also include spatially correlated and deterministic variations due to chemical mechanical polishing (CMP) and the optical proximity effect.

Random dopant fluctuation (RDF) and line edge roughness (LER) are the two main sources of intra-die process variation.

These variations are random in nature, i.e. a circuit with intra-die process variation behaves randomly.
Intra-die Process Variation (cont)

In older technologies such as 0.5 um and 0.25 um, the number of dopant atoms in the channel was of the order of 1000, so a small variation was not catastrophic.

In technologies below 65 nm, the number of dopant atoms is small (50-100).

Therefore, at 65 nm and below, a small variation in the number of dopant atoms significantly affects the channel characteristics.

Figure: Spatial variation (a) channel length variations, (b) RDF, (c)
Scaling trends of number of doping atoms in the channel and (d) LER
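
The dopant counts quoted above can be turned into a rough relative fluctuation. The sketch below assumes the dopant count follows Poisson statistics, so that sigma/mean = 1/sqrt(N); this statistical model and the function name are assumptions for illustration, not something stated on the slides.

```python
import math

def relative_dopant_fluctuation(mean_atoms: float) -> float:
    """Relative spread (sigma/mean) of the channel dopant count.

    Assumes Poisson statistics: sigma = sqrt(N), so sigma/mean = 1/sqrt(N).
    """
    if mean_atoms <= 0:
        raise ValueError("mean_atoms must be positive")
    return 1.0 / math.sqrt(mean_atoms)

# Dopant counts taken from the slide: ~1000 (older nodes), 50-100 (below 65 nm).
for atoms in (1000, 100, 50):
    print(f"{atoms:5d} atoms: sigma/mean = {relative_dopant_fluctuation(atoms):.1%}")
```

Under this assumption the relative fluctuation grows from about 3% at 1000 atoms to 10% or more at 50-100 atoms, which is consistent with the slide's point that RDF matters much more in scaled devices.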
Intra-die Process Variation (cont)
RDF is caused by the random placement of dopant atoms in the channel region.

Since the number of dopant atoms is very small in a scaled device, their placement severely affects the device characteristics.

LER is the distortion of the gate shape along the channel width direction. This distortion occurs during gate etching and depends on the gate material.

Since the channel width is very small, a small fluctuation changes the device characteristics.

Figure: Illustration of RDF and LER


Temporal Variations
Another important source of variation is temporal variation, i.e. the degradation in device performance with time.

The sources of these variations may be:

Negative bias temperature instability (NBTI)
Positive bias temperature instability (PBTI)
Hot carrier injection (HCI)
Electromigration
Time dependent dielectric breakdown (TDDB)

The age-related variations degrade the device strength when the device is used for a long period of time.

The temporal variations can:

Increase leakage current through the gate dielectric
Change transistor metrics such as the threshold voltage, or
Result in device failure due to oxide breakdown.
Negative Bias Temperature Instability (NBTI)
NBTI has become a limiting factor for device lifetime.

NBTI is the generation of interface traps under negative bias at elevated temperature in PMOS devices.

Breaking of Si-H bonds creates dangling Si bonds that act as defect traps and increase Vth. Partial recovery occurs when the stress is removed.

Figure: NBTI degradation process in PMOS. Breaking of hydrogen bonds creates dangling Si that acts as a defect trap near the Si-SiO2 interface and increases the VTH of the transistor. The VTH degradation and recovery mechanism under NBTI stress is also shown.
NBTI (Cont)
NBTI increases the threshold voltage, reduces channel mobility, and introduces parasitic capacitances.

NBTI has a recovery mechanism, i.e. the generated traps are partially passivated when the stress is removed.

The geometry of the device determines the NBTI-induced interface trap generation.

Analysis shows that the NBTI degradation rate increases with scaling for planar, triple-gate, and surround-gate MOSFETs.
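
As a rough illustration of how the NBTI-induced Vth shift accumulates, the sketch below uses a power law, delta_Vth = A * t^n, a form commonly used for NBTI under continuous stress. The exponent n = 1/6 and the prefactor A = 5 mV are illustrative assumptions, not values from the slides, and the partial recovery described above is deliberately ignored.

```python
def nbti_vth_shift(t_seconds: float, a: float = 0.005, n: float = 1.0 / 6.0) -> float:
    """Vth shift in volts after t_seconds of continuous stress (no recovery).

    a (volts at t = 1 s) and n are hypothetical fitting parameters.
    """
    if t_seconds < 0:
        raise ValueError("t_seconds must be non-negative")
    return a * t_seconds ** n

def time_to_reach(delta_vth: float, a: float = 0.005, n: float = 1.0 / 6.0) -> float:
    """Stress time in seconds needed to reach a given Vth shift (inverse of the law)."""
    return (delta_vth / a) ** (1.0 / n)

three_years = 3 * 365 * 24 * 3600.0
print(f"Shift after three years of stress: {nbti_vth_shift(three_years) * 1e3:.1f} mV")
```

Because the exponent is small, most of the shift appears early in life and the curve flattens later, which is why lifetime guard bands are usually set from a long-term extrapolation rather than from early measurements.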
Positive Bias Temperature Instability (PBTI)
PBTI is experienced by NMOS transistors, similar to NBTI in PMOS transistors.

Historically, the effect of NBTI in PMOS was considered more severe than that of PBTI in NMOS.

With the introduction of high-k dielectrics and metal gates in sub-45 nm technologies, PBTI has become as much of a reliability concern as NBTI.
Hot Carrier Injection (HCI)
Carrier heating due to the high electric field near the drain side of a MOS transistor causes impact ionization; the resulting degradation is illustrated in the figure below.

HCI is more significant in nMOS because electrons have higher mobility and can gain more energy than holes.

HCI degrades the device faster than NBTI and, unlike NBTI, has no recovery mechanism.

HCI occurs during the low-to-high transition of the nMOS gate, so degradation increases with higher switching activity.

Figure: Illustration of impact ionization


Time Dependent Dielectric Breakdown
TDDB is also known as oxide breakdown.

TDDB is the formation of a conductive path through the dielectric, caused by continuous degradation of the material under a sufficiently high electric field.

TDDB accelerates as the oxide thickness decreases.

A short circuit between the substrate and the gate electrode results in oxide failure; once it occurs, it causes a sudden energy burst that may result in thermal runaway.

TDDB may cause soft breakdown or permanent breakdown.


Effect of PV on Logic
Inter-die variations (in L, W, tox, work function, flat-band condition, etc.) and intra-die variations (RDF and LER) modify the threshold voltage (Vth) of the device.

In older technologies the effect of inter-die (systematic) variation was more severe than that of intra-die (random) variation, but in scaled devices intra-die process variation has become more severe.

Figure: Growing proportion of within die process variation with technology scaling.
Effect of PV on Logic (cont)
Increased delay and spread of the delay distribution: inter-die and intra-die variations modify Vth, which results in variation in speed/delay.

If circuits are designed using the nominal VTH of the transistors to run at a particular speed, some of them may fail to meet the desired frequency.

Figure: Variation in threshold voltage and corresponding variation in frequency distribution
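
The link between a Vth spread and a frequency spread can be sketched with a small Monte Carlo experiment. The model below is an assumption made for illustration: Vth is drawn from a normal distribution, and frequency is taken to scale as (Vdd - Vth)^alpha (an alpha-power-law style approximation). All parameter values and the function name are hypothetical.

```python
import random

def frequency_yield(target: float, vth_nominal: float = 0.30, vth_sigma: float = 0.03,
                    vdd: float = 1.0, alpha: float = 1.3,
                    samples: int = 20000, seed: int = 1) -> float:
    """Fraction of sampled dies whose normalized frequency meets `target`.

    Frequency is normalized so that a die with nominal Vth has frequency 1.0.
    """
    rng = random.Random(seed)
    nominal_drive = (vdd - vth_nominal) ** alpha
    passing = 0
    for _ in range(samples):
        vth = rng.gauss(vth_nominal, vth_sigma)
        overdrive = max(vdd - vth, 0.0)           # guard against Vth above Vdd
        freq = overdrive ** alpha / nominal_drive
        if freq >= target:
            passing += 1
    return passing / samples

for target in (0.9, 1.0, 1.1):
    print(f"target = {target:.1f} x nominal: yield = {frequency_yield(target):.1%}")
```

In this model, a design timed at the nominal Vth with no margin loses roughly half of the dies, which is the failure mode the slide describes.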


Effect of PV on Logic (cont)
Several statistical static timing analysis (SSTA) techniques have been investigated to accurately model the mean and standard deviation of the circuit delay.

Chips slower than the target delay are discarded (or sold at a lower price).

It is observed that the effect of within-die process variations tends to average out as the number of logic stages increases.

Figure: Impact of within-die process variation becomes prominent with increasing pipeline depth.
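
The averaging effect can be made concrete under a simplifying assumption: if a path consists of n gates whose delays are independent and identically distributed, the path mean grows as n while the path sigma grows as sqrt(n), so sigma/mean shrinks as 1/sqrt(n). The independence assumption and the function name are illustrative, not from the slides.

```python
import math

def relative_path_sigma(gate_sigma_over_mean: float, logic_depth: int) -> float:
    """sigma/mean of a path of `logic_depth` independent, identical gate delays."""
    if logic_depth < 1:
        raise ValueError("logic_depth must be at least 1")
    return gate_sigma_over_mean / math.sqrt(logic_depth)

# A hypothetical 10% per-gate spread, for shallow and deep logic paths.
for depth in (4, 16, 64):
    print(f"logic depth {depth:2d}: path sigma/mean = {relative_path_sigma(0.10, depth):.2%}")
```

A deeper pipeline has fewer gates per stage, so each stage sees less averaging; this is one way to read the figure caption above.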
Effect of PV on Dynamic Logic
Variations have a different impact on dynamic logic, because this logic works on the principle of precharge and evaluate.

Since the information is stored as charge on the output node capacitance, dynamic logic is highly susceptible to noise and to the timing of input signals.

Threshold voltage (Vth) variations due to PV may result in loss of functionality in dynamic logic, where high leakage may destroy the logic value in evaluation mode.

Figure: Example of a dynamic logic circuit. The VTH of NMOS transistor determines the noise margin
Effect of PV on Dynamic Logic (cont)
A large keeper transistor is required to increase robustness if the devices show large leakage due to process variation.

Register files with increased leakage due to PV require an unnecessarily strong keeper.
Effect of PV on Pipelined Logic
In high-performance design, throughput is primarily improved by pipelining the data and control paths.

In a high-performance pipelined design, throughput is limited by the slowest pipe segment.

Under parameter variation the delays vary significantly, so identifying the slowest pipe segment is difficult.

The variation in the stage delays thus results in variation in the overall pipeline delay.
Effect of PV on Pipelined Logic (Cont)
Traditionally, the pipeline clock frequency has been enhanced by:
Increasing the number of pipeline stages, which in essence reduces the logic depth and hence the delay of each stage; and
Balancing the delay of the pipe stages, so that the maximum of the stage delays is optimized.

However, it has been observed that when intra-die parameter variation is considered, reducing the logic depth increases the variability.

Thus, balancing the pipeline stages under PV is a very difficult task.
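
Since the pipeline only works if every stage meets the clock period, a first-order yield estimate is the product of the per-stage pass probabilities. The sketch below assumes normally distributed, mutually independent stage delays; both assumptions and all numbers are illustrative.

```python
from statistics import NormalDist

def pipeline_yield(stage_means, stage_sigmas, clock_period: float) -> float:
    """Probability that every stage delay is below clock_period.

    Assumes independent, normally distributed stage delays (sigma > 0).
    """
    result = 1.0
    for mean, sigma in zip(stage_means, stage_sigmas):
        result *= NormalDist(mean, sigma).cdf(clock_period)
    return result

# Four perfectly balanced stages, clocked with a 10% margin over the mean delay.
means = [1.0, 1.0, 1.0, 1.0]
sigmas = [0.05, 0.05, 0.05, 0.05]
print(f"4-stage yield at period 1.1: {pipeline_yield(means, sigmas, 1.1):.1%}")
```

With no margin at all (clock period equal to the mean stage delay) each stage passes with probability 0.5, so a four-stage pipeline passes with probability 0.5^4; adding stages makes the requirement harder to meet.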


Effect of PV on Logic Power Consumption
Process variation causes increased leakage, which results in localized heating of the die, called a hot spot.

Power is dissipated as heat; if the package is unable to sink the heat generated by the circuit, the temperature rises, which in turn degrades reliability.

Hot spots are one of the primary factors behind reliability degradation and thermal runaway.

Significant efforts have been made to develop compact and full-chip thermal models.
Effect of PV on Logic Power Consumption (cont)
It is observed that there can be ~100X variation in leakage current in 150-nm technology.

Designing for the worst-case leakage causes excessive guard-banding, resulting in lower performance.

Underestimating the leakage variation, on the other hand, results in low yield, as good dies can be discarded for violating the product leakage requirement.

Figure: Leakage spread in 150-nm technology.
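
A large leakage spread follows from the exponential dependence of subthreshold leakage on Vth. The sketch below assumes the textbook form I_leak proportional to exp(-Vth / (n * vT)); the slope factor n = 1.5 and the room-temperature thermal voltage vT = 25.85 mV are illustrative assumptions, not values from the slides.

```python
import math

def leakage_ratio(delta_vth: float, n: float = 1.5, thermal_voltage: float = 0.02585) -> float:
    """Leakage increase factor when Vth drops by delta_vth volts."""
    return math.exp(delta_vth / (n * thermal_voltage))

def vth_shift_for_ratio(ratio: float, n: float = 1.5, thermal_voltage: float = 0.02585) -> float:
    """Vth difference in volts that produces the given leakage ratio."""
    return n * thermal_voltage * math.log(ratio)

shift = vth_shift_for_ratio(100.0)
print(f"Vth difference for 100X leakage: {shift * 1e3:.0f} mV")
```

Under these assumptions a Vth difference of less than 0.2 V between the leakiest and least leaky dies is enough to produce a 100X leakage spread.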


Effect of Temporal PV on Logic
Temperature and voltage fluctuations modify the circuit delay and reduce the noise margin in an unpredictable manner.

A higher temperature decreases VTH (good for speed) but also reduces the on-current, reducing the overall speed of the design.

A higher voltage speeds up the circuit while a lower voltage slows down
the system.

Since the voltage and the temperature depend on the operating conditions,
the circuit speed becomes unpredictable.

The aging-related temporal variations, on the other hand, affect the circuit
speed systematically over a period of time.
Effect of Temporal PV on Logic (Cont)
The impact of NBTI degradation is manifested as an increase in the delay of critical timing paths and a reduction of sub-threshold leakage current.
A compact statistical NBTI model that considers the random nature of Si-H bond breaking in scaled transistors has been developed.
It is observed that, from the circuit-level perspective, NBTI variation closely resembles the nature of RDF-induced VTH variation.

Figure: Variation under NBTI degradation and RDF. (a) Both the mean and the spread of the inverter gate delay distribution increase due to the combined effect of RDF and NBTI. (b) Inverter leakage is reduced with NBTI; however, the spread of leakage can increase due to the statistical NBTI effect.
Effect of PV on SRAM
The effect of PV is more severe in SRAM than in logic, since the transistors are of minimum size to meet the high-density requirement.

These variations result in a mismatch in device strength, which in turn increases the failure rate of SRAM cells.

A failure in SRAM may be due to:

Destructive read (read failure)
Unsuccessful write (write failure)
Increase in access time (access time failure)
Destruction of cell contents in standby mode (hold failure)

Figure: (a) Overall cell failure probability due to combined effect of read, write, access, and hold failures. (b) Memory array equipped with
redundant columns for repair
Effect of PV on SRAM (cont)
Inter-die variation causes a shift in Vth, which amplifies the effect of random intra-die variation.
A low-Vth shift increases the read and hold failure probabilities, while a high-Vth shift increases the write and access failure probabilities.
A failure in any cell of a column makes the column faulty, and if the number of faulty columns exceeds the number of redundant columns, the chip is considered faulty.
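
The column-redundancy rule above maps directly onto a binomial yield estimate. The sketch below assumes independent cell failures and counts faults over all columns, spares included; the array dimensions and the cell failure probability are hypothetical.

```python
import math

def column_failure_prob(p_cell: float, rows: int) -> float:
    """A column is faulty if any one of its cells fails."""
    return 1.0 - (1.0 - p_cell) ** rows

def array_yield(p_cell: float, rows: int, columns: int, redundant_columns: int) -> float:
    """Probability that faulty columns do not outnumber the redundant columns.

    Assumes independent cell failures; spare columns can also be faulty.
    """
    p_col = column_failure_prob(p_cell, rows)
    total = columns + redundant_columns
    return sum(
        math.comb(total, k) * p_col ** k * (1.0 - p_col) ** (total - k)
        for k in range(redundant_columns + 1)
    )

for spares in (0, 2, 4):
    print(f"{spares} spare columns: yield = {array_yield(1e-5, 256, 64, spares):.2%}")
```

Even a very small cell failure probability becomes significant once it is compounded over every cell of every column, which is why a few spare columns improve the yield noticeably.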
Impact of temporal variation on SRAM
Process variation degrades all transistors of the SRAM cell, while NBTI affects only the pMOS transistors; together they result in increased failures.

An SRAM cell with a good static noise margin (SNM) at time zero may fail after a few years due to NBTI. Results show that the SNM degrades by more than 9% in three years.

A cell storing constant data for a long time experiences more degradation.

SNM degradation increases severely with unmatched signal probabilities.


Variation Tolerant Design
There are a number of circuit and micro-architectural techniques that provide process variation tolerance.

The techniques are mainly of two types, design-time and run-time, as given in the table.
Circuit Level Technique for SRAM
Sizing: The length and width of the different transistors of the SRAM cell affect the cell failure probabilities by modifying the access time, read voltage, trip point of the inverter, write time, and minimum retention voltage. Here PRF, PWF, PAF, and PHF denote the read, write, access, and hold failure probabilities.
A weak access transistor reduces PRF but increases PAF and PWF, i.e. it is good for read but bad for write.
Reducing the strength of the pull-up transistor reduces PWF but increases PRF, and PHF also degrades, i.e. a weak pull-up is good for write but bad for read.
Increasing the strength of the pull-down nMOS transistor reduces PRF and PAF.
Circuit Level Technique for SRAM (cont)
Body bias technique: The transistor threshold voltage, which varies due to PV, is a critical parameter that can be controlled by applying body bias.
A leakage monitor senses the leakage of the SRAM array. High leakage indicates low Vth, which may result in read and hold failures; these are reduced by applying reverse body bias (RBB) from the body-bias generator to increase Vth.
Similarly, high Vth due to PV causes access and write failures, which can be reduced by applying forward body bias (FBB) from the body-bias generator to decrease Vth.
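
The decision made by the leakage monitor can be sketched as a simple comparison against two reference levels. The function name, the leakage limits, and the three-way outcome (RBB, FBB, or zero body bias) are assumptions for illustration; a real monitor is an analog circuit with its own references.

```python
def select_body_bias(measured_leakage: float, low_limit: float, high_limit: float) -> str:
    """Choose a body-bias mode from the array leakage (all in the same units).

    High leakage implies low Vth: apply RBB to raise Vth (helps read and hold).
    Low leakage implies high Vth: apply FBB to lower Vth (helps access and write).
    """
    if low_limit > high_limit:
        raise ValueError("low_limit must not exceed high_limit")
    if measured_leakage > high_limit:
        return "RBB"
    if measured_leakage < low_limit:
        return "FBB"
    return "ZBB"  # zero body bias: the die is within the nominal leakage window

# Hypothetical leakage window of 1.0 to 4.0 (arbitrary units).
for leakage in (0.5, 2.0, 9.0):
    print(f"leakage = {leakage}: {select_body_bias(leakage, 1.0, 4.0)}")
```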
Circuit Level Technique for SRAM (cont)
Source biasing: The source potential controls the leakage in the memory.
When a particular row is accessed, its source line is biased to zero, which increases the drive current and gives fast read/write operation.
When a row is not accessed, its source line is raised to the potential VSB, which reduces the subthreshold and other leakage currents and in turn reduces hold failures.
Adaptive source biasing is also used to minimize leakage while controlling the hold failures.
8T and 10T SRAM cell
The major problem with the 6T cell is that the read and write margins trade off against each other: increasing one decreases the other.
If the read and write operations are decoupled, the designer gets more flexibility to optimize the read and write margins; 8T and 10T cells provide this feature.
The 8T cell, a 6T cell with two added MOS transistors, provides a read operation that does not disturb the internal nodes. This significantly improves the SNM at the cost of a ~30% increase in area.
The 10T SRAM cell shown below improves the read SNM to equal the hold SNM, and M10 reduces the leakage current, but the cell still has a single-ended read operation.

Figure: 8T SRAM cell and 10T SRAM cell


8T and 10T SRAM cell (cont)
A new differential 10T cell, shown below, improves the read SNM while also allowing bit interleaving and avoiding pseudo-read. It can work with a supply as low as ~200 mV.
Another 10T cell, shown below, uses the Schmitt trigger (ST) principle to improve the read and write margins concurrently.
The feedback transistors AXL2/AXR2 raise the switching threshold voltage of the inverter during the 0→1 transition, providing the ST action.

Figure: 10T SRAM cell and ST-based SRAM cell


Architectural Techniques for PV Tolerance
Cache resizing: A process-tolerant cache is designed to detect and replace faulty cells adaptively.

The cache is built with an online BIST to detect faulty cells and a configurator for remapping the faulty blocks.

To map a faulty block to a non-faulty one permanently, the tag array width is expanded, and these extra bits are set appropriately so that a cache miss occurs whenever it is accessed.

Cache line deletion: In this technique, faulty lines are marked and excluded from normal cache line allocation and use.

Availability bits may be attached to each cache tag to identify the faulty cache lines.
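
Cache line deletion can be sketched as a replacement policy that skips ways whose availability bit is cleared. The data layout (one availability flag per way, plus a least-recently-used ordering) and the function name are assumptions for illustration, not the implementation from the cited work.

```python
from typing import Optional, Sequence

def choose_victim(availability: Sequence[bool], lru_order: Sequence[int]) -> Optional[int]:
    """Pick the way to fill on a miss within one cache set.

    availability[w] is False when way w was marked faulty (for example by BIST).
    lru_order lists the ways from least to most recently used.
    Returns None when every way in the set is faulty.
    """
    for way in lru_order:
        if availability[way]:
            return way
    return None

# A hypothetical 4-way set in which ways 0 and 3 are marked faulty.
availability = [False, True, True, False]
print(choose_victim(availability, lru_order=[0, 3, 2, 1]))  # prints 2
```

The set simply behaves as if it had fewer ways, so a faulty line costs some hit rate but never returns corrupted data.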
Process Tolerant Logic Design
A conservative approach uses enough design margin, which is area- and power-inefficient.
Logic sizing: The delay of a transistor can be modulated through its W/L ratio, which in turn changes its drive strength.
The threshold voltage of the transistor is an important parameter that can be changed by adaptive body bias to trade off speed against leakage.
Appropriately modulating the supply voltage compensates for the performance degradation.
Process Tolerant Logic Design (cont)
RAZOR: The supply voltage is tuned by monitoring the error rate during operation.
The RAZOR flip-flop double-samples the pipeline stage output, once with the fast clock and again with a time-borrowing delayed clock.
A metastability-tolerant comparator validates the values latched on the fast clock. If there is a timing error, a modified pipeline misspeculation recovery mechanism restores the correct program state.

Figure: RAZOR
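
The two ideas behind RAZOR, detecting an error by comparing two samples and tuning the supply from the observed error rate, can be sketched in a few lines. The target error rate, the voltage step, the voltage limits, and the function names are hypothetical; the actual controller is a hardware feedback loop.

```python
def razor_error(main_sample: int, shadow_sample: int) -> bool:
    """A timing error is flagged when the fast-clock sample differs from the
    sample taken later by the delayed clock."""
    return main_sample != shadow_sample

def next_supply_voltage(vdd: float, error_rate: float, target_error_rate: float = 0.001,
                        step: float = 0.01, vmin: float = 0.6, vmax: float = 1.2) -> float:
    """Raise Vdd when errors are too frequent; otherwise lower it to save power."""
    if error_rate > target_error_rate:
        vdd += step
    else:
        vdd -= step
    return min(max(vdd, vmin), vmax)  # keep the supply inside its allowed range

vdd = 1.0
for observed_rate in (0.0, 0.0, 0.01):
    vdd = next_supply_voltage(vdd, observed_rate)
    print(f"observed error rate {observed_rate}: Vdd set to {vdd:.2f} V")
```

The loop deliberately operates at a small non-zero error rate: occasional errors are repaired by the recovery mechanism, and in exchange the worst-case voltage margin is removed.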
References

Swaroop Ghosh and Kaushik Roy, "Parameter Variation Tolerance and Error Resiliency: New Design Paradigm for the Nanoscale Era," Proceedings of the IEEE, vol. 98, no. 10, pp. 1718-1751, 2010.
S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, "Adaptive techniques for overcoming performance degradation due to aging in digital circuits," in Proc. Asia-South Pacific Design Autom. Conf., 2009, pp. 284-289.
K. Kang, S. P. Park, K. Roy, and M. A. Alam, "Estimation of statistical variation in temporal NBTI degradation and its impact in lifetime circuit performance," in Proc. Int. Conf. Comput.-Aided Design, 2007, pp. 730-734.
J. P. Kulkarni, K. Kim, S. Park, and K. Roy, "Process variation tolerant SRAM design for ultra low voltage application," in Proc. Design Autom. Conf., 2008, pp. 108-113.
