
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/269818702

Aging Analysis of Datapath Sub-blocks Based on CET Map Model for Negative
Bias Temperature Instability (NBTI)

Thesis · January 2014


1 author:

Moustafa Khatib
Fondazione Bruno Kessler

All content following this page was uploaded by Moustafa Khatib on 21 December 2014.



Aging Analysis of Datapath Sub-blocks Based on CET Map Model
for Negative Bias Temperature Instability (NBTI)

A Thesis Presented to the Faculty

Of

Nile University

In Partial Fulfillment of the


Requirements for the Degree of

Master of Science

in

Micro-electronics System Design

By

Moustafa Khatib

January 2014
To my family, my friends, people from whom I learned
and to the reader.

ACKNOWLEDGMENTS

I would like to dedicate a few lines to everyone who has made this thesis possible.
First of all, I would like to sincerely thank Professor Francky Catthoor for giving me
the chance to come to imec and work under his supervision. I appreciate his continued
support and guidance throughout this thesis project.
Second, I would like to thank my advisor, Professor Rafik Guindi, for his continuous
support throughout my graduate studies and for the time and effort he spent with me so
that I could complete my master's degree.
Third, I express my great gratitude to my daily supervisors at imec, Praveen Raghavan
and Halil Kükner, for the time and effort they spent with me throughout the thesis work.
This work would not have been possible without their vision and guidance.
I would like to thank Professor Ahmed Radwan for the time and effort he dedicated to
reviewing my thesis work as a member of my thesis defense committee. I would also like to
thank him for the time and considerable effort he provided during my research work on
Memristive Systems. I learned a lot from him during my graduate studies.
I would also like to thank all my other excellent colleagues at imec: Seyab Khan,
Matthias Hartmann, Vikas Dubey, Alaa Medra, Hossam Al Anzeery, and the many friends who
were a great help during my stay in Belgium.
I would also like to thank my mother and sisters, whose support and encouragement
helped me carry out my studies at Nile University.
Finally, I express my gratitude to Allah (God) for providing me with the blessings and
strength to complete this work.

Moustafa Khatib
January, 2014

ACRONYMS

ASIC Application Specific Integrated Circuit


ALU Arithmetic Logic Unit
AVERA Adaptive Variation and Error Resilient Agent
BTI Bias Temperature Instability
CET Capture/Emission Time
CLA Carry Look Ahead
CPA Carry Propagation Adder
CSA Carry Save Adder
DF Duty Factor
DSP Digital Signal Processing
DSM Deep Sub-Micron
DRM Dynamic Reliability Management
EDA Electronic Design Automation
EM Electro-Migration
FF Flip-Flop
FFT Fast Fourier Transform
FO Fan Out
HCI Hot Carrier Injection
HKMG High-K Metal Gate
HND Half Normal Distribution
IC Integrated Circuit
LD Logic Depth
LDPC Low Density Parity Check
LER Line-edge Roughness
LLC Longest Carry Chain
LUT Look-up Table
MOSFET Metal Oxide Semiconductor Field Effect Transistor
MSM Measure-Stress-Measure
NBTI Negative Bias Temperature Instability
PBTI Positive Bias Temperature Instability

PDN Pull-down Network
PG Propagate and Generate
PP Partial Product
PTM Predictive Technology Model
PUN Pull-up Network
PVT Process Voltage Temperature
RCA Ripple Carry Adder
RD Reaction Diffusion
RDD Reaction Dispersive Diffusion
RDF Random Dopant Fluctuation
RTN Random Telegraph Noise
SAIF Switching Activity Interchange Format
SEU Single Event Upset
SPAF Signal Probability and Activity Factor
SSTA Statistical Static Timing Analysis
STA Static Timing Analysis
TCL Tool Command Language
TDDB Time Dependent Dielectric Breakdown
TDDS Time-dependent Defect Spectroscopy
VHDL Very high speed integrated circuit Hardware Description Language
WL Workload
WN White Noise

Abstract

The continuous downscaling of CMOS technologies over the last few decades has resulted in
higher integrated circuit (IC) performance and density. However, it has also caused the
emergence of several reliability issues. Among these, Bias Temperature Instability (BTI)
is the most prominent one for logic CMOS devices, and it has drawn much attention in
recent years.

BTI is a time-dependent degradation phenomenon that worsens as a transistor ages. It is
proving to be one of the major limitations on the lifetime of present-day downscaled devices.
At the circuit level, BTI manifests as a shift in a device’s threshold voltage and drain
current due to the generation of traps in the gate oxide layer. This shift increases further
in the presence of higher electric fields and temperatures, and with the smaller dimensions
that result from downscaling.

Unfortunately, most current Electronic Design Automation (EDA) tools lack the ability to
accurately predict and analyze the impact of BTI. The tools that can analyze BTI often
operate at a very low design level and require significant computational resources. The
purpose of this thesis is to analyze the impact of BTI degradation at the sub-block level in
processor datapaths.

The effect of BTI aging on the performance of datapath sub-blocks is studied in relation to
workload dependency and architectural topology, based on the state-of-the-art
Capture/Emission Time (CET) map-based model developed at IMEC for a 10nm FinFET
process technology. BTI aging analysis is carried out with a novel aging-aware digital design
framework that has been developed on top of the industry-standard EDA tool chain. Static
Timing Analysis (STA) is performed to capture the performance metrics of the aged system.
A comparison of the original designs with their degraded counterparts reveals many
significant observations on circuit performance.

Finally, a few error-resilient techniques for achieving the required timing are investigated.
Moreover, the implementation and simulation of an error sensor circuit are presented; this
circuit can be embedded in the critical paths of the aged system to detect timing errors.

Keywords: Bias Temperature Instability (BTI), CMOS downscaling, aging effects,
error-resilient techniques, logic synthesis, BTI mitigation, datapath sub-blocks.

Contents

ACKNOWLEDGMENTS
ACRONYMS
Abstract
Chapter 1: Introduction
1.1 Reliability Issues in Deep Sub-Micron Technologies
1.2 Thesis Contribution
1.3 Thesis Outline and Organization
Chapter 2: Bias Temperature Instability
2.1 Physical Mechanism Behind BTI
2.2 Transistor-level Modeling
2.2.1 The Reaction-Diffusion (R-D) Model
2.2.2 The Atomic Trap-based Model
2.2.3 Capture/Emission Time (CET) Map-based Model
2.3 BTI Impacts at Gate-level
2.3.1 Experimental Setup
2.3.2 Experimental Results
2.4 Chapter Review
Chapter 3: Workload-Dependent NBTI Reliability-Aware Logic Synthesis
3.1 Related Work
3.2 Workload-Dependent CET Map-based NBTI Model
3.3 Workload-Dependent Aging-Aware Logic Synthesis
3.4 Chapter Review
Chapter 4: NBTI Aging Analysis of Datapath Sub-blocks
4.1 Introduction
4.2 Datapath Sub-blocks
4.3 Experimental Setup
4.4 Experimental Results and Discussion
4.5 Chapter Review
Chapter 5: Exploration on Timing Error Resilient Techniques
5.1 Introduction
5.2 Error Detection Scheme
5.3 Simulation Results
5.4 Chapter Review
Chapter 6: Conclusions and Future Work
6.1 Conclusions
6.2 Suggestions for Future Work
Bibliography

List of Figures

1.1 SRAM devices variation due to scaling trend. Ref. [10]
2.1 Formation of traps in gate oxide layer of PMOS transistor
2.2 Threshold voltage shift due to BTI aging
2.3 Two phases of BTI. The transistor does not fully recover
2.4 (a) Stress phase, (b) Recovery phase of R-D model
2.5 Concepts behind Atomic Trap-based model
2.6 Extending the measured CET map. Ref. [47]
2.7 Degradation impact on inverter and buffer gates with increasing drive strength
3.1 CET map integration w.r.t. the workload to obtain the active traps on device. Ref. [47]
3.2 Geometric parameters of FinFET device. Ref. [48]
3.3 Threshold-voltage shift surfaces under (a) mean, and (b) 3-sigma corners w.r.t. duty factor, age and device size, for 10 nm FinFET process technology
3.4 Workload-dependent NBTI reliability-aware digital design framework
4.1 Structure of parallel prefix adder. Ref. [51]
4.2 Parallel prefix adders (a) KoggeStone, (b) HanCarlson, (c) LadnerFischer, (d) Sklansky, (e) Knowles22111, (f) Knowles44221 and (g) BrentKung
4.3 Taxonomy of parallel-prefix adders
4.4 RippleCarry adder based on HAs cells
4.5 Main components of parallel multiplier structure
4.6 The building block of CSA multiplier
4.7 (a) Full Array multiplier, (b) CarrySave Array multiplier
4.8 Dot diagram of 16-bit Wallace multiplier
4.9 Dot diagram of 16-bit Dadda multiplier
4.10 Booth encoder and decoder circuits
4.11 Shifter architectures: (a) Logarithmic shifter, (b) Barrel shifter
4.12 Two different multiplexer structures based on MUX4 cells and logic gates
4.13 Structure of 16-bit Register File
4.14 Critical path timing and delay increment in the sub-blocks: (a) Adder, (b) Multiplier, (c) Shifter, and (d) Mux/Demux
4.15 Percentages of critical path replacement
4.16 Delay degradation w.r.t. workload profiles for different architectures of (a) adder, (b) multiplier, (c) mux/demux, (d) shifter blocks
4.17 Impact of workload-variation on the delay degradation of different architectures of datapath sub-blocks
4.18 Delay degradation w.r.t. workload profiles for write and read operations of register file
4.19 Impact of workload-variation on the delay degradation for write and read operations of register file
4.20 Relative aging of datapath sub-blocks
5.1 Timing error detection scheme
5.2 Simulation of 20 stage inverter chain with the proposed error detection circuit (a) before aging, (b) after aging
5.3 Paths delay distributions

List of Tables

4.1 Truth table of Booth-encoding algorithm
4.2 Workload profiles

Chapter 1
Introduction

1.1 Reliability Issues in Deep Sub-Micron Technologies

The semiconductor industry has witnessed remarkable growth and achievements in IC
manufacturing through significant scaling of transistor dimensions. Such scaling has not only
made ICs more compact and dense, but has also enhanced their performance and integration
without any increase in power consumption, as long as the chip area is kept constant.
Gordon Moore's 1965 vision that the complexity of an IC would approximately double every
two years [1] seemed a dream at the time, yet it held true for over four decades and enabled
the production of ever more complex circuits. By 2010, a single processor die contained more
than a billion transistors. This high integration density has to be accompanied by
considerable efforts to increase IC reliability, since the failure of only a few transistors in
a circuit can lead to complete failure of the whole system.

Although there were claims in 1998 that the semiconductor industry would hit a “red brick”
wall at the 100nm technology node [2], leading-edge research and development is currently
working on transistors for the 22nm technology node and beyond [3]. Of course, at some
point, semiconductor scaling will approach its final limits. With the continuous downscaling
of device dimensions, variations of transistor parameters are increasing drastically and lead
to unexpected reliability issues [4, 5]. These issues are essentially classified into two
groups. “Time-zero” variability issues [6], such as Line-Edge Roughness (LER), Random
Dopant Fluctuations (RDFs), Metal Gate Granularity, and Body Thickness Variation, cause
intra-die variations during the manufacturing process. “Time-dependent” variability issues,
such as Negative Bias Temperature Instability (NBTI) [7], Hot Carrier Injection (HCI) [8],
and Time-Dependent Dielectric Breakdown (TDDB) [9], are considered a major source of
performance degradation of scaled devices over their lifetime. These degradation mechanisms
are caused by the formation of charge traps inside the gate oxide layer due to high electric
fields and temperatures, and they change device parameters (e.g. threshold voltage, carrier
mobility, drain current) over time, depending on the operating conditions and the workload
over the lifetime. These issues therefore degrade the reliability of scaled devices and may
eventually lead to IC failure when the variations reach a certain limit.

As an example of the impact of variability on scaled devices, Figure 1.1 shows that
variation in SRAM devices increases significantly with the continuous downscaling of
technologies. Variation can reach up to 50% in advanced technologies [10], which strongly
affects SRAM functionality and poses a major challenge for SRAM design.

Figure 1.1: SRAM devices variation due to scaling trend. Ref. [10]

The reliability of digital integrated circuits has become one of the critical challenges in
Deep Sub-Micron (DSM) semiconductor technologies. Researchers are now studying these
reliability issues at different design levels, such as the process, transistor, and circuit
levels. They also argue that these time-dependent mechanisms are best described in terms of
an ensemble of individual defects and their time-, voltage-, and temperature-dependent
properties, which can then be modeled and inserted into a circuit simulation, thus enabling
reliability-aware design [11], as will be discussed in the following chapters.

In fact, the simulation and analysis of aging effects at higher design levels are inherently
difficult, since the degradation rate depends on the operating conditions and workloads over
the lifetime. These factors are often unknown during the design of a circuit, and different
workloads applied to a circuit lead to different amounts of performance degradation, which
poses serious challenges for the design of digital integrated circuits. To improve design
predictability and support robust design, it is necessary to develop techniques that can
efficiently predict aging effects in existing and future technologies. In this thesis, we aim
to provide a detailed and accurate study for predicting the aging effects that result from
one of the above-mentioned time-dependent phenomena, namely the NBTI mechanism.

NBTI has become one of the major reliability threats for the semiconductor industry. NBTI
effects worsen with successive technology generations and bring greater performance and
reliability loss, as this phenomenon causes severe shifts of important transistor parameters.
Hence, accurate modeling of NBTI and estimation of its effects on circuit performance have
become imperative. The NBTI degradation mechanism and its implications for circuit
reliability are the main focus of this work.

1.2 Thesis Contribution

The contributions of this thesis are summarized in two points:

• A comprehensive analysis of NBTI aging effects on datapath sub-blocks in a 10nm
FinFET process technology, based on the state-of-the-art Capture/Emission Time
(CET) map NBTI model and using a novel workload-dependent aging-aware logic
synthesis flow.

• An implementation of a timing error detection circuit that can be integrated into the
degraded circuit paths. Moreover, a review of current timing-error-resilient
techniques is presented.

1.3 Thesis Outline and Organization

This thesis is organized as follows. The next chapter explains the background and physical
concepts behind the NBTI phenomenon and the ways of modeling this degradation mechanism
at the transistor and gate levels.

Chapter 3 presents the detailed steps of the method for predicting the impact of NBTI
aging at the circuit level, based on the state-of-the-art CET map-based NBTI model.

Chapter 4 presents a comprehensive analysis of NBTI aging effects on datapath sub-blocks
(e.g. adder, multiplier, shifter, mux/demux, register file) based on a predictive 10nm
FinFET process technology.

Chapter 5 discusses timing-error-resilient techniques and presents an implementation of an
error detection scheme that can sense timing errors in aged designs.

Finally, Chapter 6 summarizes the work done in the course of this thesis and suggests ideas
for future work related to this topic.

Chapter 2
Bias Temperature Instability

2.1 Physical Mechanism Behind BTI

Bias Temperature Instability (BTI) is a time-dependent degradation mechanism. It has been
known since 1966 [12], and a model for understanding its effects was first proposed in 1977
[21]. BTI has emerged as a key reliability concern due to its increasingly negative impact on
the performance of modern electronic devices. BTI effects worsen as a transistor ages and
lead to severe shifts of important transistor parameters. Therefore, understanding the
impact of BTI degradation is of primary importance for current and near-future CMOS
technologies.

The continuous MOSFET miniaturization trends (i.e., aggressive oxide thickness scaling)
result in higher oxide fields and temperatures [7]. Consequently, more charge carriers can
tunnel into the gate oxide and become trapped there. Figure 2.1 shows the formation of
charge traps in the oxide layer when a bias is applied to the transistor’s gate.

Figure 2.1: Formation of traps in gate oxide layer of PMOS transistor

These traps capture some of the charge carriers responsible for the current flowing between
the transistor’s source and drain. This charge loss results in a narrower transistor channel,
so less current can flow through the device and device performance degrades. At the circuit
level, these effects show up as increased circuit delays and, in turn, circuit timing errors.

To restore the drain current to its pre-degradation level, a higher bias voltage needs to be
applied to the transistor’s gate. In other words, a higher voltage is needed before the
transistor begins to conduct. That means the threshold voltage ($V_{th}$) increases
significantly over time, as shown in Figure 2.2. The threshold voltage shift
($\Delta V_{th}$) is accelerated by elevated temperature or supply voltage. It is a direct
measure of device degradation and is widely used in the literature to evaluate the impact of
BTI [13, 14].
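
To give a feel for how a threshold voltage shift translates into increased circuit delay, the following minimal Python sketch uses the alpha-power-law delay approximation, in which gate delay is proportional to V_dd / (V_dd - V_th)^alpha. This delay model and all numeric values (`vdd`, `vth`, `alpha`, `delta_vth`) are illustrative assumptions; they are not taken from this thesis or from its 10nm FinFET technology data.

```python
def relative_delay(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor, using the alpha-power-law approximation.

    Assumption: delay is proportional to vdd / (vdd - vth) ** alpha.
    """
    if vdd <= vth:
        raise ValueError("vdd must exceed vth for the gate to switch")
    return vdd / (vdd - vth) ** alpha


def delay_increase_percent(vdd: float, vth: float, delta_vth: float,
                           alpha: float = 1.3) -> float:
    """Percentage delay increase caused by a threshold voltage shift delta_vth."""
    fresh = relative_delay(vdd, vth, alpha)
    aged = relative_delay(vdd, vth + delta_vth, alpha)
    return 100.0 * (aged - fresh) / fresh


if __name__ == "__main__":
    # Hypothetical operating point: 0.7 V supply, 0.3 V threshold, 30 mV shift.
    print(f"{delay_increase_percent(0.7, 0.3, 0.03):.1f}% slower")
```

Because the overdrive voltage (V_dd - V_th) shrinks as V_th grows, the same shift costs relatively more delay at lower supply voltages in this approximation.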

The BTI mechanism occurs in two phases, as shown in Figure 2.3. First, the transistor is in
the stress phase when a voltage is applied to its gate over a period of time. During this
phase, charge traps are generated in the gate oxide layer and the transistor threshold
voltage increases (degrades). Second, when the stress voltage is removed, the transistor is
in the recovery phase. During the recovery phase, trapped charges are released and the
threshold voltage partially recovers towards its pre-stress level. When the input is dynamic,
the transistor alternates between the stress and recovery phases.

Figure 2.2: Threshold voltage shift due to BTI aging

Figure 2.3: Two phases of BTI. The transistor does not fully recover

Therefore, the amount of BTI degradation depends on the stress history. This history is
captured by the concept of duty factor, which is calculated from the device's stress and
relaxation periods. Devices in arithmetic and memory circuits tend to have an unbalanced
duty factor, while devices in clock circuitry are an example of a 50% duty factor.
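
As a concrete illustration of the duty factor, the sketch below computes it as the fraction of time a device spends under stress, given a sampled gate waveform. The function name `duty_factor`, the uniform sampling, and the convention that a PMOS device is stressed whenever its gate input is at logic 0 are assumptions made for this example.

```python
from typing import Sequence


def duty_factor(gate_bits: Sequence[int], stressed_level: int = 0) -> float:
    """Fraction of equally spaced samples during which the device is stressed.

    Assumption: a PMOS device is under NBTI stress while its gate is at logic 0.
    """
    if not gate_bits:
        raise ValueError("waveform must contain at least one sample")
    stressed = sum(1 for bit in gate_bits if bit == stressed_level)
    return stressed / len(gate_bits)


if __name__ == "__main__":
    clock_like = [0, 1] * 8            # balanced switching, as in clock circuitry
    mostly_low = [0] * 14 + [1] * 2    # unbalanced, gate held low most of the time
    print(duty_factor(clock_like))     # 0.5
    print(duty_factor(mostly_low))     # 0.875
```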

The impact of BTI is observed in both NMOS and PMOS transistors, which are susceptible to
Positive Bias Temperature Instability (PBTI) and Negative Bias Temperature Instability
(NBTI), respectively. Hard et al. [15] analyzed the impact of BTI in four scenarios:
1) NMOS under negative gate bias, 2) NMOS under positive bias, 3) PMOS under negative bias,
and 4) PMOS under positive bias. The study clearly shows that PMOS devices are more
susceptible to BTI, regardless of negative or positive bias. It also shows that PMOS under
negative bias is the case with the largest threshold voltage shift. This is unfortunate,
since PMOS devices in digital circuits are negatively biased. This is why BTI is often
referred to as NBTI and treated as causing threshold voltage shifts in PMOS devices only.
Consequently, most of the literature [16, 17] focuses only on the impact of NBTI on circuit
reliability.

In the last decade, accurate modeling of BTI has become a major concern for industry.
Several approaches have been proposed to understand the origin of this phenomenon and
predict its impact. All of them agree that BTI is caused by the generation of traps in the
oxide layer when a bias is applied at the transistor’s gate. In addition, the impact of BTI
degradation can be directly measured as a shift in the transistor’s threshold voltage.

The next sections give an overview of the models developed for the NBTI mechanism at the
transistor level and explain the reasons behind the choice of the model used in this work.
Moreover, the impact of BTI at the gate level is also presented.

2.2 Transistor-level Modeling

In this section, we present a detailed study of NBTI modeling at the transistor level, where
NBTI degradation is reflected as a threshold voltage shift [16]. Section 2.2.1 gives a
detailed description of the Reaction-Diffusion (R-D) model [18], which is considered one of
the earliest approaches to explaining NBTI. The R-D model estimates the number of interface
traps as a function of time during the stress and recovery phases. A simplified expression
for the threshold voltage degradation of a PMOS transistor is also presented. Section 2.2.2
presents the Atomic Trap-based model [19], which is based on the stochastic properties of
gate oxide traps. Finally, Section 2.2.3 presents the Capture/Emission Time (CET) map-based
model [20], which has recently been developed to provide accurate analysis and long-term
prediction of NBTI degradation. The drawbacks of each model are also discussed.

2.2.1 The Reaction-Diffusion (R-D) Model

The Reaction-Diffusion (R-D) model is considered the first model developed to describe the
effects of the NBTI phenomenon. It was developed by Jeppson and Svensson in 1977 [21] in an
attempt to explain how interface traps are generated in the gate oxide layer due to NBTI
aging, to estimate the number of these traps, and to predict the impact of degradation over
the device’s lifetime.

The R-D model assumes that NBTI degradation originates from the breaking of Si-H bonds at
the Si/SiO2 interface. The model is explained in terms of the two phases of NBTI, depending
on the bias condition at the gate of the PMOS transistor. During the stress phase, when the
gate is negatively biased, as shown in Figure 2.4 (a), the Si-H bonds weaken over time at
elevated temperature and under the influence of the vertical electric field across the gate
oxide, and they eventually break. The broken silicon bonds act as interface traps near the
Si/SiO2 interface, and the hydrogen atoms are then free to spread out; some of them combine
to form H2 molecules, which diffuse towards the poly gate. This phase causes a threshold
voltage shift, and device performance subsequently degrades over time. In the recovery
phase, when the stress bias is removed, as shown in Figure 2.4 (b), hydrogen atoms/molecules
diffuse back to the interface and anneal the broken Si-H bonds. This reduces the number of
traps and consequently the threshold voltage shift, so the impact of NBTI is attenuated and
partially disappears. This phase has a significant effect on the estimation of NBTI during
dynamic switching in digital operation. Therefore, according to the stress and recovery
phases of the R-D model, the input sequence applied to the transistor’s gate matters, as
some particular input combinations can, in some cases, almost completely anneal the impact
of NBTI.

Figure 2.4: (a) Stress phase, (b) Recovery phase of R-D model

According to the R-D model, the two phases of NBTI can be identified in the simplified
equation:

$$\frac{dN_{it}}{dt} = \underbrace{k_f \left(N_0 - N_{it}\right)}_{\text{generation}} - \underbrace{k_r \, N_{it} \, N_H}_{\text{annealing}}$$

where $N_0$ is the initial number of Si-H bonds, $N_{it}$ is the number of interface traps,
and $k_f$ is the breaking rate constant of Si-H bonds, which is temperature dependent. The
recovery phase is described by the second term of the equation, where $k_r$ is the recovery
rate constant for H atoms that back-diffuse towards silicon atoms and re-form the Si-H bonds
when the stress is removed; it depends on the electric field and temperature. $N_H$ is the
number of hydrogen atoms at the Si/SiO2 interface. The generation of $N_{it}$ causes a
threshold voltage shift in the PMOS transistor [4], which can be evaluated by:

$$\Delta V_{th} = \frac{q \, N_{it}}{C_{ox}}$$

where $q$ is the elementary charge and $C_{ox}$ is the PMOS gate capacitance.
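
The stress and recovery behavior of the R-D model can be explored numerically. The sketch below assumes the commonly used form of the rate equation, dN_it/dt = k_f (N_0 - N_it) - k_r N_it N_H, integrates it with a forward-Euler step, and converts the trap density to a threshold voltage shift via q N_it / C_ox. It makes two further simplifying assumptions that are not part of the model description above: hydrogen diffusion is ignored so that N_H is approximated by N_it at the interface, and generation is switched off (k_f = 0) during recovery. All constants are arbitrary illustrative values, not fitted parameters.

```python
from typing import List

Q = 1.602e-19        # elementary charge [C]
N0 = 1e12            # initial Si-H bond density [1/cm^2] (illustrative)
K_F = 1e-3           # bond-breaking rate constant [1/s] (illustrative)
K_R = 1e-12          # annealing rate constant [cm^2/s] (illustrative)
C_OX = 1e-6          # gate oxide capacitance per unit area [F/cm^2] (illustrative)


def simulate_rd(stress_s: float, recovery_s: float, dt: float = 0.01) -> List[float]:
    """Return N_it sampled every dt seconds over one stress and one recovery phase."""
    n_it = 0.0
    trace = [n_it]
    stress_steps = round(stress_s / dt)
    total_steps = stress_steps + round(recovery_s / dt)
    for step in range(total_steps):
        k_f = K_F if step < stress_steps else 0.0   # no generation in recovery
        n_h = n_it                                  # assumption: N_H ~ N_it
        n_it += dt * (k_f * (N0 - n_it) - K_R * n_it * n_h)
        trace.append(n_it)
    return trace


def delta_vth(n_it: float) -> float:
    """Threshold voltage shift [V] for an interface trap density n_it [1/cm^2]."""
    return Q * n_it / C_OX


if __name__ == "__main__":
    trace = simulate_rd(stress_s=100.0, recovery_s=100.0)
    peak, final = max(trace), trace[-1]
    print(f"after stress:   {delta_vth(peak) * 1e3:.2f} mV")
    print(f"after recovery: {delta_vth(final) * 1e3:.2f} mV")
```

With these values the trap density rises during stress and falls only partially during recovery, which mirrors the qualitative two-phase picture; the absolute numbers carry no physical meaning.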

For more than 30 years, the R-D model was the only one used to characterize this
phenomenon. It was first developed for a single cycle, where NBTI was perceived as an
electrochemical process stimulated only by the vertical electric field across the gate oxide.
Yang et al. [22] then added the impact of lateral electric fields along the channel, which
increase NBTI at high drain bias. Later on, Kumar et al. [23, 24] presented an analytical
solution covering multiple stress/recovery phases using the signal probability and activity
factor (SPAF) concept. However, the SPAF method oversimplifies the actual workload
dependence by down-converting any waveform to its SPAF equivalent at 1Hz. Thus, different
input sequences are assumed to be identical, which is not realistic. In addition, the R-D
model does not adequately support some experimental observations (e.g. the non-Arrhenius
behavior of the temperature dependence). The model also fails to explain the recovery phase
of BTI degradation precisely with a diffusive process. It predicts a slow recovery phase,
especially after a long stress time. However, experimental results have shown that recovery
is very fast, even after a long stress time: recovery of over 50% occurs within 1 second,
which does not correspond at all with the model’s slow recovery predictions. Consequently,
the R-D model underestimates the recovery phase of BTI.

Several attempts have been made to improve the R-D model and overcome its drawbacks.
Kaczer et al. [25] proposed a disorder-controlled diffusion and drift mechanism based on
dispersive transport. Later on, it was generalized to the reaction (dispersive) diffusion (RDD)
model by Grasser et al. [26]. In spite of all these attempts, the R-D model is still inconsistent
with experimental results. For all these reasons, a new model for the BTI degradation
mechanism was developed. It is presented in the next section.

2.2.2 The Atomic Trap-based Model

Due to the inconsistencies in the R-D model, researchers started to look for a better
alternative that can accurately describe the BTI phenomenon, and more specifically its
recovery phase, which could not be properly explained by the R-D model.

The new model is entirely based on the stochastic properties of the individual gate oxide
traps [27]. Each device has a number of traps that may cause a shift in the threshold
voltage. Each of these traps is characterized by a set of time constants. These traps
capture some of the charge carriers responsible for the current flowing between the
transistor's source and drain [28, 29]. If a trap captures a charge carrier, it affects the drain
current, because the number of charge carriers available in the channel changes and because
the charged traps may act as a source of scattering that affects the device mobility [19].

The atomic trap-based model is based on the recently proposed Time-Dependent Defect
Spectroscopy (TDDS) method [30]. It makes use of the fact that molecules in the silicon
oxide can exist in different states depending on their energy. In other words, if a molecule
receives or loses a certain amount of energy, it can move from one state to another, capturing
or releasing charge carriers in the process, as shown in Figure 2.5.

Figure 2.5: Concepts behind Atomic Trap-based model

The amount of energy received or lost is highly voltage and temperature dependent. Each of
these traps has two states, charged (occupied) or uncharged, and only the occupied traps
contribute to the shift of the device's threshold voltage. The timing characteristics of each
trap corresponding to the charge-carrier trapping and detrapping events described in [28],
namely the capture time $\tau_c$ and the emission time $\tau_e$, are strongly dependent on
voltage and temperature.

The capture time $\tau_c$ is defined as the amount of stress time that has to be applied
before the trap is occupied with a high probability. In other words, if $P_c$ is the probability
of a trap being charged and $t_{stress}$ is the amount of time for which the stress voltage is
applied, then the probability of trap occupancy in the case of capture is defined by

$$P_c(t_{stress}) = \frac{\tau_e}{\tau_c + \tau_e}\left\{1 - \exp\left[-\left(\frac{1}{\tau_e} + \frac{1}{\tau_c}\right)t_{stress}\right]\right\} \qquad (2.3)$$

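Similarly, the emission time $\tau_e$ is defined as the time that a charged trap takes to
discharge in the absence of stress; therefore, the probability of the trap emitting its charge
is defined by

$$P_e(t_{relax}) = \frac{\tau_c}{\tau_c + \tau_e}\left\{1 - \exp\left[-\left(\frac{1}{\tau_e} + \frac{1}{\tau_c}\right)t_{relax}\right]\right\} \qquad (2.4)$$

where $t_{relax}$ is the relaxation time after the stress is removed.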
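
As a minimal sketch of the capture and emission probabilities, the Python functions below
assume two-state first-order kinetics in which both time constants enter the exponent and a
prefactor limits the steady-state value. This functional form, the function names and the
numeric values are assumptions made for illustration; the time constants passed in are meant
to be the values valid at the bias applied during the respective phase.

```python
import math


def capture_probability(t_stress, tau_c, tau_e):
    """Probability that a trap is charged after a stress time t_stress.

    Assumed form: tau_e/(tau_c+tau_e) * (1 - exp(-(1/tau_e + 1/tau_c) * t)).
    """
    rate = 1.0 / tau_e + 1.0 / tau_c
    return tau_e / (tau_c + tau_e) * (1.0 - math.exp(-rate * t_stress))


def emission_probability(t_relax, tau_c, tau_e):
    """Probability that a charged trap has emitted after a relaxation time t_relax."""
    rate = 1.0 / tau_e + 1.0 / tau_c
    return tau_c / (tau_c + tau_e) * (1.0 - math.exp(-rate * t_relax))


# A trap that captures quickly (1 ms) and emits slowly (1000 s) under stress.
for t in (1e-4, 1e-3, 1e-2, 1.0):
    print(t, capture_probability(t, tau_c=1e-3, tau_e=1e3))
```

The printed values rise monotonically with the stress time and saturate close to one once the
stress time exceeds the capture time by a few multiples.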

Traps change their occupation state according to their characteristic time constants,
meaning that the number of trapped charges does not immediately reflect the new occupation
probability. The faster traps (with shorter capture time constants) are filled first, while
the slowest traps take longer to fill. The gradual change in the number of occupied traps
changes the channel conductivity accordingly. Since the dynamics of this occupation depend
on the bias point and temperature, it may lead to bias temperature instability (BTI).

When the traps have very short capture and emission times, they are constantly being
charged and discharged, leading to another phenomenon called Random Telegraph
Noise (RTN) [31, 32]. In RTN, the traps capture and release charge carriers and cause
random switching of the drain current at a fixed gate bias, while in the case of BTI the
capture of charge is forced at a high gate voltage and the emission at a low gate voltage. The
scope of this thesis is limited to studying the impact of BTI, so RTN will not be discussed
in any further detail.

Under the atomic trap-based model, each device is characterized by the number of traps,
the capture/emission times of each trap, and the contribution of each charged trap to the
device's threshold voltage shift [27]. It was observed that the number of traps in the oxide
can be well estimated by using a simple Poisson process [7]. Therefore, with knowledge of the
capture/emission time constants, the impact of an individual trap on the device's threshold
voltage, and the trap density, the atomic trap-based model is able to determine the evolution
of the threshold voltage shift with time.
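
The ingredients listed above can be put together in a small Monte Carlo sketch. The Python
code below draws a Poisson-distributed number of traps per device, gives every trap its own
time constants and threshold voltage contribution, and sums the contributions of the traps
that end up occupied. The log-uniform spread of the time constants, the exponential per-trap
impact, the occupancy expression and all numeric values are assumptions made for
illustration, not parameters of the model used in this work.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_device(mean_traps, mean_impact, rng):
    """Return a list of (tau_c, tau_e, dvth) tuples describing one device."""
    traps = []
    for _ in range(sample_poisson(mean_traps, rng)):
        tau_c = 10.0 ** rng.uniform(-6.0, 6.0)       # assumed log-uniform spread [s]
        tau_e = 10.0 ** rng.uniform(-6.0, 6.0)
        dvth = rng.expovariate(1.0 / mean_impact)    # assumed exponential impact [V]
        traps.append((tau_c, tau_e, dvth))
    return traps


def device_dvth(traps, t_stress, rng):
    """Sum the contributions of the traps found occupied after t_stress."""
    total = 0.0
    for tau_c, tau_e, dvth in traps:
        rate = 1.0 / tau_e + 1.0 / tau_c
        p_occ = tau_e / (tau_c + tau_e) * (1.0 - math.exp(-rate * t_stress))
        if rng.random() < p_occ:
            total += dvth
    return total


rng = random.Random(2014)
shifts = [device_dvth(sample_device(8.0, 2e-3, rng), 1e3, rng) for _ in range(1000)]
print("mean shift [V]:", sum(shifts) / len(shifts))
```

Repeating the draw over many devices yields a distribution of shifts rather than a single
value, which is also why this style of simulation becomes expensive for large designs.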

Although the atomic trap-based model provides a very detailed cycle-by-cycle degradation,
it is not scalable and struggles with complex designs because of its stochastic nature and
high computational complexity. Therefore, an enhanced model for predicting BTI effects,
called the Capture/Emission Time (CET) map-based model, was recently proposed to overcome
these limitations. It is presented in the next section.

2.2.3 Capture/Emission Time (CET) Map-based Model

The CET map-based model is not a novel model for measuring BTI effects that is meant to
replace previous modeling efforts such as [16]. Rather, the CET map model provides an
appropriate transformation of the experimental data that helps in understanding the wide
distribution of capture and emission time constants by using the non-radiative multiphonon
model for charge exchange [29]. In particular, the CET map-based model can intuitively
explain the effect of stress and recovery on a distributed set of defects, including DC, AC,
and Duty Factor (DF) stress effects [33, 34].

The CET map-based model is constructed from measure-stress-measure (MSM)
experiments [20, 35]. It deals with the effective activation energy directly, instead of the
energy levels of the traps as in the atomic trap-based model, since both capture and emission
are thermally activated processes and highly dependent on voltage and temperature [30]. The
degradation caused by digital switching between given stress and recovery voltages can be
calculated directly, as described in [33].

The CET map is filled in two steps. First, all defects that have capture time constants
smaller than the stress time $t_{stress}$ are filled. Then, during the subsequent recovery, all
defects that have emission time constants smaller than the recovery time $t_{relax}$ are
discharged. Hence, for any combination of stress and recovery times, the CET map can be
integrated over the entire time domain to obtain the mean number of occupied traps, and the
total $\Delta V_{th}$ can then be determined with respect to the device dimensions, workload
and frequency [20]. However, the measured capture and emission times (i.e. $\tau_c$ and
$\tau_e$) are limited to the experimental window. Consequently, acquiring accurate data for
very short or very long stress/relaxation times is technically impossible or requires long
observation times. Therefore, P. Weckx et al. [47] developed an analytical two-component
bivariate log-normal mixture distribution to extend the experimental window to cover
short and long operating lifetimes, as shown in Figure 2.6.
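
The two-step filling described above can be sketched on a discretized map. In the Python
example below, the map is represented as a list of cells, each holding a capture time, an
emission time and a density; this data layout, the function name and the three-cell toy map
are hypothetical and only serve to show the integration rule.

```python
def occupied_density(cet_map, t_stress, t_relax):
    """Integrate a discretized CET map for one stress/recovery combination.

    cet_map is a list of (tau_c, tau_e, density) cells. A cell contributes
    when it was charged during stress (tau_c < t_stress) and has not been
    discharged during the recovery (tau_e >= t_relax).
    """
    total = 0.0
    for tau_c, tau_e, density in cet_map:
        charged = tau_c < t_stress
        discharged = tau_e < t_relax
        if charged and not discharged:
            total += density
    return total


# Toy map: fast-in/fast-out, fast-in/slow-out and slow-in/slow-out defects.
toy_map = [
    (1e-3, 1e-2, 1.0),
    (1e-3, 1e3, 2.0),
    (1e2, 1e3, 4.0),
]
print(occupied_density(toy_map, t_stress=1.0, t_relax=0.0))
print(occupied_density(toy_map, t_stress=1.0, t_relax=1.0))
```

Comparing the two calls shows how the defects with short emission times drop out of the sum
as soon as a recovery interval is allowed.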

Figure 2.6: Extending the measured CET map Ref. [47]

The CET map-based model is at the heart of all simulations carried out in this work, as it
is scalable (i.e. it covers the complete space of defects) and very accurate for very short or
long stress/relaxation times. In addition, it takes into account the saturation of degradation,
and therefore provides more optimistic lifetime predictions than the R-D model. Further
details on how the CET map-based model is used to analyze the impact of NBTI degradation at
netlist level are given in the next chapter.

2.3 BTI Impacts at Gate-level

Following a bottom-up hierarchical order, starting at the transistor level and going up to
circuit-level BTI degradation analysis, this section demonstrates the BTI impact on a group
of transistors within a logic gate. BTI modeling of logic gates carries the $\Delta V_{th}$
shifts of the transistors one level higher, where the degradation is reflected in the gate delay.

As mentioned in section 2.1, BTI degradation comes in two types: Negative Bias
Temperature Instability (NBTI) and Positive Bias Temperature Instability (PBTI). Depending
on the signals applied to a logic gate's inputs, different transistors are affected at different
times, either by NBTI or by PBTI according to the type of the transistor.

2.3.1 Experimental Setup

At gate level, BTI degradation depends on parameters such as duty factor, gate size,
gate drive strength, stress location, transistor organization within a gate, and frequency. The
impact of the input signal on BTI is characterized by the duty factor (DF). The duty factor is
the ratio of the time spent at logic high (logical ‘1’) to the whole period of the stress stimulus.
Simulations were executed by stressing the logic gates with signals of different duty factors
to examine their impact.
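
The duty factor definition can be illustrated with a few lines of Python. The waveform is
given as a list of (duration, level) segments, which is a hypothetical representation chosen
for this sketch; the second function uses the fact that a PMOS device is under NBTI stress
while its gate input is at logical ‘0’.

```python
def duty_factor(segments):
    """Fraction of the total time that the signal spends at logic 1.

    segments is a list of (duration, level) pairs with level 0 or 1.
    """
    total = sum(duration for duration, _ in segments)
    high = sum(duration for duration, level in segments if level == 1)
    return high / total


def pmos_stress_fraction(segments):
    """Fraction of time a PMOS gate input is low, i.e. under NBTI stress."""
    return 1.0 - duty_factor(segments)


waveform = [(3.0, 1), (1.0, 0)]  # 3 time units high, 1 time unit low
print(duty_factor(waveform), pmos_stress_fraction(waveform))
```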

Since the gate size is expected to affect the amount of degradation, measurements are
performed on different gates with different transistor dimensions. Simulations also evaluate
the impact of gate drive strength on BTI degradation. Increasing the drive strength means
upsizing the transistors in a gate or adding more transistors in parallel, so that the gate is
able to drive its load without loss of performance. Finally, the impacts of transistor
organization within a gate and of frequency are also investigated.

All experiments are performed with the 45nm PTM transistor model, and the BTI impacts are
added to the transistors by Verilog-A modules for a stress time of 10 years. Each
module generates a voltage shift that depends on the activity factor of the transistor. All
measurements are performed through timing simulation using HSPICE. The purpose of these
experiments is to observe what changes BTI brings at one higher degree of complexity (a group
of transistors) than transistor-level modeling, before studying the BTI impacts at netlist level.

2.3.2 Experimental Results

Impact of duty factor

The degradation of a logic gate depends strongly on the input signals applied over its
lifetime. Since the two types of BTI degradation act on transistors in opposite ways, the duty
factor of the input signal has opposite effects on NBTI and PBTI degradation. It is observed
that for a higher duty factor the impact of PBTI increases while the impact of NBTI decreases,
which is consistent with previous results [13], since PBTI occurs in NMOS transistors
when a high voltage (logical ‘1’) is applied, whereas NBTI occurs in PMOS transistors
when a low voltage (logical ‘0’) is applied. It is also observed that PMOS devices
are more susceptible to BTI effects than NMOS devices.

Impact of gate size and gate drive strength

This experiment has shown that a gate with higher drive strength is less susceptible to
BTI degradation than a gate with lower drive strength for any given duty factor [14].
Figure 2.7 shows the impact of increasing the drive strength from D0 to D4 on inverter and
buffer gates. The delay degradation decreases with higher gate drive strength: adding
more transistors in parallel means that these parallel transistors have mutually exclusive
BTI impacts, which results in a smaller delay increment [13] and therefore reduces the overall
delay degradation of the gate.

Figure 2.7: Degradation impact on inverter and buffer gates with increasing drive strength

Impact of stress location and transistors organization in a gate

In this experiment, it is observed that not all transistors in a logic gate contribute
equally to the delay increment. Results have shown that transistors close to the supply
voltage tend to be more affected by BTI than those close to ground. Moreover, the delay
degradation is sensitive to the type of connection between the transistors in a gate.
Transistors connected in series have positive interference in their BTI impacts, which results
in a larger delay increment. On the other hand, transistors connected in parallel have mutually
exclusive BTI impacts, which result in a smaller delay increment.

In CMOS gates, the source of a PMOS transistor is connected to the supply voltage
$V_{DD}$, whereas the source of an NMOS transistor is connected to ground. Consequently, the
fall time of the signal at the output of a gate depends on the NMOS transistors, and the rise
time is determined by the PMOS transistors. For instance, the inverter has a higher delay
increment in its rise time than in its fall time, as shown in Figure 2.7, since the impact of
NBTI in PMOS is higher than that of PBTI in NMOS. As a further illustration, the NMOS
transistors in the Pull-Down Network (PDN) of a NAND gate are connected in series, which
results in a higher delay increment in the fall time than in a NOR gate, where the NMOS
transistors are connected in parallel. The same observation holds for the PMOS transistors in
the Pull-Up Network (PUN) of NAND and NOR gates: the NOR gate, whose PMOS transistors
are connected in series, has a higher delay increment in the rise time than the NAND gate. In
short, the BTI degradation depends strongly on the type of connection between the
transistors and on the location of the transistors.

Impact of frequency

Previous studies reported that frequency has a weak impact on BTI degradation that can
be neglected. Alam et al. [18] claimed that BTI is independent of frequency because an
equivalent number of Si-H bonds break and anneal during the stress and relaxation phases,
respectively. Grasser et al. [36] suggested a weak frequency dependence due to the asymmetry
between bond breaking and recovery during the stress/recovery phases. More recent
studies, however, have revealed a frequency-dependent contribution [37-39]. The CET
map-based model takes frequency dependence into account [20, 34], and it is used for all
simulations of the experiments in Chapter 4.

2.4 Chapter Review

In this chapter, the physical concept of the BTI phenomenon was explained, and then the
transistor-level models developed to explain its behavior were presented. It was shown that
the R-D model underestimates the recovery phase of BTI: it predicts a slow recovery phase,
while experimental results show that the recovery phase is very fast. Therefore, the R-D
model is too pessimistic in predicting BTI. Then, the Atomistic Trap-based model was
introduced. This model is based on the stochastic properties of gate oxide traps and describes
all traps in terms of voltage- and temperature-dependent capture and emission time constants.
It also successfully models Random Telegraph Noise (RTN). The Atomistic Trap-based model
is able to give a very detailed and accurate BTI degradation; however, it is not scalable and
struggles with complex designs due to its stochastic nature. Therefore, the CET map-based
model was presented. This model covers the frequency dependence, takes into account the
saturation of degradation, and provides long-term predictions.

At gate level, BTI degradation depends on parameters such as duty factor, gate drive
strength, stress location, transistor organization within a gate, and frequency. The impact of
each parameter was illustrated.

Chapter 3
Workload-Dependent NBTI Reliability-Aware Logic Synthesis

3.1 Related Work

In order to perform aging-aware timing analysis at netlist-level, a gate model is required that
provides aged gate delay instead of the fresh one. This is the main difference compared to the
traditional Static Timing Analysis (STA) without aging.

Several approaches have been proposed to study NBTI aging at netlist level. Most of them
determine the transistor's threshold voltage shift by using either the R-D model or the
Atomistic Trap-based model, and then translate this information into timing path degradation
at netlist level through various methods. Kumar et al. [24] developed an R-D model-based
aging framework that characterizes look-up tables (LUTs) of gate delay versus threshold
voltage with respect to the signal probability of each device. These LUTs were used to
evaluate the path delay degradation in an in-house timing analysis flow. However, their
LUT-based solution contained only 22 standard cells due to the complexity of the
pre-characterization simulations, and their study focused only on ISCAS benchmarks at the
70nm technology node. Wang et al. [40] proposed a flow that pre-characterizes standard cells
with SPICE simulations over input slew rate, output capacitive load and $\Delta V_{th}$. A
closed-form expression of the aged gate delay as a function of $\Delta V_{th}$ is calculated by
polynomial fitting, to be used in the SPICE simulations. However, each gate was characterized
by a single delay value, i.e. the maximum of the propagation path delays, ignoring the
remaining input-to-output delays in order to reduce the LUT generation complexity; hence it
was not accurate. Moreover, their characterized library consisted of only 15 standard cells
and was not in the standard Liberty file format. Therefore, it was not compatible with the
standard digital design flow. Velamala et al. [41] developed a framework based on
trapping/detrapping theory to estimate the NBTI degradation of individual devices depending
on the workload. Their flow reuses the slew rate versus load capacitance tables of the
existing standard libraries under different corners by relating $\Delta V_{th}$ to a change in
supply voltage, without characterizing an aged library. Therefore, its computational
complexity was lower than that of the LUT-based approaches, but at a loss of accuracy. The
degradation impact on the system's critical path timing was evaluated by their in-house
timing analysis flow. Lorenz [42] focused on studying the inter-gate degradation effects, e.g.
the aged slew rate at the output of the preceding gate. However, because this study
comprehensively covered multiple sensitivities, a simple 2-input NAND gate required 56 LUTs,
and hence massive storage space. Finally, Huard et al. [43] proposed a bottom-up framework
based on the composite BTI model, which represents the recovery phase of NBTI more
accurately than the R-D model [44]. Their gate delay model covers gate type, output
capacitance, slew rate, and workload dependencies.

The aforementioned approaches developed their own simplified gate delay models to be
used within their in-house timing analysis flows, which are not compatible with
commercial EDA tools and not usable for the rest of the digital design flow, e.g. placement
& routing. Moreover, their LUT-based pre-characterizations were time-consuming,
storage-hungry, and not scalable to libraries that contain more than 500 standard cells.

Recently, Kükner et al. [45, 46] proposed a fast, accurate, scalable, and workload-dependent
aging-aware framework that is integrated within the standard EDA tool chain and covers both
inter- and intra-gate degradation impacts. Moreover, it is compatible with the standard
digital design flow. The next section gives more details on how the CET map is combined
with workload dependency.

3.2 Workload-Dependent CET Map-based NBTI Model

The CET map model has the advantages of higher accuracy (i.e. covering the complete defect
space), better long-term aging predictions (i.e. more optimistic than the R-D model), and
lightweight nature (i.e. no need for any heavy Monte Carlo simulations).

Figure 3.1: CET map integration w.r.t. the workload to obtain the active traps on device. Ref. [47]

The CET map is constructed from measure-stress-measure (MSM) experiments based on
28nm HKMG technology, and scaled to the 10nm FinFET process node with respect to device
dimensions, voltage, temperature, oxide thickness, and time-zero variability.

The CET map can accurately describe the stress and recovery patterns for DC, AC and
DF stressing by integrating over the portion of the CET map that is active under stress [20,
34]. For a given frequency $f$, duty factor (DF) and stress time $t_{stress}$, the occupancy
probability map can be determined by using Equation 3.1. After that, the CET map is
multiplied by the occupancy probability map to obtain the CET-active map, which consists
of a distribution of active traps as shown in Figure 3.1. Consequently, only a portion of the
defect population is active for a given stress waveform. Then, the CET-active map is
integrated over the entire time domain and rescaled by the device dimensions to
determine the workload-dependent $\Delta V_{th}$ distributions (please see [47] for details).

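(Equation 3.1, the occupancy probability as a function of frequency, duty factor and stress
time, is given in [47].)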

In FinFET process technology, the transistor channel is a thin vertical “fin”, and the
device's gate is fully wrapped around the channel formed between the source and the drain.
This results in much better electrostatic control of the channel, better electrical
characteristics, and lower leakage and variability [48]. In addition, the vertical channel
provides a high integration density, and the device can deliver more performance than a
planar CMOS device.

Figure 3.2: Geometric parameters of FinFET device. Ref. [48]

The most important geometric parameters of the FinFET transistor are the fin height
($H_{fin}$), the fin width or body thickness ($T_{fin}$), and the channel length ($L$), as
shown in Figure 3.2. The effective electrical width of the FinFET is the fin width/body
thickness plus twice the fin height, as shown in Equation 3.2.

$$W_{eff} = T_{fin} + 2\,H_{fin} \qquad (3.2)$$

Based on the CET map model, the threshold voltage shift ($\Delta V_{th}$) surfaces for a
PMOS device can be represented as a function of duty factor (DF), device size (number of
fins, NFIN), and frequency at different ages (e.g. from 10 years down to 3 sec) by using the
$\Delta V_{th}$ distributions (i.e. 3σ and µ sampling), as shown in Figure 3.3. The
$\Delta V_{th}$ surface becomes lower and flatter for shorter ages. Moreover, $\Delta V_{th}$
decreases as DF tends to 100%, since the PMOS device is less stressed. It can be observed
that the change of the $\Delta V_{th}$ surface w.r.t. duty factor is around 50-87mV, while the
change w.r.t. device size is about 0.8-23mV. That means the $\Delta V_{th}$ surface is
more sensitive to workload than to device size, which emphasizes the high workload
dependency of NBTI aging. It is also noted that the mean (µ) $\Delta V_{th}$ surface is not
affected by the change of device size at any duty factor and age, since the average trap
density stays constant; on the other hand, the 3σ $\Delta V_{th}$ rises with decreasing device
size, since the average impact per trap on $\Delta V_{th}$ is inversely proportional to the
device size (~1/device size). That means smaller devices are more susceptible to NBTI
degradation, which is consistent with previous studies [19], [29]. Finally, the mean
$\Delta V_{th}$ can be up to 50mV for aging of 10 years, while the 3σ $\Delta V_{th}$ can
reach 100mV.

NBTI is a time-dependent degradation mechanism in which the impacts of device size, duty
factor, age, etc. are clearly visible at transistor level. However, it is important to
predict its impacts at higher design levels, such as netlist level, to enable reliability
simulation of more complex designs. In the next section, a detailed description of the
reliability-aware digital design flow is presented.

Figure 3.3: Threshold-voltage shift surfaces under (a) mean, and (b) 3-sigma corners w.r.t. duty
factor, age and device size, for 10 nm FinFET process technology.

3.3 Workload-Dependent Aging-Aware Logic Synthesis

This section describes the NBTI reliability-aware digital design flow that is used to measure
the impact of NBTI degradation at netlist level. The flow is fully compatible with
commercial EDA tools and applicable within the standard digital design flow, e.g. placement
& routing. Moreover, it does not need any pre-characterization steps to generate gate delay
LUTs, and hence saves extensive storage space. Furthermore, it is faster than the
LUT-based approaches. The flow is not limited to a small set of standard
cells, but can cover a complete library of cells. The communication interface between the
flow and the NBTI model is generic, which means that any kind of NBTI model (R-D model,
atomic trap-based model, etc.) can be plugged into the flow.

The CET map-based model is chosen to be used within the reliability-aware flow to
estimate the NBTI degradation of individual devices depending on the workloads. The
reliability-aware digital design flow is illustrated in Figure 3.4. The flow consists of four
main steps: 1) gate-level workload simulations, 2) calculation of the NBTI degradation based
on the CET map model and back-annotation to SPICE sub-circuits, 3) characterization of the
aged library, and finally 4) static timing analysis (STA) of the aged gate-level netlist. The
flow uses standard file formats (i.e. .vhd, .saif, .lib, .sp) that are compatible with the
commercial EDA tools (i.e. ModelSim, Cadence Altos Liberate and Cadence RTL Compiler). In
order to simulate the impact of NBTI aging, the flow requires the following inputs: technology
node (i.e. 10nm FinFET), Process, Voltage, and Temperature (PVT) corner, workload
benchmark (WL), fresh gate-level netlist (.vhd), reference standard cell library (.lib), and
reference SPICE cell library (.sp).

The flow operates as follows. First, gate-level workload simulations are executed with
ModelSim to generate a Switching Activity Interchange Format (.saif) file, in which the
switching activity of the individual transistors in the cells of the fresh netlist is captured.
The SAIF file contains toggle counts (number of changes) for the signals of the design, as
well as timing attributes that specify the time durations for which signals are at level 0, 1,
X, or Z. After that, the Control Script, written in TCL (Tool Command Language),
determines the gates that are used in the fresh netlist (.vhd). Then, the SPICE-level
netlist of each gate is fetched from the reference SPICE cell library (.sp). Then, according to
the internal gate node activities (.saif), the Control Script evaluates the duty factor (DF) for
all devices in each gate of the fresh gate-level netlist. After that, based on device dimensions,
DF, age, voltage, and temperature, $\Delta V_{th}$ per device is calculated by using the CET
map NBTI model. Then, $\Delta V_{th}$ per device is back-annotated to the SPICE-level netlists
of the gates via the delvto command, which shifts the $V_{th}$ of the device.

Figure 3.4: Workload-dependent NBTI reliability-aware digital design framework.

Next, by using Cadence Altos Liberate, the back-annotated SPICE-level netlists are
characterized to generate the instance-based aged library (.lib) at the chosen PVT corner,
in which each cell in the design is represented as a unique instance. At the same time, the
Control Script writes the aged gate-level netlist (.vhd) by using these unique instance names.
Finally, by using Cadence RTL Compiler, Static Timing Analysis is performed to capture the
performance metrics of the aged system under the WL stress. Comparing the aged
performance metrics with those of the fresh version gives the percentage of degradation in the
system’s critical path timing.
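
Parts of the flow described above can be sketched in a few lines. The Python helpers below
(the flow itself uses a TCL Control Script) derive a duty factor from the T0/T1 durations of a
SAIF-style net record, append a delvto value to selected MOSFET instance lines, and compute
the percentage of degradation. The function names, the sample record, the device lines with
their model names and the shift value are hypothetical; the threshold voltage shift is passed
in as a number because the CET map calculation is outside the scope of this sketch, and its
sign convention is left to the caller.

```python
import re


def duty_factor_from_saif(record):
    """Duty factor T1 / (T0 + T1) from one SAIF-style net record."""
    t0 = int(re.search(r"\(T0\s+(\d+)\)", record).group(1))
    t1 = int(re.search(r"\(T1\s+(\d+)\)", record).group(1))
    return t1 / (t0 + t1)


def annotate_delvto(spice_lines, shifts):
    """Append delvto=<shift> to the instance lines whose name is in shifts.

    shifts maps lower-case instance names to signed threshold voltage
    shifts in volts.
    """
    annotated = []
    for line in spice_lines:
        fields = line.split()
        name = fields[0].lower() if fields else ""
        if name in shifts:
            line = f"{line.rstrip()} delvto={shifts[name]:g}"
        annotated.append(line)
    return annotated


def degradation_percent(fresh_delay, aged_delay):
    """Relative increase of the critical path delay in percent."""
    return 100.0 * (aged_delay - fresh_delay) / fresh_delay


record = "(A (T0 750) (T1 250) (TX 0) (TC 12) (IG 0))"
inverter = ["mp1 out a vdd vdd pch nfin=2", "mn1 out a 0 0 nch nfin=2"]
print(duty_factor_from_saif(record))
print("\n".join(annotate_delvto(inverter, {"mp1": -0.03})))
print(degradation_percent(100.0, 105.0))
```

Keeping the shift as a per-instance lookup mirrors the instance-based aged library, in which
every cell instance can carry its own degradation.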

The complete reliability-aware flow can be re-executed for a desired NBTI degradation
corner (e.g. µ, 3σ cases) at a desired age (e.g. from 10 years down to 3 sec). In this work, we
limit our analysis to the 3σ degradation corner at an age of 3 years in 10nm FinFET process
technology. Moreover, the flow is open to future extensions that cover the impacts of
other time-dependent degradation mechanisms (e.g. Hot Carrier Injection (HCI),
Time-Dependent Dielectric Breakdown (TDDB), etc.) with their own models.

3.4 Chapter Review

In this chapter, previous studies on NBTI modeling at netlist level were discussed. They use
either the R-D model or the Atomistic Trap-based model as the basis to calculate
$\Delta V_{th}$. In most cases the gate delay model is evaluated by characterizing look-up
tables (LUTs) used in in-house timing analysis flows, which are not compatible with standard
EDA tools and not applicable to the rest of the digital design flow, e.g. placement & routing.
Moreover, their characterized libraries are limited to a small set of standard cells (i.e. they
are not scalable).

For these reasons, a CET map-based, workload-dependent reliability-aware flow was
developed by Kükner et al. [45, 46]. The flow can be fully integrated within commercial
EDA tools and is applicable within the standard digital design flow, e.g. placement & routing.
Moreover, it can cover a complete library of standard cells. The methodology of the flow
was explained in this chapter.

Chapter 4
NBTI Aging Analysis of Datapath Sub-blocks

4.1 Introduction

This chapter presents a comprehensive analysis of NBTI aging effects on various designs of
datapath sub-blocks at 10nm FinFET process technology. The degradation impacts are
investigated in relation to workload dependency, and architectural topology based on the
state-of-the-art CET map-based model.

For this purpose, the most fundamental sub-blocks in the datapath, such as the adder,
multiplier, shifter, mux/demux and register file, are implemented in different architectural
topologies to examine the robustness/sensitivity to aging of each sub-block's architectures.
The datapath sub-blocks have been designed in gate-level VHDL, and their functionality was
verified by a testbench. Static Timing Analysis (STA) is performed to capture the performance
metrics of the aged system after 3 years at the 3σ corner.

The aging impacts on a large set of architectural topologies of the datapath under various
workload benchmarks have revealed many significant observations on circuit performance,
which make up the core sections of this chapter. In the next section, the design and
implementation of the datapath sub-blocks and their architectural topologies are described. In
section 4.3, the experimental setup for this study is explained. Then, the analysis and
discussion of the experimental results are given in section 4.4, and finally, section 4.5
summarizes the main points of this chapter.

4.2 Datapath Sub-blocks

The datapath sub-blocks are the functional components within a microprocessor that are
actually employed to perform computational operations, including reading from/writing to
memory, arithmetic, logic operations, and shift operations. These components form the
fundamental building blocks of a complete datapath. Each of these datapath sub-blocks can be
built in architectures ranging from basic to complex. All microprocessors contain these
sub-blocks in some form or another, satisfying particular price/performance constraints. For
instance, an arithmetic unit that can perform floating point calculations is much more
complicated (and expensive) than an arithmetic unit that only performs integer calculations.

Previous studies analyzing aging effects mostly focused on ISCAS benchmark
circuits [41], SRAM cells [49], application-specific blocks (e.g. a Low-Density
Parity-Check (LDPC) codec circuit) [43], etc. to show the functionality of BTI aging models
at netlist level, or on mitigation techniques such as guard-band reduction, instruction
scheduling, voltage/frequency scaling, etc. However, no work has been reported so far that
investigates the aging effects on the datapath sub-blocks (the fundamental blocks of a
processing unit, e.g. adder, multiplier, shifter, etc.). Therefore, those sub-blocks are designed
here to analyze aging trends for different architecture choices, with the emphasis on workload
dependency.

Figure 4.1: Structure of parallel prefix adder Ref. [51]

Adder Architectural Topologies

Nine adder architectures are designed, ranging from basic to complex parallel prefix
architectures. They include seven 32-bit parallel prefix adders, one 32-bit RippleCarry adder
(RCA), and one 32-bit Carry LookAhead (CLA) adder, which exhibit various area and delay
characteristics. Parallel prefix adders are the most flexible and widely used binary adders
in computer arithmetic and ASIC design [50] because, unlike in the traditional RippleCarry
adder, the carry bits do not have to propagate through the whole design, which provides
remarkable performance. Parallel prefix adders share a regular structure: as shown in
Figure 4.1, all are composed of a row of half adders, a network of propagate and generate
(PG) logic, and a row of sum logic made up of XOR gates.

Figure 4.2: Parallel prefix adders (a) KoggeStone, (b) HanCarlson, (c) LadnerFischer, (d) Sklansky,
(e) Knowles22111, (f) Knowles44221 and (g) BrentKung

Figure 4.3: Taxonomy of parallel-prefix adders.

The network of PG logic, known as the “parallel prefix network”, is what differentiates one
parallel prefix adder from another. Figure 4.2 shows the seven parallel prefix structures of
the adders studied here. Black and gray cells represent a combination of AND-AND-OR and
AND-OR gates, respectively. Parallel prefix adders provide a wide variety of fan-out (FO)
and logic depth (LD) values. As shown in the taxonomy of parallel-prefix adders in Figure 4.3,
KoggeStone offers full parallelism at the cost of the highest area, so the adder areas are
normalized to it. Increasing FO and/or LD decreases the area cost as well as the level of
parallelism.
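
The three-part structure described above (bit-level PG generation, prefix network, XOR sum row) can be illustrated with a minimal behavioral sketch in Python. This is not the gate-level VHDL used in this work; the function name `kogge_stone_add` and the use of Python integers as bit vectors are assumptions made purely for illustration, and the carry-in is assumed to be zero.

```python
def kogge_stone_add(a, b, width=32):
    """Behavioral model of a KoggeStone-style prefix adder (carry-in = 0)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Pre-processing row: bit-level generate and propagate signals.
    g = a & b
    p = a ^ b
    p_bit = p  # bit-level propagate, kept for the final sum row
    # Prefix network: each level combines a group with the group `dist` bits below it.
    dist = 1
    while dist < width:
        g = (g | (p & (g << dist))) & mask  # AND-OR on the generate signal
        p = (p & (p << dist)) & mask        # AND on the propagate signal
        dist <<= 1
    # Post-processing row: sum_i = p_i XOR carry into bit i.
    carries = (g << 1) & mask
    return (p_bit ^ carries) & mask
```

For a 32-bit operand the loop runs for five levels (distances 1, 2, 4, 8 and 16), since every bit position is combined at every level; this is the source of both the short logic depth and the large cell count of this topology.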

The RippleCarry adder (RCA) is one of the simplest Carry-Propagate Adder (CPA) designs. The
32-bit RCA is implemented with half adder (HA) cells, as shown in Figure 4.4. The
RippleCarry adder is very area-efficient but, unfortunately, very slow.

Figure 4.4: RippleCarry adder based on HA cells

The maximum delay of RCA is from the carry-in input to the carry-out, passing through each
half adder along the way. The Carry LookAhead Adder (CLA) solves this problem by
calculating the carry signals in advance, based on the input signals. Therefore it is faster (but
larger) than RCA.
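
To make the carry chain concrete, the following is a small behavioral sketch of a ripple-carry adder in Python. It assumes that each bit position is built from two half adders and an OR gate, which is one common construction and not necessarily the exact cell arrangement of Figure 4.4; all function names are illustrative.

```python
def half_adder(x, y):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def full_adder(x, y, carry_in):
    """Full adder assembled from two half adders and an OR gate."""
    s1, c1 = half_adder(x, y)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_carry_add(a, b, width=32, carry_in=0):
    """Add bit by bit; the carry ripples from the LSB to the MSB."""
    result = 0
    carry = carry_in
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

The loop makes the sequential dependency visible: bit i cannot be resolved before the carry of bit i-1 is known, which is why the worst-case delay grows linearly with the word length.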

Multiplier Architectural Topologies

In this study, parallel multipliers are chosen since they are classified as high-speed
multipliers and provide the most benefit to computer architecture at low cost. A multiplier
commonly consists of three separate components, as shown in Figure 4.5, and operates as
follows. First, a collection of AND gates generates the partial product (PP) bits from the
multiplier and multiplicand operands. Then, an array of half-adders and full-adders
(counters) reduces the partial products to sum and carry vectors. Finally, a Carry-Propagate
Adder (CPA) or parallel-prefix adder adds the sum and carry vectors to produce the final
product. Most multipliers follow these steps, although there are various approaches to their
implementation.

Figure 4.5: Main components of parallel multiplier structure

Many different structures can be used to implement a parallel multiplier. In this study, we
focus only on the structures that are widely used in ASIC design. Five parallel multipliers
from three different categories are designed: two Array multipliers, two Tree multipliers,
and one Booth-Encoding multiplier. A 16-bit implementation is targeted, as it is widely used
for FFT in the wireless domain and the impact of the architectural topology is clear (e.g.
more possible critical paths). For the Array category, Full Array and CarrySave Array (CSA)
multipliers are designed. The Full Array multiplier is very regular in its structure, as
shown in Figure 4.7 (a), where a black cell represents a combination of a full-adder and an
AND gate and a gray cell represents a combination of a half-adder and an AND gate. At each
stage, the carry-out of one full-adder cell is connected to the carry-in of the adjacent
full-adder cell. In the CSA multiplier, the black and gray cells are the same as in the Full
Array multiplier, except that the last cell in each row represents a combination of a
full-adder with two AND gates. The building block of the CSA multiplier is shown in
Figure 4.6. The carry of each full-adder is forwarded diagonally to the input of a full-adder
in the next row. Therefore, the CSA multiplier is faster than the Full Array multiplier, as
the carry bits are not added immediately but are instead passed to the next stage of
addition. As shown in Figure 4.7 (b), a KoggeStone adder, rather than a conventional
Carry-Propagate Adder (CPA), is used to add the carries and sums in the final stage of the
CSA multiplier, as the KoggeStone adder is widely considered the fastest adder design.

Figure 4.6: The building block of CSA multiplier

Figure 4.7: (a) Full Array multiplier, (b) CarrySave Array multiplier

For the Tree category, 16-bit Wallace Tree [52] and Dadda Tree [53] multipliers are
designed. In the Wallace multiplier, the partial products are generated by AND gates. Each
partial product term is represented by a dot; the dot diagram of the 16-bit Wallace
multiplier is shown in Figure 4.8. The 16 rows of partial products are grouped into sets of
three rows each. Any remaining rows that are not part of a group of three are transferred to
the next level without modification. Within each group of three rows, (3, 2) counters are
applied to the columns containing three bits and (2, 2) counters are applied to the columns
containing two bits. Columns containing only a single bit are transferred to the next level
unchanged. A full-adder implements a (3, 2) counter, which takes 3 inputs and produces 2
outputs. Similarly, a half-adder implements a (2, 2) counter, which takes 2 inputs and
produces 2 outputs.

Figure 4.8: Dot diagram of 16-bit Wallace multiplier

In the Dadda Tree multiplier, the partial products are formed by AND gates in the same
manner as for the Wallace multiplier. Next, partial product reduction is performed, but in
contrast to the Wallace method, the Dadda multiplier does the minimum reduction necessary at
each level to complete the reduction in the same number of levels as required by the Wallace
multiplier, as shown in the dot diagram in Figure 4.9. As in the CSA multiplier, a KoggeStone
adder is used in the final stage of the Wallace and Dadda multipliers.
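
The statement that both tree schemes need the same number of reduction levels can be checked with a short sketch that tracks only the maximum number of partial-product rows after each level. It assumes the textbook rules (Wallace: every full group of three rows becomes two rows; Dadda: row-count targets 2, 3, 4, 6, 9, 13, ...); the function names are illustrative and the sketch does not model individual counters or columns.

```python
def wallace_levels(rows):
    """Row count after each Wallace level: each group of 3 rows becomes 2 rows."""
    heights = [rows]
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        heights.append(rows)
    return heights


def dadda_levels(rows):
    """Row count after each Dadda level, following the sequence d(j+1) = floor(1.5 * d(j))."""
    if rows <= 2:
        return [rows]
    targets = [2]
    while targets[-1] * 3 // 2 < rows:
        targets.append(targets[-1] * 3 // 2)
    return [rows] + targets[::-1]


if __name__ == "__main__":
    print(wallace_levels(16))  # [16, 11, 8, 6, 4, 3, 2]
    print(dadda_levels(16))    # [16, 13, 9, 6, 4, 3, 2]
```

Under these assumptions both schemes take six levels for 16 rows; Dadda reduces only as far as the next target height at each level, whereas Wallace reduces as much as possible.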

Figure 4.9: Dot diagram of 16-bit Dadda multiplier

The last multiplier in this study is the Booth-Encoding multiplier [54]. The Booth
multiplier attempts to reduce the number of partial products generated in a multiplication
by using the modified Booth algorithm. The multiplier is composed of three components.
First, the modified Booth encoder and decoder generate the partial products. Second, the
generated partial products are reduced until only two rows remain; a Wallace Tree network is
used for this reduction. Lastly, the final-stage adder computes the multiplication result by
adding the last two rows. To multiply X by Y, the modified Booth algorithm starts by
grouping the bits of Y in threes and encoding each group into one of {-2, -1, 0, 1, 2}.
Table 4.1 shows the rules for generating the encoded signals. Figure 4.10 shows the
corresponding logic diagram of the Booth encoder and decoder. The Booth decoder generates
the partial products using the encoded signals.
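
As a hedged illustration of the encoding step, the sketch below implements the standard radix-4 (modified Booth) recoding, in which overlapping groups of three bits (y[2k+1], y[2k], y[2k-1]) of a two's-complement operand are mapped to a digit in {-2, -1, 0, 1, 2}. It is a behavioral model only; the function names are assumptions, and the encoder/decoder gate structure of Figure 4.10 is not modeled.

```python
def booth_radix4_digits(y, width=16):
    """Recode a two's-complement `width`-bit value into radix-4 Booth digits (LSB first)."""
    y &= (1 << width) - 1
    # bits[0] is the implicit y[-1] = 0; bits[i + 1] is y[i].
    bits = [0] + [(y >> i) & 1 for i in range(width)]
    digits = []
    for i in range(0, width, 2):  # `width` is assumed to be even
        y_minus, y_mid, y_plus = bits[i], bits[i + 1], bits[i + 2]
        digits.append(y_minus + y_mid - 2 * y_plus)
    return digits


def booth_multiply(x, y, width=16):
    """Multiply x by a signed `width`-bit y by summing the Booth partial products."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, width)))
```

For a 16-bit operand the recoding yields eight digits, so only eight partial products have to be reduced instead of sixteen.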

Table 4.1: Truth table of Booth-encoding algorithm

Figure 4.10: Booth encoder and decoder circuits

Figure 4.11: Shifter architectures: (a) Logarithmic shifter, (b) Barrel shifter

Shifter Architectural Topologies

Two 16-bit shifter architectures are designed: a Logarithmic shifter and a Barrel shifter.
As shown in Figure 4.11 (a), the Logarithmic shifter is based on 3:1 MUX cells and consists
of four stages that shift by 1, 2, 4 and 8 bits, in that order. It can perform the following
operations: shift right logical, shift right arithmetic with sign extension, shift left
logical and shift left arithmetic. The Barrel shifter, shown in Figure 4.11 (b), is designed
for rotation; it is based on 4:1 MUX cells and consists of two stages. The first stage of
multiplexers rotates by 0, 1, 2 or 3 positions, and the second by 0, 4, 8 or 12 positions.
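
The staged operation of both shifters can be sketched behaviorally as follows. The rotation direction (left), the operation names (`sll`, `sla`, `srl`, `sra`) and all function names are assumptions made for illustration; the source specifies only the stage sizes and the supported operations.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def rotl(x, n):
    """Rotate a 16-bit value left by n positions."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def barrel_rotate_left(x, amount):
    """Two 4:1 MUX stages: rotate by 0..3, then by 0, 4, 8 or 12."""
    stage1 = rotl(x & MASK, amount & 0b0011)
    return rotl(stage1, amount & 0b1100)


def log_shift(x, amount, op):
    """Four stages (1, 2, 4, 8 bits); each stage passes, shifts left, or shifts right."""
    if op not in ("sll", "sla", "srl", "sra"):
        raise ValueError("unknown operation: " + op)
    x &= MASK
    sign = x >> (WIDTH - 1)
    for k in (1, 2, 4, 8):
        if not amount & k:
            continue  # stage passes the data through unchanged
        if op in ("sll", "sla"):
            x = (x << k) & MASK
        elif op == "srl":
            x >>= k
        else:  # "sra": fill the vacated upper bits with the sign bit
            fill = (MASK << (WIDTH - k)) & MASK if sign else 0
            x = (x >> k) | fill
    return x
```

For example, `log_shift(0x8000, 3, "sra")` activates the 1-bit and 2-bit stages and returns `0xF000`.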

Multiplexer/De-multiplexer Architectural Topologies

Multiplexers/de-multiplexers are used to route the many inputs and outputs that are
distributed among the functional units (e.g. ALU) and the memory or register files in the
datapath. Two different 4:1 multiplexer architectures and one 1:4 de-multiplexer are
designed. The two multiplexers are implemented using MUX4 cells and logic gates (AND, OR,
INV), respectively, as shown in Figure 4.12. The de-multiplexer is implemented with logic
gates. The input and output data of the multiplexers and the de-multiplexer are 16 bits wide.

Register File Architecture

A register file is designed as shown in Figure 4.13; it has 16 registers, each 16 bits
wide. Each cell of the register file is built from a D flip-flop. Each of the 16 registers
has a clock input (positive-edge triggered), a 16-bit data input, and a 16-bit data output.
This collection of registers is managed by one 4:16 decoder and two 16:1 multiplexers that
are 16 bits wide. The register file has two output ports (‘Data_out 1’ and ‘Data_out 2’), so
two registers can be read simultaneously. The output of each register is connected to both
multiplexers. By changing ‘Read_Register address 1’ and ‘Read_Register address 2’, we can
choose which two registers drive the ‘Data_out 1’ and ‘Data_out 2’ buses.

Figure 4.12: Two different multiplexer structures based on MUX4 cells and logic gates

The clock signal is ANDed with the Write/Read enable signal, and the output of this AND
gate feeds a set of AND gates connected to the outputs of the 4:16 decoder to control
register writing. When the write enable signal is asserted at the positive edge of the
clock, the target register is selected for update according to which AND gate is active.
The read operation of the register file is enabled when the Write/Read enable signal is low.
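
A minimal behavioral model of such a register file is sketched below, assuming one write port and two read ports as described. The class and method names are hypothetical, the gated clock is abstracted into a single `clock_edge` call, and reads are modeled as purely combinational (the low-enable condition for reading is not enforced).

```python
class RegisterFile:
    """16 x 16-bit register file with one write port and two read ports."""

    def __init__(self, num_regs=16, width=16):
        self.num_regs = num_regs
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def clock_edge(self, write_enable, write_addr, data_in):
        """Model one positive clock edge: only the decoded register is updated."""
        if write_enable:  # clock AND write enable AND decoder output
            self.regs[write_addr % self.num_regs] = data_in & self.mask

    def read(self, addr1, addr2):
        """The two 16:1 multiplexers select the registers driving the two output buses."""
        return self.regs[addr1 % self.num_regs], self.regs[addr2 % self.num_regs]
```

A write followed by a dual read, for instance `rf.clock_edge(True, 3, 0x1234)` and then `rf.read(3, 5)`, exercises the decoder path and the multiplexer path respectively, which are the two paths stressed separately in the aging experiments.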

Figure 4.13: Structure of 16-bit Register File

4.3 Experimental Setup

This study investigates the overall NBTI degradation behavior of a collection of devices in
complex designs, based on predictions from the state-of-the-art CET map-based model
developed at IMEC. NBTI aging has a remarkable impact on the timing and logic functionality
of a circuit. The degradation of a circuit strongly depends on its operating conditions and
workload over the lifetime. Four experiments are performed in this work to analyze NBTI
impacts: 1) impact on the circuit’s critical path timing, 2) aging sensitivity to workload
variations, 3) impact of architectural parameters, and finally 4) relative aging of datapath
sub-blocks. A predictive 10nm FinFET technology is chosen for the hardware implementations,
with transistor fin dimensions of 10, 30 and 20nm for thickness, height and length,
respectively. Library characterization is performed at the standard threshold-voltage (SVT)
process corner. The aging stress conditions are a temperature of 85 °C, a supply voltage of
0.8V, and a stress time of 3 years.

1) Impact on circuit’s critical path timing

NBTI aging degrades transistor parameters, which results in increasing gate delays over
time. Consequently, the circuit’s critical path delay increases and the timing specification
might be violated during the specified lifetime. In this experiment, we analyze how the
circuit’s critical path timing is affected by aging, how NBTI degradation can affect the
logic functionality of the circuit, and how it can lead to complete system failure.

2) Impact of workload

As discussed before, the NBTI impact strongly depends on the applied input patterns. In
this experiment, the workload dependence of NBTI aging is analyzed by applying various
workload benchmarks to the datapath sub-blocks. Nine workloads (WLs) for the adders and seven
WLs for the rest of the sub-blocks are chosen, ranging from artificial profiles (e.g.
count-up/down, white noise, longest carry-chain (LLC), constant zero/one input, etc.) to
realistic profiles (i.e. FFT from wireless domain applications). Table 4.2 shows the chosen
workload profiles.

Table 4.2: Workload profiles

3) Impact of architectural parameters

This experiment examines how architectural parameters (e.g. fan-out, logic depth, etc.)
affect the amount of degradation caused by NBTI at the netlist level. As explained in the
previous section, different architectural topologies (e.g. linear, logarithmic, etc.) are
implemented for each sub-block to cover a wide variety of the trade-off space (9 adders, 5
multipliers, 2 shifters, 3 mux/demux, etc.). 16/32-bit implementations are targeted, as the
impact of the architectural choice in the Pareto space becomes clearer for a higher number
of bits (e.g. more possible critical paths).

4) Relative Aging of datapath sub-blocks.

In this experiment, we investigate which sub-blocks on the datapath are relatively more or
less sensitive to NBTI aging. It can be expected that performance degradation on the datapath
might be dominated by a specific block, while the rest of blocks can be on the safe side, and
more robust to NBTI aging. Therefore, the results of this experiment could be a valuable
input to study mitigation schemes, fault tolerant architectures, etc.

4.4 Experimental Results and Discussion

In this section, the experimental results of NBTI aging effects on the datapath sub-blocks
are presented. We start by discussing the experimental results for the adder, multiplier,
shifter and mux/demux; the register file results are discussed later in this section
because, unlike the rest of the sub-blocks, the register file has distinct characteristics
(e.g. read and write operations).

The graphs in Figure 4.14 show the circuit’s critical path timing and the delay increment
(Δdelay) due to aging for each architectural topology of the datapath sub-blocks. The delay
increment varies because certain gates in the circuit’s critical path are more sensitive to
NBTI aging than others. In other words, delay degradation depends on the complexity of the
logic gate; for instance, an XOR gate is more sensitive to aging than AND and OR gates.
Therefore, the delay increment is larger in complex designs than in simple ones. Moreover,
the critical path timing and the delay increment are highly correlated with architectural
parameters such as fan-out (FO) and logic depth (LD). The critical path timing increases for
higher logic depth and fan-out, since the number of logic stages on the path and their
capacitive loads increase. Δdelay increases for higher LD, since the degradation is additive
over the path (i.e. more gates are traversed); it also increases for higher FO, since the
threshold-voltage shift due to NBTI causes a higher delay increment for gates with a higher
capacitive load [40]. In adders, Δdelay of the parallel prefix adders varies from 5.8% to
7.2%, while the Carry LookAhead and RippleCarry adders have the smallest Δdelay values due
to the simplicity of their designs; however, they have the highest critical path timing due
to their high logic depth, as the carry signal propagates from input to output, causing a
large delay. Δdelay is about 4.5% and 4.74% for the Carry LookAhead and RippleCarry adders,
respectively.

Figure 4.14: Critical path timing and delay increment in the sub-blocks: (a) Adder, (b) Multiplier, (c)
Shifter, and (d) Mux/Demux

In multipliers, the Δdelay values of the Tree multipliers (Wallace, Dadda) are 6.47% and
7.34%, respectively, which are lower than those of the Array multipliers (Full Array, CSA),
which are 8.15% and 7.59%, respectively, while the Booth multiplier reaches 8.94% due to the
complexity of the design (e.g. Booth encoder, Booth decoder). The Logarithmic shifter
(9.85%) has a higher Δdelay than the Barrel shifter (6.97%) due to its higher logic depth.
For mux/demux, the MUX4-based multiplexer has a higher Δdelay than the logic-based version,
as the MUX4 cells are more complex than the AND and OR gates of the logic-based multiplexer
and are therefore more sensitive to NBTI degradation; however, it has a smaller critical
path timing due to its lower logic depth. Δdelay reaches 11.2%, 7.26% and 8.76% for the
MUX4-based multiplexer, the logic-based multiplexer and the logic-based de-multiplexer,
respectively.

Figure 4.15: Percentages of critical path replacement

NBTI aging causes paths that are non-critical in the fresh design to become critical after
aging, as they are affected by much larger timing degradation, and paths that are critical
in the fresh design to become non-critical in the aged design, as they experience smaller
timing degradation. The number of critical paths that are replaced may be a significant
fraction of the total number of paths in the overall design. Results show that the critical
path has been replaced in 32 out of 35 cases for multipliers (5 architectures times 7
workloads), and the same trend is observed in the other sub-blocks (69/81 for adders, 15/21
for mux/demux, 10/14 for shifters). As shown in Figure 4.15, critical-path replacement can
be as high as 91.4% in the multiplier, which shows the severity of NBTI degradation on the
performance of VLSI systems during their lifetime. Therefore, both critical and non-critical
paths should be considered to guarantee system-level reliability.
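
The replacement percentages follow directly from the case counts quoted above; the short sketch below reproduces that arithmetic. The dictionary layout and function name are illustrative only.

```python
# (replaced cases, total cases) per sub-block, as quoted in the text above.
REPLACED_CASES = {
    "multiplier": (32, 35),  # 5 architectures x 7 workloads
    "adder": (69, 81),       # 9 architectures x 9 workloads
    "mux/demux": (15, 21),   # 3 architectures x 7 workloads
    "shifter": (10, 14),     # 2 architectures x 7 workloads
}


def replacement_percentages(cases):
    """Percentage of (architecture, workload) cases whose critical path changed after aging."""
    return {block: round(100.0 * n / total, 1) for block, (n, total) in cases.items()}


if __name__ == "__main__":
    print(replacement_percentages(REPLACED_CASES))
```

This gives 91.4% for the multiplier, 85.2% for the adder, and 71.4% for both the mux/demux and the shifter.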

For a given block architecture, NBTI degradation is calculated as the percentage-wise delay
degradation, which represents the mean-to-mean difference between the time-zero and
after-degradation timing path histograms. As shown in Figure 4.16, each graph represents the
delay degradation profiles of the different architectural topologies of a particular
datapath sub-block.
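
One plausible reading of this metric is sketched below: the difference between the mean aged path delay and the mean fresh path delay, expressed as a percentage of the fresh mean. The normalization by the fresh mean is an assumption, as are the function name and the example delay values, which are hypothetical and not taken from the experiments.

```python
from statistics import mean


def delay_degradation_percent(fresh_delays, aged_delays):
    """Mean-to-mean difference of two path-delay populations, in percent of the fresh mean."""
    fresh_mean = mean(fresh_delays)
    aged_mean = mean(aged_delays)
    return 100.0 * (aged_mean - fresh_mean) / fresh_mean


if __name__ == "__main__":
    fresh = [100, 200, 300]  # hypothetical time-zero path delays (ps)
    aged = [110, 220, 330]   # hypothetical path delays after aging (ps)
    print(delay_degradation_percent(fresh, aged))
```

Because the metric compares whole distributions rather than only the critical path, it also reflects degradation on paths that become critical only after aging.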

Figure 4.16: Delay degradation w.r.t. workload profiles for different architectures of (a) adder,
(b) multiplier, (c) mux/demux, (d) shifter blocks

Each group of bars corresponds to the delay degradation of a particular block architecture
under different workload stresses (e.g. 9 WLs for adders, 7 WLs for multipliers, etc.). The
delay degradation varies from 1.2%, 5%, 4.9% and 1.2% up to 11.3%, 13.7%, 15.1% and 17% for
the adder, multiplier, mux/demux and shifter, respectively.

Changing the workload applied to a given architecture results in different switching
activities on the internal nets; therefore, different threshold-voltage shift values are
back-annotated for the individual transistors in each logic gate, which leads to different
gate delays and, in turn, to a different aging per workload profile at the netlist level,
e.g. is higher than , etc. The second observation is the variation of the delay degradation
with respect to a particular WL applied to different architectural topologies (e.g.
KoggeStone vs. Sklansky) of a specific block: internal architectural differences result in
distinct switching activities on the internal nets and, therefore, in different
threshold-voltage shift values at the transistor level. For example, is lower than for
KoggeStone, while it is vice versa for Sklansky.

Figure 4.17: Impact of workload-variation on the delay degradation of different architectures of
datapath sub-blocks

The aging of a given circuit can be more or less resistant to workload variations than that
of another circuit, depending on its architecture. The aging sensitivity to workload
variations can be evaluated by

where and are the maximum and minimum of a given architecture. Figure 4.17 shows the aging
sensitivity to workload variations of the datapath sub-blocks, where the logic depth and
fan-out values (LD, FO) of each sub-block’s architectures lie on the x-axis (e.g.
KoggeStone (5, 2), WallaceTree (14, 4), etc.). Architectures are in the same order as in
Figure 4.16. Aging sensitivity to workload variations increases for higher logic depth or
fan-out. In the adders, as LD increases from KoggeStone (5, 2) to RippleCarry (63, 1),
increases from to ; likewise, with increasing FO, rises from to in the direction from
Knowles22111 (5, 3) to the Sklansky (5, 17) adder. In multipliers, for the Wallace Tree and
Dadda Tree (14, 4) multipliers, is in the same range ( ), while it is a bit higher in the
CSA multiplier (21, 5); it drops to for the Booth multiplier due to lower FO, then rises
again to for the Full Array multiplier (45, 1) due to higher LD.

The same trend is observed in shifters and multiplexers: from the Barrel (2, 4) to the
Logarithmic (5, 15) shifter, rises from to ; and with higher LD and FO from the MUX4-based
(1, 1) to the logic-based multiplexer (3, 32), rises from to . We can conclude that the
RippleCarry and Sklansky adders, the FullArray multiplier, the logic-based mux/demux, and
the Logarithmic shifter have the highest sensitivity, while the KoggeStone and HanCarlson
adders, the Booth multiplier, the MUX4-based multiplexer, and the Barrel shifter are the
least sensitive ones. The reason is the trade-off between area and delay: with higher LD and
FO for a given circuit, the area overhead as well as the level of parallelism decrease. For
instance, KoggeStone has the highest area cost and consists of many parallel paths compared
to Sklansky, RippleCarry, etc. Therefore, it is easier to find an alternative path in case
of degradation on the critical path, whereas RippleCarry and Sklansky have relatively fewer
paths. This is the origin of the architectural dependency of NBTI aging.

NBTI aging effects on the register file are examined during read and write operations. It
is observed that the circuit’s critical path after aging changes with the executed operation
(read or write), because certain gates in the paths of the circuit experience larger timing
degradation than others.

Figure 4.18: Delay degradation w.r.t workload profiles for write and read operations of register file

The impact of workload dependency is examined with four workload profiles (constant input,
white noise (WN), FFT and count up) from Table 4.2. The percentage delay degradation during
read and write operations is shown in Figure 4.18. During a write operation, the gates in
the paths from the 4:16 decoder to the DFF cells of the registers are under workload stress,
while during a read operation the gates in the paths from the DFF cells to the two 16:1
multiplexers are stressed. Changing the workload results in different switching activities
on the internal nets, which leads to different threshold-voltage shift values at the
transistor level and, therefore, to different aging per WL at the circuit level. It is
observed that the delay degradation during read is higher than during write; this is
expected because of the higher logic depth and fan-out of the read circuit (e.g. two 16-to-1
MUXes) compared with the write circuit (e.g. 4-to-16 decoder). As mentioned before,
sensitivity to workload variations increases for higher logic depth and fan-out (LD, FO).
of the write circuit (2, 16) is while it is in the read circuit (4, 64), as shown in
Figure 4.19.

In conclusion, the experimental results quantify which sub-blocks in the datapath are more
or less sensitive to NBTI aging. The results show that the adder and multiplier are less
impacted by NBTI aging, while the mux/demux, shifter and register file are the most
degraded. NBTI can cause a performance loss of up to 16.5%, as in the case of the register
file shown in Figure 4.20, which shows the severity of NBTI aging for circuit performance.
The output of these experiments could be a valuable input for studying mitigation schemes
and fault tolerant architectures that maintain the reliability of nano-scaled circuits.

Figure 4.19: Impact of workload-variation on the delay degradation for write and read operations of
register file

[Bar chart: performance loss (%), axis from 0 to 18, for Adder, Multiplier, Mux/DeMux, Shifter and Register File]

Figure 4.20: Relative aging of datapath sub-blocks

4.5 Chapter Review

In this chapter, a complete study of NBTI aging effects on the performance of datapath sub-
blocks is presented. The hardware architectures are implemented based on predictive 10nm
FinFET technology. The NBTI degradations are measured by using the CET map-based
model under aging of 3 years.

The experiments are performed using the reliability-aware digital design flow presented in
chapter 3. Aging impacts are investigated in relation to workload dependency and
architectural topology. Aging sensitivity to workload variations is linked to the
architectural parameters LD and FO. It is shown that increases for higher (LD, FO), and it
can vary and respectively. Results show a performance loss of up to 16.5%, with aging
mainly dominated by the register file and shifter blocks.

Moreover, NBTI aging can cause a circuit’s critical path to be replaced by a formerly
non-critical one, which affects the normal operation and logic functionality of the circuit
and may cause complete system failure when the degradation reaches a certain limit.

Chapter 5
Exploration on Timing Error Resilient
Techniques

5.1 Introduction

Due to increasing variability and device degradation in deep submicron (DSM) technologies,
several timing error resilient techniques have been proposed to maintain the reliability of
circuit operation. Razor I [55] proposed an edge-sensitive FF for pipelined processors that
can detect timing errors caused by PVT variations propagating from combinational logic, at
the cost of high area overhead and high power dissipation. It was later extended to Razor II
to also provide Single Event Upset (SEU) tolerance. Razor II relies on detecting spurious
transitions at its input in order to flag timing errors on the processor critical paths. It
is designed as a level-sensitive latch that has minimum delay constraints. The major
problems with Razor FFs are their sensitivity to short path delays and meta-stability.
Moreover, they require an error recovery circuit because they permit errors to occur, which
increases the complexity of the design and results in a higher performance overhead. Z. Ming
et al. [57] presented the Adaptive Variation and Error Resilient Agent (AVERA), which
provides variation diagnosis, degradation detection and soft error correction. However, it
can perform only one operation at a time, and mode selection remains a critical issue.
T. Sato et al. [58] proposed the Canary flip-flop, which relies on a delay buffer at the
data input and provides pre-detection of timing errors. The Canary flip-flop only detects
timing errors for pre-sampled data, and it suffers from a strict timing margin on the
critical path and from area overhead. To address this, Y. Kunitake et al. [59] proposed a
selective replacement method for Canary FFs in order to reduce the area and power overhead.
T. Azam et al. [60] proposed an error resilient sequential FF based on a post-sampling
scheme with a delayed clock and a level-sensitive latch, while Y. Jang et al. [61] proposed
a low-power variation-aware FF that does not require a longer delayed clock or additional
error correction hardware.

This chapter presents an implementation of a timing error sensor circuit that can be
embedded in the critical paths of the datapath sub-blocks that are degraded by NBTI aging
effects. In the next section, the operation of the error sensor circuit is explained. The
behavior of the sensor circuit is verified on a 20-stage inverter chain using HSPICE
simulation, as presented in section 5.3.

5.2 Error Detection Scheme

The block diagram of the proposed sensor is illustrated in Figure 5.1. The sensor consists
of three flip-flops, two XOR gates and a delay buffer. The flip-flops Q1 and Q2 are called
the main FF and the shadow FF, respectively. The buffer delays the clock signal of the
shadow FF, which is expected to always hold correct values. The buffer is tuned to cause a
delay equal to a quarter of the clock period. The output of the XOR gate between Q1 and Q2
(D3) is a pulse signal; when the values latched in the main and shadow FFs match, its width
equals the delay introduced by the buffer.

Figure 5.1: Timing error detection scheme

The second XOR gate, between the clock (CP1) and its delayed version (CP2), acts as a clock
multiplier: its output signal (CP3) has double the clock frequency, which means that its
level width equals the delay introduced by the buffer (i.e. the pulse width of D3 when there
are no errors). Therefore, Q3 will not latch the pulse generated by the XOR gate between Q1
and Q2 when no timing error occurs. On the other hand, if the main FF latches incorrect data
while the shadow FF still stores the correct data, the XOR gate between Q1 and Q2 generates
a wider pulse than in the error-free case, which is captured by Q3, whose output represents
the error signal. This circuit can detect timing errors from 10ps up to 100ps. Increasing
the sensor sensitivity requires increasing the clock frequency and the number of buffer
stages.
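
The detection window can be illustrated with a highly idealized timing model, sketched below. It assumes a 400ps clock period (2.5GHz), a shadow-clock delay of a quarter period, and the 10ps lower detection limit quoted above; setup/hold times, metastability and the XOR/Q3 pulse mechanics are not modeled. The function name and parameters are hypothetical.

```python
def sensor_error_flag(arrival_ps, period_ps=400.0, min_late_ps=10.0):
    """Idealized model: flag an error when the main FF misses the data but the shadow FF catches it.

    arrival_ps: data arrival time measured from the launching clock edge.
    """
    window_ps = period_ps / 4.0             # delay of the shadow FF clock (buffer delay)
    lateness_ps = arrival_ps - period_ps    # > 0 means the main FF samples stale data
    main_ok = lateness_ps <= 0
    shadow_ok = lateness_ps <= window_ps
    mismatch = (not main_ok) and shadow_ok  # Q1 and Q2 disagree
    return mismatch and lateness_ps >= min_late_ps


if __name__ == "__main__":
    for arrival in (380.0, 405.0, 450.0, 520.0):
        print(arrival, sensor_error_flag(arrival))
```

In this model, data arriving 50ps late is flagged, while data arriving more than a quarter period late is missed by both flip-flops and goes undetected, which is why the buffer delay bounds the detectable range.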

Figure 5.2: Simulation of 20 stage inverter chain with the proposed error detection circuit (a) before
aging, (b) after aging

5.3 Simulation Results

To analyze the behavior of the proposed sensor circuit, we implemented a 20-stage inverter
chain based on the 45nm PTM transistor model [62] at a supply voltage of 0.8V and a
temperature of 85 °C. The proposed sensor was embedded at the output of the inverter chain
to detect timing failures after aging. The 20-stage inverter chain with the sensor circuit
is simulated using HSPICE at a clock frequency of 2.5GHz, and the BTI impact is added to the
transistors with Verilog-A modules for a stress time of 10 years. Each module generates a
voltage shift that depends on the activity factor of the transistor. The simulation results
before and after aging are shown in Figure 5.2, where ‘data’ and ‘data1’ are the input and
output signals of the inverter chain, respectively. In the absence of BTI, both Q1 and Q2
store the same data and, therefore, no error is detected by Q3, as shown in Figure 5.2 (a).
On the other hand, in the presence of BTI, the correct data is captured by Q2 while the data
transition misses the setup time requirement of Q1, so wrong data is stored in Q1;
consequently, the error signal goes high as it is captured by Q3, as shown in
Figure 5.2 (b).

Figure 5.3 shows the path delay distributions before and after aging. The delays of the
paths in a circuit differ from each other due to their logic depth, wire length, fan-out and
so on. Circuit paths with small delay will not cause a timing error, as they are less
sensitive to aging effects; therefore, these paths do not need to be equipped with timing
error sensors. In contrast, paths that are susceptible to aging effects might cause timing
errors because they no longer satisfy the specified target cycle time; therefore, they need
to be equipped with the sensor circuit.

Figure 5.3: Paths delay distributions

5.4 Chapter Review

With the continuous downscaling of CMOS devices, variations in key device parameters such
as threshold voltage (Vth) and oxide thickness (tox) are increasing at an alarming rate.
This has led to significant problems in terms of reliability, circuit resilience and yield.
In this chapter, timing error resilient techniques such as the Razor flip-flop and the
Canary flip-flop are investigated. Moreover, a timing error sensor circuit is implemented
and its behavior is validated with a 20-stage inverter chain. Simulation results before and
after aging are presented.

Chapter 6

Conclusions and Future Work

6.1 Conclusions

NBTI aging is one of the critical challenges for the future of the semiconductor industry.
NBTI affects the overall reliability of nano-scaled circuits and can potentially cause
system failure. Being able to predict NBTI degradation is crucial for the development and
long-term success of novel transistor structures, such as the FinFET. Thus, this work
presents a comprehensive study of NBTI aging effects on datapath sub-blocks in a predictive
10nm FinFET technology.

We begin the thesis with an overview of the reliability issues that have emerged from the
continuous downscaling of CMOS technologies. It is shown that time-dependent variability
is a major source of device variability: gate oxide defects form at elevated temperature and
cause device characteristics (e.g., threshold voltage, carrier mobility, sub-threshold slope,
drain current) to deviate, and the impact gets worse as a circuit ages.

In chapter 2, transistor-level models for NBTI are investigated. The NBTI phenomenon was
first explained by the Reaction-Diffusion (R-D) model. Because of inconsistencies with
experimental data in predicting the recovery phase of NBTI, the Atomic Trap-based model
was developed. This model is very accurate and provides detailed cycle-by-cycle
degradation, but it is not scalable and struggles with complex designs because of its stochastic
nature. For these reasons, the CET map-based model was developed. It is used in this thesis
because of its higher accuracy (i.e. covering the complete defect space), better long-term
aging predictions and lightweight nature. The second part of the chapter examines aging
effects at gate level. NBTI aging depends on parameters such as duty factor, gate size, gate
drive strength, stress location, transistor organization within a gate, and frequency. The
impact of each parameter is investigated.
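As a rough illustration of why a CET map is lightweight to evaluate, the sketch below integrates a defect density over capture and emission times, following the usual reading of capture/emission time maps [20]: a defect contributes to the threshold voltage shift if its capture time is shorter than the stress time and its emission time is longer than the relaxation time. The Gaussian density, its parameters, the grid bounds and the `vth_max` scaling are hypothetical placeholders, not the measured map used in this thesis.

```python
import math


def cet_density(log_tc, log_te, mu_c=2.0, mu_e=3.0, sigma=3.0):
    """Hypothetical, unnormalised defect density over
    log10(capture time) and log10(emission time)."""
    return math.exp(-((log_tc - mu_c) ** 2 + (log_te - mu_e) ** 2) / (2.0 * sigma ** 2))


def delta_vth(t_stress, t_relax, vth_max=0.05, lo=-9.0, hi=12.0, step=0.25):
    """Threshold voltage shift after t_stress seconds of stress followed by
    t_relax seconds of relaxation (both must be positive).

    vth_max is the (assumed) shift obtained if every defect in the map is charged.
    """
    n = int(round((hi - lo) / step))
    grid = [lo + (i + 0.5) * step for i in range(n)]  # cell centres in log10(seconds)
    log_ts = math.log10(t_stress)
    log_tr = math.log10(t_relax)

    total = 0.0
    occupied = 0.0
    for log_tc in grid:
        for log_te in grid:
            weight = cet_density(log_tc, log_te)
            total += weight
            # Charged during stress and not yet discharged during relaxation.
            if log_tc <= log_ts and log_te >= log_tr:
                occupied += weight
    return vth_max * occupied / total


print(delta_vth(t_stress=1e3, t_relax=1e-3))
print(delta_vth(t_stress=1e6, t_relax=1e-3))
print(delta_vth(t_stress=1e6, t_relax=1e3))
```

The evaluation is a single deterministic sum over the map, in contrast to a stochastic cycle-by-cycle simulation of individual traps. In this sketch, longer stress increases the shift and longer relaxation reduces it.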

In chapter 3, an NBTI aging-aware framework based on the state-of-the-art CET map-based
model is explained. The concept of workload-dependent, instance-based, aging-aware
library characterization is used within commercial EDA tools to investigate NBTI impacts at
sub-block level. The flow has the advantage of high accuracy and scalability. Moreover, it is
applicable to the rest of the digital design flow, e.g. placement and routing.

In chapter 4, aging impacts on a large set of datapath architectural topologies and
representative workload benchmarks are evaluated. Aging sensitivity to workload
variations is linked to the architectural parameters logic depth (LD) and fan-out (FO), and it
is shown that aging sensitivity increases for higher LD and FO. Moreover, NBTI aging can
cause a circuit’s critical path to be replaced by a formerly non-critical one. This affects the
normal operation and the logic functionality of the circuit and may cause complete system
failure when the degradation reaches a certain limit.
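The critical path replacement mentioned above follows directly from paths aging at different rates. The sketch below is a minimal numerical illustration. The path names, fresh delays and relative degradation values are hypothetical and are not results from this thesis.

```python
def critical_path(delays):
    """Return the name of the path with the largest delay."""
    return max(delays, key=delays.get)


# Fresh delays in arbitrary time units (illustrative values).
fresh = {"path_A": 1.00, "path_B": 0.96}

# Relative delay increase after aging. path_B is assumed to see a workload
# that stresses its PMOS devices more, so it degrades faster.
degradation = {"path_A": 0.03, "path_B": 0.10}

aged = {name: fresh[name] * (1.0 + degradation[name]) for name in fresh}

print("critical before aging:", critical_path(fresh))
print("critical after aging: ", critical_path(aged))
```

Here path_A is critical in the fresh circuit, but path_B overtakes it after aging, which is why timing analysis based only on the fresh critical path can miss the path that eventually fails.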

Finally, in chapter 5, timing error resilient techniques such as Razor and Canary circuits
are investigated. Moreover, a timing error sensor circuit is implemented and its behavior is
validated with a 20-stage inverter chain. Simulation results before and after aging are
presented.

6.2 Suggestions for Future Work

In this thesis, the analysis of NBTI aging effects at sub-block level is discussed. It is hoped
that this work contributes to present knowledge and provides insights for future studies.
Several topics of interest arise from this work. This section suggests some of these
directions for future research:

1) In this thesis we focused only on analyzing NBTI aging effects. Other degradation
mechanisms such as Hot Carrier Injection (HCI) in NMOS devices, Time-Dependent
Dielectric Breakdown (TDDB) in the gate dielectric, and Electro-Migration (EM) in
interconnects are also important reliability issues and can cause shifts in device
characteristics. Since the presented reliability-aware digital flow is generic, it can
easily be adapted to cover the impact of these time-dependent degradation
mechanisms with their own models.

2) The study of mitigation techniques for NBTI aging effects is a fairly new and vast
topic. Mitigation designates any method of limiting or controlling BTI and its impact
on modern electronic devices. Currently, a number of research projects aim to
develop fully automated schemes that are able to eliminate BTI effects with little or
no trade-off in terms of performance, power consumption or cost.

3) The reliability-aware digital flow can be extended to cover Statistical Static Timing
Analysis (SSTA) to improve the accuracy of aging predictions. In reality, the delay
distributions of the logic gates in a circuit are correlated, depending on their statistical
properties. To obtain the path timing degradation, statistical timing analysis
techniques can be incorporated to handle this correlation.

4) In this work, NBTI aging effects are investigated at the sub-block level of the
datapath. The study can be extended to evaluate performance degradation at higher
levels of abstraction such as processor level or system level. For that, new
methodologies are required to predict aging behavior, such as Dynamic Reliability
Management (DRM) techniques.
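To illustrate why the correlation mentioned in direction 3 matters, the sketch below computes the standard deviation of a path delay as the sum of gate delays with a uniform pairwise correlation coefficient. This is a textbook variance-of-a-sum calculation under simplifying assumptions, not an SSTA implementation. The function name, the uniform correlation `rho` and the example values are hypothetical.

```python
import math


def path_delay_sigma(gate_sigmas, rho):
    """Standard deviation of the sum of gate delays.

    gate_sigmas: standard deviation of each gate delay on the path
    rho:         assumed uniform correlation between any two different gates,
                 expected in the range [0, 1] for this illustration
    """
    variance = 0.0
    for i, sigma_i in enumerate(gate_sigmas):
        for j, sigma_j in enumerate(gate_sigmas):
            correlation = 1.0 if i == j else rho
            variance += correlation * sigma_i * sigma_j
    return math.sqrt(variance)


# Four gates with equal delay spread, in arbitrary time units.
sigmas = [1.0, 1.0, 1.0, 1.0]

print(path_delay_sigma(sigmas, rho=0.0))  # independent gates
print(path_delay_sigma(sigmas, rho=1.0))  # fully correlated gates
```

For independent gates the spreads add in quadrature, while for fully correlated gates they add linearly, so ignoring correlation can noticeably underestimate the spread of the aged path delay.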

Bibliography

[1] G. E. Moore, “Cramming more components onto integrated circuits,” Proc. of Electronics, vol. 38,
pp. 114–117, April 19, 1965.

[2] S. Thompson, P. Packan, and M. Bohr, “MOS scaling: transistor challenges for the 21st century,”
Intel Technology Journal, vol. 2, pp. 1–19, 1998.

[3] K. J. Kuhn, “CMOS scaling for the 22nm node and beyond: Device physics and
technology,” International Symposium on VLSI Technology, pp. 1–2, Apr. 2011.

[4] K. Bernstein, D. J. Frank, A. E. Gattiker, W. Haensch, B. L. Ji, S. R. Nassif, E. J. Nowak, D. J.
Pearson, and N. J. Rohrer, “High-performance CMOS variability in the 65-nm regime and
beyond,” IBM Journal of Research and Development, vol. 50, pp. 433–449, Jul-Sep 2006.

[5] D. K. Schroder and J. A. Babcock, “Negative bias temperature instability: Road to cross in deep
submicron silicon semiconductor manufacturing,” Journal of Applied Physics, vol. 94, pp. 1–18,
Jul. 2003.

[6] X. Wang, B. Cheng, A. R. Brown, C. Millar, J. B. Kuang, S. Nassif, and A. Asenov, “Statistical
variability and reliability in nanoscale finfets,” in IEEE Int. Electron Devices Meeting (IEDM),
pp. 1–4, 2011.

[7] B. Kaczer, T. Grasser, P. J. Roussel, J. Franco, R. Degraeve, L. Ragnarsson, E. Simoen, G.
Groeseneken, and H. Reisinger, “Origin of NBTI variability in deeply scaled pFETs,” in Proc.
IEEE IRPS, pp. 26–32, 2010.

[8] P. Woerlee, P. Damink, M. van Dort, C. Juffermans et al., “The impact of scaling on hot-carrier
degradation and supply voltage of deep-submicron NMOS transistors,” in IEEE Int. Electron
Devices Meeting (IEDM), pp. 537–540, 1991.

[9] Y. Lee, N. Mielke, M. Agostinelli, S. Gupta, R. Lu, and W. McMahon, “Prediction of logic
product failure due to thin-gate oxide breakdown,” in Proc. IEEE IRPS, pp. 18–28, 2006.

[10] The International Technology Roadmap for Semiconductors (ITRS), 2009. http://public.itrs.net

[11] V. Huard, F. Cacho, Y. Mamy Randriamihaja, and A. Bravaix, “From defects creation to circuit
reliability—A bottom-up approach,”in Microelectron. Eng., vol. 88, no. 7, pp. 1396–1407, Jul.
2011.

[12] Y. Miura and Y. Matukura, “Investigation of silicon-silicon dioxide interface using MOS
structure,” Japanese Journal of Applied Physics, vol. 5, pp. 180, 1966.

[13] S. Khan, S. Hamdioui, H. Kükner, P. Raghavan, and F. Catthoor, “BTI impact on logical gates in
nano-scale cmos technology,” in IEEE 15th International Symposium on Design and Diagnostics
of Electronic Circuits Systems (DDECS), pp. 348–353, 2012.

[14] H. Kükner, P. Weckx, P. Raghavan, B. Kaczer, F. Catthoor, L. Van Der Perre, R. Lauwereins,
and G. Groeseneken, “Impact of duty factor, stress stimuli, and gate drive strength on gate delay
degradation with an atomistic trap-based BTI model,” in 15th Euromicro Conf. on Digital
System Design (DSD), pp. 1–7, 2012.

[15] V. Huard, M. Denais and C. Parthasarathy, “NBTI degradation: From physical mechanisms to
modelling,” Microelectronics Reliability, vol. 46, pp. 1–23, 2006.

[16] T. Grasser, B. Kaczer, W. Goes, T. Aichinger, P. Hehenberger, and M. Nelhiebel, “A two-stage
model for negative bias temperature instability,” in Proc. IEEE IRPS, pp. 33–44, 2009.

[17] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, T. Liu, and Y. Cao, “The impact of NBTI effect
on combinational circuit: Modeling, simulation, and analysis,” IEEE Trans. on Very Large Scale
Integration (VLSI) Systems, vol. 18, no. 2, pp. 173–183, 2010.

[18] M. A. Alam and S. Mahapatra, “A comprehensive model of PMOS NBTI degradation,”
Microelectronics Reliability, vol. 45, pp. 71–81, 2005.

[19] M. Toledano-Luque, B. Kaczer, J. Franco, P. Roussel, T. Grasser, and G. Groeseneken,
“Defect-centric perspective of time-dependent BTI variability,” Microelectronics Reliability,
vol. 52, no. 9–10, pp. 1883–1890, 2012.

[20] T. Grasser, P. Wagner, H. Reisinger, T. Aichinger, G. Pobegen, M. Nelhiebel, and B. Kaczer,
“Analytic modeling of the bias temperature instability using capture/emission time maps,” in
Proc. IEEE International Electron Devices Meeting (IEDM), pp. 27.4.1–27.4.4, Dec. 2011.

[21] K. Jeppson and C. Svensson, “Negative bias stress of mos devices at high electric fields and
degradation of nmos devices,” Journal of Applied Physics, vol. 48, pp. 2004–2014, 1977.

[22] J. Yang, J. Yang, X. Liu, R. Han, J. Kang, Z. Gan, C. Liao, and H. Wu, “A new model for two-
dimensional electrical-field-dependent Vth instability of pMOSFETs with ultrathin DPN gate
dielectrics,” IEEE Electron Device Lett., vol. 32, no. 5, pp. 605–607, May 2011.

[23] S. V. Kumar, “Reliability-aware and variation-aware cad techniques,” Ph.D. dissertation,
University of Minnesota, Minneapolis, MN, 2009.

[24] S. Kumar, C. Kim, and S. Sapatnekar, “An analytical model for negative bias temperature
instability,” in IEEE Int. Conf. on Computer-Aided Design (ICCAD), pp. 493–496, 2006.

[25] B. Kaczer, V. Arkhipov, R. Degraeve, N. Collaert, G. Groeseneken, and M. Goodwin,
“Disorder-controlled-kinetics model for negative bias temperature instability and its
experimental verification,” in Proc. IEEE IRPS, pp. 381–387, 2005.

[26] T. Grasser, W. Goes, and B. Kaczer, “Dispersive transport and negative bias temperature
instability: Boundary conditions, initial conditions, and transport models,” IEEE Trans. on
Device and Materials Reliability, vol. 8, no. 1, pp. 79–97, 2008.

[27] M. Toledano-Luque, B. Kaczer, P. Roussel, T. Grasser, G. Wirth, J. Franco, C. Vrancken, N.
Horiguchi, and G. Groeseneken, “Response of a single trap to AC negative bias temperature
stress,” in Proc. IEEE IRPS, pp. 4A.2.1–4A.2.8, Apr. 2011.

[28] G. Wirth, R. Silva, and B. Kaczer, “Statistical model for MOSFET bias temperature instability
component due to charge trapping,” IEEE Trans. on Electron Devices, vol. 58, pp. 2743–
2751, 2011.

[29] B. Kaczer, S. Mahato, V. de Almeida Camargo, M. Toledano-Luque, P. Roussel, T. Grasser, F.
Catthoor, P. Dobrovolny, P. Zuber, G. Wirth, and G. Groeseneken, “Atomistic approach to
variability of bias temperature instability in circuit simulations,” in Proc. IEEE IRPS, pp.
XT.3.1–XT.3.5, 2011.

[30] T. Grasser, H. Reisinger, P. Wagner, F. Schanovsky, W. Goes, and B. Kaczer, “The time
dependent defect spectroscopy (TDDS) for the characterization of the bias temperature
instability,” in Proc. IEEE IRPS, pp. 16–25, 2010.

[31] A. van der Wel, E. A. M. Klumperink, J. S. Kolhatkar, E. Hoekstra, M. S. Snoeij, C. Salm, H.
Wallinga, and B. Nauta, “Low-frequency noise phenomena in switched MOSFETs,” IEEE J.
Solid-State Circuits, vol. 42, no. 3, pp. 540–550, Mar. 2007.

[32] M. J. Kirton and M. J. Uren, “Noise in solid-state microstructures: A new perspective on
individual defects, interface states and low-frequency (1/f) noise,” Adv. Phys., vol. 38, no. 4, pp.
367–468, 1989.

[33] H. Reisinger, T. Grasser, W. Gustin and C. Schlünder, “The statistical analysis of individual
defects constituting NBTI and its implications for modeling DC- and AC-stress,” in Proc. IEEE
IRPS, pp. 7–15, 2010.

[34] H. Reisinger, T. Grasser, K. Ermisch, H. Nielen, W. Gustin, and C. Schlünder, “Understanding
and modeling AC BTI,” in Proc. IEEE IRPS, pp. 1–8, 2011.

[35] T. Grasser, B. Kaczer, H. Reisinger, P. J. Wagner, and M. Toledano-Luque, “On the frequency
dependence of the bias temperature instability,” in Proc. IEEE IRPS, pp. XT.8.1–XT.8.7, 2012.

[36] T. Grasser, W. Goes and B. Kaczer, “Critical modeling issues in negative bias temperature
instability,” in 215th ECS meeting, pp. 265–287, 2009.

[37] W. Abadeer and W. Ellis, “Behavior of NBTI under AC dynamic circuit conditions,” in Proc.
IEEE IRPS, pp. 17–22, 2003.

[38] V. Huard, M. Denais, F. Perrier and C. Parthasarathy, “Static and dynamic NBTI stress in pMOS
transistors,” in Proc. Insulating Films Semicond. (INFOS), pp. 1–2, 2003.

[39] V. Huard, M. Denais, F. Perrier, N. Revil, C. Parthasarathy, A. Bravaix and E. Vincent, “A
thorough investigation of MOSFETs NBTI degradation,” Microelectronics Reliability, vol. 45,
no. 1, pp. 83–98, 2005.

[40] W. Wang, V. Balakrishnan, B. Yang, and Y. Cao, “Statistical prediction of NBTI-induced circuit
aging,” in 9th Int. Conf. on Solid-State and Integrated-Circuit Technology, ICSICT 2008, pp.
416–419, 2008.

[41] J. Velamala, K. Sutaria, T. Sato and Y. Cao, “Physics matters: Statistical aging prediction under
trapping/detrapping,” in 49th IEEE Design Automation Conference (DAC), pp. 139–144, 2012.

[42] D. Lorenz, “Aging analysis of digital integrated circuits,” Ph.D. dissertation, Technische
Universität München, Germany, 2012.

[43] V. Huard, E. Pion, F. Cacho, D. Croain, V. Robert, R. Delater, P. Mergault, S. Engels, P.
Flatresse, N. Amador and L. Anghel, “A predictive bottom-up hierarchical approach to digital
system reliability,” in Proc. IEEE IRPS, pp. 4B.1.1–4B.1.10, 2012.

[44] V. Huard, “Two independent components modeling for negative bias temperature instability,” in
Proc. IEEE IRPS, pp. 33–42, 2010.

[45] H. Kükner, M. Khatib, S. Morrison, P. Weckx, P. Raghavan, B. Kaczer, F. Catthoor, F. Robert,
L. Van der Perre, R. Lauwereins and G. Groeseneken, “Degradation Analysis of Datapath Logic
Subblocks under NBTI Aging in FinFET Technology,” ISQED’14 (Accepted).

[46] S. Morrison, “Circuit-level reliability analysis based on an atomic trap-based model for bias
temperature instability (BTI),” Master dissertation, Université Libre de Bruxelles, Belgium,
2013.

[47] P. Weckx, B. Kaczer, M. Toledano-Luque, T. Grasser, Ph. J. Roussel, H. Kükner, P. Raghavan,
F. Catthoor and G. Groeseneken, “Defect-based methodology for workload-dependent circuit
lifetime projections application to sram,” in Proc. IEEE IRPS, pp. 1–7, 2013.

[48] J. Kawa, “FinFET design, manufacturability, and reliability” – Synopsys.com

[49] K. Kang, H. Kufluoglu, K. Roy and M. A. Alam, “Impact of negative-bias temperature
instability in nanoscale sram array: Modeling and analysis,” IEEE Trans. on Computer-Aided
Design of Integrated Circuits and Systems, vol. 26, no. 10, pp. 1770–1781, 2007.

[50] S. Knowles, “A family of adders,” in 14th IEEE Symposium on Computer Arithmetic, pp. 30–34,
Apr. 1999.

[51] D. Harris, “A taxonomy of parallel prefix networks,” in proc. Signals, Systems and Computers,
pp. 2213–2217, 2004.

[52] C. S. Wallace, “A suggestion for a fast multiplier,” IEEE Trans. on Electronic Computers, vol.
13, no. 1, pp. 14–17, Feb. 1964.

[53] L. Dadda, “Some schemes for parallel multipliers,” Alta Frequenza, vol. 34, pp. 349–356, 1965.

[54] A. D. Booth, “A signed binary multiplication technique,” Quarterly Journal of Mathematics,
vol. 4, pt. 2, pp. 236–240, 1951.

[55] D. Ernst, N. Sung Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K.
Flautner, and T. Mudge, “Razor: A low-power pipeline based on circuit-level timing
speculation,” in Proc. IEEE/ACM Int. Symp. Microarchitecture (MICRO-36), pp. 7–18, Dec.
2003.

[56] D. Blaauw, S. Kalaiselvan, K. Lai, W. Ma, S. Pant, C. Tokunaga, S. Das, and D. Bull, “Razor II:
In situ error detection and correction for PVT and SER tolerance,” in IEEE J. Solid-State
Circuits, vol. 4, no. 1, pp. 400–401, Feb. 2008.

[57] Z. Ming, T. M. Mak, J. Tschanz, K. Kim, N. Seifert and D. Lu, “Design for resilience to soft
errors and variations,” in Proc. 13th IEEE International On-Line Testing Symposium (IOLTS07),
pp. 23–28, July 2007.

[58] T. Sato and Y. Kunitake, “A simple flip-flop circuit for typical-case designs for DFM,” in Proc.
8th International Symposium on Quality Electronic Design, pp. 539–544, 2007.

[59] Y. Kunitake, T. Sato and H. Yasuura, “A replacement strategy for canary flip-flops,” in Proc.
2010 Pacific Rim International Symposium on Dependable Computing, pp. 227–228, 2010.

[60] T. Azam and D. Cumming, “Robust low power design in nano-cmos technologies,” in proc.
IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2466–2469, May 2010.

[61] Y. Jang, C. Yoon, J. Kim and W. -K. Cho, “Low-power variation aware flip flop,” in Proc. IEEE
International Symposium on Circuits and Systems (ISCAS), pp. 488–491, May 2012.

[62] Predictive Technology Models (PTM) are available online at: http://www.eas.asu.edu/~ptm/
