
Backend (Physical Design) Interview Questions and Answers | BipeenKulkarni


Backend (Physical Design) Interview Questions and Answers
Filed under: VLSI Interview preparation


March 29, 2013


Below is the sequence of questions asked of a physical design engineer.
In which field are you interested?


The answer to this question depends on your interest, your expertise and the requirement for which
you are being interviewed.
Well, the candidate answered: Low power design


Can you talk about low power techniques? How are low power and the latest 90nm/65nm technologies
related?
Refer here and browse for different low power techniques.
Do you know about the input vector controlled method of leakage reduction?
The leakage current of a gate also depends on its input values. Hence, find the set of inputs which
gives the least leakage. By applying this minimum leakage vector to a circuit, it is possible to
decrease the leakage current of the circuit when it is in standby mode. This method is
known as the input vector controlled method of leakage reduction.
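As a minimal sketch of this idea, the minimum leakage vector can be found by trying every input combination and keeping the one with the lowest total leakage. The two-gate circuit and the leakage numbers below are hypothetical and only for illustration; a real flow would take leakage values from the library and use heuristics, because exhaustive search does not scale to large input counts.

```python
from itertools import product

# Hypothetical leakage (nA) of a 2-input NAND for each input state (a, b).
# Illustrative numbers only, not taken from any real library.
NAND2_LEAKAGE_NA = {(0, 0): 10.0, (0, 1): 25.0, (1, 0): 20.0, (1, 1): 40.0}


def circuit_leakage(a, b, c):
    """Total leakage of a toy circuit: n1 = NAND(a, b); out = NAND(n1, c)."""
    n1 = 0 if (a and b) else 1
    return NAND2_LEAKAGE_NA[(a, b)] + NAND2_LEAKAGE_NA[(n1, c)]


def minimum_leakage_vector():
    """Try every primary-input vector and return the one with least leakage."""
    return min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))


best = minimum_leakage_vector()
print(best, circuit_leakage(*best))
```

In standby mode this vector would be forced onto the primary inputs (or flop outputs) of the block.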
How can you reduce dynamic power?
-Reduce switching activity by designing good RTL
-Clock gating
-Architectural improvements
-Reduce supply voltage
-Use multiple voltage domains-Multi vdd
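The techniques above can be related to the usual first-order expression for switching power, P = alpha x C x V^2 x f. The sketch below is only an illustration under that assumption; the function name and all numbers are hypothetical. It shows why lowering the supply voltage is so effective (quadratic term) and how clock gating acts through the activity factor.

```python
def dynamic_power_w(alpha, c_switched_f, vdd_v, freq_hz):
    """First-order switching power: activity * capacitance * Vdd^2 * frequency."""
    return alpha * c_switched_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF of switched capacitance at 500 MHz.
base = dynamic_power_w(alpha=0.2, c_switched_f=1e-9, vdd_v=1.2, freq_hz=500e6)
low_vdd = dynamic_power_w(alpha=0.2, c_switched_f=1e-9, vdd_v=1.0, freq_hz=500e6)
gated = dynamic_power_w(alpha=0.1, c_switched_f=1e-9, vdd_v=1.2, freq_hz=500e6)

print(base, low_vdd, gated)
```

Short-circuit power is ignored here, so treat the numbers as a rough guide only.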
What are the vectors of dynamic power?


Voltage and Current


How will you do power planning?
Refer here for power planning.
If you have both IR drop and congestion how will you fix it?
-Spread macros
-Spread standard cells
-Increase strap width
-Increase number of straps
-Use proper blockage


Are increasing the power line width and providing more straps the only solutions to IR
drop?
-Spread macros
-Spread standard cells
-Use proper blockage
In a reg to reg path, if you have a setup problem, where will you insert a buffer: near the launching flop or
the capture flop? Why?
(Buffers are inserted to fix fanout violations, and in doing so they can reduce setup violations;
otherwise we try to fix setup violations by sizing cells. For now, just assume that you must
insert a buffer!)


Near the capture flop.
Because other paths may pass through or originate from the cells nearer to the launch
flop. Hence buffer insertion there may affect those other paths also. It may improve all those paths or
degrade them. If all those paths have violations, then you may insert the buffer nearer to the launch flop,
provided it improves slack.
How will you decide best floorplan?
Refer here for floor planning.
What is the most challenging task you handled? What is the most challenging job in P&R flow?
-It may be power planning-because you found more IR drop
-It may be low power target-because you had more dynamic and leakage power
-It may be macro placement-because it had more connections with standard cells or macros
-It may be CTS-because you needed to handle multiple clocks and clock domain crossings
-It may be timing-because sizing cells in ECO flow is not meeting timing
-It may be library preparation-because you found some inconsistency in libraries
-It may be DRC-because you faced thousands of violations


How will you synthesize clock tree?


-Single clock-normal synthesis and optimization
-Multiple clocks-Synthesize each clock separately
-Multiple clocks with domain crossing-Synthesize each clock separately and balance the skew
How many clocks were there in this project?
-It is specific to your project
-The more clocks, the more challenging!
How did you handle all those clocks?
-Multiple clocks>synthesize separately>balance the skew>optimize the clock tree
Do they come from separate external sources or from a PLL?
-If they come from separate clock sources (i.e. asynchronous; from different pads or pins) then
balancing skew between these clock sources becomes challenging.
-If they come from a PLL (i.e. synchronous) then skew balancing is comparatively easy.
Why are buffers used in a clock tree?
To balance skew (i.e. the difference in clock arrival time between flops) and to drive the large clock load with acceptable transition times.
What is cross talk?
Switching of the signal on one net can interfere with a neighbouring net due to cross coupling
capacitance. This effect is known as cross talk. Cross talk may lead to setup or hold violations.
How can you avoid cross talk?
-Double spacing=>more spacing=>less capacitance=>less cross talk
-Multiple vias=>less resistance=>less RC delay
-Shielding=> constant cross coupling capacitance =>known value of crosstalk
-Buffer insertion=>boost the victim strength
How shielding avoids crosstalk problem? What exactly happens there?


-High frequency noise (or glitch) is coupled to VSS (or VDD), since the shielding wires are connected to
either VDD or VSS.
The coupling capacitance to VDD or VSS remains constant.
How spacing helps in reducing crosstalk noise?
Spacing is more=>more distance between the two conductors=>cross coupling capacitance is
less=>less cross talk
Why are double spacing and multiple vias used for the clock?
Why clock? Because it is the one signal which changes its state regularly, and more often than
any other signal. If any other signal switches fast, we can use double spacing for it as well.
Double spacing=>spacing is more=>coupling capacitance is less=>less cross talk
Multiple vias=>resistance in parallel=>less resistance=>less RC delay
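Both effects can be put into numbers with very rough first-order models. The sketch below assumes a parallel-plate approximation, in which coupling capacitance is inversely proportional to spacing (fringing fields are ignored), and identical vias connected in parallel. The function names and values are hypothetical.

```python
def coupling_cap_ratio(spacing, ref_spacing):
    """Coupling capacitance relative to the reference spacing (C ~ 1/spacing)."""
    return ref_spacing / spacing


def parallel_via_resistance(r_single_via_ohm, n_vias):
    """Equivalent resistance of n identical vias in parallel."""
    return r_single_via_ohm / n_vias


# Double spacing halves the coupling capacitance in this simple model.
print(coupling_cap_ratio(spacing=2.0, ref_spacing=1.0))
# Two 4-ohm vias in parallel behave like a single 2-ohm via.
print(parallel_via_resistance(r_single_via_ohm=4.0, n_vias=2))
```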
How can a buffer be used on the victim to avoid crosstalk?
Buffers increase the victim's signal strength; buffers also break up the net length=>victims are more tolerant
to the signal coupled from the aggressor.

Physical Design Questions and Answers


I am getting several emails requesting answers to the questions posted in this blog. But it is
very difficult to provide detailed answers to all questions in my available spare time. Hence I
decided to give short and sweet one line answers to the questions so that readers can
benefit immediately. Detailed answers will be posted at a later stage. I have given answers to
some of the physical design questions here. Enjoy!
What parameters (or aspects) differentiate Chip Design and Block level design?
Chip design has I/O pads; block design has pins.
Chip design uses all the metal layers available; block design may not use all metal layers.
A chip is generally rectangular in shape; blocks can be rectangular or rectilinear.
Chip design ends in packaging; block design ends in a macro.
How do you place macros in a full chip design?
First check flylines i.e. check net connections from macro to macro and macro to standard cells.
If there are more connections from macro to macro, place those macros nearer to each other,
preferably nearer to the core boundaries.
If an input pin is connected to a macro, it is better to place the macro nearer to that pin or pad.
If a macro has more connections to standard cells, spread the macros inside the core.
Avoid crisscross placement of macros.
Use soft or hard blockages to guide placement engine.
Differentiate between a Hierarchical Design and flat design?
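Hierarchical design has blocks and subblocks in a hierarchy; flattened design has no subblocks and
it has only leaf cells.
For large designs, a flat run generally needs more run time and memory; in a hierarchical design each block is smaller and blocks can be implemented in parallel, at the cost of extra budgeting and assembly steps.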
Which is more complicated when you have a 48 MHz and a 500 MHz clock design?
500 MHz; because it is more constrained (i.e. shorter clock period) than the 48 MHz design.
Name a few tools which you have used for physical verification.
Hercules from Synopsys, Calibre from Mentor Graphics.
What input files will you give for PrimeTime correlation?


Netlist, Technology library, Constraints, SPEF or SDF file.


If the routing congestion exists between two macros, then what will you do?
Provide soft or hard blockage
How will you decide the die size?
By checking the total area of the design you can decide die size.
If a lengthy metal layer is connected to diffusion and poly, then which one will be affected by the antenna
problem?
Poly
If the full chip design is routed using 7 metal layers, why are macros designed using 5LM instead of
7LM?
Because the top two metal layers are required for global routing in the chip design. If the top metal layers
are also used at block level, they will create routing blockages.
In your project what is die size, number of metal layers, technology, foundry, number of clocks?
Die size: tell in mm, e.g. 1mm x 1mm; remember 1mm=1000 micron, which is a big size!!
Metal layers: See your tech file. Generally for 90nm it is 7 to 9.
Technology: Again look into tech files.
Foundry: Again look into tech files; e.g. TSMC, IBM etc. (ARTISAN is a library provider rather than a foundry.)
Clocks: Look into your design and SDC file!
How many macros in your design?
You know it well as you have designed it! An SoC (System On Chip) design may even have 100
macros!
What is each macro size and number of standard cell count?
Depends on your design.
What are the input needs for your design?
For synthesis: RTL, Technology library, Standard cell library, Constraints
For Physical design: Netlist, Technology library, Constraints, Standard cell library
What does the SDC constraint file contain?
Clock definitions
Timing exception-multicycle path, false path
Input and Output delays
How did you do power planning?
How to calculate core ring width, macro ring width and strap or trunk width?
How to find number of power pad and IO power pads?
How are the width of metal and the number of straps calculated for power and ground?
Get the total core power consumption and divide it by the core supply voltage to get the total core
current; get the metal layer current density value from the tech file; divide the total current by the
number of sides of the chip; divide the obtained value by the current
density to get the core power ring width. Then calculate the number of straps using some more
equations. Will be explained in detail later.
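The ring width steps can be sketched as a small calculation. This is only a back-of-the-envelope illustration: it assumes the core current is I = P / V, that the current is shared equally between the sides of the ring, and that the current density limit is given in mA per micron of metal width. The function name and the numbers are hypothetical; real values come from your power report and tech file.

```python
def core_ring_width_um(core_power_w, core_voltage_v, n_sides, j_max_ma_per_um):
    """Estimate the core power ring width in microns.

    core_power_w      total core power consumption (W)
    core_voltage_v    core supply voltage (V), used to convert power to current
    n_sides           number of chip sides sharing the current (usually 4)
    j_max_ma_per_um   allowed current density of the ring layer (mA per um width)
    """
    total_current_ma = core_power_w / core_voltage_v * 1000.0
    current_per_side_ma = total_current_ma / n_sides
    return current_per_side_ma / j_max_ma_per_um


# Hypothetical design: 1.2 W core at 1.2 V, 4 sides, 5 mA/um limit.
width = core_ring_width_um(1.2, 1.2, 4, 5.0)
print(width)
```

In practice some margin is added on top of this minimum width, and the ring is split between VDD and VSS.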
How to find total chip power?
Total chip power=standard cell power consumption + macro power consumption + pad power
consumption.
What are the problems faced related to timing?
Prelayout: Setup, Max transition, max capacitance
Post layout: Hold


How did you resolve the setup and hold problem?


Setup: upsize the cells
Hold: insert buffers
Which layer do you prefer for clock routing and why?
Next lower layer to the top two metal layers (global routing layers). Because it has less
resistance hence less RC delay.
If your design has a reset pin, will it affect the input pin, the output pin or both?
Output pin.
During power analysis, if you are facing IR drop problem, then how did you avoid?
Increase power metal layer width.
Go for higher metal layer.
Spread macros or standard cells.
Provide more straps.
Define the antenna problem. How did you resolve it?
Increased net length can accumulate more charges while manufacturing of the device due to
ionisation process. If this net is connected to gate of the MOSFET it can damage dielectric
property of the gate and gate may conduct causing damage to the MOSFET. This is antenna
problem.
Decrease the length of the net by providing more vias and layer jumping.
Insert antenna diode.
How delays vary with different PVT conditions? Show the graph.
P increase->delay increase
P decrease->delay decrease
V increase->delay decrease
V decrease->delay increase
T increase->delay increase
T decrease->delay decrease
Explain the flow of physical design and inputs and outputs for each step in flow.
Click here to see the flow diagram
What is cell delay and net delay?
Gate delay
Transistors within a gate take a finite time to switch. This means that a change on the input of
a gate takes a finite time to cause a change on the output.[Magma]
Gate delay =function of(i/p transition time, Cnet+Cpin).
Cell delay is also same as Gate delay.
Cell delay
For any gate it is measured between 50% of input transition to the corresponding 50% of output
transition.
Intrinsic delay
Intrinsic delay is the delay internal to the gate. Input pin of the cell to output pin of the cell.
It is defined as the delay between an input and output pair of a cell, when a near zero slew is
applied to the input pin and the output does not see any load condition.It is predominantly
caused by the internal capacitance associated with its transistor.
This delay is largely independent of the size of the transistors forming the gate, because
increasing the size of the transistors also increases the internal capacitance.


Net Delay (or wire delay)
The difference between the time a signal is first applied to the net and the time it reaches other
devices connected to that net.
It is due to the finite resistance and capacitance of the net.It is also known as wire delay.
Wire delay =fn(Rnet , Cnet+Cpin)
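One common first-order way to put numbers on Wire delay = fn(Rnet, Cnet+Cpin) is the Elmore approximation for a single distributed RC wire driving one pin. The sketch below assumes that model; it is not the only way tools compute net delay, and the function name and values are hypothetical.

```python
def elmore_wire_delay_s(r_net_ohm, c_net_f, c_pin_f):
    """Elmore delay of one distributed RC wire segment driving a single pin.

    The wire capacitance is spread along the wire, so on average it sees
    half of the wire resistance; the pin capacitance sees all of it.
    """
    return r_net_ohm * (c_net_f / 2.0 + c_pin_f)


# Hypothetical net: 100 ohm, 20 fF of wire capacitance, 5 fF pin load.
delay = elmore_wire_delay_s(100.0, 20e-15, 5e-15)
print(delay)  # in seconds
```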
What are delay models and what is the difference between them?
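Linear Delay Model (LDM): delay is modelled as a linear function of input transition and output load.
Non Linear Delay Model (NLDM): delay is read from lookup tables indexed by input transition and output load.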
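To make the table-based model concrete, here is a minimal sketch of how a cell delay could be read from a two-dimensional lookup table by bilinear interpolation between the four surrounding entries. The table values, axis points and function names are hypothetical; real libraries define their own table templates and units.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Indices (i, i+1) of the axis segment used for x (end segments are reused outside the range)."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    return i, i + 1


def table_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by [slew][load]."""
    i0, i1 = _segment(slews, slew)
    j0, j1 = _segment(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    at_slew0 = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    at_slew1 = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return at_slew0 * (1 - ts) + at_slew1 * ts


# Hypothetical 2x2 table: input slew in ns, load in pF, delay in ns.
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.05]
DELAYS = [[0.10, 0.20],
          [0.15, 0.30]]

print(table_delay(SLEWS, LOADS, DELAYS, slew=0.3, load=0.03))
```

Points outside the table range are linearly extrapolated from the nearest segment in this sketch, which real tools may or may not do.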
What is wire load model?
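A wire load model estimates the R and C of a net, typically from its fanout, before the actual routing is available. It is a separate model from NLDM, which describes cell delay.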
Why higher metal layers are preferred for Vdd and Vss?
Because it has less resistance and hence leads to less IR drop.
What is logic optimization and give some methods of logic optimization.
Upsizing
Downsizing
Buffer insertion
Buffer relocation
Dummy buffer placement
What is the significance of negative slack?
negative slack==>there is a timing violation (setup or hold)==>design can fail
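As a small worked example, setup slack for an ideal-clock reg to reg path can be written as required time minus arrival time. The sketch below assumes zero skew and zero uncertainty; the function name and the numbers (in ps) are hypothetical.

```python
def setup_slack_ps(clock_period, t_setup, t_clk_to_q, t_logic):
    """Setup slack of a reg-to-reg path with an ideal clock (all values in ps)."""
    required = clock_period - t_setup   # data must be stable this early
    arrival = t_clk_to_q + t_logic      # when data actually reaches the capture flop
    return required - arrival


# Hypothetical path: 2000 ps period, 100 ps setup, 200 ps clk-to-Q, 1900 ps logic.
slack = setup_slack_ps(2000, 100, 200, 1900)
print(slack)  # negative: the path violates setup
```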
What is signal integrity? How it affects Timing?
IR drop, Electro Migration (EM), crosstalk and ground bounce are signal integrity issues.
If IR drop is more==>delay increases.
crosstalk==>there can be setup as well as hold violations.
What is IR drop? How to avoid? How it affects timing?
There is a resistance associated with each metal layer. This resistance consumes power causing
voltage drop i.e.IR drop.
If IR drop is more==>delay increases.
What is EM and what are its effects?
Due to high current flow in the metal, atoms of the metal can be displaced from their original place.
When this happens in larger amounts, the metal can open or bulging of the metal layer can happen.
This effect is known as Electro Migration.
Effects: Either short or open of the signal line or power line.
What are types of routing?
Global Routing
Track Assignment
Detail Routing
What is latency? Give the types?
Source Latency
It is also known as source delay. It is defined as the delay from the clock origin point to the
clock definition point in the design, i.e. the delay from the clock source to the beginning of the clock tree.
It is the time a clock signal takes to propagate from its ideal waveform origin point to the clock
definition point in the design.
Network latency
It is also known as insertion delay. It is defined as the delay from the clock
definition point to the clock pin of the register.
It is the time a clock signal (rise or fall) takes to propagate from the clock definition point to a
register clock pin.
What is track assignment?
Second stage of the routing wherein particular metal tracks (or layers) are assigned to the
signal nets.
What is congestion?
If the number of routing tracks available for routing is less than the required tracks then it is
known as congestion.
Whether congestion is related to placement or routing?
Routing
What are clock trees?
Distribution of clock from the clock source to the sync pin of the registers.
What are clock tree types?
H tree, Balanced tree, X tree, Clustering tree, Fish bone
What is cloning and buffering?
Cloning is a method of optimization that decreases the load of a heavily loaded cell by
replicating the cell.
Buffering is a method of optimization that inserts buffers in high fanout nets to
decrease the delay.

What is the difference between a latch and a flip-flop?


Both latches and flip-flops are circuit elements whose output depends not only on the present
inputs, but also on previous inputs and outputs.
They both are hence referred as sequential elements.
In electronics, a latch is a kind of bistable multivibrator, an electronic circuit which has two
stable states and thereby can store one bit of information. Today the word is mainly used for
simple transparent storage elements, while slightly more advanced non-transparent (or clocked)
devices are described as flip-flops. Informally, as this distinction is quite new, the two words
are sometimes used interchangeably. [wiki]
In digital circuits, a flip-flop is a kind of bistable multivibrator, an electronic circuit which has
two stable states and thereby is capable of serving as one bit of memory. Today, the term flip-flop
has come to generally denote non-transparent (clocked or edge-triggered) devices, while
the simpler transparent ones are often referred to as latches. [wiki]
A flip-flop is controlled by (usually) one or two control signals and/or a gate or clock signal.
Latches are level sensitive i.e. the output captures the input when the clock signal is high, so as
long as the clock is logic 1, the output can change if the input also changes.
Flip-Flops are edge sensitive i.e. flip flop will store the input only when there is a rising or
falling edge of the clock.
A positive level latch is transparent to the positive level(enable), and it latches the final input
before it is changing its level(i.e. before enable goes to 0 or before the clock goes to -ve
level.)
A positive edge flop will have its output effective when the clock input changes from 0 to 1
state (1 to 0 for negative edge flop) only.


Latches are faster, flip flops are slower.


Latch is sensitive to glitches on enable pin, whereas flip-flop is immune to glitches.
Latches take less gates (less power) to implement than flip-flops.
D-FF is built from two latches. They are in master slave configuration.
Latch may be clocked or clock less. But flip flop is always clocked.
For a transparent latch generally D to Q propagation delay is considered while for a flop clock
to Q and setup and hold time are very important.
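The level-sensitive versus edge-sensitive behaviour described above can be seen in a tiny zero-delay simulation. This is only a behavioural sketch with hypothetical names: it ignores setup, hold and propagation delays, and assumes a positive level latch and a positive edge flop sampled once per time step.

```python
def simulate(clk, d):
    """Return (latch_q, flop_q) traces for equal-length clock and data samples."""
    latch_q, flop_q = [], []
    lq = fq = 0        # both storage elements start at 0
    prev_clk = 0
    for c, x in zip(clk, d):
        if c == 1:                      # latch is transparent while clock is high
            lq = x
        if prev_clk == 0 and c == 1:    # flop samples only on the rising edge
            fq = x
        latch_q.append(lq)
        flop_q.append(fq)
        prev_clk = c
    return latch_q, flop_q


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 0, 0, 1]
latch_q, flop_q = simulate(clk, d)
print(latch_q)
print(flop_q)
```

While the clock is high the latch output follows every change on D, whereas the flop output only changes at the two rising edges.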
Synthesis perspective: Pros and Cons of Latches and Flip Flops
In synthesis of HDL code, inappropriate coding can infer latches instead of flip flops, e.g. incomplete if
and case statements. This should be avoided as latches are more prone to glitches.
Latch takes less area, Flip-flop takes more area ( as flip flop is made up of latches) .
Latch facilitate time borrowing or cycle stealing whereas flip flops allow synchronous logic.
Latches are not friendly with DFT tools. Minimize inferring of latches if your design has to be
made testable, since the enable signal to a latch is not a regular clock that is fed to the rest of the
logic. To ensure testability, you need to use an OR gate with the enable and scan_enable signals as
inputs and feed its output to the enable port of the latch. [ref]
Most EDA software tools have difficulty with latches. Static timing analyzers typically make
assumptions about latch transparency. If one assumes the latch is transparent (i.e.triggered by
the active time of clock,not triggered by just clock edge), then the tool may find a false timing
path through the input data pin. If one assumes the latch is not transparent, then the tool may
miss a critical path.
If the target technology supports a latch cell then race condition problems are minimized. If the target
technology does not support a latch then the synthesis tool will build it from basic gates, which is
prone to race conditions. Then you need to add redundant logic to overcome this problem. But
during optimization the redundant logic can be removed by the synthesis tool! This will create
endless problems for the design team.
Due to the transparency issue, latches are difficult to test. For scan testing, they are often
replaced by a latch-flip-flop compatible with the scan-test shift-register. Under these
conditions, a flip-flop would actually be less expensive than a latch. Read a good article on
problems of latch published in eetimes long back !!
Flip flops are friendly with DFT tools. Scan insertion for synchronous logic is hassle free.

What are the different types of delays in ASIC or VLSI design?


Different Types of Delays in ASIC or VLSI design
Source Delay/Latency
Network Delay/Latency
Insertion Delay
Transition Delay/Slew: Rise time, fall time
Path Delay
Net delay, wire delay, interconnect delay
Propagation Delay
Phase Delay
Cell Delay
Intrinsic Delay
Extrinsic Delay


Input Delay
Output Delay
Exit Delay
Latency (Pre/post CTS)
Uncertainty (Pre/Post CTS)
Unateness: Positive unateness, negative unateness
Jitter: PLL jitter, clock jitter
Gate delay
Transistors within a gate take a finite time to switch. This means that a change on the input of
a gate takes a finite time to cause a change on the output.[Magma]
Gate delay =function of(i/p transition time, Cnet+Cpin).
Cell delay is also same as Gate delay.
Source Delay (or Source Latency)
It is known as source latency also. It is defined as the delay from the clock origin point to the
clock definition point in the design.
Delay from clock source to beginning of clock tree (i.e. clock definition point).
The time a clock signal takes to propagate from its ideal waveform origin point to the clock
definition point in the design.
Network Delay(latency)
It is also known as Insertion delay or Network latency. It is defined as the delay from the clock
definition point to the clock pin of the register.
The time clock signal (rise or fall) takes to propagate from the clock definition point to a
register clock pin.
Insertion delay
The delay from the clock definition point to the clock pin of the register.
Transition delay
It is also known as Slew. It is defined as the time taken to change the state of the signal: the time
taken for the transition from logic 0 to logic 1 and vice versa, or the time taken by the input
signal to rise from 10% (20%) to 90% (80%) and vice versa.
Transition is the time it takes for the pin to change state.
Slew
Rate of change of logic.See Transition delay.
Slew rate is the speed of transition measured in volt / ns.
Rise Time
Rise time is the difference between the time when the signal crosses a low threshold to the time
when the signal crosses the high threshold. It can be absolute or percent.
Low and high thresholds are fixed voltage levels around the mid voltage level or it can be either
10% and 90% respectively or 20% and 80% respectively. The percent levels are converted to
absolute voltage levels at the time of measurement by calculating percentages from the
difference between the starting voltage level and the final settled voltage level.

Fall Time
Fall time is the difference between the time when the signal crosses a high threshold to the
time when the signal crosses the low threshold.
The low and high thresholds are fixed voltage levels around the mid voltage level or it can be
either 10% and 90% respectively or 20% and 80% respectively. The percent levels are converted
to absolute voltage levels at the time of measurement by calculating percentages from the
difference between the starting voltage level and the final settled voltage level.
For an ideal square wave with 50% duty cycle, the rise time will be 0.For a symmetric triangular
wave, this is reduced to just 50%.
Click here to see waveform.
Click here to see more info.
The rise/fall definition is set on the meter to 10% and 90% based on the linear power in Watts.
These points translate into the -10 dB and -0.5 dB points in log mode (10 log 0.1) and (10 log
0.9). The rise/fall time values of 10% and 90% are calculated based on an algorithm, which
looks at the mean power above and below the 50% points of the rise/fall times. Click here to
see more.
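The threshold-based definition of rise time given earlier can be sketched for a sampled voltage waveform as follows. The percent levels are converted to absolute voltages from the starting and final settled levels, and the crossing times are found by linear interpolation between samples. The function names and the waveform are hypothetical, and the sketch assumes a clean, monotonic rising edge.

```python
def crossing_time(times, volts, level):
    """First time a rising waveform crosses `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 <= level <= v1 and v1 > v0:
            frac = (level - v0) / (v1 - v0)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts, low_pct=10, high_pct=90):
    """Time between the low and high percentage thresholds of a rising edge."""
    v_start, v_final = volts[0], volts[-1]
    swing = v_final - v_start
    v_low = v_start + swing * low_pct / 100
    v_high = v_start + swing * high_pct / 100
    return crossing_time(times, volts, v_high) - crossing_time(times, volts, v_low)


# Hypothetical edge: flat at 0 V for 1 ns, then a linear ramp to 1.2 V in 1 ns.
t_ns = [0.0, 1.0, 2.0]
v = [0.0, 0.0, 1.2]
print(rise_time(t_ns, v))          # 10% to 90%
print(rise_time(t_ns, v, 20, 80))  # 20% to 80%
```

Fall time can be measured the same way by mirroring the comparison for a falling edge.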
Path delay
Path delay is also known as pin to pin delay. It is the delay from the input pin of the cell to the
output pin of the cell.
Net Delay (or wire delay)
The difference between the time a signal is first applied to the net and the time it reaches other
devices connected to that net.
It is due to the finite resistance and capacitance of the net.It is also known as wire delay.
Wire delay =fn(Rnet , Cnet+Cpin)
Propagation delay
For any gate it is measured between 50% of input transition to the corresponding 50% of output
transition.
This is the time required for a signal to propagate through a gate or net. For gates it is the time
it takes for a event at the gate input to affect the gate output.
For net it is the delay between the time a signal is first applied to the net and the time it
reaches other devices connected to that net.
It is taken as the average of the high-to-low and low-to-high propagation delays, i.e. Tpd = (Tphl + Tplh)/2.
Phase delay
Same as insertion delay
Cell delay
For any gate it is measured between 50% of input transition to the corresponding 50% of output
transition.
Intrinsic delay
Intrinsic delay is the delay internal to the gate. Input pin of the cell to output pin of the cell.
It is defined as the delay between an input and output pair of a cell, when a near zero slew is
applied to the input pin and the output does not see any load condition.It is predominantly
caused by the internal capacitance associated with its transistor.
This delay is largely independent of the size of the transistors forming the gate because
increasing size of transistors increase internal capacitors.
Extrinsic delay
Same as wire delay, net delay, interconnect delay, flight time.
Extrinsic delay is the delay associated with the interconnect, from the output pin of a cell to
the input pin of the next cell.
Input delay
Input delay is the time at which the data arrives at the input pin of the block from external
circuit with respect to reference clock.
Output delay
Output delay is the time required by the external circuit before which the data has to arrive at the
output pin of the block with respect to reference clock.


Exit delay
It is defined as the delay in the longest path (critical path) between clock pad input and an
output. It determines the maximum operating frequency of the design.
Latency (pre/post cts)
Latency is the summation of the Source latency and the Network latency. Pre CTS estimated
latency will be considered during the synthesis and after CTS propagated latency is considered.
Uncertainty (pre/post cts)
Uncertainty is the amount of skew and the variation in the arrival clock edge. Pre CTS
uncertainty is clock skew and clock Jitter. After CTS we can have some margin of skew + Jitter.
Unateness
A function is said to be positive unate in an input variable if a rise transition on that input causes
the output to rise or not change, and a fall transition causes the output to fall or not change.
Negative unateness means the cell output logic is the inverted version of the input logic, e.g. in an inverter
having input A and output Y, Y is -ve unate w.r.t. A. Positive unate means the cell output logic is
the same as that of the input.
This +ve and -ve unateness is defined in the library file for an output
pin w.r.t. some input pin.
A clock signal is positive unate if a rising edge at the clock source can only cause a rising edge
at the register clock pin, and a falling edge at the clock source can only cause a falling edge at
the register clock pin.
A clock signal is negative unate if a rising edge at the clock source can only cause a falling edge
at the register clock pin, and a falling edge at the clock source can only cause a rising edge at
the register clock pin. In other words, the clock signal is inverted.
A clock signal is not unate if the clock sense is ambiguous as a result of non-unate timing arcs
in the clock path. For example, a clock that passes through an XOR gate is not unate because
there are nonunate arcs in the gate. The clock sense could be either positive or negative,
depending on the state of the other input to the XOR gate.
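The unateness of a small gate can be checked directly from its truth table: toggle one input from 0 to 1 for every combination of the other inputs and see whether the output ever rises, ever falls, or both. The sketch below does this for an inverter, an AND gate and an XOR gate. The function name is hypothetical, and this brute-force check is only meant for tiny functions.

```python
from itertools import product


def unateness(func, n_inputs, index):
    """Classify input `index` of a Boolean function as positive, negative or non-unate."""
    can_rise = can_fall = False
    for others in product((0, 1), repeat=n_inputs - 1):
        lo = list(others)
        lo.insert(index, 0)     # chosen input at 0
        hi = list(others)
        hi.insert(index, 1)     # chosen input at 1
        out_lo, out_hi = func(*lo), func(*hi)
        if out_hi > out_lo:
            can_rise = True     # rising input made the output rise
        elif out_hi < out_lo:
            can_fall = True     # rising input made the output fall
    if can_rise and can_fall:
        return "non-unate"
    if can_fall:
        return "negative"
    return "positive"           # also returned if the input never affects the output


print(unateness(lambda a: 1 - a, 1, 0))       # inverter
print(unateness(lambda a, b: a & b, 2, 0))    # AND
print(unateness(lambda a, b: a ^ b, 2, 0))    # XOR
```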
Jitter
The short-term variations of a signal with respect to its ideal position in time.
Jitter is the variation of the clock period from edge to edge. It can vary by +/- the jitter value.
From cycle to cycle the period and duty cycle can change slightly due to the clock generation
circuitry. This can be modeled by adding uncertainty regions around the rising and falling edges
of the clock waveform.
Common sources of jitter include:
Internal circuitry of the phase-locked loop (PLL)
Random thermal noise from a crystal
Other resonating devices
Random mechanical noise from crystal vibration
Signal transmitters
Traces and cables
Connectors
Receivers
Click here to read more about jitter from Altera.
Click here to read what wiki says about jitter.
Skew
The difference in the arrival of clock signal at the clock pin of different flops.


Two types of skews are defined: Local skew and Global skew.
Local skew
The difference in the arrival of clock signal at the clock pin of related flops.
Global skew
The difference in the arrival of clock signal at the clock pin of non related flops.
Skew can be positive or negative.
When data and clock are routed in the same direction, it is positive skew.
When data and clock are routed in opposite directions, it is negative skew.
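In numbers, skew is just a difference of clock arrival times. The sketch below assumes skew between a launch and capture pair is taken as capture arrival minus launch arrival (so a later capture clock gives positive skew), and that the overall skew of the tree is the spread between the latest and earliest arrival. The flop names and arrival times (in ps) are hypothetical.

```python
def pair_skew(arrivals, launch, capture):
    """Skew between two related flops: capture clock arrival minus launch clock arrival."""
    return arrivals[capture] - arrivals[launch]


def max_skew(arrivals):
    """Largest difference in clock arrival between any two flops."""
    return max(arrivals.values()) - min(arrivals.values())


# Hypothetical clock arrival times at three flop clock pins, in ps.
arrivals_ps = {"ff1": 500, "ff2": 530, "ff3": 480}

print(pair_skew(arrivals_ps, "ff1", "ff2"))   # capture clock is later: positive skew
print(pair_skew(arrivals_ps, "ff2", "ff3"))   # capture clock is earlier: negative skew
print(max_skew(arrivals_ps))
```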
Recovery Time
Recovery specifies the minimum time that an asynchronous control input pin must be held
stable after being de-asserted and before the next clock (active-edge) transition.
Recovery time specifies the time the inactive edge of the asynchronous signal has to arrive
before the closing edge of the clock.
Recovery time is the minimum length of time an asynchronous control signal (e.g. preset) must
be stable before the next active clock edge. The recovery slack time calculation is similar to the
clock setup slack time calculation, but it applies to asynchronous control signals. If the
asynchronous control is registered, the equations in Equation 1 are used.
Equation 1:
Recovery Slack Time = Data Required Time - Data Arrival Time
Data Arrival Time = Launch Edge + Clock Network Delay to Source Register + Tclkq + Register
to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register - Tsetup
If the asynchronous control is not registered, the equations in Equation 2 are used to calculate the
recovery slack time.
Equation 2:
Recovery Slack Time = Data Required Time - Data Arrival Time
Data Arrival Time = Launch Edge + Maximum Input Delay + Port to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register - Tsetup
If the asynchronous reset signal is from a port (device I/O), you must make an Input Maximum
Delay assignment to the asynchronous reset pin to perform recovery analysis on that path.
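A minimal sketch of the registered-control recovery calculation, following the same terms as Equation 1. The function name, argument names and the numbers (in picoseconds) are hypothetical and only meant to show how the terms combine.

```python
def recovery_slack(launch_edge, latch_edge, clk_delay_src, clk_delay_dst,
                   t_clkq, reg_to_reg_delay, t_setup):
    """Recovery slack for a registered asynchronous control (setup-like check)."""
    data_arrival = launch_edge + clk_delay_src + t_clkq + reg_to_reg_delay
    data_required = latch_edge + clk_delay_dst - t_setup
    return data_required - data_arrival


# Hypothetical values in picoseconds: 10 ns clock, launch at 0, latch at 10000.
slack = recovery_slack(launch_edge=0, latch_edge=10000,
                       clk_delay_src=500, clk_delay_dst=600,
                       t_clkq=200, reg_to_reg_delay=3000, t_setup=150)
print(slack)  # positive slack means the recovery requirement is met
```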
Removal Time
Removal specifies the minimum time that an asynchronous control input pin must be held
stable before being de-asserted and after the previous clock (active-edge) transition.
Removal time specifies the length of time the active phase of the asynchronous signal has to be
held after the closing edge of clock.
Removal time is the minimum length of time an asynchronous control signal must be stable
after the active clock edge. The calculation is similar to the clock hold slack calculation, but it
applies to asynchronous control signals. If the asynchronous control is registered, the equations
in Equation 3 are used to calculate the removal slack time.
If the recovery or removal minimum time requirement is violated, the output of the sequential
cell becomes uncertain. The uncertainty can be caused by the value set by the resetbar signal or
the value clocked into the sequential cell from the data input.
Equation 3:
Removal Slack Time = Data Arrival Time - Data Required Time
Data Arrival Time = Launch Edge + Clock Network Delay to Source Register + Tclkq of Source
Register + Register to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register + Thold
If the asynchronous control is not registered, the equations in Equation 4 are used to calculate
the removal slack time.


Equation 4:
Removal Slack Time = Data Arrival Time - Data Required Time
Data Arrival Time = Launch Edge + Input Minimum Delay of Pin + Minimum Pin to Register
Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register + Thold
If the asynchronous reset signal is from a device pin, you must specify the Input Minimum
Delay constraint to the asynchronous reset pin to perform a removal analysis on this path.
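A minimal sketch of the registered-control removal calculation (the hold-like check). The function name, argument names and the picosecond values are hypothetical; launch and latch edges are assumed to be the same clock edge, as in a hold check.

```python
def removal_slack(launch_edge, latch_edge, clk_delay_src, clk_delay_dst,
                  t_clkq, reg_to_reg_delay, t_hold):
    """Removal slack for a registered asynchronous control (hold-like check)."""
    data_arrival = launch_edge + clk_delay_src + t_clkq + reg_to_reg_delay
    data_required = latch_edge + clk_delay_dst + t_hold
    return data_arrival - data_required


# Hypothetical values in picoseconds; launch and latch on the same edge.
slack = removal_slack(launch_edge=0, latch_edge=0,
                      clk_delay_src=500, clk_delay_dst=600,
                      t_clkq=200, reg_to_reg_delay=300, t_hold=100)
print(slack)  # positive slack means the removal requirement is met
```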

What is the difference between soft macro and hard macro?


What is the difference between hard macro, firm macro and soft macro?
or
What are IPs?
Hard macros, firm macros and soft macros are all known as IP (Intellectual Property). They are
optimized for power, area and performance. They can be purchased and used in your ASIC or
FPGA design implementation flow. A soft macro is flexible enough for all types of ASIC implementation.
A hard macro can be used in a pure ASIC design flow, not in an FPGA flow. Before buying any IP it is
very important to evaluate the advantages and disadvantages of each type, hardware
compatibility (such as I/O standards) with your design blocks, and reusability for other designs.
Soft macros
Soft macros are in synthesizable RTL.
Soft macros are more flexible than firm or hard macros.
Soft macros are not specific to any manufacturing process.
Soft macros have the disadvantage of being somewhat unpredictable in terms of performance,
timing, area, or power.
Soft macros carry greater IP protection risks because RTL source code is more portable and
therefore, less easily protected than either a netlist or physical layout data.
From the physical design perspective, a soft macro is any cell that has been placed and routed in
a placement and routing tool such as Astro. (This is the definition given in the Astro Rail user
manual!)
Soft macros are editable and can contain standard cells, hard macros, or other soft macros.
Firm macros
Firm macros are in netlist format.
Firm macros are optimized for performance/area/power using a specific fabrication technology.
Firm macros are more flexible and portable than hard macros.
Firm macros are more predictable in performance and area than soft macros.
Hard macro
Hard macros are generally in the form of hardware IPs (or, as we call them, hardware IPs!).
Hard macros are targeted at a specific IC manufacturing technology.
Hard macros are block level designs which are silicon tested and proved.
Hard macros have been optimized for power or area or timing.
In physical design you can only access the pins of hard macros, unlike soft macros, which can
be manipulated in different ways.
You have the freedom to move, rotate and flip hard macros, but you can't touch anything inside them.
A very common example of a hard macro is a memory. It can be any design that carries a dedicated
single functionality (in general); for example, it can be an MP4 decoder.

Be aware of the features and characteristics of a hard macro before you use it in your design. Other
than power, timing and area, you should also know pin properties such as sync pins, I/O standards,
etc.
The LEF and GDS2 file formats allow easy usage of macros in different tools.
From the physical design (backend) perspective:
Hard macro is a block that is generated in a methodology other than place and route (i.e. using
full custom design methodology) and is brought into the physical design database (eg. Milkyway
in Synopsys; Volcano in Magma) as a GDS2 file.
Here is one article published in embedded magazine about IPs. Click here to read.
Synthesis and placement of macros in modern SoC designs are challenging. EDA tools employ
different algorithms to accomplish this task while also meeting power and area targets. There are several
research papers available on these subjects. Some of them can be downloaded from the links
below.
Hard Macro Placement in Complex SoC Design view and read article from soccentral
Hard Macro Placement in Complex SoC Design download white paper
IEEE/University research papers
Local Search for Final Placement in VLSI Design -download
Consistent Placement of Macro-Blocks Using Floorplanning and standard cell placement
download
A Timing-Driven Soft-Macro Placement And Resynthesis Method In Interaction with Chip
Floorplanning download

What is the difference between FPGA and CPLD?


FPGA-Field Programmable Gate Array and CPLD-Complex Programmable Logic Device both are
programmable logic devices made by the same companies with different characteristics.
A Complex Programmable Logic Device (CPLD) is a Programmable Logic Device with complexity
between that of PALs (Programmable Array Logic) and FPGAs, and architectural features of both.
The building block of a CPLD is the macro cell, which contains logic implementing disjunctive
normal form expressions and more specialized logic operations.
This is what Wiki defines..!!
Click here to see what else wiki has to say about it !
Architecture
Granularity is the biggest difference between CPLD and FPGA.
FPGAs are fine-grain devices. That means they contain anywhere from hundreds up to 100,000 tiny
blocks of logic (called LUTs, CLBs, etc.) with flip-flops, combinational logic and
memories. FPGAs offer much higher complexity, with up to 150,000 flip-flops and a large number of
gates available.
CPLDs typically have the equivalent of thousands of logic gates, allowing implementation of
moderately complicated data processing devices. PALs typically have a few hundred gate
equivalents at most, while FPGAs typically range from tens of thousands to several million.
CPLDs are coarse-grain devices. They contain relatively few (a few hundred at most) large blocks of
logic with flip-flops and combinational logic. CPLDs are based on an AND-OR structure.
CPLDs have a register with associated logic (AND/OR matrix). CPLDs are mostly used
in control applications and FPGAs in datapath applications. Because of this coarse-grained
architecture, the timing is very fixed in CPLDs.
FPGAs are RAM based. They need to be downloaded (configured) at each power-up. CPLDs are
EEPROM based. They are active at power-up, i.e. as long as they've been programmed at least
once.

An FPGA needs a boot ROM but a CPLD does not. In systems where there is not enough time to
wait for the FPGA to boot up, a CPLD is used alongside the FPGA.
Generally, the CPLD devices are not volatile, because they contain flash or erasable ROM
memory in all the cases. The FPGA are volatile in many cases and hence they need a
configuration memory for working. There are some FPGAs now which are nonvolatile. This
distinction is rapidly becoming less relevant, as several of the latest FPGA products also offer
models with embedded configuration memory.
The characteristic of non-volatility makes the CPLD the device of choice in modern digital
designs to perform boot loader functions before handing over control to other devices not
having this capability. A good example is where a CPLD is used to load configuration data for
an FPGA from non-volatile memory.
Because of coarse-grain architecture, one block of logic can hold a big equation and hence
CPLD have a faster input-to-output timings than FPGA.
Click here to read one good article.
Features
FPGAs have special routing resources to implement binary counters, arithmetic functions like
adders, comparators and RAM. CPLDs don't have special features like this.
FPGAs can contain very large digital designs, while CPLDs can contain small designs only. The
limited complexity (<500>

Speed: CPLDs offer a single-chip solution with fast pin-to-pin delays, even for wide input
functions. Use CPLDs for small designs, where instant-on, fast and wide decoding, ultra-low
idle power consumption, and design security are important (e.g., in battery-operated
equipment).
Security: In CPLD once programmed, the design can be locked and thus made secure. Since the
configuration bitstream must be reloaded every time power is re-applied, design security in
FPGA is an issue.
Power: The high static (idle) power consumption of some CPLD families prohibits their use in battery-operated
equipment; the low-power CPLD families mentioned under Speed are the exception. FPGA idle power
consumption is reasonably low, although it is sharply increasing in the newest families.
Design flexibility: FPGAs offer more logic flexibility and more sophisticated system features
than CPLDs: clock management, on-chip RAM, DSP functions (multipliers), and even on-chip
microprocessors and Multi-Gigabit Transceivers. These benefits, and the opportunity of dynamic
reconfiguration even in the end-user system, are an important advantage.
Use FPGAs for larger and more complex designs.
Click here to read what Xilinx has to say about it.
FPGAs are suited to timing circuits because they have more registers, while CPLDs are suited to
control circuits because they have more combinational logic. Also, if you synthesize
the same code for an FPGA many times, you will find that each timing report is different,
whereas CPLD synthesis gives the same result every time.
As CPLDs and FPGAs become more advanced the differences between the two device types will
continue to blur. While this trend may appear to make the two types more difficult to keep apart,
the architectural advantage of CPLDs combining low cost, non-volatile configuration, and macro
cells with predictable timing characteristics will likely be sufficient to maintain a product
differentiation for the foreseeable future.

What is the difference between FPGA and ASIC?


This question is very popular in VLSI fresher interviews. It looks simple, but a deeper look into
the subject reveals that there are a lot of things to be understood! So here is the answer.
FPGA vs. ASIC
The difference between ASICs and FPGAs mainly comes down to cost, tool availability, performance
and design flexibility. Each has its own pros and cons, and it is the designer's responsibility to
weigh the advantages of each and choose either an FPGA or an ASIC for the product. However, recent
developments in the FPGA domain are narrowing down the benefits of the ASICs.
FPGA
Field Programmable Gate Array
FPGA Design Advantages

Faster time-to-market: No layout, masks or other manufacturing steps are needed for an FPGA
design. A ready-made FPGA is available: burn your HDL code into it and you are done!
No NRE (Non-Recurring Engineering) cost: This cost is typically associated with an ASIC design. For
an FPGA it does not exist. FPGA tools are cheap (sometimes free! You only need to buy the FPGA,
that's all!). For an ASIC you pay a huge NRE and the tools are expensive. I would say very expensive:
the cost runs into crores!

Simpler design cycle: This is due to software that handles much of the routing, placement, and
timing. Less manual intervention is needed. The FPGA design flow largely automates the complex and time-consuming floorplanning, place and route, and timing analysis steps.
More predictable project cycle: The FPGA design flow eliminates potential re-spins, wafer
capacity issues, etc. from the project, since the design logic is already synthesized and verified in the FPGA
device.
Field Reprogrammability: A new bitstream (i.e. your program) can be uploaded remotely,
instantly. An FPGA can be reprogrammed in a snap, while an ASIC can take $50,000 and more than
4-6 weeks to make the same changes. FPGA costs start from a couple of dollars and go to several
hundred or more depending on the hardware features.
Reusability: Reusability is the main advantage of an FPGA. A prototype of the design can be
implemented on an FPGA and verified for almost accurate results so that it can then be
implemented on an ASIC. If the design has faults, change the HDL code, generate the bitstream,
program it into the FPGA and test again. Modern FPGAs are reconfigurable both partially and
dynamically.
FPGAs are good for prototyping and limited production. If you are going to make 100-200
boards it isn't worth making an ASIC.
Generally FPGAs are used for lower speed, lower complexity and lower volume designs. But
today's FPGAs even run at 500 MHz with superior performance. With unprecedented logic
density increases and a host of other features, such as embedded processors, DSP blocks,
clocking, and high-speed serial I/O at ever lower prices, FPGAs are suitable for almost any type of
design.
Unlike ASICs, FPGAs have special hardware such as Block RAM, DCM modules, MACs,
memories, high-speed I/O, embedded CPUs etc. built in, which can be used to get better
performance. Modern FPGAs are packed with features. Advanced FPGAs usually come with
phase-locked loops, low-voltage differential signaling, clock data recovery, more internal routing,
high speed, hardware multipliers for DSP, memory, programmable I/O, IP cores and
microprocessor cores. Remember PowerPC (hard core) and MicroBlaze (soft core) in Xilinx and
ARM (hard core) and Nios (soft core) in Altera. There are FPGAs available now with a built-in ADC!
Using all these features designers can build a system on a chip. Now, do you really need an
ASIC?
FPGA synthesis is much easier than ASIC synthesis.
In an FPGA you need not do floorplanning; the tool can do it efficiently. In an ASIC you have to do it.
FPGA Design Disadvantages
Power consumption in an FPGA is higher. You don't have any control over the power optimization.
This is where the ASIC wins the race!
You have to use the resources available in the FPGA. Thus the FPGA limits the design size.
Good for low quantity production. As quantity increases, the cost per product becomes higher than with
an ASIC implementation.
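The volume argument can be made concrete with a small break-even calculation: the ASIC pays a one-time NRE but has a lower unit cost, so beyond some volume it becomes cheaper. This is only a sketch; the function name and all cost figures below are hypothetical and not taken from any real quote.

```python
import math


def breakeven_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Smallest number of units at which the ASIC total cost is below the FPGA total cost."""
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("ASIC never breaks even if its unit cost is not lower")
    exact = asic_nre / (fpga_unit_cost - asic_unit_cost)
    volume = math.ceil(exact)
    # If the costs are exactly equal at `volume`, one more unit is needed.
    if asic_nre + volume * asic_unit_cost >= volume * fpga_unit_cost:
        volume += 1
    return volume


# Hypothetical figures: 1,000,000 NRE, 5 per ASIC, 50 per FPGA.
volume = breakeven_volume(asic_nre=1_000_000, asic_unit_cost=5, fpga_unit_cost=50)
print(volume)
```

Below this volume the FPGA is the cheaper option, which is why a run of 100-200 boards rarely justifies an ASIC.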
ASIC
Application Specific Integrated Circuit

ASIC Design Advantages

Cost, cost, cost. Lower unit costs: For very high volume designs the cost comes out to be very
low. At larger volumes an ASIC design proves to be cheaper than implementing the design using
an FPGA.
Speed, speed, speed. ASICs are faster than FPGAs: An ASIC gives design flexibility, which gives
enormous opportunity for speed optimization.
Low power, low power, low power: An ASIC can be optimized for the required low power. There are
several low power techniques, such as power gating, clock gating, multi-Vt cell libraries,
pipelining etc., available to achieve the power target. This is where the FPGA fails badly! Can
you imagine a cell phone that has to be charged for every call? Never. Low power ASICs
help the battery live longer!
In an ASIC you can implement analog circuits and mixed-signal designs. This is generally not possible in
an FPGA.
In an ASIC, DFT (Design For Test) is inserted. In an FPGA, DFT is not carried out (rather, for an FPGA there is no
need for DFT!).
ASIC Design Disadvantages

Time-to-market: Some large ASICs can take a year or more to design. A good way to shorten
development time is to make prototypes using FPGAs and then switch to an ASIC.
Design Issues: In an ASIC you should take care of DFM issues, Signal Integrity issues and many
more. In an FPGA you don't have all these, because the ASIC designer who built the FPGA has taken care of them. (Don't
forget that an FPGA is an IC and is designed by ASIC design engineers!)
Expensive Tools: ASIC design tools are very expensive. You spend a huge amount on NRE.
Structured ASICS
Structured ASICs have the bottom metal layers fixed and only the top layers can be designed by
the customer.
Structured ASICs are custom devices that approach the performance of today's Standard Cell
ASICs while dramatically reducing the design complexity.
Structured ASICs offer designers a set of devices with specific, customizable metal layers along
with predefined metal layers, which can contain the underlying pattern of logic cells, memory,
and I/O.
FPGA vs. ASIC Design Flow Comparison
http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm
Other links
http://www.controleng.com/article/CA607224.html
http://www.soccentral.com/results.asp?CategoryID=488&EntryID=15887
http://www.us.design-reuse.com/articles/article9010.html

In scan chains, if some flip-flops are +ve edge triggered and the remaining flip-flops
are -ve edge triggered, how does the chain behave?
Answer:
For designs with both positive and negative edge clocked flops, the scan insertion tool will always route
the scan chain so that the negative edge flops come before the positive edge flops in the chain.
This avoids the need for a lockup latch.
If the order were reversed within the same clock domain, the negedge flops would capture the data just captured into the
posedge flops on the preceding posedge of the clock, and a shifted bit would be lost.
For multiple clock domains, it all depends upon how the clock trees are balanced. If the clock
domains are completely asynchronous, ATPG has to mask the receiving flops.
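The ordering rule can be checked with a tiny behavioural model of scan shifting. This is only a sketch under stated assumptions: the clock is return-to-zero (the rising edge comes first in each cycle), the scan-in bit is held for the whole cycle, there are no lockup latches, and all flops on the same edge update simultaneously. The function name is made up for the example.

```python
def shift_chain(edges, scan_in_bits):
    """Shift bits through a scan chain; edges[i] is 'pos' or 'neg' for flop i."""
    state = [0] * len(edges)
    for bit in scan_in_bits:
        # One clock cycle: the rising edge is followed by the falling edge.
        for active_edge in ("pos", "neg"):
            # Each flop sees the output of the flop before it (scan-in for flop 0).
            previous = [bit] + state[:-1]
            state = [previous[i] if edge == active_edge else state[i]
                     for i, edge in enumerate(edges)]
    return state


# Negedge flop first: the two bits end up in two different flops (correct shift).
good = shift_chain(["neg", "pos"], [1, 0])
# Posedge flop first: the negedge flop copies the bit the posedge flop has just
# captured in the same cycle, so the first bit is overwritten.
bad = shift_chain(["pos", "neg"], [1, 0])
print(good, bad)
```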

What is the difference between a normal buffer and a clock buffer?


Answer:
The clock net is one of the High Fanout Nets (HFNs). Clock buffers are designed with special
properties such as high drive strength and low delay. Clock buffers have equal rise and fall times. This
prevents the duty cycle of the clock signal from changing when it passes through a chain of clock buffers.
Normal buffers are designed with a W/L ratio such that the sum of rise time and fall time is minimum.
They too are designed for higher drive strength.
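A small calculation shows why balanced edges matter. In this simplified model (an assumption, not a library characterization), every buffer delays a rising edge by one fixed amount and a falling edge by another, so each stage stretches or shrinks the high pulse by the difference. The function name and the picosecond values are hypothetical.

```python
def high_pulse_after_chain(high_ps, n_buffers, rise_delay_ps, fall_delay_ps):
    """High pulse width after n identical buffers with the given edge delays."""
    return high_ps + n_buffers * (fall_delay_ps - rise_delay_ps)


period_ps = 1000  # hypothetical 1 GHz clock with a 50% duty cycle at the source
balanced = high_pulse_after_chain(500, 10, rise_delay_ps=50, fall_delay_ps=50)
unbalanced = high_pulse_after_chain(500, 10, rise_delay_ps=50, fall_delay_ps=55)

print(100 * balanced / period_ps)    # duty cycle in percent, balanced buffers
print(100 * unbalanced / period_ps)  # duty cycle in percent, unbalanced buffers
```

With only a 5 ps mismatch per stage, ten stages move the duty cycle from 50% to 55% in this model.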

What is the difference between HFN synthesis and CTS?


Answer:
HFNs are synthesized in the front end as well, but at that stage no placement information for the standard
cells is available, so the backend tool collapses the synthesized HFNs. It resynthesizes the HFNs based
on placement information and inserts buffers appropriately. The target of this synthesis is to meet delay
requirements, i.e. setup and hold.
For the clock, no synthesis is carried out in the front end (why? Because there is no placement information for the
flip-flops, so synthesis won't meet true skew targets!). In the backend, clock tree synthesis tries to
meet skew targets. It inserts clock buffers (which have equal rise and fall times, unlike normal
buffers!). There are no skew targets for other HFNs.

Is it possible to have a zero skew in the design?


Answer:
Theoretically it is possible!
Practically it is impossible!
In practice we can't reduce any delay to zero. Delay will always exist, so we try to make the clock delays equal
(or the same) rather than zero. With this optimization all flops get the clock edge with the same
delay relative to each other, so effectively we can say they have zero skew, or that the skew is
balanced.

What do you mean by scan chain reordering?


Answer1:
Based on timing and congestion, the tool places standard cells optimally. While doing so, if the scan
chains are detached, the tool can break the chain ordering (which was done by a scan insertion tool such as DFT
Compiler from Synopsys) and reorder the chain to optimize it. It maintains the number of flops in a
chain.
Answer2:
During placement, optimization may make the scan chain difficult to route due to congestion.
Hence the tool will reorder the chain to reduce congestion.
This sometimes increases hold time problems in the chain. To overcome these, buffers may have to
be inserted into the scan path. The tool may not be able to maintain the scan chain length exactly. It
cannot swap cells between different clock domains.
Because of scan chain reordering, patterns generated earlier are of no use. But this is not a problem,
as ATPG can be redone by reading the new netlist.
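To give a feel for what reordering buys, the sketch below re-stitches a chain with a simple greedy nearest-neighbour walk over placed flop locations and compares the Manhattan wire length before and after. This heuristic is only an illustration chosen for brevity; it is not what any particular place and route tool does, and the flop names and coordinates are made up.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chain_length(locations, order):
    """Total stitch length of a scan chain visited in the given order."""
    return sum(manhattan(locations[a], locations[b])
               for a, b in zip(order, order[1:]))


def reorder_chain(locations, start):
    """Greedy nearest-neighbour reordering; keeps the same set of flops."""
    remaining = set(locations) - {start}
    order = [start]
    while remaining:
        here = locations[order[-1]]
        # sorted() makes tie-breaking deterministic.
        nearest = min(sorted(remaining), key=lambda f: manhattan(here, locations[f]))
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Hypothetical placed locations; the original stitching order zig-zags.
locations = {"f0": (0, 0), "f1": (10, 0), "f2": (1, 0), "f3": (9, 0)}
original = ["f0", "f1", "f2", "f3"]
reordered = reorder_chain(locations, start="f0")

print(chain_length(locations, original), chain_length(locations, reordered))
```

The number of flops in the chain is unchanged; only the stitching order differs, which is why the earlier ATPG patterns no longer apply.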

On what basis we decide the clock frequency in any design?


Answer:
There are several factors. The important ones are:
1) Input and output data rate: for example, if you are designing an encryptor or decryptor you
need a minimum of 100 MHz.
2) Power: the higher the frequency, the higher the power consumption.
3) Accuracy of the results required: if high accuracy is not needed, an RC oscillator can be used, which
saves area and keeps everything compact, but an RC oscillator can't produce high frequencies!
4) Technology: the smaller the node, the higher the speed (and also more power, again a trade-off!). How fast
do we want to go?
5) Target platform: is it an FPGA or a custom ASIC? Naturally an ASIC can give a higher clock frequency, but
the FPGA frequency of operation is limited by several other factors.
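The power point in the list can be illustrated with the usual first-order switching power estimate, P = activity x C x Vdd^2 x f. The sketch below assumes that model; the function name and the numbers are hypothetical and ignore leakage and short-circuit power.

```python
import math


def dynamic_power_w(activity, switched_cap_f, vdd_v, freq_hz):
    """First-order switching power estimate in watts."""
    return activity * switched_cap_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF of switched capacitance, 20% activity, 1.0 V supply.
p_100mhz = dynamic_power_w(0.2, 1e-9, 1.0, 100e6)
p_200mhz = dynamic_power_w(0.2, 1e-9, 1.0, 200e6)

print(p_100mhz, p_200mhz)  # doubling the frequency doubles the estimate
```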

What is JTAG?
Answer1:
JTAG is an acronym for Joint Test Action Group. It is also known as the IEEE 1149.1 standard for
Standard Test Access Port and Boundary-Scan Architecture. It is used as one of the DFT
techniques.
Answer2:
JTAG (Joint Test Action Group) boundary scan is a method of testing ICs and their interconnections.
It uses a shift register built into the chip so that inputs can be shifted in and the resulting
outputs can be shifted out. JTAG requires four I/O pins: clock (TCK), input data (TDI), output data (TDO), and
state machine mode control (TMS).
The uses of JTAG have expanded to debugging software for embedded microcontrollers. This eliminates
the need for in-circuit emulators, which are more costly. JTAG is also used for downloading
configuration bitstreams to FPGAs.
JTAG cells, also known as boundary scan cells, are small circuits placed just inside the I/O cells.
Their purpose is to pass data to and from the I/O through the boundary scan chain. The interface to
these scan chains is called the TAP (Test Access Port), and the operation of the chains and the TAP
is controlled by a JTAG controller inside the chip that implements JTAG.
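The shift-in/shift-out behaviour of a boundary scan register can be sketched as a plain shift register. This toy model deliberately leaves out the TAP state machine, the instruction register and the capture/update stages defined by IEEE 1149.1; the function name and bit values are made up for the example.

```python
def shift_boundary_register(cells, tdi_bits):
    """Shift bits in from TDI; return the new cell contents and the TDO stream."""
    cells = list(cells)
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(cells[-1])   # the last cell drives TDO
        cells = [bit] + cells[:-1]   # every cell takes the value of its neighbour
    return cells, tdo_bits


# Three boundary cells holding captured pin values; shift in zeros to read them out.
captured = [1, 0, 1]
cells_after, tdo = shift_boundary_register(captured, [0, 0, 0])
print(cells_after, tdo)
```

After as many clocks as there are cells, the captured values have all appeared on TDO and the register holds the newly shifted-in pattern.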

ASIC Design Check List


Silicon Process and Library Characteristics
What exact process are you using?
How many layers can be used for this design?
Are the Cross talk Noise constraints, Xtalk Analysis configuration, Cell EM & Wire EM available?
Design Characteristics
What is the design application?
Number of cells (placeable objects)?
Is the design Verilog or VHDL?
Is the netlist flat or hierarchical?
Is there RTL available?
Is there any datapath logic using special datapath tools?
Is the DFT to be considered?
Can scan chains be reordered?
Is memory BIST, boundary scan used on this design?
Are static timing analysis constraints available in SDC format?
Clock Characteristics
How many clock domains are in the design?
What are the clock frequencies?
Is there a target clock skew, latency or other clock requirements?
Does the design have a PLL?
If so, is it used to remove clock latency?
Is there any I/O cell in the feedback path?
Is the PLL used for frequency multipliers?
Are there derived clocks or complex clock generation circuitry?
Are there any gated clocks?
If yes, do they use simple gating elements?
Is the gate clock used for timing or power?
For gated clocks, can the gating elements be sized for timing?
Are you muxing in a test clock or using a JTAG clock?
Available cells for clock tree?
Are there any special clock repeaters in the library?
Are there any EM, slew or capacitance limits on these repeaters?

How many drive strengths are available in the standard buffers and inverters?
Do any of the buffers have balanced rise and fall delays?
Are there any special requirements for clock distribution?
Will the clock tree be shielded? If so, what are the shielding requirements?
Floorplan and Package Characteristics
Target die area?
Does the area estimate include power/signal routing?
What gates/mm2 has been assumed?
Number of routing layers?
Any special power routing requirements?
Number of digital I/O pins/pads?
Number of analog signal pins/pads?
Number of power/ground pins/pads?
Total number of pins/pads and Location?
Will this chip use a wire bond package?
Will this chip use a flip-chip package?
If yes, what is the I/O bump pitch? Rows of bumps? Bump allocation? Bump pad layout guide?
Have you already done floorplanning for this design?
If yes, is conformance to the existing floorplan required?
What is the target die size?
What is the expected utilization?
Please draw the overall floorplan ?
Is there an existing floorplan available in DEF?
What are the number and type of macros (memory, PLL, etc.)?
Are there any analog blocks in the design?
What kind of packaging is used? Flipchip?
Are the I/Os periphery I/O or area I/O?
How many I/Os?
Is the design pad limited?
Power planning and Power analysis for this design?
Are layout databases available for hard macros ?
Timing analysis and correlation?
Physical verification ?
Data Input
Library information for new library
.lib for timing information
GDSII or LEF for library cells including any RAMs
RTL in Verilog/VHDL format
Number of logical blocks in the RTL
Constraints for the block in SDC
Floorplan information in DEF
I/O pin location
Macro locations

Power Gating

Power gating is effective for reducing leakage power [3]. Power gating is the technique wherein
circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of
the chip. This temporary shutdown time can also be called low power mode or inactive mode. When
circuit blocks are required for operation once again, they are switched back to active mode. These two
modes are switched at the appropriate time and in a suitable manner to maximize power
efficiency while minimizing the impact on performance. Thus the goal of power gating is to minimize
leakage power by temporarily cutting off power to selected blocks that are not required in that
mode.
Power gating affects design architecture more than clock gating does. It increases time delays,
as power gated modes have to be safely entered and exited. The possible amount of leakage power
saving in such a low power mode and the energy dissipated to enter and exit the mode introduce
some architectural trade-offs. Shutting down the blocks can be accomplished either by software or
hardware. Driver software can schedule the power down operations. Hardware timers can be
utilized. A dedicated power management controller is the other option.


An externally switched power supply is a very basic form of power gating to achieve long-term
leakage power reduction. To shut off a block for small intervals of time, internal power gating is
suitable. CMOS switches that provide power to the circuitry are controlled by power gating
controllers. Outputs of the power gated block discharge slowly. Hence the output voltage levels spend
more time near the threshold voltage level, which can lead to larger short-circuit current.
Power gating uses low-leakage PMOS transistors as header switches to shut off power supplies to
parts of a design in standby or sleep mode. NMOS footer switches can also be used as sleep
transistors. Inserting the sleep transistors splits the chip's power network into a permanent power
network connected to the power supply and a virtual power network that drives the cells and can be
turned off.
The quality of this complex power network is critical to the success of a power-gating design. Two
of the most critical parameters are the IR-drop and the penalties in silicon area and routing
resources. Power gating can be implemented using cell- or cluster-based (or fine grain) approaches
or a distributed coarse-grained approach.
Power-gating parameters
Power gating implementation involves more considerations than normal timing closure
implementation. The following parameters need to be considered and their values carefully chosen
for a successful implementation of this methodology [1] [2].
Power gate size: The power gate size must be selected to handle the amount of switching
current at any given time. The gate must be large enough that there is no measurable voltage (IR)
drop due to the gate. Generally we use 3X the switching capacitance for the gate size as a rule
of thumb. Designers can also choose between a header (PMOS) or footer (NMOS) gate. Usually
footer gates tend to be smaller in area for the same switching current. Dynamic power analysis
tools can accurately measure the switching current and also predict the size for the power gate.
Gate control slew rate: In power gating, this is an important parameter that determines the
power gating efficiency. When the slew is large (a slow transition), it takes more time to switch off and switch on the circuit, which can affect the power gating efficiency. Slew rate is controlled by
buffering the gate control signal.
Simultaneous switching capacitance: This important constraint refers to the amount of circuit
that can be switched simultaneously without affecting the power network integrity. If a large
amount of the circuit is switched simultaneously, the resulting rush current can compromise
the power network integrity. The circuit needs to be switched in stages in order to prevent this.
Power gate leakage: Since power gates are made of active transistors, leakage is an important
consideration to maximize power savings.
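As a rough illustration of the power gate size parameter, the number of parallel switch cells can be estimated from the peak switching current and the IR-drop budget across the switch. The sketch below assumes each switch cell behaves as a fixed on-resistance. The function name and all numeric values are hypothetical and do not come from any library or tool.

```python
import math

def switches_needed(peak_current_a, r_on_ohm, max_drop_v):
    """Estimate how many identical sleep transistors must be placed in parallel.

    N switches in parallel have resistance r_on_ohm / N, so the drop is
    peak_current_a * r_on_ohm / N. Return the smallest N that keeps the
    drop within max_drop_v.
    """
    if max_drop_v <= 0:
        raise ValueError("IR-drop budget must be positive")
    return math.ceil(peak_current_a * r_on_ohm / max_drop_v)

# Hypothetical block: 0.5 A peak current, 40 ohm per switch, 20 mV budget.
n = switches_needed(0.5, 40.0, 0.020)
drop = 0.5 * 40.0 / n   # resulting IR drop across the parallel switches
print(n, drop)
```

In practice the peak current would come from dynamic power analysis, as noted above, rather than from a hand estimate.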
Fine-grain power gating
Adding a sleep transistor to every cell that is to be turned off imposes a large area penalty, and
individually gating the power of every cluster of cells creates timing issues introduced by
inter-cluster voltage variation that are difficult to resolve. Fine-grain power gating encapsulates the
switching transistor as a part of the standard cell logic. Switching transistors are designed by either
the library IP vendor or the standard cell designer. Usually these cell designs conform to the normal
standard cell rules and can easily be handled by EDA tools for implementation.
The gate control is sized for the worst case, on the assumption that the circuit will
switch during every clock cycle, which results in a huge area impact. Some recent designs
implement fine-grain power gating selectively, only for the low Vt cells. If the technology
allows multiple Vt libraries, the use of low Vt devices is kept to a minimum in the design (20%), so that the
area impact can be reduced. When using power gates on the low Vt cells the output must be
isolated if the next stage is a high Vt cell. Otherwise it can cause the neighboring high Vt cell to
leak when the output goes to an unknown state due to power gating.
The gate control slew rate constraint is met by having a buffer distribution tree for the control
signals. The buffers must be chosen from a set of always-on buffers (buffers without the gate
control signal) designed with high Vt cells. The inherent difference in when one cell switches off
with respect to another minimizes the rush current during switch-on and switch-off.
Usually the gating transistor is designed as a high Vt device. Coarse-grain power gating offers
further flexibility by optimizing the power gating cells where there is low switching activity. Leakage
optimization has to be done at the coarse grain level, swapping the low leakage cell for the high
leakage one. Fine-grain power gating is an elegant methodology resulting in up to 10X leakage
reduction. This type of power reduction makes it an appealing technique if the power reduction
requirement is not satisfied by multiple Vt optimization alone.
Coarse-grain power gating
The coarse-grained approach implements grid-style sleep transistors which drive cells locally
through shared virtual power networks. This approach is less sensitive to PVT variation, introduces
less IR-drop variation, and imposes a smaller area overhead than the cell- or cluster-based
implementations. In coarse-grain power gating, the power-gating transistor is a part of the power
distribution network rather than the standard cell.
There are two ways of implementing a coarse-grain structure:
1) Ring-based
2) Column-based
Ring-based methodology: The power gates are placed around the perimeter of the module that
is being switched off, as a ring. Special corner cells are used to turn the power signals around
the corners.
Column-based methodology: The power gates are inserted within the module with the cells
abutted to each other in the form of columns. The global power is routed in the higher layers of metal,
while the switched power is in the lower layers.
Gate sizing depends on the overall switching current of the module at any given time. Since only a
fraction of circuits switch at any point of time, power gate sizes are smaller as compared to the
fine-grain switches. Dynamic power simulation using worst case vectors can determine the worst
case switching for the module and hence the size. IR drop can also be factored into the analysis.
Simultaneous switching capacitance is a major consideration in coarse-grain power gating
implementation. To limit simultaneous switching, the gate control buffers are daisy-chained, or
special counters are used to selectively turn on blocks of switches.
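The effect of staging the switches can be sketched with a simple RC model of the virtual rail: the rail is a capacitor charged through however many switches are currently on. This is a toy model under stated assumptions (ideal resistive switches, a single lumped capacitance, forward-Euler integration), and every name and value here is hypothetical.

```python
def peak_rush_current(vdd, r_on, c_virtual, n_switches, groups, stage_delay,
                      dt=1e-11, t_end=2e-7):
    """Peak current while charging the virtual rail through the switches.

    Switches are enabled in `groups` equal steps, one step every
    `stage_delay` seconds (groups=1 means all switches turn on at once).
    """
    v = 0.0                      # virtual rail voltage, starts discharged
    peak = 0.0
    per_group = n_switches / groups
    for k in range(int(t_end / dt)):
        t = k * dt
        enabled_groups = min(groups, int(t / stage_delay) + 1)
        n_on = per_group * enabled_groups
        i = n_on * (vdd - v) / r_on      # parallel switches share the current
        peak = max(peak, i)
        v += i * dt / c_virtual          # forward-Euler update of the rail
    return peak

# 100 switches of 100 ohm each charging a 1 nF virtual rail to 1.0 V.
simultaneous = peak_rush_current(1.0, 100.0, 1e-9, 100, groups=1, stage_delay=5e-9)
staged = peak_rush_current(1.0, 100.0, 1e-9, 100, groups=10, stage_delay=5e-9)
print(simultaneous, staged)
```

With these numbers the staged turn-on shows a much lower peak than switching everything at once, at the price of a longer wake-up time.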
Isolation Cells
Isolation cells are used to prevent short circuit current. As the name indicates, these cells isolate
the power gated block from the normally-on block. Isolation cells are specially designed for low short
circuit current when the input is at the threshold voltage level. Isolation control signals are provided by
the power gating controller. Isolation of the signals of a switchable module is essential to preserve
design integrity. Usually a simple OR or AND logic can function as an output isolation device.
Multiple state retention schemes are available in practice to preserve the state before a module
shuts down. The simplest technique is to scan out the register values into a memory before
shutting down a module. When the module wakes up, the values are scanned back from the
memory.
Retention Registers
When power gating is used, the system needs some form of state retention, such as scanning out
data to a RAM, then scanning it back in when the system is reawakened. For critical applications,
the memory states must be maintained within the cell, a condition that requires a retention flop to
store bits in a table. That makes it possible to restore the bits very quickly during wakeup.
Retention registers are special low leakage flip-flops used to hold the data of the main registers of the
power gated block. Thus the internal state of the block during power down mode can be retained and
loaded back when the block is reactivated. Retention registers are always powered up. The
retention strategy is design dependent. During power gating, data can be retained and
transferred back to the block when power gating is withdrawn. The power gating controller controls the
retention mechanism, such as when to save the current contents of the power gated block and
when to restore them.
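The interplay between isolation, retention and the power switch can be sketched as a small behavioural model. The ordering used here (isolate, save, power off, then power on, restore, release isolation) is a commonly used sequence and is an assumption of this sketch, not something the text above prescribes. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GatedBlock:
    powered: bool = True
    isolated: bool = False
    state: dict = field(default_factory=dict)      # main registers
    retained: dict = field(default_factory=dict)   # always-on retention storage

    def output(self, name):
        # AND-style isolation clamps the output to 0 while isolated.
        if self.isolated:
            return 0
        return self.state[name]

def power_down(block):
    block.isolated = True                 # 1. clamp outputs first
    block.retained = dict(block.state)    # 2. save state into retention registers
    block.powered = False                 # 3. open the sleep transistors
    block.state = {}                      # main registers lose their contents

def power_up(block):
    block.powered = True                  # 1. close the sleep transistors
    block.state = dict(block.retained)    # 2. restore the saved state
    block.isolated = False                # 3. release the isolation clamp

blk = GatedBlock(state={"q": 1})
power_down(blk)
print(blk.output("q"))   # clamped value seen by the always-on logic
power_up(blk)
print(blk.output("q"))   # restored register value
```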

Multiple Threshold CMOS (MTCMOS) Circuits


James T. Kao et al. [2] showed that MTCMOS logic is an effective standby leakage control technique, but
difficult to implement since sleep transistor sizing is highly dependent on the discharge pattern within
the circuit block. They showed that dual Vt domino logic avoids the sizing difficulties and inherent
performance penalties associated with MTCMOS. High Vt cells are used where leakage has to be prevented,
whereas low Vt cells are employed where speed is of concern. Both cell types are used together in the
MTCMOS technique.
MTCMOS technique [1]


In active mode of operation the high Vt transistors are turned on, and the logic gates consisting of
low Vt transistors can operate with low switching power dissipation and small propagation delay.
In standby mode the high Vt transistors are turned off, thereby cutting off the internal low Vt
circuitry.

Variable Threshold CMOS (VTCMOS)


One of the efficient methods to reduce power consumption is to use a low supply voltage and a low
threshold voltage without losing speed performance. But an increase in low threshold voltage
devices leads to increased subthreshold leakage and hence more standby power consumption.
Variable Threshold CMOS (VTCMOS) devices are one solution to this problem. In the VTCMOS technique
the threshold voltage of the low threshold devices is varied by applying a variable substrate bias voltage
from a control circuit.
VTCMOS is a very effective technique to reduce power consumption, with some
drawbacks with respect to manufacturing of these devices. VTCMOS requires either twin well or
triple well technology to achieve different substrate bias voltage levels in different parts of the IC.
The area overhead of the substrate bias control circuitry is negligible. [1]

Voltage Scaling
Reducing the power supply voltage is an effective technique to reduce dynamic power, at the cost of a
speed penalty. Keeping all other factors constant, if the supply voltage is scaled down, propagation
delay will increase. This can be compensated by scaling down the threshold voltage to the same
extent as the supply voltage. This allows the circuit to produce the same speed performance at a
lower Vdd. At the same time smaller threshold voltages lead to smaller noise margins and increased
leakage current.
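The delay trend described above can be illustrated with the alpha-power-law delay model, in which gate delay is proportional to Vdd / (Vdd - Vt)^alpha. The model, the exponent of 1.3 and the voltages used are assumptions chosen for illustration, not values from this article.

```python
def relative_delay(vdd, vt, alpha=1.3):
    """Alpha-power-law gate delay, up to a constant factor: Vdd / (Vdd - Vt)^alpha."""
    if vdd <= vt:
        raise ValueError("model only valid for Vdd above Vt")
    return vdd / (vdd - vt) ** alpha

nominal = relative_delay(1.2, 0.4)
scaled_vdd_only = relative_delay(0.9, 0.4)   # supply lowered, Vt unchanged
scaled_both = relative_delay(0.9, 0.3)       # Vt scaled by the same 0.75 factor
print(nominal, scaled_vdd_only, scaled_both)
```

In this simplified model, lowering only the supply increases delay noticeably, while scaling the threshold voltage along with it recovers most of the lost speed. The model does not show the leakage and noise margin cost of the lower threshold.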

Dynamic Voltage and Frequency Scaling (DVFS)


We know that supply voltage can be reduced if frequency of operation is reduced. Because dynamic
power depends quadratically on supply voltage and linearly on frequency, scaling both together gives
an approximately cubic reduction in power consumption. However, it should be noted that frequency
reduction slows the operation.
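The arithmetic behind the cubic figure can be checked with a few lines of code. The sketch assumes dynamic power proportional to C * Vdd^2 * f, supply voltage and frequency scaled by the same factor s, and no leakage. The function name is hypothetical.

```python
def dvfs_scaling(s):
    """Relative power, task duration and energy when Vdd and f are both scaled by s.

    Assumes dynamic power ~ C * Vdd^2 * f and ignores leakage.
    """
    power = s ** 2 * s          # Vdd^2 term times frequency term
    duration = 1.0 / s          # a fixed number of cycles takes longer
    energy = power * duration   # energy per task falls only quadratically
    return power, duration, energy

power, duration, energy = dvfs_scaling(0.5)
print(power, duration, energy)   # 0.125 2.0 0.25
```

Halving both voltage and frequency cuts power to one eighth, but the task takes twice as long, so the energy per task drops to one quarter.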
The above mentioned relation between energy and voltage is not always true. The authors in [1]
showed that the quadratic relationship between energy and Vdd deviates as Vdd is scaled down into the
subthreshold voltage level. Subthreshold leakage current increases exponentially with the supply
voltage. Since in subthreshold operation the on current takes the form of subthreshold current,
delay increases exponentially with voltage scaling. At very low voltages dynamic power reduces
quadratically. But the leakage energy increases with supply voltage reduction, since leakage energy
is linear with the circuit delay. Hence dynamic and leakage power become comparable in the
subthreshold voltage region.
According to Bo Zhai et al. [1], dynamic voltage and frequency scaling is a very popular low power
technique, but larger voltage ranges do not always improve power efficiency. They showed that for
subthreshold supply voltages, leakage energy becomes dominant, making just-in-time completion
energy inefficient. They also showed that extending the voltage range below half Vdd will improve the
energy efficiency for most processor designs, while extending this range to subthreshold operation
is beneficial only for specific applications. One of the important points to be noted from their study
is that DVFS in the subthreshold voltage range is never energy efficient.

Multi Threshold (MVT) Voltage Technique


Multiple threshold voltage techniques use both low Vt and high Vt cells: lower threshold gates are
used on the critical path, while higher threshold gates are used off the critical path. This methodology
improves performance without an increase in power. The flip side of this technique is that multi Vt
cells increase fabrication complexity. It also lengthens the design time. Improper optimization of the
design may use more low Vt cells and hence end up with increased power!
Ruchir Puri et al. [2] have discussed the design issues related to multiple supply voltages and
multiple threshold voltages in the optimization of dynamic and static power. They noted several
advantages of multi Vt optimization. Multi Vt optimization is a placement non-disturbing
transformation. The footprint and area of low Vt and high Vt cells are the same as those of nominal Vt
cells. This enables cells on timing-critical paths to be swapped for low Vt cells easily.


Frank Sill et al. [3] have proposed a new method for assignment of devices with different Vth in a
double Vth process. They developed mixed Vth gates. They showed leakage reduction of 25%. They
created a library of LVT, mixed Vt, HVT and Multi Vt. They compared simulation results with an LVT
version of each design. Leakage power dissipation decreased by 65% on average with the mixed Vth
technique compared to the LVT implementation.
Meeta Srivatsav et al. [4] have explored various ways of reducing leakage power and recommended the
multi Vt approach. They carried out their analysis using 130 nm and 90 nm technology. They
synthesized the design with different combinations of target library. The combinations were low Vt cells
only, high Vt cells only, high Vt cells with incremental compile using the low Vt library, nominal (or
regular) Vt cells, and multi Vt targeting Hvt and Lvt in one go. With only low Vt cells the highest leakage
power of 469 w was obtained. With only high Vt cells leakage power consumption was minimum
but timing was not met (-1.13 of slack). With nominal Vt a moderate leakage power value of 263 w
was obtained. The best results (54 w with timing met) were obtained for synthesis targeting the Hvt library with
incremental compile using the Lvt library.
The different low leakage synthesis flows carried out by Xiaodong Zhang [1] using Synopsys EDA
tools are listed below:
Low-Vt > Multi-Vt flow: This produces the least cell count and least dynamic power, but the
highest leakage power. It takes a very low runtime. Good for a design with very tight timing
constraints.
Multi-Vt one pass flow: It takes the longest runtime and can be used in most designs.
High-Vt > Multi-Vt flow: Produces the least leakage power consumption but has a high cell count
and dynamic power. This methodology is good for leakage power critical designs.
High-Vt > Multi-Vt with different timing constraints flow: This is a well balanced flow and
produces the second least leakage power. It has a smaller cell count, area and dynamic power and
a shorter runtime. This flow is also good for most designs.
Optimization Strategies
The tradeoffs between the different Vt cells to achieve optimal performance are especially beneficial
during synthesis technology gate mapping and placement optimization. The logic synthesis, or gate
mapping, phase of the optimization process is implemented by the synthesis tool, and placement
optimization is handled by the physical implementation tool.
Synthesis
During logic synthesis, the design is mapped to technology gates. At this point in the process
optimal logic architectures are selected, mapped to technology cells, and optimized for specific
design goals. Since a range of Vt libraries are now available and choices have to be made across
architectures with different Vt cells, logic synthesis is the ideal place to start deploying a mix of
different Vt cells into the design.
Single-Pass vs. Two-Pass Synthesis with multiple threshold libraries
Multiple libraries are currently available with different performance, area and power utilization
characteristics, and synthesis optimization can be achieved using either one or more libraries
concurrently. In a single-pass flow, multiple libraries can be loaded into synthesis tool prior to
synthesis optimization. In a two-pass flow, the design is initially optimized using one library, and
then an incremental optimization is carried out using additional libraries.
Ruchir Puri et al. [2] describe multi Vt optimization as follows: The multi-threshold optimization
algorithm implemented in physical synthesis is capable of optimizing several Vt levels at the same
time. Initially, the design is optimized using the higher threshold voltage library only. Then, the
Multi-Vt optimization computes the power-performance tradeoff curve up to the maximum
allowable leakage power limit for the next lower threshold voltage library. Subsequently, the
optimization starts from the most critical slack end of this power-performance curve and switches
the most critical gate to next equivalent lower-Vt version. This will increase the leakage in the
design beyond the maximum permissible leakage power. To compensate for this, the algorithm
picks the least critical gate from the other end of the power-performance curve and substitutes it
back with its higher-Vt version. If this does not bring the leakage power below the allowed limit, it
traverses further from the curve (from least critical towards more critical) substituting gates with
higher-Vt gates, until the leakage limit is satisfied. Then we jump back to the second most critical
cell and switch it to the lower-Vt version. This iteration continues until we can no longer switch any
gate with the lower-Vt version without violating the leakage power limit.
But Amit Agarwal et al. [5] have warned about the possibility of yield loss due to dual Vt flows. They
showed that in the nano-scale regime, conventional dual Vt design suffers from yield loss due to
process variation and vastly overestimates leakage savings since it does not take junction BTBT
(Band To Band Tunneling) leakage into account. Their analysis showed the importance of
device based analysis while designing low power schemes like dual Vt. Their research
also showed that in scaled technology, statistical information on both leakage and delay helps in
minimizing total leakage while ensuring yield with respect to target delay in dual Vt designs.
However, the nonscalability of the present way of realizing high Vt requires the use of different process
options such as metal gate work function engineering in future technologies.
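The idea of starting from an all-high-Vt design and swapping the most critical gates to low Vt under a leakage budget can be sketched as a simple greedy loop. This is a deliberately simplified illustration and not the algorithm from [2]: it does not recompute slack after each swap and does not swap gates back to high Vt. The data layout, names and numbers are hypothetical.

```python
def assign_vt(gates, leakage_limit):
    """Greedy Vt assignment under a leakage budget.

    gates: list of dicts with 'name', 'slack', 'leak_hvt', 'leak_lvt'.
    Start with every gate high-Vt, then walk from the most critical gate
    (smallest slack) and swap to low-Vt whenever the budget allows.
    """
    assignment = {g["name"]: "HVT" for g in gates}
    total = sum(g["leak_hvt"] for g in gates)
    for gate in sorted(gates, key=lambda x: x["slack"]):
        if gate["slack"] >= 0:
            break  # remaining gates already meet timing
        extra = gate["leak_lvt"] - gate["leak_hvt"]
        if total + extra <= leakage_limit:
            assignment[gate["name"]] = "LVT"
            total += extra
    return assignment, total

gates = [
    {"name": "g1", "slack": -0.3, "leak_hvt": 1, "leak_lvt": 10},
    {"name": "g2", "slack": -0.1, "leak_hvt": 1, "leak_lvt": 10},
    {"name": "g3", "slack": 0.2, "leak_hvt": 1, "leak_lvt": 10},
    {"name": "g4", "slack": -0.2, "leak_hvt": 1, "leak_lvt": 10},
]
assignment, total = assign_vt(gates, leakage_limit=25)
print(assignment, total)
```

Here the two most critical gates (g1 and g4) are swapped, while g2 stays high Vt because a third swap would exceed the budget.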

Multi Vdd (Voltage)


Dynamic power is proportional to the square of the supply voltage. Hence reducing the supply voltage
significantly improves power performance. At the same time gate delay increases as the
supply voltage decreases. A high voltage can be applied to the timing critical path while the rest of the
chip runs at a lower voltage, so overall system performance is maintained. Different blocks having
different voltage supplies can be integrated in an SoC. This increases power planning complexity in
terms of laying down the power rails and power grid structure. Level shifters are necessary to
interface between different blocks.

Multiple Voltage ASIC/SoC Design: Classification


Multi voltage design strategies can be broadly classified as follows [1]:
Static Voltage Scaling (SVS): Different but fixed voltages are applied to different blocks or
subsystems of the SoC design.
Multi-level Voltage Scaling (MVS): The block or subsystem of the ASIC or SoC design is switched
between two or more voltage levels, but only a limited number of discrete voltage levels are
supported for the different operating modes.
Dynamic Voltage and Frequency Scaling (DVFS): Voltage as well as frequency is dynamically
varied as per the different working modes of the design so as to achieve power efficiency. When
high speed of operation is required, voltage is increased to attain the higher speed, with
the penalty of increased power consumption.
Adaptive Voltage Scaling (AVS): Here voltage is controlled using a control loop. This is an
extension of DVFS.
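For MVS and DVFS, the controller effectively picks one of a small set of voltage/frequency pairs. A minimal sketch of that selection follows. The table of operating points and the function name are hypothetical and only illustrate the idea of choosing the lowest voltage that still meets the performance demand.

```python
# (supply voltage in V, clock frequency in MHz); hypothetical operating points
OPERATING_POINTS = [
    (0.8, 200),
    (1.0, 400),
    (1.2, 600),
]

def pick_operating_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-voltage point whose frequency meets the demand."""
    for vdd, mhz in sorted(points):
        if mhz >= required_mhz:
            return vdd, mhz
    raise ValueError("no operating point is fast enough")

print(pick_operating_point(150))   # light load: lowest voltage is enough
print(pick_operating_point(500))   # heavy load: highest voltage is needed
```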
Multi Voltage Design Challenges
Level Shifters
Signals crossing from one voltage domain to another voltage domain have to be interfaced through
the level shifter buffers which appropriately shift the signal levels. Design of suitable level shifter is
a challenging job.
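Finding where level shifters are needed amounts to checking every net for a driver and a load that sit in domains with different supply voltages. The following sketch shows that check on a toy netlist. The net names, domain names and voltages are hypothetical.

```python
def find_level_shifter_sites(nets, domain_voltage):
    """Return nets whose driver and load sit in domains with different supplies.

    nets: iterable of (net_name, driver_domain, load_domain)
    domain_voltage: dict mapping domain name -> supply voltage in volts
    """
    sites = []
    for name, src, dst in nets:
        v_src, v_dst = domain_voltage[src], domain_voltage[dst]
        if v_src != v_dst:
            direction = "low-to-high" if v_src < v_dst else "high-to-low"
            sites.append((name, direction))
    return sites

domains = {"cpu": 1.2, "periph": 0.9, "aon": 0.9}
nets = [("irq", "periph", "cpu"), ("bus", "cpu", "periph"), ("wake", "aon", "periph")]
sites = find_level_shifter_sites(nets, domains)
print(sites)
```

The `wake` net is not reported because both of its domains run at the same voltage in this example.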
Timing Analysis
Timing analysis of a given design is simpler with a single voltage, as it can be performed
for a single performance point based on the characterized libraries. Tools can optimize the design for
worst case PVT (Process, Voltage, Temperature) conditions. This is not the case with multi voltage
designs. Libraries should be characterized for the different voltage levels that are used in the design.
The EDA tool has to optimize individual blocks or subsystems and also multiple voltage domains. This
analysis becomes complex for a larger ASIC/SoC.
Floor planning and Power Planning
Multiple power domains demand multiple power grid structures and a suitable power distribution
among them. For a larger ASIC/SoC more careful floor planning and power planning is essential.
The speed at which different power domains switch on or off is also important. A low voltage power
domain may activate early compared to the high voltage domain. Multi voltage designs pose
additional board level complexities. Separate power supplies may be necessary to provide the different
power levels.
Multi Voltage Designs: Timing Issues
Clock


Clock Tree Synthesis (CTS) tools should be aware of different power domains and understand the
level shifters to insert them in appropriate places. Clock tree is routed through level shifters to
reach different power domains. Simultaneous timing analysis and optimization is necessary for
multiple voltage domains. Thus CTS becomes more complex in multi voltage designs.
Timing Issues with multi voltage design
Static Timing Analysis (STA)
Timing analysis for a single voltage design is easy. With static voltage scaling it becomes a
somewhat tougher job, as analysis has to be carried out for different voltages. This methodology requires
libraries which are characterized for the different voltages used. Multi-level and dynamic voltage scaling
pose a greater challenge. Constraints are specified for each supply voltage level or operating point.
There can be different operating modes for different voltages. Constraints need not be the same for all
modes and voltages. The performance target for each mode can vary. The EDA tool should be capable
of handling all these situations simultaneously to carry out timing analysis. Different constraints at
different modes and voltages have to be satisfied.
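The number of analysis scenarios grows quickly once modes, voltages and corners are combined. The sketch below simply enumerates them. The mode names, periods, voltages and corner names are hypothetical, and a real flow would attach a characterized library and a constraint set to each scenario.

```python
from itertools import product

def build_scenarios(modes, voltages, corners):
    """Enumerate every (mode, voltage, corner) combination that timing must cover.

    modes: dict mapping mode name -> required clock period in ns
    voltages: dict mapping mode name -> list of supply voltages used in that mode
    corners: list of process/temperature corner names
    """
    scenarios = []
    for mode, period in modes.items():
        for vdd, corner in product(voltages[mode], corners):
            scenarios.append({"mode": mode, "vdd": vdd,
                              "corner": corner, "period_ns": period})
    return scenarios

scenarios = build_scenarios(
    modes={"turbo": 2.0, "eco": 5.0},
    voltages={"turbo": [1.2], "eco": [0.9, 1.0]},
    corners=["slow", "fast"],
)
print(len(scenarios))
```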
Multi Voltage Designs: Power Planning Issues
Efficient power planning is one of the key concerns of modern SoC designs. In multi voltage designs
providing power to the different power domains is challenging. Every power domain requires an
independent local power supply and grid structure, and some designs may even have a separate
power pad. A separate power pad is possible in flip-chip designs, where the power pad can be placed
near the power domain. Other chips have to bring out the power pads at the periphery,
which can limit the number of power domains.
Local on-chip voltage regulation is a good way to provide multiple voltages to different circuits.
Unfortunately most digital CMOS technologies are not suitable for the implementation of
either switched-mode or linear voltage regulators. A separate power rail structure is
required for each power domain. These additional power rails introduce different levels of IR drop,
limiting the achievable power efficiency.

Low Power Design Techniques


Michael Keating et al. [1] list several low power techniques to tackle dynamic and static power
consumption in modern SoC designs. Dynamic power control techniques include clock gating, multi
voltage, variable frequency, and efficient circuits. Leakage power control techniques include power
gating and multi Vt cells. Common methods supported by EDA tools include clock gating, gate sizing,
low power placement, register clustering, low power CTS, and multi Vt optimization.
Some of the low power techniques in use today are listed in the table below.
Different Low Power Techniques [3]
Trade-offs associated with the various power management techniques [2]
The above table summarizes the trade-offs associated with different power management techniques. Power
gating and DVFS demand a large methodology change, whereas multi Vt and clock gating affect the
methodology least. Unless large leakage optimization is necessary, it is always beneficial to go with
either multi Vt or clock gating techniques. Based on the design complexity and requirements, a
combination of low power techniques can be adopted. Multi Vt optimization along with power gating is
found to be efficient in some complex designs. Advances in implementation (i.e.
fabrication) technology have allowed substrate biasing techniques to be used heavily, as they do not
pose any architectural or design verification challenges and also provide high leakage reduction.

Multiple Threshold (Multi Vt) Cell Libraries


With technologies shrinking to 90nm, 30nm and below, one of the
common ways to reduce leakage power is to use multiple Vt libraries.
Subthreshold leakage varies exponentially with Vt, compared to
the weaker dependence of delay on Vt.
Libraries are offered in different versions, each consisting of
standard Vt cells, low Vt cells and high Vt cells independent of each
other. Power and timing are optimized based on these libraries, and they
offer good flexibility and opportunity to logic and physical synthesis
tools in the optimization process.
The dual Vt synthesis flow has become quite common in 130nm and below
technology nodes. In this flow initial synthesis is carried out
targeting a primary library, which may be a low Vt, high Vt or normal
Vt library, and the second iteration of synthesis and optimization is
performed based on secondary libraries, which are also libraries
consisting of multiple threshold cells.
Which library should be used as the primary library?
This depends on the optimization target as per the design requirement.
In general, if the optimization target is power performance, first
synthesize the design using the high Vt cell library, which achieves the
lowest leakage power. In the next iteration of optimization, cells in
the critical path have to be replaced by low Vt cells, which are faster.
If the optimization target is to meet timing, then first use the low Vt
cell library to achieve timing and then optimize leakage power using
high Vt cells.
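The exponential dependence of leakage on Vt can be made concrete with the usual first-order subthreshold model, I_sub proportional to exp(-Vt / (n * kT/q)). The model and the slope factor n = 1.5 are assumptions for illustration and do not describe any particular process.

```python
import math

THERMAL_VOLTAGE = 0.02585   # kT/q in volts near 300 K

def leakage_ratio(delta_vt, n=1.5):
    """Factor by which subthreshold leakage falls when Vt rises by delta_vt volts.

    Uses I_sub ~ exp(-Vt / (n * kT/q)); n is the subthreshold slope factor.
    """
    return math.exp(delta_vt / (n * THERMAL_VOLTAGE))

# Leakage reduction for a high-Vt cell whose Vt is 100 mV above the low-Vt cell.
print(leakage_ratio(0.100))
```

Under these assumptions a 100 mV increase in Vt reduces subthreshold leakage by more than a factor of ten, which is why swapping non-critical cells to high Vt pays off.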

Low Power Techniques: Clock Gating


Clock buffers can consume more than 50% of dynamic power. Hence it is
a good design idea to turn off the clock when it is not needed. Automatic
clock gating is supported by modern EDA tools. They identify the
circuits where clock gating can be inserted.
Specific clock gating cells are required in the library to be utilized by
the synthesis tools. The availability of clock gating cells and automatic
insertion by the EDA tools make it a simple low power
technique. An advantage of this method is that clock gating does not
require modifications to the RTL description.
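A clock gating cell is commonly built from a latch and an AND gate. The behavioural model below is a sketch of that common structure, not the cell from any particular library. The class name and the stimulus are hypothetical.

```python
class ClockGate:
    """Behavioural model of a latch-based clock-gating cell.

    The enable is captured by a latch that is transparent while the clock
    is low, so the gated clock cannot glitch when enable changes while
    the clock is high.
    """
    def __init__(self):
        self.latched_enable = 0

    def evaluate(self, clk, enable):
        if clk == 0:
            self.latched_enable = enable   # latch transparent on the low phase
        return clk & self.latched_enable   # AND gate produces the gated clock

icg = ClockGate()
stimulus = [(0, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0)]   # (clk, enable) pairs
trace = [icg.evaluate(clk, en) for clk, en in stimulus]
print(trace)   # [0, 0, 0, 1, 1, 0]
```

The second and fifth samples show the point of the latch: an enable change during the high phase of the clock neither creates a partial pulse nor truncates one.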
