VLSI II:
Design of Very Large Scale Integration Circuits
227-0147-00L
Exercise 6
Power Analysis
Prof. L. Benini
F. Gürkaynak
Reminder:
With the execution of this training you declare that you understand and accept the regulations about using
CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://eda.ee.ethz.ch/index.php/Regulations.
1 What you will learn
In previous exercises, you have learned how to carry out a digital circuit design that meets given timing and
area constraints. This exercise will extend your knowledge to power considerations. More specifically, we will
show you:
• How to determine node activity figures of adequate accuracy.
• How to estimate a circuit’s power dissipation from node activities.
The CAE tools required throughout this exercise are Mentor Graphics Modelsim¹ and Cadence Innovus.
You will use the former to perform a post-layout simulation and the latter to obtain power figures of the
test circuit.
2 Introduction
2.1 Theoretical background
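As explained in Chapter 10 of the lecture notes,² four phenomena dissipate energy in CMOS circuits:
• Charging and discharging of capacitive loads (dynamic).
• Crossover currents (dynamic).
• Driving of resistive loads (static).
• Leakage currents (static).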
We will not be concerned with static power in this exercise as we limit ourselves to pure CMOS circuits with
no resistive loads and because leakage is almost negligible due to the conservative fabrication process being
studied. For the needs of EDA tools the dynamic dissipation can be attributed to library cells as follows.
Internal power P_int is the power dissipated inside a cell for the charging and discharging of internal
capacitances and due to crossover currents.
Switching power P_ext is the power dissipated inside a cell for charging and discharging the load capacitance
connected to the cell’s output. That external load consists of the input capacitances of all cells being
driven plus the parasitic capacitances of the wires (aka interconnect).
The total power dissipation P_tot related to a cell can now be expressed as
1 Mentor Graphics Modelsim is now called Mentor Graphics Questa Sim. Do not let this confuse you: the
functionality and user interface have not changed.
2 Hubert Kaeslin, “Top-Down Digital VLSI Design - From Gate-Level Circuits to CMOS Fabrication”, Sept. 2015.
3 For standard single-edge-triggered one-phase clocking, computation period and clock cycle are the same, so
f_cp = f_clk. Double-edge-triggered circuits, in contrast, offer two computation periods per clock cycle, so
that f_cp = 2 f_clk.
A power estimator essentially is a piece of software that sums up the various contributions over an entire circuit.
Provided the same clock and voltage get used everywhere, this amounts to:
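$$
P_{ckt} = \sum_{m=1}^{M} P_{int\,m} + \sum_{n=1}^{N} P_{ext\,n}
\approx f_{cp}\, U_{dd}^{2} \left( \sum_{m=1}^{M} \frac{\alpha_m}{2}\, C_{int\,m} + \sum_{n=1}^{N} \frac{\alpha_n}{2}\, C_{ext\,n} \right)
\qquad (3)
$$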
Index m = 1...M refers to the cells instantiated in the circuit and n = 1...N to the nets of interconnect running
in between. For each cell, an internal activity figure α_m is estimated from the node activities at the input(s).
Note that C_int m is not meant to correspond to any capacitance physically present in the circuit. Rather, it is
just a numerical parameter adjusted for each cell during library characterization so as to model its internal
dissipation.⁴
Equation (3) tells us a few important things about power dissipation and power estimation:
• Realistic switching activity figures are crucial; they can be obtained from gate-level simulations.
• Realistic capacitance figures are important; they are best extracted from layout data.
• Dynamic power grows with Udd squared. For more information about the power vs. speed dilemma, we
refer to the lecture notes.
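Equation (3) translates almost directly into code. The following Python sketch sums the cell and net contributions of a circuit. The function name circuit_power and all numbers in the example are hypothetical and serve only to illustrate the arithmetic; they are not taken from the exercise.

```python
def circuit_power(f_cp, u_dd, cells, nets):
    """Estimate dynamic power according to Equation (3).

    f_cp  -- computation rate in Hz
    u_dd  -- supply voltage in V
    cells -- iterable of (alpha_m, c_int_m) pairs, capacitance in F
    nets  -- iterable of (alpha_n, c_ext_n) pairs, capacitance in F
    """
    internal = sum(alpha / 2 * c for alpha, c in cells)
    external = sum(alpha / 2 * c for alpha, c in nets)
    return f_cp * u_dd ** 2 * (internal + external)


# Hypothetical example values: two cells and two nets at 200 MHz and 1.2 V.
cells = [(0.5, 10e-15), (1.0, 20e-15)]
nets = [(0.5, 30e-15), (1.0, 40e-15)]
p_ckt = circuit_power(200e6, 1.2, cells, nets)
print(f"P_ckt = {p_ckt * 1e6:.2f} uW")
```

Calling the function a second time with half the supply voltage shows the quadratic dependence on U_dd: the result drops to one quarter.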
[Figure: block diagram of the example circuit with the 4-bit inputs InputA_DI and InputB_DI, the clock
Clk_CI, and the output Output_DO.]
A signal Add_SI decides which operation result gets assigned to the output according to the following rule (in
pseudo-VHDL):
if Add_SI = '1' then
  Output_DO <= InputA_DI + InputB_DI;
else
  Output_DO <= InputA_DI * InputB_DI;
end if;
The frequency of Clk_CI is 200 MHz and the input waveforms are represented in Figure 2. They are periodic
and the two inputs (InputA_DI and InputB_DI) always carry the same value. Moreover,
assume that no glitches occur and that the supply voltage is 1.2 V.
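The selection rule can be checked numerically with a few lines of Python. This is only a behavioral sketch: the function name output_do is hypothetical, the 8-bit output width is taken from Student Task 1, and the timing of the clock Clk_CI is ignored.

```python
def output_do(input_a, input_b, add_si):
    """Return the value assigned to Output_DO for two 4-bit operands."""
    if add_si == 1:
        result = input_a + input_b
    else:
        result = input_a * input_b
    return result & 0xFF  # Output_DO is assumed to be 8 bits wide


# Both inputs always carry the same value: either 0000 or 1111.
for operand in (0b0000, 0b1111):
    for add_si in (1, 0):
        print(f"{operand:04b} Add_SI={add_si} -> {output_do(operand, operand, add_si)}")
```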
Student Task 1:
1. Output waveform: Collecting all 8 bits into one signature, draw the waveform and numeric values
of Output_DO in Figure 2.
4 Incidentally, observe that any attempt to capture the internal dissipation of a cell with a single quantity is not exactly accurate,
as the energy dissipated when one input toggles may also depend on what is happening at other inputs at the same time. And
in the case of a bistable, the current state is likely to matter too. While industrial standard cell models typically cover
all possible situations, we shall not be concerned with such details here.
[Figure 2: Input and output waveforms. Traces: Clk_CI, InputA_DI = InputB_DI (alternating between 0000
and 1111, starting with 0000), Add_SI, and Output_DO; time axis in ns with ticks at 10, 20, 30, and 40.]
2. Switching activities: Assuming single-edge-triggered one-phase clocking, complete the node
activity column in Table 2.
3. Power spent for switching of nets: You now have all the facts required to calculate the switching
powers associated with the various nets according to Equation (2). Fill in the numbers into the last
column.
4. Power dissipated within circuit blocks: Now consider Table 3. What is the main sink of power
among the blocks listed there and how much does it dissipate?
5. Consolidated dissipation: Compiling all contributions from Table 2 and Table 3, how much power
does the circuit dissipate internally, that is, with no load attached?
6. Overall dissipation: Suppose each output drives a load of 1 pF. What is the total power
consumption now?
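The bookkeeping behind these tasks can be sketched in Python. The sketch assumes the usual definition of node activity as the average number of toggles per computation period and uses the per-net term of Equation (3). The function names and the sampled waveforms are hypothetical illustrations, not the waveforms of Figure 2.

```python
def toggle_count(waveform):
    """Number of 0->1 and 1->0 transitions in a sequence of samples."""
    return sum(1 for prev, cur in zip(waveform, waveform[1:]) if prev != cur)


def node_activity(waveform, samples_per_period):
    """Average number of toggles per computation period."""
    periods = (len(waveform) - 1) / samples_per_period
    return toggle_count(waveform) / periods


def net_power(f_cp, u_dd, alpha, c_ext):
    """Switching power of one net: f_cp * U_dd^2 * alpha/2 * C_ext."""
    return f_cp * u_dd ** 2 * alpha / 2 * c_ext


# Two samples per computation period, four periods in total.
clock = [0, 1, 0, 1, 0, 1, 0, 1, 0]     # toggles twice per period
data_bit = [0, 0, 1, 1, 0, 0, 1, 1, 0]  # toggles once per period

print(node_activity(clock, 2), node_activity(data_bit, 2))
print(net_power(200e6, 1.2, node_activity(data_bit, 2), 1e-12))
```

A node that never changes yields an activity of 0 and therefore contributes no switching power, whatever its capacitance.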
Student Task 2: Plug these numbers into Equation (3) and put down the result here:
[Figure 3: Simplified block diagram of the test circuit with the signals OutSelect_SI, Mode_SI, DataIn_DI,
and DataOut_DO.]
Student Task 3:
1. Open a Unix shell window.
2. Install the test vehicle:
sh > /home/vlsi2/ex06/install.sh
The subsequent integer values in the stimuli file correspond to the input data. Next, let us give some technical
comments on the process of automated power estimation.
SDF back annotation: The SDF (Standard Delay Format) file contains the interconnect and cell delays of a
design. It can be exported from Cadence Innovus to pass these delay data to
a simulator (and/or to a static timing analyzer). This file is required for any type of post-layout simulation,
irrespective of whether you are interested in calculating power consumption or in gate-level functional
verification.
VCD back annotation: The VCD (Value Change Dump) file logs all signal changes (i.e., the “events” in VHDL
terminology) that occur during a simulation run. The information is essentially the same as in the
Mentor Graphics Modelsim wave window but in textual form. File size thus grows not only with design
complexity but also with the length of a simulation run. A VCD file is required for power analysis with
Cadence Innovus. For obvious reasons, it is always possible to extract the average activity for each
circuit node from a VCD file but not the other way round.
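To illustrate how average activities can be recovered from a VCD file, the following Python sketch counts the value changes per signal in a small hand-written dump. It is a simplified reader, not the way Cadence Innovus processes VCD files: it assumes one token group per line, ignores scopes, and skips the initial values in the $dumpvars block. The function name and the sample dump are hypothetical.

```python
from collections import Counter


def count_value_changes(vcd_text):
    """Count value changes per signal name in a simplified VCD dump."""
    names = {}           # identifier code -> signal name
    changes = Counter()
    in_body = False      # True once $enddefinitions has been seen
    in_dumpvars = False  # True while inside the initial-value block
    for line in vcd_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0]
        if first == "$var":
            # Layout: $var <type> <width> <identifier> <name> ... $end
            names[tokens[3]] = tokens[4]
        elif first == "$enddefinitions":
            in_body = True
        elif first == "$dumpvars":
            in_dumpvars = True
        elif first == "$end":
            in_dumpvars = False
        elif in_body and not in_dumpvars and not first.startswith(("#", "$")):
            # Vector change: 'b1010 <id>'; scalar change: '<value><id>'.
            ident = tokens[1] if first[0] in "bBrR" else first[1:]
            changes[names.get(ident, ident)] += 1
    return changes


SAMPLE_VCD = '''$timescale 1ns $end
$scope module tb $end
$var wire 1 ! clk $end
$var wire 1 " d $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0"
$end
#5
1!
1"
#10
0!
#15
1!
0"
#20
0!
'''

counts = count_value_changes(SAMPLE_VCD)
periods = 20 / 10  # 20 ns of simulation at a 10 ns computation period
for name, n in counts.items():
    print(name, n, n / periods)
```

Dividing the counts by the number of computation periods gives the node activities. The reverse step is not possible because the averages no longer say when each change occurred.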
As a welcome observation, we note that no parasitics exchange file (such as SPEF or RSPF) is required to
transport estimated capacitance values from the place-and-route tool to the power calculation tool, as both
functions are performed by Cadence Innovus in the current design flow.
Student Task 4:
• Start Cadence Innovus (e.g., from the cockpit).
• In the Cadence Innovus GUI, select File → Restore Design... Click the Innovus radio button
in the Data Type selection. In the Restore Design File menu, choose filterBS_chip.enc
from the save directory and click OK.
• Among the views on the top right-hand side, select the last one, the Physical View.
Global activity
Cadence Innovus lets you automatically assign a default toggle activity to all internal nodes. Throughout
the power analysis, each internal node of your chip is then assumed to toggle with this probability in each clock cycle.
Student Task 5: In order to run the analysis, select Power → Power Analysis → Run.... The Run
Power Analysis window opens as shown in Figure 4. For the time being, leave the clock frequency
at 100 MHz. Select the folder powerReports as the results directory. Then, switch to the Activity
tab and enter 0.2 as global activity, which means that every node will change its state with a probability
of 0.2 per clock cycle. This is a good initial value. At this point, you are able to start your first statistical
power analysis. Press the OK button (or Apply).
The power analysis will then start and write lines similar to the following in the Cadence Innovus shell
window:
...
Begin Power Analysis
0.00V VSSIO
0.00V VSS
1.20V VDDIO
1.20V VDD
Ended Processing Power Net/Grid for Power Calculation: (cpu=0:00:00, real=0:00:00, mem(proc
Ended Processing User Attributes: (cpu=0:00:00, real=0:00:00, mem(process/total)=1155.82MB/
Starting Levelizing
2016-Oct-18 13:51:56 (2016-Oct-18 11:51:56 GMT)
2016-Oct-18 13:51:56 (2016-Oct-18 11:51:56 GMT): 10%
...
Among the messages in the console you will find some information about the clock. Notice that the clock
frequency extracted from the Synopsys Design Constraint (SDC) file (200 MHz) does not match the frequency
specified in the GUI. The tool will use the SDC version, so the entry in the GUI will be ignored. It is important
that you always check the clock frequency on the console.
At the end of the analysis, Cadence Innovus will write a summary on the console. The result will also be
written to the filterBS_chip.rpt file in the powerReports directory. Have a look at it and try to identify
the main results of the power dissipation of your chip.
Student Task 6: How much power does the chip dissipate? What are the values that contribute most to
the total power? Talk to an assistant and discuss where most of the power is being dissipated. Calculate
the total power dissipated by these instances. Update the results table at the beginning of Section 5. Use
the last column to enter the power dissipated by the above mentioned instances.
Once we run the analysis again, this report file will be overwritten. For this exercise we would like to
preserve the file so that we can compare the results later on. Step into the encounter directory of this
exercise and copy the file or move it to a different name, for example:
sh > cd ./encounter
sh > mv powerReports/filterBS_chip.rpt \
       powerReports/filterBS_chip_gaI.rpt
As you can see, most of the power is dissipated by the drivers in the input and output pads. In order to get
a better grasp of the power that is consumed by the actual circuit (that is, the core of our chip), we want to tell
Cadence Innovus not to take the pads into account when performing power analyses.
Student Task 7:
• Select Power → Power Analysis → Setup... to open the Set Power Analysis Mode...
menu. In the Switched-off or Power-up Nets field, enter VDDIO VSSIO to exclude these
two nets from the power analysis. Press OK.
• Run an activity-based power analysis with a global activity of 0.2 again.
• Update the results table and compare the power dissipation and the dominating instances with the
values obtained earlier.
• As before, rename the generated report file:
sh > mv powerReports/filterBS_chip.rpt \
       powerReports/filterBS_chip_gaII.rpt
For the remainder of this exercise, we’ll exclude the power dissipated by the padring from the power figures
and concentrate on the core of the chip and how its power consumption can be reduced. Hence, always make
sure that the VDDIO and VSSIO nets are excluded from the power analyses.
Input activity
Setting all internal nodes to a fixed activity is a gross oversimplification. Not all gates will switch with the same
probability (e.g., a 3-input AND gate switches its output much less often than, say, a 2-input XOR gate). Instead of
assigning a default switching value to every internal node of the chip, it is also possible to define only the activity
of the input pins. Cadence Innovus is then able to propagate this activity inside the chip.
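The difference between the two gate types can be made concrete with a small probabilistic calculation. The Python sketch below is a textbook-style illustration under simplifying assumptions, namely that all gate inputs are statistically independent of each other and from one clock cycle to the next. It is not meant to describe the propagation algorithm actually used by Cadence Innovus, and the function name is hypothetical.

```python
from itertools import product


def output_toggle_probability(gate, input_one_probs):
    """Probability that a gate output differs between two clock cycles.

    gate            -- function mapping input bits to an output bit
    input_one_probs -- probability of each input being 1
    Assumes independent inputs and independent consecutive cycles.
    """
    p_one = 0.0
    for bits in product((0, 1), repeat=len(input_one_probs)):
        weight = 1.0
        for bit, p in zip(bits, input_one_probs):
            weight *= p if bit else 1.0 - p
        if gate(*bits):
            p_one += weight
    # The output changes if it is 1 in one cycle and 0 in the other.
    return 2.0 * p_one * (1.0 - p_one)


def and3(a, b, c):
    return a & b & c


def xor2(a, b):
    return a ^ b


p_and3 = output_toggle_probability(and3, [0.5, 0.5, 0.5])
p_xor2 = output_toggle_probability(xor2, [0.5, 0.5])
print(p_and3, p_xor2)
```

With equiprobable inputs, the AND output is 1 in only one of eight input combinations, which is why its toggle probability comes out at less than half that of the XOR output.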
Student Task 8:
• To execute this new power analysis, go back to the Run Power Analysis window and deselect
the global activity option in the Activity tab. Return to the Basic tab and enter the value 0.2 in the
input activity field. Set the clock frequency (dominant frequency value) in the GUI to 200 MHz
so that it matches the SDC value. Leave the flop activity and the clock gate activity fields emptyᵃ.
• Run the analysis and check the new report. What is the total power dissipation of the chip now?
Can you explain the difference with the previous value? Which of the two results is more reliable?
• Update the table you started from the last time with the current results.
• Don’t forget to rename the generated report.
a The first specifies the activity of outputs of sequential logic, while the latter specifies the average number of times that a
clock-gating cell switches in a clock cycle.
Instead of trying to estimate the switching activity (with varying levels of accuracy), we can use the Mentor
Graphics Modelsim simulator to run the complete simulation and determine the exact switching activity. We
can tell Mentor Graphics Modelsim to write out a VCD file from the post-layout netlist, which records for every
node when it has switched and to what value.
Student Task 9: Step into the modelsim directory of this exercise:
sh > cd ../modelsim
Compile the placed & routed netlist of the final design. Also compile the testbench and related files. All
these compilations can be performed by executing a single shell scriptᵃ:
sh > ./compile_gate.csh
a A good idea is to take a look at it! You should know what you are executing.
Note that Mentor Graphics Modelsim displays several warnings about missing connections for each pad
when loading the design. These warnings are due to the fact that, unlike for standard cells, the power and
ground connections of the pads are explicitly specified as input pins in the I/O library. In the current design
flow, the pads are added to the Verilog netlist generated by Synopsys Design Compiler without power and
ground connections being specified. This allows Cadence Innovus to read the netlist and properly connect
the pads to power and ground without encountering power pins misleadingly declared as signal pins. But since
the power and ground pins of the pads are also not specified in the final netlist exported by Cadence Innovus,
Mentor Graphics Modelsim displays warnings about missing pad connections when reading the netlist and
loading the I/O library. Knowing that everything is fine with the pads, these warnings can be ignored.
To view the input and output of the filter, there is a .do file that will show the relevant signals in the Wave
window. On the console you could type:
vsim > do wave.do
• At this stage we can run the gate-level simulation until the end (5,060 ns). Moreover, the simulator
needs to be flushed at the end of the simulation run to make Mentor Graphics Modelsim write
the VCD file.
vsim > run -all
vsim > vcd flush
For a real design, the simulation could take a very long time and, more importantly, could produce very large
VCD files (gigabytes!). For your own designs, consider writing the VCD files to the /scratch directory.
This simulation, however, should not take that long. As you can see from the wave window, the inputs are
rather random, and should produce a lot of activity.
Stimuli-based activity
At this point, we have a VCD file that contains the toggle activity of the nodes in the design based on a
simulation with actual stimuli. We will now give it to Cadence Innovus to perform a stimuli-based power
analysis:
Once the power analysis starts, it will write messages to the Cadence Innovus shell, which provide
you with some valuable feedback about the power simulation. There will be a message similar to the following
one:
With this vcd command, 3639044 value changes and 4.8e-06 second
simulation time were counted for power consumption calculation.
Figure 5: Run Power Analysis menu in Cadence Innovus with VCD file.
The line above summarizes how Cadence Innovus has interpreted the VCD file. It is very important to make
sure that the time (expressed in seconds) is equal to what we have simulated (and have intended). In our case,
the time should be 5,000 ns - 200 ns = 4,800 ns, which matches the above message. Make sure that you have
the correct time.
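The same sanity check can be done numerically. The short Python sketch below recomputes the recorded time window from the figures in the text and relates the reported number of value changes to the number of clock cycles, using the 200 MHz clock from the SDC file. The variable names are hypothetical.

```python
sim_end_ns = 5000.0      # end of the recorded window, from the text
record_start_ns = 200.0  # start of the recorded window, from the text
f_clk_hz = 200e6         # clock frequency from the SDC file
value_changes = 3639044  # figure reported in the message above

window_s = (sim_end_ns - record_start_ns) * 1e-9
cycles = window_s * f_clk_hz
changes_per_cycle = value_changes / cycles

print(f"window            = {window_s:.1e} s")
print(f"clock cycles      = {cycles:.0f}")
print(f"changes per cycle = {changes_per_cycle:.1f}")
```

The window of 4.8e-06 s corresponds to 960 clock cycles. If the time reported by the tool differs from the value computed this way, the VCD file was probably recorded over a different interval than intended.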
The lines above tell us what Cadence Innovus has extracted from the VCD file. It is very easy to make
mistakes and use the wrong VCD file. The second line shows the total number of switching activities, and the
third line shows what percentage of the internal nodes was annotated. If you see that the message looks
like the following:
you have a problem. Most probably, it is the wrong file or the wrong scope has been specified. It is a good idea
to have a look at the VCD file and check for case mismatches in the scope. Cadence Innovus will still perform
the analysis regardless of the success of the annotation. Since nothing was back-annotated, the results will just
be wrong.
Student Task 13:
• To do this, we apply the stimuli producing the least activity in the design: an all-zero vector. Generate
a stimuli file with an all-zero input and record a new VCD file. Try to figure out how to achieve this in
an efficient way.
• Update the estimated power in our table.
• Present the results to an assistant.
As you can see, the reduction in power consumption due to unselecting one of the filter banks is almost
negligible. Consulting the simplified block diagram in Figure 3, you should notice that there is another more
effective way to reduce the power consumption without losing functionality.
Next, we are going to disable the unused filter bank by means of clock gating. The test circuit already has the
control signals for this solution (see Section 3.3). We will use the option Clock Gating. This option will a) only
enable the low-pass filter block, and b) disable the registers inside the high-pass filter block.
Setting Mode_SI constantly to 0 disables the registers inside the unselected filter blocks. In general,
architectural changes like this cannot always be performed by changing the input stimuli (this was done in the
exercise to save time). Such architectural changes would require changes to be made to the circuit description,
re-synthesis of the circuit, and a fresh back-end design process. After that, one would have to extract the SDF
file and the netlist, use Mentor Graphics Modelsim to generate a new VCD file, import this file back into
Cadence Innovus and perform the power analysis.
Setting Mode_SI to 0 already saves quite some power if only one of the two filter banks is used. However,
some power is still wasted in that case. Looking at the source code, can you
identify how to save even more power?
•
•
•
Explain the numbers in your final table to an assistant and discuss any open questions.