Jeff Shabel
QUALCOMM, Inc.
jshabel@qualcomm.com
ABSTRACT
Clock tree power continues to be a major contributor to dynamic and, to a lesser degree, static
chip power. It is imperative for low-power designs to reduce clock tree power as much as
possible. The introduction of Power Compiler makes it possible to drastically cut dynamic clock
tree power. This paper shows how PrimeTime 2004.12 was used to obtain fairly accurate clock
tree power estimates on a 130nm chip. On this chip, Power Compiler dynamic clock gating was
applied to several blocks. This paper describes how much power was consumed by each
component of the clock tree, including wire, pin, clock buffer and register components. By
analyzing how these different components contribute to the overall clock tree power, it was
possible to find ways to improve the library to achieve lower-power designs. In addition, this
analysis shows how much of the clock tree power was consumed at the leaf of the tree. In
situations where the leaf of the tree consumes more power, Power Compiler can be used to
achieve significant savings in dynamic clock tree power. This paper explains the potential
savings achieved on a real chip by using Power Compiler and compares this information to real
silicon data. The paper also compares two different chips, one that used Power Compiler and one
that did not, and describes how power savings for two blocks were achieved with Power
Compiler.
In addition to analyzing clock tree power, this paper explains how to take advantage of the new
PrimeTime 2004.12 feature, read_parasitics_load_locations, to assist with visualizing clock
trees. The paper describes how Tcl scripts can be used to plot clock trees, critical paths, and
clock tree power relative to a given floorplan. These scripts can provide the designer with a
quick and intuitive way to analyze clock trees and to find any potential issues with them.
Table of Contents
1.0 Introduction
2.0 How is clock tree power measured now?
2.1 Static measurements
2.1.1 PrimePower
2.1.2 With real silicon
2.2 Dynamic measurements
2.2.1 PrimePower
2.2.2 With real silicon
3.0 Using PrimeTime to calculate clock tree power
3.1 Overview
3.2 Details of the PrimeTime Tcl script
4.0 Static analysis results
4.1 Ungated clock tree power
4.2 Gated clock tree power
5.0 Correlation to silicon
5.1 Setup and method
5.2 Ungated clocks
5.3 Gated clocks
5.4 Dynamic clock savings
6.0 Power plotting
6.1 Goals and setup
6.2 Sample results
7.0 Clock tree plotting
7.1 Goals and setup
7.2 Sample results
8.0 Conclusion
9.0 Acknowledgements
10.0 References
11.0 Appendix
11.1 Perl script to preprocess .lib file
11.2 Clock Analyzer Tcl script
Table of Figures
Figure 3-1 Clock tree power components
Figure 3-2 Clock buffer .lib extraction example
Figure 3-3 Register .lib extraction example
Figure 3-4 Tcl script defining .lib power numbers
Figure 6-1 Power plot example
Figure 7-1 Clock tree plotting example
1.0 Introduction
Clock tree power is a significant contributor to overall dynamic chip power. Industry-wide
averages indicate that 40 to 50% of dynamic chip power comes from the clock tree [1]. Analyzing
the components that make up clock tree power is important. The ultimate goal is to identify the
most important components of clock tree power so that designers can concentrate on those areas
to reduce clock tree power on future chips. Another goal is to evaluate the effect of using Power
Compiler's clock gating feature on a design. If most of the clock tree power is at the leaf of the
tree, then Power Compiler clock gating will have a major impact on reducing clock tree power.
The clock tree analysis described in this paper was done entirely within PrimeTime. While
PrimePower can do some of this analysis, PrimePower does not have the flexibility to analyze
different aspects of clock tree power, as can be done with simple Tcl scripts inside PrimeTime.
This paper also describes how to utilize a new PrimeTime feature,
read_parasitics_load_locations, which can be used to help view critical paths and clock trees.
2.0 How is clock tree power measured now?
2.1 Static measurements
2.1.1 PrimePower
PrimePower has the ability to calculate clock tree power and break down the components into
several categories. However, there are a few improvements needed for reporting and debugging,
which are not currently supported.
First, PrimePower requires a separate license from PrimeTime. While many companies have
PrimeTime licenses, not all companies have a PrimePower license nor do they have people
sufficiently experienced with the tool to use it effectively. Second, PrimePower does not
consider the internal register power that is consumed when only the clock pin is toggling. Third,
PrimePower does not have the flexibility to quickly analyze clock trees starting and ending at
specific points as required by the user. Fourth, PrimePower does not have the ability to extract
additional useful clock tree power statistics, which are shown later in this paper.
2.1.2 With real silicon
To measure clock tree power with real silicon, the design must provide an easy way to turn on
and off separate clock trees at their source while holding the design in some sort of reset state. If
this mechanism is provided, the power measurement simply involves measuring the current
before and after the clock is turned on. The difference between the two power measurements is
the amount of clock tree power consumed.
Because it is difficult to create an effective measurement setup with silicon, it is suggested that a
sanity check that correlates clock tree power results to some static predictions be performed.
It is nearly impossible to bound the best and worst case clock tree power numbers resulting from
Power Compiler clock gating cells. Even if a design can be held in a reset state while measuring
clock tree power, it is not known what percentage of the Power Compiler clock gating cells will
be in a gating state and what percentage will be in a non-gating state. Ideally, it is preferable to
bound the clock tree power with a maximum and minimum value depending on the gating state
of the clock gating cells. This bounding currently cannot be done using real silicon.
2.2 Dynamic measurements
2.2.1 PrimePower
PrimePower requires a SAIF file based on some simulation of a real-life scenario. The SAIF file
would need to encompass all clock trees. The other option is to create multiple SAIF files, one
per clock regime, to help isolate certain clocks. Generating SAIF files that reflect real
functionality can sometimes be difficult.
2.2.2 With real silicon
During real chip operation, it can be difficult to isolate clock tree power from combinational
logic switching power. Even if measurements can be taken, they must be correlated to some
other data (from PrimePower or other estimated analysis, for example) as a sanity check.
3.0 Using PrimeTime to calculate clock tree power
3.1 Overview
Clock tree power is calculated by taking into account the components shown in
Figure 3-1. The resulting general formula for clock tree power consumption is:
Clock Tree Power = Power(Cint_buffers) + Power(Cint_leaf_cells) + Power(Cwire) + Power(Cpin)
PrimeTime can provide the wire and pin capacitances required to calculate total power.
PrimeTime will not provide the internal switching power of the buffers and leaf cells that comes
from the clock lines toggling. This information needs to be extracted from the .lib file for the
standard cell library. The internal_power tables need to be parsed for buffers and leaf cells.
This information is then fed into PrimeTime to complete the calculation.
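As a rough illustration of the general formula, the following Python sketch adds up the four components for a toy clock tree. It assumes the textbook relations P = C x Vdd^2 x f for switched capacitance and P = E x f for internal energy per clock cycle. The function names and all numeric values are made up for illustration and are not taken from the library or from the PrimeTime script.

```python
def switched_cap_power(cap_farads, vdd_volts, freq_hz):
    """Power of a capacitance charged and discharged once per clock cycle."""
    return cap_farads * vdd_volts ** 2 * freq_hz


def internal_power(energy_per_cycle_joules, freq_hz):
    """Power from a cell's internal energy (rise + fall) per clock cycle."""
    return energy_per_cycle_joules * freq_hz


def clock_tree_power(c_wire, c_pin, e_int_buffers, e_int_leaf_cells, vdd, freq):
    """Sum of the four terms in the general clock tree power formula."""
    return (internal_power(e_int_buffers, freq)
            + internal_power(e_int_leaf_cells, freq)
            + switched_cap_power(c_wire, vdd, freq)
            + switched_cap_power(c_pin, vdd, freq))


# Hypothetical totals for one clock tree: 2 pF of wire, 1 pF of pin load,
# 1 pJ and 3 pJ of internal energy per cycle, 1.0 V supply, 100 MHz clock.
total_watts = clock_tree_power(c_wire=2e-12, c_pin=1e-12,
                               e_int_buffers=1e-12, e_int_leaf_cells=3e-12,
                               vdd=1.0, freq=100e6)
print(f"{total_watts * 1e3:.3f} mW")
```

With real data, the wire and pin capacitances would come from PrimeTime and the internal energies from the .lib file, as described above.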
The internal switching power of clock tree buffers is a function of two values: input transition
time and output load. It is assumed that the input transition times and output loads of clock tree
buffers are fairly tight (and consistent) across the clock tree. If this is the case, the
internal switching power of a clock tree buffer does not change much within the range of
acceptable transition times and output loads for most clock tree synthesis (CTS) settings.
It is therefore reasonable to choose an average power value from the
internal_power table inside the .lib file of the standard cell library. An example is shown in
Figure 3-2. The preprocessing Perl script feeds the fourth value in the third row of each table
(77 for fall_power and 78 for rise_power in this example) into PrimeTime. These values were
chosen because they are near the center of the power table.
power_lut_template (clock_buffer1_energy_template_0) {
  variable_1 : input_transition_time ;
  variable_2 : total_output_net_capacitance ;
  index_1 ( "0.1,0.25,0.3,0.45,0.5,2.0" );
  index_2 ( "0,5,25,50,90,340,2000" );
}
cell (clock_buffer1) {
  [snip]
  pin (z) {
    [snip]
    internal_power () {
      related_pin : "a" ;
      fall_power (clock_buffer1_energy_template_0) {
        values ( "77,77,78,79,79,80,80",\
                 "76,76,77,78,78,83,80",\
                 "75,75,76,77,77,79,79",\
                 "76,76,76,76,77,80,80",\
                 "78,77,77,78,78,79,81",\
                 "85,85,84,83,84,85,86" );
      }
      rise_power (clock_buffer1_energy_template_0) {
        values ( "80,80,81,81,82,81,82",\
                 "79,79,80,80,81,78,79",\
                 "77,77,78,78,78,78,79",\
                 "79,79,78,77,76,74,75",\
                 "81,82,84,86,88,71,71",\
                 "87,87,86,86,86,88,90" );
      }
    }
  }
}
Figure 3-2 Clock buffer .lib extraction example
The same principle applies to registers. Internal register power depends only on input transition
time. Because transition times are fairly sharp after CTS, it is reasonable to choose an average
(or best guess) value from the .lib file of the standard cell library. An example is shown in
Figure 3-3. The preprocessing Perl script feeds the fourth value of each table (26 for
fall_power and 25 for rise_power in this example) into PrimeTime. These values were chosen
because they are near the center of the power table.
power_lut_template (reg1_energy_template_0) {
  variable_1 : input_transition_time ;
  index_1 ( "0.1,0.2,0.3,0.4,0.8,1.5" );
}
cell (reg1) {
  [snip]
  pin (clk) {
    direction : input ;
    capacitance : 5.0;
    clock : true ;
    internal_power () {
      fall_power (reg1_energy_template_0) {
        values ( "26,26,26,26,26,27" );
      }
      rise_power (reg1_energy_template_0) {
        values ( "25,25,25,25,25,26" );
      }
    }
  }
  [snip]
}
Figure 3-3 Register .lib extraction example
At QUALCOMM, we use our own standard cell library. However, this same principle can be
applied to the TSMC standard cell library as well.
A Perl script can be used to preprocess the .lib files and write a Tcl script to read into
PrimeTime. This Tcl script defines a new user attribute for each clock buffer and register in the
standard cell library to store these values. Note that the Perl script must add both the rise and
fall power and supply the summed value to PrimeTime. The summed value represents the
consumed power during one clock cycle. A sample portion of the resulting Tcl script is shown
in Figure 3-4.
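define_user_attribute -type float -classes lib_cell total_power
set_user_attribute [get_lib_cells ...] total_power 38
set_user_attribute [get_lib_cells ...] total_power 40
set_user_attribute [get_lib_cells ...] total_power 44
set_user_attribute [get_lib_cells ...] total_power 50
set_user_attribute [get_lib_cells ...] total_power 53
set_user_attribute [get_lib_cells ...] total_power 60
Figure 3-4 Tcl script defining .lib power numbers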
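The value selection described above can be sketched in a few lines of Python. The Perl script in the Appendix hard-codes the third row and fourth column; the index arithmetic below is an assumption that generalizes that choice to "near the center" of any table, and the function names are hypothetical. The tables are the ones shown in Figure 3-2 and Figure 3-3.

```python
def center_value(table):
    """Return the entry near the center of a power table (a list of rows)."""
    row = table[(len(table) - 1) // 2]
    return row[len(row) // 2]


def per_cycle_value(fall_table, rise_table):
    """One clock cycle contains one rise and one fall, so add both values."""
    return center_value(fall_table) + center_value(rise_table)


# Clock buffer tables from Figure 3-2 (rows: transition time, columns: load).
buffer_fall = [[77, 77, 78, 79, 79, 80, 80],
               [76, 76, 77, 78, 78, 83, 80],
               [75, 75, 76, 77, 77, 79, 79],
               [76, 76, 76, 76, 77, 80, 80],
               [78, 77, 77, 78, 78, 79, 81],
               [85, 85, 84, 83, 84, 85, 86]]
buffer_rise = [[80, 80, 81, 81, 82, 81, 82],
               [79, 79, 80, 80, 81, 78, 79],
               [77, 77, 78, 78, 78, 78, 79],
               [79, 79, 78, 77, 76, 74, 75],
               [81, 82, 84, 86, 88, 71, 71],
               [87, 87, 86, 86, 86, 88, 90]]

# Register tables from Figure 3-3 (a single row indexed by transition time).
reg_fall = [[26, 26, 26, 26, 26, 27]]
reg_rise = [[25, 25, 25, 25, 25, 26]]

buffer_total = per_cycle_value(buffer_fall, buffer_rise)
reg_total = per_cycle_value(reg_fall, reg_rise)
print(buffer_total, reg_total)
```

For these two example cells the summed values are 155 for the clock buffer and 51 for the register.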
The Tcl script provides PrimeTime with the additional information necessary to calculate clock
tree current consumption. A Tcl script can be written to traverse the clock tree, computing clock
tree power as it traverses, until it reaches a leaf cell. A leaf cell is typically a register but can
also be a memory element, custom block, or a random logic gate.
While PrimeTime is traversing the tree, the Tcl script can save various statistics that can be used
after PrimeTime completes the traversal. First, the script can optionally stop at Power Compiler
clock gating cells (CGCs). If the Tcl script is run twice on the same clock, stopping once at
CGCs, and another time traversing through them, it is possible to see the maximum effect of
Power Compiler on clock tree power. In one case, the script calculates the clock tree
power assuming that the CGCs are in a gating state. In the other case, it calculates the clock
tree power assuming that the CGCs are in a non-gating state. The difference between these two
values represents the maximum dynamic current savings due to Power Compiler clock gating. In
reality, the dynamic current consumed by a clock tree will be somewhere between these two
values, depending on how often the clock-enables are active, which is completely design
dependent.
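To make the bounding argument concrete, here is a minimal Python sketch. The bounds follow directly from the two script runs. The interpolation between them is only an assumption for illustration: it pretends that every CGC shares a single enable activity and that current scales linearly with it, which the paper does not claim. All names and numbers are hypothetical.

```python
def current_bounds(ungated_current, gated_current):
    """Return (minimum, maximum, maximum savings) from the two script runs."""
    return gated_current, ungated_current, ungated_current - gated_current


def estimated_current(ungated_current, gated_current, enable_activity):
    """Interpolate between the bounds for a given fraction of enabled cycles.

    Assumes one shared enable activity for all CGCs (0.0 = always gated,
    1.0 = never gated). Real designs have a different activity per CGC.
    """
    if not 0.0 <= enable_activity <= 1.0:
        raise ValueError("enable_activity must be between 0 and 1")
    return gated_current + enable_activity * (ungated_current - gated_current)


# Hypothetical results: 10.0 mA with CGCs open, 6.0 mA with CGCs gating.
low, high, max_savings = current_bounds(10.0, 6.0)
print(low, high, max_savings)
print(estimated_current(10.0, 6.0, enable_activity=0.25))
```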
Second, the script can track how much current is consumed at the leaf of the tree. In this paper,
power consumed at the leaf of the tree is computed by summing the currents due to the final wire
and pin caps after the last buffer (or CGC), as well as the final leaf cell. If a majority of the
clock tree power comes from the leaf of the tree, Power Compiler will be extremely useful in
saving clock tree power. However, if a majority of the clock tree power comes from higher up in
the tree, Power Compiler would not be very effective in gating off clock tree power.
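Third, the script can keep track of how much current is consumed by various components of the
clock tree: wire capacitance, pin capacitance, and the internal power of clock buffers, CGCs,
registers, memories, and miscellaneous cells.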
With this information, it should be easy to see which areas of the clock tree should be evaluated
to save the most power.
3.2 Details of the PrimeTime Tcl script
The complete PrimeTime Tcl script is provided in the Appendix for reference. This section
describes how the script works. The script does the following in the order listed:
1. Prerequisite: Source the Tcl script generated from preprocessing the standard cell .lib file.
2. The user provides the start point of the clock tree.
3. Traverse the tree recursively, continuing only if the script finds a legal clock tree library
cell.
4. Optionally stop at Power Compiler clock gating cells.
5. Record all components and their power contributions along the way, and also the power
contributions of components at each level of the tree.
6. For each leaf traversed, record the power consumed:
a. Include the last wire and pin caps.
b. Include the final internal switching power of the leaf cell.
7. When traversal is complete, report final power statistics.
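The steps above can be sketched outside PrimeTime with a small Python model of a clock tree. This is not the Appendix script: the `Cell` class, its fields, and the toy currents are hypothetical, and the choice to still count a gating CGC's own internal current is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    kind: str                 # "cbuf", "cgc", or "reg" (leaf)
    internal: float           # internal current of the cell itself
    net: float = 0.0          # wire + pin current of the net this cell drives
    fanout: list = field(default_factory=list)


def trace(cell, stop_at_cgc, stats):
    """Recursively accumulate total and last-stage current."""
    stats["total"] += cell.internal
    if cell.kind == "reg":
        stats["last_stage"] += cell.internal      # leaf cell internal power
        return
    if cell.kind == "cgc" and stop_at_cgc:
        return                                    # gating state: nothing beyond
    stats["total"] += cell.net
    if any(child.kind == "reg" for child in cell.fanout):
        stats["last_stage"] += cell.net           # final net, counted once
    for child in cell.fanout:
        trace(child, stop_at_cgc, stats)


def analyze(root, stop_at_cgc):
    stats = {"total": 0.0, "last_stage": 0.0}
    trace(root, stop_at_cgc, stats)
    return stats


# Toy tree: a root buffer drives one CGC (with two registers) and one register.
cgc = Cell("cgc", internal=1.0, net=4.0,
           fanout=[Cell("reg", 5.0), Cell("reg", 5.0)])
root = Cell("cbuf", internal=2.0, net=3.0, fanout=[cgc, Cell("reg", 5.0)])

ungated = analyze(root, stop_at_cgc=False)
gated = analyze(root, stop_at_cgc=True)
print(ungated, gated, ungated["total"] - gated["total"])
```

Running the traversal twice, once through the CGCs and once stopping at them, gives the two bounds whose difference is the maximum savings.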
4.0 Static analysis results
4.1 Ungated clock tree power
Current source       Chip 1   Chip 2   Chip 3   Chip 4   Chip 5   Chip 6   Chip 7
(% of total)         130nm    130nm    130nm    130nm    130nm    90nm     90nm
Misc                 0%       0%       0%       0%       0%       2%       2%
Memory               0%       0%       1%       1%       1%       0%       0%
CGC (int)            4%       3%       5%       4%       4%       1%       1%
Pin                  10%      10%      9%       9%       9%       10%      10%
Clock Buffer (int)   15%      15%      14%      16%      16%      14%      14%
Wire                 18%      24%      25%      25%      24%      24%      23%
Register (int)       52%      48%      47%      45%      46%      49%      50%
Last Stage           68%      69%      70%      69%      70%      70%      70%
Table 4-1 Ungated clock tree power for five 130nm chips and two 90nm chips
Table 4-1 shows two very significant trends. First, note that regardless of the chip or the
technology, roughly 70% of the clock tree power comes from the last stage, that is, the last net
and leaf cell. This indicates that using Power Compiler will be very beneficial and that it should
be run on these chips.
Second, note that roughly 45 to 50% of the clock tree power is due to register power and only
15% is due to clock tree buffer power. Therefore, while it always helps to improve clock tree
buffer cell designs, improving the register design offers the larger opportunity to reduce
overall dynamic power consumption.
4.2 Gated clock tree power
The results in Table 4-2 show the potential maximum power savings for each component of the
clock tree on one 130nm chip due to using Power Compiler.
Current source    Maximum current savings per component using Power Compiler
Misc              0%
Memory            0%
CGC               11%
Pin               26%
CBUF              13%
Wire              31%
Register          33%
Total Savings     28%
Table 4-2 Maximum current savings per component using Power Compiler
As discussed earlier in this paper, the maximum current savings is the difference in current,
measured when all clock gating cells are in a non-gating state and when they are in a gating
state. This value represents the maximum potential savings due to Power Compiler. Keep in
mind that the actual savings will depend on how often each CGC is gated off and on.
Also note that Power Compiler was not run on a large portion of the chip. It was run on the
blocks with the fastest clocks but not on many others, mostly due to tool issues on (now) older
versions of Design Compiler. Only 27% of the registers in the chip were synthesized using
Power Compiler. Not all of the registers in the 27% were successfully gated using CGCs. Clock
gating was done on 66% of those registers, which equates to 18% of the total registers in the
chip.
So, even with only 18% of the registers successfully gated off, Power Compiler can save us up to
28% on our clock tree power. Again, note that the main reason for the high power savings is that
the highest-speed blocks were using this feature. The medium-to-slower speed blocks were not
able to use Power Compiler. This is an important point. Even if Power Compiler cannot be run
on all the blocks in a chip, it is imperative that Power Compiler be used on the highest-speed
blocks to maximize the potential savings.
Table 4-3 shows various gating statistics for 15 different clock trees that were synthesized using
Power Compiler.
Clock name   Maximum power   Gated           Average CGC   Median CGC
             savings (%)     registers (%)   fanout        fanout
clock1       27%             40%             32            28
clock2       33%             54%             26            18
clock3       35%             59%             31            32
clock4       42%             58%             25            32
clock5       44%             70%             27            29
clock6       45%             56%             43            34
clock7       47%             55%             35            30
clock8       48%             77%             17            14
clock9       55%             86%             20            14
clock10      57%             77%             32            30
clock11      57%             72%             53            32
clock12      62%             72%             31            32
clock13      66%             84%             43            16
clock14      67%             90%             21            26
clock15      75%             88%             31            32
Table 4-3 Gating statistics for clock trees synthesized using Power Compiler
The goal of gathering the statistics shown in Table 4-3 was to find a correlation between
maximum power savings and some other gating metric. While the power savings generally
correlates to the percentage of gated registers, the power savings do not always follow this
correlation. By evaluating the average and median CGC fanout, the expectation was to see a
definite trend between power savings and CGC fanout. This was not the case.
It is useful to note the wide range in power savings from clock regime to clock regime. Some
savings were as high as 75%, while others were as low as 27%. In general, the clocks on this
chip averaged 45 to 50% maximum power savings on their clock trees.
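One way to check these observations is to compute a correlation coefficient for each metric in Table 4-3. The Python sketch below uses the Pearson coefficient, which is a choice made here for illustration and not a method used in the paper. The data are the 15 rows of Table 4-3.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Columns of Table 4-3, clock1 through clock15.
savings = [27, 33, 35, 42, 44, 45, 47, 48, 55, 57, 57, 62, 66, 67, 75]
gated_regs = [40, 54, 59, 58, 70, 56, 55, 77, 86, 77, 72, 72, 84, 90, 88]
avg_fanout = [32, 26, 31, 25, 27, 43, 35, 17, 20, 32, 53, 31, 43, 21, 31]
median_fanout = [28, 18, 32, 32, 29, 34, 30, 14, 14, 30, 32, 32, 16, 26, 32]

r_gated = pearson(savings, gated_regs)
r_avg = pearson(savings, avg_fanout)
r_median = pearson(savings, median_fanout)
print(f"gated registers: {r_gated:.2f}")
print(f"average fanout:  {r_avg:.2f}")
print(f"median fanout:   {r_median:.2f}")
```

On this data the coefficient for gated registers is high, while both fanout coefficients are close to zero, which matches the observations above.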
5.0 Correlation to silicon
5.2 Ungated clocks
Clock name   % Difference      Silicon measurement
             PT vs. Silicon    margin of error
clock1       -1%               0%
clock2       8%                1%
clock3       -3%               0%
clock4       -12%              6%
clock5       4%                0%
clock6       3%                1%
clock7       7%                4%
clock8       9%                4%
clock9       10%               11%
clock10      11%               11%
clock11      7%                11%
clock12      18%               4%
clock13      16%               25%
clock14      4%                1%
Table 5-1 Correlation of PrimeTime predictions to silicon for ungated clocks
Note that most clocks (10 of 14) are within 10% of the value predicted by PrimeTime. For the
clocks that are outside the 10% range, the difference could be due to random logic that is not
held in reset and is toggling with the clock.
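The 10% check can be reproduced from the table with a short Python sketch. Treating the margin of error as something that can be subtracted directly from the difference is an assumption made here for illustration. The function names are hypothetical.

```python
# (clock name, % difference PT vs. silicon, silicon margin of error in %)
rows = [("clock1", -1, 0), ("clock2", 8, 1), ("clock3", -3, 0),
        ("clock4", -12, 6), ("clock5", 4, 0), ("clock6", 3, 1),
        ("clock7", 7, 4), ("clock8", 9, 4), ("clock9", 10, 11),
        ("clock10", 11, 11), ("clock11", 7, 11), ("clock12", 18, 4),
        ("clock13", 16, 25), ("clock14", 4, 1)]


def outside_limit(rows, limit=10):
    """Clocks whose PrimeTime-to-silicon difference exceeds the limit."""
    return [name for name, diff, _ in rows if abs(diff) > limit]


def outside_limit_with_margin(rows, limit=10):
    """Same check, giving each clock the benefit of its measurement margin."""
    return [name for name, diff, margin in rows if abs(diff) - margin > limit]


print(outside_limit(rows))
print(outside_limit_with_margin(rows))
```

Under that assumption, four clocks exceed 10% on the raw difference, and only clock12 remains outside once the margin of error is considered.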
5.3 Gated clocks
Table 5-2 shows the correlation to clocks that were synthesized using Power Compiler.
Clock     Silicon       PT      PT        Silicon measurement
          measurement   gated   ungated   margin of error
clock1    38            0       100       1
clock2    -2            0       100       1
clock3    -3            0       100       7
clock4    31            0       100       0
clock5    27            0       100       3
clock6    81            0       100       2
clock7    43            0       100       36
clock8    80            0       100       7
clock9    86            0       100       17
clock10   8             0       100       174
clock11   90            0       100       9
clock12   56            0       100       6
clock13   -9            0       100       10
clock14   26            0       100       0
Table 5-2 Correlation to silicon for gated clocks (normalized clock tree current)
The clock tree current measurements listed in Table 5-2 are normalized to a minimum and
maximum value of 0 and 100. Note that almost all clocks fall into the maximum/minimum range
when taking into account the measurement margin of error.
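The normalization can be sketched as follows. The linear mapping of the PrimeTime gated and ungated predictions to 0 and 100 is an assumption about how the normalization was done, and the function names are hypothetical. The range check uses the normalized values and margins listed in Table 5-2.

```python
def normalize(measured, pt_gated, pt_ungated):
    """Map a measurement so that pt_gated -> 0 and pt_ungated -> 100."""
    return 100.0 * (measured - pt_gated) / (pt_ungated - pt_gated)


def within_bounds(value, margin, low=0.0, high=100.0):
    """True if value +/- margin overlaps the [low, high] range."""
    return value + margin >= low and value - margin <= high


# (clock name, normalized silicon measurement, margin of error), Table 5-2.
rows = [("clock1", 38, 1), ("clock2", -2, 1), ("clock3", -3, 7),
        ("clock4", 31, 0), ("clock5", 27, 3), ("clock6", 81, 2),
        ("clock7", 43, 36), ("clock8", 80, 7), ("clock9", 86, 17),
        ("clock10", 8, 174), ("clock11", 90, 9), ("clock12", 56, 6),
        ("clock13", -9, 10), ("clock14", 26, 0)]

out_of_range = [name for name, value, margin in rows
                if not within_bounds(value, margin)]
print(out_of_range)

# Hypothetical raw currents: 1.5 mA measured, 1.0 mA gated, 2.0 mA ungated.
print(normalize(1.5, pt_gated=1.0, pt_ungated=2.0))
```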
5.4 Dynamic clock savings
In order to evaluate the real effect of using Power Compiler, it is necessary to run a real-life
application on silicon and measure the results for the same block on two chips, one that used
Power Compiler and one that did not. This comparison was done on two blocks using two
different chips with the same RTL, technology, tools (but not tool versions), and most library
cells. The tool versions were updated to a later version for the chip that used Power Compiler,
which might cause a slight difference in the dynamic power results when logic is on. Other than
that, the only main difference was that one chip used Power Compiler clock gating and the other
chip did not.
Block
PT maximum
predicted CT
savings
Silicon
savings
Cell count
difference
block1
38%
40%
-8.5%
block2 / mode1
44%
block2 / mode2
39%
Comments
Savings higher probably also
due to cell count decrease
+4%
31%
Table 5-3 Dynamic clock savings from Power Compiler on silicon
For both of these blocks, note the significant savings due to Power Compiler clock gating. The
power measurements were taken once after each block was set up, just before the blocks were
run. The power measurements were taken again while each block was running, doing real work.
The difference between these two numbers is what is being compared between these two chips.
This data provides proof that real dynamic clock tree power savings can be accomplished by
using Power Compiler.
6.0 Power plotting
6.1 Goals and setup
of the dynamic power on a chip, this analysis can be used as an indication of IR drop issues
caused by the clock tree.
The actual plotting is done using Gnuplot 4.0+. A Tcl script can be written to write Gnuplot
commands to generate the plots of interest. A sample Tcl script is provided in the Appendix.
6.2 Sample results
A sample power plot picture is shown in Figure 6-1.
The Tcl script creates bins for a small area of a chip. The total clock tree power consumed
within that area is summed up and normalized with the rest of the bins. Bins with higher clock
tree power are represented by orange and red colors. Bins with lower clock tree power are
indicated by green and blue colors.
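The binning step can be sketched in Python as follows. This is an illustration, not the Appendix script: the function name and sample cells are hypothetical, and normalizing against the largest bin is an assumption about what "normalized with the rest of the bins" means. Coordinates that land exactly on the die edge are clamped into the last bin.

```python
def bin_currents(cells, die_x, die_y, num_x_bins, num_y_bins):
    """Sum per-cell clock tree current into a grid of bins and normalize.

    cells is an iterable of (x, y, current). Returns (bins, normalized),
    both indexed as [x_bin][y_bin].
    """
    x_bin_size = die_x / num_x_bins
    y_bin_size = die_y / num_y_bins
    bins = [[0.0] * num_y_bins for _ in range(num_x_bins)]
    for x, y, current in cells:
        # Clamp so a cell on the far die edge falls in the last valid bin.
        i = min(int(x / x_bin_size), num_x_bins - 1)
        j = min(int(y / y_bin_size), num_y_bins - 1)
        bins[i][j] += current
    peak = max(max(column) for column in bins)
    normalized = [[(value / peak if peak else 0.0) for value in column]
                  for column in bins]
    return bins, normalized


# Hypothetical cells on a 100 x 100 die with a 20 x 20 grid.
cells = [(0.0, 0.0, 1.0), (100.0, 100.0, 3.0), (99.0, 99.0, 1.0)]
bins, normalized = bin_currents(cells, 100.0, 100.0, 20, 20)
print(bins[0][0], bins[19][19], normalized[0][0], normalized[19][19])
```

The normalized values can then be mapped to a color scale, with values near 1.0 drawn in red and values near 0.0 in blue.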
It is possible to annotate the top-level floorplan to the plot as well. This can be done by using
PrimeTime or Physical Compiler, depending on the particular design flow. If PrimeTime is used
for flat timing analysis (that is, no hardmacros, ETMs, or ILMs), Physical Compiler should be
used to extract the hardmacro boundaries.
7.0 Clock tree plotting
7.2 Sample results
Note that the real routes of the clocks are not shown. A straight connection is made using
Gnuplot to connect the buffers in the tree.
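A minimal Python sketch of how such straight connections can be written for Gnuplot is shown below. In a Gnuplot data file, a blank line between points breaks the line drawn by `with lines`, so each driver-to-load connection is written as two points followed by a blank line. The function name and coordinates are hypothetical.

```python
def tree_segments(edges):
    """Format (driver_xy, load_xy) pairs as Gnuplot line-segment data."""
    lines = []
    for (x1, y1), (x2, y2) in edges:
        lines.append(f"{x1} {y1}")
        lines.append(f"{x2} {y2}")
        lines.append("")          # blank line ends this segment
    return "\n".join(lines)


# Hypothetical placement: one root buffer driving two lower-level buffers.
edges = [((0, 0), (10, 5)), ((0, 0), (4, 12))]
data = tree_segments(edges)
print(data)
```

Writing `data` to a file and plotting that file `with lines` draws one straight segment per connection.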
8.0 Conclusion
Clock trees typically consume 40 to 50% of the dynamic power of a chip. Analysis shows that
most of that clock tree power, upwards of 70%, is at the leaf. In fact, 45 to 50% of the clock tree
power comes from the internal switching power of the registers.
The data shows that Power Compiler should, and does, help reduce clock tree power
significantly.
The data also identifies which areas of the clock tree should be considered for improvement to
get the most bang for the buck. First, it is imperative that the placement of the last buffer (or
gating cell) be optimized with respect to the leaf cells of the tree. The closer together that the
last buffer and leaf are, the greater are the power savings that can be achieved. Second,
considerable effort should be placed on improving the internal switching power of the registers.
If this improvement comes at the cost of performance, then it might be feasible to have two types
of registers, one with less dynamic power consumption but poorer performance, and one with
better performance but more dynamic power consumption. With these two register types,
synthesis tools should be able to choose the appropriate register to meet the design constraints.
If the synthesis tools cannot handle this trade-off, external scripts can be written to perform the
necessary register swaps where needed. If scripts are used, it is beneficial if the two register
types have identical footprints so that a cell swap can be performed easily without affecting
placement.
9.0 Acknowledgements
I would like to thank Iain Finlay of QUALCOMM, Inc. for guiding me along the path of clock
tree power analysis. Without both his early analysis of clock tree power for previous chips and
his guidance, this analysis would never have been done.
I would also like to thank both Elisabeth Moseley and Geoffrey Suzuki of Synopsys for their
help in researching and writing this paper.
10.0 References
[1] Chun, K. and Ling, A. Placement approach cuts SoC power needs. EE Times, 11/21/03
http://www.eetimes.com/story/OEG20031121S0035
[2] Synopsys PrimeTime User Guide, Version 2004.12, 2004.
11.0 Appendix
11.1 Perl script to preprocess .lib file
#!/bin/perl
use strict;
use warnings;

my $lib_fname    = "mylibrary.lib";
my $library_name = "yourlibraryname";
my $outfile      = "set_attribute_library.tcl";

open(my $in,  '<', $lib_fname) or die "Cannot open $lib_fname for reading\n";
open(my $out, '>', $outfile)   or die "Cannot open $outfile for writing\n";
print $out "define_user_attribute -type float -classes lib_cell total_power\n";

# Matches the fourth entry of a quoted, comma-separated row of values.
my $fourth_value = qr/"[0-9.]+,[0-9.]+,[0-9.]+,([0-9.]+)/;

# Skip $skip lines, then return the fourth value found on the next line.
sub fourth_value_after {
    my ($skip) = @_;
    for (1 .. $skip) { my $unused = <$in>; }
    my $line = <$in>;
    die "Unexpected internal_power layout near line $.\n"
        unless defined $line && $line =~ $fourth_value;
    return $1;
}

CELL: while (my $line = <$in>) {
    next CELL unless $line =~ /^\s+cell \(([^\)]+)\)/;
    my $cellname = $1;

    # Per cell type: the pin to look for, and how many lines to skip after
    # internal_power (fall table) and after the fall value (rise table).
    my ($pin_re, $skip_to_fall, $skip_to_rise);
    if ($cellname =~ /DFF/) {
        # Registers: clock pin, one row of values per table.
        ($pin_re, $skip_to_fall, $skip_to_rise) = (qr/^\s+pin \(clock/, 1, 2);
    }
    elsif ($cellname =~ /CBUF/) {
        # Clock buffers: output pin, third row of each six-row table.
        ($pin_re, $skip_to_fall, $skip_to_rise) = (qr/^\s+pin \(z/, 4, 7);
    }
    elsif ($cellname =~ /CGC/) {
        # Clock gating cells: clock pin, same table layout as the buffers.
        ($pin_re, $skip_to_fall, $skip_to_rise) = (qr/^\s+pin \(clk/, 4, 7);
    }
    else {
        next CELL;
    }

    while (my $pin_line = <$in>) {
        next unless $pin_line =~ $pin_re;
        # Make sure to use the internal_power group without a when: clause.
        # As written, the first group found after the pin is used.
        while (my $power_line = <$in>) {
            next unless $power_line =~ /internal_power/;
            my $fall_power  = fourth_value_after($skip_to_fall);
            my $rise_power  = fourth_value_after($skip_to_rise);
            my $total_power = $rise_power + $fall_power;
            print $out "set_user_attribute [get_lib_cells $library_name/$cellname] total_power $total_power\n";
            next CELL;
        }
    }
}

close($in);
close($out);
11.2 Clock Analyzer Tcl script
"set multiplot"
"set key off"
"set style line 1"
"set style line 6"
"plot [0:$_die_size_x][0:$_die_size_y] '$_hm_ref_graph_file' with
"plot [0:$_die_size_x][0:$_die_size_y] '$_plot_tree_file' with lines
#####################################################
# This is only needed if you want to do gnuplot
# power plotting
#
# Then do:
# (execute gnuplot 4.0)
# load "$_power_dir/gnuplot.script"
# plot [0:$_die_size_x][0:$_die_size_y] 'pc_hm_boxes.graph' with lines ls 6
#####################################################
#
# Specify how many bins in X and Y direction for power plotting.
# Should be a nice even number from chip dimensions.
#
set _num_x_bins 20
set _num_y_bins 20
set _x_bin_size [expr {$_die_size_x / $_num_x_bins}]
set _y_bin_size [expr {$_die_size_y / $_num_y_bins}]
#
# Set up power bins initialized to 0
#
if {$_plot_power == 1} {
    for {set i 0} {$i < $_num_x_bins} {incr i} {
        for {set j 0} {$j < $_num_y_bins} {incr j} {
            set _current_bin($i,$j) 0
        }
    }
}
proc store_current_in_bin { x_coord y_coord power } {
    global _current_bin
    global _die_size_x
    global _die_size_y
    global _num_x_bins
    global _num_y_bins
    global _x_bin_size
    global _y_bin_size
    global _max_current_in_bin
    global _min_current_in_bin
    # Bins are indexed 0 .. num_bins-1, so clamp coordinates on the die
    # edge into the last valid bin.
    set x_bin [expr {int($x_coord / $_x_bin_size)}]
    if {$x_bin >= $_num_x_bins} {
        set x_bin [expr {$_num_x_bins - 1}]
    }
    set y_bin [expr {int($y_coord / $_y_bin_size)}]
    if {$y_bin >= $_num_y_bins} {
        set y_bin [expr {$_num_y_bins - 1}]
    }
    set _current_bin($x_bin,$y_bin) [expr {$_current_bin($x_bin,$y_bin) + $power}]
}
#####################################################
# Initialize variables that should span across
# all calls to trace_clock_tree procedure. These
###################################################################
if {$found_leaf == 0} {
    set found_leaf 1
    set current [expr $tot_cap * $_vdd * $_freq / 1000000]
    # Add in net between buffer and this reg only once
    set _last_stage_power [expr $_last_stage_power + ($tot_cap * $_vdd * $_freq / 1000000)]
    set _last_stage_power_for_clock [expr $_last_stage_power_for_clock + ($tot_cap * $_vdd * $_freq / 1000000)]
}
}
}
###############################################################
# If finishing up an entire clock (done with recursion)
# Let's write out stats for this clock
###############################################################
if {$orig_mylevel == 0} {
set _total_current_due_to_pins 0
set _total_current_due_to_wires 0
set _total_current_due_to_cgcs 0
set _total_current_due_to_cbufs 0
set _total_current_due_to_misc 0
set _total_current_due_to_regs 0
set _total_current_due_to_mems 0
set _total_leafs 0
set _total_current_all 0
for {set i 1} {$i<=$_max_level} {incr i} {
###############################################################
# Add up power parts for each level to totals for clock.
###############################################################
set _total_current_due_to_pins \
[expr $_total_current_due_to_pins + $_total_pin_current_at_level($i)]
set _total_current_due_to_wires \
[expr $_total_current_due_to_wires + $_total_wire_current_at_level($i)]
set _total_current_due_to_cgcs \
[expr $_total_current_due_to_cgcs + $_total_cgc_current_at_level($i)]
set _total_current_due_to_cbufs \
[expr $_total_current_due_to_cbufs + $_total_cbuf_current_at_level($i)]
set _total_current_due_to_regs \
[expr $_total_current_due_to_regs + $_total_reg_current_at_level($i)]
set _total_current_due_to_misc \
[expr $_total_current_due_to_misc + $_total_misc_current_at_level($i)]
set _total_current_due_to_mems \
[expr $_total_current_due_to_mems + $_total_mem_current_at_level($i)]
set _total_current_all [expr $_total_current_all + $_total_current_at_level($i)]
###############################################################
# Also, add power parts to total for entire chip
###############################################################
set _top_total_current_due_to_pins \
[expr $_top_total_current_due_to_pins + $_total_pin_current_at_level($i)]
set _top_total_current_due_to_wires \
[expr $_top_total_current_due_to_wires + $_total_wire_current_at_level($i)]
set _top_total_current_due_to_cgcs \
[expr $_top_total_current_due_to_cgcs + $_total_cgc_current_at_level($i)]
set _top_total_current_due_to_cbufs \
[expr $_top_total_current_due_to_cbufs + $_total_cbuf_current_at_level($i)]
set _top_total_current_due_to_regs \
[expr $_top_total_current_due_to_regs + $_total_reg_current_at_level($i)]
set _top_total_current_due_to_misc \
[expr $_top_total_current_due_to_misc + $_total_misc_current_at_level($i)]
set _top_total_current_due_to_mems \
[expr $_top_total_current_due_to_mems + $_total_mem_current_at_level($i)]
set _top_total_current_all \
[expr $_top_total_current_all + $_total_current_at_level($i)]
if {$_rpt_summary_level == 1} {
set ma_mhz [expr $_total_current_at_level($i) / $_freq]
puts $CURRENT_LEVEL \
"$_myclock,$_myclockname,$i,$_total_pin_current_at_level($i),$_total_wire_current_at_level($i),$_total_cgc_current_at_level($i),$_total_cbuf_current_at_level($i),$_total_misc_current_at_level($i),$_total_reg_current_at_level($i),$_total_mem_current_at_level($i),$_total_current_at_level($i),$_total_current_all,$_freq,$ma_mhz,$_num_cells_at_level($i),$_total_leafs_at_level($i)"
}
}
set ma_mhz [expr $_total_current_all / $_freq]
if {$_rpt_summary == 1} {
puts $CURRENT_SUM \
"$_myclock,$_myclockname,$_total_current_due_to_pins,$_total_current_due_to_wires,$_total_current_due_to_cgcs,$_total_current_due_to_cbufs,$_total_current_due_to_misc,$_total_current_due_to_regs,$_total_current_due_to_mems,$_total_current_all,$_last_stage_power_for_clock,$_freq,$ma_mhz,$_num_regs"
}
###############################################################
# Let's reset these back to 0 so next run doesn't
# have to set them to 0 again.
###############################################################
for {set i 0} {$i<=$_max_level} {incr i} {
set _num_cells_at_level($i) 0
set _total_current_at_level($i) 0
set _total_pin_current_at_level($i) 0
set _total_wire_current_at_level($i) 0
set _total_cgc_current_at_level($i) 0
set _total_cbuf_current_at_level($i) 0
set _total_misc_current_at_level($i) 0
set _total_reg_current_at_level($i) 0
set _total_leafs_at_level($i) 0
set _total_mem_current_at_level($i) 0
}
###############################################################
# Close any open files
###############################################################
if {$_rpt_summary == 1} {
close $CURRENT_SUM
}
if {$_rpt_summary_level == 1} {
close $CURRENT_LEVEL
}
if {$_plot_tree == 1} {
close $PLOT_TREE_FILE
close $PLOT_TREE_CMD_FILE
}
}
}
###############################################################
# Put calls to clock routine here.
# trace_clock_tree <startpoint> <a clock name> 0 <frequency (MHz)>
#
# The <a clock name> can be any text you want. This shows
# up in a column in the .csv file and is supposed to be shorter
# than the startpoint for easier identification in Excel.
# The "0" is the level. We always start the recursive calls
# to trace_clock_tree with 0 indicating the start of the tree.
###############################################################
trace_clock_tree clock_block/myclock myclockname 0 50
###############################################################
# Now, let's collect the entire chip stats and
# write them out
###############################################################
set percentage [expr $_last_stage_power / $_top_total_current_all * 100]
if {$_rpt_summary == 1} {
set SUMMARY_FILE [open $_summary_dir/summary w+]
puts $SUMMARY_FILE "Total pin current: $_top_total_current_due_to_pins"
puts $SUMMARY_FILE "Total wire current: $_top_total_current_due_to_wires"
###############################################################
# For Plotting power, there's lots of stuff to write out..
###############################################################
if {$_plot_power == 1} {
set GNUPLOT_POWER_SCRIPT [open $_power_dir/gnuplot.script w+]
puts $GNUPLOT_POWER_SCRIPT "set key off"
puts $GNUPLOT_POWER_SCRIPT "set style line 1"
puts $GNUPLOT_POWER_SCRIPT "set style line 2"
puts $GNUPLOT_POWER_SCRIPT "set style line 3"
puts $GNUPLOT_POWER_SCRIPT "set style line 4"
puts $GNUPLOT_POWER_SCRIPT "set style line 5"
puts $GNUPLOT_POWER_SCRIPT "set style line 6"
puts $GNUPLOT_POWER_SCRIPT "set style line 7"
puts $GNUPLOT_POWER_SCRIPT "set style line 8"
puts $GNUPLOT_POWER_SCRIPT "set multiplot"
set _max_current_in_bin 0
set _min_current_in_bin 9999999
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
if {[expr $_current_bin($i,$j) > $_max_current_in_bin]} {
set _max_current_in_bin $_current_bin($i,$j)
}
if {[expr $_current_bin($i,$j) < $_min_current_in_bin]} {
set _min_current_in_bin $_current_bin($i,$j)
}
}
}
#
# We have only a certain # of colors to choose from for bins
# So figure out levels..
#
set num_bin_colors 6
set bin_current_width [expr ($_max_current_in_bin - $_min_current_in_bin) / $num_bin_colors]
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
# Create box coordinates to file