
Analyzing Clock Trees

Jeff Shabel

QUALCOMM, Inc.
jshabel@qualcomm.com

ABSTRACT
Clock tree power continues to be a major contributor to dynamic and, to a lesser degree, static
chip power. It is imperative for low-power designs to reduce clock tree power as much as
possible. The introduction of Power Compiler makes it possible to drastically cut dynamic clock
tree power. This paper shows how PrimeTime 2004.12 was used to obtain fairly accurate clock
tree power estimates on a 130nm chip. On this chip, Power Compiler dynamic clock gating was
applied to several blocks. This paper describes how much power was consumed by each
component of the clock tree, including wire, pin, clock buffer and register components. By
analyzing how these different components contribute to the overall clock tree power, it was
possible to find ways to improve the library to achieve lower-power designs. In addition, this
analysis shows how much of the clock tree power was consumed at the leaf of the tree. In
situations where the leaf of the tree consumes more power, Power Compiler can be used to
achieve significant savings in dynamic clock tree power. This paper explains the potential
savings achieved on a real chip by using Power Compiler and compares this information to real
silicon data. The paper also compares two different chips, one that used Power Compiler and one
that did not, and describes how power savings for two blocks were achieved with Power
Compiler.
In addition to analyzing clock tree power, this paper explains how to take advantage of the new
PrimeTime 2004.12 feature, read_parasitics_load_locations, to assist with visualizing clock
trees. The paper describes how Tcl scripts can be used to plot clock trees, critical paths, and
clock tree power relative to a given floorplan. These scripts can provide the designer with a
quick and intuitive way to analyze clock trees and to find any potential issues with them.

Table of Contents
1.0 Introduction
2.0 How is clock tree power measured now?
2.1 Static measurements
2.1.1 PrimePower
2.1.2 With real silicon
2.2 Dynamic measurements
2.2.1 PrimePower
2.2.2 With real silicon
3.0 Using PrimeTime to calculate clock tree power
3.1 Overview
3.2 Details of the PrimeTime Tcl script
4.0 Static analysis results
4.1 Ungated clock tree power
4.2 Gated clock tree power
5.0 Correlation to silicon
5.1 Setup and method
5.2 Ungated clocks
5.3 Gated clocks
5.4 Dynamic clock savings
6.0 Power plotting
6.1 Goals and setup
6.2 Sample results
7.0 Clock tree plotting
7.1 Goals and setup
7.2 Sample results
8.0 Conclusion
9.0 Acknowledgements
10.0 References
11.0 Appendix
11.1 Perl script to preprocess .lib file
11.2 Clock Analyzer Tcl script

SNUG Boston 2005


Table of Figures
Figure 3-1 Clock tree power components
Figure 3-2 Clock Buffer .lib extraction example
Figure 3-3 Register .lib extraction example
Figure 3-4 Tcl script defining .lib power numbers
Figure 6-1 Power plot example
Figure 7-1 Clock tree plotting example

1.0 Introduction
Clock tree power is a significant contributor to overall dynamic chip power. Industry-wide
averages indicate that 40 to 50% of dynamic chip power comes from the clock tree [1]. Analyzing
the components that make up clock tree power is important. The ultimate goal is to identify the
most important components of clock tree power so that designers can concentrate on those areas
to reduce clock tree power on future chips. Another goal is to evaluate the effect of using Power
Compiler's clock gating feature on a design. If most of the clock tree power is at the leaf of the
tree, then Power Compiler clock gating will have a major impact on reducing clock tree power.
The clock tree analysis described in this paper was done entirely within PrimeTime. While
PrimePower can do some of this analysis, PrimePower does not have the flexibility to analyze
different aspects of clock tree power, as can be done with simple Tcl scripts inside PrimeTime.
This paper also describes how to utilize a new PrimeTime feature,
read_parasitics_load_locations, which can be used to help view critical paths and clock trees.

2.0 How is clock tree power measured now?


Existing Synopsys tools provide various methods to measure clock tree power. This section
discusses the strengths and shortcomings of each method and the tools that are in use today.
2.1 Static measurements
2.1.1 PrimePower

PrimePower can calculate clock tree power and break it down into several categories. However,
its reporting and debugging capabilities have several shortcomings.
First, PrimePower requires a separate license from PrimeTime. While many companies have
PrimeTime licenses, not all companies have a PrimePower license, nor do they have people
sufficiently experienced with the tool to use it effectively. Second, PrimePower does not
consider the internal register power that is consumed when only the clock pin is toggling. Third,
PrimePower does not have the flexibility to quickly analyze clock trees starting and ending at
specific points chosen by the user. Fourth, PrimePower cannot extract the additional clock tree
power statistics that are shown later in this paper.
2.1.2 With real silicon

To measure clock tree power with real silicon, the design must provide an easy way to turn
separate clock trees on and off at their source while holding the design in some sort of reset
state. If this mechanism is provided, the power measurement simply involves measuring the
current before and after the clock is turned on. The difference between the two measurements is
the current consumed by the clock tree.
Because it is difficult to create an effective measurement setup with silicon, the measured clock
tree power should be sanity-checked against static predictions.

It is nearly impossible to bound the best-case and worst-case clock tree power numbers on
silicon when Power Compiler clock gating cells are present. Even if a design can be held in a
reset state while measuring clock tree power, it is not known what percentage of the Power
Compiler clock gating cells will be in a gating state and what percentage will be in a non-gating
state. Ideally, the clock tree power would be bounded by a maximum and a minimum value
depending on the gating state of the clock gating cells. This bounding currently cannot be done
using real silicon.
2.2 Dynamic measurements
2.2.1 PrimePower

PrimePower requires a SAIF file based on some simulation of a real-life scenario. The SAIF file
would need to encompass all clock trees. The other option is to create multiple SAIF files, one
per clock regime, to help isolate certain clocks. Generating SAIF files that reflect real
functionality can sometimes be difficult.
2.2.2 With real silicon

During real chip operation, it can be difficult to isolate clock tree power from combinational
logic switching power. Even if measurements can be taken, they must be correlated to some
other data (from PrimePower or other estimated analysis, for example) as a sanity check.

3.0 Using PrimeTime to calculate clock tree power


3.1 Overview

Figure 3-1 Clock tree power components

Clock tree power is calculated by taking into account the components shown in
Figure 3-1. The resulting general formula for clock tree power consumption is:
Clock Tree Power = Power(Cint_buffers) + Power(Cint_leaf_cells) + Power(Cwire) + Power(Cpin)
PrimeTime can provide the wire and pin capacitances required to calculate total power.
PrimeTime will not provide the internal switching power of the buffers and leaf cells that comes
from the clock lines toggling. This information needs to be extracted from the .lib file for the
standard cell library. The internal_power tables need to be parsed for buffers and leaf cells.
This information is then fed into PrimeTime to complete the calculation.
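The formula above can be made concrete with a small numeric sketch. The Python below is illustrative only: the function name, the unit choices (pJ per cycle for the internal terms, pF for capacitance), and the sample numbers are assumptions, not values from the chips discussed in this paper. For the wire and pin terms it uses the standard relation that a net which rises and falls once per clock cycle dissipates C*V^2 per cycle.

```python
def clock_tree_power_mw(e_int_buffers_pj, e_int_leaf_pj,
                        c_wire_pf, c_pin_pf, vdd_v, freq_mhz):
    """Return an estimate of clock tree power in mW.

    e_int_*_pj : summed rise + fall internal energy per clock cycle (pJ)
    c_*_pf     : total wire / pin capacitance on the clock nets (pF)
    """
    # pF * V^2 gives pJ per full charge/discharge cycle.
    e_cap_pj = (c_wire_pf + c_pin_pf) * vdd_v ** 2
    e_total_pj = e_int_buffers_pj + e_int_leaf_pj + e_cap_pj
    # pJ per cycle * MHz = microwatts; divide by 1000 for mW.
    return e_total_pj * freq_mhz / 1000.0


# Hypothetical numbers, chosen only to exercise the function.
example_mw = clock_tree_power_mw(150.0, 500.0, 200.0, 100.0, 1.2, 100.0)
print("%.1f mW" % example_mw)
```

Keeping the four terms separate, rather than folding them into one number, is what later allows the per-component breakdown.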
The internal switching power of a clock tree buffer is a function of two values: input transition
time and output load. It is assumed that the input transition times and output loads of clock tree
buffers are fairly tight (and consistent) across the clock tree. If this is the case, the internal
switching power of a clock tree buffer does not change much within the range of acceptable
transition times and output loads for most clock tree synthesis (CTS) settings.
Because of this, it is reasonable to choose an average power value from the
internal_power table inside the .lib file of the standard cell library. An example is shown in
Figure 3-2. Note that the values in red were chosen by the preprocessing Perl script to feed into
PrimeTime. These values were chosen because they are near the center of the power table.
power_lut_template (clock_buffer1_energy_template_0) {
  variable_1 : input_transition_time ;
  variable_2 : total_output_net_capacitance ;
  index_1 ( "0.1,0.25,0.3,0.45,0.5,2.0" );
  index_2 ( "0,5,25,50,90,340,2000" );
}
cell (clock_buffer1) {
  [snip]
  pin (z) {
    [snip]
    internal_power () {
      related_pin : "a" ;
      fall_power (clock_buffer1_energy_template_0) {
        values ( "77,77,78,79,79,80,80",\
                 "76,76,77,78,78,83,80",\
                 "75,75,76,77,77,79,79",\
                 "76,76,76,76,77,80,80",\
                 "78,77,77,78,78,79,81",\
                 "85,85,84,83,84,85,86" );
      }
      rise_power (clock_buffer1_energy_template_0) {
        values ( "80,80,81,81,82,81,82",\
                 "79,79,80,80,81,78,79",\
                 "77,77,78,78,78,78,79",\
                 "79,79,78,77,76,74,75",\
                 "81,82,84,86,88,71,71",\
                 "87,87,86,86,86,88,90" );
      }
    }
  }
}

Figure 3-2 Clock Buffer .lib extraction example
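As a rough illustration of picking a near-center table entry, the Python sketch below applies a simple rule (middle row, middle column) to the two tables of Figure 3-2 and sums the rise and fall values. The rule and the function names are assumptions made for illustration; the exact entries highlighted in the original figure cannot be recovered from this text, and the preprocessing Perl script may select them differently.

```python
# Tables copied from Figure 3-2: rows follow index_1 (input transition time),
# columns follow index_2 (output net capacitance).
FALL_POWER = [
    [77, 77, 78, 79, 79, 80, 80],
    [76, 76, 77, 78, 78, 83, 80],
    [75, 75, 76, 77, 77, 79, 79],
    [76, 76, 76, 76, 77, 80, 80],
    [78, 77, 77, 78, 78, 79, 81],
    [85, 85, 84, 83, 84, 85, 86],
]
RISE_POWER = [
    [80, 80, 81, 81, 82, 81, 82],
    [79, 79, 80, 80, 81, 78, 79],
    [77, 77, 78, 78, 78, 78, 79],
    [79, 79, 78, 77, 76, 74, 75],
    [81, 82, 84, 86, 88, 71, 71],
    [87, 87, 86, 86, 86, 88, 90],
]


def center_value(table):
    """Assumed rule: take the entry in the middle row and middle column."""
    row = table[len(table) // 2]
    return row[len(row) // 2]


def energy_per_cycle(rise_table, fall_table):
    """One clock cycle contains one rising and one falling edge."""
    return center_value(rise_table) + center_value(fall_table)


print(energy_per_cycle(RISE_POWER, FALL_POWER))
```

Because the table is nearly flat around its center, the result is not very sensitive to which neighboring entry is chosen.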

The same principle applies to registers. Internal register power depends only on input transition
time. Because transition times are fairly sharp after CTS, it is reasonable to choose an average
(or best guess) value from the .lib file of the standard cell library. An example is shown in
Figure 3-3. Note that the values in red were chosen by the preprocessing Perl script to feed into
PrimeTime. These values were chosen because they are near the center of the power table.

power_lut_template (reg1_energy_template_0) {
  variable_1 : input_transition_time ;
  index_1 ( "0.1,0.2,0.3,0.4,0.8,1.5" );
}
cell (reg1) {
  [snip]
  pin (clk) {
    direction : input ;
    capacitance : 5.0;
    clock : true ;
    internal_power () {
      fall_power (reg1_energy_template_0) {
        values ( "26,26,26,26,26,27" );
      }
      rise_power (reg1_energy_template_0) {
        values ( "25,25,25,25,25,26" );
      }
    }
  }
  [snip]
}

Figure 3-3 Register .lib extraction example

At QUALCOMM, we use our own standard cell library. However, this same principle can be
applied to the TSMC standard cell library as well.
A Perl script can be used to preprocess the .lib files and write a Tcl script to read into
PrimeTime. This Tcl script defines a new user attribute for each clock buffer and register in the
standard cell library to store these values. Note that the Perl script must add both the rise and
fall power and supply the summed value to PrimeTime. The summed value represents the
consumed power during one clock cycle. A sample portion of the resulting Tcl script is shown
in Figure 3-4.
define_user_attribute -type float -classes lib_cell total_power
set_user_attribute [get_lib_cells std_cell_library/inv_a] total_power 38
set_user_attribute [get_lib_cells std_cell_library/inv_b] total_power 40
set_user_attribute [get_lib_cells std_cell_library/inv_c] total_power 44
set_user_attribute [get_lib_cells std_cell_library/inv_d] total_power 50
set_user_attribute [get_lib_cells std_cell_library/inv_e] total_power 53
set_user_attribute [get_lib_cells std_cell_library/inv_f] total_power 60

Figure 3-4 Tcl script defining .lib power numbers
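The generation step can be sketched in a few lines. The Python below is not the Perl script from the Appendix; it only shows the shape of the output in Figure 3-4 and the summing of rise and fall values. The function name is made up, and the rise/fall split for each cell is hypothetical (only the totals 38 and 40 appear in the figure).

```python
def make_attribute_script(library, rise_fall_by_cell):
    """Build Tcl text that stores rise + fall power on each library cell.

    rise_fall_by_cell maps a cell name to a (rise, fall) pair taken from
    the chosen entries of that cell's internal_power tables.
    """
    lines = ["define_user_attribute -type float -classes lib_cell total_power"]
    for cell in sorted(rise_fall_by_cell):
        rise, fall = rise_fall_by_cell[cell]
        lines.append(
            "set_user_attribute [get_lib_cells %s/%s] total_power %g"
            % (library, cell, rise + fall)
        )
    return "\n".join(lines) + "\n"


# Hypothetical rise/fall values that sum to the totals shown in Figure 3-4.
script = make_attribute_script(
    "std_cell_library", {"inv_a": (20, 18), "inv_b": (21, 19)}
)
print(script)
```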

The Tcl script provides PrimeTime with the additional information necessary to calculate clock
tree current consumption. A Tcl script can be written to traverse the clock tree, computing clock
tree power as it traverses, until it reaches a leaf cell. A leaf cell is typically a register but can
also be a memory element, custom block, or a random logic gate.
While PrimeTime is traversing the tree, the Tcl script can save various statistics that can be used
after PrimeTime completes the traversal. First, the script can optionally stop at Power Compiler
clock gating cells (CGCs). If the Tcl script is run twice on the same clock, stopping once at
CGCs and another time traversing through them, it is possible to see the maximum effect of
Power Compiler on clock tree power. In one case, the script calculates the clock tree
power assuming that the CGCs are in a gating state. In the other case, it calculates the clock
tree power assuming that the CGCs are in a non-gating state. The difference between these two
values represents the maximum dynamic current savings due to Power Compiler clock gating. In
reality, the dynamic current consumed by a clock tree will be somewhere between these two
values, depending on how often the clock enables are active, which is completely design
dependent.
Second, the script can track how much current is consumed at the leaf of the tree. In this paper,
power consumed at the leaf of the tree is computed by summing the currents due to the final wire
and pin caps after the last buffer (or CGC), as well as the final leaf cell. If a majority of the
clock tree power comes from the leaf of the tree, Power Compiler will be extremely useful in
saving clock tree power. However, if a majority of the clock tree power comes from higher up in
the tree, Power Compiler would not be very effective in gating off clock tree power.
Third, the script can keep track of how much current is consumed by various components of the
clock tree:

• Internal register power, due solely to clock pin toggling
• Internal clock buffer switching power
• Wire capacitance
• Pin capacitance
• Clock gating cells switching power
• Internal memory power, due solely to clock pin toggling
• Power from miscellaneous non-clock buffer cells in tree

With this information, it should be easy to see which areas of the clock tree should be evaluated
to save the most power.
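The relationship between the gated and ungated runs described above can be written as a short calculation. This Python sketch is a simplification under stated assumptions: it treats all clock enables as sharing one average activity factor and interpolates linearly between the two bounds, which a real design with many independently enabled CGCs will not follow exactly. The function names and sample numbers are illustrative.

```python
def max_savings_percent(ungated, gated):
    """Upper bound on savings: every CGC gating versus no CGC gating."""
    return 100.0 * (ungated - gated) / ungated


def estimate_between_bounds(ungated, gated, enable_activity):
    """Linear estimate of clock tree current between the two bounds.

    enable_activity is the assumed fraction of cycles in which the clock
    enables are active (0.0 = always gated, 1.0 = never gated).
    """
    if not 0.0 <= enable_activity <= 1.0:
        raise ValueError("enable_activity must be between 0 and 1")
    return gated + enable_activity * (ungated - gated)


# Hypothetical normalized currents from the two script runs.
print(max_savings_percent(100.0, 72.0))
print(estimate_between_bounds(100.0, 72.0, 0.25))
```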
3.2 Details of the PrimeTime Tcl script
The complete PrimeTime Tcl script is provided in the Appendix for reference. This section
describes how the script works. The script does the following in the order listed:
1. Prerequisite: Source the Tcl script generated from preprocessing the standard cell .lib file.
2. The user provides the start point of the clock tree.
3. Traverse the tree recursively, continuing only if the script finds a legal clock tree library
cell.
4. Optionally stop at Power Compiler clock gating cells.
5. Record all components and their power contributions along the way, and also the power
contributions of components at each level of the tree.
6. For each leaf traversed, record the power consumed:
a. Include the last wire and pin caps.
b. Include the final internal switching power of the leaf cell.
7. When traversal is complete, report final power statistics.
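The steps above can be sketched on a toy data structure. The Python below is not the PrimeTime Tcl script from the Appendix: the Cell class, the bucket names, and the way wire capacitance is attributed to each sink are assumptions made so the example is self-contained. It shows the recursive traversal, the optional stop at CGCs, and the per-component, per-level, and last-stage bookkeeping.

```python
from collections import defaultdict

NON_LEAF_BUCKETS = {"buffer": "clock_buffer_int", "cgc": "cgc_int"}
LEAF_BUCKETS = {"register": "register_int", "memory": "memory_int"}


class Cell:
    """One clock tree cell in a toy model (not a PrimeTime object)."""

    def __init__(self, kind, internal, wire, pin, fanout=()):
        self.kind = kind          # "buffer", "cgc", "register", "memory", ...
        self.internal = internal  # internal switching contribution
        self.wire = wire          # assumed share of the wire cap driving it
        self.pin = pin            # contribution of its clock input pin cap
        self.fanout = list(fanout)


def traverse(cell, stats, stop_at_cgc=False, level=0):
    """Walk the tree and accumulate statistics (steps 3 to 6)."""
    is_leaf = cell.kind not in NON_LEAF_BUCKETS
    bucket = NON_LEAF_BUCKETS.get(cell.kind) or LEAF_BUCKETS.get(cell.kind, "misc")
    contribution = cell.internal + cell.wire + cell.pin
    stats[bucket] += cell.internal
    stats["wire"] += cell.wire
    stats["pin"] += cell.pin
    stats["total"] += contribution
    stats["level_%d" % level] += contribution
    if is_leaf:
        # Last stage: final wire and pin caps plus the leaf cell itself.
        stats["last_stage"] += contribution
        return
    if cell.kind == "cgc" and stop_at_cgc:
        return  # treat the CGC as gating: nothing downstream toggles
    for sink in cell.fanout:
        traverse(sink, stats, stop_at_cgc, level + 1)


def analyze(root, stop_at_cgc=False):
    stats = defaultdict(float)
    traverse(root, stats, stop_at_cgc)
    return dict(stats)


# Hypothetical tree: one buffer driving a register and a CGC with two registers.
regs = [Cell("register", 5.0, 2.0, 1.0) for _ in range(3)]
cgc = Cell("cgc", 3.0, 2.0, 1.0, fanout=regs[1:])
root = Cell("buffer", 4.0, 0.0, 1.0, fanout=[regs[0], cgc])

ungated = analyze(root)
gated = analyze(root, stop_at_cgc=True)
print("ungated total:", ungated["total"])
print("gated total:", gated["total"])
```

Running the same traversal twice, once with stop_at_cgc set, mirrors the two-run method used to bound the effect of clock gating.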

4.0 Static analysis results


4.1 Ungated clock tree power
Table 4-1 lists the entire clock tree power results for five 130nm chips and two 90nm chips. It
assumes that all Power Compiler and manually-instantiated gating cells are in a non-gating state.

Current Source       Chip 1   Chip 2   Chip 3   Chip 4   Chip 5   Chip 6   Chip 7
(% of total)         130nm    130nm    130nm    130nm    130nm    90nm     90nm
Misc                 0%       0%       0%       0%       0%       2%       2%
Memory               0%       0%       1%       1%       1%       0%       0%
CGC (int)            4%       3%       5%       4%       4%       1%       1%
Pin                  10%      10%      9%       9%       9%       10%      10%
Clock Buffer (Int)   15%      15%      14%      16%      16%      14%      14%
Wire                 18%      24%      25%      25%      24%      24%      23%
Register (Int)       52%      48%      47%      45%      46%      49%      50%
Last Stage           68%      69%      70%      69%      70%      70%      70%

Table 4-1 Ungated clock tree power for five 130nm chips and two 90nm chips

Table 4-1 shows two very significant trends. First, note that regardless of the chip or the
technology, roughly 70% of the clock tree power comes from the last stage, that is, the last net
and leaf cell. This indicates that using Power Compiler will be very beneficial and that it should
be run on these chips.
Second, note that roughly 45 to 50% of the clock tree power is due to register power and only
about 15% is due to clock tree buffer power. Therefore, while it always helps to improve clock
tree buffer cell designs, improving the register design offers the larger potential reduction in
overall dynamic power consumption.
4.2 Gated clock tree power
The results in Table 4-2 show the potential maximum power savings for each component of the
clock tree on one 130nm chip due to using Power Compiler.

Current source    Maximum current savings per component using Power Compiler
Misc              0%
Memory            0%
CGC               11%
Pin               26%
CBUF              13%
Wire              31%
Register          33%
Total Savings     28%

Table 4-2 Maximum current savings using Power Compiler

As discussed earlier in this paper, the maximum current savings is the difference in current,
measured when all clock gating cells are in a non-gating state and when they are in a gating
state. This value represents the maximum potential savings due to Power Compiler. Keep in
mind that the actual savings will depend on how often each CGC is gated off and on.
Also note that Power Compiler was not run on a large portion of the chip. It was run on the
blocks with the fastest clocks but not on many others, mostly due to tool issues on (now) older
versions of Design Compiler. Only 27% of the registers in the chip were synthesized using
Power Compiler, and not all of those registers were successfully gated using CGCs. Clock
gating was done on 66% of them, which equates to 18% of the total registers in the
chip.
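The 18% figure follows directly from the two percentages quoted above, as this short Python check shows (the variable names are illustrative):

```python
synthesized_with_power_compiler = 0.27  # share of all registers in the chip
gated_within_those = 0.66               # share of those that received a CGC

gated_of_total = synthesized_with_power_compiler * gated_within_those
print("%.0f%% of all registers are gated" % (100 * gated_of_total))
```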
So, even with only 18% of the registers successfully gated off, Power Compiler can save us up to
28% on our clock tree power. Again, note that the main reason for the high power savings is that
the highest-speed blocks were using this feature. The medium-to-slower speed blocks were not
able to use Power Compiler. This is an important point. Even if Power Compiler cannot be run
on all the blocks in a chip, it is imperative that Power Compiler be used on the highest-speed
blocks to maximize the potential savings.
Table 4-3 shows various gating statistics for 15 different clock trees that were synthesized using
Power Compiler.

Clock name   Maximum power   Gated           Average CGC   Median CGC
             savings (%)     registers (%)   fanout        fanout
clock1       27%             40%             32            28
clock2       33%             54%             26            18
clock3       35%             59%             31            32
clock4       42%             58%             25            32
clock5       44%             70%             27            29
clock6       45%             56%             43            34
clock7       47%             55%             35            30
clock8       48%             77%             17            14
clock9       55%             86%             20            14
clock10      57%             77%             32            30
clock11      57%             72%             53            32
clock12      62%             72%             31            32
clock13      66%             84%             43            16
clock14      67%             90%             21            26
clock15      75%             88%             31            32

Table 4-3 Gating statistics for several clock regimes

The goal of gathering the statistics shown in Table 4-3 was to find a correlation between
maximum power savings and some other gating metric. While the power savings generally
correlate with the percentage of gated registers, they do not always follow this
correlation. By also evaluating the average and median CGC fanout, the expectation was to see a
definite, strong trend in the power savings. This was not the case.
It is useful to note the wide range in power savings from clock regime to clock regime. Some
savings were as high as 75%, while others were as low as 27%. In general, the clocks on this
chip averaged 45 to 50% maximum power savings on their clock trees.
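One way to quantify the trends discussed above is a Pearson correlation coefficient over the rows of Table 4-3. The Python sketch below is an added illustration, not part of the original analysis; the choice of Pearson correlation and the helper name are assumptions.

```python
import math

# Rows of Table 4-3:
# (max savings %, gated registers %, average CGC fanout, median CGC fanout)
TABLE_4_3 = [
    (27, 40, 32, 28), (33, 54, 26, 18), (35, 59, 31, 32), (42, 58, 25, 32),
    (44, 70, 27, 29), (45, 56, 43, 34), (47, 55, 35, 30), (48, 77, 17, 14),
    (55, 86, 20, 14), (57, 77, 32, 30), (57, 72, 53, 32), (62, 72, 31, 32),
    (66, 84, 43, 16), (67, 90, 21, 26), (75, 88, 31, 32),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


savings, gated_regs, avg_fanout, median_fanout = zip(*TABLE_4_3)
print("savings vs gated registers: %.2f" % pearson(savings, gated_regs))
print("savings vs average fanout:  %.2f" % pearson(savings, avg_fanout))
print("savings vs median fanout:   %.2f" % pearson(savings, median_fanout))
```

With only 15 clocks, the coefficients are indicative at best and should not be read as a predictive model.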

5.0 Correlation to silicon


Static clock tree power analysis is meaningless without correlation to silicon. This section
shows how the static analysis described in this paper was correlated with clock tree power
measured on silicon.
5.1 Setup and method
Clock tree power is measured on silicon by configuring software registers to turn on and off
individual clock trees. Current measurements are made before and after the clock tree is turned
on. The difference between the two measurements is assumed to be entirely due to clock tree
power. During this measurement, software is holding all the blocks in a reset state to ensure that
most of the random logic is stable when the clocks are turned on.
The clock trees that were not synthesized using Power Compiler can be easily correlated to static
measurements. The clock trees that were synthesized using Power Compiler are more difficult to
correlate. These clock trees have clock gating cells and it is not known which gating cells are in
a non-gating state and which cells are in a gating state while the block is being held in reset. It is
completely design dependent. Therefore, static clock power predictions can only bound the
power when Power Compiler is used. Clock tree power as measured on silicon should be within
the bounds predicted by PrimeTime.
The current measurements have a margin of error due to the precision of the measuring device in
the lab. For smaller clock regimes, the margin of error can be very close to the actual measured
current value. Therefore, it is necessary to consider the measurement error when correlating
clock tree power to static measurements, as noted in the following sections.
5.2 Ungated clocks
Table 5-1 shows the correlation to clocks that were not synthesized using Power Compiler.

Clock name   % Difference     Silicon measurement
             PT vs. Silicon   margin of error
clock1       -1%              0%
clock2       8%               1%
clock3       -3%              0%
clock4       -12%             6%
clock5       4%               0%
clock6       3%               1%
clock7       7%               4%
clock8       9%               4%
clock9       10%              11%
clock10      11%              11%
clock11      7%               11%
clock12      18%              4%
clock13      16%              25%
clock14      4%               1%

Table 5-1 Ungated Clock Correlation to Silicon

Note that 10 of the 14 clocks are within 10% of the value predicted by PrimeTime. For the
clocks that are outside the 10% range, the difference could be due to random logic that is not
held in reset and is toggling with the clock.
5.3 Gated clocks
Table 5-2 shows the correlation to clocks that were synthesized using Power Compiler.

Clock      Silicon        PT      PT        Silicon measurement
           measurement    gated   ungated   margin of error
clock1     38             0       100       1
clock2     -2             0       100       1
clock3     -3             0       100       7
clock4     31             0       100       0
clock5     27             0       100       3
clock6     81             0       100       2
clock7     43             0       100       36
clock8     80             0       100       7
clock9     86             0       100       17
clock10    8              0       100       174
clock11    90             0       100       9
clock12    56             0       100       6
clock13    -9             0       100       10
clock14    26             0       100       0

Table 5-2 Gated clock correlation to silicon

The clock tree current measurements listed in Table 5-2 are normalized so that 0 is the
PrimeTime prediction with all CGCs gating and 100 is the prediction with no CGCs gating. Note
that almost all clocks fall into this minimum/maximum range
when taking into account the measurement margin of error.
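The normalization and the bounds check can be expressed compactly. In the Python sketch below, the function names are illustrative, and the margin of error is assumed to be expressed in the same normalized units as the measurement; the table in this paper does not state its units explicitly.

```python
# Rows of Table 5-2: (clock, normalized silicon measurement, margin of error)
TABLE_5_2 = [
    ("clock1", 38, 1), ("clock2", -2, 1), ("clock3", -3, 7),
    ("clock4", 31, 0), ("clock5", 27, 3), ("clock6", 81, 2),
    ("clock7", 43, 36), ("clock8", 80, 7), ("clock9", 86, 17),
    ("clock10", 8, 174), ("clock11", 90, 9), ("clock12", 56, 6),
    ("clock13", -9, 10), ("clock14", 26, 0),
]


def normalize(measured, pt_gated, pt_ungated):
    """Map a raw measurement onto the scale 0 (PT gated) to 100 (PT ungated)."""
    return 100.0 * (measured - pt_gated) / (pt_ungated - pt_gated)


def within_bounds(value, margin, low=0.0, high=100.0):
    """True if the interval value +/- margin overlaps the range [low, high]."""
    return value + margin >= low and value - margin <= high


outside = [name for name, value, margin in TABLE_5_2
           if not within_bounds(value, margin)]
print("clocks outside the predicted bounds:", outside)
```

A raw current of 7.0 with hypothetical PrimeTime bounds of 5.0 (gated) and 10.0 (ungated) would normalize to 40 on this scale.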
5.4 Dynamic clock savings
In order to evaluate the real effect of using Power Compiler, it is necessary to run a real-life
application on silicon and measure the results for the same block on two chips, one that used
Power Compiler and one that did not. This comparison was done on two blocks using two
different chips with the same RTL, technology, tools (but not tool versions), and most library
cells. The tool versions were updated to a later version for the chip that used Power Compiler,
which might cause a slight difference in the dynamic power results when logic is on. Other than
that, the only main difference was that one chip used Power Compiler clock gating and the other
chip did not.

Block            PT maximum     Silicon   Cell count   Comments
                 predicted CT   savings   difference
                 savings
block1           38%            40%       -8.5%        Savings higher probably also
                                                       due to cell count decrease
block2 / mode1   44%            39%       +4%
block2 / mode2   44%            31%       +4%
Table 5-3 Dynamic clock savings from Power Compiler on silicon

For both of these blocks, note the significant savings due to Power Compiler clock gating. The
power measurements were taken once after each block was set up, just before the blocks were
run. The power measurements were taken again while each block was running, doing real work.
The difference between these two numbers is what is being compared between these two chips.
This data provides proof that real dynamic clock tree power savings can be accomplished by
using Power Compiler.

6.0 Power plotting


6.1 Goals and setup
PrimeTime, starting in version 2004.12, features a new option for reading in the parasitics file:
read_parasitics_load_locations. This new option provides the capability to trace through the
design and create visual plots more easily than before. Previously, a user had to preprocess the
SPEF or DEF files and create attributes for PrimeTime, to give it the X,Y locations of each cell.
One application of this option is to create power plots of the chip that show clock tree power
consumption. With this capability, a user can quickly tell where the most power in the chip is
being consumed due to a clock tree. Furthermore, since this is done using a Tcl script, the user
has the flexibility to control things such as stopping (optionally) at clock gating cells. This can
be beneficial when analyzing clock tree power. Because clock tree power is roughly 40 to 50%
of the dynamic power on a chip, this analysis can be used as an indication of IR drop issues
caused by the clock tree.
The actual plotting is done using Gnuplot 4.0+. A Tcl script can be written to write Gnuplot
commands to generate the plots of interest. A sample Tcl script is provided in the Appendix.
6.2 Sample results
A sample power plot picture is shown in Figure 6-1.

Figure 6-1 Power plot example

The Tcl script creates bins for a small area of a chip. The total clock tree power consumed
within that area is summed up and normalized with the rest of the bins. Bins with higher clock
tree power are represented by orange and red colors. Bins with lower clock tree power are
indicated by green and blue colors.
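The binning step described above can be sketched as follows. This Python example is an illustration, not the Tcl script from the Appendix: the input format (x, y, power per cell), the square bin shape, and the normalization to the hottest bin are assumptions.

```python
from collections import defaultdict


def bin_power(cells, bin_size):
    """Sum clock tree power per square bin and normalize to the hottest bin.

    cells is an iterable of (x, y, power) tuples; the result maps
    (column, row) bin indices to a value between 0 and 1.
    """
    bins = defaultdict(float)
    for x, y, power in cells:
        bins[(int(x // bin_size), int(y // bin_size))] += power
    if not bins:
        return {}
    peak = max(bins.values())
    return {key: value / peak for key, value in bins.items()}


def to_plot_rows(normalized_bins):
    """Format bins as 'column row value' lines for an external plotting tool."""
    return ["%d %d %.3f" % (col, row, value)
            for (col, row), value in sorted(normalized_bins.items())]


# Hypothetical cell locations and power values.
cells = [(5.0, 5.0, 1.0), (7.0, 3.0, 1.0), (15.0, 5.0, 1.0)]
for line in to_plot_rows(bin_power(cells, 10.0)):
    print(line)
```

The bin size trades resolution against noise: smaller bins localize hot spots better but leave more bins with only a handful of cells.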
It is possible to annotate the top-level floorplan to the plot as well. This can be done by using
PrimeTime or Physical Compiler, depending on the particular design flow. If PrimeTime is used
for flat timing analysis (that is, no hardmacros, ETMs, or ILMs), Physical Compiler should be
used to extract the hardmacro boundaries.

7.0 Clock tree plotting


7.1 Goals and setup
There are other tools that can visually depict clock trees. However, using PrimeTime provides
more flexibility. With PrimeTime, any start point can be specified for a clock tree. Users can
also stop the clock tree traversal at any particular level of the tree or at clock gating cells.
It is helpful to have this flexibility in PrimeTime since many users already use this tool for final
chip analysis.
7.2 Sample results
A sample clock tree is shown in Figure 7-1.

Figure 7-1 Clock tree plotting example

Note that the real routes of the clocks are not shown. A straight connection is made using
Gnuplot to connect the buffers in the tree.
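The straight connections can be produced by writing one two-point segment per driver/sink pair. The Python sketch below is an illustration with assumed names and inputs (a parent-to-children map and a location map); it relies on the Gnuplot convention that a blank line in a data file ends one line segment when plotting with lines.

```python
def tree_segments(fanout, locations):
    """Return Gnuplot data text with one straight segment per connection.

    fanout    : dict mapping a driver name to a list of sink names
    locations : dict mapping every name to an (x, y) tuple
    """
    lines = []
    for driver in sorted(fanout):
        for sink in fanout[driver]:
            lines.append("%g %g" % locations[driver])
            lines.append("%g %g" % locations[sink])
            lines.append("")  # a blank line ends this segment
    return "\n".join(lines)


# Hypothetical buffer locations.
fanout = {"root": ["buf_a", "buf_b"]}
locations = {"root": (0, 0), "buf_a": (10, 5), "buf_b": (10, -5)}
data = tree_segments(fanout, locations)
print(data)
```

Saved to a file, the text could then be drawn with a Gnuplot command of the form plot 'tree.dat' with lines, where the file name is arbitrary.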

8.0 Conclusion
Clock trees typically consume 40 to 50% of the dynamic power of a chip. Analysis shows that
most of that clock tree power, upwards of 70%, is at the leaf. In fact, 45 to 50% of the clock tree
power comes from the internal switching power of the registers.
The data shows that Power Compiler should, and does, help reduce clock tree power
significantly.
The data also identifies which areas of the clock tree offer the greatest return on optimization
effort. First, it is imperative that the placement of the last buffer (or gating cell) be optimized
with respect to the leaf cells of the tree. The closer the last buffer is to its leaf cells, the
greater the power savings that can be achieved. Second, considerable effort should be invested in
reducing the internal switching power of the registers. If this reduction comes at the cost of
performance, then it might be feasible to have two types of registers, one with lower dynamic
power consumption but poorer performance, and one with better performance but higher dynamic
power consumption. With these two register types, synthesis tools should be able to choose the
appropriate register to meet the design constraints. If the synthesis tools cannot handle this
trade-off, external scripts can be written to perform the necessary register swaps where needed.
If scripts are used, it is beneficial for the two register types to have identical footprints so
that a cell swap can be performed easily without affecting placement.
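A register swap script of the kind suggested here could start from a selection step like the following sketch. It assumes a plain tclsh (8.5 or later); the instance names, the cell names DFF_FAST and DFF_LP, the slack values, and the 0.25 ns margin are all hypothetical. It only decides which registers are candidates for the lower-power type and does not perform the swap in any tool.

```tcl
# Hypothetical data: register instance, current library cell, worst slack (ns).
set regs {
    {u_core/r0 DFF_FAST  0.80}
    {u_core/r1 DFF_FAST  0.02}
    {u_core/r2 DFF_FAST -0.10}
}

# Return {instance new_cell} pairs for fast registers whose slack exceeds
# the margin. The _FAST -> _LP name mapping is an illustrative assumption.
proc pick_swaps {regs margin} {
    set swaps {}
    foreach reg $regs {
        lassign $reg inst ref slack
        if {$slack > $margin && [string match *_FAST $ref]} {
            lappend swaps [list $inst [string map {_FAST _LP} $ref]]
        }
    }
    return $swaps
}

foreach swap [pick_swaps $regs 0.25] {
    puts "swap [lindex $swap 0] -> [lindex $swap 1]"
}
```

Only registers with comfortable timing margin are selected, which reflects the trade-off described above: the slower, lower-power register is used only where the design constraints still allow it.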


9.0 Acknowledgements
I would like to thank Iain Finlay of QUALCOMM, Inc. for guiding me along the path of clock
tree power analysis. Without both his early analysis of clock tree power for previous chips and
his guidance, this analysis would never have been done.
I would also like to thank both Elisabeth Moseley and Geoffrey Suzuki of Synopsys for their
help in researching and writing this paper.

10.0 References
[1] Chun, K. and Ling, A. Placement approach cuts SoC power needs. EE Times, 11/21/03
http://www.eetimes.com/story/OEG20031121S0035
[2] Synopsys PrimeTime User Guide, Version 2004.12, 2004.

11.0 Appendix
11.1 Perl script to preprocess .lib file
#!/usr/bin/perl
$lib_fname = "mylibrary.lib";
$library_name = "yourlibraryname";
$outfile = "set_attribute_library.tcl";
open (INFILE,$lib_fname) || die "Cannot open $lib_fname for reading\n";
open (OUTFILE,">$outfile") || die "Cannot open $outfile for writing\n";
print OUTFILE "define_user_attribute -type float -classes lib_cell total_power\n";
# The CELL label lets the inner loops continue directly with the next cell.
CELL: while (<INFILE>)
{
    if (/^\s+cell \(([^\)]+)\)/)
    {
        $cellname = $1;
        # For registers, extract here.
        if ($cellname =~ /DFF/)
        {
            while (<INFILE>)
            {
                # Clock pin of register here.
                if (/^\s+pin \(clock/)
                {
                    #
                    # Now find internal power sections.
                    # Make sure to use the one without a when: clause.
                    # We're lucky because it's always the last one.
                    # This will take some tweaking to get right for every .lib file.
                    #
                    while (<INFILE>)
                    {
                        if (/internal_power/)
                        {
                            $_ = <INFILE>; # fall_power line
                            $_ = <INFILE>; # values line
                            /values \( "[0-9\.]+,[0-9\.]+,[0-9\.]+,([0-9\.]+)/;
                            $fall_power = $1;
                            $_ = <INFILE>; # } line
                            $_ = <INFILE>; # rise_power line
                            $_ = <INFILE>; # values line
                            /values \( "[0-9\.]+,[0-9\.]+,[0-9\.]+,([0-9\.]+)/;
                            $rise_power = $1;
                            $total_power = $rise_power + $fall_power;
                            print OUTFILE "set_user_attribute [get_lib_cells $library_name/$cellname] total_power $total_power\n";
                            next CELL;
                        }
                    }
                }
            }
        }
        # Get clock buffers, CGC cells, etc. here
        elsif (($cellname =~ /CBUF/) || ($cellname =~ /CGC/))
        {
            while (<INFILE>)
            {
                if ( ((/^\s+pin \(z/) && ($cellname =~ /CBUF/)) ||
                     ((/^\s+pin \(clk/) && ($cellname =~ /CGC/)) )
                {
                    #
                    # Now find internal power sections.
                    # Make sure to use the one without a when: clause.
                    # We're lucky because it's always the last one.
                    #
                    while (<INFILE>)
                    {
                        if (/internal_power/)
                        {
                            $_ = <INFILE>;
                            $_ = <INFILE>;
                            # pick the 3rd line.. avg of that line
                            $_ = <INFILE>; # values line 1
                            $_ = <INFILE>; # values line 2
                            $_ = <INFILE>; # values line 3
                            /\s+"[0-9\.]+,[0-9\.]+,[0-9\.]+,([0-9\.]+)/;
                            $fall_power = $1;
                            $_ = <INFILE>; # values line 4
                            $_ = <INFILE>; # values line 5
                            $_ = <INFILE>; # values line 6
                            $_ = <INFILE>; # } line
                            $_ = <INFILE>; # rise_power line
                            $_ = <INFILE>; # values line 1
                            $_ = <INFILE>; # values line 2
                            $_ = <INFILE>; # values line 3
                            /\s+"[0-9\.]+,[0-9\.]+,[0-9\.]+,([0-9\.]+)/;
                            $rise_power = $1;
                            $total_power = $rise_power + $fall_power;
                            print OUTFILE "set_user_attribute [get_lib_cells $library_name/$cellname] total_power $total_power\n";
                            next CELL;
                        }
                    }
                }
            }
        }
    }
}


11.2 Clock Analyzer Tcl script


###################################################################
# Clock Analyzer
#
# Modes:
#   1) Write out Excel .csv file for clock tree power per
#      clock regime with total current consumption per clock
#      showing where current comes from. (pin, reg, etc.)
#   2) Write out Excel .csv file for clock tree power per
#      clock regime as we did for #1 -- except show results for
#      each level of the clock tree for each clock.
#   3) Power plotting. Divide up the chip dimensions into
#      "bins" and show where the most current is being consumed
#      graphically. (red = most current, etc.)
#   4) Clock Tree Plotting
###################################################################
# Have to use PT 2004.12 with the command:
# set read_parasitics_load_locations true
# set before you read in the parasitics..
###################################################################
###################################################################
# Set up modes below.
# Mode 1 = _rpt_summary
# Mode 2 = _rpt_summary_level
# Mode 3 = _plot_power
# Mode 4 = _plot_tree
#
# For any of the above modes, you can have the script stop
# at power-compiler-inserted CGC cells, assuming they are gating
# the clock. Set the variable "_stop_at_pc_cgc" to 1 for this.
###################################################################
set _rpt_summary 0
set _rpt_summary_level 0
set _plot_power 0
set _plot_tree 0
set _stop_at_pc_cgc 0
###################################################################
# For the three modes, give report directory names here where
# to write files.
###################################################################
# For mode 1
set _summary_dir "summary_results"
if {$_rpt_summary == 1} {
if { [ file exist $_summary_dir ] == 0 } {
file mkdir $_summary_dir
}
}
# For mode 2
set _summary_level_dir "summary_level_results"
if {$_rpt_summary_level == 1} {
if { [ file exist $_summary_level_dir ] == 0 } {
file mkdir $_summary_level_dir
}
}
# For mode 3
set _power_dir "power_results"
if {$_plot_power == 1} {
if { [ file exist $_power_dir ] == 0 } {
file mkdir $_power_dir
}
}
# For mode 4
set _plot_tree_file "test.gnuplot"
set _plot_tree_cmd_file "test.cmd.gnuplot"
if {$_plot_tree == 1} {
set PLOT_TREE_FILE [open $_plot_tree_file w+]
set PLOT_TREE_CMD_FILE [open $_plot_tree_cmd_file w+]
}
###################################################################
# Power Setup info.
#   1) VDD (in Volts)
###################################################################
set _vdd 2.0
###################################################################
# This filter sets up all the valid cells on the clock tree
# or sitting at the leaf of a tree. This should include
# any clock tree buffers, inverters that are allowed, delay cells,
# gating cells, memories, regs, etc.
###################################################################
set _filter_valid_ct_cells "@ref_name =~ *BUF* ||
ref_name =~ *DFF* ||
ref_name =~ *CGC* ||
ref_name =~ *RAM*"
###################################################################
# Need to know valid leaf cell names so know when to stop
# tracing through cells.
###################################################################
set _filter_valid_leaf_cells "@ref_name =~ *DFF* ||
ref_name =~ *RAM*"
###################################################################
# Need to know CGC library cell name that Power Compiler will
# use to insert clock gating cells.
###################################################################
set _filter_cgc_name "*CGC*"
###################################################################
# Need to know valid register library cell names.
###################################################################
set _filter_reg_name "*DFF*"
###################################################################
# Need to know valid clock buffer library cell names.
###################################################################
set _filter_cbuf_name "*CBUF*"
###################################################################
# Need to know valid memory library cell names.
###################################################################
set _filter_mem_name "*RAM*"
###################################################################
# Need to know valid misc library cell names that could show up.
###################################################################
set _filter_misc_name "*INV*"
###################################################################
# Write out header lines to .csv files if required to do so.
###################################################################
if {$_rpt_summary == 1} {
set CURRENT_SUM [open $_summary_dir/current_sum.csv w+]
puts $CURRENT_SUM "Clock,Simple Clock Name,Pin,Wire,CGC,CBUF,Misc,Reg,Mem,Total,Last Stage Power,Freq,mA/MHz,Num Regs"
}
if {$_rpt_summary_level == 1} {
set CURRENT_LEVEL [open $_summary_level_dir/current_level.csv w+]
puts $CURRENT_LEVEL "Clock,Simple Clock Name,Level,Pin,Wire,CGC,CBUF,Misc,Reg,Mem,Total,Running Total,Freq,mA/MHz,Num Cells,Num Leaf Cells"
}
#####################################################
# This is only needed if you want to do gnuplot
# plotting (power and/or clock).
#####################################################
set _die_size_x 2000
set _die_size_y 2000
#####################################################
# This is only needed if you want to do gnuplot
# clock plotting.
# Then do:
# (execute gnuplot 4.0)
# load "$_plot_tree_cmd_file"
#####################################################
# This is the file that was generated from PC to show hardmacro boundaries
set _hm_ref_graph_file "pc_hm_boxes.graph"
if {$_plot_tree == 1} {
puts $PLOT_TREE_FILE ""
puts $PLOT_TREE_CMD_FILE "set multiplot"
puts $PLOT_TREE_CMD_FILE "set key off"
puts $PLOT_TREE_CMD_FILE "set style line 1"
puts $PLOT_TREE_CMD_FILE "set style line 6"
# Brackets are escaped so Tcl does not treat the gnuplot ranges as command substitution.
puts $PLOT_TREE_CMD_FILE "plot \[0:$_die_size_x\]\[0:$_die_size_y\] '$_hm_ref_graph_file' with lines ls 6"
puts $PLOT_TREE_CMD_FILE "plot \[0:$_die_size_x\]\[0:$_die_size_y\] '$_plot_tree_file' with lines ls 1"
}

#####################################################
# This is only needed if you want to do gnuplot
# power plotting
#
# Then do:
# (execute gnuplot 4.0)
# load "$_power_dir/gnuplot.script"
# plot [0:$_die_size_x][0:$_die_size_y] 'pc_hm_boxes.graph' with lines ls 6
#####################################################
#
# Specify how many bins in X and Y direction for power plotting
# Should be nice even number from chip dimensions.
#
set _num_x_bins 20
set _num_y_bins 20
set _x_bin_size [expr $_die_size_x / $_num_x_bins]
set _y_bin_size [expr $_die_size_y / $_num_y_bins]
#
# Set up power bins initialized to 0
#
if {$_plot_power == 1} {
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
set _current_bin($i,$j) 0
}
}
}
proc store_current_in_bin { x_coord y_coord power } {
global _current_bin
global _die_size_x
global _die_size_y
global _num_x_bins
global _num_y_bins
global _x_bin_size
global _y_bin_size
global _max_current_in_bin
global _min_current_in_bin
# Clamp coordinates on the upper die edge into the last bin (bins are 0..N-1).
set x_bin [expr int($x_coord / $_x_bin_size)]
if {$x_bin >= $_num_x_bins} {
set x_bin [expr $_num_x_bins - 1]
}
set y_bin [expr int($y_coord / $_y_bin_size)]
if {$y_bin >= $_num_y_bins} {
set y_bin [expr $_num_y_bins - 1]
}
set _current_bin($x_bin,$y_bin) [expr $_current_bin($x_bin,$y_bin) + $power]
}
#####################################################
# Initialize variables that should span across
# all calls to trace_clock_tree procedure. These
# are used to sum up all #'s so we can get entire
# chip stats.
#####################################################
set _last_stage_power 0
set _top_total_current_due_to_pins 0
set _top_total_current_due_to_wires 0
set _top_total_current_due_to_cgcs 0
set _top_total_current_due_to_cbufs 0
set _top_total_current_due_to_misc 0
set _top_total_current_due_to_regs 0
set _top_total_current_due_to_mems 0
set _top_total_current_all 0
#####################################################
# Finally, the real procedure call gets defined.
#####################################################
proc trace_clock_tree { _myclock _myclockname _mylevel _freq } {
#########################################################
# Set up global variables to access here.
#########################################################
global _vdd
global _plot_tree
global PLOT_TREE_FILE
global PLOT_TREE_CMD_FILE
global _last_stage_power
global _plot_power
global _stop_at_pc_cgc
global _filter_valid_ct_cells
global _filter_valid_leaf_cells
global _max_level
global _num_cells_at_level
global _num_regs
global _total_current_at_level
global _total_wire_current_at_level
global _total_mem_current_at_level
global _total_pin_current_at_level
global _total_cgc_current_at_level
global _total_cbuf_current_at_level
global _total_misc_current_at_level
global _total_reg_current_at_level
global _total_leafs_at_level
global _top_total_current_due_to_pins
global _top_total_current_due_to_wires
global _top_total_current_due_to_cgcs
global _top_total_current_due_to_cbufs
global _top_total_current_due_to_misc
global _top_total_current_due_to_regs
global _top_total_current_due_to_mems
global _top_total_current_all
global CURRENT_SUM
global CURRENT_LEVEL
global _summary_level_dir
global _rpt_summary
global _rpt_summary_level
global _last_stage_power_for_clock
global _filter_cgc_name
global _filter_reg_name
global _filter_cbuf_name
global _filter_mem_name
global _filter_misc_name
#########################################################
# Store the original level we were at coming into
# this routine. Increment it for use inside this
# routine.
#########################################################
set orig_mylevel $_mylevel
incr _mylevel;
#########################################################
# If this is the first time we are being called
# for a clock, let's initialize a bunch of variables.
# The arrayed variables need to be initialized also.
# I picked 10000 just as some high number. I hope no
# clock trees are 10000 levels deep!
#########################################################
if {$orig_mylevel == 0} {
set _max_level 0
set _num_regs 0
set _last_stage_power_for_clock 0
}
#########################################################
# Keep track of max clock tree depth.
#########################################################
if {[expr $_mylevel > $_max_level]} {
set _max_level $_mylevel
}
#########################################################
# Let's initialize/set some arrayed variables
# if they don't already exist.
#########################################################
if {![info exists _num_cells_at_level($_mylevel)]} {
set _num_cells_at_level($_mylevel) 0;
}
if {![info exists _total_current_at_level($_mylevel)]} {
set _total_current_at_level($_mylevel) 0;
}
if {![info exists _total_wire_current_at_level($_mylevel)]} {
set _total_wire_current_at_level($_mylevel) 0;
}
if {![info exists _total_pin_current_at_level($_mylevel)]} {
set _total_pin_current_at_level($_mylevel) 0;
}
if {![info exists _total_cgc_current_at_level($_mylevel)]} {
set _total_cgc_current_at_level($_mylevel) 0;
}
if {![info exists _total_reg_current_at_level($_mylevel)]} {
set _total_reg_current_at_level($_mylevel) 0;
}
if {![info exists _total_mem_current_at_level($_mylevel)]} {
set _total_mem_current_at_level($_mylevel) 0;
}
if {![info exists _total_cbuf_current_at_level($_mylevel)]} {
set _total_cbuf_current_at_level($_mylevel) 0;
}
if {![info exists _total_misc_current_at_level($_mylevel)]} {
set _total_misc_current_at_level($_mylevel) 0;
}
if {![info exists _total_leafs_at_level($_mylevel)]} {
set _total_leafs_at_level($_mylevel) 0;
}
#########################################################
# Ok, let's begin.
#
# Get the cell and its X,Y coordinates for plotting,
# if needed later.
#########################################################
set orig_buf_cell [get_cells -of_objects $_myclock]
set orig_buf_loc_x [get_attribute $orig_buf_cell x_coordinate_max]
set orig_buf_loc_y [get_attribute $orig_buf_cell y_coordinate_max]
#########################################################
# Get immediate fanouts of this cell, filtered for only
# valid clock tree/leaf cell types. Remember to
# remove the original cell we started with.
#########################################################
set myclock_bufs [remove_from_collection \
[filter [all_fanout -flat -only_cells -levels 1 -from $_myclock] \
$_filter_valid_ct_cells] \
[get_cells -of_objects $_myclock] ]
#########################################################
# Get output net from this net and gather capacitance info
#
# Let's get the net associated with the output of
# the cell in question. Then get the pin and wire
# capacitances for that net. If this cell isn't driving
# anything (weird case), then don't do anything.
#########################################################
set net_name [all_connected $_myclock]
if {[sizeof_collection $net_name] != 0} {
set wire_cap [get_attribute -class net $net_name wire_capacitance_max]
set pin_cap [get_attribute -class net $net_name pin_capacitance_max]
set tot_cap [get_attribute -class net $net_name total_capacitance_max]
#########################################################
# Store the current value due to this net into
# variables for later use. Use divide ratio as needed
# to get from cap units to desired current units to report.
#########################################################
set _total_current_at_level($_mylevel) [expr \
$_total_current_at_level($_mylevel) + \
($tot_cap * $_vdd * $_freq / 1000000)]
set _total_pin_current_at_level($_mylevel) [expr \
$_total_pin_current_at_level($_mylevel) + \
($pin_cap * $_vdd * $_freq / 1000000)]
set _total_wire_current_at_level($_mylevel) [expr \
$_total_wire_current_at_level($_mylevel) + \
($wire_cap * $_vdd * $_freq / 1000000)]
}
#########################################################
# For debugging purposes, it may be useful to know
# what cells were rejected due to our filters above.
# Uncomment this below if you want to know that.
#########################################################
# list rejected ones..
#set rejected [remove_from_collection [remove_from_collection [all_fanout -flat -only_cells -levels 1 -from $_myclock] $myclock_bufs] [get_cells -of_objects $_myclock]]
#foreach_in_collection rejected_inst $rejected {
#  set rejected_name [get_attribute $rejected_inst full_name]
#  echo "rejected: $rejected_name"
#}
#########################################################
# Ok, let's go through all the fanouts from the startpoint
# at this level. We need to recursively traverse each
# one (if not a leaf). We also need to start gathering
# all sorts of stats.
#########################################################
#########################################################
# We need this found_leaf variable to help us calculate
# the power at the last stage of the clock tree. We don't
# want to count the last buffer and net more than once!
#########################################################
set found_leaf 0
foreach_in_collection mybuf $myclock_bufs {
incr _num_cells_at_level($_mylevel)
set cell_name [get_attribute $mybuf full_name]
set ref_name [get_attribute $mybuf ref_name]
set lib_cell_name [get_attribute [get_lib_cells -of_objects $mybuf] full_name]
set new_buf_loc_x [get_attribute [get_cells $cell_name] x_coordinate_max]
set new_buf_loc_y [get_attribute [get_cells $cell_name] y_coordinate_max]
if {$_plot_tree == 1} {
puts $PLOT_TREE_FILE "$orig_buf_loc_x $orig_buf_loc_y"
puts $PLOT_TREE_FILE "$new_buf_loc_x $new_buf_loc_y"
puts $PLOT_TREE_FILE ""
}
###############################################################################
# If we stop at PC-CGCs and we're at a PC-CGC, we need to include the
# average power of the clk pin (not the power through the cell, which is
# what's in "total_power").
# Note: gated_power is a separate user attribute; the Perl script in 11.1
# writes only total_power.
###############################################################################
if {($_stop_at_pc_cgc == 1) && [string match $_filter_cgc_name $ref_name]} {
set pwr [get_attribute [get_lib_cells $lib_cell_name] gated_power]
} else {
set pwr [get_attribute [get_lib_cells $lib_cell_name] total_power]
}
# Just in case we didn't set the user attribute for this cell type here
# let's use 0 so our computations below don't die.
if {$pwr == ""} {
echo "Warning: $cell_name ($ref_name) has no total_power attribute set on it"
set pwr 0
}
set _total_current_at_level($_mylevel) \
[expr $_total_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
if {$_plot_power == 1} {
store_current_in_bin $new_buf_loc_x $new_buf_loc_y [expr ($pwr * $_freq / $_vdd / 1000000)]
}
if [string match $_filter_reg_name $ref_name] {
incr _num_regs
set _total_reg_current_at_level($_mylevel) \
[expr $_total_reg_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
} elseif [string match $_filter_mem_name $ref_name] {
set _total_mem_current_at_level($_mylevel) \
[expr $_total_mem_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
} elseif [string match $_filter_cbuf_name $ref_name] {
set _total_cbuf_current_at_level($_mylevel) \
[expr $_total_cbuf_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
} elseif [string match $_filter_cgc_name $ref_name] {
set _total_cgc_current_at_level($_mylevel) \
[expr $_total_cgc_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
} elseif [string match $_filter_misc_name $ref_name] {
set _total_misc_current_at_level($_mylevel) \
[expr $_total_misc_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
}
###################################################################################
# If we are at a PC-CGC and we need to stop here, record it (else part of this
# if statement) and don't trace beyond it.
#
# If we are at a leaf cell, record it (else part of this if statement) and
# don't trace beyond it.
#
# Otherwise, continue down tree.
###################################################################################
if {([sizeof_collection [filter [get_cells $cell_name] $_filter_valid_leaf_cells]] == 0) &&
!([string match $_filter_cgc_name $ref_name] && ($_stop_at_pc_cgc == 1))} {
set myoutput_pin [get_pins -of_objects $mybuf -filter "@pin_direction == out"]
###################################################################
# Trace down tree more!
###################################################################
trace_clock_tree $myoutput_pin $_myclockname $_mylevel $_freq
} else {
###################################################################
# We are at a leaf.
###################################################################
incr _total_leafs_at_level($_mylevel)
set _last_stage_power [expr $_last_stage_power + ($pwr * $_freq / $_vdd / 1000000)]
set _last_stage_power_for_clock [expr $_last_stage_power_for_clock + ($pwr * $_freq / $_vdd / 1000000)]
###################################################################
# Store cap due to wire+pin and last leaf cell
###################################################################
if {$found_leaf == 0} {
set found_leaf 1
set current [expr $tot_cap * $_vdd * $_freq / 1000000]
# Add in net between buffer and this reg only once
set _last_stage_power [expr $_last_stage_power + ($tot_cap * $_vdd * $_freq / 1000000)]
set _last_stage_power_for_clock [expr $_last_stage_power_for_clock + ($tot_cap * $_vdd * $_freq / 1000000)]
}
}
}
}
###############################################################
# If finishing up an entire clock (done with recursion)
# Let's write out stats for this clock
###############################################################
if {$orig_mylevel == 0} {
set _total_current_due_to_pins 0
set _total_current_due_to_wires 0
set _total_current_due_to_cgcs 0
set _total_current_due_to_cbufs 0
set _total_current_due_to_misc 0
set _total_current_due_to_regs 0
set _total_current_due_to_mems 0

set _total_leafs 0
set _total_current_all 0
for {set i 1} {$i<=$_max_level} {incr i} {
###############################################################
# Add up power parts for each level to totals for clock.
###############################################################
set _total_current_due_to_pins \
[expr $_total_current_due_to_pins + $_total_pin_current_at_level($i)]
set _total_current_due_to_wires \
[expr $_total_current_due_to_wires + $_total_wire_current_at_level($i)]
set _total_current_due_to_cgcs \
[expr $_total_current_due_to_cgcs + $_total_cgc_current_at_level($i)]
set _total_current_due_to_cbufs \
[expr $_total_current_due_to_cbufs + $_total_cbuf_current_at_level($i)]
set _total_current_due_to_regs \
[expr $_total_current_due_to_regs + $_total_reg_current_at_level($i)]
set _total_current_due_to_misc \
[expr $_total_current_due_to_misc + $_total_misc_current_at_level($i)]
set _total_current_due_to_mems \
[expr $_total_current_due_to_mems + $_total_mem_current_at_level($i)]
set _total_current_all [expr $_total_current_all + $_total_current_at_level($i)]
###############################################################
# Also, add power parts to total for entire chip
###############################################################
set _top_total_current_due_to_pins \
[expr $_top_total_current_due_to_pins + $_total_pin_current_at_level($i)]
set _top_total_current_due_to_wires \
[expr $_top_total_current_due_to_wires + $_total_wire_current_at_level($i)]
set _top_total_current_due_to_cgcs \
[expr $_top_total_current_due_to_cgcs + $_total_cgc_current_at_level($i)]
set _top_total_current_due_to_cbufs \
[expr $_top_total_current_due_to_cbufs + $_total_cbuf_current_at_level($i)]
set _top_total_current_due_to_regs \
[expr $_top_total_current_due_to_regs + $_total_reg_current_at_level($i)]
set _top_total_current_due_to_misc \
[expr $_top_total_current_due_to_misc + $_total_misc_current_at_level($i)]
set _top_total_current_due_to_mems \
[expr $_top_total_current_due_to_mems + $_total_mem_current_at_level($i)]
set _top_total_current_all [expr $_top_total_current_all + $_total_current_at_level($i)]
if {$_rpt_summary_level == 1} {
set ma_mhz [expr $_total_current_at_level($i) / $_freq]
puts $CURRENT_LEVEL "$_myclock,$_myclockname,$i,$_total_pin_current_at_level($i),$_total_wire_current_at_level($i),$_total_cgc_current_at_level($i),$_total_cbuf_current_at_level($i),$_total_misc_current_at_level($i),$_total_reg_current_at_level($i),$_total_mem_current_at_level($i),$_total_current_at_level($i),$_total_current_all,$_freq,$ma_mhz,$_num_cells_at_level($i),$_total_leafs_at_level($i)"
}
}
set ma_mhz [expr $_total_current_all / $_freq]
if {$_rpt_summary == 1} {
puts $CURRENT_SUM "$_myclock,$_myclockname,$_total_current_due_to_pins,$_total_current_due_to_wires,$_total_current_due_to_cgcs,$_total_current_due_to_cbufs,$_total_current_due_to_misc,$_total_current_due_to_regs,$_total_current_due_to_mems,$_total_current_all,$_last_stage_power_for_clock,$_freq,$ma_mhz,$_num_regs"
}
###############################################################
# Let's reset these back to 0 so next run doesn't
# have to set them to 0 again.
###############################################################
for {set i 0} {$i<=$_max_level} {incr i} {
set _num_cells_at_level($i) 0
set _total_current_at_level($i) 0
set _total_pin_current_at_level($i) 0
set _total_wire_current_at_level($i) 0
set _total_cgc_current_at_level($i) 0
set _total_cbuf_current_at_level($i) 0
set _total_misc_current_at_level($i) 0
set _total_reg_current_at_level($i) 0
set _total_leafs_at_level($i) 0
set _total_mem_current_at_level($i) 0
}
###############################################################
# Close any open files
###############################################################
if {$_rpt_summary == 1} {
close $CURRENT_SUM
}
if {$_rpt_summary_level == 1} {
close $CURRENT_LEVEL
}
if {$_plot_tree == 1} {
close $PLOT_TREE_FILE
close $PLOT_TREE_CMD_FILE
}
}
}
###############################################################
# Put calls to clock routine here.
# trace_clock_tree <startpoint> <a clock name> 0 <frequency (MHz)>
#
# The <a clock name> can be any text you want. This shows
# up in a column in the .csv file and is supposed to be shorter
# than the startpoint for easier identification in Excel.
# The "0" is the level. We always start the recursive calls
# to trace_clock_tree with 0 indicating the start of the tree.
###############################################################
trace_clock_tree clock_block/myclock myclockname 0 50
###############################################################
# Now, let's collect the entire chip stats and
# write them out
###############################################################
set percentage [expr $_last_stage_power / $_top_total_current_all * 100]
if {$_rpt_summary == 1} {
set SUMMARY_FILE [open $_summary_dir/summary w+]
puts $SUMMARY_FILE "Total pin current: $_top_total_current_due_to_pins"
puts $SUMMARY_FILE "Total wire current: $_top_total_current_due_to_wires"


puts $SUMMARY_FILE "Total CGC current: $_top_total_current_due_to_cgcs"


puts $SUMMARY_FILE "Total CBUF current: $_top_total_current_due_to_cbufs"
puts $SUMMARY_FILE "Total Misc current: $_top_total_current_due_to_misc"
puts $SUMMARY_FILE "Total reg current: $_top_total_current_due_to_regs"
puts $SUMMARY_FILE "Total mem current: $_top_total_current_due_to_mems"
puts $SUMMARY_FILE "Last stage current: $_last_stage_power ($percentage % of total current)"
puts $SUMMARY_FILE "Total current: $_top_total_current_all"
close $SUMMARY_FILE
}
if {$_rpt_summary_level == 1} {
set SUMMARY_LEVEL_FILE [open $_summary_level_dir/summary w+]
puts $SUMMARY_LEVEL_FILE "Total pin current: $_top_total_current_due_to_pins"
puts $SUMMARY_LEVEL_FILE "Total wire current: $_top_total_current_due_to_wires"
puts $SUMMARY_LEVEL_FILE "Total CGC current: $_top_total_current_due_to_cgcs"
puts $SUMMARY_LEVEL_FILE "Total CBUF current: $_top_total_current_due_to_cbufs"
puts $SUMMARY_LEVEL_FILE "Total Misc current: $_top_total_current_due_to_misc"
puts $SUMMARY_LEVEL_FILE "Total reg current: $_top_total_current_due_to_regs"
puts $SUMMARY_LEVEL_FILE "Total mem current: $_top_total_current_due_to_mems"
puts $SUMMARY_LEVEL_FILE "Last stage current: $_last_stage_power ($percentage % of total current)"
puts $SUMMARY_LEVEL_FILE "Total current: $_top_total_current_all"
close $SUMMARY_LEVEL_FILE
}
echo "Total pin current: $_top_total_current_due_to_pins"
echo "Total wire current: $_top_total_current_due_to_wires"
echo "Total CGC current: $_top_total_current_due_to_cgcs"
echo "Total CBUF current: $_top_total_current_due_to_cbufs"
echo "Total Misc current: $_top_total_current_due_to_misc"
echo "Total reg current: $_top_total_current_due_to_regs"
echo "Total mem current: $_top_total_current_due_to_mems"
echo "Last stage current: $_last_stage_power ($percentage % of total current)"
echo "Total current: $_top_total_current_all"

###############################################################
# For Plotting power, there's lots of stuff to write out..
###############################################################
if {$_plot_power == 1} {
set GNUPLOT_POWER_SCRIPT [open $_power_dir/gnuplot.script w+]
puts $GNUPLOT_POWER_SCRIPT "set key off"
puts $GNUPLOT_POWER_SCRIPT "set style line 1"
puts $GNUPLOT_POWER_SCRIPT "set style line 2"
puts $GNUPLOT_POWER_SCRIPT "set style line 3"
puts $GNUPLOT_POWER_SCRIPT "set style line 4"
puts $GNUPLOT_POWER_SCRIPT "set style line 5"
puts $GNUPLOT_POWER_SCRIPT "set style line 6"
puts $GNUPLOT_POWER_SCRIPT "set style line 7"
puts $GNUPLOT_POWER_SCRIPT "set style line 8"
puts $GNUPLOT_POWER_SCRIPT "set multiplot"
set _max_current_in_bin 0
set _min_current_in_bin 9999999
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
if {[expr $_current_bin($i,$j) > $_max_current_in_bin]} {
set _max_current_in_bin $_current_bin($i,$j)
}
if {[expr $_current_bin($i,$j) < $_min_current_in_bin]} {
set _min_current_in_bin $_current_bin($i,$j)
}
}
}
#
# We have only a certain # of colors to choose from for bins
# So figure out levels..
#
set num_bin_colors 6
set bin_current_width [expr ($_max_current_in_bin - $_min_current_in_bin) / $num_bin_colors]
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
# Create box coordinates to file


set fname "$_power_dir/box_${i}_${j}"


set OUTFILE [open $fname w+]
set bottom_x [expr $i*$_x_bin_size]
set bottom_y [expr $j*$_y_bin_size]
set top_x [expr $bottom_x + $_x_bin_size]
set top_y [expr $bottom_y + $_y_bin_size]
puts $OUTFILE "$bottom_x $bottom_y"
puts $OUTFILE "$bottom_x $top_y"
puts $OUTFILE ""
puts $OUTFILE "$bottom_x $bottom_y"
puts $OUTFILE "$top_x $bottom_y"
puts $OUTFILE ""
puts $OUTFILE "$top_x $bottom_y"
puts $OUTFILE "$top_x $top_y"
puts $OUTFILE ""
puts $OUTFILE "$bottom_x $top_y"
puts $OUTFILE "$top_x $top_y"
puts $OUTFILE ""
close $OUTFILE
# Pick a gnuplot line style (color) from the bin's current level.
# Each opening brace must stay on the same line as its elseif condition.
if {[expr $_current_bin($i,$j) < ($bin_current_width+$_min_current_in_bin)]} {
set ls 2
} elseif {[expr $_current_bin($i,$j) < ($bin_current_width*2+$_min_current_in_bin)]} {
set ls 3
} elseif {[expr $_current_bin($i,$j) < ($bin_current_width*3+$_min_current_in_bin)]} {
set ls 7
} elseif {[expr $_current_bin($i,$j) < ($bin_current_width*4+$_min_current_in_bin)]} {
set ls 8
} elseif {[expr $_current_bin($i,$j) < ($bin_current_width*5+$_min_current_in_bin)]} {
set ls 4
} else {
set ls 1
}
# Escape the square brackets so Tcl does not treat the gnuplot ranges as command substitution.
puts $GNUPLOT_POWER_SCRIPT "plot \[0:$_die_size_x\]\[0:$_die_size_y\] '$_power_dir/box_${i}_${j}' with filledcurves xy=$bottom_x,$bottom_y ls $ls"
}
}
close $GNUPLOT_POWER_SCRIPT
set SUMMARY_COLORS [open $_power_dir/summary_colors w+]
set t [expr $bin_current_width+$_min_current_in_bin]
puts $SUMMARY_COLORS "colors: green = < $t mA"
set t [expr $bin_current_width*2+$_min_current_in_bin]
puts $SUMMARY_COLORS "colors: blue = < $t mA"
set t [expr $bin_current_width*3+$_min_current_in_bin]
puts $SUMMARY_COLORS "colors: light orange = < $t mA"
set t [expr $bin_current_width*4+$_min_current_in_bin]
puts $SUMMARY_COLORS "colors: dark orange = < $t mA"
set t [expr $bin_current_width*5+$_min_current_in_bin]
puts $SUMMARY_COLORS "colors: magenta = < $t mA"
set t [expr $bin_current_width*6+$_min_current_in_bin]
puts $SUMMARY_COLORS "colors: red = < $t mA"
close $SUMMARY_COLORS
}
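The color selection in the listing above could also be written as a small helper. The following is a minimal sketch only, not part of the original script: the proc name `bin_line_style` and its argument names are hypothetical, and it assumes the bin currents are floating-point values. It computes which of the six equal-width levels a bin falls into and returns the matching gnuplot line style, in the same order as the if/elseif chain (2, 3, 7, 8, 4, 1).

```tcl
# Hypothetical helper (not in the original script): map a bin's current
# to one of the six gnuplot line styles used by the plotting code.
proc bin_line_style {current min_current bin_width} {
    # Line styles in order of increasing current:
    # green, blue, light orange, dark orange, magenta, red.
    set styles {2 3 7 8 4 1}
    # If every bin carries the same current the width is zero; the
    # if/elseif chain then falls through to its else branch, so the
    # last style is returned here as well.
    if {$bin_width <= 0} {
        return [lindex $styles end]
    }
    # Index of the level that contains this current.
    set idx [expr {int(floor(($current - $min_current) / double($bin_width)))}]
    # Clamp: the maximum current lands exactly on the upper edge.
    if {$idx < 0} { set idx 0 }
    if {$idx > 5} { set idx 5 }
    return [lindex $styles $idx]
}
```

Inside the plotting loop, the chain would then reduce to `set ls [bin_line_style $_current_bin($i,$j) $_min_current_in_bin $bin_current_width]`.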
