Industrial Clock Synthesis

Pei-Hsin Ho
Implementation Group
Synopsys, Inc.

Outline
Problems that our customers care about
Existing solutions and plan-of-record solutions
Improvements required

2009 Synopsys, Inc. (2)

Clock Network
Clock network delivers the clock signal to synchronize
every sequential cell in the clock domain

Clock Network Metrics


Power
Insertion delay for a clock path
Usually longer than the clock period
Proxy for variation and power
Logic level
RC delay

Skew under variation


False skew
Variation
Multiple corners
OCV (on-chip variation)

Power: Clock Major Culprit

Power: biggest implementation issue today


Consumer electronics
Frequency limiter: 4GHz ceiling

Biggest trouble maker


Clock
1/3 total power
Many large and leaky buffers

Variation: Clock Most Susceptible

Variation: biggest implementation issue tomorrow
Frequency limiter
>40% dead cycle time
Consumer electronics

Biggest trouble maker
Clock
OCV impact: 2X

[Figure: clock cycle time divided into logic time, clock skew, clock variation margin and logic variation margin]

Variation: Clock Most Susceptible

Multi-corner (MC) CTS: how to minimize skew for all corners?
Variation ratios of cells and wires differ in different corners
Variation ratios of wires differ in different layers
Failed chips: hold violations caused by variable skews

[Figure: two clock paths annotated with per-stage delays in two corners. First corner: stage delays of 1 and 3, skew = 5 - 5 = 0. Second corner: stage delays of 1.1, 1.2 and 3.3, skew = 5.7 - 5.6 = 0.1.]
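The two-corner skew example above can be reproduced with a few lines of arithmetic. This is only a sketch: the split of stages between the two paths, the labeling of stages as cells or wires, and the scale factors 1.1 (cells) and 1.2 (wires) are assumptions chosen to match the slide totals (5 and 5, then 5.7 and 5.6), not values read from the figure.

```python
from fractions import Fraction

# Assumed stage lists (kind, delay in the first corner); not taken from the figure.
PATH_A = [("cell", 1), ("wire", 1), ("cell", 1), ("wire", 1), ("cell", 1)]
PATH_B = [("wire", 1), ("cell", 1), ("cell", 3)]

# Assumed per-corner delay scale factors for cells and wires.
CORNERS = {
    "corner1": {"cell": Fraction(1), "wire": Fraction(1)},
    "corner2": {"cell": Fraction(11, 10), "wire": Fraction(12, 10)},
}

def path_delay(stages, scale):
    """Sum the stage delays after scaling each by its cell or wire factor."""
    return sum(delay * scale[kind] for kind, delay in stages)

def skew(corner):
    """Skew between the two paths in the given corner."""
    scale = CORNERS[corner]
    return path_delay(PATH_A, scale) - path_delay(PATH_B, scale)

for name in CORNERS:
    print(name, float(skew(name)))
```

A tree balanced in one corner becomes skewed in another because the two paths mix cell and wire delay in different proportions, which is why balancing delay alone in a single corner is not enough.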

Performance
Performance: biggest implementation challenge yesterday?
1 GHz ASICs

Clock skew is still a big issue for performance today


More clocks, more modes and more IPs
Complex clock gating
Higher wire delays

Clock Is Key
Not competitive in clock means not competitive in timing or power for 45nm and beyond
State-of-the-art clock-tree synthesis algorithms

[Figure: clock tree with nodes labeled i1 to i4]

Industrial Solutions for Power


Clock gating
Conventional
Sequential
Physical clock gating

Register clumping
Register banking

Clock Gating Used by 90% of Synopsys Users

Survey question: "Please check the techniques your team is using on your current project."
2007: N = 718; margin of error = +/- 4%

Technique                        Share of respondents
Clock Gating                     90%
Gate-Level Power Optimization    51%
Multi-Voltage Design             43%
Multi-Threshold Design           42%
State Retention/MTCMOS           18%
Power Network Synthesis          14%

[Figure: bar chart of the values above on a 0% to 100% axis, with a legend for 2006 and 2007]

Clock Gating
[Figure: ICGs taking en and clk and producing gclk; labels: high activity, low activity]

Automatic clock gating
always @(posedge clk)
  if (en) Q <= D;
ICGs (Integrated Clock Gating cells)
Fewer sizes and uneven loads: harder to balance skew
Consume power
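For readers unfamiliar with ICG cells, here is a small behavioral model. It assumes the common latch-plus-AND structure (enable captured by a latch that is transparent while clk is low, gclk = clk AND latched enable). The slides do not describe the cell internals, so this structure and the function name `simulate_icg` are assumptions.

```python
def simulate_icg(clk, en):
    """Return gclk samples for equal-length lists of clk and en samples (0 or 1)."""
    latched_en = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:
            # Assumed latch behavior: transparent only while clk is low,
            # so en changes during the high phase cannot glitch gclk.
            latched_en = e
        gclk.append(c & latched_en)
    return gclk

clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [1, 1, 0, 0, 0, 1, 1, 0]
print(simulate_icg(clk, en))
```

In this trace the enable that rises during the third high phase does not produce a partial pulse. The gated clock only pulses in cycles where the enable was high during the preceding low phase.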

Physical Clock Gating

Design Compiler inserts ICGs during logic synthesis
ICGs drive flops that are spread out for datapath timing; ICG levels are unbalanced
Leaves power-saving opportunities on the table

[Figure: gated clock tree; legend: flop, clock gate, macro, buffer; instances m1, m2, i1, i4, r1 to r6]

ICG Merge

Merge ICGs of the same enable signal

[Figure: gated clock tree after ICG merge; legend: flop, clock gate, macro, buffer; instances s1, m1, m2, i1, i4, r1 to r6]

ICG Removal

Remove ICGs that are ineffective (small fanout and mostly enabled) or that cause unbalanced ICG levels

[Figure: gated clock tree after ICG removal; legend: flop, clock gate, macro, buffer; instances s1, i1, i4, r1 to r6]
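The merge and removal steps above can be sketched as a grouping pass followed by a filter. This is a minimal illustration, not the tool's algorithm: the `Icg` record, the thresholds `min_fanout` and `max_enable_prob`, and the example values are all hypothetical, and removal for unbalanced ICG levels is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Icg:
    name: str
    enable: str         # name of the enable net
    fanout: int         # number of gated flops
    enable_prob: float  # fraction of cycles in which the enable is high

def merge_by_enable(icgs):
    """Merge ICGs that share an enable net into one ICG with the combined fanout."""
    groups = defaultdict(list)
    for icg in icgs:
        groups[icg.enable].append(icg)
    merged = []
    for enable, members in groups.items():
        name = "+".join(m.name for m in members)
        fanout = sum(m.fanout for m in members)
        # Same enable net, so the members share one enable probability.
        merged.append(Icg(name, enable, fanout, members[0].enable_prob))
    return merged

def is_ineffective(icg, min_fanout=4, max_enable_prob=0.9):
    """Hypothetical removal test: small fanout and mostly enabled."""
    return icg.fanout < min_fanout and icg.enable_prob > max_enable_prob

icgs = [
    Icg("i1", "a", 3, 0.2),
    Icg("i2", "a", 5, 0.2),
    Icg("i3", "b", 2, 0.95),
    Icg("i4", "c", 16, 0.5),
]
kept = [g for g in merge_by_enable(icgs) if not is_ineffective(g)]
print([(g.name, g.fanout) for g in kept])
```

Here i1 and i2 merge because they share enable a, and i3 is dropped because it gates only two flops and is enabled 95% of the time.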

ICG Splitting

Split ICGs based on timing, DRC and placement

[Figure: gated clock tree after ICG splitting; legend: flop, clock gate, macro, buffer; instances s1, i1, i2, i4]

Issue: Higher ICGs


Merging ICGs with the same enable may save more clock tree power
Can gate a larger subtree
Splitting ICGs makes enable-signal timing easier to satisfy

[Figure: Merge and Split transformations on ICGs gated by enable a]

Issue: Higher ICGs


Does not always save power!
Higher ICGs restrict the sharing of the subtrees
May introduce enable timing violations

[Figure: Merge and Split transformations on ICGs gated by enable a]

Issue: Multi-Layer ICGs


Multi-layer ICGs may save more clock tree power
DC PwrC generates multi-layer ICGs by enable factoring
May gate a larger subtree

[Figure: Factor and Removal transformations between single-layer and multi-layer ICGs; enable labels: a&c, b&c, a]

Issue: Multi-Layer ICGs


But does not always save power!
Multi-layer ICGs restrict sharing of the subtrees
Extra ICG may consume more power than a very small gated subtree

[Figure: Factor and Removal transformations between single-layer and multi-layer ICGs; enable labels: a&c, b&c, a]
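The point that an extra ICG may cost more than it saves can be illustrated with a first-order estimate. The model is an assumption, not taken from the slides: the gated subtree burns `subtree_power` when clocked and nothing when gated, while the ICG itself always burns `icg_power`. The numbers are hypothetical.

```python
def gating_net_saving(subtree_power, enable_prob, icg_power):
    """Estimated average power saved by one ICG (same units as the inputs).

    Assumed model: saving = (fraction of cycles gated) * subtree power - ICG power.
    """
    return (1.0 - enable_prob) * subtree_power - icg_power

# Large subtree: gating pays off.
print(gating_net_saving(subtree_power=10.0, enable_prob=0.3, icg_power=0.5))
# Very small subtree: the ICG costs more than it saves.
print(gating_net_saving(subtree_power=0.4, enable_prob=0.3, icg_power=0.5))
```

A negative result marks an ICG as a candidate for removal under this simplified model.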

Flop Placement: Register Clumping


Around 8% reduction in total power on average

Flop Placement: Register Banking

Automatically place registers into banks


Reduce power
Reduce clock skew
Implemented as RP (relative placement) constraints
Routability may be an issue for some designs

Industrial Solutions for Variation


Clock meshes
OCV-aware clustering

Clock Meshes

Good skew under variation
Tree above the mesh
Trees below the mesh to drive the flops
Bad for power (~+30% clock power)
More wires
Can only gate the small clock trees below the mesh
Few SNPS customers mass-produce IC products with clock meshes
Insight from clock mesh?
Regularity is good for variation

[Figure: clock mesh layout; labels: metal 6, metal 7, metal 8, planGroup boundary, periphery IOs]

Dummy ICG Insertion

Insert dummy ICGs to balance topology

[Figure: gated clock tree with dummy ICGs; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]
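As a rough sketch of balancing topology with dummy ICGs: if each sink's clock path is described by how many ICGs it passes through, the number of dummies a path needs is the gap to the deepest path. The per-sink ICG depths and the helper name are hypothetical, and a dummy ICG is assumed to be an always-enabled cell.

```python
def dummy_icgs_needed(icg_levels):
    """Map each sink to the number of dummy ICGs that equalizes ICG depth.

    icg_levels: non-empty dict of sink name -> number of ICGs on its clock path.
    """
    target = max(icg_levels.values())
    return {sink: target - depth for sink, depth in icg_levels.items()}

print(dummy_icgs_needed({"r1": 2, "r2": 1, "r3": 0}))
```

After insertion every path crosses the same number of ICGs, so ICG delay variation affects all sinks in a similar way.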

OCV-Aware Register Clustering

Try to cluster registers that have critical timing paths between them within a 1st- or 2nd-level cluster, to minimize the impact of variation on timing

[Figure: gated clock tree with clustered registers; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]
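One way to see why clustering helps: on-chip variation affects only the portions of the launch and capture clock paths that the two registers do not share. The sketch below counts those non-common stages. The path lists and buffer names are hypothetical, and stage count is used as a stand-in for OCV exposure.

```python
def non_common_stages(launch_path, capture_path):
    """Count clock-tree stages not shared by the launch and capture paths.

    Each path lists buffer/ICG names from the clock root down to the leaf driver.
    """
    common = 0
    for a, b in zip(launch_path, capture_path):
        if a != b:
            break
        common += 1
    return (len(launch_path) - common) + (len(capture_path) - common)

same_leaf = non_common_stages(["root", "b1", "b2", "leafA"], ["root", "b1", "b2", "leafA"])
second_level = non_common_stages(["root", "b1", "b2", "leafA"], ["root", "b1", "b2", "leafC"])
far_apart = non_common_stages(["root", "b1", "b2", "leafA"], ["root", "b3", "b4", "leafB"])
print(same_leaf, second_level, far_apart)
```

Registers under the same leaf driver share the whole clock path, while registers that diverge near the root expose most of the tree to variation.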

Register Clumping

Place the registers driven by each leaf-level buffer or ICG closer together to minimize leaf-level net capacitance (>50% of total net cap)

[Figure: gated clock tree with clumped registers; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]
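To make the capacitance argument concrete, the sketch below uses half-perimeter wirelength (HPWL) as a simple proxy for the wire capacitance of a leaf-level clock net. HPWL is a standard estimate but is an assumption here, as are the coordinates; the slides do not say how net capacitance is modeled.

```python
def hpwl(points):
    """Half-perimeter wirelength of the bounding box around (x, y) pin locations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Hypothetical flop locations (in microns) on one leaf-level net.
spread = [(0, 0), (40, 10), (10, 30), (35, 35)]
clumped = [(18, 18), (22, 18), (18, 22), (22, 22)]
print(hpwl(spread), hpwl(clumped))
```

With wire capacitance roughly proportional to wirelength, pulling the same four flops together shrinks the estimated leaf net from 75 to 8 length units.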

Register Banking

Place registers into rectangular banks (more dramatic form of register clumping)

[Figure: gated clock tree with banked registers; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]

Regular Clock Tree Synthesis

Synthesize regular buffer tree based on DRC and placement


Balanced buffer levels, ICG levels, fanout, wire length (by placement), metal layer

[Figure: regular gated clock tree; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]
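A regular buffer tree can be sized from the sink count and a maximum fanout. The sketch below covers only the fanout part of the balancing listed above (not ICG levels, wire length or metal layer), and the max-fanout limit stands in for the DRC constraints. The function name and numbers are hypothetical.

```python
def level_widths(num_sinks, max_fanout):
    """Number of buffers per level of a regular tree, leaf level first."""
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fanout of at least 2")
    widths = []
    n = num_sinks
    while n > 1:
        n = -(-n // max_fanout)  # ceiling division: drivers needed for n loads
        widths.append(n)
    return widths

widths = level_widths(1000, 16)
print(widths, "levels:", len(widths))
```

Every sink then sits below the same number of buffer levels, which is the regularity the slide asks for.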

Industrial Solutions for Timing


Clock routing
Useful skews
Inter-clock delay balancing
CTS for SoCs
Multi-voltage-domain and multi-mode CTS

Clock Routing

Signal routing mostly for routability (wire length) and timing
Clock routing mostly for skew under variation and power (wire length)
Snaking for skew under variation
Selective shielding of clock nets to control skew

Detail-Route Clocks with Wire Snaking

Detail route the clock tree using minimal wire snaking (and shielding) to fine-tune skew

[Figure: detail-routed gated clock tree; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]

Before Physical Synthesis


Irregular gated clock topology bad for variation and power

[Figure: irregular gated clock tree; legend: flop, clock gate, macro, buffer; instances m1, m2, i1, i4, r1 to r6]

After Physical Synthesis


Regular gated clock topology good for variation and power

[Figure: regular gated clock tree; legend: flop, clock gate, macro, buffer; instances i1, i2, i3, i4]

Useful Skew
Clock skews can be used to fix timing violations
Setup: trigger the launcher sooner and/or the capturer later
Hold: trigger the launcher later and/or the capturer sooner

Risk
Clock skew is hard to control under variation
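The setup and hold rules above follow from the usual slack equations. The sketch below uses the textbook single-cycle form with clock-to-Q delay folded into the data delay. All numbers are hypothetical and in picoseconds.

```python
def setup_slack(period, launch_lat, capture_lat, d_max, t_setup):
    """Setup slack: data must arrive before the next capture edge minus setup time."""
    return (period + capture_lat) - (launch_lat + d_max + t_setup)

def hold_slack(launch_lat, capture_lat, d_min, t_hold):
    """Hold slack: data must not change until hold time after the same capture edge."""
    return (launch_lat + d_min) - (capture_lat + t_hold)

# Zero skew: the long path misses setup by 100 ps.
print(setup_slack(1000, 0, 0, d_max=1050, t_setup=50))
# Trigger the capturer 120 ps later: setup is met.
print(setup_slack(1000, 0, 120, d_max=1050, t_setup=50))
# The same skew eats into hold margin on the shortest path.
print(hold_slack(0, 0, d_min=200, t_hold=30), hold_slack(0, 120, d_min=200, t_hold=30))
```

Delaying the capture clock turns a 100 ps setup violation into 20 ps of positive slack, but it also cuts hold slack from 170 ps to 50 ps. This is the trade-off behind the risk that skew is hard to control under variation.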

Inter-Clock Delay Balancing


If there are timing paths from one clock domain to another, the inter-clock delay must be balanced based on the timing constraints
Extra insertion delay is bad for power and variation

CTS for SoCs

SoC
Large number of IPs with known clock latencies
Hard to balance skews
Large number of placement and/or routing blockages
Hard to balance topology
Multiple voltage domains and multiple operation modes
Multiple clocks
Complex requirements

[Figure: SoC floorplan; labels: clock source, routing blockages, placement blockage, macro clock pins]

Multiple Voltage Domains and Multiple Modes

Clock shared by multiple voltage domains
No timing path in between: insert isolation cells near the top of the clock tree to save power
Clock going through a voltage domain that may be turned off in an operation mode
Clock buffers powered by always-on power rail
Implication in power rail synthesis

[Figure: SoC floorplan; labels: clock source, routing blockages, placement blockage, macro clock pins]

Summary
Competitive clock synthesis technology is key for IC product differentiation
Power
Variation
Performance

Academic research in this area will make a huge impact on the real world through the realization of cool, robust and fast ICs

Backup Slides

Pipelined Datapath with Bubbles


Invalid data (bubbles) may go through the pipelined
datapath and consume energy during computation

[Figure: pipelined datapath with + and & operators]

Conventional Clock Gating


Gate clock upon invalid data
always @ (posedge clk)
if (vld) begin a1 <= a0; b1 <= b0; end

[Figure: pipelined datapath with + and & operators, clock gated by vld]

Sequential Clock Gating


Introduce a valid-bit pipeline to track valid data and gate
the clock to the datapath so that no pipeline stage
computes for invalid data
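The effect of a valid-bit pipeline can be illustrated by counting how often the datapath stage registers are clocked. This is a cycle-level sketch with hypothetical names, not a description of any tool's transformation: each stage is clocked only in cycles where the data arriving at it is valid.

```python
def count_stage_clockings(valid_in, num_stages, gated):
    """Count register-stage clock events for a stream of input valid bits."""
    valid = [False] * num_stages  # valid-bit pipeline, one bit per stage
    clockings = 0
    # Extra idle cycles at the end drain the pipeline.
    for v in list(valid_in) + [False] * num_stages:
        incoming = [v] + valid[:-1]  # valid bit arriving at each stage this cycle
        for stage_in_valid in incoming:
            if not gated or stage_in_valid:
                clockings += 1
        valid = incoming
    return clockings

stream = [True, False, False, True]  # two valid items separated by bubbles
print(count_stage_clockings(stream, 3, gated=False))
print(count_stage_clockings(stream, 3, gated=True))
```

In this example the ungated three-stage pipeline clocks its stages 21 times, while the gated version clocks them 6 times, once per valid item per stage.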

[Figure: pipelined datapath with + and & operators and a valid-bit pipeline]