
Placement

Flow
Bounding Box and Cost Function
• Bounding box underestimates wirelength
– q(n) is a compensation factor
• q is 1 for nets with 2 or 3 terminals
• increases to 2.79 for 50-terminal nets
– Cav is the average channel capacity (tracks) in the x and y directions over the bounding box of net n
• penalizes placements that require more routing in areas of the FPGA that have narrower channels
• However, Cav is constant, since channel width is fixed for island-style FPGAs
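As a rough illustration of the cost described above, the sketch below sums the compensated bounding-box spans over all nets. The functional form (q(n) times the x and y spans, each divided by the channel capacity) is assumed from the bullets, since the formula itself is not reproduced here. The linear interpolation inside `q_factor` between the two quoted values is also an assumption, and `c_av` defaults to 1 because the channel width is fixed.

```python
def q_factor(num_terminals):
    """Compensation factor q(n): 1 up to 3 terminals, 2.79 at 50 terminals.
    The linear interpolation in between is an assumption for illustration."""
    if num_terminals <= 3:
        return 1.0
    return 1.0 + (2.79 - 1.0) * (min(num_terminals, 50) - 3) / (50 - 3)


def bounding_box_cost(nets, positions, c_av=1.0):
    """nets: list of lists of block names; positions: name -> (x, y)."""
    total = 0.0
    for net in nets:
        xs = [positions[b][0] for b in net]
        ys = [positions[b][1] for b in net]
        bb_x = max(xs) - min(xs)  # horizontal span of the net
        bb_y = max(ys) - min(ys)  # vertical span of the net
        total += q_factor(len(net)) * (bb_x / c_av + bb_y / c_av)
    return total


# Hypothetical three-block placement with one 2-terminal and one 3-terminal net.
positions = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
nets = [["a", "b"], ["a", "b", "c"]]
cost = bounding_box_cost(nets, positions)
```

Because `c_av` is constant, it only scales the total and does not change which placement is preferred.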
Automatic Schedule - Tstart

• System is hot enough if


– Standard deviation method, based on the cost distribution
– Assume a normal distribution
• Tstart = K × standard deviation of the cost
– Tstart has to be high enough to accept, with probability P, a solution whose cost is 3 times worse.
– VPR uses K = 20
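The starting-temperature rule can be sketched as follows. This is a minimal illustration, not VPR's code: `start_temperature`, `swap_two`, and `chain_cost` are hypothetical names, and the toy cost (a chain of blocks on a line) only stands in for a real placement cost.

```python
import random
import statistics


def start_temperature(cost_fn, swap_fn, state, num_swaps, k=20, seed=0):
    """Apply num_swaps random swaps, record the cost after each one,
    and return K times the standard deviation of those costs."""
    rng = random.Random(seed)
    costs = []
    for _ in range(num_swaps):
        swap_fn(state, rng)
        costs.append(cost_fn(state))
    return k * statistics.pstdev(costs)


def swap_two(order, rng):
    """Hypothetical move: exchange two randomly chosen slots."""
    i, j = rng.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]


def chain_cost(order):
    """Toy cost: block b is connected to block b + 1; sum of slot distances."""
    slot = {block: s for s, block in enumerate(order)}
    return sum(abs(slot[b] - slot[b + 1]) for b in range(len(order) - 1))


order = list(range(10))
t_start = start_temperature(chain_cost, swap_two, order, num_swaps=10)
```

A wider spread of costs among the sampled swaps gives a higher starting temperature, so early moves are accepted almost regardless of their effect.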
Annealing Criteria

– Contemporary FPGA packages use the following parameters:
• Starting temperature – 20 × std_dev(cost of N swaps)
• Cost function – weighted sum of wire length and delay
• Inner loop – B × N^(4/3) moves per temperature
– VPR uses B = 10
– Note: reducing the number of moves per temperature by a factor of 10 speeds up placement by a factor of 10.
» But may lose about 10% in quality
Temperature Update

• Tnew = α · Told
– The value of α depends on the fraction of moves that were accepted at temperature Told
– It has been shown that convergence is best if this fraction is kept close to 0.44 for as long as possible.
• Dlimit(new) = Dlimit(old) · (1 – 0.44 + Raccept(old))
Range Limiting
– As temperature drops, limit scope of swaps
– Increased likelihood of acceptance.
– Can also be used to secure critical path performance.
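The temperature update and range limiter can be combined in one small step function. The sketch below uses the Dlimit rule quoted above; the α thresholds in `alpha_for` are the commonly cited VPR values and are an assumption here, as is the clamping of Dlimit between 1 and a maximum dimension `d_max`.

```python
def alpha_for(accept_rate):
    """Cooling factor chosen from the acceptance rate.
    Thresholds are commonly cited VPR values, assumed for illustration."""
    if accept_rate > 0.96:
        return 0.5
    if accept_rate > 0.8:
        return 0.9
    if accept_rate > 0.15:
        return 0.95
    return 0.8


def update_schedule(t_old, d_limit_old, accept_rate, d_max):
    """Return (Tnew, Dlimit(new)) after one temperature step."""
    t_new = alpha_for(accept_rate) * t_old
    d_limit_new = d_limit_old * (1 - 0.44 + accept_rate)
    # Clamping to [1, d_max] is an assumption, not stated in the text.
    d_limit_new = max(1.0, min(d_limit_new, d_max))
    return t_new, d_limit_new


t_new, d_new = update_schedule(100.0, 10.0, accept_rate=0.44, d_max=20)
```

When the acceptance rate is exactly 0.44 the range limit is unchanged. A lower rate shrinks it, which raises the chance that the next moves are accepted.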
Timberwolf

• Std cell placement using Simulated Annealing


• Fixed schedule
– Tstart = 4000000
– Tend = 0.1
– Cool_down = alpha (T)
• alpha = 0.8 → 0.95 → 0.8
– Equilibrium
• Circuit-dependent constant number
• Perturb
– Displace a block / swap two blocks / change the orientation of a block
Timing-driven Placement

– Why should placement take timing into account?


• Placement sets the constraints for the router
• A timing-driven router's performance is limited by the quality of the placement.
• For more speed, placement should be timing-driven.
– Operation principle
• Map blocks that are on the critical path onto physical locations that are closer together
– Minimize the amount of interconnect that critical signals must traverse
Timing-driven Placement

– Timing-driven only placement


• Increases demand on routing resources
– Wireability-driven only placement
• Slower circuit
– Take both wire length and critical path into account
• Problem: Modeling delay
– Critical path changes as we move blocks
– Most accurate delay model
» Route each placement
» Extract delay of each connection
» Execution time is a major problem
Timing-Driven Placement – Delay Modeling

• Delay profile
– Homogeneous FPGA
– Exploit uniformity
• Compute delay as a function of distance (∆x, ∆y)
• Compute a delay lookup matrix for every possible (∆x, ∆y)
– Router is timing driven
• Take advantage of the architecture features
– Segment length
– Use long wires for blocks on far ends of the FPGA
– Assumption that the router will probably find the minimum delay path (a leap of faith!)
Path vs Connection Based Timing Analysis

• Path based:
– Timing-analysis to compute path-delays at every stage of the
placement and use delays in the cost function
– Computationally expensive
• Moving any connection triggers a new timing-analysis
• Connection based:
– Perform timing-analysis before placement
• Assign slacks to each connection
• Pay attention to connections with low slack
– Delay values are always up to date (∆x, ∆y)
– Criticality becomes outdated after the moves
• Approach: Hybrid
– Allow certain number of moves between each timing-analysis
Determining Criticality

– Same basic approach as used for clustering criticality


– For each connection (i, j) from source i to sink j
• Determine arrival times (pre-order BFS)
• Determine required arrival times (post-order BFS)
• Determine slack = required_arrival_time – arrival_time
• Criticality(i, j) = 1 – slack(i, j) / max_slack
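The steps above can be sketched on a small connection graph. This is an illustrative version under stated assumptions: nodes are supplied in topological order, every endpoint is required at the critical path delay, and the slack of a connection is taken as required(sink) – arrival(source) – delay. The function name and the example delays are hypothetical.

```python
from collections import defaultdict


def connection_criticality(edges, order):
    """edges: {(src, sink): delay}; order: nodes in topological order."""
    fanin, fanout = defaultdict(list), defaultdict(list)
    for (i, j), d in edges.items():
        fanout[i].append((j, d))
        fanin[j].append((i, d))

    # Forward pass: arrival time is the longest delay from any start node.
    arrival = {}
    for n in order:
        arrival[n] = max((arrival[i] + d for i, d in fanin[n]), default=0.0)

    # Backward pass: latest time a signal may arrive without
    # lengthening the critical path.
    d_max = max(arrival.values())
    required = {}
    for n in reversed(order):
        required[n] = min((required[j] - d for j, d in fanout[n]), default=d_max)

    slack = {(i, j): required[j] - arrival[i] - d for (i, j), d in edges.items()}
    max_slack = max(slack.values())
    if max_slack == 0:
        return {e: 1.0 for e in edges}  # every connection is critical
    return {e: 1.0 - s / max_slack for e, s in slack.items()}


edges = {("a", "c"): 2.0, ("b", "c"): 1.0, ("c", "d"): 3.0}
crit = connection_criticality(edges, ["a", "b", "c", "d"])
```

In this example the path a → c → d sets the critical delay of 5, so both of its connections get criticality 1, while (b, c) has the largest slack and gets criticality 0.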
Cost Function
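[Equation not recovered: timing cost of a connection. Annotations: the delay term comes from the lookup table matrix; the criticality term lies in the range [0, 1].]

• Heavily weight connections that are critical, while giving less weight to connections that are non-critical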
Balancing Wiring and Timing Cost

– Need to determine relative changes in timing and wiring based on moves
– Idea: use relative changes from the previous calculation
• Both values less than 1
• Helps balance the effect based on a scaling parameter
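One way to realize this idea is to normalize each change by the corresponding cost from the previous calculation and then mix the two with a scaling parameter. The formula below follows the VPR-style form and is an assumption, since the equation itself is not reproduced here. `lam` is a hypothetical name for the scaling parameter.

```python
def normalized_delta_cost(d_timing, d_wiring, prev_timing, prev_wiring, lam=0.5):
    """Combine relative timing and wiring changes into one move cost.

    lam = 1 considers timing only; lam = 0 considers wiring only.
    """
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring


# A move that improves timing cost by 2 (of 10) but worsens wiring by 5 (of 100).
delta = normalized_delta_cost(-2.0, 5.0, prev_timing=10.0, prev_wiring=100.0)
```

Dividing by the previous costs keeps both terms dimensionless, so neither dominates merely because of its units.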
Conclusion

• The greatest challenge facing FPGA placement is the need to produce high-quality placements for ever-larger circuits.
– FPGA capacity doubles every two to three years, doubling the size of the placement problem.
• In order to maintain the fast time to market and ease of use historically provided by FPGAs, placement algorithms cannot be allowed to take ever more CPU time.
• There is thus a compelling need for algorithms that are very scalable and parallel yet still produce high-quality results.
Analytical Placement

• Good for solving large problems in a relatively short time.
• Tends to find a solution that is close to the global optimum
• Problem is solved in the mathematical domain
– cells are assumed to be point-sized objects of zero area.
• representing cells with shape and area is a hard problem.
– the objective function is solved for the positional coordinates of the cells in the two-dimensional plane while minimizing quadratic wirelength.
Example
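[Figure: four movable cells (1 to 4) on a chip spanning (0,0) to (8,10), with fixed pads at (0,2), (0,8), (8,6), and (8,3). Annotated example terms: (x3 − 8)² and (y3 − 6)² for the connection from cell 3 to the pad at (8,6); (x1 − x4)² and (y1 − y4)² for the connection between cells 1 and 4.]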
Objective Function
Without net weighting:
\[
\begin{aligned}
Q ={}& (x_1 - 0)^2 + (x_1 - x_4)^2 + (x_2 - 0)^2 + (x_2 - x_3)^2 + (x_2 - x_4)^2 + (x_3 - x_4)^2 + (x_3 - 8)^2 + (x_4 - 8)^2 \\
&+ (y_1 - 2)^2 + (y_1 - y_4)^2 + (y_2 - 8)^2 + (y_2 - y_3)^2 + (y_2 - y_4)^2 + (y_3 - y_4)^2 + (y_3 - 6)^2 + (y_4 - 3)^2
\end{aligned}
\]
With a weight of 2/3 on each term among cells 2, 3, and 4:
\[
\begin{aligned}
Q ={}& (x_1 - 0)^2 + (x_1 - x_4)^2 + (x_2 - 0)^2 + \tfrac{2}{3}(x_2 - x_3)^2 + \tfrac{2}{3}(x_2 - x_4)^2 + \tfrac{2}{3}(x_3 - x_4)^2 + (x_3 - 8)^2 + (x_4 - 8)^2 \\
&+ (y_1 - 2)^2 + (y_1 - y_4)^2 + (y_2 - 8)^2 + \tfrac{2}{3}(y_2 - y_3)^2 + \tfrac{2}{3}(y_2 - y_4)^2 + \tfrac{2}{3}(y_3 - y_4)^2 + (y_3 - 6)^2 + (y_4 - 3)^2
\end{aligned}
\]
Center Spread Constraints

12 x1 + 12 x2 + 12 x3 + 12 x4 = 48 × 4
12 y1 + 12 y2 + 12 y3 + 12 y4 = 48 × 5

Assuming each cell has an area of 12 units (width = 3, height = 4) and the chip has width = 8 and height = 10, so that the total cell area is 48 and the chip center is at (4, 5).
Problem Formulation
Minimize
\[
\frac{1}{2} X^T
\begin{bmatrix}
4 & 0 & 0 & -2 & 0 & 0 & 0 & 0 \\
0 & \frac{14}{3} & -\frac{4}{3} & -\frac{4}{3} & 0 & 0 & 0 & 0 \\
0 & -\frac{4}{3} & \frac{14}{3} & -\frac{4}{3} & 0 & 0 & 0 & 0 \\
-2 & -\frac{4}{3} & -\frac{4}{3} & \frac{20}{3} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & 0 & 0 & -2 \\
0 & 0 & 0 & 0 & 0 & \frac{14}{3} & -\frac{4}{3} & -\frac{4}{3} \\
0 & 0 & 0 & 0 & 0 & -\frac{4}{3} & \frac{14}{3} & -\frac{4}{3} \\
0 & 0 & 0 & 0 & -2 & -\frac{4}{3} & -\frac{4}{3} & \frac{20}{3}
\end{bmatrix}
X
+ \begin{pmatrix} 0 & 0 & -16 & -16 & -4 & -16 & -12 & -6 \end{pmatrix} X
\]
where
\[
X = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & y_1 & y_2 & y_3 & y_4 \end{pmatrix}^T
\]
Constraints
subject to
\[
\begin{pmatrix} 12 & 12 & 12 & 12 & 0 & 0 & 0 & 0 \end{pmatrix} X = 192,
\qquad
\begin{pmatrix} 0 & 0 & 0 & 0 & 12 & 12 & 12 & 12 \end{pmatrix} X = 240
\]
with
\[
X = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & y_1 & y_2 & y_3 & y_4 \end{pmatrix}^T
\]
Solution Method

• The constrained optimization problem is transformed into an unconstrained optimization problem.
– the objective function is modified using Lagrange multipliers
Solution Method

Q ' = Q + λ1 ( A1 x − b1 ) + λ2 ( A2 y − b2 )

λ1 , λ2 are Lagrange Multipliers

( A1 x − b1 ),( A2 y − b2 ) represent center-spread constraints.


Solve
Solution

• Upon equating to zero and solving the simultaneous equations, the x and y coordinates of the cells are:
– (x1, y1) = (2.44, 3.46)
– (x2, y2) = (3.01, 6.40)
– (x3, y3) = (5.68, 5.73)
– (x4, y4) = (4.87, 4.41)
• The four cells are placed at these locations as point-sized objects. Upon expansion into real area, cell overlaps are expected.
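The example can be checked numerically by solving the Lagrangian conditions as one linear system per axis. The sketch below is an illustration under stated assumptions: it uses the matrix, linear term, and center-spread constraints as transcribed for this example, and `solve` and `place_axis` are hypothetical helper names. With these inputs the x coordinates agree with the quoted values to two decimals. The y coordinates come out near (3.34, 6.45, 5.79, 4.42), which differs slightly from the quoted ones, so the transcribed pad coordinates or the quoted y values should be treated with some caution.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Per-axis block of the quadratic term; identical for x and y in this example.
C = [[4, 0, 0, -2],
     [0, 14 / 3, -4 / 3, -4 / 3],
     [0, -4 / 3, 14 / 3, -4 / 3],
     [-2, -4 / 3, -4 / 3, 20 / 3]]


def place_axis(p, center):
    """Minimize 1/2 v^T C v + p^T v subject to 12 * sum(v) = 48 * center."""
    # Stationarity: C v + 12 * lam = -p; constraint: 12 * sum(v) = 48 * center.
    kkt = [C[i] + [12.0] for i in range(4)] + [[12.0] * 4 + [0.0]]
    rhs = [-pi for pi in p] + [48.0 * center]
    return solve(kkt, rhs)[:4]  # drop the multiplier


xs = place_axis([0, 0, -16, -16], center=4)
ys = place_axis([-4, -16, -12, -6], center=5)
```

Because the quadratic term has no cross terms between x and y, the two axes can be solved independently, which keeps each system at five unknowns.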
Solution

[Figure: resulting placement of cells 1 to 4, drawn as points at the solved coordinates.]
Placement Problem Formulation
• Let C = {c1, c2, ..., cn} be the set of n modules to be placed.
• Let IO = {IO1, IO2, ..., IOm} be the set of IO modules.
– IO modules are assumed to be fixed.
• Let N = {N1, N2, ..., Nk} be the set of k nets in the design.
• If two modules ci and cj are connected by a net, then the term [(xi − xj)² + (yi − yj)²] is introduced in the objective function.
Placement Problem Formulation
• The objective function consists of terms contributed by each net and can be written as
\[
Q = \sum_{t=1}^{k} \; \sum_{c_i, c_j \in N_t,\; i \neq j} \left\{ (x_i - x_j)^2 + (y_i - y_j)^2 \right\}
\]
• A weighting factor f_t is added to compensate for the effect of multi-terminal nets. Thus,
\[
Q = \sum_{t=1}^{k} f_t \left( \sum_{c_i, c_j \in N_t,\; i \neq j} \left\{ (x_i - x_j)^2 + (y_i - y_j)^2 \right\} \right)
\]
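The weighted objective can be evaluated directly from a net list. In the sketch below, the weight f_t = 2/|N_t| is an assumed choice (it gives 2/3 for a three-terminal net), and each unordered pair of modules is counted once. `quadratic_wirelength` and the sample coordinates are hypothetical.

```python
from itertools import combinations


def quadratic_wirelength(nets, pos, weight=lambda k: 2.0 / k):
    """Q = sum over nets of f_t times the squared distances between
    all module pairs in the net. f_t = 2/|N_t| is an assumed weight."""
    q = 0.0
    for net in nets:
        f_t = weight(len(net))
        for a, b in combinations(net, 2):  # each unordered pair once
            (xa, ya), (xb, yb) = pos[a], pos[b]
            q += f_t * ((xa - xb) ** 2 + (ya - yb) ** 2)
    return q


pos = {"c1": (0, 0), "c2": (3, 4), "c3": (0, 4)}
nets = [["c1", "c2"], ["c1", "c2", "c3"]]
q_value = quadratic_wirelength(nets, pos)
```

Without the weight, a net with many terminals contributes a number of pair terms that grows quadratically with its size and would dominate the objective.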
Placement Problem Formulation
• Let a1, a2, ..., an be the areas of the modules. If xc is the x coordinate of the center of the layout, then the center spread constraint can be stated as
\[
\sum_{i=1}^{n} a_i x_i = A x_c
\]
where
\[
\sum_{i=1}^{n} a_i = A
\]
Problem Statement
Minimize
\[
Q = \sum_{t=1}^{k} f_t \left( \sum_{c_i, c_j \in N_t,\; i \neq j} \left\{ (x_i - x_j)^2 + (y_i - y_j)^2 \right\} \right)
\]
subject to
\[
\sum_{i=1}^{n} a_i x_i = A x_c
\]
Problem Statement
Minimize
\[
Q = \frac{1}{2} X^T C X + p^T X
\]
subject to
\[
M X = r
\]
Solution Method
• The linearly constrained QP can be solved by first transforming it into an unconstrained QP with a modified objective function,
\[
Q' = Q - \Lambda (M X - r)
\]
• where Λ is a vector of Lagrange multipliers.
– minimizing Q' also minimizes Q.
• The minimum of Q' is obtained by setting the first-order partial derivatives with respect to each λ ∈ Λ and each x ∈ X to zero. This yields a system of linear equations
\[
A X = b
\]
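For reference, the stationarity conditions can be written out explicitly. This is a sketch assuming that C is symmetric and that Λ is treated as a column vector, so that the penalty term reads Λᵀ(MX − r). The stacked unknown vector and the block matrix are notation introduced here for illustration.

```latex
\begin{aligned}
Q' &= \tfrac{1}{2} X^T C X + p^T X - \Lambda^T (M X - r) \\
\frac{\partial Q'}{\partial X} &= C X + p - M^T \Lambda = 0 \\
\frac{\partial Q'}{\partial \Lambda} &= -(M X - r) = 0 \\
\begin{bmatrix} C & -M^T \\ M & 0 \end{bmatrix}
\begin{bmatrix} X \\ \Lambda \end{bmatrix}
&= \begin{bmatrix} -p \\ r \end{bmatrix}
\end{aligned}
```

In this form the unknown vector of the linear system contains both the module coordinates and the multipliers.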
Generating Feasible Layouts

• QP-based execution results in infeasible layouts due to severe module overlapping.
• Module spreading is required to minimize overlaps.
• After the first phase of QP, the geometric layout is partitioned.
– KL-, FM-, and SA-based methods are candidates for partitioning.
Level 0 Placement
Level 1 Partitioning

• Perform level 1 partitioning
– Obtain center locations for center-of-gravity constraints
Level 1 Placement
Level 2 Partitioning

• Add two more cut-lines


Summary

• Center-of-gravity constraint
– Helps spread the cells evenly while monitoring wirelength
