Temporal Resource Management for Dynamic FPGA Placement

Temporal Placement
Wasted Resources
Wasted resource $wr(v_i)$ of a node $v_i$: the unused area occupied by the node $v_i$ during the computation of a partition.
$wr(v_i) = (t(P_i) - t_i) \cdot a_i$, where $t(P_i)$ is the run-time of partition $P_i$, $t_i$ is the run-time of the component $v_i$, and $a_i$ is the area of $v_i$.
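A small worked example of the wasted-resource formula may help. This is only a sketch: the function name `wasted_resource` and the numbers are illustrative assumptions, not values from the slides.

```python
def wasted_resource(partition_runtime, node_runtime, node_area):
    """wr(v_i) = (t(P_i) - t_i) * a_i: idle time of the node times its area."""
    if node_runtime > partition_runtime:
        raise ValueError("a node cannot run longer than its partition")
    return (partition_runtime - node_runtime) * node_area

# Hypothetical partition lasting 10 time units; each node is (run-time, area).
nodes = {"v1": (10, 4), "v2": (6, 3), "v3": (2, 8)}
wr = {name: wasted_resource(10, t, a) for name, (t, a) in nodes.items()}
total_wr = sum(wr.values())
```

In this example the short but large node `v3` dominates the waste, which is the kind of component that wasted resource avoidance would remove from the device early.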
Wasted resource avoidance: A component is placed on the chip only when its computation is required, and it remains on the device only for the time it is active.
Idle components can be replaced by new ones.
Assumptions: Partial reconfiguration is supported. Blocks are hard.
For soft library modules, temporal partitioning plus 2-D placement may suffice.
Temporal Placement
Temporal placement: time-dependent placement.
Management of task execution at run-time.
Graphical representation: the Z axis is the lifetime of a configuration.
Some overlap (resource sharing) in 2-D is allowed, provided the modules are not on the device at the same time.
A horizontal cut gives the RFU configuration at a given time.
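The 3-D view can be made concrete with a short sketch. The `Box` type, the half-open time interval convention and the helper names below are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """A placed module: position (x, y), size (w, h), resident during [start, end)."""
    name: str
    x: int
    y: int
    w: int
    h: int
    start: int
    end: int

def overlap_1d(a0, a1, b0, b1):
    """Half-open intervals [a0, a1) and [b0, b1) share at least one point."""
    return a0 < b1 and b0 < a1

def conflict(a, b):
    """Two modules conflict only if they overlap in x, in y AND in time."""
    return (overlap_1d(a.x, a.x + a.w, b.x, b.x + b.w)
            and overlap_1d(a.y, a.y + a.h, b.y, b.y + b.h)
            and overlap_1d(a.start, a.end, b.start, b.end))

def horizontal_cut(boxes, t):
    """Names of the modules resident on the device at time t."""
    return sorted(b.name for b in boxes if b.start <= t < b.end)

a = Box("A", 0, 0, 4, 4, start=0, end=5)
b = Box("B", 2, 2, 4, 4, start=5, end=9)   # shares area with A, but later in time
c = Box("C", 2, 2, 4, 4, start=3, end=6)   # shares area with A while A is resident
```

Here `A` and `B` share device area without conflict because their lifetimes do not intersect, while `A` and `C` do conflict.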
Temporal Placement
Off-line (compile time): pre-defined placement sequence. In DSP applications, the program flow can be predicted.
On-line (during execution): the computation sequence is not known in advance.
Placement is done dynamically at run time.
It needs to solve computationally intensive problems in a fraction of a millisecond:
Efficient management of the free space
Selection of the best site for a new component
Management of communication
Off-Line Temporal Placement

Off-line temporal placement: Given a DFG $G = (V, E)$ and a reconfigurable device $H$ (with length $H_x$ and width $H_y$), a temporal placement is a 3-dimensional vector function $p = (p_x, p_y, p_t) : V \to \mathbb{R}^3$, where
$p_t$ defines a feasible schedule ($p_t(v_i)$: starting time of $v_i$),
$p_x(v_i)$, $p_y(v_i)$: coordinates of $v_i$ on $H$.
[Bazargan]

Algorithms:
1. First fit
2. Best fit
3. Packing
4. Simulated annealing (KAMER)
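A feasibility check for a candidate temporal placement can be sketched as follows. The data layout (dictionaries keyed by node name) and the function name `is_feasible` are assumptions for illustration; the three checks correspond to staying inside $H$, respecting the DFG precedences through $p_t$, and never sharing area at the same time.

```python
from itertools import combinations

def _overlaps(a0, a1, b0, b1):
    """Half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a0 < b1 and b0 < a1

def is_feasible(nodes, edges, placement, Hx, Hy):
    """nodes: name -> (width, height, run_time); edges: (u, v) means v depends on u;
    placement: name -> (px, py, pt)."""
    # 1. Every module lies inside the device and starts at a non-negative time.
    for name, (w, h, _) in nodes.items():
        px, py, pt = placement[name]
        if px < 0 or py < 0 or pt < 0 or px + w > Hx or py + h > Hy:
            return False
    # 2. Precedence: a successor starts only after its predecessor has finished.
    for u, v in edges:
        if placement[u][2] + nodes[u][2] > placement[v][2]:
            return False
    # 3. No two modules occupy the same area at the same time.
    for u, v in combinations(nodes, 2):
        (wu, hu, tu), (wv, hv, tv) = nodes[u], nodes[v]
        (xu, yu, su), (xv, yv, sv) = placement[u], placement[v]
        if (_overlaps(xu, xu + wu, xv, xv + wv)
                and _overlaps(yu, yu + hu, yv, yv + hv)
                and _overlaps(su, su + tu, sv, sv + tv)):
            return False
    return True

nodes = {"a": (2, 2, 3), "b": (2, 2, 2), "c": (4, 2, 1)}
edges = [("a", "c"), ("b", "c")]
good = {"a": (0, 0, 0), "b": (2, 0, 0), "c": (0, 0, 3)}
bad = {"a": (0, 0, 0), "b": (2, 0, 0), "c": (0, 0, 2)}  # c starts before a ends
```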
First/Best Fit Algorithm

First fit:
1. Select a node among the ready nodes, i.e. nodes whose predecessors have all been placed.
2. Place it in the first free location.
Advantages: Fast (linear w.r.t. the number of free locations)
Disadvantages: Very high amount of unused resources; fragmentation
First/Best Fit Algorithm

Best fit:
1. Select a node among the ready nodes.
2. Place it in the best free location.
Advantages: Better area utilization
Disadvantages: Much slower (the complete list of free locations must be searched); fragmentation
[Figure labels: CLBs, Frames]
Simplifying the problem: Restrict the modules to columns (Virtex-II) or to part of a column (Virtex-4 / Virtex-5 / Virtex-6).
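The difference between the two strategies can be sketched on a list of free rectangles. The slides do not define what "best" means; the sketch below assumes "least leftover area", and the tuple layout `(x, y, width, height)` is also an assumption.

```python
def first_fit(free, w, h):
    """Return the first free rectangle (x, y, fw, fh) that can hold a w x h module."""
    for rect in free:
        if rect[2] >= w and rect[3] >= h:
            return rect
    return None

def best_fit(free, w, h):
    """Scan the whole list; return the fitting rectangle with the least leftover area."""
    fitting = [r for r in free if r[2] >= w and r[3] >= h]
    return min(fitting, key=lambda r: r[2] * r[3] - w * h, default=None)

# Hypothetical free list on some device.
free = [(0, 0, 8, 8), (8, 0, 3, 4), (8, 4, 5, 3)]
```

For a 3 x 3 module, first fit stops at the large 8 x 8 rectangle and fragments it, whereas best fit inspects every entry and picks the tight 3 x 4 rectangle.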
First Fit Algorithm

Algorithm: First-fit temporal placement of clusters
1. While not all clusters are placed do
1.1. Select one cluster $C_{act}$ from the list of ready-to-run clusters
1.2. From the clusters already placed, select the cluster $C_{top}$ with the smallest finishing time such that $C_{top} \geq C_{act}$
1.3. Place the cluster $C_{act}$ on top of $C_{top}$
2. end while
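The loop above can be sketched in Python under some simplifying assumptions that are not in the slides: cluster size is a single number, the comparison between $C_{top}$ and $C_{act}$ is a size comparison, the empty device is modelled as base slots with finishing time 0, and the first ready cluster in input order is always selected.

```python
def first_fit_clusters(clusters, base_slots):
    """clusters: name -> (size, duration, predecessors).
    base_slots: sizes of the initially empty device regions (assumed model).
    Returns name -> (slot index, start time, finishing time)."""
    # For every slot, remember size and finishing time of the cluster on top.
    tops = [(size, 0) for size in base_slots]
    placed = {}
    while len(placed) < len(clusters):
        ready = [c for c, (_, _, preds) in clusters.items()
                 if c not in placed and all(p in placed for p in preds)]
        if not ready:
            raise ValueError("cyclic dependencies between clusters")
        act = ready[0]                                    # step 1.1
        size, duration, preds = clusters[act]
        fitting = [i for i, (top_size, _) in enumerate(tops) if top_size >= size]
        if not fitting:
            raise ValueError(f"no placed cluster can carry {act}")
        slot = min(fitting, key=lambda i: tops[i][1])     # step 1.2
        start = max([tops[slot][1]] + [placed[p][2] for p in preds])
        placed[act] = (slot, start, start + duration)     # step 1.3
        tops[slot] = (size, start + duration)
    return placed

clusters = {
    "c0": (4, 3, []),
    "c1": (2, 5, []),
    "c2": (2, 2, ["c0"]),
    "c3": (2, 1, ["c1"]),
}
schedule = first_fit_clusters(clusters, base_slots=[4, 2])
```

Once the small cluster `c2` sits on top of `c0`, only clusters of size 2 or less can follow in that slot. This is one way to see why first fit leaves many resources unused.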
First Fit Algorithm

Configuration sequence produced:
a) {0, 1, 2, 3, 4}
b) {5, 1, 2, 3, 4}
c) {5, 1, 6, 7, 4}
d) {5, 1, 6, 7, 8}
e) {5, 9, 6, 7, 8}
f) {10, 9, 6, 7, 8}
ILP
ILP (integer linear programming): the placement problem can be formulated as a constraint set plus an objective function.
Advantage: Exact solution
Limitation: Only practical for small problem sizes.
Packing Approach
Variations of the packing problem:
1. Base Minimization Problem (BMP): Given a set of boxes B and a height H, find a container with minimal size (x, y, H) that can accommodate the set of boxes B.
In temporal placement: find the device with minimum size (x, y) on which a set of components can be implemented, given an overall run-time t = H.
2. Strip Packing Problem (SPP): Given a set of boxes and a base (X, Y), find the minimum height h such that a container of size (X, Y, h) can hold all the boxes.
In temporal placement: find the minimum run-time t = h for a set of components, given a reconfigurable device with size (X, Y).
The device is usually fixed, so SPP is the more common formulation.
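For SPP, a simple lower bound on the run-time is easy to compute. This is the standard volume argument, not a method from the slides: the total volume of the boxes cannot exceed X * Y * h, and h cannot be shorter than the longest single component. The function name is an assumption.

```python
def spp_lower_bound(boxes, X, Y):
    """boxes: list of (width, height, run_time). Returns a lower bound on h."""
    for w, h, _ in boxes:
        if w > X or h > Y:
            raise ValueError("a component does not fit on the device at all")
    volume = sum(w * h * t for w, h, t in boxes)
    longest = max((t for _, _, t in boxes), default=0)
    return max(volume / (X * Y), longest)

# Hypothetical components on a 4 x 4 device.
bound = spp_lower_bound([(4, 4, 2), (4, 4, 3), (2, 2, 1)], X=4, Y=4)
```

No placement can finish earlier than this bound, so it is a cheap sanity check for the result of any of the placement heuristics.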
KAMER
KAMER Model of an RCS

Larger picture: An RFU operation $r_i$ can be either accepted or rejected, based on the availability of RFU real estate.
If rejected, it runs on the CPU, with a running-time penalty.
Assumption: No communication among RFUs. After running an operation on the RFU, the results are saved in CPU registers.
KAMER Model of an RCS
On-line temporal placement

$RFUOPS = \{r_1, \ldots, r_n \mid r_i = (w_i, h_i, s_i, e_i)\}$
$e_i - s_i$: time span during which the operation is resident in the system
Placement of RFUOPS: modeled as 3D template placement
Empty Rectangle (ER): a rectangle that does not overlap any placed module on the chip
Maximum Empty Rectangle (MER): an ER that is not included in any other empty rectangle
Example
Non-ER: (E,F)
ER: (E,D): MER; (A,D): MER; (E,C): not MER
KAMER method (keeping all maximum empty rectangles):
Permanently keeps track of all MERs.
Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v.
There may be many MERs in which v can fit. Strategies: first fit or best fit.
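To make the MER definition concrete, the sketch below enumerates all MERs of a small occupancy grid by brute force and then searches the list first-fit or best-fit. KAMER itself maintains the list incrementally; the brute-force enumeration, the grid model and the function names are assumptions for illustration only.

```python
def is_empty(grid, x0, y0, x1, y1):
    """True if every cell of the rectangle [x0, x1) x [y0, y1) is free (0)."""
    return all(grid[y][x] == 0 for y in range(y0, y1) for x in range(x0, x1))

def maximal_empty_rectangles(grid):
    """All MERs of a small grid as (x, y, width, height) tuples."""
    H, W = len(grid), len(grid[0])
    mers = []
    for y0 in range(H):
        for y1 in range(y0 + 1, H + 1):
            for x0 in range(W):
                for x1 in range(x0 + 1, W + 1):
                    if not is_empty(grid, x0, y0, x1, y1):
                        continue
                    # Maximal: cannot grow by one row or column in any direction.
                    grows = ((x0 > 0 and is_empty(grid, x0 - 1, y0, x1, y1)) or
                             (x1 < W and is_empty(grid, x0, y0, x1 + 1, y1)) or
                             (y0 > 0 and is_empty(grid, x0, y0 - 1, x1, y1)) or
                             (y1 < H and is_empty(grid, x0, y0, x1, y1 + 1)))
                    if not grows:
                        mers.append((x0, y0, x1 - x0, y1 - y0))
    return mers

def find_mer(mers, w, h, best=False):
    """First-fit (default) or best-fit (smallest area) search of the MER list."""
    fitting = [m for m in mers if m[2] >= w and m[3] >= h]
    if not fitting:
        return None
    return min(fitting, key=lambda m: m[2] * m[3]) if best else fitting[0]

# 4 x 3 device with one 2 x 2 module in a corner (1 = occupied).
grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0]]
mers = maximal_empty_rectangles(grid)
```

The two MERs found here overlap in the corner cell region, which is exactly what distinguishes the KAMER list from a non-overlapping decomposition of the free space.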
KAMER
Once a rectangle is chosen:
Candidate points are those for which the component does not overlap the area outside the rectangle.
Problem: The number of empty rectangles does not grow linearly with the number of components placed.
KAMER
Run-time of free rectangle placement: $O(n^2)$
Heuristic: Keep only the non-overlapping empty rectangles.
Linear time, lower quality
Problem 1: Several different non-overlapping rectangle representations exist for the same free space.
Keeping Non-Overlapping ERs

Problem 2: Non-overlapping empty rectangles are not necessarily maximal.
A module may exist that could fit onto the device but cannot be placed.
[Figure: two decompositions of the free space. Panels: "Can fit" and "Can't fit (bad decomposition)".]

[Figure: Problem 2, continued. Panels: "Can't fit" and "Can fit".]

Incremental update: Whenever a new component $v_1$ is placed in a rectangle, there are two possibilities for splitting the remaining free space:
Horizontal splitting
Vertical splitting
The choice can have a negative impact on the placement of the next components.

[Figure: with the horizontal split the next component can't fit; with the vertical split it can fit.]
Solution: Delay the split decision for a number of steps.
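The effect of the split decision can be sketched as follows. The slides do not define the two splits geometrically; this sketch assumes the module sits in the bottom-left corner of the free rectangle, that a "horizontal" split cuts along the module's top edge across the full width, and that a "vertical" split cuts along its right edge across the full height.

```python
def split(free, w, h, mode):
    """Place a w x h module at the bottom-left corner of free = (x, y, fw, fh)
    and return the two non-overlapping empty rectangles that remain."""
    x, y, fw, fh = free
    if w > fw or h > fh:
        raise ValueError("module does not fit")
    if mode == "horizontal":   # cut along the module's top edge
        return [(x + w, y, fw - w, h), (x, y + h, fw, fh - h)]
    if mode == "vertical":     # cut along the module's right edge
        return [(x + w, y, fw - w, fh), (x, y + h, w, fh - h)]
    raise ValueError(f"unknown split mode: {mode}")

def fits(rects, w, h):
    """True if a w x h module fits into at least one of the rectangles."""
    return any(rw >= w and rh >= h for _, _, rw, rh in rects)

free = (0, 0, 6, 6)
horizontal = split(free, 4, 4, "horizontal")
vertical = split(free, 4, 4, "vertical")
```

A tall 2 x 6 module fits only after the vertical split and a wide 6 x 2 module only after the horizontal one, although the free area is identical in both cases. Delaying the decision keeps both options open.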

KAMER always finds a rectangle in which to place a new component, if one exists.
However, the position of the component within the rectangle must be selected from a set of points whose number is, in the worst case, the area of the rectangle.
In [Bazargan00]: bottom-left position

Note: No communication is assumed between the component to be placed and those already placed.
If communication exists, an optimal algorithm should consider every single point. Solution: [Ahmadinia04]
KAMER Off-Line Placement

Uses simulated annealing in a 3D space:
1. Place the X% largest modules.
X is a parameter of the algorithm.
Idea: Remove the small modules to open space for the larger ones. Small modules cause more fragmentation.
Initial placement: from KAMER best fit.
Perturbations:
Can displace a module on the RFU (no change in start or end time).
Can move an operation from the CPU to the RFU and vice versa.
2. Use zero-temperature SA for fine tuning.
Place as many of the remaining (100-X)% modules as possible.
Packing Classes
Interval graph: Given a DFG $G = (V, E)$, a reconfigurable device $H$ and a packing of $V$ into $H$ (i.e. a 3-D placement of the components of $V$ on the device $H$), an interval graph of $G$ is a graph $G_i = (V, E_i)$, $i \in \{1, 2, 3\}$, such that $(v_k, v_l) \in E_i \iff$ the projections of the nodes $v_k$ and $v_l$ overlap in the $i$-th dimension.
Complement graph: the complement of the interval graph, $\bar{G}_i = (V, \bar{E}_i)$.
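The interval graphs of a given 3-D placement can be built directly from the definition. The sketch below assumes each box is described by one `(start, length)` pair per dimension and that edges are stored as name-sorted pairs; these conventions are illustrative assumptions.

```python
from itertools import combinations

def interval_graphs(boxes):
    """boxes: name -> ((x, w), (y, h), (t, d)), one (start, length) pair per dimension.
    Returns [E_1, E_2, E_3]; E_i holds the pairs whose projections overlap in dimension i."""
    graphs = [set(), set(), set()]
    for u, v in combinations(sorted(boxes), 2):
        for i in range(3):
            (a, la), (b, lb) = boxes[u][i], boxes[v][i]
            if a < b + lb and b < a + la:       # open-interval overlap test
                graphs[i].add((u, v))
    return graphs

def complement(nodes, edges):
    """Edge set of the complement graph over the same nodes."""
    return {p for p in combinations(sorted(nodes), 2) if p not in edges}

boxes = {
    "a": ((0, 2), (0, 2), (0, 3)),
    "b": ((2, 2), (0, 2), (0, 3)),   # to the right of a, same time
    "c": ((0, 2), (0, 2), (3, 2)),   # same place as a, but later
}
E1, E2, E3 = interval_graphs(boxes)
```

In this example every pair of boxes is separated in at least one dimension, so no pair appears in all three edge sets.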
Packing Class
Packing class: A $d$-tuple of interval graphs $G_i = (V, E_i)$ with the following properties:

C1: Each independent set $S$ of $G_i$ is $i$-feasible, $\forall i \in \{1, 2, 3\}$: $w_i(S) = \sum_{v \in S} w_i(v) \leq h_i$ (each set of boxes must fit in the container in the $i$-th direction).
Independent set $S$: $\forall v, w \in S$, $v$ overlaps with $w$ in dimension $(3 - i)$, $i \in \{1, 2\}$.
C2: $\bigcap_{i=1}^{d} E_i = \emptyset$. There must be at least one dimension in which the boxes do not overlap.
The $G_i$ are called component graphs of $E = \{E_1, E_2, E_3\}$.
$w_i(v)$ = length of $v$ in the $i$-th direction; $h_i$ = size of the container in the $i$-th direction.
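Conditions C1 and C2 can be checked by brute force on small instances. The sketch below is exponential in the number of boxes and does not verify that each $G_i$ is actually an interval graph; the function name and data layout are assumptions for illustration.

```python
from itertools import combinations

def is_packing_class(nodes, widths, graphs, container):
    """widths[v][i]: length of v in dimension i; graphs[i]: edge set E_i with
    name-sorted pairs; container[i]: container size h_i. Checks C1 and C2 only."""
    names = sorted(nodes)
    # C2: no pair of boxes may overlap in every dimension.
    if any(all(p in E for E in graphs) for p in combinations(names, 2)):
        return False
    # C1: every independent set of G_i must fit into the container along i.
    for i, E in enumerate(graphs):
        for k in range(1, len(names) + 1):
            for S in combinations(names, k):
                independent = all(p not in E for p in combinations(S, 2))
                if independent and sum(widths[v][i] for v in S) > container[i]:
                    return False
    return True

nodes = ["a", "b", "c"]
widths = {"a": (2, 2, 3), "b": (2, 2, 3), "c": (2, 2, 2)}
graphs = [
    {("a", "c")},                               # E_1: overlap in x
    {("a", "b"), ("a", "c"), ("b", "c")},       # E_2: overlap in y
    {("a", "b")},                               # E_3: overlap in time
]
# Same x and y overlaps, but a and c now also overlap in time.
clashing = [graphs[0], graphs[1], {("a", "b"), ("a", "c")}]
```

With a 4 x 2 device and a run-time of 5 the tuple is a packing class. Shrinking the device to width 3 violates C1, and letting `a` and `c` overlap in time as well violates C2.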
Packing Class
[Figure: invalid two-dimensional packing]
Packing Class
Theorem: A $d$-tuple of graphs $G_i = (V, E_i)$ corresponds to a feasible packing if and only if it is a packing class.
Comparability graph: Given a DFG $G = (V, E)$, a reconfigurable device $H$ and a packing of $V$ into $H$, the comparability graph of an interval graph $G_i = (V, E_i)$, $i \in \{1, 2\}$, is the directed graph $\vec{G}_i = (V, \vec{E}_i)$, where $(v_k, v_l) \in \vec{E}_i \iff v_l$ is placed after $v_k$ in the 3rd dimension, i.e. $p_t(v_k) + t_k \leq p_t(v_l)$.
The relation "placed after" defined by the comparability graph is a transitive relation, also known as a transitive orientation, that can be used for orienting the packing classes.
Packing Class Orientation

Packing class orientation: Given $G = (V, E)$, $H$ and a packing of $V$ on $H$, the orientation of the packing class corresponding to the packing is defined by constructing the comparability graph of the interval graph in the time dimension.
Algorithm
We can check whether a given arrangement of nodes forms a valid packing with the precedence constraints fulfilled.
Build arrangements to which the formulas can be applied to check whether the arrangement is a solution.
Given a set of available solutions, the optimal one must be extracted.
Arrangements can be constructed by arbitrarily fixing or changing the edges of the different graphs as well as their orientations.
Solution space: the set of all possible arrangements that can be built. It can be extremely large.
Algorithm
Incremental approach: constructs a tree from the root to the leaves.
The root consists of the set of available nodes.
Edges are inserted into the already existing graph.
Two different insertions of edges on the same node lead to two different branches in the tree.
In each step:
A new edge is included into the graphs.
The validity of the resulting placement is checked.
If the resulting placement is not valid, all possible arrangements in which the introduced edges exist are discarded from the search. Otherwise, a new edge is added to the constructed graph.
Whenever a branch does not lead to a valid leaf, the complete subtree resulting from the introduction of the new branch is discarded.
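The pruning idea can be illustrated with a strongly simplified search. The sketch below decides, one edge at a time, whether a pair of boxes overlaps in a given dimension, and cuts a subtree as soon as a pair overlaps in every dimension (condition C2). The complete validity check described in the slides would also test C1 and the precedence constraints; the function name and the counters are assumptions for illustration.

```python
from itertools import combinations

def count_arrangements(names, d=2):
    """Depth-first edge insertion with pruning on C2 violations only."""
    decisions = [(p, i) for p in combinations(sorted(names), 2) for i in range(d)]
    stats = {"valid_leaves": 0, "pruned_subtrees": 0}

    def descend(k, graphs):
        if k == len(decisions):
            stats["valid_leaves"] += 1
            return
        pair, i = decisions[k]
        for present in (False, True):          # two branches per decision
            if present:
                graphs[i].add(pair)
                if all(pair in E for E in graphs):
                    # The pair overlaps in every dimension: discard the subtree.
                    stats["pruned_subtrees"] += 1
                    graphs[i].discard(pair)
                    continue
            descend(k + 1, graphs)
            if present:
                graphs[i].discard(pair)

    descend(0, [set() for _ in range(d)])
    return stats

two_boxes = count_arrangements(["a", "b"])
three_boxes = count_arrangements(["a", "b", "c"])
```

Without pruning, three boxes in two dimensions would produce 2^6 = 64 leaves. Cutting each invalid branch at the moment the offending edge is inserted leaves 27 arrangements to examine.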
References
[Bobda07] C. Bobda, Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications, Springer, 2007.
[Hauck08] S. Hauck, A. DeHon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, Morgan Kaufmann, 2007.
[Mehdipour06] F. Mehdipour, M. Saheb Zamani, M. Sedighi, "An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems," Microprocessors and Microsystems, Elsevier, vol. 30, 2006, pp. 52-62.
[Bazargan00] K. Bazargan, R. Kastner and M. Sarrafzadeh, "Fast Template Placement for Reconfigurable Computing Systems," IEEE Design and Test, Special Issue on Reconfigurable Computing, January-March 2000.
[Bazargan] Bazargan, Fast Placement Methods for Reconfigurable Computing Systems, http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/
[Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, "A new approach for on-line placement on reconfigurable devices," in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004.
