
System Partitioning

Siamak Mohammadi
University of Tehran

Partitioning Overview

System functionality is implemented on system components: ASICs, processors, memories, buses

Two design tasks:
Allocate system components or ASIC constraints
Partition functionality among components

Constraints: cost, performance, size, power

Partitioning is a central system design task

S. Mohammadi

HW/SW Codesign of Embedded Systems
Tehran University

Outline

Structural vs. functional partitioning
Natural vs. executable language specifications
Basic partitioning issues and algorithms
Functional partitioning techniques for hardware
Hardware/software partitioning
Functional partitioning techniques for software
Exploring tradeoffs with functional partitioning

Structural vs. Functional Partitioning

Structural: Implement specification, then partition
Functional: Partition function, then implement
Enables better size/performance tradeoffs
Uses fewer objects, better for algorithms
Permits hardware/software solutions
But it's harder than graph partitioning


Natural vs. Executable Language Specification

Alternative methods for specifying functionality
Natural languages common in practice
Executable languages becoming popular
Automated estimation/partitioning explores solutions
Early verification reduces costly late changes
Precision eases integration


Basic Partitioning Issues

Specification-abstraction level: input definition
Just indicating the language is insufficient
Abstraction-level indicates amount of design already done
e.g. task DFG, tasks, CDFG, FSMD

Granularity: specification size in each object
Fine granularity yields more possible designs
Coarse granularity better for computation, designer interaction
e.g. tasks, procedures, statement blocks, statements

Component allocation: types and numbers
e.g. ASICs, processors, memories, buses

Output: format and uses
e.g. new specification, hints to synthesis tool

Basic Partitioning Issues (cont.)

Metrics and estimations: "good" partition attributes
e.g. cost, speed, power, size, pins, testability, reliability
Estimates derived from quick, rough implementation
Speed and accuracy are competing goals of estimation

Objective and closeness functions
Combines multiple metric values
Closeness used for grouping before complete partition
Weighted sum common

Algorithms: control strategies seeking best partition
Constructive creates partition
Iterative improves partition
Key is to escape local minimum

Specification Abstraction Levels

Task-level dataflow graph
A dataflow graph where each operation represents a task

Task
Each task is described as a sequential program

Arithmetic-level dataflow graph
A dataflow graph of arithmetic operations along with some control operations
The most common model used in the partitioning techniques

Finite state machine (FSM) with datapath
A finite state machine, with possibly complex expressions being computed in a state or during a transition

Register transfers
The transfers between registers for each machine state are described

Structure
A structural interconnection of physical components
Often called a netlist

Granularity Issues in Partitioning

The granularity of the decomposition is a measure of the size of the specification in each object
The specification is first decomposed into functional objects, which are then partitioned among system components

Coarse granularity means that each object contains a large amount of the specification
Fine granularity means that each object contains only a small amount of the specification
Many more objects
More possible partitions
Better optimizations can be achieved

System Component Allocation

The process of choosing system component types from among those allowed, and selecting a number of each to use in a given design
The set of selected components is called an allocation
Various allocations can be used to implement a specification, each differing primarily in monetary cost and performance
Allocation is typically done manually or in conjunction with a partitioning algorithm

A partitioning technique must designate the types of system components to which functional objects can be mapped
ASICs, memories, etc.

Output Issue in Partitioning

Any partitioning technique must define the representation format and potential use of its output

E.g., the format may be a list indicating which functional object is mapped to which system component
E.g., the output may be a revised specification
Containing structural objects for the system components
Defining a component's functionality using the functional objects mapped to it

Metrics and Estimation Issues

A technique must define the attributes of a partition that determine its quality
Such attributes are called metrics
Examples include monetary cost, execution time, communication bit-rates, power consumption, area, pins, testability, reliability, program size, data size, and memory size
Closeness metrics are used to predict the benefit of grouping any two objects

Need to compute a metric's value
Because all metrics are defined in terms of the structure (or software) that implements the functional objects, it is difficult to compute costs as no such implementation exists during partitioning

Computation of Metrics

Two approaches to computing metrics

Creating a detailed implementation
Produces accurate metric values
Impractical as it requires too much time

Creating a rough implementation
Includes the major register transfer components of a design
Skips details such as precise routing or optimized logic, which require much design time
Determining metric values from a rough implementation is called estimation

Metrics in HW/SW Partitioning

Two key metrics are used in hardware/software partitioning
Performance: Generally improved by moving objects to hardware
Hardware size: Generally improved by moving objects out of hardware

Objective and Closeness Functions

Multiple metrics, such as cost, power, and performance, are weighed against one another
An expression combining multiple metric values into a single value that defines the quality of a partition is called an objective function
The value returned by such a function is called cost
Because many metrics may be of varying importance, a weighted sum objective function is used
e.g., Objfct = k1 * area + k2 * delay + k3 * power
Because constraints always exist on each design, they must be taken into account
e.g., Objfct = k1 * F(area, area_constr) + k2 * F(delay, delay_constr) + k3 * F(power, power_constr)
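
A minimal sketch of a constraint-aware weighted-sum objective function in Python. The slides do not define F, so the form used here (the fraction by which a metric exceeds its constraint, zero if the constraint is met) is an assumption, as are the names `violation` and `objfct` and all the numbers.

```python
def violation(value, constraint):
    """Assumed form of F: fraction by which a metric exceeds its constraint (0 if met)."""
    return max(0.0, (value - constraint) / constraint)


def objfct(metrics, constraints, weights):
    """Weighted-sum objective function; a lower cost means a better partition."""
    return sum(weights[name] * violation(metrics[name], constraints[name])
               for name in weights)


# Hypothetical constraints and weights (k1, k2, k3 in the slide's notation).
constraints = {"area": 1000.0, "delay": 50.0, "power": 2.0}
weights = {"area": 1.0, "delay": 2.0, "power": 1.0}

# Estimated metrics for two candidate partitions (made-up numbers).
meets_all = {"area": 900.0, "delay": 40.0, "power": 1.5}
too_slow = {"area": 800.0, "delay": 60.0, "power": 1.5}

cost_ok = objfct(meets_all, constraints, weights)    # no constraint is violated
cost_slow = objfct(too_slow, constraints, weights)   # delay exceeds its constraint by 20%
```

With this choice of F, a partition that meets every constraint has cost zero, and the weights decide how strongly each violated constraint is penalized.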

Partitioning Algorithm Issue

Given a set of functional objects and a set of system components, a partitioning algorithm searches for the best partition, which is the one with the lowest cost, as computed by an objective function
While the best partition can be found through exhaustive search, this method is impractical because of the inordinate amount of computation and time required
The essence of a partitioning algorithm is the manner in which it chooses the subset of all possible partitions to examine
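
To make the cost of exhaustive search concrete, here is a small Python sketch. It assumes every object may be mapped to any component, so n objects and m components give m^n candidate partitions. The function names, the toy cost function and its numbers are illustrative assumptions.

```python
from itertools import product


def search_space_size(n_objects, n_components):
    """Number of ways to assign each object to one of the components."""
    return n_components ** n_objects


def exhaustive_best(objects, components, cost):
    """Evaluate every assignment and return the one with the lowest cost."""
    best, best_cost = None, float("inf")
    for assignment in product(components, repeat=len(objects)):
        partition = dict(zip(objects, assignment))
        c = cost(partition)
        if c < best_cost:
            best, best_cost = partition, c
    return best, best_cost


# Toy cost (an assumption): hardware area if mapped to "hw", software time otherwise.
hw_area = {"a": 5, "b": 1, "c": 4}
sw_time = {"a": 2, "b": 6, "c": 3}


def toy_cost(partition):
    return sum(hw_area[o] if comp == "hw" else sw_time[o]
               for o, comp in partition.items())


best, best_cost = exhaustive_best(["a", "b", "c"], ["hw", "sw"], toy_cost)
```

Three objects and two components give only 8 candidates, but 50 objects already give 2^50 (about 10^15), which is why practical algorithms examine only a subset.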


Partitioning Algorithm Classes

Constructive algorithms
Group objects into a complete partition
Use closeness metrics to group objects, hoping for a good partition
Spend computation time constructing a small number of partitions

Iterative algorithms
Modify a complete partition in the hope that such modifications will improve the partition
Use an objective function to evaluate each partition
Yield more accurate evaluations than closeness functions used by constructive algorithms

In practice, a combination of constructive and iterative algorithms is often employed

Iterative Partitioning Algorithm

The computation time in an iterative algorithm is spent evaluating large numbers of partitions
Iterative algorithms differ from one another primarily in the ways in which they modify the partition and in which they accept or reject bad modifications
The goal is to find global minimum while performing as little computation as possible
(A: local minimum)


Typical Partitioning-System Configuration


Basic Partitioning Algorithms

Clustering and multi-stage clustering
Group migration (a.k.a. min-cut or Kernighan/Lin)
Ratio cut
Simulated annealing
Genetic evolution
Integer linear programming
Tabu Search

Hierarchical Clustering

Constructive algorithm using closeness metrics

Overview
Groups closest objects
Recomputes closenesses
Repeats until termination condition met

Cluster tree maintains history of merges
Cutline across the tree defines a partition

Hierarchical Clustering Algorithm
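
The overview can be sketched in Python as follows. This is an illustration, not the algorithm from the original slide: the termination condition (a target number of groups), the rule for closeness between clusters (maximum over member pairs) and the closeness values are all assumptions.

```python
def hierarchical_clustering(objects, closeness, num_groups):
    """Repeatedly merge the two closest clusters until num_groups remain."""
    clusters = [frozenset([o]) for o in objects]
    history = []  # cluster tree, recorded as the sequence of merges

    def cluster_closeness(a, b):
        # Assumed rule: closeness of two clusters is the largest closeness
        # between any member of one and any member of the other.
        return max(closeness(x, y) for x in a for y in b)

    while len(clusters) > num_groups:
        # Recompute closeness for every pair and pick the closest pair.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                c = cluster_closeness(clusters[i], clusters[j])
                if best is None or c > best[0]:
                    best = (c, i, j)
        _, i, j = best
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, history


# Hypothetical pairwise closeness values for four objects.
table = {
    frozenset("ab"): 30, frozenset("ac"): 10, frozenset("ad"): 5,
    frozenset("bc"): 15, frozenset("bd"): 8, frozenset("cd"): 25,
}
clusters, history = hierarchical_clustering(
    "abcd", lambda x, y: table[frozenset((x, y))], num_groups=2)
```

Here `a` and `b` merge first (closeness 30), then `c` and `d` (closeness 25). Stopping at two groups corresponds to drawing the cutline across the cluster tree at that level.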


Hierarchical Clustering Example


Simulated Annealing

Iterative algorithm modeled after physical annealing process

Overview
Starts with initial partition and temperature
Slowly decreases temperature
For each temperature, generates random moves
Accepts any move that improves cost
Accepts some bad moves, less likely at low temperatures

Results and complexity depend on temperature decrease rate
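
A minimal Python sketch of the steps listed in the overview. The acceptance rule exp(-delta/temp), the geometric cooling schedule, the parameter defaults and the toy two-way graph partitioning problem are assumptions chosen for illustration, not details taken from the slides.

```python
import math
import random


def simulated_annealing(partition, cost, random_move, temp=10.0, cooling=0.9,
                        min_temp=0.01, moves_per_temp=50, rng=None):
    """Return the best partition seen while slowly lowering the temperature."""
    rng = rng or random.Random(0)
    current, current_cost = partition, cost(partition)
    best, best_cost = current, current_cost
    while temp > min_temp:
        for _ in range(moves_per_temp):
            candidate = random_move(current, rng)
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept a worse move with a
            # probability that shrinks as the temperature drops.
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= cooling  # assumed geometric temperature decrease
    return best, best_cost


# Toy problem: split six objects into two groups, cutting few edges
# while keeping the groups balanced.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def cut_cost(partition):
    cut = sum(1 for a, b in EDGES if partition[a] != partition[b])
    imbalance = abs(2 * sum(partition) - len(partition))
    return cut + 3 * imbalance


def flip_one(partition, rng):
    i = rng.randrange(len(partition))
    return partition[:i] + (1 - partition[i],) + partition[i + 1:]


initial = (0, 1, 0, 1, 0, 1)
best, best_cost = simulated_annealing(initial, cut_cost, flip_one)
```

A slower cooling rate (a value of `cooling` closer to 1) examines more partitions and tends to give better results at the price of more computation.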

Simulated Annealing Algorithm

Functional partitioning for hardware: BUD

Goal: incorporate area/time into synthesis
Provide accurate estimations of area and delay to guide high level synthesis decisions

Clusters CDFG operations into datapath modules
BUD takes a single process CDFG as input, and partitions the arithmetic operations into groups

Closeness metrics: interconnecting wires, concurrency, shared hardware

Each cluster is scheduled and allocated
Selects clustering with best area/time

BUD Example

Functional Partitioning for Hardware: Aparty

Extends BUD clustering to multiple stages
Different closeness metrics for each stage

Closeness metrics: control transfer reduction, data transfer reduction, hardware sharing

Aparty Example

Hardware Software Partitioning

Definition
The process of deciding, for each subsystem, whether the required functionality is more advantageously implemented in hardware or software

Goal
To achieve a partition that will give us the required performance within the overall system requirements (in size, weight, power, cost, etc.)

Partitioning the system into dedicated hardware components and software components executing on Instruction Set Processors is a vital part of the codesign process

Hardware Software Partitioning

Combined hardware/software systems are common
Software is cheap, modifiable, and quick to design
Hardware is fast
Special algorithms are needed to favor software

Proposed algorithms

Greedy [GD92]
Only accept moves that decrease cost
Can get trapped in local minima

Hill climbing [EHB94]
Allow moves in directions increasing cost (retracing)
Through use of stochastic functions
Can escape local minima
E.g., simulated annealing

Binary-constraint search with hill climbing [VGG93]

HW/SW Partitioning Issues

Partitioning into hardware and software affects overall system cost and performance

Hardware implementation
Provides higher performance via hardware speeds and parallel execution of operations
Incurs additional expense of fabricating ASICs

Software implementation
May run on high-performance processors at low cost (due to high-volume production)
Incurs high cost of developing and maintaining (complex) software

Partitioning Approaches

Start with all functionality in software and move portions into hardware which are time-critical and cannot be allocated to software (software-oriented partitioning)
Start with all functionality in hardware and move portions into software implementation (hardware-oriented partitioning)
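
The software-oriented approach can be sketched as a simple greedy loop in Python. This is an illustration under stated assumptions, not the method of any particular tool: execution times are assumed to add up sequentially (no concurrency or communication cost), the selection rule (largest time saved per unit of hardware area) is one possible heuristic, and all names and numbers are hypothetical.

```python
def software_oriented_partition(objects, sw_time, hw_time, hw_area, time_constraint):
    """Start all-software; move objects to hardware until the time constraint holds."""
    hw, sw = set(), set(objects)

    def exec_time():
        # Assumed timing model: total time is the plain sum over all objects.
        return sum(hw_time[o] for o in hw) + sum(sw_time[o] for o in sw)

    while exec_time() > time_constraint and sw:
        # Assumed heuristic: move the object that saves the most time per unit area.
        best = max(sorted(sw), key=lambda o: (sw_time[o] - hw_time[o]) / hw_area[o])
        sw.remove(best)
        hw.add(best)
    return hw, sw, exec_time()


sw_time = {"filter": 40, "fft": 50, "ui": 5, "log": 5}
hw_time = {"filter": 5, "fft": 10, "ui": 4, "log": 4}
hw_area = {"filter": 70, "fft": 50, "ui": 20, "log": 20}
objects = list(sw_time)

hw_loose, sw_loose, t_loose = software_oriented_partition(
    objects, sw_time, hw_time, hw_area, time_constraint=70)
hw_tight, sw_tight, t_tight = software_oriented_partition(
    objects, sw_time, hw_time, hw_area, time_constraint=30)
```

A tighter time constraint pushes more objects into hardware. If the constraint cannot be met even with everything in hardware, the loop ends with an empty software set and a returned time that still exceeds the constraint. A hardware-oriented variant would start with everything in hardware and move non-critical objects out while the constraint still holds.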


For example, Cosyma, a cosynthesis approach to design, starts with all functions generated in software and then moves operations to hardware only as time constraints are violated.

A team at the University of Braunschweig, Germany, explored codesign tradeoffs in systems that were originally implemented in software (written in C)
A partitioning program identified the computational bottlenecks and migrated the corresponding functions to application-specific hardware
With system-level partitioning, a critical loop which took up 90% of the software execution time for an HDTV Chromakey algorithm was implemented in hardware (as a 17,000 gate ASIC), leading to a 3X speedup

A team at Stanford University explored migrating hardware components to software routines
Identifying non-critical tasks which can be migrated from hardware to software implementations leads to significant size and cost reductions without reducing performance
A system model which specified performance requirements in terms of latency and data-rate constraints was used to support the partitioning
A 20% reduction in gate count was achieved

[DeMicheli94]

System Partitioning (Functional)

System partitioning in the context of hardware/software codesign is also referred to as functional partitioning
Partitioning functional objects among system components is done as follows
The system's functionality is described as a collection of indivisible functional objects
Each system component's functionality is implemented in either hardware or software
An important advantage of functional partitioning is that it allows hardware/software solutions


Functional Partitioning for Systems

Vulcan I
Partitions CDFG operations among hardware only
Group migration and simulated annealing algorithms

Vulcan II
Partitions operations among hardware/software
Architecture: processor, hardware, memory, bus
All communication through memory
Uses greedy algorithm, extracts behaviors from hardware

Cosyma
Partitions statement blocks among hardware/software
Architecture: processor, hardware, memory, bus
Simulated annealing, extracts behaviors from software

Functional Partitioning for Systems

Solves three partitioning problems
Behaviors to processors/ASICs
Variables to memories
Communication channels to buses

Uses fast incremental-update estimators
Covers both hardware and hardware/software partitioning

Binding Software to Hardware

Binding: assigning software to hardware components
After parallel implementation of assigned modules, all design threads are joined for system integration
Early binding commits a design process to a certain course
Late binding, on the other hand, provides greater flexibility for last minute changes

Partitioning Metrics

Deterministic estimation techniques
Can be used only with a fully specified model with all data dependencies removed and all component costs known
Result in very good partitions

Statistical estimation techniques
Used when the model is not fully specified
Based on the analysis of similar systems and certain design parameters

Profiling techniques
Examine control flow and data flow within an architecture to determine computationally expensive parts which are better realized in hardware

HW/SW System Architecture Trends

Some operations in special-purpose hardware
Generally take the form of a coprocessor communicating with the CPU over its bus
Computation must be long enough to compensate for the communication overhead
May be implemented totally in hardware to avoid instruction interpretation overhead
Utilize high-level synthesis algorithms to generate a register transfer implementation from a behavior description

Partitioning algorithms are closely related to the process scheduling model used for the software side of the implementation

HW/SW Partition Formal Definition

A hardware/software partition is defined using two sets $H$ and $S$, where $H \subseteq O$, $S \subseteq O$, $H \cup S = O$, $H \cap S = \emptyset$

Associated metrics:
$Hsize(H)$ is the size of the hardware needed to implement the functions in $H$ (e.g., number of transistors)
$Performance(G)$ is the total execution time for the group of functions in $G$ for a given partition $\{H, S\}$
Set of performance constraints, $Cons = (C_1, \ldots, C_m)$, where $C_j = \{G, timecon\}$ indicates the maximum execution time allowed for all the functions in group $G$, and $G \subseteq O$

Performance Satisfying Partition

A performance satisfying partition is one for which $Performance(C_j.G) \le C_j.timecon$ for all $j = 1, \ldots, m$
Given $O$ and $Cons$, the hardware/software partitioning problem is to find a performance satisfying partition $\{H, S\}$ such that $Hsize(H)$ is minimized
The all-hardware size of $O$ is defined as the size of an all-hardware partition (i.e., $Hsize(O)$)

Exploring Tradeoffs with Functional Partitioning

Each line represents a different vendor's chip set
Each point represents an allocation and partition
Many designs quickly examined
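
The formal problem statement given earlier (sets H and S, Hsize, and the performance constraints Cons) can be illustrated with a brute-force Python sketch. It assumes O is the set of functions to be partitioned, that Performance(G) is the plain sum of per-function execution times, and that Hsize is the sum of per-function areas. The slides do not specify how either metric is computed, and all names and numbers below are hypothetical.

```python
from itertools import combinations


def performance(group, H, hw_time, sw_time):
    """Assumed model: execution time of a group is the sum over its functions."""
    return sum(hw_time[f] if f in H else sw_time[f] for f in group)


def hsize(H, area):
    """Assumed model: hardware size is the sum of per-function areas."""
    return sum(area[f] for f in H)


def is_performance_satisfying(H, cons, hw_time, sw_time):
    """Check Performance(Cj.G) <= Cj.timecon for every constraint Cj."""
    return all(performance(G, H, hw_time, sw_time) <= timecon
               for G, timecon in cons)


def min_hardware_partition(O, cons, hw_time, sw_time, area):
    """Try every subset H of O; keep the satisfying one with the smallest Hsize."""
    best = None
    for r in range(len(O) + 1):
        for subset in combinations(sorted(O), r):
            H = frozenset(subset)
            if is_performance_satisfying(H, cons, hw_time, sw_time):
                if best is None or hsize(H, area) < hsize(best, area):
                    best = H
    if best is None:
        return None  # no performance satisfying partition exists
    return best, frozenset(O) - best


O = {"f1", "f2", "f3"}
sw_time = {"f1": 10, "f2": 20, "f3": 5}
hw_time = {"f1": 2, "f2": 4, "f3": 1}
area = {"f1": 100, "f2": 150, "f3": 30}
cons = [({"f1", "f2"}, 20), ({"f3"}, 5)]  # each Cj = (G, timecon)

H, S = min_hardware_partition(O, cons, hw_time, sw_time, area)
all_hardware_size = hsize(O, area)
```

In this toy instance the all-software partition violates the first constraint, and moving only `f2` to hardware is the cheapest way to satisfy both.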


Summary

Partitioning heavily influences design quality
Functional partitioning is necessary
Executable specification enables: automation, exploration, documentation
Variety of algorithms exist
Variety of techniques exist for different applications

Future Directions

Metrics from real design to guide partitioning
Comparison of functional partitioning algorithms
Impact of metric selections and orderings
Impact of granularity on partition quality
Exploitation of regularity in partitioning

Tabu Search

Tabu search allows the search to explore solutions that do not decrease the objective function value, but only in those cases where these solutions are not forbidden.
This is usually achieved by keeping track of the most recent solutions in terms of the actions used to transform one solution into the next.
When an action is performed, it is considered tabu for the next T iterations, where T is the tabu status length.
A solution is forbidden if it is obtained by applying a tabu action to the current solution.

Basic Tabu Search Algorithm

k := 1.
Generate initial solution s.
WHILE the stopping condition is not met DO
    Identify N(s). (Neighbourhood set)
    Identify T(s,k). (Tabu set)
    Identify A(s,k). (Aspirant set)
    Choose the best s' in N(s,k) = {N(s) - T(s,k)} + A(s,k).
    Memorize s' if it improves the previous best known solution.
    s := s'.
    k := k+1.
END WHILE
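
A minimal Python sketch of the basic tabu search loop above. The aspiration rule (a tabu move is allowed if it beats the best known solution), the fixed iteration limit used as stopping condition, and the toy hardware/software cost function are assumptions. `tenure` plays the role of the tabu status length T.

```python
def tabu_search(initial, cost, moves, apply_move, tenure=3, max_iters=100):
    """Return the best solution found and its cost."""
    s = initial
    best, best_cost = initial, cost(initial)
    tabu_until = {}  # move -> last iteration at which the move is still tabu
    for k in range(1, max_iters + 1):
        candidates = []
        for m in moves(s):                      # N(s): neighbourhood
            s_new = apply_move(s, m)
            c = cost(s_new)
            is_tabu = tabu_until.get(m, 0) >= k  # T(s,k): tabu set
            # A(s,k): assumed aspiration rule, tabu moves that beat the best.
            if not is_tabu or c < best_cost:
                candidates.append((c, m, s_new))
        if not candidates:
            break
        c, m, s = min(candidates, key=lambda t: t[0])  # best s', even if worse than s
        tabu_until[m] = k + tenure              # the move is tabu for the next T iterations
        if c < best_cost:                       # memorize s' if it is a new best
            best, best_cost = s, c
    return best, best_cost


# Toy problem: bit i is 1 if object i is in hardware. The cost is hardware
# area plus a penalty of 10 per time unit above the limit (all values made up).
SW_TIME, HW_TIME, AREA, LIMIT = [8, 6, 4, 2], [2, 2, 1, 1], [5, 4, 6, 3], 12


def toy_cost(x):
    time = sum(HW_TIME[i] if b else SW_TIME[i] for i, b in enumerate(x))
    area = sum(AREA[i] for i, b in enumerate(x) if b)
    return area + 10 * max(0, time - LIMIT)


def flip(x, i):
    return x[:i] + (1 - x[i],) + x[i + 1:]


best, best_cost = tabu_search((0, 0, 0, 0), toy_cost,
                              moves=lambda x: range(len(x)), apply_move=flip)
```

Because the best neighbour is taken even when it is worse than the current solution, the search can leave a local minimum, while the tabu list keeps it from immediately undoing its last moves.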

