HARDWARE MODELING LANGUAGES
• Hardware description languages are primarily motivated by the need to specify circuits. Several HDLs exist, with different features and goals.
• Some evolved from programming languages, like AHPL (A Hardware Programming Language), which was based upon APL (A Programming Language), and VHDL (VHSIC Hardware Description Language), which was derived from Ada.
• Ada is a structured, statically typed, imperative and object-oriented high-level programming language, extended from Pascal and other languages.
• HDLs must also cope with the specific nature of hardware circuits, which is fairly different from what commonly used software programming languages describe.
(Figure: circuit models, synthesis and optimization, a simplified view.)

Difference between Hardware and Software Languages
• Concurrency: hardware circuits can execute operations with a wide degree of concurrency, so a hardware language is closer to a programming language for parallel computers; software programs are most commonly executed on uniprocessors, and hence their operations are serialized.
• Structure: hardware circuits entail structural information; the interface of a circuit with its environment may require the definition of input/output ports and of the data formats for input and output, so HDLs must support both behavioral and structural views to be used efficiently for circuit specification.
• Timing: the detailed timing of the operations is very important in hardware, because of interface requirements; the specific execution time frames of the operations in software programs are of less concern.

• Circuits can be modeled under different views, and consequently HDLs support different modeling styles: architectural- and logic-level modeling, behavioral or structural views.
• Some languages support combined views, thus allowing a designer to specify implementation details for selected parts.
• Synthesis tools support the computer-aided transformation of behavioral models into structural ones, at both the architectural and logic levels.
• HDLs also serve the purpose of exchange formats among tools and designers.
• Circuit models require validation by simulation or verification methods, and synthesis methods use HDL models as a starting point. As a result, several goals must be fulfilled by HDLs.
• The multiple goals of HDLs cannot be achieved by programming languages applied to hardware specification. Standard programming languages have been used for functional modeling of processors that can be validated by compiling and executing the models.
• The enhancement of the C programming language for simulation and synthesis has led to new HDL languages, such as ESIM.
• We use the term HDL model as the counterpart of the term program in software.

Distinctive Features of Hardware Languages
• A language can be characterized by its syntax, semantics and pragmatics.
• The syntax relates to the structure of the language and can be specified by a grammar.
• The semantics relates to the meaning of the language: the semantic rules associate actions with the language fragments that satisfy the syntax rules.
• The pragmatics relate to the other aspects of the language, including implementation issues.
• Languages can be broadly classified as procedural and declarative languages.
• Procedural programs specify the desired action by describing a sequence of steps whose order of execution matters; declarative models express the problem to be solved by a set of declarations without detailing a solution method. The sketch below illustrates the difference.
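A minimal sketch of the procedural/declarative distinction, using Python only as a neutral notation; the full-adder example and the names full_adder_equations and full_adder are illustrative assumptions, not taken from the text.

```python
# Declarative style: a set of equations relating outputs to inputs.
# The order of the entries carries no meaning; there is no notion of "executing" them.
full_adder_equations = {
    "s":    "a ^ b ^ cin",
    "cout": "(a & b) | (cin & (a ^ b))",
}

# Procedural style: a sequence of assignments whose order of execution matters,
# since intermediate values must exist before they are used.
def full_adder(a: int, b: int, cin: int):
    p = a ^ b
    s = p ^ cin
    cout = (a & b) | (cin & p)
    return s, cout
```

Both describe the same memoryless (combinational) behavior; only the specification style differs.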
• Alternative classifications of languages exist. Languages with an imperative semantics are those where there is an underlying dependence between the assignments and the values that the variables take. Languages with an applicative semantics are those based on function invocation.
• Languages for hardware specification can also be classified on the basis of the description view that they support (e.g., physical, structural or behavioral). Most HDLs support both structural and behavioral views, because circuit specifications often require both.

Structural Hardware Languages
• Models in structural languages describe an interconnection of components. Hence their expressive power is similar to that of circuit schematics.
• Hierarchy is often used to make the description modular and compact. An example is the VHDL language, used through its structural modeling capability.

Behavioral Hardware Languages
• We consider behavioral modeling for circuits of increasing levels of complexity. Combinational logic circuits can be described by a set of ports (inputs and outputs) and a set of equations that relate variables to logic expressions.
• The declarative paradigm applies best to combinational circuits, which are by definition memoryless.
• These models differ from structural models in that there is not a one-to-one correspondence between expressions and logic gates, because for some expressions there may not exist a single gate implementing them.
• Procedural languages can also be used to describe combinational logic circuits. Most procedural hardware languages, except for the early ones, allow for multiple assignments to variables.

Structures and Logic Networks
• Structural representations can be modeled in terms of incidence structures. An incidence structure consists of a set of modules, a set of nets and an incidence relation between modules and nets.
• A simple model for the structure is a hypergraph, where the vertices correspond to the modules and the edges to the nets. The incidence relation is then represented by the corresponding incidence matrix.
• Note that a hypergraph is equivalent to a bipartite graph, where the two sets of vertices correspond to the modules and the nets.
• Consider the example of Figure 3.5(a); the corresponding hypergraph and bipartite graphs are shown in Figures 3.5(b) and (c), respectively. A module-oriented netlist is the following (see also the sketch after this subsection):
  m1: n1, n2, n3
  m2: n1, n2
  m3: n2, n3
• Incidence structures can be made hierarchical in the following way. A leaf module is a primitive with a set of pins. A non-leaf module consists of a set of modules, called its submodules, a set of nets and an incidence structure relating the nets to the pins of the module itself and to those of the submodules. In the example, module m2 is hierarchical: it consists of submodules m21 and m22, subnets n21 and n22 and internal pins p21, p22, p23, p24 and p25.
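A minimal sketch of the module-oriented netlist above represented as an incidence structure; the Python representation (a module-to-nets dictionary, the derived bipartite edge list and incidence matrix) is an illustrative choice, not a prescribed format.

```python
netlist = {                      # module -> nets incident to it
    "m1": ["n1", "n2", "n3"],
    "m2": ["n1", "n2"],
    "m3": ["n2", "n3"],
}

# Equivalent bipartite view: one vertex per module, one per net,
# and an edge whenever a net is incident to a module.
bipartite_edges = [(m, n) for m, nets in netlist.items() for n in nets]

# Incidence matrix: rows = modules, columns = nets, entry 1 if incident.
modules = sorted(netlist)
nets = sorted({n for ns in netlist.values() for n in ns})
incidence = [[1 if n in netlist[m] else 0 for n in nets] for m in modules]

# The net-oriented view (net -> modules) is obtained by transposing the relation.
net_view = {n: [m for m in modules if n in netlist[m]] for n in nets}
print(net_view)   # {'n1': ['m1', 'm2'], 'n2': ['m1', 'm2', 'm3'], 'n3': ['m1', 'm3']}
```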
Logic Networks
• A generalized logic network is a structure where each leaf module is associated with a combinational or sequential logic function. While this concept is general and powerful, we consider here two restrictions of this model: the combinational logic network and the synchronous logic network.
• The combinational logic network, also called logic network or Boolean network, is a hierarchical structure where:
  • Each leaf module is associated with a multiple-input, single-output combinational logic function, called a local function.
  • Pins are partitioned into two classes, called inputs and outputs. Pins that do not belong to submodules are also partitioned into two classes, called primary inputs and primary outputs.
  • Each net has a distinguished terminal, called a source, and an orientation from the source to the other terminals. The source of a net can be either a primary input or an output of a module at the inferior level.
• In general, a logic network is a hybrid structural/behavioral representation, because the incidence structure provides the structure while the local logic functions denote the terminal behavior of the leaf modules.
• In most cases, logic networks are used to represent multiple-input, multiple-output logic functions in a structured way. Indeed, a logic network has a corresponding unique input/output combinational logic function.

State Diagrams
• The behavioral view of sequential circuits at the logic level can be expressed by finite-state machine transition diagrams. A finite-state machine is defined by:
  • A set of primary input patterns, X.
  • A set of primary output patterns, Y.
  • A set of states, S.
  • A state transition function, δ: X × S → S.
  • An output function, λ: X × S → Y for Mealy models or λ: S → Y for Moore models.
  • An initial state.
• The state transition table is a tabulation of the state transition and output functions. Its corresponding graph-based representation is the state transition diagram.
• The state transition diagram is a labeled directed multi-graph G(V, E), where the vertex set V is in one-to-one correspondence with the state set S and the directed edge set E is in one-to-one correspondence with the transitions specified by δ. In the Mealy model, an edge leaving state si under input x is labeled by x/λ(x, si); in the Moore model, that edge is labeled by x only.

Data-flow and Sequencing Graphs
• We describe here a Mealy-type finite-state machine that acts as a synchronizer between two signals. The primary inputs are a and b, plus the reset signal r. There is one primary output o that is asserted when both a and b are simultaneously true or when one is true and the other was true at some previous time.
• The finite-state machine has four states: a reset state s0; a state memorizing that a was true while b was false, called s1, and a similar one for b, called s2; finally, a state corresponding to both a and b being, or having been, true, called s3. The state transitions and the output function are annotated on the diagram. A small sketch of this machine is given below.
• State transition diagrams can be made hierarchical: a calling vertex of a diagram at the higher level in the hierarchy stands for a complete diagram at the lower level. Each transition to a calling vertex is equivalent to a transition into the entry state of the corresponding finite-state machine diagram, and a transition into an exit state corresponds to a return to the calling vertex.
• A hierarchical state transition diagram is shown in Figure 3.10. There are two levels in the diagram: the top level has three states, the other four. A transition into a calling state of the top level is equivalent to a transition into the entry state of the lower level of the hierarchy, and a transition into the exit state of the lower level corresponds to a transition back to the top level. In simple words, the dotted edges of the diagrams are traversed immediately.
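A minimal sketch of the synchronizer described above, in Mealy style; the state names s0..s3 follow the text, while the encoding of the inputs as (a, b, r) values and the function name next_state_and_output are illustrative assumptions.

```python
def next_state_and_output(state, a, b, r):
    """Return (next_state, o) for one clock step of the synchronizer."""
    if r:                                  # reset dominates
        return "s0", 0
    if state == "s0":
        if a and b:   return "s3", 1       # simultaneously true
        if a:         return "s1", 0       # remember that a has been seen
        if b:         return "s2", 0       # remember that b has been seen
        return "s0", 0
    if state == "s1":                      # a already seen, waiting for b
        return ("s3", 1) if b else ("s1", 0)
    if state == "s2":                      # b already seen, waiting for a
        return ("s3", 1) if a else ("s2", 0)
    return "s3", 1                         # both have been true: o stays asserted

# Example run: a arrives first, then b; o is asserted from the step where b arrives.
state = "s0"
for a, b, r in [(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]:
    state, o = next_state_and_output(state, a, b, r)
    print(state, o)
```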
COMPILATION AND BEHAVIORAL OPTIMIZATION
• We explain in this section how circuit models, described by HDL programs, can be transformed into the abstract models that will be used as a starting point for synthesis.
• Most hardware compilation techniques have analogues in software compilation. Since hardware synthesis followed the development of software compilers, many techniques were borrowed and adapted from the rich field of compiler design. Nevertheless, some behavioral optimization techniques are applicable only to hardware synthesis.
• A software compiler consists of a front end that transforms a program into an intermediate form and a back end that translates the intermediate form into the machine code for a given architecture. The front end is language dependent, and the back end varies according to the target machine. Most modern optimizing compilers improve the intermediate form, so that the optimization is neither language nor machine dependent.
• Similarly, a hardware compiler can be seen as consisting of a front end, an optimizer and a back end. The back end is much more complex than that of a software compiler, because of the requirements on the timing and interfacing of the internal operations. The back end exploits several techniques that go under the generic names of architectural synthesis, logic synthesis and library binding.

Compilation Techniques
• The front end of a compiler is responsible for lexical and syntax analysis, parsing and creation of the intermediate form.
• A lexical analyzer is the component of a compiler that reads the source model and produces as output a set of tokens that the parser then uses for syntax analysis. A lexical analyzer may also perform ancillary tasks, such as stripping comments and expanding macros.
• A parser receives a set of tokens. Its first task is to verify that they satisfy the syntax rules of the language. The parser has knowledge of the grammar of the language and it generates a set of parse trees. A parse tree is a tree-like representation of the syntactic structure of the source text according to that grammar.
• Whereas the front ends of software and hardware compilers are very similar, the subsequent steps may be fairly different. In particular, for hardware languages, diverse strategies are used according to their semantics and intent.

Optimization Techniques
• Behavioral optimization is a set of semantic-preserving transformations that minimize the amount of information needed to specify the partial order of tasks.
• No knowledge about the circuit implementation style is required at this stage. The latitude in applying such optimizations depends on the freedom to rearrange the intermediate code; models that are highly constrained to adhere to a time schedule or to an operator binding may benefit very little from the following techniques.
• Behavioral optimization can be implemented in different ways: it can be applied directly to the parse trees, during the generation of the intermediate form, or on the intermediate form itself. For the sake of explanation, we consider here these transformations as applied to sequences of statements, i.e., as program-level transformations.
• Algorithms for behavioral optimization of HDL models can be classified as data-flow and control-flow oriented.

DATA-FLOW-BASED TRANSFORMATIONS
• Tree-height reduction. This transformation applies to arithmetic expression trees and strives to split expressions into two-operand subexpressions and balance them, so that the parallelism available in hardware can be exploited at best. It can be seen as a local transformation, applied to each compound arithmetic statement; a small sketch follows.
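A minimal sketch of tree-height reduction on a chain of additions such as x = a + b + c + d: a left-to-right evaluation needs 3 sequential adder steps, while a balanced tree needs only 2. The tuple representation and the function names balance and height are illustrative assumptions.

```python
def balance(operands):
    """Return a balanced binary expression tree (as nested tuples) over '+'."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("+", balance(operands[:mid]), balance(operands[mid:]))

def height(tree):
    """Number of operator levels, i.e., sequential steps if each level runs in parallel."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))

chain = ("+", ("+", ("+", "a", "b"), "c"), "d")     # ((a + b) + c) + d
balanced = balance(["a", "b", "c", "d"])            # (a + b) + (c + d)
print(height(chain), height(balanced))              # 3 2
```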
Constant and variable propagation
• Constant propagation, also called constant folding, consists of detecting constant operands and pre-computing the value of the operation with that operand. Since the result may again be a constant, the new constant can be propagated to those operations that use it as input.
• Example. Consider the following fragment: a = 0; b = a + 1; c = 2 * b;. It can be replaced by a = 0; b = 1; c = 2;. (A sketch combining this transformation with dead code elimination appears after the block-level transformations below.)

Dead code elimination
• Dead code consists of all those operations that cannot be reached, or whose result is never referenced elsewhere. Such operations are detected by data-flow analysis and removed. Obvious cases are statements following a procedure return statement.
• Example 3.4.9. Consider the following fragment: a = x; b = x + 1; c = 2 * x;. If variable a is not referenced in the subsequent code, the first assignment can be removed.

Operator strength reduction
• Operator strength reduction means reducing the cost of implementing an operator by using a simpler one. Even though in principle some notion of the hardware implementation is required, very often general considerations apply. For example, a multiplication by 2 (or by a power of 2) can be replaced by a shift; shifters are faster and smaller than multipliers in most implementations.
• Example 3.4.10. Consider the following fragment: a = x^2; b = 3 * x;. It can be replaced by a = x * x; t = x << 1; b = x + t;.

CONTROL-FLOW-BASED TRANSFORMATIONS
• The following transformations are typical of hardware compilers. In some cases these transformations are automated, in others they are user driven.

Model expansion
• Writing structured models by exploiting subroutines and functions is useful for two main reasons: modularity and re-usability. Modularity helps in highlighting a particular task (or set of tasks). Often, however, models are called only once; in that case the called model can be expanded (in-lined) into the calling one, widening the scope of the subsequent optimizations.
• Example 3.4.12. Consider the following fragment: x = a + b; y = a * b; z = foo(x, y);, where foo(p, q) { t = q - p; return(t); }. By expanding foo, we obtain x = a + b; y = a * b; z = y - x;.

Conditional expansion
• A conditional construct can always be transformed into a parallel construct with a test at the end. Under some circumstances this transformation can increase the performance of the circuit; for example, this happens when the conditional clause depends on some late-arriving signals.
• Example 3.4.13. Consider the following fragment: y = ab; if (a) { x = b + d; } else { x = bd; }. The conditional statement can be flattened to x = a(b + d) + a'bd, and by some logic manipulation the fragment can be rewritten as y = ab; x = y + d(a + b);.

Loop expansion
• Loop expansion, or unrolling, applies to an iterative construct with data-independent exit conditions. The loop is replaced by as many instances of its body as the number of iterations.
• Example 3.4.14. Consider the following fragment: x = 0; for (i = 1; i <= 3; i++) { x = x + a[i]; }. The loop can be flattened to x = 0; x = x + a[1]; x = x + a[2]; x = x + a[3]; and then transformed to x = a[1] + a[2] + a[3] by propagation.

Block-level transformations
• Branching and iterative constructs segment the intermediate code into basic blocks; such blocks correspond to the sequencing graph entities.
• Block-level transformations include block merging and the expansion of conditionals and loops; even though model expansion was not considered in the original formulation of these transformations, the extension is straightforward. Collapsing blocks may provide more parallelism and enhance the average performance, and five transformations have been proposed to find the optimum number of expansions to be performed.
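A minimal sketch of the data-flow transformations above (constant folding followed by dead code elimination) on straight-line code; the three-address representation, the OPS table and the function names are illustrative assumptions, not a prescribed intermediate form.

```python
import operator
OPS = {"+": operator.add, "*": operator.mul}

def fold_constants(stmts):
    """stmts: list of (target, expr), where expr is a constant, a name, or (op, lhs, rhs)."""
    env, out = {}, []
    for target, expr in stmts:
        if isinstance(expr, tuple):
            op, a, b = expr
            a = env.get(a, a)                      # substitute known constants
            b = env.get(b, b)
            expr = OPS[op](a, b) if isinstance(a, int) and isinstance(b, int) else (op, a, b)
        if isinstance(expr, int):
            env[target] = expr                     # remember the constant for later uses
        out.append((target, expr))
    return out

def eliminate_dead(stmts, live):
    """Drop assignments whose target is never referenced later and is not in `live`."""
    out, needed = [], set(live)
    for target, expr in reversed(stmts):
        if target in needed:
            out.append((target, expr))
            needed.discard(target)
            if isinstance(expr, tuple):
                needed.update(x for x in expr[1:] if isinstance(x, str))
        # otherwise the assignment is dead and is removed
    return list(reversed(out))

# a = 0; b = a + 1; c = 2 * b;  -->  c = 2  (a and b become dead if only c is live)
code = [("a", 0), ("b", ("+", "a", 1)), ("c", ("*", 2, "b"))]
print(eliminate_dead(fold_constants(code), live={"c"}))   # [('c', 2)]
```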
UNIT-3 Architectural Synthesis
• Architectural synthesis means constructing the macroscopic structure of a digital circuit, starting from behavioral models that can be captured by data-flow or sequencing graphs.
• The outcome of architectural synthesis is both a structural view of the circuit, in particular of its data path, and a logic-level specification of its control unit.
• Architectural synthesis may be performed in many ways, according to the desired circuit implementation style. Therefore a large variety of problems, algorithms and tools have been proposed that fall under the umbrella of architectural synthesis.
• Circuit implementations are evaluated on the basis of the following objectives: area, cycle-time (i.e., the clock period) and latency (i.e., the number of cycles needed to perform all operations), as well as throughput (i.e., the computation rate).

CIRCUIT SPECIFICATIONS FOR ARCHITECTURAL SYNTHESIS
• Specifications for architectural synthesis include behavioral-level circuit models, details about the resources being used and constraints. Behavioral models are captured by sequencing graphs.

Resources
• Functional resources process data. They implement arithmetic or logic functions and can be grouped into two subclasses:
  • Primitive resources are subcircuits that are designed carefully once and often used. Examples are arithmetic units and some standard logic functions, such as encoders and decoders. Primitive resources can be stored in libraries.
  • Application-specific resources are subcircuits that solve a particular subtask. An example is a subcircuit servicing a particular interrupt of a processor. In general, such resources are the implementations of other HDL models.
• Memory resources store data. Examples are registers and read-only and read-write memory arrays. Requirements for storage resources are implicit in the sequencing graph model.
• Interface resources support data transfer. Interface resources include busses, which may be used as a major means of communication inside a data path. External interface resources are I/O pads and interfacing circuits.
• The major decisions in architectural synthesis are often related to the usage of functional resources. When formulating architectural synthesis and optimization problems, there is no difference between primitive and application-specific functional resources: both types can be characterized in terms of area and performance and used as building blocks.

Constraints
• Constraints in architectural synthesis can be classified into two major groups: interface constraints and implementation constraints.
• Interface constraints are additional specifications to ensure that the circuit can be embedded in a given environment. They relate to the format and timing of the I/O data transfers. The timing separation of I/O operations can be specified by timing constraints, e.g., to ensure that a synchronous I/O operation executes in a prescribed cycle.
• Implementation constraints reflect the desire of the designer to achieve a structure with some properties. Examples are area constraints and performance constraints, e.g., cycle-time and/or latency bounds.

THE FUNDAMENTAL ARCHITECTURAL SYNTHESIS PROBLEMS
• We consider now the fundamental problems in architectural synthesis and optimization. We assume that a circuit is specified by:
  • A sequencing graph.
  • A set of functional resources, fully characterized in terms of area and execution delays.
  • A set of constraints.
• We assume for now that storage is implemented by registers and interconnections by wires; the usage of internal memory arrays and busses is not considered at this stage.
• We shall consider first non-hierarchical graphs whose operations have bounded and known execution delays, and then present extensions to hierarchical models and to unbounded delays.
• Sequencing graphs are polar and acyclic, the source and sink vertices being labeled v0 and vn respectively, where n = nops + 1.
• Architectural synthesis and optimization consists of two stages: first, placing the operations in time and in space, i.e., determining the time interval for their execution and their binding to resources; second, determining the detailed interconnections of the data path and the logic-level specification of the control unit.
• We show now that the first stage is equivalent to annotating the sequencing graph with additional information.

The Temporal Domain: Scheduling
• We denote the execution delays of the operations by the set D = {di; i = 0, 1, ..., n}. We assume that the delay of the source and sink vertices is zero.
• We define the start time of an operation as the time at which the operation starts its execution. The start times of the operations are represented by the set T = {ti; i = 0, 1, ..., n}.
• Scheduling is the task of determining the start times, subject to the precedence constraints specified by the sequencing graph.
• The latency of a scheduled sequencing graph is denoted by λ, and it is the difference between the start time of the sink and the start time of the source, i.e., λ = tn - t0.
• A scheduled sequencing graph is a vertex-weighted sequencing graph, where each vertex is labeled by its start time. A schedule may have to satisfy timing and/or resource usage constraints. Different scheduling algorithms have been proposed, addressing unconstrained and constrained problems. (A small sketch of unconstrained scheduling and latency computation appears after the Hierarchical Models discussion below.)
• Example: all operations are assumed to have unit execution delay. A scheduled sequencing graph is shown in Figure 4.3, and the start times of the operations are summarized in the accompanying table. The latency of the schedule is λ = tn - t0 = 5 - 1 = 4.
• Consider again the sequencing graph of Figure 4.2, where all operations have unit execution delay. With a bound on the resource usage of one resource per type, the scheduled sequencing graph is the one shown in Figure 4.4, and the latency of the schedule is λ = tn - t0 = 8 - 1 = 7.

Hierarchical Models
• When hierarchical graphs are considered, the concepts of scheduling and binding must be extended accordingly.
• A hierarchical schedule can be defined by associating a start time with each vertex in each graph entity. The start times are now relative to that of the source vertex in the corresponding graph entity. The start times of the link vertices denote the start times of the sources of the linked graphs.
• The latency computation of a hierarchical sequencing graph with bounded-delay operations can be performed by traversing the hierarchy bottom up:
  • The execution delay of a model call vertex is the latency of the corresponding graph entity.
  • The execution delay of a branching vertex is the maximum of the latencies of the corresponding bodies.
  • The execution delay of an iteration vertex is the latency of its body times the maximum number of iterations.
• A hierarchical binding can be defined as the ensemble of the bindings of each graph entity, restricted to the operation vertices. Operations in different entities may share resources; whereas this can be beneficial in improving the area and performance of the circuit, it also complicates the synthesis problem.
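A minimal sketch of unconstrained (as-soon-as-possible) scheduling and latency computation on a non-hierarchical sequencing graph, matching the definitions above (start times ti, delays di, latency = tn - t0). The graph, the delays and the function name asap_schedule are illustrative assumptions.

```python
def asap_schedule(succ, delay, source, sink):
    """succ: dict vertex -> list of successors (precedence edges).
       delay: dict vertex -> execution delay (0 for source and sink).
       Returns (start_times, latency)."""
    t = {source: 0}
    placed = {source}
    vertices = set(succ) | {v for vs in succ.values() for v in vs}
    while len(placed) < len(vertices):
        for v in vertices - placed:
            preds = [u for u in vertices if v in succ.get(u, [])]
            if all(u in placed for u in preds):
                # earliest start time that obeys every precedence constraint
                t[v] = max(t[u] + delay[u] for u in preds)
                placed.add(v)
    return t, t[sink] - t[source]

# Example: unit-delay operations v1..v4; v1 and v2 feed v3; v3 feeds v4.
succ = {"v0": ["v1", "v2"], "v1": ["v3"], "v2": ["v3"], "v3": ["v4"], "v4": ["vn"]}
delay = {"v0": 0, "v1": 1, "v2": 1, "v3": 1, "v4": 1, "vn": 0}
t, latency = asap_schedule(succ, delay, "v0", "vn")
print(t, latency)    # v1, v2 start at 0; v3 at 1; v4 at 2; latency 3
```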
The Synchronization Problem
• There are operations whose delay is unbounded and not known at synthesis time. Examples are external synchronization and data-dependent iteration.
• Scheduling unbounded-latency sequencing graphs cannot be done with traditional techniques. Different methods can be used; the simplest one is to modify the sequencing graph by isolating the unbounded-delay operations, i.e., by splitting the graph into bounded-latency subgraphs that can then be scheduled. Techniques exist for isolating the synchronization points and implementing them in the control unit.

SCHEDULING ALGORITHMS
• Scheduling is a very important problem in architectural synthesis. Whereas a sequencing graph prescribes only the dependencies among the operations, the scheduling of a sequencing graph determines the precise start time of each task.
• The start times must satisfy the original dependencies of the sequencing graph, which limit the amount of parallelism of the operations.
• Scheduling determines the concurrency of the resulting implementation, and therefore it affects its performance. By the same token, the maximum number of concurrent operations of any given type at any step of the schedule is a lower bound on the number of required hardware resources of that type.

A MODEL FOR THE SCHEDULING PROBLEMS
• We recall that the sequencing graph is a polar directed acyclic graph Gs(V, E), where the vertex set V = {vi; i = 0, 1, ..., n} is in one-to-one correspondence with the set of operations and the edge set E = {(vi, vj); i, j = 0, 1, ..., n} represents the dependencies. We recall also that n = nops + 1 and that we denote the source vertex by v0 and the sink by vn; both are No-Operations. Let D = {di; i = 0, 1, ..., n} be the set of operation execution delays.

UNIT-5 Physical Design

FLOORPLANNING
• The input to floorplanning is the output of system partitioning and design entry: a netlist.
• As feature sizes decrease, both average interconnect delay and average gate delay decrease, but at different rates, because interconnect capacitance tends to a limit that is independent of scaling. Interconnect delay now dominates gate delay.

Floorplanning Goals and Objectives
• Floorplanning is a mapping between the logical description (the netlist) and the physical description (the floorplan).
• The goals of floorplanning are to:
  • arrange the blocks on a chip,
  • decide the location of the I/O pads,
  • decide the location and number of the power pads,
  • decide the type of power distribution, and
  • decide the location and type of clock distribution.
• The objectives of floorplanning are to minimize the chip area and to minimize delay.

Measurement of Delay in Floorplanning
• At this stage we do not yet know the parasitic capacitance of the interconnect; we know only the fanout (FO) of a net and the size of the block. Fan-out is the maximum number of digital inputs that the output of a single logic gate can feed.
• We therefore estimate the interconnect delay from predicted-capacitance tables (wire-load tables), which give the expected net capacitance as a function of fanout and block size. (A small lookup sketch is given at the end of this section.)

Floorplanning Tools
• We start with a random floorplan generated by a floorplanning tool, normally containing both flexible blocks and fixed blocks.
• Key terms and concepts: seeding, seed cells, wildcard symbol, hard seed, soft seed, seed connectors, rat's nest, bundles, flight lines, congestion, aspect ratio, die cavity, congestion map, routability, interconnect channels, channel capacity, channel density.

Channel Definition
• Key terms and concepts: channel definition (or channel allocation), channel ordering, slicing floorplan, cyclic constraint, switch box, merge, selective flattening, routing order.
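A minimal sketch of delay estimation from a wire-load (predicted-capacitance) table, as mentioned under "Measurement of Delay in Floorplanning". The table contents, the linear delay model and all numbers are illustrative assumptions, not real library data.

```python
WIRE_LOAD_TABLE = {     # fanout -> predicted net capacitance in pF (hypothetical values)
    1: 0.02,
    2: 0.04,
    4: 0.09,
    8: 0.20,
}

def predicted_capacitance(fanout):
    """Look up the predicted net capacitance; fall back to the largest table entry."""
    for k in sorted(WIRE_LOAD_TABLE):
        if fanout <= k:
            return WIRE_LOAD_TABLE[k]
    return WIRE_LOAD_TABLE[max(WIRE_LOAD_TABLE)]

def estimated_delay(fanout, drive_resistance_kohm=1.0, intrinsic_delay_ns=0.1):
    """Simple linear delay model: intrinsic delay plus R * C_predicted (kOhm * pF = ns)."""
    return intrinsic_delay_ns + drive_resistance_kohm * predicted_capacitance(fanout)

print(estimated_delay(3))   # falls into the fanout-4 table entry
```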