Abstract
As the asynchronous design style becomes popular, the demand for asynchronous high-level synthesis (AHLS) tools is increasing continuously. In this paper, a method called the process-oriented method, which automatically generates distributed asynchronous control circuits in a hierarchical and systematic manner, is proposed as a part of an AHLS tool. Experimental results show that the proposed method is efficient in terms of the area and performance of the derived control circuits.
1 Introduction
In the last decade, the asynchronous design style has become popular due to potential advantages such as freedom from clock skew, high performance, low power consumption, low EMI and ease of modular design[1]. However, since most existing CAD tools, which are indispensable for designing large and complex circuits, target synchronous systems, there is a great need for design automation tools suited to asynchronous system design. Until recently, most work on CAD tools for asynchronous systems has focused on logic synthesis of hazard-free asynchronous control circuits from signal transition graphs (STGs) or asynchronous finite state machines (AFSMs)[2, 3, 4]. However, as the asynchronous design style becomes widespread, research on high-level synthesis tasks such as scheduling, allocation, resource binding and control circuit generation is increasingly required. Although there are many achievements in synchronous HLS, those results cannot be directly applied to asynchronous systems due to inherent features of asynchronous circuits such as the absence of a global clock. In particular, scheduling and automatic control circuit generation are steps that
[Figure residue, panels (a)-(c): example CDFG composed of DFG-units, control nodes, conditional nodes (if, while, endif) and child blocks. Annotations include resource constraints (adder: 1, alu: 1, multiplier: 1), bound units (adder1, alu1, multiplier1), scheduled start/end times and HW dependencies.]
2 Preliminaries
In this section, we review the basic concepts needed to understand this paper.
(X, Y, Z, E), where X is a control node, Y is a conditional node, Z is the set of nodes in , except for X and
Through a series of AHLS procedures such as scheduling, allocation and resource binding, details of the HW implementation are attached to the initial CDFG, as shown in Fig. 2; we call the result a Scheduled, Allocated, Resource-bound CDFG (SAR-CDFG). An SAR-CDFG is the input of the synthesis procedure; that is, scheduling, allocation and resource binding have already been done. When only a DFG-unit is considered, the term `SAR-DFG' is used instead of `SAR-CDFG'. In the rest of the paper, for simplicity, we omit the prefix `SAR-' when no confusion arises.
[Figure residue, panels (a) and (b): distributed control architecture. Labels include PCs (PC 11 ... PC 1n, PC 21 ... PC 2n), PSCs, CNCs (IF-CNC, WHILE-CNC), USCs, DFG-units, a child block, MUXes, positive edge triggered registers, delay elements and functional units (adder, shifter, ...), divided into a control part and a functional part.]
In the following two sections, we explain how to derive a distributed control unit of AFAHLS. First, automatic generation of the PCs and a PSC for a DFG-unit is presented; then the method for building the CNCs and a USC for a CDFG is explained in detail.
[Figure residue, panels (a) and (b): hardware-oriented control architecture. Labels include a centralized controller and hardware-oriented controllers CPmux11 ... CPmux2n for MUXes, CPreg 1 ... CPreg n for positive edge triggered registers and CPfu 1 ... CPfu n for functional units (adder, shifter, ...), together with delay elements.]
[Figure residue, panels (a)-(d): signal exchanges and STGs for a PC and a PSC. Labels include the PC signals ReqStart/AckStart, ReqOP1, ReqOP2 (register input muxes 1 and 2), ReqFU/AckFU (FU) and ReqWDR/AckWDR (register); the PSC signals Req/Ack (external) and ReqPC1-ReqPC4/AckPC1-AckPC4 toward PC1-PC4; and STG transitions grouped into a working phase and an idling phase.]
liveness, boundedness, output semi-modularity, consistency and CSC[10]. The following definition and two theorems are given in order to prove that an STG we construct for a PC always satisfies those five properties. Note that an STG for a PC is always a strongly connected MG by construction.
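As an illustration of the last remark, the following is a minimal Python sketch, not part of the original method, that checks whether a net is a strongly connected marked graph (MG), i.e., a net in which every place has exactly one input transition and one output transition. The dictionary encoding of places, the function names and the example net are assumptions made for this example only.

```python
from collections import defaultdict

def is_marked_graph(places):
    """places: place name -> (set of input transitions, set of output transitions)."""
    return all(len(pre) == 1 and len(post) == 1 for pre, post in places.values())

def is_strongly_connected_mg(places):
    """True if the net is an MG whose transitions are all mutually reachable."""
    if not places or not is_marked_graph(places):
        return False
    # In an MG each place acts as a single edge between two transitions.
    succ, pred = defaultdict(set), defaultdict(set)
    for pre, post in places.values():
        (src,), (dst,) = tuple(pre), tuple(post)
        succ[src].add(dst)
        pred[dst].add(src)
    nodes = set(succ) | set(pred)

    def reachable(start, adjacency):
        seen, stack = {start}, [start]
        while stack:
            for nxt in adjacency[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Strongly connected: everything is reachable forwards and backwards from one node.
    root = next(iter(nodes))
    return reachable(root, succ) == nodes and reachable(root, pred) == nodes

# Hypothetical example: one 4-phase handshake cycle Req+ -> Ack+ -> Req- -> Ack- -> Req+.
handshake = {
    "p1": ({"Req+"}, {"Ack+"}),
    "p2": ({"Ack+"}, {"Req-"}),
    "p3": ({"Req-"}, {"Ack-"}),
    "p4": ({"Ack-"}, {"Req+"}),
}
```

Calling `is_strongly_connected_mg(handshake)` accepts the cycle, while a net containing a place with two output transitions (a choice) is rejected by `is_marked_graph`.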
A process sequencing controller (PSC) is a circuit that activates a series of PCs in the proper order, based on the dependencies among nodes in a DFG-unit. In the first step of automatic PSC generation, we transform a DFG-unit into a Petri net (PN). Since all the operation nodes in a DFG-unit have already been scheduled, the corresponding PN can be easily obtained. The following definition gives the dependency relation between any two operation nodes in the same DFG-unit.
DEFINITION 8 For any two operation nodes, op_i and op_j, in a DFG-unit, if the scheduled execution of op_i does not overlap with that of op_j and op_i is scheduled prior to op_j, then op_i precedes op_j, denoted by (op_i P op_j). If (op_i P op_j) and there does not exist op_k such that (op_i P op_k) and (op_k P op_j), then op_i directly precedes op_j, denoted by (op_i DP op_j).
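Definition 8 can be read as a computation over the schedule. The following is a minimal Python sketch under stated assumptions: each operation is given as a (start, end) interval, intervals are treated as half-open so that an operation ending at time t does not overlap one starting at t, and the function names and the example schedule are hypothetical.

```python
def precedes(schedule):
    """Return the relation P: (a, b) is in P if a finishes before b starts.

    schedule: operation name -> (start, end). Assumption: an operation that
    ends exactly when another starts does not overlap with it.
    """
    return {
        (a, b)
        for a, (_, end_a) in schedule.items()
        for b, (start_b, _) in schedule.items()
        if a != b and end_a <= start_b
    }

def directly_precedes(schedule):
    """Return the relation DP: pairs in P with no operation strictly in between."""
    p = precedes(schedule)
    return {
        (a, b)
        for (a, b) in p
        if not any((a, k) in p and (k, b) in p for k in schedule)
    }

# Hypothetical schedule of four operations (times in arbitrary units).
schedule = {
    "op1": (0, 50),
    "op2": (0, 70),
    "op3": (70, 140),
    "op4": (140, 240),
}
```

For this schedule, op1 and op2 overlap and are therefore unrelated, both precede op3 and op4, and only the pairs (op1, op3), (op2, op3) and (op3, op4) are in DP, because op3 lies between op1 (or op2) and op4.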
Note that a DFG inherently has no choice, and thus the net is an MG. Therefore, "make an arc from t to t'" in this algorithm means that t and t' are connected via a place p, i.e., t → p → t'.
step 1 Generate two transitions labeled Start and End.
step 2 For each operation node op_i, make a transition and label it PC_i, i = 1, 2, ....
step 3 For each transition PC_i corresponding to an operation node op_i for which there is no operation node op_j such that (op_j P op_i), make an arc from the Start transition to PC_i.
step 4 For each transition PC_i corresponding to an operation node op_i for which there is no operation node op_j such that (op_i P op_j), make an arc from PC_i to the End transition.
step 5 For each pair of operation nodes op_i and op_j such that (op_i DP op_j), make an arc from PC_i to PC_j. That is, if (op_i DP op_j), then PC_i → p → PC_j.
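The five steps above can be sketched in Python as follows. This is an illustrative sketch, not the tool's implementation: the relations P and DP are assumed to be supplied as sets of index pairs, each arc is realized as one place between two transitions, and the function name, the place naming scheme (p1, p2, ...) and the example relations are hypothetical.

```python
def build_petri_net(ops, p, dp):
    """Build the PN of a DFG-unit.

    ops: operation indices; p, dp: sets of (i, j) pairs for the relations
    "precedes" and "directly precedes".
    Returns (transitions, places), where places maps a place name to the
    pair (source transition, target transition) it connects.
    """
    # Steps 1 and 2: Start, End and one transition PCi per operation node.
    transitions = ["Start", "End"] + [f"PC{i}" for i in ops]
    arcs = []
    for i in ops:
        # Step 3: operations with no predecessor are enabled by Start.
        if not any(j == i for (_, j) in p):
            arcs.append(("Start", f"PC{i}"))
        # Step 4: operations with no successor lead to End.
        if not any(k == i for (k, _) in p):
            arcs.append((f"PC{i}", "End"))
    # Step 5: one arc per directly-precedes pair.
    arcs += [(f"PC{i}", f"PC{j}") for (i, j) in sorted(dp)]
    # Each arc t -> t' is realized as t -> p -> t'.
    places = {f"p{n}": arc for n, arc in enumerate(arcs, start=1)}
    return transitions, places

# Hypothetical relations for four operations: 1 and 2 run in parallel,
# then 3, then 4.
P = {(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
DP = {(1, 3), (2, 3), (3, 4)}
transitions, places = build_petri_net([1, 2, 3, 4], P, DP)
```

In this example the net has six places: Start feeds PC1 and PC2, both feed PC3, PC3 feeds PC4, and PC4 feeds End. Every place has exactly one source and one target, so the result is a marked graph.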
Moreover, the PN derived from a DFG-unit can be automatically transformed into an STG in a straightforward way. The following algorithm shows how to derive an STG of a PSC from a PN using the 4-phase handshaking protocol. Fig. 7(a) and (b) show how to derive a PN and an STG from the DFG-unit in Fig. 2, and Fig. 7(c) shows the signal exchanges between the PSC and the associated PCs.
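For readers unfamiliar with the protocol, the following Python sketch illustrates only the 4-phase (return-to-zero) handshake ordering on a request/acknowledge pair: Req+, Ack+, Req-, Ack-. It does not implement the PN-to-STG derivation itself. The function names and the example traces are assumptions; the signal naming follows the ReqPCi/AckPCi convention used for the PSC.

```python
PHASES = ("Req{0}+", "Ack{0}+", "Req{0}-", "Ack{0}-")

def four_phase_cycle(channel):
    """Return the four events of one handshake on the given channel, in order."""
    return [phase.format(channel) for phase in PHASES]

def respects_four_phase(trace, channel):
    """Check that the events of one channel occur in 4-phase order within a trace.

    Events of other channels may be interleaved freely; only the projection
    of the trace onto this channel is checked.
    """
    cycle = four_phase_cycle(channel)
    projected = [event for event in trace if event in cycle]
    return all(event == cycle[n % 4] for n, event in enumerate(projected))

# Hypothetical trace: two PCs activated concurrently, then both reset.
good_trace = [
    "ReqPC1+", "ReqPC2+", "AckPC1+", "AckPC2+",
    "ReqPC1-", "ReqPC2-", "AckPC1-", "AckPC2-",
]
# Hypothetical faulty trace: the request is withdrawn before the acknowledge.
bad_trace = ["ReqPC1+", "ReqPC1-", "AckPC1+", "AckPC1-"]
```

Both channels of `good_trace` pass the check, whereas `bad_trace` fails for channel PC1 because Req falls before Ack has risen.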
Figure 8: (a) Signal exchanges in CNC (b) STG for IF-CNC (c) STG for WHILE-CNC
node when a Req input signal is activated. Then, as a result of executing the conditional node, the CNC activates the child block according to the value of Flag, which indicates the execution result of the conditional node. Fig. 8(a) is a block diagram of the signal exchanges among the CNC, the associated conditional node and the child block. We propose STGs for IF-CNC and WHILE-CNC as shown in Fig. 8(b) and (c), respectively. The STGs for IF-CNC and WHILE-CNC satisfy the five properties for speed-independent circuit synthesis. Therefore, they can be synthesized into speed-independent circuits.
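The behavior described above can be summarized by a small behavioral model. The following Python sketch is an abstraction under stated assumptions, not the STGs of Fig. 8: it ignores handshake timing, assumes that a true Flag means "execute the child block", and assumes that a WHILE-CNC re-evaluates the conditional node after each execution of the child block. The function name and callback parameters are hypothetical.

```python
def run_cnc(kind, eval_condition, run_child_block):
    """Behavioral model of a CNC; returns how often the child block ran.

    kind: "if" or "while".
    eval_condition: callable modeling the ReqCon/AckCon handshake; its
        return value plays the role of Flag.
    run_child_block: callable modeling the ReqBlk/AckBlk handshake.
    """
    if kind not in ("if", "while"):
        raise ValueError(f"unknown CNC kind: {kind!r}")
    activations = 0
    while True:
        flag = eval_condition()      # execute the conditional node
        if not flag:                 # assumption: false Flag skips the child block
            break
        run_child_block()            # activate the child block
        activations += 1
        if kind == "if":             # an IF-CNC runs the child block at most once
            break
    return activations
```

For example, with a condition that is true three times and then false, a WHILE-CNC modeled this way activates the child block three times, whereas an IF-CNC activates it once.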
[Figure residue: an example CDFG (DFG-Unit 1, a while CDFG-Unit 2 with a child block, DFG-Unit3) partitioned into Block1, Block2 and Block3, with block-level handshake transitions Req+, ReqBlk1+/AckBlk1+ through ReqBlk3+/AckBlk3+, Ack+ and the corresponding falling transitions; and signals ReqFU1, ReqFU2, AckFU1, AckFU2 between PC1, PC2, a delay element and an FU / register.]
8 Experimental results
In this paper, we proposed a process-oriented control circuit generation method. The proposed method is being implemented as part of an asynchronous high-level synthesis tool. The tool consists of two main parts: an automatic asynchronous control circuit generator and a VHDL code generator. The former derives a series of controllers based on the process-oriented method, and the latter generates structural VHDL code for those controllers for circuit simulation and analysis.
We performed two experiments in order to check the effectiveness of our method. First, we compared the hardware-oriented method and the process-oriented method in terms of the number of literals, area, worst-case delay and average cycle time. Since the automatic VHDL code generator is still under development, the control circuits in this experiment were implemented and simulated manually using a commercial VHDL tool, SYNOPSYS, and the 0.6 µm IDEC-C631 library[13]. Table 1 shows experimental results for the four kinds of controllers, PC, PSC, CNC and USC, in the process-oriented method. In particular, PSCs and USCs scale with the size of the given DFG-unit and CDFG, so we experimented with several PSCs and USCs of various sizes. As shown in Table 1, the areas of the process-oriented controllers are small and regular, except for IF-CNC and WHILE-CNC; consequently, they show smaller worst-case delay and average cycle time than the controllers in the hardware-oriented method. These features are due to the good decomposition of a global controller into several process-level controllers and coordinators among them by the proposed method.
Figure 10: (a) Signal exchanges based on 4-phase bundled data method between FU and PC (b) Delay element with fast high-to-low propagation delay
Table 2 shows experimental results for the controllers for functional units and registers, denoted by CPfu and CPreg, among the controllers in the hardware-oriented method[5]. Unfortunately, [5] did not present any experimental results, and thus the CPs in Table 2 were constructed by us for the purpose of comparison. CPi, i = 1, 2, 3, in Table 2 are constructed for the cases in which one, two or three processes use a functional unit or a register sequentially. As Table 2 shows, the CPs are larger than the process-oriented controllers given in Table 1 and thus suffer from larger delay. In our opinion, these features result from the following two reasons. The first reason is CSC violations, which cause large area overhead and performance degradation. For the controllers in the hardware-oriented method, although Petrify, which is the state of the art in solving CSC violation problems, was used to resolve the CSC violations, many additional internal signals were inserted. As a consequence, the resulting circuits show larger area, worst-case delay and average cycle time. Although CSC violations can be reduced with much
[Figure residue: example behavioral description (a = 1; b = 4; c = 0; d = 2; a while loop, with the fragment (b = c)) and its binding. FU: adder1, adder2. Reg: R0, R1, R2, R3, with R0: a; R1: b, e, g; R2: c, f, h; R3: d.]
10 Acknowledgements
This work has been supported in part by the Korea Research Foundation under grant 1998-016-E00058
References
[1] S. Hauck, "Asynchronous Design Methodologies: An Overview," Proceedings of the IEEE, 83(1), Jan., 1995.
[2] T. A. Chu, "Synthesis of Self-timed VLSI Circuits from Graph-theoretic Specifications," Ph. D. thesis, MIT, Jun., 1987.
[3] A. Kondratyev, M. Kishinevsky, B. Lin, P. Vanbekbergen and A. Yakovlev, "Basic Gate Implementation of Speed-Independent Circuits," In Proceedings of Design Automation Conference, Jun., 1994.
[4] S. M. Nowick and D. L. Dill, "Synthesis of Asynchronous State Machines Using a Local Clock," In Proceedings of ICCD, Oct., 1991.
[5] J. Cortadella and R. M. Badia, "An Asynchronous Architecture Model for Behavioral Synthesis," In Proceedings of European Conference on Design Automation, Mar., 1992.
[6] R. M. Badia, J. Cortadella, E. Pastor and A. Pardo, "A High-Level Synthesis System for Asynchronous Circuits," Sixth International Workshop on High-Level Synthesis, Nov., 1992.
[7] E. Brunvand, "Translating Concurrent Communicating Programs into Asynchronous Circuits," Ph. D. thesis, CMU, 1991.
[8] T. Kolks, S. Vercauteren and B. Lin, "Control Resynthesis for Control-Dominated Asynchronous Designs," In Proceedings of Second International Symposium on Advanced Research in Asynchronous Circuits and Systems, Mar., 1996.
[9] J. Cortadella et al., "Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers," In Proceedings of the 11th Conf. Design of Integrated Circuits and Systems, Nov., 1996.
[10] A. Kondratyev, J. Cortadella, M. Kishinevsky, E. Pastor, O. Roig and A. Yakovlev, "Checking Signal Transition Graph Implementability by Symbolic BDD Traversal," European Design and Test'95, Mar., 1995.
[11] T. Murata, "Petri Nets: Properties, Analysis and Applications," Proceedings of the IEEE, Vol. 77, No. 4, 1989.
[12] K. T. Christensen, P. Jensen, P. Korger and J. Sparsø, "The Design of an Asynchronous TinyRISC™ TR4101 Microprocessor Core," In Proceedings of Fourth International Symposium on Advanced Research in Asynchronous Circuits and Systems, Mar., 1998.
[13] IDEC Cell Library Data Book Release 9804, Apr., 1998.