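[Figure: synchronous data transfer. Data input, sender register, logic and receiver register driven by a common Clock, with Tsetup and Thold marked.]
[Figure: asynchronous data transfer. Data input, sender register, logic and receiver register, with the clock replaced by Req and Ack signals.]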
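[Figure: dataflow example with inputs x, y, a, b, two adders (+) and a multiplier (*) producing out. A second version of the figure is annotated with handshake pairs req1/ack1, req2/ack2 and req3/ack3.]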
Muller C-element
Key component in asynchronous circuit design – like a Petri net transition
y = x1*x2 + (x1+x2)*y
Set-part: x1*x2   Reset-part: (x1+x2)*y
[Figure: C-element symbol with inputs x1, x2 and output y]
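The next-state equation of the C-element can be checked with a few lines of Python. This is only a behavioural sketch: the function names `c_element` and `run` are illustrative and not part of the tutorial, and gate delays are ignored.

```python
def c_element(x1: int, x2: int, y: int) -> int:
    """Next-state function of a Muller C-element: y' = x1*x2 + (x1+x2)*y."""
    return (x1 & x2) | ((x1 | x2) & y)


def run(inputs, y=0):
    """Apply a sequence of (x1, x2) input pairs; return the output after each."""
    outputs = []
    for x1, x2 in inputs:
        y = c_element(x1, x2, y)
        outputs.append(y)
    return outputs


# Both inputs must rise before y rises, and both must fall before y falls.
trace = run([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
print(trace)  # [0, 0, 1, 1, 0]
```

The output waits for both inputs in each direction, which is why the element behaves like a Petri net transition that fires only when all its input places are marked.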
[Figure: animation of the C-element. Initially x1 = 0, x2 = 0 and y = 0. When only x1 switches 0->1, y stays at 0. When x2 also switches 0->1, y becomes excited (0*).]
Muller C-element: NMOS circuit implementation
[Figure: NMOS implementation of y between Power and Ground, with transistors driven by x1 and x2]
Why asynchronous is good
• Performance (work on actual, not max delays)
• Robustness (operationally scalable; no clock
distribution; important when gate-to-wire delay ratio
changes)
• Low Power (‘change-based’ computing – fewer signal
transitions)
• Low Electromagnetic Emission (more even
power/frequency spectrum)
• Modularity and re-use (parts designed independently;
well-defined interfaces)
• Testability (inherent self-checking via ack signals)
Obstacles to Async Design
• Design tool support – commercial design
tools are aimed at clocked systems
• Difficulty of production testing – production
testing is heavily committed to use of clock
• Aversion of majority of designers, trained
‘with clock’ – biggest obstacle
• Overbalancing effect of periodic (every 10
years) ‘asynchronous euphoria’
Why control logic
• Customary in hardware design to separate control logic from datapath
logic due to different design techniques
• Control logic implements the control flow of a (possibly concurrent)
algorithm
• Datapath logic deals with operational part of the algorithms
• Datapath operations may have their (lower level) control flow
elements, so the distinction is relative
• Examples of control-dominated logic: a bus interface adapter, an
arbiter, or a modulo-N counter
• Their behaviour is a combination of partial orders of signal events
• Examples of data-dominated logic are: a register bank or an
arithmetic-logic unit (ALU)
Role of Petri Nets
• We concentrate here on control logic
• Control logic is behaviourally more diverse than data
path
• Petri nets capture causality and concurrency between
signalling events, deterministic and non-deterministic
choice in the circuit and its environment
• They allow:
– composition of labelled PNs (transition or place synchronisation)
– refinement of event annotation (from abstract operations
down to signal transitions)
– use of observational equivalence (lambda-events)
– clear link with state-transition models in both directions
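The link between a Petri net and its state-transition model can be illustrated with a small token game. The net below is a made-up example, not one from the tutorial: a fork enables two concurrent signal events `a+` and `b+`, which are joined by `c+`. The sketch assumes a safe (1-bounded) net, so a marking is simply a set of marked places.

```python
from collections import deque

# Hypothetical net: transition name -> (input places, output places).
TRANSITIONS = {
    "fork": ({"p0"}, {"p1", "p2"}),
    "a+": ({"p1"}, {"p3"}),
    "b+": ({"p2"}, {"p4"}),
    "c+": ({"p3", "p4"}, {"p5"}),
}


def enabled(marking):
    """Transitions whose input places are all marked."""
    return sorted(t for t, (pre, _) in TRANSITIONS.items() if pre <= marking)


def fire(marking, t):
    """Remove tokens from the input places and add them to the output places."""
    pre, post = TRANSITIONS[t]
    return frozenset((marking - pre) | post)


def reachability(initial):
    """Breadth-first exploration of all markings reachable from `initial`."""
    seen, edges, queue = {initial}, [], deque([initial])
    while queue:
        m = queue.popleft()
        for t in enabled(m):
            m2 = fire(m, t)
            edges.append((m, t, m2))
            if m2 not in seen:
                seen.add(m2)
                queue.append(m2)
    return seen, edges


markings, edges = reachability(frozenset({"p0"}))
print(len(markings), len(edges))  # 6 6
```

After `fork`, both `a+` and `b+` are enabled, and the reachability graph contains the usual diamond of their two interleavings. That diamond is how concurrency shows up in the sequential state-transition view.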
Design flow with Petri nets
[Figure: design flow. Transition systems, predicates on traces and high-level languages (VHDL, CSP, …) feed abstract behaviour synthesis.]
High-level modelling: Processor Example
[Figure: Petri net model of a processor, split into Instruction Fetch and Instruction Execution. Node labels: Program Counter Update, Memory Address Register Load, Memory Read, Instruction Register Load, One-word Instruction Decode, One-word Instruction Execute, Two-word Instruction Decode, Two-word Instruction Execute. The slide is repeated as a sequence of animation frames.]
High-level modelling: Processor Example
• The details of further refinement, circuit implementation
(by direct translation) and performance estimation (using
UltraSan) are in:
A. Semenov, A.M. Koelmans, L. Lloyd and A. Yakovlev.
Designing an asynchronous processor using Petri Nets,
IEEE Micro, 17(2):54-64, March 1997
• For use of Coloured Petri net models and use of
Design/CPN in processor modeling:
F. Burns, A.M. Koelmans and A. Yakovlev. Analysing
superscalar processor architectures with coloured Petri nets,
Int. Journal on Software Tools for Technology Transfer,
vol.2, no.2, Dec. 1998, pp. 182-191.
Low-level Modelling:Interface
Example
• Insert VME bus figure 1 – timing diagrams
Low-level Modelling:Interface
Example
• Insert VME bus figure 2 - STG
Low-level Modelling:Interface
Example
• Details of how to model interfaces and design controllers
are in:
A.Yakovlev and A. Petrov,
complete the reference
Low-level Modelling:Interface
Example
• Insert VME bus figure 3 – circuit diagram
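Logic Circuit Modelling
Event-driven elements and their Petri net equivalents: Muller C-element, Toggle
[Figure: element symbols and the corresponding Petri net fragments]
Logic Circuit Modelling
Level-driven elements and their Petri net equivalents
[Figure: gate with inputs x (=1), y (=1) and output z (=0), and its Petri net equivalent with places x=0, y=0, y=1 and z=1]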
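Event-driven circuit example
[Figure: circuit with inputs a and b and its state graph over ab (states 00, 10, 01, 11, arcs labelled a and b). The inputs switch 0->1 and the output is marked as excited (0*).]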
Other modelling examples
• Examples with mixed event and level based
signalling:
– “Lazy” token ring arbiter spec
– RGD arbiter with mutex
Lazy ring adaptor
[Figure: ring adaptor block with signals R, G, D, Lr, La, Rr, Ra, and its Petri net specification using the same signals, dummy transitions (dum) and a token variable t with places t=0 and t=1. The slide is shown as five animation frames:]
• t=0 (token isn’t initially here)
• t=0->1->0 (token must be taken from the right and passed to the left)
• t=1 (token is already here)
• t=0->1 (token must be taken from the right)
• t=1 (token is here)
Tutorial Outline
• Introduction
• Modeling Hardware with PNs
• Synthesis of Circuits from PN specifications
• Circuit verification with PNs
• Performance analysis using PNs
Synthesis: Outline
• Abstract synthesis of LPNs from transition systems
and characteristic trace specifications
• Handshake and signal refinement (LPN-to-STG)
• Direct translation of LPNs and STGs to circuits
– Examples
• Logic synthesis from STGs
– Examples
Synthesis from trace specs
• Modelling behaviour in terms of characteristic
predicates on traces (produce LPN
“snippets”)
• Construction of LPNs as compositions of
snippets
• Examples: n-place buffer, 2-way merge
Synthesis from transition systems
• Modelling behaviour in terms of a sequential
capture – transition system
• Synthesis of LPN (distributed and concurrent
object) from TS (using theory of regions)
• Examples: one place buffer, counterflow pp
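The central notion of the theory of regions can be sketched in code. A region is a set of states that every event crosses uniformly: all arcs labelled with that event enter the set, or all exit it, or none crosses its border. Each region becomes a candidate place of the synthesised net. The transition system below is an assumed example (two concurrent events `a` and `b` over states 00, 10, 01, 11), and the brute-force enumeration is only practical for very small state sets.

```python
from itertools import combinations

# Assumed transition system: a and b are concurrent.
STATES = ["00", "10", "01", "11"]
ARCS = [("00", "a", "10"), ("01", "a", "11"),
        ("00", "b", "01"), ("10", "b", "11")]


def crossing(region, src, dst):
    """Classify how an arc relates to the border of `region`."""
    if src in region and dst not in region:
        return "exit"
    if src not in region and dst in region:
        return "enter"
    return "none"


def is_region(region):
    """True if every event has the same crossing type on all of its arcs."""
    kinds = {}
    for src, event, dst in ARCS:
        kinds.setdefault(event, set()).add(crossing(region, src, dst))
    return all(len(k) == 1 for k in kinds.values())


def nontrivial_regions():
    """Enumerate all regions except the empty set and the full state set."""
    found = []
    for size in range(1, len(STATES)):
        for subset in combinations(STATES, size):
            if is_region(set(subset)):
                found.append(set(subset))
    return found


for r in nontrivial_regions():
    print(sorted(r))
```

For this example the four regions found are the pre- and post-places of `a` and of `b`, so the synthesised net has two independent transitions, which recovers the concurrency hidden in the sequential transition system.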
Synthesis from process-based
languages
• Modelling behaviour in terms of process(-algebraic)
specifications (CSP, …)
• Synthesis of LPN (concurrent object with
explicit causality) from process-based model
(concurrency is explicit but causality implicit)
• Examples: modulo-N counter
Refinement at the LPN level
• Examples of refinements, and introduction of
“silent” events
• Handshake refinement
• Signalling protocol refinement (return-to-zero
versus NRZ)
• Arbitration refinement
• Brief comment on what is implemented in
Petrify and what isn’t yet
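The difference between the two signalling protocols can be shown by expanding abstract transfers into signal transitions. This is a minimal sketch: the signal names `req` and `ack` and both function names are illustrative assumptions, not output of Petrify.

```python
def four_phase(n):
    """Return-to-zero: each transfer takes four signal transitions."""
    return ["req+", "ack+", "req-", "ack-"] * n


def two_phase(n):
    """Non-return-to-zero: each transfer takes two transitions,
    and the polarity alternates from one transfer to the next."""
    events = []
    for i in range(n):
        sign = "+" if i % 2 == 0 else "-"
        events += ["req" + sign, "ack" + sign]
    return events


print(four_phase(2))
print(two_phase(2))
```

For the same number of transfers the NRZ expansion has half as many signal transitions, while the return-to-zero expansion always leaves both wires at 0 between transfers.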
Translation of LPNs to circuits
• Examples of refinements, and introduction of
“silent” events
Why direct translation?
• Logic synthesis has problems with state
space explosion, repetitive and regular
structures (log-based encoding approach)
• Direct translation has linear complexity but
can be area inefficient (inherent one-hot
encoding)
What about performance?
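The area remark can be made concrete by counting state-holding elements only. This is a rough sketch under that assumption: it ignores the next-state logic, which is where log-based encodings usually pay for their compactness, and the function name `state_elements` is hypothetical.

```python
from math import ceil, log2


def state_elements(n_states):
    """State-holding elements needed for n_states control states."""
    return {
        "one_hot": n_states,                   # one cell per state or place
        "log": max(1, ceil(log2(n_states))),   # binary-encoded state
    }


for n in (6, 64):
    print(n, state_elements(n))
```

One-hot storage grows linearly with the size of the specification, which matches the linear complexity of direct translation.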
Direct Translation of Petri Nets
• Previous work dates back to the 70s
• Synthesis into event-based (2-phase) circuits
(similar to micropipeline control)
– S. Patil, F. Furtek (MIT)
• Synthesis into level-based (4-phase) circuits
(similar to synthesis from one-hot encoded FSMs)
– R. David (’69, translation of FSM graphs to CUSA cells)
– L. Hollaar (’82, translation from parallel flowcharts)
– V. Varshavsky et al. (’90,’96, translation from PN into
an interconnection of David Cells)
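Synthesis into event-based circuits
[Figure: Petri net fragment with transitions a, b, c, d and the corresponding event-based circuit, with signals x1, x2, x’1, x’2, ya, yb, yc]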
Hollaar’s approach
[Figure: fragment of a flow-chart (labels K, L, M, N, A, B) and the corresponding one-hot circuit cell, shown as three animation frames with changing signal values]
Varshavsky’s Approach
[Figure: Petri net fragment with places p1 and p2 around a controlled operation, and its circuit implementation with a “To Operation” output. Three animation frames show the signal values switching (1->0, 0->1) as the token moves from p1 to p2.]
Translation in brief
[Figure: STG of the VME bus controller and its direct translation into a circuit. STG transitions: DSr+, DSr-, DSw+, DSw-, LDS+, LDS-, LDTACK+, LDTACK-, D+, D-, DTACK+, DTACK-, with places p1 to p4. The circuit has cells pr1 to pr4 and pw1 to pw4, with req/ack pairs for D+/1, D-/1, D+/2 and D-/2.]
INPUTS: DSr, DSw, LDTACK
OUTPUTS: D, LDS, DTACK
VME bus controller
After DC-optimisation (in the style of Varshavsky et al WODES’96)
[Figure: optimised circuit with cells pr1, pr2, pw1, pw2, p1 and p2, with req/ack pairs for D+/1, D-/1, D+/2 and D-/2 and the signals DSr, DSw, LDS, LDTACK and DTACK]
David Cell library
[Figure: library of David cells DC1 to DC7]
“Data path” control logic
Example of interface with a handshake control (DTACK, DSR/DSW):
[Figure: DTACK, DSr/DSw handshake. Petri net with transitions DTACK+/1 (r), DTACK+/2 (w), DTACK-, DSr+, DSr-, DSw+ and DSw-, and a 3-wire handshake circuit with signals DTACK, DSr and DSw.]
Example 2: “Flat” mod-6 Counter
TE-like Specification:
((p?;q!)^5;p?;c!)*
Petri net (5-safe):
[Figure: Petri net with transitions p?, q! and c! and arcs of weight 5]
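The trace specification reads: each of the first five input events p? is answered by q!, the sixth by c!, and then the cycle repeats. A minimal behavioural sketch, assuming the event names from the specification and hypothetical function names `mod_counter` and `trace`:

```python
def mod_counter(n=6):
    """Yield the response to each input event p?:
    'q!' for the first n-1 inputs of a cycle, 'c!' for the n-th."""
    count = 0
    while True:
        count += 1
        if count == n:
            count = 0
            yield "c!"
        else:
            yield "q!"


def trace(pulses, n=6):
    """Interleave the input events with the counter's responses."""
    gen = mod_counter(n)
    out = []
    for _ in range(pulses):
        out += ["p?", next(gen)]
    return out


print(" ".join(trace(6)))
# p? q! p? q! p? q! p? q! p? q! p? c!
```

This captures only the sequential input/output behaviour; the refined Petri net and the circuit distribute the counting over several state elements.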
“Flat” mod-6 Counter
Refined (by hand) and optimised (by Petrify) Petri net:
“Flat” mod-6 counter
Result of direct translation (optimised by hand):
David Cells and Timed circuits
[Figure: Petri net with transition labels x-, dummy, a-, b+, b-, y- and z-]
“Butterfly” circuit
Speed-independent logic synthesis solution:
[Figure: circuit with signals x, y and z, annotated with signal values]
“Butterfly” circuit
Speed-independent DC-circuit:
[Figure: David cell circuit with transitions a+, a-, b+ and b-, annotated with signal values]
“Butterfly” circuit
DC-circuit with aggressive relative timing:
[Figure: David cell circuit with signals a, an, b, bn, p, pn, pa1, pa1n, pa2, pa2n, pb1, pb1n, pb2, pb2n, ta1 and tb1, annotated with signal values]