Unfolding
Unfolding creates a program with more than one iteration; J = unfolding factor.
Unfolding Applications
– Sample period reduction, reach T∞
– Parallel processing
– Bit-serial and Digit-serial
Viktor Öwall, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
Unfolding example
The recursion y(n) = a·y(n − 9) + x(n), unfolded by J = 2, becomes
y(2k) = a·y(2(k − 5) + 1) + x(2k)
y(2k + 1) = a·y(2(k − 4) + 0) + x(2k + 1)
[Figure: 2-unfolded DFG with inputs x(2k), x(2k+1) and outputs y(2k), y(2k+1); the path into y(2k) carries 5D, the path into y(2k+1) carries 4D.]
Not trivial even for a simple graph!

Unfolding, Definitions
⌊x⌋ is the floor of x, the largest integer ≤ x
⌈x⌉ is the ceiling of x, the smallest integer ≥ x
a%b is the remainder after dividing a by b
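The two unfolded equations can be checked numerically against the recursion they come from. The sketch below assumes zero initial conditions (y(n) = 0 for n < 0) and an arbitrary test input; the function names are illustrative and not part of the slides.

```python
def original(x, a):
    """y(n) = a*y(n-9) + x(n), assuming y(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        prev = y[n - 9] if n >= 9 else 0.0
        y.append(a * prev + x[n])
    return y


def unfolded_by_2(x, a):
    """One loop iteration k produces two outputs, y(2k) and y(2k+1)."""
    y = [0.0] * len(x)

    def past(n):
        # Same zero initial condition as in original().
        return y[n] if n >= 0 else 0.0

    for k in range(len(x) // 2):
        y[2 * k] = a * past(2 * (k - 5) + 1) + x[2 * k]
        y[2 * k + 1] = a * past(2 * (k - 4) + 0) + x[2 * k + 1]
    return y


# Arbitrary test input with an even number of samples.
x = [float(n % 7) for n in range(40)]
y_ref = original(x, 0.5)
y_unf = unfolded_by_2(x, 0.5)
print(y_ref == y_unf)
```

Both versions compute the same samples; the unfolded one simply advances two samples per iteration.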
DSP Design
Algorithm for unfolding
• For each node U in the original DFG, draw J nodes U0, U1, U2, …, UJ-1.
• For each edge U → V with w delays in the original DFG, draw the J edges Ui → V(i+w)%J with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1.
Example: edge U → V with w = 37 delays, J = 4:
⌊(i + w)/J⌋ = ⌊(i + 37)/4⌋ = 9 for i = 0, 1, 2 and 10 for i = 3
so the unfolded edges are U0 → V1 (9D), U1 → V2 (9D), U2 → V3 (9D) and U3 → V0 (10D).

Properties of unfolding
• Unfolding preserves the number of delays in a DFG:
  ⌊w/J⌋ + ⌊(w+1)/J⌋ + … + ⌊(w+J-1)/J⌋ = w
• Unfolding preserves precedence constraints.
• J-unfolding of a loop with wl delays in the original DFG leads to gcd(wl, J) loops in the unfolded DFG. Each loop contains wl/gcd(wl, J) delays and J/gcd(wl, J) copies of each node (gcd = greatest common divisor).
  Example: a loop through U, V and T with 12 delays, 3-unfolded, gives gcd(12, 3) = 3 loops.
• Unfolding a DFG with iteration bound T∞ results in a J-unfolded DFG with iteration bound JT∞.
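The two steps of the unfolding algorithm map directly onto a few lines of code. The sketch below is a minimal illustration, assuming the DFG is given as a list of (U, V, w) edges; the function name `unfold` and this edge representation are choices made here, not something the slides define.

```python
def unfold(edges, J):
    """J-unfold a DFG given as a list of (U, V, w) edges.

    Returns edges ((U, i), (V, j), delays), where copy i of U feeds
    copy j = (i + w) % J of V through (i + w) // J delays.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


# The slide example: one edge U -> V with 37 delays, J = 4.
result = unfold([("U", "V", 37)], 4)
for (u, i), (v, j), d in result:
    print(f"{u}{i} -> {v}{j} with {d}D")

# Unfolding preserves the number of delays on the edge.
total_delays = sum(d for _, _, d in result)
print(total_delays)
```

Summing the delays over the J copies of the edge gives back w, which is the delay-preservation property.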
[Figure: 2-unfolded example with input x(2k+1), delay 4D, multiplier a and output y(2k+1).]
But we process 2 samples per iteration.
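The loop property of J-unfolding can be checked by following one trip around a loop: starting from copy i of a node, a loop with wl delays lands on copy (i + wl) % J and passes ⌊(i + wl)/J⌋ delays. The sketch below treats the whole loop as a single edge with wl delays, which is an abstraction made here for illustration; `unfolded_loops` is a hypothetical helper name.

```python
from math import gcd


def unfolded_loops(wl, J):
    """Return (copies, delays) for each loop created by J-unfolding
    a loop that has wl delays in the original DFG."""
    seen = set()
    loops = []
    for start in range(J):
        if start in seen:
            continue
        copies, delays, i = [], 0, start
        while i not in seen:
            seen.add(i)
            copies.append(i)
            delays += (i + wl) // J   # delays on this trip around the loop
            i = (i + wl) % J          # copy index reached after the trip
        loops.append((copies, delays))
    return loops


for wl, J in [(12, 3), (9, 2)]:
    loops = unfolded_loops(wl, J)
    g = gcd(wl, J)
    print(wl, J, loops, len(loops) == g)
```

For wl = 12 and J = 3 this gives three loops with 4 delays each; for wl = 9 and J = 2 it gives a single loop holding both copies and all 9 delays.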
Sample Period Reduction: case 1
If the computation time of a node 'U', tu, is greater than the iteration bound T∞, then ⌈tu/T∞⌉-unfolding should be used.
Example: tu = 4 and T∞ = 3, so ⌈4/3⌉ = 2-unfolding.
[Figure: DFG and its 2-unfolded version with nodes S0, Q0, T0, P0, R0, U0 and S1, Q1, T1, P1, R1, U1; computation times in parentheses.]
The 2-unfolded DFG has iteration bound 2 · 3 = 6. But two samples!

Sample Period Reduction: case 2
The original DFG cannot have a sample period equal to the iteration bound because the iteration bound is not an integer:
T∞ = max over l ∈ L of { tl / wl } = 4/3
[Figure: DFG with nodes S, T, U, V, each with computation time (1).]
If a critical loop bound is of the form tl/wl where tl and wl are mutually co-prime, then wl-unfolding should be used.
Unfolding by 3 gives an integer iteration bound of 3 · 4/3 = 4.
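Both rules for choosing the unfolding factor are short arithmetic. The sketch below uses exact fractions; the function names and the loop list `[(4, 3)]` (a single critical loop with tl = 4 and wl = 3) are assumptions made for illustration, since the slides only state the resulting bound 4/3.

```python
from fractions import Fraction
from math import ceil


def iteration_bound(loops):
    """loops: list of (t_l, w_l) pairs. Returns max t_l / w_l exactly."""
    return max(Fraction(t, w) for t, w in loops)


def factor_case1(t_u, t_inf):
    """Case 1: a node is slower than the iteration bound."""
    return ceil(Fraction(t_u) / t_inf)


def factor_case2(loops):
    """Case 2: non-integer bound t_l / w_l in lowest terms -> w_l."""
    return iteration_bound(loops).denominator


# Case 1 from the slide: t_u = 4, T_inf = 3.
J1 = factor_case1(4, 3)
print(J1, J1 * 3)              # unfolding factor and unfolded bound

# Case 2: assumed single critical loop with t_l = 4, w_l = 3.
loops = [(4, 3)]
J2 = factor_case2(loops)
print(J2, J2 * iteration_bound(loops))
```

In both cases the unfolded iteration bound J·T∞ is an integer that no single node exceeds, while the bound per sample stays at T∞.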
[Figure: structure with coefficients b0, b1, b2 and outputs y(2k), y(2k+1).]
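[Figure: three adder architectures for the 4-bit words a3 a2 a1 a0 and b3 b2 b1 b0.]
Bit-parallel: all bits a0…a3 and b0…b3 enter at once; the carry runs from cin to coutmsb.
Bit-serial: one bit per clock cycle; ai, bi and couti give si and couti+1.
Digit-serial (Digit-size = 2): two bits per clock cycle, with a2 a0 and b2 b0 on one pair of lines and a3 a1 and b3 b1 on the other; ai, bi, couti give si and couti+1, and ai+1, bi+1, couti+1 give si+1 and couti+2.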
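The difference between the bit-serial and digit-serial adders can be seen in a small behavioral model. This is a sketch under stated assumptions: words are lists of bits with the LSB first, the initial carry is 0, and the helper names (`full_adder`, `bit_serial_add`, `digit_serial_add`, `to_bits`, `from_bits`) are made up for this example.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def bit_serial_add(a_bits, b_bits):
    """One bit per clock cycle, LSB first; the carry is stored between cycles."""
    carry, out, cycles = 0, [], 0
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
        cycles += 1
    return out, carry, cycles


def digit_serial_add(a_bits, b_bits, digit=2):
    """`digit` bits per clock cycle: the carry ripples through `digit`
    full adders within a cycle and is stored only between cycles."""
    carry, out, cycles = 0, [], 0
    for k in range(0, len(a_bits), digit):
        for a, b in zip(a_bits[k:k + digit], b_bits[k:k + digit]):
            s, carry = full_adder(a, b, carry)
            out.append(s)
        cycles += 1
    return out, carry, cycles


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


a, b = to_bits(11, 4), to_bits(6, 4)
bits, carry, cycles = bit_serial_add(a, b)
dbits, dcarry, dcycles = digit_serial_add(a, b, digit=2)
print(from_bits(bits) + (carry << 4), cycles)
print(from_bits(dbits) + (dcarry << 4), dcycles)
```

Both models return the same sum; the digit-serial version with digit size 2 needs half as many clock cycles, which is what 2-unfolding the bit-serial adder achieves.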
Write the switching instance as
Wl + u = J(W'l + ⌊u/J⌋) + (u%J)
Example: edges between nodes U and V switched at time instances 9l+1 and 9l+5, J = 3:
9l+1 = 3(3l + ⌊1/3⌋) + (1%3) = 3(3l + 0) + 1
9l+5 = 3(3l + ⌊5/3⌋) + (5%3) = 3(3l + 1) + 2
Draw an edge from the node Uu%J to the node Vu%J, i.e. U1 → V1 and U2 → V2, switched at time instances 3l + 0 and 3l + 1 respectively.
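The rewriting of a switching instance is plain integer division. The sketch below assumes that the switching period W is a multiple of J, as in the example (W = 9, J = 3); `unfold_switch` is an illustrative name.

```python
def unfold_switch(W, u, J):
    """Rewrite W*l + u as J*(W'*l + u // J) + u % J.

    Returns (W', offset, index): the edge U_index -> V_index is
    switched at time instance W'*l + offset in the unfolded DFG.
    Assumes W is a multiple of J.
    """
    if W % J != 0:
        raise ValueError("this sketch assumes W is a multiple of J")
    return W // J, u // J, u % J


for u in (1, 5):
    Wp, offset, idx = unfold_switch(9, u, 3)
    print(f"9l+{u}: U{idx} -> V{idx} switched at {Wp}l+{offset}")
```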
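[Figure: DFG with nodes X, B, D (dummy node) and Z, labeled INPUTS, Reset Carry and Carry = 0, with switching instances 4l+0 and 4l+1,2,3; beside it the nodes X0, X1, B0, B1, D0, D1, Z0, Z1 of the 2-unfolded DFG.]
For each node U in the original DFG, draw J nodes U0, U1, U2, …, UJ-1.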
[Figure: 2-unfolded DFG with nodes X0, X1, B0, B1, D0, D1, Z0, Z1.]
For each edge U → V with w delays in the original DFG, draw the J edges Ui → V(i+w)%J with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1.
If an edge has w = 0: Ui → Vi with 0 delays.
Edge X → D: for i = 0, X0 → D1 with 0 delays; for i = 1, X1 → D0 with 1 delay.
Switch: Z → X at 4l+0 and D → X at 4l+1,2,3. For J = 2:
4l+0 = 2(2l+0)+0
4l+1 = 2(2l+0)+1
4l+2 = 2(2l+1)+0
4l+3 = 2(2l+1)+1
[Figure: nodes X0, B0, D0, Z0.]
[Figure: 2-unfolded DFG with nodes X0, X1, B0, B1, Z0 and one delay D; switching instances 2l+0 and 2l+1.]
[Figure: 4-unfolded DFG with nodes X0…X3, B0…B3, D0…D3, Z0…Z3.]
For each node U in the original DFG, draw J nodes U0, U1, U2, …, UJ-1.
For each edge U → V with w delays in the original DFG, draw the J edges Ui → V(i+w)%J with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1.
Write the switching instance as
Wl + u = J(W'l + ⌊u/J⌋) + (u%J)
Switch: Z → X at 4l+0 and D → X at 4l+1,2,3. For J = 4:
4l+0 = 4(1l+0)+0
4l+1 = 4(1l+0)+1
4l+2 = 4(1l+0)+2
4l+3 = 4(1l+0)+3
Only 1 time instance, 0, i.e. fully parallel:
Z0 → X0, D1 → X1, D2 → X2 and D3 → X3
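The same rewriting can be applied to the whole switch at once to see which unfolded edges appear and when they are active. The sketch below assumes the switch is described as (source, destination, list of u) entries with period W = 4, matching Z → X at 4l+0 and D → X at 4l+1,2,3; the representation and the name `unfold_switch_edges` are illustrative.

```python
from collections import defaultdict


def unfold_switch_edges(switch, W, J):
    """switch: list of (src, dst, [u, ...]) meaning src -> dst at W*l + u.

    Returns (W', table), where table maps an unfolded edge
    (src_i, dst_i) to the sorted offsets k at which it is switched,
    i.e. at time instances W'*l + k. Assumes W is a multiple of J.
    """
    Wp = W // J
    table = defaultdict(set)
    for src, dst, instants in switch:
        for u in instants:
            table[(f"{src}{u % J}", f"{dst}{u % J}")].add(u // J)
    return Wp, {edge: sorted(offs) for edge, offs in table.items()}


switch = [("Z", "X", [0]), ("D", "X", [1, 2, 3])]
results = {}
for J in (2, 4):
    Wp, table = unfold_switch_edges(switch, 4, J)
    results[J] = (Wp, table)
    for (src, dst), offs in table.items():
        # An edge active at all W' offsets is a fixed connection.
        kind = "always" if len(offs) == Wp else f"at {Wp}l+{offs}"
        print(f"J={J}: {src} -> {dst} {kind}")
```

For J = 4 every remaining edge is active at the single offset 0, so all switches turn into fixed wires, which is the fully parallel case.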
[Figure: 4-unfolded DFG with nodes X0…X3, B0…B3, D0…D3, Z0…Z3.]
"Dead" nodes can be removed.
Dummy nodes can be removed.