
DSP Design

Unfolding
Unfolding creates a program with more than one iteration, J = unfolding factor

Unfolding is a structured way to achieve parallel processing

Unfolding Applications
– Sample period reduction, reach T∞
– Parallel processing
– Bit-serial and Digit-serial

Unfolding = Loop unrolling
– assembly programming
– compiler theory

Viktor Öwall, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se


Example: Loop unrolling + Software Pipelining

[Figure: software-pipelined schedule table, clock cycle (CC) versus operations (oper) 1, 2, 3 for Iteration 1, Iteration 2, Iteration 3 and higher-order iterations.]

GSM Speechcoder
• Org. C-code = 250k cc
• Mod. C-code = 90k cc
• Hand Opt. = 50k cc

Unfolding ≡ Parallel Processing

Original DFG: A (1) → B (1) in a loop with 2D
– 2 nodes & 2 edges
– T∞ = (1+1)/2 = 1 ut

2-unfolded DFG: A0 (1) → B0 (1) in a loop with D, and A1 (1) → B1 (1) in a loop with D
– 4 nodes & 4 edges
– A0, B0 handle iterations 0, 2, 4, …: A0→B0 ⇒ A2→B2 ⇒ A4→B4 ⇒ …
– A1, B1 handle iterations 1, 3, 5, …: A1→B1 ⇒ A3→B3 ⇒ A5→B5 ⇒ …
– T'∞ = 2 ut for each loop, with two samples per iteration: T∞ = 2/2 = 1 ut

• In a J-unfolded system each delay is J-slow: if the input to a delay element is x(kJ + m), the output is x((k−1)J + m) = x(kJ + m − J), i.e. delayed by J samples.

Unfolding, example

[Figure: recursive filter with input x(n), output y(n), multiplier a and a 9D delay in the feedback loop.]

$$y(n) = a\,y(n-9) + x(n)$$

Unfolding J=2, 2-times parallel:

$$\begin{cases} y(2k) = a\,y(2k-9) + x(2k) \\ y(2k+1) = a\,y(2k-8) + x(2k+1) \end{cases}$$

Rewriting the delayed indices in the form 2(k − d) + m:

$$\begin{cases} y(2k) = a\,y(2(k-5)+1) + x(2k) \\ y(2k+1) = a\,y(2(k-4)+0) + x(2k+1) \end{cases}$$
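The 2-unfolded equations can be checked numerically against the original recursion. The following is a minimal sketch, assuming zero initial conditions (y(n) = 0 for n < 0) and an even number of input samples; the function names `direct` and `two_parallel` are illustrative only and do not come from the slides.

```python
def direct(x, a):
    """Reference recursion y(n) = a*y(n-9) + x(n), with y(n) = 0 for n < 0."""
    y = []
    for n, xn in enumerate(x):
        y.append(a * (y[n - 9] if n >= 9 else 0.0) + xn)
    return y


def two_parallel(x, a):
    """J = 2 unfolding: iteration k produces both y(2k) and y(2k+1)."""
    even, odd = [], []  # even[k] = y(2k), odd[k] = y(2k+1)
    for k in range(len(x) // 2):
        # y(2k) = a*y(2(k-5)+1) + x(2k): odd output, 5 iterations back (5D)
        y_even = a * (odd[k - 5] if k >= 5 else 0.0) + x[2 * k]
        # y(2k+1) = a*y(2(k-4)+0) + x(2k+1): even output, 4 iterations back (4D)
        y_odd = a * (even[k - 4] if k >= 4 else 0.0) + x[2 * k + 1]
        even.append(y_even)
        odd.append(y_odd)
    # Interleave the two output streams back into sample order.
    return [v for pair in zip(even, odd) for v in pair]
```

The delays of 5 and 4 iterations in the sketch correspond to the terms 2(k − 5) + 1 and 2(k − 4) + 0 above.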

Unfolding example

$$\begin{cases} y(2k) = a\,y(2(k-5)+1) + x(2k) \\ y(2k+1) = a\,y(2(k-4)+0) + x(2k+1) \end{cases}$$

[Figure: 2-unfolded DFG. y(2k) is formed from x(2k) and a times y(2k+1) delayed by 5D; y(2k+1) is formed from x(2k+1) and a times y(2k) delayed by 4D.]

Not trivial even for a simple graph!

Unfolding, Definitions

⌊x⌋ is the floor of x, the largest integer ≤ x
⌈x⌉ is the ceiling of x, the smallest integer ≥ x
a%b is the remainder after dividing a by b

Algorithm for unfolding

• For each node U in the original DFG, draw J nodes U0, U1, U2, …, UJ-1
• For each edge U → V with w delays in the original DFG, draw the J edges Ui → V(i+w)%J with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1

Example, J=4: the edge U → V with 37D unfolds into four edges out of U0, U1, U2, U3 with 9D, 9D, 9D and 10D, since

$$\left\lfloor \frac{i+w}{J} \right\rfloor = \left\lfloor \frac{i+37}{4} \right\rfloor = \begin{cases} 9, & i = 0, 1, 2 \\ 10, & i = 3 \end{cases}$$

Properties of unfolding

• Unfolding preserves the number of delays in a DFG:
  ⌊w/J⌋ + ⌊(w+1)/J⌋ + … + ⌊(w+J-1)/J⌋ = w
• Unfolding preserves precedence constraints
• J-unfolding of a loop with wl delays in the original DFG leads to gcd(wl, J) loops in the unfolded DFG. Each loop contains wl/gcd(wl, J) delays and J/gcd(wl, J) copies of each node (gcd = greatest common divisor).
• Unfolding a DFG with iteration bound T∞ results in a J-unfolded DFG with iteration bound JT∞.

[Figure: DFG with a loop through the nodes U, V and T, and its 3-unfolded DFG with nodes U0, V0, T0, U1, V1, T1, U2, V2, T2; gcd(12, 3) = 3.]
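The two steps of the unfolding algorithm and the loop property can be sketched in a few lines of code. This is only an illustration, assuming a DFG is given as a list of (source, destination, delays) tuples; the names `unfold` and `loops_after_unfolding` are hypothetical and not part of the lecture material.

```python
from math import gcd


def unfold(edges, J):
    """J-unfold a DFG given as a list of (u, v, w) edges with w delays.

    Each edge U -> V with w delays becomes the J edges
    U_i -> V_((i + w) % J) with floor((i + w) / J) delays, i = 0..J-1.
    Unfolded nodes are represented as (name, index) pairs.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


def loops_after_unfolding(w_l, J):
    """Return (number of loops, delays per loop, copies of each node per loop)."""
    g = gcd(w_l, J)
    return g, w_l // g, J // g


if __name__ == "__main__":
    # The slide example: one edge U -> V with 37 delays, J = 4.
    for src, dst, delays in unfold([("U", "V", 37)], 4):
        print(src, "->", dst, "with", delays, "D")
```

Summing the delays of the four unfolded edges gives 9 + 9 + 9 + 10 = 37, which illustrates that unfolding preserves the number of delays.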

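Unfolding and Iteration Bound

Original DFG, y(n) = a y(n−9) + x(n), with TA=3, TM=6:
T∞ = 9 / 9 = 1

2-unfolded DFG with outputs y(2k) and y(2k+1) and delays 5D and 4D: gcd(9, 2) = 1, i.e. 1 loop
T∞ = 18 / 9 = 2
But we process 2 samples.

[Figure: DFG A → B → C with two delays (D, D) and its 3-unfolded version (J=3) with nodes A0, B0, C0, A1, B1, C1, A2, B2, C2; gcd(2, 3) = 1, gcd = greatest common divisor.]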

The Critical Path

If an edge has w < J delays, unfolding gives (J−w) paths with zero delay and w paths with 1 delay. This can lead to an increased critical path!
An edge with w ≥ J will not create a new critical path!

[Figure: DFG A → B → C with delays D, D and its unfolded version with nodes A0, B0, C0, A1, B1, C1, A2, B2, C2.]

Sample Period Reduction

• Case 1: A node in the DFG has a computation time greater than T∞.
• Case 2: The iteration bound is not an integer.
• Case 3: The longest node computation time is larger than the iteration bound T∞, and T∞ is not an integer.

Sample Period Reduction: case 1

The original DFG cannot have a sample period equal to the iteration bound because a node computation time is larger than the iteration bound.

[Figure: IIR-filter from Lab1 (input X(n), output y(n), coefficients b1 and b2, delays D and 2D) and a DFG with nodes P, Q, R, S, T, U and delays D and 2D; node computation times are given in parentheses, the largest being 4.]

$$T_\infty = \max_{l \in L} \left\{ \frac{t_l}{w_l} \right\} = \max\left\{ \frac{6}{3}, \frac{6}{2} \right\} = 3 < 4 \text{ (max node time)}$$
Sample Period Reduction: case 1

If the computation time of a node 'U', tu, is greater than the iteration bound T∞, then ⌈tu/T∞⌉-unfolding should be used.

tu = 4 and T∞ = 3: ⌈4/3⌉ = 2-unfolding

[Figure: 2-unfolded DFG with nodes P0, Q0, R0, S0, T0, U0 and P1, Q1, R1, S1, T1, U1.]

The 2-unfolded DFG has iteration bound 2T∞ = 6, but two samples are processed per iteration, so the sample period is 6/2 = 3.

Sample Period Reduction: case 2

The original DFG cannot have a sample period equal to the iteration bound because the iteration bound is not an integer.

[Figure: DFG loop through the nodes S, T, U and V, each with computation time (1), with delays D on the edges.]

$$T_\infty = \max_{l \in L} \left\{ \frac{t_l}{w_l} \right\} = \frac{4}{3}$$

If a critical loop bound is of the form tl/wl where tl and wl are mutually co-prime, then wl-unfolding should be used.

Sample Period Reduction: case 2 (2)

[Figure: original DFG loop through S, T, U and V, each with computation time (1), and its 3-unfolded DFG with nodes S0, T0, U0, V0, S1, T1, U1, V1, S2, T2, U2, V2.]

The unfolded DFG has T∞ = 4, and 3 samples per iteration gives the minimum sample period 4/3.

Sample Period Reduction: case 3

The original DFG cannot have a sample period equal to the iteration bound because the longest node computation time is larger than the iteration bound T∞, and T∞ is not an integer.

The minimum J that achieves the iteration bound is the minimum value of J such that JT∞ is an integer and is greater than or equal to the longest node computation time.
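The rule for the minimum unfolding factor covers all three cases and can be written as a short search. The sketch below assumes T∞ is given as an exact positive fraction; the function name `min_unfolding_factor` and the numbers in the third usage line (T∞ = 4/3 with a longest node time of 6) are illustrative assumptions, not values from the slides.

```python
from fractions import Fraction


def min_unfolding_factor(t_inf, t_max):
    """Smallest J such that J*t_inf is an integer and J*t_inf >= t_max.

    t_inf: iteration bound as a positive int or Fraction.
    t_max: longest node computation time.
    """
    t_inf = Fraction(t_inf)
    if t_inf <= 0:
        raise ValueError("iteration bound must be positive")
    J = 1
    # Increase J until J*t_inf is an integer that covers the longest node.
    while (J * t_inf).denominator != 1 or J * t_inf < t_max:
        J += 1
    return J


if __name__ == "__main__":
    print(min_unfolding_factor(3, 4))               # case 1: tu = 4, T∞ = 3
    print(min_unfolding_factor(Fraction(4, 3), 1))  # case 2: T∞ = 4/3
    print(min_unfolding_factor(Fraction(4, 3), 6))  # case 3, hypothetical numbers
```

For case 1 the search gives the same result as ⌈tu/T∞⌉, and for case 2 it gives wl when tl and wl are co-prime.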

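Parallel processing can be performed by unfolding, chapter 3

[Figure: parallel FIR filter with inputs x(2k) and x(2k+1), delayed samples x(2k-1) and x(2k-2), two sets of coefficients b0, b1, b2, and outputs y(2k) and y(2k+1).]

Another FIR-filter, J=3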



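Bit-Level Parallel Processing

[Figure: three ways of supplying the 4-bit words a3 a2 a1 a0 and b3 b2 b1 b0.
– Bit-parallel: the bits a0, a1, a2, a3 and b0, b1, b2, b3 on separate wires
– Bit-serial: one stream a3 a2 a1 a0 and one stream b3 b2 b1 b0
– Digit-serial (digit-size = 2): the streams a2 a0 and a3 a1, and b2 b0 and b3 b1]

[Figure: adder structures.
– Bit-Parallel: adder cells for bit i, bit i+1 up to the msb, with inputs ai, bi, carry inputs cin and carry outputs cout
– Bit-Serial: one adder cell with inputs ai, bi, output si, carry couti and a delay Δ
– Digit-Serial: adder cells for bits i and i+1 with outputs si, si+1, carries couti, couti+1 and a delay Δ]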

Bit-serial adder

Bit-serial can be seen as a time-multiplexed architecture; in this example one addition (i.e. 1 iteration) takes 4 cc.

[Figure: bit-serial adder with input streams a3 a2 a1 a0 and b3 b2 b1 b0 and output stream s3 s2 s1 s0. The carry is fed back through a delay D and a switch for the carry signal, which selects 0 at 4l+0 and the delayed carry at 4l+1,2,3.]

How to unfold switches?

Unfolding of Switches

• The following assumptions are made when unfolding an edge U→V containing a switch:
  – The wordlength W is a multiple of the unfolding factor J, i.e. W = W'J.
  – All edges into and out of the switch have no delays.
• With the above two assumptions an edge U→V can be unfolded as follows:
  – Write the switching instance as
    Wl + u = J(W'l + ⌊u/J⌋) + (u%J)
  – Draw an edge from the node Uu%J to the node Vu%J, which is switched at time instance (W'l + ⌊u/J⌋).

[Figure: edge U → V with a switch at time instance Wl+u.]
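The rewriting of a switching instance Wl + u can be expressed as a small helper. This is a sketch under the two assumptions listed for switch unfolding (W is a multiple of J, and no delays on the edges into and out of the switch); the name `unfold_switch` and the dictionary return format are illustrative choices.

```python
def unfold_switch(W, instances, J):
    """Unfold a switch on an edge U -> V by the factor J.

    W: switching period (wordlength); must be a multiple of J.
    instances: the offsets u of the switching instances W*l + u.
    Returns (W', edges) where edges maps the node index i = u % J to the
    list of offsets u' = u // J, meaning that the edge U_i -> V_i is
    switched at the time instances W'*l + u'.
    """
    if W % J != 0:
        raise ValueError("W must be a multiple of J")
    W_prime = W // J
    edges = {}
    for u in instances:
        # W*l + u = J*(W'*l + u // J) + (u % J)
        edges.setdefault(u % J, []).append(u // J)
    return W_prime, edges


if __name__ == "__main__":
    # Carry switch of the bit-serial adder, 4l + 1, 2, 3, unfolded by J = 2.
    print(unfold_switch(4, [1, 2, 3], 2))
```

A node index that does not appear in the returned dictionary has no unfolded edge through the switch.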

Example: Unfolding of Switches, J=3

[Figure: edge U → V with a switch at time instances 9l+1,5, and the unfolded nodes U0, V0, U1, V1, U2, V2.]

– Write the switching instance as
  Wl + u = J(W'l + ⌊u/J⌋) + (u%J)

  9l+1 = 3(3l + ⌊1/3⌋) + (1%3) = 3(3l + 0) + 1
  9l+5 = 3(3l + ⌊5/3⌋) + (5%3) = 3(3l + 1) + 2

  The term in parentheses gives the time instance at which the edge is switched, and the last term gives the nodes the edge is drawn between.

– Draw an edge from the node Uu%J to the node Vu%J, i.e. U1 → V1 and U2 → V2.

Example: Unfolding of Switches, J=3

9l+1 = 3(3l + ⌊1/3⌋) + (1%3) = 3(3l + 0) + 1
9l+5 = 3(3l + ⌊5/3⌋) + (5%3) = 3(3l + 1) + 2

The edges are switched at time instance (W'l + ⌊u/J⌋), i.e. U1 → V1 at (3l+0) and U2 → V2 at (3l+1).

[Figure: edge U → V with a switch at 9l+1,5 and the unfolded nodes U0, V0, U1, V1, U2, V2, with the edge U1 → V1 switched at (3l+0) and the edge U2 → V2 switched at (3l+1).]

Switch with multiple instances

Example: edge U → V with a switch at 12l + 1, 7, 9, 11, unfolding by 3.

Wl + u = J(W'l + ⌊u/J⌋) + (u%J)

To unfold the DFG by J=3, the switching instances are as follows:
12l + 1 = 3(4l + 0) + 1
12l + 7 = 3(4l + 2) + 1
12l + 9 = 3(4l + 3) + 0
12l + 11 = 3(4l + 3) + 2

Switch with multiple instances

Example: edge U → V with a switch at 12l + 1, 7, 9, 11, unfolding by 3.

Wl + u = J(W'l + ⌊u/J⌋) + (u%J)

12l + 1 = 3(4l + 0) + 1
12l + 7 = 3(4l + 2) + 1
12l + 9 = 3(4l + 3) + 0
12l + 11 = 3(4l + 3) + 2

Switched at time instances:
– U0 → V0 at 4l + 3
– U1 → V1 at 4l + 0, 2
– U2 → V2 at 4l + 3

Switches with Delays

Unfolding a DFG containing an edge having a switch and a positive number of delays is done by introducing a dummy node.

[Figure: node A connected through 2D to a switch at 6l + 1, 5 and node B connected at 6l + 0, 2, 3, 4, both feeding node C. After inserting a dummy node D, the 2D delay sits on the edge A → D and the switch at 6l + 1, 5 connects D to C.]

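Bit-serial Adder

[Figure: DFG of the bit-serial adder. The inputs A and B feed the adder node X, whose sum goes to the output S. The carry from X goes through a delay D to the dummy node D. A switch feeds X from Z (reset carry, carry = 0) at 4l+0 and from the dummy node D at 4l+1,2,3.]

Unfold Bit-serial Adder, J=2

For each node U in the original DFG, draw J nodes U0, U1, U2, …, UJ-1

[Figure: nodes A0, S0, X0, B0, D0, Z0 and A1, S1, X1, B1, D1, Z1.]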

Unfold Bit-serial Adder, J=2

For each edge U → V with w delays in the original DFG, draw the J edges Ui → V(i+w)%J with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1

– If an edge has w=0: Ui → Vi with 0 delays
– Edge X → D (w=1): for i=0, X0 → D1 with 0 delays, and for i=1, X1 → D0 with 1 delay

[Figure: nodes A0, S0, X0, B0, D0, Z0 and A1, S1, X1, B1, D1, Z1 with the unfolded edges drawn; the delay D is on the edge X1 → D0.]

Unfold the Switch, J=2

Write the switching instance as
Wl + u = J(W'l + ⌊u/J⌋) + (u%J)

Edge Z → X, switched at 4l+0:
4l+0 = 2(2l+0)+0

Edge D → X, switched at 4l+1,2,3:
4l+1 = 2(2l+0)+1
4l+2 = 2(2l+1)+0
4l+3 = 2(2l+1)+1

Z0 → X0 at time 2l+0
D0 → X0 at time 2l+1

[Figure: nodes A0, S0, X0, B0, D0, Z0 with the switch into X0.]


Unfold the Switch, J=2

Edge Z → X, switched at 4l+0:
4l+0 = 2(2l+0)+0

Edge D → X, switched at 4l+1,2,3:
4l+1 = 2(2l+0)+1
4l+2 = 2(2l+1)+0
4l+3 = 2(2l+1)+1

Z0 → X0 at time 2l+0
D0 → X0 at time 2l+1
D1 → X1 at time 2l+0,1, i.e. always closed

[Figure: nodes A0, S0, X0, B0, D0, Z0 and A1, S1, X1, B1, D1, Z1. The switch into X0 selects Z0 at 2l+0 and D0 at 2l+1, the delay D is on the edge into D0, and Z1 is marked as a dead node.]

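Remove Dead and Dummy Nodes

D1 → X1 at time 2l+0,1, i.e. always closed
Z0 → X0 at time 2l+0
D0 → X0 at time 2l+1

[Figure: nodes A0, S0, X0, B0, Z0 and A1, S1, X1, B1 after the dead and dummy nodes have been removed. The switch into X0 selects Z0 at 2l+0 and the delayed carry (delay D) at 2l+1.]

The Digit Serial Adder

[Figure: the same DFG seen as a digit-serial adder, with one carry within the iteration and one carry to the next iteration through the delay, D=1.]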

Fully Parallel Adder, i.e. J=4

For each node U in the original DFG, draw J nodes U0, U1, U2, …, UJ-1
For each edge U → V with w delays in the original DFG, draw the J edges Ui → V(i+w)%J with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1

[Figure: nodes A0…A3, S0…S3, X0…X3, B0…B3, D0…D3 and Z0…Z3, from LSB (index 0) to MSB (index 3), with one delay D.]

Unfold the Switch, J=4

Write the switching instance as
Wl + u = J(W'l + ⌊u/J⌋) + (u%J)

Edge Z → X, switched at 4l+0:
4l+0 = 4(1l+0)+0

Edge D → X, switched at 4l+1,2,3:
4l+1 = 4(1l+0)+1
4l+2 = 4(1l+0)+2
4l+3 = 4(1l+0)+3

Unfold the Switch, J=4

Edge Z → X, switched at 4l+0:
4l+0 = 4(1l+0)+0

Edge D → X, switched at 4l+1,2,3:
4l+1 = 4(1l+0)+1
4l+2 = 4(1l+0)+2
4l+3 = 4(1l+0)+3

Only 1 time instance, 0, i.e. fully parallel:
Z0 → X0, D1 → X1, D2 → X2 and D3 → X3

Bit-parallel Adder

[Figure: nodes A0…A3, S0…S3, X0…X3, B0…B3, D0…D3 and Z0…Z3 with one delay D.]

Bit-parallel Adder

[Figure: nodes A0…A3, S0…S3, X0…X3, B0…B3, D0…D3 and Z0…Z3, from LSB to MSB, with one delay D.]

"Dead" nodes can be removed

Remove "Dead" and Dummy Nodes

[Figure: the same DFG with the "dead" nodes marked.]

Dummy nodes can be removed

Bit-parallel Adder

[Figure: carry ripple adder with nodes A0…A3, S0…S3, X0…X3, B0…B3 and Z0. The carry from the MSB is used as overflow, or fed back if the adder is to be used as a 4-bit module; a switch is needed if it is to be used as a 4-bit module. Carry = 0 at Z0.]

If Wordlength is not a multiple of J

• Determine L = lcm{W, J}, lcm = least common multiple
• Replace the switching instance Wl+u with L/W instances Ll+u+wW, for w = 0..L/W-1, i.e. the switching periodicity has been changed from W to L
• Perform the unfolding as previously
• Identify the correspondence between original instances and expanded instances
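The expansion step for a wordlength that is not a multiple of J can be sketched as follows. The function names and the example values (W = 4 with instances 4l + 1, 2, 3 unfolded by J = 3) are assumptions chosen for illustration; they do not appear in the slides.

```python
from math import gcd


def expand_instances(W, instances, J):
    """Change the switching period from W to L = lcm(W, J).

    Each instance W*l + u is replaced by the L/W instances
    L*l + u + w*W for w = 0..L/W-1. Returns (L, sorted offsets).
    """
    L = W * J // gcd(W, J)
    expanded = sorted(u + w * W for u in instances for w in range(L // W))
    return L, expanded


def unfold_expanded(L, expanded, J):
    """Unfold as before: L*l + u = J*(L'*l + u // J) + (u % J), L' = L / J."""
    edges = {}
    for u in expanded:
        edges.setdefault(u % J, []).append(u // J)
    return L // J, edges


if __name__ == "__main__":
    L, expanded = expand_instances(4, [1, 2, 3], 3)  # hypothetical example
    print(L, expanded)
    print(unfold_expanded(L, expanded, 3))
```

Each expanded offset u + wW still corresponds to the original offset u, which is the correspondence mentioned in the last bullet.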