Академический Документы
Профессиональный Документы
Культура Документы
Pipelining
and
Parallel Processing
cont.
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Repetition
DSP algorithms are non-terminating = repeatedly execute same code
Iteration = all operations are executed once
Iteration period = time to perform one iteration
Sampling rate (throughput) = number of samples per second
Latency = time difference between output sample and corresponding input
sample (how long before producing output samples)
DFG - Capture the data-driven nature of a DSP system, intra-iteration and inter-
iteration constraints, nodes are computations (functions, subtasks), edges are
data paths, very general description. Difference from block diagram: Hardware
not allocated, scheduled in DFG.
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Repetition
Critical path - the combinational path with maximum total execution time
Loop (=cycle) - a path beginning and ending at same node
Loop bound for loop
Tj loop computation time
(3) (6) (21)
Wj number of delays in loop D
B C D
Iteration Bound - maximum of all loop bounds
2D
t
T = max l
l L w l
It is the lower bound on execution time for DFG (assuming only pipelining,
retiming, unfolding)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Repetition
Pipelining - insert delay elements to reduce critical path length, In an M-
level pipelined system, the number of delay elements in any path from input to
output is (M-1) greater than that in the same path in the original sequential
circuit
x(n)
D D D D
h0 h1 h2 h3
D
y(n-1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
D
3 5 Not a
Feedforward cutset
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Lets continue
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining (cont.)
y(n) z(n)
x(n) x(n)+y(n) x(n)+y(n)+z(n)
y(n) z(n-1)
x(n) x(n-1)+y(n-1)+z(n-1)
Z-1
x(n)+y(n) x(n-1)+y(n-1)
y3 x3 y2 x2 y1 x1 y0 x0
c4 c3 c2 c1 c0
s3 s2 s1 s0 Tcritical
Assume the delay in calculating the sum, sdelay,
is equal to calculating the carry, cdelay.
Tcritical
z3 z2 z1 z0
c4 c3 c2 c1 c0
s3 s2 s1 s0
What is the critical path now?
4 x cdelay+ sdelay 5 x cdelay
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Example: Pipelining when ripple-carry adders
y3 x3 y2 x2 y1 x1 y0 x0
c4 c3 c2 c1 c0
Tcritical
D D D D
z3 z2 z1 z0
c4 c3 c2 c1 c0
s3 s2 s1 s0
What is the critical path now?
4 x cdelay
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Example: Pipelining when ripple-carry adders
y3 x3 y2 x2 y1 x1 y0 x0
c4 c3 c2 c1 c0
Tcritical
D D D D
z3 z2 z1 z0
c4 c3 c2 c1 c0
s3 s2 s1 s0
What is the critical path now?
4 x cdelay
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
z3 z2 z1 z0
c4 c3 c2 c2 c1 c0
D
s3 s2 s1 s0
What is the critical path now?
Possible??
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Conclusion
The result depends on the structure of the used
blocks, e.g. type of adder (ripple-carry, carry
save, carry look-ahead,...).
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fine-Grain pielining
Let TM=10 units and TA=2 units. If the multiplier is broken into 2
smaller units with processing times of 6 units and 4 units,
respectively (by placing the latches on the horizontal cutset across
the multiplier), then the desired clock period can be achieved as
(TM+TA)/2
x(n)
(6)
Feedforward
cutset? h0 h1 h2 h3 (10)
(4)
D D D
(2) (2) y(n)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fine-Grain pielining
Equal paths. We get a more balanced design.
A fine-grain pipelined version of the 3-tap data-broadcast FIR filter is
shown below.
x(n)
(6)
D D D D
(4)
y(n-1) (6)
D D D
(2)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Non Pipelined
in1 Out
Synchronous Pipelining
R R
Combinatorial
E Logic E
G G
clock
In 0 time
Tclk
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Wave Pipelining
R R
Combinatorial
E Logic E
G G
clock
Delay
Logic depth tmin tmax New input data is
Out applied before the
previous computation
is done.
+ shorter Tclk
- extensive simulation
- tedious design
- hard to verify
- lack of tools
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Parallell Processing
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Parallel Processing
a b
x(n) ax(n) abx(n)
x(0), x(1), x(2)...
k=0,1,2,3...
a b
x(2k) ax(2k) abx(2k) x(0), x(2), x(4)...
a b
x(2k+1) ax(2k+1) abx(2k+1)
x(1), x(3), x(5)...
Two samples are processed in parallel
double throughput
or
lower power consumption due to reduced VDD
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Parallel Processing
Parallel processing and pipelining techniques are duals of each other: if
a computation can be pipelined, it can also be processed in parallel. Both of them
exploit concurrency available in the computation in different ways.
How to design a Parallel FIR system?
Consider a single-input single-output (SISO) FIR filter:
y(n)=ax(n)+bx(n-1)+cx(n-2)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
y (n) = b0 x (n) + b1 x (n 1) + b2 x (n 2)
y (n + 1) = b0 x (n + 1) + b1 x (n) + b2 x (n 1)
n changed to 2k
y (2k ) = b0 x (2k ) + b1 x (2k 1) + b2 x (2k 2)
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
x(2k+1) x(2k)
x(2k-1)
2 Inputs D
k=0
x(2k-2)
D
b0 b1 b2
y(2k)
y(0) y(2) y(4)
b0 b1 b2
y(2k+1)
y(1) y(3) y(5)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
b0 b1 b2
y(2k+1)
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
So, we know that pipelining can be used only to the extent such that the
critical path computation time is limited by the communication (or I/O)
bound. Once this is reached, pipelining can no longer increase the speed
Chip 1 Chip 2
Tcommunication
output input
pad pad
Tcomputation
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Tclock
Titeration = Tsample =
LM
Parallel processing can also be used for reduction of power
consumption while using slow clocks
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
4k
0 T T T T y(n)
D D D
T/4 T/4 T/4
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining
and
Parallel Processing
for Low Power
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Power Dissipation
Two measures are important
Peak power (sets dimensions)
= VDD I DDmax
Ppeak
Average power (battery and cooling)
T
VDD
Pav =
T 0 i DD (t) dt
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
VDD-VT
VT
Ipeak
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Drain Leakage
Ileakage Ileakage increases
with decreasing VT
Subthreshold
Current
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
ln( IDS)
Leakage: High VT
VT IOFF IOFFL
IOFFH
VTL VTH
VG
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Dynamic Power Consumption
Assumed that Vswing=VDD, otherwise VswingVDD
Energy charged in a capacitor
VDD
EC = CV2/2 = CLVDD2/2
Charge
Energy Ec is also discharged, i.e.
Etot= CL VDD2
Power consumption
P = CL VDD2 f (=Ctot V02 f)
Discharge
Since squared most
efficient to reduce P
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Reduce...
Capacitances
Transistor/Gate C
Load C
Interconnects, more and more important
External
Activity
Frequency
Power supply squared so most efficient
CL VDD W
T pd = , k , , Cox
k (VDD VT ) 2 L
if VDD >> VT
CL 1 f
T pd = = proportional to
kVDD f VDD
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
a b
x(n) abx(n-1)
Z-1
ax(n) ax(n-1)
P = f C VDD
2
Critical path cut in half
double f double throughput
or
same f double time for mult reduced VDD
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pseq = f C L VDD
2
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining
Propagation delays of the sequential and the pipelined architecture:
C L VDD (C L M ) VDD
Tseq = , T pip =
k (VDD VT ) 2
k ( VDD VT ) 2
M ( VDD VT ) = (VDD VT )
2 2
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining
M ( VDD VT ) 2 = (VDD VT ) 2
x(n)
D D D D
h2 h3 h4
Tadd = 1
h0 h1
Tmult = 3
y(n)
M =?
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining
M ( VDD VT ) 2 = (VDD VT ) 2
x(n)
D D D D
h2 h3 h4
Tadd = 1
h0 h1
Tmult = 3
y(n)
P = f CV 2
a b c
switched
capacitance
Compare
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining
Increased C due to register
Ppipe = f 1.1C (0.58V) = 0.37P 2
a b c
Pipeline
Register Register
stage
Compare
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
MT = Multiplication Tree
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
k=0,1,2,3...
a b
x(2k) ax(2k) abx(2k) x(0), x(2), x(4)...
a b
x(2k+1) ax(2k+1) abx(2k+1)
x(1), x(3), x(5)...
Two samples are processed in parallel
same f but two samples double throughput
or
reduce f and two samples same throughput and reduced VDD
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Parallel Processing
k=0,1,2,3...
a b
x(2k) ax(2k) abx(2k) x(0), x(2), x(4)...
a b
x(2k+1) ax(2k+1) abx(2k+1)
x(1), x(3), x(5)...
f
=
Ppara LCL ( VDD )=
2
Pseq
2
L
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
L( V DD VT ) 2 = (V DD VT ) 2
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Parallel Processing
2.15 due to mux
Ppar = 0.5f 2.15C (0.58V) = 0.36P 2
a b c a b c
Compare Compare
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
a b c a b c
stage
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
Pipelining only a b c
Ppipe = f 1.1C (0.58V)2 = 0.37P
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se
DSP Design
End of Lecture 4
Fredrik Edman, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se