Академический Документы
Профессиональный Документы
Культура Документы
DIGITAL SIGNAL
PROCESSORS
M em o r y - r e g i s t e r a r c h itectu r e
P r of. B r i a n L . E v a n s
in collaboration w ith
N ir a n ja n D a m e r a -V e n k a t a a n d
M a g e s h Va llia p p a n L o a d - s t o r e a r c h itectu r e
E m b e d d e d S ign a l P r oces s in g L a b or a t or y
T h e U n i v e r s it y of T e x a s a t A u s t in
A u s t in , T X 7 8 7 1 2 - 1 0 8 4
h t t p ://s i g n a l.e c e .u t e x a s .e d u /
Outline
n C o n v e n t ion a l D S P a r ch it e ct u r e
n P i p e l i n in g i n D S P p r oce s s or s
n R I S C v s . D S P p r oce s s or a r ch it e ct u r e s
n T I T M S 3 2 0 C 6 x V L I W D S P a r ch it e ct u r e
n S ign a l p r o c e s s i n g on g e n e r a l-p u r p o s e p r o c e s s o r s
n C o n clu s ion
2
Signal Processing Applications
n Low-cost em b e d d e d s y s t e m s
4 M o d e m s , cellu la r t e l e p h on e s , d i s k d r i v e s , p r i n t e r s
n P C b a s e d m u lt im e d ia
4 C o m p r e s s ion /d e com p r e s s ion of a u d io, gr a p h ics , vid e o
n E m b e d d e d p r oce s s or r e q u ir e m e n t s
4 I n e x p e n s i v e w i t h s m a ll a r e a a n d volu m e
4 D e t e r m in i s t i c i n t e r r u p t s e r v ice r ou t in e l a t e n cy
4 L o w p o w e r : ~ 5 0 m W ( T M S 3 2 0 C 5 4 x u s e s 0 . 3 6 A / M I P )
3
Conventional DSP Architecture
n H a r v a r d a r ch it e ct u r e
4 S e p a r a t e d a t a m e m or y /b u s a n d p r ogr a m m e m or y /bu s
4 T h r e e r e a d s a n d on e or t w o w r i t e s p e r i n s t r u ct ion cycle
n D e t e r m in istic in t e r r u p t s e r v ice r ou t in e la t e n cy
n M u lt iply-a ccu m u la t e in s in g l e i n s t r u c t i o n c y c l e
n S p e cia l a d d r e s s in g m o d e s s u p p o r t e d i n h a r d w a r e
4 M o d u l o a d d r e s s i n g for cir cu la r b u ffe r s (e .g. F I R filt e r s )
4 B it -r e v e r s e d a d d r e s s in g (e.g. fa s t F ou r i e r t r a n s for m s )
n I n s t r u ct ion s t o k e e p t h e p i p e l i n e ( 3 - 4 s t a g e s ) f u l l
4 Z e r o-ove r h e a d loop in g (on e p i p e l i n e flu s h t o s e t u p )
4 D e la y e d b r a n ch e s
4
Conventional DSP Architecture (cont)
Data-shifting
Modulo addressing
Time Buffer contents Next sample
n Bit reversed
n=N xN-2 xN-1 xN xN-K+1 xN-K+2 xN+1
addressing
4 used to n=N+1 xN-2 xN-1 xN xN+1 xN-K+2 xN-K+3 xN+2
implement the
radix-2 FFT n=N+2 xN-2 xN-1 xNN xN+1 xN+2 xN-K+3 xxN-K+4
N-K+4 xN+3
5
Conventional DSP Architecture (cont)
F i x e d -P o i n t F l o a t i n g -P o i n t
C o s t /U n i t $5 - $79 $5 - $381
Architecture Accu m u la t or loa d -s t or e or
m e m or y -r e g ist e r
Registers 2-4 data 8 or 1 6 d a t a
8 address 8 or 1 6 a d d r e s s
Data Word s 1 6 o r 2 4 b it in t e g e r 32 bit in t e g e r a n d
a n d fixed -poin t fixed /floa t in g -poin t
On-Chip 2 -64 k w or d s d a t a 8 -64 k w or d s d a t a
Mem ory 2 -64 k w or d s p r ogr a m 8 -64 k w or d s p r ogr a m
A d d r ess 16-128 k w d a t a 16 Mw 4Gw data
S p a ce 1 6 -64 k w p r ogr a m 1 6 M w 4 G w p r ogr a m
Com p ilers C com p iler s ; C , C++ com p iler s ;
p o o r c o d e g e n e r a t ion b e t t e r cod e g e n e r a t ion
Exa m p les TI TMS320C5x; TI TMS320C3x;
Motorola 56000 An a log D e v ice s S H A R C
6
Conventional DSP Architecture (cont)
7
Pipelining
Sequential (Motorola 56000)
Fetch Decode Read Execute
pipeline interlocking
Fetch Decode Read Execute
in the processor
Superpipelined (CDC7600)
hardware instruction
scheduling
b r a n ch in s t r u ct ion i s d e cod e d F D R E
4 F lu s h t h e p i p e l i n e D C B A
4 or : D e l a y e d b r a n c h ( e x p o s e p i p e l i n e ) E D C B
F E D C
n A d a t a h a z a r d occu r s b e ca u s e
br F E D
a n o p e r a n d ca n n ot b e r e a d y e t G br F E
4 I n t e n d e d b y p r ogr a m m e r - - br F
4 or : I n t e r lock h a r d w a r e in s e r t s - - - br
b u b b l e X - - -
TMS320C5x example
Y X - -
Y - X -
LAC #064h LAR AR2, DATA
Z Y - X
SAMM AR2 LACC *-
Z Y -
NOP
LACC *- Z Y
Z
10
Pipelining: Avoiding Control Hazards
Fetch Decode Read
Execute
A key factor in the numeric performance
of DSPs is the provision of special F D R E
hardware to perform looping.
D C
B
RPT COUNT E D AB
TBLR *+ CD
F E CD
E
rpt F E
F
n A r e p e a t in s t r u ct ion r e p e a t s X rpt F
rpt
X - rpt
on e in s t r u ct ion or a block of -
X - -
in s t r u ct ion s a ft e r r e p e a t -
X X -
X
n T h e p i p e l i n e is filled w it h X X X
X
X X X
r e p e a t e d in s t r u ct ion (or X
X X X
X
b lock of in s t r u ct ion s ) X X
n C o s t : on e p i p e l i n e flu s h on ly
11
RISC vs. DSP: Instruction Encoding
n R I S C : S u p e r s ca la r
Reorder
Load/store
Load/store
Load/store
Address
ALU Multiplier
12
RISC vs. DSP: Memory Hierarchy
n RISC
Registers
I/D Physical
Cache memory
Out
of TLB
order
TLB: Translation Lookaside Buffer
I Cache Internal
n DSP memories
Registers
External
memories
13
TI TMS320C6x VLIW DSP Architecture
Simplified
Architecture
Program RAM
Data RAM
or Cache
Addr
Internal Buses
DMA
Data
.D1 .D2 Serial Port
Regs (A0-A15)
Regs (B0-B15)
External .M1 .M2 Host Port
Memory
-Sync .L1 .L2
Boot Load
-Async
.S1 .S2
Timers
Control Regs
Pwr Down
CPU
14
TI TMS320C6x VLIW DSP Architecture
n T w o p a r a lle l d a t a p a t h s w it h s ingle-cycle u n it s :
4 D a t a u n it - 3 2 -bit a d d r e s s ca lcu la t ion s (m o d u lo, lin e a r )
4 M u lt i p l i e r u n it - 1 6 b i t x 1 6 b i t w i t h 3 2 -bit r e s u lt
4 Logica l u n it - 4 0 -bit (s a t u r a t ion ) a r it h m e t ic & com p a r e s
4 S h ift e r u n it - 3 2 - b i t i n t e g e r A L U a n d 4 0 -bit s h ift e r
n 1 6 3 2 -bit r e g i s t e r s i n e a ch d a t a p a t h
4 4 0 b it s ca n b e s t or e d in a d ja cen t e v e n /od d r e g i s t e r s
n F i x e d -p o i n t ( C 6 2 x ) a n d floa t in g -p o i n t ( C 6 7 x )
n T M S 3 2 0 C 6 2 0 1 : $ 2 5 in v olu m e
4 1 5 0 M H z , 3 0 0 m illion M A C s /s e c, 1 2 0 0 R I S C M I P S
4 O n -ch i p m e m or y : 1 6 k x 3 2 p r o g r a m , 3 2 k x 1 6 d a t a
15
TI TMS320C6x VLIW DSP Architecture
16
C5x and C6x Addressing Modes
n Immediate TMS320C5x TMS320C6x
4 The operand is part of the
instruction ADD #0FFh add .L1 -13,A1,A6
n Register
4 The operand is specified in a
(implied) add .L1 A7,A6,A7
register
n Direct
4 The address of the operand is
part of the instruction (added ADD 010h not supported
to imply memory page)
n Indirect
4 The address of the operand is ADD * ldw .L1 *A5++[8],A1
stored in a register
17
TMS320C6x vs. Pentium MMX
P r ocessor Peak BDTI ISR Power Unit Area Volum e
MIP S m a r k s latency Price
18
Application: FIR Filter
z-1 z-1 z-1
n E a ch t a p r e q u ir e s
4 F e t ch in g o n e d a t a s a m p l e
4 F e t ch in g o n e o p e r a n d
4 M u lt i p l y i n g t w o n u m b e r s
4 Accu m u la t in g m u lt iplica t ion r e s u lt
4 S h ift in g o n e s a m p l e i n t h e d e l a y l i n e
19
Application: FIR Filter on a TMS320C5x
Coefficients
Data
LAR AR3, #LASTAP ; Point to oldest sample
RPT #127
MACD COEFFP, *- ; Do the thing
APAC
SACH Y,1 ; Store result -- note shift
20
Application: FIR Filter on a TMS320C62x
Coefficients
Data
Single-Cycle Loop
...
...
21
Ordered Dithering on a TMS320C62x
periodic
1/8 5/8 7/8 3/8
array of
thresholds
7/8 3/8 1/8 5/8
Throughput of two cycles
; remove next two lines if thresholds in linear array
MVK .S1 0x0001,AMR ; modulo block size 2^2
MVKH .S1 0x4000,AMR ; modulo addr reg B6
; initialize A6 and B6
.trip 100 ; minimum loop count
dith: LDB .D1 *A6++,A4 ; read pixel
|| LDB .D2 *B6++,B4 ; read threshold
|| CMPGTU .L1x A4,B4,A1 ; threshold pixel
|| ZERO .S1 A5 ; 0 if <= threshold
[A1] MVK .S1 255,A5 ; 255 if > threshold
|| STB .D1 A5,*A6++ ; store result
||[B0] SUB .L2 B0,1,B0 ; decrement counter
||[B0] B .S2 dith ; branch if not zero
22
DSP Cores
n ASIC with:
4 Programmable DSP
4 RAM
4 ROM
4 Standard cells
4 Codec
4 Peripherals
4 Gate array
4 Microcontroller
23
DSP on General Purpose Processors
n M u lt im e d ia a p p lica t ion s on P C s
4 Vid e o, a u d io, gr a p h ics a n d a n i m a t i o n
4 R e p e t it i v e p a r a l l e l s e q u e n ces of in s t r u ct ion s
24
DSP on General Purpose Processors (cont)
n P r ogr a m m in g i s c o n s i d e r a b l y t o u g h e r
4 C / C + + com p iler s d o n ot g e n e r a t e n a t i v e s i g n a l p r o c e s s i n g
cod e e x cep t M e t r o w e r k s C o d e W a r r ior 5 g i v e s M M X cod e
4 L i b r a r ies of r ou t in e s u s in g n a t i v e s i g n a l p r o c e s s i n g
4 H a n d cod e u s in g i n - l i n e a s s e m b ly for b e s t p e r for m a n ce
4 P a ck /u n p a ck d a t a n ot a lign e d on S I M D w or d b ou n d a r i e s
4 5 0 -cycle p e n a lt y t o s w it ch t o M M X ; 0 p e n a l t y f o r V I S
4 S a t u r a t ion a r it h m e t ic in M M X; n ot s u p p or t e d in V I S
4 E x t e n d e d -p r e cision a ccu m u la t ion in M M X; n on e in V I S
n S p e e d u p for a p p lica t ion s
4 S ign a l a n d im a g e p r oce s s in g - 1 . 5 : 1 t o 2 : 1
4 G r a p h ics - 4:1 t o 6:1 (n o p a c k i n g /u n p a ck in g )
25
Intel MMX Instruction Set
n 6 4 -bit S I M D r e g i s t e r ( 4 d a t a t y p e s )
4 6 4 -b it q u a d w or d
4 P a ck e d b y t e ( 8 b y t e s p a ck e d in t o 6 4 b i t s )
4 P a ck e d w or d ( 4 1 6 -b it w or d s p a ck e d in t o 6 4 b i t s )
4 P a ck e d d ou b l e w o r d (2 d ou b l e w or d s p a ck e d in t o 6 4 b i t s )
n 5 7 n e w in s t r u ct ion s
4 P a ck a n d u n p a ck
4 A d d , s u b t r a ct , m u lt iply, a n d m u lt i p l y /a ccu m u la t e
n S a t u r a t ion a n d w r a p a r ou n d a r it h m e t ic
n M a xim u m p a r a lle l i s m p o s s i b l e
4 8 :1 for 8 -bit a d d it ion s
4 4 :1 for 8 x 1 6 m u lt iplica t ion or 1 6 -bit a d d it ion s
26
Concluding Remarks
n D igit a l s i g n a l p r o c e s s o r m a r k e t
4 4 0 % a n n u a l gr o w t h r a t e s in c e 1 9 9 0
4 $ 3 .5 billion r e v e n u e i n 1 9 9 8
4 4 5 % T I , 2 5 % L u cen t , 1 0 % M ot o r o l a , 8 % A n a l o g D e v i c e s
n I n d e p e n d e n t b e n ch m a r k in g b y i n d u s t r y
4 B e r k e l e y D e s i g n T e ch n ology In c. h t t p ://w w w .b d t i.com
4 E D N E m b e d d e d M icr o p r o c e s s o r B e n c h m a r k
C o n s or t iu m h t t p ://w w w .e e m bc.or g
n W e b r e s ou r c e s
4 com p .d s p n e w s g r ou p : F A Q w w w .b d t i.com /fa q /d s p _fa q .h t m l
4 e m b e d d e d p r oce s s or s a n d s y s t e m s : w w w .eg3.com
4 on -lin e cou r s e s a n d D S P b oa r d s : w w w .t e ch on lin e .com
28
References
1 G . E . A l l e n , B . L . E v a n s , a n d D . C. S ch a n b a ch e r , R e a l-T i m e S o n a r
B e a m for m in g on a U n ix W or k s t a t ion , Proc. I E E E A s i l o m a r C on f. O n
S i g n a l s , S y s t em s , a n d C om p u t e r s , p p . 7 6 4 - 7 6 8 , 1 9 9 8 .
h t t p ://w w w .e c e .u t e x a s .e d u /~ b e v a n s /p a p e r s /1 9 9 8 / b e a m for m in g /
2 R . B h a r g a v a , R . R a d h a k r i s h n a n , B . L . E v a n s , a n d L . K . J oh n , E v a lu a t in g
M M X T e c h n ology U s in g D S P a n d M u lt i m e d i a A p p l i c a t i o n s , Proc. I E E E
S y m . O n M icroarch itectu r e , p p . 3 7 - 4 6 , 1 9 9 8 .
h t t p ://w w w .e c e .u t e x a s .e d u /~ r a v i b /m m x d s p /
3 W . C h e n , H . J . R e e k i e , S . B h a v e , a n d E . A . L e e , N a t i v e S i g n a l P r oce s s i n g
on t h e U lt r a S P A R C in t h e P t olem y E n v i r on m e n t , Proc. I E E E A silom a r
C o n f. O n S i g n a l s , S y s t em s , a n d C om p u t ers , 1 9 9 6 .
h t t p ://w w w .e c e .u t e x a s .e d u /~ b e v a n s /cou r s e s /e e 3 8 2 c /lect u r e s /21_n s p /v i s /
4 B . L . E v a n s , E E 3 7 9 K - 1 7 R e a l - T i m e D S P L a b or a t or y , U T A u s t i n .
h t t p ://w w w .e c e .u t e x a s .e d u /~ b e v a n s /cou r s e s /r e a lt im e /
5 B . L . E v a n s , E E 3 8 2 C E m b e d d e d S o ft w a r e S y s t e m s , U T A u s t i n .
h t t p ://w w w .e c e .u t e x a s .e d u /~ b e v a n s /cou r s e s /e e 3 8 2 c /
6 A. K u lk a r n i a n d A. D u b e , E v a lu a t ion of t h e C o d e G e n e r a t i o n D o m a in in
P t olem y , h t t p ://w w w .e c e .u t e x a s .e d u / ~ b e v a n s /t a l k s / b e n c h m a r k in g 9 7 /s l d 0 0 1 . h t m
7 P . L a p s l e y , J . B i e r , A. S h oh a m , a n d E . A. L e e , D S P P r ocessor
F u n d a m en t a l s , I E E E P r e s s , 1 9 9 7 .
29