
UNIT 1:

FUNDAMENTALS OF
PROGRAMMABLE DSPs

Bhooshan Humane
TOPICS TO BE COVERED
• Multiplier and multiplier accumulator (MAC)
• Modified Bus Structures
• Memory access in P-DSPs
• Multiple access memory
• Multi-ported memory
• VLIW architecture
• Pipelining
• Special Addressing modes in PDSPs
• On chip Peripherals
• Computational accuracy in DSP processor
• Von Neumann and Harvard Architecture
2
What are Digital Signal Processors?

DSPs outperform general-purpose processors for time-critical
applications and are architecturally designed for mathematical
operations and data movement. (Source: www.ti.com)

3
P-DSPs / Advanced Microprocessors / RISC
• Conventional microprocessors: for general-purpose applications.
• Advanced microprocessors can be comparable to P-DSPs in raw speed:
• The DEC Alpha 21064 computes a 1024-point complex FFT in 480 µs,
compared to
• 460 µs for the Analog Devices ADSP-21060.
• But in terms of:
• Low power requirement, cost, real-time I/O capability and
availability of high-speed on-chip memories, P-DSPs have an
advantage over advanced microprocessors and RISC processors.

4
The Basic Features of DSPs

5
Feature: Use
• Fast multiply-accumulate: Most DSP algorithms, including filtering, transforms, etc., are multiplication-intensive.
• Multiple-access memory architecture: Many data-intensive DSP operations require reading a program instruction and multiple data items during each instruction cycle for best performance.
• Specialized addressing modes: Efficient handling of data arrays and first-in, first-out buffers in memory.
• Specialized program control: Efficient control of loops for many iterative DSP algorithms. Fast interrupt handling for frequent I/O operations.
• On-chip peripherals and I/O interfaces: On-chip peripherals like A/D converters allow for small, low-cost system designs. Similarly, I/O interfaces tailored for common peripherals allow clean interfaces to off-chip I/O devices.
6
Typical Applications for the TMS320 DSPs
Automotive
• Adaptive ride control
• Antiskid brakes
• Cellular telephones
• Digital radios
• Engine control
• Navigation and global positioning
• Vibration analysis
• Voice commands
• Anticollision radar
7
Typical Applications for the TMS320 DSPs
Consumer
• Digital radios/TVs
• Educational toys
• Music synthesizers
• Pagers
• Power tools
• Radar detectors
• Solid-state answering machines

8
Typical Applications for the TMS320 DSPs
General-Purpose
• Adaptive filtering
• Convolution
• Correlation
• Digital filtering
• Fast Fourier transforms
• Hilbert transforms
• Waveform generation
• Windowing
9
TEST
• Formula for convolution?

$$y(n) = \sum_{k=0}^{n} x(n-k)\,h(k)$$

• Formula for Auto / Cross- Correlation?
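
As a worked illustration of the two test questions, here is a minimal Python sketch. It evaluates the convolution sum as a chain of multiply-accumulate steps, and it uses one common textbook definition of cross-correlation, r_xy(l) = Σ_n x(n) y(n − l), with auto-correlation as the special case y = x. The function names and the convention that samples outside the given lists are zero are assumptions made for this sketch, not something stated on the slides.

```python
def convolve(x, h):
    """y(n) = sum_k x(n - k) h(k); samples outside the lists count as zero."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        acc = 0  # plays the role of the accumulator in a MAC unit
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += x[n - k] * h[k]  # one multiply-accumulate step
        y[n] = acc
    return y


def cross_correlate(x, y, lag):
    """r_xy(lag) = sum_n x(n) y(n - lag); samples outside the lists count as zero."""
    acc = 0
    for n in range(len(x)):
        if 0 <= n - lag < len(y):
            acc += x[n] * y[n - lag]
    return acc


def auto_correlate(x, lag):
    """Auto-correlation is cross-correlation of a sequence with itself."""
    return cross_correlate(x, x, lag)
```

Both operations reduce to repeated multiply-accumulate steps, which is why the MAC unit discussed next matters so much for DSP workloads.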

10
MULTIPLIER & MULTIPLIER ACCUMULATOR
(MAC)

Fig. 1: Implementation of a convolver with a single multiplier/adder
11
MULTIPLIER & MULTIPLIER ACCUMULATOR
(MAC)

• Syntax: MACD pma, dma (direct addressing)

• Operands: dma: 7 LSBs of the data-memory address
pma: 16-bit program-memory address

12
13
MACD Instruction
• A MAC operation with data move (i.e., the MACD instruction) requires
four memory accesses per instruction cycle.

• What is an instruction cycle, and how does it relate to one processor clock period?

• In a conventional microprocessor, one instruction cycle corresponds to several
clock cycles.
• For example: i) LDA Addr (4)
ii) MVI A, 50H (2)
14
Meaning of Instruction Cycle

An instruction cycle is the time that elapses from the
moment an instruction is fetched until that instruction
completes execution, including the time taken for
writing the result into a register or memory.

15
MACD Instruction
The four memory accesses per clock period required for the MACD instruction are as follows:

1) Fetch the MACD instruction from the program memory

2) Fetch one of the operands from the program memory

3) Fetch the second operand from the data memory

4) Write the content of the data-memory location at address dma into the location at
address dma+1
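
The four accesses can be pictured with a small software model. This is only a sketch under stated assumptions: `program_memory` and `data_memory` are plain Python lists, the names `macd_step` and `fir_output` are hypothetical, and the model ignores the product and temporary registers and the pipeline timing of a real TMS320 device. It shows why the data move is useful: after one pass over an FIR delay line, every sample has already been shifted one place, ready for the next input.

```python
def macd_step(acc, program_memory, data_memory, pma, dma):
    """Simplified, non-cycle-accurate model of one MACD step.

    Access 1 (fetching the instruction itself) is implicit in this call.
    """
    coeff = program_memory[pma]      # access 2: operand from program memory
    sample = data_memory[dma]        # access 3: operand from data memory
    acc += coeff * sample            # multiply and accumulate
    data_memory[dma + 1] = sample    # access 4: data move, dma -> dma + 1
    return acc


def fir_output(program_memory, data_memory, taps):
    """Compute one FIR output; data_memory[0] holds the newest sample."""
    acc = 0
    # Start at the oldest sample so the data move never overwrites
    # a sample that has not been used yet.
    for k in range(taps - 1, -1, -1):
        acc = macd_step(acc, program_memory, data_memory, pma=k, dma=k)
    return acc
```

For example, with coefficients `[1, 2, 3]` and a delay line `[10, 20, 30, 0]` (one spare word at the end), the output is 140 and the delay line becomes `[10, 10, 20, 30]`, so a new sample can simply be written to index 0.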

16
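Von Neumann Architecture
Fig: Block diagram of the Von Neumann architecture. Blocks: processing unit, control unit, and a single data & program memory. Signals: results and operands on the data bus, data/instructions, instructions, status, opcode, and address.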
17
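Harvard Architecture
Fig: Block diagram of the Harvard architecture. Blocks: processing unit, control unit, data memory, and program memory. Signals: results/operands and address for the data memory, instructions and address for the program memory, and status and opcode between the processing unit and the control unit.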
18
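Modified Harvard Architecture
Fig: Block diagram of the modified Harvard architecture. Blocks: processing unit, control unit, data memory, and program memory. Signals: results/operands and address for the data memory, instructions and address for the program memory, and status and opcode between the processing unit and the control unit.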
19
Multiple Access Memory

• The number of memory accesses per clock period can also be
increased by using a high-speed memory that permits more than
one memory access per clock period.

• This is the concept of dual-access RAM (DARAM).

20
Multiported Memory

Fig: Block diagram of a dual-ported memory. The dual-port memory has two independent ports: Address Bus 1 with Data Bus 1, and Address Bus 2 with Data Bus 2.

21
VLIW Architecture
Fig: Block diagram of the VLIW architecture. Blocks: program control unit, instruction cache, multiported register file, read/write crossbar, and functional units 1 ... n.
22
Pipelining

• Pipelining is one approach to increasing the efficiency of P-DSPs and
advanced microprocessors.

• An instruction cycle, which starts with the fetching of an instruction and
ends with its execution, including the time for storing the results,
can be split into a number of microinstructions.

Digital Signal Processors & Architecture, by Bhooshan Humane

23
Approach
• An instruction cycle requiring four microinstructions can be said to be
in four phases as follows:
• 1) Fetch phase
• 2) Decode phase
• 3) Memory read phase
• 4) Execution phase
• Each of the four microinstructions can be carried out by its own
functional unit.

24
Pipelining: It's Natural!
• Laundry example
• Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of
clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes

25
Sequential Laundry
Fig: Task-order timeline from 6 PM to midnight. Loads A, B, C, and D run back to back, each taking 30 + 40 + 20 minutes.

• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
26
Pipelined Laundry: Start Work ASAP
Fig: Task-order timeline starting at 6 PM. Loads A, B, C, and D overlap, giving successive intervals of 30, 40, 40, 40, 40, and 20 minutes.

• Pipelined laundry takes 3.5 hours for 4 loads
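
The two totals can be checked with a short Python sketch. The function names are hypothetical, and the model assumes that a load may wait between stages for as long as the next machine is busy, which matches the laundry picture but is an assumption of this sketch.

```python
def sequential_minutes(stage_minutes, loads):
    """Every load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each stage starts a load as soon as the load and the stage are both free."""
    stage_free = [0] * len(stage_minutes)  # time at which each stage becomes free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[s])
            ready = start + minutes
            stage_free[s] = ready
        finish = ready
    return finish
```

With the washer, dryer, and folder times of 30, 40, and 20 minutes and four loads, `sequential_minutes([30, 40, 20], 4)` gives 360 minutes (6 hours) and `pipelined_minutes([30, 40, 20], 4)` gives 210 minutes (3.5 hours). The 40-minute dryer sets the pace once the pipeline is full.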


27
Pipelining Lessons
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup

28
Pipelining
Fig: Instruction cycles of processor with no pipelining
Value of T   Fetch   Decode   Read   Execute
1            I1      -        -      -
2            -       I1       -      -
3            -       -        I1     -
4            -       -        -      I1
5            I2      -        -      -
6            -       I2       -      -
7            -       -        I2     -
8            -       -        -      I2
9            I3      -        -      -
10           -       I3       -      -
11           -       -        I3     -
12           -       -        -      I3

Fig: Instruction cycles of processor with pipelining
Value of T   Fetch   Decode   Read   Execute
1            I1      -        -      -
2            I2      I1       -      -
3            I3      I2       I1     -
4            I4      I3       I2     I1
5            I5      I4       I3     I2
6            I6      I5       I4     I3
7            I7      I6       I5     I4
8            I8      I7       I6     I5
9            I9      I8       I7     I6
10           -       I9       I8     I7
11           -       -        I9     I8
12           -       -        -      I9
29
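
The pipelined occupancy pattern can be generated with a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and it assumes an ideal four-stage pipeline with no stalls, where instruction i enters the fetch stage in clock period i.

```python
STAGES = ["Fetch", "Decode", "Read", "Execute"]


def pipeline_table(num_instructions, stages=STAGES):
    """One row per clock period: the instruction number in each stage, or None if idle."""
    depth = len(stages)
    total_periods = num_instructions + depth - 1
    rows = []
    for t in range(1, total_periods + 1):
        row = []
        for s in range(depth):
            i = t - s  # instruction i enters Fetch at period i and moves one stage per period
            row.append(i if 1 <= i <= num_instructions else None)
        rows.append(row)
    return rows


def unpipelined_periods(num_instructions, depth=len(STAGES)):
    """Without pipelining, each instruction occupies the processor for all stages."""
    return num_instructions * depth
```

For nine instructions the pipelined table has 12 rows, while the same nine instructions would need 36 clock periods without pipelining; in 12 periods the unpipelined processor completes only three.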
Special Addressing Modes in P-DSPs
1) Short Immediate Addressing

2) Short direct Addressing

3) Memory-mapped Addressing

4) Indirect Addressing

5) Bit Reversed Addressing Mode

6) Circular Addressing
30
1) Short Immediate Addressing
• Permits the operand to be specified using a short constant that
forms part of a single word instruction.
• The length of the short constant depends on the instruction type &
the P-DSP.
• Short immediate values can be 3, 5, 8, or 9 bits in length.

31
Some Information about the Data Page Pointer (DP)
• In the direct addressing mode, data memory is addressed
in blocks of 128 words called data pages.
• The entire 64K of data memory consists of 512 data pages
labeled 0 through 511, as shown in Fig.
• The current data page is determined by the value in the
9-bit data page pointer (DP) in status register ST0.
• For example, if the DP value is (0 0000 0000)₂, the current
data page is 0. If the DP value is (0 0000 0010)₂, the
current data page is 2.
32
33
2) Short direct Addressing
• Permits the lower-order address bits of the operand of an instruction
to be specified in the single-word instruction.
• In TI TMS320 DSPs, the higher-order 9 bits of the memory address are
stored in the data page pointer & only the lower 7 bits are
specified as a part of the instruction.
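
The address formation described above amounts to concatenating the 9-bit DP with the 7-bit offset. A minimal Python sketch, with a hypothetical function name and Python integers standing in for the hardware fields:

```python
def direct_address(dp, dma):
    """Form a 16-bit data address from the 9-bit DP and the 7-bit instruction field."""
    if not 0 <= dp < 512:
        raise ValueError("DP is a 9-bit value (0..511)")
    if not 0 <= dma < 128:
        raise ValueError("dma is a 7-bit value (0..127)")
    return (dp << 7) | dma  # DP selects the page, dma selects the word within it
```

For example, with DP = 2 and a 7-bit offset of 5, the address is 2 × 128 + 5 = 261, i.e. word 5 of data page 2.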

34
Generation of Data Addresses in Direct Addressing Mode

35
3) Memory-mapped Addressing
• The CPU registers & I/O registers of P-DSPs are also accessible as
memory locations.
• This is achieved by mapping them into either the starting page or the
final page of the memory space.
• For example, in the TMS320C5x, page 0 corresponds to the CPU registers & I/O
registers.
• When these registers are accessed using memory-mapped
addressing modes, the higher address bits are not taken from the
data page pointer; instead they are forced to 0 in TI DSPs & to 1 in
Motorola DSPs.
36
4) Indirect Addressing
• In indirect addressing, any location in the 64K-word data space
can be accessed using the 16-bit address contained in an auxiliary
register.

• The C54x DSP has eight 16-bit auxiliary registers (AR0–AR7).

• Indirect addressing is used mainly when there is a need to step
through sequential locations in memory in fixed-size steps.

37
4) Indirect Addressing
• In P-DSPs this addressing mode has a number of options.
• It permits an array of data to be processed in a P-DSP to be efficiently
fetched & stored.
• The address can be stored in one of the registers called indirect
address registers.
• In TI DSPs, indirect address registers are called auxiliary registers (ARs).
• Any of these registers can be updated while the operand fetched
using that register is being processed.
• This is made possible by having an additional ALU in the CPU core.

38
4) Indirect Addressing
• The ARs can be incremented or decremented either in steps of 1 or
in steps specified by the content of an offset register.

• In TI DSPs, the offset register is called the INDX register.

• In Analog Devices DSPs, it is called the modifier register.

• The contents can also be updated by a constant using the bit-reversed
addressing mode.

• Both pre-increment / decrement & post-increment / decrement are possible.
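
The pre- and post-modify options can be illustrated with a toy Python model of one auxiliary register. The class and method names are hypothetical, the step argument stands in for either a step of 1 or the content of the offset register, and the 16-bit wraparound is an assumption of this sketch rather than a description of any particular device.

```python
class AuxRegister:
    """Toy model of one 16-bit auxiliary register used as a data pointer."""

    def __init__(self, address):
        self.address = address & 0xFFFF

    def read_post_modify(self, memory, step=1):
        """Read at the current address, then add step (which may be negative)."""
        value = memory[self.address]
        self.address = (self.address + step) & 0xFFFF
        return value

    def read_pre_modify(self, memory, step=1):
        """Add step (which may be negative) first, then read at the new address."""
        self.address = (self.address + step) & 0xFFFF
        return memory[self.address]
```

In hardware the address update happens in parallel with the data operation, which is what the additional arithmetic unit provides; the model above only shows the order in which the address and the data are used.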


39
5) Bit Reversed Addressing Mode
• The bit-reversed pattern corresponding to a particular decimal number
is obtained by writing the natural binary equivalent of the number
in the reverse order, so that the MSB of the natural binary becomes
the LSB of the bit-reversed number & vice versa.

40
Decimal Number   Natural Binary Number   Bit Reversed Number
0                0000                    0000
1                0001                    1000
2                0010                    0100
3                0011                    1100
4                0100                    0010
5                0101                    1010
6                0110                    0110
7                0111                    1110
8                1000                    0001
9                1001                    1001
10               1010                    0101
11               1011                    1101
12               1100                    0011
13               1101                    1011
14               1110                    0111
15               1111                    1111
41
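
The 4-bit table above can be reproduced with a short Python sketch. The function name and the default width of 4 bits are choices made for this illustration.

```python
def bit_reverse(value, width=4):
    """Reverse the lowest `width` bits of value (MSB becomes LSB and vice versa)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # append the current LSB to the result
        value >>= 1
    return result
```

For instance, `bit_reverse(1)` gives 8 (0001 becomes 1000) and `bit_reverse(10)` gives 5 (1010 becomes 0101). This ordering is the one needed to reorder the inputs or outputs of a radix-2 FFT.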
6) Circular Addressing
• Memory can be organized as a circular buffer, with the beginning
memory address & the ending memory address corresponding to
this buffer defined by the programmer.
• In this mode, when the address pointer is incremented, the address is
checked against the ending memory address of the circular buffer.
• If it exceeds that, the address is made equal to the beginning
address of the circular buffer.
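
The wraparound rule described above can be sketched in Python as follows. The function name and the example addresses are hypothetical; the step is fixed at 1 to match the description.

```python
def circular_increment(pointer, start, end):
    """Increment the pointer; wrap to the beginning address once it passes the end."""
    pointer += 1
    if pointer > end:  # compare with the ending address of the buffer
        pointer = start
    return pointer


def walk(start, end, steps):
    """Return the sequence of addresses visited, beginning at start."""
    addresses = []
    pointer = start
    for _ in range(steps):
        addresses.append(pointer)
        pointer = circular_increment(pointer, start, end)
    return addresses
```

With a four-word buffer from 0x100 to 0x103, six steps visit 0x100, 0x101, 0x102, 0x103, 0x100, 0x101. The check that the program would otherwise perform in software on every access is done by the addressing hardware.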

42
On Chip Peripherals
1) On-chip Timer
2) Serial Port
3) TDM Serial port
4) Parallel Port
5) Bit I/O Ports
6) Host Port
7) Comm Ports
8) On-Chip A/D and D/A Converters
9) P-DSPs with RISC & CISC
43
2) Serial Port

Fig: Burst Mode Serial Port Receive Operation
44


3) TDM Serial port

Ch1 Ch2 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8

One TDM Frame

Fig: TDM Frame with 8 time slots

45
3) TDM Serial port
• TFRM: The frame sync signal
• TClock: The bit clock
• TADD: The address of the serial device that is outputting data in a
particular TDM slot.
• TDAT: The data transmitted into the TDM channel by the authorized
device.

46
Fig. Data transfer using TDM Channel

47
9) P-DSPs with RISC & CISC
• TI TMS320C6x P-DSPs use a RISC processor.
• A large number of Analog Devices & Motorola devices use CISC.

48