Dr. A. Amalin Prince, BITS Pilani K.K. Birla Goa Campus, Department of Electrical, Electronics and Instrumentation Engineering
MEL G642
System on a chip
What is a DSP core?
What is a DSP processor?
What is a DSP subsystem?
The MCU is the task controller: it executes tasks without real-time requirements.
[Figure: SoC design hierarchy. Labels: main memories; bus with arbitration; DSP subsystem (DM, PM, AGU, RF, ALU, MAC, control path, DMA, MMU, interrupt, timer, customer IF); MCU subsystem (MCU core, customer IF)]
Processor architecture
[Figure: processor architecture. Labels: ADG, RF, control path, DSP subsystem, DM, PM, other peripherals]
Inside a core
The core can be divided into three parts:
the datapath, the control path, and the address generation unit (AGU).
[Figure: two block diagrams. Labels: Memory; Program memory and Data memory; each diagram has a Control unit and an Arithmetic unit]
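[Figure: memory architecture variants, panels (a) to (e). Each panel pairs a datapath (DP) with a data memory (DM) and a control path (CP) with a program memory (PM); panels (d) and (e) also show a cache and a combined PM/DM]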
[Figure: datapath. Labels: register file, ALU, MAC, operands OPA and OPB]
[Figure: instruction execution flow. Labels: reset, calculate PC, receive operands, execute instruction]
[Figure: control path. Labels: PC FSM, program address, PM, MEM ctrl, operation ctrl, control signals, AGU, DM, RF, results. Legend: data bus, memory bus]
Modules in a core
Control path
RISC:
- Single-clock, reduced instructions only
- Operands only from registers
- LOAD and STORE are used to link the register file and the data memories
- Large code size
- Most silicon area used for storing program and data
CISC:
- Includes multiclock complex instructions
- Arithmetic computing based on memory-to-memory variables and register-to-register variables
- Small code size
- Silicon might be used for storing complex instructions (microcode)
- Most silicon area used for storing program and data
[Figure: instruction set design flow]
1. Design of general RISC instructions
2. Design of CISC accelerated instructions
3. Design of miscellaneous instructions
4. Instruction coding and release of the ASM manual
5. Instruction set simulator and assembler
6. Benchmarking performance and coverage
7. Satisfied? If no, iterate.
Application profiling
[Figure labels: REG names or immediate data; arithmetic and logic operations; way to get target address; jump taken decision; ALU flag; no flags]
Move-load-store instructions
The RISC processor architecture is simple. Data and parameters of a subroutine are loaded into the register file first. Operands come from the register file or from immediate data carried by an instruction. Results in the register file need to be moved back to the data memory.
Move-load-store instructions
Mnem    Operand   Description                         Operation         CC
Load    Rd, DA    Load data from memory 0/1           Rd <- DM(DA)      1
Store   DA, Rs    Store data to memory 0/1            DM(DA) <- Rs      1
move    Rd, Rs    Move between two registers          Rd <- Rs          1
move    Rd, K     Move immediate data to a register   Rd <- immediate   1
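The four move-load-store instructions can be modelled in a few lines. This is only a sketch for intuition: the memory size, the register count, the 16-bit immediate width, and the Python function names are assumptions, not part of the course instruction set.

```python
# Toy model of the move-load-store instructions (sizes and names are assumed).
DM = [0] * 16   # data memory, word addressed
RF = [0] * 8    # register file R0..R7


def load(rd, da):
    """Rd <- DM(DA)"""
    RF[rd] = DM[da]


def store(da, rs):
    """DM(DA) <- Rs"""
    DM[da] = RF[rs]


def move(rd, rs):
    """Rd <- Rs"""
    RF[rd] = RF[rs]


def move_imm(rd, k):
    """Rd <- immediate K, truncated to an assumed 16-bit width"""
    RF[rd] = k & 0xFFFF


DM[3] = 25
load(0, 3)       # R0 = 25
move(1, 0)       # R1 = 25
move_imm(2, 7)   # R2 = 7
store(4, 2)      # DM[4] = 7
```

The sequence mirrors the RISC pattern described earlier: data goes from memory to the register file, is handled there, and is written back with a store.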
Here "and" operates on each bit of operands A and B; that is, C[0]=A[0] & B[0], C[1]=A[1] & B[1], ..., C[15]=A[15] & B[15].
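The per-bit definition can be checked against the built-in operator. The sketch below assumes 16-bit operands as in the text; the function name and the sample values are illustrative.

```python
def and16(a, b):
    """Bitwise AND of two 16-bit operands, computed one bit at a time."""
    c = 0
    for i in range(16):
        bit = ((a >> i) & 1) & ((b >> i) & 1)   # C[i] = A[i] & B[i]
        c |= bit << i
    return c


A, B = 0xF0F0, 0x3C3C
C = and16(A, B)   # same value as the built-in A & B
```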
Logic Operators in C
Condition symbol   Condition
<                  Less than
<=                 Less than or equal to
==                 Equal to
>=                 Greater than or equal to
>                  Greater than
!=                 Not equal to
&&                 Boolean AND
||                 Boolean OR
!                  Boolean NOT
In an assembly language
The condition test and the conditional jump are separated: the first instruction performs the test and computes the flags; the second instruction is the conditional jump.
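Condition                  Flags met
Less than                  N=1
Less than or equal to      N=1 or Z=1
Equal to                   Z=1
Greater than or equal to   N=0
Greater than               N=0 and Z=0
Not equal to               Z=0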
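The two-step pattern (a compare instruction that sets flags, then a jump that only reads them) can be sketched as follows. The function names are hypothetical, and the sketch assumes the flags come from the subtraction a - b with overflow ignored, so N is simply the sign of the difference.

```python
def compare(a, b):
    """Condition test: derive the N and Z flags from a - b (overflow ignored)."""
    diff = a - b
    return {"N": int(diff < 0), "Z": int(diff == 0)}


# Flag test used by each conditional jump.
CONDITIONS = {
    "<":  lambda f: f["N"] == 1,
    "<=": lambda f: f["N"] == 1 or f["Z"] == 1,
    "==": lambda f: f["Z"] == 1,
    ">=": lambda f: f["N"] == 0,
    ">":  lambda f: f["N"] == 0 and f["Z"] == 0,
    "!=": lambda f: f["Z"] == 0,
}


def jump_taken(cond, flags):
    """Conditional jump: decide from the flags only, without seeing the operands."""
    return CONDITIONS[cond](flags)


flags = compare(3, 5)            # first instruction: test and flag computation
taken = jump_taken("<", flags)   # second instruction: conditional jump
```

A real datapath would also need an overflow flag for signed comparisons near the ends of the number range; that is left out here to keep the mapping to N and Z visible.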
What is a benchmark?
DSP benchmarking measures the cycle cost and the code size of a DSP algorithm running on single-precision data.
Convention of DSP benchmarking:
Rounding is required before moving long data from an accumulation register to a general register.
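A small sketch of why the rounding convention matters. It assumes a signed 32-bit accumulator whose upper 16 bits are moved to a signed 16-bit general register, with saturation on overflow; the widths and the function names are assumptions for illustration, not taken from the slides.

```python
def round_acc_to_reg(acc):
    """Round a signed 32-bit accumulator value into a signed 16-bit register value."""
    rounded = (acc + 0x8000) >> 16             # add half an LSB of the result, keep the upper bits
    return max(-32768, min(32767, rounded))    # saturate to the 16-bit range


def truncate_acc_to_reg(acc):
    """Same move without rounding: the lower 16 bits are simply dropped."""
    return acc >> 16


acc = 0x00018000                         # 1.5 in units of the register LSB
with_rounding = round_acc_to_reg(acc)    # 2
without_rounding = truncate_acc_to_reg(acc)   # 1
```

Truncation always rounds toward minus infinity, so it introduces a systematic bias that accumulates over long filters; rounding removes most of that bias at the cost of one extra addition.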
What to benchmark
Standard benchmark algorithms are available from BDTI (Berkeley Design Technology, Inc.).
How to benchmark
BDTI benchmarking convention
It measures the execution time (cycle cost), the code size (program memory cost), and the cost of data memories.
238 43
[Figure labels: bottom, start, stop, top]
Example: the procedure of a FIFO getting a new data sample.
[Figure: circular buffer in DM between the MIN address and the MAX address, delimited by BAR and TAR and accessed through DAR. Initial contents from BAR to TAR: X0, X1, X2, X3, X4. Before the new sample: X(n), X(n-1), X(n-2), X(n-3), X(n-4). After getting new data: X(n-1), X(n-2), X(n-3), X(n-4), X(n)]
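The FIFO update in the figure can be emulated in software as below. This is a minimal sketch under assumptions: the class and method names are made up, the buffer lives in a Python list instead of DM, and the pointer walks upward with wrap-around (the direction in the slide figure may differ).

```python
class CircularFifo:
    """Software-emulated circular buffer holding the last `size` samples."""

    def __init__(self, size):
        self.size = size
        self.dm = [0] * size   # buffer region of the data memory
        self.oldest = 0        # address of the oldest sample

    def push(self, sample):
        """Overwrite the oldest sample, then advance the pointer modulo the size."""
        self.dm[self.oldest] = sample
        self.oldest = (self.oldest + 1) % self.size

    def newest_first(self):
        """Return the samples ordered x(n), x(n-1), ..., x(n-size+1)."""
        newest = (self.oldest - 1) % self.size
        return [self.dm[(newest - k) % self.size] for k in range(self.size)]


fifo = CircularFifo(5)
for s in [1, 2, 3, 4, 5]:
    fifo.push(s)
before = fifo.newest_first()   # [5, 4, 3, 2, 1]
fifo.push(6)                   # the oldest sample (1) is overwritten in place
after = fifo.newest_first()    # [6, 5, 4, 3, 2]
```

No sample is ever copied: only one memory word and one pointer change per new sample. The modulo operation on every access is the part that costs cycles when done in software.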
Benchmark: 16-tap FIR
             Basic   TSMD
Cycle cost   192     31
             173     16
The extra cycle cost is 192 - 31 = 161 cycles. The cycle cost is 6.2 times that of the TSMD benchmark. Opportunities for improvement are:
- The cost of the SW-emulated circular buffer and modulo addressing is high. A HW circular buffer with modulo addressing is essential.
- Data and coefficient loading, MAC, and loop control can be merged into one instruction, convolution, which is one of the most frequently used instructions in DSP.
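To see what a single convolution instruction would have to cover, here is one FIR output sample written as an explicit loop. It is a sketch under assumptions: the function name is hypothetical, the delay line is a circular buffer with exactly as many entries as there are taps, and `newest` is the address of x(n).

```python
def fir_step(coeffs, buf, newest):
    """Compute y(n) = sum over k of coeffs[k] * x(n-k) from a circular delay line."""
    n_taps = len(coeffs)
    acc = 0
    addr = newest
    for k in range(n_taps):            # loop control
        acc += coeffs[k] * buf[addr]   # coefficient load, data load, multiply-accumulate
        addr = (addr - 1) % n_taps     # modulo address update
    return acc


# Three taps for readability; x(n)=20, x(n-1)=10, x(n-2)=30.
y = fir_step([1, 2, 3], [10, 20, 30], newest=1)   # 1*20 + 2*10 + 3*30 = 130
```

Each iteration performs four separate things: two loads, a multiply-accumulate, an address update with wrap-around, and a loop test. On a basic instruction set each of these costs at least one instruction per tap, which is the overhead the merged instruction removes.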
Example:
FIR filtering
Autocorrelation
Autocorrelation is used for finding regularities or periodic features of a signal.
Cross-correlation
Cross-correlation is used for measuring the similarity of a signal to a known signal pattern.
What is the difference?
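Computationally the two are the same multiply-accumulate loop; the only difference is the second operand, which is the signal itself for autocorrelation and a separate reference for cross-correlation. The sketch below uses an unnormalized definition for non-negative lags; the function names and sample data are illustrative.

```python
def cross_correlation(x, y, lag):
    """r_xy(lag) = sum over n of x[n] * y[n + lag], for lag >= 0 (unnormalized)."""
    count = min(len(x), len(y) - lag)   # number of overlapping samples
    return sum(x[n] * y[n + lag] for n in range(count))


def autocorrelation(x, lag):
    """Autocorrelation is cross-correlation of a signal with itself."""
    return cross_correlation(x, x, lag)


# A signal with period 2 correlates strongly with itself at lag 2.
periodic = [1, -1, 1, -1, 1, -1]
r0 = autocorrelation(periodic, 0)   # 6
r1 = autocorrelation(periodic, 1)   # -5
r2 = autocorrelation(periodic, 2)   # 4

# A known pattern is located at the lag with the largest cross-correlation.
pattern = [1, 2, 1]
signal = [0, 0, 1, 2, 1, 0]
scores = [cross_correlation(pattern, signal, lag) for lag in range(4)]
best_lag = scores.index(max(scores))   # 2
```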
Lessons Learned
- C does not expose parallel features.
- Convolution is one of the most used DSP operations. It reaches very high efficiency when memory addressing, arithmetic computing, result store, and program flow control are carried out in parallel in one instruction. This is possible because the parallel hardware can be organized in a pipeline.
- Other frequently used iterative DSP operations can also be specified as one instruction.
- Research work: Why?
Identify the requirement and benchmark it
Conclusion
- An assembly language instruction set must be more efficient.
- Accelerations are implemented at the arithmetic and algorithmic levels.
- Addressing and memory accesses should be executed in parallel with arithmetic computing.
- Program flow control, such as loops or conditional execution, shall also be accelerated.
Questions?