
VLSI Architecture :: MEL G642

Dr. A. Amalin Prince, BITS Pilani, K.K. Birla Goa Campus, Department of Electrical, Electronics and Instrumentation Engineering

MEL G642

System on a chip
What is a DSP core? What is a DSP processor? What is a DSP subsystem?
The MCU is the task controller: it executes tasks without real-time requirements.

[Figure: SoC design hierarchy. Main memories and a bus with arbitration serve a DSP subsystem and an MCU subsystem, each with a customer IF. The DSP subsystem holds the DSP processor core (control path, RF, ALU, MAC, AGU) plus DM, PM, DMA, MMU, interrupt, and timer; the MCU subsystem holds the MCU core.]

Processor architecture

[Figure: DSP processor chip. The DSP subsystem contains the DSP core (control path, RF, ALU, MAC, ADG), DM, PM, DMA, MMU, an accelerator, interrupt, timer, and other peripherals, connected by a bus with arbitration to the main memories and the chip interface.]

Architecture and microarchitecture


The processor architecture is the hardware organization of the core and its peripherals, including the memory bus architecture.
Architecture represents the relations between modules.
The microarchitecture design is the specification of functional modules.
ASIP microarchitecture design is the implementation of an ISA specification into hardware modules.


Inside a core
The core can be divided into three parts:
the datapath, the control path, and the address generation unit (AGU).

The core components are organized around two data buses:
The memory bus is distributed between the core and the memory subsystem.
The register bus connects the register file to all units in the core.


Memory subsystem in a DSP subsystem


The memory subsystem consists of
data memories (DM), program (code) memory (PM), AGU, DMA, and MMU.


Peripherals in a DSP subsystem


Timers for counting clock cycles and events.
Interrupt controller for handling interrupts.
DMA (Direct Memory Access) controller for handling data transfers to/from main memory and between other memories/ports.
MMU (Memory Management Unit) for reliable and efficient memory (address space) usage.


DSP memory architecture


History of DSP memory architectures

[Figure: (a) Von Neumann architecture: one memory shared by the control unit and the arithmetic unit, with in-out. (b) Harvard architecture: separate program memory and data memory serving the control unit and the arithmetic unit, with in-out.]

History of DSP memory architectures


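[Figure: three memory architectures, (a) to (c), each built from a datapath (DP), control path (CP), data memory (DM), and program memory (PM); two MUXes appear in the drawing.]

Notes on the figure:
One tap of convolution requires multiple clock cycles.
Fetch coefficients instead of instructions during CONV.
Dual-port/multi-port memory required. Used up to the 1980s.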

History of DSP memory architectures


[Figure: two further memory architectures, (d) and (e), built from DP, CP, DM, PM, a cache, a shared PM/DM, and two MUXes.]

Notes on the figure:
The program for the iteration is stored in the cache in advance.
Two data memories; widely used.

A typical DSP bus architecture


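[Figure: a typical DSP bus architecture. Blocks: control path (CP) and address generation path (AGU); datapath with register file, ALU, and MAC (operands OPA and OPB); PM, DM1, and DM2. Buses: register bus, memory bus, and PM bus, with P-address and program lines to PM and D1/D2 address and data lines to DM1/DM2.]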

Control flow of DSP ASIP

Instruction flow FSM, starting from reset:

Send PC to PM (request an instruction)
Get code from PM (receive an instruction)
Decode the instruction and send control to DP (control signals to DP)
Generate operand addresses (generated address to storage units)
Receive states from DP (flags from DP)
Calculate PC

Data flow of DSP ASIP


Data flow FSM:

Receive instruction (from PM)
Receive operand address (from address generator)
Fetch operands (send address to storage HW)
Receive operands
Execute instruction
Return states (flags to PC FSM)
Store result (send result to storage HW)
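The two flows can be sketched together as a small fetch, decode, and execute loop. This is only an illustrative model: the tuple instruction format, the four-entry register file, the 16-bit word width, and the two-operand `add` (Rd <= Rd + Rr) are assumptions made for the example, not the course's simulator.

```python
def run(pm, dm, num_regs=4, max_steps=1000):
    """Execute a program held in pm (list of tuples) against data memory dm."""
    rf = [0] * num_regs            # register file (size is an assumption)
    flags = {"Z": 0, "N": 0}       # states returned from the datapath
    pc = 0
    for _ in range(max_steps):
        if pc >= len(pm):
            break
        # Control path: send PC to PM, receive and decode the instruction.
        op, *args = pm[pc]
        next_pc = pc + 1
        # Datapath: fetch operands, execute, store the result, return flags.
        if op == "load":           # Rd <= DM(DA)
            rd, da = args
            rf[rd] = dm[da]
        elif op == "store":        # DM(DA) <= Rs
            da, rs = args
            dm[da] = rf[rs]
        elif op == "add":          # Rd <= Rd + Rr, 16-bit wrap-around
            rd, rr = args
            rf[rd] = (rf[rd] + rf[rr]) & 0xFFFF
            flags["Z"] = int(rf[rd] == 0)
            flags["N"] = (rf[rd] >> 15) & 1
        elif op == "jne":          # control path uses the flags to pick the next PC
            (target,) = args
            if flags["Z"] == 0:
                next_pc = target
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc = next_pc
    return rf, dm


program = [("load", 0, 0), ("load", 1, 1), ("add", 0, 1), ("store", 2, 0)]
registers, memory = run(program, [3, 4, 0])
```

After the run, `memory[2]` holds the sum of the first two data words.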



A complete view of a DSP processor

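[Figure: a complete view of a DSP processor. The PC FSM (program flow control) sends the program address to PM; the instruction goes to the instruction decoder, which drives MEM ctrl, operation ctrl, and operand & result control. The AGU, DM, RF, and the ALU/MAC execution unit are linked by the data bus and the memory bus. The legend distinguishes configuration and status, internal signals in the control path, control signals, and results.]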

Modules in a core


Modules in a DSP core


Datapath: register file, ALU, MAC
AGU
Control path

Differences between design of DSP and MPU


MPU designers aim for ultimate performance and ultimate flexibility, as well as a compiler-friendly instruction set.
ASIP DSP designers think of the application and cost first; the challenge is to be efficient.
The goal of an ASIP design is to reach the highest performance per silicon area, the highest performance per unit of power consumption, and the highest performance per unit of design cost.

Is DSP CISC or RISC


A DSP is like a RISC:
More general-purpose registers.
Most instructions are simple instructions.
Instruction decoding by a decoding logic circuit instead of microcode.
Regular instruction pipelining.

A DSP is like a CISC:
One execution cycle for ALU and multiple cycles for iteration.
Complicated data memory addressing modes and circuits.
Special-purpose registers (accumulator registers).
Strong instructions for accelerating certain tasks.

Is DSP CISC or RISC


DSP: emphasis on hardware and software; single-clock and multiclock complex instructions; operands from registers and also from memories; small code size; most silicon area used for storing program and data.
RISC: emphasis on software; single-clock, reduced instructions only; operands only from registers, with LOAD and STORE used to link the register file and the data memories; large code size; most silicon area used for storing program and data.
CISC: emphasis on hardware; includes multiclock complex instructions; arithmetic computing based on memory-to-memory variables and register-to-register variables; small code size; silicon might be used for storing complex instructions (microcode).

Design instruction set

Instruction set design flow

Source code profiling: coverage and 10-90% locality
Design of general RISC instructions
Design of CISC accelerated instructions
Design of miscellaneous instructions
Instruction coding and release of the ASM manual
Instruction set simulator and assembler
Benchmarking performance and coverage
Satisfied? If no, revise the design; if yes, release the instruction set architecture.

Application profiling
Design of assembly instruction set architecture
Instruction set benchmarking
Release an instruction set when the benchmarking result meets the requirements

We need to identify problems


How is an instruction set designed, and why is it designed in that way?
In which circumstances should a function be implemented using an instruction instead of a subroutine?
Why are ASIP DSP instructions not really RISC?
Why is my benchmarking not satisfactory?

What is the starting point


Let us start at the point of implementing C functions in an assembly instruction set.
A typical architecture with two DMs in parallel.
Instructions include move-load-store, ALU/MAC, and program flow control.

Classify the Instruction set


| Instruction group/type | Operands | Operations | Mathematical description | Flags | CC |
|---|---|---|---|---|---|
| Load, store, and move | Register name and memory addressing | Data transfer and addressing modes | DST (ADR) <= SRC (ADR) | No flag | 1 |
| ALU instructions | REG names or immediate data | Arithmetic and logic operations | DST (ADR) <= OA op OB | ALU flag | |
| Flow control | Way to get target address | Jump taken decision | If condition true, PC <= target address | No flags | 1 / 3 |

Move-load-store instructions
The RISC processor architecture is simple.
Data and parameters of a subroutine are loaded into the register file first.
Operands come from the register file or from immediate data carried by an instruction.
Results in the register file need to be moved back to the data memory.

Move-load-store instructions
| Mnem | Operand | Description | Operation | CC |
|---|---|---|---|---|
| Load | Rd, DA | Load data from memory 0/1 | Rd <= DM(DA) | 1 |
| Store | DA, Rs | Store data to memory 0/1 | DM(DA) <= Rs | 1 |
| move | Rd, Rs | Move between two registers | Rd <= Rs | 1 |
| move | Rd, K | Move immediate data to a register | Rd <= immediate | 1 |

Addressing for data memory access


Memory addressing is the addressing algorithm carried by an assembly instruction.
It specifies how to calculate the unique location of data in a data memory for a read or a write.
The addressing algorithm is implicit in C and explicit in ASM.

Addressing for data memory access


| Name | DA | DA code cost (b) | Memory | Algorithm | CC |
|---|---|---|---|---|---|
| Direct | D | 16 | DM0/1 | 16-bit constant as the direct memory address | 1 |
| Register indirect | R | 5 | DM0/1 | A register containing the memory address | 1 |
| Register incremental | R++ | 5 | DM0/1 | R gives address, and R=R+1 after addressing | 1 |
| Register decrement | --R | 5 | DM0/1 | R=R-1 before addressing, R gives address | 1 |
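The four addressing modes can be modelled in a few lines. This is a minimal sketch: the function name, the mode strings, and the 16-bit wrap-around of the address register are assumptions made for the illustration.

```python
def effective_address(mode, regs, r=None, da=None):
    """Return the data-memory address for one access; regs is updated in place."""
    if mode == "direct":            # DA: 16-bit constant carried by the instruction
        return da & 0xFFFF
    if mode == "indirect":          # R: the register contains the memory address
        return regs[r]
    if mode == "post_increment":    # R++: R gives the address, then R = R + 1
        addr = regs[r]
        regs[r] = (addr + 1) & 0xFFFF
        return addr
    if mode == "pre_decrement":     # --R: R = R - 1 first, then R gives the address
        regs[r] = (regs[r] - 1) & 0xFFFF
        return regs[r]
    raise ValueError(f"unknown addressing mode: {mode}")


address_regs = [10]
first = effective_address("post_increment", address_regs, r=0)   # uses 10, leaves 11
second = effective_address("pre_decrement", address_regs, r=0)   # back to 10, uses 10
```

The post-increment and pre-decrement pair is symmetric, which is what lets the same register walk a buffer forward and then unwind it.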


Arithmetic logic instructions


Basic arithmetic operations in C are +, -, *, /, and %.
The modulo operation % is not used very often in DSP arithmetic computing, so it is implemented as a subroutine.
The division operation / is not easy to implement in hardware.

Basic Arithmetic Instructions


| Mnem | Operand | Description | Operation | Flags | CC |
|---|---|---|---|---|---|
| ADD | Rd, Rr | Add | Rd <= Ra + Rb | Z,N,V | 1 |
| SUB | Rd, Rr | Subtract | Rd <= Ra - Rb | Z,N,V | 1 |
| ABS | Rd, Rr | Absolute operation | Rd <= ABS(Ra) | Z,N,V | 1 |
| INC | Rd | Increment | Rd <= Ra + 1 | Z,N,V | 1 |
| DEC | Rd | Decrement | Rd <= Ra - 1 | Z,N,V | 1 |
| MPL | A, Rd, Rr | Multiplication | A <= Ra * Rb | Z,N,V | 1 |
| MAC | A, Rd, Rr | Multiplication and accumulation | A <= A + Ra * Rb | Z,N,V | 2 |
| RND | Rd, A | Round, saturate, and truncate | Rd <= Saturate(Round(A)) | Z,N,V | 1 |
| CAC | A | Clear an accumulator | A <= 0 | Z,N,V | 1 |
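As an illustration of how the Z, N, and V flags could be produced, the sketch below models ADD on 16-bit two's-complement words. The 16-bit width and the function name `add16` are assumptions; the slides do not define the exact flag logic.

```python
def add16(a, b):
    """Add two 16-bit two's-complement words; return (result, flags)."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a + b) & 0xFFFF

    def sign(word):
        return (word >> 15) & 1

    flags = {
        "Z": int(result == 0),     # zero
        "N": sign(result),         # negative (sign bit set)
        # overflow: both operands share a sign that the result does not have
        "V": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags


overflowed, overflow_flags = add16(0x7FFF, 0x0001)   # largest positive value plus one
wrapped, wrapped_flags = add16(0xFFFF, 0x0001)       # -1 plus 1
```

The first call overflows into the negative range, so N and V are set; the second call is a legitimate signed sum of zero, so only Z is set.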


Logic and Shift Operations


Logic and shift operations in C
&(and), |(or), ~(not), ^(xor), << (left shift), and >> (right shift).

Here "and" operates on each bit of operands A and B; that is, C[0]=A[0] & B[0], C[1]=A[1] & B[1], ..., C[15]=A[15] & B[15].

Logic and Shift Operations


| Mnem | Operand | Description | Operation | Flags | CC |
|---|---|---|---|---|---|
| AND | Ra, Rb | A logic-and B | Rd <= Ra and Rb | C, Z | 1 |
| OR | Ra, Rb | A logic-or B | Rd <= Ra or Rb | C, Z | 1 |
| NOT | Ra, Rb | Invert A | Rd <= INV (Ra) | C, Z | 1 |
| XOR | Ra, Rb | A logic-xor B | Rd <= Ra xor Rb | C, Z | 1 |
| LS | Ra, Rb | Logic left shift | Rd <= Ra left shifted by Rb [3:0] | C, Z | 1 |
| RS | Ra, Rb | Logic right shift | Rd <= Ra right shifted by Rb [3:0] | C, Z | 1 |
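The shift instructions use only the low four bits of Rb as the shift distance. A minimal sketch, assuming 16-bit registers; the function name is hypothetical, and the C flag is left out because the slides do not define how it is derived.

```python
def logic_shift(ra, rb, left=True):
    """Model LS/RS: shift the 16-bit word ra by Rb[3:0]; return (result, Z)."""
    amount = rb & 0xF                  # only Rb[3:0] selects the shift distance
    ra &= 0xFFFF
    if left:
        result = (ra << amount) & 0xFFFF
    else:
        result = ra >> amount          # logic shift: zeros enter from the left
    return result, int(result == 0)


shifted_left, z_left = logic_shift(0x0001, 0x0013)                 # Rb[3:0] = 3
shifted_right, z_right = logic_shift(0x8000, 0x000F, left=False)   # shift by 15
```

Bits of Rb above bit 3 are ignored, so 0x0013 shifts by 3, not by 19.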


Logic Operators in C
| Condition symbol | Condition |
|---|---|
| < | Less than |
| <= | Less than or equal to |
| == | Equal to |
| >= | Greater than or equal to |
| > | Greater than |
| != | Not equal to |
| && | Boolean AND |
| \|\| | Boolean OR |
| ! | Boolean NOT |

Program flow control in C


Conditional and unconditional controls in C:
Unconditional: GOTO operations.
Conditional: condition test and jump are integrated in C, for example, if A then B else C.

In an assembly language:
Condition test and conditional jump are separated.
The first instruction performs the test and computes the flags.
The second instruction is the conditional jump.

Program flow control instructions


| Mnem | Description | Condition | Flags met | CC |
|---|---|---|---|---|
| JLT | Jump when Less than | < | N=1 | 3/1 |
| JLE | Jump when Less than or Equal to | <= | N=1 or Z=1 | 3/1 |
| JEQ | Jump when Equal to | == | Z=1 | 3/1 |
| JGE | Jump when Greater than or Equal to | >= | N=0 | 3/1 |
| JGT | Jump when Greater than | > | N=0 and Z=0 | 3/1 |
| JNE | Jump when Not Equal to | != | Z=0 | 3/1 |
| JUMP | Unconditional jump | | | 3 |
| CALL | Jump, push return address into stack | | | 3 |
| Return | Return to the stacked address | | | 3 |
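The flag conditions and the 3/1 cycle cost can be checked with a short model. This is a sketch under assumptions: the compare step computes the flags of a - b on unbounded Python integers (so overflow is ignored), and the helper names are hypothetical.

```python
# Flag condition for each conditional jump, as listed in the table.
JUMP_CONDITIONS = {
    "JLT": lambda n, z: n == 1,
    "JLE": lambda n, z: n == 1 or z == 1,
    "JEQ": lambda n, z: z == 1,
    "JGE": lambda n, z: n == 0,
    "JGT": lambda n, z: n == 0 and z == 0,
    "JNE": lambda n, z: z == 0,
}


def compare_flags(a, b):
    """First instruction: compute the flags (N, Z) of a - b."""
    diff = a - b
    return int(diff < 0), int(diff == 0)


def conditional_jump(mnemonic, a, b, pc, target):
    """Second instruction: return (next PC, cycle cost) for the jump."""
    n, z = compare_flags(a, b)
    if JUMP_CONDITIONS[mnemonic](n, z):
        return target, 3      # jump taken costs 3 cycles
    return pc + 1, 1          # jump not taken costs 1 cycle


taken = conditional_jump("JLT", 2, 5, pc=10, target=40)
not_taken = conditional_jump("JGE", 2, 5, pc=10, target=40)
```

Splitting the test from the jump mirrors the two-instruction sequence used in assembly.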

Target addressing for jumping


| TA | Algorithm |
|---|---|
| Absolute | 16-bit constant |
| Relative | In a general register |

Assembly Instruction Set Summary


Benchmarking the instruction set


What is benchmark
DSP benchmarking measures the cycle cost and code size of a DSP algorithm running on single-precision data.
Convention of DSP benchmarking: rounding is required before moving long data from an accumulator register to a general register.

What to benchmark
Standard benchmark algorithms available from BDTI (Berkeley Design Technology, Inc)


How to benchmark
BDTI benchmarking convention:
It measures the execution time (cycle cost), the code size (program memory cost), and the cost of data memories.

The cycle cost = prologue + kernel + epilogue

Prologue: preparing for running a program.
Kernel: the part that carries out the algorithm.
Epilogue: terminating the program.

Assumption in this discussion


Data frame size: 40 samples.
The number of FIR taps = 16.
Cycle cost: 1 cycle per normal instruction and 3 cycles for a jump taken.
MAC takes one cycle if the following instruction does not use the data in an accumulator register.
TSMD: a typical single MAC DSP processor available as a COTS (commercial off-the-shelf) part.

Example: Block Transfer


C code: DM1 (SEG: 0 to 39) -> DM1 (SEG: 0 to 39)
Assembly code

Example: Block Transfer


| Processor | Algorithm | Total cycle cost | Pro-epilogue cycle cost | Kernel cycle cost | Total code cost | Code for pro-epilogue | DM cost |
|---|---|---|---|---|---|---|---|
| Basic (ours) | BT | 242 | 4 | 238 | 8 | 4 | 84 |
| TSMD | BT | 47 | 4 | 43 | 7 | 4 | 84 |

Opportunities for improvement are:
The loop: the extra cost of each jump taken and the DEC of the loop counter consumes four clock cycles. A HW loop may eliminate the cost.
Load and store can be merged into a memory-to-memory move instruction.
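The cycle counts for the basic processor can be reproduced with a small cost model. The loop body used here (load, store, DEC, conditional jump back) is an assumption about the assembly code, chosen because it is consistent with the stated costs of 1 cycle per normal instruction and 3 cycles per jump taken; the function and parameter names are hypothetical.

```python
NORMAL_CYCLES = 1        # cost of a normal instruction
TAKEN_JUMP_CYCLES = 3    # cost of a jump that is taken


def block_transfer_cycles(samples=40, pro_epilogue=4, hw_loop=False):
    """Return (total, kernel) cycle cost of copying `samples` words."""
    kernel = 0
    for i in range(samples):
        kernel += NORMAL_CYCLES            # load one word
        kernel += NORMAL_CYCLES            # store one word
        if hw_loop:
            continue                       # a HW loop removes DEC and the jump
        kernel += NORMAL_CYCLES            # DEC the loop counter
        last_iteration = (i == samples - 1)
        # The final jump falls through, so it costs only one cycle.
        kernel += NORMAL_CYCLES if last_iteration else TAKEN_JUMP_CYCLES
    return pro_epilogue + kernel, kernel


software_loop = block_transfer_cycles()
hardware_loop = block_transfer_cycles(hw_loop=True)
```

Under these assumptions the software loop gives a kernel cost of 238 and a total of 242 cycles, matching the table, while removing the loop overhead leaves 80 kernel cycles.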


Example: Single sample FIR


Modulo addressing
FIFO emulated in a data memory.
Can be hardware-accelerated memory addressing (for accelerated instructions).
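A FIFO emulated in data memory can be sketched with modulo address arithmetic. The class name, the direction in which the write pointer moves, and the use of a Python list as the data-memory segment are all assumptions made for this illustration.

```python
class CircularFifo:
    """FIFO emulated in a block of data memory with modulo addressing."""

    def __init__(self, size):
        self.mem = [0] * size    # data-memory segment holding the FIFO
        self.newest = 0          # address of x(n)

    def push(self, sample):
        # Step back one address with wrap-around and overwrite the oldest sample.
        self.newest = (self.newest - 1) % len(self.mem)
        self.mem[self.newest] = sample

    def tap(self, k):
        """Return x(n-k) using a modulo address calculation."""
        return self.mem[(self.newest + k) % len(self.mem)]


def fir_sample(fifo, coeffs, sample):
    """Single-sample FIR: take one new input and produce one output."""
    fifo.push(sample)
    return sum(c * fifo.tap(k) for k, c in enumerate(coeffs))


coefficients = [1, 2, 3]
delay_line = CircularFifo(len(coefficients))
impulse_response = [fir_sample(delay_line, coefficients, x) for x in [1, 0, 0, 0]]
```

No sample is ever moved in memory: only the pointer changes, so the value stored as x(n) is read as x(n-1) on the next call. Feeding a unit impulse returns the coefficients in order, followed by zero.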


Example: consider 7-tap FIR Filter


FIFO Emulated in Data Memory


[Figure: FIFO emulated in data memory for a 7-tap FIR, shown at cycle 1 (a) and at cycle 2 (b). Seven positions, 0 to 6, lie between the top position and the bottom position. New data x(n) is written at the start position and x(n-6) sits at the stop position. At the same position, new data x(n) becomes x(n-1) in the next cycle.]

Example: Single sample FIR


Assembly code


Example: Single sample FIR: FIFO behavior

[Figure: the procedure of a FIFO getting a new data sample. A five-entry FIFO buffer holding x(n) to x(n-4) lies in the data memory space between the MIN address and the MAX address, bounded by BAR and TAR and accessed through DAR. Four panels (steps 0 to 3) show the buffer before getting new data and after getting new data 1, 2, and 3.]

Example: N sample FIR (Single Sample in loop)


Example: Single sample FIR


| Processor | Algorithm | Total cycle cost | Kernel cycle cost | Total code cost |
|---|---|---|---|---|
| Basic | 16-tap FIR | 192 | 173 | 26 |
| TSMD | 16-tap FIR | 31 | 16 | 15 |

The extra cycle cost is 192 - 31 = 161 cycles. The cycle cost is 6.2 times the benchmark of a TSMD.

Opportunities for improvement are:
The cost of the SW-emulated circular buffer and modulo addressing is high. A HW circular buffer and modulo addressing are essential.
Data and coefficient loading, MAC, and the loop control can be merged into one instruction, convolution, which is one of the most frequently used instructions in DSP:

CONV N DM0(AP0++M) DM1(AP1++)
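The effect of such an instruction can be modelled as one function call. This sketch reads `AP0++M` as a post-increment with modulo wrap-around inside a circular buffer and `AP1++` as a plain post-increment; that reading, the `bottom` and `top` bounds, and the function name are assumptions, not a definition taken from the slides.

```python
def conv(n, dm0, ap0, bottom, top, dm1, ap1):
    """Model CONV N DM0(AP0++M) DM1(AP1++) as n multiply-accumulate steps.

    Returns (accumulator, final ap0, final ap1).
    """
    acc = 0
    for _ in range(n):
        acc += dm0[ap0] * dm1[ap1]                 # MAC on both fetched operands
        ap0 = bottom if ap0 == top else ap0 + 1    # ++M: wrap from top to bottom
        ap1 += 1                                   # ++: linear post-increment
    return acc, ap0, ap1


samples = [1, 2, 3, 4]               # circular buffer occupying addresses 0..3 of DM0
coefficients = [10, 20, 30, 40]      # coefficient table in DM1
result = conv(4, samples, 2, 0, 3, coefficients, 0)
```

In hardware the two fetches, the MAC, both pointer updates, and the loop count would proceed in parallel, one tap per cycle; the Python loop only shows what is computed. After a full pass over the buffer, AP0 returns to where it started.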



Example:
FIR filtering
Autocorrelation: used for finding regularities or periodical features of a signal.
Cross-correlation: used for measuring the similarity of a signal with a known signal pattern.

What is the difference?
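One way to see the difference is that both operations run the same multiply-accumulate loop and differ only in the second operand. A minimal sketch, with hypothetical function names and an unnormalized sum over the overlapping samples:

```python
def cross_correlation(x, y, lag):
    """Sum of x[n] * y[n + lag] over the samples where both signals exist."""
    return sum(
        x[n] * y[n + lag]
        for n in range(len(x))
        if 0 <= n + lag < len(y)
    )


def autocorrelation(x, lag):
    """Autocorrelation is cross-correlation of a signal with itself."""
    return cross_correlation(x, x, lag)


periodic = [1, 2, 1, 2]
lag_one = autocorrelation(periodic, 1)
lag_two = autocorrelation(periodic, 2)
pattern_match = cross_correlation([1, 2, 3], [0, 1, 2, 3], 1)
```

The kernel is a MAC loop in both cases, which is why one accelerated iterative instruction can serve FIR filtering as well as both correlations.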

Analyses on identified problems


Lessons Learned
C does not express parallel features.
Convolution is one of the most used DSP operations. It reaches very high efficiency when memory addressing, arithmetic computing, result store, and program flow control are carried out in parallel in one instruction.
This is possible because the parallel hardware can be organized in a pipeline.
Other frequently used iterative DSP operations can also be specified as one instruction.
Research work: why? Identify the requirement and benchmark it.

Conclusion
An assembly language instruction set must be more efficient.
Accelerations are implemented at the arithmetic and algorithmic levels.
Addressing and memory accesses should be executed in parallel with arithmetic computing.
Program flow control, such as loops or conditional execution, shall also be accelerated.

ASIP microarchitecture design flow


Inputs: proposed assembly language manual and proposed pipeline steps.

Further expose all micro operations of each assembly instruction
Partition micro operations into DP, CP, and AP
Schedule micro operations into each pipeline step
Design for HW multiplexing in DP and AP
Specify microarchitecture and micro operations for CP
Release microarchitecture documents

The End :: Thank you for your attention

Questions?
