
Design and Implementation of Signal Processing Systems: An Introduction

Outline

• Course Objectives and Outline, Conduct
• What is signal processing?
• Implementation Options and Design issues:
– General purpose (micro) processor (GPP)
o Multimedia enhanced extension (Native signal processing)
– Programmable digital signal processors (PDSP)
o Multimedia signal processors (MSP)
– Application specific integrated circuit (ASIC)
– Re-configurable signal processors

Course Objectives

• Provide students with a global view of embedded micro-architecture implementation options and design methodologies for multimedia signal processing.
• The course focuses on the interaction between the algorithm formulation and the underlying architecture that implements the algorithm:
– Formulate the algorithm to match the architecture.
– Design a novel architecture to match the algorithm.
Course Outline

• Signal processing computing algorithms
• Algorithm representations
• Algorithm transformations:
– Retiming, unfolding
– Folding
• Systolic array and design methodologies
• Mapping algorithms to array structures
• Low power design
• Native signal processing and multimedia extension
• Programmable DSPs
• Very Long Instruction Word (VLIW) Architecture
• Re-configurable computing & FPGA
• Signal processing arithmetic: CORDIC and distributed arithmetic
• Applications: Video, audio, communication
Course Conduct

• Instructor will give an introduction to each topic.
• PowerPoint notes will be published on the web.
• Depending on the size of the class, the lectures may be followed by an in-class discussion or even some presentations by individual students.
• Final project presentations take place in the last week of the semester.
Homework, Projects

• 3-5 homework assignments are currently planned. Part of the homework may involve programming, or hands-on processing of signals.
• One take-home final exam is due on the scheduled final date.
• Groups of one or (up to) two persons are to be formed to conduct class projects. A two-person project must justify the amount of work and specify each person’s contribution in the final report.
• A report and a presentation are both required. Electronic copies are encouraged but not required.
What is a Signal?

• A SIGNAL is a measurement of a physical quantity of a certain medium.
• Examples of signals:
– Visual patterns (written documents, picture, video, gesture, facial expression)
– Audio patterns (voice, speech, music)
– Change patterns of other physical quantities: temperature, EM wave, etc.
• A signal contains INFORMATION!
Medium and Modality

• Medium:
– Physical materials that carry the signal.
– Examples: paper (visual patterns, handwriting, etc.), air (sound pressure, music, voice), various video displays (CRT, LCD)
• Modality:
– Different modes of signals over the same or different media.
– Examples: voice, facial expression and gesture.
What is Signal Processing?

• Ways to manipulate a signal in its original medium or in an abstract representation.
• A signal can be abstracted as a function of time or of spatial coordinates.
• Types of processing:
– Transformation
– Filtering
– Detection
– Estimation
– Recognition and classification
– Coding (compression)
– Synthesis and reproduction
– Recording, archiving
– Analyzing, modeling
Signal Processing Applications

• Communications:
– Modulation/Demodulation (modem)
– Channel estimation, equalization
– Channel coding
– Source coding: compression
• Imaging:
– Digital camera
– Scanner
– HDTV, DVD
• Audio
– 3D sound
– Surround sound
• Speech
– Coding
– Recognition
– Synthesis
– Translation
• Virtual reality, animation
• Control
– Hard drive
– Motor
Digital Signal Processing

• Signals generated by physical phenomena are analog in that
– Their amplitudes are defined over the range of real/complex numbers
– Their domains are continuous in time or space.
• Processing analog signals requires dedicated, special hardware.
• Digital signal processing concerns processing signals using digital computers.
– A continuous time/space signal must be sampled to yield countable signal samples.
– The real- (or complex-) valued samples must be quantized to fit into the internal word length.
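The two steps named above, sampling and quantization, can be sketched in a few lines of Python. This is only an illustration under assumed values: the 1 kHz tone, the 8 kHz sampling rate, the 8-bit word length and the helper names `sample` and `quantize` are not from the course material.

```python
import math

def sample(signal, sample_rate_hz, num_samples):
    """Evaluate a continuous-time function at t = n / sample_rate_hz."""
    return [signal(n / sample_rate_hz) for n in range(num_samples)]

def quantize(value, word_length_bits, full_scale=1.0):
    """Round a real value to a signed integer code of the given word length."""
    levels = 1 << (word_length_bits - 1)          # e.g. 128 for 8 bits
    code = round(value / full_scale * levels)
    return max(-levels, min(levels - 1, code))    # clip to the representable range

# Assumed input: a 1 kHz sine sampled at 8 kHz and stored in 8-bit words.
tone = lambda t: math.sin(2 * math.pi * 1000.0 * t)
samples = sample(tone, 8000.0, 8)
codes = [quantize(s, 8) for s in samples]
print(codes)
```

Note that the positive peak is clipped to the largest 8-bit code, which is one visible effect of fitting real values into a finite word length.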
Signal Processing Systems

[Block diagram: A/D -> Digital Signal Processing -> D/A]

The task of digital signal processing (DSP) is to process sampled signals coming from the A/D (analog-to-digital) converter and to provide its output to the D/A (digital-to-analog) converter, which transforms it back into a physical signal.
Implementation of DSP Systems

• Platforms:
– Native signal processing (NSP) with general purpose processors (GPP)
o Multimedia extension (MMX) instructions
– Programmable digital signal processors (PDSP)
o Media processors
– Application-Specific Integrated Circuits (ASIC)
– Re-configurable computing with field-programmable gate array (FPGA)
• Requirements:
– Real time
o Processing must be done before a pre-specified deadline.
– Streamed numerical data
o Sequential processing
o Fast arithmetic processing
– High throughput
o Fast data input/output
o Fast manipulation of data
How Fast is Enough for DSP?

• It depends!
• Real time requirements:
– Example: data capture speed must match the sampling rate. Otherwise, data will be lost.
– Example: in verbal conversation, the delay of a response cannot exceed 50 ms end-to-end.
– Processing must be done by a specific deadline.
– A constraint on throughput.
• Different throughput rates for processing different signals
– Throughput scales with the sampling rate.
– CD music: 44.1 kHz
– Speech: 8-22 kHz
– Video (depends on frame rate, frame size, etc.): ranges from hundreds of kHz to MHz.
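To make the deadline concrete, the sampling rate can be turned into a time budget per sample. The sketch below is a back-of-the-envelope calculation; the figure of 100 operations per sample is an assumed workload, and the function names are illustrative.

```python
def per_sample_budget_us(sample_rate_hz):
    """Time available to process one sample, in microseconds."""
    return 1e6 / sample_rate_hz

def ops_per_second(sample_rate_hz, ops_per_sample):
    """Sustained operation rate needed to keep up with the input stream."""
    return sample_rate_hz * ops_per_sample

rates = {"speech (8 kHz)": 8_000, "CD music (44.1 kHz)": 44_100}
for name, fs in rates.items():
    # 100 operations per sample is an assumed workload, not a measured one.
    print(f"{name}: {per_sample_budget_us(fs):.1f} us per sample, "
          f"{ops_per_second(fs, 100):,} ops/s at 100 ops/sample")
```

If processing one sample takes longer than this budget on average, input samples arrive faster than they are consumed and data is eventually lost.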
Early Signal Processing Systems

• Implemented with either mainframe computers or special purpose computers.
• Batch processing rather than real time, streamed data processing.
• Accelerating processing speed was the main concern.
• Key approach:
– Faster hardware
– Faster algorithms
• Faster algorithms
– Reduce the number of arithmetic operations
– Reduce the number of bits used to represent each data item
– Most important example: Fast Fourier Transform
Computing Fourier Transform

• Discrete Fourier Transform

$X(k) = \sum_{n=0}^{N-1} x(n) \exp\left[-j\frac{2\pi nk}{N}\right]$

$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \exp\left[j\frac{2\pi nk}{N}\right]$

• Computing the N frequencies {X(k); 0 ≤ k ≤ N−1} requires N^2 complex multiplications
• Fast Fourier Transform
– Reduces the computation to O(N log2 N) complex multiplications
– Makes it practical to process large amounts of digital data.
– Many computations can be sped up using the FFT
– Dawn of modern digital signal processing
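The difference between the two operation counts can be seen by writing both computations out. The following is a minimal sketch, assuming the input length is a power of two and using a recursive radix-2 decimation-in-time FFT; it is one of several FFT variants and is written for readability, not speed.

```python
import cmath

def dft(x):
    """Direct DFT: N output bins with N multiplications each, N^2 in total."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])          # DFT of the even-indexed samples
    odd = fft(x[1::2])           # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor times odd bin
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]   # arbitrary test data
max_diff = max(abs(a - b) for a, b in zip(dft(x), fft(x)))
print(max_diff < 1e-9)
```

Each recursion level performs N/2 twiddle multiplications and there are log2 N levels, which is where the O(N log2 N) count comes from.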
Evolution of Micro-Processor

• Micro-processors implemented a central processing unit on a single chip.
• Performance improved from 1 MFLOPS (1983) to 1 GFLOPS or above.
• Word length (number of bits for registers, data bus, address space, etc.) increased from 4 bits to 64 bits today.
• Clock frequency increased from 100 kHz to 1 GHz.
• Number of transistors increased from 1K to 50M.
• Power consumption increased much more slowly thanks to lower supply voltages: 5 V dropped to 1.5 V.
Native Signal Processing

• Use a GPP to perform signal processing tasks with no additional hardware.
– Example: soft-modem, soft DVD player, soft MPEG player.
• Reduces hardware cost!
• May not be feasible for extremely high throughput tasks.
• Interferes with other tasks, as the GPP is tied up with NSP tasks.
• MMX (multimedia extension instructions): special instructions for accelerating multimedia tasks.
• May share the same data-path with other instructions, or work on special hardware modules.
• Make use of sub-word parallelism to improve numerical calculation speed.
• Implement DSP-specific arithmetic operations, e.g. saturation arithmetic ops.
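Sub-word parallelism and saturation arithmetic can be emulated in software to show what such an instruction computes. The sketch below assumes four signed 16-bit lanes packed into one 64-bit word, in the style of an MMX packed saturating add; the helper names are illustrative, and the Python loop processes lanes one after another where the hardware would process them at once.

```python
LANES = 4                      # four 16-bit sub-words in one 64-bit word
BITS = 16
MASK = (1 << BITS) - 1
INT_MIN, INT_MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1

def pack(values):
    """Pack signed 16-bit integers into one word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & MASK) << (i * BITS)
    return word

def unpack(word):
    """Recover the signed 16-bit lanes from a packed word."""
    lanes = []
    for i in range(LANES):
        raw = (word >> (i * BITS)) & MASK
        lanes.append(raw - (1 << BITS) if raw > INT_MAX else raw)
    return lanes

def saturating_add_packed(a, b):
    """Lane-wise signed add that clamps at the 16-bit limits instead of wrapping."""
    sums = [max(INT_MIN, min(INT_MAX, x + y)) for x, y in zip(unpack(a), unpack(b))]
    return pack(sums)

a = pack([30000, -30000, 100, -1])
b = pack([10000, -10000, 200, 1])
print(unpack(saturating_add_packed(a, b)))
```

With ordinary wrap-around arithmetic the first lane would overflow to a large negative value; saturation keeps it at the maximum, which is usually the less harmful error for audio and image samples.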
ASIC: Application Specific ICs

• Custom or semi-custom IC chips or chip sets developed for specific functions.
• Suitable for high volume, low cost production.
• Example: MPEG codec, 3D graphic chip, etc.
• ASICs became popular due to the availability of IC foundry services. Fabless design houses turn innovative designs into profitable chip sets using CAD tools.
• Design automation is a key enabling technology that facilitates a fast design cycle and a shorter time to market.
Programmable Digital Signal Processors (PDSPs)

• Micro-processors designed for signal processing applications.
• Special hardware support for:
– Multiply-and-Accumulate (MAC) ops
– Saturation arithmetic ops
– Zero-overhead loop ops
– Dedicated data I/O ports
– Complex address calculation and memory access
– Real time clock and other embedded processing support.
• PDSPs were developed to fill a market segment between GPP and ASIC:
– GPP: flexible, but slow
– ASIC: fast, but inflexible
• As VLSI technology improves, the role of PDSPs has changed over time.
– Cost: design, sales, maintenance/upgrade
– Performance
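Several of the features listed above show up together in the inner loop of an FIR filter. The following sketch is plain Python, not PDSP code: the function name `fir_filter` and the test values are assumptions, the modulo index stands in for hardware circular addressing, and the `for` loop stands in for a zero-overhead loop.

```python
def fir_filter(coeffs, samples):
    """FIR filter whose inner loop is one multiply-accumulate (MAC) per tap."""
    taps = len(coeffs)
    delay = [0.0] * taps            # circular delay line holding recent inputs
    head = 0                        # index of the newest sample
    out = []
    for x in samples:
        head = (head - 1) % taps    # modulo (circular) address update
        delay[head] = x
        acc = 0.0
        for k in range(taps):       # a PDSP would run this as a zero-overhead loop
            acc += coeffs[k] * delay[(head + k) % taps]   # MAC
        out.append(acc)
    return out

print(fir_filter([0.5, 0.5], [1.0, 3.0, 5.0]))
```

On a general purpose processor the multiply, the add, the address update and the loop test are separate instructions; dedicated support for them is what lets a PDSP approach one tap per clock cycle.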
Multimedia Signal Processors

• Specialized PDSPs designed for multimedia applications
• Features:
– Multi-processing system with a GPP core plus multiple function modules
– VLIW-like instructions to promote instruction level parallelism (ILP)
– Dedicated I/O and memory management units.
• Main applications:
– Video signal processing, MPEG, H.324, H.263, etc.
– 3D surround sound
– Graphic engine for 3D rendering
Re-configurable Computing using FPGA

• FPGA (field-programmable gate array) is a derivative of PLD (programmable logic devices).
• The hardware is configurable and behaves differently under different configurations.
• Slower than ASIC, but faster than PDSP.
• Once configured, it behaves like an ASIC module.
• Use of FPGA
– Rapid prototyping: runs at a fraction of ASIC speed, without fab delay.
– Hardware accelerator: uses the same hardware to realize different function modules, to save hardware
– Low quantity system deployment
Characteristics and Impact of VLSI

• The term VLSI (Very Large Scale Integration) was coined in the late 1970s.
• Usage of VLSI:
– Micro-processor
o General purpose
o Programmable DSP
o Embedded microcontroller
– Application-specific ICs
– Field-Programmable Gate Array (FPGA)
• Impacts:
– Design methodology
– Performance
– Power
• Characteristics
– High density:
o Reduced feature size: 0.25 µm -> 0.16 µm
o % of wire/routing area increases
– Low power/high speed:
o Decreased operating voltage: 1.8 V -> 1 V
o Increased clock frequency: 500 MHz -> 1 GHz
– High complexity:
o Increased transistor count: 10M transistors and higher
o Shortened time-to-market delay: 6-12 months
Design Issues

• Given a DSP application, which implementation option should be chosen?
• For a particular implementation option, how can an optimal design be achieved? Optimal in terms of what criteria?
• Software design:
– NSP/MMX, PDSP/MSP
– Algorithms are implemented as programs.
– Often still requires manual programming at the assembly level
• Hardware design:
– ASIC, FPGA
– Algorithms are directly implemented in hardware modules.
• Software/hardware co-design: system level design methodology.
Design Process Model

• Design is the process that links algorithm to implementation
• Algorithm
– Operations
– Dependency between operations determines a partial ordering of execution
– Can be specified as a dependence graph
• Implementation
– Assignment: Each operation can be realized with
o One or more instructions (software)
o One or more function modules (hardware)
– Scheduling: Dependence relations and resource constraints lead to a schedule.
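A dependence graph and the partial ordering it induces can be written down directly. The sketch below assumes a small three-term sum of products as the algorithm; the operation names (`mul1`, `add1`, ...) are illustrative. It computes an as-soon-as-possible time step for every operation while ignoring resource limits, so it shows only the ordering imposed by the dependences.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed example: y = a1*x1 + a2*x2 + a3*x3.
# Each key is an operation; its value is the set of operations it depends on.
deps = {
    "mul1": set(), "mul2": set(), "mul3": set(),
    "add1": {"mul1"},            # y(1) = y(0) + a1*x1
    "add2": {"add1", "mul2"},    # y(2) = y(1) + a2*x2
    "add3": {"add2", "mul3"},    # y(3) = y(2) + a3*x3
}

def asap_schedule(deps):
    """Earliest time step of each operation, ignoring resource limits."""
    step = {}
    for op in TopologicalSorter(deps).static_order():   # predecessors come first
        step[op] = 1 + max((step[d] for d in deps[op]), default=0)
    return step

schedule = asap_schedule(deps)
for op in sorted(schedule, key=schedule.get):
    print(op, "-> step", schedule[op])
```

The three multiplications are unordered with respect to each other and may share a time step, while the additions form a chain that must run one after another.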
A Design Example …

Consider the algorithm:

$y = \sum_{k=1}^{n} a(k)\, x(k)$

• Operations:
– Multiplication
– Addition
• Dependency
– y(k) depends on y(k-1)
– Dependence Graph: each pair a(k), x(k) feeds a multiplier (*), and the products feed a chain of adders (+) running from y(0) to y(n).

Program:
y(0) = 0
For k = 1 to n Do
    y(k) = y(k-1) + a(k)*x(k)
End
y = y(n)
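The program above translates almost line for line into runnable code. This is a minimal Python rendering, assuming the coefficients and inputs arrive as two equal-length lists; the function name `inner_product` is illustrative.

```python
def inner_product(a, x):
    """Running sum y(k) = y(k-1) + a(k)*x(k), returning the final y(n)."""
    y = 0.0                      # y(0) = 0
    for a_k, x_k in zip(a, x):   # k = 1 .. n
        y = y + a_k * x_k        # one multiplication and one addition per step
    return y

print(inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 4 + 10 + 18 = 32.0
```

Only the latest partial sum is kept here, since each y(k) is needed just once, by the step that follows it.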
Design Example cont’d …

• Software Implementation:
– Map each * op. to a MUL instruction, and each + op. to an ADD instruction.
– Allocate memory space for {a(k)}, {x(k)}, and {y(k)}.
– Schedule the operations by sequentially executing y(1) = a(1)*x(1), y(2) = y(1) + a(2)*x(2), etc.
– Note that each instruction is still to be implemented in hardware.
• Hardware Implementation:
– Map each * op. to a multiplier, and each + op. to an adder.
– Interconnect them according to the dependence graph: each pair a(k), x(k) feeds its own multiplier (*), and the products feed a chain of adders (+) from y(0) to y(n).
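The software mapping can be illustrated with a toy instruction sequence and an interpreter that executes it one instruction at a time. Everything here is hypothetical: the `MUL`/`ADD` mnemonics, the register names `p` and `y`, and the memory layout are made up for the sketch and do not correspond to a real instruction set.

```python
def compile_inner_product(n):
    """Emit a sequential MUL/ADD program for y = sum of a(k)*x(k), k = 1..n."""
    program = []
    for k in range(1, n + 1):
        program.append(("MUL", "p", f"a{k}", f"x{k}"))   # p = a(k) * x(k)
        program.append(("ADD", "y", "y", "p"))           # y = y + p
    return program

def run(program, memory):
    """Execute the program one instruction per step on a single shared ALU."""
    regs = dict(memory, y=0.0)
    for op, dst, src1, src2 in program:
        if op == "MUL":
            regs[dst] = regs[src1] * regs[src2]
        elif op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
    return regs["y"], len(program)

memory = {"a1": 1.0, "x1": 4.0, "a2": 2.0, "x2": 5.0}
y, steps = run(compile_inner_product(2), memory)
print(y, steps)
```

The software version takes 2n sequential steps on one ALU. In the hardware mapping the n multipliers can all work at the same time, and only the adder chain remains sequential.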
Observations

• Eventually, an implementation is realized with hardware.
• However, by using the same hardware to realize different operations at different times (scheduling), we have a software program!
• Bottom line: hardware/software co-design. There is a continuum between hardware and software implementation.
• A design must explore both simultaneously to achieve the best performance/cost trade-off.
A Theme

• Matching hardware to algorithm
– Hardware architecture must match the characteristics of the algorithm.
– Example: an ASIC architecture is designed to implement a specific algorithm, and hence can achieve superior performance.
• Formulate algorithm to match hardware
– The algorithm must be formulated so that it can best exploit the potential of the architecture.
– Example: GPP and PDSP architectures are fixed. One must formulate the algorithm properly to achieve the best performance, e.g. to minimize the number of operations.
Algorithm Reformulation

• Matching algorithm to architectural features
– Similar to optimizing assembly code
– Exploiting equivalence between different operations
• Reformulation methods
– Equivalent ordering of execution:
o (a+b)+c = a+(b+c)
– Equivalent operation with a particular representation:
o a*2 is the same as left-shift a by 1 bit in binary representation
– Algorithmic level equivalence
o Different filter structures implementing the same specification!
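The first two equivalences can be checked directly. The sketch below uses integers, where re-association is exact; with floating-point data the two orderings can differ in rounding. The function names and test data are illustrative.

```python
def chain_sum(values):
    """((a+b)+c)+d: each addition waits for the previous one."""
    total = 0
    for v in values:
        total += v
    return total

def tree_sum(values):
    """(a+b)+(c+d): same integer result, but each level's pairs are independent."""
    values = list(values)
    while len(values) > 1:
        pairs = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            pairs.append(values[-1])   # odd element passes through to the next level
        values = pairs
    return values[0] if values else 0

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(chain_sum(data), tree_sum(data))
print(all(a * 2 == a << 1 for a in range(-8, 8)))   # multiply by 2 vs. shift left by 1
```

The chain needs n-1 sequential additions, while the tree needs only about log2 n levels if enough adders are available. The shift form replaces a multiplier with wiring.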
Algorithm Reformulation (2)

• Exploiting parallelism
– Regular iterative algorithms and loop reformulation
o Well studied in parallel compiler technology
– Signal flow/Data flow representation
o Suitable for specification of pipelined parallelism
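One simple loop reformulation is unfolding: the loop body is replicated so that independent operations sit side by side. The sketch below unfolds a sum-of-products loop by a factor of 2 with two accumulators. The function name is illustrative, and Python itself still runs the two statements one after the other.

```python
def mac_unfolded(a, x):
    """Inner product with the loop unfolded by 2 and two independent accumulators."""
    even = odd = 0
    n = len(a) - len(a) % 2
    for k in range(0, n, 2):
        even += a[k] * x[k]          # these two MACs do not depend on each other,
        odd += a[k + 1] * x[k + 1]   # so two MAC units could run them in the same cycle
    if len(a) % 2:
        even += a[-1] * x[-1]        # leftover iteration when the length is odd
    return even + odd

print(mac_unfolded([1, 2, 3], [4, 5, 6]))
```

The original loop carries a dependence from every iteration to the next. After unfolding, each accumulator carries a dependence only to every second product, which halves the length of the critical path.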
Mapping Algorithm to Architecture

• Scheduling and Assignment Problem
– Resources: hardware modules, and time slots
– Demands: operations (algorithm), and throughput
• Constrained optimization problem
– Minimize resources (objective function) to meet demands (constraints)
• For regular iterative algorithms and regular processor arrays -> algebraic mapping.
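The scheduling and assignment problem can be illustrated with a greedy list scheduler. This is a common heuristic, not the algebraic mapping mentioned above and not an optimal solver. The sketch assumes that every operation takes one time step, that the graph has no cycles, and that each unit type has at least one unit; the operation and unit names are illustrative.

```python
def list_schedule(deps, unit_of, units):
    """Greedy list scheduling: one time step per operation, limited units per type."""
    done, schedule, step = set(), {}, 0
    while len(done) < len(deps):
        step += 1
        free = dict(units)                    # units available in this time step
        started = []
        for op in deps:                       # dict order acts as the priority list
            if op in done or not deps[op] <= done:
                continue                      # already scheduled, or inputs not ready
            if free[unit_of[op]] > 0:
                free[unit_of[op]] -= 1
                schedule[op] = step
                started.append(op)
        done.update(started)                  # results become visible in the next step
    return schedule

# Assumed example: y = a1*x1 + a2*x2 + a3*x3 as a dependence graph.
deps = {
    "mul1": set(), "mul2": set(), "mul3": set(),
    "add1": {"mul1"}, "add2": {"add1", "mul2"}, "add3": {"add2", "mul3"},
}
unit_of = {op: "MUL" if op.startswith("mul") else "ADD" for op in deps}

for n_mul in (1, 3):
    sched = list_schedule(deps, unit_of, {"MUL": n_mul, "ADD": 1})
    print(n_mul, "multiplier(s): finishes at step", max(sched.values()))
```

For this graph the adder chain sets the finishing time, so one multiplier meets the same deadline as three. That is the kind of trade-off the constrained optimization is meant to find.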

Mapping Algorithms to Architectures

• Irregular multi-processor architecture:
– Linear programming
– Heuristic methods
– Algorithm reformulation for recursions.
• Instruction level parallelism
– MMX instruction programming
– Related to optimizing compilation.
Arithmetic

• CORDIC
– Compute elementary functions
• Distributed arithmetic
– ROM based implementation
• Redundant representation
– Eliminate carry propagation
• Residue number system
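As an illustration of the first item, the sketch below computes sine and cosine with rotation-mode CORDIC. It uses floating point for readability; a hardware version would use fixed-point words, replace each multiplication by 2**-i with a right shift, and read the arctangent values from a small ROM. The function name and the iteration count are assumptions.

```python
import math

def cordic_sin_cos(angle, iterations=24):
    """Rotation-mode CORDIC for |angle| <= pi/2; returns (sin, cos)."""
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]   # ROM contents
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))   # growth caused by the micro-rotations
    x, y, z = 1.0 / gain, 0.0, angle               # pre-scale so the result has unit length
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0                # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]
    return y, x

s, c = cordic_sin_cos(math.pi / 6)
print(round(s, 5), round(c, 5))
```

Each iteration adds roughly one bit of accuracy, so the number of iterations is normally matched to the word length.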

Low Power Design

• Device level low power design
• Logic level low power design
• Architectural level low power design
• Algorithmic level low power design
