Reduced blur
60 frames/sec: each frame is held for 16ms. Eye moves continuously, smearing the still image across the retina.
120 frames/sec (with interpolated frames): each frame is held for 8ms. Eye moves a shorter arc in 8ms, thus reducing the amount of blur across the retina.
AMD Xilleon 420 (X420) performs motion-compensated frame rate conversion (MC-FRC) and doubles the frame rate.
The Samsung Timing Controller (TCON) for the LCD panel is integrated inside the X420.
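The arithmetic behind the blur comparison can be sketched as follows. On a sample-and-hold display, the smear is roughly the distance the tracking eye covers while one frame is held. The function names and the tracking speed of 960 pixels per second are assumptions chosen for illustration; they do not come from the slides, which round the hold times to 16ms and 8ms.

```python
def frame_period_ms(fps: float) -> float:
    """Time one frame is held on a sample-and-hold display, in milliseconds."""
    return 1000.0 / fps


def smear_pixels(fps: float, eye_speed_px_per_s: float) -> float:
    """Approximate smear width: distance the tracking eye covers in one frame."""
    return eye_speed_px_per_s / fps


if __name__ == "__main__":
    speed = 960.0  # assumed tracking speed in pixels/second (illustrative only)
    for fps in (60, 120):
        print(f"{fps} fps: hold {frame_period_ms(fps):.1f} ms, "
              f"smear ~{smear_pixels(fps, speed):.0f} px")
```

Under this simple model, doubling the frame rate halves the smear, which is the effect the figure illustrates.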
TV System (FullHD, 1080p)
TV System Block Diagram
Hot Chips 20: mediaDSP
4
August 2008
[Figure: block diagram. Labeled blocks: LVDS Receiver, mediaDSP Core #1, mediaDSP Core #2, Memory Controller, SDRAM Interface, Host Controller Unit, Peripheral Controller Unit, I2C Slave Interface, I2C Master Interface, GPIO, EEPROM Interface, Stream Manager, Timing Controller (TCON), mini-LVDS Transmitter. Labeled signals: LVDS Pixel Input, mini-LVDS Pixel Output.]
Programmable multicore approach is attractive
mediaDSP Introduction
mediaDSP is a flexible media processing platform
Well suited to address a range of media applications,
with emphasis on video processing
Enables guaranteed real-time performance
Has been used in two product families
MPEG A/V encoding
Motion compensated frame rate conversion (MC-FRC)
Characteristics of MC-FRC:
Is an ill-posed problem: no single optimal solution; customer preferences vary
Extreme computational load
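To make the idea of motion-compensated interpolation concrete, here is a deliberately tiny sketch on one-dimensional "frames". It estimates a single global shift by minimizing the sum of absolute differences, then builds the in-between frame by moving content halfway along that shift. This is an illustrative assumption, not the algorithm used in the product: real MC-FRC estimates many two-dimensional motion vectors per frame and must handle occlusions, which is where the computational load comes from. All function names are hypothetical.

```python
def clamp(k, n):
    """Clamp index k into the valid range [0, n - 1]."""
    return min(max(k, 0), n - 1)


def estimate_shift(prev, curr, max_shift):
    """Return the integer shift d minimizing sum |prev[i] - curr[i + d]|."""
    n = len(prev)
    best_shift, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        cost = sum(abs(prev[i] - curr[clamp(i + d, n)]) for i in range(n))
        if cost < best_cost:
            best_shift, best_cost = d, cost
    return best_shift


def interpolate_midframe(prev, curr, max_shift=4):
    """Build the frame halfway between prev and curr along the estimated motion."""
    n = len(prev)
    d = estimate_shift(prev, curr, max_shift)
    h = d // 2  # half of the motion, rounded down to whole samples
    # Fetch each output sample from where it was in prev and where it will be
    # in curr, then average the two fetched values.
    return [(prev[clamp(x - h, n)] + curr[clamp(x + d - h, n)]) / 2
            for x in range(n)]
```

For `prev = [0, 0, 9, 0, 0, 0, 0, 0]` and `curr = [0, 0, 0, 0, 9, 0, 0, 0]`, the sketch places the object at index 3 in the interpolated frame, whereas a plain average of the two frames would show two half-intensity ghosts at indices 2 and 4.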
Example Task Flow Graph: MPEG-2 Video Macroblock Encoder (Unmapped)
[Figure: task flow graph with nodes Task A through Task R; the edges between tasks are not recoverable from the extracted text.]
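One way to reason about an unmapped task flow graph is to group tasks into "waves" whose dependencies are all satisfied, since tasks in the same wave could run on different engines at the same time. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency edges are hypothetical, because the edges of the slide's graph are not available in the text, and `parallel_waves` is an illustrative name.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each task maps to the set of tasks it waits for.
# These edges are invented for illustration; they are not the slide's graph.
DEPENDENCIES = {
    "Task B": {"Task A"},
    "Task C": {"Task A"},
    "Task D": {"Task B", "Task C"},
    "Task E": {"Task D"},
}


def parallel_waves(dependencies):
    """Group tasks into waves; tasks in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the hypothetical edges above, Task B and Task C land in the same wave, which is the kind of parallelism a mapping onto multiple engines would try to exploit.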
Generic mediaDSP Topology
[Figure: mediaDSP with a Communication Fabric connecting Task-Oriented Engine #1 and Task-Oriented Engine #2, each shown with a Task Control Unit.]
[Figure: mediaDSP topology with FRC-specific engines. Units on the Communication Fabric: FRC-Specific Engines #1 to #6, Crunch Engines #(1-10), Stream Capture Engine, External Memory DMA Engine, Shared Memory, with TCU labels on the attached units. Also shown: Register Slave Interface Unit, Control Engines #1 to #4, Chip-Level Memory Backbone, Chip-Level Stream Manager, Chip-Level Register Backbone.]
Re-Use, Extending, Scaling

Module/TOE                            SD Enc1  SD Enc2  HD FRC  FHD FRC
Infra
  Control Engine                         2        2        4       8
  Task Control Unit                      5        6       16      36
  Advanced DMA Engine                    3        3       30      64
TOEs
  Crunch Engine (SIMD DSP)               1        1        9      20
  Variable Length CODEC Engine           1        1
  Motion Estimation Engine               1        1
  Sub-pixel Motion Estimation Engine              1
  FRC-Specific Engine #1                                   1       2
  FRC-Specific Engine #2                                   1       2
  FRC-Specific Engine #3                                   1       2
  FRC-Specific Engine #4                                   1       2
  FRC-Specific Engine #5                                   1       2
  FRC-Specific Engine #6                                   1       2
  Stream Capture Engine                                            2

Column placement for rows with fewer than four values is inferred from the extraction order.
Re-Use
Extending
Scaling
Template for TOE
[Figure: mediaDSP with a Communication Fabric connecting Task-Oriented Engine #1 and Task-Oriented Engine #2, each shown with a Task Control Unit.]
TCUs In Action
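[Figure: units on the Communication Fabric: Control Engine, EMDMA Core, Crunch Engine 1, Crunch Engine 2, and Shared Memory. The EMDMA Core and both Crunch Engines are shown with Semaphores and a TCU. Command labels near Crunch Engine 1: Task2, Task1, Wait1, Poll1, Send1. Command labels near Crunch Engine 2: Task3, Poll2, Send2, Wait2.]
Task Flow Graph: DMA In, Number Crunch, DMA Out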
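The DMA In, Number Crunch, DMA Out flow can be imitated in software with one thread per stage and semaphores for the hand-offs. This is only an analogy under stated assumptions: the slide does not define the semantics of its Wait, Poll, and Send commands, so the single-buffer handshake, the semaphore names, and the placeholder computation (a sum) are all hypothetical.

```python
import threading


def run_pipeline(blocks):
    """Run DMA In -> Number Crunch -> DMA Out over `blocks`, one thread per stage."""
    shared = {}                             # stands in for shared memory
    results = []
    data_ready = threading.Semaphore(0)     # DMA In  -> Crunch
    result_ready = threading.Semaphore(0)   # Crunch  -> DMA Out
    slot_free = threading.Semaphore(1)      # DMA Out -> DMA In (single buffer)

    def dma_in():
        for block in blocks:
            slot_free.acquire()             # wait until the buffer may be reused
            shared["in"] = block
            data_ready.release()            # signal the crunch stage

    def crunch():
        for _ in blocks:
            data_ready.acquire()            # wait for input data
            shared["out"] = sum(shared["in"])   # placeholder computation
            result_ready.release()          # signal the output stage

    def dma_out():
        for _ in blocks:
            result_ready.acquire()          # wait for a finished result
            results.append(shared["out"])
            slot_free.release()             # allow the next block in

    threads = [threading.Thread(target=f) for f in (dma_in, crunch, dma_out)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a single buffer the stages run strictly one after another; adding a second buffer (starting `slot_free` at 2 and alternating slots) would let DMA and computation overlap, which is the usual motivation for this kind of hardware semaphore scheme.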
Memory Latency Management
[Figure: Crunch Engine diagram labels. Requests arrive from the Backbone or TCU and are issued to the Backbone. Interface blocks: Slave Access Decoder/Router, Memory Read DMA, Memory Write DMA, Backbone Master Interface Unit. Crunch Engine Core: Address Generator Unit, Local RAM A, Local RAM C (with R and W port labels), Operand Selection, Data Processing Unit, Control Pipeline, Branch Unit, PC, Instruction Memory, IR. Pipeline stage labels: IF, ID, DF, OS.]
Crunch Engine Block Diagram
8 KB instruction store
Program sequencer with zero-overhead looping
Input/Output DMA engines, with streaming feature
Triple address generator with array and stream addressing model
Zero-overhead unaligned operand-read and result-write
128-bit SIMD/Reduction data processing pipeline
4 KB local store, 64 B output store
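As an illustration of what a SIMD step followed by a reduction step looks like, the sketch below computes a sum of absolute differences over one vector. The lane layout (sixteen 8-bit lanes in a 128-bit vector) and the choice of operation are assumptions made for the example; the slide states only that the data processing pipeline is 128-bit SIMD/Reduction. Function names are hypothetical.

```python
LANES = 16  # assumed: a 128-bit vector treated as sixteen 8-bit lanes


def simd_abs_diff(a, b):
    """Lane-wise absolute difference of two 16-lane vectors (the SIMD step)."""
    assert len(a) == len(b) == LANES
    return [abs(x - y) for x, y in zip(a, b)]


def reduce_sum(v):
    """Pairwise tree reduction of the lanes to one scalar (the reduction step)."""
    while len(v) > 1:
        v = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    return v[0]


def sad16(a, b):
    """Sum of absolute differences across all 16 lanes."""
    return reduce_sum(simd_abs_diff(a, b))
```

The tree reduction takes four halving steps for sixteen lanes, which mirrors how a hardware reduction stage typically combines lanes in a logarithmic number of levels.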
[Block diagram labels: Backbone Slave Interface Unit; pipeline stages E1, E2, E3, PW, WB.]
Application Profiling (in silicon)
Hardware profiling is an important part of developing
parallel programs.
A flexible profiling scheme can trace both hardware and software-based events.
The trace buffer is converted to a waveform file for easy visualization.
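A trace-to-waveform conversion might look like the sketch below, which writes events in the Value Change Dump (VCD) text format that common waveform viewers can open. The slides do not name the waveform format, so VCD is an assumption, as are the `(time, signal_name, value)` event layout and the function name.

```python
def trace_to_vcd(events, timescale="1ns"):
    """Convert (time, signal_name, value) events with 0/1 values to VCD text."""
    names = sorted({name for _, name, _ in events})
    # VCD identifier codes are printable ASCII characters starting at '!'.
    # One character per signal limits this sketch to 94 signals.
    ids = {name: chr(33 + i) for i, name in enumerate(names)}
    lines = [f"$timescale {timescale} $end", "$scope module trace $end"]
    lines += [f"$var wire 1 {ids[n]} {n} $end" for n in names]
    lines += ["$upscope $end", "$enddefinitions $end"]
    last_time = None
    for time, name, value in sorted(events):
        if time != last_time:
            lines.append(f"#{time}")        # timestamp marker
            last_time = time
        lines.append(f"{value}{ids[name]}")  # value change for one signal
    return "\n".join(lines) + "\n"
```

For example, calling `trace_to_vcd([(0, "crunch1_busy", 0), (5, "crunch1_busy", 1)])` yields a small file with one signal that rises at time 5.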