SOC Architecture Design

SOC architecture and design
system-on-chip (SOC)
processors: become components in a system
SOC covers many topics
processor: pipelined, superscalar, VLIW, array, vector

storage: cache, embedded and external memory
interconnect: buses, network-on-chip
impact: time, area, power, reliability, configurability
customisability: specialized processors, reconfiguration
productivity/tools: model, explore, re-use, synthesise, verify
examples: crypto, graphics, media, network, comm, security
future: autonomous SOC, self-optimising/verifying design
our focus
overview, processor, memory
wl 2015 10.1
iPhone SOC
[Figure: iPhone SOC block diagram showing a processor (1 GHz ARM Cortex-A8), memory, and I/O blocks. Source: UC Berkeley]
Basic system-on-chip model
AMD's Barcelona Multicore

4 out-of-order cores
1.9 GHz clock rate
65nm technology
3 levels of caches
integrated Northbridge
[Figure: die layout with Core 1 to Core 4, a 512KB L2 cache per core, a 2MB shared L3 cache, and the Northbridge]
http://www.techwarelabs.com/reviews/processors/barcelona/
SOC vs processors on chip

with lots of transistors, designs move in 2 ways:
complete system on a chip
multi-core processors with lots of cache
                 System on chip                     Processors on chip
processor        multiple, simple, heterogeneous    few, complex, homogeneous
cache            one level, small                   2-3 levels, extensive
memory           embedded, on chip                  very large, off chip
functionality    special purpose                    general purpose
interconnect     wide, high bandwidth               often through cache
power, cost      both low                           both high
operation        largely stand-alone                need other chips

Processor types: overview

Processor type   Architecture / implementation approach
SIMD             Single instruction applied to multiple functional units
Vector           Single instruction applied to multiple pipelined registers
VLIW             Multiple instructions issued each cycle under compiler control
Superscalar      Multiple instructions issued each cycle under hardware control
Processors for SOCs

SOC                                  Basic ISA      Processor description
Freescale e600: signal processing    PowerPC        Superscalar with vector extension
ClearSpeed CSX600: general           Proprietary    Array processor with 96 processing elements
PlayStation 2: gaming                MIPS           Pipelined with 2 vector coprocessors
ARM VFP11: general                   ARM            Configurable vector coprocessor
Sequential and parallel machines

basic single stream processors
pipelined: overlap operations in basic sequential
superscalar: transparent concurrency
VLIW: compiler-generated concurrency
multiple streams, multiple functional units

array processors
vector processors
multiprocessors
Pipelined processor
[Figure: pipeline timing diagram. Instructions #1 to #4 each pass through the stages IF, ID, AG, DF, EX, WB, with successive instructions overlapping along the time axis.]
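The overlap in a pipelined processor can be put into numbers with a small sketch. It assumes an ideal pipeline (one instruction entering per cycle, no stalls or hazards), which the slide does not state; the stage names come from the slide, while the function names `pipeline_cycles`, `sequential_cycles` and `timing_diagram` are hypothetical.

```python
# Stage names as listed on the slide.
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]


def pipeline_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to complete n instructions in an ideal pipeline.

    Assumption: a new instruction starts every cycle and nothing stalls,
    so the first instruction takes n_stages cycles and each later one
    finishes one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def sequential_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles when each instruction finishes before the next one starts."""
    return n_stages * n_instructions


def timing_diagram(n_instructions):
    """One text row per instruction; each row is shifted right by one cycle."""
    cycle_width = 3  # two-letter stage name plus one space
    return [" " * (cycle_width * i) + " ".join(STAGES)
            for i in range(n_instructions)]


if __name__ == "__main__":
    for row in timing_diagram(4):
        print(row)
    print(pipeline_cycles(4), "cycles pipelined vs",
          sequential_cycles(4), "cycles sequential")
```

For the four instructions in the diagram this gives 9 cycles pipelined against 24 cycles without overlap, under the no-stall assumption above.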
Superscalar and VLIW processors

[Figure: timing diagram for instructions #1 to #6, each passing through the stages IF, ID, AG, DF, EX, WB, plotted against time.]
[Figure: superscalar organisation, with hardware for parallelism control, shown alongside VLIW]
Array processors
perform op if condition = mask
operand can come from neighbour
instruction fields: mask, op, dest, sr1, sr2
n PEs, each with memory; neighbour communications
one instruction issued to all PEs
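The array-processor points above can be sketched as a tiny simulation: one instruction is issued to all PEs, each PE performs the op only if its condition equals the mask, and an operand can come from a neighbour. The class `PE`, the function `broadcast`, the register names, and the choice of the left neighbour with wrap-around are all illustrative assumptions, not details from the slide.

```python
import operator

# Hypothetical opcode table for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


class PE:
    """One processing element with its own local memory and condition flag."""

    def __init__(self, regs, condition):
        self.regs = dict(regs)      # local memory, modelled as named registers
        self.condition = condition  # compared against the instruction's mask


def broadcast(pes, mask, op, dest, sr1, sr2, sr2_from_left=False):
    """Issue one instruction (mask, op, dest, sr1, sr2) to all PEs.

    A PE performs the op only if its condition equals the mask.
    If sr2_from_left is set, sr2 is read from the left neighbour
    (assumed ring topology, so PE 0 reads from the last PE).
    """
    n = len(pes)
    # Read every operand before writing, so all PEs see pre-instruction values.
    a = [pe.regs[sr1] for pe in pes]
    b = [pes[(i - 1) % n].regs[sr2] if sr2_from_left else pes[i].regs[sr2]
         for i in range(n)]
    for i, pe in enumerate(pes):
        if pe.condition == mask:
            pe.regs[dest] = OPS[op](a[i], b[i])


if __name__ == "__main__":
    pes = [PE({"r0": 0, "r1": i, "r2": 10 * i}, condition=i % 2)
           for i in range(4)]
    broadcast(pes, mask=1, op="add", dest="r0", sr1="r1", sr2="r2")
    print([pe.regs["r0"] for pe in pes])  # only PEs 1 and 3 are updated
```

Reading all operands before any write keeps the PEs in lockstep, which is the behaviour one would expect when a single instruction drives every PE in the same cycle.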
Vector processors
vector registers, eg 8 sets x 64 elements x 64 bits
vector instructions: VR3 = VR2 VOP VR1
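A vector instruction of the form VR3 = VR2 VOP VR1 can be sketched as an element-by-element operation over a register file with the example dimensions above (8 registers x 64 elements x 64 bits). The class name `VectorRegisterFile` and the treatment of elements as unsigned values that wrap at 64 bits are assumptions made for illustration.

```python
import operator

NUM_VREGS = 8              # 8 vector registers, as in the slide's example
VLEN = 64                  # 64 elements per register
MASK64 = (1 << 64) - 1     # each element is 64 bits wide


class VectorRegisterFile:
    def __init__(self):
        self.vr = [[0] * VLEN for _ in range(NUM_VREGS)]

    def vop(self, op, dest, src_a, src_b):
        """VR[dest] = VR[src_a] op VR[src_b], applied to every element.

        Assumption: results are truncated to 64 bits (unsigned wrap-around).
        """
        self.vr[dest] = [op(a, b) & MASK64
                         for a, b in zip(self.vr[src_a], self.vr[src_b])]


if __name__ == "__main__":
    vrf = VectorRegisterFile()
    vrf.vr[1] = list(range(VLEN))
    vrf.vr[2] = [2] * VLEN
    vrf.vop(operator.mul, dest=3, src_a=2, src_b=1)   # VR3 = VR2 * VR1
    print(vrf.vr[3][:8])
```

A single call to `vop` stands for one vector instruction; real vector hardware would stream the 64 element operations through pipelined functional units rather than loop in software.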
Memory addressing: three levels
(each segment contains pages for a program/process)
User view of memory: addressing

a program: process address (offset + base + index)
virtual address: from page address and process/user id
segment table: process base and bound
(for each process)
system address: process base + page address
pages: active localities in main/real memory

virtual address: page table lookup to physical address
page miss: virtual pages not in page table
TLB (translation look-aside buffer): recent translations

TLB entry: corresponding real and (virtual, id) address
a few hashed virtual address bits address TLB entries

if (virtual, id) = TLB (virtual, id) then use translation
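The translation steps listed above can be sketched as follows. This is a minimal model under stated assumptions, not a description of any particular processor: the page size, the TLB size, the hash function, the direct-mapped TLB organisation, and the names `MMU`, `PageMiss` and `translate` are all hypothetical, and the segment base and bound are counted in pages for simplicity.

```python
PAGE_BITS = 12      # assumed 4 KiB pages
TLB_ENTRIES = 16    # assumed TLB size (direct-mapped in this sketch)


class PageMiss(Exception):
    """Raised when the page is not in the page table."""


class MMU:
    def __init__(self, segment_table, page_table):
        self.segment_table = segment_table  # process id -> (base, bound), in pages
        self.page_table = page_table        # system page address -> physical frame
        self.tlb = [None] * TLB_ENTRIES     # entry: (virtual page, id, frame)
        self.tlb_hits = 0

    def translate(self, pid, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        # A few hashed virtual address bits select the TLB entry.
        index = (vpn ^ (vpn >> 4)) % TLB_ENTRIES
        entry = self.tlb[index]
        if entry is not None and entry[:2] == (vpn, pid):
            # (virtual, id) matches the stored pair: use the cached translation.
            self.tlb_hits += 1
            frame = entry[2]
        else:
            base, bound = self.segment_table[pid]   # find process
            if vpn >= bound:
                raise ValueError("address outside segment bound")
            system_page = base + vpn                # system address
            if system_page not in self.page_table:  # find page
                raise PageMiss(system_page)
            frame = self.page_table[system_page]
            self.tlb[index] = (vpn, pid, frame)     # remember recent translation
        return (frame << PAGE_BITS) | offset


if __name__ == "__main__":
    mmu = MMU(segment_table={7: (100, 4)}, page_table={100: 5, 101: 9})
    print(hex(mmu.translate(7, 0x1234)))  # miss: walks segment and page tables
    print(hex(mmu.translate(7, 0x1ABC)))  # hit: same page, served from the TLB
```

Storing the process id alongside the virtual page in each TLB entry lets translations from different processes share the TLB without being confused with each other.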
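TLB and Paging: address translation
[Figure: a virtual address is checked against the TLB (recent translations); otherwise the process base is looked up (find process) to form the system address, and a page lookup (find page) gives the physical address.]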
SOC interconnect
interconnecting multiple active agents requires
bandwidth: capacity to transmit information (bps)
protocol: logic for non-interfering message transmission
bus
AMBA (Adv. Microcontroller Bus Architecture) from ARM,
widely used for SOC
bus performance: can determine system performance
network on chip
array of switches
statically switched: eg mesh
dynamically switched: eg crossbar
Design cost: product economics

increasingly product cost determined by
design costs, including verification
not marginal cost to produce
manage complexity in die technology by

engineering effort
engineering cleverness
design effort often dictated by product volume
[Figure: balance between basic physical tradeoffs and design time and effort; the balance point depends on n, number of units]
Design complexity
processors
Cost: product program vs engineering

[Figure: cost breakdown tree. Labels: product cost; fixed costs; variable costs; manufacturing costs; engineering costs; marketing, sales, administration; labor costs; fixed project costs; chip design; verify & test; software; CAD support; CAD programs; mask costs; capital equipment.]
Example: two scenarios

fixed costs Kf, support costs 0.1 x function(n), and variable costs Kv x n, so
total cost = Kf + 0.1 x function(n) + Kv x n
design gets more complex, while production costs decrease
Kf increases while Kv decreases
if same price, requires higher volumes to break even
when compared with 1995, in 2015
Kf increased by 10 times
Kv decreased by the same factor
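The effect on break-even volume can be worked through with a short sketch. The cost terms follow the slide (fixed costs Kf, support costs 0.1 x function(n), variable costs Kv x n); everything else is an assumption for illustration: the dollar figures are invented, function(n) is not specified in the source and defaults to zero here, and the break-even calculation ignores the support term. The function names are hypothetical.

```python
import math


def total_cost(n, kf, kv, support=lambda n: 0.0):
    """Kf + 0.1 x function(n) + Kv x n.

    `support` stands in for the unspecified function(n); it defaults to
    zero because the source does not define it.
    """
    return kf + 0.1 * support(n) + kv * n


def break_even_units(price, kf, kv):
    """Smallest n with price * n >= Kf + Kv * n (support costs ignored)."""
    if price <= kv:
        raise ValueError("price must exceed the variable cost per unit")
    return math.ceil(kf / (price - kv))


if __name__ == "__main__":
    price = 100                       # same selling price in both scenarios
    # Illustrative numbers only: Kf grows 10x, Kv shrinks 10x.
    n_1995 = break_even_units(price, kf=1_000_000, kv=50)
    n_2015 = break_even_units(price, kf=10_000_000, kv=5)
    print(n_1995, n_2015)
```

With these made-up figures the break-even volume rises from 20,000 units to 105,264 units, which illustrates why higher fixed costs at the same price call for higher volumes even though each unit is cheaper to produce.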
More recent: higher NRE

[Figure: comparison of the 1995 and 2015 scenarios]
IP: Intellectual Property
Answers to Unassessed Coursework 5

1. rdl_1 R = snd [-]^{-1} ; R
   rdl_{n+1} R = snd apr_n^{-1} ; rsh ; fst (rdl_n R) ; R
2. P0 = rdl_n Pcell ; π_1
   <<s,x>, a> Pcell <sx+a, x>
3. rdl_n R = row_n (Ri ; π_2^{-1}) ; π_2
   P1 = loop (row_n Pcell1 ; fst map_n D) ; π_1
   <<s,x>, a> Pcell1 <a, <sx+a, x>>
4. loop (row_n R) = (loop R)^n
   Proof: induction on n
   (see www.doc.ic.ac.uk/~wl/papers/scp90.pdf)
   P1 = P2 ; [D,D]^{-n}
   P2 = (loop (Pcell1 ; [D,[D,D]]))^n
