
SOC architecture and design

system-on-chip (SOC)
processors: become components in a system

SOC covers many topics

processor: pipelined, superscalar, VLIW, array, vector
storage: cache, embedded and external memory
interconnect: buses, network-on-chip
impact: time, area, power, reliability, configurability
customisability: specialized processors, reconfiguration
productivity/tools: model, explore, re-use, synthesise, verify
examples: crypto, graphics, media, network, comm, security
future: autonomous SOC, self-optimising/verifying design

our focus
overview, processor, memory
wl 2015 10.1

iPhone SOC
[Figure: annotated iPhone SOC die showing the processor (1 GHz ARM Cortex A8), memory and I/O blocks. Source: UC Berkeley]

Basic system-on-chip model


AMD's Barcelona Multicore Processor

4 out-of-order cores
1.9 GHz clock rate
65nm technology
3 levels of caches
integrated Northbridge

[Figure: die layout showing Core 1 to Core 4, each with a 512KB L2 cache, the Northbridge, and a 2MB shared L3 cache]

http://www.techwarelabs.com/reviews/processors/barcelona/


SOC vs processors on chip


with lots of transistors, designs move in 2 ways:
complete system on a chip
multi-core processors with lots of cache

                System on chip                    Processors on chip
processor       multiple, simple, heterogeneous   few, complex, homogeneous
cache           one level, small                  2-3 levels, extensive
memory          embedded, on chip                 very large, off chip
functionality   special purpose                   general purpose
interconnect    wide, high bandwidth              often through cache
power, cost     both low                          both high
operation       largely stand-alone               need other chips



Processor types: overview


Processor type   Architecture / Implementation approach
SIMD             Single instruction applied to multiple functional units
Vector           Single instruction applied to multiple pipelined registers
VLIW             Multiple instructions issued each cycle under compiler control
Superscalar      Multiple instructions issued each cycle under hardware control


Processors for SOCs


SOC                                 Basic ISA     Processor description
Freescale e600: signal processing   PowerPC       Superscalar with vector extension
ClearSpeed CSX600: general          Proprietary   Array processor with 96 processing elements
PlayStation 2: gaming               MIPS          Pipelined with 2 vector coprocessors
ARM VFP11: general                  ARM           Configurable vector coprocessor


Sequential and parallel machines


basic single stream processors
pipelined: overlap operations in basic sequential
superscalar: transparent concurrency
VLIW: compiler-generated concurrency

multiple streams, multiple functional units


array processors
vector processors

multiprocessors

Pipelined processor

Instruction #1   IF  ID  AG  DF  EX  WB
Instruction #2       IF  ID  AG  DF  EX  WB
Instruction #3           IF  ID  AG  DF  EX  WB
Instruction #4               IF  ID  AG  DF  EX  WB
                 Time ->
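
The overlap in the pipeline diagram can be sketched in a few lines of Python. This is a minimal model, assuming an ideal pipeline in which one instruction enters per cycle and nothing stalls; the names pipeline_schedule and total_cycles are illustrative and do not come from the slides.

```python
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each instruction number to the cycle in which it occupies each stage.

    Assumption: ideal pipeline, one instruction enters per cycle, no stalls.
    """
    return {
        i + 1: {stage: i + s for s, stage in enumerate(stages)}
        for i in range(n_instructions)
    }


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish: fill the pipeline once, then retire one instruction per cycle."""
    return n_stages + n_instructions - 1


schedule = pipeline_schedule(4)
for instr, timing in schedule.items():
    row = ["  "] * total_cycles(4)
    for stage, cycle in timing.items():
        row[cycle] = stage
    print(f"Instruction #{instr}  " + " ".join(row))
print("total cycles:", total_cycles(4))
```

Under these assumptions the four instructions take 6 + 4 - 1 = 9 cycles, against 24 cycles if each instruction had to finish before the next one started.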


Superscalar and VLIW processors

Instruction #1   IF  ID  AG  DF  EX  WB
Instruction #2   IF  ID  AG  DF  EX  WB
Instruction #3       IF  ID  AG  DF  EX  WB
Instruction #4       IF  ID  AG  DF  EX  WB
Instruction #5           IF  ID  AG  DF  EX  WB
Instruction #6           IF  ID  AG  DF  EX  WB
                 Time ->
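
The effect of issuing several instructions per cycle can be estimated with a small sketch. It assumes an ideal six-stage machine with no dependencies or stalls, and an issue width of 2 chosen purely for illustration; issue_cycle and cycles are hypothetical helper names.

```python
import math


def issue_cycle(instr_number, issue_width):
    """Cycle in which instruction instr_number (counted from 1) starts IF."""
    return (instr_number - 1) // issue_width


def cycles(n_instructions, n_stages=6, issue_width=1):
    """Total cycles when issue_width instructions start together each cycle.

    Assumption: no dependencies or stalls, so every issue slot is filled.
    """
    groups = math.ceil(n_instructions / issue_width)
    return n_stages + groups - 1


print(cycles(6, issue_width=1))  # plain pipeline
print(cycles(6, issue_width=2))  # two instructions issued per cycle
```

The same arithmetic applies to superscalar and VLIW machines; they differ in whether hardware or the compiler decides which instructions may issue together.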


Superscalar

hardware for parallelism control

VLIW


Array processors
perform op if condition = mask
operand can come from neighbour
instruction format: mask | op | dest | sr1 | sr2
n PEs, each with memory; neighbour communications
one instruction issued to all PEs
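
A minimal sketch of this execution model, assuming each PE is a small record with a register list and a one-bit condition flag. The dictionary layout, the wrap-around neighbour link and the function names are assumptions for illustration, not a description of any particular array processor.

```python
import operator


def array_step(op, dest, sr1, sr2, mask, pes):
    """Issue one instruction to all PEs; a PE executes only if its condition equals mask."""
    for pe in pes:
        if pe["cond"] == mask:
            pe["regs"][dest] = op(pe["regs"][sr1], pe["regs"][sr2])


def shift_from_left_neighbour(pes, src, dest):
    """Each PE copies register src of its left neighbour into dest (wrap-around assumed)."""
    values = [pe["regs"][src] for pe in pes]
    for i, pe in enumerate(pes):
        pe["regs"][dest] = values[i - 1]


# Four PEs; register 0 holds the PE index, register 1 holds 10, register 2 is the result.
pes = [{"regs": [i, 10, 0], "cond": i % 2} for i in range(4)]

array_step(operator.add, dest=2, sr1=0, sr2=1, mask=1, pes=pes)
masked_result = [pe["regs"][2] for pe in pes]
print(masked_result)

shift_from_left_neighbour(pes, src=2, dest=0)
shifted = [pe["regs"][0] for pe in pes]
print(shifted)
```

Only the PEs whose condition bit matches the mask update their destination register; the others keep their old value even though they received the same instruction.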


Vector processors
vector registers, eg 8 sets x 64 elements x 64 bits
vector instructions: VR3 = VR2 VOP VR1
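
The register organisation and the instruction form VR3 = VR2 VOP VR1 can be modelled as below. This is a sketch only: it uses the example sizes from the slide (8 registers x 64 elements x 64 bits), treats elements as unsigned integers that wrap at 64 bits, and the class name VectorUnit is hypothetical.

```python
import operator

NUM_REGS, VLEN, BITS = 8, 64, 64
WORD_MASK = (1 << BITS) - 1


class VectorUnit:
    def __init__(self):
        # 8 vector registers, each holding 64 elements
        self.vr = [[0] * VLEN for _ in range(NUM_REGS)]

    def vop(self, op, dest, src2, src1):
        """VR[dest] = VR[src2] op VR[src1], element by element, wrapped to 64 bits."""
        self.vr[dest] = [
            op(a, b) & WORD_MASK for a, b in zip(self.vr[src2], self.vr[src1])
        ]


vu = VectorUnit()
vu.vr[1] = list(range(VLEN))
vu.vr[2] = [10] * VLEN
vu.vop(operator.add, dest=3, src2=2, src1=1)   # VR3 = VR2 + VR1
print(vu.vr[3][:4])
```

A single vector instruction here stands for 64 element operations, which a real vector unit would stream through a pipelined functional unit.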


Memory addressing: three levels
(each segment contains pages for a program/process)


User view of memory: addressing


a program: process address (offset + base + index)
virtual address: from page address and process/user id

segment table: process base and bound

(for each process)

system address: process base + page address

pages: active localities in main/real memory


virtual address: page table lookup to physical address
page miss: virtual pages not in page table

TLB (translation look-aside buffer): recent translations


TLB entry: corresponding real and (virtual, id) address

a few hashed virtual address bits address TLB entries


if virtual, id = TLB (virtual, id) then use translation
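
The translation steps above can be sketched as follows. The page size, the TLB size, the direct-mapped TLB organisation and all names (MMU, PageMiss, BoundError) are assumptions made for illustration; a real design would differ in table formats and in how a page miss is handled.

```python
PAGE_BITS = 12       # assumption: 4 KB pages
TLB_ENTRIES = 16     # assumption: small direct-mapped TLB


class PageMiss(Exception):
    """The system page is not in the page table."""


class BoundError(Exception):
    """The virtual page lies outside the process bound."""


class MMU:
    def __init__(self, segment_table, page_table):
        self.segment_table = segment_table  # process id -> (base page, bound in pages)
        self.page_table = page_table        # system page -> physical frame
        self.tlb = [None] * TLB_ENTRIES     # each entry: ((virtual page, id), frame)
        self.tlb_hits = 0

    def translate(self, pid, vaddr):
        vpage = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        slot = vpage % TLB_ENTRIES          # a few virtual address bits pick the entry
        entry = self.tlb[slot]
        if entry is not None and entry[0] == (vpage, pid):
            self.tlb_hits += 1              # (virtual, id) matches: use translation
            frame = entry[1]
        else:
            base, bound = self.segment_table[pid]
            if vpage >= bound:
                raise BoundError(hex(vaddr))
            system_page = base + vpage      # system address = process base + page address
            if system_page not in self.page_table:
                raise PageMiss(hex(vaddr))
            frame = self.page_table[system_page]
            self.tlb[slot] = ((vpage, pid), frame)
        return (frame << PAGE_BITS) | offset


mmu = MMU(segment_table={7: (100, 4)}, page_table={100: 5, 101: 9})
first = mmu.translate(7, 0x1234)    # TLB miss: walks segment and page tables
second = mmu.translate(7, 0x1234)   # TLB hit: same (virtual, id) pair
print(hex(first), hex(second), mmu.tlb_hits)
```

The second lookup skips both tables, which is the point of keeping recent translations in the TLB.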

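TLB and Paging: Address translation
[Figure: a virtual address is looked up in the TLB (recent translations); otherwise the segment table (find process) supplies the process base to form the system address, and the page table (find page) gives the physical address]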


SOC interconnect
interconnecting multiple active agents requires
bandwidth: capacity to transmit information (bps)
protocol: logic for non-interfering message transmission

bus
AMBA (Adv. Microcontroller Bus Architecture) from ARM,
widely used for SOC
bus performance: can determine system performance

network on chip
array of switches
statically switched: eg mesh
dynamically switched: eg crossbar
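
As a rough worked example of the bandwidth requirement, the sketch below computes the peak bandwidth of a shared bus and the hop count between two switches of a mesh. The bus width, clock rate and agent count are hypothetical figures; the model assumes one transfer per clock cycle and dimension-ordered routing, and it is not specific to AMBA.

```python
def peak_bus_bandwidth_bps(width_bits, clock_hz):
    """Peak bandwidth, assuming one bus-wide transfer every clock cycle."""
    return width_bits * clock_hz


def fair_share_bps(width_bits, clock_hz, n_agents):
    """Bandwidth per agent if n_agents share the bus equally (idealised arbitration)."""
    return peak_bus_bandwidth_bps(width_bits, clock_hz) / n_agents


def mesh_hops(src, dst):
    """Switch-to-switch hops in a 2D mesh with dimension-ordered routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


print(peak_bus_bandwidth_bps(32, 100_000_000))   # hypothetical 32-bit bus at 100 MHz
print(fair_share_bps(32, 100_000_000, 4))        # four active agents on that bus
print(mesh_hops((0, 0), (2, 3)))
```

The shared bus divides a fixed capacity among its agents, which is why bus performance can determine system performance as more agents are added.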

Design cost: product economics


increasingly product cost determined by
design costs, including verification
not marginal cost to produce

manage complexity in die technology by
engineering effort
engineering cleverness

design effort often dictated by product volume

[Figure: balance between basic physical tradeoffs and design time and effort; the balance point depends on n, number of units]


Design complexity

processors


Cost: product program vs engineering
[Figure: breakdown of product cost. Labels: product cost, fixed costs, variable costs, engineering costs, labor costs, chip design, verify & test, software, CAD support, CAD programs, mask costs, capital equipment, fixed project costs, manufacturing costs, marketing/sales/administration]


Example: two scenarios


fixed costs Kf, support costs 0.1 x function(n), and
variable costs Kv x n, so
total product cost = Kf + 0.1 x function(n) + Kv x n

design gets more complex, while production costs decrease
Kf increases while Kv decreases
if same price, requires higher volumes to break even

when compared with 1995, in 2015
Kf increased by 10 times
Kv decreased by the same factor (10 times)
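
The break-even argument can be checked numerically. The sketch below is illustrative only: the price and the 1995 values of Kf and Kv are made-up numbers, the support-cost term is left out of the break-even calculation, and break_even_volume is a hypothetical helper name.

```python
import math


def total_cost(n, kf, kv, support=lambda n: 0.0):
    """Kf + 0.1 x function(n) + Kv x n; the support function is unspecified, so it defaults to 0."""
    return kf + 0.1 * support(n) + kv * n


def break_even_volume(price, kf, kv):
    """Smallest n with price x n >= Kf + Kv x n (support costs ignored)."""
    if price <= kv:
        raise ValueError("price must exceed the variable cost per unit")
    return math.ceil(kf / (price - kv))


PRICE = 20                       # hypothetical selling price, same in both years
KF_1995, KV_1995 = 1_000_000, 10 # hypothetical 1995 costs
KF_2015, KV_2015 = KF_1995 * 10, KV_1995 / 10   # Kf up 10 times, Kv down 10 times

n_1995 = break_even_volume(PRICE, KF_1995, KV_1995)
n_2015 = break_even_volume(PRICE, KF_2015, KV_2015)
print(n_1995, n_2015)
```

With these assumed figures the break-even volume grows roughly fivefold at the same price, which illustrates why higher fixed costs demand higher volumes even when each unit is cheaper to make.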

More recent: higher NRE
[Figure: comparison of the 1995 and 2015 scenarios]


IP: Intellectual Property


Answers to Unassessed Coursework 5


1. rdl_1 R = snd [-]^-1 ; R
   rdl_(n+1) R = snd apr_n^-1 ; rsh ; fst (rdl_n R) ; R
2. P0 = rdl_n Pcell ; π1
   <<s,x>, a> Pcell <sx+a, x>
3. rdl_n R = row_n (Ri ; π2^-1) ; π2
   P1 = loop (row_n Pcell1 ; fst map_n D) ; π1
   <<s,x>, a> Pcell1 <a, <sx+a, x>>
4. loop (row_n R) = (loop R)^n
   Proof: induction on n
   (see www.doc.ic.ac.uk/~wl/papers/scp90.pdf)
   P1 = P2 ; [D,D]^-n
   P2 = (loop (Pcell1 ; [D,[D,D]]))^n
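
Read functionally, the cell in answer 2 performs one step of Horner's rule for polynomial evaluation. The sketch below is an interpretation, not part of the coursework answer: it assumes that rdl applies the cell to the inputs from left to right, that the final stage of P0 selects the first component of the pair, and it treats the relations as ordinary functions. The names pcell and p0 mirror the answer but are otherwise illustrative.

```python
from functools import reduce


def pcell(state, a):
    """<<s,x>, a> Pcell <sx+a, x>: one Horner step."""
    s, x = state
    return (s * x + a, x)


def p0(s, x, coeffs):
    """Left reduction of pcell over the coefficients, then keep the first component."""
    result, _ = reduce(pcell, coeffs, (s, x))
    return result


# Coefficients 1, 2, 3 at x = 2 with initial s = 0: ((0*2 + 1)*2 + 2)*2 + 3
value = p0(0, 2, [1, 2, 3])
print(value)
```

This functional reading ignores the delay elements D that appear in P1 and P2, which concern timing rather than the value computed.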
