Jason Miller
[Figure: single-core performance over time (log scale, 1992-2010): superscalar gains flatten due to diminishing returns from single-CPU mechanisms (pipelining, caching, etc.), wire delays, and power envelopes.]
Multicore Scaling Trends

Today:
A few large cores on each chip
Diminishing returns prevent cores from getting more complex
Only option for future scaling is to add more cores
Still some shared global structures: bus, L2 caches

Tomorrow:
100s to 1000s of simpler cores [S. Borkar, Intel, 2007]
Simple cores are more power and area efficient
Global structures do not scale; all resources must be distributed
[Figure: today's chip: processors (p) with private caches (c) sharing a BUS and L2 cache. Tomorrow's chip: a tiled array where each tile pairs a processor (p) and memory (m) with a switch, connected point-to-point.]
The Future of Multicore
Number of cores doubles every 18 months
Parallelism replaces clock frequency scaling and core complexity
Resulting challenges:
Scalability
Programming
Power
Multicore Challenges
Scalability
How do we turn additional cores into additional performance?
Must accelerate single apps, not just run more apps in parallel
Efficient core-to-core communication is crucial
Architectures that grow easily with each new technology generation
Programming
Traditional parallel programming techniques are hard
Parallel machines were rare and used only by rocket scientists
Multicores are ubiquitous and must be programmable by anyone
Power
Already a first-order design constraint
More cores and more communication mean more power
Previous tricks (e.g., lowering Vdd) are running out of steam
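As background (standard CMOS reasoning, not from the slides): dynamic power scales roughly as P_dyn ≈ α · C · V_dd² · f, so each supply-voltage reduction once bought quadratic power savings. With V_dd now approaching the transistor threshold voltage, and leakage growing, that lever is nearly exhausted.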
Multicore Communication Today
Bus-based Interconnect
Single shared resource
Uniform communication cost
Communication through memory
Doesn't scale to many cores due to contention and long wires (a toy contention model follows below)
Scalable up to about 8 cores
[Figure: processors (p) with private caches (c) share a BUS, an L2 cache, and DRAM.]
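To make the contention point concrete, here is a toy model (the bus bandwidth figure is invented for illustration, not taken from any real part): with a single shared resource, per-core bandwidth falls as 1/N, which is why buses top out around 8 cores.

```c
#include <stdio.h>

/* Toy model: one shared bus of fixed bandwidth divided among N cores.
 * BUS_GBPS is an illustrative number, not from a real machine. */
#define BUS_GBPS 25.6

int main(void) {
    for (int n = 2; n <= 64; n *= 2)
        printf("%2d cores: %5.2f GB/s per core\n", n, BUS_GBPS / n);
    return 0;
}
```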
Multicore Communication Tomorrow
Point-to-Point Mesh Network
Examples: MIT Raw, Tilera TILEPro64, Intel Terascale Prototype
Neighboring tiles are connected
Distributed communication resources
Non-uniform costs: latency depends on distance (see the hop-latency sketch below)
Encourages direct communication
More energy efficient than bus
Scalable to hundreds of cores
[Figure: a 4x4 tiled mesh; each tile pairs a processor (p) and memory (m) with a switch; DRAM attaches at the edges.]
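Since the slides give no formula, here is a minimal sketch of the cost model a 2D mesh implies: latency grows with the Manhattan (hop) distance between tiles. The mesh size and per-hop delay are illustrative assumptions, not numbers from Raw or TILEPro64.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical 2D-mesh cost model: latency is proportional to the
 * Manhattan distance between tiles.  HOP_CYCLES is an illustrative
 * per-hop router+link delay, not a measured figure. */
#define MESH_DIM   8          /* 8x8 = 64 tiles */
#define HOP_CYCLES 2

static int mesh_latency(int src, int dst) {
    int sx = src % MESH_DIM, sy = src / MESH_DIM;
    int dx = dst % MESH_DIM, dy = dst / MESH_DIM;
    return (abs(sx - dx) + abs(sy - dy)) * HOP_CYCLES;
}

int main(void) {
    /* Neighboring tiles are cheap; opposite corners are not. */
    printf("tile 0 -> tile 1 : %d cycles\n", mesh_latency(0, 1));
    printf("tile 0 -> tile 63: %d cycles\n", mesh_latency(0, 63));
    return 0;
}
```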
Multicore Programming Trends
Improving Programmability
Observations:
ATAC Architecture
Electrical Mesh Interconnect
[Figure: a 4x4 array of tiles (processor p plus memory m) connected by an electrical mesh and overlaid with an optical waveguide.]
Optical Broadcast Network
Electronic-photonic integration using standard CMOS process
Cores communicate via an optical WDM broadcast-and-select network
Each core sends on its own dedicated wavelength using modulators
Cores can receive from some set of N cores (see the select-filter sketch below)
[Figure: N cores attached to a shared optical broadcast waveguide.]
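To make broadcast-and-select concrete, here is a minimal sketch under assumptions of mine (the structure and names are illustrative, not the ATAC hardware): core i modulates wavelength i, and each receiver's filter is a bitmask over the sender wavelengths it has selected.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a WDM broadcast-and-select network for up to
 * 64 cores: core i transmits on wavelength i; a receiver "selects"
 * wavelengths with a bitmask filter. */
typedef struct {
    uint64_t select_mask;        /* bit i set => receive wavelength i */
} receiver_t;

/* A broadcast is one send: every receiver whose filter selects the
 * sender's wavelength sees the word. */
static void broadcast(int sender, uint32_t word, receiver_t rx[], int nrx) {
    for (int i = 0; i < nrx; i++)
        if (rx[i].select_mask & (1ULL << sender))
            printf("core %d received 0x%08x from core %d\n",
                   i, (unsigned)word, sender);
}

int main(void) {
    receiver_t rx[4] = {{0}};
    rx[1].select_mask = 1ULL << 3;   /* core 1 listens to core 3 */
    rx[2].select_mask = 1ULL << 3;   /* core 2 listens to core 3 */
    broadcast(3, 0xdeadbeef, rx, 4); /* one send, two deliveries */
    return 0;
}
```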
Optical bit transmission
[Figure: a 32-bit word is latched by sender flip-flops, modulated onto the data waveguide, and captured into per-receiver FIFOs between processor cores.]
System Capabilities and Performance
Programming ATAC
Cores can directly communicate with any other core in one hop (<2 ns)
Broadcasts require just one send
No complicated routing on network required
Cheap broadcast enables frequent global communications
Broadcast-based cache update/remote store protocol (sketched in code after this list)
All "subscribers" are notified when a writing core issues a store ("publish")
Uniform communication latency simplifies scheduling
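From software, the publish/subscribe protocol might look like the following. This is a minimal simulation of the idea; the API names (atac_subscribe, atac_publish) and the data layout are my assumptions, not the ATAC interface.

```c
#include <stdint.h>
#include <stdio.h>

enum { NCORES = 16 };

/* Hypothetical broadcast-based update protocol: each core keeps a local
 * copy of a published variable; a store by the writer is one broadcast
 * that refreshes every subscriber's copy in a single hop. */
typedef struct {
    uint32_t copy[NCORES];   /* per-core cached copies */
    uint64_t subscribers;    /* bit i set => core i subscribed */
} published_t;

static void atac_subscribe(published_t *p, int core) {
    p->subscribers |= 1ULL << core;
}

static void atac_publish(published_t *p, uint32_t value) {
    /* One send: all subscribers are notified and updated at once,
     * with no invalidate/refetch round trips. */
    for (int i = 0; i < NCORES; i++)
        if (p->subscribers & (1ULL << i))
            p->copy[i] = value;
}

int main(void) {
    published_t flag = {{0}, 0};
    atac_subscribe(&flag, 3);
    atac_subscribe(&flag, 7);
    atac_publish(&flag, 42);   /* the writer's store is the "publish" */
    printf("core 3 sees %u, core 7 sees %u\n",
           (unsigned)flag.copy[3], (unsigned)flag.copy[7]);
    return 0;
}
```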
Communication-centric Computing
ATAC reduces off-chip memory accesses, and hence energy and latency
Backup Slides
What Does the Future Look Like?
Corollary of Moore's Law: the number of cores will double every 18 months (worked out in the sketch below)
[Figure: projected core counts, 2002-2014.]
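A quick way to see what 18-month doubling implies (the 2002 baseline of one core is my assumption for illustration): cores(year) = base · 2^((year − base_year) / 1.5), giving 1, 4, 16, 64, and 256 cores at the plotted years.

```c
#include <stdio.h>
#include <math.h>

/* Illustration of the corollary: cores double every 18 months, i.e.
 * cores(year) = base * 2^((year - base_year) / 1.5).  The 2002
 * one-core baseline is an assumption, not a figure from the slides. */
int main(void) {
    const double base_year = 2002.0, base_cores = 1.0;
    for (int year = 2002; year <= 2014; year += 3)
        printf("%d: ~%.0f cores\n", year,
               base_cores * pow(2.0, (year - base_year) / 1.5));
    return 0;
}
```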
[Figure: ATAC tile detail: processor (Proc) and cache ($) connect through a HUB to the optical network (ONet), broadcast network (BNet), electrical networks (ENet/NET), a directory cache (Dir $), and memory.]