Академический Документы
Профессиональный Документы
Культура Документы
Arvind
Computer Science and Artificial Intelligence Lab
M.I.T.
1
Inspiration: Jack Dennis
General purpose parallel machines based on a
dataflow graph model of computation
• Active Messages
David Culler
• Compiling for FPGAs, ...
Wim Bohm, Seth Goldstein...
• Synchronous dataflow
– Lustre, Signal
Ed Lee @ Berkeley
• Musings
T T
T F
+ T F
T T
T F
+ T F
• • • • • •
Before After
Conditional Loop
Bounded Loop
T F F
T F
p
f g T F
T Needed for f
F
resource
management
ISCA, Madison, WI, June 6, 2005 Arvind - 8
Outline
• Dataflow graphs
– A clean model of parallel computation
• Static Dataflow Machines
– Not general-purpose enough
• Dynamic Dataflow Machines
– As easy to build as a simple pipelined processor
• The software view
– The memory model: I-structures
• Monsoon and its performance
• Musings
Receive
Instruction Templates
1 Op dest1 dest2 p1 src1 p2 src2
2
.
.
.
FU FU FU FU FU
Send
<s1, p1, v1>, <s2, p2, v2>
• Musings
frame instruction
pointer pointer
op r d1,d2 Instruction
ip Fetch
Code
fp+r Operand
Fetch Token
Frames Queue
ALU
Form
Token
Network Network
ISCA, Madison, WI, June 6, 2005 Arvind - 16
Temporary Registers &
Threads
Robert Iannucci
op r S1,S2 Instruction
Code Fetch
n sets of
registers
(n = pipeline Operand
depth) Frames Fetch Token
Queue
Registers
Registers evaporate
ALU
when an instruction
thread is broken
Robert Iannucci
Form
Token
Registers are also
Network Network
used for exceptions &
interrupts
ISCA, Madison, WI, June 6, 2005 Arvind - 17
Actual Monsoon Pipeline:
Eight Stages
32
Instruction Instruction Fetch
Memory
Effective Address
3
Presence Presence Bit
bits Operation User System
Queue Queue
72
Frame Frame
Memory Operation
72 72 144
2R, 2W
ALU
144 Network
Registers
Form Token
Like standard
call/return
but caller & 1: n:
callee can be
Fork Graph for f
active
simultaneously
token in frame 0 change Tag 0 change Tag 1
token in frame 1
• Musings
g: h:
active
threads
asynchronous
loop and parallel
at all levels
ISCA, Madison, WI, June 6, 2005 Arvind - 22
Id World
implicit parallelism
Id
TTDA Monsoon *T
*T-Voyager
• K. Eknadham (IBM)
• Wim Bohm (Colorado)
• Joe Stoy (Oxford)
• ...
Boon S. Ang Jamey Hicks Derek Chiou
Steve Heller
I-store
• Musings
Monsoon 16-node
Processor Fat Tree I-structure
64-bit
100 MB/sec
10M tokens/sec Unix Box 4M 64-bit words
Id World Software
Tony Dahbura
ISCA, Madison, WI, June 6, 2005 Arvind - 28
Id Applications on Monsoon @ MIT
• Numerical
– Hydrodynamics - SIMPLE
– Global Circulation Model - GCM
– Photon-Neutron Transport code -GAMTEB
– N-body problem
• Symbolic
– Combinatorics - free tree matching,Paraffins
– Id-in-Id compiler
• System
– I/O Library
– Heap Storage Allocator on Monsoon
• Fun and Games
– Breakout
– Life
– Spreadsheet
GAMTEB-9C
40K particles 17:20 10:42 5:36 5:36
1M particles 7:13:20 4:17:14 2:36:00 2:22:00
SIMPLE-100
1 iterations :19 :15 :10 :06
1K iterations 4:48:00 1:19:49
hours:minutes:seconds
ISCA, Madison, WI, June 6, 2005 Arvind - 31
Monsoon Speed Up
Results
Boon Ang, Derek Chiou, Jamey Hicks
speed up critical path
(millions of cycles)
1pe 2pe 4pe 8pe 1pe 2pe 4pe 8pe
Matrix Multiply 1.00 1.99 3.90 7.74 1057 531 271 137
500 x 500
• Musings
and thanks to
41
DFGs vs CDFGs
• Both Dataflow Graphs and Control DFGs had the
goal of structured, well-formed, compositional,
executable graphs