Академический Документы
Профессиональный Документы
Культура Документы
Overview
Multicore/Manycore
SIMD
Programming with millions of threads
2/28
The CUDA Programming Model
3/28
Multicore and Manycore
Multicore Manycore
Multicore: yoke of oxen
Each core optimized for executing a single thread
Manycore: flock of chickens
Cores optimized for aggregate throughput, deemphasizing
individual performance
4/28
Multicore & Manycore, cont.
Specifications Core i7 960 GTX285
GTX285 5/28
SIMD: Neglected Parallelism
6/28
What to do with SIMD?
7/28
CUDA
9/28
CUDA Summary
10/28
A Parallel Scripting Language
What is a scripting language?
Lots of opinions on this
I’m using an informal definition:
▪ A language where performance is happily traded for productivity
Weak performance requirement of scalability
▪ “My code should run faster tomorrow”
What is the analog of today’s scripting languages for manycore?
11/28
Data Parallelism
12/28
Warning: Evolving Project
13/28
Copperhead = Cu + python
14/28
Copperhead is not Pure Python
Python
Copperhead
15/28
Saxpy: Hello world
def saxpy(a, x, y):
return map(lambda xi, yi: a*xi + yi, x, y)
c=a+b
Num52 Num52 Num52
Copperhead includes function templates for intrinsics like
add, subtract, map, scan, gather
Expressions are mapped against templates
Every variable starts out with a unique generic type, then
types are resolved by union find on the abstract syntax tree
Tuple and function types are also inferred
17/28
Data parallelism
18/28
Copperhead primitives
map
reduce
Scans:
scan, rscan, segscan, rsegscan
exscan, exrscan, exsegscan, exrsegscan
Shuffles:
shift, rotate, gather, scatter
19/28
Implementing Copperhead
Module(
None,
def saxpy(a, x, y): Stmt(
Function(
None,
return map(lambda xi, yi: a*xi + yi, x, y) 'saxpy',
['a', 'x', 'y'],
0,
None,
Stmt(
The Copperhead Return(
CallFunc(
)
by walking the AST )
)
)
20/28
Compiling Copperhead to CUDA
21/28
Saxpy Revisited
def saxpy(a, x, y):
return map(lambda xi, yi: a*xi + yi, x, y)
Reduction
phase 0
phase 1
Scan
phase 0
phase 1
phase 2
23/28
Copperhead to CUDA, cont.
B = reduce(map(A))
D = reduce(map(C))
phase 0
phase 1
25/28
Split
def split(input, value):
0 flags = map(lambda a: 1 if a <= value else 0, input)
0 notFlags = map(lambda a: not a, flags)
phases
26/28
Interpreting to Copperhead
27/28
Future Work
28/28