## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Manuel M. T. Chakravarty

University of New South Wales

JOINT WORK WITH

Gabriele Keller Sean Lee

Monday, 7 September 2009

General Purpose GPU Programming (GPGPU)

Monday, 7 September 2009

**MODERN GPUS ARE FREELY PROGRAMMABLE
**

But no function pointers & limited recursion

Monday, 7 September 2009

**MODERN GPUS ARE FREELY PROGRAMMABLE
**

But no function pointers & limited recursion

Monday, 7 September 2009

**Very Different Programming Model
**

(Compared to multicore CPUs)

Monday, 7 September 2009

Quadcore Xeon CPU

Tesla T10 GPU

h

red nd u

of s

ds rea th

ore /c

**MASSIVELY PARALLEL PROGRAMS NEEDED
**

Tens of thousands of dataparallel threads

Monday, 7 September 2009

Programming GPUs is hard! Why bother?

Monday, 7 September 2009

**Reduce power consumption!
**

✴ GPU achieves 20x better performance/Watt (judging by peak performance) ✴ Speedups between 20x to 150x have been observed in real applications

Monday, 7 September 2009

Sparse Matrix Vector Multiplication 1000

Time (milliseconds)

100

10

1

0.1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

**Number of non-zero elements (million)
**

Plain Haskell, CPU only (AMD Sempron) Haskell EDSL on a GPU (GeForce 8800GTS) Plain Haskell, CPU only (Intel Xeon) Haskell EDSL on a GPU (Tesla S1070 x1)

**Prototype of code generator targeting GPUs
**

Computation only, without CPU ⇄ GPU transfer

Monday, 7 September 2009

Challenges

Code must be massively dataparallel Control structures are limited

**‣ No function pointers ‣ Very limited recursion
**

Software-managed cache, memory-access patterns, etc. Portability...

Monday, 7 September 2009

Tesla T10 GPU

**OTHER COMPUTE ACCELERATOR ARCHITECTURES
**

Goal: portable data parallelism

Monday, 7 September 2009

Tesla T10 GPU

**OTHER COMPUTE ACCELERATOR ARCHITECTURES
**

Goal: portable data parallelism

Monday, 7 September 2009

Tesla T10 GPU

**OTHER COMPUTE ACCELERATOR ARCHITECTURES
**

Goal: portable data parallelism

Monday, 7 September 2009

Tesla T10 GPU

**OTHER COMPUTE ACCELERATOR ARCHITECTURES
**

Goal: portable data parallelism

Monday, 7 September 2009

**Data.Array.Accelerate
**

Collective operations on multi-dimensional regular arrays Embedded DSL

**‣ Restricted control ﬂow ‣ First-order GPU code
**

Generative approach based on combinator templates Multiple backends

Monday, 7 September 2009

**Data.Array.Accelerate
**

ta p Collective operations on multi-dimensional regular e da ssiv ma arrays elism arall

✓

Embedded DSL

**‣ Restricted control ﬂow ‣ First-order GPU code
**

Generative approach based on combinator templates Multiple backends

Monday, 7 September 2009

**Data.Array.Accelerate
**

ta p Collective operations on multi-dimensional regular e da ssiv ma arrays elism arall

✓

Embedded DSL

**‣ Restricted controloﬂowst trol n
**

✓ ‣ First-order GPU code

l ed c imit

ures ruct

**Generative approach based on combinator templates Multiple backends
**

Monday, 7 September 2009

**Data.Array.Accelerate
**

ta p Collective operations on multi-dimensional regular e da ssiv ma arrays elism arall

✓

Embedded DSL

**‣ Restricted controloﬂowst trol n
**

✓ ‣ First-order GPU code

l ed c imit

ures ruct erns patt

**Generative approach ne tu based on combinator ndtemplates ✓ ha Multiple backends
**

Monday, 7 September 2009

cess d ac

**Data.Array.Accelerate
**

ta p Collective operations on multi-dimensional regular e da ssiv ma arrays elism arall

✓

Embedded DSL

**‣ Restricted controloﬂowst trol n
**

✓ ‣ First-order GPU code

l ed c imit

ures ruct erns patt

**Generative approach ne tu based on combinator ndtemplates ✓ ha
**

bility Multiple backends orta p ✓

Monday, 7 September 2009

cess d ac

**import Data.Array.Accelerate
**

Dot product

dotp :: Vector Float -> Vector Float -> Acc (Scalar Float) dotp xs ys = let xs' = use xs ys' = use ys in fold (+) 0 (zipWith (*) xs' ys')

Monday, 7 September 2009

**import Data.Array.Accelerate
**

HaskellDot product array

dotp :: Vector Float -> Vector Float -> Acc (Scalar Float) dotp xs ys = let xs' = use xs ys' = use ys in fold (+) 0 (zipWith (*) xs' ys')

Monday, 7 September 2009

**import Data.Array.Accelerate
**

HaskellDot product array

dotp :: Vector Float -> Vector Float -> Acc (Scalar Float) dotp xs ys EDSL array = = let desc. of array comps xs' = use xs ys' = use ys in fold (+) 0 (zipWith (*) xs' ys')

Monday, 7 September 2009

**import Data.Array.Accelerate
**

HaskellDot product array

dotp :: Vector Float -> Vector Float -> Acc (Scalar Float) dotp xs ys EDSL array = = let desc. of array comps xs' = use xs ys' = use ys in Lift Haskell arrays into fold (+) 0 (zipWith (*) xs' ys') EDSL — may trigger

host➙device transfer

Monday, 7 September 2009

**import Data.Array.Accelerate
**

HaskellDot product array

dotp :: Vector Float -> Vector Float -> Acc (Scalar Float) dotp xs ys EDSL array = = let desc. of array comps xs' = use xs ys' = use ys in Lift Haskell arrays into fold (+) 0 (zipWith (*) xs' ys') EDSL — may trigger

EDSL array computations

host➙device transfer

Monday, 7 September 2009

**import Data.Array.Accelerate
**

Sparse-matrix vector multiplication

type SparseVector a = Vector (Int, a) type SparseMatrix a = (Segments, SparseVector a) smvm :: Acc (SparseMatrix Float) -> Acc (Vector Float) -> Acc (Vector Float) smvm (segd, smat) vec = let (inds, vals) = unzip smat vecVals = backpermute (shape inds) (\i -> inds!i) vec products = zipWith (*) vecVals vals in foldSeg (+) 0 products segd

Monday, 7 September 2009

**import Data.Array.Accelerate
**

[0, 0, 6.0, 0, 7.0] ≈ [(2, 6.0), (4, 7.0)]

**Sparse-matrix vector multiplication
**

type SparseVector a = Vector (Int, a) type SparseMatrix a = (Segments, SparseVector a) smvm :: Acc (SparseMatrix Float) -> Acc (Vector Float) -> Acc (Vector Float) smvm (segd, smat) vec = let (inds, vals) = unzip smat vecVals = backpermute (shape inds) (\i -> inds!i) vec products = zipWith (*) vecVals vals in foldSeg (+) 0 products segd

Monday, 7 September 2009

**import Data.Array.Accelerate
**

[0, 0, 6.0, 0, 7.0] ≈ [(2, 6.0), (4, 7.0)]

**Sparse-matrix vector multiplication
**

type SparseVector a = Vector (Int, a) type SparseMatrix a = (Segments, SparseVector a) smvm :: Acc (SparseMatrix Float) -> Acc (Vector Float) [[10, 20], [], [30]] ≈ ([2, 0, 1], [10, 20, 30]) -> Acc (Vector Float) smvm (segd, smat) vec = let (inds, vals) = unzip smat vecVals = backpermute (shape inds) (\i -> inds!i) vec products = zipWith (*) vecVals vals in foldSeg (+) 0 products segd

Monday, 7 September 2009

Architecture of Data.Array.Accelerate

Monday, 7 September 2009

map (\x -> x + 1) arr

Monday, 7 September 2009

map (\x -> x + 1) arr

eify & HO R

de Bruijn AS ->

Map (Lam (Add `PrimApp` (ZeroIdx, Const 1))) arr

Monday, 7 September 2009

map (\x -> x + 1) arr

eify & HO R

de Bruijn AS ->

Map (Lam (Add `PrimApp` (ZeroIdx, Const 1))) arr

Recover sharing (CSE or O bserve)

Monday, 7 September 2009

map (\x -> x + 1) arr

eify & HO R

de Bruijn AS ->

Map (Lam (Add `PrimApp` (ZeroIdx, Const 1))) arr

Recover sharing (CSE or O bserve)

ation s ptimi n) O (Fusio

Monday, 7 September 2009

map (\x -> x + 1) arr

eify & HO R

de Bruijn AS ->

Map (Lam (Add `PrimApp` (ZeroIdx, Const 1))) arr

**Recover sharing (CSE or O bserve)
**

Code generation

ation s ptimi n) O (Fusio

__global__ void kernel (float *arr, int n) {...

Monday, 7 September 2009

map (\x -> x + 1) arr

eify & HO R

de Bruijn AS ->

Map (Lam (Add `PrimApp` (ZeroIdx, Const 1))) arr

**Recover sharing (CSE or O bserve)
**

Code generation

ation s ptimi n) O (Fusio

__global__ void kernel (float *arr, int n) {...

nvcc

Monday, 7 September 2009

map (\x -> x + 1) arr

eify & HO R

de Bruijn AS ->

Map (Lam (Add `PrimApp` (ZeroIdx, Const 1))) arr

**Recover sharing (CSE or O bserve)
**

Code generation

ation s ptimi n) O (Fusio

age ack ns p gi plu

__global__ void kernel (float *arr, int n) {...

nvcc

Monday, 7 September 2009

**The API of Data.Array.Accelerate
**

(The main bits)

Monday, 7 September 2009

Array types

data Array dim e — regular, multi-dimensional arrays

type DIM0 = () type DIM1 = Int type DIM2 = (Int, Int) ⟨and so on⟩ type Scalar e = Array DIM0 e type Vector e = Array DIM1 e

Monday, 7 September 2009

Array types

data Array dim e — regular, multi-dimensional arrays

type DIM0 = () type DIM1 = Int type DIM2 = (Int, Int) ⟨and so on⟩ type Scalar e = Array DIM0 e type Vector e = Array DIM1 e

EDSL forms

data Exp e data Acc a — scalar expression form — array expression form

Monday, 7 September 2009

Array types

data Array dim e — regular, multi-dimensional arrays

type DIM0 = () type DIM1 = Int type DIM2 = (Int, Int) ⟨and so on⟩ type Scalar e = Array DIM0 e type Vector e = Array DIM1 e

EDSL forms

data Exp e data Acc a — scalar expression form — array expression form

Classes

class Elem e class Elem ix => Ix ix

Monday, 7 September 2009

— scalar and tuples types — unit and integer tuples

Scalar operations

instance Num (Exp e) instance Integral (Exp e) ⟨and so on⟩ (==*), (/=*), (<*), (<=*), (>*), (>=*), min, max (&&*), (||*), not — overloaded arithmetic

— comparisons — logical operators

(?) :: Elem t — conditional expression => Exp Bool -> (Exp t, Exp t) -> Exp t (!) :: (Ix dim, Elem e) — scalar indexing => Acc (Array dim e) -> Exp dim -> Exp e shape :: Ix dim — yield array shape => Acc (Array dim e) -> Exp dim

Monday, 7 September 2009

**Collective array operations — creation
**

— use an array from Haskell land use :: (Ix dim, Elem e) => Array dim e -> Acc (Array dim e) — create a singleton unit :: Elem e => Exp e -> Acc (Scalar e) — multi-dimensional replication replicate :: (SliceIx slix, Elem e) => Exp slix -> Acc (Array (Slice slix) e) -> Acc (Array (SliceDim slix) e) — Example: replicate (All, 10, All) twoDimArr

Monday, 7 September 2009

**Collective array operations — slicing
**

— slice extraction slice :: (SliceIx slix, Elem e) => Acc (Array (SliceDim slix) e) -> Exp slix -> Acc (Array (Slice slix) e) — Example: slice (5, All, 7) threeDimArr

Monday, 7 September 2009

**Collective array operations — mapping
**

map :: => -> -> (Ix dim, Elem a, Elem b) (Exp a -> Exp b) Acc (Array dim a) Acc (Array dim b) (Ix dim, Elem a, Elem b, Elem c) (Exp a -> Exp b -> Exp c) Acc (Array dim a) Acc (Array dim b) Acc (Array dim c)

zipWith :: => -> -> ->

Monday, 7 September 2009

**Collective array operations — reductions
**

fold :: => -> -> -> scan :: => -> -> -> (Ix dim, Elem a) (Exp a -> Exp a -> Exp a) Exp a Acc (Array dim a) Acc (Scalar a) — associative

Elem a (Exp a -> Exp a -> Exp a) — associative Exp a Acc (Vector a) (Acc (Vector a), Acc (Scalar a))

Monday, 7 September 2009

**Collective array operations — permutations
**

permute :: => -> -> -> -> (Ix dim, Ix dim', Elem a) (Exp a -> Exp a -> Exp a) Acc (Array dim' a) (Exp dim -> Exp dim') Acc (Array dim a) Acc (Array dim' a) (Ix dim, Ix dim', Elem a) Exp dim' (Exp dim' -> Exp dim) Acc (Array dim a) Acc (Array dim' a)

backpermute :: => -> -> ->

Monday, 7 September 2009

Dense Matrix-Matrix Multiplication 1000000 100000 Time (milliseconds) 10000 1000 100 10 1

128

256

512

1024

**size of square NxN matrix
**

Unboxed Haskell arrays Delayed arrays Plain C Mac OS vecLib

**Regular arrays in package dph-seq
**

@ 3.06 GHz Core 2 Duo (with GHC 6.11)

Monday, 7 September 2009

Conclusion

EDSL for processing multi-dimensional arrays Collective array operations (highly data parallel) Support for multiple backends Status:

**‣ Very early version on Hackage (only interpreter)
**

http://hackage.haskell.org/package/accelerate

**‣ Currently porting GPU backend over ‣ Looking for backend contributors
**

Monday, 7 September 2009

- Informática Educativauploaded byGonzalo Piutrin Betancourt
- Conexion AMRuploaded byMiguel Ferrer
- M SP7I2B Manualeuploaded byIgor Snjezana Doko
- OpenCLvsCPU Matrix Calculations_lecture3 Nice Readuploaded bygeorge_cpp2
- Multi Pro Ces Adoresuploaded byAnthonyCarreraMartinez
- 01 Iag Genesis Informaticauploaded byAnderson Andree
- cid. blink y vir.uploaded byGer Dominguez
- Computação de Alto Desempenho utilizando GPUuploaded byLuander Henrique
- Postres y Sus Recetasuploaded bycomidastipicas12
- Password Recovery cisco 2960uploaded bycdaguilarm
- Performance of Various Computer Using Standard Linear Equations Softwareuploaded byKuldeep Singh
- Tarjeta Madre BIOSTAR TZ77XE4uploaded byRodrigo L. Punk
- Jennifer (Autoguardado)uploaded byVictor
- 01 MTS-6000A Port Mainframeuploaded byOrnato Nogueira
- KAMUS KOMPUTERuploaded byadi
- YTRIVINO - Formato de Entrega de Equiposuploaded byYeison Leal Gonzalez
- Crowd Floweruploaded byAaromMarquez
- Equipos MD1200 y Workstationuploaded byJuan Salas
- 10º Modelo Hallazgo y La Evaluacion de Aclaracionesuploaded byLizMabelCA
- Segunda Semana 02 Finaluploaded byHugo Aguero
- pdf.pdfuploaded bychely
- leuploaded byPanchita Cola
- Lista Multiniveluploaded byjesus
- 016 - Introducción a La Arquitectura Chasis de Cisco CRS-1uploaded byrosalesjesus3
- album digitaluploaded byapi-381489651
- Primera Guia Tercer Gradouploaded byMelismar
- Dimension-e520 Setup Guide en-usuploaded byChrisTheodorou
- cuadros.dom1uploaded byyui
- Especificacion Cfe v6700-55_ Siscoprommuploaded byHugo Morales
- configuracion Eltekv 1uploaded byEdder Jacobo

- Practical Haskell Programming: scripting with typesuploaded byDon Stewart
- Building a business with Haskell: Case Studies: Cryptol, HaLVM and Copilotuploaded byDon Stewart
- Evaluation strategies and synchronization: things to watch foruploaded byDon Stewart
- Galois Tech Talk: A Scalable Io Manager for GHCuploaded byDon Stewart
- Modern Benchmarking in Haskelluploaded byDon Stewart
- Loop Fusion in Haskelluploaded byDon Stewart
- Engineering Large Projects in a Functional Languageuploaded byDon Stewart
- Hackage, Cabal and the Haskell Platform: The Second Yearuploaded byDon Stewart
- Concurrent Orchestration in Haskelluploaded byDon Stewart
- The Semantics of Asynchronous Exceptionsuploaded byDon Stewart
- An Introduction to Communicating Haskell Processesuploaded byDon Stewart
- Multicore programming in Haskelluploaded byDon Stewart
- Multicore Haskell Now!uploaded byDon Stewart
- Supercompilation for Haskelluploaded byDon Stewart
- The Haskell Platform: Status Reportuploaded byDon Stewart
- The Birth of the Industrial Haskell Group : CUFP 2009uploaded byDon Stewart
- Multicore Programming in Haskell Now!uploaded byDon Stewart
- Multicore Haskell Now!uploaded byDon Stewart
- Engineering Large Projects in Haskell: A Decade of FP at Galoisuploaded byDon Stewart
- Haskell: Batteries Includeduploaded byDon Stewart
- The Design and Implementation of xmonaduploaded byDon Stewart
- A Wander Through GHC's New IO Libraryuploaded byDon Stewart
- Domain Specific Languages for Domain Specific Problemsuploaded byDon Stewart
- Specialising Generators for High-Performance Monte-Carlo Simulation ... in Haskelluploaded byDon Stewart
- Stream Fusion for Haskell Arraysuploaded byDon Stewart
- Improving Data Structuresuploaded byDon Stewart

Close Dialog## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Loading