
Shared-memory Parallel Programming with Cilk

John Mellor-Crummey
Department of Computer Science Rice University johnmc@rice.edu

COMP 422

Lecture 4 17 January 2013

Topics for Today



Shared-memory systems
Cilk overview
Example programs
Data races and non-determinism
Advanced Cilk
    inlets
    SYNCHED
    locks
    abort
Performance measures

Shared Memory Architectures


Logical machine model

Hardware model
    processors
    shared global memory
        ideal: equidistant from processors

Software model
    threads
    shared variables
    communication
        read shared data
        write shared data
    synchronization

Introducing Cilk
    cilk int fib(int n) {
        if (n < 2) return n;
        else {
            int n1, n2;
            n1 = spawn fib(n-1);
            n2 = spawn fib(n-2);
            sync;
            return (n1 + n2);
        }
    }

Cilk constructs
    cilk: a Cilk function; without the keyword, a function is standard C
    spawn: call can execute asynchronously in a concurrent thread
    sync: current thread waits for all locally-spawned functions

Cilk constructs specify logical parallelism in the program
    what computations can be performed in parallel
    not the mapping of tasks to processes

Cilk Language

Cilk is a faithful extension of C
    if Cilk keywords are elided, the result is a C program with the same semantics

Idiosyncrasies
    spawn keyword can only be applied to a cilk function
    spawn keyword cannot be used in a C function
    cilk function cannot be called with normal C call conventions
        must be called with a spawn & awaited using a sync
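
The "faithful extension" property can be checked by hand: deleting cilk, spawn, and sync from the fib example leaves ordinary C. A minimal sketch of that serial elision follows; the name fib_elided and the precondition check are ours, not from the slides.

```c
#include <assert.h>

/* Serial elision of the Cilk fib example: the keywords cilk, spawn, and
 * sync are deleted and nothing else changes. The name fib_elided and the
 * assert are illustrative additions. */
int fib_elided(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    else {
        int n1, n2;
        n1 = fib_elided(n - 1);   /* was: n1 = spawn fib(n-1); */
        n2 = fib_elided(n - 2);   /* was: n2 = spawn fib(n-2); */
        /* was: sync; */
        return (n1 + n2);
    }
}
```

Calling fib_elided(10) runs the same recursion as the Cilk version, only without any opportunity for parallel execution.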

Cilk Terminology

Parallel control = spawn, sync, return from spawned function
Thread = maximal sequence of instructions not containing parallel control (task in earlier terminology)
    Thread A: if statement up to first spawn
    Thread B: computation of n-2 before 2nd spawn (the continuation of the first spawn)
    Thread C: n1 + n2 before the return

    cilk int fib(int n) {
        if (n < 2) return n;
        else {
            int n1, n2;
            n1 = spawn fib(n-1);
            n2 = spawn fib(n-2);
            sync;
            return (n1 + n2);
        }
    }

Cilk Program Execution as a DAG


[Figure: execution DAG for fib(4). Each circle represents a thread (A, B, or C) of one fib call; edges are spawn, continuation, or return.]

Task Scheduling in Cilk

Alternative strategies
    work-sharing: thread scheduled to run in parallel at every spawn
        benefit: maximizes parallelism
        drawback: cost of setting up new threads is high; should be avoided

    work-stealing: processor looks for work when it becomes idle
        lazy parallelism: put off work for parallel execution until necessary
        benefits:
            executes with precisely as much parallelism as needed
            minimizes the number of threads that must be set up
            runs with same efficiency as serial program on uniprocessor

Cilk uses work-stealing rather than work-sharing

Cilk Execution using Work Stealing



Cilk runtime maps logical tasks to compute cores
Approach:
    lazy thread creation plus work-stealing scheduler
    spawn: a potentially parallel task is available
    an idle thread steals tasks from a random working thread

Possible execution:
    thread 1 begins
    thread 2 steals from 1
    thread 3 steals from 1
    etc.

[Figure: call tree rooted at f(n), with children f(n-1) and f(n-2), then f(n-2), f(n-3), f(n-3), f(n-4), ...]
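
The stealing step can be pictured with a per-worker double-ended queue: the owner adds and removes tasks at one end, and an idle thief takes from the other. The sketch below is a single-threaded model only. The type and function names are hypothetical, and the synchronization a real runtime needs between owner and thief is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of one worker's task deque.
 * A real work-stealing runtime must synchronize the owner and thieves;
 * that part is omitted here. */
#define DEQUE_CAP 64

typedef struct {
    int tasks[DEQUE_CAP];  /* task ids; tasks[top..bottom-1] are live */
    size_t top;            /* oldest task: thieves steal here */
    size_t bottom;         /* one past the newest task: owner works here */
} deque;

void deque_init(deque *d) { d->top = 0; d->bottom = 0; }

size_t deque_size(const deque *d) { return d->bottom - d->top; }

/* Owner: a spawn makes a potentially parallel task available. */
void deque_push(deque *d, int task) {
    assert(d->bottom < DEQUE_CAP);
    d->tasks[d->bottom++] = task;
}

/* Owner: take the most recently pushed task. Returns 0 if empty. */
int deque_pop(deque *d, int *task) {
    if (d->top == d->bottom) return 0;
    *task = d->tasks[--d->bottom];
    return 1;
}

/* Thief: take the oldest task. Returns 0 if empty. */
int deque_steal(deque *d, int *task) {
    if (d->top == d->bottom) return 0;
    *task = d->tasks[d->top++];
    return 1;
}
```

Because the thief takes the oldest entry, it tends to pick up work near the root of the call tree, which is usually the largest piece available.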


Exercise: Sum of First n Integers


Solution sketch

    #include <stdlib.h>
    #include <stdio.h>
    #include <cilk.h>

    cilk double sum(int L, int U) {
        if (L == U) return L;
        else {
            double lower, upper;
            int mid = (U + L) / 2;
            lower = spawn sum(L, mid);
            upper = spawn sum(mid + 1, U);
            sync;
            return (lower + upper);
        }
    }

    cilk int main(int argc, char *argv[]) {
        int n;
        double result;
        if (argc < 2) {
            printf("usage: %s n\n", argv[0]);
            return 1;
        }
        n = atoi(argv[1]);
        if (n <= 0) {
            printf("n = %d: n must be positive\n", n);
        } else {
            result = spawn sum(1, n);
            sync;
            printf("Result: %lf\n", result);
        }
        return 0;
    }
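
One way to sanity-check the recursion is to run it serially and compare against the closed form n(n+1)/2. The sketch below does that in plain C; the names sum_serial and sum_closed_form are ours.

```c
#include <assert.h>

/* Serial version of the recursive sum: same recursion as the Cilk sketch,
 * with the Cilk keywords removed. */
double sum_serial(int L, int U) {
    assert(L <= U);  /* the recursion only terminates for a non-empty range */
    if (L == U) return L;
    else {
        int mid = L + (U - L) / 2;  /* same midpoint as (U+L)/2 for L, U >= 0 */
        double lower = sum_serial(L, mid);
        double upper = sum_serial(mid + 1, U);
        return (lower + upper);
    }
}

/* Closed form n(n+1)/2, used only as a reference value. */
double sum_closed_form(int n) {
    return (double)n * (n + 1) / 2.0;
}
```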


Exercise: Initialize and Sum a Vector


Solution sketch

    #include <stdlib.h>
    #include <stdio.h>
    #include <cilk.h>

    int *v = 0;

    cilk double sum(int L, int U) {
        if (L == U) return v[L];
        else {
            double lower, upper;
            int mid = (U + L) / 2;
            lower = spawn sum(L, mid);
            upper = spawn sum(mid + 1, U);
            sync;
            return (lower + upper);
        }
    }

    cilk void init(int L, int U) {
        if (L == U) v[L] = L + 1;
        else {
            int mid = (U + L) / 2;
            spawn init(L, mid);
            spawn init(mid + 1, U);
            sync;
        }
    }

    cilk int main(int argc, char *argv[]) {
        int n;
        double result;
        if (argc < 2 || (n = atoi(argv[1])) <= 0) {
            /* init and sum recurse forever on an empty range */
            printf("usage: %s n (n must be positive)\n", argv[0]);
            return 1;
        }
        v = malloc(sizeof(int) * n);
        if (v == 0) return 1;
        spawn init(0, n-1);
        sync;
        result = spawn sum(0, n-1);
        sync;
        free(v);
        printf("Result: %lf\n", result);
        return 0;
    }

Example: N Queens

Problem
    place N queens on an N x N chess board
    no 2 queens in same row, column, or diagonal

Example: a solution to 8 queens problem

[Figure: chess board showing one solution. Image credit: http://en.wikipedia.org/wiki/Eight_queens_puzzle]

N Queens: Many Solutions Possible


Example: 8 queens
    92 distinct solutions
    12 unique solutions; others are rotations & reflections

[Figure: 8 queens solutions. Image credit: http://en.wikipedia.org/wiki/Eight_queens_puzzle]

N Queens Solution Sketch


Sequential Recursive Enumeration of All Solutions
    void nqueens(n, j, placement) {
        // precondition: placed j queens so far
        if (j == n) { print placement; return; }
        for (k = 0; k < n; k++)
            if putting j+1 queen in kth position in row j+1 is legal
                add queen j+1 to placement
                nqueens(n, j+1, placement)
                remove queen j+1 from placement
    }

Where's the potential for parallelism?
What issues must we consider?
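
The sequential sketch can be made concrete in a few lines of C. The version below counts solutions instead of printing them, which makes it easy to check against the 92 solutions quoted for 8 queens. The array representation (placement[r] is the column of the queen in row r) and all function names are our assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_N 16

/* placement[r] is the column of the queen in row r, for rows 0..j-1.
 * Rows are distinct by construction, so only columns and diagonals
 * need checking. */
static int is_legal(const int *placement, int j, int k) {
    for (int r = 0; r < j; r++) {
        int c = placement[r];
        if (c == k || abs(c - k) == j - r) return 0;
    }
    return 1;
}

/* precondition: placed j queens so far.
 * Returns the number of complete placements that extend this prefix. */
static long count_from(int n, int j, int *placement) {
    if (j == n) return 1;  /* found a placement */
    long count = 0;
    for (int k = 0; k < n; k++) {
        if (is_legal(placement, j, k)) {
            placement[j] = k;  /* add queen j+1 to placement */
            count += count_from(n, j + 1, placement);
            /* removal is implicit: placement[j] is overwritten next time */
        }
    }
    return count;
}

long count_nqueens(int n) {
    assert(n >= 1 && n <= MAX_N);
    int placement[MAX_N];
    return count_from(n, 0, placement);
}
```

Note that every recursive call reads and writes the same placement array. That is harmless here, and it is exactly what has to change once the calls run in parallel.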

Parallel N Queens Solution Sketch


    cilk void nqueens(n, j, placement) {
        // precondition: placed j queens so far
        if (j == n) { /* found a placement */ process placement; return; }
        for (k = 1; k <= n; k++)
            if putting j+1 queen in kth position in row j+1 is legal
                copy placement into newplacement and add extra queen
                spawn nqueens(n, j+1, newplacement)
        sync
        discard placement
    }

Issues regarding placements
    how can we report placements?
    what if a single placement suffices?
        no need to compute all legal placements
        so far, no way to terminate children exploring alternate placement

Approaches to Managing Placements

Choices for reporting multiple legal placements


    count them
    print them on the fly
    collect them on the fly; print them at the end

If only one placement desired, can skip remaining search
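
In a sequential search, skipping the remaining work is just an early return. The C sketch below stops at the first complete placement. The names and the array representation are our assumptions, and it does not address the parallel case, where already-spawned children would also have to be stopped.

```c
#include <assert.h>
#include <stdlib.h>

/* placement[r] is the column of the queen in row r, for rows 0..j-1. */
static int legal(const int *placement, int j, int k) {
    for (int r = 0; r < j; r++) {
        int c = placement[r];
        if (c == k || abs(c - k) == j - r) return 0;  /* column or diagonal clash */
    }
    return 1;
}

/* precondition: placed j queens so far.
 * Returns 1 as soon as one complete placement is found, leaving it in
 * placement[0..n-1]; returns 0 if no placement extends this prefix. */
int find_first(int n, int j, int *placement) {
    assert(n >= 1 && j >= 0 && j <= n);
    if (j == n) return 1;
    for (int k = 0; k < n; k++) {
        if (legal(placement, j, k)) {
            placement[j] = k;
            if (find_first(n, j + 1, placement)) return 1;  /* skip remaining search */
        }
    }
    return 0;
}

/* Independent check that placement[0..n-1] is a legal solution. */
int is_solution(int n, const int *placement) {
    for (int i = 0; i < n; i++) {
        if (placement[i] < 0 || placement[i] >= n) return 0;
        for (int j = i + 1; j < n; j++) {
            if (placement[i] == placement[j]) return 0;
            if (abs(placement[i] - placement[j]) == j - i) return 0;
        }
    }
    return 1;
}
```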



Race Conditions (Data Races)



Two or more concurrent accesses to the same variable
At least one is a write

    cilk int f() {
        int x = 0;
        spawn g(&x);
        spawn g(&x);
        sync;
        return x;
    }

    cilk void g(int *p) { *p += 1; }

serial semantics? f returns 2

parallel semantics? let's look closely
    parallel execution of two instances of g: g || g
    many interleavings possible

    one interleaving
        first g        second g
        read x
                       read x
        add 1
                       add 1
        write x
                       write x

    f returns 1!
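
The lost update can be reproduced deterministically by simulating the three steps of *p += 1 under an explicit schedule. The sketch below is a single-threaded model, not real concurrent code; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Each instance of g performs *p += 1 as three steps: read, add, write. */
typedef struct {
    int reg;   /* private register holding the value that was read */
    int step;  /* next step to run: 0 = read, 1 = add, 2 = write, 3 = done */
} g_instance;

static void run_step(g_instance *g, int *x) {
    switch (g->step) {
    case 0: g->reg = *x; break;  /* read x  */
    case 1: g->reg += 1; break;  /* add 1   */
    case 2: *x = g->reg; break;  /* write x */
    default: return;             /* instance already finished */
    }
    g->step++;
}

/* schedule[i] is 0 or 1: which instance of g runs its next step.
 * Returns the value of x that f would return after the sync. */
int run_schedule(const int *schedule, int len) {
    int x = 0;
    g_instance g[2] = { {0, 0}, {0, 0} };
    for (int i = 0; i < len; i++) {
        assert(schedule[i] == 0 || schedule[i] == 1);
        run_step(&g[schedule[i]], &x);
    }
    return x;
}
```

With the schedule {0, 0, 0, 1, 1, 1} the two instances run back to back, as in the serial semantics. With {0, 1, 0, 1, 0, 1} both instances read x before either writes, and one increment is lost.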

Data Races Can Be Subtle! (1)


Erroneous Parallel N Queens Solution Sketch
    cilk void nqueens(n, j, placement) {
        // precondition: placed j queens so far
        if (j == n) { /* found a placement */ process placement; return; }
        for (k = 1; k <= n; k++)
            if putting j+1 queen in kth position in row j+1 is legal
                place j+1 queen in kth position in row j+1 in placement
                spawn nqueens(n, j+1, placement)
                remove queen in kth position in row j+1 in placement
        sync
    }

Race: spawned children read placement while the parent keeps modifying the same placement

Data Races Can Be Subtle (2)


Parallel N Queens Solution Sketch Revisited
    cilk int *nqueens(n, j, placement) {
        // precondition: placed j queens so far
        if (j == n) return placement
        for (k = 0; k < n; k++)
            if putting j+1 queen in kth position in row j+1 is legal
                copy placement into newplacement and add extra queen
                spawn nqueens(n, j+1, newplacement)
                discard newplacement;
        sync;
        if some child found a legal result return one, else return null
    }

Race: newplacement is discarded before the sync, while the spawned child may still be using it

Programming with Race Conditions



Approach 1: avoid them completely
    no read/write sharing between concurrent tasks
    only share between child and parent tasks in Cilk

Approach 2: be careful!
    guard against data corruption
        word operations are atomic on microprocessor architectures
            definition of a word varies according to processor: 32-bit, 64-bit
        locks to control atomicity of aggregate structures
            acquire lock


inlet

Normal spawn: x = spawn f();
    result of f simply copied into caller's frame

Problem
    might want to handle receipt of a result immediately
    nqueens: handle legal placement returned from child promptly

Solution: inlet
    block of code within a function used to incorporate results
    executes atomically with respect to enclosing function

Syntax (inlet must appear in declarations section)

    cilk int f() {
        inlet void my_inlet(ResultType* result, iarg2, ..., iargn) {
            // atomically incorporate result into f's variables
            return;
        }
        my_inlet(spawn g(), iarg2, ..., iargn);
    }

Using an inlet
A simple complete example

Original version:

    cilk int fib(int n) {
        if (n < 2) return n;
        else {
            int n1, n2;
            n1 = spawn fib(n-1);
            n2 = spawn fib(n-2);
            sync;
            return (n1 + n2);
        }
    }

Version using an inlet:

    cilk int fib(int n) {
        int result = 0;
        inlet void add(int r) {
            result += r;    /* inlet has access to fib's variables */
            return;
        }
        if (n < 2) return n;
        else {
            add(spawn fib(n-1));
            add(spawn fib(n-2));
            sync;
            return result;
        }
    }

Cilk guarantees inlet instances from all spawned children are atomic w.r.t. one another and the caller too

abort

Syntax: abort;
Where: within a cilk procedure p
Purpose: terminate execution of all of p's spawned children

Does this help with an nqueens example for a single solution?

    cilk int *nqueens(n, j, placement) {
        // precondition: placed j queens so far
        if (j == n) return placement
        for (k = 0; k < n; k++)
            if putting j+1 queen in kth position in row j+1 is legal
                copy placement into newplacement and add extra queen
                spawn nqueens(n, j+1, newplacement)
        sync;
        discard placement;
        if some child found a legal result return one, else return null
    }

Need a way to invoke abort when a child yields a solution

N Queens Revisited
New solution that finishes when first legal result discovered

    cilk int *nqueens(n, j, placement) {
        int *result = null;    /* function initializes result */
        // precondition: placed j queens so far
        inlet void doresult(childplacement) {
            if (childplacement == null) return;
            else { result = copy(childplacement); abort; }
        }
        if (j == n) return placement
        for (k = 0; k < n; k++)
            if putting j+1 queen in kth position in row j+1 is legal
                copy placement into newplacement and add extra queen
                /* if solution found, inlet updates result and aborts siblings */
                doresult(spawn nqueens(n, j+1, ...))
        sync
        discard placement;
        return result
    }

Implicit inlets

General spawn syntax


    statement: [lhs op] spawn proc(arg1, ..., argn);

    [lhs op] may be omitted
        spawn update(&data);

    if lhs is present
        it must be a variable matching the return type for the function
        op may be = *= /= %= += -= <<= >>= &= ^= |=   (the compound forms are implicit inlets)

Implicit inlets execute atomically w.r.t. caller

Using an implicit inlet

Original version:

    cilk int fib(int n) {
        if (n < 2) return n;
        else {
            int n1, n2;
            n1 = spawn fib(n-1);
            n2 = spawn fib(n-2);
            sync;
            return (n1 + n2);
        }
    }

Version using implicit inlets:

    cilk int fib(int n) {
        int result = 0;
        if (n < 2) return n;
        else {
            result += spawn fib(n-1);
            result += spawn fib(n-2);
            sync;
            return result;
        }
    }

Cilk guarantees implicit inlet instances from all spawned children are atomic w.r.t. one another and the caller

SYNCHED

Determine whether a procedure has any currently outstanding children without executing sync
    if children have not completed
        SYNCHED = 0
    otherwise
        SYNCHED = 1

Why SYNCHED? Save storage and enhance locality.

    state *state1, *state2;
    state1 = (state *) Cilk_alloca(state_size);
    spawn foo(state1);    /* fill in state1 with data */
    if (SYNCHED) state2 = state1;
    else state2 = (state *) Cilk_alloca(state_size);
    spawn bar(state2);
    sync;

Locks

Why locks? Guarantee mutual exclusion to shared state
    only way to guarantee atomicity when concurrent procedure instances are operating on shared data

Library primitives for locking
    Cilk_lock_init(Cilk_lockvar k)
        must initialize a lock variable before using it!
    Cilk_lock(Cilk_lockvar k)
    Cilk_unlock(Cilk_lockvar k)

    usage example: could use a lock to protect I/O from parallel writes in nqueens
        parallel solution could enumerate all solutions in the order that they are found
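
The init / lock / unlock pattern looks the same in most threading libraries. The sketch below uses POSIX threads as a stand-in, since plain C has no Cilk_lockvar; it is not Cilk code, and the counter, thread count, and function names are our assumptions. It protects a shared counter rather than I/O so that the result is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS 100000

static long counter = 0;               /* shared state */
static pthread_mutex_t counter_lock;   /* guards counter */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);    /* acquire lock */
        counter += 1;                         /* read and write shared data */
        pthread_mutex_unlock(&counter_lock);  /* release lock */
    }
    return NULL;
}

/* Runs NUM_THREADS workers and returns the final counter value. */
long run_locked_counter(void) {
    pthread_t threads[NUM_THREADS];
    int rc;

    counter = 0;
    rc = pthread_mutex_init(&counter_lock, NULL);  /* initialize before use */
    assert(rc == 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    pthread_mutex_destroy(&counter_lock);
    return counter;
}
```

With the lock held around each increment, no update is lost and the function returns NUM_THREADS * INCREMENTS. On many systems this needs to be compiled with the -pthread flag.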

Concurrency Cautions

Cilk atomicity guarantees
    all threads of a single procedure operate atomically
    threads of a procedure include
        all code in the procedure body proper, including inlet code

Guarantee implications
    can coordinate caller and callees using inlets without locks

Only limited guarantees between descendants or ancestors
    DAG precedence order maintained and nothing more
    don't assume atomicity between different procedures!


Performance Measures

T1 = sequential work; minimum running time on 1 processor
TP = minimum running time on P processors
T∞ = minimum running time on infinite number of processors
    longest path in DAG
    length reflects the cost of computation at nodes along the path
    known as critical path length

Work and Critical Path Example


[Figure: execution DAG for fib(4), with the threads of each fib call labeled A, B, and C]

If all threads run in unit time
    T1 = 17
    T∞ = 8 (critical path length)
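
Both numbers can be recomputed from the structure of fib. The sketch below assumes the thread decomposition from the terminology slide: a call with n < 2 is a single thread A, and every other call contributes threads A, B, and C plus its two children. The function names are ours, and every thread is taken to cost one time unit.

```c
#include <assert.h>

static long max_long(long a, long b) { return a > b ? a : b; }

/* Work T1: total number of unit-time threads in the DAG for fib(n). */
long fib_work(int n) {
    assert(n >= 0);
    if (n < 2) return 1;  /* a single thread A */
    /* threads A, B, and C, plus the threads of both children */
    return 3 + fib_work(n - 1) + fib_work(n - 2);
}

/* Critical path length: number of threads on the longest path
 * through the DAG for fib(n). */
long fib_span(int n) {
    assert(n >= 0);
    if (n < 2) return 1;
    /* path 1: A -> fib(n-1) -> C        costs 2 + span(n-1)
     * path 2: A -> B -> fib(n-2) -> C   costs 3 + span(n-2) */
    return max_long(2 + fib_span(n - 1), 3 + fib_span(n - 2));
}
```

For n = 4 these recurrences give a work of 17 and a critical path length of 8, matching the figures on the slide.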

References

Cilk 5.4.6 reference manual.
Charles Leiserson, Bradley Kuszmaul, Michael Bender, and Hua-wen Jing. MIT 6.895 lecture notes - Theory of Parallel Systems. http://theory.lcs.mit.edu/classes/6.895/fall03/scribe/master.ps
