
Introduction

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Art of Multiprocessor Programming

Moore's Law
Transistor count still rising
Clock speed flattening sharply


Vanishing from your Desktops: The Uniprocessor

[Figure: one cpu connected to memory]

Your Server: The Shared Memory Multiprocessor (SMP)

[Figure: several processors, each with its own cache, connected by a bus to shared memory]

Your New Server or Desktop: The Multicore Processor (CMP)


[Figure: Sun T2000 Niagara; processor cores with caches, connected by a bus to shared memory, all on the same chip]

From the 2008 press


"Intel has announced a press conference in San Francisco on November 17th, where it will officially launch the Core i7 Nehalem processor."

"Sun's next generation Enterprise T5140 and T5240 servers, based on the 3rd Generation UltraSPARC T2 Plus processor, were released two days ago."

Why is Kunle Smiling?

[Figure: Niagara 1]

Traditional Scaling Process


[Figure: on a traditional uniprocessor, unchanged user code speeds up 1.8x, 3.6x, 7x over time, following Moore's law]

Multicore Scaling Process


[Figure: on multicore, user code would ideally speed up 1.8x, 3.6x, 7x]

Unfortunately, not so simple



Real-World Scaling Process


[Figure: on multicore, real-world user code speeds up only 1.8x, 2x, 2.9x]

Parallelization and Synchronization require great care



Multicore Programming: Course Overview


Fundamentals
Models, algorithms, impossibility

Real-World programming
Architectures
Techniques

Sequential Computation
[Figure: a single thread operating on objects in memory]

Concurrent Computation

[Figure: multiple threads operating on shared objects in memory]

Asynchrony

Sudden unpredictable delays

Cache misses (short)
Page faults (long)
Scheduling quantum used up (really long)

Model Summary
Multiple threads
Sometimes called processes

Single shared memory
Objects live in memory
Unpredictable asynchronous delays

Road Map
We are going to focus on principles first, then practice
Start with idealized models
Look at simplistic problems
Emphasize correctness over pragmatism
Correctness may be theoretical, but incorrectness has practical impact

Concurrency Jargon
Hardware
Processors

Software
Threads, processes

Sometimes OK to confuse them, sometimes not.


Parallel Primality Testing


Challenge
Print primes from 1 to 10^10

Given
Ten-processor multiprocessor
One thread per processor

Goal
Get ten-fold speedup (or close)


Load Balancing
[Figure: the range 1 to 10^10 divided into ten blocks: P0 tests 1 to 10^9, P1 tests 10^9 to 2·10^9, ..., P9 tests the last block, ending at 10^10]

Split the work evenly
Each thread tests a range of 10^9

Procedure for Thread i


    void primePrint() {
      int i = ThreadID.get(); // IDs in {0..9}
      for (long j = i * 1_000_000_000L + 1; j <= (i + 1) * 1_000_000_000L; j++) {
        if (isPrime(j))
          print(j);
      }
    }
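A runnable sketch of the same static split, scaled down so it finishes in well under a second. The class name `StaticPrimes`, the trial-division `isPrime`, and the block size used in the example are assumptions for illustration; instead of printing, each thread records its primes in a shared concurrent set so the result can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical, scaled-down version of the static split: thread i tests
// the block (i*block, (i+1)*block] and records its primes in a shared set.
class StaticPrimes {
  // Trial division: larger numbers take longer, one reason blocks are uneven.
  static boolean isPrime(long j) {
    if (j < 2) return false;
    for (long d = 2; d * d <= j; d++) {
      if (j % d == 0) return false;
    }
    return true;
  }

  static Set<Long> run(int nThreads, long block) {
    Set<Long> primes = new ConcurrentSkipListSet<>();
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < nThreads; t++) {
      final int i = t; // stands in for ThreadID.get()
      Thread th = new Thread(() -> {
        for (long j = i * block + 1; j <= (i + 1) * block; j++) {
          if (isPrime(j)) primes.add(j);
        }
      });
      threads.add(th);
      th.start();
    }
    for (Thread th : threads) {
      try {
        th.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return primes;
  }
}
```

With `run(10, 100)` the ten threads together cover 1 to 1000, but thread 9 does noticeably more work per number than thread 0, which is the imbalance discussed next.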


Issues
Higher ranges have fewer primes
Yet larger numbers are harder to test
Thread workloads
  Uneven
  Hard to predict


Need dynamic load balancing


Shared Counter

[Figure: a shared counter handing out 17, 18, 19, ...; each thread takes a number]

Procedure for Thread i


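    Counter counter = new Counter(1); // shared counter object
    void primePrint() {
      long j = counter.getAndIncrement(); // increment & return each new value
      while (j <= 10_000_000_000L) {      // stop when every value is taken
        if (isPrime(j))
          print(j);
        j = counter.getAndIncrement();
      }
    }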
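A runnable, scaled-down sketch of dynamic load balancing. It assumes `java.util.concurrent.atomic.AtomicLong` as the shared counter, standing in for a correct `Counter` (the slides develop one shortly); the class name `DynamicPrimes` and the small `limit` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: instead of fixed blocks, every thread repeatedly
// takes the next untested number from one shared counter.
class DynamicPrimes {
  static boolean isPrime(long j) {
    if (j < 2) return false;
    for (long d = 2; d * d <= j; d++) {
      if (j % d == 0) return false;
    }
    return true;
  }

  static Set<Long> run(int nThreads, long limit) {
    AtomicLong counter = new AtomicLong(1); // shared counter, starts at 1
    Set<Long> primes = new ConcurrentSkipListSet<>();
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < nThreads; t++) {
      Thread th = new Thread(() -> {
        // getAndIncrement hands each number to exactly one thread.
        for (long j = counter.getAndIncrement(); j <= limit; j = counter.getAndIncrement()) {
          if (isPrime(j)) primes.add(j);
        }
      });
      threads.add(th);
      th.start();
    }
    for (Thread th : threads) {
      try {
        th.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return primes;
  }
}
```

A thread that draws hard numbers simply takes fewer of them, so no thread sits idle while another still has a long block left.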

Where Things Reside


[Figure: the code and the shared counter reside in shared memory; each thread's local variables reside in its processor's cache; the caches are connected to memory by a bus]

Each call to counter.getAndIncrement() increments the shared counter and returns a fresh value, so no two threads test the same number. A thread stops once every value has been taken.

Counter Implementation

    public class Counter {
      private long value;

      public Counter(long initial) {
        value = initial;
      }

      public long getAndIncrement() {
        return value++;
      }
    }


What It Means

    return value++;

is not a single step; it means:

    long temp = value;
    value = temp + 1;
    return temp;


Not so good
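[Figure: timeline in which the value goes 1, 2, 3, 2. One thread reads 1 and writes 2, then reads 2 and writes 3; meanwhile another thread, which also read 1 before either write, finally writes 2 and undoes the other thread's work]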
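Real races depend on timing and are hard to reproduce on demand, so this hypothetical sketch replays the bad interleaving from the "Not so good" slide deterministically in a single thread. Thread A and thread B are modeled only by their private `temp` variables; the class name `LostUpdateReplay` is an assumption.

```java
// Hypothetical single-threaded replay of the interleaving on the slide.
// Each "thread" executes value++ as temp = value; value = temp + 1.
class LostUpdateReplay {
  // Returns the final counter value after the replay.
  static long replay() {
    long value = 1;     // shared counter, initially 1
    long tempA = value; // A: temp = value      (reads 1)
    long tempB = value; // B: temp = value      (reads 1)
    value = tempB + 1;  // B: value = temp + 1  (writes 2)
    tempB = value;      // B: temp = value      (reads 2)
    value = tempB + 1;  // B: value = temp + 1  (writes 3)
    value = tempA + 1;  // A: value = temp + 1  (writes 2, overwriting 3)
    return value;
  }
}
```

Three increments were performed, so the value should have gone from 1 to 4, yet `replay()` returns 2.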

Is this problem inherent?

[Figure: the read and write steps of two threads interleaving]

If only we could glue reads and writes together

Challenge

    public class Counter {
      private long value;

      public long getAndIncrement() {
        long temp = value;
        value = temp + 1;
        return temp;
      }
    }
Make these steps atomic (indivisible).

Hardware Solution

Perform the read and the write as a single ReadModifyWrite() instruction, which the hardware executes atomically.
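Java exposes hardware read-modify-write operations through `java.util.concurrent.atomic`. `AtomicLong.getAndIncrement()` already does exactly this job; the sketch below spells out the idea with `compareAndSet`, which is one particular read-modify-write operation. The class name `RmwCounter` and the `demo` harness are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter whose increment is built on an atomic read-modify-write.
class RmwCounter {
  private final AtomicLong value;

  RmwCounter(long initial) {
    value = new AtomicLong(initial);
  }

  // Read, then compareAndSet: the write succeeds only if no other thread
  // changed value since our read; otherwise retry with the fresh value.
  long getAndIncrement() {
    while (true) {
      long temp = value.get();
      if (value.compareAndSet(temp, temp + 1)) return temp;
    }
  }

  long get() {
    return value.get();
  }

  // Demo: nThreads threads each call getAndIncrement perThread times.
  static long demo(int nThreads, int perThread) {
    RmwCounter c = new RmwCounter(0);
    Thread[] ts = new Thread[nThreads];
    for (int t = 0; t < nThreads; t++) {
      ts[t] = new Thread(() -> {
        for (int k = 0; k < perThread; k++) c.getAndIncrement();
      });
      ts[t].start();
    }
    for (Thread th : ts) {
      try {
        th.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return c.get();
  }
}
```

Unlike the plain `value++` version, no increment is lost: `demo(4, 10_000)` always ends at 40000.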

An Aside: Java
    public class Counter {
      private long value;

      public long getAndIncrement() {
        long temp;
        synchronized (this) { // synchronized block: one thread at a time
          temp = value;
          value = temp + 1;
        }
        return temp;
      }
    }
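The synchronized block is Java's built-in form of mutual exclusion. `java.util.concurrent.locks.ReentrantLock` gives the same guarantee with an explicit lock object, which previews the locks studied later. This is a sketch; the class name `LockCounter` and the `demo` harness are assumptions.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter using an explicit lock instead of a synchronized block.
class LockCounter {
  private final ReentrantLock lock = new ReentrantLock();
  private long value;

  long getAndIncrement() {
    lock.lock();
    try { // critical section: at most one thread at a time
      long temp = value;
      value = temp + 1;
      return temp;
    } finally {
      lock.unlock(); // always release, even if the body throws
    }
  }

  long get() {
    lock.lock();
    try {
      return value;
    } finally {
      lock.unlock();
    }
  }

  // Demo: nThreads threads each call getAndIncrement perThread times.
  static long demo(int nThreads, int perThread) {
    LockCounter c = new LockCounter();
    Thread[] ts = new Thread[nThreads];
    for (int t = 0; t < nThreads; t++) {
      ts[t] = new Thread(() -> {
        for (int k = 0; k < perThread; k++) c.getAndIncrement();
      });
      ts[t].start();
    }
    for (Thread th : ts) {
      try {
        th.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return c.get();
  }
}
```

The `try`/`finally` pattern matters: with an explicit lock, forgetting to unlock on an exception would block every other thread forever.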

Mutual Exclusion

Mutual Exclusion or Alice & Bob share a pond



Alice has a pet



Bob has a pet



The Problem
The pets don't get along

Formalizing the Problem


Two types of formal properties in asynchronous computation:
Safety Properties
Nothing bad happens ever

Liveness Properties
Something good happens eventually


Formalizing our Problem


Mutual Exclusion
Both pets never in pond simultaneously
This is a safety property

No Deadlock
If only one wants in, it gets in
If both want in, one gets in
This is a liveness property

Simple Protocol
Idea
Just look at the pond

Gotcha
Not atomic
Trees obscure the view

Interpretation
Threads can't see what other threads are doing
Explicit communication required for coordination

Cell Phone Protocol


Idea
Bob calls Alice (or vice-versa)

Gotcha
Bob takes shower
Alice recharges battery
Bob out shopping for pet food

Interpretation
Message-passing doesn't work
Recipient might not be
  Listening
  There at all

Communication must be
  Persistent (like writing)
  Not transient (like speaking)

Can Protocol

[Figure: cola cans on Alice's windowsill]

Bob conveys a bit



Can Protocol
Idea
Cans on Alice's windowsill
Strings lead to Bob's house
Bob pulls strings, knocks over cans

Gotcha
Cans cannot be reused
Bob runs out of cans

Interpretation
Cannot solve mutual exclusion with interrupts
Sender sets fixed bit in receiver's space
Receiver resets bit when ready
Requires unbounded number of interrupt bits

Flag Protocol

Alice's Protocol (sort of)

Bob's Protocol (sort of)

Alice's Protocol
Raise flag
Wait until Bob's flag is down
Unleash pet
Lower flag when pet returns

Bob's Protocol
Raise flag
Wait until Alice's flag is down
Unleash pet
Lower flag when pet returns

Bob's Protocol (2nd try)
Raise flag
While Alice's flag is up
  Lower flag
  Wait for Alice's flag to go down
  Raise flag
Unleash pet
Lower flag when pet returns

Inside the while loop, Bob defers to Alice.
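A minimal Java sketch of the flag protocol, assuming each flag is an `AtomicBoolean` (whose `get`/`set` have volatile semantics, so the "raise, then look" order is preserved) and modeling the pond as a counter that records any overlap it happens to observe. All names (`FlagProtocol`, `visitPond`, `demo`) are hypothetical, and waiting is a busy loop with `Thread.yield()`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of Alice's protocol and Bob's deferring (2nd try) protocol.
class FlagProtocol {
  final AtomicBoolean aliceFlag = new AtomicBoolean(false);
  final AtomicBoolean bobFlag = new AtomicBoolean(false);
  final AtomicInteger inPond = new AtomicInteger(0);     // pets currently in the pond
  final AtomicInteger violations = new AtomicInteger(0); // observed overlaps
  final AtomicInteger visits = new AtomicInteger(0);

  void visitPond() {
    if (inPond.incrementAndGet() > 1) violations.incrementAndGet();
    visits.incrementAndGet();
    inPond.decrementAndGet();
  }

  void alice(int rounds) {
    for (int r = 0; r < rounds; r++) {
      aliceFlag.set(true);                     // raise flag
      while (bobFlag.get()) Thread.yield();    // wait until Bob's flag is down
      visitPond();                             // unleash pet
      aliceFlag.set(false);                    // lower flag when pet returns
    }
  }

  void bob(int rounds) {
    for (int r = 0; r < rounds; r++) {
      bobFlag.set(true);                       // raise flag
      while (aliceFlag.get()) {                // while Alice's flag is up
        bobFlag.set(false);                    //   lower flag (defer to Alice)
        while (aliceFlag.get()) Thread.yield(); // wait for it to go down
        bobFlag.set(true);                     //   raise flag
      }
      visitPond();                             // unleash pet
      bobFlag.set(false);                      // lower flag when pet returns
    }
  }

  static FlagProtocol demo(int rounds) {
    FlagProtocol p = new FlagProtocol();
    Thread a = new Thread(() -> p.alice(rounds));
    Thread b = new Thread(() -> p.bob(rounds));
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return p;
  }
}
```

The sketch also shows the unfairness noted below: while Alice keeps visiting, Bob may wait for all of her rounds before his pet gets in, but once she stops he always proceeds.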

The Flag Principle


Raise the flag
Look at the other's flag

Flag Principle:
If each raises and looks, then the last to look must see both flags up

Proof of Mutual Exclusion


Assume both pets in pond
Derive a contradiction
By reasoning backwards

Consider the last time Alice and Bob each looked before letting the pets in.
Without loss of generality, assume Alice was the last to look.

Proof
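[Figure: timeline showing when Bob last raised his flag, when Alice last raised her flag, and each one's last look; Alice's last look comes last]

Bob raised his flag before his last look and kept it up while his pet was in the pond, so it was still up at Alice's last look. Alice must have seen Bob's flag: a contradiction.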


Proof of No Deadlock
If only one pet wants in, it gets in.
Deadlock requires both continually trying to get in.
If Bob sees Alice's flag, he gives her priority (a gentleman).

Remarks
Protocol is unfair
  Bob's pet might never get in
Protocol uses waiting
  If Bob is eaten by his pet, Alice's pet might never get in

Moral of Story
Mutual Exclusion cannot be solved by
  transient communication (cell phones)
  interrupts (cans)
It can be solved by
  one-bit shared variables that can be read or written

The Arbiter Problem (an aside)


Pick a point



The Fable Continues


Alice and Bob fall in love & marry
Then they fall out of love & divorce
  She gets the pets
  He has to feed them

Leading to a new coordination problem: Producer-Consumer


Bob Puts Food in the Pond



Alice releases her pets to feed

Producer/Consumer
Alice and Bob can't meet
  Each has restraining order on other
  So he puts food in the pond
  And later, she releases the pets

Avoid
  Releasing pets when there's no food
  Putting out food if uneaten food remains

Producer/Consumer
Need a mechanism so that
Bob lets Alice know when food has been put out
Alice lets Bob know when to put out more food

Surprise Solution
[Figure: a cola can on Alice's windowsill]

Bob puts food in Pond



Bob knocks over Can



Alice Releases Pets



Alice Resets Can when Pets are Fed



Pseudocode
Alice's code:

    while (true) {
      while (can.isUp()) {}   // wait until Bob knocks the can over
      pet.release();
      pet.recapture();
      can.reset();            // stand the can back up
    }

Bob's code:

    while (true) {
      while (can.isDown()) {} // wait until Alice resets the can
      pond.stockWithFood();
      can.knockOver();
    }
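A runnable Java sketch of the can protocol, assuming the can is a single `AtomicBoolean` that starts upright (no food yet) and the pond holds a counted amount of food. The names `CanProtocol`, `foodInPond`, `meals`, and `demo` are hypothetical; the sketch counts a violation whenever the pets eat without exactly one serving present or Bob stocks food while some remains.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the can-based producer/consumer protocol.
class CanProtocol {
  final AtomicBoolean canUp = new AtomicBoolean(true); // upright: no food put out yet
  final AtomicInteger foodInPond = new AtomicInteger(0);
  final AtomicInteger violations = new AtomicInteger(0);
  final AtomicInteger meals = new AtomicInteger(0);

  void alice(int rounds) { // consumer
    for (int r = 0; r < rounds; r++) {
      while (canUp.get()) Thread.yield();                      // wait until Bob knocks the can over
      if (foodInPond.getAndSet(0) != 1) violations.incrementAndGet(); // pets eat
      meals.incrementAndGet();
      canUp.set(true);                                         // reset the can
    }
  }

  void bob(int rounds) { // producer
    for (int r = 0; r < rounds; r++) {
      while (!canUp.get()) Thread.yield();                     // wait until Alice resets the can
      if (foodInPond.getAndIncrement() != 0) violations.incrementAndGet(); // stock food
      canUp.set(false);                                        // knock the can over
    }
  }

  static CanProtocol demo(int rounds) {
    CanProtocol c = new CanProtocol();
    Thread a = new Thread(() -> c.alice(rounds));
    Thread b = new Thread(() -> c.bob(rounds));
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return c;
  }
}
```

Because Alice resets the can after every meal, one can suffices here, unlike the mutual exclusion setting where Bob ran out of cans.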

Correctness
Mutual Exclusion (safety)
  Pets and Bob never together in pond
No Starvation (liveness)
  If Bob is always willing to feed, and pets are always famished, then pets eat infinitely often.
Producer/Consumer (safety)
  The pets never enter the pond unless there is food, and Bob never provides food if there is unconsumed food.

Could Also Solve Using Flags



Waiting
Both solutions use waiting
while(mumble){}

In some cases waiting is problematic


If one participant is delayed
So is everyone else
But delays are common & unpredictable


The Fable drags on


Bob and Alice still have issues
So they need to communicate
They agree to use billboards

Billboards are Large

[Figure: letter tiles from a Scrabble box]

Write One Letter at a Time

[Figure: a message being spelled out on the billboard one letter tile at a time]

To post a message

[Figure: the billboard reads WASH THE CAR]

Let's send another message


[Figure: the new message SELL LAVA LAMPS]

Uh-Oh

[Figure: midway through the update, the billboard reads SELL THE CAR, a mix of the old and new messages]

Readers/Writers
Devise a protocol so that
Writer writes one letter at a time
Reader reads one letter at a time
Reader sees snapshot
  Old message or new message
  No mixed messages

Readers/Writers (continued)
Easy with mutual exclusion
But mutual exclusion requires waiting
  One waits for the other
  Everyone executes sequentially

Remarkably
We can solve R/W without mutual exclusion


Esoteric?
Java container size() method
Single shared counter?
incremented with each add() and decremented with each remove()

Threads wait to exclusively access counter


Readers/Writers Solution
Each thread i has its own size[i] counter
  only it increments or decrements.

To get the object's size, a thread reads a snapshot of all counters
This eliminates the bottleneck
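Reading "a snapshot of all counters" takes some care, because counters may change while they are being read. One simple technique, swapped in here for illustration, is a clean double collect: read every counter twice and accept the values only if nothing changed in between. This is a sketch under stated assumptions: `SnapshotSize`, `update`, and the thread index `i` are hypothetical, `size[i]` is modeled by an `AtomicReferenceArray`, and only thread i may update slot i.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: per-thread size counters plus a double-collect snapshot.
class SnapshotSize {
  // Immutable entry: every update publishes a fresh object, so an unchanged
  // reference means the slot was not written between two reads.
  private static final class Entry {
    final long count;
    Entry(long count) { this.count = count; }
  }

  private final AtomicReferenceArray<Entry> slots;

  SnapshotSize(int nThreads) {
    slots = new AtomicReferenceArray<>(nThreads);
    for (int i = 0; i < nThreads; i++) slots.set(i, new Entry(0));
  }

  // Assumption: called only by thread i (one writer per slot), e.g. from add()/remove().
  void update(int i, long delta) {
    slots.set(i, new Entry(slots.get(i).count + delta));
  }

  private Entry[] collect() {
    Entry[] copy = new Entry[slots.length()];
    for (int i = 0; i < copy.length; i++) copy[i] = slots.get(i);
    return copy;
  }

  // Writers never wait; a reader retries until two consecutive collects agree,
  // so it may retry indefinitely if updates never pause (obstruction-free).
  long size() {
    Entry[] oldCopy = collect();
    while (true) {
      Entry[] newCopy = collect();
      boolean clean = true;
      for (int i = 0; i < newCopy.length && clean; i++) {
        clean = oldCopy[i] == newCopy[i];
      }
      if (clean) {
        long total = 0;
        for (Entry e : newCopy) total += e.count;
        return total;
      }
      oldCopy = newCopy;
    }
  }
}
```

Comparing references rather than counts avoids being fooled by an increment followed by a decrement between the two collects.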


Why do we care?
We want as much of the code as possible to execute concurrently (in parallel)
A larger sequential part implies reduced performance
Amdahl's law: this relation is not linear

Amdahl's Law

$$\text{Speedup} = \frac{\text{OldExecutionTime}}{\text{NewExecutionTime}}$$

of computation given n CPUs instead of 1

Amdahl's Law

$$\text{Speedup} = \frac{1}{1 - p + \frac{p}{n}}$$

where p is the parallel fraction, 1 - p is the sequential fraction, and n is the number of processors.

Example
Ten processors
60% concurrent, 40% sequential
How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.6 + \frac{0.6}{10}} = 2.17$$

Example
Ten processors
80% concurrent, 20% sequential
How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.8 + \frac{0.8}{10}} = 3.57$$

Example
Ten processors
90% concurrent, 10% sequential
How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.9 + \frac{0.9}{10}} = 5.26$$

Example
Ten processors
99% concurrent, 1% sequential
How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.99 + \frac{0.99}{10}} = 9.17$$

Back to Real-World Multicore Scaling


[Figure: on multicore, real-world user code speeds up only 1.8x, 2x, 2.9x]

Not reducing sequential % of code



Why?
Amdahl's Law:

Speedup = 1/(ParallelPart/N + SequentialPart)

Pay for N = 8 cores
SequentialPart = 25%
Speedup = only 2.9 times!

As the number of cores grows, the effect of the 25% becomes more acute: 2.3/4, 2.9/8, 3.4/16, 3.7/32 (speedup/cores).
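A small helper that evaluates Amdahl's law reproduces the numbers in these slides, both the ten-processor examples and the 25%-sequential scaling series. The class name `Amdahl` and the one-decimal `round1` helper are illustrative assumptions.

```java
// Hypothetical helper for Amdahl's law: Speedup = 1 / ((1 - p) + p / n).
class Amdahl {
  // p: parallel fraction in [0, 1]; n: number of processors.
  static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
  }

  // Round to one decimal place, matching figures such as "2.9/8".
  static double round1(double x) {
    return Math.round(x * 10) / 10.0;
  }
}
```

For example, `speedup(0.75, 8)` is about 2.91: even with eight cores, the 25% sequential part alone costs a quarter of the original running time, which caps the speedup below 4 no matter how many cores are added.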

Shared Data Structures


[Figure: cores (c) running code that is 75% unshared and 25% shared, with the shared part either coarse grained or fine grained]

The reason we get only 2.9 speedup: the 25% shared part is coarse grained.
Fine-grained parallelism has a huge performance benefit.

Multicore Programming
This is what this course is about
The percentage of code that is not easy to make concurrent, yet may have a large impact on overall speedup

Next:
A more serious look at mutual exclusion


This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


You are free:
  to Share: to copy, distribute and transmit the work
  to Remix: to adapt the work
Under the following conditions:
  Attribution. You must attribute the work to "The Art of Multiprocessor Programming" (but not in any way that suggests that the authors endorse you or your use of the work).
  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.
