
9/23/2014

Dr A Sahu
Dept of Computer Science & Engineering
IIT Guwahati

• Course Structure & Book
• Basics of threads and processes
• Coordination and synchronization
• Examples of Parallel Programming
– Shared memory: C/C++ Pthread, C++11 thread, OpenMP, Cilk
– Distributed memory: MPI
• Concurrent Objects
– Concurrent Queue, List, Stack, Tree, Priority Queue, Hash, SkipList
• Use of Concurrent objects

• Concurrent Programming of system
– Threads and processes
– Synchronization and monitors
– Concurrent objects
– Concurrent Programming in Java/MPI/CILK/C++
• Book
– Maurice Herlihy, Nir Shavit, Art of Multiprocessor Programming, Elsevier 2009
– Anthony Williams, C++ Concurrency in Action: Practical Multithreading, Dream tech Publication, 2012

• Programming to simulate concurrent behavior
– Multi-threading
– Doing many tasks simultaneously
• Platform of Concurrent Programming
– May be uni-processor
– May be shared or distributed memory multiprocessor
• Parallel Programming
– Enhancing performance of an application by running the program in parallel on a multiprocessor

• Process
– A sequential computation with its own thread of control
– A process can have many threads
• Thread
– A sequential computation is the sequence of program points that are reached as control flows through the source text
– Light weight process
– Many things are shared with the parent process

• Exchange of data between threads/processes
– Either by explicit message passing
– Or through the values of shared variables
• Between processes
– Message passing
– Message Passing Interface: MPI_Send(), MPI_Recv()
• Between threads
– Through the values of shared variables


• Synchronization relates the thread of one process with others
– If p is a point in the thread of a process P, and q is a point in the thread of another process Q, then synchronization can be used to constrain the order in which P reaches p and Q reaches q.
– Synchronization involves: exchange of control information between processes.

• Time-shared programs appear to run in parallel
– Even if they run on a uni-processor system
– Let's go back to the Pentium PC, RR scheduling
• Interrupts (Hardware)
– Allowed the activity of a central CPU to be synchronized with data channels.
– If a program P needed to read a card, the CPU could initiate the read action on a data channel and start executing another program Q. Once the card had been read, the channel sent INT to the CPU to resume execution of P.

• Reactive System: potential for parallelism occurs in the system
– User Interface: KBD, Mice and Display supporting multiple windows
– Network, Game, Processor controller

• No need to specify
• Process networks in Unix (Pipe)
P1 | P2 | … | Pn
– Each primitive process does a simple job, perhaps a trivial job
– but a short pipeline of processes can do what would otherwise be done by a substantial program
• Example
$ bc | number | speak
$ ls | wc -l
$ ps -A | grep mozilla

• Concurrent computation
– Can be described in terms of events, where an event is an un-interruptible action
– Event: execution of an assignment, call, or expression evaluation

• Interleaving: the relative order of atomic events
– An interleaving of two sequences S and T is any sequence U formed from the events of S and T
– Subject to constraints: the events of S retain their order in U, and so do the events of T
• Example: S={a,b,c,d,…}, T={1,2,3,4,..}
– One U can be {1,a,b,c,d,2,3,4,e,5,f,g..}
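The ordering constraint on an interleaving can be sketched as a small check. This is only an illustration: the function names `is_interleaving`, `count_interleavings`, and `check_slide_example` are hypothetical, events are modelled as single characters, and the two sequences are assumed to use disjoint alphabets (letters and digits, as in the example), which is what makes the greedy scan sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if u is an interleaving of s and t: u uses every event of s
// and of t exactly once, and each sequence keeps its own internal order.
// Assumption: s and t share no events, so a greedy left-to-right scan works.
bool is_interleaving(const std::string& s, const std::string& t,
                     const std::string& u) {
    std::size_t i = 0, j = 0;
    for (char e : u) {
        if (i < s.size() && e == s[i]) ++i;       // next event of S
        else if (j < t.size() && e == t[j]) ++j;  // next event of T
        else return false;                        // out of order or unknown
    }
    return i == s.size() && j == t.size();
}

// Number of interleavings of sequences with m and n events: C(m+n, m).
unsigned long long count_interleavings(unsigned m, unsigned n) {
    unsigned long long c = 1;
    for (unsigned k = 1; k <= m; ++k) c = c * (n + k) / k;
    return c;
}

// The example above, truncated to finite prefixes of S and T.
void check_slide_example() {
    assert(is_interleaving("abcdefg", "12345", "1abcd234e5fg"));
    assert(!is_interleaving("abcdefg", "12345", "1bacd234e5fg"));  // b before a
}
```

Even two short threads admit many interleavings (two sequences of 2 events each give 6), which is why reasoning about concurrent programs by enumerating schedules quickly becomes impractical.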


• Sharing Data
– Reader and Writer
– 1R, 1W, MR, 1R1W, MR1W, 1RMW, MRMW
– Synchronization necessary: one process should be writer
– Mutual Exclusion: Critical Section Problem
• Sharing Data: Reader and Writer
– More than 1 process, and 1 must be writer
• Barrier or Fence
– Wait until some thing
– Synchronized
– Example: phase-wise execution
    For_all_N_threads DoWork1();
    waits(N);
    For_all_N_threads DoWork2();
    waits(N);

• Locking and unlocking
– Mutex
• Hardware instructions to ensure locking
– Atomic instructions: TAS, LL/SC pair, XCHG, SWAP
– TAS (test and set)
– TTAS (test and test and set)
– TTAS with Backoff
• This part we will discuss towards the end of this course (in Nov)
– Atomic Register
– Safe Register
– Relative Power of Sync' operations
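The phase-wise barrier pattern (DoWork1, waits(N), DoWork2, waits(N)) might look roughly like this in C++11. This is a sketch under assumptions: the `Barrier` class and `run_two_phases` driver are hypothetical names, and the barrier is built from a mutex and a condition variable rather than taken from any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier for a fixed number of threads (hypothetical helper that
// plays the role of waits(N)).
class Barrier {
public:
    explicit Barrier(int n) : total_(n), waiting_(0), generation_(0) {
        assert(n > 0);
    }

    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        const int gen = generation_;
        if (++waiting_ == total_) {   // last thread to arrive
            waiting_ = 0;
            ++generation_;            // release everyone waiting in this phase
            cv_.notify_all();
        } else {
            cv_.wait(lk, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const int total_;
    int waiting_;
    int generation_;
};

// Runs two phases on n threads. Returns true if every thread observed all
// phase-1 results once it had passed the first barrier.
bool run_two_phases(int n) {
    Barrier barrier(n);
    std::vector<int> phase1(n, 0);
    std::vector<int> ok(n, 0);
    std::vector<std::thread> threads;
    for (int id = 0; id < n; ++id) {
        threads.emplace_back([&, id] {
            phase1[id] = id + 1;              // DoWork1()
            barrier.wait();                   // waits(N)
            int sum = 0;                      // DoWork2(): reads all phase-1 data
            for (int v : phase1) sum += v;
            ok[id] = (sum == n * (n + 1) / 2);
            barrier.wait();                   // waits(N)
        });
    }
    for (auto& t : threads) t.join();
    for (int v : ok) if (!v) return false;
    return true;
}
```

The generation counter is what makes the barrier reusable: a thread that races ahead into the second `wait()` cannot be confused with a thread still leaving the first one.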

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* Thread body: prints the message passed in through ptr. */
void *thr_func( void *ptr ){
    const char *message;
    message = (const char *) ptr;
    printf("%s \n", message);
    return NULL;
}

int main() {
    pthread_t thr1, thr2;
    const char *MSG1="Thr 1", *MSG2="Thr 2";
    int iret1, iret2;   /* return codes of pthread_create (0 on success) */
    iret1 = pthread_create( &thr1, NULL, thr_func, (void*) MSG1);
    iret2 = pthread_create( &thr2, NULL, thr_func, (void*) MSG2);
    pthread_join(thr1, NULL);   /* wait for both threads to finish */
    pthread_join(thr2, NULL);
    exit(0);
}
$ g++ -pthread pthread1.c -o pthread1

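#include <stdio.h>
#include <pthread.h>
#define NTH 10
int counter = 0;
pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER;

/* Unsynchronized increments: a data race on counter. */
void *SimpleCnt(void *arg) {
    for(int i=0;i<10;i++) counter++;
    return NULL;
}

/* Each increment is protected by the mutex M. */
void *MutexCnt(void *arg) {
    for(int i=0;i<10;i++){
        pthread_mutex_lock( &M );
        counter++;
        pthread_mutex_unlock( &M );
    }
    return NULL;
}

int main() {
    pthread_t th[NTH];
    int i, j;
    for(i=0; i < NTH; i++) {
        pthread_create(&th[i],NULL,SimpleCnt,NULL );
    }
    for(j=0; j < NTH; j++) {
        pthread_join( th[j], NULL);
    }
    printf("Final counter value: %d\n", counter);
}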


/* Same program, but the threads run MutexCnt (defined in the previous listing). */
int counter = 0;
int main() {
    pthread_t th[NTH];
    int i, j;
    for(i=0; i < NTH; i++) {
        pthread_create(&th[i],NULL,MutexCnt,NULL );
    }
    for(j=0; j < NTH; j++) {
        pthread_join( th[j], NULL);
    }
    printf("Final counter value: %d\n", counter);
}

[Figure: threads spin on the lock; one thread at a time runs the critical section (CS) and resets the lock upon exit]
• …lock introduces sequential bottleneck
• …lock suffers from contention
• Notice: these are distinct phenomena
• Seq Bottleneck → no parallelism


• …lock suffers from contention
• Contention → ???

Review: Test-and-Set
• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka "getAndSet"

package java.util.concurrent.atomic;
public class AtomicBoolean {
    boolean value;

    // Swap old and new values
    public synchronized boolean getAndSet(boolean newValue) {
        boolean prior = value;
        value = newValue;
        return prior;
    }
}

• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false

Test-and-set Lock
class TASlock {
    AtomicBoolean state =
        new AtomicBoolean(false);

    void lock() {
        // Keep trying until lock acquired
        while (state.getAndSet(true)) {}
    }

    void unlock() {
        state.set(false);
    }
}

Graph
[Graph: time vs. number of threads; the ideal curve shows no speedup because of the sequential bottleneck]
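As a rough C++ counterpart to the Java TASlock, the sketch below uses `std::atomic<bool>::exchange` in place of `getAndSet`. The names `TASLock` and `count_with_tas`, and the chosen memory orderings, are assumptions made for illustration; they do not come from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-set spin lock: exchange(true) returns the prior value, so a
// result of false means this thread changed the lock from free to taken.
class TASLock {
public:
    void lock() {
        while (state_.exchange(true, std::memory_order_acquire)) {
            // spin: the prior value was true, another thread holds the lock
        }
    }
    void unlock() { state_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> state_{false};
};

// Hypothetical driver: every thread increments a shared counter under the lock.
int count_with_tas(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    TASLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;      // critical section
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Every failed attempt in `lock()` is still an atomic read-modify-write, which is the source of the contention the slides go on to discuss.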


Mystery #1
[Graph: time vs. number of threads; curves: TAS lock, Ideal]
What is going on?

• Lurking stage
– Wait until lock "looks" free
– Spin while read returns true (lock taken)
• Pouncing stage
– As soon as lock "looks" available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking

class TTASlock {
    AtomicBoolean state =
        new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Wait until lock looks free
            while (state.get()) {}
            // Then try to acquire it
            if (!state.getAndSet(true))
                return;
        }
    }
}

Mystery #2
[Graph: time vs. number of threads; curves: TAS lock, TTAS lock, Ideal]
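The lurking and pouncing stages can be sketched in C++ with `std::atomic<bool>`: a plain `load` for lurking and a single `exchange` for pouncing. `TTASLock`, `count_with_ttas`, and the memory orderings are illustrative assumptions, not part of the slides.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class TTASLock {
public:
    void lock() {
        while (true) {
            // Lurking: spin on ordinary reads while the lock looks taken.
            while (state_.load(std::memory_order_relaxed)) {}
            // Pouncing: the lock looked free, so attempt one test-and-set.
            if (!state_.exchange(true, std::memory_order_acquire)) return;
            // Lost the race to another thread: go back to lurking.
        }
    }
    void unlock() { state_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> state_{false};
};

// Hypothetical driver: every thread increments a shared counter under the lock.
int count_with_ttas(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    TTASLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;      // critical section
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Both locks guarantee the same final count; the difference lies only in how much read-modify-write traffic the waiting threads generate.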

• Both
– TAS and TTAS
– Do the same thing (in our model)
• Except that
– TTAS performs much better than TAS
– Neither approaches ideal
• Approach: similar to the CSMA bus protocol
– If many threads are waiting for the shared lock (the lock is busy), wait for some time and then try again
– The waiting time may be fixed or increased exponentially.

// state, MIN_DELAY, MAX_DELAY, sleep() and random() are not shown on the slide
public class Backoff implements Lock {
    public void lock() {
        int delay = MIN_DELAY;
        while (true) {
            while (state.get()) {}                 // wait until lock looks free
            if (!state.getAndSet(true)) return;    // acquired
            sleep(random() % delay);               // lost the race: back off
            if (delay < MAX_DELAY)
                delay = 2 * delay;                 // exponential increase
        }
    }
}
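A C++ sketch of the exponential backoff idea follows. The delay bounds `kMinDelayUs` and `kMaxDelayUs` are arbitrary values chosen for illustration, and `BackoffLock` and `count_with_backoff` are hypothetical names; a real lock would tune the bounds for the target machine.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <random>
#include <thread>
#include <vector>

// Assumed bounds on the backoff window, in microseconds.
constexpr int kMinDelayUs = 1;
constexpr int kMaxDelayUs = 256;

class BackoffLock {
public:
    void lock() {
        thread_local std::mt19937 rng(std::random_device{}());
        int delay = kMinDelayUs;
        while (true) {
            while (state_.load(std::memory_order_relaxed)) {}  // lurk
            if (!state_.exchange(true, std::memory_order_acquire)) return;
            // Lost the race: sleep for a random time in [0, delay] and
            // double the window, up to the maximum.
            std::uniform_int_distribution<int> dist(0, delay);
            std::this_thread::sleep_for(std::chrono::microseconds(dist(rng)));
            if (delay < kMaxDelayUs) delay *= 2;
        }
    }
    void unlock() { state_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> state_{false};
};

// Hypothetical driver: every thread increments a shared counter under the lock.
int count_with_backoff(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    BackoffLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;      // critical section
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Randomizing the sleep keeps threads that lost the same race from retrying in lock step.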


Spin-Waiting Overhead
[Graph: time vs. number of threads; curves: TTAS Lock, Backoff lock]

• Shared memory
– Pthread, C++11 thread
– Java
– OpenMP
– Cilk
• Distributed Memory
– MPI

• Asynchronous tasks and threads
• Promises and tasks
• Mutexes and condition variables
• Atomics

• Two ways: std::async and std::thread
• It's all about things that are Callable:
– Functions and member functions
– Objects with operator() and lambda functions (anonymous functions)
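The kinds of callable listed above can each be handed to `std::async` or `std::thread`. The sketch below shows one of each; `twice`, `Scaler`, and `run_all_callables` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <future>
#include <thread>

int twice(int x) { return 2 * x; }                      // plain function

struct Scaler {
    int factor;
    int scale(int x) const { return factor * x; }       // member function
    int operator()(int x) const { return factor * x; }  // function object
};

// Launches each kind of callable once and returns the sum of the results.
int run_all_callables() {
    Scaler s{3};
    auto f1 = std::async(std::launch::async, twice, 1);               // 2
    auto f2 = std::async(std::launch::async, &Scaler::scale, &s, 2);  // 6
    auto f3 = std::async(std::launch::async, s, 3);                   // 9
    auto f4 = std::async(std::launch::async,
                         [](int x) { return x + 10; }, 4);            // 14

    int from_thread = 0;
    std::thread t([&from_thread] { from_thread = twice(5); });        // 10
    t.join();   // std::thread has no return value; results go via shared state

    int total = f1.get() + f2.get() + f3.get() + f4.get() + from_thread;
    assert(total == 41);
    return total;
}
```

A member function is passed as a pointer to member followed by the object it should run on; here `&s` is safe because `s` outlives the future.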

#include <future>   // for std::async
#include <iostream>
#include <string>
void write_message(std::string const& message) {
    std::cout<<message;
}
int main() {
    auto f=std::async(write_message,
        "hello world from std::async\n");
    write_message("hello world from main\n");
    f.wait();
}
$ g++ -std=c++0x -pthread test.cpp

#include <thread>   // for std::thread
#include <iostream>
#include <string>
void write_message(std::string const& message) {
    std::cout<<message;
}
int main() {
    std::thread t(write_message,
        "hello world from std::thread\n");
    write_message("hello world from main\n");
    t.join();
}


• std::launch::async => "as if" in a new thread.
• std::launch::deferred => executed on demand.
• std::launch::async | std::launch::deferred => implementation chooses (default).

#include <future>
#include <iostream>
#include <string>
void write_message(std::string const& message) {
    std::cout<<message;
}
int main() {
    auto f=std::async(
        std::launch::async, write_message,
        "hello world from std::async\n");
    write_message("hello world from main\n");
    f.wait();
}
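One way to observe the difference between the two launch policies is to compare thread ids. In this sketch, `runs_on_caller` is a hypothetical helper: a deferred task runs on whichever thread calls `get()` or `wait()`, while an async task runs on a separate thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Returns true if a task launched with the given policy ran on the thread
// that called this function.
bool runs_on_caller(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    auto f = std::async(policy, [] { return std::this_thread::get_id(); });
    return f.get() == caller;   // for deferred, the task runs inside get()
}
```

With `std::launch::deferred` the helper reports true; with `std::launch::async` it reports false. Under the default policy either outcome is permitted, so code should not depend on it.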

#include <future>
#include <iostream>
int find_the_answer() {
    return 42;
}
int main() {
    auto f=std::async(find_the_answer);
    std::cout<<"the answer is "
             <<f.get()<<"\n";   // get() waits for the result
}

class NewThread implements Runnable {
    Thread t;
    NewThread() {
        t = new Thread(this, "Demo Thread");
        System.out.println("Child thread: " + t);
        t.start();  // Start the thread
    }
    public void run() {
        for(int i = 5; i > 0; i--) {
            System.out.println("Child Thread: " + i);
        }
    }
}

public class ThreadDemo {
    public static void main(String args[]) {
        new NewThread();
        for(int i = 5; i > 0; i--) {
            System.out.println("Main Thread: " + i);
        }
    }
}
$ javac ThreadDemo.java
$ java ThreadDemo

• We need to identify parallelism
– How to extract parallelism manually
– Parallel decomposition
• Code in the threaded model
• OS is responsible for running it efficiently
– Less control over runtime
