Concurrent Programming Tutorial-1

9/23/2014
Dr A Sahu
Dept of Computer Science & Engineering
IIT Guwahati

• Course Structure & Book
• Basics of threads and processes
• Coordination and synchronization
• Examples of Parallel Programming
– Shared memory: C/C++ Pthread, C++11 thread, OpenMP, Cilk
– Distributed memory: MPI
• Concurrent Objects
– Concurrent Queue, List, Stack, Tree, Priority Queue, Hash, SkipList
• Use of Concurrent objects

• Programming to simulate concurrent behavior
– Threads and processes
– Synchronization and monitors
– Concurrent objects
– Concurrent Programming in Java/MPI/Cilk/C++
• Book
– Maurice Herlihy, Nir Shavit, The Art of Multiprocessor Programming, Elsevier, 2009
– Anthony Williams, C++ Concurrency in Action: Practical Multithreading, Dreamtech Publication, 2012

• Concurrent Programming of a system
– Multi‐threading
– Doing many tasks simultaneously
• Platform of Concurrent Programming
– May be a uni‐processor
– May be a shared or distributed memory multiprocessor
• Parallel Programming
– Enhancing the performance of an application by running the program in parallel on a multiprocessor
• Process
– A sequential computation with its own thread of control
– A process can have many threads
• Thread
– A sequential computation is the sequence of program points that are reached as control flows through the source text
– Lightweight process
– Shares many things with its parent process

• Exchange of data between threads/processes
– Either by explicit message passing
– Or through the values of shared variables
• Between processes
– Message passing
– Message Passing Interface: MPI_Send(), MPI_Recv()
• Between threads
– Through the values of shared variables
• Synchronization relates the thread of one process to the threads of others
– If p is a point in the thread of a process P, and q is a point in the thread of another process Q, then synchronization can be used to constrain the order in which P reaches p and Q reaches q.
– Synchronization involves the exchange of control information between processes.

• Time‐shared programs appear to run in parallel
– Even if they run on a uni‐processor system
– Think back to a Pentium PC with round‐robin (RR) scheduling
• Interrupts (hardware)
– Allowed the activity of a central CPU to be synchronized with data channels.
– If a program P needed to read a card, the CPU could initiate the read action on a data channel and start executing another program Q. Once the card had been read, the channel sent an interrupt (INT) to the CPU to resume execution of P.
• Reactive System: potential for parallelism occurs in the system
– User interface: keyboard, mice and a display supporting multiple windows
– Network, game, processor controller

• No need to specify
• Process networks in Unix (pipes)
  P1 | P2 | … | Pn
– Each primitive process does a simple job, perhaps a trivial job
– But a short pipeline of processes can do what would otherwise be done by a substantial program
• Example
  $ bc | number | speak
  $ ls | wc -l
  $ ps -A | grep mozilla
• Concurrent computation
– Can be described in terms of events, where an event is an uninterruptible action
– Event: execution of an assignment, a call, or an expression evaluation

• Interleaving: the relative order of atomic events
– An interleaving of two sequences S and T is any sequence U formed from the events of S and T
– Subject to the constraint that the events of S retain their order in U, and so do the events of T
• Example: S={a,b,c,d,…}, T={1,2,3,4,..}
– One U can be {1,a,b,c,d,2,3,4,e,5,f,g..}
• Sharing Data
– Reader and Writer
– 1R, 1W, MR, 1R1W, MR1W, 1RMW, MRMW
– Synchronization is necessary when one process is a writer
– Mutual Exclusion: Critical Section Problem
• Barrier or Fence
– Wait until something
– Synchronized
– Example: phase‐wise execution
    For_all_N_threads DoWork1();
    waits(N);
    For_all_N_threads DoWork2();
    waits(N);

• Sharing Data: Reader and Writer
– More than one process, and one must be a writer
• Locking and unlocking
– Mutex
• Hardware instructions to ensure locking
– Atomic instructions: TAS, LL/SC pair, XCHG, SWAP
– TAS (test‐and‐set)
– TTAS (test‐and‐test‐and‐set)
– TTAS with Backoff
• Atomic Register
• Safe Register
• Relative power of synchronization operations
(This part will be discussed towards the end of the course, in Nov.)
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* Thread body: prints the message passed in through ptr. */
void *thr_func(void *ptr) {
    char *message;
    message = (char *) ptr;
    printf("%s \n", message);
    return NULL;
}

int main() {
    pthread_t thr1, thr2;
    const char *MSG1 = "Thr 1", *MSG2 = "Thr 2";
    int iret1, iret2;
    iret1 = pthread_create(&thr1, NULL, thr_func, (void*) MSG1);
    iret2 = pthread_create(&thr2, NULL, thr_func, (void*) MSG2);
    pthread_join(thr1, NULL);   /* wait for both threads to finish */
    pthread_join(thr2, NULL);
    exit(0);
}

$ g++ -pthread pthread1.c -o pthread1
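#include <stdio.h>
#include <pthread.h>

#define NTH 10
int counter = 0;
pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER;

/* Unsynchronized increments: a data race on counter. */
void *SimpleCnt(void *arg) {
    for (int i = 0; i < 10; i++) counter++;
    return NULL;
}

/* Each increment is protected by the mutex M. */
void *MutexCnt(void *arg) {
    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&M);
        counter++;
        pthread_mutex_unlock(&M);
    }
    return NULL;
}

int main() {
    pthread_t th[NTH];
    int i, j;
    for (i = 0; i < NTH; i++) {
        pthread_create(&th[i], NULL, SimpleCnt, NULL);
    }
    for (j = 0; j < NTH; j++) {
        pthread_join(th[j], NULL);
    }
    printf("Final counter value: %d\n", counter);
}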
int counter = 0;
int main() {
    pthread_t th[NTH];
    int i, j;
    /* Same driver as before, now running the mutex-protected MutexCnt. */
    for (i = 0; i < NTH; i++) {
        pthread_create(&th[i], NULL, MutexCnt, NULL);
    }
    for (j = 0; j < NTH; j++) {
        pthread_join(th[j], NULL);
    }
    printf("Final counter value: %d\n", counter);
}
[Figure: basic spin lock. Threads spin on the lock, one thread at a time runs the critical section (CS), and the lock is reset upon exit.]
• …lock introduces sequential bottleneck
• …lock suffers from contention
• Notice: these are distinct phenomena
• Seq Bottleneck → no parallelism
[Figure: basic spin lock (spin, critical section, lock reset upon exit).]
• …lock suffers from contention
• Contention → ???

Review: Test‐and‐Set
• Boolean value
• Test‐and‐set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”
package java.util.concurrent.atomic;
public class AtomicBoolean {
    boolean value;

    public synchronized boolean getAndSet(boolean newValue) {
        boolean prior = value;   // swap old and new values
        value = newValue;
        return prior;
    }
}

• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false
Test‐and‐set Lock
class TASlock {
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (state.getAndSet(true)) {}   // keep trying until lock acquired
    }
    void unlock() {
        state.set(false);
    }
}

Graph
[Graph: time vs. number of threads. The ideal curve shows no speedup because of the sequential bottleneck.]
Mystery #1
[Graph: time vs. number of threads, comparing the TAS lock curve with the ideal curve. What is going on?]

• Lurking stage
– Wait until lock “looks” free
– Spin while read returns true (lock taken)
• Pouncing stage
– As soon as lock “looks” available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking
class TTASlock {
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) {}        // wait until lock looks free
            if (!state.getAndSet(true))   // then try to acquire it
                return;
        }
    }
}

Mystery #2
[Graph: time vs. number of threads, comparing the TAS lock, TTAS lock and ideal curves.]
• Both
– TAS and TTAS
– Do the same thing (in our model)
• Except that
– TTAS performs much better than TAS
– Neither approaches ideal
• Approach: similar to the CSMA bus protocol
– If many threads are waiting for the shared lock (the lock is busy), wait for some time and then try again
– The waiting time may be fixed or increased exponentially

public class Backoff implements Lock {
    public void lock() {
        int delay = MIN_DELAY;
        while (true) {
            while (state.get()) {}               // wait until lock looks free
            if (!state.getAndSet(true)) return;  // try to acquire it
            sleep(random() % delay);             // lost the race: back off (pseudocode)
            if (delay < MAX_DELAY)
                delay = 2 * delay;               // exponential backoff
        }
    }
}
Spin‐Waiting Overhead
[Graph: time vs. number of threads, comparing the TTAS lock and the Backoff lock.]

• Shared memory
– Pthread, C++11 thread
– Java
– OpenMP
– Cilk
• Distributed memory
– MPI
• Asynchronous tasks and threads
• Promises and tasks
• Mutexes and condition variables
• Atomics

• Two ways: std::async and std::thread
• It’s all about things that are Callable:
– Functions and member functions
– Objects with operator() and lambda functions (anonymous functions)
#include <future>   // for std::async
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout << message;
}

int main() {
    auto f = std::async(write_message,
                        "hello world from std::async\n");
    write_message("hello world from main\n");
    f.wait();
}

$ g++ -std=c++0x -pthread test.cpp

#include <thread>   // for std::thread
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout << message;
}

int main() {
    std::thread t(write_message,
                  "hello world from std::thread\n");
    write_message("hello world from main\n");
    t.join();
}
• std::launch::async => “as if” in a new thread.
• std::launch::deferred => executed on demand.
• std::launch::async | std::launch::deferred => implementation chooses (default).

#include <future>
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout << message;
}

int main() {
    auto f = std::async(std::launch::async, write_message,
                        "hello world from std::async\n");
    write_message("hello world from main\n");
    f.wait();
}
#include <future>
#include <iostream>

int find_the_answer() {
    return 42;
}

int main() {
    auto f = std::async(find_the_answer);
    std::cout << "the answer is " << f.get() << "\n";
}

class NewThread implements Runnable {
    Thread t;

    NewThread() {
        t = new Thread(this, "Demo Thread");
        System.out.println("Child thread: " + t);
        t.start(); // Start the thread
    }

    public void run() {
        for (int i = 5; i > 0; i--) {
            System.out.println("Child Thread: " + i);
        }
    }
}

public class ThreadDemo {
    public static void main(String args[]) {
        new NewThread();
        for (int i = 5; i > 0; i--) {
            System.out.println("Main Thread: " + i);
        }
    }
}

$ javac ThreadDemo.java
$ java ThreadDemo

• We need to identify parallelism
– How to extract parallelism manually
– Parallel decomposition
• Code in the threaded model
• The OS is responsible for running it efficiently
– Less control over runtime
