
Shared Memory: OpenMP

Environment and Synchronization

OpenMP API Overview
API is a set of compiler directives inserted in the
source program (in addition to some library functions).
Ideally, compiler directives do not affect sequential code:
pragmas in C / C++,
(special) comments in Fortran code.
API Semantics
Master thread executes sequential code.
Master and slaves execute parallel code.
Note: very similar to fork-join semantics of Pthreads
create/join primitives.
OpenMP Directives
Parallelization directives:
parallel region
parallel for
Data environment directives:
shared, private, threadprivate, reduction, etc.
Synchronization directives:
barrier, critical
General Rules about Directives
They always apply to the next statement, which must
be a structured block.
#pragma omp …
statement;
#pragma omp …
{ statement1; statement2; statement3; }
OpenMP Parallel Region
#pragma omp parallel

A number of threads are spawned at entry.

Each thread executes the same code.
Each thread waits at the end.
Very similar to a number of create/join’s with the
same function in Pthreads.
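A minimal sketch of the fork-join behaviour described above. The function name `count_team` is hypothetical, and the `atomic` directive is used here only to make the shared increment safe; it is not otherwise covered in these slides.

```c
/* Every thread in the team executes the block once, so the counter
   ends up equal to the team size. Compiled without OpenMP, the pragmas
   are ignored and the function simply returns 1. */
int count_team(void)
{
    int arrived = 0;              /* shared by all threads */
    #pragma omp parallel
    {
        #pragma omp atomic
        arrived++;                /* same code, run by each thread */
    }                             /* implicit join: all threads wait here */
    return arrived;
}
```

After the region ends only the master thread continues, so the return value can be read without further synchronization.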
Getting Threads to do Different Things
Through explicit thread identification (as in Pthreads).
Through work-sharing directives.
Thread Identification
int omp_get_thread_num()
Gets the thread id.
int omp_get_num_threads()
Gets the total number of threads.
#pragma omp parallel
{
if( !omp_get_thread_num() )
{ … }  /* thread 0 (master) only */
else
{ … }  /* all other threads */
}
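The two library calls can be exercised with a sketch like the following. `mark_thread_ids` and `MAX_TEAM` are hypothetical names, and the `#else` branch supplies serial stand-ins so the file still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the file still builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_TEAM 4

/* Marks seen[id] for every thread in the team and returns the team
   size as observed by the master thread (id 0). */
int mark_thread_ids(int seen[MAX_TEAM])
{
    int team_size = 0;
    #pragma omp parallel num_threads(MAX_TEAM)
    {
        int id = omp_get_thread_num();   /* 0 .. team size - 1 */
        seen[id] = 1;                    /* each thread writes its own slot */
        if (id == 0)                     /* only the master takes this branch */
            team_size = omp_get_num_threads();
    }
    return team_size;
}
```

The `num_threads` clause caps the team at `MAX_TEAM`, which keeps the index into `seen` in bounds; the runtime may still provide fewer threads.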
Work Sharing Directives
Always occur within a parallel region directive.
Two principal ones are
for (parallel for when combined with parallel)
sections (parallel sections when combined with parallel)
OpenMP Parallel For
#pragma omp parallel
#pragma omp for
for( … ) { … }
Each thread executes a subset of the iterations.
All threads wait at the end of the parallel for.
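As an illustration, a loop with fully independent iterations might be parallelized as below. `vec_add` is a hypothetical helper, not something from the slides.

```c
/* Element-wise vector addition. The iterations are independent, so the
   runtime may hand any subset of them to any thread in the team. */
void vec_add(const double *a, const double *b, double *c, int n)
{
    int i;                        /* the loop variable is made private automatically */
    #pragma omp parallel
    #pragma omp for
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    /* implicit barrier: c is complete once the loop construct ends */
}
```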
Multiple Work Sharing Directives
May occur within a single parallel region:
#pragma omp parallel
{
#pragma omp for
for( ; ; ) { … }
#pragma omp for
for( ; ; ) { … }
}
All threads wait at the end of the first for.
The NoWait Qualifier
#pragma omp parallel
{
#pragma omp for nowait
for( ; ; ) { … }
#pragma omp for
for( ; ; ) { … }
}
Threads proceed to the second for without waiting.
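Dropping the barrier is only safe when the second loop does not read what the first loop writes. A minimal sketch under that assumption (`scale_and_shift` is a hypothetical name, and `a` and `b` are assumed not to overlap):

```c
/* Two loops over different arrays inside one parallel region.
   Because the second loop never touches a[], threads that finish
   their share of the first loop can start on the second at once. */
void scale_and_shift(double *a, double *b, int n)
{
    #pragma omp parallel
    {
        #pragma omp for nowait    /* no barrier after this loop */
        for (int i = 0; i < n; i++)
            a[i] *= 2.0;

        #pragma omp for           /* barrier at the end of this one */
        for (int i = 0; i < n; i++)
            b[i] += 1.0;
    }
}
```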
A parallel loop is an example of independent work
units that are numbered.
If you have a pre-determined number of independent
work units, the sections construct is more appropriate.
A sections construct can contain any number
of section constructs, and each should be independent of the others.
They can be executed by any available thread in the
current team.
Parallel Sections Directive

#pragma omp parallel
{
#pragma omp sections
{
#pragma omp section  /* this is a delimiter */
{ … }
#pragma omp section
{ … }
}
}

Example: parallelizing y = f(x) + g(x)
double y1,y2;
#pragma omp parallel sections
{
#pragma omp section
y1 = f(x);
#pragma omp section
y2 = g(x);
}
y = y1+y2;
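A self-contained version of this pattern could look like the sketch below. The bodies of `f` and `g` are placeholders chosen for illustration, and `sum_f_g` is a hypothetical wrapper.

```c
/* Placeholder work functions; any two independent computations would do. */
static double f(double x) { return x * x; }
static double g(double x) { return 2.0 * x; }

/* Evaluates f(x) and g(x) as two independent sections, then combines
   the results after the implicit barrier at the end of the construct. */
double sum_f_g(double x)
{
    double y1 = 0.0, y2 = 0.0;    /* shared; each section writes only one */
    #pragma omp parallel sections
    {
        #pragma omp section
        y1 = f(x);
        #pragma omp section
        y2 = g(x);
    }
    return y1 + y2;
}
```

With only two sections, at most two threads do useful work, which is why this construct suits a small, fixed number of tasks rather than a long loop.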
Single directive
It limits the execution of a block to a single thread.
Use it when a computation needs to be done only once.
Helpful for initializing shared variables.
#pragma omp parallel
{
#pragma omp single
printf("Inside section single!\n");
// Try to get thread numbers using omp_get_thread_num
// parallel code
}
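The initialization use case can be sketched as follows. `single_init_count` and the value 42 are hypothetical; the returned counter is there only to show that the block ran once.

```c
/* One thread initializes *shared_value; the function returns how many
   times the single block was executed, which should always be 1. */
int single_init_count(int *shared_value)
{
    int init_count = 0;
    #pragma omp parallel
    {
        #pragma omp single
        {
            *shared_value = 42;   /* executed by exactly one thread */
            init_count++;
        }
        /* implicit barrier at the end of single: from here on,
           every thread in the team sees the initialized value */
    }
    return init_count;
}
```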
Exercise 1:
Implement matrix multiplication using the sections construct and
observe the time taken.
Implement matrix multiplication as a serial program and
observe the time taken.
Exercise 2:
Data Environment Directives (2 of 2)
Private Variables
#pragma omp parallel for private( list )

Makes a private copy for each thread for each variable in the list.
This and all further examples use parallel for, but the
same applies to other region and work-sharing directives.
Private Variables: Example (1 of 2)
for( i=0; i<n; i++ ) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
Swaps the values in a and b.
Loop-carried dependence on tmp.
Easily fixed by privatizing tmp.
Private Variables: Example (2 of 2)
#pragma omp parallel for private( tmp )
for( i=0; i<n; i++ ) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
Removes dependence on tmp.
Would be more difficult to do in Pthreads.
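Wrapped in a function, the privatized swap might look like this sketch (`swap_arrays` is a hypothetical name):

```c
/* Swaps the contents of a and b element by element. tmp is declared
   outside the loop, so without private(tmp) all threads would share
   one copy and overwrite each other's saved values. */
void swap_arrays(int *a, int *b, int n)
{
    int i, tmp;
    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp  = a[i];              /* each thread uses its own tmp */
        a[i] = b[i];
        b[i] = tmp;
    }
}
```

Declaring `tmp` inside the loop body would have the same effect, since variables declared inside the construct are private by default.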
Private variables are private on a parallel region basis.
Threadprivate variables are global variables that are
private throughout the execution of the program.
#pragma omp threadprivate( list )
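Example: #pragma omp threadprivate( x )
Requires program change in Pthreads.
Requires an array of size p.
Access as x[id], where id is a thread index in 0..p-1 (pthread_self() returns an opaque pthread_t, which is not a portable array index).
Costly if accessed frequently.
Not cheap in OpenMP either.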
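One way to see threadprivate in action is a per-thread counter that is combined at the end of the region. This is a sketch only: `hits` and `count_even` are hypothetical names, and `atomic` is used for the final combination step.

```c
/* File-scope variable with one copy per thread. */
static int hits = 0;
#pragma omp threadprivate(hits)

/* Counts the even elements of a[]. Each thread accumulates into its
   own copy of hits, so the loop body needs no locking. */
int count_even(const int *a, int n)
{
    int total = 0;
    #pragma omp parallel
    {
        hits = 0;                 /* reset this thread's copy */
        #pragma omp for
        for (int i = 0; i < n; i++)
            if (a[i] % 2 == 0)
                hits++;
        #pragma omp atomic
        total += hits;            /* combine the per-thread copies */
    }
    return total;
}
```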
Reduction Variables
#pragma omp parallel for reduction( op:list )
op is one of +, *, -, &, ^, |, &&, or ||
The variables in list must be used with this operator in
the loop.
The variables are automatically initialized to sensible values.
Reduction Variables: Example
#pragma omp parallel for reduction( +:sum )
for( i=0; i<n; i++ )
sum += a[i];

Each thread's private copy of sum is automatically initialized to zero.
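The same clause applies to any loop that accumulates with one of the listed operators. As a sketch, a dot product (`dot` is a hypothetical helper):

```c
/* Dot product of two integer vectors. Each thread accumulates into a
   private copy of sum that starts at 0; the copies are added to the
   original sum when the loop ends. */
long dot(const int *a, const int *b, int n)
{
    long sum = 0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}
```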

#include <stdio.h>
#include <omp.h>
int main() {
  int x;
  x = 2;
  #pragma omp parallel num_threads(2) shared(x)
  {
    if (omp_get_thread_num() == 0) {
      x = 5;
    } else {
      /* Print 1: the following read of x has a race */
      printf("1: Thread# %d: x = %d\n", omp_get_thread_num(),x );
    }
    #pragma omp barrier
    if (omp_get_thread_num() == 0) {
      /* Print 2 */
      printf("2: Thread# %d: x = %d\n", omp_get_thread_num(),x );
    } else {
      /* Print 3 */
      printf("3: Thread# %d: x = %d\n", omp_get_thread_num(),x );
    }
  }
  return 0;
}
Synchronization Primitives
#pragma omp critical (name)
Implements critical sections by name.
Similar to Pthreads mutex locks (name ~ lock).
#pragma omp barrier
Implements global barrier.
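A named critical section might be used as in the following sketch, where each thread finds a local maximum and then merges it into a shared result. `array_max` and the section name `update_best` are hypothetical, and the array is assumed to hold at least one element.

```c
/* Returns the largest element of a[0..n-1], assuming n >= 1. */
int array_max(const int *a, int n)
{
    int best = a[0];              /* shared result */
    #pragma omp parallel
    {
        int local = a[0];         /* private per-thread maximum */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            if (a[i] > local)
                local = a[i];

        /* Only one thread at a time may run this block, much like
           holding a mutex called update_best. */
        #pragma omp critical (update_best)
        {
            if (local > best)
                best = local;
        }
    }
    return best;
}
```

Keeping the comparison loop outside the critical section means the lock is taken once per thread rather than once per element.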
sum = 0;
#pragma omp parallel for reduction(+:sum)
for( i=0; i<n; i++ )
sum += a[i];

Dependence on sum is removed.

Use OpenMP to implement a producer-consumer program
in which some of the threads are producers and others are
consumers. The producers read text from a collection of
files, one per producer. They insert lines of text into a
single shared queue. The consumers take the lines of text
and tokenize them. Tokens are "words".

A search engine can be implemented using a farm of
servers; each contains a subset of the data that can be searched.
Assume that this server farm has a single front-end that
interacts with clients who submit queries. Implement the
above server farm using the master-worker pattern.
