
Shared Memory: OpenMP

Environment and Synchronization


OpenMP API Overview
API is a set of compiler directives inserted in the
source program (in addition to some library functions).
Ideally, compiler directives do not affect sequential
code.
pragmas in C/C++.
(special) comments in Fortran code.
API Semantics
Master thread executes sequential code.
Master and slaves execute parallel code.
Note: very similar to fork-join semantics of Pthreads
create/join primitives.
OpenMP Directives
Parallelization directives:
parallel region
parallel for
Data environment directives:
shared, private, threadprivate, reduction, etc.
Synchronization directives:
barrier, critical
General Rules about Directives
They always apply to the next statement, which must
be a structured block.
Examples
#pragma omp …
statement
#pragma omp …
{ statement1; statement2; statement3; }
OpenMP Parallel Region
#pragma omp parallel

A number of threads are spawned at entry.
Each thread executes the same code.
Each thread waits at the end.
Very similar to a number of create/join's with the same function in Pthreads.
Getting Threads to do Different Things
Through explicit thread identification (as in Pthreads).
Through work-sharing directives.
Thread Identification
int omp_get_thread_num(): gets the id of the calling thread.
int omp_get_num_threads(): gets the total number of threads in the team.
Example
#pragma omp parallel
{
if( !omp_get_thread_num() )
master();
else
slave();
}
Work Sharing Directives
Always occur within a parallel region directive.
Two principal ones are
parallel for
parallel sections
OpenMP Parallel For
#pragma omp parallel
#pragma omp for
for( … ) { … }
Each thread executes a subset of the iterations.
All threads wait at the end of the parallel for.
Multiple Work Sharing Directives
May occur within a single parallel region
#pragma omp parallel
{
#pragma omp for
for( ; ; ) { … }
#pragma omp for
for( ; ; ) { … }
}
All threads wait at the end of the first for.
The NoWait Qualifier
#pragma omp parallel
{
#pragma omp for nowait
for( ; ; ) { … }
#pragma omp for
for( ; ; ) { … }
}
Threads proceed to second for w/o waiting.
Sections
A parallel loop is an example of numbered independent work units.
If you have a pre-determined number of independent work units, the sections construct is more appropriate.
A sections construct can contain any number of section constructs, each of which should be independent.
They can be executed by any available thread in the current team.
Parallel Sections Directive

#pragma omp parallel
{
#pragma omp sections
{
{ … }                  /* first section: no delimiter needed */
#pragma omp section    /* section acts as a delimiter */
{ … }
#pragma omp section
{ … }
}
}
Example:
y = f(x) + g(x)
double y1, y2;
#pragma omp parallel sections
{
#pragma omp section
y1 = f(x);
#pragma omp section
y2 = g(x);
}
y = y1 + y2;
Single directive
It limits the execution of a block to a single thread
If the computation needs to be done only once
Helpful for initializing shared variables
#pragma omp parallel
{
#pragma omp single
printf("Inside the single block!\n");
//Try to get thread numbers using omp_get_thread_num
// parallel code
}
Exercise 1:
Matrix multiplication using sections primitive and
observe the time taken
Matrix multiplication using serial programming and
observe the time taken
Exercise 2:
Data Environment Directives (2 of 2)
Private
Threadprivate
Reduction
Private Variables
#pragma omp parallel for private( list )
Makes a private copy of each variable in the list for each thread.
This and all further examples use parallel for, but the same applies to other region and work-sharing directives.
Private Variables: Example (1 of 2)
for( i=0; i<n; i++ ) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
Swaps the values in a and b.
Loop-carried dependence on tmp.
Easily fixed by privatizing tmp.
Private Variables: Example (2 of 2)
#pragma omp parallel for private( tmp )
for( i=0; i<n; i++ ) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
Removes dependence on tmp.
Would be more difficult to do in Pthreads.
Threadprivate
Private variables are private on a parallel region basis.
Threadprivate variables are global variables that are
private throughout the execution of the program.
Threadprivate
#pragma omp threadprivate( list )
Example: #pragma omp threadprivate(x)
In Pthreads this requires a program change: an array of size p, accessed as x[pthread_self()].
Costly if accessed frequently.
Not cheap in OpenMP either.
Reduction Variables
#pragma omp parallel for reduction( op:list )
op is one of +, *, -, &, ^, |, &&, or ||
The variables in list must be used with this operator in
the loop.
The variables are automatically initialized to sensible
values.
Reduction Variables: Example
#pragma omp parallel for reduction( +:sum )
for( i=0; i<n; i++ )
sum += a[i];

sum is automatically initialized to zero.
Barrier: Example
int main()
{
    int x;
    x = 2;
    #pragma omp parallel num_threads(2) shared(x)
    {
        if (omp_get_thread_num() == 0) {
            x = 5;
        } else {
            /* Print 1: the following read of x has a race */
            printf("1: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }
        #pragma omp barrier
        if (omp_get_thread_num() == 0) {
            /* Print 2 */
            printf("2: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        } else {
            /* Print 3: the barrier guarantees x is 5 here */
            printf("3: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }
    }
    return 0;
}
Synchronization Primitives
Critical
#pragma omp critical (name)
Implements critical sections by name.
Similar to Pthreads mutex locks (name ~ lock).
Barrier
#pragma omp barrier
Implements a global barrier.
Reduction
#pragma omp parallel for reduction( +:sum )
for( i=0; i<n; i++ )
sum += a[i];
The dependence on sum is removed.
Exercise
Use OpenMP to implement a producer-consumer program
in which some of the threads are producers and others are
consumers. The producers read text from a collection of
files, one per producer. They insert lines of text into a
single shared queue. The consumers take the lines of text
and tokenize them. Tokens are "words".
A search engine can be implemented using a farm of servers, each of which contains a subset of the data that can be searched.
Assume that this server farm has a single front-end that interacts with clients who submit queries. Implement the above server farm using the master-worker pattern.
