E-mail : c00tch00@nchc.gov.tw
Contents

1.  Getting Started with MPI in C
    1.1  Obtaining MPI
    1.2  Installing MPI
    1.3  Running MPI programs on the IBM SP2
         1.3.1  Compiling MPI C programs on the IBM SP2
         1.3.2  IBM SP2 job command files
         1.3.3  Submitting and monitoring IBM SP2 jobs
    1.4  Running MPI programs on a PC cluster
2.  Basic MPI Programs
    2.1  Essential MPI routines
         2.1.1  The mpi.h include file
         2.1.2  MPI_Init, MPI_Finalize
         2.1.3  MPI_Comm_size, MPI_Comm_rank
    2.2  Example program T2SEQ
    2.3  Example program T2CP
    2.4  MPI_Scatter, MPI_Gather, MPI_Reduce
         2.4.1  MPI_Scatter, MPI_Gather
    2.5  Example program T2DCP
3.  Boundary Data Exchange
    3.1  MPI_Sendrecv, MPI_Bcast
         3.1.1  MPI_Sendrecv
         3.1.2  MPI_Bcast
    3.2  Sequential program T3SEQ
    3.3  Computation partition: T3CP
    3.4  Data partition (one boundary element): T3DCP_1
    3.5  Data partition (two boundary elements): T3DCP_2
4.  Uneven Partitions
    4.1  Sequential program T4SEQ
    4.2  MPI_Scatterv, MPI_Gatherv
    4.3  MPI_Pack, MPI_Unpack, MPI_Barrier, MPI_Wtime
         4.3.1  MPI_Pack, MPI_Unpack
         4.3.2  MPI_Barrier, MPI_Wtime
    4.4  Data partition: T4DCP
5.  Multi-Dimensional Arrays
    5.1  Sequential program T5SEQ
    5.2  Computation partition: T5CP
    5.3  Data partition: T5DCP
    5.4  Cartesian CPU grids and derived data types
    5.5  Two-dimensional partition: T5_2D
6.  Further MPI Techniques
    6.1  Nonblocking communication
    6.2  Combining messages with MPI_Pack
    6.3  Reducing data dependency
    6.4  Parallel input/output
         6.4.1  Parallel input
         6.4.2  Parallel output
7.  MPI Derived Data Types, Transposing Block Distributions, 2-Way Recursive and Pipeline Methods
8.  SOR
9.  Applications
Appendix
    Parallel Processing of 1-D Arrays without Partition
    Parallel Processing of 1-D Arrays with Partition
    Parallel on the 1st Dimension of 2-D Arrays without Partition
    Parallel on the 1st Dimension of 2-D Arrays with Partition
    Partition on the 1st dimension of 3-D Arrays
Chapter 1  Getting Started with MPI

This chapter describes where to obtain MPI and how to compile and run MPI C programs on the IBM SP2 and on a PC cluster.

1.1  Obtaining MPI

The MPICH implementation of MPI can be downloaded from

   http://www-unix.mcs.anl.gov/mpi/mpich

or by anonymous ftp from ftp.mcs.anl.gov, directory pub/mpi, as mpich-1.2.1.tar.Z (or the gzip-compressed mpich-1.2.1.tar.gz).
1.2  Installing MPI

1.3  Running MPI Programs on the IBM SP2

1.3.1  Compiling MPI C programs on the IBM SP2

On the SP2 an MPI C program is compiled with level-3 optimization and the options -qarch=auto (generate code for the machine being compiled on) and -qstrict (keep strict IEEE arithmetic under optimization). The option -o file.x names the executable file.x; without it the executable gets the default name a.out.
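For example, a typical compile command (assuming IBM Parallel Environment's mpcc compile script; the source file name file.c is illustrative):

   mpcc -O3 -qarch=auto -qstrict -o file.x file.c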
1.3.2  IBM SP2 job command files

A LoadLeveler job command file for a four-CPU job looks like:

#!/bin/csh
#@ executable = /usr/bin/poe
#@ arguments = /your_working_directory/file.x
#@ output = outp4
#@ error = outp4
#@ job_type = parallel
#@ class = medium
#@ min_processors = 4
#@ max_processors = 4
#@ requirements = (Adapter == "hps_user")
#@ wall_clock_limit = 20
#@ queue
executable        the program to run: poe, the Parallel Operating Environment
arguments         the MPI executable that poe will start
output            file for standard output
error             file for standard error
job_type          parallel for MPI jobs
class             the job class, e.g. medium or long (the long class has 96 120 MHz CPUs, up to 24 per job)
min_processors    minimum number of CPUs for the job
max_processors    maximum number of CPUs for the job
requirements      (Adapter == "hps_user") requests user-space (euilib us) communication over the high-performance switch
wall_clock_limit  maximum wall-clock minutes the job may run
queue             submits the job step
On the SP2 SMP clusters the corresponding file is:

#!/bin/csh
#@ network.mpi = css0,shared,us
#@ executable = /usr/bin/poe
#@ arguments = /your_working_directory/file.x
#@ output = outp4
#@ error = outp4
#@ job_type = parallel
#@ class = medium
#@ tasks_per_node = 4
#@ node = 1
#@ wall_clock_limit = 20
#@ queue
Each IBM SP2 SMP node has 375 MHz CPUs and 4GB-8GB of memory. The network.mpi keyword selects user-space (us) communication over the css0 switch adapter, and tasks_per_node = 4 with node = 1 places all four MPI tasks on the CPUs of one node. The available SMP classes can be listed with llclass; for example the short class has 12 CPUs (3 nodes, at most 6 CPUs per job). The job command file, say jobp4, is submitted with

   llsubmit jobp4

and the queue is inspected with llq. To pick out one class or user id, pipe through grep; to see only the medium-class jobs:

   llq | grep medium
llq prints one line per job:

job_id       user_id   submitted   status priority class   running on
-----------  --------  ----------  ------ -------- ------  ----------
ivy1.1781.0  u43ycc00  8/13 11:24  R      50       medium  ivy39
ivy1.1814.0  u50pao00  8/13 20:12  R      50       short   ivy35
The LoadLeveler status codes are:

R   Running
I   Idle (waiting in the queue)
ST  Start execution
NQ  Not Queued

priority is the job's scheduling priority, class its job class, and running on names the first node executing it. A job is removed from the system with

   llcancel job_id
1.4  Running MPI Programs on a PC Cluster

1.4.1  Compiling

On a PC cluster a C MPI program file.c is compiled with mpicc, using -O3 for optimization as with gcc; -o file.x names the executable (the default is a.out).
When MPICH is built on top of the PGI C compiler pgcc, the makefile looks like:

OBJ   = file.o
EXE   = file.x
MPI   = /home/package/mpich_PGI
LIB   = $(MPI)/lib/libmpich.a
MPICC = $(MPI)/bin/mpicc
OPT   = -O2 -I$(MPI)/include

$(EXE) : $(OBJ)
	$(MPICC) $(LFLAG) -o $(EXE) $(OBJ) $(LIB)

.c.o :
	$(MPICC) $(OPT) -c $<

Typing make then builds file.x.
1.4.2
PC cluster DQS
DQS job command file job command file jobp4
CPU hubksp :
#!/bin/csh
#$ -l qty.eq.4,HPCS00
#$ -N HUP4
#$ -A user_id
#$ -cwd
#$ -j y
cat $HOSTS_FILE > MPI_HOST
mpirun -np 4 -machinefile MPI_HOST hubksp >& outp4
#!/bin/csh     the file is interpreted as a C shell script
#$ -l ...      resource list: qty (quantity) requests four CPUs, and HPCS00 names the cluster queue class
#$ -N HUP4     the job's Name is HUP4
#$ -A user_id  the Account to charge
#$ -cwd        run in the current working directory (otherwise the home directory is used)
#$ -j y        join standard error into standard output
$HOSTS_FILE    the hosts allocated by DQS, copied into MPI_HOST
mpirun         starts hubksp on 4 CPUs (-np 4); output is redirected to outp4
1.4.3  Submitting and monitoring PC cluster jobs

The job command file jobp4 is submitted with qsub jobp4. qstat reports the jobs running in the cluster (qstat -f also shows the state of every node):

c00tch00 HUP4  hpcs01  62  0:1  r  RUNNING  02/26/99 10:51:23
c00tch00 HUP4  hpcs02  62  0:1  r  RUNNING  02/26/99 10:51:23
c00tch00 HUP4  hpcs03  62  0:1  r  RUNNING  02/26/99 10:51:23
c00tch00 HUP4  hpcs04  62  0:1  r  RUNNING  02/26/99 10:51:23

A job is deleted by referring to its job_id.
Chapter 2  Basic MPI Programs

2.1  Essential MPI routines

Six routines are the working set of almost every MPI program:

   MPI_Init, MPI_Finalize,
   MPI_Comm_size, MPI_Comm_rank,
   MPI_Send, MPI_Recv

Every file that calls MPI routines must include mpi.h. MPI_Init must be called before any other MPI routine, and MPI_Finalize after the last one:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
main ( argc, argv)
int argc;
char **argv;
{
MPI_Init(&argc, &argv);
...
MPI_Finalize();
return 0;
}
MPI_Comm_size returns the number of CPUs (nproc) running the program, and MPI_Comm_rank this CPU's id (myid, between 0 and nproc-1):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int nproc, myid;
main ( argc, argv)
int argc;
char **argv;
{
   MPI_Init(&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &nproc);
   MPI_Comm_rank (MPI_COMM_WORLD, &myid);
   ...
   MPI_Finalize();
   return 0;
}
The data passed to MPI_Send and MPI_Recv may be a scalar or an array; icount gives the number of elements of the given data type, not the number of bytes. The MPI data types for C are:

C data type          MPI data type        description
signed char          MPI_CHAR             1-byte character
signed short int     MPI_SHORT            2-byte integer
signed int           MPI_INT              4-byte integer
signed long int      MPI_LONG             4-byte integer
unsigned char        MPI_UNSIGNED_CHAR    1-byte unsigned character
unsigned short int   MPI_UNSIGNED_SHORT   2-byte unsigned integer
unsigned int         MPI_UNSIGNED         4-byte unsigned integer
unsigned long int    MPI_UNSIGNED_LONG    4-byte unsigned integer
float                MPI_FLOAT            4-byte floating point
double               MPI_DOUBLE           8-byte floating point
long double          MPI_LONG_DOUBLE      8-byte floating point
In C, data is sent with MPI_Send and received with MPI_Recv:

MPI_Send ((void *)&data, icount, DATA_TYPE, idest, itag, MPI_COMM_WORLD);
MPI_Recv ((void *)&data, icount, DATA_TYPE, isrc, itag, MPI_COMM_WORLD, istat);

data       the data to send, or the place to store received data
icount     number of elements
DATA_TYPE  one of the MPI data types above
idest      CPU id of the receiver (MPI_Send)
isrc       CPU id of the sender (MPI_Recv)
itag       message tag
istat      status returned by MPI_Recv

The status argument is declared as

MPI_Status istat[MPI_STATUS_SIZE];

where MPI_STATUS_SIZE is defined in mpi.h, or simply as

MPI_Status istat[8];

When one CPU accepts messages from several CPUs, the id of the actual sender can be recovered from the status, e.g. isrc = istat[0].MPI_SOURCE;
Every MPI message (MPI_Send, MPI_Recv) carries an 'envelope' along with the message data. The envelope consists of:

1. the sender's CPU id
2. the receiver's CPU id
3. the message tag
4. the communicator

A message is delivered to a receive only when the two envelopes match.
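As a minimal sketch (the variable names are ours, with myid as returned by MPI_Comm_rank above): CPU 0 sends ten doubles to CPU 1 with tag 10, and CPU 1 receives them by matching source, tag and communicator:

double data[10];
int    itag=10, isrc=0, idest=1;
MPI_Status istat[8];

if (myid == 0)
   MPI_Send ((void *)&data, 10, MPI_DOUBLE, idest, itag, MPI_COMM_WORLD);
else if (myid == 1)
   MPI_Recv ((void *)&data, 10, MPI_DOUBLE, isrc, itag, MPI_COMM_WORLD, istat);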
2.2  Example Program T2SEQ

/*
   PROGRAM T2SEQ
   sequential version of 1-dimensional array operation
*/
#include <stdio.h>
#include <stdlib.h>
#define n 200
main ()
{
   double suma, a[n], b[n], c[n], d[n];
   int    i, j;
   FILE   *fp;
/*
   read b, c, d from 'input.dat'
*/
   fp = fopen( "input.dat", "r");
   fread( (void *)&b, sizeof(b), 1, fp );
   fread( (void *)&c, sizeof(c), 1, fp );
   fread( (void *)&d, sizeof(d), 1, fp );
   fclose( fp );
/*
   compute and write out the result
*/
   suma=0.0;
   for (i = 0; i < n; i++) {
      a[i]=b[i]+c[i]*d[i];
      suma+=a[i];
   }
   for (i = 0; i < n; i+=40) {
      printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
              a[i],a[i+5],a[i+10],a[i+15],a[i+20],a[i+25],a[i+30],a[i+35]);
   }
   printf( "sum of A=%f\n",suma);
   return 0;
}
T2SEQ prints five rows of eight sampled a values (from 3.056 down to 2.031) followed by:

sum of A=438.548079
2.3  Example Program T2CP

The first parallelization approach, computation partition (also called decomposition), keeps the sequential version's full-size arrays on every CPU but divides the loop iterations among the CPUs, as in Figure 2.1:
[Figure 2.1: the index range 0..ntotal-1 is divided into four consecutive segments; cpu0..cpu3 each own one segment delimited by istart and iend.]
The function startend computes each CPU's istart and iend. Since the program uses sequential rather than parallel I/O, CPU 0 (myid == 0) reads the input data and passes each segment to its owner with a for loop of MPI_Send calls, and later collects the results the same way:
/*
   PROGRAM T2CP
   computation partition without data partition of 1-dimensional arrays
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define n 200
main ( argc, argv)
int argc;
char **argv;
{
   double      suma, a[n], b[n], c[n], d[n];
   int         i, j;
   FILE        *fp;
   int         nproc, myid, istart, iend;
   int         itag, isrc, idest, istart1, icount1;
   int         gstart[16], gend[16], gcount[16];
   MPI_Status  istat[8];
   MPI_Comm    comm;

   MPI_Init (&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &nproc);
   MPI_Comm_rank (MPI_COMM_WORLD, &myid);
   startend( nproc, 0, n - 1, gstart, gend, gcount);
   istart=gstart[myid];
   iend=gend[myid];
   comm=MPI_COMM_WORLD;
   printf( "NPROC,MYID,ISTART,IEND=%d\t%d\t%d\t%d\n",nproc,myid,istart,iend);
/*
   CPU 0 reads the input data and sends each CPU its segment
*/
   if (myid == 0) {
      fp = fopen( "input.dat", "r");
      fread( (void *)&b, sizeof(b), 1, fp );
      fread( (void *)&c, sizeof(c), 1, fp );
      fread( (void *)&d, sizeof(d), 1, fp );
      fclose( fp );
      for (idest = 1; idest < nproc; idest++) {
         istart1=gstart[idest];
         icount1=gcount[idest];
         itag=10;
         MPI_Send((void *)&b[istart1], icount1, MPI_DOUBLE, idest, itag, comm);
         itag=20;
         MPI_Send((void *)&c[istart1], icount1, MPI_DOUBLE, idest, itag, comm);
         itag=30;
         MPI_Send((void *)&d[istart1], icount1, MPI_DOUBLE, idest, itag, comm);
      }
   }
   else {
      isrc=0;
      itag=10;
      MPI_Recv((void *)&b[istart], gcount[myid], MPI_DOUBLE, isrc, itag, comm, istat);
      itag=20;
      MPI_Recv((void *)&c[istart], gcount[myid], MPI_DOUBLE, isrc, itag, comm, istat);
      itag=30;
      MPI_Recv((void *)&d[istart], gcount[myid], MPI_DOUBLE, isrc, itag, comm, istat);
   }
/*
   each CPU computes its own segment; CPU 0 then collects the results
*/
   for (i = istart; i <= iend; i++)
      a[i]=b[i]+c[i]*d[i];
   itag=110;
   if (myid > 0) {
      idest=0;
      MPI_Send((void *)&a[istart], gcount[myid], MPI_DOUBLE, idest, itag, comm);
   }
   else {
      for ( isrc=1; isrc < nproc; isrc++ ) {
         icount1=gcount[isrc];
         istart1=gstart[isrc];
         MPI_Recv((void *)&a[istart1], icount1, MPI_DOUBLE, isrc, itag, comm, istat);
      }
   }
   if (myid == 0) {
      for (i = 0; i < n; i+=40) {
         printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
                 a[i],a[i+5],a[i+10],a[i+15],a[i+20],a[i+25],a[i+30],a[i+35]);
      }
      suma=0.0;
      for (i = 0; i < n; i++)
         suma+=a[i];
      printf( "sum of A=%f\n",suma);
   }
   MPI_Finalize();
   return 0;
}
startend( int nproc, int is1, int is2, int gstart[16], int gend[16], int gcount[16])
{
   int i, ilength, iblock, ir;

   ilength=is2-is1+1;
   iblock=ilength/nproc;
   ir=ilength-iblock*nproc;
   for ( i=0; i < nproc; i++ ) {
      if(i < ir) {
         gstart[i]=is1+i*(iblock+1);
         gend[i]=gstart[i]+iblock;
      }
      else {
         gstart[i]=is1+i*iblock+ir;
         gend[i]=gstart[i]+iblock-1;
      }
      if(ilength < 1) {
         gstart[i]=1;
         gend[i]=0;
      }
      gcount[i]=gend[i]-gstart[i] + 1;
   }
}
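A quick way to see how startend divides a range is a small test harness (hypothetical, not part of T2CP). Dividing the 161 indices 0..160 among 4 CPUs gives segments of 41, 40, 40 and 40 elements, the remainder going to the first CPU:

#include <stdio.h>
main ()
{
   int i, gstart[16], gend[16], gcount[16];
   startend( 4, 0, 160, gstart, gend, gcount);
   for (i = 0; i < 4; i++)
      printf("cpu%d: %d..%d (%d elements)\n", i, gstart[i], gend[i], gcount[i]);
   return 0;
}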
T2CP output (4 CPUs):

ATTENTION: 0031-408 4 nodes allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4   1   50    99
NPROC,MYID,ISTART,IEND=4   0    0    49
NPROC,MYID,ISTART,IEND=4   3  150   199
NPROC,MYID,ISTART,IEND=4   2  100   149
(five rows of sampled a values, beginning 10.000 3.056 2.562 ...)
sum of A=438.548079
2.4  MPI_Scatter, MPI_Gather, MPI_Reduce

MPI_Scatter, MPI_Gather, MPI_Allgather, MPI_Reduce and MPI_Allreduce are 'collective' operations: every CPU in the communicator must make the call, and the operation involves them all.

2.4.1  MPI_Scatter and MPI_Gather

MPI_Scatter splits an array t on the iroot CPU into nproc segments (nproc being the number of CPUs) of n elements each and sends one segment to each CPU in CPU-id order; the iroot CPU keeps its own segment. This is shown in Figure 2.2:
[Figure 2.2 MPI_Scatter: the array t = (t1,t2,t3,t4) on CPU0 is scattered so that CPU0 keeps t1 and CPU1, CPU2, CPU3 receive t2, t3, t4.]
MPI_Scatter is called as:

iroot = 0;
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&b, n, MPI_DOUBLE, iroot, comm);

t           send buffer on the iroot CPU
n           number of elements sent to each CPU
MPI_DOUBLE  data type of the elements sent
b           receive buffer on every CPU
n           number of elements each CPU receives
MPI_DOUBLE  data type of the elements received
iroot       CPU id of the sending CPU
comm        communicator

MPI_Gather is the inverse operation: the pieces are collected in CPU-id order, so CPU0's n elements land at the front of t, CPU1's n elements next, and so on, as shown in Figure 2.3:
[Figure 2.3 MPI_Gather: the segments t1..t4 on CPU0..CPU3 are collected into the array t = (t1,t2,t3,t4) on CPU0.]
MPI_Gather is called as:

idest = 0;
MPI_Gather ((void *)&a, n, MPI_DOUBLE, (void *)&t, n, MPI_DOUBLE, idest, comm);

a           send buffer on every CPU (n elements)
n           number of elements sent by each CPU
MPI_DOUBLE  data type of the elements sent
t           receive buffer on the idest CPU
n           number of elements received from each CPU
MPI_DOUBLE  data type of the elements received
idest       CPU id of the collecting CPU
MPI_Allgather is called as:

MPI_Allgather ((void *)&a, n, MPI_DOUBLE, (void *)&t, n, MPI_DOUBLE, comm);

It performs the same collection as MPI_Gather, but instead of leaving the result on one CPU, every CPU in the communicator ends up with the complete array t, as shown in Figure 2.4:
[Figure 2.4 MPI_Allgather: the segments t1..t4 held by CPU0..CPU3 are collected so that every CPU holds the full array (t1,t2,t3,t4).]
[Figure 2.5 MPI_Reduce (MPI_SUM): the two-element arrays suma on CPU0..CPU3 — (0.2,1.5), (0.5,0.6), (0.3,0.4), (0.7,1.0) — are summed elementwise into sumall = (1.7,3.5) on the root CPU.]
[Figure 2.6 MPI_Allreduce (MPI_SUM): the same reduction, but every CPU receives sumall = (1.7,3.5).]
MPI_Reduce and MPI_Allreduce are called as:

iroot = 0;
MPI_Reduce ((void *)&suma, (void *)&sumall, count, MPI_DOUBLE, MPI_SUM,
            iroot, comm);
MPI_Allreduce((void *)&suma, (void *)&sumall, count, MPI_DOUBLE, MPI_SUM,
            comm);

suma        send buffer (each CPU's partial result)
sumall      receive buffer for the combined result
count       number of elements
MPI_DOUBLE  data type
MPI_SUM     the reduction operation, one of those listed in Table 2.1
iroot       CPU id of the root CPU (MPI_Reduce only)
Table 2.1  Reduction operations

Operation    Description             C data types
MPI_SUM      sum                     MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE
MPI_PROD     product                 same as MPI_SUM
MPI_MAX      maximum                 same as MPI_SUM
MPI_MIN      minimum                 same as MPI_SUM
MPI_MAXLOC   max value and location  MPI_FLOAT_INT, MPI_DOUBLE_INT, MPI_LONG_INT, MPI_2INT
MPI_MINLOC   min value and location  same as MPI_MAXLOC
MPI_LAND     logical AND             MPI_SHORT, MPI_LONG, MPI_INT, MPI_UNSIGNED_SHORT, MPI_UNSIGNED, MPI_UNSIGNED_LONG
MPI_LOR      logical OR              same as MPI_LAND
MPI_LXOR     logical exclusive OR    same as MPI_LAND
MPI_BAND     binary AND              same as MPI_LAND
MPI_BOR      binary OR               same as MPI_LAND
MPI_BXOR     binary exclusive OR     same as MPI_LAND

The pair types used by MPI_MAXLOC and MPI_MINLOC correspond to C structures:

MPI_FLOAT_INT   {MPI_FLOAT, MPI_INT}
MPI_DOUBLE_INT  {MPI_DOUBLE, MPI_INT}
MPI_LONG_INT    {MPI_LONG, MPI_INT}
MPI_2INT        {MPI_INT, MPI_INT}
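As an illustration of the pair types (a sketch; the variable names in, out are ours): to find the global maximum of each CPU's amax together with the rank that owns it, reduce a {double, int} pair with MPI_MAXLOC:

struct { double val; int rank; } in, out;
in.val  = amax;   /* this CPU's partial maximum      */
in.rank = myid;   /* and the CPU id that computed it */
MPI_Allreduce ((void *)&in, (void *)&out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, comm);
/* out.val is the global maximum, out.rank the CPU holding it */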
2.5  Example Program T2DCP

T2DCP runs T2SEQ's computation on np CPUs with the data partitioned as well: each CPU holds only ntotal/np elements of a, b, c and d. CPU 0 reads each full array into the scratch array t of ntotal elements and distributes it with MPI_Scatter, which sends an equal share of t to every CPU:

iroot=0;
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&b, n, MPI_DOUBLE, iroot, comm);

After the computation each CPU's n elements of a are collected back into t on CPU 0 with MPI_Gather:

idest=0;
MPI_Gather ((void *)&a, n, MPI_DOUBLE, (void *)&t, n, MPI_DOUBLE, idest, comm);

Compared with T2CP, T2DCP needs no loops of MPI_Send/MPI_Recv calls. The local dimension n is ntotal/np, fixed with define:

#define ntotal 200
#define np       4
#define n       50

Each CPU also computes only a partial sum suma; MPI_Reduce combines the partial sums into sumall on CPU 0:

iroot=0;
MPI_Reduce ((void *)&suma, (void *)&sumall, 1, MPI_DOUBLE, MPI_SUM, iroot,
            comm);

The T2DCP listing:
/*
PROGRAM T2DCP */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ntotal 200
#define np 4
#define n      50
main ( argc, argv)
int argc;
char **argv;
{
   double      a[n], b[n], c[n], d[n], t[ntotal], suma, sumall;
   int         i, j, k;
   FILE        *fp;
   int         nproc, myid, istart, iend, iroot, idest;
   MPI_Comm    comm;
   MPI_Status  istat[8];
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
   comm = MPI_COMM_WORLD;
istart = 0;
iend = n-1;
/*
   read and distribute the input data
*/
   iroot=0;
   if (myid == 0) {
      fp = fopen( "input.dat", "r");
      fread( (void *)&t, sizeof(t), 1, fp );
   }
   MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&b, n, MPI_DOUBLE, iroot, comm);
   if (myid == 0)
      fread( (void *)&t, sizeof(t), 1, fp );
   MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&c, n, MPI_DOUBLE, iroot, comm);
   if (myid == 0) {
      fread( (void *)&t, sizeof(t), 1, fp );
      fclose( fp );
   }
   MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&d, n, MPI_DOUBLE, iroot, comm);
/*
   compute, gather computed data, and write out the result
*/
suma=0.0;
/* for(i=0; i<ntotal; i++) { */
for(i=istart; i<=iend; i++) {
a[i]=b[i]+c[i]*d[i];
suma=suma+a[i];
}
idest=0;
   MPI_Gather((void *)&a, n, MPI_DOUBLE, (void *)&t, n, MPI_DOUBLE, idest, comm);
MPI_Reduce((void *)&suma, (void *)&sumall, 1, MPI_DOUBLE, MPI_SUM, idest, comm);
if(myid == 0) {
for (i = 0; i < ntotal; i+=40) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
t[i],t[i+5],t[i+10],t[i+15],t[i+20],t[i+25],t[i+30],t[i+35]);
}
printf( "sum of A=%f\n",sumall);
}
MPI_Finalize();
return 0;
}
T2DCP output:

ATTENTION: 0031-408 4 nodes allocated by LoadLeveler, continuing...
(five rows of sampled t values, beginning 10.000 3.056 2.562 ...)
sum of A=438.548079
Chapter 3  Boundary Data Exchange

3.1 introduces two further MPI routines, MPI_Sendrecv and MPI_Bcast. 3.2 presents the sequential program T3SEQ, whose computation uses neighboring array elements. 3.3 parallelizes it as T3CP, using MPI_Sendrecv for the boundary exchange and MPI_Send/MPI_Recv to distribute the data. 3.4 presents T3DCP_1, which also partitions the data and exchanges one boundary element per side, and 3.5 presents T3DCP_2, which exchanges two.
3.1  MPI_Sendrecv and MPI_Bcast

3.1.1  MPI_Sendrecv

MPI_Sendrecv combines a send and a receive in one call:

itag = 110;
MPI_Sendrecv ((void *)&b[iend],     icount, DATA_TYPE, r_nbr, itag,
              (void *)&b[istartm1], icount, DATA_TYPE, l_nbr, itag, comm, istat);

b[iend]      the data to send
icount       number of elements to send
DATA_TYPE    data type sent
r_nbr        CPU id of the destination (here the right neighbor)
itag         message tag
b[istartm1]  where to store the received data
l_nbr        CPU id of the source (here the left neighbor)
istat        status array
3.1.2  MPI_Bcast

MPI_Bcast ('Bcast' stands for broadcast) is a collective operation: every CPU in the communicator must call it, and afterwards every CPU holds a copy of the root CPU's data. It is called as:

iroot=0;
MPI_Bcast ((void *)&b, icount, DATA_TYPE, iroot, comm);

where iroot is the CPU id of the broadcasting (root) CPU. The effect of MPI_Bcast is shown in Figure 3.1:
MPI_Bcast 3.1 :
CPU0
CPU0
b1 b2 b3 b4
CPU1
b1 b2 b3 b4
CPU2
CPU2
b1 b2 b3 b4
CPU3
CPU3
b1 b2 b3 b4
CPU1
b1 b2 b3b4
MPI_Bcast
3.1 MPI_Bcast
37
3.2  Example Program T3SEQ

/*
   PROGRAM T3SEQ
   sequential version of one-dimensional boundary data access
*/
#include <stdio.h>
#include <stdlib.h>
#define ntotal 200
main ()
{
   double amax, a[ntotal], b[ntotal], c[ntotal], d[ntotal];
   int    i;
   FILE   *fp;
   extern double max(double, double);
/*
   read b, c, d from 'input.dat'
*/
   fp = fopen( "input.dat", "r");
   fread( (void *)&b, sizeof(b), 1, fp );
   fread( (void *)&c, sizeof(c), 1, fp );
   fread( (void *)&d, sizeof(d), 1, fp );
   fclose( fp );
/*
   compute and write out the result
*/
   amax=-1.e12;
   for (i=1; i<ntotal-1; i++) {
      a[i]=c[i]*d[i] + ( b[i-1] + 2.0*b[i] + b[i+1] )*0.25;
      amax = max(amax, a[i]);
   }
   for (i = 0; i < ntotal; i+=40) {
      printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
              a[i],a[i+5],a[i+10],a[i+15],a[i+20],a[i+25],a[i+30],a[i+35]);
   }
   printf( "MAXIMUM VALUE OF A ARRAY is=%f\n",amax);
   return 0;
}
double max(double a, double b)
{
if(a >= b)
return a;
else
return b;
}
T3SEQ output: five rows of eight sampled a values (0.000, 3.063, 2.563, 2.383, ..., down to 2.031) followed by

MAXIMUM VALUE OF A ARRAY is=5.750000
3.3  Example Program T3CP

How is T3SEQ parallelized while every CPU still keeps full-size arrays? (Sections 3.4 and 3.5 will partition the data as well.) As before, startend computes each CPU's index range. The layout and the exchanged elements are shown in Figure 3.2:
CPU0 CPU1 3.2 :
left
mpi_proc_null
cpu0
cpu1
cpu2
right
| |
istart2
istart
.
iend+1
iend1
iend
|
.
. .
iend+1
iend1
iend
|
| |
| istart
istart2
istart-1
. . . . .
is owned data
ntotal
|
. . .
ntotal
|
. . .
iend
iend1
|
|
|
|
istart
istart2
istart -1
mpi_proc_null
is exchanged data
3.2
Each CPU computes only its own istart..iend range. The computation in T3SEQ is:

amax=-1.e12;
for (i=1; i<ntotal-1; i++) {
   a[i]=c[i]*d[i] + ( b[i-1] + 2.0*b[i] + b[i+1] )*0.25;
   amax = max(amax, a[i]);
}
Computing a[i] needs b[i-1], which at i = istart belongs to the left neighbor. Each CPU therefore sends b[iend] to its right neighbor and receives the matching element into b[istartm1]:

itag = 110;
MPI_Sendrecv ((void *)&b[iend],     1, MPI_DOUBLE, r_nbr, itag,
              (void *)&b[istartm1], 1, MPI_DOUBLE, l_nbr, itag, comm, istat);
Likewise a[i] needs b[i+1], which at i = iend belongs to the right neighbor, so each CPU sends b[istart] to its left neighbor and receives into b[iendp1]. Where a CPU has no neighbor the id is MPI_PROC_NULL and no transfer takes place:
itag = 120;
MPI_Sendrecv ((void *)&b[istart], 1, MPI_DOUBLE, l_nbr, itag,
(void *)&b[iendp1], 1, MPI_DOUBLE, r_nbr, itag, comm, istat);
Each CPU's for loop then covers only its own istart..iend range, about ntotal/np iterations. Afterwards each CPU holds only a partial maximum amax; MPI_Allreduce with the MPI_MAX operation combines the partial maxima into the global maximum gmax on every CPU:

MPI_Allreduce ( (void *)&amax, (void *)&gmax, 1, MPI_DOUBLE, MPI_MAX, comm );

The difference between reduce and allreduce is where the result ends up: MPI_Reduce leaves it on one CPU, MPI_Allreduce on every CPU. The full listing of T3CP:
/*
PROGRAM T3CP
Boundary data exchange with computing partition without data partition
Using MPI_Send, MPI_Recv to distribute input data
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ntotal 200
main ( argc, argv)
int argc;
char **argv;
{
   double      amax, gmax, a[ntotal], b[ntotal], c[ntotal], d[ntotal];
   int         i, j, k;
   FILE        *fp;
   int         nproc, myid, istart, iend, icount, r_nbr, l_nbr, lastp;
   int         itag, isrc, idest, istart1, icount1, istart2, iend1, istartm1, iendp1;
   int         gstart[16], gend[16], gcount[16];
   MPI_Status  istat[8];
   MPI_Comm    comm;
extern double max(double, double);
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
comm=MPI_COMM_WORLD;
startend (nproc, 0, ntotal-1, gstart, gend, gcount);
istart=gstart[myid];
iend=gend[myid];
icount=gcount[myid];
lastp=nproc-1;
printf( "NPROC,MYID,ISTART,IEND=%d\t%d\t%d\t%d\n",nproc,myid,istart,iend);
istartm1=istart-1;
iendp1=iend+1;
istart2=istart;
if (myid == 0) istart2=istart+1;
iend1=iend;
if(myid == lastp ) iend1=iend-1;
l_nbr = myid - 1;
r_nbr = myid + 1;
if (myid == 0) l_nbr=MPI_PROC_NULL;
if (myid == lastp) r_nbr=MPI_PROC_NULL;
/*
   read and distribute the input data
*/
if ( myid==0) {
fp = fopen( "input.dat", "r");
fread( (void *)&b, sizeof(b), 1, fp );
fread( (void *)&c, sizeof(c), 1, fp );
fread( (void *)&d, sizeof(d), 1, fp );
fclose( fp );
for (idest = 1; idest < nproc; idest++) {
istart1=gstart[idest];
icount1=gcount[idest];
itag=10;
MPI_Send ((void *)&b[istart1], icount1, MPI_DOUBLE, idest, itag, comm);
itag=20;
MPI_Send ((void *)&c[istart1], icount1, MPI_DOUBLE, idest, itag, comm);
itag=30;
MPI_Send ((void *)&d[istart1], icount1, MPI_DOUBLE, idest, itag, comm);
}
}
else {
isrc=0;
itag=10;
MPI_Recv ((void *)&b[istart], icount, MPI_DOUBLE, isrc, itag, comm, istat);
itag=20;
MPI_Recv ((void *)&c[istart], icount, MPI_DOUBLE, isrc, itag, comm, istat);
itag=30;
MPI_Recv ((void *)&d[istart], icount, MPI_DOUBLE, isrc, itag, comm, istat);
}
/*
Exchange data outside the territory
*/
itag=110;
MPI_Sendrecv((void *)&b[iend],
1, MPI_DOUBLE, r_nbr, itag,
(void *)&b[istartm1],1, MPI_DOUBLE, l_nbr, itag, comm, istat);
itag=120;
MPI_Sendrecv((void *)&b[istart], 1, MPI_DOUBLE, l_nbr, itag,
(void *)&b[iendp1],1, MPI_DOUBLE, r_nbr, itag, comm, istat);
/*
   Compute, gather and write out the computed result
*/
   amax= -1.0e12;
   for (i=istart2; i<=iend1; i++) {
      a[i]=c[i]*d[i]+(b[i-1]+2.0*b[i]+b[i+1])*0.25;
      amax=max(amax,a[i]);
   }
   itag=130;
   if (myid > 0) {
      idest=0;
      MPI_Send((void *)&a[istart], icount, MPI_DOUBLE, idest, itag, comm);
   }
   else {
      for (isrc=1; isrc < nproc; isrc++)
         MPI_Recv((void *)&a[gstart[isrc]], gcount[isrc], MPI_DOUBLE, isrc, itag, comm, istat);
   }
   MPI_Allreduce((void *)&amax, (void *)&gmax, 1, MPI_DOUBLE, MPI_MAX, comm);
/*
   CPU 0 then prints the result as in T3SEQ
*/

T3CP output (4 CPUs):
ATTENTION: 0031-408 4 nodes allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4   0    0    49
NPROC,MYID,ISTART,IEND=4   1   50    99
NPROC,MYID,ISTART,IEND=4   2  100   149
NPROC,MYID,ISTART,IEND=4   3  150   199
(five rows of sampled a values, beginning 0.000 3.063 2.563 2.383 2.290 ...)
3.4  Example Program T3DCP_1 (exchanging one boundary element)

In T3DCP_1 the data are partitioned as well: each of the np CPUs holds only n = ntotal/np elements. The local arrays are dimensioned n+2, so the owned data occupy indices 1..n and indices 0 and n+1 hold the boundary elements received from the neighbors:

double a[n+2], b[n+2], c[n+2], d[n+2], t[ntotal], amax, gmax;

The layout is shown in Figure 3.3:
3.3 :
left
mpi_proc_null
cpu0
n
+
index 0 1 2 . n 1
index
cpu1
0 1 2 . n
n
+
1
n
+
index 0 1 2
. n 1
cpu2
n
+
index 0 1 2 . n 1
cpu3
mpi_proc_null
right
is owned data
is exchanged data
3.3
Every CPU's for loop now runs over the local indices 1..n:

istart=1;
iend=n;

except that the first CPU starts at local index 2 (the global first element is not computed) and the last CPU stops at n-1:

istart2= istart;
if (myid == 0) istart2=2;
iend1= iend;
if (myid == nproc-1) iend1= iend-1;

Each CPU sends b[iend] to its right neighbor and stores the value received from its left neighbor in b[istart-1]:

istartm1 = istart-1;
itag=110;
MPI_Sendrecv ((void *)&b[iend],     1, MPI_DOUBLE, r_nbr, itag,
              (void *)&b[istartm1], 1, MPI_DOUBLE, l_nbr, itag, comm, istat);

and symmetrically sends b[istart] to its left neighbor while receiving its right neighbor's value into b[iend+1]:

iendp1 = iend+1;
itag=120;
MPI_Sendrecv ((void *)&b[istart], 1, MPI_DOUBLE, l_nbr, itag,
              (void *)&b[iendp1], 1, MPI_DOUBLE, r_nbr, itag, comm, istat);
The T3DCP_1 listing:

/*
   PROGRAM T3DCP_1
   Boundary data exchange with data & computing partition
   Using MPI_Gather, MPI_Scatter to gather & scatter data
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ntotal 200
#define n      50
#define np      4
main ( argc, argv)
int argc;
char **argv;
{
   double      amax, gmax, a[n+2], b[n+2], c[n+2], d[n+2], t[ntotal];
   int         i, j, k;
   FILE        *fp;
   int         nproc, myid, istart, iend, istart2, iend1, istartm1, iendp1;
   int         r_nbr, l_nbr, lastp, iroot, itag;
   MPI_Status  istat[8];
   MPI_Comm    comm;
extern double max(double, double);
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
comm=MPI_COMM_WORLD;
istart=1;
iend=n;
lastp=nproc-1;
printf( "NPROC,MYID,ISTART,IEND=%d\t%d\t%d\t%d\n",nproc,myid,istart,iend);
istartm1=istart-1;
iendp1=iend+1;
istart2=istart;
if(myid == 0) istart2=2;
iend1=iend;
if(myid == lastp ) iend1=iend-1;
l_nbr = myid - 1;
r_nbr = myid + 1;
if(myid == 0)
l_nbr=MPI_PROC_NULL;
if(myid == lastp) r_nbr=MPI_PROC_NULL;
/*
*/
if( myid==0) {
fp = fopen( "input.dat", "r");
fread( (void *)&t, sizeof(t), 1, fp );
}
iroot=0;
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&b[1], n, MPI_DOUBLE, iroot, comm);
if( myid==0)
fread( (void *)&t, sizeof(t), 1, fp );
MPI_Scatter ((void *)&t, n, MPI_DOUBLE,( void *)&c[1], n, MPI_DOUBLE, iroot, comm);
if( myid==0) {
fread( (void *)&t, sizeof(t), 1, fp );
fclose( fp );
}
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&d[1], n, MPI_DOUBLE, iroot, comm);
/*
Exchange data outside the territory
*/
itag=110;
MPI_Sendrecv((void *)&b[iend],
1,MPI_DOUBLE, r_nbr, itag,
(void *)&b[istartm1], 1,MPI_DOUBLE, l_nbr, itag, comm, istat);
itag=120;
MPI_Sendrecv((void *)&b[istart], 1, MPI_DOUBLE, l_nbr, itag,
(void *)&b[iendp1],1, MPI_DOUBLE, r_nbr, itag, comm, istat);
/*
Compute, gather and write out the computed result
*/
amax= -1.0e12;
for (i=istart2; i<=iend1; i++) {
a[i]=c[i]*d[i] + ( b[i-1] + 2.0*b[i] + b[i+1] )*0.25;
amax=max(amax,a[i]);
}
MPI_Gather((void *)&a[istart], n, MPI_DOUBLE,(void *)&t, n, MPI_DOUBLE,iroot, comm);
   MPI_Allreduce((void *)&amax, (void *)&gmax, 1, MPI_DOUBLE, MPI_MAX, comm);
   amax=gmax;
   if( myid == 0) {
      for (i = 0; i < ntotal; i+=40) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
t[i],t[i+5],t[i+10],t[i+15],t[i+20],t[i+25],t[i+30],t[i+35]);
}
printf ("MAXIMUM VALUE OF ARRAY A is %f\n", amax);
}
MPI_Finalize();
return 0;
}
double max(double a, double b)
{
if(a >= b)
return a;
else
return b;
}
T3DCP_1 output:

ATTENTION: 0031-408 4 nodes allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4   0   1   50
NPROC,MYID,ISTART,IEND=4   1   1   50
NPROC,MYID,ISTART,IEND=4   2   1   50
NPROC,MYID,ISTART,IEND=4   3   1   50
followed by the same five rows of sampled values as T3CP (beginning 0.000 3.063 2.563 ...).
3.5  Example Program T3DCP_2 (exchanging two boundary elements)

A five-point stencil needs two boundary elements on each side, as Figure 3.4 shows:
[Figure 3.4: each CPU's local arrays are dimensioned n+4; the owned data occupy indices 2..n+1, two boundary elements on each side (0,1 and n+2,n+3) hold exchanged data, and mpi_proc_null marks the outer boundaries.]
With the owned data at local indices 2..n+1, the computed range must skip two elements at the global boundaries: the first CPU starts at local index 4 and the last CPU stops two elements early:

istart3=istart;
if (myid == 0) istart3=4;
iend2= iend;
if (myid == nproc-1) iend2= iend-2;

Each CPU sends its last two owned elements, b[iend-1] and b[iend], to the right neighbor and receives two elements into b[istart-2] and b[istart-1]:

iendm1=iend-1;
istartm2=istart-2;
itag = 110;
MPI_Sendrecv ((void *)&b[iendm1],   2, MPI_DOUBLE, r_nbr, itag,
              (void *)&b[istartm2], 2, MPI_DOUBLE, l_nbr, itag, comm, istat);

and symmetrically sends b[istart], b[istart+1] to the left neighbor while receiving two elements into b[iend+1], b[iend+2]:

iendp1=iend+1;
itag=120;
MPI_Sendrecv ((void *)&b[istart], 2, MPI_DOUBLE, l_nbr, itag,
              (void *)&b[iendp1], 2, MPI_DOUBLE, r_nbr, itag, comm, istat);

The T3DCP_2 listing:
/*
   PROGRAM T3DCP_2
   Two elements of boundary data exchange with data & computing partition
   Using MPI_Gather, MPI_Scatter to gather & scatter data
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ntotal 200
#define n      50
#define np      4
main ( argc, argv)
int argc;
char **argv;
{
double amax, gmax, a[n+4], b[n+4], c[n+4], d[n+4], t[ntotal];
int
i, j, k;
FILE *fp;
int
nproc, myid, istart, iend, istart3, iend2, istartm2, iendm1, iendp1;
int
r_nbr, l_nbr, lastp, iroot, itag;
MPI_Status
istat[8];
MPI_Comm comm;
extern double max(double, double);
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
comm=MPI_COMM_WORLD;
istart=2;
iend=n+1;
lastp=nproc-1;
printf( "NPROC,MYID,ISTART,IEND=%d\t%d\t%d\t%d\n",nproc,myid,istart,iend);
istartm2=istart-2;
iendp1=iend+1;
iendm1=iend-1;
istart3=istart;
if(myid == 0) istart3=4;
iend2=iend;
if(myid == lastp ) iend2=iend-2;
l_nbr = myid - 1;
r_nbr = myid + 1;
if(myid == 0)
l_nbr=MPI_PROC_NULL;
if(myid == lastp) r_nbr=MPI_PROC_NULL;
/*
*/
if ( myid==0) {
fp = fopen( "input.dat", "r");
fread( (void *)&t, sizeof(t), 1, fp );
}
iroot=0;
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&b[2], n, MPI_DOUBLE, iroot, comm);
if( myid==0)
fread( (void *)&t, sizeof(t), 1, fp );
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&c[2], n, MPI_DOUBLE, iroot, comm);
if ( myid==0) {
fread( (void *)&t, sizeof(t), 1, fp );
fclose( fp );
}
MPI_Scatter ((void *)&t, n, MPI_DOUBLE, (void *)&d[2], n, MPI_DOUBLE, iroot, comm);
/*
Exchange data outside the territory
*/
itag=110;
MPI_Sendrecv((void *)&b[iendm1], 2, MPI_DOUBLE, r_nbr, itag,
(void *)&b[istartm2], 2, MPI_DOUBLE, l_nbr, itag, comm, istat);
itag=120;
MPI_Sendrecv((void *)&b[istart], 2, MPI_DOUBLE, l_nbr, itag,
(void *)&b[iendp1], 2, MPI_DOUBLE, r_nbr, itag, comm, istat);
/*
C
*/
amax= -1.0e12;
for (i=istart3; i<=iend2; i++) {
a[i]=c[i]*d[i] + ( b[i-2] + 2.0*b[i-1] + 2.0*b[i] + 2.0*b[i+1] + b[i+2] )*0.125;
amax=max(amax,a[i]);
}
MPI_Gather((void *)&a[istart], n, MPI_DOUBLE, (void *)&t, n, MPI_DOUBLE, iroot, comm);
MPI_Allreduce((void *)&amax, (void *)&gmax, 1, MPI_DOUBLE, MPI_MAX, comm);
amax=gmax;
if( myid == 0) {
for (i = 0; i < ntotal; i+=40) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
t[i],t[i+5],t[i+10],t[i+15],t[i+20],t[i+25],t[i+30],t[i+35]);
}
printf ("MAXIMUM VALUE OF ARRAY A is %f\n", amax);
}
MPI_Finalize();
return 0;
}
double max(double a, double b)
{
if(a >= b)
return a;
else
return b;
}
T3DCP_2 output:

ATTENTION: 0031-408 4 nodes allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4   0   2   51
NPROC,MYID,ISTART,IEND=4   1   2   51
NPROC,MYID,ISTART,IEND=4   2   2   51
NPROC,MYID,ISTART,IEND=4   3   2   51
(five rows of sampled values, beginning 0.000 3.078 2.565 2.384 2.291 ...)
MAXIMUM VALUE OF ARRAY A is 4.484722
Chapter 4  Uneven Partitions

When the number of grid points is not divisible by the number of CPUs, the segments have different lengths. 4.1 presents the sequential program T4SEQ, whose arrays have the odd dimension 161 (= 7 x 23). 4.2 introduces MPI_Scatterv and MPI_Gatherv, the 'collective' variants of MPI_Scatter/MPI_Gather that allow a different count per CPU. 4.3 covers MPI_Pack, MPI_Unpack, MPI_Barrier and MPI_Wtime, and 4.4 applies all of these in the parallel version T4DCP.
4.1  Example Program T4SEQ
/*
   PROGRAM T4SEQ
   Sequential version of an odd-dimensioned array with -1, +1 access
*/
#include <stdio.h>
#include <stdlib.h>
#define ntotal 161
main ()
{
   double a[ntotal], b[ntotal], c[ntotal], d[ntotal], p, q, r, pqr[3];
   int    i, j;
   FILE   *fp;
   extern double max(double, double);
/*
   read the input, compute, and write out the result
*/

T4SEQ output:
Five rows of eight sampled a values (0.000, 18.550, 15.720, 14.682, ..., down to 12.660).
4.2  MPI_Scatterv and MPI_Gatherv

MPI_Scatter and MPI_Gather require every CPU to receive or contribute the same number of elements. When the partition is uneven, MPI_Scatterv and MPI_Gatherv are used instead: the root CPU supplies one count and one displacement per CPU. MPI_Scatterv is called as:

MPI_Scatterv ((void *)&t, gcount, gdisp, MPI_DOUBLE,
              (void *)&c[1], mycount, MPI_DOUBLE, iroot, comm);

t           send buffer on the iroot CPU
gcount      integer array: number of elements to send to each CPU
gdisp       integer array: displacement in t of each CPU's segment
MPI_DOUBLE  data type of the elements sent
c[1]        receive buffer
mycount     number of elements this CPU receives
iroot       CPU id of the sending CPU

with declarations such as

int         nproc, myid, mycount, istart, iend, l_nbr, r_nbr,
            gcount[np], gdisp[np], gend[np];
MPI_Status  istat[8];

MPI_Gatherv is the inverse:

MPI_Gatherv ((void *)&a[1], mycount, MPI_DOUBLE,
             (void *)&t, gcount, gdisp, MPI_DOUBLE, iroot, comm);

Every CPU (including iroot) sends mycount elements of a, and the iroot CPU places each CPU's piece into t according to gcount and gdisp:

a[1]        send buffer (each CPU's owned data)
mycount     number of elements sent by this CPU
gcount      number of elements received from each CPU
gdisp       where each CPU's piece starts in t
iroot       CPU id of the collecting CPU
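The gcount and gdisp arrays can be built from the partition; a sketch (assuming the startend of section 2.3 and 0-based displacements into t):

startend( nproc, 1, ntotal, gstart, gend, gcount);
for (i = 0; i < nproc; i++)
   gdisp[i] = gstart[i] - 1;    /* offset of CPU i's segment in t */
mycount = gcount[myid];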
4.3  MPI_Pack, MPI_Unpack, MPI_Barrier, MPI_Wtime

4.3.1  MPI_Pack and MPI_Unpack

MPI_Pack copies noncontiguous data into contiguous memory locations — a buffer area declared as a character array — and MPI_Unpack extracts the items again. Packing lets several small items travel in a single message. In T4SEQ the three scalars p, q and r are read on CPU 0; instead of three broadcasts they can be packed into one buffer, broadcast once, and unpacked on the receiving CPUs. Three 4-byte floats occupy 12 bytes, so the buffer is declared as:

#define bufsize 12
char buf1[bufsize];
MPI_Pack is called as:

MPI_Pack ((void *)&p, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);

p          the data to pack
1          number of elements
MPI_FLOAT  data type
buf1       the packing buffer
bufsize    size of buf1 in bytes
ipos       current position in buf1, in bytes; it is advanced by the call
CPU 0 reads p, q, r and packs them into buf1:

if (myid == 0) {
   scanf ("%f %f %f", &p, &q, &r);
   ipos = 0;
   MPI_Pack ((void *)&p, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
   MPI_Pack ((void *)&q, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
   MPI_Pack ((void *)&r, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
}
buf1 is then broadcast to every CPU as character data:

iroot=0;
MPI_Bcast ((void *)&buf1, bufsize, MPI_CHAR, iroot, comm);
MPI_Unpack is called as:

MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&p, 1, MPI_FLOAT, comm);

buf1       the packed buffer
bufsize    size of buf1 in bytes
ipos       current position in buf1, advanced by the call
p          where to store the unpacked data
1          number of elements
MPI_FLOAT  data type
After the broadcast, every CPU other than CPU 0 unpacks p, q and r from buf1:

if (myid > 0) {
   ipos=0;
   MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&p, 1, MPI_FLOAT, comm);
   MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&q, 1, MPI_FLOAT, comm);
   MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&r, 1, MPI_FLOAT, comm);
}
The same effect can be had without MPI_Pack by copying the three values into an array and broadcasting the array:

float p, q, r, buf1[3];

if (myid == 0) {
   scanf ("%f %f %f", &p, &q, &r);
   buf1[0]=p;
   buf1[1]=q;
   buf1[2]=r;
}
iroot=0;
MPI_Bcast ((void *)&buf1, 3, MPI_FLOAT, iroot, comm);
if (myid > 0) {
   p = buf1[0];
   q = buf1[1];
   r = buf1[2];
}
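Putting the pieces together, a complete sketch (a standalone test of ours, not taken from T4DCP; the values assigned to p, q, r stand in for the input read in T4SEQ) that packs on CPU 0, broadcasts the buffer, and unpacks on the others:

#include <stdio.h>
#include <mpi.h>
#define bufsize 12
main ( argc, argv)
int argc;
char **argv;
{
   float p, q, r;
   char  buf1[bufsize];
   int   myid, ipos, iroot=0;
   MPI_Comm comm;

   MPI_Init (&argc, &argv);
   MPI_Comm_rank (MPI_COMM_WORLD, &myid);
   comm=MPI_COMM_WORLD;
   if (myid == 0) {
      p=1.0; q=2.0; r=3.0;        /* in T4SEQ these come from the input */
      ipos=0;
      MPI_Pack ((void *)&p, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
      MPI_Pack ((void *)&q, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
      MPI_Pack ((void *)&r, 1, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
   }
   MPI_Bcast ((void *)&buf1, bufsize, MPI_CHAR, iroot, comm);
   if (myid > 0) {
      ipos=0;
      MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&p, 1, MPI_FLOAT, comm);
      MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&q, 1, MPI_FLOAT, comm);
      MPI_Unpack ((void *)&buf1, bufsize, &ipos, (void *)&r, 1, MPI_FLOAT, comm);
   }
   printf ("myid=%d p,q,r= %f %f %f\n", myid, p, q, r);
   MPI_Finalize();
   return 0;
}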
4.3.2  MPI_Barrier and MPI_Wtime

MPI_Barrier synchronizes all CPUs in the communicator: a CPU that calls MPI_Barrier waits until every other CPU has also reached the barrier, so all of them leave it together. It is called as:

MPI_Barrier (MPI_COMM_WORLD);
The wall clock time is obtained from MPI_Wtime:

time1=MPI_Wtime();

time1 is a double holding the time in seconds. MPI_Wtime may only be called between MPI_Init and MPI_Finalize. A typical timing sequence is:
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
MPI_Barrier (MPI_COMM_WORLD);
time1=MPI_Wtime();
...
time2=MPI_Wtime() - time1;
printf ("myid, clock time= %d\t%f\n", myid, time2);
MPI_Finalize();
return 0;
Why call MPI_Barrier before taking time1? The job scheduler does not start the executable on every CPU at exactly the same moment, so without the barrier each CPU would start its clock at a different time and the measured intervals would not be comparable.
4.4  Example Program T4DCP

T4DCP partitions the arrays a, b, c, d of T4SEQ. With ntotal = 161 on 4 CPUs, startend assigns segments of 41, 40, 40 and 40 elements, so the local dimension n is defined as ntotal/np + 1:

#define ntotal 161
#define np       4
#define n       41

As in chapter 3 the local arrays are dimensioned n+2, with the owned data in indices 1..n and the boundary elements in indices 0 and n+1:

double a[n+2], b[n+2], c[n+2], d[n+2], t[ntotal], p, q, r, pqr[3];
[Figure 4.1: CPU0..CPU2 each hold istart..iend plus the exchanged elements istart-1 and iend+1; mpi_proc_null marks the outer boundaries.]
The computation loop of T4SEQ,

for (i=1; i<ntotal-1; i++)
   a[i]=c[i]*d[i] + ( b[i-1] + 2.0*b[i] + b[i+1] )*0.25;

is partitioned exactly as in chapter 3. The T4DCP listing:

/*
   PROGRAM T4DCP
   Boundary data exchange with data & computing partition
   Using MPI_Gatherv, MPI_Scatterv to gather & scatter data
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ntotal 161
#define n 41
#define np 4
main ( argc, argv)
int argc;
char **argv;
{
   double      a[n+2], b[n+2], c[n+2], d[n+2], t[ntotal], pqr[3], p, q, r, clock;
   int         i, j, k;
   FILE        *fp;
   int         nproc, myid, istart, iend, istart2, iend1, istartm1, iendp1;
   int         r_nbr, l_nbr, lastp, iroot, itag, icount;
   int         gstart[16], gend[16], gcount[16], gdisp[16];
   MPI_Status  istat[8];
   MPI_Comm    comm;

   MPI_Init (&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &nproc);
   MPI_Comm_rank (MPI_COMM_WORLD, &myid);
   comm=MPI_COMM_WORLD;
   MPI_Barrier (comm);
   clock=MPI_Wtime();
   startend( nproc, 1, ntotal, gstart, gend, gcount);
   for (i=0; i < nproc; i++)
      gdisp[i]=gstart[i]-1;       /* offset of CPU i's segment in t */
   istart=1;
   iend=gcount[myid];
   icount=gcount[myid];
   lastp=nproc-1;
   printf( "NPROC,MYID,ISTART,IEND=%d\t%d\t%d\t%d\n",nproc,myid,istart,iend);
istart2=istart;
if(myid == 0) istart2=2;
iend1=iend;
if(myid == lastp ) iend1=iend-1;
l_nbr = myid - 1;
r_nbr = myid + 1;
if(myid == 0) l_nbr=MPI_PROC_NULL;
if(myid == lastp) r_nbr=MPI_PROC_NULL;
/*
   read and distribute the input data
*/
if( myid==0) {
fp = fopen( "input.dat", "r");
fread( (void *)&t, sizeof(t), 1, fp );
}
iroot=0;
MPI_Scatterv ((void *)&t, gcount, gdisp, MPI_DOUBLE,
(void *)&b[1], icount,MPI_DOUBLE, iroot, comm);
if( myid==0)
fread( (void *)&t, sizeof(t), 1, fp );
MPI_Scatterv ((void *)&t, gcount, gdisp, MPI_DOUBLE,
(void *)&c[1], icount, MPI_DOUBLE, iroot, comm);
if( myid==0) {
fread( (void *)&t, sizeof(t), 1, fp );
fread( (void *)&pqr, sizeof(pqr), 1, fp );
fclose( fp );
}
MPI_Scatterv ((void *)&t, gcount, gdisp, MPI_DOUBLE,
(void *)&d[1], icount, MPI_DOUBLE, iroot, comm);
MPI_Bcast ((void *)&pqr, 3, MPI_DOUBLE, 0, comm);
p=pqr[0];
q=pqr[1];
r=pqr[2];
/*
Exchange data outside the territory
*/
   itag=110;
   MPI_Sendrecv((void *)&b[iend],     1, MPI_DOUBLE, r_nbr, itag,
                (void *)&b[istartm1], 1, MPI_DOUBLE, l_nbr, itag, comm, istat);
   itag=120;
   MPI_Sendrecv((void *)&b[istart], 1, MPI_DOUBLE, l_nbr, itag,
                (void *)&b[iendp1], 1, MPI_DOUBLE, r_nbr, itag, comm, istat);
/*
   Compute
*/
   for (i=istart2; i<=iend1; i++) {
      a[i]=c[i]*d[i]*p + ( b[i-1] + 2.0*b[i] + b[i+1] )*q + r;
   }
}
MPI_Gatherv ((void *)&a[istart], icount, MPI_DOUBLE,
(void *)&t, gcount, gdisp, MPI_DOUBLE, iroot, comm);
if( myid == 0) {
for (i = 0; i < ntotal-1; i+=40) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
t[i],t[i+5],t[i+10],t[i+15],t[i+20],t[i+25],t[i+30],t[i+35]);
}
}
clock=MPI_Wtime() - clock;
printf( "myid, clock time= %d\t%.3f\n", myid, clock);
MPI_Finalize();
return 0;
}
startend( int nproc, int is1, int is2, int gstart[16], int gend[16], int gcount[16])
{
   int i, ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
for ( i=0; i < nproc; i++ ) {
if(i < ir) {
gstart[i]=is1+i*(iblock+1);
gend[i]=gstart[i]+iblock;
}
else {
gstart[i]=is1+i*iblock+ir;
gend[i]=gstart[i]+iblock-1;
}
if(ilength < 1) {
gstart[i]=1;
gend[i]=0;
}
gcount[i]=gend[i]-gstart[i] + 1;
}
}
T4DCP output:

ATTENTION: 0031-408 4 tasks allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4   0   1   41
NPROC,MYID,ISTART,IEND=4   1   1   40
NPROC,MYID,ISTART,IEND=4   2   1   40
NPROC,MYID,ISTART,IEND=4   3   1   40
(rows of sampled t values, beginning 0.000 18.550 15.720 ...)
myid, clock time = 0.002 on each of the four CPUs
Chapter 5  Multi-Dimensional Arrays

This chapter moves to arrays of two and three dimensions. 5.1 presents the sequential program T5SEQ, 5.2 its computation-partitioned version T5CP, 5.3 the data-and-computation-partitioned version T5DCP, 5.4 the MPI routines for Cartesian CPU grids and derived data types, and 5.5 the program T5_2D, which partitions the first two dimensions.
5.1  Example Program T5SEQ

T5SEQ operates on three-dimensional arrays; some are global variables and some local. The test data are generated inside the program rather than read from a file.
/*
PROGRAM T5SEQ
Sequential version of multiple dimensional array with -1,+1 data access
*/
#include <stdio.h>
#include <stdlib.h>
#define kk 20
#define km 3
#define mm 160
#define nn 120
double f1[mm][nn][km], f2[mm][nn][km], hxu[mm][nn], hxv[mm][nn],
hmmx[mm][nn], hmmy[mm][nn];
double vecinv[kk][kk], am7[kk];
main ()
{
double u1[mm][nn][kk], v1[mm][nn][kk], ps1[mm][nn];
double d7[mm][nn], d8[mm][nn], d00[mm][nn][kk];
double clock, sumf1, sumf2;
   int    i, j, k, ka, isec1, isec2, nsec1, nsec2;
/*
   test data generation and computation
*/
}

walltime (isec, nsec)     /* helper (name assumed): wall clock via the AIX timer */
int *isec, *nsec;
{
   struct timestruc_t tb;
   int iret;
   iret=gettimer(TIMEOFDAY, &tb);
   *isec=tb.tv_sec;
   *nsec=tb.tv_nsec;
   return 0;
}
Running T5SEQ on one CPU of the IBM SP2 SMP prints:

SUMF1,SUMF2= 26172.46054   -2268.89180
F2[i][1][1],i=0,159,5
(rows of sampled values: 0.000, -0.333, -0.295, -0.281, -0.274, ..., -0.253)
5.2  Example Program T5CP

In C the last array index varies fastest in memory, so the arrays are partitioned on the first dimension i. The computation accesses indices i+1 and i-1, so each CPU must exchange boundary data with its neighbors, as in Figure 5.1:
[Figure 5.1: ps1(mm,nn) partitioned on the first dimension among P0..P3; each CPU owns istart..iend and exchanges the istart-1 and iend+1 boundary rows with its neighbors; mpi_proc_null at the ends.]
#define kk 20
#define km  3
#define mm 160
#define nn 120

Because the second dimension is contiguous in memory, a boundary row of ps1 is nn consecutive doubles and can be exchanged with a single MPI_Sendrecv:
itag = 20;
MPI_Sendrecv ((void *)&ps1[istart][0], nn, MPI_DOUBLE, l_nbr, itag,
              (void *)&ps1[iendp1][0], nn, MPI_DOUBLE, r_nbr, itag, comm, istat);
[Figure 5.2: u1(mm,nn,kk) partitioned on the first dimension among P0..P3; each CPU owns m = mm/np planes of nn*kk elements.]
For the three-dimensional array u1 a boundary plane is nn*kk doubles:

nnkk = nn*kk;
itag = 10;
MPI_Sendrecv ((void *)&u1[iend][0][0],     nnkk, MPI_DOUBLE, r_nbr, itag,
              (void *)&u1[istartm1][0][0], nnkk, MPI_DOUBLE, l_nbr, itag, comm, istat);
The T5CP listing:
/*
PROGRAM T5CP
Computing partition on the first dimension of multiple dimensional
array with -1,+1 data exchange without data partition
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define kk 20
#define km 3
#define mm 160
#define nn
120
double f1[mm][nn][km], f2[mm][nn][km], hxu[mm][nn], hxv[mm][nn],
hmmx[mm][nn], hmmy[mm][nn];
double vecinv[kk][kk], am7[kk];
main ( argc, argv)
int argc;
char **argv;
{
double u1[mm][nn][kk], v1[mm][nn][kk], ps1[mm][nn];
double d7[mm][nn], d8[mm][nn], d00[mm][nn][kk];
double clock, sumf1, sumf2, gsumf1, gsumf2;
   int         i, j, k, ka, nnkk;
   int         nproc, myid, istart, iend, icount, lastp, l_nbr, r_nbr;
   int         itag, iroot, istart2, iend1, istartm1, iendp1;
   int         gstart[16], gend[16], gcount[16];
   MPI_Status  istat[8];
   MPI_Comm    comm;
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
comm=MPI_COMM_WORLD;
MPI_Barrier(comm);
clock=MPI_Wtime();
startend (nproc, 0, mm-1, gstart, gend, gcount);
istart=gstart[myid];
iend=gend[myid];
icount=gcount[myid];
lastp=nproc-1;
printf( "NPROC,MYID,ISTART,IEND=%d\t%d\t%d\t%d\n",nproc,myid,istart,iend);
istartm1 = istart-1;
iendp1 = iend+1;
istart2 = istart;
if (myid == 0) istart2 = 1;
iend1 = iend;
if (myid == lastp ) iend1 = iend-1;
l_nbr = myid - 1;
r_nbr = myid + 1;
if (myid == 0) l_nbr = MPI_PROC_NULL;
if (myid == lastp) r_nbr = MPI_PROC_NULL;
/*
   test data generation, boundary exchange, computation and output:
   the same computation as T5SEQ, with the i loops running from
   istart2 to iend1
*/
T5CP output (4 CPUs):

NPROC,MYID,ISTART,IEND=4   0    0   39
NPROC,MYID,ISTART,IEND=4   1   40   79
NPROC,MYID,ISTART,IEND=4   2   80  119
NPROC,MYID,ISTART,IEND=4   3  120  159
SUMF1,SUMF2= 26172.46054   -2268.89180
F2[i][1][1],i=0,159,5
(sampled values as in T5SEQ)
myid, clock time=0   0.03366
myid, clock time=1   0.03054
myid, clock time=2   0.03195
myid, clock time=3   0.03338
5.3  Example Program T5DCP

T5DCP partitions both the data and the computation of T5SEQ. mm = 160 is divisible by the number of CPUs np, so each CPU holds m = mm/np = 40 planes (were mm not divisible by np, m would be mm/np+1):

#define kk 20
#define km  3
#define mm 160
#define nn 120
#define m  40
The local arrays are dimensioned m+2 so that the planes istart-1 and iend+1 can hold the exchanged boundary data:
double f1[m+2][nn][km], f2[m+2][nn][km], hxu[m+2][nn], hxv[m+2][nn],
hmmx[m+2][nn], hmmy[m+2][nn];
double u1[m+2][nn][kk], v1[m+2][nn][kk], ps1[m+2][nn];
double d7[m+2][nn], d8[m+2][nn], d00[m+2][nn][kk], tt[mm][nn][km];
Since every CPU holds an equal share, the computed f2 can be collected on CPU 0 into the full-size scratch array tt with MPI_Gather:

iroot=0;
icount1= m*nn*km;
MPI_Gather((void *)&f2[istart][0][0], icount1, MPI_DOUBLE,
           (void *)&tt,               icount1, MPI_DOUBLE, iroot, comm);
:
/*
PROGRAM
T5DCP
#define kk 20
#define km  3
#define mm 160
#define nn 120
#define m  40
double f1[m+2][nn][km], f2[m+2][nn][km], hxu[m+2][nn], hxv[m+2][nn],
hmmx[m+2][nn], hmmy[m+2][nn];
double vecinv[kk][kk], am7[kk];
main ( argc, argv)
int argc;
char **argv;
{
double u1[m+2][nn][kk], v1[m+2][nn][kk], ps1[m+2][nn];
double d7[m+2][nn], d8[m+2][nn], d00[m+2][nn][kk], tt[mm][nn][km];
double clock, sumf1, sumf2, gsumf1, gsumf2;
   int         i, j, k, ka, ii, nnkk, icount1;
   int         nproc, myid, istart, iend, istartg, lastp, l_nbr, r_nbr;
   int         itag, iroot, istart2, iend1, istartm1, iendp1;
   int         gstart[16], gend[16], gcount[16];
MPI_Status
istat[8];
MPI_Comm
comm;
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
comm=MPI_COMM_WORLD;
   MPI_Barrier(comm);
clock=MPI_Wtime();
startend( nproc, 1, mm, gstart, gend, gcount);
istart = 1;
iend = m;
lastp = nproc-1;
istartg = gstart[myid];
printf( "NPROC,MYID,ISTART,IEND,istartg=%d\t%d\t%d\t%d\t%d\n",
86
nproc,myid,istart,iend,istartg);
istartm1 = istart-1;
iendp1 = iend+1;
istart2 = istart;
if (myid == 0) istart2 = 2;
iend1 = iend;
if (myid == lastp ) iend1 = iend-1;
l_nbr = myid - 1;
r_nbr = myid + 1;
if (myid == 0)
l_nbr = MPI_PROC_NULL;
if (myid == lastp) r_nbr = MPI_PROC_NULL;
/* for (i=0; i<mm; i++) */
for (i=istart; i<=iend; i++) {
ii = i + istartg -1;
for (j=0; j<nn; j++)
for (k=0; k<kk; k++)
u1[i][j][k]=1.0/(double) ii + 1.0/(double) (j+1) + 1.0/(double) (k+1);
}
/* for (i=0; i<mm; i++) */
for (i=istart; i<=iend; i++) {
ii = i + istartg -1;
for (j=0; j<nn; j++)
for (k=0; k<kk; k++)
v1[i][j][k]=2.0/(double) ii + 1.0/(double) (j+1) + 1.0/(double) (k+1);
}
for (i=istart; i<=iend; i++) {
ii = i + istartg -1;
for (j=0; j<nn; j++) {
ps1[i][j] = 1.0/(double) ii + 1.0/(double)(j+1);
hxu[i][j] = 2.0/(double) ii + 1.0/(double)(j+1);
hxv[i][j] = 1.0/(double) ii + 2.0/(double)(j+1);
      hmmx[i][j] = 2.0/(double) ii + 1.0/(double)(j+1);
      hmmy[i][j] = 1.0/(double) ii + 2.0/(double)(j+1);
      }
   }
/*
   Exchange data outside the territory
*/
   nnkk = nn*kk;
   itag = 10;
   MPI_Sendrecv ((void *)&u1[iend][0][0],     nnkk, MPI_DOUBLE, r_nbr, itag,
                 (void *)&u1[istartm1][0][0], nnkk, MPI_DOUBLE, l_nbr, itag, comm, istat);
itag=30;
MPI_Sendrecv ((void *)&d7[iend][0],
nn, MPI_DOUBLE, r_nbr, itag,
(void *)&d7[istartm1][0], nn, MPI_DOUBLE, l_nbr, itag, comm, istat);
/*
   compute as in T5CP, then gather f2 into tt (the MPI_Gather shown
   above) and print the result on CPU 0
*/

T5DCP output (4 CPUs): the NPROC,MYID,ISTART,IEND,istartg lines report istart=1, iend=40 and istartg = 1, 41, 81, 121 on the four CPUs; the sampled f2 values match T5SEQ, and the clock time is about 0.0298 seconds.
5.4  Cartesian CPU Grids and Derived Data Types

Section 5.3 partitioned only the first dimension. To partition the first two dimensions, the following MPI routines are helpful: MPI_Cart_create, MPI_Cart_coords, MPI_Cart_shift, MPI_Type_vector and MPI_Type_commit.
[Figure 5.3: a 4x3 CPU grid for the array A(i,j): CPU0..CPU11 at grid coordinates (0,0)..(3,2); the i direction runs to the right ('sideways'), the j direction upward ('updown').]
The array a(mm,nn) with mm = 200 and nn = 150 is split into blocks of m = mm/4 = 50 by n = nn/3 = 50 elements:

#define mm 200
#define nn 150
#define m   50
#define n   50
#define ip   4
#define jp   3
MPI_Cart_coords returns a CPU's coordinates in the grid, and MPI_Cart_shift the CPU ids of its neighbors. The CPU count nproc from MPI_Comm_size must match ip*jp before MPI_Cart_create is called to build the grid of Figure 5.3:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ip 4
#define jp 3
#define ndim 2
int        nproc, myid, r_nbr, l_nbr, t_nbr, b_nbr, my_coord[ndim];
int        ipart[ndim], periods[ndim], sideways, updown, right, up, reorder;
MPI_Status istat[8];
MPI_Comm   comm2d;
main ( argc, argv)
int argc;
char **argv;
{
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
ipart[0]=ip;
ipart[1]=jp;
periods[0]=0;
periods[1]=0;
reorder=1;
MPI_Cart_create(MPI_COMM_WORLD, ndim, ipart, periods, reorder, &comm2d);
.....
return 0;
}
The arguments of MPI_Cart_create are:

MPI_COMM_WORLD  the original communicator
ndim            number of grid dimensions (2 for Figure 5.3)
ipart           CPUs per dimension: ipart[0]=4 and ipart[1]=3 in Figure 5.3
periods         periods[i] is 1 if dimension i wraps around (periodic),
                0 if not; both are 0 in Figure 5.3
reorder         1 lets MPI renumber the CPUs for efficiency
comm2d          the new Cartesian communicator

After MPI_Cart_create, the CPU id within comm2d is obtained with MPI_Comm_rank (it may differ from the id in MPI_COMM_WORLD when reorder is 1), and the grid coordinates with MPI_Cart_coords:

MPI_Comm_rank (comm2d, &myid);
MPI_Cart_coords (comm2d, myid, ndim, my_coord);

comm2d    the Cartesian communicator
myid      this CPU's id in comm2d
ndim      number of dimensions
my_coord  returned coordinates: my_coord[0] runs from 0 to ip-1 along i,
          my_coord[1] from 0 to jp-1 along j

In Figure 5.3 the labels CPU0, CPU1, CPU2, ... are the CPU ids and the pairs in parentheses are the my_coord values.
MPI_Cart_shift finds the neighboring CPUs' ids:

int sideways, updown, right, up;
sideways=0;
updown=1;
right=1;
up=1;
MPI_Cart_shift (comm2d, sideways, right, &l_nbr, &r_nbr);
MPI_Cart_shift (comm2d, updown,   up,    &b_nbr, &t_nbr);

comm2d    the Cartesian communicator
sideways  the dimension to shift along (0 = the i direction)
right     the displacement (+1 = toward larger coordinates)
l_nbr     returned id of the left (lower-i) neighbor
r_nbr     returned id of the right (higher-i) neighbor
updown    dimension 1 = the j direction
b_nbr     returned id of the bottom neighbor
t_nbr     returned id of the top neighbor

For example, in Figure 5.3 CPU4 at (1,1) gets l_nbr=1, r_nbr=7, b_nbr=3 and t_nbr=5; CPUs on the grid boundary get MPI_PROC_NULL for the missing neighbors.
5.4.3  Derived data types: MPI_Type_vector and MPI_Type_commit

The exchanged boundary data are not always contiguous in memory, as Figure 5.4 shows:
[Figure 5.4: the 4x3 grid of blocks of a(i,j); in C the j index is contiguous, so the boundary exchanged with the i-direction neighbors (marked y) is a contiguous row, while the boundary exchanged with the j-direction neighbors (marked x) is a strided column — one element out of every n.]
The column of x elements in a(m,n) consists of m elements, one every n locations, so it cannot be sent as a simple count of MPI_DOUBLEs. Instead a derived data type is built with MPI_Type_vector and registered with MPI_Type_commit:

MPI_Type_vector (count, blocklen, stride, oldtype, &newtype);
MPI_Type_commit (&newtype);

count     number of blocks
blocklen  number of contiguous elements in each block
stride    distance, in elements, between the starts of consecutive blocks
oldtype   the existing data type
newtype   the new derived data type
For the x column of Figure 5.4 (the arrays here are double, so MPI_DOUBLE is the element type):

MPI_Type_vector (m, 1, n, MPI_DOUBLE, &vector2d);
MPI_Type_commit (&vector2d);

One element of type vector2d now describes the whole strided column.
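As a sketch of how such a type is used (the names ps1, b_nbr, t_nbr, comm2d are those of the next section; itag is arbitrary): with a local array ps1[m+2][n+2] the stride is n+2, and a single MPI_Sendrecv then moves a whole boundary column:

double ps1[m+2][n+2];
MPI_Datatype vector2d;

MPI_Type_vector (m, 1, n+2, MPI_DOUBLE, &vector2d);
MPI_Type_commit (&vector2d);
itag=130;
/* send the first owned column down, receive the top neighbor's
   boundary column into ps1[1..m][n+1] */
MPI_Sendrecv ((void *)&ps1[1][1],   1, vector2d, b_nbr, itag,
              (void *)&ps1[1][n+1], 1, vector2d, t_nbr, itag, comm2d, istat);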
5.5  Example Program T5_2D

Recall the declarations of T5SEQ:
#define kk 20
#define km  3
#define mm 160
#define nn 120
double f1[mm][nn][km], f2[mm][nn][km], hxu[mm][nn], hxv[mm][nn],
hmmx[mm][nn], hmmy[mm][nn];
double vecinv[kk][kk], am7[kk];
main ()
{
double u1[mm][nn][kk], v1[mm][nn][kk], ps1[mm][nn];
double d7[mm][nn], d8[mm][nn], d00[mm][nn][kk];
double clock, sumf1, sumf2;
Whereas section 5.4 used a 4x3 grid, T5_2D distributes the mm x nn arrays over a 4x2 CPU grid: ip = 4 and jp = 2, so the local blocks are m = mm/ip = 40 by n = nn/jp = 60 (Figure 5.5):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define kk 20
#define km  3
#define mm 160
#define nn 120
#define m  40
#define n  60
#define ip  4
#define jp  2
[Figure 5.5: the local blocks, e.g. d8(m+2,n+2) and ps1(m+2,n+2), on a 4x2 grid; cpu0..cpu7 at coordinates (0,0)..(3,1); x marks the strided boundary columns; i runs to the right, j upward.]
The grid setup is collected in the function nbr2d():

nbr2d()
{
   ipart[0]=ip;
ipart[1]=jp;
periods[0]=0;
periods[1]=0;
reorder=1;
sideways=0;
updown=1;
right=1;
up=1;
MPI_Cart_create(MPI_COMM_WORLD, 2, ipart, periods, reorder, &comm2d);
MPI_Comm_rank( comm2d,&myid);
MPI_Cart_coords( comm2d, myid, 2, my_coord);
MPI_Cart_shift( comm2d, sideways, right, &l_nbr, &r_nbr);
MPI_Cart_shift( comm2d, updown, up, &b_nbr, &t_nbr);
printf(" myid,coord,l,r,t,b_nbr=%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
myid,my_coord[0],my_coord[1],l_nbr,r_nbr,t_nbr,b_nbr);
}
As in Figure 5.5, each CPU must exchange boundary columns (the x elements): for the 2-D arrays a column is m single elements with stride n+2 (vector2d), and for the 3-D arrays it is m blocks of kk elements with stride (n+2)*kk (vector3d):

   n2=n+2;
   MPI_Type_vector (m, 1, n2, MPI_DOUBLE, &vector2d);
   MPI_Type_commit (&vector2d);
   n2kk=n2*kk;
   MPI_Type_vector (m, kk, n2kk, MPI_DOUBLE, &vector3d);
   MPI_Type_commit (&vector3d);
The j loops run over the owned range 1..n:

   jstart=1;
   jend=n;
   jstartm1=jstart-1;
   jendp1=jend+1;
[Figure 5.6: each CPU's block with its owned range jstart..jend; y marks the contiguous boundary rows exchanged in the i direction and x the strided boundary columns exchanged in the j direction.]
Each CPU sends its first owned boundary column to b_nbr and receives its top neighbor's column into jendp1, one MPI_Sendrecv per array:

   MPI_Sendrecv ((void *)&ps1[istart][jstart],   1, vector2d, b_nbr, itag,
                 (void *)&ps1[istart][jendp1],   1, vector2d, t_nbr, itag, comm2d, istat);
   MPI_Sendrecv ((void *)&v1[istart][jstart][0], 1, vector3d, b_nbr, itag,
                 (void *)&v1[istart][jendp1][0], 1, vector3d, t_nbr, itag, comm2d, istat);
The i loops run over 1..m:

   istart=1;
   iend=m;
   istartm1=istart-1;
   iendp1=iend+1;
In the i direction the exchanges are the same as in T5DCP (section 5.3): the boundary rows are contiguous, n doubles at a time:

   MPI_Sendrecv ((void *)&ps1[istart][jstart], n, MPI_DOUBLE, l_nbr, itag,
                 (void *)&ps1[iendp1][jstart], n, MPI_DOUBLE, r_nbr, itag, comm2d, istat);
and for the 3-D arrays (n+2)*kk doubles:

   n2kk=(n+2)*kk;
   MPI_Sendrecv ((void *)&u1[iend][jstart][0],     n2kk, MPI_DOUBLE, r_nbr, itag,
                 (void *)&u1[istartm1][jstart][0], n2kk, MPI_DOUBLE, l_nbr, itag,
                 comm2d, istat);
The scratch array tt keeps the full dimensions while f1 and f2 are local blocks. A two-dimensional block decomposition cannot be reassembled directly with MPI_Gather or MPI_Gatherv, so CPU 0 collects the f2 blocks with MPI_Send/MPI_Recv and the function copy1 places each block at its position in tt:

double tt[mm][nn][km];
double f1[m+2][n+2][km], f2[m+2][n+2][km];
T5_2D :
/*
PROGRAM T5_2D
Computing & data partition on the first 2 dimensions of multiple
dimensional arrays with -1,+1 data exchange */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define kk 20
#define km  3
#define mm 160
#define nn 120
#define m  40
#define n  60
#define ip  4
#define jp  2
#define np  8
int          istart, iend, istart2, iend1, istartm1, iendp1;
int          jstart, jend, jstart2, jend1, jstartm1, jendp1;
int          istartg[16], iendg[16], jstartg[16], jendg[16];
MPI_Comm     comm2d;
MPI_Status   istat[8];
MPI_Datatype vector2d, vector3d;
main ( argc, argv)
int argc;
char **argv;
{
double u1[m+2][n+2][kk], v1[m+2][n+2][kk], ps1[m+2][n+2];
double d7[m+2][n+2], d8[m+2][n+2], d00[m+2][n+2][kk];
double clock, sumf1, sumf2, gsumf1, gsumf2, tt[mm][nn][km];
   int i, j, k, ka, ii, jj, n2, n2kk;
   int itag, isrc, idest, iroot, ig, jg, count;
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
   if (nproc != np) {
      if (myid == 0)
         printf(" nproc not equal to np=%d, program will stop\n", np);
      MPI_Finalize();
      return 0;
   }
nbr2d();
MPI_Barrier(comm2d);
clock=MPI_Wtime();
MPI_Gather ((void *)&my_coord, 2, MPI_INTEGER,
(void *)&g_coord, 2, MPI_INTEGER, 0, comm2d);
startend( ip, 1, mm, istartg, iendg);
startend( jp, 1, nn, jstartg, jendg);
istart = 1;
iend = m;
jstart = 1;
jend = n;
myid_i=my_coord[0];
myid_j=my_coord[1];
ig = istartg[myid_i];
jg = jstartg[myid_j];
lastp_i=ip-1;
lastp_j=jp-1;
printf( "NPROC,MYID,ISTART,IEND,ig,jg=%d\t%d\t%d\t%d\t%d\t%d\n",
nproc, myid, istart, iend, ig, jg);
istartm1 = istart-1;
iendp1 = iend+1;
jstartm1 = jstart-1;
jendp1 = jend+1;
istart2 = istart;
if (myid_i == 0) istart2 = 2;
jstart2 = jstart;
if (myid_j == 0) jstart2 = 2;
iend1 = iend;
if (myid_i == lastp_i ) iend1 = iend-1;
   jend1 = jend;
   if (myid_j == lastp_j ) jend1 = jend-1;
/*
   test data generation, the boundary exchanges shown above, and the
   computation, with the loops running

   for (i=istart; i<=iend1; i++)
      for (j=jstart; j<=jend1; j++)
*/
nbr2d()
{
ipart[0]=ip;
ipart[1]=jp;
periods[0]=0;
periods[1]=0;
reorder=1;
sideways=0;
updown=1;
right=1;
up=1;
MPI_Cart_create(MPI_COMM_WORLD, 2, ipart, periods, reorder, &comm2d);
MPI_Comm_rank( comm2d,&myid);
MPI_Cart_coords( comm2d, myid, 2, my_coord);
MPI_Cart_shift( comm2d, sideways, right, &l_nbr, &r_nbr);
MPI_Cart_shift( comm2d, updown, up, &b_nbr, &t_nbr);
printf(" myid,coord,l,r,t,b_nbr=%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
myid,my_coord[0],my_coord[1],l_nbr,r_nbr,t_nbr,b_nbr);
return 0;
}
copy1(int id, double tt[mm][nn][km])
{
/*
   copy the f2 block received from CPU id into its place in tt
*/
   int i, j, k, ig, jg, ii, jj;

   ii=g_coord[id][0];
   jj=g_coord[id][1];
for (i=1; i<=m; i++) {
ig=istartg[ii]+i-2;
for (j=1; j<=n; j++) {
jg=jstartg[jj]+j-2;
for (k=0; k<km; k++)
tt[ig][jg][k] = f2[i][j][k];
}
}
return 0;
}
T5_2D output (8 CPUs):

ATTENTION: 0031-408 8 tasks allocated by LoadLeveler, continuing...
myid,coord,l,r,t,b_nbr=0  0 0  -3  2   1  -3
myid,coord,l,r,t,b_nbr=1  0 1  -3  3  -3   0
myid,coord,l,r,t,b_nbr=2  1 0   0  4   3  -3
(similar lines for CPUs 3..7; MPI_PROC_NULL prints as -3)
sumf1,sumf2 = 26172.46985   -2268.89180
tt[i][1][1],i=0,159,5
(sampled values: 0.000, -0.333, -0.295, ..., -0.254)
myid, clock time = 0.0200 to 0.0328 seconds on the eight CPUs
Chapter 6  Further MPI Techniques

This chapter introduces nonblocking communication and contrasts it with the blocking calls used so far.

6.1  Nonblocking communication

MPI_Send and MPI_Recv are blocking calls: MPI_Send returns only when the send buffer may safely be reused ("buffer is empty"), and MPI_Recv returns only when the requested data has completely arrived ("buffer is full"). While they wait, the CPU idles, as Figure 6.1 shows.
[Figure 6.1 Blocking send/receive: on processor 0, MPI_Send copies sendbuf to the system buffer sysbuf while the CPU idles; on processor 1, MPI_Recv copies sysbuf to recvbuf while that CPU idles.]

[Figure 6.2 Nonblocking send/receive: MPI_Isend and MPI_Irecv return immediately, and the copies between sendbuf/recvbuf and sysbuf proceed while both processors continue computing.]

MPI_Isend is the nonblocking send:
MPI_Isend ((void *)&data, count, DATATYPE, idest, itag, MPI_COMM_WORLD, &request);

data            the data to send
count           number of elements
DATATYPE        data type
idest           CPU id of the destination
itag            message tag
MPI_COMM_WORLD  communicator
request         request handle, completed later by MPI_Wait
MPI_Irecv is the nonblocking receive:

MPI_Irecv ((void *)&data, count, DATATYPE, isrc, itag, MPI_COMM_WORLD, &request);

data      where to store the received data
count     number of elements
DATATYPE  data type
isrc      CPU id of the source
itag      message tag
request   request handle

MPI_Wait blocks until the operation identified by request has completed:

MPI_Wait (&request, istat);

request  the request returned by MPI_Isend or MPI_Irecv
istat    the returned status

Each MPI_Isend and each MPI_Irecv must be given its own request variable.
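A minimal sketch of the complete nonblocking pattern (the names reqs, reqr are ours): every nonblocking call gets its own MPI_Request, and MPI_Wait must complete the receive before the data is used (and the send before the buffer is reused):

MPI_Request reqs, reqr;
MPI_Status  istat[8];

itag=10;
MPI_Isend ((void *)&b[iend],     1, MPI_DOUBLE, r_nbr, itag, comm, &reqs);
MPI_Irecv ((void *)&b[istartm1], 1, MPI_DOUBLE, l_nbr, itag, comm, &reqr);
/* ... computation that touches neither b[iend] nor b[istartm1] ... */
MPI_Wait (&reqr, istat);   /* b[istartm1] is now valid  */
MPI_Wait (&reqs, istat);   /* b[iend] may now be reused */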
The program T6DCP replaces the MPI_Sendrecv exchanges of T5DCP by nonblocking calls. Its declarations match T5DCP, with MPI_Request variables added, one per nonblocking call:

/*
   PROGRAM T6DCP
*/
#define kk 20
#define km  3
#define mm 160
#define nn 120
#define m  40
#define np  4
...
   int         nproc, myid, istart, iend, icount, r_nbr, l_nbr, lastp, iroot, istartg;
   int         itag, istart2, iend1, istartm1, iendp1;
   int         gstart[16], gend[16], gcount[16];
   MPI_Status  istat[8];
   MPI_Comm    comm;
   MPI_Request requ1s, requ1, reqps1s, reqps1;
...
   startend( nproc, 1, mm, gstart, gend, gcount);
   istart = 1;
   iend = m;
lastp = nproc-1;
istartg = gstart[myid];
printf( "NPROC,MYID,ISTART,IEND,istartg=%d\t%d\t%d\t%d\t%d\n",
nproc, myid, istart, iend, istartg);
istartm1 = istart-1;
iendp1 = iend+1;
istart2 = istart;
if (myid == 0) istart2 = 2;
iend1 = iend;
if (myid == lastp ) iend1 = iend-1;
l_nbr = myid - 1;
r_nbr = myid + 1;
if (myid == 0) l_nbr = MPI_PROC_NULL;
if (myid == lastp) r_nbr = MPI_PROC_NULL;
/*
Test data generation
*/
/* for (i=0; i<mm; i++) */
for (i=istart; i<=iend; i++) {
ii = i + istartg -1;
for (j=0; j<nn; j++)
for (k=0; k<kk; k++)
u1[i][j][k]=1.0/(double) ii + 1.0/(double) (j+1) + 1.0/(double) (k+1);
}
/* for (i=0; i<mm; i++) */
for (i=istart; i<=iend; i++) {
ii = i + istartg -1;
for (j=0; j<nn; j++)
for (k=0; k<kk; k++)
v1[i][j][k]=2.0/(double) ii + 1.0/(double) (j+1) + 1.0/(double) (k+1);
}
for (i=istart; i<=iend; i++) {
ii = i + istartg -1;
for (j=0; j<nn; j++) {
115
*/
nnkk = nn*kk;
itag = 10;
/*
*/
   MPI_Isend ((void *)&u1[iend][0][0],     nnkk, MPI_DOUBLE, r_nbr, itag, comm, &requ1s);
   MPI_Irecv ((void *)&u1[istartm1][0][0], nnkk, MPI_DOUBLE, l_nbr, itag, comm, &requ1);
   itag = 20;
/* MPI_Sendrecv ((void *)&ps1[istart][0], nn, MPI_DOUBLE, l_nbr, itag,
                 (void *)&ps1[iendp1][0], nn, MPI_DOUBLE, r_nbr, itag, comm, istat);
*/
   MPI_Isend ((void *)&ps1[istart][0], nn, MPI_DOUBLE, l_nbr, itag, comm, &reqps1s);
   MPI_Irecv ((void *)&ps1[iendp1][0], nn, MPI_DOUBLE, r_nbr, itag, comm, &reqps1);
/* for (i=0; i<mm; i++) { */
for (i=istart; i<=iend; i++) {
for (j=0; j<nn; j++) {
for (k=0; k<km; k++) {
f1[i][j][k]=0.0;
f2[i][j][k]=0.0;
}
}
}
/*
   compute f1 and f2 as in T5DCP — after MPI_Wait has completed the
   receives above — accumulating the partial sums:
*/
   ...
            sumf2 +=f2[i][j][k];
         }
      }
   }
/*
Output data for validation
*/
MPI_Allreduce ((void *)&sumf1,(void *)&gsumf1, 1, MPI_DOUBLE, MPI_SUM, comm);
MPI_Allreduce ((void *)&sumf2,(void *)&gsumf2, 1, MPI_DOUBLE, MPI_SUM, comm);
icount1 = m*nn*km;
iroot=0;
MPI_Gather((void *)&f2[istart][0][0],icount1,MPI_DOUBLE,
(void *)&tt,
icount1,MPI_DOUBLE, iroot, comm);
if (myid == 0) {
printf( "SUMF1,SUMF2= %.5f\t%.5f\n", gsumf1, gsumf2 );
printf( " tt[i][1][1],i=0,159,5\n");
for (i = 0; i < mm; i+=40) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
tt[i][1][1],tt[i+5][1][1],tt[i+10][1][1],tt[i+15][1][1],
tt[i+20][1][1],tt[i+25][1][1],tt[i+30][1][1],tt[i+35][1][1]);
}
clock=MPI_Wtime() - clock;
printf( " myid, clocktime= %d\t%.5f\n", myid, clock);
}
MPI_Finalize();
return 0;
}
T6DCP output (4 CPUs):

ATTENTION: 0031-408 ...
NPROC,MYID,ISTART,IEND,istartg=4   0   1   40    1
NPROC,MYID,ISTART,IEND,istartg=4   1   1   40   41
NPROC,MYID,ISTART,IEND,istartg=4   2   1   40   81
NPROC,MYID,ISTART,IEND,istartg=4   3   1   40  121
SUMF1,SUMF2= 26172.46054   -2268.89180
(sampled f2 values as before)
myid, clocktime = 0.02873 to 0.02895
6.2  Combining messages with MPI_Pack

Each message exchanged between CPUs costs a fixed startup time in addition to the transfer time, so several small messages to the same CPU are better combined into one. MPI_Pack gathers the pieces into a buffer on the sending side, and MPI_Unpack extracts them on the receiving side. Consider exchanging the iend rows of two arrays, ps1 and ps2, which would normally take two MPI_Sendrecv calls:

itag=110;
MPI_Sendrecv ((void *)&ps1[iend][0],     n, MPI_DOUBLE, r_nbr, itag,
              (void *)&ps1[istartm1][0], n, MPI_DOUBLE, l_nbr, itag, comm, istat);
itag=120;
MPI_Sendrecv ((void *)&ps2[iend][0],     n, MPI_DOUBLE, r_nbr, itag,
              (void *)&ps2[istartm1][0], n, MPI_DOUBLE, l_nbr, itag, comm, istat);
Instead, MPI_Pack can place both iend rows into buf1; a single MPI_Sendrecv exchanges the buffers, and MPI_Unpack extracts the two rows from buf2 into the istartm1 positions. Two rows of n doubles need n*2*8 bytes:

#define n 120
#define bufsize n*2*8
char        buf1[bufsize], buf2[bufsize];
int         ipos, itag, icount, l_nbr, r_nbr;
MPI_Comm    comm;
MPI_Status  istat[8];
MPI_Barrier (comm);
ipos=0;
MPI_Pack ( (void *)&ps1[iend][0], n, MPI_DOUBLE, (void *)&buf1, bufsize, &ipos, comm);
MPI_Pack ( (void *)&ps2[iend][0], n, MPI_DOUBLE, (void *)&buf1, bufsize, &ipos, comm);
itag=120;
MPI_Sendrecv ((void *)&buf1, bufsize, MPI_CHAR, r_nbr, itag,
(void *)&buf2, bufsize, MPI_CHAR, l_nbr, itag, comm, istat);
if (myid > 0) {
   ipos=0;
   MPI_Unpack ((void *)&buf2, bufsize, &ipos, (void *)&ps1[istartm1][0], n, MPI_DOUBLE, comm);
   MPI_Unpack ((void *)&buf2, bufsize, &ipos, (void *)&ps2[istartm1][0], n, MPI_DOUBLE, comm);
}
[Figure 6.3: communication bandwidth (Mbytes/s, roughly 10-35) versus message size (16 bytes to 16 Mbytes) on the IBM SP2_160 and SP2_120, each with and without user-space (us) communication.]
[Figure 6.4: communication bandwidth (Mbytes/s, up to about 700) versus message size for the Fujitsu VPP300, HP SPP2000, IBM SP2_375 and IBM SP2_160.]
6.3  Reducing data dependency

When one computed array feeds the next loop at a shifted index, the exchange must happen between the loops. Here d1 and d2 are computed from ps2 (which itself needs ps2 at i+1 and j+1), and the following loop needs d1 at i-1, so d1's boundary must be exchanged after the first loop:

for (i=istart; i<=iend1; i++) {
   for (j=0; j<jend1; j++) {
      d1[i][j]=(ps2[i+1][j]+ps2[i][j])*hxu[i][j]*0.50;
      d2[i][j]=(ps2[i][j+1]+ps2[i][j])*hxv[i][j]*0.50;
   }
}
MPI_Sendrecv((void *)&d1[iend][0],
nn, MPI_DOUBLE, r_nbr, itag,
(void *)&d1[istartm1][0], nn, MPI_DOUBLE, l_nbr, itag, comm, istat);
for (i=istart2; i<=iend1; i++)
for (j=1; j<n1; j++)
for (k=0; k<kk; k++)
d11[i][j][k]= (d1[i][j]*u2[i][j][k]-d1[i-1][j]*u2[i-1][j][k])*hmmx[i][j]
+ (d2[i][j]*v2[i][j][k]-d2[i][j-1]*v2[i][j-1][k])*hmmy[i][j];
6.4  Parallel input/output

So far input data has been read by one CPU and distributed to the others with MPI_Scatter, MPI_Scatterv or MPI_Bcast, and results have been collected with MPI_Gather or MPI_Gatherv before one CPU writes them out. Alternatively, every CPU can read and write its own files.

6.4.1  Parallel input

The arrays b, c, d are first split into np files — input.11, input.12, input.13, ... — one per CPU, by a sequential program such as PIOSEQ:
/*
   PROGRAM PIOSEQ
   split 'input.dat' into one file per CPU: input.11, input.12, ...
*/
#include <stdio.h>
#include <stdlib.h>
#define mm 200
#define np  4
#define m  50
main ()
{
   double a[mm], b[mm], c[mm], d[mm];
   int    i, iu, istart, iend, size;
   FILE   *fp;
   char   string[10];
/*
   read 'input.dat' and write each CPU's segment to its own file
*/
   fp = fopen( "input.dat", "r");
   fread( (void *)&b, sizeof(b), 1, fp );
   fread( (void *)&c, sizeof(c), 1, fp );
   fread( (void *)&d, sizeof(d), 1, fp );
   fclose( fp );
   for (i=0; i<np; i++) {
      iu=11+i;
      sprintf(string, "input.%d", iu);
      fp = fopen(string, "w");
      startend (i, np, 0, mm-1, &istart, &iend);
      size = (iend-istart+1)*sizeof(double);
      fwrite ((void *)&b[istart], size, 1, fp);
      fwrite ((void *)&c[istart], size, 1, fp);
      fwrite ((void *)&d[istart], size, 1, fp);
      fclose( fp );
   }
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
/*
   PROGRAM PIODCP
   Each processor reads its own data from an individual file
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define mm 200
#define np 4
#define m  50
main ( argc, argv)
int argc;
char **argv;
{
   char        string[10];
   int         i, j, k, iu, size;
   FILE        *fp;
   double      a[m], b[m], c[m], d[m], t[mm], suma, sumall;
   int         nproc, myid, istart, iend, iroot, idest;
   MPI_Comm    comm;
   MPI_Status  istat[8];
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
comm=MPI_COMM_WORLD;
istart=0;
iend=m-1;
/*
READ INPUT DATA and DISTRIBUTE INPUT DATA
*/
if(nproc != np) {
printf( "nproc not equal to np= %d\t%d\t",nproc, np);
printf(" program will stop");
MPI_Finalize();
return 0;
}
iu=11+myid;
sprintf(string, "input.%d", iu);
fp = fopen(string, "r");
size = m*sizeof(double);
fread ((void *)&b[istart], size, 1, fp);
fread ((void *)&c[istart], size, 1, fp);
fread ((void *)&d[istart], size, 1, fp);
fclose( fp );
/*
COMPUTE, GATHER COMPUTED DATA, and WRITE OUT the RESULT
*/
suma=0.0;
/* for(i=0; i<ntotal; i++) { . . . */
. . . . . .

For three-dimensional arrays the corresponding reads are:

size = m*nn*kk*sizeof(double);
fread ((void *)&b[istart][0][0], size, 1, fp);
fread ((void *)&c[istart][0][0], size, 1, fp);
fread ((void *)&d[istart][0][0], size, 1, fp);
Having every CPU read its own file spreads the I/O load (load balance) across the CPUs, but the files still live on one shared file system. A further step is to use each CPU's local disk (local disk): before reading, each CPU issues a system call to copy its own input.xx file to its local /var/tmp and then reads it from /var/tmp:
#define mm 200
#define np 4
#define m  50
#include <mpi.h>
   double  b[m+1], c[m+1], d[m+1];
   char    cmd[40], fname[20];
   int     nproc, myid, iu, size;
MPI_Init (&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
. . . . . .
iu=11+myid;
sprintf(cmd, "cp input.%d /var/tmp", iu);
system (cmd);
sprintf(fname, "/var/tmp/input.%d", iu);
fp = fopen(fname, "r");
size = m*sizeof(double);
fread ((void *)&b[1], size, 1, fp);
fread ((void *)&c[1], size, 1, fp);
fread ((void *)&d[1], size, 1, fp);
fclose( fp );
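system() returns only after the copy command completes, so the fopen that follows is safe; and since every CPU copies a different input.xx file to its own local disk, the np copies proceed in parallel.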
6.4.2 Writing output data in parallel

After the computation, each CPU writes its own part of each array to its own file output.xx:
#define mm 200
#define np 4
#define m  50
#include <mpi.h>
   double  b[m], c[m], d[m];
   char    fname[10];
   int     nproc, myid, iu, size;
MPI_Init (&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
. . . . . .
iu=11+myid;
sprintf(fname, "output.%d", iu);
fp = fopen(fname, "w");
size = m*sizeof(double);
fwrite ((void *)&b, size, 1, fp);
fwrite ((void *)&c, size, 1, fp);
fwrite ((void *)&d, size, 1, fp);
fclose( fp );
For three-dimensional double arrays the corresponding writes are:

size = m*nn*kk*sizeof(double);
fwrite ((void *)&b, size, 1, fp);
fwrite ((void *)&c, size, 1, fp);
fwrite ((void *)&d, size, 1, fp);
After the run, the np files output.11, output.12, output.13, . . . can be merged back into one array and written to a single file by a sequential program:
#define np 4
#define mm 200
   char    fname[10];
   double  a[mm], b[mm], c[mm], d[mm];
   int     i, iu, size, istart, iend;
   FILE    *fp;
for (i=0; i<np; i++) {
   iu=11+i;
   sprintf (fname, "output.%d", iu);
   fp = fopen (fname, "r");
   startend (i, np, 0, mm-1, &istart, &iend);
   size = (iend-istart+1)*sizeof(double);
   fread ((void *)&a[istart], size, 1, fp);
   fclose( fp );
}
sprintf (fname, "output.dat");
fp = fopen (fname, "w");
fwrite ((void *)&a, sizeof(a), 1, fp);
For three-dimensional arrays the corresponding statements are:

size = (iend-istart+1)*nn*kk*sizeof(double);
fread ((void *)&a[istart][0][0], size, 1, fp);
The next chapter introduces MPI derived data types and two further parallelization techniques: transposing a block distribution (Transposing Block Distribution) and the handling of doubly dependent loops with the 2-way recursive and pipeline methods (2 Way Recursive and Pipeline method).
7.1 MPI derived data types

Besides the basic data types MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR, and so on, MPI lets the user build derived data types (derived data type) with MPI_Type_vector, MPI_Type_contiguous, MPI_Type_indexed, and MPI_Type_struct. MPI_Type_vector describes equally spaced blocks of data (Constant Stride), MPI_Type_contiguous describes consecutive data, and MPI_Type_struct describes a combination of data of different types, much like a C struct. For example, for the C struct
struct {
   float a;
   float b;
   int   n;
} load;
the corresponding MPI derived data type is defined with MPI_Type_struct:
#define count 3
int           length[count];
MPI_Datatype  oldtype[count];
MPI_Aint      disp[count];
MPI_Datatype  newtype;
MPI_Type_struct ( count, length, disp, oldtype, &newtype);
MPI_Type_commit (&newtype);
count     the number of blocks (members of different type)
length    an integer array of length count: the number of elements in each block
disp      an MPI_Aint array of length count: the byte displacement of each block
oldtype   an MPI_Datatype array of length count: the data type of each block
newtype   the resulting MPI_Datatype
The displacement (Displacement) of each member is obtained with MPI_Address:

MPI_Address ( (void *)&data, &address);

where data is the variable whose address is wanted and address (of type MPI_Aint) returns the displacement of data.
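A minimal sketch putting these pieces together for the struct load defined above; the loop that makes the displacements relative to the first member follows the same pattern as the ipack3 example later in this chapter:

int i;
length[0]=1;  length[1]=1;  length[2]=1;
oldtype[0]=MPI_FLOAT;  oldtype[1]=MPI_FLOAT;  oldtype[2]=MPI_INT;
MPI_Address ((void *)&load.a, &disp[0]);
MPI_Address ((void *)&load.b, &disp[1]);
MPI_Address ((void *)&load.n, &disp[2]);
for (i=2; i>=0; i--)            /* displacements relative to load.a */
   disp[i] -= disp[0];
MPI_Type_struct (count, length, disp, oldtype, &newtype);
MPI_Type_commit (&newtype);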
/*
   PROGRAM T7STRUCT
   C struct and related MPI_Type_struct example
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define count 3
/*--------- MPI related data ---------*/
int           nproc, myid;
MPI_Comm      comm;
MPI_Status    istat[8];
MPI_Datatype  newtype;
MPI_Aint      disp[3];
struct {
   float a;
   float b;
   int   n;
} new;
The same idea helps when several arrays must be exchanged in one MPI_Sendrecv. Below, the arrays are first packed into buf1 with MPI_Pack, buf1 is sent and buf2 received with MPI_Sendrecv, and buf2 is then unpacked with MPI_Unpack:
#define im 160
#define km 20
#define bufsize km*4*8
float     up[im+1][km], vp[im+1][km], wp[im+1][km];
int       km2=km*2, itag, ipos, l_nbr, r_nbr;
char      buf1[bufsize], buf2[bufsize];
MPI_Comm  comm;
. . . . . . . .
if (myid > 0) {
   ipos=0;
   MPI_Pack ((void *)&up[istart][0], km,  MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
   MPI_Pack ((void *)&vp[istart][0], km2, MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
   MPI_Pack ((void *)&wp[istart][0], km,  MPI_FLOAT, (void *)&buf1, bufsize, &ipos, comm);
}
itag=202;
MPI_Sendrecv((void *)&buf1, bufsize, MPI_CHAR, l_nbr, itag,
             (void *)&buf2, bufsize, MPI_CHAR, r_nbr, itag, comm, istat);
if (myid < nproc-1) {
   ipos=0;
   MPI_Unpack ((void *)&buf2, bufsize, &ipos, (void *)&up[iendp1][0], km,  MPI_FLOAT, comm );
   MPI_Unpack ((void *)&buf2, bufsize, &ipos, (void *)&vp[iendp1][0], km2, MPI_FLOAT, comm );
   MPI_Unpack ((void *)&buf2, bufsize, &ipos, (void *)&wp[iendp1][0], km,  MPI_FLOAT, comm );
}
Since each block consists of km consecutive elements, a derived data type cont2d covering km contiguous values is defined with MPI_Type_contiguous:

MPI_Datatype cont2d;
MPI_Type_contiguous ( km, MPI_FLOAT, &cont2d );
MPI_Type_commit (&cont2d);
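Once committed, a count of one cont2d in a send or receive stands for km consecutive float values, and a block length of 2 in a type map (as used for vp below) stands for 2*km values.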
The three blocks to exchange are km elements of up, km*2 elements of vp, and km elements of wp, each starting at [istart][0] on the sending side and at [iendp1][0] on the receiving side. Because up, vp, and wp have identical dimensions, the relative displacements of the three blocks are the same on both sides, so they can be combined into one derived data type ipack3 and exchanged with a single MPI_Sendrecv, with no Pack or Unpack at all:
int
length[3], ifirst = 1;
MPI_Datatype ipack3, itype[3];
MPI_Aint
disp[3];
. . . . . . .
if (ifirst == 1) {
ifirst = 0;
length[0] = 1;
length[1] = 2;
length[2] = 1;
itype[0] = cont2d;
itype[1] = cont2d;
itype[2] = cont2d;
MPI_Address( (void *)&up[istart][0], &disp[0]);
MPI_Address((void *)&vp[istart][0], &disp[1]);
MPI_Address((void *)&wp[istart][0], &disp[2]);
for (i=2; i>=0; i--)
   disp[i] -= disp[0];
MPI_Type_struct( 3, length, disp, itype, &ipack3);
MPI_Type_commit(&ipack3);
}
itag=202;
MPI_Sendrecv( (void *)&up[istart][0], 1, ipack3, l_nbr, itag,
(void *)&up[iendp1][0], 1, ipack3, r_nbr, itag, comm, istat);
This achieves the same transfer as the Pack/Unpack version while avoiding the intermediate buffer copies.
7.2 Transposing a block distribution

[Figure 7.1: the two-dimensional array a(i,j), with m elements in the 1st dimension and n in the 2nd, distributed by blocks of the 1st dimension over P0, P1, P2 (left) and by blocks of the 2nd dimension (right); the digit stored at each point marks the block it belongs to]
Changing the array from the row distribution to the column distribution of Figure 7.1 (row_to_col) is a block transpose: every CPU keeps its diagonal block and exchanges each off-diagonal block [i][j] with the CPU holding the mirror block. The blocks are conveniently described with derived data types, as in Figure 7.2.
[Figure 7.2: A(I,J) divided into the blocks itype(i,j), i.e. block row i and block column j, over P0, P1, P2 | Figure 7.3: a single block: columns jmin..jmax of the 2nd dimension (jleng elements) by ileng elements of the 1st dimension]
Each of the blocks in Figures 7.2 and 7.3 is given its own derived data type; the types are kept in a two-dimensional array, called block2d below (and vector in the complete program listed later in this section):
int           jmin, jmax, ileng, jleng, count, stride;
MPI_Datatype  block2d[ip][jp];
stride = jmax - jmin + 1;
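A minimal sketch of how one such block type can be built with MPI_Type_vector, assuming the plain row-major int array a[mm][nn] of the example program (ileng rows of jleng elements each, with consecutive rows nn ints apart; the indices id and jd are hypothetical loop variables, and note that the program's own stride variable suggests its storage arrangement may differ):

MPI_Type_vector (ileng, jleng, nn, MPI_INT, &block2d[id][jd]);
MPI_Type_commit (&block2d[id][jd]);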
The row_to_col transpose then sends each off-diagonal block with a non-blocking MPI_Isend and receives the mirror block with MPI_Irecv:
itag=10;
k=-1;
for (id = 0; id < nproc; id++) {
if (id != myid ) {
k=k+1;
istart1=istartg[id];
jstart1=jstartg[id];
MPI_Isend( (void *)&a[istart1][jstart], 1, block2d[id][myid], id, itag, comm, &req1[k]);
MPI_Irecv( (void *)&a[istart][jstart1], 1, block2d[myid][id], id, itag, comm, &req2[k]);
}
}
icount=nproc-1;
MPI_Waitall (icount, req1, stat);
MPI_Waitall (icount, req2, stat);
MPI_Isend and MPI_Irecv are non-blocking, so their completion must be waited for with MPI_Waitall:

MPI_Waitall (count, request, status);

count     the number of outstanding requests
request   an MPI_Request array of length count, filled in by MPI_Isend and MPI_Irecv
status    an MPI_Status array of length count receiving the status of each request

The non-blocking send/recv pairs can also be replaced by blocking MPI_Sendrecv calls:
itag=10;
for (id = 0; id < nproc; id++) {
if (id != myid ) {
istart1=istartg[id];
jstart1=jstartg[id];
MPI_Sendrecv( (void *)&a[istart1][jstart], 1, vector[id][myid], id, itag,
(void *)&a[istart][jstart1], 1, vector[myid][id], id, itag, comm, istat);
}
}
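With MPI_Sendrecv each of the nproc-1 exchanges completes before the next one starts, while the non-blocking version lets all of them be in flight at once; on more than a few CPUs the non-blocking form is therefore normally faster.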
[Figure 7.5: the column-distributed a(i,j) with its blocks (i,j) per process (left), and the row distribution it returns to after col_to_row (right)]
With the same block2d types, the col_to_row transpose simply swaps the send and receive blocks:
itag=20;
k=-1;
for (id = 0; id < nproc; id++) {
if (id != myid ) {
k = k +1;
istart1=istartg[id];
jstart1=jstartg[id];
MPI_Isend( (void *)&a[istart][jstart1], 1, block2d[myid][id], id, itag, comm, &req1[k]);
MPI_Irecv( (void *)&a[istart1][jstart], 1, block2d[id][myid], id, itag, comm, &req2[k]);
}
}
icount=nproc-1;
MPI_Waitall (icount, req1, stat);
MPI_Waitall (icount, req2, stat);
The complete program transpose, using these block2d types, follows:
/*
   program transpose
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define np 3
#define mm 9
#define nn 6
main ( argc, argv)
int argc;
char **argv;
{
   int           a[mm][nn];
   int           nproc, myid, istart, iend, jstart, jend, istart1, jstart1;
   int           istartg[np], iendg[np], jstartg[np], jendg[np];
   int           i, j, k, id, itag, icount;
   FILE          *fp;
   char          string[80], fname[16];
   MPI_Datatype  block2d[np][np];
   MPI_Request   req1[np], req2[np];
   MPI_Status    istat[8];
   MPI_Comm      comm;
   . . . . . .
for (j=nn-1; j>=0; j--) {
   sprintf(string,"%d %d %d %d %d %d %d %d %d\n",
           a[0][j],a[1][j],a[2][j],a[3][j],a[4][j],a[5][j],a[6][j],a[7][j],a[8][j]);
   fwrite( (void *)&string, sizeof(string), 1, fp );
}
/*
row_to_col
*/
itag=10;
k=-1;
for (id = 0; id < nproc; id++) {
if (id != myid ) {
k=k+1;
istart1=istartg[id];
jstart1=jstartg[id];
MPI_Isend( (void *)&a[istart1][jstart], 1, block2d[id][myid],
id, itag, comm, &req1[k]);
MPI_Irecv( (void *)&a[istart][jstart1], 1, block2d[myid][id],
id, itag, comm, &req2[k]);
}
}
icount=nproc-1;
MPI_Waitall (icount, req1, istat);
MPI_Waitall (icount, req2, istat);
sprintf( string, "after row_to_col\n");
fwrite( (void *)&string, sizeof(string), 1, fp );
for (j=nn-1; j>=0; j--) {
sprintf(string,"%d %d %d\0\0",
a[istart][j],a[istart+1][j],a[istart+2][j]);
fwrite( (void *)&string, sizeof(string), 1, fp );
}
/*
col_to_row
*/
MPI_Barrier( comm );
itag=20;
k=-1;
for (id = 0; id < nproc; id++) {
if (id != myid ) {
k=k+1;
istart1=istartg[id];
jstart1=jstartg[id];
MPI_Isend( (void *)&a[istart][jstart1], 1, block2d[myid][id],
id, itag, comm, &req1[k]);
MPI_Irecv( (void *)&a[istart1][jstart], 1, block2d[id][myid],
id, itag, comm, &req2[k]);
}
}
icount=nproc-1;
MPI_Waitall (icount, req1, istat);
MPI_Waitall (icount, req2, istat);
for (i=0; i<mm; i++)
for (j=jstart; j<=jend; j++)
a[i][j]=a[i][j]+10;
sprintf( string, "after col_to_row\n");
fwrite( (void *)&string, sizeof(string), 1, fp );
for (j=jstart; j<=jend; j++) {
sprintf(string,"%d %d %d %d %d %d %d %d %d\n",
a[0][j],a[1][j],a[2][j],a[3][j],a[4][j],a[5][j],a[6][j],a[7][j],a[8][j]);
fwrite( (void *)&string, sizeof(string), 1, fp );
}
MPI_Finalize();
return 0;
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
Each CPU writes its view of the array to its own file three times: the initial row-distributed array, the array after row_to_col, and the array after col_to_row with 10 added to each of its own elements. CPU 0's fort.11 accordingly shows the value blocks 1., 4., 7. in its initial rows, the blocks 1., 2., 3. after row_to_col, and 11., 14., 17. after col_to_row; fort.12 (CPU 1) shows 2., 5., 8., then 4., 5., 6., then 12., 15., 18.; and fort.13 (CPU 2) shows 3., 6., 9., then 7., 8., 9., then 13., 16., 19.
7.3 The pipeline method

In the for loop below, the new value of x[i][j] depends on x[i-1][j] and x[i][j-1], values that were updated earlier in the same sweep; the loop therefore carries dependences along both indices i and j and cannot simply be split among the CPUs. Such doubly dependent loops can be computed recursively (2-Way Recursive) or parallelized with the pipeline method (Pipeline Method) shown here.

#define m 128
#define n 128
double x[m+2][n+2];
for (i=1; i<=m; i++)
for (j=1; j<=n; j++)
x[i][j]=x[i][j]+( x[i-1][j]+x[i][j-1] )*0.5;
[Figure 7.2 (a): x[i][j] distributed along the 1st dimension (i) over P0, P1, P2]
With x distributed as in Figure 7.2 (a), each CPU needs the updated last row of its predecessor before it can start its own rows. In the pipeline method the j loop is cut into blocks: as soon as a CPU finishes one j block of its last row it sends that block on, so the next CPU can begin the same j block while the sender continues with the following one. After a short start-up phase all CPUs are working simultaneously, as the time diagram in Figure 7.2 (b) shows.
[Figure 7.2 (b): time diagram of the pipeline over j blocks: the busy periods of P0, P1, P2 overlap after the start-up delay]
/*
   program pipeseq
*/
#include <stdio.h>
#include <stdlib.h>
#define m 128
#define n 128
main ()
{
   double  x[m+2][n+2], eps, omega, err1, temp, clock;
   int     i, j, loop, isec1, nsec1, isec2, nsec2;
   FILE    *fp;
wtime(&isec1, &nsec1);
fp = fopen( "input.dat", "r");
fread( (void *)&x, sizeof(x), 1, fp );
fclose( fp );
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
eps=1.0e-5;
omega=0.5;
for (loop=0; loop<36000; loop++) {
err1=0.0;
for (i=1; i<=m; i++) {
for (j=1; j<=n; j++) {
temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
if(err1 <= eps) break;
}
printf( "loop,err1 = %d %.5e\n", loop, err1);
printf( " x[i][n], i=1; i<=128; i+=8\n");
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
wtime(&isec2, &nsec2);
   clock=(double) (isec2-isec1) + (double) (nsec2-nsec1)/1.0e9;
   printf( " clock time=%f\n", clock);
   return 0;
}

PIPESEQ run on one CPU of the IBM SP2 SMP:

 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
loop,err1 = 10567 9.99821e-06
 x[i][n], i=1; i<=128; i+=8
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 clock time=10.663873
/*
   program pipeline
   Parallel on 1st dimension
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define m 128
#define n 128
main ( argc, argv)
int argc;
char **argv;
{
   double  x[m+2][n+2], eps, omega, err1, gerr1, temp, clock;
   int     nproc, myid, istart, iend, istartm1, iendp1, l_nbr, r_nbr;
   int     i, j, jj, ip, loop, itag, count, count1, istart1, isrc, iblock, iblklen;
   int     istartg[32], iendg[32];
   FILE    *fp;
   MPI_Status  istat[8];
   MPI_Comm    comm;
   . . . . . .
   l_nbr = MPI_PROC_NULL;
   . . . . . .
   if (myid == 0) {
      fp = fopen( "input.dat", "r");
      fread( (void *)&x, sizeof(x), 1, fp );
      fclose( fp );
      for (i = 1; i <= m; i+=64) {
         printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
                 x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
                 x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
      }
   }
count=(m+2)*(n+2);
MPI_Bcast((void *)&x, count, MPI_DOUBLE, 0, comm);
for (ip=0; ip<nproc; ip++)
   startend( ip, nproc, 1, m, &istartg[ip], &iendg[ip]);
iblock = 4;
omega = 0.5;
eps = 1.0e-5;
for (loop=1; loop<36000; loop++) {
err1 = 1.0e-15;
itag = 20;
MPI_Sendrecv ((void *)&x[istart][0], n+2, MPI_DOUBLE, l_nbr, itag,
(void *)&x[iendp1][0], n+2, MPI_DOUBLE, r_nbr, itag, comm, istat);
itag = 10;
for (jj=1; jj<=n; jj+=iblock) {
   iblklen = min(iblock, n-jj+1);
MPI_Recv( (void *)&x[istartm1][jj], iblklen, MPI_DOUBLE, l_nbr, itag, comm, istat);
for (i=istart; i<=iend; i++) {
for (j=jj; j<=jj+iblklen-1; j++) {
temp = 0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j] = x[i][j]+omega*temp;
if ( temp < 0.0) temp = -temp;
if(temp > err1) err1=temp;
}
}
MPI_Send( (void *)&x[iend][jj], iblklen, MPI_DOUBLE, r_nbr, itag, comm);
}
MPI_Allreduce((void *)&err1,(void *)&gerr1,1,MPI_DOUBLE,MPI_MAX, comm);
err1 = gerr1;
if(err1 < eps) break;
}
itag = 110;
if( myid == 0) {
for (isrc=1; isrc<nproc; isrc++) {
istart1=istartg[isrc];
count1=(iendg[isrc]-istart1+1)*(n+2);
MPI_Recv((void *)&x[istart1][0], count1, MPI_DOUBLE, isrc, itag, comm, istat);
}
printf( "loop,err1 = %d %.5e\n", loop, err1);
printf( " x[i][n], i=1; i<=128; i+=8\n");
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
}
else {
count = (iend-istart+1)*(n+2);
MPI_Send ((void *)&x[istart][0], count, MPI_DOUBLE, 0, itag, comm);
}
clock = MPI_Wtime() - clock;
printf( " myid, clock time= %d\t%f\n", myid, clock);
MPI_Finalize();
return 0;
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
156
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
min(int i1, int i2)
{
if (i1 < i2) return i1;
else return i2;
}
PIPELINE run on four CPUs of an IBM SP2 SMP node took 13.93 seconds, slower than the sequential PIPESEQ's 10.66 seconds:

ATTENTION: 0031-408  4 tasks allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4  0    1   32
NPROC,MYID,ISTART,IEND=4  1   33   64
NPROC,MYID,ISTART,IEND=4  2   65   96
NPROC,MYID,ISTART,IEND=4  3   97  128
 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
 myid, clock time= 2  13.927630
loop,err1 = 10567 9.99821e-06
 myid, clock time= 1  13.927630
 myid, clock time= 3  13.927758
 x[i][n], i=1; i<=128; i+=8
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 myid, clock time= 0  13.928672
This chapter takes the SOR (Successive Over-Relaxation) method as its example and shows how reordering the sweep removes the dependences that stand in the way of parallelization: the basic SOR method, the red-black SOR method, the zebra SOR method, and a four-color SOR method are treated in turn.
8.1 The SOR method

The Successive Over-Relaxation (SOR) method solves an equation such as the Laplace equation by repeated sweeps over x: each pass computes a correction temp from the four neighbors of x[i][j], adds omega*temp to x[i][j], and records the largest correction in err1; the sweeps stop when err1 falls below eps. The core for loop is:

for (i=1; i<=m; i++) {
   for (j=1; j<=n; j++) {
      temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
      x[i][j]+=omega*temp;
      if(temp < 0) temp=-temp;
      if(temp > err1) err1=temp;
   }
}
As Figure 8.1 shows, the update of x[i][j] uses the already-updated values x[i-1][j] and x[i][j-1] together with the not-yet-updated x[i+1][j] and x[i][j+1], so the loop has the same double dependence as in section 7.3. It can be parallelized with the pipeline method (pipeline method), or the sweep can be reordered as the red-black SOR method (red-black SOR method) described in the next section.

[Figure 8.1: the five-point SOR stencil on x[i][j]: x[i-1][j] and x[i][j-1] already updated; x[i+1][j] and x[i][j+1] about to be updated]
/*
program sor
Sequential version of Successive Over-Relaxation Method
*/
#include <stdio.h>
#include <stdlib.h>
#define m 128
#define n 128
main ()
{
   double  x[m+2][n+2], eps, omega, err1, temp, clock;
   int     i, j, loop, isec1, nsec1, isec2, nsec2;
   FILE    *fp;
wtime(&isec1, &nsec1);
fp = fopen( "input.dat", "r");
fread( (void *)&x, sizeof(x), 1, fp );
fclose( fp );
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
eps=1.0e-5;
omega=0.5;
for (loop=1; loop<36000; loop++) {
err1 = 0.0;
for (i=1; i<=m; i++) {
for (j=1; j<=n; j++) {
temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
      if(err1 <= eps) break;
   }
   . . . . . .
}

The random test data in 'input.dat' are produced by a separate program built around a multiplicative random-number generator:

#include <stdio.h>
#include <stdlib.h>
#define m 128
#define n 128
double   seed = 123456.78;
main ()
{
   double  x[m+2][n+2];
   int     i, j;
   FILE    *fp;
   . . . . . .
}

with the generator's core loop:

   twom16=twom08*twom08;
   twom31=twom16*twom08*twom04*twom02*twom01;
   twom62=twom31*twom31;
   for (i=0; i<number; i++) {
      seed=seed*a;
      ic=(int) (seed/b);
      seed-=b*(double)ic;
      array[i] = seed*twom31+seed*twom62;
   }
   return 0;
}
SORSEQ run on one CPU of the IBM SP2 SMP needed 10567 sweeps to bring err1 below eps and took 10.66 seconds:

 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
loop,err1 = 10567 9.99821e-06
 x[i][n], i=1; i<=128; i+=8
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 clock time=10.664137
8.2 The red-black SOR method

In the red-black SOR method (red-black SOR method, Figure 8.2) the grid points are colored like a checkerboard by the parity of i+j: first all points of one color are updated, then all points of the other. Every neighbor of a point has the opposite color, so within each half-sweep there are no dependences at all and the points can be updated in any order, or in parallel.

[Figure 8.2: checkerboard (red-black) coloring of x[i][j] by the parity of i+j; the black elements are shaded]
/*
program sorrb
Sequential version of red-black Successive Over-Relaxation Method
*/
#include <stdio.h>
#include <stdlib.h>
#define m 128
#define n 128
main ( argc, argv)
int argc;
char **argv;
{
   double  x[m+2][n+2], eps, omega, err1, temp, clock;
   int     i, j, loop, isec1, nsec1, isec2, nsec2;
   FILE    *fp;
wtime(&isec1, &nsec1);
fp = fopen( "input.dat", "r");
fread( (void *)&x, sizeof(x), 1, fp );
fclose( fp );
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
eps=1.0e-5;
omega=0.5;
for (loop=1; loop<36000; loop++) {
   err1 = 0.0;
   /* update the red (white) points: i+j even */
   for (i=1; i<=m; i++) {
      for (j=mod(i+1,2)+1; j<=n; j+=2) {
         temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
         x[i][j]+=omega*temp;
         if(temp < 0) temp=-temp;
         if(temp > err1) err1=temp;
      }
   }
   /* update the black points: i+j odd, j starting at mod(i,2)+1 */
   . . . . . .
   if(err1 <= eps) break;
}
. . . . . .
mod(int i1, int i2)
{
   int i3;
   i3=i1/i2;
   i1=i1-i3*i2;
   return i1;
}
SORRB run on one CPU of the IBM SP2 SMP needed 10313 sweeps (plain SOR: 10567) to bring err1 below eps and took 4.51 seconds against SOR's 10.66:

 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
loop,err1 = 10313 9.99917e-06
 x[i][n], i=1; i<=128; i+=8
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 clock time=4.505438
[Figure: the red-black grid distributed along the 1st dimension over P0, P1, P2]
/*
   program sorrbp -- parallel version of the red-black SOR method
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define m 128
#define n 128
double      x[m+2][n+2];
int         nproc, myid, istart, iend, count, count1, istart1,
            istartm1, iendp1, l_nbr, r_nbr;
int         istartg[32], iendg[32];
MPI_Status  istat[8];
MPI_Comm    comm;
main ( argc, argv)
int argc;
char **argv;
{
   double  eps, omega, err1, gerr1, temp, clock;
   int     i, j, loop, itag, ip;
   FILE    *fp;
. . . . . .
count=(m+2)*(n+2);
MPI_Bcast((void *)&x, count, MPI_DOUBLE, 0, comm);
eps=1.0e-5;
omega=0.5;
for (loop=1; loop<12000; loop++) {
err1 = 0.0;
n2 = n+2;
MPI_Sendrecv ((void *)&x[istart][0], n2, MPI_DOUBLE, l_nbr, 1,
(void *)&x[iendp1][0], n2, MPI_DOUBLE, r_nbr, 1, comm, istat);
MPI_Sendrecv ((void *)&x[iend][0],
n2, MPI_DOUBLE, r_nbr, 2,
(void *)&x[istartm1][0],n2, MPI_DOUBLE, l_nbr, 2, comm, istat);
for (i=istart; i<=iend; i++) {
/* update the red (white) grid points */
for (j=mod(i+1,2)+1; j<=n; j+=2) {
temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
MPI_Sendrecv ((void *)&x[istart][0], n2, MPI_DOUBLE, l_nbr, 3,
(void *)&x[iendp1][0], n2, MPI_DOUBLE, r_nbr, 3, comm, istat);
MPI_Sendrecv ((void *)&x[iend][0],
n2, MPI_DOUBLE, r_nbr, 4,
(void *)&x[istartm1][0],n2, MPI_DOUBLE, l_nbr, 4, comm, istat);
for (i=istart; i<=iend; i++) {
/* update the black grid points */
for (j=mod(i,2)+1; j<=n; j+=2) {
temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
MPI_Allreduce((void *)&err1,(void *)&gerr1,1,MPI_DOUBLE,MPI_MAX, comm);
err1 = gerr1;
if(err1 <= eps) break;
}
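Note that the parallel red-black sweep exchanges the boundary lines twice per iteration, once before each half-sweep, since the black updates need the freshly updated red values of the neighboring lines.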
itag=30;
if (myid == 0) {
for (i=1; i<nproc; i++) {
istart1=istartg[i];
count1 =(iendg[i]-istart1+1)*(n+2);
MPI_Recv ((void *)&x[istart1][0], count1, MPI_DOUBLE, i, itag, comm, istat);
}
}
else {
count=(iend-istart+1)*(n+2);
MPI_Send ((void *)&x[istart][0], count, MPI_DOUBLE, 0, itag, comm);
}
if (myid == 0) {
   printf( "loop,err1 = %d %.5e\n", loop, err1);
   printf( " x[i][n], i=1; i<=128; i+=8\n");
   . . . . . .
}
clock=MPI_Wtime() - clock;
printf( " myid,clock time=%d %f\n", myid,clock);
MPI_Finalize();
return 0;
}
mod(int i1, int i2)
{
int i3;
i3 = i1/i2;
i3 = i1-i3*i2;
return i3;
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
SORRBP run on four CPUs of the IBM SP2 SMP also needed 10313 sweeps to bring err1 below eps and took 4.72 seconds; relative to the sequential red-black SOR's 4.51 seconds the speed-up is only 4.51/4.72 = 0.96:
ATTENTION: 0031-408  4 tasks allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4  0    1   32
NPROC,MYID,ISTART,IEND=4  1   33   64
NPROC,MYID,ISTART,IEND=4  2   65   96
NPROC,MYID,ISTART,IEND=4  3   97  128
 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
 myid,clock time=1 4.717445
 myid,clock time=2 4.717462
 myid,clock time=3 4.717453
loop,err1 = 10313 9.99917e-06
 x[i][n], i=1; i<=128; i+=8
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 myid,clock time=0 4.718111
On eight CPUs the same program ran more slowly still:

ATTENTION: 0031-408  8 tasks allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=8  0    1   16
NPROC,MYID,ISTART,IEND=8  1   17   32
NPROC,MYID,ISTART,IEND=8  2   33   48
NPROC,MYID,ISTART,IEND=8  3   49   64
NPROC,MYID,ISTART,IEND=8  4   65   80
NPROC,MYID,ISTART,IEND=8  5   81   96
NPROC,MYID,ISTART,IEND=8  6   97  112
NPROC,MYID,ISTART,IEND=8  7  113  128
 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
 myid,clock time=1 9.557924
 myid,clock time=2 9.557863
 . . . . . .
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 myid,clock time=0 9.560302
8.3 The zebra SOR method

In the zebra SOR method (Figure 8.4) whole grid lines alternate in color instead of single points: the odd lines (white) are updated first, each point using only values from the neighboring even lines, then the even lines (black) are updated from the odd ones. Within each half-sweep the lines are independent of one another, so they can be updated in any order or distributed freely.

[Figure 8.4: zebra coloring of x[i][j]: alternating white and black lines along the 1st dimension, j running from 1 to n along each line]
/*
program sorzebra
Sequential version of zebra SOR
(Successive Over-Relaxation Method)
*/
#include <stdio.h>
#include <stdlib.h>
#define m 128
#define n 128
main ( argc, argv)
int argc;
char **argv;
{
double x[m+2][n+2], eps, omega, err1, temp, clock;
   int     i, j, loop, isec1, nsec1, isec2, nsec2;
   FILE    *fp;
wtime(&isec1, &nsec1);
fp = fopen( "input.dat", "r");
fread( (void *)&x, sizeof(x), 1, fp );
fclose( fp );
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
eps=1.0e-5;
omega=0.5;
for (loop=1; loop<36000; loop++) {
err1 = 0.0;
for (i=1; i<=m; i+=2) {
for (j=1; j<=n; j++) {
temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
for (i=2; i<=m; i+=2) {
for (j=1; j<=n; j++) {
temp=0.25*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
if(err1 <= eps) break;
}
printf( "loop,err1 = %d %.5e\n", loop, err1);
printf( " x[i][n], i=1; i<=128; i+=8\n");
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
wtime(&isec2, &nsec2);
clock=(double) (isec2-isec1) + (double) (nsec2-nsec1)/1.0e9;
printf( " clock time=%f\n", clock);
return 0;
}
#include <sys/time.h>
int wtime(int *isec, int *nsec)
{
struct timestruc_t tb;
int iret;
iret=gettimer(TIMEOFDAY, &tb);
*isec=tb.tv_sec;
*nsec=tb.tv_nsec;
return 0;
}
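wtime relies on the AIX-specific gettimer and struct timestruc_t; a minimal equivalent for other Unix systems (an assumption, not part of the original) can be built on gettimeofday, at microsecond rather than nanosecond resolution:

#include <sys/time.h>
int wtime(int *isec, int *nsec)
{
   struct timeval tv;
   gettimeofday(&tv, (struct timezone *)0);
   *isec = tv.tv_sec;                /* whole seconds */
   *nsec = tv.tv_usec * 1000;        /* microseconds -> nanoseconds */
   return 0;
}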
SORZEBRA run on one CPU of the IBM SP2 SMP needed 10409 sweeps to bring err1 below eps and took 10.51 seconds:
 4.349   6.040   4.860   7.116   0.318   2.283   3.704   2.286
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
loop,err1 = 10409 9.99896e-06
 x[i][n], i=1; i<=128; i+=8
 7.457   3.974   4.557   6.752   4.171   7.183   3.670   5.198
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
 clock time=10.511644
Figure 8.5 shows the distribution used by the parallel zebra SOR: the lines are divided so that every CPU's block starts on an odd (white) line. The (m+1)/2 line pairs are split with startend and the pair indices converted back to line indices:
for (i = 0; i < nproc; i++) {
startend(i, nproc, 1, (m+1)/2, &istart, &iend);
istartg[i] = istart*2-1;
   iendg[i]   = min (m, iend*2);
}
istart=istartg[myid];
iend  =iendg[myid];
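For example, with m=128 and nproc=4, startend splits the 64 pairs into 1-16, 17-32, 33-48, 49-64, giving istartg = 1, 33, 65, 97 and iendg = 32, 64, 96, 128; every block indeed starts on an odd line, and these are exactly the ISTART,IEND values printed in the runs below.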
[Figure 8.5: the zebra grid (white and black lines) distributed over P0, P1, P2 along the 1st dimension]
When a CPU updates its white lines, the first one needs line istart-1 from the left neighbor, and the black half-sweep needs line iend+1 from the right neighbor; both are obtained with MPI_Sendrecv before the half-sweep that uses them, as in the red-black program. The parallel program sor_zebrap:
/*
   program sor_zebrap -- parallel version of zebra SOR
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define m 128
#define n 128
double      x[m+2][n+2];
int         nproc, myid, istart, iend, count, count1, istart1,
            istartm1, iendp1, l_nbr, r_nbr, lastp;
int         istartg[32], iendg[32];
MPI_Status  istat[8];
MPI_Comm    comm;
main ( argc, argv)
int argc;
char **argv;
{
   double  eps, omega, err1, gerr1, temp, clock;
   int     i, j, loop, itag, ip;
   FILE    *fp;
   . . . . . .
   l_nbr = MPI_PROC_NULL;
. . . . . .
itag=30;
if (myid == 0) {
   for (i=1; i<nproc; i++) {
      istart1=istartg[i];
      count1 =(iendg[i]-istart1+1)*(n+2);
      MPI_Recv ((void *)&x[istart1][0], count1, MPI_DOUBLE, i, itag, comm, istat);
   }
}
else {
count=(iend-istart+1)*(n+2);
MPI_Send ((void *)&x[istart][0], count, MPI_DOUBLE, 0, itag, comm);
}
if (myid == 0) {
printf( "loop,err1 = %d %.5e\n", loop, err1);
printf( " x[i][n], i=1; i<=128; i+=8\n");
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
}
clock=MPI_Wtime() - clock;
printf( " myid,clock time=%d
%f\n", myid,clock);
MPI_Finalize();
return 0;
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
min(int i1, int i2)
{
if (i1 < i2) return i1;
else return i2;
}
SOR_ZEBRAP run on four CPUs of the IBM SP2 SMP needed the same 10409 sweeps and took 8.39 seconds, a speed-up of 10.51/8.39 = 1.25:
ATTENTION: 0031-408  4 tasks allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4  1   33   64
NPROC,MYID,ISTART,IEND=4  2   65   96
NPROC,MYID,ISTART,IEND=4  3   97  128
NPROC,MYID,ISTART,IEND=4  0    1   32
 myid,clock time=1 8.384307
 . . . . . .
 5.458   5.561   6.317   5.158   6.057   6.431   4.818   3.384
8.4 Four-color SOR

When the difference stencil also involves the diagonal neighbors, the update of x[i][j] uses all eight surrounding points (Figure 8.6) and the for loop becomes:
err1 = 0.0;
for (i=1; i<=m; i++) {
for (j=1; j<=n; j++) {
temp=0.125*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] +
x[i-1][j-1]+x[i-1][j+1]+x[i+1][j-1]+x[i+1][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
[Figure 8.6: the nine-point stencil: x(i-1,j+1) x(i,j+1) x(i+1,j+1) / x(i-1,j) x(i,j) x(i+1,j) / x(i-1,j-1) x(i,j-1) x(i+1,j-1)]

Because the diagonal neighbors now enter the update, a two-color checkerboard no longer removes all dependences: diagonally adjacent points share a color. The points are therefore divided into four groups (drawn as circle, square, triangle, and diamond <> in Figure 8.6), and the groups are updated one after another.
/*
program color_seq
Sequential version of 4 colour Successive Over-Relaxation Method
*/
#include <stdio.h>
#include <stdlib.h>
#define m 128
#define n 128
main ()
{
   double  x[m+2][n+2], eps, omega, err1, temp, clock;
   int     i, j, loop, isec1, nsec1, isec2, nsec2;
   FILE    *fp;
wtime(&isec1, &nsec1);
fp = fopen( "input.dat", "r");
fread( (void *)&x, sizeof(x), 1, fp );
fclose( fp );
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
eps=1.0e-5;
omega=0.5;
for (loop=1; loop<36000; loop++) {
err1 = 0.0;
for (i=1; i<=m; i+=2) { /* update circle */
for (j=1; j<=n; j+=2) {
temp=0.125*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] +
x[i-1][j-1]+x[i-1][j+1]+x[i+1][j-1]+x[i+1][j+1] )-x[i][j];
x[i][j]+=omega*temp;
if(temp < 0) temp=-temp;
if(temp > err1) err1=temp;
}
}
for (i=1; i<=m; i+=2) {    /* update the triangle points */
   for (j=2; j<=n; j+=2) {
      temp=0.125*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] +
                   x[i-1][j-1]+x[i-1][j+1]+x[i+1][j-1]+x[i+1][j+1] )-x[i][j];
      x[i][j]+=omega*temp;
      if(temp < 0) temp=-temp;
      if(temp > err1) err1=temp;
   }
}
/* update the square and diamond <> points likewise on the even lines */
. . . . . .
}
if(err1 <= eps) break;
}
printf( "loop,err1 = %d %.5e\n", loop, err1);
printf( " x[i][n], i=1; i<=128; i+=8\n");
for (i = 1; i <= m; i+=64) {
printf( "%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n",
x[i][n],x[i+8][n],x[i+16][n],x[i+24][n],
x[i+32][n],x[i+40][n],x[i+48][n],x[i+56][n]);
}
wtime(&isec2, &nsec2);
   clock=(double) (isec2-isec1) + (double) (nsec2-nsec1)/1.0e9;
   printf( " clock time=%f\n", clock);
   return 0;
}

COLOR_SEQ run on one CPU of the IBM SP2 SMP took 5.87 seconds; part of its printed output:

 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
 . . . . . .
 4.290   5.531   5.596   5.200   5.881   5.883   4.026   3.643
To parallelize the four-color SOR, the lines are again distributed so that each CPU's block starts on an odd line, using the same index computation as in the zebra method.
[Figure: the four-color grid distributed over P0, P1, P2 along the 1st dimension]
Lines istart-1 and iend+1 are exchanged with the neighbor CPUs before the groups that read them are updated, exactly as in the zebra program:
/*
program colorp
Parallel on 1st dimension of 4 colour Successive Over-Relaxation Method
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define m 128
#define n 128
main ( argc, argv)
int argc;
char **argv;
{
   double      x[m+2][n+2], eps, omega, err1, gerr1, temp, clock;
   int         nproc, myid, istart, iend, count, count1, istart1;
   int         istartm1, iendp1, l_nbr, r_nbr, i, j, loop, itag;
   int         istartg[32], iendg[32];
   MPI_Status  istat[8];
   MPI_Comm    comm;
   FILE        *fp;
   . . . . . .
count=(m+2)*(n+2);
MPI_Bcast((void *)&x, count, MPI_DOUBLE, 0, comm);
eps=1.0e-5;
omega=0.5;
for (loop=1; loop<=36000; loop++) {
err1 = 0.0;
itag=10;
MPI_Sendrecv ((void *)&x[iend][0],
n+2, MPI_DOUBLE, r_nbr, itag,
(void *)&x[istartm1][0], n+2, MPI_DOUBLE, l_nbr, itag, comm, istat);
      /* update the circle points of this CPU's lines */
      . . . . . .
      /* update the square points */
      for (i=istart; i<=iend; i+=2) {
         for (j=2; j<=n; j+=2) {
            temp=0.125*( x[i-1][j]+x[i+1][j]+x[i][j-1]+x[i][j+1] +
                         x[i-1][j-1]+x[i-1][j+1]+x[i+1][j-1]+x[i+1][j+1] )-x[i][j];
            x[i][j]+=omega*temp;
            if(temp < 0) temp=-temp;
            if(temp > err1) err1=temp;
         }
      }
      /* update the triangle and diamond <> points likewise */
      . . . . . .
   }
   /* reduce err1, gather the result to CPU 0 and print loop, err1, x */
   . . . . . .
clock=MPI_Wtime() - clock;
printf( " myid,clock time=%d %f\n", myid,clock);
MPI_Finalize();
return 0;
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
min(int i1, int i2)
{
if (i1 < i2) return i1;
else return i2;
}
COLORP run on four CPUs of the IBM SP2 SMP brought err1 below eps after 8157 sweeps and took 3.24 seconds, a speed-up of 5.87/3.24 = 1.81 over the sequential COLOR_SEQ:
ATTENTION: 0031-408  4 tasks allocated by LoadLeveler, continuing...
NPROC,MYID,ISTART,IEND=4  1   33   64
NPROC,MYID,ISTART,IEND=4  2   65   96
NPROC,MYID,ISTART,IEND=4  3   97  128
 4.919   4.340   3.700   6.900   2.232   6.741   8.797   1.727
 . . . . . .
 4.290   5.531   5.596   5.200   5.881   5.883   4.026   3.643
 myid,clock time=0 3.240597
The preceding chapters parallelized finite difference computations (finite difference method); this chapter does the same for the finite element method (finite element method - FEM). Finite element schemes may be implicit (implicit method) or explicit (explicit method); the example below parallelizes an explicit computation.
9.1 A sequential finite element program

/*
   program fem_seq -- sequential version of a finite element explicit method
*/
#include <stdio.h>
#include <stdlib.h>
#define ne 18
#define nn 28
main ( argc, argv)
int argc;
char **argv;
{
   double  ve[ne+1], vn[nn+1], clock;
   int     index[ne+1][4];
   int     i, j, k, ie, in, loop, isec1, nsec1, isec2, nsec2;
wtime(&isec1, &nsec1);
for (i=1; i<=ne; i++) {
scanf("%d %d %d %d\n",&index[i][0],&index[i][1],&index[i][2],&index[i][3]);
}
for (ie=1; ie<=ne; ie++)
ve[ie]=10.0*ie;
for (in=1; in<=nn; in++)
vn[in]=100.0*in;
for (loop=0; loop<10; loop++) {
for (ie=1; ie<=ne; ie++) {
for (j=0; j<4; j++) {
k= index[ie][j];
vn[k]= vn[k] + ve[ie];
}
}
for (in=1; in<=nn; in++)
vn[in] = vn[in] * 0.25;
for (ie=1; ie<=ne; ie++) {
for (j=0; j<4; j++) {
k= index[ie][j];
ve[ie] = ve[ie] + vn[k];
}
}
for (ie=1; ie<=ne; ie++)
ve[ie] = ve[ie] *0.25;
}
printf("result of vn\n");
for (i=1; i<=nn; i+=7)
printf(" %.3f %.3f %.3f %.3f %.3f %.3f %.3f\n",
vn[i],vn[i+1],vn[i+2],vn[i+3],vn[i+4],vn[i+5],vn[i+6]);
printf("result of ve\n");
for (i=1; i<=ne; i+=6)
printf(" %.3f %.3f %.3f %.3f %.3f %.3f\n",
ve[i],ve[i+1],ve[i+2],ve[i+3],ve[i+4],ve[i+5]);
wtime(&isec2, &nsec2);
clock=(double) (isec2-isec1) + (double) (nsec2-nsec1)/1.0e9;
printf( " clock time=%f\n", clock);
return 0;
}
#include <sys/time.h>
int wtime(int *isec, int *nsec)
{
struct timestruc_t tb;
int iret;
iret=gettimer(TIMEOFDAY, &tb);
*isec=tb.tv_sec;
*nsec=tb.tv_nsec;
return 0;
}
Figure 9.1 shows the mesh: 18 square elements whose corners are 28 nodes, forming an unstructured grid (unstructured grid). The element values ve and node values vn are updated in four loops per iteration: the first adds each element's value to its four corner nodes (found through the index table read at the start), the second scales the node values by 0.25, the third adds the four corner values back into each element, and the fourth scales the element values by 0.25.
193
8
3
12
6
7
2
20
9
11
5
6
16
12
15
8
10
4
15
19
11
23
18
10
27
17
17
22
13
13
28
18
14
14
7
24
26t
16
21
element
25
node
3
3
12
6
7
2
2
9
11
5
6
15
8
10
4
16
15
11
14
18
27
17
22
13
17
28
23
18
10
13
24
12
19
14
7
20
26
16
21
25
2
3
7
6
3
5
6
7
9 10 11 13 14 15 17 18 19 21 22 23
4
6
7
8 10 11 12 14 15 16 18 19 20 22 23 24
8 10 11 12 14 15 16 18 19 20 22 23 24 26 27 28
7
9 10 11 13 14 15 17 18 19 21 22 23 25 26 27
Output of FEM_SEQ:
result of vn
303.506 737.138 743.620 309.989 905.479 2197.706 2214.970
922.743 1476.091 3579.268 3602.639 1499.462 1927.588 4670.236
4695.415 1952.767 2066.994 5005.284 5028.654 2090.365 1655.717
4008.155 4025.419 1672.981 642.193 1554.410 1560.892 648.676
result of ve
1281.497 1823.797 1298.016 2526.136 3591.987 2554.243
3617.916 5139.575 3651.343 4262.652 6051.210 4296.079
3991.965 5664.587 4020.072 2474.166 3510.129 2490.686
clock time=0.000746
9.2 Partitioning the mesh

Figure 9.2 shows the 18 elements divided among three CPUs, six elements per CPU. A node shared by elements on different CPUs is owned by the lowest-rank CPU among them, called its primary processor (primary processor); for the higher-rank CPUs that also touch it, those CPUs are its secondary processors (secondary processor).
[Figure 9.2: the mesh of Figure 9.1 divided among process 0 (elements 1-6), process 1 (elements 7-12), and process 2 (elements 13-18)]
The elements are distributed blockwise: ecntg[i] holds the number of elements on CPU i and estartg[i], eendg[i] its first and last element; ncntg[i] is the number of primary nodes on CPU i and nodeg[i][j] lists them, as in Figure 9.3 (the suffix g marks the global tables kept on every CPU).
[Figure 9.3: the global tables for P0, P1, P2: ecntg (element counts), estartg/eendg (element ranges 1-6, 7-12, 13-18), ncntg (primary node counts) and nodeg (primary node lists)]
[Figure 9.4: the per-process node-exchange tables: scnt/snode and pcnt/pnode for each of P0, P1, P2, relative to the other processes]
The parallel version of fem_seq is program femp:
/*
program femp -- parallel version of finite element explicit method */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define ne 18
#define nn 28
#define np 3
double      ve[ne+1], vn[nn+1], bufs[np][nn], bufr[np][nn], clock;
int         index[ne+1][4], itg[np][nn+1];
int         ecntg[np], estartg[np], ncntg[np], nodeg[np][nn];
int         scnt[np], snode[np][nn];
int         pcnt [np], pnode[np][nn];
MPI_Status  istat[8];
MPI_Comm    comm;
main ( argc, argv)
int argc;
char **argv;
{
   int  nproc, myid, i, j, k, in, ie, ii, irank, icount, itag;
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD, &myid);
MPI_Barrier (MPI_COMM_WORLD);
clock=MPI_Wtime();
comm=MPI_COMM_WORLD;
if (myid == 0) {
for (i=1; i<=ne; i++)
scanf("%d %d %d %d\n",&index[i][0],&index[i][1],&index[i][2],&index[i][3]);
}
icount=(ne+1)*4;
MPI_Bcast ((void *)&index, icount, MPI_INT, 0, comm);
/*
   clear the counting tables
*/
for (irank = 0; irank < nproc; irank++) {
ncntg[irank]=0;
scnt[irank]=0;
pcnt[irank]=0;
for (j=0; j<nn; j++) {
itg [irank][j]=0;
snode[irank][j]=0;
pnode[irank][j]=0;
}
}
The array itg records which CPUs touch which of the nn nodes: for each CPU, a loop over that CPU's elements sets itg[irank][node] to 1 for every corner node of those elements. Table 9.1 shows the result for the partitioning of Figure 9.2 (cf. the mesh of Figure 9.1).
node:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
P0:    1  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
P1:    0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0
P2:    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1

Table 9.1: itg[irank][node] after the ownership scan
From itg, each node's primary processor is the lowest rank ii with itg[ii][node] = 1. Scanning all nodes, each CPU builds its send and receive tables; for example, the branch recording the primary nodes a CPU will receive:

   . . . . . .
         else {
            if (ii == myid) {
               pcnt[irank]=pcnt[irank]+1;
               pnode[irank][pcnt[irank]]=in;
            }
         }
   . . . . . .
/*
   count and store all primary nodes which belong to each CPU
*/
for (irank=0; irank<nproc; irank++) {
for (in=1; in<=nn; in++) {
if (itg[irank][in] == 1) {
ncntg[irank]=ncntg[irank]+1;
nodeg[irank][ncntg[irank]]=in;
}
}
k=ncntg[irank];
if(myid == 0) {
   printf("nodeg values for irank,k= %d %d\n", irank,k);
   for (j=1; j<=k; j+=4)
      printf("%d %d %d %d\n",
         nodeg[irank][j],nodeg[irank][j+1],nodeg[irank][j+2],nodeg[irank][j+3]);
}
}
/*
   gather the node values: each CPU packs its primary-node values
   into bufs, and CPU 0 collects them from every rank
*/
MPI_Barrier(comm);
for (i=1; i<=ncntg[myid]; i++)
bufs[myid][i]=vn[ nodeg[myid][i] ];
itag=30;
if (myid == 0)
for (irank=1; irank<nproc; irank++)
MPI_Recv ((void *)&bufr[irank][1], ncntg[irank], MPI_DOUBLE,
irank, itag, comm, istat);
else
   MPI_Send ((void *)&bufs[myid][1], ncntg[myid], MPI_DOUBLE, 0, itag, comm);
. . . . . .
clock=MPI_Wtime() - clock;
printf( " myid,clock time=%d %f\n", myid,clock);
MPI_Finalize();
return 0;
}
startend(int myid,int nproc,int is1,int is2,int* istart,int* iend)
{
int ilength, iblock, ir;
ilength=is2-is1+1;
iblock=ilength/nproc;
ir=ilength-iblock*nproc;
if(myid < ir) {
*istart=is1+myid*(iblock+1);
*iend=*istart+iblock;
}
else {
*istart=is1+myid*iblock+ir;
*iend=*istart+iblock-1;
}
if(ilength < 1) {
*istart=1;
*iend=0;
}
}
FEMP run on three CPUs of the IBM SP2 SMP prints the ownership tables and then the same results as FEM_SEQ:
ATTENTION: 0031-408  3 tasks allocated by LoadLeveler, continuing...
itg values for irank= 0
1111 1111 1111 0000 0000 0000 0000
itg values for irank= 1
0000 0000 1111 1111 1111 0000 0000
itg values for irank= 2
0000 0000 0000 0000 1111 1111 1111
NPROC,MYID,ISTART,IEND=3  0   1   6
NPROC,MYID,ISTART,IEND=3  1   7  12
NPROC,MYID,ISTART,IEND=3  2  13  18
 . . . . . .
25 26 27 28
result of vn
303.506 737.138 743.620 309.989 905.479 2197.706 2214.970
922.743 1476.091 3579.268 3602.639 1499.462 1927.588 4670.236
4695.415 1952.767 2066.994 5005.284 5028.654 2090.365 1655.717
4008.155 4025.419 1672.981 642.193 1554.410 1560.892 648.676
result of ve
1281.497 1823.797 1298.016 2526.136 3591.987 2554.243
3617.916 5139.575 3651.343 4262.652 6051.210 4296.079
3991.965 5664.587 4020.072 2474.166 3510.129 2490.686
myid,clock time=0 0.003736
[Appendix figures: global versus local indices for block-distributed arrays on P0-P3: a one-dimensional float array of 200 elements split into four blocks of 50 (global i=1..200, local i=1..50); two-dimensional arrays x[200][8], y[200][8], z[200][8] split into x[50][8], y[50][8], z[50][8] per process; and three-dimensional arrays x[200][24][8], y[200][24][8], z[200][24][8] split into x[50][24][8], y[50][24][8], z[50][24][8] per process]