PRESTACIONS/ES PAC 3
Semester: September 2015
The exercises
Evaluation criteria
Formatting
Deadline
Describe what the selected algorithm does (include the references that you
have used): inputs, outputs and pseudo-code describing the algorithm. The pseudo-code
must contain comments explaining what each important part of it does.
1.2
2. Parallel implementation
The goal of this second part is to propose a pseudo-code parallel implementation for the parts
identified in 1.2.
2.1
2.1.1 What strategy have you selected (e.g. pipeline, shared memory, message passing, etc.)?
Why?
2.1.2 What other options could you use?
2.1.3 Describe the pseudo-code, including comments explaining why the different parts were
selected for parallelization.
3. Performance projection
3.1
Given the pseudo-code proposed in 2.1 and 1.1, propose a theoretical model to project
the speedup that the parallel implementation may show with respect to the serial
implementation. (It is a model, so it is not expected to be 100% accurate.)
3.2
Provide a speedup analysis using the previous model for: 1, 2, 4, 16 and 32 threads.
3.3
Describe what type of computational system would best suit the
provided implementation and which components would be most worth investing in.
#include <stdio.h>
const int N = 16;
const int blocksize = 16;
__global__
void hello(char *a, int *b)
{
a[threadIdx.x] += b[threadIdx.x];
}
int main()
{
char a[N] = "Hello \0\0\0\0\0\0";
int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
char *ad;
int *bd;
const int csize = N*sizeof(char);
const int isize = N*sizeof(int);
printf("%s", a);
cudaMalloc( (void**)&ad, csize );
cudaMalloc( (void**)&bd, isize );
cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );
2. Mandelbrot
3.11
We want to implement a parallel version of a very popular program, Mandelbrot. The
Mandelbrot set is a geometric figure of infinite complexity (a fractal)
obtained by applying a mathematical formula recursively. The left part of the
following figure shows the output that the algorithm generates:
Mandelbrot set
3.12
The code provided as part of the exercise (mandelbrot.c) is a sequential implementation of the
Mandelbrot set that produces the figure shown in the right part of the previous figure. The provided code
will be used as our reference. To link the code, it is necessary to include the libraries libm
and libX11. One way of compiling the code is:
(It is important to emphasize that the paths depend on the installation that you are
using. The provided command line works on the cluster provided by UOC; the compilation may
show some warnings.)
It is important to note that, to visualize the output, you need a connection that
allows X forwarding (for example, when connecting from a Linux client: ssh -X username@host).
In order to compile the CUDA version of mandelbrot (mandelbrot.cu) with Ocelot (more details
below) you will need to use the following instructions:
Questions
2.1.
2.2.
2.3.
What options have you selected (block size, stream distribution, etc.)? Why?
What other alternatives could you consider?
2.4.
2.5.
How would you execute mandelbrot using several GPUs in parallel with CUDA?
Provide a scheme / pseudo-code showing your proposal.
3.
The previous step will generate a file hello.cu.cpp.ii, which you will need to compile with g++
in order to obtain a binary that can be executed:
g++ -o hello hello.cu.cpp.ii -I /usr/local/include/ocelot/api/interface/ -L /usr/local/lib/ -locelot -L/usr/local/lib -lpthread -ldl -lm
4.
Execute
Evaluation criteria
The following criteria will be used in the evaluation: proper use of the MPI or OpenMP models, brevity
and clarity of results, experiment setup, and discussion and analysis.
Format
One PDF document containing all the answers for the selected option. In addition:
- All code and scripts developed must be added as an annex section at the end
of the document (no limit)
Provide one tar file with the developed code (if any)
$ tar cvf tot.tar fitxer1 fitxer2 ...
Deadline