
Master CI GPU Programming

Laboratory 1

GPU Programming – Introduction to CUDA

Hello CUDA World!

1. Run the DeviceQuery application from the NVIDIA GPU Computing SDK Browser and
identify the properties of the CUDA devices installed on the lab workstations:

CUDA Device
# of Multiprocessors
# of Cores per MP
Total # of cores
Global Memory (MB)
Warp size
# of Threads per block
Minimum # of threads processed in SIMD fashion by a CUDA multiprocessor
Maximum dimensions of a grid
Maximum dimensions of a block
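
Most of the properties listed above can also be read programmatically through the CUDA runtime. The following is a minimal sketch, not the DeviceQuery source: it relies on cudaGetDeviceCount and cudaGetDeviceProperties, and the helper bytesToMB is a name introduced here for illustration. The number of cores per multiprocessor is not a field of cudaDeviceProp (it depends on the compute capability), so the sketch prints the compute capability instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): convert bytes to whole MB.
static unsigned long long bytesToMB(unsigned long long bytes) {
    return bytes / (1024ULL * 1024ULL);
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        std::printf("CUDA Device %d: %s\n", dev, prop.name);
        std::printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        std::printf("  # of Multiprocessors:  %d\n", prop.multiProcessorCount);
        std::printf("  Global Memory (MB):    %llu\n",
                    bytesToMB(static_cast<unsigned long long>(prop.totalGlobalMem)));
        std::printf("  Warp size:             %d\n", prop.warpSize);
        std::printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        std::printf("  Max grid dimensions:   %d x %d x %d\n",
                    prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        std::printf("  Max block dimensions:  %d x %d x %d\n",
                    prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    }
    return 0;
}
```

The warp size reported here is the value requested in the "minimum # of threads processed in SIMD fashion" row.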

2. Create a CUDA project in Visual Studio.


a. Study the structure of the demo program and identify: the portion of code that
executes on the GPU, and the number of GPU threads that execute the parallel code.
b. Modify the demo application so that the number of elements in the vectors being
added can be varied, and each element of the result vector is computed by a
separate GPU thread. Try different values for the number of elements: 1000,
100000, 1000000, 10000000, … (make sure you supply a feasible execution
configuration!)
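
One possible shape for exercise 2b is sketched below. It is an illustration under stated assumptions, not the Visual Studio demo itself: the kernel name addKernel, the helper blocksFor, the CHECK macro and the choice of 256 threads per block are all introduced here. The key points are that the grid size is rounded up so every element gets a thread, the kernel guards against the surplus threads in the last block, and the configuration is checked against the device limits before launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro for this sketch.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(e_));                     \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// One thread per output element; threads past the end do nothing.
__global__ void addKernel(int *c, const int *a, const int *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Number of blocks needed to cover n elements (ceiling division).
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main(int argc, char **argv) {
    const int n = (argc > 1) ? std::atoi(argv[1]) : 1000000;
    if (n <= 0) {
        std::fprintf(stderr, "n must be positive\n");
        return 1;
    }
    const int threadsPerBlock = 256;  // assumed block size
    const int blocks = blocksFor(n, threadsPerBlock);

    // Reject configurations the device cannot execute.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (threadsPerBlock > prop.maxThreadsPerBlock || blocks > prop.maxGridSize[0]) {
        std::fprintf(stderr, "Infeasible configuration: %d blocks x %d threads\n",
                     blocks, threadsPerBlock);
        return 1;
    }

    std::vector<int> a(n), b(n), c(n);
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CHECK(cudaMalloc(&dA, bytes));
    CHECK(cudaMalloc(&dB, bytes));
    CHECK(cudaMalloc(&dC, bytes));
    CHECK(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice));

    addKernel<<<blocks, threadsPerBlock>>>(dC, dA, dB, n);
    CHECK(cudaGetLastError());  // catches invalid launch configurations

    // This blocking copy also waits for the kernel to finish.
    CHECK(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost));

    int errors = 0;
    for (int i = 0; i < n; ++i) if (c[i] != 3 * i) ++errors;
    std::printf("n = %d, blocks = %d, errors = %d\n", n, blocks, errors);

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
    return errors == 0 ? 0 : 1;
}
```

Launching all n threads in a single block stops working once n exceeds the per-block thread limit reported by DeviceQuery, which is why the sketch splits the work across several blocks.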

Follow the CUDA tutorials available at:


https://developer.nvidia.com/how-to-cuda-c-cpp
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html


Performance analysis of a CUDA application

1. Measuring execution time

Variant 1 – Using a CPU timer

cudaMemcpy(…);

t1 = myCPUTimer();
myKernel<<<……>>>(…);
cudaDeviceSynchronize();
t2 = myCPUTimer();

cudaMemcpy(…);

Note: A CUDA kernel launch is asynchronous! Control returns to the CPU immediately after
the call (very likely before the kernel has finished executing on the GPU). CPU-GPU
synchronization is therefore mandatory before reading the second timestamp!
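
The handout does not define myCPUTimer, so the sketch below supplies one possible stand-in based on std::chrono::steady_clock. The kernel scaleKernel and the problem size are likewise assumptions made for illustration; only the timing pattern (timestamp, launch, cudaDeviceSynchronize, timestamp) comes from the text above.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for myCPUTimer(): monotonic time in milliseconds.
static double myCPUTimer() {
    using namespace std::chrono;
    return duration<double, std::milli>(
               steady_clock::now().time_since_epoch()).count();
}

// Placeholder kernel, used only to have something to time.
__global__ void scaleKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    double t1 = myCPUTimer();
    scaleKernel<<<blocks, threadsPerBlock>>>(d, n);
    // Without this call, t2 would measure only the launch overhead.
    cudaError_t err = cudaDeviceSynchronize();
    double t2 = myCPUTimer();

    if (err != cudaSuccess) {
        std::fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }
    std::printf("Kernel time (CPU timer): %f ms\n", t2 - t1);

    cudaFree(d);
    return 0;
}
```

The first launch of a kernel may include one-time setup cost, so it can be worth running the kernel once before the timed launch and comparing the two measurements.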

Variant 2 – Using the Event API

CUDA Event API Management Functions:


cudaEventCreate
cudaEventCreateWithFlags
cudaEventDestroy
cudaEventElapsedTime
cudaEventQuery
cudaEventRecord
cudaEventSynchronize

cudaEvent_t start, stop;

// Create the events
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Record the 'start' event in stream 0
cudaEventRecord(start, 0);

/* CUDA Host / Device / Kernel Code ... */

// Record the 'stop' event in stream 0
cudaEventRecord(stop, 0);

// Block the CPU until the last event ('stop' in this case) has been recorded
cudaEventSynchronize(stop);


float elapsedTime = 0.0f; // Receives the elapsed time

// Compute the time between 'start' and 'stop' and write it to elapsedTime.
// cudaEventElapsedTime returns the value in milliseconds,
// with a resolution of around 0.5 microseconds.
cudaEventElapsedTime(&elapsedTime, start, stop);

printf("Execution Time: %f ms\n", elapsedTime); // Print the elapsed time

// Destroy the CUDA events
cudaEventDestroy(start);
cudaEventDestroy(stop);

2. CUDA Visual Profiler

https://developer.nvidia.com/nvidia-visual-profiler

CUDA occupancy calculator:


http://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls
