Outline
• First CUDA Program
• Execution Configuration
• Kernel Launch
• Massively Parallel Hardware
• Parallel Execution Model
CUDA Programming Model
[Figure: C programming model vs. CUDA programming model. Host: CPU, RAM (host memory), user I/O. Device: GPU, DRAM (device memory), on the graphics card.]
First CUDA Program
• Problem-
– Write a CUDA program to compute the squares of the first
500 whole numbers stored in an array.
– Serial implementation-
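#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
int main()
{
    int *a, i, N = 500;
    a = (int*) malloc(sizeof(int) * N);
    if (a == NULL) return 1;                     /* allocation failed */
    for (i = 0; i < N; i++) a[i] = i;            /* fill with 0..N-1 */
    for (i = 0; i < N; i++) a[i] = a[i] * a[i];  /* square in place */
    for (i = 0; i < N; i++)
        printf("Square of %d = %d\n", i, a[i]);
    free(a);
    return 0;                                    /* 0 signals success */
}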
Parallel Implementation
#include <stdio.h>
#include <cuda.h>
int main()
{
    int *ad, *ah, i, N=500;
[Figure: ah points to an array in host memory (RAM); ad points to an array in device memory (DRAM).]
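The fragment above stops after the declarations. A minimal sketch of how the rest of the program might look is given here, keeping the slide's names (ad, ah, N, find_square). The kernel body, the helper square_on_gpu, and the error handling are assumptions for illustration and are not taken from the slides.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel (assumed body): each thread squares one element of the device array.
__global__ void find_square(int *ad, int N)
{
    int k = threadIdx.x;              // index of this thread within its block
    if (k < N) ad[k] = ad[k] * ad[k]; // guard against out-of-range threads
}

// Hypothetical helper: squares ah[0..N-1] in place using the GPU.
// Returns 0 on success, 1 on any CUDA error.
// Uses a single block, so N must not exceed the per-block thread limit.
int square_on_gpu(int *ah, int N)
{
    int *ad = NULL;
    size_t bytes = sizeof(int) * N;

    if (cudaMalloc((void**)&ad, bytes) != cudaSuccess) return 1;
    if (cudaMemcpy(ad, ah, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(ad);
        return 1;
    }

    find_square<<<1, N>>>(ad, N);     // 1 block of N threads

    // The blocking copy waits for the kernel to finish before reading back.
    cudaError_t err = cudaMemcpy(ah, ad, bytes, cudaMemcpyDeviceToHost);
    cudaFree(ad);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int *ah, i, N = 500;

    ah = (int*) malloc(sizeof(int) * N);
    if (ah == NULL) return 1;
    for (i = 0; i < N; i++) ah[i] = i;   // fill host array with 0..N-1

    if (square_on_gpu(ah, N) != 0) {
        fprintf(stderr, "CUDA error\n");
        free(ah);
        return 1;
    }

    for (i = 0; i < N; i++)
        printf("Square of %d = %d\n", i, ah[i]);

    free(ah);
    return 0;
}
```

The pattern is the usual one: allocate on the device, copy host to device, launch the kernel, copy device to host, free. A single block of 500 threads works here only because 500 is below the per-block thread limit; larger arrays would need more than one block.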
Execution Configuration
find_square <<< 1, N >>> (ad, N);
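[Figure: function qualifiers. __host__ functions run on the CPU, __device__ functions run on the GPU, and __global__ functions (kernels) are called from the CPU and run on the GPU.]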
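To make the three qualifiers concrete, here is a small sketch. The function names square, square_plus_one, and apply are hypothetical and chosen only for illustration; the launch uses the same <<< blocks, threads >>> form as the find_square call above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// __host__ __device__: compiled for both sides, callable from CPU and GPU code.
__host__ __device__ int square(int x) { return x * x; }

// __device__: runs on the GPU and is callable only from GPU code.
__device__ int square_plus_one(int x) { return square(x) + 1; }

// __global__: a kernel, launched from the CPU and executed on the GPU.
__global__ void apply(int *ad, int N)
{
    int k = threadIdx.x;
    if (k < N) ad[k] = square_plus_one(ad[k]);
}

int main()
{
    const int N = 8;
    int ah[N], *ad;
    for (int i = 0; i < N; i++) ah[i] = i;

    if (cudaMalloc((void**)&ad, sizeof(ah)) != cudaSuccess) return 1;
    cudaMemcpy(ad, ah, sizeof(ah), cudaMemcpyHostToDevice);

    apply<<<1, N>>>(ad, N);   // execution configuration: 1 block, N threads

    cudaMemcpy(ah, ad, sizeof(ah), cudaMemcpyDeviceToHost);
    cudaFree(ad);

    printf("host:   square(3) = %d\n", square(3));  // computed on the CPU
    printf("device: ah[3]     = %d\n", ah[3]);      // computed on the GPU
    return 0;
}
```

A function with no qualifier is treated as __host__, which is why ordinary C functions such as main need no annotation.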
More on Execution Configuration
dim3 grid_spec(4, 3);
dim3 block_spec(2, 2, 1);
my_function <<< grid_spec, block_spec >>> ();
[Figure: a 4 x 3 grid of blocks (gridDim.x = 4, gridDim.y = 3), each block holding 2 x 2 threads (blockDim.x = 2, blockDim.y = 2).]
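With this configuration the launch creates 4 * 3 = 12 blocks of 2 * 2 = 4 threads each, 48 threads in total. The sketch below shows one common way for each thread to derive a unique index from the built-in variables. The slide's my_function takes no arguments; the out parameter and the helper launch_and_check are assumptions added so the result can be inspected.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread writes its own global linear index into out (assumed parameter).
__global__ void my_function(int *out)
{
    int col   = blockIdx.x * blockDim.x + threadIdx.x;  // 0 .. 7
    int row   = blockIdx.y * blockDim.y + threadIdx.y;  // 0 .. 5
    int width = gridDim.x * blockDim.x;                 // 8 threads per row
    out[row * width + col] = row * width + col;
}

// Hypothetical helper: launches the kernel and returns how many of the
// entries hold their own index, or -1 on a CUDA error.
int launch_and_check(void)
{
    dim3 grid_spec(4, 3);       // gridDim.x = 4, gridDim.y = 3
    dim3 block_spec(2, 2, 1);   // blockDim.x = 2, blockDim.y = 2
    const int total = 4 * 3 * 2 * 2;
    int h_out[total], *d_out;

    if (cudaMalloc((void**)&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    cudaMemset(d_out, 0xFF, sizeof(h_out));  // sets every int to -1 (unwritten)

    my_function<<<grid_spec, block_spec>>>(d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int i = 0; i < total; i++)
        if (h_out[i] == i) ok++;
    return ok;
}

int main()
{
    printf("entries written correctly: %d of 48\n", launch_and_check());
    return 0;
}
```

The third component of a dim3 defaults to 1 when omitted, so dim3 grid_spec(4, 3) describes the same shape as dim3 grid_spec(4, 3, 1).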
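[Figure: GPU hardware. The GPU is made up of multiprocessors (MP ... MPn), each containing processor cores (SP) and shared memory.]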
Queries?