
CUDA

A technology that can make supercomputers personal

Presented by
Kunal Garg
2507276
UIET KU
Kurukshetra, India
SUPERCOMPUTER
A supercomputer is a computer at the frontline of
current processing capacity, particularly in speed
of calculation.
Supercomputers are used for highly
calculation-intensive tasks.
GPU
A graphics processing unit (GPU), also called a
visual processing unit (VPU), is a specialized
processor that offloads 3D or 2D graphics rendering
from the microprocessor.
GPUs are used in embedded systems, mobile phones,
personal computers, workstations, and game consoles.
GPU Computing
The excellent floating-point performance of GPUs
led to the advent of General-Purpose Computing on
GPUs (GPGPU).
GPU computing is the use of a GPU to do
general-purpose scientific and engineering
computing.
The model for GPU computing is to use a CPU and a
GPU together in a heterogeneous computing model.
Problems in GPU Programming
Required graphics languages
Difficult for users to program general applications for the GPU
CUDA
CUDA is an acronym for Compute Unified Device
Architecture. It is a parallel computing
architecture and computing engine developed by
Nvidia.
CUDA
CUDA works with industry-standard C
 Write a program for one thread
 Instantiate it on many parallel threads
 Familiar programming model and language
CUDA is a scalable parallel programming model
 Programs run on any number of processors without recompiling
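The "write a program for one thread, instantiate it on many" idea can be sketched as a minimal CUDA C program. The kernel name, data, and launch sizes below are illustrative choices, not taken from the slides:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The kernel is ordinary C written from the point of view of ONE thread.
__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's index
    if (i < n)                                      // guard against overshoot
        data[i] += 1.0f;
}

int main()
{
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // The <<<blocks, threads>>> launch instantiates the one-thread
    // program on n parallel threads (4 blocks of 256 threads here).
    add_one<<<n / 256, 256>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[10] = %f\n", h[10]);   // expect 11.0
    cudaFree(d);
    return 0;
}
```

The same binary scales to any number of processors because the hardware, not the program, decides how many blocks run concurrently.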
Advantages of CUDA
CUDA has the following advantages over traditional
GPGPU using graphics APIs:
Scattered reads
Shared memory
Faster downloads and readbacks to and from the GPU
Full support for integer and bitwise operations
CUDA Programming Model
Parallel code (a kernel) is launched and executed
on a device by many threads.
Threads are grouped into thread blocks.
Parallel code is written for a single thread
 Each thread is free to execute a unique code path
 Built-in thread and block ID variables
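A minimal sketch of the last two points: every thread runs the same kernel, but the built-in variables threadIdx, blockIdx, blockDim, and gridDim give each thread a unique identity, so each may follow its own code path. The kernel name and data layout here are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void classify(int *out, int n)
{
    // Built-in ID variables combine into a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (threadIdx.x == 0)
        out[i] = -1;   // thread 0 of each block takes a different path
    else
        out[i] = i;    // all other threads store their global index
}

int main()
{
    const int n = 512;
    int h[n], *d;
    cudaMalloc((void **)&d, n * sizeof(int));
    classify<<<2, 256>>>(d, n);          // 2 blocks of 256 threads
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h[0] = %d, h[1] = %d\n", h[0], h[1]);   // expect -1 and 1
    cudaFree(d);
    return 0;
}
```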
CUDA Architecture
The CUDA architecture consists of several
components:
 Parallel compute engines inside Nvidia GPUs
 OS kernel-level support
 User-mode driver
 Instruction set architecture (ISA)
Tesla 10 Series
CUDA Computing with the Tesla T10
 240 SP (single-precision) processors at 1.45 GHz: 1 TFLOPS peak
 30 DP (double-precision) processors at 1.44 GHz: 86 GFLOPS peak
 128 threads per processor: 30,720 threads total
Thread Hierarchy
Threads launched for a parallel section are
partitioned into thread blocks
 Grid = all blocks for a given launch
A thread block is a group of threads that can
 Synchronize their execution
 Communicate via shared memory
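Both block-level capabilities can be sketched in one kernel: the threads of a block cooperate through __shared__ memory and synchronize with __syncthreads() to compute a block-wide sum. The block size and input data are illustrative assumptions:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256

// Each block reduces its THREADS elements to one partial sum.
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float s[THREADS];          // visible to all threads in the block
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                      // wait until every thread has written

    for (int stride = THREADS / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();                  // synchronize between reduction steps
    }
    if (tid == 0)
        out[blockIdx.x] = s[0];           // one partial sum per block
}

int main()
{
    const int blocks = 4, n = blocks * THREADS;
    float h_in[n], h_out[blocks];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, THREADS>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    printf("partial sum of block 0 = %f\n", h_out[0]);   // expect 256.0
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

Note that __syncthreads() only synchronizes threads within one block; blocks in a grid cannot synchronize with each other during a launch.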
Execution Model
Warps and Half Warps
The hardware executes the threads of a block in
groups of 32 called warps; memory requests are
serviced per half warp of 16 threads.
GPU Memory Allocation / Release
Host (CPU) code manages device (GPU) memory:
cudaMalloc(void **pointer, size_t nbytes)
cudaMemset(void *pointer, int value, size_t count)
cudaFree(void *pointer)
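A minimal sketch of the allocate/initialize/release pattern using exactly these calls; cudaMemcpy, which moves data between host and device, is added here even though it is not on the slide:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t nbytes = 1024 * sizeof(int);
    int *d_a = NULL;       // device pointer, managed from host code
    int h_a[1024];         // host buffer

    cudaMalloc((void **)&d_a, nbytes);   // allocate device memory
    cudaMemset(d_a, 0, nbytes);          // fill it with zero bytes
    cudaMemcpy(h_a, d_a, nbytes,         // read the result back to the host
               cudaMemcpyDeviceToHost);

    printf("h_a[0] = %d\n", h_a[0]);     // expect 0
    cudaFree(d_a);                       // release device memory
    return 0;
}
```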
Next Generation CUDA Architecture
The next-generation CUDA architecture, codenamed
Fermi, is the most advanced GPU architecture Nvidia
has built. Its features include
• 512 CUDA cores
• 3.2 billion transistors
• Nvidia Parallel DataCache technology
• Nvidia GigaThread engine
• ECC support
Applications
Accelerated rendering of 3D graphics
Video forensics
Molecular dynamics
Computational chemistry
Life sciences
Bioinformatics
Electrodynamics
Medical imaging
Oil and gas
Weather and ocean modeling
Electronic design automation
Video imaging
Video acceleration
Why Should I Use a GPU as a Processor?
Compared to the latest quad-core CPU, Tesla
20-series GPU computing processors deliver
equivalent performance at 1/20th the power
consumption and 1/10th the cost.
When a computational fluid dynamics problem is solved, it takes
 9 minutes on a Tesla S870 (4 GPUs)
 12 hours on one 2.5 GHz CPU core
Double Precision Performance
 Intel Core i7 980XE: 107.6 GFLOPS (CPU)
 AMD Hemlock 5970: 928 GFLOPS (GPU)
 Nvidia Tesla S2050 & S2070: 2.1-2.5 TFLOPS (GPU)
 Tesla C1060: 933 GFLOPS (GPU)
 GeForce 8800 GTX: 346 GFLOPS (GPU)
 Core 2 Duo E6600: 38 GFLOPS (CPU)
 Athlon 64 X2 4600+: 19 GFLOPS (CPU)
After all, it’s your personal supercomputer