Personal Supercomputers
Presented by Kunal Garg (2507276), UIET KU, Kurukshetra, India

SUPERCOMPUTER
A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation. Supercomputers are used for highly calculation-intensive tasks.

GPU
A graphics processing unit (GPU), also called a visual processing unit (VPU), is a specialized processor that offloads 3D or 2D graphics rendering from the microprocessor. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.

GPU Computing
The excellent floating-point performance of GPUs led to the advent of General-Purpose Computing on GPUs (GPGPU). GPU computing is the use of a GPU for general-purpose scientific and engineering computing. The model for GPU computing is to use a CPU and a GPU together in a heterogeneous computing model.

Problems in GPU Programming
Early GPGPU required graphics languages, which made it difficult for users to program general-purpose applications for the GPU.

CUDA
CUDA is an acronym for Compute Unified Device Architecture: a parallel computing architecture and computing engine.

CUDA with industry-standard C
Write a program for one thread, then instantiate it on many parallel threads, using a familiar programming model and language. CUDA is a scalable parallel programming model: a program runs on any number of processors without recompiling.

Advantages of CUDA
CUDA has the following advantages over traditional GPGPU using graphics APIs:
Scattered reads
Shared memory
Faster downloads and readbacks to and from the GPU
Full support for integer and bitwise operations

CUDA Programming Model
Parallel code (a kernel) is launched and executed on the device by many threads. Threads are grouped into thread blocks. Parallel code is written for a single thread; each thread is free to execute a unique code path, distinguished by built-in thread and block ID variables.

CUDA Architecture
The CUDA architecture consists of several components:
Parallel compute engines
OS kernel-level support
User-mode driver
Instruction set architecture (ISA)

Tesla 10 Series
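The programming model described above (write code for one thread, instantiate it on many parallel threads) can be sketched as a minimal CUDA C kernel. The names (vecAdd, n) and the block size of 256 are illustrative, not from the slides:

```cuda
// Minimal sketch of the CUDA programming model: the kernel is written
// from the point of view of a single thread and then launched across
// many parallel threads grouped into thread blocks.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Built-in thread and block ID variables give each thread a
    // unique global index into the data.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

// Launch: 256 threads per block, enough blocks to cover n elements.
// vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```

Because the grid and block dimensions are launch parameters, the same program scales to any number of processors without recompiling.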
CUDA Computing with Tesla T10
240 SP (single-precision) processors at 1.45 GHz: 1 TFLOPS peak
30 DP (double-precision) processors at 1.44 GHz: 86 GFLOPS peak
128 threads per processor: 30,720 threads total

Thread Hierarchy
Threads launched for a parallel section are partitioned into thread blocks.
Grid = all blocks for a given launch
A thread block is a group of threads that can:
Synchronize their execution
Communicate via shared memory

Execution Model
Warps and half-warps: threads execute in groups of 32 called warps; on this generation of hardware, memory transactions are issued per half-warp (16 threads).

GPU Memory Allocation / Release
The host (CPU) manages device (GPU) memory:
cudaMalloc (void **pointer, size_t nbytes)
cudaMemset (void *pointer, int value, size_t count)
cudaFree (void *pointer)

Next Generation CUDA Architecture
The next-generation CUDA architecture, codenamed Fermi, is the most advanced GPU architecture built to date. Its features include:
512 CUDA cores
3.2 billion transistors
Nvidia Parallel DataCache technology
Nvidia GigaThread engine
ECC support

Applications
Accelerated rendering of 3D graphics
Video forensics
Molecular dynamics
Computational chemistry
Life sciences
Bioinformatics
Electrodynamics
Medical imaging
Oil and gas
Weather and ocean modeling
Electronic design automation
Video imaging
Video acceleration

Why should I use a GPU as a Processor?
Compared to the latest quad-core CPU, Tesla 20-series GPU computing processors deliver equivalent performance at 1/20th the power consumption and 1/10th the cost.
A computational fluid dynamics problem that takes 12 hours on one 2.5 GHz CPU core takes 9 minutes on a Tesla S870 (4 GPUs).

Double Precision Performance
Intel Core i7-980X Extreme: 107.6 GFLOPS (CPU)
AMD Hemlock (Radeon HD 5970): 928 GFLOPS (GPU)
Nvidia Tesla S2050 / S2070: 2.1-2.5 TFLOPS (GPU)
Tesla C1060: 933 GFLOPS (GPU)
GeForce 8800 GTX: 346 GFLOPS (GPU)
Core 2 Duo E6600: 38 GFLOPS (CPU)
Athlon 64 X2 4600+: 19 GFLOPS (CPU)

After all, it's your personal supercomputer.

Bibliography
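The memory-management calls on the allocation/release slide fit together as follows; a minimal host-side sketch, in which the buffer size and variable names are illustrative:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t nbytes = 1024 * sizeof(float);
    float *d_buf = NULL;

    // The host (CPU) manages device (GPU) memory:
    cudaMalloc((void **)&d_buf, nbytes);  // allocate nbytes on the device
    cudaMemset(d_buf, 0, nbytes);         // fill the allocation with zeros
    cudaFree(d_buf);                      // release the device memory

    printf("device buffer allocated, zeroed, and freed\n");
    return 0;
}
```

Note that these calls allocate and touch memory on the device only; moving data between host and device would additionally use cudaMemcpy.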