
GPU Architecture Overview

John Owens, UC Davis

The Right-Hand Turn

[H&P Figure 1.1]

Why? [Architecture Reasons]


ILP increasingly difficult to extract from instruction stream
Control hardware dominates processors
Complex, difficult to build and verify
Takes substantial fraction of die
Scales poorly
Pay for max throughput, sustain average throughput
Quadratic dependency checking

Control hardware doesn't do any math!


Intel Core Duo: 48 GFLOPS, ~10 GB/s
NVIDIA G80: 330 GFLOPS, 80+ GB/s
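A quick back-of-the-envelope comparison of these peak figures (my arithmetic, not part of the original slide):

$\frac{330\ \text{GFLOPS}}{48\ \text{GFLOPS}} \approx 7\times$ the arithmetic rate, and $\frac{80\ \text{GB/s}}{10\ \text{GB/s}} \approx 8\times$ the memory bandwidth.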

AMD Deerhound (K8L)

chip-architect.com

Why? [Technology Reasons]


Industry moving from instructions per second to instructions per watt
Power wall now all-important
Traditional processor techniques are not power-efficient

We can continue to put more transistors on a chip


but we can't scale their voltage like we used to, and we can't clock them as fast

Go Parallel
Time of architectural innovation
GPUs let us explore using hundreds of processors now, not 10 years from now

Major CPU vendors supporting multicore
Interest in general-purpose programmability on GPUs
Universities must teach thinking in parallel

What's Different about the GPU?


The future of the desktop is parallel
We just don't know what kind of parallel

GPUs and multicore are different


Multicore: coarse, heavyweight threads; better performance per thread
GPUs: fine, lightweight threads; single-thread performance is poor

A case for the GPU


Interaction with the world is visual
GPUs have a well-established programming model
Market for GPUs is 500M+ total/year

The Rendering Pipeline


Application: compute 3D geometry; make calls to the graphics API
Geometry: transform geometry from 3D to 2D (in parallel)
Rasterization: generate fragments from 2D geometry (in parallel)
Composite: combine fragments into an image
(Geometry, Rasterization, and Composite run on the GPU)

The Programmable Pipeline


Application: compute 3D geometry; make calls to the graphics API
Geometry: transform geometry from 3D to 2D [vertex programs]
Rasterization: generate fragments from 2D geometry [fragment programs]
Composite: combine fragments into an image
(Geometry, Rasterization, and Composite run on the GPU)
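To make the idea of a fragment program concrete, here is a minimal sketch of the kind of per-fragment computation such a program expresses, written as a CUDA-style kernel purely for illustration (real fragment programs are written in shading languages such as Cg, HLSL, or GLSL; the diffuse-lighting formula and buffer layout below are assumptions of this sketch, not content from the talk):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One lightweight thread per fragment: each thread reads that fragment's
// interpolated normal, applies a simple diffuse (Lambertian) formula, and
// writes an independent output color. No thread depends on any other.
__global__ void shadeFragments(const float3 *normals, float3 *colors, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float3 lightDir = make_float3(0.0f, 0.0f, 1.0f);  // assumed light direction
    const float3 albedo   = make_float3(0.8f, 0.2f, 0.2f);  // assumed surface color

    float3 nrm  = normals[i];
    float ndotl = fmaxf(0.0f, nrm.x * lightDir.x + nrm.y * lightDir.y + nrm.z * lightDir.z);

    colors[i] = make_float3(albedo.x * ndotl, albedo.y * ndotl, albedo.z * ndotl);
}

int main()
{
    const int n = 1 << 20;                       // one million fragments
    float3 *normals, *colors;
    cudaMallocManaged(&normals, n * sizeof(float3));
    cudaMallocManaged(&colors,  n * sizeof(float3));
    for (int i = 0; i < n; ++i) normals[i] = make_float3(0.0f, 0.0f, 1.0f);

    shadeFragments<<<(n + 255) / 256, 256>>>(normals, colors, n);
    cudaDeviceSynchronize();

    printf("fragment 0 color: %.2f %.2f %.2f\n", colors[0].x, colors[0].y, colors[0].z);
    cudaFree(normals); cudaFree(colors);
    return 0;
}
```

The point of the sketch is the execution model, not the lighting math: each fragment is shaded by an independent, fine-grained thread, which is exactly the parallelism the programmable pipeline exposes.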

DirectX 10 Pipeline
[Pipeline diagram, with fixed-function stages, programmable stages, and memory resources:]
Input Assembler (fixed): reads the Vertex Buffer and Index Buffer
Vertex Shader (programmable): constant buffers, samplers, textures
Geometry Shader (programmable): constant buffers, samplers, textures; can stream out to a Stream Buffer
Setup / Rasterizer (fixed)
Pixel Shader (programmable): constant buffers, samplers, textures
Output Merger (fixed): writes Depth/Stencil and Render Target
All buffers, textures, and targets reside in Memory
Courtesy David Blythe, Microsoft

Characteristics of Graphics
Large computational requirements
Massive parallelism
Graphics pipeline designed for independent operations
Long latencies tolerable
Deep, feed-forward pipelines
Hacks are OK: can tolerate lack of accuracy
GPUs are good at parallel, arithmetically intense, streaming-memory problems

Graphics Hardware: Task Parallel

[Diagram: the pipeline (Application, Command, Vertex/Geometry, Rasterization, Fragment, Display) mapped onto task-parallel hardware. Application and Command run on the CPU; the remaining stages are dedicated units on the GPU with their own attached memories. Example part: ATI Rage 128.]


NVIDIA GeForce 6800 3D Pipeline


[Diagram: vertex processors feed Triangle Setup and Z-Cull; Shader Instruction Dispatch drives the fragment processors, which share an L2 texture cache; a Fragment Crossbar routes shaded fragments to the composite units, which read and write through four Memory Partitions. Courtesy Nick Triantos, NVIDIA.]

Programmable Pipeline
[Diagram from Akeley and Hanrahan, "Real-Time Graphics Architectures": the pipeline with its per-stage operations. Object space: Application → Command → Tessellation (per-surface) → per-vertex. Texture spaces: per-texel operations on Texture Memory. Image space: Primitive Assembly → per-primitive → Rasterization → per-fragment → Image Composition? → Pixel Ops on the framebuffer (FB) → per-pixel → Display.]

Generalizing the Pipeline


Transform A to B
Ex: rasterization (triangles to fragments)
Historically fixed function

Process A to A
Ex: fragment program
Recently programmable, and becoming more so

GeForce 8800 GPU

Built around programmable units: a unified shader

[Diagram: Host → Input Assembler → Thread Execution Manager, dispatching work across eight groups of Thread Processors, each group with its own Parallel Data Cache; all processors load/store to Global Memory. Courtesy of Ian Buck, NVIDIA.]
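On the programming side, the parallel data cache is what CUDA exposes as per-block shared memory. Below is a minimal sketch (my example, not from the slide) of a block-wise sum that stages data in shared memory between its loads and stores to global memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread block stages its slice of the input in the on-chip
// "parallel data cache" (CUDA shared memory), reduces it there, and
// writes one partial sum back to global memory.
__global__ void blockSum(const float *in, float *blockSums, int n)
{
    __shared__ float cache[256];               // per-block parallel data cache

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? in[i] : 0.0f;       // load from global memory
    __syncthreads();

    // Tree reduction entirely in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    if (tid == 0) blockSums[blockIdx.x] = cache[0];   // store to global memory
}

int main()
{
    const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
    float *in, *partial;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partial, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, threads>>>(in, partial, n);
    cudaDeviceSynchronize();

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += partial[b];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in); cudaFree(partial);
    return 0;
}
```

Staging data in the per-block cache avoids repeated round trips to global memory, which is how the architecture keeps its many thread processors fed.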

Unified Shaders
[Diagram: the same pipeline (Application, Command, Vertex, Geometry, Rasterization, Fragment, Display), but the Vertex, Geometry, and Fragment stages now all run on a single pool of programmable units on the GPU, with memory attached; Application and Command remain on the CPU.]

Towards Programmable Graphics


Fixed function
Configurable, but not programmable

Programmable shading
Shader-centric
Programmable shaders, but fixed pipeline

Programmable graphics
Customize the pipeline
Neoptica asserts the major obstacle is programming models and tools
http://www.neoptica.com/NeopticaWhitepaper.pdf
http://www.graphicshardware.org/previous/www_2006/presentations/pharr-keynote-gh06.pdf

Yesterday's Vendor Support

High level: graphics language/API (OpenGL, D3D)
Low level: device driver

Today's New Vendor Support

High level: graphics language/API (OpenGL, D3D) alongside high-level compute languages (NVIDIA CUDA, AMD CTM CAL)
Compute API: lower-level compute access (CTM HAL)
Low level: device driver
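To show what the compute path looks like in practice, here is a minimal CUDA vector-add sketch; it is illustrative only and not taken from the talk:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The compute API exposes the GPU's processors directly: no vertices,
// textures, or render targets, just memory allocation, data transfer,
// and a kernel launched over a grid of lightweight threads.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side data
    float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes), *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device allocation, transfers, and launch go through the compute API
    // (here the CUDA runtime), which sits on top of the low-level driver.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %.1f (expected 3.0)\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Nothing graphics-specific appears anywhere in this program, which is the contrast the slide draws with the graphics-API-only path of the previous generation.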

Architecture Summary
GPU is a massively parallel architecture
Many problems map well to GPU-style computing
GPUs have a large amount of arithmetic capability
Increasing amount of programmability in the pipeline

New features map well to GPGPU


Unified shaders
Direct access to compute units in new APIs

Challenge:
How do we make the best use of GPU hardware?
Techniques, programming models, languages, evaluation tools
