Gpu

graphics processing unit seminar report
ABSTRACT
A Graphics Processing Unit (GPU) is a microprocessor that has been designed
specifically for the processing of 3D graphics. The processor is built with
integrated transform, lighting, triangle setup/clipping, and rendering engines,
capable of handling millions of math-intensive operations per second. GPUs
allow products such as desktop PCs, portable computers, and game consoles
to process real-time 3D graphics that only a few years ago were available
only on high-end workstations. Used primarily for 3D applications, a
graphics processing unit is a single-chip processor that creates lighting
effects and transforms objects every time a 3D scene is redrawn. These are
mathematically intensive tasks that would otherwise put quite a strain on
the CPU.
INTRODUCTION
There are various applications that require a 3D world to be simulated as
realistically as possible on a computer screen. These include 3D animations in
games, movies and other real-world simulations. It takes a lot of computing
power to represent a 3D world, due to the great amount of information that
must be used to generate a realistic 3D world and the complex mathematical
operations that must be used to project this 3D world onto a computer
screen. In this situation, processing time and bandwidth are at a premium
due to the large amounts of both computation and data.
The functional purpose of a GPU, then, is to provide separate, dedicated
graphics resources, including a graphics processor and memory, to relieve
some of the burden on the main system resources, namely the Central
Processing Unit, main memory, and the system bus, which would otherwise
get saturated with graphical operations and I/O requests. The abstract goal of
a GPU, however, is to enable a representation of a 3D world that is as
realistic as possible. GPUs are therefore designed to provide additional
computational power customized specifically for these 3D tasks.
WHAT'S A GPU?
GPUs form the heart of modern graphics cards, relieving the CPU (central
processing unit) of much of the graphics processing load. GPUs allow
products such as desktop PCs, portable computers, and game consoles to
process real-time 3D graphics that only a few years ago were available only
on high-end workstations.
Used primarily for 3D applications, a graphics processing unit is a single-chip
processor that creates lighting effects and transforms objects every time a
3D scene is redrawn. These are mathematically intensive tasks that would
otherwise put quite a strain on the CPU. Lifting this burden from the
CPU frees up cycles that can be used for other jobs.
However, the GPU is not just for playing 3D-intensive video games or for those
who create graphics (sometimes referred to as graphics rendering or content
creation); it is a component that is critical to the PC's overall system
speed. To appreciate the graphics card's role fully, it helps to look at
where it came from and how it is built.
Many synonyms exist for Graphics Processing Unit, the most popular being
graphics card. It is also known as a video card, video accelerator,
video adapter, video board, graphics accelerator, or graphics adapter.
HISTORY AND STANDARDS
The first graphics cards, introduced in August of 1981 by IBM, were
monochrome cards designated as Monochrome Display Adapters (MDAs). The
displays that used these cards were typically text-only, with green or white
text on a black background. Color for IBM-compatible computers appeared on
the scene with the Color Graphics Adapter (CGA), which showed 4 colors at a
time in its 320x200 graphics mode, followed by the 16-color Enhanced
Graphics Adapter (EGA). The Hercules Graphics Card (HGC), by contrast, was a
monochrome card that added bitmapped graphics to MDA-style text. During the
same time, other computer manufacturers, such as Commodore, were
introducing computers with built-in graphics adapters that could handle a
varying number of colors.
When IBM introduced the Video Graphics Array (VGA) in 1987, a new graphics
standard came into being. A VGA display could show up to 256 colors at once
(out of a possible 262,144-color palette) at 320x200, or 16 colors at
640x480; its text modes ran at resolutions up to 720x400. Perhaps the
most interesting difference between VGA and the preceding formats is that
VGA was analog, whereas display signals had been digital up to that point.
Going from digital to analog may seem like a step backward, but it actually
provided the ability to vary the signal level for more possible color
combinations than the strict on/off nature of the digital signals allowed.
Over the years, VGA gave way to Super Video Graphics Array (SVGA). SVGA
cards were based on VGA, but each card manufacturer added resolutions and
increased color depth in different ways. Eventually, the Video Electronics
Standards Association (VESA) agreed on a standard implementation of SVGA
that provided up to 16.8 million colors and 1280x1024 resolution. Most
graphics cards available today support Ultra Extended Graphics Array
(UXGA). UXGA can support a palette of up to 16.8 million colors and
resolutions up to 1600x1200 pixels.
Even though any card you can buy today will offer higher colors and
resolution than the basic VGA specification, VGA mode is the de facto
standard for graphics and is the minimum on all cards. In addition to
including VGA, a graphics card must be able to connect to your computer.
While there are still a number of graphics cards that plug into an Industry
Standard Architecture (ISA) or Peripheral Component Interconnect (PCI) slot,
most current graphics cards use the Accelerated Graphics Port (AGP).
PERIPHERAL COMPONENT INTERCONNECT (PCI)
There are a lot of incredibly complex components in a computer, and all of
these parts need to communicate with each other in a fast and efficient
manner. Essentially, a bus is the channel or path between the components in
a computer. During the early 1990s, Intel introduced a new bus standard for
consideration, the Peripheral Component Interconnect (PCI). It provides direct
access to system memory for connected devices, but uses a bridge to
connect to the front side bus and therefore to the CPU.
The illustration above shows how the various buses connect to the CPU.
PCI can connect up to five external components. Each of the five connectors
for an external component can be replaced with two fixed devices on the
motherboard. The PCI bridge chip regulates the speed of the PCI bus
independently of the CPU's speed. This provides a higher degree of reliability
and ensures that PCI-hardware manufacturers know exactly what to design
for.
PCI originally operated at 33 MHz using a 32-bit-wide path. Revisions to the
standard include increasing the speed from 33 MHz to 66 MHz and doubling
the bit count to 64. Currently, PCI-X provides for 64-bit transfers at a speed of
133 MHz, for a transfer rate of roughly 1 GBps (gigabyte per second).
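The transfer rates quoted for PCI follow from multiplying the bus width by the clock rate. The Python sketch below is only an illustration: the function name `peak_bandwidth_mb_per_s` is made up for this example, it uses the rounded clock figures from the text (33, 66 and 133 MHz), and it takes 1 MB as one million bytes.

```python
def peak_bandwidth_mb_per_s(bus_width_bits, clock_mhz):
    """Theoretical peak rate in MB/s: bytes per transfer times transfers per microsecond."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


pci_original = peak_bandwidth_mb_per_s(32, 33)    # 32-bit path at 33 MHz
pci_revised = peak_bandwidth_mb_per_s(64, 66)     # 64-bit path at 66 MHz
pci_x = peak_bandwidth_mb_per_s(64, 133)          # PCI-X: 64-bit path at 133 MHz

print(pci_original, pci_revised, pci_x)
```

These are peak figures; real throughput is lower because of arbitration and protocol overhead.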
PCI cards use 47 pins to connect (49 pins for a mastering card, which can
control the PCI bus without CPU intervention). The PCI bus is able to work
with so few pins because of hardware multiplexing, which means that the
device sends more than one signal over a single pin. Also, PCI supports
devices that use either 5 volts or 3.3 volts. PCI slots are the best choice for
network interface cards (NIC), 2-D video cards, and other high-bandwidth
devices. On some PCs, PCI has completely superseded the old ISA expansion
slots.
Although Intel proposed the PCI standard in 1991, it did not achieve
popularity until the arrival of Windows 95 (in 1995). This sudden interest in
PCI was due to the fact that Windows 95 supported a feature called Plug and
Play (PnP). PnP means that you can connect a device or insert a card into
your computer and it is automatically recognized and configured to work in
your system. Intel created the PnP standard and incorporated it into the
design for PCI. But it wasn't until several years later that a mainstream
operating system, Windows 95, provided system-level support for PnP. The
introduction of PnP accelerated the demand for computers with PCI.
ACCELERATED GRAPHICS PORT (AGP)
The need for streaming video and real-time-rendered 3-D games requires an
even faster throughput than that provided by PCI. In 1996, Intel debuted the
Accelerated Graphics Port (AGP), a modification of the PCI bus designed
specifically to facilitate the use of streaming video and high-performance
graphics.
AGP is a high-performance interconnect between the core-logic chipset and
the graphics controller, intended to improve graphics performance in 3D
applications. AGP relieves the graphics bottleneck by adding a dedicated
high-speed interface directly between the chipset and the graphics controller
as shown below.
Segments of system memory can be dynamically reserved by the OS for use
by the graphics controller. This memory is termed AGP memory or non-local
video memory. The net result is that the graphics controller is required to
keep fewer texture maps in local memory.
AGP has 32 lines for multiplexed address and data. There are an additional 8
lines for sideband addressing. Local video memory can be expensive, and the
OS cannot use it for other purposes when the running applications' graphics
do not need it. The graphics controller needs fast access to local
video memory for screen refreshes and various pixel elements including
Z-buffers, double buffering, overlay planes, and textures.
For these reasons, programmers can always expect to have more texture
memory available via AGP system memory. Keeping textures out of the
frame buffer allows larger screen resolution, or permits Z-buffering for a
given large screen size. As the need for more graphics intensive applications
continues to scale upward, the amount of textures stored in system memory
will increase. AGP delivers these textures from system memory to the
graphics controller at speeds sufficient to make system memory usable as a
secondary texture store.
AGP Memory Allocation
During AGP memory initialization, the OS allocates 4 KB pages of AGP
memory in main (physical) memory. These pages are usually discontiguous.
However, the graphics controller needs contiguous memory. A translation
mechanism called the GART (Graphics Address Remapping Table) makes
discontiguous memory appear as contiguous memory by translating virtual
addresses into physical addresses in main memory through a remapping
table.
A block of contiguous address space, called the Aperture, is allocated above
the top of memory. The graphics card accesses the Aperture as if it were
main memory. The GART is then able to remap these virtual addresses to
physical addresses in main memory. These virtual addresses are used to
access main memory, the local frame buffer, and AGP memory.
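The remapping described above can be sketched as a simple page-table lookup. This is a toy model, not the layout of any real chipset's GART: the class name `Gart`, the aperture base address and the physical page addresses are all hypothetical values chosen for illustration. Only the 4 KB page size comes from the text.

```python
PAGE_SIZE = 4096  # 4 KB pages, as allocated by the OS


class Gart:
    """Toy remapping table: entry i holds the physical base of aperture page i."""

    def __init__(self, aperture_base, physical_pages):
        self.aperture_base = aperture_base
        self.table = list(physical_pages)

    def translate(self, aperture_addr):
        """Map a contiguous aperture address to a scattered physical address."""
        offset = aperture_addr - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError("address outside the aperture")
        page, within_page = divmod(offset, PAGE_SIZE)
        return self.table[page] + within_page


# Hypothetical example: three discontiguous physical pages behind one aperture.
gart = Gart(aperture_base=0xE0000000,
            physical_pages=[0x00300000, 0x00124000, 0x00719000])

print(hex(gart.translate(0xE0001010)))  # second aperture page, offset 0x10
```

The graphics controller sees one contiguous range starting at the aperture base, while the pages behind it can sit anywhere in main memory.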
AGP Transfers
AGP provides two modes for the graphics controller to directly access texture
maps in system memory: pipelining and sideband addressing. Using Pipe
mode, AGP overlaps the memory or bus access times for a request ("n") with
the issuing of following requests ("n+1"..."n+2"... etc.). In the PCI bus,
request "n+1" does not begin until the data transfer of request "n" finishes.
With sideband addressing (SBA), AGP uses 8 extra "sideband" address lines
which allow the graphics controller to issue new addresses and requests
simultaneously while data continues to move from previous requests on the
main 32 data/address lines. Using SBA mode improves efficiency and reduces
latencies.
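The benefit of pipelining can be seen in an idealized timing model. The sketch below assumes every request has the same access latency and the same data transfer time, both counted in bus clocks; the function names and the example figures are hypothetical and are not taken from the AGP specification.

```python
def non_pipelined_time(requests, latency, transfer):
    """Request n+1 starts only after request n's data has arrived (PCI-style)."""
    return requests * (latency + transfer)


def pipelined_time(requests, latency, transfer):
    """Requests are issued back to back, so only the first latency is exposed."""
    if requests == 0:
        return 0
    return latency + requests * transfer


# Hypothetical figures: 4 requests, 10 clocks of latency, 4 clocks of data each.
serial_clocks = non_pipelined_time(4, 10, 4)
overlapped_clocks = pipelined_time(4, 10, 4)
print(serial_clocks, overlapped_clocks)
```

In this model the pipelined bus hides the latency of every request after the first, which is why the gap widens as more requests are queued.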
AGP Specifications
The current PCI bus supports a data transfer rate of up to 132 MB/s, while AGP
2x (on a 66 MHz clock) supports up to 533 MB/s. AGP 2x attains this transfer
rate due to its ability to transfer data on both the rising and falling edges
of the 66 MHz clock.
Mode   Approximate clock rate   Transfer rate (MBps)
1x     66 MHz                   266
2x     133 MHz                  533
4x     266 MHz                  1066
8x     533 MHz                  2133
The AGP slot typically provides performance which is 4 to 8 times faster than
the PCI slots inside your computer.
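The AGP transfer rates can be reproduced from the 32-bit data path and the number of transfers per clock. The sketch below assumes the nominal "66 MHz" AGP clock is 66.67 MHz (200/3) and truncates the result to whole MB/s; the names used are illustrative only.

```python
from fractions import Fraction

AGP_BUS_WIDTH_BYTES = 4                 # 32 multiplexed address/data lines
AGP_BASE_CLOCK_MHZ = Fraction(200, 3)   # assumption: "66 MHz" means 66.67 MHz


def agp_transfer_rate_mbps(transfers_per_clock):
    """Peak MB/s for an AGP mode that moves this many words per base clock."""
    rate = AGP_BUS_WIDTH_BYTES * AGP_BASE_CLOCK_MHZ * transfers_per_clock
    return int(rate)  # truncate to whole MB/s


rates = {f"{n}x": agp_transfer_rate_mbps(n) for n in (1, 2, 4, 8)}
print(rates)
```

Each faster mode keeps the same base clock and simply moves more data per clock cycle.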
COMPONENTS OF GPU
There are several components on a typical graphics card:
Graphics Processor
The graphics processor is the brains of the card, and is typically one of three
configurations:
Graphics co-processor: A card with this type of processor can handle all of the
graphics chores without any assistance from the computer's CPU. Graphics
co-processors are typically found on high-end video cards.
Graphics accelerator: In this configuration, the chip on the graphics card
renders graphics based on commands from the computer's CPU. This is the
most common configuration used today.
Frame buffer: This chip simply controls the memory on the card and sends
information to the digital-to-analog converter (DAC). It does no processing of
the image data and is rarely used anymore.
Memory: The type of RAM used on graphics cards varies widely, but the
most popular types use a dual-ported configuration. A dual-ported card can
write to one section of memory while it is reading from another section,
decreasing the time it takes to refresh an image.
Graphics BIOS: Graphics cards have a small ROM chip containing basic
information that tells the other components of the card how to function in
relation to each other. The BIOS also performs diagnostic tests on the card's
memory and input/output (I/O) to ensure that everything is functioning
correctly.
Digital-to-Analog Converter (DAC): The DAC on a graphics card is
commonly known as a RAMDAC because it takes the data it converts directly
from the card's memory. RAMDAC speed greatly affects the image you see on
the monitor. This is because the refresh rate of the image depends on how
quickly the analog information gets to the monitor.
Display Connector: Graphics cards use standard connectors. Most cards
use the 15-pin connector that was introduced with Video Graphics Array
(VGA).
Computer (Bus) Connector: This is usually Accelerated Graphics Port
(AGP). This port enables the video card to directly access system memory.
Direct memory access helps to make the peak bandwidth four times higher
than that of the Peripheral Component Interconnect (PCI) bus adapter card
slots. This allows the central processor to do other tasks while the graphics
chip on the video card accesses system memory.
Internal Organization of GPU
HOW IS 3D ACCELERATION DONE?
There are different steps involved in creating a complete 3D scene. They are
carried out by different parts of the GPU, each of which is assigned a
particular job. During 3D rendering, different types of data travel across the
bus. The two most common types are texture and geometry data. The
geometry data is the "infrastructure" that the rendered scene is built on. This
is made up of polygons (usually triangles) that are represented by vertices,
the end-points that define each polygon. Texture data provides much of the
detail in a scene, and textures can be used to simulate more complex
geometry, add lighting, and give an object a simulated surface.
Many new graphics chips now have a hardware-accelerated Transform and
Lighting (T&L) unit, which takes a 3D scene's geometry and transforms it into
different coordinate spaces. It also performs lighting calculations, again
relieving the CPU of these math-intensive tasks.
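The two jobs of a T&L unit can be illustrated in a few lines of Python. This is a minimal sketch of the arithmetic only, assuming a 4x4 affine matrix for the transform and simple diffuse (Lambert) lighting; the function names are made up for this example, and real hardware supports more light types and works on many vertices in parallel.

```python
def transform(matrix, vertex):
    """Multiply a 4x4 matrix by the vertex (x, y, z, 1) and return (x, y, z)."""
    v = (vertex[0], vertex[1], vertex[2], 1.0)
    out = [sum(row[i] * v[i] for i in range(4)) for row in matrix]
    return (out[0], out[1], out[2])  # assumes an affine matrix, so w stays 1


def translation(tx, ty, tz):
    """Build a matrix that moves a vertex by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def diffuse_intensity(normal, to_light):
    """Lambert term: brightness falls off with the angle to the light.

    Both arguments are assumed to be unit-length vectors.
    """
    n_dot_l = sum(n * l for n, l in zip(normal, to_light))
    return max(0.0, n_dot_l)  # surfaces facing away receive no light


moved = transform(translation(1, 2, 3), (1.0, 1.0, 1.0))
lit = diffuse_intensity((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
unlit = diffuse_intensity((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
print(moved, lit, unlit)
```

Every vertex in a scene goes through this kind of calculation each frame, which is why moving it off the CPU matters.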
Following the T&L unit on the chip is the triangle setup engine. It takes a
scene's transformed geometry and prepares it for the next stages of
rendering by converting the scene into a form that the pixel engine can then
process. The pixel engine applies assigned texture values to each pixel. This
gives each pixel the correct color value so that it appears to have surface
texture and does not look like a flat, smooth object. After a pixel has been
rendered it must be checked to see whether it is visible by checking the
depth value, or Z value.
A Z check unit performs this process by reading from the Z-buffer to see if
there are any other pixels rendered to the same location where the new pixel
will be rendered. If another pixel is at that location, it compares the Z value of
the existing pixel to that of the new pixel. If the new pixel is closer to the
view camera, it gets written to the frame buffer. If it's not, it gets discarded.
After the complete scene is drawn into the frame buffer the RAMDAC
converts this digital data into analog that can be given to the monitor for
display.
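The Z check described above amounts to a comparison per pixel. The sketch below is a software illustration under two assumptions: a smaller Z value means closer to the camera, and an empty location starts at infinite depth. The buffer sizes and the `write_pixel` helper are hypothetical.

```python
WIDTH, HEIGHT = 4, 3
FAR = float("inf")  # assumption: empty locations start infinitely far away

z_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]
frame_buffer = [[None] * WIDTH for _ in range(HEIGHT)]


def write_pixel(x, y, z, color):
    """Write the pixel only if it is closer than what is already stored."""
    if z < z_buffer[y][x]:
        z_buffer[y][x] = z
        frame_buffer[y][x] = color
        return True
    return False  # hidden behind an existing pixel, so it is discarded


results = [
    write_pixel(1, 1, 5.0, "red"),    # first pixel at this location
    write_pixel(1, 1, 9.0, "blue"),   # farther away: discarded
    write_pixel(1, 1, 2.0, "green"),  # closer: replaces red
]
print(results, frame_buffer[1][1])
```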
PERFORMANCE FACTORS OF GPU
There are many factors that affect the performance of a GPU. Some of the
factors that are directly visible to a user are given below.
• Fill Rate:
It is defined as the number of pixels or texels (textured pixels) rendered per
second by the GPU into memory. It is a direct measure of the GPU's raw
rendering power. Modern GPUs have fill rates as high as 3.2 billion pixels
per second. The fill rate of a GPU can be increased by raising its clock
frequency.
• Memory Bandwidth:
It is the data transfer speed between the graphics chip and its local frame
buffer. More bandwidth usually gives better performance when the image to
be rendered is of high quality and at very high resolution.
• Memory Management:
The performance of the GPU also depends on how efficiently the memory is
managed, because memory bandwidth can become the main bottleneck if
it is not managed properly.
• Hidden Surface Removal:
The term describes reducing overdraw when rendering a scene by not
rendering surfaces that are not visible. This greatly improves GPU
performance, because the fill rate is no longer wasted on pixels that are
never seen.
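The link between fill rate, clock speed and overdraw can be put into numbers. In the sketch below the resolution, frame rate, overdraw factor and the figure of 4 pixels per clock are hypothetical values chosen for illustration; only the general relationships come from the text.

```python
def required_fill_rate(width, height, fps, overdraw):
    """Pixels per second the GPU must write for a given average overdraw."""
    return width * height * fps * overdraw


def peak_fill_rate(core_clock_hz, pixels_per_clock):
    """Theoretical peak: raising the clock raises the fill rate in proportion."""
    return core_clock_hz * pixels_per_clock


# Hypothetical scene: 1024x768 at 60 frames per second.
without_hsr = required_fill_rate(1024, 768, 60, overdraw=3)
with_hsr = required_fill_rate(1024, 768, 60, overdraw=1)

# Hypothetical chip: 300 MHz core writing 4 pixels per clock.
peak = peak_fill_rate(300_000_000, 4)

print(without_hsr, with_hsr, peak)
```

With an overdraw of 3, two thirds of the pixels written are never seen, so removing hidden surfaces frees that share of the fill rate for other work.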
Now let's see how far GPUs have come as far as performance is concerned.
TYPES OF GPUS
There are mainly two types of GPUs:
1. Those that can handle all of the graphics processes without any assistance
from the computer's CPU. They are typically found on high-end workstations.
These are mainly used for digital content creation, such as 3D animation, as
they support a large number of 3D functions.
Some of them are:
Quadro series from NVIDIA.
Wildcat series from 3Dlabs.
FireGL series from ATI.
2. Those in which the chip on the graphics card renders graphics based on
commands from the computer's CPU. This is the most common configuration
used today. These are used for 3D gaming and similar smaller tasks. They are
found on normal desktop PCs and are better known as 3D accelerators. These
support fewer functions and hence are cheaper.
Some of them are:
GeForce series from NVIDIA.
Radeon series from ATI Technologies.
Kyro series from STMicroelectronics.
Today's GPUs can do what was hoped for and beyond. In the last year a
giant leap has been made in GPU technology. The maximum amount of
RAM that can be found on a graphics card has jumped from 16 MB to
128 MB. ATI, which had held the position of premier GPU manufacturer for
the past couple of years, has given way to NVIDIA, whose new
technology is leaving ATI to follow.
GEFORCE4
NVIDIA introduced the top-to-bottom GeForce4 family of
GPUs, delivering new levels of graphics performance and display flexibility
to desktop and mobile PCs. NVIDIA's latest creation, the GeForce4 GPU, is
the fourth edition in the GeForce lineup. It has impressed gamers and
artists alike with up to 128 MB of DDR memory and a very fast processor.
The claim made for it is that the GeForce4 can render graphics in more
detail than the eye can see, although no available monitor can display such
graphics. Even so, the NVIDIA GeForce4 is a groundbreaking graphics
processing unit.
GeForce4 is the most complete family of graphics solutions, from the
graphics power of the GeForce4 Ti, the world's fastest GPU, to
the multi-display flexibility of the mainstream GeForce4 MX, to the most
advanced mobile graphics available, GeForce4 Go.
Three series have been released:
• GeForce4 Ti series (NV25)
• GeForce4 MX series (NV17)
• GeForce4 Go series
Chip Architecture
The LMA II
In the upper left hand corner lies the LMA II. The LMA II controls the flow of
data from the chip to the GPU's memory (the DDR memory on the graphics
card). It controls how much data is sent to the memory and how fast it is sent
to the memory.
The Accuview AA Engine
The Accuview AA Engine, located to the left and in the middle of the chip,
does the antialiasing for the GPU.
The Interface Unit
At the bottom of the GeForce4 chip and to the very left is the interface unit.
This simply determines what interface (2X or 4X AGP) the computer has, and
adjusts to comply with it.
The Texture Unit
At the bottom of the chip and to the left-center is the texture unit. This unit is
dedicated to processing textures that must be rendered.
The nfiniteFX II Engine
This is the most striking feature of the GeForce4 series.
One of the classic problems in computer graphics has been the realistic
rendering of hair and fur. Animals with bare skin textures were easier to
render than those with fur.
The nfiniteFX II engine, which includes support for dual vertex shaders,
advanced pixel shader pipelines, 3D textures, shadow buffers, and z-correct
bump mapping, now makes it possible to render these.
It also improves the rendering of shading.
A demo shot of a wolf man which has been rendered by GeForce4
The Display Unit
In the center to the right of the chip is the display unit. This unit is simple and
its only job is to determine the optimal display resolution for the display
being used. It also determines what type of display it is and optimizes its
performance.
The 2D/Video/HDVP Unit
In the bottom right corner of the chip is the 2D/Video/HDVP unit. This part of
the chip is dedicated to all those tasks that don't require 3D rendering. These
tasks include movies, 2D pictures, 2D games and almost any other program
that doesn't have 3D graphics
nView Technology:
Simply put, NVIDIA's Multi-Monitor and Dual Independent Display technology,
now called "nView", has been polished up nicely. nView is available on both
the GeForce4 Ti and MX and enables the following:
Multi-desktop tools
• Multi-desktop integration
• Full-featured interface including explorer browser with bird's-eye views of
desktops
• Toolbar control available as well for those needing a streamlined, low
real-estate interface
Window management
• Individual application control
• Window & dialog repositioning
Application management
• Transparency & colored transparency window options
• Extends functionality of all applications
• Pop-up menu control
GEFORCE4 TI
NVIDIA's flagship graphics card is the GeForce4 Ti. Its basic
specifications include:
• 63 million transistors (only 3 million more than GeForce3)
• Manufactured in TSMC's 0.15 µm process
• Chip clock 225 - 300 MHz
• Memory clock 500 - 650 MHz
• Memory bandwidth 8,000 - 10,400 MB/s
• T&L performance of 75 - 100 million vertices/s
• 128 MB frame buffer by default
• nfiniteFX II engine
• Accuview Antialiasing
• Lightspeed Memory Architecture II
• nView
It has a 128-bit memory bus, and this bus width contributes substantially to
its performance. It is considered the world's fastest GPU available today.
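The memory bandwidth figures in the specification list can be checked against the 128-bit bus and the memory clock range. The sketch below assumes the listed memory clock (500 - 650 MHz) is the effective DDR data rate; the function name is illustrative.

```python
def memory_bandwidth_mb_per_s(bus_width_bits, effective_clock_mhz):
    """Bytes moved per transfer times millions of transfers per second."""
    return bus_width_bits // 8 * effective_clock_mhz


lowest = memory_bandwidth_mb_per_s(128, 500)
highest = memory_bandwidth_mb_per_s(128, 650)
print(lowest, highest)
```

Under that assumption the results match the 8,000 - 10,400 MB/s range quoted in the list.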
The NV25 series includes four cards: GeForce4 Ti 4600, GeForce4 Ti 4400,
GeForce4 Ti 4200, and GeForce4 Ti 4200 with AGP 8x.
This chart shows a comparison of the performance of the GeForce4 Ti 4600 and
its predecessor in FPS.
GEFORCE4 MX
With the GeForce4 MX graphics processing units (GPUs), NVIDIA provides a
new level of cost-effective, high-performance graphics to the mainstream PC
user. The GeForce4 MX is the least expensive and lowest-performing of the
three lines. The standard graphics card comes with 64 MB of RAM, a hefty
amount that can handle almost any task. The GeForce4 MX has a 64-bit bus,
which is also pretty standard for today's graphics cards. For the most part
the only real improvement from the GeForce2 MX to the GeForce4 MX is the
graphics processing unit itself, which outperforms the GeForce2 MX it
replaces.
Performance comparisons
GEFORCE4 GO
NVIDIA introduces the fastest, most comprehensive and feature-rich
computing experience ever realized on a mobile platform: the GeForce4
Go. With a revolutionary core of integrated technologies, the GeForce4 Go
ensures unparalleled performance, battery life, and DVD and video playback.
From notebooks powerful enough to replace your desktop PC, to those that
are both thin and light, NVIDIA's GeForce4 Go mobile GPUs provide
unprecedented mobile computing experiences.
CONCLUSION
From the introduction of the first 3D accelerator from 3dfx in 1996 these
units have come a long way to be truly called a Graphics Processing Unit. So
it is not a wonder that this piece of hardware is often referred to as an exotic
product as far as computer peripherals are concerned. By observing the
current pace at which work is going on in developing GPUs we can surely
come to a conclusion that we will be able to see better and faster GPUs in the
near future.
3D GLOSSARY
Given below are some of the terms that are closely associated with GPUs.
Vertex: Vertices are the basic unit of 3D graphics. All 3D geometry is
composed of vertices. Vertices contain X, Y and Z positions plus possible
vertex normal and texture mapping information.
Polygons or Triangles:
3D scenes are drawn using only triangles. This vastly simplifies the computer
creation of a 3D world. Triangles are defined as three x,y,z coordinates (one
for each vertex), a properly oriented texture, and a shading definition. The
illusion of curved surfaces (fuselage, engines, wings, etc.) comes from
well-applied shading of flat polygons. A good 3D accelerator card will put the
textures together without any seams (white pixels that flash where the
triangles almost meet).
Vertices
Texture (Bitmap):
A texture map is a way of controlling the diffuse color of a surface on a
pixel-by-pixel basis, rather than by assigning a single overall value. This is
commonly achieved by applying a color bitmap image to the surface.
Rasterization:
The process of finding which pixels an individual polygon covers or, at a more
basic level, which pixels an edge of a polygon lies on. Put simply, it is the
process of transforming a 3D image into a set of colored pixels.
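Finding the pixels a triangle covers can be sketched with edge functions, one common software approach (named here as an illustration, not as the method any particular GPU uses). The sketch assumes the triangle is already projected to 2D screen coordinates and tests the center of each pixel; the function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_pixels(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside or on the triangle."""
    pixels = []
    for y in range(height):
        for x in range(width):
            center = (x + 0.5, y + 0.5)
            w0 = edge(v1, v2, center)
            w1 = edge(v2, v0, center)
            w2 = edge(v0, v1, center)
            # Accept either winding order: all three signs must agree.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
               (w0 <= 0 and w1 <= 0 and w2 <= 0):
                pixels.append((x, y))
    return pixels


hits = covered_pixels((0, 0), (4, 0), (0, 4), width=4, height=4)
print(len(hits), hits)
```

Hardware rasterizers avoid testing every pixel on the screen, but the inside test itself is the same idea.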
Rendering:
A term which is often used as a synonym for rasterization, but which can also
refer to the whole process of creating a 3D image. Rendering is the process
of producing bitmapped images from a view of 3D models in a 3D scene. It
is, in effect, "taking a picture" of the scene. An animation is a series of such
renderings, each with the scene slightly changed.
Anti-aliasing:
A method to remove the jagged edges that appear in computer-generated
images.
Filtering:
Filtering is a method to determine the color of a pixel based on texture maps.
When you get very close to a polygon, the texture map does not contain enough
information to determine the real color of each pixel on the screen. The basic
idea is interpolation: a technique that uses the real pixels
surrounding the unknown pixel to determine its color based on mathematical
averages.
Following are some methods of filtering (from the worst quality to the
best):
1. Point-sampled filtering
2. Bilinear filtering
3. Trilinear filtering
Point filtering will just copy the color of the nearest real pixel, so it
actually enlarges that pixel. This creates a blocky effect, and when the
viewer moves these blocks can change color quickly, creating odd visual
effects. This technique is commonly used in software 3D engines because it
requires very little calculation power.
Bilinear filtering uses four adjacent texels (texture elements) to interpolate
the (unknown) output pixel value. This results in a smoother textured polygon,
as the interpolation filters out the blockiness associated with point sampling.
The disadvantage of bilinear filtering is that it results in a fourfold increase
in texture memory bandwidth.
Trilinear filtering combines bilinear filtering on 2 mip levels. This, however,
requires 8 texels, so memory bandwidth is multiplied by 2 again. This
usually means that the memory will suffer serious bandwidth problems, so
trilinear filtering is usually offered as an option.
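The difference between point sampling and bilinear filtering can be shown on a tiny grayscale texture. This is a minimal sketch under simplifying assumptions: texel values are plain numbers, coordinates are given in texel units, and lookups are clamped at the texture border. The function names and the sample texture are made up for this example.

```python
def point_sample(texture, u, v):
    """Copy the value of the single nearest texel."""
    height, width = len(texture), len(texture[0])
    u = min(max(u, 0.0), width - 1)
    v = min(max(v, 0.0), height - 1)
    return texture[int(v + 0.5)][int(u + 0.5)]


def bilinear_sample(texture, u, v):
    """Blend the four texels around (u, v), weighted by distance."""
    height, width = len(texture), len(texture[0])
    u = min(max(u, 0.0), width - 1)
    v = min(max(v, 0.0), height - 1)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


texture = [[0.0, 100.0],
           [100.0, 200.0]]

nearest = point_sample(texture, 0.6, 0.4)     # snaps to one texel
blended = bilinear_sample(texture, 0.5, 0.5)  # mixes all four texels
print(nearest, blended)
```

The bilinear version reads four texels for every output value, which is the source of the fourfold bandwidth cost mentioned above.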
Alpha-blending:
It is a technique for rendering transparency. An extra value, alpha, is added
to the pixels of a texture map to define how easy it is to see through each
pixel. This makes it possible to look through objects and to render effects
such as realistic water and glass.
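A common way to apply the extra value is a weighted average of the new color and the color already in the frame buffer. The sketch below assumes alpha runs from 0.0 (fully transparent) to 1.0 (fully opaque) and that colors are (red, green, blue) tuples; the `blend` function is illustrative.

```python
def blend(source, destination, alpha):
    """Mix the incoming color with the one already drawn, channel by channel."""
    return tuple(alpha * s + (1.0 - alpha) * d
                 for s, d in zip(source, destination))


red_glass = (255, 0, 0)
blue_wall = (0, 0, 255)

half_transparent = blend(red_glass, blue_wall, 0.5)
opaque = blend(red_glass, blue_wall, 1.0)
invisible = blend(red_glass, blue_wall, 0.0)
print(half_transparent, opaque, invisible)
```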
FPS:
FPS stands for Frames per Second. This is the main unit of measure that is
used to describe graphics and video performance.
Texture Mapping:
In 3D graphics, texture mapping is the process of adding a graphic pattern to
the polygons of a 3D scene. Unlike simple shading, which applies colors to the
underlying polygons of the scene, texture mapping applies simple textured
graphics, also known as patterns or more commonly "tiles", to simulate walls,
floors, the sky, and so on.
T&L:
Transform and lighting (T&L) are two major steps in the 3D graphics pipeline.
They are computationally very intensive. The transform phase converts the
source 3D data to a form that can be rendered, and the lighting phase
calculates lighting for the 3D environment. These two steps can be performed
simultaneously or consecutively on a triangle. Traditionally the CPU performs
them, but nowadays some 3D accelerators also have dedicated hardware T&L
solutions.
Frame buffer:
The memory used to store the pictures you see on screen. Under Direct3D,
there are actually two frame buffers. The front buffer is being displayed while
the back buffer is being drawn. When the back buffer is complete it becomes
the new front buffer, and the old front buffer is cleared and the next frame is
drawn there. (This method of buffering is referred to as double buffering.)
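The swap between front and back buffers can be modeled in a few lines. This is a toy sketch in which a frame is just a list of pixel values; the `DoubleBuffer` class is hypothetical and is not part of the Direct3D API.

```python
class DoubleBuffer:
    """Toy model: the front buffer is displayed, the back buffer is drawn to."""

    def __init__(self, num_pixels, clear_color=0):
        self.clear_color = clear_color
        self.front = [clear_color] * num_pixels
        self.back = [clear_color] * num_pixels

    def draw(self, pixels):
        """Render a finished frame into the back buffer, never the front."""
        self.back[:] = pixels

    def swap(self):
        """Show the completed back buffer and clear the old front buffer."""
        self.front, self.back = self.back, self.front
        self.back[:] = [self.clear_color] * len(self.back)


buffers = DoubleBuffer(num_pixels=4)
buffers.draw([1, 2, 3, 4])
shown_before = list(buffers.front)  # viewer still sees the old frame
buffers.swap()
shown_after = list(buffers.front)   # new frame appears all at once
print(shown_before, shown_after)
```

Because drawing never touches the buffer being displayed, the viewer never sees a half-finished frame.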
Triple buffering:
Here three buffers are used instead of two. If you have very fast refresh rates
there isn't much wait for the buffer swap. But if you have slow refresh rates,
the wait can be considerable. To avoid this, 3dfx drivers (for the Rush and
Voodoo2) can enable a third frame buffer, so when the card is done with the
back buffer it starts immediately on the next one.
Z-Buffer:
A third buffer (or a fourth if triple buffering is enabled) where depth data is
stored, to help the hardware sort out which pixels are visible and which are
hidden.
API (Application Programming Interface):
It is a set of routines, protocols, and tools for building software
applications. A good API makes it easier to develop a program by providing all
the building blocks. Game programmers use APIs to help them program 3D
functions more easily and so that the programs they write will run on more
types of hardware.
The three most popular graphics APIs are:
• Glide by 3dfx,
• OpenGL by Silicon Graphics (and Microsoft), and
• Direct3D by Microsoft as part of their multimedia DirectX package.
REFERENCES
1. http://www.howstuffworks.com
2. http://www.tomshardware.com
3. http://www.intel.com
4. http://www.nvidia.com
5. http://www.extremetech.com
6. http://www.pcworld.com
ACKNOWLEDGEMENT
I extend my sincere thanks to Prof. P.V.Abdul Hameed, Head of the

Department for providing me with the guidance and facilities for the Seminar.
I express my sincere gratitude to Seminar coordinator
Mr. Berly C.J, Staff in charge, for their cooperation and guidance for preparing
and presenting this seminar.
I also extend my sincere thanks to all other faculty members of Electronics

and Communication Department and my friends for their support and
encouragement.
Anish Salam
CONTENTS
1. INTRODUCTION
2. WHAT'S A GPU?
3. HISTORY AND STANDARDS
4. PERIPHERAL COMPONENT INTERCONNECT
5. ACCELERATED GRAPHICS PORT
6. COMPONENTS OF GPU
7. HOW IS 3D ACCELERATION DONE?
8. PERFORMANCE FACTORS OF GPU
9. TYPES OF GPU
10. GEFORCE4
11. GEFORCE4 TI
12. GEFORCE4 MX
13. GEFORCE4 GO
14. CONCLUSION
15. 3D GLOSSARY
16. REFERENCES
RE: graphics processing unit seminar report
Graphics Processing Unit

What is a GPU???

A GPU is a microprocessor designed specifically for the processing of 3D
graphics.
The main purpose of a GPU is to simulate 3D images as realistically as possible
on the computer screen.
A GPU is mainly needed to relieve the CPU of graphical computations so that the
CPU can be used for other processes.
History & Standards
The first graphics cards, introduced in 1981 by IBM, were monochrome cards
designated as Monochrome Display Adapters (MDAs).
Then came the Color Graphics Adapter (CGA) and then the Enhanced Graphics
Adapter (EGA).
IBM introduced the Video Graphics Array (VGA) in 1987. It could support up to
256 colors at resolutions up to 720x400.
Then came the Super Video Graphics Array (SVGA), which supports up to 16.8
million colors and 1280x1024 resolution.
Graphics Hardware Interface
Peripheral Component Interconnect(PCI)
Accelerated Graphics Port (AGP)
Peripheral Component Interconnect Express (PCI-E)
Graphics Processor
The processor is designed specifically to perform floating point calculations,
which are fundamental to 3D graphics processing.
Graphics Accelerator
A graphics accelerator assists graphics processing by executing instructions
concurrently.
Frame Buffer
A frame buffer is a video output device that drives a video display from a
memory buffer containing a complete frame of data.
Memory
Video memory may be used for storing screen image as well as Z-Buffer
which manages the depth coordinates in 3D graphics.
Graphics BIOS
This contains the basic program, which is usually hidden, that governs the
video card's operations and provides the instructions that allow the computer
and software to interact with the card.
Digital-to-Analog Converter (DAC)
The RAMDAC, or Random Access Memory Digital-to-Analog Converter,
converts digital signals to analog signals for use by a computer display that
uses analog inputs, such as CRT displays.
Display Connector
This is the connection system between the video card and display devices
such as a monitor or a television.
Need for 3D Acceleration
Everything that is displayed on the computer screen is 2D, even 3D images,
which are projected to 2D.
The push for more realism, more finely-detailed graphics and faster speeds in
such programs as action games, flight simulators, graphics programs and CAD
applications means that more 3D work must be done in a shorter period of
time. This has resulted in the need for 3D acceleration.
How is 3D acceleration done???
Geometry - The CPU has the job of determining where the objects or images have
to be placed to make up a figure on the screen.
Transform - This involves rotating, scaling and translating an object or image.
Rendering - After the information is taken out of memory, it is sent to the GPU,
which processes the bitmap images to produce an image in 3D.
Filtering - This is done by smoothing over the blocky bitmaps. Textures will
seem more streamlined and realistic.
Double Buffering - This uses two buffers to speed up the computation process.
Data in one buffer is being processed while the next set of data is read into
the other one.
Flat Shading - It shades each polygon of an object based on the angle
between the polygon's surface normal and the direction of the light source,
their respective colors and the intensity of the light source.
Mipmapping - A texture is stored at several sizes, and the version used becomes
bigger as we move towards an object and smaller as we move away from it, as we
perceive it in the real world.
Atmosphere - An effect used in outdoor scenes that blurs objects that are in
the distance.
Lighting - This causes color shading, light reflection, shadows and other effects
to be added to objects based on their position and the position of light sources.
Z-Buffering - This determines which objects, or parts of objects, are visible and
which are hidden behind other objects.
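The flat shading step in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple diffuse (Lambertian) model, treats light_dir as the direction from the surface towards the light, and the function name flat_shade is made up for this example.

```python
import math

def flat_shade(normal, light_dir, surface_color, light_color, intensity=1.0):
    """Return one RGB color for a whole polygon (flat shading).

    Assumption: diffuse-only lighting; real pipelines usually add
    ambient and specular terms on top of this.
    """
    # Normalize so that the dot product equals the cosine of the angle
    # between the surface normal and the light direction.
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light_dir))
    cos_angle = sum(n * l for n, l in zip(normal, light_dir)) / (n_len * l_len)
    # A polygon facing away from the light receives no diffuse light.
    diffuse = max(0.0, cos_angle) * intensity
    # Combine the surface color, the light color and the diffuse factor.
    return tuple(s * l * diffuse for s, l in zip(surface_color, light_color))

# Polygon facing the light head-on keeps its full color.
lit = flat_shade((0, 0, 1), (0, 0, 1), (1.0, 0.5, 0.0), (1, 1, 1))
# Polygon facing away from the light turns black.
unlit = flat_shade((0, 0, 1), (0, 0, -1), (1.0, 0.5, 0.0), (1, 1, 1))
```

Because the color is computed once per polygon, every pixel of that polygon gets the same value, which is what gives flat shading its faceted look.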
Performance Factors of GPU
Fillrate
The fillrate usually refers to the number of pixels a video card can render and
write to video memory in a second.
Memory Bandwidth
Memory bandwidth is the rate at which data can be read from or stored into a
memory by a processor.
Memory Clock
Together with the width of the memory bus, this tells us the amount of memory
bandwidth a graphics card has.
Memory Interface (Memory Bus)
The wider the memory interface, the more data can travel across it at once.
Core Clock
The actual speed at which the graphics processor on a video card operates.
Some Terms Associated With GPU
ANTI ALIASING - Anti-aliasing is a method of fooling the eye into seeing a
jagged edge as smooth.
ROP (Raster Operators) - These handle the task of taking an image (shapes) and
converting it into pixels or dots for output on a screen.
Alpha-blending - It is a technique for rendering transparency.
FPS - Frames per second, the main unit of measure used to describe graphics and
video performance.
Stream Processing
A stream is simply a set of records that require similar computation.
Stream processing is a technique used to accelerate many types of video and
image computations.
Streams provide data parallelism.
GPUs are stream processors: processors that can operate in parallel by
running a single kernel on many records in a stream at once.
Kernels are the functions that are applied to each element in the stream.
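The relationship between a stream and a kernel can be sketched in plain Python. This is a sequential illustration only: the names brighten_kernel and run_kernel are hypothetical, and on a real GPU each record would be handled by a separate processing element at the same time.

```python
def brighten_kernel(record):
    """Hypothetical kernel: scale one pixel's brightness by 1.5, clamped to 255."""
    return min(255, int(record * 1.5))

def run_kernel(kernel, stream):
    # The same kernel is applied to every record in the stream. No record
    # depends on another, which is what makes the work data-parallel.
    return list(map(kernel, stream))

pixels = [0, 100, 200, 255]           # the stream: records needing similar computation
result = run_kernel(brighten_kernel, pixels)
print(result)                          # [0, 150, 255, 255]
```

The important property is that the kernel looks at one record at a time, so the order in which records are processed does not affect the result.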
API (Application Program Interface)
It is a set of routines, protocols, and tools for building software applications.
Glide by 3dfx
OpenGL by Silicon Graphics
Direct3D by Microsoft
TYPES OF GPU
DEDICATED GPU
INTEGRATED GRAPHICS SOLUTIONS
HYBRID SOLUTIONS
Some Examples of GPU
NVIDIA'S GEFORCE
GeForce 2 series, GeForce 3 series, GeForce 4 series, GeForce 5 series, GeForce
6 series, GeForce 7 series, GeForce 8 series, GeForce 9 series, GeForce
100 series, GeForce 200 series, GeForce 300 series
Latest being the GeForce 400 series
AMD'S RADEON
Mach series, Rage series, Radeon R100 series, Radeon R200 series, Radeon
R300 series, Radeon R400 series, Radeon R500 series, Radeon R600 series,
Radeon R700 series.
Latest being the Evergreen (HD 5000) series
CUDA
CUDA (Compute Unified Device Architecture) is a parallel computing
architecture developed by NVIDIA.
CUDA exploits the parallel computational power of the GPU, where hundreds
of on-chip processor cores simultaneously communicate and cooperate to
solve complex computing problems, transforming the GPU into a massively
parallel processor.
The programming interface of the CUDA technology uses a familiar
programming language (NVIDIA's C-like syntax) to code algorithms
that send calculations to the GPU.
Conclusion
From the introduction of the first GPUs in the 1970s to the most recent
ones manufactured today, the world of graphics has changed
enormously and would never have been the same without them.
Today a lot of applications have become faster and more efficient by using GPU
technology, thus saving a lot of time in many scenarios.
RE: graphics processing unit seminar report
GRAPHICS PROCESSING UNIT
(GPU)
ABSTRACT

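A Graphics Processing Unit (GPU) is a microprocessor that has been designed
specifically for the processing of 3D graphics. The processor is built with
integrated transform, lighting, triangle setup/clipping, and rendering engines,
capable of handling millions of math-intensive processes per second. GPUs
allow products such as desktop PCs, portable computers, and game consoles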
to process real-time 3D graphics that only a few years ago were only
available on high-end workstations. Used primarily for 3-D applications, a
graphics processing unit is a single-chip processor that creates lighting
effects and transforms objects every time a 3D scene is redrawn. These are
mathematically-intensive tasks, which otherwise, would put quite a strain on
the CPU.
CONTENTS
1. INTRODUCTION
2. WHAT'S A GPU
3. DIFFERENCE B/W CPU AND GPU
4. HISTORY & STANDARDS
5. PERIPHERAL COMPONENT INTERCONNECT
6. ACCELERATED GRAPHICS PORT
7. PERIPHERAL COMPONENT INTERCONNECT-EXPRESS
8. COMPONENTS OF GPU
9. HOW IS 3D ACCELERATION DONE
10. PERFORMANCE FACTORS OF GPU
11. TERMS ASSOCIATED WITH GPU
12. STREAM PROCESSING
13. TYPES OF GPU
14. APPLICATION PROGRAM INTERFACE
15. MANUFACTURERS AND EXAMPLES
16. APPLICATION OF GPU
17. LATEST IN GPU
INTRODUCTION
There are various applications that require a 3-dimensional (3D) world to be
simulated in a realistic manner on a computer screen. These include 3D
animations in games, movies and other real world simulations. It takes
immense computing power to represent a 3D image due to the enormous
amount of information and the complex mathematical operations that need
to be processed to project this 3D image onto a computer screen. In this
situation, the processing time and bandwidth are at a premium due to large
amounts of both computation and data.
The functional purpose of a GPU is to provide separate, dedicated graphics
resources, including a graphics processor and memory, to relieve some of the
burden on the main system resources, namely the Central Processing
Unit (CPU), Main Memory, and the System Bus, which would otherwise get
saturated with graphical operations and I/O requests. The abstract goal of a
GPU, however, is to enable a representation of a 3D world as realistically as
possible. So these GPUs are designed to provide additional computational
power that is customized specifically to perform these 3D tasks.
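WHAT'S A GPU????
A Graphics Processing Unit (GPU) is a microprocessor that has been designed
specifically for the processing of 3D graphics. The processor is built with
integrated transform, lighting, triangle setup/clipping, and rendering engines,
capable of handling millions of math-intensive processes per second. GPUs
form the heart of modern graphics cards, relieving the CPU (central
processing unit) of much of the graphics processing load. GPUs allow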
products such as desktop PCs, portable computers, and game consoles to
process real-time 3D graphics.
Used primarily for 3-D applications, a graphics processing unit is a single-chip
processor that creates lighting effects and transforms objects every time a
3D scene is redrawn. These are mathematically-intensive tasks, which
otherwise would put quite a strain on the CPU. Lifting this burden from the
CPU frees up cycles that can be used for other jobs.
However, the GPU is not just for playing 3D-intense video games or for those
who create graphics (sometimes referred to as graphics rendering or content
creation); it is a crucial component of the PC's overall system
speed. To fully appreciate the graphics card's role, it must first be
understood.
Many synonyms exist for Graphics Processing Unit, the most popular
being the graphics card. It is also known as a video card, video accelerator,
video adapter, video board, graphics accelerator, or graphics adapter.
DIFFERENCE B/W CPU & GPU
The Central Processing Unit, or CPU, is where all the program instructions are
executed.
The Graphics Processing Unit, or GPU, is a dedicated piece of hardware that
processes graphics.
GPUs have a lot more transistors than CPUs, but GPU transistors are clocked at
a much slower rate than CPU transistors.
A GPU is called a parallel processor because there are usually several
processors on one chip that are specialized at interpreting and drawing
graphics onto a display very quickly.
A CPU, on the other hand, tends to deal with one instruction at a time.
HISTORY AND STANDARDS
The first graphics cards, introduced in August of 1981 by IBM, were
monochrome cards designated as Monochrome Display Adapters (MDAs). The
displays that used these cards were typically text-only, with green or white
text on a black background. Color for IBM-compatible computers appeared on
the scene with the 4-color Hercules Graphics Card (HGC), followed by the 8-
color Color Graphics Adapter (CGA) and 16-color Enhanced Graphics Adapter
(EGA). During the same time, other computer manufacturers, such as
Commodore, were introducing computers with built-in graphics adapters that
could handle a varying number of colors.
When IBM introduced the Video Graphics Array (VGA) in 1987, a new graphics
standard came into being. A VGA display could support up to 256 colors (out
of a possible 262,144-color palette) at resolutions up to 720x400.
Over the years, VGA gave way to Super Video Graphics Array (SVGA). SVGA
cards were based on VGA, but each card manufacturer added resolutions and
increased color depth in different ways. Eventually, the Video Electronics
Standards Association (VESA) agreed on a standard implementation of SVGA
that provided up to 16.8 million colors and 1280x1024 resolution. Most
graphics cards available today support Ultra Extended Graphics Array
(UXGA). UXGA can support a palette of up to 16.8 million colors and
resolutions up to 1600x1200 pixels.
Even though any card you can buy today will offer more colors and higher
resolution than the basic VGA specification, VGA mode is the basic standard
for graphics and is the minimum on all cards. Many graphics cards used
the Peripheral Component Interconnect (PCI) and the Accelerated Graphics
Port (AGP). Nowadays we use Peripheral Component Interconnect Express
(PCI-E) slots.
PERIPHERAL COMPONENT INTERCONNECT(PCI)
There are a lot of incredibly complex components in a computer, and all of
these parts need to communicate with each other in a fast and efficient
manner. Essentially, a bus is the channel or path between the components in
a computer. During the early 1990s, Intel introduced a new bus standard for
consideration, the Peripheral Component Interconnect (PCI). It provides direct
access to system memory for connected devices, but uses a bridge to
connect to the front side bus and therefore to the CPU.
PCI originally operated at 33 MHz using a 32-bit-wide path. Revisions to the
standard include increasing the speed from 33 MHz to 66 MHz and doubling
the bit count to 64. Currently, PCI-X provides for 64-bit transfers at a speed of
133 MHz for an amazing 1-GBps (gigabyte per second) transfer rate!
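The transfer rates quoted for PCI follow from multiplying the clock rate by the bus width. A minimal sketch of that arithmetic, assuming one transfer per clock cycle and 1 MB = 10^6 bytes (the function name is made up for illustration):

```python
def parallel_bus_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Peak bandwidth of a parallel bus in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = width_bits / 8
    return clock_mhz * bytes_per_transfer * transfers_per_clock

pci_original = parallel_bus_bandwidth_mb_s(33, 32)    # 132.0 MB/s
pci_x = parallel_bus_bandwidth_mb_s(133, 64)          # 1064.0 MB/s, about 1 GBps
```

These are theoretical peaks; a shared bus delivers less in practice because devices take turns using it.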
Although Intel proposed the PCI standard in 1991, it did not achieve
popularity until the arrival of Windows 95. This sudden interest in PCI was due
to the fact that Windows 95 supported a feature called Plug and Play (PnP).
PnP means that you can connect a device into your computer and it is
automatically recognized and configured to work in the system.
ACCELERATED GRAPHICS PORT (AGP)
The need for streaming video and real-time-rendered 3-D games requires an
even faster throughput than that provided by PCI. In 1996, Intel debuted the
Accelerated Graphics Port (AGP), a modification of the PCI bus designed
specifically to facilitate the use of streaming video and high-performance
graphics.
AGP is a high-performance interconnect between the core-logic chipset and
the graphics controller for enhanced graphics performance for 3D
applications. AGP relieves the graphics bottleneck by adding a dedicated
high-speed interface directly between the chipset and the graphics controller.
The current PCI bus supports a data transfer rate up to 132 MB/s, while AGP
(at 66 MHz) supports up to 266 MB/s. AGP 2x doubles this to 533 MB/s through
its ability to transfer data on both the rising and falling edges of the
66 MHz clock. The AGP slot typically provides performance which is 4 to 8
times faster than the PCI slots inside your computer.
PERIPHERAL COMPONENT INTERCONNECT-EXPRESS (PCI-E)
PCI-E, unlike previous PC expansion standards, is structured around
point-to-point serial links, a pair of which (one in each direction) makes up
a lane, rather than a shared parallel bus. These lanes are routed by a hub on
the mainboard acting as a crossbar switch. This dynamic point-to-point behavior
allows more than one pair of devices to communicate with each other at the
same time. In contrast, older PC interfaces had all devices permanently wired
to the same bus; therefore, only one device could send information at a time.
This format also allows channel grouping, where multiple lanes are bonded to
a single device pair in order to provide higher bandwidth.
The PCI-E 2.0 standard supports a data transfer rate of 500 MB/s per lane in
each direction. It uses a signaling rate of 5.0 GT/s (gigatransfers per
second), while the first version operates at 2.5 GT/s.
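The 500 MB/s figure for PCI-E 2.0 can be reproduced from its signaling rate. The sketch below assumes the 8b/10b line encoding used by the first two PCI-E generations, in which every 8 bits of data are sent as 10 bits on the wire; the function name is hypothetical.

```python
def pcie_lane_bandwidth_mb_s(transfer_rate_gt_s, payload_bits=8, line_bits=10):
    """Usable bandwidth of one lane, in one direction, in MB/s."""
    raw_mbit_s = transfer_rate_gt_s * 1000              # one bit per transfer
    data_mbit_s = raw_mbit_s * payload_bits / line_bits  # strip encoding overhead
    return data_mbit_s / 8                               # bits to bytes

gen1_lane = pcie_lane_bandwidth_mb_s(2.5)    # 250.0 MB/s
gen2_lane = pcie_lane_bandwidth_mb_s(5.0)    # 500.0 MB/s
gen2_x16 = 16 * gen2_lane                    # 8000.0 MB/s for a 16-lane slot
```

Bonding lanes multiplies the per-lane figure, which is why graphics cards typically use a 16-lane slot.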
Currently PCI-E 3.0 is being developed and is said to have an 8 GT/s transfer
rate. It includes a number of optimizations for enhanced signaling and data
integrity, including transmitter and receiver equalization, PLL improvements,
clock data recovery, and channel enhancements for currently supported
COMPONENTS OF GPU
There are several components on a typical graphics card:
Graphics Processor
A GPU is a dedicated processor optimized for accelerating graphics. The
processor is designed specifically to perform floating-point calculations, which
are fundamental to 3D graphics rendering. The main attributes of the GPU
are the core clock frequency, which typically ranges from 250 MHz to 4 GHz,
and the number of pipelines (vertex and fragment shaders), which translate a
3D image characterized by vertices and lines into a 2D image formed by
pixels.
Graphics accelerator
A graphics accelerator assists graphics rendering by supplying primitives that
it can execute concurrently with and more efficiently than the x86 CPU. One
reason the accelerator can be more efficient than the CPU is that it lives
closer to the graphics memory; it does not have to transfer raw pixel data
over a slow (relative to the speed of the graphics RAM) general bus and
chipset. But the main reason a graphics accelerator improves overall graphics
performance is that it executes concurrently with the CPU. This means
that while the CPU is calculating the coordinates for the next set of graphics
commands to issue, the graphics accelerator can be busy filling in the
polygons for the current set of graphics commands. This dividing up of
computation is often referred to as load balancing.
Frame buffer
A frame buffer is a video output device that drives a video display from a
memory buffer containing a complete frame of data.
The information in the memory buffer typically consists of color values for
every pixel (point that can be displayed) on the screen. Color values are
commonly stored in 1-bit monochrome, 4-bit palettized, 8-bit palettized,
16-bit high color and 24-bit true color formats. An additional alpha channel is
sometimes used to retain information about pixel transparency. The total
amount of the memory required to drive the framebuffer depends on the
resolution of the output signal, and on the color depth and palette size.
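The dependence of frame buffer size on resolution and color depth is simple arithmetic. A minimal sketch, ignoring palette storage and any row padding that real hardware may add (the function name and the example resolutions are chosen only for illustration):

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one complete frame at the given color depth."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8      # round up to whole bytes

vga_8bit = framebuffer_bytes(640, 480, 8)          # 307200 bytes
xga_true_color = framebuffer_bytes(1024, 768, 24)  # 2359296 bytes (2.25 MiB)
sxga_with_alpha = framebuffer_bytes(1280, 1024, 32)  # 5242880 bytes (5 MiB)
```

Double or triple buffering multiplies these figures by the number of buffers kept in video memory.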
Memory
The memory capacity of most modern video cards ranges from 128 MB to 4
GB, though very few cards actually go over 1 GB. Since video memory needs
to be accessed by the GPU and the display circuitry, it often uses special
high-speed or multi-port memory, such as VRAM, WRAM, SGRAM, etc. Around
2003, the video memory was typically based on DDR technology. During and
after that year, manufacturers moved towards DDR2, GDDR3, GDDR4, and
even GDDR5. The effective memory clock rate in modern cards is
generally between 400 MHz and 3.8 GHz. Video memory may be used for
storing other data as well as the screen image, such as the Z-buffer, which
manages the depth coordinates in 3D graphics, textures, vertex buffers, and
compiled shader programs.
Graphics BIOS
The video BIOS or firmware contains the basic program, which is usually
hidden, that governs the video card's operations and provides the
instructions that allow the computer and software to interact with the card. It
may contain information on the memory timing, operating speeds and
voltages of the graphics processor, RAM, and other information. It is
sometimes possible to change the BIOS (e.g. to enable factory-locked
settings for higher performance), although this is typically only done by video
card overclockers and has the potential to irreversibly damage the card.
Digital-to-Analog Converter (DAC)
The RAMDAC, or Random Access Memory Digital-to-Analog Converter,
converts digital signals to analog signals for use by a computer display that
uses analog inputs such as CRT displays. The RAMDAC is a kind of RAM chip
that regulates the functioning of the graphics card. Depending on the number
of bits used and the RAMDAC data-transfer rate, the converter will be able to
support different computer-display refresh rates. With CRT displays, it is best
to work over 75 Hz and never under 60 Hz, in order to minimize flicker. (With
LCD displays, flicker is not a problem.) Due to the growing popularity of
digital computer displays and the integration of the RAMDAC onto the GPU
die, it has mostly disappeared as a discrete component. All current LCDs,
plasma displays and TVs work in the digital domain and do not require a
RAMDAC. There are a few remaining legacy LCD and plasma displays that
feature analog inputs (VGA, component, SCART etc.) only. These require a
RAMDAC, but they reconvert the analog signal back to digital before they can
display it, with the unavoidable loss of quality stemming from this digital-to-
analog-to-digital conversion.
Display Connector
The most common connection systems between the video card and the
computer display are:
Video Graphics Array (VGA) (DE-15)
Analog-based standard adopted in the late 1980s designed for CRT displays,
also called VGA connector. Some problems of this standard are electrical
noise, image distortion and sampling error evaluating pixels.
Digital Visual Interface (DVI)
Digital-based standard designed for displays such as flat-panel displays
(LCDs, plasma screens, wide high-definition television displays) and video
projectors. It avoids image distortion and electrical noise by mapping each
pixel from the computer to a display pixel, using its native resolution.
Video in Video out (VIVO) for S video, composite video and component video
Included to allow connection with televisions, DVD players, video
recorders and video game consoles. They often come in two 9-pin Mini-DIN
connector variations, and the VIVO splitter cable generally comes with either
4 connectors (S-Video in and out + composite video in and out), or 6
connectors (S-Video in and out + component PB out + component PR out +
component Y out [also composite out] + composite in).
High Definition Multimedia Interface (HDMI)
An advanced digital audio/video interconnect released in 2003 that is
commonly used to connect game consoles and DVD players to a display.
HDMI supports copy protection through HDCP.
DisplayPort
An advanced license- and royalty-free digital audio/video interconnect
released in 2007. DisplayPort is intended to replace VGA and DVI for connecting
a display to a computer.
Other types of connection systems
Composite video
Analog system with lower resolution; it uses the RCA connector
Component video
It has three cables, each with RCA connector (YCBCR for digital component,
or YPBPR for analogue component); it is used in projectors, DVD players and
some televisions.
DB13W3
An analog standard once used by Sun Microsystems, SGI and IBM.

DMS-59
A connector that provides two DVI outputs on a single connector.
Computer (Bus) Connector - This is usually Accelerated Graphics Port (AGP).
This port enables the video card to directly access system memory. Direct
memory access helps to make the peak bandwidth four times higher than the
Peripheral Component Interconnect (PCI) bus adapter card slots. This allows
the central processor to do other tasks while the graphics chip on the video
card accesses system memory.
Internal Organization of GPU
The Need for 3D Acceleration
It may be a valid question to ask why it is that special 3D cards are needed
today. After all, everything that is displayed on the computer screen is 2D,
even 3D images projected to 2D. And 3D graphics have been used on
computers for years.
The reason that specialized 3D accelerators are becoming popular is that
software today is trying to do more in 3D than has ever been done before.
The push for more realism, more finely-detailed graphics, and faster speeds
in such programs as action games, flight simulators, graphics programs and
CAD applications, means that more 3D work must be done in a shorter period
of time.
There is an obvious parallel between today's quandary with 3D and a similar
one that occurred in the early 90s when graphical operating systems became
popular. At that time, most video cards had no acceleration functions at all.
When people started running Windows, their CPU had to do all the work of
drawing all the graphics on the screen, which caused everything to slow
down tremendously. To combat this problem, accelerators were designed
that did much of this work with specialized hardware, instead of forcing the
system processor to do it.
Similarly today, it is not necessary to have a 3D graphics card to do 3D
graphics, but the large amount of computation work necessary to translate
3D images to 2D in a realistic manner means that without specialized
hardware to do this work, it must be done by the processor, using much
slower software. Using a 3D accelerator allows programs--especially games,
where the screen image must be recomputed many times per second--to
display virtual 3D worlds with a level of detail and color that is impossible
with a standard 2D video card.
HOW IS 3D ACCELERATION DONE??????
The primary steps are geometry, transform, and rendering, all of which are
handled by both the system CPU and the video card processor. Here are the
steps in a better perspective. All steps in 3D are handled in scenes.
-The Geometry, well we all know what a triangle looks like, right? Well your
CPU has the job of determining where the triangles have to be placed to make
an object on screen. This is usually done over a wire frame as seen in AutoCAD
programs.
-Transform, now we have to put these triangles together. Your CPU will put a
model of this image together in system memory on a wire frame. Not only
this but we have to figure out the lighting of the triangles while in memory.
This usually occurs on your system's CPU and not on the video card
processor.
-Render, now after taking information out of memory it is most likely sent to
the video card's processor. When information hits the graphics board
processor we will put those bitmap images over our triangles thus making an
image in 3D.
Now let's look at some more steps that can come into play with 3D
operations. These are added to the basic steps above to make for a better scene.
-Filter, this is bi-linear and tri-linear filtering. This is done by smoothing over
those blocky bitmaps. Textures will seem more streamlined and realistic. If you
play Quake II you know what I mean. If you have played Quake I you are also
a witness to the blocky square looking graphics.
-Double Buffering, as we discussed earlier. We need to display a scene and
work on one at the same time. This will give you a streamlined appearance. If
your buffering is out of whack you will know it soon.
-Flat Shading, this is similar to the filters above. We are taking those triangles
with color and more or less bleeding them together. This really adds more to
the realism rather than a red block here and a blue block next to it.
-Mipmapping, this is overlooked but very important in game play. Let's say
you're walking inside a scene towards a dog; you want the dog to be bigger when
you are standing right in front of it than when you saw it a mile away. This is
mipmapping. Textures are basically swapped while moving to add more
realism to the scene. Cool huh!?
-Atmosphere, if your dog in the scene above is smoking a cigarette we also
want to see the hazy smoke. This is where the atmosphere can come into play.
On flight simulators you will see haze or fog while flying; this is another
example of atmosphere.
-Lighting, this effect will light up the dog so you can see it or even make his
cigarette flare a little. Your dog may even show the light intensify on his
snout along with the haze over him; this is where lighting comes in.
-Z-Buffering, this comes down to objects that are obscured by another. We
don't want to draw that part of the scene until it is moved. This is another
way of improving performance.
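To make the Z-buffering step more concrete, here is a small software sketch of the depth test. It assumes that a smaller depth value means closer to the viewer and that the scene has already been broken into per-pixel fragments; the function name and data layout are invented for this example and do not describe any particular card.

```python
def draw_fragments(width, height, fragments):
    """Keep, for every pixel, only the color of the closest fragment.

    fragments: iterable of (x, y, depth, color) tuples.
    """
    far = float("inf")
    depth_buffer = [[far] * width for _ in range(height)]   # the Z-buffer
    color_buffer = [[None] * width for _ in range(height)]  # the frame buffer
    for x, y, depth, color in fragments:
        # Depth test: draw only if this fragment is in front of what is stored.
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
    return color_buffer

scene = [
    (0, 0, 5.0, "red"),     # drawn first
    (0, 0, 2.0, "blue"),    # closer, so it replaces red
    (0, 0, 9.0, "green"),   # farther away, so it is rejected
    (1, 0, 1.0, "white"),
]
image = draw_fragments(2, 1, scene)
print(image)    # [['blue', 'white']]
```

The result does not depend on the order in which the fragments arrive, which is why the hardware does not need to sort objects by distance first.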
PERFORMANCE FACTORS OF GPU
FILLRATE
The fillrate usually refers to the number of pixels a video card can render and
write to video memory in a second. In this case, fillrates are given in
megapixels per second or in gigapixels per second (in the case of newer
cards), and they are obtained by multiplying the number of raster operations
(ROPs) by the clock frequency of the graphics processor unit (GPU) of a video
card.
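The pixel fillrate calculation described above is a single multiplication. In this sketch the card, with 16 ROPs and a 600 MHz core clock, is a made-up example rather than a real product.

```python
def pixel_fillrate_mpixels_s(rops, core_clock_mhz):
    """Theoretical pixel fillrate in megapixels per second."""
    return rops * core_clock_mhz

fillrate = pixel_fillrate_mpixels_s(rops=16, core_clock_mhz=600)
print(fillrate, "megapixels/s")    # 9600 megapixels/s, i.e. 9.6 gigapixels/s
```

This is a peak figure; the fillrate seen in practice is often limited by memory bandwidth instead.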
Texture fill rate is the number of textured pixels the card can render to the
screen every second. To render a 3D scene, textures are mapped over the top of
polygon meshes. This is called texture mapping and is accomplished by texture
mapping units (TMUs) on the video card. Texture fill rate is a measure of the
speed with which a particular card can perform texture mapping.
MEMORY BANDWIDTH
Memory bandwidth is the rate at which data can be read from or stored into a
semiconductor memory by a processor. Memory bandwidth is usually
expressed in units of bytes/second, though this can vary for systems with
natural data sizes that are not a multiple of the commonly used 8-bit bytes.
Memory bandwidth that is advertised for a given memory or system is usually
the maximum theoretical bandwidth. In practice the observed memory
bandwidth will be less than (and is guaranteed not to exceed) the advertised
bandwidth. A variety of computer benchmarks exist to measure sustained
memory bandwidth using a variety of access patterns. These are intended to
provide insight into the memory bandwidth that a system should sustain on
various classes of real applications.
CORE CLOCK
The actual speed at which the graphics processor on a video card operates.
Core clock is measured in megahertz (MHz). The clock speed of a chip,
combined with the number/configuration of the pipelines in the chip, gives a
pretty accurate picture of what the performance of the chip will be.
The core clock speed can sometimes be changed on newer cards where users
want to gain a performance boost. This is called overclocking and it can
usually be done using third-party utilities or the drivers provided by the video
card manufacturer.
MEMORY CLOCK
The memory clock, along with the size of the memory bus, tells us the
amount of memory bandwidth a graphics card has. The more memory
bandwidth a card has, the better it can handle higher resolutions.
Like the core clock, the memory clock of most cards can be manually
increased through the driver, though highly overclockable memory is
somewhat rare.
Memory Interface (Memory Bus):
The wider the memory interface, the more data can travel across it at once.
A good analogy is that an 8-lane highway allows more cars to travel
than a 4-lane highway. The unit for measuring the memory interface is bits (for
example, 256 bit).
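How the memory clock and the memory interface width combine into memory bandwidth can be shown with a short calculation. The sketch assumes the clock given is the effective (data) rate and that 1 GB = 10^9 bytes; the 2000 MHz card is a hypothetical example.

```python
def memory_bandwidth_gb_s(effective_clock_mhz, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    mbytes_per_s = effective_clock_mhz * bus_width_bits / 8   # MB/s
    return mbytes_per_s / 1000

wide_bus = memory_bandwidth_gb_s(2000, 256)     # 64.0 GB/s
narrow_bus = memory_bandwidth_gb_s(2000, 128)   # 32.0 GB/s
```

At the same memory clock, halving the bus width halves the bandwidth, which is the point of the highway analogy.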
SOME TERMS ASSOCIATED WITH GPU
ANTI ALIASING
Anti-aliasing is a method of fooling the eye into seeing a jagged edge as
smooth. Anti-aliasing is often referred to in games and on graphics cards. In
games especially, the chance to smooth the edges of images goes a long way
towards creating a realistic 3D image on the screen. Remember, though, that
anti-aliasing does not actually smooth any edges of images; it merely fools the
eye.
ROP (Raster Operators)
The Raster Operator manages one of the last phases of image processing before
the image is displayed on the monitor screen. The Raster Operators make sure the
images are compressed neatly (if they come from a larger one), don't overlap one
another (anti-aliasing) and are sent to the memory (output buffer) where they
are stored before being displayed on the monitor.
Alpha-blending
It is a technique for rendering transparency. An extra value is added to the
pixels of a texture map to define how easy it is to look through the pixel.
This way it is possible to look through things, and effects like realistic
water and glass are possible.
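A common way to use the alpha value is to mix the incoming pixel with the pixel already in the frame buffer. The sketch below assumes this standard "over" formula with alpha between 0.0 (fully transparent) and 1.0 (fully opaque); the function name is made up for illustration.

```python
def alpha_blend(src, dst, alpha):
    """Blend a source color over a destination color, channel by channel."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))

blue_glass = (0, 0, 255)
white_wall = (255, 255, 255)
# Half-transparent blue glass in front of a white wall.
seen = alpha_blend(blue_glass, white_wall, 0.5)
print(seen)    # (127.5, 127.5, 255.0)
```

With alpha set to 1.0 the source completely hides what is behind it, and with 0.0 it is invisible.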
Texture Mapping
In 3D graphics, texture mapping is the process of adding a graphic pattern to
the polygons of a 3D scene. Unlike simple shading, which applies colors to the
underlying polygons of the scene, texture mapping applies simple textured
graphics, also known as patterns or more commonly "tiles", to simulate walls,
floors, the sky, and so on.
FPS
FPS stands for Frames per Second. This is the main unit of measure that is
used to describe graphics and video performance.
STREAM PROCESSING
A newer concept is to use a general-purpose graphics processing unit as a
modified form of stream processor. This turns the massive floating-point
computational power of a modern graphics accelerator's shader pipeline into
general-purpose computing power, rather than leaving it hard-wired solely for
graphical operations. In certain applications requiring massive vector
operations, this can yield performance several orders of magnitude higher
than a conventional CPU.
The main feature here is parallel processing: carrying out many operations at
the same time. Stream processors are essentially the smaller processors inside
the GPU that do the work of forming and displaying images. Their major
function is to act as the processing units for the hardware's parallel
workloads.
For example, stream processors may be used to render (create images of) grass
on a field in an open area, or the walls of a closed room. Having more stream
processors generally improves the performance of a graphics card
significantly.
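The stream model can be sketched as one small function (a "kernel") applied
independently to every element of an input stream. The example below is a
loose CPU-side analogy using Python's standard thread pool; it shows the
programming pattern only and gives none of the speed of real stream
processors. The kernel (a * x + y) and all names are assumptions made for
illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(a, x, y):
    """The per-element operation: each call is independent of the others."""
    return a * x + y


def run_stream(a, xs, ys, workers=4):
    """Apply the kernel to every (x, y) pair, spreading work over workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: saxpy_kernel(a, *pair), zip(xs, ys)))


result = run_stream(2.0, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
print(result)
```

Because no element depends on any other, the work can be divided among as
many processors as are available, which is why adding stream processors
helps.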
TYPES OF GPU
DEDICATED GPU
A dedicated GPU is not necessarily removable, nor does it necessarily
interface with the motherboard in a standard fashion. The term "dedicated"
refers to the fact that dedicated graphics cards have RAM that is dedicated to
the card's use, not to the fact that most dedicated GPUs are removable.
Dedicated GPUs for portable computers are most commonly interfaced
through a non-standard and often proprietary slot due to size and weight
constraints. Such ports may still be considered PCIe or AGP in terms of their
logical host interface, even if they are not physically interchangeable with
their counterparts.
Technologies such as SLI by NVIDIA and CrossFire by ATI allow multiple GPUs
to be used to draw a single image, increasing the processing power available
for graphics.
INTEGRATED GRAPHICS SOLUTIONS (SHARED GRAPHICS SOLUTIONS)
These are graphics processors that use a portion of a computer's system RAM
rather than dedicated graphics memory. Computers with integrated graphics
account for 90% of all PC shipments[6]. These solutions are less costly to
implement than dedicated graphics solutions, but are less capable.
Historically, integrated solutions were often considered unfit to play 3D
games or run graphically intensive programs such as Adobe Flash. Examples of
such IGPs are the offerings from SiS and VIA circa 2004.
HYBRID SOLUTIONS
Hybrid solutions share memory with the system and add a small dedicated
memory cache to make up for the high latency of the system RAM. The most
common implementations are ATI's HyperMemory and NVIDIA's TurboCache. Hybrid
graphics cards are somewhat more expensive than integrated graphics, but much
less expensive than dedicated graphics cards. Technologies within PCI Express
make this approach possible.
API (Application Programming Interface)
An API is a set of routines, protocols, and tools for building software
applications. A good API makes it easier to develop a program by providing all
the building blocks. Game programmers use APIs to program 3D functions more
easily and so that the programs they write will run on more types of
hardware.
The three most popular graphics APIs are:
• Glide by 3dfx,
• OpenGL by Silicon Graphics (and Microsoft), and
• Direct3D by Microsoft, as part of its DirectX multimedia package.
GLIDE (3DFX)
Glide is a proprietary 3D graphics API developed by 3dfx Interactive for their
Voodoo Graphics 3D accelerator cards. It was dedicated to gaming
performance, supporting geometry and texture mapping primarily, in data
formats identical to those used internally in their cards. Further refinement of
Microsoft's Direct3D and full OpenGL implementations from other graphics
card vendors, in addition to growing competition in 3D hardware, eventually
caused Glide to become superfluous.
DIRECTX
The DirectX software development kit (SDK) consists of runtime libraries in
redistributable binary form, along with accompanying documentation and
headers for use in coding.
Direct3D is used to render three dimensional graphics in applications where
performance is important, such as games. Direct3D also allows applications
to run fullscreen instead of embedded in a window, though they can still run
in a window if programmed for that feature. Direct3D uses hardware
acceleration if it is available on the graphics card, allowing for hardware
acceleration of the entire 3D rendering pipeline or even only partial
acceleration. Direct3D exposes the advanced graphics capabilities of 3D
graphics hardware, including z-buffering, anti-aliasing, alpha blending,
mipmapping, atmospheric effects, and perspective-correct texture mapping.
Integration with other DirectX technologies enables Direct3D to deliver such
features as video mapping, hardware 3D rendering in 2D overlay planes, and
even sprites, providing the use of 2D and 3D graphics in interactive media
titles.
OPENGL
OpenGL is the industry-standard interface to modern programmable graphics
hardware, widely used for games, animation, CAD/CAM, medical imaging, and
other applications that visualize and manipulate high-performance 2D and 3D
graphics. Apple's implementation of OpenGL has been optimized for the
Macintosh platform and includes a suite of extensions that give developers
access to advanced graphics hardware capabilities. OpenGL's basic operation is
to accept primitives such as points, lines and polygons, and convert them into
pixels. This is done by a graphics pipeline known as the OpenGL state machine.
Most OpenGL commands either issue primitives to the graphics pipeline, or
configure how the pipeline processes these primitives.
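The split between commands that configure state and commands that issue
primitives can be pictured with a toy model. The class below is purely
illustrative: its names are hypothetical and it is not the OpenGL API. It only
mimics the idea that a primitive is drawn using whatever state is current when
it is issued.

```python
class ToyPipeline:
    """Toy state machine: state-setting calls affect later draw calls."""

    def __init__(self):
        self.current_color = (255, 255, 255)  # pipeline state
        self.framebuffer = {}                 # (x, y) -> color

    def set_color(self, r, g, b):
        """Configure the pipeline; draws nothing by itself."""
        self.current_color = (r, g, b)

    def draw_point(self, x, y):
        """Issue a point primitive, converted to one pixel."""
        self.framebuffer[(x, y)] = self.current_color

    def draw_horizontal_line(self, x0, x1, y):
        """Issue a line primitive, converted to a run of pixels."""
        for x in range(x0, x1 + 1):
            self.draw_point(x, y)


pipe = ToyPipeline()
pipe.set_color(255, 0, 0)
pipe.draw_point(0, 0)               # drawn in red
pipe.set_color(0, 0, 255)
pipe.draw_horizontal_line(1, 3, 0)  # drawn in blue
print(pipe.framebuffer)
```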
MANUFACTURERS OF GPU
AMD (ATI division), Matrox Graphics, Nvidia, S3 Graphics, Intel, SiS, PowerV
SOME EXAMPLES OF GPU
NVIDIA'S GEFORCE
GeForce 2 series, GeForce 3 series, GeForce 4 series, GeForce 5 series,
GeForce 6 series, GeForce 7 series, GeForce 8 series, GeForce 9 series,
GeForce 100 series, GeForce 200 series
Latest being the GeForce 300 series
AMD'S RADEON
Mach series, Rage series, Radeon R100 series, Radeon R200 series,
Radeon R300 series, Radeon R400 series, Radeon R500 series,
Radeon R600 series, Radeon R700 series
Latest being the Evergreen series (Radeon HD 5000)
APPLICATIONS OF GPU
Interactive gaming: the gaming industry
Motion pictures: film production and cinematography
Computer-aided design and manufacturing: software such as Autodesk products
Visual simulations: pilot and driver training
General computing: emerging areas of research such as molecular modelling
Consumer devices: handhelds, consoles, cell phones
LATEST IN GPU
CUDA (an acronym for Compute Unified Device Architecture) is a parallel
computing architecture developed by NVIDIA. CUDA is the computing engine in
NVIDIA graphics processing units (GPUs) that is accessible to software
developers through industry-standard programming languages. Programmers use
'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C
compiler,[1] to code algorithms for execution on the GPU. The CUDA
architecture supports a range of computational interfaces including OpenCL[2]
and DirectCompute[3]. Third-party wrappers are also available for Python,
Fortran, Java and Matlab.
The latest drivers all contain the necessary CUDA components. CUDA works
with all NVIDIA GPUs from the G8X series onwards, including GeForce, Quadro
and the Tesla line.
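In 'C for CUDA', work is launched as a grid of thread blocks, and each thread
typically computes its own element index as
blockIdx.x * blockDim.x + threadIdx.x. The sketch below emulates that indexing
scheme in plain Python, running the "threads" one after another, so it shows
only the structure and not the parallelism. The function name and the block
size of 4 are assumptions made for illustration.

```python
def launch_vector_add(a, b, block_dim=4):
    """Emulate a 1D kernel launch that adds two equal-length vectors."""
    n = len(a)
    out = [0] * n
    # Enough blocks to cover all n elements (rounded up).
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global element index
            if i < n:  # threads past the end of the data do nothing
                out[i] = a[i] + b[i]
    return out


total = launch_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
print(total)
```

The bounds check matters because the grid is rounded up to whole blocks: with
five elements and blocks of four, the second block has three threads with
nothing to do.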
The next-generation CUDA architecture, code-named Fermi, is described by
NVIDIA as its most advanced GPU computing architecture to date. With over
three billion transistors and up to 512 CUDA cores, Fermi is claimed by NVIDIA
to deliver supercomputing features and performance at 1/10th the cost and
1/20th the power of traditional CPU-only servers. Fermi is intended to make
GPU and CPU co-processing pervasive by addressing the full spectrum of
computing applications. Designed for C++ and available with a Visual Studio
development environment, it aims to make parallel programming easier and to
accelerate a wider array of applications, including ray tracing, physics,
finite element analysis, high-precision scientific computing, sparse linear
algebra, sorting, and search algorithms.
REFERENCES
1. http://www.howstuffworks.com
2. http://www.tomshardware.com
3. http://www.intel.com
4. http://www.nvidia.com
5. http://www.extremetech.com
6. http://www.entecollege.com
7. http://www.pcworld.com
8. http://www.en.wikipedia.com
9. http://www.pcguide.com
