Jmyer1@uncc.edu Page 1 of 6
The registers serve as temporary storage for the CPU. Although they are few in number and small
in capacity, they can be read faster than disk or RAM [3, 4]. However, their contents are lost
if the CPU loses power, making them unsuitable for long-term or important information storage.
Once calculations are complete, the resulting data is either sent to RAM or kept in a register
for later use, and it is passed on to the relevant component when the instructions are carried
out [3].
Some of the devices in a computer are linked by buses, bundles of data lines which facilitate data
transfer [3, 4]. These buses exist in both internal and external formats, transferring data to and
from the ALU and connecting the CPU to memory and input/output controllers, respectively.
Devices that can initiate bus transfers are called masters; devices that passively await
requests are called slaves. Wider buses can facilitate faster and larger data transfers, but
this comes at the cost of taking up more space in both wires and connectors [2]. A multiplexed
bus splits each transfer into multiple phases and uses the same lines for both addressing and
data, which reduces the number of lines at the cost of overall bus performance.
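The trade-off of a multiplexed bus can be illustrated with a rough count of lines and cycles. The sketch below is a simplified model, not a description of any particular bus: the function name `bus_cost`, the 32-bit widths, and the assumption that the address phase and the data phase each take one cycle are all illustrative.

```python
def bus_cost(address_bits, data_bits, multiplexed):
    """Return (signal_lines, cycles_per_transfer) for a simplified bus model.

    Assumption: a dedicated bus sends address and data in the same cycle,
    while a multiplexed bus sends them one after the other on shared lines.
    Control lines are ignored.
    """
    if multiplexed:
        lines = max(address_bits, data_bits)  # the same wires carry both
        cycles = 2                            # address phase, then data phase
    else:
        lines = address_bits + data_bits      # separate wires for each
        cycles = 1
    return lines, cycles


dedicated = bus_cost(32, 32, multiplexed=False)
shared = bus_cost(32, 32, multiplexed=True)
print("dedicated:", dedicated)
print("multiplexed:", shared)
```

Under these assumptions the multiplexed bus needs 32 lines instead of 64, but takes two cycles per transfer instead of one.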
Intel's CPU Design
Intel created the first microprocessor in 1971, and it presently operates as the world's largest
manufacturer of microprocessor chips; 99 percent of all chips used in servers are Intel's [6].
Developing a new microprocessor is a process that can take five or more years and over $10
billion to complete, and the improvements made in a processor are tied to the size of the
transistors, the on/off switches that represent single bits of information. Intel shrinks the
size of its transistors approximately every three years, allowing more of them to fit within a
microprocessor with the same area as the previous model. This trend is described by Moore's
Law, the observation that transistor count doubles every two or so years, as seen in Figure 2 [10].
Figure 2: Plot of microprocessor transistor count from 1970 to 2010, in accordance with
Moore's Law
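The doubling rule behind Figure 2 can be written as N(t) = N0 * 2^((t - t0) / T), where T is the doubling period. The sketch below evaluates it with T = 2 years. The function name `projected_transistors` is hypothetical, and the starting point (Intel's 1971 chip, the 4004, with about 2,300 transistors) is a commonly cited figure that does not come from the sources above. The results are an idealized projection, not measured counts.

```python
def projected_transistors(start_count, start_year, year, doubling_period=2.0):
    """Project a transistor count assuming one doubling every `doubling_period` years."""
    doublings = (year - start_year) / doubling_period
    return start_count * 2 ** doublings


# Assumed starting point: about 2,300 transistors in 1971 (Intel 4004).
for year in (1971, 1981, 1991, 2001, 2011):
    print(year, round(projected_transistors(2300, 1971, year)))
```

Forty years of doubling every two years is 20 doublings, so the 2011 projection is 2,300 multiplied by about one million, or roughly 2.4 billion transistors.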
Jakob Myer ENGR1202 Writing Assignment Final Due 4/13/2017
Presently, the continuous shrinking of the transistors is posing two problems: controlling
quantum tunneling, and designing increasingly dense chip layouts [6]. As transistors shrink to
only a few nanometers, electrons risk jumping through them, even through transistors that are
switched off and should block their movement. Intel's Xeon chips address this problem with
tower-shaped transistors known as fin-shaped field effect transistors (FinFETs), which avoid
the tunneling phenomenon at present transistor sizes. The other issue is the increasingly
complex layout of the transistors and their interconnects on the chip. While transistors can be
shrunk and their efficiency improved, the copper wires that connect them cannot be shrunk
without compromising their ability to carry current. As copper is one of the most conductive
materials available, improvements must be made in areas such as the insulators, which dampen
the current flow. Using air itself as an insulator improved the speed of two wire layers in the
Xeon E5 chip by 10 percent, but the challenge of improving chip power as transistors become
smaller and interconnects denser remains.
The transistors in Intel's chips are approaching 5 nanometers in size, after which it is
believed that further downscaling will prove impossible and Moore's Law will collapse [6].
Intel is looking into solutions ranging from extreme ultraviolet light to quantum computing
(replacing transistors with atomic particles), but it has only two generations of chips beyond
the E5 series left to develop before that threshold is reached. The future of transistors and
microprocessors depends on the success of entirely new materials and chip designs.
GPU Basics
Graphics processing units (GPUs) are collections of processing resources that prioritize
parallel processing over the single-task processing of a CPU, outperforming it in raw
computational power [5, 7]. As dedicated graphics systems, GPUs are employed in tasks involving
images and/or 3D modeling, and real-time graphics require them to render scenes 60 or so times
per second. The task of rendering graphics is organized into a graphics pipeline (shown in
Figure 3) that details the stages of image rendering, from generating the points, lines, and
triangles known as primitives to controlling which pixels are seen by the virtual camera (the
perspective of the viewer).
The vertex, primitive, and fragment processing stages of the graphics pipeline are governed by
shader functions, which are written in languages that support rich control-flow constructs and
data types, but no primitives related to explicit parallel execution [5]. Each shader function
processes the data record of a single entity (a vertex, primitive, or fragment) from a single
input, independently of the processing of every other entity. This independence allows many
details in an image to be rendered at the same time.
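The independence of shader invocations can be sketched in a few lines. In the example below, `shade_fragment` is a hypothetical stand-in for a fragment shader (real shaders are written in shading languages, not Python), the fragment format `(x, y, brightness)` is an assumption, and the thread pool only stands in for the GPU's parallel hardware. The point is that the function never reads another fragment's data, so the order and degree of parallelism do not change the result.

```python
from concurrent.futures import ThreadPoolExecutor


def shade_fragment(fragment):
    """Hypothetical fragment shader: compute a colour from one fragment's own data."""
    x, y, brightness = fragment
    level = int(255 * brightness)
    return (x, y, (level, level, level))  # grey pixel at (x, y)


fragments = [(0, 0, 0.0), (1, 0, 0.5), (0, 1, 1.0)]

# Sequential and parallel evaluation agree because no fragment depends on another.
sequential = [shade_fragment(f) for f in fragments]
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(shade_fragment, fragments))

print(parallel == sequential)
```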
The GPU's ability to process so many shader functions simultaneously lies within its design,
which places an emphasis on processing cores [5]. A single-core processor can process one
stream of processor instructions, known as an execution context, at a time. A multicore
processor can run multiple execution contexts in parallel, and even greater processing power is
achieved by cores that contain multiple ALUs. Through a method known as single instruction,
multiple data (SIMD) processing, the ALUs within a core can perform identical operations on
different pieces of data, improving the rate of data execution and the power and space
efficiency of the core.
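As a minimal sketch of the SIMD idea, the function below applies one operation to every lane of two operands. The name `simd_add` and the 4-lane width are illustrative assumptions. Python executes the lanes one after another, so this models only the programming pattern (one instruction, many data elements) and not the hardware speedup.

```python
def simd_add(lanes_a, lanes_b):
    """Apply one 'add' instruction across every lane of two equal-width operands."""
    if len(lanes_a) != len(lanes_b):
        raise ValueError("operands must have the same number of lanes")
    return [a + b for a, b in zip(lanes_a, lanes_b)]


# One instruction, four data elements (a hypothetical 4-wide core).
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
print(result)  # [11, 22, 33, 44]
```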
However, a stream will stall if one of its instructions depends on information that is not yet
available, and it remains on standby until that information arrives. To avoid wasting cycles,
GPUs accept more execution contexts than they can simultaneously process, and the ALUs process
instructions from other runnable streams while a stalled stream waits. This technique is known
as hardware multithreading, and it masks the latency caused by memory access and stalling.
CPUs can also employ SIMD and multithreading, but not to the extent of GPUs given their lower
core count.
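The effect of hardware multithreading on stalls can be illustrated with a toy cycle-by-cycle simulation. Everything here is an assumption made for illustration: the function name `run_core`, the two instruction kinds ("alu" and "mem"), the fixed memory latency, and the rule that the core issues at most one instruction per cycle from the first runnable context.

```python
def run_core(contexts, memory_latency):
    """Simulate one core that issues at most one instruction per cycle.

    Each context is a list of "alu" or "mem" instructions. After a "mem"
    instruction the context is stalled for `memory_latency` cycles.
    Returns (total_cycles, idle_cycles).
    """
    position = [0] * len(contexts)   # next instruction index per context
    ready_at = [0] * len(contexts)   # first cycle each context may issue again
    cycle = idle = 0
    while any(position[i] < len(c) for i, c in enumerate(contexts)):
        for i, ctx in enumerate(contexts):
            if position[i] < len(ctx) and ready_at[i] <= cycle:
                if ctx[position[i]] == "mem":
                    ready_at[i] = cycle + 1 + memory_latency
                position[i] += 1
                break
        else:
            idle += 1                # every unfinished context is stalled
        cycle += 1
    return cycle, idle


program = ["mem", "alu", "mem", "alu"]
one_cycles, one_idle = run_core([program], memory_latency=3)
four_cycles, four_idle = run_core([program] * 4, memory_latency=3)
print("1 context: ", one_cycles, "cycles,", one_idle, "idle")
print("4 contexts:", four_cycles, "cycles,", four_idle, "idle")
```

With a single context the core sits idle during every memory stall. With four contexts the same core spends most of those cycles issuing instructions from other contexts, so a smaller share of its cycles is wasted.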
References
[1] Basic CPU Tutorial, accessed March 10, 2017,
https://embeddedmicro.com/tutorials/lucid/basic-cpu.
[2] Chapter 4: Processors, accessed March 10, 2017,
http://www.aries.net/demos/Server/chapter04/chapter04_1.html.
[3] Inside the CPU, accessed March 10, 2017,
http://www.belpercomputing.com/year-10/gcse-edexcel-computer-science/block-4-how-computers-work/inside-the-cpu/.
[4] CPU Buses, accessed March 13, 2017,
https://www.doc.ic.ac.uk/~eedwards/compsys/.
[5] Fatahalian, Kayvon and Houston, Mike. A Closer Look at GPUs. In ACM Queue, vol.
51, pp. 50-57. ACM Queue, 2008.
[6] How Intel Makes a Chip, accessed March 11, 2017,
https://www.bloomberg.com/news/articles/2016-06-09/how-intel-makes-a-chip.
[7] Luebke, David and Humphreys, Greg. How GPUs Work. In IEEE Computer, vol. 40,
pp. 96-100. IEEE, 2007.
[8] Pascal GPU Architecture, accessed March 12, 2017,
http://www.nvidia.com/object/gpu-architecture.html.
[9] Speculation of NVIDIA Volta GPU Ramps Up in Anticipation of 2017 Debut,
accessed March 12, 2017,
https://www.top500.org/news/speculation-of-nvidia-volta-gpu-ramps-up-in-anticipation-of-2017-debut/.
[10] Moore's Law, accessed March 14, 2017,
http://pointsandfigures.com/2015/04/18/moores-law/.