L4

ITC203 – Lecture 4
• Central Processing Unit (CPU) - computer system’s “brain”:

o executes program instructions (computation, comparison, and branching)
o directs all computer system actions (processing, storage, input/output, and data movement)
o Components
▪ Control unit – directs flow of data to/from memory, registers, and the arithmetic logic unit
▪ Arithmetic logic unit (ALU) – executes computation and comparison instructions
▪ Registers – storage locations within the CPU that hold ALU inputs and outputs, and other data
for fast access
▪ Internal Clock
♦ generates “ticks” at regular intervals
♦ frequency of those ticks – clock rate, measured in gigahertz (GHz), i.e., billions of cycles (ticks) per second
♦ cycle time – the inverse of the clock rate
♦ in the simplest case, cycle time is the time needed to fetch (with no wait states) and execute the
simplest instruction
♦ modern CPUs are far too complex for that simple relationship to hold exactly (memory caches,
multiple core processors, multiple ALUs per core, pipelining)
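The relationship between clock rate and cycle time above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the instruction rate it computes is the idealized upper bound of one simple instruction per tick with no wait states.

```python
def cycle_time_ns(clock_rate_ghz):
    """Duration of one clock tick in nanoseconds (inverse of the clock rate)."""
    return 1.0 / clock_rate_ghz


def ideal_instructions_per_second(clock_rate_ghz):
    """Upper bound: one simplest instruction per tick, no wait states."""
    return clock_rate_ghz * 1e9


print(cycle_time_ns(2.0))                  # 0.5 (ns per tick at 2 GHz)
print(ideal_instructions_per_second(2.0))  # 2000000000.0
```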
o CPU alternates between 2 cycles
▪ Instruction cycle (fetch cycle)
♦ control unit reads an instruction from primary storage
♦ control unit increments the instruction pointer (address of the next instruction to be
read)
♦ control unit stores the instruction in the instruction register
➢ data inputs embedded in the instruction are loaded into registers as inputs for
the ALU
➢ if the instruction instead includes memory addresses of data inputs, those inputs are
copied from memory and loaded into registers as inputs for the ALU
▪ Execution cycle
♦ data movement instructions are executed by the control unit
♦ computation and comparison instructions are executed by the ALU in response to a
signal from the control unit
♦ data inputs flow from registers through processing circuitry and outputs flow to one
or more registers
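The alternation between the fetch cycle and the execution cycle described above can be sketched as a toy interpreter. Everything here is an assumption made for illustration: the four op codes, the tuple instruction format, and the register names R0 to R2 are hypothetical and do not correspond to any real CPU.

```python
def run(program, memory):
    """Run a toy program and return the final register contents."""
    registers = {"R0": 0, "R1": 0, "R2": 0}
    instruction_pointer = 0
    while True:
        # Fetch cycle: read the instruction, increment the pointer,
        # and keep a copy in the instruction register.
        instruction_register = program[instruction_pointer]
        instruction_pointer += 1
        op_code, *operands = instruction_register

        # Execution cycle.
        if op_code == "LOAD":       # data movement: memory -> register
            register, address = operands
            registers[register] = memory[address]
        elif op_code == "STORE":    # data movement: register -> memory
            register, address = operands
            memory[address] = registers[register]
        elif op_code == "ADD":      # computation: the "ALU" part
            target, left, right = operands
            registers[target] = registers[left] + registers[right]
        elif op_code == "HALT":
            return registers
        else:
            raise ValueError(f"unknown op code: {op_code}")


memory = [7, 3, 0]
program = [
    ("LOAD", "R0", 0),
    ("LOAD", "R1", 1),
    ("ADD", "R2", "R0", "R1"),
    ("STORE", "R2", 2),
    ("HALT",),
]
final_registers = run(program, memory)
print(final_registers["R2"], memory[2])  # 10 10
```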
o Instruction
▪ a command to the CPU to perform a single processing function on specific data inputs
▪ stored in memory or a register
▪ sequence of bits that must be decoded to extract the processing function and data inputs (or
location of the data inputs)
▪ components:
♦ Op code – unique binary number representing the processing function and a template
for extracting the operands
♦ Operands
➢ one or more groups of bits after the op code
➢ contain data to be processed or identify the location of data (a register
or memory address)
➢ different kinds of operands have different lengths (depending on the type of
data or address stored)
➢ same processing function may correspond to many different op codes
with different operand formats (e.g., an ADD instruction for integers
stored as operands, another for integers stored in registers, and another
for integers stored in memory)
▪ simple instructions in a minimal instruction set:
♦ MOVE – copy data from:
➢ memory address to a register (a load operation)
➢ register to memory address (a store operation)
➢ register to another register
♦ Boolean logic – convert individual bits within a bit string (bitwise operations) or treat
entire bit strings as true or false and manipulate/combine them (logic operations)
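➢ NOT – inverts each bit (0 becomes 1, 1 becomes 0)
➢ AND – result is 1 only if both inputs are 1
➢ OR – result is 0 only if both inputs are 0
➢ Exclusive OR (XOR) – result is 1 only if the inputs differ (0 0 gives 0, 1 1 gives 0)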
♦ ADD: produces arithmetic sum of two bit strings
♦ SHIFT
➢ move all bits left or right and fill in zeros
➢ used to extract single bit values (logical shift)
➢ used for binary multiplication and division (arithmetic shift)
♦ BRANCH (JUMP)
➢ alters next instruction fetched/executed
➢ unconditional branch – always changes sequence (e.g., a GOTO statement)
➢ Conditional branch – changes only if the value true is stored in a register
♦ HALT – stop
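The Boolean and SHIFT instructions above map directly onto Python's integer operators, so a short sketch may help. The 4-bit width, the sample values, and the helper name `extract_bit` are assumptions chosen for the example.

```python
MASK = 0b1111            # work with 4-bit strings
a, b = 0b1100, 0b1010

not_a = ~a & MASK        # 0011: every bit inverted
and_ab = a & b           # 1000: 1 only where both bits are 1
or_ab = a | b            # 1110: 0 only where both bits are 0
xor_ab = a ^ b           # 0110: 1 only where the bits differ


def extract_bit(value, position):
    """Logical shift right, then mask, to read a single bit."""
    return (value >> position) & 1


times_two = (0b0011 << 1) & MASK   # 0110: shifting left doubles 3 to 6
halved = 0b0110 >> 1               # 0011: shifting right halves 6 to 3

print(format(xor_ab, "04b"))       # 0110
print(extract_bit(0b1010, 1))      # 1
```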
▪ complex operations such as exponentiation and operations on non-integer data types can be
implemented by complex combinations of the simple instructions
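♦ subtraction can be implemented by adding the two’s complement of the number being subtracted:
7₁₀ − 3₁₀ = ADD(ADD(XOR(0011,1111),0001),0111)
= ADD(ADD(1100,0001),0111)
= ADD(1101,0111)
= 10100
➢ the carry out of the fourth bit is discarded, leaving 0100 = 4₁₀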
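The subtraction example above can be checked with a small sketch that uses only XOR and ADD. The 4-bit width and the function name `subtract` are assumptions for illustration; the final masking step models discarding the carry out of the top bit.

```python
BITS = 4
MASK = (1 << BITS) - 1   # 1111


def subtract(minuend, subtrahend):
    """Compute minuend - subtrahend using only XOR and ADD on 4-bit values."""
    ones_complement = subtrahend ^ MASK          # XOR(0011, 1111) = 1100
    twos_complement = ones_complement + 1        # ADD(1100, 0001) = 1101
    raw_sum = twos_complement + minuend          # ADD(1101, 0111) = 10100
    return raw_sum & MASK                        # drop the carry: 0100


print(format(subtract(0b0111, 0b0011), "04b"))   # 0100
```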
▪ pros of providing only a minimal instruction set
♦ processor is simple to design and build
➢ simple = cheaper CPUs with very fast clock rates
▪ Cons of providing only a minimal instruction set
♦ programs that need complex processing/data are complex
➢ Complex = expensive, slow, and error-prone program development
▪ modern CPUs provide a far richer set of instructions than the minimal set:
♦ duplicate instructions for multiple data types (e.g., signed/unsigned, integer/real)
♦ higher-order computational functions (e.g. subtraction, multiplication/division)
♦ higher-order logical functions (e.g., greater than or equal to)
♦ instructions that combine data movement to/from memory with processing
RISC versus CISC
• Reduced Instruction Set Computing (RISC)
o avoid “unnecessary” complex instructions – keep instruction count to several dozen to a few hundred
o minimize number/complexity of instruction formats
o minimize maximum instruction length
o avoid combining data movement with transformation (load-store architecture)
o “Less is more” – IBM POWER CPUs
• Complex Instruction Set Computing (CISC)
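o Opposite of RISC: a large instruction set that includes complex, variable-length instructions, some of
which combine data movement with transformation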
o Intel Core and Xeon CPUs
o CPU Complexity/Speed
▪ RISC simplifies job of the control unit by simplifying the instruction set
♦ Simpler fetch = faster fetch = higher clock rate
o Program execution speed
▪ Higher clock rate = faster program execution
♦ BUT no complex instructions
♦ Thus, more instructions must be fetched/executed to do “complex” operations
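The trade-off above (higher clock rate versus more instructions) can be made concrete with the usual relation: execution time = instruction count × cycles per instruction ÷ clock rate. The sketch below uses made-up numbers for two hypothetical CPUs; they are not measurements of any real RISC or CISC processor.

```python
def execution_time_seconds(instruction_count, cycles_per_instruction, clock_rate_hz):
    """Time to run a program, ignoring wait states."""
    return instruction_count * cycles_per_instruction / clock_rate_hz


# Hypothetical RISC-style CPU: more instructions, 1 cycle each, faster clock.
risc_time = execution_time_seconds(1.5e9, 1.0, 3.0e9)
# Hypothetical CISC-style CPU: fewer instructions, 2 cycles each, slower clock.
cisc_time = execution_time_seconds(1.0e9, 2.0, 2.5e9)

print(risc_time)  # 0.5
print(cisc_time)  # 0.8
```

With different assumed numbers the comparison can go the other way, which is the point of the bullets above: clock rate alone does not decide program execution speed.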
MIPS and MFLOPS
• MIPS – millions of (fetched/executed) instructions per second – presumed to be integer instructions or a
“typical” mix
• MFLOPS – millions of (fetched/executed) floating point operations per second
• both units are too small for modern CPUs, so larger units are now used (e.g., GFLOPS, TFLOPS, and PFLOPS)
• both terms can apply to performance of processor in isolation or the entire computer system
• measured rates are lower than implied by clock rate because more complex operations require more
execution time (multiple clock cycles)
• wait states reduce measured rates further while the CPU waits for:
o access to memory
o access to system bus
o access to storage and I/O devices
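A rough sketch of why measured MIPS falls below the clock rate: divide the clock rate by the average number of cycles each instruction really takes, including wait cycles. The function name and the sample figures are assumptions for illustration only.

```python
def effective_mips(clock_rate_mhz, cycles_per_instruction, wait_cycles_per_instruction=0.0):
    """Millions of instructions per second after execution and wait cycles."""
    return clock_rate_mhz / (cycles_per_instruction + wait_cycles_per_instruction)


print(effective_mips(3000, 1.0))        # 3000.0 (ideal: one cycle per instruction)
print(effective_mips(3000, 1.5, 0.5))   # 1500.0 (multi-cycle instructions plus waits)
```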
Benchmarks
• performance measure for a computer system or one of its components when performing a specific and realistic
type of software task
o responding to an HTTP request
o processing a complex database transaction
o reading/writing a disk
o redrawing the screen in an animation
• 2 classes:
o Artificial – a “made-up” workload that is representative of a class of real workloads
▪ generally more realistic and reliable indicators of computer system performance than MIPS and
MFLOPS
o Live-Load – a workload based on “real” tasks such as playing an online game, encoding a DVD, or
responding to web server requests
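As a very small illustration of an artificial benchmark, the sketch below times a made-up workload several times and reports the best run. The workload, the function names, and the choice of the minimum as the summary are all assumptions; real suites such as those listed in the next section are far more careful.

```python
import time


def artificial_workload():
    """Made-up integer workload standing in for a class of real tasks."""
    return sum(i * i for i in range(100_000))


def benchmark(task, repetitions=5):
    """Return the best wall-clock time in seconds over several runs of task()."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return min(timings)


best_seconds = benchmark(artificial_workload)
print(f"best of 5 runs: {best_seconds:.6f} s")
```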
Sample Benchmarks
• Standard Performance Evaluation Corporation (SPEC) provides a suite of benchmarks including:
o SPEC CPU: computational performance with integers and floating point numbers
o SPEC MPI: computational performance of problems distributed across a cluster
o SPECviewperf: workstation graphics performance
o SPECmail: email server performance
• TPC: server-oriented performance for processing business or database transactions
• PassMark: test suite for microcomputers
Registers
• 2 classes:
o General-purpose
▪ used as high-performance scratchpad memory by the ALU(s)
▪ more are better up to a point
▪ modern CPUs typically provide a few dozen per ALU
o Special-purpose registers
▪ used primarily by the control unit in CPU management tasks
▪ instruction pointer – memory address for next instruction fetch, a.k.a. program counter
▪ instruction register – copy of most recently fetched instruction
▪ program status word (PSW) – set of bit flags containing error and other codes related to
processing results
♦ result of comparison operations
♦ divide by zero
♦ overflow and underflow
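To show how PSW-style flags might be derived from an ALU result, here is a sketch of an 8-bit addition that also reports four flags. The flag names and the dictionary layout are hypothetical; real CPUs pack such flags into bits of a status register, and the exact set varies by architecture.

```python
def add_with_flags(a, b, bits=8):
    """Add two bit patterns and return (result, flags)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    raw_sum = a + b
    result = raw_sum & mask
    flags = {
        "zero": result == 0,
        "negative": bool(result & sign_bit),
        "carry": raw_sum > mask,  # unsigned result did not fit
        # Signed overflow: inputs share a sign that differs from the result's.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


result, flags = add_with_flags(127, 1)
print(result, flags["overflow"], flags["carry"])  # 128 True False
```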
Word Size
• word – fixed number of bits/bytes
o basic “unit” of data transformation in a CPU
o size of a data item that the CPU manipulates when executing a “normal” instruction
o 64-bit Intel Core CPU has word sizes ranging from 16 to 128 bits
• ALU circuitry manipulates all bits of a word in parallel while executing a single instruction
o larger word size implies larger and more complex ALU and other circuitry thus increasing CPU expense
and slowing clock rate
• relationship between CPU word size and the size of data items manipulated by a program:
o CPU word size > program data size
▪ lots of zeros carried through fetches, registers, and ALU circuitry
▪ performance is suboptimal – CPU is more complex than the program requires – more complex = slower
▪ cost is higher than needed since “extra” word size is unused
o CPU word size = program data size
▪ performance and cost are both optimal
o CPU word size < program data size
▪ avoids cost of extra bits
▪ substantial performance penalty due to breaking data items into word-sized chunks and performing
piece-wise operations on the words
▪ performance penalty varies with size mismatch and the complexity of the processing function(s)
▪ cost of CPU is lower since small word size = simpler CPU = less expensive CPU
• modern CPUs are so cheap that word size must be VERY large to significantly increase cost
• for best cost/performance ratio, match CPU word size to the size of data that will be processed
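The penalty for a word size smaller than the data size can be illustrated by adding two 64-bit values on a hypothetical CPU with 16-bit words: the data must be split into word-sized chunks and added piece-wise, carrying between chunks. The function name and the sizes are assumptions for the example.

```python
def add_in_words(a, b, word_bits=16, data_bits=64):
    """Add two data_bits-wide values using only word_bits-wide additions.

    Returns (result, number_of_word_additions).
    """
    word_mask = (1 << word_bits) - 1
    result, carry, word_additions = 0, 0, 0
    for shift in range(0, data_bits, word_bits):
        word_a = (a >> shift) & word_mask
        word_b = (b >> shift) & word_mask
        total = word_a + word_b + carry
        result |= (total & word_mask) << shift   # keep the low word
        carry = total >> word_bits               # carry into the next word
        word_additions += 1
    return result, word_additions


print(add_in_words(0xFFFF, 1))                 # (65536, 4)
print(add_in_words(0xFFFF, 1, word_bits=64))   # (65536, 1)
```

In this sketch the same addition takes four word-sized operations with 16-bit words and one with 64-bit words.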
• typical “normal data sizes”
o “Business” applications – 32 or 64 bits
o “Scientific” applications – 64 or 128 bits
o Database and multimedia applications – highly variable
• early CPUs had small word size (e.g., 8 or 16 bits) due to technology limitations and thus had suboptimal
performance
Performance Enhancement Techniques
• As fabrication technology has improved, CPU designers have been able to employ ever more complex
performance improvement techniques individually and in combination (Memory caching, Pipelining, Branch
prediction and speculative execution, Multiprocessing)
Pipelining
• Henry Ford era technique (i.e., the sequential assembly line) applied to executing program instructions
• Execution stages
1. Fetch from memory
2. Increment and store instruction pointer (IP)
3. Decode instruction and store operands and instruction pointer
4. Access ALU inputs
5. Execute instruction within the ALU
6. Store ALU output
• attempts to overlap instruction execution by performing each stage on a different instruction at the same time
• complexities
o is one instruction pointer enough?
o is one instruction register enough?
o is one set of general purpose registers enough?
o is one ALU enough?
o what happens if a branch is encountered?
• can be “finer-grained”
o execution (usually the longest stage) could be subdivided into additional stages
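The benefit of overlapping the six stages above can be estimated with a simple count of clock cycles. The sketch assumes every stage takes exactly one cycle and that there are no branches or stalls, which real pipelines do not guarantee; the function names are made up for the example.

```python
def cycles_without_pipelining(instructions, stages=6):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_with_pipelining(instructions, stages=6):
    """The first instruction fills the pipeline; then one completes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


print(cycles_without_pipelining(100))  # 600
print(cycles_with_pipelining(100))     # 105
```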
Multiprocessing
• carries the duplication to higher levels
o multiple ALUs (with parallel execution of instructions) per CPU (late 1990s)
o multiple CPUs on a single motherboard (2000s)
o multiple CPUs on a single chip (late 2000s)
• operating systems are more complex because they now manage more processing resources and more complex application software
o application software (with multiprocessing) is more complex because it must be designed for parallel
execution (multithreading)
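The extra design work that parallel execution demands of application software can be seen even in a tiny example: the program has to split its data, hand the pieces to workers, and combine the partial results. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function names and chunking scheme are assumptions, and in standard CPython threads illustrate the structure rather than a guaranteed speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work done independently by one thread."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Split the input into chunks, process them in threads, combine results."""
    chunk_size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


print(parallel_sum_of_squares(list(range(10))))  # 285
```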
Branches and Speculative Execution
• branches cause problems with pipelining because they invalidate the partially executed instructions that follow
them
o wrong instructions were fetched and partially executed
o special- and general-purpose register contents are incorrect
• pipeline must be flushed and filled with the proper set of instructions
• real programs have lots of branches and thus, pipelining will “fail” unless preventive measures are employed
o Preventive Measures
▪ Look-ahead – “watch” incoming instructions for branches and alter standard behavior
accordingly
▪ Branch prediction – if a conditional branch is fetched, attempt to guess the condition result and
load/execute the corresponding instructions (speculative execution)
▪ Speculatively execute both paths beyond a conditional branch
♦ requires multiple execution units
♦ half the results will be thrown away (half the effort is wasted)
• modern CPUs employ all three techniques to improve pipelining performance
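Branch prediction as described above needs some rule for guessing the condition result. The notes do not name one, so the sketch below uses a two-bit saturating counter, a common textbook scheme, purely as an assumed example: counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".

```python
def count_correct_predictions(outcomes, counter=2):
    """Predict each branch with a 2-bit saturating counter; return the hit count."""
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Move the counter toward the actual outcome, staying within 0..3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct


# A loop branch that is taken 9 times and then falls through, run twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_correct_predictions(loop_outcomes), "of", len(loop_outcomes))  # 18 of 20
```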
The Physical CPU
• complex system of interconnected electrical switches
• contains billions of switches (transistors), which perform basic processing functions
• physical implementation of switches and circuits
Switches and Gates
• Building blocks of CPU and memory circuitry
o switch – a device that can be open or closed to allow or block passage of electricity – implemented as a
transistor
o gate – multiple switches wired together to perform a processing function on one bit
▪ basic gates: NOT, AND, OR, XOR, NAND
Circuits
• gates are wired into circuits to perform more complex processing
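As a sketch of how gates combine into circuits, the code below builds NOT, AND, OR, and XOR from NAND alone and then wires XOR and AND into a half adder (a one-bit adder with a carry output). Modeling gates as Python functions on the integers 0 and 1 is an assumption made for illustration.

```python
def nand(a, b):
    return 1 - (a & b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))


def half_adder(a, b):
    """Circuit of two gates: returns (sum_bit, carry_bit)."""
    return xor(a, b), and_(a, b)


print(half_adder(1, 1))  # (0, 1): 1 + 1 = binary 10
```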
Electricity
• Circuits are electrical devices. Advantages/limitations:
o Speed – electrical signals move through circuitry at approximately 70% of light speed
▪ Processing time is thus directly proportional to circuit length (shorter circuits are faster)
o Conductivity – circuits must be constructed of highly conductive material – e.g., copper or gold
o Resistance – even good conductors turn some electrical energy into heat
o Circuit length is limited because energy loss accumulates
o Heat must be dissipated to prevent higher resistance or physical damage to conductors
• Properties
o Conductivity: capability of an element to enable electron flow
o Resistance: loss of electrical power that occurs within a conductor
o Heat
▪ Physical damage to conductor
▪ Changes to inherent resistance of conductor
▪ Dissipate heat with a heat sink
o Speed and Circuit Length: time required to perform a processing operation is a function of length of
circuit and speed of light
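The dependence of processing time on circuit length can be sketched numerically. The 70% figure comes from the notes above; the function name and the sample lengths are assumptions, and the calculation ignores switching delays inside transistors.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
SIGNAL_SPEED_M_PER_S = 0.7 * SPEED_OF_LIGHT_M_PER_S   # figure used in the notes


def propagation_delay_ns(circuit_length_m):
    """Time for a signal to traverse the circuit, in nanoseconds."""
    return circuit_length_m / SIGNAL_SPEED_M_PER_S * 1e9


print(round(propagation_delay_ns(0.01), 4))  # 0.0477 (1 cm of circuit)
print(round(propagation_delay_ns(0.02), 4))  # 0.0953 (twice the length, twice the delay)
```

Under these assumptions a signal covers roughly 7 cm during one tick of a 3 GHz clock, which is one reason circuit length matters.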
Processor Fabrication
• CPUs are fabricated as microprocessors
o silicon chips containing billions of transistors and their wiring implement multiple CPUs, memory caches,
and memory/bus interface circuitry
• Speed improved over time by shrinking the physical size of the wires and transistors (e.g., to 22 nanometers)
• Looming Problems
o Moore’s Law – transistor count on a chip doubles every 18-24 months at no cost increase
▪ Implies greater power and/or speed IF the additional transistors are used as effectively as the
previous ones
o Rock’s Law – cost of a processor fabrication facility doubles every four years
▪ Currently >10 billion dollars
o Process shrinkage has limits
▪ Etching process requires shorter and shorter wavelength beams (current – X-Rays)
▪ Fabrication errors accumulate (e.g., material impurities)
▪ Molecular width of conductors is a theoretical lower bound (single-digit nanometers)
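Moore’s Law and Rock’s Law as stated above are both doubling rules, so their consequences can be projected with one line of arithmetic each. The starting values below are round numbers chosen for illustration, and the two-year doubling period is just one point in the 18-24 month range given in the notes.

```python
def projected_transistor_count(initial_count, years, doubling_period_years=2.0):
    """Moore's Law: transistor count doubles every doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


def projected_fab_cost(initial_cost, years, doubling_period_years=4.0):
    """Rock's Law: fabrication facility cost doubles every four years."""
    return initial_cost * 2 ** (years / doubling_period_years)


print(projected_transistor_count(1e9, 4))  # 4000000000.0
print(projected_fab_cost(10e9, 8))         # 40000000000.0
```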
• Optical interconnects
o Reduces or eliminates wiring
o Logical extension of current technology
o Unknown price/performance characteristics
o Many manufacturing issues yet to be worked out
