Kim 2

Chip | Metal layers | Line width (µm) | Wafer cost | Defects per cm² | Area (mm²) | Dies per wafer | Die yield | Die cost
386DX | 2 | 0.9 | $900 | 1.0 | 43 | 360 | 71% | $4
486DX | 3 | 0.8 | $1,200 | 1.0 | 81 | 181 | 54% | $12
PowerPC 601 | 4 | 0.8 | $1,700 | 1.3 | 121 | 115 | 28% | $53
HP PA 7100 | 3 | 0.8 | $1,300 | 1.0 | 196 | 66 | 27% | $73
DEC Alpha | 3 | 0.7 | $1,500 | 1.2 | 234 | 53 | 19% | $149
SuperSPARC | 3 | 0.7 | $1,700 | 1.6 | 256 | 48 | 13% | $272
Pentium | 3 | 0.8 | $1,500 | 1.5 | 296 | 40 | 9% | $417
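The last three columns fit together through the usual cost relations. Dies per wafer is roughly the wafer area divided by the die area, minus the partial dies lost at the round edge. Die cost is then wafer cost / (dies per wafer × die yield). The Python sketch below cross-checks the table under stated assumptions. The 15 cm wafer diameter is an assumption, since the table does not give one. The yields are copied from the table rather than computed.

```python
import math

# Assumption: 15 cm (6-inch) wafers; the table itself does not state a diameter.
WAFER_DIAMETER_CM = 15.0

# (chip, wafer cost in $, die area in mm^2, die yield) copied from the table above
CHIPS = [
    ("386DX", 900, 43, 0.71),
    ("486DX", 1200, 81, 0.54),
    ("PowerPC 601", 1700, 121, 0.28),
    ("HP PA 7100", 1300, 196, 0.27),
    ("DEC Alpha", 1500, 234, 0.19),
    ("SuperSPARC", 1700, 256, 0.13),
    ("Pentium", 1500, 296, 0.09),
]


def dies_per_wafer(area_mm2: float, diameter_cm: float = WAFER_DIAMETER_CM) -> int:
    """Whole dies per wafer: wafer area / die area, minus dies lost at the edge."""
    area_cm2 = area_mm2 / 100.0
    wafer_area = math.pi * (diameter_cm / 2) ** 2
    edge_loss = math.pi * diameter_cm / math.sqrt(2 * area_cm2)
    return math.floor(wafer_area / area_cm2 - edge_loss)


def die_cost(wafer_cost: float, dies: int, die_yield: float) -> float:
    """Cost of one good die: the wafer cost is spread over the good dies only."""
    return wafer_cost / (dies * die_yield)


for name, wafer_cost, area, die_yield in CHIPS:
    dies = dies_per_wafer(area)
    cost = die_cost(wafer_cost, dies, die_yield)
    print(f"{name:12} {dies:4d} dies/wafer  ${cost:7.2f} per die")
```

With these assumptions the computed dies per wafer match the table (360, 181, 115, 66, 53, 48, 40). The die costs round to the table's $4 through $417. Cost rises much faster than area because a larger die fits fewer times on the wafer and is also less likely to be defect-free.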
Using Amdahl’s Law: An Example
Suppose that we are considering an enhancement that runs 10 times faster than the
original machine but is only usable 40% of the time. What is the overall speedup gained
by incorporating the enhancement?
Answer:
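$\text{Fraction}_{\text{enhanced}} = 0.4$
$\text{Speedup}_{\text{enhanced}} = 10$
$\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} = \dfrac{1}{0.6 + 0.4/10} = \dfrac{1}{0.64} \approx 1.56$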
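The same calculation written as a small Python function is sketched below. The function name and the input checks are illustrative choices, not part of the original example.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the original time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # Unenhanced time is unchanged; the enhanced portion shrinks by its speedup
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


print(amdahl_speedup(0.4, 10))   # about 1.5625
print(amdahl_speedup(0.4, 1e9))  # approaches 1 / 0.6, about 1.667
```

The second call shows the limit. Even an arbitrarily fast enhancement cannot push the overall speedup past 1 / 0.6 ≈ 1.67, because 60% of the original time is untouched.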
Locality of Reference
• Programs tend to reuse data and instructions they have used recently.
• A program may spend 90% of its execution time in only 10% of the code.
• Based on a program's recent past, one can predict with reasonable accuracy what instructions and data it will use in the near future.
Two Types of Locality
• Temporal Locality
  - recently accessed items are likely to be accessed in the near future
• Spatial Locality
  - items whose addresses (or locations) are near one another tend to be referenced close together in time
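Both kinds of locality are what make a cache effective. The sketch below is a toy direct-mapped cache model. The cache size (8 lines of 4 words each) and the access patterns are hypothetical choices made only to make the effect visible.

```python
def hit_rate(addresses, num_lines=8, words_per_block=4):
    """Fraction of accesses that hit in a tiny direct-mapped cache (toy model)."""
    # Hypothetical sizes: 8 cache lines, 4 words per block
    tags = [None] * num_lines            # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // words_per_block  # memory block holding this word
        index = block % num_lines        # the only line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag            # miss: bring the block into the cache
    return hits / len(addresses)


sequential = list(range(64))       # spatial locality: neighbouring words
strided = list(range(0, 256, 4))   # one word per block, nothing nearby reused
loop_twice = list(range(16)) * 2   # temporal locality: the same 16 words again

print(hit_rate(sequential), hit_rate(strided), hit_rate(loop_twice))  # 0.75 0.0 0.875
```

Sequential access hits on 3 of every 4 words because each miss brings in a whole block. The strided pattern gets no benefit. The repeated loop hits on every access in its second pass.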
MIPS Benchmark
millions of instructions per second
• easy to understand and straightforward
• dependent on instruction set
• varies between programs on the same computer
• MIPS can vary inversely with performance!
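To see how MIPS can move opposite to performance, compare two hypothetical builds of one program on a hypothetical 100 MHz machine. Execution time is computed as instruction count × CPI / clock rate. The instruction counts and CPIs below are invented for illustration.

```python
CLOCK_HZ = 100e6  # hypothetical 100 MHz machine


def cpu_time(instruction_count: float, cpi: float, clock_hz: float = CLOCK_HZ) -> float:
    """Execution time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_hz


def mips(instruction_count: float, cpi: float, clock_hz: float = CLOCK_HZ) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (cpu_time(instruction_count, cpi, clock_hz) * 1e6)


# Two hypothetical builds of the same program
simple = (10_000_000, 1.0)   # many simple instructions, 1 cycle each
complex_ = (2_000_000, 4.0)  # fewer, more powerful instructions, 4 cycles each

for label, (count, cpi) in [("simple", simple), ("complex", complex_)]:
    print(f"{label:8} time={cpu_time(count, cpi):.3f}s  MIPS={mips(count, cpi):.0f}")
```

The second build finishes sooner (0.08 s against 0.10 s) yet rates only 25 MIPS against 100.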
MFLOPS Benchmark
millions of floating-point operations per second (megaflops)
• intended to measure floating-point performance, so it is meaningless for programs that use no floating-point operations
• the set of floating-point operations available is not consistent across machines
• MFLOPS ratings for the same machine may differ depending on instruction mix
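The effect of instruction mix can be sketched with a hypothetical machine (100 MHz) whose floating-point add takes 2 cycles and whose divide takes 20. These latencies are assumptions chosen for illustration, not figures for any real processor.

```python
CLOCK_HZ = 100e6                          # hypothetical 100 MHz machine
CYCLES_PER_OP = {"fadd": 2, "fdiv": 20}   # hypothetical latencies in cycles


def run_seconds(mix: dict) -> float:
    """Time to execute an instruction mix given as {operation: count}."""
    cycles = sum(count * CYCLES_PER_OP[op] for op, count in mix.items())
    return cycles / CLOCK_HZ


def mflops(fp_ops: int, seconds: float) -> float:
    """MFLOPS = floating-point operations / (execution time x 10^6)."""
    return fp_ops / (seconds * 1e6)


add_heavy = {"fadd": 1_000_000}
div_heavy = {"fdiv": 1_000_000}
for name, mix in [("add-heavy", add_heavy), ("divide-heavy", div_heavy)]:
    ops = sum(mix.values())
    print(f"{name:12} {mflops(ops, run_seconds(mix)):.0f} MFLOPS")
```

The machine is the same and so is the operation count, yet it rates 50 MFLOPS on one mix and 5 on the other. A program with no floating-point operations would score 0 however fast it ran.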
Programs as Evaluators
Four types (in decreasing order of accuracy):
• Real programs
• Kernels
• Toy Benchmarks
• Synthetic Benchmarks
Synthetic Benchmarks
• programs that try to match the average number and frequency of operations of a typical workload, e.g. Dhrystone, Whetstone, etc.
• not real programs, so they may not reflect program behavior for factors they do not measure
• compilers and hardware optimizations can artificially inflate results

Toy Benchmarks
• small, simple programs
• produce a result the user already knows
e.g. quicksort, Sieve of Eratosthenes, etc.
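A Sieve of Eratosthenes toy benchmark might look like the Python sketch below. The limit (10,000) and the number of repetitions are arbitrary choices. Because the answer is known in advance (1229 primes below 10,000), the run can check its own result before the time is reported.

```python
import time


def count_primes(limit: int) -> int:
    """Count the primes below `limit` with the Sieve of Eratosthenes."""
    if limit < 3:
        return 0
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting at n*n
            is_prime[n * n::n] = bytes(len(range(n * n, limit, n)))
    return sum(is_prime)


def benchmark(limit: int = 10_000, repeats: int = 5):
    """Return (answer, best wall-clock time in seconds); sizes are arbitrary choices."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        answer = count_primes(limit)
        best = min(best, time.perf_counter() - start)
    return answer, best


answer, seconds = benchmark()
assert answer == 1229  # known result: 1229 primes below 10,000
print(f"{answer} primes in {seconds * 1e3:.3f} ms")
```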
Kernels
• small, key pieces from real programs put together to evaluate machine
performance
e.g. Linpack, Livermore Loops, etc.
• no user would run kernel programs because they exist solely for performance
evaluation
• best used to isolate performance of individual features of machines to explain the
reasons for differences in real programs
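The Linpack benchmark solves a dense system of linear equations, and much of its time goes into vector updates of the form y = a·x + y (DAXPY in BLAS terms). The sketch below is a hypothetical kernel harness built around that update. The vector length and the choice to time a single pass are assumptions, and this is not the Linpack code itself.

```python
import time


def daxpy(a: float, x: list, y: list) -> list:
    """In-place y := a*x + y (the BLAS level-1 'DAXPY' vector update)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(y)):
        y[i] += a * x[i]
    return y


def kernel_mflops(n: int = 200_000) -> float:
    """Hypothetical harness: time one DAXPY pass and report MFLOPS (2 flops per element)."""
    x = [1.0] * n
    y = [2.0] * n
    start = time.perf_counter()
    daxpy(3.0, x, y)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return 2 * n / (elapsed * 1e6)


print(f"DAXPY kernel: {kernel_mflops():.1f} MFLOPS")
```

The rating printed here measures one isolated feature (a loop of multiply-adds) in one language implementation. This is exactly why kernels explain differences between machines better than they predict the behavior of whole programs.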
Real Programs
• Real programs have the input, output, and options that a user can select.
Programs as Evaluators
• Companies may design features that would make their machines run faster on the
benchmarks than on real programs.
• A standard set of programs is hard to obtain because each program runs differently on each machine, and companies want to use programs that run fast on their machines.
What is VLSI?
• VLSI: Very Large Scale Integration
• refers to a technology through which it is possible to implement large circuits consisting of a million or more transistors on semiconductor wafers, primarily silicon.

VLSI technology is used in:
• microprocessors
• digital signal processors (DSPs)
• systolic arrays
• large-capacity memories
• memory controllers
• I/O controllers
• interconnection networks
Von Neumann Machines
• Three basic hardware subsystems (CPU, memory and I/O)
• Stored-program computer
• Sequential operation
• Single path between main memory and the control unit of the CPU, i.e. the Von Neumann bottleneck
Harvard Architecture
• class of Von Neumann architectures that provide independent pathways for data addresses, data, instruction addresses and instructions
• allows the CPU to access instructions and data simultaneously
Registers
• A component used for data storage (can be read from or written to).
• High-speed memory locations used to store important information during CPU operations.
Program Counter (PC)
• A register which holds the address of the next instruction to be executed.
Instruction Register (IR)
• Holds a copy of the currently executing instruction.
Status/Flags Register
• Holds data about system events/states.
Memory Address Register (MAR)
• Holds the memory address of the data to be read from or written to memory.
• Connected to the address bus.
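Memory Data Register (MDR)
• Holds the data being read from or written to memory.
• Connected to the data bus.
General-Purpose Registers
• Programmable registers for various uses ("scratch pad").
• Number varies between processors.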
Decoder
• Also known as the Instruction Decoder.
• Translates instructions from machine-level code into the sequence of digital control signals that carry out the instruction.
Control Unit (CU)
• Executes each control signal in the sequence determined by the decoder.
• Connected to the control bus.
Writing to Memory
• <address> to MAR
• <data> to MDR
• Control Unit issues WRITE signal to Memory
  - Memory responds by reading the data on the data bus and storing it in the location specified by the address on the address bus.
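The write sequence can be modelled in a few lines of Python. The class and method names below are hypothetical. The `read` method is an assumed mirror image of the write steps, because the notes only spell out the write.

```python
class ToyMemoryInterface:
    """Hypothetical model of a CPU talking to memory through MAR and MDR."""

    def __init__(self, size: int = 0x400):
        self.memory = [0] * size
        self.mar = 0  # Memory Address Register: drives the address bus
        self.mdr = 0  # Memory Data Register: connected to the data bus

    def write(self, address: int, data: int) -> None:
        self.mar = address    # <address> to MAR
        self.mdr = data       # <data> to MDR
        self._write_signal()  # Control Unit issues WRITE

    def _write_signal(self) -> None:
        # Memory stores the value on the data bus at the location on the address bus
        self.memory[self.mar] = self.mdr

    def read(self, address: int) -> int:
        # Assumed mirror image of the write sequence (not spelled out in the notes)
        self.mar = address
        self.mdr = self.memory[self.mar]
        return self.mdr


mem = ToyMemoryInterface()
mem.write(0x200, 0x0C)
print(hex(mem.read(0x200)))  # 0xc
```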
ALU Operation
• <operand1> to ALU
• <operand2> to ALU
• Control Unit issues <ALU operation> signal to the ALU
• result from ALU to <anywhere in CPU>
How Instructions Are Processed: An Illustration
• A user executes a program called SMPLADD.EXE. The operating system searches for the file on the hard disk and loads the program into main memory. The program contains the following code:
Loc Instruction
100 mov (010A), R0
103 add (010B), R0
106 mov R0, 0200
109 end
10A 09
10B 03
What is a WORD?
• CPU: the smallest unit of data that can be processed at a time, e.g. the size of ALU operands or of the data in the registers.
• Memory: the smallest unit of addressable data.
• Bus: the smallest unit of data that can be sent through the bus at a time.
Word Length
• Size of a word specified in bits.
• Possible benefits of a large word length:
  - faster speed (more data and/or instructions can be fetched at a time)
  - greater numeric precision
  - more powerful instructions (instructions can have more operands and modes of operation)
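One concrete side of the numeric-precision point, for unsigned integers only: an n-bit word holds values up to 2^n − 1, and a result that does not fit wraps around. This sketch ignores signed representations and floating point.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned integer an n-bit word can hold: 2**n - 1."""
    return (1 << bits) - 1


def add_in_word(a: int, b: int, bits: int) -> int:
    """Unsigned addition in an n-bit register; the carry out of the top bit is lost."""
    return (a + b) & max_unsigned(bits)


for bits in (8, 16, 32):
    print(bits, max_unsigned(bits), add_in_word(200, 100, bits))
# 8 255 44
# 16 65535 300
# 32 4294967295 300
```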
Recap: Instruction Cycle
• Fetch - fetch the instruction from memory
• Decode - decode the instruction
• Operand Fetch - get the necessary operands
• Execute - execute the instruction
• Store - store the result in the appropriate location
