Computer Architecture

A Quantitative Approach, Fifth Edition

Chapter 1 Fundamentals of Quantitative Design and Analysis

Copyright 2012, Elsevier Inc. All rights reserved.

Introduction

1970: The IBM 7044 Computer

Copyright 2013, Daniel A. Menasce. All rights reserved.

1970: Magnetic Core Memory

1 core = 1 bit

How Does a Computer Work?


- What is the Von Neumann computer architecture?
- What are the elements of a Von Neumann computer? How do these elements interact to execute programs?

Von Neumann Architecture



[Figure: stored-program computer. The ALU and registers are connected over the system bus to memory, which holds both data and the program.]


- The program is stored in memory as a sequence of instructions: (1) load/store between memory and registers, (2) arithmetic/logical operations, (3) conditional and unconditional branches.
- Registers: general registers, FP registers, program counter, etc.

Example of Instruction Formats


- opcode RX RY RZ: RZ ← RX op RY
- opcode RX Address: RX ← Mem(address) (load) or Mem(address) ← RX (store)
- opcode Address: unconditional or conditional jump to Address

Computer Technology
- Performance improvements:
  - Improvements in semiconductor technology
    - Feature size, clock speed
  - Improvements in computer architectures
    - Enabled by HLL compilers, UNIX
    - Led to RISC architectures
- Together have enabled:
  - Lightweight computers
  - Productivity-based managed/interpreted programming languages


Single Processor Performance


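[Figure: growth of single-processor performance over time, with annotations marking the RISC era and the later move to multiprocessors, from ILP to DLP and TLP.]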

Moore's Law
The number of transistors on integrated circuits doubles approximately every two years. (Gordon Moore, 1965; the doubling period was revised from one year to two years in 1975)

[Figure: transistor counts over time. Source: Wikimedia Commons]
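
A doubling period can be restated as a compound annual growth rate, which is how the technology trends later in this chapter are quoted. This sketch assumes smooth exponential growth with a two-year doubling period; the starting count of one million transistors is hypothetical.

```python
def annual_growth(doubling_period_years=2.0):
    """Compound annual growth rate equivalent to doubling every `doubling_period_years`."""
    return 2 ** (1 / doubling_period_years) - 1

def transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count forward assuming steady exponential growth."""
    return initial_count * 2 ** (years / doubling_period_years)

print(f"{annual_growth():.1%} per year")   # 41.4% per year
print(transistors(1_000_000, 10))          # 32000000.0
```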

Current Trends in Architecture


- Cannot continue to leverage instruction-level parallelism (ILP)
  - Single-processor performance improvement ended in 2003
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application


Instruction Level Parallelism (ILP)


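In high-level language:

    e = a + b;
    f = c + d;
    g = e * f;

In assembly language:

    * e = a+b
    001 LOAD A,R1
    002 LOAD B,R2
    003 ADD R1,R2,R3
    004 STO R3,E
    * f = c+d
    005 LOAD C,R4
    006 LOAD D,R5
    007 ADD R4,R5,R6
    008 STO R6,F
    * g = e*f
    009 MULT R3,R6,R7
    010 STO R7,G

[Figure: dataflow graph; a and b feed an addition that produces e, c and d feed an addition that produces f, and e and f feed a multiplication that produces g.]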

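Instructions that can potentially be executed in parallel:
- 001, 002, 005, 006
- 003, 007
- 004, 008, 009
- 010

Each group needs only values produced by earlier groups; for example, 009 (MULT R3,R6,R7) must wait for both 003 and 007.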
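
The grouping can be derived mechanically. The sketch below assigns each instruction the earliest step at which every register it reads has been produced, assuming unlimited functional units and one step per instruction. Only read-after-write register dependences are modeled; memory dependences and register reuse (WAR/WAW) are ignored because this program has none. The read/write sets are a hypothetical hand encoding of the listing.

```python
# Hypothetical hand encoding of the listing: (number, text, registers read, registers written)
program = [
    ("001", "LOAD A,R1", [], ["R1"]),
    ("002", "LOAD B,R2", [], ["R2"]),
    ("003", "ADD R1,R2,R3", ["R1", "R2"], ["R3"]),
    ("004", "STO R3,E", ["R3"], []),
    ("005", "LOAD C,R4", [], ["R4"]),
    ("006", "LOAD D,R5", [], ["R5"]),
    ("007", "ADD R4,R5,R6", ["R4", "R5"], ["R6"]),
    ("008", "STO R6,F", ["R6"], []),
    ("009", "MULT R3,R6,R7", ["R3", "R6"], ["R7"]),
    ("010", "STO R7,G", ["R7"], []),
]

def parallel_groups(program):
    """Group instructions by the earliest step at which their inputs are ready."""
    ready_at = {}   # register -> step in which its value is produced
    groups = {}     # step -> instruction numbers that can run in that step
    for number, _text, reads, writes in program:
        step = 1 + max((ready_at[r] for r in reads), default=0)
        for reg in writes:
            ready_at[reg] = step
        groups.setdefault(step, []).append(number)
    return [groups[step] for step in sorted(groups)]

print(parallel_groups(program))
# [['001', '002', '005', '006'], ['003', '007'], ['004', '008', '009'], ['010']]
```

Ten instructions finish in four steps under these idealized assumptions, which is the upper bound on what ILP hardware could extract from this fragment.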

Data Level Parallelism (DLP)

[Figure: data-level parallelism; the same task is applied to different pieces of data at the same time.]

Task Level Parallelism (TLP)

[Figure: task-level parallelism; work is divided into multiple tasks that can execute in parallel.]

Classes of Computers

- Personal Mobile Device (PMD)
  - e.g., smartphones, tablet computers
  - Emphasis on energy efficiency and real-time performance
- Desktop computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / warehouse-scale computers
  - Used for Software as a Service (SaaS)
  - Emphasis on availability and price-performance
  - Sub-class: supercomputers; emphasis on floating-point performance and fast internal networks
- Embedded computers
  - Emphasis: price

Parallelism
- Classes of parallelism in applications:
  - Data-level parallelism (DLP)
  - Task-level parallelism (TLP)
- Classes of architectural parallelism:
  - Instruction-level parallelism (ILP)
  - Vector architectures/graphics processor units (GPUs)
  - Thread-level parallelism
  - Request-level parallelism

Flynn's Taxonomy

- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly coupled MIMD (thread-level parallelism)
  - Loosely coupled MIMD (cluster or warehouse-scale computing)


Defining Computer Architecture


- Old view of computer architecture:
  - Instruction set architecture (ISA) design, i.e., decisions regarding registers (register-to-memory or load-store), memory addressing (e.g., byte addressing, alignment), addressing modes (e.g., register, immediate, displacement), instruction operands (type and size), available operations (CISC or RISC), control flow instructions, and instruction encoding (fixed vs. variable length)
- Real computer architecture:
  - Specific requirements of the target machine
  - Design to maximize performance within constraints: cost, power, and availability
  - Includes ISA, microarchitecture, hardware

Trends in Technology
- Integrated circuit technology
  - Transistor density: 35%/year
  - Die size: 10-20%/year
  - Integration overall (Moore's Law): 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
- Flash capacity (non-volatile semiconductor memory): 50-60%/year
  - 15-20X cheaper/bit than DRAM
- Magnetic disk technology: density increase: 40%/year
  - 15-25X cheaper/bit than Flash
  - 300-500X cheaper/bit than DRAM
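
Annual growth rates are easier to compare once converted to doubling times, using doubling time = ln 2 / ln(1 + rate). This minimal sketch assumes steady compound growth at the lower end of each quoted range; real trends fluctuate from year to year.

```python
import math

def doubling_time(annual_rate):
    """Years for a quantity growing by `annual_rate` per year (0.35 = 35%) to double."""
    return math.log(2) / math.log(1 + annual_rate)

# Rates quoted above (lower end where a range is given)
rates = {
    "Transistor density": 0.35,
    "Integration overall": 0.40,
    "DRAM capacity": 0.25,
    "Flash capacity": 0.50,
    "Disk density": 0.40,
}
for name, rate in rates.items():
    print(f"{name}: doubles about every {doubling_time(rate):.1f} years")
```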


Bandwidth and Latency


- Bandwidth or throughput
  - Total work done in a given time (e.g., MB/sec, MIPS)
  - 10,000-25,000X improvement for processors
  - 300-1200X improvement for memory and disks
- Latency or response time
  - Time between start and completion of an event
  - 30-80X improvement for processors
  - 6-8X improvement for memory and disks

[Figure: log-log plot of bandwidth and latency milestones.]


Transistors and Wires


- Feature size
  - Minimum size of a transistor or wire in the x or y dimension
  - 10 microns in 1971 to 0.032 microns in 2011
  - Transistor performance scales linearly
    - Wire delay does not improve with feature size!
  - Integration density scales quadratically
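
As a rough check of the quadratic claim: shrinking the feature size from 10 to 0.032 microns is a 312.5x reduction in each dimension, so roughly 312.5² ≈ 97,656x more devices fit in the same area. The sketch assumes idealized scaling in which density is proportional to 1/(feature size)²; real layouts do not shrink this perfectly.

```python
def scaling(old_feature, new_feature):
    """Per-dimension shrink factor and the resulting (idealized) density gain."""
    linear = old_feature / new_feature
    return linear, linear ** 2

linear, density = scaling(10, 0.032)   # microns, 1971 vs. 2011
print(f"{linear:.1f}x smaller in x and y, about {density:,.0f}x more devices per unit area")
```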

Trends in Power and Energy

Power (1 Watt = 1 Joule/sec) and Energy (Joules)


- Problem: get power in, get power out
- Thermal Design Power (TDP)
  - Characterizes sustained power consumption
  - Used as target for power supply and cooling system
  - Lower than peak power, higher than average power consumption
- Clock rate can be reduced dynamically to limit power consumption
- Energy per task is often a better measurement

Energy and Power Example


- Processor B: average power P; executes a task in time T
  - Energy consumption = P × T
- Processor A: 20% higher average power consumption, 1.2 P
  - Executes the task in 70% of the time needed by B: 0.7 T
  - Energy consumption: 1.2 P × 0.7 T = 0.84 P × T, so A is more energy efficient!

It is better to use energy instead of power when comparing processors on a fixed workload.
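
The same comparison in code: energy is average power multiplied by execution time, so the higher-power processor A still uses less energy for the task. The absolute values of P and T are hypothetical; only the ratio matters.

```python
def energy(avg_power, exec_time):
    """Energy = average power x execution time (watts x seconds = joules)."""
    return avg_power * exec_time

P, T = 100.0, 10.0                 # hypothetical power (W) and time (s) for processor B
e_b = energy(P, T)
e_a = energy(1.2 * P, 0.7 * T)     # A: 20% more power, 70% of B's time
print(f"A uses {e_a / e_b:.2f} of B's energy")   # A uses 0.84 of B's energy
```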


Dynamic Energy and Power


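- Dynamic energy
  - Transistor switch from 0 -> 1 or 1 -> 0: $\text{Energy}_{\text{dynamic}} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
- Dynamic power
  - $\text{Power}_{\text{dynamic}} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
- Reducing clock rate reduces power, not energy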

Dynamic Energy and Power: Example


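- Assume a processor with Dynamic Voltage Scaling (DVS) and that a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic energy and dynamic power? Assume that the capacitance is unchanged.

$$\frac{\text{Energy}_{\text{new}}}{\text{Energy}_{\text{old}}} = \frac{(\text{Voltage} \times 0.85)^2}{\text{Voltage}^2} = 0.85^2 \approx 0.72$$

$$\frac{\text{Power}_{\text{new}}}{\text{Power}_{\text{old}}} = \frac{(\text{Voltage} \times 0.85)^2 \times (\text{Frequency} \times 0.85)}{\text{Voltage}^2 \times \text{Frequency}} = 0.85^3 \approx 0.61$$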
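
Because capacitance is unchanged, both ratios depend only on the voltage and frequency scale factors: dynamic energy scales with V² and dynamic power with V² × f. The sketch below reproduces the 0.72 and 0.61 results; the function name is illustrative.

```python
def dvs_ratios(voltage_scale, frequency_scale):
    """New/old dynamic energy and power when V and f are scaled (capacitance fixed).

    Energy_dynamic is proportional to V^2, Power_dynamic to V^2 * f.
    """
    energy_ratio = voltage_scale ** 2
    power_ratio = voltage_scale ** 2 * frequency_scale
    return energy_ratio, power_ratio

e, p = dvs_ratios(0.85, 0.85)
print(f"energy x{e:.2f}, power x{p:.2f}")   # energy x0.72, power x0.61
```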

Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from a 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air

Reducing Power
- Techniques for reducing power:
  - Do nothing well: turn off the clock of inactive modules
  - Dynamic voltage-frequency scaling
  - Low-power state for DRAM, disks (spin-down)
  - Overclocking some cores and turning off other cores

Static Power
- Static power consumption
  - Leakage current flows even when a transistor is off
  - Leakage can be as high as 50% of total power, due to large SRAM caches that need power to maintain their values
  - $\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$
  - Scales with the number of transistors
  - To reduce: power gating (i.e., turn off the power supply)

Trends in Cost
- Cost driven down by learning curve
  - Yield (% of manufactured devices that survive testing)
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume (less time to get down the learning curve, and increased manufacturing efficiency)
  - 10% less for each doubling of volume


Integrated Circuit Cost


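- Integrated circuit

$$\text{Cost of integrated circuit} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$

$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$

The second term estimates the number of dies along the wafer's perimeter.

- Bose-Einstein formula (empirical):

$$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^N}$$

  - Defects per unit area = 0.016-0.057 defects per square cm (2010)
  - N = process-complexity factor = 11.5-15.5 (40 nm, 2010)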

Integrated Circuit Cost: Example


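- What is the number of dies on a 30 cm diameter wafer for a die that is 1.5 cm on a side and for a die that is 1.0 cm on a side?

$$\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{1.5 \times 1.5} - \frac{\pi \times 30}{\sqrt{2 \times 1.5 \times 1.5}} \approx 270$$

$$\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{1.0 \times 1.0} - \frac{\pi \times 30}{\sqrt{2 \times 1.0 \times 1.0}} \approx 640$$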
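
The dies-per-wafer formula as a small function, reproducing the two results above. Dimensions are in centimeters; the function returns the unrounded estimate, so the caller rounds to whole dies. The function name is illustrative.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Whole dies per wafer: wafer area / die area, minus the partial dies on the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_dies = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_dies

for side_cm in (1.5, 1.0):
    print(f"{side_cm} cm die: {dies_per_wafer(30, side_cm ** 2):.0f} dies")
# 1.5 cm die: 270 dies
# 1.0 cm die: 640 dies
```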

Integrated Circuit Cost: Example


- What is the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.031/cm², N = 13.5, and a wafer yield of 100%?

$$\text{Die yield} = \frac{1}{(1 + 0.031 \times 1.5 \times 1.5)^{13.5}} \approx 0.40$$

$$\text{Die yield} = \frac{1}{(1 + 0.031 \times 1.0 \times 1.0)^{13.5}} \approx 0.66$$
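
The Bose-Einstein yield model as code, assuming a 100% wafer yield as in the example. `die_yield` is an illustrative name, and the model itself is empirical.

```python
def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Bose-Einstein empirical model: wafer yield / (1 + defect density * area)^N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

for side_cm in (1.5, 1.0):
    y = die_yield(0.031, side_cm ** 2, 13.5)
    print(f"{side_cm} cm die: yield {y:.2f}")   # 0.40, then 0.66
```

The larger die loses more to defects twice over: fewer dies fit on the wafer, and a smaller fraction of them work.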

Dependability
- Service Level Agreement (SLA) or Service Level Objectives (SLO)
- Module reliability
  - Mean time to failure (MTTF): reliability measure
  - Mean time to repair (MTTR): measure of service interruption
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF
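
The definitions above translate directly into code. The module figures used below (an MTTF of 10,000 hours and an MTTR of 10 hours) are hypothetical, chosen only to show the calculation.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours

def availability(mttf_hours, mttr_hours):
    """Fraction of time the module delivers service: MTTF / MTBF."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)

# Hypothetical module: MTTF of 10,000 hours, MTTR of 10 hours
print(f"{availability(10_000, 10):.4%}")   # 99.9001%
```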

MTTR, MTTF, and MTBF


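[Figure: timeline of MTTR, MTTF, and MTBF. After a failure, the machine is repaired (MTTR) and then runs until the next failure (MTTF); MTBF spans from one failure to the next.]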

© 2004 D. A. Menascé. All Rights Reserved.

Dependability Example
- A disk subsystem has the following components and MTTF values:

| Component        | MTTF (hours)   |
|------------------|----------------|
| 10 disks         | 1,000,000 each |
| 1 ATA controller | 500,000        |
| 1 power supply   | 200,000        |
| 1 fan            | 200,000        |
| 1 ATA cable      | 1,000,000      |

- Assume that lifetimes are exponentially distributed and independent. What is the system's MTTF?

Dependability Example (cont'd)

$$\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{23{,}000}{1{,}000{,}000{,}000 \text{ hours}}$$

$$\text{MTTF}_{\text{system}} = \frac{1}{\text{Failure rate}_{\text{system}}} = \frac{1{,}000{,}000{,}000 \text{ hours}}{23{,}000} \approx 43{,}500 \text{ hours} \approx 4.96 \text{ years}$$
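
Under the stated assumptions (independent, exponentially distributed lifetimes), and assuming the subsystem fails when any one component fails, component failure rates simply add. The sketch below recomputes the example; it assumes 8,760 hours per year (no leap years).

```python
HOURS_PER_YEAR = 24 * 365

def system_mttf(components):
    """MTTF of a system that fails when any component fails.

    `components` is a list of (count, mttf_hours); with independent exponential
    lifetimes the failure rates (1 / MTTF) add up.
    """
    failure_rate = sum(count / mttf for count, mttf in components)
    return 1 / failure_rate

disk_subsystem = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # ATA controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # ATA cable
]
mttf = system_mttf(disk_subsystem)
print(f"{mttf:,.0f} hours = {mttf / HOURS_PER_YEAR:.2f} years")  # 43,478 hours = 4.96 years
```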


Measuring Performance
- Typical performance metrics:
  - Response time (of interest to users)
  - Throughput (of interest to operators)
- Speedup of X relative to Y
  - $\text{Execution time}_Y / \text{Execution time}_X$
- Execution time
  - Wall-clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g., matrix multiply)
  - Toy programs (e.g., sorting)
  - Synthetic benchmarks (e.g., Dhrystone)
  - Benchmark suites (e.g., SPEC, TPC)

SPECrate and SPECratio

- SPECrate: a throughput metric. Measures the number of jobs of a given type that can be processed in a given time interval.
- SPECratio: the ratio between the elapsed time of a given job on a reference machine and the elapsed time of the same job on a given machine.
  - Execution time (reference) = 10.5 sec
  - Execution time (machine A) = 5.25 sec
  - SPECratio (machine A) = 10.5 / 5.25 = 2
- Comparing machines A and B:
  - Execution time (machine B) = 21 sec
  - Execution time (machine A) = 5.25 sec

$$\frac{\text{SPECratio}_{\text{machine A}}}{\text{SPECratio}_{\text{machine B}}} = \frac{\text{Execution time}_{\text{reference}} / \text{Execution time}_{\text{machine A}}}{\text{Execution time}_{\text{reference}} / \text{Execution time}_{\text{machine B}}} = \frac{\text{Execution time}_{\text{machine B}}}{\text{Execution time}_{\text{machine A}}} = \frac{21}{5.25} = 4$$
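
A short check of the SPECratio arithmetic, showing that the reference time cancels when two machines are compared. The function name is illustrative.

```python
def spec_ratio(reference_time, machine_time):
    """SPECratio = execution time on the reference machine / execution time on this machine."""
    return reference_time / machine_time

reference = 10.5                      # seconds
ratio_a = spec_ratio(reference, 5.25)
ratio_b = spec_ratio(reference, 21.0)
print(ratio_a, ratio_b, ratio_a / ratio_b)   # 2.0 0.5 4.0
# ratio_a / ratio_b equals 21.0 / 5.25 for any choice of `reference`
```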

Geometric Mean of SPECratios


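- When computing the average of SPECratios, one should use the geometric mean and not the arithmetic mean, because the ratio of the geometric means of two machines does not depend on which reference machine was chosen:

$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$

| Program        | SPECratio |
|----------------|-----------|
| A              | 10.2      |
| B              | 21.5      |
| C              | 15.2      |
| Geometric mean | 14.94     |

$$\text{Geometric mean} = \sqrt[3]{10.2 \times 21.5 \times 15.2} = \sqrt[3]{3{,}333.36} \approx 14.94$$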
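
A minimal geometric-mean helper. Summing logarithms gives the same result as the n-th root of the product while avoiding overflow for long lists of ratios.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n values, computed with logs to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

spec_ratios = {"A": 10.2, "B": 21.5, "C": 15.2}
print(f"{geometric_mean(spec_ratios.values()):.2f}")   # 14.94
```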

Principles

Principles of Computer Design


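- Take advantage of parallelism
  - e.g., multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of locality
  - Reuse of data and instructions
- Focus on the common case
  - Amdahl's Law:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$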

Amdahl's Law: Example

- A program takes 60 sec to execute, and 30% of its execution time can be improved by a factor of 5. What is the overall speedup?

$$\text{Execution time}_{\text{new}} = 0.7 \times 60 + \frac{0.3 \times 60}{5} = 42 + \frac{18}{5} = 45.6 \text{ sec}$$

$$\text{Speedup}_{\text{overall}} = \frac{60}{45.6} \approx 1.316$$
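
Amdahl's Law as a one-line function, checked against the example; the function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only `fraction_enhanced` of the original time is sped up."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

old_time = 60.0
new_time = old_time * ((1 - 0.3) + 0.3 / 5)
print(round(new_time, 1), round(amdahl_speedup(0.3, 5), 3))   # 45.6 1.316
```

Even an infinite speedup of the 30% portion would cap the overall speedup at 1 / 0.7 ≈ 1.43, which is why the common case matters most.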

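Amdahl's Law: Example

- There are two options to enhance the performance of a graphics application: speed up FP SQRT by a factor of 10, or speed up all FP operations by a factor of 1.6. Which is better?

| Type of enhancement | % of execution time | Speedup of enhancement | Overall speedup           |
|---------------------|---------------------|------------------------|---------------------------|
| FP SQRT             | 20%                 | 10                     | 1/(0.8 + 0.2/10) = 1.22   |
| All FP ops.         | 50%                 | 1.6                    | 1/(0.5 + 0.5/1.6) = 1.23  |

Enhancing all FP operations is slightly better (1.23 vs. 1.22).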
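
The same formula can rank competing enhancements. The fractions are the shares of execution time from the table; the dictionary labels are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only `fraction_enhanced` of the original time is sped up."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

options = {
    "FP SQRT x10": amdahl_speedup(0.20, 10),
    "All FP ops x1.6": amdahl_speedup(0.50, 1.6),
}
for name, speedup in options.items():
    print(f"{name}: {speedup:.2f}")
best = max(options, key=options.get)
print("best:", best)   # best: All FP ops x1.6
```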

Principles of Computer Design


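- The processor performance equation:

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

where CPI is the (average) number of clock cycles per instruction. For an instruction mix with $n$ classes, where class $i$ has instruction count $\text{IC}_i$ and cycles per instruction $\text{CPI}_i$:

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}}$$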
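
The processor performance equation as code. The workload size (10⁹ instructions) and the 2 GHz clock are hypothetical; the instruction mix is the one from the Processor Equations example in this chapter. Function names are illustrative.

```python
def average_cpi(mix):
    """Weighted average CPI from (fraction_of_instructions, cpi) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)

def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time (= 1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical workload: 10^9 instructions on a 2 GHz clock
cpi = average_cpi([(0.25, 4.0), (0.75, 1.33)])
print(f"CPI = {cpi:.4f}, CPU time = {cpu_time(1e9, cpi, 2e9):.3f} s")
```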

Processor Equations: Example


- We have the following measurements:

| Measurement                    | Value |
|--------------------------------|-------|
| % of FP operations             | 25%   |
| Avg. CPI of FP operations      | 4.0   |
| Avg. CPI of other instructions | 1.33  |
| % of FP SQRT                   | 2%    |
| CPI of FP SQRT                 | 20    |

- Compare the following alternatives: (a) decrease the CPI of FP SQRT to 2, or (b) decrease the average CPI of all FP operations to 2.5. Which is better?


- Original CPI:

$$\text{CPI}_{\text{orig}} = 0.25 \times 4 + 0.75 \times 1.33 \approx 2.0$$

- CPI for enhanced FP SQRT:

$$\text{CPI}_{\text{new FPSQRT}} = 2.0 - 0.02 \times 20 + 0.02 \times 2 = 1.64$$

- CPI for enhanced FP:

$$\text{CPI}_{\text{new FP}} = 2.0 - 0.25 \times 4 + 0.25 \times 2.5 = 1.625$$


- Alternative (b) is better, since 1.625 < 1.64. Because instruction count and clock rate are unchanged, its speedup is:

$$\text{Speedup} = \frac{\text{IC} \times \text{CPI}_{\text{orig}}}{\text{IC} \times \text{CPI}_{\text{new FP}}} = \frac{2.0}{1.625} \approx 1.23$$

- Most modern processors have counters for instructions executed and for clock cycles.
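
The comparison can be recomputed directly. Following the slides, the original CPI is rounded to 2.0 before the enhancements are applied, and each enhancement swaps out the CPI contribution of the affected instruction class. Variable names are illustrative.

```python
def average_cpi(mix):
    """Weighted average CPI from (fraction_of_instructions, cpi) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)

# 1.9975, rounded to 2.0 as on the slides
cpi_orig = round(average_cpi([(0.25, 4.0), (0.75, 1.33)]), 1)

# (a) FP SQRT, 2% of instructions: CPI 20 -> 2
cpi_new_fpsqrt = cpi_orig - 0.02 * 20 + 0.02 * 2
# (b) all FP operations, 25% of instructions: CPI 4.0 -> 2.5
cpi_new_fp = cpi_orig - 0.25 * 4.0 + 0.25 * 2.5

# Instruction count and clock rate are unchanged, so speedup is the ratio of CPIs
for name, cpi in [("FP SQRT", cpi_new_fpsqrt), ("All FP", cpi_new_fp)]:
    print(f"{name}: CPI {cpi:.3f}, speedup {cpi_orig / cpi:.2f}")
```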

Fallacies and Pitfalls

- Fallacy: Multiprocessors are a silver bullet.
  - Improve performance by replacing a high-clock-rate, inefficient core with several lower-clock-rate, efficient cores.
  - The burden of real performance improvement is now shifted to programmers.
- Pitfall: Falling prey to Amdahl's Law.
  - Check the overall improvement using Amdahl's Law before spending effort on an enhancement.

Fallacies and Pitfalls (cont'd)

- Pitfall: A single point of failure.
  - Dependability is only as strong as the weakest link.
- Fallacy: Hardware enhancements that increase performance improve energy efficiency, or are at worst energy neutral.
  - Running SPEC2006 on an Intel Core i7 in Turbo mode showed a 7% performance increase at a cost of 37% more energy and 47% more power.

Fallacies and Pitfalls (cont'd)

- Fallacy: Benchmarks remain valid indefinitely.
  - There is a tendency to optimize performance for standard benchmarks.
- Fallacy: The rated MTTF of disks is 1,200,000 hours (almost 140 years), so disks practically never fail.
  - These numbers are exaggerated by manufacturers' misleading testing processes. Real-world MTTF is about 2 to 10 times worse than the manufacturer's MTTF.

Fallacies and Pitfalls (cont'd)

- Fallacy: Peak performance tracks observed performance.
  - Observed performance as a fraction of peak performance varies significantly (from 5% to 60%) by benchmark and computer.
- Pitfall: Fault detection can lower availability.
  - A relatively high percentage of a system's components can fail without affecting its correct operation. If the system's operation is interrupted when these faults are detected, availability is reduced.
