
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
5th Edition

Computer Organization
January 5, 2015
Vara Varavithya


The lecture material for this course has been adapted in part from UC Berkeley, Penn State, UB, and the publisher.

Overview

Administrative Matters (5 minutes)
Course Style, Philosophy and Structure (15 minutes)
Intro to Computer Architecture

Chapter 1 Computer Abstractions and Technology 3

Course Administration

Instructor: Vara Varavithya (varavithya@gmail.com)
Office Hours: By appointment

Materials: https://www.facebook.com/groups/629713900488373/

Text: Computer Organization and Design: The Hardware/Software Interface, Fifth Edition, Patterson and Hennessy

Textbook purchase options

Paperback or eBook
CU University Bookstore on campus

Patterson Computer Org & Design, 5th Edition

Amazon


Midterm Exams

Reduce the pressure of taking exams
50 minutes
Our goal: test knowledge, not writing speed
All midterms are closed book
Necessary sheets will be provided

Homework Assignments and Project

Assignments consist of two parts
  Homeworks (individual effort): exercises from the textbook
  Projects (team effort): lab assignments

Homework assignments
  Exercises due in one week, at the beginning of lecture
  Brief (15-minute) quiz on assignment material in lecture or recitations (several of them)
  Must understand the assignment to do the quiz
  No late assignments!
  Recitations will help solve problems

Homeworks will not be graded, but checked for completeness

Project/Lab Summary

Verilog installed on Unix


SPIM available on Unix, Windows
Lab assignments:

2-3 Projects

Written report


Grading

Grade breakdown

Three Midterm Exams:  30%
Final:                40%
Homeworks:            10%
Projects:             10%
Quizzes:              10%
Class Attendance:    +10%

Course Problems

Can't make a test
  Tell us early and we will schedule an alternate time, if appropriate

Three coupons for 7 days of extension; after that, NO late homeworks or labs

What is cheating?
  Common examples of cheating: running out of time on an assignment and then picking up output, taking homework from another student and copying it, asking to borrow a solution "just to take a look", copying an exam question, ...
  Studying together in groups is encouraged
  Work must be your own
  Better off skipping an assignment than copying
  But copying doesn't help on the quiz (15% of grade) anyway

The Instruction Set: a Critical Interface

[Figure: the instruction set shown as the interface layer between software (above) and hardware (below)]

Chapter 1
Computer Abstractions and Technology

1.1 Introduction

The Computer Revolution

Progress in computer technology
  Underpinned by Moore's Law
Makes novel applications feasible
  Computers in automobiles
  Cell phones
  Human genome project
  World Wide Web
  Search engines
Computers are pervasive

Classes of Computers

Personal computers

General purpose, variety of software


Subject to cost/performance tradeoff

Server computers

Network based
High capacity, performance, reliability
Range from small servers to building-sized

Classes of Computers

Supercomputers

High-end scientific and engineering calculations
Highest capability but represent a small fraction of the overall computer market

Embedded computers

Hidden as components of systems


Stringent power/performance/cost constraints


The PostPC Era


Personal Mobile Device (PMD)

Battery operated
Connects to the Internet
Hundreds of dollars
Smart phones, tablets, electronic glasses

Cloud computing

Warehouse Scale Computers (WSC)
Software as a Service (SaaS)
Portion of software runs on a PMD and a portion runs in the Cloud
Amazon and Google

What You Will Learn

How programs are translated into the machine language
  And how the hardware executes them
The hardware/software interface
What determines program performance
  And how it can be improved
How hardware designers improve performance
What is parallel processing

Understanding Performance

Algorithm
  Determines number of operations executed
Programming language, compiler, architecture
  Determine number of machine instructions executed per operation
Processor and memory system
  Determine how fast instructions are executed
I/O system (including OS)
  Determines how fast I/O operations are executed

1.2 Eight Great Ideas in Computer Architecture

Eight Great Ideas

Design for Moore's Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy

1.3 Below Your Program

Below Your Program

Application software
  Written in high-level language
System software
  Compiler: translates HLL code to machine code
  Operating System: service code
    Handling input/output
    Managing memory and storage
    Scheduling tasks & sharing resources
Hardware
  Processor, memory, I/O controllers

Levels of Program Code

High-level language
  Level of abstraction closer to problem domain
  Provides for productivity and portability
Assembly language
  Textual representation of instructions
Hardware representation
  Binary digits (bits)
  Encoded instructions and data

1.4 Under the Covers

Components of a Computer

The BIG Picture

Same components for all kinds of computer
  Desktop, server, embedded
Input/output includes
  User-interface devices
    Display, keyboard, mouse
  Storage devices
    Hard disk, CD/DVD, flash
  Network adapters
    For communicating with other computers

Touchscreen

PostPC device
Supersedes keyboard and mouse
Resistive and capacitive types
  Most tablets, smart phones use capacitive
  Capacitive allows multiple touches simultaneously

Through the Looking Glass

LCD screen: picture elements (pixels)

Mirrors content of frame buffer memory


Opening the Box


Capacitive multitouch LCD screen
3.8 V, 25 Watt-hour battery
Computer board


Inside the Processor (CPU)

Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory
  Small fast SRAM memory for immediate access to data

Inside the Processor

Apple A5


Abstractions
The BIG Picture

Abstraction helps us deal with complexity
  Hide lower-level detail
Instruction set architecture (ISA)
  The hardware/software interface
Application binary interface
  The ISA plus system software interface
Implementation
  The details underlying an interface

A Safe Place for Data

Volatile main memory
  Loses instructions and data when powered off
Non-volatile secondary memory
  Magnetic disk
  Flash memory
  Optical disk (CD-ROM, DVD)

Networks

Communication, resource sharing, nonlocal access
Local area network (LAN): Ethernet
Wide area network (WAN): the Internet
Wireless network: WiFi, Bluetooth

1.5 Technologies for Building Processors and Memory

Technology Trends

Electronics technology continues to evolve
  Increased capacity and performance
  Reduced cost

[Figure: DRAM capacity]

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000

Semiconductor Technology

Silicon: semiconductor
Add materials to transform properties:

Conductors
Insulators
Switch


Manufacturing ICs

Yield: proportion of working dies per wafer



Intel Core i7 Wafer

300mm wafer, 280 chips, 32nm technology


Each chip is 20.7 x 10.5 mm

Integrated Circuit Cost


$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^2}$$

Nonlinear relation to area and defect rate
  Wafer cost and area are fixed
  Defect rate determined by manufacturing process
  Die area determined by architecture and circuit design
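The cost and yield relations above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the wafer cost, areas, and defect density are made-up values chosen only to keep the arithmetic easy.

```python
def dies_per_wafer(wafer_area: float, die_area: float) -> float:
    """Approximate die count: wafer area / die area (edge losses ignored)."""
    return wafer_area / die_area


def die_yield(defects_per_area: float, die_area: float) -> float:
    """Yield = 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def cost_per_die(wafer_cost: float, wafer_area: float,
                 die_area: float, defects_per_area: float) -> float:
    """Cost per die = wafer cost / (dies per wafer * yield)."""
    good_dies = (dies_per_wafer(wafer_area, die_area)
                 * die_yield(defects_per_area, die_area))
    return wafer_cost / good_dies


# Assumed inputs: areas in mm^2, defects per mm^2, cost in arbitrary units.
small = cost_per_die(wafer_cost=5000, wafer_area=10000,
                     die_area=100, defects_per_area=0.02)
large = cost_per_die(wafer_cost=5000, wafer_area=10000,
                     die_area=200, defects_per_area=0.02)
print(f"cost per 100 mm^2 die: {small:.0f}")
print(f"cost per 200 mm^2 die: {large:.0f}")
```

With these assumed numbers, doubling the die area raises the cost per die from 200 to about 900, which illustrates the nonlinear relation to area.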

1.6 Performance

Defining Performance

Which airplane has the best performance?

[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]

Response Time and Throughput

Response time
  How long it takes to do a task
Throughput
  Total work done per unit time
    e.g., tasks/transactions/... per hour
How are response time and throughput affected by
  Replacing the processor with a faster version?
  Adding more processors?
We'll focus on response time for now

Relative Performance

Define Performance = 1/Execution Time
"X is n times faster than Y"

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Example: time taken to run a program
  10s on A, 15s on B
  Execution Time_B / Execution Time_A = 15s / 10s = 1.5
  So A is 1.5 times faster than B
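The definition above translates directly into code. This is a small illustrative Python sketch; the function names are hypothetical, and the inputs are the 10s and 15s execution times from the example.

```python
def performance(execution_time: float) -> float:
    """Performance is defined as 1 / execution time."""
    return 1.0 / execution_time


def times_faster(time_x: float, time_y: float) -> float:
    """How many times faster X is than Y: Performance_X / Performance_Y."""
    return performance(time_x) / performance(time_y)


# Example from the slide: the program takes 10 s on A and 15 s on B.
n = times_faster(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")
```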

Measuring Execution Time

Elapsed time
  Total response time, including all aspects
    Processing, I/O, OS overhead, idle time
  Determines system performance
CPU time
  Time spent processing a given job
    Discounts I/O time, other jobs' shares
  Comprises user CPU time and system CPU time
  Different programs are affected differently by CPU and system performance
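The difference between elapsed time and CPU time can be observed with Python's standard `time` module. The sketch below is an illustration under stated assumptions: `measure`, `mostly_idle`, and `mostly_compute` are hypothetical helper names, and `time.sleep` merely stands in for I/O wait or idle time.

```python
import time


def measure(job):
    """Return (elapsed_seconds, cpu_seconds) for one call of job()."""
    wall_start = time.perf_counter()  # wall-clock (elapsed) time
    cpu_start = time.process_time()   # user + system CPU time of this process
    job()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def mostly_idle():
    time.sleep(0.2)  # waiting counts toward elapsed time, not CPU time


def mostly_compute():
    sum(i * i for i in range(200_000))  # keeps the processor busy


idle_elapsed, idle_cpu = measure(mostly_idle)
busy_elapsed, busy_cpu = measure(mostly_compute)
print(f"idle job:    elapsed={idle_elapsed:.3f}s cpu={idle_cpu:.3f}s")
print(f"compute job: elapsed={busy_elapsed:.3f}s cpu={busy_cpu:.3f}s")
```

For the idle job the CPU time should be a small fraction of the elapsed time, while for the compute job the two are usually close; exact figures depend on the machine and its load.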

CPU Clocking

Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram; within each clock period, data transfer and computation are followed by a state update]

Clock period: duration of a clock cycle
  e.g., $250\,\text{ps} = 0.25\,\text{ns} = 250 \times 10^{-12}\,\text{s}$
Clock frequency (rate): cycles per second
  e.g., $4.0\,\text{GHz} = 4000\,\text{MHz} = 4.0 \times 10^{9}\,\text{Hz}$

CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

Performance improved by
  Reducing number of clock cycles
  Increasing clock rate
  Hardware designer must often trade off clock rate against cycle count

CPU Time Example

Computer A: 2GHz clock, 10s CPU time
Designing Computer B
  Aim for 6s CPU time
  Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$
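The same calculation can be written as a short Python sketch. The function names are hypothetical; the numbers (2GHz, 10s, 6s, and the 1.2 cycle-count factor) come from the example above.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time * clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock rate needed to finish `cycles` within the target CPU time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # Computer A
cycles_b = 1.2 * cycles_a                                    # B needs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")
```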

Instruction Count and CPI


$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

Instruction Count for a program
  Determined by program, ISA and compiler
Average cycles per instruction
  Determined by CPU hardware
  If different instructions have different CPI
    Average CPI affected by instruction mix

CPI Example

Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

A is faster, by this much:

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
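As a quick check of the comparison above, here is a minimal Python sketch. Because both machines run the same ISA, the instruction count I cancels, so the code compares time per instruction only. The function name is hypothetical.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction = CPI * clock cycle time (picoseconds)."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)  # Computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)  # Computer B

# Same ISA means the same instruction count, so the CPU time ratio
# equals the ratio of the per-instruction times.
ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"{faster} is faster by a factor of {ratio:.1f}")
```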

CPI in More Detail

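If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

The fraction $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.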

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

Sequence 1: IC = 5
  Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
  Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  Avg. CPI = 9/6 = 1.5
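The weighted-average CPI calculation can be sketched in Python as follows. The per-class CPI values and instruction counts are read off the cycle-count arithmetic in the example above (2×1 + 1×2 + 2×3 and 4×1 + 1×2 + 1×3); the dictionary layout and function names are assumptions made for illustration.

```python
# CPI of each instruction class, taken from the example's arithmetic.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Clock cycles = sum over classes of CPI_i * instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}  # IC = 5
sequence_2 = {"A": 4, "B": 1, "C": 1}  # IC = 6

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, "cycles:", total_cycles(seq), "avg CPI:", average_cpi(seq))
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why instruction count alone is not a reliable performance indicator.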

Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Performance depends on
  Algorithm: affects IC, possibly CPI
  Programming language: affects IC, CPI
  Compiler: affects IC, CPI
  Instruction set architecture: affects IC, CPI, Tc

1.7 The Power Wall

Power Trends

In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

[Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000]

Reducing Power

Suppose a new CPU has
  85% of capacitive load of old CPU
  15% voltage and 15% frequency reduction

$$\frac{P_\text{new}}{P_\text{old}} = \frac{C_\text{old} \times 0.85 \times (V_\text{old} \times 0.85)^2 \times F_\text{old} \times 0.85}{C_\text{old} \times V_\text{old}^2 \times F_\text{old}} = 0.85^4 = 0.52$$

The power wall
  We can't reduce voltage further
  We can't remove more heat
How else can we improve performance?
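The power ratio above can be reproduced with a few lines of Python. This is an illustrative sketch: the function name is hypothetical, and the old CPU's load, voltage, and frequency are normalized to 1.0 because only the ratio matters.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power = capacitive load * voltage^2 * frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Old CPU normalized to 1.0; new CPU has 85% of the load,
# and a 15% reduction in both voltage and frequency.
p_old = dynamic_power(1.0, 1.0, 1.0)
p_new = dynamic_power(0.85, 0.85, 0.85)
ratio = p_new / p_old
print(f"P_new / P_old = {ratio:.2f}")
```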

1.8 The Sea Change: The Switch to Multiprocessors

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Multiprocessors

Multicore microprocessors
  More than one processor per chip
Requires explicitly parallel programming
  Compare with instruction level parallelism
    Hardware executes multiple instructions at once
    Hidden from the programmer
  Hard to do
    Programming for performance
    Load balancing
    Optimizing communication and synchronization

SPEC CPU Benchmark

Programs used to measure performance
  Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
  Develops benchmarks for CPU, I/O, Web, ...
SPEC CPU2006
  Elapsed time to execute a selection of programs
    Negligible I/O, so focuses on CPU performance
  Normalize relative to reference machine
  Summarize as geometric mean of performance ratios
    CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
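The geometric mean used to summarize the ratios can be sketched in Python as below. The function name is hypothetical and the sample ratios are made-up values, not SPEC results; the sum-of-logarithms form is one common way to compute the n-th root of a product.

```python
import math


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n positive ratios, computed via logarithms."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution-time ratios (reference time / measured time).
sample_ratios = [2.0, 8.0]
print(f"geometric mean = {geometric_mean(sample_ratios):.2f}")
```

Unlike the arithmetic mean, the geometric mean of ratios gives the same relative ranking regardless of which machine is chosen as the reference.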

CINT2006 for Intel Core i7 920


SPEC Power Benchmark

Power consumption of server at different workload levels
  Performance: ssj_ops/sec
  Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
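The overall metric is a ratio of sums across the eleven load levels, not an average of per-level ratios. A minimal Python sketch follows; the function name is hypothetical and the throughput and power figures are invented purely for illustration.

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings at 100%, 90%, ..., 0% load (11 levels, i = 0..10).
ops = [1000 * k for k in range(10, -1, -1)]       # ssj_ops at each level
watts = [100 + 15 * k for k in range(10, -1, -1)]  # power at each level
overall = overall_ssj_ops_per_watt(ops, watts)
print(f"overall ssj_ops per Watt = {overall:.1f}")
```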

SPECpower_ssj2008 for Xeon X5650


1.10 Fallacies and Pitfalls

Pitfall: Amdahl's Law

Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_\text{improved} = \frac{T_\text{affected}}{\text{improvement factor}} + T_\text{unaffected}$$

Example: multiply accounts for 80s/100s
  How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

  Can't be done!
Corollary: make the common case fast
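The multiply example can be worked through in Python. This is a sketch with hypothetical function names; the 80s and 20s figures come from the example above, and `required_factor` returns `None` when no finite improvement factor can reach the target.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_time: float):
    """Improvement factor needed to reach target_time, or None if impossible."""
    remaining = target_time - t_unaffected  # time budget left for the improved part
    if remaining <= 0:
        return None
    return t_affected / remaining


# Multiply takes 80 s of a 100 s run; the other 20 s is unaffected.
print(improved_time(80, 20, factor=4))                   # 40.0 (2.5x overall)
print(required_factor(80, 20, target_time=100 / 5))      # None: 5x overall is impossible
print(required_factor(80, 20, target_time=100 / 2.5))    # 4.0
```

However fast multiply becomes, the total time stays above the 20s unaffected portion, so the overall speedup approaches but never reaches 5×.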

Fallacy: Low Power at Idle

Look back at i7 power benchmark
  At 100% load: 258W
  At 50% load: 170W (66%)
  At 10% load: 121W (47%)
Google data center
  Mostly operates at 10% to 50% load
  At 100% load less than 1% of the time
Consider designing processors to make power proportional to load
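The percentages quoted above follow from dividing each power reading by the peak reading. The short Python sketch below recomputes them and, as an added point of comparison, shows what an ideally power-proportional machine would draw; the function name and the "ideal" column are illustrative assumptions.

```python
def percent_of_peak(power_w: float, peak_w: float) -> int:
    """Power at a given load as a rounded percentage of peak power."""
    return round(100 * power_w / peak_w)


# Load level (%) -> measured power (W), as listed on the slide.
measurements = {100: 258, 50: 170, 10: 121}
peak = measurements[100]

for load, watts in measurements.items():
    actual = percent_of_peak(watts, peak)
    # An ideally proportional machine would draw `load` percent of peak.
    print(f"{load:3d}% load: {watts} W = {actual}% of peak (ideal: {load}%)")
```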

Pitfall: MIPS as a Performance Metric

MIPS: Millions of Instructions Per Second
  Doesn't account for
    Differences in ISAs between computers
    Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

CPI varies between programs on a given CPU
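To see why MIPS can mislead, the Python sketch below evaluates both forms of the formula. The function names, the 4GHz clock rate, and the two CPI values are hypothetical and chosen only for illustration.

```python
def mips_from_counts(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_rate(clock_rate_hz: float, cpi: float) -> float:
    """Equivalent form: MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


clock_rate = 4e9  # assumed 4 GHz CPU

# Two hypothetical programs on the same CPU with different average CPI.
print(mips_from_rate(clock_rate, cpi=1.0))  # 4000.0
print(mips_from_rate(clock_rate, cpi=2.0))  # 2000.0

# Cross-check: 10^9 instructions at CPI 2.0 take 10^9 * 2 / 4e9 = 0.5 s.
print(mips_from_counts(1e9, 0.5))           # 2000.0
```

The same processor reports two different MIPS ratings depending on the program, so a single MIPS number says little about performance without the workload and the ISA.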

1.9 Concluding Remarks

Concluding Remarks

Cost/performance is improving
  Due to underlying technology development
Hierarchical layers of abstraction
  In both hardware and software
Instruction set architecture
  The hardware/software interface
Execution time: the best performance measure
Power is a limiting factor
  Use parallelism to improve performance