Introduction
1 core = 1 bit
What are the elements of a Von Neumann computer? How do these elements interact to execute programs?
Stored-program computer
[Figure: program and data held together in memory, connected by the system bus]
Copyright 2013, Daniel A. Menasce. All rights reserved. 6
A program is a sequence of instructions: (1) loads and stores between memory and registers, (2) arithmetic and logical operations, (3) conditional and unconditional branches.
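The three instruction classes can be illustrated with a toy stored-program machine. This is a minimal sketch, not a real ISA: the opcode names (`LOAD`, `STORE`, `ADD`, `SUB`, `BNEZ`, `JMP`, `HALT`), the four-register file, and the memory layout are all assumptions made for illustration.

```python
def run(memory, max_steps=1000):
    """Fetch-decode-execute loop for a hypothetical toy machine.

    Program and data share one memory (a Python list): instruction cells
    hold tuples, data cells hold integers.
    """
    regs = [0] * 4                  # four general-purpose registers (assumed)
    pc = 0                          # program counter
    for _ in range(max_steps):
        op, *args = memory[pc]      # fetch and decode
        pc += 1
        if op == "LOAD":            # register <- memory[address]
            rx, addr = args
            regs[rx] = memory[addr]
        elif op == "STORE":         # memory[address] <- register
            rx, addr = args
            memory[addr] = regs[rx]
        elif op == "ADD":           # rx <- ry + rz
            rx, ry, rz = args
            regs[rx] = regs[ry] + regs[rz]
        elif op == "SUB":           # rx <- ry - rz
            rx, ry, rz = args
            regs[rx] = regs[ry] - regs[rz]
        elif op == "BNEZ":          # conditional branch: taken if rx != 0
            rx, addr = args
            if regs[rx] != 0:
                pc = addr
        elif op == "JMP":           # unconditional branch
            (addr,) = args
            pc = addr
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit exceeded")


def demo_program():
    """Hypothetical program: sum the integers n, n-1, ..., 1 with n stored at address 10."""
    return [
        ("LOAD", 0, 10),    # 0: r0 <- n
        ("LOAD", 1, 11),    # 1: r1 <- running sum (starts at 0)
        ("LOAD", 2, 12),    # 2: r2 <- constant 1
        ("ADD", 1, 1, 0),   # 3: sum += n
        ("SUB", 0, 0, 2),   # 4: n -= 1
        ("BNEZ", 0, 3),     # 5: loop while n != 0
        ("STORE", 1, 13),   # 6: memory[13] <- sum
        ("HALT",),          # 7
        0, 0,               # 8-9: unused
        3, 0, 1, 0,         # 10: n, 11: initial sum, 12: constant 1, 13: result
    ]


result = run(demo_program())[13]
```

Because instructions and data live in the same list, the sketch also shows the defining property of a stored-program computer: the program is just another kind of memory content.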
[Figure: instruction formats built from opcode, register (RX, RY, RZ), and Address fields]
Computer Technology
Performance improvements:
- Improvements in semiconductor technology: feature size, clock speed
- Improvements in computer architecture: enabled by HLL compilers and UNIX, leading to RISC architectures
Moore's Law
The number of transistors on integrated circuits doubles approximately every two years. (Gordon Moore, 1965; the two-year period is his 1975 revision of the original one-year estimate.)
[Figure: expression tree over the operands a, b, c, e, f, g with the operators + and *]
Instructions that can potentially be executed in parallel (one group per line):
- 001, 002, 005, 006
- 003, 007
- 004, 008, 009
- 010
[Figures: tasks and the data they operate on]
Classes of Computers
- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse-Scale Computers
  - Used for Software as a Service (SaaS)
  - Emphasis on availability and price-performance
  - Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
- Embedded Computers
  - Emphasis: price
Copyright 2012, Elsevier Inc. All rights reserved.
Parallelism
- Instruction-Level Parallelism (ILP)
- Vector architectures / Graphics Processing Units (GPUs)
- Thread-Level Parallelism
- Request-Level Parallelism
Flynn's Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
Instruction set architecture (ISA) design covers:
- Registers (register-to-memory or load-store)
- Memory addressing (e.g., byte addressing, alignment)
- Addressing modes (e.g., register, immediate, displacement)
- Instruction operands (type and size)
- Available operations (CISC or RISC)
- Control flow instructions
- Instruction encoding (fixed vs. variable length)
Computer architecture more broadly:
- Specific requirements of the target machine
- Design to maximize performance within constraints: cost, power, and availability
- Includes ISA, microarchitecture, hardware
Trends in Technology
- Transistor density: 35%/year
- Die size: 10-20%/year
- Integration overall (Moore's Law): 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
- Flash capacity (non-volatile semiconductor memory): 50-60%/year
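Annual growth rates and doubling times are two views of the same compound growth. The sketch below converts between them; the function names are illustrative and compound (geometric) growth is assumed. Doubling every two years corresponds to roughly 41%/year, which sits at the low end of the 40-55%/year range quoted for overall integration.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a compound annual growth rate (0.40 means 40%/year)."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_for_doubling(years):
    """Compound annual growth rate implied by doubling every `years` years."""
    return 2 ** (1 / years) - 1


two_year_doubling_rate = annual_growth_for_doubling(2)   # about 0.414
slow_end = doubling_time_years(0.40)                     # a little over 2 years
fast_end = doubling_time_years(0.55)                     # a little under 1.6 years
```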
Bandwidth or throughput
- Total work done in a given time (e.g., MB/sec, MIPS)
- 10,000-25,000X improvement for processors
- 300-1200X improvement for memory and disks
Latency or response time
- Time between start and completion of an event
- 30-80X improvement for processors
- 6-8X improvement for memory and disks
Feature size
- Minimum size of transistor or wire in x or y dimension
- 10 microns in 1971 to 0.032 microns in 2011
- Transistor performance scales linearly with decreasing feature size
Problem: Get power in, get power out
Thermal Design Power (TDP)
- Characterizes sustained power consumption
- Used as target for power supply and cooling system
- Lower than peak power, higher than average power consumption
Clock rate can be reduced dynamically to limit power consumption
Energy per task is often a better measurement
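Energy consumption = P * T
Dynamic energy (per transistor transition, 0 to 1 or 1 to 0)
$$\text{Energy}_{dynamic} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$$
Dynamic power
$$\text{Power}_{dynamic} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$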
Assume a processor with Dynamic Voltage Scaling (DVS) and that a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic energy and dynamic power? Assume that the capacitance is unchanged.
$$\frac{\text{Energy}_{new}}{\text{Energy}_{old}} = \frac{(\text{Voltage} \times 0.85)^2}{\text{Voltage}^2} = 0.85^2 = 0.72$$
$$\frac{\text{Power}_{new}}{\text{Power}_{old}} = \frac{(\text{Voltage} \times 0.85)^2 \times (\text{Frequency} \times 0.85)}{\text{Voltage}^2 \times \text{Frequency}} = 0.85^3 = 0.61$$
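The DVS example reduces to two scaling ratios, which can be checked numerically. This is a minimal sketch assuming unchanged capacitance; the function and parameter names are illustrative. It also shows that lowering only the clock rate cuts dynamic power but leaves dynamic energy per task unchanged.

```python
def dynamic_energy_ratio(voltage_scale):
    """New/old dynamic energy when voltage is multiplied by voltage_scale."""
    return voltage_scale ** 2


def dynamic_power_ratio(voltage_scale, frequency_scale):
    """New/old dynamic power when voltage and frequency are both scaled."""
    return voltage_scale ** 2 * frequency_scale


energy_ratio = dynamic_energy_ratio(0.85)            # 0.85^2
power_ratio = dynamic_power_ratio(0.85, 0.85)        # 0.85^3

# Frequency-only scaling: power drops, energy per task does not.
clock_only_power = dynamic_power_ratio(1.0, 0.85)
clock_only_energy = dynamic_energy_ratio(1.0)
```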
Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from a 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air
Reducing Power
- Do nothing well: turn off the clock of inactive modules
- Dynamic Voltage-Frequency Scaling
- Low-power state for DRAM, disks (spin-down)
- Overclocking some cores and turning off other cores
Static Power
- Leakage current flows even when a transistor is off
- Leakage can be as high as 50% of total power consumption, due to large SRAM caches that need power to maintain values
- $\text{Power}_{static} = \text{Current}_{static} \times \text{Voltage}$
- Scales with number of transistors
- To reduce: power gating (i.e., turn off the power supply to inactive modules)
Trends in Cost
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume (higher volume means less time to get down the learning curve and greater manufacturing efficiency)
Integrated circuit
Bose-Einstein formula (empirical) for die yield:
$$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^N}$$
- Defects per unit area = 0.016-0.057 defects per square cm (2010)
- N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
What is the number of dies on a 30 cm diameter wafer for a die that is 1.5 cm on a side and for a die that is 1.0 cm on a side?
$$\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{1.5 \times 1.5} - \frac{\pi \times 30}{\sqrt{2 \times 1.5 \times 1.5}} = 270$$
$$\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{1.0 \times 1.0} - \frac{\pi \times 30}{\sqrt{2 \times 1.0 \times 1.0}} = 640$$
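The dies-per-wafer calculation can be checked with a short script. This is a minimal sketch assuming square dies; the function and parameter names are illustrative. The first term is wafer area divided by die area, and the second approximates the partial dies lost around the circular edge.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_side_cm):
    """Estimated number of whole square dies on a circular wafer."""
    die_area = die_side_cm ** 2
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area)
    return wafer_area / die_area - edge_loss


large_die = dies_per_wafer(30, 1.5)   # rounds to the 270 in the worked example
small_die = dies_per_wafer(30, 1.0)   # rounds to the 640 in the worked example
```

Shrinking the die side from 1.5 cm to 1.0 cm more than doubles the die count, because area falls quadratically while the edge loss grows only linearly.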
What is the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.031/cm², N = 13.5, and a wafer yield of 100%?
$$\text{Die yield} = \frac{1}{(1 + 0.031 \times 1.5 \times 1.5)^{13.5}} = 0.40$$
$$\text{Die yield} = \frac{1}{(1 + 0.031 \times 1.0 \times 1.0)^{13.5}} = 0.66$$
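The die-yield example follows the same pattern. This is a minimal sketch of the empirical yield model used in the worked numbers; the function and parameter names are illustrative, and the wafer yield defaults to 100% as in the example.

```python
def die_yield(defects_per_cm2, die_area_cm2, complexity_n, wafer_yield=1.0):
    """Fraction of good dies for a given defect density and process-complexity factor."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** complexity_n


yield_large = die_yield(0.031, 1.5 * 1.5, 13.5)   # about 0.40
yield_small = die_yield(0.031, 1.0 * 1.0, 13.5)   # about 0.66
```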
Dependability
- Service Level Agreement (SLA) or Service Level Objectives (SLO)
- Module reliability
  - Mean time to failure (MTTF): reliability measure
  - Mean time to repair (MTTR): service interruption
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF
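[Figure: timeline from one failure, through the point where the machine is fixed, to the next failure; the failure-to-failure interval is the MTBF]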
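The relation between MTTF, MTTR, MTBF, and availability can be written as a one-line function. This is a minimal sketch; the function name and the sample values (999 hours up, 1 hour to repair) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the module is in service: MTTF / (MTTF + MTTR)."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical module: fails on average after 999 hours, repaired in 1 hour.
example = availability(999, 1)
```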
Dependability Example
Assume that lifetimes are exponentially distributed and independent. What is the system's MTTF?
$$\text{MTTF}_{syst} = \frac{1}{\sum_i 1/\text{MTTF}_i}$$
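With independent, exponentially distributed lifetimes, component failure rates add, so the system MTTF is the reciprocal of the summed rates. The sketch below assumes the system fails as soon as any one component fails; the component values are hypothetical and are not the ones from the original example, which did not survive extraction.

```python
def system_mttf(component_mttfs_hours):
    """MTTF of a system that fails when any one of its components fails."""
    system_failure_rate = sum(1.0 / mttf for mttf in component_mttfs_hours)
    return 1.0 / system_failure_rate


# Hypothetical components: one with a 1,000,000-hour MTTF, one with 500,000 hours.
example = system_mttf([1_000_000, 500_000])   # about 333,333 hours
```

The system MTTF is always lower than that of its least reliable component, which is why adding parts without redundancy reduces dependability.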
Measuring Performance
Speedup of X relative to Y
$$\text{Speedup} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$
Execution time
- Wall clock time: includes all system overheads
- CPU time: only computation time
Benchmarks
- Kernels (e.g., matrix multiply)
- Toy programs (e.g., sorting)
- Synthetic benchmarks (e.g., Dhrystone)
- Benchmark suites (e.g., SPEC, TPC)
SPECrate: a throughput metric. It measures the number of jobs of a given type that can be processed in a given time interval.
SPECratio: the ratio between the elapsed time of a given job on a reference machine and the elapsed time of the same job on a given machine.
Example: execution time on machine A = 5.25 sec; execution time on machine B = 21 sec.
$$\frac{\text{SPECratio}_A}{\text{SPECratio}_B} = \frac{\text{Execution time}_{reference} / \text{Execution time}_A}{\text{Execution time}_{reference} / \text{Execution time}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{21}{5.25} = 4$$
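The SPECratio comparison does not depend on the reference machine, because the reference time cancels. The sketch below checks this numerically; the function name and the two reference times are hypothetical, while the 5.25 sec and 21 sec figures are the ones from the example.

```python
def spec_ratio(reference_time, machine_time):
    """Elapsed time on the reference machine divided by elapsed time on the measured machine."""
    return reference_time / machine_time


def relative_performance(reference_time, time_a, time_b):
    """How many times faster machine A is than machine B, via their SPECratios."""
    return spec_ratio(reference_time, time_a) / spec_ratio(reference_time, time_b)


# Two arbitrary (hypothetical) reference times give the same answer.
with_ref_100 = relative_performance(100.0, 5.25, 21.0)
with_ref_9650 = relative_performance(9650.0, 5.25, 21.0)
```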
When computing the average of SPECratios, one should use the geometric mean and not the arithmetic mean:
$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
where $x_i$ is the SPECratio of program $i$.
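One reason to prefer the geometric mean is that the ratio of two geometric means equals the geometric mean of the per-program ratios, so the comparison between two machines does not depend on the reference machine. The sketch below demonstrates this; the SPECratio values are hypothetical, and the function is a hand-written equivalent of what `statistics.geometric_mean` provides in recent Python versions.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical SPECratios for the same three programs on two machines.
machine_a = [2.0, 8.0, 4.0]
machine_b = [1.0, 2.0, 4.0]

ratio_of_means = geometric_mean(machine_a) / geometric_mean(machine_b)
mean_of_ratios = geometric_mean([a / b for a, b in zip(machine_a, machine_b)])
```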
Principles
- Take advantage of parallelism
  - e.g., multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of Locality
- Amdahl's Law
A program takes 60 sec to execute, and the portion that accounts for 30% of its execution time can be sped up by a factor of 5. What is the overall speedup?
$$\text{Execution time}_{new} = 0.7 \times 60 + \frac{0.3 \times 60}{5} = 42 + \frac{18}{5} = 45.6 \text{ sec}$$
$$\text{Speedup}_{overall} = \frac{60}{45.6} = 1.316$$
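The execution-time form of this calculation can be sketched as follows. The function and parameter names are illustrative. The last line shows the ceiling that the unimproved 70% imposes: even an infinitely fast enhancement cannot push the speedup past 1/0.7.

```python
def improved_time(total_time, fraction_enhanced, enhancement_speedup):
    """Execution time after speeding up only part of the original execution time."""
    unaffected = (1 - fraction_enhanced) * total_time
    enhanced = fraction_enhanced * total_time / enhancement_speedup
    return unaffected + enhanced


new_time = improved_time(60, 0.3, 5)     # 42 + 3.6 seconds
overall_speedup = 60 / new_time
speedup_ceiling = 60 / improved_time(60, 0.3, float("inf"))
```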
There are two options to enhance the performance of a graphics application: speed up FP SQRT by a factor of 10, or speed up all FP operations by a factor of 1.6. Which is better?
Enhancement         % Occurrence   Speedup of Enhancement   Overall Speedup
FP SQRT             20%            10                       1/(0.2/10 + 0.8) = 1.22
All FP operations   50%            1.6                      1/(0.5 + 0.5/1.6) = 1.23
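The comparison uses the fraction form of Amdahl's Law, which can be sketched as a single function. The names are illustrative. The result shows why a modest improvement to a common case can beat a large improvement to a rare one.

```python
def amdahl_speedup(fraction_enhanced, enhancement_speedup):
    """Overall speedup when a fraction of execution time is sped up by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_speedup)


fp_sqrt_option = amdahl_speedup(0.2, 10)    # large gain on 20% of the time
all_fp_option = amdahl_speedup(0.5, 1.6)    # small gain on 50% of the time
better = "all FP" if all_fp_option > fp_sqrt_option else "FP SQRT"
```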
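Assume FP operations are 25% of instructions with an average CPI of 4.0, other instructions have an average CPI of 1.33, and FP SQRT is 2% of instructions with a CPI of 20. Compare the following alternatives: (a) decrease the CPI of FP SQRT to 2, or (b) decrease the average CPI of all FP operations to 2.5. Which is better?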
Original CPI:
$$\text{CPI}_{orig} = 0.25 \times 4 + 0.75 \times 1.33 = 2.0$$
CPI for enhanced FP SQRT:
$$\text{CPI}_{\text{new FPSQRT}} = 2.0 - 0.02 \times 20 + 0.02 \times 2 = 1.64$$
CPI for enhanced FP:
$$\text{CPI}_{\text{new FP}} = 2.0 - 0.25 \times 4 + 0.25 \times 2.5 = 1.625$$
Speedup:
$$\text{Speedup} = \frac{IC \times \text{CPI}_{orig}}{IC \times \text{CPI}_{\text{new FP}}} = \frac{2.0}{1.625} = 1.23$$
Most modern processors have counters for instructions executed and for clock cycles.
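The CPI comparison can be reproduced as a weighted average over the instruction mix. This is a minimal sketch that assumes instruction count and clock cycle time are unchanged by either enhancement; the constant and function names are illustrative. The unrounded values differ slightly from the rounded figures in the worked example.

```python
FP_FRACTION, FP_CPI = 0.25, 4.0
OTHER_FRACTION, OTHER_CPI = 0.75, 1.33
SQRT_FRACTION, SQRT_CPI = 0.02, 20.0


def average_cpi(mix):
    """Weighted average CPI; mix is a list of (instruction fraction, CPI) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)


cpi_orig = average_cpi([(FP_FRACTION, FP_CPI), (OTHER_FRACTION, OTHER_CPI)])

# (a) FP SQRT CPI drops from 20 to 2: remove the old contribution, add the new one.
cpi_new_sqrt = cpi_orig - SQRT_FRACTION * SQRT_CPI + SQRT_FRACTION * 2.0

# (b) Average CPI of all FP operations drops from 4.0 to 2.5.
cpi_new_fp = average_cpi([(FP_FRACTION, 2.5), (OTHER_FRACTION, OTHER_CPI)])

# With instruction count and cycle time fixed, speedup is the ratio of CPIs.
speedup_sqrt = cpi_orig / cpi_new_sqrt
speedup_fp = cpi_orig / cpi_new_fp
```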
- Improve performance by replacing a high-clock-rate, inefficient core with several lower-clock-rate, efficient cores. The burden of real performance improvement is now shifted to programmers.
- Check the overall improvement using Amdahl's Law before spending effort on an enhancement.
Fallacy: Hardware enhancements that increase performance improve energy efficiency or are at worst energy neutral.
Running SPEC2006 on an Intel Core i7 in Turbo mode showed a 7% performance increase at a cost of 37% more energy and 47% more power.
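The three Turbo-mode figures are consistent with energy = power x time. The sketch below recomputes the energy increase from the power and performance numbers; the function name is illustrative, and it assumes that a 7% performance increase means execution time shrinks by a factor of 1/1.07.

```python
def relative_energy(relative_power, relative_performance):
    """Energy relative to baseline, given relative power and relative performance."""
    relative_time = 1.0 / relative_performance
    return relative_power * relative_time


turbo_energy = relative_energy(1.47, 1.07)   # about 1.37, i.e. 37% more energy
```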
Fallacy: The rated MTTF of disks is 1,200,000 hours (almost 140 years), so disks practically never fail.
These numbers are inflated by misleading manufacturer testing processes. Real-world MTTF is about 2 to 10 times worse than the manufacturer's rated MTTF.
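A rated MTTF is easier to interpret as a yearly failure probability. The sketch below does that conversion under an assumed exponential lifetime model and a 24 x 365 hour year; both are assumptions of this example, and the function names are illustrative. It also applies the worst-case 10x derating mentioned above.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mttf_years(mttf_hours):
    """Rated MTTF expressed in years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR


def annual_failure_probability(mttf_hours):
    """Probability of failing within one year, assuming exponential lifetimes."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)


rated_years = mttf_years(1_200_000)                          # about 137 years
rated_risk = annual_failure_probability(1_200_000)           # under 1% per year
derated_risk = annual_failure_probability(1_200_000 / 10)    # about 7% per year
```

In a large installation even the rated figure implies regular replacements: at under 1% per disk per year, a population of a thousand disks would still be expected to lose several disks annually.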