
Data Movement

Data Storage
Processing from/to Storage
Processing from Storage to I/O
Structure - Top Level
[Figure: the Computer comprises the Central Processing Unit, Main Memory, Input/Output, and Systems Interconnection; it connects externally to Peripherals and Communication lines.]
Structure - The CPU

[Figure: the Computer (I/O, System Bus, Memory, CPU) expanded to show the CPU, which comprises Registers, the Arithmetic and Logic Unit, Internal CPU Interconnection, and the Control Unit.]
Structure - The Control Unit
[Figure: the CPU (ALU, Internal Bus, Registers, Control Unit) expanded to show the Control Unit, which comprises Sequencing Logic, Control Unit Registers and Decoders, and Control Memory.]
Load R2, LOC
Add R4,R2,R3
Store R4, LOC
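
The three-instruction sequence above can be traced with a minimal sketch. The helper `run`, the initial register values, and the memory contents are hypothetical, chosen only for illustration; the sketch also assumes the destination operand is written first (so `Load R2, LOC` copies memory into R2).

```python
def run(program, regs, mem):
    """Execute a tiny load/add/store program (destination-first operands assumed)."""
    for op, *args in program:
        if op == "Load":          # Load Rd, addr: Rd <- mem[addr]
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "Add":         # Add Rd, Rs1, Rs2: Rd <- Rs1 + Rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "Store":       # Store Rs, addr: mem[addr] <- Rs
            rs, addr = args
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, mem


program = [
    ("Load", "R2", "LOC"),
    ("Add", "R4", "R2", "R3"),
    ("Store", "R4", "LOC"),
]
regs = {"R2": 0, "R3": 5, "R4": 0}   # hypothetical initial register contents
mem = {"LOC": 10}                     # hypothetical memory contents
regs, mem = run(program, regs, mem)
print(mem["LOC"])                     # value at LOC after adding R3 to it
```

Under these assumptions the sequence reads LOC into a register, adds R3 to it in the ALU, and writes the sum back to LOC, which is the load/operate/store pattern of a register-based machine.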
Structure of von Neumann machine
Growth in CPU Transistor Count
DEC - PDP-8 Bus Structure
Logic and Memory Performance Gap
Intel Microprocessor Performance
 • Intel Core i7 (2009)
 • Application: desktop/server
 • Technology: 45 nm (1/2x)
 • 774M transistors (12x)
 • 296 mm² (3x)
 • 3.2 GHz to 3.6 GHz (~1x)
 • 0.7 to 1.4 Volts (~1x)
 • 128-bit data (2x)
 • 14-stage pipelined datapath (0.5x)
 • 4 instructions per cycle (~1x)
 • Three levels of on-chip cache
 • Data-parallel vector (SIMD) instructions, hyperthreading
 • Four-core multicore (4x)
Possible Organization of an Embedded System
System Clock
 • Single task
 • Base runtime is defined for each benchmark using a reference machine
 • Results are reported as the ratio of reference time to system run time: r_i = Tref_i / Tsut_i
 • Tref_i: execution time of benchmark i on the reference machine
 • Tsut_i: execution time of benchmark i on the test system

• Overall performance is calculated by averaging the ratios for all 12 integer benchmarks
   – Use the geometric mean
   – Appropriate for normalized numbers such as ratios
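
The averaging step above can be sketched as follows. The function names and the timing values are hypothetical (three made-up benchmarks rather than the 12 integer benchmarks), so this only illustrates the arithmetic, not real results.

```python
import math


def spec_ratio(t_ref, t_sut):
    """Ratio of reference-machine time to test-system time for one benchmark."""
    return t_ref / t_sut


def geometric_mean(ratios):
    """n-th root of the product of the ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution times in seconds, for illustration only.
t_ref = [100.0, 200.0, 400.0]   # reference machine
t_sut = [50.0, 50.0, 50.0]      # system under test

ratios = [spec_ratio(r, s) for r, s in zip(t_ref, t_sut)]   # 2.0, 4.0, 8.0
overall = geometric_mean(ratios)
print(ratios, overall)           # overall is close to 4.0 (cube root of 64)
```

The geometric mean gives the same ranking of systems regardless of which machine is chosen as the reference, which is why it suits normalized ratios better than the arithmetic mean.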
 • Conclusions
 • When f is small, using parallel processors has little effect
 • As N -> ∞, speedup is bounded by 1/(1 - f)
 • Diminishing returns from using more processors
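
These conclusions can be checked numerically with a minimal sketch of Amdahl's law. It assumes f is the fraction of execution time that can be parallelized and N is the number of processors, giving speedup = 1 / ((1 - f) + f/N); the function name and the sample values are illustrative.

```python
def amdahl_speedup(f, n):
    """Speedup on n processors when a fraction f of the work is parallelizable."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# Small f: even many processors barely help.
print(amdahl_speedup(0.1, 1000))

# Fixed f = 0.9: speedup approaches, but never reaches, 1 / (1 - f) = 10.
for n in (1, 2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 3))
```

With f = 0.9, going from 10 to 100 processors raises the speedup far less than going from 1 to 10 did, which is the diminishing-returns effect noted above.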
