What makes a good ISA?
• Implementability
• Supports a (performance/cost) range of
implementations
• Programmability
• Easy to express programs (for human and/or compiler)
• Backward/forward compatibility
• Implementability & programmability across generations
• Business reality: software cost greater than hardware
cost
• e.g., x86 generations: 8086, 286, 386, 486, Pentium,
Pentium III, Core i3,...
Pre-1975: Human Programmability
• Orthogonality, composability
• all combinations of operation, data type, addressing mode possible
• e.g., ADD and SUB should have same addressing modes
• Language development
CISC – Complex Instruction Set Computer
• Microprogramming
• Complex instructions are split into a series of
simple instructions (micro-instructions)
• Complex instruction = small microprogram
stored in a control memory (ROM) and executed
by CPU
• Simplifies processor design
• allows the addition of new instructions
• allows bug-fixes after processor is released in the
market
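The microprogramming idea can be sketched as a lookup table plus a sequencer. In the minimal Python model below, the toy opcodes (`ADDM`, `NOP`), the micro-operation names, and the `execute` helper are hypothetical and do not correspond to any real processor's microcode. Each macro-instruction selects a short microprogram in `CONTROL_ROM`, and the sequencer steps through it one micro-operation at a time.

```python
# Hypothetical toy ISA: opcode and micro-op names are illustrative only.
CONTROL_ROM = {
    # ADDM dst, src  means  mem[dst] = mem[dst] + mem[src]
    "ADDM": ["LOAD_A", "LOAD_B", "ADD", "STORE"],
    "NOP": [],
}


def execute(opcode, mem, dst, src):
    """Run one macro-instruction by stepping through its microprogram.

    Returns the number of micro-operations executed.
    """
    a = b = 0  # internal datapath registers, invisible to the programmer
    microprogram = CONTROL_ROM[opcode]
    for uop in microprogram:
        if uop == "LOAD_A":
            a = mem[dst]
        elif uop == "LOAD_B":
            b = mem[src]
        elif uop == "ADD":
            a = a + b
        elif uop == "STORE":
            mem[dst] = a
        else:
            raise ValueError(f"unknown micro-op {uop}")
    return len(microprogram)


mem = [0] * 16
mem[2], mem[3] = 40, 2
steps = execute("ADDM", mem, dst=2, src=3)
print(mem[2], steps)  # 42 4
```

In this model, adding an instruction or patching a faulty one means editing an entry of `CONTROL_ROM` rather than redesigning the datapath. That mirrors the flexibility listed above. Real control stores hold bit-level control words rather than named steps.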
CISC – Complex Instruction Set Computer
• Disadvantages
• More complex hardware: the translation from high-level instructions to control
signals, and its optimization, must be done in hardware
• Each new version of a processor family generally contains the earlier generations
as a subset, so the instruction set and chip hardware become more complex with
each generation
• Variable-length instructions
• Different instructions take different numbers of clock cycles to execute,
slowing down the overall performance of the machine
• Many specialized instructions are not used frequently enough to justify
their existence: only about 20% of the available instructions are
used in a typical program
• Complexity
• Pipelining bottlenecks lower clock rates
• Marketing
• Prolonged design time and frequent microcode errors hurt competitiveness
Intel x86 Processors
• Dominate laptop/desktop/server market
• Evolutionary design
• Backward compatible all the way back to the 8086, introduced in 1978
• Added more features as time goes on
• Nowadays, about 5,000 pages of documentation
• CISC Architecture
• Many different instructions with many different formats
Intel x86 Evolution: Milestones
Name Date Transistors MHz
• 8086 1978 29K 5-10
• First 16-bit Intel processor. Basis for IBM PC & DOS
• 1MB address space
• 386 1985 275K 16-33
• First 32-bit Intel processor, referred to as IA32
• Added “flat addressing”, capable of running Unix
• Pentium 4E 2004 125M 2800-3800
• First 64-bit Intel x86 processor, referred to as x86-64
• Core 2 2006 291M 1060-3333
• First multi-core Intel processor
• Core i7 2008 731M 1600-4400
• Four cores
Intel x86 Processors
• Machine Evolution
• 386 1985 0.3M
• Pentium 1993 3.1M
• Pentium/MMX 1997 4.5M
• PentiumPro 1995 6.5M
• Pentium III 1999 8.2M
• Pentium 4 2000 42M
• Core 2 Duo 2006 291M
• Core i7 2008 731M
• Added Features
• Instructions to support multimedia operations
• Instructions to enable more efficient conditional operations
• Transition from 32 bits to 64 bits
• More cores
Intel x86 Processors
• Past Generations
• 1st Pentium Pro 1995 600 nm
• 1st Pentium III 1999 250 nm
• 1st Pentium 4 2000 180 nm
• 1st Core 2 Duo 2006 65 nm
• Recent Generations
1. Nehalem 2008 45 nm
2. Sandy Bridge 2011 32 nm
3. Ivy Bridge 2012 22 nm
4. Haswell 2013 22 nm
5. Broadwell 2014 14 nm
6. Skylake 2015 14 nm
7. Kaby Lake 2016 14 nm
• Process technology dimension = width of narrowest wires
(10 nm ≈ 100 atoms wide)
• Upcoming Generations
• Cannon Lake 2017? 10 nm
2017 State of the Art: Skylake
• Mobile Model: Core i7
• 2.6-2.9 GHz
• 45 W
Intel’s 64-Bit History
• 2001: Intel Attempts Radical Shift from IA32 to IA64
• Totally different architecture (Itanium)
• Executes IA32 code only as legacy
• Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
• x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
• Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
• Extended Memory 64-bit Technology
• Almost identical to x86-64!
• All but low-end x86 processors support x86-64
• But, lots of code still runs in 32-bit mode
“Simplicity is the ultimate sophistication”
Leonardo da Vinci
RISC – Reduced Instruction Set Computer
• John Cocke
• IBM 801, 1980 (started in 1975)
• Name 801 came from the building that housed the project
• Idea: it is possible to make a very small and very fast core
• Known as “the father of RISC architecture”; Turing Award recipient
and National Medal of Science recipient
RISC – Reduced Instruction Set Computer
Dave Patterson
• RISC Project, 1982
• UC Berkeley
• RISC-I: ½ the transistors and 3x faster
• Influences: Sun SPARC
John L. Hennessy
• MIPS, 1981
• Stanford
• Simple pipelining, keep the pipeline full
• Influences: MIPS computer systems, PlayStation, Nintendo
RISC – Reduced Instruction Set Computer
• Low complexity
• Generally results in overall speedup
• Less error-prone implementation by hardwired logic or
simple microcode
• VLSI implementation advantages
• Fewer transistors
• Extra space: more registers, cache
• Marketing
• Reduced design time, fewer errors, and more options
increase competitiveness
RISC – Reduced Instruction Set Computer
• Principle: sacrifice everything for speed
• reduce the number of instructions – make CPU simpler
• get rid of complex instructions, which may slow down
the CPU
• use simple addressing modes – less time spent to
compute the address of an operand
• limit the number of accesses to the memory
• if a given operation cannot be executed in one clock
period, then do not implement it in an instruction
• extensive use of pipeline architecture – in order to reach
CPI ≤ 1 (at least one instruction per clock period)
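The CPI target above fits into the standard processor performance equation. This is the usual "iron law" formulation, brought in here as an aid rather than taken from the slides. The numbers are purely hypothetical and only show how a RISC design can win even though it executes more instructions:

```latex
\begin{aligned}
\text{CPU time} &= \text{IC} \times \text{CPI} \times T_{\text{clk}} \\
\text{CISC (hypothetical)} &= (1.0 \times 10^{9}) \times 4 \times 1.0\,\text{ns} = 4.0\,\text{s} \\
\text{RISC (hypothetical)} &= (1.5 \times 10^{9}) \times 1.2 \times 0.8\,\text{ns} = 1.44\,\text{s}
\end{aligned}
```

Here IC is the dynamic instruction count, CPI the average number of clock cycles per instruction, and T_clk the clock period. Simpler instructions usually raise IC, so a design only wins if the lower CPI and shorter clock period shrink the whole product.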
RISC – Reduced Instruction Set Computer
• MIPS (RISC)
• ≈ 200 instructions, 32 bits each, 3 formats
• all operands in registers
• almost all are 32 bits each
• ≈ 1 addressing mode: Mem[reg + imm]
• x86 (CISC)
• > 1000 instructions, 1 to 15 bytes each
• operands in dedicated registers, general purpose registers,
memory, on stack, …
• can be 1, 2, 4, 8 bytes, signed or unsigned
• 10 types of addressing modes
• e.g. Mem[segment + reg + reg*scale + offset]
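The two addressing styles can be compared with a small Python sketch of effective-address computation. It assumes 32-bit wrap-around arithmetic and treats the x86 segment as a plain linear base address, as in a flat protected-mode setup. Real x86 segmentation, 64-bit mode, and the actual instruction encodings are left out, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit addresses that wrap around


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mips_ea(reg, imm16):
    """MIPS-style mode: Mem[reg + imm], with a sign-extended 16-bit immediate."""
    return (reg + sign_extend16(imm16)) & MASK32


def x86_ea(segment_base, base, index, scale, offset):
    """Simplified x86-style mode: Mem[segment + reg + reg*scale + offset]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (segment_base + base + index * scale + offset) & MASK32


# Address of a[i] for a 4-byte int array a at 0x1000, with i = 3.
a_base, i = 0x1000, 3

# MIPS: the index arithmetic needs separate instructions (e.g. a shift and an add)
# before the load can use Mem[reg + 0].
addr_reg = a_base + (i << 2)
print(hex(mips_ea(addr_reg, 0)))        # 0x100c

# x86: a single addressing mode folds base + index*scale into the memory operand.
print(hex(x86_ea(0, a_base, i, 4, 0)))  # 0x100c
```

Both paths reach the same address. The difference is where the work happens: in extra simple instructions on MIPS, or in the address-generation hardware for one x86 operand.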
RISC vs. CISC
RISC vs. CISC (half-truth)
[Figure: CISC vs. RISC; labels: instructions, µ-instructions, processing]
ARM versus x86
Android OS on ARM processor | Windows OS on Intel (x86) processor
High-end processor | High-end processor
< 100 instructions | ~2000 instructions
13 pipeline stages | 14 pipeline stages
13B produced (2013) | 100M produced (2013)
ARM
• ARM is a British semiconductor-design and software-tools development
company, founded in 1990
• ARM is the leading RISC architecture, used in a wide variety of
products (mobile devices, peripherals, computers,
HD/SSD controllers, automotive apps, IoT devices,
wearables, etc.)
• In 2014, 12 billion ARM-based chips were shipped, with 20%
annual growth and a 17-37% market share [Investopedia]
• In 2016, SoftBank Group (software, “Information
Revolution”) acquired ARM Holdings (hardware, technology),
aiming to expand into IoT.
ARM – family history
Architecture Bit width Cores designed by Designed by third partners Cortex profile
ARM Holdings
ARMv2 32/26 ARM2, ARM3 Amber, STORM Open Soft
Core
ARMv3 32 ARM6, ARM7
ARM Architecture
Microcontroller System bus
ARM® CortexTM-M
processor
Input
PPB ports
Internal
Advanced
peripherals High-perf Output
Bus ports
Instructions
Flash ROM Data
ICode bus DCode bus RAM
• Harvard architecture
Different buses for instructions and data
• RISC machine
Pipelining (single-cycle operation for many instructions)
Thumb-2 configuration for both 16- and 32-bit instructions
ARM ISA: Instruction size
• Variable-length instructions
• ARM instructions are a fixed
length of 32 bits
• Thumb instructions are a fixed
length of 16 bits
• Thumb-2 instructions can be
either 16-bit or 32-bit
• Thumb-2 gives approximately
26% improvement in code
density over ARM
• Thumb-2 gives approximately
25% improvement in
performance over Thumb
ARM: State of the Art
POWER
Complex vs. Simple Instructions
• Complex instruction: an instruction that does a lot of work,
e.g., many operations
• Insert in a doubly linked list
• Compute FFT
• String copy
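To show how much work one complex instruction can hide, the sketch below spells out a "string copy" as the loop of simple load, store, increment, and compare-and-branch steps a RISC compiler would emit instead. It assumes a C-style, zero-terminated string in a byte-addressed memory modeled as a Python `bytearray`. This is an illustrative assumption: x86 `REP MOVSB`, for example, copies a given byte count rather than stopping at a terminator. The `string_copy` helper is hypothetical.

```python
def string_copy(mem, src, dst):
    """Copy a zero-terminated byte string from mem[src:] to mem[dst:].

    Each loop iteration corresponds to a handful of simple instructions.
    Returns the number of bytes copied, including the terminator.
    """
    count = 0
    while True:
        byte = mem[src + count]      # load byte
        mem[dst + count] = byte      # store byte
        count += 1                   # advance the index
        if byte == 0:                # compare with 0 and branch
            return count


mem = bytearray(32)
mem[0:4] = b"abc\x00"
n = string_copy(mem, src=0, dst=16)
print(bytes(mem[16:16 + n]), n)  # b'abc\x00' 4
```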
Complex vs. Simple Instructions
• early 80’s: RISC movement challenges “CISC
establishment”
• RISC (reduced instruction set computer)
• Berkeley RISC-I (Patterson), Stanford MIPS (Hennessy),
IBM 801
• CISC (complex instruction set computer)
• VAX, x86
• the term “CISC” did not exist before the term “RISC” came
along
Complex vs. Simple Instructions
• RISC argument [Dave Patterson]
• CISC is fundamentally handicapped
• for a given technology, RISC implementation will be faster
• current VLSI technology enables single-chip RISC
• when technology enables single-chip CISC, RISC will be
pipelined
• when technology enables pipelined CISC, RISC will have
caches
• when technology enables CISC with caches, RISC will have ...
• CISC rebuttal [Bob Colwell]
• CISC flaws not fundamental (fixed with more transistors)
• Moore’s Law will narrow the RISC/CISC gap (true)
• software costs will dominate (very true)
Complex vs. Simple Instructions
• Argues
• RISCs fundamentally better than CISCs
• implementation effects and compilers are second order
• unfair because it compares specific implementations
• VAX advantages: big immediates, not-taken branches
• MIPS advantages: more registers, FPU, instruction scheduling, TB
RISC curiosities
• Most commercially successful ISA is x86
• also: PentiumPro was first out-of-order microprocessor
• good RISC pipeline, 100K transistors
• good CISC pipeline, 300K transistors
• by 1995: 2M+ transistors evened pipeline playing field
• rest of transistors used for caches (diminishing returns)
• Intel’s other trick?
• decoder translates CISC into sequences of RISC μops
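The decoder trick can be sketched as a function that "cracks" a CISC-style instruction into RISC-like μops. The tuple instruction format, the `Uop` type, the temporary register name `tmp0`, and the single supported operation are hypothetical simplifications. Real x86 decoders handle far more forms and also fuse operations, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str        # "load", "add" or "store"
    dst: str       # destination register (address register for a store)
    srcs: tuple    # source registers (address register for a load)


def crack(instr):
    """Translate a toy two-operand ADD into RISC-like μops.

    ("add", "rax", "rcx")    -> register form: a single add μop
    ("add", "[rbx]", "rax")  -> memory form: load, add, store (read-modify-write)
    """
    op, dst, src = instr
    if op != "add":
        raise ValueError(f"unsupported operation: {op}")
    if dst.startswith("[") and dst.endswith("]"):
        addr_reg = dst[1:-1]
        return [
            Uop("load", "tmp0", (addr_reg,)),      # tmp0 <- Mem[addr_reg]
            Uop("add", "tmp0", ("tmp0", src)),     # tmp0 <- tmp0 + src
            Uop("store", addr_reg, ("tmp0",)),     # Mem[addr_reg] <- tmp0
        ]
    return [Uop("add", dst, (dst, src))]           # dst <- dst + src


for uop in crack(("add", "[rbx]", "rax")):
    print(uop)
```

After cracking, the back end only sees simple, uniform μops. This is how a CISC front end can feed a RISC-style pipeline.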
ISA-level Tradeoffs: Semantic Gap
• Simple compiler, complex hardware vs.
complex compiler, simple hardware
• Caveat: Translation (indirection) can change the tradeoff!
• Performance?
• Optimization opportunity: Example of VAX INDEX instruction:
who (compiler vs. hardware) puts more effort into
optimization?
• Instruction size, code size
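To make the VAX INDEX question concrete, the sketch below expresses one INDEX step with simple operations. It assumes INDEX behaves as it is usually described: indexout = (indexin + subscript) × size, trapping when the subscript lies outside [low, high]. The Python names, and the exception standing in for the hardware trap, are hypothetical.

```python
class SubscriptRangeError(Exception):
    """Stands in for the hardware subscript-range trap."""


def vax_index(subscript, low, high, size, index_in):
    """One INDEX step written as simple operations: a bounds check, an add, a multiply."""
    if subscript < low or subscript > high:   # two compares and branches
        raise SubscriptRangeError(subscript)
    return (index_in + subscript) * size      # one add, one multiply


# Byte offset of b[i][j] for an array of 4-byte elements with 4 rows and 5 columns
# (i = 2, j = 3): the steps chain, feeding each result in as the next index_in.
row = vax_index(2, 0, 3, 5, 0)           # (0 + 2) * 5 = 10 elements
offset = vax_index(3, 0, 4, 4, row)      # (10 + 3) * 4 = 52 bytes
print(offset)                            # 52
```

Once the work is exposed as separate compares and arithmetic, a compiler that can prove the subscripts stay in range (for example, inside a loop over the array) can hoist or drop the checks. With a single INDEX instruction, the checks run every time unless the hardware itself optimizes them.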
Small versus Large Semantic Gap
• CISC vs. RISC
• Complex instruction set computer → complex instructions
• Initially motivated by “not good enough” code generation
• Reduced instruction set computer → simple instructions
• John Cocke, mid 1970s, IBM 801
• Goal: enable better compiler control and optimization
• RISC motivated by
• Memory stalls (no work done in a complex instruction when
there is a memory stall?)
• When is this correct?
• Simplifying the hardware → lower cost, higher frequency
• Enabling the compiler to optimize the code better
• Find fine-grained parallelism to reduce stalls
A Note on ISA Evolution
• ISAs have evolved to reflect/satisfy the concerns of the
day
• Examples:
• Limited on-chip and off-chip memory size
• Limited compiler optimization technology
• Limited memory bandwidth
• Need for specialization in important applications (e.g., MMX)
• Tradeoffs
• Code size (memory space, bandwidth, latency) vs. hardware complexity
• ISA extensibility and expressiveness
• Performance? Smaller code vs. imperfect decode
ISA-level Tradeoffs: Decode type
• Uniform decode: Same bits in each instruction
correspond to the same meaning
• Opcode is always in the same location
• Ditto operand specifiers, immediate values, …
• Many “RISC” ISAs: Alpha, MIPS, SPARC
+ Easier decode, simpler hardware
+ Enables parallelism: generate target address before knowing the
instruction is a branch
-- Restricts instruction format (fewer instructions?) or wastes space
• Non-uniform decode
• E.g., opcode can be the 1st-7th byte in x86
+ More compact and powerful instruction format
-- More complex decode logic
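Uniform decode can be illustrated with the MIPS 32-bit formats. In these formats the opcode (bits 31-26), rs (25-21), rt (20-16), rd (15-11), shamt (10-6), funct (5-0), and the 16-bit immediate (15-0) always sit at the same positions. The Python sketch below (helper names are illustrative) extracts every field and a candidate branch target unconditionally, before anything is known about whether the word is a branch. This is the parallelism described above.

```python
def decode_fields(word):
    """Extract all MIPS fields from fixed bit positions, regardless of format."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": word & 0xFFFF,
    }


def candidate_branch_target(pc, word):
    """PC-relative target as for MIPS beq/bne: PC + 4 + (sign-extended imm << 2).

    Computed speculatively; the result is simply discarded if the opcode
    turns out not to be a branch.
    """
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


BEQ_OPCODE = 0x04

add_word = 0x012A4020  # add $t0, $t1, $t2  (rs=9, rt=10, rd=8, funct=0x20)
beq_word = 0x1109FFFE  # beq $t0, $t1 with word offset -2  (rs=8, rt=9, imm=0xFFFE)

for pc, word in [(0x00400000, add_word), (0x00400010, beq_word)]:
    fields = decode_fields(word)                 # all fields, no format check
    target = candidate_branch_target(pc, word)   # computed before the opcode is known
    if fields["opcode"] == BEQ_OPCODE:
        print(hex(pc), "beq, target", hex(target))
    else:
        print(hex(pc), "not a branch, target discarded")
```

With x86's non-uniform encoding, the same trick is harder because the decoder must first find where the opcode and the operands start.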
ISA Wars: Systematic study
ACM Transactions on Computer Systems, Vol. 33, No. 1, Article 3, Publication date: March 2015.
ISA Wars: Systematic study
“Role of ISA: Although our study shows that RISC and CISC ISA traits
are irrelevant to power and performance characteristics of modern
cores, ISAs continue to evolve to better support exposing workload-
specific semantic information to the execution substrate. On x86, such
changes include the transition to Intel64 (larger word sizes, optimized calling
conventions, and shared code support), … , architectural support for
transactions in the form of HLE. Similarly, ARM ISA has introduced shorter
fixed-length instructions for low-power targets (Thumb), vector extensions
(NEON), DSP and bytecode execution extensions (Jazelle DBX), Trustzone
security, and hardware virtualization support. Thus, although ISA… ”
• CONCLUSION:
• CISC vs. RISC DISCUSSION IS IRRELEVANT
• Whether the ISA is RISC or CISC does not determine performance or power efficiency
• Microarchitecture and design methodology are the main factors determining
performance and power consumption.