
CISC versus RISC

Does this comparison still make any sense?


ISA definition in 1964…

“Instruction set architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine”
– IBM introducing 360 (1964)

An instruction set specifies a processor’s functionality:


• what operations it supports
• what storage mechanisms it has & how they are accessed
• how the programmer/compiler communicates programs to the processor
• Instruction set architecture (ISA): the “architecture” part of this course
• the rest is micro-architecture

2
What makes a good ISA?
• Implementability
• Supports a (performance/cost) range of
implementations
• Programmability
• Easy to express programs (for human and/or compiler)
• backward/forward compatibility
• Implementability & programmability across generations
• Business reality: software cost greater than hardware
cost
• e.g., x86 generations: 8086, 286, 386, 486, Pentium,
Pentium III, Core i3,...

3
Pre-1975: Human Programmability

• Focus: instruction sets that were easy for humans to program

• ISA semantically close to high-level-language (HLL)

• Semantically heavy (CISC-like) instructions


• Implicit saves/restores on procedure calls
• People thought computers would someday execute HLL
directly
• Never materialized - “multiple HLLs” = “semantic clash”

4
Pre-1975: Human Programmability

• The earliest machines were programmed in assembly language, and memory was slow and expensive
• Bigger program → more storage → more $

• Need to reduce the number of instructions per program
• A single instruction can stand for multiple operations

• Variable instruction length → unpredictable fetch-decode-execute time (CPI > 1)

• Hardware handles the complexity
• E.g.: VAX, x86
5
1975-: Compiler Programmability
Focus: instruction sets that are easy for compilers to compile to
• Primitive instructions from which solutions are synthesized
• Provide primitives (not solutions) [1]
• Hard for compiler to tell if complex instruction fits situation

• Regularity: do things the same way, consistently

• Orthogonality, composability
• all combinations of operation, data type, addressing mode possible
• e.g., ADD and SUB should have same addressing modes

• Few modes/obvious choices
• compilers do complicated case analysis, don’t add more cases

[1] “Compilers and Computer Architecture” by William Wulf, IEEE Computer, 14(8), 1981.
6
CISC – Complex Instruction Set Computer

• Language development
• More operations → increased instruction size
• Design ever more complex instructions
• Provide more addressing modes

• Implement some high-level language constructs in the ISA
• Simpler compiler back end = hardware handles all the complexity

7
CISC – Complex Instruction Set Computer

• Intel 8086, 80286, 80386, 80486, Pentium

• Each instruction is hard-wired into the control unit

• Difficult and expensive to design and build new instructions

• This motivated the use of microprogramming

8
CISC – Complex Instruction Set Computer

• In the last 20 years: optimization in performance
• Changes in software and hardware technology have forced a re-examination of CISC, and many modern CISC processors are hybrids that implement many RISC principles.
• The rise of microprogramming technology
• Easy to add new instructions and maintain backward compatibility.
• Fewer hard-wired instructions
• Microprogrammed instruction sets can be written to match the constructs of high-level languages
• Compiler does not have to be as complicated
• Does not rule out hardware implementations of some high-level functions

9
CISC – Complex Instruction Set Computer

• Microprogramming
• Complex instructions are split into a series of simpler instructions
• Complex instruction = small microprogram stored in a control memory (ROM) and executed by the CPU
• Simplifies the design of the processor
• Allows the addition of new instructions
• Allows bug fixes after the processor is released to the market
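
The idea that a complex instruction is a small microprogram held in control memory can be sketched as follows. This is a toy model, not the microcode of any real processor: the `CONTROL_ROM` table, the micro-operation names and the `ADD_MEM` instruction are all hypothetical.

```python
# Toy model of a microprogrammed control unit (every name here is hypothetical).
# Each machine instruction maps to a short microprogram in a "control ROM".
CONTROL_ROM = {
    # ADD_MEM dst, src : mem[dst] = mem[dst] + mem[src]
    "ADD_MEM": [
        ("load", "t0", "dst"),   # t0 <- mem[dst]
        ("load", "t1", "src"),   # t1 <- mem[src]
        ("add", "t0", "t1"),     # t0 <- t0 + t1
        ("store", "t0", "dst"),  # mem[dst] <- t0
    ],
}


def execute(instruction, operands, memory):
    """Run one machine instruction by stepping through its microprogram.

    Returns the number of micro-operations that were executed.
    """
    temps = {}
    micro_ops = 0
    for op, a, b in CONTROL_ROM[instruction]:
        if op == "load":
            temps[a] = memory[operands[b]]
        elif op == "add":
            temps[a] = temps[a] + temps[b]
        elif op == "store":
            memory[operands[b]] = temps[a]
        micro_ops += 1
    return micro_ops


memory = {0x10: 5, 0x14: 7}
steps = execute("ADD_MEM", {"dst": 0x10, "src": 0x14}, memory)
```

In this model, adding a new instruction or fixing a buggy one only means editing an entry in `CONTROL_ROM`; the `execute` loop, which stands in for the hardware, stays the same.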

10
CISC – Complex Instruction Set Computer
• Disadvantages
• More complex hardware → translation from a high level to control signals, and optimization, must be done by hardware
• Earlier generations of a processor family were generally contained as a subset in every new version, so the instruction set and chip hardware become more complex with each generation of computers.
• Instructions with different lengths
• Different instructions take different numbers of clock cycles to execute, slowing down the overall performance of the machine.
• Many specialized instructions aren't used frequently enough to justify their existence: approximately 20% of the available instructions are used in a typical program.
• Complexity
• Pipelining bottlenecks → lower clock rates
• Marketing
• Prolonged design time and frequent microcode errors hurt competitiveness

11
Intel x86 Processors
• Dominate laptop/desktop/server market
• Evolutionary design
• Backwards compatible all the way back to the 8086, introduced in 1978
• Added more features as time goes on
• Nowadays, about 5,000 pages of documentation
• CISC Architecture
• Many different instructions with many different formats

12
Intel x86 Evolution: Milestones
Name Date Transistors MHz
• 8086 1978 29K 5-10
• First 16-bit Intel processor. Basis for IBM PC & DOS
• 1MB address space
• 386 1985 275K 16-33
• First 32-bit Intel processor, referred to as IA32
• Added “flat addressing”, capable of running Unix
• Pentium 4E 2004 125M 2800-3800
• First 64-bit Intel x86 processor, referred to as x86-64
• Core 2 2006 291M 1060-3333
• First multi-core Intel processor
• Core i7 2008 731M 1600-4400
• Four cores
13
Intel x86 Processors
• Machine Evolution
• 386 1985 0.3M
• Pentium 1993 3.1M
• Pentium/MMX 1997 4.5M
• PentiumPro 1995 6.5M
• Pentium III 1999 8.2M
• Pentium 4 2000 42M
• Core 2 Duo 2006 291M
• Core i7 2008 731M

• Added Features
• Instructions to support multimedia operations
• Instructions to enable more efficient conditional operations
• Transition from 32 bits to 64 bits
• More cores
14
Intel x86 Processors
• Past Generations
• 1st Pentium Pro 1995 600 nm
• 1st Pentium III 1999 250 nm
• 1st Pentium 4 2000 180 nm
• 1st Core 2 Duo 2006 65 nm
• Process technology dimension = width of narrowest wires (10 nm ≈ 100 atoms wide)
• Recent Generations
1. Nehalem 2008 45 nm
2. Sandy Bridge 2011 32 nm
3. Ivy Bridge 2012 22 nm
4. Haswell 2013 22 nm
5. Broadwell 2014 14 nm
6. Skylake 2015 14 nm
7. Kaby Lake 2016 14 nm
• Upcoming Generations
• Cannonlake 2017? 10 nm
15
2017 State of the Art: Skylake
• Mobile Model: Core i7
• 2.6-2.9 GHz
• 45 W

• Desktop Model: Core i7


• Integrated graphics
• 2.8-4.0 GHz
• 35-91 W

• Server Model: Xeon


• Integrated graphics
• Multi-socket enabled
• 2-3.7 GHz
• 25-80 W
16
x86 Clones: Advanced Micro Devices (AMD)
• Historically
• AMD has followed just behind Intel
• A little bit slower, a lot cheaper
• Then
• Recruited top circuit designers from Digital Equipment Corp.
and other downward trending companies
• Built Opteron: tough competitor to Pentium 4, XEON
• Developed x86-64, their own extension to 64 bits
• Recent Years
• Intel got its act together
• Leads the world in semiconductor technology
• AMD has fallen behind
• Relies on external semiconductor manufacturer

17
Intel’s 64-Bit History
• 2001: Intel Attempts Radical Shift from IA32 to IA64
• Totally different architecture (Itanium)
• Executes IA32 code only as legacy
• Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
• x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
• Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
• Extended Memory 64-bit Technology
• Almost identical to x86-64!
• All but low-end x86 processors support x86-64
• But, lots of code still runs in 32-bit mode
18
“Simplicity is the ultimate sophistication”
Leonardo da Vinci

19
RISC – Reduced Instruction Set Computer
• John Cocke
• IBM 801, 1980 (started in 1975)
• Name 801 came from the building that housed the project
• Idea: Possible to make a very small and very fast core
• Influences: Known as “the father of RISC Architecture”. Turing
Award Recipient and National Medal of Science.

20
RISC – Reduced Instruction Set Computer
Dave Patterson
• RISC Project, 1982
• UC Berkeley
• RISC-I: ½ transistors & 3x faster
• Influences: Sun SPARC

John L. Hennessy
• MIPS, 1981
• Stanford
• Simple pipelining, keep full
• Influences: MIPS computer system, PlayStation, Nintendo

21
RISC – Reduced Instruction Set Computer

• Low complexity
• Generally results in overall speedup
• Less error-prone implementation by hardwired logic or simple microcode
• VLSI implementation advantages
• Fewer transistors
• Extra space: more registers, cache
• Marketing
• Reduced design time, fewer errors, and more options increase competitiveness

22
RISC – Reduced Instruction Set Computer
• Principle: sacrifice everything for speed
• reduce the number of instructions – make CPU simpler
• get rid of complex instructions, which may slow down
the CPU
• use simple addressing modes – less time spent to
compute the address of an operand
• limit the number of accesses to the memory
• if a given operation cannot be executed in one clock period, then do not implement it as an instruction
• extensive use of pipelining, in order to reach CPI <= 1 (at least one instruction per clock period)

23
RISC – Reduced Instruction Set Computer

• The compilers themselves
• Computationally more complex
• More portable

• The compiler writer
• Fewer instructions → probably an “easier” job
• Simpler instructions → probably fewer bugs
• Can reuse optimization techniques

24
RISC – Reduced Instruction Set Computer

• MIPS Design Principles
• Simplicity favors regularity
• 32 bit instructions
• Smaller is faster
• Small register file
• Make the common case fast
• Include support for constants
• Good design demands good compromises
• Support for different types of interpretations/classes

• E.g.: ARM, POWER

25
RISC – Reduced Instruction Set Computer

• MIPS (RISC)
• ≈ 200 instructions, 32 bits each, 3 formats
• all operands in registers
• almost all are 32 bits each
• ≈ 1 addressing mode: Mem[reg + imm]

• x86 (CISC)
• > 1000 instructions, 1 to 15 bytes each
• operands in dedicated registers, general purpose registers,
memory, on stack, …
• can be 1, 2, 4, 8 bytes, signed or unsigned
• 10 types of addressing modes
• e.g. Mem[segment + reg + reg*scale + offset]
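
The two addressing modes quoted above can be compared with a short sketch. The register names, the 32-bit wrap-around and the helper functions are assumptions made for illustration only. The last function shows how a compiler for the simpler machine could synthesize the complex mode from primitives.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses that wrap around


def mips_address(regs, base, imm):
    """MIPS-style mode: Mem[reg + imm]."""
    return (regs[base] + imm) & MASK32


def x86_address(regs, base, index, scale, offset, segment_base=0):
    """x86-style mode: Mem[segment + reg + reg*scale + offset]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (segment_base + regs[base] + regs[index] * scale + offset) & MASK32


def synthesized_address(regs, base, index, scale, offset):
    """Same address built from simple steps: multiply, add, then reg + imm."""
    tmp = regs[index] * scale   # a shift on a real machine
    tmp = tmp + regs[base]      # a register-register add
    return mips_address({"tmp": tmp}, "tmp", offset)


regs = {"r1": 0x1000, "r2": 3}          # hypothetical register contents
simple = mips_address(regs, "r1", 8)
complex_mode = x86_address(regs, "r1", "r2", 4, 8)
synthesized = synthesized_address(regs, "r1", "r2", 4, 8)
```

Both routes reach the same address; the difference is whether the hardware does the arithmetic inside one instruction or the compiler emits it as separate instructions.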

26
RISC x CISC

RISC Philosophy
• Regularity & simplicity
• Leaner means faster
• Optimize the common case
Energy efficiency
Embedded Systems
Phones/Tablets

CISC Rebuttal
• Compilers can be smart
• Transistors are plentiful
• Legacy is important
• Code size counts
• Micro-code!
Desktops/Servers

27
RISC x CISC (half-truth)
[Figure: CISC and RISC side by side. Box labels: machine instructions, µ-code translation, µ-instructions, instructions processing.]

28
ARM versus x86
Android OS on ARM processor
High-end processor
• < 100 instructions
• 13 pipeline stages
• 13B produced (2013)

Windows OS on Intel (x86) processor
High-end processor
• ~ 2000 instructions
• 14 pipeline stages
• 100 M produced (2013)

29
ARM
• ARM is a British semiconductor and software-tools development company, founded in 1990
• ARM is the leading RISC architecture, used in a wide variety of products (mobile devices, peripherals, computers, HD/SSD controllers, automotive apps, IoT devices, wearables, etc.)
• In 2014, 12 billion ARM-based chips shipped, with 20% annual growth and 17-37% market share [Investopedia]
• In August 2016, SoftBank Group (SW, Information Revolution) purchased ARM Holdings (HW, technology), aiming at more IoT.
30
ARM – family history
Architecture | Bit width | Cores designed by ARM Holdings | Cores designed by third parties | Cortex profile
ARMv2 | 32/26 | ARM2, ARM3 | Amber, STORM Open Soft Core | -
ARMv3 | 32 | ARM6, ARM7 | - | -
ARMv4 | 32 | ARM8 | StrongARM, FA526 | -
ARMv5 | 32 | ARM7EJ, ARM9E, ARM10E | XScale, FA626TE, Feroceon, PJ1/Mohawk | -
ARMv6 | 32 | ARM11 | - | -
ARMv6-M | 32 | ARM Cortex-M0, ARM Cortex-M0+, ARM Cortex-M1 | - | Microcontroller
ARMv7-M | 32 | ARM Cortex-M3 | - | Microcontroller
ARMv7E-M | 32 | ARM Cortex-M4 | - | Microcontroller
ARMv7-R | 32 | ARM Cortex-R4, ARM Cortex-R5, ARM Cortex-R7 | - | Real-time
ARMv7-A | 32 | ARM Cortex-A5, ARM Cortex-A7, ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A12, ARM Cortex-A15, ARM Cortex-A17 | Krait, Scorpion, PJ4/Sheeva, Apple A6/A6X | Application
ARMv8-A | 64/32 | ARM Cortex-A53, ARM Cortex-A57 | X-Gene, Denver, Apple A7 (Cyclone), K12 | -
31
ARM Profiles
• ARMv8-A (Application) architecture profile for high
performance markets such as mobile and enterprise
• ARMv8-R (Real-time) architecture profile for embedded
applications in automotive and industrial control
• ARMv8-M (Microcontroller) architecture profile for
embedded and IoT applications.
Application
• 32-bit and 64-bit
• A32, T32 and A64 instruction sets
• Virtual memory system
• Supporting rich operating systems

Real-time
• 32-bit
• A32 and T32 instruction sets
• Protected memory system (optional virtual memory)
• Optimized for real-time systems

Microcontroller
• 32-bit
• T32 / Thumb® instruction set only
• Protected memory system
• Optimized for microcontroller applications

32
ARM Architecture
[Figure: block diagram of an ARM® Cortex™-M microcontroller. Labels: processor, PPB, internal peripherals, system bus, input ports, output ports, Advanced High-perf Bus, ICode bus, DCode bus, Flash ROM (instructions), RAM (data).]

• Harvard architecture
Different busses for instruction and data
• RISC machine
Pipelining (single cycle operation for many instructions)
Thumb-2 configuration for both 16- and 32-bit instructions
33
ARM ISA: Instruction size
• Variable-length instructions
• ARM instructions are a fixed
length of 32 bits
• Thumb instructions are a fixed
length of 16 bits
• Thumb-2 instructions can be
either 16-bit or 32-bit
• Thumb-2 gives approximately
26% improvement in code
density over ARM
• Thumb-2 gives approximately
25% improvement in
performance over Thumb
34
ARM: State of the Art

35
POWER

36
Complex vs. Simple Instructions
• Complex instruction: An instruction does a lot of work,
e.g. many operations
• Insert in a doubly linked list
• Compute FFT
• String copy

• Simple instruction: An instruction does a small amount of work; it is a primitive from which complex operations can be built
• Add
• XOR
• Multiply

37
Complex vs. Simple Instructions
• early 80’s: RISC movement challenges “CISC
establishment”
• RISC (reduced instruction set computer)
• Berkeley RISC-I (Patterson), Stanford MIPS (Hennessy),
IBM 801
• CISC (complex instruction set computer)
• VAX, x86
• word “CISC” did not exist before word RISC came
along

38
Complex vs. Simple Instructions
• RISC argument [Dave Patterson]
• CISC is fundamentally handicapped
• for a given technology, RISC implementation will be faster
• current VLSI technology enables single-chip RISC
• when technology enables single-chip CISC, RISC will be
pipelined
• when technology enables pipelined CISC, RISC will have
caches
• when technology enables CISC with caches, RISC will have ...
• CISC rebuttal [Bob Colwell]
• CISC flaws not fundamental (fixed with more transistors)
• Moore’s Law will narrow the RISC/CISC gap (true)
• software costs will dominate (very true)

39
Complex vs. Simple Instructions

• Argues
• RISCs fundamentally better than CISCs
• implementation effects and compilers are second order
• unfair because it compares specific implementations
• VAX advantages: big immediates, not-taken branches
• MIPS advantages: more registers, FPU, instruction scheduling, TB

40
RISC curiosities
• Most commercially successful ISA is x86
• also: PentiumPro was first out-of-order microprocessor
• good RISC pipeline, 100K transistors
• good CISC pipeline, 300K transistors
• by 1995: 2M+ transistors evened pipeline playing field
• rest of transistors used for caches (diminishing returns)
• Intel’s other trick?
• decoder translates CISC into sequences of RISC μops

• internally (micro-architecture) is actually RISC!


41
ISA-level Tradeoffs: Semantic Gap
• Where to place the ISA? Semantic gap
• Closer to high-level language (HLL) → small semantic gap, complex instructions
• Closer to hardware control signals? → large semantic gap, simple instructions

• RISC vs. CISC machines


• RISC: Reduced instruction set computer
• CISC: Complex instruction set computer
• FFT, QUICKSORT, POLY, FP instructions?
• VAX INDEX instruction (array access with bounds checking)

42
ISA-level Tradeoffs: Semantic Gap
• Simple compiler, complex hardware vs.
complex compiler, simple hardware
• Caveat: Translation (indirection) can change the tradeoff!

• Burden of backward compatibility

• Performance?
• Optimization opportunity: Example of VAX INDEX instruction:
who (compiler vs. hardware) puts more effort into
optimization?
• Instruction size, code size

43
Small versus Large Semantic Gap
• CISC vs. RISC
• Complex instruction set computer → complex instructions
• Initially motivated by “not good enough” code generation
• Reduced instruction set computer → simple instructions
• John Cocke, mid 1970s, IBM 801
• Goal: enable better compiler control and optimization

• RISC motivated by
• Memory stalls (no work done in a complex instruction when
there is a memory stall?)
• When is this correct?
• Simplifying the hardware → lower cost, higher frequency
• Enabling the compiler to optimize the code better
• Find fine-grained parallelism to reduce stalls

44
A Note on ISA Evolution
• ISAs have evolved to reflect/satisfy the concerns of the
day

• Examples:
• Limited on-chip and off-chip memory size
• Limited compiler optimization technology
• Limited memory bandwidth
• Need for specialization in important applications (e.g., MMX)

• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA
• Concept of dynamic/static interface
• Contrast it with hardware/software interface
45
RISC x CISC

execution time = no_instructions × CPI × clock cycle time (= 1/freq)

CISC: fewer instructions, CPI of 4-100, long clock cycle
RISC: more instructions, CPI of 1, short clock cycle

• Hard to tell which is the best


• A combination of CISC and RISC may be the solution:
• RISC inside, CISC outside – see Pentium processors
• Complex instructions translated into simple (RISC) instructions
• BUT there is a cost of translation in CISC
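
Execution time is the product of instruction count, CPI and clock period (the reciprocal of the clock frequency). The sketch below works through that arithmetic. The instruction counts, CPI values and clock rates are invented for illustration and do not describe any real processor.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Seconds = instructions * (cycles / instruction) * (seconds / cycle)."""
    return instruction_count * cpi / clock_hz


# Hypothetical program compiled for two hypothetical machines.
# CISC-like: fewer instructions, higher CPI, slower clock.
cisc_seconds = execution_time(1_000_000, cpi=6.0, clock_hz=100e6)

# RISC-like: twice as many instructions, CPI near 1, faster clock.
risc_seconds = execution_time(2_000_000, cpi=1.2, clock_hz=200e6)

speedup = cisc_seconds / risc_seconds
```

With these made-up inputs the RISC-like machine finishes sooner even though it executes twice as many instructions. Different inputs can reverse the result, which is why the slide says it is hard to tell which is best.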
46
ISA-level Tradeoffs: Instruction Length
• Fixed length: Length of all instructions the same
+ Easier to decode single instruction in hardware
+ Easier to decode multiple instructions concurrently
-- Wasted bits in instructions
-- Harder-to-extend ISA (how to add new instructions?)
• Variable length: Length of instructions different (determined
by opcode and sub-opcode)
+ Compact encoding
Intel 432: Huffman encoding (sort of). 6 to 321 bit instructions.
-- More logic to decode a single instruction
-- Harder to decode multiple instructions concurrently

• Tradeoffs
• Code size (memory space, bandwidth, latency) vs. hardware complexity
• ISA extensibility and expressiveness
• Performance? Smaller code vs. imperfect decode
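
Why variable length makes it harder to decode several instructions at once can be sketched as follows. The toy encoding, where the low two bits of the first byte give the instruction length, is an assumption for illustration and is not a real ISA.

```python
def fixed_boundaries(code, width=4):
    """Fixed length: every start offset is known without reading the bytes."""
    return list(range(0, len(code), width))


def variable_boundaries(code, length_of):
    """Variable length: each start depends on decoding the previous one."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code[pc])  # must inspect this instruction first
    return starts


def toy_length(first_byte):
    """Hypothetical encoding: low two bits of the first byte = length - 1."""
    return (first_byte & 0b11) + 1


fixed = fixed_boundaries(bytes(12))
stream = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x03, 0x11, 0x22, 0x33])
variable = variable_boundaries(stream, toy_length)
```

In the fixed-length case the offsets could be handed to parallel decoders immediately. In the variable-length case the loop is inherently sequential unless extra hardware guesses or pre-marks the boundaries.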

47
ISA-level Tradeoffs: Decode type
• Uniform decode: Same bits in each instruction
correspond to the same meaning
• Opcode is always in the same location
• Ditto operand specifiers, immediate values, …
• Many “RISC” ISAs: Alpha, MIPS, SPARC
+ Easier decode, simpler hardware
+ Enables parallelism: generate target address before knowing the
instruction is a branch
-- Restricts instruction format (fewer instructions?) or wastes space

• Non-uniform decode
• E.g., opcode can be the 1st-7th byte in x86
+ More compact and powerful instruction format
-- More complex decode logic
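
Uniform decode can be illustrated with the MIPS R-type layout, where each field always occupies the same bit positions. The helper name `decode_r_type` is hypothetical; the field positions follow the standard 32-bit MIPS R-format (opcode, rs, rt, rd, shamt, funct).

```python
def decode_r_type(word):
    """Extract the fixed-position fields of a 32-bit MIPS R-type word."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11
        "shamt": (word >> 6) & 0x1F,    # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
    }


# Encoding of: add $t0, $t1, $t2   ($t0 = 8, $t1 = 9, $t2 = 10)
fields = decode_r_type(0x012A4020)
```

Because the register fields sit at fixed positions, hardware can start reading the register file before the opcode has been fully decoded.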

48
ISA Wars: Systematic study

ACM Transactions on Computer Systems, Vol. 33, No. 1, Article 3, Publication date: March 2015.

49
ISA Wars: Systematic study

50
ISA Wars: Systematic study

51
ISA Wars: Systematic study
“Role of ISA: Although our study shows that RISC and CISC ISA traits
are irrelevant to power and performance characteristics of modern
cores, ISAs continue to evolve to better support exposing workload-
specific semantic information to the execution substrate. On x86, such
changes include the transition to Intel64 (larger word sizes, optimized calling
conventions, and shared code support), … , architectural support for
transactions in the form of HLE. Similarly, ARM ISA has introduced shorter
fixed-length instructions for low-power targets (Thumb), vector extensions
(NEON), DSP and bytecode execution extensions (Jazelle DBX), Trustzone
security, and hardware virtualization support. Thus, although ISA… ”

• CONCLUSION:
• THE CISC x RISC DISCUSSION IS IRRELEVANT
• In both cases, performance and power efficiency are the same
• Microarchitecture and design methodology are the main factors weighing on performance and power consumption.

52
