Вы находитесь на странице: 1из 43

High Performance Computer Architecture

Instruction Set Architecture

Ajit Pal Professor


Department of Computer Science and Engineering

Indian Institute of Technology Kharagpur INDIA-721302

Outline
An Instruction
Instruction Set Architecture Classification of ISAs

Classification of Operations
Operands Instruction Format Addressing Modes Evolution of the Instruction Set

RISC Versus CISC


MIPS Instruction Set

An Instruction
An instruction can be considered as a word in processors language What information an Instruction should covey to the CPU?
opcode

Addr of OP1

Addr of OP2

Dest Addr

Addr of next intr

The instruction is too long if everything specified explicitly More space in memory Longer execution time How can you reduced the size of an instruction? Specifying information implicitly How? Using PC, ACC, GPRs, SP
Ajit Pal, IIT Kharagpur

Instruction Set Architecture


Instruction Set Architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine.
The ISA defines: Operations that the processor can execute Data Transfer mechanisms + how to access data Control Mechanisms (branch, jump, etc) Contract between programmer/compiler and HW ISA is important: Not only from the programmers perspective. From processor design and implementation perspectives as well.
Ajit Pal, IIT Kharagpur

Instruction Set Architecture


Programmer visible part of a processor: Registers (where are data located?) Addressing Modes (how is data accessed?) Instruction Format (how are instructions specified?)

Exceptional Conditions (what happens if


something goes wrong?) Instruction Set (what operations can be performed?)
Ajit Pal, IIT Kharagpur

ISA Design Choices


Types of operations supported e.g. arithmetic/logical, data transfer, control transfer, system, floating-point, decimal (BCD), string Types of operands supported e.g. byte, character, digit, halfword, word (32-bits), doubleword, floating-point number Types of operand storage allowed e.g. stack, accumulator, registers, memory Implicit vs. explicit operands in instructions and number of each Orthogonality of operands, operand location, and addressing modes
Ajit Pal, IIT Kharagpur

Classification of ISAs
Determined by the means used for storing data in CPU: The major choices are: A stack, an accumulator, or a set of registers Stack architecture: Operands are implicitly on top of the stack Accumulator architecture:
One operand is in the accumulator (register) and the others are elsewhere Essentially this is a 1 register machine Found in older machines

General purpose registers:


Operands are in registers or specific memory locations.
Ajit Pal, IIT Kharagpur

Evolution of ISAs
Accumulator Architectures (EDSAC, IBM-701) Special-purpose Register Architectures General purpose registers architectures
Register-memory (IBM 360, DEC PDP-11, Intel 80386 Register-register (load-store) (CDC6600, MIPS, DEC alpha)
Single Accumulator (Manchester Mark I, IBM 700 series 1953) Stack (Burroughs, HP-3000 1960-70) General Purpose Register Machines Complex Instruction Sets (Vax, Intel 386 1977-85) RISC (MIPS, IBM RS6000, . . .1987)

Ajit Pal, IIT Kharagpur

Classification of ISAs
Types of Architecture Stack Source Operands Top two Destination Top of stack

elements in stack
Accumulator Accumulator (1) Memory (other) Register set Register or Memory
Ajit Pal, IIT Kharagpur

Accumulator

Register or Memory

Comparison of Architectures

Consider the operation: C = A + B


Stack Push A Push B Add Pop C Accumulator Load A Add B Store C Register-Memory Load R1, A Add R1, B Store C, R1 Register-Register Load R1, A Load R2, B Add R3, R1, R2 Store C, R3

Ajit Pal, IIT Kharagpur

Classification of Operations
Data Transfer Load, store, mov Data Manipulation Arithmetic Add, subtract, multiply, divide Signed, unsigned, integer, floating-point Logical - Conjunction, disjunction, shift left, shift right Status manipulation Control Transfer Conditional Branch Branch on equal, not equal Set on less than Unconditional Jump
Ajit Pal, IIT Kharagpur

Instruction Format
Instruction length needs to be in multiples of bytes. Instruction encoding can be: Variable or Fixed Variable encoding tries to use as few bits to represent a program as possible: But at the cost of complexity of decoding Alpha, ARM, MIPS instructions:
Operation Addr1 Addr2 Addr3 and Modes
Ajit Pal, IIT Kharagpur

Addressing Modes
Addressing modes describe how an instruction can find the location of its operands Effective address = actual memory address specified by an addressing mode

Ajit Pal, IIT Kharagpur

Addressing Modes
INHERENT (0-address)
OPCODE

IMMEDIATE

OPCODE OPERAND OPERAND

OPCODE ADDRESS

MEMORY

ABSOLUTE (DIRECT)
OPERAND

Ajit Pal, IIT Kharagpur

Addressing Modes
INDIRECT
OPCODE ADDRESS MEMORY

A2 A2 OPERAND

REGISTER

OPCODE

Rk

REGISTERS l

REGISTER INDIRECT

OPCODE

RPk

REGISTER PAIRS MEMORY

(RPk)

OPERAND

Ajit Pal, IIT Kharagpur

Addressing Modes
PAGED
OPCODE OFFSET DIRECT-PAGE REGISTER MEMORY

OPERAND

ADDRESS

INDEXED

OPCODE BASE ADDRESS

INDEX REGISTERS MEMORY OFFSET OPERAND

Ajit Pal, IIT Kharagpur

Addressing Modes
BASED BASED-INDEXED
OPCODE CONSTANT MEMORY BASE REGISTER

OPERAND

RELATIVE

OPCODE DISPLACEMENT MEMORY

PROGRAM COUNTER

OPERAND

Ajit Pal, IIT Kharagpur

Addressing Modes
STACK
Empty

Empty PUSH A

Empty SP Empty

SP

Empty A

Empty PUSH B

Empty POP B

SP

Empty B A

Empty SP B A

Ajit Pal, IIT Kharagpur

RISC/CISC Controversy
RISC: Reduced Instruction Set Computer CISC: Complex Instruction Set Computer Genesis of CISC architecture: Implementing commonly used instructions in hardware can lead to significant performance benefits. For example, use of a FP processor can lead to performance improvements. Genesis of RISC architecture: The rarely used instructions can be eliminated to save chip space --- on chip cache and large number of registers can be provided.
Ajit Pal, IIT Kharagpur

Features of A CISC Processor


Rich instruction set: some simple, some very complex Complex addressing modes: Orthogonal addressing (Every possible addressing mode for every instruction). Many instructions take multiple cycles: Large variation in CPI Instructions are of variable sizes Small number of registers Microcode control No (or inefficient) pipelining

Ajit Pal, IIT Kharagpur

The CISC Philosophy


One instruction could do the work of several instructions. For example, a single instruction could load two numbers to be added, add them, and then store the result back to memory directly. Many versions of the same instructions were supported; Different versions did almost the same thing with minor changes. For example, one version would read two numbers from memory, and store the result in a register. Another version would read one number from memory and the other from a register and store the result to memory.
Ajit Pal, IIT Kharagpur

Features of a RISC Processor


Small number of instructions Small number of addressing modes

Large number of registers (>32)


Instructions execute in one or two clock cycles Uniformed length instructions and fixed instruction format. Register-Register Architecture: Separate memory instructions (load/store) Separate instruction/data cache Hardwired control Pipelining (Why CISC are not pipelined?)
Ajit Pal, IIT Kharagpur

Early RISC Processors

1987 Sun SPARC 1990 IBM RS 6000 1996 IBM/Motorola PowerPC

Ajit Pal, IIT Kharagpur

CISC vs. RISC Organizations

Microprogrammed Control Unit

Cache
Microprogrammed Control Memory

Hardwared Control Unit

Instruction Cache

Data Cache

Main Memory (a) CISC Organization

Main Memory (b) RISC Organization

Ajit Pal, IIT Kharagpur

Why Does RISC Lead to Improved Performance?

Increased GPRs (Also, Register Windows) leads to decreased data traffic to memory.

Register-Register architecture leads to more uniform instructions:


Efficient pipelining becomes possible. However, large instruction memory traffic: Because of larger number of instructions.

Ajit Pal, IIT Kharagpur

Why Does RISC Lead to Improved Performance? RISC-Like instructions can be scheduled to achieve efficiency: Either by compiler or By hardware (Dynamic instruction scheduling). Suppose we need to add 2 values and store the results back to memory. In CISC: It would be done using a single instruction. In RISC: 4 instructions would be necessary.
Ajit Pal, IIT Kharagpur

Case Study

MIPS Instruction Set architecture

Ajit Pal, IIT Kharagpur

Operands in MIPS
Registers
MIPS has 32 architected 32-bit registers Effective use of registers is key to program performance

Data Transfer
32 words in registers, millions of words in main memory Instruction accesses memory via memory address Address indexes memory, a large single-dimensional array

Alignment
Most architectures address individual bytes Addresses of sequential words differ by 4 Words must always start at addresses that are multiples of 4

Ajit Pal, IIT Kharagpur

Register Usage Conventions


Registers
MIPS has 32 architected 32-bit registers Effective use of registers is key to program performance

r0: always 0 r1: reserved for the assembler r2, r3: function return values r4~r7: function call arguments r8~r15: caller-saved temporaries r16~r23 callee-saved temporaries r24~r25 caller-saved temporaries r26, r27: reserved for the operating system r28: global pointer r29: stack pointer r30: callee-saved temporaries r31: return address
Ajit Pal, IIT Kharagpur

Data Types in MIPS


Data and instructions are both represented in bits 32-bit architectures employ 32-bit instructions Combination of fields specifying operations/operands MIPS operates on:
32-bit (unsigned or 2s complement) integers, 32-bit (single precision floating point) real numbers, 64-bit (double precision floating point) real numbers; 32-bit words, bytes and half words can be loaded into GPRs After loading into GPRs, bytes and half words are either zero or sign bit expanded to fill the 32 bits; Only 32-bit units can be loaded into FPRs and 32-bit real numbers are stored in even numbered FPRs. 64-bit real numbers are stored in two consecutive FPRs, starting with even-numbered register.
Floating-point numbers IEEE standard 754 float: 8-bit exponent, 23-bit significand double:11-bit exponent, 52-bit significand
Ajit Pal, IIT Kharagpur

MIPS Memory Organization


MIPS supports byte addressability: it implies that a byte is the smallest unit with its

address;
32-bit word has to start at byte address that is multiple of 4;

Thus, 32-bit word at address 4n includes four bytes


with addresses: 4n, 4n+1, 4n+2, and 4n+3. 16-bit half word has to start at byte address that is

multiple of 2; Thus, 16-bit word at address 2n includes


two bytes with addresses: 2n and 2n+1. it implies that an address is given as 32-bit unsigned
Ajit Pal, IIT Kharagpur

MIPS Instruction Formats


R-Type Instruction Fields op: basic operation of instruction, called the opcode rs, rt: first, second register source operand rd: register destination operand shamt: shift amount for shift instructions funct: specifies a variant of the operation, called function code

Ajit Pal, IIT Kharagpur

MIPS Instruction Formats


I-Type Instruction Fields Opcode specifies instruction format

J-Type Instruction Fields

Ajit Pal, IIT Kharagpur

MIPS Addressing Modes


Register Addressing Operand is a register (e.g., add $s2, $s0, $s1)

Base or Displacement Addressing Operand is at memory location whose address is sum of register and constant (e.g., lw $s1, 8($s0))

Ajit Pal, IIT Kharagpur

MIPS Addressing Modes


Immediate Addressing Operand is constant within instruction (e.g., addi $s1, $s0, 4) PC-Relative Addressing Address is sum of PC and constant within instruction (e.g., beq $s0, $s1, 16)

Ajit Pal, IIT Kharagpur

MIPS Addressing Modes


Pseudo-direct Addressing Address is 26 bits of constant within instruction concatenated with upper 6 bits of PC (e.g., j 1000)

Ajit Pal, IIT Kharagpur

Survey of MIPS Instruction Set


Arithmetic Addition add $s1, $s2, $s3 # $s1 = $2 + $s3, overflow detected addu $s1, $s2, $s3 # $s1 = $s2 x $s3, overflow undetected Subtraction sub $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow detected subu $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow undetected Immediate/Constants addi $s1, $s2, 100 # $s1 = $s2 + 100, overflow detected addiu $s1, $s2, 100 # $s1 = $s2 + 100, overflow undetected
Ajit Pal, IIT Kharagpur

Survey of MIPS Instruction Set


Arithmetic

Multiply mult $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit signed product multu $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit unsigned product
Divide div $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3 divu $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3, unsigned Ancillary mfhi $s1 # $s1 = Hi, get copy of Hi mflo $s1 # $s1 = Lo, get copy of low
Ajit Pal, IIT Kharagpur

Survey of MIPS Instruction Set


Logical Boolean Operations and $s1, $s2, $s3 # $s1 = $s2 & $s3, bit-wise AND or $s1, $s2, $s3 # $s1 = $s2 | $s3, bit-wise OR Immediate/Constants andi $s1, $s2, 100 # $s1 = $s2 & 100, bit-wise AND ori $s1, $s2, 100 # $s1 = $s2 | 100, bit-wise OR Shifting sll $s1, $s2, 10 # $s1 = $s2 << 10, shift left srl $s1, $s2, 10 # $s1 = $s2 >> 100, shift right

Ajit Pal, IIT Kharagpur

Survey of MIPS Instruction Set


Data Transfer Load Operations lw $s1, 100($s2) # $s1 = Mem($s2+100), load word lbu $s1, 100($s2) # $s1 = Mem($s2+100), load byte lui $s1, 100 # $s1 = 100 * 2^16, load upper imm.

Store Operations sw $s1, 100($s2) # Mem[$s2+100] = $s1, store word sb $s1, 100($s2) # Mem[$s2+100] = $s1, store byte

Ajit Pal, IIT Kharagpur

Survey of MIPS Instruction Set


Control Transfer Jump Operations j 2500 # go to 10000 jal 2500 # $ra = PC+4, go to 10000, for procedure call jr $ra # go to $ra, for procedure return Branch Operations
beq $s1, $s2, 25 # if($s1==$s2), go to PC+4+100, PC-relative bne $s1, $s2, 25 # if($s1!=$s2), go to PC+4+100, PC-relative

Comparison Operations
slt $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, 2s comp. slti $s1, $s2, 100 # if($s2<100), $s1=1, else $s1=0, 2s comp. sltu $s1, $s2, $s3 # if($s2<$s3), $s1=1, else $s1=0, unsigned sltiu $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, unsigned

Ajit Pal, IIT Kharagpur

Survey of MIPS Instruction Set


Floating Point Operations Arithmetic
{add.s, sub.s, mul.s, div.s} $f2, $f4, $f6 # single-precision {add.d, sub.d, mul.d, div.d} $f2, $f4, $f6 # double-precision Floating-point registers are used in even-odd pairs, using even number register as its name

Data Transfer
{lwc1, swc1} $f1, 100($s2) Transfer data to/from floating-point register file

Conditional Branch
{c.lt.s, c.lt.d} $f2, $f4 # if ($f2 < $f4), cond=1, else cond=0 {bclt, bclf} 25 # if (cond==1/0), go to PC+4+100

Ajit Pal, IIT Kharagpur

Thanks!

Ajit Pal, IIT Kharagpur