ISA-Instruction Set Architecture

UNIT - II
Representation of Instructions
(Instruction set Architecture)
A Few Words About Where We Are Headed

Performance = 1 / Execution time, simplified to 1 / CPU execution time
CPU execution time = Instructions × CPI / (Clock rate)
Performance = Clock rate / (Instructions × CPI)

Design goals that follow from this equation:
- Define an instruction set; make it simple enough to require a small number of cycles and allow a high clock rate, but not so simple that we need many instructions, even for very simple tasks.
- Design hardware for CPI = 1; seek improvements with CPI > 1.
- Try to achieve CPI = 1 with a clock that is as high as that for CPI > 1 designs; is CPI < 1 feasible?
- Design the ALU for arithmetic & logic ops.
- Design memory & I/O structures to support ultrahigh-speed CPUs.
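The performance equation can be tried out with a few lines of Python. This is a minimal sketch; the function names and the sample figures (one million instructions, CPI of 2, 100 MHz clock) are illustrative assumptions, not values from these notes.

```python
def cpu_execution_time(instructions, cpi, clock_rate_hz):
    """CPU execution time = Instructions x CPI / Clock rate (seconds)."""
    return instructions * cpi / clock_rate_hz


def performance(instructions, cpi, clock_rate_hz):
    """Performance = Clock rate / (Instructions x CPI) = 1 / execution time."""
    return clock_rate_hz / (instructions * cpi)


# Hypothetical program: 1,000,000 instructions, CPI = 2, 100 MHz clock.
t = cpu_execution_time(1_000_000, 2.0, 100e6)
p = performance(1_000_000, 2.0, 100e6)
print(f"time = {t} s, performance = {p} programs/s")
```

Halving the CPI at the same clock rate and instruction count doubles the performance, which is why the design goals focus on CPI = 1.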
Embedded System Architecture: Instruction Set Architecture

Categorized by memory organization
- Von Neumann architecture
- Harvard architecture (figure: a CPU with its PC connected to a separate data memory and program memory, each with its own address and data lines)
Categorized by instruction type
- CISC - Complex instruction set computer
- RISC - Reduced instruction set computer
- VLIW - Very Long Instruction Word
CISC (Complex instruction set computer) versus RISC (Reduced instruction set computer)

CISC                                        | RISC
Large number of instructions (~200-300)     | Relatively few instructions (~50)
Specialized complex instructions            | Basic instructions
Many different addressing modes             | Relatively few different addressing modes
Variable-length instruction format          | Fixed-length instruction format
Variable / larger number of machine cycles  | Most instructions complete in one machine cycle
More instructions can access memory         | Only load/store instructions can access memory
Small number of general-purpose registers   | Large number of general-purpose registers
Microprogrammed control unit                | Hardwired control unit
Larger die size, longer development time    | Smaller die size, shorter development time, high performance
CISC provides a large and powerful range of instructions. For example, there are 32 jump instructions in the 8086 (JA: Jump if Above, JAE: Jump if Above or Equal, and so on), and the 80386 adds more.
RISC: there are only two jump instructions in the ARM processor, Branch and Branch with Link.
CISC disadvantages:
- Many specialized instructions are not used frequently.
- Each new version of a processor family generally contains the instruction sets of the earlier generations as a subset, so the instruction set keeps growing.
- Different instructions take different amounts of clock time to execute, due to their variable length, which slows down the overall performance of the machine.
RISC disadvantages:
- Poor code density (because of the fixed instruction size).
- Cannot execute x86 code.
The Performance Equation

The following equation is commonly used for expressing a computer's performance ability:
time / program = (time / cycle) × (cycles / instruction) × (instructions / program)
The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction.
RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.
One camp supported CISC designs because of their low burden on compiler developers and the wide availability of existing software.
The other camp supported RISC designs because of their simplicity and efficiency.
Processor designers later realized that RISC designs might benefit from the addition of some CISC characteristics, and vice versa.
Such hybrid designs use a decoder to convert CISC instructions into RISC instructions before execution.
These are then processed by a RISC core, which performs a few basic instructions very quickly.
Having a RISC core is advantageous because it allows performance-enhancing features such as pipelining and branch prediction.
Popular examples of hybrid designs include the Pentium and Athlon families of processors.
Very Long Instruction Word (VLIW)

One VLIW instruction contains several independent operations that are executed in parallel.
Instruction-level parallelism: VLIW relies on the compiler to determine which instructions may be executed in parallel.
Example of one VLIW instruction, with each operation issued to its own processing unit (PU):
F=a+b | c=e/g | d=x&y | w=z*h
PU    | PU    | PU    | PU
The number of operations in a VLIW instruction is equal to the number of execution units in the processor.
Advantage: the hardware can be simpler and faster than RISC, because the scheduling of parallel operations is done by the compiler.
Widely used in DSP (digital signal processing) applications: high performance and low cost.
Disadvantage: when the number of execution units is increased, the program must be recompiled.
Less successful in general-purpose computers: customers demand software compatibility between generations of a processor.
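The compiler's job of grouping independent operations into one long instruction can be sketched in Python. This is a simplified greedy packer under stated assumptions: the `Op` type, the `pack_bundles` name, and the conservative rule "any shared destination or source forces a later bundle" are illustrative choices, not the algorithm of any particular VLIW compiler.

```python
from typing import NamedTuple


class Op(NamedTuple):
    text: str     # source form, e.g. "F=a+b"
    dest: str     # variable written
    srcs: tuple   # variables read


def pack_bundles(ops, width):
    """Greedily pack ops, in program order, into bundles of at most `width`.

    An op is placed in the earliest bundle that lies after every earlier op
    it conflicts with (read-after-write, write-after-write, write-after-read)
    and that still has a free slot.
    """
    bundles = []   # each bundle is a list of Op
    placed = []    # (op, bundle index) for ops scheduled so far
    for op in ops:
        earliest = 0
        for prev, idx in placed:
            conflict = (prev.dest in op.srcs      # read after write
                        or prev.dest == op.dest   # write after write
                        or op.dest in prev.srcs)  # write after read
            if conflict:
                earliest = max(earliest, idx + 1)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(op)
        placed.append((op, slot))
    return bundles


ops = [
    Op("F=a+b", "F", ("a", "b")),
    Op("c=e/g", "c", ("e", "g")),
    Op("d=x&y", "d", ("x", "y")),
    Op("w=z*h", "w", ("z", "h")),
]
for bundle in pack_bundles(ops, width=4):
    print(" | ".join(op.text for op in bundle))
```

With four execution units the four independent operations fit in one bundle. Calling `pack_bundles(ops, width=2)` yields a different packing, which illustrates why a program must be recompiled when the number of execution units changes.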
Pentium (figure: two integer units, CPU1 and CPU2, a 16 KB L1 cache, and a coprocessor)
Software model of the Pentium
EFLAGS
Flag                              | Meaning
Carry (CF)                        | unsigned arithmetic out of range
Overflow (OF)                     | signed arithmetic out of range
Sign (SF)                         | result is negative
Zero (ZF)                         | result is zero
Auxiliary Carry (AF)              | carry from bit 3 to bit 4
Parity (PF)                       | the number of 1 bits is even
Direction (DF)                    | selects increment or decrement of the SI and DI registers
Interrupt (IF)                    | controls operation of the INTR (interrupt request) input pin
Trap (TF)                         | enables trapping through an on-chip debugging feature
Nested Task (NT)                  | indicates whether the current task is nested
Input/output privilege level (IOPL) | privilege level of the current task for I/O
Flags are divided into two groups:
1. Control flags - IF, DF, TF
2. Status flags - CF, PF, AF, ZF, SF, OF
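How the status flags follow from a result can be shown with a small Python model of an 8-bit addition. This is an illustrative sketch; the function name `add8_flags` and the dictionary layout are assumptions made for the example.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, status flags)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": full > 0xFF,                                # unsigned out of range
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,  # signed out of range
        "SF": (result & 0x80) != 0,                       # result is negative
        "ZF": result == 0,                                # result is zero
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,           # carry from bit 3 to 4
        "PF": bin(result).count("1") % 2 == 0,            # even number of 1 bits
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

Adding 7Fh and 01h overflows the signed range (OF set) without leaving the unsigned range (CF clear), which is the distinction the Carry and Overflow rows describe.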
Memory map of the Personal Computer
The transient program area (TPA) holds the DOS (disk operating system) operating system and other programs that control the computer system.
Different modes of operation

Real mode operation:
- Allows addressing of only the first 1M byte of memory space.
- The first 1M byte of memory is called the real memory, conventional memory, or DOS memory system.
- In real mode, advanced processors simply operate like very fast 8086s.
- It is automatically selected upon power-up.
- DOS is a real mode operating system.
Protected mode operation:
- Full memory is available to the processor.
- Supports special privileged instructions, multitasking, virtual memory addressing, and memory management and protection functions.
- Allows control of the internal cache.
- The Windows operating system runs in protected mode.
- Writing protected mode programs requires special background knowledge of operating systems theory.
Functional Block Diagram of Pentium (figure)
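A special technique can be used to utilize a 32-bit register on an instruction-by-instruction basis in 16-bit code: the single-byte operand size prefix (66h).
Consider the 32-bit data 229B0112h. The sequence
DB 66h
MOV AX, 0112h
DW 229Bh
is executed as
MOV EAX, 229B0112h
because the prefix makes the processor read a 32-bit immediate, so the word placed after the instruction supplies the upper 16 bits.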
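The byte layout behind this trick can be checked in Python. The sketch below assumes the standard encoding of MOV AX, imm16 (opcode B8h followed by a little-endian word); the function name is made up for the example.

```python
import struct

OPERAND_SIZE_PREFIX = 0x66  # operand size prefix byte
MOV_AX_IMM = 0xB8           # opcode of MOV AX, imm16 (MOV EAX, imm32 when prefixed)


def mov_eax_in_16bit_code(value):
    """Build the bytes of: DB 66h / MOV AX, low word / DW high word."""
    low = value & 0xFFFF
    high = (value >> 16) & 0xFFFF
    return (bytes([OPERAND_SIZE_PREFIX, MOV_AX_IMM])
            + struct.pack("<H", low)     # immediate of MOV AX, 0112h
            + struct.pack("<H", high))   # DW 229Bh


code = mov_eax_in_16bit_code(0x229B0112)
print(code.hex())
# With the prefix, the four bytes after the opcode form one 32-bit immediate.
print(hex(struct.unpack("<I", code[2:])[0]))
```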
Some registers of the 8086 through Core2 are considered program-visible registers:
- they are used during applications programming and are specified by the instructions.
Other registers are considered program-invisible registers (80286 and above):
- they are not addressable directly during applications programming;
- they may be used indirectly during system programming.
Addressing Modes
Addressing mode                       | Example
Register Addressing                   | MOV BX, CX
Immediate Addressing                  | MOV AX, 3456H
Direct Addressing                     | MOV AL, [1234H]
Register Indirect Addressing          | MOV AX, [BX]
Base-Plus-Index Addressing            | MOV DX, [BX + DI]
Register Relative Addressing          | MOV AX, [BX + 1000H]
Base Relative-Plus-Index Addressing   | MOV AX, [BX + SI + 100H]
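The memory addressing modes listed above differ only in which terms contribute to the 16-bit offset. A minimal Python sketch follows; the register values are hypothetical, and the segment part of the address is left out.

```python
def effective_address_16(base=0, index=0, displacement=0):
    """Offset formed by the 16-bit addressing modes; wraps around at 64K."""
    return (base + index + displacement) & 0xFFFF


# Hypothetical register contents, chosen only for illustration.
BX, SI, DI = 0x2000, 0x0010, 0x0020

print(hex(effective_address_16(displacement=0x1234)))                    # [1234H]
print(hex(effective_address_16(base=BX)))                                # [BX]
print(hex(effective_address_16(base=BX, index=DI)))                      # [BX + DI]
print(hex(effective_address_16(base=BX, displacement=0x1000)))           # [BX + 1000H]
print(hex(effective_address_16(base=BX, index=SI, displacement=0x100)))  # [BX + SI + 100H]
```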
Port addressing
1. Direct: the port is specified in the operand field, and the address bus carries the address of the I/O port.
   For example: IN AL, 80H
   Port addresses 00H - FFH: 256 I/O port locations.
2. Indirect: via the address contained in register DX.
   For example: OUT DX, AX
   Port addresses 0000H - FFFFH: 65,536 I/O port locations.
Scaled-Index Addressing
Available only on the 80386 through Core2 microprocessors.
Uses two 32-bit registers (a base register and an index register) to access the memory.
The second register (index) is multiplied by a scaling factor of 1x, 2x, 4x, or 8x.
MOV EAX, [EBX] [ECX * 4 + 6]
    base = EBX, index = ECX, scale = 4, displacement = 6
32-bit addressing modes may be used while running in real mode by placing the address size prefix (67h) before the instruction:
DB 67h
MOV EAX, [EBX] [ECX * 4 + 6]
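The address formed by the scaled-index mode is base + index × scale + displacement. A minimal Python sketch, with hypothetical register contents and an assumed function name:

```python
def scaled_index_address(base, index, scale, displacement=0):
    """Effective address = base + index * scale + displacement (32-bit wrap)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical contents: EBX = 00001000h (base), ECX = 3 (index).
EBX, ECX = 0x00001000, 3
print(hex(scaled_index_address(EBX, ECX, 4, 6)))  # address used by [EBX][ECX*4+6]
```

A scale of 4 makes the index register count 32-bit elements, so ECX = 3 selects the fourth doubleword of a table that starts at EBX + 6.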
Super scalar Architecture

Processors capable of executing multiple instructions in parallel are known as superscalar machines.
Parallel execution is possible through the U and V pipelines of the Pentium.
Four restrictions are placed on a pair of integer instructions attempting parallel execution:
1. Both must be simple instructions (MOV, INC, DEC).
2. No data dependencies may exist between them:
   - read-after-write dependency (the second instruction reads an operand that the first writes)
   - write-after-write dependency (both instructions write to the same operand)
3. Neither instruction may contain both immediate data and a displacement value, as in
   MOV table[SI], 7
4. Prefixed instructions may only execute in the U pipeline, as in
   MOV ES:[DI], AL
For floating-point instructions, the first instruction of the pair must be one of the following:
FADD, FSUB, FMUL, FDIV, FCOM
and the second instruction must be FXCH.
The compiler plays an important role in the ordering of instructions during code generation.
A superscalar architecture is capable of achieving an instruction throughput of more than one instruction per cycle.
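The four integer pairing restrictions can be expressed as a small check in Python. This is a simplified model under stated assumptions: the `Instr` record and its fields are invented for the illustration, and the real Pentium pairing rules have further details not covered in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    simple: bool = True                # rule 1: simple instruction (MOV, INC, DEC)
    reads: frozenset = frozenset()     # operands read
    writes: frozenset = frozenset()    # operands written
    has_immediate: bool = False
    has_displacement: bool = False
    prefixed: bool = False


def can_pair(u, v):
    """Return True if u (U pipeline) and v (V pipeline) may issue together."""
    if not (u.simple and v.simple):
        return False                       # rule 1
    if u.writes & v.reads:
        return False                       # rule 2: read after write
    if u.writes & v.writes:
        return False                       # rule 2: both write the same operand
    for instr in (u, v):
        if instr.has_immediate and instr.has_displacement:
            return False                   # rule 3
    if v.prefixed:
        return False                       # rule 4: prefixed only in the U pipeline
    return True


mov_ax_bx = Instr("MOV AX, BX", reads=frozenset({"BX"}), writes=frozenset({"AX"}))
inc_cx = Instr("INC CX", reads=frozenset({"CX"}), writes=frozenset({"CX"}))
mov_cx_ax = Instr("MOV CX, AX", reads=frozenset({"AX"}), writes=frozenset({"CX"}))

print(can_pair(mov_ax_bx, inc_cx))     # independent instructions
print(can_pair(mov_ax_bx, mov_cx_ax))  # second reads AX written by the first
```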
Pipelining
Pentium Instruction Set

Data transfer instructions
INS / OUTS (80286 onwards) - input string from port / output string to port
    INS dest, DX
    OUTS DX, src
PUSHA / POPA (80286 onwards) - push / pop all the 16-bit registers
    Order of registers for PUSHA: AX, CX, DX, BX, SP, BP, SI, DI
PUSHAD / POPAD (80386 onwards) - push / pop all the 32-bit registers
PUSHF / POPF - push / pop the flags register
PUSHFD / POPFD (80386 onwards) - push / pop the 32-bit EFLAGS register
LFS, LGS, LSS (80386 onwards) - load pointer using FS, GS, or SS
MOVSX - move with sign extension
    MOVSX dest, src
MOVZX - move with zero extension
    MOVZX dest, src
    MOVZX should not be used when working with signed numbers.
BSWAP (80486 onwards) - byte swap
    BSWAP dst
    Swaps the bytes in a 32-bit general-purpose register.
    Used for converting 32-bit numbers from little-endian format into big-endian format and vice versa.
New Pentium instruction
MOV - move to / from control registers
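The effect of BSWAP on a 32-bit value can be modelled in Python. The function name `bswap32` is an assumption made for this sketch.

```python
def bswap32(value):
    """Reverse the byte order of a 32-bit value, as BSWAP does on a register."""
    value &= 0xFFFFFFFF
    return int.from_bytes(value.to_bytes(4, "little"), "big")


x = 0x12345678
print(hex(bswap32(x)))
print(hex(bswap32(bswap32(x))))  # swapping twice restores the original value
```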
Arithmetic Instructions
8086 onwards
CBW - convert byte to word
    Extends a signed 8-bit number in AL into a signed 16-bit number in AX.
    Performed before IDIV or IMUL.
CWD - convert word to doubleword
    Extends a signed 16-bit number in AX into a signed 32-bit number in DX:AX.
    Performed before IDIV or IMUL.
80386 onwards
CWDE - convert word to doubleword extended
    Extends a signed 16-bit number in AX into a signed 32-bit number in EAX.
CDQ - convert doubleword to quadword
    The sign bit of EAX is extended through EDX.
    The 64-bit result is in EDX:EAX.
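Sign extension, the operation shared by CBW, CWD, CWDE and CDQ, can be sketched in Python. The helper names are assumptions for the example; registers are modelled as plain unsigned integers.

```python
def sign_extend(value, from_bits, to_bits):
    """Extend a two's complement value from from_bits to to_bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def cbw(al):
    """CBW: AL -> AX."""
    return sign_extend(al, 8, 16)


def cwd(ax):
    """CWD: AX -> (DX, AX)."""
    wide = sign_extend(ax, 16, 32)
    return wide >> 16, wide & 0xFFFF


print(hex(cbw(0x80)))
print([hex(part) for part in cwd(0x8000)])
```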
80486 onwards
CMPXCHG - compare and exchange
    CMPXCHG dst, src
    Compares the dst operand with the accumulator (AL, AX or EAX, depending on the size of dst).
    If acc = dst, src is copied to dst.
    If acc ≠ dst, acc is replaced by the value in dst.
    Very useful in operating system software that supports multiple processes through the use of semaphores.
XADD - exchange and add byte, word or doubleword
    XADD dst, src
Pentium instruction
CMPXCHG8B - compare and exchange 8 bytes
    CMPXCHG8B dst
    EDX:EAX is compared with dst.
    ECX:EBX is the source.
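The compare-and-exchange rule and its use for a simple semaphore can be modelled in Python. This is only a sketch of the data movement: the function names and the dictionary standing in for memory are assumptions, and the model says nothing about atomicity, which on real multiprocessor hardware additionally requires the LOCK prefix.

```python
def cmpxchg(acc, dst, src):
    """Model of CMPXCHG dst, src. Returns (new acc, new dst, ZF)."""
    if acc == dst:
        return acc, src, True    # equal: src is copied to dst, ZF = 1
    return dst, dst, False       # not equal: acc receives dst, ZF = 0


def try_acquire(memory, name):
    """Try to change a lock word from 0 (free) to 1 (taken)."""
    _, memory[name], acquired = cmpxchg(0, memory[name], 1)
    return acquired


memory = {"lock": 0}
print(try_acquire(memory, "lock"))  # first attempt finds the lock free
print(try_acquire(memory, "lock"))  # second attempt finds it taken
```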
Bit Manipulation Instructions

80386 onwards
BSF - bit scan forward
BSR - bit scan reverse
    BSF EAX, EBX
    BSF scans the src operand for the first bit that equals 1, beginning with the LSB; BSR begins with the MSB.
    The bit position (index) of the first 1 found is saved in the dst.
    Application: edge detection in an image processing application.
BT - bit test
    BT dst, src
    Determines the value of a specific bit in the 16- or 32-bit destination operand.
    The bit to be tested is indicated by the source operand.
    The state of the bit that is tested is copied into the carry flag.
BTC - after testing the bit, complements it
BTS - after testing the bit, sets it
BTR - after testing the bit, resets it
    Control applications: a single bit is used to operate a device.
    - Open/close: relay or door
    - On/off: light or indicator
    - Sense a specific condition of the device.
SHLD / SHRD dst, src, count
    Shift left / right double precision.
    Count is an 8-bit operand; only its lower 5 bits are used.
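The bit scan and bit test operations can be modelled in Python. The function names mirror the mnemonics but are otherwise assumptions; a zero source returns `None` here because no 1 bit exists to report.

```python
def bsf(src):
    """Index of the lowest set bit (scan from the LSB), or None if src is 0."""
    if src == 0:
        return None
    return (src & -src).bit_length() - 1


def bsr(src):
    """Index of the highest set bit (scan from the MSB), or None if src is 0."""
    if src == 0:
        return None
    return src.bit_length() - 1


def bt(dst, bit_index):
    """Value of the tested bit, which BT copies into the carry flag."""
    return (dst >> bit_index) & 1


def bts(dst, bit_index):
    """BTS: return (carry, new dst) with the tested bit set."""
    return bt(dst, bit_index), dst | (1 << bit_index)


value = 0b00101000
print(bsf(value), bsr(value), bt(value, 3), bts(value, 0))
```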
PowerPC family
Mid seventies
- First RISC-type computer: the IBM 801.
- Executes an instruction at almost every clock cycle (achieved with hardwired control, a RISC property).
- All 801 instructions are 32 bits long.
Mid eighties
- IBM developed a commercial RISC-type processor: ROMP (Research Office Products Division Microprocessor).
- 65% of the instructions were 16 bits long; the others were 32 bits long.
In 1990
- IBM developed the RS 6000.
- POWER: Performance Optimization With Enhanced RISC.
- The RS 6000 uses the POWER architecture.
- The IBM RS 6000 is a predecessor of the PowerPC architecture.
In 1991
- IBM, Motorola and Apple developed a new, powerful family of RISC-type microprocessors: the PowerPC family.
- The first PowerPC implementation is the PowerPC 601 microprocessor, also called MPC 601 by Motorola and PPC 601 by IBM.
- The MPC 603, 604 and 620 are based on the PowerPC architecture, which is derived from the IBM POWER architecture.
PowerPC Architecture
3 layers
1. User instruction set architecture - includes user-level registers, programming model, data types, addressing modes and the base user-level instruction set (non-privileged instructions).
2. Virtual environment architecture - additional user-level functionality: memory model, cache model, cache control instructions, address aliasing and other related issues (user-level timer support).
3. Operating environment architecture - the operating system level: supervisor-level registers, privileged instructions and the exception model.
The basic modes of operation
1. User mode
2. Supervisor mode (similar to the M68000 family)
Supervisor mode can access all registers.
User mode can access registers in the user programming model only.
User Programming Model
Application level registers
Supervisor Programming Model
Supervisor level registers
User Programming Model
General-purpose registers (GPRs)
- 32 general-purpose registers (GPR0 - GPR31).
- Source and destination for all integer operations.
- Address source for all load/store operations (base or index register).
- They also provide access to SPRs.
Floating-point registers (FPRs)
- 32 floating-point registers (FPR0 to FPR31) with 64-bit precision.
- Source and destination operands of all floating-point operations.
- FPRs also provide access to the FPSCR (Floating-Point Status and Control Register).
Special-purpose registers (SPRs)

- The Fixed-Point Exception Register (XER): used for indicating conditions for integer operations, such as carries and overflows.
- The Floating-Point Status and Control Register (FPSCR): 32-bit register used to store the status and control of the floating-point operations.
- The Count Register (CTR): used to hold a loop count that can be decremented during the execution of branch instructions.
- The Condition Register (CR): 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation.
- The Link Register (LR): contains the address to return to at the end of a function call.
Condition Register (CR)
The CR fields can be set in one of the following ways:
- Specified fields of the CR can be set from a GPR by using the mtcrf (move to CR fields) instruction.
- The contents of one CR field can be moved to another CR field by using the mcrf (move CR field) instruction.
- The contents of XER[0-3] can be copied to a specified field of the CR by using the mcrxr (move to CR from XER) instruction.
- A specified field of the FPSCR can be copied to a specified field of the CR by using the mcrfs (move to CR from FPSCR) instruction.
- Logical instructions of the condition register can be used to perform logical operations on specified bits in the condition register (for example, crand crbD, crbA, crbB).
- CR0 can be the implicit result of an integer instruction (it reflects the XER).
- CR1 can be the implicit result of a floating-point instruction (it reflects the FPSCR).
- A specified CR field can indicate the result of either an integer or floating-point compare instruction.
Branch instructions are provided to test individual CR bits.
Bit Settings for CR0 Field of CR
Bit | Name | CR0 Bit Description
0   | LT   | Negative: set when the result is negative.
1   | GT   | Positive: set when the result is positive (and not zero).
2   | EQ   | Zero: set when the result is zero.
3   | SO   | Summary overflow: a copy of the final state of XER[SO] at the completion of the instruction.
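How a 32-bit integer result maps onto the four CR0 bits can be sketched in Python. The function name and the convention of returning the field as a 4-bit number (bit 0, LT, as the most significant bit) are assumptions of this example.

```python
def cr0_field(result, xer_so):
    """Return the 4-bit CR0 value (LT GT EQ SO) for a 32-bit result."""
    result &= 0xFFFFFFFF
    negative = bool(result & 0x80000000)   # sign bit of the 32-bit result
    zero = result == 0
    lt = int(negative)
    gt = int(not negative and not zero)
    eq = int(zero)
    so = xer_so & 1                        # copy of XER[SO]
    return (lt << 3) | (gt << 2) | (eq << 1) | so


for value, so in ((0xFFFFFFFF, 0), (5, 1), (0, 0)):
    print(hex(value), format(cr0_field(value, so), "04b"))
```

Exactly one of LT, GT and EQ is set for any result, while SO is independent of the result value.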
Bit Settings for CR1 Field of CR
Bit | Name | CR1 Bit Description
4   | FX   | Floating-point exception summary
5   | FEX  | Floating-point enabled exception summary
6   | VX   | Floating-point invalid operation exception summary
7   | OX   | Floating-point overflow exception
Condition Register CRn Field: Compare Instruction
For a compare instruction, when a specified CR field is set to reflect the result of the comparison, the bits of that field are interpreted as follows:
Bit 0 - Less than or floating-point less than (LT, FL).
Bit 1 - Greater than or floating-point greater than (GT, FG).
Bit 2 - Equal or floating-point equal (EQ, FE).
Bit 3 - Summary overflow or floating-point unordered (SO, FU).
Fixed-Point Exception Register (XER)
Contains carry and overflow information from integer arithmetic operations.
Also holds the number of bytes to transfer during the load and store string instructions lswx (load string word indexed) and stswx (store string word indexed).
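Instruction formats
Each format is 32 bits wide; bit 0 is the most significant bit. Field positions are given in parentheses.
D-form:  opcd (0-5) | D, tgt/src (6-10) | A, src/tgt (11-15) | SIMM, immediate (16-31)
X-form:  opcd (0-5) | D, tgt/src (6-10) | A, src/tgt (11-15) | B, src (16-20) | extended opcd (21-30) | Rc (31)
A-form:  opcd (0-5) | D, tgt/src (6-10) | A, src/tgt (11-15) | B, src (16-20) | C, src (21-25) | extended opcd (26-30) | Rc (31)
XO-form: opcd (0-5) | D, tgt/src (6-10) | A, src/tgt (11-15) | B, src (16-20) | OE (21) | extended opcd (22-30) | Rc (31)
BD-form: opcd (0-5) | BO (6-10) | BI (11-15) | BD (16-29) | AA (30) | LK (31)
I-form:  opcd (0-5) | LI (6-29) | AA (30) | LK (31)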
Addressing Modes
1. All operations are register to register, using the following two modes:
   - Register direct: the operand is in a GPR or FPR (A form)
   - Immediate: the operand is a part of the instruction (D form)
2. An effective address (EA) in memory is needed in two classes of instruction:
   a) For load and store instructions
      - Register indirect: a GPR contains the address of the operand in memory (EA)
      - Register indirect with immediate index (EA = GPR + immediate) (D form)
      - Register indirect with index (EA = GPR + index) (X form)
   b) For branch instructions
      - Immediate addressing (I form): the target EA is part of the instruction
      - Link register indirect (BD form): the target EA is in the LR
      - Count register indirect: the target EA is in the CTR
Instruction formats
- Upper six bits (0-5): opcode
- Bits 22-30: extended opcode
- Two register source operands: A (11-15) and B (16-20)
- Destination operand: D (6-10)
- OE control bit: enables overflow detection
- Rc record bit: updates the CR
  Rc = 1 for an integer operation: CR0 is set to reflect the result of the arithmetic operation (LT GT EQ SO)
  Rc = 1 for a floating-point operation: CR1 is set to reflect the state of the exception status bits in the FPSCR (FX FEX VX OX)
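The field layout above can be turned into a small encoder in Python. This sketch packs the integer add instruction, whose primary opcode is 31 and extended opcode is 266 in the PowerPC architecture; the function names are assumptions, and the OE bit is taken to sit in bit 21, directly before the extended opcode.

```python
def encode_xo(opcd, rd, ra, rb, oe, xo, rc):
    """Pack opcd(0-5) D(6-10) A(11-15) B(16-20) OE(21) xo(22-30) Rc(31)."""
    fields = (("opcd", opcd, 6), ("rd", rd, 5), ("ra", ra, 5), ("rb", rb, 5),
              ("oe", oe, 1), ("xo", xo, 9), ("rc", rc, 1))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    # Bit 0 is the most significant bit, so bit n is shifted left by 31 - n.
    return ((opcd << 26) | (rd << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (xo << 1) | rc)


def encode_add(rd, ra, rb, oe=0, rc=0):
    """add / add. / addo / addo. selected by the OE and Rc bits."""
    return encode_xo(31, rd, ra, rb, oe, 266, rc)


print(hex(encode_add(3, 4, 5)))              # add   r3, r4, r5
print(hex(encode_add(3, 4, 5, rc=1)))        # add.  r3, r4, r5
print(hex(encode_add(3, 4, 5, oe=1, rc=1)))  # addo. r3, r4, r5
```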
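D form:
addi rD, rA, SIMM - add immediate: rD = rA + SIMM (SIMM is sign-extended)
Load & store instructions: the A field gives the register for register-indirect addressing, SIMM is the immediate displacement, and the D field is the destination (load) or source (store).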
A form
Integer arithmetic has four forms of operation, selected by the OE and Rc bits (in the PowerPC manuals, the integer encoding that carries the OE bit is called the XO form):
add    add rd, ra, rb (rd ← ra + rb)          (OE = 0, Rc = 0)
add.   add with CR update                    (OE = 0, Rc = 1)
addo   add with overflow update              (OE = 1, Rc = 0)
addo.  add with overflow and CR update       (OE = 1, Rc = 1)
Floating-point instructions:
fadd   floating-point add
fadd.  floating-point add with CR update
Composite instructions:
fmadd frd, fra, frb, frc (floating-point multiply and add): frd ← fra * frb + frc
X form
Load & store instructions: A = register indirect (base), B = index register, D = destination (load) or source (store).
I form
Unconditional branch instruction
LI: 24-bit immediate address field
AA: absolute address bit
    AA = 0: LI is shifted two bits left, filling the two lower bits with zeros, sign-extended, and added to the instruction address to form the branch target address.
    AA = 1: the shifted and sign-extended LI is itself the branch target address.
LK = 1: the address of the next instruction is placed in the LR.
b    branch to target address        (AA = 0, LK = 0)
ba   branch absolute                 (AA = 1, LK = 0)
bl   branch then link                (AA = 0, LK = 1)
bla  branch absolute then link       (AA = 1, LK = 1)
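The branch target calculation for the I form can be sketched in Python. The function name and the sample addresses are assumptions for the illustration; addresses are treated as 32-bit values.

```python
def branch_target(current_address, li, aa):
    """Target of an I-form branch from its 24-bit LI field and the AA bit."""
    if not 0 <= li < (1 << 24):
        raise ValueError("LI must fit in 24 bits")
    offset = li << 2                 # two low-order zero bits appended: 26 bits
    if offset & (1 << 25):           # sign bit of the 26-bit value
        offset -= 1 << 26
    base = 0 if aa else current_address
    return (base + offset) & 0xFFFFFFFF


# Hypothetical branch located at address 1000h.
print(hex(branch_target(0x1000, 0x000004, aa=0)))  # forward by 16 bytes
print(hex(branch_target(0x1000, 0xFFFFFF, aa=0)))  # backward by 4 bytes
print(hex(branch_target(0x1000, 0x000040, aa=1)))  # absolute address
```

Because instructions are 32 bits long and word aligned, the two low-order address bits are always zero, which is why LI does not need to store them.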
BD form:
Conditional branch instruction format
BO field: specifies the conditions under which the branch is taken (type of condition, true or false).
BI field: specifies which bit in the CR is to be used as the condition of the branch.
BD field: used to form the branch target address (in the same way as the LI field of the I form).
bc BO, BI, target address - branch conditional, with the variants bca, bcl and bcla
bclr BO, BI   - branch conditional to link register (LK = 0)
bclrl BO, BI  - branch conditional to link register, then link (LK = 1)
bcctr BO, BI  - branch conditional to count register (LK = 0)
bcctrl BO, BI - branch conditional to count register, then link (LK = 1)
Instruction types
Supervisor Programming Model

Registers of the supervisor programming model:
- Machine state register (MSR)
- Segment registers (SRs): sixteen 32-bit SRs, present only in 32-bit PowerPC implementations
- Special-purpose registers: implementation dependent
Machine state register
Bit   | Name | Meaning
0     | SF   | 0 = 32-bit mode, 1 = 64-bit mode
1-15  |      | (not described here)
16    | EE   | external interrupt enable (0 = disabled, 1 = enabled)
17    | PR   | privilege level (0 = all instructions can be executed, 1 = only user-level instructions are executed)
18    | FP   | floating point available (0 = unavailable, 1 = available)
19    | ME   | machine check exception enable (0 = disabled, 1 = enabled)
20    | FE0  | floating-point exception mode, together with FE1 (4 modes)
21    | SE   | single-step trace enable (0 = normal execution, 1 = single-step execution)
22    |      | (not described here)
23    | FE1  | floating-point exception mode, together with FE0 (4 modes)
24    |      | (not described here)
25    | EP   | exception prefix: exceptions are vectored to physical address 000n nnnnh (EP = 0) or FFFn nnnnh (EP = 1)
26    | IT   | instruction address translation
27    | DT   | data address translation
28-30 |      | (not described here)
31    | LE   | 1 = little-endian mode, 0 = big-endian mode (default byte ordering)
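Reading individual MSR bits requires care because the bit numbers count from the most significant end. The Python sketch below assumes a 32-bit MSR value in which bit 0 is the most significant bit, so bit n has the mask 1 << (31 - n); the names `MSR_BITS`, `msr_field` and `decode_msr` are made up for the example.

```python
# Bit numbers of the MSR fields described in these notes (bit 0 = MSB).
MSR_BITS = {
    "EE": 16, "PR": 17, "FP": 18, "ME": 19, "FE0": 20, "SE": 21,
    "FE1": 23, "EP": 25, "IT": 26, "DT": 27, "LE": 31,
}


def msr_field(msr, name):
    """Return the value (0 or 1) of one named MSR bit."""
    return (msr >> (31 - MSR_BITS[name])) & 1


def decode_msr(msr):
    """Return a dictionary of all named MSR bits."""
    return {name: msr_field(msr, name) for name in MSR_BITS}


# Hypothetical MSR value: external interrupts enabled, user privilege level.
print(decode_msr(0x0000C000))
```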
MPC 601
First microprocessor implementation of the PowerPC architecture
66 MHz; power dissipation is 9 W at 3.6 V
Integer Execution Unit
Floating Point Unit
Load/Store Unit (LSU)
Branch Execution Units
Memory Management Unit
Memory Unit
Cache
Data Types
Either little-endian or big-endian byte ordering can be used.
Fixed-point data types include:
o Unsigned byte, 8 bits
o Unsigned halfword, 16 bits
o Signed halfword, 16 bits
o Unsigned word, 32 bits
o Signed word, 32 bits
o Unsigned doubleword, 64 bits
o Byte strings: from 0 to 128 bytes in length
Two's complement is used for negative values.
Floating-point data formats:
single-precision, 32 bits long (23-bit fraction + 8-bit exponent + 1 sign bit)
double-precision, 64 bits long (52-bit fraction + 11-bit exponent + 1 sign bit)
Characters are stored using 8-bit ASCII codes.
