
UNIT - II

Representation of Instructions
(Instruction set Architecture)

A Few Words About Where We Are Headed


Performance = 1 / Execution time

simplified to 1 / CPU execution time

CPU execution time = Instructions × CPI / (Clock rate)

- Try to achieve CPI = 1 with clock that is as high as that for CPI > 1 designs; is CPI < 1 feasible?
- Design memory & I/O structures to support ultrahigh-speed CPUs
- Design hardware for CPI = 1; seek improvements with CPI > 1
- Define an instruction set; make it simple enough to require a small number of cycles and allow high clock rate, but not so simple that we need many instructions, even for very simple tasks
- Design ALU for arithmetic & logic ops

Computer Architecture,
Instruction-Set Architecture

Performance = Clock rate / ( Instructions × CPI )

Embedded System Architecture: Instruction Set Architecture


Categorized by memory organization
- Von-Neumann architecture
- Harvard architecture
[Figure: CPU with PC connected to the program memory and to the data memory, each through its own address and data lines]

Categorized by instruction type
- CISC - Complex instruction set computer
- RISC - Reduced instruction set computer
- VLIW - Very Long Instruction Word

CISC - Complex instruction set computer
- Large number of instructions (~200-300 instructions)
- Specialized complex instructions
- Many different addressing modes
- Variable length instruction format
- Variable / larger number of machine cycles per instruction
- More instructions can access memory
- Small number of general-purpose registers
- Microprogrammed control unit
- Larger die size, longer development time

RISC - Reduced instruction set computer
- Relatively few instructions (~50)
- Basic instructions
- Relatively few different addressing modes
- Fixed length instruction format
- Most instructions complete in one machine cycle
- Only load/store instructions can access memory
- Large number of general-purpose registers
- Hardwired control unit
- Smaller die size, shorter development time, high performance

CISC provides a large and powerful range of instructions.
There are 32 jump instructions in the 8086, and the 80386 adds more, for example:
JA Jump if Above
JAE Jump if Above or Equal

RISC: There are only two Jump instructions in the ARM processor - Branch and Branch with Link.

CISC Disadvantages:
- Many specialized instructions aren't used frequently.
- Earlier generations of a processor family were generally contained as a subset in every new version.
- Different instructions take different amounts of clock time to execute, due to their variable length, slowing down the overall performance of the machine.

RISC Disadvantages:
- Poor code density (because of the fixed instruction size).
- Cannot execute x86 code.

The Performance Equation

The following equation is commonly used for expressing a computer's performance ability:

time / program = (time / cycle) × (cycles / instruction) × (instructions / program)

The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction.
RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.
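
The trade-off can be made concrete with a small calculation. The sketch below applies CPU execution time = Instructions × CPI / Clock rate to two workloads; the instruction counts, CPI values and the 100 MHz clock are made-up numbers chosen only for illustration, not measurements of any real processor.

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU execution time in seconds = instructions * CPI / clock rate."""
    return instructions * cpi / clock_rate_hz

# Hypothetical figures: fewer, slower instructions versus more, faster ones.
cisc_time = cpu_time(instructions=1_000_000, cpi=4.0, clock_rate_hz=100e6)
risc_time = cpu_time(instructions=1_500_000, cpi=1.2, clock_rate_hz=100e6)

print(f"CISC-style program: {cisc_time * 1e3:.1f} ms")
print(f"RISC-style program: {risc_time * 1e3:.1f} ms")
```

With these assumed numbers the lower CPI outweighs the larger instruction count; different assumptions can reverse the result.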

One side supported CISC designs because of their low burden on compiler developers and the wide availability of existing software.
The other camp supported RISC designs because of their simplicity and efficiency.
Processor designers have since realized that RISC designs might benefit from the addition of some CISC characteristics and vice versa.
These hybrid designs use a decoder to convert CISC instructions into RISC instructions before execution.

They are then processed by a RISC core, which performs a few basic instructions very quickly.
Having a RISC core is advantageous because it allows performance-enhancing features, such as pipelining and branch prediction.
Popular examples of hybrid designs include the Pentium and Athlon families of processors.

Very Long Instruction Word (VLIW)

- One VLIW instruction contains several independent operations that are executed in parallel.
- Instruction-level parallelism relies on the compiler to determine which instructions may be executed in parallel.
- The number of operations in a VLIW instruction is equal to the number of execution units in the processor.

[Figure: one VLIW instruction holding the four operations F=a+b, c=e/g, d=x&y and w=z*h, each issued to its own processing unit (PU)]

- Advantage: simpler and faster than RISC.
- Widely used in DSP (digital signal processing) applications: high performance and low cost.
- Disadvantage: an increase in the number of execution units means the program must be recompiled.
- Less successful in general-purpose computers: customers demand software compatibility between generations of a processor.
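
The compiler's job of grouping independent operations can be sketched as follows. This is a simplified, in-order greedy packer, not the scheduling algorithm of any particular VLIW compiler; the function name `pack_bundles` and the `(dest, sources)` tuple format are assumptions made for the example.

```python
def pack_bundles(ops, width):
    """Greedily pack (dest, sources) operations into VLIW-style bundles.

    An operation joins the current bundle only if a slot is free and it
    neither reads nor writes a destination already written in that bundle.
    """
    bundles, current, written = [], [], set()
    for dest, sources in ops:
        conflict = dest in written or any(s in written for s in sources)
        if conflict or len(current) == width:
            bundles.append(current)          # close the current bundle
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

# The four independent operations from the figure, plus one dependent operation.
ops = [("F", ("a", "b")), ("c", ("e", "g")), ("d", ("x", "y")),
       ("w", ("z", "h")), ("k", ("F", "c"))]
for number, bundle in enumerate(pack_bundles(ops, width=4)):
    print(number, [dest for dest, _ in bundle])
```

The four independent operations fill one instruction word for a machine with four execution units, while the operation that reads F and c has to wait for the next word. Changing `width` changes the packing, which is why a program must be recompiled when the number of execution units changes.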

Pentium
[Figure: Pentium block diagram showing CPU1, CPU2, the 16 K L1 cache and the coprocessor]

Software model of the Pentium

EFLAGS

Carry: unsigned arithmetic out of range
Overflow: signed arithmetic out of range
Sign: result is negative
Zero: result is zero
Auxiliary Carry: carry from bit 3 to bit 4
Parity: the number of 1 bits in the low byte of the result is even
Direction: selects auto-increment or auto-decrement of the SI and DI registers during string instructions
Interrupt: controls operation of the INTR (interrupt request) input pin
Trap: enables trapping through an on-chip debugging feature
Nested Task: indicates whether the current task is nested
Input / output privilege level: I/O privilege level of the current task

Flags are divided into two groups:
1. Control flags - IF, DF, TF
2. Status flags - CF, OF, SF, ZF, AF, PF
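
As an illustration of the status flags, the sketch below models an 8-bit addition and derives each flag from the definitions above. The function name `add8_flags` is an assumption for this example, and the model covers only ADD on byte operands.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) following the definitions above."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                               # unsigned result out of range
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,  # signed result out of range
        "SF": (result & 0x80) != 0,                       # result is negative
        "ZF": result == 0,                                # result is zero
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,           # carry from bit 3 to bit 4
        "PF": bin(result).count("1") % 2 == 0,            # even number of 1 bits
    }
    return result, flags

result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

Adding 7Fh and 01h gives 80h: the result is in range as an unsigned value, so CF is clear, but it is out of range as a signed value, so OF is set.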

Memory map of the Personal Computer

The transient program area (TPA) holds the DOS (disk operating system) and other programs that control the computer system.

Different modes of operation

Real mode operation:
- Allows addressing of only the first 1M byte of memory space.
- The first 1M byte of memory is called the real memory, conventional memory, or DOS memory system.
- Advanced processors simply operate like very fast 8086s.
- It is automatically selected upon power-up.
- DOS is a real mode operating system.
Protected mode operation:
- The full memory is available to the processor.
- Supports special privileged instructions, multitasking, virtual memory addressing, and memory management and protection functions.
- Allows control of the internal cache.
- The Windows operating system runs in protected mode.
- Writing programs requires special background knowledge of operating systems theory.

[Figure: Functional block diagram of the Pentium]

A special technique can be used to utilize a 32-bit register on an instruction-by-instruction basis: the single-byte operand size prefix (66h).
Consider the 32-bit data 229B0112h. In 16-bit code, the sequence
DB 66h
MOV AX, 0112h
DW 229Bh
is executed as
MOV EAX, 229B0112h
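
The equivalence can be checked at the byte level. The sketch below builds the machine code for both forms, assuming the standard encoding in which B8h is the opcode of MOV AX, imm16 (and of MOV EAX, imm32 once the 66h prefix is applied in 16-bit code); the variable names are illustrative only.

```python
import struct

OPERAND_SIZE_PREFIX = 0x66   # operand size prefix byte
MOV_ACC_IMM = 0xB8           # MOV AX, imm16 / MOV EAX, imm32 when prefixed

# DB 66h, then MOV AX, 0112h, then DW 229Bh (little-endian words)
sequence = (bytes([OPERAND_SIZE_PREFIX])
            + bytes([MOV_ACC_IMM]) + struct.pack("<H", 0x0112)
            + struct.pack("<H", 0x229B))

# MOV EAX, 229B0112h as it is encoded for 16-bit code
direct = bytes([OPERAND_SIZE_PREFIX, MOV_ACC_IMM]) + struct.pack("<I", 0x229B0112)

print(sequence.hex())
print(sequence == direct)
```

Both forms produce the same six bytes, which is why the processor treats the trailing DW 229Bh as the upper half of the 32-bit immediate.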

The programming model of the 8086 through the Core2 is considered to be program visible:
its registers are used during applications programming and are specified by the instructions.
Other registers are considered to be program invisible:
they are not addressable directly during applications programming,
but they may be used indirectly during system programming
(80286 and above).

Addressing Modes
Register Addressing
MOV BX, CX
Immediate Addressing
MOV AX, 3456H
Direct Addressing
MOV AL,[1234H]
Register Indirect Addressing
MOV AX,[BX]
Base-Plus-Index Addressing
MOV DX,[BX + DI]
Register Relative Addressing
MOV AX,[BX+1000H]
Base Relative-Plus-Index Addressing
MOV AX,[BX + SI + 100H]
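
For the memory addressing modes listed above, the offset part of the address (the effective address) is the sum of whichever base, index and displacement the instruction names. The sketch below evaluates it for each example; the register contents in `regs` are hypothetical values picked only so the sums are easy to follow.

```python
# Hypothetical register contents, chosen only for illustration.
regs = {"BX": 0x0300, "SI": 0x0020, "DI": 0x0010}

def effective_address(base=None, index=None, disp=0):
    """16-bit effective address: base + index + displacement, wrapped to 64 KB."""
    ea = disp
    if base:
        ea += regs[base]
    if index:
        ea += regs[index]
    return ea & 0xFFFF

examples = {
    "MOV AL,[1234H]":          effective_address(disp=0x1234),
    "MOV AX,[BX]":             effective_address(base="BX"),
    "MOV DX,[BX + DI]":        effective_address(base="BX", index="DI"),
    "MOV AX,[BX+1000H]":       effective_address(base="BX", disp=0x1000),
    "MOV AX,[BX + SI + 100H]": effective_address(base="BX", index="SI", disp=0x100),
}
for instruction, ea in examples.items():
    print(f"{instruction:<26} EA = {ea:04X}H")
```

Register and immediate addressing are left out because they do not access memory, so no effective address is formed.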

Port addressing
1. Directly, with the port specified in the operand field. The address bus contains the address of an I/O port.
For example: IN AL, 80H
(00H - FFH) 256 I/O port locations.

2. Indirectly, via the address contained in register DX.
OUT DX, AX
(0000H - FFFFH) 65,536 I/O port locations.

Scaled-Index Addressing
Available only in the 80386 through Core2 microprocessors.
Uses two 32-bit registers (a base register and an index register) to access the memory.
The second register (index) is multiplied by a scaling factor.
The scaling factor can be 1x, 2x, 4x or 8x.

MOV EAX, [EBX + ECX * 4 + 6]
(base = EBX, index = ECX, scale = 4, displacement = 6)

32-bit addressing modes may be used while running in real mode by using the address size prefix:
DB 67h
MOV EAX, [EBX + ECX * 4 + 6]
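
The address formed by this mode is base + index × scale + displacement. A minimal sketch, with the function name `scaled_index_ea` and the register contents being assumptions for the example:

```python
def scaled_index_ea(base, index, scale, disp=0):
    """32-bit effective address = base + index * scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Hypothetical contents: EBX = 00001000h, ECX = 3
print(hex(scaled_index_ea(base=0x1000, index=3, scale=4, disp=6)))
```

A scale of 4 makes the index count 32-bit elements, so ECX = 3 selects the fourth doubleword of an array that starts at EBX + 6.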

Superscalar Architecture

Processors capable of parallel execution of multiple instructions are known as superscalar machines.

Parallel execution is possible through the U and V pipelines of the Pentium.

Four restrictions are placed on a pair of integer instructions attempting parallel execution:
1. Both must be simple instructions (MOV, INC, DEC).
2. No data dependencies may exist between them:
- read-after-write dependency (the second instruction reads an operand written by the first)
- write-after-write dependency (both instructions write to the same operand)
3. Neither instruction may contain both immediate data and a displacement value.
MOV table[SI], 7 (displacement and immediate data: not pairable)
4. Prefixed instructions may only execute in the U pipeline.
MOV ES:[DI], AL (segment override prefix)

For floating-point instructions, the first instruction of the pair must be one of the following:
FADD, FSUB, FMUL, FDIV, FCOM
The second instruction must be FXCH.
The compiler plays an important role in the ordering of instructions during code generation.

Processors with a superscalar architecture are capable of achieving an instruction throughput of more than one instruction per cycle.
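
The first two pairing restrictions lend themselves to a short check. The sketch below is a simplified model, not the actual Pentium pairing logic: each instruction is represented by a hypothetical tuple `(mnemonic, writes, reads)` of register-name sets, and restrictions 3 and 4 are not modelled.

```python
SIMPLE = {"MOV", "INC", "DEC"}  # the simple instructions named above

def can_pair(first, second):
    """Check restrictions 1 and 2 for a candidate U/V pipeline pair."""
    op1, writes1, _reads1 = first
    op2, writes2, reads2 = second
    if op1 not in SIMPLE or op2 not in SIMPLE:
        return False            # restriction 1: both must be simple
    if writes1 & reads2:
        return False            # read-after-write dependency
    if writes1 & writes2:
        return False            # write-after-write dependency
    return True

mov_ax_bx = ("MOV", {"AX"}, {"BX"})
print(can_pair(mov_ax_bx, ("INC", {"CX"}, {"CX"})))   # independent instructions
print(can_pair(mov_ax_bx, ("MOV", {"DX"}, {"AX"})))   # second reads AX
```

A compiler or assembly programmer can use this kind of check to reorder instructions so that adjacent ones are independent and can issue together.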

Pipelining

Pentium Instruction Set

Data transfer instructions
80286 onwards
INS / OUTS - input string from port / output string to port
INS dest, DX
OUTS DX, src
POPA / PUSHA - pop / push all the 16-bit registers
Order of registers for PUSHA: AX, CX, DX, BX, SP, BP, SI, DI
80386 onwards
POPAD / PUSHAD - pop / push all the 32-bit registers
POPF / PUSHF
POPFD / PUSHFD
LFS - load pointer using FS
LGS - load pointer using GS
LSS - load pointer using SS
MOVSX - move with sign extension
MOVSX dest, src
MOVZX - move with zero extension
MOVZX dest, src
MOVZX should not be used when working with signed numbers.
80486 onwards
BSWAP - byte swap
Swaps the bytes in a 32-bit GPR.
Used for converting 32-bit numbers from little-endian format into big-endian format and vice versa.
BSWAP dst
New Pentium instruction
MOV - move to / from control registers
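
The difference between sign extension and zero extension, and the effect of a byte swap, can be shown on plain integers. The helper names `movsx8`, `movzx8` and `bswap32` are made up for this sketch; they model only the arithmetic, for an 8-bit source and a 32-bit destination.

```python
import struct

def movsx8(value):
    """Sign-extend an 8-bit value to 32 bits, as MOVSX would."""
    value &= 0xFF
    return (value | 0xFFFFFF00) if value & 0x80 else value

def movzx8(value):
    """Zero-extend an 8-bit value to 32 bits, as MOVZX would."""
    return value & 0xFF

def bswap32(value):
    """Reverse the byte order of a 32-bit value, as BSWAP would."""
    return struct.unpack("<I", struct.pack(">I", value & 0xFFFFFFFF))[0]

print(hex(movsx8(0xF0)), hex(movzx8(0xF0)))
print(hex(bswap32(0x229B0112)))
```

F0h is -16 as a signed byte. Sign extension keeps that value in 32 bits, while zero extension turns it into +240, which is why MOVZX is unsuitable for signed numbers.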

Arithmetic Instructions
8086 onwards
CBW - convert byte to word
Extends a signed 8-bit number in AL into a signed 16-bit number in AX
Performed before IDIV or IMUL
CWD - convert word to double word
Extends a signed 16-bit number in AX into a signed 32-bit number in DX : AX
Performed before IDIV or IMUL
80386 onwards
CWDE - convert word to double word extended
Extends a signed 16-bit number in AX into a signed 32-bit number in EAX
CDQ - convert double word to quad word
The sign bit of EAX is extended through EDX.
64-bit result in EDX : EAX

80486 onwards
CMPXCHG - compare and exchange
CMPXCHG dst, src
Compares the dst operand with the accumulator (AL, AX or EAX, depending on the size of dst).
If acc = dst, src is copied to dst.
If acc ≠ dst, acc is replaced by the value in dst.
Very useful in operating system software that supports multiple processes through the use of semaphores.
XADD - exchange and add byte, word or double word
XADD dst, src
The sum of dst and src is stored in dst, and the original value of dst is stored in src.
Pentium instruction
CMPXCHG8B - compare and exchange 8 bytes
CMPXCHG8B dst
EDX : EAX is compared with the 64-bit dst.
ECX : EBX is the source, copied to dst if they are equal; otherwise dst is loaded into EDX : EAX.
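
The compare-and-exchange behaviour and its use for a semaphore can be modelled in a few lines. This is a sketch of the instruction's data movement only: it does not model the atomicity that the hardware provides, and the `try_acquire` helper with its 0 = free, 1 = taken convention is a hypothetical example.

```python
def cmpxchg(acc, dst, src):
    """Model CMPXCHG dst, src. Returns (new_acc, new_dst, zero_flag)."""
    if acc == dst:
        return acc, src, True      # equal: src is copied to dst
    return dst, dst, False         # not equal: acc is loaded from dst

def try_acquire(semaphore):
    """Hypothetical binary semaphore: 0 = free, 1 = taken."""
    _, new_value, acquired = cmpxchg(acc=0, dst=semaphore, src=1)
    return new_value, acquired

print(try_acquire(0))   # free semaphore: it is taken
print(try_acquire(1))   # already taken: the attempt fails
```

Because the comparison and the store happen in one instruction, two processes cannot both see the semaphore as free and both take it.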

Bit Manipulation Instructions

80386 onwards
BSF - bit scan forward
BSR - bit scan reverse
BSF EAX, EBX
BSF scans the src operand for the first bit that equals 1, beginning with the LSB; BSR begins with the MSB.
The bit position (index) of the first 1 found is saved in the dst.
Application: edge detection in an image processing application

BT - bit test
BT dst, src
Determines the value of a specific bit in the 16- or 32-bit destination operand.
The bit to be tested is indicated by the source operand.
The state of the bit that is tested is copied into the carry flag.
BTC - after testing the bit, complements it
BTS - after testing the bit, sets it
BTR - after testing the bit, resets it
Control applications: a single bit is used to operate a device.
Open / close - relay or door
On / off - light or indicator
Sense a specific condition of the device.

SHLD / SHRD dst, src, count
Shift left / right double precision
Count: an 8-bit operand, of which only the lower 5 bits are used
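
The bit scan and bit test operations map directly onto integer arithmetic. In the sketch below the function names mirror the mnemonics for readability, and returning `None` for a zero source is a modelling choice for this example.

```python
def bsf(value):
    """Index of the lowest set bit (scan from the LSB), or None if value is 0."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1

def bsr(value):
    """Index of the highest set bit (scan from the MSB), or None if value is 0."""
    return value.bit_length() - 1 if value else None

def bt(value, bit):
    """Return the tested bit, which BT copies into the carry flag."""
    return (value >> bit) & 1

def bts(value, bit):
    """Test the bit, then set it. Returns (carry, new_value)."""
    return bt(value, bit), value | (1 << bit)

pattern = 0b101000
print(bsf(pattern), bsr(pattern), bt(pattern, 3), bts(pattern, 0))
```

In a control application, `bt` corresponds to sensing the state of a device and `bts` to switching it on while learning whether it was already on.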

PowerPC family
Mid-seventies
First RISC-type computer: the IBM 801
Executed an instruction at almost every clock cycle
(to achieve this it used hardwired control, a RISC property)
All 801 instructions were 32 bits long
Mid-eighties
IBM developed a commercial RISC-type processor
ROMP - Research Office Products Division Microprocessor
65% of the instructions were 16 bits long; the others were 32 bits long.
In 1990
IBM developed the RS 6000
POWER - Performance Optimization With Enhanced RISC
The RS 6000 is based on the POWER architecture
The IBM RS 6000 is a predecessor of the PowerPC architecture

In 1991
IBM, Motorola and Apple developed a new, powerful family of RISC-type microprocessors: the PowerPC family
The first PowerPC implementation is the PowerPC 601 microprocessor, also called MPC 601 by Motorola and PPC 601 by IBM
MPC 603, 604 and 620 are based on the PowerPC architecture, derived from the IBM POWER architecture

PowerPC Architecture
3 layers
1. User instruction set architecture - includes user-level registers, programming model, data types, addressing modes and the base user-level instruction set (non-privileged instructions)
2. Virtual environment architecture - additional user-level functionality: memory model, cache model, cache control instructions, address aliasing and other related issues (user-level timer support)
3. Operating environment architecture - supervisor-level registers, privileged instructions and the exception model (operating system level)

The basic modes of operation
1. User mode
2. Supervisor mode - similar to the M68000 family
Supervisor mode can access all registers.
User mode can access registers in the user programming model only.
User Programming Model - application-level registers
Supervisor Programming Model - supervisor-level registers

User Programming Model

General-purpose registers (GPRs)
32 general-purpose registers (GPR0 - GPR31).
Source and destination for all integer operations.
Address source for all load/store operations (base or index register).
They also provide access to SPRs.

Floating-point registers (FPRs)
FPR0 to FPR31: 32 floating-point registers with 64-bit precision.
Source and destination operands of all floating-point operations.
FPRs also provide access to the FPSCR (Floating-Point Status and Control Register).

Special-purpose registers (SPRs)

The Fixed-Point Exception Register (XER) - used for indicating conditions for integer operations, such as carries and overflows.

The Floating-Point Status and Control Register (FPSCR) - 32-bit register used to store the status and control of the floating-point operations.

The Count Register (CTR) - used to hold a loop count that can be decremented during the execution of branch instructions.

The Condition Register (CR) - 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation.

The Link Register (LR) - contains the address to return to at the end of a function call.

Condition Register (CR)

The CR fields can be set in one of the following ways:
- Specified fields of the CR can be set from a GPR by using the mtcrf (move to cr fields) instruction.
- The contents of XER[0-3] can be moved to another CR field by using the mcrf (move cr field) instruction.
- A specified field of the XER can be copied to a specified field of the CR by using the mcrxr (move to cr from XER) instruction.
- A specified field of the FPSCR can be copied to a specified field of the CR by using the mcrfs (move to cr from FPSCR) instruction.
- Logical instructions of the condition register can be used to perform logical operations on specified bits in the condition register (crand crbD, crbA, crbB).
- CR0 can be the implicit result of an integer instruction (XER).
- CR1 can be the implicit result of a floating-point instruction (FPSCR).
- A specified CR field can indicate the result of either an integer or floating-point compare instruction.
Branch instructions are provided to test individual CR bits.

Bit Settings for CR0 Field of CR

Bit  Name  CR0 Bit Description
0    LT    Negative: set when the result is negative.
1    GT    Positive: set when the result is positive (and not zero).
2    EQ    Zero: set when the result is zero.
3    SO    Summary overflow: a copy of the final state of XER[SO] at the completion of the instruction.

Bit Settings for CR1 Field of CR

Bit  Name  CR1 Bit Description
4    FX    Floating-point exception summary
5    FEX   Floating-point enabled exception summary
6    VX    Floating-point invalid operation exception summary
7    OX    Floating-point overflow exception
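
How CR0 follows from an integer result can be sketched as below. The function name `cr0_field` and the packing of the four bits into one integer (bit 0, LT, as the most significant of the four) are assumptions for this example; the SO bit is simply passed in as the current XER[SO] value.

```python
def cr0_field(result, xer_so=0, bits=32):
    """Return the 4-bit CR0 value (LT GT EQ SO) for a signed integer result."""
    result &= (1 << bits) - 1
    negative = (result >> (bits - 1)) & 1
    lt = negative
    eq = int(result == 0)
    gt = int(not negative and result != 0)
    return (lt << 3) | (gt << 2) | (eq << 1) | (xer_so & 1)

for value in (5, 0, -1):
    print(value, format(cr0_field(value), "04b"))
```

Exactly one of LT, GT and EQ is set for any result, so a conditional branch needs to test only a single CR bit.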

Condition Register CRn Field - Compare Instruction

For a compare instruction, when a specified CR field is set to reflect the result of the comparison, the bits of the field are interpreted as follows:
Bit 0 - Less than or floating-point less than (LT, FL).
Bit 1 - Greater than or floating-point greater than (GT, FG).
Bit 2 - Equal or floating-point equal (EQ, FE).
Bit 3 - Summary overflow or floating-point unordered (SO, FU).

Fixed-Point Exception Register (XER)

Contains carry and overflow information from integer arithmetic operations.

Also holds the number of bytes to transfer during the load and store string instructions lswx (load string word indexed) and stswx (store string word indexed).

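Instruction formats

Format      Fields (bit positions)
D-form      opcd (0-5) | D tgt/src (6-10) | A src/tgt (11-15) | SIMM immediate (16-31)
X-form      opcd (0-5) | D tgt/src (6-10) | A src/tgt (11-15) | B src (16-20) | extended opcd (21-30)
A-form      opcd (0-5) | D tgt/src (6-10) | A src/tgt (11-15) | B src (16-20) | C src (21-25) | extended opcd (26-30) | Rc (31)
(with OE)   opcd (0-5) | D tgt/src (6-10) | A src/tgt (11-15) | B src (16-20) | OE (21) | extended opcd (22-30) | Rc (31)
BD-form     opcd (0-5) | BO (6-10) | BI (11-15) | BD (16-29) | AA (30) | LK (31)
I-form      opcd (0-5) | LI (6-29) | AA (30) | LK (31)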

Addressing Modes
1. All operations are register to register, using the following two modes:
Register direct: the operand is in a GPR or FPR (A form)
Immediate: the operand is part of the instruction (D form)

2. An effective address (EA) to memory is needed in two classes of instruction:
a) For load and store instructions
- Register indirect: a GPR contains the address of the operand in memory (EA)
- Register indirect with immediate index (EA = GPR + immediate) (D form)
- Register indirect with index (EA = GPR + index) (X form)
b) For branch instructions
- Immediate addressing (I form): the target EA is part of the instruction
- Link register indirect (BD form): the target EA is in the LR
- Count register indirect: the target EA is in the CTR

Instruction formats
Upper six bits: opcode (0-5)
Extended opcode bits (22-30)
Two register source operands: A (11-15) and B (16-20)
Destination operand: D (6-10)
OE control bit: enables overflow detection
Rc record bit: updates the CR
Rc = 1 for an integer operation: CR0 is set to reflect the result of the arithmetic operation (LT GT EQ SO)
Rc = 1 for a floating-point operation: CR1 is set to reflect the state of the exception status bits in the FPSCR (FX FEX VX OX)
D form:
addi rD, rA, SIMM (sign extended)
add immediate: rD = rA + SIMM
Load and store instructions: A field - register indirect, SIMM - immediate address
D field - dst (load) or src (store)
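
The D-form field layout can be exercised by packing and unpacking an instruction word. The sketch below assumes the field positions given above, with bit 0 as the most significant bit of the 32-bit word, and uses 14 as the primary opcode of addi; the helper names are made up for the example.

```python
ADDI_OPCODE = 14  # primary opcode of addi in the PowerPC architecture

def encode_d_form(opcode, rd, ra, simm):
    """Pack a D-form word: opcd (0-5), D (6-10), A (11-15), SIMM (16-31)."""
    if not -0x8000 <= simm <= 0x7FFF:
        raise ValueError("SIMM must fit in 16 signed bits")
    return (opcode << 26) | (rd << 21) | (ra << 16) | (simm & 0xFFFF)

def decode_d_form(word):
    """Unpack a D-form word into (opcode, rd, ra, sign-extended SIMM)."""
    simm = word & 0xFFFF
    if simm & 0x8000:
        simm -= 0x10000
    return word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F, simm

word = encode_d_form(ADDI_OPCODE, rd=3, ra=4, simm=5)   # addi r3, r4, 5
print(f"{word:08X}", decode_d_form(word))
```

Decoding sign-extends the 16-bit immediate, which is the step the "(sign extended)" note on addi refers to.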

A form
Integer arithmetic instructions have four forms of operation:
add    add rd, ra, rb (rd ← ra + rb)    (OE = 0, Rc = 0)
add.   add with CR update                (OE = 0, Rc = 1)
addo   add with overflow update          (OE = 1, Rc = 0)
addo.  add with overflow and CR update   (OE = 1, Rc = 1)
Floating-point instructions
fadd   floating-point add
fadd.  floating-point add with CR update
Composite instructions
fmadd frd, fra, frb, frc (floating-point multiply and add)
frd ← fra * frb + frc
Load and store instructions: A field - register indirect, B field - index register
D field - dst (load) or src (store)

X form
Load and store instructions: A - register indirect, B - index register, D - dst (load) or src (store)

I form

Unconditional branch instructions
LI: immediate address field (length indicator), 24 bits
AA: absolute address bit
AA = 0: LI is shifted two bits left, filling the two lower bits with zeros, sign extended and added to the instruction address to form the branch target address
AA = 1: the shifted and sign-extended LI is the branch target address
LK = 1: the address of the next instruction is placed in the LR
b    target address               (AA = 0, LK = 0)
ba   branch absolute              (AA = 1, LK = 0)
bl   branch then link             (AA = 0, LK = 1)
bla  branch absolute then link    (AA = 1, LK = 1)
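
The target address calculation for the I form can be written out step by step. This is a sketch for a 32-bit address space; the function name `i_form_target` and the sample addresses are assumptions for the example.

```python
def i_form_target(li, aa, current_address):
    """Branch target of an I-form branch from the 24-bit LI field and the AA bit."""
    offset = (li & 0xFFFFFF) << 2          # shift left; the low two bits are zero
    if offset & 0x2000000:                 # sign bit of the 26-bit value
        offset -= 0x4000000                # sign extend
    target = offset if aa else current_address + offset
    return target & 0xFFFFFFFF

print(hex(i_form_target(li=4, aa=0, current_address=0x1000)))         # relative, forward
print(hex(i_form_target(li=0xFFFFFF, aa=0, current_address=0x1000)))  # relative, backward
print(hex(i_form_target(li=4, aa=1, current_address=0x1000)))         # absolute
```

Shifting LI by two bits works because every instruction is four bytes long, so the two low address bits of a branch target are always zero.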

BD form:
Conditional branch instruction format
BO field - specifies the conditions under which the branch is taken (type of condition: true or false)
BI field - specifies the bit in the CR to be used as the condition of the branch
BD field - used to form the branch target address (in the same way as the LI field of the I form)
bc BO, BI, target address (branch conditional)
bca, bcl and bcla (the AA and LK variants, as for b)
bclr BO, BI     branch conditional to link register (LK = 0)
bclrl BO, BI    (LK = 1)
bcctr BO, BI    branch conditional to count register (LK = 0)
bcctrl BO, BI   (LK = 1)

Instruction types

Supervisor Programming Model

- Machine state register
- Segment registers (SR): sixteen 32-bit SRs, present only in 32-bit PowerPC implementations
- Special purpose registers: implementation dependent
Machine state register (MSR)

Bit:    0   1-15  16  17  18  19  20   21  22  23   24  25  26  27  28-30  31
Field:  SF  -     EE  PR  FP  ME  FE0  SE  -   FE1  -   EP  IT  DT  -      LE

Bit 0 SF: 0 - 32-bit mode, 1 - 64-bit mode
Bit 16 EE: external interrupt enable (0 - disabled, 1 - enabled)
Bit 17 PR: privilege level (0 - all instructions can be executed, 1 - only user-level instructions are executed)
Bit 18 FP: floating point available (0 - unavailable, 1 - available)
Bit 19 ME: machine check exception enable (0 - disabled, 1 - enabled)
Bits 20 and 23 FE0, FE1: floating-point exception mode (4 modes)
Bit 21 SE: single-step trace enable (0 - normal execution, 1 - single-step execution)
Bit 25 EP: exception prefix; exceptions are vectored to the physical address 000n nnnnh (EP = 0) or FFFn nnnnh (EP = 1)
Bit 26 IT: instruction address translation
Bit 27 DT: data address translation
Bit 31 LE: 1 - little-endian mode, 0 - big-endian mode (default byte ordering)
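
Reading individual MSR fields is a matter of shifting and masking. The sketch below uses the bit positions listed above and assumes a 32-bit value in which bit 0 is the most significant bit; the dictionary and function names are made up for the example, and the sample value is arbitrary.

```python
# Bit positions as listed above; bit 0 is taken to be the most significant
# bit of the 32-bit value (an assumption matching the numbering used here).
MSR_BITS = {"SF": 0, "EE": 16, "PR": 17, "FP": 18, "ME": 19, "FE0": 20,
            "SE": 21, "FE1": 23, "EP": 25, "IT": 26, "DT": 27, "LE": 31}

def decode_msr(value):
    """Return a dict of field name -> 0/1 for a 32-bit MSR value."""
    return {name: (value >> (31 - bit)) & 1 for name, bit in MSR_BITS.items()}

def describe(value):
    """Summarize the privilege, byte-order and interrupt settings."""
    fields = decode_msr(value)
    return {
        "privilege": "user" if fields["PR"] else "supervisor",
        "byte_order": "little-endian" if fields["LE"] else "big-endian",
        "external_interrupts": bool(fields["EE"]),
    }

print(describe(0x0000C000))   # arbitrary sample value with EE and PR set
```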

MPC 601
First microprocessor implementation of the PowerPC
66 MHz; power dissipation is 9 W at 3.6 V
Integer Execution Unit
Floating Point Unit
Load/Store Unit (LSU)
Branch Execution Units
Memory Management Unit
Memory Unit
Cache

Data Types

Either little-endian or big-endian byte ordering can be used.

Fixed-point data types include:
o Unsigned byte, 8 bits
o Unsigned halfword, 16 bits
o Signed halfword, 16 bits
o Unsigned word, 32 bits
o Signed word, 32 bits
o Unsigned doubleword, 64 bits
o Byte strings: from 0 to 128 bytes in length
2's complement is used for negative values.
Floating-point data formats:
o Single precision, 32 bits long (23 + 8 + 1)
o Double precision, 64 bits long (52 + 11 + 1)
Characters are stored using 8-bit ASCII codes.
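
The 23 + 8 + 1 and 52 + 11 + 1 splits are the fraction, exponent and sign fields of each format. The sketch below separates them, assuming the IEEE 754 layouts that Python's `struct` module uses for `f` and `d`; the function names are made up for the example.

```python
import struct

def single_fields(x):
    """Split a single-precision value into (sign, exponent, fraction): 1 + 8 + 23 bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def double_fields(x):
    """Split a double-precision value into (sign, exponent, fraction): 1 + 11 + 52 bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(single_fields(-2.5))
print(double_fields(1.0))
```

The `>` in the format strings selects big-endian byte order, the default ordering mentioned above; `<` would give the little-endian layout of the same value.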
