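[Figure: instruction layout: instruction code, address (low byte), address (high byte); the address points to the data]
[Figure: instruction layout: semantic code, destination register code, source register code; each register code selects a register]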
Immediate Addressing
22.05.2014 137 Microprocessors Architecture
The targeted information is found in the memory, in the
instruction, immediately after the instruction code
The targeted information
is coded in the instruction; it is a constant
is an operand
cannot be a result
cannot be an instruction
Minimum instruction size: 2B (the data has at least 1B)
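[Figure: immediate addressing instruction layout: instruction code, data (low byte), data (high byte)]
[Figure: instruction layout: instruction code, address (low byte), address (high byte); the address points to the data]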
Relative Addressing
The targeted information is found in the program memory,
at an address obtained as a sum between the address of the
current instruction and an offset coded in the instruction
The offset can be positive or negative
The targeted information can be an operand or an
instruction
Minimum instruction size: 2B (the offset usually has 1B)
[Figure: relative addressing: the offset coded in the instruction is added to IP (the address) to locate the data]
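The address computation described above can be sketched in a few lines of Python. This is only an illustration: the function name `relative_target` and the 16-bit address width are assumptions, and `base` stands for whatever reference address the processor uses (the slide uses the address of the current instruction; x86 uses the address of the next one).

```python
def relative_target(base: int, offset_byte: int) -> int:
    """Return the 16-bit target address for a 1-byte signed offset."""
    # Interpret the offset byte as a two's-complement value in -128..127.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # Addresses wrap around at 16 bits (assumption: 64 KB address space).
    return (base + offset) & 0xFFFF


forward = relative_target(0x0100, 0x05)   # positive offset: jump forward
backward = relative_target(0x0100, 0xFB)  # 0xFB encodes -5: jump backward
```

Because the offset is signed, the same instruction format can reach targets both before and after the reference address.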
Register Indirect Addressing
The targeted information is found in the memory, at an
address specified in a register coded in the instruction code
The targeted information can be an operand, a result or an
instruction
One register might not be enough to store an address
Minimum instruction size: 1B
[Figure: register indirect addressing: the instruction holds a semantic code and an address register code; the register holds the address of the data]
Memory Indirect Addressing
The targeted information is found in the memory, at the
address stored in one or more memory locations whose address is
specified in the instruction code
The targeted information can be an operand, a result or an
instruction
Minimum instruction size: 3B (the address has at least 2B)
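[Figure: memory indirect addressing: the instruction holds the instruction code and an address (low byte, high byte); the memory at that address holds a second address (low byte, high byte), which points to the data]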
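A minimal Python sketch of the two memory accesses involved in memory indirect addressing is given below. The helper names, the flat 64 KB `bytearray` and the sample addresses are hypothetical; the low-byte-first storage order follows the "addr low, addr high" layout of the figure.

```python
def read_word(memory: bytearray, addr: int) -> int:
    """Read a 16-bit word stored low byte first."""
    return memory[addr] | (memory[addr + 1] << 8)


def memory_indirect_load(memory: bytearray, pointer_addr: int) -> int:
    """Return the byte whose address is stored at pointer_addr."""
    data_addr = read_word(memory, pointer_addr)  # first access: fetch the address
    return memory[data_addr]                     # second access: fetch the data


memory = bytearray(0x10000)
memory[0x2000] = 0x34   # low byte of the stored address 0x1234
memory[0x2001] = 0x12   # high byte of the stored address 0x1234
memory[0x1234] = 0x5A   # the targeted data
value = memory_indirect_load(memory, 0x2000)
```

The extra memory access is the price of the indirection: the instruction itself only needs to carry the address of the pointer.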
[Figure: instruction layout: instruction code and offset; the offset is added to the address held in a register]
Addressing Modes Summary
Various addressing modes
some simpler, some more complicated
some can be used for instructions also, some only for data
the route to the data can be direct or indirect
the targeted information can be in a register, in the program
memory or in the data memory
Depending on the addressing mode, the minimum
instruction size can be 1B / 2B / 3B
The information stored in the instructions can have various
semantics / meanings: data, offset, address, etc.
x86 Addressing Modes
Program addressing modes
Relative addressing
Direct addressing
Register indirect addressing
Data addressing modes
Several simple addressing modes
Composed addressing modes
Base-relative addressing modes
Stack-relative addressing modes
x86 Program Addressing Modes
Used to address instructions through jumps and calls
(normally instructions are addressed sequentially)
Relative addressing (jmp 5h)
The targeted instruction is in the memory, in the code
segment, at an address obtained as the sum between the
content of IP and an offset stored in the current instruction
Direct addressing (jmp 56A80100h)
The targeted instruction is in the memory, in the code
segment, at an address specified in the current instruction
Register indirect addressing (jmp BX)
The targeted instruction is in the memory, in the code
segment, at an address specified in a register coded in the
current instruction code
x86 Data Addressing Modes
Used to address data only
Register implicit addressing (mov AH, BL)
The targeted information is in a register:
8-bit register: AL, AH, BL, BH, CL, CH, DL, DH
16-bit register: AX, BX, CX, DX, SI, DI, SP, BP
Immediate addressing (mov AX, 1234h)
The targeted information is in the memory, in the code
segment, at the effective address IP + 1
Direct addressing (mov AX, [1234h])
The targeted information is in the memory, in the data
segment, at an effective address specified in the current
instruction
Indexed addressing (mov AX, [SI+20h])
The targeted information is in the memory, in the data
segment, at an effective address obtained as a sum between
the content of SI or DI and an offset stored in the current
instruction
Indirect implicit addressing (mov AX, [DI])
The targeted information is in the memory, in the data
segment, at the effective address stored in SI or DI
The targeted information is implicitly found in the data
segment
The targeted information can also be found in the code
segment, extended data segment or stack segment
In that case a segment override (redirection) prefix must be used
(e.g. mov AX, ES:[DI])
x86 Base-relative Addressing Modes
Direct base relative addressing (mov AX, [BX + 1234h])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX and
an offset stored in the current instruction
Indexed base relative addressing (mov AX, [BX + DI + 10h])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX, the
content of SI or DI and an offset stored in the current instruction
Implicit base relative addressing (mov AX, [BX+SI])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX, the
content of SI or DI
x86 Stack-relative Addressing Modes
Direct stack relative addressing (mov AX, [BP + 1234h])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP and
an offset stored in the current instruction
Indexed stack relative addressing (mov AX, [BP + DI + 10h])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP, the
content of SI or DI and an offset stored in the current instruction
Implicit stack relative addressing (mov AX, [BP+SI])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP, the
content of SI or DI
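The base-relative and stack-relative modes above all follow one pattern: an optional base register, an optional index register and an optional offset are summed, and the base register decides the default segment. The Python sketch below illustrates this under stated assumptions: the function name, the dictionary of register values and the sample contents are hypothetical, and the effective address is assumed to wrap at 16 bits.

```python
def effective_address(regs: dict, base=None, index=None, disp: int = 0):
    """Return (default_segment, effective_address) for [base + index + disp]."""
    ea = disp
    if base is not None:      # BX or BP
        ea += regs[base]
    if index is not None:     # SI or DI
        ea += regs[index]
    # BP-based modes address the stack segment; all other modes the data segment.
    segment = "SS" if base == "BP" else "DS"
    return segment, ea & 0xFFFF


regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0010, "DI": 0x0020}

indexed_base = effective_address(regs, base="BX", index="DI", disp=0x10)  # mov AX, [BX + DI + 10h]
direct_stack = effective_address(regs, base="BP", disp=0x1234)            # mov AX, [BP + 1234h]
indexed_only = effective_address(regs, index="SI", disp=0x20)             # mov AX, [SI+20h]
```

A segment override prefix would replace the default segment returned here; the sketch does not model it.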
3.4 The Instruction Set
Microprocessor Instruction Types
Data transfer instructions
set a register or a memory location to a fixed constant value
copy data from a memory location to a register, or vice versa
read and write data from I/O devices
Data processing instructions
arithmetic operations (add, subtract, multiply, divide, etc.)
logic operations (and, or, exclusive or, shift, rotate, etc.)
bitwise logic operations
compare operations
Control flow instructions
branch to another location in the program and execute instructions
there
conditional branch to another location if a certain condition holds
branch to another location, while saving the location of the next
instruction as a point to return to (a call)
Data Transfer Instructions
Two operands: a source and a destination
General idea: the source is copied at the destination
The source and the destination:
Can be registers, memory locations, constants, I/O ports
Are identified using various addressing modes
Must have the same size
Performance criterion: transfer as much data as possible
using an instruction with a small format
x86 Simple Data Transfer Instructions
MOV Move (Copy) Data
XCHG Exchange Data
LEA Load Effective Address
PUSH Push data in the Stack
POP Pop data out of the Stack
MOV Move (Copy) Data
Usage: MOV dest, src
Operands:
dest - general-purpose register, segment register (except CS)
or memory location
src - immediate value, general-purpose register, segment
register or memory location
Effects: Copies the source to the destination, overwriting
the destination's value: (dest) ← (src)
Flags: none
XCHG Exchange Data
Usage: XCHG dest, src
Arguments:
dest - register or memory location
src - register or memory location
Effects: Exchanges the source with the destination:
(dest) ↔ (src)
Flags: none
Miscellaneous: two memory locations cannot be used in
one instruction
PUSH Push Operand in the Stack
Usage: PUSH src
Arguments: src - 16-bit immediate value, register or
memory location
Effects: Decrements the stack pointer by 2 and copies src on
top of the stack:
(SP) ← (SP) - 2
((SS):(SP)+1) ← (src_high)
((SS):(SP)) ← (src_low)
Flags: none
Miscellaneous: src must be a 16-bit value
POP Pop a word from the Stack
Usage: POP dest
Arguments:
dest - 16-bit register, segment register or memory location
Effects: Copies the element (16-bit) from the top of the
stack into dest and increments the stack pointer by 2:
(dest_high) ← ((SS):(SP)+1)
(dest_low) ← ((SS):(SP))
(SP) ← (SP) + 2
Flags: none
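The PUSH and POP effects can be traced with a small Python model. The class name `Stack16`, the initial SP value and the `bytearray` standing in for the stack segment are assumptions made for the sketch only.

```python
class Stack16:
    """Toy model of a 16-bit stack that grows toward lower addresses."""

    def __init__(self, sp: int = 0x0100):
        self.mem = bytearray(0x10000)  # stands in for the stack segment (SS)
        self.sp = sp

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                        # (SP) <- (SP) - 2
        self.mem[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF  # high byte
        self.mem[self.sp] = value & 0xFF                        # low byte

    def pop(self) -> int:
        low = self.mem[self.sp]
        high = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF                        # (SP) <- (SP) + 2
        return (high << 8) | low


stack = Stack16()
stack.push(0x1234)
stack.push(0xABCD)
first = stack.pop()    # the last value pushed comes out first
second = stack.pop()
```

After the two pops SP is back at its initial value, which is why every PUSH in a subprogram is normally paired with a POP.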
The Source and the Destination Arrays
The x86 architecture defines two implicit memory zones which
store two arrays of 8-bit or 16-bit numbers
The source array
Stored in the data segment (the segment with the address in DS)
The current element is at the effective address specified in SI
The destination array
Stored in the extended data segment (the segment with the address
in ES)
The current element is at the effective address specified in DI
The arrays are traversed toward higher addresses (DF = 0) or toward
lower addresses (DF = 1), according to the direction flag (DF)
x86 String / Array Instructions
MOVS Move String
LODS Load String
STOS Store String
SCAS Scan String
CMPS Compare String
STD Set Direction Flag
CLD Clear Direction Flag
MOVS Move String
Usage: MOVSB / MOVSW
Arguments: none
Effects:
movsb:
((ES):(DI)) ← ((DS):(SI))
(SI) ← (SI) ± 1, (DI) ← (DI) ± 1 (+ if DF=0, - if DF=1)
movsw:
((ES):(DI)) ← ((DS):(SI)), ((ES):(DI)+1) ← ((DS):(SI)+1)
(SI) ← (SI) ± 2, (DI) ← (DI) ± 2 (+ if DF=0, - if DF=1)
Flags: none
Miscellaneous: can be prefixed by rep, repe/repz,
repne/repnz
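The following Python sketch models MOVSB under a REP prefix, which repeats the instruction while decrementing CX until CX reaches zero. The function name and the use of plain base offsets (`ds_base`, `es_base`) into one flat `bytearray` are simplifying assumptions; real segment arithmetic is not modeled.

```python
def rep_movsb(mem: bytearray, ds_base: int, es_base: int,
              si: int, di: int, cx: int, df: int = 0):
    """Copy cx bytes from DS:SI to ES:DI and return the final (si, di, cx)."""
    step = -1 if df else 1            # DF=0: increment, DF=1: decrement
    while cx != 0:
        mem[es_base + di] = mem[ds_base + si]
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


mem = bytearray(64)
mem[0:5] = b"hello"                                    # source array at DS:0000
final = rep_movsb(mem, ds_base=0, es_base=32, si=0, di=0, cx=5)
```

With DF=1 the same loop walks both arrays backward, so SI and DI must initially point at the last element instead of the first.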
LODS Load String
Usage: LODSB / LODSW
Arguments: none
Effects:
lodsb: Copies the current 8-bit element from the source string
to the accumulator and increments (if DF=0) or decrements
(if DF=1) the value in SI by 1:
(AL) ← ((DS):(SI)), (SI) ← (SI) ± 1.
lodsw: Copies the current 16-bit element from the source
string to the accumulator and increments (if DF=0) or
decrements (if DF=1) the value in SI by 2:
(AL) ← ((DS):(SI)), (AH) ← ((DS):(SI)+1), (SI) ← (SI) ± 2.
Flags: none
STOS Store String
Usage: STOSB / STOSW
Arguments: none
Effects:
stosb: Copies the value in the accumulator into the current 8-bit
element of the destination string and increments (if DF=0) or
decrements (if DF=1) the value in DI by 1:
((ES):(DI)) ← (AL), (DI) ← (DI) ± 1.
stosw: Copies the value in the accumulator into the current
16-bit element of the destination string and increments (if
DF=0) or decrements (if DF=1) the value in DI by 2:
((ES):(DI)) ← (AL), ((ES):(DI)+1) ← (AH), (DI) ← (DI) ± 2.
Flags: none
Data Processing Instructions
An arithmetic operation is applied to one or several sources and
the result is stored in the destination
The arithmetic flags (CF, AF, ZF, PF, SF, OF) are modified!
The sources and the destination:
Can be registers, memory locations, constants, I/O ports
Are identified using various addressing modes
Must have the same size (exceptions: multiplication, division)
CISC processors characteristics:
Data processing uses an accumulator (one of the sources is also the
destination)
The sources and the destination are memory locations
Execution time depends on instruction complexity
Performance criterion: fast execution of complex data processing
operations
x86 Arithmetic Instructions
INC Increment
DEC Decrement
ADD Add
ADC Add with Carry
SUB Subtract
SBB Subtract with Borrow
MUL Multiply
DIV Divide
CMP Compare
CMP Compare two operands
Usage: CMP src1, src2
Arguments:
src1, src2 - 8-bit or 16-bit immediate value, register or memory
location
Effects: Subtracts src2 from src1: (src1) - (src2). Flags are set in
the same way as by the SUB instruction, but the result of the
subtraction is not saved.
Flags: The CF, ZF, OF, SF, AF, and PF flags are modified according
to the result.
Misc:
usually the next operation would be a conditional jump to perform
an operation according to the result of the comparison;
only one memory argument is allowed and both arguments have to
be of the same size
ADD Integer Addition
Usage: ADD dest, src
Arguments:
dest - register or memory location
src - immediate, register or memory location (two memory
operands cannot be used)
Effects: Adds the source to the destination:
(dest) ← (dest) + (src).
Flags: The CF, ZF, OF, SF, AF, and PF flags are set according
to the result.
Misc: no difference between signed and unsigned operands
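The remark that ADD makes no difference between signed and unsigned operands can be illustrated with a Python sketch of an 8-bit addition: the result bits are the same, and only the interpretation of the flags differs (CF reports unsigned overflow, OF reports signed overflow). The function name `add8` is hypothetical, AF and PF are left out for brevity, and the optional `carry_in` argument is an addition of this sketch that lets the same function also model an add-with-carry.

```python
def add8(dest: int, src: int, carry_in: int = 0):
    """8-bit addition returning (result, flags); AF and PF are not modeled."""
    dest, src = dest & 0xFF, src & 0xFF
    total = dest + src + carry_in
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),   # unsigned overflow (carry out of bit 7)
        "ZF": int(result == 0),
        "SF": result >> 7,         # sign bit of the result
        # Signed overflow: both operands have the same sign, the result has the other.
        "OF": int(((dest ^ result) & (src ^ result) & 0x80) != 0),
    }
    return result, flags


wraps_unsigned = add8(0xFF, 0x01)  # 255 + 1 unsigned, or -1 + 1 signed
wraps_signed = add8(0x7F, 0x01)    # 127 + 1 overflows only as a signed sum
```

The program, not the instruction, decides which flag to test afterwards, depending on whether it treats the operands as signed or unsigned.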
ADC Add with Carry
Usage: ADC d, s
Arguments: same as for ADD
Effects: Adds the carry flag (CF) and the source to the
destination: (d) ← (d) + (s) + (CF)
Flags: same as for ADD
Misc: same as for ADD
SBB Integer Subtraction with Borrow
Usage: SBB dest, src
Arguments:
dest - 8-bit or 16-bit register or memory location
src - 8-bit or 16-bit immediate value, register or memory
location
Effects: Subtracts the carry flag and src from dest:
(dest) ← (dest) - (src) - (CF)
Flags: The CF, ZF, OF, SF, AF, and PF flags are modified
according to the result.
Misc: only one memory argument is allowed and the
arguments have to be of the same size
MUL Unsigned Multiplication of AL or AX
Usage: MUL src
Arguments:
src - 8-bit or 16-bit register or memory location.
Effects:
if src is an 8-bit value: multiplies the value stored in AL by src
and stores the result in AX:
(AX) ← (AL) * (src)
CF and OF are set to 0 if AH is 0, otherwise they are set to 1.
if src is a 16-bit value: multiplies the value stored in AX by src
and stores the result in DX concatenated with AX:
(DX):(AX) ← (AX) * (src)
CF and OF are set to 0 if DX is 0, otherwise they are set to 1.
Flags: CF and OF are modified as mentioned above. The
rest of the flags are undefined.
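A short Python sketch of the two MUL forms follows. The function names `mul8` and `mul16` are hypothetical, and a single returned flag value stands for both CF and OF, since the slide gives them the same value.

```python
def mul8(al: int, src: int):
    """8-bit MUL: return (AX, flag) where flag is the common value of CF and OF."""
    ax = (al & 0xFF) * (src & 0xFF)
    return ax, int((ax >> 8) != 0)      # flag is set when AH is non-zero


def mul16(ax: int, src: int):
    """16-bit MUL: return (DX, AX, flag)."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    return dx, ax, int(dx != 0)         # flag is set when DX is non-zero


small = mul8(3, 4)          # fits in AL, so the upper half (AH) stays zero
wide = mul16(0xFFFF, 2)     # needs DX to hold the upper half
```

The destination is twice as wide as the operands, so the multiplication itself can never overflow; the flags only report whether the upper half is significant.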
DIV Unsigned Division
Usage: DIV src
Arguments:
src - 8-bit or 16-bit register or memory location
Effects:
if src is an 8-bit value: divides the value stored in AX by src and
stores the remainder in AH and the quotient in AL:
(AH) ← (AX) mod (src), (AL) ← (AX) div (src)
if src is a 16-bit value: divides the value stored in DX
concatenated with AX by src and stores the remainder in DX and the
quotient in AX:
(DX) ← (DX):(AX) mod (src), (AX) ← (DX):(AX) div (src)
Flags: The CF, ZF, OF, SF, AF, and PF flags are undefined.
Misc:
if the quotient does not fit in 8 bits (16 bits) and therefore cannot
be stored in AL (AX), a divide overflow error is raised.
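The 8-bit form of DIV, including the overflow case, can be sketched in Python as below. The function name `div8` and the use of Python exceptions to stand for the processor's divide error are assumptions of this illustration.

```python
def div8(ax: int, src: int):
    """Unsigned DIV with an 8-bit divisor: return (AL, AH) = (quotient, remainder)."""
    ax, src = ax & 0xFFFF, src & 0xFF
    if src == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    quotient, remainder = divmod(ax, src)
    if quotient > 0xFF:
        # The quotient does not fit in AL: the CPU signals a divide overflow.
        raise OverflowError("divide overflow: quotient does not fit in AL")
    return quotient, remainder


al, ah = div8(100, 7)   # quotient goes to AL, remainder to AH
```

A call such as `div8(0x1000, 2)` raises `OverflowError` in this sketch, because the quotient 0x800 needs more than 8 bits; programs avoid this by choosing the 16-bit form when the dividend is large.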
x86 Logic Instructions
NOT Complement
AND Logic AND
OR Logic OR
XOR Exclusive OR
SHL | SAL Shift Left (Arithmetic and Logic)
SHR Logic Shift Right
SAR Arithmetic Shift Right
ROL Rotate Left
ROR Rotate Right
RCL Rotate Left with Carry
RCR Rotate Right with Carry
TEST Compare using AND
NOT, OR, AND, XOR
I1 I2 OR
0 0 0
0 1 1
1 0 1
1 1 1
I1 I2 XOR
0 0 0
0 1 1
1 0 1
1 1 0
I1 I2 AND
0 0 0
0 1 0
1 0 0
1 1 1
I NOT
0 1
1 0
SHL, ROL, RCL
SHR and SAR
Control Flow Instructions
Exceptions in the sequential execution of instructions:
Branch to a different instruction
Conditional branch to a different instruction
Can be used to create decision structures
Conditional skip of the current/following instruction
Can be used to create inline decision structures
Counter update + conditional branch (loop)
Can be used to create repetitive structures
Return address save + branch to a different instruction (call)
Can be used for subprogram calls
x86 Control Flow Instructions
Unconditional branch: JMP jump
Conditional branches:
For unsigned numbers: JA|JNBE, JAE|JNB|JNC, JB|JNAE, etc.
For signed numbers: JG|JNLE, JGE|JNL, JL|JNGE, etc.
For other type of comparisons: JP, JE, JS, JO, etc.
Counter update + conditional branches:
LOOP, LOOPZ, LOOPNZ
Call and return branches:
CALL, RET
x86 Conditional Jump Instructions
Instruction        Usage       Condition                 Description
JA | JNBE          JA label    (CF)=0 AND (ZF)=0         Jump to label if above | not below or equal
JAE | JNB | JNC    JAE label   (CF)=0                    Jump to label if above or equal | not below | not carry
JB | JNAE | JC     JB label    (CF)=1                    Jump to label if below | not above or equal | carry
JBE | JNA          JBE label   (CF)=1 OR (ZF)=1          Jump to label if below or equal | not above
JG | JNLE          JG label    (SF)=(OF) AND (ZF)=0      Jump to label if greater | not lower or equal
JGE | JNL          JGE label   (SF)=(OF)                 Jump to label if greater or equal | not lower
JL | JNGE          JL label    (SF)!=(OF)                Jump to label if lower | not greater or equal
JLE | JNG          JLE label   (SF)!=(OF) OR (ZF)=1      Jump to label if lower or equal | not greater
JE | JZ            JE label    (ZF)=1                    Jump to label if equal | zero
JNE | JNZ          JNE label   (ZF)=0                    Jump to label if not equal | not zero
JNO                JNO label   (OF)=0                    Jump to label if not overflow
JNP | JPO          JNP label   (PF)=0                    Jump to label if not parity | parity odd
JNS                JNS label   (SF)=0                    Jump to label if not signed | positive
JO                 JO label    (OF)=1                    Jump to label if overflow
JP | JPE           JP label    (PF)=1                    Jump to label if parity | parity even
JS                 JS label    (SF)=1                    Jump to label if signed | negative
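The difference between the unsigned ("above/below") and signed ("greater/lower") jumps becomes visible when the same bit patterns are compared. The Python sketch below computes the flags of an 8-bit comparison and evaluates a few of the conditions from the table; the names `cmp8`, `JUMP_CONDITIONS` and `taken` are hypothetical, and only CF, ZF, SF and OF are modeled.

```python
def cmp8(a: int, b: int) -> dict:
    """Return the flags produced by comparing (subtracting) two 8-bit operands."""
    a, b = a & 0xFF, b & 0xFF
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),          # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 7,
        # Signed overflow: operands differ in sign and the result's sign differs from a.
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),
    }


JUMP_CONDITIONS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JG": lambda f: f["SF"] == f["OF"] and f["ZF"] == 0,
    "JL": lambda f: f["SF"] != f["OF"],
    "JE": lambda f: f["ZF"] == 1,
}


def taken(jump: str, a: int, b: int) -> bool:
    """Tell whether the given conditional jump is taken after comparing a with b."""
    return JUMP_CONDITIONS[jump](cmp8(a, b))


# 0xFF is 255 as an unsigned number but -1 as a signed one.
unsigned_view = taken("JA", 0xFF, 0x01)   # 255 is above 1
signed_view = taken("JL", 0xFF, 0x01)     # -1 is lower than 1
```

Both jumps are taken for the same operands, which is why the programmer has to pick the jump family that matches the intended interpretation of the data.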
x86 Loop Instructions
Instruction        Usage          Condition                Description
LOOP               LOOP label     (CX) != 0                Decrement CX (without modifying the flags) and jump to label if CX is not zero
LOOPE | LOOPZ      LOOPE label    (CX) != 0 AND (ZF)=1     Decrement CX (without modifying the flags) and jump to label if CX is not zero and ZF is one
LOOPNE | LOOPNZ    LOOPNE label   (CX) != 0 AND (ZF)=0     Decrement CX (without modifying the flags) and jump to label if CX is not zero and ZF is zero
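The behavior of LOOP placed at the end of a loop body can be sketched in Python as follows. The function name `run_loop` and the callable `body` are hypothetical; the sketch assumes the usual layout in which the body executes first and LOOP then decrements the 16-bit CX and tests it.

```python
def run_loop(cx: int, body) -> int:
    """Run body, then LOOP (decrement CX, repeat while CX != 0); return the iteration count."""
    iterations = 0
    while True:
        body()
        iterations += 1
        cx = (cx - 1) & 0xFFFF   # CX is a 16-bit register, so it wraps around
        if cx == 0:
            break
    return iterations


five = run_loop(5, lambda: None)
```

Since the decrement happens before the test, starting with CX = 0 does not skip the loop: CX wraps to FFFFh and the body runs 65536 times, which is why CX is usually checked before entering such a loop.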
CALL Call Subprogram
Usage: CALL dest
Arguments:
dest - the (target) address of the first instruction in the called
subprogram; can be an immediate value, a general-purpose register
or a memory location
Effects:
The address of the next instruction is saved on the stack and the
instruction pointer is set to the target address (the CPU performs a
jump to the subprogram):
(SP) ← (SP) - 2, ((SS):(SP)+1) ← (IP_high), ((SS):(SP)) ← (IP_low)
(IP) ← (dest)
Flags: none
Misc: Usually there is a RET instruction in the subprogram to
return to the instruction after the call.
RET Return from Subprogram
Usage: RET
Arguments: none
Effects:
The CPU pops the value on top of the stack and uses it to
jump back to the caller program:
(IP_high) ← ((SS):(SP)+1), (IP_low) ← ((SS):(SP))
(SP) ← (SP) + 2.
Flags: none
Misc: Usually the address was placed in the stack by a call
instruction and the return is made to the address that
follows the call instruction.
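The CALL and RET pair can be traced with a small Python model. The class name `Cpu`, the initial IP and SP values and the 3-byte size assumed for the call instruction are hypothetical choices made for the sketch.

```python
class Cpu:
    """Toy model of IP, SP and the stack segment, enough to trace CALL and RET."""

    def __init__(self, ip: int = 0x0100, sp: int = 0x0200):
        self.ip, self.sp = ip, sp
        self.stack = bytearray(0x10000)   # stands in for the stack segment (SS)

    def call(self, target: int, instr_size: int = 3) -> None:
        # The return address is the address of the instruction after the call
        # (assumption: the call instruction occupies instr_size bytes).
        return_addr = (self.ip + instr_size) & 0xFFFF
        self.sp = (self.sp - 2) & 0xFFFF
        self.stack[(self.sp + 1) & 0xFFFF] = return_addr >> 8   # IP high
        self.stack[self.sp] = return_addr & 0xFF                # IP low
        self.ip = target

    def ret(self) -> None:
        low = self.stack[self.sp]
        high = self.stack[(self.sp + 1) & 0xFFFF]
        self.ip = (high << 8) | low
        self.sp = (self.sp + 2) & 0xFFFF


cpu = Cpu()
cpu.call(0x0500)   # IP now points into the subprogram
cpu.ret()          # IP is back at the instruction after the call
```

RET simply trusts whatever word is on top of the stack, so a subprogram must leave SP where it found it before returning.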
x86 Subprogram Calls
The CALL and RET instructions do not have input/output
parameters as arguments
There are several conventions for sending I/O parameters
Through General Purpose Registers
Through the Stack
Through the Memory
3.5 Summary
Summary
The x86 Registers
Memory Management
Memory Access. Addressing Modes
The Instruction Set
4.1 Introduction
RISC Philosophy. Motivation
DARPA's VLSI Project ('70s and '80s)
how efficient are the current microprocessors?
provided research funding to university-based teams
to improve the state of the art in microprocessor design
Studies in CPU design showed that
simplified instructions can provide higher performance if this
simplicity enables much faster execution of each instruction
a CPU with a small, highly-optimized set of instructions, can
be more efficient than a CPU with a more specialized set of
instructions
Historical Background
RISC: Reduced Instruction Set Computer
a type of microprocessor architecture that utilizes a small,
highly-optimized set of instructions instead of a more
specialized set of instructions
The first RISC projects (mid '70s and early '80s)
IBM: the IBM 801 architecture
Stanford University: Stanford MIPS architecture
University of California, Berkeley: Berkeley RISC I and II
commercialized as the SPARC architecture
Other well-known RISC architectures:
ARM, Atmel AVR, Intel i860/i960, PA-RISC, PowerPC
RISC Principles (I)
Hardwired Control Unit
One cycle execution time
Each instruction is hardwired to be executed in a single cycle
CPI (clocks per instruction) = 1
"Reduced" means that the amount of work any single instruction
accomplishes is reduced
Pipelining is used
Technique that allows for simultaneous execution of parts of
instructions
Leads to more efficient instruction processing
Large number of general purpose registers
Prevents large amounts of interactions with memory
RISC Principles (II)
Small number of instructions
Fixed instruction format(s)
Decreases the time needed to decode the instructions
Fixed instruction size
Small number of addressing modes
Leads to a small size of the addressing mode code
Memory access only through LOAD/STORE instructions
Data processing instructions cannot use memory operands
Helps to achieve the CPI = 1 goal
4.2 The Registers
A Large Number of GPRs. Benefits
Higher processing speed thanks to a lower number of
memory accesses
Hardware data structures (stacks and queues) created with
general purpose registers
Input/output parameters to/from subprograms are
sent/received through GPRs
Increased chip uniformity factor
RISC Register Set Characteristics
A large number of general purpose registers (more than 32)
The size of the registers is the size of the usual operands
Identical, multifunctional general purpose registers
Allows any register to be used in any context
Simplifies compiler design
Physical vs. logical registers
Not all the physical registers may be available at all times
Logical registers are mapped into physical registers by the
CPU
Register Set Organization
A single set of registers
Comprising at least 32 physical registers
No logical registers
Any physical register is accessed by decoding a register code
The registers are accessed similarly to the linearly organized
memory
Register Set Organization
Multiple sets of logical registers in a single
set of physical registers
Each set of logical registers
Comprises at least 32 registers
Can be accessed using a pointer
Is allocated to a different program
The logical <-> physical mapping is bijective
Register Set Organization
Multiple sets of logical registers, partially
overlapped, in a single set of physical registers
Each set of logical registers
Comprises at least 32 registers
Can be accessed using a pointer
Is allocated to a different program
The logical <-> physical mapping is not
bijective anymore!
The overlapping portions are called register
windows
Register Set Organization
Multiple sets of logical registers in multiple sets of physical
registers: useful for multiprocessing
Berkeley RISC II Register Set
8 sets of logical registers in a single set of 138 physical registers
Each set of logical registers (the work-set for each program)
comprises:
10 registers for global variables - shared with all programs
10 registers for local variables
6 registers for I/O parameters - shared with the calling program
6 registers for parameters - shared with the called program
1 set of physical registers (R)
8 sets of logical registers (A to H)
Mapping examples:
R0 = A0 = ... = H0
R9 = A9 = ... = H9
R10 = A10 = H26
R15 = A15 = H31
R16 = A16
R25 = A25
R26 = A26 = B10
R31 = A31 = B15
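The mapping examples above are consistent with a simple rule, sketched below in Python: the 10 global registers map to themselves, each window starts 16 physical registers after the previous one, and the windowed registers wrap around so that the last window overlaps the first. The rule, the constant names and the function `physical_register` are inferred from the examples for illustration, not taken from a hardware description.

```python
GLOBALS = 10                       # logical 0..9 map to R0..R9 in every window
WINDOW_STEP = 16                   # 10 local + 6 parameter registers per window
WINDOWS = 8                        # logical register sets A..H
WINDOWED = WINDOWS * WINDOW_STEP   # 128 windowed physical registers (R10..R137)


def physical_register(window: str, logical: int) -> int:
    """Map a logical register (0..31) of window 'A'..'H' to its physical index."""
    if not 0 <= logical <= 31:
        raise ValueError("logical register must be in 0..31")
    if logical < GLOBALS:
        return logical                           # shared global registers
    w = ord(window) - ord("A")
    # Windowed registers form a circular buffer that starts at R10.
    return GLOBALS + (w * WINDOW_STEP + logical - GLOBALS) % WINDOWED


overlap_ab = physical_register("A", 26) == physical_register("B", 10)
overlap_ha = physical_register("H", 26) == physical_register("A", 10)
```

Under this rule the 6 registers a caller uses for outgoing parameters are the same physical registers the callee sees as incoming parameters, so a call passes arguments without copying them.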
4.3 The Instruction Set
RISC Instruction Set Characteristics
Fewer instructions than in CISC instruction set
Simpler instructions than in CISC instruction set
Instruction types
Memory access instructions (load / store)
Arithmetic and logic processing instructions
Always with register or immediate operands
Typically without an accumulator
Control flow instructions
Subprogram calls use register windows for parameter passing
I/O instructions
RISC Typical Addressing Modes
Register implicit addressing
Immediate addressing
Direct (absolute) addressing
Register indirect addressing
Base-relative direct addressing
Base-relative indexed addressing
Relative (to PC) addressing
Intel i860 / i960 Instruction Examples
Note: in these examples s1, s2 and d are general purpose
registers
Signed integer addition
adds s1, s2, d ;(d) ← (s1) + (s2)
Memory access with two pointers
ldl.l s1(s2), d ;(d) ← ((s2) + (s1))
Memory access using a constant
st.s s1, #const(s2) ;((s2) + const) ← (s1)
Left shift with three operands
shl s1, s2, d ;(d) ← (s2) * 2^(s1)
ARM Instruction Examples
Note: in these examples s1, s2, s3 and d are general purpose
registers
Logic AND with three operands
and d, s1, s2 ;(d) ← (s1) & (s2)
Memory access with pre-indexing
ldr d, [s1, #const]! ;(d) ← ((s1) + const)
;(s1) ← (s1) + const
Memory access with post-indexing
str s1, [d], #const ;((d)) ← (s1)
;(d) ← (d) + const
Multiply and add (four operands)
mla d, s1, s2, s3 ;(d) ← (s1) * (s2) + (s3)
4.4 The Timing and Control Unit
Instruction format for:
Intel x86 (CISC) microprocessors
1 to 15 bytes, depending on instruction complexity
Intel i860 (RISC) microprocessors
4 bytes, regardless of the instruction complexity
Stanford MIPS (RISC) microprocessors
4 bytes, regardless of the instruction complexity
Fixed instruction format -> simpler Instruction Decoder
-> simpler Memory Addressing Unit
Simpler Instruction Decoder
The Timing and Control Unit is:
Micro-programmed for CISC microprocessors
Hardwired for RISC microprocessors
Example: 32bit x 32bit multiplication
Micro-programmed Control Unit
Uses the same ALU and MACU + a micro-program
Hardwired Control Unit
Uses a dedicated hardwired circuit
Hardwired Timing and Control Unit
32b x 32b CISC Multiplication
result ← 0
for i = 1 to 32 do
    if multiplier(i) = 1
        result ← result + multiplicand
    end_if
    multiplicand ← multiplicand * 2
end_for
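The shift-and-add micro-program above translates almost line by line into Python. In this sketch the function name `shift_add_multiply` is hypothetical and bit i of the multiplier is tested with a shift and a mask, starting from the least significant bit.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Multiply two unsigned integers using only additions and shifts."""
    result = 0
    for i in range(bits):
        if (multiplier >> i) & 1:     # bit i of the multiplier is 1
            result += multiplicand    # add the shifted multiplicand
        multiplicand <<= 1            # multiplicand <- multiplicand * 2
    return result


product = shift_add_multiply(6, 7)
```

The loop always runs for 32 steps, which mirrors the multi-cycle nature of the micro-programmed version: the ALU is reused once per multiplier bit instead of relying on a dedicated multiplier circuit.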
32b x 32b RISC Multiplication
Micro-programmed Timing and Control Unit (CISC case)
Uses the same ALU and MACU + a memory of micro-programs
Each instruction is associated with a micro-program which
coordinates the timing of elementary actions
Variable number of states / clock cycles depending on instruction
complexity
Hardwired Timing and Control Unit (RISC case)
Uses a dedicated hardwired circuit
Each instruction is associated with a dedicated hardwired circuit
Fixed number of states / clock cycles regardless of instruction
complexity
Main drawback: the lack of flexibility; adding a new instruction
requires modifications in the hardware design
Micro-programmed vs. Hardwired TCU
Premises
All instructions (simple/complex) are executed in the same
amount of time / clock cycles
All instructions are executed in a sequence of stages; example:
fetch the instruction from the memory
decode the instruction
read the operands
execute the instruction
write the result into a register
Pipelining concept: at any moment in time the
microprocessor executes simultaneously several different
stages for several pipelined instructions; this leads to CPI=1
RISC Instructions Pipelining
Pipeline example
If the execution of every instruction can be broken up in N
states, then one can build a pipeline structure with N stages
This leads to the simultaneous execution of N instructions
Pipelining concept: at any moment in time several
instructions are in progress of execution, in various stages
Instruction pipelining is possible because all instructions
are executed in the same amount of time
Instruction pipelining leads to the desired CPI = 1
Note that pipelining does not work continuously (there are exceptions)
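Why an N-stage pipeline approaches CPI = 1 can be checked with a short calculation, sketched here in Python. The function names are hypothetical and the model assumes an ideal pipeline: one clock cycle per stage and no stalls.

```python
def pipeline_cycles(instructions: int, stages: int) -> int:
    """Clock cycles needed by an ideal pipeline with no stalls."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to pass through the pipeline;
    # every following instruction completes one cycle after its predecessor.
    return stages + (instructions - 1)


def cycles_per_instruction(instructions: int, stages: int) -> float:
    return pipeline_cycles(instructions, stages) / instructions


short_run = cycles_per_instruction(5, 5)        # pipeline fill still dominates
long_run = cycles_per_instruction(1000, 5)      # approaches 1 for long sequences
```

Without pipelining the same 5-stage processor would need 5 cycles for every instruction, so the gain grows with the length of the uninterrupted instruction sequence.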
RISC Instructions Pipelining. Summary
4.5 Compiler Particularities (Issues)
Compiler: a computer program that transforms source code
written in a high-level programming language into a
lower-level language (e.g., assembly language or machine code)
The efficiency of RISC architectures, obtained through
many optimizations and simplifications, also involves strong
software-layer constraints
=> RISC machines are shipped with dedicated compilers
RISC compiler issues
Register allocation
Optimal allocation of variable values to logical registers
Pipeline correct execution and efficiency management
Data dependence, jump and load/store instructions management
RISC Compilers
Register allocation
the process of determining which values should be placed
into which registers and at what times during the execution of
the program
values, not variables are allocated to various registers, because
distinct uses of the same variable can be assigned to different
registers without affecting the logic of the program
Local register allocation
allocation within a very small piece of code, typically a basic
block
Global register allocation
assigns registers within an entire function
Register Allocation
Although RISC machines have many registers there are
programs which require more registers than actually exist.
The register allocator must insert spill code to store some
values back into memory for part of their lifetime.
Storing/loading values in/from the memory is time
inefficient => optimal register allocation needs to be done!
Minimizing the runtime cost of spill code is a crucial
consideration in register allocation.
Register Allocation. The Issue
Values used in a function: A, B, C, ..., F
Lifetime of the values represented as timelines over a
sequence of CPU states
Available registers: R1, R2, R3
The interference / color graph
Nodes: the values A, B, C, ..., F
Edges: indicate overlapping lifetimes of the values
Labels outside the nodes: the register allocated to each value
The Interference / Color Graph
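The construction described above can be sketched as follows. This is only an illustration: the lifetimes are hypothetical (the figure from the slide is not reproduced here), and a simple greedy coloring stands in for whatever allocation strategy a real compiler would use.

```python
def build_interference_graph(lifetimes):
    """lifetimes maps a value to (first_state, last_state), both inclusive."""
    graph = {value: set() for value in lifetimes}
    names = list(lifetimes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            start_a, end_a = lifetimes[a]
            start_b, end_b = lifetimes[b]
            if start_a <= end_b and start_b <= end_a:  # the lifetimes overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color_graph(graph, registers):
    """Greedy coloring: value -> register, or None if the value must be spilled."""
    allocation = {}
    for value in graph:
        # Registers already taken by neighbours that have been allocated.
        used = {allocation[n] for n in graph[value]
                if allocation.get(n) is not None}
        free = [r for r in registers if r not in used]
        allocation[value] = free[0] if free else None
    return allocation


# Hypothetical lifetimes, expressed as ranges of CPU states.
lifetimes = {"A": (1, 4), "B": (2, 6), "C": (3, 5),
             "D": (5, 8), "E": (7, 9), "F": (9, 10)}
graph = build_interference_graph(lifetimes)
allocation = color_graph(graph, ["R1", "R2", "R3"])
print(allocation)
```

Two values connected by an edge never share a register; a value for which no register is free would be a candidate for spill code.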
The Pipeline and Jumps Management
The problem:
The instructions following JMP should not be introduced in
the pipeline
The instructions to be introduced in the pipeline are known
only after the execution of JMP (the jump address is
computed)
The solution:
The compiler should insert several NOP instructions after
every JMP instruction
The drawback: NOPs introduce delays => CPI > 1
The Pipeline and Jumps Management
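The cost of the NOPs can be estimated with a short calculation. The sketch below assumes a full pipeline in which every useful instruction and every NOP occupies one cycle; the function and parameter names are illustrative, not taken from the lecture.

```python
def effective_cpi(num_instructions, num_jumps, nops_per_jump, slots_filled=0):
    """Average cycles per useful instruction when jumps are followed by NOPs.

    slots_filled is the number of NOP slots that the compiler managed to
    replace with useful instructions by reordering the code.
    """
    nops = num_jumps * nops_per_jump - slots_filled
    total_cycles = num_instructions + nops  # pipeline fill time is ignored
    return total_cycles / num_instructions


# 100 useful instructions, 10 of them jumps, 4 NOPs after each jump.
print(effective_cpi(100, 10, 4))                   # CPI > 1
print(effective_cpi(100, 10, 4, slots_filled=10))  # reordering brings it closer to 1
```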
Optimizations (code reordering) are sometimes possible:
Before reordering:
ADD r3, r2, r1
AND r0, r5, r6
JMPZ r0, label
NOP
NOP
NOP
NOP
XOR r5, r3, r2
....
label: SUB r1, r5, r6
After reordering:
AND r0, r5, r6
JMPZ r0, label
ADD r3, r2, r1
NOP
NOP
NOP
XOR r5, r3, r2
....
label: SUB r1, r5, r6
ADD does not interfere with the execution of JMPZ,
therefore it can be moved after the jump, in place of a NOP.
AND cannot be moved because the jump is taken / not
taken depending on its result!
The Pipeline and Data Dependency
ADD r1, r2, r7
AND r6, r1, r3
The problem:
The value computed by ADD is not yet available in the
destination register (R1) when AND needs to read it
AND reads an old value of R1 and the program no longer
works correctly
The solution:
The compiler should insert several NOP instructions between
ADD and the instruction that reads its result, in order to delay the latter
The drawback: NOPs introduce delays => CPI > 1
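A sketch of how a compiler pass might insert the NOPs. It assumes that at least two instructions must separate the instruction that writes a register from the one that reads it (the `gap` parameter, an assumption chosen to match the two NOPs used in this lecture's example); the tuple representation of instructions is also just an illustration.

```python
NOP = ("NOP", None, None, None)


def insert_nops(program, gap=2):
    """Return the program with NOPs inserted between dependent instructions.

    Each instruction is a tuple (opcode, destination, source1, source2).
    """
    scheduled = []
    for instr in program:
        _, _, *sources = instr
        needed = 0
        # Check the last `gap` scheduled instructions for one that writes
        # a register read by the current instruction.
        for distance, earlier in enumerate(reversed(scheduled[-gap:]), start=1):
            if earlier[1] is not None and earlier[1] in sources:
                needed = max(needed, gap - distance + 1)
        scheduled.extend([NOP] * needed)
        scheduled.append(instr)
    return scheduled


program = [("ADD", "r1", "r2", "r7"),
           ("AND", "r6", "r1", "r3")]
for instr in insert_nops(program):
    print(instr[0])
```

For the two-instruction example the pass produces ADD, NOP, NOP, AND; instructions that do not read a recently written register are left untouched.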
Optimizations (code reordering) are sometimes possible:
Before reordering:
MUL r8, r2, r1
SUB r0, r5, r6
ADD r1, r2, r7
NOP
NOP
AND r6, r1, r3
....
After reordering:
ADD r1, r2, r7
MUL r8, r2, r1
SUB r0, r5, r6
AND r6, r1, r3
....
MUL and SUB do not use the result of ADD,
therefore they can be moved after it, in place of the NOPs
Note: MUL reads r1, which ADD overwrites; the reordering is valid only
as long as MUL still reads the old value of r1 (before ADD writes its result)
Data dependency appears in the case of data processing
instructions, but also for memory access instructions:
LOAD r0, mem
SUB r6, r1, r0
Pipeline Correct Execution and
Efficiency Management
Generally the pipeline technique can be successfully used
to execute several instruction stages simultaneously
Leads to CPI = 1
There are cases in which the compiler has to take special
measures (introduce NOPs) to ensure the correct execution
of the program
Leads to CPI > 1
Some of the above measures can be optimized to increase
efficiency (CPI -> 1)
4.6 Summary
Economic advantages (translate to lower cost)
Smaller Timing and Control Unit (more than 10 times smaller)
Increased chip uniformity factor
Shorter development time for the TCU
Technical advantages
Higher processing speed
Lower power consumption
Lower probability of hardware design errors
Smaller number of memory accesses
Simpler compiler development (in some respects)
RISC Advantages
Economic drawbacks
Appeared after the CISC processors
Technical drawbacks
Longer programs (require more program memory)
Lack of flexibility for the TCU: adding a new instruction
requires modifications in the hardware design
More complex compiler development (in other respects)
RISC Drawbacks
5.1 I/O Devices Organization
I/O ports are characterized by address and content
The content of a port is linked to an external peripheral
Writing a port = Sending data to the peripheral
Reading a port = Receiving data from the peripheral
I/O ports can be accessed as
Memory locations, using memory addressing instructions
Ports, using dedicated I/O instructions
I/O Ports and Peripherals
IN dest, port
Reads data from the port and stores it into dest
OUT port, src
Writes the data from src to the port
In CISC processors only the accumulator can be used as
source and destination in the dedicated I/O instructions
Dedicated I/O instructions involve
Specific machine cycles
Specific signals on the control bus (IOR and IOW)
Dedicated I/O Instructions
Port map is smaller than the memory map
The addressing modes used for the dedicated I/O
instructions are very restrictive: direct and register-indirect
Main advantage: port access is faster
Example for Intel x86
64k ports, one byte each
two consecutive one-byte ports can be accessed as a one-word port
in AL, 0Fh; in AX, DX
out 10h, AL; out DX, AX
Dedicated I/O Instructions
The I/O ports are mapped within the main memory and
are regarded as regular memory locations
Port access is done with regular memory addressing
instructions; consequently:
The same machine cycles are used
The same signals on the control bus (MEMR and MEMW)
The same addressing modes are used
Main advantage: port access is simpler
Drawbacks: port access is slower, a part of the memory map
is wasted on ports
Memory Mapped Ports
5.2 Typical I/O Techniques
I/O technique: a microcomputer-peripheral synchronization
technique
Types:
Synchronous (with the current program) techniques
The microcomputer-peripheral communication is initiated by
the CPU (by executing specific instructions)
Asynchronous techniques
The microcomputer-peripheral communication is initiated by
the peripheral, independently of the program executed by the
CPU
Typical I/O Techniques
Polling is a synchronous I/O technique: the communication
with the peripherals is initiated by the CPU
The Polling Technique
Procedure:
The CPU reads periodically the state of the peripherals connected to
the ports (reads a status byte from the port)
The CPU initiates a data transfer if the peripheral is ready
Notes:
The CPU actions are triggered by instructions in the program
The status byte is read through the data bus
Main advantage: no additional hardware is required
Drawbacks:
The CPU wastes time on polling the state of the peripherals
Potential communication requests can be lost
The Polling Technique
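The polling procedure can be modelled in a few lines. This is a simulation sketch, not driver code: the `Peripheral` class is hypothetical, and it assumes that the most significant bit of the status byte means "ready to receive data".

```python
class Peripheral:
    """Hypothetical peripheral that becomes ready after a few status reads."""

    def __init__(self, ready_after):
        self.reads_until_ready = ready_after
        self.received = []

    def read_status(self):
        # Bit 7 (the most significant bit) set means "ready to receive data".
        if self.reads_until_ready > 0:
            self.reads_until_ready -= 1
            return 0x00
        return 0x80

    def write_data(self, byte):
        self.received.append(byte)


def poll_and_send(peripheral, byte, max_polls=1000):
    """Busy-wait until the peripheral is ready, then send one byte.

    Returns the number of status reads that were needed.
    """
    for polls in range(1, max_polls + 1):
        if peripheral.read_status() & 0x80:
            peripheral.write_data(byte)
            return polls
    raise TimeoutError("peripheral never became ready")


printer = Peripheral(ready_after=3)
print(poll_and_send(printer, 0x41))  # status reads spent waiting
```

The returned count makes the main drawback visible: every status read that finds the peripheral not ready is CPU time spent without doing useful work.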
An x86 microprocessor communicates with two I/O ports
The most significant bit of the status byte indicates the
availability of the peripheral to receive data
The Polling Technique. Example
pollPort24: in AL, 24h
shl AL, 1
jnc pollPort24
out 24h, AX
pollPort37: in AL, 37h
shl AL, 1
jnc pollPort37
out 37h, AX
Do you note any potential
problems in this code?
How would you optimize it?
Interrupt-driven I/O is an asynchronous I/O technique: the
communication with the CPU is initiated by the peripheral
Interrupt-driven I/O
Procedure:
The peripheral sends an interrupt signal (through a port) to a
dedicated pin of the microprocessor
By doing this, the peripheral signals that it is ready for data transfer
If it is programmed to respond to interrupt signals, the CPU
interrupts its current activity and starts the data transfer
Notes:
The interrupt signal is received on a dedicated pin
The current program is suspended and takes no part in the
data transfer
Main advantage: the CPU responds very fast to interrupts
Interrupt-driven I/O
Interrupt request (IRQ): the signal sent by the peripheral to
the CPU (on a dedicated pin) to request access to the
system's resources
Interrupt request response: a sequence of steps performed
by the CPU in response to the IRQ
Interrupt service routine (ISR) or interrupt handler: a
dedicated program (sequence of instructions) through
which the CPU responds to the IRQ of a specific peripheral
Interrupt-driven I/O. Definitions
Step 1. The CPU finishes the execution of the current instruction.
Step 2. The CPU saves the flags register in the stack.
Step 3. The CPU saves some of the general purpose registers in
the stack.
Step 4. The CPU saves the return address (the address of the next
instruction) in the stack.
Step 5. The interrupt flag (IF) is reset to disable any other
interrupts.
Step 6. The CPU jumps to the interrupt service routine (ISR).
After the ISR is executed, the CPU restores all the information
saved in the stack, returns to the main program and continues its execution
The Interrupt Request Response
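The steps above can be sketched as a small model of the CPU state. The class, register names and addresses are illustrative only; the sketch follows the generic sequence from this slide, not the behaviour of a specific processor.

```python
class CPU:
    """Toy CPU model with a flags register, two registers, IP and a stack."""

    def __init__(self):
        self.flags = {"IF": 1, "ZF": 0}
        self.registers = {"AX": 0x1234, "BX": 0x5678}
        self.ip = 0x0100
        self.stack = []

    def respond_to_interrupt(self, isr_address, next_instruction_address):
        # Step 1 (finishing the current instruction) is assumed to be done.
        self.stack.append(dict(self.flags))          # Step 2: save the flags
        self.stack.append(dict(self.registers))      # Step 3: save registers
        self.stack.append(next_instruction_address)  # Step 4: return address
        self.flags["IF"] = 0                         # Step 5: disable interrupts
        self.ip = isr_address                        # Step 6: jump to the ISR

    def return_from_interrupt(self):
        # Restore everything in the reverse order of saving.
        self.ip = self.stack.pop()
        self.registers = self.stack.pop()
        self.flags = self.stack.pop()


cpu = CPU()
cpu.respond_to_interrupt(isr_address=0x2000, next_instruction_address=0x0102)
cpu.registers["AX"] = 0          # the ISR is free to modify registers
cpu.return_from_interrupt()
print(hex(cpu.ip), cpu.flags["IF"], hex(cpu.registers["AX"]))
```

Because the stack is last-in, first-out, the information is restored in the opposite order, and the interrupted program resumes with the same flags and registers it had before.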
5.3 Typical Interrupt Techniques
DMA is an interrupt-based I/O technique which allows a
peripheral to access the memory directly (without CPU
intervention)
Direct Memory Access (DMA)
Procedure:
The peripheral sends an interrupt signal to the DMA controller
The DMA controller sends a Bus Request (BUSRQ) signal to
the microprocessor
The CPU finishes the current machine cycle and interrupts its
activity; this is equivalent to freeing the system bus
The DMA controller is left in charge of the microcomputer and
Generates addresses for a sequence of memory locations
Manages the data transfer between the port to which the
peripheral is connected and the sequence of memory locations
Notes:
DMA interrupts have the highest priority
DMA interrupts cannot be disabled by the user
Direct Memory Access (DMA)
The NMI interrupt procedure is the same as the one
described for the general case
Notes:
NMIs are received on a separate dedicated pin (NMI)
NMIs cannot be disabled by the user
In the case of NMIs the CPU finishes the execution of the
current instruction before responding to the interrupt
The ISR's address is predefined
Non-maskable Interrupts (NMI)
As opposed to NMIs the maskable interrupts can be
disabled (ignored) by the user
The interrupt procedure is the same as the one described
for the general case
Notes:
Maskable interrupts are received on a dedicated pin (INT)
The ISR's address can be:
Predefined
Provided by the peripheral
Selected by the CPU based on a code sent by the peripheral
Maskable Interrupts (INT)
The maskable interrupts for which the ISR's address is selected
by the CPU based on a code sent by the peripheral are called
vectored interrupts.
Interrupt vector: the complete address of the ISR
Interrupt Vector Table (IVT): a table (stored in the memory)
which contains the interrupt vectors of all the ISRs
Interrupt vector selection procedure
The peripheral sends a code to the CPU; each peripheral is initially
allocated a unique code
The code is used by the CPU as an index in the IVT (to select the
corresponding interrupt vector)
Vectored Interrupts
Notes
The size of the interrupt vectors is the size of a complete
address
The size of the code sent by the peripheral depends on how
many interrupts the processor can respond to
The size of the IVT is derived by multiplying the size of one
vector by the maximum number of vectors in the IVT
The IVT is usually located in the memory at a predefined
address (usually address 0x00)
Vectored Interrupts
The size of an interrupt vector: 4 bytes (SA: 2B and EA: 2B)
The code sent by the peripheral has 8 bits => max 256 interrupt
types = max 256 interrupt vectors => max 256 ISRs
The size of the IVT: 256 vectors x 4B = 1024B
The IVT is stored at the beginning of the memory
Vectored Interrupts. Example (x86)
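The arithmetic of this example can be checked with a few lines. The constant and function names are illustrative; the values (4-byte vectors, 8-bit code, table at the beginning of the memory) are the ones stated above.

```python
VECTOR_SIZE = 4  # bytes: 2 for the segment address, 2 for the effective address
CODE_BITS = 8    # size of the code sent by the peripheral


def ivt_size(vector_size=VECTOR_SIZE, code_bits=CODE_BITS):
    """Size of the whole table in bytes."""
    return (2 ** code_bits) * vector_size


def vector_address(code, ivt_base=0, vector_size=VECTOR_SIZE):
    """Memory address of the first byte of the vector selected by `code`."""
    return ivt_base + code * vector_size


print(ivt_size())            # 1024
print(vector_address(5))     # 20
print(vector_address(255))   # 1020, the last vector in the table
```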
Software interrupts are special instructions in the x86
instruction set
The execution of such an instruction is identical to the CPU
response to a vectored interrupt
The code which is sent by the peripheral in the case of
vectored interrupts is provided as an operand or is implied
x86 software interrupts:
INT [code]
Interrupt with the provided code or with code=3 (if not provided)
INTO
Interrupt with code=4 only if OF is set
x86 Software Interrupts
The execution of INT 5h involves the following steps:
1. The flags register (F) is saved in the stack
2. IF and TF are set to zero (further maskable interrupts are disabled)
3. The return address is saved in the stack
4. The interrupt vector is selected and the jump to the ISR is
performed:
CS is loaded with the value found in the memory at addresses
4*5+3 and 4*5+2
IP is loaded with the value found in the memory at addresses
4*5+1 and 4*5+0
x86 Software Interrupts. Example
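Step 4 can be illustrated by reading a vector out of a simulated memory. The memory contents below are hypothetical (an ISR placed at F000h:1234h); the byte layout follows the addresses given above, with the low byte of each 16-bit value stored at the lower address.

```python
def read_word(memory, address):
    """Read a 16-bit word: low byte at `address`, high byte at `address + 1`."""
    return memory[address] | (memory[address + 1] << 8)


def load_interrupt_vector(memory, code):
    """Return (CS, IP) taken from the interrupt vector selected by `code`."""
    base = 4 * code
    ip = read_word(memory, base)      # bytes 4*code+0 and 4*code+1
    cs = read_word(memory, base + 2)  # bytes 4*code+2 and 4*code+3
    return cs, ip


memory = bytearray(1024)  # the first 1024 bytes hold the IVT
# Hypothetical vector for code 5: IP = 1234h, CS = F000h.
memory[20:24] = bytes([0x34, 0x12, 0x00, 0xF0])

cs, ip = load_interrupt_vector(memory, 5)
print(hex(cs), hex(ip))
```

The same function covers INTO, which always uses code 4, i.e. the bytes at addresses 10h to 13h.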
The execution of INTO involves the following steps:
1. The OF flag is verified; if it is 1 then the following steps are
made, else the CPU continues with the next instruction
2. The flags register (F) is saved in the stack
3. IF and TF are set to zero (further maskable interrupts are disabled)
4. The return address is saved in the stack
5. The interrupt vector is selected (INTO always refers to a
fixed vector) and the jump to the ISR is performed:
CS is loaded with the value found in the memory at addresses 13h
and 12h
IP is loaded with the value found in the memory at addresses 11h
and 10h
x86 Software Interrupts. Example
Historical Background
The invention of the transistor and the integrated circuit
Moore's Law
The invention of the microprocessor and the microcontroller
Microprocessors Evolution Tree
General purpose microprocessors
Microcontrollers
Special purpose microprocessors (DSPs, communication processors, ...)
Typical Applications
Introduction
The Structure of a Microcomputer
The CPU: executes instructions (processes data) and controls the system
The Memory: stores both the data and the instructions
The I/O Devices: interconnect the microcomputer with the outside world
Instruction Execution Example
The CPU is reset and starts executing instructions from a
predefined address in the memory (100h)
[Diagram: Reset -> Execute instructions from address 100h]
Overview of a CISC, General Purpose
Microprocessor Core
General Purpose Registers (GPRs)
Memory Data Register (MDR)
Memory Address Registers (MAR)
Arithmetic and Logic Unit (ALU)
Memory Addressing Control Unit
Timing and Control Unit (TCU)
Instruction Execution Timing
Typically, the execution of an instruction has several stages:
Fetch: the instruction code is read from the memory
Decode: the instruction code is decoded
Execute: the instruction is executed (might include fetching the operands)
Write: the result is written to a register or a memory location
The instruction execution stages are called machine cycles
Any instruction is executed in one or several machine cycles (depending on
its complexity)
In a machine cycle the CPU executes sequentially several elementary
actions accomplishing a clear, well-defined task
Elementary actions are executed once every clock cycle
An internal clock signal is generated based on an external quartz oscillator
A CPU state is a physical time period equal to the duration of a clock cycle
In a state, the CPU executes one elementary action or two independent
elementary actions (at the same time)
The x86 Architecture
The x86 Registers
Types, sizes, usage, implicit functions, accessibility, etc.
Memory Management
Linear vs. segmented memory models, the x86 memory
model
Memory Access. Addressing Modes
What is an addressing mode? Comparison, examples, etc.
The Instruction Set
Instruction types, formats, examples, comparison, etc.
x86 Registers. Summary
x86 has very few registers
4 general purpose registers, 2 index registers, 2 pointer registers
Some of the x86 registers are multifunctional
x86 has 4 segment registers
special functions in memory management
All the registers are user-accessible; one exception: IP
The size of the registers is usually the size of the Internal Data
Bus
x86 Memory Segmentation. Summary
The memory can be regarded as a sequence of memory locations
Each memory location stores an 8-bit number and has a unique
20-bit address, called physical address
The x86 CPU regards the memory as being composed of 64k
segments comprising 64k locations each
The x86 CPU uses a 16-bit segment address to select a segment
and a 16-bit effective address to identify a memory location
inside the segment
The translation between the logical organization of the memory
in segments and the physical address is done as follows:
PA = SA x 10h + EA
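A short sketch of the translation: the 16-bit segment address is multiplied by 10h (shifted left by four bits) and the 16-bit effective address is added. The function name is illustrative, and masking the result to 20 bits is an assumption that models a 20-bit address bus.

```python
def physical_address(segment_address, effective_address):
    """20-bit physical address from a 16-bit SA and a 16-bit EA."""
    return ((segment_address << 4) + effective_address) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
# Different (SA, EA) pairs can designate the same memory location:
print(hex(physical_address(0x1000, 0x0100)))  # 0x10100
print(hex(physical_address(0x1010, 0x0000)))  # 0x10100
```

Since consecutive segments start only 16 bytes apart, the 64k segments overlap, which is why one physical address has many logical representations.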
x86 Addressing Modes
Program addressing modes
Relative addressing
Direct addressing
Register indirect addressing
Data addressing modes
Several simple addressing modes
Composed addressing modes
Base-relative addressing modes
Stack-relative addressing modes
x86 Simple Data Transfer Instructions
MOV: Move (Copy) Data
XCHG: Exchange Data
LEA: Load Effective Address
PUSH: Push data in the Stack
POP: Pop data out of the Stack
x86 String / Array Instructions
MOVS: Move String
LODS: Load String
STOS: Store String
SCAS: Scan String
CMPS: Compare String
STD: Set Direction Flag
CLD: Clear Direction Flag
x86 Arithmetic Instructions
INC: Increment
DEC: Decrement
ADD: Add
ADC: Add with Carry
SUB: Subtract
SBB: Subtract with Borrow
MUL: Multiply
DIV: Divide
CMP: Compare
x86 Control Flow Instructions
Unconditional branch: JMP (jump)
Conditional branches:
For unsigned numbers: JA|JNBE, JAE|JNB|JNC, JB|JNAE, etc.
For signed numbers: JG|JNLE, JGE|JNL, JL|JNGE, etc.
For other type of comparisons: JP, JE, JS, JO, etc.
Counter update + conditional branches:
LOOP, LOOPZ, LOOPNZ
Call and return branches:
CALL, RET
Fundamental principles
The set of registers
Specific characteristics, organization, examples, etc.
The instruction set
Characteristics, typical addressing modes, examples
The timing and control unit (TCU)
Micro-programmed vs. hardwired, instruction pipelining
Compiler particularities
Register allocation, pipeline issues
RISC Architectures. Summary
RISC Principles (I)
Hardwired Control Unit
One cycle execution time
Each instruction is hardwired to be executed in a single cycle
CPI (clocks per instruction) = 1
"Reduced" means that the amount of work any single instruction
accomplishes is reduced
Pipelining is used
Technique that allows for simultaneous execution of parts of
instructions
Leads to a more efficient instructions processing
Large number of general purpose registers
Prevents large amounts of interactions with memory
RISC Principles (II)
Small number of instructions
Fixed instruction format(s)
Decreases the time needed to decode the instructions
Fixed instruction size
Small number of addressing modes
Leads to a small size of the addressing mode code
Memory access only through LOAD/STORE instructions
Data processing instructions cannot use memory operands
Helps to achieve the goal of CPI = 1
RISC Register Set Characteristics
A large number of general purpose registers (more than 32)
The size of the registers is the size of the usual operands
Identical, multifunctional general purpose registers
Allows any register to be used in any context
Simplifies compiler design
Physical vs. logical registers
Not all the physical registers may be available at all times
Logical registers are mapped into physical registers by the
CPU
RISC Instruction Set Characteristics
Fewer instructions than in CISC instruction set
Simpler instructions than in CISC instruction set
Instruction types
Memory access instructions (load / store)
Arithmetic and logic processing instructions
Always with register or immediate operands
Typically without an accumulator
Control flow instructions
Subprogram calls use register windows for parameter passing
I/O instructions
Microprocessors I/O Techniques
I/O devices organization
ports and peripherals; dedicated instructions; memory
mapped ports
The polling technique
Interrupt-driven I/O
Definitions
Interrupt types
Vectored interrupts, x86 interrupts, software interrupts
The Interrupt Vectors Table (IVT)
Compute the size of the interrupt vectors table provided that
The size of the memory is 1 MB (1)
Each memory location stores 8 bits (2)
The processor uses linear memory organization (3)
The code sent by the peripheral has 8 bits (4)
(1) & (2) => there are 2^20 memory locations => the PA has 20
bits (5)
(3) => the processor uses physical addresses directly (6)
(5) & (6) => an interrupt vector must hold a 20-bit address, so its
size is 3 bytes (24 bits >= 20 bits) (7)
(4) => there are 2^8 = 256 interrupt vectors in the table (8)
(7) & (8) => the IVT has 256 vectors x 3B = 768B
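The derivation above can be reproduced for other memory sizes with a small helper. It assumes a linear memory model with one byte per location and vectors that occupy a whole number of bytes; the function name is illustrative.

```python
def ivt_size_bytes(memory_bytes, code_bits):
    """Size of the IVT for a linear memory with one byte per location."""
    # Number of bits needed to address every memory location.
    address_bits = (memory_bytes - 1).bit_length()
    # A vector stores one complete address, rounded up to whole bytes.
    vector_bytes = (address_bits + 7) // 8
    return (2 ** code_bits) * vector_bytes


print(ivt_size_bytes(2 ** 20, 8))  # 1 MB memory, 8-bit code
```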