Microprocessors HC v12

Horia Cucu
Speech & Dialogue Research Laboratory

Faculty of Electronics, Telecommunications and Information Technology
University POLITEHNICA of Bucharest
Introduction to Microprocessors
Historical Background
Microprocessors Evolution Tree
Typical Applications
Educational Need
Administrative Issues
22.05.2014 2 Microprocessors Architecture
1947: Invention of the transistor
1959: Invention of the integrated circuit (IC)
1965: Birth of Moore's Law
1971: Development of the first microprocessor
1976: Introduction of the first microcontroller
Microprocessors and Microcontrollers
A microprocessor is a CPU-on-a-chip
A microcontroller is a computer-on-a-chip
[Figure: microprocessors evolution tree]
General Purpose Microprocessors: 4004, 8008, 8080, 8085, 8086, 80286, 80386, 80486, Pentium, Itanium, RISC, others
Microcontrollers: 8048, 8051, PIC, AVR, others
Special Purpose Microprocessors: DSPs, Comm processors, others
General purpose microprocessors: used to create computers
PCs, Laptops, Workstations
Servers, Super-computers (32-bit/64-bit powerful computers)
Special purpose microprocessors
Digital Signal Processing (DSP) processors
Multimedia applications
Communication processors
Networking equipment (switches, routers, etc.)
Microcontrollers: used to implement embedded systems
consumer electronics (toys, cameras, robots)
consumer products (washing machines, microwave ovens, etc.)
instrumentation (oscilloscopes, medical equipment)
process control (data acquisition and control)
communication (telephone sets, answering machine, etc.)
office appliances (fax machines, printers, etc.)
multimedia (smart-phones, PDAs, tablets, teleconferencing
equipment)
automotive industry (onboard computers)
The Educational Need - a Big Question
Microprocessors Course Outline
1. The Structure of a Microcomputer
2. Overview of a CISC, General Purpose Microprocessor Core
3. The x86 Architecture
4. RISC Architectures
5. Input/Output Strategies
Administrative Issues
Laboratory
Objective: highlight the architectural attributes for the x86
Microprocessors
Sessions: 5 teaching labs + one evaluation session
Bibliography
C. Burileanu, "Microprocesoarele x86 - o abordare software", Grupul
pentru microinformatică, Cluj-Napoca, 1999
Communication through the Moodle framework (Arhitectura
Microprocesoarelor - H. Cucu, Password: Microprocesor)
Lecture slides (contain only a brief summary)
Laboratory documentation
Evaluation results
Evaluation
Continuous assessment (for which the student receives a grade, N_laboratory)
consists of 2 mandatory tests and an optional final evaluation.
o No component of the continuous assessment can be retaken.
o Grading:
The 2 tests taken during the laboratory sessions are graded from 0 to 10.
If the average of the 2 grades < 5: the student has to retake the entire
course the following year.
If the average of the 2 grades >= 5: the student can opt for:
taking the final evaluation; in this case: N_laboratory = 5 to 10;
going directly to the exam; in this case: N_laboratory = 5.

Final exam in the summer session:
o Oral exam.
o The student receives a grade: N_exam = 0 to 10.
o Can be retaken in September.

Final grade: M = (N_laboratory + N_exam) / 2
computed by truncation for 4 <= M < 5 and by rounding for all other
values.
Evaluation
Definitions
Block Diagram of a Microcomputer
A microcomputer is a general purpose device that can be programmed
to carry out a set of arithmetic and/or logical operations.
Functional Components
CPU: the hardware block which processes data and
controls the system
Memory: the hardware block which stores data in a
sequence of memory locations
I/O devices: hardware blocks that form the interface
between the microcomputer and the external world
Buses: the connections between the above blocks
The von Neumann Principles
Both data and instructions are stored in the memory
The contents of the memory are accessed by location (address)
The microprocessor is the CPU of the microcomputer; its role is
to process data and control the system
The instructions are fetched from the memory and executed
sequentially by the CPU
I/O ports are used to communicate with other devices
The three hardware blocks are interconnected by the system bus
The Memory Basic Principles
Memory: a sequence of memory locations used to store information
Each memory location:
stores an 8-bit number, a byte of data
is identified by a unique number, called address
The memory is accessed and organized by the CPU only
The CPU can choose to create logical subdivisions within the
memory (called pages or segments)
The memory map: all memory locations that can be
addressed by the CPU (not necessarily implemented)
The Memory A Closer Look
The size of the memory is directly linked with the address
size through the following equation:
$\mathrm{memorySize} = 2^{\mathrm{addressSize\,[bits]}}$
Example 1:
using an address of 2 bits, one can form 4 different addresses:
00, 01, 10, and 11, for up to 4 different memory locations
consequently, a memory with an address of 2 bits will
comprise 4 memory locations (4 bytes).
Example 2:
using a 20-bit address, one can form $2^{20}$ different addresses,
corresponding to $2^{20}$ different memory locations
consequently, a memory with a 20-bit address will comprise
$2^{20}$ memory locations (1 MB).
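The relation between address size and memory size can be checked with a few lines of Python. This is only an illustrative sketch; the function name `memory_size` is an assumption made for this example, and one memory location is taken to hold one byte, as stated in the slides.

```python
def memory_size(address_bits: int) -> int:
    """Number of addressable memory locations (bytes) for a given address width."""
    return 2 ** address_bits

# Example 1: a 2-bit address selects one of 4 locations (00, 01, 10, 11).
print(memory_size(2))

# Example 2: a 20-bit address selects one of 2^20 locations, i.e. 1 MB.
print(memory_size(20) // (1024 * 1024), "MB")
```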
The Memory Contents Significance
This could be a 16-bit result
This could be an instruction
These could be the first two elements in
an array of 8-bit numbers
The significance of the information is given by the programmer.
The memory doesn't know the significance of the information it stores!
Input/Output Devices
I/O Devices: hardware blocks that form the interface
between the microcomputer and the external world
I/O Devices can be regarded as a set of I/O Ports
Each I/O port:
can be used to send an 8-bit/16-bit/32-bit number to an external device
can be used to receive an 8-bit/16-bit/32-bit number from an external device
is identified by a unique number, called port address
The port map: all ports that can be addressed by the CPU
(not necessarily implemented)
The System Bus
Bus: a set of physical connections that link several hardware
blocks; these connections are used for information transfer
The CPU, Memory and I/O Devices are connected through
a unique System Bus with three components:
A bidirectional Data Bus
Transfers data (operands, results, etc.) and instructions
A unidirectional Address Bus
Through this bus the CPU sends addresses to the Memory and
I/O Devices
A bidirectional Control Bus
Transfers command and control signals from/to the CPU
The Software Component
The microcomputer executes instructions organized in
computer programs, namely the software
Two main categories:
The Operating System: set of programs which facilitate the
user's access to the system's resources
User Software: set of programs specifically created by the user
to achieve a certain task
Summary
The CPU: executes instructions (processes data) and controls the system
The Memory: stores both the data and the instructions
The I/O Devices: interconnect the microcomputer with the outside world
Information Representation in Computer Systems
Information is stored using electronic circuits, called flip-flops
(or bistables), that have two stable states: on/off
The state of a bistable can be used to represent a bit (i.e.
binary digit: 0, 1) or a boolean value (true, false)
Data types with more than two possible values are stored
using sequences of bits:
Byte (B): a sequence of 8 bits; can store at most $2^{8}$ (256) distinct values
Word (w): a sequence of 16 bits; can store at most $2^{16}$ distinct values
Double word (dw): a sequence of 32 bits; can store at most $2^{32}$ distinct values
Numbers representation
Unsigned (positive) integer numbers
Natural binary representation
Signed integer numbers
Sign & magnitude representation
1's complement representation
2's complement representation
Signed real numbers
Fixed point representation
Floating point representation
Integer numbers representation
Positive values use the natural binary representation in all three conventions.
Negative values start from the natural binary representation of the magnitude:
Sign and magnitude: flip the sign bit
1's complement: flip all bits
2's complement: flip all bits, then add 1

Decimal value | Sign and magnitude | 1's complement | 2's complement
5             | 00000101           | 00000101       | 00000101
-5            | 10000101           | 11111010       | 11111011
12            | 00001100           | 00001100       | 00001100
-12           | 10001100           | 11110011       | 11110100
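The three signed representations can be reproduced with the short Python sketch below. The function names are assumptions chosen for this example, and the code assumes the value fits in the given number of bits (8 by default); no range checking is done.

```python
def sign_magnitude(value: int, bits: int = 8) -> str:
    """Sign bit followed by the natural binary code of the magnitude."""
    magnitude = format(abs(value), f"0{bits - 1}b")
    return ("1" if value < 0 else "0") + magnitude

def ones_complement(value: int, bits: int = 8) -> str:
    """Negative values: flip all bits of the magnitude."""
    mask = (1 << bits) - 1
    if value >= 0:
        return format(value, f"0{bits}b")
    return format(~abs(value) & mask, f"0{bits}b")

def twos_complement(value: int, bits: int = 8) -> str:
    """Negative values: flip all bits of the magnitude, then add 1."""
    mask = (1 << bits) - 1
    return format(value & mask, f"0{bits}b")

for v in (5, -5, 12, -12):
    print(v, sign_magnitude(v), ones_complement(v), twos_complement(v))
```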
Real numbers representation
Fixed point representation
A fixed sequence of bits is used to represent the integer part
Two's complement representation
A fixed sequence of bits is used to represent the fractional part
Floating point representation
A fixed sequence of bits is used to represent the mantissa
Two's complement representation
A fixed sequence of bits is used to represent the exponent
Example: $\text{real number} = \text{mantissa} \times 2^{\text{exponent}}$
Characters representation
Coding conventions:
ASCII
UTF-8
UTF-16
Unicode
Programs representation
Instructions are represented using sequences of bytes
Some processors have fixed-size instructions
8086 has variable-size instructions (1-6 bytes)
The instruction codes
are formed of several fields:
one instruction-type field
none, one or several data fields
none, one or several address fields
are associated with mnemonics (to be used in programming)
Example: add AX, 8017h <=> 051780h
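The example `add AX, 8017h <=> 051780h` can be reproduced byte by byte. The sketch below assumes the x86 encoding in which `add AX, imm16` uses the one-byte instruction code 05h followed by the 16-bit constant stored least significant byte first; the function name `encode_add_ax_imm16` is made up for this illustration.

```python
def encode_add_ax_imm16(value: int) -> bytes:
    """Encode 'add AX, value': instruction-type field + 16-bit data field."""
    opcode = 0x05                            # instruction code for add AX, imm16
    data = value.to_bytes(2, "little")       # data field: low byte first
    return bytes([opcode]) + data

code = encode_add_ax_imm16(0x8017)
print(code.hex().upper() + "h")
```

The data field shows why the bytes 17h and 80h appear swapped in the instruction code: the constant 8017h is stored with its low byte at the lower address.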
The binary, decimal and hexadecimal bases
Any sequence of bits can also be represented as:
a decimal number (number in base 10)
can be written as a sequence of decimal digits (0, 1, ..., 9)
a hexadecimal number (number in base 16)
can be written as a sequence of hexadecimal digits (0, 1, ..., 9, A,
B, C, D, E and F)
Hexadecimal numbers representation conventions:
the h suffix: 1A44h
the 0x prefix: 0x1A44
Conversion algorithms: binary <-> decimal <-> hexadecimal
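One common conversion algorithm is repeated division by the target base, collecting the remainders as digits. The sketch below is an illustration under that assumption (the slides do not say which algorithm is meant); `to_base` is a hypothetical helper name, and the reverse direction uses Python's built-in `int(text, base)`.

```python
DIGITS = "0123456789ABCDEF"

def to_base(number: int, base: int) -> str:
    """Convert a non-negative integer to a digit string by repeated division."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))   # remainders come out least significant first

value = int("1A44", 16)        # hexadecimal -> integer
print(value)                   # decimal form
print(to_base(value, 2))       # binary form
print(to_base(value, 16))      # back to hexadecimal
```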
2.1 Von Neumann Architecture Reminder and Example
Block Diagram of a Microcomputer
Instruction Execution Example
The CPU is reset and starts executing instructions from a
predefined address in the memory (100h)
The CPU sends the address of this first instruction (100h)
through the Address Bus
The CPU sends a MEM-READ signal through the Control Bus
The Memory receives the MEM-READ signal and reads the
address from the Address Bus
The Memory finds the instruction (instruction #1) in the
memory location(s) with the corresponding address (100h)
The Memory sends the instruction through the Data Bus and
sends an ACK signal through the Control Bus
The CPU receives the ACK signal and reads the instruction
from the Data Bus
The CPU decodes the instruction to "understand" what it has
to do next
Let's suppose that it has to add the value 50h to the value
stored in the memory location with the address 2000h
The CPU sends the address (2000h) on the Address Bus and
sends a MEM-READ signal through the Control Bus
The Memory receives the MEM-READ signal and reads the
address from the Address Bus
The Memory finds the data (85h) in the memory location
with the corresponding address (2000h)
The Memory sends the data (85h) through the Data Bus and
sends an ACK signal through the Control Bus
The CPU receives the ACK signal and reads the data from the
Data Bus
The CPU temporarily stores the data in a register
The CPU adds the value 50h to the register (the result will be
D5h)
The CPU sends
the result (D5h) through the Data Bus,
the address (2000h) through the Address Bus and
a MEM-WRITE signal through the Control Bus
The Memory
receives the MEM-WRITE signal,
reads the address (2000h) from the Address Bus,
reads the result (D5h) from the Data Bus and
stores the result into the corresponding memory location
The CPU continues by executing the next instruction
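The data part of this walk-through (read 85h from 2000h, add 50h, write D5h back) can be mimicked with a toy model. This is a sketch only: the dictionary `memory` and the functions `mem_read` and `mem_write` are assumptions standing in for the Memory block and for the bus transactions, and the ACK handshake is not modelled.

```python
# Toy memory: only the location used in the example is populated.
memory = {0x2000: 0x85}

def mem_read(address: int) -> int:
    """Address on the Address Bus + MEM-READ on the Control Bus -> data on the Data Bus."""
    return memory[address]

def mem_write(address: int, data: int) -> None:
    """Address on the Address Bus, data on the Data Bus + MEM-WRITE on the Control Bus."""
    memory[address] = data & 0xFF      # one memory location stores 8 bits

register = mem_read(0x2000)            # the CPU stores the data in a register
register = (register + 0x50) & 0xFF    # the CPU adds 50h to the register
mem_write(0x2000, register)            # the result goes back to the same location

print(f"{memory[0x2000]:02X}h")
```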
2.2 The Set of General Purpose Registers
CPU Registers
Register: a small amount of storage inside the CPU
Implemented as a set of N synchronized bistables
Stores N bits of data
Highest access speed among all storage options
Several types of registers:
General vs. special purpose (dedicated) registers
Physical vs. logical registers
User-accessible vs. non user-accessible registers
General Purpose Registers
General purpose registers (GPRs)
Set of equally-sized registers used to store temporary data
(operands/results) needed in the execution of the program
User-accessible (architectural attributes)
Implemented as physical or logical registers
The size of the GPRs: a performance criterion
Equal to the size of the Internal Data Bus
The number of GPRs: a performance criterion
A larger number of GPRs => faster, more compact programs
and easier programming
General Purpose Registers
MUX (multiplexer): outputs one of the data inputs
(depending on the address inputs)
Internal Data Bus: extension of the External Data Bus
inside the CPU
Special Purpose Registers
Special purpose registers
Dedicated registers that can be used only for specific purposes
Size depends on the particular role of the register
Some are user-accessible (architectural attributes), some not
Examples:
Data register (DR) and Address register (AR)
Accumulator (A)
Status (Flags) register (F)
Instruction Pointer (IP)
Stack Pointer (SP)
2.3 The interface between the CPU and the System Bus
The Data Register and the Address Register
DR (data register): the CPU Data Bus interface
The data in DR are available to all the hardware blocks
connected on the Data Bus
The size of DR is the size of the Data Bus
DR is not an architectural attribute
AR (address register): the CPU Address Bus interface
The address in AR is available to all the hardware blocks
connected on the Address Bus; only the CPU writes in AR
The size of AR is the size of the Address Bus
AR is not an architectural attribute
2.4 The Arithmetic and Logic Unit (ALU)
The Arithmetic and Logic Unit (ALU):
a digital circuit that performs
integer arithmetic operations: add, subtract, increment, etc.
logical operations: and, or, xor, not, clear, shift, rotate, etc.
The inputs to the ALU
Data to be processed (one or two integer numbers)
The operation to be performed (specified by the Control Unit)
Possibly some status flags
The outputs of the ALU
The operation result(s) are placed in the Accumulator or on the
Internal Data Bus
The status flags are updated after each operation
The Arithmetic and Logic Unit
The Status Register
The Status Register (also called Flags Register)
A collection of flag bits, which store information regarding
the state of the processor
Arithmetic and logic flags
Bits encoding the status of the previous arithmetic/logic
operation
Used and updated by the ALU
Other types of flags
Interrupt enable flag
Supervisor flag
Direction flag
Typical Arithmetic and Logic Flags
Carry flag (CF): signals an arithmetic carry or borrow for
unsigned numbers
Parity flag (PF): signals that the number of ones in the least
significant byte of the result is even
Zero flag (ZF): signals that the result is 0
Sign flag (SF): signals that the most significant bit of the result
is set (this is the sign bit in two's complement representation)
Overflow flag (OF): signals an arithmetic overflow for signed
numbers
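How the five flags follow from one operation can be sketched for an 8-bit addition. The function `add8_flags` and the dictionary layout are assumptions made for this illustration; real processors compute the flags in hardware, and the overflow rule used here (both operands have the same sign and the result has a different one) is the usual two's complement criterion.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and derive CF, PF, ZF, SF and OF from the result."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                                  # carry out of bit 7 (unsigned)
        "PF": bin(result).count("1") % 2 == 0,               # even number of ones
        "ZF": result == 0,                                   # result is 0
        "SF": bool(result & 0x80),                           # most significant bit is set
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),          # signed overflow
    }
    return result, flags

print(add8_flags(0x85, 0x50))   # no carry, no overflow, negative result
print(add8_flags(0x7F, 0x01))   # signed overflow: 127 + 1 does not fit in 8 bits
print(add8_flags(0xFF, 0x01))   # unsigned carry, result 0
```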
The Accumulator and the Shift Register
The Accumulator: a special purpose register
Stores one of the operands before the operation
Stores the result of the operation
Size equal to the size of the general purpose registers
Is user-accessible (architecture attribute)
The Shift Register: a special purpose register
Used by the ALU to perform shift and rotate operations
Size is double the size of the general purpose registers
Is not user-accessible
2.5 The Memory Addressing Control Unit (MACU)
The Memory Addressing Control Unit
Hardware block that computes the physical address needed to
identify information in the Memory or I/O Ports
Receives input from the Internal Data Bus
Places its output (a physical address) in the Address Register
Functionality classification
Instruction addressing (in the program memory)
Sequentially, instruction after instruction
Non-sequentially, through jumps
Data addressing (in the data memory)
Elementary data addressing
Stack addressing
Data arrays addressing
Memory Management Techniques
Linear Memory Organization
The memory is regarded as a single block of memory locations
The memory is addressed directly, using a physical address
Memory Segmentation
The memory is logically divided into segments (non equal-sized,
possibly overlapping sections)
The memory is addressed using a segment address and an offset
Memory Paging
The memory is logically divided into pages (equal sized,
non-overlapping, strictly concatenated sections)
The memory is addressed using a page address and an offset
Sequential Instructions Addressing
The main principle of the von Neumann architecture
Achieved through the means of a counter register
The Program Counter (PC): a special purpose register
Stores the physical address of the current instruction
Incremented after the execution of each instruction
Size equal to the size of a physical address
User-accessible in some architectures
Other hardware blocks involved: MUX2 and MUX5
The program is executed instruction after instruction
The Instruction Register stores the instruction before
decoding
Non-Sequential Instructions Addressing
Exceptions to the normal, sequential execution of a program:
jumps, loops or subprogram calls
The jump address can be:
An absolute address: a complete physical address
The address is provided by another hardware block through the
Internal Data Bus
An offset relative to the address of the current instruction
The offset provided by another hardware block through the Internal
Data Bus is added to the address in PC
The Program Counter is also updated
Other hardware blocks involved: MUX2, MUX4, MUX5, Adder
Elementary Data Addressing
The data can potentially reside anywhere in the memory
The data address can be:
An absolute address: a complete physical address
The address is provided by another hardware block through the
Internal Data Bus
An offset relative to the address of the current instruction
The offset provided by another hardware block through the
Internal Data Bus is added to the address in PC
Other hardware blocks involved: MUX4, MUX5, Adder
Stack Addressing
The Stack: LIFO data structure
Accessed through the means of a pointer register
Pushing an element in the Stack -> decrementing the pointer
Popping an element out of the Stack -> incrementing the pointer
Software vs. hardware Stack
The Stack Pointer (SP): a special purpose register
Stores the physical address of the top element
User-accessible (architecture attribute)
Other hardware blocks involved: MUX3 and MUX5
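The push/pop behaviour described above (decrement the pointer on push, increment it on pop) can be sketched as follows. The class `Stack`, its 16-location memory and the choice of starting SP just past the last location are assumptions made for this example only.

```python
class Stack:
    """LIFO structure addressed through a stack pointer that grows downwards."""

    def __init__(self, size: int = 16):
        self.memory = [0] * size
        self.sp = size                 # empty stack: SP points just past the last location

    def push(self, byte: int) -> None:
        self.sp -= 1                   # pushing decrements the pointer
        self.memory[self.sp] = byte & 0xFF

    def pop(self) -> int:
        byte = self.memory[self.sp]    # SP addresses the top element
        self.sp += 1                   # popping increments the pointer
        return byte

stack = Stack()
stack.push(0x11)
stack.push(0x22)
print(hex(stack.pop()), hex(stack.pop()))   # last in, first out
```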
Stack Addressing
Data Arrays Addressing
The Memory can accommodate arrays of data
Accessed through the means of index registers, which store the
physical address of the first element in the array
The address of an arbitrary element is obtained by adding a relative
offset to the index register
Offset size => max number of elements in the array
The Index Registers (IX): special purpose registers
Store the physical addresses of various data arrays
User-accessible (architecture attribute)
Other hardware blocks involved: MUX1, MUX4, MUX5, Adder
2.6 The Timing and Control Unit
The Timing and Control Unit
The Timing and Control Unit (TCU)
Hardware block inside the CPU that:
fetches, decodes and manages the execution of instructions
controls the flow of data through the processor
coordinates the activities of the other units within the CPU and also
outside the CPU
achieves the above through timing and control signals
Design: hardwired vs. micro-programmed
The inputs to the TCU
The instruction in the Instruction Register (IR)
Internal control signals (the status flags)
The outputs of the TCU
Internal control signals (for the blocks within the CPU)
External control signals (for the blocks outside the CPU)
The Timing and Control Unit
The Instruction Register and
the Instruction Decoder
The Instruction Register (IR): a special purpose register
Stores the instruction code fetched from the memory
Receives input only from the Data Register
Size equal to the smallest instruction code
Is not user-accessible (not an architecture attribute)
The Instruction Decoder
Hardware block that decodes instruction codes
Each code has an associated, unique output line
Only one of the output lines will be 1 at any moment in time
Receives input from the Instruction Register
Sends its output to the Timing and Control Unit
The Typical CISC Instruction Format
The instructions are stored in the memory in one or several
memory locations (depending on the type of instruction)
Instruction format: all the information required by the CPU to
execute an instruction
Comprises at least one byte: the instruction code (the semantics)
The instruction code may require additional bytes
May comprise operands, addresses, offsets on one or several bytes
1-6 bytes for 16-bit x86 microprocessors
1-15 bytes for 32-bit x86 microprocessors
Example: code | [code] | [data or address] | [data or address] | [data or address]
Instruction Execution Timing

Typically, the execution of an instruction has several stages:
Fetch: the instruction code is read from the memory
Decode: the instruction code is decoded
Execute: the instruction is executed (might comprise operands fetch)
Write: the result is written in a register or a memory location
The instruction execution stages are called machine cycles
Any instruction is executed in one or several machine cycles (depending on
its complexity)
In a machine cycle the CPU executes sequentially several elementary
actions accomplishing a clear, well-defined task
Elementary actions are executed once every clock cycle
An internal clock signal is generated based on an external quartz oscillator
A CPU state is a physical time period equal to the duration of a clock cycle
In a state, the CPU executes one elementary action or two independent
elementary actions (at the same time)
Instruction Execution Timing Example
Instruction example: (2000h) <- (2000h) + 50h
Instruction format: code | addr low (00h) | addr high (20h) | data (50h)
6 machine cycles:
M1. Fetch and Decode
M2. Read address (least significant byte)
M3. Read address (most significant byte)
M4. Read operand 1
M5. Read operand 2 and Execute
M6. Write result
Machine Cycle 1: Fetch and Decode
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (IR) <- (DR)
T4. decode instruction code
Machine Cycle 2: Read Address (least significant byte)
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (AUX2) <- (DR)
Machine Cycle 3: Read Address (most significant byte)
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (AUX1) <- (DR)
Machine Cycle 4: Read Operand 1
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (A) <- (DR)
Machine Cycle 5: Read Operand 2 and Execute
T1. (AR) <- (AUX1, AUX2), MEM-READ
T2. (DR) <- ((AR))
T3. (A) <- (A) + (DR)
Machine Cycle 6: Write Result
T1. (DR) <- (A)
T2. (AR) <- (AUX1, AUX2), MEM-WRITE
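The machine cycles M1 to M6 can be traced as register transfers in a small simulation. Everything concrete below is an assumption for illustration: the instruction code value A0h, the start address 0100h, the dictionary-based `memory` and `cpu`, and the helper `read_next_byte`, which models the pair of states "(AR) <- (PC), MEM-READ" and "(PC) <- (PC) + 1, (DR) <- ((AR))".

```python
# Hypothetical memory image: the 4-byte instruction and the operand at 2000h.
memory = {
    0x0100: 0xA0,   # instruction code (made-up value)
    0x0101: 0x00,   # addr low
    0x0102: 0x20,   # addr high
    0x0103: 0x50,   # immediate data
    0x2000: 0x85,   # operand stored in memory
}
cpu = {"PC": 0x0100, "AR": 0, "DR": 0, "IR": 0, "AUX1": 0, "AUX2": 0, "A": 0}

def read_next_byte() -> int:
    """(AR) <- (PC), MEM-READ; then (PC) <- (PC) + 1, (DR) <- ((AR))."""
    cpu["AR"] = cpu["PC"]
    cpu["PC"] += 1
    cpu["DR"] = memory[cpu["AR"]]
    return cpu["DR"]

cpu["IR"] = read_next_byte()                    # M1: fetch (decoding is not modelled)
cpu["AUX2"] = read_next_byte()                  # M2: address, least significant byte
cpu["AUX1"] = read_next_byte()                  # M3: address, most significant byte
cpu["A"] = read_next_byte()                     # M4: operand 1 (the constant 50h)

cpu["AR"] = (cpu["AUX1"] << 8) | cpu["AUX2"]    # M5: (AR) <- (AUX1, AUX2), MEM-READ
cpu["DR"] = memory[cpu["AR"]]                   #     (DR) <- ((AR))
cpu["A"] = (cpu["A"] + cpu["DR"]) & 0xFF        #     (A) <- (A) + (DR)

cpu["DR"] = cpu["A"]                            # M6: (DR) <- (A)
cpu["AR"] = (cpu["AUX1"] << 8) | cpu["AUX2"]    #     (AR) <- (AUX1, AUX2), MEM-WRITE
memory[cpu["AR"]] = cpu["DR"]

print(f"{memory[0x2000]:02X}h at 2000h, PC = {cpu['PC']:04X}h")
```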
2.7 Summary
Summary
General Purpose Registers (GPRs)
Memory Data Register (MDR)
Memory Address Register (MAR)
Arithmetic and Logic Unit (ALU)
Memory Addressing Control Unit
Timing and Control Unit (TCU)
3.1 The Registers
x86 Registers
Types of registers:
General vs. special purpose (dedicated) registers
User-accessible vs. non user-accessible registers
Size of registers:
8/16-bit for the 16-bit microprocessors
8/16/32-bit for the 32-bit microprocessors
8/16/32/64-bit for the 64-bit microprocessors
x86 General Purpose Registers
x86 GPRs: AX, BX, CX, DX (16-bit registers)
Multifunctional: can be potentially used for any operation
They also have implicit functions
Can be accessed as two separate bytes: AH, AL, BH, BL, etc.
In 32-bit microprocessors they are: EAX, EBX, ECX, EDX
Implicit functions
AX: Accumulator
BX: Base index (for use with arrays)
CX: Counter (for use with loops and strings)
DX: Extends the precision of the accumulator
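The access to a 16-bit register as two separate bytes (for example AX as AH and AL) amounts to splitting and joining bit fields. The helper names `split16` and `join8` are assumptions made for this sketch.

```python
def split16(register: int):
    """Return the (high byte, low byte) pair of a 16-bit register value, e.g. AX -> (AH, AL)."""
    return (register >> 8) & 0xFF, register & 0xFF

def join8(high: int, low: int) -> int:
    """Rebuild the 16-bit value from its high and low bytes, e.g. (AH, AL) -> AX."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

ah, al = split16(0x8017)
print(f"AH = {ah:02X}h, AL = {al:02X}h")
print(f"AX = {join8(ah, al):04X}h")
```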
x86 Pointer Registers
x86 Pointer Registers: SP, BP (16-bit registers)
They also have implicit functions
In 32-bit microprocessors they are: ESP, EBP (32-bit registers)
SP: Stack pointer
Stores the effective address of the element at the top of the stack
Used implicitly in several instructions: push, pop, call, ret, int
BP: Base pointer
Used to point at some other place in the stack
Stores the effective address of another value in the stack
x86 Index Registers
x86 Index Registers: SI, DI (16-bit registers)
Used implicitly in array indexing instructions: movs, lods,
stos, cmps, scas
In 32-bit microprocessors they are: ESI, EDI (32-bit registers)
SI: Source Index
Stores the effective address or the index of the current
element in the source array
DI: Destination Index
Stores the effective address or the index of the current
element in the destination array
x86 Flags Register
The x86 Flags register (F)
A collection of 16 flag bits, which store information regarding the
state of the processor
Used implicitly in several instructions: pushf, popf, lahf, sahf
Interrupt enable flag (IF): determines whether or not the CPU will
handle maskable hardware interrupts
Trap flag (TF): permits operation of a processor in single-step mode
Direction flag (DF): controls the left-to-right or right-to-left
direction of array processing
x86 Arithmetic and Logic Flags
Carry flag (CF): signals an arithmetic carry or borrow for unsigned
numbers
Auxiliary flag (AF): signals an arithmetic carry over the first nibble
Parity flag (PF): signals that the number of ones in the least significant
byte of the result is even
Zero flag (ZF): signals that the result is 0
Sign flag (SF): signals that the most significant bit of the result is set
(this is the sign bit in two's complement representation)
Overflow flag (OF): signals an arithmetic overflow for signed numbers
x86 Segment Registers
x86 Segment Registers: CS, DS, ES, SS (16-bit registers)
Special purpose registers
Used for memory management: the memory is logically
segmented into smaller parts called segments
32-bit microprocessors keep the same 16-bit registers (CS, DS, ES, SS)
and add two more data segment registers: FS and GS
Segment registers store segment addresses for:
The code segment: CS
The data segment: DS
The extended data segment: ES
The stack segment: SS
x86 Instruction Pointer Register
x86 Instruction Pointer (Program Counter) Register: IP
16-bit register
Special purpose register
Stores the effective address of the current instruction
It is not user-accessible
Incremented after every instruction
Used implicitly by the flow control instructions: jumps, calls
x86 Register Summary
x86 has very few registers
4 general purpose registers, 2 index registers, 2 pointer registers
Some of the x86 registers are multifunctional
x86 has 4 segment registers
special functions in memory management
All the registers are user-accessible; one exception: IP
The size of the registers is usually the size of the Internal Data
Bus
3.2 Memory Management
The Memory Basic Principles
Memory: a sequence of memory locations used to store information
Each memory location:
stores an 8-bit number, a byte of data
is identified by a unique number, called address
The memory is accessed and organized by the CPU only
The CPU can choose to create logical subdivisions within the
memory (called pages or segments)
The memory map: all memory locations that can be
addressed by the CPU (not necessarily implemented)
Memory Management Techniques
Linear Memory Organization
The memory is regarded as a single block of memory
locations
The memory is addressed directly, using a physical address
Memory Segmentation
The memory is logically divided into segments (non equal-sized,
possibly overlapping sections)
The memory is addressed using a segment address and an
offset
x86 Memory Segmentation
16-bit x86 microprocessors have 20 address pins
The memory map has $2^{20}$ memory locations
The physical address (PA) has 20 bits
x86 organizes the memory into smaller segments
Segment address (SA): 16-bit address used to identify a
segment in the memory
Effective address (EA) or offset: 16-bit address used to
identify the memory location inside the segment
The memory is organized in $2^{16}$ = 64k segments, each comprising
$2^{16}$ = 64k memory locations
Logical Address -> Physical Address
The logical address (LA)
32-bit address; concatenation of SA and EA
The physical address (PA) is not an architecture attribute!
The logical address, segment address and effective address
are architecture attributes
The microprocessor translates the LA into a PA in order to
access the memory: PA = SA × 10h + EA (SA with the hex digit 0 appended, plus EA)
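The translation from a logical address to a physical address can be sketched as a shift and an addition. The function name `physical_address` and the example values are assumptions; the final mask models the 20 address pins of the 16-bit x86 microprocessors, so any carry beyond bit 19 is dropped.

```python
def physical_address(segment_address: int, effective_address: int) -> int:
    """PA = SA * 10h + EA, truncated to the 20 bits of the Address Bus."""
    return ((segment_address << 4) + effective_address) & 0xFFFFF

# Two different (SA, EA) pairs can designate the same memory location.
print(f"{physical_address(0x1234, 0x0010):05X}h")
print(f"{physical_address(0x1235, 0x0000):05X}h")
```

Because SA is multiplied by 10h, consecutive segment addresses start only 16 bytes apart, which is why segments can overlap.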
Default Memory Segments
Segment addresses (SAs) can be stored in segment
registers
CS stores the SA of the current code segment
DS stores the SA of the current data segment
ES stores the SA of the current extended data segment
SS stores the SA of the current stack segment
Segments can start only at physical addresses which are
multiples of 16
Effective addresses (EAs) can be stored in address registers:
BX, SI, DI, SP, BP and IP
Special (SA, EA) pairs
Particular address registers are associated with particular
segment registers:
IP+CS: the physical address of the current instruction is
formed using the effective address in IP and the segment
address in CS
SP+SS: the physical address of the element at the top of the
stack is formed using SP and SS
BP+SS, BX+DS, SI+DS, DI+ES
Segment redirection
Segment overlapping
x86 Memory Segmentation. Summary
The memory can be regarded as a sequence of memory locations
Each memory location stores an 8-bit number and has a unique
20-bit address, called physical address
The x86 CPU regards the memory as being composed of 64k
segments comprising 64k locations each
The x86 CPU uses a 16-bit segment address to select a segment
and a 16-bit effective address to identify a memory location
inside the segment
The translation between the logical organization of the memory
in segments and the physical address is done as follows:
PA = SA × 10h + EA
3.3 Memory Access. Addressing Modes
What is an Addressing Mode?
A technique to specify the location of the operands and
results
Specifies how to calculate the effective memory address of
operands and results, using information in registers and/or
constants within the instruction format
Defines how machine language instructions in the
architecture identify the operands /results of each
instruction
Register Implicit Addressing
The targeted information is found in a register (not in the
memory)
The information regarding which register stores the data is
coded in the instruction code (the first byte in the
instruction)
The instruction code comprises several fields; among them:
the fields which code the source/destination registers
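The targeted information is an operand or a result
Minimum instruction size: 1B
[Figure: instruction format (instr. code, addr low, addr high, data); the instr. code comprises the instr. semantic code, the dest register code and the source register code; the register codes select the register]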
Immediate Addressing
The targeted information is found in the memory, in the
instruction, immediately after the instruction code
The targeted information
is coded in the instruction; it is a constant
is an operand
cannot be a result
cannot be an instruction
Minimum instruction size: 2B (the data has at least 1B)
[Figure: instruction layout in memory: instr. code, data low, data high]
Direct (Absolute) Addressing

The targeted information is found in the memory, at an
address coded in the instruction
The address is in the program memory
The targeted information is in the data or program memory
Minimum instruction size: 3B (the address has at least 2B)
[Figure: instruction layout: instr. code, addr low, addr high; the address points to the data]
Relative Addressing
The targeted information is found in the program memory,
at an address obtained as a sum between the address of the
current instruction and an offset coded in the instruction
The offset can be positive or negative
The targeted information can be an operand or an
instruction
Minimum instruction size: 2B (the offset usually has 1B)
[Figure: instruction layout: instr. code, offset; IP (addr) + offset points to the data]
Register Indirect Addressing
The targeted information is found in the memory, at an
address specified in a register coded in the instruction code
The targeted information can be an operand, a result or an
instruction
One register might not be enough to store an address
Minimum instruction size: 1B
[Figure: instruction code fields (instr. semantic code, addr. register code); the register (addr) points to the data]
Memory Indirect Addressing
The targeted information is found in the memory, at the
address specified in a memory location(s) whose address is
specified in the instruction code
The targeted information can be an operand, a result or an
instruction
Minimum instruction size: 3B (the address has at least 2B)
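[Figure: instruction layout: instr. code, addr low, addr high; this address points to a memory location storing addr low, addr high, which in turn points to the data]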
Base plus Index Addressing

The targeted information is found in the memory, at the
address obtained as a sum between the address stored in a
register and an offset (index) coded in the instruction
The address stored in the register is usually the base address
of an array of data (the address of the first element)
The targeted information can be an operand or a result
Minimum instruction size: 2B (the offset has at least 1B)
[Figure: instruction layout: instr. code, offset; register (addr) + offset selects one element in an array of data]
Addressing Modes Summary
Various addressing modes
some simpler, some more complicated
some can be used for instructions also, some only for data
the route to the data can be direct or indirect
the targeted information can be in a register, in the program
memory or in the data memory
Depending on the addressing mode, the minimum
instruction size can be 1B / 2B / 3B
The information stored in the instructions can have various
semantics / meanings: data, offset, address, etc.
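The generic addressing modes above can be compared side by side with a toy operand fetcher. This is a minimal sketch, not a model of any real CPU: the mode names, the `fetch_operand` function, and the flat list used as memory are all assumptions made for illustration.

```python
def fetch_operand(mode, arg, registers, memory, ip=0):
    """Return the operand selected by a generic addressing mode."""
    if mode == "register":            # operand is in a register
        return registers[arg]
    if mode == "immediate":           # operand is the constant in the instruction
        return arg
    if mode == "direct":              # instruction holds the operand address
        return memory[arg]
    if mode == "relative":            # address = current instruction address + offset
        return memory[ip + arg]
    if mode == "register_indirect":   # a register holds the operand address
        return memory[registers[arg]]
    if mode == "memory_indirect":     # a memory location holds the operand address
        return memory[memory[arg]]
    if mode == "base_index":          # address = register + offset from the instruction
        register, offset = arg
        return memory[registers[register] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")


registers = {"R1": 4}
memory = [10, 20, 30, 40, 50, 60, 2, 70]
for mode, arg in [("register", "R1"), ("immediate", 99), ("direct", 3),
                  ("register_indirect", "R1"), ("memory_indirect", 6),
                  ("base_index", ("R1", 1))]:
    print(mode, fetch_operand(mode, arg, registers, memory))
```

Note how the indirect modes need one extra lookup (register or memory) before the data is reached, which is the "direct or indirect route" mentioned in the summary.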
x86 Addressing Modes
Program addressing modes
Relative addressing
Direct addressing
Register indirect addressing
Data addressing modes
Several simple addressing modes
Composed addressing modes
Base-relative addressing modes
Stack-relative addressing modes
x86 Program Addressing Modes
Used to address instructions through jumps and calls
(normally instructions are addressed sequentially)
Relative addressing (jmp 5h)
The targeted instruction is in the memory, in the code
segment, at an address obtained as the sum between the
content of IP and an offset stored in the current instruction
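Direct addressing (jmp 56A80100h)
The targeted instruction is in the memory, in the code
segment, at an address specified in the current instruction
Register indirect addressing (jmp BX)
The targeted instruction is in the memory, in the code
segment, at an address specified in a register coded in the
current instruction code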
x86 Data Addressing Modes
Used to address data only
Register implicit addressing (mov AH, BL)
The targeted information is in a register:
8-bit register: AL, AH, BL, BH, CL, CH, DL, DH
16-bit register: AX, BX, CX, DX, SI, DI, SP, BP
Immediate addressing (mov AX, 1234h)
The targeted information is in the memory, in the code
segment, at the effective address IP + 1
Direct addressing (mov AX, [1234h])
The targeted information is in the memory, in the data
segment, at an effective address specified in the current
instruction
Indexed addressing (mov AX, [SI+20h])
The targeted information is in the memory, in the data
segment, at an effective address obtained as a sum between
the content of SI or DI and an offset stored in the current
instruction
Indirect implicit addressing (mov AX, [DI])
The targeted information is in the memory, in the data
segment, at the effective address stored in SI or DI
The targeted information is implicitly found in the data
segment
The targeted information can also be found in the code
segment, extended data segment or stack segment
A redirection prefix should be used
x86 Base-relative Addressing Modes
Direct base relative addressing (mov AX, [BX + 1234h])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX and
an offset stored in the current instruction
Indexed base relative addressing (mov AX, [BX + DI + 10h])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX, the
content of SI or DI and an offset stored in the current instruction
Implicit base relative addressing (mov AX, [BX+SI])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX and
the content of SI or DI
x86 Stack-relative Addressing Modes
Direct stack relative addressing (mov AX, [BP + 1234h])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP and
an offset stored in the current instruction
Indexed stack relative addressing (mov AX, [BP + DI + 10h])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP, the
content of SI or DI and an offset stored in the current instruction
Implicit stack relative addressing (mov AX, [BP+SI])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP and
the content of SI or DI
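The base-relative and stack-relative modes differ only in the base register and, as a consequence, in the implicit segment. A minimal sketch of the effective address computation follows; the function name `effective_address` and the dictionary of register values are hypothetical, and segment redirection prefixes are not modelled.

```python
def effective_address(regs, base=None, index=None, disp=0):
    """Return (implicit segment register, 16-bit effective address)."""
    if base not in (None, "BX", "BP"):
        raise ValueError("base must be BX, BP or None")
    if index not in (None, "SI", "DI"):
        raise ValueError("index must be SI, DI or None")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index]
    # BP-based addresses refer to the stack segment, all the others to the
    # data segment (assuming no redirection prefix is used).
    segment = "SS" if base == "BP" else "DS"
    return segment, ea & 0xFFFF   # the effective address wraps at 16 bits


regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0010, "DI": 0x0020}
print(effective_address(regs, base="BX", disp=0x1234))              # [BX + 1234h]
print(effective_address(regs, base="BX", index="DI", disp=0x10))    # [BX + DI + 10h]
print(effective_address(regs, base="BP", index="SI"))               # [BP + SI]
```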
3.4 The Instruction Set
Microprocessor Instruction Types
Data transfer instructions
set a register or a memory location to a fixed constant value
copy data from a memory location to a register, or vice versa
read and write data from I/O devices
Data processing instructions
arithmetic operations (add, subtract, multiply, divide, etc.)
logic operations (and, or, exclusive or, shift, rotate, etc.)
bitwise logic operations
compare operations
Control flow instructions
branch to another location in the program and execute instructions
there
conditional branch to another location if a certain condition holds
branch to another location, while saving the location of the next
instruction as a point to return to (a call)
Data Transfer Instructions
Two operands: a source and a destination
General idea: the source is copied at the destination
The source and the destination:
Can be registers, memory locations, constants, I/O ports
Are identified using various addressing modes
Must have the same size
Performance criterion: transfer as much data as possible
using an instruction with a small format
x86 Simple Data Transfer Instructions
MOV Move (Copy) Data
XCHG Exchange Data
LEA Load Effective Address
PUSH Push data in the Stack
POP Pop data out of the Stack
MOV Move (Copy) Data
Usage: MOV dest, src
Operands:
dest - general-purpose register, segment register (except CS)
or memory location
src - immediate value, general-purpose register, segment
register or memory location
Effects: Copies the source to the destination, overwriting
the destination's value: (dest) <- (src)
Flags: none
XCHG Exchange Data
Usage: XCHG dest, src
Arguments:
dest - register or memory location
src - register or memory location
Effects: Exchanges the source with the destination:
(dest) <-> (src)
Flags: none
Miscellaneous: two memory locations cannot be used in
one instruction
PUSH Push Operand in the Stack
Usage: PUSH src
Arguments: src - 16-bit immediate value, register or
memory location
Effects: Decrements the stack pointer by 2 and copies src on
top of the stack:
(SP) <- (SP) - 2
((SS):(SP)+1) <- (src high)
((SS):(SP)) <- (src low)
Flags: none
Miscellaneous: src must be a 16-bit value
POP Pop a word from the Stack
Usage: POP dest
Arguments:
dest - 16-bit register, segment register or memory location
Effects: Copies the element (16-bit) from the top of the
stack into dest and increments the stack pointer by 2:
(dest high) <- ((SS):(SP)+1)
(dest low) <- ((SS):(SP))
(SP) <- (SP) + 2
Flags: none
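The PUSH and POP effects can be traced with a small simulation. This is a sketch under stated assumptions: the class name `Stack8086` and the initial SS and SP values are hypothetical, and memory is a plain 1 MB byte array.

```python
class Stack8086:
    """Toy model of the x86 stack: 16-bit words, little-endian, growing down."""

    def __init__(self, ss=0x2000, sp=0x0100):
        self.mem = bytearray(1 << 20)   # 1 MB of byte-addressable memory
        self.ss, self.sp = ss, sp

    def _pa(self, offset):
        # Physical address of SS:offset
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF                   # (SP) <- (SP) - 2
        self.mem[self._pa(self.sp + 1)] = (value >> 8) & 0xFF  # high byte
        self.mem[self._pa(self.sp)] = value & 0xFF             # low byte

    def pop(self):
        high = self.mem[self._pa(self.sp + 1)]
        low = self.mem[self._pa(self.sp)]
        self.sp = (self.sp + 2) & 0xFFFF                   # (SP) <- (SP) + 2
        return (high << 8) | low


stack = Stack8086()
stack.push(0x1234)
stack.push(0xABCD)
print(hex(stack.pop()), hex(stack.pop()), hex(stack.sp))
```

The last value pushed is the first one popped, and SP returns to its initial value once every push has been matched by a pop.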
The Source and the Destination Arrays
The x86 architecture defines two implicit memory zones which
store two arrays of 8-bit or 16-bit numbers
The source array
Stored in the data segment (the segment with the address in DS)
The current element is at the effective address specified in SI
The destination array
Stored in the extended data segment (the segment with the address
in ES)
The current element is at the effective address specified in DI
The arrays are iterated from left-to-right or vice-versa based on
the value of the direction flag (DF)
x86 String / Array Instructions
MOVS Move String
LODS Load String
STOS Store String
SCAS Scan String
CMPS Compare String
STD Set Direction Flag
CLD Clear Direction Flag
MOVS Move String
Usage: MOVSB / MOVSW
Arguments: none
Effects:
movsb:
((ES):(DI)) <- ((DS):(SI))
(SI) <- (SI) +/- 1, (DI) <- (DI) +/- 1 (+ if DF=0, - if DF=1).
movsw:
((ES):(DI)) <- ((DS):(SI)), ((ES):(DI)+1) <- ((DS):(SI)+1)
(SI) <- (SI) +/- 2, (DI) <- (DI) +/- 2 (+ if DF=0, - if DF=1).
Flags: none
Miscellaneous: can be prefixed by rep, repe/repz,
repne/repnz
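A rough model of `rep movsb` helps to see how SI, DI, CX and DF cooperate. The function name `rep_movsb` is hypothetical and the sketch assumes the usual behaviour of the rep prefix (repeat while CX is not zero, decrementing CX after each element).

```python
def rep_movsb(mem, ds, si, es, di, cx, df=0):
    """Copy CX bytes from DS:SI to ES:DI; return the final (SI, DI, CX)."""
    step = -1 if df else 1          # DF=0: increment, DF=1: decrement
    while cx != 0:
        source = ((ds << 4) + si) & 0xFFFFF
        destination = ((es << 4) + di) & 0xFFFFF
        mem[destination] = mem[source]   # ((ES):(DI)) <- ((DS):(SI))
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


memory = bytearray(1 << 20)
memory[0x10000:0x10005] = b"hello"      # source array at DS:SI = 1000h:0000h
print(rep_movsb(memory, ds=0x1000, si=0, es=0x2000, di=0x10, cx=5))
print(bytes(memory[0x20010:0x20015]))
```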
LODS Load String
Usage: LODSB / LODSW
Arguments: none
Effects:
lodsb: Copies the current 8-bit element from the source string
to the accumulator and increments (if DF=0) or decrements
(if DF=1) the value in SI by 1:
(AL) <- ((DS):(SI)), (SI) <- (SI) +/- 1.
lodsw: Copies the current 16-bit element from the source
string to the accumulator and increments (if DF=0) or
decrements (if DF=1) the value in SI by 2:
(AL) <- ((DS):(SI)), (AH) <- ((DS):(SI)+1), (SI) <- (SI) +/- 2.
Flags: none
STOS Store String
Usage: STOSB / STOSW
Arguments: none
Effects:
stosb: Copies the value in the accumulator in the current 8-bit
element in the destination string and increments (if DF=0) or
decrements (if DF=1) the value in DI by 1:
((ES):(DI)) <- (AL), (DI) <- (DI) +/- 1.
stosw: Copies the value in the accumulator in the current 16-
bit element in the destination string and increments (if
DF=0) or decrements (if DF=1) the value in DI by 2:
((ES):(DI)) <- (AL), ((ES):(DI)+1) <- (AH), (DI) <- (DI) +/- 2.
Flags: none
Data Processing Instructions
An arithmetic operation is applied to one or several sources and
the result is stored in the destination
The arithmetic flags (CF, AF, ZF, PF, SF, OF) are modified!
The sources and the destination:
Can be registers, memory locations, constants, I/O ports
Are identified using various addressing modes
Must have the same size (exceptions: multiplication, division)
CISC processors characteristics:
Data processing uses an accumulator (one of the sources is also the
destination)
The sources and the destination are memory locations
Execution time depends on instruction complexity
Performance criterion: fast execution of complex data processing
operations
x86 Arithmetic Instructions
INC Increment
DEC Decrement
ADD Add
ADC Add with Carry
SUB Subtract
SBB Subtract with Borrow
MUL Multiply
DIV Divide
CMP Compare
CMP Compare two operands
Usage: CMP src1, src2
Arguments:
src1, src2 - 8-bit or 16-bit immediate value, register or memory
location;
Effects: Subtracts src2 from src1: (src1) - (src2). Flags are set in
the same way as the SUB instruction does, but the result of the
subtraction is not saved.
Flags: The CF, ZF, OF, SF, AF, and PF flags are modified according
to the result.
Misc:
usually the next operation would be a conditional jump to perform
an operation according to the result of the comparison;
only one memory argument is allowed and both arguments have to
be of the same size
ADD Integer Addition
Usage: ADD dest, src
Arguments:
dest - register or memory location
src - immediate, register or memory location; (two memory
operands cannot be used)
Effects: Adds the source to the destination:
(dest) <- (dest) + (src).
Flags: The CF, ZF, OF, SF, AF, and PF flags are set according
to the result.
Misc: no difference between signed and unsigned operands
ADC Add with Carry
Usage: ADC dest, src
Arguments: same as for ADD
Effects: Adds the carry flag (CF) and the source to the
destination: (dest) <- (dest) + (src) + (CF)
Flags: same as for ADD
Misc: same as for ADD
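ADC is typically paired with ADD to add numbers wider than the registers: ADD handles the low words and ADC the high words, consuming the carry produced by ADD. The sketch below illustrates this for 32-bit numbers on a 16-bit machine; `add16` and `add32` are hypothetical helper names.

```python
def add16(a, b, carry_in=0):
    """16-bit addition: return (result, carry flag)."""
    total = a + b + carry_in
    return total & 0xFFFF, total >> 16


def add32(x, y):
    """Add two 32-bit numbers using 16-bit additions only."""
    low, cf = add16(x & 0xFFFF, y & 0xFFFF)     # ADD on the low words
    high, cf = add16(x >> 16, y >> 16, cf)      # ADC on the high words
    return (high << 16) | low, cf


print(add32(0x0001FFFF, 0x00000001))
print(add32(0xFFFFFFFF, 0x00000001))
```

The second call overflows 32 bits, so the result is 0 with the final carry set.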
SBB Integer Subtraction with Borrow
Usage: SBB dest, src
Arguments:
dest - 8-bit or 16-bit register or memory location
src - 8-bit or 16-bit immediate value, register or memory
location;
Effects: Subtracts the carry flag and src from dest:
(dest) <- (dest) - (src) - (CF)
Flags: The CF, ZF, OF, SF, AF, and PF flags are modified
according to the result.
Misc: only one memory argument is allowed and the
arguments have to be of the same size
MUL Unsigned Multiplication of AL or AX
Usage: MUL src
Arguments:
src - 8-bit or 16-bit register or memory location.
Effects:
if src is an 8-bit value: multiplies the value stored in AL by src
and stores the result in AX:
(AX) <- (AL) * (src)
CF and OF are set to 0 if AH is 0, otherwise they are set to 1.
if src is a 16-bit value: multiplies the value stored in AX by src
and stores the result in DX concatenated with AX:
(DX:AX) <- (AX) * (src)
CF and OF are set to 0 if DX is 0, otherwise they are set to 1.
Flags: CF and OF are modified as mentioned above. The
rest of the flags are undefined.
DIV Unsigned Division
Usage: DIV src
Arguments:
src - 8-bit or 16-bit register or memory location;
Effects:
if src is an 8-bit value: divides by src the value stored in AX and
stores the remainder in AH and the quotient in AL:
(AH) <- (AX) mod (src), (AL) <- (AX) div (src)
if src is a 16-bit value: divides by src the value stored in DX
concatenated with AX and stores the remainder in DX and the
quotient in AX:
(DX) <- (DX:AX) mod (src), (AX) <- (DX:AX) div (src)
Flags: The CF, ZF, OF, SF, AF, and PF flags are undefined.
Misc:
if the quotient does not fit in 8 bits (16 bits) and cannot be stored in
AL (AX) then a divide overflow error will be thrown.
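The 16-bit form of DIV and its overflow condition can be sketched as follows. The function name `div16` is hypothetical, and Python exceptions stand in for the divide error the CPU would signal.

```python
def div16(dx, ax, src):
    """Unsigned division of DX:AX by a 16-bit src.

    Returns (quotient, remainder), i.e. the new (AX) and the new (DX).
    """
    if src == 0:
        raise ZeroDivisionError("divide error: src is zero")
    dividend = (dx << 16) | ax               # DX concatenated with AX
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFF:
        # The quotient does not fit in AX.
        raise OverflowError("divide overflow: quotient does not fit in AX")
    return quotient, remainder


print(div16(0x0000, 0x0007, 0x0002))
print(div16(0x0001, 0x0000, 0x0002))
```

A call such as `div16(0x0002, 0x0000, 0x0001)` raises `OverflowError`, because the quotient 20000h needs more than 16 bits.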
x86 Logic Instructions
NOT Complement
AND Logic AND
OR Logic OR
XOR Exclusive OR
SHL | SAL Shift Left (Arithmetic and Logic)
SHR Logic Shift Right
SAR Arithmetic Shift Right
ROL Rotate Left
ROR Rotate Right
RCL Rotate Left with Carry
RCR Rotate Right with Carry
TEST Compare using AND
NOT, OR, AND, XOR
I1 I2 OR
0 0 0
0 1 1
1 0 1
1 1 1
I1 I2 XOR
0 0 0
0 1 1
1 0 1
1 1 0
I1 I2 AND
0 0 0
0 1 0
1 0 0
1 1 1
I NOT
0 1
1 0
SHL, ROL, RCL
SHR and SAR
Control Flow Instructions
Exceptions in the sequential execution of instructions:
Branch to a different instruction
Conditional branch to a different instruction
Can be used to create decision structures
Conditional skip of the current/following instruction
Can be used to create inline decision structures
Counter update + conditional branch (loop)
Can be used to create repetitive structures
Return address save + branch to a different instruction (call)
Can be used for subprogram calls
x86 Control Flow Instructions
Unconditional branch: JMP jump
Conditional branches:
For unsigned numbers: JA|JNBE, JAE|JNB|JNC, JB|JNAE, etc.
For signed numbers: JG|JNLE, JGE|JNL, JL|JNGE, etc.
For other type of comparisons: JP, JE, JS, JO, etc.
Counter update + conditional branches:
LOOP, LOOPZ, LOOPNZ
Call and return branches:
CALL, RET
x86 Conditional Jump Instructions
Instruction Usage Condition Description
JA | JNBE JA label (CF)=0 AND (ZF)=0 Jump to label if above | not below or equal
JAE | JNB | JNC JAE label (CF)=0 Jump to label if above or equal | not below | not carry
JB | JNAE | JC JB label (CF)=1 Jump to label if below | not above or equal | carry
JBE | JNA JBE label (CF)=1 OR (ZF)=1 Jump to label if below or equal | not above
JG | JNLE JG label (SF)=(OF) AND (ZF)=0 Jump to label if greater | not lower or equal
JGE | JNL JGE label (SF)=(OF) Jump to label if greater or equal | not lower
JL | JNGE JL label (SF)!=(OF) Jump to label if lower | not greater or equal
JLE | JNG JLE label (SF)!=(OF) OR (ZF)=1 Jump to label if lower or equal | not greater
JE | JZ JE label (ZF)=1 Jump to label if equal | zero
JNE | JNZ JNE label (ZF)=0 Jump to label if not equal | not zero
JNO JNO label (OF)=0 Jump to label if not overflow
JNP | JPO JNP label (PF)=0 Jump to label if not parity | parity odd
JNS JNS label (SF)=0 Jump to label if not signed | positive
JO JO label (OF)=1 Jump to label if overflow
JP | JPE JP label (PF)=1 Jump to label if parity | parity even
JS JS label (SF)=1 Jump to label if signed | negative
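The difference between the unsigned (above/below) and signed (greater/lower) jumps becomes visible when the flags produced by a comparison are computed explicitly. The sketch below models a 16-bit CMP and a few of the conditions from the tables; `cmp_flags`, `CONDITIONS` and `jump_taken` are hypothetical names, and AF and PF are left out for brevity.

```python
def cmp_flags(a, b, bits=16):
    """Return the CF, ZF, SF and OF flags produced by CMP a, b."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": int(a < b),                                   # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),     # signed overflow
    }


CONDITIONS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JG": lambda f: f["SF"] == f["OF"] and f["ZF"] == 0,
    "JL": lambda f: f["SF"] != f["OF"],
    "JE": lambda f: f["ZF"] == 1,
}


def jump_taken(jump, a, b):
    """Would `jump` be taken right after CMP a, b?"""
    return CONDITIONS[jump](cmp_flags(a, b))


# FFFFh is 65535 as an unsigned number, but -1 as a signed number.
print(jump_taken("JA", 0xFFFF, 0x0001), jump_taken("JG", 0xFFFF, 0x0001))
```

The same bit pattern is "above" 1 but not "greater" than 1, which is why the jump must be chosen according to how the program interprets its data.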
x86 Loop Instructions
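LOOP LOOP label (CX) != 0 Decrement CX (without modifying the flags) and jump to label if CX is not zero
LOOPE | LOOPZ LOOPE label (CX) != 0 AND (ZF)=1 Decrement CX (without modifying the flags) and jump to label if CX is not zero and ZF is one.
LOOPNE | LOOPNZ LOOPNE label (CX) != 0 AND (ZF)=0 Decrement CX (without modifying the flags) and jump to label if CX is not zero and ZF is zero.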
CALL Call Subprogram
Usage: CALL dest
Arguments:
dest (target) address of the first instruction in the called
subprogram; can be an immediate value, a general purpose register
or a memory location;
Effects:
The address of the next instruction is saved in the stack and the
instruction pointer is set to the target address (the CPU performs a
jump to the subprogram):
(SP) <- (SP) - 2, ((SS):(SP)+1) <- (IP high), ((SS):(SP)) <- (IP low)
(IP) <- (dest)
Flags: none
Misc: Usually there is a RET instruction in the subprogram to
return to the instruction after the call.
RET Return from Subprogram
Usage: RET
Arguments: none
Effects:
The CPU pops the value in the top of the stack and uses it to
jump back to the caller program:
(IP high) <- ((SS):(SP)+1), (IP low) <- ((SS):(SP))
(SP) <- (SP) + 2.
Flags: none
Misc: Usually the address was placed in the stack by a call
instruction and the return is made to the address that
follows the call instruction.
x86 Subprogram Calls
The CALL and RET instructions do not have input/output
parameters as arguments
There are several conventions for sending I/O parameters
Through General Purpose Registers
Through the Stack
Through the Memory
3.5 Summary
Summary
The x86 Registers
Memory Management
Memory Access. Addressing Modes
The Instruction Set
4.1 Introduction
RISC Philosophy. Motivation
DARPA's VLSI Project ('70s - '80s)
how efficient are the current microprocessors?
provided research funding to university-based teams
to improve the state of the art in microprocessor design
Studies in CPU design showed that
simplified instructions can provide higher performance if this
simplicity enables much faster execution of each instruction
a CPU with a small, highly-optimized set of instructions, can
be more efficient than a CPU with a more specialized set of
instructions
RISC: Reduced Instruction Set Computer
a type of microprocessor architecture that utilizes a small,
highly-optimized set of instructions instead of a more
specialized set of instructions
The first RISC projects (mid '70s and early '80s)
IBM: the IBM 801 architecture
Stanford University: Stanford MIPS architecture
University of California, Berkeley: Berkeley RISC I and II
commercialized as the SPARC architecture
Other well-known RISC architectures:
ARM, Atmel AVR, Intel i860/i960, PA-RISC, PowerPC
RISC Principles (I)
Hardwired Control Unit
One cycle execution time
Each instruction is hardwired to be executed in a single cycle
CPI (clocks per instruction) = 1
reduced -> the amount of work any single instruction
accomplishes is reduced
Pipelining is used
Technique that allows for simultaneous execution of parts of
instructions
Leads to a more efficient instructions processing
Large number of general purpose registers
Prevents large amounts of interactions with memory
RISC Principles (II)
Small number of instructions
Fixed instruction format(s)
Decreases the time needed to decode the instructions
Fixed instruction size
Small number of addressing modes
Leads to a small size of the addressing mode code
Memory access only through LOAD/STORE instructions
Data processing instructions cannot use memory operands
Helps to achieve the CPI = 1 goal
4.2 The Registers
A Large Number of GPRs. Benefits
Higher processing speed thanks to a lower number of
memory accesses
Hardware data structures (stacks and queues) created with
general purpose registers
Input/output parameters to/from subprograms are
sent/received through GPRs
Increased chip uniformity factor
RISC Register Set Characteristics
A large number of general purpose registers (more than 32)
The size of the registers is the size of the usual operands
Identical, multifunctional general purpose registers
Allows any register to be used in any context
Simplifies compiler design
Not all the physical registers may be available at all times
Logical registers are mapped into physical registers by the
CPU
Register Set Organization
A single set of registers
Comprising at least 32 physical registers
No logical registers
Any physical register is accessed by decoding a register code
The registers are accessed similarly to the linearly organized
memory
Multiple sets of logical registers in a single
set of physical registers
Each set of logical registers
Comprises at least 32 registers
Can be accessed using a pointer
Is allocated to a different program
The logical <-> physical mapping is bijective
Multiple sets of logical registers, partially
overlapped, in a single set of physical registers
Each set of logical registers
Comprises at least 32 registers
Can be accessed using a pointer
Is allocated to a different program
The logical <-> physical mapping is not
bijective anymore!
The overlapping portions are called register
windows
Multiple sets of logical registers in multiple sets of physical
registers: useful for multiprocessing
Berkeley RISC II Register Set
8 sets of logical registers in a single set of 138 physical registers
Each set of logical registers (the work-set for each program)
comprises:
10 registers for global variables - shared with all programs
10 registers for local variables
6 registers for I/O parameters - shared with the calling program
6 registers for parameters - shared with the called program
1 set of physical registers (R)
8 sets of logical registers (A ... H)
Mapping examples:
R0 = A0 = ... = H0
...
R9 = A9 = ... = H9
R10 = A10 = H26
...
R15 = A15 = H31
R16 = A16
...
R25 = A25
R26 = A26 = B10
...
R31 = A31 = B15
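The mapping examples are consistent with a circular arrangement in which each window is shifted by 16 physical registers relative to the previous one. The sketch below encodes that reading; the function name `physical_register`, the numbering of the windows A ... H as 0 ... 7, and the formula itself are inferred from the examples, not taken from the Berkeley RISC II documentation.

```python
GLOBALS = 10       # logical registers 0..9, shared by all windows
WINDOW_STEP = 16   # 10 local registers + 6 registers shared with the next window
WINDOWS = 8        # logical sets A..H, numbered 0..7 here


def physical_register(window, logical):
    """Map logical register `logical` (0..31) of a window to a physical register."""
    if not (0 <= window < WINDOWS and 0 <= logical < 32):
        raise ValueError("window must be 0..7 and logical must be 0..31")
    if logical < GLOBALS:
        return logical                     # globals map to R0..R9 in every window
    ring_size = WINDOWS * WINDOW_STEP      # 128 windowed registers used circularly
    return GLOBALS + (window * WINDOW_STEP + logical - GLOBALS) % ring_size


A, B, H = 0, 1, 7
print(physical_register(A, 26), physical_register(B, 10))   # shared by A and B
print(physical_register(A, 10), physical_register(H, 26))   # shared by A and H
```

With 10 global registers and a ring of 128 windowed registers, this accounts for the 138 physical registers mentioned above.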
4.3 The Instruction Set
RISC Instruction Set Characteristics
Fewer instructions than in CISC instruction set
Simpler instructions than in CISC instruction set
Instruction types
Memory access instructions (load / store)
Arithmetic and logic processing instructions
Always with register or immediate operands
Typically without an accumulator
Subprogram calls use register windows for parameter passing
I/O instructions
RISC Typical Addressing Modes
Register implicit addressing
Immediate addressing
Direct (absolute) addressing
Base-relative direct addressing
Base-relative indexed addressing
Relative (to PC) addressing
Intel i860 / i960 Instruction Examples
Note: in these examples s1, s2 and d are general purpose
registers
Signed integer addition
adds s1, s2, d ;(d) <- (s1) + (s2)
Memory access with two pointers
ldl.l s1(s2), d ;(d) <- ((s2) + (s1))
Memory access using a constant
st.s s1, #const(s2) ;((s2) + const) <- (s1)
Left shift with three operands
shl s1, s2, d ;(d) <- (s2) * 2^(s1)
ARM Instruction Examples
Note: in these examples s1, s2, s3 and d are general purpose
registers
Logic AND with three operands
and d, s1, s2 ;(d) <- (s1) & (s2)
Memory access with pre-indexing
ldr d, [s1+#const]! ;(d) <- ((s1) + const)
;(s1) <- (s1) + const
Memory access with post-indexing
str s1, [d], #8 ;((d)) <- (s1)
;(d) <- (d) + 8
Multiply and add (four operands)
mla d, s1, s2, s3 ;(d) <- (s1) * (s2) + (s3)
4.4 The Timing and Control Unit
Instruction format for:
Intel x86 (CISC) microprocessors
1 - 15 bytes, depending on instruction complexity
Intel i860 (RISC) microprocessors
4 bytes, regardless of the instruction complexity
Stanford MIPS (RISC) microprocessors
4 bytes, regardless of the instruction complexity
Fixed instruction format -> simpler Instruction Decoder
-> simpler Memory Addressing Unit
Simpler Instruction Decoder
The Timing and Control Unit is
Micro-programmed for CISC microprocessors
Hardwired for RISC microprocessors
Example: 32bit x 32bit multiplication
Micro-programmed Control Unit
Uses the same ALU and MACU + a micro-program
Hardwired Timing and Control Unit
Uses a dedicated hardwired circuit
32b x 32b CISC Multiplication
result <- 0
for i = 1 to 32 do
  if multiplier(i) = 1
    result <- result + multiplicand
  end_if
  multiplicand <- multiplicand * 2
end_for
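The micro-program above is the classic shift-and-add algorithm. A runnable Python transcription is given below as a sketch; the function name `shift_add_multiply` is hypothetical, and bit 0 is taken to be the least significant bit of the multiplier.

```python
def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Multiply two unsigned `bits`-wide numbers using only shifts and additions."""
    mask = (1 << bits) - 1
    multiplicand &= mask
    multiplier &= mask
    result = 0
    for i in range(bits):              # one iteration per multiplier bit
        if (multiplier >> i) & 1:      # multiplier(i) = 1
            result += multiplicand     # result <- result + multiplicand
        multiplicand <<= 1             # multiplicand <- multiplicand * 2
    return result                      # up to 2 * bits wide


print(shift_add_multiply(6, 7))
```

The loop always runs 32 times, which illustrates why a micro-programmed multiplication takes many clock cycles, while a dedicated hardwired multiplier performs the same additions in parallel.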
32b x 32b RISC Multiplication
Micro-programmed Timing and Control Unit (CISC case)
Uses the same ALU and MACU + a memory of micro-programs
Each instruction is associated with a micro-program which
coordinates the timing of elementary actions
Variable number of states / clock cycles depending on instruction
complexity
Hardwired Timing and Control Unit (RISC case)
Uses a dedicated hardwired circuit
Each instruction is associated with a dedicated hardwired circuit
Fixed number of states / clock cycles regardless of instruction
complexity
Main drawback: the lack of flexibility; adding a new instruction
requires modifications in the hardware design
Micro-programmed vs. Hardwired TCU
Premises
All instructions (simple/complex) are executed in the same
amount of time / clock cycles
All instructions are executed in a sequence of stages; example:
fetch the instruction from the memory
decode the instruction
read the operands
execute the instruction
write the result into a register
Pipelining concept: at any moment in time the
microprocessor executes simultaneously several different
stages for several pipelined instructions; this leads to CPI=1
RISC Instructions Pipelining
Pipeline example
If the execution of every instruction can be broken up in N
states, then one can build a pipeline structure with N stages
This leads to the simultaneous execution of N instructions
Pipelining concept: at any moment in time several
instructions are in progress of execution, in various stages
Instructions pipelining is possible because of the fact that all
instructions are executed in the same amount of time
Instructions pipelining leads to the desired CPI = 1
Note that pipelining does not work continuously (exceptions)
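The effect of pipelining on CPI can be estimated with simple arithmetic: the first instruction needs N cycles to pass through an N-stage pipeline, and every following instruction completes one cycle later. The sketch below assumes an ideal pipeline with one-cycle stages; the function names and the `stall_cycles` parameter (extra cycles lost to the exceptions mentioned above) are hypothetical.

```python
def pipeline_cycles(instructions, stages, stall_cycles=0):
    """Clock cycles needed to run `instructions` through an ideal pipeline."""
    if instructions == 0:
        return 0
    # The first instruction fills the pipeline; each of the others
    # completes one cycle after its predecessor.
    return stages + (instructions - 1) + stall_cycles


def cpi(instructions, stages, stall_cycles=0):
    """Average clock cycles per instruction."""
    return pipeline_cycles(instructions, stages, stall_cycles) / instructions


for count in (1, 10, 100, 10000):
    print(count, pipeline_cycles(count, stages=5), cpi(count, stages=5))
```

For long instruction sequences the CPI approaches 1, whereas without pipelining the same five-stage instructions would need 5 cycles each.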
RISC Instructions Pipelining. Summary
4.5 Compiler Particularities (Issues)
Compiler: a computer program that transforms source code
written in a high-level programming language to a lower
level language (e.g., assembly language or machine code)
The efficiency of RISC architectures, obtained through
many optimizations and simplifications, also involves strong
software-layer constraints
=> RISC machines are shipped with dedicated compilers
RISC compilers issues
Register allocation
Optimal allocation of variable values to logical registers
Pipeline correct execution and efficiency management
Data dependence, jump and load/store instructions management
RISC Compilers
Register allocation
the process of determining which values should be placed
into which registers and at what times during the execution of
the program
values, not variables are allocated to various registers, because
distinct uses of the same variable can be assigned to different
registers without affecting the logic of the program
Local register allocation
allocation within a very small piece of code, typically a basic
block
Global register allocation
assigns registers within an entire function
Register Allocation
Although RISC machines have many registers there are
programs which require more registers than actually exist.
The register allocator must insert spill code to store some
values back into memory for part of their lifetime.
Storing/loading values in/from the memory is time
inefficient => optimal register allocation needs to be done!
Minimizing the runtime cost of spill code is a crucial
consideration in register allocation.
Register Allocation - The Issue
Values used in a function: A, B, C, ..., F
Lifetime of the values represented as timelines over a
sequence of CPU states
Available registers: R1, R2, R3
The interference / color graph
Nodes: the values A, B, C, ..., F
Edges: indicate lifetime overlapping of the values
Labels outside nodes: the register allocated for the value
The Interference / Color Graph
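Register allocation by graph coloring can be sketched in two steps: build the interference graph from the lifetimes, then assign to each value a register not used by its neighbours. The lifetimes below are invented for illustration (the original timelines are not reproduced here), the function names are hypothetical, and the greedy strategy shown is just one simple heuristic, not an optimal allocator.

```python
def interference_graph(lifetimes):
    """Connect two values if their lifetimes (start, end) overlap."""
    graph = {value: set() for value in lifetimes}
    names = list(lifetimes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (start_a, end_a), (start_b, end_b) = lifetimes[a], lifetimes[b]
            if start_a <= end_b and start_b <= end_a:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def allocate(graph, registers):
    """Greedy coloring; a value mapped to None would have to be spilled."""
    allocation = {}
    # Handle the most constrained values (most neighbours) first.
    for value in sorted(graph, key=lambda v: -len(graph[v])):
        used = {allocation[n] for n in graph[value] if n in allocation}
        free = [r for r in registers if r not in used]
        allocation[value] = free[0] if free else None
    return allocation


# Hypothetical lifetimes, expressed as (first state, last state).
lifetimes = {"A": (1, 3), "B": (2, 5), "C": (4, 7),
             "D": (6, 9), "E": (8, 10), "F": (1, 10)}
graph = interference_graph(lifetimes)
print(allocate(graph, ["R1", "R2", "R3"]))
```

With fewer registers than overlapping values, some entries become `None`; those are the values for which spill code would be needed.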
The Pipeline and Jumps Management
The problem:
The instructions following JMP should not be introduced in
the pipeline
The instructions to be introduced in the pipeline are known
only after the execution of JMP (the jump address is
computed)
The solution:
The compiler should insert several NOP instructions after
every JMP instruction
The drawback: NOPs introduce delays => CPI > 1
Optimizations (code reordering) are sometimes possible:
Before reordering:
ADD r3, r2, r1
AND r0, r5, r6
JMPZ r0, label
NOP
NOP
NOP
NOP
XOR r5, r3, r2
....
label: SUB r1, r5, r6
After reordering:
AND r0, r5, r6
JMPZ r0, label
ADD r3, r2, r1
NOP
NOP
NOP
XOR r5, r3, r2
....
label: SUB r1, r5, r6
ADD does not interfere with the execution of JMPZ,
therefore it can be moved downwards, instead of a NOP.
AND cannot be moved because the jump is taken / not
taken depending on its result!
The Pipeline and Data Dependency
ADD r1, r2, r7
AND r6, r1, r3
The problem:
The value computed by ADD is not available in the
destination register (R1) when AND needs to read it
The AND reads an old value of R1 and the program does not
work correctly anymore
The solution:
The compiler should insert several NOP instructions after
every ADD instruction in order to delay the next instruction
The drawback: NOPs introduce delays => CPI > 1
Optimizations (code reordering) are sometimes possible:
Before reordering:
MUL r8, r2, r1
SUB r0, r5, r6
ADD r1, r2, r7
NOP
NOP
AND r6, r1, r3
....
After reordering:
ADD r1, r2, r7
MUL r8, r2, r1
SUB r0, r5, r6
AND r6, r1, r3
....
MUL and SUB do not interfere with the execution of ADD,
therefore they can be moved downwards, instead of the NOPs
Data dependency appears in the case of data processing
instructions, but also for memory access instructions:
LOAD r0, mem
SUB r6, r1, r0
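The compiler-side fix for data dependencies can be sketched as a pass that inserts NOPs only where a result is read too early. This is a simplified illustration: the function names are hypothetical, the first operand is assumed to be the destination and the others the sources, and `delay=2` assumes that a result becomes readable two instructions after the one that produces it.

```python
def parse(instruction):
    """Split 'ADD r1, r2, r7' into ('ADD', ['r1', 'r2', 'r7'])."""
    opcode, _, rest = instruction.partition(" ")
    operands = [item.strip() for item in rest.split(",")] if rest else []
    return opcode, operands


def insert_nops(program, delay=2):
    """Insert NOPs so that no instruction reads a register too early."""
    output = []
    for instruction in program:
        _, operands = parse(instruction)
        sources = set(operands[1:])
        needed = 0
        # Look back at the last `delay` instructions already emitted.
        for distance, previous in enumerate(reversed(output[-delay:]), start=1):
            prev_opcode, prev_operands = parse(previous)
            if prev_opcode != "NOP" and prev_operands and prev_operands[0] in sources:
                needed = max(needed, delay - distance + 1)
        output.extend(["NOP"] * needed)
        output.append(instruction)
    return output


print(insert_nops(["ADD r1, r2, r7", "AND r6, r1, r3"]))
print(insert_nops(["ADD r1, r2, r7", "XOR r8, r2, r4",
                   "SUB r0, r5, r6", "AND r6, r1, r3"]))
```

In the second program two independent instructions already separate ADD from AND, so no NOP is inserted; this is the situation that code reordering tries to create.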
Pipeline Correct Execution and
Efficiency Management
Generally the pipeline technique can be successfully used
to execute several instruction stages simultaneously
Leads to CPI = 1
There are cases in which the compiler has to take special
measures (introduce NOPs) to assure the correct execution
of the program
Leads to CPI > 1
Some of the above measures can be optimized to increase
efficiency (CPI -> 1)
4.6 Summary
Economic advantages (translate to lower cost)
Smaller Timing and Control Unit (more than 10 times)
Increased chip uniformity factor
Shorter development time for the TCU
Technical advantages
Higher processing speed
Lower power consumption
Lower probability of hardware design errors
Smaller number of memory accesses
Simpler compiler development (on one side)
RISC Advantages
Economic drawbacks
Appeared after the CISC processors
Technical drawbacks
Longer programs (require more program memory)
Lack of flexibility for the TCU: adding a new instruction
requires modifications in the hardware design
More complex compiler development (on one side)
RISC Drawbacks
5.1 I/O Devices Organization
I/O ports are characterized by address and content
The content of a port is linked to an external peripheral
Writing a port = Sending data to the peripheral
Reading a port = Receiving data from the peripheral
I/O ports can be accessed as
Memory locations, using memory addressing instructions
Ports, using dedicated I/O instructions
I/O Ports and Peripherals
IN dest, port
Reads data from the port and stores it into dest
OUT port, src
Writes the data from src to the port
In CISC processors only the accumulator can be used as
source and destination in the dedicated I/O instructions
Dedicated I/O instructions involve
Specific machine cycles
Specific signals on the control bus (IOR and IOW)
Dedicated I/O Instructions
Port map is smaller than the memory map
The addressing modes used for the dedicated I/O
instructions are very restrictive: direct and register-indirect
Main advantage: port access is faster
Example for Intel x86
64k ports, one byte each
consecutive one-byte ports can be accessed as one-word port
in AL, 0Fh; in AX, DX
out 10h, AL; out DX, AX
Dedicated I/O Instructions
The I/O ports are mapped within the main memory and
are regarded as regular memory locations
Port access is done with regular memory addressing
instructions; consequently:
The same machine cycles are used
The same signals on the control bus (MEMR and MEMW)
The same addressing modes are used
Main advantage: port access is simpler
Drawbacks: port access is slower, a part of the memory map
is wasted on ports
Memory Mapped Ports
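The two ways of reaching a port can be contrasted with a small Python model. This is only a sketch under stated assumptions: the class name `ToyBus`, the 64 KB memory size and the memory-mapped window starting at address FF00h are hypothetical choices made for illustration and do not describe any particular processor.

```python
class ToyBus:
    """Toy model of a bus with a memory map and a separate port map."""

    def __init__(self, mmio_base=0xFF00, mmio_size=16):
        self.memory = bytearray(64 * 1024)  # regular memory locations
        self.ports = {}                     # separate port map (dedicated I/O)
        self.mmio_base = mmio_base          # hypothetical memory-mapped window
        self.mmio_size = mmio_size
        self.mmio_ports = {}                # ports living inside the memory map

    def _is_mapped_port(self, address):
        return self.mmio_base <= address < self.mmio_base + self.mmio_size

    # Dedicated I/O instructions: separate address space (IOR / IOW signals).
    def io_read(self, port):
        return self.ports.get(port, 0)

    def io_write(self, port, value):
        self.ports[port] = value & 0xFF

    # Regular memory instructions (MEMR / MEMW signals): addresses inside
    # the window reach a peripheral instead of a memory location.
    def mem_read(self, address):
        if self._is_mapped_port(address):
            return self.mmio_ports.get(address, 0)
        return self.memory[address]

    def mem_write(self, address, value):
        if self._is_mapped_port(address):
            self.mmio_ports[address] = value & 0xFF
        else:
            self.memory[address] = value & 0xFF


bus = ToyBus()
bus.io_write(0x10, 0x5A)     # plays the role of: out 10h, AL
bus.mem_write(0xFF00, 0xA5)  # plays the role of a store to a mapped port
print(hex(bus.io_read(0x10)), hex(bus.mem_read(0xFF00)))
```

In this model the 16 addresses of the window are no longer usable as memory, which mirrors the drawback that a part of the memory map is spent on ports.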
5.2 Typical I/O Techniques
I/O technique: microcomputer-peripheral synchronization
technique
Types:
Synchronous (with the current program) techniques
The microcomputer-peripheral communication is initiated by
the CPU (by executing specific instructions)
Asynchronous techniques
The microcomputer-peripheral communication is initiated by
the peripheral independently of the program executed by the
CPU
Typical I/O Techniques
Polling is a synchronous I/O technique: the communication
with the peripherals is initiated by the CPU
The Polling Technique
Procedure:
The CPU periodically reads the state of the peripherals connected to
the ports (reads a status byte from the port)
The CPU initiates a data transfer if the peripheral is ready
Notes:
The CPU actions are triggered by instructions in the program
The status byte is read through the data bus
Main advantage: no additional hardware is required
Drawbacks:
The CPU wastes time on polling the state of the peripherals
Potential communication requests can be lost
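The polling procedure can be sketched in Python as follows. This is an illustrative model, not the x86 code of the course: `read_status` and `write_data` are hypothetical callables standing in for the IN and OUT instructions, the ready flag is assumed to be the most significant bit of the status byte, and the `max_polls` limit is an added assumption so that the loop cannot hang forever.

```python
READY_MASK = 0x80  # assumed: most significant bit of the status byte = ready


def poll_and_send(read_status, write_data, data, max_polls=1000):
    """Poll a peripheral and send `data` once it reports ready.

    Returns the number of status reads that were needed, or None if the
    peripheral did not become ready within `max_polls` reads.
    """
    for polls in range(1, max_polls + 1):
        if read_status() & READY_MASK:  # CPU reads the status byte
            write_data(data)            # CPU initiates the data transfer
            return polls
    return None


# Simulated peripheral that becomes ready on the third status read.
statuses = iter([0x00, 0x00, 0x80])
sent = []
polls_used = poll_and_send(lambda: next(statuses), sent.append, 0x41)
print(polls_used, sent)
```

Every status read before the successful one is CPU time spent without transferring any data, which is the main drawback listed above.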
An x86 microprocessor communicates with two I/O ports
The most significant bit of the status byte indicates the
availability of the peripheral to receive data
The Polling Technique. Example
pollPort24: in AL, 24h
shl AL, 1
jnc pollPort24
out 24h, AX
pollPort37: in AL, 37h
shl AL, 1
jnc pollPort37
out 37h, AX
Do you note any potential
problems in this code?
How would you optimize it?
Interrupt-driven I/O is an asynchronous I/O technique: the
communication with the CPU is initiated by the peripheral
Interrupt-driven I/O
Procedure:
The peripheral sends an interrupt signal (through a port) on a
dedicated terminal of the microprocessor
By doing this the peripheral says it is ready for data transfer
If it is programmed to respond to interrupt signals, the CPU
interrupts its current activity and starts the data transfer
Notes:
The interrupt signal is received on a dedicated pin
The current program is halted and has nothing to do with the
data transfer
Main advantage: the CPU responds very fast to interrupts
Interrupt request (IRQ): the signal sent by the peripheral to
the CPU (on a dedicated pin) to request access to the
system's resources
Interrupt request response: a sequence of steps performed
by the CPU in response to the IRQ
Interrupt service routine (ISR) or interrupt handler: a
dedicated program (sequence of instructions) through
which the CPU responds to the IRQ of a specific peripheral
Interrupt-driven I/O. Definitions
Step 1. The CPU finishes the execution of the current instruction.
Step 2. The CPU saves the flags register in the stack.
Step 3. The CPU saves some of the general purpose registers in
the stack.
Step 4. The CPU saves the return address (the address of the next
instruction) in the stack.
Step 5. The interrupt flag (IF) is reset to disable any other
interrupts.
Step 6. The CPU jumps to the interrupt service routine (ISR).
After the ISR is executed the CPU restores all the information
saved in the stack, returns to the main program and continues its execution.
The Interrupt Request Response
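Steps 2 to 6 and the return from the ISR can be modelled with a small Python sketch. The dictionary layout of `cpu`, the function names and the choice of saving only AX are hypothetical; the IF mask 0200h corresponds to bit 9 of the x86 flags register and is used here only as an example.

```python
IF_MASK = 0x0200  # interrupt flag: bit 9 of the x86 flags register


def enter_interrupt(cpu, isr_address, saved_registers=("AX",)):
    """Save the CPU state on the stack, clear IF and jump to the ISR."""
    cpu["stack"].append(cpu["flags"])            # step 2: save the flags
    for name in saved_registers:                 # step 3: save some registers
        cpu["stack"].append(cpu["regs"][name])
    cpu["stack"].append(cpu["ip"])               # step 4: save return address
    cpu["flags"] &= ~IF_MASK                     # step 5: disable interrupts
    cpu["ip"] = isr_address                      # step 6: jump to the ISR


def leave_interrupt(cpu, saved_registers=("AX",)):
    """Restore the saved state in reverse order and resume the program."""
    cpu["ip"] = cpu["stack"].pop()
    for name in reversed(saved_registers):
        cpu["regs"][name] = cpu["stack"].pop()
    cpu["flags"] = cpu["stack"].pop()


cpu = {"flags": 0x0202, "regs": {"AX": 0x1234}, "ip": 0x0105, "stack": []}
enter_interrupt(cpu, 0x2000)
flags_inside_isr = cpu["flags"]
cpu["regs"]["AX"] = 0          # the ISR is free to overwrite AX
leave_interrupt(cpu)
print(hex(flags_inside_isr), hex(cpu["ip"]), hex(cpu["regs"]["AX"]))
```

Because the stack is a last-in, first-out structure, the restore sequence has to mirror the save sequence exactly.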
5.3 Typical Interrupt Techniques
DMA is an interrupt-based I/O technique which allows a
peripheral to access the memory directly (without CPU
intervention)
Direct Memory Access (DMA)
Procedure:
The peripheral sends an interrupt signal to the DMA controller
The DMA controller sends a Bus Request (BUSRQ) signal to
the microprocessor
The CPU finishes the current machine cycle and interrupts its
activity; this is equivalent to freeing the system bus
The DMA controller is left in charge of the microcomputer and
Generates addresses for a sequence of memory locations
Manages the data transfer between the port to which the
peripheral is connected and the sequence of memory locations
Notes:
DMA interrupts have the highest priority
DMA interrupts cannot be disabled by the user
Direct Memory Access (DMA)
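The part of the procedure handled by the DMA controller (address generation and data movement) can be sketched in Python. The function name `dma_transfer` and its parameters are hypothetical, and the bus request handshake with the CPU is deliberately left out of the model.

```python
def dma_transfer(memory, start_address, port_reader, count):
    """Copy `count` bytes from a peripheral port into consecutive memory
    locations, as a DMA controller does while it owns the system bus."""
    for offset in range(count):
        address = start_address + offset         # controller generates address
        memory[address] = port_reader() & 0xFF   # port -> memory, CPU not used
    return start_address + count                 # first address after the block


memory = bytearray(32)
incoming = iter(b"DATA")       # bytes delivered by the simulated peripheral
end_address = dma_transfer(memory, 8, lambda: next(incoming), 4)
print(bytes(memory[8:12]), end_address)
```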
The NMI response procedure is the same as the one
described for the general case
Notes:
NMIs are received on another dedicated terminal (NMI)
NMIs cannot be disabled by the user
In the case of NMIs the CPU finishes the execution of the
current instruction before responding to the interrupt
The ISR's address is predefined
Non-maskable Interrupts (NMI)
As opposed to NMIs the maskable interrupts can be
disabled (ignored) by the user
The interrupt procedure is the same as the one described
for the general case
Notes:
Maskable interrupts are received on a dedicated pin (INT)
The ISR's address can be:
Predefined
Provided by the peripheral
Selected by the CPU based on a code sent by the peripheral
Maskable Interrupts (INT)
The maskable interrupts for which the ISR's address is selected
by the CPU based on a code sent by the peripheral are called
vectored interrupts.
Interrupt vector: the complete address of the ISR
Interrupt Vector Table (IVT): a table (stored in the memory)
which contains the interrupt vectors of all the ISRs
Interrupt vector selection procedure
The peripheral sends a code to the CPU; each peripheral is initially
allocated a unique code
The code is used by the CPU as an index in the IVT (to select the
corresponding interrupt vector)
Vectored Interrupts
Notes
The size of the interrupt vectors is the size of a complete
address
The size of the code sent by the peripheral depends on how
many interrupts the processor can respond to
The size of the IVT is derived by multiplying the size of one
vector by the maximum number of vectors in the IVT
The IVT is usually located in the memory at a predefined
address (usually address 0x00)
Vectored Interrupts
The size of an interrupt vector: 4 bytes (SA: 2 B, EA: 2 B)
The code sent by the peripheral has 8 bits => max 256 interrupt
types = max 256 interrupt vectors => max 256 ISRs
The size of the IVT: 256 vectors x 4B = 1024B
The IVT is stored at the beginning of the memory
Vectored Interrupts. Example (x86)
Software interrupts are special instructions in the x86
instruction set
The execution of such an instruction is identical to the CPU
response to a vectored interrupt
The code which is sent by the peripheral in the case of
vectored interrupts is provided as an operand or is implied
x86 software interrupts:
INT [code]
Interrupt with the provided code or with code=3 (if not provided)
INTO
Interrupt with code=4 only if OF is set
x86 Software Interrupts
The execution of INT 5h involves the following steps:
1. The flags register (F) is saved in the stack
2. IF and TF are set to zero (other interrupts are disabled)
3. The return address is saved in the stack
4. The interrupt vector is selected and the jump to the ISR is
performed:
CS is loaded with the value found in the memory at addresses
4*5+3 and 4*5+2
IP is loaded with the value found in the memory at addresses
4*5+1 and 4*5+0
x86 Software Interrupts. Example
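The vector selection in step 4 can be checked with a short Python sketch. The helper names `read_word` and `interrupt_vector` and the sample table entry are hypothetical; the layout (IP in the two lower bytes, CS in the two upper bytes, low byte first) follows the addresses listed above.

```python
def read_word(memory, address):
    """Read a 16-bit word stored with the low byte at the lower address."""
    return memory[address] | (memory[address + 1] << 8)


def interrupt_vector(memory, code):
    """Return (CS, IP) from the IVT entry selected by the interrupt code."""
    base = 4 * code                    # each vector occupies 4 bytes
    ip = read_word(memory, base)       # bytes 4*code+0 and 4*code+1
    cs = read_word(memory, base + 2)   # bytes 4*code+2 and 4*code+3
    return cs, ip


memory = bytearray(1024)                          # 256 vectors x 4 B
memory[20:24] = bytes([0x34, 0x12, 0x00, 0xF0])   # made-up entry for INT 5h
cs, ip = interrupt_vector(memory, 5)
print(hex(cs), hex(ip))
```

For INTO the same lookup applies with code 4, so the entry starts at address 4*4 = 10h.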
The execution of INTO involves the following steps:
1. The OF flag is verified; if it is 1 then the following steps are
made, else the CPU continues with the next instruction
2. The flags register (F) is saved in the stack
3. IF and TF are set to zero (other interrupts are disabled)
4. The return address is saved in the stack
5. The interrupt vector is selected (INTO always refers to a
fixed vector) and the jump to the ISR is performed:
CS is loaded with the value found in the memory at addresses 13h
and 12h
IP is loaded with the value found in the memory at addresses 11h
and 10h
x86 Software Interrupts. Example
The invention of the transistor and the integrated circuit
Moore's Law
The invention of the microprocessor and the microcontroller
General purpose microprocessors
Microcontrollers
Special purpose microprocessors (DSPs, comm processors, ...)
Introduction
The Structure of a Microcomputer
Instruction Execution Example
The CPU is reset and starts executing instructions from a
predefined address in the memory (100h)
Reset -> Execute instructions from address 100h
Overview of a CISC, General Purpose
Microprocessor Core
General Purpose Registers (GPRs)
Memory Data Register (MDR)
Memory Address Registers (MAR)
Arithmetic and Logic Unit (ALU)
Memory Addressing Control Unit
Timing and Control Unit (TCU)
Instruction Execution Timing
Typically, the execution of an instruction has several stages:
Fetch: the instruction code is read from the memory
Decode: the instruction code is decoded
Execute: the instruction is executed (might comprise operands fetch)
Write: the result is written in a register or a memory location
The instruction execution stages are called machine cycles
Any instruction is executed in one or several machine cycles (depending on
its complexity)
In a machine cycle the CPU executes sequentially several elementary
actions accomplishing a clear, well-defined task
Elementary actions are executed once every clock cycle
An internal clock signal is generated based on an external quartz oscillator
A CPU state is a physical time period equal to the duration of a clock cycle
In a state, the CPU executes one elementary action or two independent
elementary actions (at the same time)
The x86 Architecture
The x86 Registers
Types, sizes, usage, implicit functions, accessibility, etc.
Memory Management
Linear vs. segmented memory models, the x86 memory
model
Memory Access. Addressing Modes
What is an addressing mode? Comparison, examples, etc.
The Instruction Set
Instruction types, formats, examples, comparison, etc.
x86 Registers. Summary
x86 has very few registers
4 general purpose registers, 2 index registers, 2 pointer registers
Some of the x86 registers are multifunctional
x86 has 4 segment registers
special functions in memory management
All the registers are user-accessible; one exception: IP
The size of the registers is usually the size of the Internal Data
Bus
x86 Memory Segmentation. Summary
The memory can be regarded as a sequence of memory locations
Each memory location stores an 8-bit number and has a unique
20-bit address, called physical address
The x86 CPU regards the memory as being composed of 64k
segments comprising 64k locations each
The x86 CPU uses a 16-bit segment address to select a segment
and a 16-bit effective address to identify a memory location
inside the segment
The translation between the logical organization of the memory
in segments and the physical address is done as follows:
PA = SA x 10h + EA (the segment address is shifted left by 4 bits)
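The address translation can be tried out with a minimal Python sketch. It assumes the usual 8086 rule in which the 16-bit segment address is shifted left by 4 bits (multiplied by 10h) before the effective address is added, and that the result wraps around at 1 MB because the physical address has only 20 bits. The function name `physical_address` is hypothetical.

```python
def physical_address(segment, offset):
    """Translate a segment address (SA) and an effective address (EA),
    both 16 bits wide, into a 20-bit physical address (PA)."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep only 20 bits


print(hex(physical_address(0x1234, 0x0010)))
print(hex(physical_address(0x1000, 0x2350)))
```

Both calls designate the same memory location, which illustrates that one physical address can be reached through several segment and effective address pairs.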
x86 Addressing Modes
Program addressing modes
Relative addressing
Direct addressing
Data addressing modes
Several simple addressing modes
Composed addressing modes
Base-relative addressing modes
Stack-relative addressing modes
x86 Simple Data Transfer Instructions
XCHG Exchange Data
LEA Load Effective Address
PUSH Push data in the Stack
POP Pop data out of the Stack
x86 String / Array Instructions
MOVS Move String
LODS Load String
STOS Store String
SCAS Scan String
CMPS Compare String
STD Set Direction Flag
CLD Clear Direction Flag
x86 Arithmetic Instructions
INC Increment
DEC Decrement
ADD Add
ADC Add with Carry
SUB Subtract
SBB Subtract with Borrow
MUL Multiply
DIV Divide
CMP Compare
x86 Control Flow Instructions
Unconditional branch: JMP (jump)
Conditional branches:
For unsigned numbers: JA|JNBE, JAE|JNB|JNC, JB|JNAE, etc.
For signed numbers: JG|JNLE, JGE|JNL, JL|JNGE, etc.
For other types of comparisons: JP, JE, JS, JO, etc.
Counter update + conditional branches:
LOOP, LOOPZ, LOOPNZ
Call and return branches:
CALL, RET
Fundamental principles
The set of registers
Specific characteristics, organization, examples, etc.
The instruction set
Characteristics, typical addressing modes, examples
The timing and control unit (TCU)
Micro-programmed vs. hardwired, instruction pipelining
Compiler particularities
Register allocation, pipeline issues
RISC Architectures. Summary
RISC Principles (I)
One cycle execution time
Each instruction is hardwired to be executed in a single cycle
CPI (clocks per instruction) = 1
"Reduced" means that the amount of work any single instruction
accomplishes is reduced
Pipelining is used
Technique that allows for simultaneous execution of parts of
instructions
Leads to a more efficient instructions processing
Large number of general purpose registers
Prevents large amounts of interactions with memory
RISC Principles (II)
Small number of instructions
Fixed instruction format(s)
Decreases the time needed to decode the instructions
Fixed instruction size
Small number of addressing modes
Leads to a small size of the addressing mode code
Memory access only through LOAD/STORE instructions
Data processing instructions cannot use memory operands
Helps to achieve the CPI = 1 goal
RISC Register Set Characteristics
A large number of general purpose registers (more than 32)
The size of the registers is the size of the usual operands
Identical, multifunctional general purpose registers
Allows any register to be used in any context
Simplifies compiler design
Not all the physical registers may be available at all times
Logical registers are mapped into physical registers by the
CPU
RISC Instruction Set Characteristics
Fewer instructions than in CISC instruction set
Simpler instructions than in CISC instruction set
Instruction types
Memory access instructions (load / store)
Arithmetic and logic processing instructions
Always with register or immediate operands
Typically without an accumulator
Subprogram calls use register windows for parameter passing
I/O instructions
If the execution of every instruction can be broken up in N
states, then one can build a pipeline structure with N stages
This leads to the simultaneous execution of N instructions
Pipelining concept: at any moment in time several
instructions are in progress of execution, in various stages
Instruction pipelining is possible because all
instructions are executed in the same amount of time
Instruction pipelining leads to the desired CPI = 1
Note that pipelining does not work continuously (exceptions)
RISC Instructions Pipelining. Summary
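The effect of pipelining on CPI can be estimated with a small Python sketch. It assumes an idealized pipeline in which every stage takes one clock cycle, so k instructions on N stages need N + (k - 1) cycles, plus any stall cycles caused by exceptions to the continuous flow. The function names and the sample numbers are hypothetical.

```python
def pipeline_cycles(num_instructions, num_stages, stall_cycles=0):
    """Clock cycles needed by an ideal pipeline: the first instruction
    needs `num_stages` cycles, each further one completes a cycle later."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


def cpi(num_instructions, num_stages, stall_cycles=0):
    """Average clocks per instruction for the whole instruction sequence."""
    cycles = pipeline_cycles(num_instructions, num_stages, stall_cycles)
    return cycles / num_instructions


print(cpi(1000, 4))                     # close to 1 for long sequences
print(cpi(1000, 4, stall_cycles=200))   # stalls push the CPI above 1
```

Without pipelining the same 1000 instructions would need 4 x 1000 cycles, that is CPI = 4.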
Register allocation
the process of determining which values should be placed
into which registers and at what times during the execution of
the program
values, not variables are allocated to various registers, because
distinct uses of the same variable can be assigned to different
registers without affecting the logic of the program
Local register allocation
allocation within a very small piece of code, typically a basic
block
Global register allocation
assigns registers within an entire function
Register Allocation
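The idea that values, not variables, are assigned to registers can be illustrated with a minimal Python sketch. It uses a simple greedy scan over live ranges, which is an assumption made for illustration and not a method named in the course; the function name, the `(first_use, last_use)` tuples and the "spill" marker are hypothetical.

```python
def allocate_registers(live_ranges, num_registers):
    """Greedily assign registers to values.

    live_ranges maps a value name to (first_use, last_use) positions in
    the code. A register is reused once the value it held is dead.
    Values that find no free register are marked as "spill".
    """
    assignment = {}
    active = []                          # (last_use, register) pairs in use
    free = list(range(num_registers))    # register numbers still available
    for value, (start, end) in sorted(live_ranges.items(),
                                      key=lambda item: item[1][0]):
        # Release the registers of values that died before this one starts.
        still_active = []
        for last_use, reg in active:
            if last_use < start:
                free.append(reg)
            else:
                still_active.append((last_use, reg))
        active = still_active
        if free:
            free.sort()
            reg = free.pop(0)            # lowest numbered free register
            assignment[value] = f"r{reg}"
            active.append((end, reg))
        else:
            assignment[value] = "spill"  # the value has to live in memory
    return assignment


# Two distinct uses of variable "a" (a1, a2) are treated as separate values.
ranges = {"a1": (0, 2), "b": (1, 4), "a2": (3, 5)}
print(allocate_registers(ranges, 2))
```

In this example the second value of "a" can reuse the register of the first one, because the first value is dead by the time the second one is created.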
Microprocessors I/O Techniques
I/O devices organization
ports and peripherals; dedicated instructions; memory
mapped ports
The polling technique
Definitions
Interrupt types
Vectored interrupts, x86 interrupts, software interrupts
Polling is a synchronous I/O technique: the communication
with the peripherals is initiated by the CPU
An x86 microprocessor communicates with two I/O ports
The most significant bit of the status byte indicates the
availability of the peripheral to receive data
The Polling Technique. Example
pollPort24: in AL, 24h
shl AL, 1
jnc pollPort24
out 24h, AX
pollPort37: in AL, 37h
shl AL, 1
jnc pollPort37
out 37h, AX
Do you note any potential
problems in this code?
How would you optimize it?
Interrupt-driven I/O is an asynchronous I/O technique: the
communication with the CPU is initiated by the peripheral
The Interrupt Vectors Table (IVT)
Compute the size of the interrupt vectors table provided that
The size of the memory is 1 MB (1)
Each memory location stores 8 bits (2)
The processor uses linear memory organization (3)
The code sent by the peripheral has 8 bits (4)
(1) & (2) => there are 2^20 memory locations => the PA has 20
bits (5)
(3) => the processor uses physical addresses directly (6)
(5) & (6) => the size of an interrupt vector is 3 bytes (24 bits, the
smallest whole number of bytes that can hold 20 bits) (7)
(4) => there are 256 interrupt vectors in the table
(4) & (7) => the IVT has 256 vectors x 3B = 768B
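The derivation above can be reproduced with a short Python sketch. It assumes a linear memory model in which an interrupt vector is a physical address rounded up to a whole number of bytes; the function name `ivt_size_bytes` and its parameters are hypothetical.

```python
def ivt_size_bytes(memory_locations, code_bits):
    """Size of the interrupt vector table for a linear memory model.

    memory_locations: number of addressable one-byte locations
    code_bits: width of the code sent by the peripheral
    """
    address_bits = (memory_locations - 1).bit_length()  # 2^20 locations -> 20
    vector_bytes = (address_bits + 7) // 8               # round up to bytes
    num_vectors = 2 ** code_bits                         # one vector per code
    return num_vectors * vector_bytes


print(ivt_size_bytes(2 ** 20, 8))
```

With the segmented x86 model the vector holds a 2-byte segment address and a 2-byte effective address instead, which is why the x86 table needs 256 x 4 B = 1024 B for the same memory size.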
