Horia Cucu

Speech & Dialogue Research Laboratory


Faculty of Electronics, Telecommunications and Information Technology
University POLITEHNICA of Bucharest
Introduction to Microprocessors
Historical Background
Microprocessors Evolution Tree
Typical Applications
Educational Need
Administrative Issues
22.05.2014 2 Microprocessors Architecture
Historical Background
1947: Invention of the transistor
1959: Invention of the integrated circuit (IC)
1965: Birth of Moore's Law
1971: Development of the first microprocessor
1976: Introduction of the first microcontroller
Microprocessors and Microcontrollers
A microprocessor is a CPU-on-a-chip
A microcontroller is a computer-on-a-chip
Microprocessors Evolution Tree
General Purpose Microprocessors: 4004, 8008, 8080, 8085, 8086, 80286, 80386, 80486, Pentium, Itanium, RISC, others
Microcontrollers: 8048, 8051, PIC, AVR, others
Special Purpose Microprocessors: DSPs, Comm processors, others
Typical Applications
General purpose microprocessors: used to create computers
PCs, Laptops, Workstations
Servers, Super-computers (32-bit/64-bit powerful computers)
Special purpose microprocessors
Digital Signal Processing (DSP) processors
Multimedia applications
Communication processors
Networking equipment (switches, routers, etc.)
Microcontrollers: used to implement embedded systems
consumer electronics (toys, cameras, robots)
consumer products (washing machines, microwave ovens, etc.)
instrumentation (oscilloscopes, medical equipment)
process control (data acquisition and control)
communication (telephone sets, answering machine, etc.)
office appliances (fax machines, printers, etc.)
multimedia (smart-phones, PDAs, tablets, teleconferencing
equipment)
automotive industry (onboard computers)
The Educational Need - a Big Question
Microprocessors Course Outline
1. The Structure of a Microcomputer
2. Overview of a CISC, General Purpose Microprocessor Core
3. The x86 Architecture
4. RISC Architectures
5. Input/Output Strategies
Administrative Issues
Laboratory
Objective: highlight the architectural attributes for the x86
Microprocessors
Sessions: 5 teaching labs + one evaluation session
Bibliography
C. Burileanu, Microprocesoarele x86 - o abordare software, Grupul pentru microinformatică, Cluj-Napoca, 1999
Communication through the Moodle framework (Arhitectura
Microprocesoarelor - H. Cucu, Password: Microprocesor)
Lecture slides (contain only a brief summary)
Laboratory documentation
Evaluation results
Evaluation
Continuous assessment (for which the student receives a grade, N_lab) consists of 2 mandatory tests and an optional final evaluation.
No component of the continuous assessment can be retaken.
Grading:
The 2 tests taken during the laboratory sessions are graded from 0 to 10.
If the average of the 2 grades is < 5: the student must retake the entire course in the following year.
If the average of the 2 grades is >= 5: the student can opt for:
taking the final evaluation; in this case N_lab is between 5 and 10;
going directly to the exam; in this case N_lab = 5.
Final exam in the summer session:
Oral exam.
The student receives a grade, N_exam, between 0 and 10.
The exam can be retaken in September.
Final grade: M = (N_lab + N_exam) / 2
computed by truncation for 4 <= M < 5 and by rounding for all other values.
Definitions
Block Diagram of a Microcomputer
A microcomputer is a general purpose device that can be programmed
to carry out a set of arithmetic and/or logical operations.
Functional Components
CPU: the hardware block which processes data and
controls the system
Memory: the hardware block which stores data in a
sequence of memory locations
I/O devices: hardware blocks that form the interface
between the microcomputer and the external world
Buses: the connections between the above blocks
The von Neumann Principles
Both data and instructions are stored in the memory
The contents of the memory are accessed by location (address)
The microprocessor is the CPU of the microcomputer; its role is
to process data and control the system
The instructions are fetched from the memory and executed
sequentially by the CPU
I/O ports are used to communicate with other devices
The three hardware blocks are interconnected by the system bus
The Memory: Basic Principles
Memory: a sequence of memory locations used to store information
Each memory location:
stores an 8-bit number, a byte of data
is identified by a unique number, called address
The memory is accessed and organized by the CPU only
The CPU can choose to create logical subdivisions within the
memory (called pages or segments)
The memory map: all memory locations that can be
addressed by the CPU (not necessarily implemented)
The Memory: A Closer Look
The size of the memory is directly linked to the address
size through the following equation:
$\text{memorySize} = 2^{\text{addressSize [bits]}}$
Example 1:
using a 2-bit address, one can form 4 different addresses:
00, 01, 10, and 11, for up to 4 different memory locations
consequently, a memory with a 2-bit address will
comprise 4 memory locations (4 bytes).
Example 2:
using a 20-bit address, one can form $2^{20}$ different addresses,
corresponding to $2^{20}$ different memory locations
consequently, a memory with a 20-bit address will comprise
$2^{20}$ memory locations (1 MB).
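The relation between address size and memory size can be checked with a few lines of Python. This is only an illustrative sketch: the function name `memory_size` is introduced here and is not part of the course material, and it assumes one byte per memory location, as stated in the basic principles.

```python
def memory_size(address_bits: int) -> int:
    """Return the number of byte-sized locations addressable with address_bits bits."""
    return 2 ** address_bits


for bits in (2, 16, 20):
    print(f"{bits}-bit address -> {memory_size(bits)} memory locations")
```

With a 20-bit address the function returns 1,048,576 locations, which is the 1 MB from Example 2.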
The Memory: Contents Significance
This could be a 16-bit result
This could be an instruction
These could be the first two elements in
an array of 8-bit numbers
The significance of the information is given by the programmer.
The memory doesn't know the significance of the information it stores!
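A small Python sketch can make this point concrete: the same two bytes are read once as two 8-bit array elements and once as a single 16-bit number. The byte values and the least-significant-byte-first ordering are assumptions chosen for illustration, not taken from the slide's figure.

```python
import struct

# Two consecutive memory locations (values chosen only for illustration).
raw = bytes([0x05, 0x17])

# Interpretation 1: the first two elements of an array of 8-bit numbers.
as_array = list(raw)

# Interpretation 2: one 16-bit number, least significant byte stored first.
as_word = struct.unpack("<H", raw)[0]

print(as_array)
print(hex(as_word))
```

Nothing in `raw` says which interpretation is the right one; only the program that reads it decides.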
Input/Output Devices
I/O Devices: hardware blocks that form the interface
between the microcomputer and the external world
I/O Devices can be regarded as a set of I/O Ports
Each I/O port:
can be used to send an 8-bit/16-bit/32-bit number to an external device
can be used to receive an 8-bit/16-bit/32-bit number from an external device
is identified by a unique number, called port address
The port map: all ports that can be addressed by the CPU
(not necessarily implemented)
The System Bus
Bus: a set of physical connections that link several hardware
blocks; these connections are used for information transfer
The CPU, Memory and I/O Devices are connected through
a single System Bus with three components:
A bidirectional Data Bus
Transfers data (operands, results, etc.) and instructions
A unidirectional Address Bus
Through this bus the CPU sends addresses to the Memory and
I/O Devices
A bidirectional Control Bus
Transfers command and control signals from/to the CPU
The Software Component
The microcomputer executes instructions organized in
computer programs, namely the software
Two main categories:
The Operating System: a set of programs which facilitate the
user's access to the system's resources
User Software: a set of programs specifically created by the user
to achieve a certain task
Summary
The CPU: executes instructions (processes data) and controls the system
The Memory: stores both the data and the instructions
The I/O Devices: interconnect the microcomputer with the outside world
Information Representation in Computer Systems
Information is stored using electronic circuits, called flip-flops
(or bistables), that have two stable states: on/off
The state of a bistable can be used to represent a bit (i.e.
binary digit: 0, 1) or a boolean value (true, false)
Data types with more than two possible values are stored
using sequences of bits:
Byte (B): a sequence of 8 bits; can store at most $2^{8}$ (256) distinct values
Word (w): a sequence of 16 bits; can store at most $2^{16}$ distinct values
Double word (dw): a sequence of 32 bits; can store at most $2^{32}$ distinct values
Numbers representation
Unsigned (positive) integer numbers
Natural binary representation
Signed integer numbers
Sign & magnitude representation
1's complement representation
2's complement representation
Signed real numbers
Fixed point representation
Floating point representation
Integer numbers representation
Decimal value | Sign and magnitude | 1's complement | 2's complement
5 | natural binary: 00000101 | natural binary: 00000101 | natural binary: 00000101
-5 | natural binary: 00000101, flip the sign bit: 10000101 | natural binary: 00000101, flip all bits: 11111010 | natural binary: 00000101, flip all bits: 11111010, add 1: 11111011
12 | natural binary: 00001100 | natural binary: 00001100 | natural binary: 00001100
-12 | natural binary: 00001100, flip the sign bit: 10001100 | natural binary: 00001100, flip all bits: 11110011 | natural binary: 00001100, flip all bits: 11110011, add 1: 11110100
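The three signed representations can be reproduced with a short Python sketch. The function names and the default width of 8 bits are assumptions made for this illustration; each function follows the recipe used in the table (flip the sign bit, flip all bits, or flip all bits and add 1).

```python
def sign_magnitude(value: int, bits: int = 8) -> str:
    """Natural binary of |value|, with the sign bit set for negative values."""
    pattern = abs(value)
    if value < 0:
        pattern |= 1 << (bits - 1)
    return format(pattern, f"0{bits}b")


def ones_complement(value: int, bits: int = 8) -> str:
    """Natural binary of |value|, with all bits flipped for negative values."""
    mask = (1 << bits) - 1
    pattern = abs(value)
    if value < 0:
        pattern = ~pattern & mask
    return format(pattern, f"0{bits}b")


def twos_complement(value: int, bits: int = 8) -> str:
    """Natural binary of |value|; for negative values flip all bits and add 1."""
    mask = (1 << bits) - 1
    pattern = abs(value)
    if value < 0:
        pattern = (~pattern + 1) & mask
    return format(pattern, f"0{bits}b")


for number in (5, -5, 12, -12):
    print(number, sign_magnitude(number), ones_complement(number), twos_complement(number))
```

The sketch does not check that the value fits in the chosen number of bits; it is meant only for the small values used in the table.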
Real numbers representation
Fixed point representation
A fixed sequence of bits is used to represent the integer part
Two's complement representation
A fixed sequence of bits is used to represent the fractional part
Natural binary representation
Floating point representation
A fixed sequence of bits is used to represent the mantissa
Two's complement representation
A fixed sequence of bits is used to represent the exponent
Natural binary representation
Example: $\text{real number} = \text{mantissa} \times 2^{\text{exponent}}$
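Both ideas can be tried out in Python. The sketch below is illustrative only: the split with 8 fractional bits and the helper names `to_fixed` and `from_fixed` are assumptions, and only non-negative values are handled in the fixed point part. The floating point part uses the standard library functions `math.frexp` and `math.ldexp`, which split a number into mantissa and exponent and put them back together.

```python
import math

FRACTION_BITS = 8  # assumed format: the lowest 8 bits hold the fractional part


def to_fixed(value: float) -> int:
    """Encode a non-negative real number as a fixed point integer pattern."""
    return round(value * (1 << FRACTION_BITS))


def from_fixed(raw: int) -> float:
    """Decode a fixed point pattern back into a real number."""
    return raw / (1 << FRACTION_BITS)


raw = to_fixed(3.25)
print(hex(raw), from_fixed(raw))

# Floating point: real number = mantissa * 2 ** exponent
mantissa, exponent = math.frexp(6.5)
print(mantissa, exponent, math.ldexp(mantissa, exponent))
```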
Characters representation
Coding conventions: ASCII, UTF-8, UTF-16, Unicode
Programs representation
Instructions are represented using sequences of bytes
Some processors have fixed-size instructions
8086 has variable-size instructions (1-6 bytes)
The instruction codes
are formed of several fields:
one instruction-type field
none, one or several data fields
none, one or several address fields
are associated with mnemonics (to be used in programming)
Example: add AX, 8017h <=> 051780h
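The `add AX, 8017h` example can be rebuilt byte by byte. The sketch below assumes the layout visible in the example: one instruction-type byte (05h) followed by a 16-bit data field stored least significant byte first. The function name is hypothetical and the sketch covers this single instruction form only.

```python
ADD_AX_IMM16 = 0x05  # instruction-type byte, as it appears in the example


def encode_add_ax_imm16(immediate: int) -> bytes:
    """Build the instruction code: type field, then the 16-bit data field (low byte first)."""
    return bytes([ADD_AX_IMM16]) + immediate.to_bytes(2, "little")


code = encode_add_ax_imm16(0x8017)
print(code.hex().upper() + "h")
```

The data field appears as 17h 80h in memory because the low byte of 8017h is stored first.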
The binary, decimal and hexadecimal bases
Any sequence of bits can also be represented as:
a decimal number (number in base 10)
can be written as a sequence of decimal digits (0, 1, ..., 9)
a hexadecimal number (number in base 16)
can be written as a sequence of hexadecimal digits (0, 1, ..., 9, A,
B, C, D, E and F)
Hexadecimal numbers representation conventions:
the h suffix: 1A44h
the 0x prefix: 0x1A44
Conversion algorithms: binary <-> decimal <-> hexadecimal
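The conversions can be sketched in Python with the two classic algorithms: repeated division by the base (number to digits) and repeated multiply-and-add (digits to number). The function names are introduced here for illustration; Python's built-in `int(text, base)` and `format` do the same job.

```python
DIGITS = "0123456789ABCDEF"


def to_base(value: int, base: int) -> str:
    """Convert a non-negative integer to a digit string by repeated division."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def from_base(text: str, base: int) -> int:
    """Convert a digit string to an integer by repeated multiply-and-add."""
    value = 0
    for digit in text.upper():
        value = value * base + DIGITS.index(digit)
    return value


number = from_base("1A44", 16)
print(number, to_base(number, 2), to_base(number, 16) + "h")
```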
2.1 Von Neumann Architecture Reminder and Example
Block Diagram of a Microcomputer
The CPU: executes instructions (processes data) and controls the system
The Memory: stores both the data and the instructions
The I/O Devices: interconnect the microcomputer with the outside world
Instruction Execution Example
1. The CPU is reset and starts executing instructions from a predefined address in the memory (100h)
2. The CPU sends the address of this first instruction (100h) through the Address Bus and a MEM-READ signal through the Control Bus
3. The Memory receives the MEM-READ signal and reads the address from the Address Bus
4. The Memory finds the instruction (instruction #1) in the memory location(s) with the corresponding address (100h)
5. The Memory sends the instruction through the Data Bus and sends an ACK signal through the Control Bus
6. The CPU receives the ACK signal and reads the instruction from the Data Bus
7. The CPU decodes the instruction to "understand" what it has to do next. Let's suppose that it has to add the value 50h to the value stored in the memory location with the address 2000h
8. The CPU sends the address (2000h) on the Address Bus and sends a MEM-READ signal through the Control Bus
9. The Memory receives the MEM-READ signal and reads the address from the Address Bus
10. The Memory finds the data (85h) in the memory location with the corresponding address (2000h)
11. The Memory sends the data (85h) through the Data Bus and sends an ACK signal through the Control Bus
12. The CPU receives the ACK signal and reads the data from the Data Bus
13. The CPU temporarily stores the data in a register
14. The CPU adds the value 50h to the register (the result will be D5h)
15. The CPU sends the result (D5h) through the Data Bus, the address (2000h) through the Address Bus and a MEM-WRITE signal through the Control Bus
16. The Memory receives the MEM-WRITE signal, reads the address (2000h) from the Address Bus, reads the result (D5h) from the Data Bus and stores the result into the corresponding memory location
17. The CPU continues by executing the next instruction
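The read-modify-write part of this example can be mimicked with a very small Python model. It is a sketch under simplifying assumptions: the class and function names are hypothetical, the bus signals are reduced to method calls, the ACK handshake is implicit in each call returning, and data values are limited to 8 bits.

```python
class Memory:
    """A byte-addressable memory; unwritten locations read as 0."""

    def __init__(self):
        self.cells = {}

    def read(self, address: int) -> int:
        # Models MEM-READ: address arrives on the Address Bus, data leaves on the Data Bus.
        return self.cells.get(address, 0)

    def write(self, address: int, value: int) -> None:
        # Models MEM-WRITE: address and data arrive on their buses and the value is stored.
        self.cells[address] = value & 0xFF


def add_immediate_to_memory(memory: Memory, address: int, immediate: int) -> int:
    """Read the operand, add the immediate value in a register, write the result back."""
    register = memory.read(address)
    register = (register + immediate) & 0xFF
    memory.write(address, register)
    return register


memory = Memory()
memory.write(0x2000, 0x85)
result = add_immediate_to_memory(memory, 0x2000, 0x50)
print(f"{result:02X}h")
```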
2.2 The Set of General Purpose Registers
CPU Registers
Register: a small amount of storage inside the CPU
Implemented as a set of N synchronized bistables
Stores N bits of data
Highest access speed among all storage options
Several types of registers:
General vs. special purpose (dedicated) registers
Physical vs. logical registers
User-accessible vs. non user-accessible registers
General Purpose Registers
General purpose registers (GPRs)
Set of equally-sized registers used to store temporary data
(operands/results) needed in the execution of the program
User-accessible (architectural attributes)
Implemented as physical or logical registers
The size of the GPRs: a performance criterion
Equal to the size of the Internal Data Bus
The number of GPRs: a performance criterion
A larger number of GPRs => faster, more compact programs,
ease of programming, ...
General Purpose Registers
MUX (multiplexer): outputs one of the data inputs
(depending on the address inputs)
Internal Data Bus: extension of the External Data Bus
inside the CPU
Special Purpose Registers
Special purpose registers
Dedicated registers that can be used only for specific purposes
Size depends on the particular role of the register
Some are user-accessible (architectural attributes), some not
Examples:
Data register (DR) and Address register (AR)
Accumulator (A)
Status (Flags) register (F)
Instruction Pointer (IP)
Stack Pointer (SP)
2.3 The interface between the CPU and the System Bus
The Data Register
and the Address Register
DR (data register): the CPU Data Bus interface
The data in DR are available to all the hardware blocks
connected on the Data Bus
The size of DR is the size of the Data Bus
DR is not an architectural attribute
AR (address register): the CPU Address Bus interface
The address in AR is available to all the hardware blocks
connected on the Address Bus; only the CPU writes in AR
The size of AR is the size of the Address Bus
AR is not an architectural attribute
2.4 The Arithmetic and Logic Unit (ALU)
The Arithmetic and Logic Unit
The Arithmetic and Logic Unit (ALU): a digital circuit that performs
integer arithmetic operations: add, subtract, increment, etc.
logical operations: and, or, xor, not, clear, shift, rotate, etc.
The inputs to the ALU
Data to be processed (one or two integer numbers)
The operation to be performed (specified by the Control Unit)
Possibly some status flags
The outputs of the ALU
The operation result(s) are placed in the Accumulator or on the
Internal Data Bus
The status flags are updated after each operation
The Status Register
The Status Register (also called Flags Register)
A collection of flag bits, which store information regarding
the state of the processor
Arithmetic and logic flags
Bits encoding the status of the previous arithmetic/logic
operation
Used and updated by the ALU
Other types of flags
Interrupt enable flag
Supervisor flag
Direction flag
Typical Arithmetic and Logic Flags
Carry flag (CF): signals an arithmetic carry or borrow for
unsigned numbers
Parity flag (PF): signals that the number of ones in the least
significant byte of the result is even
Zero flag (ZF): signals that the result is 0
Sign flag (SF): signals that the most significant bit of the result
is set (this is the sign bit in two's complement representation)
Overflow flag (OF): signals an arithmetic overflow for signed
numbers
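How these flags follow from a result can be sketched for an 8-bit addition. The function name and the dictionary layout are assumptions made for this illustration; the flag definitions are the ones listed above.

```python
def add8_with_flags(a: int, b: int):
    """Add two 8-bit values and compute the typical arithmetic flags."""
    raw = a + b
    result = raw & 0xFF
    flags = {
        "CF": raw > 0xFF,                               # carry out of bit 7 (unsigned overflow)
        "PF": bin(result).count("1") % 2 == 0,          # even number of ones in the result byte
        "ZF": result == 0,                              # result is zero
        "SF": bool(result & 0x80),                      # most significant bit is set
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),     # operands share a sign, result differs
    }
    return result, flags


for a, b in ((0x85, 0x50), (0x7F, 0x01), (0xFF, 0x01)):
    result, flags = add8_with_flags(a, b)
    print(f"{a:02X}h + {b:02X}h = {result:02X}h", flags)
```

The second pair shows a signed overflow without a carry (7Fh + 01h), the third a carry without a signed overflow (FFh + 01h).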
The Accumulator and the Shift Register
The Accumulator: a special purpose register
Stores one of the operands before the operation
Stores the result of the operation
Size equal to the size of the general purpose registers
Is user-accessible (architectural attribute)
The Shift Register: a special purpose register
Used by the ALU to perform shift and rotate operations
Size double the size of the general purpose registers
Is not user-accessible
2.5 The Memory Addressing Control Unit (MACU)
The Memory Addressing Control Unit
Hardware block that computes the physical address needed to
identify information in the Memory or I/O Ports
Receives input from the Internal Data Bus
Places its output (a physical address) in the Address Register
Functionality classification
Instruction addressing (in the program memory)
Sequentially, instruction after instruction
Non-sequentially, through jumps
Data addressing (in the data memory)
Elementary data addressing
Stack addressing
Data arrays addressing
Memory Management Techniques
Linear Memory Organization
The memory is regarded as a single block of memory locations
The memory is addressed directly, using a physical address
Memory Segmentation
The memory is logically divided into segments (not necessarily equal-sized,
possibly overlapping sections)
The memory is addressed using a segment address and an offset
Memory Paging
The memory is logically divided into pages (equal-sized, non-overlapping,
strictly concatenated sections)
The memory is addressed using a page address and an offset
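The sketch below shows how a physical address could be formed in the segmented and paged cases. It rests on assumptions that the text above does not state: the segmented variant uses the 8086 real-mode rule (segment address multiplied by 16, plus offset, on 20 bits), and the paged variant uses a hypothetical page size of 256 locations. The function names are illustrative.

```python
def segmented_address(segment: int, offset: int) -> int:
    """Assumed 8086-style rule: physical = segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def paged_address(page: int, offset: int, page_size: int = 256) -> int:
    """Pages are equal-sized and concatenated: physical = page * page_size + offset."""
    if not 0 <= offset < page_size:
        raise ValueError("offset must stay inside the page")
    return page * page_size + offset


print(hex(segmented_address(0x1234, 0x0010)))
print(hex(segmented_address(0x1235, 0x0000)))  # same location reached through another segment
print(hex(paged_address(3, 0x10)))
```

The first two calls return the same physical address, which is what "possibly overlapping" means for segments; with pages this cannot happen.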
Sequential Instructions Addressing
The main principle of the von Neumann architecture
Achieved by means of a counter register
The Program Counter (PC): a special purpose register
Stores the physical address of the current instruction
Incremented after the execution of each instruction
Size equal to the size of a physical address
In some architectures it is user-accessible
Other hardware blocks involved: MUX2 and MUX5
The program is executed instruction after instruction
The Instruction Register stores the instruction before
decoding
Non-Sequential Instructions Addressing
Exceptions to the normal, sequential execution of a program:
jumps, loops or subprogram calls
The jump address can be:
An absolute address: a complete physical address
The address is provided by another hardware block through the
Internal Data Bus
An offset relative to the address of the current instruction
The offset provided by another hardware block through the Internal
Data Bus is added to the address in PC
The Program Counter is also updated
Other hardware blocks involved: MUX2, MUX4, MUX5, Adder
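The two kinds of jump address can be contrasted in a short Python sketch. The 16-bit address width and the function names are assumptions made for illustration; in both cases the value returned is the new content of the Program Counter.

```python
ADDRESS_BITS = 16                      # assumed size of a physical address
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def absolute_jump(target: int) -> int:
    """Absolute address: the complete physical address replaces the PC."""
    return target & ADDRESS_MASK


def relative_jump(pc: int, offset: int) -> int:
    """Relative address: a signed offset is added to the address in the PC."""
    return (pc + offset) & ADDRESS_MASK


print(hex(absolute_jump(0x2000)))
print(hex(relative_jump(0x0100, 0x20)))
print(hex(relative_jump(0x0100, -0x10)))
```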
Elementary Data Addressing
The data can potentially reside anywhere in the memory
The data address can be:
An absolute address: a complete physical address
The address is provided by another hardware block through the
Internal Data Bus
An offset relative to the address of the current instruction
The offset provided by another hardware block through the
Internal Data Bus is added to the address in PC
Other hardware blocks involved: MUX4, MUX5, Adder
Stack Addressing
The Stack: LIFO data structure
Accessed by means of a pointer register
Pushing an element onto the Stack -> decrementing the pointer
Popping an element off the Stack -> incrementing the pointer
Software vs. hardware Stack
The Stack Pointer (SP): a special purpose register
Stores the physical address of the top element
Size equal to the size of a physical address
User-accessible (architectural attribute)
Other hardware blocks involved: MUX3 and MUX5
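A stack that grows toward lower addresses can be modeled in a few lines of Python. The class name, the 16-location memory and the convention that an empty stack has SP just past the end of the stack area are assumptions made for this sketch; no overflow or underflow checks are included.

```python
class Stack:
    """LIFO structure kept in a small memory and accessed only through SP."""

    def __init__(self, size: int = 16):
        self.memory = [0] * size
        self.sp = size  # empty stack: SP points just past the stack area

    def push(self, value: int) -> None:
        self.sp -= 1                  # pushing decrements the pointer
        self.memory[self.sp] = value

    def pop(self) -> int:
        value = self.memory[self.sp]
        self.sp += 1                  # popping increments the pointer
        return value


stack = Stack()
stack.push(0x11)
stack.push(0x22)
print(stack.sp, hex(stack.pop()), hex(stack.pop()), stack.sp)
```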
Data Arrays Addressing
The Memory can accommodate arrays of data
Accessed by means of index registers, which store the
physical address of the first element in the array
The address of an arbitrary element is obtained by adding a relative
offset to the index register
Offset size => max number of elements in the array
The Index Registers (IX): special purpose registers
Store the physical addresses of various data arrays
Size equal to the size of a physical address
User-accessible (architectural attribute)
Other hardware blocks involved: MUX1, MUX4, MUX5, Adder
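The address computation for array elements, and the way the offset size limits the array length, can be sketched as follows. The function name and the 8-bit offset are assumptions for illustration, and the elements are taken to be one byte each.

```python
def element_address(index_register: int, offset: int, offset_bits: int = 8) -> int:
    """Address of an element: content of the index register plus a relative offset."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the offset field")
    return index_register + offset


base = 0x2000  # physical address of the first element, held in the index register
print(hex(element_address(base, 0)))
print(hex(element_address(base, 3)))
print(1 << 8, "elements at most with an 8-bit offset")
```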
2.6 The Timing and Control Unit
The Timing and Control Unit
The Timing and Control Unit (TCU)
Hardware block inside the CPU that:
fetches, decodes and manages the execution of instructions
controls the flow of data through the processor
coordinates the activities of the other units within the CPU and also
outside the CPU
achieves the above through timing and control signals
Design: hardwired vs. micro-programmed
The inputs to the TCU
The instruction in the Instruction Register (IR)
Internal control signals (the status flags)
The outputs of the TCU
Internal control signals (for the blocks within the CPU)
External control signals (for the blocks outside the CPU)
The Instruction Register and
the Instruction Decoder
The Instruction Register (IR): a special purpose register
Stores the instruction code fetched from the memory
Receives input only from the Data Register
Size equal to the smallest instruction code
Is not user-accessible (not an architecture attribute)
The Instruction Decoder
Hardware block that decodes instruction codes
Each code has an associated, unique output line
Only one of the output lines will be 1 at any moment in time
Receives input from the Instruction Register
Sends its output to the Timing and Control Unit
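The behaviour of the Instruction Decoder (one output line per code, exactly one line active) can be illustrated with a one-hot decoder in Python. The 3-bit code width and the function name are assumptions chosen to keep the output short.

```python
def decode(code: int, code_bits: int = 3) -> list:
    """Return the output lines of a decoder: only the line selected by the code is 1."""
    lines = [0] * (1 << code_bits)
    lines[code] = 1
    return lines


print(decode(0))
print(decode(5))
```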
The Typical CISC Instruction Format
The instructions are stored in the memory in one or several
memory locations (depending on the type of instruction)
Instruction format: all the information required by the CPU to
execute an instruction
Comprises at least one byte: the instruction code (the semantics)
The instruction code may require additional bytes
May comprise operands, addresses, offsets on one or several bytes
1-6 bytes for 16-bit x86 microprocessors
1-15 bytes for 32-bit x86 microprocessors
Example: code | [code] | [data or address] | [data or address] | [data or address]
Instruction Execution Timing
Typically, the execution of an instruction has several stages:
Fetch: the instruction code is read from the memory
Decode: the instruction code is decoded
Execute: the instruction is executed (might include fetching the operands)
Write: the result is written to a register or a memory location
The instruction execution stages are called machine cycles
Any instruction is executed in one or several machine cycles (depending on
its complexity)
In a machine cycle the CPU sequentially executes several elementary
actions that accomplish a clear, well-defined task
Elementary actions are executed once every clock cycle
An internal clock signal is generated based on an external quartz oscillator
A CPU state is a physical time period equal to the duration of a clock cycle
In a state, the CPU executes one elementary action or two independent
elementary actions (at the same time)
Instruction Execution Timing Example
Instruction example: (2000h) <- (2000h) + 50h
Instruction format: code | addr low (00h) | addr high (20h) | data (50h)
6 machine cycles:
M1. Fetch and Decode
M2. Read address (least significant byte)
M3. Read address (most significant byte)
M4. Read operand 1
M5. Read operand 2 and Execute
M6. Write result
Machine Cycle 1: Fetch
M1. Fetch and Decode
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (IR) <- (DR)
T4. decode instruction code
Machine Cycle 2: Read Address
M2. Read address (least significant byte)
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (AUX2) <- (DR)
Machine Cycle 3: Read Address
M3. Read address (most significant byte)
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (AUX1) <- (DR)
Machine Cycle 4: Read Operand 1
M4. Read operand 1
T1. (AR) <- (PC), MEM-READ
T2. (PC) <- (PC) + 1, (DR) <- ((AR))
T3. (A) <- (DR)
Machine Cycle 5: Read Operand 2 and Execute
M5. Read operand 2 and Execute
T1. (AR) <- (AUX1, AUX2), MEM-READ
T2. (DR) <- ((AR))
T3. (A) <- (A) + (DR)
Machine Cycle 6: Write Result
M6. Write result
T1. (DR) <- (A)
T2. (AR) <- (AUX1, AUX2), MEM-WRITE
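The six machine cycles can be replayed in Python, one register transfer per statement. This is a sketch, not a model of a real processor: the opcode value is a placeholder (the slides do not give it), the memory is a plain dictionary, the start address 100h is borrowed from the earlier execution example, and decoding is reduced to loading IR.

```python
def execute_add_mem_imm(memory: dict, pc: int) -> dict:
    """Replay machine cycles M1-M6 for (addr) <- (addr) + data."""
    regs = {"PC": pc, "AR": 0, "DR": 0, "IR": 0, "AUX1": 0, "AUX2": 0, "A": 0}

    def read_at_pc() -> int:
        regs["AR"] = regs["PC"]           # T1. (AR) <- (PC), MEM-READ
        regs["PC"] += 1                   # T2. (PC) <- (PC) + 1,
        regs["DR"] = memory[regs["AR"]]   #     (DR) <- ((AR))
        return regs["DR"]

    regs["IR"] = read_at_pc()             # M1. T3. (IR) <- (DR); T4. decode (omitted)
    regs["AUX2"] = read_at_pc()           # M2. T3. (AUX2) <- (DR), address low byte
    regs["AUX1"] = read_at_pc()           # M3. T3. (AUX1) <- (DR), address high byte
    regs["A"] = read_at_pc()              # M4. T3. (A) <- (DR), operand 1

    regs["AR"] = (regs["AUX1"] << 8) | regs["AUX2"]   # M5. T1. (AR) <- (AUX1, AUX2)
    regs["DR"] = memory[regs["AR"]]                   #     T2. (DR) <- ((AR))
    regs["A"] = (regs["A"] + regs["DR"]) & 0xFF       #     T3. (A) <- (A) + (DR)

    regs["DR"] = regs["A"]                            # M6. T1. (DR) <- (A)
    regs["AR"] = (regs["AUX1"] << 8) | regs["AUX2"]   #     T2. (AR) <- (AUX1, AUX2)
    memory[regs["AR"]] = regs["DR"]                   #     MEM-WRITE
    return regs


PLACEHOLDER_OPCODE = 0x00  # hypothetical value; the real instruction code is not given
memory = {0x100: PLACEHOLDER_OPCODE, 0x101: 0x00, 0x102: 0x20, 0x103: 0x50, 0x2000: 0x85}
regs = execute_add_mem_imm(memory, 0x100)
print(f"(2000h) = {memory[0x2000]:02X}h, PC = {regs['PC']:03X}h")
```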
2.7 Summary
General Purpose Registers (GPRs)
Memory Data Register (MDR)
Memory Address Registers (MAR)
Arithmetic and Logic Unit (ALU)
Memory Addressing Control Unit
Timing and Control Unit (TCU)
3.1 The Registers
x86 Registers
Types of registers:
General vs. special purpose (dedicated) registers
Physical vs. logical registers
User-accessible vs. non user-accessible registers
Size of registers:
8/16-bit for the 16-bit microprocessors
8/16/32-bit for the 32-bit microprocessors
8/16/32/64-bit for the 64-bit microprocessors
x86 General Purpose Registers
x86 GPRs: AX, BX, CX, DX (16-bit registers)
Multifunctional: can be potentially used for any operation
They have implicit functions also
Can be accessed as two separate bytes: AH, AL, BH, BL, etc.
In 32-bit microprocessors they are: EAX, EBX, ECX, EDX
Implicit functions
AX: Accumulator
BX: Base index (for use with arrays)
CX: Counter (for use with loops and strings)
DX: Extends the precision of the accumulator
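The access to a 16-bit register as two separate bytes (for example AX as AH and AL) can be illustrated with plain bit operations. The function names below are hypothetical and serve only this sketch.

```python
def split_word(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit value."""
    return (ax >> 8) & 0xFF, ax & 0xFF


def join_bytes(ah, al):
    """Rebuild the 16-bit register value from its two halves."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_word(0x1234)   # AH = 12h, AL = 34h
ax = join_bytes(ah, al)       # back to 1234h
```

Writing to AL or AH therefore changes only one half of AX, while the other half keeps its value.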
x86 Pointer Registers
x86 Pointer Registers: SP, BP (16-bit registers)
Multifunctional: can be potentially used for any operation
They have implicit functions also
In 32-bit microprocessors they are: ESP, EBP (32-bit registers)
SP: Stack pointer
Stores the effective address of the element at the top of the stack
Used implicitly in several instructions: push, pop, call, ret, int
BP: Base pointer
Used to point to some other place in the stack
Stores the effective address of another value in the stack
x86 Index Registers
x86 Index Registers: SI, DI (16-bit registers)
Multifunctional: can be potentially used for any operation
Used implicitly in array indexing instructions: movs, lods,
stos, cmps, scas
In 32-bit microprocessors they are: ESI, EDI (32-bit registers)
SI: Source Index
Stores the effective address or the index of the current
element in the source array
DI: Destination Index
Stores the effective address or the index of the current
element in the destination array
x86 Flags Register
The x86 Flags register (F)
A 16-bit register whose flag bits store information regarding the
state of the processor
Used implicitly in several instructions: pushf, popf, lahf, sahf
Interrupt enable flag (IF): determines whether or not the CPU will
handle maskable hardware interrupts
Trap flag (TF): permits operation of a processor in single-step mode
Direction flag (DF): controls the left-to-right or right-to-left
direction of array processing
x86 Arithmetic and Logic Flags
Carry flag (CF): signals an arithmetic carry or borrow for unsigned
numbers
Auxiliary flag (AF): signals an arithmetic carry out of the least significant nibble
Parity flag (PF): signals that the number of ones in the least significant
byte of the result is even
Zero flag (ZF): signals that the result is 0
Sign flag (SF): signals that the most significant bit of the result is set
(this is the sign bit in two's complement representation)
Overflow flag (OF): signals an arithmetic overflow for signed numbers
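The flag definitions above can be checked on a concrete 8-bit addition. The following is a minimal sketch, not a model of the actual hardware; the function name and the dict layout are assumptions made for illustration.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags)."""
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry out of bit 7 (unsigned overflow).
        "CF": int(total > 0xFF),
        # Carry out of the least significant nibble.
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),
        # Even number of ones in the result byte.
        "PF": int(bin(result).count("1") % 2 == 0),
        "ZF": int(result == 0),
        # Most significant bit of the result.
        "SF": (result >> 7) & 1,
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


r1, f1 = add8_flags(0x7F, 0x01)   # 127 + 1 overflows the signed range
r2, f2 = add8_flags(0xFF, 0x01)   # 255 + 1 overflows the unsigned range
```

The first addition sets OF but not CF, the second sets CF but not OF, which is why signed and unsigned comparisons test different flags.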
x86 Segment Registers
x86 Segment Registers: CS, DS, ES, SS (16-bit registers)
Special purpose registers
Used for memory management: the memory is logically
segmented into smaller parts called segments
32-bit microprocessors use the same 16-bit registers (CS, DS, ES, SS); the 80386 also adds FS and GS
Segment registers store segment addresses for:
The code segment CS
The data segment DS
The extended data segment ES
The stack segment SS
x86 Instruction Pointer Register
x86 Instruction Pointer (Program Counter) Register: IP
16-bit register
Special purpose register
Stores the effective address of the next instruction to be fetched
It is not user-accessible
Advanced past each instruction as it is fetched
Used implicitly by the flow control instructions: jumps, calls
x86 Register Summary
x86 has very few registers
4 general purpose registers, 2 index registers, 2 pointer registers
Some of the x86 registers are multifunctional
x86 has 4 segment registers
special functions in memory management
All the registers are user-accessible; one exception: IP
The size of the registers is usually the size of the Internal Data
Bus
3.2 Memory Management
The Memory: Basic Principles
Memory: a sequence of memory locations used to store information
Each memory location:
stores an 8-bit number, a byte of data
is identified by a unique number, called address
The memory is accessed and organized by the CPU only
The CPU can choose to create logical subdivisions within the
memory (called pages or segments)
The memory map: all memory locations that can be
addressed by the CPU (not necessarily implemented)
The Memory: A Closer Look
Memory Management Techniques
Linear Memory Organization
The memory is regarded as a single block of memory
locations
The memory is addressed directly, using a physical address
Memory Segmentation
The memory is logically divided into segments (not necessarily
equal-sized, possibly overlapping sections)
The memory is addressed using a segment address and an
offset
x86 Memory Segmentation
16-bit x86 microprocessors have 20 address pins
The memory map has 2^20 memory locations
The physical address (PA) has 20 bits
x86 organizes the memory into smaller segments
Segment address (SA): 16-bit address used to identify a
segment in the memory
Effective address (EA) or offset: 16-bit address used to
identify the memory location inside the segment
The memory is organized in 2^16 = 64k segments, each comprising
2^16 = 64k memory locations
Logic Address -> Physical Address
The logic address (LA)
32-bit address; concatenation of SA and EA
The physical address (PA) is not an architecture attribute!
The logic address, segment address and effective address
are architecture attributes
The microprocessor translates the LA into a PA in order to
access the memory: PA = SA * 10h + EA (SA with a hexadecimal 0 appended, plus EA)
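The translation from a logical address to a physical address can be written as a one-line function. This is a sketch for the 16-bit x86 case described here; the function name is hypothetical, and the final mask reflects the assumption that only 20 address bits exist, so sums beyond FFFFFh wrap around.

```python
def physical_address(sa, ea):
    """PA = SA * 10h + EA, limited to the 20 address bits."""
    return ((sa << 4) + ea) & 0xFFFFF


pa_a = physical_address(0x1234, 0x0010)   # 12340h + 0010h
pa_b = physical_address(0x1235, 0x0000)   # 12350h + 0000h
```

Both calls give 12350h: different (SA, EA) pairs can name the same memory location, which is what segment overlapping means in practice.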
Default Memory Segments
Segment addresses (SAs) can be stored in segment
registers
CS stores the SA of the current code segment
DS stores the SA of the current data segment
ES stores the SA of the current extended data segment
SS stores the SA of the current stack segment
Segments can start only at physical addresses which are
multiples of 16
Effective addresses (EAs) can be stored in address registers:
BX, SI, DI, SP, BP and IP
Special (SA, EA) pairs
Particular address registers are associated with particular
segment registers:
IP+CS: the physical address of the current instruction is
formed using the effective address in IP and the segment
address in CS
SP+SS: the physical address of the element at the top of the
stack is formed using SP and SS
Other default pairs: BP+SS, BX+DS, SI+DS, DI+ES (DI is paired with ES in
the string instructions and with DS otherwise)
Segment redirection
Segment overlapping
x86 Memory Segmentation. Summary
The memory can be regarded as a sequence of memory locations
Each memory location stores an 8-bit number and has a unique
20-bit address, called physical address
The x86 CPU regards the memory as being composed of 64k
segments comprising 64k locations each
The x86 CPU uses a 16-bit segment address to select a segment
and a 16-bit effective address to identify a memory location
inside the segment
The translation between the logical organization of the memory
in segments and the physical address is done as follows:
PA = SA * 10h + EA
3.3 Memory Access. Addressing Modes
What is an Addressing Mode?
A technique to specify the location of the operands and
results
Specifies how to calculate the effective memory address of
operands and results, using information in registers and/or
constants within the instruction format
Defines how machine language instructions in the
architecture identify the operands/results of each
instruction
Register Implicit Addressing
The targeted information is found in a register (not in the
memory)
The information regarding which register stores the data is
coded in the instruction code (the first byte in the
instruction)
The instruction code comprises several fields; among them:
the fields which code the source/destination registers
The targeted information is an
operand or a result
Minimum instruction size: 1B

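[Figure: instruction layout (instr. code | addr low | addr high | data); the instr. code is split into instr. semantic code | dest register code | source register code, and the register codes select the register]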
Immediate Addressing
The targeted information is found in the memory, in the
instruction, immediately after the instruction code
The targeted information
is coded in the instruction; it is a constant
is an operand
cannot be a result
cannot be an instruction
Minimum instruction size: 2B (the data has at least 1B)

[Figure: instruction layout: instr. code | data low | data high]

Direct (Absolute) Addressing


The targeted information is found in the memory, at an
address coded in the instruction
The address is in the program memory
The targeted information is in the data or program memory
Minimum instruction size: 3B (the address has at least 2B)

[Figure: instruction layout: instr. code | addr low | addr high; the address points to the data in memory]
Relative Addressing
The targeted information is found in the program memory,
at an address obtained as a sum between the address of the
current instruction and an offset coded in the instruction
The offset can be positive or negative
The targeted information can be an operand or an
instruction
Minimum instruction size: 2B (the offset usually has 1B)

[Figure: instruction layout: instr. code | offset; IP (addr) + offset points to the data]
Register Indirect Addressing
The targeted information is found in the memory, at an
address specified in a register coded in the instruction code
The targeted information can be an operand, a result or an
instruction
One register might not be enough to store an address
Minimum instruction size: 1B

[Figure: instr. code split into instr. semantic code | addr. register code; the register holds the address (addr) of the data]
Memory Indirect Addressing
The targeted information is found in the memory, at the
address specified in a memory location(s) whose address is
specified in the instruction code
The targeted information can be an operand, a result or an
instruction
Minimum instruction size: 3B (the address has at least 2B)
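[Figure: instruction layout: instr. code | addr low | addr high; that address points to memory locations holding addr low | addr high, which in turn point to the data]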

Base plus Index Addressing


The targeted information is found in the memory, at the
address obtained as a sum between the address stored in a
register and an offset (index) coded in the instruction
The address stored in the register is usually the base address
of an array of data (the address of the first element)
The targeted information can be an operand or a result
Minimum instruction size: 2B (the offset has at least 1B)
[Figure: instruction layout: instr. code | offset; register (addr) + offset selects one element of the data array]
Addressing Modes Summary
Various addressing modes
some simpler, some more complicated
some can be used for instructions also, some only for data
the route to the data can be direct or indirect
the targeted information can be in a register, in the program
memory or in the data memory
Depending on the addressing mode, the minimum
instruction size can be 1B / 2B / 3B
The information stored in the instructions can have various
semantics / meanings: data, offset, address, etc.
x86 Addressing Modes
Program addressing modes
Relative addressing
Direct addressing
Register indirect addressing
Data addressing modes
Several simple addressing modes
Composed addressing modes
Base-relative addressing modes
Stack-relative addressing modes
x86 Program Addressing Modes
Used to address instructions through jumps and calls
(normally instructions are addressed sequentially)
Relative addressing (jmp 5h)
The targeted instruction is in the memory, in the code
segment, at an address obtained as the sum between the
content of IP and an offset stored in the current instruction
Direct addressing (jmp 56A80100h)
The targeted instruction is in the memory, in the code
segment, at an address specified in the current instruction
Register indirect addressing (jmp BX)
The targeted instruction is in the memory, in the code
segment, at an address specified in a register coded in the
current instruction code
x86 Data Addressing Modes
Used to address data only
Register implicit addressing (mov AH, BL)
The targeted information is in a register:
8-bit register: AL, AH, BL, BH, CL, CH, DL, DH
16-bit register: AX, BX, CX, DX, SI, DI, SP, BP
Immediate addressing (mov AX, 1234h)
The targeted information is in the memory, in the code
segment, at the effective address IP + 1
x86 Data Addressing Modes
Direct addressing (mov AX, [1234h])
The targeted information is in the memory, in the data
segment, at an effective address specified in the current
instruction
Indexed addressing (mov AX, [SI+20h])
The targeted information is in the memory, in the data
segment, at an effective address obtained as a sum between
the content of SI or DI and an offset stored in the current
instruction
Indirect implicit addressing (mov AX, [DI])
The targeted information is in the memory, in the data
segment, at the effective address stored in SI or DI
x86 Data Addressing Modes
The targeted information is implicitly found in the data
segment
The targeted information can also be found in the code
segment, extended data segment or stack segment
A redirection prefix should be used
x86 Base-relative Addressing Modes
Direct base relative addressing (mov AX, [BX + 1234h])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX and
an offset stored in the current instruction
Indexed base relative addressing (mov AX, [BX + DI + 10h])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX, the
content of SI or DI and an offset stored in the current instruction
Implicit base relative addressing (mov AX, [BX+SI])
The targeted information is in the memory, in the data segment, at
an effective address obtained as a sum between the content of BX and
the content of SI or DI
x86 Stack-relative Addressing Modes
Direct stack relative addressing (mov AX, [BP + 1234h])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP and
an offset stored in the current instruction
Indexed stack relative addressing (mov AX, [BP + DI + 10h])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP, the
content of SI or DI and an offset stored in the current instruction
Implicit stack relative addressing (mov AX, [BP+SI])
The targeted information is in the memory, in the stack segment, at
an effective address obtained as a sum between the content of BP and
the content of SI or DI
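The data addressing modes differ only in which terms enter the effective address and in which segment is used by default. The sketch below illustrates this; the function names and the register dict are assumptions made for illustration, and the `override` parameter stands in for a redirection prefix.

```python
def effective_address(regs, base=None, index=None, disp=0):
    """EA = [base] + [index] + displacement, kept to 16 bits."""
    ea = disp
    if base is not None:
        ea += regs[base]      # BX or BP
    if index is not None:
        ea += regs[index]     # SI or DI
    return ea & 0xFFFF


def operand_physical_address(regs, base=None, index=None, disp=0, override=None):
    """Pick the segment (SS when BP is the base, DS otherwise) and form the PA."""
    if override is not None:
        segment = override    # plays the role of a redirection prefix
    else:
        segment = "SS" if base == "BP" else "DS"
    ea = effective_address(regs, base, index, disp)
    return ((regs[segment] << 4) + ea) & 0xFFFFF


regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0002, "DI": 0x0004,
        "DS": 0x3000, "SS": 0x4000, "ES": 0x5000}

pa_direct = operand_physical_address(regs, disp=0x1234)                      # mov AX, [1234h]
pa_base = operand_physical_address(regs, base="BX", index="DI", disp=0x10)   # mov AX, [BX + DI + 10h]
pa_stack = operand_physical_address(regs, base="BP", index="SI")             # mov AX, [BP+SI]
pa_es = operand_physical_address(regs, index="DI", override="ES")            # mov AX, ES:[DI]
```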
3.4 The Instruction Set
Microprocessor Instruction Types
Data transfer instructions
set a register or a memory location to a fixed constant value
copy data from a memory location to a register, or vice versa
read and write data from I/O devices
Data processing instructions
arithmetic operations (add, subtract, multiply, divide, etc.)
logic operations (and, or, exclusive or, shift, rotate, etc.)
bitwise logic operations
compare operations
Control flow instructions
branch to another location in the program and execute instructions
there
conditional branch to another location if a certain condition holds
branch to another location, while saving the location of the next
instruction as a point to return to (a call)
Data Transfer Instructions
Two operands: a source and a destination
General idea: the source is copied to the destination
The source and the destination:
Can be registers, memory locations, constants, I/O ports
Are identified using various addressing modes
Must have the same size
Performance criterion: transfer as much data as possible
using an instruction with a small format
x86 Simple Data Transfer Instructions
MOV: Move (Copy) Data
XCHG: Exchange Data
LEA: Load Effective Address
PUSH: Push data onto the Stack
POP: Pop data off the Stack
MOV: Move (Copy) Data
Usage: MOV dest, src
Operands:
dest - general-purpose register, segment register (except CS)
or memory location
src - immediate value, general-purpose register, segment
register or memory location
Effects: Copies the source to the destination, overwriting
the destination's value: (dest) <- (src)
Flags: none
XCHG: Exchange Data
Usage: XCHG dest, src
Arguments:
dest - register or memory location
src - register or memory location
Effects: Exchanges the source with the destination:
(dest) <-> (src)
Flags: none
Miscellaneous: two memory locations cannot be used in
one instruction
PUSH: Push Operand onto the Stack
Usage: PUSH src
Arguments: src - 16-bit immediate value, register or
memory location
Effects: Decrements the stack pointer by 2 and copies src to
the top of the stack:
(SP) <- (SP) - 2
((SS):(SP)+1) <- (src high)
((SS):(SP)) <- (src low)
Flags: none
Miscellaneous: src must be a 16-bit value
POP: Pop a Word from the Stack
Usage: POP dest
Arguments:
dest - 16-bit register, segment register or memory location
Effects: Copies the element (16-bit) from the top of the
stack into dest and increments the stack pointer by 2:
(dest high) <- ((SS):(SP)+1)
(dest low) <- ((SS):(SP))
(SP) <- (SP) + 2
Flags: none
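The PUSH and POP effects can be followed step by step on a small model of the stack. This is a sketch under stated assumptions: the class name is hypothetical and the memory is a dict indexed by physical address.

```python
class Stack8086:
    """Minimal model of the 16-bit x86 stack (SS:SP, byte-addressed memory)."""

    def __init__(self, ss, sp):
        self.mem = {}
        self.ss = ss
        self.sp = sp

    def _pa(self, offset):
        # Physical address of SS:offset.
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF                        # (SP) <- (SP) - 2
        self.mem[self._pa(self.sp + 1)] = (value >> 8) & 0xFF   # high byte
        self.mem[self._pa(self.sp)] = value & 0xFF              # low byte

    def pop(self):
        high = self.mem[self._pa(self.sp + 1)]
        low = self.mem[self._pa(self.sp)]
        self.sp = (self.sp + 2) & 0xFFFF                        # (SP) <- (SP) + 2
        return (high << 8) | low


stack = Stack8086(ss=0x2000, sp=0x0100)
stack.push(0x1234)
sp_after_push = stack.sp
stack.push(0xABCD)
first_popped = stack.pop()    # last value pushed comes out first
second_popped = stack.pop()
```

The stack grows toward lower addresses, and the low byte of each word sits at the lower address.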
The Source and the Destination Arrays
The x86 architecture defines two implicit memory zones which
store two arrays of 8-bit or 16-bit numbers
The source array
Stored in the data segment (the segment with the address in DS)
The current element is at the effective address specified in SI
The destination array
Stored in the extended data segment (the segment with the address
in ES)
The current element is at the effective address specified in DI
The arrays are iterated from left-to-right or vice-versa based on
the value of the direction flag (DF)
x86 String / Array Instructions
MOVS: Move String
LODS: Load String
STOS: Store String
SCAS: Scan String
CMPS: Compare String
STD: Set Direction Flag
CLD: Clear Direction Flag
MOVS: Move String
Usage: MOVSB / MOVSW
Arguments: none
Effects:
movsb:
((ES):(DI)) <- ((DS):(SI))
(SI) <- (SI) +/- 1, (DI) <- (DI) +/- 1 (+ if DF=0, - if DF=1)
movsw:
((ES):(DI)) <- ((DS):(SI)), ((ES):(DI)+1) <- ((DS):(SI)+1)
(SI) <- (SI) +/- 2, (DI) <- (DI) +/- 2 (+ if DF=0, - if DF=1)
Flags: none
Miscellaneous: can be prefixed by rep, repe/repz,
repne/repnz
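The effect of movsb under a rep prefix can be sketched as a loop. The rep behaviour used here (repeat while CX is not zero, decrementing CX after each element) is not spelled out in the text above and is stated as an assumption of this sketch; the function name and the dict-based registers and memory are hypothetical as well.

```python
def rep_movsb(mem, regs, df=0):
    """Copy CX bytes from DS:SI to ES:DI, moving forward (DF=0) or backward (DF=1)."""
    step = -1 if df else 1
    while regs["CX"] != 0:
        src = ((regs["DS"] << 4) + regs["SI"]) & 0xFFFFF
        dst = ((regs["ES"] << 4) + regs["DI"]) & 0xFFFFF
        mem[dst] = mem[src]                             # ((ES):(DI)) <- ((DS):(SI))
        regs["SI"] = (regs["SI"] + step) & 0xFFFF
        regs["DI"] = (regs["DI"] + step) & 0xFFFF
        regs["CX"] = (regs["CX"] - 1) & 0xFFFF


mem = {0x10000: 0x41, 0x10001: 0x42, 0x10002: 0x43}
regs = {"DS": 0x1000, "SI": 0x0000, "ES": 0x2000, "DI": 0x0010, "CX": 3}
rep_movsb(mem, regs, df=0)
```

After the call the three bytes are also present at ES:0010h to ES:0012h, and SI and DI point just past the copied elements.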
LODS: Load String
Usage: LODSB / LODSW
Arguments: none
Effects:
lodsb: Copies the current 8-bit element from the source string
to the accumulator and increments (if DF=0) or decrements
(if DF=1) the value in SI by 1:
(AL) <- ((DS):(SI)), (SI) <- (SI) +/- 1
lodsw: Copies the current 16-bit element from the source
string to the accumulator and increments (if DF=0) or
decrements (if DF=1) the value in SI by 2:
(AL) <- ((DS):(SI)), (AH) <- ((DS):(SI)+1), (SI) <- (SI) +/- 2
Flags: none
STOS: Store String
Usage: STOSB / STOSW
Arguments: none
Effects:
stosb: Copies the value in the accumulator into the current 8-bit
element in the destination string and increments (if DF=0) or
decrements (if DF=1) the value in DI by 1:
((ES):(DI)) <- (AL), (DI) <- (DI) +/- 1
stosw: Copies the value in the accumulator into the current
16-bit element in the destination string and increments (if
DF=0) or decrements (if DF=1) the value in DI by 2:
((ES):(DI)) <- (AL), ((ES):(DI)+1) <- (AH), (DI) <- (DI) +/- 2
Flags: none
Data Processing Instructions
An arithmetic operation is applied to one or several sources and
the result is stored in the destination
The arithmetic flags (CF, AF, ZF, PF, SF, OF) are modified!
The sources and the destination:
Can be registers, memory locations, constants, I/O ports
Are identified using various addressing modes
Must have the same size (exceptions: multiplication, division)
CISC processor characteristics:
Data processing uses an accumulator (one of the sources is also the
destination)
The sources and the destination can be memory locations
Execution time depends on instruction complexity
Performance criterion: fast execution of complex data processing
operations
x86 Arithmetic Instructions
INC: Increment
DEC: Decrement
ADD: Add
ADC: Add with Carry
SUB: Subtract
SBB: Subtract with Borrow
MUL: Multiply
DIV: Divide
CMP: Compare
CMP: Compare Two Operands
Usage: CMP src1, src2
Arguments:
src1, src2 - 8-bit or 16-bit immediate value, register or memory
location
Effects: Subtracts src2 from src1: (src1) - (src2). Flags are set in
the same way as by the SUB instruction, but the result of the
subtraction is not saved.
Flags: The CF, ZF, OF, SF, AF, and PF flags are modified according
to the result.
Misc:
usually the next operation would be a conditional jump to perform
an operation according to the result of the comparison;
only one memory argument is allowed and both arguments have to
be of the same size
ADD: Integer Addition
Usage: ADD dest, src
Arguments:
dest - register or memory location
src - immediate, register or memory location; (two memory
operands cannot be used)
Effects: Adds the source to the destination:
(dest) <- (dest) + (src)
Flags: The CF, ZF, OF, SF, AF, and PF flags are set according
to the result.
Misc: no difference between signed and unsigned operands
ADC: Add with Carry
Usage: ADC d, s
Arguments: same as for ADD
Effects: Adds the carry flag (CF) and the source to the
destination: (d) <- (d) + (s) + (CF)
Flags: same as for ADD
Misc: same as for ADD
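A typical use of ADC is adding numbers wider than the registers: ADD handles the low words and ADC propagates the carry into the high words. The sketch below models this for two 32-bit values built from 16-bit halves; the function names are hypothetical.

```python
def add16(a, b, carry_in=0):
    """16-bit addition returning (result, CF); carry_in models ADC."""
    total = a + b + carry_in
    return total & 0xFFFF, int(total > 0xFFFF)


def add32_with_adc(x, y):
    """Add two 32-bit values using one ADD and one ADC on 16-bit halves."""
    low, cf = add16(x & 0xFFFF, y & 0xFFFF)                      # ADD low words
    high, cf = add16((x >> 16) & 0xFFFF, (y >> 16) & 0xFFFF, cf)  # ADC high words
    return (high << 16) | low, cf


sum_a, carry_a = add32_with_adc(0x0001FFFF, 0x00000001)
sum_b, carry_b = add32_with_adc(0xFFFFFFFF, 0x00000001)
```

In the first case the carry out of the low word is absorbed by the high word; in the second it leaves the 32-bit result entirely and remains in CF.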
SBB: Integer Subtraction with Borrow
Usage: SBB dest, src
Arguments:
dest - 8-bit or 16-bit register or memory location
src - 8-bit or 16-bit immediate value, register or memory
location
Effects: Subtracts the carry flag and src from dest:
(dest) <- (dest) - (src) - (CF)
Flags: The CF, ZF, OF, SF, AF, and PF flags are modified
according to the result.
Misc: only one memory argument is allowed and the
arguments have to be of the same size
MUL: Unsigned Multiplication of AL or AX
Usage: MUL src
Arguments:
src - 8-bit or 16-bit register or memory location
Effects:
if src is an 8-bit value: multiplies the value stored in AL by src
and stores the result in AX:
(AX) <- (AL) * (src)
CF and OF are set to 0 if AH is 0, otherwise they are set to 1.
if src is a 16-bit value: multiplies the value stored in AX by src
and stores the result in DX concatenated with AX:
(DX:AX) <- (AX) * (src)
CF and OF are set to 0 if DX is 0, otherwise they are set to 1.
Flags: CF and OF are modified as mentioned above. The
rest of the flags are undefined.
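The two MUL variants can be sketched as follows. The function names are hypothetical; each returns the destination register values together with the common value of CF and OF.

```python
def mul8(al, src):
    """8-bit MUL: (AX) <- (AL) * (src); returns (AX, CF=OF)."""
    ax = (al & 0xFF) * (src & 0xFF)
    flag = int((ax >> 8) != 0)      # set when AH is not zero
    return ax, flag


def mul16(ax, src):
    """16-bit MUL: (DX:AX) <- (AX) * (src); returns (DX, AX, CF=OF)."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx = product >> 16
    low = product & 0xFFFF
    return dx, low, int(dx != 0)    # set when DX is not zero


small = mul8(5, 6)            # fits in AL, flags cleared
wide = mul8(0x10, 0x10)       # needs AH, flags set
wide16 = mul16(0x1000, 0x0010)
```

CF and OF therefore tell the program whether the upper half of the product carries significant bits.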
DIV: Unsigned Division
Usage: DIV src
Arguments:
src - 8-bit or 16-bit register or memory location
Effects:
if src is an 8-bit value: divides the value stored in AX by src and
stores the remainder in AH and the quotient in AL:
(AH) <- (AX) mod (src), (AL) <- (AX) div (src)
if src is a 16-bit value: divides the value stored in DX
concatenated with AX by src and stores the remainder in DX and the
quotient in AX:
(DX) <- (DX:AX) mod (src), (AX) <- (DX:AX) div (src)
Flags: The CF, ZF, OF, SF, AF, and PF flags are undefined.
Misc:
if the quotient does not fit in 8 bits (16 bits), so it cannot be stored in
AL (AX), a divide overflow error is raised.
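The DIV effects, including the divide overflow case, can be sketched as below. The function names are hypothetical, and Python exceptions stand in for the processor's divide error; division by zero is treated as an error too, which is an assumption of this sketch.

```python
def div8(ax, src):
    """8-bit DIV: returns (AH, AL) = (remainder, quotient) of AX / src."""
    divisor = src & 0xFF
    if divisor == 0:
        raise ZeroDivisionError("divide error: src is zero")
    quotient, remainder = divmod(ax & 0xFFFF, divisor)
    if quotient > 0xFF:
        raise OverflowError("divide overflow: quotient does not fit in AL")
    return remainder, quotient


def div16(dx, ax, src):
    """16-bit DIV: returns (DX, AX) = (remainder, quotient) of DX:AX / src."""
    divisor = src & 0xFFFF
    if divisor == 0:
        raise ZeroDivisionError("divide error: src is zero")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("divide overflow: quotient does not fit in AX")
    return remainder, quotient


ah, al = div8(100, 7)                       # 100 = 14 * 7 + 2
dx, ax = div16(0x0001, 0x0000, 0x0010)      # 65536 / 16
```

A call such as `div8(0x1000, 1)` raises the overflow error, because the quotient 1000h cannot be stored in AL.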
x86 Logic Instructions
NOT: Complement
AND: Logic AND
OR: Logic OR
XOR: Exclusive OR
SHL | SAL: Shift Left (Arithmetic and Logic)
SHR: Logic Shift Right
SAR: Arithmetic Shift Right
ROL: Rotate Left
ROR: Rotate Right
RCL: Rotate Left with Carry
RCR: Rotate Right with Carry
TEST: Compare using AND
NOT, OR, AND, XOR
I1 I2 | AND  OR  XOR
0  0  |  0   0    0
0  1  |  0   1    1
1  0  |  0   1    1
1  1  |  1   1    0
I | NOT
0 |  1
1 |  0
SHL, ROL, RCL
SHR and SAR
Control Flow Instructions
Exceptions in the sequential execution of instructions:
Branch to a different instruction
Conditional branch to a different instruction
Can be used to create decision structures
Conditional skip of the current/following instruction
Can be used to create inline decision structures
Counter update + conditional branch (loop)
Can be used to create repetitive structures
Return address save + branch to a different instruction (call)
Can be used for subprogram calls
x86 Control Flow Instructions
Unconditional branch: JMP jump
Conditional branches:
For unsigned numbers: JA|JNBE, JAE|JNB|JNC, JB|JNAE, etc.
For signed numbers: JG|JNLE, JGE|JNL, JL|JNGE, etc.
For other type of comparisons: JP, JE, JS, JO, etc.
Counter update + conditional branches:
LOOP, LOOPZ, LOOPNZ
Call and return branches:
CALL, RET
x86 Conditional Jump Instructions
Instruction       Usage       Condition                    Description
JA | JNBE         JA label    (CF)=0 AND (ZF)=0            Jump to label if above | not below or equal
JAE | JNB | JNC   JAE label   (CF)=0                       Jump to label if above or equal | not below | not carry
JB | JNAE | JC    JB label    (CF)=1                       Jump to label if below | not above or equal | carry
JBE | JNA         JBE label   (CF)=1 OR (ZF)=1             Jump to label if below or equal | not above
JG | JNLE         JG label    (SF)=(OF) AND (ZF)=0         Jump to label if greater | not lower or equal
JGE | JNL         JGE label   (SF)=(OF)                    Jump to label if greater or equal | not lower
JL | JNGE         JL label    (SF)!=(OF)                   Jump to label if lower | not greater or equal
JLE | JNG         JLE label   (SF)!=(OF) OR (ZF)=1         Jump to label if lower or equal | not greater
JE | JZ           JE label    (ZF)=1                       Jump to label if equal | zero
JNE | JNZ         JNE label   (ZF)=0                       Jump to label if not equal | not zero
JNO               JNO label   (OF)=0                       Jump to label if not overflow
JNP | JPO         JNP label   (PF)=0                       Jump to label if not parity | parity odd
JNS               JNS label   (SF)=0                       Jump to label if not signed | positive
JO                JO label    (OF)=1                       Jump to label if overflow
JP | JPE          JP label    (PF)=1                       Jump to label if parity | parity even
JS                JS label    (SF)=1                       Jump to label if signed | negative
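The jump conditions can be evaluated on the flags left by a CMP. The sketch below compares two 8-bit patterns and checks a few of the conditions listed above; the function names and the dict-based flag representation are assumptions made for illustration.

```python
def cmp8(a, b):
    """Return the flags produced by CMP a, b on 8-bit values (a - b is discarded)."""
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),                                   # unsigned borrow
        "ZF": int(result == 0),
        "SF": (result >> 7) & 1,
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),    # signed overflow
    }


CONDITIONS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JAE": lambda f: f["CF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "JG": lambda f: f["SF"] == f["OF"] and f["ZF"] == 0,
    "JGE": lambda f: f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
    "JLE": lambda f: f["SF"] != f["OF"] or f["ZF"] == 1,
    "JE": lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
}


def taken(mnemonic, flags):
    """True if the conditional jump would be taken with these flags."""
    return bool(CONDITIONS[mnemonic](flags))


# FFh is 255 as an unsigned number but -1 as a signed number.
flags = cmp8(0xFF, 0x01)
above = taken("JA", flags)      # unsigned view: 255 > 1
lower = taken("JL", flags)      # signed view: -1 < 1
```

The same bit patterns give opposite answers for the unsigned and signed jumps, which is why the two families of mnemonics exist.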
x86 Loop Instructions
Instruction       Usage          Condition                Description
LOOP              LOOP label     (CX) != 0                Decrement CX (without modifying the flags) and jump to label if CX is not zero
LOOPE | LOOPZ     LOOPE label    (CX) != 0 AND (ZF)=1     Decrement CX (without modifying the flags) and jump to label if CX is not zero and ZF is one
LOOPNE | LOOPNZ   LOOPNE label   (CX) != 0 AND (ZF)=0     Decrement CX (without modifying the flags) and jump to label if CX is not zero and ZF is zero
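The LOOP instruction sits at the end of the loop body, so the body runs once before CX is decremented and tested. A minimal sketch of that behaviour, with a hypothetical function name and the loop body passed in as a Python callable:

```python
def run_loop(cx, body):
    """Execute body, then LOOP: decrement CX and repeat while CX is not zero."""
    iterations = 0
    while True:
        body()
        iterations += 1
        cx = (cx - 1) & 0xFFFF     # 16-bit decrement, flags untouched
        if cx == 0:
            break
    return iterations


three = run_loop(3, lambda: None)
wrapped = run_loop(0, lambda: None)   # CX wraps from 0 to FFFFh
```

Starting with CX = 0 does not skip the loop: the decrement wraps around to FFFFh and the body runs 65536 times, so programs usually test CX before entering the loop.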
CALL: Call Subprogram
Usage: CALL dest
Arguments:
dest - (target) address of the first instruction in the called
subprogram; can be an immediate value, a general purpose register
or a memory location
Effects:
The address of the next instruction is saved on the stack and the
instruction pointer is set to the target address (the CPU performs a
jump to the subprogram):
(SP) <- (SP) - 2, ((SS):(SP)+1) <- (IP high), ((SS):(SP)) <- (IP low)
(IP) <- (dest)
Flags: none
Misc: Usually there is a RET instruction in the subprogram to
return to the instruction after the call.
RET: Return from Subprogram
Usage: RET
Arguments: none
Effects:
The CPU pops the value at the top of the stack and uses it to
jump back to the caller program:
(IP high) <- ((SS):(SP)+1), (IP low) <- ((SS):(SP))
(SP) <- (SP) + 2
Flags: none
Misc: Usually the address was placed in the stack by a call
instruction and the return is made to the address that
follows the call instruction.
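How CALL and RET cooperate through the stack can be traced with a small model. This is a sketch under stated assumptions: the state and memory are plain dicts, the function names are hypothetical, and the CALL instruction is assumed to be 3 bytes long, so the saved return address is IP + 3.

```python
def _stack_pa(state, offset):
    """Physical address of SS:offset."""
    return ((state["SS"] << 4) + (offset & 0xFFFF)) & 0xFFFFF


def call(state, mem, dest, instr_len=3):
    """Save the address of the next instruction on the stack and jump to dest."""
    return_ip = (state["IP"] + instr_len) & 0xFFFF
    state["SP"] = (state["SP"] - 2) & 0xFFFF
    mem[_stack_pa(state, state["SP"] + 1)] = return_ip >> 8      # IP high
    mem[_stack_pa(state, state["SP"])] = return_ip & 0xFF        # IP low
    state["IP"] = dest


def ret(state, mem):
    """Pop the saved address from the stack into IP."""
    high = mem[_stack_pa(state, state["SP"] + 1)]
    low = mem[_stack_pa(state, state["SP"])]
    state["IP"] = (high << 8) | low
    state["SP"] = (state["SP"] + 2) & 0xFFFF


state = {"SS": 0x2000, "SP": 0x0100, "IP": 0x0010}
mem = {}
call(state, mem, dest=0x0200)
ip_in_subprogram = state["IP"]
sp_in_subprogram = state["SP"]
ret(state, mem)
```

After the return, IP holds the address of the instruction that follows the call and SP is back at its initial value.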
x86 Subprogram Calls
The CALL and RET instructions do not have input/output
parameters as arguments
There are several conventions for sending I/O parameters
Through General Purpose Registers
Through the Stack
Through the Memory
3.5 Summary
The x86 Registers
Memory Management
Memory Access. Addressing Modes
The Instruction Set
4.1 Introduction
RISC Philosophy. Motivation
DARPA's VLSI Project (1970s-1980s)
how efficient are the current microprocessors?
provided research funding to university-based teams
to improve the state of the art in microprocessor design
Studies in CPU design showed that
simplified instructions can provide higher performance if this
simplicity enables much faster execution of each instruction
a CPU with a small, highly-optimized set of instructions, can
be more efficient than a CPU with a more specialized set of
instructions
Historical Background
RISC: Reduced Instruction Set Computer
a type of microprocessor architecture that utilizes a small,
highly-optimized set of instructions instead of a more
specialized set of instructions
The first RISC projects (mid '70s and early '80s)
IBM: the IBM 801 architecture
Stanford University: Stanford MIPS architecture
University of California, Berkeley: Berkeley RISC I and II
commercialized as the SPARC architecture
Other well-known RISC architectures:
ARM, Atmel AVR, Intel i860/i960, PA-RISC, PowerPC
RISC Principles (I)
Hardwired Control Unit
One cycle execution time
Each instruction is hardwired to be executed in a single cycle
CPI (clocks per instruction) = 1
reduced -> the amount of work any single instruction
accomplishes is reduced
Pipelining is used
Technique that allows for simultaneous execution of parts of
instructions
Leads to more efficient instruction processing
Large number of general purpose registers
Prevents large amounts of interactions with memory
RISC Principles (II)
Small number of instructions
Fixed instruction format(s)
Decreases the time needed to decode the instructions
Fixed instruction size
Small number of addressing modes
Leads to a small size of the addressing mode code
Memory access only through LOAD/STORE instructions
Data processing instructions cannot use memory operands
Helps to achieve the CPI = 1 goal
4.2 The Registers
A Large Number of GPRs. Benefits
Higher processing speed thanks to a lower number of
memory accesses
Hardware data structures (stacks and queues) created with
general purpose registers
Input/output parameters to/from subprograms are
sent/received through GPRs
Increased chip uniformity factor
RISC Register Set Characteristics
A large number of general purpose registers (more than 32)
The size of the registers is the size of the usual operands
Identical, multifunctional general purpose registers
Allows any register to be used in any context
Simplifies compiler design
Physical vs. logical registers
Not all the physical registers may be available at all times
Logical registers are mapped into physical registers by the
CPU
Register Set Organization
A single set of registers
Comprising at least 32 physical registers
No logical registers
Any physical register is accessed by decoding a register code
The registers are accessed similarly to the linearly organized
memory
Register Set Organization
Multiple sets of logical registers in a single
set of physical registers
Each set of logical registers
Comprises at least 32 registers
Can be accessed using a pointer
Is allocated to a different program
The logical <-> physical mapping is bijective
Register Set Organization
Multiple sets of logical registers, partially
overlapped, in a single set of physical registers
Each set of logical registers
Comprises at least 32 registers
Can be accessed using a pointer
Is allocated to a different program
The logical <-> physical mapping is not
bijective anymore!
The overlapping portions are called register
windows
Register Set Organization
Multiple sets of logical registers in multiple sets of physical
registers: useful for multiprocessing
Berkeley RISC II Register Set
8 sets of logical registers in a single set of 138 physical registers
Each set of logical registers (the work-set for each program)
comprises:
10 registers for global variables - shared with all programs
10 registers for local variables
6 registers for I/O parameters - shared with the calling program
6 registers for parameters - shared with the called program
1 set of physical registers (R)
8 sets of logical registers (A ... H)
Mapping examples:
R0 = A0 = ... = H0
...
R9 = A9 = ... = H9
R10 = A10 = H26
...
R15 = A15 = H31
R16 = A16
...
R25 = A25
R26 = A26 = B10
...
R31 = A31 = B15
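The mapping examples can be reproduced with a small Python sketch. The formula is inferred from the examples above and assumes that the same pattern continues uniformly for the sets C to G; the names `physical_register`, `WINDOW_STEP` and the window numbering (0 = A ... 7 = H) are illustrative assumptions.

```python
NUM_GLOBALS = 10        # R0..R9, shared by all logical sets
WINDOW_STEP = 16        # 10 local + 6 parameter registers added per set
NUM_WINDOWS = 8         # logical sets A..H
WINDOWED = NUM_WINDOWS * WINDOW_STEP   # 128 windowed physical registers


def physical_register(window, logical):
    """Map logical register 0..31 of a window (0 = A .. 7 = H) to R0..R137."""
    if not (0 <= window < NUM_WINDOWS and 0 <= logical < 32):
        raise ValueError("window must be 0..7 and logical register 0..31")
    if logical < NUM_GLOBALS:
        return logical                       # globals map one-to-one
    offset = window * WINDOW_STEP + (logical - NUM_GLOBALS)
    return NUM_GLOBALS + offset % WINDOWED   # wrap around: H overlaps A


A, B, H = 0, 1, 7
print(physical_register(A, 26), physical_register(B, 10))   # same physical register
print(physical_register(A, 10), physical_register(H, 26))   # same physical register
```

Under these assumptions the 8 sets use exactly 10 + 8 x 16 = 138 distinct physical registers.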
4.3 The Instruction Set
RISC Instruction Set Characteristics
Fewer instructions than in a CISC instruction set
Simpler instructions than in a CISC instruction set
Instruction types
Memory access instructions (load / store)
Arithmetic and logic processing instructions
Always with register or immediate operands
Typically without an accumulator
Control flow instructions
Subprogram calls use register windows for parameter passing
I/O instructions
RISC Typical Addressing Modes
Register implicit addressing
Immediate addressing
Direct (absolute) addressing
Register indirect addressing
Base-relative direct addressing
Base-relative indexed addressing
Relative (to PC) addressing
Intel i860 / i960 Instruction Examples
Note: in these examples s1, s2 and d are general purpose
registers
Signed integer addition
adds s1, s2, d ;(d) <- (s1) + (s2)
Memory access with two pointers
ldl.l s1(s2), d ;(d) <- ((s2) + (s1))
Memory access using a constant
st.s s1, #const(s2) ;((s2) + const) <- (s1)
Left shift with three operands
shl s1, s2, d ;(d) <- (s2) * 2^(s1)
ARM Instruction Examples
Note: in these examples s1, s2, s3 and d are general purpose
registers
Logic AND with three operands
and d, s1, s2 ;(d) <- (s1) & (s2)
Memory access with pre-indexing
ldr d, [s1, #const]! ;(d) <- ((s1) + const)
;(s1) <- (s1) + const
Memory access with post-indexing
str s1, [d], #const ;((d)) <- (s1)
;(d) <- (d) + const
Multiply and add (four operands)
mla d, s1, s2, s3 ;(d) <- (s1) * (s2) + (s3)
4.4 The Timing and Control Unit
Instruction format for:
Intel x86 (CISC) microprocessors
1 to 15 bytes, depending on instruction complexity
Intel i860 (RISC) microprocessors
4 bytes, regardless of the instruction complexity
Stanford MIPS (RISC) microprocessors
4 bytes, regardless of the instruction complexity
Fixed instruction format -> simpler Instruction Decoder
-> simpler Memory Addressing Unit
Simpler Instruction Decoder
The Timing Control Unit is
Micro-programmed for CISC microprocessors
Hardwired for RISC microprocessors
Example: 32bit x 32bit multiplication
Micro-programmed Control Unit
Uses the same ALU and MACU + a micro-program
Hardwired Control Unit
Uses a dedicated hardwired circuit
Hardwired Timing and Control Unit
32b x 32b CISC Multiplication
result <- 0
for i = 1 to 32 do
    if multiplier(i) = 1
        result <- result + multiplicand
    end_if
    multiplicand <- multiplicand * 2
end_for
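The shift-and-add micro-program above can be written as a runnable Python sketch. The function name `shift_add_multiply` and the `bits` parameter are illustrative assumptions; bits are numbered from 0 (least significant) here, whereas the pseudocode counts from 1.

```python
def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Multiply two unsigned integers with one add/shift step per multiplier bit."""
    result = 0
    for i in range(bits):                 # bit 0 is the least significant bit
        if (multiplier >> i) & 1:         # multiplier(i) = 1
            result += multiplicand        # result <- result + multiplicand
        multiplicand <<= 1                # multiplicand <- multiplicand * 2
    return result


print(shift_add_multiply(6, 7))
```

The loop always runs 32 times, which illustrates why a micro-programmed multiplication takes many clock cycles.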
32b x 32b RISC Multiplication
Micro-programmed Timing and Control Unit (CISC case)
Uses the same ALU and MACU + a memory of micro-programs
Each instruction is associated with a micro-program which
coordinates the timing of elementary actions
Variable number of states / clock cycles depending on instruction
complexity
Hardwired Timing and Control Unit (RISC case)
Uses a dedicated hardwired circuit
Each instruction is associated with a dedicated hardwired circuit
Fixed number of states / clock cycles regardless of instruction
complexity
Main drawback: the lack of flexibility; adding a new instruction
requires modifications in the hardware design
Micro-programmed vs. Hardwired TCU
Premises
All instructions (simple/complex) are executed in the same
amount of time / clock cycles
All instructions are executed in a sequence of stages; example:
fetch the instruction from the memory
decode the instruction
read the operands
execute the instruction
write the result into a register
Pipelining concept: at any moment in time the
microprocessor executes simultaneously several different
stages for several pipelined instructions; this leads to CPI=1
RISC Instructions Pipelining
Pipeline example
If the execution of every instruction can be broken up in N
states, then one can build a pipeline structure with N stages
This leads to the simultaneous execution of N instructions
Pipelining concept: at any moment in time several
instructions are in progress of execution, in various stages
Instruction pipelining is possible because all
instructions are executed in the same amount of time
Instruction pipelining leads to the desired CPI = 1
Note that the pipeline does not always run without interruption (there are exceptions)
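Why pipelining drives the CPI towards 1 can be checked with a short Python sketch. It assumes an ideal pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names are illustrative.

```python
def pipeline_cycles(num_instructions, num_stages):
    """Clock cycles needed by an ideal pipeline with no stalls."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each following
    # instruction completes one cycle after the previous one.
    return num_stages + (num_instructions - 1)


def cpi(num_instructions, num_stages):
    """Average clock cycles per instruction for the ideal pipeline."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


for n in (1, 10, 1000):
    print(n, cpi(n, num_stages=5))
```

The fill time of the pipeline (N - 1 cycles) is paid only once, so for long instruction sequences the average approaches one cycle per instruction.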
RISC Instructions Pipelining. Summary
4.5 Compiler Particularities (Issues)
Compiler: a computer program that transforms source code
written in a high-level programming language to a lower
level language (e.g., assembly language or machine code)
The efficiency of RISC architectures, obtained through
many optimizations and simplifications, also imposes strong
software-layer constraints
=> RISC machines are shipped with dedicated compilers
RISC compiler issues
Register allocation
Optimal allocation of variable values to logical registers
Pipeline correct execution and efficiency management
Data dependence, jump and load/store instructions management
RISC Compilers
Register allocation
the process of determining which values should be placed
into which registers and at what times during the execution of
the program
values, not variables, are allocated to various registers, because
distinct uses of the same variable can be assigned to different
registers without affecting the logic of the program
Local register allocation
allocation within a very small piece of code, typically a basic
block
Global register allocation
assigns registers within an entire function
Register Allocation
Although RISC machines have many registers there are
programs which require more registers than actually exist.
The register allocator must insert spill code to store some
values back into memory for part of their lifetime.
Storing/loading values in/from the memory is time
inefficient => optimal register allocation needs to be done!
Minimizing the runtime cost of spill code is a crucial
consideration in register allocation.
Register Allocation - The Issue
Values used in a function: A, B, C, ..., F
Lifetime of the values represented as timelines over a
sequence of CPU states
Available registers: R1, R2, R3
The interference / color graph
Nodes: the values A, B, C, ..., F
Edges: indicate lifetime overlapping of the values
Labels outside nodes: the register allocated for the value
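Building and coloring an interference graph can be sketched in Python as follows. The lifetimes are hypothetical (they are not taken from the figure), and the greedy coloring heuristic is a simple stand-in chosen for illustration; production compilers use more elaborate allocators.

```python
def interference_graph(lifetimes):
    """Build edges between values whose lifetimes (start, end) overlap."""
    graph = {value: set() for value in lifetimes}
    names = list(lifetimes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (start_a, end_a), (start_b, end_b) = lifetimes[a], lifetimes[b]
            if start_a <= end_b and start_b <= end_a:   # intervals overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color_graph(graph, registers):
    """Greedy coloring: give each value the first register no neighbor uses.

    Returns a dict value -> register, with None for values that would
    have to be spilled to memory.
    """
    allocation = {}
    for value in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        used = {allocation.get(neighbor) for neighbor in graph[value]}
        free = [r for r in registers if r not in used]
        allocation[value] = free[0] if free else None
    return allocation


# Hypothetical lifetimes, expressed as (first CPU state, last CPU state)
lifetimes = {"A": (1, 4), "B": (2, 6), "C": (5, 8),
             "D": (7, 10), "E": (3, 5), "F": (9, 12)}
graph = interference_graph(lifetimes)
print(color_graph(graph, ["R1", "R2", "R3"]))
```

Two values connected by an edge never receive the same register; a value left with `None` is a candidate for spill code.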
The Interference / Color Graph
The Pipeline and Jumps Management
The problem:
The instructions following JMP should not be introduced in
the pipeline
The instructions to be introduced in the pipeline are known
only after the execution of JMP (the jump address is
computed)
The solution:
The compiler should insert several NOP instructions after
every JMP instruction
The drawback: NOPs introduce delays => CPI > 1
Optimizations (code reordering) are sometimes possible:
Before reordering:
ADD r3, r2, r1
AND r0, r5, r6
JMPZ r0, label
NOP
NOP
NOP
NOP
XOR r5, r3, r2
....
label: SUB r1, r5, r6
After reordering:
AND r0, r5, r6
JMPZ r0, label
ADD r3, r2, r1
NOP
NOP
NOP
XOR r5, r3, r2
....
label: SUB r1, r5, r6
ADD does not interfere with the execution of JMPZ,
therefore it can be moved below the jump, in place of a NOP.
AND cannot be moved because the jump is taken / not
taken depending on its result!
The Pipeline and Data Dependency
ADD r1, r2, r7
AND r6, r1, r3
The problem:
The value computed by ADD is not available in the
destination register (R1) when AND needs to read it
The AND reads an old value of R1 and the program does not
work correctly anymore
The solution:
The compiler should insert several NOP instructions after
every ADD instruction in order to delay the next instruction
The drawback: NOPs introduce delays => CPI > 1
Optimizations (code reordering) are sometimes possible:
Before reordering:
MUL r8, r2, r4
SUB r0, r5, r6
ADD r1, r2, r7
NOP
NOP
AND r6, r1, r3
....
After reordering:
ADD r1, r2, r7
MUL r8, r2, r4
SUB r0, r5, r6
AND r6, r1, r3
....
MUL and SUB do not interfere with the execution of ADD,
therefore they can be moved downwards, in place of the NOPs
Data dependency appears in the case of data processing
instructions, but also for memory access instructions:
LOAD r0, mem
SUB r6, r1, r0
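The compiler's NOP insertion for data dependencies can be sketched in Python. This is a simplified illustration under stated assumptions: instructions are modelled as hypothetical (mnemonic, destination, sources) tuples, and a result is assumed to become readable only after `delay` = 2 further instructions; the real delay depends on the pipeline.

```python
NOP = ("NOP", None, ())


def insert_nops(program, delay=2):
    """Insert NOPs so that no instruction reads a register written by one
    of the `delay` instructions immediately before it.

    program: list of (mnemonic, destination, sources) tuples.
    """
    scheduled = []
    for instruction in program:
        sources = instruction[2]
        while True:
            recent = scheduled[max(0, len(scheduled) - delay):]
            if not any(dest is not None and dest in sources
                       for _, dest, _ in recent):
                break                      # no pending result is needed
            scheduled.append(NOP)          # delay the dependent instruction
        scheduled.append(instruction)
    return scheduled


naive = [("ADD", "r1", ("r2", "r7")), ("AND", "r6", ("r1", "r3"))]
# Two independent (hypothetical) instructions placed between ADD and AND
reordered = [("ADD", "r1", ("r2", "r7")), ("SUB", "r0", ("r5", "r6")),
             ("XOR", "r9", ("r5", "r2")), ("AND", "r6", ("r1", "r3"))]
print([op for op, _, _ in insert_nops(naive)])
print([op for op, _, _ in insert_nops(reordered)])
```

When enough independent instructions separate the producer from the consumer, no NOPs are needed and the CPI stays close to 1.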
Pipeline Correct Execution and
Efficiency Management
Generally the pipeline technique can be successfully used
to execute several instruction stages simultaneously
Leads to CPI = 1
There are cases in which the compiler has to take special
measures (introduce NOPs) to ensure the correct execution
of the program
Leads to CPI > 1
Some of the above measures can be optimized to increase
efficiency (CPI -> 1)
4.6 Summary
Economic advantages (translate to lower cost)
Smaller Timing and Control Unit (more than 10 times)
Increased chip uniformity factor
Shorter development time for the TCU
Technical advantages
Higher processing speed
Lower power consumption
Lower probability of hardware design errors
Smaller number of memory accesses
Simpler compiler development (in some respects)
RISC Advantages
Economic drawbacks
Appeared after the CISC processors
Technical drawbacks
Longer programs (require more program memory)
Lack of flexibility for the TCU: adding a new instruction
requires modifications in the hardware design
More complex compiler development (in other respects)
RISC Drawbacks
5.1 I/O Devices Organization
I/O ports are characterized by address and content
The content of a port is linked to an external peripheral
Writing a port = Sending data to the peripheral
Reading a port = Receiving data from the peripheral
I/O ports can be accessed as
Memory locations, using memory addressing instructions
Ports, using dedicated I/O instructions
I/O Ports and Peripherals
IN dest, port
Reads data from the port and stores it into dest
OUT port, src
Writes the data from src to the port
In CISC processors only the accumulator can be used as
source and destination in the dedicated I/O instructions
Dedicated I/O instructions involve
Specific machine cycles
Specific signals on the control bus (IOR and IOW)
Dedicated I/O Instructions
Port map is smaller than the memory map
The addressing modes used for the dedicated I/O
instructions are very restrictive: direct and register-indirect
Main advantage: port access is faster
Example for Intel x86
64k ports, one byte each
consecutive one-byte ports can be accessed as one-word port
in AL, 0Fh
in AX, DX
out 10h, AL
out DX, AX
Dedicated I/O Instructions
The I/O ports are mapped within the main memory and
are regarded as regular memory locations
Port access is done with regular memory addressing
instructions; consequently:
The same machine cycles are used
The same signals on the control bus (MEMR and MEMW)
The same addressing modes are used
Main advantage: port access is simpler
Drawbacks: port access is slower, a part of the memory map
is wasted on ports
Memory Mapped Ports
5.2 Typical I/O Techniques
I/O technique: microcomputer-peripheral synchronization
technique
Types:
Synchronous (with the current program) techniques
The microcomputer-peripheral communication is initiated by
the CPU (by executing specific instructions)
Asynchronous techniques
The microcomputer-peripheral communication is initiated by
the peripheral, independently of the program executed by the
CPU
Typical I/O Techniques
Polling is a synchronous I/O technique: the communication
with the peripherals is initiated by the CPU
The Polling Technique
Procedure:
The CPU periodically reads the state of the peripherals connected to
the ports (reads a status byte from the port)
The CPU initiates a data transfer if the peripheral is ready
Notes:
The CPU actions are triggered by instructions in the program
The status byte is read through the data bus
Main advantage: no additional hardware is required
Drawbacks:
The CPU wastes time on polling the state of the peripherals
Potential communication requests can be lost
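The polling procedure can be sketched in Python. Everything here is a hypothetical model: the `FakePeripheral` class, the convention that the most significant bit of the status byte means "ready", and the `max_polls` limit are assumptions made for illustration only.

```python
class FakePeripheral:
    """Hypothetical peripheral that becomes ready after a few status reads."""

    def __init__(self, busy_reads):
        self.busy_reads = busy_reads
        self.received = []

    def read_status(self):
        if self.busy_reads > 0:
            self.busy_reads -= 1
            return 0x00            # MSB clear: not ready
        return 0x80                # MSB set: ready to receive data

    def write_data(self, value):
        self.received.append(value)


def poll_and_send(peripheral, value, max_polls=1000):
    """Poll the status byte until the ready bit is set, then send one value.

    Returns the number of status reads, or raises TimeoutError.
    """
    for polls in range(1, max_polls + 1):
        if peripheral.read_status() & 0x80:
            peripheral.write_data(value)
            return polls
    raise TimeoutError("peripheral never became ready")


device = FakePeripheral(busy_reads=3)
print(poll_and_send(device, 0x41), device.received)
```

The status reads that return "not ready" are the time the CPU wastes on polling.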
The Polling Technique
An x86 microprocessor communicates with two I/O ports
The most significant bit of the status byte indicates the
availability of the peripheral to receive data
The Polling Technique. Example
pollPort24: in AL, 24h
            shl AL, 1
            jnc pollPort24
            out 24h, AX
pollPort37: in AL, 37h
            shl AL, 1
            jnc pollPort37
            out 37h, AX
Do you notice any potential
problems in this code?
How would you optimize it?
Interrupt-driven I/O is an asynchronous I/O technique: the
communication with the CPU is initiated by the peripheral
Interrupt-driven I/O
Procedure:
The peripheral sends an interrupt signal (through a port) on a
dedicated terminal of the microprocessor
By doing this the peripheral says it is ready for data transfer
If it is programmed to respond to interrupt signals, the CPU
interrupts its current activity and starts the data transfer
Notes:
The interrupt signal is received on a dedicated pin
The current program is halted and has nothing to do with the
data transfer
Main advantage: the CPU responds very fast to interrupts
Interrupt-driven I/O
Interrupt request (IRQ): the signal sent by the peripheral to
the CPU (on a dedicated pin) to request access to the
system's resources
Interrupt request response: a sequence of steps performed
by the CPU in response to the IRQ
Interrupt service routine (ISR) or interrupt handler: a
dedicated program (sequence of instructions) through
which the CPU responds to the IRQ of a specific peripheral
Interrupt-driven I/O. Definitions
Step 1. The CPU finishes the execution of the current instruction.
Step 2. The CPU saves the flags register in the stack.
Step 3. The CPU saves some of the general purpose registers in
the stack.
Step 4. The CPU saves the return address (the address of the next
instruction) in the stack.
Step 5. The interrupt flag (IF) is reset to disable any other
interrupts.
Step 6. The CPU jumps to the interrupt service routine (ISR).
After the ISR is executed, the CPU restores all the information
saved in the stack, returns to the main program and continues its execution.
The Interrupt Request Response
5.3 Typical Interrupt Techniques
DMA is an interrupt-based I/O technique which allows a
peripheral to access the memory directly (without CPU
intervention)
Direct Memory Access (DMA)
Procedure:
The peripheral sends an interrupt signal to the DMA controller
The DMA controller sends a Bus Request (BUSRQ) signal to
the microprocessor
The CPU finishes the current machine cycle and interrupts its
activity; this is equivalent to freeing the system bus
The DMA controller is left in charge of the microcomputer and
Generates addresses for a sequence of memory locations
Manages the data transfer between the port to which the
peripheral is connected and the sequence of memory locations
Notes:
DMA interrupts have the highest priority
DMA interrupts cannot be disabled by the user
Direct Memory Access (DMA)
The NMI interrupt procedure is the same as the one
described for the general case
Notes:
NMIs are received on another dedicated terminal (NMI)
NMIs cannot be disabled by the user
In the case of NMIs the CPU finishes the execution of the
current instruction before responding to the interrupt
The ISR's address is predefined
Non-maskable Interrupts (NMI)
As opposed to NMIs the maskable interrupts can be
disabled (ignored) by the user
The interrupt procedure is the same as the one described
for the general case
Notes:
Maskable interrupts are received on a dedicated pin (INT)
The ISR's address can be:
Predefined
Provided by the peripheral
Selected by the CPU based on a code sent by the peripheral
Maskable Interrupts (INT)
The maskable interrupts for which the ISR's address is selected
by the CPU based on a code sent by the peripheral are called
vectored interrupts.
Interrupt vector: the complete address of the ISR
Interrupt Vector Table (IVT): a table (stored in the memory)
which contains the interrupt vectors of all the ISRs
Interrupt vector selection procedure
The peripheral sends a code to the CPU; each peripheral is initially
allocated a unique code
The code is used by the CPU as an index in the IVT (to select the
corresponding interrupt vector)
Vectored Interrupts
Notes
The size of the interrupt vectors is the size of a complete
address
The size of the code sent by the peripheral depends on how
many interrupts the processor can respond to
The size of the IVT is derived by multiplying the size of one
vector by the maximum number of vectors in the IVT
The IVT is usually located in the memory at a predefined
address (usually address 0x00)
Vectored Interrupts
The size of an interrupt vector: 4 bytes (SA: 2 B and EA: 2 B)
The code sent by the peripheral has 8 bits => max 256 interrupt
types = max 256 interrupt vectors => max 256 ISRs
The size of the IVT: 256 vectors x 4B = 1024B
The IVT is stored at the beginning of the memory
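The arithmetic above can be checked with a short Python sketch. The function names `ivt_size` and `vector_address` are illustrative, and the table is assumed to start at address 0.

```python
def ivt_size(code_bits, vector_bytes):
    """Return (number of vectors, table size in bytes)."""
    num_vectors = 2 ** code_bits
    return num_vectors, num_vectors * vector_bytes


def vector_address(code, vector_bytes=4, table_base=0):
    """Address of the first byte of the vector selected by `code`."""
    return table_base + code * vector_bytes


print(ivt_size(code_bits=8, vector_bytes=4))
print(hex(vector_address(5)))
```

The same functions apply to other processors by changing the code width and the vector size.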
Vectored Interrupts. Example (x86)
Software interrupts are special instructions in the x86
instruction set
The execution of such an instruction is identical to the CPU
response to a vectored interrupt
The code which is sent by the peripheral in the case of
vectored interrupts is provided as an operand or is implied
x86 software interrupts:
INT [code]
Interrupt with the provided code or with code=3 (if not provided)
INTO
Interrupt with code=4 only if OF is set
x86 Software Interrupts
The execution of INT 5h involves the following steps:
1. The flags register (F) is saved in the stack
2. IF and TF are set to zero (other interrupts are disabled)
3. The return address is saved in the stack
4. The interrupt vector is selected and the jump to the ISR is
performed:
CS is loaded with the value found in the memory at addresses
4*5+3 and 4*5+2
IP is loaded with the value found in the memory at addresses
4*5+1 and 4*5+0
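The steps above can be simulated with a simplified Python model. This is a sketch under stated assumptions: the CPU is a plain dictionary, the memory is a 1 MB `bytearray` with the IVT at address 0, the return address is pushed as CS followed by IP, and the ISR address F000h:1234h is a made-up example value.

```python
def software_interrupt(cpu, memory, code):
    """Simulate the steps of INT `code` on a simplified real-mode x86 model.

    cpu: dict with 16-bit registers F, CS, IP, SS, SP (IP already points
         to the instruction after INT, i.e. the return address).
    memory: bytearray of 1 MB with the IVT at physical address 0.
    """
    IF_MASK, TF_MASK = 0x0200, 0x0100

    def push(word):
        cpu["SP"] = (cpu["SP"] - 2) & 0xFFFF
        address = ((cpu["SS"] << 4) + cpu["SP"]) & 0xFFFFF
        memory[address] = word & 0xFF
        memory[address + 1] = word >> 8

    push(cpu["F"])                               # 1. save the flags
    cpu["F"] &= ~(IF_MASK | TF_MASK) & 0xFFFF    # 2. clear IF and TF
    push(cpu["CS"])                              # 3. save the return address
    push(cpu["IP"])
    vector = 4 * code                            # 4. select the vector
    cpu["IP"] = memory[vector] | (memory[vector + 1] << 8)
    cpu["CS"] = memory[vector + 2] | (memory[vector + 3] << 8)


memory = bytearray(1 << 20)
memory[20:24] = bytes([0x34, 0x12, 0x00, 0xF0])   # vector 5 -> F000h:1234h
cpu = {"F": 0x0202, "CS": 0x1000, "IP": 0x0105, "SS": 0x2000, "SP": 0x0100}
software_interrupt(cpu, memory, 5)
print(hex(cpu["CS"]), hex(cpu["IP"]), hex(cpu["SP"]))
```

For INT 5h the vector occupies addresses 20 to 23: the two low bytes give IP and the two high bytes give CS.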
x86 Software Interrupts. Example
The execution of INTO involves the following steps:
1. The OF flag is verified; if it is 1 then the following steps are
made, else the CPU continues with the next instruction
2. The flags register (F) is saved in the stack
3. IF and TF are set to zero (other interrupts are disabled)
4. The return address is saved in the stack
5. The interrupt vector is selected (INTO always refers to a
fixed vector) and the jump to the ISR is performed:
CS is loaded with the value found in the memory at addresses 13h
and 12h
IP is loaded with the value found in the memory at addresses 11h
and 10h
x86 Software Interrupts. Example
Historical Background
The invention of the transistor and the integrated circuit
Moore's Law
The invention of the microprocessor and the microcontroller
Microprocessors Evolution Tree
General purpose microprocessors
Microcontrollers
Special purpose microprocessors (DSPs, communication processors, ...)
Typical Applications
Introduction
The Structure of a Microcomputer
The CPU: executes instructions (processes data) and controls the system
The Memory: stores both the data and the instructions
The I/O Devices: interconnect the microcomputer with the outside world
Instruction Execution Example
The CPU is reset and starts executing instructions from a
predefined address in the memory (100h)
Reset
Execute
instructions from
address 100h
Overview of a CISC, General Purpose
Microprocessor Core
General Purpose Registers (GPRs)
Memory Data Register (MDR)
Memory Address Register (MAR)
Arithmetic and Logic Unit (ALU)
Memory Addressing Control Unit
Timing and Control Unit (TCU)
Instruction Execution Timing
Typically, the execution of an instruction has several stages:
Fetch: the instruction code is read from the memory
Decode: the instruction code is decoded
Execute: the instruction is executed (might include operand fetch)
Write: the result is written in a register or a memory location
The instruction execution stages are called machine cycles
Any instruction is executed in one or several machine cycles (depending on
its complexity)
In a machine cycle the CPU executes sequentially several elementary
actions accomplishing a clear, well-defined task
Elementary actions are executed once every clock cycle
An internal clock signal is generated based on an external quartz oscillator
A CPU state is a physical time period equal to the duration of a clock cycle
In a state, the CPU executes one elementary action or two independent
elementary actions (at the same time)
The x86 Architecture
The x86 Registers
Types, sizes, usage, implicit functions, accessibility, etc.
Memory Management
Linear vs. segmented memory models, the x86 memory
model
Memory Access. Addressing Modes
What is an addressing mode? Comparison, examples, etc.
The Instruction Set
Instruction types, formats, examples, comparison, etc.
x86 Registers. Summary
x86 has very few registers
4 general purpose registers, 2 index registers, 2 pointer registers
Some of the x86 registers are multifunctional
x86 has 4 segment registers
special functions in memory management
All the registers are user-accessible; one exception: IP
The size of the registers is usually the size of the Internal Data
Bus
x86 Memory Segmentation. Summary
The memory can be regarded as a sequence of memory locations
Each memory location stores an 8-bit number and has a unique
20-bit address, called physical address
The x86 CPU regards the memory as being composed of 64k
segments comprising 64k locations each
The x86 CPU uses a 16-bit segment address to select a segment
and a 16-bit effective address to identify a memory location
inside the segment
The translation between the logical organization of the memory
in segments and the physical address is done as follows:
PA = SA * 10h + EA
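The segment-to-physical translation (the 16-bit segment address shifted left by one hexadecimal digit, plus the 16-bit effective address) can be tried out in Python. The function name `physical_address` is illustrative, and wrapping the result to 20 bits is an assumption that matches a CPU with 20 address lines.

```python
def physical_address(segment, offset):
    """Translate a 16-bit segment address (SA) and effective address (EA)
    into a 20-bit physical address: PA = SA * 10h + EA."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("SA and EA must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF   # wrap to 20 address bits


print(hex(physical_address(0x1234, 0x0010)))
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))
```

Because segments start every 16 bytes and span 64k locations, different SA:EA pairs can refer to the same physical location.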
x86 Addressing Modes
Program addressing modes
Relative addressing
Direct addressing
Register indirect addressing
Data addressing modes
Several simple addressing modes
Composed addressing modes
Base-relative addressing modes
Stack-relative addressing modes
x86 Simple Data Transfer Instructions
MOV - Move (Copy) Data
XCHG - Exchange Data
LEA - Load Effective Address
PUSH - Push data in the Stack
POP - Pop data out of the Stack
x86 String / Array Instructions
MOVS - Move String
LODS - Load String
STOS - Store String
SCAS - Scan String
CMPS - Compare String
STD - Set Direction Flag
CLD - Clear Direction Flag
x86 Arithmetic Instructions
INC - Increment
DEC - Decrement
ADD - Add
ADC - Add with Carry
SUB - Subtract
SBB - Subtract with Borrow
MUL - Multiply
DIV - Divide
CMP - Compare
x86 Control Flow Instructions
Unconditional branch: JMP jump
Conditional branches:
For unsigned numbers: JA|JNBE, JAE|JNB|JNC, JB|JNAE, etc.
For signed numbers: JG|JNLE, JGE|JNL, JL|JNGE, etc.
For other type of comparisons: JP, JE, JS, JO, etc.
Counter update + conditional branches:
LOOP, LOOPZ, LOOPNZ
Call and return branches:
CALL, RET
Fundamental principles
The set of registers
Specific characteristics, organization, examples, etc.
The instruction set
Characteristics, typical addressing modes, examples
The timing and control unit (TCU)
Micro-programmed vs. hardwired, instruction pipelining
Compiler particularities
Register allocation, pipeline issues
RISC Architectures. Summary
RISC Principles (I)
Hardwired Control Unit
One cycle execution time
Each instruction is hardwired to be executed in a single cycle
CPI (clocks per instruction) = 1
reduced -> the amount of work any single instruction
accomplishes is reduced
Pipelining is used
Technique that allows for simultaneous execution of parts of
instructions
Leads to more efficient instruction processing
Large number of general purpose registers
Prevents large amounts of interactions with memory
RISC Principles (II)
Small number of instructions
Fixed instruction format(s)
Decreases the time needed to decode the instructions
Fixed instruction size
Small number of addressing modes
Leads to a small size of the addressing mode code
Memory access only through LOAD/STORE instructions
Data processing instructions cannot use memory operands
Helps to achieve the CPI = 1 goal
RISC Register Set Characteristics
A large number of general purpose registers (more than 32)
The size of the registers is the size of the usual operands
Identical, multifunctional general purpose registers
Allows any register to be used in any context
Simplifies compiler design
Physical vs. logical registers
Not all the physical registers may be available at all times
Logical registers are mapped into physical registers by the
CPU
RISC Instruction Set Characteristics
Fewer instructions than in a CISC instruction set
Simpler instructions than in a CISC instruction set
Instruction types
Memory access instructions (load / store)
Arithmetic and logic processing instructions
Always with register or immediate operands
Typically without an accumulator
Control flow instructions
Subprogram calls use register windows for parameter passing
I/O instructions
If the execution of every instruction can be broken up in N
states, then one can build a pipeline structure with N stages
This leads to the simultaneous execution of N instructions
Pipelining concept: at any moment in time several
instructions are in progress of execution, in various stages
Instruction pipelining is possible because all
instructions are executed in the same amount of time
Instruction pipelining leads to the desired CPI = 1
Note that the pipeline does not always run without interruption (there are exceptions)
RISC Instructions Pipelining. Summary
Register allocation
the process of determining which values should be placed
into which registers and at what times during the execution of
the program
values, not variables, are allocated to various registers, because
distinct uses of the same variable can be assigned to different
registers without affecting the logic of the program
Local register allocation
allocation within a very small piece of code, typically a basic
block
Global register allocation
assigns registers within an entire function
Register Allocation
Microprocessors I/O Techniques
I/O devices organization
ports and peripherals; dedicated instructions; memory
mapped ports
The polling technique
Interrupt-driven I/O
Definitions
Interrupt types
Vectored interrupts, x86 interrupts, software interrupts
Polling is a synchronous I/O technique: the communication
with the peripherals is initiated by the CPU
The Polling Technique
An x86 microprocessor communicates with two I/O ports
The most significant bit of the status byte indicates the
availability of the peripheral to receive data
The Polling Technique. Example
pollPort24: in AL, 24h
            shl AL, 1
            jnc pollPort24
            out 24h, AX
pollPort37: in AL, 37h
            shl AL, 1
            jnc pollPort37
            out 37h, AX
Do you notice any potential
problems in this code?
How would you optimize it?
Interrupt-driven I/O is an asynchronous I/O technique: the
communication with the CPU is initiated by the peripheral
Interrupt-driven I/O
The Interrupt Vectors Table (IVT)
Compute the size of the interrupt vectors table provided that
The size of the memory is 1 MB (1)
Each memory location stores 8 bits (2)
The processor uses linear memory organization (3)
The code sent by the peripheral has 8 bits (4)
(1) & (2) => there are 2^20 memory locations => the PA has 20
bits (5)
(3) => the processor uses physical addresses directly (6)
(5) & (6) => the size of an interrupt vector is 3 bytes = 24 bits >
20 bits (7)
(4) => there are 256 interrupt vectors in the table
(4) & (7) => the IVT has 256 vectors x 3B = 768B
