Comp Arch Module Final

COMPUTER ARCHITECTURE AND ORGANIZATION
ECEG-3143
Guide Book
FACULTY OF TECHNOLOGY
Department of Electrical & Computer Engineering
Contents
CHAPTER 01
[FUNDAMENTAL CONCEPTS OF COMPUTER ORGANIZATION & ARCHITECTURE]
Chapter Description
1.1 Introduction
1.1.1. Organization and Architecture
1.1.2. Structure and Function
1.2 Computer Evolution and Performance
1.2.1. Brief History of Computers
1.2.2. Measuring Performance
1.2.3. Performance Improvement Techniques
CHAPTER 02
[A TOP LEVEL VIEW OF COMPUTER]
Chapter Description
2.1. Computer Components
Program Concept
What is a program?
Function of Control Unit
Components
Computer Components: Top Level View
Some basic registers inside CPU
2.2. Computer Function
Instruction Cycle
Example of Program Execution
Instruction Cycle State Diagram
Interrupts
Interrupt Cycle
Transfer of Control via Interrupts
Instruction Cycle with Interrupts
Instruction Cycle (with Interrupts) - State Diagram
2.3. Interconnection Structures
Computer Modules
Memory Connection
Input/Output Connection
CPU Connection
2.4. Bus Interconnection
What is a Bus?
Bus Interconnection Scheme
Bus Types
Bus Arbitration
Centralised or Distributed Arbitration
CHAPTER 03
[COMPUTER ARITHMETIC AND NUMBERING SYSTEMS]
Chapter Description
3.1. Arithmetic and Logic unit (ALU)
3.2. Integer Representation
Sign-Magnitude
Two's Complement
Conversion Between different bit Lengths
3.3. Integer Arithmetic
Addition and Subtraction
Multiplication
Division
3.4. Floating Point Representation
Real Numbers
IEEE Standard for Binary Floating-Point Representation
3.5. Floating Point Arithmetic
FP Arithmetic +/-
FP Arithmetic x/÷
CHAPTER 04
[INSTRUCTION SETS AND ADDRESSING MODES]
Chapter Description
4.1 Instruction sets
4.1.1 Introduction
4.1.2 Instruction Format
4.1.3 Instruction Types
4.2 Addressing modes
4.2.1 Immediate addressing modes
4.2.2 Direct addressing modes
4.2.3 Register addressing modes
4.2.4 Register indirect addressing modes
4.2.5 Displacement addressing modes
4.2.6 Stack addressing modes
x86 addressing modes
CHAPTER 05
[PROCESSOR ORGANIZATION & INSTRUCTION CYCLE]
Chapter description
5.1 Processor organization
5.2 Register Organizations
Types of registers
5.3 Instruction cycle and Pipeline
Instruction cycle
Instruction Pipelining
CHAPTER 06
[COMPUTER MEMORY]
6.1 Computer Memory System Overview
6.2 Cache Memory
CHAPTER 7
[Input/output]
7.1 EXTERNAL DEVICES
7.2 I/O MODULES
7.2.1 I/O steps
7.3 I/O techniques
Table of Figures
Figure 1.1 the Computer
Figure 1.2 the computer top-level structure
Figure 2.1 Computer components: Top level view
Figure 2.2 Instruction cycle
Figure 2.3 Example of Program Execution (contents of memory and registers in hexadecimal)
Figure 2.4 Characteristics of a Hypothetical Machine
Figure 2.5 Instruction Cycle State Diagram
Figure 2.6 Transfer of Control via Interrupts
Figure 2.7 Instruction Cycle with Interrupts
Figure 2.8 Instruction Cycle (with Interrupts) - State Diagram
Figure 2.8 Computer modules
Figure 2.9 Bus Interconnection Scheme
Figure 3.1 ALU inputs and outputs
Figure 3.2 Hardware for Addition and subtraction
Figure 3.3 Flowchart for unsigned binary multiplication
Figure 3.4 Hardware Implementation of Unsigned Binary Multiplication
Figure 3.5 Booth's Algorithm for Two's Complement Multiplication
Figure 3.5 Flowchart for Unsigned Binary Division
Figure 3.6 Expressible Numbers in Typical 32-Bit Formats
Figure 3.7 FP addition and subtraction
Figure 4.1 Unconditional jump program sequence
Figure 4.2 Conditional jump program sequence
CHAPTER 01
[FUNDAMENTAL CONCEPTS OF COMPUTER
ORGANIZATION & ARCHITECTURE]
Chapter Description
Chapter 1 introduces the concept of the computer as a hierarchical system. A computer can be viewed as
a structure of components and its function described in terms of the collective function of its
cooperating components. Each component, in turn, can be described in terms of its internal structure
and function. The major levels of this hierarchical view are introduced.
The chapter also discusses the history of computers, how computer performance is measured, and
techniques used to improve computer performance.
1.1 Introduction
Brainstorming
What is a computer? List some of the computers you know.
This course is about the structure and function of computers. Its purpose is to present, as clearly and
completely as possible, the nature and characteristics of modern-day computers. This task is a
challenging one for two reasons.
First, many different devices are considered computers. These devices
vary widely in cost, size, performance, and application.
Second, the rapid pace of change that has always characterized computer technology continues
with no letup.
In spite of the variety and pace of change in the computer field, certain fundamental concepts apply
consistently throughout.
The intent of this course is to provide a complete discussion of the fundamentals of computer
organization and architecture and to relate these to contemporary computer design issues.
1.1.1. Organization and Architecture

Computer architecture refers to those attributes of a system visible to a programmer or, put another
way, those attributes that have a direct impact on the logical execution of a program.
Computer organization refers to the operational units and their interconnections that realize the
architectural specifications.
Examples of architectural attributes include
the instruction set,
the number of bits used to represent various data types (e.g., numbers, characters),
I/O mechanisms, and
techniques for addressing memory.
Organizational attributes include those hardware details transparent to the programmer, such as
control signals,
interfaces between the computer and peripherals, and
the memory technology used.
Historically, and still today, the distinction between architecture and organization has been an important
one.
This course examines both computer organization and computer architecture. The emphasis is perhaps
more on the side of organization.
1.1.2. Structure and Function

Most complex systems, including the computer, are hierarchical in nature. A hierarchical system is a set of
interrelated subsystems, each of which is in turn hierarchical in structure, until we reach some lowest
level of elementary subsystem.
The hierarchical nature of complex systems is essential to both their design and their description. The
designer need only deal with a particular level of the system at a time. At each level, the system consists
of a set of components and their interrelationships. At each level, the designer is concerned with
structure and function:
Structure: The way in which the components are interrelated

Function: The operation of each individual component as part of the structure
The computer system will be described from the top down. We begin with the major components of a
computer, describing their structure and function, and proceed to successively lower layers of the
hierarchy.
Function
In general terms, there are only four basic functions that a computer can perform:
Data processing
Data storage
Data movement
Control
Structure
Figure 1.1 is the simplest possible depiction of a computer.
Figure 1.1 the Computer
But of greater concern in this course is the internal structure of the computer itself, which is shown in
Figure 1.2.
There are four main structural components:
The central processing unit (CPU)

I/O
Main Memory
System interconnection
Each of these components will be examined in some detail in other chapters.
The most interesting and in some ways the most complex component is the CPU. Its major structural
components are as follows:
Control Unit
Arithmetic and Logic Unit (ALU)
Registers
CPU interconnection
Each of these components will also be examined in some detail in Chapter 5, Processor Organization &
Instruction Cycle.
Finally, there are several approaches to the implementation of the control unit; one common approach is
a microprogrammed implementation. With this approach, the structure of the control unit can be
depicted as in Figure 1.2. This structure will be examined in Chapter 8.
Figure 1.2 the computer top-level structure
1.2 Computer Evolution and Performance

1.2.1. Brief History of Computers
In computer terminology, a generation refers to a change in the technology on which computers are built.
Initially, the term was used to distinguish between varying hardware technologies. Nowadays, it
covers both hardware and software, which together make up an entire computer system.
Five computer generations are recognized to date. Each is listed below with its time period and
defining technology. The dates given are approximate but are the ones normally accepted.
The five main generations of computers are as follows:
First Generation
The period of the first generation: 1946-1959. Vacuum tube based.
Second Generation
The period of the second generation: 1959-1965. Transistor based.
Third Generation
The period of the third generation: 1965-1971. Integrated circuit based.
Fourth Generation
The period of the fourth generation: 1971-1980. VLSI microprocessor based.
Fifth Generation
The period of the fifth generation: 1980 onwards. ULSI microprocessor based.
1.2.2. Measuring Performance

All computers have a clock that determines when events take place.
One period of this clock is called the clock cycle time.
The average number of clock cycles per instruction for a program is called clock cycles per instruction (CPI).
CPU execution time: Total time a CPU spends computing on a given task (excludes time for I/O or
running other programs). This is also referred to as simply CPU time.
Instruction count: number of instructions executed by a program
Less CPU time => Better performance
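The quantities defined above combine into the standard relation CPU time = instruction count x CPI x clock cycle time. The following Python sketch illustrates it; the function name and the figures used for the two processors are hypothetical values chosen only for illustration.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """Return CPU time in seconds: instruction count x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # clock cycle time is the inverse of clock rate
    return instruction_count * cpi * clock_cycle_time


# Hypothetical processors running the same 2-million-instruction program.
p1 = cpu_time(instruction_count=2_000_000, cpi=2.0, clock_rate_hz=2e9)  # 2 GHz, CPI 2.0
p2 = cpu_time(instruction_count=2_000_000, cpi=1.5, clock_rate_hz=1e9)  # 1 GHz, CPI 1.5

# Less CPU time means better performance.
faster = "P1" if p1 < p2 else "P2"
print(f"P1: {p1 * 1e3:.3f} ms, P2: {p2 * 1e3:.3f} ms, faster: {faster}")
```

With these assumed figures P1 needs about 2 ms and P2 about 3 ms, so P1 performs better even though its CPI is higher, because its clock cycle time is half as long.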
Q. Which processor has better performance, P1 or P2?
Response time: Total time to complete a task, including time spent executing on the CPU, accessing disk
and memory, waiting for I/O and other processes, and operating system overhead.
Throughput (Bandwidth): Number of tasks completed per unit time
To improve performance:
Reduce response time
Increase throughput
1.2.3. Performance Improvement Techniques

Obvious solution: Increase clock rate
The clock rate is the inverse of the clock cycle time.
(Increasing clock rate => reduced response time => improved performance)
Performance can be improved by improving response time and/or throughput
Techniques that improve response time
Increasing clock rate
Cache
Techniques that improve throughput
Instruction-level parallelism (pipelining)
Multiple cores
Pipelining:
The processor fetches, decodes, executes, and writes back different instructions at the same time.
This improves only throughput; the time to complete a single instruction is not reduced.
Fetch Unit gets the next instruction from the cache.
Decode Unit determines type of instruction.
Instruction and data sent to Execution Unit.
Write Unit stores result.
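The effect of the four units working in parallel can be estimated with a short calculation. The sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and there are no stalls; the function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle (idealized, no stalls)."""
    return n_stages + (n_instructions - 1)


# Four stages: fetch, decode, execute, write.
n, k = 100, 4
print(cycles_unpipelined(n, k))  # 400
print(cycles_pipelined(n, k))    # 103
```

Each instruction still takes four cycles from fetch to write, so response time per instruction is unchanged, while throughput approaches one instruction per cycle.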
Multiple cores:
Modern microprocessors contain multiple processors (cores) on a single chip
CHAPTER 02
[A TOP LEVEL VIEW OF COMPUTER]
Chapter Description
This chapter provides a brief examination of the computer's components and their input-output
requirements. It also looks at key issues that affect interconnection design, especially the need to
support interrupts.
2.1. Computer Components

Program Concept
Hardwired systems are inflexible
General purpose hardware can do different tasks, given correct control signals
Instead of re-wiring, supply a new set of control signals
What is a program?
A sequence of steps
For each step, an arithmetic or logical operation is done
For each operation, a different set of control signals is needed
Function of Control Unit

For each operation a unique code is provided
e.g. ADD, MOVE
A hardware segment accepts the code and issues the control signals
Components
The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit
Data and instructions need to get into the system and results out
Input/output
Temporary storage of code and results is needed
Main memory
Computer Components: Top Level View
Figure 2.1 Computer components: Top level view
Some basic registers inside CPU

Program Counter (PC): Holds address of next instruction to fetch.
Instruction Register (IR): Temporarily holds fetched instruction while it is being read (decoded)
by CPU.
Memory Address Register (MAR): Specifies the address in memory of the word to be written
from, or read into, the MBR.
Memory Buffer (data) register (MBR): Contains a word to be stored in memory or is used to
receive a word from memory.
Input/output address register (I/O AR): specifies a particular I/O device.
Input/output buffer register (I/O BR): used for the exchange of data between an I/O module and
the CPU.
2.2. Computer Function

Instruction Cycle
Two steps:
Fetch
Execute
Figure 2.2 Instruction cycle
Fetch Cycle
Program Counter (PC) holds address of next instruction to fetch
Processor fetches instruction from memory location pointed to by PC
Increment PC
Unless told otherwise
Instruction loaded into Instruction Register (IR)
Processor interprets instruction and performs required actions
Execute Cycle
Processor-memory
data transfer between CPU and main memory
Processor I/O
Data transfer between CPU and I/O module
Data processing
Some arithmetic or logical operation on data
Control
Alteration of sequence of operations
e.g. jump
Combination of above
Example of Program Execution

Consider a simple example using a hypothetical machine that includes the characteristics listed in Figure
2.4.
Figure 2.3 illustrates a partial program execution, showing the relevant portions of memory and
processor registers. The program fragment shown adds the contents of the memory word at address 940
to the contents of the memory word at address 941 and stores the result in the latter location.
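The fetch and execute steps for such a program fragment can be traced with a small simulation. The sketch below is only an illustration: it assumes a 16-bit instruction with a 4-bit opcode and a 12-bit address, and assumed opcode values (1 = load AC from memory, 5 = add memory to AC, 2 = store AC to memory); the actual characteristics are those listed in Figure 2.4. The program start address 300 and the data values 3 and 2 are also illustrative. All addresses and contents are in hexadecimal.

```python
# Assumed opcode values for the hypothetical machine (illustrative only).
LOAD, STORE, ADD = 0x1, 0x2, 0x5


def run(memory: dict, pc: int, steps: int):
    """Run `steps` instruction cycles and return the final (PC, AC)."""
    ac = 0  # accumulator
    for _ in range(steps):
        ir = memory[pc]                       # fetch: instruction at PC goes into IR
        pc += 1                               # increment PC
        opcode, addr = ir >> 12, ir & 0xFFF   # decode: 4-bit opcode, 12-bit address
        if opcode == LOAD:                    # execute
            ac = memory[addr]
        elif opcode == ADD:
            ac = (ac + memory[addr]) & 0xFFFF
        elif opcode == STORE:
            memory[addr] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return pc, ac


memory = {
    0x300: 0x1940,  # load AC from location 940
    0x301: 0x5941,  # add contents of location 941 to AC
    0x302: 0x2941,  # store AC to location 941
    0x940: 0x0003,
    0x941: 0x0002,
}
pc, ac = run(memory, pc=0x300, steps=3)
print(f"PC={pc:03X} AC={ac:04X} mem[941]={memory[0x941]:04X}")
```

Three fetch cycles and three execute cycles are needed, after which location 941 holds the sum of the two original values.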
Figure 2.3 Example of Program Execution (contents of memory and registers in hexadecimal)
Figure 2.4 Characteristics of a Hypothetical Machine
Instruction Cycle State Diagram
Figure 2.5 Instruction Cycle State Diagram
Interrupts
Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of processing
Program
e.g. overflow, division by zero
Timer
Generated by internal processor timer
Used in pre-emptive multi-tasking
I/O
from I/O controller
Hardware failure
e.g. memory parity error
Interrupt Cycle
Added to instruction cycle
Processor checks for interrupt
Indicated by an interrupt signal
If no interrupt, fetch next instruction
If interrupt pending:
Suspend execution of current program
Save context
Set PC to start address of interrupt handler routine
Process interrupt
Restore context and continue interrupted program
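The steps above can be sketched as a loop that checks for a pending interrupt after each instruction. This is a simplified model, not a description of any particular processor: instructions are represented by labels, the saved context is reduced to the program counter, and the names `run_with_interrupts`, `handler`, and `interrupt_after` are hypothetical.

```python
def run_with_interrupts(program, handler, interrupt_after):
    """Return the order in which instructions execute.

    program         -- list of instruction labels of the user program
    handler         -- list of instruction labels of the interrupt handler routine
    interrupt_after -- set of program indices after which an interrupt is pending
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])        # fetch and execute the current instruction
        pending = pc in interrupt_after  # processor checks for an interrupt signal
        pc += 1
        if pending:
            saved_pc = pc                # suspend the program and save its context
            for instruction in handler:  # PC is set to the handler; process interrupt
                trace.append(instruction)
            pc = saved_pc                # restore context, continue interrupted program
    return trace


trace = run_with_interrupts(["i1", "i2", "i3"], ["h1", "h2"], interrupt_after={1})
print(trace)  # ['i1', 'i2', 'h1', 'h2', 'i3']
```

The interrupted program resumes at the instruction that follows the one during which the interrupt arrived, so it does not need any special code to accommodate interrupts.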
Transfer of Control via Interrupts
Figure 2.6 Transfer of Control via Interrupts
Instruction Cycle with Interrupts
Figure 2.7 Instruction Cycle with Interrupts
Instruction Cycle (with Interrupts) - State Diagram
Figure 2.8 Instruction Cycle (with Interrupts) - State Diagram
2.3. Interconnection Structures
A computer consists of a set of components or modules of three basic types (processor, memory,
I/O) that communicate with each other. In effect, a computer is a network of basic modules.
Thus, there must be paths for connecting the modules.
The collection of paths connecting the various modules is called the interconnection structure.
Different types of connection are needed for different types of unit:
o Memory
o Input/Output
o CPU
Computer Modules
Figure 2.8 suggests the types of exchanges that are needed by indicating the major forms of input and
output for each module type:
Figure 2.8 Computer modules
Memory Connection
Receives and sends data
Receives addresses (of locations)
Receives control signals
Read
Write
Input/Output Connection
Similar to memory from the processor's viewpoint
Output
Receive data from computer
Send data to peripheral
Input
Receive data from peripheral
Send data to computer
Receive control signals from computer
Send control signals to peripherals
e.g. spin disk
Receive addresses from computer
e.g. port number to identify peripheral
Send interrupt signals
CPU Connection
Reads instruction and data
Writes out data (after processing)
Sends control signals to other units
Receives (& acts on) interrupts
2.4. Bus Interconnection

What is a Bus?
A shared communication pathway connecting two or more devices
Usually broadcast
Often grouped
A number of channels in one bus
e.g. 32 bit data bus is 32 separate single bit channels
Power lines may not be shown
Data Bus
Carries data
Remember that there is no difference between data and instruction at this level
Width is a key determinant of performance
8, 16, 32, 64 bit
Address bus
Identify the source or destination of data
e.g. CPU needs to read an instruction (data) from a given location in memory
Bus width determines maximum memory capacity of system
e.g. 8080 has 16 bit address bus giving 64k address space
Control Bus
Control and timing information
Memory read/write signal
I/O read/write signal
Bus request/grant
Interrupt request
Clock signals
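The relationship between address bus width and maximum memory capacity noted above is a power of two: n address lines can select 2^n distinct locations. A minimal sketch (the function name is hypothetical):

```python
def address_space(address_bus_width_bits: int) -> int:
    """Number of distinct locations addressable with the given bus width."""
    return 2 ** address_bus_width_bits


locations = address_space(16)           # e.g. the 8080's 16-bit address bus
print(locations, locations // 1024)     # 65536 locations, i.e. 64K
```

Each extra address line doubles the addressable space, which is why a 32-bit address bus reaches 4G locations.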
Bus Interconnection Scheme
Figure 2.9 Bus Interconnection Scheme
Bus Types
Dedicated
Separate data & address lines
Multiplexed
Shared lines
Address valid or data valid control line
Advantage - fewer lines
Disadvantages
More complex control
Reduced performance
Bus Arbitration
More than one module controlling the bus
e.g. CPU and DMA controller
Only one module may control bus at one time
Arbitration may be centralised or distributed
Centralised or Distributed Arbitration

Centralised
Single hardware device controlling bus access
Bus Controller
Arbiter
May be part of CPU or separate
Distributed
Each module may claim the bus
Control logic on all modules
CHAPTER 03
[COMPUTER ARITHMETIC AND NUMBERING SYSTEMS]
Chapter Description
This chapter examines the functionality of the arithmetic and logic unit (ALU) and focuses on the
representation of numbers and techniques for implementing arithmetic operations. Processors typically
support two types of arithmetic: integer, or fixed point, and floating point. For both cases, the chapter
first examines the representation of numbers and then discusses arithmetic operations.
3.1. Arithmetic and Logic unit (ALU)

Does the calculations
Everything else in the computer is there to service this unit
Handles integers
May handle floating point (real) numbers
Figure 3.1 ALU inputs and outputs
3.2. Integer Representation

Only have 0 & 1 to represent everything
Positive numbers stored in binary
e.g. 41=00101001
No minus sign
No period (radix point)
Sign-Magnitude
Two's complement
Sign-Magnitude
Left most bit is sign bit
0 means positive
1 means negative
+18 = 00010010
-18 = 10010010
Problems
Need to consider both sign and magnitude in arithmetic
Two representations of zero (+0 and -0)
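A minimal Python sketch of sign-magnitude encoding and decoding, reproducing the +18 and -18 examples and the two representations of zero. The function names are hypothetical.

```python
def to_sign_magnitude(value: int, bits: int = 8) -> str:
    """Encode an integer as a sign-magnitude bit string."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit in the available bits")
    sign = 1 if value < 0 else 0                 # leftmost bit: 0 positive, 1 negative
    return format((sign << (bits - 1)) | magnitude, f"0{bits}b")


def from_sign_magnitude(bit_string: str) -> int:
    """Decode a sign-magnitude bit string back to an integer."""
    sign = -1 if bit_string[0] == "1" else 1
    return sign * int(bit_string[1:], 2)


print(to_sign_magnitude(18))     # 00010010
print(to_sign_magnitude(-18))    # 10010010
# Both bit patterns below decode to zero: +0 and -0.
print(from_sign_magnitude("00000000"), from_sign_magnitude("10000000"))
```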
Two's Complement
+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
Benefits
One representation of zero
Arithmetic works easily (see later)
Negating is fairly easy
3 = 00000011
Boolean complement gives 11111100
Add 1 to LSB 11111101
Negation Special Case 1
0 =              00000000
Bitwise not      11111111
Add 1 to LSB           +1
Result         1 00000000
Overflow is ignored, so:
-0 = 0
Negation Special Case 2
-128 = 10000000
bitwise not 01111111
Add 1 to LSB +1
Result 10000000
So:
-(-128) = -128 X
Monitor MSB (sign bit)
It should change during negation
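The negation rule and its two special cases can be checked with a short Python sketch. The function name `negate` and the default width of 8 bits are illustrative assumptions, not part of the original notes.

```python
def negate(value, bits=8):
    """Two's complement negation: invert every bit, then add 1."""
    mask = (1 << bits) - 1          # 0xFF for 8 bits
    inverted = ~value & mask        # Boolean (bitwise) complement
    return (inverted + 1) & mask    # add 1; the carry out of the MSB is dropped


print(format(negate(0b00000011), "08b"))  # negating +3
print(format(negate(0b00000000), "08b"))  # special case 1: zero
print(format(negate(0b10000000), "08b"))  # special case 2: -128
```

In the last call the sign bit does not change, which is the condition the notes say to monitor.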
Range of Numbers
8-bit two's complement
+127 = 01111111 = $2^7 - 1$
-128 = 10000000 = $-2^7$
16-bit two's complement
+32767 = 01111111 11111111 = $2^{15} - 1$
-32768 = 10000000 00000000 = $-2^{15}$
Conversion Between different bit Lengths

Positive number pack with leading zeros
+18 = 00010010
+18 = 00000000 00010010
Negative numbers pack with leading ones
-18 = 11101110
-18 = 11111111 11101110
i.e. pack with MSB (sign bit)
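As a small illustration of packing with the sign bit, the sketch below widens an 8-bit pattern to 16 bits. The name `sign_extend` and its parameters are assumptions made for this example.

```python
def sign_extend(value, from_bits=8, to_bits=16):
    """Widen a two's complement bit pattern by copying the sign bit."""
    sign_bit = (value >> (from_bits - 1)) & 1
    if sign_bit:
        # Fill the new upper bits with ones for a negative number.
        fill = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return value | fill
    return value  # positive: the new upper bits stay zero


print(format(sign_extend(0b00010010), "016b"))  # +18
print(format(sign_extend(0b11101110), "016b"))  # -18
```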
3.3. Integer Arithmetic

Addition and Subtraction
Normal binary addition
Monitor sign bit for overflow
Take the two's complement of the subtrahend and add it to the minuend
i.e. a - b = a + (-b)
So we only need addition and complement circuits
Overflow rule
If two numbers are added, and they are both positive or both negative, then overflow occurs if and only
if the result has the opposite sign.
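The overflow rule and the identity a - b = a + (-b) can be sketched as follows. The function names are hypothetical, operands are 8-bit patterns by assumption, and the sketch does not treat the special case of negating -128 in `subtract`.

```python
def add(a, b, bits=8):
    """Add two bit patterns; return (result, overflow)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (a + b) & mask
    same_sign_inputs = (a & sign) == (b & sign)
    # Overflow only if the inputs share a sign and the result does not.
    overflow = same_sign_inputs and (result & sign) != (a & sign)
    return result, overflow


def subtract(a, b, bits=8):
    """a - b computed as a + (two's complement of b)."""
    mask = (1 << bits) - 1
    return add(a, (~b + 1) & mask, bits)


print(add(0b01111111, 0b00000001))   # +127 + 1 overflows
print(subtract(5, 7))                # result pattern 0xFE represents -2
```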
Hardware for Addition and Subtraction
Figure 3.2 Hardware for Addition and subtraction
Multiplication
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Example:
        1011    Multiplicand (11 dec)
      x 1101    Multiplier (13 dec)
      ------
        1011    Partial products
       0000     Note: if the multiplier bit is 1, copy the
      1011      multiplicand (shifted to its place value);
     1011       otherwise the partial product is zero
    --------
    10001111    Product (143 dec)
Note: need double length result
Flowchart for Unsigned Binary Multiplication
Figure 3.3 Flowchart for unsigned binary multiplication
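Since the flowchart is a figure, a minimal Python sketch of the usual shift-and-add scheme may help. It assumes the register names M (multiplicand), Q (multiplier), A (accumulator) and C (carry) and a 4-bit word; the function name is hypothetical.

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Shift-and-add multiplication; the product ends up in A,Q."""
    mask = (1 << bits) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(bits):
        if q & 1:                       # Q0 = 1: add M into A, keep the carry in C
            total = a + m
            c = total >> bits
            a = total & mask
        # Shift C, A, Q right by one bit as a single register.
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (c << (bits - 1))
        c = 0
    return (a << bits) | q              # double-length result


print(multiply_unsigned(0b1011, 0b1101))  # 143
```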
Execution of Example
M multiplicand Q multiplier
Hardware implementation of Unsigned Binary Multiplication
Figure 3.4 Hardware Implementation of Unsigned Binary Multiplication
Multiplying negative numbers

Comparison of Multiplication of Unsigned and Twos Complement Integers
This does not work!

Solution 1
Convert to positive if required
Multiply as above
If signs were different, negate answer
Solution 2
Booth's algorithm
Booth's Algorithm
Figure 3.5 Booth's Algorithm for Two's Complement Multiplication
Example of Booth's Algorithm (7 × 3)
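The figure is not reproduced here, so the following is a sketch of Booth's algorithm as it is commonly stated, assuming registers A, Q, an extra bit Q-1, and M, with a 4-bit word. The function name is hypothetical, and the sketch does not handle the most negative multiplicand (for example -8 in 4 bits).

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Booth's algorithm on two's complement operands of `bits` bits."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    q_minus_1 = 0
    for _ in range(bits):
        pair = (q & 1, q_minus_1)
        if pair == (1, 0):
            a = (a - m) & mask          # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask          # A <- A + M
        # Arithmetic shift right of A, Q, Q-1 (the sign of A is preserved).
        q_minus_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign)
    product = (a << bits) | q
    if product & (1 << (2 * bits - 1)):  # interpret A,Q as a signed value
        product -= 1 << (2 * bits)
    return product


print(booth_multiply(7, 3))    # 21
print(booth_multiply(-7, 3))   # -21
```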
Division
More complex than multiplication
Negative numbers are really bad!
Division of Unsigned Binary Integers

An example of the long division of unsigned binary integers.
Flowchart for Unsigned Binary Division
Figure 3.5 Flowchart for Unsigned Binary Division
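As the flowchart is a figure, here is a sketch of the common restoring (shift, subtract, restore) scheme for unsigned division. It assumes registers A (remainder), Q (dividend, then quotient) and M (divisor) and a 4-bit word; the function name is hypothetical.

```python
def divide_unsigned(dividend, divisor, bits=4):
    """Restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << bits) - 1
    a = 0
    q = dividend & mask
    m = divisor & mask
    for _ in range(bits):
        # Shift A, Q left by one bit as a single register.
        a = (a << 1) | (q >> (bits - 1))
        q = (q << 1) & mask
        a -= m                          # trial subtraction
        if a < 0:
            a += m                      # restore A, quotient bit stays 0
        else:
            q |= 1                      # subtraction succeeded, quotient bit is 1
    return q, a


print(divide_unsigned(7, 3))  # (2, 1)
```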
Two's complement division

The algorithm can be summarized as follows:
1. Load the divisor into the M register and the dividend into the A, Q registers.
2. Shift A, Q left one bit position.
3. If M and A have the same sign, perform A ← A - M; otherwise perform A ← A + M.
4. The preceding operation is successful if the sign of A is the same before and after the operation.
   * If the operation is successful or A = 0, then set Q0 ← 1.
   * If the operation is unsuccessful and A ≠ 0, then set Q0 ← 0 and restore the previous value of A.
5. Repeat steps 2 through 4 as many times as there are bit positions in Q.
6. The remainder is in A. If the signs of the divisor and dividend are the same, then the quotient is Q;
   otherwise it is the two's complement of Q.
3.4. Floating Point Representation
Real Numbers
Numbers with fractions
Could be done in pure binary
1001.1010 = $2^3 + 2^0 + 2^{-1} + 2^{-3}$ = 9.625
Where is the binary point?
Fixed?
Very limited
Moving?
How do you show where it is?
Floating Point
± .significand × $2^{\text{exponent}}$
Point is actually fixed between sign bit and body of mantissa
Exponent indicates place value (point position)
Floating Point Examples
Signs for Floating Point
Exponent is in excess or biased notation
e.g. Excess (bias) 127 means
8 bit exponent field
Pure value range 0-255
Subtract 127 to get correct value
Range -127 to +128
Normalization
FP numbers are usually normalized
i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
Since it is always 1 there is no need to store it
(c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point,
e.g. $3.123 \times 10^3$)
Accuracy
The effect of changing the LSB of the mantissa
23-bit mantissa: $2^{-23} \approx 1.2 \times 10^{-7}$
About 6 decimal places
Maximum Value is determined by the exponent
For comparison, Figure 3.6 indicates the range of numbers that can be represented in a 32-bit word.
Figure 3.6 Expressible Numbers in Typical 32-Bit Formats
IEEE Standard for Binary Floating-Point Representation

IEEE 754
Standard for floating point storage
32 and 64 bit standards
8 and 11 bit exponent respectively
Extended formats (both mantissa and exponent) for intermediate results
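To make the 32-bit layout concrete, the sketch below splits a single-precision value into its sign bit, 8-bit biased exponent and 23-bit fraction using Python's standard `struct` module. The function name `decode_float32` is an assumption for this example.

```python
import struct


def decode_float32(x):
    """Return (sign, biased_exponent, fraction) of x stored as IEEE 754 single."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = raw >> 31
    biased_exponent = (raw >> 23) & 0xFF   # stored exponent, bias 127
    fraction = raw & 0x7FFFFF              # 23 stored bits; the leading 1 is implied
    return sign, biased_exponent, fraction


sign, exponent, fraction = decode_float32(9.625)
# 9.625 = 1.001101 (binary) x 2^3, so the stored exponent is 3 + 127
print(sign, exponent - 127, format(fraction, "023b"))
```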
IEEE 754 Formats
3.5. Floating Point Arithmetic

FP Arithmetic +/-
Check for zeros
Align significands (adjusting exponents)
Add or subtract significands
Normalize result
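The four steps can be sketched on a toy format. This is only an illustration under stated assumptions: values are non-negative, each is a pair (significand, exponent) meaning significand × 2^exponent, significands are 8-bit integers, and bits shifted out are simply truncated (no rounding). The function name is hypothetical.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, sig_bits=8):
    """Add two non-negative toy floating-point values."""
    # Step 1: check for zeros.
    if sig_a == 0:
        return sig_b, exp_b
    if sig_b == 0:
        return sig_a, exp_a
    # Step 2: align significands by shifting the one with the smaller exponent.
    if exp_a < exp_b:
        sig_a >>= exp_b - exp_a
        exp_a = exp_b
    else:
        sig_b >>= exp_a - exp_b
    # Step 3: add significands.
    total = sig_a + sig_b
    exponent = exp_a
    # Step 4: normalize if the sum no longer fits in sig_bits bits.
    while total >= (1 << sig_bits):
        total >>= 1
        exponent += 1
    return total, exponent


sig, exp = fp_add(0b11000000, -7, 0b10000000, -7)  # 1.5 + 1.0
print(sig * 2.0 ** exp)  # 2.5
```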
FP Addition & Subtraction Flowchart
Figure 3.7 FP addition and subtraction
FP Arithmetic ×/÷
Check for zero
Add/subtract exponents
Multiply/divide significands (watch sign)
Normalize
Round
All intermediate results should be in double length storage
Floating Point Multiplication Flowchart
Floating Point Division Flowchart
CHAPTER 04
[INSTRUCTION SETS AND ADDRESSING MODES]
Chapter Description
From a programmer's point of view, the best way to understand the operation of a processor is to learn
the machine instruction set that it executes. This chapter therefore studies instruction sets.
Architectural issues such as instruction set design and data types are covered.
4.1 Instruction sets
4.1.1 Introduction
Instructions?
Specify operations to be performed by a computer
Words of a computer's language
Instruction set
Collection of the instructions of a computer
The complete collection of instructions that are understood by a CPU
Elements of an Instruction
Operation code (opcode)
Specifies the operation to be performed
e.g. ADD, SUB, MUL, ...
Addresses (operands)
Provide more information about the operation
May include:
Source operands: specify where operands come from
Destination operands: specify where results go
Next instruction reference: specifies where to fetch next instruction from
Instruction layout: | Operation code (opcode) | Addresses (operands) |
Instructions to be read by a computer contain strings of 1s and 0s (They are numbers) (Machine
instructions)
Symbolic representations of machine instructions are used for convenience (assembly language)
Even more convenient (High-level languages)
High-level language            Assembly language            Machine language
void main()
{
  int a,b,c;    --Compiler-->  main:         --Assembler-->  0567
  c = a+b;                     ADD c,a,b
}
4.1.2 Instruction Format

How long is an instruction? How many operands?
Defines the layout of the bits of an instruction in terms of its constituent fields (What does each field
represent and how many bits is it?)
Common Instruction formats:
1. Zero operand:
Opcode
2. One operand:
Opcode Address
3. Two operands:
Opcode Address1 Address2
4. Three operands:
Opcode Address1 Address2 Address3
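A format fixes how many bits each field occupies. As a sketch, assume a hypothetical 16-bit two-operand format with a 4-bit opcode and two 6-bit address fields; these widths and the function names are invented for illustration and do not describe any particular processor.

```python
OPCODE_BITS = 4      # assumed field widths (hypothetical format)
ADDRESS_BITS = 6


def encode(opcode, address1, address2):
    """Pack the three fields into one 16-bit instruction word."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= address1 < (1 << ADDRESS_BITS)
    assert 0 <= address2 < (1 << ADDRESS_BITS)
    return (opcode << (2 * ADDRESS_BITS)) | (address1 << ADDRESS_BITS) | address2


def decode(word):
    """Split an instruction word back into (opcode, address1, address2)."""
    address_mask = (1 << ADDRESS_BITS) - 1
    opcode = word >> (2 * ADDRESS_BITS)
    address1 = (word >> ADDRESS_BITS) & address_mask
    address2 = word & address_mask
    return opcode, address1, address2


word = encode(1, 5, 9)
print(format(word, "016b"), decode(word))
```

The limit on each field (here 64 addresses and 16 opcodes) shows why the number of operands and the instruction length trade off against each other.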
Instruction Representation
In machine code each instruction has a unique bit pattern
For human consumption (well, programmers anyway) a symbolic representation is used
e.g. ADD, SUB, LOAD
Opcodes are represented by abbreviations, called mnemonics that indicate the operation.
Common examples include
ADD Add
SUB Subtract
MUL Multiply
DIV Divide
LOAD Load data from memory
STOR Store data to memory
Operands can also be represented in this way
ADD A,B
Number of Addresses
One of the traditional ways of describing processor architecture is in terms of the number of addresses
contained in each instruction
4.1.3 Instruction Types

Common types:
Data transfer(Data movement)
Arithmetic
Logical
Input/output
Transfer of control(Program flow control)
System control
Data transfer
Copy values from one location to another

(E.g. MOV, LEA, IN/OUT, PUSH/POP)
MOV destination, source
Destination: can be register or memory location
Source: can be register, memory location or an immediate number
E.g. MOV CX, 20 place the value 20 in the CX register (CX ← 20)
MOV CX, [20] copy value at memory location 20 to CX
PUSH source
Used to transfer data to stack
Source: can be register or memory location
E.g. PUSH CX copy CX to stack
POP destination
Used to retrieve data from stack
E.g. POP BX copy data on top of stack to register BX
Arithmetic
(E.g. ADD, INC, SUB, DEC, MUL, DIV)
ADD destination, source
Source: can be register, memory location or an immediate number
E.g. ADD CX, BX (CX ← CX + BX)
DEC destination
E.g. DEC CX (CX ← CX - 1)
MUL source
Source: can be register or memory location
Destination is an accumulator register, AX
E.g. MUL BL (AX ← AL × BL)
Logical
Operate on a bit-by-bit basis
(E.g. AND, OR, XOR, NOT, SHR, SHL)
E.g. AND CX, BX (CX ← CX AND BX)
Input /output
Instructions to read data from an input module and to write data to an output module
(E.g. IN, OUT)
IN accumulator, port
OUT port, accumulator
Port: address of the I/O module (8 bits for a port number given directly in an 8086 instruction)
Transfer of control
Instructions discussed so far execute sequentially
Transfer of control instructions change the sequence of execution (update value of the program
counter (PC))
Common transfer of control instructions

Branch (Jump) instructions
Procedure call instruction
Jump instructions
There are two types of jump: unconditional and conditional.
Unconditional jump
Figure 4.1 Unconditional jump program sequence
In unconditional jump, as the instruction is executed, the jump always takes place to change the execution
sequence
Conditional jump
branch is made if a certain condition is met
E.g. JZ target (Jump to target address if result of previous operation is zero)
E.g.   SUB CX, 32
       JZ label
label: MOV BX, 10
Figure 4.2 Conditional jump program sequence.
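How a jump changes the program counter can be sketched with a tiny interpreter. Everything here is an illustrative assumption: instructions are Python tuples, jump targets are instruction indexes rather than labels, only SUB updates the zero flag, and the `MOV DX, 1` instruction is added purely so that a skipped instruction is visible.

```python
def run(program, registers):
    """Execute a list of (opcode, operands...) tuples; return the registers."""
    pc = 0                  # program counter: index of the next instruction
    zero_flag = False
    while pc < len(program):
        opcode, *operands = program[pc]
        pc += 1             # default: continue with the next instruction
        if opcode == "MOV":
            registers[operands[0]] = operands[1]
        elif opcode == "SUB":
            registers[operands[0]] -= operands[1]
            zero_flag = registers[operands[0]] == 0
        elif opcode == "JZ":        # conditional jump
            if zero_flag:
                pc = operands[0]
        elif opcode == "JMP":       # unconditional jump
            pc = operands[0]
    return registers


program = [
    ("SUB", "CX", 32),
    ("JZ", 3),              # skip the next instruction when CX became zero
    ("MOV", "DX", 1),
    ("MOV", "BX", 10),      # plays the role of "label:"
]
print(run(program, {"BX": 0, "CX": 32, "DX": 0}))
print(run(program, {"BX": 0, "CX": 40, "DX": 0}))
```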
Procedure call Instructions

Instruct the processor to go and execute an entire procedure and return
CALL instruction is used to call the subroutine.
RET instruction must be included at the end of the subroutine to initiate the return
sequence to the main program environment
4.2 Addressing modes
An addressing mode is a method of specifying an operand.

How is the address of an operand specified?
Common addressing modes
Immediate
Direct
Register
Register indirect
Displacement
Stack
4.2.1 Immediate addressing modes
Operand is specified in the instruction itself
E.g. MOV R1, 100 (R1 ← 100)
The operand is part of the instruction instead of the contents of a register or a Memory location
Advantage
Does not require extra memory reference to fetch the operand
Drawback
Only a constant can be supplied
The number of values is limited by the size of the operand field
4.2.2 Direct addressing modes
The value of the effective address is encoded directly in the instruction.

E.g. MOV R1, [100] (100 is the memory address of the operand)
Memory contents:
  Address   Value
  99        78
  100       96
  101       0
CPU register after execution: R1 = 96
Advantage
Requires only one memory reference
Drawback
Can address a limited number of memory locations (relatively smaller address space)
4.2.3 Register addressing modes

Register address is specified in the address field of the instruction
E.g. MOV R1, 100 (R1 ← 100); the destination R1 is identified by its register address
Most common addressing mode in most computers
4.2.4 Register indirect addressing modes

Register that holds memory address is specified in the address field of the instruction
E.g. MOV R1, [R2]
Memory Address
Advantage
Can address larger number of memory locations compared with direct addressing
4.2.5 Displacement addressing modes

Combines direct addressing and register indirect addressing
The address held in a register is added to a displacement value to get the effective address in memory
E.g. MOV R1, [R2+100] (100 is the displacement value)
4.2.6 Stack addressing modes

An implied addressing that refers to the top of a stack
It is implied that the address is contained inside a stack pointer register
E.g. PUSH R1
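The modes above differ only in how the operand is located. The sketch below resolves an operand for each mode, reusing the memory contents from the direct addressing example; the register values, the function name `fetch_operand` and the mode names are assumptions made for illustration.

```python
memory = {99: 78, 100: 96, 101: 0}       # address -> value
registers = {"R2": 99, "SP": 101}        # assumed register contents


def fetch_operand(mode, field, displacement=0):
    """Return the operand selected by an addressing mode."""
    if mode == "immediate":              # operand is in the instruction itself
        return field
    if mode == "direct":                 # field is a memory address
        return memory[field]
    if mode == "register":               # field names a register
        return registers[field]
    if mode == "register_indirect":      # register holds the memory address
        return memory[registers[field]]
    if mode == "displacement":           # register content plus displacement
        return memory[registers[field] + displacement]
    if mode == "stack":                  # address is implied: top of stack
        return memory[registers["SP"]]
    raise ValueError(f"unknown addressing mode: {mode}")


print(fetch_operand("immediate", 100))
print(fetch_operand("direct", 100))
print(fetch_operand("register_indirect", "R2"))
print(fetch_operand("displacement", "R2", 1))
```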
x86 addressing modes
Register, Immediate
E.g. MOV AX, 0546
Direct
E.g. ADD AX, [0546]
Register Indirect
E.g. MOV [BX], AX
Displacement
Indexed
E.g. MOV AX, [R+0645] where R is an Index register (SI or DI)
Based
E.g. MOV AX, [R+0645] where R is a base register (BX or BP)
CHAPTER 05
[PROCESSOR ORGANIZATION & INSTRUCTION CYCLE]

Chapter description
This chapter is devoted to a discussion of the internal structure and function of the processor. The
chapter describes the use of registers as the CPU's internal memory and then pulls together all of the
material covered so far to provide an overview of CPU structure and function. The overall organization
(ALU, register file, control unit) is reviewed. Then the organization of the register file is discussed. The
instruction cycle is examined to show the function and interrelationship of fetch, indirect, execute, and
interrupt cycles. Finally, the use of pipelining to improve performance is explored in depth.
5.1 Processor organization

What is a processor (CPU) required to do?
Fetch and execute instructions:
Resources used      Task                              Source/destination
PC, IR              Fetch instruction                 From memory
Decoding circuit    Interpret (decode) instruction
MAR, MBR            [Fetch data]                      From memory, I/O
ALU                 [Process data]
MAR, MBR            [Write data]                      To memory, I/O
CPU contains:
Registers
Internal processor memory
ALU
performs arithmetic and logic operations (processes data)
Operates only on data in registers
ALU with its inputs and outputs is termed as a data path
Control Unit
Decodes instructions, generates control signals to control the processor
Internal Bus
Interconnects CPU parts
5.2 Register Organizations

Types of registers
User-visible registers
They can be directly accessed (read or written to) by programmers (instructions)
Used to minimize memory reference
Control registers
Used by control unit to control operation of the processor
Status (flag) registers

Indicate the current state (status) of the processor
No clean separation of registers into these categories (depends on the processor)

User-visible registers
General purpose registers

Can be used for a variety of functions
(hold data, used for addressing)
Data registers
Hold only data
e.g. Accumulator (working) register used to store intermediate ALU results
Address registers
Only used for addressing
e.g. Segment registers (SS, DS, CS and ES in x86)

Index registers (SI, DI in x86)
Stack pointer
Control registers
Program Counter (PC): Contains address of next instruction to be fetched
Instruction Register (IR): Temporarily holds most recently fetched instruction
Memory Address Register (MAR): Specifies the address in memory of the word to be written
from or read into the MBR
Memory Buffer Register (MBR): Contains a word to be stored in memory or is used to receive a
word from memory
Status registers
e.g. Flag register (x86), CPSR(ARM)
Flags : Indicate the occurrence of an event in the CPU
Carry flag (CF), Zero flag (ZF), Sign flag (SF), Interrupt flag (IF), Overflow flag (OF)
Used by branch (jump) instructions and interrupts (CPU checks the appropriate flags when a
conditional branch instruction is encountered or when interrupt is enabled)
5.3 Instruction cycle and Pipeline

Instruction cycle
In Section 2.2, we described the processor's instruction cycle. To recall, an instruction cycle includes the
following stages:
Fetch: Read the next instruction from memory into the processor.
Execute: Interpret the opcode and perform the indicated operation.
Interrupt: If interrupts are enabled and an interrupt has occurred, save the current process state
and service the interrupt.
Instruction Cycle with Interrupt
Instruction Pipelining
Review
Execution time for a program = number of instructions × CPI × clock period
where CPI is the average number of clock cycles per instruction
e.g. Suppose a program has 10 instructions with the following relationship between instructions and
clock cycles required to execute each instruction
To reduce execution time:
Reduce clock period (Increase clock frequency)

(Improve response time)
Reduce CPI (execute more instructions with the same number of clock cycles)
(Improve throughput)
One approach to reduce CPI is to overlap execution of instructions (pipelining)
Pipelining
Instruction cycle has several stages (fetch, decode, execute)
Let instructions execute one after the other
(assume one clock cycle per stage (3 clock cycles per instruction) )
With pipelining, the stages of successive instructions overlap:
5 clock cycles for 3 instructions instead of 9 (CPI is reduced)
Additional hardware is required for a pipelined processor (pipeline registers between the stages)
In practice the three stages may take different times (clock cycles): execution may take more
time than decoding. This would reduce the effectiveness of the pipeline
Currently decoded instruction has to wait until previous instruction is executed

Throughput is limited by the slowest stage
If we have more stages:
The stages will be of more nearly equal duration
Program execution time is reduced more
e.g. 5-stage pipeline
Operands can be fetched from memory or from registers
Operand can be written to memory or to registers
5-stage Pipeline
Assume:
All instructions require all the five stages

Equal duration for each stage
Assuming one clock cycle per stage, 3 instructions would require 7 clock cycles
Pipeline Performance
Assume an instruction goes through $k$ stages and each stage has a duration of $\tau$.
Without pipelining, the execution time for $n$ instructions ($T$) will be:

$T = n k \tau$
With pipelining:
$T_{k,n} = [k + (n - 1)]\tau$
e.g. For $\tau = 1$, $k = 5$, $n = 10$:
$T = 5 \times 10 = 50$
$T_{k,n} = 5 + (10 - 1) = 14$
Speed-up factor $= 50 / 14 = 3.57$
With pipelining the program is executed 3.57 times faster than without pipelining.
In general:
$S_k = \frac{T}{T_{k,n}} = \frac{n k}{k + (n - 1)}$
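The timing formulas can be evaluated directly. In this sketch the function names are hypothetical, n is the number of instructions, k the number of stages, and tau the stage duration, which defaults to one clock cycle.

```python
def time_unpipelined(n, k, tau=1):
    """Each instruction takes all k stages before the next one starts."""
    return n * k * tau


def time_pipelined(n, k, tau=1):
    """k cycles for the first instruction, then one more per instruction."""
    return (k + (n - 1)) * tau


def speedup(n, k):
    """Ratio of unpipelined to pipelined time (tau cancels out)."""
    return time_unpipelined(n, k) / time_pipelined(n, k)


print(time_unpipelined(10, 5), time_pipelined(10, 5), round(speedup(10, 5), 2))
print(time_pipelined(3, 3), time_pipelined(3, 5))  # the 3-stage and 5-stage examples
```

As n grows, the speed-up approaches k, the number of stages.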
Pipeline Hazards
Some things could go wrong on real pipelined executions
A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall (be idle)
because conditions do not permit continued execution
Pipeline hazards:
Resource (Structural) hazards
Data hazards
Control hazards
Resource Hazards
Occur when two or more instructions that are already in the pipeline need the same resource
e.g. memory access
Data Hazards
Occur when one instruction depends on data value produced by a preceding instruction
e.g.
ADD R1,R2 (R1=1)
ADD R3,R1 (R3=3)
Such a hazard is termed a read after write (RAW) hazard, since the current instruction must wait to
read the data until a previous instruction has written the correct value
The hazard occurs if the read takes place before the write operation is complete
Other types of data hazards:
Write after read (WAR)
Write after write (WAW)
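The three hazard types can be told apart by comparing the registers that two instructions read and write. In this sketch each instruction is represented, by assumption, as a pair (destination, set of sources); a two-operand ADD both reads and writes its first operand. The function name is hypothetical.

```python
def classify_hazards(first, second):
    """Return the data hazards between an earlier and a later instruction."""
    first_dest, first_sources = first
    second_dest, second_sources = second
    hazards = set()
    if first_dest in second_sources:
        hazards.add("RAW")   # second reads what first writes
    if second_dest in first_sources:
        hazards.add("WAR")   # second writes what first reads
    if first_dest == second_dest:
        hazards.add("WAW")   # both write the same register
    return hazards


add_r1_r2 = ("R1", {"R1", "R2"})   # ADD R1,R2
add_r3_r1 = ("R3", {"R3", "R1"})   # ADD R3,R1
print(classify_hazards(add_r1_r2, add_r3_r1))
```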
Approaches for handling data hazards:

Avoid hazard
Detect and stall
Detect and forward
CHAPTER 06
[COMPUTER MEMORY]
6.1 Computer Memory System Overview
Memory is used to store data and instructions in computers.
There are different types of memory within a computer: registers, cache, main memory
(primary memory), and secondary memory (external memory). A computer may have all or a subset of these
memory types.
The different types of memories can be characterized by their speed, cost per bit and Capacity.
Speed: How fast can data be accessed from the memory. This is defined by the memory access time
(latency). Access time is the time from the instant that an address is presented to the memory to the
instant that data have been stored or made available for use.
The above characteristics for a certain memory type depend on the technology used to manufacture the
memory. The technologies used to manufacture the memory types mentioned above and their
characteristics are summarized below.
Cache: Uses SRAM (Static Random Access memory)
SRAM is made up of transistors (4 to 6 transistors per bit)
Fast (Approximately 2 ns access time)
Expensive ($5 per Megabyte)
Main Memory: Uses DRAM (Dynamic Random Access Memory)
DRAM is made up of transistors and capacitors (1 transistor and 1 capacitor per bit)
Slower than SRAM (Approximately 60 ns access time)
Less expensive than SRAM ($0.012 per Megabyte)
Secondary Memory: Different types (Flash memory, magnetic disks like a hard disk, optical disks like a
CD-ROM)
These memory types are the slowest
They are the least expensive
They are used when large amount of data have to be stored (also when frequent access is not necessary)
We want to have fast memory with big capacity. But as you can see, as the speed for a certain memory
type increases the price also increases. Having 100s of Gigabytes of the fastest memory only will be very
expensive. Therefore a hierarchy of different types of memory is used in computers.
Fig: hierarchy of different memories
Use a small array of SRAM (cache), larger DRAM (main memory) and even larger secondary memory to
fulfill the need for speed and capacity with a reasonable cost.
Secondary memory permanently holds programs and data used by the computer (it is non-volatile).
Main memory holds instructions for current programs run by the computer (it is volatile).
Cache holds a copy of the portion of main memory most recently accessed by the computer. Since, according
to the principle of locality of reference, the most recently accessed memory locations tend to be accessed
again soon, keeping this data in faster memory (cache) decreases the average memory access time.
The principle of locality of reference states that, if a data location is referenced, then the same location
or data locations with nearby addresses will tend to be referenced soon. This arises from natural
program structures. For example most programs contain loops, so instructions and data are likely to be
accessed repeatedly.
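A rough sketch of why the hierarchy pays off, using the access times quoted earlier (about 2 ns for SRAM cache, about 60 ns for DRAM main memory). The model is an assumption, not taken from the notes: every access checks the cache first, and a miss additionally costs one main memory access. The function name and the hit rates are illustrative.

```python
def average_access_time(hit_rate, cache_ns=2.0, main_ns=60.0):
    """Average time per access: cache lookup plus the miss penalty share."""
    miss_rate = 1.0 - hit_rate
    return cache_ns + miss_rate * main_ns


for hit_rate in (0.0, 0.80, 0.95):
    print(hit_rate, round(average_access_time(hit_rate), 2), "ns")
```

With a 95% hit rate the average is about 5 ns under this model, far closer to the cache speed than to the main memory speed.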
6.2 Cache Memory

A cache memory is logically located between a CPU and main memory (physically it is usually embedded
inside the CPU). It contains a copy of portions of memory most recently accessed by the CPU.
A processor may have a single cache or multiple levels of cache. Also there may be separate instruction
and data cache (called split caches), or a single cache to hold both instruction and data (called unified
cache).
Fig: cache memory
When the CPU attempts to read a word from memory, a check is made to determine if the word is in
cache. If so, the word is delivered to the processor (this is called a Hit). If the data is not in cache (this is
called a Miss), a block of memory (several memory words) consisting of that data is read into the cache
and then the required word is delivered to the CPU.
6.2.1 Cache structure
Let the main memory of a computer contain $2^n$ addressable words, with each word having a unique n-bit
address. This memory can be divided into blocks, each block containing a number of addressable words.
Let K = the number of words per block. This implies that there are $M = 2^n / K$ blocks in main memory as
shown in the following diagram.
A cache memory consists of multiple tag/block pairs called cache lines. Let us assume a cache has L lines.
The cache structure is shown below.
Each cache line contains control bits, a tag field used in addressing, and a block of memory data.
The number of cache lines is considerably less than the number of main memory blocks
(L<<M). At any time, some subset of the blocks of memory resides in lines in the cache. If a word in a
block of memory is read, that block is transferred to one of the lines of the cache.
Because there are more blocks than lines, an individual line cannot be uniquely and permanently
dedicated to a particular block. Thus, each line includes a tag that identifies which particular block is
currently being stored. The tag is usually a portion of the main memory address.
Main memory addresses are specified in instructions. A processor has to know where in the cache to
look for a given item of data, given its memory address. Therefore, the memory address specified in
an instruction has to be translated into a cache line number. This translation of a memory address into a
cache line is termed mapping.
There are different mapping techniques:
Direct Mapping,
Associative Mapping and
Set Associative Mapping
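As an illustration of the first technique, the sketch below models direct mapping in its usual textbook form: a block can live in exactly one line, chosen as the block number modulo the number of lines, and the remaining upper part of the block number is kept as the tag. The class name and the small sizes (L = 4 lines, K = 4 words per block) are assumptions for this example.

```python
class DirectMappedCache:
    """Tracks only tags, which is enough to decide hit or miss."""

    def __init__(self, num_lines, words_per_block):
        self.num_lines = num_lines                # L
        self.words_per_block = words_per_block    # K
        self.tags = [None] * num_lines            # None marks an empty line

    def access(self, address):
        block = address // self.words_per_block   # which memory block
        line = block % self.num_lines             # the only line it may use
        tag = block // self.num_lines             # identifies the block in that line
        if self.tags[line] == tag:
            return "hit"
        self.tags[line] = tag                     # load the block, replacing the old one
        return "miss"


cache = DirectMappedCache(num_lines=4, words_per_block=4)
for address in (0, 1, 16, 0):
    print(address, cache.access(address))
```

Addresses 0 and 16 fall in different blocks that map to the same line, so they evict each other even though the other lines are empty.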
CHAPTER 07
[INPUT/OUTPUT]
In addition to the processor and a set of memory modules, the third key element of a computer system
is a set of I/O modules. Each module interfaces to the system bus or central switch and controls one or
more peripheral devices.
7.1 EXTERNAL DEVICES

I/O operations are accomplished through a wide assortment of external devices that provide a means of
exchanging data between the external environment and the computer.
An external device connected to an I/O module is often referred to as a peripheral device or, simply, a
peripheral.
We can broadly classify external devices into three categories:
Human readable: Suitable for communicating with the computer user
Machine readable: Suitable for communicating with equipment
Communication: Suitable for communicating with remote devices
7.2 I/O MODULES
Need to Interface to CPU and Memory
Interface to one or more peripherals
The major functions or requirements for an I/O module fall into the following categories:
Control and timing
Processor communication
Device communication
Data buffering
Error detection
7.2.1 I /O steps
The control of the transfer of data from an external device to the processor might involve the following
sequence of steps:
CPU checks I/O module device status

I/O module returns status
If ready, CPU requests data transfer
I/O module gets data from device
I/O module transfers data to CPU
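The step sequence above can be sketched as a polling loop. The `IOModule` class is a hypothetical stand-in for real hardware: it reports "busy" for a fixed number of status checks and then "ready"; the names and the status strings are assumptions.

```python
class IOModule:
    """Toy I/O module that becomes ready after a few status checks."""

    def __init__(self, data, polls_until_ready=2):
        self._data = list(data)
        self._countdown = polls_until_ready

    def status(self):
        # The I/O module returns its status to the CPU.
        if self._countdown > 0:
            self._countdown -= 1
            return "busy"
        return "ready"

    def read(self):
        # The module gets data from the device and transfers it to the CPU.
        return self._data.pop(0)


def cpu_read(module):
    """CPU side: check status repeatedly, then request the transfer."""
    polls = 0
    while module.status() != "ready":
        polls += 1          # the CPU does no useful work while it waits
    return module.read(), polls


print(cpu_read(IOModule([65, 66])))
```

The wasted polls are the cost that interrupt-driven I/O and DMA are meant to avoid.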
7.3 I /O techniques
There are three principal I/O techniques:
Programmed I/O, in which I/O occurs under the direct and continuous control of the program requesting
the I/O operation.
Interrupt-driven I/O, in which a program issues an I/O command and then continues to execute until it
is interrupted by the I/O hardware to signal the end of the I/O operation; and
Direct memory access (DMA), in which a specialized I/O processor takes over control of an I/O
operation to move a large block of data.
When the processor, main memory, and I/O share a common bus, two modes of addressing are possible:
memory mapped and isolated.
Memory mapped I/O
Uses the instructions that transfer data between the microprocessor and memory
I/O devices are treated as memory locations in the memory map
A portion of the memory address space is used as the I/O map

Isolated I /O
I/O locations are isolated from memory system in a separate address space.
User can expand the memory to its full size
Data transferred between I/O and the microprocessor must be accessed by IN/OUT instructions
In the PC, isolated I/O ports are used for controlling peripheral devices