Вы находитесь на странице: 1из 412

Computer Organization

and Software Systems


CONTACT SESSION 1
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Introduction

Why Study COSS? SEM


I

SEM SEM
IV COSS II

SEM
III

BITS Pilani, Pilani Campus


Introduction

Hardware Data

Software

Data analytics: is the process of examining data sets in order


to draw conclusions about the information they contain,
increasingly with the aid of specialized systems and
software. BITS Pilani, Pilani Campus
Text Books and Reference Books
Text Books:
(T1) W. Stallings, Computer Organization & Architecture, PHI, 10th ed.,
2010.
(T2) A Silberschatz, Abraham and others, Operating Systems Concepts,
Wiley Student Edition, 8th Edition

Reference Books:
(R1) Patterson, David A & J L Hennenssy, Computer Organization and
Design – The Hardware/Software Interface, Elsevier, 5th Ed., 2014.
(R2) Randal E. Bryant, David R. O’Hallaron, Computer Systems – A
Programmer’s Perspective, Pearson, 3rd Ed, 2016.
(R3)Tanenbaum, Modern Operating Systems: Pearson New International
Edition, Pearson Education, 2013 (Pearson Online)
(R4)Stallings, Operating Systems: Internals and Design Principles :
International Edition, Pearson Education, 2013 (Pearson Online)
4
5/18/2020
BITS Pilani, Pilani Campus
Evaluation Scheme
5 unit course.
Sl Evaluation Duration Weightage % Nature of
No. Component Component
1 Mid Sem Exam 90 min 30% CB
2 Comprehensive 180 min 40% OB
Examination
3 Quiz ---- 5% OB
4 Assignments --- 25% OB

5
BITS Pilani, Pilani Campus
Assignments

• Two assignments:
• One pre-midsem exam : 12%
• One post-midsem : 13%
• Lab based
• Simulator to be used : CPU-OS simulator
• Open source tool
https://drive.google.com/open?id=12YUK52RQ-
JhP0ddj6CD_oifW4sTMbsBl
• Virtual lab (Platify)

BITS Pilani, Pilani Campus


Assignment should not be

BITS Pilani, Pilani Campus


General Instructions

1. Always use note book for writing important points


and for solving problems
2. Use chat box for writing subject related questions
3. Do not repeat the questions on chat box. Questions
will be answered during last 10 minutes of the
session
4. Unanswered questions will be put up on the canvas
forum

BITS Pilani, Pilani Campus


Today’s Session

Contact List of Topic Title Text/Ref


Hour Book/external
resource
1-2 Introduction to Computer Systems T1
• Hardware Organization of a computer
• Running a Hello Program
• Instruction Cycle State Diagram
• Operating System role in Managing
Hardware
• Processes
• Threads
• Virtual Memory
• Files.

BITS Pilani, Pilani Campus


Definition of a Computer

• Is a complex system

• Is a programmable device

• Must be able to process data

• Must be able to store data

• Must be able to move data

• Must be able to control above three functions

10

BITS Pilani, Pilani Campus


Computer System

• Hardware
• Central Processing Unit (CPU)
• Memory
• I/O devices
• Software
• System Software
• System Management Software
• Tools and Utilities for Developing the software
• Application Software
• General Purpose Software
• Specific Purposed Software

BITS Pilani, Pilani Campus


Hardware Organization of a
computer

BITS Pilani, Pilani Campus


Running a Hello.c Program

BITS Pilani, Pilani Campus


Reading ./hello command from Keyboard

BITS Pilani, Pilani Campus


Loading the executable from
disk into main memory

BITS Pilani, Pilani Campus


Writing the output string
from memory to the display

BITS Pilani, Pilani Campus


Why do we need to know how
compilation works?
• Optimizing program performance.
• Understanding link-time errors
• Avoiding security holes.

BITS Pilani, Pilani Campus


Von Neumann Architecture
• Three key concepts:
– Data and instructions
are stored in a single
read – write memory
– The contents of this
memory are addressable
by location, without
regard to the type of
data contained there
– Execution occurs in a
sequential fashion
(unless explicitly
modified) from one
instruction to the next
BITS Pilani, Pilani Campus
Von Neumann Architecture…

• Stored-program computers have the following


characteristics:
– Three hardware systems:
• A central processing unit (CPU)
• A main memory system
• An I/O system
– The capacity to carry out sequential instruction
processing.
– A single path between the CPU and main memory.
• This single path is known as the von Neumann
bottleneck.
• Side effect : reduced throughput (Data Rate)

BITS Pilani, Pilani Campus


Harvard Architecture

• Uses two memory systems and two separate busses


• Instruction Memory
• Data Memory

BITS Pilani, Pilani Campus


Instruction Cycle Diagram
• Instruction execution : Two steps:
– Fetch
– Execute
• Interrupt: Interrupt is checked at the end of Instruction
cycle

BITS Pilani, Pilani Campus


Fetch Cycle
• Program Counter (PC)
holds address of next
instruction to be
fetched
• Processor fetches
instruction from memory
location pointed to by PC
• Instruction loaded into
Instruction Register (IR)
• Processor interprets
instruction and performs
required actions during
execution cycle
• Increment PC
– Unless told otherwise
BITS Pilani, Pilani Campus
Execute Cycle

• Processor - memory
– Data transfer between CPU and main memory
• Processor - I/O
– Data transfer between CPU and I/O module
• Data processing
– Some arithmetic or logical operation on data
• Control
– Alteration of sequence of operations
– e.g. jump
• Combination of above Break!

BITS Pilani, Pilani Campus


Interrupt Cycle

• Interrupts: Mechanism by which other modules (e.g.


I/O) may interrupt normal sequence of processing
• Interrupts enhances processing efficiency

BITS Pilani, Pilani Campus


Program Flow Control (No Interrupts)

BITS Pilani, Pilani Campus


Program Flow Control (With Interrupts)

BITS Pilani, Pilani Campus


Contd…

• Classes interrupts:
• Program
• e.g. overflow, division by zero
• Timer
• Generated by internal processor timer
• Used in pre-emptive multi-tasking
• I/O
• from I/O controller
• Hardware failure
• e.g. memory parity error

BITS Pilani, Pilani Campus


Interrupt Cycle

• Processor checks for interrupt


– Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
– Suspend execution of current program
– Save context
– Set PC to start address of interrupt handler
routine
– Process interrupt
– Restore context and continue interrupted program

BITS Pilani, Pilani Campus


Transfer of Control via Interrupts

Reg
A
X1
B
X2
C

BITS Pilani, Pilani Campus


Instruction Cycle - State Diagram

BITS Pilani, Pilani Campus


Memory Hierarchy

BITS Pilani, Pilani Campus


Role of Cache Memory

BITS Pilani, Pilani Campus


Operating System
• collection of software/ Program that acts as an intermediary
between an user of a computer and the computer hardware.
• is a program that helps to run all the other programs
• Three main functions:
• Resource management
• Establish a user interface
• Execute and provide services for application software

BITS Pilani, Pilani Campus


Main objectives
• Convenience
• Efficiency
• Ability to evolve and offer new services
• Maximize System performance
• Protection and access control
• Footprint of OS should be small

BITS Pilani, Pilani Campus


Important Note

• “The one program running at all times on the


computer” is the kernel. Everything else is either a
system program (ships with the operating system) or
an application program
• bootstrap program is loaded at power-up or reboot
– Typically stored in ROM or EPROM, generally
known as firmware
– Initializes all aspects of system
– Loads operating system kernel and starts
execution

BITS Pilani, Pilani Campus


Operating System Operations
• Dual-mode operation
• User mode
• Kernel mode ( also known as System Mode / Supervisor
mode/ privileged mode )
• User mode(0):
• user program executes in user mode
• certain areas of memory are protected from user access
• certain privileged instructions may not be executed
• Kernel Mode (1)
• privileged instructions may be executed
• protected areas of memory may be accessed

BITS Pilani, Pilani Campus


Transition from user to
kernel mode

BITS Pilani, Pilani Campus


Services provided by the OS

• Process Management
• Memory Management
• Storage Management
• Protection and security

BITS Pilani, Pilani Campus


Process
 A program in execution
 Needs resources : CPU,
memory, I/O devices
 A program becomes process
when executable file loaded
into memory
 Process execution must
progress in sequential fashion
 word processor, a Web
browser and an e-mail package
are different processes.
 Types: System processes and
user processes

BITS Pilani, Pilani Campus


Threads
• A thread is a single sequence stream within in a process

BITS Pilani, Pilani Campus


Virtual Memory

• Virtual memory is a technique that allows the


execution of processes that are not completely in
memory.
• Motivation:
• Programs often have code to handle unusual error conditions which is almost
never executed.
• Arrays, lists, and tables are often allocated more memory than they actually
need.
• Certain options and features of a program may be used rarely
• Principle of locality

BITS Pilani, Pilani Campus


Files

• Is a sequence of bytes
• I/O device such as disks, keyboards, displays, and even
networks, is modeled as a file
• Reading and writing requires set of system calls

BITS Pilani, Pilani Campus


Computer Organization
and Software Systems
CONTACT SESSION 2
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Today’s Class

Last Class…
Contact List of Topic Title Text/Ref
Hour Book/external
resource
Performance Assessment Class Slides
3 MIPS Rate
Amdahl’s Law
Memory Organization T1, R2
4 Storage Technologies
Random Access Memory
Disk Storage
Solid State Disks
Storage Technology Trends
Locality
Locality of Reference to Program Data
Locality of instruction fetches
Memory Hierarchy

BITS Pilani, Pilani Campus


Performance Assessment
BITS Pilani
Pilani Campus
Units

• Kilo- (K) = 1 thousand = 103 and 210


• Mega- (M) = 1 million = 106 and 220
• Giga- (G) = 1 billion = 109 and 230
• Tera- (T) = 1 trillion = 1012 and 240
• Peta- (P) = 1 quadrillion = 1015 and 250
• Exa – (E) = 1 quintillion = 1018 and 260

BITS Pilani, Pilani Campus


Examples

Hertz = clock cycles per second (frequency)


– 1MHz = 1,000,000Hz
– Processor speeds are measured in MHz or GHz.

Byte = a unit of storage


– 1KB = 210 = 1024 Bytes
– 1MB = 220 = 1,048,576 Bytes
– Main memory (RAM) is measured in MB / GB
– Disk storage is measured in GB for small systems,
TB for large systems.

BITS Pilani, Pilani Campus


Units…

• Milli- (m) = 1 thousandth = 10 -3


• Micro- () = 1 millionth = 10 -6
• Nano- (n) = 1 billionth = 10 -9
• Pico- (p) = 1 trillionth = 10 -12
• Femto- (f) = 1 quadrillionth = 10 -15

BITS Pilani, Pilani Campus


Examples

• Millisecond = 1 thousandth of a second


– Hard disk drive access times are often 10 to 20
milliseconds.
• Nanosecond = 1 billionth of a second
– Main memory access times are often 50 to 70
nanoseconds.
• Micron (micrometer) = 1 millionth of a meter
– Circuits on computer chips are measured in
microns.

BITS Pilani, Pilani Campus


Important Terms
• Execution time : The
total time required for
the computer to
complete a task,
including disk accesses,
memory accesses, I/O
activities, operating
system overhead, CPU
execution
• Throughput or
bandwidth :number of
tasks completed per
unit time.
BITS Pilani, Pilani Campus
Example
Do the following changes to a computer system, increase
throughput, decrease execution time, or both?
1. Replacing the processor in a computer with a faster
version
2. Adding additional processors to a system that uses
multiple processors for separate tasks

BITS Pilani, Pilani Campus


Contd…
• Relationship between Performance and execution time of
Computer X

• if the performance of X is greater than the performance


of Y, we have

BITS Pilani, Pilani Campus


Contd…
• Quantitative performance analysis
• Computer X is “n” times faster than Computer Y

• If X is n times faster than Y, then the execution time


on Y is n times longer than it is on X

BITS Pilani, Pilani Campus


Example
• If computer A runs a program in 10 seconds and
computer B runs the same program in 15 seconds, how
much faster is A than B?

• Computer A is therefore 1.5 times faster than B.

BITS Pilani, Pilani Campus


CPU performance and its factors

• CPU execution time for a program:

BITS Pilani, Pilani Campus


Example

• Our favorite program runs in 10 seconds on computer A,


which has a 2 GHz clock. We are trying to help a computer
designer build a computer, B, which will run this program in
6 seconds. The designer has determined that a substantial
increase in the clock rate is possible, but this increase will
affect the rest of the CPU design, causing computer B to
require 1.2 times as many clock cycles as computer A for
this program. What clock rate should we tell the designer to
target?

BITS Pilani, Pilani Campus


Computer A Computer B
Execution TimeA = 10s Execution TimeB = 6s
Clock RateA = 2x109 Hz CPU Clock CyclesB = 1.2xClock CycleA
CPU Clock CycleA = ? Clock Rate B

BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956


BITS Pilani, Pilani Campus
Instruction Performance

• CPI: Clock cycles Per Instruction


• Average number of clock cycles per instruction for
a program or program fragment.

BITS Pilani, Pilani Campus


Example

Computer A has a clock cycle time of 250 ps and a CPI


of 2.0 for some program, and computer B has a clock
cycle time of 500 ps and a CPI of 1.2 for the same
program. Which computer is faster for this program
and by how much?

BITS Pilani, Pilani Campus


Solution
• the number of processor clock cycles for each computer
CPU clock cyclesA = I × 2.0
CPU clock cyclesB = I × 1.2
• Execution time for each computer
Execution time = CPU clock cycles × Clock cycle time
Execution timeA = I × 2.0 × 250 ps = 500 × I ps
Execution timeB = I × 1.2 × 500 ps = 600 × I ps
• Comparison:
CPU performanceA Execution timeB 600 I ps
--------------------------- = ---------------------- = -------------- = 1.2
CPU performanceB Execution timeA 500 I ps

BITS Pilani, Pilani Campus


Amdahl’s Law

• proposed by Gene Amdahl in 1967


• deals with the potential speedup of a program using
multiple processors compared to a single processor

BITS Pilani, Pilani Campus


Memory Organization
BITS Pilani
Pilani Campus
Semiconductor Memory

BITS Pilani, Pilani Campus


Random-Access Memory (RAM)

• Key features
– RAM is traditionally packaged as a
chip.
– Basic storage unit is normally a cell
(one bit per cell).
– Multiple RAM chips form a
memory.
• RAM comes in two varieties:
– SRAM (Static RAM)
– DRAM (Dynamic RAM)
• SRAM and DRAM are volatile
memories
– Lose information if powered off. BITS Pilani, Pilani Campus
SRAM vs DRAM Summary

Trans. Access Needs Needs


per bit time refresh? EDC? Cost Applications

SRAM 4 to 6 1X No Maybe 100x Cache


DRAM 1 10X Yes Yes 1X Main memories,
frame buffers
BITS Pilani, Pilani Campus
Read Only Memory

• Permanent Storage and Nonvolatile Memories


• Read Only Memory Variants:
– Read-only memory (ROM): programmed during production
– Programmable ROM (PROM): can be programmed once
– Erasable PROM (EPROM): can be bulk erased (UV, X-Ray)
– Electrically erasable PROM (EEPROM): electronic erase
capability
– Flash memory: EEPROMs. with partial (block-level) erase
capability
• Wears out after about 100,000 erasing
• Firmware
BITS Pilani, Pilani Campus
Applications
• Storing fonts for printers
• Storing sound data in musical instruments
• Video game consoles
• Implantable Medical devices.
• High definition Multimedia Interfaces(HDMI)
• BIOS chip in computer
• Program storage chip in modem, video card and many
electronic gadgets, controllers for disks, network
cards, ….

BITS Pilani, Pilani Campus


Memory Read Operation (1)

CPU places address A and then read control signal on


the memory bus
Register file Load operation: MOV R4, A
R4  [A]
ALU
R4

Main memory
I/O bridge 0
A
Bus interface x A

BITS Pilani, Pilani Campus


Memory Read Operation (2)

Main memory reads A from the memory bus, retrieves


word x, and places it on the bus
Load operation: MOV R4, A
R4 [A]
Register file

ALU
R4
Main
memory
I/O bridge x 0

Bus interface x A

BITS Pilani, Pilani Campus


Memory Read Operation (3)

CPU read word x from the bus and copies it into


register R4.
Load operation: MOV R4, A
R4 [A]

Register file

ALU
R4 x

Main memory
I/O bridge 0

Bus interface x A

BITS Pilani, Pilani Campus


Memory Write Operation (1)

CPU places address A and WRITE control signal on bus.


Main memory reads them and waits for the
corresponding data word to arrive.
Load operation: MOV A, R4
Register file [A]  R4

ALU
R4 y

Main memory
I/O bridge 0
A
Bus interface A

BITS Pilani, Pilani Campus


Memory Write Operation (2)

CPU places data word y on the bus

Register file Load operation: MOV A, R4


[A]  R4
ALU
R4 y
Main memory
0
I/O bridge y
Bus interface A

BITS Pilani, Pilani Campus


Memory Write Operation (3)

Main memory reads data word y from the bus and stores
it at address A.

Load operation: MOV A, R4


Register file [A]  R4

ALU
R4 y

main memory
I/O bridge 0

Bus interface y A

BITS Pilani, Pilani Campus


Magnetic Disk Drive
Spindle
Arm
Platters

Actuator

Electronics
(including a
processor
and memory!)
SCSI
connector

Image courtesy of Seagate Technology


BITS Pilani, Pilani Campus
Disk Geometry Cylinder k

Surface 0
• Disks consist of Surface 1
Platter 0

platters, each with Surface 2


Platter 1
two surfaces. Surface 3
Surface 4
• Each surface consists Surface 5
Platter 2

of concentric rings
called tracks Tracks
Spindle
Surface
• Aligned tracks form a Track k Gaps
cylinder
• Each track consists of Spindle
sectors separated by
gaps.
Sectors

BITS Pilani, Pilani Campus


Disk Capacity
• Capacity: maximum number of bits that can be stored.
– Vendors express capacity in units of gigabytes (GB
/TB), where 1 GB = 230 Bytes, 1 TB = 240 Bytes,
• Capacity is determined by these technology factors:
– Recording density (bits/in): number of bits that
can be squeezed into a 1 inch segment of a track.
– Track density (tracks/in): number of tracks that
can be squeezed into a 1 inch radial segment.
– Areal density (bits/in2): product of recording and
track density.

BITS Pilani, Pilani Campus


Recording zones
• Modern disks partition tracks
into disjoint subsets called
recording zones
– Each track in a zone has
the same number of
sectors, determined by the
circumference of innermost
track.
– Each zone has a different
number of sectors/track,
outer zones have more
sectors/track than inner
zones.
– So we use average number
of sectors/track when
computing capacity.
BITS Pilani, Pilani Campus
Computing Disk Capacity
• Capacity = (# bytes/sector) x (avg. # sectors/track) x
(# tracks/surface) x (# surfaces/platter) x
(# platters/disk)
• Example:
– 512 bytes/sector
– 300 sectors/track (on average)
– 20,000 tracks/surface
– 2 surfaces/platter
– 5 platters/disk

• Capacity = 512 x 300 x 20000 x 2 x 5


= 30,720,000,000
= 28.61 GB

BITS Pilani, Pilani Campus


Disk Operation (Single-Platter View)
The disk
The read/write head
surface
is attached to the end
spins at a fixed
of the arm and flies over
rotational rate
the disk surface on
a thin cushion of air.

spindle
spindle
spindle
spindle

By moving radially, the arm


can position the read/write
head over any track.

BITS Pilani, Pilani Campus


Disk Operation (Multi-Platter View)

Read/write heads
move in unison
from cylinder to cylinder

Arm

Spindle

BITS Pilani, Pilani Campus


Disk Access

Need to access a sector


colored in blue

BITS Pilani, Pilani Campus


Disk Access

Head in position above a track

BITS Pilani, Pilani Campus


Disk Access

Rotate the platter in counter-


clockwise direction

BITS Pilani, Pilani Campus


Disk Access – Read

About to read blue sector

BITS Pilani, Pilani Campus


Disk Access – Read

After BLUE
read

After reading blue sector

BITS Pilani, Pilani Campus


Disk Access – Read

After BLUE
read

Red request scheduled next

BITS Pilani, Pilani Campus


Disk Access – Seek

After BLUE Seek for RED


read

Seek to red’s track

BITS Pilani, Pilani Campus


Disk Access – Rotational Latency

After BLUE Seek for RED Rotational latency


read

Wait for red sector to rotate around

BITS Pilani, Pilani Campus


Disk Access – Read

After BLUE Seek for RED Rotational latency After RED read
read

Complete read of red

BITS Pilani, Pilani Campus


Disk Access – Access Time Components

After BLUE Seek for RED Rotational latency After RED read
read

Data transfer Seek Rotational Data transfer


latency

BITS Pilani, Pilani Campus


Disk Access Time
• Average time to access some target sector given by :
Taccess = Tavg_seek + Tavg_rotation + Tavg_transfer
• Seek time (Tavg_seek )
– Time to position heads over cylinder containing target sector.
– Typical Tavg_seek is 3—9 ms
• Rotational latency (Tavg_rotation)
– Time waiting for first bit of target sector to pass under r/w head.
– Tavg_rotation = 1/2r , where r is rotation Speed in revolution per
Second
– Tavg_rotation = 1/2r , given 7200 RPM then r = 7200/60 RPS
• Transfer time (Tavg_transfer )
– Time to read the bits in the target sector.
– Tavg_transfer = b/rN, where b is the number of bytes to be transferred
and N is the average number of bytes on a track
BITS Pilani, Pilani Campus
Disk Access Time Example
Given:
– Rotational rate = 7,200 RPM
– Average seek time = 9 ms.
– Avg # sectors/track = 400.
– 512 bytes per sector
Derived:
– Tavg rotation = 1/2r = 1/2 x (60 secs/7200 RPM)
= 0.00416 = 4.16ms.
– Tavg transfer =b/rN
= 512 x 60/7200 x 1/(400*512)
= 0.02 ms
– Taccess = 9 ms + 4.16ms + 0.02 ms
BITS Pilani, Pilani Campus
Important points

• Access time dominated by seek time and rotational


latency.
• First bit in a sector is the most expensive, the rest
are free.
• SRAM access time is about 4 ns/doubleword, DRAM
about 60 ns
– Disk is about 40,000 times slower than SRAM,
– 2,500 times slower then DRAM.

BITS Pilani, Pilani Campus


Logical Disk Blocks
• Modern disks present a simpler abstract view of the
complex sector geometry:
– The set of available sectors is modeled as a sequence of
logical blocks (0, 1, 2, ...B-1)
• Mapping between logical blocks and actual (physical) sectors
– Maintained by hardware/firmware device called disk
controller.
– Converts requests for logical blocks into (surface,
track,sector) triples.
• Allows controller to set aside spare cylinders for each zone.
– Accounts for the difference in “formatted capacity” and
“maximum capacity”.
BITS Pilani, Pilani Campus
I/O Bus
CPU chip

Register file

ALU
System bus Memory bus

I/O Main
Bus interface
bridge memory

I/O bus Expansion slots for


other devices such
USB Graphics Disk as network adapters.
controller adapter controller

Mouse Keyboard Monitor


Disk
BITS Pilani, Pilani Campus
BITS Pilani, Pilani Campus
Reading a Disk Sector (1)
CPU chip
CPU initiates a disk read by writing a
Register file command, logical block number, and
destination memory address to a port
ALU
(address) associated with disk controller.

Main
Bus interface
memory

I/O bus

USB Graphics Disk


controller adapter controller

mouse keyboard Monitor


Disk
BITS Pilani, Pilani Campus
Reading a Disk Sector (2)
CPU chip
Disk controller reads the sector and
Register file
performs a direct memory access (DMA)
transfer into main memory.
ALU

Main
Bus interface
memory

I/O bus

USB Graphics Disk


controller adapter controller

Mouse Keyboard Monitor


Disk
BITS Pilani, Pilani Campus
Reading a Disk Sector (3)
CPU chip When the DMA transfer completes, the
Register file disk controller notifies the CPU with an
interrupt (i.e., asserts a special
ALU “interrupt” pin on the CPU)

Main
Bus interface
memory

I/O bus

USB Graphics Disk


controller adapter controller

Mouse Keyboard Monitor


Disk
BITS Pilani, Pilani Campus
Solid State Disks (SSDs)
I/O bus

Requests to read and


write logical disk blocks
Solid State Disk (SSD)
Flash
translation layer
Flash memory
Block 0 Block B-1
Page 0 Page 1 … Page P-1 … Page 0 Page 1 … Page P-1

• Pages: 512KB to 4KB, Blocks: 32 to 128 pages


• Data read/written in units of pages.
• Page can be written only after its block has been erased
• A block wears out after about 100,000 repeated writes.

BITS Pilani, Pilani Campus


SSD Performance Characteristics

Sequential read tput* 550 MB/s Sequential write tput 470 MB/s
Random read tput 365 MB/s Random write tput 303 MB/s
Avg seq read time 50 us Avg seq write time 60 us

Tput  Throughput
• Sequential access faster than random access
– Common theme in the memory hierarchy
• Random writes are somewhat slower
– Erasing a block takes a long time (~1 ms)
– Modifying a block page requires all other pages to be
copied to new block
– In earlier SSDs, the read/write gap was much larger.
Source: Intel SSD 730 product specification.
BITS Pilani, Pilani Campus
SSD Tradeoffs vs Rotating Disks
• Advantages
– No moving parts  faster, less power, more rugged
• Disadvantages
– Have the potential to wear out
• Mitigated by “wear leveling logic” in flash translation
layer
• E.g. Intel SSD 730 guarantees 128 petabyte (128 x
1015 bytes) of writes before they wear out
– In 2015, about 30 times more expensive per byte
• Applications
– MP3 players, smart phones, laptops
– Beginning to appear in desktops and servers
BITS Pilani, Pilani Campus
The Bottom Line

• How much?
– Capacity
• How fast?
– Time is money
• How expensive?

•Faster access time , ------------ cost per bit


•Greater capacity, --------------- cost per bit
•Greater capacity, ---------------access time
BITS Pilani, Pilani Campus
Memory Hierarchy

• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store

BITS Pilani, Pilani Campus


Performance enhancement -
Motivation

BITS Pilani, Pilani Campus


Performance enhancement -
Motivation

Locality of Reference
During the course of the execution of a
program, memory references tend to
cluster
• Temporal locality: Locality in time
• If an item is referenced, it will tend
to be referenced again soon
• Spatial locality: Locality in space
• If an item is referenced, items
whose addresses are close by will
tend to be referenced soon.

BITS Pilani, Pilani Campus


Example
product = 1;
for ( i = 0; i < n-1; i++)
product = product * a[i] ;

Data :
• Access array elements in succession –
spatial locality
• Reference to “product” in each iteration
– Temporal locality
Instructions :
• Reference instructions in sequence :
Spatial locality
• Looping through : Temporal locality BITS Pilani, Pilani Campus
Performance enhancement -
Motivation

BITS Pilani, Pilani Campus


Computer Organization
and Software Systems
CONTACT SESSION 3
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Today’s Session
Contact List of Topic Title Text/Ref
Hour Book/external
resource

5-6 Memory Organization (Contd..) T1


• Solid State Devices
• Locality
• Locality of Reference to Program Data
• Locality of instruction fetches
• Memory Hierarchy
• Cache Memories
• Generic Cache Memory Organization
• Direct-Mapped Caches
• Fully Associative Caches
BITS Pilani, Pilani Campus
Solid State Disks (SSDs)
I/O bus

Requests to read and


write logical disk blocks
Solid State Disk (SSD)
Flash
translation layer
Flash memory
Block 0 Block B-1
Page 0 Page 1 … Page P-1 … Page 0 Page 1 … Page P-1

• Pages: 512KB to 4KB, Blocks: 32 to 128 pages


• Data read/written in units of pages.
• Page can be written only after its block has been erased
• A block wears out after about 100,000 repeated writes.

BITS Pilani, Pilani Campus


SSD Performance Characteristics

Sequential read tput* 550 MB/s Sequential write tput 470 MB/s
Random read tput 365 MB/s Random write tput 303 MB/s
Avg seq read time 50 us Avg seq write time 60 us

Tput  Throughput
• Sequential access faster than random access
– Common theme in the memory hierarchy

• Random writes are somewhat slower


– Erasing a block takes a long time (~1 ms)
– Modifying a block page requires all other pages to be copied to new block
– In earlier SSDs, the read/write gap was much larger.

Source: Intel SSD 730 product specification.


BITS Pilani, Pilani Campus
SSD Tradeoffs vs Rotating Disks
• Advantages
– No moving parts  faster, less power, more rugged

• Disadvantages
– Have the potential to wear out

• Mitigated by “wear leveling logic” in flash translation


layer
• E.g. Intel SSD 730 guarantees 128 petabyte (128 x
1015 bytes) of writes before they wear out
– In 2015, about 30 times more expensive per byte

• Applications
– MP3 players, smart phones, laptops
– Beginning to appear in desktops and servers

BITS Pilani, Pilani Campus


The Bottom Line

• How much?
– Capacity
• How fast?
– Time is money
• How expensive?

•Faster access time , ------------ cost per bit


•Greater capacity, --------------- cost per bit
•Greater capacity, ---------------access time
BITS Pilani, Pilani Campus
Memory Hierarchy

• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store

BITS Pilani, Pilani Campus


Performance enhancement -
Motivation

BITS Pilani, Pilani Campus


Performance enhancement -
Motivation

Locality of Reference
During the course of the execution of a
program, memory references tend to
cluster
• Temporal locality: Locality in time
• If an item is referenced, it will tend
to be referenced again soon
• Spatial locality: Locality in space
• If an item is referenced, items
whose addresses are close by will
tend to be referenced soon.

BITS Pilani, Pilani Campus


Example
product = 1;
for ( i = 0; i < n-1; i++)
product = product * a[i] ;

Data :
• Access array elements in succession –
spatial locality
• Reference to “product” in each iteration
– Temporal locality
Instructions :
• Reference instructions in sequence :
Spatial locality
• Looping through : Temporal locality BITS Pilani, Pilani Campus
Performance enhancement -
Motivation

BITS Pilani, Pilani Campus


Cache
BITS Pilani
Pilani Campus
Cache
• Small, fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or separate module

BITS Pilani, Pilani Campus


Cache and Main Memory Structure

BITS Pilani, Pilani Campus


START
Cache Read Operation
Receive address (RA)
From CPU

Is block No Access main memory


containing RA for block containing
word in Cache? RA word
Yes
Fetch RA word and Allocate cache line
deliver to CPU for main memory
block

Load main memory Deliver RA word to


block in to cache line CPU
DONE
Performance of cache

• Hit ratio : Number of Hits / total references


to memory
• Hit
• Miss

BITS Pilani, Pilani Campus


Read Hit

BITS Pilani, Pilani Campus


Read Miss

Miss Penalty BITS Pilani, Pilani Campus


Write hit

BITS Pilani, Pilani Campus


Write miss

BITS Pilani, Pilani Campus


Mapping Function
• How memory blocks are mapped to cache lines
• Three types
– Direct mapping
– Associative mapping
– Set Associative mapping

BITS Pilani, Pilani Campus


Direct Mapped Cache

• 16 Bytes main memory


– How many address
bits are required?
• Memory block size is 4
bytes
• Cache of 8 Byte
– How many cache lines?
– cache contains 2 lines
( 4 bytes per Line)

BITS Pilani, Pilani Campus


Direct Mapped Cache
• Each block of main memory maps to only one cache line
– i.e. if a block is in cache, it must be in one specific
place
– i = j modulo m
where i = cache line number
j = main memory block no.
m = no.of lines in the
cache

BITS Pilani, Pilani Campus


Direct Mapped Cache

i = j modulo m

• Address is split in three parts:


• Tag
• Line
• Word BITS Pilani, Pilani Campus
Direct Mapped Cache Organization

BITS Pilani, Pilani Campus


Direct mapped cache- Summary

Address length = (s+w) bits


Number of addressable units = 2s+w words or bytes
Block size = line size = 2w words or bytes
Number of blocks in main memory = 2s+w / 2w = 2s
Number of lines in cache = m = 2r
Size of tag =(s-r) bits

BITS Pilani, Pilani Campus


Direct mapped cache- Summary

1 1 2

Tag s-r Line or Slot r Word w

BITS Pilani, Pilani Campus


Direct mapped cache-pros & cons

• Simple
• Inexpensive
• Fixed location for given block
– If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very high

BITS Pilani, Pilani Campus


Problem 1 : Direct Mapped Cache
Given :
• Cache of 64KByte, Cache block of 4 bytes
• 16MBytes main memory
Find out
a) Number of bits required to address the main
memory
b) Number of blocks in main memory
c) Number of cache lines
d) Number of bits required to identify a word (byte) in
a block
e) Number of bits to identify a block
f) Tag, Line, Word
BITS Pilani, Pilani Campus
Solution 1
Given :
• Cache of 64kByte, Cache block of 4 bytes
• 16MBytes main memory

Find out
a) Number of bits required to address the main memory

b) Number of blocks in main memory

c) Number of cache lines

BITS Pilani, Pilani Campus


Solution 1
Given :
• Cache of 64kByte, Cache block of 4 bytes
• 16MBytes main memory

Find out
d) Number of bits required to identify a word (byte) in a
block?

e) Number of bits required to identify a block

f) Tag, Line, Word

Tag s-r Line r Word w


8 14 2
BITS Pilani, Pilani Campus
Problem 2

BITS Pilani, Pilani Campus


Solution 2

BITS Pilani, Pilani Campus


Solution 2

BITS Pilani, Pilani Campus


Solution 2

BITS Pilani, Pilani Campus


Problem 3
Consider a direct-mapped cache
with 64 cache lines and a block
size of 16 bytes and main
memory of 8K (Byte
addressable memory) . To what
line number does byte address
1200H map?

BITS Pilani, Pilani Campus


Problem 4

The system uses a L1 cache with


direct mapping and 32-bit
address format is as follows:
bits 0 - 3 = offset (word)
bits 4 - 14 = index bits (Line)
bits15 - 31 = tag
a) What is the size of cache line?
b) How many Cache lines are
there?
c) How much space is required to
store the tags in the L1 cache?
d) What is the total Capacity of
cache including tag storage?
BITS Pilani, Pilani Campus
Problem 5

• 16 Bytes main memory,


Memory block size is 4 bytes,
Cache of 8 Byte (cache is 2
lines of 4 bytes each )
• Block access sequence :
0 2 0 2 2 0 0 2 0 0 0 2 1
• Find out hit ratio.

BITS Pilani, Pilani Campus


Problem 6

Suppose a 1024-byte cache has an access time of 0.1


microseconds and the main memory stores 1 Mbytes
with an access time of 1 microsecond. A referenced
memory block that is not in cache must be loaded into
cache .
Answer the following questions:
a) What is the number of bits needed to address the
main memory?
b) If the cache hit ratio is 95%, what is the average
access time for a memory reference?

BITS Pilani, Pilani Campus


Solution 6

a) 20 bits

b) If the cache hit ratio is 95%, what is the average


access time for a memory reference?
Avg access time = hit ratio * cache access +
(1- hit ratio) * (cache access + memory access)

BITS Pilani, Pilani Campus


Associative Mapping

• A main memory block can load into any line of


cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive

BITS Pilani, Pilani Campus


Associative Cache Organization

BITS Pilani, Pilani Campus


Associative Mapping Summary

• Address length = (s + w) bits


• Number of addressable units = 2s+w words or
bytes
• Block size = line size = 2w words or bytes
• Number of blocks in main memory = 2s+ w/2w = 2s
• Number of lines in cache = undetermined
• Size of tag = s bits

BITS Pilani, Pilani Campus


Problem 7
Given :
• Cache of 128kByte, Cache block of 8
bytes
• 32 MBytes main memory
Find out
a) Number of bits required to address the
memory
b) Number of blocks in main memory
c) Number of cache lines
d) Number of bits required to identify a
word (byte) in a block?
e) Tag, Word
BITS Pilani, Pilani Campus
Problem 8
Cache of 64KByte, Cache block of 4 bytes and 16 M
Bytes main memory and associative mapping.

Fill in the blanks:


Number of bits in main memory address = _____

Number of lines in the cache memory = ______

Word bits = _________

Tag bits = ________

BITS Pilani, Pilani Campus


Problem 9

• 16 Bytes main memory, Memory block size is


4 bytes, Cache of 8 Byte (cache is 2 lines of 4
bytes each ) and associative mapping
• Block access sequence :
0 2 0 2 2 0 0 2 0 0 0 2 1

• Find out hit ratio.

BITS Pilani, Pilani Campus


Set Associative Mapping
• Cache is divided into a number of sets (v sets each
with k lines)
• m=v*k
i = j modulo v
where i = cache set number
j = main memory block number
m = number of lines in the cache
• Each set contains ‘k’ number of lines
• A given block maps to any line in a given set
– e.g. Block B can be in any line of set i
• e.g. 2 lines per set
– 2 way set associative mapping
– A given block can be in one of 2 lines in only one set BITS Pilani, Pilani Campus
Two Way Set Associative Cache
Organization

BITS Pilani, Pilani Campus


Set Associative Mapping Summary

Address length = (s + w) bits


Number of addressable units = 2s+w words or bytes
Block size = line size = 2w words or bytes
Number of blocks in main memory = 2d
Number of lines in set = k
Number of sets = v = 2d
Number of lines in cache = kv = k * 2d
Size of tag = (s – d) bits

BITS Pilani, Pilani Campus


Computer Organization
and Software Systems
CONTACT SESSION 4
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Today’s Session
Contact List of Topic Title Text/Ref
Hour Book/external
resource
7-8 • Memory Organization (Contd..) T1, R2
• Cache Memories (Contd..)
• Set Associative Caches
• Issues with Writes
• Performance Impact of Cache
Parameters
• Writing Cache friendly Codes
• Replacement Algorithms

BITS Pilani, Pilani Campus


Set Associative Mapping
• m line Cache is divided into a number of sets (v sets
each with k lines)
• m=v*k
i = j modulo v
where i = cache set number
j = main memory block number
v = number of sets in the cache
• Each set contains ‘k’ number of lines
• A given block maps to any line in a given set
– e.g. Block B can be in any line of set i
• e.g. 2 lines per set
– 2 way set associative mapping
– A given block can be in one of 2 lines in only one set BITS Pilani, Pilani Campus
Example
• 16 Bytes main
memory, Block Size
is 2 Bytes,
• Cache of 8 Bytes, 2
way set associative
cache
• # address bits
• Cache line size
• # main memory blocks
• # Number of cache lines
• # lines per set
• # of sets

BITS Pilani, Pilani Campus


Example –
Mapping Function
i = j modulo v Set #
0%2
1%2
2%2
3%2
4%2
5%2
6%2
7%2

TAG SET WORD


BITS Pilani, Pilani Campus
Set Associative Cache Organization

BITS Pilani, Pilani Campus


Set Associative Mapping Summary

Address length = (s + w) bits


Number of addressable units = 2s+w words or bytes
Block size = line size = 2w words or bytes
Number of blocks in main memory = 2d
Number of lines in set = k
Number of sets = v = 2d
Number of lines in cache = kv = k * 2d
Size of tag = (s – d) bits

BITS Pilani, Pilani Campus


Problem 1
A set-associative cache
consists of 64 lines, or
slots, divided into four-line
sets. Main memory contains
4K blocks of 128 bytes
each. Find out
• Total main memory capacity
• Total cache memory capacity
• Total number of sets in the
cache
• Number of bits for TAG, SET
and word

BITS Pilani, Pilani Campus


Problem 2 (1/2)
A 4-way set-associative cache
memory unit with a capacity of
16 KB is built using a block size
of 8 words. The word length is
32 bits. The size of the
physical address space is 4 GB.
Find out address format.
Find out:
1) # bits to address main memory
2) # cache lines
3) # sets
4) Main Memory Address Format

BITS Pilani, Pilani Campus


Problem 2 (2/2)
A 4-way set-associative cache
memory unit with a capacity of
16 KB is built using a block size
of 8 words. The word length is
32 bits. The size of the
physical address space is 4 GB.
Find out address format.
Find out:
1) # bits to address main memory
2) # cache lines
3) # sets
4) Main Memory Address Format

BITS Pilani, Pilani Campus


Replacement Algorithms (1/3)

Direct mapped cache


• No choice
• Each block maps to one line and replace that line

BITS Pilani, Pilani Campus


Replacement Algorithms (2/3)

• Needed in Associative & Set Associative mapped


cache
• Hardware implemented algorithm (speed)
• Methods:
– Least Recently Used (LRU)
– Least Frequently Used (LFU)
– First In First Out (FIFO)
– Random

BITS Pilani, Pilani Campus


Replacement Algorithms (3/3)

• Least Recently used (LRU): Replace the block that


has been in the cache longest with no reference to
it
– e.g. 2 way set associative
– Uses “USE” bits
– Most effective method
• Least frequently used: Replace block which has
had fewest hits
– Uses counter with each line
• First in first out (FIFO): Replace block that has
been in cache longest
– Round robin or circular buffer technique
• Random
BITS Pilani, Pilani Campus
Problem 3

Consider a reference pattern that accesses the


sequence of blocks 0, 4, 0, 2, 1, 8, 0, 1, 2, 3, 0, 4.
Assuming that the cache uses associative mapping,
find the hit ratio for a cache with four lines
a) LRU
b) LFU
c) FIFO

BITS Pilani, Pilani Campus


Problem 2 - LRU

Ref 0 4 0 2 1 8 0 1 2 3 0 4
time 0 1 2 3 4 5 6 7 8 9 10 11
L0
L1
L2
L3
H/M

BITS Pilani, Pilani Campus


Problem 2 - LFU

Ref 0 4 0 2 1 8 0 1 2 3 0 4
L0
L1
L2
L3
H/M

BITS Pilani, Pilani Campus


Problem 2 - FIFO

Ref 0 4 0 2 1 8 0 1 2 3 0 4
time 0 1 2 3 4 5 6 7 8 9 10 11
L0
L1
L2
L3
H/M

BITS Pilani, Pilani Campus


Issues with Writes
• Multiple copies of data exist:
– L1, L2, L3, Main Memory, Disk
• What to do on a write-hit?
– Write-through (write immediately to
memory)
– Write-back (defer write to memory until
replacement of line)
• Need a dirty bit (line different from
memory or not)

BITS Pilani, Pilani Campus


Issues with Writes
• What to do on a write-miss?
– Write-allocate (load into cache, update line in cache)
• Good if more writes to the location follow
– No-write-allocate (writes straight to memory, does not
load into cache)
• Typical
– Write-through + No-write-allocate
– Write-back + Write-allocate

BITS Pilani, Pilani Campus


Intel Core i7 Cache
Hierarchy

BITS Pilani, Pilani Campus


Intel Core i7 Cache Hierarchy

BITS Pilani, Pilani Campus


Performance Impact of Cache Parameters
• Associativity :
• higher associativity  more complex hardware
• Higher Associativity  Lower miss rate
• Higher Associativity  reduces average memory
access time (AMAT)
• Cache Size
• Larger the cache size  Lower miss rate
• Larger the cache size  reduces average memory
access time (AMAT)
• Block Size / Line Size:
• Smaller blocks do not take maximum advantage of
spatial locality.
BITS Pilani, Pilani Campus
Revisiting Locality of reference

Does this function


have good locality?

N=8
Address 0 1 2 3 4 5 6 7
Contents V[0] V[1] V[2] V[3] V[4] V[5] V[6] V[7]
Access Order 1 2 3 4 5 6 7 8
BITS Pilani, Pilani Campus
Byte Addressable
Stride k – memory and word

reference pattern
length is 1 byte

i Address i Address
0 0000 0 0000
Stride 1
1 0001 1 0001 Stride 2
2 0002 2 0002
3 0003 3 0003
4 0004 4 0004
Address difference
5 0005 5 0005
Stride=----------------------
6 0006 Word Length 6 0006
7 0007 7 0007
8 0008 8 0008
9 0009 9 0009
10 000A 10 000A
11 000B 11 000B
12 000C 12 000C
BITS Pilani, Pilani Campus
Stride k – reference pattern
Byte Addressable
memory and word
length is 2 bytes
i Address i Address
0 0000 0 0000
Stride 1
1 0002 1 0002 Stride 2
2 0004 2 0004
3 0006 3 0006
4 0008 4 0008
Address difference
5 000A 5 000A
Stride=----------------------
6 000C Word Length 6 000C
7 000E 7 000E
8 0010 8 0010
9 0012 9 0012
10 0014 10 0014
11 0016 11 0016
12 0018 12 0018
BITS Pilani, Pilani Campus
Revisiting Locality of reference

Does this function


have good locality?

M = 2, N=3
Address 0 1 2 3 4 5
Contents A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2]
Access Order 1 2 3 4 5 6

BITS Pilani, Pilani Campus


Revisiting Locality of reference

Does this function


have good locality?

M = 2, N=3
Address 0 1 2 3 4 5
Contents A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2]
Access Order 1 3 5 2 4 6

BITS Pilani, Pilani Campus


Writing Cache Friendly Code

• Make the common case go fast


– Focus on the inner loops of the core functions

• Minimize the misses in the inner loops


– Repeated references to variables are good
(temporal locality)
– Stride-1 reference patterns are good (spatial
locality)

BITS Pilani, Pilani Campus


Example 1

int sumarrayrows(int a[4][4])


{
int i, j, sum = 0;
for (i = 0; i < 4; i++)
for (j = 0; j < 4; j++)
sum += a[i][j];
return sum;
}
Assumption:
• The cache has a block size of 4 words each, 2 cache lines
• Word size 4 bytes.
• C stores arrays in row-major order
BITS Pilani, Pilani Campus
Example 1(Contd..)
W0
W1
W2
int sumarrayrows(int a[4][4]) W3
{
int i, j, sum = 0; W4
L0
for (i = 0; i < 4; i++) W5
L1
for (j = 0; j < 4; j++)
sum += a[i][j]; W6
return sum; W7
}
W8
A[i][j] J=0 J=1 J=2 J=3 W9

i=0 W10
W11
i=1
W12
i=2 W13
W14
I=3
W15
BITS Pilani, Pilani Campus
Example 2:

int sumarraycols(int a[4][4])


{
int i, j, sum = 0;
for (j = 0; j < 4; j++)
for (i = 0; i < 4; i++)
sum += a[i][j];
return sum;
}
Assumption:
• The cache has a block size of 4 words each, 2 cache lines
• Word size 4 bytes.
• C stores arrays in row-major order BITS Pilani, Pilani Campus
Example 1(Contd..)
W0
W1
W2
int sum_array(int a[4][4]) W3
{
int i, j, sum = 0; W4
L0
for (j = 0; j < 4; j++) W5
L1
for (i = 0; i < 4; i++)
sum += a[i][j]; W6
return sum; W7
}
W8
A[i][j] J=0 J=1 J=2 J=3 W9

i=0 W10
W11
i=1
W12
i=2 W13
W14
I=3
W15
BITS Pilani, Pilani Campus
Home Work – Which one is
better ?
Program 1: Program 2:
for (int i = 0; i < n; i++) { for (int i = 0; i < n; i++) {
z[i] = x[i] – y[i]; z[i] = x[i] – y[i];
z[i] = z[i] * z[i]; }
} for (int i = 0; i < n; i++) {
z[i] = z[i] * z[i];
}

BITS Pilani, Pilani Campus


COMPUTER ORGANIZATION
AND SOFTWARE SYSTEMS
SESSION 5
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
CISC Instruction Set
(Intel x86 as an example)
BITS Pilani
Pilani Campus
Today’s Session

Contact List of Topic Title Text/Ref


Hour Book/external
resource
9-10
• Instruction Set Architecture - CISC Vs RISC T1
• CISC Instruction Set (Intel x86 as an example)
• Machine Instruction Characteristics
• Types of Operands
• Types of Operations
• Addressing Modes
• Instruction Formats

BITS Pilani, Pilani Campus


Introduction
• What is an Instruction Set?
• The complete collection of instructions that are
understood by a CPU
• Elements of an Instruction
• Operation code (Op code)
• Source Operand reference
• Result Operand reference
• Next Instruction Reference
• Source and Destination Operands can be found in four areas
• Main memory (or virtual memory or cache)
• CPU register
• Immediate
• I/O device BITS Pilani, Pilani Campus
Simple Instruction Format

• During instruction execution, an instruction is read into an


instruction register (IR) in the processor.
• The processor must be able to extract the data from the
various instruction fields to perform the required operation.
• Opcodes are represented by abbreviations, called
mnemonics
Example: ADD AX, BX  Add instruction

BITS Pilani, Pilani Campus


Instruction Types
• Data processing : Arithmetic and logic instructions

• Data storage (main memory) : Movement of data into


or out of register and or memory locations

• Data movement (I/O) : I/O instructions

• Program flow control : Test and branch instructions

BITS Pilani, Pilani Campus


Number of Addresses (1/2)

• 3 addresses
– Operand 1, Operand 2, Result
– c = a + b; add a, b, c
– May be a forth - next instruction (usually implicit)
– Needs very long words to hold everything
• 2 addresses
– One address doubles as operand and result
– a = a + b : add a, b
– Reduces length of instruction
– The original value of a is lost.

BITS Pilani, Pilani Campus


Number of Addresses (2/2)
• 1 address
– Implicit second address
– Usually a register (accumulator)
– Common on early machines
• 0 (zero) addresses
– All addresses implicit
– Uses a stack
– e.g. c = a + b
push a
push b
add
pop c
BITS Pilani, Pilani Campus
Example

BITS Pilani, Pilani Campus


How Many Addresses
• Fewer addresses
– More Primitive instructions, shorter length instructions
– Less complex instructions, hence requires less complex hardware
– More instructions per program
• Longer programs
• More complex programs
• Longer execution time
• Multiple address instructions
– Lengthy instructions
– More registers
• Inter-register operations are quicker
– Fewer instructions per program

BITS Pilani, Pilani Campus


Instruction set Design Decisions
• Operation repertoire
– How many ops?
– What can they do?
– How complex are they?
• Data types
• Instruction formats
– Length of op code field
– Number of addresses
• Registers
– Number of CPU registers available
– Which operations can be performed on which registers?
• Addressing modes
BITS Pilani, Pilani Campus
Types of Operand
• Machine instructions operate on data
• General categories of data
• Addresses
• Numbers
• Binary integer or binary fixed point, floating
point, decimal
• Characters
• ASCII etc.
• Logical Data Binary Representation
• Bits or flags 32 16 8 4 2 1
1 0 0 1 0 0
• Packed Decimal
• 36 : 0011 0110
BITS Pilani, Pilani Campus
Byte Ordering

BITS Pilani, Pilani Campus


x86 Data Types
• General – Byte, Word, double word, quadword, double quad
word - arbitrary binary contents
• Integer – signed binary using two’s complement
representation
• Ordinal - unsigned integer
• Unpacked BCD - One digit per byte
• Packed BCD - 2 digits per byte
• Near Pointer
• Far pointer
• Bit field : A contiguous sequence of bits in which the
position of each bit is considered as an independent unit.
• Bit and Byte String
BITS Pilani, Pilani Campus
x86 Numeric Data Formats

BITS Pilani, Pilani Campus


Types of Operation
• Data Transfer
• Arithmetic
• Logical
• Conversion
• I/O
• System Control
• Transfer of Control

BITS Pilani, Pilani Campus


Data Transfer
• Specify
– Source
– Destination
– Amount of data
• Action:
1. Calculate the memory address, based on the address mode
2. If the address refers to virtual memory, translate from
virtual to real memory address.
3. Determine whether the addressed item is in cache.
4. If not, issue a command to the memory module.

BITS Pilani, Pilani Campus


Arithmetic
• Add, Subtract, Multiply, Divide
• May include
– Absolute value (|a|)
– Increment (a++)
– Decrement (a--)
– Negate (-a)
• Signed Integer
• Floating point

BITS Pilani, Pilani Campus


Logical
• Bitwise operations
• AND, OR, XOR, NOT

BITS Pilani, Pilani Campus


Shift and Rotate Operations
Input Operation Output
10101101 Logical right
shift (3 bits)
10101101 Logical left shift
(3 bits)
10101101 Arithmetic right
shift (3 bits)

10101101 Arithmetic left


shift (3 bits)

10101101 Right rotate (3


bits)

10101101 Left rotate (3


bits) BITS Pilani, Pilani Campus
Conversion
E.g. Binary to Decimal

BITS Pilani, Pilani Campus


Input/Output
• May be specific instructions (I/O-Mapped I/O)
• May be done using data movement instructions
(memory mapped)
• May be done by a separate controller (DMA)

BITS Pilani, Pilani Campus


Systems Control
• Privileged instructions
• CPU needs to be in specific state
– User Mode
– Kernel mode
• For operating systems use

BITS Pilani, Pilani Campus


Transfer of Control
• Jump / Branch (Unconditional / Conditional)
– e.g. jump to x if result is zero
• Skip (Unconditional / Conditional)
o skip (unconditional) : Increment to skip next
instruction
– e.g. increment and skip if zero
ISZ Register1
Branch xxxx
ADD A
• Subroutine call
• interrupt call
BITS Pilani, Pilani Campus
Branch / Jump Instruction

BITS Pilani, Pilani Campus


Use of Stack

BITS Pilani, Pilani Campus


Addressing Modes
• Addressing modes refers to the way in which the
operand of an instruction is specified
• Types:
• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement (Indexed)
• Stack

BITS Pilani, Pilani Campus


Immediate Addressing
• Operand is specified in the instruction itself
• e.g. ADD #5
– Add 5 to contents of accumulator
– 5 is operand
• No memory reference to fetch data
• Fast
• Limited range

BITS Pilani, Pilani Campus


Direct Addressing
• Address of the operand is specified in
the instruction
• Effective address (EA) = address field
(A)
• e.g. ADD A
– Add contents of memory cell whose
address is A to accumulator
– Look in memory at address A for
operand
• Single memory reference to access data
• No additional calculations to work out
effective address
• Limited address space
BITS Pilani, Pilani Campus
Indirect Addressing
• Memory cell pointed to by
address field of the
instruction contains the
address of (pointer to) the
operand
• EA = (A)
– Look in A, find address and
look there for operand
• e.g. ADD (A)
– Add contents of cell pointed
to by contents of A to
accumulator

BITS Pilani, Pilani Campus


Indirect Addressing…
• Large address space
• 2n where n = word length
• May be nested, multilevel, cascaded
– e.g. EA = (((A)))
• Multiple memory accesses to find operand
• Slower

BITS Pilani, Pilani Campus


Register Addressing
• Operand is held in register named in address
filed
• EA = R
• Limited number of registers
• Very small address field needed
– Shorter instructions
– Faster instruction fetch
• No memory access hence Very fast
execution but very limited address space
• Multiple registers helps in improving
performance
– Requires good assembly programming or compiler
writing
– C programming : register int a; BITS Pilani, Pilani Campus
Register Indirect Addressing

• Similar to indirect addressing


• EA = (R)
• Operand is in memory cell
pointed to by contents of
register R
• Large address space (2n)
• One memory access compared
indirect addressing

BITS Pilani, Pilani Campus


Displacement Addressing
• EA = A + (R)
• Address field hold two
values
– A = base value
– R = register that holds
displacement
– or vice versa
• Three variants:
• Relative addressing
• Base register
addressing
• Indexing
BITS Pilani, Pilani Campus
Relative Addressing
• Also known as PC relative addressing
• A version of displacement addressing
• R = Program counter, PC
• EA = A + (PC)
• Relative addressing exploits the concept of locality

BITS Pilani, Pilani Campus


Base-Register Addressing
• The referenced register “R” contains a main memory
address
• EA = R + A
• address field contains a displacement A
• R may be explicit or implicit
• e.g. segment registers in 80x86

BITS Pilani, Pilani Campus


Indexed Addressing
• The address field references a main memory address
A
• The referenced register R contains a positive
displacement from that address.
• EA = A + R
• Good for accessing arrays
– EA = A + R
– R++

BITS Pilani, Pilani Campus


Auto Indexing
• Auto indexing incase certain registers are devoted
exclusively to indexing
EA = A + (R)
(R)  (R) + 1
• Example: LODSB
AL  DS:[SI]
SI is incremented or decremented based on direction flag.
D = 0  increment SI
D = 1  decrement SI

BITS Pilani, Pilani Campus


Post-indexing

• indexing is performed after the indirection


EA = (A) + (R)
• Steps:
1. The contents of the address field are used to access a
memory location containing a direct address.
2. Address is then indexed by the register value
• Use:
• for accessing one of a number of blocks of data of a
fixed format

BITS Pilani, Pilani Campus


Pre-indexing

• An address is calculated as with simple indexing


EA = (A + (R) )

• Use:
• to construct a multiway branch table

BITS Pilani, Pilani Campus


Stack Addressing
• Operand is (implicitly) on top of stack
• e.g.
– ADD Pop top two items from stack and add, push
the result on stack top

BITS Pilani, Pilani Campus


x86 Addressing Modes

32 bit registers  EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP
16 bit registers  AX, BX, CX, DX, SI, DI, SP, BP),
8 bit registers  AH, AL, BH, BL, CH, CL, DH , DL
Segment registers  CS, DS, ES, SS, FS, GS

BITS Pilani, Pilani Campus


x86 Addressing Mode Calculation

BITS Pilani, Pilani Campus


Instruction Formats

• Layout of bits in an instruction


• Includes opcode
• Includes (implicit or explicit) operand(s)
• Usually more than one instruction format in an
instruction set

BITS Pilani, Pilani Campus


Instruction Length

Affected by and affects:


– Memory size
– Memory organization
– Bus structure
– CPU complexity
– CPU speed
Trade off between powerful instruction repertoire and
saving space

BITS Pilani, Pilani Campus


Allocation of Bits

• Number of addressing modes


• Number of operands
• Register versus memory
• Number of register sets
• Address range
• Address granularity

BITS Pilani, Pilani Campus


COMPUTER ORGANIZATION
AND SOFTWARE SYSTEMS
SESSION 6
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
MIPS Architecture
 MIPS follows RISC principles and based on Harvard
Architecture
 MIPS architecture is a register architecture
 MIPS ISA is Load/Store architecture
 R2000 is a 32-bit processor
 Uses fixed length instruction format.

2
BITS Pilani, Pilani Campus
MIPS Register Set
Types of Registers
• 32 general-purpose registers ($0 – $31)
• Program Counter (PC)
• Two special Purpose register (HI and LO)
– Used to hold the results of integer multiply and
divide instruction.
– integer multiply operation, HI and LO register hold
the 64-bit result.
– integer divide operation, the 32-bit quotient is
stored in the LO and the remainder in the HI
register.

3
BITS Pilani, Pilani Campus
General-Purpose Register >>

Usage Convention Back

Register Number Intended usage


name
Zero 0 Constant 0
at 1 Reserved for assembler
v0,v1 2,3 Values for function results and Exp evaluation
a0,a1,a2,a3 4-7 Arguments 1-4
t0-t7 8-15 Temporary (not preserved across call)
s0-s7 16-23 Saved temporary( preserved across call)
t8,t9 24,25 Temporary (not preserved across call)
k0,k1 26,27 Reserved for OS kernel
gp 28 Pointer to Global area
sp 29 Stack Pointer
fp 30 Frame pointer(if needed);Otherwise, a saved register $s8
4
ra 31 Return address(used by a procedure call) BITS Pilani, Pilani Campus
Memory Usage
• A program’s address space
consists of three parts:
• Code/Text segment
• Data segment
• Stack segment

5
BITS Pilani, Pilani Campus
Instruction Format
 Fixed-length instruction format
 32-bits long
 Three different instruction formats:
1. Immediate (I-type)
2. Jump (J-type)
3. Register (R-type)
 Meaning of various fields in the instruction format
– op: Opcode
– rs: The first register source operand
– rt: The second register source operand
– rd: The register destination operand
– shamt / sa: shift amount
– funct: function code 6
BITS Pilani, Pilani Campus
Instruction Format
Register File
R-Type:

I-Type:

J-Type:

7
BITS Pilani, Pilani Campus
Addressing modes

 Immediate Addressing Mode

 Register Addressing Mode

 Base or Displacement Addressing Mode

8
BITS Pilani, Pilani Campus
Addressing modes

 PC- Relative Addressing Mode

 Pseudo Direct Addressing Mode

9
BITS Pilani, Pilani Campus
MIPS Instruction Set

• Arithmetic Instructions ( ADD, SUB etc.)


• Logical Instructions (OR, AND, SHIFT, etc. )
• Data Transfer Instructions (Load and Store)
• Decision Making Instructions ( J, BNE, BEQ, etc.)
• Stack Related Instructions (PUSH and POP)

10
BITS Pilani, Pilani Campus
A Basic MIPS Implementation

• Basic implementation includes a subset of core MIPS


instruction set
– Memory reference instruction
• lw and sw
– Arithmetic and logical instructions
• add, sub, and, or and slt
– Branch instructions
• beq and j

BITS Pilani, Pilani Campus


Memory Reference
Instruction - lw
• copies data from memory to register
• Format : lw reg, address
• Example : lw $s1, 100($s2)
• $s1  memory[100 +$s2]
• Alignment restriction
• MIPS is Big endian
• Spilling registers

s2 s1

BITS Pilani, Pilani Campus


Memory Reference Instruction - sw
• Store Instruction
• copies data from register to memory
• Format : sw reg, address
• Example : sw $s1, 100($s2)
memory[100 +$s2] $s1

s2 s1

BITS Pilani, Pilani Campus


Arithmetic Instructions :add
add des, src1, src2
Addressing mode: register
Example: add $s3, $s1, $s2
meaning : $s3 = $s1 + $s2
opcode : 0 , function: 0x20
sa :0
$s1 $s2 $s3

BITS Pilani, Pilani Campus


Arithmetic Instructions :sub
sub des, src1, src2
Addressing mode: register
Example: sub $s3, $s1, $s2
meaning : $s3 = $s1 - $s2
opcode : 0 , function: 0x22
sa :0
$s1 $s2 $s3

BITS Pilani, Pilani Campus


Logical instructions :and
and des, src1, src2
Addressing mode: register
Example: and $s3, $s1, $s2
meaning : $s3 = $s1 & $s2 (bit by bit)
opcode : 0 , function: 0x24
sa :0

$s1 $s2 $s3

BITS Pilani, Pilani Campus


Logical instructions :or
or des, src1, src2
Addressing mode: register
Example: or $s3, $s1, $s2
meaning : $s3 = $s1 | $s2 (bit by bit)
opcode : 0 , function: 0x25
sa :0

$s1 $s2 $s3

BITS Pilani, Pilani Campus


Logical instructions :slt
set on less than : slt
Format: slt des, src1, src2
set des = 1, if src1 < src2
Example:
slt $s3, $s1, $s2
$s1 $s2 $s3

Set register rd to 1 if rs is less than rt, and to 0


otherwise
BITS Pilani, Pilani Campus
Branch Instructions: beq and j
beq : branch if equal
– format : beq $s1, $s2, label
– If $s1 = $s2 then branch to address specified as
Label

j : jump unconditional
– format : j label

BITS Pilani, Pilani Campus


Back • and $s1, $s2, $s3

Summary
• lw $s1, 100($s2) • or $s1, $s2, $s3

• sw $s1, 100($s2) • slt $t0, $s1, $s2

• add $s1, $s2, $s3 • beq $s1, $s2, label

• sub $s1, $s2, $s3 • j label

BITS Pilani, Pilani Campus


Data Path Design
BITS Pilani
Pilani Campus
Introduction

• The Five Classic Components of a Computer


– Datapath: The component of the processor that
performs data processing operations
– Control: The component of the processor that
commands the datapath, memory and I/O devices
according to the instructions of the memory
– Memory
– Input device
– Output device

BITS Pilani, Pilani Campus


Building a datapath
• Datapath elements :memory – instruction/data,
register file, ALU, Adders, Multiplexers….
• Demonstrate Datapath implementation for the
following instruction:
• lw and sw
• add, sub, and, or and slt
• beq and j
• Instruction Cycle = Fetch + Execute

BITS Pilani, Pilani Campus


Do this…

write a datapath used for fetching instructions and


incrementing the program counter

BITS Pilani, Pilani Campus


Data Path for R-type instruction
Instruction
Summary

Examples:
add, sub, and, or, slt  rs, rt, rd

BITS Pilani, Pilani Campus


Data Path for R-type
instruction…
Instruction
Summary

•add $s1, $s2, $s3

BITS Pilani, Pilani Campus


Data Path for lw and sw
instructions
Instruction
Summary

lw $s1, 100($s2)

sw $s1, 100($s2)

BITS Pilani, Pilani Campus


Data Path for lw instructions Previous

lw $s1, 100($s2)
s2 s1

BITS Pilani, Pilani Campus


Data Path for sw instructions

sw $s1, 100($s2)

s2 s1

BITS Pilani, Pilani Campus


Combined Datapath for R-
Type and I-Type instructions

R-Type I-Type

BITS Pilani, Pilani Campus


MUX

BITS Pilani, Pilani Campus


beq instruction
• beq $s1, $s2, offset
• Two important points to be noted
– The instruction set architecture defines that the
base for the branch address calculation is the
address of the instruction following the branch
– the architecture also states that the offset field
is shifted left 2 bits
• branch taken or branch not taken

BITS Pilani, Pilani Campus


Datapath for beq instruction

BITS Pilani, Pilani Campus


Jump instruction
j <26 bit jump offset>
the jump instruction operates by replacing the lower 28
bits of the PC with the lower 26 bits of the
instruction shifted left by 2 bits

BITS Pilani, Pilani Campus


Creating a single cycle datapath
Simplest datapath executes all instructions in one clock
cycle
– No element in the datapath can be used more than
once
– Any element needed more than once must be
duplicated

BITS Pilani, Pilani Campus


Control unit design
Previous
The ALU control
– used by load and store instructions
– used by r-type instruction : AND, OR, add, sub and slt
• opcode : zero
• operation performed based on “function field”

BITS Pilani, Pilani Campus


Contd…
• ALU control unit generates 4 bit ALUOperation control
signal based on
– function field
– 2 bit control field  ALUOp
• 00  lw and sw
• 01 beq (sub)
• 10 add, subtract, AND, OR and slt
• Main Control unit generates 7 control signals : RegDst,
RegWrite, ALUSrc, PCSrc, MemRead, MemWrite and
MemtoReg

37
BITS Pilani, Pilani Campus
Datapath

BITS Pilani, Pilani Campus


Effects of 7 control signals

BITS Pilani, Pilani Campus


Major observations
• opcode is always contained in bits 31:26
• The two registers to be read are always specified by the rs
and rt fields at positions 25:21 and 20:16
– r-type
– branch equal
– store
• The base register for load and store instruction is always in
bit positions 25:21 i.e. rs
• The 16 bit offset for branch equal, load and store is always
in positions 15:0
• The destination register is in one of two places.
– load instruction rt 20:16
– r type instruction rd 15:11 40
BITS Pilani, Pilani Campus
Execution steps: add $s1, $s2, $s3
1. The instruction is fetched from _______, and the
PC is incremented to _________
2. Two registers ______ and _______ are read from
the register file
3. The ALU operates on the data read from the
register file, based on the _______code
4. The result from the ALU is written into the
________register in register file

BITS Pilani, Pilani Campus


BITS Pilani, Pilani Campus
Execution steps: lw $s1, 100($s2)

1. The instruction is fetched, and the PC is


incremented
2. A register _________value is read from the
register file
3. The ALU computes the sum of the value read from
the register file and the sign extended lower 16 bits
of the instruction
4. the sum from the ALU is used as the address for
the data memory
5. The data from the memory unit is written into
_____ in register file

BITS Pilani, Pilani Campus


BITS Pilani, Pilani Campus
Complete Data Path

BITS Pilani, Pilani Campus


COMPUTER ORGANIZATION
AND SOFTWARE SYSTEMS
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Webinar - 3
MIPS-Multicycle Implementation
BITS Pilani
Pilani Campus
Last Class…
• MIPS Instruction set architecture
• Datapath for
– fetching instructions
– updating program counter
– implementing R-type ALU operations
– implementing data memory unit and the sign
extension unit
– datapath for implementing branch instructions
• Effect of control signals

BITS Pilani, Pilani Campus


Effects of 7 control signals

BITS Pilani, Pilani Campus


Effect of ALUop

ALUop Instructions
00 lw, sw
01 Beq
10 add, sub, and, or, slt

BITS Pilani, Pilani Campus


Execution steps: sw $s1, 100($s2)
1. The instruction is fetched, and the PC is
incremented
2. A register $s2 and $s1 value is read from the
register file
3. The ALU computes the sum of the value read from
the register s2 and the sign extended lower 16 bits
of the instruction
4. the sum from the ALU is used as the address for
the data memory and writes the contents of $s1 on
to memory

BITS Pilani, Pilani Campus


Fig
PCsrc

Opcode

rs RegWrite

rt MemWrite

ALUSrc MemtoReg
rd

RegDst
Offset

MemRead
Function ALUop
6/9/2020 10:38
Lucy
PMJ. Gudino BITS Pilani, Pilani Campus
7
Home Work:
Execution steps:beq $s1,$s2, offset
1. The instruction is fetched, and the PC is
incremented
2. Two registers $s1 and $s2 are read from the
register file
3. The ALU performs a subtract on the data values
read from the register file. The value of PC + 4 is
added to the sign extended, lower 16 bits of the
instruction (offset) sifted left by two; the result is
the branch target address
4. The zero result from the ALU is used to decide
which adder result to store into the PC

BITS Pilani, Pilani Campus


6/9/2020 10:38
Lucy
PMJ. Gudino 9
BITS Pilani, Pilani Campus
Single Cycle Advantages & Disadvantages

• Uses the clock cycle inefficiently – the clock cycle


must be timed to accommodate the slowest
instruction
• May be wasteful of area since some functional units
(e.g., adders) must be duplicated since they can not
be shared during a clock cycle but is simple and easy
to understand

Lucy J. Gudino BITS Pilani, Pilani Campus


Multicycle datapath approach
• also called multiple clock cycle implementation
• an instruction is executed in multiple clock cycles
• Advantage : hardware sharing and ability to allow
instructions to take different numbers of clock
cycles
• Break up instructions into steps where each step
takes a cycle while trying to
– balance the amount of work to be done in each step
– restrict each cycle to use only one major functional
unit
• Not every instruction takes the same number of
clock cycles BITS Pilani, Pilani Campus
Single cycle vs multicycle
• Single cycle:
– Two memory units – one for instruction and one for data
– one ALU and two adders
• Multicycle:
– A single memory unit for both instructions and data
• only one memory access per cycle
– only a single ALU
• only one ALU operation per cycle
– one or more registers are added after every major
functional unit to hold the output of that unit until the
value is used in a subsequent clock cycle

BITS Pilani, Pilani Campus


Temporary registers

• IR  Instruction Register
• MDR  Memory Data Register
• A,B used to hold the register operand values read
from the register file
• ALUout  holds the output of the ALU

BITS Pilani, Pilani Campus


Contd…
Important note:
1. all data that is used in subsequent clock cycles must
be stored in inter stage registers
2. Data used by subsequent instruction is stored in
programmer visible registers (i.e., register file, PC,
or memory)
3. All the registers except IR do not need a write
control signal
4. Expand the multiplexer units

BITS Pilani, Pilani Campus


The Five Steps
• Step 1 : IF (Instruction Fetch)
• Step 2 : ID (Instruction Decode)
• Step 3 : EX (Execute)
• Step 4 : MEM (Memory)
• Step 5 : WB (Write Back)

BITS Pilani, Pilani Campus


Step1: IF (Instruction Fetch)
• Instruction Fetch and Update PC
IR  Memory [PC]
PC PC + 4

BITS Pilani, Pilani Campus


Step 2: ID (Instruction Decode)
• Following operations are performed
– read two registers corresponding to rs and rt
fields
A  Reg [ IR [25:21] ]
B  Reg [ IR [20:16] ]
– compute branch target address with ALU
ALUout  PC + (sign-extend (IR[15-0]) << 2)

BITS Pilani, Pilani Campus


Step 3: EX (Execute)
• Four operations are possible
1. Memory reference
ALUout  A + sign-extend ( IR[15:0] )
2.Arithmetic and logical instruction
ALUout  A op B
3.Branch
if ( A == B )
PC  ALUout
4. Jump
PC  { PC[31:28], (IR [25:0] << 2)

BITS Pilani, Pilani Campus


Step 4: MEM ( Memory)
• Memory access or R type instruction completion step
1. Memory Access
MDR  Memory [ALUout]
or
Memory [ALUout]  B
2. R-Type Instruction Completion
Reg [ IR [15: 11]]  ALUout

BITS Pilani, Pilani Campus


Step 5: WB (Write Back)
• Memory read completion step
– Reg [ IR [20 : 16 ] ]  MDR

BITS Pilani, Pilani Campus


R-Type Instruction: add $S1, $S2, $S3

BITS Pilani, Pilani Campus


lw Instruction: lw $S1, 100($S2)

BITS Pilani, Pilani Campus


sw Instruction: sw $S1, 100($S2)

BITS Pilani, Pilani Campus


beq Instruction: beq $S1, $S2, 100

BITS Pilani, Pilani Campus


Summary
• R-Type: Require four cycles, CPI =4
IF, ID, EX, WB
• Loads Require five cycles, CPI = 5
IF, ID, EX, MEM, WB
• Store: Require four cycles, CPI = 4
IF, ID, EX, MEM
• Branch: Require three cycles, CPI = 3
IF, ID, EX

BITS Pilani, Pilani Campus


Complete Data Path (1/3)
0: PC 1: Read 1: Read 0: From Rt 1: to write
1: ALUout Memory instruction 1: From Rd in reg file
Memory

rs
rt

rd

1: Write
Memory

BITS Pilani, Pilani Campus


Complete Data Path (2/3) 00: Add
01: Subtract (Beq)
0: From PC
10:Function field
1: From A
determines the
ALU operation

00: B register
01: 4
10: sign extd input
0:from ALUout
11: Sign extd input <<2
1: from MDR BITS Pilani, Pilani Campus
Complete Data
1: PC gets
Path (3/3)
updated

00: PC + 4 output of ALU


01: branch target address ALUout
10:jump target address
BITS Pilani, Pilani Campus
Actions of the 1 bit control signals

BITS Pilani, Pilani Campus


Actions of the 2 bit control signals

Back
BITS Pilani, Pilani Campus
First Step FIG

 Instruction Fetch : IFetch


• IR  Memory [PC]: 0: PC
1 – MemRead 1: ALUout
1 – IRWrite
0 – IorD 0: First operand is in PC
1: First operand is in register A
• PC PC + 4 :
0 - ALUSrcA Second operand
01 - ALUSrcB 00: A register
01: 4
- ALUop 10:Sign Extd lower 16 bits of IR
- PCSource 11 :Sign Extd lower 16 bits of IR << 2
- PCWrite BITS Pilani, Pilani Campus
Contd…
Fig

 Instruction Fetch : IFetch


00: Add operation
• IR  Memory [PC]: 01: Subtract Operation
1 – MemRead 10:Function field of the instruction
determines the ALU operation
1 – IRWrite
0 – IorD
• PC PC + 4 :
0
- ALUSrcA
00: PC + 4 output of ALU
01 - ALUSrcB 01: branch target address ALUout
00 - ALUop 10:jump target address
00 - PCSource
1 - PCWrite BITS Pilani, Pilani Campus
Second Step 0: First operand is PC Fig
1: First operand is register
Instruction decode and register
Secondfetch
operandstep
– read two registers corresponding
00: register to rs and rt
fields 01: 4
– compute branch target address
10:Sign Extd with
lowerALU
16 bits of IR
A  Reg [ IR [25:21] ] 11 :Sign Extd lower 16 bits of IR << 2
B  Reg [ IR [20:16] ]
ALUout  PC + (sign-extend (IR[15-0]) << 2)

0 ALUSrcA 00: Add


01: Subtract
11
ALUSrcB
10:Function field determines the
ALU operation
00 ALUop
BITS Pilani, Pilani Campus
Step 3
Execution, memory address computation, or branch
completion
– Four operations are possible
1. Memory reference generation
ALUout  A + sign-extend ( IR[15:0] )
2.Arithmetic and logical instruction
ALUout  A op B
3.Branch
if ( A == B ) PC  ALUout
4. Jump
PC  { PC[31:28], (IR [25:0] << 2)
BITS Pilani, Pilani Campus
Fig
Step 3: Contd…
0: First operand is PC
1: First operand is register A
1.Memory reference generation
ALUout  A + sign-extend ( IR[15:0] )
1 - ALUSrcA
Second operand
00: register
10 - ALUSrcB 01: 4
10:Sign Extd lower 16 bits of IR
11 :Sign Extd lower 16 bits of IR << 2
00 - ALUop
00: Add operation
01: Subtract Operation
10:Function field of the instruction
determines the ALU operation
BITS Pilani, Pilani Campus
Step 3: Contd…
Fig

2. Arithmetic and logical instruction


ALUout  A op B 0: First operand is PC
1 - ALUSrcA 1: First operand is register

00: register
00 - ALUSrcB 01: 4
10:Sign Extd lower 16 bits of IR
11 :Sign Extd lower 16 bits of IR << 2
10 - ALUop
00: Add operation
01: Subtract Operation
10:Function field of the instruction
determines the ALU operation
BITS Pilani, Pilani Campus
Step 3: Contd… 0: First operand is PC
1: First operand is register Fig
00: register
3. Branch 01: 4
if ( A == B ) PC  ALUout
10:Sign Extd lower 16 bits of IR
1 - ALUSrcA 11 :Sign Extd lower 16 bits of IR << 2

00: Add operation


00 - ALUSrcB 01: Subtract Operation(beq)
10:Function field of the instruction
determines the ALU operation
01 - ALUop
00: PC + 4 output of ALU
1 - PCWrite 01: branch target address ALUout
10:jump target address
01 - PCSource BITS Pilani, Pilani Campus
Step 3: Contd…
Fig

4. Jump
PC  { PC[31:28], (IR [25:0] << 2)
10 - PCSource

1 - PCWrite 00: PC + 4 output of ALU


01: branch target address ALUout
10:jump target address

BITS Pilani, Pilani Campus


Step 4
• Memory access or R type instruction completion step
– Two operations are possible
• Memory reference
MDR  Memory [ALUout]
or
Memory [ALUout]  B
• Arithmetic and logical instruction
Reg [ IR [15: 11]]  ALUout

BITS Pilani, Pilani Campus


Step 4 Contd… FIG

Memory reference
lw
MDR  Memory [ALUout]
1 MemRead
1 IorD
sw
Memory [ALUout]  B
1 MemWrite
IorD
1

BITS Pilani, Pilani Campus


Step 4 contd… Fig

Arithmetic and logical instruction


Reg [ IR [15: 11]]  ALUout
1 RegDst
1 RegWrite
0 MemtoReg

BITS Pilani, Pilani Campus


Step 5 Fig
Write back or Memory read completion step
– Reg [ IR [20 : 16 ] ]  MDR
MemtoReg 1

RegWrite 1
RegDst 0

BITS Pilani, Pilani Campus


BACK

BITS Pilani, Pilani Campus


Summary

BITS Pilani, Pilani Campus


BACK

BITS Pilani, Pilani Campus


COMPUTER ORGANIZATION
AND SOFTWARE SYSTEMS
SESSION 7
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Single Cycle Implementation :
sub $s1, $S2, $S3

BITS Pilani, Pilani Campus


Multi Cycle Implementation

BITS Pilani, Pilani Campus


Actions of the 1 bit control signals

BITS Pilani, Pilani Campus


Actions of the 2 bit control signals

BITS Pilani, Pilani Campus


Summary

BITS Pilani, Pilani Campus


R-Type Instruction: add $S1, $S2, $S3

BITS Pilani, Pilani Campus


BITS Pilani, Pilani Campus
lw Instruction: lw $S1, 100($S2)

BITS Pilani, Pilani Campus


BITS Pilani, Pilani Campus
Control Unit Design
BITS Pilani
Pilani Campus
Control Unit implementation

Hardwired control unit (RISC)


Microprogrammed control unit(CISC)

BITS Pilani, Pilani Campus


Hardwired Implementation
• Control unit inputs
• Flags and control bus
• Each bit means something
• Instruction register
• Op-code causes different
control signals for each
different instruction
• Unique logic for each op-
code
• Clock

BITS Pilani, Pilani Campus


Problems With Hard Wired Designs

• Complex sequencing & micro-operation logic


• Difficult to design and test
• Inflexible design
• Difficult to add new instructions

BITS Pilani, Pilani Campus


Microprogrammed Control Unit

• A computer executes a program


• Fetch/execute cycle/interrupt cycle…
• Each cycle has a number of steps
• Called micro-operations
• Each step does very little
• Atomic operation of CPU

BITS Pilani, Pilani Campus


Constituent Elements of
Program Execution

BITS Pilani, Pilani Campus


Example: ADD R1,X
ADD R1,X - add the contents of location X to R1, and
stores the result in R1
Fetch Sequence :
t1: MAR  (PC)
t2: MBR  (memory)
PC  (PC) +1
t3: IR  (MBR)

OR

t1: MAR <- (PC)


t2: MBR <- (memory)
t3: PC <- (PC) +1
IR <- (MBR) BITS Pilani, Pilani Campus
Example: ADD R1,X
Execute Cycle (ADD):
t4: MAR  (IRaddress)
t5: MBR  (memory)
t6: R1  R1 + (MBR)

BITS Pilani, Pilani Campus


Instruction Cycle
• Different phases : Instruction fetch, indirect, execute and
interrupt
• Each phase decomposed into sequence of elementary micro-
operations
• Execute cycle
– One sequence of micro-operations for each opcode
• Need to tie sequences together
• Assume new 2-bit register
– Instruction cycle code (ICC) designates which part of
cycle processor is in
• 00: Fetch
• 01: Indirect
• 10: Execute
• 11: Interrupt BITS Pilani, Pilani Campus
Flowchart for Instruction
Cycle

BITS Pilani, Pilani Campus


BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Functions of Control Unit
• The control unit performs two basic tasks:
– Sequencing
• Causing the CPU to step through a series of
micro-operations
– Execution
• Causing the performance of each micro-op

Example: ADD R1, X


t1: MAR  (PC) t4: MAR  (IRaddress)
t2: MBR  (memory) t5: MBR  (memory)
PC  (PC) +1 t6: R1  R1 + (MBR)
t3: IR  (MBR)

BITS Pilani, Pilani Campus


Control Signals
• Inputs to the control unit
– Clock
• One micro-instruction (or set of parallel micro-
instructions) per clock cycle
– Instruction register
• Op-code for current instruction
• Determines which micro-instructions are performed
– Flags
• State of CPU
• Results of previous operations
– From control bus
• Interrupts
• Acknowledgements
BITS Pilani, Pilani Campus
Microprogrammed Control Unit

BITS Pilani, Pilani Campus


Organization of
Control Memory

BITS Pilani, Pilani Campus


Pipeline
BITS Pilani
Pilani Campus
Laundry System

BITS Pilani, Pilani Campus


Laundry System……

BITS Pilani, Pilani Campus


Pipelining
• An overlapped parallelism: overlapped execution of
multiple operations
• Pipelining
– Subdivide the input task into a sequence of
subtasks
– Specialized hardware stage for each task
– Concurrent operation of all the stages

BITS Pilani, Pilani Campus


Two segment instruction pipeline
• Contains
– Instruction Fetch (IF)
– Execute (EX)
• Example: 8086 microprocessor
• Instruction Fetch unit is implemented by means of
first in first out buffer (Queue)

BITS Pilani, Pilani Campus


Two Stage Pipeline

BITS Pilani, Pilani Campus


Issues

1. The execution time will generally be longer than the


fetch time
2. A conditional branch instruction makes the address of
the next instruction to be fetched unknown.

BITS Pilani, Pilani Campus


Four Segment instruction
pipeline
• Contains
– FI : fetch instruction
– DA : Decode instruction and calculate effective
address
– FO :Fetch operand
– EX: Execute instruction

BITS Pilani, Pilani Campus


Six stage pipeline
• Contains
– FI : fetch instruction
– DI : Decode instruction
– CO : calculate effective address
– FO :Fetch operand
– EI : Execute instruction
– WO : Write Operand

BITS Pilani, Pilani Campus


Timing Diagram for
Instruction Pipeline Operation

BITS Pilani, Pilani Campus


Important Points to
be noted
• Do all the instructions
need all the stages?
• At t=6, WO, FO and FI
accesses the memory. Is
there any issue?
• Is there any implication
on having different time
duration for different
stages?
• Any issues with
conditional branch?
• Dependency of CO stage
on register used in
previous stage BITS Pilani, Pilani Campus
BITS Pilani, Pilani Campus
COMPUTER ORGANIZATION
AND SOFTWARE SYSTEMS
SESSION 8
Dr. Lucy J. Gudino
BITS Pilani WILP & Department of CS & IS
Pilani Campus
Structure of a pipeline

L S1 L S2 L L Sk L

Clock

BITS Pilani, Pilani Campus


Classification
• Arithmetic pipelining
• Instruction pipelining
• Processor pipelining
• Unifunction and multifunction pipelining
• Static and Dynamic pipelining
• Scalar and Vector pipelining

BITS Pilani, Pilani Campus


Arithmetic pipelining

• Arithmetic and logic units of a computer can be


segmentized for pipeline operations
• Usually found in high speed computers
• Example:
– Star 100  4 stage
– TI-ASC  8 stage
– Cray-1  14 stage
– Cyber 205  26 stages
– Intel Cooper Lake (3rd Gen Intel Xeon) = 14 stages
• Floating point adder pipeline
X = A*2a
Y = B*2b
BITS Pilani, Pilani Campus
Floating point
addition
1. Compare the
exponents
2. Align the
mantissas
3. Add or
subtract the
mantissas
4. Normalize the
result

Numerical Example:
X = 0.9504*103
Y = 0.8200*102
BITS Pilani, Pilani Campus
Instruction pipelining
• The execution of a stream of instructions can be pipelined
by overlapping the execution of the current instruction with
the fetch, decode……of subsequent instructions
• Sequence of steps followed in most general purpose
computer to process instruction
1. IF: Fetch the instruction from memory
2. ID: Decode the instruction
3. CO: Calculate the effective address
4. FO: Fetch the operands from memory
5. EI: Execute the instruction
6. WO: Store the result in the proper place

BITS Pilani, Pilani Campus


Unifunction and multifunction pipelining
• Unifunction
Input Input
– Pipeline with
a fixed and
dedicated S1 S1
function
– Ex: Floating
point adder
S2 S2
• Multifunction
– Pipeline may Output B
perform
different S3 S3

functions
Output Output A

BITS Pilani, Pilani Campus


Reservation Table

• Is a two dimensional chart


• Used to show how successive pipeline stages are
utilized or reserved

BITS Pilani, Pilani Campus


Uni-function Vs Multifunction

Input Input Reservation Table A


T0 T1 T2
S1 Reservation Table A
S1 S1 A
T0 T1 T2 S2 A
S1 A S3 A

S2
S2 A
S2
S3 A Reservation Table B
Output B
T0 T1
S1 B
S3 S3
S2 B

Output A Output A

BITS Pilani, Pilani Campus


Linear and Nonlinear Pipelines
• Linear Pipeline: Without feed forward and feed back connection
• Nonlinear Pipeline with feed forward and/or feed back connection
Input Input

S1 S1

S2
S2

S3
S3

Output A
Output A BITS Pilani, Pilani Campus
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3

Reservation Table
Time 

0 1 2 3 4 5 6 7
Stage 

S1 X X X X
S2 X X
S3 X X
Static and Dynamic pipelining

• Dynamic pipeline allows more frequent changes in its


configuration
• Require more elaborate sequencing and control
mechanisms

BITS Pilani, Pilani Campus


Scalar and Vector pipelining
• Based on the operand types or instruction type
• Scalar pipeline processes scalar operands
• Vector pipeline operate on vector data and
instructions.

BITS Pilani, Pilani Campus


Important Terms

L S1 L S2 L L Sk L

Clock
Fig: Structure of a pipeline

BITS Pilani, Pilani Campus


Important Terms
Clock period: t
ti: time delay of Si stage
tL: time delay of latch

t = max{ti} + tL
Pipeline processor frequency f = 1/ t

BITS Pilani, Pilani Campus


Important Terms
Time taken to complete n tasks
by k stage pipeline is
Tk = [k +(n-1)]t
Time taken by the nonpipelined
processor
?

BITS Pilani, Pilani Campus


Important Terms
Time taken to complete n tasks
by k stage pipeline is
Tk = [k +(n-1)]t
Time taken by the nonpipelined
processor
T1 = k* n*t

BITS Pilani, Pilani Campus


Important Terms
• Speedup: speedup of a k-stage linear-pipeline over an
equivalent nonpipelined processor

• The maximum speedup is Sk k when n INF


• Maximum speedup is very difficult to achieve because of
data dependencies between successive tasks, program
branches, interrupts etc.
BITS Pilani, Pilani Campus
Important Terms
Efficiency: the ratio of actual speedup to ideal speedup

• Maximum efficiency
  1 as n  
• Implies that the larger the number of tasks flowing through
the pipeline, the better is its efficiency
• In steady state of a pipeline, we have n >> k, then efficiency
should approach 1
• However, this ideal case may not hold all the time because of
program branches and interrupts and data dependencies
BITS Pilani, Pilani Campus
Important Terms

Throughput : The number of tasks that can be completed by a


pipeline per unit time

n nf
Hk    ηf
k  n  1τ k  n  1

BITS Pilani, Pilani Campus


Problem 1
Draw a space-time diagram for a six segment pipeline
showing the time it takes to process eight tasks
Determine the number of clock cycles that it takes to
process 200 tasks in a six segment pipeline

1 2 3 4 5 6 7 8 9 10 11 12 13
T1
S1 T1
T1 T2
T2 T3
T3
T3 T4
T4
T4 T5
T5
T5 T6
T6
T6 T7
T7
T7 T8
T8
S2 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4
T4 T5
T5
T5 T6
T6
T6 T7
T7 T8
T8

S3 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4
T4 T5
T5
T5 T6
T6 T7
T7 T8
T8

S4 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4
T4 T5
T5 T6
T6 T7
T7 T8
T8

S5 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4 T5
T5 T6
T6 T7
T7 T8
T8

S6 T1
T1
T1 T2
T2
T2 T3
T3 T4
T4 T5
T5 T6
T6 T7
T7 18
BITS Pilani, Pilani Campus
Alternate Method
1 2 3 4 5 6 7 8 9 10 11 12 13
T1 S1 S2 S3 S4 S5 S6
T2 S1 S2 S3 S4 S5 S6
T3 S1 S2 S3 S4 S5 S6
T4 S1 S2 S3 S4 S5 S6
T5 S1 S2 S3 S4 S5 S6
T6 S1 S2 S3 S4 S5 S6
T7 S1 S2 S3 S4 S5 S6
T8 S1 S2 S3 S4 S5 S6

Determine the number of clock cycles that it takes to process


200 tasks in a six segment pipeline

BITS Pilani, Pilani Campus


Problem 2
Assume each task is subdivided in to 6 subtasks and
clock cycle is 10 microseconds.
– Determine the number of clock cycles that is
taken to process 50 tasks.
– Determine the number of clock cycles that is
taken to process 50 tasks in non- pipeline
processor
– Compute speed up, efficiency and throughput

BITS Pilani, Pilani Campus


Pipeline Hazards / Conflicts

• Resource Hazard: access to same resource by two


segments at the same time

• Data Hazard : an instruction depends on the result of


a previous instruction, but this result is not yet
available

• Control Hazard: arise from branch and other


instructions that change the value of PC

BITS Pilani, Pilani Campus


Resource Hazard

BITS Pilani, Pilani Campus


Problem

Consider the following code. Assume that initial


contents of all the registers is zero.

START: MOV R4, #1


MOV R3, #2
MOV R1, #2
MOV R2, #3
ADD R3, R1, R2
SUB R4, R3, R2
• Write timing of instruction pipeline with FIVE stages
• What is the content of various registers after the
execution of program?

BITS Pilani, Pilani Campus


START: MOV R4, #1
Data Dependency MOV R3, #2
Conflict MOV R1, #2
MOV R2, #3
ADD R3, R1, R2
SUB R4, R3, R2

time
1 2 3 4 5 6 7 8 9 10
I1 FI DI FO EI WO

I2 FI DI FO EI WO

I3 FI DI FO EI WO

I4 FI DI FO EI WO

I5 FI DI FO EI WO

I6 FI DI FO EI WO

BITS Pilani, Pilani Campus


Example

START: ADD R3, R1, R2


SUB R4, R1, R2
JNZ NEXT
MUL R5, R1, R2
ADD R6, R3, R4
:
NEXT: ST R3, LOCN1
HLT
Write timing of instruction pipeline
BITS Pilani, Pilani Campus
The Effect of a Conditional Branch
on Instruction Pipeline Operation

BITS Pilani, Pilani Campus


Dealing with Branches
Multiple Streams
Prefetch Branch Target
Loop buffer
Branch prediction
Delayed branching

BITS Pilani, Pilani Campus


Multiple Streams
• Have two pipelines
• Pre-fetch each branch into a separate pipeline
• Use appropriate pipeline
• Issues :
• Leads to bus & register contention
• Multiple branches lead to further pipelines being
needed

BITS Pilani, Pilani Campus


Prefetch Branch Target
• Target of branch is prefetched in addition to
instructions following branch
• Keep target until branch is executed
• Used by IBM 360/91

BITS Pilani, Pilani Campus


Loop Buffer
• Very fast memory
• Maintained by fetch
stage of pipeline
• Check buffer before
fetching from memory
• Very good for small
loops or jumps
• Working principle is
similar to cache
• Used by CRAY-1

BITS Pilani, Pilani Campus


Branch Prediction (1)
• Predict never taken
• Predict always taken
• Predict by opcode
• Taken/not taken switch
• Branch history table

BITS Pilani, Pilani Campus


Branch Prediction (2)
• Predict never taken
– Assume that jump will not happen
– Always fetch next instruction
• Predict always taken
– Assume that jump will happen
– Always fetch target instruction
• Predict by Opcode
– Some instructions are more likely to result in a
jump than others
– Can get up to 75% success

BITS Pilani, Pilani Campus


Branch Prediction (3)
• Taken/Not taken switch
– Based on previous
history
– Good for loops

BITS Pilani, Pilani Campus


State Diagram

BITS Pilani, Pilani Campus


Branch Prediction (4)

• Delayed Branch
ADD R3, R1, R2
– Do not take jump
SUB R1, R2, R3
until you have to
ADD R3, R1, R2
– Rearrange SUB R4, R1, R2
instructions JNZ NEXT
MUL R5, R1, R2
ADD R6, R3, R4
:
NEXT: ST R3, LOCN1
HLT

BITS Pilani, Pilani Campus


ADD R3, R1, R2 JNZ NEXT
SUB R1, R2, R3 ADD R3, R1, R2
ADD R3, R1, R2 SUB R1, R2, R3
SUB R4, R1, R2 ADD R3, R1, R2
JNZ NEXT SUB R4, R1, R2
MUL R5, R1, R2 MUL R5, R1, R2
ADD R6, R3, R4 ADD R6, R3, R4
: :
NEXT: ST R3, LOCN1 NEXT: ST R3, LOCN1
HLT HLT

BITS Pilani, Pilani Campus


Problem 3
Consider a computer system that requires four stages to execute an
instruction. The details of the stages and their time requirements are
listed below:
Instruction Fetch (IF) stage: 30 ns,
Instruction Decode (ID) stage: 9 ns
Execute (EX) stage: 20 ns
Store Results (WO stage): 10 ns.
Assume that inter-stage delay is 1 ns and every instruction in the program
must proceed through the stages in sequence.
a) What is the frequency of the Processor?
b) What is the minimum time for 10 instructions to complete in non-pipelined
manner?
c) What is the minimum time for 10 instructions to complete in pipelined
manner?

BITS Pilani, Pilani Campus

Вам также может понравиться