Академический Документы
Профессиональный Документы
Культура Документы
SEM SEM
IV COSS II
SEM
III
Hardware Data
Software
Reference Books:
(R1) Patterson, David A & J L Hennenssy, Computer Organization and
Design – The Hardware/Software Interface, Elsevier, 5th Ed., 2014.
(R2) Randal E. Bryant, David R. O’Hallaron, Computer Systems – A
Programmer’s Perspective, Pearson, 3rd Ed, 2016.
(R3)Tanenbaum, Modern Operating Systems: Pearson New International
Edition, Pearson Education, 2013 (Pearson Online)
(R4)Stallings, Operating Systems: Internals and Design Principles :
International Edition, Pearson Education, 2013 (Pearson Online)
4
5/18/2020
BITS Pilani, Pilani Campus
Evaluation Scheme
5 unit course.
Sl Evaluation Duration Weightage % Nature of
No. Component Component
1 Mid Sem Exam 90 min 30% CB
2 Comprehensive 180 min 40% OB
Examination
3 Quiz ---- 5% OB
4 Assignments --- 25% OB
5
BITS Pilani, Pilani Campus
Assignments
• Two assignments:
• One pre-midsem exam : 12%
• One post-midsem : 13%
• Lab based
• Simulator to be used : CPU-OS simulator
• Open source tool
https://drive.google.com/open?id=12YUK52RQ-
JhP0ddj6CD_oifW4sTMbsBl
• Virtual lab (Platify)
• Is a complex system
• Is a programmable device
10
• Hardware
• Central Processing Unit (CPU)
• Memory
• I/O devices
• Software
• System Software
• System Management Software
• Tools and Utilities for Developing the software
• Application Software
• General Purpose Software
• Specific Purposed Software
• Processor - memory
– Data transfer between CPU and main memory
• Processor - I/O
– Data transfer between CPU and I/O module
• Data processing
– Some arithmetic or logical operation on data
• Control
– Alteration of sequence of operations
– e.g. jump
• Combination of above Break!
• Classes interrupts:
• Program
• e.g. overflow, division by zero
• Timer
• Generated by internal processor timer
• Used in pre-emptive multi-tasking
• I/O
• from I/O controller
• Hardware failure
• e.g. memory parity error
Reg
A
X1
B
X2
C
• Process Management
• Memory Management
• Storage Management
• Protection and security
• Is a sequence of bytes
• I/O device such as disks, keyboards, displays, and even
networks, is modeled as a file
• Reading and writing requires set of system calls
Last Class…
Contact List of Topic Title Text/Ref
Hour Book/external
resource
Performance Assessment Class Slides
3 MIPS Rate
Amdahl’s Law
Memory Organization T1, R2
4 Storage Technologies
Random Access Memory
Disk Storage
Solid State Disks
Storage Technology Trends
Locality
Locality of Reference to Program Data
Locality of instruction fetches
Memory Hierarchy
• Key features
– RAM is traditionally packaged as a
chip.
– Basic storage unit is normally a cell
(one bit per cell).
– Multiple RAM chips form a
memory.
• RAM comes in two varieties:
– SRAM (Static RAM)
– DRAM (Dynamic RAM)
• SRAM and DRAM are volatile
memories
– Lose information if powered off. BITS Pilani, Pilani Campus
SRAM vs DRAM Summary
Main memory
I/O bridge 0
A
Bus interface x A
ALU
R4
Main
memory
I/O bridge x 0
Bus interface x A
Register file
ALU
R4 x
Main memory
I/O bridge 0
Bus interface x A
ALU
R4 y
Main memory
I/O bridge 0
A
Bus interface A
Main memory reads data word y from the bus and stores
it at address A.
ALU
R4 y
main memory
I/O bridge 0
Bus interface y A
Actuator
Electronics
(including a
processor
and memory!)
SCSI
connector
Surface 0
• Disks consist of Surface 1
Platter 0
of concentric rings
called tracks Tracks
Spindle
Surface
• Aligned tracks form a Track k Gaps
cylinder
• Each track consists of Spindle
sectors separated by
gaps.
Sectors
spindle
spindle
spindle
spindle
Read/write heads
move in unison
from cylinder to cylinder
Arm
Spindle
After BLUE
read
After BLUE
read
After BLUE Seek for RED Rotational latency After RED read
read
After BLUE Seek for RED Rotational latency After RED read
read
Register file
ALU
System bus Memory bus
I/O Main
Bus interface
bridge memory
Main
Bus interface
memory
I/O bus
Main
Bus interface
memory
I/O bus
Main
Bus interface
memory
I/O bus
Sequential read tput* 550 MB/s Sequential write tput 470 MB/s
Random read tput 365 MB/s Random write tput 303 MB/s
Avg seq read time 50 us Avg seq write time 60 us
Tput Throughput
• Sequential access faster than random access
– Common theme in the memory hierarchy
• Random writes are somewhat slower
– Erasing a block takes a long time (~1 ms)
– Modifying a block page requires all other pages to be
copied to new block
– In earlier SSDs, the read/write gap was much larger.
Source: Intel SSD 730 product specification.
BITS Pilani, Pilani Campus
SSD Tradeoffs vs Rotating Disks
• Advantages
– No moving parts faster, less power, more rugged
• Disadvantages
– Have the potential to wear out
• Mitigated by “wear leveling logic” in flash translation
layer
• E.g. Intel SSD 730 guarantees 128 petabyte (128 x
1015 bytes) of writes before they wear out
– In 2015, about 30 times more expensive per byte
• Applications
– MP3 players, smart phones, laptops
– Beginning to appear in desktops and servers
BITS Pilani, Pilani Campus
The Bottom Line
• How much?
– Capacity
• How fast?
– Time is money
• How expensive?
• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store
Locality of Reference
During the course of the execution of a
program, memory references tend to
cluster
• Temporal locality: Locality in time
• If an item is referenced, it will tend
to be referenced again soon
• Spatial locality: Locality in space
• If an item is referenced, items
whose addresses are close by will
tend to be referenced soon.
Data :
• Access array elements in succession –
spatial locality
• Reference to “product” in each iteration
– Temporal locality
Instructions :
• Reference instructions in sequence :
Spatial locality
• Looping through : Temporal locality BITS Pilani, Pilani Campus
Performance enhancement -
Motivation
Sequential read tput* 550 MB/s Sequential write tput 470 MB/s
Random read tput 365 MB/s Random write tput 303 MB/s
Avg seq read time 50 us Avg seq write time 60 us
Tput Throughput
• Sequential access faster than random access
– Common theme in the memory hierarchy
• Disadvantages
– Have the potential to wear out
• Applications
– MP3 players, smart phones, laptops
– Beginning to appear in desktops and servers
• How much?
– Capacity
• How fast?
– Time is money
• How expensive?
• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store
Locality of Reference
During the course of the execution of a
program, memory references tend to
cluster
• Temporal locality: Locality in time
• If an item is referenced, it will tend
to be referenced again soon
• Spatial locality: Locality in space
• If an item is referenced, items
whose addresses are close by will
tend to be referenced soon.
Data :
• Access array elements in succession –
spatial locality
• Reference to “product” in each iteration
– Temporal locality
Instructions :
• Reference instructions in sequence :
Spatial locality
• Looping through : Temporal locality BITS Pilani, Pilani Campus
Performance enhancement -
Motivation
i = j modulo m
1 1 2
• Simple
• Inexpensive
• Fixed location for given block
– If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very high
Find out
a) Number of bits required to address the main memory
Find out
d) Number of bits required to identify a word (byte) in a
block?
a) 20 bits
Ref 0 4 0 2 1 8 0 1 2 3 0 4
time 0 1 2 3 4 5 6 7 8 9 10 11
L0
L1
L2
L3
H/M
Ref 0 4 0 2 1 8 0 1 2 3 0 4
L0
L1
L2
L3
H/M
Ref 0 4 0 2 1 8 0 1 2 3 0 4
time 0 1 2 3 4 5 6 7 8 9 10 11
L0
L1
L2
L3
H/M
N=8
Address 0 1 2 3 4 5 6 7
Contents V[0] V[1] V[2] V[3] V[4] V[5] V[6] V[7]
Access Order 1 2 3 4 5 6 7 8
BITS Pilani, Pilani Campus
Byte Addressable
Stride k – memory and word
reference pattern
length is 1 byte
i Address i Address
0 0000 0 0000
Stride 1
1 0001 1 0001 Stride 2
2 0002 2 0002
3 0003 3 0003
4 0004 4 0004
Address difference
5 0005 5 0005
Stride=----------------------
6 0006 Word Length 6 0006
7 0007 7 0007
8 0008 8 0008
9 0009 9 0009
10 000A 10 000A
11 000B 11 000B
12 000C 12 000C
BITS Pilani, Pilani Campus
Stride k – reference pattern
Byte Addressable
memory and word
length is 2 bytes
i Address i Address
0 0000 0 0000
Stride 1
1 0002 1 0002 Stride 2
2 0004 2 0004
3 0006 3 0006
4 0008 4 0008
Address difference
5 000A 5 000A
Stride=----------------------
6 000C Word Length 6 000C
7 000E 7 000E
8 0010 8 0010
9 0012 9 0012
10 0014 10 0014
11 0016 11 0016
12 0018 12 0018
BITS Pilani, Pilani Campus
Revisiting Locality of reference
M = 2, N=3
Address 0 1 2 3 4 5
Contents A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2]
Access Order 1 2 3 4 5 6
M = 2, N=3
Address 0 1 2 3 4 5
Contents A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2]
Access Order 1 3 5 2 4 6
i=0 W10
W11
i=1
W12
i=2 W13
W14
I=3
W15
BITS Pilani, Pilani Campus
Example 2:
i=0 W10
W11
i=1
W12
i=2 W13
W14
I=3
W15
BITS Pilani, Pilani Campus
Home Work – Which one is
better ?
Program 1: Program 2:
for (int i = 0; i < n; i++) { for (int i = 0; i < n; i++) {
z[i] = x[i] – y[i]; z[i] = x[i] – y[i];
z[i] = z[i] * z[i]; }
} for (int i = 0; i < n; i++) {
z[i] = z[i] * z[i];
}
• 3 addresses
– Operand 1, Operand 2, Result
– c = a + b; add a, b, c
– May be a forth - next instruction (usually implicit)
– Needs very long words to hold everything
• 2 addresses
– One address doubles as operand and result
– a = a + b : add a, b
– Reduces length of instruction
– The original value of a is lost.
• Use:
• to construct a multiway branch table
32 bit registers EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP
16 bit registers AX, BX, CX, DX, SI, DI, SP, BP),
8 bit registers AH, AL, BH, BL, CH, CL, DH , DL
Segment registers CS, DS, ES, SS, FS, GS
2
BITS Pilani, Pilani Campus
MIPS Register Set
Types of Registers
• 32 general-purpose registers ($0 – $31)
• Program Counter (PC)
• Two special Purpose register (HI and LO)
– Used to hold the results of integer multiply and
divide instruction.
– integer multiply operation, HI and LO register hold
the 64-bit result.
– integer divide operation, the 32-bit quotient is
stored in the LO and the remainder in the HI
register.
3
BITS Pilani, Pilani Campus
General-Purpose Register >>
5
BITS Pilani, Pilani Campus
Instruction Format
Fixed-length instruction format
32-bits long
Three different instruction formats:
1. Immediate (I-type)
2. Jump (J-type)
3. Register (R-type)
Meaning of various fields in the instruction format
– op: Opcode
– rs: The first register source operand
– rt: The second register source operand
– rd: The register destination operand
– shamt / sa: shift amount
– funct: function code 6
BITS Pilani, Pilani Campus
Instruction Format
Register File
R-Type:
I-Type:
J-Type:
7
BITS Pilani, Pilani Campus
Addressing modes
8
BITS Pilani, Pilani Campus
Addressing modes
9
BITS Pilani, Pilani Campus
MIPS Instruction Set
10
BITS Pilani, Pilani Campus
A Basic MIPS Implementation
s2 s1
s2 s1
j : jump unconditional
– format : j label
Summary
• lw $s1, 100($s2) • or $s1, $s2, $s3
Examples:
add, sub, and, or, slt rs, rt, rd
lw $s1, 100($s2)
sw $s1, 100($s2)
lw $s1, 100($s2)
s2 s1
sw $s1, 100($s2)
s2 s1
R-Type I-Type
37
BITS Pilani, Pilani Campus
Datapath
ALUop Instructions
00 lw, sw
01 Beq
10 add, sub, and, or, slt
Opcode
rs RegWrite
rt MemWrite
ALUSrc MemtoReg
rd
RegDst
Offset
MemRead
Function ALUop
6/9/2020 10:38
Lucy
PMJ. Gudino BITS Pilani, Pilani Campus
7
Home Work:
Execution steps:beq $s1,$s2, offset
1. The instruction is fetched, and the PC is
incremented
2. Two registers $s1 and $s2 are read from the
register file
3. The ALU performs a subtract on the data values
read from the register file. The value of PC + 4 is
added to the sign extended, lower 16 bits of the
instruction (offset) sifted left by two; the result is
the branch target address
4. The zero result from the ALU is used to decide
which adder result to store into the PC
• IR Instruction Register
• MDR Memory Data Register
• A,B used to hold the register operand values read
from the register file
• ALUout holds the output of the ALU
rs
rt
rd
1: Write
Memory
00: B register
01: 4
10: sign extd input
0:from ALUout
11: Sign extd input <<2
1: from MDR BITS Pilani, Pilani Campus
Complete Data
1: PC gets
Path (3/3)
updated
Back
BITS Pilani, Pilani Campus
First Step FIG
00: register
00 - ALUSrcB 01: 4
10:Sign Extd lower 16 bits of IR
11 :Sign Extd lower 16 bits of IR << 2
10 - ALUop
00: Add operation
01: Subtract Operation
10:Function field of the instruction
determines the ALU operation
BITS Pilani, Pilani Campus
Step 3: Contd… 0: First operand is PC
1: First operand is register Fig
00: register
3. Branch 01: 4
if ( A == B ) PC ALUout
10:Sign Extd lower 16 bits of IR
1 - ALUSrcA 11 :Sign Extd lower 16 bits of IR << 2
4. Jump
PC { PC[31:28], (IR [25:0] << 2)
10 - PCSource
Memory reference
lw
MDR Memory [ALUout]
1 MemRead
1 IorD
sw
Memory [ALUout] B
1 MemWrite
IorD
1
RegWrite 1
RegDst 0
OR
L S1 L S2 L L Sk L
Clock
Numerical Example:
X = 0.9504*103
Y = 0.8200*102
BITS Pilani, Pilani Campus
Instruction pipelining
• The execution of a stream of instructions can be pipelined
by overlapping the execution of the current instruction with
the fetch, decode……of subsequent instructions
• Sequence of steps followed in most general purpose
computer to process instruction
1. IF: Fetch the instruction from memory
2. ID: Decode the instruction
3. CO: Calculate the effective address
4. FO: Fetch the operands from memory
5. EI: Execute the instruction
6. WO: Store the result in the proper place
functions
Output Output A
S2
S2 A
S2
S3 A Reservation Table B
Output B
T0 T1
S1 B
S3 S3
S2 B
Output A Output A
S1 S1
S2
S2
S3
S3
Output A
Output A BITS Pilani, Pilani Campus
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Reservation Table
Machine X
input output
S1 S2 S3
Reservation Table
Time
0 1 2 3 4 5 6 7
Stage
S1 X X X X
S2 X X
S3 X X
Static and Dynamic pipelining
L S1 L S2 L L Sk L
Clock
Fig: Structure of a pipeline
t = max{ti} + tL
Pipeline processor frequency f = 1/ t
• Maximum efficiency
1 as n
• Implies that the larger the number of tasks flowing through
the pipeline, the better is its efficiency
• In steady state of a pipeline, we have n >> k, then efficiency
should approach 1
• However, this ideal case may not hold all the time because of
program branches and interrupts and data dependencies
BITS Pilani, Pilani Campus
Important Terms
n nf
Hk ηf
k n 1τ k n 1
1 2 3 4 5 6 7 8 9 10 11 12 13
T1
S1 T1
T1 T2
T2 T3
T3
T3 T4
T4
T4 T5
T5
T5 T6
T6
T6 T7
T7
T7 T8
T8
S2 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4
T4 T5
T5
T5 T6
T6
T6 T7
T7 T8
T8
S3 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4
T4 T5
T5
T5 T6
T6 T7
T7 T8
T8
S4 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4
T4 T5
T5 T6
T6 T7
T7 T8
T8
S5 T1
T1
T1 T2
T2
T2 T3
T3
T3 T4
T4 T5
T5 T6
T6 T7
T7 T8
T8
S6 T1
T1
T1 T2
T2
T2 T3
T3 T4
T4 T5
T5 T6
T6 T7
T7 18
BITS Pilani, Pilani Campus
Alternate Method
1 2 3 4 5 6 7 8 9 10 11 12 13
T1 S1 S2 S3 S4 S5 S6
T2 S1 S2 S3 S4 S5 S6
T3 S1 S2 S3 S4 S5 S6
T4 S1 S2 S3 S4 S5 S6
T5 S1 S2 S3 S4 S5 S6
T6 S1 S2 S3 S4 S5 S6
T7 S1 S2 S3 S4 S5 S6
T8 S1 S2 S3 S4 S5 S6
time
1 2 3 4 5 6 7 8 9 10
I1 FI DI FO EI WO
I2 FI DI FO EI WO
I3 FI DI FO EI WO
I4 FI DI FO EI WO
I5 FI DI FO EI WO
I6 FI DI FO EI WO
• Delayed Branch
ADD R3, R1, R2
– Do not take jump
SUB R1, R2, R3
until you have to
ADD R3, R1, R2
– Rearrange SUB R4, R1, R2
instructions JNZ NEXT
MUL R5, R1, R2
ADD R6, R3, R4
:
NEXT: ST R3, LOCN1
HLT