
ANAND INSTITUTE OF HIGHER TECHNOLOGY Chennai-603 103

DEPARTMENT OF ELECTRONICS AND INSTRUMENTATION ENGINEERING


CS2071 COMPUTER ARCHITECTURE
Faculty: C.MAGESHKUMAR

Class: IV EIE A & B    Semester: VII

UNIT IV MEMORY SYSTEM


CONTENTS

I.   Review of digital design
     1. Signals, logic operators and gates
     2. Gates as control elements
     3. Combinational circuits
     4. Programmable combinational parts
     5. Sequential circuits

II.  Main memory concepts
     1. Memory definition
     2. Memory hierarchy
     3. Memory performance parameters
     4. Memory structure and memory cycle, memory chip organization
     5. Hitting the memory wall, pipelined memory and interleaved memory

III. Types of memory
     1. Types
     2. Static RAM
     3. Dynamic RAM
     4. Other types

IV.  Cache memory organization
     1. Cache memory & need for cache
     2. Basic cache terms, design parameters of cache memory
     3. What makes the cache work?
     4. Cache organization (mapping)
     5. Cache performance measures
     6. Cache and main memory
     7. Cache coherency

V.   Secondary storage (mass memory concepts)

VI.  Virtual memory and paging

CMageshKumar_AP_AIHT

CS2071_Computer Architecture

REVIEW OF DIGITAL DESIGN


1. SIGNALS, LOGIC OPERATORS AND GATES:
Signals:
All information elements in digital computers, including instructions, numbers, and symbols, are encoded as electronic signals that are almost always two-valued (binary).
Binary signals can be represented by the presence or absence of some electrical property such as voltage, current, field, or charge.
Signals: 1. Analog signal (continuous signal); 2. Digital signal (binary signal).
The two binary values can be interpreted as: 0/1, off/on, False/True, Low/High, Negative/Positive.
Circuits:
Combinational digital circuits (memoryless circuits), e.g. multiplexers, decoders, encoders.
Sequential digital circuits (circuits with memory), e.g. latches, flip-flops, registers.
Logic operators:

Name | Operator sign (alternates) | Output is 1 iff:         | Arithmetic expression
NOT  | x′ (or x̄)                  | the input is 0           | 1 − x
AND  | x·y (or x∧y)               | both inputs are 1        | x·y
OR   | x∨y (or x+y)               | at least one input is 1  | x + y − x·y
XOR  | x⊕y                        | the inputs are not equal | x + y − 2x·y

Each operator also has a graphical gate symbol (not reproduced here).
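The arithmetic expressions in the table (1 − x, x·y, x + y − x·y, x + y − 2x·y) can be checked exhaustively against the Boolean definitions; a minimal sketch:

```python
# Verify that the arithmetic forms of the basic logic operators agree
# with their Boolean definitions over all 0/1 input combinations.

def NOT(x):    return 1 - x
def AND(x, y): return x * y
def OR(x, y):  return x + y - x * y
def XOR(x, y): return x + y - 2 * x * y

for x in (0, 1):
    assert NOT(x) == (0 if x else 1)
    for y in (0, 1):
        assert AND(x, y) == (x and y)   # both inputs 1
        assert OR(x, y)  == (x or y)    # at least one input 1
        assert XOR(x, y) == (x != y)    # inputs not equal
```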

Variations in Gate Symbols:
Gates may have more than two inputs and/or inverted signals (bubbles) at input or output. (Figure: symbols for AND, OR, NAND, NOR, and XNOR gates.)

2. GATES AS CONTROL ELEMENTS:

Tristate buffer:
A buffer whose output equals the data input when the enable (control) signal e is asserted, and assumes a high-impedance state when e is de-asserted.
Used to effectively isolate the output from the input.
An AND gate and a tristate buffer act as controlled switches or valves. An inverting buffer is logically the same as a NOT gate.


(Figure: with enable/pass signal e and data input x: (a) AND gate for controlled transfer, data out = x or 0; (b) tristate buffer, data out = x or high impedance; (c) model for the AND switch; (d) model for the tristate buffer.)
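The two controlled switches behave differently when disabled; a small behavioral sketch (an illustrative model, not the textbook's notation, with None standing in for high impedance):

```python
# An AND gate and a tristate buffer as controlled switches.
# The AND gate forces 0 when disabled; the tristate buffer
# disconnects, modeled here as None (high impedance).

def and_switch(e, x):
    return e & x             # e=0 -> output 0; e=1 -> output x

def tristate(e, x):
    return x if e else None  # None stands for high impedance

assert and_switch(0, 1) == 0      # disabled AND drives 0
assert and_switch(1, 1) == 1
assert tristate(0, 1) is None     # disabled tristate floats
assert tristate(1, 1) == 1
```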

Wired OR and Bus Connections:


Wired OR allows tying together of several controlled signals.
(Figure: with enable signals ex, ey, ez gating inputs x, y, z: (a) wired OR of product terms, data out = x, y, z, or 0; (b) wired OR of tristate outputs, data out = x, y, z, or high impedance.)

Control/Data Signals and Signal Bundles:
Arrays of logic gates are represented by a single gate symbol, with a slash and a count marking each signal bundle. (Figure: (a) 8 NOR gates with an Enable input; (b) 32 AND gates with a Compl input; (c) k XOR gates.)
Designing Gate Networks
AND-OR, NAND-NAND, OR-AND, and NOR-NOR are common two-level forms.
Logic optimization trades off cost, speed, and power dissipation.
(Figure: a two-level circuit over inputs x, y, z in three equivalent forms: (a) AND-OR circuit; (b) intermediate circuit; (c) NAND-NAND equivalent.)

BCD-to-Seven-Segment Decoder:
The logic circuit that generates the enable signal for the lowermost segment (segment 3) in a seven-segment display unit takes a 4-bit input x3 x2 x1 x0 in [0, 9] and produces signals e0 through e6 to enable (turn on) the segments. (Figure: segment numbering of the seven-segment display and the decoding circuit.)
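The decoder's behavior can be sketched with a lookup table. This assumes the common convention that the bottom segment is lit for the digits 0, 2, 3, 5, 6, and 8 (whether 9 lights it varies by display, so 9 is excluded here); the segment numbering here may differ from the figure's:

```python
# Sketch: enable signal for the bottom segment of a seven-segment
# display, from a 4-bit BCD input x3 x2 x1 x0.
# Assumption: bottom segment lit for 0, 2, 3, 5, 6, 8.
ON_DIGITS = {0, 2, 3, 5, 6, 8}

def e_bottom(x3, x2, x1, x0):
    d = 8 * x3 + 4 * x2 + 2 * x1 + x0   # decode the BCD input
    return 1 if d in ON_DIGITS else 0

assert e_bottom(0, 0, 0, 0) == 1   # digit 0: segment on
assert e_bottom(0, 0, 0, 1) == 0   # digit 1: segment off
assert e_bottom(1, 0, 0, 0) == 1   # digit 8: segment on
```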


3. COMBINATIONAL (MEMORYLESS) CIRCUITS: (COMBINATIONAL PARTS)

High-level building blocks, much like prefab parts used in building a house.
Arithmetic components: adders, multipliers, ALUs.
Examples of combinational parts:
- multiplexers,
- decoders/demultiplexers,
- encoders.
Multiplexer (MUX) (many to one):
A multiplexer (mux), or selector, allows one of several inputs to be selected and routed to the output, depending on the binary value of a set of selection or address signals provided to it.
Decoders/Demultiplexers:
A decoder allows the selection of one of 2^a options using an a-bit address as input. A demultiplexer (demux) is a decoder that only selects an output if its enable signal is asserted.
(Figures: (a) 2-to-1 mux with inputs x0, x1, select y, and output z; (b) switch view; (c) mux symbol; (d) array of 32-bit muxes; (e) 4-to-1 mux with enable, and a 4-to-1 mux design built from 2-to-1 muxes with select lines y1 y0; 2-to-4 decoder, decoder symbol, and demultiplexer, i.e. a decoder with enable e.)
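The selection behavior of these parts can be sketched behaviorally (illustrative models matching the descriptions above, not gate-level designs):

```python
# Behavioral models of a 4-to-1 mux and a 2-to-4 decoder with enable
# (i.e. a demultiplexer).

def mux4(inputs, y1, y0):
    """Route one of four inputs to the output, per address bits y1 y0."""
    return inputs[2 * y1 + y0]

def decoder2to4(y1, y0, enable=1):
    """Assert exactly one of four outputs when enabled; none otherwise."""
    outs = [0, 0, 0, 0]
    if enable:
        outs[2 * y1 + y0] = 1
    return outs

assert mux4(['a', 'b', 'c', 'd'], 1, 0) == 'c'      # address 10 -> input 2
assert decoder2to4(0, 1) == [0, 1, 0, 0]            # output 1 asserted
assert decoder2to4(0, 1, enable=0) == [0, 0, 0, 0]  # disabled: no output
```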

4. PROGRAMMABLE COMBINATIONAL PARTS

A programmable combinational part can do the job of many gates or gate networks, avoiding the need for a large number of small-scale integrated circuits to implement a Boolean function of several variables.
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses).
Programmable ROM (PROM): programmable connections and their use in a PROM are shown below.
(Figure: (a) programmable OR gates fed from a decoder over the inputs; (b) logic equivalent of part a; (c) programmable read-only memory (PROM).)

Programmable array logic (PAL): the OR array has fixed connections, but the inputs to the AND gates can be programmed.
Programmable logic array (PLA): both the AND and OR arrays are programmable.
The general structure of programmable combinational logic and the two classes, PAL and PLA, are shown below. Not shown is the PROM, which has a fixed AND array (a decoder) and a programmable OR array.

(Figure: inputs x0–x3 feed: (a) general programmable combinational logic, with an AND array (AND plane) driving an OR array (OR plane) that produces the outputs; (b) PAL: programmable AND array with 8-input ANDs, fixed OR array; (c) PLA: programmable AND and OR arrays, with 6-input ANDs and 4-input ORs.)

Timing and Circuit Considerations:

Changes in a gate's or circuit's output, triggered by changes in its inputs, are not instantaneous.
Gate delay (δ): typically a fraction of a nanosecond; the time the gate takes to produce its output after its inputs change.
Wire delay, previously negligible, is now important (electronic signals travel about 15 cm per ns).
Circuit simulation is used to verify both function and timing.
CMOS Transmission Gates:
A CMOS transmission gate consists of parallel N and P pass transistors; two transmission gates can be used to build a 2-to-1 mux. (Figure: (a) CMOS transmission gate, circuit and symbol; (b) two-input mux with inputs x0, x1 and output z built of two transmission gates.)

5. SEQUENTIAL CIRCUITS (WITH MEMORY)

(NOTE: Please refer to pages 28–34, Chapter 2 in B. Parhami, Computer Architecture, for a detailed description.)

A programmable sequential part contains gates and memory elements.
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses).
Designing sequential circuits, which exhibit memory, requires storage elements capable of holding a single bit of information that can be set to 1 or reset to 0.
Programmable array logic (PAL) and field-programmable gate arrays (FPGA): both types contain macrocells and interconnects.
Latches, Flip-Flops, and Registers
Latches, Flip-Flops, and Registers
(Figure: (a) SR latch; (b) D latch with inputs D, C and outputs Q, Q̄; (c) master-slave D flip-flop; (d) D flip-flop symbol; (e) k-bit register.)
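The essential difference between the level-sensitive D latch and the edge-triggered D flip-flop in the figure can be sketched behaviorally (an illustrative model, not a circuit-level one):

```python
# A D latch is transparent while the clock C is high; a D flip-flop
# captures its input only on the rising edge of C.

class DLatch:
    def __init__(self):
        self.q = 0
    def step(self, d, c):
        if c:                  # transparent while clock is high
            self.q = d
        return self.q

class DFlipFlop:
    def __init__(self):
        self.q = 0
        self._prev_c = 0
    def step(self, d, c):
        if c and not self._prev_c:   # capture only on a rising edge
            self.q = d
        self._prev_c = c
        return self.q

latch, ff = DLatch(), DFlipFlop()
assert latch.step(1, 1) == 1   # follows D while C=1
assert latch.step(0, 1) == 0
assert ff.step(1, 1) == 1      # rising edge: captures D
assert ff.step(0, 1) == 1      # C stays high: holds old value
```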


Sequential Machine Implementation (hardware realization of Moore and Mealy sequential machines):
(Figure: m inputs feed the next-state logic, whose l next-state excitation signals drive the n-bit state register; the present state feeds back into the next-state logic and also feeds the output logic, which produces the outputs. Only in a Mealy machine do the inputs feed the output logic directly.)

DESIGNING SEQUENTIAL CIRCUITS:

Useful Sequential Parts:
High-level building blocks, much like prefab closets used in building a house.
Other memory components covered later: SRAM details, DRAM, Flash.
Here we cover three useful parts: the shift register, the register file (SRAM basics), and the counter.
(Figures: (a) register file with random access, built from 2^h k-bit registers, a write-address decoder with write enable, and read-port muxes; (b) graphic symbol for the register file, with k-bit write data, h-bit write address, write enable, two read addresses, read enable, and two k-bit read-data outputs; (c) FIFO symbol with push/pop controls, input/output, and empty/full flags.)
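The register file's ports can be sketched behaviorally (an illustrative model of the part described above, with 2^h registers of k bits, one write port and two read ports):

```python
# Behavioral model of a register file: 2**h registers of k bits,
# a write port gated by write-enable, and two read ports.

class RegisterFile:
    def __init__(self, h=3, k=8):
        self.mask = (1 << k) - 1        # keep values to k bits
        self.regs = [0] * (1 << h)      # 2**h registers
    def write(self, addr, data, enable=1):
        if enable:                      # decoder + write enable
            self.regs[addr] = data & self.mask
    def read(self, addr0, addr1):       # two independent read ports
        return self.regs[addr0], self.regs[addr1]

rf = RegisterFile()
rf.write(2, 0xAB)
rf.write(5, 0xCD, enable=0)         # write disabled: register 5 unchanged
assert rf.read(2, 5) == (0xAB, 0)
```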

(NOTE: Please refer to Chapters 1 and 2 in B. Parhami, Computer Architecture, for detailed descriptions.)


II. MAIN MEMORY CONCEPTS:


1. MEMORY DEFINITION
Memory:
Memory refers to a physical device used to store programs or data on a temporary or permanent basis for
use in a computer or other electronic device.
Memory cell:
A memory cell is capable of storing one bit of information. It is usually organized in the form of an array.
Components of the Memory System
Main memory: fast, random access, expensive; located close to (but not inside) the CPU; used to store the program and data currently manipulated by the CPU.
Secondary memory: slow, cheap, direct access; located remotely from the CPU.
2. MEMORY HIERARCHY

The Need for a Memory Hierarchy:


To match memory speed with processor speed:
o Memory holding the program must be accessible in nanoseconds or less.
The widening speed gap between CPU and main memory:
o Processor operations take on the order of 1 ns.
o A memory access requires tens or even hundreds of ns.
Memory bandwidth limits the instruction execution rate:
o Each instruction executed involves at least one memory access; hence, a few to hundreds of MIPS is the best that can be achieved.
o A fast buffer memory can help bridge the CPU-memory gap.
o The fastest memories are expensive and thus not very large.
Problems with the Memory System
What do we need?
We need memory that fits very large programs and works at a speed comparable to that of the microprocessor.
Main problem:
- microprocessors work at a very high rate and need large memories;
- memories are much slower than microprocessors.
Facts:
- the larger a memory, the slower it is;
- the faster the memory, the greater the cost per bit.
A solution:
It is possible to build a composite memory system which combines a small, fast memory with a large, slow main memory, and which behaves (most of the time) like a large, fast memory.
This two-level principle can be extended into a hierarchy of many levels, including secondary memory (disk store).
The effectiveness of such a memory hierarchy rests on a property of programs called the principle of locality.


Some typical characteristics:


1. Processor registers:
- 32 registers of 32 bits each = 128 bytes
- access time = few nanoseconds
2. On-chip cache memory:
- capacity = 8 to 32 Kbytes
- access time = ~10 nanoseconds
3. Off-chip cache memory:
- capacity = few hundred Kbytes
- access time = tens of nanoseconds
4. Main memory:
- capacity = tens of Mbytes
- access time = ~100 nanoseconds
5. Hard disk:
- capacity = few Gbytes
- access time = tens of milliseconds
The key to the success of a memory hierarchy is whether data and instructions can be distributed across the memory so that, most of the time, they are available on the top levels of the hierarchy when needed.

The data held in the registers is under the direct control of the compiler or of the assembly programmer.

The contents of the other levels of the hierarchy are managed automatically:
- migration of data/instructions to and from caches is performed under hardware control;
- migration between main memory and backing store is controlled by the operating system (with hardware support).


3. MEMORY PERFORMANCE PARAMETERS (refer to pages 167–168 in the Xerox notes)

Access methods: sequential access and random access.
Performance: access time, memory cycle time, transfer rate.

4. MEMORY STRUCTURE AND MEMORY CYCLE, MEMORY CHIP ORGANIZATION

(In addition to the notes and figures below, refer to pages 175–191 in the Xerox notes.)
SRAM:
Basically a large array of storage cells that are accessed like registers.
An SRAM memory cell requires 4–6 transistors per bit.
SRAM holds the stored data as long as it is powered on.
The storage cells are edge-triggered D flip-flops.
Limitations of flip-flops:
o They add complexity to the cells.
o Fewer cells can be mounted on a chip.
So latches are used instead of flip-flops, but they take more time to write/read.
Memory Structure and SRAM (page 317 in B. Parhami)
The conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation are shown below.
(Figure: conceptual structure of the SRAM chip: the address feeds an address decoder that selects one of rows 0 through 2^h − 1 of flip-flop storage cells; data in, write enable, chip select, and output enable control the cells and the data-out drivers. The shorthand symbol has pins WE, D in, D out, Addr, CS, OE.)

SRAM with Bidirectional Data Bus


When data input and output of an SRAM chip are shared or connected to a bidirectional data bus, output
must be disabled during write operations.
(Figure: SRAM chip with a shared data in/out connection to a bidirectional bus; output enable, chip select, and write enable gate a tristate driver so the output is disconnected during writes.)


Multiple-Chip SRAM

Eight 128K × 8 SRAM chips forming a 256K × 32 memory unit are shown below.
(Figure: 32-bit data in and an 18-bit address; the 17 low-order address bits go to every chip, while the MSB selects one of two four-chip banks via the chip-select (CS) inputs. Each bank of four 128K × 8 chips supplies data out bytes 3 down to 0 of the 32-bit word; each chip exposes WE, D in, D out, Addr, CS, and OE.)
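The address decoding for this 256K × 32 unit can be sketched as follows: the MSB of the 18-bit word address selects one of the two four-chip banks, and the remaining 17 bits address a location inside each 128K × 8 chip:

```python
# Decode an 18-bit word address for the 256K x 32 unit built from
# eight 128K x 8 chips: MSB -> bank (chip select), low 17 bits ->
# location within each chip. Byte lane i of the 32-bit word comes
# from chip i of the selected bank.

def decode(addr18):
    bank = (addr18 >> 17) & 1          # selects one of two chip rows
    within_chip = addr18 & 0x1FFFF     # 17-bit address, 128K locations
    return bank, within_chip

assert decode(0) == (0, 0)
assert decode(1 << 17) == (1, 0)                 # first word of bank 1
assert decode((1 << 18) - 1) == (1, 0x1FFFF)     # last word overall
```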

DRAM and Refresh Cycles

DRAM:
Stores data as electric charge on a tiny capacitor that is accessed through a MOS pass transistor.
When the word line is asserted:
o To write: a low voltage on the bit line discharges the capacitor (bit = 0); a high voltage charges it (bit = 1).
o To read: the read operation takes two steps: step 1, the row is accessed; step 2, the column is selected. The bit line is first precharged to a halfway voltage and then sensed by a sense amplifier.
Reading destroys the content, so a write operation follows each read; this is called destructive readout.
The single-transistor DRAM cell, considerably simpler than the SRAM cell, leads to dense, high-capacity DRAM memory chips.
(Figure: (a) DRAM cell: word line, pass transistor, storage capacitor, bit line; (b) typical SRAM cell: word line and cross-coupled inverters powered from Vcc, connected between the bit line and the complemented bit line.)


DRAM Refresh Cycles and Refresh Rate:

o The voltage across a DRAM cell capacitor varies after writing a 1 and between subsequent refresh operations.
o Charge leakage from the tiny capacitor erases the data within a fraction of a second, so DRAM must be refreshed periodically.
o Refreshing: a write operation is performed before the capacitor charge decays to the threshold voltage.

(Figure: cell voltage versus time. After a 1 is written, the voltage decays from the "1" level toward the threshold voltage and is restored by each refresh; tens of ms pass before a refresh cycle is needed. A stored 0 remains at the "0" voltage level.)
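The cost of periodic refresh can be estimated with a quick calculation (illustrative numbers, not figures from the text: assume 8192 rows each refreshed every 64 ms, with one row refresh occupying 50 ns of the memory's time):

```python
# Estimate what fraction of memory time is spent on DRAM refresh.
# Assumed parameters (typical orders of magnitude, not from the text):
rows = 8192                 # rows to refresh per refresh period
refresh_period_s = 64e-3    # every row refreshed within 64 ms
row_refresh_s = 50e-9       # one row refresh takes 50 ns

overhead = rows * row_refresh_s / refresh_period_s
assert abs(overhead - 0.0064) < 1e-9   # about 0.64% of memory time
```

Even though every row must be refreshed continually, the overhead stays below one percent, which is why the density advantage of DRAM over SRAM dominates in practice.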

DRAM Packaging:
24-pin dual in-line package (DIP): a typical DRAM package housing a 16M × 4 memory.
(Figure: 24-pin DIP pinout. Pins 24 down to 13: Vss, D4, D3, CAS, OE, A9, A8, A7, A6, A5, A4, Vss. Pins 1 up to 12: Vcc, D1, D2, WE, RAS, NC, A10, A0, A1, A2, A3, Vcc. Legend: Ai = address bit i; CAS = column address strobe; Dj = data bit j; NC = no connection; OE = output enable; RAS = row address strobe; WE = write enable.)

MEMORY CYCLE: (refer to pages 182–183 in the Xerox notes)

5. HITTING THE MEMORY WALL, PIPELINED MEMORY AND INTERLEAVED MEMORY

Hitting the Memory Wall:

Memory density and capacity have grown along with CPU power and complexity, but memory speed has not kept pace.
(Figure: relative performance versus calendar year, 1980–2010, on a log scale from 1 to 10^6; the processor curve rises much faster than the memory curve, and the gap widens over time.)
Bridging the CPU-Memory Speed Gap
Two ways of using a wide-access memory to bridge the speed gap between the processor and memory.
Idea: Retrieve more data from memory with each access

(Figure: two ways of using a wide-access memory: (a) buffer and multiplexer at the memory side, with a narrow bus to the processor; (b) buffer and multiplexer at the processor side, with a wide bus to the processor.)


PIPELINED MEMORY AND INTERLEAVED MEMORY:

(Refer to page 325 in the textbook, B. Parhami.)

Memory latency may involve other supporting operations besides the physical access itself:
o Virtual-to-physical address translation
o Tag comparison to determine cache hit/miss
(Figure: pipelined cache memory with stages for address translation, row decoding and readout, column decoding and selection, and tag comparison and validation.)
Memory Interleaving:
o Interleaved memory is more flexible than wide-access memory in that it can handle multiple
independent accesses at once.
(Figure: four-way interleaved memory. A dispatch unit uses the 2 LSBs of the address to route each access to the module handling addresses that are 0, 1, 2, or 3 mod 4, and returned data is merged onto the data-out path. Each module's memory cycle spans several bus cycles, so accesses to different modules overlap in time.)
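The timing benefit of low-order interleaving can be sketched with a small scheduling model (illustrative cycle counts: one bus cycle to issue, four cycles of module busy time):

```python
# Four-way low-order interleaving: the 2 LSBs of the address select the
# module, so consecutive addresses fall in different modules and their
# memory cycles overlap.

def module_of(addr, ways=4):
    return addr % ways             # dispatch on the low-order bits

def start_times(addresses, bus_cycle=1, memory_cycle=4, ways=4):
    """Earliest start time of each access: issued one bus cycle apart
    unless the target module is still busy with a previous access."""
    free_at = [0] * ways
    issue, starts = 0, []
    for a in addresses:
        m = module_of(a, ways)
        t = max(issue, free_at[m])
        starts.append(t)
        free_at[m] = t + memory_cycle
        issue = t + bus_cycle
    return starts

# Sequential addresses hit different modules: no stalls.
assert start_times([0, 1, 2, 3]) == [0, 1, 2, 3]
# Repeated accesses to one module must wait out its memory cycle.
assert start_times([0, 4, 8]) == [0, 4, 8]
```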

III. TYPES OF MEMORY (Refer page no. 171 to 175, 193 to 196, 200 to 209 in Xerox)
1. TYPES
2. Static RAM
3. Dynamic RAM
4. Other types
IV. CACHE MEMORY ORGANIZATION
1. CACHE MEMORY & NEED FOR CACHE:
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access
memory. The cache is a smaller, faster memory which stores copies of the data from frequently used main
memory locations.
As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.
A cache memory is a small, very fast memory that retains copies of recently used information from main memory. It operates transparently to the programmer, automatically deciding which values to keep and which to overwrite.
The processor operates at its high clock rate only when the memory items it requires are held in the cache. Overall system performance therefore depends strongly on the proportion of memory accesses that can be satisfied by the cache.

Cache space (~KBytes) is much smaller than main memory (~MBytes), so items have to be placed in the cache in such a way that they are available there when (and possibly only when) they are needed.
As memory size increases, cost per bit decreases, but so does speed (access time increases).
Processor speed >> memory speed: the gap between processor speed and memory speed keeps widening as processor performance improves.
Cache memories act as intermediaries between the superfast processor and the much slower main memory.

Multiple cache levels may be used.

2. BASIC CACHE TERMS, DESIGN PARAMETERS OF CACHE MEMORY

An access to an item which is in the cache (finding the required data in the cache): cache hit.
An access to an item which is not in the cache (not finding the required data in the cache): cache miss.
The proportion of all memory accesses satisfied by the cache, i.e. the fraction of data accesses that can be served from the cache as opposed to slower (main) memory: hit rate.
The proportion of all memory accesses not satisfied by the cache: miss rate.
The miss rate of a well-designed cache: a few percent.
Cfast: cache memory access cycle
Cslow: slower (main) memory access cycle
Ceff: effective memory cycle time
With one level of cache with hit rate h:

Ceff = h·Cfast + (1 − h)(Cslow + Cfast)
     = Cfast + (1 − h)·Cslow

When the hit rate h = 1, Ceff = Cfast: the cache creates the illusion that the entire memory space consists of fast (cache) memory.
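Plugging illustrative numbers into the effective-cycle formula shows how sensitive Ceff is to the hit rate (Cfast = 2 ns and Cslow = 100 ns are assumptions for the sketch, not figures from the text):

```python
# Effective memory cycle time: Ceff = Cfast + (1 - h) * Cslow.
# Illustrative timings: cache 2 ns, main memory 100 ns.

def c_eff(h, c_fast=2.0, c_slow=100.0):
    return c_fast + (1.0 - h) * c_slow

assert c_eff(1.0) == 2.0                 # all hits: pure cache speed
assert abs(c_eff(0.95) - 7.0) < 1e-9     # 95% hit rate: 7 ns average
assert abs(c_eff(0.90) - 12.0) < 1e-9    # 90% hit rate: 12 ns average
```

Note how a drop of just five percentage points in hit rate nearly doubles the effective cycle time: this is why well-designed caches target miss rates of only a few percent.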

Compulsory misses: also called cold-start misses; they occur on the first access to any cache line. With on-demand fetching, the first access to any item is a miss. Some compulsory misses can be avoided by prefetching.

Capacity misses: since cache capacity is limited, we have to oust (throw out) some items to make room for others. This leads to misses that would not be incurred with an infinitely large cache.

Conflict misses: also called collision misses. Occasionally there is free room, or room occupied by useless data, but the mapping/placement scheme forces us to displace useful items to bring in other items; this may lead to misses later.

DESIGN PARAMETERS:

Cache size: in bytes or words; a larger cache can hold more of the program's useful data but is more costly and likely to be slower.

Block size or cache line width: the unit of data transfer between cache and main memory. With a larger cache line, more data is brought into the cache with each miss. This can improve the hit rate but may also bring in low-utility data.

Placement policy: determines where an incoming cache line (data coming from main memory) can be stored. More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location).

Replacement policy: determines which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies:
1. choosing a random block;
2. choosing the least recently used (LRU) block.

Write policy: determines whether updates to cached words are immediately forwarded to main memory (write-through), or whether modified blocks are copied back to main memory only if and when they must be replaced (write-back or copy-back).

REPLACEMENT ALGORITHMS: (Refer page no. 229-230 in XEROX )

3. WHAT MAKES THE CACHE WORK?


How can this work? The answer is: locality.
During execution of a program, memory references by the processor, for both instructions and data, tend to cluster: once an area of the program is entered, there are repeated references to a small set of instructions (a loop, a subroutine) and data (components of a data structure, local variables, or parameters on the stack).
Cache improves the performance of modern processors because of two locality properties of the memory access patterns in typical programs.
These locality properties cause the instructions and data needed at a given point in a program's execution to reside in the cache, resulting in high cache hit rates (90–98%) and low cache miss rates (2–10%).


Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon, e.g. frequent consecutive accesses to nearby memory locations.

4. CACHE ORGANIZATION (MAPPING)


(Refer page number 221 - 227 in xerox)
Direct Mapping
Advantages:
- Simple and cheap.
- The tag field is short; only those bits have to be stored which are not used to address the cache (compare with the following approaches).
- Access is very fast.
Disadvantage:
- A given block fits into exactly one cache location, so a given cache line will be replaced whenever there is a reference to another memory block that maps to the same line, regardless of the status of the other cache lines. This can produce a low hit ratio, even if only a very small part of the cache is effectively used.
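The fixed mapping behind direct mapping can be sketched by splitting a byte address into tag, line index, and offset (illustrative geometry: 256 lines of 16 bytes, not a configuration from the text):

```python
# Direct-mapped address split: offset within the line, line index,
# and tag (the remaining high-order bits).
LINES, LINE_BYTES = 256, 16

def split(addr):
    offset = addr % LINE_BYTES
    index  = (addr // LINE_BYTES) % LINES
    tag    = addr // (LINE_BYTES * LINES)
    return tag, index, offset

# Two addresses exactly LINES*LINE_BYTES bytes apart map to the SAME
# line with DIFFERENT tags: referencing them alternately evicts each
# other every time -- the conflict behavior described above.
t0, i0, _ = split(0x1230)
t1, i1, _ = split(0x1230 + LINES * LINE_BYTES)
assert i0 == i1 and t0 != t1
```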


Associative Mapping
Advantages:
- Associative mapping provides the highest flexibility concerning the line to be replaced when a new block is read into the cache.
Disadvantages:
- Complex.
- The tag field is long.
- Fast access can be achieved only using high-performance associative memories for the cache, which is difficult and expensive.

5. CACHE PERFORMANCE MEASURES

For a given cache size, the following design issues and tradeoffs exist:
Line width (2^W). Too small a value of W causes many main memory accesses; too large a value increases the miss penalty and may tie up cache space with low-utility items that are replaced before being used.
Set size or associativity (2^S). Direct mapping (S = 0) is simple and fast; greater associativity leads to more complexity, and thus slower access, but tends to reduce conflict misses.
Line replacement policy. Usually the LRU (least recently used) algorithm or some approximation thereof; not an issue for direct-mapped caches. Somewhat surprisingly, random selection works quite well in practice.
Write policy. Modern caches are very fast, so write-through is seldom a good choice. We usually implement write-back (copy-back), using write buffers to soften the impact of main memory latency.
Performance characteristics of two-level memories: refer to pages 243–244 in the Xerox notes.

6. CACHE AND MAIN MEMORY

(Refer to pages 345–346 in the textbook, B. Parhami.)
Split cache: separate instruction and data caches (L1).
Unified cache: holds both instructions and data (L1, L2, L3).
Harvard architecture: separate instruction and data memories.
Von Neumann architecture: one memory for both instructions and data.
The writing problem:
Write-through slows the cache down to let main memory catch up.
Write-back (copy-back) is less problematic, but still hurts performance, since some replacements require two main memory accesses.
Solution: provide write buffers for the cache so that it does not have to wait for main memory to catch up.
Advantages of unified caches:
- They can better balance the load between instruction and data fetches, depending on the dynamics of program execution.
- Design and implementation are cheaper.
Advantages of split caches (Harvard architecture):
- Competition for the cache between instruction processing and execution units is eliminated: instruction fetch can proceed in parallel with memory accesses from the execution unit.

7. CACHE COHERENCY
(Refer to pages 228–229 in the Xerox notes and pages 512–514 in the textbook, B. Parhami.)

V. SECONDARY STORAGE (MASS MEMORY CONCEPTS)
(Refer to pages 200–218 in the Xerox notes and pages 353–365 in the textbook, B. Parhami.)
1. Disk Memory Basics
2. Organizing Data on Disk
3. Disk Performance
4. Disk Caching
5. Disk Arrays and RAID (refer to page 209 in the Xerox notes)
6. Other Types of Mass Memory

VI. VIRTUAL MEMORY AND PAGING

(Refer to pages 230–243 in the Xerox notes.)
1. The Need for Virtual Memory
2. Address Translation in Virtual Memory
3. Translation Lookaside Buffer
4. Page Placement and Replacement
5. Main and Mass Memories
6. Improving Virtual Memory Performance


Page Table
The page table has one entry for each page of the virtual memory space.
Each entry of the page table holds the address of the memory frame which stores the respective page, if that page is in main memory.
Each entry of the page table also includes some control bits which describe the status of the page:
- whether the page is actually loaded into main memory or not;
- whether the page has been modified since it was last loaded;
- information concerning the frequency of access, etc.
Problems:
- The page table is very large (the number of pages in the virtual memory space is very large).
- Access to the page table has to be very fast, so the page table would have to be stored in very fast on-chip memory.
A special cache is used for page table entries, called the translation lookaside buffer (TLB); it works in the same way as an ordinary memory cache and contains those page table entries which have been most recently used.
The page table is often too large to be stored in main memory. Virtual memory techniques are used to store the page table itself: only part of the page table is held in main memory at a given moment.
The page table itself is distributed along the memory hierarchy:
- TLB (cache)
- main memory
- disk
Memory Reference with Virtual Memory and TLB
Memory accesses are handled by hardware, except for the page-fault sequence, which is executed by OS software.
The hardware unit which is responsible for translating a virtual address into a physical one is the Memory Management Unit (MMU).
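The translation path just described can be sketched behaviorally (an illustrative model with 4 KB pages and a tiny page table; the data values are made up for the example):

```python
# Virtual-to-physical translation with a page table and a TLB.
# A TLB hit skips the page-table walk; a missing frame is a page
# fault, which the OS would service in a real system.

PAGE = 4096
page_table = {0: 7, 1: 3, 2: None}   # virtual page -> frame (None = not resident)
tlb = {}                             # most recently used page-table entries

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE)
    if vpn in tlb:                   # TLB hit: no page-table access
        frame = tlb[vpn]
    else:                            # TLB miss: walk the page table
        frame = page_table[vpn]
        if frame is None:
            raise LookupError("page fault: OS must load the page")
        tlb[vpn] = frame             # cache the entry for next time
    return frame * PAGE + offset

assert translate(0x0010) == 7 * PAGE + 0x10   # page 0 -> frame 7
assert translate(PAGE + 5) == 3 * PAGE + 5    # page 1 -> frame 3
assert 0 in tlb and 1 in tlb                  # entries now cached
```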
