Richard Salomon, Sudipto Mitra Copyright Box Hill Institute
Memory Hierarchy
SRAM v DRAM
- Both volatile: power is needed to preserve the data
- Dynamic cell: simpler to build, smaller, more dense, less expensive; needs refresh; suits larger memory units
- Static cell: faster; used for cache
Types of ROM
- ROM: written during manufacture; very expensive for small runs
- PROM: programmable (once); needs special equipment to program
- Read-mostly:
  - Erasable Programmable (EPROM): erased by UV
  - Electrically Erasable (EEPROM): takes much longer to write than read
  - Flash
March 20, 2012
Error Correction
- Hard failure: a permanent defect
- Soft error: random, non-destructive
- Errors are detected using error-correcting codes
Virtual Memory
Use of main memory and disk space to provide the illusion of an endless amount of physical memory. Part of the memory image lives on the hard disk. This allows for effective multiprogramming and relieves the user of the tight constraints of main memory.
Physical addressing: addresses generated by the CPU correspond directly to bytes in physical memory (bytes 0 to N-1). Examples include most Cray machines, early PCs, and some embedded systems. (Virtual Memory, CS 105, Tour of the Black Holes of Computing!)
Virtual addressing: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table); pages move between main memory and disk. Examples include workstations, servers, modern PCs, etc.
Virtual Memory
Combination of physical memory and disk space to create a memory image for a running application. The image is divided into logical segments of variable size (segmentation) or fixed-length pages (paging). Paged systems are easier to manage and are made transparent to the application (user). Virtual memory management is an operating system issue.
Paging
- Split memory into equal-sized, small chunks: page frames, typically 4 KB each
- Split programs (processes) into equal-sized small chunks: pages
- Allocate a number of page frames to a process
- The operating system maintains a list of free frames
- A process does not require contiguous page frames
- Use a page table to keep track of the mapping
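The bookkeeping above amounts to simple integer arithmetic. A minimal sketch in Python, assuming 4 KB pages and a hypothetical page table represented as a plain dictionary mapping page numbers to frame numbers:

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the slide

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (page number, offset within page)."""
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE

def translate(vaddr: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address to a physical one via the page table."""
    page, offset = split_virtual_address(vaddr)
    frame = page_table[page]  # a missing key here would be a page fault
    return frame * PAGE_SIZE + offset
```

With `page_table = {0: 5, 1: 2}`, virtual address 4100 (page 1, offset 4) translates to physical address 2 * 4096 + 4 = 8196.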
Page Table Entry
- Present/absent bit: is the page loaded in main memory?
- Page frame #: physical page frame where the virtual page is loaded
- Protection: read, write, executable
- Modified bit: if the page has been modified, it must be written back to disk when it is swapped out of physical memory
- Referenced bit: helps the OS decide whether or not to swap the page out
- Caching disabled: the cached copy may be invalid if the page is mapped to an I/O device and will change often (memory-mapped I/O)
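These fields can be pictured as bits packed into one word. A minimal sketch, assuming a hypothetical 32-bit entry layout chosen purely for illustration (real layouts are architecture specific):

```python
# Hypothetical layout (an assumption, not any real architecture's):
#   bit 0: present, bit 1: modified, bit 2: referenced,
#   bit 3: caching disabled, bits 12-31: page frame number
def decode_pte(pte: int) -> dict:
    """Unpack the illustrative page table entry fields from one integer."""
    return {
        "present":          bool(pte & 0x1),
        "modified":         bool(pte & 0x2),
        "referenced":       bool(pte & 0x4),
        "caching_disabled": bool(pte & 0x8),
        "frame":            pte >> 12,
    }
```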
Demand Paging
- Do not require all of a process's pages in memory; bring pages in as needed
- Page fault: the required page is not in memory
- The operating system must swap in the required page
- It may need to swap out a page to make space
- Select the page to throw out based on recent history
Thrashing
- Too many processes in too little memory
- The operating system spends all its time swapping
- Little or no real work is done
- The disk light is on (flickering) all the time
Solutions
- Good page replacement algorithms
- Reduce the number of processes running
- Fit more memory
We do not need all of a process in memory for it to run; we can swap in pages as required. So we can now run processes that are bigger than the total memory available! Main memory is called real memory; the user/programmer sees a much bigger memory, virtual memory.
Translation Lookaside Buffer
Use a special cache for the page table entries that have been most recently used. By the principle of locality, most references will be to locations in recently used pages.
TLB Operation
Segmentation
- Segments are multiple address spaces of variable, dynamic size
- Paging is not (usually) visible to the programmer
- Segmentation is visible to the programmer
- Usually different segments are allocated to program and data
- There may be a number of program and data segments
Advantages of Segmentation
- Simplifies handling of growing data structures
- Allows programs to be altered and recompiled independently, without relinking and reloading
- Lends itself to sharing among processes
- Lends itself to protection
- Some systems combine segmentation with paging
Pentium II
- Unsegmented paged memory: memory is viewed as a paged linear address space; protection and management are via paging (used by Berkeley UNIX)
- Segmented unpaged memory: a collection of local address spaces; protection to single-byte level; the translation table needed is on chip when the segment is in memory
- Segmented paged memory: segmentation is used to define logical memory partitions subject to access control; paging manages allocation of memory within partitions (used by Unix System V)
Pentium II Segmentation
- Each virtual address is a 16-bit segment selector and a 32-bit offset
- 2 bits of the segment selector are the protection mechanism; 14 bits specify the segment
- Unsegmented virtual memory: 2^32 bytes = 4 GB
- Segmented: 2^46 bytes = 64 TB
- Can be larger; it depends on which process is active
- Half (8K segments of 4 GB) is global; half is local and distinct for each process
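The sizes above follow directly from the field widths: a 32-bit offset gives 2^32 = 4 GB unsegmented, and 14 segment bits plus the 32-bit offset give 2^46 = 64 TB. A quick check:

```python
offset_bits = 32
segment_bits = 14  # 16-bit selector minus 2 protection bits

unsegmented = 2 ** offset_bits                  # bytes addressable without segments
segmented = 2 ** (segment_bits + offset_bits)   # bytes addressable with segments

print(unsegmented // 2**30)  # in GB -> 4
print(segmented // 2**40)    # in TB -> 64
```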
Pentium II Protection
- Protection bits give 4 levels: 0 is most protected, 3 least
- Use of the levels is software dependent
- Usually level 3 for applications, level 1 for the O/S and level 0 for the kernel (level 2 not used)
- Level 2 may be used for apps that have internal security, e.g. a database
- Some instructions only work in level 0
Pentium II Paging
Segmentation may be disabled, in which case the paging hardware operates directly on the 32-bit linear address space.
Two-level page table lookup:
- First, a page directory with up to 1024 entries; it splits the 4 GB linear memory into 1024 page groups of 4 MB each
- Each page table has 1024 entries corresponding to 4 KB pages
- Can use one page directory for all processes, one per process, or a mixture
- The page directory for the current process is always in memory
- The TLB holds 32 page table entries
- Two page sizes are available: 4 KB or 4 MB
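The two-level lookup above implies a fixed 10/10/12 split of the 32-bit linear address (1024 directory entries, 1024 table entries per directory entry, 4 KB pages). A small illustrative sketch:

```python
def split_linear_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address into (page directory index,
    page table index, byte offset): 10 + 10 + 12 bits, matching
    1024-entry directories/tables and 4 KB pages."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits
    table = (addr >> 12) & 0x3FF      # middle 10 bits
    offset = addr & 0xFFF             # low 12 bits
    return directory, table, offset
```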
PowerPC Memory Management
- 32-bit versions use paging with simple segmentation; 64-bit versions use paging with more powerful segmentation
- Both do block address translation: map 4 large blocks of instructions and 4 of memory to bypass paging, e.g. for OS tables or graphics frame buffers
- 32-bit effective address: a 12-bit byte selector (4 KB pages), a 16-bit page id (64K pages per segment), and 4 bits indicating one of 16 segment registers
Memory hierarchy, from registers down to secondary memory:

Technology:   Regs     SRAM    SRAM    DRAM        Disk
Speed (ns):   0.5      2       6       100         10,000,000
Size (MB):    0.0005   0.05    1-4     100-1000    100,000
Cost ($/MB):  -        $100    $30     $1          $0.05
(Lecture 13: Memory HierarchyWays to Reduce Misses DAP Spr.98 UCB)
It is possible to build a computer which uses only static RAM. This would be very fast and would need no cache. How much would it cost?

Cache
- Small amount of fast memory
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module

Cache operation:
- CPU requests contents of a memory location
- Check cache for this data
- If present, get it from cache (fast)
- If not present, read the required block from main memory into cache
- Then deliver from cache to the CPU
- The cache includes tags to identify which block of main memory is in each cache slot
Cache size matters: more cache is expensive, and more cache is faster only up to a point, since checking the cache for data takes time.
Processor        | Type                          | Year | L1 cache (a)
IBM 360/85       | Mainframe                     | 1968 | 16 to 32 KB
PDP-11/70        | Minicomputer                  | 1975 | 1 KB
VAX 11/780       | Minicomputer                  | 1978 | 16 KB
IBM 3033         | Mainframe                     | 1978 | 64 KB
IBM 3090         | Mainframe                     | 1985 | 128 to 256 KB
Intel 80486      | PC                            | 1989 | 8 KB
Pentium          | PC                            | 1993 | 8 KB/8 KB
PowerPC 601      | PC                            | 1993 | 32 KB
PowerPC 620      | PC                            | 1996 | 32 KB/32 KB
IBM S/390 G4     | Mainframe                     | 1997 | 32 KB
PowerPC G4       | PC/server                     | 1999 | 32 KB/32 KB
IBM S/390 G6     | Mainframe                     | 1999 | 256 KB
Pentium 4        | PC/server                     | 2000 | 8 KB/8 KB
IBM SP           | High-end server/supercomputer | 2000 | 64 KB/32 KB
CRAY MTA (b)     | Supercomputer                 | 2000 | 8 KB
Itanium          | PC/server                     | 2001 | 16 KB/16 KB
SGI Origin 2001  | High-end server               | 2001 | 32 KB/32 KB
Itanium 2        | PC/server                     | 2002 | 32 KB
IBM POWER5       | High-end server               | 2003 | 64 KB
CRAY XD-1        | Supercomputer                 | 2004 | 64 KB/64 KB

L3 cache (where present): 2 MB, 2 MB, 4 MB, 6 MB, 36 MB
(a) Two values separated by a slash refer to instruction and data caches. (b) Both caches are instruction only; no data caches.
Direct Mapping
- Each block of main memory maps to only one cache line, i.e. if a block is in cache, it must be in one specific place
- The address is in two parts: the least significant w bits identify a unique word; the most significant s bits specify one memory block
- The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant)
Direct Mapping Address Structure
Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
- 24-bit address
- 2-bit word identifier (4-byte block)
- 22-bit block identifier: an 8-bit tag (22-14) and a 14-bit slot or line
- No two blocks that map to the same line have the same tag field
- Check the contents of cache by finding the line and checking the tag
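Extracting these fields is just shifting and masking. A sketch using the slide's example widths (8-bit tag, 14-bit line, 2-bit word):

```python
# Field widths from the slide's example: 24-bit address split as
# 8-bit tag, 14-bit line, 2-bit word.
WORD_BITS, LINE_BITS = 2, 14

def split_cache_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, line, word) for a 24-bit direct-mapped cache address."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word
```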
Main memory blocks held per cache line:
- Line 0 holds blocks 0, m, 2m, 3m, ..., 2^s - m
- Line 1 holds blocks 1, m+1, 2m+1, ..., 2^s - m + 1
- ...
- Line m-1 holds blocks m-1, 2m-1, 3m-1, ..., 2^s - 1
(Figure: direct-mapped cache; the memory block address is split into a tag and a cache index, and the index determines the block in cache.)
cache index = (memory block address) mod (# cache blocks)
If the number of cache blocks is a power of 2, the cache index is just the lower n bits of the memory address [n = log2(# blocks)]
Direct mapping is simple and inexpensive, but each block has a fixed location: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high.
Fully Associative
No cache index: place an item in any block! Compare all cache tags in parallel.
(Figure: fully associative cache; every stored tag is compared simultaneously against the address.)
Associative Mapping
- Must search all tags in cache, as the item can be in any cache block
- The search for the tag must be done by hardware in parallel (other searches are too slow)
- But the necessary parallel comparator hardware is very expensive
- Therefore, fully associative placement is practical only for a very small cache
Set Associative Cache
An N-way set associative cache can be seen as having N direct mapped caches operating in parallel: select the one that gets a hit.
Example: two-way set associative cache
- The cache index selects a set of 2 blocks from the cache
- The 2 tags in the set are compared in parallel
- Data is selected based on the tag result (which matched the address)
Direct mapped and fully associative caches can be seen as just variations of the set associative block placement strategy: direct mapped = 1-way set associative; fully associative = n-way set associative for a cache with exactly n blocks.
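This equivalence can be sketched as a small simulator. The class below is a minimal, assumed-for-illustration model (it uses FIFO eviction within a set, purely to keep the sketch short); `ways = 1` reproduces a direct mapped cache, and a single set holding all blocks is fully associative:

```python
class SetAssociativeCache:
    """Minimal N-way set associative cache sketch: each set holds up to
    `ways` tags; a lookup compares all tags in the set (conceptually in
    parallel, as the slides describe)."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # list of tags per set

    def access(self, block_addr: int) -> bool:
        """Return True on a hit; on a miss, insert the block (FIFO eviction)."""
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            return True
        if len(tags) == self.ways:
            tags.pop(0)  # evict the oldest tag in the set
        tags.append(tag)
        return False
```

`SetAssociativeCache(m, 1)` behaves as a direct mapped cache with m lines; `SetAssociativeCache(1, n)` behaves as a fully associative cache with n blocks.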
Replacement Algorithms (Direct Mapping)
No choice: each block only maps to one line, so replace that line.
Replacement Algorithms (Associative and Set Associative)
The algorithm must be hardware implemented (for speed):
- Least Recently Used (LRU): e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
- First In First Out (FIFO): replace the block that has been in cache longest
- Least Frequently Used (LFU): replace the block which has had the fewest hits
- Random
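As a concrete illustration of LRU, here is a small, assumed-for-illustration simulator for a fully associative cache, where an ordered dictionary stands in for the hardware's usage ordering:

```python
from collections import OrderedDict

def simulate_lru(capacity: int, accesses: list[int]) -> int:
    """Count hits for an LRU-replaced fully associative cache of
    `capacity` blocks over a sequence of block addresses."""
    cache: OrderedDict[int, None] = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = None
    return hits
```

For `simulate_lru(2, [1, 2, 1, 3, 1])` the accesses to block 1 after its first load both hit, because LRU keeps the frequently reused block resident.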
Write Policy
Must not overwrite a cache block unless main memory is up to date. Complications: multiple CPUs may have individual caches, and I/O may address main memory directly.
Write Through
- All writes go to main memory as well as to cache
- Multiple CPUs can monitor main memory traffic to keep the local (to the CPU) cache up to date
- Disadvantages: lots of traffic; slows down writes
Write Back
- Updates are initially made in cache only
- The update bit for the cache slot is set when an update occurs
- If a block is to be replaced, write it back to main memory only if the update bit is set
- Disadvantages: other caches get out of sync, and I/O must access main memory through the cache
- N.B. 15% of memory references are writes
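The update-bit behaviour can be sketched for a single cache line. Names like `WriteBackCacheLine` are illustrative, not from any real implementation; main memory is modelled as a dictionary keyed by tag:

```python
class WriteBackCacheLine:
    """One cache line with an update ("dirty") bit, per the write back policy."""
    def __init__(self):
        self.tag = None
        self.data = None
        self.dirty = False

def write(line: WriteBackCacheLine, tag: int, data: bytes) -> None:
    """A write goes to cache only; the update bit records the pending change."""
    line.tag, line.data, line.dirty = tag, data, True

def replace(line: WriteBackCacheLine, new_tag: int, new_data: bytes,
            main_memory: dict[int, bytes]) -> None:
    """On replacement, write back to main memory only if the update bit is set."""
    if line.dirty:
        main_memory[line.tag] = line.data
    line.tag, line.data, line.dirty = new_tag, new_data, False
```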
Pentium Cache Evolution
- 80386: no on-chip cache
- 80486: 8 KB, using 16-byte lines and a four-way set associative organization
- Pentium (all versions): two on-chip L1 caches, data & instructions
- Pentium 4: 8 KB L1 caches; an L2 cache feeding both L1 caches; L3 cache on chip
PowerPC Cache Organization
- 601: single 32 KB cache, 8-way set associative
- 603: 16 KB (2 x 8 KB), two-way set associative
- 604: 32 KB
- 620: 64 KB
- G3 & G4: 64 KB L1 cache; 256 KB, 512 KB or 1 MB L2 cache, two-way set associative
- G5: 32 KB
Summary
- Virtual memory is a combination of physical memory and disk space that creates a memory image for a running application
- In paging, programs (processes) are split into equal-sized small chunks called pages
- A cache is a very fast, but expensive, memory in close proximity to the CPU
- In direct mapping, each block of main memory maps to only one cache line
- In associative mapping, a search of all tags in cache is necessary, as the required item can be in any cache block
Reference
- Stallings, William, 2003, Computer Organization & Architecture: Designing for Performance, 6th edn, Pearson Education, Inc., ISBN 0-13-049307-4 [chapters 4, 5 & 6]
- Virtual Memory, CS 105, Tour of the Black Holes of Computing!
- Lecture 13: Memory Hierarchy; Ways to Reduce Misses, DAP Spr. 98, UCB
- Memory Hierarchy, Quynh Dinh
Internet Sources
- Manufacturer sites: Intel, IBM/Motorola
- Search on cache