Richard Salomon, Sudipto Mitra Copyright Box Hill Institute
Memory Hierarchy
SRAM v DRAM
- Both volatile: power is needed to preserve the data
- Dynamic cell: simpler to build, smaller, more dense, less expensive; needs refresh; suits larger memory units
- Static cell: faster; used for cache
Types of ROM
- ROM: written during manufacture; very expensive for small runs
- PROM: programmable (once); needs special equipment to program
- Read-mostly:
  - Erasable Programmable (EPROM): erased by UV
  - Electrically Erasable (EEPROM): takes much longer to write than read
  - Flash
March 20, 2012
Error Correction
- Hard failure: a permanent defect
- Soft error: random, non-destructive
- Errors are detected using error-correcting codes
Virtual Memory
Use of main memory and disk space to provide the illusion of an endless amount of physical memory. Part of the memory image lives on the hard disk. This allows for effective multiprogramming and relieves the user of the tight constraints of main memory.
Physical addressing: addresses generated by the CPU correspond directly to bytes in physical memory (bytes 0 to N-1). Examples include most Cray machines, early PCs, and some embedded systems. (Virtual Memory, CS 105, Tour of the Black Holes of Computing!)
Virtual addressing: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table); pages move between main memory and disk. Examples include workstations, servers, modern PCs, etc.
Virtual Memory
Combination of physical memory and disk space to create a memory image for a running application. The image is divided into logical segments of variable size (segmentation) or fixed-length pages (paging). Paged systems are easier to manage and are made transparent to the application (user). Virtual memory management is an operating system issue.
Paging
- Split memory into equal-sized, small chunks: page frames, typically 4 KB each
- Split programs (processes) into equal-sized small chunks: pages
- Allocate a number of page frames to a process
- The operating system maintains a list of free frames
- A process does not require contiguous page frames
- Use a page table to keep track of the mapping
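The bookkeeping above amounts to simple integer arithmetic. A minimal sketch in Python, assuming 4 KB pages and a hypothetical page table represented as a plain dictionary mapping page numbers to frame numbers:

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the slide

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (page number, offset within page)."""
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE

def translate(vaddr: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address to a physical one via the page table."""
    page, offset = split_virtual_address(vaddr)
    frame = page_table[page]  # a missing key here would be a page fault
    return frame * PAGE_SIZE + offset
```

With `page_table = {0: 5, 1: 2}`, virtual address 4100 (page 1, offset 4) translates to physical address 2 * 4096 + 4 = 8196.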
Page Table Entry
- Present/absent bit: is the page loaded in main memory?
- Page frame #: physical page frame where the virtual page is loaded
- Protection: read, write, executable
- Modified bit: if the page has been modified, it must be written back to disk when it is swapped out of physical memory
- Referenced bit: helps the OS decide whether or not to swap the page out
- Caching disabled: the cached copy may be invalid if the page is mapped to an I/O device and will change often (memory-mapped I/O)
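These fields can be pictured as bits packed into one word. A minimal sketch, assuming a hypothetical 32-bit entry layout chosen purely for illustration (real layouts are architecture specific):

```python
# Hypothetical layout (an assumption, not any real architecture's):
#   bit 0: present, bit 1: modified, bit 2: referenced,
#   bit 3: caching disabled, bits 12-31: page frame number
def decode_pte(pte: int) -> dict:
    """Unpack the illustrative page table entry fields from one integer."""
    return {
        "present":          bool(pte & 0x1),
        "modified":         bool(pte & 0x2),
        "referenced":       bool(pte & 0x4),
        "caching_disabled": bool(pte & 0x8),
        "frame":            pte >> 12,
    }
```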
Demand Paging
- Do not require all of a process's pages in memory; bring pages in as needed
- Page fault: the required page is not in memory
- The operating system must swap in the required page
- It may need to swap out a page to make space
- Select the page to throw out based on recent history
Thrashing
- Too many processes in too little memory
- The operating system spends all its time swapping
- Little or no real work is done
- The disk light is on (flickering) all the time
Solutions
- Good page replacement algorithms
- Reduce the number of processes running
- Fit more memory
We do not need all of a process in memory for it to run; we can swap in pages as required. So we can now run processes that are bigger than the total memory available! Main memory is called real memory; the user/programmer sees a much bigger memory, virtual memory.
Translation Lookaside Buffer
Use a special cache for the page table entries that have been most recently used. By the principle of locality, most references will be to locations in recently used pages.
TLB Operation
Segmentation
- Segments are multiple address spaces of variable, dynamic size
- Paging is not (usually) visible to the programmer
- Segmentation is visible to the programmer
- Usually different segments are allocated to program and data
- There may be a number of program and data segments
Advantages of Segmentation
- Simplifies handling of growing data structures
- Allows programs to be altered and recompiled independently, without relinking and reloading
- Lends itself to sharing among processes
- Lends itself to protection
- Some systems combine segmentation with paging
Pentium II
- Unsegmented paged memory: memory is viewed as a paged linear address space; protection and management are via paging (used by Berkeley UNIX)
- Segmented unpaged memory: a collection of local address spaces; protection to single-byte level; the translation table needed is on chip when the segment is in memory
- Segmented paged memory: segmentation is used to define logical memory partitions subject to access control; paging manages allocation of memory within partitions (used by Unix System V)
Pentium II Segmentation
- Each virtual address is a 16-bit segment selector and a 32-bit offset
- 2 bits of the segment selector are the protection mechanism; 14 bits specify the segment
- Unsegmented virtual memory: 2^32 bytes = 4 GB
- Segmented: 2^46 bytes = 64 TB
- Can be larger; it depends on which process is active
- Half (8K segments of 4 GB) is global; half is local and distinct for each process
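The sizes above follow directly from the field widths: a 32-bit offset gives 2^32 = 4 GB unsegmented, and 14 segment bits plus the 32-bit offset give 2^46 = 64 TB. A quick check:

```python
offset_bits = 32
segment_bits = 14  # 16-bit selector minus 2 protection bits

unsegmented = 2 ** offset_bits                  # bytes addressable without segments
segmented = 2 ** (segment_bits + offset_bits)   # bytes addressable with segments

print(unsegmented // 2**30)  # in GB -> 4
print(segmented // 2**40)    # in TB -> 64
```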
Pentium II Protection
- Protection bits give 4 levels: 0 is most protected, 3 least
- Use of the levels is software dependent
- Usually level 3 for applications, level 1 for the O/S and level 0 for the kernel (level 2 not used)
- Level 2 may be used for apps that have internal security, e.g. a database
- Some instructions only work in level 0
Pentium II Paging
Segmentation may be disabled, in which case the paging hardware operates directly on the 32-bit linear address space.
Two-level page table lookup:
- First, a page directory with up to 1024 entries; it splits the 4 GB linear memory into 1024 page groups of 4 MB each
- Each page table has 1024 entries corresponding to 4 KB pages
- Can use one page directory for all processes, one per process, or a mixture
- The page directory for the current process is always in memory
- The TLB holds 32 page table entries
- Two page sizes are available: 4 KB or 4 MB
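The two-level lookup above implies a fixed 10/10/12 split of the 32-bit linear address (1024 directory entries, 1024 table entries per directory entry, 4 KB pages). A small illustrative sketch:

```python
def split_linear_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address into (page directory index,
    page table index, byte offset): 10 + 10 + 12 bits, matching
    1024-entry directories/tables and 4 KB pages."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits
    table = (addr >> 12) & 0x3FF      # middle 10 bits
    offset = addr & 0xFFF             # low 12 bits
    return directory, table, offset
```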
PowerPC Memory Management
- 32-bit versions use paging with simple segmentation; 64-bit versions use paging with more powerful segmentation
- Both do block address translation: map 4 large blocks of instructions and 4 of memory to bypass paging, e.g. for OS tables or graphics frame buffers
- 32-bit effective address: a 12-bit byte selector (4 KB pages), a 16-bit page id (64K pages per segment), and 4 bits indicating one of 16 segment registers
Memory hierarchy, from registers down to secondary memory:

Technology:   Regs     SRAM    SRAM    DRAM        Disk
Speed (ns):   0.5      2       6       100         10,000,000
Size (MB):    0.0005   0.05    1-4     100-1000    100,000
Cost ($/MB):  -        $100    $30     $1          $0.05
(Lecture 13: Memory HierarchyWays to Reduce Misses DAP Spr.98 UCB)
It is possible to build a computer which uses only static RAM. This would be very fast and would need no cache. How much would it cost?

Cache
- Small amount of fast memory
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module

Cache operation:
- CPU requests contents of a memory location
- Check cache for this data
- If present, get it from cache (fast)
- If not present, read the required block from main memory into cache
- Then deliver from cache to the CPU
- The cache includes tags to identify which block of main memory is in each cache slot
Cache size matters: more cache is expensive, and more cache is faster only up to a point, since checking the cache for data takes time.
Processor        | Type                          | Year | L1 cache (a)
IBM 360/85       | Mainframe                     | 1968 | 16 to 32 KB
PDP-11/70        | Minicomputer                  | 1975 | 1 KB
VAX 11/780       | Minicomputer                  | 1978 | 16 KB
IBM 3033         | Mainframe                     | 1978 | 64 KB
IBM 3090         | Mainframe                     | 1985 | 128 to 256 KB
Intel 80486      | PC                            | 1989 | 8 KB
Pentium          | PC                            | 1993 | 8 KB/8 KB
PowerPC 601      | PC                            | 1993 | 32 KB
PowerPC 620      | PC                            | 1996 | 32 KB/32 KB
IBM S/390 G4     | Mainframe                     | 1997 | 32 KB
PowerPC G4       | PC/server                     | 1999 | 32 KB/32 KB
IBM S/390 G6     | Mainframe                     | 1999 | 256 KB
Pentium 4        | PC/server                     | 2000 | 8 KB/8 KB
IBM SP           | High-end server/supercomputer | 2000 | 64 KB/32 KB
CRAY MTA (b)     | Supercomputer                 | 2000 | 8 KB
Itanium          | PC/server                     | 2001 | 16 KB/16 KB
SGI Origin 2001  | High-end server               | 2001 | 32 KB/32 KB
Itanium 2        | PC/server                     | 2002 | 32 KB
IBM POWER5       | High-end server               | 2003 | 64 KB
CRAY XD-1        | Supercomputer                 | 2004 | 64 KB/64 KB

L3 cache (where present): 2 MB, 2 MB, 4 MB, 6 MB, 36 MB
(a) Two values separated by a slash refer to instruction and data caches. (b) Both caches are instruction only; no data caches.
Direct Mapping
- Each block of main memory maps to only one cache line, i.e. if a block is in cache, it must be in one specific place
- The address is in two parts: the least significant w bits identify a unique word; the most significant s bits specify one memory block
- The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant)
Direct Mapping Address Structure
Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
- 24-bit address
- 2-bit word identifier (4-byte block)
- 22-bit block identifier: an 8-bit tag (22-14) and a 14-bit slot or line
- No two blocks that map to the same line have the same tag field
- Check the contents of cache by finding the line and checking the tag
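Extracting these fields is just shifting and masking. A sketch using the slide's example widths (8-bit tag, 14-bit line, 2-bit word):

```python
# Field widths from the slide's example: 24-bit address split as
# 8-bit tag, 14-bit line, 2-bit word.
WORD_BITS, LINE_BITS = 2, 14

def split_cache_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, line, word) for a 24-bit direct-mapped cache address."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word
```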
Main memory blocks held per cache line:
- Line 0 holds blocks 0, m, 2m, 3m, ..., 2^s - m
- Line 1 holds blocks 1, m+1, 2m+1, ..., 2^s - m + 1
- ...
- Line m-1 holds blocks m-1, 2m-1, 3m-1, ..., 2^s - 1
(Figure: direct-mapped cache; the memory block address is split into a tag and a cache index, and the index determines the block in cache.)
cache index = (memory block address) mod (# cache blocks)
If the number of cache blocks is a power of 2, the cache index is just the lower n bits of the memory address [n = log2(# blocks)]
Direct mapping is simple and inexpensive, but each block has a fixed location: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high.
Fully Associative
No cache index: place an item in any block! Compare all cache tags in parallel.
(Figure: fully associative cache; every stored tag is compared simultaneously against the address.)
Associative Mapping
- Must search all tags in cache, as the item can be in any cache block
- The search for the tag must be done by hardware in parallel (other searches are too slow)
- But the necessary parallel comparator hardware is very expensive
- Therefore, fully associative placement is practical only for a very small cache
Set Associative Cache
An N-way set associative cache can be seen as having N direct mapped caches operating in parallel: select the one that gets a hit.
Example: two-way set associative cache
- The cache index selects a set of 2 blocks from the cache
- The 2 tags in the set are compared in parallel
- Data is selected based on the tag result (which matched the address)
Direct mapped and fully associative caches can be seen as just variations of the set associative block placement strategy: direct mapped = 1-way set associative; fully associative = n-way set associative for a cache with exactly n blocks.
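This equivalence can be sketched as a small simulator. The class below is a minimal, assumed-for-illustration model (it uses FIFO eviction within a set, purely to keep the sketch short); `ways = 1` reproduces a direct mapped cache, and a single set holding all blocks is fully associative:

```python
class SetAssociativeCache:
    """Minimal N-way set associative cache sketch: each set holds up to
    `ways` tags; a lookup compares all tags in the set (conceptually in
    parallel, as the slides describe)."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # list of tags per set

    def access(self, block_addr: int) -> bool:
        """Return True on a hit; on a miss, insert the block (FIFO eviction)."""
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            return True
        if len(tags) == self.ways:
            tags.pop(0)  # evict the oldest tag in the set
        tags.append(tag)
        return False
```

`SetAssociativeCache(m, 1)` behaves as a direct mapped cache with m lines; `SetAssociativeCache(1, n)` behaves as a fully associative cache with n blocks.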
Replacement Algorithms (Direct Mapping)
No choice: each block only maps to one line, so replace that line.
Replacement Algorithms (Associative and Set Associative)
The algorithm must be hardware implemented (for speed):
- Least Recently Used (LRU): e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
- First In First Out (FIFO): replace the block that has been in cache longest
- Least Frequently Used (LFU): replace the block which has had the fewest hits
- Random
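As a concrete illustration of LRU, here is a small, assumed-for-illustration simulator for a fully associative cache, where an ordered dictionary stands in for the hardware's usage ordering:

```python
from collections import OrderedDict

def simulate_lru(capacity: int, accesses: list[int]) -> int:
    """Count hits for an LRU-replaced fully associative cache of
    `capacity` blocks over a sequence of block addresses."""
    cache: OrderedDict[int, None] = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = None
    return hits
```

For `simulate_lru(2, [1, 2, 1, 3, 1])` the accesses to block 1 after its first load both hit, because LRU keeps the frequently reused block resident.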
Write Policy
Must not overwrite a cache block unless main memory is up to date. Complications: multiple CPUs may have individual caches, and I/O may address main memory directly.
Write Through
- All writes go to main memory as well as to cache
- Multiple CPUs can monitor main memory traffic to keep the local (to the CPU) cache up to date
- Disadvantages: lots of traffic; slows down writes
Write Back
- Updates are initially made in cache only
- The update bit for the cache slot is set when an update occurs
- If a block is to be replaced, write it back to main memory only if the update bit is set
- Disadvantages: other caches get out of sync, and I/O must access main memory through the cache
- N.B. 15% of memory references are writes
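The update-bit behaviour can be sketched for a single cache line. Names like `WriteBackCacheLine` are illustrative, not from any real implementation; main memory is modelled as a dictionary keyed by tag:

```python
class WriteBackCacheLine:
    """One cache line with an update ("dirty") bit, per the write back policy."""
    def __init__(self):
        self.tag = None
        self.data = None
        self.dirty = False

def write(line: WriteBackCacheLine, tag: int, data: bytes) -> None:
    """A write goes to cache only; the update bit records the pending change."""
    line.tag, line.data, line.dirty = tag, data, True

def replace(line: WriteBackCacheLine, new_tag: int, new_data: bytes,
            main_memory: dict[int, bytes]) -> None:
    """On replacement, write back to main memory only if the update bit is set."""
    if line.dirty:
        main_memory[line.tag] = line.data
    line.tag, line.data, line.dirty = new_tag, new_data, False
```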
Pentium Cache Evolution
- 80386: no on-chip cache
- 80486: 8 KB, using 16-byte lines and a four-way set associative organization
- Pentium (all versions): two on-chip L1 caches, data & instructions
- Pentium 4: 8 KB L1 caches; an L2 cache feeding both L1 caches; L3 cache on chip
PowerPC Cache Organization
- 601: single 32 KB cache, 8-way set associative
- 603: 16 KB (2 x 8 KB), two-way set associative
- 604: 32 KB
- 620: 64 KB
- G3 & G4: 64 KB L1 cache; 256 KB, 512 KB or 1 MB L2 cache, two-way set associative
- G5: 32 KB
Summary
- Virtual memory is a combination of physical memory and disk space that creates a memory image for a running application
- In paging, programs (processes) are split into equal-sized small chunks called pages
- A cache is a very fast, but expensive, memory in close proximity to the CPU
- In direct mapping, each block of main memory maps to only one cache line
- In associative mapping, a search of all tags in cache is necessary, as the required item can be in any cache block
Reference
- Stallings, William, 2003, Computer Organization & Architecture: Designing for Performance, 6th edn, Pearson Education, Inc., ISBN 0-13-049307-4 [chapters 4, 5 & 6]
- Virtual Memory, CS 105, Tour of the Black Holes of Computing!
- Lecture 13: Memory Hierarchy; Ways to Reduce Misses, DAP Spr. 98, UCB
- Memory Hierarchy, Quynh Dinh
Internet Sources
- Manufacturer sites: Intel, IBM/Motorola
- Search on cache