
Unit-III : Memory

Topics

Types and hierarchy model
Memory-level organization
Cache memory
Performance considerations
Mapping
Virtual memory
Swapping
Paging
Segmentation
Replacement policies

Random-Access Memory (RAM)


Key features

RAM is packaged as a chip. Basic storage unit is a cell (one bit per cell). Multiple RAM chips form a memory.

Static RAM (SRAM)


Each cell stores a bit with a six-transistor circuit.

Retains its value indefinitely, as long as it is kept powered. Relatively insensitive to disturbances such as electrical noise. Faster and more expensive than DRAM.

Dynamic RAM (DRAM)


Each cell stores a bit with a capacitor and a transistor. The value must be refreshed every 10-100 ms. Sensitive to disturbances. Slower and cheaper than SRAM.

SRAM vs DRAM Summary

        Tran. per bit   Access time   Persist?   Sensitive?   Cost   Applications
SRAM    6               1X            Yes        No           100X   Cache memories
DRAM    1               10X           No         Yes          1X     Main memories, frame buffers

Traditional Architecture

[Figure 5.1. Connection of the memory to the processor: the processor's MAR (memory address register) drives a k-bit address bus, giving up to 2^k addressable locations, and its MDR (memory data register) connects to an n-bit data bus (word length = n bits); control lines (R/W, M/IO, etc.) complete the interface.]

Conventional DRAM Organization


A d x w DRAM stores d x w total bits, organized as d supercells of w bits each.


[Figure: a 16 x 8 DRAM chip organized as a 4 x 4 array of 8-bit supercells, e.g., supercell (2,1). The memory controller sends 2-bit row and column addresses over the addr lines and transfers data over 8 data lines; an internal row buffer holds one row at a time.]

Reading DRAM Supercell (2,1)


Step 1(a): Row access strobe (RAS) selects row 2. Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: the memory controller puts RAS = 2 on the addr lines; row 2 of the 16 x 8 DRAM array is copied into the internal row buffer.]

Reading DRAM Supercell (2,1)


Step 2(a): Column access strobe (CAS) selects column 1. Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
[Figure: the memory controller puts CAS = 1 on the addr lines; supercell (2,1) is read out of the internal row buffer onto the 8 data lines and returned to the CPU via the memory controller.]
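To make the two-step RAS/CAS protocol concrete, here is a minimal C sketch of the 16 x 8 chip above (4 x 4 supercells); the function names and the stored value are illustrative, not part of any real DRAM interface:

#include <stdio.h>
#include <stdint.h>

/* Illustrative model of a 16 x 8 DRAM chip: 4 rows x 4 cols of
 * 8-bit supercells, plus an internal row buffer. */
#define ROWS 4
#define COLS 4

static uint8_t cells[ROWS][COLS];   /* the DRAM array      */
static uint8_t row_buffer[COLS];    /* internal row buffer */

/* Step 1: RAS copies an entire row into the row buffer. */
void ras(int row) {
    for (int c = 0; c < COLS; c++)
        row_buffer[c] = cells[row][c];
}

/* Step 2: CAS selects one supercell from the buffered row. */
uint8_t cas(int col) {
    return row_buffer[col];
}

int main(void) {
    cells[2][1] = 0xAB;   /* pretend supercell (2,1) holds 0xAB */
    ras(2);               /* RAS = 2: row 2 -> row buffer       */
    printf("supercell (2,1) = 0x%02X\n", cas(1));   /* CAS = 1  */
    return 0;
}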

Memory Modules
A 64 MB memory module can be built from eight 8M x 8 DRAMs (DRAM 0 through DRAM 7). To read the 64-bit doubleword at main memory address A, the memory controller sends the same supercell address (row = i, col = j) to all eight chips: DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7 supplying bits 56-63. The controller assembles the eight bytes into the 64-bit doubleword.

Enhanced DRAMs
All enhanced DRAMs are built around the conventional DRAM core.

Fast page mode DRAM (FPM DRAM)


Access the contents of a row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].

Extended data out DRAM (EDO DRAM)


Enhanced FPM DRAM with more closely spaced CAS signals.

Synchronous DRAM (SDRAM)


Driven with the rising clock edge instead of asynchronous control signals.

Double data-rate synchronous DRAM (DDR SDRAM)


Enhancement of SDRAM that uses both clock edges as control signals.

Video RAM (VRAM)


Like FPM DRAM, but output is produced by shifting the row buffer. Dual ported (allows concurrent reads and writes).

Nonvolatile Memories
DRAM and SRAM are volatile memories

Lose information if powered off.

Nonvolatile memories retain value even if powered off.


The generic name is read-only memory (ROM). This is misleading because some ROMs can be modified as well as read.

Types of ROMs

Programmable ROM (PROM)
Erasable programmable ROM (EPROM)
Electrically erasable PROM (EEPROM)
Flash memory

Firmware

Program stored in a ROM


Boot-time code, BIOS (basic input/output system), graphics cards, disk controllers.

Disk Geometry
Disks consist of platters, each with two surfaces. Each surface consists of concentric rings called tracks. Each track consists of sectors separated by gaps.
[Figure: a disk surface with concentric tracks around the spindle; track k is divided into sectors separated by gaps.]

Disk Geometry (Multiple-Platter View)


Aligned tracks form a cylinder.
[Figure: three platters (0-2) on one spindle give six surfaces (0-5); track k on each surface aligns with the others to form cylinder k.]

Disk Capacity
Capacity: maximum number of bits that can be stored.

Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.

Capacity is determined by these technology factors:


Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.
Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.
Areal density (bits/in^2): product of recording density and track density.

Modern disks partition tracks into disjoint subsets called recording zones.

Each track in a zone has the same number of sectors, determined by the circumference of the innermost track in the zone. Each zone has a different number of sectors/track.

Computing Disk Capacity


Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:

512 bytes/sector
300 sectors/track (on average)
20,000 tracks/surface
2 surfaces/platter
5 platters/disk

Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
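The same computation as a small C program, using the example values above:

#include <stdio.h>

/* Capacity = (# bytes/sector) x (avg # sectors/track) x (# tracks/surface)
 *            x (# surfaces/platter) x (# platters/disk). */
int main(void) {
    double capacity = 512.0    /* bytes/sector     */
                    * 300.0    /* sectors/track    */
                    * 20000.0  /* tracks/surface   */
                    * 2.0      /* surfaces/platter */
                    * 5.0;     /* platters/disk    */

    printf("Capacity = %.0f bytes = %.2f GB\n",
           capacity, capacity / 1e9);   /* vendor GB = 10^9 bytes */
    return 0;
}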

Disk Operation (Single-Platter View)


The disk surface spins around the spindle at a fixed rotational rate. The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air. By moving radially, the arm can position the read/write head over any track.

Disk Operation (Multi-Platter View)


The read/write heads move in unison from cylinder to cylinder. [Figure: multi-platter disk with the arm assembly and spindle.]

Disk Access Time


Average time to access some target sector is approximated by:

Taccess = Tavg seek + Tavg rotation + Tavg transfer

Seek time (Tavg seek)


Time to position heads over cylinder containing target sector. Typical Tavg seek = 9 ms

Rotational latency (Tavg rotation)

Time waiting for the first bit of the target sector to pass under the r/w head. Tavg rotation = (1/2) x (1/RPM) x (60 secs/1 min)

Transfer time (Tavg transfer)

Time to read the bits in the target sector.

Tavg transfer = (1/RPM) x (1/(avg # sectors/track)) x (60 secs/1 min)
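A short C sketch of these formulas; the parameters below (7200 RPM, 9 ms seek, 400 sectors/track) are illustrative assumptions, not values from the text:

#include <stdio.h>

/* Taccess = Tavg seek + Tavg rotation + Tavg transfer. */
int main(void) {
    double rpm = 7200.0, seek_ms = 9.0, sectors_per_track = 400.0;

    /* Half a rotation on average, converted to milliseconds. */
    double rotation_ms = 0.5 * (1.0 / rpm) * 60.0 * 1000.0;
    /* One sector's share of one full rotation. */
    double transfer_ms = (1.0 / rpm) * (1.0 / sectors_per_track) * 60.0 * 1000.0;

    printf("Tavg rotation = %.2f ms\n", rotation_ms);   /* ~4.17 ms  */
    printf("Tavg transfer = %.3f ms\n", transfer_ms);   /* ~0.021 ms */
    printf("Taccess       = %.2f ms\n",
           seek_ms + rotation_ms + transfer_ms);        /* ~13.19 ms */
    return 0;
}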

Logical Disk Blocks


Modern disks present a simpler abstract view of the complex sector geometry:

The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)

Mapping between logical blocks and actual (physical) sectors


Maintained by a hardware/firmware device called the disk controller. Converts requests for logical blocks into (surface, track, sector) triples.

Allows controller to set aside spare cylinders for each zone.

Accounts for the difference in formatted capacity and maximum capacity.

The CPU-Memory Gap


The increasing gap between DRAM, disk, and CPU speeds.
[Figure: access time (ns, log scale) versus year, 1980-2000, for disk seek time, DRAM access time, SRAM access time, and CPU cycle time, showing the widening gap.]

Locality
Principle of Locality:

Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves. Temporal locality: Recently referenced items are likely to be referenced in the near future. Spatial locality: Items with nearby addresses tend to be referenced close together in time.
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Locality Example:

Data: array elements are referenced in succession (stride-1 reference pattern): spatial locality. sum is referenced each iteration: temporal locality.

Instructions: instructions are referenced in sequence: spatial locality. The loop is cycled through repeatedly: temporal locality.

Memory Hierarchies
Some fundamental and enduring properties of hardware and software:

Fast storage technologies cost more per byte and have less capacity. The gap between CPU and main memory speed is widening. Well-written programs tend to exhibit good locality.

These fundamental properties complement each other beautifully.

They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

An Example Memory Hierarchy


Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) devices sit below:

L0: registers. CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM). Holds cache lines retrieved from main memory.
L3: main memory (DRAM). Holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).

Memory Hierarchy
[Figure: the CPU and an I/O processor access main memory through the cache; main memory is backed by magnetic disks and magnetic tapes.]

Cache Memory
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy: for each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work? Programs tend to access the data at level k more often than they access the data at level k+1. Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit. Net effect: a large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.


[Figure 5.14. Use of a cache memory: the processor accesses the cache, which sits in front of main memory.]

Key issues: replacement algorithm, hit/miss handling, write-through vs. write-back, load-through.

Cache Memory Operation


1. The cache fetches data from addresses next to the current address in main memory.
2. The CPU checks whether the next instruction it requires is in the cache.
3. If it is, the instruction is fetched from the cache, a very fast operation.
4. If not, the CPU has to fetch the next instruction from main memory, a much slower process.

[Figure: CPU connected by buses to the cache memory (SRAM), which in turn connects to main memory (DRAM).]

Cache Memory

[Figure: on a hit the CPU is served from the fast cache; on a miss it must go to slow main memory. With a 95% hit ratio:

Access time = 0.95 x (cache access time) + 0.05 x (main memory access time)]

Caching in a Memory Hierarchy


The smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1. Data is copied between levels in block-sized transfer units. The larger, slower, cheaper storage device at level k+1 is partitioned into blocks (in the figure, level k+1 holds blocks 0-15 and level k caches four of them).

General Caching Concepts


Program needs object d, which is stored in some block b.

Cache hit: the program finds b in the cache at level k. E.g., block 14.

Cache miss: b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12. If the level k cache is full, then some current block must be replaced (evicted). Which one is the victim?

Placement policy: where can the new block go? E.g., b mod 4.
Replacement policy: which block should be evicted? E.g., LRU.

General Caching Concepts


Types of cache misses:

Cold (compulsory) miss


Cold misses occur because the cache is empty.

Conflict miss
Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k. E.g., block i at level k+1 must be placed in block (i mod 4) at level k. Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block. E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.

Capacity miss
Occurs when the set of active cache blocks (the working set) is larger than the cache.

Examples of Caching in the Hierarchy


Cache Type            What Cached           Where Cached         Latency (cycles)  Managed By
Registers             4-byte word           CPU registers        0                 Compiler
TLB                   Address translations  On-chip TLB          0                 Hardware
L1 cache              32-byte block         On-chip L1           1                 Hardware
L2 cache              32-byte block         Off-chip L2          10                Hardware
Virtual memory        4-KB page             Main memory          100               Hardware + OS
Buffer cache          Parts of files        Main memory          100               OS
Network buffer cache  Parts of files        Local disk           10,000,000        AFS/NFS client
Browser cache         Web pages             Local disk           10,000,000        Web browser
Web cache             Web pages             Remote server disks  1,000,000,000     Web proxy server

Performance Considerations
Overview
Two key factors: performance and cost

Price/performance ratio
Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.

For a memory hierarchy, it is beneficial if transfers to and from a slower unit can proceed at a rate close to that of the faster unit.
This is not possible if both the slow and the fast units are accessed in the same manner.

However, it can be achieved when parallelism is used in the organization of the slower unit.

Interleaving
If the main memory is structured as a collection of physically separated modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.
[Figure 5.25. Addressing multiple-module memory systems. Each module has its own ABR and DBR. (a) Consecutive words in a module: the high-order k bits of the MM address select the module and the remaining m bits select the word within it. (b) Consecutive words in consecutive modules (interleaving): the low-order k bits select the module, so successive addresses fall in different modules.]
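A small C sketch of the two addressing schemes of Figure 5.25; the module count (4) and module size (64 words) are illustrative assumptions:

#include <stdio.h>
#include <stdint.h>

/* Illustrative sizes: 2^KBITS = 4 modules, 2^MBITS = 64 words/module. */
#define KBITS 2
#define MBITS 6

/* (a) Consecutive words in a module: high-order k bits pick the module. */
void split_block(uint32_t addr, unsigned *module, unsigned *word) {
    *module = addr >> MBITS;
    *word   = addr & ((1u << MBITS) - 1);
}

/* (b) Consecutive words in consecutive modules (interleaving):
 *     low-order k bits pick the module, so successive addresses fall
 *     in different modules and can be accessed in parallel. */
void split_interleaved(uint32_t addr, unsigned *module, unsigned *word) {
    *module = addr & ((1u << KBITS) - 1);
    *word   = addr >> KBITS;
}

int main(void) {
    unsigned m, w;
    for (uint32_t a = 0; a < 4; a++) {
        split_block(a, &m, &w);
        printf("addr %u: scheme (a) module %u word %u", (unsigned)a, m, w);
        split_interleaved(a, &m, &w);
        printf(" | scheme (b) module %u word %u\n", m, w);
    }
    return 0;
}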

Hit Rate and Miss Penalty


The success rate in accessing information at a given level of the memory hierarchy is the hit rate; the failure rate is the miss rate.

Ideally, the entire memory hierarchy would appear to the processor as a single memory unit with the access time of the on-chip cache and the size of the magnetic disk. How closely this ideal is approached depends on the hit rate (>> 0.9).
A miss causes extra time to be spent bringing the desired information into the cache; this extra time is the miss penalty.

Hit Rate and Miss Penalty (cont.)


Tave = hC + (1 - h)M

where Tave is the average access time experienced by the processor, h is the hit rate, M is the miss penalty (the time to access information in main memory), and C is the time to access information in the cache.

Example:
Assume that 30 percent of the instructions in a typical program perform a read/write operation, so there are 130 memory accesses for every 100 instructions executed. Let h = 0.95 for instructions and h = 0.9 for data, with a 1-cycle hit cost, a miss penalty M = 17 clock cycles (interleaved memory), and 10 cycles per access without a cache.
Time without cache / time with cache = (130 x 10) / [100(0.95 x 1 + 0.05 x 17) + 30(0.9 x 1 + 0.1 x 17)] = 1300 / 258 = 5.04
The computer with the cache performs about five times better.
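The example reproduced as a short C program, using the values given above:

#include <stdio.h>

/* 130 memory accesses per 100 instructions; h = 0.95 (instructions),
 * h = 0.9 (data); 1-cycle hits, 17-cycle miss penalty, and 10 cycles
 * per access without a cache. */
int main(void) {
    double without_cache = 130 * 10.0;
    double with_cache = 100 * (0.95 * 1 + 0.05 * 17)   /* instruction fetches */
                      + 30  * (0.90 * 1 + 0.10 * 17);  /* data accesses       */

    printf("speedup = %.2f\n", without_cache / with_cache);   /* ~5.04 */
    return 0;
}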

How to Improve Hit Rate?


Use a larger cache: higher hit rate, but increased cost.
Increase the block size while keeping the total cache size constant: exploits spatial locality. However, if the block size is too large, some items may not be referenced before the block is replaced, and the miss penalty increases.
Use the load-through approach: pass the requested word to the processor as soon as it arrives, instead of waiting for the whole block to be loaded.

Caches on the Processor Chip


On-chip vs. off-chip. Two separate caches for instructions and data, or a single cache for both? Which one has the better hit rate? The single cache. What is the advantage of separate caches? Parallelism, and hence better performance.

Level 1 and Level 2 caches


L1 cache: faster and smaller. Access more than one word simultaneously and let the processor use them one at a time. L2 cache: slower and larger.

How about the average access time?


Average access time: tave = h1C1 + (1 - h1)h2C2 + (1 - h1)(1 - h2)M, where h1 and h2 are the hit rates of the L1 and L2 caches, C1 and C2 are the times to access information in the L1 and L2 caches, and M is the time to access information in main memory.
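A quick sketch of the two-level formula; the parameter values here (h1 = 0.95, h2 = 0.90, C1 = 1, C2 = 10, M = 100 cycles) are illustrative assumptions, not from the text:

#include <stdio.h>

/* tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M */
int main(void) {
    double h1 = 0.95, h2 = 0.90;   /* L1 and L2 hit rates     */
    double C1 = 1.0, C2 = 10.0;    /* L1 and L2 access times  */
    double M  = 100.0;             /* main memory access time */

    double tave = h1 * C1
                + (1 - h1) * h2 * C2
                + (1 - h1) * (1 - h2) * M;

    printf("tave = %.2f cycles\n", tave);   /* 0.95 + 0.45 + 0.50 = 1.90 */
    return 0;
}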

Other Enhancements
Write buffer: the processor doesn't need to wait for a memory write to be completed.

Prefetching: prefetch data into the cache before they are needed.

Lockup-free cache: the processor is able to access the cache while a miss is being serviced.

Mapping
[Figure: main memory locations 00000000-3FFFFFFF must be mapped onto cache locations 00000-FFFFF: address mapping.]

Direct Mapping
Block j of main memory maps onto block (j modulo 128) of the cache.

[Figure 5.15. Direct-mapped cache: main memory blocks 0-4095; cache blocks 0-127, each with a tag. Blocks 0, 128, 256, ... all map to cache block 0, and so on.]

The main memory address is divided into Tag (5 bits) | Block (7 bits) | Word (4 bits):

Word, 4 bits: selects one of the 16 words in a block (16 = 2^4).
Block, 7 bits: points to a particular block in the cache (128 = 2^7).
Tag, 5 bits: compared with the tag bits associated with that cache location, to identify which of the 32 main memory blocks that map there is resident (4096/128 = 32).

Direct Mapping
[Figure: direct mapping example. The address is split into a 10-bit tag and a 20-bit cache address (cache data entries are 16 bits). Address 000 00500 selects cache location 00500; the stored tag 000 matches, so this is a hit and the data 01A6 is returned. Address 100 00500 selects the same location, but the stored tag 000 does not match 100: no match, a miss.]

Direct Mapping with Blocks


[Figure: direct mapping with a block size of 16. Address 000 0050 0 selects the block containing words 00500 (data 01A6), 00501 (0254), and so on; all words in a block share one stored tag (here 000), which is compared with the address tag to decide match or no match.]

Direct Mapping
Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)

11101,1111111,1100

Tag: 11101
Block: 1111111 = 127, the 127th block of the cache
Word: 1100 = 12, the 12th word of the 127th block in the cache
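The same field extraction in C; the mask-and-shift constants follow the 5/7/4 split of Figure 5.15:

#include <stdio.h>
#include <stdint.h>

/* Splits a 16-bit main memory address into Tag(5) | Block(7) | Word(4).
 * 0xEFFC is 11101 1111111 1100 in binary, the example above. */
int main(void) {
    uint16_t addr = 0xEFFC;

    unsigned word  = addr & 0xF;           /* low 4 bits  */
    unsigned block = (addr >> 4) & 0x7F;   /* next 7 bits */
    unsigned tag   = (addr >> 11) & 0x1F;  /* top 5 bits  */

    printf("tag = %u, block = %u, word = %u\n", tag, block, word);
    /* Prints: tag = 29, block = 127, word = 12 */
    return 0;
}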

Associative Mapping
[Figure 5.16. Associative-mapped cache: any of the 4096 main memory blocks can be loaded into any of the 128 cache blocks; each cache block stores a 12-bit tag.]

The main memory address is divided into Tag (12 bits) | Word (4 bits):

Word, 4 bits: selects one of the 16 words in a block (16 = 2^4).
Tag, 12 bits: identifies which of the 4096 main memory blocks is resident in a cache block (4096 = 2^12).

Associative Memory
[Figure: associative memory. Each cache location (00000-FFFFF) stores a full main memory block address (e.g., 00012000, 08000000, 15000000) as a key alongside its data; main memory addresses run from 00000000 to 3FFFFFFF.]

Associative Mapping
[Figure: associative mapping example. Address 00012000 is compared against every stored key (30 bits) in parallel; the matching entry returns its 16-bit data, 01A6. A block can reside in any cache location, so how many comparators are needed? One per location.]

Associative Mapping
Main memory address: Tag (12 bits) | Word (4 bits)

111011111111,1100

Tag: 111011111111
Word: 1100 = 12, the 12th word of a block in the cache

Set-Associative Mapping
[Figure 5.17. Set-associative-mapped cache with two blocks per set: the 128 cache blocks are grouped into 64 sets of two, each block with its own tag; main memory blocks 0-4095 map onto the sets.]

The main memory address is divided into Tag (6 bits) | Set (6 bits) | Word (4 bits):

Word, 4 bits: selects one of the 16 words in a block (16 = 2^4).
Set, 6 bits: points to a particular set in the cache (128/2 = 64 = 2^6).
Tag, 6 bits: compared with the tags of both blocks in the selected set to check whether the desired block is present (4096/64 = 64 = 2^6).

Set-Associative Mapping
[Figure: 2-way set-associative example. Address 000 00500 selects one set, which holds two (tag, data) pairs; both stored tags are compared with the address tag in parallel. The stored tag 000 matches, so the data 01A6 is returned; if neither tag matched, it would be a miss. (20-bit cache address, two 10-bit tags, two 16-bit data fields.)]

Set-Associative Mapping
Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)

111011,111111,1100

Tag: 111011
Set: 111111 = 63, the 63rd set of the cache
Word: 1100 = 12, the 12th word of a block in the 63rd set

Replacement Algorithms
It is difficult to determine which block to kick out. A good candidate is the least recently used (LRU) block: the cache controller tracks references to all blocks as computation proceeds, incrementing or clearing the tracking counters as hits and misses occur.

Replacement Algorithms
For associative and set-associative caches: which location should be emptied when the cache is full and a miss occurs?

First In First Out (FIFO)
Least Recently Used (LRU)

A valid bit distinguishes an empty location from a full one.

Replacement Algorithms
CPU reference: A     B     C     A     D     E     A     D     C     F
Hit or miss:   Miss  Miss  Miss  Hit   Miss  Miss  Miss  Hit   Hit   Miss
Cache (FIFO):  A     AB    ABC   ABC   ABCD  EBCD  EACD  EACD  EACD  EAFD

Hit Ratio = 3 / 10 = 0.3

Replacement Algorithms
CPU reference: A     B     C     A     D     E     A     D     C     F
Hit or miss:   Miss  Miss  Miss  Hit   Miss  Miss  Hit   Hit   Hit   Miss
Cache (LRU):   A     BA    CBA   ACB   DACB  EDAC  AEDC  DAEC  CDAE  FCDA
(most recently used block listed first)

Hit Ratio = 4 / 10 = 0.4
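The LRU trace above can be reproduced with a few lines of C; the four-block fully associative cache and the reference string are exactly as in the table:

#include <stdio.h>
#include <string.h>

#define WAYS 4

int main(void) {
    const char refs[] = "ABCADEADCF";
    char cache[WAYS] = {0};    /* cache[0] = most recently used */
    int hits = 0, n = (int)strlen(refs);

    for (int i = 0; i < n; i++) {
        char r = refs[i];
        int pos = -1;
        for (int j = 0; j < WAYS; j++)
            if (cache[j] == r) pos = j;   /* already resident? */

        int hit = (pos >= 0);
        if (hit) hits++;
        else pos = WAYS - 1;              /* miss: victim is the LRU block */

        memmove(&cache[1], &cache[0], (size_t)pos);   /* shift down */
        cache[0] = r;                     /* r becomes most recently used */

        printf("%c: %s\n", r, hit ? "hit" : "miss");
    }
    printf("Hit ratio = %d/%d = %.1f\n", hits, n, (double)hits / n);
    return 0;
}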
