
Module 2
Lesson 6
Embedded Processors and Memory-II
Version 2 EE IIT, Kharagpur
Instructional Objectives

After going through this lesson the student would get to know the following:

- Memory Hierarchy
- Cache Memory
  - Different types of cache mappings
  - Cache impact on system performance
- Dynamic Memory
  - Different types of dynamic RAMs
- Memory Management Unit

Pre-Requisite

Digital Electronics, Microprocessors

6.1 Memory Hierarchy

The objective is to combine inexpensive and fast memory.

- Main memory
  - large, inexpensive, slow memory; stores the entire program and data
- Cache
  - small, expensive, fast memory; stores copies of likely-accessed parts of the larger memory
  - there can be multiple levels of cache

Fig. 6.1 The Memory Hierarchy (Processor registers, Cache, Main memory, Disk, Tape, from fastest to slowest)

6.2 Cache

- Usually designed with SRAM
  - faster but more expensive than DRAM
- Usually on the same chip as the processor
  - space is limited, so the cache is much smaller than off-chip main memory
  - faster access (1 cycle vs. several cycles for main memory)
- Cache operation
  - a request for main memory access (read or write) arrives
  - first, the cache is checked for a copy
  - cache hit: the copy is in the cache, quick access
  - cache miss: the copy is not in the cache; the address and possibly its neighbors are read into the cache
- Several cache design choices
  - cache mapping, replacement policies, and write techniques

6.3 Cache Mapping

- Cache mapping is necessary because there are far fewer cache addresses than main memory addresses
- It answers the question: are the address's contents in the cache?
- Cache mapping assigns a main memory address to a cache address and determines hit or miss
- Three basic techniques:
  - Direct mapping
  - Fully associative mapping
  - Set-associative mapping
- Caches are partitioned into indivisible blocks or lines of adjacent memory addresses
  - usually 4 or 8 addresses per line

Direct Mapping

- The main memory address is divided into two fields
  - Index
    - the cache address
    - number of bits determined by the cache size
  - Tag
    - compared with the tag stored in the cache at the address indicated by the index
    - if the tags match, check the valid bit
- Valid bit
  - indicates whether the data in the slot has been loaded from memory
- Offset
  - used to find a particular word in the cache line
Fig. 6.2 Direct Mapping (tag, index and offset fields; valid bit and tag comparison select the data)

Fig. 6.4 Set Associative Mapping (one index selects a set; the tags of all ways are compared in parallel)

Fully Associative Mapping

- The complete main memory address is stored in each cache address
- All addresses stored in the cache are simultaneously compared with the desired address
- Valid bit and offset are the same as in direct mapping

Fig. 6.3 Fully Associative Mapping (tag and offset fields; every stored tag compared in parallel)

Set-Associative Mapping

- A compromise between direct mapping and fully associative mapping
- The index is the same as in direct mapping
- But each cache address contains the content and tags of 2 or more memory address locations
- The tags of that set are simultaneously compared, as in fully associative mapping
- A cache with set size N is called N-way set-associative
  - 2-way, 4-way and 8-way are common

6.4 Cache-Replacement Policy

- The technique for choosing which block to replace
  - when a fully associative cache is full
  - when a set-associative cache's line is full
- A direct-mapped cache has no choice
- Random
  - replace a block chosen at random
- LRU: least-recently used
  - replace the block not accessed for the longest time
- FIFO: first-in first-out
  - push a block onto the queue when accessed
  - choose the block to replace by popping the queue

6.5 Cache Write Techniques

- When written to, the data cache must update main memory
- Write-through
  - write to main memory whenever the cache is written to
  - easiest to implement
  - the processor must wait for the slower main memory write
  - potential for unnecessary writes
- Write-back
  - main memory is only written when a "dirty" block is replaced
  - an extra dirty bit for each block is set when the cache block is written to
  - reduces the number of slow main memory writes
6.6 Cache Impact on System Performance

- Most important parameters in terms of performance:
  - Total size of the cache
    - the total number of data bytes the cache can hold
    - tag, valid and other housekeeping bits are not included in the total
  - Degree of associativity
  - Data block size
- Larger caches achieve lower miss rates but a higher access cost, e.g.:
  - 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
    - avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
  - 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
    - avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
  - 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
    - avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles

6.7 Cache Performance Trade-Offs

- Improving the cache hit rate without increasing the cache size:
  - increase the line size
  - change the set-associativity

Fig. 6.5 Cache Performance (% cache miss vs. cache size, 1 Kb to 128 Kb, for 1-way, 2-way, 4-way and 8-way caches)

6.8 Advanced RAM

- DRAMs are commonly used as main memory in processor-based embedded systems
  - high capacity, low cost
- Many variations of DRAM have been proposed
  - needed to keep pace with processor speeds
  - FPM DRAM: fast page mode DRAM
  - EDO DRAM: extended data out DRAM
  - SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM
  - RDRAM: Rambus DRAM

6.9 Basic DRAM

- The address bus is multiplexed between row and column components
- Row and column addresses are latched in, sequentially, by strobing the ras (row address strobe) and cas (column address strobe) signals, respectively
- Refresh circuitry can be external or internal to the DRAM device
  - it strobes consecutive memory addresses periodically, causing the memory content to be refreshed
  - refresh circuitry is disabled during a read or write operation

Fig. 6.6 The Basic Dynamic RAM Structure (bit storage array with row and column decoders, sense amplifiers, refresh circuit, address and data buffers; ras, cas, rd/wr and clock inputs)

Fast Page Mode DRAM (FPM DRAM)

- Each row of the memory bit array is viewed as a page
- A page contains multiple words
- Individual words are addressed by the column address
- Timing diagram:
  - the row (page) address is sent once
  - 3 words are read consecutively by sending the column address for each
- An extra cycle is eliminated on each read/write of words from the same page


Fig. 6.7 The timing diagram in FPM DRAM (row address sent once, then col, col, col, with a data word after each cas)

Extended Data Out DRAM (EDO DRAM)

- An improvement of FPM DRAM
- An extra latch before the output buffer
  - allows strobing of cas before the data read operation is completed
- Reduces read/write latency by an additional cycle

Fig. 6.8 The timing diagram in EDO RAM (speedup through overlap of cas and data transfer)

Synchronous (SDRAM) and Enhanced Synchronous (ESDRAM) DRAM

- SDRAM latches data on the active edge of the clock
- Eliminates the time needed to detect the ras/cas and rd/wr signals
- A counter is initialized to the column address and then incremented on the active edge of the clock to access consecutive memory locations
- ESDRAM improves on SDRAM
  - added buffers enable overlapping of column addressing
  - faster clocking and lower read/write latency are possible

Fig. 6.9 The timing diagram in SDRAM (one row and one column address, then one data word per clock cycle)

Rambus DRAM (RDRAM)

- More of a bus interface architecture than a DRAM architecture
- Data is latched on both the rising and falling edges of the clock
- Broken into 4 banks, each with its own row decoder
  - can have 4 pages open at a time
- Capable of very high throughput

6.10 DRAM Integration Problem

- SRAM is easily integrated on the same chip as the processor
- DRAM is more difficult
  - the chip-making process differs between DRAM and conventional logic
  - goal of conventional logic (IC) designers: minimize parasitic capacitance to reduce signal propagation delays and power consumption
  - goal of DRAM designers: create capacitor cells that retain stored information
  - integration processes are beginning to appear

6.11 Memory Management Unit (MMU)

- Duties of the MMU
  - handles DRAM refresh, bus interface and arbitration
  - takes care of memory sharing among multiple processors
  - translates logical memory addresses from the processor into physical memory addresses
- Modern CPUs often come with a built-in MMU
- Single-purpose processors can also be used

6.12 Questions

Q1. Discuss the different types of cache mappings.

Ans:
Direct, Fully Associative and Set-Associative mapping.

Q2. Discuss the effect of the size of the cache memory on system performance.

Ans:
See Fig. 6.5 Cache Performance (% cache miss vs. cache size, 1 Kb to 128 Kb, for 1-way, 2-way, 4-way and 8-way caches): the miss rate falls as the cache grows and as the associativity increases.

Q3. Discuss the differences between EDO RAM and SDRAM.

Ans:
EDO RAM (Fig. 6.8): after the row address, successive column addresses (col, col, col) are strobed while the previous data word is still being read out; speedup through overlap.
SDRAM (Fig. 6.9): row and column addresses are latched on the clock edge; after one column address, consecutive data words are delivered on successive clock cycles.
