
Module 2
Lesson 6
Embedded Processors and Memory-II
Version 2 EE IIT, Kharagpur
Instructional Objectives

After going through this lesson the student would get to know the following:

- Memory Hierarchy
- Cache Memory
  - Different types of cache mappings
  - Cache impact on system performance
- Dynamic Memory
  - Different types of dynamic RAMs
- Memory Management Unit

Pre-Requisite

Digital Electronics, Microprocessors

6.1 Memory Hierarchy

The objective is to combine inexpensive and fast memory.

- Main memory
  - large, inexpensive, slow memory; stores the entire program and data
- Cache
  - small, expensive, fast memory; stores copies of likely-accessed parts of the larger memory
  - there can be multiple levels of cache

Fig. 6.1 The Memory Hierarchy (Processor registers, Cache, Main memory, Disk, Tape, from fastest to slowest)

6.2 Cache

- Usually designed with SRAM
  - faster but more expensive than DRAM
- Usually on the same chip as the processor
  - space is limited, so the cache is much smaller than off-chip main memory
  - faster access (1 cycle vs. several cycles for main memory)
- Cache operation
  - a request for main memory access (read or write) arrives
  - first, the cache is checked for a copy
  - cache hit: the copy is in the cache, quick access
  - cache miss: the copy is not in the cache; the address and possibly its neighbors are read into the cache
- Several cache design choices
  - cache mapping, replacement policies, and write techniques

6.3 Cache Mapping

- Cache mapping is necessary because there are far fewer cache addresses than main memory addresses
- It answers the question: are the address's contents in the cache?
- Cache mapping assigns a main memory address to a cache address and determines hit or miss
- Three basic techniques:
  - Direct mapping
  - Fully associative mapping
  - Set-associative mapping
- Caches are partitioned into indivisible blocks or lines of adjacent memory addresses
  - usually 4 or 8 addresses per line

Direct Mapping

- The main memory address is divided into two fields
  - Index
    - the cache address
    - number of bits determined by the cache size
  - Tag
    - compared with the tag stored in the cache at the address indicated by the index
    - if the tags match, check the valid bit
- Valid bit
  - indicates whether the data in the slot has been loaded from memory
- Offset
  - used to find a particular word in the cache line
Fig. 6.2 Direct Mapping (tag, index and offset fields; valid bit and tag comparison select the data)

Fig. 6.4 Set Associative Mapping (one index selects a set; the tags of all ways are compared in parallel)

Fully Associative Mapping

- The complete main memory address is stored in each cache address
- All addresses stored in the cache are simultaneously compared with the desired address
- Valid bit and offset are the same as in direct mapping

Fig. 6.3 Fully Associative Mapping (tag and offset fields; every stored tag compared in parallel)

Set-Associative Mapping

- A compromise between direct mapping and fully associative mapping
- The index is the same as in direct mapping
- But each cache address contains the content and tags of 2 or more memory address locations
- The tags of that set are simultaneously compared, as in fully associative mapping
- A cache with set size N is called N-way set-associative
  - 2-way, 4-way and 8-way are common

6.4 Cache-Replacement Policy

- The technique for choosing which block to replace
  - when a fully associative cache is full
  - when a set-associative cache's line is full
- A direct-mapped cache has no choice
- Random
  - replace a block chosen at random
- LRU: least-recently used
  - replace the block not accessed for the longest time
- FIFO: first-in first-out
  - push a block onto the queue when accessed
  - choose the block to replace by popping the queue

6.5 Cache Write Techniques

- When written to, the data cache must update main memory
- Write-through
  - write to main memory whenever the cache is written to
  - easiest to implement
  - the processor must wait for the slower main memory write
  - potential for unnecessary writes
- Write-back
  - main memory is only written when a "dirty" block is replaced
  - an extra dirty bit for each block is set when the cache block is written to
  - reduces the number of slow main memory writes
6.6 Cache Impact on System Performance

- Most important parameters in terms of performance:
  - Total size of the cache
    - the total number of data bytes the cache can hold
    - tag, valid and other housekeeping bits are not included in the total
  - Degree of associativity
  - Data block size
- Larger caches achieve lower miss rates but a higher access cost, e.g.:
  - 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
    - avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
  - 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
    - avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
  - 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
    - avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles

6.7 Cache Performance Trade-Offs

- Improving the cache hit rate without increasing the cache size:
  - increase the line size
  - change the set-associativity

Fig. 6.5 Cache Performance (% cache miss vs. cache size, 1 Kb to 128 Kb, for 1-way, 2-way, 4-way and 8-way caches)

6.8 Advanced RAM

- DRAMs are commonly used as main memory in processor-based embedded systems
  - high capacity, low cost
- Many variations of DRAM have been proposed
  - needed to keep pace with processor speeds
  - FPM DRAM: fast page mode DRAM
  - EDO DRAM: extended data out DRAM
  - SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM
  - RDRAM: Rambus DRAM

6.9 Basic DRAM

- The address bus is multiplexed between row and column components
- Row and column addresses are latched in, sequentially, by strobing the ras (row address strobe) and cas (column address strobe) signals, respectively
- Refresh circuitry can be external or internal to the DRAM device
  - it strobes consecutive memory addresses periodically, causing the memory content to be refreshed
  - refresh circuitry is disabled during a read or write operation

Fig. 6.6 The Basic Dynamic RAM Structure (bit storage array with row and column decoders, sense amplifiers, refresh circuit, address and data buffers; ras, cas, rd/wr and clock inputs)

Fast Page Mode DRAM (FPM DRAM)

- Each row of the memory bit array is viewed as a page
- A page contains multiple words
- Individual words are addressed by the column address
- Timing diagram:
  - the row (page) address is sent once
  - 3 words are read consecutively by sending the column address for each
- An extra cycle is eliminated on each read/write of words from the same page


Fig. 6.7 The timing diagram in FPM DRAM (row address sent once, then col, col, col, with a data word after each cas)

Extended Data Out DRAM (EDO DRAM)

- An improvement of FPM DRAM
- An extra latch before the output buffer
  - allows strobing of cas before the data read operation is completed
- Reduces read/write latency by an additional cycle

Fig. 6.8 The timing diagram in EDO RAM (speedup through overlap of cas and data transfer)

Synchronous (SDRAM) and Enhanced Synchronous (ESDRAM) DRAM

- SDRAM latches data on the active edge of the clock
- Eliminates the time needed to detect the ras/cas and rd/wr signals
- A counter is initialized to the column address and then incremented on the active edge of the clock to access consecutive memory locations
- ESDRAM improves on SDRAM
  - added buffers enable overlapping of column addressing
  - faster clocking and lower read/write latency are possible

Fig. 6.9 The timing diagram in SDRAM (one row and one column address, then one data word per clock cycle)

Rambus DRAM (RDRAM)

- More of a bus interface architecture than a DRAM architecture
- Data is latched on both the rising and falling edges of the clock
- Broken into 4 banks, each with its own row decoder
  - can have 4 pages open at a time
- Capable of very high throughput

6.10 DRAM Integration Problem

- SRAM is easily integrated on the same chip as the processor
- DRAM is more difficult
  - the chip-making process differs between DRAM and conventional logic
  - goal of conventional logic (IC) designers: minimize parasitic capacitance to reduce signal propagation delays and power consumption
  - goal of DRAM designers: create capacitor cells that retain stored information
  - integration processes are beginning to appear

6.11 Memory Management Unit (MMU)

- Duties of the MMU
  - handles DRAM refresh, bus interface and arbitration
  - takes care of memory sharing among multiple processors
  - translates logical memory addresses from the processor into physical memory addresses
- Modern CPUs often come with a built-in MMU
- Single-purpose processors can also be used

6.12 Questions

Q1. Discuss the different types of cache mappings.

Ans:
Direct, Fully Associative and Set-Associative mapping.

Q2. Discuss the effect of the size of the cache memory on system performance.

Ans:
See Fig. 6.5 Cache Performance (% cache miss vs. cache size, 1 Kb to 128 Kb, for 1-way, 2-way, 4-way and 8-way caches): the miss rate falls as the cache grows and as the associativity increases.

Q3. Discuss the differences between EDO RAM and SDRAM.

Ans:
EDO RAM (Fig. 6.8): after the row address, successive column addresses (col, col, col) are strobed while the previous data word is still being read out; speedup through overlap.
SDRAM (Fig. 6.9): row and column addresses are latched on the clock edge; after one column address, consecutive data words are delivered on successive clock cycles.
