
Memory HW for Block Transfer to Cache

Example: DRAM burst mode

EEL 3801

Given: a Memory Bus interface from processor to DRAM main memory that takes the following cycles:
3 memory bus clock cycles to send the address
20 memory bus clock cycles for each DRAM access initiated
1 memory bus clock cycle to send a word of data
Sought: find the number of clock cycles required to transfer a complete block of 4 words from
memory to the cache, for each of the following designs:
Partial Credit 1: the interface between main memory and cache is one word wide.
Solution: 3+20*4+4=87 cycles.
Partial Credit 2: the interface between main memory and cache is 4 words wide (and storage is uninterleaved).
Solution: Here, 3+20+1=24 cycles
Partial Credit 3: the interface between main memory and cache is 1 word wide with interleaved
storage.
Solution: Here, 3+20+4=27 cycles
Partial Credit 4: What is the number of bytes transferred per cycle for the 4-word wide un-interleaved
design?
Answer: 0.67 or 16/24 or 2/3 or 0.667. Here, 1 block = 4 words = 16 bytes. From Partial Credit 2
above, 16 bytes are transferred in 24 cycles. Thus, 16/24 = 2/3 = 0.67 bytes/cycle.
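
The three designs differ only in how many DRAM accesses and how many bus transfers the block needs. A minimal sketch of that arithmetic follows; the function name and the design labels ("narrow", "wide", "interleaved") are illustrative assumptions, not part of the original problem, and the 4 bytes per word comes from the statement that 4 words = 16 bytes.

```python
def block_transfer_cycles(addr, access, xfer, words, design):
    """Memory bus cycles to move one block from DRAM to the cache.

    addr   : cycles to send the address
    access : cycles per DRAM access initiated
    xfer   : cycles to send one bus-width unit of data
    words  : words per block
    design : hypothetical label for the interface organization
    """
    if design == "narrow":
        # 1-word-wide bus, one bank: every word pays a full DRAM access.
        return addr + words * access + words * xfer
    if design == "wide":
        # Bus as wide as the block: one access, one transfer.
        return addr + access + xfer
    if design == "interleaved":
        # 1-word-wide bus, banks accessed in parallel: one access latency,
        # then one transfer per word.
        return addr + access + words * xfer
    raise ValueError(f"unknown design: {design}")


ADDR, ACCESS, XFER, WORDS, BYTES_PER_WORD = 3, 20, 1, 4, 4

for design in ("narrow", "wide", "interleaved"):
    cycles = block_transfer_cycles(ADDR, ACCESS, XFER, WORDS, design)
    bytes_per_cycle = WORDS * BYTES_PER_WORD / cycles
    print(f"{design}: {cycles} cycles, {bytes_per_cycle:.3f} bytes/cycle")
```

With the numbers given above, the three calls reproduce the 87, 24, and 27 cycle results.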
Study Set 12 Memory Components and DRAM


Average Memory Access Time (AMAT)


Miss rate has large impact on memory system performance
Define: Average Memory Access Time (AMAT)
AMAT = Hit time + Miss rate × Miss penalty

Example
CPU with a 1nsec clock cycle time (1 GHz clock rate)
Instruction cache access time = 1 cycle = 1 nsec (time for hit)
Cache miss penalty = 50 cycles = 50 nsec to go to main memory
If cache miss rate = 10% then AMAT = 1 + 0.1 × 50 = 1 + 5 = 6 nsec
6 cycles per instruction on average (memory is slowing down the CPU 6-fold:
(memory access time)/(CPU clock period) = 6 nsec / 1 nsec = 6)
If we reduce cache miss rate to 1% then AMAT = 1 + 0.01 × 50 = 1 + 0.5 = 1.5 nsec
1.5 cycles per instruction on average (memory is slowing down the CPU by 50%)
If hit rate = 99.99% (miss rate = 0.01%) then AMAT = 1 + 0.0001 × 50 = 1 + 0.005 = 1.005 nsec
about 1 cycle per instruction on average (CPU runs at nearly full speed, so it
appears to the CPU as if all of memory is made of cache)
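
The three cases above apply the same formula with different miss rates. A short sketch, assuming times in nanoseconds and a hypothetical helper name `amat`:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


HIT_TIME_NS = 1       # 1 cycle at a 1 GHz clock
MISS_PENALTY_NS = 50  # 50 cycles to reach main memory

# Miss rates from the example: 10%, 1%, and 0.01% (hit rate 99.99%).
for miss_rate in (0.10, 0.01, 0.0001):
    t = amat(HIT_TIME_NS, miss_rate, MISS_PENALTY_NS)
    print(f"miss rate {miss_rate:.2%}: AMAT = {t:.3f} nsec")
```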

Example on Memory Stall Cycles


Consider a program with the given characteristics

Instruction count = 10^6 instructions


30% of instructions are loads and stores
D-cache miss rate is 5% and I-cache miss rate is 1%
Miss penalty is 100 clock cycles for instruction and data caches
Compute combined misses per instruction and memory stall cycles

Combined misses per instruction in I-Cache and D-Cache


1% + 30% × 5% = 0.025 combined misses per instruction
Equal to 25 misses per 1000 instructions

Memory stall cycles


0.025 × 100 (miss penalty) = 2.5 stall cycles per instruction
Total memory stall cycles = 10^6 × 2.5 = 2,500,000
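
The stall-cycle calculation can be checked with a few lines of arithmetic. This is a sketch under the example's assumptions (every instruction is fetched through the I-cache, and only loads and stores access the D-cache); the function name is hypothetical.

```python
def misses_per_instruction(icache_miss_rate, dcache_miss_rate, mem_fraction):
    """Combined I-cache and D-cache misses per instruction.

    Every instruction is fetched once; only the fraction `mem_fraction`
    (loads and stores) also accesses the D-cache.
    """
    return icache_miss_rate + mem_fraction * dcache_miss_rate


INSTRUCTION_COUNT = 10**6
MISS_PENALTY = 100  # clock cycles, same for both caches

mpi = misses_per_instruction(0.01, 0.05, 0.30)
stalls_per_instruction = mpi * MISS_PENALTY
total_stall_cycles = INSTRUCTION_COUNT * stalls_per_instruction

print(f"misses per instruction: {mpi:.3f}")
print(f"stall cycles per instruction: {stalls_per_instruction:.1f}")
print(f"total memory stall cycles: {total_stall_cycles:,.0f}")
```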


Quantifying Cache Impact


Calculating Miss Penalty


Memory Design


conceptual cell organization


Design of a (16-word x 8-bit) SRAM device
[Figure: conceptual organization of the SRAM device. Address input lines A0 to A3 feed an address decoder, which drives one enable line per word (Word0 through Word15). Each word is a row of memory cells drawn as D flip-flops (FF) with in/out connections, one per bit b7 ... b0. Each bit column connects to a Sense / Store circuit on the data input/output lines b7 ... b0. Control inputs: R/W and Memory Enable.]
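
The organization in the figure (address decoder, one enable line per word, a row of flip-flops per word, shared data lines) can be mimicked in a small behavioral model. This is only a sketch: the class name, method names, and the convention that `write=True` means a store are assumptions made for illustration, not taken from the slide.

```python
class Sram16x8:
    """Behavioral model of a small SRAM: Word0..Word15, bits b7..b0."""

    WORDS = 16  # selected by address lines A0..A3
    WIDTH = 8   # data lines b7..b0

    def __init__(self):
        # Each entry stands for one row of 8 flip-flops.
        self.cells = [0] * self.WORDS

    def decode(self, address):
        """Address decoder: returns a one-hot list of word-enable lines."""
        if not 0 <= address < self.WORDS:
            raise ValueError("address must fit in 4 bits")
        return [int(i == address) for i in range(self.WORDS)]

    def access(self, address, enable, write, data_in=0):
        """One memory cycle. Returns the word read, or None if nothing is read."""
        if not enable:                  # Memory Enable deasserted: no access
            return None
        word = self.decode(address).index(1)
        if write:                       # assumed polarity of the R/W control
            self.cells[word] = data_in & ((1 << self.WIDTH) - 1)
            return None
        return self.cells[word]


mem = Sram16x8()
mem.access(address=15, enable=True, write=True, data_in=0xA5)
print(hex(mem.access(address=15, enable=True, write=False)))
```

The one-hot decoder output mirrors the figure: exactly one word-enable line is active per access, so only that row of cells drives or samples the data lines.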

SRAM Cell


using 2 cross-coupled inverters and 2 MOSFETs


Write Operation
The state of the cell is set by placing the appropriate value on bit line b
Signals on bit lines are generated by the Sense/Write circuit
Then, activate the word line, forcing the cell into the corresponding state
Each SRAM bit cell requires 6 transistors (so-called 6T cell)


[Figure: Static RAM bit cell. Two inverters (in/out) connected through transistors T1 and T2 to the bit lines, labeled b0; the word line controls T1 and T2.]


Dynamic RAM Design


Dynamic RAM (DRAM) stores data in the form of a charge on a capacitor. The charge
leaks away with time (in about ten milliseconds), so it must be refreshed; hence
the name dynamic RAM. In return for this refresh overhead, the density is much higher.
Only {1 transistor + 1 capacitor} per bit vs 6T per bit for SRAM
Higher density (more capacity per unit area)
Reduced cost/bit
Much slower access time as a capacitor is involved
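
To make the density comparison concrete, the sketch below counts storage-cell devices for an array of a given size, using the per-bit figures above (6 transistors for SRAM, 1 transistor plus 1 capacitor for DRAM). The function name is hypothetical, and the count covers the cells only, not decoders or sense circuits.

```python
def cell_device_counts(words, bits_per_word):
    """Devices needed for the storage cells alone of a words x bits array."""
    bits = words * bits_per_word
    return {
        "sram_transistors": 6 * bits,  # 6T cell per bit
        "dram_transistors": 1 * bits,  # 1 access transistor per bit
        "dram_capacitors": 1 * bits,   # 1 storage capacitor per bit
    }


# Example: a 16-word x 8-bit array.
for name, count in cell_device_counts(16, 8).items():
    print(f"{name}: {count}")
```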