
Miss Penalty Reduction

ECE 463/521, Profs Conte/Rotenberg/Sair
Dept. of ECE, NC State University

CACHE6-1

Early restart/Critical word first


Early Restart:
- As soon as the requested word arrives, forward it to the processor
- Miss penalty is now just the time to fetch the requested word

Critical Word First:
- Problem still with Early Restart: what if the requested word is in the middle of the block?
- Start fetching the block with the required (critical) word, and fill in the rest of the block afterwards (see the sketch below)
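
A minimal sketch of the fill order under critical word first (the 8-word block size and word 5 as the requested word are assumptions for illustration, not from the slides): the requested word is returned first and the rest of the block wraps around.

/* Sketch: fill order for an 8-word block under critical word first,
   assuming word 5 is the requested (critical) word.
   Output: 5 6 7 0 1 2 3 4 */
#include <stdio.h>

#define WORDS_PER_BLOCK 8

int main(void) {
    int critical = 5;  /* assumed offset of the requested word in the block */
    printf("fill order:");
    for (int i = 0; i < WORDS_PER_BLOCK; i++)
        printf(" %d", (critical + i) % WORDS_PER_BLOCK);
    printf("\n");
    return 0;
}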


Miss penalty reduction via L2 caches


If it takes 100 cycles to go to main memory, why not put a cache between the L1 cache and main memory?

Average access time = Hit time L1 + Miss rate L1 × Miss penalty L1
Miss penalty L1 = Hit time L2 + Miss rate L2 × Miss penalty L2

[Figure: L1 cache --(4 cycles)--> L2 cache --(100 cycles)--> main memory]

Hit time L1 = 1 cycle
Miss rate L1 = 0.05
Hit time L2 = 4 cycles
Miss rate L2 = 0.01
Miss penalty L2 = 100 cycles

Miss penalty L1 = 4 + 0.01 × 100 = 5 cycles
Average access time = 1 + 0.05 × 5 = 1.25 cycles
(w/o L2: 1 + 0.05 × 100 = 6 cycles)
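
To make the arithmetic concrete, here is a minimal sketch that plugs the numbers above into the two formulas; the variable names are assumptions chosen to mirror the slide's terms, not from the slides.

/* Minimal sketch: two-level AMAT using the slide's numbers.
   Variable names are illustrative, not from the slides. */
#include <stdio.h>

int main(void) {
    double hit_time_L1 = 1.0, miss_rate_L1 = 0.05;
    double hit_time_L2 = 4.0, miss_rate_L2 = 0.01, miss_penalty_L2 = 100.0;

    double miss_penalty_L1 = hit_time_L2 + miss_rate_L2 * miss_penalty_L2;  /* 5 cycles */
    double amat_with_L2    = hit_time_L1 + miss_rate_L1 * miss_penalty_L1;  /* 1.25 cycles */
    double amat_without_L2 = hit_time_L1 + miss_rate_L1 * miss_penalty_L2;  /* 6 cycles */

    printf("Miss penalty L1 = %.2f cycles\n", miss_penalty_L1);
    printf("AMAT with L2    = %.2f cycles\n", amat_with_L2);
    printf("AMAT without L2 = %.2f cycles\n", amat_without_L2);
    return 0;
}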


Sub-blocking
Problem: Tags are overhead; they take up extra space

Partial Solution: Large blocks reduce the amount of tag storage
- Say we keep the cache size fixed, but double the block size
- Then the number of blocks is halved
- The number of tags is also halved (# tags == # blocks)
- So we've reduced the amount of tag overhead, conserving space on the chip
- BUT large blocks increase the cache miss penalty

Complete Solution:
- Use large blocks, but also divide blocks into smaller sub-blocks
- Fetch only 1 sub-block on a miss
- Keep valid bits for the sub-blocks (see the sketch below)
- Better if the other sub-blocks are prefetched in the background (that is, combine sub-blocking with early restart and critical word first)

[Figure: one cache block holds a single tag, one valid bit per sub-block, and the sub-blocks themselves]
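
A minimal sketch of this organization (the struct layout and field names are illustrative assumptions, not from the slides): one tag covers the whole block, and each sub-block carries its own valid bit, so a miss can fill just a single sub-block.

/* Sketch of a sub-blocked cache line: one tag, per-sub-block valid bits.
   Field names and sizes are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SUB_BLOCKS_PER_BLOCK 4

struct cache_block {
    uint32_t tag;                      /* single tag for the whole block */
    bool valid[SUB_BLOCKS_PER_BLOCK];  /* one valid bit per sub-block */
    /* sub-block data storage omitted */
};

int main(void) {
    struct cache_block b = { .tag = 0x1A2B, .valid = { false, false, false, false } };
    b.valid[2] = true;  /* a miss on sub-block 2 fetches only that sub-block */
    for (int i = 0; i < SUB_BLOCKS_PER_BLOCK; i++)
        printf("sub-block %d: %s\n", i, b.valid[i] ? "valid" : "invalid");
    return 0;
}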


Write buffer in WTNA


WTNA (write-through, no-allocate):
1. Write to the L1 cache, if the block is there
2. Send the write to the next level in the memory hierarchy (it may percolate further down until either a WBWA cache or main memory is reached)

Don't stall the processor
- Do the write in the background
- Need a write buffer (see the sketch below)
- The processor delegates to the write buffer the responsibility for propagating the write down the memory hierarchy
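
A minimal sketch of such a write buffer, assuming a small FIFO (the names and sizes are illustrative, not from the slides): the processor enqueues the write and moves on, and the buffer is drained to the next level in the background.

/* Sketch of a FIFO write buffer for a WTNA cache.
   Structure and names are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define WB_ENTRIES 4

struct wb_entry { uint32_t addr; uint32_t data; };

struct write_buffer {
    struct wb_entry entry[WB_ENTRIES];
    int head, tail, count;
};

/* Processor side: enqueue the write and continue (stall only if full). */
int wb_push(struct write_buffer *wb, uint32_t addr, uint32_t data) {
    if (wb->count == WB_ENTRIES) return 0;  /* full: processor must stall */
    wb->entry[wb->tail] = (struct wb_entry){ addr, data };
    wb->tail = (wb->tail + 1) % WB_ENTRIES;
    wb->count++;
    return 1;
}

/* Background side: send the oldest write to the next level. */
int wb_drain_one(struct write_buffer *wb, struct wb_entry *out) {
    if (wb->count == 0) return 0;
    *out = wb->entry[wb->head];
    wb->head = (wb->head + 1) % WB_ENTRIES;
    wb->count--;
    return 1;
}

int main(void) {
    struct write_buffer wb = { .head = 0, .tail = 0, .count = 0 };
    wb_push(&wb, 0x1000, 42);  /* write is done from the processor's point of view */
    struct wb_entry e;
    while (wb_drain_one(&wb, &e))
        printf("write 0x%x (data %u) -> next level\n", (unsigned)e.addr, (unsigned)e.data);
    return 0;
}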


Write buffer in WTNA (cont.)


Consider a read miss
- Before the L1 cache requests the block from the next level, it must check the write buffer for any pending writes to the requested block (see the sketch below)
  - If there are any pending writes to the block, either:
    - Drain the write buffer before requesting the block from the next level (i.e., wait until the writes are performed), OR
    - Request the block and merge in the write data when the block arrives
- Cheaper solution:
  - Do not check the write buffer
  - Must always drain the write buffer before requesting a block from the next level (whether or not there are pending writes to the requested block)
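
A minimal sketch of the address check (names and sizes are illustrative assumptions): the missing block's address is compared against every pending write-buffer entry, which corresponds to the "=?" comparators in the figure on the next slide.

/* Sketch: on a read miss, compare the missing block address against all
   pending write-buffer entries. Names and sizes are illustrative. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WB_ENTRIES 4

struct wb_entry { uint32_t block_addr; bool pending; };

bool wb_has_pending_write(const struct wb_entry wb[], uint32_t block_addr) {
    for (int i = 0; i < WB_ENTRIES; i++)
        if (wb[i].pending && wb[i].block_addr == block_addr)
            return true;
    return false;
}

int main(void) {
    struct wb_entry wb[WB_ENTRIES] = {
        { 0x40, true }, { 0x80, true }, { 0, false }, { 0, false }
    };
    uint32_t miss_block = 0x80;
    if (wb_has_pending_write(wb, miss_block))
        printf("drain the buffer (or merge the data) before using block 0x%x\n",
               (unsigned)miss_block);
    else
        printf("no pending writes; fetch block 0x%x immediately\n",
               (unsigned)miss_block);
    return 0;
}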


Write buffer in WTNA (cont.)


[Figure: a write request (address + data) from the processor goes to the L1 cache (WTNA policy) and into the write buffer; on a read miss, the block address is extracted and compared (=?) against every write-buffer entry; buffered writes drain to the next level in the memory hierarchy (L2 cache or main memory)]


Write buffer in WBWA


WBWA (write-back, write-allocate):
- A write miss, like a read miss, causes an allocation
- Stall the processor until the requested block is received, then perform the write to the block

Don't stall the processor
- Do the allocation and write in the background
- Need a write buffer (and more; see the next slide)
- The processor delegates to the write buffer the responsibility for performing the write when the requested block makes it into the cache


Write buffer in WBWA (cont.)


But there's more to it
- Need special hardware to track the miss
  - Miss Handling Status Register (MHSR), sketched below
- This takes us into the realm of a more sophisticated class of caches, non-blocking (or lockup-free) caches, that support:
  - Hit-under-miss
    - Free up the cache port for subsequent cache hits, even though there is an outstanding miss
  - Miss-under-miss
    - Support multiple outstanding misses (multiple MHSRs)
- The previous slide requires a limited non-blocking cache
  - Hit-under-write-miss
    - Subsequent read and write hits may proceed despite prior write misses
  - Write-miss-under-write-miss
    - One or more MHSRs for tracking write misses
- A read miss stalls our simple in-order processor, so nothing fancy is needed with respect to read misses
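
A minimal sketch of one MHSR entry for a write miss (the fields are illustrative assumptions, not from the slides): while the entry is valid, later hits can still be serviced; when the block arrives, the buffered write is performed and the entry is freed.

/* Sketch of an MHSR entry tracking one outstanding write miss.
   Field names are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct mhsr {
    bool     valid;        /* an outstanding miss is being tracked */
    uint32_t block_addr;   /* block being fetched from the next level */
    uint32_t write_offset; /* where in the block the pending write lands */
    uint32_t write_data;   /* data to write once the block arrives */
};

int main(void) {
    /* write miss: allocate an MHSR and let the processor keep going */
    struct mhsr m = { .valid = true, .block_addr = 0x2000,
                      .write_offset = 8, .write_data = 7 };

    /* ... later read/write hits proceed (hit-under-write-miss) ... */

    /* block arrives: perform the buffered write, free the MHSR */
    printf("block 0x%x arrived; write %u at offset %u; MHSR freed\n",
           (unsigned)m.block_addr, (unsigned)m.write_data,
           (unsigned)m.write_offset);
    m.valid = false;
    return 0;
}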


Writeback buffer in WBWA


Recall what causes a writeback
- A read or write miss in the cache (requested block X not found)
- Identify a victim block V for replacement
- If block V is dirty, write back block V to the next level
- Fetch block X
- Finally, perform the read or write

The writeback need not stall this process
- The processor wants block X
- Block V has nothing to do with block X
  - Except that V needs to leave to make room for X
  - Don't need to wait for the writeback
  - Transfer V from the cache to a temporary holding area (a writeback buffer, sketched below)
  - Deal with the writeback after X has been taken care of
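
A minimal sketch of a one-entry writeback buffer (the names and single-entry size are illustrative assumptions): the dirty victim V is parked here so the fetch of block X can start immediately, and V is written to the next level afterwards.

/* Sketch of a one-entry writeback buffer holding a dirty victim block.
   Names and the single-entry size are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct writeback_buffer {
    bool     valid;       /* a victim is waiting to be written back */
    uint32_t victim_addr; /* address of dirty victim block V */
};

int main(void) {
    struct writeback_buffer wbb = { .valid = false, .victim_addr = 0 };

    /* miss on block X: park the dirty victim V in the writeback buffer ... */
    wbb.valid = true;
    wbb.victim_addr = 0x3000;
    printf("fetching block X now; victim 0x%x parked in writeback buffer\n",
           (unsigned)wbb.victim_addr);

    /* ... and only after X has been handled, write V back to the next level */
    printf("writing back victim 0x%x to the next level\n", (unsigned)wbb.victim_addr);
    wbb.valid = false;
    return 0;
}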

