
Computer Architecture and Organization

Lecture 16: Cache Performance

Majid Khabbazian mkhabbazian@ualberta.ca


Electrical and Computer Engineering University of Alberta

April 9, 2013

CPU execution time: revisited


Let's now account for cycles during which the processor is stalled waiting for a memory access:
CPU execution time = (CPU clock cycles + Memory stall cycles) x Clock cycle time

Memory stall cycles


Simplifying assumptions:
CPU clock cycles include the time to handle a cache hit
The processor is stalled during a cache miss

Memory stall cycles = Number of misses x Miss penalty
= IC x (Memory accesses / Instruction) x Miss rate x Miss penalty

Miss rates and miss penalties are often different for reads and writes!
Memory stall cycles = IC x Reads per instruction x Read miss rate x Read miss penalty
+ IC x Writes per instruction x Write miss rate x Write miss penalty
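
As a quick illustration, here is a minimal Python sketch of this split formula; the instruction count, access rates, miss rates, and penalties below are made-up example values, not figures from the lecture.

# Memory stall cycles split into read and write contributions (hypothetical inputs)
IC = 1_000_000            # instruction count
reads_per_instr = 1.2     # includes the instruction fetch itself
writes_per_instr = 0.3
read_miss_rate, read_miss_penalty = 0.02, 25
write_miss_rate, write_miss_penalty = 0.05, 25

stall_cycles = (IC * reads_per_instr * read_miss_rate * read_miss_penalty
                + IC * writes_per_instr * write_miss_rate * write_miss_penalty)
print(stall_cycles)       # 975000.0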

Example 15.2
Assume we have a computer where the cycles per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
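
A sketch of the arithmetic in Python, assuming each instruction makes one instruction-fetch access plus the stated data accesses (1.5 memory accesses per instruction):

# Example 15.2 worked out
cpi_ideal = 1.0
mem_accesses_per_instr = 1.0 + 0.5            # fetch + 50% loads/stores
miss_rate, miss_penalty = 0.02, 25

stall_cycles_per_instr = mem_accesses_per_instr * miss_rate * miss_penalty   # 0.75
cpi_real = cpi_ideal + stall_cycles_per_instr                                # 1.75
print(cpi_real / cpi_ideal)                   # 1.75: the all-hit machine is 1.75x faster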

Average memory access time


A measure of memory hierarchy performance

Average memory access time = Hit time + Miss rate x Miss penalty
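
For example, with a 1-cycle hit time, a 2% miss rate, and a 200-cycle miss penalty (illustrative numbers only):

amat = 1 + 0.02 * 200     # = 5 cycles per memory access on average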

Impact on Processor Performance


What is the impact of average memory access time on processor performance?
Assumptions:
The processor stalls during misses.
We have an in-order execution processor.
The memory hierarchy dominates other sources of stalls.

Then, average memory access time can somewhat predict processor performance.


The impact is higher on processors with low CPI

Example 15.3
Assume that the cache miss penalty is 200 clock cycles, and all instructions normally take 1.0 clock cycles (ignoring memory stalls). Assume that the average miss rate is 2%, and there is an average of 1.5 memory references per instruction.
What is the impact on performance when the behavior of the cache is included? Compare this to the case where there is no cache.
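
A sketch of the calculation, treating "no cache" as every memory reference paying the full 200-cycle penalty:

# Example 15.3 worked out
cpi_base = 1.0
mem_refs_per_instr = 1.5
miss_rate, miss_penalty = 0.02, 200

cpi_with_cache = cpi_base + mem_refs_per_instr * miss_rate * miss_penalty   # 1 + 6 = 7
cpi_no_cache = cpi_base + mem_refs_per_instr * 1.0 * miss_penalty           # 1 + 300 = 301
print(cpi_with_cache, cpi_no_cache, cpi_no_cache / cpi_with_cache)          # 7.0 301.0 ~43x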

Some basic cache optimization

Intel Core i7 Cache Hierarchy


Figure: processor package with Core 0 through Core 3; each core has its own registers, split L1 d-cache and i-cache, and a private L2 unified cache; all cores share an L3 unified cache, which is backed by main memory.

L1 i-cache and d-cache: 32 KB, 8-way, Access: 4 cycles
L2 unified cache: 256 KB, 8-way, Access: 11 cycles
L3 unified cache (shared by all cores): 8 MB, 16-way, Access: 30-40 cycles
Block size: 64 bytes for all caches.

Intel Smart Cache


Improve Cache Performance


Improve cache and memory access times:
Average memory access time = Hit time + Miss rate x Miss penalty

Can we reduce each of these? Simultaneously?

Improve performance by:
1. Reducing the miss rate,
2. Reducing the miss penalty, or
3. Reducing the time to hit in the cache.


1) Larger Block Size


Larger block size to reduce miss rates
Take advantage of spatial locality

The block size should not be too large!


Extreme case: there is only one block
What if the working set is larger than the cache?
Also, a larger block size increases the miss penalty!

How to decide then?



Example 16.1
Assume the memory system takes 80 clock cycles of overhead and then delivers 16 bytes every 2 clock cycles. Based on the following table, which block size has the smallest average memory access time? Assume the hit time is 1 clock cycle, independent of block size.

Block size (bytes)    Miss rate
16                    3.94%
32                    2.87%
64                    2.64%
128                   2.77%
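
A sketch of the comparison, using the miss penalty implied by the statement above (80 cycles of overhead plus 2 cycles per 16 bytes transferred):

# Example 16.1: average memory access time for each candidate block size
hit_time = 1
miss_rates = {16: 0.0394, 32: 0.0287, 64: 0.0264, 128: 0.0277}

for block_size, miss_rate in miss_rates.items():
    miss_penalty = 80 + 2 * (block_size // 16)     # cycles to fetch the whole block
    amat = hit_time + miss_rate * miss_penalty
    print(block_size, round(amat, 3))
# 16 -> 4.231, 32 -> 3.411, 64 -> 3.323 (smallest), 128 -> 3.659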


2) Larger Caches
Larger caches to reduce miss rate
The obvious way to reduce capacity misses

Drawbacks?
Potentially longer hit time
Higher cost
Higher power


3) Higher Associativity
Higher associativity to reduce miss rate
Reduces conflict misses

Drawbacks?
Longer hit time


4) Multilevel Caches
Multilevel caches to reduce miss penalty
Reducing the miss penalty can be just as beneficial as reducing the miss rate
The miss penalty keeps growing (DRAM becomes relatively slower than CPUs)

Average memory access time


How to analyze for a two-level cache?
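
The standard approach is to nest the single-level formula: the miss penalty of L1 is the average access time of L2.

Average memory access time = Hit time(L1) + Miss rate(L1) x (Hit time(L2) + Miss rate(L2) x Miss penalty(L2))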

