
MAJOR TECHNICAL PROJECT ON

Reconfigurable Cache Architecture

INTERIM PROGRESS REPORT


to be submitted by

J.Raghunath
B15216

for the award of the degree


of

BACHELOR OF TECHNOLOGY IN
(Electrical) ENGINEERING

SCHOOL OF COMPUTING AND ELECTRICAL ENGINEERING


INDIAN INSTITUTE OF TECHNOLOGY MANDI, MANDI
September 2018
1 Introduction
Since the 1980s, the gap between processor speed and main memory speed has been widening.
Today a top-of-the-line processor runs at about 4.1 GHz, whereas processors in the 1980s ran at
around 5 MHz, roughly the same speed as the main memory of that time. Processor speed rose
rapidly, helped by advances in instruction-level parallelism, but main memory speed did not
keep pace. As the gap grew, main memory became the bottleneck for faster program execution.
Caches were added to the memory hierarchy and partly bridged the speed gap between processor
and memory, but at a cost in area per bit: a cache uses SRAM technology, which needs six
CMOS transistors to store one bit of information.
Cache designs therefore try to find the best trade-off between area (cost) and memory speed.

2 Background

Figure 1: Memory Hierarchy

A cache exploits the temporal and spatial locality of a program's memory accesses. Using this
locality, it gives the processor the illusion of a memory that is as fast as the highest level
of the memory hierarchy and as large as the lowest level. This emulation of main memory
succeeds only when almost all memory requests are served from the highest level, which means
the cache must have a very low miss rate, a very fast hit time and a very low miss penalty.

$$\text{Average memory access time} = \text{hit time} + \text{miss rate} \times \text{miss penalty} \qquad (1)$$
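
As a rough illustration of equation (1), the short Python sketch below evaluates it for two made-up cache configurations. The function name `average_access_time` and all the numbers (in processor cycles) are assumptions chosen for illustration only; they are not measurements from this project.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Equation (1): hit time plus the miss penalty weighted by the miss rate."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures in processor cycles, chosen only for illustration.
small_cache = average_access_time(hit_time=1, miss_rate=0.10, miss_penalty=100)
large_cache = average_access_time(hit_time=2, miss_rate=0.02, miss_penalty=100)

print("small cache:", small_cache, "cycles")
print("large cache:", large_cache, "cycles")
```

With these assumed numbers the larger cache has a slower hit but a lower average access time (about 4 cycles against about 11), because the miss term dominates. A program with a much lower miss rate could see the opposite ordering.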

For an ideal cache, the hit time, miss rate and miss penalty would all be zero. In reality,
these are limited by the finite size of the cache, so various methods have been proposed to
bring the cache closer to this ideal specification.
Each cache level is divided into blocks. A block is a set of contiguous memory locations.
Whenever a miss occurs, the cache retrieves the requested address as a whole block in order to
exploit spatial locality. Blocks that are already present in the cache take care of the temporal
locality of memory accesses. Blocks of main memory map to blocks of the cache according to
whichever of the three block mapping policies is selected.
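
The effect of fetching whole blocks can be sketched in a few lines of Python. The model below is a deliberate simplification: it assumes an unbounded cache, so a miss occurs only the first time a block is touched, and the helper name `count_block_fetches` is hypothetical.

```python
def count_block_fetches(addresses, block_size):
    """Count misses for an unbounded cache that always fetches whole blocks."""
    resident_blocks = set()
    misses = 0
    for address in addresses:
        block_number = address // block_size  # block that contains this address
        if block_number not in resident_blocks:
            misses += 1
            resident_blocks.add(block_number)
    return misses


sequential_scan = range(64)  # byte addresses 0..63, accessed in order
print(count_block_fetches(sequential_scan, block_size=1))   # 64
print(count_block_fetches(sequential_scan, block_size=16))  # 4
```

For a sequential scan, a 16-byte block turns 64 misses into 4, since each fetch also brings in the 15 neighbouring bytes that are accessed next.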
The three block mapping policies are:

1. Direct mapping : Each block of main memory maps to only one specific block in cache
memory, even if other locations in the cache are empty. This gives rise to conflict misses:
a block that needs to be accessed may be absent and can be brought in only by replacing
the block currently held at the location it maps to, even if there is empty space elsewhere.
The advantage is that no tag search circuit is needed: only one tag is stored for each
block (index), and it is read out for comparison to determine whether the address is
present. So the hit time is low, but the miss rate is high due to conflict misses.

Figure 2: Direct Mapped Cache

2. Fully Associative mapping : A block of main memory can map to any block in the cache
memory level. This eliminates the conflict misses that occur in a direct mapped cache, but
the requested tag must be searched for and compared against every stored tag, which
increases the hit time. Conflict misses are zero; misses in this cache occur only as
compulsory misses and capacity misses.

Figure 3: Fully Associative Mapped Cache

3. Set Associative mapping : Tries to blend the advantages of the two caches mentioned
above. A block of main memory can map to any block within one specific set of blocks in the
cache. This reduces conflict misses and limits the number of tag searches and comparisons
to the number of blocks in the set.

Figure 4: Set Associative Mapped Cache
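
The three mapping policies above can be compared with a small Python model. This is a minimal sketch, not the design used in this project: the class name `SetAssociativeCache`, the LRU replacement policy, the cache dimensions and the access trace are all assumptions made for illustration. Setting `ways` to 1 gives a direct mapped cache, and setting it to the total number of blocks gives a fully associative cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache: num_blocks block frames grouped into sets of `ways` frames."""

    def __init__(self, num_blocks, ways, block_size):
        assert num_blocks % ways == 0
        self.block_size = block_size
        self.ways = ways
        self.num_sets = num_blocks // ways
        # One ordered dict of tags per set; order tracks recency (LRU is assumed).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up one address; return True on a hit and False on a miss."""
        block_number = address // self.block_size
        index = block_number % self.num_sets   # which set the block maps to
        tag = block_number // self.num_sets    # identifies the block within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)             # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)          # evict the least recently used tag
        lines[tag] = True
        self.misses += 1
        return False


def miss_count(ways, trace, num_blocks=4, block_size=16):
    """Run a trace through a fresh cache and return the number of misses."""
    cache = SetAssociativeCache(num_blocks, ways, block_size)
    for address in trace:
        cache.access(address)
    return cache.misses


# Two addresses whose blocks collide in a 4-block direct mapped cache.
conflict_trace = [0, 64] * 4
print("direct mapped    :", miss_count(1, conflict_trace))
print("2-way set assoc. :", miss_count(2, conflict_trace))
print("fully associative:", miss_count(4, conflict_trace))
```

On this trace the direct mapped configuration misses on all 8 accesses, because the two blocks keep evicting each other, while the 2-way and fully associative configurations incur only the 2 compulsory misses.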

3 Problem Statement
In a fixed cache architecture, the block size, cache size and associativity are fixed at the
manufacturing stage of the processor, with no scope to change them afterwards. Different
programs have different cache requirements. A large cache size and high associativity lead to
a low miss rate but a long cache search time, which slows down cache access. Programs with
high memory requirements need a large cache and high associativity, because servicing misses
dominates their memory access time. For programs with little memory interaction, servicing
misses is not the main concern; how quickly each access is serviced matters more. If a program
is forced to run with a cache that is far from its actual requirement, cache operation is
suboptimal.
So, there is a strong case for dropping the 'one size fits all' approach.
The following approaches from the literature move away from the fixed cache architecture:

1. Flexi Core Architecture : In this architecture, there is another core in addition to the
processor, which collects program statistics when the program runs for the first time. Based
on these statistics, the Flexi-Core architecture configures the caches for subsequent runs of
the program. Here the program runs suboptimally the first time but adapts in subsequent runs.

Figure 5: Flexi Core Architecture

2. Tournament Caching : There are three modes of cache operation: Normal mode, in which the
cache always starts operating, Small tournament cache mode and Large tournament cache mode.
The tournament length is fixed before the program executes; it is the number of consecutive
hits or misses that triggers a transition between modes. Say the program starts to execute
with the cache in Normal mode. When the number of consecutive hits equals the tournament
length, the cache moves to Small tournament cache mode. The cache remains in Small tournament
cache mode until the number of cache misses equals the tournament length. In this way the
cache adapts to the different cache requirements of different regions of program execution.

Figure 6: Tournament Caching

3. Reconfigurable Cache Architecture (University of Havana) : In this architecture, block size
reduction is the only degree of freedom for cache reconfiguration. The authors implemented
this architecture with the MicroBlaze processor IP on an FPGA and collected execution
statistics.

Figure 7: Reconfigurable Cache

4. Dynamically Tuneable Memory Hierarchy : In this architecture, the cache can be reconfigured
in all three dimensions, namely cache size, block size and associativity. The authors proposed
a hardware architecture to achieve this and a hardware-implementable FSM to control
transitions between the states. The granularity of the modes of operation is high in this
reconfigurable cache. They simulated it in a CACTI simulation environment adapted to cache
reconfiguration. We are trying to implement this architecture on an FPGA with a modified
hardware FSM.

Figure 8: Tuneable Memory Hierarchy
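
The mode transitions described for Tournament Caching (approach 2 above) can be sketched as a small state machine in Python. Only one transition is stated explicitly in the description: Normal to Small after a run of consecutive hits equal to the tournament length. The other transitions in this sketch (Small back to Normal on a run of misses, Normal to Large on a run of misses, Large back to Normal on a run of hits) are assumptions made to complete the example, and the names `Mode` and `TournamentController` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    SMALL = "small tournament"
    LARGE = "large tournament"


class TournamentController:
    """Tracks hit/miss streaks and switches cache mode when a streak
    reaches the tournament length."""

    def __init__(self, tournament_length):
        self.tournament_length = tournament_length
        self.mode = Mode.NORMAL  # the cache always starts in Normal mode
        self.hit_streak = 0
        self.miss_streak = 0

    def record(self, hit):
        """Feed one access outcome (True = hit) and return the current mode."""
        if hit:
            self.hit_streak += 1
            self.miss_streak = 0
        else:
            self.miss_streak += 1
            self.hit_streak = 0

        if self.hit_streak == self.tournament_length:
            self.hit_streak = 0
            if self.mode is Mode.NORMAL:
                self.mode = Mode.SMALL    # stated in the description
            elif self.mode is Mode.LARGE:
                self.mode = Mode.NORMAL   # assumption of this sketch
        elif self.miss_streak == self.tournament_length:
            self.miss_streak = 0
            if self.mode is Mode.SMALL:
                self.mode = Mode.NORMAL   # destination mode is an assumption
            elif self.mode is Mode.NORMAL:
                self.mode = Mode.LARGE    # assumption of this sketch
        return self.mode


controller = TournamentController(tournament_length=3)
for outcome in [True, True, True]:
    controller.record(outcome)
print(controller.mode)
```

After three consecutive hits with a tournament length of 3, the controller has moved from Normal mode to Small tournament cache mode. A hardware FSM would implement the same idea with two saturating counters and a mode register.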

4 Tentative Plan
1. Run simulations on gem5 to study the effect of cache specifications on the execution of
different benchmarks.

2. Implement the Dynamically Tuneable Cache Architecture on a Zynq board.

3. Come up with an FSM model to control transitions between the different modes of cache
operation.

4. Collect execution statistics of benchmark programs.

5. Compare them with the execution statistics of a fixed cache architecture.

5 References
1. A Dynamically Tunable Memory Hierarchy

2. Reconfigurable Caches and their Application to Media Processing

3. Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip
Reconfigurable Fabric

4. Computer Architecture - A Quantitative Approach