
MAJOR TECHNICAL PROJECT ON

Reconfigurable Cache Architecture

INTERIM PROGRESS REPORT


to be submitted by

J. Raghunath
B15216

for the award of the degree


of

BACHELOR OF TECHNOLOGY IN
(Electrical) ENGINEERING

SCHOOL OF COMPUTING AND ELECTRICAL ENGINEERING


INDIAN INSTITUTE OF TECHNOLOGY MANDI, MANDI
September 2018
1 Introduction
Since the 1980s there has been a widening gap between processor speed and main memory speed. Today a top-of-the-line processor operates at around 4.1 GHz, where clock rates were on the order of 5 MHz in the 1980s; at that time, the processor and main memory operated at roughly the same speed. Driven by advances in clock scaling and instruction-level parallelism, processor performance kept rising, but main memory speed could not keep pace. Main memory therefore became the bottleneck in the fast execution of programs as the gap continued to grow.
Cache was added to the memory hierarchy and has partly bridged the processor-memory speed gap, but at the cost of area per bit: caches use SRAM technology, which needs six CMOS transistors to store one bit of information.
All solutions try to find the optimum cache configuration against this trade-off between area (cost) and memory speed.

2 Background

Figure 1: Memory Hierarchy [1]

A cache exploits the temporal and spatial locality of a program's memory accesses. Using this, it gives the processor the illusion of a memory that is as fast as the highest level of the hierarchy and as large as the lowest level. This emulation of main memory succeeds only when nearly all memory requests are serviced by the highest level, which requires a very low miss rate, a very fast hit time and a low miss penalty.

Average memory access time = hit time + miss rate × miss penalty (1)

For an ideal cache, the hit time, miss rate and miss penalty are all zero. In reality this is unattainable because the cache size is limited, so various methods have been devised to bring a cache's specification closer to the ideal.
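For instance (the numbers here are illustrative, not measured from our design): with a hit time of 1 cycle, a miss rate of 5% and a miss penalty of 100 cycles, equation (1) gives 1 + 0.05 × 100 = 6 cycles per access on average, six times slower than a cache that always hits.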
Each cache level is divided into blocks. A block is a set of contiguous memory locations; whenever a miss occurs, the cache always retrieves the requested address as a whole block, to exploit spatial locality. The blocks already resident in the cache take care of the temporal locality of memory accesses. Blocks of main memory map to blocks of the cache according to whichever of the three block mapping policies is selected.
The three block mapping policies are:

1. Direct mapping: Each block of main memory maps to exactly one specific block in the cache,

Figure 2: Direct Mapped Cache [2]

even if other cache locations are empty. This gives rise to conflict misses: a block may be absent from the cache and can only be brought in by evicting the block resident at its mapped location, even when empty space exists elsewhere. The advantage is that no tag search circuit is needed; only the single tag stored at the indexed block is read out and compared to determine whether the address is present (see the sketch after this list). So the hit time is low, but the miss rate is high due to conflict misses.

2. Fully associative mapping: A block of main memory can map to any block in the cache

Figure 3: Fully Associative Mapped Cache [2]

level. This removes the conflict misses that occur in a direct-mapped cache, but every tag must be searched and compared against the requested tag, which increases the hit time. Conflict misses are zero; misses occur only due to compulsory and capacity misses.

3. Set associative mapping: This blends the advantages of the two schemes above. A block of main memory maps to a specific set of blocks in the cache.

Figure 4: Set Associative Mapped Cache [2]

This reduces conflict misses while limiting the tag search and comparison to the number of blocks in the set.
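To make the direct-mapped lookup concrete, the following is the minimal Verilog sketch promised above. The module name, signal names and default field widths are ours, chosen for illustration only:

// Minimal sketch of a direct-mapped lookup: only the one tag stored at
// the indexed block is compared, so no search circuit is required.
module dm_lookup #(
    parameter TAG_BITS    = 18,
    parameter INDEX_BITS  = 9,
    parameter OFFSET_BITS = 5
) (
    input  wire [TAG_BITS+INDEX_BITS+OFFSET_BITS-1:0] addr,
    output wire hit
);
    wire [TAG_BITS-1:0]   tag   = addr[TAG_BITS+INDEX_BITS+OFFSET_BITS-1 -: TAG_BITS];
    wire [INDEX_BITS-1:0] index = addr[INDEX_BITS+OFFSET_BITS-1 -: INDEX_BITS];

    reg [TAG_BITS-1:0] tag_array   [0:(1<<INDEX_BITS)-1]; // one tag per block
    reg                valid_array [0:(1<<INDEX_BITS)-1]; // valid bit per block

    // The index selects exactly one candidate block; hit iff its tag matches.
    assign hit = valid_array[index] && (tag_array[index] == tag);
endmodule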

3 Problem Statement
In a fixed cache architecture, the block size, cache size and associativity are fixed at the manufacturing stage of the processor, with no scope to change them subsequently. But different programs have different cache requirements. A large cache size and high associativity give a low miss rate but a longer cache search time, which slows down cache access. A program with high memory demand needs a large, highly associative cache, since servicing misses dominates its memory access time. For a program with little memory interaction, servicing misses is not the main concern; what matters is how fast each access is serviced. Forcing either kind of program to run on a cache far from its actual requirement leads to suboptimal operation.
So there is a strong case for ditching the 'one size fits all' approach.
The following approaches in the literature ditch the fixed cache architecture:

1. Flexi-Core architecture: In addition to the processor, there is another core that collects

Figure 5: Flexi-Core Architecture [3]

program statistics when the program runs for the first time. Based on these statistics, the Flexi-Core architecture configures the caches for subsequent runs of the program. The program therefore runs suboptimally the first time but adapts on subsequent runs.

2. Tournament caching: There are three modes of cache operation: Normal mode,

Figure 6: Tournament Caching [6]

in which the cache always starts operating, Small tournament cache mode and Large tournament cache mode. The tournament length is fixed before the program executes; it is the number of consecutive hits or misses that triggers a transition between modes. For example, when the program starts, the cache operates in Normal mode; once the number of consecutive hits equals the tournament length, the cache moves to Small tournament cache mode, and it remains there until the number of consecutive misses reaches the tournament length. The cache thus adapts to the differing cache requirements of different regions of program execution (a sketch of this transition logic follows this list).

3. Reconfigurable cache architecture (University of Havana): Here a reduction in block size

Figure 7: Reconfigurable Cache [4]

is the only degree of freedom for cache reconfiguration. The authors implemented this architecture with the MicroBlaze processor IP on an FPGA and collected execution statistics.

4. Dynamically tunable memory hierarchy: In this architecture the cache offers

Figure 8: Tunable Memory Hierarchy [8]

reconfiguration freedom in all three dimensions: cache size, block size and associativity. The authors proposed a hardware architecture to achieve this, along with a hardware-implementable FSM to control the transitions between cache states. The granularity of the cache's modes of operation is high in this design. They simulated it in a CACTI simulation environment adapted for cache reconfiguration. We are trying to implement this architecture on an FPGA with a modified hardware FSM.
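The Verilog sketch below shows one plausible reading of the tournament transition logic in [6]: on LEN consecutive hits the mode steps toward the smaller configuration, and on LEN consecutive misses toward the larger one. The module interface, mode encoding and exact stepping policy are our assumptions, not the paper's RTL:

// Sketch of tournament-mode selection as we read it from [6].
// Names, widths and the stepping policy are assumptions.
module tournament_mode #(
    parameter LEN = 16                  // tournament length, fixed pre-run
) (
    input  wire       clk,
    input  wire       rst,
    input  wire       access,           // a cache access completed this cycle
    input  wire       hit,              // that access was a hit
    output reg  [1:0] mode              // current cache mode
);
    localparam NORMAL = 2'd0, SMALL = 2'd1, LARGE = 2'd2;

    reg [15:0] hit_run, miss_run;       // lengths of the current runs

    always @(posedge clk) begin
        if (rst) begin
            mode     <= NORMAL;         // the cache always starts in Normal mode
            hit_run  <= 0;
            miss_run <= 0;
        end else if (access) begin
            // A hit extends the hit run and breaks the miss run, and vice versa.
            hit_run  <= hit ? hit_run + 1 : 16'd0;
            miss_run <= hit ? 16'd0 : miss_run + 1;

            if (hit && hit_run + 1 == LEN) begin
                // LEN consecutive hits: step toward the smaller configuration.
                mode    <= (mode == LARGE) ? NORMAL : SMALL;
                hit_run <= 0;
            end else if (!hit && miss_run + 1 == LEN) begin
                // LEN consecutive misses: step toward the larger configuration.
                mode     <= (mode == SMALL) ? NORMAL : LARGE;
                miss_run <= 0;
            end
        end
    end
endmodule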

4 Implementation
We want to compare a cache built on the fixed-cache philosophy with one built on the reconfigurable-cache philosophy. To study these caches, we will implement them on a Zynq board. As a first step we have implemented the fixed cache in the Verilog hardware description language and obtained simulation results. We have not yet synthesised the design; we will do so shortly. The specification of the fixed cache is as follows:

1. L1 mapping: direct mapped

2. L1 size: 64 KB

3. Word size: 4 bytes

4. Block size: 32 words

5. L2 size: 512 KB

6. L2 associativity: 8-way

7. Address bits: 32

8. Offset bits: 5

9. Index bits: 9

10. Tag bits: 18
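These field widths follow directly from the geometry above. The fragment below, as it might appear inside our L1 module (the constant names are ours), makes the arithmetic explicit:

// Address field widths derived from the fixed-cache geometry above.
// Word addressing: the 5 offset bits select a word within a block.
localparam ADDR_BITS   = 32;
localparam WORD_BYTES  = 4;
localparam BLOCK_WORDS = 32;                                       // 128 B per block
localparam L1_BLOCKS   = (64 * 1024) / (BLOCK_WORDS * WORD_BYTES); // = 512
localparam OFFSET_BITS = $clog2(BLOCK_WORDS);                      // = 5
localparam INDEX_BITS  = $clog2(L1_BLOCKS);                        // = 9
localparam TAG_BITS    = ADDR_BITS - INDEX_BITS - OFFSET_BITS;     // = 18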

Our cache is an emulation of an actual cache: we do not store the data values themselves in the data fields. The cache faithfully follows the behaviour of a real cache except for actually holding data, which saves resources when we implement it on the actual Field Programmable Gate Array.
The address that arrives at the L1 cache, and at the subsequent cache levels, is the actual physical address, making our cache a Physically Indexed, Physically Tagged (PIPT) cache. The alternative would have been a Virtually Indexed, Physically Tagged (VIPT) cache, in which L1 receives the virtual address, reducing L1 access latency, while the remaining levels receive the physical address after translation through the TLB and the page table maintained by the OS. In short, our assumption is that each level receives the already-translated physical address.
A cache controller, a finite state machine, is also implemented to control the operations performed by the cache at both levels. The cache controller has the following states:

1. S1, Read from L1: on a hit the controller moves to the Reset state; on a miss it transitions to the Read from L2 state.

2. S2, Read from L2: on a miss the controller first moves to the Write L2 state and then to the Write L1 state; on a hit it moves directly to the Write L1 state.

3. S3, Write L1: the controller sends the signal that enables writing in the L1 stage of the cache, then moves to the Reset state.

4. S4, Write L2: the controller sends the signal that enables writing in the L2 stage of the cache, then moves to the Write L1 state.

5. S5, Reset: the controller starts from this state whenever a new address needs to be processed, and moves to Read from L1.

Figure: Cache controller state transition diagram (states S1-S5)
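The following minimal Verilog sketch encodes exactly the transitions listed above, though the handshake signal names and the state encoding are illustrative rather than our final interface:

// Sketch of the five-state cache controller described above.
module cache_ctrl (
    input  wire clk,
    input  wire rst,
    input  wire new_addr,        // a new address is ready to be processed
    input  wire l1_hit,          // L1 lookup result
    input  wire l2_hit,          // L2 lookup result
    output reg  l1_we,           // write enable for L1
    output reg  l2_we            // write enable for L2
);
    localparam S1_READ_L1  = 3'd1,   // read from L1
               S2_READ_L2  = 3'd2,   // read from L2
               S3_WRITE_L1 = 3'd3,   // allocate block in L1
               S4_WRITE_L2 = 3'd4,   // allocate block in L2
               S5_RESET    = 3'd5;   // idle / start state

    reg [2:0] state;

    always @(posedge clk) begin
        if (rst) state <= S5_RESET;
        else case (state)
            S5_RESET:    if (new_addr) state <= S1_READ_L1;
            S1_READ_L1:  state <= l1_hit ? S5_RESET : S2_READ_L2;
            S2_READ_L2:  state <= l2_hit ? S3_WRITE_L1 : S4_WRITE_L2;
            S4_WRITE_L2: state <= S3_WRITE_L1;   // fill L2, then L1
            S3_WRITE_L1: state <= S5_RESET;      // fill L1, then idle
            default:     state <= S5_RESET;
        endcase
    end

    // Write enables are asserted only in the corresponding write states.
    always @(*) begin
        l1_we = (state == S3_WRITE_L1);
        l2_we = (state == S4_WRITE_L2);
    end
endmodule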

5 Results
1. L1 cache simulation result

Figure 9: L1 cache simulation result

2. L2 cache simulation result

Figure 10: L2 cache simulation result

3. Cache controller simulation result

Figure 11: Cache Controller simulation result

6 Future Plan
The next step in our plan is the reconfigurable cache. This cache will allow reconfiguration of the block size, set associativity and capacity. Block size reconfiguration requires us to look into address translation, since the mapping of the tag and index bits changes. We also need to come up with the cache controller logic that reconfigures the cache: it must manage the transitions based on the collected miss-rate and hit-rate statistics while filtering out noisy transitions.
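As a first cut at the address-translation issue, the tag/index/offset split can be computed with run-time shifts instead of fixed bit slices. This is a sketch under our assumptions (word addressing, power-of-two sizes); the module and signal names are hypothetical:

// Run-time address decomposition when the block size is reconfigurable.
// The cfg_* values would come from the reconfiguration controller.
module reconfig_addr_split (
    input  wire [31:0] addr,             // physical word address
    input  wire [4:0]  cfg_offset_bits,  // log2(words per block)
    input  wire [4:0]  cfg_index_bits,   // log2(number of sets)
    output wire [31:0] index,
    output wire [31:0] tag
);
    // The offset/index/tag boundaries move as the block size changes,
    // so the fields are extracted with variable shifts and a mask.
    assign index = (addr >> cfg_offset_bits) &
                   ((32'd1 << cfg_index_bits) - 32'd1);
    assign tag   = addr >> (cfg_offset_bits + cfg_index_bits);
endmodule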

7 References
1. John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach", Elsevier, 5th Edition.

2. Wikipedia contributors, "Cache Placement Policies", Wikipedia, The Free Encyclopedia, Nov 2018.

3. Daniel Y. Deng, Daniel Lo, Greg Malysa, Skyler Schneider and G. Edward Suh, "Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric".

4. Santana Gil, A. D., Benavides Benitez, J. I., Hernandez Calviño, M., Herruzo Gómez, "Reconfigurable Cache Implemented on an FPGA", 2010 International Conference on Reconfigurable Computing.

5. Ing-Jer Huang, Chun-Hung Lai, Yun-Chung Yang, Hsu-Kang Dow, and Hung-Lun Chen, "A Reconfigurable Cache for Efficient Use of Tag RAM as Scratch-Pad Memory", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 26, No. 4, April 2018.

6. Safaa S. Omran, Ibrahim A. Amory, "Comparative Study of Reconfigurable Cache Memory", CIC-COCOS-17.

7. Ashish Ranjan, Shankar Ganesh Ramasubramanian, Rangharajan Venkatesan, Vijay Pai, Kaushik Roy and Anand Raghunathan, "DyReCTape: A Dynamically Reconfigurable Cache using Domain Wall Memory Tapes".

8. Adam Spanberger, "Designing a Dynamically Reconfigurable Cache for High Performance and Low Power", School of Engineering and Applied Science, University of Virginia.

9. Sparsh Mittal, "Dynamic Cache Reconfiguration Based Techniques for Improving Cache Energy Efficiency", Iowa State University.

10. Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, "A Dynamically Tunable Memory Hierarchy", IEEE Transactions on Computers, Vol. 52, No. 10, October 2003.

11. Parthasarathy Ranganathan, Sarita Adve, Norman P. Jouppi, "Reconfigurable Caches and their Application to Media Processing", Proceedings of the 27th International Symposium on Computer Architecture, June 2000.
