
CACHE MEMORY

Dr. Pardeep Kumar

In the last set of lectures


Components of a computer
Functions of a computer
Interconnection Structures
Bus Interconnection
Peripheral Component Interconnect (PCI)

In this set of lectures


Computer Memory System Overview
Cache Memory Principles
Elements of Cache Design
Pentium 4 Cache Organization
ARM Cache Organization

Computer Memory System Overview


Characteristics of Memory Systems
The Memory Hierarchy

Characteristics of Memory Systems


Location
Internal or External

Capacity
Number of words and number of bytes

Unit of transfer
Word or Block

Access Method
Sequential or direct or random or associative

Performance
Access time, cycle time and transfer rate

Physical type
Semiconductor or magnetic or optical or magneto-optical

Physical characteristics
Volatile/non-volatile or erasable/non-erasable

Organization
Memory modules

Characteristics of Memory Systems Location of memory


The term location refers to whether memory is internal or external to the computer.
Internal memory
Registers, cache memory, main memory, I/O buffers

External memory
Peripheral storage devices like disks and tapes

Characteristics of Memory Systems Capacity


Internal memory capacity is often expressed in bytes or words.
External memory capacity is often expressed in megabytes (MB) or gigabytes (GB).
Note that, by computer-industry convention, a lowercase b represents bits and an uppercase B represents bytes.

Characteristics of Memory Systems Unit of transfer


For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module. This may be equal to the word length, but is often larger, such as 64, 128, or 256 bits. Three related concepts are:
Word: The unit of organization of main memory.

Addressable units: In some systems, the addressable unit is the word, while others allow addressing at the byte level. In any case, the relationship between the length in bits A of an address and the number N of addressable units is 2^A = N.
Unit of transfer: For main memory, this is the number of bits read out of memory or written into memory at a time, which need not be equal to the word length or length of addressable unit. For external memory, data are often transferred in much larger units than a word, and these are referred to as blocks.

Characteristics of Memory Systems Method of accessing


Sequential
Memory is organized into units of data called records.
Search for the data starts at the beginning and reads through in order.
Access time depends on the location of the data and the previous location.
e.g. tape

Direct
Individual blocks have a unique address based on the physical location.
Access is by jumping to the block and then performing a sequential search.
Access time depends on location and previous location.
e.g. disk

Characteristics of Memory Systems Method of accessing


Random
Individual physical addresses identify locations exactly.
Access time is independent of location or previous access and is constant.
That is, any location can be selected at random and directly addressed and accessed.
e.g. RAM

Associative
This is a random type of memory access that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously. Thus, a word is retrieved based on a portion of its contents rather than its address. Access time is independent of location or previous access, e.g. cache.

Characteristics of Memory Systems Performance


Performance for a memory can be measured in terms of three parameters:
Access time (latency):
For random-access memory, this is the time it takes to perform a read or write operation, that is, the time from the instant that an address is presented to the memory to the instant that data have been stored or made available for use. For non-random-access memory, access time is the time it takes to position the read-write mechanism at the desired location.

Characteristics of Memory Systems Performance


Performance for a memory can be measured in terms of three parameters:
Memory cycle time: This concept is primarily applied to random-access memory and consists of the access time plus any additional time required before a second access can commence. This time is concerned with the system bus, not the processor.

Characteristics of Memory Systems Performance


Performance for a memory can be measured in terms of three parameters:
Transfer rate: This is the rate at which data can be transferred into or out of a memory unit. For random-access memory, it is equal to 1/(cycle time).

Characteristics of Memory Systems Performance


For non-random-access memory, the following relationship holds:
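In its standard form, this relationship can be written as

T_N = T_A + N / R

where T_N is the average time to read or write N bits, T_A is the average access time, N is the number of bits, and R is the transfer rate in bits per second.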

Characteristics of Memory Systems Physical type


The most common types of physical memory in use today are semiconductor memory, magnetic surface memory (used for disk and tape), and optical and magneto-optical memory.

Characteristics of Memory Systems Physical characteristics


Volatile/non-volatile:
In a volatile memory, information decays naturally or is lost when electrical power is switched off. In a nonvolatile memory, information once recorded remains without deterioration until deliberately changed; no electrical power is needed to retain information.

Characteristics of Memory Systems Physical characteristics


Erasable/non-erasable
Erasable: new data can be stored after erasing the old data.
Non-erasable: data cannot be erased unless the storage unit is destroyed.

Characteristics of Memory Systems Organization of bits


By organization we mean the physical arrangement of bits to form words.

The Memory Hierarchy Some basics


The design constraints on a computer's memory can be summed up by three questions:
How much? (the capacity of the memory)
How fast? (the access time, or latency)
How expensive? (the cost per bit)

The Memory Hierarchy Some basics


As you may expect, there is a trade-off among the three key characteristics of memory: namely, capacity, access time, and cost.
Smaller capacity leads to faster access time but greater cost per bit.
Greater capacity leads to smaller cost per bit but slower access time.

Solution to the above dilemma The memory hierarchy


The way out of this dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy. As one goes down the memory hierarchy, the following occur:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by the processor

The Memory Hierarchy

Some observations by looking at the memory hierarchy


Looking at the above, one may observe that the smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories. The key to the success of the memory hierarchy is that, as we move down the hierarchy, the processor's frequency of access to each level decreases.

Principle of locality of reference


In a typical software program there are a number of iterative loops and subroutines. Once the control flow of the program enters a loop or subroutine, there are repeated references to a small set of instructions and data. Over a long period of time this set of instructions and data may change, but over a short period it remains largely the same. Thus, during the execution of a program, the data and the instructions tend to cluster, and the processor is primarily working with the same small set.

How is the above principle related to our memory hierarchy concepts?
It is possible to distribute the clustered data across the hierarchy so that the most frequently used data are stored at the top of the hierarchy; the processor's accesses to the slower, high-capacity lower levels of the hierarchy are thereby considerably reduced.

This principle can be applied across more than two levels of the memory hierarchy and is known as the principle of locality of reference.

Cache Memory Principles


The cache memory is a relatively small memory having a faster access time compared to that of the main memory. The cache contains a copy of portions of main memory. The cache may be located on the CPU itself (known as an on-chip cache) or externally on the board

Single Cache

Cache Memory Principles


When the processor attempts to read a word of memory, a check is made to determine if the word is in the cache. If so, the word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of words, is read into the cache and then the word is delivered to the processor. As per the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block.
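The hit/miss flow just described can be sketched in C. This is only an illustrative sketch; the types, the helper names, and the block size K are assumptions made for the example, not part of any real cache controller.

```c
/* Illustrative sketch of the cache read flow described above.
   All types and names here are hypothetical placeholders. */

#define K 4  /* assumed number of words per block/line */

typedef struct {
    int      valid;      /* line currently holds a valid block */
    unsigned tag;        /* identifies which block is cached   */
    unsigned data[K];    /* the K words of the cached block    */
} cache_line_t;

/* Returns the requested word; fills the line on a miss. */
unsigned cache_read(cache_line_t *line, unsigned tag, unsigned word,
                    const unsigned *main_memory, unsigned block_addr)
{
    if (line->valid && line->tag == tag) {
        /* Cache hit: deliver the word directly to the processor. */
        return line->data[word];
    }
    /* Cache miss: read the whole block of K words from main memory
       into the cache line, then deliver the requested word. */
    for (unsigned i = 0; i < K; i++)
        line->data[i] = main_memory[block_addr + i];
    line->tag   = tag;
    line->valid = 1;
    return line->data[word];
}
```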

Multiple cache levels

Structure of main memory


If each main memory address consists of n bits, then the main memory contains up to 2^n uniquely addressable words.

For mapping purposes, main memory is considered to consist of blocks of K words each, i.e. K unique addresses per block.
Thus, if M represents the number of blocks in main memory, then M = 2^n / K.

Structure of main memory

Structure of cache memory


The cache consists of blocks known as lines with each line containing K words along with a tag of a few bits. Each line also includes control bits (not shown in the figure), such as a bit to indicate whether the line has been modified since being loaded into the cache. The length of a line, not including tag and control bits, is the line size.

Structure of cache memory

Structure of cache memory


As one may guess, the number of lines (m) in a cache is considerably less than the number of blocks (M) in main memory, i.e. m << M. Thus, at any time, some subset of the blocks of main memory resides in lines in the cache.

As there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block.
Thus, each line includes a tag that identifies which particular main memory block is currently being stored in the cache line. The tag is usually a portion of the main memory address

An example flowchart Cache read operation

Typical Cache Organization

Typical Cache Organization


In the above typical cache organization, the cache connects to the processor via data, control, and address lines. The data and address lines also attach to data and address buffers, which attach to a system bus from which main memory is reached.

When a cache hit occurs, the data and address buffers are disabled and communication is only between processor and cache, with no system bus traffic.
But when a cache miss occurs, the desired address is loaded onto the system bus and the data are returned through the data buffer to both the cache and the processor.

Elements of Cache Design


Cache Addresses
Logical or Physical

Cache Size

Mapping function
Direct or associative or set-associative

Replacement algorithm
LRU or FIFO or LFU or Random

Write Policy
Write through or write back or write once

Line size

Number of caches
Single or two level
Unified or split

Overview of virtual memory


Virtual memory is a facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available. When virtual memory is used, the address fields of machine instructions contain virtual addresses. For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual (logical) address into a physical address in main memory.

Elements of Cache Design Cache Addresses


Based on whether the physical address or the logical address is used in the cache memory, the cache memory is divided into the following two categories:
Virtual (logical) cache: stores data using virtual addresses.
Physical cache: stores data using main memory's physical addresses.

Logical Cache

Physical Cache

Which one is better? Logical Cache or Physical Cache


As the logical cache fetches data directly using the logical addresses generated by the processor, it is faster than the physical cache: the cache can respond before the MMU performs the logical-to-physical address translation. The disadvantage of the logical cache is that most virtual memory systems supply each application with the same virtual address space, that is, each application sees a virtual memory that starts at address 0. Thus, the same virtual address in two different applications refers to two different physical addresses. The cache must therefore be completely flushed on each application context switch, or extra bits must be added to each line of the cache to identify which virtual address space the address refers to.

Elements of Cache Design Cache Size


The larger the cache, the larger the number of gates involved in addressing the cache. The result is that large caches tend to be slightly slower than small ones. We would like the size of the cache to be small enough so that the overall average cost per bit is close to that of main memory alone, and large enough so that the overall average access time is close to that of the cache alone.

Thus, it is almost impossible to arrive at a single optimum cache size.

Cache Sizes of some Processors

Elements of Cache Design Mapping Function

As there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines. This algorithm is implemented in the form of the mapping function. The mapping function also provides a means for determining which main memory block currently occupies a cache line. The choice of the mapping function dictates how the cache is organized.

Mapping function techniques


Three techniques can be used as a mapping function:
Direct mapping, Associative mapping, and Set associative mapping

An example for understanding the three mapping functions


The example we are going to use for understanding the three mapping functions includes the following elements:
The cache can hold 64 Kbytes.
Data are transferred between main memory and the cache in blocks of 4 bytes each.
That is, the cache holds 64 * 1024 = 65536 bytes, so the number of cache lines is 65536 / 4 = 16384, which is equal to 2^14 lines.

The example we are going to use for understanding the three mapping functions includes the following main memory element:
The main memory consists of 16 Mbytes, that is 16 * 1024 * 1024 bytes = 16777216 bytes, or 2^24 bytes. Thus each main memory byte can be addressed with a 24-bit address, because 2^24 = 16 Mbytes.

Thus, for mapping purposes, main memory consists of 4M blocks of 4 bytes each.
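The arithmetic of this running example can be checked with a short program. This is a minimal sketch; the constant names are chosen for illustration and simply restate the figures above (64-Kbyte cache, 4-byte blocks, 16-Mbyte main memory).

```c
#include <stdio.h>

/* Reproduces the arithmetic of the running example:
   64-Kbyte cache, 4-byte blocks, 16-Mbyte byte-addressable main memory. */
int main(void)
{
    const unsigned long cache_bytes  = 64UL * 1024;           /* 65,536     */
    const unsigned long block_bytes  = 4;                     /* bytes/line */
    const unsigned long memory_bytes = 16UL * 1024 * 1024;    /* 16,777,216 */

    unsigned long cache_lines   = cache_bytes  / block_bytes; /* 16,384 = 2^14 */
    unsigned long memory_blocks = memory_bytes / block_bytes; /* 4M     = 2^22 */

    printf("cache lines   : %lu\n", cache_lines);
    printf("memory blocks : %lu\n", memory_blocks);
    printf("address bits  : 24 (since 2^24 = 16 Mbytes)\n");
    return 0;
}
```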

Mapping function Direct Mapping


The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. The mapping is expressed as
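i = j modulo m

where
i = cache line number
j = main memory block number
m = number of lines in the cache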

Mapping function Direct Mapping


In direct mapping, each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place.
The main memory address is viewed as two parts:
The least significant w bits identify a unique word within a block
The most significant s bits specify one memory block

The s most significant bits are further split into a cache line field of r bits and a tag field of s - r bits

Direct Mapping

Direct Mapping Main Memory Address Structure


Tag (s - r) = 8 bits | Line or slot (r) = 14 bits | Word (w) = 2 bits

The above represents a 24-bit main memory address.
The least significant 2 bits form the word identifier, selecting a unique word (byte) within a block of main memory.
The remaining 22 bits identify the block in main memory and are mapped onto the cache as follows:
An 8-bit tag (= 22 - 14)
A 14-bit line number identifying the cache line

A cache lookup uses the 14-bit line field to index the cache and then compares the 8-bit tag with the tag stored in that line.

Summarizing Direct Mapping

The effect of direct mapping is that blocks of main memory are assigned to the lines of cache memory as follows:
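Cache line 0: main memory blocks 0, m, 2m, ..., 2^s - m
Cache line 1: main memory blocks 1, m+1, 2m+1, ..., 2^s - m + 1
...
Cache line m-1: main memory blocks m-1, 2m-1, 3m-1, ..., 2^s - 1

where m is the number of cache lines and 2^s is the number of blocks in main memory.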

Direct mapping Cache Organization

Direct Mapping Example


In the example, let the number of cache lines be m = 16K = 2^14. As each main memory block consists of 4 bytes, the corresponding mapping uses the 8-bit tag, 14-bit line, and 2-bit word fields shown above.

Thus, blocks with starting addresses 000000, 010000, ..., FF0000 have tag numbers 00, 01, ..., FF, respectively.

Direct Mapping Example

Direct mapping Example Explanation


The cache system is presented with a 24-bit address. From these 24 bits, the 14-bit line number is used as an index into the cache to access a particular cache line. If the 8-bit tag number matches the tag currently stored in that cache line, then the 2-bit word number is used to select one of the 4 bytes in that line. Otherwise, the 22-bit tag-plus-line field is used to fetch a block from main memory, and from that block the 2-bit word field is used to determine which byte to fetch.
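The address split described above can be illustrated with a small C program. The sample address and the bit masks are assumptions for this sketch; they follow the 8/14/2 split of the running example.

```c
#include <stdio.h>

/* Decomposes a 24-bit address for the direct-mapped example:
   tag = 8 bits, line = 14 bits, word = 2 bits.               */
int main(void)
{
    unsigned addr = 0x16339C;                 /* example 24-bit address   */

    unsigned word = addr         & 0x3;       /* least significant 2 bits */
    unsigned line = (addr >> 2)  & 0x3FFF;    /* next 14 bits             */
    unsigned tag  = (addr >> 16) & 0xFF;      /* most significant 8 bits  */

    printf("tag=%02X line=%04X word=%u\n", tag, line, word);
    /* prints: tag=16 line=0CE7 word=0 */

    /* Lookup sketch: index the cache with 'line'; it is a hit only if the
       stored tag matches 'tag'; then 'word' selects the byte in the line. */
    return 0;
}
```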

Disadvantage of Direct Mapping


The direct mapping technique is simple and inexpensive to implement. Its main disadvantage is that there is a fixed cache location for any given block. Thus, if a program happens to reference words repeatedly from two different blocks that map into the same line, then the blocks have to be continually swapped in the cache, and the hit ratio will be low. This phenomenon is known as thrashing.

Associative Mapping
Associative mapping overcomes the above disadvantage of thrashing in direct mapping by permitting each main memory block to be loaded into any line of the cache.

Associative Mapping

Associative Mapping Working


In case of associative mapping, the cache control logic interprets a main memory address simply as a Tag and a Word field.

The Tag field is used to uniquely identify a block of main memory. To determine whether a main memory block is in the cache or not, the cache control logic simultaneously examines every line's tag for a match.

Associative Mapping Main Memory Address Structure


Tag = 22 bits | Word = 2 bits

The 22-bit tag is stored with each 32-bit block of data.
This tag is compared with the tag entry of every cache line to check for a hit.
The least significant 2 bits of the address identify which byte is required from the 32-bit data block.
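As a rough software analogue of this tag comparison, the lookup can be sketched as a loop over all lines (real hardware compares all tags simultaneously). The structure and function names here are illustrative assumptions, not a real implementation.

```c
/* Sketch of a fully associative lookup: the tag (upper 22 bits of the
   24-bit address) is compared against the tag of every cache line.    */

#define NUM_LINES 16384              /* 16K lines in the 64-Kbyte cache */

typedef struct {
    int      valid;
    unsigned tag;                    /* 22-bit tag                      */
    unsigned char data[4];           /* 4-byte block                    */
} assoc_line_t;

int assoc_lookup(const assoc_line_t cache[NUM_LINES], unsigned addr)
{
    unsigned tag  = addr >> 2;       /* upper 22 bits of a 24-bit address */
    unsigned word = addr & 0x3;      /* lower 2 bits select the byte      */

    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return cache[i].data[word];   /* hit                          */
    return -1;                            /* miss                         */
}
```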

Summarizing Associative Mapping

Fully Associative Cache Organization

Associative Mapping Example

Associative Mapping Example Explanation


The main memory address consists of a 22-bit tag and a 2-bit byte number. The 22-bit tag must be stored with the 32-bit block of data for each line in the cache. Note that it is the leftmost (most significant) 22 bits of the address that form the tag. Thus, the 24-bit hexadecimal address 16339C has the 22-bit tag 058CE7. This can be easily seen in the binary notation:
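Address 16339C (hex) = 0001 0110 0011 0011 1001 1100 (binary)
Dropping the least significant 2 bits (the word field) leaves the 22-bit tag:
Tag = 00 0101 1000 1100 1110 0111 (binary) = 058CE7 (hex)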

Associative Mapping Disadvantage


With associative mapping, there is flexibility as to which block to replace when a new block is read into the cache. Certain replacement algorithms (to be discussed in the coming slides) are designed to maximize the hit ratio. The principal disadvantage of associative mapping is the complex circuitry required to examine the tags of all the cache lines in parallel.

Set Associative Mapping


Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.

Set Associative Mapping


In this case, the cache consists of a number of sets, each of which consists of a number of lines. The relationships are
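m = v * k
i = j modulo v

where
i = cache set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set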

The above is referred to as k-way set-associative mapping. With set-associative mapping, block Bj can be mapped into any of the lines of set j.

Set Associative Mapping Explained


As with associative mapping, each word maps into multiple cache lines. For set-associative mapping, each word maps into all the cache lines of a specific set, so that main memory block B0 maps into set 0, and so on.

Thus, the set-associative cache can be physically implemented as v associative caches.

An example of v associative-mapped caches

Another implementation of set-associative mapping: k direct-mapped caches


Here each direct-mapped cache is referred to as a way, consisting of v cache lines. The first v lines of main memory are directly mapped into the v lines of each way; the next group of v lines of main memory are similarly mapped, and so on.

K Direct-mapped caches

Comparing the two approaches


The direct-mapped implementation is typically used for small degrees of associativity (small values of k) while the associative-mapped implementation is typically used for higher degrees of associativity.

Comparing fully associative and k-way set-associative mapping


With fully associative mapping, the tag in a main memory address is quite large and must be compared to the tag of every line in the cache. On the other hand, with k-way set-associative mapping, the tag in a memory address is much smaller and is only compared to the k tags within a single set.

Summarizing k-way set associative mapping

K-Way Set Associative Cache Organization

Set Associative Mapping Main Memory Address Structure


Tag = 9 bits | Set = 13 bits | Word = 2 bits

From the above main memory address structure, the set field is used to determine which cache set to look into. The tag field is then compared with the tags stored in that set to see if we have a hit.

Two-way set-associative mapping means that each set in the cache comprises two cache lines. The 13-bit set number identifies a unique set of two lines within the cache. It also gives the number of the block in main memory, modulo 2^13, which determines the mapping of blocks into lines. Thus, blocks 000000, 008000, ..., FF8000 of main memory map into cache set 0. Any of those blocks can be loaded into either of the two lines in the set. Note that no two blocks that map into the same cache set have the same tag number. For a read operation, the 13-bit set number is used to determine which set of two lines is to be examined. Both lines in the set are examined for a match with the tag number of the address to be accessed.
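For the running example, the 9/13/2 address split can again be illustrated with a short C sketch; the sample address and masks are assumptions chosen to match the field widths above.

```c
#include <stdio.h>

/* Two-way set-associative decomposition for the running example:
   tag = 9 bits, set = 13 bits, word = 2 bits (24-bit address).   */
int main(void)
{
    unsigned addr = 0x16339C;                 /* example 24-bit address */

    unsigned word = addr         & 0x3;       /* 2-bit word field       */
    unsigned set  = (addr >> 2)  & 0x1FFF;    /* 13-bit set field       */
    unsigned tag  = (addr >> 15) & 0x1FF;     /* 9-bit tag field        */

    printf("tag=%03X set=%04X word=%u\n", tag, set, word);
    /* prints: tag=02C set=0CE7 word=0 */

    /* A read examines both lines of set 'set' and compares each stored
       tag with 'tag'; on a match, 'word' selects the byte in the line. */
    return 0;
}
```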

Two-way set associative mapping example

Two-way set associative mapping example

Two way set associative mapping advantages


The use of two lines per set, i.e. two-way set-associative mapping, is the most common set-associative organization. It significantly improves the hit ratio over direct mapping. Four-way set-associative mapping makes a modest additional improvement for a relatively small additional cost. Further increases in the number of lines per set have little effect.

Elements of cache design Replacement Algorithms


Once the cache has been filled, when a new data block is brought into the cache, one of the existing blocks must be replaced. In the case of direct mapping, there is only one possible cache line for any particular block, and hence no choice is possible. For the associative and set-associative mapping techniques, a replacement algorithm is needed. Common replacement algorithms are:
Least Recently Used (LRU)
First In First Out (FIFO)
Least Frequently Used (LFU)

Replacement Algorithms LRU


LRU, or least recently used, is probably the most effective replacement algorithm. As the name suggests, in LRU we replace that block in the set that has been in the cache longest with no reference to it. Implementing LRU for a two-way set-associative cache is quite easy.

LRU Implementation for 2-way set associative mapping


To implement LRU for 2-way set-associative mapping, each cache line includes a USE bit. Whenever a cache line is referenced, its USE bit is set to 1 and the USE bit of the other line in that set is set to 0.

When a block is to be read into the set, the line whose USE bit is 0 is used.
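A minimal sketch of this USE-bit scheme in C, assuming a two-line set; the structure and function names are illustrative only.

```c
/* LRU for a two-way set: one USE bit per line, as described above. */
typedef struct {
    int use;                     /* 1 = most recently used line      */
    /* tag, data, etc. omitted for brevity                           */
} way_t;

typedef struct {
    way_t way[2];
} set_t;

/* Call on every reference to line 'w' (0 or 1) of the set. */
void lru_touch(set_t *s, int w)
{
    s->way[w].use     = 1;
    s->way[1 - w].use = 0;
}

/* Returns the index of the line to replace: the one whose USE bit is 0. */
int lru_victim_2way(const set_t *s)
{
    return (s->way[0].use == 0) ? 0 : 1;
}
```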

LRU Implementation for fully associative mapping


LRU is also relatively easy to implement for a fully associative cache. The cache mechanism maintains a separate list of indexes to all the lines in the cache. When a line is referenced, it moves to the front of the list. For replacement, the line at the back of the list is used. Because of its simplicity of implementation, LRU is the most popular replacement algorithm.
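A sketch of the list-based scheme, assuming a small fully associative cache of 8 lines; the array-based list and the function names are illustrative assumptions.

```c
/* List-based LRU for a fully associative cache: line indexes are kept
   in order of recency; the front is most recent, the back is the
   replacement victim.                                                */

#define LINES 8

int lru_list[LINES] = {0, 1, 2, 3, 4, 5, 6, 7};   /* front = most recent */

/* Move the referenced line index to the front of the list
   (assumes 'line' is a valid index already present in the list). */
void lru_reference(int line)
{
    int pos = 0;
    while (lru_list[pos] != line) pos++;
    for (; pos > 0; pos--)
        lru_list[pos] = lru_list[pos - 1];
    lru_list[0] = line;
}

/* The victim for replacement is the index at the back of the list. */
int lru_victim_full(void) { return lru_list[LINES - 1]; }
```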

Replacement Algorithm FIFO


In the First In First Out (FIFO) replacement algorithm, we replace that block in the set that has been in the cache longest. FIFO is easily implemented as a round-robin or circular buffer technique.
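A sketch of FIFO as a round-robin pointer, assuming a 4-way set; the names are illustrative.

```c
/* FIFO replacement implemented as a round-robin pointer per set:
   lines are replaced in circular order, regardless of use.       */

#define WAYS 4

typedef struct {
    int next;                          /* next line to replace         */
} fifo_set_t;

int fifo_victim(fifo_set_t *s)
{
    int v = s->next;
    s->next = (s->next + 1) % WAYS;    /* advance the circular pointer */
    return v;
}
```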

Replacement Algorithm LFU


In the Least Frequently Used (LFU) replacement algorithm, we replace that block in the set that has experienced the fewest references. LFU can be implemented by associating a counter with each line.
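A sketch of LFU victim selection, assuming a 4-way set with one reference counter per line; the names are illustrative.

```c
/* LFU: each line carries a reference counter; the line with the
   smallest count in the set is replaced.                         */

#define WAYS 4

int lfu_victim(const unsigned count[WAYS])
{
    int min = 0;
    for (int i = 1; i < WAYS; i++)
        if (count[i] < count[min])
            min = i;
    return min;                 /* caller resets count[min] after refill */
}
```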

Elements of cache design Write Policy


As the cache fills and a new data block needs to be brought in from main memory, an existing cache block must be replaced, and two cases arise:
If the old block in the cache has not been altered, then it may be overwritten with a new block without first writing out the old block. If at least one write operation has been performed on a word in that line of the cache, then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block.

How this replacement of the old cache block by the new cache block is done is determined by the write policy.

Challenges involved in Write Policy


More than one device may have access to main memory.
For example, an I/O module may be able to read/write directly to memory. If a word has been altered only in the cache, then the corresponding memory word is invalid. Similarly, if an I/O device has altered main memory, then the cache word is invalid.

Elements of cache design Write Policy


Techniques for implementing write policy:
Write through
Write back

Techniques for implementing write policy


Write through:
Using this technique, all write operations are made to main memory as well as to the cache, ensuring that main memory is always valid. Any other processor-cache module can monitor traffic to main memory to maintain consistency within its own cache.

The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck.

Techniques for implementing write policy


Write back:
With write back, updates are first made only in the cache and not in main memory. When an update occurs in the cache, a dirty bit (or use bit) associated with the cache line is set. Then, when a block needs to be replaced, it is written back to main memory if and only if the dirty bit is set, signifying that it is an updated line that needs to be written back to main memory. The problem with write back is that portions of main memory are invalid, and hence accesses by I/O modules can be allowed only through the cache. This makes for complex circuitry and a potential bottleneck.
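The two policies can be contrasted in a short C sketch; the structures and function names are illustrative assumptions, with a 4-word line assumed.

```c
/* Sketch contrasting the two write policies described above. */

typedef struct {
    unsigned tag;
    unsigned data[4];
    int      dirty;                       /* set when the line is modified */
} line_t;

/* Write through: update the cache line and main memory together. */
void write_through(line_t *l, unsigned w, unsigned value, unsigned *mem_word)
{
    l->data[w] = value;
    *mem_word  = value;                   /* memory is always valid        */
}

/* Write back: update only the cache and mark the line dirty. */
void write_back_update(line_t *l, unsigned w, unsigned value)
{
    l->data[w] = value;
    l->dirty   = 1;
}

/* On replacement, a dirty line must be written out to main memory first. */
void write_back_evict(line_t *l, unsigned *mem_block)
{
    if (l->dirty) {
        for (int i = 0; i < 4; i++)
            mem_block[i] = l->data[i];
        l->dirty = 0;
    }
}
```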

Problem with the above write policies - Cache Coherence Problem


In a multiprocessor system, where all the processors have their own local cache memories but share the same main memory, a new problem is introduced. Here, if data in one cache are altered, this invalidates not only the corresponding word in main memory, but also that same word in any other cache that happens to hold it. This problem is termed the cache coherency problem. Even if a write-through policy is used, the other caches may contain invalid data.

Dealing with cache coherency problem


Shared bus watching with write through: the address at which data has been updated is identified on the shared bus, and the other processors are informed to invalidate the corresponding data in their caches.
Using additional hardware to ensure that all updates to main memory via a particular cache, and vice versa, are reflected in all other caches.

Elements of cache design Cache Line Size


When a block of data is retrieved from main memory and placed in the cache, not only the desired word but also some number of adjacent words are retrieved (as per the principle of locality of reference). As the block size increases from very small to larger sizes, the hit ratio will at first increase, because more useful data are brought into the cache with each block. The hit ratio will begin to decrease, however, as the block becomes even bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced.

Elements of cache design Cache Line Size


Thus, the relationship between block size and hit ratio is complex and depends on the locality characteristics of a particular program, and no definitive optimum value has been found.

Elements of cache design Number of caches


Most contemporary cache organizations follow one of the following designs:
Unified cache design, involving the use of a single cache to hold both data and instructions.
Split cache design, involving the use of two separate caches: one dedicated to instructions and another dedicated to data.

Split Cache Design


In the case of the split cache design, when the processor needs to fetch an instruction, it first consults the L1 instruction cache before looking into main memory. Similarly, when the processor needs to fetch data, it first consults the L1 data cache before fetching the data from main memory.

Comparing unified cache approach with split cache approach


There are two potential advantages of a unified cache:
For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically.
Only one cache needs to be designed and implemented.

Comparing unified cache approach with split cache approach


Despite the above advantages of the unified cache approach, the trend is toward split caches, particularly for superscalar machines, which emphasize parallel instruction execution and the prefetching of predicted future instructions. The key advantage of the split cache design is that instruction and data fetch operations can be carried out independently of each other, thus eliminating contention for the cache between the instruction fetch/decode unit and the execution unit.

Intel Cache Evolution

Pentium 4 Cache Organization


The processor core consists of four major components:
Fetch/decode unit
Out-of-order execution logic
Execution units
Memory subsystem

Pentium 4 Cache Organization Block Diagram

Pentium 4 Cache Organization


Fetch/decode unit: Fetches program instructions in order from the L2 cache, decodes these into a series of micro-operations, and stores the results in the L1 instruction cache.

Out-of-order execution logic: Micro-operations fetched from the L1 instruction cache may be scheduled for execution in a different order. This unit schedules execution of the micro-operations out of order on the basis of data dependencies and resource availability; it may also perform speculative execution of instructions.

Pentium 4 Cache Organization


Execution units: These units actually execute the micro-operations, fetching the required data from the L1 data cache and temporarily storing results in registers.
Memory subsystem: This unit includes the L2 and L3 caches and the system bus, which is used to access main memory when the L1 and L2 caches have a cache miss, and to access the system I/O resources.

ARM Cache Organization

ARM Cache and Write Buffer Organization


The write buffer is interposed between the cache and main memory and consists of a set of addresses and a set of data words. The write buffer is small compared to the cache, and may hold up to four independent addresses.

ARM cache organization Write Buffer


When the processor performs a write to a cache, the data are also placed in the write buffer and the processor continues execution. Thus, the data to be written to the main memory are transferred from the cache to the write buffer. The write buffer then performs the external write to the main memory in parallel.

If, however, the write buffer is full, then the processor is stalled until there is sufficient space in the buffer.
In this case, the write buffer continues to write to main memory until the buffer is completely empty. Thus, unless there is a high proportion of writes in an executing program, the write buffer improves performance.
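The buffering behaviour described above can be sketched as a small queue in C. This is a conceptual sketch only; the structure, the slot count of four, and the function name are assumptions based on the description above, not the actual ARM hardware interface.

```c
/* Conceptual sketch of a small write buffer holding up to four
   independent addresses. The queue stands in for the hardware's
   parallel write to main memory.                                 */

#define WB_SLOTS 4

typedef struct {
    unsigned addr[WB_SLOTS];
    unsigned data[WB_SLOTS];
    int      count;
} write_buffer_t;

/* Returns 1 if the write was buffered, 0 if the buffer is full
   (in which case the processor would stall until space is free). */
int wb_put(write_buffer_t *wb, unsigned addr, unsigned data)
{
    if (wb->count == WB_SLOTS)
        return 0;                          /* processor stalls     */
    wb->addr[wb->count] = addr;
    wb->data[wb->count] = data;
    wb->count++;
    return 1;                              /* execution continues  */
}
```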
