Sandeep Srivastava
Introduction
•
This is a slightly advanced lecture on
Computer Architecture, covering high-level
topics such as speed, performance, and cost.
•
Brief familiarity with Computer Organization is
assumed
Introduction
•
In making a design trade-off, favor the
frequent case over the infrequent case. This
principle also applies when deciding how
to spend resources, since the impact of
making some occurrence faster is higher if the
occurrence is frequent.
•
Improving the frequent occurrence:
– Helps performance
– Is simpler and can be done faster
Locality of References
•
This important fundamental observation
comes from properties of programs. The most
important program property that we regularly
exploit is locality of references: programs tend
to reuse data and instructions they have used
recently. The 90/10 rule comes from empirical
observation:
"A program spends 90% of its time in 10% of
its code"
•
An implication of locality is that we can predict
with reasonable accuracy what instructions and
data a program will use in the near future, based
on its accesses in the recent past.
Smaller is Faster Rule
•
Smaller pieces of hardware will generally be
faster than larger pieces. This simple principle
is particularly applicable to memories built
from the same technology for two reasons:
•
In high-speed machines, signal propagation is
a major cause of delay
•
In most technologies we can obtain smaller
memories that are faster than larger
memories. This is primarily because the
designer can use more power per memory cell
Memory Hierarchy Design
•
Memory hierarchy design is based on three
important principles:
Make the Common Case Fast
Principle of Locality
Smaller is Faster
Memory Hierarchy Design
The objective of a memory hierarchy is to obtain the highest possible access
speed while minimizing the total cost of the memory system.
[Diagram: the memory hierarchy. The CPU accesses registers, cache
memory, and main memory; an I/O processor connects main memory to
auxiliary memory (magnetic disks and magnetic tapes). Levels from
fastest to slowest: Register, Cache, Main Memory, Magnetic Disk.]
Memory Hierarchy Design
•
The above principles suggest that we should
try to keep recently accessed items in the
fastest memory. Because the smaller
memories are faster and more expensive per
byte, we want to use smaller memories to hold
the most recently accessed items close to the
CPU, and successively larger (and slower, and
less expensive) memories as we move away
from the CPU. This type of organization is
called a memory hierarchy. Two important
levels of the memory hierarchy are the cache
and virtual memory.
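The speed/cost trade-off of a hierarchy is usually summarized by the average memory access time, AMAT = hit time + miss rate × miss penalty. A minimal sketch (the timings and miss rate below are illustrative assumptions):

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for one level of the hierarchy."""
    # Every access pays the hit time; only misses pay the penalty.
    return hit_time_ns + miss_rate * miss_penalty_ns

# Illustrative numbers: 1 ns cache hit, 5% miss rate, 100 ns main memory.
print(amat(1.0, 0.05, 100.0))  # 6.0 ns on average
```

Even a small cache dramatically lowers the average access time, which is why a hierarchy of small, fast levels backed by large, slow ones pays off.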
Memory Hierarchy Design
•
Using the principle of locality to improve
performance while keeping the memory
system affordable, we can pose four questions
about any level of the memory hierarchy. We
will answer these questions considering one
level of the hierarchy: the cache.
•
Block Placement
Where should a block be placed in the
cache?
Block Placement
•
There are three methods in block placement:
– Direct mapped: if each block has only one place it
can appear in the cache, the cache is said to be
direct mapped. The mapping is usually (Block
address) MOD (Number of blocks in cache).
– Fully associative: if a block can be placed
anywhere in the cache, the cache is said to be fully
associative.
– Set associative: if a block can be placed in a
restricted set of places in the cache, the cache is
said to be set associative. A block is first mapped
onto a set and can then be placed anywhere within
that set; the set is usually chosen by (Block
address) MOD (Number of sets in cache).
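The three placement policies differ only in how many candidate frames the MOD computation leaves open. A minimal sketch of the index arithmetic (the cache sizes are illustrative assumptions):

```python
def direct_mapped_frame(block_addr: int, num_blocks: int) -> int:
    # Direct mapped: exactly one legal frame per block address.
    return block_addr % num_blocks

def set_associative_set(block_addr: int, num_sets: int) -> int:
    # Set associative: the block may go in any way of this one set.
    return block_addr % num_sets

# 8-block direct-mapped cache: block 12 must go in frame 4.
print(direct_mapped_frame(12, 8))   # 4
# Same cache as 2-way set associative (4 sets): block 12 maps to set 0.
print(set_associative_set(12, 4))   # 0
```

A fully associative cache is the degenerate case of one set containing every frame, so the MOD step always returns 0.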
Block Identification
•
Cache memory consists of two portions:
– Directory
- Address Tags (checked to match the
block address from the CPU)
- Control Bits (indicate that the content of
a block frame is valid)
– RAM
- Block Frames (contain the data)
Block Identification
•
As a rule, all possible tags are searched in
parallel, because speed is critical.
•
The block offset field selects the desired data
(the minimal addressable unit) from the block,
the index field selects the set, and the tag field
is compared against the cache tag for a hit.
•
While the comparison could be made on more
of the address than the tag, there is no need
because:
Checking the index would be redundant, since
the index was used to select the set; and the
offset is unnecessary, since the entire block is
either present or absent.
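The tag / index / offset split is plain integer arithmetic on the address. A minimal sketch, assuming a byte-addressed cache with illustrative geometry (16-byte blocks, 4 sets):

```python
BLOCK_SIZE = 16   # bytes per block -> 4 offset bits (illustrative)
NUM_SETS   = 4    # sets in the cache -> 2 index bits (illustrative)

def split_address(addr: int):
    """Split a byte address into its (tag, index, offset) fields."""
    offset = addr % BLOCK_SIZE                    # position within the block
    index  = (addr // BLOCK_SIZE) % NUM_SETS      # which set to search
    tag    = addr // (BLOCK_SIZE * NUM_SETS)      # compared against the cache tag
    return tag, index, offset

print(split_address(0x1A7))  # (6, 2, 7)
```

Only the tag needs to be stored and compared: the index is implied by which set was selected, and the offset only matters after a hit is declared.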
Basic Identification Algorithm
•
// Search cache directory for tag
if "hit" then
    Use offset to fetch data from RAM
else
    // access main memory
    if "hit" then
        Store data (and block) in cache and
        pass data to CPU
    else
        Do context switch (while processing
        the page fault)
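The identification algorithm above can be sketched as a direct-mapped lookup. A minimal sketch (the cache geometry, the modeled memory contents, and the addresses are illustrative assumptions; the page-fault branch is elided):

```python
NUM_BLOCKS = 4
BLOCK_SIZE = 16

# Each frame holds [valid bit, tag, block data]; main memory is a dict.
frames = [[False, None, None] for _ in range(NUM_BLOCKS)]
memory = {b: f"block-{b}" for b in range(64)}  # block address -> block data

def read(addr: int):
    block_addr = addr // BLOCK_SIZE
    index = block_addr % NUM_BLOCKS
    tag = block_addr // NUM_BLOCKS
    valid, frame_tag, data = frames[index]
    if valid and frame_tag == tag:        # search directory for tag
        return "hit", data                # use offset to fetch from RAM
    data = memory[block_addr]             # access main memory
    frames[index] = [True, tag, data]     # store block in cache
    return "miss", data                   # pass data to CPU

print(read(0x40))   # ('miss', 'block-4')  first touch misses
print(read(0x44))   # ('hit', 'block-4')   same block hits
```

The second access hits because both addresses fall in the same block: this is locality of references paying off.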
Block Replacement
•
When a miss occurs, the cache controller must
select a block to be replaced with the desired
data. A replacement policy determines which
block should be replaced. With direct-mapped
placement the decision is simple because
there is no choice: only one block frame is
checked for a hit, and only that block can be
replaced.
•
With fully associative or set-associative
placement, there is more than one block to
choose from on a miss. The two primary
strategies are Random and Least Recently
Used (LRU).
Block Replacement
•
Other strategies:
First In First Out (FIFO)
Most Recently Used (MRU)
Least-Frequently Used (LFU)
Most-Frequently Used (MFU)
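LRU, the most common of these policies, can be sketched with an ordered mapping. A minimal sketch for one fully associative set (the capacity and the tags are illustrative assumptions):

```python
from collections import OrderedDict

class LRUSet:
    """One fully associative set with LRU replacement."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag: int, data: str) -> str:
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = data
        return "miss"

s = LRUSet(2)
print(s.access(1, "a"), s.access(2, "b"), s.access(1, "a"), s.access(3, "c"))
# prints: miss miss hit miss
print(2 in s.blocks)  # False: tag 2 was least recently used, so it was evicted
```

FIFO, LFU, and the other policies differ only in which victim this eviction step selects.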
Interaction Policies with Main Memory
•
Reads dominate processor cache accesses. All
instruction accesses are reads, and most
instructions do not write to memory. The
block can be read at the same time that the
tag is read and compared, so the block read
begins as soon as the block address is
available. If the read is a miss, there is no
benefit - but also no harm; just ignore the
value read.
•
The read policies are:
Interaction Policies with Main Memory
•
The write policies on write hit often
distinguish cache designs:
Write Through - the information is written to
both the block in the cache and to the block in
the lower-level memory.
Advantages:
- a read miss never results in writes to main
memory
- easy to implement
- main memory always has the most current
copy of the data
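A write hit under write-through can be sketched in a few lines (a minimal sketch; the dict-based cache and memory models and the data values are illustrative assumptions):

```python
cache = {}    # block address -> data (the cache copy)
memory = {}   # block address -> data (the lower-level copy)

def write_through(block_addr: int, data: str) -> None:
    # On a write hit the data goes to BOTH levels, so main
    # memory always holds the most current copy.
    cache[block_addr] = data
    memory[block_addr] = data

write_through(7, "new")
print(cache[7] == memory[7])  # True: the two levels never diverge
```

The cost of this simplicity is that every write generates main-memory traffic, which is what write-back designs avoid.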
Ordering of Bytes in Memory