
The Memory System

Overview

Basic memory circuits


Organization of the main memory
Cache memory concept
Virtual memory mechanism
Secondary storage

Some Basic Concepts

Basic Concepts

The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations

Most modern computers are byte addressable.


[Figure: Word and byte address assignments in a byte-addressable memory, for word addresses 0, 4, 8, ..., 2^k - 4.
(a) Big-endian assignment: lower byte addresses hold the more significant bytes of a word.
(b) Little-endian assignment: lower byte addresses hold the less significant bytes of a word.]
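The two orderings can be checked with a short Python snippet (a sketch using the standard struct module; not part of the original slides):

```python
import struct

# Pack the 32-bit value 0x12345678 in both byte orders.
value = 0x12345678

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

A big-endian machine stores the most significant byte at the lowest byte address; a little-endian machine stores the least significant byte there.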

Traditional Architecture

[Figure 5.1. Connection of the memory to the processor: the processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; the memory provides up to 2^k addressable locations with word length n bits; control lines carry R/W, MFC, etc.]

Basic Concepts

Block transfer: bulk data transfer.

Memory access time
A useful measure of the speed of memory units is the time that elapses
between the initiation of an operation to transfer a word of data and the
completion of that operation. This is referred to as the memory access time.
Memory cycle time
The minimum time delay required between the initiation of two successive
memory operations, for example, the time between two successive Read
operations. The cycle time is usually slightly longer than the access time,
depending on the implementation details of the memory unit.
RAM: any location can be accessed for a Read or Write operation in some
fixed amount of time that is independent of the location's address.
Cache memory
Virtual memory, memory management unit

Semiconductor RAM
Memories

Internal Organization of
Memory Chips
[Figure 5.2. Organization of bit cells in a memory chip: a 4-bit address (A0-A3) feeds an address decoder that drives word lines W0-W15; each row of flip-flop (FF) memory cells connects through bit-line pairs (b7, b7', ..., b0, b0') to Sense/Write circuits controlled by R/W and CS.]

16 words of 8 bits each: a 16x8 organization. It has 16 external
connections: address 4, data 8, control 2, power/ground 2.
For a chip with 1K memory cells:
- organized as 128x8: 19 external connections (7 + 8 + 2 + 2)
- organized as 1Kx1: 15 external connections (10 + 1 + 2 + 2)
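The pin counts above follow from a simple sum; as a sketch (the function name is illustrative):

```python
def external_pins(addr_bits, data_bits, control=2, power=2):
    """External connections = address + data + control + power/ground pins."""
    return addr_bits + data_bits + control + power

print(external_pins(4, 8))    # 16x8 chip: 16 pins
print(external_pins(7, 8))    # 128x8 chip: 19 pins
print(external_pins(10, 1))   # 1Kx1 chip: 15 pins
```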

A Memory Chip

[Figure 5.3. Organization of a 1K x 1 memory chip: the 1024 cells form a 32 x 32 array; the 10-bit address is split into a 5-bit row address, decoded to select word lines W0-W31, and a 5-bit column address, which controls a 32-to-1 output multiplexer and input demultiplexer after the Sense/Write circuitry; control inputs are R/W and CS.]

Static Memories

The circuits are capable of retaining their state as long as power
is applied.

[Figure 5.4. A static RAM cell: two cross-coupled inverters hold the state; transistors T1 and T2, gated by the word line, connect the cell to the bit lines b and b'.]

Read Operation
In order to read the state of the SRAM cell, the word line is activated to
close switches T1 and T2. If the cell is in state 1, the signal on bit line b is
high and the signal on bit line b' is low. The opposite is true if the cell is in
state 0.
Thus, b and b' are always complements of each other. The Sense/Write
circuit at the end of the two bit lines monitors their state and sets the
corresponding output accordingly.
Write Operation
During a Write operation, the Sense/Write circuit drives bit lines b and b',
instead of sensing their state.
It places the appropriate value on bit line b and its complement on b' and
activates the word line.
This forces the cell into the corresponding state, which the cell retains when
the word line is deactivated.

Static Memories

[Figure 5.5. An example of a CMOS memory cell: transistors T1-T4 form two cross-coupled inverters between Vsupply and ground, holding the state at node X and its complement; T5 and T6 connect the cell to the bit lines when the word line is activated.]

CMOS cell: low power consumption

Asynchronous DRAMs

Static RAMs are fast, but their cells are larger and they cost more.
Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot
retain their state indefinitely and need to be refreshed periodically.

[Figure 5.6. A single-transistor dynamic memory cell: a transistor T, gated by the word line, connects a storage capacitor C to the bit line.]

To store information in this cell, transistor T is turned on and an appropriate
voltage is applied to the bit line. This causes a known amount of charge to
be stored in the capacitor.
After the transistor is turned off, the charge remains stored in the capacitor,
but not for long. The capacitor begins to discharge. This is because the
transistor continues to conduct a tiny amount of current, measured in
picoamperes, after it is turned off.
Hence, the information stored in the cell can be retrieved correctly only if it
is read before the charge in the capacitor drops below some threshold
value.
During a Read operation, the transistor in a selected cell is turned on.

A sense amplifier connected to the bit line detects whether the
charge stored in the capacitor is above or below the threshold value.
If the charge is above the threshold, the sense amplifier drives the
bit line to the full voltage representing the logic value 1.
As a result, the capacitor is recharged to the full charge
corresponding to the logic value 1.
If the sense amplifier detects that the charge in the capacitor is
below the threshold value, it pulls the bit line to ground level to
discharge the capacitor fully.
Thus, reading the contents of a cell automatically refreshes its
contents. Since the word line is common to all cells in a row, all cells
in a selected row are read and refreshed at the same time.

A Dynamic Memory Chip

[Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip: a row address latch, loaded by RAS, feeds a row decoder selecting one row of a 4096 x (512 x 8) cell array; a column address latch, loaded by CAS, feeds a column decoder that selects one group of 8 Sense/Write circuits; the address bits A20-9 / A8-0 are multiplexed on the same pins, and CS and R/W control the data pins D7-D0.]

The cells are organized in the form of a 4K x 4K array.
The 4096 cells in each row are divided into 512 groups of 8,
so that a row can store 512 bytes of data.
12 address bits are needed to select a row, and 9 more to
specify a group of 8 bits in the selected row.
To reduce the number of pins needed for external connections, the
row and column addresses are multiplexed on 12 pins.

A 256-Megabit DRAM chip, configured as 32M x 8.

The cells are organized in the form of a 16K x 16K array. The 16,384 cells in each
row are divided into 2,048 groups of 8, forming 2,048 bytes of data.
Therefore, 14 address bits are needed to select a row, and another 11 bits are
needed to specify a group of 8 bits in the selected row.
In total, a 25-bit address is needed to access a byte in this memory. The high-order
14 bits and the low-order 11 bits of the address constitute the row and column
addresses of a byte, respectively.
To reduce the number of pins needed for external connections, the row and column
addresses are multiplexed on 14 pins.
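The row/column split of the 25-bit address can be sketched in Python (the function name is illustrative, not from the slides):

```python
def split_address(addr, row_bits=14, col_bits=11):
    """Split a 25-bit byte address into the 14-bit row address
    (high-order bits) and the 11-bit column address (low-order bits)."""
    col = addr & ((1 << col_bits) - 1)
    row = addr >> col_bits
    return row, col

# Example: an address built from row 5, column 37.
print(split_address((5 << 11) | 37))  # (5, 37)
```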
During a Read or a Write operation, the row address is applied first. It is loaded into
the row address latch in response to a signal pulse on an input control line called the
Row Address Strobe (RAS). This causes a Read operation to be initiated, in which all
cells in the selected row are read and refreshed.

Shortly after the row address is loaded, the column address is applied to the
address pins and loaded into the column address latch under control of a
second control line called the Column Address Strobe (CAS).
The information in this latch is decoded and the appropriate group of 8
Sense/Write circuits is selected.
If the R/W control signal indicates a Read operation, the output values of
the selected circuits are transferred to the data lines, D7-0.
For a Write operation, the information on the D7-0 lines is transferred to the
selected circuits, then used to overwrite the contents of the selected cells in
the corresponding 8 columns.
In commercial DRAM chips, the RAS and CAS control signals are active
when low.
Hence, addresses are latched when these signals change from high to low.

Fast Page Mode

When the DRAM in the last slide is accessed, the
contents of all 4096 cells in the selected row are
sensed, but only 8 bits are placed on the data lines
D7-0, as selected by A8-0.
Fast page mode makes it possible to access the
other bytes in the same row without having to
reselect the row.
A latch is added at the output of the sense amplifier
in each column.
Good for bulk transfer.

Synchronous DRAMs

The operations of SDRAM are controlled by a clock signal.


[Figure 5.8. Synchronous DRAM: a refresh counter and a row address latch feed the row decoder of the cell array; a column address counter feeds the column decoder and the Read/Write circuits and latches; a mode register and timing control block, driven by Clock, RAS, CAS, R/W, and CS, sequences the data input and data output registers connected to the Data pins; the Row/Column address is applied on shared pins.]

Synchronous DRAMs

[Figure 5.9. Burst read of length 4 in an SDRAM: the row and column addresses are latched with RAS and CAS, then the four data words D0-D3 appear on successive clock cycles.]

Synchronous DRAMs

No CAS pulses are needed during a burst operation.

Refresh circuits are included (refresh every 64 ms).
Clock frequency > 100 MHz
Intel PC100 and PC133

Latency and Bandwidth

The speed and efficiency of data transfers among
memory, processor, and disk have a large impact on
the performance of a computer system.
Memory latency: the amount of time it takes to
transfer a word of data to or from the memory.
Memory bandwidth: the number of bits or bytes
that can be transferred in one second. It is used to
measure how much time is needed to transfer an
entire block of data.
Bandwidth is not determined solely by the memory. It is
the product of the rate at which data are transferred
(and accessed) and the width of the data bus.
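As a minimal sketch of this product (the 133 MHz and 64-bit figures below are assumed example values, not from the slides):

```python
def bandwidth_bytes_per_s(transfer_rate_hz, bus_width_bits):
    """Peak bandwidth = transfer rate x data-bus width."""
    return transfer_rate_hz * bus_width_bits // 8

# Assumed bus: 133 MHz clock, one 64-bit transfer per cycle.
print(bandwidth_bytes_per_s(133_000_000, 64))  # 1064000000 bytes/s
```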

DDR SDRAM

Double-Data-Rate SDRAM
Standard SDRAM performs all actions on the rising
edge of the clock signal.
DDR SDRAM accesses the cell array in the same
way, but transfers the data on both edges of the
clock.
The cell array is organized in two banks. Each can
be accessed separately.
DDR SDRAMs and standard SDRAMs are most
efficiently used in applications where block transfers
are prevalent.

Structures of Larger Memories

[Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips: the 21-bit address is split into a 19-bit internal chip address (A0-A18) and 2 high-order bits (A19, A20) that feed a 2-bit decoder driving the Chip select inputs; four rows of four chips supply the 32-bit data lines D31-24, D23-16, D15-8, and D7-0.]

Consider a memory consisting of 2M words of 32 bits each.
The figure shows how this memory can be implemented
using 512K x 8 static memory chips.
Each column in the figure implements one byte position
in a word, with four chips providing 2M bytes.
Four columns implement the required 2M x 32 memory.
Each chip has a control input called Chip-select. When
this input is set to 1, it enables the chip to accept data
from or to place data on its data lines.

The data output for each chip is of the tri-state type.
Only the selected chip places data on the data output line, while all
other outputs are electrically disconnected from the data lines.
Twenty-one address bits are needed to select a 32-bit word in this
memory.
The high-order two bits of the address are decoded to determine
which of the four rows should be selected.
The remaining 19 address bits are used to access specific byte
locations inside each chip in the selected row.
The R/W inputs of all chips are tied together to provide a common
Read/Write control line (not shown in the figure).
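The address decoding described above can be sketched as:

```python
def decode_word_address(addr):
    """Split a 21-bit word address: the high-order 2 bits select one of
    the four chip rows, the low-order 19 bits address a location inside
    each 512K x 8 chip in that row."""
    row_select = addr >> 19
    internal = addr & ((1 << 19) - 1)
    return row_select, internal

print(decode_word_address((3 << 19) | 12345))  # (3, 12345)
```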

Memory System
Considerations

The choice of a RAM chip for a given application depends on
several factors:
Cost, speed, power, size
SRAMs are faster, more expensive, smaller.
DRAMs are slower, cheaper, larger.
Which one for cache and main memory, respectively?
Refresh overhead: suppose an SDRAM whose cells are in 8K
rows; 4 clock cycles are needed to access each row; then it
takes 8192 x 4 = 32,768 cycles to refresh all rows; if the clock rate
is 133 MHz, then this takes 32,768 / (133 x 10^6) = 246 x 10^-6 seconds;
if the typical refreshing period is 64 ms, then the refresh
overhead is 0.246 / 64 = 0.0038, i.e. less than 0.4% of the total time
available for accessing the memory.
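The refresh-overhead arithmetic above can be checked directly:

```python
rows = 8192                 # 8K rows
cycles_per_row = 4
clock_hz = 133e6            # 133 MHz
refresh_period_s = 0.064    # refresh all rows every 64 ms

refresh_time_s = rows * cycles_per_row / clock_hz  # about 246 microseconds
overhead = refresh_time_s / refresh_period_s
print(round(overhead, 4))  # 0.0038, i.e. under 0.4%
```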

Memory Controller

[Figure 5.11. Use of a memory controller: the processor sends Address, R/W, Request, Clock, and Data to the memory controller, which generates the multiplexed Row/Column address together with RAS, CAS, R/W, CS, and Clock for the memory.]

Read-Only Memories

Read-Only Memory

Volatile / non-volatile memory

ROM
PROM
EPROM
EEPROM

[Figure 5.12. A ROM cell: a transistor T, gated by the word line, connects the bit line to ground through a point P; the connection at P is left open to store a 1 and closed to store a 0.]

Flash Memory

Similar to EEPROM
Difference: it is only possible to write an entire
block of cells, not a single cell
Low power
Used in portable equipment
Implementations of such modules:

Flash cards
Flash drives

Speed, Size, and Cost

[Figure 5.13. Memory hierarchy: processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; size increases going down the hierarchy, while speed and cost per bit increase going up.]

Cache Memories

Cache

What is a cache?
Why do we need it?
Locality of reference (very important)
- temporal
- spatial
Cache block = cache line

[Figure 5.14. Use of a cache memory: Processor <-> Cache <-> Main memory.]

Replacement algorithm
Hit / miss
Write-through / Write-back
Load through

Direct Mapping

[Figure 5.15. Direct-mapped cache: main-memory blocks 0-4095 map onto a 128-block cache, with memory block j placed in cache block j mod 128 (so blocks 0, 128, 256, ... compete for cache block 0, blocks 1, 129, 257, ... for cache block 1, and so on); each cache block carries a tag, and the main memory address is divided into Tag, Block, and Word fields.]
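Direct mapping is just modular arithmetic on block numbers; a minimal sketch with the 128-block cache of the figure:

```python
CACHE_BLOCKS = 128

def direct_map(block_number):
    """Memory block j is placed in cache block j mod 128; the quotient
    becomes the tag that identifies which candidate block is resident."""
    return block_number // CACHE_BLOCKS, block_number % CACHE_BLOCKS

# Blocks 1, 129, 257, ... all compete for cache block 1.
print(direct_map(129))  # (1, 1): tag 1, cache block 1
```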

Associative Mapping

[Figure 5.16. Associative-mapped cache: any of the 4096 main-memory blocks may be placed in any of the 128 cache blocks; each cache block stores a 12-bit tag, and the main memory address is divided into a 12-bit Tag field and a Word field.]

Set-Associative Mapping

[Figure 5.17. Set-associative-mapped cache with two blocks per set: the 128 cache blocks form 64 sets of two; main-memory block j maps to set j mod 64 and may occupy either block of that set; the main memory address is divided into Tag, Set, and Word fields.]
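The set lookup works like direct mapping applied to sets; a sketch with the 64-set, two-way cache of the figure:

```python
SETS = 64  # 128 cache blocks, two blocks (ways) per set

def set_assoc_map(block_number):
    """Memory block j maps to set j mod 64 and may occupy either of
    the two blocks in that set; the quotient is the stored tag."""
    return block_number // SETS, block_number % SETS

# Blocks 1 and 65 share set 1 and can reside there simultaneously.
print(set_assoc_map(65))  # (1, 1): tag 1, set 1
```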

Replacement Algorithms

It is difficult to determine which blocks to evict.

Least Recently Used (LRU) block:
The cache controller tracks references to all
blocks as computation proceeds, and
increments or clears the tracking counters when a
hit or miss occurs.

Performance
Considerations

Overview

Two key factors: performance and cost

Price/performance ratio
Performance depends on how fast machine
instructions can be brought into the processor for
execution and how fast they can be executed.
In a memory hierarchy, it is beneficial if transfers to
and from the slower units can proceed at a rate close
to that of the faster unit.
This is not possible if both the slow and the fast
units are accessed in the same manner.
However, it can be achieved when parallelism is
used in the organization of the slower unit.

Interleaving

If the main memory is structured as a collection of
physically separated modules, each with its own
address buffer register (ABR) and data buffer register
(DBR), memory access operations may proceed in
more than one module at the same time.

[Figure 5.25. Addressing multiple-module memory systems.
(a) Consecutive words in a module: the high-order k bits of the memory address select one of the 2^k modules, and the remaining m bits give the address within the module.
(b) Consecutive words in consecutive modules (interleaving): the low-order k bits select the module, and the high-order m bits give the address within the module.]
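The two addressing schemes differ only in which end of the address selects the module; as a sketch:

```python
def interleaved(addr, k_bits):
    """(b) Consecutive words in consecutive modules: the low-order
    k bits select the module, the remaining bits the address within it."""
    return addr & ((1 << k_bits) - 1), addr >> k_bits

def blocked(addr, m_bits):
    """(a) Consecutive words in a module: the high-order bits select
    the module, the low-order m bits the address within it."""
    return addr >> m_bits, addr & ((1 << m_bits) - 1)

# With 4 modules (k = 2), consecutive addresses 0..3 land in
# modules 0..3, so four accesses can proceed in parallel.
print([interleaved(a, 2)[0] for a in range(4)])  # [0, 1, 2, 3]
```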

Hit Rate and Miss Penalty

The success rate in accessing information at the various
levels of the memory hierarchy: hit rate / miss rate.
Ideally, the entire memory hierarchy would appear to
the processor as a single memory unit that has the
access time of a cache on the processor chip and
the size of a magnetic disk; this depends on the hit rate
(>> 0.9).
A miss causes extra time to be needed to bring the
desired information into the cache.
Example 5.2, page 332.

How to Improve Hit Rate?

Use a larger cache.

Increase the block size while keeping the
total cache size constant.
However, if the block size is too large, some
items may not be referenced before the block
is replaced, and the miss penalty increases.
Load-through approach

Caches on the Processor Chip

On chip vs. off chip

Two separate caches for instructions and data, respectively
Single cache for both
Which one has the better hit rate?
What is the benefit of separate caches?
Level 1 and Level 2 caches
L1 cache: faster and smaller. Access more than one word
simultaneously and let the processor use them one at a time.
L2 cache: slower and larger.
What about the average access time?
Average access time: tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M
where h is the hit rate, C is the time to access information in a cache,
and M is the time to access information in the main memory.
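Plugging assumed numbers into the tave formula (the hit rates and access times below are illustrative, not from the slides):

```python
def t_ave(h1, h2, c1, c2, m):
    """tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M"""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m

# Assumed: 95% L1 hits at 1 cycle, 90% L2 hits at 10 cycles,
# 100-cycle main memory access -> about 1.9 cycles on average.
print(round(t_ave(0.95, 0.9, 1, 10, 100), 2))  # 1.9
```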

Other Enhancements

Write buffer: the processor doesn't need to wait
for a memory write to be completed.
Prefetching: prefetch data into the cache
before they are needed.
Lockup-free cache: the processor is able to
access the cache while a miss is being
serviced.

Virtual Memories

Overview

Physical main memory is not as large as the address space
spanned by an address issued by the processor.
2^32 = 4 GB, 2^64 = 16 EB
When a program does not completely fit into the main memory,
the parts of it not currently being executed are stored on
secondary storage devices.
Techniques that automatically move program and data blocks
into the physical main memory when they are required for
execution are called virtual-memory techniques.
Virtual addresses are translated into physical addresses.

Overview

Address Translation

All programs and data are composed of fixed-length
units called pages, each of which consists of a block
of words that occupy contiguous locations in the
main memory.
Pages cannot be too small or too large.
The virtual memory mechanism bridges the
size and speed gaps between the main
memory and secondary storage similar to
cache.

Address Translation

[Figure 5.27. Virtual-memory address translation: the virtual address from the processor is split into a virtual page number and an offset; the virtual page number, added to the page table base register, selects a page table entry containing control bits and the page frame in memory; the page frame concatenated with the offset forms the physical address in main memory.]
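The translation in Figure 5.27 amounts to a table lookup plus an offset; a minimal sketch with a hypothetical page table and an assumed 4 KB page size:

```python
PAGE_SIZE = 4096  # assumed page size

# Hypothetical page table: virtual page number -> page frame number
page_table = {0: 7, 1: 3, 2: 9}

def translate(virtual_addr):
    """Split the virtual address into page number and offset, then
    replace the page number with the page frame from the table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError("page fault")  # the OS would bring the page in
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4096 + 100))  # VPN 1 -> frame 3: 3*4096 + 100 = 12388
```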

Address Translation

The page table information is used by the
MMU for every access, so it should be kept
with the MMU.
However, since the MMU is on the processor
chip and the page table is rather large, only a
small portion of it, consisting of the page
table entries that correspond to the most
recently accessed pages, can be
accommodated within the MMU.
Translation Lookaside Buffer (TLB)

[Figure 5.28. Use of an associative-mapped TLB: the virtual page number of the virtual address is compared against the virtual page numbers stored in the TLB entries; on a hit, the associated page frame is concatenated with the offset to form the physical address in main memory; on a miss, the page table in memory must be consulted.]

TLB

The contents of the TLB must be coherent with
the contents of the page tables in memory.
Translation procedure
Page fault
Page replacement
Write-through is not suitable for virtual
memory.
Locality of reference in virtual memory

Memory Management
Requirements

Multiple programs
System space / user space
Protection (supervisor / user state, privileged
instructions)
Shared pages

Secondary Storage

Magnetic Hard Disks

Disk
Disk drive
Disk controller

Organization of Data on a Disk

[Figure 5.30. Organization of one surface of a disk: concentric tracks divided into sectors (e.g. sector 0 of track 0, sector 0 of track 1, sector 3 of track n).]

Access Data on a Disk

Sector header
Following the data, there is an error-correction code (ECC).
Formatting process
Difference between inner tracks and outer tracks
Access time = seek time + rotational delay
(latency time)
Data buffer/cache
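Seek time and rotational delay dominate the access time; a sketch with assumed drive parameters:

```python
def access_time_ms(seek_ms, rpm, transfer_ms=0.0):
    """Access time = seek time + average rotational delay (half a
    revolution on average) + transfer time."""
    rotational_delay_ms = 0.5 * 60_000 / rpm
    return seek_ms + rotational_delay_ms + transfer_ms

# Assumed drive: 6 ms average seek, 7200 RPM spindle.
print(round(access_time_ms(6, 7200), 2))  # 10.17 ms
```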

Disk Controller

[Figure 5.31. Disks connected to the system bus: the processor and main memory share the system bus with a disk controller, which serves one or more disk drives.]

Disk Controller

Seek
Read
Write
Error checking

RAID Disk Arrays

Redundant Array of Inexpensive Disks

Using multiple disks makes huge storage
cheaper, and also makes it possible to improve
the reliability of the overall system.
RAID 0: data striping
RAID 1
RAID 2, 3, 4
RAID 5: parity-based error recovery

Optical Disks

[Figure 5.32. Optical disk. (a) Cross-section: pits and lands in a polycarbonate plastic substrate, covered by aluminum, acrylic, and the label. (b) Transition from pit to land: a light source and detector sense reflection from pits and lands, with no reflection at a pit-land transition. (c) Stored binary pattern: each pit-land or land-pit transition is read as a 1, and the stretches between transitions as 0s.]

Optical Disks

CD-ROM
CD-Recordable (CD-R)
CD-ReWritable (CD-RW)
DVD
DVD-RAM

Magnetic Tape Systems

[Figure 5.33. Organization of data on magnetic tape: data are recorded across 7 or 9 parallel bit tracks; records are separated by record gaps, and files by file marks and file gaps.]
