Input-Output Organization
INTRODUCTION
Note that the designation of a device as either input or output depends on the perspective. Mice and
keyboards take as input physical movement that the human user outputs and convert it into signals that
a computer can understand. The output from these devices is input for the computer. Similarly, printers
and monitors take as input signals that a computer outputs. They then convert these signals into
representations that human users can see or read. (For a human user the process of reading or seeing
these representations is receiving input.)
In computer architecture, the combination of the CPU and main memory (i.e. memory that the CPU
can read and write to directly, with individual instructions) is considered the brain of a computer, and
from that point of view any transfer of information from or to that combination, for example to or from
a disk drive, is considered I/O. The CPU and its supporting circuitry provide memory-mapped I/O that
is used in low-level computer programming in the implementation of device drivers.
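As an illustration, memory-mapped I/O can be sketched as a bus that routes some addresses to device registers instead of RAM, so the same store operation a driver would use drives the device. The address layout and the UART-style transmit register below are hypothetical, not from any particular machine:

```python
# Illustrative sketch of memory-mapped I/O: one address in the address
# space is backed by a device register rather than by RAM, so ordinary
# stores reach the device. (Addresses and the UART-like device are
# hypothetical.)

RAM_SIZE = 0x100
UART_TX = 0x200        # hypothetical transmit-register address

class Uart:
    def __init__(self):
        self.transmitted = []
    def write(self, value):
        self.transmitted.append(chr(value))

class Bus:
    """Routes CPU stores either to RAM or to a device register."""
    def __init__(self):
        self.ram = [0] * RAM_SIZE
        self.uart = Uart()
    def store(self, addr, value):
        if addr == UART_TX:          # memory-mapped device register
            self.uart.write(value)
        else:                        # ordinary RAM location
            self.ram[addr] = value

bus = Bus()
for ch in "Hi":
    bus.store(UART_TX, ord(ch))     # the same "store" a driver would use
bus.store(0x10, 42)                 # ordinary memory write

print("".join(bus.uart.transmitted))  # Hi
print(bus.ram[0x10])                  # 42
```

This is why the same memory-reference instructions can drive a device: the decoding happens in the bus logic, not in the instruction set.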
INPUT-OUTPUT ORGANIZATION
• Peripheral Devices
• Input-Output Interface
• Asynchronous Data Transfer
• Modes of Transfer
• Priority Interrupt
• Direct Memory Access
• Input-Output Processor
• Serial Communication
6.1.1 Keyboard
A keyboard is a human interface device which is represented as a layout of buttons. Each button, or
key, can be used to either input a linguistic character to a computer, or to call upon a particular
function of the computer. Traditional keyboards use spring-based buttons, though newer variations
employ virtual keys, or even projected keyboards.
Examples of types of keyboards include:
Computer keyboard
Keyer
Chorded keyboard
LPFK
A pointing device is any human interface device that allows a user to input spatial data to a
computer. In the case of mice and touch screens, this is usually achieved by detecting movement across
a physical surface. Analog devices, such as 3D mice, joysticks, or pointing sticks, function by
reporting their angle of deflection. Movements of the pointing device are echoed on the screen by
movements of the cursor, creating a simple, intuitive way to navigate a computer's GUI.
Some devices allow many continuous degrees of freedom as input. These can be used as pointing
devices, but are generally used in ways that don't involve pointing to a location in space, such as the
control of a camera angle in 3D applications. These kinds of devices are typically used in
CAVEs, where input that registers all six degrees of freedom (6DOF) is required.
Video input devices are used to digitize images or video from the outside world into the computer. The
information can be stored in a multitude of formats depending on the user's requirement.
Webcam
Image scanner
Fingerprint scanner
Barcode reader
3D scanner
Laser rangefinder
Medical imaging:
o Computed tomography
o Magnetic resonance imaging
o Positron emission tomography
o Medical ultrasonography
An output device is any piece of computer hardware equipment used to communicate the results
of data processing carried out by an information processing system (such as a computer) to the
outside world. In computing, input/output, or I/O, refers to the communication between an
information processing system (such as a computer), and the outside world. Inputs are the signals
or data sent to the system, and outputs are the signals or data sent by the system to the outside.
The most common input devices used by the computer are the keyboard and mouse. The keyboard
allows the entry of textual information while the mouse allows the selection of a point on the
screen by moving a screen cursor to the point and pressing a mouse button. The most common
output devices are monitors and speakers.
• Provides a method for transferring information between internal storage (such as memory and
CPU registers) and external I/O devices
• Resolves the differences between the computer and peripheral devices
– Peripherals - Electromechanical Devices
– CPU or Memory - Electronic Device
– Data Transfer Rate
» Peripherals - Usually slower
» CPU or Memory - Usually faster than peripherals
• Some kind of synchronization mechanism may be needed
– Unit of Information
» Peripherals – Byte, Block, …
» CPU or Memory – Word
– Data representations may differ
- Provides signals (commands) for the peripheral controller
Functions of Buses
* The I/O bus is for information transfers between the CPU and I/O devices through their I/O interfaces
* Many computers use a single common bus system for both memory and the I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses, one to communicate with memory and the other
with I/O interfaces
- Communication between CPU and all interface units is via a common I/O Bus
- An interface connected to a peripheral device may have a number of data registers, a control
register, and a status register
- Function code and sense lines are not needed (Transfer of data, control, and status information is
always via the common I/O Bus)
Isolated I/O
- Separate I/O read/write control lines in addition to the memory read/write control lines.
Memory-mapped I/O
- A single set of read/write control lines (no distinction between memory and I/O transfers)
-> The same memory reference instructions can be used for I/O transfers
6.3.6. Programmable Interface
- Information in each port can be assigned a meaning depending on the mode of operation of the I/O
device. → Port A = Data; Port B = Command; Port C = Status
→ Programmable Port: By changing the bits in the control register, it is possible to change
the interface characteristics.
Synchronous - All devices derive the timing information from a common clock line.
Asynchronous - Data transfer between two independent units requires that control signals be
transmitted between the communicating units to indicate the time at which data is being transmitted.
Strobe pulse: A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has
to occur.
Handshaking: A control signal is accompanied with each data item being transmitted to indicate the
presence of data. The receiving unit responds with another control signal to acknowledge receipt of
the data.
6.4.2.1 STROBE CONTROL
* The strobe may be activated by either the source or the destination unit.
Strobe Methods
1. Source-Initiated
The source unit that initiates the transfer has no way of knowing whether the destination unit
has actually received the data.
2. Destination-Initiated
The destination unit that initiates the transfer has no way of knowing whether the source has actually
placed the data on the bus.
6.4.2.2 HANDSHAKING
To solve these problems, the HANDSHAKE method introduces a second control signal to provide a
reply to the unit that initiates the transfer.
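The request/reply exchange can be sketched as follows; the signal names (valid, accept) and the recorded event trace are illustrative only:

```python
# A minimal sketch of source-initiated two-signal handshaking: the
# source raises "valid" with the data on the bus, the destination
# answers with "accept", and each side only proceeds after seeing the
# other's signal. Signal names are illustrative.

def handshake_transfer(words):
    log = []          # trace of control-signal events
    received = []
    for w in words:
        log.append(("source", "valid=1"))   # data placed on bus
        received.append(w)                  # destination latches data
        log.append(("dest", "accept=1"))    # acknowledge receipt
        log.append(("source", "valid=0"))   # source removes data
        log.append(("dest", "accept=0"))    # ready for next transfer
    return received, log

data, trace = handshake_transfer([0x41, 0x42])
print(data)         # [65, 66]
print(len(trace))   # 8 control-signal events, 4 per word
```

The second signal is what gives each unit positive confirmation that the other side took part in the transfer, which a bare strobe cannot provide.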
* Allows arbitrary delays from one state to the next
- Employs special bits which are inserted at both ends of the character code.
- Each character consists of three parts: a start bit, the data bits, and stop bits.
- When data are not being sent, the line is kept in the 1-state (idle state).
- After the last character, a stop bit is detected when the line returns to the 1-state for at least one bit
time.
- The receiver knows in advance the transfer rate of the bits and the number of information bits to
expect.
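The framing just described can be sketched as follows, assuming 8 data bits sent least-significant bit first and a single stop bit:

```python
# Sketch of asynchronous character framing: a start bit (0), the data
# bits, and a stop bit (1); the line idles at 1. Eight data bits, LSB
# first, and one stop bit are assumed here.

def frame(char, data_bits=8):
    bits = [0]                                               # start bit
    bits += [(ord(char) >> i) & 1 for i in range(data_bits)] # LSB first
    bits += [1]                                              # stop bit
    return bits

def deframe(bits, data_bits=8):
    assert bits[0] == 0 and bits[-1] == 1     # start / stop check
    value = sum(b << i for i, b in enumerate(bits[1:1 + data_bits]))
    return chr(value)

f = frame("A")        # 'A' = 0x41 = 0b01000001
print(f)              # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(deframe(f))     # A
```

Because both sides agree in advance on the bit rate and the number of data bits, the start bit alone is enough for the receiver to resynchronize on every character.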
Transmitter Register
Receiver
Control register: defines the baud rate, the number of bits in each character, whether to generate and
check parity, and the number of stop bits.
FIRST-IN-FIRST-OUT(FIFO) BUFFER
* Output data are always in the same order in which the data entered the buffer.
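A minimal FIFO buffer sketch, with an illustrative 4-word capacity:

```python
# A minimal FIFO buffer sketch: words leave in the same order they
# entered. The 4-word capacity is illustrative only.

class Fifo:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.items = []
    def full(self):
        return len(self.items) == self.capacity
    def empty(self):
        return not self.items
    def insert(self, word):        # producer (input) side
        if self.full():
            raise OverflowError("FIFO full")
        self.items.append(word)
    def remove(self):              # consumer (output) side
        if self.empty():
            raise IndexError("FIFO empty")
        return self.items.pop(0)

f = Fifo()
for w in [10, 20, 30]:
    f.insert(w)
print([f.remove(), f.remove(), f.remove()])   # [10, 20, 30]
```

The full/empty flags are what let two units running at different speeds share the buffer safely.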
Three different data transfer modes between the central computer (CPU or memory) and peripherals:
1. Program-Controlled I/O
2. Interrupt-Initiated I/O
3. Direct Memory Access (DMA)
MODES OF TRANSFER - INTERRUPT INITIATED I/O & DMA
- Open communication only when some data has to be passed -> Interrupt.
- When the interface determines that the I/O device is ready for data transfer, it generates an Interrupt
Request to the CPU.
- Upon detecting an interrupt, the CPU momentarily stops the task it is doing, branches to the service
routine to process the data transfer, and then returns to the task it was performing.
- Large blocks of data are transferred at high speed to or from high-speed devices: magnetic drums,
disks, tapes, etc.
- DMA controller: an interface that provides I/O transfer of data directly to and from the memory and
the I/O device.
- CPU initializes the DMA controller by sending a memory address and the number of words to be
transferred.
- Actual transfer of data is done directly between the device and memory through DMA controller.
PRIORITY INTERRUPT
Priority
- Determines which interrupt is to be served first when two or more requests are made
simultaneously, and also which interrupts are permitted to interrupt the computer while
another is being serviced.
- Higher priority interrupts can make requests while a lower priority interrupt is being serviced.
Polling (software) method:
- Priority is established by the order of polling the devices (interrupt sources); flexible, since it is
established by software.
- Very slow.
Hardware method:
- Requires a priority interrupt manager which accepts all the interrupt requests and determines the
highest priority request.
- Fast, since the highest priority interrupt request is identified by the hardware, and each interrupt
source has its own interrupt vector for accessing its own service routine directly.
Daisy-chain priority:
-> Any device that receives the signal INTACK = 1 at its PI input puts its VAD (vector address) on the bus.
- Among the interrupt-requesting devices, only the device physically closest to the CPU gets
INTACK = 1, and it blocks INTACK from propagating to the next device.
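The daisy-chain arbitration can be sketched as follows; the device ordering and the vector addresses (VADs) are illustrative:

```python
# Daisy-chain priority sketch: INTACK enters at the device closest to
# the CPU; the first requesting device claims it, puts its vector
# address (VAD) on the bus, and blocks INTACK from propagating further.
# Device order and VAD values are illustrative.

def daisy_chain(requests, vads):
    """requests[i] is True if device i (i = 0 closest to CPU) requests."""
    intack = 1
    for i, req in enumerate(requests):
        if intack and req:
            return vads[i]   # this device responds and blocks INTACK
        # a non-requesting device passes INTACK on to the next device
    return None              # no device requested an interrupt

# devices 1 and 2 request simultaneously; device 1 (closer) wins
print(hex(daisy_chain([False, True, True], [0x40, 0x44, 0x48])))  # 0x44
```

Priority here is fixed purely by physical position in the chain, which is why the scheme needs no priority register.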
One stage of the daisy chain priority arrangement
IST: Indicates that an unmasked interrupt has occurred. INTACK enables the tri-state bus buffer to load
the VAD generated by the priority logic.
Interrupt Register:
- Each bit is associated with an interrupt request from a different interrupt source, at a different priority
level.
Mask Register:
- Each bit enables or disables the interrupt request at the corresponding priority level.
INTERRUPT PRIORITY ENCODER
Determines the highest priority interrupt when more than one interrupt takes place.
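A sketch of the encoder's selection logic, assuming four priority levels with bit 0 as the highest priority and the interrupt register ANDed with the mask register:

```python
# Sketch of a priority interrupt encoder: the interrupt register is
# ANDed with the mask register, and the highest-priority set bit
# (bit 0 highest here, an assumption) selects the level to serve.

def priority_encode(interrupt_reg, mask_reg, num_levels=4):
    pending = interrupt_reg & mask_reg   # only unmasked requests count
    for level in range(num_levels):      # level 0 = highest priority
        if pending & (1 << level):
            return level
    return None                          # no unmasked interrupt pending

# requests at levels 1 and 3, all levels unmasked -> level 1 is served
print(priority_encode(0b1010, 0b1111))   # 1
# level 1 masked out -> level 3 is served instead
print(priority_encode(0b1010, 0b1101))   # 3
```

In hardware the loop is replaced by combinational logic, so the winning level appears in a single propagation delay.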
INTERRUPT CYCLE
Initial and Final Operations
Each interrupt service routine must have an initial and final set of operations for controlling the
registers in the hardware interrupt system.
* Block data transfers from high-speed devices: drum, disk, tape.
* DMA controller - an interface which allows I/O transfer directly between memory and the device,
freeing the CPU for other tasks.
* CPU initializes the DMA controller by sending the memory address and the block size (number of words).
Starting an I/O
- CPU executes instructions to:
d. Issue a GO command
Upon receiving a GO command, the DMA controller performs the I/O operation as follows, independently of the CPU.
[2] Buffer (DMA Controller) <- Input Byte; the controller assembles the bytes into a word until the word is full.
[5] Address Reg <- Address Reg + 1; WC (Word Counter) <- WC - 1.
Output
[3] Buffer <- One byte; Output Device <- W, for all disassembled bytes.
While the DMA I/O takes place, the CPU is also executing instructions. The DMA Controller and the CPU
both access memory -> Memory Access Conflict.
- Priority System
Memory accesses by CPU and DMA Controller are interwoven, with the top priority given to DMA
Controller
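The register updates above can be sketched as a simple input-transfer loop; the memory size, start address, and data values are illustrative:

```python
# Sketch of a DMA block transfer: the CPU loads the address register
# and word count, then the controller moves words directly between the
# device and memory, incrementing the address and decrementing WC until
# WC reaches zero. Register names follow the text; values are illustrative.

def dma_input_transfer(memory, start_addr, device_words):
    addr_reg = start_addr             # Address Register, set by the CPU
    wc = len(device_words)            # Word Counter, set by the CPU
    for word in device_words:
        memory[addr_reg] = word       # direct device-to-memory write
        addr_reg += 1                 # Address Reg <- Address Reg + 1
        wc -= 1                       # WC <- WC - 1
    return addr_reg, wc              # WC == 0 signals end of block

mem = [0] * 16
end_addr, wc = dma_input_transfer(mem, 4, [111, 222, 333])
print(mem[4:7])       # [111, 222, 333]
print(end_addr, wc)   # 7 0
```

The CPU is involved only before the loop (initialization) and after it (the end-of-block interrupt); every word in between bypasses the CPU entirely.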
Cycle Steal
- The CPU is usually much faster than I/O (DMA), thus the CPU uses most of the memory cycles.
- For a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the
CPU to remain idle for a long time.
DMA TRANSFER
Channel
- Processor with direct memory access capability that communicates with I/O devices.
iii. Each CCW (Channel Command Word) specifies the parameters needed by the channel to control
the I/O devices and perform data transfer operations.
- CPU initiates the channel by executing a channel I/O class instruction; once initiated, the channel
operates independently of the CPU.
Multiple choice Questions
c). 1 , handshaking
d). For those slow CPU, DMA Controller may steal most of the memory cycles which may
cause CPU remain idle for few seconds .
6. Transmitter register
7. The larger the RAM of a computer, the faster its processing speed is, since it eliminates
a). need for external memory b). need for ROM
8. A group of signal lines used to transmit data in parallel from one element of a computer to
another is
a).Control Bus b).Address Bus
9. The basic unit within a computer store capable of holding a single unit of Data is
Answers
Chapter 7
Memory Organization
The memory unit is an essential component in any digital computer since it is needed for storing
programs and data. Most general-purpose computers would run more efficiently if they were equipped
with additional storage beyond the capacity of the main memory. The memory unit that communicates
directly with the CPU is called the MAIN MEMORY. Devices that provide backup storage are called
AUXILIARY MEMORY. The most common auxiliary devices are magnetic disks and tapes; they are
used for storing system programs, large data files, and other backup information. Only the programs
and data currently needed by the processor reside in main memory. All other information is stored in
auxiliary memory and transferred to main memory when needed.
The memory hierarchy system consists of all storage devices employed in a computer system, from
the slow but high-capacity auxiliary memory, to a relatively faster main memory, to an even smaller
and faster cache memory accessible to the high-speed processing logic. The goal of the memory
hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory
system.
A very high speed memory called cache memory is used to increase the speed of processing by
making current programs and data available to the CPU at a rapid rate. The cache memory is
employed in the system to compensate for the speed differential between main memory access time
and processor logic.
The main memory is the central storage unit in a computer system. It is a relatively large and fast
memory used to store programs and data during computer operations. The principal technology used
for main memory is based on semiconductor integrated circuits. Integrated-circuit RAM chips are
available in two possible operating modes: static and dynamic. The static RAM is easier to use and
has shorter read and write cycles.
The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory
chip compared to static RAM.
Most of the main memory in a general-purpose computer is made up of RAM integrated-circuit chips,
but a portion of the memory may be constructed with ROM chips. Originally, RAM was used to refer
to random-access memory, but now it is used to designate read/write memory, to distinguish it from
read-only memory, although ROM is also random access. RAM is used for storing the bulk of the
programs and data that are subject to change. ROM is used for storing programs that are permanently
resident in the computer and for tables of constants that do not change in value once the production of
the computer is completed. Among other things, the ROM portion is used to store an initial program
called a bootstrap loader. This is a program whose function is to start the computer software operating
system. Since RAM is volatile, its contents are destroyed when power is turned off; on the other hand,
the contents of ROM remain unchanged after the power is turned off and on again.
7.2.2 Memory Address maps
The designer of a computer system must calculate the amount of memory required for the particular
application and assign it to either RAM or ROM. The interconnection between memory and processor
is then established from knowledge of the size of memory needed and the types of RAM and ROM
chips available. The addressing of memory can be established by means of a table that specifies the
memory address assigned to each chip. The table, called a memory address map, is a pictorial
representation of the assigned address space for each chip in the system.
1. RAM and ROM chips are connected to the CPU through the data and address buses.
2. The low-order lines in the address bus select the byte within the chip, and the other lines in the
address bus select a particular chip through its chip select inputs.
7.3. Auxiliary Memory
The most common auxiliary memory devices used in computer systems are magnetic disks and tapes.
Other components used, but not as frequently, are magnetic drums, magnetic bubble memory, and
optical disks. To understand fully the physical mechanism of auxiliary memory devices one must have
knowledge of magnetics, electronics, and electromechanical systems.
A magnetic tape transport consists of the electrical, mechanical, and electronic components that
provide the parts and control mechanism for a magnetic-tape unit. The tape itself is a strip of plastic
coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along
tracks. Usually, seven or nine bits are recorded simultaneously to form a character, together with a
parity bit. Read/write heads are mounted one on each track so that data can be recorded and read as a
sequence of characters.
A magnetic disk is a circular plate constructed of metal or plastic coated with magnetizable material.
Often both sides of the disk are used, and several disks may be stacked on one spindle, with read/write
heads available on each surface. All disks rotate together at high speed and are not stopped or started
for access purposes. Bits are stored on the magnetized surface in spots along concentric circles called
tracks. The tracks are commonly divided into sections called sectors. In most systems, the minimum
quantity of information which can be transferred is a sector.
7.3.3 RAID
RAID is an acronym first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the
University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks, a
technology that allowed computer users to achieve high levels of storage reliability from low-cost and
less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for
redundancy. More recently, marketers representing industry RAID manufacturers reinvented the term
to describe a Redundant Array of Independent Disks as a means of disassociating a "low cost"
expectation from RAID technology.
"RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate
data among multiple hard disk drives. The different Schemes/architectures are named by the word RAID
followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all involve two key design
goals: increased data reliability or increased input/output performance. When multiple physical disks are
set up to use RAID technology, they are said to be in a RAID array. This array distributes data across
multiple disks, but the array is seen by the computer user and operating system as one single disk. RAID
can be set up to serve several different purposes.
Purpose and basics: Redundancy is achieved by either writing the same data to multiple drives (known
as mirroring), or writing extra data (known as parity data) across the array, calculated such that the
failure of one (or possibly more, depending on the type of RAID) disks in the array will not result in loss
of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining
data and the parity data. Organizing disks into a redundant array decreases the usable storage capacity.
For instance, a 2-disk RAID 1 array loses half of the total capacity that would have otherwise been
available using both disks independently, and a RAID 5 array with several disks loses the capacity of
one disk. Other types of RAID arrays are arranged so that they are faster to write to and read from than a
single disk.There are various combinations of these approaches giving different trade-offs of protection
against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover
most requirements.
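The capacity rules just described can be sketched as a small calculation; only levels 0, 1 (as a 2-disk mirror), and 5 are covered here:

```python
# Sketch of the usable-capacity rules stated above: RAID 0 uses all
# disks, a 2-disk RAID 1 mirror keeps half the raw capacity, and
# RAID 5 loses one disk's worth of capacity to parity.

def usable_capacity(level, n_disks, disk_gb):
    if level == 0:                 # striping, no redundancy
        return n_disks * disk_gb
    if level == 1:                 # mirroring (2-disk array assumed)
        return disk_gb
    if level == 5:                 # striping with distributed parity
        return (n_disks - 1) * disk_gb
    raise ValueError("RAID level not covered by this sketch")

print(usable_capacity(0, 3, 500))   # 1500
print(usable_capacity(1, 2, 500))   # 500
print(usable_capacity(5, 3, 500))   # 1000 (matches the 1TB example below)
```

The trade-off is explicit in the arithmetic: the redundancy that survives a disk failure is exactly the capacity that disappears from the array.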
RAID can involve significant computation when reading and writing information. With traditional
"real" RAID hardware, a separate controller does this computation. In other cases the operating system
or simpler and less expensive controllers require the host computer's processor to do the computing,
which reduces the computer's performance on processor-intensive tasks (see "Software RAID" and
"Fake RAID" below). Simpler RAID controllers may provide only levels 0 and 1, which require less
processing.
RAID systems with redundancy continue working without interruption when one (or possibly more,
depending on the type of RAID) disks of the array fail, although they are then vulnerable to further
failures. When the bad disk is replaced by a new one the array is rebuilt while the system continues to
operate normally. Some systems have to be powered down when removing or adding a drive; others
support hot swapping, allowing drives to be replaced without powering down. RAID with hot-
swapping is often used in high availability systems, where it is important that the system remains
running as much of the time as possible.
Principles: RAID combines two or more physical hard disks into a single logical unit by using either
special hardware or software. Hardware solutions often are designed to present themselves to the
attached system as a single hard drive, so that the operating system would be unaware of the technical
workings. For example, you might configure a 1TB RAID 5 array using three 500GB hard drives in
hardware RAID; the operating system would simply be presented with a "single" 1TB disk. Software
solutions are typically implemented in the operating system and would present the RAID drive as a
single drive to applications running upon the operating system.
There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping,
the splitting of data across more than one disk; and error correction, where redundant data is stored to
allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use
one or more of these techniques, depending on the system requirements. RAID's main aim can be
either to improve reliability and availability of data, ensuring that important data is available more
often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g.
for a system that delivers video on demand TV programs to many viewers).
The number of accesses to memory depends on the location of the item and the efficiency of the search
algorithm.
The time required to find an item stored in memory can be reduced considerably if the stored data can
be identified for access by the content of the data itself rather than by an address. A memory unit
accessed by content is called an associative memory or content-addressable memory (CAM).
- Compare each word in the CAM in parallel with the content of A (Argument Register).
- K (Key Register) provides a mask for choosing a particular field or key in the argument in A (only
those bits in the argument that have 1's in the corresponding positions of K are compared).
Organization of CAM
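The masked parallel comparison can be sketched as follows, with illustrative 4-bit words:

```python
# Sketch of an associative (CAM) match: every word is compared with the
# argument register A, and the key register K masks the comparison to
# the bit positions where K has 1's. In hardware all comparisons occur
# in parallel; 4-bit words are assumed for illustration.

def cam_match(words, A, K):
    """Return indices of words whose masked bits equal A's masked bits."""
    return [i for i, w in enumerate(words) if (w & K) == (A & K)]

words = [0b1011, 0b0111, 0b1010, 0b1111]
# compare only the two high-order bits (K = 1100)
print(cam_match(words, A=0b1000, K=0b1100))   # [0, 2]
# compare all bits (K = 1111): exact match only
print(cam_match(words, A=0b1011, K=0b1111))   # [0]
```

Setting K to all 1's degenerates to an exact match; clearing bits of K is what selects a field or key within the stored words.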
The cache is a small amount of high-speed memory, usually with a memory cycle time comparable to
the time required by the CPU to fetch one instruction. The cache is usually filled from main memory
when instructions or data are fetched into the CPU. Often the main memory will supply a wider data
word to the cache than the CPU requires, to fill the cache more rapidly. The amount of information
which is replaced at one time in the cache is called the line size for the cache. This is normally the
width of the data bus between the cache memory and the main memory. A wide line size for the cache
means that several instruction or data words are loaded into the cache at one time, providing a kind of
prefetching for instructions or data. Since the cache is small, the effectiveness of the cache relies on the
following properties of most programs:
Spatial locality -- most programs are highly sequential; the next instruction usually comes from
the next memory location.
Data is usually structured, and data in these structures normally are stored in contiguous
memory locations.
Short loops are a common program structure, especially for the innermost sets of nested loops.
This means that the same small set of instructions is used over and over.
Generally, several operations are performed on the same data values, or variables.
When a cache is used, there must be some way in which the memory controller determines whether the
value currently being addressed in memory is available from the cache. There are several ways that this
can be accomplished. One possibility is to store both the address and the value from main memory in
the cache, with the address stored in a type of memory called associative memory or, more
descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value is
presented to the memory, the address of the value is returned if the value is stored in the memory,
otherwise an indication that the value is not in the associative memory is returned. All of the
comparisons are done simultaneously, so the search is performed very quickly. This type of memory is
very expensive, because each memory location must have both a comparator and a storage element. A
cache memory can be implemented with a block of associative memory, together with a block of
``ordinary'' memory. The associative memory would hold the address of the data stored in the cache,
and the ordinary memory would contain the data at that address. Such a cache memory might be
configured as shown in Figure.
If the address is not found in the associative memory, then the value is obtained from main memory.
Associative memory is very expensive, because a comparator is required for every word in the
memory, to perform all the comparisons in parallel. A cheaper way to implement a cache memory,
without using expensive associative memory, is to use direct mapping. Here, part of the memory
address (usually the low order digits of the address) is used to address a word in the cache. This part of
the address is called the index. The remaining high-order bits in the address, called the tag, are stored
in the cache memory along with the data. For example, if a processor has an 18-bit address for
memory, and a cache of 1K words of 2 bytes (16 bits) length, and the processor can address single
bytes or 2-byte words, we might have the memory address field and cache organized as in Figure .
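The address split in this example can be sketched as follows: 1 byte-offset bit, a 10-bit index selecting one of the 1K cache words, and a 7-bit tag from the 18-bit address. The concrete addresses below are illustrative:

```python
# Sketch of the direct-mapped address split described above: an 18-bit
# byte address, a 1K-word cache of 2-byte words -> 1 byte-offset bit,
# 10 index bits, 7 tag bits.

def split_address(addr):
    byte = addr & 0b1              # byte within the 2-byte word
    index = (addr >> 1) & 0x3FF    # 10 bits -> 1K cache words
    tag = addr >> 11               # remaining 7 high-order bits
    return tag, index, byte

# two addresses with the same index but different tags map to the same
# cache location and therefore replace each other on a miss
print(split_address(0x006))   # (0, 3, 0)
print(split_address(0x806))   # (1, 3, 0)
```

On each access the cache compares the stored tag at the indexed location with the tag bits of the address; equality means a hit.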
This was, in fact, the way the cache is organized in the PDP-11/60. In the 11/60, however, there are 4
other bits used to ensure that the data in the cache is valid. 3 of these are parity bits; one for each byte
and one for the tag. The parity bits are used to check that a single bit error has not occurred to the data
while in the cache. A fourth bit, called the valid bit is used to indicate whether or not a given location
in cache is valid. In the PDP-11/60 and in many other processors, the cache is not updated if memory
is altered by a device other than the CPU (for example when a disk stores new data in memory). When
such a memory operation occurs to a location which has its value stored in cache, the valid bit is reset
to show that the data is ``stale'' and does not correspond to the data in main memory. As well, the valid
bit is reset when power is first applied to the processor or when the processor recovers from a power
failure, because the data found in the cache at that time will be invalid. In the PDP-11/60, the data path
from memory to cache was the same size (16 bits) as from cache to the CPU. (In the PDP-11/70, a
faster machine, the data path from the CPU to cache was 16 bits, while from memory to cache was 32
bits which means that the cache had effectively prefetched the next instruction, approximately half of
the time). The amount of information (instructions or data) stored with each tag in the cache is called
the line size of the cache. (It is usually the same size as the data path from main memory to the cache.)
A large line size allows the prefetching of a number of instructions or data words. All items in a line of
the cache are replaced in the cache simultaneously, however, resulting in a larger block of data being
replaced for each cache miss.
The MIPS R2000/R3000 had a built-in cache controller which could control a cache up to 64K bytes.
For a similar 2K word (or 8K byte) cache, the MIPS processor would typically have a cache
configuration as shown in Figure . Generally, the MIPS cache would be larger (64Kbytes would be
typical, and line sizes of 1, 2 or 4 words would be typical).
Figure: One possible MIPS cache organization
A characteristic of the direct mapped cache is that a particular memory address can be mapped into
only one cache location. Many memory addresses are mapped to the same cache location (in fact, all
addresses with the same index field are mapped to the same cache location.) Whenever a ``cache miss''
occurs, the cache line will be replaced by a new line of information from main memory at an address
with the same index but with a different tag.
Note that if the program ``jumps around'' in memory, this cache organization will likely not be
effective because the index range is limited. Also, if both instructions and data are stored in cache, it
may well happen that both map into the same area of cache, and may cause each other to be replaced
very often. This could happen, for example, if the code for a matrix operation and the matrix data itself
happened to have the same index values.
A more interesting configuration for a cache is the set associative cache, which uses a set associative
mapping. In this cache organization, a given memory location can be mapped to more than one cache
location. Here, each index corresponds to two or more data words, each with a corresponding tag. A
set associative cache with n tag and data fields is called an ``n-way set associative cache''. Usually
n = 2^k, for k = 1, 2, 3, is chosen for a set associative cache (k = 0 corresponds to direct mapping).
Such n-way set associative caches allow interesting tradeoff possibilities; cache performance can be
improved by increasing the number of ``ways'', or by increasing the line size, for a given total amount
of memory. An example of a 2-way set associative cache is shown in Figure , which shows a cache
containing a total of 2K lines, or 1K sets, each set being 2-way associative. (The sets correspond to the
rows in the figure.)
In a 2-way set associative cache, if one data word is empty for a read operation corresponding to a
particular index, then it is filled. If both data words are filled, then one must be overwritten by the new
data. Similarly, in an n-way set associative cache, if all n data and tag fields in a set are filled, then one
value in the set must be overwritten, or replaced, in the cache by the new tag and data values. Note that
an entire line must be replaced each time. The most common replacement algorithms are:
Random -- the location for the value to be replaced is chosen at random from all n of the cache
locations at that index position. In a 2-way set associative cache, this can be accomplished with
a single modulo 2 random variable obtained, say, from an internal clock.
First in, first out (FIFO) -- here the first value stored in the cache, at each index position, is the
value to be replaced. For a 2-way set associative cache, this replacement strategy can be
implemented by setting a pointer to the previously loaded word each time a new word is stored
in the cache; this pointer need only be a single bit. (For set sizes > 2, this algorithm can be
implemented with a counter value stored for each ``line'', or index in the cache, and the cache
can be filled in a ``round robin'' fashion).
Least recently used (LRU) -- here the value which was actually used least recently is replaced, on the assumption that the most recently used value is the one most likely to be required again in the near future. For a 2-way set associative cache, this strategy can be implemented by adding a single ``USED'' bit to each cache location: when a value is accessed, the USED bit for the other word in the set is set, while the bit for the accessed word is reset. The value to be replaced is then the value whose USED bit is set. For an n-way set associative cache, this strategy can be implemented by storing a modulo n counter with each data word. (It is an interesting exercise to determine exactly what must be done in this case; the required circuitry may become somewhat complex for large n.)
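The 2-way LRU scheme above can be sketched in software. The class below is a minimal illustration (the names `TwoWaySetAssociativeCache` and `memory_read` are my own, not from the text); the `lru` field plays the role of the USED bit, pointing at the way to replace on the next miss.

```python
# Sketch of a 2-way set associative cache with USED-bit LRU replacement.
class TwoWaySetAssociativeCache:
    def __init__(self, num_sets):
        # each set holds two (tag, data) slots and one LRU bit
        self.num_sets = num_sets
        self.sets = [{"tags": [None, None], "data": [None, None], "lru": 0}
                     for _ in range(num_sets)]

    def access(self, index, tag, memory_read):
        s = self.sets[index % self.num_sets]
        for way in (0, 1):
            if s["tags"][way] == tag:          # hit
                s["lru"] = 1 - way             # other way is now the LRU victim
                return s["data"][way], True
        victim = s["lru"]                      # miss: replace the LRU way
        s["tags"][victim] = tag
        s["data"][victim] = memory_read(tag, index)
        s["lru"] = 1 - victim
        return s["data"][victim], False
```

A FIFO cache would differ only in that the pointer advances when a word is *loaded* rather than when it is *accessed*.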
Cache memories normally allow one of two things to happen when data is written into a memory
location for which there is a value stored in cache:
Write through cache -- both the cache and main memory are updated at the same time. This
may slow down the execution of instructions which write data to memory, because of the
relatively longer write time to main memory. If memory writes are relatively infrequent, however, buffering them can hide much of this cost.
Write back cache -- here only the cache is updated directly by the CPU; the cache memory
controller marks the value so that it can be written back into memory when the word is
removed from the cache. This method is used because a memory location may often be altered
several times while it is still in cache without having to write the value into main memory. This
method is often implemented using an ``ALTERED'' bit in the cache. The ALTERED bit is set
whenever a cache value is written into by the processor. Only if the ALTERED bit is set is it
necessary to write the value back into main memory (i.e., only values which have been altered
must be written back into main memory). The value should be written back immediately before
the value is replaced in the cache.
In a memory hierarchy system, programs and data are first stored in auxiliary memory; portions of a program or its data are brought into main memory as the CPU needs them. Virtual memory is a concept used in some large computer systems that permits the user to construct programs as though a large memory space were available, equal to the totality of auxiliary memory. Each address referenced by the CPU goes through an address mapping from the so-called virtual address to a physical address in main memory. Virtual memory thus gives the programmer the illusion that the system has a very large memory, even though the computer actually has a relatively small main memory.
7.6.1 Address Mapping: Memory Mapping Table for Virtual Address -> Physical Address
The address space and the memory space are each divided into fixed-size groups of words called blocks or pages (for example, groups of 1K words).
Organization of the memory mapping table in a paged system
Assume that the number of blocks in memory is m and the number of pages in the virtual address space is n.
Page Table
A straightforward design uses an n-entry table in memory. This gives inefficient storage utilization, since n - m entries of the table are empty.
A more efficient method is an m-entry page table, built from an associative memory of m words, each word holding a (page number : block number) pair.
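The m-entry associative page table can be modelled in software as follows. A real associative memory compares the key against all entries in parallel; a Python dict (my own stand-in, not from the text) models the same page-to-block mapping.

```python
# Sketch: m-entry page table as (page number -> block number) pairs.
def make_page_table(pairs):
    """pairs: iterable of (page_number, block_number) for resident pages."""
    return dict(pairs)

def translate(page_table, virtual_addr, page_size=1024):
    """Map a virtual address to a physical address; raise on a page fault."""
    page, offset = divmod(virtual_addr, page_size)
    if page not in page_table:
        raise KeyError(f"page fault: page {page} not in memory")
    return page_table[page] * page_size + offset
```

Only the m resident pages occupy table entries; a reference to any other page misses the lookup, which is exactly the page-fault condition discussed next.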
Figure: associative-memory page table. The page number of the virtual address is loaded into a key register and compared in parallel with the page-number field of every word of the associative memory; on a match, the corresponding block number is read out.
Page Fault
1. Trap to the OS.
2. Save the user registers and program state.
3. Determine that the interrupt was a page fault.
4. Check that the page reference was legal and determine the location of the page on the backing store (disk).
5. Issue a read from the backing store to a free memory frame.
6. While waiting for the I/O, allocate the CPU to some other user.
7. Receive an interrupt from the disk when the read is complete.
8. Save the registers and program state for the other user.
9. Determine that the interrupt was from the disk.
10. Correct the page tables (the desired page is now in memory).
11. Wait for the CPU to be allocated to this process again.
12. Restore the user registers, program state, and new page table, then resume the interrupted instruction.
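The overall fault-and-retry flow can be sketched as follows. This is my own illustration (the function name, the dict-based backing store, and the free-block list are assumptions for the example, not from the text): a missing page traps, the page is brought into a free block, the table is corrected, and the access is then completed.

```python
# Sketch: handle a page fault, then resume the faulting access.
def access_with_fault_handling(page_table, backing_store, free_blocks,
                               virtual_addr, page_size=1024):
    page, offset = divmod(virtual_addr, page_size)
    if page not in page_table:                 # trap: page fault
        if page not in backing_store:
            raise ValueError("illegal page reference")
        block = free_blocks.pop()              # claim a free memory block
        # (a full OS would run a replacement algorithm if none are free)
        page_table[page] = block               # correct the page table
    return page_table[page] * page_size + offset   # resume the access
```

Note that the access is simply re-executed after the table is corrected, which is why the processor must be able to restart any instruction after a page fault.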
Processor architecture should provide the ability to restart any instruction after a page fault.
Multiple choice questions
1. The contents of these chips are lost when the computer is switched off:
a) ROM chips
b) RAM chips
c) DRAM chips
a) ROM chips
b) RAM chips
c) DRAM chips
a) Associative memory
b) Main memory
c) Cache memory
4. How many bits of information can each memory cell in a computer chip hold?
a) 0 bits
b) 1 bit
c) 8 bits
a) ROM chips
b) RAM chips
c) DRAM chips
6. The interface between level 2 (operating system) and level 1 (microprogram) of a computer design is called:
a) Computer architecture
a) SAM  b) ROM
9. A microcomputer has a primary memory of 640K. What is the exact number of bytes contained in this memory?
Answers