Hardware and Computer Organization: The Software Perspective


By
Arnold S. Berger
Solutions for
Even-Numbered Problems
Newnes is an imprint of Elsevier
Chapter 1: Solutions for

2. Moore's Law states that the density of integrated circuits doubles every 18 months. Therefore,
we would expect a 200 million transistor processor around July of 2005.
4. Most PCs use a PCI bus to communicate between the I/O cards and the processor. The AGP
bus connects the video card to the CPU. Auxiliary busses include the USB bus, which comes
in two flavors, 1.0 and 2.0, with 2.0 being the faster version. FireWire is built into Apple
computers and is available on some PCs. Most disk drives use the IDE bus, although some
higher-performance drives use the SCSI bus protocol.
6. Decimal 357 in base 9 is 436.
8. Convert the following decimal numbers to binary:
(m) 510 = 111111110
(n) 64,200 = 1111101011001000
(o) 4,001 = 111110100001
(p) 255 = 11111111
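The conversions in problems 6 and 8 can be checked with repeated division by the target base. The following is a minimal sketch; the function name to_base is an illustrative choice, not something taken from the text.

```python
def to_base(n: int, base: int) -> str:
    """Convert a non-negative integer to a digit string in the given base (2..10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the least significant digit
        digits.append(str(remainder))
    return "".join(reversed(digits))


print(to_base(357, 9))    # problem 6
print(to_base(64200, 2))  # problem 8 (n)
```

Python's built-in int("436", 9) performs the reverse conversion and gives an independent check.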

2. Modify the 4-input NAND gate to 2-inputs and add an inverter to the output to create the AND
function as shown below:
[Figure: 2-input NAND gate followed by an inverter; output F = A*B]
4. When X is raised to logic level 1 it creates a low impedance path to ground. The signal from A
to B is short-circuited to ground. No signal can travel from A to B unless X is at logic 0.
6. The first thing to notice is that the NOT gate connected to input C controls the
two AND gates. When C = 0, the upper AND gate is enabled; when C = 1, the
lower AND gate is enabled. Thus, we may consider the effects of the NOR and
XOR gates independently. Here is the truth table:

A  B  C  X
0  0  0  1
1  0  0  0
0  1  0  0
1  1  0  0
0  0  1  0
1  0  1  1
0  1  1  1
1  1  1  0

8.
            A = 0                        A = 1
OR gate     Output follows input at B    Output is always at logic level 1.
XOR gate    Output follows input at B    Output is 180 degrees out of phase with the input (inverted).

2. The truth tables are shown, right:
CASE 1 (table columns: A, B, A NOR B, *A, *B, *A AND *B)
CASE 2
4. The truth table is shown below:

A  B  C  D  X
0  0  0  0  1
0  0  1  0  1
0  0  0  1  1
0  0  1  1  0
0  1  0  0  0
0  1  1  0  1
0  1  0  1  1
0  1  1  1  0
1  0  0  0  1
1  0  1  0  0
1  0  0  1  0
1  0  1  1  0
1  1  0  0  0
1  1  1  0  0
1  1  0  1  0
1  1  1  1  1
(Problem 2, CASE 2 table columns: A NAND B, *A, *B, *A OR *B)

6. The circuit is shown below:
[Figure: circuit with inputs A0, B0, A1, B1, A2, B2, A3, B3, four XOR gates, and one AND gate]
8. This problem can be very tedious, but it is actually quite simple if you realize that you can
convert the logic to positive logic for the output devices. This means that there is only one
output term for each row in the truth table. Then you only need to add a NOT gate at the end
(or use a NAND gate). You could still do all of the Boolean algebra and K-map manipulations
and you would arrive at the right answer.
[Figure: address inputs A0, A1, A2 driving eight NAND gates with outputs CS0 through CS7]
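The positive-logic shortcut described in problem 8 can be sketched in a few lines: each output is the NAND of the three address lines, taken either directly or inverted, so exactly one output goes low for each address. This sketch assumes A2 is the most significant address bit and that CSn goes low when the address equals n; neither detail is recoverable from the figure residue.

```python
def nand(*inputs):
    """Return 0 only when every input is 1."""
    return 0 if all(inputs) else 1


def decode(a2, a1, a0):
    """Return the active-low outputs [CS0, ..., CS7] for a 3-bit address."""
    outputs = []
    for n in range(8):
        terms = []
        for bit_position, value in ((2, a2), (1, a1), (0, a0)):
            wants_one = (n >> bit_position) & 1
            # Use the line directly where output n needs a 1, inverted where it needs a 0.
            terms.append(value if wants_one else 1 - value)
        outputs.append(nand(*terms))
    return outputs


for address in range(8):
    bits = (address >> 2) & 1, (address >> 1) & 1, address & 1
    print(bits, decode(*bits))
```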

2. This circuit counts down from zero, instead of counting up. Thus, the sequence is 0, 7, 6, 5, 4,
3, 2, 1, 0 and so on. Following is the timing diagram:
[Timing diagram: Clock, Q0, Q1, and Q2 over clock periods T0 through T12]
4. FF1 and FF2 are connected like a binary counter, so they'll count up from 00 to 11 and then
return to 00. Thus, we can fill them in right away. The XOR gate has a zero output when
both inputs are 1 or both are 0. After the RESET pulse, both *Q outputs are 1, so the output of the
XOR gate is 0. Thus, the pattern that will exit the XOR gate is 0, 1, 1, 0, 0. This is then the
input to FF3 and FF4. The truth table is shown below:
                       Q0  Q1  Q2  Q3
After reset pulse      0   0   0   0
After clock pulse 1    1   0   0   0
After clock pulse 2    0   1   1   0
After clock pulse 3    1   1   1   1
After clock pulse 4    0   0   0   1
6. Each D-FF stage in the counter introduces a delay of 25 ns from the rising edge of the clock to
the time that the output change has settled. Therefore, when one clock pulse enters
the first FF, any change to the last Q output will occur 8 × 25 ns later, or after 200 ns.
If we want to be sure that the last output has stabilized before another clock pulse comes in,
then the next rising clock edge can't occur until 200 ns has elapsed. Therefore, the maximum frequency
of a string of clock pulses with rising edges separated by 200 ns is 1/200 ns, or 5.0 MHz.

8. Following is the circuitry for the switching logic block:
[Figure: switching logic block; labeled signals Cin, Cout, K, LOAD]
In case you're wondering how the circuit was derived, here are the details. Following is the truth
table:
Cin  Q  D  LOAD  Cout  J  K
0    0  0  0     0     0  0
1    0  0  0     0     0  0
0    1  0  0     0     0  0
1    1  0  0     1     1  1
0    0  1  0     0     0  0
1    0  1  0     0     0  0
0    1  1  0     0     0  0
1    1  1  0     1     1  1
0    0  0  1     0     0  1
1    0  0  1     0     0  1
0    1  0  1     0     0  1
1    1  0  1     1     0  1
0    0  1  1     0     1  0
1    0  1  1     0     1  0
0    1  1  1     0     1  0
1    1  1  1     1     1  0
The K-map will simplify the equations for J, K and Cout (*X denotes the complement of X):

J = D * [LOAD + Cin * Q] + [*LOAD * Cin * Q]
K = *D * [LOAD + Cin * Q] + [*LOAD * Cin * Q]
Cout = Cin * Q
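The J, K, and Cout equations can be checked exhaustively against the intended behavior. The sketch below assumes that the complement bars lost in extraction belong on LOAD in the final term of both equations and on D in the K equation, which is the placement implied by the truth table. Under that assumption the stage loads D when LOAD = 1 (J = D, K = NOT D) and toggles when LOAD = 0 and both Cin and Q are 1.

```python
from itertools import product


def switching_logic(cin, q, d, load):
    """Evaluate J, K, and Cout for one counter stage (assumed complement placement)."""
    not_load = 1 - load
    not_d = 1 - d
    j = (d & (load | (cin & q))) | (not_load & cin & q)
    k = (not_d & (load | (cin & q))) | (not_load & cin & q)
    cout = cin & q
    return j, k, cout


for load, d, q, cin in product((0, 1), repeat=4):
    j, k, cout = switching_logic(cin, q, d, load)
    if load:
        assert (j, k) == (d, 1 - d)   # parallel load of D
    else:
        assert j == k == (cin & q)    # toggle only on carry in with Q set
    assert cout == (cin & q)
print("all 16 input combinations agree")
```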

2. The solution is shown below:
[State diagram; visible state labels: 000, 101, 001, 110]

Ain  Bin  Cin  Anext  Bnext  Cnext
0    0    0    1      1      0
1    0    0    1      0      1
0    1    0    X      X      X
1    1    0    0      0      1
0    0    1    1      1      1
1    0    1    0      1      1
0    1    1    0      0      0
1    1    1    1      0      0
X = Don't care.
In the figure below, each of the input variables becomes an output variable to determine the
next state.
The Karnaugh maps are shown below with the logic gates. In this particular case, we didn't
need to use the fact that state 010 never shows up. It doesn't help us.

[Figure: Karnaugh maps and gate circuits for Anext, Bnext, and Cnext]
Anext = *A*B + *B*C + ABC
Bnext = *A*B + *BC
Cnext = A*C + *BC
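The three next-state equations can be exercised directly to confirm that the machine steps through seven states and never enters 010. This is a minimal sketch that reads each "*" prefix as a complement of the variable that follows it and treats adjacent terms as ANDed; the function name next_state is illustrative.

```python
def next_state(a, b, c):
    """Apply the Anext, Bnext, Cnext equations to the present state (A, B, C)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    a_next = (na & nb) | (nb & nc) | (a & b & c)
    b_next = (na & nb) | (nb & c)
    c_next = (a & nc) | (nb & c)
    return a_next, b_next, c_next


state = (0, 0, 0)
sequence = []
for _ in range(8):
    sequence.append("".join(str(bit) for bit in state))
    state = next_state(*state)
print(" -> ".join(sequence))
```

Starting from 000, the eighth entry printed is 000 again, so the cycle closes after seven distinct states.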
6. The state diagram is shown below:
[State diagram: states 000, 001, 010, 011, and 100; transitions labeled input/output, with a single 1/1 transition]
This leads to the following truth table (X = don't care):

a  b  c  IN  A  B  C  OUT
0  0  0  0   0  0  0  0
1  0  0  0   0  0  1  0
0  1  0  0   0  1  1  0
1  1  0  0   X  X  X  X
0  0  1  0   0  1  0  0
1  0  1  0   X  X  X  X
0  1  1  0   0  0  0  0
1  1  1  0   X  X  X  X
0  0  0  1   0  0  1  0
1  0  0  1   0  0  0  0
0  1  0  1   0  0  0  0
1  1  0  1   X  X  X  X
0  0  1  1   0  0  0  0
1  0  1  1   X  X  X  X
0  1  1  1   1  0  0  1
1  1  1  1   X  X  X  X

The gate circuit is shown below:
[Figure: gate circuit with inputs IN, a, b, c; D flip-flops with outputs A, B, and C sharing a common clock; output OUT]

A0  A1  R/W  CS  W0  CS0  W1  CS1  W2  CS2  W3  CS3
0   0   0    0   0   1    1   1    1   1    1   1
1   0   0    0   1   1    0   1    1   1    1   1
0   1   0    0   1   1    1   1    0   1    1   1
1   1   0    0   1   1    1   1    1   1    0   1
0   0   1    0   1   0    1   1    1   1    1   1
1   0   1    0   1   1    1   0    1   1    1   1
0   1   1    0   1   1    1   1    1   0    1   1
1   1   1    0   1   1    1   1    1   1    1   0
0   0   0    1   1   1    1   1    1   1    1   1
1   0   0    1   1   1    1   1    1   1    1   1
0   1   0    1   1   1    1   1    1   1    1   1
1   1   0    1   1   1    1   1    1   1    1   1
0   0   1    1   1   1    1   1    1   1    1   1
1   0   1    1   1   1    1   1    1   1    1   1
0   1   1    1   1   1    1   1    1   1    1   1
1   1   1    1   1   1    1   1    1   1    1   1
The first thing to notice is that the output side of the truth table is almost all 1s. This could
make the K-map reduction really tedious. However, there is hope. Suppose that on the output
side we change the sense of the logic. Let's make 0 true and 1 false. Suddenly, the truth table
gets a lot simpler. In fact, it becomes so simple we don't need a K-map because each output
variable is true for only one set of conditions (one row) of the input variables.
By inspection, the equations are (*X denotes the complement of X):
*W0 = *A0 * *A1 * *(R/W) * *CS
*W1 = A0 * *A1 * *(R/W) * *CS
*W2 = *A0 * A1 * *(R/W) * *CS
*W3 = A0 * A1 * *(R/W) * *CS
*CS0 = *A0 * *A1 * (R/W) * *CS
*CS1 = A0 * *A1 * (R/W) * *CS
*CS2 = *A0 * A1 * (R/W) * *CS
*CS3 = A0 * A1 * (R/W) * *CS
Now, let's see what we can do about simplifying these equations. Recall DeMorgan's
Theorem. Consider the logical equation for W0.
*W0 = *A0 * *A1 * *(R/W) * *CS
By DeMorgan's Theorem, this is the same as: *W0 = *(A0 + A1 + R/W + CS)
W0 = A0 + A1 + R/W + CS
That's pretty simple: we've gone from an AND of terms with lots of NOT gates to a
simple OR term.
[Figure: gate circuit with inputs A0, A1, R/W, and CS and outputs W0 through W3 and CS0 through CS3]
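The DeMorgan simplification above generalizes to all eight outputs: each active-low output is a single OR of the four inputs, with an input inverted first wherever the selecting row has a 1 in that position. The following sketch assumes the output names W0..W3 and CS0..CS3 from the truth table and that output n is selected by the address A0 + 2*A1; the function name decode is illustrative.

```python
from itertools import product


def decode(a0, a1, rw, cs):
    """Return the eight active-low outputs as a dict keyed by signal name."""
    outputs = {}
    for n in range(4):
        # Invert an address line when output n is selected by a 1 on that line.
        b0 = a0 if (n & 1) == 0 else 1 - a0
        b1 = a1 if (n & 2) == 0 else 1 - a1
        outputs[f"W{n}"] = b0 | b1 | rw | cs          # selected when R/W = 0
        outputs[f"CS{n}"] = b0 | b1 | (1 - rw) | cs   # selected when R/W = 1
    return outputs


for cs, rw, a1, a0 in product((0, 1), repeat=4):
    low = [name for name, value in decode(a0, a1, rw, cs).items() if value == 0]
    print(f"A0={a0} A1={a1} R/W={rw} CS={cs} -> low outputs: {low}")
```

With CS = 0 exactly one output is low in each row, and with CS = 1 none are, which matches the truth table.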
4. A clock frequency of 50 MHz has a clock period of 20 nanoseconds. From the timing diagram
we see that RD, WR and ADDRVAL go low on the falling edge of T1 and they rise again to
end the cycle at the falling edge of T3. Since this is two clock periods, the slowest memory
that will work reliably has an access time of 40 nanoseconds. Anything with an
access time of 40 nanoseconds or less will work in this application.
6a. Since the memory device has 17 address lines, it contains 2^17 = 128K addressable memory locations.
6b. Each memory location holds 16 bits, or 2^4 bits. Thus 2^17 memory locations × 2^4 bits per
memory location = 2^21 bits, or 2M bits.
6c.
Page    Starting Address    Ending Address
0       00000               1FFFF
1       20000               3FFFF
7       E0000               FFFFF

6d. Six devices (3 pages × 2 devices per page).
6e. Since each memory device has 16 lines for data and only one write line, WE, it does not appear to be capable of writing only one byte. Thus, the minimum data quantity that you could
read or write from each memory device is a word of data.
8a. A microprocessor with a 20-bit address range using 8 memory chips with a capacity of 128K
each. The address range of the processor is 00000..FFFFF and the address range of each
memory chip is 00000..1FFFF
Memory chip    Address Range
1              00000..1FFFF
2              20000..3FFFF
3              40000..5FFFF
4              60000..7FFFF
5              80000..9FFFF
6              A0000..BFFFF
7              C0000..DFFFF
8              E0000..FFFFF
8b. A microprocessor with a 24-bit address range and using four memory chips with a capacity of
64K each. Two of the memory chips occupy the first 128K of the address range and two chips
occupy the top 128K of the address range.
This is a bit tougher. The address range of the processor is 000000..FFFFFF and each of the
memory chips has an address range of 0000..FFFF. The address table looks like this.
Memory chip    Address Range
1              000000..00FFFF
2              010000..01FFFF
3              FE0000..FEFFFF
4              FF0000..FFFFFF
8c. A microprocessor with a 32-bit address range and using eight memory chips with a capacity
of 1M each (1M = 1,048,576). Two of the memory chips occupy the first 2M of the address
range and six chips occupy the top 6M of the address range. This gets a bit tougher still.
Memory chip    Address Range
1              00000000..000FFFFF
2              00100000..001FFFFF
3              FFA00000..FFAFFFFF
4              FFB00000..FFBFFFFF
5              FFC00000..FFCFFFFF
6              FFD00000..FFDFFFFF
7              FFE00000..FFEFFFFF
8              FFF00000..FFFFFFFF

8d. A microprocessor with a 20-bit addressing range and using eight memory chips of different
sizes. Four of the memory chips have a capacity of 128K each and occupy the first 512K of consecutive addresses from 00000 on. The other four memory chips have a capacity of 32K each
and occupy the topmost 128K of the addressing range.
Memory chip    Address Range
1              00000..1FFFF
2              20000..3FFFF
3              40000..5FFFF
4              60000..7FFFF
5              E0000..E7FFF
6              E8000..EFFFF
7              F0000..F7FFF
8              F8000..FFFFF
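The address tables in problems 8a through 8d all follow one rule: each chip starts where the previous one ends, and a block placed at the top of the address space starts at (size of address space) minus (total size of the block). A minimal sketch, with chip_ranges as an illustrative helper name:

```python
def chip_ranges(first_address, chip_bytes, chip_count, hex_digits):
    """List the start..end address range of each chip in a contiguous block."""
    ranges = []
    for n in range(chip_count):
        start = first_address + n * chip_bytes
        end = start + chip_bytes - 1
        ranges.append(f"{start:0{hex_digits}X}..{end:0{hex_digits}X}")
    return ranges


K = 1024
M = 1024 * K

# Problem 8c: six 1M chips at the top of a 32-bit address space.
for line in chip_ranges(2**32 - 6 * M, M, 6, 8):
    print(line)

# Problem 8d: four 32K chips at the top of a 20-bit address space.
for line in chip_ranges(2**20 - 4 * 32 * K, 32 * K, 4, 5):
    print(line)
```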

2. All of the signals on the address bus or the data bus are of the same type and direction. The
status bus carries many different kinds of signals that may each be input, output, or both.
4a. Big endian/Little endian: Describes the manner in which 8-bit bytes are stored in memory
spaces that are organized as words (16 bits wide) or long words (32 bits wide). In big endian
format, the most significant byte of the word is stored at the lowest byte address. In little
endian format, the least significant byte is stored at the lowest byte address.
4b. Nonaligned access: Attempting to access a word or long word on an odd byte boundary
(A0 = 1) is called a nonaligned access. This type of access is illegal in the 68K because it
would require two memory cycles plus additional byte shifting in the processor to complete.
4c. Address bus, data bus, status bus: These are the three main busses of the processor. The
address bus presents the address of the next memory operation to the memory system. It is
unidirectional; that is, all signals are outputs from the processor to memory. The data bus is
bi-directional. Data flows into the processor and out to memory on the same bus signals. The
status bus is heterogeneous. Some signals are input only, some are output only and others are
bi-directional. The status bus carries all of the housekeeping signals of the processor.
6. <$4000> = $FFFF5555
8.
************************************
*
* Program to reverse the order of 4 bytes in memory
*
*************************************
          ORG     $400
start     move.b  $4000,D0
          move.b  $4001,D1
          move.b  $4002,D2
          move.b  $4003,$4000
          move.b  D2,$4001
          move.b  D1,$4002
          move.b  D0,$4003
done      bra     done
          end     $400
10. The first instruction puts the 32-bit number $FA865580 in D0. The next instruction moves it
to memory starting at address $4000 and going to address $4003, with $FA stored in $4000
and $80 stored in $4003. The next instruction is a logical shift left of the word starting at address
$4002. Thus, we are going to do a shift left of $5580, with each bit going to the next higher
bit position, with zero shifting into the LSB and the MSB going into the carry bit. Therefore
$5580 becomes $AB00. Moving the longword into D1 gives us <D1> = $FA86AB00.
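The reasoning in problem 10 can be traced with a small model of big-endian memory. This is a sketch only: the dictionary-based memory and the helper names are illustrative, and the shift models a one-bit logical shift left of a 16-bit memory operand as described in the answer.

```python
memory = {}


def store_long(address, value):
    """Store a 32-bit value big-endian: most significant byte at the lowest address."""
    for i, byte in enumerate(value.to_bytes(4, "big")):
        memory[address + i] = byte


def load(address, size):
    """Read `size` bytes starting at `address` as a big-endian integer."""
    return int.from_bytes(bytes(memory[address + i] for i in range(size)), "big")


def lsl_word(address):
    """Shift the 16-bit word at `address` left by one bit; return the carry out."""
    word = load(address, 2)
    carry = (word >> 15) & 1
    shifted = (word << 1) & 0xFFFF
    memory[address] = shifted >> 8
    memory[address + 1] = shifted & 0xFF
    return carry


store_long(0x4000, 0xFA865580)
carry = lsl_word(0x4002)
d1 = load(0x4000, 4)
print(f"D1 = ${d1:08X}, carry = {carry}")
```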

2.
***********************************************************
*
* Subroutine: CHECKSUM
* Description: This subroutine calculates a checksum for a
*              string of bytes stored in memory pointed to by address
*              register A0 and compares it with the checksum value
*              passed in register D0. The length of the string is
*              passed in register D1. If the calculated checksum
*              agrees with the transmitted value, then address
*              register A0 returns a pointer to the start of the
*              string. If the checksum comparison fails, the value
*              in the address register is set to 0. Any overflow
*              in the checksum beyond FFFF is ignored.
* Registers:   A0 = longword pointer to string in memory.
*              D0 = word value of checksum passed into subroutine
*              D1 = word value length of string.
* Return values: All registers, with the exception of A0,
*              are returned with their original values intact.
*
***********************************************************
ZERO      EQU     0000
checksum  MOVEM.L A1-A6/D0-D7,-(SP)   * Save the registers not changed
          MOVEA.L A0,A6               * Keep a local copy
          CLR.W   D2                  * Use D2 as an accumulator
          CLR.W   D3                  * Use D3 to hold byte operand
          CMPI.W  #ZERO,D1            * Test for 0 length string
          BEQ     exit                * We're done
loop      MOVE.B  (A0)+,D3            * Get value and advance pointer
          ADD.W   D3,D2               * Add and accumulate
          SUBQ.W  #1,D1               * Decrement counter
          BNE     loop                * Done?
          MOVE.L  A6,A0               * Restore A0
          CMP.W   D0,D2               * Are they equal?
          BEQ     exit                * OK
          LEA     ZERO,A0             * Not OK
exit      MOVEM.L (SP)+,D0-D7/A1-A6   * Restore the registers
          RTS                         * Go back
The probability of the checksum not detecting an error is 1 part in 2^16, or 1/65,536.
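For comparison, the same checksum rule (sum the bytes, ignore any overflow beyond 16 bits, compare with the transmitted value) can be sketched in a few lines. The function names are illustrative and are not part of the assembly solution.

```python
def checksum16(data: bytes) -> int:
    """Sum the bytes of `data`, discarding any overflow beyond $FFFF."""
    total = 0
    for byte in data:
        total = (total + byte) & 0xFFFF
    return total


def verify(data: bytes, transmitted: int) -> bool:
    """Return True when the calculated checksum agrees with the transmitted one."""
    return checksum16(data) == (transmitted & 0xFFFF)


message = b"HELLO"
print(hex(checksum16(message)), verify(message, checksum16(message)))
```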
4.
****************************************************************
*
* Subroutine : int_srch
*
* This subroutine searches a sequence of long-words in memory,
* starting at the address pointed to by A0. D2 returns the search
* results.
*
* Register usage:
*
* A0 = Pointer to search string
* D1 = Longword value to search for
* D0 = Number of elements to search for
* D2 = Returns zero or the address of the first search match
*
* Assumptions:
* D0 contains a positive number between 1 and 65,535.
* The user knows the length of the sequence in memory. No error
* checking will be done by the program.
*
******************************************************************
int_srch  MOVEM.L D3,-(SP)     *Save the registers I use
          CLR.L   D2           *Return zero unless a match is found
          CLR.L   D3           *We'll use D3 as a 32-bit counter
          MOVE.W  D0,D3        *Load D3
loop      CMPI.L  #00,D3       *Is it zero?
          BEQ     exit         *Counter is 0, so exit
          CMP.L   (A0)+,D1     *Check and advance A0
          BEQ     match        *They are equal
          SUBQ.L  #1,D3        *Decrement counter
          BRA     loop         *Go back for next test
match     MOVE.L  A0,D2        *Transfer address
          SUBQ.L  #4,D2        *Reset address
exit      MOVEM.L (SP)+,D3     *Restore register
          RTS                  *Go home
6.
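          ORG     $400
start     CLR.B   $101F        *Clear the carry-out byte
          MOVE.L  $1000,D0     *Get OP1, high order
          MOVE.L  $1008,D1     *Get OP2, high order
          MOVE.L  $1004,D2     *Get OP1, low order
          MOVE.L  $100C,D3     *Get OP2, low order
          ADD.L   D3,D2        *Add the low-order longwords
          ADDX.L  D1,D0        *Add the high-order longwords plus the extend bit
          BCC     no_carry     *Carry out of the 64-bit sum?
          ADDQ.B  #1,$101F     *Yes, record it
no_carry  MOVE.L  D0,$1020     *Store the high-order result
          MOVE.L  D2,$1024     *Store the low-order result
          END     $400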
8.
***********************************************************************
* Subroutine Send_String
* Sends a string of byte characters to a serial port
* A pointer to the data is passed in address register A0
* The data string is terminated with the null character, $FF
* The data is sent from D0 and the status is checked in D1
* The subroutine saves all registers used
**********************************************************************
* Equates for subroutine
Xmit      EQU     $4000           *Data Port for Serial I/O
Status    EQU     $4001           *Status Port for serial I/O
EOS       EQU     $FF             *End of String Character
TBE       EQU     01              *Transmitter buffer empty mask
reg_list  REG     D0/D1           *Saved registers
**********************************************************************
Send_Str  MOVEM.W reg_list,-(SP)  *Save registers
Byte_loop MOVE.B  (A0)+,D0        *Get a byte
          CMPI.B  #EOS,D0         *Is it FF?
          BEQ     Exit            *If yes, exit
Xmit_loop MOVE.B  Status,D1       *Get status byte
          ANDI.B  #TBE,D1         *Empty?
          BEQ     Xmit_loop       *Not yet. Keep waiting
          MOVE.B  D0,Xmit         *Ship it
          BRA     Byte_loop       *Do it again
Exit      MOVEM.W (SP)+,reg_list  *Get ready to leave
          RTS

10. The flow charts for the main program and the subroutine are shown below:
[Flow chart, Main Program: Initialize; get first pattern; do first test; complement pattern 1; do second test; get second pattern; do third test; complement pattern 2; do fourth test; get third pattern; do shifted bits test; shift the bits one to the left; done shifting? (NO: repeat the test, YES: quit)]
[Flow chart, Subroutine Do Test: Save registers; initialize; write pattern and increment pointer; memory filled? (NO: keep writing); reset pointer; read back pattern; pattern the same? (NO: save the pertinent information and increment the counter); increment address pointer; memory tested? (NO: keep reading, YES: restore registers and return)]

********************************************************************
*
* Memory test program
*
********************************************************************
* System equates
pattern1  EQU     $AAAA           * First test pattern
pattern2  EQU     $FFFF           * Second test pattern
pattern3  EQU     $0001           * Third test pattern
st_addr   EQU     $00001000       * Starting address of test
end_addr  EQU     $0003FFFF       * Ending address of the test
stack     EQU     $000A0000       * Location of the stack pointer
word      EQU     2               * Length of a word, in bytes
byte      EQU     1               * One byte long, NO MAGIC NUMBERS!
bit       EQU     1               * Shifting by bits
exit_pgm  EQU     $2700           * Simulator exit code
data      EQU     $500            * Data storage region
start     EQU     $400            * Program starts here
* Main Program
          OPT     CRE             * Turn on cross references
          ORG     start           * Program begins here
          LEA     stack,SP        * Initialize the stack pointer
          LEA     test_patt,A3    * A3 points to the test pattern to use
          LEA     bad_cnt,A4      * A4 points to bad memory counter
          LEA     bad_addr,A5     * A5 points to the bad addr location
          LEA     data_read,A6    * A6 points to data storage
          CLR.W   bad_cnt         * Clear bad address count
          MOVE.W  (A3)+,D0        * Get current pattern, point to next one
          JSR     do_test         * Run first test
          NOT.W   D0              * Complement bits for next test
          JSR     do_test         * Run second test
          MOVE.W  (A3)+,D0        * Get next pattern
          JSR     do_test         * Run third test
          NOT.W   D0              * Complement bits for fourth test
          JSR     do_test         * Run fourth test
          MOVE.W  (A3),D0         * Get last pattern
shift1    JSR     do_test         * Run shift test
          ROL.W   #bit,D0         * Shift bits
          BCC     shift1          * Done yet? No go back
          MOVE.W  (A3),D0         * Get test pattern 3 again
          NOT.W   D0              * Complement test pattern 3
shift2    JSR     do_test         * Run the test
          ROL.W   #bit,D0         * Shift the bits
          BCS     shift2          * Done yet? If not go back
done      STOP    #exit_pgm       * Quit back to simulator
************************************************************************
*
* Subroutine: do_test
*
* Performs the actual memory test. Fills
* the memory with the test pattern of interest.
* Registers used: D1,A0,A1,A2
* Return values: None
* Registers saved: None
* Input parameters:
* D0.W = test pattern
* A4.L = Points to memory location to save the count of bad addresses
* A5.L = Points to memory location to save the last bad address found
* A6.L = Points to memory location to save the data read back and data
* written
*
* Assumptions: Saves all registers used internally
************************************************************************
do_test   MOVEM.L A0-A2/D1,-(SP)  * Save registers
          LEA     st_addr,A0      * A0 points to start address
          LEA     end_addr,A1     * A1 points to last address
          MOVE.L  A0,A2           * A2 will point to memory
fill_loop MOVE.W  D0,(A2)+        * Fill and increment pointer
          CMPA.L  A1,A2           * Are we done?
          BLE     fill_loop
          MOVE.L  A0,A2           * Reset pointer
test_loop MOVE.W  (A2),D1         * Read value back from memory
          CMP.W   D0,D1           * Are they the same?
          BEQ     addr_ok         * OK, check next location
not_ok    MOVE.L  A2,(A5)         * Save the address of the bad location
          ADDQ.W  #byte,(A4)      * Increment the counter
          MOVE.W  D1,(A6)+        * Save the data read back
          MOVE.W  D0,(A6)         * Save the data written
          SUBQ.L  #word,A6        * Restore A6 as a pointer
addr_ok   ADDQ.L  #word,A2        * A2 points to next memory location
          CMPA.L  A1,A2           * Have we hit the last address yet?
          BLE     test_loop       * No, keep testing
          MOVEM.L (SP)+,D1/A0-A2  * Restore registers
          RTS                     * Go back
* Data Space
          ORG     data
test_patt DC.W    pattern1,pattern2,pattern3  * Memory test patterns
bad_cnt   DS.W    1               * Keep track of # of bad addresses
bad_addr  DS.L    1               * Store last bad address found here
data_read DS.W    1               * What did I read back?
data_wrt  DS.W    1               * What did I write?
          END     start
If you examine the code you'll see that there are no magic numbers. All numeric values used
as operands are given symbolic names with the EQUates so that each instruction is as readable
as possible.

2. The instruction moves the long word value $00001B00 from register D0 to memory location
$0040E376. The destination address is calculated by adding together the contents of address
register A6, data register D6.L (as a long word), and a displacement value, 70, ( hex $46). Thus,
Effective address = $0040C830 + $00001B00 + $46 = $0040E376
4a. Storage space for the variables used in C or C++ functions is created on the processor's stack
in a stack frame. As long as the processor is executing in a particular stack frame, the variable
exists. Once the function is exited, the stack frame for that function is no longer used, so the
variables can no longer be accessed.
4b. First, in order to determine the amount of storage space needed to allocate for each variable and,
second, to determine the type of code to generate to manipulate the different type of variables.
4c. These are the instructions that create and remove the stack frames.
6. This algorithm is an example of a jump table. The basic idea is that bits D15 through D12 of
the op-code word define a particular category of instructions that can be decoded further in
subsequent stages of the algorithm. The purpose of this entry point is to vector the program
towards the correct decoding algorithms for each group of instructions. The three instructions:
          MOVE.W  (A6),D0      *We'll play with it here
          MOVE.B  #shift,D1    *Shift 12 bits to the right
          LSR.W   D1,D0        *Move the bits
fetch the next instruction from memory and shift the bits 12 positions to the right, shifting
zeros into the leftmost bit positions. Thus, the contents of D0 will be a number between 0 and $F.
The instruction:
          MULU    #6,D0        *Form offset
creates an index by multiplying the value in D0 by 6, the length, in bytes, of each of
the jump instructions in the jump table.
Finally, the instruction,
          JSR     00(A0,D0)    *Jump indirect with index
uses a complex addressing mode, address register indirect with index and offset, to form a
vector to the appropriate subroutine through one of the 16 possible jump instructions in the table.
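The jump-table idea in problem 6 maps naturally onto a list of handlers indexed by the top four bits of the op-code word. The sketch below is illustrative only: the handler functions are hypothetical placeholders, and in a high-level language the multiply-by-6 step disappears because the list index plays the role of the byte offset.

```python
def make_group_decoder(group):
    """Build a placeholder decoder for one of the 16 op-code groups (hypothetical)."""
    def decoder(opcode):
        return f"group {group:X} decoder for opcode {opcode:04X}"
    return decoder


# One entry per value of bits D15..D12, like the 16 jump instructions in the table.
JUMP_TABLE = [make_group_decoder(group) for group in range(16)]


def dispatch(opcode):
    """Shift the op-code word right 12 bits and vector through the table."""
    index = (opcode >> 12) & 0xF
    return JUMP_TABLE[index](opcode)


print(dispatch(0x4E75))
print(dispatch(0xD040))
```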

2. 50C3:1000
4a. MOV AX,DX = 8Bh 0C2h
4b. MOV BX[SI],BX = 89h 18h
4c. MOV DX,0A34h = 0BAh 34h 0Ah
6.
          MOV     AX,0AA55H
          MOV     BL,AL
          MOV     AL,AH
          MOV     AH,BL
8.
          MOV     AX,[204h]
          MOV     BX,[206h]
          MUL     BX
          MOV     [202h],DX
          MOV     [200h],AX
10.
          MOV     AX,8200H     ;Get source segment value
          MOV     BX,0C400H    ;Get destination segment value
          MOV     DS,AX        ;Load segment register
          MOV     ES,BX        ;Use the extra segment register
          MOV     SI,0000      ;Load source index register
          MOV     DI,0000H     ;Load destination index register
          MOV     CX,1000      ;Load counter
loader:   MOV     AL,[SI]      ;Get byte
          ES:MOV  [DI],AL      ;Store byte
          INC     SI           ;Advance pointers
          INC     DI
          DEC     CX           ;Decrement counter
          JNZ     loader       ;Done? No, go back.

2. The Fast Interrupt Request (FIQ) mode is used for servicing certain classes of interrupts where time is
most critical. In particular, it is useful when the time required to switch from the current operational mode to the FIQ mode must be kept to a minimum. The FIQ mode does this in two ways:
a. A second partial set of registers, r8_fiq through r14_fiq, is bank-switched into the register
file when FIQ is entered.
b. The FIQ vector sits at the top of the exception vector table so that the FIQ code can begin
at that location, rather than needing a jump to the start of the code.
4. The instruction is illegal because the number, &103, cannot be represented as an 8-bit value
rotated right by an even number of bit positions.
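The rule in problem 4 (an immediate operand must be an 8-bit value rotated right by an even number of bit positions) is easy to test by undoing each possible rotation and checking whether the result fits in 8 bits. A minimal sketch, with is_arm_immediate as an illustrative name:

```python
def is_arm_immediate(value: int) -> bool:
    """Return True if `value` is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating left by `rotation` undoes a rotate right by the same amount.
        undone = ((value << rotation) | (value >> (32 - rotation))) & 0xFFFFFFFF
        if undone < 0x100:
            return True
    return False


for constant in (0x103, 0xFF, 0x06000000, 0x00AA0000, 0x00004C00, 0x00000001):
    print(f"&{constant:08X}: {is_arm_immediate(constant)}")
```

The same check shows why a full 32-bit constant is usually built up from several legal immediates, one byte-sized field at a time.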
6.
          MOV     r4,#&06000000
          ORR     r4,r4,#&00AA0000
          ORR     r4,r4,#&00004C00
          ORR     r4,r4,#&00000001
8.
          MOV     r0,#&00001000    ;Initialize pointer
          LDR     r1,[r0]          ;Get operand
          ADDS    r3,r3,r1         ;Do addition
          STR     r3,[r0]          ;Write back

2. Executing the code will cause a divide by zero exception to occur. The contents of D1 are
decreased by 8 each time through the loop. Eventually, <D1>=0 and the division operation
will fail.
According to the programmer's manual, a divide by zero exception is handled by the exception
vector located at address $00000014. This location contains the address $00CCAA00, so the
processor will go to $00CCAA00 to handle the exception.
4a. Keyboard strike input: This is a low priority because there is little danger of losing the data.
The human response time is very slow, but you don't want to give it too low a priority if the
system is heavily loaded because the system might then become unresponsive to keyboard
input. Ans. 0-3
4b. Imminent Power failure: This is what non-maskable interrupts were made for. Ans. 7
4c. Watchdog timer: This is also a high priority interrupt that would either trigger the NMI or a
reset. Ans. 7
4d. MODEM has data available for reading: This is generally a mid to high priority interrupt because data could be coming fast ( ISDN for example ) and it would get lost if the ISR was too
slow or too late. Ans. 4-6
4e. A/D converter has new data available: This is similar to the modem problem above. However,
A/Ds generally don't free run; they are triggered by the computer to begin another digitization. They still need to be serviced promptly, and the data could be coming fast. Generally,
you have as much time to respond to the IRQ as the digitization time of the converter. Ans. 3-6
4f. 10 millisecond real time clock tick: Lots of ways to argue this one. A computer can execute a
lot of instructions in 10 milliseconds, and an ISR for a timer tick is generally pretty fast, so the
interrupt could be low priority. However, sometimes the timer tick signals an important event,
like an O/S task switch, which is a high priority interrupt. Ans 2-6
4g. Mouse click: Very low priority, similar to a keyboard, but it has to be fast enough to catch a
double click. Ans. 0-3
4h. Robot hand has touched solid surface: This is pretty high priority because damage could be
done if the hand isn't tightly controlled. Ans. 5-6.

4i. Memory parity error: This is like a power failure. Any system failure must be serviced. Ans. 7
4j. Incoming FAX transmission ( ring detected ): The computer has lots of time to respond.
Ans. 0-3
6. The data values are taken every 200 microseconds (200 × 10^-6 seconds). The values represent a
digital fraction of a span from 000 to 3FF in hexadecimal. How did I know that? A 10-bit A/D
can divide an analog voltage span into 1023 increments (2^10 - 1 = 1023). Thus, we have an analog
span of 4 volts (-2 volts to +2 volts) and we are dividing it into 1023 increments. Thus, each
voltage increment = 4/1023 or 3.91 mV. Now, we can express the analog voltage output to the
strip chart output device by this equation:
Vout = XXX * (3.91 mV) - 2.00 V
The data is:
2C8, 33B, 398, 3DA, 3FC, 3FB, 3D7, 393,
334, 2BF, 23E, 1B8, 137, 0C4, 067, 025, 003,
004, 028, 06C, 0CB, 140, 1C1, 247
[Figure: plot of the data values as voltage versus time; vertical axis from -2.5 to 2.5]
The waveform is periodic. According to the
data, the maximum of the sine wave occurs at
900 microseconds and the minimum occurs
at 3300 microseconds. Thus, the elapsed time
from the maximum to minimum is 1/2 of a
cycle, so the period of the sine wave is (3300 - 900) × 2 = 4800 microseconds.
Therefore, period = 4800 × 10^-6 seconds = 4.8 × 10^-3 seconds
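The conversion and the period estimate can be reproduced numerically. This sketch assumes the first sample is taken at t = 0 and places each extreme midway between the two most extreme samples, which is how the 900 and 3300 microsecond figures arise; the helper names are illustrative.

```python
SAMPLE_PERIOD_US = 200
SPAN_VOLTS = 4.0
OFFSET_VOLTS = -2.0
FULL_SCALE = 1023  # 2**10 - 1 increments for a 10-bit converter

codes = [
    0x2C8, 0x33B, 0x398, 0x3DA, 0x3FC, 0x3FB, 0x3D7, 0x393,
    0x334, 0x2BF, 0x23E, 0x1B8, 0x137, 0x0C4, 0x067, 0x025, 0x003,
    0x004, 0x028, 0x06C, 0x0CB, 0x140, 0x1C1, 0x247,
]


def to_volts(code):
    """Convert a 10-bit A/D code to a voltage on the -2 V to +2 V span."""
    return code * SPAN_VOLTS / FULL_SCALE + OFFSET_VOLTS


def extreme_time_us(values, largest):
    """Time midway between the two most extreme samples (assumes they are adjacent)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)
    return SAMPLE_PERIOD_US * (order[0] + order[1]) / 2


volts = [to_volts(code) for code in codes]
t_max = extreme_time_us(volts, largest=True)
t_min = extreme_time_us(volts, largest=False)
period_us = 2 * (t_min - t_max)
print(t_max, t_min, period_us)
```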

8a. Artillery shell shock wave measurements at an Army research lab: D
8b. General purpose data logger for weather telemetry: B
8c. 7-digit laboratory quality digital voltmeter: A
8d. Molten steel temperature controller in a foundry: C
10. 1525.9 ohms

2. Part A: Having a longer pipeline allows the processor to run with a higher clock speed because
the processor has to do less at each stage of the pipeline. However, because the pipe is longer
there will be a higher penalty of lost performance if the pipeline stalls or has to be flushed
because a new instruction sequence must be started in that pipe. Thus, the shorter the pipeline,
the less performance penalty will be caused if a stall or flush occurs due to a data or structural
hazard.
Part B: As stated in part A, a longer pipeline allows for a higher clock rate because the processor does fewer operations at each stage of the pipe. Ultimately, the benchmark determines how
many instructions are executed in a given period of time. Thus, if the benchmarks are comparable then the Athlon and Pentium must be executing the same number of instructions per unit
time. If the Pentium clock is faster, then the Athlon pipeline must be doing more at each stage,
so it has to run at a slower clock speed.
4.
          CLR.L   D4           *D4 will be the subtrahend
          MOVE.B  #4,D4        *Value to subtract by one long word
          SUBA.L  D4,SP        *Pre-decrement pointer
          MOVE.L  D0,(SP)      *Move D0
          SUBA.L  D4,SP
          MOVE.L  D1,(SP)      *Move D1
          SUBA.L  D4,SP
          MOVE.L  D2,(SP)      *Move D2
          SUBA.L  D4,SP
          MOVE.L  D3,(SP)      *Move D3
          SUBA.L  D4,SP
          MOVE.L  D5,(SP)      *Move D5
          SUBA.L  D4,SP
          MOVE.L  A0,(SP)      *Move A0
          SUBA.L  D4,SP
          MOVE.L  A1,(SP)      *Move A1
          SUBA.L  D4,SP
          MOVE.L  A2,(SP)      *Move A2
          SUBA.L  D4,SP

6. This is referred to as loop unrolling. By removing a loop, processors with pipeline architectures can run more efficiently because all of the instructions that are in-lined will move
through the pipe and be executed in the order in which they are fetched. With loop structures, parts of the
pipeline will have to be flushed and reloaded because the loop branch takes execution
out of sequential order.
8. The first, second and third instructions depend upon each other; therefore, they would tend to
stall a pipe, or at least prevent the second from completing until the first completes, and the
third from completing until the second completes. The solution is to re-order the instructions
so that they can execute independently. See below:
          MOVE.W  D1,D0
          MOVE.W  #$3400,D2
          LEA     (A4),A6
          ADD.W   D0,D3
          ADDA.W  D7,A2
          MOVE.W  #$F6AA,D4
          MULU    D3,D1
Actually, there are lots of possibilities. The idea is to place instructions without dependencies in
the space between the instructions with dependencies. That way, each dependent instruction can
complete without needing to wait for the previous instruction.

2. Programs tend to execute instructions that are located near each other because
they are loaded sequentially (locality of reference), so many blocks of code will be executed
from memory regions that are physically close to each other. Thus, a small cache memory
works well in this situation. Also, temporal locality tells us that memory regions that have been
accessed will usually be accessed again soon, so instructions and data kept in a cache will
probably be needed again.
4. This example does reinforce the idea of locality because all of the elements of the array are
located together in memory. Even though the variable, daysArray, is a pointer variable to
an array of pointers to char, each of the array elements points to regions of memory that are
immediately adjacent to each other. Had each of the pointers pointed to strings that were
located in various regions of memory, it would not have been an example of locality.
6. a. There are 2^12 total memory locations in the cache, not counting the memory required for
the address tags. Since there are 2^6 bytes per refill line, there are 2^6 refill lines in the cache.
b. Since this is a direct mapped cache, the number of refill lines in the cache equals the number
of rows of refill lines in main memory. Therefore, there are 2^6 rows in main memory. Main
memory has a total of 2^20 bytes, which is equivalent to 2^14 refill lines. Since we have 2^6 rows
we must have 2^8 columns to give a total of 2^14 refill lines. Therefore we have 2^6 rows by
2^8 columns in main memory.
c. Since there are 2^8 columns, we need 8 address bits in our address tag memory to identify
which column in main memory is in a given row of the cache at any point in time.
d. In order to see where the main memory address maps we need to identify which bits
dene the offset, rows and columns.
Bit:   19 18 17 16 15 14 13 12 | 11 10  9  8  7  6 |  5  4  3  2  1  0
Field:         column          |        row        |      offset

Now we need to look at the bit pattern for $3FB0A:
0011 1111 1011 0000 1010
Rearranging this into the column | row | offset format:
00111111 | 101100 | 001010
Therefore: The address is in row $2C, column $3F.
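The same decomposition can be sketched in a few lines of Python. The field widths come from parts a through c; the constant and function names are hypothetical, chosen only for this illustration.

```python
OFFSET_BITS = 6   # 2^6 bytes per refill line
ROW_BITS = 6      # 2^6 refill lines (rows) in the cache
COLUMN_BITS = 8   # 2^8 columns, held in the address tag memory

def split_address(address):
    """Split a 20-bit main-memory address into (column, row, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    row = (address >> OFFSET_BITS) & ((1 << ROW_BITS) - 1)
    column = (address >> (OFFSET_BITS + ROW_BITS)) & ((1 << COLUMN_BITS) - 1)
    return column, row, offset

column, row, offset = split_address(0x3FB0A)
print(f"column ${column:02X}, row ${row:02X}, offset ${offset:02X}")
# prints: column $3F, row $2C, offset $0A
```

The shifts and masks simply peel the fields off from the low-order end, so changing the three widths is enough to try a different cache geometry.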

8. If we have a cache miss, then the processor must refill one line in the cache. Since this is
a 32-bit processor, we can read 4 bytes at once. Thus, we must read main memory 64/4 or
16 times. Since each external memory fetch requires 2 cycles, the memory read requires
16 * 2 = 32 clock cycles. Each clock cycle is 10 nsec, so the miss penalty requires 320 nsec.
Effective execution time = 0.98 * 10 + 0.02 * 320 = 9.8 + 6.4 = 16.2 nsec
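The arithmetic can be reproduced with a minimal sketch. All values are taken from the solution above; the variable names are illustrative only.

```python
LINE_BYTES = 64          # bytes in one refill line
BUS_BYTES = 4            # a 32-bit processor reads 4 bytes per access
CYCLES_PER_FETCH = 2     # clock cycles per external memory fetch
CYCLE_NS = 10            # one clock cycle in nanoseconds
HIT_RATE = 0.98          # fraction of accesses that hit in the cache

fetches = LINE_BYTES // BUS_BYTES                        # reads needed per refill
miss_penalty_ns = fetches * CYCLES_PER_FETCH * CYCLE_NS  # time to refill one line
effective_ns = HIT_RATE * CYCLE_NS + (1 - HIT_RATE) * miss_penalty_ns

print(f"{fetches} fetches, {miss_penalty_ns} ns penalty, {effective_ns:.1f} ns effective")
```

Varying HIT_RATE shows how quickly the effective time grows once the hit rate drops even a few percent.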

2. More memory allows the operating system to keep more tasks resident in memory, rather than
having to swap out tasks to the hard disk. Since the ratio of access time between memory and
the hard disk is about 10,000 to 1, this would have much more of an impact than doubling the
speed of the processor.
4. a. Increase the clock rate.
b. Improve the internal design of the CPU so that it is more efficient.
c. Improve the compiler so that fewer instructions are required, or so that the number of clock
cycles per instruction is lower, or both.
6. First, let's calculate the average number of cycles per instruction for the program compiled by
each compiler.
Compiler A:
Ave. # of cycles per instruction for program = 2 * 0.4 + 3 * 0.1 + 4 * 0.3 + 6 * 0.2
= 0.8 + 0.3 + 1.2 + 1.2
= 3.5 CPU cycles per instruction
Compiler B:
Ave. # of cycles per instruction for program = 2 * 0.6 + 3 * 0.2 + 4 * 0.1 + 6 * 0.1
= 1.2 + 0.6 + 0.4 + 0.6
= 2.8 CPU cycles per instruction
However, the calculation of the average number of CPU cycles per instruction is based upon
two different numbers of instructions for the program compiled by each compiler, so we're
comparing apples and oranges if we just stop here.
For compiler A, the total number of CPU cycles to execute the program is:
1000 instructions * 3.5 cycles per instruction = 3,500 cycles
For compiler B, the total number of CPU cycles to execute the program is:
1200 instructions * 2.8 cycles per instruction = 3,360 cycles
Thus, while they're relatively close, compiler B takes 140 fewer CPU cycles to execute the program
than compiler A, so compiler B is the better choice.
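The comparison can be checked with a short sketch. The cycle counts, instruction mixes and instruction totals are the ones used in the calculation above; the function name average_cpi is hypothetical.

```python
CYCLES = [2, 3, 4, 6]  # clock cycles for each of the four instruction classes

def average_cpi(mix):
    """Weighted average of cycles per instruction for a given instruction mix."""
    return sum(cycles * fraction for cycles, fraction in zip(CYCLES, mix))

cpi_a = average_cpi([0.4, 0.1, 0.3, 0.2])  # mix produced by compiler A
cpi_b = average_cpi([0.6, 0.2, 0.1, 0.1])  # mix produced by compiler B

total_a = 1000 * cpi_a  # compiler A emits 1000 instructions
total_b = 1200 * cpi_b  # compiler B emits 1200 instructions

print(f"A: {cpi_a:.1f} CPI, {total_a:.0f} cycles")
print(f"B: {cpi_b:.1f} CPI, {total_b:.0f} cycles")
```

Multiplying by the instruction count is the step that makes the two compilers comparable, since a lower CPI alone says nothing about total execution time.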
8. The basic block is the ideal code flow for a pipelined processor. Since there are no loops or
multiple exit points, all the code flows through the pipeline in succession. Thus, there are no
wasted clock cycles due to the processor needing to flush the pipe and reload it from memory.

2. No, calculating the value of the exponent, e, we find that for:
N = 4, k = 2.83 and G = 6, e ~ 0.2
4. A 10 GHz clock frequency has a period of 100 picoseconds. Three levels of gate delay equal
84 picoseconds, leaving a margin of 16 picoseconds. If we must have 10 picoseconds for our
slop in the circuit, then we are left with 6 picoseconds for clock skew. The speed of light on
a circuit is approximately 6 inches per nanosecond, so, in 6 picoseconds, light travels around
0.036 inches. This is the maximum difference in path length that we can tolerate and still
expect the circuit to function correctly.
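The timing budget can be worked through in a few lines. The figures are those given in the solution; the variable names are illustrative only.

```python
CLOCK_HZ = 10e9            # 10 GHz clock
GATE_DELAY_TOTAL_PS = 84   # three levels of gate delay
SLOP_PS = 10               # margin reserved for slop in the circuit
INCHES_PER_NS = 6          # approximate signal speed on a circuit

period_ps = 1e12 / CLOCK_HZ                    # clock period in picoseconds
margin_ps = period_ps - GATE_DELAY_TOTAL_PS    # what remains after gate delays
skew_ps = margin_ps - SLOP_PS                  # what remains for clock skew
max_path_difference_in = INCHES_PER_NS * skew_ps / 1000  # convert ps to ns

print(f"period {period_ps:.0f} ps, skew budget {skew_ps:.0f} ps, "
      f"path difference {max_path_difference_in:.3f} in")
```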