Research Paper

Project 4
MIPS Pipelined Processor

ECECS 314, Computer Organization
Alan Leung (geke00) & Lav Varshney (lav)
May 4, 2002
1. Arithmetic Logic Unit (ALU)

Our ALU implementation uses a separate functional block for each operation and then uses the control inputs to combine their outputs into a single result. This required many sub-blocks, which were placed in myparts.cast; see Section 5 for a listing of the blocks in myparts.cast. We first experimented with a bit-slice implementation that combined 32 bit-slices in various ways, but it was quite slow, so we moved to the current block implementation. The bit-slice design was particularly slow and redundant for arithmetic right shifts, because every bit slice had to be given the sign bit. This heavy fanout of the sign bit presents a very large capacitance to its driver, so a great deal of time is needed to drive the signal from GND to Vdd and from Vdd to GND. Each functional block is described in the following subsections. To combine the outputs of the functional blocks, each output is ANDed with its select line and the results are ORed together. This is done with SuperBusMux11.
1.1 Carry Lookahead Adder-Subtractor

We designed a carry lookahead adder/subtractor for the add, sub, slt, and sltu ALU operations. We first designed a full adder that produces generate and propagate signals rather than a carry-out. We then designed a 4-bit carry lookahead unit that combines the outputs of four full adders and produces the carries, a group generate bit, and a group propagate bit. Next, we designed a 16-bit carry lookahead unit that combines the outputs of four 4-bit carry lookahead units, producing group carries from their group generate and group propagate bits. Finally, by combining 32 full adders, eight 4-bit carry lookahead units (one for each block of four full adders), and two 16-bit carry lookahead units (one for each block of four 4-bit carry lookahead units), we created a 32-bit carry lookahead adder.
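The two-level structure described above can be sketched behaviorally in Python. This is a hypothetical model, not the authors' CAST: `lookahead` stands for both the 4-bit unit and the 16-bit unit (which use the same equations on different inputs), and because the report does not say how the two 16-bit units are joined, the sketch assumes the lower unit's carry-out ripples into the upper unit's carry-in.

```python
def lookahead(g, p, cin):
    """One 4-input carry lookahead unit (same equations at either level).

    g, p -- four generate/propagate bits, index 0 least significant
    Returns (carries into positions 0..3, group generate, group propagate).
    """
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    gp = p[3] & p[2] & p[1] & p[0]
    return [cin, c1, c2, c3], gg, gp


def cla_add32(a, b, cin=0):
    """32-bit two-level carry lookahead addition; returns (sum, carry out)."""
    # Full adders: per-bit generate and propagate instead of a carry-out.
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(32)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(32)]
    carry = [0] * 32          # carry[i] is the carry into bit i
    c_in16 = cin
    for base in (0, 16):      # two 16-bit lookahead units
        blocks = [range(base + 4 * k, base + 4 * k + 4) for k in range(4)]
        # Level 1: group generate/propagate of each 4-bit block.
        G, P = [], []
        for blk in blocks:
            _, gg, gp = lookahead([g[i] for i in blk], [p[i] for i in blk], 0)
            G.append(gg)
            P.append(gp)
        # Level 2: the 16-bit unit computes the carry into each 4-bit block.
        block_cin, GG, GP = lookahead(G, P, c_in16)
        # Level 1 again: each 4-bit unit computes its internal carries.
        for blk, bc in zip(blocks, block_cin):
            cs, _, _ = lookahead([g[i] for i in blk], [p[i] for i in blk], bc)
            for i, c in zip(blk, cs):
                carry[i] = c
        # Assumption: the lower 16-bit unit's carry-out feeds the upper unit.
        c_in16 = GG | (GP & c_in16)
    total = sum((p[i] ^ carry[i]) << i for i in range(32))
    return total, c_in16
```

Each sum bit is the propagate bit XORed with the carry into that position, so the only work left after the lookahead network settles is one XOR per bit.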
To create a 32-bit adder/subtractor, each bit of the second input is XORed with the subtract select bit, which inverts that input when subtracting. The subtract select bit is also fed in as the carry-in, completing the two's complement negation.
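A minimal Python sketch of this trick follows. The name `add_sub32` is hypothetical, and ordinary integer addition stands in for the carry lookahead adder so that only the XOR and carry-in behavior is shown.

```python
MASK32 = 0xFFFFFFFF


def add_sub32(a, b, subtract):
    """Adder/subtractor front end; returns (32-bit result, carry out).

    subtract -- 1 to compute a - b, 0 to compute a + b
    """
    # XOR every bit of b with the subtract select bit (inverts b when subtracting).
    b_in = b ^ (MASK32 if subtract else 0)
    # The subtract select bit doubles as the carry-in, completing the
    # two's complement negation (~b + 1). Plain integer addition stands in
    # for the carry lookahead adder here.
    total = a + b_in + subtract
    return total & MASK32, total >> 32
```

For example, `add_sub32(5, 3, 1)` gives `(2, 1)` and `add_sub32(3, 5, 1)` gives `(0xFFFFFFFE, 0)`. When subtracting, a carry-out of 1 means no borrow occurred, which is the property the sltu logic relies on.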
1.2 Slt and Sltu Operations

The slt calculation involves a conditional structure: if a is negative and b is non-negative, then a < b; if b is negative and a is non-negative, then a > b; if both have the same sign, then a - b is computed and its sign bit is the result bit. We determine whether each operand is negative by examining its most significant bit. The top 31 bits of the output are grounded and the result bit is tied to the least significant output bit. To calculate sltu, a - b is computed and the inverted carry-out of this subtraction is the result bit; again, the top 31 output bits are grounded and the result bit is tied to the least significant output bit. The logic for slt and sltu is contained within the ALU definition in alu.cast. We placed it at this final level of the hierarchy because these operations depend on the result of the addition/subtraction and therefore cannot be modularized to run in parallel with the other 32-bit functional blocks. Thus, we expected that slt and sltu might lie on the critical path of our ALU.
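The conditional structure above can be written out as a short Python sketch. The helper `subtract32` and the function names are hypothetical; the functions return the single result bit, which the hardware places in bit 0 with the upper 31 bits grounded.

```python
MASK32 = 0xFFFFFFFF


def subtract32(a, b):
    """a - b computed as a + ~b + 1; returns (32-bit difference, carry out)."""
    total = a + (~b & MASK32) + 1
    return total & MASK32, total >> 32


def slt(a, b):
    a_neg, b_neg = a >> 31, b >> 31      # top bits of the operands
    if a_neg and not b_neg:
        return 1                         # negative < non-negative
    if b_neg and not a_neg:
        return 0                         # non-negative > negative
    diff, _ = subtract32(a, b)           # same sign: a - b cannot overflow
    return diff >> 31                    # sign bit of a - b


def sltu(a, b):
    _, carry_out = subtract32(a, b)
    return carry_out ^ 1                 # inverted end carry
```

The sign checks are what make slt correct where a bare subtraction would overflow: for example, `slt(0x7FFFFFFF, 0x80000000)` must return 0, even though 0x7FFFFFFF - 0x80000000 overflows to a negative-looking result.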
1.3 Logical Operations

To perform the logical operations AND, XOR, OR, and NOR, the corresponding operation is applied bitwise. In CAST this is done with a loop of 32 iterations, one for each bit.
1.4 Shift Operations

To perform a left logical shift, we use five 2-to-1 32-bit bus multiplexers chained ("rippled") together. The first multiplexer's upper input bits (selected by 0) are tied to the unshifted b input, and its lower input bits (selected by 1) are tied to b shifted left logically by 16 bits. Its select line is the most significant shift amount bit (c.sa[4]): if that bit is 1, the result must be shifted by at least 16 bits. The second multiplexer's upper input bits are tied to the output of the first multiplexer, and its lower input bits to that output shifted left logically by 8 bits. Its select line is the second most significant shift amount bit, since if that bit is 1 the result must be shifted by an additional 8 bits. The pattern continues with three more multiplexers, each fed an unshifted and a shifted version (by 4, 2, and 1 bits, respectively) of the previous multiplexer's output and selected by the next less significant shift amount bit. The output of the last multiplexer is shifted by the full shift amount. By "left logical shift" we mean that the vacated bits are filled with 0, which is done by tying the corresponding multiplexer inputs to GND. This shifter requires only five 32-bit bus multiplexers. An alternative implementation, a barrel shifter, would require 32 32-to-1 multiplexers, which would have more than tripled the number of transistors needed for the ALU.
To perform a right logical shift, we use the same method, except that right-logical-shifted rather than left-logical-shifted versions are fed into the multiplexers.
To perform a right arithmetic shift, we again use the same method, but the vacated bits are set to the top bit of the b input (which sign-extends the result) rather than to GND. This is done by tying the corresponding multiplexer inputs to the most significant input bit (b.d[31]). When we experimented with a barrel shifter for possible speed gains, we found that speed in fact decreased dramatically, most likely because any change in the sign bit of b had to propagate to multiple inputs on 31 of the 32 large multiplexers. As noted above, the resulting large input capacitance causes long delays. We therefore concluded that a barrel shifter is an inefficient implementation for wide ALUs.
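The five-stage shifter described in this subsection can be modeled behaviorally as below. The function name `log_shifter` and its `kind` argument are hypothetical; each loop iteration corresponds to one 2-to-1 bus multiplexer, and the three kinds differ only in what is tied to the vacated bits.

```python
MASK32 = 0xFFFFFFFF


def log_shifter(b, sa, kind):
    """Five-stage (16, 8, 4, 2, 1) logarithmic shifter model.

    kind -- "sll", "srl", or "sra"; sa -- 5-bit shift amount (c.sa[4:0])
    """
    if kind not in ("sll", "srl", "sra"):
        raise ValueError("unknown shift kind: " + kind)
    sign = (b >> 31) & 1                 # b.d[31], the fill bit for sra
    x = b & MASK32
    for stage, amount in ((4, 16), (3, 8), (2, 4), (1, 2), (0, 1)):
        if not (sa >> stage) & 1:
            continue                     # this mux passes its unshifted input
        if kind == "sll":
            x = (x << amount) & MASK32   # vacated low bits tied to GND
        elif kind == "srl":
            x >>= amount                 # vacated high bits tied to GND
        else:
            fill = (((1 << amount) - 1) << (32 - amount)) if sign else 0
            x = (x >> amount) | fill     # vacated high bits tied to b.d[31]
    return x
```

For example, `log_shifter(0x00001111, 16, "sll")` returns `0x11110000`, the sllv result annotated in Figure 3, and `log_shifter(0xF2341234, 8, "sra")` returns `0xFFF23412`. Note that only the sra fill depends on b.d[31]; in this design that bit drives the fill inputs of five multiplexers rather than most inputs of 32 large ones.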
1.5 Delay
The largest delay we found was for the slt operation: 15.10 ns. This seems reasonable, because the slt logic must wait for the adder/subtractor to produce a result before its own output can stabilize; the two blocks are in series, so the output delay is the sum of both delays. To find the largest delay, we first set a and b to undefined states. We then stepped and fed slt the values 0x7fffffff and 0x7fffffff, because subtracting these two positive numbers causes carries through the most bits, which is the worst case for the carry lookahead hardware. Since subtraction is the expensive part of the operation, we also tried subtracting 0xffffffff from 0xffffffff, which causes carries on all bits. However, the delay for this case was only 14.60 ns, less than the worst case for slt. The worst-case delay is included in the test file as the final test case.
2. Central Processing Unit (CPU) Datapath

The CPU datapath is divided into five pipeline stages: Instruction Fetch (IF), Decode/Operand Fetch (RD), Execution (EX), Memory (MEM), and Writeback (WB). Each is described in its own subsection. The testing mentioned in these subsections was done by setting the control inputs manually rather than with the control unit. The linkage of the stages is described in Section 2.6.
2.1 Instruction Fetch

The IF stage takes three 30-bit buses and three control inputs. The buses carry the top 30 bits of the new PC computed in the EX stage for branches, jumps, and jump registers, respectively. Internally, the stage also computes PC + 4. Multiplexers, with the control inputs as select lines, choose among these four candidate PC values. The PC itself is a 30-bit negative-edge-triggered register that is cleared to zero on Reset. The stage also communicates with the Instruction Memory, from which new instructions are fetched and passed along. The address sent to the memory takes its top 30 bits from the PC, while the lower 2 bits are grounded to ensure word alignment. The instruction fetched from the Instruction Memory is passed along to the remaining stages, as is the PC value.
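A hypothetical Python sketch of the next-PC selection and fetch address follows. All PC values are the 30-bit word index (byte address without its two low bits), so PC + 4 becomes +1, which is consistent with the 30-bit Incrementer listed in Section 5. The order of the checks is an assumption; Brtaken, Jtaken, and JRtaken in Table 3-2 decode mutually exclusive opcodes of the EX instruction, so at most one should be asserted. Reset is not modeled.

```python
MASK30 = 0x3FFFFFFF


def next_pc(pc30, branch30, jump30, jr30, brtaken, jtaken, jrtaken):
    """Select the next 30-bit PC (the byte address without its low 2 bits)."""
    if brtaken:
        return branch30
    if jtaken:
        return jump30
    if jrtaken:
        return jr30
    return (pc30 + 1) & MASK30    # PC + 4, i.e. +1 on the 30-bit word index


def imem_address(pc30):
    """Instruction memory address: PC on bits 31..2, bits 1..0 grounded."""
    return (pc30 & MASK30) << 2
```

For example, with the PC at 0x00400000, `imem_address(next_pc(0x00400000 >> 2, 0, 0, 0, False, False, False))` gives the next fetch address 0x00400004.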
2.2 Decode/Operand Fetch

The RD stage has a set of multiplexers that select, based on two control inputs, whether the result is written to the register named in the instruction's rt field, the register named in its rd field, or $31. The immediate sign extender could have been placed in this stage, but we placed it in the next stage for simplicity's sake and because we believe the critical path may lie in the Register File access.
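The write-register selection can be sketched in Python using the two RD-stage signals from Table 3-2, chooseRTorRDtoWriteto and jal_bltzal_bgezal. The function name `write_register` is hypothetical, as is the priority order: the only opcode that asserts both signals is REGIMM (000001), where $31 is the correct destination for bltzal/bgezal, so this sketch lets jal_bltzal_bgezal win.

```python
def write_register(instr):
    """Destination register number for a 32-bit MIPS instruction word."""
    op = (instr >> 26) & 0x3F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    # jal_bltzal_bgezal: RDInstr31..RDInstr28 all 0 and RDInstr26 = 1
    jal_bltzal_bgezal = (op >> 2) == 0 and (op & 1) == 1
    # chooseRTorRDtoWriteto: RDInstr31..RDInstr27 all 0
    choose_rd = (op >> 1) == 0
    if jal_bltzal_bgezal:
        return 31          # linking instructions write $31
    if choose_rd:
        return rd          # R-type (special) instructions write rd
    return rt              # immediates and loads write rt
```

Using encodings that appear in the test figures: add $6, $5, $6 (0x00a63020) writes $6, addi $5, $0, 0xc (0x2005000c) writes $5, and jalr $31, $1 (0x0020f809) writes $31 through its rd field.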
2.3 Execution
The EX stage takes the instruction and the PC value as inputs, along with the two 32-bit operands read from the register file in the previous stage. It also takes outputs from itself and from the WB stage for resolving data hazards, as well as many control signals. Its outputs are the new PC calculations for branches, jumps, and jump registers, and the data computed by the ALU. The data passed from the RD stage is also passed through, with possible bypassing, for storage in the data memory. Six control outputs are produced by comparators; these are used to determine whether a branch is taken. The main internal block is the ALU, described in Section 1. Multiplexers handle the bypassing needed to avoid data hazards: two multiplexers per data input choose among the RD output, the EX output, and the WB output. An additional multiplexer chooses between register data and an immediate for one of the ALU inputs. A sign extender is also needed to sign-extend immediates when required. To determine the shift amount, a set of multiplexers chooses 10000 for lui, the shamt field of the instruction for sll, srl, and sra, or the lower five bits of the RD stage's output for the variable shifts. The shift amount must be determined in the datapath rather than in control, because the variable shift amounts may need to be bypassed. We use a separate adder for the branch PC calculation, for speed. We also use comparators to determine equality and the relation to zero.
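The shift-amount selection can be sketched in Python with the select signals from Table 3-2 (lui = EXInstr29, shiftVar = EXInstr2). The function name is hypothetical, and `rs_value` is assumed to be the operand after the bypass multiplexers. Table 3-2 asserts sll for the lui opcode (001111), which suggests lui is performed as a 16-bit left shift of the immediate; the result is only meaningful for shift and lui instructions.

```python
def shift_amount(instr, rs_value):
    """5-bit shift amount for the ALU shifter in EX.

    rs_value -- the rs operand after the bypass multiplexers
    """
    lui = (instr >> 29) & 1        # Table 3-2: lui = EXInstr29
    shift_var = (instr >> 2) & 1   # Table 3-2: shiftVar = EXInstr2
    if lui:
        return 0b10000             # lui: shift the immediate left by 16
    if shift_var:
        return rs_value & 0x1F     # sllv/srlv/srav: low five bits of rs
    return (instr >> 6) & 0x1F     # sll/srl/sra: shamt field
```

For example, the sllv 0x01215804 from Figure 3 takes its amount from the bypassed $9 value, while sll 0x000a3340 uses its shamt field of 13.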
2.4 Memory
The MEM stage drives the controls of the data memory. In terms of datapath elements, the MEM stage performs the PC + 8 computation for the "and link" instructions. For load instructions, this stage decodes and sign-extends the output of memory according to the specific load instruction. The decoder/sign-extender is in essence a shifter that right-shifts in byte increments, either logically or arithmetically, depending on the instruction. This stage also performs the lwl and lwr decoding, as described in Section 2.4.1.
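A behavioral sketch of the load alignment follows, modeled as a byte-granular right shift followed by masking and, for lb/lh, sign extension. The function `align_load` is hypothetical and is not the MemLoadShifter netlist. It assumes big-endian byte numbering (offset 0 is the most significant byte), which is consistent with the Figure 5 test, where lbu at offset 3 of 0x7913b5ef yields 0xef.

```python
MASK32 = 0xFFFFFFFF


def align_load(word, offset, size, signed):
    """Extract a byte, halfword, or word from an aligned 32-bit memory word.

    offset -- low two address bits (0..3); size -- 1, 2, or 4 bytes
    Assumes big-endian byte numbering: offset 0 is the most significant byte.
    """
    if size not in (1, 2, 4) or not 0 <= offset <= 3 or offset % size:
        raise ValueError("unsupported or misaligned access")
    width = 8 * size
    value = (word >> (8 * (4 - size - offset))) & ((1 << width) - 1)
    if signed and (value >> (width - 1)) & 1:
        value |= MASK32 & ~((1 << width) - 1)   # sign-extend lb/lh results
    return value
```

With the Figure 5 word 0x7913b5ef, this gives 0x13, 0xb5, and 0xef for lbu at offsets 1, 2, and 3, and 0x7913 and 0xb5ef for lhu at offsets 0 and 2.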
2.4.1 LWL and LWR implementation
We implemented a block that takes the data read from memory and the data from the rt register. It is built from two sub-blocks, one that computes the value for lwl and one that computes the value for lwr. Each sub-block is implemented with four 4-to-1 8-bit bus multiplexers, one for each byte of the result. The bytes of the memory word (b3 through b0) and of the rt value (r3 through r0) are connected to these multiplexers such that, with the byte offset as their select lines, the four outputs together form the 32-bit result of the operation. Finally, the outputs of the lwl and lwr blocks are combined by a 2-to-1 32-bit bus multiplexer, selected by a control bit that indicates whether the instruction is lwl or lwr. The output of this multiplexer is our final lwl/lwr value. To implement bypassing, we added a bypass multiplexer on the r input of the lwl/lwr block, which chooses between the rt value from the EX stage and the WB writeback line. This bypass allows two consecutive instructions of this class to execute correctly.
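The values produced by the lwl and lwr multiplexer networks can be modeled arithmetically in Python. The function names are hypothetical, the sketch computes the same results the byte multiplexers select rather than modeling them one by one, and it assumes big-endian byte numbering, as the Figure 5 load test suggests. `rt` is the register value after the bypass multiplexer.

```python
MASK32 = 0xFFFFFFFF


def lwl(mem, rt, offset):
    """Bytes offset..3 of the memory word fill the high end of rt."""
    keep = (1 << (8 * offset)) - 1              # low rt bytes that survive
    return ((mem << (8 * offset)) & MASK32) | (rt & keep)


def lwr(mem, rt, offset):
    """Bytes 0..offset of the memory word fill the low end of rt."""
    new = (1 << (8 * (offset + 1))) - 1         # low bytes replaced from memory
    return (mem >> (8 * (3 - offset))) | (rt & ~new & MASK32)


def lwl_lwr(mem, rt, offset, is_lwl):
    """Final 2-to-1 multiplexer between the two sub-blocks."""
    return lwl(mem, rt, offset) if is_lwl else lwr(mem, rt, offset)
```

As a usage example, an unaligned word starting at byte 1 of 0x11223344 and ending in the next word 0x55667788 is assembled by `lwl` at offset 1 followed by `lwr` at offset 0 on the same register, giving 0x22334455. This back-to-back pair is exactly the case the bypass on the r input exists to handle.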
2.5 Writeback
The writeback stage (WB) contains a set of multiplexers, which form its datapath portion. They select the data to write back: the load output of the MEM stage, the lwl/lwr output of the MEM stage, the output of the EX stage, or PC + 8, depending on the instruction.
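The WB selection can be sketched in Python from the link, isLwlLwr, and WB_EXorMEM equations in Table 3-2. The function name and the string labels are hypothetical. The check order is an inference: isLwlLwr implies WBInstr31, so it must be tested before WB_EXorMEM, and the link terms all require WBInstr31 = 0, so they never conflict with the memory sources.

```python
def wb_source(instr):
    """Which value the WB multiplexers write back, per the Table 3-2 signals."""
    def b(i):
        return (instr >> i) & 1
    special = ((instr >> 26) & 0x3F) == 0
    link = ((not b(31) and not b(30) and not b(29) and not b(28) and b(27))
            or (not b(31) and not b(30) and not b(29) and not b(28)
                and not b(27) and b(26))
            or (special and not b(5) and not b(4) and b(3)))
    is_lwl_lwr = b(31) and not b(30) and not b(29) and b(27) and not b(26)
    wb_ex_or_mem = b(31)
    if link:
        return "PC+8"
    if is_lwl_lwr:           # checked before WB_EXorMEM, which it implies
        return "LWL/LWR"
    if wb_ex_or_mem:
        return "LOAD"
    return "EX"
```

For example, jalr (0x0020f809) selects PC + 8, lbu (0x920b0003) selects the load output, and add (0x00a63020) selects the EX result.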
2.6 Connection of the Five Stages

The stages are all connected in cpu.cast, which also implements the pipeline registers for the instructions, the PC, and various other flopped data values. The two 32-bit operands fetched from the register file are flopped between the RD and EX stages. The data to be stored in memory and the ALU result, both 32-bit buses, are flopped between the EX and MEM stages, and the ALU result is flopped again into the WB stage. The PC + 8 value and the data read from the data memory are flopped from the MEM stage to the WB stage.
3. CPU Control
The controls that depend on a single stage's instruction are located in control.cast and are described in Section 3.1. The forwarding controls needed to bypass data and avoid data hazards are in the data forwarding unit, forwarding.cast, described in Section 3.2. Most control signals are computed combinationally in the stage where they are used, rather than computing all controls in RD and flopping the results forward. To do this, the control logic reads the instructions held in IR, IR1, IR2, IR3, and IR4. We felt this was a better approach than flopping the controls through, because it saves extraneous registers and because some controls (such as branch taken) cannot be calculated at the beginning of the pipeline.
3.1 Instruction Decoding Control

The control lines needed by each stage, and their values for each implemented instruction, are listed in Table 3-1. The resulting logic equations are shown in Table 3-2.
3.2 Data Forwarding Unit

There are essentially three data forwarding units. The first handles the data hazard in which an ALU operation uses, as an operand, a register written by the immediately preceding instruction. This unit takes as input the rs and rt fields of the EX-stage instruction and the write address of the MEM stage, and decides whether to forward by comparing the addresses (if the addresses match, the result must be forwarded). The second unit handles a register write followed two instructions later by an instruction that uses that register as an operand. It also takes the rs and rt fields of the EX stage, but compares them with the write address of the WB stage; otherwise it works the same way. The last unit serves the lwl/lwr calculation in the MEM stage. It takes only the rt field of the MEM-stage instruction and the write address of the WB stage, and again forwards when the two match.
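The three forwarding decisions can be sketched in Python as plain address comparisons, as described above. The function names and return labels are hypothetical. The priority of MEM over WB when both match is an assumption (the more recent result should win), and, following the text, no write-enable gating is modeled.

```python
def forward_select(src_reg, mem_waddr, wb_waddr):
    """Source for one EX-stage operand field (rs or rt)."""
    if src_reg == mem_waddr:
        return "MEM"     # written by the immediately preceding instruction
    if src_reg == wb_waddr:
        return "WB"      # written two instructions earlier
    return "RD"          # no hazard: use the value read from the register file


def forward_lwl_lwr(mem_rt, wb_waddr):
    """Bypass select for the r input of the lwl/lwr block in MEM."""
    return mem_rt == wb_waddr
```

In the Figure 4 sequence (addi $5; addi $6; add $6, $5, $6), when the add is in EX the addi $6 is in MEM and the addi $5 is in WB, so `forward_select(5, 6, 5)` returns "WB" for rs and `forward_select(6, 6, 5)` returns "MEM" for rt.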
4. Testing
A great deal of testing was performed on the individual blocks and on the final datapath. The details of the block testing are beyond the scope of this design report, and the final datapath testing is discussed only briefly. To test the arithmetic and logical instructions, one or more instructions of each type were issued with no data hazards between them. To test bypassing and forwarding, a sequence of operations with many data hazards that had to be resolved was written. The data to be written and the Register File address to be written were examined in the waveform viewer. To test load instructions, a known word was stored in memory, then retrieved and stored in the register file. To test store instructions, a value was stored and then loaded back. In addition to the register file inputs, the data memory's data and control inputs were also examined. To test branches and jumps, the PC was examined; for the linking branches and jumps, we also verified that PC + 8 was stored. Waveforms for a few of the more interesting test cases are shown in the attached figures.
5. Sub-blocks Defined
A listing of the parts that were defined, in addition to the standard set of five stages, is shown in Table 5-1.
Block                      Location          Description
Nand3                      myparts.cast      3-input NAND
Nor3                       myparts.cast      3-input NOR
Xor2                       myparts.cast      2-input XOR
Xnor2                      myparts.cast      2-input XNOR
And2                       myparts.cast      2-input AND
And3                       myparts.cast      3-input AND
And4                       myparts.cast      4-input AND
Nand4                      myparts.cast      4-input NAND
And5                       myparts.cast      5-input AND
And6                       myparts.cast      6-input AND
And7                       myparts.cast      7-input AND
Or2                        myparts.cast      2-input OR
Or3                        myparts.cast      3-input OR
Or4                        myparts.cast      4-input OR
Or5                        myparts.cast      5-input OR
Or6                        myparts.cast      6-input OR
Or10                       myparts.cast      10-input OR
Or11                       myparts.cast      11-input OR
Mux                        myparts.cast      2-to-1 single-bit multiplexer
BusMux32                   myparts.cast      2-to-1 32-bit multiplexer
BusMuxN                    myparts.cast      2-to-1 N-bit multiplexer
Mux4                       myparts.cast
Mux32                      myparts.cast
And32ALU                   myparts.cast      32-bit bitwise AND
Xor32ALU                   myparts.cast      32-bit bitwise XOR
Or32ALU                    myparts.cast      32-bit bitwise OR
Nor32ALU                   myparts.cast      32-bit bitwise NOR
Srl32                      myparts.cast      32-bit logical right shifter
Sll32                      myparts.cast      32-bit logical left shifter
Sra32                      myparts.cast      32-bit arithmetic right shifter
Fagp                       myparts.cast      Full adder that produces generate and propagate
4bitCarryLookaheadUnit     myparts.cast      First level of carry lookahead logic
16bitCarryLookaheadUnit    myparts.cast      Second level of carry lookahead logic
32bitAdder                 myparts.cast      32-bit carry lookahead adder
30bitAdder                 myparts.cast      30-bit carry lookahead adder
32bitAdderSubtractor       myparts.cast      32-bit carry lookahead combination adder/subtractor
NodeMatrix                 myparts.cast      Definition for a matrix of nodes
SuperBusMux11              myparts.cast      Multiplexes 11 32-bit inputs, with 11 one-hot select lines
Zcomparator                myparts.cast      Determines equality/inequality with zero of 32-bit inputs
Bcomparator                myparts.cast      Determines equality/inequality of two 32-bit inputs
4to1BusMux8                myparts.cast      4-to-1 8-bit multiplexer
MemLoadShifter             myparts.cast      Shifter that aligns the output from memory loads
Register                   myparts.cast      N-bit positive-edge-triggered register with synchronous clear
Sext2                      myparts.cast      16-to-32-bit sign/zero extender
Incrementer                myparts.cast      30-bit incrementer
Lwl                        myparts.cast      Performs the load word left operation
Lwr                        myparts.cast      Performs the load word right operation
LwlLwr                     myparts.cast      Multiplexer that chooses between lwl and lwr
SubControl                 subcontrol.cast   Control logic for IF, RD, MEM, and WB stages
Excontrol                  excontrol.cast    Control logic for EX stage
Control                    control.cast      Combines SubControl and Excontrol
RWcomparator               forwarding.cast   5-bit equivalence comparator
Forwarder                  forwarding.cast   Control logic to handle data forwarding
Table 5-1: Parts
Table 3-2: Control Equations
RDInstrX refers to the Xth bit of the output of IR

EXInstrX refers to the Xth bit of the output of IR1
MEMInstrX refers to the Xth bit of the output of IR2
WBInstrX refers to the Xth bit of the output of IR3
aeb, aneb, alez, agz, alz, and agez are inputs to the control, denoting equality/inequality of the operands and their relation to zero
IF
Brtaken = (aeb & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & !EXInstr27 & !EXInstr26)
        | (aneb & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & !EXInstr27 & EXInstr26)
        | (alez & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & EXInstr27 & !EXInstr26)
        | (agz & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & EXInstr27 & EXInstr26)
        | (agez & !EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & EXInstr26 & EXInstr16)
        | (alz & !EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & EXInstr26 & !EXInstr16)
Jtaken = (!EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & EXInstr27)
JRtaken = special & !EXInstr5 & !EXInstr4 & EXInstr3 & !EXInstr2 & !EXInstr1

RD
WE = (!RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & RDInstr27 & RDInstr26)
   | (!RDInstr31 & !RDInstr30 & RDInstr29)
   | (RDInstr31 & !RDInstr30 & !RDInstr29)
   | (special & !(!RDInstr5 & !RDInstr4 & RDInstr3 & !RDInstr2 & !RDInstr1 & !RDInstr0))
   | (!RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & !RDInstr27 & RDInstr26 & RDInstr20)
chooseRTorRDtoWriteto = !RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & !RDInstr27
jal_bltzal_bgezal = !RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & RDInstr26

EX
[special = (!EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & !EXInstr26)]
SignExtendImmediate = (!EXInstr31 & !EXInstr30 & !EXInstr29)
                    | (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28)
                    | (EXInstr31)
                    | (!EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & EXInstr26)
ImmediateInstruction = (!EXInstr31 & !EXInstr30 & EXInstr29) | (EXInstr31)
lui = EXInstr29
shiftVar = EXInstr2
add = (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28 & !EXInstr27) | (EXInstr31)
    | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & !EXInstr2 & !EXInstr1))
and = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & !EXInstr27 & !EXInstr26)
    | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & !EXInstr1 & !EXInstr0))
xor = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & EXInstr27 & !EXInstr26)
    | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & EXInstr1 & !EXInstr0))
or = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & !EXInstr27 & EXInstr26)
   | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & !EXInstr1 & EXInstr0))
nor = (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & EXInstr1 & EXInstr0))
sub = special & (EXInstr5 & !EXInstr4 & !EXInstr3 & !EXInstr2 & EXInstr1)
sltu = (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28 & EXInstr27 & EXInstr26)
     | (special & (EXInstr5 & !EXInstr4 & EXInstr3 & !EXInstr2 & EXInstr1 & EXInstr0))
slt = (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28 & EXInstr27 & !EXInstr26)
    | (special & (EXInstr5 & !EXInstr4 & EXInstr3 & !EXInstr2 & EXInstr1 & !EXInstr0))
sra = special & (!EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr1 & EXInstr0)
srl = special & (!EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr1 & !EXInstr0)
sll = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & EXInstr27 & EXInstr26)
    | (special & (!EXInstr5 & !EXInstr4 & !EXInstr3 & !EXInstr1 & !EXInstr0))

MEM
DMC0 = (RDInstr31 & !RDInstr30 & RDInstr29 & RDInstr26)
DMC1 = (RDInstr31 & !RDInstr30 & RDInstr29) & !(!RDInstr27 & RDInstr26)
signed = MEMInstr31 & !MEMInstr28
CTRLbh = MEMInstr31 & MEMInstr26
CTRLword = MEMInstr27 & MEMInstr26
chooseLwlorLwr = MEMInstr28

WB
link = (!WBInstr31 & !WBInstr30 & !WBInstr29 & !WBInstr28 & WBInstr27)
     | (!WBInstr31 & !WBInstr30 & !WBInstr29 & !WBInstr28 & !WBInstr27 & WBInstr26)
     | (special & !WBInstr5 & !WBInstr4 & WBInstr3)
WB_EXorMEM = WBInstr31
isLwlLwr = WBInstr31 & !WBInstr30 & !WBInstr29 & WBInstr27 & !WBInstr26
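To make the IF-stage equations of Table 3-2 easier to check, the sketch below evaluates Brtaken, Jtaken, and JRtaken in Python for one EX-stage instruction word. The function name is hypothetical, and each product term of the table is rewritten as a comparison on the opcode (or funct) field, which is logically the same condition; the comparator flags are passed in as booleans.

```python
def branch_controls(instr, aeb, aneb, alez, agz, agez, alz):
    """Evaluate Brtaken, Jtaken, JRtaken from Table 3-2 for the EX instruction."""
    op = (instr >> 26) & 0x3F
    funct = instr & 0x3F
    rt0 = (instr >> 16) & 1            # EXInstr16 distinguishes bgez from bltz
    special = op == 0
    brtaken = ((aeb and op == 0b000100)               # beq
               or (aneb and op == 0b000101)           # bne
               or (alez and op == 0b000110)           # blez
               or (agz and op == 0b000111)            # bgtz
               or (agez and op == 0b000001 and rt0)   # bgez, bgezal
               or (alz and op == 0b000001 and not rt0))  # bltz, bltzal
    jtaken = (op >> 1) == 0b00001                     # j, jal
    jrtaken = special and (funct >> 1) == 0b00100     # jr, jalr
    return bool(brtaken), bool(jtaken), bool(jrtaken)
```

For example, the jalr from the boot code (0x0020f809) yields `(False, False, True)`, and an add (0x00a63020) yields all False regardless of the comparator flags.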
Figure 1. Beginning of Boot Code
[Waveform: Daddr, Dout, DMC, RW, IR, PC, IR1 to IR4, PC1 to PC4, CLK, and c.r.WE versus time (ns), 0 to 2500 ns.]
Annotation: the boot code uses addiu instructions (0x24000000, 0x24010000, and so on) to write 0 to the registers.
Figure 2. End of Boot Code
[Waveform: Daddr, Dout, DMC, RW, IR, PC, IR1 to IR4, PC1 to PC4, CLK, and c.r.WE versus time (ns), 5250 to 7750 ns.]
Annotations: lui $29, 0x7fff; ori $29, $29, 0xae50; sw $0, 4($29); jalr $31, $1, with its JR PC calculation and delay slot marked; PC + 8 stored to $31.
Figure 3. Trial of Some Arithmetic/Logical Instructions
[Waveform: Daddr, Dout, DMC, RW, IR, PC, IR1 to IR4, PC1 to PC4, CLK, and c.r.WE versus time (ns), 7808.73 to 10411.64 ns.]
Annotation: sllv $11, $1, $9 (0x01215804). The lower bits of $9 were 10000 and $1 contained 0x00001111. $9 was changed by the previous instruction, but bypassing allowed the correct shift amount to be chosen.
addi $5,$0,0xc
addi $6,$0,0xd
add $6,$5,$6: A register just written to, and a register written to 2

instructions before are used to calculate an addition, and the
result is correct, 0xc + 0xd = 0x19, hence bypassing is operational
Figure 4. Test of Bypassing
[Simulation waveform 05.ps: Daddr, Dout, DMC, RW, W, the instruction and PC registers (IR, PC through IR4, PC4), CLK and c.r.WE versus time (ns), 10500 to 13000 ns.]
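The operand selection that Figure 4 exercises can be modeled in a few lines of Python. This is a hypothetical sketch under standard five-stage MIPS forwarding assumptions, not the authors' forwarder circuit: the function `select_operand`, the `(dest, write_enable, value)` tuple layout, and the rules that the younger producer wins and $0 is never forwarded are all assumptions.

```python
from typing import Optional, Tuple

# Hypothetical representation of a pending register write:
# (destination register number, write enable, value to be written).
Write = Tuple[int, bool, int]


def select_operand(src: int, regfile_value: int,
                   ex_mem: Optional[Write], mem_wb: Optional[Write]) -> int:
    """Pick the value for source register `src` (hypothetical forwarding model).

    ex_mem: write produced by the instruction one ahead (its ALU result).
    mem_wb: write produced by the instruction two ahead (its write-back value).
    The younger producer (ex_mem) takes priority; $0 is never forwarded.
    """
    if src == 0:
        return 0
    for write in (ex_mem, mem_wb):  # younger producer first
        if write is not None:
            dest, enabled, value = write
            if enabled and dest == src:
                return value
    return regfile_value  # no hazard: use the register file output


# Figure 4 sequence: addi $5,$0,0xc ; addi $6,$0,0xd ; add $6,$5,$6
# When add reaches EX, addi $6 is one stage ahead and addi $5 two stages ahead.
# Assume the register file still holds stale values (0 here) for both.
ex_mem = (6, True, 0xD)
mem_wb = (5, True, 0xC)
a = select_operand(5, 0, ex_mem, mem_wb)
b = select_operand(6, 0, ex_mem, mem_wb)
result = (a + b) & 0xFFFFFFFF
print(hex(result))  # 0x19
```

Checking the younger producer first matters when two in-flight instructions write the same register: program order expects the later write to be the one that is read.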
lbu $11, 3($16): $16 holds the address 0x11111110, so the effective address is 0x11111110 + 3 = 0x11111113. The word containing this address is 0x7913b5ef, so byte 3 of it, 0xef, should be written to $11 in the next pipeline stage.
Figure 5. Test Load & Store Instructions
[Simulation waveform 07.ps: Daddr, Dout, DMC, RW, W, IR, PC through IR4, PC4, CLK and c.r.WE versus time (ns), 15617.46 to 18220.37 ns. Annotations mark the effective address, the load, and the write of 0xef to $11, and identify IR3 as the instruction in the MEM stage and IR4 as the instruction in the WB stage.]
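The byte and halfword handling tested in Figure 5 can be sketched as two small functions. The sketch assumes big-endian byte numbering (byte 0 is the most significant byte of the word), which is what the expected lbu result (0xef at byte 3 of 0x7913b5ef) implies; the names `field_shift`, `load_extract` and `store_merge` are hypothetical and are not the authors' block names.

```python
MASK32 = 0xFFFFFFFF


def field_shift(offset: int, size: int) -> int:
    """Bit position of a `size`-byte field at byte `offset` (big-endian word)."""
    if size not in (1, 2) or offset % size or not 0 <= offset < 4:
        raise ValueError("misaligned or unsupported access")
    return 32 - 8 * size - 8 * offset


def load_extract(word: int, offset: int, size: int, signed: bool) -> int:
    """Register value for lb/lbu (size=1) or lh/lhu (size=2) from a memory word."""
    bits = 8 * size
    value = (word >> field_shift(offset, size)) & ((1 << bits) - 1)
    if signed and value >> (bits - 1):  # sign bit set: fill upper bits with ones
        value |= (MASK32 << bits) & MASK32
    return value


def store_merge(word: int, data: int, offset: int, size: int) -> int:
    """Memory word after sb (size=1) or sh (size=2) writes the low bytes of `data`."""
    shift = field_shift(offset, size)
    mask = ((1 << (8 * size)) - 1) << shift
    return (word & ~mask & MASK32) | ((data << shift) & mask)


# lbu $11, 3($16) with 0x7913b5ef stored at 0x11111110
print(hex(load_extract(0x7913B5EF, 3, 1, signed=False)))  # 0xef
```

Under the same big-endian assumption, storing $6 = 0x19 with sb at offsets 0 to 3 and then sh at offset 0 gives the word sequence 0x1913b5ef, 0x1919b5ef, 0x191919ef, 0x19191919, 0x00191919, the values visible in the Figure 5 waveform.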
$6 contains 0x11223344, and $16 holds the address 0x11111110, at which 0xaabbccdd is stored. lwl $6, 0($16), executed at PC = 0x0040021c, yields 0xaabbccdd, as expected; lwl $6, 1($16), at PC = 0x00400220, yields 0xbbccdddd, as expected; and so on. lwr $6, 3($16), at PC = 0x0040022c, yields 0xaabbccdd, as expected, and lwr $6, 1($16), at PC = 0x00400230, yields 0xaabbaabb, as expected. These test cases demonstrate that lwl and lwr function correctly, including the necessary bypassing.
Figure 6. LWL/LWR Test
[Simulation waveform 08.ps: Daddr, Dout, DMC, RW, W, IR, PC through IR4, PC4, CLK and c.r.WE versus time (ns), 18250 to 20750 ns.]
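The expected LWL/LWR values can be reproduced with a short model of big-endian MIPS unaligned loads: lwl fills the register from its most significant byte with the memory bytes from the addressed byte to the end of the word, lwr fills it from its least significant byte with the bytes from the start of the word up to the addressed byte, and the remaining register bytes are kept. This sketches the architectural semantics under that big-endian assumption, not the authors' LwlLwr circuit; the functions `lwl` and `lwr` are hypothetical, and the instruction order is the one shown in the Figure 6 fetch trace.

```python
MASK32 = 0xFFFFFFFF


def lwl(reg: int, word: int, offset: int) -> int:
    """Big-endian lwl: bytes offset..3 of the word fill reg from its top byte down."""
    loaded = (word << (8 * offset)) & MASK32
    keep = (1 << (8 * offset)) - 1  # low register bytes left untouched
    return loaded | (reg & keep)


def lwr(reg: int, word: int, offset: int) -> int:
    """Big-endian lwr: bytes 0..offset of the word fill reg from its bottom byte up."""
    loaded = word >> (8 * (3 - offset))
    keep = (MASK32 << (8 * (offset + 1))) & MASK32  # high register bytes left untouched
    return (reg & keep) | loaded


# Replay the Figure 6 sequence: $6 = 0x11223344, memory word 0xaabbccdd at 0x11111110.
reg = 0x11223344
trace = []
for op, off in [(lwl, 0), (lwl, 1), (lwl, 2), (lwl, 3),
                (lwr, 3), (lwr, 1), (lwr, 0), (lwr, 2)]:
    reg = op(reg, 0xAABBCCDD, off)  # each result feeds the next (needs bypassing)
    trace.append(f"{reg:08x}")
print(trace)
# ['aabbccdd', 'bbccdddd', 'ccdddddd', 'dddddddd',
#  'aabbccdd', 'aabbaabb', 'aabbaaaa', 'aaaabbcc']
```

Because every instruction in the sequence reads the $6 value produced by the one before it, the hardware run only matches this replay if the merged result is bypassed back into the next lwl or lwr.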
Figure 7. Test Jumps
[Simulation waveform 10.ps: Daddr, Dout, DMC, RW, W, IR, PC through IR4, PC4, CLK and c.r.WE versus time (ns), 23500 to 26000 ns. Annotations mark j 4002cc, jal 4002e4 and the jump delay slot.]
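The jump targets annotated in Figure 7 follow the standard MIPS J-format rule: the 26-bit target field is shifted left by two and combined with the upper four bits of the delay-slot address (PC + 4), and jal links PC + 8, the address after the delay slot. A minimal decoding sketch with a hypothetical function name; the encodings and addresses used below (0x081000b3 at 0x004002b4 and 0x0c1000b9 at 0x004002cc) are read from the Figure 7 waveform.

```python
def decode_jump(instr: int, pc: int):
    """Return (mnemonic, target, link) for a MIPS j/jal at `pc` (hypothetical helper)."""
    opcode = instr >> 26
    if opcode not in (0x02, 0x03):
        raise ValueError("not a j/jal instruction")
    # Upper 4 bits come from the delay-slot address, low 28 from target << 2.
    target = ((pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    if opcode == 0x03:  # jal writes the return address (PC + 8) to $31
        return "jal", target, (pc + 8) & 0xFFFFFFFF
    return "j", target, None


for word, pc in [(0x081000B3, 0x004002B4), (0x0C1000B9, 0x004002CC)]:
    name, target, link = decode_jump(word, pc)
    print(name, hex(target), None if link is None else hex(link))
# j 0x4002cc None
# jal 0x4002e4 0x4002d4
```

The link value is PC + 8 rather than PC + 4 because the delay-slot instruction always executes before control reaches the target.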
bgezal $6, 400380
Figure 8. Branch Testing
[Simulation waveform 11.ps: Daddr, Dout, DMC, RW, W, IR, PC through IR4, PC4, CLK and c.r.WE versus time (ns), 26029.10 to 28632.01 ns.]
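For the bgezal in Figure 8, the standard MIPS branch rules give the annotated target: the 16-bit offset is sign-extended, shifted left by two and added to PC + 4, and the "and link" forms write PC + 8 to $31. The sketch below covers only the REGIMM group (bltz, bgez, bltzal, bgezal), assumes the rs value has already been read or bypassed, and uses hypothetical names; the encoding 0x04d10005 at PC 0x00400368 is read from the Figure 8 waveform, and the value of $6 is an assumed example.

```python
def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def resolve_regimm(instr: int, pc: int, rs_value: int):
    """Resolve a REGIMM branch (opcode 1) at `pc` (hypothetical helper).

    Returns (taken, target, link); link is PC + 8 for bltzal/bgezal, else None.
    """
    if instr >> 26 != 0x01:
        raise ValueError("not a REGIMM instruction")
    rt = (instr >> 16) & 0x1F  # the rt field selects the condition
    imm = instr & 0xFFFF
    offset = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend
    target = (pc + 4 + (offset << 2)) & 0xFFFFFFFF
    value = to_signed32(rs_value)
    conditions = {0x00: value < 0,   # bltz
                  0x01: value >= 0,  # bgez
                  0x10: value < 0,   # bltzal
                  0x11: value >= 0}  # bgezal
    if rt not in conditions:
        raise ValueError("unsupported REGIMM rt field")
    link = (pc + 8) & 0xFFFFFFFF if rt & 0x10 else None
    return conditions[rt], target, link


# bgezal $6, 400380 at PC 0x00400368; $6 = 12 is an assumed nonnegative value.
taken, target, link = resolve_regimm(0x04D10005, 0x00400368, 12)
print(taken, hex(target), hex(link))  # True 0x400380 0x400370
```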
bne $19, $19, 4004ec: $19 always equals itself, so the branch is not taken.
Figure 9. Branch Not Taken Test
[Simulation waveform 15.ps: Daddr, Dout, DMC, RW, W, IR, PC through IR4, PC4, CLK and c.r.WE versus time (ns), 35750 to 38250 ns.]
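The not-taken case in Figure 9 comes down to the equality comparison: bne branches only when rs and rt differ, so bne $19, $19 can never be taken and execution continues after the delay slot. The sketch below derives branch flags from a signed comparison of A with zero and an equality comparison of A with B, then maps the I-type branch opcodes onto them. The flag names agz, agez, alz, alez, aeb and aneb are borrowed from the EX-stage schematic labels; the functions, the opcode table (standard MIPS encodings beq 4, bne 5, blez 6, bgtz 7) and the example value of $19 are assumptions, not the authors' control equations.

```python
def compare_flags(a: int, b: int) -> dict:
    """Flags from a signed compare of A with 0 and an equality compare of A and B."""
    a_signed = a - (1 << 32) if a & 0x80000000 else a
    return {
        "agz": a_signed > 0,
        "agez": a_signed >= 0,
        "alz": a_signed < 0,
        "alez": a_signed <= 0,
        "aeb": a == b,
        "aneb": a != b,
    }


# Hypothetical table: standard MIPS I-type branch opcodes and the flag each tests.
BRANCH_FLAG = {0x04: "aeb",   # beq
               0x05: "aneb",  # bne
               0x06: "alez",  # blez
               0x07: "agz"}   # bgtz


def branch_taken(instr: int, rs_value: int, rt_value: int) -> bool:
    """Evaluate beq/bne/blez/bgtz; rt_value is ignored by blez and bgtz."""
    opcode = instr >> 26
    if opcode not in BRANCH_FLAG:
        raise ValueError("not a beq/bne/blez/bgtz instruction")
    return compare_flags(rs_value, rt_value)[BRANCH_FLAG[opcode]]


# bne $19, $19 (0x16730005) with an assumed $19 = 0x26: never taken.
print(branch_taken(0x16730005, 0x26, 0x26))  # False
```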
[Schematic: Pipelined MIPS CPU, top level. Course: Cornell University, ECECS 314. Designers: Alan Leung & Lav Varshney. Number 1.00, Sat May 04 16:44:03 2002. Note: all registers have synchronous clear tied to Reset. Blocks: instruction fetch (IF2), imem, RD with regfile, EX, MEM with dmem, WB, control, three forwarder units, and pipeline registers IR1 to IR4, PC1 to PC4, RW1 to RW3 and WE1 to WE3.]
[Schematic: Instruction Fetch for CPU. Number 1.00, Sat May 04 14:39:14 2002. Note: PC has synchronous clear tied to Reset. A 30-bit PC register drives Iaddr[31..2] (Iaddr[1..0] tied to GND) and a 30-bit incrementer; three 30-bit BUSMUXes choose the next PC from the incremented value, Brcalc[29..0], Jcalc[29..0] and JRcalc[29..0] under Brtaken, Jtaken and JRtaken. Ports: CLK and Iin[31..0] in; Iaddr[31..0], PC[29..0] and Iout[31..0] out.]
[Schematic: Register/Operand Fetch Stage for CPU. Number 1.00, Sat May 04 14:34:28 2002. Two 5-bit BUSMUXes produce RWaddress[4..0]: one selects RT (Instr[20..16]) or RD (Instr[15..11]) under chooseRTorRDtowriteto, and the other, under jal, substitutes the constant register number tied to VCC (31, the link register).]
[Schematic: Execution Stage for CPU. Number 1.00, Sat May 04 15:02:31 2002. Bypass BUSMUXes (ALUbypass, WBbypass, ALUbypassB, WBbypassB) choose each operand from RDdataoutA/RDdataoutB, ALUdataout or WBdataout; sext2 extends Instr[15..0] under SignExtendImmediate; shiftVar selects the shift amount from Instr[10..6] or RDdataout[4..0], and lui substitutes the constant 16; the alu block is driven by ALUctrl[10..0]; zcomp (signed compare against 0) yields agz, agez, alz, alez, and bcomp yields aeb, aneb; an adder forms BrPCcalc[29..0], and JPCcalc[29..0] joins PC[29..26] with Instr[25..0]. Outputs: ALUdataout[31..0], MEMdatain[31..0], BrPCcalc, JPCcalc, JRPCcalc and the six compare flags.]
[Schematic: Memory Stage for CPU. Number 1.00, Sat May 04 15:37:59 2002. MemLoadShifter (inputs din[31..0], signed, CTRLbh, CTRLword, daddr[1..0]) produces toWB[31..0]; the LwlLwr block (inputs r[31..0], b[31..0], off[1..0], chooseLwlorLwr) produces LwlLwrcalc[31..0], with a BUSMUX under MEMWbbypass able to substitute WBdataout[31..0]; PCplus8 adds 8 to {PC, 00} to give PCplusEight[31..0]. Outputs: dout[31..0], toWB[31..0], LwlLwrcalc[31..0], PCplusEight[31..0].]
[Schematic: Writeback Stage for CPU. Number 1.00, Sat May 04 16:46:07 2002. Three 32-bit BUSMUXes select WBdataout[31..0] from A[31..0] (execution result), M[31..0] (memory data), LwlLwrcalc[31..0] and PCplus8[31..0] under WB_EXorMEM, isLwlLwr and link.]
