
Project 4

MIPS Pipelined Processor


ECECS 314, Computer Organization
Alan Leung (geke00) & Lav Varshney (lav)
May 4, 2002

1. Arithmetic Logical Unit (ALU)


Our ALU implementation consists of functional blocks that each perform one operation separately, with the control
inputs used to combine their outputs into a single result. This required many sub-blocks, which were placed in
myparts.cast; see Section 5 for a listing of the blocks in myparts.cast. We had experimented with a bit-slice
implementation that combined 32 bit-slices using various methods, but it was quite slow, so we moved to the
current block implementation. The bit-slice implementation was particularly slow and redundant for arithmetic
shifts because each bit slice had to be given the sign bit. This heavy fan-out of the sign bit presents a very large
capacitance to the signal, so a great amount of time is required to drive it from GND to Vdd, and from Vdd to
GND. Each of the functional blocks is described in the subsections below. To combine the outputs of the functional
blocks, each output is ANDed with its select line, and the results are then ORed together. This is done with
SuperBusMux11.
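The AND-then-OR combination described above can be modeled in software. The following is a minimal Python sketch, not the CAST source of SuperBusMux11; the function name one_hot_mux and the sample values are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def one_hot_mux(inputs, selects):
    """AND each 32-bit input with its select line, then OR all the results."""
    assert len(inputs) == len(selects)
    out = 0
    for value, sel in zip(inputs, selects):
        gate = MASK32 if sel else 0  # the select bit fanned out to all 32 bits
        out |= value & gate
    return out

# Hypothetical example: 11 block outputs, with only block 3 selected.
block_outputs = [(0x11111111 * i) & MASK32 for i in range(11)]
selects = [0] * 11
selects[3] = 1
result = one_hot_mux(block_outputs, selects)
```

With exactly one select line high, the output equals the selected input; with no select line high, the output is zero.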

1.1 Carry Lookahead Adder-Subtractor


We designed a carry lookahead adder/subtractor for use with the add, sub, slt, and sltu ALU operations. We first
designed a Full Adder that produces generate and propagate signals, rather than a carry out. We then designed a
4-bit carry lookahead unit, combining the outputs of four full adders, which generates carry, group generate, and
group propagate bits. We then designed a 16-bit carry lookahead unit, which combines the outputs of four 4-bit
carry lookahead units, producing group carries based on the group generate and group propagate bits from the 4-bit
carry lookahead units. Then, combining 32 full adders, eight 4-bit carry lookahead units (one for each block of 4 full
adders), and two 16-bit carry lookahead units (one for each block of four 4-bit carry lookahead units), we created a
32-bit carry lookahead adder.
To create a 32-bit adder/subtractor, the bits of the second input are XORed with the subtract select bit, which
inverts that input when subtracting. The subtract select bit is also fed in as the carry in, to complete the 2's
complement negation.
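The hierarchy above can be checked with a small behavioral model. This is a Python sketch under stated assumptions, not the CAST blocks themselves: the names group_gp, carries_in, and add_sub are hypothetical, propagate is taken as a XOR b, and the carry equations are evaluated in a loop although the hardware expands them in parallel.

```python
MASK32 = 0xFFFFFFFF

def group_gp(g, p):
    """Group generate/propagate for four (g, p) pairs, least significant first."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def carries_in(g, p, cin):
    """Carry into each of four positions. Hardware expands these terms in
    parallel; the loop evaluates the same equations one after another."""
    c = [cin]
    for i in range(3):
        c.append(g[i] | (p[i] & c[i]))
    return c

def add_sub(a, b, subtract):
    """Return (32-bit result, end carry) for a + b, or a - b when subtract is 1."""
    b ^= MASK32 if subtract else 0                       # XOR b with the select bit
    g = [(a >> i) & (b >> i) & 1 for i in range(32)]     # per-bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(32)]   # per-bit propagate
    # First level: eight 4-bit lookahead units.
    gp4 = [group_gp(g[i:i + 4], p[i:i + 4]) for i in range(0, 32, 4)]
    g4 = [pair[0] for pair in gp4]
    p4 = [pair[1] for pair in gp4]
    # Second level: two 16-bit lookahead units.
    G_lo, P_lo = group_gp(g4[0:4], p4[0:4])
    G_hi, P_hi = group_gp(g4[4:8], p4[4:8])
    c16 = [subtract, G_lo | (P_lo & subtract)]           # select bit is the carry in
    carry_out = G_hi | (P_hi & c16[1])
    result = 0
    for half in range(2):
        c4 = carries_in(g4[4 * half:4 * half + 4], p4[4 * half:4 * half + 4], c16[half])
        for k in range(4):
            base = 16 * half + 4 * k
            c = carries_in(g[base:base + 4], p[base:base + 4], c4[k])
            for j in range(4):
                result |= (p[base + j] ^ c[j]) << (base + j)
    return result, carry_out
```

For example, add_sub(5, 3, 1) yields a result of 2 with an end carry of 1, while add_sub(3, 5, 1) yields 0xfffffffe with an end carry of 0.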

1.2 Slt and Sltu Operations


The calculation for slt involves a conditional structure: if a is negative and b is non-negative, then a < b; if b is
negative and a is non-negative, then a > b; if both have the same sign, then we subtract a - b, and the sign bit of
the difference is the result bit. We check for negative/non-negative by looking at the top bit of a and b. The top 31
bits of the output are grounded and the result bit is tied to the least significant output bit. To calculate sltu, we
subtract a - b, and the inversion of the end carry of this subtraction is the result bit. Again, the top 31 bits are
grounded and the result bit is tied to the least significant output bit. The code for slt and sltu is contained within
the definition for ALU in alu.cast. We decided to place the code in this final stage of organization since these
operations depend on the results of addition/subtraction, and therefore cannot be effectively modularized to be
performed in parallel with the other 32-bit functional blocks. Thus, we expected that slt and sltu might be in the
critical path of our ALU.
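The two comparison rules can be written out as a short behavioral sketch in Python. The function names are illustrative, and the subtraction is modeled as a + ~b + 1 on the assumption that it uses the adder/subtractor of Section 1.1.

```python
MASK32 = 0xFFFFFFFF

def slt(a, b):
    """Signed set-on-less-than on 32-bit two's complement bit patterns."""
    sign_a, sign_b = a >> 31, b >> 31
    if sign_a and not sign_b:      # a negative, b non-negative: a < b
        return 1
    if sign_b and not sign_a:      # b negative, a non-negative: a > b
        return 0
    diff = (a - b) & MASK32        # same sign: the subtraction cannot overflow
    return diff >> 31              # sign bit of a - b

def sltu(a, b):
    """Unsigned set-on-less-than: the inverted end carry of a - b."""
    total = a + (b ^ MASK32) + 1   # a - b computed as a + ~b + 1
    end_carry = total >> 32        # 1 when a >= b as unsigned numbers
    return end_carry ^ 1
```

Note that 0xffffffff compared with 1 gives slt = 1 (it is -1 as a signed value) but sltu = 0.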

1.3 Logical Operations


To perform the logical operations: and, xor, or, & nor, the corresponding logical operation is performed bitwise.
This is done in CAST using a loop with 32 iterations, one for each of the 32 bits to be operated on.

1.4 Shift Operations


To perform left logical shifting, we use five 2-to-1 32-bit bus multiplexers that are "rippled" together. The first
multiplexer in the chain has its upper bits (corresponding to 0 select) tied to an unshifted version of the b input. Its
bottom bits (corresponding to the 1 select) are tied to a version of the b input that is shifted left logically by 16-bits.
The select bit for this first multiplexer is the most significant shift amount bit (c.sa[4]). We can do this because we
know if the top shift amount bit is 1, then we will have to shift by at least 16 bits. The second multiplexer has its top
bits tied to the output of the first multiplexer, and the bottom bits are tied to a left-logical-shifted version of the
output from the first multiplexer. The select line is tied to the second-most-significant shift amount bit. This time,
the shift is by 8 bits. We can do this because we know if the second-most-significant shift amount bit is 1, then we
will have to shift by an additional 8 bits. We continue this pattern with 3 more multiplexers, feeding in an unshifted
and a shifted version (by 4, 2, and 1 -bits, respectively) of the output from the previous multiplexer, and using the
next less significant bit as the multiplexer select. The end result of this chain of multiplexers is shifted by the proper
amount. Note that by "left-logical-shifted" we mean that we fill the remainder bits with 0, which is done by tying the
corresponding multiplexer inputs to GND. Note that this shifter requires only five 32-bit bus multiplexers. An
alternate implementation would have used a barrel shifter that would require 32 32-to-1 multiplexers, which would
have more than tripled the number of transistors needed for the ALU.
To perform right logical shifting, we use the same method as above, with the exception that we tie
right-logical-shifted rather than left-logical-shifted versions into the multiplexers.
To perform right arithmetic shifting, we use the same method as above again, but we set the remainder bits to the top
bit of the b input (this is how we sign extend), rather than simply to GND. This is done by tying the corresponding
multiplexer inputs to the most significant input bit (b.d[31]). When experimenting with using a barrel shifter for
possible speed gains, we found that the speed was in fact dramatically decreased, most likely due to the fact that any
change in the sign bit of b would have to propagate across the multiple inputs on 31 of the 32 large multiplexers. As
noted above, the huge resulting input capacitance causes long delays. Therefore, we found that a barrel shifter is an
inefficient implementation for wide ALUs.
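The five-stage multiplexer chain can be modeled as follows. This is a behavioral Python sketch rather than the Sll32, Srl32, and Sra32 blocks themselves; the names mux2 and shift and the direction argument are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def mux2(sel, in0, in1):
    """2-to-1 bus multiplexer: in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0

def shift(b, sa, direction, arithmetic=False):
    """Shift b by sa (0..31) using five mux stages of 16, 8, 4, 2, and 1 bits."""
    fill = (b >> 31) & 1 if arithmetic else 0    # sign bit for sra, else GND
    out = b & MASK32
    for bit in (4, 3, 2, 1, 0):                  # most significant sa bit first
        amount = 1 << bit
        if direction == "left":
            shifted = (out << amount) & MASK32   # vacated low bits tied to 0
        else:
            shifted = out >> amount
            if fill:                             # vacated high bits tied to the sign
                shifted |= (MASK32 << (32 - amount)) & MASK32
        out = mux2((sa >> bit) & 1, out, shifted)
    return out
```

For instance, shift(0x00001111, 16, "left") gives 0x11110000, and shift(0xd9030000, 16, "right", arithmetic=True) gives 0xffffd903.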

1.5 Delay
The largest delay that we found was in the slt operation, and this was a delay of 15.10 ns. This seems reasonable,
because slt relies on the adder/subtractor to produce a result before the block can produce a stable result. Thus, we
have a serial combination of two blocks, and the output takes the delay of both circuits to stabilize. To find the
largest delay, we first set a and b to undefined states. We then stepped and fed slt the values 0x7fffffff and
0x7fffffff, because subtracting these numbers causes carries on the most bits for subtraction between positive
numbers, which is the worst case for the carry lookahead hardware. Since subtracting is an expensive operation, we
also tried subtraction of 0xffffffff and 0xffffffff, which causes carries on all bits. However, the delay for this was
only 14.60 ns, which is less than the worst case for slt. The worst-case delay is included in the test file as the final
test case.

2. Central Processing Unit (CPU) Datapath


The CPU Datapath is broken up into five pipeline stages: the Instruction Fetch (IF) stage, the Decode/Operand Fetch
(RD) stage, the Execution (EX) stage, the Memory (MEM) stage, and the Writeback (WB) stage. Each is described
in its respective subsection. The testing that is mentioned was done by setting control inputs manually, rather than
with any control logic. The linkage of the stages is described in Section 2.6.

2.1 Instruction Fetch


The IF stage takes 3 30-bit buses as input, and 3 control inputs. These buses are the computations for the top 30 bits
for the new PC from the EX stage for branches, jumps, and jump registers respectively. Internally, there is a PC + 4
computation that happens. Multiplexers are used to select among these 4 possible new PC calculations, using the
control inputs as select lines. There is a 30-bit negative-edge triggered register, which is the PC. The PC
is automatically set to zero on Reset. This stage also communicates with the Instruction Memory, from
which new instructions are retrieved and passed along. The address bits sent to the memory take the top 30 address
bits from the PC, while the lower 2 bits are grounded, to ensure word alignment. The instruction fetched from the
Instruction Memory is passed along to the rest of the stages, as is the PC value.
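The next-PC selection can be sketched in Python as below. The function and argument names are hypothetical, and the priority among the three control inputs is an assumption; the report only states that they act as multiplexer select lines.

```python
MASK30 = (1 << 30) - 1

def next_pc30(pc30, branch30, jump30, jr30, brtaken, jtaken, jrtaken):
    """Select the top 30 bits of the next PC from the four candidates."""
    nxt = (pc30 + 1) & MASK30   # PC + 4 is an increment of the top 30 bits
    if brtaken:
        nxt = branch30
    if jtaken:
        nxt = jump30
    if jrtaken:
        nxt = jr30
    return nxt

def fetch_address(pc30):
    """32-bit instruction address: the lower 2 bits are grounded."""
    return pc30 << 2
```

Starting from a PC of zero with no control input asserted, the next fetch address is 4.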

2.2 Decode/Operand Fetch


The RD stage has a set of multiplexers, which selects whether to write results to the register in the rt slot of the
instruction, the register in the rd slot of the instruction, or $31, based on two control inputs. The immediate sign
extender could have been placed in this stage, but we decided to place it in the next stage, both because we believe
the critical path may lie in the Register File access and for simplicity's sake.

2.3 Execution
The EX stage takes the instruction and the PC value as input. In addition, it takes the two 32-bit data outputs of the
previous stage, the operands read from the register file. It also takes outputs from itself and the WB stage, for data
hazard handling. There are also many control signals. The outputs are the new PC calculations for branches,
jumps, and jump registers. Another output is the data that is calculated with the ALU. The data that is passed from
the RD stage is also passed through for possible storage in the data memory, though with possible bypassing. Six
control outputs are also produced by comparators; these are used for branch taken determination.
The main internal block that is used is the ALU, as described in Section 1. Multiplexers are required to handle the
bypassing that happens to avoid data hazards. Two MUXes are required for each of the data inputs to the block, to
choose between the RD output, the EX output, and the WB output. An additional MUX is required to choose
between data and immediates for one of the inputs to the ALU. A sign extender is also necessary, to sign extend
immediates when they are to be sign extended. To determine the shift amount, we use a set of multiplexers to
choose among 10000 for lui; the shamt field in the instruction for sll, srl, or sra; and the lower five bits of the RD
stage's output for the variable shifts. It is necessary to determine the shift amount as a part of the datapath, rather
than control, because bypassing is necessary for the shift amounts of the variable shifts. We use a separate adder to
perform the branch PC calculation, for speed. We also use a comparator to determine equality and relation to
zero.
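The shift-amount selection can be illustrated with a small Python sketch. The function name and the priority of lui over the variable-shift control are assumptions; the shamt field position (bits 10 to 6) is the standard MIPS encoding, and rs_value stands for the operand after bypassing.

```python
def shift_amount(instr, rs_value, lui, shift_var):
    """Choose the 5-bit shift amount fed to the shifter."""
    if lui:
        return 0b10000              # lui shifts the immediate left by 16
    if shift_var:
        return rs_value & 0x1F      # lower five bits of the (bypassed) operand
    return (instr >> 6) & 0x1F      # shamt field for sll, srl, and sra

# Instruction words taken from the IR traces in the attached figures.
sll_amount = shift_amount(0x000A3340, 0, lui=0, shift_var=0)
sllv_amount = shift_amount(0x01215804, 0x00000010, lui=0, shift_var=1)
```

Here sll_amount is 13 and sllv_amount is 16, the latter matching the figure annotation that the lower bits of $9 were 10000.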

2.4 Memory
The MEM stage has controls to the data memory. In terms of datapath elements, the MEM stage performs the PC +
8 computation for the "and link" instructions. For load instructions, this stage performs the decoding and
sign-extending of the output from memory, with respect to the specific load instruction. The decoder/sign-extender
is in essence a shifter that right shifts in byte increments, either logically or arithmetically, depending on the
instruction. This stage also performs the lwl and lwr decoding, as described in Section 2.4.1.
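The byte-increment right shift can be sketched in Python as follows. The name load_align and its arguments are hypothetical. Big-endian byte order is assumed, which agrees with the load test annotation in which offset 3 of the word 0x7913b5ef yields 0xef.

```python
MASK32 = 0xFFFFFFFF

def load_align(word, offset, size, signed):
    """Align a loaded byte (size 1), halfword (size 2), or word (size 4).

    offset is the byte offset within the word; byte 0 is most significant.
    """
    shift = 8 * (4 - size - offset)             # right shift in byte increments
    bits = 8 * size
    value = (word >> shift) & ((1 << bits) - 1)
    if signed and value >> (bits - 1):          # arithmetic case: extend the sign
        value |= (MASK32 << bits) & MASK32
    return value

lbu_result = load_align(0x7913B5EF, 3, 1, signed=False)
lb_result = load_align(0x7913B5EF, 3, 1, signed=True)
```

With these inputs lbu_result is 0xef and lb_result is 0xffffffef; a halfword load at offset 0 returns 0x7913.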
2.4.1 LWL and LWR implementation
We implemented a block that takes the data read from memory and the data from the rt register. This block is
actually made from two other blocks, one that calculates the value for lwl, and one that calculates the value for lwr.
Each of these blocks is implemented by a set of 4 4-to-1 8-bit bus muxes, one for each byte in the result. The b3,
b2, and r3, r2, bytes are connected to these muxes such that by using the offset value as the select line on these
muxes, the result of all four muxes is the 32-bit result of the operation. Finally, the outputs from the lwl block and
the lwr block are muxed together in a 2-to-1 32-bit bus selected by a control bit that determines whether the
operation was an lwl instruction or an lwr instruction. The output of this block is our final calculated value for
lwl/lwr. To implement the bypassing, we added a bypass mux on the r input of this lwl/lwr block. This bypass mux
is connected to the rt value from the EX stage, and also a bypass from the WB writeback line. This bypass allows
two consecutive instructions from this instruction class to be executed.
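The per-byte multiplexing can be modeled in Python as shown below. This sketch follows the standard big-endian MIPS definitions of lwl and lwr, which is an assumption about the exact mux wiring; the function names and the choose_lwr argument are illustrative only.

```python
def to_bytes(word):
    """Split a 32-bit word into four bytes, most significant first."""
    return [(word >> s) & 0xFF for s in (24, 16, 8, 0)]

def from_bytes(parts):
    """Join four bytes, most significant first, into a 32-bit word."""
    out = 0
    for part in parts:
        out = (out << 8) | part
    return out

def lwl(mem, reg, offset):
    """Each result byte is a 4-to-1 choice selected by the byte offset."""
    m, r = to_bytes(mem), to_bytes(reg)
    return from_bytes([m[i + offset] if i + offset < 4 else r[i] for i in range(4)])

def lwr(mem, reg, offset):
    m, r = to_bytes(mem), to_bytes(reg)
    back = 3 - offset
    return from_bytes([m[i - back] if i - back >= 0 else r[i] for i in range(4)])

def lwl_lwr(mem, reg, offset, choose_lwr):
    """Final 2-to-1 mux between the lwl result and the lwr result."""
    return lwr(mem, reg, offset) if choose_lwr else lwl(mem, reg, offset)

# An unaligned word that starts at byte 1 of one word and ends in the next.
partial = lwl_lwr(0x11223344, 0xAABBCCDD, 1, choose_lwr=0)
unaligned = lwl_lwr(0x55667788, partial, 0, choose_lwr=1)
```

In this example partial is 0x223344dd and unaligned is 0x22334455. The second operation consumes the register value written by the first, which is the case the bypass mux on the r input serves.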

2.5 Writeback
The writeback stage, WB, has a set of multiplexers, which makes up the data path portion of it. The multiplexers
select which data to writeback: the load output from the MEM stage, the lwl/lwr output of the MEM stage, the
output from the EX stage, or the PC + 8, depending on which instruction it is.

2.6 Connection of the Five Stages


The stages are all connected inside of cpu.cast. The pipeline registers for the instructions, the PC, and various other
flopped data values are also implemented within cpu.cast. The two 32-bit operands fetched from the register file are
flopped between the RD and EX stages. The data to be stored in memory and the ALU computation, both 32-bit
buses, are flopped between the EX and MEM stages. The ALU computation is further flopped to the WB stage. The
PC + 8 computation and the data read from the data memory are flopped from the MEM to the WB stage.

3. CPU Control
The controls that are based on a single stage's instruction are located in control.cast, and are described in Section
3.1. The forwarding controls that are needed for data bypassing for data hazard avoidance are in the data forwarding
unit, forwarding.cast. This is described in Section 3.2. Most of the control signal calculations are performed on the
fly from the instruction currently in each stage, rather than performing all control calculations in RD and flopping
the results. To do this, the instructions are taken from IR, IR1, IR2, IR3, and IR4. We felt this was a better
approach than flopping the controls through, because it saves extraneous registers, and because some controls (such
as branch taken) cannot be calculated at the beginning.

3.1 Instruction Decoding Control


A list of control lines that are needed for each stage, and the response for each implemented instruction are shown in
Table 3-1. The resulting logic equations that were derived are shown in Table 3-2.

3.2 Data Forwarding Unit


There are essentially three data forwarding units. The first deals with the data hazard associated with
consecutive ALU operations in which a register written by the first instruction is an operand of the second. This data
forwarding unit takes as input the rs and rt fields for the EX instruction, and the write address for the MEM stage,
and determines whether to forward by comparing the addresses (i.e., if the addresses are the same, then we must
forward). The second forwarding unit deals with data hazards associated with a register write operation followed 2
stages later by an operation that uses that register as an operand. This forwarder also takes the rs and rt fields for the
EX stage, but the write address from the WB stage. It works the same way. The last data forwarding unit
corresponds to the lwl/lwr calculation in the MEM stage. This forwarder takes only the rt field from the MEM stage,
and the write address from the WB stage. Once again, this forwarding works the same as described above.
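The address comparisons for the two EX-stage forwarders can be sketched in Python. The function name and the returned labels are hypothetical. Giving the MEM-stage match priority over the WB-stage match is an assumption, as is treating both write addresses as belonging to instructions that actually write a register.

```python
def forward_select(src_reg, mem_write_addr, wb_write_addr):
    """Pick the source of one EX operand: 'EX', 'WB', or 'RD'.

    'EX' is the ALU result flopped into the MEM stage, 'WB' is the
    writeback value, and 'RD' is the value read from the register file.
    """
    if src_reg == mem_write_addr:   # written by the previous instruction
        return "EX"
    if src_reg == wb_write_addr:    # written two instructions before
        return "WB"
    return "RD"

# add $6,$5,$6 in EX, with addi $6 in MEM and addi $5 in WB.
rs_source = forward_select(5, mem_write_addr=6, wb_write_addr=5)
rt_source = forward_select(6, mem_write_addr=6, wb_write_addr=5)
```

In this case rs_source is "WB" and rt_source is "EX", so both operands of the add are bypassed.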

4. Testing
A great deal of testing was performed on the individual blocks and on the final datapath. The details of the block
testing are beyond the scope of this design report. The final datapath testing is discussed only briefly, as further
discussion would also be beyond the scope of the report. To test the arithmetic and logical style instructions, one or
more instructions of each type were issued, such that there were no data hazards. To test the bypassing and
forwarding, a set of operations was written that had many data hazards that had to be resolved. The data to be
written to the Register File, and the address to which it was to be written, were examined in the waveform viewer.
To test load instructions, a known word was stored in memory, and then it was retrieved and stored in the register
file. To test the store instructions, a store was performed, and then the stored value was loaded back. In addition to
the register file inputs, the data memory's data and control inputs were also examined. To test branches and jumps,
the PC was examined. For the linking branches and jumps, the storage of PC + 8 was also verified. Waveforms for a
few of the interesting test cases are shown in the attached figures.

5. Sub-blocks Defined
A listing of the parts that were defined, in addition to the standard set of 5 stages, is shown in Table 5-1.

Block                    Location         Description
Nand3                    myparts.cast     3 input NAND
Nor3                     myparts.cast     3 input NOR
Xor2                     myparts.cast     2 input XOR
Xnor2                    myparts.cast     2 input XNOR
And2                     myparts.cast     2 input AND
And3                     myparts.cast     3 input AND
And4                     myparts.cast     4 input AND
Nand4                    myparts.cast     4 input NAND
And5                     myparts.cast     5 input AND
And6                     myparts.cast     6 input AND
And7                     myparts.cast     7 input AND
Or2                      myparts.cast     2 input OR
Or3                      myparts.cast     3 input OR
Or4                      myparts.cast     4 input OR
Or5                      myparts.cast     5 input OR
Or6                      myparts.cast     6 input OR
Or10                     myparts.cast     10 input OR
Or11                     myparts.cast     11 input OR
Mux                      myparts.cast     2 to 1 single-bit multiplexer
BusMux32                 myparts.cast     2 to 1 32-bit multiplexer
BusMuxN                  myparts.cast     2 to 1 N-bit multiplexer
Mux4                     myparts.cast     4 to 1 single-bit multiplexer
Mux32                    myparts.cast     32 to 1 single-bit multiplexer
And32ALU                 myparts.cast     32 bit bitwise AND
Xor32ALU                 myparts.cast     32 bit bitwise XOR
Or32ALU                  myparts.cast     32 bit bitwise OR
Nor32ALU                 myparts.cast     32 bit bitwise NOR
Srl32                    myparts.cast     32 bit logical right shifter
Sll32                    myparts.cast     32 bit logical left shifter
Sra32                    myparts.cast     32 bit arithmetic right shifter
Fagp                     myparts.cast     Full adder, that produces generate and propagate
4bitCarryLookaheadUnit   myparts.cast     First level of carry lookahead logic
16bitCarryLookaheadUnit  myparts.cast     Second level of carry lookahead logic
32bitAdder               myparts.cast     32 bit carry lookahead adder
30bitAdder               myparts.cast     30 bit carry lookahead adder
32bitAdderSubtractor     myparts.cast     32 bit carry lookahead combination adder/subtractor
NodeMatrix               myparts.cast     Definition for a matrix of nodes
SuperBusMux11            myparts.cast     Multiplexes 11 32-bit inputs, with 11 one-hot select lines
Zcomparator              myparts.cast     Determines equality/inequality with zero of 32 bit inputs
Bcomparator              myparts.cast     Determines equality/inequality of 2 32 bit inputs
4to1BusMux8              myparts.cast     4 to 1 8-bit multiplexer
MemLoadShifter           myparts.cast     Shifter that aligns the output from memory loads
Register                 myparts.cast     N-bit positive edge-triggered Register with synchronous clear
Sext2                    myparts.cast     16 to 32 bit sign/zero extender
Incrementer              myparts.cast     30 bit incrementer
Lwl                      myparts.cast     Performs the load word left operation
Lwr                      myparts.cast     Performs the load word right operation
LwlLwr                   myparts.cast     Multiplexer that chooses between lwl and lwr
SubControl               subcontrol.cast  Control logic for IF, RD, MEM, and WB stages
Excontrol                excontrol.cast   Control logic for EX stage
Control                  control.cast     Combines SubControl and Excontrol
RWcomparator             forwarding.cast  5 bit equivalence comparator
Forwarder                forwarding.cast  Control logic to handle data forwarding
Table 5-1: Parts

Table 3-2: Control Equations

RDInstrX refers to the Xth bit of the output of IR
EXInstrX refers to the Xth bit of the output of IR1
MEMInstrX refers to the Xth bit of the output of IR2
WBInstrX refers to the Xth bit of the output of IR3
aeb, aneb, alez, agz, alz, & agez are inputs to the control, denoting equality/inequality of operands
IF------------------------------------
Brtaken = (aeb & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & !EXInstr27 & !EXInstr26)
        | (aneb & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & !EXInstr27 & EXInstr26)
        | (alez & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & EXInstr27 & !EXInstr26)
        | (agz & !EXInstr31 & !EXInstr30 & !EXInstr29 & EXInstr28 & EXInstr27 & EXInstr26)
        | (agez & !EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & EXInstr26 & EXInstr16)
        | (alz & !EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & EXInstr26 & !EXInstr16)
Jtaken = (!EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & EXInstr27)
JRtaken = special & !EXInstr5 & !EXInstr4 & EXInstr3 & !EXInstr2 & !EXInstr1

RD------------------------------------
WE = (!RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & RDInstr27 & RDInstr26)
   | (!RDInstr31 & !RDInstr30 & RDInstr29)
   | (RDInstr31 & !RDInstr30 & !RDInstr29)
   | (special & !(!RDInstr5 & !RDInstr4 & RDInstr3 & !RDInstr2 & !RDInstr1 & !RDInstr0))
   | (!RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & !RDInstr27 & RDInstr26 & RDInstr20)
chooseRTorRDtoWriteto = !RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & !RDInstr27
jal_bltzal_bgezal = !RDInstr31 & !RDInstr30 & !RDInstr29 & !RDInstr28 & RDInstr26
EX------------------------------------
[special = (!EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & !EXInstr26)]
SignExtendImmediate = (!EXInstr31 & !EXInstr30 & !EXInstr29)
                    | (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28)
                    | (EXInstr31)
                    | (!EXInstr31 & !EXInstr30 & !EXInstr29 & !EXInstr28 & !EXInstr27 & EXInstr26)
ImmediateInstruction = (!EXInstr31 & !EXInstr30 & EXInstr29) | (EXInstr31)
lui = EXInstr29
shiftVar = EXInstr2
add = (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28 & !EXInstr27)
    | (EXInstr31)
    | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & !EXInstr2 & !EXInstr1))
and = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & !EXInstr27 & !EXInstr26)
    | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & !EXInstr1 & !EXInstr0))
xor = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & EXInstr27 & !EXInstr26)
    | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & EXInstr1 & !EXInstr0))
or = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & !EXInstr27 & EXInstr26)
   | (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & !EXInstr1 & EXInstr0))
nor = (special & (EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr2 & EXInstr1 & EXInstr0))
sub = special & (EXInstr5 & !EXInstr4 & !EXInstr3 & !EXInstr2 & EXInstr1)
sltu = (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28 & EXInstr27 & EXInstr26)
     | (special & (EXInstr5 & !EXInstr4 & EXInstr3 & !EXInstr2 & EXInstr1 & EXInstr0))
slt = (!EXInstr31 & !EXInstr30 & EXInstr29 & !EXInstr28 & EXInstr27 & !EXInstr26)
    | (special & (EXInstr5 & !EXInstr4 & EXInstr3 & !EXInstr2 & EXInstr1 & !EXInstr0))
sra = special & (!EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr1 & EXInstr0)
srl = special & (!EXInstr5 & !EXInstr4 & !EXInstr3 & EXInstr1 & !EXInstr0)
sll = (!EXInstr31 & !EXInstr30 & EXInstr29 & EXInstr28 & EXInstr27 & EXInstr26)
    | (special & (!EXInstr5 & !EXInstr4 & !EXInstr3 & !EXInstr1 & !EXInstr0))
MEM------------------------------------
DMC0 = (RDInstr31 & !RDInstr30 & RDInstr29 & RDInstr26)
DMC1 = (RDInstr31 & !RDInstr30 & RDInstr29) & !(!RDInstr27 & RDInstr26)
signed = MEMInstr31 & !MEMInstr28
CTRLbh = MEMInstr31 & MEMInstr26
CTRLword = MEMInstr27 & MEMInstr26
chooseLwlorLwr = MEMInstr28

WB------------------------------------
link = (!WBInstr31 & !WBInstr30 & !WBInstr29 & !WBInstr28 & WBInstr27)
     | (!WBInstr31 & !WBInstr30 & !WBInstr29 & !WBInstr28 & !WBInstr27 & WBInstr26)
     | (special & !WBInstr5 & !WBInstr4 & WBInstr3)
WB_EXorMEM = WBInstr31
isLwlLwr = WBInstr31 & !WBInstr30 & !WBInstr29 & WBInstr27 & !WBInstr26
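As a software cross-check of Table 3-2, the Brtaken equation can be evaluated against 32-bit instruction words. This is a Python sketch, not part of control.cast; the helper names bit and opcode_matches are hypothetical, and the instruction words in the example are hand-assembled for illustration.

```python
def bit(instr, n):
    """Bit n of a 32-bit instruction word (the InstrN terms of Table 3-2)."""
    return (instr >> n) & 1

def opcode_matches(instr, pattern):
    """Compare bits 31..26 against a 6-character pattern such as '000100'."""
    return all(bit(instr, 31 - i) == int(ch) for i, ch in enumerate(pattern))

def brtaken(instr, aeb, aneb, alez, agz, agez, alz):
    """Sum-of-products form of Brtaken; each line is one product term."""
    return bool(
        (aeb and opcode_matches(instr, "000100"))        # beq
        or (aneb and opcode_matches(instr, "000101"))    # bne
        or (alez and opcode_matches(instr, "000110"))    # blez
        or (agz and opcode_matches(instr, "000111"))     # bgtz
        or (agez and opcode_matches(instr, "000001") and bit(instr, 16))
        or (alz and opcode_matches(instr, "000001") and not bit(instr, 16))
    )

BEQ = 0x10220004    # beq $1, $2, 4 (hand-assembled example)
BGEZ = 0x04210004   # bgez $1, 4 (hand-assembled example)
taken = brtaken(BEQ, aeb=1, aneb=0, alez=0, agz=0, agez=0, alz=0)
```

Here taken is True, and the same beq word with aeb = 0 gives False.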

Figure 1. Beginning of Boot Code
[Waveform plot: cpu (01.ps), Sat May 4 20:49:07. Traces: Daddr, Dout, DMC, RW, IR, PC, IR1, PC1, IR2, PC2, IR3, PC3, IR4, PC4, CLK, c.r.WE. Time axis: 0.00 to 2500.00 ns.]
Annotation: Boot Code uses addi to write 0 to the registers

Figure 2. End of Boot Code
[Waveform plot: cpu (03.ps), Sat May 4 20:50:13. Traces: Daddr, Dout, DMC, RW, W, IR, PC, IR1, PC1, IR2, PC2, IR3, PC3, IR4, PC4, CLK, c.r.WE. Time axis: 5250.00 to 7750.00 ns.]
Annotations:
lui $29, $0, 0x7fff
ori $29, $29, 0xae50
sw $0, 4($29)
jalr $31, $1
delay slot
JR PC calculation
PC + 8 stored to $31

Figure 3. Trial of Some Arithmetic/Logical Instructions
[Waveform plot: cpu (04.ps), Sat May 4 20:50:22. Traces: Daddr, Dout, DMC, RW, W, IR, PC, IR1, PC1, IR2, PC2, IR3, PC3, IR4, PC4, CLK, c.r.WE. Time axis: 7808.73 to 10411.64 ns.]
Annotations:
sllv $13, $1, $9
lower bits of $9 were 10000, $1 contained 0x00001111
Note: $9 was changed in the previous instruction, but bypassing allowed the correct shamt to be chosen

Figure 4: Test of Bypassing

Instructions under test:

addi $5,$0,0xc
addi $6,$0,0xd
add $6,$5,$6

The add uses a register just written to ($6) and a register written to two instructions before ($5). The result is correct, 0xc + 0xd = 0x19, hence bypassing is operational.

[Waveform (05.ps, Sat May 4 20:50:40). Signals: Daddr, Dout, DMC, RW, W, IR, PC, IR1 to IR4, PC1 to PC4, CLK, c.r.WE. Time axis: 10500.00 ns to 13000.00 ns. The IR trace shows 2005000c, 2006000d, 00a63020 at PC 0x00400158, 0x0040015c, 0x00400160, and the W trace shows 0000000c, 0000000d, 00000019.]
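The bypass check in Figure 4 can be sketched in Python. This is a minimal model, not the authors' forwarder block: the function name `read_operand`, the `in_flight` list standing in for the ALU-stage and WB-stage bypass paths, and the rule that $0 is never forwarded are all assumptions made for illustration.

```python
def read_operand(reg, regfile, in_flight):
    """Return the value of `reg`, preferring results still in the pipeline.

    in_flight holds (dest_reg, value, write_enable) tuples, newest first.
    It is a hypothetical stand-in for the bypass paths of the design.
    """
    for dest, value, write_enable in in_flight:
        # Assumption: register $0 is hardwired to zero and never forwarded.
        if write_enable and dest == reg and reg != 0:
            return value
    return regfile[reg]


regfile = [0] * 32   # nothing has been written back yet
in_flight = []

# addi $5,$0,0xc then addi $6,$0,0xd: both results are still in flight.
in_flight.insert(0, (5, read_operand(0, regfile, in_flight) + 0xC, True))
in_flight.insert(0, (6, read_operand(0, regfile, in_flight) + 0xD, True))

# add $6,$5,$6 reads both operands through the bypass paths.
result = read_operand(5, regfile, in_flight) + read_operand(6, regfile, in_flight)
print(hex(result))
```

Without the bypass lookup, both reads would return the stale register-file value 0, which is the failure the test in Figure 4 is designed to expose.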

Figure 5. Test Load & Store Instructions

lbu $11, 3($16): the address in $16 is 0x11111110, so the effective address is 3 + $16 = 0x11111113. The word containing this address is 0x7913b5ef, so 0xef should be written to $11 in the next pipeline stage.

[Waveform (07.ps, Sat May 4 20:51:03). Signals: Daddr, Dout, DMC, RW, W, IR, PC, IR1 to IR4, PC1 to PC4, CLK, c.r.WE. Time axis: 15617.46 ns to 18220.37 ns. Callouts on the trace: "effective address", "load", "write to $11", "write 0xef", "Instruction in MEM stage" (at IR3), "Instruction in WB stage" (at IR4). The IR4 trace shows the loads 92090001, 920a0002, 920b0003, 960c0000, 960d0002, and the W trace shows 00000013, 000000b5, 000000ef, 00007913, 0000b5ef.]
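The byte selection behind the lbu result can be sketched as follows. This is an illustrative model rather than the MemLoadShifter block itself: the function name `load` and its parameters are assumptions. The big-endian byte order (byte 0 is the most significant) is inferred from the figure, where offset 3 of 0x7913b5ef yields 0xef.

```python
def load(word, addr, size, signed=False):
    """Extract `size` bytes (1 or 2) from a 32-bit word, big-endian order.

    The low two address bits select the byte lane. Byte 0 is the most
    significant byte, matching the values shown in Figure 5.
    """
    offset = addr & 0x3
    if offset % size != 0:
        raise ValueError("misaligned access")
    shift = 8 * (4 - size - offset)
    value = (word >> shift) & ((1 << (8 * size)) - 1)
    if signed and value & (1 << (8 * size - 1)):
        value -= 1 << (8 * size)      # sign-extend for lb/lh
    return value & 0xFFFFFFFF


word = 0x7913B5EF
print(hex(load(word, 0x11111113, 1)))   # lbu $11, 3($16)
print(hex(load(word, 0x11111110, 2)))   # lhu at offset 0
```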

Figure 6. LWL/LWR Test

$6 contains 0x11223344. $16 has address 0x11111110. This address has 0xaabbccdd stored at it. lwl $6, 0($16) is performed at PC = 0x0040021c; the result is 0xaabbccdd, as expected. lwl $6, 1($16) is performed at PC = 0x00400220; the result is 0xbbccdddd, as expected, and so on. lwr $6, 3($16) is performed at PC = 0x0040022c; the result is 0xaabbccdd, as expected. lwr $6, 1($16) is performed at PC = 0x00400230; the result is 0xaabbaabb, as expected. These test cases demonstrate the functioning of lwl and lwr with the necessary bypassing.

[Waveform (08.ps, Sat May 4 20:51:08). Signals: Daddr, Dout, DMC, RW, W, IR, PC, IR1 to IR4, PC1 to PC4, CLK, c.r.WE. Time axis: 18250.00 ns to 20750.00 ns. The W trace shows aabbccdd, bbccdddd, ccdddddd, dddddddd for lwl with offsets 0, 1, 2, 3, then aabbccdd, aabbaabb, aabbaaaa, aaaabbcc for lwr with offsets 3, 1, 0, 2.]
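The lwl/lwr results in Figure 6 follow from merging part of the memory word into the previous register value. The sketch below assumes big-endian byte order, which is consistent with the values in the figure. The function names and signatures are illustrative and are not taken from the LwlLwr block.

```python
MASK32 = 0xFFFFFFFF


def lwl(reg, mem_word, addr):
    """Big-endian lwl: memory bytes offset..3 fill the high end of the register."""
    k = addr & 0x3
    keep = (1 << (8 * k)) - 1           # low k bytes of the register survive
    return ((mem_word << (8 * k)) & MASK32 & ~keep) | (reg & keep)


def lwr(reg, mem_word, addr):
    """Big-endian lwr: memory bytes 0..offset fill the low end of the register."""
    k = addr & 0x3
    shift = 8 * (3 - k)
    keep = MASK32 & ~(MASK32 >> shift)  # high (3 - k) bytes of the register survive
    return (reg & keep) | (mem_word >> shift)


# Replay the sequence from the figure; each load sees the previous result,
# which is what the bypassing has to guarantee in hardware.
mem = 0xAABBCCDD
r6 = 0x11223344
trace = []
for op, offset in [(lwl, 0), (lwl, 1), (lwl, 2), (lwl, 3),
                   (lwr, 3), (lwr, 1), (lwr, 0), (lwr, 2)]:
    r6 = op(r6, mem, 0x11111110 + offset)
    trace.append(r6)
print([hex(v) for v in trace])
```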

Figure 7: Test Jumps

[Waveform (10.ps, Sat May 4 20:51:19). Signals: Daddr, Dout, DMC, RW, W, IR, PC, IR1 to IR4, PC1 to PC4, CLK, c.r.WE. Time axis: 23500.00 ns to 26000.00 ns. Callouts on the trace: "j 4002cc", "jal 4002e4", "jump delay slot". Around the two jumps (IR values 081000b3 and 0c1000b9) the PC trace runs 004002b4, 004002b8, 004002cc, 004002d0, 004002e4, and the W trace includes 004002bc and 004002d4.]
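The jump targets in Figure 7 can be checked against the instruction words. The sketch below uses the standard MIPS rule (upper four bits of the delay-slot address, then the 26-bit field shifted left by two) and a link address of PC + 8. The function names are illustrative, and whether the hardware takes the upper bits from PC or from PC + 4 is an assumption here; the two agree for these addresses.

```python
def jump_target(pc, instr):
    """Target of j/jal: upper 4 bits of the delay-slot address, then instr[25:0] << 2."""
    return ((pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


def link_address(pc):
    """Return address written by jal: the instruction after the delay slot."""
    return (pc + 8) & 0xFFFFFFFF


print(hex(jump_target(0x004002B4, 0x081000B3)))   # j
print(hex(jump_target(0x004002CC, 0x0C1000B9)))   # jal
print(hex(link_address(0x004002CC)))              # value jal writes to $31
```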

Figure 8. Branch Testing

[Waveform (11.ps, Sat May 4 20:51:24). Signals: Daddr, Dout, DMC, RW, W, IR, PC, IR1 to IR4, PC1 to PC4, CLK, c.r.WE. Time axis: 26029.10 ns to 28632.01 ns. Callout on the trace: "bgezal $6, 400380". Around this branch (IR value 04d10005) the PC trace runs 00400364, 00400368, 0040036c, 00400380, and the W trace includes 00400370.]
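The branch target in Figure 8 follows the usual MIPS rule: the sign-extended 16-bit immediate, shifted left by two, is added to the address of the delay slot. The sketch below is a reference calculation with illustrative function names, not a model of the Bpccalc adder, which works on 30-bit word addresses.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a two's complement integer."""
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, instr):
    """Branch target: delay-slot address plus the word offset times four."""
    return (pc + 4 + (sign_extend16(instr & 0xFFFF) << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x00400368, 0x04D10005)))   # bgezal $6, 400380
print(hex(branch_target(0x00400330, 0x10840005)))   # beq from the same trace
```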

Figure 9. Branch not taken Test

bne $19, $19, 4004ec: $19 = $19, so the branch is not taken.

[Waveform (15.ps, Sat May 4 20:53:48). Signals: Daddr, Dout, DMC, RW, W, IR, PC, IR1 to IR4, PC1 to PC4, CLK, c.r.WE. Time axis: 35750.00 ns to 38250.00 ns. The PC trace advances sequentially from 004004a0 to 004004e0 with no redirect; the IR trace includes 02739820 followed by 16730005 (the bne).]
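The taken/not-taken decision can be sketched from the opcode and the two comparison results. The decoding below uses the standard MIPS opcode and rt encodings. The function `branch_taken` is illustrative and is not the control block of the design, which works from the zcomp and bcomp flags (agz, agez, alz, alez, aeb, aneb).

```python
def to_signed(x):
    """Interpret a 32-bit value as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def branch_taken(instr, a, b):
    """Return True if the conditional branch `instr` is taken for operands a, b."""
    opcode = instr >> 26
    rt = (instr >> 16) & 0x1F
    sa = to_signed(a)
    if opcode == 0x04:                 # beq
        return a == b
    if opcode == 0x05:                 # bne
        return a != b
    if opcode == 0x06:                 # blez
        return sa <= 0
    if opcode == 0x07:                 # bgtz
        return sa > 0
    if opcode == 0x01:                 # regimm group
        if rt in (0x00, 0x10):         # bltz, bltzal
            return sa < 0
        if rt in (0x01, 0x11):         # bgez, bgezal
            return sa >= 0
    return False


# bne $19, $19 compares a register with itself, so it can never be taken.
print(branch_taken(0x16730005, 0x26, 0x26))
```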

Schematic: Pipelined MIPS CPU
Cornell University - ECECS 314. Designers: Alan Leung & Lav Varshney. Number 1.00, Rev A. Sat May 04 16:44:03 2002.

NOTE: All Registers have Synchronous Clear tied to Reset

Blocks and port names recoverable from the sheet:

IF2: CLK, Brcalc[29..0], Brtaken, Jcalc[29..0], Jtaken, JRcalc[29..0], JRtaken, Iin[31..0], Iaddr[31..0], PC[29..0], Iout[31..0]
imem: Iaddr[31..0], IMC[1..0], Iin[31..0]
regfile: CLK, RA[4..0], RB[4..0], RW[4..0], WE, W[31..0], A[31..0], B[31..0]
RD: Instr[31..0], chooseRTorRDtowriteto, jal, RWaddress[4..0]
EX: PC[29..0], Instr[31..0], RDdataoutA[31..0], RDdataoutB[31..0], ALUdataout[31..0], WBdataout[31..0], ALUbypass, ALUbypassB, WBbypass, WBbypassB, SignExtendImmediate, ImmediateInstruction, lui, shiftVar, ALUctrl[10..0], JRPCcalc[29..0], JPCcalc[29..0], BrPCcalc[29..0], MEMdatain[31..0], agz, agez, alz, alez, aeb, aneb
MEM: PC[29..0], EXdatain[31..0], daddr[31..0], din[31..0], dout[31..0], signed, CTRLbh, CTRLword, MEMWbbypass, chooseLwlorLwr, WBdataout[31..0], toWB[31..0], LwlLwrcalc[31..0], PCplusEight[31..0]
WB: A[31..0], M[31..0], PCplus8[31..0], LwlLwrcalc[31..0], isLwlLwr, WB_EXorMEM, link, WBdataout[31..0]
dmem: daddr[31..0], din[31..0], dout[31..0], DMC[1..0]
forwarder (three instances): RS[4..0], RT[4..0], RW[4..0], WE, bypass, bypassB
control: InstrRD[31..0], InstrEX[31..0], InstrMEM[31..0], InstrWB[31..0], aneb, aeb, alez, alz, agez, agz, Brtaken, Jtaken, JRtaken, WE, DMC[1..0], signed, CTRLbh, CTRLword, chooseRTorRDtoWriteto, jal_bltzal_bgezal, link, WB_EXorMEM, chooseLwlorLwr, isLwlLwr, SignExtendImmediate, ImmediateInstruction, lui, shiftVar, ALUc[10..0]

Pipeline registers (DFF): PC1 to PC4 (pcff, data[29..0], q[29..0]); IR1 to IR4 (ff, data[31..0], q[31..0]); RW1 to RW3 (5ff, data[4..0], q[4..0]); WE1 to WE3 (1ff); further ff instances carrying 32-bit data between stages.

Schematic: Instruction Fetch for CPU (Number 1.00, Rev A, Sat May 04 14:39:14 2002)

NOTE: PC has Synchronous Clear tied to Reset

Signals marked INPUT: CLK, Brcalc[29..0], Brtaken, Jcalc[29..0], Jtaken, JRcalc[29..0], JRtaken, Iin[31..0]
Signals marked OUTPUT: Iaddr[31..0], PC[29..0], Iout[31..0]
Components: pc (DFF, data[29..0], q[29..0]); incrementer (A+B adder with cin, WIDTH 30); three BUSMUX instances (inst22, inst23, inst24, WIDTH 30) whose select signals on the sheet are Brtaken, Jtaken, and JRtaken; WIRE elements for Iaddr[31..2] and Iaddr[1..0]; a NOT gate, VCC, and GND.
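The next-PC selection implied by the three multiplexers can be sketched as below. The PC is modelled as a 30-bit word address, as on the sheet. The order in which the three selects are applied, and the function names, are assumptions; the extraction does not preserve how the multiplexers are chained, and the sketch presumes that at most one of the three selects is asserted in any cycle.

```python
WORD_MASK = 0x3FFFFFFF   # PC[29..0] is a word address


def next_pc(pc, br_taken, br_calc, j_taken, j_calc, jr_taken, jr_calc):
    """Pick the next 30-bit PC from PC + 1 or one of the three redirect inputs."""
    nxt = (pc + 1) & WORD_MASK
    if br_taken:
        nxt = br_calc
    if j_taken:          # assumed order; only one select is expected at a time
        nxt = j_calc
    if jr_taken:
        nxt = jr_calc
    return nxt & WORD_MASK


def iaddr(pc):
    """Byte address sent to imem: PC in bits 31..2, zeros in bits 1..0."""
    return pc << 2


pc = 0x004002B8 >> 2                      # delay slot of the j in Figure 7
pc = next_pc(pc, False, 0, True, 0x004002CC >> 2, False, 0)
print(hex(iaddr(pc)))
```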

Schematic: Register/Operand Fetch Stage for CPU (Number 1.00, Rev A, Sat May 04 14:34:28 2002)

Signals marked INPUT: Instr[31..0], chooseRTorRDtowriteto, jal
Signal marked OUTPUT: RWaddress[4..0]
Components: two BUSMUX instances (inst31, inst32, WIDTH 5). The first takes Instr[20..16] (RT) and Instr[15..11] (RD) and is selected by chooseRTorRDtowriteto; the second is selected by jal and drives RWaddress[4..0]. VCC appears next to the data inputs of the second multiplexer.
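The destination-register choice made by the two multiplexers can be sketched as follows. The select polarity (a true `choose_rd` picks the RD field) and the assumption that jal forces register 31 are not recoverable from the extraction and are labelled as assumptions here; the function name is illustrative.

```python
def write_register(instr, choose_rd, jal):
    """Pick the register-file write address for an instruction word.

    Assumptions: choose_rd=True selects Instr[15..11] (RD), otherwise
    Instr[20..16] (RT); jal=True overrides both with register 31.
    """
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    dest = rd if choose_rd else rt
    return 31 if jal else dest


print(write_register(0x2005000C, False, False))   # addi $5,$0,0xc
print(write_register(0x00A63020, True, False))    # add $6,$5,$6
print(write_register(0x0C1000B9, False, True))    # jal
```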

Schematic: Execution Stage for CPU (Number 1.00, Rev A, Sat May 04 15:02:31 2002)

Signals marked INPUT: PC[29..0], Instr[31..0], RDdataoutA[31..0], RDdataoutB[31..0], WBdataout[31..0], ALUbypass, ALUbypassB, WBbypass, WBbypassB, SignExtendImmediate, ImmediateInstruction, lui, shiftVar, ALUctrl[10..0]
Signals marked OUTPUT: BrPCcalc[29..0], JRPCcalc[29..0], JPCcalc[29..0], ALUdataout[31..0], MEMdatain[31..0], agz, agez, alz, alez, aeb, aneb. ALUdataout[31..0] also appears at the bypass multiplexers.
Components:
Bpccalc: A+B adder with cin, taking PC[29..0] and SextImm[29..0], producing BrPCcalc[29..0]
zcomp: signed compare of dataa[31..0] against datab[] = 0 (agb, ageb, alb, aleb)
bcomp: compare of dataa[31..0] and datab[31..0] (aeb, aneb)
alu: ALUc[10..0], shamt[4..0], A[31..0], B[31..0], result[31..0]
sext2: imm[15..0] from Instr[15..0], control from SignExtendImmediate, out[31..0]
BUSMUX, WIDTH 5: Instr[10..6] or RDdataout[4..0], selected by shiftVar; a second 5-bit multiplexer with the constant 16 (10000), selected by lui
BUSMUX, WIDTH 32: bypass selection for operand A (ALUbypass, WBbypass) and operand B (ALUbypassB, WBbypassB), and immediate selection (ImmediateInstruction)
WIRE: JPCcalc[29..26] from PC[29..26], JPCcalc[25..0] from Instr[25..0], and A[31..2] for JRPCcalc

Schematic: Memory Stage for CPU (Number 1.00, Rev A, Sat May 04 15:37:59 2002)

Signals marked INPUT: daddr[31..0], EXdatain[31..0], din[31..0], signed, CTRLbh, CTRLword, WBdataout[31..0], MEMWbbypass, chooseLwlorLwr, PC[29..0]
Signals marked OUTPUT: dout[31..0], toWB[31..0], LwlLwrcalc[31..0], PCplusEight[31..0]
Components:
MemLoadShifter: din[31..0], signed, CTRLbh, CTRLword, select[1..0] (daddr[1..0] appears beside it), toWB[31..0]
LwlLwr: r[31..0], b[31..0], off[1..0] (daddr[1..0] appears beside it), chooseLwlorLwr, out[31..0]
BUSMUX (inst4, WIDTH 32): WBdataout[31..0], with MEMWbbypass among the nearby control inputs
PCplus8: A+B adder with the constant 8; a[31..2] and a[1..0] form a[31..0], with PC[29..0] and GND as the nearby sources; result is PCplusEight[31..0]

Schematic: Writeback Stage for CPU (Number 1.00, Rev A, Sat May 04 16:46:07 2002)

Signals marked INPUT: A[31..0], M[31..0], LwlLwrcalc[31..0], PCplus8[31..0], isLwlLwr, WB_EXorMEM, link
Signal marked OUTPUT: WBdataout[31..0]
Components: three BUSMUX instances (inst7, inst8, inst9, WIDTH 32); the select signals on the sheet are isLwlLwr, WB_EXorMEM, and link.
