Jehan-François Pâris
jparis@uh.edu
Chapter Organization
• Logic design conventions
• Implementation of a "toy" CPU
• Pipelining
• Pipelining hazards
– Data hazards IMPORTANT
– Control hazards
• Exceptions
• Parallelism
LOGIC DESIGN CONVENTIONS
Combinational/state elements
• Combinational elements:
– Outputs only depend on current inputs
– Stateless
• Adders and, more generally, arithmetic
logic unit (ALU)
Combinational/state elements
• State elements:
– Have a memory holding a state
– Output depends on current inputs and state of
element
– State reflects past inputs
• Flip-flops, …
Judicial analogy
• In our legal system
– Guilty/not guilty decision is stateless
• Good reasons
– Sentencing decision is not
• "Three strikes and you are out" laws
• Good reasons
Clocking methodology
• We will assume an edge-triggered clocking
technology
– Edge is short enough to prevent data
propagation in state elements
– Can read current state of a memory element
at the same time we update it
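The edge-triggered convention can be illustrated with a minimal sketch. The class name `Register` and its fields `q` and `d` are illustrative assumptions, not part of the slides.

```python
class Register:
    """Edge-triggered state element: output changes only on clock_edge()."""

    def __init__(self, value: int = 0) -> None:
        self.q = value   # current state, visible on the output
        self.d = value   # value waiting on the input

    def clock_edge(self) -> None:
        """Active clock edge: the waiting input is captured into the state."""
        self.q = self.d


pc = Register(0)
pc.d = pc.q + 4      # read the current state while preparing the update
before = pc.q        # still the old value until the edge arrives
pc.clock_edge()
after = pc.q         # the new value is visible only after the edge
```

Reading `pc.q` and assigning `pc.d` in the same cycle is safe because nothing changes until `clock_edge()` is called.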
Clocking convention
• Omit write control signal if state element is
updated at every active clock edge
A "TOY" CPU
Motivation
• "Toy" CPU will implement a subset of MIPS
instruction set
• Subset will be
– Self-sufficient
– Simpler to implement
– Complex enough to allow a serious discussion
of CPU architecture
The subset
• Will include
– Load and store instructions:
lw (load word) and sw (store word)
– Arithmetic-logic instructions:
add, sub, and, or and slt (set less than)
– Branch instructions:
beq (branch if equal) and j (jump)
Load and store instructions
• Format I
• Three operands:
– Two registers $r1 and $r2
– One displacement d
• lw $r1, d($r2) loads into register $r1 main
memory word at address contents($r2) + d
• sw $r1, d($r2) stores contents of register $r1 into
main memory word at address contents($r2) + d
Arithmetic-logic instructions
• Format R
• Three operands:
– Three registers $r1, $r2 and $r3
• Store into register $r1 result of $r2 <op> $r3
where <op> can be add, subtract, and, or
as well as set if less than
Branch instruction
• Format I
• Three operands:
– Two registers $r1 and $r2
– One displacement d
• beq $r1, $r2, d
set value of PC to PC+4 + 4×d
iff $r1 = $r2
The simplest data path
• Assume CPU will do nothing but
– Increment its program counter and
– Deliver the next instruction
The simplest data path
[Figure: the PC feeds the Read address input of the instruction memory, which outputs the instruction; an adder adds 4 to the PC]
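A minimal sketch of this fetch-only data path follows. The function name `fetch` and the memory contents are hypothetical, chosen only for illustration.

```python
def fetch(pc: int, instruction_memory: dict[int, int]) -> tuple[int, int]:
    """Return (instruction at pc, next pc) for the fetch-only data path."""
    instruction = instruction_memory[pc]   # "Read address" input is the PC
    return instruction, pc + 4             # the adder always adds 4


# Hypothetical memory contents: three 32-bit words at consecutive addresses.
memory = {0: 0x8C010000, 4: 0x00221820, 8: 0xAC030004}

pc = 0
fetched = []
for _ in range(3):
    word, pc = fetch(pc, memory)
    fetched.append(word)
```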
Implementing R2R instructions
• The ALU takes two 32-bit inputs
• Returns
– A 32-bit output
– A 1-bit signal if the result is zero
The register file
• Two read outputs that are always available
• One write input activated by a RegWrite signal
RegWrite:
enables register writes
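The register file can be sketched as below. The class and method names are assumptions. Keeping register 0 at zero follows the usual MIPS convention and is not stated on the slide.

```python
class RegisterFile:
    """32 registers, two read outputs, one write input gated by RegWrite."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, reg_a: int, reg_b: int) -> tuple[int, int]:
        """Both read outputs are always available."""
        return self.regs[reg_a], self.regs[reg_b]

    def write(self, reg: int, value: int, reg_write: bool) -> None:
        """Write only when RegWrite is asserted (assumption: $0 stays zero)."""
        if reg_write and reg != 0:
            self.regs[reg] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write(5, 42, reg_write=True)    # RegWrite asserted: register 5 is updated
rf.write(6, 7, reg_write=False)    # RegWrite deasserted: register 6 unchanged
```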
Implementing R2R instructions
[Figure: the register file feeds the ALU; the ALU result is written back to the register file, with RegWrite enabled]
Implementing load and store
• Require
– An address calculation:
• contents($r2) + d
– An access to data memory
• Before doing the address calculation, we must
transform 16-bit displacement d into a 32-bit
value using sign extension
The data memory
• One address selector
• One write data input
• One read data output
• Two controls
– MemWrite
– MemRead
Sign extension (I)
• If 16-bit number has a zero as MSB
– It is positive
– Must add 16 zero bits
0110 1010 1010 0100
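Sign extension can be sketched as follows. The function name `sign_extend16` is an assumption. The negative case (MSB equal to one, upper half filled with ones) is the standard two's-complement rule.

```python
def sign_extend16(value: int) -> int:
    """Extend a 16-bit field to 32 bits by replicating its MSB."""
    value &= 0xFFFF
    if value & 0x8000:               # MSB is one: the number is negative
        return value | 0xFFFF0000    # fill the upper 16 bits with ones
    return value                     # MSB is zero: upper 16 bits stay zero


positive = sign_extend16(0b0110_1010_1010_0100)   # the example pattern above
negative = sign_extend16(0xFFFC)                  # 16-bit encoding of -4
```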
[Figure: data path in which a sign-extension unit (SE) supplies the sign-extended d field]
Implementing the load instruction
[Figure: load data path; the d field passes through the sign-extension unit (SE)]
Implementing conditional branch
• Target Address:
– Sign-extend 16-bit immediate part of
instruction
– Shift left 2
– Add to PC
• Branch Control Logic:
– Perform test operation on two registers
– Check result
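The two halves of the conditional branch can be sketched as below. All function names are illustrative assumptions. The test is written as a subtraction followed by a zero check, which is how the toy ALU evaluates beq.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(field: int) -> int:
    """Replicate bit 15 of the immediate field into the upper 16 bits."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if (field & 0x8000) else field


def branch_target(pc_plus_4: int, d_field: int) -> int:
    """Sign-extend the immediate, shift left 2, add to PC+4."""
    offset = (sign_extend16(d_field) << 2) & MASK32
    return (pc_plus_4 + offset) & MASK32


def branch_taken(reg_a: int, reg_b: int) -> bool:
    """beq test: subtract the two registers and check for a zero result."""
    return ((reg_a - reg_b) & MASK32) == 0
```

For example, with PC+4 = 0x1004, a d field of 3 gives target 0x1010, and a d field of 0xFFFF (that is, -1) gives target 0x1000.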
Implementing conditional branch
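[Figure: conditional branch data path. The d field of the instruction is sign-extended (SE), shifted left 2 and added to PC+4 to form the branch destination; the register file feeds the ALU, whose result goes to the branch control logic]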
Note
• Arithmetic-logic operations only use
– Register file and ALU
• Load and store use
– ALU for computing memory address
– Data memory
Implementing other instructions
Combining everything
Left to be done
• All control signals:
– Two multiplexers: ALUSrc and MemtoReg
– RegWrite, MemRead and MemWrite switches
– ALU controls (4 bits)
ALU control signals
ALU control lines Function
0000 and
0001 or
0010 add
0110 subtract
0111 set on less than
1100 nor (not in "toy" subset)
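The table can be read as a dispatch on the four control lines. The following is a minimal sketch. The names `alu` and `to_signed` are assumptions, and slt is assumed to compare signed 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple[int, bool]:
    """Return (32-bit result, Zero flag) for the given ALU control lines."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:
        result = a & b                                   # and
    elif control == 0b0001:
        result = a | b                                   # or
    elif control == 0b0010:
        result = a + b                                   # add
    elif control == 0b0110:
        result = a - b                                   # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0  # set on less than
    elif control == 0b1100:
        result = ~(a | b)                                # nor
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    result &= MASK32
    return result, result == 0
```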
Controlling the ALU
• Recall that all R-format instructions have
same opcode
– Operation performed by ALU is specified in the
function field (bits <0:5>)
Controlling the ALU
• ALU control inputs generated by two-step process
– Construct two ALUOp control bits from
opcode
– Construct four ALU control bits using
• Two ALUop bits
• Six bits from function field when they are
needed
Dependence table
Opcode   ALUOp   Operation   Function   Action     ALU Ctl
lw       00      lw          -          add        0010
sw       00      sw          -          add        0010
beq      01      beq         -          subtract   0110
R-type   10      add         100000     add        0010
R-type   10      subtract    100010     subtract   0110
R-type   10      and         100100     and        0000
R-type   10      or          100101     or         0001
R-type   10      slt         101010     slt        0111
Notes
• Two-step process simplifies combinational logic
• Many don't care conditions in truth table
Truth table
ALUOp1   ALUOp0   F5   F4   F3   F2   F1   F0   ALU control bits
0        0        X    X    X    X    X    X    0010
0        1        X    X    X    X    X    X    0110
1        0        X    X    0    0    0    0    0010
1        X        X    X    0    0    1    0    0110
1        0        X    X    0    1    0    0    0000
1        0        X    X    0    1    0    1    0001
1        X        X    X    1    0    1    0    0111
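The two-step process can be sketched as a small lookup. The names `FUNCT_TO_ALU` and `alu_control` are assumptions. Only the low four function bits are consulted, since F5 and F4 are don't-cares in the truth table.

```python
# Low four bits of the function field (F3..F0) -> ALU control lines.
FUNCT_TO_ALU = {
    0b0000: 0b0010,   # add
    0b0010: 0b0110,   # subtract
    0b0100: 0b0000,   # and
    0b0101: 0b0001,   # or
    0b1010: 0b0111,   # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the four ALU control bits from ALUOp and the function field."""
    if alu_op == 0b00:                        # lw and sw: address addition
        return 0b0010
    if alu_op == 0b01:                        # beq: subtraction for the test
        return 0b0110
    if alu_op == 0b10:                        # R-type: look at function field
        return FUNCT_TO_ALU[funct & 0b1111]   # F5 and F4 are ignored
    raise ValueError(f"unused ALUOp value {alu_op:02b}")
```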
Note
• Bits 4 and 5 of function field are not used
• ALUOp bits only have three possible values:
00, 01 and 10
– Introduces don't care conditions
• All R instructions use same data paths
– Other control bits depend only on opcode
Control signal effects
Signal     When deasserted                   When asserted
RegDest    Destination register comes from   Destination register comes from
           rt field (bits 20:16)             rd field (bits 15:11)
RegWrite   None                              Enables write into destination register
ALUSrc     Second ALU operand comes from     Second ALU operand comes from
           second register output            sign-extended displacement (bits 15:0)
Control signal effects
Signal     When deasserted                   When asserted
PCSrc      PC is incremented by 4            PC set to branch target value
MemRead    None                              Enables memory read output
MemWrite   None                              Enables memory write
MemtoReg   Value fed to destination          Value fed to destination
           register comes from ALU           register comes from memory
Note
• PCSrc is asserted when
– Instruction is a branch
and
– ALU Zero result bit is asserted
• We will introduce a Branch control line
Control line settings
Instruction   RegDest   ALUSrc   MemtoReg   RegWrite
R-format      1         0        0          1
lw            0         1        1          1
sw            X         1        X          0
beq           X         0        X          0
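The settings above, together with the PCSrc rule, can be sketched as a lookup table. The dictionary layout, the key spellings and the use of `None` for don't-care entries are illustrative assumptions.

```python
# None stands for a don't-care (X) entry in the table.
CONTROL = {
    "R-format": {"RegDest": 1,    "ALUSrc": 0, "MemtoReg": 0,    "RegWrite": 1},
    "lw":       {"RegDest": 0,    "ALUSrc": 1, "MemtoReg": 1,    "RegWrite": 1},
    "sw":       {"RegDest": None, "ALUSrc": 1, "MemtoReg": None, "RegWrite": 0},
    "beq":      {"RegDest": None, "ALUSrc": 0, "MemtoReg": None, "RegWrite": 0},
}


def pc_src(branch: bool, zero: bool) -> bool:
    """PCSrc is asserted only for a branch whose ALU Zero bit is set."""
    return branch and zero
```

The don't-care entries for sw and beq are safe because neither instruction writes a register.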
Control line settings
A bad MIPS instruction (IV)
• Adding this instruction would be a very bad idea
– Why?
Answer
• Instruction would require two steps using the
ALU
– Adding r2 and r3 to compute the address of
the memory operand (step 4)
– Adding the memory operand to r1
• New step would introduce a structural hazard by
preventing any other instruction from accessing the
ALU
My comment
• Careful design of the MIPS CPU and instruction
set should be noted
– Not true for older instruction sets
• IBM 360, DEC VAX, …
– Not true for x86 instruction sets
• CPU is designed to be compatible with an
existing instruction set
Designing instruction sets
for pipelining (I)
Data hazards (IV)
• We lose two cycles during which nothing can be
done
• Cannot trust compiler to remove all data hazards
• Observe that the new value of $s0 becomes
available at the end of step 3 of the add instruction
– Add special circuitry to provide this value at
the end of step 2 of sub instruction
• Forwarding or bypassing
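Deciding which operands need a forwarded value can be sketched as follows. The `Instr` type, the function name and every register other than $s0 are hypothetical, chosen to mirror the add/sub pair discussed above.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str


def forwarded_operands(producer: Instr, consumer: Instr) -> list[str]:
    """Name the consumer operands that read the producer's destination."""
    operands = (("src1", consumer.src1), ("src2", consumer.src2))
    return [name for name, reg in operands if reg == producer.dest]


add = Instr("add", "$s0", "$t0", "$t1")   # produces $s0
sub = Instr("sub", "$t2", "$s0", "$t3")   # consumes $s0 right away
needs = forwarded_operands(add, sub)
```

Here `needs` names the first source operand of sub, which must take the ALU result of add directly instead of waiting for the register write.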
After forwarding
Detail of steps
Cycle 1 2 3 4 5 6
add IF ID/RR ALU RW
sub IF ID/RR ALU RW
Detail of steps
Cycle 1 2 3 4 5 6
lw IF ID/RR ALU MEM RW
sub IF ID/RR stall ALU RW
Pipelined datapath
Datapaths for pipelined organization
• Define five steps
1. Fetch instruction from memory (IF)
2. Instruction decode and register reads (ID)
3. Execute arithmetic-logic operation on ALU (EX)
4. Access operand in memory (MEM)
5. Write back results into a register (WB)
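The overlap of the five steps can be sketched by computing which step each instruction occupies in each cycle. The function name and the sample instruction list are assumptions, and the sketch ignores hazards: one instruction enters the pipeline per cycle.

```python
STEPS = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(instructions: list[str]) -> list[tuple[str, dict[int, str]]]:
    """Pair each instruction with a {cycle: step} map, ignoring hazards."""
    schedule = []
    for index, name in enumerate(instructions):
        # Instruction number `index` (from 0) starts its IF in cycle index + 1.
        cycles = {index + 1 + offset: step for offset, step in enumerate(STEPS)}
        schedule.append((name, cycles))
    return schedule


schedule = pipeline_schedule(["lw", "add", "sw"])
last_cycle = max(max(cycles) for _, cycles in schedule)
```

Three instructions finish in seven cycles instead of the fifteen that a non-overlapped execution of five steps each would need.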
Datapaths for pipelined organization
• Insert registers to save outputs of each step
before they get updated by the next step
1. IF/ID registers
2. ID/EX registers
3. EX/MEM registers
4. MEM/WB registers
A first try
[Figure: data path with the four new pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB)]
Comments
• This first try is not correct
– Load instruction will not be implemented
correctly
• Address of destination register will be lost
as soon as a new instruction is fetched
• Must save it at each step
The almost correct datapaths
[Figure: the destination register address follows the instruction through the pipeline registers]
The almost correct datapaths
More problems
• Address of destination register is not always at
the same place in all instructions
– Could be instruction bits (20-16)
• For all I-format instructions that write into a
register
– Could be instruction bits (15-11)
• In R format instructions
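Selecting the destination register from the two possible fields can be sketched as below. The function name and the two sample encodings (`add $3, $1, $2` and `lw $1, 0($0)`) are illustrative assumptions.

```python
def destination_register(instruction: int, reg_dest: int) -> int:
    """Pick the destination register number from a 32-bit instruction."""
    rt = (instruction >> 16) & 0x1F   # bits 20-16: I-format destination
    rd = (instruction >> 11) & 0x1F   # bits 15-11: R-format destination
    return rd if reg_dest else rt


ADD_3_1_2 = 0x00221820   # add $3, $1, $2 (R format)
LW_1_0_0 = 0x8C010000    # lw $1, 0($0)   (I format)

r_dest = destination_register(ADD_3_1_2, reg_dest=1)
i_dest = destination_register(LW_1_0_0, reg_dest=0)
```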
Why?
• In R format instructions
• In I format instructions
or    IF   ID+Reg   EX       MEM
add        IF       ID+Reg   EX
sw                  IF       ID+Reg
Adding a forwarding unit
More data hazards
• We can forward the results of sub instruction at
the end of its EX step
– In time for all four following instructions
• To do that we need a special forwarding unit
• Not all data hazards can be avoided
– lw followed by any instruction accessing the
loaded word
Why?
• lw loads word from RAM into a register
– Goes through IF, ID+Reg, EX, MEM and
WB steps
– Register value is updated at the end of WB
step
• Must delay any following instruction that wants to
access the contents of the register
Data hazard detection unit
• Detects hazards that cannot be avoided
• Inserts no operation instructions (nop)
– They do nothing!
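The behavior of the hazard detection unit can be sketched in software as a pass that inserts a nop after any lw whose result is used by the next instruction. The `Instr` type, the function name and the sample program are hypothetical. A single nop is assumed to be enough, on the premise that forwarding covers the remaining distance.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    sources: tuple[str, ...]


NOP = Instr("nop", None, ())


def insert_load_use_nops(program: list[Instr]) -> list[Instr]:
    """Insert one nop between a lw and an immediate user of its result."""
    out: list[Instr] = []
    for instr in program:
        prev = out[-1] if out else None
        if prev is not None and prev.op == "lw" and prev.dest in instr.sources:
            out.append(NOP)   # unavoidable hazard: delay the dependent instruction
        out.append(instr)
    return out


program = [
    Instr("lw", "$s0", ("$t0",)),
    Instr("sub", "$t2", ("$s0", "$t3")),   # uses the loaded word at once
    Instr("or", "$t4", ("$t2", "$t5")),    # handled by forwarding, no nop
]
ops = [i.op for i in insert_load_use_nops(program)]
```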
More about control hazards
• Outcome of conditional branch is not known until
end of step EX
– beq and bne use arithmetic unit to evaluate
the branch condition
– If branch is taken, we must abort the two
following instructions
• Easy because they have not yet updated
anything
More about control hazards
beq IF ID+Reg EX MEM WB
next IF ABORT
dest IF ID+Reg EX
More about control hazards
beq IF ID+Reg EX MEM WB
next IF ABORT
nop
nop