
The Processor: Datapath & Control

We're ready to look at an implementation of the MIPS


Simplified to contain only:
memory-reference instructions: lw, sw
arithmetic-logical instructions: add, sub, and, or, slt
control flow instructions: beq, j
Generic Implementation:
use the program counter (PC) to supply instruction address
get the instruction from memory
read registers
use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers
Why? memory-reference? arithmetic? control flow?

2004 Morgan Kaufmann Publishers 1


More Implementation Details

Abstract / Simplified View:

Two types of functional units:


elements that operate on data values (combinational)
elements that contain state (sequential)



Figure 5.2 The basic implementation of the MIPS subset
including the necessary multiplexers and control lines.



5.3 Building a Datapath



Keywords

Datapath element: A functional unit used to operate on or hold data within a processor. In the MIPS implementation the datapath elements include the instruction and data memories, the register file, the arithmetic logic unit (ALU), and adders.

Program counter (PC): The register containing the address of the instruction in the program being executed.

Register file: A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.

Sign-extend: To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
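
As a rough illustration of the sign-extend definition, the following Python sketch widens a 16-bit field to 32 bits. The function name and the default widths are assumptions made for this example, not part of the slides.

```python
def sign_extend(value: int, from_bits: int = 16, to_bits: int = 32) -> int:
    """Widen a from_bits-wide field to to_bits by replicating its sign bit."""
    value &= (1 << from_bits) - 1           # keep only the original field
    sign = (value >> (from_bits - 1)) & 1   # high-order bit of the original item
    if sign:
        # Set every bit above the original field to 1.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


print(hex(sign_extend(0x0004)))  # positive field: upper bits stay 0
print(hex(sign_extend(0xFFFC)))  # negative field: upper bits become 1
```

Returning an unsigned 32-bit pattern rather than a negative Python integer keeps the result close to what the hardware wires would carry.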



Keywords

Branch target address: The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the MIPS architecture the branch target is given by the sum of the offset field of the instruction and the address of the instruction following the branch.

Branch taken: A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches.

Branch not taken: A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

Delayed branch: A type of branch where the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false.
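
The branch definitions can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the 16-bit offset field is a signed word offset that is shifted left 2 bits before being added, as in the Step 2 RTL later in these slides. Delayed branches are not modelled.

```python
def branch_target(branch_address: int, offset_field: int) -> int:
    """Address following the branch plus the sign-extended, shifted offset."""
    offset = offset_field - 0x10000 if offset_field & 0x8000 else offset_field
    return (branch_address + 4 + (offset << 2)) & 0xFFFFFFFF


def next_pc(branch_address: int, offset_field: int, condition: bool) -> int:
    """Branch taken: PC becomes the target. Not taken: PC becomes PC + 4."""
    if condition:
        return branch_target(branch_address, offset_field)
    return (branch_address + 4) & 0xFFFFFFFF


print(hex(next_pc(0x1000, 3, True)))   # taken, forward by 3 words
print(hex(next_pc(0x1000, 3, False)))  # not taken, falls through
```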



Register File

Built using D flip-flops

[Figure: register file read ports. Read register number 1 and Read register number 2 each select, through a multiplexor over Register 0 ... Register n-1, the values driven onto Read data 1 and Read data 2. The block symbol shows inputs Read register number 1, Read register number 2, Write register, Write data, and Write, and outputs Read data 1 and Read data 2.]

Do you understand? What is the Mux above?



Register File

Note: we still use the real clock to determine when to write


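[Figure: register file write port. The register number feeds an n-to-2^n decoder with outputs 0 ... n-1. Together with the Write signal, the decoder outputs drive the clock input C of Register 0 ... Register n-1, and Register data drives the D input of every register.]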
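
A behavioural Python sketch of the two register-file slides may help: each read port acts as a multiplexor selected by a register number, and a write happens only on a clock edge, for the one register picked by the decoder while Write is asserted. The class and method names are assumptions for illustration, and the hardwired zero value of MIPS register 0 is deliberately not modelled.

```python
class RegisterFile:
    """Two read ports, one write port; registers hold 32-bit values."""

    def __init__(self, n_regs: int = 32):
        self.regs = [0] * n_regs

    def read(self, read_reg1: int, read_reg2: int):
        # Each read port is a multiplexor: the register number is the select.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write: bool, write_reg: int, write_data: int):
        # The decoder raises exactly one line; a register loads its D input
        # only if its decoded line and the Write signal are both asserted.
        for number in range(len(self.regs)):
            decoded = (number == write_reg)
            if write and decoded:
                self.regs[number] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write=True, write_reg=9, write_data=42)
rf.clock_edge(write=False, write_reg=9, write_data=7)  # ignored: Write is low
print(rf.read(9, 0))
```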



Simple Implementation

Include the functional units we need for each instruction

[Figure: a. Instruction memory (Instruction address in, Instruction out)   b. Program counter (PC)   c. Adder (Add, Sum out)]



[Figure: a. Data memory unit (Address and Write data inputs, Read data output, MemWrite and MemRead controls)   b. Sign-extension unit (16-bit input, 32-bit output)]



[Figure: a. Registers (three 5-bit register numbers: Read register 1, Read register 2, Write register; Write data input; Read data 1 and Read data 2 outputs; RegWrite control)   b. ALU (4-bit ALU operation control; Zero and ALU result outputs)]

Why do we need this stuff?


Figure 5.10 The datapath for the memory instructions and
the R-type instructions.



Building the Datapath
Use multiplexors to stitch them together
[Figure: the combined datapath. PC, instruction memory, registers, ALU, data memory, sign-extend unit (16 to 32 bits), shift left 2, and two adders (PC + 4 and branch target) are joined by multiplexors. Control lines: PCSrc, ALUSrc, ALU operation (4 bits), MemWrite, MemRead, MemtoReg, RegWrite.]



5.4 A Simple Implementation Scheme



Figure B.5.9 A 1-bit ALU that performs AND, OR, and
addition on a and b or NOT a and NOT b.



FIGURE B.5.10 (Top) A 1-bit ALU that performs AND,
OR, and addition on a and b or NOT b.



FIGURE B.5.10 (bottom) a 1-bit ALU for the most
significant bit.



FIGURE B.5.11 A 32-bit ALU constructed from the 31 copies of the 1-bit
ALU in the top of Figure B.5.10 and one 1-bit ALU in the bottom of that figure.



FIGURE B.5.12 The final 32-bit ALU. This adds a Zero
detector to Figure B.5.11.



FIGURE B.5.14 The symbol commonly used to represent an
ALU, as shown in Figure B.5.12.



Figure 5.15 The datapath of Figure 5.12 with all necessary
multiplexors and all control lines identified



Control

Simple combinational logic (truth tables)


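[Figure: two blocks of combinational logic. ALU control block: inputs ALUOp1, ALUOp0 and the funct field F(5-0), of which F3-F0 are drawn; outputs Operation2, Operation1, Operation0. Main control: inputs Op5-Op0; decoded instruction classes R-format, lw, sw, beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]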



Figure 5.17 The simple datapath with the control unit.



Figure 5.18 The setting of the control lines is completely
determined by the opcode fields of the instruction.

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1



Figure 5.19 The datapath in operation for an R-type instruction
such as add $t1, $t2, $t3.



Figure 5.20 The datapath in operation for a load instruction.



Figure 5.21 The datapath in operation for a branch equal
instruction.



Figure 5.22 The control function for the simple single-cycle
implementation is completely specified by this truth table.
Input or output  Signal name  R-format  lw  sw  beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1
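
One way to read this truth table is as a lookup from the 6-bit opcode (Op5..Op0) to the nine control outputs. The Python sketch below copies the table row by row; the names `SIGNALS`, `ROWS`, and `control_unit` are made up for the example, and `None` stands for a don't-care (X) entry.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")
X = None  # don't care

# Keys are Op5..Op0 read down each column of the table.
ROWS = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1),  # beq
}


def control_unit(instruction: int) -> dict:
    """Return the control-line settings for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    if opcode not in ROWS:
        raise ValueError(f"opcode {opcode:06b} is outside the table")
    return dict(zip(SIGNALS, ROWS[opcode]))


lw_word = 0x8D280008   # lw $t0, 8($t1)
add_word = 0x014B4820  # add $t1, $t2, $t3
print(control_unit(lw_word))
print(control_unit(add_word))
```

Because the outputs depend only on the opcode, the control is purely combinational, which is the point the figure caption makes.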
Figure 5.23 Instruction format for the jump instruction
(opcode = 2).

Field          000010  address
Bit positions  31:26   25:0



Figure 5.24 The simple control and datapath are extended to
handle the jump instruction.



Problem: Performance of Single-Cycle Machines (p.315)
Assume that the operation times for the major functional units in this implementation
are the following:

Memory units: 200 picoseconds (ps)
ALU and adders: 100 ps
Register file (read or write): 50 ps

Assuming that the multiplexors, control unit, PC accesses, sign extension unit, and
wires have no delay, which of the following implementations would be faster and by
how much?

1. An implementation in which every instruction operates in 1 clock cycle of a
fixed length.
2. An implementation where every instruction executes in 1 clock cycle
using a variable-length clock, which for each instruction is only as long as it
needs to be.

To compare the performance, assume the following instruction mix: 25% loads, 10%
stores, 45% ALU instructions, 15% branches, and 5% jumps.



Let's start by comparing the CPU execution times.
$$\text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
Since CPI must be 1, we can simplify this to
$$\text{CPU execution time} = \text{Instruction count} \times \text{Clock cycle time}$$
The critical path for the different instruction classes is as follows:

Instruction class  Functional units used by the instruction class
R-type             Instruction fetch, Register access, ALU, Register access
Load word          Instruction fetch, Register access, ALU, Memory access, Register access
Store word         Instruction fetch, Register access, ALU, Memory access
Branch             Instruction fetch, Register access, ALU
Jump               Instruction fetch



Using these critical paths, we can compute the required length for
each instruction class:
Instruction class  Instruction memory  Register read  ALU operation  Data memory  Register write  Total
R-type             200                 50             100            0            50              400 ps
Load word          200                 50             100            200          50              600 ps
Store word         200                 50             100            200                          550 ps
Branch             200                 50             100            0                            350 ps
Jump               200                                                                            200 ps

Thus, the average time per instruction with a variable clock is

$$\text{CPU clock cycle} = 600 \times 25\% + 550 \times 10\% + 400 \times 45\% + 350 \times 15\% + 200 \times 5\% = 447.5\ \text{ps}$$



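Since the variable clock implementation has a shorter average clock
cycle, it is clearly faster. Let's find the performance ratio:

$$\frac{\text{CPU performance}_{\text{variable clock}}}{\text{CPU performance}_{\text{single clock}}} = \frac{\text{CPU execution time}_{\text{single clock}}}{\text{CPU execution time}_{\text{variable clock}}} = \frac{IC \times \text{CPU clock cycle}_{\text{single clock}}}{IC \times \text{CPU clock cycle}_{\text{variable clock}}} = \frac{\text{CPU clock cycle}_{\text{single clock}}}{\text{CPU clock cycle}_{\text{variable clock}}} = \frac{600}{447.5} \approx 1.34$$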
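
The arithmetic of this problem can be checked with a short Python sketch. The dictionary names are invented for the example; the delays, critical paths, and instruction mix are the ones stated in the problem.

```python
# Functional-unit delays in picoseconds, as given in the problem statement.
UNIT_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Units on the critical path of each instruction class.
CRITICAL_PATH = {
    "R-type":     ("imem", "reg_read", "alu", "reg_write"),
    "Load word":  ("imem", "reg_read", "alu", "dmem", "reg_write"),
    "Store word": ("imem", "reg_read", "alu", "dmem"),
    "Branch":     ("imem", "reg_read", "alu"),
    "Jump":       ("imem",),
}

# Instruction mix in percent (integers keep the arithmetic exact).
MIX_PERCENT = {"R-type": 45, "Load word": 25, "Store word": 10,
               "Branch": 15, "Jump": 5}

latency = {cls: sum(UNIT_PS[u] for u in units)
           for cls, units in CRITICAL_PATH.items()}

# A fixed clock must fit the slowest class; a variable clock averages over the mix.
single_clock = max(latency.values())
variable_clock = sum(MIX_PERCENT[c] * latency[c] for c in latency) / 100

print(latency)
print(f"single clock: {single_clock} ps, variable clock: {variable_clock} ps")
print(f"performance ratio: {single_clock / variable_clock:.2f}")
```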



5.5 A Multicycle Implementation



Keywords

Multicycle implementation: Also called multiple clock cycle implementation. An implementation in which an instruction is executed in multiple clock cycles.

Microprogramming: A symbolic representation of control in the form of instructions, called microinstructions, that are executed on a simple micromachine.

Finite state machine: A sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the input to a set of asserted outputs.

Next-state function: A combinational function that, given the inputs and the current state, determines the next state of a finite state machine.
Where we are headed

Single Cycle Problems:


what if we had a more complicated instruction like floating point?
wasteful of area
One Solution:
use a smaller cycle time
have different instructions take different numbers of cycles
a multicycle datapath:



Multicycle Approach

We will be reusing functional units


ALU used to compute address and to increment PC
Memory used for instruction and data
Our control signals will not be determined directly by the instruction
e.g., what should the ALU do for a subtract instruction?
We'll use a finite state machine for control



Multicycle Approach

Break up the instructions into steps, each step takes a cycle


balance the amount of work to be done
restrict each cycle to use only one major functional unit
At the end of a cycle
store values for use in later cycles (easiest thing to do)
introduce additional internal registers



Figure 5.27 The multicycle datapath from Figure 5.26 with the
control lines shown.



Figure 5.28 The complete datapath for the multicycle
implementation together with the necessary control lines.



Figure 5.29 The action caused by the setting of each control
signal in Figure 5.28 on page 323.

Actions of the 1-bit control signals

Signal name | Effect when deasserted | Effect when asserted
RegDst | The register file destination number for the Write register comes from the rt field. | The register file destination number for the Write register comes from the rd field.
RegWrite | None. | The general-purpose register selected by the Write register number is written with the value of the Write data input.
ALUSrcA | The first ALU operand is the PC. | The first ALU operand comes from the A register.
MemRead | None. | Content of memory at the location specified by the address input is put on the Memory data output.
MemWrite | None. | Memory contents at the location specified by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register file Write data input comes from ALUOut. | The value fed to the register file Write data input comes from the MDR.
IorD | The PC is used to supply the address to the memory unit. | ALUOut is used to supply the address to the memory unit.
IRWrite | None. | The output of the memory is written into the IR.
PCWrite | None. | The PC is written; the source is controlled by PCSource.
PCWriteCond | None. | The PC is written if the Zero output from the ALU is also active.



Continue

Actions of the 2-bit control signals

Signal name | Value (binary) | Effect
ALUOp | 00 | The ALU performs an add operation.
ALUOp | 01 | The ALU performs a subtract operation.
ALUOp | 10 | The funct field of the instruction determines the ALU operation.
ALUSrcB | 00 | The second input to the ALU comes from the B register.
ALUSrcB | 01 | The second input to the ALU is the constant 4.
ALUSrcB | 10 | The second input to the ALU is the sign-extended, lower 16 bits of the IR.
ALUSrcB | 11 | The second input to the ALU is the sign-extended, lower 16 bits of the IR shifted left 2 bits.
PCSource | 00 | Output of the ALU (PC+4) is sent to the PC for writing.
PCSource | 01 | The contents of ALUOut (the branch target address) are sent to the PC for writing.
PCSource | 10 | The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing.
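
The ALUSrcB and PCSource entries describe two multiplexors. A minimal Python sketch of their behaviour follows; the function and parameter names are assumptions for illustration, and values are treated as unsigned 32-bit patterns.

```python
def sext16(field: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    return field - 0x10000 if field & 0x8000 else field


def alu_src_b_mux(alusrcb: int, b_register: int, ir: int) -> int:
    """Second ALU operand, selected by the 2-bit ALUSrcB signal."""
    imm = sext16(ir & 0xFFFF)
    choices = {0b00: b_register, 0b01: 4, 0b10: imm, 0b11: imm << 2}
    return choices[alusrcb] & 0xFFFFFFFF


def pc_source_mux(pcsource: int, alu_result: int, alu_out: int,
                  pc_plus_4: int, ir: int) -> int:
    """Value offered to the PC, selected by the 2-bit PCSource signal."""
    # IR[25:0] shifted left 2 bits, concatenated with PC+4[31:28].
    jump_target = (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    choices = {0b00: alu_result, 0b01: alu_out, 0b10: jump_target}
    return choices[pcsource] & 0xFFFFFFFF


beq_word = 0x1108FFFC   # beq $t0, $t0 with offset field 0xFFFC (-4 words)
jump_word = 0x08000010  # j with address field 0x10
print(hex(alu_src_b_mux(0b11, 0, beq_word)))
print(hex(pc_source_mux(0b10, 0, 0, 0x10000004, jump_word)))
```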



Instructions from ISA perspective

Consider each instruction from perspective of ISA.


Example:
The add instruction changes a register.
Register specified by bits 15:11 of instruction.
Instruction specified by the PC.
New value is the sum (op) of two registers.
Registers specified by bits 25:21 and 20:16 of the instruction
Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

In order to accomplish this we must break up the instruction.


(kind of like introducing variables when programming)



Breaking down an instruction

ISA definition of arithmetic:

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

Could break down to:

IR <= Memory[PC]
A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
ALUOut <= A op B
Reg[IR[15:11]] <= ALUOut

We forgot an important part of the definition of arithmetic!


PC <= PC + 4
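
The breakdown, including the PC update, reads much like a sequence of variable assignments. Here is a hedged Python sketch for the add case, with the destination taken from bits 15:11 as in the ISA definition. The function name and the use of a dictionary as word-addressed memory are assumptions made for the example.

```python
def execute_add(memory: dict, reg: list, pc: int) -> int:
    """Run one add instruction through the intermediate registers; return new PC."""
    ir = memory[pc]                    # IR <= Memory[PC]
    a = reg[(ir >> 21) & 0x1F]         # A <= Reg[IR[25:21]]
    b = reg[(ir >> 16) & 0x1F]         # B <= Reg[IR[20:16]]
    alu_out = (a + b) & 0xFFFFFFFF     # ALUOut <= A op B, with op = add
    reg[(ir >> 11) & 0x1F] = alu_out   # Reg[IR[15:11]] <= ALUOut
    return (pc + 4) & 0xFFFFFFFF       # PC <= PC + 4


memory = {0: 0x014B4820}   # add $t1, $t2, $t3  (rd = 9, rs = 10, rt = 11)
reg = [0] * 32
reg[10], reg[11] = 5, 7
new_pc = execute_add(memory, reg, 0)
print(reg[9], new_pc)
```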



Idea behind multicycle approach

We define each instruction from the ISA perspective (do this!)

Break it down into steps following our rule that data flows through
at most one major functional unit per step (i.e., balance work across steps)

Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)

Finally, try to pack as much work as possible into each step
(avoid unnecessary cycles)
while also trying to share steps where possible
(minimizes control, helps to simplify the solution)

Result: Our book's multicycle implementation!



Five Execution Steps

Instruction Fetch

Instruction Decode and Register Fetch

Execution, Memory Address Computation, or Branch Completion

Memory Access or R-type instruction completion

Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!



Step 1: Instruction Fetch

Use PC to get instruction and put it in the Instruction Register.


Increment the PC by 4 and put the result back in the PC.
Can be described succinctly using RTL "Register-Transfer Language"

IR <= Memory[PC];
PC <= PC + 4;

Can we figure out the values of the control signals?

What is the advantage of updating the PC now?
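
One consistent assignment of the control signals for this step can be derived from the signal definitions in Figure 5.29. The Python sketch below applies those definitions to a small state dictionary. The dictionary layout and function name are assumptions for illustration, and only the mux settings needed for the fetch step are modelled.

```python
# Assumed settings for the fetch step, using the signal names of Figure 5.29.
FETCH_SIGNALS = {
    "MemRead": 1, "IorD": 0, "IRWrite": 1,          # IR <= Memory[PC]
    "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b00,   # ALU computes PC + 4
    "PCWrite": 1, "PCSource": 0b00,                 # PC <= ALU output
}


def fetch_step(state: dict, memory: dict, ctl: dict = FETCH_SIGNALS) -> dict:
    """Apply one clock cycle of the fetch step and return the new state."""
    address = state["ALUOut"] if ctl["IorD"] else state["PC"]
    mem_data = memory[address] if ctl["MemRead"] else None
    alu_a = state["A"] if ctl["ALUSrcA"] else state["PC"]
    alu_b = {0b00: state["B"], 0b01: 4}[ctl["ALUSrcB"]]
    assert ctl["ALUOp"] == 0b00, "this sketch only models the add setting"
    alu_result = (alu_a + alu_b) & 0xFFFFFFFF
    nxt = dict(state)
    if ctl["IRWrite"]:
        nxt["IR"] = mem_data
    if ctl["PCWrite"] and ctl["PCSource"] == 0b00:
        nxt["PC"] = alu_result
    return nxt


state = {"PC": 0x00400000, "IR": 0, "A": 0, "B": 0, "ALUOut": 0}
memory = {0x00400000: 0x014B4820}
after = fetch_step(state, memory)
print(hex(after["IR"]), hex(after["PC"]))
```

The ALU is otherwise idle while the instruction is being read, so incrementing the PC here costs no extra cycle and no separate adder.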



Step 2: Instruction Decode and Register Fetch

Read registers rs and rt in case we need them


Compute the branch address in case the instruction is a branch
RTL:

A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

We aren't setting any control lines based on the instruction type


(we are busy "decoding" it in our control logic)



Step 3 (instruction dependent)

ALU is performing one of three functions, based on instruction type

Memory Reference:

ALUOut <= A + sign-extend(IR[15:0]);

R-type:

ALUOut <= A op B;

Branch:

if (A==B) PC <= ALUOut;



Step 4 (R-type or memory-access)

Loads and stores access memory

MDR <= Memory[ALUOut];


or
Memory[ALUOut] <= B;

R-type instructions finish

Reg[IR[15:11]] <= ALUOut;

The write actually takes place at the end of the cycle, on the clock edge



Write-back step

Reg[IR[20:16]] <= MDR;

Which instruction needs this?
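
To tie the five steps together, here is a small behavioural sketch in Python that follows the RTL of each step and reports how many cycles an instruction uses. It is an illustration under stated assumptions rather than the book's control design: memory is a dictionary keyed by byte address holding whole words, the helper names are invented, and jumps and exceptions are left out.

```python
def sext16(field):
    """Interpret a 16-bit field as a signed integer."""
    return field - 0x10000 if field & 0x8000 else field

def signed32(x):
    return x - (1 << 32) if x & 0x80000000 else x

ALU_BY_FUNCT = {  # funct codes for add, sub, and, or, slt
    0x20: lambda a, b: a + b,
    0x22: lambda a, b: a - b,
    0x24: lambda a, b: a & b,
    0x25: lambda a, b: a | b,
    0x2A: lambda a, b: int(signed32(a) < signed32(b)),
}

def run_instruction(reg, mem, pc):
    """Execute one instruction; return (new PC, cycles used)."""
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # Step 2: decode, register fetch, speculative branch target
    op, rs, rt, rd = ir >> 26, (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    a, b = reg[rs], reg[rt]
    alu_out = (pc + (sext16(ir & 0xFFFF) << 2)) & 0xFFFFFFFF
    # Step 3: execution, address computation, or branch completion
    if op == 0x04:                       # beq finishes here
        return (alu_out if a == b else pc), 3
    if op == 0x00:                       # R-type
        alu_out = ALU_BY_FUNCT[ir & 0x3F](a, b) & 0xFFFFFFFF
        reg[rd] = alu_out                # Step 4: R-type completion
        return pc, 4
    if op in (0x23, 0x2B):               # lw or sw
        alu_out = (a + sext16(ir & 0xFFFF)) & 0xFFFFFFFF
        if op == 0x2B:
            mem[alu_out] = b             # Step 4: store
            return pc, 4
        mdr = mem[alu_out]               # Step 4: load reads memory
        reg[rt] = mdr                    # Step 5: write-back
        return pc, 5
    raise ValueError(f"opcode {op:#x} is outside this subset")

reg = [0] * 32
reg[9] = 0x100
mem = {0: 0x8D280008,    # lw  $t0, 8($t1)
       4: 0x01085020,    # add $t2, $t0, $t0
       8: 0xAD2A000C,    # sw  $t2, 12($t1)
       12: 0x1108FFFC,   # beq $t0, $t0, back to address 0
       0x108: 42}
pc, total = 0, 0
for _ in range(4):
    pc, cycles = run_instruction(reg, mem, pc)
    total += cycles
print(reg[8], reg[10], mem[0x10C], pc, total)
```

Only the load reaches the write-back step, which is why it is the one instruction that takes five cycles.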



Summary:



Problem: CPI in a multicycle CPU

Using the SPECINT2000 instruction mix shown in Figure 3.26, what is
the CPI, assuming that each state in the multicycle CPU requires 1
clock cycle?

Answer:
The mix is 25% loads (1% load byte+24% load word), 10% stores (1%
store byte+9% store word), 11% branches (6% beq, 5% bne), 2% jumps
(1% jal+1% jr), and 52% ALU (all the rest of the mix, which we assume to
be ALU instructions). From Figure 5.30 on page 329, the number of clock
cycles for each instruction class is the following:
Loads: 5; Stores: 4; ALU instructions: 4; Branches: 3; Jumps: 3.

The CPI is given by the following:

$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}} = \frac{\sum_i \text{Instruction count}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \text{CPI}_i$$

The ratio

$$\frac{\text{Instruction count}_i}{\text{Instruction count}}$$

is simply the instruction frequency for the instruction class i. We
can therefore substitute to obtain

$$\text{CPI} = 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3 = 4.12$$


This CPI is better than the worst-case CPI of 5.0 when all the
instructions take the same number of clock cycles.
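
The weighted sum can be reproduced with a few lines of Python. The dictionary names are made up for the example; the frequencies and cycle counts are the ones quoted in the answer.

```python
# Instruction mix in percent and clock cycles per class, from the answer above.
MIX_PERCENT = {"loads": 25, "stores": 10, "alu": 52, "branches": 11, "jumps": 2}
CYCLES = {"loads": 5, "stores": 4, "alu": 4, "branches": 3, "jumps": 3}

# CPI is the frequency-weighted average of the per-class cycle counts.
cpi = sum(MIX_PERCENT[c] * CYCLES[c] for c in MIX_PERCENT) / 100
worst_case = max(CYCLES.values())

print(f"CPI = {cpi:.2f}, worst case = {worst_case}")
```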



Figure 5.39 The multicycle datapath with the addition needed
to implement exceptions.

