
Single Cycle Datapath

The Big Picture: Where are We?

The Five Classic Components of a Computer


Processor
  Control
  Datapath
Memory
Input
Output

The Performance Perspective


The performance of a machine is determined by:
  Instruction count
  Clock cycle time
  Clock cycles per instruction
Processor design (datapath and control) determines:
  Clock cycle time
  Clock cycles per instruction
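
The three factors above combine multiplicatively into execution time. The following is a minimal sketch of that relationship; the function name and the workload numbers are hypothetical, chosen only for illustration.

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, clock_cycle_time_s):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cycles_per_instruction * clock_cycle_time_s


# Hypothetical workload: 1 million instructions, CPI of 1 (as in a
# single-cycle design), and an assumed 10 ns clock cycle.
single_cycle_time = cpu_time_seconds(1_000_000, 1, 10e-9)
print(single_cycle_time)
```

In a single-cycle design the CPI is fixed at 1, so the processor designer's main lever is the clock cycle time.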

The MIPS Instruction Formats

All MIPS instructions are 32 bits long. The three instruction formats:

R-type
  bits:   31-26   25-21   20-16   15-11   10-6    5-0
  field:  op      rs      rt      rd      shamt   funct
  width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

I-type
  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

J-type
  bits:   31-26   25-0
  field:  op      target address
  width:  6 bits  26 bits

The different fields are:


op: operation of the instruction
rs, rt, rd: the source and destination register specifiers
shamt: shift amount
funct: selects the variant of the operation in the op field
address / immediate: address offset or immediate value
target address: target address of the jump instruction
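
The field boundaries can be illustrated with shifts and masks. This is a sketch only: the function name `decode` and the dictionary layout are assumptions for illustration, and every field is extracted regardless of the instruction's actual format.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted unconditionally; which ones are meaningful
    depends on the format (R-type, I-type, or J-type).
    """
    return {
        "op": (instr >> 26) & 0x3F,      # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct": instr & 0x3F,           # bits 5-0   (R-type)
        "imm16": instr & 0xFFFF,         # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,     # bits 25-0  (J-type)
    }


# add $3, $1, $2 encodes as 0x00221820: op=0, rs=1, rt=2, rd=3, funct=0x20
fields = decode(0x00221820)
print(fields)
```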

The MIPS Subset

ADD and SUBTRACT (R-type: op | rs | rt | rd | shamt | funct)
  add rd, rs, rt
  sub rd, rs, rt
OR Immediate (I-type: op | rs | rt | immediate)
  ori rt, rs, imm16
LOAD and STORE (I-type)
  lw rt, rs, imm16
  sw rt, rs, imm16
BRANCH (I-type)
  beq rs, rt, imm16
JUMP (J-type: op | target address)
  j target

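An Abstract View of the Implementation

[Figure: the PC supplies the Instruction Address to the Ideal Instruction Memory, which outputs the Instruction. The Rd, Rs, and Rt fields (5 bits each) drive Rw, Ra, and Rb of the register file (32 32-bit registers). The register outputs and Imm16 feed the 32-bit ALU. The ALU output is the Data Address of the Ideal Data Memory, which has 32-bit Data In and Data Out ports. The PC, the register file, and the data memory are clocked by Clk.]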

Clocking Methodology

[Figure: clock waveform showing the Setup and Hold windows around each clock edge; signal values are "don't care" outside those windows.]

All storage elements are clocked by the same clock edge
Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew

An Abstract View of the Critical Path

Register file and ideal memory:
  The CLK input is a factor ONLY during write operation
  During read operation, behave as combinational logic:
    Address valid => Output valid after access time.

[Figure: the same datapath as in the abstract view of the implementation: PC, Ideal Instruction Memory, register file (32 32-bit registers), ALU, and Ideal Data Memory.]

Critical Path (Load Operation) =
  PC's Clk-to-Q +
  Instruction Memory's Access Time +
  Register File's Access Time +
  ALU to Perform a 32-bit Add +
  Data Memory Access Time +
  Setup Time for Register File Write +
  Clock Skew
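
The critical path of the load operation is a plain sum of component delays, so it can be checked with a few lines of arithmetic. In this sketch every delay value is a hypothetical placeholder in nanoseconds; the slides give no numbers, and the names are chosen only for illustration.

```python
# All delays are hypothetical placeholders (in ns), NOT values from the slides.
delays_ns = {
    "pc_clk_to_q": 1,
    "instruction_memory_access": 10,
    "register_file_access": 5,
    "alu_32bit_add": 8,
    "data_memory_access": 10,
    "register_file_write_setup": 1,
    "clock_skew": 1,
}


def load_critical_path(delays):
    """Sum the terms of the load critical path, in the order listed above."""
    return (
        delays["pc_clk_to_q"]
        + delays["instruction_memory_access"]
        + delays["register_file_access"]
        + delays["alu_32bit_add"]
        + delays["data_memory_access"]
        + delays["register_file_write_setup"]
        + delays["clock_skew"]
    )


min_cycle_time_ns = load_critical_path(delays_ns)
print(min_cycle_time_ns)
```

Because every instruction takes one cycle, the clock period must be at least this long even for instructions that never touch data memory.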

The Steps of Designing a Processor

Instruction Set Architecture => Register Transfer Language
Register Transfer Language => Datapath Design
  Datapath components
  Datapath interconnect
Datapath components => Control signals
Control signals => Control logic => Control Unit Design

RTL Example 1: The ADD Instruction


add rd, rs, rt

  mem[PC]                   Fetch the instruction from memory
  R[rd] <- R[rs] + R[rt]    The ADD operation
  PC <- PC + 4              Calculate the next instruction's address
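
The RTL for add maps directly onto two assignments. The sketch below assumes a register file modeled as a Python list of 32 integers and ignores arithmetic overflow by wrapping to 32 bits; both are simplifications made for illustration, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def execute_add(regs, pc, rd, rs, rt):
    """Apply the add RTL and return the new register list and the new PC.

    R[rd] <- R[rs] + R[rt]
    PC    <- PC + 4
    Overflow is ignored here: results simply wrap to 32 bits.
    """
    new_regs = list(regs)
    new_regs[rd] = (regs[rs] + regs[rt]) & MASK32
    return new_regs, (pc + 4) & MASK32


regs = [0] * 32
regs[1], regs[2] = 5, 7
new_regs, new_pc = execute_add(regs, 0x100, rd=3, rs=1, rt=2)
print(new_regs[3], hex(new_pc))
```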

RTL Example 2: The Load Instruction

lw rt, rs, imm16

  mem[PC]                           Fetch the instruction from memory
  Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
  R[rt] <- Mem[Addr]                Load the data into the register
  PC <- PC + 4                      Calculate the next instruction's address

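Combinational Logic Elements Needed

Adder: inputs A and B (32 bits each) and CarryIn; outputs Sum (32 bits) and Carry
MUX: inputs A and B (32 bits each) and Select; one 32-bit output
ALU: inputs A and B (32 bits each) and OP; outputs Result (32 bits) and Zero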
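
The three combinational elements can be modeled as pure functions with no stored state. This is a sketch under assumptions: the operation names "add", "sub", and "or" and the convention that Select = 0 picks input A are illustrative choices, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF


def adder(a, b, carry_in=0):
    """Return (Sum, Carry) for a 32-bit add with an optional carry in."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def mux(select, a, b):
    """Two-input multiplexer; assumed convention: select 0 -> A, 1 -> B."""
    return b if select else a


def alu(op, a, b):
    """Return (Result, Zero). The op names are hypothetical labels."""
    if op == "add":
        result = (a + b) & MASK32
    elif op == "sub":
        result = (a - b) & MASK32
    elif op == "or":
        result = (a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result, 1 if result == 0 else 0


print(adder(0xFFFFFFFF, 1), mux(1, 10, 20), alu("sub", 7, 7))
```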

Storage Element: Register


Register
  Similar to the D flip-flop except:
    N-bit input (Data In) and N-bit output (Data Out)
    Write Enable input
  Write Enable:
    0: Data Out will not change
    1: Data Out will become Data In
  Clocked by Clk
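
The Write Enable behavior can be sketched as a small class in which one method call stands for one active clock edge. The class and method names are hypothetical.

```python
class Register:
    """N-bit register with a write enable, updated once per clock edge."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.data_out = 0

    def clock_edge(self, data_in, write_enable):
        """Model one clock edge: latch data_in only when write_enable is 1."""
        if write_enable:
            self.data_out = data_in & self.mask
        return self.data_out


r = Register()
print(r.clock_edge(5, 0), r.clock_edge(5, 1), r.clock_edge(9, 0))
```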

Storage Element: Idealized Memory


Memory (idealized)
  One input bus: Data In (32 bits)
  One output bus: Data Out (32 bits)
  Control inputs: Write Enable and Address
Memory word is selected by:
  Write Enable = 0: Address selects the word to put on Data Out
  Write Enable = 1: Address selects the memory word to be written via the Data In bus
Clock input (CLK)
  The CLK input is a factor ONLY during write operation
  During read operation, behaves as a combinational logic block:
    Address valid => Data Out valid after access time.
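
The split between clocked writes and combinational reads can be sketched as two separate methods. The class name, the dictionary-backed storage, and the choice that unwritten words read as 0 are all assumptions made for illustration.

```python
class IdealMemory:
    """Idealized word memory: reads are combinational, writes are clocked."""

    def __init__(self):
        self.words = {}  # address -> 32-bit word; unwritten words read as 0 (assumption)

    def read(self, address):
        """Combinational read: Data Out follows Address, no clock involved."""
        return self.words.get(address, 0)

    def clock_edge(self, address, data_in, write_enable):
        """Clocked write: store data_in only when write_enable is 1."""
        if write_enable:
            self.words[address] = data_in & 0xFFFFFFFF


mem = IdealMemory()
mem.clock_edge(0x10, 0xABCD, write_enable=1)
print(hex(mem.read(0x10)))
```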

Storage Element: Register File

Register File consists of 32 registers:
  Two 32-bit output busses: busA and busB
  One 32-bit input bus: busW
  Select inputs RW, RA, and RB (5 bits each) and Write Enable
Register is selected by:
  RA selects the register to put on busA
  RB selects the register to put on busB
  RW selects the register to be written via busW when Write Enable is 1
Clock input (CLK)
  The CLK input is a factor ONLY during write operation
  During read operation, behaves as a combinational logic block:
    RA or RB valid => busA or busB valid after access time.
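
The two read ports and one write port can be sketched in the same style as the other storage elements. The class and method names are hypothetical, and this sketch does not model any special treatment of register 0.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: return (busA, busB) for the selected registers."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """Clocked write: R[rw] <- busW only when write_enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(3, 42, write_enable=1)
print(rf.read(3, 0))
```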

Overview of the Instruction Fetch Unit

The common RTL operations
  Fetch the Instruction: mem[PC]
  Update the program counter:
    Sequential Code: PC <- PC + 4
    Branch and Jump: PC <- "something else"

[Figure: the PC (clocked by Clk) supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; the Next Address Logic computes the new PC.]

RTL: The ADD Instruction


add rd, rs, rt

  mem[PC]                   Fetch the instruction from memory
  R[rd] <- R[rs] + R[rt]    The ADD operation
  PC <- PC + 4              Calculate the next instruction's address

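Datapath for Register-Register Operations

R[rd] <- R[rs] op R[rt]    Example: add rd, rs, rt
  Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
  ALUctr and RegWr: control logic after decoding the instruction

  bits:   31-26   25-21   20-16   15-11   10-6    5-0
  field:  op      rs      rt      rd      shamt   funct
  width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

[Figure: Rd, Rs, and Rt (5 bits each) drive Rw, Ra, and Rb of the register file (32 32-bit registers, clocked by Clk, write-enabled by RegWr). busA and busB (32 bits each) feed the ALU, which is controlled by ALUctr. The 32-bit Result returns to the register file on busW.]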

Register-Register Timing

[Figure: timing diagram for one clock cycle. Each signal changes from its old value to its new value after the delay listed:
  PC: Clk-to-Q
  Rs, Rt, Rd, Op, Func: Instruction Memory Access Time
  ALUctr, RegWr: Delay through Control Logic
  busA, busB: Register File Access Time
  busW: ALU Delay
The register write occurs at the clock edge that ends the cycle.]

RTL: The OR Immediate Instruction


  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

ori rt, rs, imm16

  mem[PC]                             Fetch the instruction from memory
  R[rt] <- R[rs] OR ZeroExt(imm16)    The OR operation
  PC <- PC + 4                        Calculate the next instruction's address

ZeroExt(imm16):
  bits 31-16: 0000000000000000 (16 bits)
  bits 15-0:  immediate (16 bits)
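
Zero extension followed by a bitwise OR is short enough to show in full. The sketch below models the register file as a Python list; the function names are hypothetical.

```python
def zero_ext(imm16):
    """Zero-extend a 16-bit immediate to 32 bits: the upper 16 bits are 0."""
    return imm16 & 0xFFFF


def execute_ori(regs, pc, rt, rs, imm16):
    """Apply the ori RTL and return the new register list and the new PC.

    R[rt] <- R[rs] OR ZeroExt(imm16)
    PC    <- PC + 4
    """
    new_regs = list(regs)
    new_regs[rt] = regs[rs] | zero_ext(imm16)
    return new_regs, (pc + 4) & 0xFFFFFFFF


regs = [0] * 32
regs[1] = 0x00000F00
new_regs, new_pc = execute_ori(regs, 0x200, rt=2, rs=1, imm16=0x80F0)
print(hex(new_regs[2]), hex(new_pc))
```

Note that an immediate with bit 15 set (0x80F0 here) still leaves the upper half of the result unchanged, which is the difference from sign extension.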

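Datapath for Logical Operations with Immediate

R[rt] <- R[rs] op ZeroExt[imm16]    Example: ori rt, rs, imm16

  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

[Figure: the register-register datapath extended with two multiplexers. A mux controlled by RegDst selects Rd or Rt as Rw (Rt for ori). imm16 passes through ZeroExt (16 -> 32 bits), and a mux controlled by ALUSrc selects busB or the extended immediate as the second ALU input. Ra is Rs; Rb is "don't care" (Rt). ALUctr selects the ALU operation, and the 32-bit Result is written via busW when RegWr is 1.]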

RTL: The Load Instruction


  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

lw rt, rs, imm16

  mem[PC]                           Fetch the instruction from memory
  Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
  R[rt] <- Mem[Addr]                Load the data into the register
  PC <- PC + 4                      Calculate the next instruction's address

SignExt operation:
  immediate bit 15 = 0: bits 31-16 are 0000000000000000, bits 15-0 are the immediate
  immediate bit 15 = 1: bits 31-16 are 1111111111111111, bits 15-0 are the immediate
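
Sign extension copies bit 15 of the immediate into the upper 16 bits. A minimal sketch, with a hypothetical function name, that returns the result as an unsigned 32-bit pattern:

```python
def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit pattern.

    Bits 31-16 become copies of bit 15; bits 15-0 are unchanged.
    """
    imm16 &= 0xFFFF
    if imm16 & 0x8000:           # bit 15 set: the value is negative
        return imm16 | 0xFFFF0000
    return imm16


print(hex(sign_ext(0x0004)), hex(sign_ext(0xFFFC)))
```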

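Datapath for Load Operations

R[rt] <- Mem[R[rs] + SignExt[imm16]]    Example: lw rt, rs, imm16

  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

[Figure: the immediate datapath extended for loads. The Extender (controlled by ExtOp) widens imm16 from 16 to 32 bits, and ALUSrc selects it as the second ALU input, so the ALU computes the address. The ALU result drives Adr of the Data Memory (WrEn = MemWr, 32-bit Data In, clocked by Clk). A mux controlled by MemtoReg selects the ALU result or the memory output for busW. RegDst selects Rt as Rw; Ra is Rs; Rb is "don't care" (Rt). RegWr enables the register write.]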

RTL: The Store Instruction


  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

sw rt, rs, imm16

  mem[PC]                           Fetch the instruction from memory
  Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
  Mem[Addr] <- R[rt]                Store the register into memory
  PC <- PC + 4                      Calculate the next instruction's address
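
Load and store share the same address calculation and differ only in the direction of the data transfer. The sketch below models data memory as a dictionary keyed by byte address and treats unwritten words as 0; both choices, and all function names, are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def execute_sw(regs, mem, pc, rt, rs, imm16):
    """Addr <- R[rs] + SignExt(imm16); Mem[Addr] <- R[rt]; PC <- PC + 4."""
    addr = (regs[rs] + sign_ext(imm16)) & MASK32
    new_mem = dict(mem)
    new_mem[addr] = regs[rt]
    return list(regs), new_mem, (pc + 4) & MASK32


def execute_lw(regs, mem, pc, rt, rs, imm16):
    """Addr <- R[rs] + SignExt(imm16); R[rt] <- Mem[Addr]; PC <- PC + 4."""
    addr = (regs[rs] + sign_ext(imm16)) & MASK32
    new_regs = list(regs)
    new_regs[rt] = mem.get(addr, 0)  # unwritten words read as 0 (assumption)
    return new_regs, dict(mem), (pc + 4) & MASK32


regs = [0] * 32
regs[1], regs[2] = 0x1008, 0xDEADBEEF
# Store R[2] at 0x1008 - 4, then load it back into R[3].
regs, mem, pc = execute_sw(regs, {}, 0x100, rt=2, rs=1, imm16=0xFFFC)
regs, mem, pc = execute_lw(regs, mem, pc, rt=3, rs=1, imm16=0xFFFC)
print(hex(regs[3]), hex(pc))
```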

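Datapath for Store Operations

Mem[R[rs] + SignExt[imm16]] <- R[rt]    Example: sw rt, rs, imm16

  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

[Figure: the load datapath with busB (32 bits) also connected to Data In of the Data Memory. Rs and Rt drive Ra and Rb. The Extender (controlled by ExtOp) widens imm16 to 32 bits, and ALUSrc selects it as the second ALU input, so the ALU computes the address on Adr. MemWr drives WrEn to enable the write. The RegDst and MemtoReg muxes, RegWr, ALUctr, and busW are present as in the load datapath.]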

RTL: The Branch Instruction


  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

beq rs, rt, imm16

  mem[PC]                     Fetch the instruction from memory
  Cond <- R[rs] - R[rt]       Calculate the branch condition
  if (Cond eq 0)              Calculate the next instruction's address
    PC <- PC + 4 + ( SignExt(imm16) x 4 )
  else
    PC <- PC + 4
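
The branch RTL is a subtraction followed by a choice between two next-PC values. A minimal sketch with hypothetical function names, using 32-bit wraparound so that negative offsets work on unsigned patterns:

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_value, rt_value, imm16):
    """Return the next PC for beq, following the RTL above."""
    cond = (rs_value - rt_value) & MASK32      # Cond <- R[rs] - R[rt]
    if cond == 0:                              # branch taken
        return (pc + 4 + sign_ext(imm16) * 4) & MASK32
    return (pc + 4) & MASK32                   # branch not taken


print(hex(next_pc_beq(0x100, 5, 5, 0xFFFC)))   # taken, offset of -4 words
print(hex(next_pc_beq(0x100, 5, 6, 0xFFFC)))   # not taken
```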

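Datapath for Branch Operations

beq rs, rt, imm16    We need to compare Rs and Rt!

  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

[Figure: Rs and Rt drive Ra and Rb; busA and busB (32 bits each) feed the ALU, which produces Zero. Zero, Branch, and imm16 (16 bits) feed the Next Address Logic, which updates the PC (clocked by Clk); the PC goes to the Instruction Memory. The RegDst mux, RegWr, the Extender with ExtOp, ALUSrc, ALUctr, and busW are present as in the earlier datapaths.]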

Binary Arithmetic for the Next Address

In theory, the PC is a 32-bit byte address into the instruction memory:
  Sequential operation: PC<31:0> = PC<31:0> + 4
  Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
The magic number "4" always comes up because:
  The 32-bit PC is a byte address
  And all our instructions are 4 bytes (32 bits) long
In other words:
  The 2 LSBs of the 32-bit PC are always zeros
  There is no reason to have hardware to keep the 2 LSBs
In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
  Sequential operation: PC<31:2> = PC<31:2> + 1
  Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
  In either case: Instruction Memory Address = PC<31:2> concat "00"
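
The 30-bit formulation can be checked against the 32-bit one numerically. In this sketch the immediate is sign-extended to 30 bits rather than 32, since it is added to a 30-bit PC; the function names are hypothetical.

```python
MASK30 = (1 << 30) - 1


def sign_ext30(imm16):
    """Sign-extend a 16-bit immediate to a 30-bit pattern."""
    imm16 &= 0xFFFF
    return imm16 | 0x3FFF0000 if imm16 & 0x8000 else imm16


def next_pc30(pc30, imm16, take_branch):
    """PC<31:2> + 1, plus SignExt(imm16) when the branch is taken."""
    nxt = pc30 + 1
    if take_branch:
        nxt += sign_ext30(imm16)
    return nxt & MASK30


def instruction_address(pc30):
    """Instruction Memory Address = PC<31:2> concat "00"."""
    return pc30 << 2


pc30 = 0x100 >> 2                       # byte address 0x100 as a 30-bit PC
print(hex(instruction_address(next_pc30(pc30, 0, False))))
print(hex(instruction_address(next_pc30(pc30, 0xFFFC, True))))
```

For byte address 0x100, the sequential result corresponds to 0x104 and the taken branch with an offset of -4 words corresponds to 0xF4, the same values the 32-bit formulas give.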

Next Address Logic: Expensive and Fast Solution

Using a 30-bit PC:
  Sequential operation: PC<31:2> = PC<31:2> + 1
  Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
  In either case: Instruction Memory Address = PC<31:2> concat "00"

[Figure: the 30-bit PC (clocked by Clk) feeds one adder that adds 1; a second adder adds SignExt(imm16), where imm16 = Instruction<15:0>. A mux controlled by Branch and Zero selects between the two 30-bit results as the next PC. Addr<31:2> = PC and Addr<1:0> = "00" address the Instruction Memory, which outputs Instruction<31:0>.]

Next Address Logic: Cheap and Slow Solution

Why is this slow?
  Cannot start the address add until Zero (output of ALU) is valid
Does it matter that this is slow in the overall scheme of things?
  Probably not here. Critical path is the load operation.

[Figure: a single 30-bit adder computes the next PC. A mux controlled by Branch and Zero selects either 0 or SignExt(imm16), where imm16 = Instruction<15:0>, as the second operand, and Carry In supplies the +1. Addr<31:2> = PC and Addr<1:0> = "00" address the Instruction Memory, which outputs Instruction<31:0>.]

RTL: The Jump Instruction


  bits:   31-26   25-0
  field:  op      target address
  width:  6 bits  26 bits

j target

  mem[PC]                                      Fetch the instruction from memory
  PC<31:2> <- PC<31:28> concat target<25:0>    Calculate the next instruction's address
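
The jump RTL keeps the top four PC bits and replaces the rest with the 26-bit target. A minimal sketch on the 30-bit PC<31:2>, where PC<31:28> sits in bits 29-26 of the 30-bit value; the function name is hypothetical.

```python
def jump_pc30(pc30, target26):
    """PC<31:2> <- PC<31:28> concat target<25:0>, on a 30-bit PC value."""
    upper4 = (pc30 >> 26) & 0xF            # PC<31:28>
    return (upper4 << 26) | (target26 & 0x3FFFFFF)


pc30 = 0x40000100 >> 2                     # current byte address 0x40000100
new_pc30 = jump_pc30(pc30, 0x123)
print(hex(new_pc30 << 2))                  # byte address after the jump
```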

Instruction Fetch Unit

j target
  PC<31:2> <- PC<31:28> concat target<25:0>

[Figure: the branch next-address logic (two 30-bit adders and a mux controlled by Branch and Zero) followed by a second mux controlled by Jump. When Jump is 1, the next PC is PC<31:28> (4 bits) concatenated with Target = Instruction<25:0> (26 bits). Addr<31:2> = PC and Addr<1:0> = "00" address the Instruction Memory, which outputs Instruction<31:0>.]

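Putting it All Together: A Single Cycle Datapath

We have everything except control signals (underlined)

[Figure: the complete single-cycle datapath. The Instruction Fetch Unit (inputs Branch, Jump, Zero, and Clk) outputs Instruction<31:0>, which is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. A mux controlled by RegDst selects Rd or Rt as Rw; Rs and Rt drive Ra and Rb of the register file (32 32-bit registers, clocked by Clk, write-enabled by RegWr). The Extender (ExtOp) widens imm16 to 32 bits, and a mux controlled by ALUSrc selects busB or the extended immediate as the second ALU input. The ALU (ALUctr) produces Zero and the result, which drives Adr of the Data Memory (WrEn = MemWr, Data In from busB, clocked by Clk). A mux controlled by MemtoReg selects the ALU result or the memory output for busW.]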
