
HUMBOLDT-UNIVERSITÄT ZU BERLIN

INSTITUT FÜR INFORMATIK

COMPUTER ARCHITECTURE
Lecture 8

The Processing Unit, Microprogrammed Control And Pipelining


Sommersemester 2002 Leitung: Prof. Dr. Miroslaw Malek
www.informatik.hu-berlin.de/rok/ca

CA - VIII - PU&MC - 1

THE PROCESSING UNIT AND MICROPROGRAMMED CONTROL

FUNDAMENTAL CONCEPTS
SEQUENCING OF CONTROL SIGNALS
CONCEPTS OF MICROPROGRAMMING
MICROINSTRUCTION FORMATS
HARDWIRED vs. MICROPROGRAMMED CONTROL UNIT

CONTROL UNIT (1)


The control unit has two major functions:
  To control the sequencing of information-processing tasks performed by the machine
  To guide and supervise each unit so that it carries out every assigned operation at the proper time
Control of a computer can be distributed or centralized:
  Early computers used distributed control and a lot of redundant hardware
  Contemporary small computers use a central control unit
  Contemporary large computers use distributed control and highly specialized subsystems (e.g., I/O channels, I/O devices), some of which even form separate computers
  Distributed control is widely used in computers with multiprogramming capabilities

CONTROL UNIT (2)


Two basic types of control units are used:
  Hardwired control unit
  Microprogrammed control unit

The typical control unit consists mainly of registers and combinational execution logic (in hardwired systems), plus additional fast control memory (in microprogrammed systems)

BASIC PROCESSING UNIT ORGANIZATION


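[Figure: basic processing unit organization. Registers R(i), Y and Z are gated onto and off a common bus by the signals R(i)in / R(i)out, Yin / Yout and Zin / Zout; the ALU, with inputs A and B, sits between Y and Z.]

All signals R(i)in, R(i)out, Yin, Yout, Zin, Zout are triggered by the CONTROL UNIT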

PROCESSING UNIT FEATURES


[Figure: processing unit features. Labelled elements: MDR (to or from memory; from or to the arithmetic register ACC), MAR (to memory selection circuits), PC (with a path for branch instructions), IR (op-code and address fields), decoder, and control signal generator producing the control signals.]

IR - CURRENT INSTRUCTION REGISTER
PC - CURRENT PROGRAM REGISTER (Program Counter)

SEQUENCING OF OPERATIONS (1)


THE SEQUENCER generates the specific output sequence of control signals that corresponds to a given instruction op-code

VARIATIONS IN THE SEQUENCER DESIGN

SYNCHRONOUS (clocked) vs. ASYNCHRONOUS

SYNCHRONOUS: use of ring counters (one for every instruction) or binary counters
ASYNCHRONOUS: using delay elements with variable delays

SEQUENCING OF OPERATIONS (2)

FIXED vs. VARIABLE

FIXED
1) use of a fixed binary counter & decoder; the cycle length corresponds to the maximum-length control sequence (slower but less expensive)
2) fixed counter with a cycle length determined by the fetch phase; longer control sequences use multiple cycles of the counter

VARIABLE
1) separate ring counters for each group of instructions with an identical sequence
2) selectable-modulo counter whose cycle length is determined by the instruction to be executed (reset after the instruction is executed)
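
The trade-off between a fixed counter and a selectable-modulo counter can be illustrated with a small sketch. The instruction names and control-sequence lengths below are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
# Hypothetical control-sequence lengths in clock cycles (illustrative only).
SEQUENCE_LENGTH = {"ADD": 4, "JUMP": 2, "MULTIPLY": 8}


def timing_signals(modulo):
    """Yield one-hot timing signals T0..T(modulo-1), as a decoded counter would."""
    for count in range(modulo):
        yield [1 if i == count else 0 for i in range(modulo)]


def cycles_fixed(program):
    """Fixed counter: every instruction takes the maximum-length cycle."""
    longest = max(SEQUENCE_LENGTH.values())
    return len(program) * longest


def cycles_variable(program):
    """Selectable-modulo counter: the cycle length follows each instruction."""
    return sum(SEQUENCE_LENGTH[op] for op in program)


program = ["ADD", "JUMP", "ADD", "MULTIPLY"]
print(cycles_fixed(program))     # 32
print(cycles_variable(program))  # 18
```

With these assumed lengths the fixed counter wastes cycles on every short instruction, which is the "slower but less expensive" trade-off noted for the fixed design.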

INSTRUCTION FETCH & EXECUTION


FETCH PHASE
T0| PC → MAR, 1 → R            (R set to READ)
T1| M(MAR) → MDR
T2| MDR → IR; decode; restore memory
T3| (PC + 1) → PC

EXECUTION PHASE
T0| 1 → R; Ad(IR) → MAR
T1| M(MAR) → MDR
T2| MDR → X; restore memory
T3| X + A → Z
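
The register transfers of the fetch and execution phases can be traced with a minimal sketch. The instruction encoding as an (opcode, address) tuple, the memory contents and the initial value of A are assumptions made for this example only.

```python
def fetch(regs, memory):
    """Fetch phase, one statement per clock period T0..T3."""
    regs["MAR"] = regs["PC"]            # T0: PC -> MAR (read requested)
    regs["MDR"] = memory[regs["MAR"]]   # T1: M(MAR) -> MDR
    regs["IR"] = regs["MDR"]            # T2: MDR -> IR; decode
    regs["PC"] += 1                     # T3: (PC + 1) -> PC


def execute_add(regs, memory):
    """Execution phase of an add-type instruction, T0..T3."""
    _opcode, address = regs["IR"]       # hypothetical (opcode, address) word
    regs["MAR"] = address               # T0: Ad(IR) -> MAR
    regs["MDR"] = memory[regs["MAR"]]   # T1: M(MAR) -> MDR
    regs["X"] = regs["MDR"]             # T2: MDR -> X
    regs["Z"] = regs["X"] + regs["A"]   # T3: X + A -> Z


# Word 0 holds the instruction, word 2 holds the operand (assumed layout).
memory = [("ADD", 2), None, 7]
regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None, "X": 0, "A": 5, "Z": 0}

fetch(regs, memory)
execute_add(regs, memory)
print(regs["PC"], regs["Z"])  # 1 12
```

The "restore memory" step is omitted because a Python list is not a destructive-read memory.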

INSTRUCTION FETCH & EXECUTION (2)


COMBINED FETCH / EXECUTE, e.g.:

1) BAN - branches if ACC is negative
T0| PC → MAR, 1 → R                         fetch
T1| M(MAR) → MDR
T2| MDR → IR; decode; restore memory
T3| A0'·(PC + 1) + A0·Ad(IR) → PC           execute
(A0 is the sign bit of ACC; A0' is its complement)

2) Register reference instruction (increment ACC by 1)
T0| PC → MAR, 1 → R                         fetch
T1| M(MAR) → MDR
T2| MDR → X; restore memory; decode
T3| (A + 1) → A; (PC + 1) → PC              execute

INDIRECT ADDRESSING & DEFER PHASE (execution phase is delayed)
T0| 1 → R
T1| Ad(IR) → MAR
T2| M(MAR) → MDR
T3| MDR → Ad(IR)
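
The T3 step of BAN selects the next PC value from the sign bit of ACC. A minimal sketch of that selection follows; the 16-bit word size and the function name are assumptions for illustration, since the lecture does not fix a word length.

```python
def next_pc_ban(pc, acc, branch_address, word_bits=16):
    """Return the next PC for BAN (branch if ACC negative).

    a0 is the sign bit of a two's-complement ACC of width word_bits
    (the width is an assumption for this sketch).
    """
    a0 = (acc >> (word_bits - 1)) & 1
    # Same selection as the sum-of-products form on the slide:
    # (not a0) * (PC + 1) + a0 * Ad(IR)
    return (1 - a0) * (pc + 1) + a0 * branch_address


print(next_pc_ban(10, 0x8000, 40))  # 40, ACC negative: branch taken
print(next_pc_ban(10, 0x0005, 40))  # 11, ACC non-negative: fall through
```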

INSTRUCTION FETCH & EXECUTION (3)


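[Figure: data paths used during FETCH and EXECUTE. The diagram shows transfers among PC (increment by 1), MAR (to memory), MDR (from memory), IR, Ad(IR), X, the ALU and A, each labelled with the clock period T0 to T3 in which it occurs.]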

PENTIUM LAYOUT
[Figure: Pentium die layout. Labelled blocks: Instruction Fetch, Code TLB, Clock Driver, Code Cache, Instruction Decode, Branch Prediction Logic, Control Logic, Complex Instruction Support, Pipelined Floating Point, Bus Interface Logic, Data TLB, Data Cache, Superscalar Integer Execution Units.]

TLB - Translation Lookaside Buffer

PENTIUM BLOCK DIAGRAM


[Figure: Pentium block diagram. Blocks: bus interface (64-bit data bus, 32-bit address bus), 8 KB data cache with TLB, 8 KB code cache with TLB, BTB, microcode ROM, prefetch buffer, instruction decode, control unit, u pipeline, v pipeline, floating-point pipeline, registers.]

TLB - Translation Lookaside Buffer
BTB - Branch Target Buffer

THE i386 REGISTERS


[Figure: i386 register set, listed by group. Names in parentheses are the 16-bit or 8-bit parts of the 32-bit registers.]

Instruction/stack pointer (bits 31-0; low word bits 15-0): EIP (IP), ESP (SP)
EFLAG register (bits 31-0; low word bits 15-0): EFLAG (FLAG)
General-purpose registers (bits 31-0): EAX (AH, AL), EBX (BH, BL), ECX (CH, CL), EDX (DH, DL), ESI (SI), EDI (DI), EBP (BP)
Segment registers (bits 15-0): CS, SS, DS, ES, FS, GS
Memory management registers (selector bits 15-0, base address bits 31-0, limit bits 19-0):
  TR:   TSS select, TSS base address, TSS limit
  LDTR: LDT select, LDT base address, LDT limit
  IDTR: IDT base address, IDT limit
  GDTR: GDT base address, GDT limit
Control registers (bits 31-0): CR0, CR1, CR2, CR3
Debug registers (bits 31-0): DR0, DR1, DR2, DR3, DR4, DR5, DR6, DR7
Test registers (bits 31-0): TR6, TR7

THE REGISTERS OF THE PENTIUM INTEGER UNIT


[Figure: registers of the Pentium integer unit, listed by group. Names in parentheses are the 16-bit or 8-bit parts of the 32-bit registers.]

Instruction/stack pointer (bits 31-0; low word bits 15-0): EIP (IP), ESP (SP)
EFLAG register (bits 31-0; low word bits 15-0): EFLAG (FLAG)
General-purpose registers (bits 31-0): EAX (AH, AL), EBX (BH, BL), ECX (CH, CL), EDX (DH, DL), ESI (SI), EDI (DI), EBP (BP)
Segment registers (bits 15-0): CS, SS, DS, ES, FS, GS
Memory management registers (selector bits 15-0, base address bits 31-0, limit bits 19-0):
  TR:   TSS select, TSS base address, TSS limit
  LDTR: LDT select, LDT base address, LDT limit
  IDTR: IDT base address, IDT limit
  GDTR: GDT base address, GDT limit
Control registers (bits 31-0): CR0, CR1, CR2, CR3, CR4
Debug registers (bits 31-0): DR0, DR1, DR2, DR3, DR4, DR5, DR6, DR7
Model-specific registers (bits 63-0): Machine check address register, Machine check type register, TR12

PowerPC Architecture 440GP Example


PowerPC 440xx Core

128-bit Processor Local Bus
Superscalar: two instructions per cycle, out-of-order execution
7-stage pipeline
  Three independent pipelines
Caches
  32 KB instruction cache & 32 KB data cache
  64-way set associative, 32-byte line
Dynamic branch prediction
Interrupt controller supports 12 external and 20 internal interrupts
36-bit real address
64-entry, unified, fully associative TLB
  4-entry instruction TLB; 8-entry data TLB

[Figure: PowerPC 440xx core block diagram. Blocks: I-Cache, D-Cache, I-Cache Control, D-Cache Control, MMU, Instruction Unit, Branch Unit, Complex Integer Pipe, Simple Integer Pipe, Load/Store Pipe, Timers, Power Management, Interrupts, Debug/Trace.]

MAIN REGISTERS OF THE PowerPC


Floating-point registers (bits 0-63): FR0, FR1, ..., FR31
General purpose registers (bits 0-31): R0, R1, ..., R31
Condition register (bits 0-31): CR
Integer-exception register (bits 0-31): XER

PIPELINING
A TECHNIQUE OF OVERLAPPED PROCESSING IN ORDER TO INCREASE FREQUENCY OF TASK COMPLETIONS: A PIPELINE WORKS LIKE AN ASSEMBLY LINE. THERE ARE INSTRUCTION, ARITHMETIC AND PROCESSOR PIPELINES.

EXAMPLE OF A PIPELINE OF ADDERS


X_i = A_i + B_i + C_i + D_i,   i = 1, 2, ..., n
Tc - clock cycle for each addition (e.g., 20 ns)

a) A GENERAL ARITHMETIC AND LOGIC UNIT
[Figure: a single ALU computes each X_i.]
Tp - processing time
Each X_i needs three additions, so n results need 3n additions.
If each addition requires five clock cycles: Tp = 3n × 5Tc = 15nTc
Even if each addition takes only one cycle: Tp = 3nTc

b) A PIPELINE OF ADDERS
[Figure: three adders in series. The first adds A_i and B_i, the second adds C_i, the third adds D_i and delivers X_i.]
Tpipe - pipeline processing time
k - the number of stages in the pipeline
Each addition requires one clock cycle: Tpipe = kTc + (n - 1)Tc = (k + n - 1)Tc
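
The two timing formulas can be compared numerically with a short sketch. The value n = 1000 is an assumption chosen for illustration; Tc = 20 ns is the example value from the slide, and k = 3 corresponds to the three adders.

```python
def t_single_alu(n, tc, cycles_per_add=1):
    """Tp = 3n * cycles_per_add * Tc: three additions per result on one ALU."""
    return 3 * n * cycles_per_add * tc


def t_pipeline(n, tc, k):
    """Tpipe = (k + n - 1) * Tc: k cycles to fill, then one result per cycle."""
    return (k + n - 1) * tc


n, tc, k = 1000, 20, 3   # n is an assumed workload size; tc in nanoseconds

print(t_single_alu(n, tc, cycles_per_add=5))  # 300000 ns
print(t_single_alu(n, tc))                    # 60000 ns
print(t_pipeline(n, tc, k))                   # 20040 ns
print(round(t_single_alu(n, tc) / t_pipeline(n, tc, k), 3))  # 2.994
```

For large n the speedup over the one-cycle ALU approaches k, the number of pipeline stages.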

EXAMPLE OF A FIVE STAGE INSTRUCTION PIPELINE OF AN 80X86

Cycle       Stage               Activity
Cycle n     Instruction fetch   ADD eax, mem32
Cycle n+1   Decode              Decode ADD
Cycle n+2   Memory access       Fetch mem32
Cycle n+3   Execution           Add eax + mem32
Cycle n+4   Write-back          Result to eax

eax - general purpose register

EXAMPLE OF AN EIGHT STAGE FLOATING POINT PIPELINE


Each entry is the number of the instruction occupying that stage in that cycle.

Cycle   IF    D1    D2    EX    WB/X1  X2    WF    ER    Result
n       k     k-1   k-2   k-3   k-4    k-5   k-6   k-7   k-7
n+1     k+1   k     k-1   k-2   k-3    k-4   k-5   k-6   k-6
n+2     k+2   k+1   k     k-1   k-2    k-3   k-4   k-5   k-5
n+3     k+3   k+2   k+1   k     k-1    k-2   k-3   k-4   k-4
n+4     k+4   k+3   k+2   k+1   k      k-1   k-2   k-3   k-3
n+5     k+5   k+4   k+3   k+2   k+1    k     k-1   k-2   k-2
n+6     k+6   k+5   k+4   k+3   k+2    k+1   k     k-1   k-1
n+7     k+7   k+6   k+5   k+4   k+3    k+2   k+1   k     k

IF - Instruction Fetch
D1, D2 - First / Second Decoding Stage (operand addresses)
EX, X1, X2 - Execution (in three stages)
WB - Write Back (update the status register)
WF - Write Floating-Point Register, rounding-off
ER - Error checking
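
The occupancy pattern of the eight-stage pipeline follows one rule: in cycle n + j, the stage at depth d holds instruction k + j - d. A minimal sketch that regenerates the rows is shown below; the function names are illustrative, not from the lecture.

```python
STAGES = ["IF", "D1", "D2", "EX", "WB/X1", "X2", "WF", "ER"]


def occupancy(cycle_offset):
    """Map each stage to the instruction offset (relative to k) it holds
    in cycle n + cycle_offset, assuming one instruction enters per cycle
    and no stalls occur."""
    return {stage: cycle_offset - depth for depth, stage in enumerate(STAGES)}


def label(offset):
    """Format an offset as k, k+1, k-2, ..."""
    return "k" if offset == 0 else f"k{offset:+d}"


for j in range(8):
    row = occupancy(j)
    cells = " ".join(f"{label(row[s]):>5}" for s in STAGES)
    print(f"n+{j} {cells}  result {label(row['ER'])}")
```

The result column repeats the ER entry because an instruction's result becomes available as it leaves the last stage.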

Execution Scrabble diagrams for 5 generations of microprocessors


[Figure: execution diagrams for Generation 0 through Generation 5 and for the dataflow model. Each row is one instruction, time runs left to right, and each cell is one stage. In the Generation 5 panel the write stages follow the execute stages after some latency.]

F - Fetch instruction
D - Decode
A - Address calculation
R - Read operands
E - Execute
W - Write result

Generation 3 (each instruction enters the pipeline one cycle after its predecessor):
Instruction 1   F D A R E W
Instruction 2     F D A R E W
Instruction 3       F D A R E W
Instruction 4         F D A R E W

A SIMPLE SEQUENCER

[Figure: a simple sequencer built from a fetch sequence followed by separate add, multiply and divide sequences.]

A SEQUENCER USING A FIXED COUNTER

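[Figure: a sequencer using a fixed counter. Labelled elements: clock, counter (count), decoder, control logic with status inputs and instruction lines (e.g., Add, Multiply), and the resulting control sequences.]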

CPU BEHAVIOR REPRESENTED AS A SINGLE CLOSED LOOP


Step 1  Transfer program counter to memory address register
Step 2  Fetch the instruction from main memory
Step 3  Increment program counter and decode instruction
Step 4  Transfer operand address to memory address register
Step 5  Fetch the operand(s) from main memory
Step 6  Perform operation specified by the instruction, then return to Step 1

FLOWCHART OF CONTROL UNIT


[Figure: control unit shown for the instruction ADD X, ACC. Labelled elements: power switch, start switch, stop switch, stop-start flip-flop G, fetch-execute flip-flop Q, clock pulses, counter T, decoder, control logic issuing the control signals, memory, MAR, MDR, IR (opcode and address fields), X, adder, ACC.]

OPERATION OF AN EIGHT-INSTRUCTION CPU


Start
CPU active?   No: Stop.   Yes: continue with the fetch cycle.

Fetch Cycle
  AR <- PC
  Read M
  PC <- PC + 1, IR <- DR(OP)

Execute Cycle (the opcode is tested in this order)
  STORE ?   AR <- DR(ADR); DR <- AC; Write M
  LOAD ?    AR <- DR(ADR); Read M; AC <- DR
  ADD ?     AR <- DR(ADR); Read M; AC <- AC + DR
  AND ?     AR <- DR(ADR); Read M; AC <- AC AND DR
  JUMP ?    PC <- DR(ADR)
  JUMPZ ?   if AC = 0 then PC <- DR(ADR)
  COMP ?    AC <- NOT AC
  otherwise (RSHIFT): right-shift AC

After the execute cycle, control returns to the "CPU active?" test.

AC      = Accumulator
AR      = Memory Address Register
DR      = Memory Data Register
DR(OP)  = Opcode Field of DR
DR(ADR) = Address Field of DR
IR      = Instruction Register
M       = Memory
PC      = Program Counter
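
The fetch and execute cycles of the eight-instruction CPU can be sketched as a small interpreter. The (opcode, address) word format, the 16-bit accumulator width, the logical right shift and the sample program are assumptions for illustration; the lecture does not specify them.

```python
MASK = 0xFFFF  # assumed 16-bit accumulator


def step(state, memory):
    """Run one fetch cycle and one execute cycle."""
    # Fetch cycle: AR <- PC; Read M; PC <- PC + 1; IR <- DR(OP)
    ar = state["PC"]
    op, adr = memory[ar]          # hypothetical (opcode, address) word
    state["PC"] += 1
    ac = state["AC"]
    # Execute cycle
    if op == "STORE":
        memory[adr] = ac                          # DR <- AC; Write M
    elif op == "LOAD":
        state["AC"] = memory[adr]                 # AC <- DR
    elif op == "ADD":
        state["AC"] = (ac + memory[adr]) & MASK   # AC <- AC + DR
    elif op == "AND":
        state["AC"] = ac & memory[adr]            # AC <- AC AND DR
    elif op == "JUMP":
        state["PC"] = adr                         # PC <- DR(ADR)
    elif op == "JUMPZ":
        if ac == 0:
            state["PC"] = adr
    elif op == "COMP":
        state["AC"] = ~ac & MASK                  # AC <- NOT AC
    elif op == "RSHIFT":
        state["AC"] = ac >> 1                     # logical shift (assumed)
    else:
        raise ValueError(f"unknown opcode: {op}")


# Sample program (assumed): M[7] <- (M[5] + M[6]) >> 1
memory = [("LOAD", 5), ("ADD", 6), ("RSHIFT", 0), ("STORE", 7), 0, 6, 4, 0]
state = {"PC": 0, "AC": 0}
for _ in range(4):   # the "CPU active?" test is modelled as a fixed step count
    step(state, memory)
print(state["AC"], memory[7])  # 5 5
```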

STRUCTURE OF A SIMPLE EIGHT-INSTRUCTION CPU

[Figure: main memory M, MDR, MAR, PC, IR, AC and the arithmetic-logic circuits are connected by data paths gated by the control signals c0 to c12, where c3 is Read and c4 is Write. The control unit generates c0, c1, ..., c12; an AC = 0 condition line is also shown.]

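HARDWIRED CPU CONTROL UNIT WITH A SEQUENCE COUNTER FOR AN EIGHT-INSTRUCTION CPU

[Figure: a modulo-8 sequence counter produces the phase signals F1 to F8. The instruction decoder decodes IR into the lines LOAD, STORE, ADD, AND, JUMP, JUMPZ, COMP and RSHIFT. A combinational circuit N combines the phase signals, the decoded instruction lines and the AC = 0 condition (for JUMPZ) to generate the control signals c1 to c12.]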
