Eng Ca08

HUMBOLDT-UNIVERSITÄT ZU BERLIN
INSTITUT FÜR INFORMATIK
COMPUTER ARCHITECTURE
Lecture 8
The Processing Unit, Microprogrammed Control And Pipelining

Summer semester 2002, Lecturer: Prof. Dr. Miroslaw Malek
www.informatik.hu-berlin.de/rok/ca
CA - VIII - PU&MC - 1
THE PROCESSING UNIT AND MICROPROGRAMMED CONTROL
- FUNDAMENTAL CONCEPTS
- SEQUENCING OF CONTROL SIGNALS
- CONCEPTS OF MICROPROGRAMMING
- MICROINSTRUCTION FORMATS
- HARDWIRED vs. MICROPROGRAMMED CONTROL UNIT
CONTROL UNIT (1)

The control unit has two major functions:
- to control the sequencing of the information-processing tasks performed by the machine;
- to guide and supervise each unit, making sure that each unit carries out every operation assigned to it at the proper time.
Control of a computer can be distributed or centralized:
- Early computers used distributed control and a lot of redundant hardware.
- Contemporary small computers use a central control unit.
- Contemporary large computers use distributed control and highly specialized subsystems (e.g., I/O channels, I/O devices), which sometimes even form separate computers.
- Distributed control is widely used in computers with multiprogramming capabilities.
CONTROL UNIT (2)

Two basic types of control units are used:
- hardwired control units
- microprogrammed control units
A typical control unit consists mainly of registers and combinational execution logic (in hardwired systems) and, additionally, a fast control memory (in microprogrammed systems).
BASIC PROCESSING UNIT ORGANIZATION

[Figure: registers R(i), Y and Z, each connected to the bus through input and output gates, together with the ALU (inputs A and B), whose result is stored in Z.]
All gating signals R(i)in, R(i)out, Yin, Yout, Zin and Zout are generated by the CONTROL UNIT.
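A minimal sketch of how such gating signals move data during one control step, assuming a single bus where Y feeds ALU input A, the bus feeds input B, and the ALU result is latched into Z. The register names R1 to R3, the list-of-signal-names format and the helper `step` are illustrative assumptions, not part of the original slides.

```python
# Minimal single-bus datapath sketch (hypothetical register names R1..R3).
# Assumption: ALU input A comes from Y, input B from the bus, result goes to Z.

def step(regs, signals, alu_op=None):
    """Apply one control step given the list of active gating signals."""
    drivers = [s[:-3] for s in signals if s.endswith("out")]
    if len(drivers) > 1:
        raise ValueError("only one register may drive the bus in a step")
    bus = regs[drivers[0]] if drivers else None
    for s in signals:
        if s == "Zin":
            # Z latches the ALU result computed from Y (input A) and the bus (input B)
            regs["Z"] = alu_op(regs["Y"], bus)
        elif s.endswith("in"):
            # Any other <reg>in signal latches the bus value into that register
            regs[s[:-2]] = bus
    return regs


# Control sequence for R3 <- R1 + R2
regs = {"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0}
step(regs, ["R1out", "Yin"])                              # R1 -> bus -> Y
step(regs, ["R2out", "Zin"], alu_op=lambda a, b: a + b)   # Y + R2 -> Z
step(regs, ["Zout", "R3in"])                              # Z -> bus -> R3
print(regs["R3"])  # 12
```

Allowing only one `...out` signal per step mirrors the single-bus restriction: every transfer needs its own control step, which is why even a simple addition takes three steps here.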
PROCESSING UNIT FEATURES

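[Figure: processing unit with MAR (to the memory selection circuits), MDR (control signals to or from memory; data from or to the arithmetic register ACC), the program counter PC (loaded from the address field for branch instructions), the instruction register IR with op-code and address fields, the decoder and the control signal generator.]
IR - current instruction register; PC - current program register (program counter)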

SEQUENCING OF OPERATIONS (1)

THE SEQUENCER generates, for any given instruction op-code, the specific sequence of control signals as its output.
VARIATIONS IN THE SEQUENCER DESIGN
SYNCHRONOUS (clocked) vs. ASYNCHRONOUS
- synchronous: ring counters for every instruction, or binary counters
- asynchronous: delay elements with variable delays
SEQUENCING OF OPERATIONS (2)
FIXED vs. VARIABLE
Fixed:
1) a fixed binary counter and decoder whose cycle length corresponds to the longest control sequence (slower but less expensive)
2) a fixed counter whose cycle length is determined by the fetch phase; longer control sequences use multiple cycles of the counter
Variable:
1) separate ring counters for each group of instructions with an identical sequence
2) a selectable-modulo counter, where the length of the counter cycle is determined by the instruction to be executed (the counter is reset after the instruction has been executed)
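To make the cost trade-off concrete, here is a sketch that counts the clock steps needed by three of the counter schemes above. The per-instruction step counts, the op-code names and the 4-step fetch cycle are made-up assumptions for illustration only.

```python
# Hypothetical control-sequence lengths (T-steps, fetch included); illustrative only.
STEPS = {"LOAD": 6, "ADD": 6, "JUMP": 4, "MUL": 12}


def fixed_counter_cycles(program):
    """Fixed binary counter: every instruction uses the maximum-length cycle."""
    return max(STEPS.values()) * len(program)


def fetch_length_counter_cycles(program, cycle_length=4):
    """Fixed counter whose cycle equals the fetch phase (assumed 4 steps);
    longer sequences run for several whole counter cycles (ceiling division)."""
    return sum(-(-STEPS[op] // cycle_length) * cycle_length for op in program)


def selectable_modulo_cycles(program):
    """Selectable-modulo counter: reset as soon as the instruction's own sequence ends."""
    return sum(STEPS[op] for op in program)


program = ["LOAD", "ADD", "JUMP", "MUL", "ADD"]
print(fixed_counter_cycles(program))         # 60
print(fetch_length_counter_cycles(program))  # 40
print(selectable_modulo_cycles(program))     # 34
```

The fixed scheme wastes steps on every short instruction, while the selectable-modulo counter wastes none but needs extra logic to select the modulus per op-code.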
INSTRUCTION FETCH & EXECUTION

FETCH PHASE
T0| PC → MAR, 1 → R (set memory to READ)
T1| M(MAR) → MDR
T2| MDR → IR; decode; restore memory
T3| (PC + 1) → PC
EXECUTION PHASE
T0| 1 → R; Ad(IR) → MAR
T1| M(MAR) → MDR
T2| MDR → X; restore memory
T3| (X + A) → A
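The register transfers above can be replayed step by step in a small simulation. This sketch assumes a word-addressed memory held in a Python list, an instruction word represented as a tuple `(opcode, address)` so that Ad(IR) is its second field, and a result written back to the accumulator A; all of these are representation choices for illustration, not part of the slides.

```python
def fetch(cpu, mem):
    # T0: PC -> MAR, 1 -> R (memory set to READ)
    cpu["MAR"] = cpu["PC"]
    cpu["R"] = 1
    # T1: M(MAR) -> MDR
    cpu["MDR"] = mem[cpu["MAR"]]
    # T2: MDR -> IR; decode; restore memory (no-op for this non-destructive list model)
    cpu["IR"] = cpu["MDR"]
    # T3: (PC + 1) -> PC
    cpu["PC"] += 1


def execute_add(cpu, mem):
    _, address = cpu["IR"]          # Ad(IR): the address field of the instruction
    # T0: 1 -> R; Ad(IR) -> MAR
    cpu["R"] = 1
    cpu["MAR"] = address
    # T1: M(MAR) -> MDR
    cpu["MDR"] = mem[cpu["MAR"]]
    # T2: MDR -> X; restore memory (no-op here)
    cpu["X"] = cpu["MDR"]
    # T3: (X + A) -> A
    cpu["A"] = cpu["X"] + cpu["A"]


mem = [("ADD", 2), None, 40]        # instruction at address 0, operand at address 2
cpu = {"PC": 0, "MAR": 0, "MDR": None, "IR": None, "X": 0, "A": 2, "R": 0}
fetch(cpu, mem)
execute_add(cpu, mem)
print(cpu["A"], cpu["PC"])  # 42 1
```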
INSTRUCTION FETCH & EXECUTION (2)

COMBINED FETCH / EXECUTE, e.g.:
1) BAN - branch if ACC negative (A0: sign bit of ACC)
T0| PC → MAR, 1 → R (fetch)
T1| M(MAR) → MDR
T2| MDR → IR; decode; restore memory
T3| ¬A0·(PC + 1) + A0·Ad(IR) → PC (execute)
2) Register reference instruction (increment ACC by 1)
T0| PC → MAR, 1 → R (fetch)
T1| M(MAR) → MDR
T2| MDR → IR; restore memory; decode
T3| (A + 1) → A; (PC + 1) → PC (execute)
INDIRECT ADDRESSING & DEFER PHASE (the execution phase is delayed)
T0| 1 → R
T1| Ad(IR) → MAR
T2| M(MAR) → MDR
T3| MDR → Ad(IR)
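The BAN transfer in T3 is a two-way multiplexer controlled by the accumulator's sign bit A0: either PC + 1 or Ad(IR) reaches PC. A sketch assuming a 16-bit two's-complement accumulator whose most significant bit is A0 (the word size and helper names are assumptions):

```python
WORD = 16  # assumed accumulator width


def sign_bit(acc):
    """A0: the most significant bit of the accumulator (1 = negative)."""
    return (acc >> (WORD - 1)) & 1


def ban_next_pc(pc, acc, target):
    """T3 of BAN: PC <- not(A0)*(PC + 1) + A0*Ad(IR).
    Exactly one of the two products is non-zero, so this selects one source."""
    a0 = sign_bit(acc)
    return (1 - a0) * (pc + 1) + a0 * target


print(ban_next_pc(10, 0x0005, 200))  # 11  (ACC = +5: fall through)
print(ban_next_pc(10, 0xFFFB, 200))  # 200 (ACC = -5: branch taken)
```

Folding the branch into T3 lets BAN complete in a single four-step cycle, with no separate execution phase.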
INSTRUCTION FETCH & EXECUTION (3)

[Figure: fetch and execute data paths between PC (incremented by 1), MAR, memory, MDR, IR, Ad(IR), X, the ALU and A, labelled with the timing steps T0 to T3.]
PENTIUM LAYOUT
[Figure: Pentium die layout with instruction fetch, code TLB, clock driver, code cache, instruction decode, branch prediction logic, control logic, complex instruction support, pipelined floating point, bus interface logic, data TLB, data cache and superscalar integer execution units.]
TLB - Translation Lookaside Buffer
PENTIUM BLOCK DIAGRAM

[Figure: Pentium block diagram with a 64-bit data bus and a 32-bit address bus at the bus interface; an 8 KB code cache and an 8 KB data cache, each with its own TLB; prefetch buffer, instruction decode, BTB, control unit and microcode ROM; the u and v integer pipelines, the floating-point pipeline and the registers.]
TLB - Translation Lookaside Buffer; BTB - Branch Target Buffer
THE i386 REGISTERS

[Figure: the i386 register set]
General-purpose registers (32 bits): EAX, EBX, ECX, EDX, ESI, EDI, EBP; their low parts include SI, DI, BP and the 8-bit registers AH/AL, BH/BL, CH/CL, DH/DL
Instruction/stack pointer (32 bits): EIP, ESP, with low 16 bits IP, SP
EFLAG register (32 bits), with low 16 bits FLAG
Segment registers (16 bits): CS, SS, DS, ES, FS, GS
Memory management registers: TR (TSS selector, base address, limit), LDTR (LDT selector, base address, limit), IDTR (IDT base address, limit), GDTR (GDT base address, limit)
Control registers (32 bits): CR0, CR1, CR2, CR3
Debug registers (32 bits): DR0 to DR7
Test registers (32 bits): TR6, TR7
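On the i386 the 16-bit and 8-bit registers are the low parts of the 32-bit ones: AX is bits 15..0 of EAX, AH is bits 15..8 and AL is bits 7..0. A sketch of that aliasing with bit masks; the helper function names are hypothetical.

```python
def ax(eax):
    """AX = bits 15..0 of EAX."""
    return eax & 0xFFFF


def ah(eax):
    """AH = bits 15..8 of EAX."""
    return (eax >> 8) & 0xFF


def al(eax):
    """AL = bits 7..0 of EAX."""
    return eax & 0xFF


def write_al(eax, value):
    """Writing AL changes only bits 7..0 of EAX; bits 31..8 are preserved."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0x12345678
print(hex(ax(eax)), hex(ah(eax)), hex(al(eax)))  # 0x5678 0x56 0x78
print(hex(write_al(eax, 0xAB)))                  # 0x123456ab
```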

THE REGISTERS OF THE PENTIUM INTEGER UNIT

[Figure: the register set of the Pentium integer unit]
General-purpose registers (32 bits): EAX, EBX, ECX, EDX, ESI, EDI, EBP; their low parts include SI, DI, BP and the 8-bit registers AH/AL, BH/BL, CH/CL, DH/DL
Instruction/stack pointer (32 bits): EIP, ESP, with low 16 bits IP, SP
EFLAG register (32 bits), with low 16 bits FLAG
Segment registers (16 bits): CS, SS, DS, ES, FS, GS
Memory management registers: TR, LDTR (TSS and LDT selectors), IDTR, GDTR, each with a base address and a limit
Control registers (32 bits): CR0, CR1, CR2, CR3, CR4
Debug registers (32 bits): DR0 to DR7
Model-specific registers (64 bits): machine check address register, machine check type register, TR12
PowerPC Architecture 440GP Example

- 128-bit Processor Local Bus
- Superscalar: two instructions per cycle, out-of-order execution
- 7-stage pipeline
- Three independent pipelines: complex integer pipe, simple integer pipe, load/store pipe
- Caches: 32 KB instruction and 32 KB data cache, 64-way set associative, 32-byte lines
- Dynamic branch prediction
- Interrupt controller supports 12 external and 20 internal interrupts
- 36-bit real address
- MMU: 64-entry, unified, fully associative TLB; 4-entry instruction TLB; 8-entry data TLB
[Figure: PowerPC 440xx core with I-cache and D-cache (each with its own controller), MMU, instruction unit, branch unit, complex integer pipe, simple integer pipe, load/store pipe, timers, power management, interrupts and debug/trace.]
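The cache parameters above determine the cache geometry. This worked sketch computes the number of lines and sets and the address-field widths, assuming a conventional set-associative lookup on the 36-bit real address; the actual indexing used by the 440 core may differ.

```python
def cache_geometry(size_bytes, ways, line_bytes, address_bits):
    """Return (lines, sets, offset_bits, index_bits, tag_bits) for a
    set-associative cache; all sizes must be powers of two."""
    for v in (size_bytes, ways, line_bytes):
        assert v & (v - 1) == 0, "sizes must be powers of two"
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = (line_bytes - 1).bit_length()   # log2(line size)
    index_bits = (sets - 1).bit_length()          # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return lines, sets, offset_bits, index_bits, tag_bits


print(cache_geometry(32 * 1024, 64, 32, 36))  # (1024, 16, 5, 4, 27)
```

With 64 ways, only 16 sets remain, so just 4 address bits select a set and each lookup compares 64 tags in parallel.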

MAIN REGISTERS OF THE PowerPC

Floating-point registers FR0 to FR31 (64 bits, bits 0 to 63)
General-purpose registers R0 to R31 (32 bits, bits 0 to 31)
Condition register CR (32 bits, bits 0 to 31)
Integer-exception register XER (32 bits, bits 0 to 31)
PIPELINING
A TECHNIQUE OF OVERLAPPED PROCESSING THAT INCREASES THE FREQUENCY OF TASK COMPLETIONS: A PIPELINE WORKS LIKE AN ASSEMBLY LINE. THERE ARE INSTRUCTION, ARITHMETIC AND PROCESSOR PIPELINES.
EXAMPLE OF A PIPELINE OF ADDERS

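$X_i = A_i + B_i + C_i + D_i$, $i = 1, 2, \ldots, n$
$T_c$ - clock cycle for each addition (e.g., 20 ns)
a) A GENERAL ARITHMETIC AND LOGIC UNIT
[Figure: a single ALU computes every $X_i$.]
$T_p$ - processing time. Each $X_i$ needs three additions. If each addition requires five clock cycles:
$T_p = 3n \times 5T_c = 15nT_c$
Even if each addition takes only one cycle: $T_p = 3nT_c$
b) A PIPELINE OF ADDERS
[Figure: three adders in series: $A_i + B_i$, then $+ C_i$, then $+ D_i$, producing $X_i$.]
$T_{pipe}$ - pipeline processing time; $k$ - the number of stages in the pipeline. Each addition requires one clock cycle:
$T_{pipe} = kT_c + (n-1)T_c = (k+n-1)T_c$
With $k = 3$ adders this gives $T_{pipe} = (n+2)T_c$.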
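A sketch that simulates the three-adder pipeline cycle by cycle and checks the cycle count against $T_{pipe} = (k+n-1)T_c$. The 20 ns clock is the example value from the slide; the latch representation and test data are assumptions for illustration.

```python
from collections import deque

TC_NS = 20  # example clock cycle per addition from the slide (20 ns)
K = 3       # three adders: A+B, then +C, then +D


def run_adder_pipeline(operands):
    """Simulate X_i = A_i + B_i + C_i + D_i on a 3-stage adder pipeline.

    stages[j] is the latch after adder j: (partial_sum, operands still to add).
    Every clock cycle each adder performs one addition and all latches advance.
    Returns (results, clock_cycles_used).
    """
    pending = deque(operands)
    stages = [None] * K
    results, cycles = [], 0
    while pending or any(s is not None for s in stages):
        cycles += 1
        new = [None] * K
        # Adders 1..K-1 add the next operand to the partial sum from the previous latch
        for j in range(1, K):
            if stages[j - 1] is not None:
                partial, rest = stages[j - 1]
                new[j] = (partial + rest[0], rest[1:])
        # Adder 0 starts a new tuple each cycle while input remains
        if pending:
            a, b, c, d = pending.popleft()
            new[0] = (a + b, (c, d))
        # The last latch now holds a finished X_i: emit it and free the latch
        if new[K - 1] is not None:
            results.append(new[K - 1][0])
            new[K - 1] = None
        stages = new
    return results, cycles


n = 100
ops = [(i, 2 * i, 3 * i, 4 * i) for i in range(1, n + 1)]
xs, cycles = run_adder_pipeline(ops)
print(cycles, cycles * TC_NS)   # 102 2040  -> (k + n - 1) Tc with k = 3
print(3 * n * 5 * TC_NS)        # 30000     -> Tp = 15 n Tc on the multi-cycle ALU
```

After the pipeline fills (k cycles), one $X_i$ leaves per cycle, so for large $n$ the speed-up over the five-cycle ALU approaches 15.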
EXAMPLE OF A FIVE-STAGE INSTRUCTION PIPELINE OF AN 80X86

Cycle      Stage               ADD eax, mem32
n          Instruction fetch   fetch ADD eax, mem32
n+1        Decode              decode ADD
n+2        Memory access       fetch mem32
n+3        Execution           add eax + mem32
n+4        Write-back          result to eax
eax - general-purpose register
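A sketch that walks `ADD eax, mem32` through the five stages, doing each stage's work in its own cycle as in the table above. The dictionary-based register file, the memory address 0x1000 and its value are hypothetical; a real 80x86 fetches and decodes binary encodings, which this sketch skips.

```python
def trace_add(regs, mem, addr, n=0):
    """Walk ADD eax, mem32 through the five stages starting at cycle n.
    Returns a list of (cycle, stage, action) tuples and updates regs."""
    trace = []
    # Cycle n: instruction fetch
    trace.append((n, "Instruction fetch", "ADD eax, mem32"))
    # Cycle n+1: decode ADD
    trace.append((n + 1, "Decode", "decode ADD"))
    # Cycle n+2: memory access reads the 32-bit operand
    operand = mem[addr]
    trace.append((n + 2, "Memory access", f"fetch mem32 = {operand}"))
    # Cycle n+3: the ALU adds, wrapping to 32 bits
    result = (regs["eax"] + operand) & 0xFFFFFFFF
    trace.append((n + 3, "Execution", f"eax + mem32 = {result}"))
    # Cycle n+4: write-back updates eax
    regs["eax"] = result
    trace.append((n + 4, "Write-back", "result to eax"))
    return trace


regs = {"eax": 30}
mem = {0x1000: 12}  # hypothetical address and value of mem32
for cycle, stage, action in trace_add(regs, mem, 0x1000):
    print(f"cycle n+{cycle}: {stage:<17} {action}")
print(regs["eax"])  # 42
```

In a pipelined CPU the next instruction would enter instruction fetch at cycle n+1, so up to five instructions are in flight at once.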
EXAMPLE OF AN EIGHT-STAGE FLOATING-POINT PIPELINE

Instruction in each stage, by clock cycle:
Stage    n    n+1  n+2  n+3  n+4  n+5  n+6  n+7
IF       k    k+1  k+2  k+3  k+4  k+5  k+6  k+7
D1       k-1  k    k+1  k+2  k+3  k+4  k+5  k+6
D2       k-2  k-1  k    k+1  k+2  k+3  k+4  k+5
EX       k-3  k-2  k-1  k    k+1  k+2  k+3  k+4
WB/X1    k-4  k-3  k-2  k-1  k    k+1  k+2  k+3
X2       k-5  k-4  k-3  k-2  k-1  k    k+1  k+2
WF       k-6  k-5  k-4  k-3  k-2  k-1  k    k+1
ER       k-7  k-6  k-5  k-4  k-3  k-2  k-1  k
Result   k-7  k-6  k-5  k-4  k-3  k-2  k-1  k
IF - instruction fetch; D1, D2 - first and second decoding stages (operand addresses); EX, X1, X2 - execution (in three stages); WB - write back (update the status register); WF - write floating-point register, rounding; ER - error checking
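The table above follows a single rule: in cycle n + j, stage s (counting IF as stage 0) holds instruction k + j - s. A sketch that regenerates the table from that rule, printing instruction numbers relative to k:

```python
STAGES = ["IF", "D1", "D2", "EX", "WB/X1", "X2", "WF", "ER"]


def occupancy(num_cycles=8, stages=STAGES):
    """Return {stage: [offset of the instruction relative to k, for cycles n .. n+num_cycles-1]}."""
    return {name: [j - s for j in range(num_cycles)] for s, name in enumerate(stages)}


def label(offset):
    """Format an offset as k, k+1, k-7, ..."""
    return "k" if offset == 0 else f"k{offset:+d}"


for name, row in occupancy().items():
    print(f"{name:<6}" + " ".join(f"{label(o):>4}" for o in row))
```

Each instruction needs eight cycles from IF to ER, yet once the pipeline is full a new result leaves every cycle.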
Execution Scrabble diagrams for 5 generations of microprocessors

[Figure: execution scrabble diagrams for five generations of microprocessors, showing successive instructions over time. The early generations use the stages F, D, E; later generations add A, R and W and overlap more instructions; generation 4 processes several instructions per stage at once; generation 5 follows a dataflow model with many concurrent E stages and the W results appearing after some latency.]
F - fetch instruction; D - decode; A - address calculation; R - read operands; E - execute; W - write result
A SIMPLE SEQUENCER
[Figure: a simple sequencer in which the fetch sequence is followed by one of the add, multiply or divide sequences.]
A SEQUENCER USING A FIXED COUNTER
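[Figure: a sequencer using a fixed counter. The clock drives a counter whose count is decoded; the control logic combines the decoder outputs with status signals and the decoded instruction (e.g., add, multiply) to produce the control sequences.]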
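A sketch of the fixed-counter sequencer: a modulo-N counter advanced by the clock, a decoder that turns the count into one-hot timing signals T0 to T(N-1), and control logic that ANDs a timing signal with a decoded op-code line. The counter length of 4 and the control equation X_in = T2 AND ADD (i.e., MDR → X in step T2 of an ADD) are hypothetical choices for illustration.

```python
N = 4  # assumed counter length (number of timing steps)


def decoder(count, n=N):
    """One-hot decoder: timing signal T_count is 1, all others are 0."""
    return [1 if i == count else 0 for i in range(n)]


def sequencer(opcode_lines, clock_ticks, n=N):
    """Yield (count, timing_signals, x_in) for each clock pulse."""
    count = 0
    for _ in range(clock_ticks):
        t = decoder(count, n)
        x_in = t[2] & opcode_lines["ADD"]   # control logic: AND of a timing and an op-code line
        yield count, t, x_in
        count = (count + 1) % n             # the fixed counter wraps after T(n-1)


for count, t, x_in in sequencer({"ADD": 1, "MUL": 0}, clock_ticks=6):
    print(count, t, x_in)
```

Every control signal of a hardwired unit is a sum of such products of timing signals, op-code lines and status bits.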
CPU BEHAVIOR REPRESENTED AS A SINGLE CLOSED LOOP

Step 1: Transfer the program counter to the memory address register.
Step 2: Fetch the instruction from main memory.
Step 3: Increment the program counter and decode the instruction.
Step 4: Transfer the operand address to the memory address register.
Step 5: Fetch the operand(s) from main memory.
Step 6: Perform the operation specified by the instruction, then return to Step 1.
FLOWCHART OF CONTROL UNIT

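[Figure: control unit with a fetch-execute flip-flop (Q), a stop-start flip-flop (G), start, stop and power switches, memory, MAR, MDR, IR (op-code and address fields), decoder, control logic producing the control signals, a counter T driven by clock pulses, register X, an adder and ACC; e.g., ADD X, ACC.]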
OPERATION OF AN EIGHT-INSTRUCTION CPU

Fetch cycle:
CPU active? No: stop. Yes: continue.
AR <- PC; Read M
PC <- PC + 1; IR <- DR(OP)
Execute cycle (branch on the op-code in IR):
LOAD:   AR <- DR(ADR); Read M; AC <- DR
STORE:  AR <- DR(ADR); DR <- AC; Write M
ADD:    AR <- DR(ADR); Read M; AC <- AC + DR
AND:    AR <- DR(ADR); Read M; AC <- AC ∧ DR
JUMP:   PC <- DR(ADR)
JUMPZ:  if AC = 0 then PC <- DR(ADR)
COMP:   AC <- ¬AC (complement)
RSHIFT: shift AC right
After each execute cycle, control returns to the fetch cycle.
AC = Accumulator
AR = Memory Address Register
DR = Memory Data Register
DR(OP) = Op-code field of DR
DR(ADR) = Address field of DR
IR = Instruction Register
M = Memory
PC = Program Counter
STRUCTURE OF A SIMPLE EIGHT-INSTRUCTION CPU
[Figure: data paths between main memory M, MAR, MDR, PC, IR, AC and the arithmetic-logic circuits, gated by control signals c0 to c12 from the control unit (c3 = Read, c4 = Write); the AC = 0 status line is fed back to the control unit.]
HARDWIRED CPU CONTROL UNIT: A SEQUENCE COUNTER OF AN EIGHT-INSTRUCTION CPU
[Figure: a modulo-8 sequence counter (outputs F1 to F8) and an instruction decoder fed from IR (outputs LOAD, STORE, ADD, AND, JUMP, JUMPZ, COMP, RSHIFT) drive a combinational circuit N, which also receives the AC = 0 status (for JUMPZ) and generates the control signals c1 to c12.]
