Tomasulo 2

Tomasulo’s Algorithm
Introduction
Tomasulo’s algorithm is a hardware algorithm for the dynamic scheduling of instructions. It allows instructions to execute
out of order and so makes more efficient use of multiple execution units.
Dynamic Scheduling
Dynamic scheduling is a technique in which the hardware rearranges the order of instruction execution to reduce stalls while
maintaining data flow and exception behaviour.
Dynamic scheduling introduces out-of-order execution: an instruction begins execution as soon as its data operands are
available, even if an earlier instruction is still stalled.
Advantages
• It handles some cases in which dependences are unknown at compile time.
• It simplifies the compiler and provides the basis for hardware speculation, which can improve performance further.
• It allows code that was compiled with one pipeline in mind to run efficiently on a different pipeline.
Good Dynamic Scheduling Needs More

1. Better handling of WAR and WAW dependences
2. Better handling of structural hazards
3. Better pipeline efficiency
Approaches to dynamic scheduling
• Tomasulo's algorithm
• Scoreboarding
We focus on the more sophisticated of the two, Tomasulo’s algorithm, which has several major enhancements over
scoreboarding.
Tomasulo’s Algorithm
• In Tomasulo’s algorithm, a common data bus (CDB) connects all the units and carries every result they produce.
• An instruction that needs a value produced by an earlier instruction can therefore capture it directly from the CDB
instead of waiting to read it from a register.
• Because results are forwarded in this way, the overall number of cycles is typically lower than with scoreboarding.
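As a rough illustration of the bullets above, the following Python sketch models a single CDB broadcast. The names WaitingOperand and broadcast are hypothetical, and the tags and the value 1.5 are only illustrative. Real hardware compares tags in all waiting stations in parallel rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WaitingOperand:
    """One source-operand slot: holds either a value or the tag of its producer."""
    value: Optional[float] = None
    tag: Optional[str] = None  # name of the unit that will produce the value


def broadcast(tag: str, result: float, slots: List[WaitingOperand]) -> int:
    """Place (tag, result) on the bus; every slot waiting on that tag captures it."""
    captured = 0
    for slot in slots:
        if slot.tag == tag:
            slot.value = result
            slot.tag = None  # the slot no longer waits on any producer
            captured += 1
    return captured


# Two consumers wait on load buffer "LD2"; a third waits on multiplier "ML1".
slots = [WaitingOperand(tag="LD2"), WaitingOperand(tag="LD2"), WaitingOperand(tag="ML1")]
captured = broadcast("LD2", 1.5, slots)
print(captured, slots)
```

A single broadcast satisfies every consumer of the result at once, which is why no consumer has to wait for the register file to be written first.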
Goals
• High performance without special compilers
What Tomasulo’s Algorithm Provides
• Better handling of WAR and WAW dependences
• Better handling of structural hazards
• Better pipeline efficiency
• Dynamic memory disambiguation
• Distributed scheduling logic
Reservation stations:
• A reservation station fetches and buffers each operand as soon as it is available.
• Reservation stations also provide register renaming for destination registers.
• They may be split, i.e. every functional unit has its own reservation stations, or unified, i.e. one common pool serves
all units.
• Each reservation station holds an instruction that has been issued and is awaiting execution at a functional unit. It
also stores either the operand values or the names of the reservation stations that will provide them.
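The contents of one reservation station can be sketched as a small record. The field names follow the column headers of the reservation-station tables in this document (Busy, Op, Vj, Vk, Qj, Qk, A). The class name ReservationStation and the ready() helper are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None    # operation to perform, e.g. "SUB.D"
    vj: Optional[float] = None  # value of the first operand, once known
    vk: Optional[float] = None  # value of the second operand, once known
    qj: Optional[str] = None    # station that will produce the first operand
    qk: Optional[str] = None    # station that will produce the second operand
    a: Optional[int] = None     # address field, used by loads and stores

    def ready(self) -> bool:
        """An issued instruction may execute once it waits on no producer."""
        return self.busy and self.qj is None and self.qk is None


# SUB.D has its second operand (0.5) but still waits on load buffer LD2.
sub = ReservationStation("AD1", busy=True, op="SUB.D", vk=0.5, qj="LD2")
ready_before = sub.ready()

# When LD2 delivers 1.5, the value replaces the tag and the station is ready.
sub.vj, sub.qj = 1.5, None
ready_after = sub.ready()
print(ready_before, ready_after)
```

Each operand slot holds either a value (V field) or a producer name (Q field), never both, which is how the station stores "the operand or the name of the station that will provide it".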
The following diagram shows the basic structure of a Tomasulo-based MIPS processor, including both the floating-point unit
and the load/store unit.
Figure 1: The basic structure of a MIPS floating-point unit using Tomasulo’s algorithm
Instructions are sent from the instruction unit into the instruction queue from which they are issued in FIFO order. The
reservation stations include the operation and the actual operands, as well as information used for detecting and resolving
hazards. Load buffers have three functions: hold the components of the effective address until it is computed, track
outstanding loads that are waiting on the memory, and hold the results of completed loads that are waiting for the CDB.
Similarly, store buffers have three functions: hold the components of the effective address until it is computed, hold the
destination memory addresses of outstanding stores that are waiting for the data value to store, and hold the address and
value to store until the memory unit is available.
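The effective address that a load or store buffer holds is the instruction's offset plus the contents of its base register. The short Python sketch below works this out for the two loads of the worked example, using the example's assumed register contents (R2 is 100, R3 is 200). The function name effective_address is hypothetical.

```python
def effective_address(offset: int, base_value: int) -> int:
    """Effective address = immediate offset + contents of the base register."""
    return offset + base_value


# Integer register contents assumed by the worked example.
base_regs = {"R2": 100, "R3": 200}

addr_ld1 = effective_address(34, base_regs["R2"])  # L.D F6, 34(R2)
addr_ld2 = effective_address(45, base_regs["R3"])  # L.D F2, 45(R3)
print(addr_ld1, addr_ld2)
```

These are the values that appear in the A field of load buffers LD1 and LD2.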
There are only three steps in Tomasulo’s approach:
1. Issue - Get the next instruction from the head of the instruction queue. If there is a matching reservation station that is
empty, issue the instruction to that station together with any operand values that are already available. For each operand
that is not yet available, record the name of the reservation station that will produce it (this renames registers).
2. Execute (EX) - When all the operands are available, execute the operation at the corresponding functional unit. If an
operand is not yet available, monitor the common data bus (CDB) while waiting for it to be computed.
3. Write result (WB) - When the result is available, write it on the CDB and from there into the registers and into any
reservation stations (including store buffers) waiting for this result. Stores also write data to memory during this step:
when both the address and data value are available, they are sent to the memory unit and the store completes.
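The renaming performed in the issue step can be sketched as follows. This is a minimal Python illustration, not a full implementation: the dictionaries regs and reg_status and the functions read_operand and issue are hypothetical names, and the check for a free reservation station is left out.

```python
from typing import Dict, Optional, Tuple

# Register values, and the register status table: which station, if any,
# will produce each register's next value.
regs: Dict[str, float] = {"F4": 2.5}
reg_status: Dict[str, Optional[str]] = {"F0": None, "F2": "LD2", "F4": None}


def read_operand(reg: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (value, None) if the register is ready, else (None, producer)."""
    producer = reg_status.get(reg)
    if producer is not None:
        return None, producer
    return regs[reg], None


def issue(station: str, op: str, dest: str, src1: str, src2: str) -> dict:
    """Fill a station with values or producer names, then rename dest."""
    vj, qj = read_operand(src1)
    vk, qk = read_operand(src2)
    reg_status[dest] = station  # later readers of dest now wait on this station
    return {"name": station, "busy": True, "op": op,
            "vj": vj, "vk": vk, "qj": qj, "qk": qk}


# Issue MUL.D F0, F2, F4 while F2 is still being loaded by LD2.
ml1 = issue("ML1", "MUL.D", "F0", "F2", "F4")
print(ml1, reg_status)
```

After the call, the station holds 2.5 for F4 and the name LD2 in place of F2, and F0 is marked as pending on ML1.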
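Cycle 1
Assume R2 is 100, R3 is 200, and F4 is 2.5.

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      -     -
2    L.D F2, 45(R3)      -      -     -
3    MUL.D F0, F2, F4    -      -     -
4    SUB.D F8, F2, F6    -      -     -
5    DIV.D F10, F0, F6   -      -     -
6    ADD.D F6, F8, F2    -      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   1     L.D    -     -     -     -     134
LD2   0
AD1   0
AD2   0
AD3   0
ML1   0
ML2   0

Register status
F0    F2    F4    F6    F8    F10
-     -     -     LD1   -     -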
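Cycle 2

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     -
2    L.D F2, 45(R3)      2      -     -
3    MUL.D F0, F2, F4    -      -     -
4    SUB.D F8, F2, F6    -      -     -
5    DIV.D F10, F0, F6   -      -     -
6    ADD.D F6, F8, F2    -      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   1     L.D    -     -     -     -     134
LD2   1     L.D    -     -     -     -     245
AD1   0
AD2   0
AD3   0
ML1   0
ML2   0

Register status
F0    F2    F4    F6    F8    F10
-     LD2   -     LD1   -     -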
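Cycle 3

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     -
2    L.D F2, 45(R3)      2      3     -
3    MUL.D F0, F2, F4    3      -     -
4    SUB.D F8, F2, F6    -      -     -
5    DIV.D F10, F0, F6   -      -     -
6    ADD.D F6, F8, F2    -      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   1     L.D    -     -     -     -     134
LD2   1     L.D    -     -     -     -     245
AD1   0
AD2   0
AD3   0
ML1   1     MUL.D  -     2.5   LD2   -     -
ML2   0

Register status
F0    F2    F4    F6    F8    F10
ML1   LD2   -     LD1   -     -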
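Cycle 4

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     -
3    MUL.D F0, F2, F4    3      -     -
4    SUB.D F8, F2, F6    4      -     -
5    DIV.D F10, F0, F6   -      -     -
6    ADD.D F6, F8, F2    -      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0     L.D    -     -     -     -     134
LD2   1     L.D    -     -     -     -     245
AD1   1     SUB.D  -     0.5   LD2   -     -
AD2   0
AD3   0
ML1   1     MUL.D  -     2.5   LD2   -     -
ML2   0

Register status
F0    F2    F4    F6    F8    F10
ML1   LD2   -     -     AD1   -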
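Cycle 5

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      -     -
4    SUB.D F8, F2, F6    4      -     -
5    DIV.D F10, F0, F6   5      -     -
6    ADD.D F6, F8, F2    -      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0     L.D    -     -     -     -     245
AD1   1     SUB.D  1.5   0.5   -     -     -
AD2   0
AD3   0
ML1   1     MUL.D  1.5   2.5   -     -     -
ML2   1     DIV.D  -     0.5   ML1   -     -

Register status
F0    F2    F4    F6    F8    F10
ML1   -     -     -     AD1   ML2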
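Cycle 6

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     -
4    SUB.D F8, F2, F6    4      6     -
5    DIV.D F10, F0, F6   5      -     -
6    ADD.D F6, F8, F2    6      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   1     SUB.D  1.5   0.5   -     -     -
AD2   1     ADD.D  -     1.5   AD1   -     -
AD3   0
ML1   1     MUL.D  1.5   2.5   -     -     -
ML2   1     DIV.D  -     0.5   ML1   -     -

Register status
F0    F2    F4    F6    F8    F10
ML1   -     -     AD2   AD1   ML2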
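Cycle 8

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     -
4    SUB.D F8, F2, F6    4      6     8
5    DIV.D F10, F0, F6   5      -     -
6    ADD.D F6, F8, F2    6      -     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   0     SUB.D  1.5   0.5   -     -     -
AD2   1     ADD.D  1.0   1.5   -     -     -
AD3   0
ML1   1     MUL.D  1.5   2.5   -     -     -
ML2   1     DIV.D  -     0.5   ML1   -     -

Register status
F0    F2    F4    F6    F8    F10
ML1   -     -     AD2   -     ML2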
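Cycle 9

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     -
4    SUB.D F8, F2, F6    4      6     8
5    DIV.D F10, F0, F6   5      -     -
6    ADD.D F6, F8, F2    6      9     -

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   1     ADD.D  1.0   1.5   -     -     -
AD3   0
ML1   1     MUL.D  1.5   2.5   -     -     -
ML2   1     DIV.D  -     0.5   ML1   -     -

Register status
F0    F2    F4    F6    F8    F10
ML1   -     -     AD2   -     ML2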
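Cycle 11

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     -
4    SUB.D F8, F2, F6    4      6     8
5    DIV.D F10, F0, F6   5      -     -
6    ADD.D F6, F8, F2    6      9     11

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0     ADD.D  1.0   1.5   -     -     -
AD3   0
ML1   1     MUL.D  1.5   2.5   -     -     -
ML2   1     DIV.D  -     0.5   ML1   -     -

Register status
F0    F2    F4    F6    F8    F10
ML1   -     -     -     -     ML2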
Cycle 16

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     16
4    SUB.D F8, F2, F6    4      6     8
5    DIV.D F10, F0, F6   5      -     -
6    ADD.D F6, F8, F2    6      9     11

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3   0
ML1   0     MUL.D  1.5   2.5   -     -     -
ML2   1     DIV.D  3.75  0.5   -     -     -

Register status
F0    F2    F4    F6    F8    F10
-     -     -     -     -     ML2
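Cycle 17

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     16
4    SUB.D F8, F2, F6    4      6     8
5    DIV.D F10, F0, F6   5      17    -
6    ADD.D F6, F8, F2    6      9     11

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3   0
ML1   0
ML2   1     DIV.D  3.75  0.5   -     -     -

Register status
F0    F2    F4    F6    F8    F10
-     -     -     -     -     ML2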
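Cycle 57

Instruction status
No.  Instruction         Issue  Exec  Write
1    L.D F6, 34(R2)      1      2     4
2    L.D F2, 45(R3)      2      3     5
3    MUL.D F0, F2, F4    3      6     16
4    SUB.D F8, F2, F6    4      6     8
5    DIV.D F10, F0, F6   5      17    57
6    ADD.D F6, F8, F2    6      9     11

Reservation stations
Name  Busy  Op     Vj    Vk    Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3   0
ML1   0
ML2   0     DIV.D  3.75  0.5   -     -     -

Register status
F0    F2    F4    F6    F8    F10
-     -     -     -     -     -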
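The issue, execute and write cycles in the walkthrough above can be reproduced with a small timing model. The Python sketch below is a simplification under stated assumptions: the latencies (2 cycles for loads, adds and subtracts, 10 for multiply, 40 for divide) are inferred from the walkthrough's timings, one instruction issues per cycle, and structural hazards and CDB contention are ignored. The names LATENCY, PROGRAM and schedule are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed latencies in cycles, inferred from the walkthrough's timings.
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}

# (opcode, destination register, floating-point source registers)
PROGRAM: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("L.D", "F6", ()),
    ("L.D", "F2", ()),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F2", "F6")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]


def schedule(program: List[Tuple[str, str, Tuple[str, ...]]]) -> List[Tuple[int, int, int]]:
    """Return (issue, execute start, write) cycles for each instruction."""
    last_writer: Dict[str, int] = {}  # register -> index of its latest producer
    write_cycle: List[int] = []
    rows: List[Tuple[int, int, int]] = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1  # in-order issue, one instruction per cycle
        # Each source is bound to its producer at issue time (renaming).
        producers = [last_writer[s] for s in srcs if s in last_writer]
        ready = max((write_cycle[p] for p in producers), default=0)
        start = max(issue, ready) + 1  # begin the cycle after operands arrive
        write = start + LATENCY[op]
        write_cycle.append(write)
        last_writer[dest] = i
        rows.append((issue, start, write))
    return rows


rows = schedule(PROGRAM)
for (op, dest, _), row in zip(PROGRAM, rows):
    print(op, dest, *row)
```

Because DIV.D binds F6 to the first load when it issues, the later ADD.D that also writes F6 does not delay it. DIV.D waits only for MUL.D.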
Conclusions
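• Tomasulo’s algorithm performs register renaming implicitly through the reservation stations, without a separate set of
rename registers.
• It resolves WAR and WAW hazards by passing operand values through the reservation stations and the CDB instead of
through the registers.
• With Tomasulo’s algorithm, execution is driven by data flow: an instruction executes as soon as its operands are ready,
which exposes much of the concurrency available in the program.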
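The WAR case in the worked example (DIV.D reads F6, and the later ADD.D writes F6) can be checked with plain arithmetic. The Python sketch below uses the operand values from the walkthrough (F2 is 1.5, F4 is 2.5, F6 is 0.5 after the loads). The variable names are hypothetical, and the sketch models only the values, not the timing.

```python
# Register values once both loads have completed (from the walkthrough).
regs = {"F2": 1.5, "F4": 2.5, "F6": 0.5}

# At issue, DIV.D F10, F0, F6 buffers the current F6 in its station (Vk).
div_vk = regs["F6"]

# ADD.D F6, F8, F2 issues after DIV.D but completes first and overwrites F6.
f8 = regs["F2"] - regs["F6"]   # SUB.D F8, F2, F6
regs["F6"] = f8 + regs["F2"]   # ADD.D F6, F8, F2

# DIV.D executes last, using its buffered operand rather than the new F6.
f0 = regs["F2"] * regs["F4"]   # MUL.D F0, F2, F4
f10 = f0 / div_vk              # DIV.D F10, F0, F6

print(regs["F6"], f10)
```

Reading F6 from the register file when DIV.D finally executes would give a different quotient, which is the error that buffering operands in the reservation station avoids.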
