ECEN 6253 Advanced Digital Computer Design
Floating Point Pipelines, January 9, 2006

Floating Point Pipelines
In the MIPS pipeline that we studied last semester, only one pipeline stage, EX, is
allowed for the numerical calculations to take place. As long as the numerical calculations
can be done by a simple ALU, the amount of time available during one pipeline stage
should be adequate. However, more complicated arithmetic operations, such as multiply,
divide and floating point operations, can require complex hardware with significantly
longer delays than a simple ALU.
One solution might be to extend the superpipelining idea to the EX stage and have multiple
EX stages. Well-designed integer multiply hardware has about twice the delay of the
carry chain in the ALU, so it could be split across two EX stages.
[Figure: EX split into two stages. Labels: ID/EX1, ALU, INT MULT stage 1, EX1/EX2,
INT MULT stage 2, EX2/MEM, MUX.]

The throughput for ALU instructions is unaffected by this scheme, but the increased pipeline
depth requires additional operand forwarding and lengthens instruction latencies.
The same technique is not practical for integer division. Multiplication can be compressed
into a few pipeline stages because most of the computation can be done in parallel.
Unfortunately, division algorithms are highly sequential: only a limited amount of work can
be done in each pipeline stage, so a large number of stages is needed to complete the
algorithm. Floating point operations also require many pipeline stages.
Single Pipelines. When all instructions run through a pipeline with the same number of
stages, it is usually not practical to make the pipeline long enough to accommodate the
longest multi-stage instruction since it would take too many stages. Instead, the multi-
stage instructions are broken down into simpler parts. The EX stage contains all of the
hardware for each step and the EX stage is repeated the appropriate number of times until
the multi-stage instruction finishes. Meanwhile, the IF and ID stages are stalled if EX
takes more than one clock cycle.
This scheme preserves the simplicity of the simple linear pipeline and does not increase
the latency of the simple ALU instructions (only one EX cycle). The multi-stage
instructions cause many stall cycles (the instructions become essentially unpipelined), but
as long as the instruction frequency of the multi-stage instructions is low, stalls will be
infrequent. Typical instruction frequency data show that the multi-stage instructions
(integer multiply/divide and floating point operations) have very low frequencies except in
the most intense floating point applications. Division is frequently implemented using this
scheme even in high performance superscalar processors.
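
The cost of repeating EX can be estimated with a short calculation. The sketch below is
only illustrative: it assumes an ideal CPI of 1, ignores every other kind of hazard, and
uses a made-up instruction mix and made-up EX cycle counts (the function name and all the
numbers are assumptions, not measured data). Each EX cycle beyond the first stalls IF and
ID for one clock, so it adds directly to the CPI.

```python
def single_pipeline_cpi(mix):
    """Estimate CPI when multi-stage instructions repeat the EX stage.

    mix maps an instruction class to (frequency, ex_cycles).
    Assumes an ideal CPI of 1 and no other stalls.
    """
    total = sum(freq for freq, _ in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    # Every EX cycle beyond the first stalls IF and ID for one clock.
    return 1.0 + sum(freq * (ex_cycles - 1) for freq, ex_cycles in mix.values())


# Hypothetical instruction mix, for illustration only.
example_mix = {
    "simple (ALU, load/store, branch)": (0.96, 1),
    "integer multiply": (0.02, 4),
    "floating point add/multiply": (0.015, 6),
    "divide": (0.005, 30),
}

cpi = single_pipeline_cpi(example_mix)
print(f"estimated CPI = {cpi:.2f}")
```

With these assumed numbers the multi-stage instructions make up 4% of the mix and add
about 0.28 cycles per instruction, which is why the scheme is tolerable only when their
frequency is low.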
Parallel Pipelines. Different parallel pipelines for different instructions can reduce the
number of stalls caused by the multi-stage instructions.
[Figure: IF and ID feed two parallel paths, the integer unit (EX) and the floating point
unit (F1 F2 ... Fn), which share MEM and WB.]
The IF and ID stages are not automatically stalled when a multi-stage instruction begins
execution. The multi-stage instructions still have high instruction latencies, but the
stalls are reduced since a new instruction can start executing every clock cycle. The high
instruction latency means that later instructions that need the results of previous
multi-stage instructions have to be stalled until the latency period is over.
Division would require the longest pipeline, but has the lowest instruction frequency of
the multi-stage instructions. The low instruction frequency makes it uneconomical to
devote extensive hardware to a long division pipeline. Instead, the division instruction
iterates on the same EX hardware until it finishes.
Since the pipelines for the different instruction types are in parallel, ALU instructions can
proceed before the multi-stage instructions finish. If an ALU instruction uses the result
of a multi-stage instruction still in its pipeline, a RAW data hazard is possible. These
hazards can be avoided by stalling or operand forwarding, as in the fixed length pipelines
studied previously. The difficulty is that many more situations can cause stalls, which
complicates the design.
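
The RAW check for parallel pipelines can be sketched as a table of the cycle in which each
register's value becomes available. This is a simplified model, not a description of any
particular processor: it assumes in-order issue of at most one instruction per cycle, full
forwarding, and hypothetical latencies, where a latency of n means a dependent instruction
can issue n cycles after its producer. The function and variable names are illustrative.

```python
def issue_schedule(program, latency):
    """Return the issue cycle of each instruction under in-order issue.

    program is a list of (op, dest, sources); latency maps op to the number
    of cycles before a dependent instruction may issue.
    """
    ready = {}   # register -> first cycle its value can be forwarded
    cycle = 0    # earliest cycle the next instruction may issue
    issued = []
    for op, dest, sources in program:
        # RAW check: wait until every source operand is available.
        start = max([cycle] + [ready.get(reg, 0) for reg in sources])
        issued.append(start)
        ready[dest] = start + latency[op]
        cycle = start + 1  # in order, at most one issue per cycle
    return issued


# Hypothetical latencies, for illustration only.
latency = {"MULF": 5, "ADDF": 3, "ADD": 1}
program = [
    ("MULF", "F0", ("F2", "F4")),
    ("ADD", "R1", ("R2", "R3")),   # independent: issues right behind MULF
    ("ADDF", "F6", ("F0", "F8")),  # needs F0: stalls until MULF's result is ready
]
cycles = issue_schedule(program, latency)
print(cycles)  # [0, 1, 5]
```

The independent ADD is not held up by the multiply, while the dependent ADDF is stalled
for three cycles (it could otherwise have issued in cycle 2).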
If ALU instructions do not use results from any multi-stage instructions already in the
pipeline, then the ALU instructions might finish before a previous multi-stage instruction
(and multiplies might finish before previous divides, etc.). Recall that we could ignore
WAW and WAR data hazards only because instructions start and finish in order in a normal
pipeline. With out-of-order completion, a WAW hazard becomes possible whenever a later
instruction writes the same register as an earlier multi-stage instruction that is still in
its pipeline.
The variable length pipeline described above does not cause WAR hazards because
instructions start and read their operands in ID in order, before any subsequent
instruction writes. Thus, subsequent writes automatically come after previous reads.
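
A WAW hazard in a variable length pipeline can be detected by comparing completion times.
The sketch below is a simplification under stated assumptions: one instruction issues per
cycle with no stalls, each instruction writes its destination a fixed number of cycles
after it issues, and the latencies are hypothetical. It reports every pair in which a
later instruction would write the same register no later than an earlier one.

```python
def waw_hazards(program, latency):
    """Return (earlier, later) index pairs that would write out of order.

    program is a list of (op, dest); instruction i is assumed to issue in
    cycle i and to write its destination in cycle i + latency[op].
    """
    finish = [i + latency[op] for i, (op, _) in enumerate(program)]
    hazards = []
    for i, (_, dest) in enumerate(program):
        for j in range(i + 1, len(program)):
            # Same destination, and the later instruction would write first.
            if program[j][1] == dest and finish[j] <= finish[i]:
                hazards.append((i, j))
    return hazards


# Hypothetical latencies, for illustration only.
latency = {"DIVF": 20, "ADDF": 4}
program = [("DIVF", "F0"), ("ADDF", "F2"), ("ADDF", "F0")]
hazards = waw_hazards(program, latency)
print(hazards)  # [(0, 2)]
```

Here the second ADDF would write F0 in cycle 6 and the DIVF would overwrite it in cycle
20, leaving the wrong value in F0 unless the later instruction is stalled or the earlier
write is suppressed.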
Another problem is that more than one instruction might get to MEM and WB at the same
time. It is possible to have duplicate MEM and WB stages on the end of each pipeline, but
this would require multiple ports on the memory and register file (usually too expensive).
It is true that many processors provide a separate register file for floating point operations
so that an integer and a floating point instruction could be in WB at the same time.
However, floating point operations could still cause structural hazards with other floating
point operations. A single MEM and WB stage shared by all of the pipelines creates a
structural hazard that must be avoided by stalling instructions so that only one
instruction at a time enters MEM and WB.
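
One way to enforce this is to reserve the shared MEM/WB slot when an instruction issues
and to stall the instruction if the slot it would need is already taken. The sketch below
assumes this particular policy (stalling at issue rather than at the end of the pipeline),
in-order issue, and hypothetical cycle counts; here a latency of n means the instruction
enters the shared stages n cycles after it issues.

```python
def schedule_writeback(program, latency):
    """Return (op, issue_cycle, shared_stage_cycle) for each instruction."""
    reserved = set()  # cycles in which the shared MEM/WB slot is already claimed
    cycle = 0         # earliest cycle the next instruction may issue
    schedule = []
    for op in program:
        issue = cycle
        # Structural hazard check: stall until the needed slot is free.
        while issue + latency[op] in reserved:
            issue += 1
        slot = issue + latency[op]
        reserved.add(slot)
        schedule.append((op, issue, slot))
        cycle = issue + 1  # in order, at most one issue per cycle
    return schedule


# Hypothetical cycle counts, for illustration only.
latency = {"MULF": 4, "ADDF": 3, "ADD": 1}
schedule = schedule_writeback(["MULF", "ADDF", "ADD", "ADD"], latency)
for entry in schedule:
    print(entry)
```

In this example the ADDF would collide with the MULF in cycle 4, so it is delayed by one
cycle, and the instructions behind it are delayed in turn.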
Maintaining Precise Exceptions. Since instructions can finish out of order in floating
point pipelines, it is much more difficult to make exceptions precise. Recall that
our previous method of making exceptions precise was to handle them in order as the
instructions reached the WB stage. Now, instructions do not necessarily reach WB in
order.
We have already mentioned that precise exceptions are necessary for virtual memory
systems so that page faults are handled in order. The IEEE floating point standard also
requires precise exceptions. The standard defines several exceptions that give the
exception handler the opportunity to fix the arithmetic result causing the exception.
Thus, the floating point exception handlers must be guaranteed to run before any following
instructions use the result of the floating point instruction.
The solutions to the problem of precise floating point exceptions fall into the four
categories below.
1. Give the processor two modes of operation, one with precise floating point exceptions
and one without. The imprecise exception mode is faster because more floating point
instructions can be in the pipeline at the same time. With precise exceptions, new
floating point instructions are started only when it is determined that previous floating
point instructions will not cause exceptions. This solution is used by the DEC Alpha, IBM
POWER1 and POWER2, and MIPS R8000.
2. Make precise exceptions by storing the results of an instruction temporarily until all
previous instructions have finished. Since the instruction results (register or memory
writes) are written in order only after exceptions for each instruction are handled, the
register file always has the correct values to restart the pipeline after the exception is
handled. This is similar to handling all exceptions in the WB stage in the simple MIPS
pipeline. This solution requires more sophisticated hardware which will be studied in
more detail later. This technique is used by most superscalar processors.
3. Allow imprecise exceptions and leave it up to the exception handler software to make
the exceptions precise. The exception handler software must have access to sufficient
information about every instruction in the pipeline. In architectures with a single
integer unit, such as the MIPS pipeline, the integer instructions finish in order and only
the floating point instructions in the pipeline need to be considered by the exception
handler. A queue of pending exceptions is needed to execute the exception handler software
in order. It is also difficult to restart the pipeline after the exception handler is
finished. Consider the following instruction sequence.
    DIVF F0, F2, F4
    MULF F10, F10, F8
    ADDF F12, F12, F14
Suppose DIVF causes an exception after ADDF completes but before MULF completes. When the
pipeline is restarted, MULF must be executed again, but ADDF must not be executed, because
it has already overwritten its source register F12.
This solution is used by the SPARC processors.
4. Allow a new instruction to start execution only when it is certain that no previous
instruction can cause an exception. This is most effective when exception detection
hardware is added early in the multiple EX floating point pipeline stages. This solution
is used in the MIPS R2000/3000, MIPS R4000 and the Intel Pentium.
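
The second category (hold results until all previous instructions have finished, then
write them in order) can be illustrated with a small simulation. This is only a sketch of
the idea, not the hardware studied later: the function name, the list-based buffer, and
the finish order are assumptions chosen for illustration. It uses the DIVF/MULF/ADDF
sequence from category 3.

```python
def commit_in_order(program, finish_order, faulting=None):
    """Retire buffered results in program order.

    finish_order lists instruction indices in the order they finish executing.
    faulting is the index of an instruction that raises an exception, or None.
    Returns (committed, excepting_instruction, squashed).
    """
    finished = set()  # instructions whose results wait in the buffer
    committed = []    # instructions whose results reached the register file
    head = 0          # oldest instruction that has not yet retired
    for idx in finish_order:
        finished.add(idx)
        # Retire from the head only while the oldest instruction has finished.
        while head < len(program) and head in finished:
            if head == faulting:
                # Everything older has been written and nothing younger has,
                # so the younger results can simply be discarded.
                return committed, program[head], program[head + 1:]
            committed.append(program[head])
            head += 1
    return committed, None, []


program = ["DIVF F0, F2, F4", "MULF F10, F10, F8", "ADDF F12, F12, F14"]

# ADDF finishes first, then DIVF raises an exception before MULF finishes.
committed, excepting, squashed = commit_in_order(program, [2, 0, 1], faulting=0)
print(committed)  # []
print(excepting)  # DIVF F0, F2, F4
print(squashed)   # ['MULF F10, F10, F8', 'ADDF F12, F12, F14']
```

Because the ADDF result was still in the buffer when DIVF raised its exception, F12 was
never overwritten, and both MULF and ADDF can simply be re-executed after the handler
returns. This avoids the restart problem described in category 3.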
