Basic Operations
Difficulties in implementing floating-point arithmetic
1. Exponent biasing
If biased exponents are added using fixed-point arithmetic in the course of a floating-point calculation, the resulting exponent is doubly biased and must be corrected by subtracting the bias. If they are subtracted, the two biases cancel and the bias must be added back.
Pipeline Processing
- A general technique for increasing processor throughput without requiring a large amount of extra hardware.
- Applied to the design of complex datapath units such as multipliers and floating-point adders.
- Also used to improve the overall throughput of an instruction set processor.
Introduction
Stages or segments – a pipeline processor consists of
a sequence of m data-processing circuits, called stages
or segments, which collectively perform a single
operation on a stream of data operands passing through them.
Latency
• For a non-pipelined processor: NmT
• For a pipelined processor: [m+(N-1)]T
Where: N = number of tasks
m = number of stages
T = pipeline’s clock period
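The two expressions above can be compared numerically. The following is a minimal sketch; the function names and the sample values (N = 6, m = 4, T = 1) are chosen here for illustration only.

```python
def nonpipelined_time(n_tasks, m_stages, t_clock):
    """Time for N tasks when each task passes through all m steps alone: NmT."""
    return n_tasks * m_stages * t_clock


def pipelined_time(n_tasks, m_stages, t_clock):
    """Time for N tasks in an m-stage pipeline: [m + (N - 1)]T."""
    return (m_stages + (n_tasks - 1)) * t_clock


def speedup(n_tasks, m_stages):
    """Ratio of the two times; the clock period T cancels."""
    return (n_tasks * m_stages) / (m_stages + n_tasks - 1)


if __name__ == "__main__":
    print(nonpipelined_time(6, 4, 1))  # 24 clock periods
    print(pipelined_time(6, 4, 1))     # 9 clock periods
    print(speedup(6, 4))               # 24/9, roughly 2.67
```

As N grows, the ratio approaches m, so a long stream of tasks is needed before the pipeline comes close to its ideal speedup.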
Addition of two normalized floating-point numbers x
and y can be implemented using a four-step sequence:
1) compare the exponents
2) align the mantissas
3) add the mantissas
4) normalize the result
Normalization is done by counting the number k of
leading zero digits of the mantissa (or leading ones in
the negative case), shifting the mantissa k digit
positions to normalize it, and making a corresponding
adjustment in the exponent.
Figure: Four-stage floating-point adder pipeline.
Figure: Behavior of the adder pipeline when performing
a sequence of N floating-point additions of the form
xi+yi, for the case N=6.
At any time, any of the four stages can contain
a pair of partially processed scalar operands (xi, yi).
The buffering of the stages ensures that Si
receives as input only the results computed by
stage S(i-1) during the preceding clock period.
If T is the pipeline’s clock period, then it
takes 4T to compute the single sum xi+yi;
in other words, the pipeline’s delay is 4T.
4T is the time required to do one floating-point
addition using a nonpipelined processor, plus the
delay due to the buffer registers.
Once all four stages of the pipeline have
been filled with data, a new sum emerges from
the last stage of the pipeline, S4, every T seconds.
Consequently, N consecutive additions can
be done in time (N+3)T, implying that the
four-stage pipeline’s speedup is
S(4) = 4NT / [(N+3)T] = 4N / (N+3)
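The four-step addition and the clocked, buffered behavior of the adder pipeline can be sketched together in a small simulation. This is an illustration under stated assumptions, not a description of any real floating-point unit: the toy format (an unsigned P-bit integer mantissa with value mantissa × 2^exponent), the truncation during alignment, and all function names are hypothetical.

```python
from collections import deque

P = 8  # mantissa width in bits (hypothetical toy format, unsigned only)


def compare_exponents(pair):
    """Stage S1: subtract the exponents and order the operands."""
    (xm, xe), (ym, ye) = pair
    if xe < ye:  # keep the operand with the larger exponent first
        xm, xe, ym, ye = ym, ye, xm, xe
    return xm, ym, xe, xe - ye


def align_mantissas(state):
    """Stage S2: shift the smaller operand's mantissa right."""
    xm, ym, e, diff = state
    return xm, ym >> diff, e  # shifted-out bits are truncated (assumption)


def add_mantissas(state):
    """Stage S3: fixed-point addition of the aligned mantissas."""
    xm, ym, e = state
    return xm + ym, e


def normalize(state):
    """Stage S4: shift by k digit positions and adjust the exponent."""
    m, e = state
    if m == 0:
        return 0, 0
    k = P - m.bit_length()  # k > 0: leading zeros, k < 0: carry out
    m = m << k if k >= 0 else m >> -k
    return m, e - k


STAGES = [compare_exponents, align_mantissas, add_mantissas, normalize]


def run_pipeline(pairs):
    """Feed operand pairs through the stages, one clock period per loop pass."""
    pending = deque(pairs)
    buffers = [None] * len(STAGES)  # buffers[i] holds the output of stage S(i+1)
    results, cycles = [], 0
    while pending or any(b is not None for b in buffers[:-1]):
        # Each stage sees only what its predecessor produced last period.
        inputs = [pending.popleft() if pending else None] + buffers[:-1]
        buffers = [f(v) if v is not None else None
                   for f, v in zip(STAGES, inputs)]
        cycles += 1
        if buffers[-1] is not None:
            results.append(buffers[-1])
    return results, cycles


if __name__ == "__main__":
    # 192 * 2^0 + 128 * 2^-1 = 256, repeated for N = 6 operand pairs
    sums, cycles = run_pipeline([((0b11000000, 0), (0b10000000, -1))] * 6)
    print(sums[0], cycles)
```

With N = 6 pairs the loop runs for 9 clock periods, which matches (N+3)T for a four-stage pipeline; the first sum is available after 4 periods and one further sum emerges in each period after that.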
Suppose that x has a normalized floating-point
representation (Xm, Xe), where Xm is the
mantissa and Xe is the exponent with respect to
some base B = 2^k.
In the first step of adding x=(Xm, Xe) to
y=(Ym, Ye), which is executed by stage S1 of the
pipeline, Xe and Ye are compared by
subtracting the exponents, which requires a
fixed-point adder.
Pipeline Design:
Find a suitable multistage sequential
algorithm to compute the given function.
The algorithm’s steps, which are
implemented by the pipeline stages, should
be balanced; that is, they should have roughly
the same execution time.
Fast buffer registers
- placed between the stages to allow all necessary
data items (partial or complete results) to be
transferred from stage to stage without
interfering with one another
- buffers are designed to be clocked at the
maximum rate that allows data to be transferred
reliably between stages.
Tc = max{Ti} + TR, for i = 1, 2, …, m
Where: Tc = minimum clock period
Ti = delay of stage Si
TR = delay of a buffer register
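The clock-period relation Tc = max{Ti} + TR also shows why balanced stages matter. The sketch below uses made-up stage delays (in nanoseconds) purely for illustration; the function name is an assumption as well.

```python
def clock_period(stage_delays, register_delay):
    """Tc = max{Ti} + TR: the slowest stage sets the clock for every stage."""
    return max(stage_delays) + register_delay


if __name__ == "__main__":
    register_delay = 5                 # hypothetical TR, in ns
    balanced = [50, 50, 50, 50]        # hypothetical Ti values, total 200 ns
    unbalanced = [20, 30, 130, 20]     # same total work, badly split

    print(clock_period(balanced, register_delay))    # 55
    print(clock_period(unbalanced, register_delay))  # 135
```

Both stage lists perform the same total amount of work, yet the unbalanced split forces a clock period more than twice as long, because every stage must wait for the slowest one.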
• Non-pipelined Processor:
Feedback:
- The usefulness of a pipeline processor can
sometimes be enhanced by including feedback
paths from a stage output to the primary inputs
of the pipeline.
- Feedback enables the results computed by certain
stages to be used in subsequent calculations by the
pipeline.
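One possible use of feedback, offered here only as an illustration, is accumulating a running sum with a pipelined adder whose output is fed back to one of its inputs. The sketch below assumes an m-stage adder modeled as a queue of sums in flight; the function name and the model are hypothetical.

```python
from collections import deque


def accumulate_with_feedback(values, m_stages=4):
    """Sum a stream with an m-stage adder whose output is fed back as an input.

    A sum entering the pipeline emerges m clock periods later, so the
    feedback path produces m interleaved partial sums rather than one total.
    """
    in_flight = deque([0] * m_stages)  # sums inside the pipeline, oldest first
    for v in values:
        fed_back = in_flight.popleft()   # result leaving the last stage
        in_flight.append(fed_back + v)   # re-enters the first stage with new data
    partial_sums = list(in_flight)
    # The m partial sums still have to be combined; plain addition is used
    # here, whereas hardware would need further passes through the adder.
    return partial_sums, sum(partial_sums)


if __name__ == "__main__":
    partial, total = accumulate_with_feedback(range(1, 9), m_stages=4)
    print(partial, total)  # [6, 8, 10, 12] 36
```

Each partial sum collects every fourth input (1+5, 2+6, 3+7, 4+8), which reflects the four-period delay between a value entering the adder and its result becoming available on the feedback path.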