
Parallel and Pipeline Processing

Introduction to Parallel Processing

What is Pipelining?

- Like an automobile assembly line, but for instructions
- Each step does a small part of the work of processing the instruction
- Ideally, each step operates in parallel

Simple Model

- Instruction Fetch
- Instruction Decode
- Instruction Execute

[Pipeline diagram: three instructions overlapped in time]

Cycle:  1   2   3   4   5
I1:     F1  D1  E1
I2:         F2  D2  E2
I3:             F3  D3  E3

What is Parallel Processing?

Parallel processing is another method used to improve performance in a computer system. When a system processes two different instructions simultaneously, it is performing parallel processing.

Ideal Pipeline Performance

If stages are perfectly balanced:

    TimePerInstruction_pipelined = TimePerInstruction_unpipelined / Number_Pipeline_Stages

The more stages the better?
- Each stage typically corresponds to a clock cycle
- Stages will not be perfectly balanced
- Synchronous: the slowest stage will dominate the cycle time
- Many hazards await us
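A minimal sketch (the 45 ns total and 1 ns latch overhead are assumed numbers, not from the slides) of why deeper is not automatically better once per-stage register overhead is counted:

```python
# Time per instruction vs. pipeline depth, with per-stage latch overhead.
UNPIPELINED_TIME_NS = 45.0  # total combinational delay (assumed)
LATCH_OVERHEAD_NS = 1.0     # register overhead added per stage (assumed)

for stages in (1, 2, 5, 9, 15, 45):
    ideal = UNPIPELINED_TIME_NS / stages
    real = UNPIPELINED_TIME_NS / stages + LATCH_OVERHEAD_NS
    print(f"{stages:2d} stages: ideal {ideal:5.1f} ns/instr, with overhead {real:5.1f} ns/instr")
```

At large depths the fixed overhead dominates the shrinking per-stage work, so the ideal formula stops being a good guide.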

Two ways to view pipelining

- Reduced CPI (when going from non-pipelined to pipelined)
- Reduced cycle time (when increasing pipeline depth)

Important Pipeline Characteristics

Latency
- Time required for an instruction to propagate through the pipeline
- Based on the Number of Stages * Cycle Time
- Dominant if there are lots of exceptions / hazards, i.e. we have to constantly be re-filling the pipeline

Throughput
- The rate at which instructions can start and finish
- Dominant if there are few exceptions and hazards, i.e. the pipeline stays mostly full

Note we need increased memory bandwidth over the non-pipelined processor.
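A short sketch (hypothetical parameters) computing both quantities from the definitions above:

```python
STAGES = 5
CYCLE_TIME_NS = 11.0  # slowest stage plus latch overhead (assumed)

latency_ns = STAGES * CYCLE_TIME_NS   # one instruction traversing the whole pipe
throughput = 1.0 / CYCLE_TIME_NS      # instructions finished per ns in steady state
print(f"latency = {latency_ns:.0f} ns, throughput = {throughput:.3f} instr/ns")
```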

Basic Ideas

Parallel processing vs. pipelined processing (time runs left to right):

Parallel:
  P1: a1 a2 a3 a4
  P2: b1 b2 b3 b4
  P3: c1 c2 c3 c4
  P4: d1 d2 d3 d4

Pipelined:
  P1: a1 b1 c1 d1
  P2:    a2 b2 c2 d2
  P3:       a3 b3 c3 d3
  P4:          a4 b4 c4 d4

Parallel: less inter-processor communication, more complicated processor hardware.
Pipelined: more inter-processor communication, simpler processor hardware.

a, b, c, d: different data streams processed
1, 2, 3, 4 (colors in the original figure): different types of operations performed
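A small sketch (illustrative only) that regenerates both schedules, making the difference mechanical rather than pictorial:

```python
STREAMS = "abcd"    # data streams a, b, c, d
OPS = range(1, 5)   # operation types 1..4

print("Parallel (each processor handles one whole stream):")
for i, s in enumerate(STREAMS, start=1):
    print(f"  P{i}: " + " ".join(f"{s}{op}" for op in OPS))

print("Pipelined (each processor applies one operation, offset in time):")
for i, op in enumerate(OPS, start=1):
    print(f"  P{i}: " + "   " * (i - 1) + " ".join(f"{s}{op}" for s in STREAMS))
```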

Data Dependence

- Parallel processing requires NO data dependence between processors: each of P1-P4 runs independently.
- Pipelined processing will involve inter-processor communication: each Pi passes its result on to Pi+1.

[Diagram: independent P1-P4 timelines in the parallel case; chained P1 -> P2 -> P3 -> P4 timelines in the pipelined case]

Pipelining Example

Assume the 5 stages take 10 ns, 8 ns, 10 ns, 10 ns, and 7 ns respectively.

Unpipelined:
- Average instruction execution time = 10 + 8 + 10 + 10 + 7 = 45 ns

Pipelined:
- Each stage introduces some overhead, say 1 ns per stage
- We can only go as fast as the slowest stage!
- Each stage then takes 11 ns; in steady state we execute one instruction every 11 ns

Speedup = Unpipelined time / Pipelined time = 45 ns / 11 ns = 4.1, or about a 4x speedup
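The same arithmetic, checked in a few lines (stage times from the slide; the 1 ns latch overhead is the assumption stated above):

```python
stage_times_ns = [10, 8, 10, 10, 7]
latch_overhead_ns = 1.0

unpipelined = sum(stage_times_ns)                 # 45 ns per instruction
cycle = max(stage_times_ns) + latch_overhead_ns   # slowest stage sets the clock: 11 ns
print(f"speedup = {unpipelined / cycle:.1f}x")    # 4.1x
```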

Usage of Pipelined Processing

By inserting latches or registers between combinational logic circuits, the critical path can be shortened.

Consequence: reduced clock cycle time, increased clock frequency.

Suitable for DSP applications that have (infinitely) long data streams.

Method to incorporate pipelining: cut-set retiming.

Cut set: a cut set is a set of edges of a graph. If these edges are removed from the original graph, the remaining graph becomes two separate graphs.

Retiming: the timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.
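A minimal cycle-level sketch of this idea on a hypothetical 3-tap FIR filter (coefficients and input are made up): a register placed across a feed-forward cut shortens the per-cycle critical path, and the results stay correct, just delayed by one cycle. This is a hand-written illustration of the effect of retiming, not a general retiming algorithm.

```python
# 3-tap FIR: y[n] = A*x[n] + B*x[n-1] + C*x[n-2]
A, B, C = 2.0, 3.0, 5.0
x = [1, 4, 2, 8, 5, 7]

def fir(xs):
    # Combinational reference: the whole sum is computed in one "cycle".
    out = []
    for n in range(len(xs)):
        x1 = xs[n - 1] if n >= 1 else 0
        x2 = xs[n - 2] if n >= 2 else 0
        out.append(A * xs[n] + B * x1 + C * x2)
    return out

def fir_pipelined(xs):
    # A register across a feed-forward cut splits the adder chain, so each
    # cycle's critical path is one multiply plus one add instead of two adds.
    out, x1, x2, p, p_x2 = [], 0, 0, 0, 0
    for xn in xs:
        out.append(p + C * p_x2)  # stage 2: finish last cycle's partial sum
        p = A * xn + B * x1       # stage 1: partial sum, latched for next cycle
        p_x2 = x2                 # x[n-2] travels alongside the partial sum
        x2, x1 = x1, xn           # shift the input delay line
    return out

print(fir(x))            # [2.0, 11.0, 21.0, 42.0, 44.0, 69.0]
print(fir_pipelined(x))  # same values, one cycle later: [0.0, 2.0, 11.0, 21.0, 42.0, 44.0]
```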

Parallel Computing

Parallel computing is central and important in many computationally intensive applications, such as image processing, database processing, robotics, and so forth.

Given a problem, parallel computing is the process of splitting the problem into several subproblems, solving these subproblems simultaneously, and combining the solutions of the subproblems to get the solution to the original problem.
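A minimal split-solve-combine sketch (illustrative; the chunk size and worker count are arbitrary choices):

```python
from concurrent.futures import ProcessPoolExecutor

def solve(chunk):
    # One subproblem: sum a slice of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    step = 250_000
    chunks = [data[i:i + step] for i in range(0, len(data), step)]  # split
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(solve, chunks))                    # solve simultaneously
    print(sum(partials))                                            # combine: 499999500000
```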

Parallel Computer Structures

- Pipelined computers: a pipeline computer performs overlapped computations to exploit temporal parallelism.
- Array processors: an array processor uses multiple synchronized arithmetic logic units to achieve spatial parallelism.
- Multiprocessor systems: a multiprocessor system achieves asynchronous parallelism through a set of interactive processors.

Nonpipelined Processor

[Figure: functional structure of a nonpipelined processor]

Pipeline Processor

[Figure: functional structure of a pipeline processor]

Pipeline Computers

Normally, there are four major steps to execute an instruction:
- Instruction Fetch (IF)
- Instruction Decoding (ID)
- Operand Fetch (OF)
- Execution (EX)
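A tiny sketch (illustrative only, in-order and with no stalls assumed) printing the overlapped schedule for these four steps:

```python
STAGES = ["IF", "ID", "OF", "EX"]

# Instruction i enters stage s at cycle i + s (in order, no stalls assumed).
for i in range(4):
    print(f"I{i + 1}: " + "    " * i + "  ".join(STAGES))
```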


Performance Improvements

- Computer engineers improve performance through the reduction of C/I (cycles per instruction)
- I/P (instructions per program) is the domain of CS: writing software
- S/C (seconds per cycle) is the domain of EE/VLSI: IC fabrication
- CPI, or C/I, is improved by getting more instructions done in each cycle
- This means doing work in parallel, distributed across the functional units of the IC
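These ratios come from the standard processor-performance decomposition (the "Iron Law"); writing it out makes the division of labor explicit:

```latex
\frac{\text{Seconds}}{\text{Program}}
  = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{I/P: software (CS)}}
  \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{C/I: CPI (architecture)}}
  \times \underbrace{\frac{\text{Seconds}}{\text{Cycle}}}_{\text{S/C: fabrication (EE/VLSI)}}
```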

Multiprocessor Systems

- A multiprocessor system is a single computer that includes multiple processors (computer modules).
- Processors may communicate and cooperate at different levels in solving a given problem.
- The communication may occur by sending messages from one processor to another, or by sharing a common memory.
- A multiprocessor system is controlled by one operating system, which provides interaction between processors and their programs at the process, data set, and data element levels.
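A minimal sketch (illustrative) of the two communication styles using Python's multiprocessing module: a message queue versus a shared variable.

```python
from multiprocessing import Process, Queue, Value

def messenger(q):
    q.put("hello from worker")  # message passing: send through a queue

def sharer(v):
    with v.get_lock():
        v.value += 1            # shared memory: write a common variable under a lock

if __name__ == "__main__":
    q, v = Queue(), Value("i", 0)
    procs = [Process(target=messenger, args=(q,)), Process(target=sharer, args=(v,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(q.get(), "| shared counter =", v.value)
```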

Array Computers

- An array processor is a synchronous parallel computer with multiple arithmetic logic units, called processing elements (PEs), that can operate in parallel.
- The PEs are synchronized to perform the same function at the same time.
- Only a few array computers are designed primarily for numerical computation, while the others are for research purposes.
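The "same function at the same time" model survives today as SIMD; a NumPy sketch (illustrative) where one operation is applied to every element pair in lockstep, as the PEs of an array processor would:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# One operation, applied by every "PE" to its own element pair at once.
print(a * b + 1.0)  # [ 11.  41.  91. 161.]
```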

[Figure: functional structure of a multiprocessor system]


Multicomputers

- There is a group of processors, each of which has a sufficient amount of local memory.
- Communication between the processors is through messages.
- There is neither a common memory nor a common clock.
- This is also called distributed processing.
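A minimal message-passing sketch in the multicomputer style, using mpi4py (assumes an MPI library is installed; run with e.g. `mpiexec -n 2 python demo.py`):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # each process sees only its own local memory

if rank == 0:
    comm.send({"partial_result": 42}, dest=1, tag=0)  # communicate by message
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    print("received:", msg)
```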



[Figure: power consumption trends for desktop processors]

Benefits of multiprocessor systems:

- Reduced cost: multiple processors share the same resources; a separate power supply or motherboard for each chip is not required. This reduces the cost.
- Increased reliability: the reliability of the system is also increased. The failure of one processor does not affect the other processors, though it will slow down the machine. Several mechanisms are required to achieve increased reliability: if a processor fails, a job running on that processor also fails, so the system must be able to reschedule the failed job or alert the user that the job was not successfully completed.
- Increased throughput: an increase in the number of processors completes the work in less time. It is important to note that doubling the number of processors does not halve the time to complete a job, due to the overhead of communication between processors, contention for shared resources, etc.
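The last point can be made concrete with a simple serial-fraction model (hypothetical numbers; this is the intuition behind Amdahl's Law, not a measurement):

```python
SERIAL_FRACTION = 0.1  # assumed non-parallelizable share of the job

def speedup(processors, serial=SERIAL_FRACTION):
    # Amdahl's Law: the serial share takes the same time on any processor count.
    return 1.0 / (serial + (1.0 - serial) / processors)

for p in (1, 2, 4, 8, 16):
    print(f"{p:2d} processors -> speedup {speedup(p):.2f}x")
# 2 processors give 1.82x, not 2x; the gap widens as processors are added.
```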

Conclusion

- MPSoCs are an important chapter in the history of multiprocessing.
- System designers like uniprocessors with sufficient computation power.
- DSPs (audio processing)
- Computational power (Moore's Law) vs. low-power, low-cost, real-time requirements.
