
# ECS-073 PARALLEL ALGORITHMS

Unit-I:
Sequential model, need of alternative model, parallel computational models such as PRAM, LMCC, Hypercube, Cube
Connected Cycle, Butterfly, Perfect Shuffle Computers, Tree model, Pyramid model, Fully Connected model, PRAM-CREW, EREW models, simulation of one model from another one.
Unit-II:
Performance Measures of Parallel Algorithms, speed-up and efficiency of PA, Cost-optimality, examples illustrating
cost-optimal algorithms such as summation and Min/Max on various models.
Unit-III:
Parallel Sorting Networks, Parallel Merging Algorithms on CREW/EREW/MCC, Parallel Sorting Networks on
CREW/EREW/MCC, linear array.
Unit-IV:
Parallel Searching Algorithm, Kth element, Kth element in X+Y on PRAM, Parallel Matrix Transposition and
Multiplication Algorithm on PRAM, MCC, Vector-Matrix Multiplication, Solution of Linear Equations, Root finding.
Unit-V:
Graph Algorithms - Connected Graphs, search and traversal, Combinatorial Algorithms- Permutation, Combinations,
Derangements.
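As an illustration of the Unit-II performance measures, the sketch below computes speedup, efficiency and cost for parallel summation. The function name `parallel_sum_metrics` and the step-count model are assumptions made for this example only: n - 1 additions sequentially, and ceil(n/p) - 1 local additions followed by ceil(log2 p) combining steps on p processors.

```python
import math


def parallel_sum_metrics(n, p):
    """Return (speedup, efficiency, cost) for summing n >= 2 numbers on p processors.

    Assumed model: each processor first adds its ceil(n/p) local values,
    then the p partial sums are combined in ceil(log2 p) parallel steps.
    """
    t_seq = n - 1                                  # sequential additions
    t_par = (math.ceil(n / p) - 1) + math.ceil(math.log2(p))
    speedup = t_seq / t_par                        # S = T_seq / T_par
    efficiency = speedup / p                       # E = S / p
    cost = p * t_par                               # C = p * T_par
    return speedup, efficiency, cost


if __name__ == "__main__":
    for p in (1, 128, 1024):
        print(p, parallel_sum_metrics(1024, p))
```

Under this model, with n = 1024 the cost is 10240 for p = 1024 but only 1792 for p = 128, which is much closer to the 1023 sequential additions. Using fewer processors than data items is the usual route to cost-optimality.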
References:
1. M.J. Quinn, Designing Efficient Algorithms for Parallel Computers, McGraw-Hill.
2. S.G. Akl, The Design and Analysis of Parallel Algorithms.
3. S.G. Akl, Parallel Sorting Algorithms, Academic Press.
COURSE PLAN
PARALLEL ALGORITHMS
ECS-073
Course description: This course is about one (and perhaps the most fundamental) aspect of parallelism, namely, parallel
algorithms. A parallel algorithm is a solution method for a given problem destined to be performed on a parallel computer. In
order to properly design such algorithms, one needs to have a clear understanding of the model of computation underlying
the parallel computer.
Topic
Lecture-1
1. Sequential model
2. Desirable Properties For Parallel Algorithms
Lecture-2
1. Need of alternative model
2. Parallel computational models
Lecture-3
Parallel computational models

Knowledge Input

Unit-I
Concept Input

Supportive Aid

References

Difference between sequential model and parallel model

Discussion and Example

Notes

Different types of model

PRAM, LMCC, Hypercube

Working function of these models

Discussion and Example

S.G. Akl, Design and Analysis of Parallel Algorithms, Notes

Asymptotic Notation
Cube Connected Cycle
Butterfly
Perfect Shuffle Computers
Tree model

Working function of these models

Discussion and Example

S.G. Akl, Design and Analysis of Parallel Algorithms, Notes

Basic introduction to the different types of model

Lecture-4
Designing Algorithms

Topic
Lecture-1
Performance Measures of Parallel Algorithms
Lecture-2
Performance Measures of Parallel Algorithms

Pyramid model
Fully Connected model

Incremental Approach
PRAM-CREW
EREW models

Working function of these models
Simulation of one model from another one.
Unit-II
Concept Input

Discussion and Examples

Design & Analysis of Parallel Algorithms

Supportive Aid

References

Lower and Upper Bounds
Speedup
Number of Processors
Cost

Discussion and Examples

Design and Analysis of Parallel Algorithms

Other Measures
Area
Length
Period

Measure the efficiency

Discussion and Examples

Design and Analysis of Parallel Algorithms

Knowledge Input

Lecture-3
Expressing Algorithms

Lecture-4
Lower Bound

Lecture-5
Examples illustrating cost-optimal algorithms
Lecture-6
An Algorithm For Parallel Selection

Informal language to describe parallel algorithms

Statements similar to those of a typical structured programming language (such as Pascal).

Discussion and Examples

Design and Analysis of Parallel Algorithms

Linear Order
Rank
Selection
Complexity

Comparison problems are usually solved by comparing pairs of elements of an input sequence.

Discussion and Examples

Design and Analysis of Parallel Algorithms

In this section, we present two procedures for performing these simulations.

Discussion and Examples

Design and Analysis of Parallel Algorithms

An algorithm for parallel selection on an EREW SM SIMD computer.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Datum
Computing All Sums

Procedure
Analysis

Unit-III
Topic
Lecture-1
Merging

Lecture-2
Merging On The CREW Model
Lecture-3
1. Merging On The EREW Model
2. A Better Algorithm For The EREW Model
Lecture-4
Sorting

Knowledge Input

Concept Input

Supportive Aid

References

Introduction
A Network For Merging

Two sequences of numbers sorted in nondecreasing order by merging

Discussion and Examples

Design and Analysis of Parallel Algorithms

It assumes the existence, and makes use of, a sequential procedure for merging two sorted sequences.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Finding the Median of Two Sorted Sequences
Fast Merging on the EREW Model

We now show how this can be run on an N-processor EREW SM SIMD computer.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Introduction
A Network For Sorting

We now turn our attention to a third such problem: sorting. There are two reasons for this interest. The problem is important to practitioners, as sorting data is at the heart of many computations. It also has a rich theory.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Sequential Merging
Parallel Merging

Lecture-5
Sorting On A Linear Array

Lecture-6
1. Sorting On The CRCW Model
2. Sorting On The CREW Model
Lecture-7
Sorting On The EREW Model

Topic
Lecture-1
Searching

Lecture-2
Searching A Random Sequence

Lecture-3

ODD-EVEN TRANSPOSITION
MERGE SPLIT

CRCW SORT
CREW SORT

Simulating Procedure CREW SORT
Sorting by Conflict-Free Merging
Sorting by Selection

Knowledge Input

In this section we describe a parallel sorting algorithm for an SIMD computer where the processors are connected to form a linear array.

Discussion and Examples

Design and Analysis of Parallel Algorithms

It is time to turn our attention to the shared-memory SIMD model.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Our purpose in this section is to deal with this third difficulty. Three parallel algorithms for sorting on the EREW model are described, each representing an improvement over its predecessor.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Supportive Aid

References

Unit-IV
Concept Input

1. Introduction
2. Searching A Sorted Sequence
EREW Searching
CREW Searching
CRCW Searching

Searching is one of the most fundamental operations in the field of computing. It is used in any application where we need to find out whether an element belongs to a list or, more generally, retrieve from a file information associated with that element.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Searching on SM SIMD Computers
Searching on a Tree
Searching on a Mesh

We begin by studying parallel search algorithms for shared-memory SIMD computers. We then show how the power of this model is not really needed for the search problem.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Generating Permutations and Combinations

1. Introduction

In this lecture we describe a number of parallel algorithms for the two fundamental problems of generating permutations and combinations.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Lecture-4
Sequential Algorithms

In this section we describe a number of sequential algorithms. The first algorithm generates all m-permutations of n items in lexicographic order. We also show how all m-permutations of n items can be put into one-to-one correspondence with the integers $1, \ldots, {}^nP_m$.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Algorithm
Permutation Generator
Parallel Permutation Generator for Few Processors

We set the stage for the problem of generating permutations in parallel.

Discussion and Examples

Design and Analysis of Parallel Algorithms

A Fast Combination Generator
Combination Generator

We now turn to the problem of generating all ${}^nC_m$ m-combinations of $S = \{1, 2, \ldots, n\}$ in lexicographic order.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Introduction

Problems involving matrices arise in a multitude of numerical and nonnumerical contexts.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Mesh Transpose
Shuffle Transpose
EREW Transpose

In this lecture we show how three operations on matrices can be performed in parallel.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Mesh Multiplication
Cube Multiplication
CRCW Multiplication

In this section we assume that the elements of all matrices are numerals, say, integers. A straightforward sequential implementation of the preceding definition is given by procedure MATRIX MULTIPLICATION.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Linear Array Multiplication
Tree Multiplication
Convolution

We study it separately in order to demonstrate the use of two interconnection networks in performing matrix operations, namely, the linear (or one-dimensional) array and the tree.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Lecture-5
Generating Permutation In Parallel

Lecture-6
Generating Combination In Parallel

Lecture-7
Matrix Operations

Lecture-8
Transposition

Lecture-9
Matrix-By-Matrix Multiplication

Lecture-10
Matrix-By-Vector Multiplication

Generating Permutations Lexicographically
Numbering Permutations
Generating Combinations Lexicographically
Numbering Combinations

Lecture-11
Numerical Problems

Lecture-12
Finding Roots Of Nonlinear Equations

1. Introduction
2. Solving Systems Of Linear Equations
An SIMD Algorithm
An MIMD Algorithm

An SIMD Algorithm
An MIMD Algorithm

In this lecture we describe parallel algorithms for the following numerical problems: solving a system of linear equations, finding roots of nonlinear equations, solving partial differential equations and computing eigenvalues.

Discussion and Examples

Design and Analysis of Parallel Algorithms

In many science and engineering applications it is often required to find the root of an equation of one variable.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Unit-V
Topic
Lecture-1
Graph Theory

Lecture-2
Computing The Connectivity Matrix
Lecture-3
Finding Connected Components

Lecture-4
All-Pairs Shortest Paths

Knowledge Input

Concept Input

Supportive Aid

References

Introduction
Definition

Graphs are used to organize data, to model algorithms, and generally as a powerful tool to represent …

Discussion and Examples

Design and Analysis of Parallel Algorithms

CUBE CONNECTIVITY

In order to implement this algorithm in parallel, we can use any of the matrix multiplication algorithms.

Discussion and Examples

Design and Analysis of Parallel Algorithms

CUBE COMPONENTS

The problem we consider in this section is the following. An undirected n-node graph G is given, and it is required to decompose G into the smallest possible number of connected components. We can solve the problem by first computing the connectivity matrix.

Discussion and Examples

Design and Analysis of Parallel Algorithms

CUBE SHORTEST PATHS

The all-pairs shortest paths problem is stated as follows: an n-vertex graph G is given by its $n \times n$ weight matrix W; construct an $n \times n$ matrix D such that $d_{ij}$ is the length of the shortest path from $v_i$ to $v_j$.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Lecture-5
Computing The Minimum Spanning Tree
Lecture-6
Traversing Combinatorial Spaces

Lecture-7
Basic Design Principles

Lecture-8
The Algorithm

Lecture-9
Analysis And Examples

EREW MST

If the graph G is weighted, then a minimum spanning tree (MST) of G has the smallest edge-weight sum among all spanning trees of G.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Introduction
Sequential Tree Traversal

Many combinatorial problems can be solved by generating and searching a special graph known as a state-space graph. This method, aptly called state-space traversal, differs from the searching algorithms …

Discussion and Examples

Design and Analysis of Parallel Algorithms

The Minimal Alpha-Beta Tree
Model of Computation
Objectives and Methods

In this section we describe the main ideas behind (i) the parallel algorithm, (ii) the model of computation to be used, (iii) the objectives motivating the design, and (iv) the methods used to achieve these objectives.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Procedures and Processes
Semaphores
Score Tables

This section provides a formal description of the parallel alpha-beta algorithm as implemented on an EREW SM MIMD computer.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Parallel Cutoffs
Storage Requirements

As is the case with most MIMD algorithms, the running time of procedure MIMD ALPHA BETA is best analyzed empirically. In this section we examine two other aspects of the procedure's performance.

Discussion and Examples

Design and Analysis of Parallel Algorithms

Books Recommended:
1. M.J. Quinn, Designing Efficient Algorithms for Parallel Computers, McGraw-Hill.
2. S.G. Akl, The Design and Analysis of Parallel Algorithms.
3. S.G. Akl, Parallel Sorting Algorithms, Academic Press.

TUTORIAL SHEET
PARALLEL ALGORITHMS
ECS-073
TUTORIAL SHEET 1
1. Explain the difference between the sequential model and the parallel computational model.
2. What is Parallel Computing?
3. Discuss the need for Parallel Computers.
4. What is the meaning of parallelism?
5. Depending on the stream of instructions (the algorithm), how many types of computer are there? Discuss.
6. Short Notes:
i) PRAM
ii) EREW
iii) Feasibility of the Shared-Memory Model
iv) Interconnection-Network SIMD Computers.
7. Discuss Programming MIMD Computers.
8. Difference between SIMD and MIMD.
TUTORIAL SHEET 2
1. Discuss the criteria of analyzing an Algorithm.
2. Show how an MISD computer can be used to handle multiple queries on a given object in a database.
3. Given a set of numbers $\{s_1, s_2, \ldots, s_N\}$, all sums of the form $s_1 + s_2$, $s_1 + s_2 + s_3$, ...,
$s_1 + s_2 + \cdots + s_N$ are to be computed. Design an algorithm for solving this problem using N processors on each of
the four submodels of the SM SIMD model.
4. Why Use Parallel Computing?
5. Summarize the similarities and differences between the RAM model of serial computation and the PRAM model of
parallel computation.
6. Devise a PRAM algorithm to multiply two $n \times n$ matrices, where $n = 2^k$.
7. Design special-purpose architecture for solving a system of linear equations.
8. Prove that an algorithm requiring t(n) time to solve a problem of size n on a cube connected computer with N
processors can be simulated on a shuffle-exchange network with the same number of processors in O(log N) x t(n)
time.
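
As a starting point for problem 3, here is a minimal sequential simulation of the recursive-doubling idea for computing all sums (prefix sums). It is a sketch under stated assumptions, not a solution for any particular submodel: the name `all_sums` is hypothetical, list position i stands for processor i, and the snapshot `prev` models all processors reading before any of them writes.

```python
def all_sums(values):
    """Simulate computing all prefix sums in about log2(N) synchronous steps.

    In step j, every (simulated) processor i >= 2**j adds the value held
    by processor i - 2**j to its own value.
    """
    s = list(values)
    n = len(s)
    step = 1
    while step < n:
        prev = s[:]                      # snapshot: reads happen before writes
        for i in range(step, n):         # conceptually executed in parallel
            s[i] = prev[i - step] + prev[i]
        step *= 2
    return s


if __name__ == "__main__":
    print(all_sums([1, 2, 3, 4, 5]))     # prints [1, 3, 6, 10, 15]
```

The loop body runs ceil(log2 N) times, so with N processors the simulated parallel time is logarithmic in N, while the total number of additions exceeds the N - 1 needed sequentially.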
TUTORIAL SHEET 3
1. If PARALLEL SELECT were to be implemented on a CREW SM SIMD computer, would it run any faster?
2. Design and analyze a parallel algorithm for solving the selection problem on a CRCW SM SIMD computer.
3. A tree-connected computer with n leaves stores one integer of a sequence S per leaf. For a given k, $1 \le k \le n$, design
an algorithm that runs on this computer and selects the kth smallest element of S.
4. Repeat problem 3 for a linear array of n processors with one element of S per processor.

5. Repeat problem 3 for an $n^{1/2} \times n^{1/2}$ mesh of processors with one element of S per processor.
6. What is the difference between a binary k-cube and a cube-connected network of degree k?
7. Name two ways to implement vector computers.
8. Is it possible for the average speedup exhibited by a parallel algorithm to be superlinear?

TUTORIAL SHEET 4
1. In steps 1 and 2 of procedure SEQUENTIAL SELECT, a simple sequential algorithm is required for sorting short
sequences. Describe one such algorithm.
2. Show that 2-D mesh with an odd number of rows, an odd number of columns.
3. Prove that a complete binary tree with weight n can be embedded with dilation 1 in an (n+2) dimensional
hypercube.
4. Show that, in general, any (r, s)-merging network must require $\Omega(s \log r)$ comparators when r < s.
5. The sequence of comparisons in the odd-even merging network can be viewed as a parallel algorithm. Describe an
implementation of that algorithm on an SIMD computer where the processors are connected to form a linear array.
The two input sequences to be merged initially occupy processors $P_1$ to $P_r$ and $P_{r+1}$ to $P_{r+s}$, respectively. When the
algorithm terminates, $P_i$ should contain the ith smallest element of the output sequence.
6. Establish the correctness of procedure EREW MERGE.
7. Establish the correctness of procedure TWO-SEQUENCE MEDIAN.
8. Develop a parallel merging algorithm for the CRCW model.
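
For reference when working on problem 5, the following is a sequential sketch of the comparison pattern of an odd-even merging network. It assumes both inputs are sorted and have the same power-of-two length; the function name `odd_even_merge` is hypothetical, and the linear-array implementation asked for in the problem is not attempted here.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length, odd-even style.

    The elements in positions 1, 3, 5, ... (1-based) of both inputs are merged
    recursively, likewise those in positions 2, 4, 6, ...; one final rank of
    comparators then fixes the interleaving.
    """
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]   # a single comparator
    c = odd_even_merge(a[0::2], b[0::2])            # 1-based odd positions
    d = odd_even_merge(a[1::2], b[1::2])            # 1-based even positions
    out = [c[0]]
    for i in range(len(d) - 1):
        # Each pair (d[i], c[i + 1]) goes through one comparator.
        out.append(min(d[i], c[i + 1]))
        out.append(max(d[i], c[i + 1]))
    out.append(d[-1])
    return out


if __name__ == "__main__":
    print(odd_even_merge([2, 4, 6, 8], [1, 3, 5, 7]))
```

All comparators in the final rank are independent of one another, which is what allows them to be executed in a single parallel step.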
TUTORIAL SHEET 5

1. Write an $O(\log n)$ matrix multiplication algorithm for the CREW PRAM model. Assume that $n = 2^k$, where k is a
positive integer.
2. What is … for the hypercube SIMD model?
3. Determine the processor efficiency of the hypercube SIMD matrix multiplication algorithm as a function of the
matrix dimension n.

a j i 0

not needed?

n / log p

5. Prove that it is possible to implement the odd-even reduction algorithm in … time using … processors
on the CREW PRAM.
6. Write a multicomputer targeted Gaussian elimination algorithm in which columns of the coefficient matrix A are
mapped to processors.
7. Write a parallel algorithm implementing the more general version of the Jacobi algorithm capable of solving the
two-dimensional heat equation.
8. Write a parallel algorithm implementing the more general version of the Jacobi algorithm capable of solving
arbitrary linear systems.
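
As background for problems 7 and 8, a minimal sequential sketch of the basic Jacobi iteration for a linear system Ax = b is shown below. It assumes A is strictly diagonally dominant (a sufficient condition for convergence); the function name `jacobi` and the fixed iteration count are choices made for this example. Every component of the new iterate depends only on the previous iterate, which is what makes the method easy to parallelize.

```python
def jacobi(a, b, iterations=50):
    """Approximate the solution of a x = b with the Jacobi iteration.

    a is a list of rows (assumed strictly diagonally dominant), b a list.
    Each new x[i] is computed from the previous iterate only, so all n
    components could be updated by n processors at the same time.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    # 4x + y = 6 and x + 3y = 7 have the exact solution x = 1, y = 2.
    print(jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0]))
```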
TUTORIAL SHEET 6
1. Rewrite the CRCW PRAM enumeration sort algorithm so that it requires only n(n-1)/2 processing elements, yet
still executes in constant time.
2. Use odd-even transposition sort to sort these sequences:
a) 5,8,3,2,4,6,4,1
b) 1,3,5,7,2,4,6,8
3. Do you think it is accurate to describe odd-even transposition sort as a parallel bubble sort? Justify your answer.
4. Can you design a sorting network that uses O(n) processors to sort a sequence of length n in O(log n) time?
5. Establish the correctness of procedure ODD-EVEN TRANSPOSITION.
6. In procedure MERGE SPLIT each processor needs at least 4n/N storage locations to merge two sequences of length
n/N each. Modify the procedure to require only 1 + n/N locations per processor.
7. Implement the idea of sorting by enumeration on a cube-connected SIMD computer and analyze the running time of
8. Show how procedure CRCW SORT can be modified to run on an EREW model and analyze its running time.
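
To experiment with problems 2, 3 and 5, the following sketch simulates odd-even transposition sort sequentially. The function name is hypothetical, and the inner loop stands in for the compare-exchange operations that the processors of a linear array would perform simultaneously in one phase.

```python
def odd_even_transposition_sort(seq):
    """Sort a list by simulating n phases of odd-even transposition sort."""
    s = list(seq)
    n = len(s)
    for phase in range(n):
        start = phase % 2                 # alternate between the two pairings
        # All compare-exchanges of one phase touch disjoint pairs, so on a
        # linear array they would be carried out in parallel.
        for i in range(start, n - 1, 2):
            if s[i] > s[i + 1]:
                s[i], s[i + 1] = s[i + 1], s[i]
    return s


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 8, 3, 2, 4, 6, 4, 1]))
```

Printing `s` at the end of each phase gives the step-by-step trace that problem 2 asks for.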

TUTORIAL SHEET 7
1. Design an algorithm for sorting on the pyramid machine.
2. Implement the idea of sorting by enumeration on a cube-connected SIMD computer and analyze the running time of

3. Derive an algorithm for sorting by enumeration on the EREW model. The algorithm should use $n^{1+1/k}$ processors,
where k is an arbitrary integer, and run in $O(k \log n)$ time.
4. Let the elements of the sequence S to be sorted belong to the set $\{0, 1, \ldots, m-1\}$. A sorting algorithm known as
sorting by bucketing first distributes the elements among a number of buckets that are then sorted individually.
Show that sorting can be completed in $O(\log n)$ time on the EREW model using n processors and $O(mn)$ memory
locations.
5. Which of the following sequences are bitonic sequences?
a) 2,3
b) 8,1
c) 2,5,3
d) 6,2,6,9,7
e) 3,3,4,5,2
f) 1,3,6,4,7,9
g) 8,4,2,1,2,5,7,9
h) 1,9,7,3,2,5
6. Prove or disprove: All sequences containing fewer than four elements are bitonic sequences.
7. How many shuffle-exchange steps does Stone's bitonic sorter require for n values, where $n = 2^k$
and each step uses n/2 comparators?
8. What is the worst-case time complexity of the parallel quicksort algorithm?
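
For checking answers to problems 5 and 6, here is a small sketch of a bitonic test. It assumes the common definition that a sequence is bitonic if some cyclic shift of it first does not decrease and then does not increase; the function name `is_bitonic` is hypothetical. The test counts how often the direction of change flips when the sequence is read as a cycle.

```python
def is_bitonic(seq):
    """Return True if seq is bitonic under the cyclic-shift definition."""
    n = len(seq)
    signs = []
    for i in range(n):
        diff = seq[(i + 1) % n] - seq[i]      # circular neighbour difference
        if diff != 0:                         # equal neighbours do not count
            signs.append(1 if diff > 0 else -1)
    # Count direction flips around the cycle; a bitonic sequence has at most 2.
    flips = sum(
        1 for i in range(len(signs)) if signs[i] != signs[(i + 1) % len(signs)]
    )
    return flips <= 2


if __name__ == "__main__":
    print(is_bitonic([1, 4, 6, 3, 2]), is_bitonic([1, 3, 2, 4, 0, 5]))
```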

TUTORIAL SHEET 8

1. Show that $\Omega(\log n)$ is a lower bound on the number of steps required to search a sorted sequence of n elements on
an EREW SM SIMD computer with n processors.
2. Consider a tree-connected SIMD computer where each node contains a record (not just the leaves). Describe
algorithms for querying and maintaining such a file of records.
3. Can the transpose of an n x n matrix be obtained on an interconnection network, other than the perfect shuffle, in
O(log n) time?
4. Is there an interconnection network capable of simulating procedure EREW TRANSPOSE in constant time?

5. Design an algorithm for multiplying two n x n matrices on a cube with $n^2$ processors in O(n) time.
6. Show that procedure CUBE CONNECTIVITY is not cost optimal. Can the procedure's cost be reduced?
7. Derive a parallel algorithm to compute the connectivity matrix of an n-vertex graph in O(n) time on an n x n mesh-connected SIMD computer.
8. An articulation point of a connected undirected graph G is a vertex whose removal splits G into two or more
connected components. Design a parallel algorithm to determine all the articulation points of a given graph.
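
As context for problems 6 and 7, the sketch below computes a connectivity matrix sequentially by repeated Boolean squaring of the adjacency matrix, the same idea that lets a parallel matrix multiplication procedure be reused for connectivity. The function name `connectivity_matrix` is hypothetical, and this is not a transcription of procedure CUBE CONNECTIVITY.

```python
def connectivity_matrix(adj):
    """Return c where c[i][j] is True iff a path joins vertices i and j.

    adj is an n x n 0/1 adjacency matrix. After adding self-loops, squaring
    the Boolean matrix doubles the path length covered, so about log2(n)
    squarings are enough.
    """
    n = len(adj)
    c = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    covered = 1                               # longest path length accounted for
    while covered < n - 1:
        c = [
            [any(c[i][k] and c[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        covered *= 2
    return c


if __name__ == "__main__":
    # Path 0 - 1 - 2 plus an isolated vertex 3.
    graph = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    for row in connectivity_matrix(graph):
        print([int(v) for v in row])
```

Rows of the result that are identical belong to the same connected component, which links this computation to the component-finding problem.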