
SEMINAR

ON
EXTERNAL PARALLEL SORTING

DEEPAK GUPTA
22-ME(CTA)-07
Delhi College of Engineering
Sunday, March 8, 2015

Seminar on External Parallel Sorting

external parallel sorting?


Simple definition:
External sorting involves sorting more data than can fit in the
combined memory of all the processors on the machine.

Challenges?
External sorting uses disk as a form of secondary memory. This
presents interesting challenges because of the huge difference
between the bandwidths and latencies of memory and disk systems.

contents
1. Sorting Algorithm Attributes
2. External Sorting
3. Algorithm to sort the data using external sorting
3.1 Two-way Merge sort
3.2 Balanced k-way Merge sort
3.3 Polyphase Merge sort
4. Parallel sorting basics
5. Algorithm to sort the data using external parallel sorting
5.1 Parallel compare-split algorithm
5.2 Sorting Network
5.3 Bitonic sorting
5.4 Shear sorting
6. Case study


sorting algorithm attributes


Internal vs. external
internal: data fits in memory
external: uses tape or disk

Comparison-based or not
(a) comparison sort
basic operation: compare elements and exchange them as necessary
$\Theta(n \log n)$ comparisons to sort n numbers
(b) non-comparison-based sort
e.g. radix sort based on the binary representation of data
$\Theta(n)$ operations to sort n numbers (for keys of a fixed bit width)

Parallel vs. sequential
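As a rough illustration of a non-comparison sort, the sketch below is a least-significant-bit-first binary radix sort. It assumes non-negative integer keys, and the name `binary_radix_sort` is our own. Each pass stably partitions the keys on one bit, so the work is O(n) per bit and no two keys are ever compared with each other.

```python
def binary_radix_sort(keys):
    """LSD binary radix sort for non-negative integers (a sketch, not tuned for speed)."""
    if not keys:
        return []
    width = max(keys).bit_length()  # number of bit positions that need a pass
    for bit in range(width):
        # Stable partition on the current bit: keys with a 0 bit keep their
        # relative order and go first; keys with a 1 bit keep theirs and go second.
        zeros = [x for x in keys if ((x >> bit) & 1) == 0]
        ones = [x for x in keys if ((x >> bit) & 1) == 1]
        keys = zeros + ones
    return keys


print(binary_radix_sort([50, 110, 95, 10, 100, 36]))  # [10, 36, 50, 95, 100, 110]
```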



external sorting?
Definition:
External sorting algorithms are used to sort large data files that cannot fit into the
main memory of the computer.

Algorithms Used:
The algorithms presented here are the two-way sort/merge, the balanced k-way
sort/merge, and the polyphase sort/merge.

Advantages:
1. Sorting transactions to match the key order of the master file can
reduce the time required to locate a matching master record.
2. Sorted transactions can also improve the maintenance of master files that use
other types of file organization.
3. Many reports generated from files require the file to be sorted, since the
output is more presentable and readable when sorted.

internal vs. external sorting?

Main Difference:
When the list to be sorted resides completely in the main memory of a single
processor, the sort is called internal sorting; if part of the list is in
main memory and the other part is in the secondary memory of a single processor,
it is called external sorting.

When is each used?
1. Internal sort is applied when $\text{SIZE}_{\text{DATA}} < \text{SIZE}_{\text{MAIN-MEMORY}}$.
2. External sort is applied when $\text{SIZE}_{\text{DATA}} \ge \text{SIZE}_{\text{MAIN-MEMORY}}$.


two-way merge sort


A sort/merge algorithm involves two steps:
Step 1. The records in the file to be sorted are divided into several
groups, called runs, each of which fits into main memory. An
internal sort is applied to each run, and the resulting sorted
runs are distributed between two external files.
Step 2. Runs are taken one at a time from each of the two external files
created in step 1 and merged into larger runs of sorted records. The
result is stored in a third external file. The runs are then distributed
from the third file back into the first two files, and the merging continues
until all records are in one large run.


two-way external merge sort

Idea: divide and conquer (sort subfiles, then merge them).
Each pass reads and writes every page in the file.
With N pages in the file, the number of passes is $\lceil \log_2 N \rceil + 1$,
so the total cost is $2N(\lceil \log_2 N \rceil + 1)$ page I/Os.

Example (two keys per page, N = 7 pages):
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9
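The pass structure of this example can be reproduced with a small in-memory simulation. This is a sketch under our own assumptions: Python lists stand in for disk pages, the names `two_way_external_sort` and `two_way_io_cost` are hypothetical, and `heapq.merge` performs the two-way merge of each pair of runs.

```python
import heapq


def two_way_external_sort(pages):
    """Simulate the passes of a two-way external merge sort.

    `pages` is a list of pages, each a list of keys. Returns one entry per
    pass; each entry is the list of sorted runs written by that pass.
    """
    # Pass 0: read each page, sort it in memory, write it back as a 1-page run.
    runs = [sorted(page) for page in pages]
    passes = [runs]
    # Passes 1, 2, ...: merge neighbouring pairs of runs (an odd run is carried over).
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes.append(runs)
    return passes


def two_way_io_cost(n_pages):
    """Number of passes and total page I/Os; each pass reads and writes every page."""
    n_passes = (n_pages - 1).bit_length() + 1  # ceil(log2 N) + 1 for N >= 1
    return n_passes, 2 * n_pages * n_passes


pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
for number, runs in enumerate(two_way_external_sort(pages)):
    print(f"PASS {number}:", runs)
print(two_way_io_cost(len(pages)))  # (4, 56)
```

Using `bit_length` keeps the pass count in integer arithmetic instead of relying on a floating-point logarithm.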

an example
Suppose we have a file of records with the following keys to be sorted in ascending
order:
50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

Assumption:
Let us assume that each run holds 3 records.
Step 1. We divide the keys into groups of 3 as follows:
(50, 110, 95), (10, 100, 36), (153, 40, 120), (60, 70, 130), (22, 140, 80).
This gives 5 runs. Each run is sorted internally, and the 5 runs are then
distributed between 2 files:
File 1 contains run 1 (50, 110, 95), run 3 (153, 40, 120), and run 5 (22, 140, 80);
File 2 contains run 2 (10, 100, 36) and run 4 (60, 70, 130), as shown below.
File 1 (runs after internal sorting): (50, 95, 110) | (40, 120, 153) | (22, 80, 140)
File 2 (runs after internal sorting): (10, 36, 100) | (60, 70, 130)

example continues
Step 2. We merge the two files, one run at a time, into a third file, File 3.
1. Run 1 in File 1 is merged with run 2 in File 2 to produce the
first run in File 3, with 6 sorted records.
2. Run 3 in File 1 is merged with run 4 in File 2 to produce the
second run in File 3, also with 6 sorted records.
3. File 2 is now empty and File 1 still contains run 5, so run 5 is
copied into File 3 as its third run, with 3 sorted records.
We then repeat the process: the runs in File 3 are distributed between File 1
and File 2, and the two files are merged back into File 3. These steps are
repeated until File 3 contains only one run.
Complexity of the Algorithm:
The number of merge passes of the two-way sort/merge is $\lceil \log_2 N_R \rceil$,
where $N_R$ is the number of runs. For this example, $\lceil \log_2 5 \rceil = 3$ passes.
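The steps of this example can be traced with a short simulation. It is a sketch under our own assumptions: Python lists of runs stand in for File 1, File 2 and File 3, `make_runs` and `two_way_sort_merge` are hypothetical names, and `heapq.merge` merges two sorted runs.

```python
import heapq


def make_runs(keys, run_size):
    """Step 1: cut the input into runs of `run_size` records and sort each run in memory."""
    return [sorted(keys[i:i + run_size]) for i in range(0, len(keys), run_size)]


def two_way_sort_merge(keys, run_size=3):
    """Two-way sort/merge; returns (sorted keys, number of merge passes)."""
    file3 = make_runs(keys, run_size)
    merge_passes = 0
    while len(file3) > 1:
        # Distribute the runs alternately to File 1 and File 2.
        file1, file2 = file3[0::2], file3[1::2]
        # Merge run i of File 1 with run i of File 2 into File 3.
        file3 = [list(heapq.merge(a, b)) for a, b in zip(file1, file2)]
        # File 1 may hold one extra run; it is copied to File 3 unchanged.
        if len(file1) > len(file2):
            file3.append(file1[-1])
        merge_passes += 1
    return (file3[0] if file3 else []), merge_passes


keys = [50, 110, 95, 10, 100, 36, 153, 40, 120, 60, 70, 130, 22, 140, 80]
result, merge_passes = two_way_sort_merge(keys)
print(result)
print(merge_passes)  # 3, i.e. ceil(log2 5) for the 5 initial runs
```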

balanced k-way merge sort

Introduction:
This algorithm improves on the previous one: by increasing the number of input
and output files, the number of runs in each file is decreased, so fewer merge
passes have to be performed.
Complications:
The merging phase is more complicated with k input files, because a k-way
merge needs to distribute the merged runs among k output files.
Advantage:
The sort/merge phase is faster than in the algorithm discussed previously.
Disadvantage:
It uses more files, and therefore more disk space.

an example
Suppose we have a file of records with the following keys to be sorted in ascending
order using a k-way merge sort with k = 3:
50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

Assumption:
Let us assume that each run holds 3 records.
STEP 1.
We divide the keys into groups of 3 as follows:
(50, 110, 95), (10, 100, 36), (153, 40, 120), (60, 70, 130), (22, 140, 80).
This gives 5 runs. Each run is sorted internally, and the 5 runs are then
distributed among 3 files:
File 1 contains run 1 (50, 110, 95) and run 4 (60, 70, 130);
File 2 contains run 2 (10, 100, 36) and run 5 (22, 140, 80);
File 3 contains run 3 (153, 40, 120), as shown below.

File 1 (runs after internal sorting): (50, 95, 110) | (60, 70, 130)
File 2 (runs after internal sorting): (10, 36, 100) | (22, 80, 140)
File 3 (runs after internal sorting): (40, 120, 153)

example continues
STEP 2.
We merge the three files, one run at a time, into a fourth, fifth and
sixth file in turn.
1. Run 1 in File 1 is merged with run 2 in File 2 and run 3 in File 3 to
produce the first run in File 4, with 9 sorted records.
2. Run 4 in File 1 is merged with run 5 in File 2 to produce the first
run in File 5, with 6 sorted records.
File 6 is not used, because there are not enough runs in this
example.
We then repeat step 1 by distributing the runs in File 4 and File 5
among File 1, File 2 and File 3, and merge those files into File 4, File 5 and
File 6 again. These steps are repeated until only one run remains.
Complexity of the Algorithm:
The number of passes is $\lceil \log_k N_R \rceil$, where $k$ is the number of files being
merged (which is also the number of output files) and $N_R$ is the number of runs.
For this example, $\lceil \log_3 5 \rceil = 2$ passes.
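The same example with k = 3 can be simulated in a few lines. This is a sketch under our own assumptions: Python lists of runs stand in for the files, `balanced_k_way_sort_merge` is a hypothetical name, and `heapq.merge` performs the k-way merge of one run taken from each input file.

```python
import heapq


def balanced_k_way_sort_merge(keys, k=3, run_size=3):
    """Balanced k-way sort/merge; returns (sorted keys, number of merge passes)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Step 1: build sorted runs of `run_size` records each.
    runs = [sorted(keys[i:i + run_size]) for i in range(0, len(keys), run_size)]
    passes = 0
    while len(runs) > 1:
        # Distribute the runs round-robin over k input files
        # (File 1 gets runs 1, k+1, ...; File 2 gets runs 2, k+2, ...).
        files = [runs[f::k] for f in range(k)]
        # Merge the j-th run of every input file into one longer run. In the
        # algorithm these merged runs go to the k output files in turn.
        runs = [list(heapq.merge(*(f[j] for f in files if j < len(f))))
                for j in range(len(files[0]))]
        passes += 1
    return (runs[0] if runs else []), passes


keys = [50, 110, 95, 10, 100, 36, 153, 40, 120, 60, 70, 130, 22, 140, 80]
result, passes = balanced_k_way_sort_merge(keys, k=3)
print(passes)  # 2, i.e. ceil(log3 5)
```

With k = 2 the same function reduces to the two-way case and needs 3 passes on this input.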

polyphase merge sort

Introduction:
The polyphase sort/merge algorithm is a further improvement on the algorithms
described above. It works like the balanced k-way sort/merge algorithm but
uses fewer files: whenever an input file becomes empty, it is turned into
the new output file.
Condition for the polyphase merge sort algorithm:
This algorithm is efficient only if the runs are distributed unequally among
the input files.
Example:
Suppose we have a file of records with the following keys to be sorted
in ascending order:
50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

Assumption:
Let us assume that each run holds 3 records.

example continues
Step 1. We divide the keys into groups of 3 as follows:
(50, 110, 95), (10, 100, 36), (153, 40, 120), (60, 70, 130), (22, 140, 80).
This gives 5 runs. Each run is sorted internally, and the 5 runs are then
distributed unequally between 2 files:
File 1 contains run 1 (50, 110, 95), run 3 (153, 40, 120), and run 5 (22, 140, 80);
File 2 contains run 2 (10, 100, 36) and run 4 (60, 70, 130), as shown below.
File 1 (runs after internal sorting): (50, 95, 110) | (40, 120, 153) | (22, 80, 140)
File 2 (runs after internal sorting): (10, 36, 100) | (60, 70, 130)

Step 2. We merge the two files, one run at a time, into a third file, File 3.
1. Run 1 in File 1 is merged with run 2 in File 2 to produce the
first run in File 3, with 6 sorted records.
2. Run 3 in File 1 is merged with run 4 in File 2 to produce the
second run in File 3, also with 6 sorted records.

example continues
3. At this point File 2 is empty, File 1 contains run 5, and File 3
contains two runs of 6 records each. So File 2 is used as the output file:
run 5 from File 1 is merged with the first run of File 3 to produce the
first run of File 2, with 9 sorted records.
4. Now File 1 is empty, File 2 contains one run of 9 sorted records, and
File 3 contains one run of 6 sorted records. So File 1 is used as the
output file: the run from File 2 is merged with the run from File 3 to
produce the first run of File 1.
These steps are repeated until only one run remains in one of the three
files; in this example, File 1 now holds all 15 records as a single sorted run.
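The four merge steps above can be replayed with a short simulation. It is a sketch under our own assumptions: each file is a Python list of already-sorted runs, `polyphase_merge` is a hypothetical name, and the function stops with an error when both input files run out in the same phase, which is what happens when the runs are distributed evenly and an extra redistribution would be needed.

```python
import heapq
from collections import deque


def polyphase_merge(files):
    """Polyphase merge over three files, each given as a list of sorted runs.

    One file must start empty; it becomes the first output file.
    Returns (final sorted run, number of merge phases).
    """
    files = [deque(f) for f in files]
    phases = 0
    while sum(len(f) for f in files) > 1:
        # The (first) empty file becomes the output file for this phase.
        out = next((i for i, f in enumerate(files) if not f), None)
        if out is None:
            raise ValueError("one of the three files must be empty")
        a, b = (files[i] for i in range(3) if i != out)
        if not a or not b:
            # Both inputs ran out in the same phase: the runs were spread evenly,
            # and merging cannot continue without redistributing them.
            raise ValueError("runs must be distributed unequally among the files")
        # Merge one run from each input until one of the inputs becomes empty.
        while a and b:
            files[out].append(list(heapq.merge(a.popleft(), b.popleft())))
        phases += 1
    remaining = [f for f in files if f]
    return (remaining[0][0] if remaining else []), phases


file1 = [[50, 95, 110], [40, 120, 153], [22, 80, 140]]  # runs 1, 3, 5 (sorted)
file2 = [[10, 36, 100], [60, 70, 130]]                  # runs 2, 4 (sorted)
run, phases = polyphase_merge([file1, file2, []])
print(run)
print(phases)  # 3: items 1-2 of step 2, then item 3, then item 4
```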


Parallel Sorting Basics


Where are the input and output lists stored?
(a) we assume that both the input and output lists are distributed among the processes

What is a parallel sorted sequence?

(a) the sequence is partitioned among the processes
(b) each process's sub-sequence is sorted
(c) every element in Pj's sub-sequence < every element in Pk's sub-sequence if j < k
the best numbering of the processes can depend on the network topology
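The three conditions can be written as a small check. This sketch is our own: a Python list of lists stands in for the distributed sub-sequences (listed in process-number order), `is_parallel_sorted` is a hypothetical name, and it uses ≤ rather than < in condition (c) so that equal keys may sit on both sides of a process boundary.

```python
def is_parallel_sorted(parts):
    """Check the definition of a parallel sorted sequence.

    `parts[j]` is the sub-sequence held by process Pj.
    """
    # (b) each process's sub-sequence is sorted
    if any(p[i] > p[i + 1] for p in parts for i in range(len(p) - 1)):
        return False
    # (c) everything in Pj's sub-sequence <= everything in Pk's for j < k.
    # Once every part is sorted, comparing neighbouring boundaries is enough.
    nonempty = [p for p in parts if p]
    return all(left[-1] <= right[0] for left, right in zip(nonempty, nonempty[1:]))


print(is_parallel_sorted([[1, 2], [3, 5], [7, 9]]))  # True
print(is_parallel_sorted([[1, 4], [3, 5]]))          # False
```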


Element-wise Parallel Compare-Exchange

Assumption:
The partitioning is one element per process.


Bulk Parallel Compare-Split

Process Pi retains the smaller values; process Pj retains the larger values.
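A sequential simulation of one compare-split step is shown below. It is a sketch under our own assumptions: the two sorted blocks are Python lists standing in for the n/p keys held by Pi and Pj, the message exchange itself is not modelled, and `compare_split` is a hypothetical name. Because both blocks are already sorted, a single linear merge is enough, which is where the O(n/p) merge cost comes from.

```python
def compare_split(block_i, block_j):
    """One compare-split step between Pi and Pj (i < j), simulated sequentially.

    Both blocks are sorted lists of n/p keys. Returns (new block for Pi,
    new block for Pj): Pi keeps the smaller n/p keys, Pj the larger n/p keys.
    """
    # After the two processes exchange blocks, each can run this linear merge,
    # which takes O(n/p) steps because both inputs are already sorted.
    merged = []
    a = b = 0
    while a < len(block_i) and b < len(block_j):
        if block_i[a] <= block_j[b]:
            merged.append(block_i[a])
            a += 1
        else:
            merged.append(block_j[b])
            b += 1
    merged.extend(block_i[a:])
    merged.extend(block_j[b:])
    keep = len(block_i)
    return merged[:keep], merged[keep:]


print(compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12]))
# ([1, 2, 6, 7, 8], [9, 10, 11, 12, 13])
```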



Basic Analysis
Assumptions
1. Pi and Pj are neighbors
2. communication channels are bi-directional

Element-wise compare-exchange: 1 element per processor
1. time = $T_s + T_w$, where $T_s$ is the message startup time and $T_w$ the per-word transfer time

Bulk compare-split: n/p elements per processor
1. after a compare-split on a pair of processors Pi and Pj, i < j,
the smaller n/p elements are at processor Pi and
the larger n/p elements are at Pj
2. communication time = $T_s + T_w \, n/p$;
the merge takes O(n/p) time, as long as the partial lists are sorted

sorting network
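Network of comparators designed for sorting
Comparator: two inputs x and y; two outputs x' and y'
increasing comparator (⊕): x' = min(x, y), y' = max(x, y)
decreasing comparator (⊖): x' = max(x, y), y' = min(x, y)
The time a sorting network takes is proportional to its depth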
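To make comparators and depth concrete, the sketch below encodes a small 4-input sorting network as a list of columns (groups of comparators that touch disjoint wires and can therefore fire in parallel) and simulates it. The specific network, five comparators in three columns, and the names `NETWORK_4` and `run_network` are our own illustration, not taken from the slides. The simulation fires the comparators of a column one after another, which gives the same result as firing them in parallel.

```python
from itertools import permutations

# A 4-input sorting network as columns of comparators (top wire, bottom wire).
# Comparators in the same column touch disjoint wires, so they can all fire
# in parallel; the number of columns is the depth of the network.
NETWORK_4 = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(1, 2)],
]


def run_network(columns, values):
    """Pass `values` through the network; each comparator leaves the min on its top wire."""
    wires = list(values)
    for column in columns:
        for top, bottom in column:  # independent comparators within one column
            if wires[top] > wires[bottom]:
                wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires


# Check the network on every ordering of four distinct keys.
assert all(run_network(NETWORK_4, p) == [1, 2, 3, 4] for p in permutations([1, 2, 3, 4]))
print("depth:", len(NETWORK_4))  # depth: 3
```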


Sorting Networks
Network structure: a series of columns
Each column consists of a vector of comparators (in parallel)
Sorting network organization:


Example: Bitonic Sorting Network


Bitonic sequence
two parts: increasing and decreasing
1,2,4,7,6,0 : first increases and then decreases (or vice versa)
cyclic rotation of a bitonic sequence is also considered bitonic
8,9,2,1,0,4 : cyclic rotation of 0,4,8,9,2,1

Bitonic sorting network


sorts n elements in $\Theta(\log^2 n)$ time
network kernel: rearranges a bitonic sequence into a sorted one
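The definition of a bitonic sequence, including the cyclic-rotation rule, can be checked mechanically. The sketch below is our own formulation (the name `is_bitonic` is hypothetical): going once around the sequence cyclically and ignoring equal neighbours, a bitonic sequence changes direction at most twice.

```python
def is_bitonic(seq):
    """True if `seq` is bitonic, allowing cyclic rotations.

    Walk around the sequence cyclically and record the direction (+1 up,
    -1 down) of every non-zero step; a bitonic sequence changes direction
    at most twice.
    """
    n = len(seq)
    steps = [seq[(i + 1) % n] - seq[i] for i in range(n)]
    signs = [1 if d > 0 else -1 for d in steps if d != 0]
    # signs[i - 1] with i == 0 wraps around to the last sign (cyclic comparison).
    changes = sum(1 for i in range(len(signs)) if signs[i] != signs[i - 1])
    return changes <= 2


print(is_bitonic([1, 2, 4, 7, 6, 0]))  # True
print(is_bitonic([8, 9, 2, 1, 0, 4]))  # True (rotation of 0,4,8,9,2,1)
print(is_bitonic([1, 3, 2, 4]))        # False
```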


Bitonic Split


Bitonic Merge
Sort a bitonic sequence through a series of bitonic splits
Example: use bitonic merge to sort a 16-element bitonic sequence
How: perform a series of $\log_2 16 = 4$ bitonic splits
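A recursive sketch of the bitonic split and bitonic merge follows. The function names are our own, the input length is assumed to be a power of two, and the 16-element input is an arbitrary bitonic sequence chosen for illustration. For 16 elements the recursion applies $\log_2 16 = 4$ levels of splits.

```python
def bitonic_split(seq):
    """One bitonic split: compare element i with element i + n/2.

    For a bitonic input of even length n, both halves are bitonic and every
    element of the first half is <= every element of the second half.
    """
    half = len(seq) // 2
    low = [min(seq[i], seq[i + half]) for i in range(half)]
    high = [max(seq[i], seq[i + half]) for i in range(half)]
    return low, high


def bitonic_merge(seq):
    """Sort a bitonic sequence whose length is a power of two into increasing order."""
    if len(seq) <= 1:
        return list(seq)
    low, high = bitonic_split(seq)
    return bitonic_merge(low) + bitonic_merge(high)


s = [3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0]
print(bitonic_merge(s))
```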


sorting via bitonic merging network


Sorting networks can implement the bitonic merge algorithm
such a network is called a bitonic merging network

Network structure
$\log_2 n$ columns
each column has n/2 comparators and
performs one step of the bitonic merge

Bitonic merging network with n inputs: ⊕BM[n]
yields an increasing output sequence

Replacing the increasing (⊕) comparators by decreasing (⊖) comparators: ⊖BM[n]
yields a decreasing output sequence
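The column structure can be generated directly. In this sketch (the function names are our own) column t pairs wires that are n/2^t apart, so there are $\log_2 n$ columns of n/2 comparators each; flipping every pair turns the increasing network ⊕BM[n] into the decreasing one ⊖BM[n].

```python
def bitonic_merging_network(n, increasing=True):
    """Comparator columns of a bitonic merging network on n = 2**m wires.

    Each comparator (a, b) leaves the smaller key on wire a. For the increasing
    network a < b; for the decreasing network every pair is flipped.
    """
    columns = []
    span = n // 2
    while span >= 1:
        # Pair wire i with wire i + span whenever bit `span` of i is 0.
        pairs = [(i, i + span) for i in range(n) if (i & span) == 0]
        columns.append(pairs if increasing else [(b, a) for a, b in pairs])
        span //= 2
    return columns


def apply_columns(columns, values):
    """Run keys through the network; comparators in one column are independent."""
    wires = list(values)
    for column in columns:
        for a, b in column:
            if wires[a] > wires[b]:  # leave the smaller key on wire a
                wires[a], wires[b] = wires[b], wires[a]
    return wires


net = bitonic_merging_network(16)
print(len(net), [len(column) for column in net])  # 4 [8, 8, 8, 8]
```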

Bitonic Sort
How do we sort an unsorted sequence using a
bitonic merge?
Two steps
Step 1. Build a bitonic sequence
Step 2. Sort it using a bitonic merging network


Building a Bitonic Sequence


1. Build a single bitonic sequence from the given sequence
any sequence of length 2 is a bitonic sequence
build a bitonic sequence of length 4:
sort the first two elements in increasing order using ⊕BM[2]
sort the next two in decreasing order using ⊖BM[2]

2. Repeatedly merge to generate larger bitonic sequences

⊕BM[k] and ⊖BM[k]: increasing and decreasing bitonic merging networks of size k
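The two steps can be combined recursively: sort the two halves in opposite directions so that together they form a bitonic sequence, then apply a bitonic merge in the required direction. This is a sequential sketch (function names are our own, and the input length is assumed to be a power of two); the comparisons made at one level of `_bitonic_merge` are independent of each other and correspond to one column of the network.

```python
def bitonic_sort(seq, increasing=True):
    """Bitonic sort for a sequence whose length is a power of two (recursive sketch)."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    # Step 1: sort the first half increasing and the second half decreasing;
    # together they form a bitonic sequence of length n.
    first = bitonic_sort(seq[:n // 2], increasing=True)
    second = bitonic_sort(seq[n // 2:], increasing=False)
    # Step 2: bitonic-merge the whole sequence in the requested direction.
    return _bitonic_merge(first + second, increasing)


def _bitonic_merge(seq, increasing):
    """Bitonic merge in the given direction (plays the role of ⊕BM or ⊖BM)."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    seq = list(seq)
    for i in range(half):
        # Comparator between wires i and i + half; its direction depends on the network.
        if (seq[i] > seq[i + half]) == increasing:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return _bitonic_merge(seq[:half], increasing) + _bitonic_merge(seq[half:], increasing)


data = [10, 20, 5, 9, 3, 8, 12, 14, 90, 0, 60, 40, 23, 35, 95, 18]
print(bitonic_sort(data))
```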


Building a Bitonic Sequence


Bitonic Sort, n = 16


complexity of bitonic sorting networks
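For reference, the standard depth analysis of a bitonic sorting network on $n = 2^m$ inputs is sketched below. It assumes, as in the construction above, that the two halves are sorted in parallel by smaller bitonic sorting networks and then combined by one bitonic merging network with $\log_2 n$ columns, so the depth $d(n)$ satisfies:

```latex
d(n) = d\!\left(\frac{n}{2}\right) + \log_2 n, \qquad d(1) = 0
\quad\Longrightarrow\quad
d(n) = \sum_{i=1}^{\log_2 n} i = \frac{(\log_2 n)(\log_2 n + 1)}{2} = \Theta(\log^2 n)
```

Each column holds n/2 comparators, so the network uses $\frac{n}{2}\,d(n) = \Theta(n \log^2 n)$ comparators; for n = 16 this gives a depth of 10 and 80 comparators.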


shear sort
How do we sort an unsorted sequence using shear sort?
The following steps are used:
Step 1. Arrange all the elements in a snake-like structure (a two-dimensional grid).
Step 2. Sort the rows, alternately increasing (see arrow) and
decreasing.
Step 3. Sort the columns, increasing.
Step 4. Sort the rows.
Step 5. Sort the columns.
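The steps can be sketched sequentially as follows. This is our own illustration (the names `shear_sort` and `snake_order` are hypothetical): each row or column sort stands in for a sort that the processors of that row or column would perform in parallel, and the sketch uses the usual bound of $\lceil \log_2 r \rceil + 1$ row phases for r rows, with a column phase between consecutive row phases.

```python
def shear_sort(grid):
    """Shear (snake) sort of an r x c grid; returns a new, sorted grid.

    The result is sorted in snake order: even-numbered rows (0, 2, ...) read
    left to right, odd-numbered rows read right to left.
    """
    g = [list(row) for row in grid]
    rows, cols = len(g), len(g[0])
    # ceil(log2 rows) + 1 row phases, with a column phase between consecutive ones.
    row_phases = (rows - 1).bit_length() + 1
    for phase in range(row_phases):
        # Sort the rows, alternately increasing and decreasing.
        for r in range(rows):
            g[r].sort(reverse=(r % 2 == 1))
        if phase == row_phases - 1:
            break
        # Sort every column in increasing order from top to bottom.
        for c in range(cols):
            column = sorted(g[r][c] for r in range(rows))
            for r in range(rows):
                g[r][c] = column[r]
    return g


def snake_order(grid):
    """Read the grid in snake order."""
    return [x for r, row in enumerate(grid) for x in (row if r % 2 == 0 else row[::-1])]


grid = [[15, 4, 10, 6], [1, 5, 7, 11], [3, 13, 2, 9], [8, 14, 12, 16]]
for row in shear_sort(grid):
    print(row)
print(snake_order(shear_sort(grid)))
```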


shear Sort
Also known as snake sort.
Each row has 4 columns (or 4 processors).
Step 1 is shown, in which all the elements are arranged
in a snake-like structure.

shear sort
Step 2. Sort the rows, alternately increasing (see arrow) and
decreasing (local sort).


shear sort
Step 3. Sort the columns, or, equivalently, transpose the complete structure
and sort its rows.


shear sort
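Continue alternating row and column sorts in this fashion until the
elements are sorted in snake order.

Complexity Analysis of Shear sort:
n numbers on n processors, arranged as a $\sqrt{n} \times \sqrt{n}$ grid
$O(\log n)$ phases, each sorting every row and then every column of $\sqrt{n}$ elements
with $T(\sqrt{n})$ the time to sort one row or column in parallel, total time: $O(\log n \cdot T(\sqrt{n}))$
with odd-even transposition sort, $T(\sqrt{n}) = O(\sqrt{n})$, so total time: $O(\sqrt{n} \log n)$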


Case Study
Parallel disk system
Assumptions:
1. A distributed memory parallel computer
2. A parallel filesystem where all CPUs can access all the data
The machine used is a 128-cell AP1000 with 32 disks and the
HiDIOS parallel filesystem.

Note: The AP1000 machine has only local disk access in hardware.
Remote disk access is provided by the operating system.

case study continues


Designing an algorithm:
An external sorting algorithm needs to make extensive use of
secondary memory (usually disk). We use a parameter k = N/M,
where N is the number of elements being sorted and M is the number
of elements that fit in the internal memory of the machine.
k = 1 implies internal sorting, and any value of k greater than 1
implies external sorting.

On the available AP1000, k is limited to 8 (a total of
2 GB of RAM and 16 GB of disk).


case study continues


Algorithm overview
The external algorithm used here is shear sort, in which the file is
mapped onto a two-dimensional k × k grid in a snake-like fashion.
The grid is further divided into slices, each of which contains enough
elements to fill one processor.


case study continues


The partitioning with k = 3 and p = 6 is shown below

Each slice in the above figure is labeled with two sets of coordinates. One
is its slice number and the other is its (row,column) coordinates.
The slice number uniquely defines the slice whereas the (row,column)
coordinates are shared between all slices in a grid square.

case study continues


The difficulties in partitioning the elements among grid squares or
slices:
1. The number of slices in a row or column must be less than
or equal to the number of processors P.
2. The number of elements in a slice must be less than the
amount of memory M available in a processor.
After the elements have been partitioned, we use shear sort
to sort all the elements in increasing order, using the steps
described earlier.
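The two constraints can be checked when choosing a partitioning. The sketch below is hypothetical and not taken from the case study: it assumes each of the k × k grid squares is cut into the same number s of slices (so a row or column of the grid has k · s slices) and searches for the smallest s that satisfies both constraints, treating "fits in a processor" as at most `mem_per_proc` elements. `plan_slices` and its parameter names are our own.

```python
import math


def plan_slices(n_elements, k, p, mem_per_proc):
    """Hypothetical planner for the two partitioning constraints.

    Returns (slices per grid square, slices per row, elements per slice),
    or None if no choice satisfies both constraints.
    """
    per_square = math.ceil(n_elements / (k * k))
    for s in range(1, p // k + 1):         # constraint 1: k * s <= P
        per_slice = math.ceil(per_square / s)
        if per_slice <= mem_per_proc:      # constraint 2: a slice fits in one processor
            return s, k * s, per_slice
    return None


print(plan_slices(18, k=3, p=6, mem_per_proc=1))  # (2, 6, 1)
```

With k = 3 and p = 6, the first constraint allows at most two slices per grid square under this assumption.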



Thanks

Any questions?