
Relational Algebra & Sync Sort

A Technical Flyer
Pravin Sharma
Table of Contents

1 Preface
2 SORT Basics
3 Schema Used in Examples
4 Basic Structure of SQL Query
5 SQL & SYNCSORT
6 JOINS using SORT
7 Tutorial
Preface

Sometimes love at first sight lasts for years.

This text offers new and interesting ways of using Mainframe’s SYNC-SORT utility, so sit up straight and
pay attention to what the document has to say. Then watch your efficiency speed up (while having some
fun too).

Sync Sort & Relational Algebra is a text designed to give an idea of how a Database Management System can
be replaced with flat file processing. Flat files are a fast and cost-effective data processing mechanism
where the amount of data is huge.

Flat files still have limitations: security mechanisms, integrity constraints, and concurrency control
can only be implemented through a good Relational Database Management System.
SORT Basics
SORT Definition
SORT as a verb means to arrange according to class, kind, or size.
Sorting is nothing but arranging the data in ordered fashion for optimal retrieval & faster searching.
Sorting Algorithm
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in
a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is
important to optimizing the use of other algorithms (such as search and merge algorithms) that require
sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-
readable output. More formally, the output must satisfy two conditions:

• The output is in non-decreasing order (each element is no smaller than the previous element
according to the desired total order);
• The output is a permutation, or reordering, of the input.
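
The two conditions above can be checked mechanically. The following is a minimal Python sketch; the helper name is_valid_sort is ours and is not part of any library.

```python
from collections import Counter

def is_valid_sort(original, result):
    """Return True if result is a non-decreasing permutation of original."""
    # Condition 1: every element is no smaller than the one before it.
    non_decreasing = all(a <= b for a, b in zip(result, result[1:]))
    # Condition 2: result holds exactly the same elements as original.
    same_elements = Counter(original) == Counter(result)
    return non_decreasing and same_elements

print(is_valid_sort([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]))  # True
print(is_valid_sort([5, 4, 3, 2, 1], [1, 2, 3, 4, 4]))  # False: not a permutation
```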

Which looks good?

It’s easy to find an item if the items are arranged in an ordered fashion.

[Figure: the items 1 to 5 shown side by side, once unsorted and once sorted]
Summaries of popular sorting algorithms
Bubble sort
Bubble sort is a straightforward and simplistic method of sorting data that is used in computer science
education. The algorithm starts at the beginning of the data set. It compares the first two elements, and
if the first is greater than the second, it swaps them. It continues doing this for each pair of adjacent
elements to the end of the data set. It then starts again with the first two elements, repeating until no
swaps have occurred on the last pass. While simple, this algorithm is highly inefficient and is rarely used
except in education. For example, if we have 100 elements then the total number of comparisons can be on
the order of 100² = 10,000 in the worst case. A slightly better variant, cocktail sort, works by inverting
the ordering criteria and the pass direction on alternating passes. Its average case and worst case are
both O(n²).
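
The passes described above can be sketched in Python as follows. This is an illustrative sketch only; the function name and the comparison counter are ours. It repeats full passes until one pass makes no swap, exactly as in the description.

```python
def bubble_sort(items):
    """Sort a copy of items; return (sorted_list, comparison_count)."""
    data = list(items)
    comparisons = 0
    swapped = True
    while swapped:                      # repeat until a pass makes no swap
        swapped = False
        for i in range(len(data) - 1):  # compare each pair of adjacent elements
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
    return data, comparisons

# Worst case: 100 elements in reverse order.
result, count = bubble_sort(range(100, 0, -1))
print(result[:5], count)  # [1, 2, 3, 4, 5] 9900
```

For the reversed input, 100 passes of 99 comparisons each are needed, which is close to the 10,000 figure quoted for 100 elements.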

Selection sort
Selection sort is a simple sorting algorithm that improves on the performance of bubble sort. It works by
first finding the smallest element using a linear scan and swapping it into the first position in the list,
then finding the second smallest element by scanning the remaining elements, and so on. Selection sort
is unique compared to almost any other algorithm in that its running time is not affected by the prior
ordering of the list: it performs the same number of operations because of its simple structure. Selection
sort requires (n - 1) swaps and hence Θ(n) memory writes. However, selection sort requires (n - 1) + (n -
2) + ... + 2 + 1 = n(n - 1) / 2 = Θ(n²) comparisons. Thus it can be very attractive if writes are the most
expensive operation, but otherwise selection sort will usually be outperformed by insertion sort or the
more complicated algorithms.
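
A short Python sketch makes the comparison and swap counts concrete. The function name and the two counters are ours, added only for illustration.

```python
def selection_sort(items):
    """Sort a copy of items; return (sorted_list, comparisons, swaps)."""
    data = list(items)
    comparisons = swaps = 0
    n = len(data)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):       # linear scan of the unsorted remainder
            comparisons += 1
            if data[j] < data[smallest]:
                smallest = j
        if smallest != i:               # at most one swap per position
            data[i], data[smallest] = data[smallest], data[i]
            swaps += 1
    return data, comparisons, swaps

print(selection_sort([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 10, 2)
print(selection_sort([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 10, 0)
```

Both calls perform n(n - 1) / 2 = 10 comparisons for n = 5, regardless of the prior ordering, while the number of swaps never exceeds n - 1.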
Insertion sort
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted
lists, and often is used as part of more sophisticated algorithms. It works by taking elements from the list
one by one and inserting them in their correct position into a new sorted list. In arrays, the new list and
the remaining elements can share the array's space, but insertion is expensive, requiring shifting all
following elements over by one. The insertion sort works just like its name suggests - it inserts each item
into its proper place in the final list. The simplest implementation of this requires two list structures -
the source list and the list into which sorted items are inserted. To save memory, most implementations
use an in-place sort that works by moving the current item past the already sorted items and repeatedly
swapping it with the preceding item until it is in place. Shell sort (see below) is a variant of insertion sort
that is more efficient for larger lists. This method is much more efficient than the bubble sort, though it
has more constraints.

Shell sort
Shell sort was invented by Donald Shell in 1959. It improves upon bubble sort and insertion sort by
moving out of order elements more than one position at a time. One implementation can be described
as arranging the data sequence in a two-dimensional array and then sorting the columns of the array
using insertion sort. Although this method is inefficient for large data sets, it is one of the fastest
algorithms for sorting small numbers of elements (sets with less than 1000 or so elements). Another
advantage of this algorithm is that it requires relatively small amounts of memory.

Merge sort
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by
comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should
come after the second. It then merges each of the resulting lists of two into lists of four, then merges
those lists of four, and so on; until at last two lists are merged into the final sorted list. Of the algorithms
described here, this is the first that scales well to very large lists, because its worst-case running time is
O(n log n).
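
The bottom-up merging described above can be sketched in Python as follows. This is an illustrative sketch with names of our own choosing: it starts from runs of one element and keeps merging neighbouring runs until a single sorted run remains.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list, keeping equal items in order."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort_bottom_up(items):
    """Merge runs of length 1 into 2, then 4, and so on, until one run is left."""
    runs = [[x] for x in items]
    while len(runs) > 1:
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(merge_sort_bottom_up([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```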

Heap sort
Heap sort is a much more efficient version of selection sort. It also works by determining the largest (or
smallest) element of the list, placing that at the end (or beginning) of the list, then continuing with the
rest of the list, but accomplishes this task efficiently by using a data structure called a heap, a special
type of binary tree. Once the data list has been made into a heap, the root node is guaranteed to be the
largest element. When it is removed and placed at the end of the list, the heap is rearranged so the
largest element remaining moves to the root. Using the heap, finding the next largest element takes
O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This allows Heap sort to run in
O(n log n) time.

Quick sort
Quick sort is a divide and conquer algorithm which relies on a partition operation: to partition an array,
we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater
elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the
lesser and greater sub lists. Efficient implementations of Quick sort (with in-place partitioning) are
typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice.
Together with its modest O(log n) space usage, this makes quick sort one of the most popular sorting
algorithms, available in many standard libraries. The most complex issue in quick sort is choosing a good
pivot element; consistently poor choices of pivots can result in drastically slower (O(n²)) performance,
but if at each step we choose the median as the pivot then it works in O(n log n).
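
The partition operation can be sketched in Python as below. This is a simple illustration, not a tuned implementation: it assumes the last element of each range as the pivot, which is one of the poor choices mentioned above when the input is already sorted.

```python
def quick_sort(data, low=0, high=None):
    """Sort the list data in place between indexes low and high (inclusive)."""
    if high is None:
        high = len(data) - 1
    if low >= high:
        return
    pivot = data[high]
    boundary = low  # data[low:boundary] holds the elements smaller than the pivot
    for i in range(low, high):
        if data[i] < pivot:
            data[i], data[boundary] = data[boundary], data[i]
            boundary += 1
    # Place the pivot between the smaller and the greater elements.
    data[boundary], data[high] = data[high], data[boundary]
    quick_sort(data, low, boundary - 1)   # lesser sub list
    quick_sort(data, boundary + 1, high)  # greater sub list

values = [3, 1, 4, 1, 5, 9, 2, 6]
quick_sort(values)
print(values)  # [1, 1, 2, 3, 4, 5, 6, 9]
```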

Bucket sort
Bucket sort is a sorting algorithm that works by partitioning an array into a finite number of buckets.
Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively
applying the bucket sorting algorithm. A variation of this method called the single buffered count sort is
faster than the quick sort and takes about the same time to run on any set of data.

Radix sort
Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k) time by treating
them as bit strings. We first sort the list by the least significant bit while preserving their relative order
using a stable sort. Then we sort them by the next bit, and so on from right to left, and the list will end
up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since the
number of values a bit can have is small.
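
A bit-by-bit radix sort can be sketched in Python as follows. The sketch assumes non-negative integers that fit in the given number of bits; the stable split into a "zeros" group and a "ones" group plays the role of the counting sort on a single bit.

```python
def radix_sort_bits(numbers, bits=8):
    """Sort non-negative integers below 2**bits, least significant bit first."""
    data = list(numbers)
    for bit in range(bits):
        zeros = [x for x in data if not (x >> bit) & 1]
        ones = [x for x in data if (x >> bit) & 1]
        data = zeros + ones  # stable: order inside each group is preserved
    return data

print(radix_sort_bits([170, 45, 75, 90, 2, 24, 66]))
# [2, 24, 45, 66, 75, 90, 170]
```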

Following are the different sorting mechanisms & their complexities


Comparison Sorts
Name                 Average      Worst        Memory    Stable  Method        Other notes
Bubble sort          O(n²)        O(n²)        O(1)      Yes     Exchanging    Two consecutive elements are compared
Cocktail sort        —            O(n²)        O(1)      Yes     Exchanging
Comb sort            —            —            O(1)      No      Exchanging    Small code size
Gnome sort           —            O(n²)        O(1)      Yes     Exchanging    Tiny code size
Selection sort       O(n²)        O(n²)        O(1)      No      Selection     Can be implemented as a stable sort
Insertion sort       O(n²)        O(n²)        O(1)      Yes     Insertion     Average case is also O(n + d), where d is the number of inversions
Shell sort           —            O(n log² n)  O(1)      No      Insertion
Binary tree sort     O(n log n)   O(n log n)   O(n)      Yes     Insertion     When using a self-balancing binary search tree
Library sort         O(n log n)   O(n²)        O(n)      Yes     Insertion
Merge sort           O(n log n)   O(n log n)   O(n)      Yes     Merging
In-place merge sort  O(n log n)   O(n log n)   O(1)      No      Merging
Heapsort             O(n log n)   O(n log n)   O(1)      No      Selection
Smoothsort           —            O(n log n)   O(1)      No      Selection
Quicksort            O(n log n)   O(n²)        O(log n)  No      Partitioning  Naïve variants use O(n) space; can be O(n log n) worst case if median pivot is used
Introsort            O(n log n)   O(n log n)   O(log n)  No      Hybrid        Used in most implementations of STL
Patience sorting     —            O(n²)        O(n)      No      Insertion     Finds all the longest increasing subsequences within O(n log n)
Strand sort          O(n log n)   O(n²)        O(n)      Yes     Selection
The following animation link gives a comparative idea of the different sorting mechanisms.
http://vision.bc.edu/~dmartin/teaching/sorting/anim-html/all.html
Schema used in examples

DDL Statements

Table Creation Query
CREATE TABLE LOAN
( LOANNMBR CHAR(10) NOT NULL,
BRANCHNM CHAR(15),
AMOUNT INTEGER,
PRIMARY KEY (LOANNMBR));

Query to Create Index
CREATE UNIQUE INDEX IDX_LOAN
ON LOAN (LOANNMBR);

Query to Create Table Borrower
CREATE TABLE BORROWER
(CUSTNAME CHAR(20) NOT NULL,
LOANNMBR CHAR(10) NOT NULL,
PRIMARY KEY (CUSTNAME,LOANNMBR));

Query to Create Index
CREATE UNIQUE INDEX IDX_BORR
ON BORROWER (CUSTNAME,LOANNMBR);

COBOL Equivalent Declarations Generated by DCLGEN Utility

********************************************
* COBOL DECLARATION FOR TABLE LOAN *
********************************************
EXEC SQL DECLARE LOAN TABLE
( LOANNMBR CHAR(10) NOT NULL,
BRANCHNM CHAR(15),
AMOUNT INTEGER
) END-EXEC.
********************************************
* COBOL DECLARATION FOR TABLE LOAN *
********************************************
01 DCLLOAN.
10 LOANNMBR PIC X(10).
10 BRANCHNM PIC X(15).
10 AMOUNT PIC S9(9) USAGE COMP.
********************************************

********************************************
* COBOL DECLARATION FOR TABLE BORROWER *
********************************************
EXEC SQL DECLARE BORROWER TABLE
( CUSTNAME CHAR(20) NOT NULL,
LOANNMBR CHAR(10) NOT NULL
) END-EXEC.
********************************************
* COBOL DECLARATION FOR TABLE BORROWER *
********************************************
01 DCLBORROWER.
10 CUSTNAME PIC X(20).
10 LOANNMBR PIC X(10).
********************************************

Data in Loan Relation

loannmbr branchnm amount
L-170 DOWNTOWN 3000
L-230 REDWOOD 4000
L-260 PERRYRIDGE 1700

Data in Borrower Relation

custname loannmbr
HAYES L-155
JONES L-170
SMITH L-230
Basic Structure of SQL Query
SQL is based on set and relational operations with certain modifications and enhancements
A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
Each Ai represents an attribute
Each ri represents a relation
P is a predicate.

This query is equivalent to the relational algebra expression.


Π A1, A2, ..., An (σ P (r1 × r2 × ... × rm))

The result of an SQL query is a relation.
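
The equivalence between the select/from/where form and the relational algebra expression can be sketched in Python. This is an illustration only: the function name query, the dictionary-based rows, and the prefixed column names such as "loan.loannmbr" are our assumptions, and only a part of the example data is used.

```python
from itertools import product

def query(relations, predicate, attributes):
    """Evaluate select/from/where as projection(selection(Cartesian product))."""
    result = []
    for combination in product(*relations):   # r1 x r2 x ... x rm
        row = {}
        for part in combination:
            row.update(part)
        if predicate(row):                     # selection by predicate P
            result.append(tuple(row[a] for a in attributes))  # projection on A1..An
    return result

loan = [
    {"loan.loannmbr": "L-170", "loan.amount": 3000},
    {"loan.loannmbr": "L-260", "loan.amount": 1700},
]
borrower = [{"borrower.custname": "JONES", "borrower.loannmbr": "L-170"}]

print(query([loan, borrower],
            lambda r: r["loan.loannmbr"] == r["borrower.loannmbr"],
            ["borrower.custname", "loan.amount"]))
# [('JONES', 3000)]
```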

“A query is a way of retrieving some subset of information from a database.”
That information might be a single number
such as a product price, a list of members with
overdue subscriptions, or some sort of
calculation such as the total amount of
products sold in the past 12 months.

“A query is like a window on our database through which we can see just the information
we require.”
We will further explore flavours of above query using Mainframe utility SYNCSORT.

• Projection Operation
The select clause lists the attributes desired in the result of a query and corresponds to the projection
operation of the relational algebra. Basically, projection is a selection criterion on the columns of a relation.
Projection can also be viewed as reformatting.

[Figure: a subset of columns forms the resulting table]

E.g. find the names of all branches in the loan relation; the expression below projects the branchnm
column from the loan relation.
SQL implementation
select branchnm
from loan

Relational Algebra
In the “pure” relational algebra syntax, the query would be:
Πbranchnm (loan)

SYNCSORT Implementation
Projection operator can be implemented through
INREC : Which columns will go for processing?
INREC FIELDS=(11,15)
SORT : SORT command
SORT FIELDS=COPY
Syntax
SORT FIELDS=(<start-pos>,<length-in-bytes>,<field-format>,<sort-sequence>,…)
<start-pos> specifies starting position of field on which sorting is to be done
<length-in-bytes> specifies length of sort field in bytes
<field-format> specifies data format of the sort field. Field formats are
CH EBCDIC character sequence
AC ASCII character sequence
BI Binary sequence
ZD Zoned Decimal
PD Packed Decimal
<sort-sequence> “A” or “D” indicating Ascending or Descending sequence
Other Possible format is
SORT FIELDS=COPY
Applicable only if you don’t want to sort or re-order.

Sample JCL step for SORT


//SORTPRO EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=LOAN.DOWNLOAD,
// DISP=SHR
//SORTOUT DD DSN=LOAN.BRANCHNM,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=15,BLKSIZE=0
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN DD *
INREC FIELDS=(11,15)
SORT FIELDS=COPY
/*

The output dataset of the above JCL will contain the following records:

branchnm
DOWNTOWN
REDWOOD
PERRYRIDGE
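
To see what INREC FIELDS=(11,15) does to each record, the byte positions can be emulated in Python. This is only a sketch of the idea, not of the utility itself: the record layout follows the LOAN schema (10 + 15 + 4 bytes), the helper names are ours, and ASCII is used here instead of EBCDIC.

```python
import struct

def make_loan_record(loannmbr, branchnm, amount):
    """Build a 29-byte record: 10-byte loan number, 15-byte branch name, 4-byte binary amount."""
    return (loannmbr.ljust(10).encode("ascii")
            + branchnm.ljust(15).encode("ascii")
            + struct.pack(">i", amount))

def inrec_fields(records, start, length):
    """Keep `length` bytes beginning at 1-based position `start` of every record."""
    return [record[start - 1:start - 1 + length] for record in records]

LOAN_FILE = [
    make_loan_record("L-170", "DOWNTOWN", 3000),
    make_loan_record("L-230", "REDWOOD", 4000),
    make_loan_record("L-260", "PERRYRIDGE", 1700),
]

for branch in inrec_fields(LOAN_FILE, 11, 15):
    print(branch.decode("ascii").rstrip())
```

Each output record is 15 bytes long, which is why the SORTOUT dataset is allocated with LRECL=15.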

• Selection Operation
The select operation selects tuples that satisfy a given predicate. Select chooses data horizontally (rows).

[Figure: a subset of rows forms the resulting table]


E.g. find the rows from loan relation for branch-name = “Perryridge”

SQL implementation
Horizontal selection is implemented in SQL through WHERE, LIKE, HAVING, IN, NOT IN clauses.
Query for given example will be as follows.
select *
from loan
where branchnm = 'PERRYRIDGE'

Relational Algebra
In the relational algebra syntax, the query would be:
Π loannmbr, branchnm, amount (σ branchnm = “PERRYRIDGE” (loan))

SYNCSORT Implementation
Select operation can be implemented by
INCLUDE COND
INCLUDE Control Statement Syntax
INCLUDE COND=(<start-pos>,<length-in-bytes>,<field-format>,<conditional operator>,
<expression>,<logical operator>,…)
<start-pos> specifies starting position of the field to be compared
<length-in-bytes> specifies length of the compare field in bytes
<field-format> specifies data format of the compare field. Field formats are
CH EBCDIC character sequence
AC ASCII character sequence
BI Binary sequence
ZD Zoned Decimal
PD Packed Decimal
<conditional operator> specifies operator to be tested
EQ Equals
NE Not Equals
LT Less Than
LE Less than or Equals
GT Greater Than
GE Greater than or Equals
<expression> specifies either constant or other columns as
(<start-pos>,<length-in-bytes>,<field-format>)
<logical operator> AND/OR
Simplified way
The expression will be translated as
INCLUDE COND=(11,15,CH,EQ,C'PERRYRIDGE')
Finally, the sort card will be
INCLUDE COND=(11,15,CH,EQ,C'PERRYRIDGE')
SORT FIELDS=COPY
The output of the above sort card will contain the following record
loannmbr branchnm amount
L-260 PERRYRIDGE 1700
Let’s take another example: fetch all the rows having amount>2000
The SQL query for the given example will be as follows.
select *
from loan
where amount>2000
Relational Algebra
In the relational algebra syntax, the query would be:
Π loannmbr, branchnm, amount (σ amount>2000 (loan))
SYNCSORT Implementation
Select operation can be implemented by
INCLUDE COND=(26,4,BI,GT,2000)
SORT FIELDS=COPY
The output of the above sort card will contain the following records
loannmbr branchnm amount
L-170 DOWNTOWN 3000
L-230 REDWOOD 4000
Note: The opposite of INCLUDE is OMIT; its syntax is the same.
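
The effect of INCLUDE and OMIT on the example data can be sketched in Python. This is a simplified illustration: rows are held as tuples instead of fixed-position bytes, and the helper names include and omit and the OPERATORS table are ours.

```python
import operator

LOAN = [
    ("L-170", "DOWNTOWN", 3000),
    ("L-230", "REDWOOD", 4000),
    ("L-260", "PERRYRIDGE", 1700),
]

# Conditional operators as listed for INCLUDE COND.
OPERATORS = {"EQ": operator.eq, "NE": operator.ne, "LT": operator.lt,
             "LE": operator.le, "GT": operator.gt, "GE": operator.ge}

def include(rows, column, op, value):
    """Keep the rows whose column satisfies the comparison."""
    return [row for row in rows if OPERATORS[op](row[column], value)]

def omit(rows, column, op, value):
    """Drop the rows whose column satisfies the comparison."""
    return [row for row in rows if not OPERATORS[op](row[column], value)]

print(include(LOAN, 1, "EQ", "PERRYRIDGE"))  # [('L-260', 'PERRYRIDGE', 1700)]
print(include(LOAN, 2, "GT", 2000))          # the DOWNTOWN and REDWOOD loans
print(omit(LOAN, 2, "GT", 2000))             # [('L-260', 'PERRYRIDGE', 1700)]
```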
Ordering or Sorting ORDER BY Clause
Ordering or sorting on output field
E.g. list all the loans in the loan relation in sorted order of
branch name; the expression below selects all columns
from the loan relation.

SQL implementation
select loannmbr ,branchnm, amount
from loan
order by branchnm

Relational Algebra
In the “pure” relational algebra syntax, the query would be:
Πloannmbr, branchnm, amount(σbranchnm(i)<branchnm(i+1) (loan))

Note: Here less than mark is to indicate ascending order of branchname.

SYNCSORT Implementation
Ordering can be implemented through
INREC : As all columns are selected, there is no need for INREC
SORT : SORT command
SORT FIELDS=(11,15,CH,A)
The output of the above JCL will contain the following records
loannmbr branchnm amount
L-170 DOWNTOWN 3000
L-260 PERRYRIDGE 1700
L-230 REDWOOD 4000
Summing Up the records

SQL implementation
select branchnm, SUM(amount)
from loan
group by branchnm

Relational Algebra
In the “pure” relational algebra syntax, the query would be:
Π branchnm, Σamount(σbranchnm(loan))

SYNCSORT Implementation
Grouping and summing can be implemented through
INREC : As all columns are passed on, there is no need for INREC
SORT : SORT command
SORT FIELDS=(11,15,CH,A)
SUM FIELDS=(26,4,BI)
The output of the above JCL will contain one record per branch, with the following values
branchnm amount
DOWNTOWN 3000
PERRYRIDGE 1700
REDWOOD 4000
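
The combination of SORT FIELDS and SUM FIELDS behaves like a sort followed by a group-wise total. The Python sketch below illustrates this on tuples; the function names are ours, and the row L-999 is a hypothetical addition to the example data so that one branch has more than one loan.

```python
from itertools import groupby

LOAN = [
    ("L-170", "DOWNTOWN", 3000),
    ("L-230", "REDWOOD", 4000),
    ("L-260", "PERRYRIDGE", 1700),
    ("L-999", "DOWNTOWN", 500),  # hypothetical extra row, not in the example data
]

def branch_of(row):
    return row[1]

def sort_and_sum(rows):
    """Order rows by branch name, then add up the amounts within each branch."""
    ordered = sorted(rows, key=branch_of)           # like SORT FIELDS=(11,15,CH,A)
    return [(branch, sum(row[2] for row in group))  # like SUM FIELDS=(26,4,BI)
            for branch, group in groupby(ordered, key=branch_of)]

print(sort_and_sum(LOAN))
# [('DOWNTOWN', 3500), ('PERRYRIDGE', 1700), ('REDWOOD', 4000)]
```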
• JOIN Operation
JOINS: the following are the 4 types of join in SQL

Cross Join
Meaning: Simple Cartesian product without where clause
Example: Selecting loan & customer details (quite meaningless)
SELECT *
FROM loan, borrower

Inner Join
Meaning: Returns matched records that satisfy the given condition
Example: Selecting loan details & the details of the borrowers holding that loan (meaningful version of above)
SELECT *
FROM LOAN AS L, BORROWER AS B
WHERE L.LOANNMBR = B.LOANNMBR

Outer Join
Meaning: Selects matched records with the given condition as well as unmatched records from one of
the tables. Types: Left, Right, Full
Example: Selecting branch details & the customers who have taken a loan from that branch, even if
there is no customer in that branch.
SELECT *
FROM LOAN LEFT OUTER JOIN BORROWER
ON LOAN.LOANNMBR = BORROWER.LOANNMBR

Self Join
Meaning: Joining a table with itself
Example: Selecting the list of customers who have taken multiple loans from different branches

Cross Join
A simple Cartesian product without a where clause; this is quite meaningless, like selecting loan & customer
details.
SQL implementation
select *
from loan, borrower;

Relational Algebra
In the “pure” relational algebra syntax, the query would be:
Π loannmbr, branchnm, amount, custname (loan × borrower)

SYNCSORT Implementation
Cross Join is not supported using SYNCSORT.
Inner Join
An inner join shows only matching records, that is, records whose key is found in both
tables.
SQL implementation
select *
from loan, borrower
where loan.loannmbr=borrower.loannmbr;

Relational Algebra
In the “pure” relational algebra syntax, the query would be:
Π loannmbr, branchnm, amount (σ loan.loannmbr=borrower.loannmbr (loan × borrower))

SYNCSORT Implementation
Above Join Operation can be implemented through
JOINKEYS : Identifies file and the key columns on which match will be performed.
REFORMAT FIELDS : Defines how the columns will be arranged in output file.
SORT FIELDS : Defines on which column sorting will be done.

Above join can be implemented by following sort card.


JOINKEYS FILES=F1,FIELDS=(1,10,A),SORTED
JOINKEYS FILES=F2,FIELDS=(21,10,A),SORTED
REFORMAT FIELDS=(F1:1,29,F2:1,30)
SORT FIELDS=COPY

More about JOINKEYS FIELDS


The number of JOINKEYS fields and their lengths and sorted order (A or D) must be the same for both
files, although their starting positions need not be the same.

You can also specify an INCLUDE/OMIT condition if you want to select or omit records from the input file.
Let’s say you want to extract loan records for the ‘REDWOOD’ branch only.
The SQL statement will be
select *
from loan, borrower
where loan.loannmbr=borrower.loannmbr
and loan.branchnm='REDWOOD'

Then the SORT card will become


JOINKEYS FILES=F1,FIELDS=(1,10,A),SORTED,INCLUDE COND=(11,15,CH,EQ,C'REDWOOD')
JOINKEYS FILES=F2,FIELDS=(21,10,A),SORTED
REFORMAT FIELDS=(F1:1,29,F2:1,30)
SORT FIELDS=COPY
Sample JCL step for JOIN operation using syncsort
//SORJOIN EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTJNF1 DD DSN=LOAN.DOWNLOAD,
// DISP=SHR
//SORTJNF2 DD DSN=BORROWER.DOWNLOAD,
// DISP=SHR
//SORTOUT DD DSN=LOANBORR.JOIN,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=59,BLKSIZE=0
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN DD *
JOINKEYS FILES=F1,FIELDS=(1,10,A),SORTED
JOINKEYS FILES=F2,FIELDS=(21,10,A),SORTED
REFORMAT FIELDS=(F1:1,29,F2:1,30)
SORT FIELDS=COPY
/*

The output of the above JCL will contain the following records


loannmbr branchnm amount custname loannmbr
L-170 DOWNTOWN 3000 JONES L-170
L-230 REDWOOD 4000 SMITH L-230
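
Because both inputs are declared SORTED on the key, the matching can be pictured as a single merge pass over the two files. The Python sketch below illustrates that idea on tuples; it is our own illustration and not a description of how the utility works internally, and the function name is an assumption.

```python
LOAN = [
    ("L-170", "DOWNTOWN", 3000),
    ("L-230", "REDWOOD", 4000),
    ("L-260", "PERRYRIDGE", 1700),
]
BORROWER = [("HAYES", "L-155"), ("JONES", "L-170"), ("SMITH", "L-230")]

def inner_join_sorted(f1, f2, key1, key2):
    """Merge-join two inputs that are already sorted ascending on their keys."""
    out, i, j = [], 0, 0
    while i < len(f1) and j < len(f2):
        k1, k2 = f1[i][key1], f2[j][key2]
        if k1 < k2:
            i += 1
        elif k1 > k2:
            j += 1
        else:
            # Pair this F1 row with every F2 row that has the same key.
            m = j
            while m < len(f2) and f2[m][key2] == k1:
                out.append(f1[i] + f2[m])  # like REFORMAT FIELDS=(F1:...,F2:...)
                m += 1
            i += 1
    return out

for row in inner_join_sorted(LOAN, BORROWER, 0, 1):
    print(row)
# ('L-170', 'DOWNTOWN', 3000, 'JONES', 'L-170')
# ('L-230', 'REDWOOD', 4000, 'SMITH', 'L-230')
```

Loan L-260 has no borrower and borrower HAYES has no loan in the file, so neither appears in the inner join.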

Outer Join
An outer join shows matching records if matching data is found in both tables for the key, as
well as unpaired records:
If from the 1st file: Left Outer Join
If from the 2nd file: Right Outer Join
If from both files: Full Outer Join
Following is the example of Left Outer Join
SQL implementation
select *
from loan LEFT OUTER JOIN borrower
on loan.loannmbr=borrower.loannmbr;

SYNCSORT Implementation
Above Join Operation can be implemented through
JOINKEYS : Identifies file and the key columns on which match will be performed.
JOIN : Defines how the match will be performed and specifies the kind of outer join.
REFORMAT FIELDS : Defines how the columns will be arranged in output file.
SORT FIELDS : Defines on which column sorting will be done.
Above join can be implemented by following sort card.
JOINKEYS FILES=F1,FIELDS=(1,10,A),SORTED
JOINKEYS FILES=F2,FIELDS=(21,10,A),SORTED
REFORMAT FIELDS=(F1:1,29,F2:1,30)
JOIN UNPAIRED,F1
SORT FIELDS=COPY
Different Usage of JOIN Keyword

JOIN Usage: no JOIN statement in the sort card
JOIN Type: INNER JOIN
Description: The SORT utility treats this as an inner join; that is, all matched records go to the output file.

JOIN Usage: JOIN UNPAIRED,F1
JOIN Type: LEFT OUTER JOIN
Description: All the records from file1 are selected, even if there is no matching record in file2.

JOIN Usage: JOIN UNPAIRED,F2
JOIN Type: RIGHT OUTER JOIN
Description: All the records from file2 are selected, even if there is no matching record in file1.

JOIN Usage: JOIN UNPAIRED,F1,F2 or simply JOIN UNPAIRED
JOIN Type: FULL OUTER JOIN
Description: All the records are selected, paired as well as unpaired.

JOIN Usage: JOIN UNPAIRED,ONLY
JOIN Type: FULL OUTER JOIN - INNER JOIN
Description: Only unpaired records are selected, that is, records that are in either file1 or file2 but not in both.
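
The JOIN variants above differ only in which unpaired records are kept. The Python sketch below illustrates this on tuples. It is a simplification under our own assumptions: the function and parameter names are ours, the missing side is filled with None instead of a fill character, both inputs must be non-empty, and the output is not returned in key order.

```python
LOAN = [
    ("L-170", "DOWNTOWN", 3000),
    ("L-230", "REDWOOD", 4000),
    ("L-260", "PERRYRIDGE", 1700),
]
BORROWER = [("HAYES", "L-155"), ("JONES", "L-170"), ("SMITH", "L-230")]

def join(f1, f2, key1, key2, unpaired_f1=False, unpaired_f2=False, only=False):
    """Pair rows on equal keys; optionally keep unpaired rows from F1 and/or F2."""
    width1, width2 = len(f1[0]), len(f2[0])
    keys1 = {row[key1] for row in f1}
    keys2 = {row[key2] for row in f2}
    out = []
    if not only:        # paired records are dropped when ONLY is requested
        out += [a + b for a in f1 for b in f2 if a[key1] == b[key2]]
    if unpaired_f1:     # records of F1 without a partner in F2
        out += [a + (None,) * width2 for a in f1 if a[key1] not in keys2]
    if unpaired_f2:     # records of F2 without a partner in F1
        out += [(None,) * width1 + b for b in f2 if b[key2] not in keys1]
    return out

print(len(join(LOAN, BORROWER, 0, 1)))                    # 2, inner join
print(len(join(LOAN, BORROWER, 0, 1, unpaired_f1=True)))  # 3, left outer join
print(join(LOAN, BORROWER, 0, 1, True, True, only=True))  # unpaired records only
```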

Performance consideration
# While joining, it is always better to pass files that are already sorted on the key,
# as sorting performance depends upon the input pattern.
To optimize sorting & joining, test with real data.

Note: In ICETOOL, joins are not implemented through JOINKEYS; they can be implemented with SPLICE. See the
documentation at http://www.ibm.com/storage/dfsort for further details. For now this text
covers only SyncSort; a later edition will include SPLICE through DFSORT as well.
Tutorial
JCL#1: Let’s start with very simple JCL for Copying the Dataset or taking backup using SORT.
Create a PDS with record length of 80 with name USERID.SORT.JCL, inside that create member
SORTTUT1. Create the JCL with following step.

//STEP001 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=INPUT FILE NAME,
// DISP=SHR
//SORTOUT DD DSN=OUTPUT FILE NAME,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=80,BLKSIZE=0
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN DD *
SORT FIELDS=COPY
/*

Description:
Statement Description
EXEC statement is for executing SORT utility in STEP001
SYSOUT SYSOUT is dataset for output messages produced by SORT utility, here * has been
used to route all output statements to SPOOL.
SORTIN I/P dataset name from where data will be copied/sorted.
SORTOUT O/P dataset name where the data will be copied. Points here to remember is LRECL;
if it is not compatible with O/P data SORT utility will raise an error message.
SORTWKnn SORTWKnn are temporary work files; if not enough space provided in work files
step may abend with S*37 (SB37, SD37 or SE37 – Space allocation error).
SYSIN Finally the sysin parameter is the guideline to SORT utility what to do? By SYSIN we
pass commands that will be carried out or taken care by SORT utility.
JCL#2: Sorting Dataset on key field
Create a PDS with record length of 80 with name USERID.SORT.JCL, inside that create member
SORTTUT2. Create the JCL with following step.
The following JCL is the same as the one shown for the projection operation, with the sort card of the ORDER BY clause.
//SORTKEY EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=LOAN.DOWNLOAD,
// DISP=SHR
//SORTOUT DD DSN=LOAN.BRANCHNM,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=29,BLKSIZE=0
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN DD *
SORT FIELDS=(11,15,CH,A)
/*

Description:
Statement Description
EXEC statement is for executing SORT utility in STEP SORTKEY
SYSOUT SYSOUT is dataset for output messages produced by SORT utility, here * has been
used to route all output statements to SPOOL.
SORTIN I/P dataset name from where data will be copied/sorted.
SORTOUT O/P dataset name where the data will be copied. Points here to remember is LRECL;
if it is not compatible with O/P data SORT utility will raise an error message.
SORTWKnn SORTWKnn are temporary work files; if not enough space provided in work files
step may abend with S*37 (SB37, SD37 or SE37 – Space allocation error).
SYSIN Finally the sysin parameter is the guideline to SORT utility what to do? By SYSIN we
pass commands that will be carried out or taken care by SORT utility.

Note: Sorting can be done on multiple fields


For example sorting on Branch Name & Loan Number then SYSIN parameter will be
SORT FIELDS=(11,15,CH,A,1,10,CH,A)
JCL#3: Removing duplicates through sync sort.
Create a PDS with record length of 80 with name USERID.SORT.JCL, inside that create member
SORTTUT3. Create the JCL with following step.
//SORTDUP EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=INPUT.FILE.NAME,
// DISP=SHR
//SORTOUT DD DSN=OUTPUT.FILE.NAME.WITHOUT.DUP,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=15,BLKSIZE=0
//SORTXSUM DD DSN=OUTPUT.FILE.NAME.WITH.DUP,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=15,BLKSIZE=0
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN DD *
SORT FIELDS=(11,15,CH,A)
SUM FIELDS=NONE,XSUM
/*

Description:
Statement Description
EXEC statement is for executing SORT utility in STEP SORTDUP
SYSOUT SYSOUT is dataset for output messages produced by SORT utility, here * has been
used to route all output statements to SPOOL.
SORTIN I/P dataset name from where data will be copied/sorted.
SORTOUT O/P dataset name where the data will be copied. Points here to remember is
LRECL; if it is not compatible with O/P data SORT utility will raise an error message.
SORTXSUM O/P dataset name where the records with duplicate key will be copied. Points here
to remember is LRECL; if it is not compatible with O/P data SORT utility will raise an
error message.
SORTWKnn SORTWKnn are temporary work files; if not enough space provided in work files
step may abend with S*37 (SB37, SD37 or SE37 – Space allocation error).
SYSIN Finally the sysin parameter is the guideline to SORT utility what to do? By SYSIN we
pass commands that will be carried out or taken care by SORT utility.
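
The split between SORTOUT and SORTXSUM can be pictured with a short Python sketch. This is an illustration under our own assumptions: the function name is ours, records are plain strings, and the first record of each key in sorted order is the one kept, which the real utility does not necessarily guarantee.

```python
def sum_fields_none_xsum(rows, key):
    """Return (kept, duplicates): one record per key is kept, the rest are set aside."""
    ordered = sorted(rows, key=key)  # like SORT FIELDS=(...); sorted() is stable
    kept, duplicates = [], []
    for row in ordered:
        if kept and key(kept[-1]) == key(row):
            duplicates.append(row)   # would go to SORTXSUM
        else:
            kept.append(row)         # would go to SORTOUT
    return kept, duplicates

branches = ["REDWOOD", "DOWNTOWN", "REDWOOD", "PERRYRIDGE"]
print(sum_fields_none_xsum(branches, key=lambda name: name))
# (['DOWNTOWN', 'PERRYRIDGE', 'REDWOOD'], ['REDWOOD'])
```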
JCL#4: Joining of two files (LEFT OUTER JOIN).
Create a PDS with record length of 80 with name USERID.SORT.JCL, inside that create member
SORTTUT4. Create the JCL with following step.
Following JCL is same as mentioned for JOIN operation.
//SORJOIN EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTJNF1 DD DSN=LOAN.DOWNLOAD,
// DISP=SHR
//SORTJNF2 DD DSN=BORROWER.DOWNLOAD,
// DISP=SHR
//SORTOUT DD DSN=LOANBORR.JOIN,
// DISP=(,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(TRK,(150,50),RLSE),
// RECFM=FB,LRECL=59,BLKSIZE=0
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN DD *
JOINKEYS FILES=F1,FIELDS=(1,10,A),SORTED
JOINKEYS FILES=F2,FIELDS=(21,10,A),SORTED
REFORMAT FIELDS=(F1:1,29,F2:1,30)
JOIN UNPAIRED,F1
SORT FIELDS=COPY
/*
Description:
Statement Description
EXEC statement is for executing SORT utility in SORJOIN
SYSOUT SYSOUT is dataset for output messages produced by SORT utility, here * has
been used to route all output statements to SPOOL.
SORTJNF1 First File for Join
SORTJNF2 Second File for Join
SORTOUT O/P dataset name that will contain joined records. Points here to remember is
LRECL; if it is not compatible with O/P data SORT utility will raise an error
message.
SORTWKnn SORTWKnn are temporary work files; if not enough space provided in work files
step may abend with S*37 (SB37, SD37 or SE37 – Space allocation error).

SYSIN Finally the sysin parameter is the guideline to SORT utility what to do? By SYSIN
we pass commands that will be carried out or taken care by SORT utility.
