
Parallel Processing:

There are two types of parallelism in DS PX:

1. Pipeline Parallelism
2. Partition Parallelism
1. Pipeline Parallelism:
All the stages in the job run concurrently. No stage is idle.
I will explain with an example. Let's take a job that loads data from an Oracle source to
an Oracle target with a transformer in between. Even in a single-node
configuration, instead of waiting for all the source data to be read, rows are passed to
the subsequent stages as soon as they are present at the input.
This method is called pipeline parallelism.
If you ran the same job on a system with multiple processors, the stage
reading would start on one processor and start filling a pipeline with the data it
had read. The transformer stage would start running as soon as there was
data in the pipeline, process it, and start filling another pipeline. The stage
writing the transformed data to the target would similarly start
writing as soon as there was data available. Thus all three stages
operate simultaneously.
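The read, transform, and write flow described above can be sketched in Python. This is only an illustration of the idea, not how DataStage implements it: the stage functions, the queue size, and the upper-casing "transform" are all assumptions made for the example.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the data stream


def read_stage(rows, out_q):
    # Source stage: push each row downstream as soon as it is read.
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def transform_stage(in_q, out_q):
    # Transformer stage: starts working as soon as the first row arrives.
    while True:
        row = in_q.get()
        if row is DONE:
            out_q.put(DONE)
            break
        out_q.put(row.upper())  # hypothetical transformation


def write_stage(in_q, target):
    # Target stage: writes rows as soon as they become available.
    while True:
        row = in_q.get()
        if row is DONE:
            break
        target.append(row)


def run_pipeline(rows):
    # Small bounded queues play the role of the pipelines between stages.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    target = []
    threads = [
        threading.Thread(target=read_stage, args=(rows, q1)),
        threading.Thread(target=transform_stage, args=(q1, q2)),
        threading.Thread(target=write_stage, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

Because the queues hold at most two rows, the reader cannot finish before the transformer and writer start consuming, so all three stages are active at the same time.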
2. Partition Parallelism:
In partition parallelism, the data is partitioned into a number of
separate sets, with each partition being handled by a separate instance of the
job stages.
Using partition parallelism, the job is effectively run simultaneously by
several processors, each handling a separate subset of the total data. At the
end of the job the data partitions can be collected back together again and
written to a single data source.
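As a rough sketch of the partition, process, and collect cycle (assuming a hypothetical stage that doubles each value, and using threads to stand in for processing nodes), the idea looks like this in Python:

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(partition):
    # Hypothetical job stage: each instance handles only its own partition.
    return [value * 2 for value in partition]


def run_partitioned(rows, node_count):
    # Split the data into node_count subsets (simple round robin split).
    partitions = [rows[i::node_count] for i in range(node_count)]
    # Run one instance of the stage per partition.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        results = list(pool.map(run_stage, partitions))
    # Collect the partitions back together into a single data set.
    collected = []
    for part in results:
        collected.extend(part)
    return collected
```

Note that the collected rows come back grouped by partition, so the original row order is not preserved unless a suitable collecting method is used.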
Configuration file:
The configuration file describes available processing power in terms
of processing nodes. The number of nodes you define in the configuration
file determines how many instances of a process will be produced
when you compile a parallel job.
When you run a DataStage job,
DataStage first reads the configuration file to determine the available
system resources.
When you modify your system by adding or removing processing
nodes or by reconfiguring nodes, you do not need to alter or even
recompile your DataStage job. Just edit the configuration file.
You can create as many configuration files as you want, but a job can use only one at a time.
A configuration file can be created from the Manager. It has the following structure:
{
    node "node1" {
        fastname "servername"
        pools "node1" "servername"
        resource disk "path"
        resource scratchdisk "path"
    }
}
Partition Types:
Data is partitioned to improve performance. The data is split into separate sets,
and each set is processed on one of the available processors.
There are 9 partition methods:
1. Auto
2. Same
3. Entire
4. Round Robin
5. Random
6. Modulus
7. Hash
8. DB2
9. Range
1. Auto:
This is the default partitioning method in DS. It selects the best partitioning method depending on
the type of stage. Typically DS uses the round robin partitioning method.
2. Same:
DS keeps the partitioning used by the previous stage. Because of this, the records stay on the
node they are already on and are not redistributed, which makes the partitioning faster. Same is
the fastest partitioning method. It can be used when passing data between stages.
3. Entire:
In this method, the total data set is present on each processing node. Every node has access
to the total data set. This is generally used for lookups.
4. Round Robin:
In this method, the data is apportioned across the available nodes. The first record goes to the first
node, the second record goes to the second node, and so on. After the last node gets a record, the
process starts again from the first node. This gives each node a nearly equal share of the data.
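A minimal Python sketch of round robin partitioning, assuming the records are held in a list and the nodes are represented by plain lists (the function name is made up for illustration):

```python
def round_robin_partition(records, node_count):
    # One output list per node.
    partitions = [[] for _ in range(node_count)]
    for index, record in enumerate(records):
        # Record 0 goes to node 0, record 1 to node 1, ... then wrap around.
        partitions[index % node_count].append(record)
    return partitions
```

For example, five records on three nodes are split into groups of two, two, and one records.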
5. Random:
In the random method, the records are distributed randomly based on a randomly generated value.
This method also gives an approximately equal share of data to all the nodes, but it requires time
to calculate the random number.
6. Modulus:
In this method, the partition is determined by the key column value modulo the number of nodes.
The result decides which node each record moves to.
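The modulus rule can be sketched as follows. This assumes records are dictionaries with an integer key column; the function and column names are hypothetical.

```python
def modulus_partition(records, key_column, node_count):
    # One output list per node.
    partitions = [[] for _ in range(node_count)]
    for record in records:
        # Node number = key value mod number of nodes.
        node = record[key_column] % node_count
        partitions[node].append(record)
    return partitions
```

With three nodes, key values 10 and 13 both land on node 1, 11 on node 2, and 12 on node 0.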
7. Hash:
In hash partitioning, the data is partitioned based on some key column combination. A hash of the
key values decides the node, so the key values are spread across the available nodes. The partitions
will not necessarily be of equal size. Records with the same partitioning key values are put on the
same node.
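A small Python sketch of hash partitioning on a key column combination. The hash function here (CRC32 over the joined key values) is an assumption chosen because it is deterministic; it is not the hash DataStage uses.

```python
import zlib


def hash_partition(records, key_columns, node_count):
    # One output list per node.
    partitions = [[] for _ in range(node_count)]
    for record in records:
        # Build one string from the key column combination.
        key = "|".join(str(record[column]) for column in key_columns)
        # Same key values always give the same hash, hence the same node.
        node = zlib.crc32(key.encode("utf-8")) % node_count
        partitions[node].append(record)
    return partitions
```

All records that share the same key values end up in one partition, which is why partition sizes depend on how the key values are distributed.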
8. DB2:
In this method, the data is partitioned using the DB2 partitioning method, that is, in the same
way DB2 would partition it.
9. Range:
In this method, the data is partitioned into sets based on ranges of key values, and each set is
assigned to one of the available nodes. This gives approximately equal-sized partitions.
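Range partitioning can be sketched in Python with a sorted list of boundary values. The boundaries are supplied by hand here as an assumption; choosing them so that the partitions come out roughly equal in size is a separate step that this sketch does not cover.

```python
from bisect import bisect_right


def range_partition(records, key_column, boundaries):
    # boundaries must be sorted; N boundaries define N + 1 partitions.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        # Values below the first boundary go to partition 0; a value equal
        # to a boundary goes to the partition above that boundary.
        node = bisect_right(boundaries, record[key_column])
        partitions[node].append(record)
    return partitions
```

With boundaries 10 and 20, key values under 10 go to the first node, values from 10 up to but not including 20 to the second, and the rest to the third.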
Collecting methods:
Collecting methods are used to collect data that has been partitioned into different sets.
The data is collected using a collecting method and moved to the target stage.
There are 4 types of collecting methods:
1. Round robin
2. Auto
3. Ordered
4. Sorted Merge
1. Round robin:
In this method, the first record is read from the first partition, the second record from the second
partition, and so on.
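A sketch of round robin collection in Python, assuming each partition is a list. How exhausted partitions are handled (they are simply skipped here) is an assumption of this example.

```python
def round_robin_collect(partitions):
    iterators = [iter(partition) for partition in partitions]
    collected = []
    while iterators:
        still_active = []
        for iterator in iterators:
            try:
                # Take one record from each partition in turn.
                collected.append(next(iterator))
                still_active.append(iterator)
            except StopIteration:
                # This partition is empty; skip it from now on.
                pass
        iterators = still_active
    return collected
```

Applied to partitions that were produced by round robin partitioning, this restores the original record order.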
2. Auto:
In this method, DS chooses the best collecting method based on the stage and uses it to
collect the data. Auto is the fastest collection method in DS.
3. Ordered:
This method reads all the records from the first partition, then all the records from the second,
and so on until it reaches the last partition.
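In Python terms, ordered collection amounts to concatenating the partitions in partition order. A minimal sketch, with a made-up function name:

```python
from itertools import chain


def ordered_collect(partitions):
    # Read partition 0 completely, then partition 1, and so on.
    return list(chain.from_iterable(partitions))
```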
4. Sorted Merge:
Reads records in an order based on one or more columns of the record.
The columns used to define record order are called collecting keys.
Typically, you use the sorted merge collector with a partition-sorted
data set (as created by a sort stage). In this case, you specify as the
collecting key fields those fields you specified as sorting key fields to
the sort stage.
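A sorted merge can be sketched with Python's standard `heapq.merge`, assuming records are dictionaries and every partition is already sorted on the collecting keys. The function name and record layout are assumptions for illustration.

```python
import heapq
from operator import itemgetter


def sorted_merge_collect(partitions, collecting_keys):
    # Each partition must already be sorted on the collecting keys.
    # heapq.merge repeatedly takes the smallest next record across partitions.
    return list(heapq.merge(*partitions, key=itemgetter(*collecting_keys)))
```

If a partition is not sorted on the collecting keys, the merged output will not be fully sorted either, which is why the collecting keys should match the sort keys.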