DataStage is an ETL tool. It is a client-server technology and an integrated toolset used for designing, running, monitoring, and administering “data acquisition” applications, which are known as “jobs”.
When the DataStage client software is installed on a PC, it comes with four components: DATASTAGE ADMINISTRATOR, DATASTAGE DESIGNER, DATASTAGE DIRECTOR, and DATASTAGE MANAGER. These are the client components.
DS Client components:-
1) DataStage Administrator:-
It is used to create or delete projects, clean up metadata stored in the repository, and install NLS (National Language Support).
2) DataStage Manager:-
It is used to view and manage the contents of the repository, for example importing and exporting jobs and metadata between projects.
3) DataStage Designer:-
It is used to create the DataStage application known as a job; jobs are designed, edited, and compiled in the Designer window.
4) DataStage Director:-
It is used to validate, schedule, run, and monitor DataStage jobs.
DS Server components:-
1) Repository:-
A server-side component that stores all the information required to build the data warehouse.
2) DataStage Server:-
The engine that executes the jobs created in the Designer.
Parallel jobs:-
The stages used in parallel jobs fall into two types:
I. Passive stages:- stages that define read and write access to data sources and targets.
Ex:- all file and database stages.
II. Active stages:- stages that define data transformation and filtering.
Ex:- all processing stages.
6. Explain parallelism techniques?
Ans:- Parallelism is the process of performing the ETL task in a parallel fashion in order to build the data warehouse. Parallel jobs support hardware architectures such as SMP (Symmetric Multiprocessing) and MPP (Massively Parallel Processing) to achieve parallelism.
There are two types of parallelism techniques:
A. Pipeline parallelism.
B. Partition parallelism.
Pipeline parallelism:- Data flows continuously through the pipeline, and all stages in the job operate simultaneously. For example, if the source has 4 records, the downstream stages begin processing the first record as soon as it is read, while the remaining records are still flowing in, so every stage is busy at the same time (see the sketch below).
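The idea can be sketched outside DataStage with ordinary threads and queues; this is a hypothetical Python illustration, not DataStage code, and the stage logic is made up:

import threading
import queue

DONE = object()  # sentinel marking the end of the record stream

def extract(out_q):
    for record in [1, 2, 3, 4]:   # pretend source with 4 records
        out_q.put(record)
    out_q.put(DONE)

def transform(in_q, out_q):
    while (record := in_q.get()) is not DONE:
        out_q.put(record * 10)    # stand-in transformation
    out_q.put(DONE)

def load(in_q):
    while (record := in_q.get()) is not DONE:
        print("loaded", record)

# Each stage runs concurrently and passes records downstream
# as soon as they are produced.
q1, q2 = queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=extract, args=(q1,)),
    threading.Thread(target=transform, args=(q1, q2)),
    threading.Thread(target=load, args=(q2,)),
]
for s in stages:
    s.start()
for s in stages:
    s.join()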
Partition parallelism:- In this type of parallelism, the same job is effectively run simultaneously by several processors, each processor handling a separate subset of the total records. For example, if the source has 100 records and there are 4 partitions, the data is partitioned equally across the 4 partitions, so each partition gets 25 records, and all four partitions process their subsets simultaneously, as in the sketch below.
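A minimal sketch of the same idea in Python, assuming a made-up run_job function standing in for the job logic:

from multiprocessing import Pool

def run_job(partition):
    # stand-in for the job logic applied to one subset of records
    return [record * 10 for record in partition]

if __name__ == "__main__":
    records = list(range(100))
    # 4 partitions of 25 records each
    partitions = [records[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        results = pool.map(run_job, partitions)
    print([len(r) for r in results])  # [25, 25, 25, 25]

How records are divided among the partitions is controlled by a partitioning method. The main methods are listed below, with a sketch of the assignment logic after the list.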
Round Robin:- The first record goes to the first processing node, the second record goes to the second processing node, and so on. This method is useful for creating partitions of equal size.
Hash:- Records with the same values for the hash-key field are sent to the same processing node.
Modulus:- Partitioning is based on a numeric key column modulo the number of partitions. It is similar to hash partitioning.
Random:- The records are randomly distributed across all processing nodes.
Range:- Records are distributed according to ranges of a key column, so related records end up on the same node.
Auto:- This is the most common method. DataStage determines the best partitioning method to use depending on the type of stage.
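The assignment logic behind several of these methods can be sketched as plain functions; this is a simplified, hypothetical illustration, and DataStage's actual hash and range implementations are internal to the engine:

import random

def round_robin(record_index, n):
    return record_index % n   # 1st record -> node 0, 2nd -> node 1, ...

def hash_partition(key, n):
    return hash(key) % n      # same key value -> same node

def modulus(key, n):
    return key % n            # numeric key column modulo node count

def random_partition(n):
    return random.randrange(n)  # records spread randomly across nodes

# e.g. records with key 42 always land on the same node within a run:
print(hash_partition(42, 4), hash_partition(42, 4))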