
1. Explain the DataStage architecture?

DataStage is an ETL tool: a client-server technology and integrated toolset used for designing, running, monitoring, and administering "data acquisition" applications, each of which is known as a "job".

A job is a graphical representation of the data flow from source to target. It is designed with source definitions, target definitions, and transformation rules.

The DataStage software consists of client and server components, with the clients connecting to the server over TCP/IP.

Client components: DataStage Designer, DataStage Director, DataStage Manager, DataStage Administrator

Server components: DataStage Server, DataStage Repository

When the DataStage software is installed on a PC, four components appear automatically: DataStage Administrator, DataStage Designer, DataStage Director, and DataStage Manager. These are the client components.

DS Client components:-

1) DataStage Administrator:-

This component is used to create or delete projects, clean up metadata stored in the repository, and install NLS (National Language Support).
2) DataStage Manager:-

It is used to perform the following tasks:

a) Create table definitions.

b) Perform metadata backup and recovery.

c) Create customized components.

3) DataStage Director:-

It is used to validate, schedule, run, and monitor DataStage jobs.

4) DataStage Designer:-

It is used to create the DataStage application known as a job. The following activities can be performed in the Designer window:

a) Create source definitions.
b) Create target definitions.
c) Develop transformation rules.
d) Design jobs.

DataStage Repository:-

It is a server-side component that stores the metadata needed to build the data warehouse.

DataStage Server:-

It is the server-side engine that executes the jobs created with the DataStage clients.

2. What is a job? What are the types of jobs?


Ans:- A job is an ordered series of individual stages linked together to describe the flow of data from source to target.
Three types of jobs can be designed:
a) Server jobs
b) Parallel Jobs
c) Mainframe Jobs
3. Have you worked with parallel jobs or server jobs?
Ans:- I have been working with parallel jobs for more than three years.

4. What is the difference between server jobs and parallel jobs?


Ans:-
Server jobs:-
a) They handle smaller volumes of data.
b) They have fewer components.
c) Data processing is slower.
d) They work purely on SMP (Symmetric Multi-Processing) systems.
e) They rely heavily on the Transformer stage.

Parallel jobs:-

a) They handle high volumes of data.
b) They work on parallel-processing concepts.
c) They apply parallelism techniques.
d) They follow MPP (Massively Parallel Processing).
e) They have more components than server jobs.
f) They run on the Orchestrate framework.
5. What is a stage? Explain the various stages in DataStage.
Ans:- A stage defines a database, a file, or a processing step.
There are two types of stages:
a) Built-in stages
b) Plug-in stages
Built-in stages:- These stages define the extraction, transformation, and loading. They are further divided into two types:
I. Passive stages:- Stages that define read and write access are known as passive stages.
Ex:- all database stages in the Designer palette.

II. Active stages:- Stages that transform and filter the data are known as active stages.
Ex:- all processing stages.
6. Explain parallelism techniques.
Ans:- Parallelism is the process of performing ETL tasks in a parallel approach in order to build the data warehouse. Parallel jobs support hardware systems such as SMP and MPP to achieve parallelism.
There are two types of parallelism techniques:
A. Pipeline parallelism
B. Partition parallelism

Pipeline parallelism:- The data flows continuously through the pipeline, and all stages in the job operate simultaneously.

For example, if the source has 4 records, then as soon as the first record has been read it is passed to the next stage, so while downstream stages process the first record the remaining records are still being read and processed upstream; all stages work simultaneously, as the sketch below illustrates.
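A small Python sketch of this record-at-a-time flow (illustrative only, not DataStage code; in DataStage the stages additionally run as concurrent processes, whereas plain generators run in one process):

    # Each "stage" pulls records one at a time from the previous stage.
    def extract():
        for record in [1, 2, 3, 4]:      # 4 source records
            yield record

    def transform(rows):
        for row in rows:
            yield row * 10               # stand-in transformation rule

    def load(rows):
        for row in rows:
            # Record 1 is "loaded" before record 4 has even been extracted.
            print("loaded", row)

    load(transform(extract()))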

Partition parallelism:- In this technique, the same job is effectively run simultaneously by several processors, and each processor handles a separate subset of the total records.
For example, if the source has 100 records and 4 partitions, the data is partitioned equally across the 4 partitions, so each partition gets 25 records. All four partitions then process their subsets simultaneously and in parallel, as sketched below.
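The same arithmetic can be sketched in Python (illustrative only, not DataStage code), splitting 100 records across 4 worker processes:

    from multiprocessing import Pool

    NUM_PARTITIONS = 4
    records = list(range(100))  # 100 source records

    # Partition i receives records i, i+4, i+8, ... -> 25 records each.
    partitions = [records[i::NUM_PARTITIONS] for i in range(NUM_PARTITIONS)]

    def transform(partition):
        # Stand-in for the per-partition transformation logic.
        return [row * 2 for row in partition]

    if __name__ == "__main__":
        with Pool(NUM_PARTITIONS) as pool:
            # Each worker processes its own 25-record subset simultaneously.
            results = pool.map(transform, partitions)
        print([len(part) for part in results])  # -> [25, 25, 25, 25]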

7. What is a configuration file? What is its use in DataStage?


It is a plain text file containing information about the processing and storage resources that are available for use during parallel job execution.
A configuration file contains entries such as:
a) Node:- a logical processing unit which performs all ETL operations.
b) Pools:- a collection of nodes.
c) Fastname:- the server (host) name; the ETL jobs are executed on the machine with this name.
d) Resource disk:- a permanent storage area which stores the repository components (datasets).
e) Resource scratchdisk:- a temporary storage area where staging operations are performed.
A sample file is sketched below.
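For reference, a minimal two-node configuration file might look like the following (the host name and directory paths here are placeholder assumptions, not values from this document):

    {
        node "node1"
        {
            fastname "etl_server"
            pools ""
            resource disk "/opt/datasets" {pools ""}
            resource scratchdisk "/tmp/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "etl_server"
            pools ""
            resource disk "/opt/datasets" {pools ""}
            resource scratchdisk "/tmp/scratch" {pools ""}
        }
    }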
8. What are the data partitioning techniques in DataStage?
In data partitioning, the data is split into multiple partitions that are distributed across the processors.
The data partitioning techniques are:
a) Auto
b) Hash
c) Modulus
d) Random
e) Range
f) Round Robin
g) Same

The default partition technique is Auto.

Round robin:- The first record goes to the first processing node, the second record goes to the second processing node, and so on. This method is useful for creating partitions of equal size.

Hash:- Records with the same values for the hash-key field are sent to the same processing node.

Modulus:- Partitioning is based on the modulus of a numeric key column. It is similar to hash partitioning; see the sketch after these descriptions.

Random:- The records are randomly distributed across all processing nodes.

Range:- Related records are distributed to the same node; the ranges are specified based on a key column.

Auto:- This is the most common method. DataStage determines the best partitioning method to use depending on the type of stage.
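To contrast round robin, hash, and modulus, here is a small Python sketch (illustrative only, not DataStage code) showing how each method assigns records to 4 partitions:

    NUM_PARTITIONS = 4
    records = [{"id": i, "dept": dept}
               for i, dept in enumerate(["HR", "IT", "HR", "FIN"] * 5)]

    # Round robin: record n goes to partition n mod 4 -> equal-sized partitions.
    round_robin = [records[i::NUM_PARTITIONS] for i in range(NUM_PARTITIONS)]

    # Hash: the same key value always lands on the same partition.
    # (Python's hash() of strings varies between runs but is stable within
    # one run, which is what partitioning requires.)
    def hash_partition(row):
        return hash(row["dept"]) % NUM_PARTITIONS

    # Modulus: like hash, but uses a numeric key column directly.
    def modulus_partition(row):
        return row["id"] % NUM_PARTITIONS

    for row in records[:4]:
        print(row["id"], "->", hash_partition(row), modulus_partition(row))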
