
Contents

Introduction to Datastage
History of Datastage
Architecture of Datastage
Components of Datastage
How to create Project
Steps to Create Sample Job
How to Open DS Designer
What is Job?
Types of Jobs
Creating a Datastage Job
How to Create ODBC Connection
Business Requirement
Add Different Stages (SQL Enterprise, Oracle Enterprise)
Configure the Stages (Source Stage and Target Stage)
Compile the Job
Populating Surrogate Key in Order Method Dimension Table
Surrogate key File Creation
RCP (Runtime Column Propagation)
Employee Dimension Population
Parameters
Global Parameters
ORDER METHOD DIM POPULATION Using Insert and Update
Change Capture Stage
Datastage Administrator Activity
Deleting a Corrupted Project
Restarting RTI server
Adding Oracle DSN Entries
Configuring .odbc.ini file to add DSN for DB2 connectivity
Steps for Creating a New Datastage Project
Deadlock Daemon Locks using UNIVERSE Commands
Deleting a Project using Datastage Administrator
LDAP Configuration
Package Installation guidelines
Releasing Resource Locks using UNIVERSE Commands
Renaming Datastage Project Using Universe
Restart of DataStage 7.5.1a Services
Restart of DataStage 8.1 Services
Restarting RTI Agent
Overview of Datastage Stages
Aggregator Stage
Change Apply Stage (takes the change data set, which contains the changes in the before and after data sets, from the Change Capture stage and applies the encoded change operations to a before data set to compute an after data set)
Filter Stage
Funnel Stage
Join Stage
Lookup Stage
Merge Stage (combines a sorted master data set with one or more update data sets; the columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record)
Modify Stage
Pivot Stage
Remove Duplicates Stage
Surrogate Key Generator Stage
Switch Stage
Compress Stage
Expand Stage

Introduction to Datastage

History of Datastage

Architecture of Datastage

Components of Datastage

How to create Project:

1) Login to Datastage Administrator
2) Go to the Project Tab and click on the Add button
3) Provide the name of the project you want to create
4) Click on Project Properties and provide access to users

Steps to Create Sample Job:

1) Understand the Business Requirement
2) Open Datastage Designer
3) Create a Parallel Job or Server Job
4) Add stages
5) Configure each stage:
   a. Datasource info
   b. Table names, column names
   c. Mapping
6) Save, Compile and Run

How to Open DS Designer

Enter the Datastage user name and password, select the project you want to work in, and click OK.

What is Job?

• Executable DataStage program
• Created in DataStage Designer
• Built using a graphical user interface
• Compiles into Orchestrate shell language (OSH)

Types of Jobs?

Creating a Datastage Job:

How to Create ODBC Connection.
Business Requirement:

Populate the ORDER_METHOD_DIM table in Gosalesdw (Oracle) from the Order Method table in Gosales (MSSQL).

    Gosales (MSSQL)        Gosalesdw (Oracle)
    Order Method           ORDER_METHOD_DIM
    ---------------        ------------------
    OrderMethodCD          ORDER_METHOD_ID
    Nm                     ORDER_METHOD_DESC
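To make the column-level mapping above concrete, here is a minimal sketch in plain Python (outside DataStage); the column names come from the requirement above, and the sample rows are illustrative:

```python
# Source: Gosales.Order_Method (OrderMethodCD, Nm)
# Target: Gosalesdw.ORDER_METHOD_DIM (ORDER_METHOD_ID, ORDER_METHOD_DESC)

def map_order_method(source_row):
    """Map one source row to the target dimension layout."""
    return {
        "ORDER_METHOD_ID": source_row["OrderMethodCD"],
        "ORDER_METHOD_DESC": source_row["Nm"],
    }

# Illustrative sample data, not actual Gosales contents.
source_rows = [
    {"OrderMethodCD": 1, "Nm": "Fax"},
    {"OrderMethodCD": 2, "Nm": "Web"},
]
target_rows = [map_order_method(r) for r in source_rows]
```

In the real job this mapping is what the link between the SQL Enterprise stage and the Oracle Enterprise stage carries.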

Add Different Stages (Add SQL Enterprise, Oracle Enterprise)

Configure the Stages (Source Stage and Target Stage)

Compile the Job

Populating Surrogate Key in Order Method Dimension Table:

We want to load DIM_ORDER_METHOD from the Order_Method table hosted on a MS SQL Server database and, in this case, generate a surrogate key.

Mapping

    Source Table (Order_Method)    Target Table (DIM_ORDER_METHOD)
    --                             ORDER_METHOD_KEY (surrogate key)
    ORDER_METHOD_CODE              ORDER_METHOD_CODE
    ORDER_METHOD_DESC              ORDER_METHOD_DESC

There are two ways to populate the surrogate key (flat file or DB sequence):

1) By using the Surrogate Key Stage in the job itself
2) By using the Transformer Stage in the job

Steps to design the Job

1) Add stages (SQL Enterprise, Transformer, Oracle Enterprise)
2) Configure each and every stage

Surrogate key File Creation

1) Create a new job to generate the surrogate file
2) Add the Surrogate Key Generator Stage
3) Configure, compile and run to generate the file
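The flat-file approach above boils down to persisting the last key handed out and continuing from it on the next run. A minimal sketch of that idea, assuming an illustrative state-file name and a starting value of 0 (neither is a DataStage default):

```python
from pathlib import Path

def next_keys(state_file, count):
    """Read the last surrogate key from the state file, hand out
    `count` new sequential keys, and persist the new high-water mark."""
    path = Path(state_file)
    last = int(path.read_text()) if path.exists() else 0
    keys = list(range(last + 1, last + 1 + count))
    path.write_text(str(keys[-1]))  # remember where the next run should start
    return keys
```

Running `next_keys("order_method.sk", 3)` on an empty file yields keys 1..3; a second run continues from 4, which is exactly the behaviour the Surrogate Key Generator's state file provides.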

RCP (Runtime Column Propagation)

RCP can be used to populate columns available in the source table without defining them at the stages.

We can enable RCP functionality in Datastage Administrator. To use RCP in jobs, we need to enable the RCP flag at the Source Stage and the Transformer Stage.

Employee Dimension Population

Business Requirement: We want to populate the Employee Dimension table from the source Employee table available in Gosales.

Job design: EMP -> Trsnfr -> Emp_dim

1) Add 3 stages (SQL Enterprise, Oracle Enterprise, Transformer)

EMP: R2: We want to populate Gender Name in the target table based on Gender Code.

EMP_DIM: R3: We want to store Year of Hire and Month of Hire based on the Hire Date available in the source.

EMP_DIM: R4: We want to populate Termination Reason in the Employee Dim. Add a Lookup Stage and a Database Stage to read the records from the Termination lookup table.

EMP_DIM: R5: We want to populate Manager Code1, Manager Name1, Manager Code2, Manager Code3, Manager Code4 and Manager Code5 into the Employee DIM table.

[Job design: EMP HIST -> EMp_Hist Data Set -> Joiner (emp_cd = emp_cd) -> Joiner (mgr1_cd = emp_cd)]
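The manager-lookup join in R5 is a self-join: the employee history is joined back to itself so each employee row picks up its manager's name. A small sketch in Python, with column names taken from the job design above (emp_cd, mgr1_cd) and purely illustrative rows:

```python
# Illustrative employee history rows (not real Gosales data).
emp_hist = [
    {"emp_cd": 10, "emp_name": "Alice", "mgr1_cd": 30},
    {"emp_cd": 20, "emp_name": "Bob",   "mgr1_cd": 30},
    {"emp_cd": 30, "emp_name": "Carol", "mgr1_cd": None},
]

# Lookup keyed on emp_cd, like the Joiner (mgr1_cd = emp_cd) link.
by_code = {r["emp_cd"]: r for r in emp_hist}

def with_manager(row):
    """Attach the manager's name by joining mgr1_cd back to emp_cd."""
    mgr = by_code.get(row["mgr1_cd"])
    return {**row, "manager_name1": mgr["emp_name"] if mgr else None}

emp_dim = [with_manager(r) for r in emp_hist]
```

The same pattern repeats for Manager Code2 through Code5, each level joining the previous manager's code back to emp_cd.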

Parameters:

Parameters can be divided into 2 types:
1) Global Parameters (project level)
2) Local Parameters (job level)

Global Parameters can be created in Datastage Administrator; Local Parameters can be created in Datastage Designer.

Global Parameters:

If you want to use these global parameters in a job, we need to include them in the Job Parameters. If we want to logically group parameters, we first need to create a Parameter Set.

ORDER METHOD DIM POPULATION Using Insert and Update.

Change Capture Stage:

How to Implement SCD Type 2 using Change Capture:

Source table:

    Order Method
    Code   Order method NM
    1      Fax
    2      Web
    3      Email
    4      Telephone

Target table should be:

    KEY   CD   NM          CURR_INDICATOR
    601   1    Fax         Y
    602   2    Web         Y
    603   3    E-Mail      N
    604   3    Email       Y
    605   4    Telephone   Y

Add different stages as mentioned below:

Configure the Source DB stage to connect to the SQL Server DB.

Configure the Lkp table to connect to the Order Method Dim table on the Target DB:

Configure the Change Capture Stage as mentioned below:

Configure the Transformer Stage as mentioned below:

Configure the Target DB as mentioned below to connect to the target table (Order Method Dim) for Insert.

Configure the Target DB to connect to the target table (Order Method Dim) for Update purposes.

SCD population using SCD Stage:

Configure the SCD stage as follows.
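The SCD Type 2 behaviour shown in the source/target tables above can be sketched in plain Python: when an incoming code already exists with a different name, the current row is expired (CURR_INDICATOR = 'N') and a new row with a fresh surrogate key is inserted with 'Y'. The key numbering (603, 604, ...) follows the example table; the function itself is an illustration of the Change Capture + Transformer logic, not DataStage code:

```python
def apply_scd2(target, source, next_key):
    """Apply SCD Type 2 changes: expire changed rows, insert new versions."""
    for cd, nm in source:
        current = next(
            (r for r in target if r["CD"] == cd and r["CURR_INDICATOR"] == "Y"),
            None,
        )
        if current is None or current["NM"] != nm:
            if current is not None:
                current["CURR_INDICATOR"] = "N"   # expire the old version
            target.append({"KEY": next_key, "CD": cd, "NM": nm,
                           "CURR_INDICATOR": "Y"})
            next_key += 1
    return target
```

Replaying the example: loading (3, "E-Mail") and then (3, "Email") leaves row 603 marked 'N' and a new row 604 marked 'Y', matching the target table above.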

Datastage Administrator Activity.

Deleting a Corrupted Project

Steps to remove the project manually through UV:

1) Login to UNIX using dsadm.

2) Make sure you are in the DSEngine folder

3) Source the dsenv and login to the UV

$ . ./dsenv
$ bin/uv

4) Execute "LIST UV.ACCOUNT <project name>" and, if you see the project name, type "DELETE UV.ACCOUNT <project name>"

5) Execute "LIST UV_SCHEMA" to see the list of project names; if you see the project name, type "VERIFY.SQL SCHEMA <project_name> FIX"

6) Check that you can no longer see the project by typing "LIST UV_SCHEMA"

7) If you still see the project, then enter "DROP SCHEMA <project_name> CASCADE;"

Restarting RTI server

1. Logon as dsadm

2. Change the directory to the RTIServer bin

$ cd /opt/dsSoftware/Ascential/RTIServer/bin/

3. Start or stop the server

For starting

$ nohup ./RTIServer.sh start &

For stopping

$ nohup ./RTIServer.sh stop &

4. Check whether the RTI server has been restarted. Execute the below command

$ ps -ef | grep RTI

Find the sample output for the above command:

    dsadm  4977  4946  0 16:26:09 pts/7   0:00 grep RTI
    dsadm 20018     1  0   Feb 27 ?    1061:21 /opt/dsSoftware/Ascential/RTIServer/apps/jre/bin/java -Xmx256m -server -Dprogra

Or open the IE browser and enter the URL in the address bar:

http://<Server Name>:8080/rti/

e.g. http://kopsapace02.corpnet2.com:8080/rti/

Adding Oracle DSN Entries

1. Logon to Datastage Server as dsadm

2. Change directory to DSEngine folder

3. Start or stop the server using nohup command

4. Edit the .odbc.ini file to add an entry for the DSN. Please find a sample Oracle DSN entry below:

[ukdev495]

Driver=/opt/dsSoftware/Ascential/DataStage/branded_odbc/lib/VMor820.so

Description=DataDirect Oracle

ServerName=ukdev495

CatalogOptions=0

ProcedureRetResults=0

EnableDescribeParam=0

EnableStaticCursorsForLongData=0

ApplicationUsingThreads=1

5. Save the file

Note: DataStage does not need to be stopped or restarted after this change

Configuring .odbc.ini file to add DSN for DB2 connectivity:

The .odbc.ini file in the Datastage home directory (/opt/dsSoftware/Ascential/DataStage/DSEngine) should have an entry for every database to which the user wants to connect using ODBC connectivity.

This document is about adding an entry in .odbc.ini file to allow for DB2 connectivity.

The Sample Entry for DB2 connectivity is given below

[PMAR_JDE_446_ODBC]

Driver=/opt/dsSoftware/Ascential/DataStage/branded_odbc/lib/VMdb220.so

Description=DataDirect 5.00 DB2 Wire Protocol Driver

AddStringToCreateTable=

AlternateID=

Collection=JDFDATA

DynamicSections=100

GrantAuthid=PUBLIC

GrantExecute=1

IpAddress=166.71.155.29

IsolationLevel=CURSOR_STABILITY

Location=NETDATA

LogonID=SCDWUSER

Password=

Package=PMARPCK

PackageOwner=SCDWUSER

TcpPort=446

WithHold=1

The entry within [ ] is the name of the entry (PMAR_JDE_446_ODBC in this case)

The driver is the location of ODBC driver for DB2. An ODBC driver is needed to allow connectivity from Datastage to any Database.

AddStringToCreateTable is the string that should be added while issuing create table commands

Collection is the name of the Library that has tables to which the user has access. (I believe that no matter which library you use here, you would be able to access the ones your DB user has privileges on.)

IpAddress is the IP Address of the Database Server

Location is the name of the Relational Database (RDB) on the AS/400 server

LogonID is the user Logon with which the user logs on to RDB on AS/400

Password is the password for the user

Package is any name up to 7 characters to uniquely identify this connectivity

PackageOwner is typically the same user

TCPPort is the port number on which DB2 is listening.
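Since the .odbc.ini layout described above is standard INI syntax, the fields can be inspected programmatically. A small sketch using Python's configparser on a trimmed copy of the sample DB2 entry from this section:

```python
import configparser

# Trimmed copy of the sample entry above; only a few keys are shown.
sample = """
[PMAR_JDE_446_ODBC]
Driver=/opt/dsSoftware/Ascential/DataStage/branded_odbc/lib/VMdb220.so
IpAddress=166.71.155.29
TcpPort=446
Collection=JDFDATA
"""

ini = configparser.ConfigParser()
ini.read_string(sample)

dsn = ini["PMAR_JDE_446_ODBC"]          # the name inside [ ] is the DSN name
endpoint = (dsn["IpAddress"], int(dsn["TcpPort"]))  # key lookup is case-insensitive
```

This kind of check is handy for verifying an entry's IP address and port before attempting the bind step described below.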

Finding out the Location:

Run the "WRKRDBDIRE" command on AS/400 and use the entry against the one that is typed *LOCAL.

Finding out the Port Number:

To determine the correct port number, execute 'NETSTAT' from an AS/400 command line. Choose option 3 to display a list of active ports on the AS/400. Find the entry for DRDA and press F-14 to toggle the display of the port number.

DRDA is the communicating protocol for communicating with DB2.

For Collection we have used JDFDATA, which is a Vanilla Library.

Once the changes are made to .ODBC.INI file, the next step is to Bind the package. This is essential before checking the DSN Connectivity.

The bind command can be executed from the branded_odbc/lib directory as

$ ./bind20 PMAR_JDE_8471_ODBC

where PMAR_JDE_8471_ODBC is the name of the DSN.

Bindings may not happen successfully sometimes and most probable reasons are

1. Wrong Port: This will cause the Bind operation to hang forever. Press Ctrl-C to break from the operation and re-edit the entries in .odbc.ini, specifying the correct port number found using the command listed above.

2. Incorrect UserId/Password: Attempting to bind the package with incorrect user credentials or credentials having insufficient privileges will cause the Bind operation to fail with the error 7680. Discuss with DB2 or AS/400 admin to get the correct user credentials and/or privileges.

3. Incorrect Port / IP Address: Trying to bind a package with incorrect IP/Port will cause the binding to fail with the error 7505. Follow the commands listed above to identify the Port number and discuss with DB2 or AS/400 admin to get the correct IP Address.

4. Incorrect Location/Collection: Package bind will fail with the error 1242 if the Location or collection mentioned in the .odbc.ini file is incorrect. Use the procedure mentioned above to find out the correct Location and contact DB2 or AS/400 admin to get the Collection name to which you have access.

5. Network Error: Package creation and binding may sometimes fail with the error 7500 which would indicate a network failure.

The cause for other error messages during the package binding can be found by looking at the file ivdb220.po in the directory /branded_odbc/locale/en_US/LC_MESSAGES

Once the Bind is successful, the next step is to test the ODBC connectivity. This can be done as follows:

1. If you haven’t previously done so, cd to $DSHOME and set up the DataStage environment by running dsenv:

. ./dsenv

2. Start the DSEngine shell:

./bin/dssh

The DSEngine shell starts.

3. Log to the project:

LOGTO PMARDev

where the project name is case sensitive.

4. Get a list of available DSNs by typing:

DS_CONNECT

5. Test the required connection by typing:

DS_CONNECT PMAR_JDE_8471_ODBC

6. Once the test is successful, exit out by pressing .Q

Once the DSN connectivity is tested from the Unix box, the next step is to import tables from Datastage using ODBC and start using the same in Datastage Jobs.

Configuring the uvconfig file to avoid timeouts:

1. Logon using dsadm.

2. Check that there are no client connections or phantom jobs running in the background

This can be checked by issuing the commands

$ ps -ef | grep phantom

$ ps -ef | grep dsapi

There should not be any processes as a result of the above commands. If there are any phantom processes or client connections, they need to be killed using the process below.

Request the client (szs42740 for example) to close their client connections and/or log on to the Unix box and kill the process.

If the clients (szs42740 for example) are not traceable and/or there is a pressing need to restart the Datastage service, issue the following commands

$ super mdc-kill-phantom

$ super mdc-kill-dsapi_slave

The first command kills all the phantom processes and the second command kills all the dsapi_slave connections.

3. Change the directory to

$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin

4. Source the dsenv file.

$ . ./dsenv

5. Stop the service

$ ./uv -admin -stop

6. Change the directory

$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine

7. Take a back up of uvconfig file.

8. Change the below mentioned values in the uvconfig file using Vi editor

RLTABSZ 100

GLTABSZ 100

MAXRLOCK 99

9. Save the uvconfig file.

10. Make the changes take effect

$ ./uv -admin -regen

11. Restart the DS server

$ ./uv -admin -start

12. Change the directory to DSEngine. To confirm the changes have taken effect, issue the commands below

$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/

$ bin/uvregen -t

Note: $ is the Unix prompt
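The client-connection check in step 2 above (ps output filtered for phantom and dsapi processes) can be scripted. A sketch that parses ps -ef style output; the sample line is modelled on the dsapi_slave entry shown later in this document, and field positions assume standard ps -ef columns (PID second):

```python
def leftover_pids(ps_output, patterns=("phantom", "dsapi_slave")):
    """Return PIDs of lines matching any pattern; an empty list means
    it is safe to stop the service. Note: when fed raw `ps | grep`
    output, the grep process itself may match and should be excluded."""
    pids = []
    for line in ps_output.splitlines():
        if any(p in line for p in patterns):
            pids.append(int(line.split()[1]))  # PID is the 2nd ps -ef column
    return pids

# Illustrative sample output, not from a live system.
sample = """szs42740 7854 7846 0 11:29:25 ? 0:07 dsapi_slave 9 8 0
dsadm 9001 1 0 11:30:00 ? 0:00 /bin/sleep 60"""
```

Here `leftover_pids(sample)` returns `[7854]`, flagging the open dsapi_slave connection that must be closed before stopping the service.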

Steps for Creating a New Datastage Project

Step 1: Login to Datastage Administrator using dsadm

Step 2: After successful login click on the “Projects” Tab and then click “Add”.

Step 3: “Add Project” window will be displayed.

Step 4: Enter the name of the Project.

Step 5: Enter the Project Path (/datastage/Projects/)

Step 6: Click “OK”

Step 7: Select the created project and click “Properties”

Step 8: Check the options in the Project Properties as displayed in the image below.

Steps for Creating Access permissions for a Datastage Project

Step 1: Login to Datastage Administrator using dsadm

Step 2: Change the directory to /datastage/Projects/<Project Name >

Step 3: Identify the .developer.adm file

Step 4: Open the .developer.adm file and enter only the primary group(Eg:dstage) or

secondary group (Eg : ds_scdw) for giving access permission.

Users who are not part of the primary group(Eg:dstage) or secondary group (Eg : ds_scdw) entered in the .developer.adm file cannot access the project.
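The .developer.adm rule above is simply group-based: a user may access the project only if one of their Unix groups appears in the file. A hypothetical sketch of that decision (the function and variable names are illustrative, not part of DataStage; the group names come from the examples above):

```python
def can_access(user_groups, developer_adm_groups):
    """True if any of the user's Unix groups is listed in .developer.adm."""
    return any(g in developer_adm_groups for g in user_groups)

# Illustrative contents of .developer.adm: primary and secondary groups.
allowed = {"dstage", "ds_scdw"}
```

For example, a user in groups `staff` and `dstage` is admitted, while a user whose groups include neither `dstage` nor `ds_scdw` is denied.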

Deadlock Daemon Locks using UNIVERSE Commands

Step 1. Logon using dsadm.

Step 2. Change the directory to

$ /opt/dsSoftware/Ascential/DataStage/DSEngine/bin

Step 3. Source the dsenv file.

$ . ./dsenv

Step 4. Issue the below command

$ ./uv

and the following message will be displayed

DataStage Command Language 7.0

Copyright (c) 1997 - 2003 Ascential Software Corporation. All Rights Reserved

DSEngine logged on: Fri May 12 13:24:49 2006

Step 5. Issue the below command

> LOGTO UV

> DEADLOCK.MENU

You will get the following messages

DataStage Deadlock Manager Administration

1. Examine the deadlock daemon logfile

2. Start the Deadlock daemon

3. Halt the Deadlock daemon

4. Purge the logfile

5. Check for deadlock

6. Select victim for deadlock resolution

Step 6: choose the options above

Step 7: Issue the following command to quit the universe

>Q

Find the explanation for the messages in the Dsdlock.log

1. DeadLock Daemon started in Query Mode by pid .

Someone executed dsdlockd -p command

2. DeadLock Daemon started in Dead Process Cleanup Mode.

The deadlock daemon on waking found defunct processes and initiated a cleanup

3. DeadLock Daemon started in Normal Mode by pid.

Someone (whose pid is given) started the deadlock daemon, maybe from

DEADLOCK.MENU, maybe from the command line. If pid = 1 this was the auto-start

on re-boot.

Deleting a Project using Datastage Administrator

Steps to remove the project through Datastage Administrator:

1) Login to Datastage Administrator using dsadm.

2) Navigate to the Projects Tab

3) Select the Project name that needs to be deleted.

4) Once the project name is selected, click on the Delete button.

5) This will ask for a confirmation ‘Are you sure you want to delete the project?’ Click ‘Yes’ to delete the project.

6) This will delete the selected datastage project.

LDAP Configuration

Initial Setting in the WAS for Global security

Please follow the steps below to configure LDAP for IBM Information Server.

Prerequisites: Create a VSED user id and password which has full administrative rights, and get the Type, Host, Port and Base Distinguished Name.

Step 1: Login to the WAS Web Console using the https://<servername>:9043/ibm/console/logon.jsp and Click Security -> Global Security ->Under User registries, click LDAP

Enter the required details as given below.

Step 2: Change Active User Registry to LDAP

Step 3: Login as root to the Datastage Server and stop the IBM Information Server:

# cd /etc/rc2.d

# ./AppServerAdmin.sh -was -user yqz99739 -password mask31july

Info WAS instance /Node:stvus059Node01/Server:server1/ updated with new user information
Info MetadataServer daemon script updated with new user information

# ./DirectoryAdmin.sh -delete_groups

# ./DirectoryAdmin.sh -delete_users

Package Installation guidelines

Step 1: Login as dsadm

Step 2: Change the directory to DSEngine Directory

Eg: cd /local/apps/dsSoftware/715A/Ascential/DataStage/DSEngine

Step 3: Change the directory to bin in DSEngine

$ cd bin

Step 4: Source the dsenv file

$ . ./dsenv

Step 5: Execute the Datastage Package installer command

$ ./dspackinst

Please find the screen shot for steps 3, 4, and 5.

Please find screenshots 1, 2 and 3 after the execution of Step 5.

Screenshot 1:

Screenshot 2:
Step 6: Enter the package directory (Screen shot 4)

Screen shot 4:

The package Installer will display the package information. (Screen shot 5)

Screen shot 5:

Step 7 The package installer will search for the Projects on the server and select the project you want to the plug-in to be registered (Screen shot 6)

Screen shot 6:


Step 8: Enter the Log file destination directory (Screen shot 7)

Screen shot 7:

The package installer will show the installation details which you have given in the previous steps (Screen shot 8).

Screen shot 8:

Step 9: Enter the options if you want to proceed (Screen shot 9)

Screen shot 9:

The installation confirmation will be displayed (Screen shot 10).

Screen shot 10:

Note: Proper care has to be taken when doing an FTP of the plug-in source to the Datastage server.

Releasing Resource Locks

In DataStage Director, pull down Job -> Cleanup Resources. Choosing this option will open the Job Resources interface.

To release a locked item:

• Select Show All in the Processes pane
• Select Show All in the Locks pane
• Locate the Item ID you wish to unlock and note the PID/User#. For example, rjPLAW_P1_LoadSTG_Seq has a PID of 27645
• Locate the PID in the Processes pane and select the row
• Release the lock by clicking on the Logout button. This will kill the process holding the lock, thus releasing it.

Releasing Resource Locks using UNIVERSE Commands

Step 1. Logon using dsadm.

Step 2. Change the directory to

$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin

Step 3. Source the dsenv file.

$ . ./dsenv

Step 4.Issue the below command

$ ./uv

and the following message will be displayed

DataStage Command Language 7.0

Copyright (c) 1997 - 2003 Ascential Software Corporation. All Rights Reserved

DSEngine logged on: Fri May 12 13:24:49 2006

Step 5. Issue the below command

$ LOGTO <Project Name>

eg: LOGTO SCDW

$ LIST.READU EVERY

and the following messages will be displayed

Active Group Locks:                                    Record Group Group Group
  Device.... Inode..... Netnode Userno Lmode G-Address. Locks ...RD ...SH ...EX
    69847306      11234       0  36986  5 IN      1000      1     0     0     0
    69847306       7126       0  40649  8 IN      B800      1     0     0     0
    69847306      16839       0  58192 10 IN      B000      1     0     0     0
    69847306      16839       0  40669 19 IN      7000      1     0     0     0
    69847306      20851       0  40649 19 IN      1000      1     0     0     0

Active Record Locks:
  Device.... Inode..... Netnode Userno Lmode    Pid Login Id Item-ID
    69847306      11234       0  44468  5 RL  21068 dsadm    RT_CONFIG66
    69847306      11234       0  42852  5 RL  22684 dsadm    RT_CONFIG66
    69847306       7126       0  54892  8 RL  10644 dsadm    RT_CONFIG354
    69847306       7126       0  54892 68 RL  10644 dsadm    RT_CONFIG356
    69847306       6997       0  65053 69 RU    483 dsadm    GSKLACIRSBSpecificDataInsertLots

Step 6: Identify the Resource locks

If the user wants to release the resource identified in the messages highlighted above, issue the commands below:

$ LOGTO UV

$ UNLOCK INODE 6997 USER 65053 ALL

The below messages will be displayed

Clearing Record locks.

Clearing GROUP locks.

Clearing FILE locks.

Renaming Datastage Project Using Universe

Please back up and save the original project in case anything goes wrong.

1. From within DS Administrator, create project newname

2. From the server, remove directory newname

3. Rename newname.tmp to newname

4. Source the dsenv

5. Type bin/uv or bin/uvsh

6. LOGTO project

7. Type UPDATE.ACCOUNT (this ensures that pointers are updated to reflect the correct installation directories etc.)

8. Type DS.TOOLS and select option 2 (this is to rebuild the repository indexes)

9. Once complete, type n or press Return

10. Access the project as normal

Restart of DataStage 7.5.1a Services

Restarting of a Datastage service may be necessary under various circumstances. The most common need for a restart of the service is the changes made to the Environment file “dsenv”.

Restarting of Datastage Service is a two step process

1. Stop Datastage Service

2. Start Datastage Service

Prerequisites

1. Logon as dsadm

2. Stop SITESCOPE MONITOR (VERY IMPORTANT)

After logging onto the Unix box using our login credentials, we switch the user to dsadm. This can be done by issuing the command

su dsadm

3. Check existence of client connections

Before attempting to stop Datastage service, ensure that there are no client connections or phantom jobs running in the background.

This can be checked by issuing the commands

ps -ef | grep phantom

ps -ef | grep dsapi

There should not be any processes as a result of the above commands. If there are any phantom processes or client connections, they need to be killed using the process below.

a. Find out the user of the process. This can be found by looking at the process entry

Eg: szs42740 7854 7846 0 11:29:25 ? 0:07 dsapi_slave 9 8 0

A sample entry as shown above indicates that the user “szs42740” has a client connection (dsapi_slave).

In such a case, request the client (szs42740 in the example) to close their client connections and/or log on to the Unix box and kill the process.

If the clients are not traceable and/or there is a pressing need to restart the Datastage service, issue the following commands

super mdc-kill-phantom

super mdc-kill-dsapi_slave

The first command kills all the phantom processes and the second command kills all the dsapi_slave connections.

Stop Datastage Service

Attempt to stop the service after performing prerequisite activities detailed above.

The Datastage Service can be stopped by issuing the commands

cd $DSHOME

. ./dsenv

cd bin

./uv -admin -stop

This shuts down the server engine and frees any resources held by the server engine process.

Start Datastage Service

Wait for at least 30 seconds after stopping the Datastage service before you attempt to restart it.

The Datastage Service can be started by issuing the commands

/opt/dsSoftware/Ascential/Datastage/DSEngine/bin/uv -admin -start

This command starts the dsrpcd daemon, which is the daemon for the server engine.

Check Datastage Service

Check whether the Datastage service is running by issuing the following command

netstat -na | grep 31538

The above command may produce multiple lines of output, but if the service is running there should be a row containing "LISTEN":

*.31538            *.*              0      0 49152      0 LISTEN
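The LISTEN check can also be scripted. This is a hedged sketch: port 31538 is this installation's dsrpcd port, and the script allows for hosts where netstat is not installed.

```shell
#!/bin/sh
# Report whether the dsrpcd port has a LISTEN entry.
port=31538
if command -v netstat >/dev/null 2>&1; then
    if netstat -na 2>/dev/null | grep "$port" | grep -q LISTEN; then
        echo "DataStage service is listening on port $port"
    else
        echo "No LISTEN entry for port $port - service appears to be down"
    fi
else
    echo "netstat not available on this host"
fi
```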

Common Problems Restarting Datastage Service

1. Datastage Service is started, but cannot connect from Datastage clients

Symptom

The Datastage service was stopped and restarted, but attempting to connect to the host from the Datastage client (e.g. Designer) results in an error, and issuing the command

netstat -na | grep 31538

does not return any record with "LISTEN".

Cause

The service was restarted without ensuring that client connections were closed. This causes the "port" to be unavailable for any connections.

Remedy

Restart the service once again by issuing the commands under the sections "Stop Datastage Service" and "Start Datastage Service". Stopping and starting the service again is known to resolve this issue.

For any other problems, contact IBM Support

Restart of DataStage 8.1 Services

Restarting of a Datastage service may be necessary under various circumstances. The most common need for a restart of the service is the changes made to the Environment file “dsenv”.

Restarting IBM Information Server 8.1 is a two step process

1. Stop Datastage Service

2. Start Datastage Service

Prerequisites

1. Disable the SiteScope Monitor for the server that you are going to restart (e.g. breus002)

2. Login as dsadm user and switch to the super root user.

After logging onto the Unix box using our login credentials, we switch the user to dsadm.

This can be done by issuing the command

super root-shell

3. Check existence of client connections

Before attempting to stop Datastage service, ensure that there are no client connections or phantom jobs running in the background.

This can be checked by issuing the commands

ps -efd | grep phantom

ps -efd | grep dsapi

There should not be any processes as a result of the above commands. If there are any phantom processes or client connections, they need to be killed using the process below.

a. Find out the user of the process. This can be found by looking at the process entry

Eg: szs42740  7854  7846  0 11:29:25 ?  0:07 dsapi_slave 9 8 0

A sample entry as shown above, indicates that the user “szs42740” is having a client connection (dsapi_slave).

In such case, request the client (szs42740 in the example) to close their client connections and/or log onto to Unix Box and kill the process.

If the clients are not traceable and/or there is a pressing need to restart the Datastage service, issue the following commands

super mdc-kill-phantom

super mdc-kill-dsapi_slave

The first command kills all the phantom processes and the second command kills all the dsapi_slave connections.


Stop Datastage Service

Attempt to stop the service after performing prerequisite activities detailed above.

The Datastage Service can be stopped by issuing the commands

cd /etc/rc2.d

# ./S99ds.rc 'stop'

Stopping JobMonApp

JobMonApp has been shut down.

DataStage Engine 8.1.0.0 instance "ade" has been brought down.

# ./S99ISFAgents 'stop'

Agent stopped.

LoggingAgent stopped.

# ./S99ISFServer 'stop'

ADMU0116I: Tool information is being logged in file

/local/apps/DRS_dstage/IS81/IBM/AppServer/profiles/default/logs/server1/stopServer.log

ADMU0128I: Starting tool with the default profile

ADMU3100I: Reading configuration for server: server1

ADMU3201I: Server stop request issued. Waiting for stop status.

ADMU4000I: Server server1 stop completed.

# ps -efd|grep java
    root  6042  5544  0 21:39:52 pts/3  0:00 grep java

# ps -efd|grep ds
   dsadm  3201  3200  0 20:27:24 ?      0:00 /opt/openssh/libexec/sftp-server
   dsadm  3094  3092  0 20:24:47 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm  3096  3094  0 20:24:47 pts/2  0:00 -ksh
   dsadm 14084 14082  0 14:50:16 pts/4  0:00 -ksh
   dsadm 14082 14080  0 14:50:16 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm  2050 25672  0 20:00:24 pts/5  0:00 tail -f startServer.log
    root  6045  5544  0 21:40:01 pts/3  0:00 grep ds
   dsadm  3200  3198  0 20:27:24 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm 25670 25668  0 17:53:45 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm 25672 25670  0 17:53:45 pts/5  0:00 -ksh
   dsadm  5509  5497  0 21:31:36 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm 26245     1  0 18:01:36 ?      0:05 /local/apps/DRS_dstage/IS81/IBM/InformationServer/Server/PXEngine/bin/resource_
   dsadm  5511  5509  0 21:31:36 pts/3  0:00 -ksh

The PXEngine process (PID 26245 above) is still running and is killed manually:

# kill 26245

# ps -efd|grep ds
   dsadm  3201  3200  0 20:27:24 ?      0:00 /opt/openssh/libexec/sftp-server
   dsadm  3094  3092  0 20:24:47 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm  3096  3094  0 20:24:47 pts/2  0:00 -ksh
   dsadm 14084 14082  0 14:50:16 pts/4  0:00 -ksh
   dsadm 14082 14080  0 14:50:16 ?      0:00 /opt/openssh/sbin/sshd -R
    root  6100  5544  0 21:41:23 pts/3  0:00 grep ds
   dsadm  2050 25672  0 20:00:24 pts/5  0:00 tail -f startServer.log
   dsadm  3200  3198  0 20:27:24 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm 25670 25668  0 17:53:45 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm 25672 25670  0 17:53:45 pts/5  0:00 -ksh
   dsadm  5509  5497  0 21:31:36 ?      0:00 /opt/openssh/sbin/sshd -R
   dsadm  5511  5509  0 21:31:36 pts/3  0:00 -ksh

Process 26245 no longer appears, confirming that the engine processes are down.

This shuts down the server engine and frees any resources held by the server engine process.

Start Datastage Service

# ./S99ISFServer 'start'

# ./S99ISFAgents 'start'

LoggingAgent.pid: No such file or directory

Starting LoggingAgent

LoggingAgent started.

Agent.pid: No such file or directory

Starting Agent

Agent started.

# ./S99ds.rc 'start'

# ./S99dsrpcd.rc 'start'

Check Datastage Service

Check whether the Datastage service is running by issuing the following command

netstat -na | grep 31538

The above command may produce multiple lines of output, but if the service is running there should be a row containing "LISTEN":

*.31538            *.*              0      0 49152      0 LISTEN

Common Problems Restarting Datastage Service

1. Datastage Service is started, but cannot connect from Datastage clients

Symptom

The Datastage service was stopped and restarted, but attempting to connect to the host from the Datastage client (e.g. Designer) results in an error, and issuing the command

netstat -na | grep 31538

does not return any record with "LISTEN".

Cause

The service was restarted without ensuring that client connections were closed. This causes the "port" to be unavailable for any connections.

Remedy

Restart the service once again by issuing the commands under the sections "Stop Datastage Service" and "Start Datastage Service". Stopping and starting the service again is known to resolve this issue.

For any other problems, contact IBM Support


Restarting RTI Agent

1. Logon using super dsadm.

2. Change the directory to the RTIAgent bin directory

$ cd /opt/dsSoftware/Ascential/RTIAgent/bin/

3. Start or stop the server using nohup command

For starting

$ nohup ./RTIAgent.sh start &

For stopping

$ nohup ./RTIAgent.sh stop &

4. Check whether the RTI Agent has been restarted by executing the below command

$ ps -ef| grep RTIAgent

Find the sample output for the above command

ps -efd|grep RTIAgent
   dsadm 26190 26178  0 14:02:32 pts/3  0:00 grep RTIAgent
   dsadm 26164     1  0 14:02:00 pts/2  0:01 /opt/dsSoftware/Ascential/RTIAgent/jre/bin/java -Djava.library.path=/opt/dsSoft

Overview of Datastage Stages:


Aggregator Stage :

Aggregator classifies data rows from a single input link into groups and calculates totals or other aggregate functions for each group. The summed totals for each group are output from the stage through the output link. A group is a set of records with the same value for one or more columns.

Example : Transaction records might be grouped by both day of the week and by month. These groupings might show that the busiest day of the week varies by season.
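The grouping behaviour can be sketched with standard Unix tools. This hedged awk analogue (the file path and column layout are invented for illustration) groups transaction rows by day and sums the amount column, like an Aggregator stage configured with a "sum" function:

```shell
# Sample transactions: day,amount
printf 'Mon,100\nTue,50\nMon,25\nTue,75\n' > /tmp/txns.csv

# Accumulate a total per grouping-key value, then emit one row per group
awk -F, '{ total[$1] += $2 } END { for (d in total) print d "," total[d] }' \
    /tmp/txns.csv | sort
# Mon,125
# Tue,125
```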


Change Apply Stage

Takes the change data set, which contains the changes in the before and after data sets, from the Change Capture stage and applies the encoded change operations to a before data set to compute an after data set.

The Change Apply stage reads a record from the change data set and a record from the before data set, compares their key column values, and acts accordingly.


Filter Stage :

The Filter stage transfers, unmodified, the records of the input data set which satisfy the specified requirements and filters out all other records.

Filter stage can have a single input link, any number of output links and, optionally, a single reject link. You can specify different requirements to route rows down different output links. The filtered out records can be routed to a reject link, if required.
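The routing behaviour can be illustrated outside DataStage with a hedged awk sketch (file names and the status values are invented): each "where" clause sends rows to its own output, and rows matching no clause fall through to a reject file.

```shell
# Sample accounts: id,status
printf '1,ACTIVE\n2,CLOSED\n3,ACTIVE\n4,UNKNOWN\n' > /tmp/accounts.csv

# Two routing conditions plus a catch-all reject output
awk -F, '$2 == "ACTIVE" { print > "/tmp/active.csv"; next }
         $2 == "CLOSED" { print > "/tmp/closed.csv"; next }
                        { print > "/tmp/reject.csv" }' /tmp/accounts.csv
```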


Funnel Stage :

Funnel Stage copies multiple input data sets to a single output data set. This operation is useful for combining separate data sets into a single large data set. The stage can have any number of input links and a single output link.
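A continuous funnel is, in effect, a concatenation of its inputs. This hedged Unix analogue (file paths invented) combines two data sets with the same record layout into one:

```shell
# Two separate data sets with the same record layout
printf 'a,1\nb,2\n' > /tmp/set1.csv
printf 'c,3\n'      > /tmp/set2.csv

# Concatenate them into a single larger data set
cat /tmp/set1.csv /tmp/set2.csv > /tmp/combined.csv
wc -l < /tmp/combined.csv   # 3 records, all rows from both inputs
```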


Join Stage :

Definition : Join Stage performs join operations on two or more data sets input to the stage and then outputs the resulting data set.

The input data sets are notionally identified as the "right" set and the "left" set, and "intermediate" sets. It has any number of input links and a single output link.
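The semantics can be sketched with the Unix join utility, which, like the stage, expects its inputs sorted on the key. A hedged example with invented data (the default behaviour of join is an inner join on column 1):

```shell
# Left and right inputs, both sorted on the join key (column 1)
printf '1 Smith\n2 Jones\n3 Brown\n' > /tmp/left.txt
printf '1 London\n3 Paris\n'         > /tmp/right.txt

# Inner join: only keys present in both inputs appear in the output
join /tmp/left.txt /tmp/right.txt
# 1 Smith London
# 3 Brown Paris
```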


Lookup Stage :

The Lookup stage is used to perform lookup operations on a data set read into memory from any other Parallel job stage that can output data.

It can also perform lookups directly in a DB2 or Oracle database or in a lookup table contained in a Lookup File Set stage.
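The in-memory reference idea can be sketched with awk's classic two-file pattern (all file names and values invented): the reference table is loaded into memory first, then each stream row probes it, with unmatched rows continuing with a default value, similar to a lookup whose failure action is Continue.

```shell
# Reference (lookup) table: code,description
printf 'UK,United Kingdom\nFR,France\n' > /tmp/ref.csv
# Stream input: id,code
printf '1,UK\n2,FR\n3,XX\n'             > /tmp/stream.csv

# NR==FNR is true only for the first file: build the lookup table.
# For the second file, probe the table and substitute a default on a miss.
awk -F, 'NR==FNR { ref[$1] = $2; next }
         { print $1 "," (($2 in ref) ? ref[$2] : "UNKNOWN") }' \
    /tmp/ref.csv /tmp/stream.csv
# 1,United Kingdom
# 2,France
# 3,UNKNOWN
```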


Merge Stage :

The Merge stage combines a sorted master data set with one or more update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record. A master record and an update record are merged only if both of them have the same values for the merge key column(s) that you specify. Merge key columns are one or more columns that exist in both the master and update records.

The data sets input to the Merge stage must be key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node.
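The master-plus-update behaviour can be approximated with `join -a 1` on sorted inputs (a hedged analogue with invented data): unmatched master rows pass through, while matched rows pick up the update columns.

```shell
# Master and update sets, both sorted on the merge key (column 1)
printf '1 Smith\n2 Jones\n3 Brown\n' > /tmp/master.txt
printf '1 London\n3 Paris\n'         > /tmp/update.txt

# -a 1 keeps unpairable rows from the master, so every master
# record reaches the output even without a matching update record
join -a 1 /tmp/master.txt /tmp/update.txt
# 1 Smith London
# 2 Jones
# 3 Brown Paris
```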


Modify Stage

The Modify stage alters the record schema of its input data set. The modified data set is then output. It is a processing stage. It can have a single input and a single output.


Pivot Stage :

Pivot Stage converts columns into rows.

Eg., Mark-1 and Mark-2 are two columns.

Task : Convert all the columns into one column.

Implication : Can be used to convert SCD Type-3 to Type-2.

Using Methodology : In the derivation field of the output column, change the input columns into one column.

Eg., Column Name "Marks".

Derivation : Mark-1 and Mark-2.

Note : Column "Marks" is derived from the input columns Mark-1 and Mark-2.
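The Mark-1/Mark-2 example can be sketched with awk (file path and names invented): each input row yields one output row per mark column, so the two columns collapse into a single "Marks" column.

```shell
# Input: name,Mark-1,Mark-2
printf 'Ann,70,85\nBob,60,90\n' > /tmp/marks.csv

# Emit one row per mark column - columns become rows
awk -F, '{ print $1 "," $2; print $1 "," $3 }' /tmp/marks.csv
# Ann,70
# Ann,85
# Bob,60
# Bob,90
```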


Remove Duplicates Stage

The Remove Duplicates stage takes a single sorted data set as input, removes all duplicate records, and writes the results to an output data set.

Removing duplicate records is a common way of cleansing a data set before you perform further processing. Two records are considered duplicates if they are adjacent in the input data set and have identical values for the key column(s).
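The adjacency requirement is exactly the Unix `sort | uniq` idiom; a hedged sketch with invented data:

```shell
printf 'apple\nbanana\napple\ncherry\nbanana\n' > /tmp/fruit.txt

# Sorting makes duplicates adjacent; uniq then drops the repeats,
# mirroring the stage's requirement for a sorted input data set
sort /tmp/fruit.txt | uniq
# apple
# banana
# cherry
```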


Surrogate Key Generator Stage

The Surrogate Key stage generates key columns for an existing data set.

User can specify certain characteristics of the key sequence. The stage generates sequentially incrementing unique integers from a given starting point. The existing columns of the data set are passed straight through the stage.

If the stage is operating in parallel, each node will increment the key by the number of partitions being written to.
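The sequential-run behaviour can be sketched with awk (file path, values and the start value 100 are invented): each existing row passes through unchanged with a new sequentially incrementing key prepended. In a two-node parallel run, each node would instead step by 2 from its own offset.

```shell
printf 'CARD\nPHONE\nWEB\n' > /tmp/methods.txt

# Prepend a key starting at 100 and incrementing by 1 per record
awk -v start=100 '{ print start + NR - 1 "," $0 }' /tmp/methods.txt
# 100,CARD
# 101,PHONE
# 102,WEB
```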


Switch Stage

The switch stage takes a single data set as input and assigns each input row to an output data set based on the value of a selector field.

It can have a single input link, up to 128 output links and a single rejects link. This stage performs an operation similar to a C switch statement. Rows that satisfy none of the cases are output on the rejects link.
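The C-switch analogy carries straight over to a shell case statement; a hedged sketch (link names invented) with one branch per selector value and `*` acting as the rejects link:

```shell
# Route a row to an output based on its selector value
route() {
    case "$1" in
        A) echo "output link A" ;;
        B) echo "output link B" ;;
        *) echo "rejects link"  ;;   # no case matched
    esac
}

route A   # output link A
route X   # rejects link
```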


Compress Stage :

The Compress stage uses the UNIX compress or GZIP utility to compress a data set. It converts a data set from a sequence of records into a stream of raw binary data.

A compressed data set cannot be processed by many stages until it is expanded, i.e., until its rows are returned to their normal format. Stages that do not perform column based processing or reorder the rows can operate on compressed data sets. For example, you can use the copy stage to create a copy of the compressed data set.
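The copy-but-not-process point can be sketched with gzip (file paths invented): the compressed stream can be copied as-is, but row-level processing needs the Expand step (gunzip) first.

```shell
# Create a small data set and compress it (Compress stage analogue)
seq 1 1000 > /tmp/ds.txt
gzip -c /tmp/ds.txt > /tmp/ds.txt.gz

# A Copy-style operation works on the compressed binary stream unchanged
cp /tmp/ds.txt.gz /tmp/ds_copy.txt.gz

# Row-level processing (here, counting records) requires expansion first
gunzip -c /tmp/ds_copy.txt.gz | wc -l   # 1000 records
```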


Expand Stage

The Expand stage uses the UNIX compress or GZIP utility to expand the data set. It converts a data set from a stream of raw binary data into a sequence of records.
