
Using WebSphere DataStage with IBM DataMirror Change Data Capture

Set up a working environment for Information Server DataMirrorCDC operator


Skill Level: Intermediate

Indrani Ghatare (indrani@us.ibm.com), Software Engineer, IBM

15 May 2008

Do you need real-time access to data residing in diverse data sources? Learn how to use the DataMirror Change Data Capture (CDC) stagetype operator of the DataMirror Transformation Server to capture streaming data changes into WebSphere DataStage. This tutorial includes a step-by-step guide to help you set up your DataMirror environment. You'll learn how to generate a DataStage-specific import file for using the operator in a DataStage job. Finally, you'll learn how to use the operator in some example jobs. This tutorial includes a sample DSX file.

Section 1. Before you start


This integration solution suits business environments where copies of data are kept in different databases and every database must provide real-time data at any point in time. The technology can keep two copies of data synchronized across heterogeneous databases with different schemas. DataMirror's Change Data Capture (CDC) technology shows the data changes that occurred in the databases and minimizes the resources required to maintain ETL data. This gives you access to real-time data for your target databases.

Using WebSphere DataStage with IBM DataMirror Change Data Capture Copyright IBM Corporation 2008. All rights reserved.


developerWorks

ibm.com/developerWorks

About this tutorial


This tutorial teaches you how to use the Information Server DataMirrorCDC stagetype operator in an IBM DataMirror environment. It provides step-by-step instructions for setting up the DataMirror environment and shows you how to set up the DataMirrorCDC operator in a DataStage job. The tutorial includes sample data and a DataStage job definition file.

Objectives
In this tutorial, you will:
- Set up the DataMirror environment
- Generate a DataStage job definition file
- Prepare the DataStage job using the DataMirrorCDC operator and sample data
- Run a DataStage job using the sample data and an example job

Prerequisites
This tutorial is written for Windows users whose skills and experience are at an intermediate level. You should have a solid understanding of Information Server DataStage and a working knowledge of DB2 and DataMirror.

System requirements
To use this tutorial, you need to have these installed on the same Windows machine:
- Information Server 8.0.1 FP1
- Information Server patch for the DataMirrorCDC operator (available on eService)
- IBM DataMirror Transformation Server for UDB
- IBM DataMirror Transformation Server for WebSphere DataStage
- IBM DataMirror Transformation Server Access Manager
- IBM DataMirror Transformation Server Management Console


Before beginning, you should also do the following:
- Read the DataMirrorCDCReadme-a4.pdf file from the Downloads section, which describes the DataMirrorCDC stagetype operator.
- Download the datamirrorTutorial.zip file from the Downloads section. This zip file contains the sample data and job definition file that are used in this tutorial.
- Extract the zip file into an empty directory.

Section 2. Overview
The DataMirrorCDC stagetype operator in Information Server lets you use the CDC technology of IBM DataMirror. In this scenario, the DataMirrorCDC operator integrates data between a remote source database and a target database by using DataMirror's CDC technology to stream data changes into Information Server DataStage. Figure 1 shows the integration scenario.

Figure 1. Integration scenario

The scenario follows these steps:
1. DataStage extracts data from the source database using standard ETL functions.
2. The DataMirrorCDC operator requests the changed data from DataMirror.
3. DataMirror captures changes made to the source database.
4. DataMirror sends the captured changes to the DataMirrorCDC operator.
5. The DataMirrorCDC operator passes the data to downstream stages.
6. Updates are written to the target database.
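To make the flow above more concrete, the following minimal Python sketch mimics steps 3 through 6 entirely in memory. It is not DataMirror or DataStage code: the Change class, the apply_changes function, and the CUSTOMER_ID and NAME columns are hypothetical names chosen for illustration, and the real operator exchanges data over its own protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, str]


@dataclass
class Change:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                 # "INSERT", "UPDATE", or "DELETE"
    before: Optional[Row]   # row image before the change; None for INSERT
    after: Optional[Row]    # row image after the change; None for DELETE


def apply_changes(target: Dict[str, Row], changes: List[Change], key: str) -> Dict[str, Row]:
    """Apply captured changes, in order, to an in-memory stand-in for the target table."""
    for change in changes:
        if change.op == "INSERT":
            target[change.after[key]] = dict(change.after)
        elif change.op == "UPDATE":
            # Drop the old row first in case the key column itself changed.
            target.pop(change.before[key], None)
            target[change.after[key]] = dict(change.after)
        elif change.op == "DELETE":
            target.pop(change.before[key], None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return target


if __name__ == "__main__":
    # Hypothetical changes captured from the source table.
    captured = [
        Change("INSERT", None, {"CUSTOMER_ID": "1", "NAME": "Ann"}),
        Change("UPDATE", {"CUSTOMER_ID": "1", "NAME": "Ann"},
               {"CUSTOMER_ID": "1", "NAME": "Anna"}),
    ]
    print(apply_changes({}, captured, key="CUSTOMER_ID"))
```

Carrying both a before image and an after image for each change is what lets a downstream consumer handle updates and deletes without reading the source table again.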

In the next section, you'll set up the DataMirror environment needed to implement the above steps.

Section 3. Set up the DataMirror environment


Follow the steps described in this section to set up your DataMirror environment.

Task 1: Set up the database

To set up the database used in this tutorial:
1. Create the database BANKDATA by running the DB2 command:
   db2 create database BANKDATA
2. Enable log retention by running the DB2 commands:
   db2 update db config for BANKDATA using LOGRETAIN recovery
   db2 backup database BANKDATA to C:\backup
   BANKDATA is the name of the database that you want to enable for replication, and C:\backup is the directory where BANKDATA is backed up. Make sure that C:\backup exists on the system before running the command.
3. If your CLASSPATH does not include db2java.zip, add it to your CLASSPATH.
4. Create the source table BANK.BANKCUSTOMERS using data.db2, found in the datamirrorTutorial.zip file (see Downloads).
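If you expect to repeat Task 1 for more than one database, a small helper can assemble the commands and check the CLASSPATH for you. The Python sketch below is only an illustration: the function names are hypothetical, the only DB2 commands it emits are the three shown in the steps above, and it prints them rather than running them, on the assumption that you paste them into a DB2 command window yourself.

```python
import ntpath
import os
from typing import List, Mapping


def db2_setup_commands(db_name: str, backup_dir: str) -> List[str]:
    """Return the Task 1 DB2 commands for the given database and backup directory."""
    return [
        f"db2 create database {db_name}",
        f"db2 update db config for {db_name} using LOGRETAIN recovery",
        f"db2 backup database {db_name} to {backup_dir}",
    ]


def classpath_has_db2java(env: Mapping[str, str] = os.environ, sep: str = ";") -> bool:
    """Report whether any CLASSPATH entry points at a file named db2java.zip.

    The separator defaults to ';' because the tutorial targets Windows.
    ntpath is used so that Windows-style paths are parsed the same way on any OS.
    """
    entries = env.get("CLASSPATH", "").split(sep)
    return any(
        ntpath.basename(entry.strip().strip('"')).lower() == "db2java.zip"
        for entry in entries
    )


if __name__ == "__main__":
    for command in db2_setup_commands("BANKDATA", r"C:\backup"):
        print(command)
    if not classpath_has_db2java():
        print("db2java.zip was not found in CLASSPATH; add it before continuing.")
```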

Task 2: Add an instance of Transformation Server for UDB


To add an instance of Transformation Server for UDB, open the Transformation Server Configuration Tool by clicking Start -> All Programs -> DataMirror -> Transformation Server for UDB -> Configure Transformation Server.
1. Click Add in the Transformation Server Configuration Tool window to create a new instance, as Figure 2 shows.
   Figure 2. Add a new instance for UDB
2. Select Local System account for the Windows service. Enter the information for the database. Click Apply and then Close. Figure 3 shows these steps.
   Figure 3. Enter information for the database


3. Select new_instance and click Start.
   Figure 4. Select new_instance


4. The instance new_instance should now be in running status. Click Close.
   Figure 5. Make sure new_instance is running in UDB


Task 3: Add an instance of Transformation Server for WebSphere DataStage

To add an instance of Transformation Server for WebSphere DataStage, open the Transformation Server Configuration Tool by clicking Start -> All Programs -> DataMirror -> Transformation Server for WebSphere DataStage -> Configure Transformation Server.
1. Click Add in the Transformation Server Configuration Tool window to create a new instance.
   Figure 6. Add a new instance for WebSphere DataStage


2. Select Local System account for the Windows service. Set the password for tsuser in the Transformation Server Authentication box. Click Apply and then Close.
   Figure 7. Set password


3. Select new_instance and click Start.
   Figure 8. Start the new instance


4. The instance new_instance should now be in running status. Click Close.
   Figure 9. Make sure new_instance for DataStage is running

Task 4: Create and configure replication agents and users in Access Manager

1. Click Start -> All Programs -> DataMirror -> Transformation Server Access Control -> Access Manager.
2. In the Access Manager window, click File -> New -> Replication Agent.
   Figure 10. Create a new replication agent

3. To create a replication agent for the source, go to the General tab of the Replication Agent Properties window:
   - In the Name field, type srcagent (or any other unique name).
   - In the Description field, type a description for the replication agent (optional).
   - In the Hostname field, type the host name or IP address of the machine where the Transformation Server is running.
   - In the Port field, type the port number that the replication agent uses to communicate with Transformation Server. This port number is specified during the Transformation Server configuration.
   Figure 11. Fill in the values for the replication agent


4. On the Version tab of the Replication Agent Properties window, click Ping Now. If the ping is successful, it obtains the correct version information for Transformation Server. Click OK. If the ping is unsuccessful, check whether the information on the General tab is correct and whether there is a network issue.
   Figure 12. Click Ping Now


5. Similarly, create another replication agent for the target. On the General tab of the Replication Agent Properties window:
   - In the Name field, type tgtagent (or any other unique name).
   - In the Description field, type a description for the target agent (optional).
   - In the Hostname field, type the host name or IP address of the machine where the replication agent is running.
   - In the Port field, type the port number that the replication agent uses to communicate with the Transformation Server for DataStage. This port number is specified during the Transformation Server for DataStage configuration.
   Figure 13. Create a target replication agent


6. On the Version tab of the Replication Agent Properties window, click Ping Now. If the ping is successful, it obtains the correct version information for Transformation Server for DataStage. Click OK. If the ping is unsuccessful, check whether the information on the General tab is correct and whether there is a network issue.
   Figure 14. Click Ping Now


7. In the Access Manager window, click File -> New -> Replication User to create the Transformation Server System Administrator.
   Figure 15. Create the Transformation Server System Administrator


8. On the General tab of the User Properties window:
   - In the Username field, type a user name.
   - In the Description field, type a description (optional).
   - In the Password field, create and confirm a password.
   Figure 16. Create a password for the system administrator


9. Under the Replication Agents table in the User Properties window, click Add/Delete...
   Figure 17. Click the Add/Delete button


   - Select both srcagent and tgtagent and click Add to move them to the "Allowed to access" window.
     Figure 18. Permit access to the srcagent and tgtagent


   - Click OK. Your screen should look like Figure 19.
     Figure 19. Access to srcagent and tgtagent
   - Click OK. You should see both srcagent and tgtagent in the Replication Agents list.
     Figure 20. srcagent and tgtagent are now replication agents


   - Select srcagent and click Parameters. Fill in the information for srcagent that the Access Parameters window requests. Click OK.
     Figure 21. Parameter information for srcagent


   - Select tgtagent and click Parameters. Fill in the information for tgtagent that the Access Parameters window requests. Click OK.
     Figure 22. Parameter information for tgtagent


10. On the Options tab, clear the User must change password at next logon check box.
    Figure 23. Disable request to change password


    Click OK in the User Properties window. Close the Access Manager.

Task 5: Set up a subscription in the Transformation Server Management Console

1. Click Start -> All Programs -> DataMirror -> Transformation Server Management Console -> Management Console.
2. Type the user name and password to log in to the Management Console.
   Figure 24. Log in to the Management Console


3. In the Management Console, click File -> Datastore -> Add or Remove Tables...
   Figure 25. Add tables


4. Select the source table BANKCUSTOMERS in the Add or Remove Tables wizard. Click OK.
   Figure 26. Add BANKCUSTOMERS


5. Create a new subscription: From the menu, click Subscription -> New Subscription. In the New subscription window, provide a name for the subscription and select the replication agents for the datastores. Click OK.
   Figure 27. Create a new subscription


6. Configure the subscription: Click the Configuration tab and then the Subscription tab. Right-click the subscription that you created and click Map Tables.
   Figure 28. Map tables


   - In Map Tables, select the WebSphere DataStage mapping type. Click Next.
     Figure 29. Map to WebSphere DataStage


   - Select Direct Connect as the WebSphere DataStage Connection Method. Click Next.
     Figure 30. Select Direct Connect


   - Select the source table. Click Next.
     Figure 31. Select a source table


   - For Connection, enter a port number that is not used by any other application. Under Record Format, select Single Record. Click Next.
     Figure 32. Select Single Record


   - Verify that Review mapping settings shows "Before and After images will be in a single record." Click Finish.
     Figure 33. Ensure all images will be in a single record
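The mapping wizard in Task 5 asks for a port number that no other application is using. One way to check this on the machine in question is to try binding to the port, as in the Python sketch below. The helper name is_port_free is hypothetical and not part of any DataMirror tooling, and the result only reflects the moment of the check: another process could take the port afterwards.

```python
import socket
import sys


def is_port_free(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port).

    An empty host means "all local interfaces".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error.
            return False
        return True


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_port.py <port>
    candidate = int(sys.argv[1])
    state = "free" if is_port_free(candidate) else "in use or not bindable"
    print(f"Port {candidate} is {state}.")
```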


Task 6: Set Access Server parameters

To set the Access Server parameters:
1. Run C:\Program Files\DataMirror\Transformation Server for WebSphere DataStage\bin\dmsetaccessserverparams.exe and provide the values.
   Figure 34. Set the access server parameters
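Several steps in this section depend on the agents being reachable over the network; for example, Ping Now in Task 4 fails if the host name or port is wrong or if there is a network issue. The Python sketch below is one way to rule out the network side. It does not reproduce Ping Now, which also retrieves version information; it only tests whether something accepts TCP connections at the given host and port. The function name can_connect is hypothetical, and the host and port must be the values you entered on the General tab.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_agent.py <host> <port>
    agent_host, agent_port = sys.argv[1], int(sys.argv[2])
    if can_connect(agent_host, agent_port):
        print(f"{agent_host}:{agent_port} accepts connections.")
    else:
        print(f"Could not connect to {agent_host}:{agent_port}.")
```

If the connection succeeds but Ping Now still fails, the problem is more likely in the agent configuration than in the network.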


Section 4. Generate DataStage job definition file


1. From the Management Console, right-click the subscription and click Generate WebSphere DataStage Job Definition...
   Figure 35. Generate the DataStage job definition


2. Save the .dsx file. The generated job definition file DATASTAGE_SUB.dsx is included with this tutorial for your reference (see Downloads).

Section 5. Prepare and run DataStage job


Import the DataStage job definition file

To import the DataStage job definition file (.dsx) that you created in the previous section, from WebSphere DataStage Designer:
- Click Import -> DataStage Components.
- Browse to and select the DataStage job definition file that you created in the previous section.

After the import is complete, a job is created under the Jobs -> DataMirror folder in the DataStage Designer. In this example, the job DATASTAGE_SUB_BANKCUSTOMERS is created.


You can open the job by double-clicking Jobs -> DataMirror -> DATASTAGE_SUB_BANKCUSTOMERS.

Figure 36. Import the DataStage job definition file
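Before importing, it can be handy to confirm which jobs a .dsx file defines. The Python sketch below rests on an assumption about the file layout that you should verify against your own export: that each job starts with a line reading BEGIN DSJOB, followed by a line of the form Identifier "JOB_NAME". The function name list_dsx_jobs is hypothetical.

```python
import re
import sys
from typing import List

# Assumed layout (verify against your own .dsx file):
#   BEGIN DSJOB
#      Identifier "JOB_NAME"
_JOB_START = re.compile(r'^\s*BEGIN DSJOB\s*$')
_IDENTIFIER = re.compile(r'^\s*Identifier\s+"([^"]+)"')


def list_dsx_jobs(text: str) -> List[str]:
    """Return the first Identifier found after each BEGIN DSJOB line."""
    jobs: List[str] = []
    expecting_name = False
    for line in text.splitlines():
        if _JOB_START.match(line):
            expecting_name = True
        elif expecting_name:
            match = _IDENTIFIER.match(line)
            if match:
                jobs.append(match.group(1))
                expecting_name = False
    return jobs


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python list_jobs.py <file.dsx>
    # errors="replace" avoids failing on an unexpected character set.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name in list_dsx_jobs(handle.read()):
            print(name)
```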

Verify and set properties for the DataMirrorCDC stage

To verify and set properties for the DataMirrorCDC stage:
- Right-click the DatamirrorCDC_Input stage on the canvas.
- Click Properties.
- On the Stage tab, change the value of dsautostartTS to True.
- Click OK.

Figure 37. Set properties for the DataMirror CDC stage


Set properties for the Sequential_File_Output stage

To set the Target File property to the output file where incoming data will be written:
- Right-click the Sequential_File_Output stage and click Properties.
- On the Input tab, provide a file name for the target file.
- Click OK.

Figure 38. Set properties for the Sequential_File_Output stage

Run the DataStage job DATASTAGE_SUB_BANKCUSTOMERS

1. Insert, update, or delete any data in the source table.
2. Compile the job by clicking File -> Compile.
3. Once compilation is successful, run the job by clicking File -> Run.
4. Once the job has run successfully, the target file that you specified in the Sequential File stage contains the data changes resulting from the insert, update, or delete operations performed on the source table.

Run an example job

Once the above job runs successfully, the DataMirrorCDC operator can be used with other stages in DataStage. To run the example job:
1. Import the Example_Job.dsx file in DataStage Designer. The Example_Job.dsx file is contained in the datamirrorTutorial.zip file (see the Downloads section).
2. Set the job properties $APT_DBNAME and $APT_DB2_INSTANCEHOME by clicking Edit -> Job Properties -> Parameters.
3. Create a target database BANKCUSTOMERS_TARGET with the table definition of the BANKCUSTOMERS source table. This example job uses DEMO.BANKCUSTOMERS as the target table. If you use a different target database, change the property values of the target database stage accordingly.
   Figure 39. Run an example job


Section 6. Conclusion
Now that you have gone through this tutorial, you can configure DataMirror to capture changed data. You have learned how to transfer the captured data changes to WebSphere DataStage for cleansing and transformation, ready for saving in a target database. Try this technology with your own installations of DataMirror and DataStage, and keep your databases in sync.


Downloads

Description                    Name                          Size     Download method
Tutorial sample data           datamirrorTutorial.zip        17KB     HTTP
About DataMirrorCDC operator   DataMirrorCDCReadme-a4.pdf    164KB    HTTP


Resources
Learn
- In the Information Integration area on developerWorks, get the resources you need to advance your skills on IBM Information Platform and Solution products.
- Browse the technology bookstore for books on these and other technical topics.

Get products and technologies
- Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2, Lotus, Rational, Tivoli, and WebSphere.

Discuss
- Participate in the discussion forum for this content.
- Check out developerWorks blogs and get involved in the developerWorks community.

About the author


Indrani Ghatare is a software engineer at the IBM Silicon Valley Lab, USA. Indrani has worked as a software developer in the IBM Software Group since 2001. Currently, she works in the Information Server Integration team and provides integration solutions for IBM products within the IPS portfolio.
