
Informatica PowerCenter 8
Level II Developer
Lab Guide
Version - PC8LIID 20060910
Informatica PowerCenter Level II Developer Lab Guide
Version 8.1
September 2006
Copyright (c) 1998-2006 Informatica Corporation.
All rights reserved. Printed in the USA.
This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions
on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or
transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as
provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR
52.227-14 (ALT III), as applicable.

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing.
Informatica Corporation does not warrant that this documentation is error free. Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX,
and SuperGlue are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other
company and product names may be trade names or trademarks of their respective owners.
Portions of this software are copyrighted by DataDirect Technologies, 1999-2002.
Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and
University of California, Irvine, Copyright (c) 1993-2002, all rights reserved.
Portions of this software contain copyrighted material from The JBoss Group, LLC. Your right to use such materials is set forth in the GNU Lesser General Public
License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by Informatica,
as-is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular
purpose.
Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration is a registered trademark of Meta Integration
Technology, Inc.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2005 The
Apache Software Foundation. All rights reserved.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit and redistribution of this software is subject to terms available at
http://www.openssl.org. Copyright 1998-2003 The OpenSSL Project. All Rights Reserved.
The zlib library included with this software is Copyright (c) 1995-2003 Jean-loup Gailly and Mark Adler.
The Curl license provided with this Software is Copyright 1996-200, Daniel Stenberg, <Daniel@haxx.se>. All Rights Reserved.
The PCRE library included with this software is Copyright (c) 1997-2001 University of Cambridge. Regular expression support is provided by the PCRE library
package, which is open source software, written by Philip Hazel. The source for this library may be found at ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre.
InstallAnywhere is Copyright 2005 Zero G Software, Inc. All Rights Reserved.
Portions of the Software are Copyright (c) 1998-2005 The OpenLDAP Foundation. All rights reserved. Redistribution and use in source and binary forms, with or
without modification, are permitted only as authorized by the OpenLDAP Public License, available at http://www.openldap.org/software/release/license.html.
This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775 and other U.S. Patents Pending.
DISCLAIMER: Informatica Corporation provides this documentation as is without warranty of any kind, either express or implied,
including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this
documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in this
documentation at any time without notice.
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
About This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Other Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Obtaining Informatica Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Visiting Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Visiting the Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Visiting the Informatica Developer Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Visiting the Informatica Knowledge Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Obtaining Informatica Professional Certification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Providing Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Obtaining Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Lab 1: Dynamic Lookup Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Step 1: Create Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Step 2: Preview Target Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Step 3: View Source Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Step 4: Create Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Step 5: Run Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Step 6: Verify Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Step 7: Verify Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Lab 2: Workflow Alerts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 1: Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 2: Mappings Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 3: Reusable Sessions Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 4: Create a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 5: Create a Worklet in the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 6: Create a Timer Task in the Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Step 7: Create an E-Mail Task in the Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Step 8: Create a Control Task in the Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Step 9: Add Reusable Session to the Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Step 10: Link Tasks in Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Step 11: Add Reusable Session to the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Step 12: Link Tasks in Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Step 13: Run Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Lab 3: Dynamic Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 1: Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 2: Mapping Required. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 3: Copy Reusable Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 4: Create Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 5: Create Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 6: Add Session to Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 7: Create a Timer Task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Step 8: Create an Assignment Task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Step 9: Link Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Step 10: Run Workflow by Editing the Workflow Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Step 11: Monitor the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Lab 4: Recover a Suspended Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Step 1: Copy the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Step 2: Edit the Workflow and Session for Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Step 3: Edit the Session to Cause an Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Step 4: Run the Workflow, Fix the Session, and Recover the Workflow. . . . . . . . . . . . . . . . . . 23
Lab 5: Using the Transaction Control Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Step 1: Create Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Step 2: Create Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Step 3: Run Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Step 4: Verify Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Step 5: Verify Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Lab 6: Error Handling with Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Step 1: Create Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Step 2: Create Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Step 3: Run Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Step 4: Verify Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Step 5: Verify Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Lab 7: Handling Fatal and Non-Fatal Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Step 1: Create Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Step 2: Create Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Step 3: Run Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Step 4: Verify Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Step 5: Verify Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Lab 8: Repository Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Step 1: Create a Query to Search for Targets with Customer . . . . . . . . . . . . . . . . . . . . . . . 50
Step 2: Validate, Save, and Run the Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Step 3: Create A Query to Search For Mapping Dependencies . . . . . . . . . . . . . . . . . . . . . . . . 52
Step 4: Validate, Save, and Run the Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Step 5: Modify and Run the Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Step 6: Run the Query Accessed by the Repository Manager . . . . . . . . . . . . . . . . . . . . . . . . . 54
Step 7: Create Your Own Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Lab 9: Performance and Tuning Workshop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Workshop Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Workshop Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Establish ETL Baseline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Documented Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Lab 10: Partitioning Workshop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Workshop Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Scenario 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Scenario 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Scenario 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Scenario 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Preface
Welcome to PowerCenter, Informatica's software product that delivers an open, scalable data integration
solution addressing the complete life cycle for all data integration projects including data warehouses and
data marts, data migration, data synchronization, and information hubs. PowerCenter combines the latest
technology enhancements for reliably managing data repositories and delivering information resources in
a timely, usable, and efficient manner.
The PowerCenter metadata repository coordinates and drives a variety of core functions, including
extracting, transforming, loading, and managing data. The Integration Service can extract large volumes
of data from multiple platforms, handle complex transformations on the data, and support high-speed
loads. PowerCenter can simplify and accelerate the process of moving data warehouses from development
to test to production.
About This Guide
Welcome to the PowerCenter 8 Level II Developer course.
This course is designed for data integration and data warehousing implementers. You should be familiar
with PowerCenter, data integration and data warehousing terminology, and Microsoft Windows.
Document Conventions
This guide uses the following formatting conventions:
If you see: >
It means: Indicates a submenu to navigate to.
Example: Click Repository > Connect. In this example, you should click the Repository menu or button and choose Connect.

If you see: boldfaced text
It means: Indicates text you need to type or enter.
Example: Click the Rename button and name the new source definition S_EMPLOYEE.

If you see: UPPERCASE
It means: Database tables and column names are shown in all UPPERCASE.
Example: T_ITEM_SUMMARY

If you see: italicized text
It means: Indicates a variable you must replace with specific information.
Example: Connect to the Repository using the assigned login_id.

If you see: Note:
It means: The following paragraph provides additional facts.
Example: Note: You can select multiple objects to import by using the Ctrl key.

If you see: Tip:
It means: The following paragraph provides suggested uses or a Velocity best practice.
Example: Tip: The m_ prefix for a mapping name is a Velocity best practice.
Other Informatica Resources
In addition to the student guides, Informatica provides these other resources:
Informatica Documentation
Informatica Customer Portal
Informatica web site
Informatica Developer Network
Informatica Knowledge Base
Informatica Professional Certification
Informatica Technical Support
Obtaining Informatica Documentation
You can access Informatica documentation from the product CD or online help.
Visiting Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at
http://my.informatica.com. The site contains product information, user group information, newsletters, access
to the Informatica customer support case management system (ATLAS), the Informatica Knowledge Base,
and access to the Informatica user community.
Visiting the Informatica Web Site
You can access Informatica's corporate web site at http://www.informatica.com. The site contains
information about Informatica, its background, upcoming events, and locating your closest sales office.
You will also find product information, as well as literature and partner information. The services area of
the site includes important information on technical support, training and education, and
implementation services.
Visiting the Informatica Developer Network
The Informatica Developer Network is a web-based forum for third-party software developers. You can
access the Informatica Developer Network at the following URL:
http://devnet.informatica.com
The site contains information on how to create, market, and support customer-oriented add-on solutions
based on interoperability interfaces for Informatica products.
Visiting the Informatica Knowledge Base
As an Informatica customer, you can access the Informatica Knowledge Base at
http://my.informatica.com. The Knowledge Base lets you search for documented solutions to known technical
issues about Informatica products. It also includes frequently asked questions, technical white papers, and
technical tips.
Obtaining Informatica Professional Certification
You can take, and pass, exams provided by Informatica to obtain Informatica Professional Certification.
For more information, go to:
http://www.informatica.com/services/education_services/certification/default.htm
Providing Feedback
Email any comments on this guide to aconlan@informatica.com.
Obtaining Technical Support
There are many ways to access Informatica Technical Support. You can call or email your nearest
Technical Support Center listed in the following table, or you can use our WebSupport Service.
Use the following email addresses to contact Informatica Technical Support:
support@informatica.com for technical inquiries
support_admin@informatica.com for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at
http://my.informatica.com.
North America / South America
Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, California 94063
United States
Toll Free: 877 463 2435
Standard Rate: United States: 650 385 5800

Europe / Middle East / Africa
Informatica Software Ltd.
6 Waltham Park
Waltham Road, White Waltham
Maidenhead, Berkshire SL6 3TN
United Kingdom
Toll Free: 00 800 4632 4357
Standard Rate:
Belgium: +32 15 281 702
France: +33 1 41 38 92 26
Germany: +49 1805 702 702
Netherlands: +31 306 022 797
United Kingdom: +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd.
301 & 302 Prestige Poseidon
139 Residency Road
Bangalore 560 025
India
Toll Free:
Australia: 00 11 800 4632 4357
Singapore: 001 800 4632 4357
Standard Rate: India: +91 80 5112 5738
Lab 1: Dynamic Lookup Cache
Technical Description
You have a customer table in your target database that contains existing customer information. You also
have a flat file that contains new customer data. Some rows in the flat file contain new information on
new customers, and some contain updated information on existing customers. You need to insert the new
customers into your target table and update the existing customers.
The source file may contain multiple rows for a customer. It may also contain rows that contain updated
information for some columns and NULLs for the columns that do not need to be updated.
To do this, you will use a Lookup transformation with a dynamic cache that looks up data in the target
table. The Integration Service inserts new rows and updates existing rows in the lookup cache as it inserts
and updates rows in the target table. If you configure the Lookup transformation properly, the Integration
Service ignores NULLs in the source when it updates a row in the cache and target.
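As a mental model for the lab, the following is a minimal Python sketch (not PowerCenter code) of how a dynamic lookup cache classifies each source row. The NewLookupRow values (0 = no change, 1 = insert, 2 = update) match PowerCenter's behavior; the cache structure and helper names are illustrative only.

# Sketch of dynamic lookup cache behavior; the Integration Service
# implements this internally.
cache = {}  # CUST_ID -> row dict, initially built from the target table

def lookup(row):
    """Return the NewLookupRow value: 1 = inserted, 2 = updated, 0 = no change."""
    key = row["CUST_ID"]
    if key not in cache:
        cache[key] = dict(row)      # new customer: insert into the cache
        return 1
    cached = cache[key]
    changed = False
    for col, val in row.items():
        if val is None:             # "Ignore NULL inputs for updates"
            continue
        if cached.get(col) != val:
            cached[col] = val       # update the cached row in place
            changed = True
    return 2 if changed else 0

The Router groups in this mapping test exactly these values: rows with NewLookupRow = 1 go to the insert target, rows with NewLookupRow = 2 go to the update target, and unchanged rows are dropped.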
Objectives
Use a dynamic lookup cache to update and insert rows in a customer table
Use a Router transformation to route rows based on the NewLookupRow value
Use an Update Strategy transformation to flag rows for update or insert
Duration
45 minutes
Mapping Overview
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DYN_update_customer_list_xx
Source System: Flat file
Target System: EDWxx
Initial Rows: 8
Rows/Load: 8
Short Description: Update the existing customer list with new and updated information.
Load Frequency: On demand
Preprocessing: None
Post Processing: None
Error Strategy: None
Reload Strategy: None
Unique Source Fields: CUST_ID

Sources

Files
File Name: updated_customer_list.txt (create shortcut from DEV_SHARED folder)
File Location: In the Source Files directory on the Integration Service process machine.

Targets

Tables (Schema/Owner: EDWxx)
Table Name: CUSTOMER_LIST (create shortcut from DEV_SHARED folder)   Insert: yes
Unique Keys: PK_KEY, CUST_ID

Source To Target Field Matrix

Target Column   Source File or Transformation   Source Column   Ignore NULL Inputs for Updates (Lookup Transformation)
PK_KEY          LKP_CUSTOMER_LIST               Sequence-ID
CUST_ID         updated_customer_list.txt       CUST_ID
FIRSTNAME       updated_customer_list.txt       FIRSTNAME       Yes
LASTNAME        updated_customer_list.txt       LASTNAME        Yes
ADDRESS         updated_customer_list.txt       ADDRESS         Yes
CITY            updated_customer_list.txt       CITY            Yes
STATE           updated_customer_list.txt       STATE           Yes
ZIP             updated_customer_list.txt       ZIP             Yes
Detailed Overview
m_DYN_update_customer_list_xx (Mapping)

Shortcut_to_updated_customer_list (Source Definition)
Flat file in the $PMSourceFileDir directory. Create shortcut from DEV_SHARED folder.

SQ_Shortcut_to_updated_customer_list (Source Qualifier)
Connect to the input/output ports of the Lookup transformation, LKP_CUSTOMER_LIST.

LKP_CUSTOMER_LIST (Lookup)
Lookup transformation based on the target definition Shortcut_to_CUSTOMER_LIST and the target table CUSTOMER_LIST.
- Change the input/output port names: prepend them with IN_.
- Use dynamic caching.
- Define the lookup condition using the customer ID ports.
- Configure the Lookup properties so it inserts new rows and updates existing rows (Insert Else Update).
- Ignore NULL inputs for all lookup/output ports except CUST_ID and PK_KEY.
- Associate the input/output port with a similar name for each lookup/output port.
- PK_KEY must be an integer in order to specify Sequence-ID as the Associated Port.
- Connect the NewLookupRow port and all lookup/output ports to RTR_Insert_Update.

RTR_Insert_Update (Router)
Create two output groups with the following names:
- UPDATE_EXISTING: Condition is NewLookupRow=2. Connect output ports to UPD_Update_Existing.
- INSERT_NEW: Condition is NewLookupRow=1. Connect output ports to UPD_Insert_New.
Do not connect any of the NewLookupRow ports to any transformation.
Do not connect the Default output group ports to any transformation.

UPD_Insert_New (Update Strategy)
Update Strategy Expression: DD_INSERT.
Connect all input/output ports to CUSTOMER_LIST_Insert.

UPD_Update_Existing (Update Strategy)
Update Strategy Expression: DD_UPDATE.
Connect all input/output ports to CUSTOMER_LIST_Update.

CUSTOMER_LIST_Insert (Target Definition)
First instance of the target table definition in the EDWxx schema. Create shortcut from DEV_SHARED folder of the CUSTOMER_LIST target definition. In the mapping, rename the target instance to CUSTOMER_LIST_Insert.

CUSTOMER_LIST_Update (Target Definition)
Second instance of the target table definition in the EDWxx schema. Create shortcut from DEV_SHARED folder of the CUSTOMER_LIST target definition. In the mapping, rename the target instance to CUSTOMER_LIST_Update.
Instructions
Step 1: Create Mapping
1. Connect to the PC8A_DEV repository using Developerxx as the user name and developerxx as the
password.
2. Create a mapping called m_DYN_update_customer_list_xx, where xx is your student number. Use the mapping details described in the Detailed Overview above for guidelines.
Figure 1-1 shows an overview of the mapping you must create:
Figure 1-1. m_DYN_update_customer_list_xx Mapping
Step 2: Preview Target Data
1. In the m_DYN_update_customer_list_xx mapping, preview the target data to view the rows that
exist in the table.
2. Use the ODBC_EDW ODBC connection to connect to the target database. Use EDWxx as the user
name and password.
The CUSTOMER_LIST table should contain the following data:
Figure 1-2. Preview Target Data for CUSTOMER_LIST Table Before Session Run
PK_KEY CUST_ID FIRSTNAME LASTNAME ADDRESS CITY STATE ZIP
111001 55001 Melvin Bradley 4070 Morning Trl New York NY 30349
111002 55002 Anish Desai 2870 Elliott Cir Ne New York NY 30305
111003 55003 J Anderson 1538 Chantilly Dr Ne New York NY 30324
111004 55004 Chris Ernest 2406 Glnrdge Strtford Dr New York NY 30342
111005 55005 Rudolph Gibiser 6917 Roswell Rd Ne New York NY 30328
111006 55006 Bianco Lo 146 W 16th St New York NY 10011
111007 55007 Justina Bradley 221 Colonial Homes Dr NW New York NY 30309
111008 55008 Monique Freeman 260 King St San Francisco CA 94107
111009 55009 Jeffrey Morton 544 9th Ave San Francisco CA 94118
Step 3: View Source Data
1. Navigate to the $PMSourceFileDir directory. By default, the path is:
C:\Informatica\PowerCenter8.1.0\server\infa_shared\SrcFiles
2. Open updated_customer_list.txt in a text editor.
The updated_customer_list.txt source file contains the following data:
CUST_ID,FIRSTNAME,LASTNAME,ADDRESS,CITY,STATE,ZIP
67001,Thao,Nguyen,1200 Broadway Ave,Burlingame,CA,94010
67002,Maria,Gomez,390 Stelling Ave,Cupertino,CA,95014
67003,Jean,Carlson,555 California St,Menlo Park,CA,94025
67004,Chris,Park,13450 Saratoga Ave,Santa Clara,CA,95051
55002,Anish,Desai,400 W Pleasant View Ave,Hackensack,NJ,07601
55006,Bianco,Lo,900 Seville Dr,Clarkston,GA,30021
55003,Janice,MacIntosh,,,,
67003,Jean,Carlson,120 Villa St,Mountain View,CA,94043
3. Notice that the row for customer ID 55003 contains some NULL values. You do not want to insert the NULL values into the target; you only want to update the other column values in the target.
4. Notice that the file contains two rows with customer ID 67003. Because of this, you must use a dynamic cache for the Lookup transformation (see the trace following this list).
5. Close the file.
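The two rows for customer ID 67003 are why the cache must be dynamic. Traced through the lookup() sketch from the Technical Description (simplified to two fields here; the real rows carry all seven):

# Hypothetical trace of the duplicate customer through the sketch above.
lookup({"CUST_ID": 67003, "ADDRESS": "555 California St"})  # returns 1: inserted
lookup({"CUST_ID": 67003, "ADDRESS": "120 Villa St"})       # returns 2: cache updated

With a static cache, both rows would miss the cache and be flagged for insert, producing a duplicate key in the target. With a dynamic cache, the second row sees the first row already in the cache and becomes an update, so the target keeps 120 Villa St.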
Step 4: Create Workflow
1. Open the Workflow Manager and open your ~Developerxx folder.
2. Create a workflow named wf_DYN_update_customer_list_xx.
3. Create a session named s_m_DYN_update_customer_list_xx using the
m_DYN_update_customer_list_xx mapping.
4. In the session, verify that the target connection is EDWxx.
5. Verify that the Target load type is set to Normal and the Truncate target table option is not checked.
6. Verify the specified source file name is updated_customer_list.txt and the specified location is
$PMSourceFileDir.
Step 5: Run Workflow
Run workflow wf_DYN_update_customer_list_xx.
Step 6: Verify Statistics
Step 7: Verify Results
1. Preview the target data from the mapping to verify the results.
Figure 1-3 shows the Preview Data dialog box for the CUSTOMER_LIST table:
Figure 1-3. Preview Target Data for CUSTOMER_LIST Table After Session Run
The CUSTOMER_LIST table should contain the following data:
PK_KEY CUST_ID FIRSTNAME LASTNAME ADDRESS CITY STATE ZIP
111001 55001 Melvin Bradley 4070 Morning Trl New York NY 30349
111002 55002 Anish Desai 400 W Pleasant View Ave Hackensack NJ 07601
111003 55003 Janice MacIntosh 1538 Chantilly Dr Ne New York NY 30324
111004 55004 Chris Ernest 2406 Glnrdge Strtford Dr New York NY 30342
111005 55005 Rudolph Gibiser 6917 Roswell Rd Ne New York NY 30328
111006 55006 Bianco Lo 900 Seville Dr Clarkston GA 30021
111007 55007 Justina Bradley 221 Colonial Homes Dr NW New York NY 30309
111008 55008 Monique Freeman 260 King St San Francisco CA 94107
111009 55009 Jeffrey Morton 544 9th Ave San Francisco CA 94118
111010 67001 Thao Nguyen 1200 Broadway Ave Burlingame CA 94010
111011 67002 Maria Gomez 390 Stelling Ave Cupertino CA 95014
111012 67003 Jean Carlson 120 Villa St Mountain View CA 94043
111013 67004 Chris Park 13450 Saratoga Ave Santa Clara CA 95051
2. Look at customer ID 55003. It should not contain any NULLs.
3. Look at customer ID 67003. It should contain data from the last row for customer ID 67003 in the source file.
Lab 2: Workflow Alerts
Business Purpose
A session usually runs for under an hour. Occasionally, it will run longer. The administrator would like to
be notified via an alert if the session runs longer than an hour. A second session is to run after the first
session completes.
Technical Description
A Worklet will be created containing a Timer task that waits one hour from its start time before sending an
email. If the session finishes in less than an hour, a Control task will be issued to stop the timer.
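As a mental model only, here is a minimal Python sketch (not PowerCenter code) of the race this worklet sets up between the timer branch and the session branch; the comments map each piece to the task it stands in for.

# Two parallel branches: a timer that alerts after one hour, and a session
# whose completion stops the worklet, cancelling the pending timer.
import threading

HOUR = 3600  # seconds

def send_alert():
    # eml_SESSION_RUN_TIME: sent only if the timer fires first
    print("session s_m_DIM_CUSTOMER_ACCT_xx ran an hour or longer")

def run_worklet(run_session):
    alert_timer = threading.Timer(HOUR, send_alert)  # tim_SESSION_RUN_TIME
    alert_timer.start()
    run_session()                                    # s_m_DIM_CUSTOMER_ACCT_xx
    alert_timer.cancel()                             # ctrl_STOP_SESS_TIMEOUT (Stop parent)

If the session returns within the hour, cancel() prevents the alert, which is what the Control task's Stop parent option does to the still-running timer.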
Objectives
Create a Workflow
Create a Worklet
Create a Timer Task
Create an Email Task
Create a Control Task
Create a condition to control the Email Task
Duration
30 minutes
Worklet Overview
Workflow Overview
Instructions
Step 1: Setup
Connect to the PC8A_DEV repository in the Designer and Workflow Manager.
Step 2: Mappings Required
If any of the following mappings do not exist in the ~Developerxx folder, copy them from the
SOLUTIONS_ADVANCED folder. Rename the mappings so that the _xx reflects your developer number.
m_DIM_CUSTOMER_ACCT_xx
m_DIM_CUSTOMER_ACCT_STATUS_xx
Step 3: Reusable Sessions Required
If any of the following sessions do not exist in the ~Developerxx folder, copy them from the
SOLUTIONS_ADVANCED folder. Resolve any conflicts that may occur. Rename the sessions so that the
_xx reflects your developer number.
s_m_DIM_CUSTOMER_ACCT_xx
s_m_DIM_CUSTOMER_ACCT_STATUS_xx
Step 4: Create a Workflow
Create a Workflow called wf_DIM_CUSTOMER_ACCT_LOAD_xx.
Step 5: Create a Worklet in the Workflow
1. Create a Worklet called wl_DIM_CUSTOMER_ACCT_LOAD_xx.
2. Open the Worklet and create the following tasks.
Step 6: Create a Timer Task in the Worklet
1. Create a Timer task and name it tim_SESSION_RUN_TIME.
2. Edit the Timer task and click the Timer tab.
3. Select the Relative time: radio button.
4. Set Start after to 1 Hour from the start time of this task.
Step 7: Create an E-Mail Task in the Worklet
1. Create an Email task and name it eml_SESSION_RUN_TIME.
2. Click the Properties tab.
3. For the Email User Name, type administrator@anycompany.com.
4. For the Email Subject, type: session s_m_DIM_CUSTOMER_ACCT_xx ran an hour or longer.
5. For the Email Text, type an appropriate message.
Step 8: Create a Control Task in the Worklet
1. Create a Control task and name it ctrl_STOP_SESS_TIMEOUT.
2. Edit the Control task and click the Properties tab.
3. Set the Control Option attribute to Stop parent.
Step 9: Add Reusable Session to the Worklet
1. Add s_m_DIM_CUSTOMER_ACCT_xx to wl_DIM_CUSTOMER_ACCT_LOAD_xx.
2. Verify source connections are ODS and source file name is customer_type.txt.
3. Verify target connections are EDWxx.
4. Verify lookup connections are valid - DIM tables to EDWxx, ODS tables to ODS.
5. Truncate target table.
6. Ensure Target Load Type is Normal.
Step 10: Link Tasks in Worklet
1. Link Start to tim_SESSION_RUN_TIME and s_m_DIM_CUSTOMER_ACCT_xx.
2. Link tim_SESSION_RUN_TIME to eml_SESSION_RUN_TIME.
3. Link s_m_DIM_CUSTOMER_ACCT_xx to ctrl_STOP_SESS_TIMEOUT.
Step 11: Add Reusable Session to the Workflow
1. Add s_m_DIM_CUSTOMER_ACCT_STATUS_xx to wf_DIM_CUSTOMER_ACCT_LOAD_xx.
2. Verify source connections are ODS and source file name is customer_type.txt.
3. Verify target connections are EDWxx.
4. Verify lookup connections are valid - DIM tables to EDWxx, ODS tables to ODS.
5. Truncate target table.
6. Ensure Target Load Type is Normal.
Step 12: Link Tasks in Workflow
1. Link Start to wl_DIM_CUSTOMER_ACCT_LOAD_xx.
2. Link wl_DIM_CUSTOMER_ACCT_LOAD_xx to s_m_DIM_CUSTOMER_ACCT_STATUS_xx.
Step 13: Run Workflow
1. In the Workflow Monitor, click the Filter Tasks button in the toolbar, or select Filters > Tasks from
the menu.
2. Make sure to show all of the tasks.
3. Run your workflow and check the Task View to verify that the worklet tasks ran as expected.
Lab 3: Dynamic Scheduling
Business Purpose
The Department Dimension table must load sales information on an hourly basis during the business day.
It does not load during non-business hours (before 6 a.m. or on or after 6 p.m.). The start time of the
loading session is calculated from the workflow start time.
Technical Description
Use workflow variables to calculate when the session starts. The starting time of the session has to be at
the top of the hour on or after 6 a.m. and not on or after 6 p.m. To accomplish this, the workflow will run
continuously.
Objectives
Create and use workflow variables
Create an Assignment Task
Create a Timer Task
Duration
30 minutes
Workflow Overview
Instructions
Step 1: Setup
Connect to PC8A_DEV Repository in the Designer and Workflow Manager.
Step 2: Mapping Required
The following mapping will be used in this lab. If it does not exist in the ~Developerxx folder, copy it
from the SOLUTIONS_ADVANCED folder. Change the xx in the mapping name to reflect your
developer number.
m_SALES_DEPARTMENT_xx
Step 3: Copy Reusable Sessions
Copy the following reusable session from the SOLUTIONS_ADVANCED folder to the ~Developerxx
folder. Change the xx in the session name to reflect the Developer Number.
s_m_SALES_DEPARTMENT_xx
Step 4: Create Workflow
Create a Workflow called wf_SALES_DEPARTMENT_xx.
Step 5: Create Workflow Variables
1. Add three workflow variables: $$TRUNC_START_TIME (date/time), $$HOUR_STARTED (integer), and $$NEXT_START_TIME (date/time).
2. Click OK.
3. Save.
Step 6: Add Session to Workflow
1. Add reusable session s_m_SALES_DEPARTMENT_xx to the Workflow.
2. Source Database Connection should be ODS.
3. Target Database Connection should be EDWxx.
4. Ensure Target Load Type is Normal.
5. Truncate the Target Table.
Step 7: Create a Timer Task
1. Create a Timer Task called tim_SALES_DEPARTMENT_START.
2. Edit the Timer task and click the Timer tab.
3. Select the Absolute time: radio button.
4. Select the Use this workflow date-time variable to calculate the wait radio button.
5. Select the ellipsis to browse variables.
6. Double click on wf_SALES_DEPARTMENT_xx.
7. Select $$NEXT_START_TIME as the workflow variable.
8. Save.
Step 8: Create an Assignment Task
1. Create an Assignment Task called asgn_SALES_DEPARTMENT_START_TIME.
2. Add the following expressions:
Calculates the absolute workflow start time to the hour
$$TRUNC_START_TIME = TRUNC(WORKFLOWSTARTTIME, 'HH')
Extracts/assigns the hour from the above calculation
$$HOUR_STARTED = GET_DATE_PART($$TRUNC_START_TIME, 'HH')
Calculates/assigns the start time of the session
$$NEXT_START_TIME = IIF($$HOUR_STARTED >= 5 AND $$HOUR_STARTED < 17,
ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 1),
DECODE($$HOUR_STARTED,
0, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 6),
1, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 5),
2, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 4),
3, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 3),
4, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 2),
17, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 13),
18, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 12),
19, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 11),
20, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 10),
21, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 9),
22, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 8),
23, ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 7)))
Note: The above functions could be nested together in one assignment expression if desired.
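For reference only, the nested IIF/DECODE collapses to a simple rule: if the workflow started between 5:00 and 16:59, the session starts at the next top of the hour; otherwise it waits for 6 a.m. (today or tomorrow). A Python restatement of the same arithmetic (an illustration, not something you can paste into PowerCenter):

from datetime import datetime, timedelta

def next_start_time(workflow_start: datetime) -> datetime:
    trunc = workflow_start.replace(minute=0, second=0, microsecond=0)  # $$TRUNC_START_TIME
    hour = trunc.hour                                                  # $$HOUR_STARTED
    if 5 <= hour < 17:
        return trunc + timedelta(hours=1)                  # next top of the hour
    if hour < 5:
        return trunc.replace(hour=6)                       # 6 a.m. today
    return trunc.replace(hour=6) + timedelta(days=1)       # 5 p.m. or later: 6 a.m. tomorrow

For example, a workflow started at 16:05 waits until 17:00, and one started at 19:40 waits until 06:00 the next morning, matching the DECODE branches above.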
Step 9: Link Tasks
1. Create a link from the Start Task to asgn_SALES_DEPARTMENT_START_TIME.
2. Create a link from asgn_SALES_DEPARTMENT_START_TIME to
tim_SALES_DEPARTMENT_START.
3. Create a link from tim_SALES_DEPARTMENT_START to s_m_SALES_DEPARTMENT_xx.
4. Save the repository.
Step 10: Run Workflow by Editing the Workflow Schedule
Note: In order for the top of the hour to be calculated based on the workflow start time, the workflow
must be configured to execute continuously.
1. Edit workflow wf_SALES_DEPARTMENT_xx.
2. Click the Scheduler tab.
3. Verify that the scheduler is Non Reusable.
4. Edit the schedule.
5. Click the Schedule Tab.
6. Click Run Continuously.
7. Click OK.
8. Click OK.
9. Save the repository. This will start the workflow.
Step 11: Monitor the Workflow
1. Open the Gantt Chart View.
Note: Notice that the Assignment task has already executed and the Timer task is running.
2. Browse the Workflow Log.
3. Verify the results of the Assignment expressions in the log file. Listed below are examples:
Variable [$$TRUNC_START_TIME], Value [05/23/2004 16:00:00].
Variable [$$HOUR_STARTED], Value [16].
Variable [$$NEXT_START_TIME], Value [05/23/2004 17:00:00].
4. Verify the Load Manager message that tells when the timer task will complete. Listed below is an
example message:
INFO : LM_36606 [Sun May 23 16:05:02 2004] : (2288|2004) Timer task instance
[TM_SALES_DEPARTMENT_START]: The timer will complete at [Sun May 23 17:00:00 2004].
5. Open Task View.
6. At or near the top of the hour, open the monitor to check the status of the session. Verify that it started at the desired time.
7. After the session completes, notice that the workflow automatically starts again.
8. If the workflow starts after 5 p.m., the timer message in the workflow log will show that the timer will
end at 6 a.m. the following morning. Listed below is an example:
INFO : LM_36608 [Sun May 23 17:00:25 2004] : (2288|2392) Timer task instance
[TM_SALES_DEPARTMENT_START]: Timer task specified to wait until absolute time [Mon
May 24 06:00:00 2004], specified by variable [$$NEXT_START_TIME].
INFO : LM_36606 [Sun May 23 17:00:25 2004] : (2288|2392) Timer task instance
[TM_SALES_DEPARTMENT_START]: The timer will complete at [Mon May 24 06:00:00 2004].
9. Stop or abort the workflow at any time. Afterwards, edit the workflow scheduler and select RUN ON
DEMAND.
10. Save the repository.
Lab 4: Recover a Suspended Workflow
Technical Description
In this lab, you will configure a mapping and its related session and workflow for recovery. Then, you will
change a session property to create an error that causes the session to suspend when you run it. You will
fix the error and recover the workflow.
Objectives
Configure a mapping, session, and workflow for recovery.
Recover a suspended workflow.
Duration
30 minutes
Instructions
Step 1: Copy the Workflow
1. Open the Repository Manager.
2. Copy the wkf_Stage_Customer_Contacts_xx workflow from the SOLUTIONS_ADVANCED folder
to your folder.
3. In the Workflow Manager, open the wkf_Stage_Customer_Contacts_xx workflow.
4. Rename the workflow to replace xx with your student number.
5. Rename the session in the workflow to replace xx with your student number.
6. Save the workflow.
Step 2: Edit the Workflow and Session for Recovery
1. Open the wkf_Stage_Customer_Contacts_xx workflow.
2. Edit the workflow, and on the General tab, select Suspend on Error.
3. Edit the s_m_Stage_Customer_Contacts_xx session and click the Properties tab.
4. Scroll to the end of the General Options settings and select Resume from last checkpoint for the
Recovery Strategy.

5. Click the Mapping tab and change the target load type to Normal.
Note: When you configure a session for bulk load, the session is not recoverable using the resume
recovery strategy. You must use normal load.
6. Change the target database connection to EDWxx.
7. Save the workflow.
Step 3: Edit the Session to Cause an Error
In this step, you will edit the session so that when the Integration Service runs it, there will be an error.
1. Edit the s_m_Stage_Customer_Contacts_xx session, and click the Mapping tab.
The source in the mapping uses a file list, customer_list.txt. To make the session encounter an error,
you will change the value in the Source Filename session property.
2. On the Sources node, change the source file name to customer_list1234.txt.
3. Click the Config Object tab.
4. In the Error Handling settings, configure the session to stop on one error.
5. Save the workflow.
Step 4: Run the Workflow, Fix the Session, and Recover the Workflow
1. Run the workflow.
The Workflow Monitor shows that the Integration Service suspends the workflow and fails the
session.
2. Open the session log.
Suspended Workflow and Failed Session
3. Scroll to the end of the session log.
Notice that the Integration Service failed the session.
Next, you will fix the session.
4. In the Workflow Manager, edit the session.
5. On the Mapping tab, enter customer_list.txt as the source file name.
6. Save the workflow.
7. In the Workflow Manager, right-click the workflow, and choose Recover Workflow.
The Workflow Monitor shows that the Integration Service is running the workflow and that the
session is running as a recovery run.
Session run has completed with failure.
Running Recovery Session Run
When the session and workflow complete, the Workflow Monitor shows that the session completed
successfully as a recovery run.
8. Open the session log.
9. Search for "session run completed with failure".
Notice that the Integration Service continues to write log events to the same session log.
Successful Recovery Session Run
10. Search for "recovery run".
The Integration Service writes recovery information to the session log.
11. Close the Log Viewer.
Lab 5: Using the Transaction Control Transformation
Business Purpose
Line item data is read and sorted by invoice number. We need each invoice number committed to the
target database as a single transaction.
Technical Description
A flag will be created to tell PowerCenter when a new set of invoice numbers is found. A Transaction
Control transformation will be created to tell the database when to issue a commit.
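Stripped of the transformation plumbing, the commit logic the mapping builds looks like the following minimal Python sketch (illustrative only; db is a hypothetical database handle, not a PowerCenter API):

# Rows arrive sorted by INVOICE_NO; commit just before the first row of
# each new invoice (the TC_COMMIT_BEFORE case in the mapping).
def load_line_items(sorted_rows, db):
    prev_invoice = None
    for row in sorted_rows:
        if prev_invoice is not None and row["INVOICE_NO"] != prev_invoice:
            db.commit()                  # NEW_INVOICE_NO_FLAG = 1
        db.insert("DIM_LINE_ITEM", row)  # TC_CONTINUE_TRANSACTION otherwise
        prev_invoice = row["INVOICE_NO"]
    db.commit()                          # end of input commits the last invoice

Because the Sorter guarantees all rows for an invoice are adjacent, each commit covers exactly one complete invoice.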
Objectives
Create a flag to check for new INVOICE_NOs
Commit upon seeing a new set of INVOICE_NOs
Duration
45 minutes
Mapping Overview
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DIM_LINE_ITEM_xx
Source System: ODS
Target System: EDWxx
Short Description: Commit on a new set of INVOICE_NOs.
Load Frequency: On demand
Preprocessing: None
Post Processing: None
Error Strategy: None
Reload Strategy: None
Unique Source Fields: LINE_ITEM_NO

Sources

Tables
Table Name: ODS_LINE_ITEM (create shortcut from DEV_SHARED folder)
Schema/Owner: ODS

Targets

Tables (Schema/Owner: EDWxx)
Table Name: DIM_LINE_ITEM (create shortcut from DEV_SHARED folder)   Insert: yes
Unique Key: LINE_ITEM_NO

Source To Target Field Matrix

Target Table    Target Column   Source Table    Source Column   Expression
DIM_LINE_ITEM   LINE_ITEM_NO    ODS_LINE_ITEM   LINE_ITEM_NO    Issue a commit upon a new set of Invoice Nos.
DIM_LINE_ITEM   INVOICE_NO      ODS_LINE_ITEM   INVOICE_NO      Issue a commit upon a new set of Invoice Nos.
DIM_LINE_ITEM   PRODUCT_CODE    ODS_LINE_ITEM   PRODUCT_CODE    Issue a commit upon a new set of Invoice Nos.
DIM_LINE_ITEM   QUANTITY        ODS_LINE_ITEM   QUANTITY        Issue a commit upon a new set of Invoice Nos.
DIM_LINE_ITEM   PRICE           ODS_LINE_ITEM   PRICE           Issue a commit upon a new set of Invoice Nos.
DIM_LINE_ITEM   COST            ODS_LINE_ITEM   COST            Issue a commit upon a new set of Invoice Nos.
Detailed Overview
m_DIM_LINE_ITEM_xx (Mapping)

ODS_LINE_ITEM (Source Definition)
Table source definition in the ODS schema. Create shortcut from DEV_SHARED folder.

Shortcut_to_sq_ODS_LINE_ITEM (Source Qualifier)
Send to srt_DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST

srt_DIM_LINE_ITEM (Sorter)
Sort by INVOICE_NO.
Send to exp_DIM_LINE_ITEM: INVOICE_NO
Send to tc_DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST

exp_DIM_LINE_ITEM (Expression)
Uncheck the 'O' (output) on INVOICE_NO.
Create a variable called v_PREVIOUS_INVOICE_NO as a decimal 10,0 to hold the value of the previous row's INVOICE_NO.
Expression:
INVOICE_NO
Create a variable called v_NEW_INVOICE_NO_FLAG as an integer to set a flag that checks whether the current row's INVOICE_NO is the same as the previous row's INVOICE_NO.
Expression:
IIF(INVOICE_NO = v_PREVIOUS_INVOICE_NO, 0, 1)
Move v_NEW_INVOICE_NO_FLAG above v_PREVIOUS_INVOICE_NO. (Variable ports evaluate top to bottom, so the flag must be computed before the previous value is overwritten with the current one.)
Create an output port called NEW_INVOICE_NO_FLAG_out as an integer to hold the value of the flag.
Expression:
v_NEW_INVOICE_NO_FLAG
Send to tc_DIM_LINE_ITEM: NEW_INVOICE_NO_FLAG_out

tc_DIM_LINE_ITEM (Transaction Control)
On the Ports tab, delete the _out from NEW_INVOICE_NO_FLAG_out.
On the Properties tab, enter the following Transaction Control Condition:
IIF(NEW_INVOICE_NO_FLAG = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
Send to DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST

Shortcut_to_DIM_LINE_ITEM (Target Definition)
Target definition in the EDWxx schema. Create a shortcut from DEV_SHARED folder.
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_LINE_ITEM_xx, where xx is your student number. Use the mapping
details described in the previous pages for guidelines.
Step 2: Create Workflow
1. Open ~Developerxx folder.
2. Create workflow named wf_DIM_LINE_ITEM_xx.
3. Create session named s_m_DIM_LINE_ITEM_xx.
4. In the session, click the Mapping tab and expand the Sources node. Under Connections, verify that the Connection Value is ODS.
5. Expand the Targets node and verify that the Connection value is correct, the Target load type is set to Normal, and the Truncate target table option is checked.
Step 3: Run Workflow
Run workflow wf_DIM_LINE_ITEM_xx.
Step 4: Verify Statistics
Step 5: Verify Results
Lab 6: Error Handling with Transactions
Business Purpose
The IT Department would like to prevent erroneous data from being committed into the
DIM_VENDOR_PRODUCT table. They would also like to issue a commit every time a new group of
VENDOR_IDs is written. A rollback will also be issued for an entire group of vendors if any record in
that group has an error.
Technical Description
Records will be committed when a new group of VENDOR_IDs comes in. This will require a flag to be
set to determine whether a VENDOR_ID is new or not. Rows will need to be rolled back if an error
occurs. An error flag will be set when a business rule is violated.
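The transaction logic reduces to the following minimal Python sketch (illustrative only; db is a hypothetical database handle). Because the Sorter places error rows at the end of each vendor group, a single rollback on the first error discards the group's uncommitted rows:

def load_vendor_products(sorted_rows, db):
    prev_vendor = None
    for row in sorted_rows:
        if prev_vendor is not None and row["VENDOR_ID"] != prev_vendor:
            db.commit()                        # 'COMMIT' -> TC_COMMIT_BEFORE
        db.insert("DIM_VENDOR_PRODUCT", row)
        if row["ERROR_FLAG"]:                  # null PRODUCT_CODE or CATEGORY
            db.rollback()                      # 'ROLLBACK' -> TC_ROLLBACK_AFTER
        prev_vendor = row["VENDOR_ID"]
    db.commit()

TC_ROLLBACK_AFTER discards the error row along with every uncommitted row before it, which is why sorting the error rows to the end of each group rolls back the whole group.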
Objectives
Use a Transaction Control transformation to commit based upon vendor IDs and to issue a rollback upon errors.
Duration
60 minutes
Mapping Overview
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DIM_VENDOR_PRODUCT_TC_xx
Source System: Flat File
Target System: EDWxx
Short Description: Issue a commit based upon VENDOR_ID, but only if the PRODUCT_CODE is not null and the CATEGORY is valid for all records in the group. A rollback of the entire group should occur if Informatica comes across a null PRODUCT_CODE or an invalid CATEGORY.
Load Frequency: On demand
Preprocessing: None
Post Processing: None
Error Strategy: None
Reload Strategy: None
Unique Source Fields: None

Sources

Files
File Name: PRODUCT.txt (create shortcut from DEV_SHARED folder)
File Location: In the Source Files directory on the Integration Service process machine.

Targets

Tables (Schema/Owner: EDWxx)
Table Name: DIM_VENDOR_PRODUCT (create shortcut from DEV_SHARED folder)   Insert: yes

Lookup Transformation Detail

Lookup Name: lkp_ODS_VENDOR
Lookup Table Name: ODS_VENDOR
Location: ODS
Description: The VENDOR_NAME, FIRST_CONTACT and VENDOR_STATE are needed to populate DIM_VENDOR_PRODUCT.
Match Condition(s): ODS.VENDOR_ID = PRODUCT.VENDOR_ID
Filter/SQL Override: N/A
Return Value(s): VENDOR_NAME, FIRST_CONTACT and VENDOR_STATE
Source To Target Field Matrix

Target Table         Target Column   Source Table   Source Column                       Expression
DIM_VENDOR_PRODUCT   PRODUCT_CODE    PRODUCT        PRODUCT_CODE
DIM_VENDOR_PRODUCT   VENDOR_ID       PRODUCT        VENDOR_ID
DIM_VENDOR_PRODUCT   VENDOR_NAME     PRODUCT        Derived value from lkp_ODS_VENDOR   Return VENDOR_NAME from lkp_ODS_VENDOR
DIM_VENDOR_PRODUCT   VENDOR_STATE    PRODUCT        Derived value from lkp_ODS_VENDOR   Return VENDOR_STATE from lkp_ODS_VENDOR
DIM_VENDOR_PRODUCT   PRODUCT_NAME    PRODUCT        PRODUCT_NAME
DIM_VENDOR_PRODUCT   CATEGORY        PRODUCT        CATEGORY
DIM_VENDOR_PRODUCT   MODEL           PRODUCT        MODEL
DIM_VENDOR_PRODUCT   PRICE           PRODUCT        PRICE
DIM_VENDOR_PRODUCT   FIRST_CONTACT   PRODUCT        Derived value from lkp_ODS_VENDOR   Return FIRST_CONTACT from lkp_ODS_VENDOR

Detailed Overview
m_DIM_VENDOR_PRODUCT_TC_xx (Mapping)

PRODUCT.txt (Source Definition)
Drag in shortcut from DEV_SHARED.

Sq_Shortcut_To_PRODUCT (Source Qualifier)
Data Source Qualifier for the flat file.
Send ports to exp_SET_ERROR_FLAG: PRODUCT_CODE, VENDOR_ID, CATEGORY, PRODUCT_NAME, MODEL, PRICE

exp_SET_ERROR_FLAG (Expression)
Output port: ERROR_FLAG
Expression:
IIF(ISNULL(PRODUCT_CODE) OR ISNULL(CATEGORY), TRUE, FALSE)
Send all output ports to srt_VENDOR_ID.

srt_VENDOR_ID (Sorter)
Sort data ascending by VENDOR_ID & ERROR_FLAG. This puts any error records at the end of each group.
Send all ports to exp_SET_TRANS_TYPE.
Send to lkp_ODS_VENDOR: VENDOR_ID
exp_SET_TRANS_TYPE Expression 1. Create a variable called v_PREV_VENDOR_ID as a Decimal with
precision of 10 to house the value of the previous vendor.
Expression: VENDOR_ID
2. Create a variable port called v_NEW_VENDOR_ID_FLAG as an
integer to check and see if the current VENDOR_ID is new.
Expression:
IIF(VENDOR_ID != v_PREV_VENDOR_ID, TRUE,
FALSE)
Variables can be used to remember values across rows.
V_PREV_VENDOR_ID must always hold the value of the previous
VENDOR_ID, so it must be placed after v_NEW_VENDOR_ID_FLAG
3. Create an output port as a string(8) called TRANSACTION_TYPE
to tell the Transaction Control Transformation whether to CONTINUE,
COMMIT, or ROLLBACK.
Expression:
IIF(ERROR_FLAG = TRUE,
'ROLLBACK',
IIF(v_NEW_VENDOR_ID_FLAG = TRUE,
'COMMIT',
'CONTINUE'))
Since we sorted to put error records at the end of each group, when we
ROLLBACK, we'll be rolling back the whole group.
4. SEND all output PORTS to tc_DIM_VENDOR_PRODUCT.
lkp_ODS_VENDOR Lookup Create a connected lookup to ODS.ODS_VENDOR. Create an input
port for the source data field VENDOR_ID
Rename VENDOR_ID1 to VENDOR_ID_in
Set Lookup Condition:
VENDOR_ID = VENDOR_ID_in
SEND PORTS to tc_DIM_VENDOR_PRODUCT
VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE
tc_DIM_VENDOR_PRODUCT Transaction
Control
Expression:
DECODE(TRANSACTION_TYPE,
'COMMIT', TC_COMMIT_BEFORE,
'ROLLBACK', TC_ROLLBACK_AFTER,
'CONTINUE', TC_CONTINUE_TRANSACTION)
// If we're starting a new group, we need to COMMIT the
// prior group.
// If we hit an error, we need to ROLLBACK the current
// group including the current record.

PORTS to SEND to DIM_VENDOR_PRODUCT:
All ports except for TRANSACTION_TYPE
Shortcut_To_DIM_VENDOR_PRODUCT Target Table All data without errors is routed here.
Create shortcut from DEV_SHARED folder
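To see how the pieces above fit together, here is a minimal sketch of the exp_SET_TRANS_TYPE port layout, using only the ports and expressions specified above (PowerCenter evaluates input ports, then variable ports in port order, then output ports, once per row):

v_NEW_VENDOR_ID_FLAG (variable, integer): IIF(VENDOR_ID != v_PREV_VENDOR_ID, TRUE, FALSE)
v_PREV_VENDOR_ID (variable, decimal 10): VENDOR_ID
TRANSACTION_TYPE (output, string(8)): IIF(ERROR_FLAG = TRUE, 'ROLLBACK', IIF(v_NEW_VENDOR_ID_FLAG = TRUE, 'COMMIT', 'CONTINUE'))

Because v_PREV_VENDOR_ID is assigned only after v_NEW_VENDOR_ID_FLAG has been evaluated, the flag always compares the current row's VENDOR_ID against the previous row's value.

A hypothetical trace (the vendor IDs are invented for illustration) of how the sorted data then drives the Transaction Control transformation:

VENDOR_ID  ERROR_FLAG  TRANSACTION_TYPE  Resulting action
101        FALSE       COMMIT            TC_COMMIT_BEFORE (first row; nothing prior to commit)
101        FALSE       CONTINUE          TC_CONTINUE_TRANSACTION
102        FALSE       COMMIT            TC_COMMIT_BEFORE (commits vendor 101's rows)
102        TRUE        ROLLBACK          TC_ROLLBACK_AFTER (rolls back vendor 102's rows, including this one)

Vendor 101's rows are committed; vendor 102's rows are discarded because the error row, sorted to the end of its group, rolls back the open transaction.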
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_VENDOR_PRODUCT_TC_xx, where xx is your student number.
Use the mapping details described in the previous pages for guidelines.
Step 2: Create Workflow
1. Open ~Developerxx folder.
2. Create workflow named wf_DIM_VENDOR_PRODUCT_TC_xx.
3. Create session named s_m_DIM_VENDOR_PRODUCT_TC_xx.
4. The source file is found in the Source Files directory on the Integration Service machine.
5. Verify that the source filename is PRODUCT.txt (extension required).
6. Verify that the target database connection value is EDWxx.
7. Verify that the target load type is Normal.
8. Select Truncate for DIM_VENDOR_PRODUCT.
9. Set the Lookup connection to ODS.
Step 3: Run Workflow
Run workflow wf_DIM_VENDOR_PRODUCT_TC_xx.
Step 4: Verify Statistics
Step 5: Verify Results
Lab 7: Handling Fatal and Non-Fatal Errors
Business Purpose
ABC Incorporated would like to track which records fail when loading data from the PRODUCT flat file to the DIM_VENDOR_PRODUCT table. Some developers have also noticed dirty data being loaded into DIM_VENDOR_PRODUCT, so users are getting dirty data in their reports.
Technical Description
Instead of using a Transaction Control Transformation, route the Fatal Errors off to a Fatal Error table
and route the Nonfatal Errors off to a Nonfatal table. All good data will be sent to the EDW.
Objectives
Trap all database errors and load them to a table called ERR_FATAL.
Trap the dirty data coming through from the CATEGORY field and write it to a table called
ERR_NONFATAL.
Write all data without fatal or nonfatal errors to DIM_VENDOR_PRODUCT.
Duration
60 minutes
Mapping Overview
Velocity Deliverable: Mapping Specifications
Mapping Name m_DIM_VENDOR_PRODUCT_xx
Source System Flat File Target System EDWxx
Initial Rows Rows/Load
Short Description
If a fatal error is found, route the data to a fatal error table. If a nonfatal error is found, route the data to a nonfatal error table. If the data is free of errors, route it to DIM_VENDOR_PRODUCT.
Load Frequency On demand
Preprocessing None
Post Processing None
Error Strategy Create a flag for both fatal errors and nonfatal errors. Route bad data to its respective table.
Reload Strategy None
Unique Source Fields None
Sources
Files
File Name File Location
PRODUCT.txt (create shortcut from DEV_SHARED folder): in the Source Files directory on the Integration Service process machine.
Targets
Tables Schema/Owner EDWxx
Table Name Update Delete Insert Unique Key
DIM_VENDOR_PRODUCT (create shortcut from DEV_SHARED folder) Insert: yes
Tables Schema/Owner EDWxx
Table Name Update Delete Insert Unique Key
ERR_NONFATAL (create shortcut from DEV_SHARED folder) Insert: yes; Unique Key: ERR_ID
Tables Schema/Owner EDWxx
Table Name Update Delete Insert Unique Key
ERR_FATAL (create shortcut from DEV_SHARED folder) Insert: yes; Unique Key: ERR_ID
Lookup Transformation Detail
Lookup Name lkp_ODS_VENDOR
Lookup Table Name ODS_VENDOR Location ODS
Description
The VENDOR_NAME, FIRST_CONTACT and VENDOR_STATE are needed to populate
DIM_VENDOR_PRODUCT.
Match Condition(s) ODS.VENDOR_ID = PRODUCT.VENDOR_ID
Filter/SQL Override N/A
Return Value(s) VENDOR_NAME, FIRST_CONTACT and VENDOR_STATE
Source To Target Field Matrix
Target Table Target Column Source Table Source Column Expression
ERR_NONFATAL ERR_ID PRODUCT Derived Value Generated from seq_ERR_ID_ERR_NONFATAL
ERR_NONFATAL REC_NBR PRODUCT REC_NUM N/A
ERR_NONFATAL ERR_RECORD PRODUCT Derived Value The entire source record is concatenated
ERR_NONFATAL ERR_DESCRIPTION PRODUCT Derived Value First, records must be tested for validity. Check whether PRODUCT_CODE is null and set a flag to True or False. Check whether CATEGORY is null and set a flag to True or False. Rows must be separated into Fatal, Nonfatal, and Good Data. All nonfatal errors have a description of INVALID CATEGORY.
ERR_NONFATAL LOAD_DATE PRODUCT Derived Value Date and time the session runs
ERR_FATAL ERR_ID PRODUCT Derived Value Generated from seq_ERR_ID_ERR_FATAL
ERR_FATAL REC_NBR PRODUCT REC_NUM N/A
ERR_FATAL ERR_RECORD PRODUCT Derived Value The entire record is concatenated and sent to
the ERR_FATAL table.
ERR_FATAL ERR_DESCRIPTION PRODUCT Derived Value First, records must be tested for validity. Check whether PRODUCT_CODE is null and set a flag to True or False. Check whether CATEGORY is null and set a flag to True or False. Rows must be separated into Fatal, Nonfatal, and Good Data. All fatal errors have a description of NULL VALUE IN KEY.
ERR_FATAL LOAD_DATE PRODUCT Derived Value The Date and time the session runs.
DIM_VENDOR_PRODUCT PRODUCT_CODE PRODUCT PRODUCT_CODE Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT VENDOR_ID PRODUCT VENDOR_ID Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT VENDOR_NAME PRODUCT Derived Value from lkp_ODS_VENDOR Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT VENDOR_STATE PRODUCT Derived Value from lkp_ODS_VENDOR Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT PRODUCT_NAME PRODUCT PRODUCT_NAME Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT CATEGORY PRODUCT CATEGORY Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT MODEL PRODUCT MODEL Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT PRICE PRODUCT PRICE Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
DIM_VENDOR_PRODUCT FIRST_CONTACT PRODUCT Derived Value from lkp_ODS_VENDOR Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
Detailed Overview
Transformation Name Type Description
Mapping Mapping m_DIM_VENDOR_PRODUCT_xx
PRODUCT.txt Flat File Source Definition Drag in Shortcut from DEV_SHARED
Shortcut_To_sq_PRODUCT Source Qualifier Source Qualifier for flat file.
Create shortcut from DEV_SHARED folder
exp_ERROR_TRAPPING Expression Check to see if PRODUCT_CODE is NULL
Derive ISNULL_PRODUCT_CODE_out by creating an output port
CODE: IIF(ISNULL(PRODUCT_CODE),'FATAL','GOOD DATA')
Check to see if CATEGORY is NULL.
Derive INVALID_CATEGORY_out by creating an output port.
CODE: IIF(ISNULL(CATEGORY), 'NONFATAL', 'GOOD DATA')
Derive ERR_RECORD_out by creating an output port that concatenates the entire record. Use a TO_CHAR function to convert all non-string ports to strings (a sample expression follows this table).
SEND PORTS to lkp_ODS_VENDOR:
VENDOR_ID
SEND PORTS to rtr_PRODUCT_DATA:
PRODUCT_CODE, ISNULL_PRODUCT_CODE_out, VENDOR_ID,
CATEGORY, INVALID_CATEGORY_out, PRODUCT_NAME, MODEL,
PRICE, REC_NUM, ERR_RECORD_out
lkp_ODS_VENDOR Lookup Create a connected lookup to ODS.ODS_VENDOR. Create an input port for the source data field VENDOR_ID.
Rename VENDOR_ID1 to VENDOR_ID_in
Set Lookup Condition:
VENDOR_ID = VENDOR_ID_in
SEND PORTS to rtr_PRODUCT_DATA:
VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE
rtr_PRODUCT_DATA Router Create groups to route the data off to different paths:
Group = NONFATAL_ERRORS
CODE: INVALID_CATEGORY_out='NONFATAL'
Group = FATAL_ERRORS
CODE: ISNULL_PRODUCT_CODE_out='FATAL'
The default group will contain rows that do not match the above conditions, hence all good rows. (See the note following this table about rows that match both error conditions.)
PORTS TO SEND TO exp_ERR_NONFATAL:
NONFATAL_ERRORS.PRODUCT_CODE
PORTS to SEND to ERR_NONFATAL:
NONFATAL_ERRORS.REC_NUM,
NONFATAL_ERRORS.ERR_RECORD
PORTS to SEND to exp_ERR_FATAL:
FATAL_ERRORS.PRODUCT_CODE
PORTS to SEND to ERR_FATAL:
FATAL_ERRORS.REC_NUM,
FATAL_ERRORS.ERR_RECORD
PORTS to SEND to DIM_VENDOR_PRODUCT:
DEFAULT.PRODUCT_CODE, DEFAULT.VENDOR_ID,
DEFAULT.VENDOR_NAME, DEFAULT.VENDOR_STATE,
DEFAULT.PRODUCT_NAME, DEFAULT.CATEGORY,
DEFAULT.MODEL, DEFAULT.PRICE, DEFAULT.FIRST_CONTACT
exp_ERR_FATAL Expression Derive ERR_DESCRIPTION_out by creating an output port
CODE: 'NULL VALUE IN KEY'
Derive LOAD_DATE_out by creating an output port
CODE: SESSSTARTTIME
PORTS to SEND to ERR_FATAL:
LOAD_DATE_out, ERR_DESCRIPTION_out
exp_ERR_NONFATAL Expression Derive ERR_DESCRIPTION_out by creating an output port
CODE: 'INVALID CATEGORY'
Derive LOAD_DATE_out by creating an output port
CODE: SESSSTARTTIME
PORTS to SEND to ERR_NONFATAL:
LOAD_DATE_out, ERR_DESCRIPTION_out
seq_ERR_FATAL Sequence Generator Generates the ERR_ID for ERR_FATAL
seq_ERR_NONFATAL Sequence Generator Generates the ERR_ID for ERR_NONFATAL
ERR_FATAL Target Traps all of the FATAL ERRORS
ERR_NONFATAL Target Traps all NONFATAL ERRORS
DIM_VENDOR_PRODUCT Target All good data to be loaded into the target table.
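Two notes on the transformations above. First, a minimal sketch of the ERR_RECORD_out concatenation in exp_ERROR_TRAPPING, assuming the PRODUCT ports listed earlier (the comma delimiter, the port order, and which ports need TO_CHAR are illustrative; adjust them to the actual file layout and datatypes):

TO_CHAR(REC_NUM) || ',' || PRODUCT_CODE || ',' || TO_CHAR(VENDOR_ID) || ',' || CATEGORY || ',' || PRODUCT_NAME || ',' || MODEL || ',' || TO_CHAR(PRICE)

The || operator treats a null operand like an empty string, so rows with a null PRODUCT_CODE or CATEGORY still produce a usable error record.

Second, note that a Router evaluates every group condition independently and sends a copy of the row to each group whose condition is true. A record with both a null PRODUCT_CODE and a null CATEGORY therefore satisfies both conditions and is written to both ERR_FATAL and ERR_NONFATAL.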
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_VENDOR_PRODUCT_xx, where xx is your student number. Use the
mapping details described in the previous pages for guidelines.
Step 2: Create Workflow
1. Open ~Developerxx folder.
2. Create workflow named wf_DIM_VENDOR_PRODUCT_xx.
3. Create session named s_m_DIM_VENDOR_PRODUCT_xx.
Source file is found in the Source Files directory on the Integration Service process machine.
4. Verify that the source filename is PRODUCT.txt (PRODUCT in uppercase, with the .txt extension).
5. Verify that the target database connection is EDWxx.
6. Change the target load type to Normal.
7. Truncate DIM_VENDOR_PRODUCT.
8. Set Lookup connection to ODS.
Step 3: Run Workflow
1. Run workflow wf_DIM_VENDOR_PRODUCT_xx.
Step 4: Verify Statistics
Step 5: Verify Results
ERR_NONFATAL
ERR_FATAL
DIM_VENDOR_PRODUCT
Lab 8: Repository Queries
Technical Description
In this lab, you will search for repository objects by creating and running object queries.
Objectives
Create object queries
Run object queries
Duration
15 minutes
Instructions
Step 1: Create a Query to Search for Targets with Customer
First, you will create a query that searches for target objects with the string customer in the target name.
1. In the Designer, choose Tools > Queries.
The Query Browser appears.
2. Click New to create a new query.
Figure 8-4 shows the Query Editor:
3. In the Query Name field, enter targets_customer.
4. In the Parameter Name column, select Object Type.
5. In the Operator column, select Is Equal To.
6. In the Value 1 column, select Target Definition.
7. Click the New Parameter button.
Notice that the Query Editor automatically adds an AND operator for the two parameters.
Figure 8-4. Query Editor (callouts identify where to add AND or OR operators, add a new query parameter, validate the query, and run the query)
8. Edit the new parameter to search for object names that contain the text customer.
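Conceptually, the finished query reads: Object Type Is Equal To Target Definition AND Object Name Contains customer.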
Step 2: Validate, Save, and Run the Query
1. Click the Validate button to validate the query.
The PowerCenter Client displays a dialog box stating whether the query is valid. If the query is not valid, fix the error and validate it again.
2. Click Save.
The PowerCenter Client saves the query to the repository.
3. Click Run.
The Query Results window shows the results of the query you created. Your query results might
include more objects than in the following results:
Some columns only apply to objects in a versioned repository, such as Version Comments, Label
Name, and Purged By User.
Step 3: Create A Query to Search For Mapping Dependencies
Next, you will create a query that returns all dependent objects for a mapping. A dependent object is an
object used by another object. The query will search for both parent and child dependent objects. An
example child object of a mapping is a source. An example parent object of a mapping is a session.
1. Close the Query Editor, and create a new query.
2. Enter product_inventory_mapping_dependents as the query name.
3. Edit the first parameter so the object name contains product.
4. Add another parameter, and choose Include Children and Parents for the parameter name.
Note: When you search for children and parents, you enter the following information in the value
columns:
Value 1. Object type(s) for dependent object(s), the children and parents.
Value 2. Object type(s) for the object(s) you are querying.
Value 3. Reusable status of the dependent object(s).
The PowerCenter Client automatically chooses Where for the operator.
5. Click the arrow in the Value 1 column, select the following objects, and click OK:
Mapplet
Source Definition
Target Definition
Transformation
6. In the Value 2 column, choose Mapping.
Note: When you access the Query Editor from the Designer, you can only search for Designer
repository objects. To search for all repository object types that use the mapping you are querying,
create a query from the Repository Manager.
7. Choose Reusable Dependency in the third value column.
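Putting the parameters together, the query conceptually reads: Object Name Contains product AND Include Children and Parents Where (Mapplet, Source Definition, Target Definition, Transformation) of object type Mapping with Reusable Dependency.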
Step 4: Validate, Save, and Run the Query
1. Validate the query.
2. Save and run the query.
Your query results might look similar to the following results:
The query returned objects in all folders in the repository. Next, you will modify the query so it only
returns objects in your folder.
Step 5: Modify and Run the Query
1. In the Query Editor, place the cursor somewhere in the last parameter and then add a new parameter.
2. Modify the parameter so it searches for folders equal to the SOLUTIONS_ADVANCED folder.
3. Validate and save the query.
4. Run the query.
Your query results might look similar to the following results:
Notice that even though the query says to include parent and child objects, it does not display any parent objects of the mapping. Parent objects of a mapping include sessions, worklets, and workflows.
When you run a query accessed by the Designer, the query results only display Designer objects.
Similarly, when you run a query accessed by the Workflow Manager, the query results only display
Workflow Manager objects.
In the next step, you will run the same query accessed by the Repository Manager.
Step 6: Run the Query Accessed by the Repository Manager
1. Open the Repository Manager and connect to the repository.
2. Open the Query Browser. For details on how to do this, see "Create a Query to Search for Targets with Customer" on page 50.
3. Select the product_inventory_mapping_dependents query, and run it by clicking Execute.
Your query results might look similar to the following results:
Notice that the query results show all parent (and child) objects, including Workflow Manager
objects, such as workflows.
Step 7: Create Your Own Queries
1. Create a new query that searches for invalid mappings.
Tip: You might need to modify a mapping in your folder to make it invalid. You can copy the mapping
with a new name, and then delete links to the target.
2. Create a new query that searches for impacted mappings.
Tip: You can modify a source or target used in a mapping by removing a column. The Designer or Workflow Manager marks a parent object as impacted when you modify a child object in such a way that the parent object may no longer be able to run.
Lab 9: Performance and Tuning Workshop
Business Purpose
The support group within the IT Department has taken over the support of an ETL system that was recently put into production. The implementation team seemed to do a good job, but over the last few runs some of the sessions/mappings have been running very slowly and need to be optimized. Due to budget constraints, management does not want to pay consultants to optimize the sessions/mappings, so the task has fallen to the support group. It has been mandated that the group reduce the run time of one particular session/mapping by at least 30%. The Team Lead is confident that the group is up to the challenge, as they have just returned from an Informatica Advanced Training course.
Technical Description
The session that needs to be optimized is wf_FACT_MKT_SEGMENT_ORDERS_xx.
This session runs a mapping that reads in a flat file of order data, finds the customer market segment
information, aggregates the orders and writes the values out to a relational table.
The support group needs to find the bottleneck(s), determine the cause of the bottleneck(s) and then
reduce the bottleneck(s). The reduction in run time must be at least 30%.
Objectives
Use learned techniques to determine and reduce the bottleneck(s) that exist.
Duration
120 minutes
Object Locations
ProjectX folder
Workshop Details
Overview
This workshop is designed to assist the developers with the task at hand. It does not give detailed instructions on how to identify a bottleneck, determine its cause, or optimize the session/mapping. The approach to take is left entirely up to the discretion of the developers. The
optimization techniques to use are also left up to the developers. The workshop will provide instructions
on establishing a typical read baseline and on running the original session.
The suggested steps to follow are:
1. Establish a typical read baseline
2. Run the original session
3. Identify and reduce the bottlenecks
Bottleneck areas to investigate: target, source, mapping, session.
Important: For detailed information on identifying bottlenecks and reducing bottlenecks, see the
Performance Tuning Guide in the PowerCenter online help. To access the online help, press the F1 key in
any of the PowerCenter Client tools. In the online help, click the Contents tab and expand the section for
the Performance Tuning Guide.
Workshop Rules
The rules of the workshop are:
Developers must work in teams of two.
Partitioning cannot be used to optimize the session.
Data results must match the initial session run.
Think outside the box.
Ask the instructor any questions that come to mind.
Establish ETL Baseline
In order to obtain a starting point for measurement purposes it is necessary to establish baselines. Ideally, a baseline should be established for the ETL process, the network, and the disks. A straight throughput mapping sourcing from an RDBMS and writing to a flat file will establish a typical read baseline.
Typical Read Baseline
In order to have a reasonable measurement for uncovering source bottlenecks a typical read baseline will
need to be established. This can be accomplished by running a straight throughput mapping that sources
a relational table and writes to a flat file. The session properties can be used to accomplish this.
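As a hypothetical illustration of how the baseline is used (the numbers are invented): if the baseline session reads 1,000,000 rows in 50 seconds, the typical read throughput is 1,000,000 / 50 = 20,000 rows per second. If the session being tuned reads from the same source at only a small fraction of that rate, the source or its connection is a likely bottleneck.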
1. In the Repository Manager, copy the wf_Source_Baseline_xx workflow from the ProjectX folder to
your folder.
2. In the Workflow Manager, open the wf_Source_Baseline_xx workflow in your folder.
3. Edit the session named s_m_Source_Baseline_xx, and click the Mapping tab:
a. Edit the Sources node and ensure the database connection is ODS.
b. Edit the Targets node and change the Writer from Relational Writer to File Writer.
c. Change the Targets Properties for the Output and Reject filenames to include your assigned
student number.
4. Save, start and monitor the workflow.
5. Document the results in the table provided in "Documented Results" on page 65.
Run Original Session
Running the original session will provide a starting point to measure the progress against.
1. In the Repository Manager, copy the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow from
the ProjectX folder to your folder.
2. In the Workflow Manager, edit the session named s_m_FACT_MKT_SEGMENT_ORDERS_xx
located in the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow in your folder.
3. In the Mapping Tab, edit the Sources node:
a. Ensure the ORDER_LINE_ITEM source filename value is daily_order_line_item.dat.
b. Ensure the ODS_INVOICE_SUMMARY database connection is ODS.
4. In the Mapping Tab, edit the Targets node:
a. Ensure the database connection is EDWxx.
b. Ensure the Target load type is set to Normal.
c. Ensure the Truncate target table option is checked.
5. Save, start and monitor the workflow.
6. Document the results in the table provided in "Documented Results" on page 65.
Velocity Deliverable: Mapping Specifications
Mapping Name m_FACT_MKT_SEGMENT_ORDERS_xx
Source System ODS and Flat File Target System EDWxx
Initial Rows 4,015,335 Rows/Load 437,023
Short Description
Calculates totals for quantity, revenue and cost for market segments. Values are summarized by customer,
date, market segment, region and item.
Load Frequency On demand
Preprocessing None
Post Processing None
Error Strategy None
Unique Source Fields
SOURCES
Tables
Table Name Schema/Owner Selection/Filter
daily_order_line_item Flat File This is a daily order line item file that contains order
information for customers. The file contains 1,328,667 rows
of order data for August 29, 2003 and is sorted by order id.
This file is joined to the ODS_INVOICE_SUMMARY
relational table in order to retrieve the payment type that the
customer uses. It is assumed that the customer uses the
same payment type each time. The payment types are
CREDIT CARD, DEBIT CARD, CASH, and CHECK.
The source file is called daily_order_line_item.dat. The
location for the file can be found by checking the service
variable $PMSourceFileDir.
ODS_INVOICE_SUMMARY ODS This is a monthly summary of customer invoice data. The
table contains invoice number, customer, order date,
payment type and amount. The Primary Key is Invoice
Number. The table contains 2,686,668 rows.
TARGETS
Tables Schema/Owner EDWxx
Table Name Update Delete Insert Unique Key
FACT_MKT_SEGMENT_ORDERS Yes ORDER_KEY (system generated)
LOOKUPS
Lookup Name lkp_ITEM_ID
Table DIM_ITEM Location EDWxx
Description
The FACT_MKT_SEGMENT_ORDERS fact table needs to have the ITEM_KEY stored on it as a Foreign
Key. The item id contained in the source will be matched with the item id in the DIM_ITEM table to retrieve
the ITEM_KEY.
The cost of each item needs to be obtained from this table and used in the calculation of item costs for
each row written to the target.
This table contains 27 rows.
Match Condition(s) DIM_ITEM.ITEM_ID = ORDER_LINE_ITEM.ITEM_ID
Filter/SQL Override N/A
Return Value(s) ITEM_KEY, COST
Lookup Name lkp_CUSTOMER_INFO
Table DIM_CUSTOMER_PT Location EDWxx
Description
The FACT_MKT_SEGMENT_ORDERS fact table needs to have the customer key stored on it as a Foreign
Key. The CUSTOMER_ID contained in the source will be matched with the CUSTOMER_ID in the
DIM_CUSTOMER_PT table to retrieve the customer key (C_CUSTKEY).
The market segment of each customer is also retrieved and used in aggregate groupings.
This table contains 1,000,000 rows.
Match Condition(s) DIM_CUSTOMER_PT.C_CUST_ID = ORDER_LINE_ITEM.CUSTOMER_ID
Filter/SQL Override N/A
Return Value(s) C_CUSTKEY, C_CUST_ID, C_MKTSEGMENT
SOURCE TO TARGET FIELD MATRIX
Target table name: FACT_MKT_SEGMENT_ORDERS
Target Column Source Table Source Column Expression
ORDER_DATE ORDER_LINE_ITEM ORDER_DATE N/A
ORDER_QUANTITY ORDER_LINE_ITEM QUANTITY Sum of QUANTITY grouped by customer key, order date,
market segment, region and item key.
ORDER_REVENUE ORDER_LINE_ITEM REVENUE Sum of REVENUE grouped by customer key, order date,
market segment, region and item key.
PYMT_TYPE ODS_INVOICE_SUMMARY PYMT_TYPE N/A
ORDER_KEY Derived Value Generated by a Sequence Generator
CUSTOMER_KEY Derived Value Foreign Key referencing the DIM_CUSTOMER_PT table.
Obtained via a lookup to the dimension table on the
CUSTOMER_ID column.
MKTSEGMENT Derived Value The market segment that the customer belongs in.
Obtained via a lookup to the DIM_CUSTOMER_PT
dimension table.
REGION Derived Value Derived based on customer id. If the customer id is:
< 50000, the region is 'WEST';
>= 50000 and < 95000, 'CENTRAL';
>= 95000 and < 120000, 'SOUTH';
>= 120000 and < 200501, 'EAST';
>= 200501, 'UNKNOWN'.
ITEM_KEY Derived Value Foreign Key referencing the DIM_ITEM table. Obtained
via a lookup to the DIM_ITEM dimension table on the
ITEM_ID column.
ORDER_COST Derived Value SUM of the (COST * QUANTITY). COST is obtained via
a lookup to the DIM_ITEM dimension table.
DETAILED OVERVIEW
Transformation Name Type Description
Mapping Mapping m_FACT_MKT_SEGMENT_ORDERS_xx
Shortcut_to_ORDER_LINE_ITEM Source Definition Flat file containing daily order information for each customer. Contains orders for August 29, 2003. The file contains 1,328,667 rows.
Sq_Shortcut_to_ORDER_LINE_ITEM Source Qualifier Flat File Source Qualifier
Sent to jnr_PAYMENT_TYPE:
All Ports
Shortcut_to_ODS_INVOICE_SUMMARY Source Definition Relational table containing a summary of the invoices for the month. This table contains data from August 1, 2003 through August 29, 2003. The key is INVOICE_NO and the table contains 2,686,668 rows.
Sq_Shortcut_To_ODS_INVOICE_SUMMARY Source Qualifier Relational Source Qualifier
Sent to jnr_PAYMENT_TYPE:
All Ports
Jnr_PAYMENT_TYPE Joiner Joiner transformation that joins the ORDER_LINE_ITEM table to the
ODS_INVOICE_SUMMARY table.
Master Source: ORDER_LINE_ITEM
Detail Source: ODS_INVOICE_SUMMARY
Join Condition:
ORDER_DATE = ORDER_DATE
CUSTOMER_ID = CUSTOMER_ID
Sent to lkp_ITEM_ID:
ORDER_LINE_ITEM: ITEM_ID
Sent to lkp_CUSTOMER_INFO:
ORDER_LINE_ITEM: CUSTOMER_ID
Sent to exp_SET_UNKNOWN_KEYS:
ORDER_LINE_ITEM: ORDER_DATE, QUANTITY, PRICE
ODS_INVOICE_SUMMARY: PYMT_TYPE
lkp_ITEM_ID Lookup Lookup transformation that obtains item keys from the DIM_ITEM
table. The DIM_ITEM table is located in the EDWxx schema.
Lookup Condition
ITEM_ID from DIM_ITEM =
ITEM_ID from ORDER_LINE_ITEM
Sent to exp_SET_UNKNOWN_KEYS:
ITEM_KEY, COST
Lkp_CUSTOMER_INFO Lookup Lookup transformation that obtains customer keys from the
DIM_CUSTOMER_PT table. The DIM_CUSTOMER_PT table is
located in the EDWxx schema.
Lookup Condition
CUSTOMER_ID from DIM_CUSTOMER_PT =
CUSTOMER_ID from ORDER_LINE_ITEM
Sent to exp_SET_UNKNOWN_KEYS:
C_CUSTKEY, C_CUST_ID, C_MKTSEGMENT
exp_SET_UNKNOWN_KEYS Expression Expression transformation that supplies default values for missing columns (item cost, market segment). It also defines the region the customer belongs in.
Output Ports:
MKTSEGMENT_out
Formula:
IIF( ISNULL(MKTSEGMENT), 'UNKNOWN',
MKTSEGMENT)
ITEM_COST_out
Formula:
IIF(ISNULL(ITEM_KEY), 0.00, COST)
REGION_OUT
Formula:
IIF(C_CUST_ID > 0 AND C_CUST_ID < 50000, 'WEST',
IIF(C_CUST_ID >= 50000 AND C_CUST_ID < 95000, 'CENTRAL',
IIF(C_CUST_ID >= 95000 AND C_CUST_ID < 120000, 'SOUTH',
IIF(C_CUST_ID >= 120000 AND C_CUST_ID < 200501, 'EAST', 'UNKNOWN'))))
Sent to agg_VALUES:
All output ports
agg_VALUES Aggregator Aggregator transformation that calculates the revenue, quantity and
cost
Group by ports:
C_CUSTKEY, ORDER_DATE, MKTSEGMENT, REGION, ITEM_KEY
Output ports:
ORDER_QUANTITY
Formula: SUM(QUANTITY)
ORDER_REVENUE
Formula: SUM(PRICE * QUANTITY)
ORDER_COST
Formula: SUM(ITEM_COST * QUANTITY)
Sent to FACT_MKT_SEGMENT_ORDERS:
All output ports
Transformation Name Type Description
Seq_ORDER_KEY Sequence Generator Sequence Generator transformation that populates the system-generated ORDER_KEY
Sent to FACT_MKT_SEGMENT_ORDERS:
NEXTVAL
Shortcut_to_FACT_MKT_SEGMENT_ORDERS Target Definition Fact table located in the EDWxx schema
Documented Results
Record the following for each run: Session Name, Rows Processed, Rows Failed, Start Time, End Time, Elapsed Time (Secs), Rows Per Second.
Sessions to document:
ETL Read Baseline
Original Session
Write to Flat File Test (Target)
Filter Test (Source)
Read Mapping Test (Source or Mapping)
Filter Test (Mapping)
Lab 10: Partitioning Workshop
Business Purpose
The support group within the IT Department has taken over the support of an ETL system that was recently put into production. During development the test data was not up to standard, so serious performance testing could not be accomplished. The system has been in production for a while and the support group has already taken some steps to optimize the sessions that have been running. The time window is still tight, so management wants the support group to look at partitioning some of the sessions to see if this would help.
Technical Description
The sessions/mappings that are in need of analysis are:
s_m_Target_Bottleneck_xx. This session reads in a relational source that contains customer account
balances for the year.
s_m_Items_Bottleneck_xx. This mapping reads a large flat file of items-sold data, filters out last year's stock, applies some row-level manipulation, performs a lookup to get cost information, and then loads the data into an Oracle table.
Note: The s_m_Items_Bottleneck_xx mapping is a hypothetical example. It does not exist in the
repository.
s_m_Source_Bottleneck_xx. This mapping reads in one relational source that contains customer
account balances and another relational source that contains customer demographic information. The
two tables are joined at the database side.
s_m_Mapping_Bottleneck_xx. This mapping reads in a flat file of order data, finds the customer
market segment information, filters out rows that haven't sold more than one item, aggregates the
orders and writes the values out to a relational table.
The support group needs to review each of these sessions to determine whether it makes sense to partition the session.
Objectives
Review the sessions and based on knowledge gained from the presentations determine what
partitioning, if any, should be done.
Duration
60 minutes
Object Locations
ProjectX folder
Workshop Scenarios
Scenario 1
The session in question, s_m_Target_Bottleneck_xx, has already been optimized, but it is felt that more can be done. The machine that the session runs on has 32 GB of memory and 16 CPUs. The mapping takes account data from a relational source, calculates various balances, and then writes the data out to the BalanceSummary table. The BalanceSummary table is an Oracle table that the DBA has partitioned by the account_num column.
Answer the following questions:
Question Answers
I. How many pipeline stages does this session contain?
II. What default partition points does this session contain?
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient?
IV. What partition types should be used and where?
V. In what way will this increase performance?
Review Partition Points
1. Edit the s_m_Target_Bottleneck_xx Session located in the wf_Target_Bottleneck_xx workflow.
2. Click the Mapping > Partitions tab to see the partition points.
3. Select each Transformation and look at the window at the bottom of the screen to see what partition
type is being used for that particular partition point.
Partition Test
The purpose of this section is to implement partitioning on the s_m_Target_Bottleneck_xx session.
1. Copy the wf_Target_Bottleneck_xx workflow and rename it to wf_Target_Bottleneck_Partition_xx
2. Edit the s_m_Target_Bottleneck_xx Session located in the wf_Target_Bottleneck_Partition_xx workflow and rename it to s_m_Target_Bottleneck_Partition_xx.
3. Click the Mapping tab, and then click the Partitions tab
4. On the Partitions tab, select the Shortcut_to_BalanceSummary transformation, click the Edit
Partition Point icon and add two new partitions
5. Select Key Range from the drop down box and click OK
6. Leave <**All**> selected in the Key Range drop down menu
7. Click on Edit Keys - this allows the definition of the columns that are going to be in the key range
8. Add the Account_num column to the Key Range and select OK
9. Input the following ranges for the 3 partitions
Partition #1 - start range 1, end range 3500
Partition #2 - start range 3500, end range 7000
Partition #3 - start range 7000
10. Select the SQ_Shortcut_to_Source2 partition Point and edit the partition point
11. Select Key Range from the drop down box
12. Add the Account_num column to the Key Range and select OK
13. Input the following ranges for the 3 partitions
Partition #1 - start range 1, end range 3500
Partition #2 - start range 3500, end range 7000
Partition #3 - start range 7000
14. Save, start, and monitor the workflow.
15. Compare the results against the original session results and against the indexed session results. Is there a performance gain?
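Note: With key range partitioning, the start of a range is inclusive and the end is non-inclusive, which is why adjacent partitions can share the boundary values 3500 and 7000: an Account_num of exactly 3500 falls into partition #2. Leaving the end range of partition #3 blank sends all remaining higher values to that partition.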
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 2
Note: The mapping shown in this scenario is a hypothetical example. It does not exist in the repository.
The session in question, s_m_Items_Bottleneck_xx, has been running slowly and the project manager wants it optimized. The machine that it runs on has 8 GB of memory and 4 CPUs. The mapping takes items-sold data from a large flat file, transforms it, and writes it out to an Oracle table. The flat file comes from one location and splitting it up is not an option. The second Expression transformation is very complex and takes a long time to push rows through.
Mapping Overview
Answer the following questions:
Question Answers
I. How many pipeline stages does this session contain?
II. What default partition points does this session contain?
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient?
IV. What partition types should be used and where?
V. In what way will this increase performance?
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 3
The session in question, s_m_Source_Bottleneck_xx, has been running slowly and the project manager wants it optimized. The machine that it runs on has 2 GB of memory and 2 CPUs. The mapping reads in one relational source that contains customer account balances and another relational source that contains customer demographic information. The tables are joined on the database side; the rows are then pushed through an Expression transformation and loaded into an Oracle table.
Mapping Overview
Answer the following questions:
Question Answers
I. How many pipeline stages does this session contain?
II. What default partition points does this session contain?
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient?
IV. What partition types should be used and where?
V. In what way will this increase performance?
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 4
The session in question, s_m_Mapping_Bottleneck_Sorter_xx, is still not running quite as fast as needed. The machine that it runs on has 24 GB of memory and 16 CPUs. The mapping reads a flat file source that is really 3 region-specific flat files being read from a file list. The rows are then passed through two lookups to obtain item costs and customer information. The data is then sorted and aggregated before being loaded into an Oracle table. The customer is part of the sort key and the DBA has
partitioned the Oracle table by customer_key. What can be done to further optimize this session/mapping?
Mapping Overview
Answer the following questions:
Question Answers
I. How many pipeline stages does this session contain?
II. What default partition points does this session contain?
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient?
IV. What partition types should be used and where?
V. In what way will this increase performance?
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Answers
Scenario 1
Question Answers
I. How many pipeline stages does this session contain? 3
II. What default partition points does this session contain? Source Qualifier, Target
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient? Yes
IV. What partition types should be used and where? Key Range at both the source and the target
V. In what way will this increase performance? This will add multiple connections to the source and target, which will result in data being read and written concurrently. This will be faster.
Scenario 2
Question Answers
I. How many pipeline stages does this session contain? 3
II. What default partition points does this session contain? Source Qualifier and Target
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient? Yes
IV. What partition types should be used and where? An additional pass-through partition point at the exp_complex_calculations transformation
V. In what way will this increase performance? This will add one more pipeline stage, which in turn gives an additional buffer to move data.
Scenario 3
Question Answers
I. How many pipeline stages does this session contain? 3
II. What default partition points does this session contain? Source Qualifier, Target
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient? No. Each partition takes at least 1-2 CPUs, and this machine has only 2.
IV. What partition types should be used and where? N/A
V. In what way will this increase performance? N/A
Scenario 4
Question Answers
I. How many pipeline stages does this session contain? 4
II. What default partition points does this session contain? Source Qualifier, Aggregator, and Target
III. Can partitions be added/deleted, or can the partition types be changed to make this more efficient? Yes
IV. What partition types should be used and where? 3 partitions:
- Key Range at the target.
- Split the source into the 3 region-specific files and read each one into one of the partitions.
- Hash Auto-Keys at the Sorter transformation. This also allows you to remove the partition point at the Aggregator if you like.
V. In what way will this increase performance? Additional connections at the target will load faster. You need to split the source flat file into the 3 region-specific files because you can have only one connection open to a flat file. Hash Auto-Keys is required to make sure that there is no overlap at the Aggregator; you could also remove the partition point at the Aggregator if you like. If the flat files vary significantly in size, you might consider adding a round-robin partition point somewhere, but in this particular mapping it does not make sense to do so.