PowerCenter 8.x Level I Developer Student Guide Version 04 April2008 Copyright (c) 2008 Informatica Corporation. All rights reserved. Printed in the USA. This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable. The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Informatica Corporation does not warrant that this documentation is error free. Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX, and SuperGlue are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. Portions of this software are copyrighted by DataDirect Technologies, 1999-2002. Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and University of California, Irvine, Copyright (c) 1993-2002, all rights reserved. Portions of this software contain copyrighted material from The JBoss Group, LLC. 
Your right to use such materials is set forth in the GNU Lesser General Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by Informatica, as-is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration is a registered trademark of Meta Integration Technology, Inc. This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2005 The Apache Software Foundation. All rights reserved. This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit and redistribution of this software is subject to terms available at http://www.openssl.org. Copyright 1998-2003 The OpenSSL Project. All Rights Reserved. The zlib library included with this software is Copyright (c) 1995-2003 Jean-loup Gailly and Mark Adler. The Curl license provided with this Software is Copyright 1996-2007, Daniel Stenberg, <Daniel@haxx.se>. All Rights Reserved. The PCRE library included with this software is Copyright (c) 1997-2001 University of Cambridge Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel. The source for this library may be found at ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre. InstallAnywhere is Copyright 2005 Zero G Software, Inc. All Rights Reserved. Portions of the Software are Copyright (c) 1998-2005 The OpenLDAP Foundation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted only as authorized by the OpenLDAP Public License, available at http://www.openldap.org/software/release/license.html. 
This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775 and other U.S. Patents Pending. DISCLAIMER: Informatica Corporation provides this documentation as is without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in this documentation at any time without notice.
Preface
Welcome to the PowerCenter 8 Level I Developer course. Data integration is a large undertaking with many potential areas of concern. The PowerCenter infrastructure will greatly assist you in your data integration efforts and alleviate much of your risk. This course will prepare you for that challenge by teaching you the most commonly used components of the product. You will build a small operational data store (ODS) using PowerCenter to extract from source tables and files, transform the data, load it into a staging area, and finally into the operational data store. The instructor will teach you about mappings, transformations, sources, targets, workflows, sessions, workflow tasks, connections, and the Velocity methodology.
- Use PowerCenter 8 Designer to build mappings that move data from sources to targets
- Use PowerCenter 8 Workflow Manager to build and run a workflow that runs a session based on a mapping
- Design mappings and workflows based on business needs
- Perform basic troubleshooting of PowerCenter mappings and transformations
- Use Informatica Support options to resolve questions and problems about Informatica PowerCenter 8
Audience
This course is designed for data integration and data warehousing implementers. You should be familiar with data integration and data warehousing terminology and with using Microsoft Windows software.
Document Conventions
This guide uses the following formatting conventions:
- > : Indicates a submenu to navigate to.
- boldfaced text. Example: Click the Rename button and name the new source definition S_EMPLOYEE.
- UPPERCASE: Database tables and column names are shown in all UPPERCASE. Example: T_ITEM_SUMMARY
- italicized text: Indicates a variable you must replace with specific information. Example: Connect to the Repository using the assigned login_id.
- Note: The following paragraph provides additional facts. Example: Note: You can select multiple objects to import by using the Ctrl key.
- Tip: The following paragraph provides suggested uses or a Velocity best practice. Example: Tip: The m_ prefix for a mapping name is ...
- Informatica Documentation
- Informatica Customer Portal
- Informatica web site
- Informatica Developer Network
- Informatica Knowledge Base
- Informatica Professional Certification
- Informatica Technical Support
The site contains information on how to create, market, and support customer-oriented add-on solutions based on interoperability interfaces for Informatica products.
Providing Feedback
Email any comments on this guide to education@informatica.com.
- support@informatica.com for technical inquiries
- support_admin@informatica.com for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
North America / South America Informatica Corporation Headquarters 100 Cardinal Way Redwood City, California 94063 United States Toll Free 877 463 2435 Standard Rate United States: 650 385 5800
Europe / Middle East / Africa Informatica Software Ltd. 6 Waltham Park Waltham Road, White Waltham Maidenhead, Berkshire SL6 3TN United Kingdom Toll Free 00 800 4632 4357 Standard Rate Belgium: +32 15 281 702 France: +33 1 41 38 92 26 Germany: +49 1805 702 702 Netherlands: +31 306 022 797 United Kingdom: +44 1628 511 445
Asia / Australia Informatica Business Solutions Pvt. Ltd. 301 & 302 Prestige Poseidon 139 Residency Road Bangalore 560 025 India Toll Free Australia: 00 11 800 4632 4357 Singapore: 001 800 4632 4357 Standard Rate India: +91 80 5112 5738
Table of Contents
Module 0. Course Introduction
Module 1. PowerCenter Overview
Module 2. Mapping Fundamentals
Module 3. Workflow Basics
Module 4. Expression and Filter Transformations
Module 5. Joining and Merging Data
Module 6. Lookup Transformations
Module 7. Sorter and Aggregator Transformations
Module 8. Using the Debugger
Module 9. Updating Target Tables
Module 10. Mapping Techniques
Module 11. Mapplets and Worklets
Module 12. Controlling Workflows
Module 13. Mapping Design Workshop
Module 14. Workflow Design Workshop
Module 15. PowerCenter 9.0 New Features
Course Introduction
0.1
L1D_20081124GV9
Introductions
- Logistics / site information
- Introductions: about you; how do you expect to benefit from this course?
Course Audience
- PowerCenter 8.x Level I Developer is designed for developers and consultants
- This course enables participants to use the principal features of Informatica PowerCenter 8 for integrating data between disparate applications
- This material assumes familiarity with database concepts and technology
Course Objectives
When you have completed this course, you should be able to:
- Use PowerCenter 8 Designer to build mappings that move data from sources to targets
- Use PowerCenter 8 Workflow Manager to build and run a workflow that runs a session based on a mapping
- Design simple mappings and workflows based on business needs
- Perform basic troubleshooting of PowerCenter mappings and transformations
- Use Informatica Support options to resolve questions and problems about Informatica PowerCenter 8
Course Agenda
1. PowerCenter Overview
2. Mapping Fundamentals
3. Workflow Basics
4. Expression and Filter Transformations
5. Joining and Merging Data
6. Lookup Transformations
7. Sorter and Aggregator Transformations
The Informatica support website provides:
- Product documentation
- Support
- Access to the ATLAS system
- Knowledge Base
- Access to the User Community
- User Group info
- Newsletters
- Debugging tools
- Velocity
Informatica Documentation
Can be accessed via:
- Product CD or download link
- Online help
- Documentation Center
Contains:
- Documented solutions to known technical issues
- Answers to frequently asked questions (FAQs)
- White papers
- Technical tips
- Support for generic or specific searches
On the Online Support page, navigate to the Service Requests tab and click the New button.
PowerCenter 8.x
- PowerCenter 8.x Upgrade
- PowerCenter 8.x Level I Developer
- PowerCenter 8 XML Support
- PowerCenter 8.x Level II Developer
- Introduction to PowerExchange
- PowerExchange Basics
- PowerCenter 8.5 Level I Administrator
- PowerCenter 8 High Availability
- PowerCenter 8 Team-Based Development
- Data Quality Assessment Using IDQ
- Informatica Data Quality 8.6 New Features
- PowerCenter QuickStart (eLearning)
- PowerCenter 8.5+ Administrator (4 days)
- PowerCenter Developer 8.x Level I (4 days)
- PowerCenter Developer 8 Level II (4 days)
- PowerCenter 8 Data Migration (4 days)
- PowerCenter 8 High Availability (1 day)
Architecture & Administration; Advanced Administration; Advanced Mapping Design; Enablement Technologies
Additional Training:
- PowerCenter 8.5 New Features
- PowerCenter 8.6 New Features
- PowerCenter 8 Upgrade
- Informatica Data Quality 8.6 Level I (4 days)
- Informatica Data Explorer 8.6 Level I (2 days)

Additional Training:
- Data Quality Assessment Using Informatica Data Explorer
- Data Quality Assessment Using Informatica Data Quality
- Informatica Data Quality 8.5 Cleansing Workshop
- Informatica Data Quality 8.5 Matching Workshop
- Informatica Data Quality 8.6 New Features
PowerCenter Overview
1.1
Module Objectives
After completing this module you will be able to:
- Explain the purposes of PowerCenter
- Define terms used in PowerCenter
- Name major PowerCenter components
The Problem
- Large organizations have a lot of data
- The data can be stored in many formats, including databases and unstructured files
- This data must be collated, combined, compared, and made to work as a seamless whole
- But the different databases don't talk to each other!
[Diagram: disparate systems such as Marketing (ORCL) and Billing (Sybase) that cannot exchange data]
Technical Note: Connector
A connector is a piece of custom software that performs two functions: it converts data from the format of one application to the format of another application, and it transports the data between the two applications.
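The two connector functions described above can be sketched in plain Python. This is only an analogy; the record layouts and field names are hypothetical examples, not anything from PowerCenter:

```python
def convert(marketing_row):
    """Format conversion: map one application's (hypothetical) record
    layout to another application's layout."""
    return {
        "CUST_ID": int(marketing_row["customer_number"]),
        "CUST_NAME": marketing_row["name"].upper(),
    }

def transport(row, target_table):
    """Transport: move the converted record to the receiving application
    (simulated here as an in-memory list)."""
    target_table.append(row)

billing_table = []
transport(convert({"customer_number": "42", "name": "Acme"}), billing_table)
```

Every pair of applications that must communicate needs its own converter of this kind, which is exactly the maintenance burden PowerCenter is designed to remove.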
Informatica PowerCenter
Informatica PowerCenter is the premium data integration solution available today.
- Database neutral: will communicate with any database
- Powerful data transformations: convert one application's data to another's format
[Diagram: Informatica PowerCenter connecting Manufacturing (DB2), Marketing (ORCL), Accounting (SAP), Inventory (SQL Server), Resource Planning (PSFT), Billing (Sybase), and Sales (SalesForce)]
[Diagram: Accounting (old) → Informatica PowerCenter → Accounting (new)]
[Diagram: Billing A → Informatica PowerCenter → Billing B]
[Diagram: Marketing (ORCL) and Billing (Sybase) → Informatica PowerCenter → Data Warehouse]
In addition to the examples given on these slides, PowerCenter is deployed for:
- Data Synchronization: ongoing exchange of data between disparate applications
- Data Hubs: master data management; reference data hubs; single view of customer, product, supplier, employee, etc.
- Business Activity Monitoring: business process improvement, real-time reporting
Decision Support
[Diagram: OLTP systems → Extract → Transform → Load (ETL) → Data Warehouse]
- Source (OLTP): transaction data; optimized for transaction response time; current; normalized or de-normalized data
- Transform: aggregate data, cleanse data, consolidate data, apply business rules, de-normalize data
ETL: Extract
- PowerCenter reads data, row by row, from a table (or group of related tables) in a database, or from a file
- This database or file is referred to as the source
- The structure of the source is contained in a source definition object
[Diagram: Source → Informatica PowerCenter (Extract)]
ETL: Transform
- PowerCenter converts the rows into a format the second (target) system will be able to use
- The logic for this conversion is defined in transformation objects
[Diagram: Source → Informatica PowerCenter (Extract → Transform)]
ETL: Load
- PowerCenter writes data, row by row, to a table (or group of related tables) in a database, or to a file
- This database or file is referred to as the target
- The structure of the target is contained in a target definition object
[Diagram: Source → Informatica PowerCenter (Extract → Transform → Load) → Target]
Mapping
A set of transformations, in sequence or in parallel, that move and transform data from one or more sources to one or more targets. Mappings exist entirely inside PowerCenter.
[Diagram: a mapping inside Informatica PowerCenter: Source(s) → transformations → Target(s)]
Mappings
A mapping logically defines the ETL process. It reads data from sources, applies transformation logic to the data, and writes the transformed data to targets.
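As a loose analogy (plain Python, not PowerCenter objects), the extract-transform-load flow a mapping defines looks like this; the field names and data are made up for illustration:

```python
# Extract: rows as read from the source (native types are strings here)
source = [{"id": "1", "amount": "10.50"},
          {"id": "2", "amount": "4.25"}]

def transform(row):
    # Transformation logic: convert native source values to typed values
    return {"ID": int(row["id"]), "AMOUNT": float(row["amount"])}

target = []                      # Load: rows written to the target
for row in source:               # PowerCenter moves data row by row
    target.append(transform(row))
```

The mapping plays the role of the `transform` function plus the wiring between source, transformations, and target.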
Transformations
Transformations receive data and transform it:
- Generate new fields
- Modify data
- Select and pass data
Session
The object that runs a mapping.
[Diagram: a Session runs a Mapping, which contains Transformations]
Workflow
An ordered set of one or more sessions and other tasks, designed to accomplish an overall operational purpose
[Diagram: a Workflow contains Sessions and other tasks; each Session runs a Mapping, which contains Transformations]
Tasks
A task is an executable set of actions, functions, or commands. A session is a task that runs a mapping. Other tasks include:
- Command: runs a shell script
- Email: sends an email
- Decision: branches a workflow conditionally
- Timer: waits for a defined period
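The relationship between a workflow and its tasks can be mimicked with a toy Python scheduler. This is illustrative only: PowerCenter tasks are configured graphically, and the function names below are just analogies:

```python
log = []

def session():  # analogous to a Session task running a mapping
    log.append("session ran mapping")

def command():  # analogous to a Command task running a shell script
    log.append("command ran shell script")

def email():    # analogous to an Email task
    log.append("email sent")

def decision(): # analogous to a Decision task's condition
    return len(log) > 0

# A workflow is an ordered set of tasks; the Decision picks one branch.
workflow = [session, command]
for task in workflow:
    task()
if decision():
    email()
```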
Metadata
Defines data and processes. Examples:
- Source and target definitions
  - Type (flat file, database table, XML file, etc.)
  - Datatype (character string, integer, decimal, etc.)
  - Other attributes (length, precision, etc.)
[Diagram: the Repository stores the metadata]
Note
The repository is implemented as a schema, which may reside in any of a number of supported relational database management systems.
Metadata
The word metadata literally means data about data. It is the information that describes data. Common contents of metadata include the source of a dataset, how it should be accessed, and its limitations.
Metadata in PowerCenter
PowerCenter uses metadata to define sources, targets, transformations, mappings, and workflows.
[Diagram: PowerCenter domain: Sources, Targets, Integration Service, Repository Service, Repository, and PowerCenter Client, connected via TCP/IP, HTTPS, and native drivers]
- Sources and Targets: can be relational tables or heterogeneous files (such as flat files, VSAM files, or XML)
- Integration Service: the engine which performs all the ETL logic
- Repository Service: manages connectivity to metadata repositories that contain mapping and workflow definitions; a multithreaded process that retrieves, inserts, and updates repository metadata
- Repository: contains all the metadata needed to run the ETL process
Client Tools
Desktop tools used to populate the repository with metadata, execute workflows on the Integration Service, monitor the workflows, and manage the repository
Development Client Tools
- Repository Manager: manage repository connections, folders, objects, users and groups (in PowerCenter 8.1)
- Designer
- Workflow Manager
- Workflow Monitor
Note
Designer and Repository Manager access the repository through the Repository Service. Workflow Manager and Workflow Monitor connect to the Integration Service. Each client has its own user interface. The UIs typically have toolbars, a navigation window to the left, a workspace to the right, and an output window at the bottom.
[Screenshot: Designer interface, showing the transformation toolbar]
Single login to client applications: click the icons to open other tools; they are already connected to the repository.
Designer Tools
- Source Analyzer: create source objects
- Target Designer: create target objects
- Transformation Developer: create reusable transformations
- Mapplet Designer: create mapplets
Transformation Views
- Iconized: shows the transformation in relation to the rest of the mapping
- Normal: shows the flow of data through the transformation
- Edit: shows transformation ports and properties; allows editing
Ports represent table columns or file fields.
[Screenshot: Designer workspace, showing the Navigator Window, Status Bar, and Output Window]
[Screenshot: Workflow Manager tools for creating worklets and workflows]
[Screenshot: Workflow Monitor, showing the Navigator Window, Output Window, Task View, and Time Window]
[Screenshot: client tool window, showing the Navigator Window, Status Bar, Output Window, and Main Window]
Class Scenario
- Business Function: Staging. Data format similar to OLTP, used for populating the ODS. DB schema: STGxx
- Business Function: ODS/EDW. The Operational Data Store / Enterprise Data Warehouse holds normalized data in an enterprise data model that aligns data from various OLTP systems. DB schema: ODSxx
- Business Function: DDW. The Dimensional Data Warehouse uses fact and dimension tables in second normal form to speed report generation and allow for historical data; covered in the Level II Developer class.
- DB schema: SDBU
Note
In the labs for this course, we are simulating part of the creation of a (very simple) Dimensional Data Warehouse. In these labs, you will begin with data in OLTP tables and flat files, bring data to Staging, and from Staging (STG) to the Operational Data Store (ODS). Because creation of Staging tables is fairly trivial, you will do more work on moving data from STG to ODS. This will provide more realistic uses of the capabilities of PowerCenter.
Summary
This module showed you how to:
- Explain the purposes of PowerCenter
- Define terms used in PowerCenter
- Name major PowerCenter components
Mapping Fundamentals
2.1
Module Objectives
After completing this module you will be able to:
- Create source and target definitions from flat files and relational tables
- Create a mapping using existing source and target definitions
- Use links to connect ports
- Transaction Control: allows data-driven commits and rollbacks
- Java: allows Java code to be used within PowerCenter
- Midstream XML Parser: parses XML anywhere in a mapping
- Midstream XML Generator: creates XML anywhere in a mapping
- More Source Qualifiers: read from XML, message queues, and applications
PowerCenter Designer
Provides tools to define and manipulate:
- Sources
- Targets
- Transformations
- Mappings
- Other objects
Repositories
Objects are stored as metadata in repositories. Within a repository, objects are organized in folders.
[Screenshot: a repository with folders and subfolders in the Navigator]
Repository Management
Repositories are not created and managed in the Designer application. They are created in the Administration Console application, and managed in the Repository Manager application.
Folder Management
Folders are created and managed in the Repository Manager application. Do not confuse repository folders with the directories visible in Windows Explorer. The folders are PowerCenter repository objects and are not related to Windows directories. Technically, all folders are shared with all users who have the appropriate folder permissions, regardless of the blue arm icon. The blue arm icon indicates that the folder permits shortcuts, dynamic links to the objects contained in that folder used by mappings in other folders.
Shortcut Folders
Source Definitions
- Defines the structure of a data source such as a relational database table or a flat file
- Created using the Source Analyzer in the PowerCenter Designer application
- Enables you to preview the data in the source
Note
Two sources from different systems may use the same name. Placing each source in a folder based on its connection type avoids confusion when this is the case.
Source Analyzer
The Designer tool for creating source definitions. Import source definitions from:
- Relational databases
- Flat files
- XML sources
- COBOL sources
- Applications such as SAP, Siebel, and PeopleSoft
- PowerExchange mainframe sources
Source Qualifier
- Type: Active
- Description: Mandatory for all flat file and relational sources in a mapping. Selects records from flat file and relational table sources. For relational tables, creates a SQL SELECT statement. Converts native source datatypes to PowerCenter transformation datatypes.
[Screenshot: Source Qualifier ports]
Target Definitions
- Define the structure of a target such as a relational database table or a flat file
- Created using the Target Designer in the PowerCenter Designer application
- Can be:
  - Copied from a source object
  - Imported from a relational database
  - Imported from a flat file
  - A shortcut to a target in another folder
Target Designer
The Designer tool for creating targets. Wizards import target definitions from:
- Flat files
- Relational database tables
- XML
- Applications such as SAP BF and MQ Series
Shortcuts
A shortcut is a dynamic link to the original object, usually a source or target definition.
- To create a shortcut, drag the object to another open folder or any workspace where the object type is allowed
- Shortcuts appear in the Navigator window with a small curved arrow
Datatypes
- PowerCenter must know the datatypes used internally by both source and target systems; these are called native datatypes
- The Source Qualifier converts the source data to a standard format used internally by PowerCenter; this is the integration (or transformation) datatype
- The target definition object converts it to the native datatype of the target system
Note
The integration datatype standardizes transformations, and is easily translated from and to the native datatypes of application databases
Datatype Conversion
Native datatypes: specific to the source and target database types; displayed in source and target tables within Mapping Designer.
Transformation datatypes: allow mix and match of source and target database types. When connecting ports, native and transformation datatypes must be compatible (or must be explicitly converted).
Datatype Conversion
Datatypes can be converted by:
- Passing data between ports with different datatypes
- Passing data from an expression to a port
- Using transformation functions
- Using arithmetic operators
The following type conversions are supported:
- Numeric datatypes ↔ other numeric datatypes
- Numeric datatypes ↔ string
- Date/Time ↔ date or string
For further information, in the PowerCenter client, consult the online help index entry for port-to-port data conversion.
Transformation Ports
Data passes into and out of transformations through input and output ports
Input Ports Output Ports
Passive Transformations
- One row comes in, one row goes out: one-for-one input to output
- Same number of rows output as input
[Diagram: 3 rows in, 3 rows out]
Note
The transformation is considered passive regardless of what transformations take place within a row, provided only that the rows going out are a one-for-one match with the rows going in.
Examples
Active Transformations
- Many rows in, many rows out
- May not be the same number output as input
[Diagram: rows dropped or combined as they pass through an active transformation]
Examples
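In plain Python terms (not PowerCenter syntax), the passive/active distinction looks like this; the transformation names in the comments are typical examples:

```python
rows = [1, 2, 3, 4, 5]

# Passive (e.g. Expression): one row out per row in; row count preserved
expressed = [r * 10 for r in rows]           # 5 rows in, 5 rows out

# Active (e.g. Filter, Aggregator): row count may change
filtered = [r for r in rows if r % 2 == 0]   # 5 rows in, 2 rows out
total = [sum(rows)]                          # 5 rows in, 1 row out
```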
Velocity Methodology
Informaticas Velocity methodology includes: Templates
Mapping specification templates Source to target field matrix
Naming conventions
Object type prefixes: m_, exp_, agg_, wfl_, s_,
Best practices
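A prefix convention like this is easy to check mechanically. A hypothetical sketch (the prefix table simply restates the Velocity prefixes listed above; the function is not part of any Informatica tool):

```python
# Velocity object-type prefixes as listed above
PREFIXES = {"mapping": "m_", "expression": "exp_", "aggregator": "agg_",
            "worklet": "wfl_", "session": "s_"}

def follows_convention(object_type, name):
    """Return True if the object name carries its Velocity prefix."""
    return name.startswith(PREFIXES[object_type])

assert follows_convention("mapping", "m_load_customers")
assert not follows_convention("session", "load_customers")
```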
Velocity Phases
Velocity covers the entire data integration project lifecycle:
- Phase 1: Manage
- Phase 2: Architect
- Phase 3: Design
- Phase 4: Build
- Phase 5: Deploy
- Phase 6: Operate
For more information, see http://devnet.informatica.com
Note
In Velocity, the standard is to begin the names of all Source Qualifier objects with SQ_, followed by the name of the source. The Mapping Designer automatically names the Source Qualifier this way when you drag a Source object onto the Mapping canvas.
Summary
This module showed you how to:
- Create source and target definitions from flat files and relational tables
- Create a mapping using existing source and target definitions
- Use links to connect ports
Workflow Basics
3.1
Module Objectives
After completing this module you will be able to:
- Create a basic Workflow and link its tasks
- Run a Workflow, monitor its execution, and verify the results
PowerCenter Tasks
- Session: run the logic of a mapping
- Command: run external commands
- Email: send an email to a defined recipient
- Decision: choose between paths in a Workflow
- Assignment: assign values to variables
- Timer: wait or pause for a specified time
- Control: terminate or fail a Workflow
- Event Wait: wait for an event
- Event Raise: cause an Event Wait task to trigger
Workflow Object
Executes a series of Mappings (as Sessions) and other tasks
Workflow Manager
The Workflow Manager is the PowerCenter application that enables designers to build and run workflows
Can be launched from Designer by clicking the W icon
Workflow Designer
The tool in Workflow Manager where you create Workflow objects
Start Task
Is always the first task in a Workflow
Session Task
Implements the execution of a Mapping
Links in Workflows
Indicate the flow of control from one task to the next. The flow may branch either:
- Unconditionally (multiple links are followed from a single task)
- Using Decision tasks (only one branch is followed)
Running a Workflow
Right-click in the Workflow and select Start Workflow. This passes control to the Workflow Monitor.
Workflow Monitor
Displays all Workflows and tasks in real time. Provides access to logs and results.
Summary
This module showed you how to:
- Create a basic Workflow and link its tasks
- Run a Workflow, monitor its execution, and verify the results
Expression and Filter Transformations
Module Objectives
After completing this module you will be able to:
- Use Expression transformations to perform calculations on a row-by-row basis
- Use Filter transformations to pass rows based on user-defined conditions
Expression Transformation
Performs row-level calculations (no aggregate functions)
- Type: Passive
- Description: Modifies individual ports (columns) within a single row. Can add and suppress ports. Cannot perform aggregation across multiple rows.
Business Purpose
Use the logical and arithmetic operators and built-in functions for:
- Character manipulation (concatenate, truncate, etc.)
- Datatype conversion (to char, to date, etc.)
- Data cleansing (check nulls, replace strings, etc.)
- Data manipulation (round, truncate, etc.)
- Numerical calculations
- Scientific calculations
- Special functions (lookup, decode, etc.)
- Testing (for spaces, numbers, etc.)
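Rough Python analogues of a few of these row-level operations (this is not PowerCenter expression syntax, and the field names are made up):

```python
row = {"FIRST": "ada ", "LAST": "lovelace", "AGE": "36"}

# Character manipulation and cleansing: trim, concatenate, fix case
name = (row["FIRST"].strip() + " " + row["LAST"]).title()

# Datatype conversion: string to numeric
age = int(row["AGE"])

# Conditional logic (IIF-style)
flag = "ADULT" if age >= 18 else "MINOR"
```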
Expression Editor
An expression is a calculation or conditional statement for a specific port. It can contain other ports, functions, operators, variables, constants, and return values from other transformations.
Comments
Comments can be added to expressions by prefacing them with //. This allows later developers to understand the logic behind an expression.
Functions Provided
- Character manipulation, e.g. CONCAT, LTRIM, UPPER
- Datatype conversion, e.g. TO_CHAR, TO_DECIMAL
- Detect and correct errors, e.g. ISNULL, REPLACECHR
- Manipulate dates, e.g. GET_DATE_PART, DIFF_DATES
- Mathematical operations, e.g. LOG, POWER, SQRT
- More mathematical operations, e.g. SIN, COS, TAN
- Special constructs, e.g. IIF, DECODE
- Test values, e.g. ISNULL, IS_DATE, IS_NUMBER
- Update variables, e.g. SETVARIABLE, SETMINVARIABLE
Tip
Highlighting a function and pressing F1 will launch the online help at the selected function section.
Note
All expressions resolve to a single value of a specific datatype. For example, the expression LENGTH('HELLO WORLD') / 2 returns the numerical value 5.5.
Variable Ports
Use to:
- Simplify complex expressions
  - Example: extract the month from a date for use in several output ports
Variable ports are not visible in Normal view, only in Edit view.
7 of 19
Note
Variable ports cannot be output directly. To output the contents of a variable port, create an output port whose value is the variable.
Creating
A transformation variable is created by creating a port and selecting the V check box. When V is checked, the I and O check boxes are unavailable (grayed out), indicating that a variable port cannot be used for input or output.
4.8
Variables are initialized (numeric to 0, string to '') when the Mapping logic is processed
8 of 19
Note
When a record is processed, the expression is evaluated and the result is assigned to the variable port. The result must be compatible with the port's datatype. The variable persists across the set of records, and may be used or modified anywhere in the set of records.
4.9
Order of Evaluation
PowerCenter evaluates ports in the following order:
Input and Input/Output ports Variable ports Output ports
Variable ports are evaluated in the order they appear in the Ports tab Order of evaluation is critical when one variable refers to another
A reference to a variable that has not yet been evaluated will use the value from the previous row. This value is always NULL for the first row in the data stream, which can cause errors!
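The previous-row behavior can be sketched in Python. This is a hypothetical simulation for illustration only, not PowerCenter code; the port names are invented:

```python
# Hypothetical sketch of variable-port evaluation order.
# v_half appears BEFORE v_double on the Ports tab, so a reference to
# v_double inside v_half sees the PREVIOUS row's value (NULL/None at first).

def process(rows):
    state = {"v_half": None, "v_double": None}  # variable ports persist across rows
    out = []
    for amount in rows:
        # Evaluated first: v_double still holds the previous row's value
        state["v_half"] = state["v_double"]
        # Evaluated second, in Ports-tab order
        state["v_double"] = amount * 2
        out.append(state["v_half"])
    return out

print(process([1, 2, 3]))  # [None, 2, 4]
```

The first output is None, mirroring the NULL a forward reference produces on the first row.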
9 of 19
4.10
Expression Validation
The Validate and OK buttons in the Expression Editor both parse the current expression to:
Resolve references to ports in other transformations Parse default values Check spelling, correct number of arguments in functions, other syntactical errors
10 of 19
4.11
Expression Example 1
Check, Clean, and Record Errors
Clean up item name
Some item names are in UPPERCASE, some in lower case, some in MiXEd They should all be in Title Case
Missing data
Some records are incomplete
Invalid dates
Sometimes dates are input in an invalid format
Invalid numbers
Certain numeric fields sometimes contain non-numeric data
Reporting
Need a count of the changes to item names Incorrect and missing data should be tagged and a report generated
11 of 19
4.12
12 of 19
4.13
Expression Example 2
Calculate Sales Discounting and Inventory Days
Discount tracking
Compare the suggested sale price to the actual sale price to determine the level of discounting Create a field that tracks this comparison for reporting
Days in Inventory
Determine how long a given item has been in inventory
13 of 19
4.14
Strings are more expensive for the server to process than mathematical calculations Avoid default data type/size for new ports (string, 10) unless applicable
14 of 19
4.15
Filter Transformation
Passes rows conditionally
15 of 19
Type Description
Active Rows which meet the filter condition are passed through to the next transformation; rows which do not meet the filter condition are skipped.
Business Purpose
A business may choose not to process records which do not meet a data quality criterion.
4.16
Filter Example
Existing customer records need to be updated to reflect changes to columns such as address
However, only existing customer records are to be updated not new customer records
Use a Filter transformation to pass only customer records with a legitimate master customer ID number
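The customer-ID example can be sketched in Python. This is a hypothetical illustration of Filter semantics, not PowerCenter code; the field names are invented:

```python
# Hypothetical sketch of a Filter transformation: rows where the
# condition is FALSE are dropped, not passed downstream.

def filter_rows(rows, condition):
    return [row for row in rows if condition(row)]

customers = [
    {"name": "Ada", "master_id": 101},   # existing customer
    {"name": "Bob", "master_id": None},  # new customer, no master ID yet
]
# Pass only records with a legitimate master customer ID
existing = filter_rows(customers, lambda r: r["master_id"] is not None)
print([r["name"] for r in existing])  # ['Ada']
```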
16 of 19
4.17
17 of 19
4.18
18 of 19
4.19
Summary
This module showed you how to: Use Expression transformations to perform calculations on a row-by-row basis Use Filter transformations to pass rows based on user-defined conditions
19 of 19
4.20
20 of 19
5.1
5.2
Module Objectives
After completing this module you will be able to: Use pipeline branches in Mappings Join homogeneous Sources using a Source Qualifier Define heterogeneous joins using the Joiner transformation Merge records from multiple places into one record set using the Union transformation Create and use reusable transformations
2 of 22
5.3
Source Pipelines
Each Source Qualifier transformation starts a single Source pipeline
A single Mapping can have multiple Source pipelines Each pipeline must terminate with at least one Target Transformations can split one source pipeline into multiple pipeline branches
[Diagram: one mapping containing a single source pipeline that splits, after the Source Qualifier (SQ), into two pipeline branches]
3 of 22
5.4
Multiple Pipelines
A single Mapping may contain more than one pipeline
[Diagram: one mapping containing two source pipelines, each starting at its own Source Qualifier (SQ)]
4 of 22
5.5
Homogeneous Joins
Homogeneous joins combine data from tables in the same database related by a common field The join is specified in the Source Qualifier transformation The join is performed on the Source database at runtime
When SQL generated by the SQ transformation executes
[Diagram: Source 1 and Source 2 feeding a single Source Qualifier (SQ)]
5 of 22
5.6
6 of 22
5.7
7 of 22
5.8
Heterogeneous Joins
Heterogeneous joins are joins using dissimilar sources, such as
Oracle table and DB2 table Flat file and database table Two flat files
Use a Joiner transformation (performs the join within the mapping) One source is designated the Master, the other Detail The Joiner selects rows from the two sources based on a join condition, such as a matching ID field
Join Results
8 of 22
Pipelines
A Joiner transformation combines two pipelines into a single pipeline. Specifically, the pipeline from the Master source ends at the Joiner, flowing into the pipeline from the Detail source.
5.9
Joiner Transformation
Performs heterogeneous joins on two data flows
Ports Input or Input/Output One Source is designated Master, the other Detail M property indicates ports from the Master Source when checked
9 of 22
Type Description
Active Combines fields from two data sources into a single combined data source, based on one or more common fields called the join condition
Business Purpose
Enables data from different systems to be combined to achieve desired structure and results
5.10
Joiner Example
Sales transaction data resides on a flat file Product data resides on a relational table Sales transactions require product data
10 of 22
5.11
Join Types
Normal (inner) join - keeps only matching rows based on the condition
Master outer join - keeps all rows from Detail and matching rows from Master
Detail outer join - keeps all rows from Master and matching rows from Detail
Full outer join - keeps all rows from both Master and Detail
11 of 22
5.12
12 of 22
5.13
13 of 22
5.14
Joiner Cache
Two types of cache memory: index cache and data cache All rows from the master Source are read into cache
Index cache contains values for all ports from the master Source which are part of the join condition Data cache contains those port values not specified in the join condition
After the cache is loaded, each row in the detail Source is compared to the values in the index cache Upon a match, the rows from the data cache are included in the outgoing data stream
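The cache-then-probe sequence above can be sketched in Python for a normal (inner) join. This is a hypothetical illustration, not PowerCenter internals; the function and field names are invented:

```python
# Hypothetical sketch of the Joiner cache: master rows are cached first,
# keyed on the join-condition value, then each detail row streams through
# and is probed against the cache.

def cached_join(master, detail, key):
    cache = {}  # "index cache" (join-condition values) -> cached master rows
    for row in master:
        cache.setdefault(row[key], []).append(row)
    out = []
    for d in detail:                      # detail rows stream through
        for m in cache.get(d[key], []):   # probe the index cache for a match
            merged = dict(m)              # matched "data cache" values join
            merged.update(d)              # the outgoing data stream
            out.append(merged)
    return out

products = [{"product_id": 1, "name": "Widget"}]               # master source
sales = [{"product_id": 1, "qty": 5}, {"product_id": 2, "qty": 1}]  # detail
print(cached_join(products, sales, "product_id"))
# [{'product_id': 1, 'name': 'Widget', 'qty': 5}]
```

The unmatched detail row (product_id 2) is dropped, as in a normal join.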
14 of 22
5.15
15 of 22
5.16
Union Transformation
Merges row sets from multiple pipelines
16 of 22
Type Description
Active Merges data from multiple pipelines or pipeline branches to a single pipeline branch, similar to the SQL statement UNION ALL. Does not remove duplicate rows.
Business Purpose
Enables you to convert data from multiple sources into a single rowset
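The UNION ALL behavior can be sketched in Python. This is a hypothetical illustration of Union semantics, not PowerCenter code:

```python
# Hypothetical sketch of the Union transformation: like SQL UNION ALL,
# pipelines are concatenated and duplicate rows are NOT removed.

def union_all(*pipelines):
    out = []
    for p in pipelines:
        out.extend(p)  # all groups share an identical set of ports
    return out

us_orders = [{"product_id": 1}, {"product_id": 2}]
eu_orders = [{"product_id": 2}]          # duplicate of a US row
merged = union_all(us_orders, eu_orders)
print(len(merged))  # 3 -- the duplicate row is kept
```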
5.17
Union Groups
Create groups on the Groups tab Each group has an identical set of ports
17 of 22
5.18
Group Ports
Create on the Group Ports tab
18 of 22
5.19
Union Ports
Each group port created appears in the Output group (e.g., PRODUCT_ID). Copies of each port appear in the Input groups (e.g., PRODUCT_ID2, PRODUCT_ID3).
19 of 22
5.20
Reusable Transformations
Create in Transformation Developer Or create in Mapping Designer and promote
Listed in Transformations node of navigator Drag and drop into any mapping
20 of 22
Warning
Instances of a reusable transformation inherit any changes made to the reusable transformation. As a result, changing a reusable transformation may invalidate the mappings containing instances of it.
Note
To make a non-reusable copy of a reusable transformation, hold the Ctrl key while dragging and dropping.
5.21
21 of 22
5.22
Summary
This module showed you how to: Use pipeline branches in Mappings Join homogeneous Sources using a Source Qualifier Define heterogeneous joins using the Joiner transformation Merge records from multiple places into one record set using the Union transformation Create and use reusable transformations
22 of 22
Lookup Transformations
6.1
6.2
Module Objectives
After completing this module you will be able to: Use Lookup transformations to bring in additional data related to a row
2 of 12
6.3
Lookup Functionality
[Diagram: an input value enters the Lookup transformation, lookup value(s) are retrieved from the lookup data, and the row exits with the lookup values appended]
Lookup Transformation
Lookup condition: ITEM_ID = IN_ITEM_ID AND PRICE <= IN_PRICE
3 of 12
6.4
Lookup Transformation
Returns values from a database table or flat file associated with a given input value
Ports
Mixed
Check the L column for ports whose values are to be looked up
Usage: Returns related values (if a value is not found, returns NULL)
4 of 12
Type Description
Passive Allows the inclusion of additional information in the transformation process from an external database or flat file source. In SQL terms, may be thought of as a subquery. May be connected, unconnected, or dynamic.
Business Purpose
Allows data from external sources such as product codes, dates, names, etc., to be brought into the row being processed.
6.5
Lookup Condition
Compares one or more input fields with fields in the lookup source
Similar to a SQL WHERE clause
5 of 12
6.6
Lookup Cache
Uses index and data cache
Index cache contains values from all ports which are part of the lookup condition Data cache contains values from all output ports which are not part of the lookup condition
After the cache is loaded, values from the Lookup input port(s) that are part of the lookup condition are compared to the index cache
When a match is found, the rows from the cache are included in the stream
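A cached lookup can be sketched in Python. This is a hypothetical illustration, not PowerCenter internals; the function and field names are invented:

```python
# Hypothetical sketch of a cached Lookup: the lookup source is read into
# a cache keyed on the condition port(s); a miss returns None, mirroring
# the NULL PowerCenter returns when no row matches the lookup condition.

def build_lookup_cache(lookup_rows, key, value):
    return {row[key]: row[value] for row in lookup_rows}

items = [{"item_id": 10, "price": 9.99}]      # stands in for the lookup source
cache = build_lookup_cache(items, "item_id", "price")

print(cache.get(10))  # 9.99
print(cache.get(99))  # None -- value not found in the lookup source
```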
6 of 12
Note
Caching for lookup tables is an option. When the lookup references a flat file, caching is always performed.
6.7
Persistent cache can improve performance, but stale data may pose a problem
Module 6: Lookup Transformations
7 of 12
6.8
Caching Options
8 of 12
Lookup SQL Override - Overrides the default SQL used to query the lookup table; applies only with caching enabled.
Lookup Cache Directory Name - The location on disk where files associated with the lookup cache are stored.
Lookup Data and Index Cache Size - The lookup cache is divided into an index cache and a data cache. The cache sizes represent upper boundaries on how much of the index and data caches will reside in memory. Any overflow is written to disk.
6.9
9 of 12
Lookup Table Name - The name of the table from which the transformation looks up values.
Lookup Policy on Multiple Match - What to do when the transformation finds multiple rows that match the lookup condition: use the first row, use the last row, use any value, or report an error.
Lookup Condition - Displays the condition set in the Condition tab.
Connection Information - Specifies the database containing the lookup table. Can use exact connection information or the $Source and $Target variables.
6.10
10 of 12
Datetime Format - Defaults to MM/DD/YYYY HH24:MI:SS.
Thousand Separator - Defaults to no separator. Can be set to a comma or a period (full stop).
Decimal Separator - Defaults to a period. Can be set to a comma.
Case-Sensitive String Comparison - If selected, the Integration Service differentiates between upper and lower case when matching lookup conditions.
Null Ordering - Determines whether null values are considered high or low. Defaults to high.
Sorted Input - Indicates whether the lookup file data is sorted. If it is, then checking this box makes the lookup more efficient.
6.11
11 of 12
6.12
Summary
This module showed you how to: Use Lookup transformations to bring in additional data related to a row
12 of 12
7.1
7.2
Module Objectives
After completing this module you will be able to: Calculate values based on data in a set of records using the Aggregator transformation Order a set of records based on one or more fields using the Sorter transformation
2 of 12
7.3
Sorter Transformation
Sort Order
Sort Keys
Ports Input/Output Define one or more sort keys Define sort order for each key
3 of 12
Type Description
Active Sorts incoming data based on one or more key values. Sort order may be ascending, descending, or mixed.
Business Purpose
Use before an Aggregator transformation to improve overall performance. The Sorter transformation is often more efficient than adding an ORDER BY clause to the Source Qualifier.
7.4
4 of 12
Case Sensitive - Determines whether the Sorter differentiates between upper and lower case characters.
Work Directory - A directory where the Integration Service will create temporary files when sorting data.
Distinct - Treats output rows as distinct. If this is selected, all ports are considered part of the sort key.
Transformation Scope - Transaction: applies the transformation logic to all rows in a transaction. All Input: applies the transformation logic to all incoming data, regardless of incoming transaction boundaries.
Other Properties
7.5
Sorter Cache
All incoming data is read into cache memory before the sort operation is performed. The size of the cache memory is set by the Sorter Cache Size property.
May be from 1 MB to 4 GB If the cache size is larger than the available amount of memory, the Integration Service fails the session
If the size of the incoming data is greater than the cache size, PowerCenter uses temporary files
The location of these files is set using the Work Directory property
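A multi-key sort with mixed sort order can be sketched in Python. This is a hypothetical illustration of Sorter semantics, not PowerCenter internals:

```python
# Hypothetical sketch of a Sorter with two sort keys:
# key 1 ascending, key 2 descending.

rows = [("B", 1), ("A", 2), ("A", 5)]
# Negating the numeric key yields a descending order for that key only
rows.sort(key=lambda r: (r[0], -r[1]))
print(rows)  # [('A', 5), ('A', 2), ('B', 1)]
```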
5 of 12
7.6
Aggregator Transformation
Performs aggregate calculations
Ports Mixed I/O ports allowed Variable ports allowed Group By allowed Create aggregate expressions in non-input ports Usage Standard aggregations
6 of 12
Type Description
Active Calculates aggregates such as sums, averages, minimums and maximums, across multiple groups of rows.
Business Purpose
Enables calculation of gross profits or margins, summaries by period, average values, etc.
7.7
Aggregator Properties
7 of 12
Cache Directory - Local directory for the index and data cache files.
Tracing Level - Amount of detail included in the session log.
Sorted Input - Indicates input data is presorted by group. Use only if the mapping passes sorted data to the Aggregator.
Data Cache Size - Data cache size for the transformation. Default is Auto.
Index Cache Size - Index cache size for the transformation. Default is Auto.
Transformation Scope - Transaction: applies the transformation logic to all rows in a transaction. All Input: applies the transformation logic to all incoming data.
7.8
Aggregator Cache
Two types: Index and Data
Index contains group by port values Data cache contains all port values, including variable and connected output ports
Non-group-by input ports used in non-aggregate output expressions
Non-group-by input/output ports
Local variable ports
Ports containing an aggregate function (counted three times when sizing the cache)
One output row is returned for each unique occurrence of the group by ports
8 of 12
Key Points
If there is not enough memory specified in the index and data cache properties, overflow is written to disk No rows are returned until all rows are aggregated Checking the sorted input attribute bypasses caching, as well as the sort operation that occurs implicitly in an Aggregator
7.9
Aggregate Expressions
Aggregate functions
AVG COUNT FIRST LAST MAX MEDIAN MIN PERCENTILE STDDEV SUM VARIANCE
Conditional aggregate expressions are supported. Conditional SUM format: SUM(value, condition)
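The conditional SUM and the one-row-per-group behavior can be sketched together in Python. This is a hypothetical illustration of Aggregator semantics, not PowerCenter code; the field names are invented:

```python
# Hypothetical sketch of an Aggregator with a conditional SUM:
# SUM(value, condition) accumulates only rows where the condition holds,
# and one output row is produced per unique group-by value.
from collections import defaultdict

def conditional_sum(rows, group_key, value, condition):
    totals = defaultdict(float)
    for row in rows:
        if condition(row):                 # the SUM(value, condition) test
            totals[row[group_key]] += row[value]
    return dict(totals)                    # one entry per group

sales = [
    {"region": "E", "amount": 100.0, "returned": False},
    {"region": "E", "amount": 40.0,  "returned": True},
    {"region": "W", "amount": 25.0,  "returned": False},
]
print(conditional_sum(sales, "region", "amount", lambda r: not r["returned"]))
# {'E': 100.0, 'W': 25.0}
```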
Module 7: Sorter and Aggregator Transformations
9 of 12
7.10
Data Concatenation
Determines whether some ports (data flow arrows) can bypass a transformation Works only if:
Combining branches of the same source pipeline AND neither branch contains an active transformation
[Diagram: concatenation ALLOWED when both branches pass through only passive transformations; DISALLOWED when a branch contains an active transformation]
10 of 12
7.11
11 of 12
7.12
Summary
This module showed you how to: Calculate values based on data in a set of records using the Aggregator transformation Order a set of records based on one or more fields using the Sorter transformation
12 of 12
8.1
8.2
Module Objectives
After completing this module you will be able to: Use the Debug wizard and toolbar to debug a mapping
2 of 9
8.3
Debugger
The Debugger is a wizard-driven tool that runs a test session Allows you to
Follow a record across a mapping from transformation to transformation Set and modify breakpoints within a mapping Change data and variable values
3 of 9
8.4
4 of 9
8.5
5 of 9
8.6
6 of 9
8.7
Set Breakpoints
1. Edit the breakpoint
2. Choose global or a specific transformation
3. Choose to break on a data condition or on error; optionally skip rows
4. Add data conditions for the breakpoint
5. Add breakpoint(s)
7 of 9
8.8
8 of 9
8.9
Summary
This module showed you how to: Use the Debug wizard and toolbar to debug a mapping
9 of 9
8.10
10 of 9
9.1
9.2
Module Objectives
After completing this module you will be able to: Use an Update Strategy transformation to determine how the Target should handle records (insert, update, delete)
2 of 7
9.3
Ports
All input/output
Specify the Update Strategy Expression
IIF or DECODE logic determines how to handle the record
Example
Updating Slowly Changing Dimensions
3 of 7
Type Description
Active Tags a row with the appropriate DML (Data Manipulation Language) operation for PowerCenter's writer to apply to the relational target. Each row can be tagged with one of the tags shown on the following slide.
Business Purpose
A target table may require historical information dealing with existing entries. Rows written to a target table, based on one or more criteria, may need to be inserted, updated, or deleted. The Update Strategy transformation meets this requirement.
9.4
4 of 7
Note
For the row tags DD_DELETE and DD_UPDATE, the table definition in the mapping must have a key identified. Otherwise, the session created from the mapping will fail. If the Forward Rejected Rows attribute is checked (default), then rows tagged with DD_REJECT will be passed on to the next transformation or the Target, and subsequently placed in the appropriate bad file. If the attribute is unchecked, then the reject rows will be skipped.
9.5
The expression is evaluated for each row. Rows are tagged according to the logic of the expression. The appropriate SQL is submitted to the target database: INSERT, DELETE or UPDATE. DD_REJECT means no SQL will be written for the row. Rejected rows may be forwarded through the mapping.
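The tagging logic can be sketched in Python. This is a hypothetical illustration of an Update Strategy expression, not PowerCenter code; the field names are invented, but the DD_* constant values (0-3) match PowerCenter's:

```python
# Hypothetical sketch of Update Strategy tagging: an IIF/DECODE-style
# expression assigns each row a DML tag for the writer.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def tag_row(row):
    # Equivalent to:
    # IIF(ISNULL(customer_id), DD_REJECT,
    #     IIF(exists_in_target, DD_UPDATE, DD_INSERT))
    if row["customer_id"] is None:
        return DD_REJECT
    return DD_UPDATE if row["exists_in_target"] else DD_INSERT

print(tag_row({"customer_id": 7, "exists_in_target": True}))     # 1 (DD_UPDATE)
print(tag_row({"customer_id": None, "exists_in_target": False})) # 3 (DD_REJECT)
```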
5 of 7
Performance Considerations
Update Strategy performance can vary depending on the number of updates and inserts. In some cases there may be a performance benefit to splitting a mapping with updates and inserts into two mappings and sessions, one performing the inserts and one the updates.
9.6
6 of 7
9.7
Summary
This module showed you how to: Use an Update Strategy transformation to determine how the Target should handle records (insert, update, delete)
7 of 7
9.8
8 of 7
Mapping Techniques
10.1
10.2
Module Objectives
After completing this module you will be able to: Set and use system and Mapping variables and parameters Use unconnected Lookup transformations to provide values on an as-needed basis Use a Router transformation to divide a single set of records into multiple sets of records
2 of 24
10.3
System Variables
SYSDATE - Returns the system date. Uses the system clock on the machine hosting the PowerCenter Server.
SESSSTARTTIME - Returns the time the session started. Has a constant value throughout the session run.
$$$SessStartTime - Returns the session start time as a string. The format of the string is database type dependent. Used in SQL overrides.
Module 10: Mapping Techniques
3 of 24
Description
System variables hold information derived from the system. The user cannot control the content of the variable but can reference the information contained within the variable.
Business Purpose
10.4
Example
The developer needs to set the value of a port to indicate when the record was last updated by PowerCenter Create a port called LAST_UPDATED and set its value to the expression SYSDATE
4 of 24
10.5
5 of 24
Description
A Mapping can utilize parameters and variables to store information during execution. Each parameter and variable is defined with a specific datatype. Parameters are different from variables in that parameters are fixed for the run of the Mapping, while variables can change (vary). Both can be accessed from anywhere in the Mapping.
Business Purpose
Mapping variables and parameters are used: To simplify Mappings by carrying information within or between transformations To improve maintainability by allowing quick changes to values in a Mapping
10.6
User-defined ($$name)
6 of 24
Scope - Parameters and variables can be used only inside the object in which they are created. A Mapping variable created for Mapping_1 is available only within that Mapping and cannot be used by another Mapping or Mapplet in the same workflow. A parameter's or variable's scope is the object in which it was created.
Aggregation Type - The PowerCenter Server uses the aggregation type of a mapping variable to determine the final current value of the mapping variable. In a session with multiple partitions, the PowerCenter Server combines the variable value from each partition and saves the final value into the repository. Aggregation types include Count (for the integer datatype), Max, and Min.
IsExpVar - Determines how the Integration Service expands the parameter in an expression string. If true, the IS expands the parameter after parsing the expression. Default is false. If this is true and the parameter type is not String, the IS fails the session.
10.7
10.8
SETCOUNTVARIABLE($$Variable)
Increments a counter variable If the row is marked for Insert, add 1; if it is marked for Delete, subtract 1; otherwise, do not change
SETMAXVARIABLE($$Variable,value)
Sets the variable to the larger of its current value and the value passed to the function
SETMINVARIABLE($$Variable,value)
Sets the variable to the smaller of its current value and the value passed to the function
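The semantics of these variable-setting functions can be sketched in Python. This is a hypothetical illustration, not PowerCenter internals:

```python
# Hypothetical sketch of the variable-setting functions' semantics.

def set_max_variable(current, value):
    # SETMAXVARIABLE: keep the larger of the current and passed values
    return value if current is None or value > current else current

def set_min_variable(current, value):
    # SETMINVARIABLE: keep the smaller of the current and passed values
    return value if current is None or value < current else current

def set_count_variable(current, row_type):
    # SETCOUNTVARIABLE: +1 for Insert, -1 for Delete, unchanged otherwise
    return current + {"insert": 1, "delete": -1}.get(row_type, 0)

v = None
for x in [3, 9, 4]:          # the variable persists across rows
    v = set_max_variable(v, x)
print(v)  # 9
```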
8 of 24
Variable Persistence
At the end of a successful session, the values of variables are saved to the repository. The SetVariable function writes the final value of a variable to the repository based on the Aggregation Type selected when the variable was defined. The final value written to the repository for a variable that has an Aggregate type of MAX will be whichever is greater, the current value or the initial value. Similarly, the final value for a variable with an Aggregate type of MIN will be whichever is smaller, the current value or the initial value.
Naming Convention
User-defined variable and parameter names always begin with $$ (e.g., $$ParamName or $$VariableName).
10.9
9 of 24
Parameter File
A file that holds information about definitions of variables and parameters Values for variables that were saved in the Repository after successful completion of a Session
The initial value, as set by the user when creating the variable or parameter Set by the system
Default Value
10.10
[Table of example parameters and variables, e.g.: $$YES_1_CHAR (Param, 'Y'); $$NO_1_CHAR (Param, 'N'); $$LAST_RUN_DT (Variable, initialized from SESSSTARTTIME); other example values include 100 and 'US']
10 of 24
10.11
Unconnected Lookup
Commonly used when a Lookup is not needed for every record
No links from/to other transformations
Lookup data is called at the point in the Mapping that needs it
The Lookup function can be set within any transformation that supports expressions
11 of 24
Type
Passive
Description
Unconnected Lookups allow the inclusion of additional information in the transformation process from an external source when they are referenced within any transformation that supports expressions.
Business Purpose
A source table may have a small percentage of records with incomplete data. These holes in the data can be filled by performing a lookup to another table or tables, on an as-needed basis.
10.12
12 of 24
10.13
Lookup function
Condition is evaluated for each row but Lookup function is called only if condition is satisfied
13 of 24
Key Points
Use the Lookup function (:lkp.lookupname) within a conditional expression
The condition is evaluated for each row, but the Lookup function is called only when the record requires it
Data from several input ports may be passed to the Lookup transformation, but only one port may be returned, as designated by the R(eturn) property in the Lookup transformation
If no port is set as R, the mapping will not be invalid, but the session may fail at runtime
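The conditional-call behavior can be sketched in Python. This is a hypothetical illustration, not PowerCenter code; the lookup name and field names are invented:

```python
# Hypothetical sketch: an unconnected Lookup is invoked from inside an
# expression (like :lkp.lkp_products) only when the condition requires it.

PRODUCT_NAMES = {1: "Widget"}   # stands in for the lookup source
calls = 0                       # counts actual lookup invocations

def lkp_products(product_id):   # plays the role of :lkp.lookupname
    global calls
    calls += 1
    return PRODUCT_NAMES.get(product_id)

def fill_name(row):
    # Equivalent to: IIF(ISNULL(name), :lkp.lkp_products(product_id), name)
    return lkp_products(row["product_id"]) if row["name"] is None else row["name"]

rows = [{"product_id": 1, "name": None}, {"product_id": 1, "name": "Widget"}]
names = [fill_name(r) for r in rows]
print(names, calls)  # ['Widget', 'Widget'] 1 -- the lookup ran only once
```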
10.14
[Diagram: Lookup condition true for 2 percent of all rows; the Lookup is called only when the condition is true]
14 of 24
10.15
A Return port must be selected on the Ports tab, or the expression that calls the lookup is invalid.
15 of 24
10.16
Key Points
An unconnected lookup can improve performance if the lookup table is static Use the lookup with a conditional statement
The Lookup is called only for those rows where the condition evaluates to TRUE
The transformation is called using the expression :lkp.lookupname Data from several input ports may be passed to the Lookup transformation but only one port may be returned
The port to be returned is designated by the Lookup transformations R (return) port If a port is not selected as the R port, the mapping will not be invalidated but the session will fail at runtime
16 of 24
10.17
Connected Lookup: Part of the mapping data flow. Returns multiple values (by linking output ports to another transformation). Executed for every record passing through the transformation. More visible: shows where the lookup values are used. Default values are used.
Unconnected Lookup: Separate from the mapping data flow. Returns one value (by checking the Return (R) port option for the output port that provides the return value). Executed only when the lookup function is called. Less visible, as the lookup is called from an expression within another transformation. Default values are ignored.
17 of 24
10.18
Joiner vs. Lookup
Advantages of the Lookup transformation:
Can re-use cache across session runs
Can re-use cache within a mapping
Can modify cache dynamically
Can choose to cache or not to cache
Can query a relational table or flat file
Inequality comparisons are allowed
18 of 24
10.19
Router Transformation
Sends rows to different destinations based on filter conditions
Ports
All input/output
Specify filter conditions for each Group
Usage
Link source data in one pass to multiple filter conditions
19 of 24
Type Description
Active Passes row data to different groups based on filter-like conditions. A Router transformation has one input group, and one or more output groups, each of which has its own filter condition.
Business Purpose
Allows you to write records from a single source into multiple targets based on user-defined criteria.
10.20
Router Groups
Input group (always one) User-defined output groups
Each group has one condition ALL group conditions are evaluated for each row One row can pass multiple conditions
Unlinked Group outputs are ignored
Default group (always one) captures rows that fail all Group conditions
20 of 24
Performance Considerations
A Router transformation is functionally equivalent to several Filter transformations in parallel. However, performance can be substantially better, because a row is read once into the input group but evaluated multiple times, once for each condition.
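The Router semantics described above can be sketched in Python. This is a hypothetical illustration, not PowerCenter code; the group names and fields are invented:

```python
# Hypothetical sketch of a Router: every group condition is evaluated for
# each row, a row can land in several groups, and rows that match no
# condition fall into the default group.

def route(rows, groups):
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, cond in groups.items():   # ALL conditions are evaluated
            if cond(row):
                out[name].append(row)
                matched = True
        if not matched:
            out["DEFAULT"].append(row)
    return out

emps = [{"pos": "SALES", "days": 30}, {"pos": "CLERK", "days": 400}]
routed = route(emps, {
    "NEW":   lambda r: r["days"] < 90,
    "SALES": lambda r: r["pos"] == "SALES",
})
# The first employee passes BOTH conditions and goes to both groups
print(len(routed["NEW"]), len(routed["SALES"]), len(routed["DEFAULT"]))  # 1 1 1
```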
10.21
Router Functionality
[Diagram: Router routing employee records by conditions such as POSITION_CODE = 'SALES']
21 of 24
Note
In the diagram above, the record for a salesperson hired less than 90 days ago will be routed to both STG_EMPLOYEES_NEW and STG_EMPLOYEES_SALES. If you wish to prevent single records from being routed to multiple Targets, you must ensure that the filter conditions are mutually exclusive. In the example above, you would prefix the conditions for the position-based tables with DATE HIRED >= 90 DAYS AGO AND. Note that multiple target objects can be instances of the same target table. If this is the case, only one INSERT statement will be generated per record.
10.22
22 of 24
10.23
23 of 24
10.24
Summary
This module showed you how to: Set and use system and Mapping variables and parameters Use unconnected Lookup transformations to provide values on an as-needed basis Use a Router transformation to divide a single set of records into multiple sets of records
24 of 24
11.1
11.2
Module Objectives
After completing this module you will be able to: Describe Mapplets Use a Mapplet in a mapping Describe Worklets
2 of 18
11.3
Mapplets
A Mapplet contains transformations and may be embedded into a Mapping
3 of 18
Type Description
Passive or Active Mapplets combine multiple mapping objects for reusability; they can also simplify complex mapping maintenance. A Mapplet receives input data from either an internal Source or the Mapping pipeline that calls the Mapplet. A Mapplet must pass data out to the Mapping via a Mapplet Output transformation.
Note
Mapplets are reusable by nature; a Mapping uses an instance of a Mapplet. These instances inherit all changes to the parent Mapplet, which may affect the behavior of the Mappings that use them.
11.4
Mapplet Designer
4 of 18
Example
A business, as part of its daily sales, needs to apply discounts, performing a number of lookups and aggregating the sales values. This functionality is used in several types of feeds, so the Mapplet shown here was created to provide this functionality, identically, in many Mappings.
11.5
Passive
Ports: Output ports only
Usage: Only those ports connected from an Input transformation to another transformation will display in the resulting Mapplet
Connecting the same port to more than one transformation is disallowed; pass it to an Expression transformation first
5 of 18
Type
Passive
Description
11.6
6 of 18
11.7
Usage Only those ports connected to an Output transformation (from another transformation) will display in the resulting Mapplet One (or more) Mapplet Output transformations are required in every Mapplet
Module 11: Mapplets and Worklets
7 of 18
Type
Passive
Description
11.8
8 of 18
11.9
9 of 18
11.10
10 of 18
Warning
When the Mapplet is expanded at runtime, an unconnected output group could result in a transformation having no output connections. If that is not permitted, then the mapping will be invalid. For example: If the Mapplet outputs are fed by an Expression transformation, the mapping is invalid because an Expression requires a connected output. But if the Mapplet outputs are fed by a Router, the mapping is valid because a Router can have unconnected output groups.
11.11
Mapplet
11 of 18
Note
Mapplets cannot be nested that is, you cannot use a Mapplet inside another Mapplet.
11.12
Active Mapplet
12 of 18
11.13
13 of 18
11.14
14 of 18
11.15
Worklet
An object representing a set or grouping of Tasks
Can contain any Task available in the Workflow Manager
Worklets expand and execute inside a Workflow
A Workflow which contains a Worklet is called the parent Workflow
Worklets CAN be nested (unlike Mapplets)
Reusable Worklets: create in the Worklet Designer
Non-reusable Worklets: create in the Workflow Designer
15 of 18
Description
Worklets are optional processing objects inside Workflows. They contain PowerCenter tasks that represent a particular grouping of, or functionally related set of, tasks. They can be created directly in a Workflow (non-reusable) or in the Worklet Designer (reusable).
Business Purpose
A Workflow may contain dozens of tasks. During Workflow design they will develop naturally in groupings of meaningfully-related tasks, run in the appropriate operational order. The Workflow can run as-is, from start to finish, executing task-by-task, or the developer can place natural groupings into Worklets. A Worklet's relationship to a Workflow is like that of a Mapplet to a Mapping.
11.16
Creating Worklets
In the Worklet Designer, select Worklets | Create
Worklets Node
11.17
11.18
Summary
This module showed you how to:
Describe Mapplets
Use a Mapplet in a mapping
Describe Worklets
Controlling Workflows
12.1
12.2
Module Objectives
After completing this module you will be able to:
Set and use workflow variables
Use link conditions and Decision tasks to control the execution of a workflow
Use other workflow tasks: Email, Event Wait, Event Raise, Command
Explain the purpose of the pmcmd utility
Schedule workflows to run automatically
12.3
Link Conditions
You can set conditions on workflow links:
If the link condition is True, the next task is executed.
If the link condition is False, the next task is not executed.
To set a condition, right-click a link and enter an expression that evaluates to True or False.
You can use workflow variables in the condition (next slide)
Note: the words SUCCEEDED and FAILED are reserved for use in expressions
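For example, a link condition that lets the downstream task run only when the upstream session succeeded and loaded at least one row might look like this (the session name is hypothetical):

```
$s_LoadCustomers.Status = SUCCEEDED AND $s_LoadCustomers.TgtSuccessRows > 0
```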
Reserved Words
In addition to SUCCEEDED and FAILED, the words DISABLED, NOT STARTED, STARTED, STOPPED and ABORTED are reserved for use in link conditions. Consult the Workflow Administration guide for valid values for all Predefined variables.
12.4
Incoming Links
OR: run the task as soon as any link condition is TRUE.
AND: run the task when all link conditions are TRUE.
12.5
Task-specific variables
Types
Business Purpose
A workflow can contain multiple tasks and multiple pipelines. One or more tasks or pipelines may be dependent on the status of previous tasks. Workflow variables convey that information from one task to another.
12.6
Variables can persist across sessions in a workflow
The value is saved in the repository
12.7
Note
Predefined workflow variables are discussed in more detail in the Workflow Administration Guide
12.8
Session 4 should not run if Session 3 takes more than one hour
Test the system variable WORKFLOWSTARTTIME in the Link condition expression
(Workflow diagram: Start, Session 1, Session 2, Session 3, Session 4; Session 4 follows Session 3)
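One way to express that test on the link into Session 4, assuming Session 3 begins at workflow start, uses the DATE_DIFF expression function against the EndTime task variable (the session name is illustrative):

```
DATE_DIFF($s_Session3.EndTime, WORKFLOWSTARTTIME, 'HH') < 1
```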
12.9
Description
Establishes the value of a Workflow variable, which can be used at a later point in the workflow as test criteria to determine whether or when other workflow tasks or pipelines should run.
Business Purpose
Running a workflow task may depend on the results of other tasks or calculations in the workflow. An Assignment task can do certain calculations to establish the value for a workflow variable. This value may determine whether other tasks or pipelines are run.
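As a sketch, an Assignment task might accumulate a user-defined workflow variable that a later link condition tests. The variable and session names here are hypothetical:

```
-- Assignment task: expression assigned to user-defined variable $$TotalRows
$$TotalRows + $s_LoadOrders.TgtSuccessRows

-- Later link condition testing the variable
$$TotalRows > 0
```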
12.10
12.11
Decision Task
Tests for a condition during the workflow and sets a flag based on the condition
Use a link condition (or a Control task) downstream to test the flag and control execution flow
Can use workflow variables in condition
Description
Decision tasks enable workflow designers to set criteria by which the workflow will or will not proceed to the next set of tasks, depending on whether the criteria evaluate to true or false.
Business Purpose
Commonly, workflows have multiple paths. Some are simply concurrent tasks. Others are pipelines of tasks that should only run if the previous tasks are successful. Still others should be run only if those tasks are not successful. What determines the success or failure of a task or group of tasks is user-defined, depending on the business-defined rules and operational rules of processing. The criteria are set as the decision condition in a Decision task, and subsequently tested for a True or False condition
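A sketch, with hypothetical task names: the Decision task evaluates its condition, and downstream links test the task's Condition flag to route the two branches:

```
-- Decision task dec_LoadsOK: decision condition
$s_LoadOrders.Status = SUCCEEDED AND $s_LoadCustomers.Status = SUCCEEDED

-- Link condition on the "success" branch
$dec_LoadsOK.Condition = TRUE

-- Link condition on the "recovery" branch
$dec_LoadsOK.Condition = FALSE
```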
12.12
12.13
Email Task
Sends an email within a workflow
Note: emails can also be sent post-session in a Session task
Can be used with a link condition to notify success or failure of prior tasks
Description
Email tasks enable PowerCenter to send email messages at various points in a workflow. Users can define email addresses, a subject line, and the email message text. When called from within a Session task, the message text can contain variable Session-related metadata; for example, one message for Session success and another for failure.
Business Purpose
Various business and operational staff may need to be notified of the progress of a workflow, the status of tasks (or combinations of tasks) within it, or various metadata results of a session
Performance Considerations
The PowerCenter domain must be configured to use a running, configured email server. However, the impact of the Integration Service sending the emails is minimal
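As an illustration, a post-session email body can embed session metadata with the email variables documented for Session tasks (%s = session name, %e = session status, %l = rows loaded, %r = rows rejected; check your version's documentation for the full list):

```
Session %s finished with status: %e
Rows loaded: %l
Rows rejected: %r
```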
12.14
Description
Event Wait tasks wait for either the presence of a named flat file (a predefined event) or some other user-defined event to occur in the workflow processing. Note that the Workflow must be running in order to recognize a pre-defined event.
Business Purpose
An Event Wait task watching for a flat file by name is placed in a workflow because some subsequent processing is dependent on the presence of the file. An Event Wait task waiting for the occurrence of a user-defined event will be strategically placed so that the workflow should not proceed further until some other set of tasks and conditions has occurred. It always works in concert with an Event Raise task.
12.15
12.16
12.17
Description
Event Raise tasks are always used in conjunction with Event Wait tasks that wait on user-defined events. The Event Raise task signals the Event Wait task that a predetermined set of processing steps has completed along the pipeline from the Start task to the Event Raise task.
Business Purpose
This task allows signals to be passed from one spot in the workflow to another that a particular series of predetermined events has occurred.
12.18
User-Defined Events
For a user-defined event, the developer:
1. Defines an event in the workflow properties (prior to workflow processing)
2. Includes an Event Wait task at a suitable point in the workflow, where further processing must await some specific event
3. Includes an Event Raise task at a suitable point in the workflow, e.g., after a parallel pipeline has been completed
12.19
Command Task
Specifies one or more UNIX commands or shell scripts, or DOS commands or batch files, for the Integration Service to run during a workflow
Note: UNIX and DOS commands can also be run pre- or post-session in a Session task
Description
Command tasks are inserted in workflows and worklets to enable the Integration Service to run one or more OS commands of any nature. All commands or batch files referenced must be executable by the OS login that owns the Integration Service process.
Business Purpose
OS commands can be used for any operational or business unit related procedure, and can be run at any point in a workflow. Command tasks can be set to run one or more OS commands or scripts/batch files, before proceeding to the next task in the workflow. If more than one command is coded into a Command Task, the entire task can be set to fail if any one of the individual commands fails. Additionally and optionally, each individual command can be set not to run if a preceding command fails.
12.20
A Session task that produces an output file can be followed by a Command task that copies the file to another directory, or FTPs the file to another box location. The command syntax is the same as that which would accomplish this at the OS command prompt on the Integration Service machine.
A Session task that relies on a flat file as source data can be preceded by a Command task that verifies the presence of the file, opens it and verifies control totals or record counts with some external source of information.
A series of multiple concurrent or sequential Sessions can be followed by a single Command task coded to copy or move all session logs created by the workflow to a special daily backup directory.
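The third example can be sketched as a small shell script. For illustration it runs against scratch directories created with mktemp; in a real Command task you would hard-code the Integration Service's session-log directory and backup location.

```shell
# Stand-ins for the Integration Service log directory and the backup root.
LOG_DIR=$(mktemp -d)
BACKUP_DIR=$(mktemp -d)/$(date +%Y%m%d)

# Pretend two sessions in the workflow wrote their logs here.
touch "$LOG_DIR/s_load_orders.log" "$LOG_DIR/s_load_customers.log"

# Copy all session logs into a dated backup directory.
mkdir -p "$BACKUP_DIR"
cp "$LOG_DIR"/*.log "$BACKUP_DIR"/

echo "backed up logs to $BACKUP_DIR"
```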
12.21
Timer Task
Waits for a specified period of time to execute the next task
General Tab Timer Tab
12.22
Control Task
Stops, fails, or aborts the Worklet or Workflow
12.23
Control Options
Fail Me: Marks the Control task as Failed. The PowerCenter Server fails the Control task if you choose this option.
Fail Parent: Marks the status of the workflow or worklet that contains the Control task as failed after the workflow or worklet completes.
Stop Parent: Stops the workflow or worklet that contains the Control task.
Abort Parent: Aborts the workflow or worklet that contains the Control task.
Fail Top-Level Workflow: Fails the workflow that is running.
Stop Top-Level Workflow: Stops the workflow that is running.
Abort Top-Level Workflow: Aborts the workflow that is running.
Note: Fail = complete but set status to Failed; Stop = stop executing after an orderly shutdown; Abort = stop executing immediately.
Module 12: Controlling Workflows
Note
The Control task can fail, stop, or abort either the parent Workflow or the top-level Workflow. However, stopping or aborting the parent Workflow means that no further progress takes place along that branch in the top-level Workflow. This can cause the top-level Workflow to stop if there is no other branch.
12.24
Reusable Tasks
Session, Email and Command tasks can be reusable
Use the Task Developer to create reusable tasks
Reusable tasks appear in the Navigator Tasks node and can be dragged and dropped into any workflow
In a workflow, a reusable task is indicated by a special symbol
Business Purpose
Occasionally, a certain mapping logic may be required to run in multiple workflows. Since a mapping is reusable, the developer can code multiple sessions, all based on the same mapping. However, it is simpler to create a reusable session based on the mapping. Once created in the Task Developer, an instance of the Reusable Session can be placed in any workflow or Worklet.
Performance Considerations
Use reusable session tasks sparingly. Retrieving the metadata for a reusable session task and its child instances from the repository takes longer than retrieving the metadata for a non-reusable session task.
12.25
pmcmd utility
Command line utility providing most Workflow Manager operations, e.g. starting a workflow. Example syntax:
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg
Note: The password can be provided through the PASSWORD environment variable. To do this, you can encrypt the password using the pmpasswd utility on the PowerCenter Services machine and then enter the encrypted password in pmcmd.
Description
The pmcmd command line utility allows the developer to perform most Workflow Manager operations outside of the PowerCenter client tool. These commands can be used in batch files.
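For instance, to avoid a clear-text password on the command line, one sketch (service, domain, user, folder, and workflow names as in the slide; -pv tells pmcmd to read the password from a named environment variable) is:

```shell
# One time, on the PowerCenter Services machine: encrypt the password.
pmpasswd jackson                     # prints an encrypted string

# Store the encrypted string in an environment variable, then reference
# the variable with -pv instead of passing -p in clear text.
export PASSWORD='<encrypted string from pmpasswd>'
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -pv PASSWORD \
    -f SalesEast wf_SalesAvg
```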
12.26
Workflow Scheduler
Set and customize workflow-specific schedule
12.27
12.28
12.29
12.30
Summary
This module showed you how to:
Set and use workflow variables
Use link conditions and Decision tasks to control the execution of a workflow
Use other workflow tasks: Email, Event Wait, Event Raise, Command
Explain the purpose of the pmcmd utility
Schedule workflows to run automatically
13.1
13.2
Module Objectives
After completing this module you will be able to: Follow best practices for mapping design
13.3
13.4
Summary
This module showed you how to: Follow best practices for mapping design
14.1
14.2
Module Objectives
After completing this module you will be able to: Follow best practices for workflow design
14.3
14.4
Summary
This module showed you how to: Follow best practices for workflow design
15.1
15.2
Objectives
After completing this module you will be able to: Describe New Features in PowerCenter 9.0:
Lookup transformation enhancements
SQL transformation enhancements
XML Parser enhancements
Verbose Logging enhancement
License Management enforcement
Integration Service log file rollover
Mapping Architect for Visio
Additional Transformations
infacmd Command Line enhancements
15.3
Cache Updates
You can update the dynamic lookup cache based on the results of an expression: when the expression is true, you can add to or update the lookup cache.
A database deadlock does not cause immediate session failure. The Integration Service attempts to run the last lookup statement again; the number of retries and the sleep interval are configurable.
You can configure the Lookup transformation to return all rows that match a lookup condition
You can create an SQL override for an uncached lookup
You can include lookup ports in the SQL query
15.4
When deleting and inserting records with referential integrity constraints, the order in which the operations are performed becomes important. This enhancement allows referential integrity constraints to be observed.
The XML Parser transformation can validate an XML document against a schema
Routes invalid XML to an error port
Routes messages to a separate output group
Verbose Log
15.5
Mapping Architect for Visio: new mapping objects, including Normalizer and Custom transformation
License Management
Number of cores enforcement: ensures that licensees do not exceed the licensed number of cores
Repository licensing: ensures that licensees do not exceed the licensed number of repositories
Integration Service
Session log file rollover: limit the size of session logs for real-time sessions
Mapping Architect for Visio
New mapping objects: Pipeline Normalizer, Custom transformation, PowerExchange source definition, PowerExchange target definition
Can configure a transformation to use a shortcut
You can create a mapping template that contains these objects, shortcuts, or reusable transformations
15.6
infacmd expanded to include management of all Informatica application services:
infacmd ds (data services)
infacmd isp (Informatica service manager)
infacmd oie (object import and export)
infacmd prs (Model Repository services)
infacmd rtm (Analyst Tool services)
infacmd sql (SQL data services)
Use infacmd help <application service> for help on each service type
15.7
Summary
This module showed you how to: Describe New Features in PowerCenter 9.0:
Lookup transformation enhancements
SQL transformation enhancements
XML Parser enhancements
Verbose Logging enhancement
License Management enforcement
Integration Service log file rollover
Mapping Architect for Visio
Additional Transformations
infacmd Command Line enhancements
15.8
Global Education Services Course Evaluation (1/16/2009)

Required Information (print)
Name ____  Phone Number ____  e-mail ____

Optional Information (print)
What method did you use to register for this class? Web ____ 800 Number ____ Telemarketer ____ Other ____
1. What was your level of exposure to the product? (None / Minimal / ...)
2. What was your level of exposure to SQL query tools?
3. What was your level of exposure to RDB concepts?

Rate the following statements (scale includes: 3 = Neutral, 4 = Agree, 5 = ...):

Course Content and Materials
1. The course content met my expectations
2. The course met the stated objectives
3. The course length was long enough to cover the content
4. The time allotted for labs was long enough
5. The lab exercises helped in learning the course material
6. The visual aids used by the instructor were helpful

Instructor
7. The instructor was knowledgeable in the subject area
8. The instructor effectively used class time to enable me to learn the key concepts
9. The instructor encouraged students to ask questions
10. Student questions were answered clearly and completely

12. The quality of the facilities was conducive to learning
13. Classroom equipment was an effective tool in my learning

Overall
15. This training will improve my job performance
16. This training was a worthwhile investment for my employer
17. I am satisfied with the Training Overall

What Units were the most valuable, least valuable for you and why? Most: ____ Least: ____
What recommendations can you suggest for course improvement: materials and or presentation? More: ____ Less: ____
What topics require more coverage in class? Less coverage?
Should the lab time be lengthened or shortened? Any lab in particular?
Should the course be made longer or shorter? Any unit in particular?
What suggestions would you offer the instructor to improve his/her delivery of the course?
What suggestions do you have for improving the classroom environment?
What follow up course(s) would you like to see?
What suggestions do you have that would make it easier to do business with us?
Please provide contact information for anyone you feel would benefit from one of our courses. Yes / No
Name ____  Phone Number ____  e-mail ____