Informatica PowerCenter®
(Version 7.1.1)
Informatica PowerCenter® Data Profiling Guide
Version 7.1.1
August 2004
This software and documentation contain proprietary information of Informatica Corporation. They are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software
license agreement as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR
12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to
us in writing. Informatica Corporation does not warrant that this documentation is error free.
Informatica, PowerMart, PowerCenter, PowerCenter Connect, PowerConnect, and PowerChannel are trademarks or registered trademarks of
Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade
names or trademarks of their respective owners.
Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington
University and University of California, Irvine, Copyright © 1993-2002, all rights reserved.
DISCLAIMER: Informatica Corporation provides this documentation “as is” without warranty of any kind, either express or implied,
including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information
provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or
changes in the products described in this documentation at any time without notice.
Table of Contents
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
New Features and Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xiv
About Informatica Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
About this Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xvi
Other Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Visiting Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Visiting the Informatica Webzine . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Visiting the Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Visiting the Informatica Developer Network . . . . . . . . . . . . . . . . . . . . xvii
Obtaining Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xviii
Configuring a Data Profiling Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Creating a Data Profiling Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Upgrading the Data Profiling Warehouse . . . . . . . . . . . . . . . . . . . . . . . 13
Configuring a Relational Database Connection to the Data Profiling Warehouse . . . . . 15
Installing PowerAnalyzer Data Profiling Reports . . . . . . . . . . . . . . . . . . . . . 16
Installing Data Profiling XML Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Importing Data Profiling Schema and Reports . . . . . . . . . . . . . . . . . . . 16
Configuring a Data Source for the Data Profiling Warehouse . . . . . . . . . 20
Creating a Data Connector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Adding a Data Profiling Data Source to a Data Connector . . . . . . . . . . . 23
Configuring Default Data Profile Options . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Purging the Data Profiling Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Running Sessions from the Profile Manager . . . . . . . . . . . . . . . . . . . . . . . . 55
Configuring a Session in the Profile Wizard . . . . . . . . . . . . . . . . . . . . . 56
Running a Session when You Create a Data Profile . . . . . . . . . . . . . . . . 59
Running a Session for an Existing Data Profile . . . . . . . . . . . . . . . . . . . 59
Monitoring Interactive Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Profiling Data Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Selecting a Function Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Selecting a Data Sampling Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Creating a Session in the Workflow Manager . . . . . . . . . . . . . . . . . . . . . . . 62
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Row Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Column-Level Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Business Rule Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Domain Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Domain Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Aggregate Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Distinct Value Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Intersource Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Orphan Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Join Complexity Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Server Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
DP Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
List of Tables
Table 2-1. Scripts for Creating a Data Profiling Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Table 2-2. Scripts for Upgrading the Data Profiling Warehouse . . . . . . . . . . . . . . . . . . . . . . . 13
Table 2-3. Data Connector Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Table 2-4. Default Data Profile Options - General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Table 2-5. Default Data Profile Options - Prefixes Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Table 4-1. Profile Run Properties for Profile Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Table 4-2. Session Setup Properties for Profile Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Table 4-3. Function Behavior with Data Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Table 5-1. Auto Profile Report Summary Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Table 5-2. Custom Profile Report Summary Attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Table 5-3. Custom Profile Report Function Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Table 5-4. PowerAnalyzer Data Profiling Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Table 6-1. Prepackaged Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Table 6-2. perl Syntax Guidelines for Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Table 6-3. COBOL Syntax and perl Syntax Compared . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Table 6-4. SQL Syntax and perl Syntax Compared . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Table 7-1. Row Count Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Table 7-2. Business Rule Validation Source-Level Function Options . . . . . . . . . . . . . . . . . . . . 95
Table 7-3. Candidate Key Evaluation Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Table 7-4. Redundancy Evaluation Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Table 7-5. Row Uniqueness Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Table 7-6. Business Rule Validation Column-Level Function Options . . . . . . . . . . . . . . . . . . .102
Table 7-7. Domain Validation Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .104
Table 7-8. Aggregate Functions to Add Based on the Column Datatype . . . . . . . . . . . . . . . . .107
Table 7-9. Aggregate Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .108
Table 7-10. Distinct Value Count Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .110
Table 7-11. Orphan Analysis Function Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .112
Table A-1. DPR_LATEST_PRFLS Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . .117
Table A-2. DPR_PRFL_AUTO_COL_FN_METRICS Column Information . . . . . . . . . . . . . .118
Table A-3. DPR_PRFL_CART_PROD_METRICS Column Information . . . . . . . . . . . . . . . .119
Table A-4. DPR_PRFL_COL_FN Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . .120
Table A-5. DPR_PRFL_COL_FN_METRICS Column Information . . . . . . . . . . . . . . . . . . . .122
Table A-6. DPR_PRFL_COL_FN_VERBOSE Column Information . . . . . . . . . . . . . . . . . . . .124
Table A-7. DPR_PRFL_CP_FN Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .125
Table A-8. DPR_PRFL_FN_DTLS Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . .127
Table A-9. DPR_PRFL_OJ_FN Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .129
Table A-10. DPR_PRFL_OJ_FN_VERBOSE Column Information . . . . . . . . . . . . . . . . . . . .131
Table A-11. DPR_PRFL_OJ_METRICS Column Information . . . . . . . . . . . . . . . . . . . . . . . .132
Table A-12. DPR_PRFL_RUN_DTLS Column Information . . . . . . . . . . . . . . . . . . . . . . . . .133
Table A-13. DPR_PRFL_SRC_FN Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . .134
Table A-14. DPR_PRFL_SRC_FN_METRICS Column Information . . . . . . . . . . . . . . . . . . .136
Table A-15. DPR_PRFL_SRC_FN_VERBOSE Column Information . . . . . . . . . . . . . . . . . . .137
Table A-16. DPR_PRFL_VER_DTLS Column Information . . . . . . . . . . . . . . . . . . . . . . . . . .138
New Features and Enhancements
This section describes new features and enhancements to Data Profiling 7.1.1.
♦ Data sampling. You can create a data profile for a sample of source data instead of the
entire source. You can profile a random sample of data, a specified percentage of the data,
or a specified number of rows starting with the first row.
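As a sketch of the three sampling modes above; the function name, the mode names, and the use of Python's random module are illustrative stand-ins, not part of the product:

```python
import random

def sample_rows(rows, mode, value, seed=0):
    """Illustrative sketch of the three Data Profiling sampling modes."""
    if mode == "random":
        # A random sample of a fixed number of rows.
        rng = random.Random(seed)
        return rng.sample(rows, min(value, len(rows)))
    if mode == "percentage":
        # A specified percentage of the data (taken from the start here).
        return rows[:int(len(rows) * value / 100)]
    if mode == "first":
        # A specified number of rows starting with the first row.
        return rows[:value]
    raise ValueError(f"unknown sampling mode: {mode!r}")
```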
♦ Verbose data enhancements. You can specify the type of verbose data you want the
PowerCenter Server to write to the Data Profiling warehouse. The PowerCenter Server can
write all rows, only the rows that meet the business rule, or only the rows that do not meet
the business rule.
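The three verbose data settings behave like the following sketch; the function, the mode names, and the business-rule predicate are illustrative assumptions, not product APIs:

```python
def verbose_rows(rows, meets_rule, mode):
    """Select which rows to write, per the verbose data setting.

    mode is one of "all", "meet", or "not_meet" (names are illustrative);
    meets_rule is a predicate standing in for the business rule.
    """
    if mode == "all":
        return list(rows)
    if mode == "meet":
        return [r for r in rows if meets_rule(r)]
    if mode == "not_meet":
        return [r for r in rows if not meets_rule(r)]
    raise ValueError(f"unknown verbose mode: {mode!r}")
```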
♦ Profile session enhancement. You can save sessions that you create from the Profile
Manager to the repository.
♦ Domain Inference function tuning. You can configure the Profile Wizard to filter the
Domain Inference function results. You can configure a maximum number of patterns and
a minimum pattern frequency. You may want to narrow the scope of patterns returned to
view only the primary domains, or you may want to widen the scope of patterns returned
to view exception data.
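The tuning described above can be approximated as follows; the function name, the sample pattern strings, and the percentage-based frequency threshold are assumptions for illustration only:

```python
def filter_inferred_domains(pattern_counts, total_rows, max_patterns, min_frequency_pct):
    """Keep at most max_patterns inferred patterns whose frequency is at
    least min_frequency_pct of the profiled rows, most frequent first."""
    ranked = sorted(pattern_counts.items(), key=lambda kv: kv[1], reverse=True)
    kept = [(p, n) for p, n in ranked
            if 100.0 * n / total_rows >= min_frequency_pct]
    return kept[:max_patterns]
```

Raising the minimum frequency narrows the result to the primary domains; lowering it widens the result to show exception data.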
♦ Row Uniqueness function. You can determine unique and duplicate rows for a source
based on a selection of columns from that source.
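A minimal sketch of what Row Uniqueness computes, assuming rows arrive as dictionaries; the helper and its return shape are illustrative, not the product's implementation:

```python
from collections import Counter

def row_uniqueness(rows, columns):
    """Count unique and duplicate rows, keyed on the selected columns."""
    keys = Counter(tuple(row[c] for c in columns) for row in rows)
    unique = sum(1 for n in keys.values() if n == 1)
    duplicate = sum(n for n in keys.values() if n > 1)
    return unique, duplicate
```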
♦ Define mapping, session, and workflow prefixes. You can define default mapping,
session, and workflow prefixes for the mappings, sessions, and workflows generated when
you create a data profile.
♦ Profile mapping display in the Designer. The Designer displays profile mappings under a
profile mappings icon in the repository folder.
About Informatica Documentation
The complete set of documentation for PowerCenter includes the following books:
♦ Data Profiling Guide. Provides information about how to profile PowerCenter sources to
evaluate source data and detect patterns and exceptions.
♦ Designer Guide. Provides information needed to use the Designer. Includes information to
help you create mappings, mapplets, and transformations. Also includes a description of
the transformation datatypes used to process and transform source data.
♦ Getting Started. Provides basic tutorials for getting started.
♦ Installation and Configuration Guide. Provides information needed to install and
configure the PowerCenter tools, including details on environment variables and database
connections.
♦ PowerCenter Connect® for JMS® User and Administrator Guide. Provides information
to install PowerCenter Connect for JMS, build mappings, extract data from JMS messages,
and load data into JMS messages.
♦ Repository Guide. Provides information needed to administer the repository using the
Repository Manager or the pmrep command line program. Includes details on
functionality available in the Repository Manager and Administration Console, such as
creating and maintaining repositories, folders, users, groups, and permissions and
privileges.
♦ Transformation Language Reference. Provides syntax descriptions and examples for each
transformation function provided with PowerCenter.
♦ Transformation Guide. Provides information on how to create and configure each type of
transformation in the Designer.
♦ Troubleshooting Guide. Lists error messages that you might encounter while using
PowerCenter. Each error message includes one or more possible causes and actions that
you can take to correct the condition.
♦ Web Services Provider Guide. Provides information you need to install and configure the Web
Services Hub. This guide also provides information about how to use the web services that the
Web Services Hub hosts. The Web Services Hub hosts Real-time Web Services, Batch Web
Services, and Metadata Web Services.
♦ Workflow Administration Guide. Provides information to help you create and run
workflows in the Workflow Manager, as well as monitor workflows in the Workflow
Monitor. Also contains information on administering the PowerCenter Server and
performance tuning.
♦ XML User Guide. Provides information you need to create XML definitions from XML,
XSD, or DTD files, and relational or other XML definitions. Includes information on
running sessions with XML data. Also includes details on using the midstream XML
transformations to parse or generate XML data within a pipeline.
About this Book
The Informatica PowerCenter Data Profiling Guide provides information to install Data
Profiling, build data profiles, run profile sessions, and view profile results. It is written for the
database administrators and developers who are responsible for building PowerCenter
mappings and running PowerCenter workflows.
This book assumes you have knowledge of relational database concepts, database engines, and
PowerCenter. You should also be familiar with the interface requirements for other
supporting applications.
The material in this book is available for online use.
Document Conventions
This guide uses the following formatting conventions:
italicized monospaced text
    This is the variable name for a value you enter as part of an operating system command. This is generic text that should be replaced with user-supplied values.

bold monospaced text
    This is an operating system command you enter from a prompt to run a task.

Warning:
    The following paragraph notes situations where you can overwrite or corrupt data, unless you follow the specified procedure.
Other Informatica Resources
In addition to the product manuals, Informatica provides these other resources:
♦ Informatica Customer Portal
♦ Informatica Webzine
♦ Informatica web site
♦ Informatica Developer Network
♦ Informatica Technical Support
The site contains information on how to create, market, and support customer-oriented add-on solutions based on Informatica’s interoperability interfaces.
Belgium
Phone: +32 15 281 702
Hours: 9 a.m. - 5:30 p.m. (local time)
France
Phone: +33 1 41 38 92 26
Hours: 9 a.m. - 5:30 p.m. (local time)
Germany
Phone: +49 1805 702 702
Hours: 9 a.m. - 5:30 p.m. (local time)
Netherlands
Phone: +31 306 082 089
Hours: 9 a.m. - 5:30 p.m. (local time)
Singapore
Phone: +65 322 8589
Hours: 9 a.m. - 5 p.m. (local time)
Switzerland
Phone: +41 800 81 80 70
Hours: 8 a.m. - 5 p.m. (local time)
Chapter 1
Understanding Data Profiling
Data profiling is a technique used to analyze source data. PowerCenter Data Profiling can
help you evaluate source data and detect patterns and exceptions. PowerCenter lets you profile
source data to suggest candidate keys, detect data patterns, evaluate join criteria, and
determine information such as implicit datatypes.
You can use Data Profiling to analyze source data in the following situations:
♦ During mapping development
♦ During production to maintain data quality
From the Profile View and the Source View, you can perform the following tasks to manage,
run, and view data profiles:
♦ Create a custom profile. For more information, see “Creating a Custom Profile” on
page 38.
♦ View data profile details. You can view the details of an auto profile or custom profile.
♦ Edit a data profile. For more information, see “Editing a Data Profile” on page 47.
♦ Delete a data profile. For more information, see “Deleting a Data Profile” on page 49.
♦ Run a session. For more information, see “Running Sessions from the Profile Manager” on
page 55.
♦ Regenerate a profile mapping. You can regenerate a profile mapping to validate the
mapping.
♦ Check in profile mappings. You can check in profile mappings for versioned repositories
to commit the changes to the repository. When you check in an object, the repository
creates a new version of the object and assigns it a version number. For more information
on versioning, see the PowerCenter Repository Guide.
♦ Configure default data profile options. For more information, see “Configuring Default
Data Profile Options” on page 26.
♦ Configure domains for profile functions. For more information, see “Working with
Domains” on page 77.
♦ Purge the Data Profiling warehouse. For more information, see “Purging the Data
Profiling Warehouse” on page 29.
♦ Display the status of interactive profile sessions. For more information, see “Monitoring
Interactive Sessions” on page 59.
♦ Display PowerCenter Data Profiling reports. For more information, see “PowerCenter
Data Profiling Reports” on page 67.
Profile View
The Profile View tab displays all of the data profiles in the open folders in your repository.
Use the Profile View to determine the data profiles that exist for a particular repository folder.
Figure 1-1 shows the Profile Manager:
When you select the Source View tab in the Profile Manager, a source view tree displays data
profiles as nodes under the source definition for which you defined the data profile.
If you change or delete a data profile or a source or mapplet with a data profile, you can click
View-Refresh to refresh the Source View.
Overview
Data Profiling installs automatically with the PowerCenter Server and Client when you
purchase the Data Profiling Option. After you install PowerCenter with the Data Profiling
Option, you must complete configuration steps before you can create data profiles, run
profile sessions, or view profile reports.
Installing PowerAnalyzer
Before installing and configuring Data Profiling, install and configure PowerAnalyzer if you
want to view Data Profiling reports in PowerAnalyzer.
To use PowerAnalyzer for Data Profiling, you need the following product licenses:
♦ Application server license
♦ PowerAnalyzer license
PowerAnalyzer License
PowerAnalyzer Data Profiling reports use PowerAnalyzer and require a PowerAnalyzer license.
Informatica does not ship the PowerAnalyzer license key with the Data Profiling installation.
To obtain a PowerAnalyzer license, send a request to: productrequest@informatica.com.
Informatica provides the license key in an email with instructions on how to apply the license
to the product.
You do not need the license key to install PowerAnalyzer. If you have the PowerAnalyzer
license key at the time of installation, you can apply the license during the installation.
Otherwise, you can complete the installation and apply the license key afterward.
The PowerAnalyzer license is a restricted license. The license can only be used to run Data
Profiling and PowerCenter Metadata Reporter. For more information about the license,
contact Informatica Technical Support.
For more information about activating the PowerAnalyzer license or installing and
configuring PowerAnalyzer, see the PowerAnalyzer Installation Guide.
Configuring a Data Profiling Warehouse
When you install Data Profiling for the first time, create a Data Profiling warehouse for each
PowerCenter repository in which you want to store data profiles. When you upgrade from a
previous version, you must upgrade your Data Profiling warehouse.
Once you create a Data Profiling warehouse, you must configure a relational database
connection to the warehouse in the Workflow Manager.
Table 2-1 shows the script you must run for each database type:

Database Type    Script
Informix         create_schema_inf.sql
Oracle           create_schema_ora.sql
Sybase           create_schema_syb.sql
Teradata         create_schema_ter.sql
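For illustration, the table above can be captured as a lookup; the helper function is not part of the product, and you would still run the selected script with your own database client:

```python
# Script names per database type, as listed in Table 2-1.
CREATE_SCRIPTS = {
    "Informix": "create_schema_inf.sql",
    "Oracle": "create_schema_ora.sql",
    "Sybase": "create_schema_syb.sql",
    "Teradata": "create_schema_ter.sql",
}

def create_script_for(database_type):
    """Return the warehouse creation script for a database type."""
    try:
        return CREATE_SCRIPTS[database_type]
    except KeyError:
        raise ValueError(
            f"no Data Profiling warehouse script listed for {database_type!r}")
```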
Table 2-2 shows the script you must run for each database type and PowerCenter Data
Profiling installation version:
5. Click Continue.
The Import Schemas page displays the following message:
The objects have been successfully imported to the target repository.
By default, the Publish to Everyone and Run Scheduled Reports After Import options are
selected.
5. If you do not want to give all PowerAnalyzer users access to the reports, clear the Publish
to Everyone option.
You can give specific individuals access to the Data Profiling reports. For more
information about providing access to a PowerAnalyzer report, see the PowerAnalyzer
Administrator Guide.
6. Clear the Run Scheduled Reports option.
7. Click Continue.
2. Click Add.
Table 2-3. Data Connector Properties

System Name (Required)
    Enter the name of the data connector. The data connector name must be unique. The system name can include any character except a space, tab, newline character, and the following special characters: \ / : * ? " < > | ' &

Description (Optional)
    Enter a description for the data connector. The connector description can be between 1 and 255 characters.

Primary Data Source (Required)
    Select the primary data source from the list of data sources available in PowerAnalyzer. PowerAnalyzer uses this data source to connect to the Data Profiling warehouse and read the metric and attribute data for a report. For information about how PowerAnalyzer connects to the primary and additional data sources, see the PowerAnalyzer Administrator Guide.

Primary Time Dimension (Optional)
    This option does not apply to Data Profiling reports.

Additional Schema Mappings (Optional)
    This option does not apply to Data Profiling reports.
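As a sketch of the System Name restriction above; the helper is illustrative, not part of PowerAnalyzer:

```python
# Characters Table 2-3 disallows in a system name: space, tab, newline,
# and \ / : * ? " < > | ' &
INVALID_NAME_CHARS = set(' \t\n\\/:*?"<>|\'&')

def is_valid_system_name(name):
    """Check a data connector system name against the disallowed characters."""
    return len(name) > 0 and not (set(name) & INVALID_NAME_CHARS)
```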
To add the Data Profiling warehouse data source to the system data connector:
6. Click Add.
PowerAnalyzer displays the additional schema mappings for the system data connector.
7. Click OK.
You can now run Data Profiling reports using the system data connector.
Table 2-4. Default Data Profile Options - General Tab

Always save changes before interactive run (Optional)
    Select to always save changes to the profile mapping before running a profile session interactively. If you clear this option, the Designer prompts you to save changes before you run an interactive session.

Display Profile Manager after creating a profile (Optional)
    Select to always launch the Profile Manager after you create a data profile. If you clear this option, the Profile Manager does not launch automatically after you create a data profile.

Always run profile interactively (Optional)
    Select to always run a profile session interactively when you create a data profile. If you clear this option, you can still run auto and custom profile sessions interactively from the Profile Manager. For more information about running interactive sessions, see “Running Sessions from the Profile Manager” on page 55. For more information about creating custom profiles, see “Creating a Custom Profile” on page 38.

Check in profile mapping when profile is saved (Optional)
    Select if you want the Designer to check in profile mappings when you save changes for versioned repositories. Saving versions of profile mappings in your repository can consume large amounts of disk space. Make sure you have enough disk space on the machine hosting the repository.

Always invoke auto profiling dialog (Optional)
    Select to display the Auto Profiling dialog box when you create a new auto profile. If you clear this option, the Auto Profiling dialog box does not display when you create a new data profile. Also, you cannot configure Domain Inference tuning options and verbose data loading options when you create the auto profile. However, if you clear this option and create an auto profile for a source with 25 or more columns, the Auto Profiling dialog box displays. For more information about creating auto profiles, see “Creating an Auto Profile” on page 34.

Use source owner name during profile mapping generation (Optional)
    Select to add the table owner name to relational sources when the Designer generates a profile mapping.
    Note: If the owner name changes after you generate the profile mapping, you must regenerate the mapping. You can regenerate a profile mapping in the Profile Manager. For more information about regenerating profile mappings, see “Using the Profile Manager” on page 6.
4. Enter the name of a text editor for the session log file.
By default, the Profile Manager selects WordPad as the text editor.
5. Enter the location where you want your session log files to be written.
6. Click the Prefixes tab.
Table 2-5. Default Data Profile Options - Prefixes Tab

Profile Mapping Prefix (Required)
    Edit the prefix to use with all profile mapping names. Profile mappings use the following naming convention: <prefix>_<Profile Name>. The default prefix is m_DP_. The prefix must be 1 to 10 characters. It cannot contain spaces.

Profile Workflow Prefix (Required)
    Edit the prefix to use with all profile workflow names. Profile workflows use the following naming convention: <prefix>_<Profile Name>. The default prefix is wf_DP_. The prefix must be 1 to 10 characters. It cannot contain spaces.

Profile Session Prefix (Required)
    Edit the prefix to use with all profile session names. Profile sessions use the following naming convention: <prefix>_<Profile Name>. The default prefix is s_DP_. The prefix must be 1 to 10 characters. It cannot contain spaces.
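A sketch of how these prefixes combine with profile names; the helper is hypothetical, and it assumes the prefix is concatenated directly onto the profile name, which matches examples such as m_DP_AP_CustomerData elsewhere in this guide:

```python
def profile_object_name(prefix, profile_name):
    """Build a mapping, session, or workflow name from its prefix.

    Assumption: the defaults (m_DP_, wf_DP_, s_DP_) already end in an
    underscore, so plain concatenation reproduces the names this guide shows.
    """
    if not (1 <= len(prefix) <= 10) or " " in prefix:
        raise ValueError("prefix must be 1 to 10 characters with no spaces")
    return prefix + profile_name
```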
8. Click OK.
4. Select the folder from which you want to purge metadata and/or profile session results.
The folder must be open to purge it. Or, select All Open Folders to purge metadata and/or profile session results from all open folders.
Overview
You can create, edit, and delete data profiles. A data profile contains a set of functions to apply to a specified set of source data. The functions return metadata about the profile sources, and this metadata makes up the data profile reports.
You can create the following types of data profiles:
♦ Auto profile. Contains predefined functions for profiling source data. You can use an auto
profile during mapping or mapplet development to learn more about your source data. For
more information about creating an auto profile, see “Creating an Auto Profile” on
page 34.
♦ Custom profile. You create and add functions to create a custom profile. You can use a
custom profile during mapping and mapplet development to validate documented
business rules about the source data. You can also use a custom profile to monitor data
quality. For more information about creating a custom profile, see “Creating a Custom
Profile” on page 38.
When you create a data profile, the Designer generates a mapping based on the profile
functions. When you run a session for the mapping, the PowerCenter Server writes profile
data to the Data Profiling warehouse.
Once you create a data profile, you can edit and delete the data profile. For more information
about editing a data profile, see “Editing a Data Profile” on page 47. For more information
about deleting a data profile, see “Deleting a Data Profile” on page 49.
Profile Functions
You can add multiple profile functions to a data profile. Profile functions are calculations you
perform on the source data that return information about various characteristics of the source
data.
When you add a function to a data profile, you can choose from the following types of
functions:
♦ Source-level functions. Perform calculations on two or more columns of a source, source
group, or mapplet group. For example, you can evaluate a business rule for groups in an
XML source.
♦ Column-level functions. Perform calculations on one column of a source. For example,
you can evaluate the data in a column to find patterns that frequently occur in your data.
♦ Intersource functions. Perform calculations on two or more sources. These functions
generate information about the relationship between the sources. For example, you might
compare the values of columns in two sources to find out the percentage of identical data
that appears in both sources.
Each function type has a subset of functionality that you can configure when you add a
function to the data profile. For more information about profile functions, see “Working with
Functions” on page 91.
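As an illustration of the kind of intersource result described above, here is a hedged sketch in the spirit of orphan analysis; the function and its inputs are assumptions, not the product's implementation:

```python
def orphan_percentage(child_values, parent_values):
    """Percentage of values in one source with no match in another source."""
    parent = set(parent_values)
    orphans = [v for v in child_values if v not in parent]
    return 100.0 * len(orphans) / len(child_values) if child_values else 0.0
```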
Creating an Auto Profile
Create an auto profile to learn more about your source data or mapplet output data during
mapping development. When you create an auto profile, the Designer creates a data profile
with the following functions:
♦ Row Count. Counts the number of rows read from the source during the profile session.
When you create a data profile that uses the Row Count function with data samples, the
Row Count function estimates the total row count.
♦ Candidate Key Evaluation. Calculates the number and percentage of unique values in one
or more columns.
♦ Redundancy Evaluation. Calculates the number of duplicate values in one or more
columns of the source.
♦ Domain Inference. Reads all values in a column and infers patterns that occur in the data.
You can configure the Profile Wizard to filter the Domain Inference results.
♦ Distinct Value Count. Reads all the values in a column and returns the number of distinct
values for the column. You can configure the auto profile to load verbose data to the Data
Profiling warehouse.
♦ Aggregate functions. Calculates an aggregate value for numeric or string values in a
column. You can use aggregate functions to count null values, determine average values,
and determine minimum or maximum values.
For more information about functions, see “Working with Functions” on page 91.
The Designer uses the naming convention AP_<source/mapplet name> for auto profiles. For
example, if you generate an auto profile for the source CustomerData, the Designer names
the auto profile AP_CustomerData.
After you create the auto profile, the Designer generates a mapping based on the profile
functions. The Designer uses the following naming convention when it saves the profile
mapping to the repository:
m_DP_AP_<source/mapplet name>
For example, if you create an auto profile called AP_CustomerData, the profile mapping
name is m_DP_AP_CustomerData.
Tip: You can rename an auto profile in the Profile Manager. You can click Description on the
Auto Profile Column Selection page to change the name or description of the profile. Or, you
can change the naming convention for profile mappings in the default data profile options.
For information about changing default data profile options, see “Configuring Default Data
Profile Options” on page 26.
If an auto profile already exists for the source or mapplet, the Designer names the new auto
profile AP_<source/mapplet name>N, where N is the latest version number of the previous
auto profile plus 1. For example, if you have an auto profile AP_CustomerData, and you
generate a new auto profile for the source CustomerData, the new auto profile name is
AP_CustomerData1.
The mapping the Designer generates for the new auto profile uses the following naming
convention:
m_DP_AP_<source/mapplet name>N
1. Select the source definition in the Source Analyzer or mapplet in the Mapplet Designer
you want to profile.
2. In the Source Analyzer, select Sources-Profiling-Create Auto Profile. In the Mapplet
Designer, select Mapplets-Profiling-Create Auto Profile.
The Auto Profiling dialog box displays in the following cases:
♦ You set the default data profile options to open the Auto Profiling dialog box when
you create an auto profile.
♦ The source definition contains 25 or more columns.
Use the Auto Profiling dialog box to add or edit a name or description for the profile, add
columns or groups to the profile, specify the type of verbose data to load, and select whether
to configure the session and the Domain Inference function settings.
If the Auto Profiling dialog box does not display, the Designer generates an auto profile
and profile mapping based on the profile functions. Go to step 10.
Note: If you skip this dialog box, you cannot configure verbose data loading settings or
Domain Inference tuning settings.
3. Optionally, click Description to add a description for the data profile.
If you do not want to add a description, go to step 5.
4. Enter a description up to 200 characters for the profile, and click OK to return to the
Auto Profile Column Selection page.
5. Optionally, select the groups or columns in the source that you want to profile.
By default, all columns or groups are selected.
6. Specify the type of verbose data for the Distinct Value Count function you want the
PowerCenter Server to write to the Data Profiling warehouse during the profile session.
The PowerCenter Server can write the following types of verbose data to the Data
Profiling warehouse for auto profile functions:
♦ No rows. The PowerCenter Server writes no verbose data to the Data Profiling
warehouse.
Also, you cannot use an at sign (@) or a number at the beginning of a data profile name.
When you are finished, click Next.
To add sources to the data profile, select a source definition, group in a source definition, or
mapplet and click the Add Source button. To remove a source, select the source definition,
group, or mapplet and click the Remove Source button.
When you are finished adding sources, click Next.
Tip: If you want to profile multiple sources, you can create a mapplet that combines multiple
sources and create a data profile based on the mapplet output data.
If you finish adding functions to the profile and you have not enabled session configuration,
click Finish. The Profile Wizard generates the mapping. For more information about
generating the profile mapping, see “Generating the Profile Mapping” on page 45.
If you specify an intersource function, you must select at least two sources or two groups from
different sources to apply the function to. For more information about profile functions, see
“Working with Functions” on page 91.
Figure 3-5. Profile Wizard - Function Role Details Page (Row Count Function)
Each function type has a subset of functionality you can configure to perform calculations on
the source data. For more information about configuring functions, see “Working with
Functions” on page 91.
When you finish configuring the function, the Profile Wizard returns to the Function-Level
Operations page described in “Step 3. Add Functions and Enable Session Configuration” on
page 39. You can then continue to add and configure functions for the profile.
Profile Mappings
2. From the Profile View, select the profile you want to delete.
3. Select Profile-Delete.
The Profile Manager asks if you want to delete the selected data profile.
Overview
To generate information about your source data from a data profile, you must create and run
a profile session. You can create and run profile sessions from the following tools:
♦ Profile Manager. You can create and run profile sessions from the Profile Manager. This
allows you to run sessions immediately to quickly obtain profile results from a source. You
can also run a session from the Profile Manager when you want to profile a sample of
source data instead of the entire source. When you create sessions from the Profile
Manager, the Profile Manager creates a workflow and associates it with the session.
♦ Workflow Manager. If you want to monitor ongoing data quality issues, you can create a
persistent session and workflow for the profile mapping in the Workflow Manager and add
a scheduling task. This allows you to perform a time-dimensional analysis of data quality
issues. You can also edit and run persistent sessions that you create in the Profile Manager
from the Workflow Manager. For more information about creating and running sessions
and workflows, see the PowerCenter Workflow Administration Guide.
After you configure the options for the Profile Run page, you must configure options for the
Session Setup page.
Table 4-2 describes the session property settings on the Session Setup page:
Source Properties (Required): Configure source connection properties on the Connections
tab. Configure source properties on the Properties tab. Configure reader properties on the
Reader tab. The Source properties are the same as those in the session properties you
configure in the Workflow Manager. For more information about session property settings
for sources, see the PowerCenter Workflow Administration Guide.
Target Connections (Required): The relational database connection to the Data Profiling
warehouse database. This is the relational database connection you configured for the Data
Profiling warehouse in the Workflow Manager. For more information about configuring
relational database connections, see the PowerCenter Workflow Administration Guide.
Reject File Directory (Required): Directory for session reject files. The default reject file
directory is $PMBadFileDir\.
Run Session (Optional): Select to run the session immediately. Otherwise, the Profile
Manager saves the session configuration information and exits.
You can also monitor the profile session from the Workflow Monitor. The PowerCenter
Server creates a workflow for profile sessions. For more information about monitoring
workflows, see the PowerCenter Workflow Administration Guide.
If an interactive session fails, the PowerCenter Server writes a session log. You can review the
session log, correct any errors, and restart the session. To view the session log, click View-
Session Log in the Profile Manager.
When the session successfully finishes, you can view reports that contain the profile results.
For more information about viewing Data Profiling reports, see “Viewing Profile Results” on
page 65.
Source-level functions: Profiles created with source-level functions and data samples can
display general patterns within the data.
Column-level functions: Data profiles created with the following column-level functions and
data samples can display general patterns within the data:
- Domain Inference
- Business Rule Validation
- Distinct Value Count
Data samples for NULL Count and Average Value Aggregate functions display general
patterns within the data. However, Minimum and Maximum Value Aggregate functions can
have inconsistent results because a column can have unusually high maximum values or
unusually low minimum values.
Intersource functions: You cannot use data samples with intersource functions.
I tried to run an interactive session for an SAP R/3 source, but the session failed.
When you create a data profile for an SAP R/3 source, the Designer does not generate an
ABAP program for the data profile. If you run an interactive session immediately after
creating the data profile, the session fails. Create an ABAP program for the profile mapping
and then run the session.
An interactive session failed with an error message stating that the buffer block size is too
low.
You ran an interactive session for a source with a large number of rows with high precision.
Or, you ran an interactive session for a multi-group source with a large number of groups. As
a result, the PowerCenter Server could not allocate enough memory blocks to hold the data,
and the session failed.
Create a persistent session for the profile mapping in the Workflow Manager. In the session
properties, set the value for the buffer block size that the error message in the session log
recommends. For more information about optimizing sessions and setting the buffer block
size, see the PowerCenter Workflow Administration Guide.
Chapter 5
Viewing Profile Results
Overview
After you run a profile session, you can view the session results in a report. There are two
types of Data Profiling reports:
♦ PowerCenter Data Profiling reports. Reports you can view from the Profile Manager after
running a profile session. PowerCenter Data Profiling reports display data for the latest
session run. Use PowerCenter Data Profiling reports to quickly view profile results during
mapping development.
♦ PowerAnalyzer Data Profiling reports. Reports you can view from PowerAnalyzer after
running a profile session. PowerAnalyzer reports provide a time-dimensional view of your
data. They also display information about rejected rows in your profile results. Use
PowerAnalyzer Data Profiling reports when you want to monitor data quality during
production.
Profile Run Time: Date and time of the profile session run.
Folder Name: Name of the folder in which the data profile is stored.
Repository Name: Name of the repository in which the data profile is stored.
Source: Name of the source definition on which the auto profile is based, in the following
format: <database name>::<source definition name>
Groups: The groups in the source definition or mapplet on which the auto profile is based,
where applicable.
The body of an auto profile report provides general and detailed information. In an auto
profile report, you can click the hyperlinks to view information about verbose data for the
Distinct Value Count function, Redundancy Evaluation function, and Domain Inference
function.
Profile Run Time: Date and time of the profile session run.
Folder Name: Name of the folder in which the data profile is stored.
Repository Name: Name of the repository in which the data profile is stored.
Source Name: The type and name of the source definitions upon which the custom profile is
based.
Show report for all sources: Click to show Data Profiling reports for all sources upon which
the custom profile is based. By default, the report shows all sources. To view a particular
source, click the source name on the Sampling Summary page. This filters out results for
other sources.
Table 5-3 describes the attributes that display for each function in a custom profile report:
Column Names: Name of the column to which the profile business rule is applied.
You can also click the hypertext links to view information about verbose data.
4. Select the ODBC data source for the Data Profiling warehouse.
5. Enter the user name and password for the Data Profiling warehouse.
6. Click Connect.
The PowerCenter Data Profiling report displays.
You access the PowerAnalyzer Data Profiling reports folder in the Public Folders on the Find
tab.
Informatica recommends that you do not use the Summary option when viewing Data
Profiling reports. If you use the Summary option, the values that display in the report may
not accurately reflect your profile session results.
Tip: When you view a PowerAnalyzer Data Profiling report, select +Workflows to open the
analytic workflow.
Profile List: This report displays information for all data profiles in a folder. The report
displays data for the latest data profile version.
Profile Function List: This report displays the functions for the selected profile. The report
displays data for the latest data profile version.
Source Function Statistics: This report displays details about source functions for the
selected profile. The report displays data for the latest data profile version.
Column Function Statistics: This report displays details about column functions for the
selected profile. The report displays data for the latest data profile version.
Redundancy Evaluation Statistics: This report displays details from the Redundancy
Evaluation source-level function for the selected profile. The report displays data for the
latest data profile version.
Candidate Key Statistics: This report displays details from the Candidate Key Evaluation
source-level function for the selected profile. The report displays data for the latest data
profile version.
Row Count Statistics: This report displays details from the Row Count source-level function
for the selected profile. The report displays data for the latest data profile version.
Source Business Rule Validation Statistics: This report displays details from the Business
Rule Validation source-level function for the selected profile. The report displays data for the
latest data profile version.
Rejected Rows - Source Business Rule Validation: This report displays details about the rows
that do not satisfy the specified business rule for a source (verbose data details). The report
displays data for the latest data profile version.
Column Business Rule Validation Statistics: This report displays details about the Business
Rule Validation column-level function for the selected profile. The report displays data for
the latest data profile version.
Aggregate Column Statistics: This report displays details about the Aggregate function for
the selected profile. The report displays data for the latest data profile version.
Distinct Count Statistics: This report displays details about the Distinct Value Count
function for the selected profile. The report displays data for the latest data profile version.
Domain Validation Statistics: This report displays details about the Domain Validation
function for the selected profile. The report displays data for the latest data profile version.
Domain Inference Statistics: This report displays details about the Domain Inference
column function type for the selected profile. The report displays data for the latest data
profile version.
Rejected Rows - Column Business Rule Validation: This report displays details about the
rows that do not satisfy the specified business rule for a column (verbose data details). The
report displays data for the latest data profile version.
Rejected Rows - Domain Validation: This report displays details about the rows that do not
satisfy the specified domain validation rule for a column (verbose data details). The report
displays data for the latest data profile version.
Outer Join Analysis: This report displays the results of all the Orphan Analysis functions for
the selected profile. The report displays data for the latest data profile version.
Rejected Rows - Outer Join Analysis: This report displays the unmatched rows of the
Orphan Analysis function for the selected profile. The report displays data for the latest data
profile version.
Cartesian Product Analysis: This report displays details about the Join Complexity
Evaluation function for the selected profile. The report displays data for the latest data profile
version.
Auto Profile - Column Statistics: This report displays the results of column-level statistics for
a group generated as part of an auto profile for column functions. The report displays data for
the latest data profile version.
Inter Source Function List: This report displays details about intersource functions for the
selected data profiles. The report displays data for the latest data profile version.
Working with Domains
Domains are sets of all valid values for a column. When you create a custom profile, you can
create domains or you can use existing domains that Informatica provides. Some domains
contain a list of all valid values that the source column can contain. Some domains contain a
regular expression that describes a range or pattern of values that the source column can
contain. You can use prepackaged domains or create your own domains.
You can use the following domains to profile your data:
♦ Prepackaged domains. For more information about prepackaged domains, see
“Prepackaged Domains” on page 79.
♦ Custom domains. For more information about creating custom domains, see “Custom
Domains” on page 80.
US-Zip-Codes-Pattern (faster): Validates the source against the U.S.A. zip code pattern.
US-Zip-Codes-List (more accurate): Contains a list of United States Postal Service zip codes.
US-State-Codes (extended): Contains a list of U.S.A. state abbreviations with additional
values for territories and outlying areas served by the United States Postal Service.
US-State-Names (extended): Contains a list of the names of all U.S.A. states with additional
values for territories and outlying areas.
US-Social-Security-Number: Validates the source against the U.S.A. social security number
pattern.
Canadian-Zip-Codes: Validates the source against the Canadian zip code pattern.
UK-Postal-Codes: Validates the source against the U.K. postal code pattern.
Custom Domains
You can create custom domains when you create a Domain Validation function. During a
profile session, the Domain Validation function uses the domains you specify to validate
source values or to help you infer patterns from source data.
You can create the following types of domains:
♦ List of Values. Domains defined by a comma-delimited list of values.
♦ Regular Expression. Domains defined by a range of values in an expression.
♦ Domain Definition Filename. Domains defined by an external file containing values.
You can create reusable and non-reusable custom domains. Apply a reusable domain to
multiple Domain Validation functions. Apply a non-reusable domain to one Domain
Validation function. For more information about configuring a Domain Validation function,
see “Column-Level Functions” on page 101.
You can create a domain from the Profile Manager or when you configure a function. Any
domain you create from the Profile Manager is a reusable domain. When you create a domain
from a function, you can make the domain reusable or non-reusable.
When you view domains in the Domain Browser from the Profile Manager, you can only view
reusable and prepackaged domains. You can view non-reusable domains from the Domain
Browser when you define a function to which the non-reusable domain applies.
Once you create a domain, you can edit or delete the domain. For more information about
editing domains, see “Editing a Domain” on page 89. For more information about deleting a
domain, see “Deleting a Domain” on page 90.
1. To create a List of Values domain from the Profile Manager, select Tools-Domains.
To create a domain when you define a function, click the Domains button on the Profile
Function Details page.
4. Clear Reusable Domain if you do not want to be able to use this domain in other
Domain Validation functions.
Note: If you configure a domain from the Profile Manager, the domain is automatically
reusable. You cannot make the domain non-reusable.
5. Select List of Values as the domain type.
6. In the Value box, enter a new domain value to manually add values to the list of values. If
you want to add a file with a list of values, go to step 8.
When you enter a domain value, the Designer ignores any spaces before or after the
value.
7. Click Add to add the domain values you entered to the list of values.
8. Click Values File to add a list of values.
If you want to add domain values manually, go to step 6.
9. Navigate to the file, and select the file to use.
10. Select the appropriate code page from the drop-down list.
The code page you specify must be a subset of the code page for the operating system that
hosts the PowerCenter client. You can specify localization and code page information in the
values file. If you do not specify the localization information, the PowerCenter Server uses
default values. For information about specifying localization information, see "Specifying
Localization and Code Page Information" on page 87.
11. Repeat steps 6 and 7 for each domain value you want to add.
12. If you want to remove a domain value, select the value from the list of values and click
Remove.
13. Click OK to save the domain.
14. Click Close.
[a-z]: Matches one instance of a letter. For example, [a-z][a-z] can match ab or CA.
(): Groups an expression. For example, the parentheses in (\d\d-\d\d) group the expression
\d\d-\d\d, which finds any two numbers followed by a hyphen and any two numbers, as in
12-34.
{}: Matches the number of characters exactly. For example, \d{3} matches any three
numbers, such as 650 or 510. Or, [a-z]{2} matches any two letters, such as CA or NY.
?: Matches the preceding character or group of characters zero or one time. For example,
\d{3}(-\d{4})? matches any three numbers, which can be followed by a hyphen and any four
numbers.
* (an asterisk): Matches zero or more instances of the values that follow the asterisk. For
example, *0 is any value that precedes a 0.
For example, to create a regular expression for U.S.A. zip codes, you can enter the following
perl syntax:
\d{5}(-\d{4})?
This expression lets you find 5-digit U.S.A. zip codes, such as 93930, as well as 9-digit zip
codes, such as 93930-5407.
In this example, \d{5} refers to any five numbers, such as 93930. The parentheses surrounding
-\d{4} group this segment of the expression. The hyphen represents the hyphen of a 9-digit
zip code, as in 93930-5407. \d{4} refers to any four numbers, such as 5407. The question
mark states that the hyphen and last four digits are optional or can appear one time.
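Outside the Profile Wizard, you can check the same pattern with any Perl-compatible regular expression engine. The following Python sketch tests the zip code expression against sample values; the is_valid_zip helper and the sample values are illustrative, not part of PowerCenter:

```python
import re

# The U.S.A. zip code pattern from the example above:
# five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def is_valid_zip(value: str) -> bool:
    """Return True if the entire value matches the zip code pattern."""
    # fullmatch requires the whole string to conform, not just a prefix.
    return ZIP_PATTERN.fullmatch(value) is not None

print(is_valid_zip("93930"))       # True: 5-digit zip
print(is_valid_zip("93930-5407"))  # True: 9-digit zip
print(is_valid_zip("9393"))        # False: too few digits
print(is_valid_zip("93930-54"))    # False: incomplete 4-digit extension
```

Note that fullmatch rejects values with trailing characters, so a value must conform as a whole rather than merely contain a matching substring.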
Figure 6-1 shows an example of a Regular Expression Domain:
When you enter a regular expression, you can validate the regular expression with test data.
You can use the test data instead of the data in your repository. The test data should represent
the source data you plan to use the Regular Expression domain against.
Figure 6-2 shows an example of test data in the Test Data dialog box:
Test data: 9999. Regular expression: \d\d\d\d or \d{4}. Result: matches any four digits from
0-9, as in 1234 or 5936.
Test data: 9xx9. Regular expression: \d[a-z][a-z]\d. Result: matches any number followed by
two letters and another number, as in 1ab2.
10. Click OK to save the domain.
11. Click Close.
You can use server variables, such as $PMSourceDir and $PMRootDir, when you specify a
filename and path for the domain value.
Note: The file you specify must use a code page that is a subset of the PowerCenter Server
code page. You can specify a code page by entering valid syntax on the first line of the
file. For information about entering localization and code page information, see
“Specifying Localization and Code Page Information” on page 87.
6. Click OK to save the domain.
7. Click Close.
The first line of the file uses the following syntax:
locale=Language_Territory.CodePage@Sort
where language, territory, code page, and sort represent the following information:
− Language. Specifies the translation for the names of months and days of the week.
− Territory. Specifies country-dependent information such as currency symbols, numeric
and monetary formatting rules, and date/time formats.
− Code page. Specifies the character encoding to use. The code page you specify must be a
subset of the code page for the operating system that hosts the PowerCenter client.
− Sort. Specifies the collation sequence to use. For example, you can use Binary.
For example, you can specify the following localization information for a U.S. English file:
locale=English_UnitedStates.US-ASCII@binary
For a Japanese file, you can specify the following localization information:
locale=Japanese_Japan.JapanEUC@binary
For more information about code page compatibility for Data Profiling components, see
“Code Page Compatibility” on page 140.
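For example, a domain definition file for a List of Values domain might look like the following. This is a hypothetical sketch; the locale line and the two-letter state codes are illustrative values, not a prepackaged domain. The first line carries the localization information, and each subsequent line is one domain value:

```
locale=English_UnitedStates.US-ASCII@binary
AK
AL
AZ
CA
```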
To edit a domain:
1. To edit a reusable or prepackaged domain from the Profile Manager, select Tools-
Domains.
To edit a domain when you define a function, click the Domains button on the Profile
Function Role Details page.
The Domain Browser dialog box displays.
2. Select the domain you want to edit, and click Edit.
The Domain Details dialog box displays.
3. If you are editing a custom domain, optionally change the domain name.
If you are editing a List of Values domain, add or remove domain values as necessary.
If you are editing a Regular Expression domain, modify the domain expression as
necessary.
If you are editing a Domain Filename Definition domain, modify the filename that
contains the domain values as necessary.
4. Click OK to save your changes.
5. Click Close.
Deleting a Domain
You can delete a domain if you no longer want to apply it to Domain Validation functions. If
you delete a domain, the Designer invalidates all of the data profiles and related profile
mappings that reference the domain.
To delete a domain:
Overview
You include functions in a profile to perform calculations on sources during a profile session.
When you create an auto profile, the Designer adds a predefined set of functions to your
profile. When you create a custom profile, you create functions that meet your business needs,
and add them to your profile. You can add the following types of functions to a profile:
♦ Source-level functions. Perform calculations on two or more columns of a source, source
group, or mapplet group. For more information about source-level functions, see
"Source-Level Functions" on page 93.
♦ Column-level functions. Perform calculations on one column in a source. For more
information about column-level functions, see “Column-Level Functions” on page 101.
♦ Intersource functions. Perform calculations on two or more sources, source groups, or
mapplet groups. For more information about intersource functions, see “Intersource
Functions” on page 111.
For many profile functions, you can write data to the Data Profiling warehouse in verbose
mode. When you select this option, the PowerCenter Server writes verbose data to the Data
Profiling warehouse during the profile session. You can use Data Profiling reports to view
more information about the verbose data. For more information about viewing
PowerAnalyzer reports, see “Viewing Profile Results” on page 65.
For more information about creating auto profiles, see “Creating an Auto Profile” on page 34.
For more information about creating custom profiles, see “Creating a Custom Profile” on
page 38.
Row Count
The Row Count function returns the number of rows in a source. It can also report the
number of rows in each group. If you configure a session for manual random sampling or
automatic random sampling, the Row Count function returns an estimate of the total source
rows. If you configure a session to sample First N Rows, the Row Count function returns the
number of rows read during the session. For more information about using data samples, see
“Profiling Data Samples” on page 60.
Figure 7-1 shows the Function Role Details page for the Row Count function:
Generate profile data by group (Optional): Select to group source rows by a particular
column. You can view a result for each group in the Data Profiling report.
Group by Columns (Optional): If you selected Generate Profile Data by Group, select a
column to group by. You can select a column of any datatype except Binary. If you select a
column of numeric datatype, the precision must be between 1 and 28 digits. If you select a
column of String datatype, the precision must be between 1 and 200 characters.
Table 7-2 shows the properties of the Business Rule Validation source-level function:
Rule Summary (Required): Click the Rule Editor button to enter a business rule. Once you
enter a business rule, the rule displays in the Rule Summary dialog box. Use a valid Boolean
expression. Enter only Business Rule Validation functions in the Business Rule Editor. If you
enter other functions available through Business Rule Validation, such as Date or String
functions, the session may generate unexpected results.
Generate profile data by group (Optional): Select to group source rows by a particular
column. You can view a result for each group.
Group by Columns (Optional): If you selected Generate Profile Data by Group, select a
column to group by. You can select a column of any datatype except Binary. If you select a
column of numeric datatype, the precision must be between 1 and 28 digits. If you select a
column of String datatype, the precision must be between 1 and 200 characters.
Specify the type of verbose data to load into the warehouse (Required): Select for the
PowerCenter Server to write verbose data to the Data Profiling warehouse. You can load the
following types of verbose data:
- No Rows
- Valid rows only
- Invalid rows only
- All Rows
The character limit is 1000/K characters, where K is the maximum number of bytes for each
character in the Data Profiling warehouse code page. If the column exceeds this limit, the
PowerCenter Server writes truncated data to the Data Profiling warehouse. For more
information about configuring verbose mode, see "Configuring a Function for Verbose
Mode" on page 43.
Select Columns (Optional): Click to select the columns you want to profile in verbose mode.
For more information about configuring a function for verbose mode, see "Configuring a
Function for Verbose Mode" on page 43.
Figure 7-3 shows the Business Rule Validation Editor for a simple business rule:
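As an illustration, a simple business rule might be a Boolean expression like the following. This is a hypothetical rule; the SALARY column name is an assumption, not part of any example source:

```
NOT ISNULL(SALARY) AND SALARY > 0
```

Rows for which the expression evaluates to true count as valid, and rows for which it evaluates to false count as invalid.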
Table 7-3 shows the properties of the Candidate Key Evaluation function:
Select Column(s) (Required): Select the columns to profile. By default, the Designer selects
all columns of numeric datatype with a precision between 1 and 28 digits or String datatype
with a precision of 1 to 10 characters.
Enable duplicate count for pairs of selected columns (Optional): Select to evaluate candidate
keys based on pairs of columns.
Redundancy Evaluation
The Redundancy Evaluation function calculates the number of duplicate values in one or
more columns of a source. You can use the Redundancy Evaluation function to identify
columns to normalize into separate tables. You may want to normalize the columns that have
the highest percentage of redundant values.
This function can evaluate unique values in columns of numeric datatypes with a precision of
28 digits or less or columns of the String datatype with a precision of 10 characters or less.
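The statistic behind this function can be sketched as counting, per column, how many rows carry a value that also appears in another row. A hypothetical Python illustration, not the product's implementation:

```python
from collections import Counter

def redundancy(rows, column):
    """Count duplicate (redundant) values in one column -- the statistic
    the Redundancy Evaluation function reports. A value occurring n
    times contributes n - 1 redundant rows."""
    counts = Counter(row[column] for row in rows)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    percentage = duplicates / len(rows) * 100
    return duplicates, percentage
```

A column of state codes ["CA", "CA", "NY", "CA", "NY"] yields 3 redundant rows (60%), suggesting it may be worth normalizing into a separate table.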
Figure 7-5 shows the Function Role Details page for the Redundancy Evaluation function:
Select Column(s) (Required): Select the columns you want to profile using the Redundancy Evaluation function. By default, the Designer selects all columns of numeric datatype with a precision between 1 and 28 digits or String datatype with a precision of 1 to 10 characters.
Enable duplicate count for pairs of selected columns (Optional): Select to perform redundancy evaluation based on pairs of columns.
Row Uniqueness
The Row Uniqueness function calculates the number of unique and duplicate values based on
the columns selected. You can profile all columns in the source row or choose columns to
profile. This helps you identify columns to normalize into a separate table. You can also use
this function to test for distinct rows. This function is particularly useful for flat files, which
have no internal validation tools such as primary key constraints or unique indexes. For
example, if you have a flat file source that uses unique employee ID values to identify each row, you can use the Row Uniqueness function to verify that each employee ID value appears only once.
Select Column(s) (Required): Select the columns you want to profile using the Row Uniqueness function. By default, the Designer selects all columns of numeric datatype with a precision between 1 and 28 digits or String datatype with a precision of 1 to 10 characters.
Generate profile data by group (Optional): Select to group source rows in a particular column. You can view a result for each group in the Data Profiling report.
Table 7-5. Row Uniqueness Function Options
Group by Columns (Optional): If you selected Generate Profile Data by Group, select a column to group by. You can select a column of any datatype except Binary. If you select a column of numeric datatype, the precision must be between 1 and 28 digits. If you select a column of String datatype, the precision must be between 1 and 200 characters.
Specify the type of verbose data to load into the warehouse (Required): Select for the PowerCenter Server to write verbose data to the Data Profiling warehouse. You can load the following types of verbose data:
- All rows
- No rows
- Duplicate rows only
The character limit is 1000/K, where K is the maximum number of bytes for each character in the Data Profiling warehouse code page. If the column exceeds this limit, the PowerCenter Server writes truncated data to the Data Profiling warehouse. For more information about configuring verbose mode, see “Configuring a Function for Verbose Mode” on page 43.
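The unique/duplicate split that Row Uniqueness reports can be sketched by counting value tuples over the selected columns. A hypothetical Python illustration, not the product's implementation:

```python
from collections import Counter

def row_uniqueness(rows, columns):
    """Count unique and duplicate rows based on the selected columns,
    as the Row Uniqueness function does for flat file sources that
    lack primary key constraints or unique indexes."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    unique = sum(1 for c in counts.values() if c == 1)
    duplicates = len(rows) - unique
    return unique, duplicates
```

For a flat file with employee IDs [101, 102, 102, 103], profiling on the ID column reports 2 unique rows and 2 duplicate rows.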
Table 7-6 shows the properties of the Business Rule Validation column-level function:
Selected Column (Required): Select the column you want to apply the business rule to.
Generate profile data by group (Optional): Select to group source rows in a particular column. You can view a result for each group.
Group by Columns (Optional): If you selected Generate Profile Data by Group, select a column to group by. You can select a column of any datatype except Binary. If you select a column of numeric datatype, the precision must be between 1 and 28 digits. If you select a column of String datatype, the precision must be between 1 and 200 characters.
Rule Summary (Required): Click the Rule Editor button shown in Figure 7-3 to enter a business rule. Once you enter a business rule, the rule displays in the Rule Summary dialog box. Use a valid Boolean expression. You can only enter Business Rule Validation functions in the Business Rule Editor. If you enter other functions available through the Rule Editor, such as Date or String functions, the session may generate unexpected results.
Specify the type of verbose data to load into the warehouse (Required): Select for the PowerCenter Server to write verbose data to the Data Profiling warehouse. You can load the following types of verbose data:
- No rows
- Valid rows only
- Invalid rows only
- All rows
The character limit is 1000/K, where K is the maximum number of bytes for each character in the Data Profiling warehouse code page. If the column exceeds this limit, the PowerCenter Server writes truncated data to the Data Profiling warehouse. For more information about configuring verbose mode, see “Configuring a Function for Verbose Mode” on page 43.
Domain Validation
The Domain Validation function calculates the number of values in the profile source
column that fall within a specified domain and the number of values that do not. A domain is
the set of all possible valid values for a column. For example, a domain might include a list of
abbreviations for all of the states in the U.S.A. Or, a domain might include a list of valid
U.S.A. zip code patterns.
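The two example domains above (state abbreviations as a list of values, zip codes as a pattern) can be sketched as follows. The names and the regular expression are illustrative assumptions, not the product's domain syntax:

```python
import re

# Hypothetical domains for illustration: a list-of-values domain of
# U.S. state abbreviations and a regular-expression domain for zip
# code patterns (5 digits, optionally followed by a 4-digit extension).
STATES = {"CA", "NY", "TX"}
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

def domain_validation(values, domain):
    """Count the values that fall within a domain and those that do
    not, as the Domain Validation function reports. The domain is
    either a set of valid values or a compiled regular expression."""
    if isinstance(domain, set):
        valid = sum(1 for v in values if v in domain)
    else:
        valid = sum(1 for v in values if domain.match(v))
    return valid, len(values) - valid
```

Validating ["94105", "9410", "94105-1234"] against the zip pattern would report 2 valid and 1 invalid value.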
Required/
Property Description
Optional
Selected Column Required Select the column you want to evaluate against the domain.
Domain Summary Required Click the Domains button to select a reusable domain or to create a non-reusable
domain. Once you enter a domain, it displays in the Domain Summary box. For
more information about domains, see “Working with Domains” on page 77.
Specify the type of Required Select for the PowerCenter Server to write verbose data to the Data Profiling
verbose data to warehouse. You can load the following types of verbose data:
load Into the - No Rows
warehouse - Valid rows only
- Invalid rows only
- All Rows
The character limit is 1000 bytes/ K, where K is the maximum number of bytes for
each character in the Data Profiling warehouse code page. If the column exceeds
this limit, the PowerCenter Server writes truncated data to the Data Profiling
warehouse. For more information about configuring verbose mode, see
“Configuring a Function for Verbose Mode” on page 43.
Select the domain you want to use for your domain validation function. Click the Close
button to return to the Domain Validation Function Role Details page.
If you validate the domain against a List of Values domain or a Domain Definition Filename
domain, the list must use a code page that is two-way compatible with the PowerCenter
Server. For information about specifying a compatible code page for these domains, see
“Custom Domains” on page 80.
Domain Inference
The Domain Inference function reads all values in the column and infers a pattern that fits
the data. The function determines whether the values fit a list of values derived from the column values or a pattern that describes the source data. For example, suppose you have a
column with social security numbers in your source. Use the Domain Inference function to
determine the pattern of numbers in the column. The Domain Inference function can also
infer a pattern of ‘STRING WITH ONLY SPACES’ for columns containing non-null blank
space data. This is useful for determining data quality.
This function can infer domains for columns with a numeric datatype with a precision of 28
digits or less or a String datatype with a precision of 200 characters or less. This function can
also infer domains for columns of the Date/Time datatype.
When you create a Domain Inference function, select a column for which you want to infer a
domain. Click Finish.
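A loose sketch of pattern inference: map every digit to one symbol and every letter to another, then check whether all column values share a single shape. The notation here ('9' for digits, 'X' for letters) is an assumption for illustration; the actual inferred-pattern format may differ:

```python
def infer_pattern(values):
    """Infer a single pattern that fits every value in a column,
    loosely mimicking the Domain Inference function. Returns None
    when the values do not share one shape."""
    def shape(v):
        if v and v.isspace():
            # Non-null blank-space data, as described above.
            return "STRING WITH ONLY SPACES"
        return "".join("9" if ch.isdigit() else "X" if ch.isalpha() else ch
                       for ch in v)
    shapes = {shape(v) for v in values}
    return shapes.pop() if len(shapes) == 1 else None
```

A column of social security numbers such as "123-45-6789" would infer the shared shape "999-99-9999".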
Aggregate Functions
An Aggregate function calculates an aggregate value for a numeric or string column of a profile source.
You can add the following aggregate functions to a profile:
♦ NULL Value Count. The number of rows with NULL values in the source column.
♦ Average Value. The average value of the rows in the source column.
♦ Minimum Value. The minimum value of the rows in the source column.
♦ Maximum Value. The maximum value of the rows in the source column.
The Aggregate function you can add to a source column depends on the datatype of the
column.
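The four aggregate statistics can be sketched for a single numeric column as follows. This is an illustrative sketch in which None stands in for NULL; which functions apply to string or date columns depends on the datatype, per Table 7-8:

```python
def aggregate(values):
    """Compute the four Aggregate function statistics for one column.
    None represents a NULL value; NULLs are excluded from the average,
    minimum, and maximum."""
    non_null = [v for v in values if v is not None]
    return {
        "NULL Value Count": len(values) - len(non_null),
        "Average Value": sum(non_null) / len(non_null) if non_null else None,
        "Minimum Value": min(non_null) if non_null else None,
        "Maximum Value": max(non_null) if non_null else None,
    }
```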
Table 7-8 describes the Aggregate functions you can add based on the datatype of the source
column:
Figure 7-11 shows the Function Role Details page for Aggregate functions:
Selected Column (Required): Select the column you want to apply the aggregation to.
Aggregate Functions (Required): Select the functions you want to apply to the source column.
Generate profile data by group (Optional): Select to group source rows in a particular column. You can view a result for each group.
Group by Columns (Optional): If you selected Generate Profile Data by Group, select a column to group by. You can select a column of any datatype except Binary. If you select a column of numeric datatype, the precision must be between 1 and 28 digits. If you select a column of String datatype, the precision must be between 1 and 200 characters.
Selected Column (Required): Select the column you want to apply the function to.
Rule Summary (Optional): Click the Rule Editor button to enter an expression. Once you enter an expression, it displays in the Rule Summary dialog box. If you enter other functions available through the Rule Editor, such as Date or String functions, the session may generate unexpected results.
Generate profile data by group (Optional): Select to group source rows in a particular column. You can view a result for each group.
Group by Columns (Optional): If you selected Generate Profile Data by Group, select a column to group by. You can select a column of any datatype except Binary. If you select a column of numeric datatype, the precision must be between 1 and 28 digits. If you select a column of String datatype, the precision must be between 1 and 200 characters.
Specify the type of verbose data to load into the warehouse (Required): Select for the PowerCenter Server to write verbose data to the Data Profiling warehouse. You can load the following types of verbose data:
- All rows
- No rows
- Duplicate rows only
The character limit is 1000/K, where K is the maximum number of bytes for each character in the Data Profiling warehouse code page. If the column exceeds this limit, the PowerCenter Server writes truncated data to the Data Profiling warehouse. For more information about configuring verbose mode, see “Configuring a Function for Verbose Mode” on page 43.
Orphan Analysis
The Orphan Analysis function compares the values of columns in two sources. When you
create this function, you select columns that you want to analyze. During the profile session,
the PowerCenter Server reports the number and percentage of rows that appear in a specified
column in the master source but not in the detail source. This is useful for referential integrity
analysis, also known as orphan analysis.
This function can evaluate unique values in columns of numeric datatypes with a precision of
28 digits or less or columns of the String datatype with a precision of 200 characters or less.
The columns can use any combination of datatypes except Date with Numeric and Non-Raw with Raw. You can use Raw with Raw when you disable verbose mode.
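The orphan statistic described above, master-source values absent from the detail source, can be sketched as a set difference. A hypothetical Python illustration, not the product's implementation:

```python
def orphan_analysis(master_rows, detail_rows, master_col, detail_col):
    """Report the number and percentage of master-source rows whose
    value in the selected column does not appear in the detail source,
    as the Orphan Analysis function does for referential integrity."""
    detail_values = {row[detail_col] for row in detail_rows}
    orphans = [row for row in master_rows
               if row[master_col] not in detail_values]
    pct = len(orphans) / len(master_rows) * 100 if master_rows else 0.0
    return len(orphans), pct
```

For example, master customer IDs {1, 2, 3} checked against detail order rows referencing {1, 3} would report one orphaned row.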
Source (Required): The source you selected to perform the intersource function. The source is listed using the following syntax: <DBD Name>:<Source Name> or <DBD Name>:<Mapplet Name>.
Port Name (Required): Select the columns you want to profile for each source.
Datatype (Required): The datatype for the columns you want to profile in the corresponding source. The columns can use any combination of datatypes except Date with Numeric and Non-Raw with Raw. You can use Raw with Raw when you disable verbose mode.
Specify the type of verbose data to load into the warehouse (Required): Select for the PowerCenter Server to write all unmatched or orphaned rows to the Data Profiling warehouse. You can load the following types of verbose data:
- No rows
- Orphan rows
The character limit is 1000/K, where K is the maximum number of bytes for each character in the Data Profiling warehouse code page. If the column exceeds this limit, the PowerCenter Server writes truncated data to the Data Profiling warehouse. For more information about configuring verbose mode, see “Configuring a Function for Verbose Mode” on page 43.
When you configure the details for the Join Complexity Evaluation function, you must select the column in each source that you want to profile. The columns can use any datatype except Binary, and any combination of datatypes except Date with Numeric, Raw with Raw, and Non-Raw with Raw.
Overview
PowerCenter Data Profiling provides views to create metrics and attributes in the Data
Profiling schema. The prepackaged reports use these metrics and attributes to provide
information about your source data.
Your business needs may require you to create custom metrics, attributes, or reports using the
prepackaged Data Profiling schema tables. The schema tables are views built on top of the
Data Profiling tables. To understand the Data Profiling schema, you must understand the
views and the information contained in each view.
This chapter provides the following information about each view:
♦ Description. Provides a description of the view.
♦ Usage. Provides information about how you can use the view for analysis.
♦ Column Name. Provides the name of the column used in the view.
PROFILE_ID Identifies the data profile in every repository and repository folder.
CURR_PRFL_VER_KEY The surrogate key that the Designer generates for a data profile. This identifies
the current version of a data profile.
CURR_PRFL_RUN_KEY The surrogate key that the Designer generates for a data profile session run. This
identifies the current version of a data profile session run.
CURR_RUN_DT Identifies the last saved date of the data profile session run.
DPR_LATEST_PRFLS 117
DPR_PRFL_AUTO_COL_FN_METRICS
♦ Description. This view contains the results (metric statistics) of the column-level
functions of an auto profile. Use this view to analyze the column-level function statistics
for an auto profile.
♦ Usage. Use this view to get all metric statistics of all the column-level functions of an auto
profile. This view does not apply to custom profiles.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session. Refers
to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
TOTAL_ROWS The total row count in a column. This applies to the column-level Business Rule
Validation, Distinct Value Count, and Aggregate functions.
NULL_ROWS The null rows count in a column. This applies to the Aggregate function.
DISTINCT_ROWS The distinct count of values in a column. This applies to the Distinct Value Count
function.
AVG_VALUE The average value of the numeric fields of the selected column in a source. This
applies to the Aggregate function.
MIN_VALUE The minimum value of a column selected in a source. This applies to the
Aggregate function.
MAX_VALUE The maximum value of a column selected in a source. This applies to the
Aggregate function.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session. Refers
to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
CP_VAL The value for two or more source tables based on a joined column.
COLUMN_VALUE The column value for the selected joined columns on two or more source tables.
DPR_PRFL_CART_PROD_METRICS 119
DPR_PRFL_COL_FN
♦ Description. This view contains details about column-level functions, such as Aggregate
and Domain Validation functions.
♦ Usage. Use this view to get all data profile column-level function details.
FUNCTION_LEVEL Identifies the function level for a function in a data profile. The function level is
COLUMN.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile column-level
function.
PRFL_VER_KEY The profile version key that the Designer generates for a data profile. Refers to
DPR_PRFL_VER_DTLS.PRFL_VER_KEY column.
SOURCE_NAME1 Identifies the name of the source object that is being profiled.
SOURCE_NAME Identifies the name of the source object that is being profiled.
SOURCE_TYPE1 Identifies the type of source object that is being profiled, such as a database
table or mapplet.
SOURCE_TYPE Identifies the type of source object that is being profiled, such as a database
table or mapplet.
DBD_NAME1 Identifies the DBD name for the source in a data profile.
FN_TYPE_TEXT The detailed text of the function type for the column-level functions.
RULE_VAL Identifies the rule applied to a function. This applies to the column-level Business
Rule Validation function and the Distinct Value Count function.
COLNAME1 Identifies the column for which the profile function is defined.
COLTYPE1 Identifies the datatype of the source column to which the profile function applies.
This column applies to column-level and intersource functions.
GRP_BY_COLNAME1 Identifies the first column on which the profiling function is grouped.
GRP_BY_COLNAME2 Identifies the second column on which the profiling function is grouped.
GRP_BY_COLNAME3 Identifies the third column on which the profiling function is grouped.
GRP_BY_COLTYPE1 Identifies the datatype of the first column on which the profiling function is
grouped.
GRP_BY_COLTYPE2 Identifies the datatype of the second column on which the profiling function is
grouped.
GRP_BY_COLTYPE3 Identifies the datatype of the third column on which the profiling function is
grouped.
GRP_BY_COLTYPES Identifies the datatype of all the columns on which the profiling function is
grouped.
DOMAIN_VALUE Identifies the domain value for domain validation. This applies to Regular
Expression and Domain Validation Filename domains.
DPR_PRFL_COL_FN_METRICS
♦ Description. This view contains the results (metric statistics) of all column-level functions.
Use this view to analyze the results of various column-level functions.
♦ Usage. Use this view to get all metric statistics of all the column-level functions. You can
use this view to analyze the result set of column-level functions.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile column-level
function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session.
Refers to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
GRP_BY_COL1_VAL Identifies the first column value on which the profile function is grouped.
GRP_BY_COL2_VAL Identifies the second column value on which the profile function is grouped.
GRP_BY_COL3_VAL Identifies the third column value on which the profile function is grouped.
GRP_BY_COL_VALUES Identifies the concatenated values of the three group-by column values.
FN_TYPE_TEXT The detailed text of the function type for the column-level functions.
TOTAL_ROWS The total row count in a column. This applies to the column-level Business
Rule Validation, Distinct Value Count, and Aggregate functions.
SATISFIED_ROWS The row count for rows that satisfied a rule or condition in a column. This
applies to the column-level Business Rule Validation function.
UNSATISFIED_ROWS The row count for rows that did not satisfy a rule or condition in a column. This
applies to the column-level Business Rule Validation function.
NULL_ROWS The null row count in a column. This applies to the Aggregate function.
DISTINCT_ROWS The distinct count of values in a column. This applies to the Distinct Value
Count function.
AVG_VALUE The average value of the numeric fields of the selected column in a source.
This applies to the Aggregate function.
MIN_VALUE The minimum value of a selected column in a source. This applies to the
Aggregate function.
MAX_VALUE The maximum value of a selected column in a source. This applies to the
Aggregate function.
COLUMN_PATTERN The list of values for domain validation and an inferred pattern for the Domain
Inference function.
DOMAIN_TYPE The domain type of domain inference. This could be a list of values or a regular
expression.
DOMAIN_TOTAL_ROWS The total row count in a source for Domain Validation and Domain Inference
functions.
DOMAIN_SATISFIED_ROWS The total row count for rows that satisfied a domain validation rule or
inferred pattern. This applies to the Domain Validation and Domain Inference functions.
DOMAIN_UNSATISFIED_ROWS The total row count for rows that did not satisfy a domain validation rule or
inferred pattern. This applies to Domain Validation and Domain Inference
functions.
DOMAIN_NULL_ROWS The null row count for Domain Validation and Domain Inference functions.
DPR_PRFL_COL_FN_VERBOSE
♦ Description. This view contains the rejected row information for column-level functions.
♦ Usage. Use this view to get all rejected row information for the Business Rule Validation
and Domain Validation column-level functions.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile column-level
function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session run.
Refers to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
GRP_BY_COL1_VAL Identifies the first column value on which the profile function is grouped.
GRP_BY_COL2_VAL Identifies the second column value on which the profile function is grouped.
GRP_BY_COL3_VAL Identifies the third column value on which the profile function is grouped.
GRP_BY_COL_VALUES Identifies all the column values on which the profile function is grouped.
FUNCTION_LEVEL Identifies the function level for a function in a data profile. The function level is
INTER SOURCE.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile Join Complexity
Evaluation function.
PRFL_VER_KEY The profile version key that the Designer generates for a data profile. Refers to
DPR_PRFL_VER_DTLS.PRFL_VER_KEY column.
SOURCE_NAME1 Identifies the name of the source object that is being profiled.
SOURCE_NAME2 This column does not apply to Join Complexity Evaluation functions.
SOURCE_NAME Identifies the name of the source object that is being profiled.
SOURCE_TYPE1 Identifies the type of source object that is being profiled, such as a database
table or mapplet.
SOURCE_TYPE2 This column does not apply to Join Complexity Evaluation functions.
SOURCE_TYPE Identifies the type of source object that is being profiled, such as a database
table or mapplet.
DBD_NAME1 Identifies the DBD name for the source in a data profile.
DBD_NAME2 This column does not apply to Join Complexity Evaluation functions.
GROUP_NAME2 This column does not apply to Join Complexity Evaluation functions.
FUNCTION_TYPE Identifies the function type for the Join Complexity Evaluation function.
FN_TYPE_TEXT The detailed text of the function type for the Join Complexity Evaluation function.
RULE_VAL This column does not apply to Join Complexity Evaluation functions.
COLNAME1 Identifies the column for which the profile function is defined.
COLNAME2 This column does not apply to Join Complexity Evaluation functions.
COLNAME3 This column does not apply to Join Complexity Evaluation functions.
COLTYPE1 Identifies the datatype of the source column to which the profile function applies.
COLTYPE2 This column does not apply to Join Complexity Evaluation functions.
Table A-7. DPR_PRFL_CP_FN Column Information
COLTYPE3 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLNAME1 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLNAME2 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLNAME3 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLUMNS This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLTYPE1 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLTYPE2 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLTYPE3 This column does not apply to Join Complexity Evaluation functions.
GRP_BY_COLTYPES This column does not apply to Join Complexity Evaluation functions.
DOMAIN_NAME This column does not apply to Join Complexity Evaluation functions.
DOMAIN_TYPE This column does not apply to Join Complexity Evaluation functions.
DOMAIN_VALUE This column does not apply to Join Complexity Evaluation functions.
FUNCTION_LEVEL Identifies the function level for a function in a data profile. The function levels are:
SOURCE, COLUMN, and INTER SOURCE.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile function.
PRFL_VER_KEY The profile version key that the Designer generates for a data profile. Refers to
DPR_PRFL_VER_DTLS.PRFL_VER_KEY column.
SOURCE_NAME1 Identifies the name of the source object that is being profiled. For Orphan
Analysis functions, this contains the master source name.
SOURCE_NAME2 Identifies the name of the detail source object that is being profiled. This only
applies to Orphan Analysis functions.
SOURCE_NAME Identifies the name of the source object that is being profiled.
SOURCE_TYPE1 Identifies the type of source object that is being profiled, such as a database
table or mapplet. For Orphan Analysis functions, this identifies the type of the
master source object.
SOURCE_TYPE2 Identifies the type of detail source object that is being profiled, such as a
database table or mapplet. This only applies to Orphan Analysis functions.
SOURCE_TYPE Identifies the type of source object that is being profiled, such as a database
table or mapplet.
DBD_NAME1 Identifies the DBD name for the source in a data profile. For Orphan Analysis
functions, this is populated with the DBD name of the master source.
DBD_NAME2 Identifies the DBD name for the detail source in a data profile. This only applies
to Orphan Analysis functions.
FUNCTION_TYPE Identifies the function type for each of the function levels: source level, column
level, and intersource.
FN_TYPE_TEXT The detailed text of the function type for each of the function levels: source level,
column level, and intersource.
Table A-8. DPR_PRFL_FN_DTLS Column Information
RULE_VAL Identifies the rule applied to a function. This applies to the source-level and
column-level Business Rule Validation functions and the Distinct Value Count
function.
COLNAME1 Identifies the column for which the profile function is defined. This column does
not apply to source-level functions.
COLNAME2 Identifies one of the source columns to which the profile function applies. If the
profile function applies to various columns, this column populates. This column
only applies to Orphan Analysis functions.
COLNAME3 Identifies one of the source columns to which the profile function applies. If the
profile function applies to various columns, this column populates. This column
applies to Orphan Analysis functions.
COLTYPE1 Identifies the datatype of the source column to which the profile function applies.
This column applies to column-level and intersource functions.
COLTYPE2 Identifies the datatype of the source column to which the profile function applies.
This column applies to Orphan Analysis functions.
COLTYPE3 Identifies the datatype of the source column to which the profile function applies.
If a function applies to various source columns, this column populates. This
column applies to Orphan Analysis functions.
GRP_BY_COLNAME1 Identifies the first column on which the profiling function is grouped.
GRP_BY_COLNAME2 Identifies the second column on which the profiling function is grouped.
GRP_BY_COLNAME3 Identifies the third column on which the profiling function is grouped.
GRP_BY_COLNAMES Identifies all the columns on which the profiling function is grouped.
DOMAIN_VALUE Identifies the domain value for domain validation. This applies to Regular
Expression and Domain Validation Filename domains.
FUNCTION_LEVEL Identifies the function level for a function in a data profile. The function level is
INTER SOURCE.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile Orphan Analysis
function.
PRFL_VER_KEY The profile version key that the Designer generates for a data profile. Refers to
DPR_PRFL_VER_DTLS.PRFL_VER_KEY column.
SOURCE_NAME1 Identifies the name of the source object that is being profiled. For Orphan
Analysis functions, this contains the master source name.
SOURCE_NAME2 Identifies the name of the detail source object that is being profiled. This column
only applies to Orphan Analysis functions.
SOURCE_NAME Identifies the name of the source object that is being profiled.
SOURCE_TYPE1 Identifies the type of source object that is being profiled, such as a database
table or mapplet. For Orphan Analysis functions, this identifies the type of the
master source object.
SOURCE_TYPE2 Identifies the type of detail source object that is being profiled, such as a
database table or mapplet. This only applies to Orphan Analysis functions.
SOURCE_TYPE Identifies the type of source object that is being profiled, such as a database
table or mapplet.
DBD_NAME1 Identifies the DBD name for the source in a data profile. For Orphan Analysis
functions, this is populated with the DBD name of the master source.
DBD_NAME2 Identifies the DBD name for the detail source in a data profile. This only applies
to Orphan Analysis functions.
FUNCTION_TYPE Identifies the function type for the Orphan Analysis functions.
FN_TYPE_TEXT The detailed text of the function type for the Orphan Analysis functions.
COLNAME1 Identifies the column for which the profile function is defined.
Table A-9. DPR_PRFL_OJ_FN Column Information
COLNAME2 Identifies one of the source columns to which the profile function applies. If the
profile function applies to various columns, this column populates. This column
only applies to Orphan Analysis functions.
COLNAME3 Identifies one of the source columns to which the profile function applies. If the
profile function applies to various columns, this column populates. This column
only applies to Orphan Analysis functions.
COLTYPE1 Identifies the datatype of the source column to which the profile function applies.
This column applies to column-level and intersource functions.
COLTYPE2 Identifies the datatype of the source column to which the profile function applies.
This column only applies to Orphan Analysis functions.
COLTYPE3 Identifies the datatype of the source column to which the profile function applies.
If a function applies to various source columns, this column populates. This
column only applies to Orphan Analysis functions.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session run.
Refers to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
COL_VALUE Identifies the concatenated text of the three column values of the joined columns.
MASTER_ROW_FLAG Identifies whether the verbose record pertains to the master source or the detail
source.
DPR_PRFL_OJ_FN_VERBOSE 131
DPR_PRFL_OJ_METRICS
♦ Description. This view contains the results (metric statistics) of Orphan Analysis
functions.
♦ Usage. Use this view to get all the Orphan Analysis function run details.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session run.
Refers to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
UNSATISFIED_PARENT_ROWS The row count for rows that did not satisfy join conditions in the master source.
UNSATISFIED_CHILD_ROWS The row count for rows that did not satisfy join conditions in the detail source.
SATISFIED_PARENT_ROWS The row count for rows that satisfied join conditions in the master source.
SATISFIED_CHILD_ROWS The row count for rows that satisfied join conditions in the detail source.
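As a sketch of how these metrics might be queried, the example below computes the percentage of detail rows that found no match in the master source, against an in-memory SQLite stand-in for the warehouse. The view name and column names come from the description above; the sample values, key values, and the choice of SQLite are illustrative assumptions.

```python
import sqlite3

# Stand-in for the Data Profiling warehouse. In practice this view lives
# in the warehouse database (Oracle, DB2, and so on), not in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DPR_PRFL_OJ_METRICS (
        FUNCTION_KEY            INTEGER,
        PRFL_RUN_KEY            INTEGER,
        UNSATISFIED_PARENT_ROWS INTEGER,
        UNSATISFIED_CHILD_ROWS  INTEGER,
        SATISFIED_PARENT_ROWS   INTEGER,
        SATISFIED_CHILD_ROWS    INTEGER
    )
""")
# Invented sample run: 40 of 1,000 detail rows had no master match.
conn.execute("INSERT INTO DPR_PRFL_OJ_METRICS VALUES (1, 10, 25, 40, 975, 960)")

# Percentage of detail (child) rows that did not satisfy the join condition.
row = conn.execute("""
    SELECT UNSATISFIED_CHILD_ROWS * 100.0 /
           (UNSATISFIED_CHILD_ROWS + SATISFIED_CHILD_ROWS) AS orphan_pct
    FROM DPR_PRFL_OJ_METRICS
    WHERE FUNCTION_KEY = 1 AND PRFL_RUN_KEY = 10
""").fetchone()
print(round(row[0], 2))  # 4.0
```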
PRFL_VER_KEY The profile version key that the Designer generates for a data profile. Refers to
DPR_PRFL_VER_DTLS.PRFL_VER_KEY column.
PRFL_RUN_KEY The surrogate key that the Designer generates for a data profile session run.
PRFL_REPO_NAME Identifies the name of the repository that stores the data profile.
PROFILE_ID Identifies the data profile in every repository and repository folder.
PROFILE_NAME Identifies the name of the data profile within a folder and repository.
PRFL_RUN_DT Identifies the last run date of the session for a data profile.
PRFL_RUN_STATUS Identifies the status of profile session runs. You can run every profile session one
or more times. The run status displays the status of every session run. The run
status can be: RUNNING, SUCCESS, or FAILURE.
LATEST_PRFL_RUN_FLAG Identifies the latest profile session run for all functions.
DPR_PRFL_RUN_DTLS
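A common use of these run-details columns is to restrict report queries to the latest successful run of each profile. The sketch below assumes LATEST_PRFL_RUN_FLAG holds Y/N values (an assumption; the text does not list the flag's literal values) and uses an in-memory SQLite stand-in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DPR_PRFL_RUN_DTLS (
        PRFL_VER_KEY         INTEGER,
        PRFL_RUN_KEY         INTEGER,
        PROFILE_NAME         TEXT,
        PRFL_RUN_STATUS      TEXT,
        LATEST_PRFL_RUN_FLAG TEXT
    )
""")
# Invented run history: three runs of one profile, the last one flagged latest.
conn.executemany(
    "INSERT INTO DPR_PRFL_RUN_DTLS VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "CUST_AUTO_PROFILE", "SUCCESS", "N"),
        (1, 101, "CUST_AUTO_PROFILE", "FAILURE", "N"),
        (1, 102, "CUST_AUTO_PROFILE", "SUCCESS", "Y"),  # Y/N is assumed
    ],
)

# Keep only the most recent run of each profile that completed successfully.
rows = conn.execute("""
    SELECT PROFILE_NAME, PRFL_RUN_KEY
    FROM DPR_PRFL_RUN_DTLS
    WHERE LATEST_PRFL_RUN_FLAG = 'Y' AND PRFL_RUN_STATUS = 'SUCCESS'
""").fetchall()
print(rows)  # [('CUST_AUTO_PROFILE', 102)]
```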
DPR_PRFL_SRC_FN
♦ Description. This view contains details about the source-level functions, such as
Candidate Key Evaluation and Redundancy Evaluation.
♦ Usage. Use this view to get all source-level function details.
FUNCTION_LEVEL Identifies the function level for a function in a data profile. The function level is
SOURCE.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile source-level
function.
PRFL_VER_KEY The profile version key that the Designer generates for a data profile. Refers to
DPR_PRFL_VER_DTLS.PRFL_VER_KEY column.
SOURCE_NAME1 Identifies the name of the source object that is being profiled.
SOURCE_NAME Identifies the name of the source object that is being profiled.
SOURCE_TYPE1 Identifies the type of source object that is being profiled, such as a database
table or mapplet.
SOURCE_TYPE Identifies the type of source object that is being profiled, such as a database
table or mapplet.
DBD_NAME1 Identifies the DBD name for the source in a data profile.
FN_TYPE_TEXT The detailed text of the function type for source-level functions.
RULE_VAL Identifies the rule applied to a function. This applies to the source-level Business
Rule Validation function.
GRP_BY_COLNAME1 Identifies the first column on which the profiling function is grouped.
GRP_BY_COLNAME2 Identifies the second column on which the profiling function is grouped.
GRP_BY_COLNAME3 Identifies the third column on which the profiling function is grouped.
GRP_BY_COLTYPE1 Identifies the datatype of the first column on which the profiling function is
grouped.
GRP_BY_COLTYPE2 Identifies the datatype of the second column on which the profiling function is
grouped.
GRP_BY_COLTYPE3 Identifies the datatype of the third column on which the profiling function is
grouped.
GRP_BY_COLTYPES Identifies the datatype of all the columns on which the profiling function is
grouped.
DPR_PRFL_SRC_FN_METRICS
♦ Description. This view contains the results (metric statistics) of all source-level functions.
Use this view to analyze the results of various source-level functions.
♦ Usage. Use this view to get all metric statistics of all the source-level functions.
FUNCTION_KEY The surrogate key that the Designer generates for a data profile source-level
function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session. Refers
to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
COLUMN_NAME Identifies the concatenated text of one or a combination of two columns used for
the Candidate Key Evaluation and Redundancy Evaluation functions.
GRP_BY_COL1_VAL Identifies the first column value on which the profiling function is grouped.
GRP_BY_COL2_VAL Identifies the second column value on which the profiling function is grouped.
GRP_BY_COL3_VAL Identifies the third column value on which the profiling function is grouped.
GRP_BY_COL_VALUES Identifies the concatenated values of the three group-by column values.
FN_TYPE_TEXT The detailed text of the function type for the source-level functions.
TOTAL_ROWS The total rows in a source. This applies to the source-level Business Rule
Validation and Row Count functions.
TOTAL_ROWS_EVAL The total rows in a source. This applies to the Candidate Key Evaluation and
Redundancy Evaluation functions.
SATISFIED_ROWS The total row count for rows that satisfied a rule or condition. This applies to the
source-level Business Rule Validation and Row Count functions.
PERC_SATISFIED_ROWS The percentage of rows that satisfied a rule or condition in a source. This applies
to the source-level Business Rule Validation and Row Count functions.
UNSATISFIED_ROWS The row count for rows that did not satisfy a rule or condition. This applies to the
source-level Business Rule Validation and Row Count functions.
DUP_ROWS The duplicate row count in a source. The duplicate row count is the total number
of rows minus the distinct number of rows. This applies to the Redundancy
Evaluation function.
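The DUP_ROWS arithmetic described above (total rows minus distinct rows) can be checked directly. A minimal illustration with invented sample rows:

```python
# Redundancy Evaluation arithmetic from the view description:
# DUP_ROWS = total number of rows - distinct number of rows.
rows = [
    ("CA", "94105"),
    ("CA", "94105"),
    ("NY", "10001"),
    ("CA", "94105"),
]
total_rows = len(rows)          # corresponds to TOTAL_ROWS_EVAL
distinct_rows = len(set(rows))  # distinct combinations of the profiled columns
dup_rows = total_rows - distinct_rows
print(dup_rows)  # 2
```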
FUNCTION_KEY The surrogate key that the Designer generates for a source-level function.
PRFL_RUN_KEY The profile run key that the Designer generates for a data profile session run.
Refers to DPR_PRFL_RUN_DTLS.PRFL_RUN_KEY column.
GRP_BY_COL1_VAL Identifies the first column value on which the profile function is grouped.
GRP_BY_COL2_VAL Identifies the second column value on which the profile function is grouped.
GRP_BY_COL3_VAL Identifies the third column value on which the profile function is grouped.
GRP_BY_COL_VALUES Identifies all the column values on which the profiling function is grouped.
DP_ROW_NUMBER Identifies the row to which the column belongs. Use this row number to group
the columns that belong to one row.
DPR_PRFL_SRC_FN_VERBOSE
DPR_PRFL_VER_DTLS
♦ Description. This view contains version details for all data profiles.
♦ Usage. Use this view to get details about profile versions.
PRFL_VER_KEY The surrogate key that the Designer generates for a data profile.
PRFL_REPO_NAME Identifies the name of the repository that stores the data profile.
PROFILE_ID Identifies the data profile in every repository and repository folder.
PROFILE_NAME Identifies the name of the data profile within a folder and repository.
PROFILE_TYPE Identifies whether the data profile is an auto profile or custom profile. Values are
AUTO PROFILE and CUSTOM PROFILE.
Code Page Compatibility
When you use Data Profiling, configure code page compatibility between all PowerCenter
components and Data Profiling components and domains.
Follow the instructions in “Code Pages” in the Installation and Configuration Guide to ensure
that the code pages of each PowerCenter component have the correct relationship with each
other.
When you work with data profiles, ensure that the code pages for the Data Profiling
components have the correct relationship with each other:
♦ The PowerCenter Server must use a code page that is a subset of the Data Profiling
warehouse code page.
♦ The PowerCenter Server and the Repository Server must use two-way compatible code
pages.
♦ The code page for the PowerCenter Client must be two-way compatible with the code
page for the Repository Server.
♦ A Domain Definition Filename domain must use a code page that is a subset of the
PowerCenter Server code page. For more information about creating the Domain
Definition Filename domain, see “Custom Domains” on page 80.
♦ A List of Values domain must use a code page that is a subset of the code page for the
operating system that hosts the PowerCenter Client. For more information about creating
a List of Values domain, see “Custom Domains” on page 80.
♦ To view reports from the Profile Manager, the Data Profiling warehouse must use a code
page that is two-way compatible with the code page of the operating system that hosts the
PowerCenter Client. For more information about viewing Data Profiling reports from the
Profile Manager, see “PowerCenter Data Profiling Reports” on page 67.
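Conceptually, a code page can be treated as the set of characters it can represent, so the "subset" and "two-way compatible" requirements above reduce to set relations. A toy sketch; the two character sets stand in for real code pages and are not actual code page definitions:

```python
# Toy "code pages" modeled as sets of representable characters.
ascii_cp = {chr(c) for c in range(128)}
latin1_cp = {chr(c) for c in range(256)}

def is_subset(cp_a, cp_b):
    """cp_a is a subset of cp_b: every character cp_a encodes, cp_b encodes too."""
    return cp_a <= cp_b

def two_way_compatible(cp_a, cp_b):
    """Each code page can represent all characters of the other."""
    return cp_a <= cp_b and cp_b <= cp_a

# A server code page that is a subset of the warehouse code page is valid:
print(is_subset(ascii_cp, latin1_cp))           # True
# But the pair is not two-way compatible, which some components require:
print(two_way_compatible(ascii_cp, latin1_cp))  # False
```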
Overview
PowerCenter returns messages when you perform data profiling tasks. Some messages are
errors and some are informational. You can use this chapter to help determine what causes
error messages to appear and what measures you can take to correct the error.
When an error message is a result of an internal error, causes are listed when possible, but
contacting Informatica Technical Support is still the best recourse. For contact information,
see “Obtaining Technical Support” on page xviii.
Data profiling messages can originate from the following sources:
♦ PowerCenter Designer
♦ PowerCenter Server
The Designer displays messages when you create and modify data profiles. The PowerCenter
Server displays messages in the session log when you run profile sessions.
For other PowerCenter messages, see the PowerCenter Troubleshooting Guide.
Datatype ‘number’ of port <port name> and datatype ‘character’ of port <port name> are not the
same.
Cause: You specified ports with different datatypes for a Join Complexity Evaluation
function.
Action: Select ports with the same datatypes for the Join Complexity Evaluation
function.
<filename> file is not a valid domain definition file. Reason: <error message>.
Cause: The domain definition file that you specified for the List of Values domain is
empty or contains a value longer than 200 characters. The domain definition
file must contain entries of 200 or fewer characters.
Action: Make sure that your domain definition file contains valid content before you
import it.
or
Action: See the additional error message for more information.
Mapping <mapping name> representing this profile is not valid. This profile cannot be run in
interactive mode.
Cause: The mapping is invalid because it has been modified.
Action: Regenerate the profile mapping.
or
Action: If this is the first time that you generated the mapping, and you did not
modify it, contact Informatica Technical Support.
Port <port name> cannot be used in a Join Complexity Evaluation function, because it is of the
datatype ‘binary’.
Cause: You specified a port with a Binary datatype while adding or editing a Join
Complexity Evaluation function.
Action: Specify a port with a datatype other than Binary.
Profile was created successfully but mapping <mapping name> representing this profile is not
valid. This profile cannot be run in interactive mode.
Cause: Internal error.
Action: Contact Informatica Technical Support.
Profile was updated successfully but mapping <mapping name> representing this profile is not
valid. This profile cannot be run in interactive mode.
Cause: Internal error.
Action: Contact Informatica Technical Support.
Some fields in source <source file name> group <group name> cannot be found. The source
might be corrupt.
Cause: The source has been modified since you last edited the data profile.
Action: Create a new data profile for the modified source.
The mapping corresponding to this profile cannot be found. The repository might be corrupt.
Cause: The mapping corresponding to this data profile might have been deleted.
Action: Delete the data profile and create a new one.
The target warehouse does not contain profile results for repository <repository name>.
Cause: The Data Profiling warehouse connection that you specified for viewing
reports does not match the relational database connection that you specified
for your session target.
Action: In the Profile Manager, modify your connection for viewing reports to match
the relational database connection that you specified for your session target.
You cannot profile a mapplet with transformations that generate transaction controls.
Cause: You tried to create a data profile for a mapplet with a transformation that
generates transaction controls. For example, you tried to create a data profile
for a mapplet with a Custom transformation configured for transaction
control.
Action: Make sure the transformations in the mapplet are not configured for
transaction control.
DP Codes
The following messages may appear when you run a profile session:
DP_90001 Invalid target type. Profile mapping targets should either be relational or null.
Cause: The profile mapping contains target definitions that are not relational or null.
The targets in the profile mapping might have been modified.
Action: Recreate the profile.
DP_90002 All the targets in a profiling mapping should use the same connection and have
the same connection attributes.
Cause: There are two or more relational database connections configured for targets
in the profile mapping.
Action: Make sure you use the same relational database connection for all targets in a
profiling mapping.
DP_90004 Connection to the database using user <username>, connect string <database
connect string> failed. Reason: <error message>.
Cause: The username or connect string is invalid.
Action: Verify that your username and connect string values are valid.
or
Action: See the additional error message for more information.
DP_90009 SQL Prepare failed for statement <SQL statement> with error <database error>.
Cause: The SQL query failed.
Action: Fix the database error indicated in the message, and rerun the Data Profiling
warehouse script in the Data Profiling installation directory. Commit your
SQL script after you run it.
DP_90010 SQL Bind failed for statement <SQL statement> with error <database error>.
Cause: The Data Profiling warehouse tables are invalid.
Action: Rerun the Data Profiling warehouse script in the Data Profiling installation
directory. Commit your SQL script after you run it.
DP_90011 SQL Execute failed for statement <SQL statement> with error <database error>.
Cause: The Data Profiling warehouse tables are invalid.
Action: Rerun the Data Profiling warehouse script in the Data Profiling installation
directory. Commit your SQL script after you run it.
DP_90012 SQL Fetch failed for statement <SQL statement> with error <database error>.
Cause: The Data Profiling warehouse tables are invalid.
Action: Rerun the Data Profiling warehouse script in the Data Profiling installation
directory. Commit your SQL script after you run it.
DP_90014 There must be exactly one input group and one output group for this
transformation.
Cause: The Custom transformation in the profile mapping has been modified and is
invalid.
Action: Regenerate the profile mapping.
DP_90016 The target warehouse is already used for repository <repository name> with
GUID <global unique identifier>. Either drop the warehouse tables or use a
different one.
Cause: You tried to use two repositories for the same Data Profiling warehouse.
Action: Create a second Data Profiling warehouse. Also, create a new relational
database connection to the second Data Profiling warehouse in the Workflow
Manager.
DP_90017 The profile warehouse tables are not present in the target database connection.
Please check the target connection information.
Cause: The Data Profiling warehouse tables are not in the target database.
Action: Run the Data Profiling warehouse script in the Data Profiling installation
directory. Commit your SQL script after you run it.
DP_90020 Failed to get the metadata extension <metadata extension> for the mapping.
Cause: You copied a profile mapping without copying the metadata extensions.
Action: Copy the metadata extensions from the original profile mapping to the copied
mapping and run the session again.
or
Cause: You are running a session with an original profile mapping that corresponds to
a data profile, but the metadata extensions are deleted.
Action: Regenerate the profile mapping.
DP_90024 The target warehouse uses schema version <version> and data version
<version>. You may need to upgrade the warehouse.
Cause: The version of the Data Profiling warehouse does not match the version of
PowerCenter.
Action: Upgrade the Data Profiling warehouse using the upgrade script for your
database type.
DP_90031 Source Qualifier transformation [%s] was not found in this mapping.
Cause: The profile mapping is modified.
Action: Regenerate the profile mapping.
DP_90803 Invalid number of input port(s) associated with output port <port>.
Cause: The profile mapping is modified.
Action: Regenerate the profile mapping.
Glossary
Glossary Terms
Aggregate functions
Functions that calculate an aggregate value for a numeric or string value applied to one
column of a profile source.
analytic workflow
A list of PowerAnalyzer reports linked together in a hierarchy consisting of a primary report
and related workflow reports. See PowerAnalyzer Data Profiling reports.
auto profile
A data profile containing a predetermined set of functions for profiling source data.
column-level functions
Functions that perform calculations on one column of a source, source group, or mapplet
group.
custom domain
A domain you create to validate source values or to infer patterns from source data. You can
create a custom domain when you create a Domain Validation function. See domain and
Domain Validation function.
data connector
A requirement for PowerAnalyzer to connect to a data source and read data for Data Profiling
reports. Typically, PowerAnalyzer uses the system data connector to connect to all the data
sources required for reports.
data profile
A profile of source data in PowerCenter. A data profile contains functions that perform
calculations on the source data.
domain
A set of all valid values for a source column. A domain can contain a regular expression, list of
values, or the name of a file that contains a list of values. Informatica provides prepackaged
domains. You can also create your own domains.
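As an illustration of the domain forms described above, the sketch below validates values against a List of Values domain and a Regular Expression domain. The values and the pattern are invented for the example; the actual prepackaged domains ship with Data Profiling.

```python
import re

# Invented examples of two domain forms:
list_of_values = {"CA", "NY", "TX"}       # List of Values domain
zip_pattern = re.compile(r"\d{5}")        # Regular Expression domain

def in_domain(value, domain):
    """Return True if value is valid for the domain."""
    if isinstance(domain, set):
        return value in domain
    return bool(domain.fullmatch(value))

print(in_domain("CA", list_of_values))  # True
print(in_domain("9410", zip_pattern))   # False (only four digits)
```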
group-by columns
Columns by which you want to group data for a custom profile. When you configure a
function, you can determine the column by which you want to group the data.
interactive session
A data profile session that you run from the Profile Manager.
intersource functions
Functions that perform calculations on two or more sources, source groups, or mapplet
groups.
non-reusable domain
A domain that applies to one Domain Validation function. See also reusable domain.
prepackaged domains
Domains that Informatica provides to verify data such as phone numbers, postal codes, and
email addresses. See domain.
Profile Manager
A tool in the Designer that manages data profiles. Use the Profile Manager to set default data
profile options, work with data profiles in your repository, run profile sessions, view profile
results, and view sources and mapplets with at least one profile defined for them.
profile mapping
A mapping the Designer generates when you create a data profile. The PowerCenter
repository stores the data profile and the associated mapping.
profile session
A session for a profile mapping that gathers information about your source data. The Data
Profiling warehouse stores the results of profile sessions. See persistent session and interactive
session.
reusable domain
A domain you can apply to multiple Domain Validation functions in one or more data
profiles. See domain.
source-level function
A function that performs calculations on two or more columns of a source, source group, or
mapplet group.
temporary session
A session that is run from the Profile Manager and is not stored to the repository.
verbose mode
An option to view verbose data that the PowerCenter Server writes to the Data Profiling
warehouse during a profile session. You can specify the type of verbose data to load when you
configure the data profile.
Index

A
ABAP program
  generating for SAP R/3 sources 33
adding
  functions 39
Aggregate functions
  description 107
auto profile reports
  description 67
auto profiles
  auto profile reports 67
  creating 34
  deleting 49
  editing 47
  running a session after creating an auto profile 59

B
buffer block size
  See PowerCenter Workflow Administration Guide
Business Rule Validation function
  column-level function description 101
  source-level function description 94

C
Candidate Key Evaluation function
  description 96
  profiling primary key columns 96
checking in
  profile mappings 27
COBOL syntax
  converting to perl syntax 84
Code pages
  configuring compatibility 140
  Data Profiling report requirements 140
  rules for Domain Definition Filename domain 140
  rules for List of Values domain 140
  specifying for a Domain Definition Filename domain 87
  specifying for a List of Values domain 87
  syntax for Domain Definition Filename domain 87
  syntax for List of Values domain 87
column-level functions
  Aggregate functions 107
  Business Rule Validation 101
  description 101
  Distinct Value Count 109
  Domain Inference 105
  Domain Validation 103
configuring
  default data profile options 26
  functions 41
  sessions 56
copying
  profile mappings 51
creating
  auto profiles 34
  custom profiles 38
  data connectors 21
  Data Profiling warehouse on IBM DB2 12
  Data Profiling warehouse on Informix 12
  Data Profiling warehouse on Microsoft SQL Server 13
  Data Profiling warehouse on Oracle 13
  Data Profiling warehouse on Sybase 13
  Data Profiling warehouse on Teradata 13
  Domain Definition Filename domain 86
  List of Values domain 80
  Regular Expression domain 82
custom domains
  description 80
custom profile reports
  viewing 69
custom profiles
  creating 38
  deleting 49
  editing 47
  running a session 59

D
data connectors
  adding the Data Profiling data source 23
  creating 21
  primary data source 22
  properties 22
data profiles
  creating a custom profile 38
  creating an auto profile 34
  deleting 49
  editing 47
Data Profiling
  during mapping development 2
  during production 2
  overview 2
Data Profiling reports
  auto profile reports 67
  custom profile reports 69
  importing into PowerAnalyzer 16
  requirements for viewing from the Profile Manager 140
  viewing in PowerCenter 71
Data Profiling schemas
  importing 16
Data Profiling warehouse
  configuring a data source 20
  configuring a relational database connection 15
  creating 12
  overview 3
  upgrading 13
data samples
  profiling 60
  selecting a data sampling mode 61
  selecting a function for a profile session 60
  selecting a function type for a data profile 60
data sources
  adding to data connector 23
  configuring for the Data Profiling warehouse 20
  primary in data connector 22
  supported databases 20
datatypes
  Raw 101, 111
default data profile options
  configuring 26
deleting
  data profiles 49
  domains 90
Distinct Value Count function
  description 109
documentation
  conventions xvi
  description xv
Domain Definition Filename domain
  code page compatibility 140
  creating 86
  description 80
  syntax for specifying code pages 87
Domain Inference function
  configuring Domain Inference settings 106
  description 105
Domain Validation function
  description 103
domains
  creating 78
  custom 80
  deleting 90
  description 78
  Domain Definition Filename domain 80
  editing 89
  List of Values domain 80
  non-reusable 80
  prepackaged 79
  Regular Expression domain 80
  reusable 80

E
editing
  data profiles 47
  domains 89
error messages
  designer 145
  DP codes 150
  overview 144
  session logs 150

F
functions
  adding 39
  column-level functions 101
  configuring 41
  configuring for verbose mode 43
  configuring group-by columns 42
  intersource functions 111
  source-level functions 93

G
generating
  ABAP program for SAP R/3 sources 33
  profile mappings 45
glossary
  terms 158
group-by columns
  configuring for functions 42

I
IBM DB2
  creating a Data Profiling warehouse 12
  upgrading a Data Profiling warehouse 14
IMA
  views 116
importing
  Data Profiling schemas and reports 16
Informatica
  documentation xv
  Webzine xvii
Informix
  creating a Data Profiling warehouse 12
  upgrading a Data Profiling warehouse 14
installing
  PowerAnalyzer Data Profiling reports 16
  XML scripts 16
interactive profiling
  See interactive sessions
interactive sessions
  configuring 45
  definition 55
  monitoring in the Profile Manager 59
  monitoring with the Workflow Monitor 59
  viewing the session log 59
intersource functions
  description 111
  Join Complexity Evaluation 113
  Orphan Analysis 111

J
Join Complexity Evaluation function
  description 113

K
keys
  profiling primary key columns 96

L
List of Values domain
  code page compatibility requirements 140
  creating 80
  description 80
  syntax for specifying code pages 87
localization
  specifying for a Domain Definition Filename domain 87
  specifying for a List of Values domain 87

M
mappings
  generating a profile mapping 45
  prefix for the profile mapping name 28
mapplets
  profiling 32
Microsoft SQL Server
  creating a Data Profiling warehouse 13
  upgrading a Data Profiling warehouse 14
modifying
  profile mappings 51
monitoring
  interactive sessions 59
multiple sources
  profiling 33

N
non-reusable domains
  description 80
normal mode
  running temporary sessions 55

O
Oracle
  creating a Data Profiling warehouse 13
  upgrading a Data Profiling warehouse 14
Orphan Analysis function
  description 111

P
perl syntax
  using in a Regular Expression domain 82
persistent sessions
  definition 55
  running from the Profile Manager 55
  running in real time 62
PowerAnalyzer
  adding data sources 23
  configuring a data source 20
  creating data connectors 21
  importing Data Profiling schemas and reports 16
PowerAnalyzer Data Profiling reports
  description 66
  installing 16
  viewing verbose data 73
PowerCenter Data Profiling reports
  auto profile reports 67
  custom profile reports 69
  description 66
prepackaged domains
  description 79
primary data sources
  data connector 22
primary keys
  profiling with the Candidate Key Evaluation function 96
profile functions
  description 33
Profile Manager
  checking in profile mappings 6
  creating custom profiles 6
  creating domains 78
  deleting data profiles 6
  editing data profiles 6
  Profile View 7
  regenerating profile mappings 6
  running interactive sessions 6
  running persistent sessions 55
  running sessions 59
  running temporary sessions 55
  Source View 8
  using 6
  viewing data profile details 6
profile mappings
  checking in 6
  copying 51
  copying with reusable domains 51
  modifying 51
  prefix for the profile mapping name 28
  regenerating 6
profile sessions
  See also interactive sessions
  See also persistent sessions
  configuring to use data samples 60
  prefix for the profile session name 28
  troubleshooting 63
Profile View
  in Profile Manager 7
Profile Wizard
  creating domains 78
profile workflows
  prefix for the profile workflow name 28
  requirements for creating in the Workflow Manager 62
profiling
  data samples 60
  eligible sources 32
  mapplets 32
  SAP R/3 sources 33

R
Raw datatype
  verbose mode sessions 101, 111
real-time sessions
  running 62
Redundancy Evaluation function
  description 97
Regular Expression domain
  creating 82
  description 80
  using perl syntax 82
relational database connections
  See also PowerCenter Workflow Administration Guide
  configuring for the Data Profiling warehouse 15
reports
  auto profile reports 67
  custom profile reports 69
  PowerAnalyzer Data Profiling reports 73
  PowerCenter Data Profiling reports 67
reusable domains
  copying a profile mapping 51
  description 80
Row Count function
  description 93
Row Uniqueness function
  description 98

S
SAP R/3 sources
  profiling 33
scripts
  See XML scripts
session configuration
  enabling 39
session logs
  description 150
  viewing for interactive sessions 59
sessions
  See also interactive sessions
  See also persistent sessions
  configuring 56
  creating in the Workflow Manager 62
  running after creating a custom profile 59
  running for auto profiles 59
  running from the Profile Manager 59
Source View
  Profile Manager 8
source-level functions
  Business Rule Validation 94
  Candidate Key Evaluation 96
  description 93
  loading the Raw datatype for verbose mode 101, 111
  Redundancy Evaluation 97
  Row Count 93
  Row Uniqueness 98
sources
  eligible sources for profiling 32
  profiling multiple sources 33
SQL syntax
  converting to perl syntax 84
Sybase
  creating a Data Profiling warehouse 13
  upgrading a Data Profiling warehouse 15

T
temporary session
  definition 55
  running from the Profile Manager 55
  running in normal mode 55
Teradata
  creating a Data Profiling warehouse 13
  upgrading a Data Profiling warehouse 15
troubleshooting
  profile sessions 63
U
upgrading
  Data Profiling warehouse on IBM DB2 14
  Data Profiling warehouse on Informix 14
  Data Profiling warehouse on Microsoft SQL Server 14
  Data Profiling warehouse on Oracle 14
  Data Profiling warehouse on Sybase 15
  Data Profiling warehouse on Teradata 15

V
verbose data
  description 4
  viewing in PowerAnalyzer Data Profiling reports 73
verbose mode
  configuring a function 43
verbose mode sessions
  Raw datatype 101, 111
versioned objects
  See also PowerCenter Repository Guide
  checking in profile mappings 6
viewing
  auto profile reports 67
  custom profile reports 69
  PowerAnalyzer Data Profiling reports 73
  PowerCenter Data Profiling reports 71
views
  descriptions 116
  list of 116

W
warehouse
  See Data Profiling warehouse
webzine xvii
Workflow Manager
  creating a session and workflow 62
Workflow Monitor
  monitoring interactive sessions 59
workflows
  creating in the Workflow Manager 62

X
XML scripts
  installing 16