
Informatica Transformation Types

A transformation is a repository object that generates, modifies, or passes data. The Designer
provides a set of transformations that perform specific functions. For example, an Aggregator
transformation performs calculations on groups of data.

Transformations can be of two types:

Active Transformation: An active transformation can change the number of rows that pass through it, change the transaction boundary, or change the row type. For example, Filter, Transaction Control, and Update Strategy are active transformations.

The key point to note is that the Designer does not allow you to connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or transformation input group, because the Integration Service may not be able to concatenate the rows passed by active transformations. However, the Sequence Generator transformation (SGT) is an exception to this rule. An SGT does not receive data; it generates unique numeric values. As a result, the Integration Service does not encounter problems concatenating rows passed by an SGT and an active transformation.

Passive Transformation: A passive transformation does not change the number of rows that
pass through it, maintains the transaction boundary, and maintains the row type.

The key point to note is that the Designer allows you to connect multiple transformations to the same downstream transformation or transformation input group only if all transformations in the upstream branches are passive. The transformation that originates the branch can be active or passive.

Transformations can be Connected or UnConnected to the data flow.

Connected Transformation: A connected transformation is connected to other transformations or directly to the target table in the mapping.

UnConnected Transformation: An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Informatica Transformations – List


Following is the list of transformations available in Informatica:

 Aggregator Transformation
 Application Source Qualifier Transformation
 Custom Transformation
 Data Masking Transformation
 Expression Transformation
 External Procedure Transformation
 Filter Transformation
 HTTP Transformation
 Input Transformation
 Java Transformation
 Joiner Transformation
 Lookup Transformation
 Normalizer Transformation
 Output Transformation
 Rank Transformation
 Reusable Transformation
 Router Transformation
 Sequence Generator Transformation
 Sorter Transformation
 Source Qualifier Transformation
 SQL Transformation
 Stored Procedure Transformation
 Transaction Control Transformation
 Union Transformation
 Unstructured Data Transformation
 Update Strategy Transformation
 XML Generator Transformation
 XML Parser Transformation
 XML Source Qualifier Transformation
 Advanced External Procedure Transformation
 External Transformation

In the following pages, we will explain all the above Informatica Transformations and their
significances in the ETL process in detail.

Informatica Transformations Explained (Part 1)

Aggregator Transformation

Aggregator transformation performs aggregate functions like average, sum, and count on multiple rows or groups. The Integration Service performs these calculations as it reads, and stores the necessary group and row data in an aggregate cache. It is an Active & Connected transformation.

Difference between Aggregator and Expression Transformation? An Expression transformation permits you to perform calculations on a row-by-row basis only, whereas an Aggregator lets you perform calculations on groups.

For example, an Aggregator transformation counting records per state might have ports such as State, State_Count, Previous_State, and State_Counter.

Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.

Aggregate Expressions: These are allowed only in Aggregator transformations. They can include conditional clauses and non-aggregate functions, and they can also include one aggregate function nested inside another aggregate function.

Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE,
STDDEV, SUM, VARIANCE
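For instance, an aggregate expression can include a conditional clause to limit which rows feed the calculation, and one aggregate function can be nested inside another. A brief hedged sketch (the COMMISSION, QUOTA, and ITEM ports are hypothetical):

SUM(COMMISSION, COMMISSION > QUOTA) -- sums commissions only for rows meeting the condition
MAX(COUNT(ITEM)) -- a nested aggregate: the highest item count across groups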

Application Source Qualifier Transformation

Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session. It is an Active & Connected transformation.

Custom Transformation

It works with procedures you create outside the Designer interface to extend PowerCenter functionality, calling a procedure from a shared library or DLL. It is an active/passive & connected type.

You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.

Custom transformation allows you to develop the transformation logic in a procedure. Some of
the PowerCenter transformations are built using the Custom transformation. Rules that apply to
Custom transformations, such as blocking rules, also apply to transformations built using Custom
transformations. PowerCenter provides two sets of functions called generated and API functions.
The Integration Service uses generated functions to interface with the procedure. When you
create a Custom transformation and generate the source code files, the Designer includes the
generated functions in the files. Use the API functions in the procedure code to develop the
transformation logic.

Difference between Custom and External Procedure Transformation? In a Custom transformation, input and output functions occur separately. The Integration Service passes the input data to the procedure using an input function. The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all the ports of the transformation.

Data Masking Transformation


Passive & Connected. It is used to change sensitive production data to realistic test data for non-production environments. It creates masked data for development, testing, training, and data mining. Data relationships and referential integrity are maintained in the masked data.

Example: It returns a masked value that has a realistic format for an SSN, credit card number, birthdate, phone number, etc., but is not a valid value.

Masking types: Key Masking, Random Masking, Expression Masking, Special Mask format.
Default is no masking.

Expression Transformation

Passive & Connected. It is used to perform non-aggregate functions, i.e., to calculate values on a single row. Examples: calculating the discount for each product, concatenating first and last names, or converting a date to a string field.

You can create an Expression transformation in the Transformation Developer or the Mapping
Designer.

Components: Transformation, Ports, Properties, Metadata Extensions.

External Procedure

Passive & Connected or Unconnected. It works with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the COM layer of Windows and bind them to an External Procedure transformation. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an experienced programmer to use TX, and you must use multi-threaded code in external procedures.

Filter Transformation

Active & Connected. It passes through rows that meet the specified filter condition and removes the rows that do not. For example, you can find all the employees working in New York, or all the faculty members teaching Chemistry in a state. The input ports for the filter must come from a single transformation; you cannot concatenate ports from more than one transformation into the Filter transformation.
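As a small hedged illustration, a filter condition is just an expression that evaluates to TRUE or FALSE for each row (port names hypothetical):

STATE = 'NY' OR SUBJECT = 'Chemistry'

Rows for which the condition evaluates to FALSE are dropped from the pipeline.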

Components: Transformation, Ports, Properties, Metadata Extensions.

HTTP Transformation

Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or post data to the target or downstream transformation in the mapping.

Authentication types: Basic, Digest, and NTLM.

Methods: GET, POST, and SIMPLE POST.

Java Transformation

Active or Passive & Connected. It provides a simple native programming interface to define
transformation functionality with the Java programming language. You can use the Java
transformation to quickly define simple or moderately complex transformation functionality
without advanced knowledge of the Java programming language or an external Java
development environment.

Joiner Transformation

Active & Connected. It is used to join data from two related heterogeneous sources residing in different locations, or to join data from the same source. In order to join two sources, there must be at least one pair of matching columns between the sources, and you must specify one source as master and the other as detail. For example: joining a flat file and a relational source, joining two flat files, or joining a relational source and an XML source.

The Joiner transformation supports the following types of joins:

 Normal: Normal join discards all the rows of data from the master and detail source that
do not match, based on the condition.
 Master Outer: Master outer join discards all the unmatched rows from the master source
and keeps all the rows from the detail source and the matching rows from the master
source.
 Detail Outer: Detail outer join keeps all rows of data from the master source and the
matching rows from the detail source. It discards the unmatched rows from the detail
source.
 Full Outer: Full outer join keeps all rows of data from both the master and detail sources.
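As a rough SQL analogue of the four join types (the Joiner itself is configured through its condition and join type, not SQL; MASTER, DETAIL, and KEY_COL are hypothetical):

SELECT * FROM DETAIL d INNER JOIN MASTER m ON d.KEY_COL = m.KEY_COL; -- Normal
SELECT * FROM DETAIL d LEFT OUTER JOIN MASTER m ON d.KEY_COL = m.KEY_COL; -- Master Outer
SELECT * FROM MASTER m LEFT OUTER JOIN DETAIL d ON d.KEY_COL = m.KEY_COL; -- Detail Outer
SELECT * FROM DETAIL d FULL OUTER JOIN MASTER m ON d.KEY_COL = m.KEY_COL; -- Full Outer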

Limitations on the pipelines you connect to the Joiner transformation:


*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy
transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation
directly before the Joiner transformation.

Lookup Transformation

Passive & Connected or UnConnected. It is used to look up data in a flat file, relational table, view, or synonym. It compares Lookup transformation ports (input ports) to the lookup source column values based on the lookup condition. The returned values can then be passed to other transformations. You can create a lookup definition from a source qualifier, and you can also use multiple Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For
example, the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For
example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.
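Conceptually, a connected lookup that gets a related value behaves like a left outer join against the lookup source; a hedged SQL sketch with hypothetical tables:

SELECT s.EMP_ID, l.EMP_NAME -- the lookup returns EMP_NAME for each source EMP_ID
FROM SRC_EMPLOYEES s
LEFT OUTER JOIN LKP_EMPLOYEES l
ON l.EMP_ID = s.EMP_ID; -- the lookup condition; unmatched rows yield NULL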

Lookup Components: Lookup source, Ports, Properties, Condition.

Types of Lookup:

1. Relational or flat file lookup.
2. Pipeline lookup.
3. Cached or uncached lookup.
4. Connected or unconnected lookup.

Informatica Transformations Explained (Part 2)

Normalizer Transformation

Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL sources, where data is most often stored in denormalized format.

You can create the following Normalizer transformations:

*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframes.

*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. This is the default when you create a Normalizer transformation.

Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.

Rank Transformation

Active & Connected. It is used to select the top or bottom rank of data. You can use it to return
the largest or smallest numeric value in a port or group or to return the strings at the top or the
bottom of a session sort order. For example, to select top 10 Regions where the sales volume was
very high or to select 10 lowest priced products.
As an active transformation, it might change the number of rows passed through it. For example, if you pass 100 rows to the Rank transformation but select to rank only the top 10 rows, only those 10 rows pass from the Rank transformation to the next transformation.
You can connect ports from only one transformation to the Rank transformation. You can also
create local variables and write non-aggregate expressions.

Router Transformation

Active & Connected. It is similar to the Filter transformation because both allow you to apply a condition to test data. The difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition and route it to a default output group.

If you need to test the same input data based on multiple conditions, use a Router transformation
in a mapping instead of creating multiple Filter transformations to perform the same task. The
Router transformation is more efficient.
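As a hedged sketch, a Router testing the same input against several conditions might define group filter conditions such as the following (SALES_AMT is a hypothetical port); rows that satisfy no user-defined group can be routed to the default output group:

HIGH_VALUE : SALES_AMT > 10000
MID_VALUE : SALES_AMT > 1000 AND SALES_AMT <= 10000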

Sequence Generator Transformation

Passive & Connected transformation. It is used to create unique primary key values or cycle
through a sequential range of numbers or to replace missing primary keys.

It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports. Likewise, you cannot add ports to the transformation. The NEXTVAL port generates a sequence of numbers when you connect it to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value; with the default increment of one, CURRVAL is NEXTVAL plus one.
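For example, with an Increment By of 1, if NEXTVAL produces 1, 2, 3, ... for the rows passing through, then CURRVAL for those same rows is 2, 3, 4, ... (NEXTVAL plus the increment).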

You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse
a Sequence Generator when you perform multiple loads to a single target.

For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration Service does not cache values during the session. For non-reusable Sequence Generator transformations, setting Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session. It also causes gaps of skipped values, since unused cached values are discarded at the end of each session.

For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize discarded values; however, it must be greater than one. When you reduce the Number
of Cached Values, you might increase the number of times the Integration Service accesses the
repository to cache values during the session.

Sorter Transformation

Active & Connected transformation. It is used to sort data in either ascending or descending order
according to a specified sort key. You can also configure the Sorter transformation for case-
sensitive sorting, and specify whether the output rows should be distinct. When you create a
Sorter transformation in a mapping, you specify one or more ports as a sort key and configure
each sort key port to sort in ascending or descending order.

Source Qualifier Transformation

Active & Connected transformation. When adding a relational or a flat file source definition to a
mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, filter rows when the Integration Service reads source data, specify an outer join rather than the default inner join, and specify sorted ports.

It is also used to select only distinct values from the source, and to create a custom query to issue a special SELECT statement for the Integration Service to read source data.

SQL Transformation

Active/Passive & Connected transformation. The SQL transformation processes SQL queries
midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You
can pass the database connection information to the SQL transformation as input data at run
time. The transformation processes external SQL scripts or SQL queries that you create in an
SQL editor. The SQL transformation processes the query and returns rows and database errors.
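For example, in query mode the transformation can bind input port values into the query as parameters. A minimal hedged sketch, assuming an input port named CUST_ID (the ?port? syntax denotes parameter binding):

SELECT CUST_NAME, CITY
FROM CUSTOMERS
WHERE CUST_ID = ?CUST_ID?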

Stored Procedure Transformation

Passive & Connected or UnConnected transformation. It is useful for automating time-consuming tasks, and it is also used in error handling, to drop and recreate indexes, to determine free space in the database, to perform specialized calculations, etc. The stored procedure must exist in the database before you create a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables, and conditional statements.

Transaction Control Transformation

Active & Connected. You can control commit and roll back of transactions based on a set of
rows that pass through a Transaction Control transformation. Transaction control can be defined
within a mapping or within a session.
Components: Transformation, Ports, Properties, Metadata Extensions.
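The per-row commit or rollback decision comes from a transaction control expression built with the built-in TC_ variables (TC_CONTINUE_TRANSACTION, TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE, TC_ROLLBACK_AFTER). A minimal sketch, assuming a hypothetical NEW_FILE_FLAG port:

IIF(NEW_FILE_FLAG = 'Y', TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)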

Union Transformation

Active & Connected. The Union transformation is a multiple input group transformation that you
use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It
merges data from multiple sources similar to the UNION ALL SQL statement to combine the
results from two or more SQL statements. Similar to the UNION ALL statement, the Union
transformation does not remove duplicate rows.
Rules

1. You can create multiple input groups, but only one output group.
2. All input groups and the output group must have matching ports. The precision, datatype,
and scale must be identical across all groups.
3. The Union transformation does not remove duplicate rows. To remove duplicate rows,
you must add another transformation such as a Router or Filter transformation.
4. You cannot use a Sequence Generator or Update Strategy transformation upstream from a
Union transformation.
5. The Union transformation does not generate transactions.
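As a rough SQL analogue of the Union transformation (hypothetical tables with identical structures):

SELECT CUST_ID, CUST_NAME FROM US_CUSTOMERS
UNION ALL
SELECT CUST_ID, CUST_NAME FROM EU_CUSTOMERS;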

Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.

Unstructured Data Transformation

Active/Passive & Connected. The Unstructured Data transformation processes unstructured and semi-structured file formats, such as messaging formats, HTML pages, and PDF documents. It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.

Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.

Update Strategy Transformation

Active & Connected transformation. It is used to update data in a target table, either to maintain a history of data or to capture recent changes. It flags rows for insert, update, delete, or reject within a mapping.
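Rows are flagged through an update strategy expression using the built-in constants DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT. A minimal sketch, assuming a hypothetical TGT_CUST_KEY port returned by a lookup on the target:

IIF(ISNULL(TGT_CUST_KEY), DD_INSERT, DD_UPDATE)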

XML Generator Transformation

Active & Connected transformation. It lets you create XML inside a pipeline. The XML
Generator transformation accepts data from multiple ports and writes XML through a single
output port.

XML Parser Transformation

Active & Connected transformation. The XML Parser transformation lets you extract XML data
from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or
databases. The XML Parser transformation functionality is similar to the XML source
functionality, except it parses the XML in the pipeline.

XML Source Qualifier Transformation


Active & Connected transformation. The XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources. It has one input or output port for every column in the XML source.

External Procedure Transformation

Passive & Connected or UnConnected transformation. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality you want. In such cases, an External Procedure transformation is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Advanced External Procedure Transformation

Active & Connected transformation. It operates in conjunction with procedures, which are
created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is
useful in creating external transformation applications, such as sorting and aggregation, which
require all input rows to be processed before emitting any output rows.

Lookup Transformation in Informatica


Lookup transformation is used to look up data in a flat file, relational table, view, or synonym. Lookup is a passive/active transformation and can be used in both connected and unconnected modes. From Informatica version 9 onwards, lookup can be configured as an active transformation. The lookup transformation can return a single row or multiple rows.

You can import the lookup definition from any flat file or relational database, or even from a source qualifier. The Integration Service queries the lookup source based on the ports and the lookup condition, and returns the result to other transformations or the target in the mapping.

The lookup transformation is used to perform the following tasks:

 Get a Related Value: You can get a value from the lookup table based on the source
value. As an example, we can get the related value like city name for the zip code value.
 Get Multiple Values: You can get multiple rows from a lookup table. As an example,
get all the states in a country.
 Perform Calculations: We can use the value from the lookup table in calculations.
 Update Slowly Changing Dimension tables: Lookup transformation can be used to
determine whether a row exists in the target or not.

You can configure the lookup transformation for the following types of lookup:
 Flat File or Relational lookup: You can perform the lookup on a flat file or a relational database. When you create a lookup using a flat file as the lookup source, the Designer invokes the flat file wizard. If you use a relational table as the lookup source, you connect to the lookup source using ODBC and import the table definition.
 Pipeline Lookup: You can perform a lookup on application sources such as JMS, MSMQ, or SAP. You have to drag the source into the mapping and associate the Lookup transformation with the source qualifier. You can improve performance by configuring partitions to retrieve source data for the lookup cache.
 Connected or Unconnected lookup: A connected lookup receives source data, performs
a lookup and returns data to the pipeline. An unconnected lookup is not connected to
source or target or any other transformation. A transformation in the pipeline calls the
lookup transformation with the :LKP expression. The unconnected lookup returns one
column to the calling transformation.
 Cached or Uncached Lookup: You can improve the performance of the lookup by caching the lookup source. If you cache the lookup source, you can use a dynamic or static cache. By default, the lookup cache is static and the cache does not change during the session. If you use a dynamic cache, the integration service inserts or updates rows in the cache. You can look up values in the cache to determine whether the values exist in the target, and then mark the row for insert or update in the target.

What is lookup transformation in Informatica?


Question

What is lookup transformation in Informatica?


Answer

Lookup is a transformation used to look up values from a relational table/view or a flat file. The developer defines the lookup match criteria. There are two types of lookups in PowerCenter Designer: 1) Connected Lookup and 2) Unconnected Lookup. Different caches can also be used with a lookup: static, dynamic, persistent, and shared (a dynamic cache cannot be used with an unconnected lookup). Each of these has its own identification. For more details, the Informatica Help documentation is useful.

Assuming you are familiar with the basics of Informatica, let's proceed to the lookup transformation.

Lookup transformation is passive, and it can be either connected or unconnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported either from source or from target tables.

For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table called 'Sales'. Instead of using the Sales table as one more source, we can use a Lookup transformation to look up the data for product ID 10 in the Sales table.

Difference between Connected and UnConnected Lookup Transformation:

1. A connected lookup receives input values directly from the mapping pipeline, whereas an unconnected lookup receives values from a :LKP expression in another transformation.

2. A connected lookup returns multiple columns from the same row, whereas an unconnected lookup has one return port and returns one column from each row.

3. A connected lookup supports user-defined default values, whereas an unconnected lookup does not.
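For example, an unconnected lookup is invoked from an expression using the :LKP reference qualifier; a hedged sketch assuming a lookup transformation named LKP_GET_SALES with an input port PRODUCT_ID:

:LKP.LKP_GET_SALES(PRODUCT_ID)

The single return value is assigned to the output port of the calling transformation.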


Example

SELECT dname FROM dept, emp WHERE emp.deptno = dept.deptno;



Connected LKPs

1. A connected LKP transformation is one which is connected to the pipeline.
2. A connected LKP transformation processes each and every row.
3. If you want to use a dynamic LKP cache, use the connected LKP transformation.
4. If the LKP condition is not matched, the LKP transformation returns the default value.
5. It cannot be called from another transformation.
6. It can return multiple values.
7. It can use a static or dynamic cache.


Unconnected LKPs

1. An unconnected LKP transformation is one which is not connected to the pipeline.
2. It must be called from another transformation, such as an Expression or Update Strategy.
3. It does not process each and every row; it returns values based on the expression condition.
4. If no match is found for the LKP condition, the LKP transformation returns NULL values.
5. It is a reusable transformation; the same LKP transformation can be called multiple times in the same mapping.
6. It returns only one value.
7. It can use only a static cache.


Performance Considerations for Lookups

Below is a list of performance considerations for lookups in Informatica PowerCenter.



Misconceptions about lookup SQL Indexes

I have seen people suggest an index to improve the performance of any SQL. This suggestion is often incorrect, and when it comes to indexing the condition port columns of a lookup SQL, it is even more so.

Before explaining why it is incorrect, let me detail the functionality of the lookup, using the usual HR schema EMP table with EMPNO, ENAME, and SALARY as columns.
Let us say there is a lookup in an ETL mapping that checks for a particular EMPNO and returns ENAME and SALARY from the lookup. The output ports for the lookup are ENAME and SALARY, and the condition port is EMPNO. Imagine that you are facing performance problems with this lookup, and one of the suggestions is to index the condition port.

Suppose, as (incorrectly) suggested, you create an index on the EMPNO column in the underlying database table. Practically, the SQL the lookup executes is going to be this:

select EMPNO,
ENAME,
SALARY
from EMP
ORDER BY EMPNO,
ENAME,
SALARY;

The data resulting from this query is stored in the lookup cache, and then each record from the source is looked up against this cache. So the check against the condition port column is done in the Informatica lookup cache, not in the database, and any index created in the database has no effect on it.

You may be wondering whether we can replicate the same indexing in the lookup cache. You don't have to worry about it: PowerCenter creates an "index" cache and a "data" cache for the lookup. In this case, the condition port data, EMPNO, is indexed and hashed in the "index" cache, and the rest, along with EMPNO, is kept in the "data" cache.

I hope now you understand why indexing condition port columns doesn't increase performance.

Having said that, I want to take you to a different kind of lookup, where you have disabled caching. In this kind of lookup, there is no cache: every time a row is sent into the lookup, the SQL is executed against the database. In this scenario, the database index may help. But if the performance of the lookup is a problem, then the cache-less lookup itself may be the problem.

I would go for a cache-less lookup only if my source has fewer records than my lookup table. In this case ONLY, indexing the condition ports will work. Everywhere else, it is a mere matter of chance whether the database picks up the index.


Dynamic Lookups

Dynamic lookups are used for implementing Slowly Changing Dimensions. The ability to provide dynamic caching gives Informatica a definitive edge over other vendor products. In a dynamic lookup, every time a new record is found (based on the lookup condition), the lookup cache is appended with that record. It can also update existing records in the cache with the incoming values.

The only issue with dynamic lookups is that they are slow, as the cache is updated frequently based on the transactions posted to the database. Be very careful when using a dynamic lookup: small tables are fine, but for heavy tables, try to avoid it.
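When a lookup uses a dynamic cache, it exposes a NewLookupRow output port that indicates what happened in the cache for each row (0 = no change, 1 = inserted, 2 = updated). A common pattern, sketched here, routes on that port with Router group conditions:

INSERT group : NewLookupRow = 1
UPDATE group : NewLookupRow = 2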


SCD TYPES:

SCD 1:

Unlike SCD Type 2, Slowly Changing Dimension Type 1 does not preserve any historical versions of data. This methodology overwrites old data with new data, and therefore stores only the most current information. In this article, let's discuss the step-by-step implementation of SCD Type 1 using Informatica PowerCenter.

The number of records we store in SCD Type 1 does not increase exponentially, as this methodology overwrites old data with new data. Hence we may not need the performance improvement techniques used in the SCD Type 2 tutorial.

Understand the Staging and Dimension Table.


Slowly Changing Dimension Series
Part I : SCD Type 1.
Part II : SCD Type 2.
Part III : SCD Type 3.
Part IV : SCD Type 4.
Part V : SCD Type 6.

For our demonstration purposes, let's consider the CUSTOMER dimension. Below is the detailed structure of both the staging and dimension tables.

Staging Table
In our staging table, we have all the columns required for the dimension table attributes, so no tables other than the dimension table will be involved in the mapping. Below is the structure of our staging table.
Key Points

1. The staging table will have only one day's data. Change Data Capture is not in scope.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.

Dimension Table
Here is the structure of our Dimension table.

Key Points

1. CUST_KEY is the surrogate key.
2. CUST_ID is the natural key, hence the unique record identifier.

Mapping Building and Configuration


Step 1
Let's start the mapping building process. For that, pull the CUST_STAGE source definition into the Mapping Designer.
Step 2
Now, using a Lookup Transformation, fetch the existing customer columns from the dimension table T_DIM_CUST. This lookup will give NULL values if the customer does not already exist in the dimension table.

 LookUp Condition : IN_CUST_ID = CUST_ID
 Return Columns : CUST_KEY

Step 3
Use an Expression Transformation to identify the records for insert and update using the expression below.

o INS_UPD :- IIF(ISNULL(CUST_KEY),'INS', 'UPD')

Additionally create two output ports.

o CREATE_DT :- SYSDATE
o UPDATE_DT :- SYSDATE

See the structure of the mapping in the image below.


Step 4
Map the columns from the Expression Transformation to a Router Transformation and create two groups (INSERT, UPDATE) in the Router Transformation using the expressions below. The mapping will look as shown in the image.

o INSERT :- IIF(INS_UPD='INS',TRUE,FALSE)
o UPDATE :- IIF(INS_UPD='UPD',TRUE,FALSE)

INSERT Group
Step 5
Every record coming through the 'INSERT Group' will be inserted into the dimension table T_DIM_CUST.

Use a Sequence Generator transformation to generate the surrogate key CUST_KEY as shown in the image below, and map the columns from the Router Transformation to the target as shown.
Note : An Update Strategy transformation is not required if the records are set for insert.

UPDATE Group
Step 6
Records coming from the 'UPDATE Group' will update the customer dimension with the latest customer attributes. Add an Update Strategy Transformation before the target instance and set it to DD_UPDATE. Below is the structure of the mapping.
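For each row flagged DD_UPDATE, the session issues an update against the target keyed on the target's key port. A hedged SQL sketch with illustrative values, assuming CUST_KEY identifies the row:

UPDATE T_DIM_CUST
SET CUST_NAME = 'John Doe', -- incoming staging attributes
    UPDATE_DT = SYSDATE
WHERE CUST_KEY = 1003; -- key port identifying the dimension row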
We are done with the mapping building and below is the structure of the completed mapping.

Workflow and Session Creation


No specific properties are required during the session configuration.

Below is a sample data set taken from the Dimension table T_DIM_CUST.

Initial inserted value for CUST_ID 1003

Updated value for CUST_ID 1003

Hope you enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.

SCD2:

Slowly Changing Dimension Type 2, also known as SCD Type 2, is one of the most commonly used types of dimension table in a Data Warehouse. SCD Type 2 dimension loads are considered complex, mainly because of the data volume we process and the number of transformations used in the mapping. In this article, we will be building an Informatica PowerCenter mapping to load an SCD Type 2 dimension.
Understand the Data Warehouse Architecture
Before we go to the mapping design, let's understand the high-level architecture of our Data Warehouse.

Slowly Changing Dimension Series


Part I : SCD Type 1.
Part II : SCD Type 2.
Part III : SCD Type 3.
Part IV : SCD Type 4.
Part V : SCD Type 6.

Here we have a staging schema, which is loaded from different data sources after the required data cleansing. Warehouse tables are loaded directly from the staging schema. The staging tables and the warehouse tables are in two different schemas within a single database instance.

Understand the Staging and Dimension Table.


Staging Table
In our staging table, we have all the columns required for the dimension table attributes, so no tables other than the dimension table will be involved in the mapping. Below is the structure of our staging table.

 CUST_ID
 CUST_NAME
 ADDRESS1
 ADDRESS2
 CITY
 STATE
 ZIP

Key Points :

1. The staging table will have only one day's data.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.
Dimension Table
Here is the structure of our Dimension table.

 CUST_KEY
 AS_OF_START_DT
 AS_OF_END_DT
 CUST_ID
 CUST_NAME
 ADDRESS1
 ADDRESS2
 CITY
 STATE
 ZIP
 CHK_SUM_NB
 CREATE_DT
 UPDATE_DT

Key Points :

1. CUST_KEY is the surrogate key.
2. CUST_ID and AS_OF_END_DT form the natural key, hence the unique record identifier.
3. Record versions are kept based on a time range using AS_OF_START_DT and AS_OF_END_DT.
4. The active record will have an AS_OF_END_DT value of 12-31-4000.
5. The checksum value of all dimension attribute columns is stored in the column CHK_SUM_NB.

Mapping Building and Configuration


Now that we understand the ETL architecture, staging table, dimension table, and the design considerations, we can go to the mapping development. We are splitting the mapping development into six steps.

1. Join Staging Table and Dimension Table
2. Data Transformation
o Generate Surrogate Key
o Generate Checksum Number
o Other Calculations
3. Identify Insert/Update
4. Insert the New Records
5. Update (Expire) the Old Version
6. Insert the New Version of the Updated Record

1. Join Staging Table and Dimension Table

We are going to OUTER JOIN the Staging (Source) table and the Dimension (Target) table using the SQL override below. An OUTER JOIN gives you all the records from the staging table and the corresponding records from the dimension table. If there is no corresponding record in the dimension table, it returns NULL values for the dimension table columns.
SELECT
-- Columns from the Staging (Source) table
CUST_STAGE.CUST_ID,
CUST_STAGE.CUST_NAME,
CUST_STAGE.ADDRESS1,
CUST_STAGE.ADDRESS2,
CUST_STAGE.CITY,
CUST_STAGE.STATE,
CUST_STAGE.ZIP,
-- Columns from the Dimension (Target) table
T_DIM_CUST.CUST_KEY,
T_DIM_CUST.CHK_SUM_NB
FROM CUST_STAGE LEFT OUTER JOIN T_DIM_CUST
ON CUST_STAGE.CUST_ID = T_DIM_CUST.CUST_ID -- Join on the natural key
AND T_DIM_CUST.AS_OF_END_DT = TO_DATE('12-31-4000','MM-DD-YYYY') -- Get the active record


2. Data Transformation
Now map the columns from the Source Qualifier to an Expression Transformation. When you map the columns to the Expression Transformation, rename the ports from the dimension table to OLD_CUST_KEY and OLD_CHK_SUM_NB, and add the expressions below.

 Generate Surrogate Key : A surrogate key will be generated for each and every record inserted into the dimension table.
o CUST_KEY : This is the surrogate key, generated using a Sequence Generator Transformation.
 Generate Checksum Number : The checksum number of all dimension attributes. A difference between the checksum of the incoming record and the checksum of the dimension table record indicates a changed column value. This is an easier way to identify changes in the columns than comparing each and every column.
o CHK_SUM_NB : MD5(TO_CHAR(CUST_ID) || CUST_NAME || ADDRESS1 || ADDRESS2 ||
CITY || STATE || TO_CHAR(ZIP))
 Other Calculations :
o Effective Start Date : Effective start date of the record.
 AS_OF_START_DT : TRUNC(SYSDATE)
o Effective End Date : Effective end date of the record.
 AS_OF_END_DT : TO_DATE('12-31-4000','MM-DD-YYYY')
o Record Creation Date : Record creation timestamp, used for inserted records.
 CREATE_DT : TRUNC(SYSDATE)
o Record Update Date : Record update timestamp, used for updated records.
 UPDATE_DT : TRUNC(SYSDATE)

3. Identify Insert/Update
In this step we will identify the records for INSERT and UPDATE.

 INSERT : A record will be set for INSERT if it does not exist in the dimension table. We can identify new records when OLD_CUST_KEY, the column from the dimension table, is NULL.
 UPDATE : A record will be set for UPDATE if it already exists in the dimension table and any incoming column from the staging table has a new value. If the column OLD_CUST_KEY is not null and the checksum of the incoming record differs from the checksum of the existing record (OLD_CHK_SUM_NB <> CHK_SUM_NB), the record will be set for UPDATE.
o The following expression is used in the Expression Transformation port INS_UPD_FLG shown in the previous step:
o INS_UPD_FLG : IIF(ISNULL(OLD_CUST_KEY), 'I', IIF(NOT ISNULL(OLD_CUST_KEY) AND OLD_CHK_SUM_NB <> CHK_SUM_NB, 'U'))

Now map all the columns from the Expression Transformation to a Router and add two groups as below:

o INSERT : IIF(INS_UPD_FLG = 'I', TRUE, FALSE)
o UPDATE : IIF(INS_UPD_FLG = 'U', TRUE, FALSE)

4. Insert the New Records

Now map all the columns from the 'INSERT' group to the dimension table instance T_DIM_CUST. While mapping the columns, we don't need any of the OLD_ columns pulled from the dimension table.
5. Update (Expire) the Old Version
The records identified for UPDATE will be inserted into a temporary table, T_DIM_CUST_TEMP. These records will then be updated into T_DIM_CUST via a post-session SQL. You can learn more about this performance improvement technique in one of our previous posts.

We will be mapping the columns below from the 'UPDATE' group of the Router Transformation to the target table. To update (expire) the old record, we just need the columns in the list below.

o OLD_CUST_KEY : To uniquely identify the dimension row.
o UPDATE_DT : Audit column recording the record update date.
o AS_OF_END_DT : The record will be expired with the previous day's date.

While we map the columns, AS_OF_END_DT will be calculated as ADD_TO_DATE(TRUNC(SYSDATE),'DD',-1) in an Expression Transformation. The image below gives a picture of the mapping.
6. Insert the New Version of the Updated Record
The records identified as UPDATE must have a new (active) version inserted. Map all the ports from the 'UPDATE' group of the Router Transformation to the target instance T_DIM_CUST. While mapping the columns, we don't need any of the OLD_ columns pulled from the dimension table.

Workflow and Session Creation


During the session configuration process, add the SQL below as part of the post-session SQL statement. This correlated update SQL will update the records in the T_DIM_CUST table with the values from T_DIM_CUST_TEMP. As mentioned previously, this is a performance improvement technique used to update huge tables.

UPDATE T_DIM_CUST
SET (T_DIM_CUST.AS_OF_END_DT,
     T_DIM_CUST.UPDATE_DT) =
    (SELECT T_DIM_CUST_TEMP.AS_OF_END_DT,
            T_DIM_CUST_TEMP.UPDATE_DT
     FROM T_DIM_CUST_TEMP
     WHERE T_DIM_CUST_TEMP.CUST_KEY = T_DIM_CUST.CUST_KEY)
WHERE EXISTS
    (SELECT 1
     FROM T_DIM_CUST_TEMP
     WHERE T_DIM_CUST_TEMP.CUST_KEY = T_DIM_CUST.CUST_KEY)

Now let's look at the data and see how it looks in the image below.
Hope you enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.

SCD 3:

Unlike SCD Type 2, Slowly Changing Dimension Type 3 preserves only a few historical versions of data, most of the time the 'Current' and 'Previous' versions. The 'Previous' version value is stored in additional columns within the same dimension record. In this article, let's discuss the step-by-step implementation of SCD Type 3 using Informatica PowerCenter.

The number of records we store in SCD Type 3 does not increase exponentially, as we do not insert a record for each and every historical change. Hence we may not need the performance improvement techniques used in the SCD Type 2 tutorial.

Understand the Staging and Dimension Table.


Slowly Changing Dimension Series
Part I : SCD Type 1.
Part II : SCD Type 2.
Part III : SCD Type 3.
Part IV : SCD Type 4.
Part V : SCD Type 6.

For our demonstration purposes, let's consider the CUSTOMER dimension. Here we will keep the previous versions of CITY, STATE, and ZIP in corresponding PREV_ columns. Below is the detailed structure of both the staging and dimension tables.

Staging Table
In our staging table, we have all the columns required for the dimension table attributes, so no tables other than the dimension table will be involved in the mapping. Below is the structure of our staging table.
Key Points

1. The staging table will have only one day's data. Change Data Capture is not in scope.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.

Dimension Table
Here is the structure of our Dimension table.

Key Points

1. CUST_KEY is the surrogate key.
2. CUST_ID is the natural key, hence the unique record identifier.
3. Previous versions are kept in the PREV_CITY, PREV_STATE, and PREV_ZIP columns.

Mapping Building and Configuration


Step 1

Let's start the mapping building process. For that, pull the CUST_STAGE source definition into the Mapping Designer.

Step 2
Now, using a Lookup Transformation, fetch the existing customer columns from the dimension table T_DIM_CUST. This lookup will give NULL values if the customer does not already exist in the dimension table.

 LookUp Condition : IN_CUST_ID = CUST_ID
 Return Columns : CUST_KEY, CITY, STATE, ZIP

Step 3
Using an Expression Transformation, identify the records for insert and update using the expression below. Additionally, map the columns from the Lookup Transformation to the Expression as shown below. With this, we get both the previous and current values for the CUST_ID.

o INS_UPD :- IIF(ISNULL(CUST_KEY),'INS', IIF(CITY <> PREV_CITY OR STATE <> PREV_STATE OR ZIP <> PREV_ZIP, 'UPD'))

Additionally create two output ports.

o CREATE_DT :- SYSDATE
o UPDATE_DT :- SYSDATE

Note : If there are too many columns to compare when building the INS_UPD logic, make use of a checksum number (the MD5() function) to simplify it, as sketched below.
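A hedged sketch of that checksum variant, assuming the lookup return ports are prefixed PREV_ and ZIP is numeric:

INS_UPD :- IIF(ISNULL(CUST_KEY), 'INS',
IIF(MD5(CITY || STATE || TO_CHAR(ZIP)) <> MD5(PREV_CITY || PREV_STATE || TO_CHAR(PREV_ZIP)), 'UPD'))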
Step 4

Map the columns from the Expression Transformation to a Router Transformation and create two groups (INSERT, UPDATE) in the Router Transformation using the expressions below. The mapping will look as shown in the image.

o INSERT :- IIF(INS_UPD='INS',TRUE,FALSE)
o UPDATE :- IIF(INS_UPD='UPD',TRUE,FALSE)
INSERT Group
Step 5
Every record coming through the 'INSERT Group' will be inserted into the dimension table T_DIM_CUST.

Use a Sequence Generator transformation to generate the surrogate key CUST_KEY as shown in the image below, and map the columns from the Router Transformation to the target. Leave all 'PREV' columns unmapped as shown.

Note : An Update Strategy transformation is not required if the records are set for insert.

UPDATE Group
Step 6
Records coming from the 'UPDATE Group' will update the customer dimension with the current customer attributes and the 'PREV' attributes. Add an Update Strategy Transformation before the target instance and set it to DD_UPDATE. Below is the structure of the mapping.
We are done with the mapping building and below is the structure of the completed mapping.

Workflow and Session Creation


No specific properties are required during the session configuration.

Below is a sample data set taken from the Dimension table T_DIM_CUST. See the highlighted values.

Hope you enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.
