Informatica PowerCenter Development Best Practices

1. Lookup - Performance considerations
What is a Lookup transformation? It is not just another transformation that
fetches data to match against the source data. The Lookup is an important and
useful transformation when used effectively; used improperly, it will severely
impair the performance of your mapping.
Let us look at the different scenarios in which a Lookup can cause problems,
and at how to tackle them.
1.1. Unwanted columns
By default, when you create a lookup on a table, PowerCenter gives you all the
columns in the table. If not all of the columns are required for the lookup
condition or return, delete the unwanted columns from the transformation.
Every unwanted column left in place increases the cache size.
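As an illustration, the generated lookup SQL can be trimmed with a SQL
override so that only the condition and return columns are cached. A minimal
sketch, assuming a hypothetical CUSTOMER_DIM table:

    -- Hypothetical lookup SQL override: keep only the lookup condition
    -- column (CUSTOMER_ID) and the return column (CUSTOMER_NAME), so the
    -- cache does not carry the table's remaining columns.
    SELECT CUSTOMER_ID,
           CUSTOMER_NAME
    FROM   CUSTOMER_DIM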
1.2. Size of the source versus size of lookup
Let us say you have 10 rows in the source and one of the columns has to be
checked against a big table (1 million rows). PowerCenter builds the cache for
the lookup table and then checks the 10 source rows against the cache. It
takes more time to build the cache of 1 million rows than to go to the
database 10 times and look up against the table directly.
Use an uncached lookup instead of building the static cache, as the number of
source rows is far smaller than the number of lookup rows.
1.3. JOIN instead of Lookup
In the same context as above, if the Lookup transformation sits after the
source qualifier and there is no active transformation in between, you can
instead use the SQL override of the source qualifier and join to the lookup
table with a traditional database join, provided both tables are in the same
database and schema.
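A minimal sketch of such a source qualifier SQL override; the ORDERS and
CUSTOMER_DIM tables and their columns are hypothetical:

    -- Hypothetical source qualifier override: the database performs the
    -- join, so no Lookup transformation (and no lookup cache) is needed.
    SELECT O.ORDER_ID,
           O.CUSTOMER_ID,
           O.ORDER_AMOUNT,
           C.CUSTOMER_NAME      -- the column we would otherwise look up
    FROM   ORDERS O
    JOIN   CUSTOMER_DIM C
      ON   C.CUSTOMER_ID = O.CUSTOMER_ID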
1.4. Conditional call of lookup
Instead of going for connected lookups with filters to achieve a conditional
lookup call, go for an unconnected lookup. Is the single-column return
limiting you? Go ahead and change the SQL override to concatenate the required
columns into one big column, and break it back into individual columns at the
calling side.
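A minimal sketch of this concatenation trick, assuming a hypothetical
CUSTOMER_DIM table and Oracle-style || concatenation:

    -- Hypothetical lookup SQL override: pack several return columns into
    -- one delimited string, because an unconnected lookup returns a single
    -- port. The calling expression splits LKP_RETURN apart again, for
    -- example with SUBSTR and INSTR.
    SELECT CUSTOMER_ID,
           CUSTOMER_NAME || '|' || CITY || '|' || COUNTRY AS LKP_RETURN
    FROM   CUSTOMER_DIM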
1.5. SQL query
Find the execution plan of the Lookup SQL and see if you can add indexes or
hints to make the query fetch data faster. You may have to take the help of a
database developer to accomplish this if you are not a SQL expert yourself.
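On Oracle, for example, the plan can be inspected as follows; the table,
column, and index names are hypothetical:

    -- Hypothetical Oracle example: generate and display the execution
    -- plan of the lookup SQL.
    EXPLAIN PLAN FOR
    SELECT CUSTOMER_ID, CUSTOMER_NAME
    FROM   CUSTOMER_DIM
    WHERE  CUSTOMER_ID = :1;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- If a full table scan shows up where an index lookup is expected,
    -- an index may help:
    CREATE INDEX IDX_CUSTOMER_ID ON CUSTOMER_DIM (CUSTOMER_ID);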
1.6. Increase cache
If none of the above options provides a performance enhancement, the problem
may lie with the cache: the cache you assigned for the lookup is not
sufficient to hold the data or index of the lookup. Whatever data doesn't fit
into the cache is spilled into the cache files designated in $PMCacheDir.
When PowerCenter doesn't find the data you are looking for in the cache, it
swaps data from the file into the cache, and keeps doing this until the data
is found. This is quite expensive, as this type of operation is very I/O
intensive. To stop this from occurring, increase the size of the cache so the
entire data set resides in memory. When increasing the cache you also have to
be aware of the system constraints: if your cache size is greater than the
resources available, the session will fail for lack of resources.
1.7. Cachefile file-system
In many cases, if the cache directory is on a different file system from that
of the hosting server, the piling up of cache files may take time and result
in latency. So, with the help of your system administrator, look into this
aspect as well.
1.8. Useful cache utilities
If the same lookup SQL is being used by another lookup, then a shared cache
or a reusable lookup should be used. Also, if you have a table whose data
does not change often, you can use the persistent cache option to build the
cache once and reuse it many times in consecutive runs.
2. Workflow performance - Basic considerations
Though performance tuning has been the most feared part of development, it is
the easiest once the intricacies are known. With each newer version of
PowerCenter there is added flexibility for the developer to build
better-performing workflows. The major blocks for performance are the design
of the mapping and, if databases are involved, SQL tuning.
Regarding the design of the mapping, I have a few basic considerations.
Please note that these are not rules of thumb, but they will help you act
sensibly in different scenarios.
1. I would always suggest you think twice before using an Update
Strategy, though it adds a certain level of flexibility to the mapping. If
you have a straight-through mapping which takes data from the source and
directly inserts all the records into the target, you don't need an update
strategy.
2. Use a pre-SQL delete statement if you wish to delete specific rows from
the target before loading (see the sketch after this list). Use the truncate
option in the session properties if you wish to clean the table before
loading. I would avoid a separate pipeline in the mapping that runs before
the load with an update-strategy transformation.
3. Say you have 3 sources and 3 targets with one-on-one mappings. If the
loads are independent according to the business requirement, I would create 3
different mappings and 3 different session instances and run them all in
parallel in my workflow after the "Start" task. I've observed the workflow
runtime come down by 30-60% compared with serial processing.
4. PowerCenter is built to work on high volumes of data, so let the server
be completely busy. Introduce parallelism as far as possible into the
mapping/workflow.
5. If using a transformation like a Joiner or Aggregator, sort the data on
the join keys or group-by columns prior to these transformations to decrease
the processing time.
6. Filtering should be done at the database level instead of within the
mapping. The database engine is much more efficient at filtering than
PowerCenter.
The above examples are just some things to consider when tuning a mapping.
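As a sketch of the pre-SQL delete from item 2, run from the target's Pre SQL
session property; the SALES_FACT table, its LOAD_DATE column, and the
Oracle-style date expression are hypothetical:

    -- Hypothetical target Pre SQL: delete only the rows about to be
    -- reloaded, instead of running a separate delete pipeline.
    DELETE FROM SALES_FACT
    WHERE  LOAD_DATE = TRUNC(SYSDATE);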
2.1. SQL tuning
SQL queries/actions occur in PowerCenter in one of the ways below:
• Relational Source Qualifier
• Lookup SQL Override
• Stored Procedures
• Relational Target
Using the execution plan to tune a query is the best way to gain an
understanding of how the database will process the data. Some things to keep
in mind when reading the execution plan include: "full table scans are not
evil", "indexes are not always fast", and "indexes can be slow too".
Analyse the table data to judge each case: picking 20 records out of 20
million is usually best done through an index, while fetching 10 records out
of 15 may well be faster with a full table scan.
Many times the indexes on a relational target create performance problems
when loading records into it. If the indexes are needed for other purposes,
it is suggested to drop them at load time and rebuild them in post-SQL. When
dropping indexes on a target you should consider the integrity constraints
and weigh the time it takes to rebuild the index after the load against the
actual load time.
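A minimal sketch of this drop-and-rebuild pattern using the target's Pre SQL
and Post SQL session properties; the index and table names are hypothetical:

    -- Hypothetical target Pre SQL: drop the index so the load is not
    -- slowed down by per-row index maintenance.
    DROP INDEX IDX_SALES_FACT_CUST;

    -- Hypothetical target Post SQL: rebuild the index once the load
    -- completes.
    CREATE INDEX IDX_SALES_FACT_CUST ON SALES_FACT (CUSTOMER_ID);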
3. Pre/Post-Session command - Uses
• It is a very good practice to email the success or failure status of a
task once it is done. In the same way, when a business requirement calls for
it, make use of the Post-Session Success and Failure emails for proper
communication.
• The built-in feature offers more flexibility, with session logs as
attachments, and also provides other run-time data such as the workflow
run-instance ID.
• Any archiving activities around the source and target flat files can be
easily managed within the session, using the session properties for flat-file
command support that are new in PowerCenter v8.6. For example, after writing
the flat-file target, you can set up a command to zip the file to save space.
• If you need any editing of data in the target flat files which your
mapping couldn't accommodate, write a shell/batch command or script and call
it in the Post-Session command task. I prefer weighing the trade-offs between
PowerCenter's capabilities and the OS's capabilities in these scenarios.
4. Sequence generator - design considerations
In most cases, I would advise you to avoid the Sequence Generator
transformation when populating an ID column in a relational target table. I
suggest you instead create a sequence on the target database and enable a
trigger on that table to fetch the value from the database sequence.
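A minimal sketch of this sequence-plus-trigger pattern in Oracle syntax; the
table, sequence, and trigger names are hypothetical:

    -- Hypothetical database-side ID generation: the trigger fills the ID
    -- column on every insert, so the mapping never has to supply it.
    CREATE SEQUENCE CUSTOMER_SEQ START WITH 1 INCREMENT BY 1;

    CREATE OR REPLACE TRIGGER CUSTOMER_BI
    BEFORE INSERT ON CUSTOMER_DIM
    FOR EACH ROW
    BEGIN
      SELECT CUSTOMER_SEQ.NEXTVAL INTO :NEW.CUSTOMER_ID FROM DUAL;
    END;
    /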
There are many advantages to using a database sequence generator:
• Fewer PowerCenter objects will be present in the mapping, which reduces
development time and also maintenance effort.
• ID generation is PowerCenter-independent if a different application is
used in the future to populate the target.
• Migration between environments is simplified, because there is no
additional overhead of carrying over the persistent values of the sequence
generator from the repository database.
In all of the above cases, a sequence created in the target database makes
life a lot easier for table data maintenance and also for PowerCenter
development. In fact, databases have mechanisms specifically focused on
sequences, so in effect you implement a manual push-down optimization in your
PowerCenter mapping design.
DBAs will always complain about triggers on the databases, but I would still
insist on using the sequence-trigger combination, even for huge volumes of
data.
5. FTP Connection object - platform independence
If you have files to be read as a source from a Windows server while your
PowerCenter server is hosted on UNIX/Linux, make use of FTP users on the
Windows server and use a file reader with an FTP Connection object. This
connection object can be added like any other connection string. It gives you
the flexibility of platform independence and further reduces the overhead of
maintaining SMB mounts on the Informatica boxes.