
Session properties -> General tab -> Treat the input links as 'AND'/'OR'
Session partitioning with the Aggregator transformation

preview data
********************************************************
High availability. You can use the high availability option to eliminate single points of failure in the PowerCenter environment and reduce service interruptions in the event of failure. High availability provides resilience, failover, and recovery for services.
infacmd
infasetup
pmrep
Versioned objects
Workflow recovery
Custom transformation. The Custom transformation has the following enhancement: procedures with thread-specific operations.
Java transformation.
***********************************************************
Error codes
BR - errors related to the reader process, including ERP, flat file and relational sources
CMN - errors related to databases, memory allocation, lookup, joiner and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in the Aggregator, Lookup, Joiner and Rank transformations
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer

***********************************************************
Netezza uses LIMIT and Oracle uses ROWNUM to restrict the number of rows returned.
************************************************************
Aggregator - active & connected
Source Qualifier - active & connected
Filter - active & connected
Expression - passive & connected
Joiner - active & connected
Lookup - passive & connected/unconnected
HTTP - passive & connected
Normalizer - active & connected
Rank - active & connected
Router - active & connected
Sequence Generator - passive & connected
Sorter - active & connected
Stored Procedure - passive & connected/unconnected
Union - active & connected

*************************************************************************
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.
Lookup transformation:
- Be sure to delete the unwanted columns from the lookup, as they inflate the lookup cache considerably.
- If the Lookup transformation is after the Source Qualifier and there is no active transformation in between, you can instead use the SQL override of the Source Qualifier.
- If the cache you assigned for the lookup is not sufficient to hold the lookup data or index, whatever does not fit into the cache is spilled into the cache files in $PMCacheDir. When PowerCenter does not find the data you are looking up in the cache, it swaps data from the file into the cache and keeps doing this until it finds the data. This is expensive, being an I/O operation. Increase the cache so that the whole data set resides in memory.
- Sequential and concurrent caches: PowerCenter 8.x gives us the option to build the lookup caches either concurrently or sequentially, depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.
[HINTS] Difference between Aggregator and Expression transformation? The Expression transformation permits calculations on a row-by-row basis only. In the Aggregator you can perform calculations on groups.
HTTP transformation: passive & connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or post data to the target or downstream transformation in the mapping.
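Following on from the lookup tuning notes above, a minimal sketch of a lookup SQL override that keeps only the columns needed for the condition and the return value (the table and column names are made up for illustration):

SELECT CUST_ID, CUST_STATUS
FROM CUSTOMER_DIM
WHERE CUST_STATUS = 'ACTIVE'

Fewer columns and a pre-filter mean a smaller cache to build and search; the same idea applies to the transformation itself - only ports used in the lookup condition or returned downstream need to stay in it.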

***********************************************************************

****************************************************************************
Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository: A repository that functions individually and is unrelated to any other repositories.
Global Repository: This is a centralized repository in a domain. This repository can contain objects shared across the repositories in the domain. The objects are shared through global shortcuts.

Local Repository: A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository: This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test and deploy metadata into the production environment.
Q. What is a code page?
A. A code page contains the encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way the application stores, receives, and sends character data.
Q. Which databases can PowerCenter Server on Windows connect to?
A. PowerCenter Server on Windows can connect to the following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which databases can PowerCenter Server on UNIX connect to?
A. PowerCenter Server on UNIX can connect to the following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Informatica Mapping Designer
Q. How to execute a PL/SQL script from an Informatica mapping?
A. A Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. In the SP transformation, the PL/SQL procedure name can be specified. Whenever the session is executed, the session calls the PL/SQL procedure.
Q. How can you define a transformation? What are the different types of transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Below are the various transformations available in Informatica:
Aggregator
Application Source Qualifier
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Output
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
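Going back to the PL/SQL question above: a hedged sketch of how an unconnected Stored Procedure transformation is typically invoked from an Expression port (the transformation and port names here are made up for illustration):

:SP.SP_UPDATE_AUDIT_LOG( SESSION_RUN_ID, PROC_RESULT )

PROC_RESULT captures the return value of the procedure; a connected Stored Procedure transformation would instead be wired into the pipeline like any other transformation.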

Q. What is a Source Qualifier? What is meant by Query Override?
A. The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query through the transformation properties. The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query.
Q. What is the Aggregator transformation?
A. The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation is used to perform calculations on groups. The Expression transformation permits calculations on a row-by-row basis only.
The Aggregator transformation contains group by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping with an Aggregator transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally.
Q. How is the Union transformation used?
A. The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.
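A minimal sketch of the query override idea described above, assuming a hypothetical ORDERS source with ports ORDER_ID, CUST_ID and ORDER_DATE, and an Oracle-style source (the override keeps the same select list and port order, adding only a filter and a sort):

-- default query generated by the Source Qualifier
SELECT ORDERS.ORDER_ID, ORDERS.CUST_ID, ORDERS.ORDER_DATE
FROM ORDERS
-- overridden query: same ports, same order, extra filter and sort
SELECT ORDERS.ORDER_ID, ORDERS.CUST_ID, ORDERS.ORDER_DATE
FROM ORDERS
WHERE ORDERS.ORDER_DATE >= TO_DATE('2010-01-01', 'YYYY-MM-DD')
ORDER BY ORDERS.ORDER_ID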

Q. Can two flat files be joined with the Joiner transformation?
A. Yes, the Joiner transformation can be used to join data from two flat file sources.
Q. What is a Lookup transformation?
A. This transformation is used to look up data in a flat file or a relational table, view or synonym. It compares Lookup transformation ports (input ports) to the source column values based on the lookup condition. The returned values can then be passed to other transformations.
Q. Can a lookup be done on flat files?
A. Yes.
Q. What is the difference between a connected lookup and an unconnected lookup?
A. A connected lookup takes input values directly from other transformations in the pipeline.
An unconnected lookup does not take inputs directly from any other transformation, but it can be used in any transformation (such as an Expression) and can be invoked as a function using the :LKP expression. So an unconnected lookup can be called multiple times in a mapping.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using the Mapplet Designer. The mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored as metadata, separate from any mapping that uses the transformation. Whenever any changes are made to a reusable transformation, all the mappings where the transformation is used will be invalidated.
Q. What is Update Strategy and what are the options for Update Strategy?
A. Informatica processes the source data row by row. By default, every row is marked to be inserted into the target table. If the row has to be updated/inserted based on some logic, the Update Strategy transformation is used. The condition can be specified in the Update Strategy to mark the processed row for update or insert.
The following options are available for Update Strategy:
DD_INSERT: flags the row for insertion. The equivalent numeric value of DD_INSERT is 0.
DD_UPDATE: flags the row for update. The equivalent numeric value of DD_UPDATE is 1.
DD_DELETE: flags the row for deletion. The equivalent numeric value of DD_DELETE is 2.
DD_REJECT: flags the row for rejection. The equivalent numeric value of DD_REJECT is 3.
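A hedged sketch of an Update Strategy expression using the constants above (the flag port EXISTS_FLAG is made up; in practice it would come from a lookup against the target):

IIF( ISNULL(EXISTS_FLAG), DD_INSERT, DD_UPDATE )

Rows flagged DD_REJECT are written to the reject file rather than the target, which ties in with the reject-loading notes later in these notes.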

*******************************************************
SESSION LOGS
Information that resides in a session log:
- Allocation of system shared memory
- Execution of pre-session commands / post-session commands
- Session initialization
- Creation of SQL commands for reader/writer threads
- Start/end timings for target loading
- Errors encountered during the session
- Load summary of Reader/Writer/DTM statistics
- Other information
By default, the server generates log files based on the server code page.
Thread Identifier
Ex: CMN_1039
Reader and writer thread codes have 3 digits and transformation codes have 4 digits. The number following a thread name indicates:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/Rank boundary number
Log File Codes
BR - related to the reader process, including ERP, relational and flat file sources
CMN - related to databases, memory allocation
DBGR - related to the debugger
EP - external procedure
LM - Load Manager
TM - DTM
REP - repository
WRT - writer

Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected
Statistics details
(a) Requested rows - the number of rows the writer actually received for the specified operation
(b) Applied rows - the number of rows the writer successfully applied to the target (without error)
(c) Rejected rows - the number of rows the writer could not apply to the target
(d) Affected rows - the number of rows affected by the specified operation
Detailed transformation statistics
The server reports the following details for each transformation in the mapping:

(a) Name of the transformation
(b) Number of input rows and name of the input source
(c) Number of output rows and name of the output target
(d) Number of rows dropped

Tracing Levels
Normal - Initialization and status information, errors encountered, transformation errors, rows skipped, summarized session details (not at the level of individual rows).
Terse - Initialization information as well as error messages, and notification of rejected data.
Verbose Init - In addition to Normal tracing: names of index and data files used, and detailed transformation statistics.
Verbose Data - In addition to Verbose Init: each row that passes into the mapping, with detailed transformation statistics.
NOTE
When you enter a tracing level in the session property sheet, you override the tracing levels configured for transformations in the mapping.
MULTIPLE SERVERS
With PowerCenter, we can register and run multiple servers against a local or global repository. Hence you can distribute the repository session load across available servers to improve overall performance. (You can use only one PowerMart server in a local repository.)
Issues in Server Organization
- Moving the target database onto the appropriate server machine may improve efficiency.
- All sessions/batches using data from other sessions/batches need to use the same server and be incorporated into the same batch.
- Servers with different speeds/sizes can be used for handling the most complicated sessions.
Session/Batch Behavior
- By default, every session/batch runs on its associated Informatica server, which is selected in the property sheet.
- For batches that contain sessions with various servers, the property of the outermost batch takes effect.
Session Failures and Recovering Sessions
Two types of errors occur in the server:
- Non-fatal
- Fatal
(a) Non-Fatal Errors

A non-fatal error is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session property sheet with the 'stop on' option. When you enable this option, the server counts non-fatal errors that occur in the reader, writer and transformations.
Reader errors can include alignment errors while running a session in Unicode mode.
Writer errors can include key constraint violations, loading NULL into a NOT NULL field, and database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL input.
(b) Fatal Errors
A fatal error occurs when the server cannot access the source, target or repository.

This can include loss of connection or target database errors, such as lack of database space to load data.
If the session uses Normalizer (or) Sequence Generator transformations and the server cannot update the sequence values in the repository, a fatal error occurs.
Others
- Usage of the ABORT function in mapping logic, to abort a session when the server encounters a transformation error.
- Stopping the server using pmcmd (or) the Server Manager.

Performing Recovery
- When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The server then reads all sources again and starts processing from the next row ID.
- By default, 'perform recovery' is disabled in setup. Hence it won't make entries in the OPB_SRVR_RECOVERY table.
- The recovery session moves through the states of a normal session: scheduled, waiting to run, initializing, running, completed and failed. If the initial recovery fails, you can run recovery as many times as needed.
- The normal reject loading process can also be done in the session recovery process.
- The performance of recovery might be low if:
  o The mapping contains mapping variables
  o The commit interval is high
Unrecoverable Sessions
Under certain circumstances, when a session does not complete, you need to truncate the target and run the session from the beginning.
Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a session.
(a) Target-based commit
- The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
- During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval. For example, with a commit interval of 10,000 rows and a buffer block that holds 7,500 rows, the interval is reached partway through the second block, so the server commits only when that block fills, at 15,000 rows.
- The server commits data to each target based on primary-foreign key constraints.

(b) Source-based commit
- The server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
- During a session, the server commits data to the target based on the number of rows from an active source in a single pipeline. These rows are referred to as source rows.
- A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
- Although the Filter, Router and Update Strategy transformations are active transformations, the server does not use them as active sources in a source-based commit session.
- When the server runs a session, it identifies the active source for each pipeline in the mapping.

- The server generates a commit row from the active source at every commit interval.
- When each target in the pipeline receives the commit row, the server performs the commit.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer of the target rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and reload it to relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.)
Each time you run a session, the server appends the rejected data to the reject file.
Locating the bad files
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
Here the leading 3 is the row indicator and the D values are column indicators. Two main things help in finding the reason for rejecting:
(a) Row indicator
The row indicator tells the writer what to do with the row of wrong data.
Row indicator - Meaning - Rejected by
0 - Insert - Writer or target
1 - Update - Writer or target
2 - Delete - Writer or target
3 - Reject - Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
(b) Column indicator
A column indicator is followed by the first column of data, and another column indicator. They appear with every column of data and define the type of the data.
Column indicator - Meaning - Writer treats as
D - Valid data - Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O - Overflow - Bad data.
N - Null - Bad data.
T - Truncated - Bad data.

NOTE
NULL columns appear in the reject file with commas marking their column.
Correcting the Reject File
Use the reject file and the session log to determine the cause of the rejected data.
Keep in mind that correcting the reject file does not necessarily correct the source of the reject. Correct the mapping and target database to eliminate some of the rejected data when you run the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may contain misleading column indicators.

For example, a series of 'N' indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.
Why can the writer reject?
- Data overflowed column constraints
- An update strategy expression
Why can the target database reject?
- Data contains a NULL column
- Database errors, such as key violations
Steps for loading the reject file:
- After correcting the rejected data, rename the rejected file to reject_file.in
- The reject loader uses the data movement mode configured for the server. It also uses the code page of the server/OS. Hence do not change these in the middle of reject loading.
- Use the reject loader utility:
pmrejldr pmserver.cfg [folder name] [session name]
Other points
The server does not perform the following options when using the reject loader:
(a) Source-based commit
(b) Constraint-based loading
(c) Truncate target table
(d) FTP targets
(e) External loading
Multiple reject loaders
You can run the session several times and correct rejected data from the several sessions at once. You can correct and load all of the reject files at once, or work on one or two reject files, load them, and work on the others at a later time.
External Loading
You can configure a session to use the Sybase IQ, Teradata and Oracle external loaders to load session target files into the respective databases.
The External Loader option can increase session performance, since these databases can load information directly from files faster than they can with SQL commands to insert the same data into the database.
Method:
When a session uses an external loader, the session creates a control file and a target flat file. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader. The control file has an extension of '*.ctl' and you can view the file in $PMTargetFileDir.
For using an external loader, the following must be done:
- Configure an external loader connection in the Server Manager.
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in the session property sheet.
Issues with the External Loader:

- Disable constraints
- Performance issues
  o Increase commit intervals
  o Turn off database logging
- Code page requirements
- The server can use multiple external loaders within one session (Ex: you have a session with two target files, one with the Oracle external loader and another with the Sybase external loader).

Other Information:
- The external loader performance depends upon the platform of the server.
- The server loads data at different stages of the session.
- The server writes external loader initialization and completion messages in the session log. However, details about external loader performance are written to the external loader log, which is stored in the same target directory.
- If the session contains errors, the server continues the external loader process. If the session fails, the server loads partial target data using the external loader.
- The external loader creates a reject file for data rejected by the database. The reject file has an extension of '*.ldr'. The external loader saves the reject file in the target file directory.
- You can load corrected data from the file using the database reject loader, and not through the Informatica reject load utility (for external loader reject files only).
Configuring the external loader in a session:
- In the Server Manager, open the session property sheet.
- Select the file target, and then click flat file options.
Caches
- The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a mapping.
- The server stores key values in index caches and output values in data caches; if the server requires more memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory, and in most circumstances, it deletes the cache files.
Cache storage overflow:

Transformation - Index cache - Data cache
Aggregator - stores group values, as configured in the group-by ports - stores calculations based on the group-by ports
Rank - stores group values, as configured in the group-by ports - stores ranking information based on the group-by ports
Joiner - stores index values for the master source table, as configured in the join condition - stores master source rows
Lookup - stores lookup condition information - stores lookup data that is not stored in the index cache
Determining cache requirements
- To calculate the cache size, you need to consider column and row requirements as well as processing overhead.
- The server requires processing overhead to cache data and index information. Column overhead includes a null indicator, and row overhead can include row-to-key information.
Steps:

- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or) rows in the cache; this gives the minimum cache requirement.
- For the maximum requirement, multiply the minimum requirement by 2.
Location:
- By default, the server stores the index and data files in the directory $PMCacheDir.
- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find multiple index and data files in the directory; the server appends a number to the end of the filename (PMAGG*.id*1, id*2, etc.).
Aggregator Caches
- When the server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.
- When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes data from one partition to another based on the group key values of the transformation.
- The server uses memory to process an Aggregator transformation with sorted ports; it does not use cache memory, so you do not need to configure cache memory for Aggregator transformations that use sorted ports.
Aggregator index cache: #Groups * ((Σ column size) + 7)
Aggregator data cache: #Groups * ((Σ column size) + 7)
For example, with 1,000 groups and a total column size of 93 bytes, the minimum index cache is 1,000 * (93 + 7) = 100,000 bytes and the maximum is twice that, 200,000 bytes.
Rank Cache
- When the server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row.
- If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for each group it finds.
Rank index cache: #Groups * ((Σ column size) + 7)
Rank data cache: #Groups * [(#Ranks * (Σ column size + 10)) + 20]
Joiner Cache
- When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds memory caches based on the master rows. After building these caches, the server reads rows from the detail source and performs the joins.
- The server creates the index cache as it reads the master source into the data cache. The server uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
- To improve joiner performance, the server aligns all data for the joiner cache on an eight-byte boundary.
Joiner index cache: #Master rows * ((Σ column size) + 16)
Joiner data cache: #Master rows * ((Σ column size) + 8)
Lookup Cache
- When the server runs a Lookup transformation, the server builds a cache in memory when it processes the first row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of memory for each partition. If two Lookup transformations share the cache, the server does not allocate additional memory for the second Lookup transformation.

- The server creates index and data cache files in the lookup cache directory and uses the server code page to create the files.
Lookup index cache: #Rows in lookup table * ((Σ column size) + 16)
Lookup data cache: #Rows in lookup table * ((Σ column size) + 8)
Transformations
A transformation is a repository object that generates, modifies or passes data.
(a) Active transformation: can change the number of rows that pass through it (Filter, Normalizer, Rank ...)
(b) Passive transformation: does not change the number of rows that pass through it (Expression, Lookup ...)

NOTE: Transformations can be connected to the data flow, or they can be unconnected. An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.
Reusable Transformations:
When you add a reusable transformation to a mapping, the definition of the transformation exists outside the mapping, while an instance appears within the mapping.
All the changes you make to the transformation immediately reflect in the instances.
You can create a reusable transformation by two methods:
(a) Designing it in the Transformation Developer
(b) Promoting a standard transformation
Changes that reflect in mappings are things like expressions. If port names etc. are changed, they won't reflect.
Handling High-Precision Data:
- The server processes decimal values as doubles or decimals.
- When you create a session, you choose to enable the decimal data type or let the server process the data as a double (precision of 15).
Example:
- You may have a mapping with a decimal (20,0) port that passes through. The value may be 40012030304957666903. If you enable decimal arithmetic, the server passes the number as it is. If you do not enable decimal arithmetic, the server passes 4.00120303049577 x 10^19.
- If you want to process a decimal value with a precision greater than 28 digits, the server automatically treats it as a double value.
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet.
You can create a non-reusable instance of a reusable transformation.
Mapplet Objects:
(a) Input transformation

(b) Source Qualifier
(c) Transformations, as you need
(d) Output transformation

Mapplets won't support:
- Joiner
- Normalizer
- Pre/post-session stored procedures
- Target definitions
- XML source definitions
Types of Mapplets:
(a) Active mapplets - contain one or more active transformations
(b) Passive mapplets - contain only passive transformations

Copied mapplets are not an instance of the original mapplet. If you make changes to the original, the copy does not inherit your changes.
You can use a single mapplet, even more than once, in a mapping.
Ports
- Default value for an I/P port: NULL
- Default value for an O/P port: ERROR
- Default value for variables: does not support default values
Session Parameters
These parameters represent values you might want to change between sessions, such as a DB connection or a source file.
We can use a session parameter in a session property sheet, then define the parameters in a session parameter file.
The user-defined session parameters are:
(a) DB connection
(b) Source file directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like $DBConnectionSource, and use it as the source database connection for the session. When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After it completes, set the value to TransDB2 and run the session again.
NOTE:
You can use several parameters together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to initialize the session.
Session Parameter File
- A parameter file is created with a text editor.
- In it, we specify the folder and session name, then list the parameters and variables used in the session and assign each a value.
- Save the parameter file in any directory and load it to the server.
- We can define the following values in a parameter file:
  o Mapping parameters
  o Mapping variables
  o Session parameters
- You can include parameter and variable information for more than one session in a single parameter file by creating separate sections for each session within the parameter file.
- You can override the parameter file for sessions contained in a batch by using a batch parameter file. A batch parameter file has the same format as a session parameter file.
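A minimal sketch of such a parameter file, assuming a made-up folder name, session name and connection values (the exact section-header syntax varies between PowerCenter versions, so treat this purely as illustrative):

[SalesFolder.s_m_load_customers]
$DBConnectionSource=TransDB1
$DBConnectionTarget=DWH_ORA
$$LastLoadDate=01/01/2010

Run the session against this file, then change $DBConnectionSource to TransDB2 and run it again, as described above.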

Locale
The Informatica server can transform character data in two modes:
(a) ASCII
  a. The default mode
  b. Passes 7-bit US-ASCII character data
(b) UNICODE
  a. Passes 8-bit, multi-byte character data
  b. It uses 2 bytes for each character to move data and performs additional checks at the session level to ensure data integrity.
Code pages contain the encoding to specify characters in a set of one or more languages. We can select a code page based on the type of character data in the mappings. Compatibility between code pages is essential for accurate data movement.
The various code page components are:
- Operating system locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page
Locale
(a) System locale - system default
(b) User locale - settings for date, time, display
- Input locale
Mapping Parameters and Variables
These represent values in mappings/mapplets.
If we declare mapping parameters and variables in a mapping, you can reuse the mapping by altering the parameter and variable values of the mapping in the session. This can reduce the overhead of creating multiple mappings when only certain attributes of a mapping need to be changed.
Use a mapping parameter when you want to use the same value each time you run the session.
Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The server saves the value of a mapping variable to the repository at the end of each successful run and uses that value the next time you run the session.
Mapping objects: Source, Target, Transformation, Cubes, Dimensions
Debugger
We can run the Debugger in two situations:
(a) Before a session: after saving the mapping, we can run some initial tests.
(b) After a session: the real debugging process.
Metadata Reporter:
- A web-based application that allows you to run reports against repository metadata.
- Reports include executed sessions, lookup table dependencies, mappings and source/target schemas.
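A hedged sketch of the mapping parameters and variables described above: a string parameter $$Region and a date/time variable $$LastRunDate (both names made up) used in a Filter condition, with the variable pushed forward in an Expression transformation so the next run picks up only newer rows:

-- Filter transformation condition
REGION_CODE = $$Region AND UPDATE_TS > $$LastRunDate
-- Expression transformation: advance the variable for the next run
SETMAXVARIABLE( $$LastRunDate, UPDATE_TS )

At the end of a successful run the server saves the new $$LastRunDate value to the repository, as noted above.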

Repository
Types of Repository
(a) Global Repository
  a. This is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These may include operational or application source definitions, reusable transformations, mapplets and mappings.
(b) Local Repository
  a. A local repository is within a domain but is not the global repository. Use the local repository for development.
(c) Standalone Repository
  a. A repository that functions individually, unrelated and unconnected to other repositories.
NOTE: Once you create a global repository, you cannot change it to a local repository. However, you can promote a local repository to a global repository.
Batches
- Provide a way to group sessions for either serial or parallel execution by the server.
- Batches:
  o Sequential (runs sessions one after another)
  o Concurrent (runs sessions at the same time)
Nesting Batches
Each batch can contain any number of sessions/batches. We can nest batches several levels deep, defining batches within batches.
Nested batches are useful when you want to control a complex series of sessions that must run sequentially or concurrently.
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by default. However, we can configure a batched session to run on its own schedule by selecting the 'Use Absolute Time Session' option.
Server Behavior
The server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if 'Previous completes' and that previous session fails.
Sequential Batches
If you have sessions with dependent source/target relationships, you can place them in a sequential batch, so that the Informatica server can run them in consecutive order.
There are two ways of running sessions under this category:
(a) Run the session only if the previous one completes successfully
(b) Always run the session (this is the default)
Concurrent Batches
In this mode, the server starts all of the sessions within the batch at the same time.
Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the sessions separately or in a sequential batch.
Concurrent batches in a sequential batch
If you have concurrent batches with source-target dependencies that benefit from running in a particular order, place them into a sequential batch, just like sessions.

Server Concepts
The Informatica server uses three system resources:
(a) CPU
(b) Shared memory
(c) Buffer memory
The Informatica server uses shared memory, buffer memory and cache memory for session information and to move data between session threads.
LM Shared Memory
The Load Manager uses both process and shared memory. The LM keeps the server's list of sessions and batches, and the schedule queue, in process memory.
Once a session starts, the LM uses shared memory to store session details for the duration of the session run or session schedule. This shared memory appears as the configurable parameter (LMSharedMemory) and the server allots 2,000,000 bytes by default. This allows you to schedule or run approximately 10 sessions at one time.
DTM Buffer Memory
The DTM process allocates buffer memory to the session based on the DTM buffer pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session.
The DTM divides memory into buffer blocks as configured in the buffer block size setting. (Default: 64,000 bytes per block)
Running a Session
The following tasks are performed during a session:
1. LM locks the session and reads session properties
2. LM reads the parameter file
3. LM expands server/session variables and parameters
4. LM verifies permissions and privileges
5. LM validates source and target code pages
6. LM creates the session log file
7. LM creates the DTM process
8. DTM process allocates DTM process memory
9. DTM initializes the session and fetches the mapping
10. DTM executes pre-session commands and procedures
11. DTM creates reader, writer and transformation threads for each pipeline
12. DTM executes post-session commands and procedures
13. DTM writes historical incremental aggregation/lookup data to the repository
14. LM sends post-session emails
Stopping and Aborting a Session
- If the session you want to stop is part of a batch, you must stop the batch.
- If the batch is part of a nested batch, stop the outermost batch.
- When you issue the stop command, the server stops reading data. It continues processing, writing data and committing data to targets.
- If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar to the stop command, except it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.
Recovery:
- After a session is stopped/aborted, the session results can be recovered. When recovery is performed, the session continues from the point at which it stopped.
- If you do not recover the session, the server runs the entire session the next time. Hence, after stopping/aborting, you may need to manually delete targets before the session runs again.
NOTE: The ABORT command and the ABORT function are different.

When Can a Session Fail
- The server cannot allocate enough system resources.
- The session exceeds the maximum number of sessions the server can run concurrently.
- The server cannot obtain an execute lock for the session (the session is already locked).
- The server is unable to execute post-session shell commands or post-load stored procedures.
- The server encounters database errors.
- The server encounters transformation row errors (Ex: a NULL value in a non-null field).
- Network related errors.
When Pre/Post Shell Commands Are Useful
- To delete a reject file
- To archive target files before the session begins
Session Performance
- Minimum logging (Terse).
- Partitioning source data and performing ETL for each partition in parallel (for this, multiple CPUs are needed).
- Adding indexes.
- Changing the commit level.
- Using a Filter transformation to remove unwanted data movement.
- Increasing buffer memory when there is a large volume of data.
- Multiple lookups can reduce performance. Verify the largest lookup table and tune the expressions.
- At the session level, the causes are a small cache size, low buffer memory and a small commit interval.
- At the system level:
  o WIN NT/2000 - use the Task Manager.
  o UNIX - vmstat, iostat.
Hierarchy of optimization
- Target
- Source
- Mapping
- Session
- System
Optimizing Target Databases:
- Drop indexes/constraints.
- Increase checkpoint intervals.
- Use bulk loading / external loading.
- Turn off recovery.
- Increase database network packet size.
Source level
- Optimize the query (using group by).
- Use conditional filters.
- Connect to the RDBMS using the IPC protocol.

Mapping
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations / expressions.

Session:
- Concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging areas.
- Tune session parameters.
System:
- Improve network speed.
- Use multiple PM servers on separate systems.
- Reduce paging.
Session Process
The Informatica server uses both process memory and system shared memory to perform the ETL process. It runs as a daemon on UNIX and as a service on WIN NT.
The following processes are used to run a session:
(a) Load Manager process: starts a session and creates the DTM process, which creates the session. The Load Manager manages session/batch scheduling, locks the session, reads the parameter file, expands server/session variables and parameters, verifies permissions/privileges, and creates the session log file.
(b) DTM process: creates threads to initialize the session, read, write and transform data, and handle pre/post-session operations.

DTM process:
The primary purpose of the DTM is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is known as buffer memory. The default memory allocation is 12,000,000 bytes. It creates the main thread, which is called the master thread; this manages all other threads.
Various threads and their functions:
Master thread - handles stop and abort requests from the Load Manager.

Mapping thread - one thread for each session; fetches session and mapping information, compiles the mapping, cleans up after execution.
Reader thread - one thread for each partition; relational sources use relational threads and flat files use file threads.
Writer thread - one thread for each partition; writes to the target.
Transformation thread - one or more transformation threads for each partition.
Note: When you run a session, the threads for a partitioned source execute concurrently. The threads use buffers to move/transform data.

*************************************************************************
What are the output files that the Informatica server creates during a session run?
Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking on the session in the Monitor window.
Performance detail file: This file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to the targets.
Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completed successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
******************************************************************************

********************************************************************************
Add an EXPRESSION transformation after a Source Qualifier and before the target. If the source or target definition changes, reconnecting ports is much easier.
Denormalization using the Aggregator.
LAST function in the Aggregator: returns the last row in the selected port. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within LAST.
LAST(AMOUNT, MONTH)
MQ Source Qualifier transformation
***********************************************
Test load takes the number of rows from the Source Qualifier specified in the test rows setting and shows which target instance each row goes to; nothing is loaded to the target. It does not work for flat file targets.
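A hedged sketch of the "denormalization using aggregator" note above: pivoting monthly rows into columns with aggregate functions over conditional expressions (the port names YEAR, MONTH and AMOUNT are made up):

-- output port JAN_AMOUNT, in an Aggregator grouped by YEAR
SUM( IIF( MONTH = 'JAN', AMOUNT, 0 ) )
-- output port FEB_AMOUNT
SUM( IIF( MONTH = 'FEB', AMOUNT, 0 ) )
-- ...one output port per month

Each group-by value then produces a single denormalized output row.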

***********************************************
EVENTWAIT / EVENTRAISE tasks
First create an event: right-click the workflow and Edit -> Events tab -> create an event.
Place an Event-Raise task in the workflow space and give the created event name in the "User Defined Event" field (Properties tab).
Place an Event-Wait task in the workflow space -> 2 options in the Events tab:
- Predefined - this is a file watch.
- User-defined - this is an event created on the workflow properties Events tab; give the user-defined event to watch (the event created above).
When the Event-Raise task executes, it raises the event, which triggers the Event-Wait task to continue.
***********************************************
Timer Task
You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task.
Two options:
- Absolute time - give the exact time when to start the next task, or refer to a date/time variable.
- Relative time - give the hours, minutes, seconds from the start time of this task, from the start time of the parent workflow/worklet, or from the start time of the top-level workflow.
***********************************************
Decision Task
Use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions.
This can be achieved without the task as well.
***********************************************
Assignment Task
You can assign a value to a user-defined workflow variable with the Assignment task.
**********************************************
Control Task

Options: Fail me, Fail parent, Abort parent, Fail top-level workflow, Stop top-level workflow, Abort top-level workflow.
**********************************************
DECODE( ITEM_ID, 10, 'Flashlight',
14, 'Regulator',
20, 'Knife',
40, 'Tank',
'NONE' )
DECODE( CONST_NAME,
'Five', 5,
'Pythagoras', '1.414213562',
'Archimedes', '3.141592654',
'Pi', 3.141592654 )
**********************************************
If you use an output default value other than ERROR, the default value overrides the ERROR function in an expression. For example, you use the ERROR function in an expression, and you assign the default value 1234 to the output port. Each time the Integration Service encounters the ERROR function in the expression, it overrides the error with the value 1234 and passes 1234 to the next transformation. It does not skip the row, and it does not log an error in the session log.
IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )
SALARY      RETURN VALUE
10000       10000
-15000      'Error. Negative salary found. Row skipped.'
NULL        NULL
150000      150000
1005        1005
**********************************************
EXP returns e raised to the specified power (exponent), where e = 2.71828183. For example, EXP(2) returns 7.38905609893065.
**********************************************
FIRST returns the first value found within a port or group. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within FIRST.
FIRST( value [, filter_condition ] )
FIRST( ITEM_NAME, ITEM_PRICE > 10 )
LAST( ITEM_NAME, ITEM_PRICE > 10 )
LPAD( PART_NUM, 6, '0')
RTRIM( LAST_NAME, 'S.')
*********************************************

FV returns the future value of an investment, where you make periodic, constant payments and the investment earns a constant interest rate.
Syntax: FV( rate, terms, payment [, present value, type] )
FV(0.0075, 12, -250, -2000, TRUE)
*********************************************
GET_DATE_PART( DATE_SHIPPED, 'HH12' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'MON' )
*********************************************
GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )
*********************************************
INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' )

Returns 1 if the input value matches string1, 2 if the input value matches string2, and so on; 0 if the input value is not found; NULL if the input is a null value.
*********************************************
INITCAP( string )
*********************************************
INSTR( COMPANY, 'a', 1, 2 )
*********************************************
IS_DATE() returns 0 or 1.
*********************************************
Mapping -> Debugger -> use the existing session instance -> click Next Instance [shows one row at a time as it moves through the mapping, then displays the next row].
*********************************************
IS_NUMBER()
IS_SPACES() returns whether a string value consists entirely of spaces.
LAST_DAY() returns the date of the last day of the month for each date in a port.
********************************************
MAX( ITEM_NAME, MANUFACTURER_ID = '104' ) - The MAX function uses the same sort order that the Sorter transformation uses. However, the MAX function is case sensitive, and the Sorter transformation may not be case sensitive.

You can also use MAX to return the latest date or the largest numeric value in a port or group.
*******************************************
MD5 - Message-Digest algorithm 5.
METAPHONE - encodes string values. You can specify the length of the string that you want to encode.
*******************************************
The following expression returns the average order for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter returns the average for the last five rows read:
MOVINGAVG( SALES, 5 )
MOVINGSUM( SALES, 5 )
*******************************************
POWER( NUMBERS, EXPONENT )
*******************************************
RAND(1) returns a random number between 0 and 1. This is useful for probability scenarios.
*******************************************
REG_EXTRACT
REPLACESTR ( CaseFlag, InputString, OldString1, [OldString2, ... OldStringN,] NewString )
REVERSE( string )
RPAD( ITEM_NAME, 16, '.')
TO_FLOAT() - returns 0 if the value in the port is blank or a non-numeric character.
******************************************
What are global and local shortcuts?
Standalone Repository: A repository that functions individually and is unrelated to any other repositories.
Global Repository: This is a centralized repository in a domain. This repository can contain objects shared across the repositories in a domain. The objects are shared through global shortcuts.
Local Repository: A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
**************************************************
When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session.

IIF( FLG_A = 'Y' AND FLG_B = 'Y' AND FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' AND FLG_B = 'Y' AND FLG_C = 'N', VAL_A + VAL_B,
IIF( FLG_A = 'Y' AND FLG_B = 'N' AND FLG_C = 'Y', VAL_A + VAL_C,
IIF( FLG_A = 'Y' AND FLG_B = 'N' AND FLG_C = 'N', VAL_A,
IIF( FLG_A = 'N' AND FLG_B = 'Y' AND FLG_C = 'Y', VAL_B + VAL_C,
IIF( FLG_A = 'N' AND FLG_B = 'Y' AND FLG_C = 'N', VAL_B,
IIF( FLG_A = 'N' AND FLG_B = 'N' AND FLG_C = 'Y', VAL_C,
IIF( FLG_A = 'N' AND FLG_B = 'N' AND FLG_C = 'N', 0.0 ))))))))
This expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons.
If you take advantage of the IIF function, you can rewrite that expression as:
IIF(FLG_A = 'Y', VAL_A, 0.0) + IIF(FLG_B = 'Y', VAL_B, 0.0) + IIF(FLG_C = 'Y', VAL_C, 0.0)
*************************************************

******************************************************
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Transformation
5. Session

6. Grid deployments
7. PowerCenter components
8. System
******************************************
- Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
- Analyze performance details, such as performance counters, to determine where session performance decreases.
- Analyze thread statistics to determine the optimal number of partition points.
- Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.
******************************************
- If the reader or writer thread is 100% busy, consider using string datatypes in the source or target ports. Non-string ports require more processing.
- If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the Integration Service increases the number of transformation threads it uses for the session. However, if the machine is already running at or near full capacity, do not add more threads.
- If one transformation requires more processing time than the others, consider adding a pass-through partition point to the transformation.
*****************************************
Complete the following tasks to eliminate target bottlenecks:
- Have the database administrator optimize database performance by optimizing the query.
- Increase the database network packet size.
- Configure index and key constraints.
****************************************
Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query or small database network packet sizes can cause source bottlenecks.
Identifying Source Bottlenecks
You can read the thread statistics in the session log to determine if the source is the bottleneck. When the Integration Service spends more time on the reader thread than on the transformation or writer threads, you have a source bottleneck.

If the session reads from a relational source, use the following methods to identify source bottlenecks:
Filter transformation
Read test mapping
Database query
If the session reads from a flat file source, you probably do not have a source bottleneck.
Using a Filter Transformation
You can use a Filter transformation in the mapping to measure the time it takes to read source data.
Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.
Using a Read Test Mapping
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformations in the mapping.
To create a read test mapping, complete the following steps:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.
Using a Database Query
To identify source bottlenecks, execute the read query directly against the source database.
Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can load the result of the query into a file. On UNIX, you can load the result of the query into /dev/null.
Measure the query execution time and the time it takes for the query to return the first row.
Eliminating Source Bottlenecks
Complete the following tasks to eliminate source bottlenecks:
Set the number of bytes the Integration Service reads per line if the Integration Service reads from a flat file source.
Have the database administrator optimize database performance by optimizing the query.

Increase the database network packet size.
Configure index and key constraints.
If there is a long delay between the two time measurements in a database query, you can use an optimizer hint.
****************** A simple source filter on the source database can sometimes negatively impact performance because of the lack of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance. ******************
*****************************************************
Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible. Delete unnecessary links between transformations to minimize the amount of data moved.
****************************************************
Optimizing the Line Sequential Buffer Length
If the session reads from a flat file source, you can improve session performance by setting the number of bytes the Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If each line in the source file is less than the default setting, you can decrease the line sequential buffer length in the session properties.
***************************************************
Single-pass reading allows you to populate multiple targets with one source qualifier. Consider using single-pass reading if you have multiple sessions that use the same sources. You can combine the transformation logic for each mapping in one mapping and use one source qualifier for each source.
***************************************************
You can optimize performance for pass-through mappings. To pass data directly from source to target without any other transformations, connect the Source Qualifier transformation directly to the target. If you use the Getting Started Wizard to create a pass-through mapping, the wizard creates an Expression transformation between the Source Qualifier transformation and the target.
***************************************************
Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. ZIP code information as a Char or Varchar datatype. If you convert the ZIP code data to an Integer datatype, the lookup database stores the ZIP code 94303-1234 as 943031234. This helps increase the speed of lookup comparisons based on ZIP code.
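As a rough sketch of the ZIP-code idea, the conversion can also be done on the database side in a lookup or Source Qualifier SQL override. This is not from the source document: the table and column names are hypothetical, and REPLACE/TO_NUMBER are Oracle-style functions (a TO_INTEGER call in an Expression transformation achieves the same conversion inside the mapping).

-- Hypothetical lookup override: expose the ZIP code as an integer so the
-- lookup/filter comparison works on numeric values instead of strings.
SELECT
    customer_id,
    TO_NUMBER(REPLACE(zip_code, '-', '')) AS zip_code_num   -- '94303-1234' -> 943031234
FROM customer_addresses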

**************************************************
Optimizing Aggregator
Grouping By Simple Columns - use integer values
Filtering Data Before You Aggregate
Limiting Port Connections - limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Using Sorted Input
Using Incremental Aggregation
If you can capture changes from the source that affect less than half the target, you can use incremental aggregation to optimize the performance of Aggregator transformations.
When you use incremental aggregation, you apply captured changes in the source to the aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
You can increase the index and data cache sizes to hold all data in memory without paging to disk.
************************************************
Optimizing Joiner
Designate the master source as the source with fewer duplicate key values.
Designate the master source as the source with fewer rows.
Perform joins in a database when possible - create a pre-session stored procedure to join the tables in a database, or use the Source Qualifier transformation to perform the join (see the SQL override sketch at the end of this section).
Join sorted data when possible - minimizes disk input and output.
************************************************
Optimizing Lookup Transformations
Native Drivers
Caching Lookup Tables
If a mapping contains Lookup transformations, you might want to enable lookup caching. When you enable caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis.
The result of the lookup query and processing is the same whether or not you cache the lookup table. However, using a lookup cache can increase session performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB.
Use the following types of caches to increase performance:
Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
Persistent cache. To save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Integration Service builds the memory cache from the cache files instead of from the database.
Enabling Concurrent Caches - default Auto
***********************************************
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property.
The Number of Cached Values property determines the number of values the Integration Service caches at one time. Make sure that the Number of Cached Values is not too small. Consider configuring the Number of Cached Values to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use cache are faster than those that require cache.
**********************************************
Allocating Memory
For optimal performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service node. Allocate at least 16 MB of physical memory to sort data using the Sorter transformation. The Sorter cache size is set to 16,777,216 bytes by default. If the Integration Service cannot allocate enough memory to sort data, it fails the session.
If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Integration Service may require much more than twice the amount of disk space available to the work directory.
Use the following formula to determine the size of incoming data:
# input rows × (Sum(column size) + 16)
For example, 2,000,000 input rows with a total column size of 100 bytes come to about 2,000,000 × 116 ≈ 232 million bytes (roughly 220 MB) of incoming data.
*********************************************
Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select unique values from a source. Use the Select Distinct option to filter unnecessary data earlier in the data flow. This can improve performance.
********************************************
Transformation errors occur when the Integration Service encounters conversion errors, conflicting mapping logic, and any condition set up as an error, such as null input. Check the session log to see where the transformation errors occur. If the errors center around particular transformations, evaluate those transformation constraints.
********************************************
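To illustrate "perform joins in a database when possible" above, here is a minimal, hypothetical Source Qualifier SQL override that joins two source tables in the database instead of using a Joiner transformation. The table and column names are made up, and the select list is assumed to match the order of the connected Source Qualifier ports:

-- Hypothetical SQL override for a Source Qualifier reading ORDERS and CUSTOMERS.
-- The database performs the join, so no Joiner transformation is needed in the mapping.
SELECT
    o.order_id,
    o.order_date,
    o.order_amount,
    c.customer_id,
    c.customer_name
FROM orders o
JOIN customers c
    ON c.customer_id = o.customer_id

Adding an ORDER BY on the join keys additionally lets downstream Joiner or Aggregator transformations take advantage of sorted input.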

****************************************************
Designer: Tools -> Options -> General tab
Reload Tables/Mappings when Opening a Folder
Ask Whether to Reload the Tables/Mappings
Group Source by Database
********************************************
Target file with a datetime value as the header: in the session properties, set the header option to Command and supply the command or batch script that writes the header line.

********************************************
Join condition in a SQL override: braces {} are not needed when there is only one join condition.
Lookup SQL override: the column order in the override does not have to match the port order; it does not affect the lookup.
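A small, hypothetical example of a lookup SQL override that restricts the rows brought into the cache; the table and column names are illustrative and not from the source:

-- Hypothetical lookup SQL override: cache only active employees instead of the whole table.
SELECT
    emp_name,
    emp_id
FROM employees
WHERE status = 'ACTIVE'

Per the note above, listing emp_name before emp_id (or the reverse) does not matter; reducing the number of cached rows is what helps performance.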

UPDATE employee
SET employeename = CASE WHEN empid = 1 THEN 'RK'
                        WHEN empid = 2 THEN 'Valluri'
                        ELSE NULL
                   END

Incremental aggregation: When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.
Test load: When you enable both truncate target tables and test load, the Integration Service disables the truncate table function, runs a test load session, and writes a message about it to the session log.
**********************************************
Java Transformation Examples

// Emit the input row three times, tagged Parent/Child/Self, when the associated refno is null.
if (isNull("LegacyAssociatedOrgRefno"))
{
    IdentifyingType = "Parent";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
    IdentifyingType = "Child";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
    IdentifyingType = "Self";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
}

// Default the refno and level when the refno is null and HOTYPE is "STHA".
if (isNull("LegacyAssociatedOrgRefno") && HOTYPE.equals("STHA"))
{
    LegacyAssociatedOrgRefno = -1;
    Level = 0;
    generateRow();
}

Swapping columns example:

// Emit the row once as Parent, then again as Child with the two refno ports swapped.
IdentifyingType = "Parent";
generateRow();
IdentifyingType = "Child";
int x = LegacyAssociatedOrgRefno;
int y = LegacyIdentifyingRefno;
LegacyIdentifyingRefno = x;
LegacyAssociatedOrgRefno = y;
generateRow();
***************************************************
The Integration Service runs sessions and workflows. When you configure the Integration Service, you can specify where you want it to run.
Dynamic flat file creation depending on the level (varies with each trust): use a Sorter transformation to sort by level, a Transaction Control transformation, and a FileName port in the target instance. Transaction control condition:
IIF(Out_Level_old3 <> Out_Level_new3, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
Generating Flat File Targets By Transaction
You can generate a separate output file each time the Integration Service starts a new transaction, and you can dynamically name each target flat file. To generate a separate output file for each transaction, add a FileName port to the flat file target definition. When you connect the FileName port in the mapping, the Integration Service creates a separate target file at each commit point. The Integration Service uses the FileName port value from the first row in each transaction to name the output file.
**************************************************
UNIX
man
ls -ltr
wc [-l -w]
mv - move or rename a file
diff
cp
rm
gzip / gunzip
ff - to find a file anywhere in the system
mkdir
cd
pwd
grep - looks for a string in a file
ftp
chmod - permissions are r=4, w=2, x=1; for example, chmod 755
cat f1 f2 > f3
Ctrl-C - kills a running process
Ctrl-D - sends end-of-file / closes an open session
What is the difference between the cmp and diff commands?
cmp - compares two files byte by byte and displays the first mismatch
diff - tells the changes to be made to make the files identical
tar - tape archiver; tar takes a bunch of files and bundles them into one .tar file. The files are often compressed with the gzip algorithm and use the .tar.gz extension.
To create a tar: tar -cf archive.tar /directory
To extract the archive to the current directory: tar -xf archive.tar
To use gzip, just add a z to the options. To create a tar.gz: tar -czf archive.tar.gz /dir; to extract it: tar -xzf archive.tar.gz
head / tail
head -100 file | tail -1 prints the 100th line of a file.

IIF(NOT ISNULL(In_UniqueLegacyTreatmentFunctionIdentifier) AND v_TreatmentFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)')

IIF(ISNULL(v_UniqueLegacyTreatmentFunctionIdentifier_lkp), NULL, IIF(v_TreatmentFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)'))
IIF(ISNULL(v_UniqueLegacySpecialtyIdentifier_lkp), NULL, IIF(v_SpecTypeCode_dom = 'CC_CPSPE', NULL, 'B61413(v_SpecTypeCode)'))
IIF(ISNULL(v_UniqueLegacyCurrentSpecialtyIdentifier_lkp), NULL, IIF(v_CurrentSpecTypeCode_dom = 'CC_CPSPE', NULL, 'B61414(v_CurrentSpecTypeCode)'))
*************************************************


**********************************************************
DW
**************************************************************
Data Warehouse
A data warehouse is an architecture for organizing an information system. It is a process for building decision support systems and a knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.
Bill Inmon: "Subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."
Ralph Kimball: "A copy of transaction data specifically structured for query and analysis."
Operational Data Store (ODS)
An operational data store is an integrated, subject-oriented, volatile (including updates and deletions), current-valued structure designed to serve operational users as they do high-performance integrated processing.
OLTP (Online Transaction Processing)
OLTP is a class of programs that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transaction processing. OLTP systems are optimized for data entry operations, e.g. order entry, banking, CRM, ERP applications, etc.

Data Warehouse vs. Operational/Transactional systems:
Subject oriented                  vs. Application oriented
Summarized, refined               vs. Detailed
Represents value over time        vs. Accurate as of the moment
Supports managerial needs         vs. Supports day-to-day needs
Read-only data                    vs. Can be updated
Batch processing                  vs. Real-time transactions
Completely different life cycle   vs. Software Development Life Cycle
Analysis driven                   vs. Transaction driven
Dimensional model                 vs. Entity Relationship Diagram
Large amount of data              vs. Small amount of data
Relaxed availability              vs. High availability
Flexible structure                vs. Static structure

Business Intelligence (BI)
Business Intelligence is a set of business processes for collecting and analyzing business information. BI functions include trend analysis, aggregation of data, drilling down to complex levels of detail, slice-and-dice, and data rotation for comparative viewing.
OLAP (On-Line Analytical Processing) - querying and presenting data from a data warehouse as multiple dimensions.
ROLAP (Relational OLAP) - applications and a set of user interfaces that retrieve data from an RDBMS and present it as a dimensional model.
MOLAP (Multidimensional OLAP) - applications, a set of user interfaces, and database technologies that have a dimensional model.
DOLAP (Desktop OLAP) - designed for low-end, single users; data is stored or downloaded on the desktop.
HOLAP (Hybrid OLAP) - a combination of all the above OLAP methodologies.
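To make the aggregation and slice-and-dice idea concrete, here is an illustrative, ROLAP-style SQL query of the kind such tools generate against a relational star schema. It is a sketch only: the tables and columns (sales_fact, product_dim, date_dim) are hypothetical, and GROUP BY CUBE is Oracle/PostgreSQL-style syntax.

-- Sales by product line and quarter, with subtotals and a grand total (slice-and-dice).
SELECT
    p.product_line,
    d.calendar_quarter,
    SUM(f.sales_amount) AS total_sales
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN date_dim    d ON d.date_key    = f.date_key
GROUP BY CUBE (p.product_line, d.calendar_quarter)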

The four common data warehouse design approaches compared:

Top-Down (practitioner: Bill Inmon)
Emphasizes: the enterprise data warehouse
Design: enterprise-based normalized model; marts use a subject-oriented dimensional model
Architecture: multi-tier, comprised of a staging area and dependent data marts
Data set: the DW holds atomic-level data; marts hold summary data

Bottom-Up (practitioner: Ralph Kimball)
Emphasizes: data marts
Design: dimensional model of a data mart, consisting of star schemas
Architecture: staging area and data marts
Data set: contains both atomic and summary data

Hybrid (practitioners: many)
Emphasizes: both the DW and data marts
Design: start with enterprise and local models; one or more star schemas
Architecture: high-level normalized enterprise model; initial marts
Data set: populates marts with atomic and summary data via a non-persistent staging area

Federated (practitioner: Doug Hackney)
Emphasizes: integrating heterogeneous BI environments
Design: an architecture of architectures; share dimensions, facts, rules, and definitions across organizations
Architecture: the reality of change in organizations and systems
Data set: use of whatever means possible to integrate business needs

*****************************************************************
E. F. Codd (the father of the relational database) proposed 12 rules for OLAP:
1. Multidimensional conceptual view. This supports EIS "slice-and-dice" operations and is usually required in financial modeling.
2. Transparency. OLAP systems should be part of an open system that supports heterogeneous data sources. Furthermore, the end user should not have to be concerned about the details of data access or conversions.
3. Accessibility. The OLAP should present the user with a single logical schema of the data.

4. Consistent reporting performance. Performance should not degrade as the number of dimensions in the model increases.
5. Client/server architecture. Requirement for open, modular systems.
6. Generic dimensionality. Not limited to 3-D and not biased toward any particular dimension. A function applied to one dimension should also be able to be applied to another.
7. Dynamic sparse-matrix handling. Related both to the idea of nulls in relational databases and to the notion of compressing large files, a sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options.
8. Multiuser support. OLAP systems, like EISes, need to support multiple concurrent users, including their individual views or slices of a common database.
9. Unrestricted cross-dimensional operations. Similar to rule 6; all dimensions are created equal, and operations across data dimensions do not restrict relationships between cells.
10. Intuitive data manipulation. Ideally, users shouldn't have to use menus or perform complex multiple-step operations when an intuitive drag-and-drop action will do.
11. Flexible reporting. Save a tree. Users should be able to print just what they need, and any changes to the underlying financial model should be automatically reflected in reports.
12. Unlimited dimensional and aggregation levels. A serious tool should support at least 15, and preferably 20, dimensions.
********************************************************************
The dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.
Dimension: A category of information. For example, the time dimension. Other examples: demography, date, product, customer, HR, etc.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
Dimensional Data Model: The dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique and the two common schema types, star schema and snowflake schema.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: a date column, a store column, and a sales amount column.
Attributes are the non-key columns in the lookup tables.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields: one for the unique ID that identifies the quarter, and one or more additional fields that specify how that particular quarter is represented on a report (for example, the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
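As a minimal sketch of the "sales amount by store by day" example above, the fact and lookup (dimension) tables might look like the following. The DDL is illustrative only; the names, datatypes, and surrogate-key columns are assumptions, not taken from the source.

-- Lookup (dimension) tables
CREATE TABLE date_dim (
    date_key          INTEGER PRIMARY KEY,
    calendar_date     DATE,
    calendar_quarter  VARCHAR(10)          -- e.g. 'Q1 2001'
);

CREATE TABLE store_dim (
    store_key   INTEGER PRIMARY KEY,
    store_name  VARCHAR(100),
    city        VARCHAR(50)
);

-- Fact table at day/store grain: one measure plus foreign keys to the dimensions
CREATE TABLE sales_fact (
    date_key      INTEGER REFERENCES date_dim (date_key),
    store_key     INTEGER REFERENCES store_dim (store_key),
    sales_amount  NUMERIC(12,2)
);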

STAR SCHEMA
All measures in the fact table have the same level of granularity. A complex star can have more than one fact table.
A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other.
The main advantages of star schemas are that they:
Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contains dimension tables.
The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.
For example, a Time dimension can consist of 2 different hierarchies:
1. Year → Month → Day
2. Week → Day
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
Fact Constellation Schema
This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. The split of a fact table is done only when we want to focus on aggregation over a few facts and dimensions.
********************************************************************
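A hedged sketch of a typical star query against the illustrative tables defined earlier (sales_fact, date_dim, store_dim — hypothetical names): the fact table is joined to each dimension on a primary key/foreign key pair, the dimensions are never joined to each other, filters go on dimension attributes, and the measure is aggregated.

-- Star query: total sales per city for one quarter.
SELECT
    s.city,
    SUM(f.sales_amount) AS total_sales
FROM sales_fact f
JOIN date_dim  d ON d.date_key  = f.date_key
JOIN store_dim s ON s.store_key = f.store_key
WHERE d.calendar_quarter = 'Q1 2001'
GROUP BY s.city

In a snowflake version of the same model, store_dim would itself be normalized (for example into store and city tables), adding one more join per level of the hierarchy.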
