
Tips & Techniques for NCR Teradata with Informatica PowerCenter

Contents

Tips & Techniques for NCR Teradata with Informatica PowerCenter
Introduction
Teradata Basics
Teradata Hardware
Teradata Software
Tools
Client Configuration Basics for Teradata
Informatica/Teradata Touch Points
ODBC
ODBC Windows
ODBC UNIX
Teradata External Loaders
Partitioned Loading
Teradata TPump
Teradata MultiLoad
Cleaning up after a failed MultiLoad session
Using One Instance of Teradata MultiLoad to Load Multiple Tables
Multiple Workflows that MultiLoad to The Same Table
Teradata FastLoad
FAQ: Why are there three different loaders for Teradata? Which loader should I use?
Teradata FastExport
HOW TO: Enable the Teradata FastExport option in an existing repository using command line
HOW TO: use encryption with Fast Export
Teradata Parallel Transporter (TPT)
Connection attributes for Teradata Parallel Transporter (TPT)
ETL Vs EL-T Design Paradigm (PushDown Optimization)
Maximizing Performance using Pushdown Optimization
Running Pushdown Optimization Sessions
Running Source-Side Pushdown Optimization Sessions
Running Target-Side Pushdown Optimization Sessions
Running Full Pushdown Optimization Sessions
Integration Service Behavior with Full Optimization
Working with SQL Overrides
Configuring Sessions for Pushdown Optimization
Design Techniques
FAQs
Uncached Lookup Date/Time limitation
Streaming/Non-Staged Mode
Lookup Performance
Hiding the Password
Troubleshooting
Errors that indicate a prior MultiLoad session has not been cleaned up
Sessions periodically fail with broken pipe errors when writing to a loader in streaming (non-staging) mode

Informatica Confidential. Do not duplicate. Revision: 1/25/2008

Introduction
This document gives an overview of the integration and touch points between Informatica and Teradata. It discusses the EL-T architecture and design methodologies, covers the configuration of the various Informatica/Teradata touch points, and supplies how-to examples for using Informatica PowerCenter 8.1/8.1.1 with Teradata Warehouse. It also covers Teradata basics and describes some tweaks that experience has shown may be necessary to deal with common practices you may encounter at a Teradata account. The Teradata documentation (especially the MultiLoad, FastLoad, TPump, FastExport and TPT reference materials) is highly recommended, as is the External Loader section of the Server Manager Guide for PowerCenter. Additional information: all Teradata documentation can be downloaded from the Teradata Web site (http://www.info.ncr.com/Teradata/eTeradata-BrowseBy.cfm), and the Teradata Forum (http://www.Teradataforum.com) provides a wealth of useful information. Please visit my.informatica.com for Informatica documentation.

Teradata Basics
Teradata, a division of NCR Corporation (NYSE: NCR), is the global technology leader in enterprise data warehousing, analytic applications and data warehousing services. Optimized for decision support, Teradata Warehouse is a powerful suite of software (Teradata RDBMS, data access and management utilities, and data mining capabilities), hardware and consulting services. Due to its parallel database architecture and scalable hardware, Teradata Warehouse outperforms other vendors' solutions, from small to very large production warehouses with hundreds of terabytes. With 25+ years of experience, Teradata is a major player in the financial services, retail, communications, insurance, travel and transportation, and manufacturing industries, as well as with government organizations.

Teradata Hardware
While Teradata can run on other platforms, it is predominantly found on NCR Intel-based servers running NCR's version of UNIX (NCR UNIX MP-RAS), which are proven to deliver high performance, availability, and scalability to accommodate business growth. NCR servers can be configured for both massively parallel processing (MPP) and symmetric multiprocessing (SMP); each MPP node (or semi-autonomous processing unit) can support SMP. Teradata can be configured to communicate directly with a mainframe's input/output (I/O) channel, known as channel-attached. Alternatively, it can be network-attached; that is, configured to communicate via Transmission Control Protocol/Internet Protocol (TCP/IP) over a local area network (LAN). Because PowerCenter runs on UNIX, you will be dealing with a network-attached configuration most of the time. However, clients will occasionally want to use their existing channel-attached configuration on the assumption of better performance. Do not assume that channel-attached is always faster than network-attached: similar performance has been observed across a channel attachment and a 100-MB LAN, and channel attachment requires an additional sequential data move, since data must be moved from the PowerCenter server to the mainframe before moving the data across the mainframe channel to Teradata.

Teradata Software


In the Teradata world, there are Teradata Director Program Identifiers (TDPIDs), databases, and users. The TDPID is simply the name a Teradata client uses to connect to a Teradata server (analogous to the host/port/instance-name mapping files used by other RDBMS systems). Teradata also treats databases and users somewhat synonymously: a user has a user ID, a password, and space to store tables, and the user ID is what you use to log in to the instance. Teradata AMPs are access module processors; think of them as Teradata's parallel database engines. Although they are strictly software ("virtual processors" in Teradata terminology), Teradata often uses the terms AMP and hardware node interchangeably because an AMP previously was a piece of hardware.
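For example, a client tool such as BTEQ logs on by naming the TDPID, a user, and the user's password. The sketch below simply reuses the demo1099 TDPID and infatest user that appear in the examples later in this document:

.LOGON demo1099/infatest,infatest;
SELECT DATE, DATABASE;
.LOGOFF;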

Tools
These are the main tools you may find at a Teradata site. There are others, but these are the most common.

BTEQ (pronounced BEE-teek): The command line utility for Teradata, similar to Oracle's SQL*Plus. Teradata SQL, for the most part, is standard SQL.

Teradata SQL Assistant/Queryman: The GUI SQL client for Teradata. Older versions were called Queryman, newer versions Teradata SQL Assistant; they are basically the same tool. If you're going to be doing a lot with Teradata, it is probably worth getting access to this tool so you don't have to do so much command line typing.

WinDDI: Windows Data Dictionary Interface. This is a DBA-type client tool used to perform database administration tasks. It is nice if you can get access to it, but experience shows that not every client will allow this.

MultiLoad: A sophisticated bulk load utility and the primary method PowerCenter uses to load mass quantities of data into Teradata. Unlike bulk load utilities from other vendors, MultiLoad supports insert, update, upsert and delete operations. You can also use variables and embed conditional logic into MultiLoad scripts. It is very fast (millions of rows in a few minutes). It is also a resource hog.

TPump: TPump is something of a "MultiLoad lite". It also supports inserts, updates, upserts and deletes. It is not as fast as MultiLoad, but it doesn't use as many resources, nor does it require table-level locks. It is often used to trickle-load a table. The syntax of a TPump script is very similar to MultiLoad.

FastLoad: As the name suggests, this is a very fast utility to load data into Teradata; it is the fastest method available. However, there is one major restriction: the target table must be empty (yes, you read that correctly).

FastExport: As the name suggests, this is a very fast utility to unload data from Teradata. Teradata's ODBC driver has been optimized for query support, so it is pretty fast (testing has shown it is as fast as BTEQ), but not as fast as FastExport. FastExport has been supported by PowerCenter since version 7.1.3.

Teradata Warehouse Builder: Teradata Warehouse Builder (TWB) is a single utility that was intended to replace FastLoad, MultiLoad, TPump and FastExport. It was to support a single scripting environment with different modes, where each mode roughly equates to one of the legacy utilities. It also was to support parallel loading (i.e., multiple instances of a TWB client could run and load the same table at the same time, something the legacy loaders cannot do). PowerCenter supports TWB; unfortunately, NCR/Teradata does not. Much to Informatica's dismay, TWB has never been formally released (it never went GA). According to NCR, its release was delayed primarily because of issues with the mainframe version, and this delay has lasted for over two years. If you find a prospect willing to use TWB, please do. Its ability to support parallel load clients makes some things quite a bit easier.


Client Configuration Basics for Teradata


The client configuration is wholly contained in the hosts file (/etc/hosts on UNIX or winnt\system32\drivers\etc\hosts on Windows). Informatica does not run on NCR UNIX MP-RAS, so you should not have to deal with the server side. Teradata uses a naming nomenclature in the hosts file: the name of the Teradata instance (that is, the TDPID) is indicated by the letters and numbers that precede the string cop1 in a hosts file entry. For example:

127.0.0.1       localhost   demo1099cop1
192.168.80.113  curly       pcop1

This tells Teradata that when a client tool references the instance demo1099, it should direct requests to localhost (IP address 127.0.0.1); when a client tool references instance p, it is located on the server curly (IP address 192.168.80.113). There is no tie here to any kind of database server-specific information; the TDPID is used strictly to define the name a client uses to connect to a server. Teradata does not care what the name is. It simply takes the name you specify, looks in the hosts file to map <name>cop1 (or cop2, and so on) to an IP address, and then attempts to establish a connection with Teradata at that IP address. Sometimes you'll see multiple entries in a hosts file with similar TDPIDs:

127.0.0.1       localhost   demo1099cop1
192.168.80.113  curly_1     pcop1
192.168.80.114  curly_2     pcop2
192.168.80.115  curly_3     pcop3
192.168.80.116  curly_4     pcop4

This setup allows load balancing of clients among multiple Teradata nodes. Most Teradata systems have many nodes, and each node has its own IP address. Without the multiple hosts file entries, every client would connect to the same node, and eventually that node would be doing more than its fair share of client processing. With multiple hosts file entries, if it takes too long for the node specified with the cop1 suffix (that is, curly_1) to respond to the client request to connect to p, the client will automatically attempt to connect to the node with the cop2 suffix (that is, curly_2).

Informatica/Teradata Touch Points


Informatica PowerCenter 7.1.x/8.x accesses the Teradata Database through various Teradata tools. Each touch point is described below, along with how it is configured within PowerCenter.

ODBC
Teradata provides 32-bit and 64-bit ODBC drivers for Windows and UNIX platforms. If possible, use the ODBC driver from Teradata's TTU 8.1 release (or above) of its client software because this version supports array reads. Tests have shown these newer drivers (3.05) can be 20 to 30 percent faster than the old drivers (3.01). The 64-bit drivers will also provide better performance than their 32-bit counterparts, but note that the driver's bit mode needs to be compatible with PowerCenter's bit mode. The TTU 8.1 release uses ODBC v3.0.5; TTU 8.2 uses ODBC v3.0.6, which is not yet supported by PowerCenter 8.1.1. Teradata's ODBC driver is on a performance par with Teradata's SQL CLI; in fact, ODBC is Teradata's recommended SQL interface for its partners. When using the traditional ETL approach, use ODBC to write to Teradata only when you're writing very small data sets (and even then, you should probably use TPump, described later), because Teradata's ODBC driver is optimized for query access, not for writing data. For extraction and large lookups, it is better to use FastExport than ODBC. PowerCenter Designer uses Teradata's ODBC driver to import source and target table definitions.


If you have performance problems, Pushdown Optimization is suggested. Detailed design methodologies, the EL-T approach, and configuring and designing mappings using Pushdown Optimization are described in the following sections. If Pushdown Optimization is unavailable, use the native Teradata utilities. Note: ODBC is not a good choice for sourcing and lookups; FastExport should be used for large sources and lookups instead. The most efficient method to create a lookup on a very large data set is to use FastExport to create a sorted file that can then be used as the source for a flat file lookup. In version 8.5, you can also use the pipeline lookup feature to do this automatically; that is, the pipeline lookup feature allows a source qualifier (any source qualifier, even one that uses FastExport behind the scenes) to be tied to a lookup.
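As a sketch of the FastExport-based lookup idea (the table and column names are borrowed from the sample control file later in this document and are purely illustrative), the SELECT submitted through FastExport orders the rows on the lookup key so the resulting flat file is already sorted for the flat file lookup; a full FastExport script skeleton appears in the Teradata FastExport section below:

SELECT CUSTOMER_KEY, CUSTOMER_ID, COMPANY
FROM   infatest.TD_CUSTOMERS
ORDER BY CUSTOMER_KEY;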


ODBC Windows

ODBC UNIX

When the PowerCenter server is running on UNIX, ODBC is required to read from the Teradata Database (for both sourcing and lookups). As with all UNIX ODBC drivers, the key to configuring the UNIX ODBC driver is adding the appropriate entries to the .odbc.ini file. To configure the .odbc.ini file correctly, there must be an entry under [ODBC Data Sources] that points to the Teradata ODBC driver shared library (tdata.sl on HP-UX, the standard shared library extension on other flavors of UNIX). The following example shows the required entries from an actual .odbc.ini file (note that the path to the driver may be different on each computer):

[ODBC Data Sources]


dBase=MERANT 3.60 dBase Driver
Oracle8=MERANT 3.60 Oracle 8 Driver
Text=MERANT 3.60 Text Driver
Sybase11=MERANT 3.60 Sybase 11 Driver
Informix=MERANT 3.60 Informix Driver
DB2=MERANT 3.60 DB2 Driver
MS_SQLServer7=MERANT SQLServer driver
TeraTest=tdata.sl

[TeraTest]
Driver=/usr/odbc/drivers/tdata.sl
Description=Teradata Test System
DBCName=148.162.247.34

Similar to the client hosts file set-up, you can specify multiple IP addresses for the DBCName to balance the client load across multiple Teradata nodes. Consult the Teradata administrator for exact details on this (or copy the entries from the PC client's hosts file; see the section Client Configuration Basics for Teradata earlier in this document).

Important note: Make sure that the Merant ODBC path precedes the Teradata ODBC path in the PATH and SHLIB_PATH (or LD_LIBRARY_PATH, and so on) environment variables. This is necessary because both sets of ODBC software use some of the same file names, and PowerCenter should use the Merant files because this software has been certified.

Important note: If possible, use the ODBC driver from Teradata's TTU7 release (or above) of the client software because this version supports array reads. Tests have shown these newer drivers (3.02) can be 20%-30% faster than the old drivers (3.01).
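For example, the environment for the PowerCenter server on HP-UX might be arranged as follows (the install directories shown are illustrative; substitute the actual Merant and Teradata ODBC paths used at the site):

PATH=/opt/merant/odbc/bin:/usr/odbc/bin:$PATH
SHLIB_PATH=/opt/merant/odbc/lib:/usr/odbc/lib:$SHLIB_PATH
export PATH SHLIB_PATH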

Teradata External Loaders


PowerCenter 7.1.2/8.x supports four different Teradata external loaders: TPump, FastLoad, MultiLoad, and Teradata Warehouse Builder. The actual Teradata loader executables (tpump, mload, fastload, tbuild) must be accessible by the PowerCenter server, generally via the path statement. Note: please consult the Product Availability Matrix. All of the Teradata loader connections require a value for the TDPID attribute; refer to the first section of this document to understand how to enter the value correctly. All of these loaders require a load file, which can be configured to be a stream/pipe and is auto-generated by PowerCenter, and a control file of commands that tells the loader what to do (also auto-generated by PowerCenter).

All of these loaders will also produce a log file, which will be the means to debug the loader if something goes wrong. Because these are external loaders, PowerCenter will only be notified of whether it ran successfully or not.


By default, the input file, control file, and log file will be created in $PMTargetFileDir of the PowerCenter server executing the workflow.


You can use any of these loaders by configuring the target in the PowerCenter session to be a File Writer and then choosing the appropriate loader.


The auto-generated control file can be overridden. Click the Pencil icon next to the loader connection name.


Scroll to the bottom of the connection attribute list and click the value next to the Control File Content Override attribute. Then click the Down arrow.


Click the Generate button and change the control file as you wish. The repository stores the changed control file.

Alternate option:
1. Run the session.
2. Modify the control file created by the initial session run.
3. Make the control file read-only.
4. Run the session again. This and subsequent session runs will use the modified control file.
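As a sketch of steps 2 and 3 on UNIX (the control file name shown is hypothetical; edit whichever .ctl file the session actually generated in $PMTargetFileDir):

cd $PMTargetFileDir
vi td_test.out.ctl           # edit the generated control file
chmod a-w td_test.out.ctl    # make it read-only so later runs reuse it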


Most of the loaders also use some combination of internal work, error, and log tables. By default, these will be in the same database as the target table. All of these can now be overridden in the attributes of the connection.

To stage the input flat file to disk (versus the in-memory/pipe option), ensure the Is Staged attribute is checked. If the Is Staged attribute is not checked, the file is piped/streamed to the loader. If you select the non-staged mode for a loader, also set the checkpoint property to 0. This effectively turns off checkpoint processing. Checkpoint processing is used for recovery/restart of Teradata FastLoad and MultiLoad sessions; however, if you are using a named pipe instead of a physical file as input, the recovery/restart mechanism of the loaders does not work. Besides impacting performance (checkpoint processing is not free, and unnecessary overhead should be eliminated where possible), a nonzero checkpoint value will sometimes cause seemingly random errors and session failures when used with named pipe input (as is the case in streaming mode).
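In the generated MultiLoad control file, this corresponds to the CHECKPOINT value in the .BEGIN IMPORT MLOAD clause. A sketch of how the clause might look for a streaming (non-staged) session, with the other values borrowed from the sample control file later in this document:

.BEGIN IMPORT MLOAD
   TABLES infatest.TD_TEST
   ERRLIMIT 1
   CHECKPOINT 0
   TENACITY 10000
   SESSIONS 1
   SLEEP 6 ;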


Partitioned Loading
With PowerCenter, if you set a round-robin partition point on the target definition and set each target instance to be loaded using the same loader connection instance, PowerCenter automatically writes all data to the first partition and starts only one instance of Teradata FastLoad or MultiLoad. You will know you are getting this behavior if you see the following entry in the session log:

MAPPING> DBG_21684 Target [TD_INVENTORY] does not support multiple partitions. All data will be routed to the first partition.

If you do not see this message, then chances are the session fails with the following error:

WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does not support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.


Teradata TPump
Teradata TPump is an external loader that supports inserts, updates, upserts, deletes, and data-driven updates. Multiple TPump loaders can execute simultaneously against the same table because TPump doesn't use many resources or require table-level locks. It is often used to trickle-load a table. As stated earlier, Teradata TPump provides a faster method to update a table than ODBC, but it is not as fast as the other loaders.


Teradata MultiLoad
This sophisticated bulk load utility is the primary method PowerCenter uses to load or update large quantities of data in a Teradata Warehouse. Unlike bulk load utilities from other vendors, Teradata MultiLoad supports inserts, updates, upserts, deletes, and data-driven operations in PowerCenter. You can also use variables and embed conditional logic into Teradata MultiLoad scripts. It is very fast (millions of rows in a few minutes), but it can be resource-intensive and will take a table lock.

Cleaning up after a failed MultiLoad session: Teradata MultiLoad supports sophisticated error recovery; that is, it allows load jobs to be restarted without having to redo all of the prior work. However, for the types of problems normally encountered during a proof of concept (loading null values into a column that does not support nulls, incorrectly formatted date columns), the error recovery mechanisms tend to get in the way. Please refer to the Teradata MultiLoad manual for the details of MultiLoad's error recovery (available at www.Teradata.com/manuals). To learn how to work around the recovery mechanisms and restart a failed MultiLoad script from scratch, read this section.


Teradata MultiLoad puts the target table into the MultiLoad state. Upon successful completion, the target table is returned to the normal (non-MultiLoad) state. Therefore, when a MultiLoad session fails for any reason, the table is left in the MultiLoad state, and you cannot simply rerun the same MultiLoad session; MultiLoad will report an error. In addition, MultiLoad also queries the target table's MultiLoad log table to see if it contains any errors. If a MultiLoad log table exists for the target table, you will also not be able to rerun your MultiLoad job. To recover from a failed MultiLoad, release the target table from the MultiLoad state and also drop the MultiLoad log table. You can do this using BTEQ or Teradata QueryMan to issue the following commands:

drop table mldlog_<table name>;
release mload <table name>;

Note: The drop table command assumes that you're recovering from a MultiLoad script generated by PowerCenter (PowerCenter always names the MultiLoad log table mldlog_<table name>). If you're working with a hand-coded MultiLoad script, the name of the MultiLoad log table could be anything. Here is the actual text from a BTEQ session that cleans up a failed load to the table td_test owned by the user infatest:

BTEQ -- Enter your DBC/SQL request or BTEQ command:
drop table infatest.mldlog_td_test;

drop table infatest.mldlog_td_test;
*** Table has been dropped.
*** Total elapsed time was 1 second.

BTEQ -- Enter your DBC/SQL request or BTEQ command:
release mload infatest.td_test;

release mload infatest.td_test;
*** Mload has been released.
*** Total elapsed time was 1 second.

Using One Instance of Teradata MultiLoad to Load Multiple Tables

MultiLoad is a big consumer of resources on a Teradata system. Some systems will have hard limits on the number of concurrent MultiLoad sessions allowed. By default, PowerCenter will start an instance of MultiLoad for every target file. Sometimes this is illegal (if the multiple instances target the same table); other times it is just expensive. Therefore, a prospect may ask that PowerCenter use a single instance of MultiLoad to load multiple tables (or to load both inserts and updates into the same target table). To make this happen, you must heavily edit the generated MultiLoad script file. Note: This is not an issue with Teradata TPump because TPump is not as resource-intensive as MultiLoad (and multiple concurrent instances of TPump can target the same table).

Here's the workaround:
1) Use a dummy session (i.e., set test rows to 1 and target a test database) to generate MultiLoad control files for each of the targets.
2) Merge the multiple control files (one per target table) into a single control file (one for all target tables).
3) Configure the session to call MultiLoad from a post-session script using the control file created in Step 2. Integrated support cannot be used because each input file is processed sequentially, and this causes problems when combined with PowerCenter's integrated named pipes and streaming.

Details on merging the control files:


1) There is a single log file for each instance of MultiLoad, so you do not have to change or add anything to the LOGFILE statement. However, you might want to change the name of the log table because it may be a log that spans multiple tables.
2) Copy the work and error table delete statements into the common control file.
3) Modify the BEGIN MLOAD statement to specify all the tables that the MultiLoad job will be hitting.
4) Copy the Layout sections into the common control file and give each a unique name. Organize the file such that all the layout sections are grouped together.
5) Copy the DML sections into the common control file and give each a unique name. Organize the file such that all the DML sections are grouped together.
6) Copy the Import statements into the common control file and modify them to reflect the unique names created for the referenced layout and DML sections in Steps 4 and 5. Organize the file such that all the import sections are grouped together.
7) Run chmod -w on the newly minted control file so PowerCenter doesn't overwrite it, or, better yet, name it something different so PowerCenter cannot overwrite it.
8) Remember, a single instance of Teradata MultiLoad can target five tables at most, so don't combine more than five target files into a common file.

Here's an example of a control file merged from two default control files:

.DATEFORM ANSIDATE;
.LOGON demo1099/infatest,infatest;
.LOGTABLE infatest.mldlog_TD_TEST;
DROP TABLE infatest.UV_TD_TEST ;
DROP TABLE infatest.WT_TD_TEST ;
DROP TABLE infatest.ET_TD_TEST ;
DROP TABLE infatest.UV_TD_CUSTOMERS ;
DROP TABLE infatest.WT_TD_CUSTOMERS ;
DROP TABLE infatest.ET_TD_CUSTOMERS ;
.ROUTE MESSAGES WITH ECHO TO FILE c:\LOGS\TgtFiles\td_test.out.ldrlog ;
.BEGIN IMPORT MLOAD
   TABLES infatest.TD_TEST, infatest.TD_CUSTOMERS
   ERRLIMIT 1
   CHECKPOINT 10000
   TENACITY 10000
   SESSIONS 1
   SLEEP 6 ;

/* Begin Layout Section */
.Layout InputFileLayout1;
.Field  CUST_KEY        1  CHAR( 12) NULLIF CUST_KEY = '*' ;
.Field  CUST_NAME      13  CHAR( 20) NULLIF CUST_NAME = '*' ;
.Field  CUST_DATE      33  CHAR( 10) NULLIF CUST_DATE = '*' ;
.Field  CUST_DATEmm    33  CHAR( 2) ;
.Field  CUST_DATEdd    36  CHAR( 2) ;
.Field  CUST_DATEyyyy  39  CHAR( 4) ;
.Field  CUST_DATEtd    CUST_DATEyyyy||'/'||CUST_DATEmm||'/'||CUST_DATEdd NULLIF CUST_DATE = '*' ;
.Filler EOL_PAD        43  CHAR( 2) ;

.Layout InputFileLayout2;
.Field  CUSTOMER_KEY     1  CHAR( 12) ;
.Field  CUSTOMER_ID     13  CHAR( 12) ;
.Field  COMPANY         25  CHAR( 50) NULLIF COMPANY = '*' ;
.Field  FIRST_NAME      75  CHAR( 30) NULLIF FIRST_NAME = '*' ;
.Field  LAST_NAME      105  CHAR( 30) NULLIF LAST_NAME = '*' ;
.Field  ADDRESS1       135  CHAR( 72) NULLIF ADDRESS1 = '*' ;
.Field  ADDRESS2       207  CHAR( 72) NULLIF ADDRESS2 = '*' ;
.Field  CITY           279  CHAR( 30) NULLIF CITY = '*' ;
.Field  STATE          309  CHAR( 2) NULLIF STATE = '*' ;
.Field  POSTAL_CODE    311  CHAR( 10) NULLIF POSTAL_CODE = '*' ;
.Field  PHONE          321  CHAR( 30) NULLIF PHONE = '*' ;
.Field  EMAIL          351  CHAR( 30) NULLIF EMAIL = '*' ;
.Field  REC_STATUS     381  CHAR( 1) NULLIF REC_STATUS = '*' ;
.Filler EOL_PAD        382  CHAR( 2) ;
/* End Layout Section */

/* Begin DML Section */
.DML Label tagDML1;
INSERT INTO infatest.TD_TEST
( CUST_KEY ,
  CUST_NAME ,
  CUST_DATE )
VALUES
( :CUST_KEY ,
  :CUST_NAME ,
  :CUST_DATEtd ) ;

.DML Label tagDML2;
INSERT INTO infatest.TD_CUSTOMERS
( CUSTOMER_KEY ,
  CUSTOMER_ID ,
  COMPANY ,
  FIRST_NAME ,
  LAST_NAME ,
  ADDRESS1 ,
  ADDRESS2 ,
  CITY ,
  STATE ,
  POSTAL_CODE ,
  PHONE ,
  EMAIL ,
  REC_STATUS )
VALUES
( :CUSTOMER_KEY ,
  :CUSTOMER_ID ,
  :COMPANY ,
  :FIRST_NAME ,
  :LAST_NAME ,
  :ADDRESS1 ,
  :ADDRESS2 ,
  :CITY ,
  :STATE ,
  :POSTAL_CODE ,
  :PHONE ,
  :EMAIL ,
  :REC_STATUS ) ;
/* End DML Section */

/* Begin Import Section */
.Import Infile c:\LOGS\TgtFiles\td_test.out
   Layout InputFileLayout1
   Format Unformat
   Apply tagDML1 ;
.Import Infile c:\LOGS\TgtFiles\td_customers.out
   Layout InputFileLayout2
   Format Unformat
   Apply tagDML2 ;
/* End Import Section */

.END MLOAD;
.LOGOFF;


Multiple Workflows that MultiLoad to The Same Table

Because Teradata MultiLoad puts a lock on the table, all MultiLoad sessions that write to it must handle wait events so they don't try to access the table simultaneously. Also, any log files should be given unique names for the same reason.
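One way to handle the waiting, shown as a sketch: the TENACITY and SLEEP settings in the .BEGIN IMPORT MLOAD clause (both appear in the sample control file above) tell MultiLoad how many hours to keep retrying for its load slot and how many minutes to wait between attempts, so concurrent workflows queue up instead of failing immediately. The values below are illustrative:

.BEGIN IMPORT MLOAD
   TABLES infatest.TD_TEST
   TENACITY 4       /* keep retrying for up to 4 hours       */
   SLEEP 6          /* wait 6 minutes between logon attempts */
   SESSIONS 1 ;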

Teradata FastLoad
As the name suggests, this utility is the fastest method to load data into a Teradata Warehouse. However, there is one major restriction: the target table must be empty.

FAQ: Why are there three different loaders for Teradata? Which loader should I use?
FastLoad is the fastest loader, but it only works with empty tables with no secondary indexes. Use FastLoad for a high-volume initial load, or for high-volume truncate and reload operations. FastLoad can only insert data.


MultiLoad can insert, update, delete, and upsert into Teradata. An upsert is essentially an update-else-insert performed at the database level. Note that this does not require specifying Update else Insert in PowerCenter, or the use of an Update Strategy transformation; you specify Upsert as the Load Mode in the connection properties when defining the MultiLoad external loader connection in the PowerCenter Workflow Manager. Use MultiLoad for large-volume incremental loads. Both FastLoad and MultiLoad work at the data block level; in other words, these loaders are much faster than standard DML within Teradata. They both acquire table-level locks, which means they are only appropriate for off-line data loading. MultiLoad first writes the data into temporary tables in Teradata, and then it updates the data blocks directly. All changes to a physical data block are made in a single operation. TPump is designed to refresh the data warehouse on-line or in real time. TPump is an alternative to MultiLoad for relatively low-volume, on-line data loads. It does not incur the overhead of writing to temporary tables, but it does potentially incur the expense of changing the same physical data block multiple times. TPump is not as fast as MultiLoad for large-volume loads, but TPump acquires row-hash locks on the table rather than a table-level lock. TPump also provides a mechanism to limit resource consumption by controlling the rate at which statements are sent to the RDBMS. Other users and applications can access data in the table being loaded while TPump is running.


PowerCenter 7.1.x/8.x Product Availability Matrix for Teradata (the database is Teradata in all rows)

Product                          OS             Teradata Versions                         Src  Tgt  Rep  Status
PowerCenter 8.1.1 SP1 (Ltd PAM)  Windows, UNIX  v2R6.1, v2R6, v2R5, v2R5.1                x    x         Supported
PowerCenter 8.0.0                Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1  x    x         Supported
PowerCenter 7.1.5                Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1  x    x    x    Supported
PowerCenter 7.1.4                Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1  x    x    x    Supported
PowerCenter 7.1.3                Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1  x    x    x    Supported
PowerCenter 7.1.2                Windows, UNIX  v2R6, v2R5.1, v2R5, v2R4, v2R4.1          x    x    x    Supported
PowerCenter 7.1.1                Windows, UNIX  v2R5.1, v2R5, v2R4, v2R4.1                x    x    x    Supported

Teradata FastExport
Teradata FastExport is supported with PowerCenter 7.1.3 and later versions; prior versions of PowerCenter do not support it. FastExport is a utility that uses multiple Teradata sessions to quickly export large amounts of data from a Teradata database. You can create a PowerCenter session that uses FastExport to read Teradata sources. To use FastExport with PowerCenter, you need to register the FastExport plug-in with PowerCenter. The plug-in includes a FastExport Teradata connection and a FastExport reader that you can select for a session. To register the FastExport plug-in in PowerCenter 8.1.1, follow the instructions below.

HOW TO: Register the FastExport plug-in using the Admin Console in PowerCenter 8.1.1:
1) Run the Repository Service in exclusive mode.
2) In the Navigator, select the Repository Service to which you want to add the plug-in.
3) Click the Plug-ins tab.
4) Click the link to register a Repository Service plug-in.


5) On the Register Plugin for <Repository Service> page, click the Browse button to locate the plug-in file.
6) If the plug-in was registered previously and you want to overwrite the registration, select the check box to update the existing plug-in registration. For example, you might select this option when you upgrade a plug-in to the latest version.
7) Enter your repository user name and password and click OK.
8) The Repository Service registers the plug-in with the repository. The results of the registration operation appear in the activity log.
9) Run the Repository Service in normal mode.

HOW TO: Enable the Teradata FastExport option in an existing repository using command line

To enable the FastExport application connection for a repository, it is necessary to register the plug-in file for Teradata FastExport ("pmtdfexp.xml"). This plug-in file is located in the "native" sub-directory of the Repository Server installation.

* Windows: the default directory is C:\Program Files\Informatica PowerCenter 7.1.3\RepositoryServer\bin\native
* UNIX: an example would be /local/repserver/native

Use the "pmrepagent" command located in the Repository Server installation directory.

Syntax:
* Windows:
pmrepagent registerplugin -r reponame -n Administrator -x Administrator -t dbtype -u dbuser -p dbpwd -c connect_string -i .\native\pmtdfexp.xml -N
* UNIX:
pmrepagent registerplugin -r reponame -n Administrator -x Administrator -t dbtype -u dbuser -p dbpwd -c connect_string -i ./native/pmtdfexp.xml -N

Example:
* Windows:
cd C:\Program Files\Informatica PowerCenter 7.1.3\RepositoryServer\bin
pmrepagent registerplugin -r PC71X -n Administrator -x Administrator -t oracle -u PC71X -p PCPASS -c orarepa2.informatica.com -i .\native\pmtdfexp.xml -N
* UNIX:
$ cd /local/repserver
$ pmrepagent registerplugin -r PC71X -n Administrator -x Administrator -t oracle -u PC71X -p PCPASS -c orarepa2.informatica.com -i ./native/pmtdfexp.xml -N


To use FastExport, create a mapping with a Teradata source database. In the session, use the FastExport reader instead of the Relational reader, and use a FastExport connection to the Teradata tables you want to export. FastExport uses a control file that defines what to export. When a session starts, the Integration Service creates the control file from the FastExport connection attributes. If you create an SQL override for the Teradata tables, the Integration Service uses the SQL to generate the control file. You can override the control file for a session by defining a control file in the session properties. The Integration Service writes FastExport messages in the session log and information about FastExport performance in the FastExport log. PowerCenter saves the FastExport log in the folder defined by the Temporary File Name session attribute. The default extension for the FastExport log is .log.

HOW TO: Use FastExport in a session:
1) Create a FastExport connection in the Workflow Manager and configure the connection attributes.
2) Open the session and change the Reader property from Relational Reader to Teradata FastExport Reader.
3) Change the connection type and select a FastExport connection for the session.
4) Optionally, create a FastExport control file in a text editor and save it in the repository.

HOW TO: Create a FastExport connection:
1) Click Connections > Application in the Workflow Manager. The Connection Browser dialog box appears.
2) Click New.
3) Select a Teradata FastExport connection and click OK.


4) Enter a name for the FastExport connection.

At run time, PowerCenter starts FastExport and streams the data (in FastExport format) to a named pipe or file; PowerCenter then reads from the named pipe/file. The FastExport file format is used because it is more efficient than converting everything to ASCII characters (within a FastExport-formatted file, numbers are stored as binary data).

Important FastExport session attributes:
Is Staged - If selected, FastExport writes data to a staged file.
Fractional Seconds - Precision for fractional seconds following the decimal point in a timestamp. The range is 0 to 6; be very careful, as this has to match the table definition.
Control File Override - The control file override attribute. There is a known issue with the control file override: currently the overrides in the control file are not persistent.

If FastExport support is not available in your PowerCenter version (prior to 7.1.3), the following options are available:
* Version 3.02 of Teradata's ODBC driver supports array reads (SQLExtendedFetch).
* Write a FastExport script and invoke it as a pre-session command task (a sketch follows below). The FastExport task could also be written to a pipe.
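A minimal sketch of such a hand-written FastExport script, reusing the demo TDPID, user, and table that appear elsewhere in this document (the log table and output file names are hypothetical; adjust the logon, SELECT, and output file for your environment):

.LOGTABLE infatest.fexplog_td_customers;
.LOGON demo1099/infatest,infatest;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE /tmp/td_customers.dat MODE RECORD FORMAT TEXT;
SELECT CUSTOMER_KEY, CUSTOMER_ID, COMPANY
FROM   infatest.TD_CUSTOMERS
ORDER BY CUSTOMER_KEY;
.END EXPORT;
.LOGOFF;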

Additional Information:
* FastExport is an extract utility (the extract counterpart to the MultiLoad, TPump, and FastLoad loaders).
* FastExport can be used in streaming mode, which avoids the need to stage the file.
* FastExport is only available for sources, not for lookups (to do a lookup-type operation, use a Joiner; refer to article 10238).

HOW TO: use encryption with Fast Export
To encrypt data with FastExport, enable the DataEncryption attribute in the Teradata FastExport connection. The DataEncryption attribute is disabled by default. Teradata FastExport is a feature available in PowerCenter 7.1.3 and later releases. A repository plug-in comes with the 7.1.3 install, and when you create a new repository using the PowerCenter 7.1.3 Repository Server it automatically registers the plug-in. However, a repository created with a previous release of PowerCenter will not have this plug-in registered.

WHAT command is used behind the scenes while running FastExport?
When running a session with a Teradata source that extracts data using FastExport, PowerCenter runs the fexp command as a child process and opens a named pipe to retrieve data from the Teradata table. Either of the following FastExport commands can be used with the .ctl file generated by PowerCenter:
fexp -c ".RUN FILE <control file name>;"
fexp -r ".RUN FILE <control file name>;"

Some known limitations with PowerCenter 7.1.3:
1. Teradata FastExport (CR:88001) - Teradata FastExport does not support SQL override at the session level. If you paste the SQL into the Teradata FastExport properties on the Mappings tab of the session properties, it does not override. Either SQL override should be allowed at the session level, or the property should be disabled.
2. Teradata FastExport (CR:88240) - Only ANSI time is supported for FastExport at this time. Otherwise you get errors such as:
READER_1_1_1> [PMTDFEXP_EN_305416] [ERROR] Received unexpected data.
READER_1_1_1> SDKS_38200 Partition-level [SQ_CERT_ALL_DATATYPES_SRC]: Plug-in #305400 failed in run().
3. Teradata FastExport (CR:88367) - Teradata FastExport gives a different DataEncryption message on TTU v7. TTU v7 doesn't have the DATAENCRYPTION option, so customers should use TTU v8.


Teradata Parallel Transporter (TPT)

Figure: TPT architecture (figure provided courtesy of NCR Teradata)

Teradata Parallel Transporter (TPT) is a single utility that is intended to replace Teradata FastLoad, MultiLoad, TPump, and FastExport. It supports a single scripting environment with different modes, with each mode roughly equating to one of the legacy utilities. It also supports parallel loading (i.e., multiple instances of a TPT client can run and load the same table at the same time, something the legacy loaders cannot do). Teradata Parallel Transporter is used to read large amounts of data from Teradata and write large amounts of data to Teradata in a massively parallel fashion. PowerExchange Connect for Teradata Parallel Transporter provides integration between PowerCenter and Teradata for data extraction and loading. Using Teradata Parallel Transporter in Informatica PowerCenter, sessions can read Teradata sources and load Teradata targets. You can create a mapping with a Teradata source or target (or use an existing mapping created using a Teradata ODBC connection) and then use a Teradata Parallel Transporter connection to connect to the Teradata tables to be loaded or exported in a session. PowerCenter Connect for Teradata PT then extracts or loads data using one of the following methods:
Export: Extracts data from Teradata.
Load: Used for initial bulk table loading into the Teradata database.
Update: Used to update, insert, upsert, and delete data in the Teradata database.
Stream: Used to update, insert, upsert, and delete data (continuous data load) in the Teradata database.


Traditionally, to do the above you would have to use FastLoad, MultiLoad, TPump, FastExport, ODBC, or a combination of two or more loading mechanisms. A key thing to note is that no control files are generated under the covers, so there is no need to override them or store passwords in a file. Also, the metadata lineage is completely preserved within PowerCenter. Performance of TPT is reportedly about 20% faster compared to the traditional loading or extraction mechanisms.

RELEASE INFORMATION
Supported since Informatica PowerCenter version 8.1.1.0.2, which was released in July 2007. Supports Teradata TPT API 8.2.
Prerequisites (on the machine where the PowerCenter Integration Service is running):
Teradata Parallel Transporter API 8.2
Teradata CLIv2 4.8.2
Shared ICU libraries for Teradata 01.01.02.xx
Teradata GSS Client nt-i386 06.02.00.00
A separate license is required from Teradata for the TPT API.


Connection attributes for Teradata Parallel Transporter (TPT):

TDPID (Required) - Name/IP address of the Teradata server host
Database Name (Optional) - Teradata database name
Tenacity (Optional) - Number of hours the driver attempts to log on
Max Session (Optional) - Maximum number of sessions to log on
Block Size (Optional) - Block size in bytes used when returning data to the client
Sleep (Optional) - Number of minutes the driver pauses before attempting to log on
Data Encryption (Optional) - Activates full security encryption of SQL requests, responses and data
System Operator (Required) - Data loading operator
Log Database (Optional) - Name of the log database
Log Table Name (Required) - Name of the restart log table for restart information
Error Database (Optional) - Name of the error database
Error Table Name 1 (Optional) - Name of the first error table
Error Table Name 2 (Optional) - Name of the second error table
Drop Error Tables (N/A) - Reserved for future use

Known Issues in 8.1.1 SP4

CR 130060: Session processes a different number of rows than configured for test load. When you enable test load in the session properties, the total number of rows processed by the session might differ from the number of rows you configure for the test load in the session properties.

CR 130061: Teradata PT API 8.2 UPDATE and STREAM system operators are not supported on UNIX platforms. TPT API 8.2 supports only the LOAD and EXPORT system operators in ASCII and UNICODE mode on UNIX platforms. The UPDATE system operator fails on multi-AMP instances, and the STREAM system operator cannot load UTF-8 data.

CR 130062: Teradata PT API 8.2 might not return correct row statistics in the session load summary. Teradata PT API 8.2 might not include the correct number of affected and rejected rows for the update strategy, including insert, update, and delete operations. Workaround: If a session fails, use the session log and the error and log tables for more information about errors that occurred during the session.

CR 130064: Cannot insert data using the LOAD system operator for multiple pass-through partitions. The LOAD system operator requires the target table to be empty for loading data. If any partition establishes a connection with Teradata and starts inserting the data, Teradata locks the target table. As a result, other partitions cannot establish connections with the target table.

CR 130065: Cannot insert, update, or delete data using the UPDATE system operator for multiple pass-through partitions.


The UPDATE system operator cannot establish a connection with a Teradata target table for a partition when the target table is already being loaded by another partition. Using the UPDATE operator with multiple pass-through partitions might cause inconsistent results.

CR 170071: Decimal data type behaves incorrectly for high-precision data when the Enable High Precision session property is enabled. For data with precision greater than 10, the last digit of a decimal number is not the same as the last digit of the source data when you enable the Enable High Precision session property. Workaround: Disable the Enable High Precision session property.

ETL Vs EL-T Design Paradigm (PushDown Optimization)


Informatica PowerCenter embeds a powerful engine with a built-in memory management system and smart algorithms for performing transformation operations such as aggregation, sorting, joining, and lookups. This is typically referred to as an ETL architecture, where Extract, Transform, and Load are performed in that order: data is extracted from the data source to the PowerCenter engine (which can be on the same machine as the source or on a separate machine), where all the transformations are applied, and the result is then pushed to the target. In this scenario, because data is transferred, the network has to be fast and tuned effectively, and the hardware on which PowerCenter runs should be a powerful machine with high processing power and plenty of memory. EL-T is a design/runtime paradigm that is becoming popular with the advent of higher-performing RDBMS systems, whether DSS or OLTP. Teradata in particular runs on a well-tuned operating system and well-tuned hardware, so the EL-T paradigm tries to maximize the benefits of this system by pushing as much transformation logic as possible onto the Teradata box. The EL-T design paradigm can be achieved through the Pushdown Optimization option provided in Informatica PowerCenter 8.1.

Maximizing Performance using Pushdown Optimization


You can push transformation logic to the source or target database using pushdown optimization. The amount of work you can push to the database depends on the pushdown optimization configuration, the transformation logic, and the mapping and session configuration. When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and writes one or more SQL statements based on the mapping transformation logic. The Integration Service analyzes the transformation logic, mapping, and session configuration to determine the transformation logic it can push to the database. At run time, the Integration Service executes any SQL statement generated against the source or target tables, and it processes any transformation logic that it cannot push to the database. Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to Pushdown Optimization.

Figure: example mapping - Teradata_Source (Teradata) -> SQ_TD_SRC -> FILTRANS -> Teradata_Target (Teradata)


The mapping contains a Filter transformation that filters out all items except those with an ID greater than 1005. The Integration Service can push the transformation logic to the database, and it generates the following SQL statement to process the transformation logic:

INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER)
FROM ITEMS
WHERE (ITEMS.ITEM_ID > 1005)

The Integration Service generates an INSERT SELECT statement to obtain and insert the ID, NAME, and DESCRIPTION columns from the source table, and it filters the data using a WHERE clause. The Integration Service does not extract any data from the database during this process.

Running Pushdown Optimization Sessions


When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and transformations to determine the transformation logic it can push to the database. If the mapping contains a mapplet, the Integration Service expands the mapplet and treats the transformations in the mapplet as part of the parent mapping. You can configure pushdown optimization in the following ways:

Source-side pushdown optimization: The Integration Service pushes as much transformation logic as possible to the source database.
Target-side pushdown optimization: The Integration Service pushes as much transformation logic as possible to the target database.
Full pushdown optimization: The Integration Service pushes as much transformation logic as possible to both the source and target databases. If you configure a session for full pushdown optimization and the Integration Service cannot push all the transformation logic to the database, it performs partial pushdown optimization instead.

Running Source-Side Pushdown Optimization Sessions


When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the database. The Integration Service generates a SELECT statement based on the transformation logic for each transformation it can push to the database. When you run the session, the Integration Service pushes all transformation logic that is valid to push to the database by executing the generated SQL statement. Then, it reads the results of this SQL statement and continues to run the session. If you run a session that contains an SQL override, the Integration Service generates a view based on the SQL override. It then generates a SELECT statement and runs the SELECT statement against this view. When the session completes, the Integration Service drops the view from the database.
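The following is a minimal sketch of that view-based sequence, reusing the ITEMS example from above; the override text and the view name PM_V_ITEMS_OVR are illustrative only (the Integration Service assigns its own PM_V-prefixed name and drops the view for you when the session completes):

CREATE VIEW PM_V_ITEMS_OVR AS
SELECT ITEM_ID, ITEM_NAME, ITEM_DESC, PRICE
FROM ITEMS
WHERE ITEM_ID > 1005;                 /* the Source Qualifier SQL override */

SELECT PM_V_ITEMS_OVR.ITEM_ID, PM_V_ITEMS_OVR.ITEM_NAME, PM_V_ITEMS_OVR.ITEM_DESC
FROM PM_V_ITEMS_OVR;                  /* the generated SELECT runs against the view */

DROP VIEW PM_V_ITEMS_OVR;             /* the view is dropped when the session completes */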

Running Target-Side Pushdown Optimization Sessions


When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the database, starting with the first transformation in the pipeline it can push to the database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the target database. Then, it executes the generated SQL.

Running Full Pushdown Optimization Sessions


To use full pushdown optimization, the source and target must be on the same database. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target. It generates SQL statements that are executed against the source and target database based on the transformation logic it can push to the database. If the session contains an SQL override, the Integration Service generates a view and runs a SELECT statement against this view. When you run a session for full pushdown optimization, the database must run a long transaction if the session contains a large quantity of data. Consider the following database performance issues when you generate a long transaction:

A long transaction uses more database resources.

A long transaction locks the database for longer periods of time, and thereby reduces the database concurrency and increases the likelihood of deadlock.

A long transaction can increase the likelihood that an unexpected event may occur.

Integration Service Behavior with Full Optimization


When you configure a session for full optimization, the Integration Service might determine that it can push all of the transformation logic to the database. When it can push all transformation logic to the database, it generates an INSERT SELECT statement that is run on the database. The statement incorporates transformation logic from all the transformations in the mapping. When you configure a session for full optimization, the Integration Service might determine that it can push only part of the transformation logic to the database. When it can push part of the transformation logic to the database, the Integration Service pushes as much transformation logic to the source and target databases as possible. It then processes the remaining transformation logic. For example, a mapping contains the following transformations:

[Figure: example mapping with Source Qualifier, Aggregator, Rank, and Expression transformations]

The Rank transformation cannot be pushed to the database. If you configure the session for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source. It pushes the Expression transformation and target to the target database, and it processes the Rank transformation. The Integration Service does not fail the session if it can push only part of the transformation logic to the database.

Known Issues with Teradata and PowerCenter 8.1.1


You may encounter the following problems using ODBC drivers with a Teradata database:

Teradata sessions fail if the session requires a conversion to a numeric datatype and the precision is greater than 18.

Teradata sessions fail when you use full pushdown optimization for a session containing a Sorter transformation.

A sort on a distinct key may give inconsistent results if the sort is not case sensitive and one port is a character port.

A session containing an Aggregator transformation may produce different results from PowerCenter if the group by port is a string datatype and it is not case-sensitive.

A session containing a Lookup transformation fails if it is configured for target-side pushdown optimization.

A session that requires type casting fails if the casting is from x to date/time.

A session that contains a date to string conversion fails.

Sample mapping with two partitions

The first key range is 1313 - 3340, and the second key range is 3340 - 9354. The SQL statement merges all the data into the first partition:

INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC
FROM ITEMS
WHERE (ITEMS.ITEM_ID >= 1313) AND (ITEMS.ITEM_ID < 9354)
ORDER BY ITEMS.ITEM_ID

The SQL statement selects items 1313 through 9354, which includes all values in the key range, and merges the data from both partitions into the first partition. The SQL statement for the second partition passes empty data:


INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC) ORDER BY ITEMS.ITEM_ID

Working with SQL Overrides


You can configure the Integration Service to perform an SQL override with pushdown optimization. To perform an SQL override, you configure the session to create a view. When you use an SQL override for a Source Qualifier transformation in a session configured for source or full pushdown optimization with a view, the Integration Service creates a view in the source database based on the override. After it creates the view in the database, the Integration Service generates an SQL query that it can push to the database. The Integration Service runs the SQL query against the view to perform pushdown optimization. Note: To use an SQL override with pushdown optimization, you must configure the session for pushdown optimization with a view.

Running a Query

If the Integration Service did not successfully drop the view, you can run a query against the source database to search for the views generated by the Integration Service. When the Integration Service creates a view, it uses a prefix of PM_V. You can search for views with this prefix to locate the views created during pushdown optimization. Teradata-specific SQL:

SELECT TableName FROM DBC.Tables
WHERE CreatorName = USER
AND TableKind = 'V'
AND TableName LIKE 'PM\_V%' ESCAPE '\'

Rules and Guidelines for SQL Override

Use the following rules and guidelines when you configure pushdown optimization for a session containing an SQL override:
1. Do not use an order by clause in the SQL override.
2. Use ANSI outer join syntax in the SQL override.
3. Do not use a Sequence Generator transformation.
4. If a Source Qualifier transformation is configured for a distinct sort and contains an SQL override, the Integration Service ignores the distinct sort configuration.
5. If the Source Qualifier contains multiple partitions, specify the SQL override for all partitions.
6. If a Source Qualifier transformation contains Informatica outer join syntax in the SQL override, the Integration Service processes the Source Qualifier transformation logic.
7. PowerCenter does not validate the override SQL syntax, so test the SQL override query before you push it to the database.
8. When you create an SQL override, ensure that the SQL syntax is compatible with the source database.
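If the query turns up leftover views, they can be dropped manually. The view name below is a hypothetical example of the kind of name the query returns:

DROP VIEW PM_V4RZWS;    /* repeat for each PM_V% view reported by the query above */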

Configuring Sessions for Pushdown Optimization


You configure a session for pushdown optimization in the session properties. However, you may need to edit the transformation, mapping, or session configuration to push more transformation logic to the database. Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. To configure a session for pushdown optimization:


1. In the Workflow Manager, open the session properties for the session containing transformation logic you want to push to the database.

2. From the Properties tab, select one of the following Pushdown Optimization options: None, To Source, To Source with View, To Target, Full, or Full with View.

3. Click the Mapping tab in the session properties.

4. Click View Pushdown Optimization.

5. The Pushdown Optimizer displays the pushdown groups and the SQL that is generated to perform the transformation logic. It displays messages related to each pushdown group. The Pushdown Optimizer Viewer also displays numbered flags to indicate the transformations in each pushdown group.

6. View the information in the Pushdown Optimizer Viewer to determine if you need to edit the mapping, transformation, or session configuration to push more transformation logic to the database.


Design Techniques
Because the Pushdown Optimization option supports most transformations on the source side, when running a data transfer job, use a simple pass-through mapping to stage the entire source data in a staging area on the target Teradata database, and then use the Full Pushdown Optimization option to run the actual mapping that contains the transformations. This design gains all the benefits that come with the pushdown / EL-T approach using full pushdown; a sketch of the two-step approach follows.
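The statements below are a conceptual sketch only, with hypothetical table names (SRC_ORDERS, STG_ORDERS, DIM_CUST_SALES). Step 1 represents what the pass-through/staging session accomplishes (when the source is not on Teradata, PowerCenter performs this step by reading the source and loading the staging table, for example with FastLoad, rather than with a single SQL statement). Step 2 shows the kind of INSERT ... SELECT that a full-pushdown session can generate once both the staged source and the target live on the same Teradata system:

/* Step 1: stage the source data on the target Teradata database */
INSERT INTO STG_ORDERS (ORDER_ID, CUST_ID, ORDER_AMT)
SELECT ORDER_ID, CUST_ID, ORDER_AMT
FROM SRC_ORDERS;

/* Step 2: full pushdown runs the transformation mapping entirely inside Teradata */
INSERT INTO DIM_CUST_SALES (CUST_ID, TOTAL_AMT)
SELECT CUST_ID, SUM(ORDER_AMT)
FROM STG_ORDERS
WHERE ORDER_AMT > 0
GROUP BY CUST_ID;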

Effectively designing mappings for PushDown Optimization

Attached below is an example of a mapping that needs to be redesigned in order to use Pushdown Optimization.

[Figure: original mapping - Source_Table (Teradata) > Source_Qualifier > Lookup_1 and Lookup_3 in parallel > Filter_1 > Target_Table (Teradata)]

In the above mapping there are two lookups and one filter. Because the staging area is the same as the target area, we can use Pushdown Optimization in order to achieve high performance. However, parallel lookups are not supported within PowerCenter 8.1.1, so the mapping needs to be redesigned. Please see below for the redesigned mapping.


[Figure: redesigned mapping - Source_Table (Teradata) > Source_Qualifier > Lookup_1 > lookup_2 > Filter_1 > Target_Table (Teradata)]

In order to use Pushdown Optimization, the lookups have been serialized, which turns them into sub-queries when the SQL is generated. The complete SQL and the pushdown configuration using the Full Pushdown option are shown below.


Sample SQL generated is shown below.

Group 1:

INSERT INTO Target_Table (ID, ID2, SOME_CAST)
SELECT Source_Table.ID, Source_Table.SOME_CONDITION, CAST(Source_Table.SOME_CAST), Lookup_1.ID, Source_Table.ID
FROM ((Source_Table LEFT OUTER JOIN Lookup_1 ON
(Lookup_1.ID = Source_Table.ID) AND (Source_Table.ID2 = (SELECT Lookup_2.ID2 FROM Lookup_2 Lookup_1 WHERE (Lookup_1.ID = Source_Table.ID2))))
LEFT OUTER JOIN Lookup_1 Lookup_2 ON
(Lookup_1.ID = Source_Table.ID) AND (Source_Table.ID = (SELECT Lookup_2.ID2 FROM Lookup_2 WHERE (Lookup_2.ID2 = Source_Table.ID2))))
WHERE (NOT (Lookup_1.ID1 IS NULL) AND NOT (Lookup_2.ID2 IS NULL))

As you can see from the above example, very complicated SQL can be generated using Pushdown Optimization. One point to remember while configuring sessions is to make sure the right joins are being generated.

FAQs
Uncached Lookup Date/Time limitation
From the 6.1 release notes: When you run a session with a mapping that uses an uncached lookup on a Teradata database, the Informatica Server fails the session if any transformation port in the lookup condition uses a Date/Time datatype. The Informatica Server writes the following Teradata error message to the session log:
[NCR][ODBC Teradata Driver][Teradata RDBMS] Invalid operation on an ANSI Datetime or Interval value.

Workaround: Configure the Lookup transformation to use a lookup cache, or remove the Date/Time port from the lookup condition. There is now a better workaround. From the v7.1.2 release notes: Workaround: Apply the Teradata ODBC patch 3.2.011 or later and remove NoScan=Yes from the odbc.ini file.

Streaming/Non-Staged Mode
If one selects streaming (a.k.a. non-staged) mode for a loader, one should also set the checkpoint property to 0. This effectively turns off the checkpoint processing. Checkpoint processing is used for recovery/restart of fastload and multiload sessions. However, if one is not using a physical file as input, but rather a named pipe, then the recovery/restart mechanism of the loaders does not work. Not only does this impact performance (i.e. the checkpoint processing is not free and we want to eliminate as much unnecessary overhead as possible), but a non-zero checkpoint value will sometimes cause seemingly random errors and session failures when used with named pipe input (as is the case with streaming mode).

Creating a session which performs both inserts and updates

MkIII Update (6/05): PowerCenter now supports data driven sessions that target Tpump or MultiLoad. This feature is called Teradata mixed mode processing. Essentially, because MultiLoad and Tpump support inserts, updates and deletes, the row indicator is also written to the output file/stream and the generated control file is enhanced to obey the row indicator. When using this option, make sure to set both the session's and the loader's mode property to data driven.

Important usage note: Mixed mode only works when there is a single target definition instance for the target table. That is, if one has a target definition instance of the target table to which the insert rows are mapped, and another target definition instance to which the update rows are mapped, this will not work as expected. The problem is that PowerCenter will start an instance of the loader for each target definition instance, and two MultiLoads cannot write to the same table at the same time. Multiple Tpumps may be able to write to the same table, although you could get locking conflicts between the inserts, updates and deletes. If one needs to update different columns than when inserting, override the target's update SQL (see section 5.2 below). If this does not work, then perhaps the legacy solution described below will.

Legacy Notes: Suppose you need to populate a slowly changing dimension. By typical design, the mapping will contain at least two instances of the target definition: an insert target definition and an update target definition. The type of operation MultiLoad or Tpump performs is determined by the Load Mode property of the External Loader (defined within the Server Manager). In this way, one can create External Loaders for each type of Load Mode (e.g. an insert MultiLoad, an update MultiLoad, etc.), and then assign the corresponding External Loader to the correct target definition instance (i.e. assign the insert loader to the insert target definition instance and the update loader to the update target definition instance). Unfortunately, this does not work! This should work fine for Tpump (although there could be locking conflicts between the updates and inserts), but it will not work for MultiLoad (don't even think about using FastLoad, since it requires the target table to be empty, except as described below). Only a single instance of MultiLoad can run against a given table at any one time. Following the method described above, PowerCenter would start two instances of MultiLoad (one for the inserts and one for the updates). Whichever MultiLoad instance starts second will fail.

The simple workaround is as follows:
1) Configure PowerCenter to use the integrated MultiLoad support for the larger of the two output files. That is, if the majority of rows will be inserts, configure PowerCenter to use the insert MultiLoad external loader for the insert file.
2) Load the remaining file using MultiLoad called via a post-session script (run a dummy session to generate a MultiLoad control file for this file, then use this control file for the post-session script). The syntax to run MultiLoad from a post-session script is simply:

mload < <control file>

For example, if the target definition's output file is named td_test_update.out:

mload < ./TgtFiles/td_test_update.out.ctl

The benefit of this approach is that the larger file is streamed into Teradata, only one of the output files must be completely staged prior to loading, and it is fairly simple to set up. The downside is that MultiLoad is invoked twice, and this incurs more overhead on the Teradata system. A more difficult workaround that only calls MultiLoad once is described below in section 5.5, Using one instance of MultiLoad to load multiple tables.
Using Update SQL Override on Target Definitions

Both MultiLoad and Tpump can do updates as well as inserts to the target table (again, FastLoad does one thing: insert into empty tables). In addition, they also support the concept of an upsert (update if exists, else insert). Where does this update statement come from? By default, the update statement generated in a MultiLoad or Tpump script is just like the update statement that would be used by a native connection (update all of the target's mapped ports using the key ports in the where clause). However, one can override the target definition's update SQL to change this default behavior. When one overrides a target's update SQL, the SQL from the Override SQL property of the mapping's target definition is moved to the MultiLoad or Tpump script verbatim. This means that one must get rid of each column's :TU. prefix because the utilities do not understand this PowerCenter-specific nomenclature.

Why would you ever do this? Suppose you need to do an incremental aggregation against an existing table that already contains many millions of rows. That is, suppose every day a set of aggregations must be computed and then applied as an upsert to an existing table (i.e. if the aggregate already exists, update the existing row to reflect the latest data, otherwise insert a new row). You could use PowerCenter's patented incremental aggregation capability. However, if the table already existed before PowerCenter came into the picture, there is the problem of building the initial incremental aggregation cache (the initial transactional source data is probably long gone). There are also session restart issues caused by PowerCenter's incremental aggregation (and Teradata folks seem to be keen on making sure there is always a mechanism to assure clean restarts).

How do you do this? Assume you're computing a single aggregate to be incrementally applied to a single target table. Compute the aggregate as usual and map all aggregate ports to the target definition (no need for any update strategies). Now, override the target update SQL to add the computed values (those coming from the mapping) to the existing values (those coming from the table) and modify the SQL to make it syntactically correct for MultiLoad or Tpump. Here is an example of a target definition SQL override that, when combined with the Upsert Load Mode, will perform an incremental aggregation. Here is the original update SQL (updating every non-primary key field in the table):

UPDATE TERA_DIST_INVENTORY
SET QTY = :TU.QTY, LAST_TRANS = :TU.LAST_TRANS
WHERE PRODUCT = :TU.PRODUCT

Here's the modified MultiLoad SQL (it adds the computed QTY to the existing QTY and updates the last transaction date column; the 'td' at the end of the date field name is just something the MultiLoad control file generation routine does):

UPDATE TERA_DIST_INVENTORY
SET QTY = :QTY + QTY, LAST_TRANS = :LAST_TRANStd
WHERE PRODUCT = :PRODUCT

Here's the modified Tpump SQL (same as above, but as of v5.1.1, Tpump and MultiLoad scripts use different naming conventions for date fields; Tpump uses no 'td' suffix on dates):

UPDATE TERA_DIST_INVENTORY
SET QTY = :QTY + QTY, LAST_TRANS = :LAST_TRANS
WHERE PRODUCT = :PRODUCT

Date Formats

Key point: If the target table contains dates, expect to run into problems. Here's why: when one creates a date column in a Teradata table, one can specify a display format for the date. Not only does this determine the format in which dates will be displayed by the Teradata client tools, it also, unexpectedly, determines the format in which dates can be loaded into the column. For example, suppose a date column has been created with a format of 'yyyy/mm/dd'; if one attempts to load a date string formatted as 'mm/dd/yyyy' into the column, the load will fail! This is further complicated by the fact that PowerCenter only supports a small subset of the date formats supported by Teradata (basically, PowerCenter supports 'yyyy/mm/dd' and 'mm/dd/yyyy'). Also, it is unwise to assume consistency in date column formats between tables. That is, experience has shown that just because a particular date format has been specified for tableA, there is no guarantee that the same format will be specified in tableB.

NOTE: The QueryMan client tool does not respect the format option on dates. That is, it displays dates in a consistent format regardless of the declared format option. To view the format defined for a date field, you must run the command show table <table name> (or run a select from the table using BTEQ).

An especially dangerous Teradata date format which one might run into is 'yyyyddd'. This corresponds to the 4-digit year followed by the day in the year (e.g. 1-365). If you run into a column defined like this, you must do the following workaround:
1) Edit the target table definition within PowerCenter to change the date column's datatype from date to char(7).
2) Create an Expression transformation that converts the date port into a string of the format yyyyddd (i.e. to_char(date_port,'yyyy') || to_char(date_port,'ddd'); note, to_char(date_port,'yyyyddd') does not work).
3) Map the output of this transformation expression into the port of the target definition that was changed in step 1).
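As a quick check, the declared format can be confirmed directly in Teradata before building the workaround. A minimal sketch, assuming a hypothetical table SALES_HIST with a TRANS_DATE column (the FORMAT phrase in the returned DDL is what matters):

SHOW TABLE SALES_HIST;

/* the DDL returned includes the declared format, for example: */
/*    TRANS_DATE DATE FORMAT 'YYYYDDD'                         */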


Of course, another alternative would be to get the column redefined with a date format that we support. It is important to note that the date format does not change the way a date is internally stored by Teradata. This might help the argument for making a change in the format.

Creating the work, log and error tables in a different database

MkIII Update (6/05): PowerCenter now supports more flexibility in the creation of work, log and error tables. The loaders now support properties to specify the database in which to put these tables.

Legacy Notes: By default, the MultiLoad and Tpump scripts generated by PowerCenter will place the work, log and error tables in the same database as the target table (for a more detailed discussion of the error tables, see section 6, Troubleshooting). Sometimes it is a site's standard to put the work, log or error tables in a database other than the target. Here's the workaround (Note: at this point, it is probably a good idea to reference the Teradata MultiLoad or Tpump documentation. It would also be a good idea to enlist somebody from the prospect to help with this as well; this somebody should be familiar with the standards to which you are trying to conform.):

To change the location of the work table, add a WORKTABLES clause to the BEGIN MLOAD statement:

.BEGIN IMPORT MLOAD TABLES infatest.TD_TEST
WORKTABLES dumpdb.WT_TD_TEST
ERRLIMIT 1 CHECKPOINT 10000 TENACITY 10000 SESSIONS 1 SLEEP 6 ;

To change the location of the log table, find the following line and change the database name (infatest is the database name):

.LOGTABLE infatest.mldlog_TD_TEST;

To change the location of the error tables, add an ERRORTABLES clause to the BEGIN MLOAD statement:

.BEGIN IMPORT MLOAD TABLES infatest.TD_TEST
WORKTABLES dumpdb.WT_TD_TEST
ERRORTABLES dumpdb.ET_TD_TEST dumpdb.UV_TD_TEST
ERRLIMIT 1 CHECKPOINT 10000 TENACITY 10000 SESSIONS 1 SLEEP 6 ;

If you change the name or location of the log or error tables, you will also need to change the statements that drop these tables at the beginning of the script:

DROP TABLE dumpdb.UV_TD_TEST ;
DROP TABLE dumpdb.WT_TD_TEST ;
DROP TABLE dumpdb.ET_TD_TEST ;

After you make these changes to the generated control file, run the chmod command (change file mode) to make the control file read-only. In this way, PowerCenter will not overwrite these changes the next time it runs the session:

chmod -w td_test.out.ctl

The obvious downside to this is maintenance. When/if the target table changes, you'll need to update the control file to reflect the changes.

Using one instance of MultiLoad to load multiple tables

MultiLoad is a big consumer of resources on a Teradata system. Some systems will have hard limits on the number of concurrent MultiLoad sessions allowed. By default, PowerCenter will start an instance of MultiLoad for every target file. Sometimes this is illegal (if the multiple instances target the same table). Other times, it is just expensive. Therefore, a prospect may ask that PowerCenter use a single instance of MultiLoad to load multiple tables (or to load both inserts and updates into the same target table). To make this happen, we're back to heavy editing of the generated MultiLoad script file. Note: This should not be an issue with Tpump because Tpump is not as resource intensive as MultiLoad (and multiple concurrent instances of Tpump can target the same table). Here's the workaround:
1) Use a dummy session (i.e. set test rows to 1 and target a test database) to generate MultiLoad control files for each of the targets.
2) Merge the multiple control files (one per target table) into a single control file (one for all target tables).
3) Configure the session to call MultiLoad from a post-session script using the control file created in step 2. Integrated support cannot be used because each input file is processed sequentially, and this causes problems when combined with PowerCenter's integrated named pipes and streaming.

Details on merging the control files:
1) There is a single log table for each instance of MultiLoad. Therefore, you do not have to change or add anything to the .LOGTABLE statement. However, you might want to change the name of the log table since it may be a log that spans multiple tables.
2) Copy the work and error table delete statements into the common control file.
3) Modify the BEGIN MLOAD statement to specify all the tables that the MultiLoad will be hitting.
4) Copy the Layout sections into the common control file and give each a unique name. Organize the file such that all the layout sections are grouped together.
5) Copy the DML sections into the common control file and give each a unique name. Organize the file such that all the DML sections are grouped together.
6) Copy the Import statements into the common control file and modify them to reflect the unique names created for the referenced LAYOUT and DML sections created in steps 4) and 5). Organize the file such that all the Import sections are grouped together.
7) Run chmod -w on the newly minted control file so PowerCenter doesn't overwrite it, or, better yet, name it something different so PowerCenter cannot overwrite it.
8) It's just that easy!

Also remember, a single instance of MultiLoad can target at most 5 tables. Therefore, don't combine more than 5 target files into a common file. Here's an example of a control file merged from two default control files:

.DATEFORM ANSIDATE;
.LOGON demo1099/infatest,infatest;
.LOGTABLE infatest.mldlog_TD_TEST;
DROP TABLE infatest.UV_TD_TEST ;
DROP TABLE infatest.WT_TD_TEST ;
DROP TABLE infatest.ET_TD_TEST ;
DROP TABLE infatest.UV_TD_CUSTOMERS ;
DROP TABLE infatest.WT_TD_CUSTOMERS ;
DROP TABLE infatest.ET_TD_CUSTOMERS ;

.ROUTE MESSAGES WITH ECHO TO FILE c:\LOGS\TgtFiles\td_test.out.ldrlog ;
.BEGIN IMPORT MLOAD TABLES infatest.TD_TEST, infatest.TD_CUSTOMERS
ERRLIMIT 1 CHECKPOINT 10000 TENACITY 10000 SESSIONS 1 SLEEP 6 ;
/* Begin Layout Section */
.Layout InputFileLayout1;
.Field CUST_KEY 1 CHAR( 12) NULLIF CUST_KEY = '*' ;
.Field CUST_NAME 13 CHAR( 20) NULLIF CUST_NAME = '*' ;
.Field CUST_DATE 33 CHAR( 10) NULLIF CUST_DATE = '*' ;
.Field CUST_DATEmm 33 CHAR( 2) ;
.Field CUST_DATEdd 36 CHAR( 2) ;
.Field CUST_DATEyyyy 39 CHAR( 4) ;
.Field CUST_DATEtd CUST_DATEyyyy||'/'||CUST_DATEmm||'/'||CUST_DATEdd NULLIF CUST_DATE = '*' ;
.Filler EOL_PAD 43 CHAR( 2) ;

.Layout InputFileLayout2;
.Field CUSTOMER_KEY 1 CHAR( 12) ;
.Field CUSTOMER_ID 13 CHAR( 12) ;
.Field COMPANY 25 CHAR( 50) NULLIF COMPANY = '*' ;
.Field FIRST_NAME 75 CHAR( 30) NULLIF FIRST_NAME = '*' ;
.Field LAST_NAME 105 CHAR( 30) NULLIF LAST_NAME = '*' ;
.Field ADDRESS1 135 CHAR( 72) NULLIF ADDRESS1 = '*' ;
.Field ADDRESS2 207 CHAR( 72) NULLIF ADDRESS2 = '*' ;
.Field CITY 279 CHAR( 30) NULLIF CITY = '*' ;
.Field STATE 309 CHAR( 2) NULLIF STATE = '*' ;
.Field POSTAL_CODE 311 CHAR( 10) NULLIF POSTAL_CODE = '*' ;
.Field PHONE 321 CHAR( 30) NULLIF PHONE = '*' ;
.Field EMAIL 351 CHAR( 30) NULLIF EMAIL = '*' ;
.Field REC_STATUS 381 CHAR( 1) NULLIF REC_STATUS = '*' ;
.Filler EOL_PAD 382 CHAR( 2) ;
/* End Layout Section */

/* begin DML Section */

.DML Label tagDML1;
INSERT INTO infatest.TD_TEST
(
CUST_KEY,
CUST_NAME,
CUST_DATE
)
VALUES
(
:CUST_KEY,
:CUST_NAME,
:CUST_DATEtd
) ;
.DML Label tagDML2;
INSERT INTO infatest.TD_CUSTOMERS
(
CUSTOMER_KEY,
CUSTOMER_ID,
COMPANY,
FIRST_NAME,
LAST_NAME,
ADDRESS1,
ADDRESS2,
CITY,
STATE,
POSTAL_CODE,
PHONE,
EMAIL,
REC_STATUS
)
VALUES
(
:CUSTOMER_KEY,
:CUSTOMER_ID,
:COMPANY,
:FIRST_NAME,
:LAST_NAME,
:ADDRESS1,
:ADDRESS2,
:CITY,
:STATE,
:POSTAL_CODE,
:PHONE,
:EMAIL,
:REC_STATUS
) ;
/* end DML Section */

/* Begin Import Section */
.Import Infile c:\LOGS\TgtFiles\td_test.out
Layout InputFileLayout1
Format Unformat
Apply tagDML1 ;
.Import Infile c:\LOGS\TgtFiles\td_customers.out
Layout InputFileLayout2
Format Unformat
Apply tagDML2 ;

/* End Import Section */
.END MLOAD;
.LOGOFF;

Partitioned Loading

As previously mentioned, without special behavior, one cannot simultaneously run multiple instances of MultiLoad targeting the same table. This is exactly what PowerCenter would do if one were allowed to specify Teradata external loaders for a partitioned session. However, special behavior has been added to PowerCenter. Please read on.

MkIII Update (6/05): With PowerCenter v7.x, if one sets a round robin partition point on the target definition and sets each target instance to be loaded using the same loader connection instance, then PowerCenter automatically writes all data to the first partition and only starts one instance of FastLoad or MultiLoad. You will know you are getting this behavior if you see the following entry in the session log:

MAPPING> DBG_21684 Target [TD_INVENTORY] does not support multiple partitions. data will be routed to the first partition.

If you do not see this message, then chances are the session fails with the following error:

WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does not support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.

Legacy Notes: With v6.1 and beyond, there is a special undocumented pmserver.cfg/registry variable to handle this (this is equally applicable to UDB and SybaseIQ as well as Teradata). One must add the following line to pmserver.cfg:

SupportNonPartitionedLoaders=Yes

On Win2K/NT, the following registry entry must be created:

HK_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Powermart\Parameters\MiscInfo\SupportNonPartitionedLoaders=Yes

When this flag is set to Yes, a writer thread does not actually start the external loader process until it receives data. However, you still must configure your session so that only a single writer thread receives data. This is typically accomplished by placing a key range partition point on the target definition. The key range must then be configured such that one and only one partition receives data (place an all-inclusive key range on one partition and a non-inclusive key range on all others). While not required at this time, it is recommended that partition 1 be the non-null partition. This is to support future behavior changes to this flag.

Prior to v6.1, PowerCenter did not allow one to configure integrated MultiLoad support for partitioned sessions. The workaround is:
1) Use a dummy non-partitioned session (i.e. set test rows to 1 and target a test database) to generate the MultiLoad control file.
2) Check the Merge targets for partitioned sessions check box under the Target Options for the partitioned session.
3) Configure the session to call MultiLoad from a post-session script using the control file created in step 1.

Streaming data into MultiLoad and Tpump on Win2K/NT

MkIII Update (6/05): PowerCenter now supports streaming on Win2K to MultiLoad, FastLoad, Tpump and TWB automatically with no extra work. Simply select or deselect the staged property.


Legacy Notes: In general, Win2K/NT does not support the Unix facility of named pipes. Therefore, INFA has not been able to stream into external loaders on Win2K/NT. However, Teradata supports a special named pipes access module that INFA can leverage to stream data to/from the Teradata utilities (MultiLoad, Tpump, FastLoad and FastExport). To do this, one uses the AXSMOD (Access Module) option of the various tools. The following is an example of streaming data into Tpump (MultiLoad and FastLoad would be very similar). One must modify the IMPORT statement in the Tpump command script to specify a Win2K/NT named pipe instead of the PowerCenter output file, and the AXSMOD modifier must be specified.

Default:

.Import Infile c:\LOGS\TgtFiles\td_test.out Layout InputFileLayout Format Unformat Apply tagDML ;

Modified to use the Teradata Named Pipe Access Module:

.Import Infile \\.\pipe\mypipe axsmod np_axsmod.dll Layout InputFileLayout Format Unformat Apply tagDML;

Unlike Unix named pipes, the Teradata implementation of Win2K/NT named pipes can support checkpoint restart capabilities. Also, the Teradata utility is responsible for creating and deleting the named pipe. It is likely that the PATH environment variable of the PowerCenter server must include the directory where np_axsmod.dll is located. After modifying the Tpump control file to use a named pipe (it is suggested that you rename the control file to be something other than the default name so it does not get overwritten if you re-run the session with Tpump or MultiLoad specified for the target), you must reconfigure the session to write to the named pipe rather than to Tpump. That is, do not specify an external loader, and specify the named pipe (e.g. \\.\pipe\mypipe) as the output file for the session. In addition, you must run Tpump as a pre-session command, as the Tpump access module is responsible for creating the named pipe, and the named pipe must exist before PowerCenter can write to it. For more information on the Win2K/NT Named Pipe Access Module, see the Teradata manual: Teradata Tools and Utilities Access Module Reference.

Lookup Performance
Creating a lookup cache based on a multi-million row table may take some time. This is especially true since the lookup will use ODBC to populate the cache. A non-cached lookup may actually improve overall throughput, because a big NCR box can service simple queries very fast versus pulling millions of rows into an often much smaller box. Your mileage will vary and there are no hard and fast rules about this; just keep it in mind as a potential area to improve performance.

Hiding the Password


Teradata loaders use clear text passwords. They offer no nice way to obscure the password within the file. Some prospects see this as a security concern. The easiest solution is to lock down the directory in which the control file is generated so that the general public cannot read the control file.


You can also direct PowerCenter to write the control files to a different location (directory) and then secure this location. To configure the PowerCenter Server on UNIX to write the external loader control file to a separate directory, add the following entry to pmserver.cfg:
LoaderControlFileDirectory=<directory_name>

On Win2K, add the following to the registry:

HK_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Powermart\Parameters\MiscInfo\LoaderControlFileDirectory=<directory_name>

Finally, MultiLoad and Tpump (but NOT FastLoad) support a command called RUN FILE. This essentially directs control from the current control file to the control file specified in the script. Place your login statements in a file in a secure location, and then add a RUN FILE statement to the generated control file to call it. For example, create a login script as follows (in the file login.ctl in a <secure directory path>):

.LOGON demo1099/infatest,infatest;

Then modify the generated control file to replace the login statement with:

.RUN FILE <secure directory path>/login.ctl;

Troubleshooting
A MultiLoad load actually consists of two main phases: acquisition and application. In the acquisition phase, the input data file is read and written to a temporary work table. During the application phase, the data from the work table is written to the actual target table. Different errors will appear in the different phases. Errors that have to do with the format or content of the source data will generally be identified during the acquisition phase. Errors that have to do with the data as it is stored in the target table (i.e. constraints, primary keys) will generally occur during the application phase. MultiLoad requires an exclusive lock on the target table during the application phase. Tpump is a single phase load. In fact, Tpump does not do anything fancy except for macro-ized SQL. That is, it takes the SQL from the control file and turns it into a database macro. It then takes the input data and applies the macro to it. It uses standard SQL and standard locking.

The et table

Errors generated during the acquisition phase of a MultiLoad or during a Tpump load can be found in the et table (i.e. et_<table name> by default). This is generally the first place to look for more specific information when a load fails. The key column of the error table is ErrorField. This column indicates the column of the target table that could not be loaded. There is also an ErrorCode field that provides details about why the column failed. The most common ErrorCodes are:

2689: Trying to load a null value into a non-null field
2665: Invalid date format

See the Teradata documentation for a complete list of possible error codes.
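A quick way to summarize what went wrong is to group the error rows by code and field. A minimal sketch, reusing the infatest.TD_TEST example target from earlier sections (substitute your own database and table; the default error table name simply prefixes the target with et_):

SELECT ErrorCode, ErrorField, COUNT(*)
FROM infatest.et_TD_TEST
GROUP BY ErrorCode, ErrorField
ORDER BY 3 DESC;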

The uv table

Errors generated during the application phase of a MultiLoad can be found in this table (because Tpump does not have an application phase, it does not create a uv table). The most common types of errors logged to the uv table are non-unique primary keys, field overflow and constraint violations. Like the et table, the key columns of the uv table are DBCErrorField and DBCErrorCode. The DBCErrorField column is not initialized in the case of primary key uniqueness violations; however, the DBCErrorCode that corresponds to a primary key uniqueness violation is 2794.
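A similar summary can be pulled from the uv table; this sketch again assumes the infatest.TD_TEST example target, and a count against code 2794 gives a quick read on how many duplicate primary key rows were rejected:

SELECT DBCErrorCode, COUNT(*)
FROM infatest.uv_TD_TEST
GROUP BY DBCErrorCode
ORDER BY 2 DESC;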

Errors that indicate a prior MultiLoad session has not been cleaned up

The following is an excerpt from the tail end of a MultiLoad session log (<output file name>.ldrlog, found in the target file directory):

**** 21:23:55 UTY0817 MultiLoad submitting the following request:
CHECKPOINT LOADING;
**** 21:23:56 UTY0805 RDBMS failure, 2801: Duplicate unique prime key error in infatest.ET_TD_TEST.
========================================================================
=                                                                      =
=                          Logoff/Disconnect                           =
=                                                                      =
========================================================================
**** 21:23:58 UTY6212 A successful disconnect was made from the RDBMS.
**** 21:23:58 UTY2410 Total processor time used = '0.0701008 Seconds'
.    Start : 21:23:48 - THU APR 18, 2002
.    End   : 21:23:58 - THU APR 18, 2002
.    Highest return code encountered = '12'.

This error indicates that you're trying to run a MultiLoad session without properly cleaning up a previously failed MultiLoad session. See section 4.5 (a cleanup sketch is also shown at the end of this section). You might also see messages similar to these in the pmserver's standard output file (see section 4.3):

0003 .LOGTABLE infatest.mldlog_TD_TEST;
**** 21:23:50 UTY8400 Default character set: ASCII
**** 21:23:50 UTY8400 Maximum supported buffer size: 64K
**** 21:23:51 UTY6211 A successful connect was made to the RDBMS.
**** 21:23:51 UTY6210 Logtable 'infatest.mldlog_TD_TEST' indicates that a restart is in progress.
0004 DROP TABLE infatest.UV_TD_TEST ;
**** 21:23:51 UTY1012 A restart is in progress. This request has already been executed. The return code was: 0.

This will also be corrected by properly cleaning up from the MultiLoad.

Sessions periodically fail with broken pipe errors when writing to a loader in streaming (non-staging) mode

If you're using FastLoad or MultiLoad and have a non-zero value specified for the checkpoint property, this can happen. Set the loader's checkpoint property to 0.
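The exact cleanup procedure is covered in the Cleaning up after a failed MultiLoad session section earlier in this document; the statements below are only a rough sketch of its general shape, reusing the infatest.TD_TEST example (table names follow the defaults generated by PowerCenter, and RELEASE MLOAD may need the IN APPLY option if the failure occurred during the application phase):

RELEASE MLOAD infatest.TD_TEST;            /* release the MultiLoad lock on the target */
DROP TABLE infatest.mldlog_TD_TEST;        /* restart log table                        */
DROP TABLE infatest.WT_TD_TEST;            /* work table                               */
DROP TABLE infatest.ET_TD_TEST;            /* acquisition-phase error table            */
DROP TABLE infatest.UV_TD_TEST;            /* application-phase error table            */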

Copyright 2005 Informatica Corporation. Informatica and PowerCenter are registered trademarks of Informatica Corporation. Teradata is a registered trademark of NCR Corporation. All other company, product, or service names may be trademarks or registered trademarks of their respective owners.
