
Tera-Tom on Teradata Utilities V12-V13

by Tom Coffing, Coffing Data Warehousing. (c) 2011. Copying Prohibited.

Reprinted for Hema Ganapathy, Cognizant Technology Solutions Hema.Ganapathy@cognizant.com Reprinted with permission as a subscription benefit of Skillport, http://skillport.books24x7.com/

All rights reserved. Reproduction and/or distribution in whole or in part in electronic, paper or other forms without written permission is prohibited.

TeraTomonTeradataUtilitiesV12V13

Chapter 1: Introduction
Overview

"It's not the data load that breaks us down, it's the way you carry it." - Tom Coffing

Teradata has been doing data transfers to and from the largest data warehouses in the world for close to two decades. While other databases have allowed the loads to break them down, Teradata has continued to set the standards and break new barriers. The brilliance behind the Teradata load utilities is in their power and flexibility. With six great utilities, Teradata lets you pick the right utility for the task at hand. This book is dedicated to explaining these utilities in a complete and easy manner. It has had contributions from over 10 Certified Teradata Masters with experience at over 125 Teradata sites worldwide. Let our experience be your guide.

We are going to make these difficult Teradata Utilities easy to use and easy to understand. Let me say first that writing Teradata Utility scripts by hand is difficult and cumbersome in the beginning. We at Coffing Data Warehousing have built SmartScript inside our world famous Nexus Query Chameleon. Use the Nexus to build your Teradata scripts: you point and click, and Nexus builds them brilliantly for you. This book will introduce SmartScript, but it will also show you how to build the scripts by hand. You can download a free trial of the Nexus Query Chameleon from our website and use it to build your scripts until you know what you are doing. Early in my career it took me about an hour on average to build a script, but now I always use SmartScript and average about a minute.

Nexus Query Chameleon - Build Teradata Load Scripts in Seconds

You know how difficult and cumbersome it is to build Teradata load scripts such as BTEQ, FastLoad, MultiLoad, TPump, and FastExport. Not anymore! Just right-click on a table, choose SmartScript, and then select the script you want to build.
Most of the defaults your script needs are placed in the script automatically. You can still change them, but in most cases why would you? In seconds you can create the script. The only thing we needed to select in the FastLoad script below was the Source File. Then you hit Build Script and review it or make any minor changes you want (you shouldn't need to make any). Then hit Execute, or use the Nexus Scheduler to run the script when you want!

Page 2 / 17 ReprintedforCTS/227461,CognizantTechnologySolutions CoffingDataWarehousing,CoffingPublishing(c)2011,CopyingProhibited

The Teradata Utilities

"Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success." - Thomas A. Edison

Did you know that Thomas Edison only slept four hours per night? That makes sense with that stupid light always in his eyes! Here is an introduction to the six Teradata Utilities, starting with the first three: BTEQ, FastLoad and MultiLoad. Hopefully the light will go on and you will start with a bright, clear understanding of which utility to use and when.

BTEQ was the first Teradata query tool and the first utility. It was built as a report writer, but it also imports and exports data one row at a time.

FastLoad loads data into empty Teradata tables in 64K blocks. This is a mover and shaker that always feels the need for speed. The catch is that the target table can't have Secondary Indexes, Join Indexes, Triggers, or Referential Integrity while it is being loaded. The great news is that since the table must start empty, you won't need any of these during the load. You can use FastLoad to load the table and then add your Secondary Indexes, Join Indexes, Triggers, and Referential Integrity afterwards. The only command FastLoad needs to know is INSERT, because it INSERTs into empty Teradata tables by loading 64K blocks of rows (a single block can hold hundreds to thousands of rows).

MultiLoad is like FastLoad in that it also loads in 64K blocks, so it is also considered a block utility. BTEQ is not a block utility because it works a row at a time. Where FastLoad only understands the word INSERT because it only INSERTs into empty tables, MultiLoad is used to add to tables that are already populated. The idea is to use FastLoad to load an empty table the first time and then use MultiLoad each time you want to add to the table. MultiLoad understands the words INSERT, UPDATE, DELETE and UPSERT.
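
To put the 64K block figure in perspective, here is a minimal sketch of the arithmetic. It assumes fixed-length rows, treats "64K" as 65,536 bytes, and ignores block headers and other overhead, so real counts will be somewhat lower. The function names are hypothetical.

```python
import math

BLOCK_BYTES = 64 * 1024  # assumption: "64K" = 65,536 bytes, no block overhead


def rows_per_block(row_bytes: int) -> int:
    """How many fixed-size rows fit into one 64K block."""
    if not 0 < row_bytes <= BLOCK_BYTES:
        raise ValueError("row size must be between 1 byte and 64K")
    return BLOCK_BYTES // row_bytes


def transfers_needed(row_count: int, row_bytes: int, block_mode: bool) -> int:
    """Count send operations for a block utility versus a row-at-a-time one."""
    if not block_mode:
        return row_count  # BTEQ-style: one send per row
    return math.ceil(row_count / rows_per_block(row_bytes))


print(rows_per_block(100))                                 # 655
print(transfers_needed(1_000_000, 100, block_mode=True))   # 1527
print(transfers_needed(1_000_000, 100, block_mode=False))  # 1000000
```

With 100-byte rows, a block utility makes roughly 1,500 sends for a million rows, while a row-at-a-time tool makes a million. That gap is where the block utilities get their speed.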

The Teradata Utilities (Continued)

"Nothing can have value without being an object of utility." - Karl Marx

I had no idea that Karl Marx was a Teradata Certified Marxist! He must have really known the utilities, because the class struggle here will be determining which utility to use!

FastExport is another block utility that works in 64K blocks, just like FastLoad and MultiLoad, but FastExport only exports data out of Teradata. The only word FastExport understands is SELECT. You SELECT the data from the table, and FastExport exports it off Teradata in 64K blocks.

TPump is one of the most exciting utilities. It works a row at a time, so it is slower than FastLoad or MultiLoad, but you can have Secondary Indexes, Join Indexes, Referential Integrity and Triggers on your table while you load it. I like to think of TPump as MultiLoad that loads only a row at a time. Why would you use something slower like TPump when you can load rapidly with MultiLoad? Because users can continue to query a table while TPump quietly INSERTs, UPDATEs, UPSERTs, or DELETEs rows in the background. Think of MultiLoad as a noisy train coming down the tracks, disrupting everything in its path, and TPump as a quiet truck delivering its load to its destination.

Teradata Parallel Transporter (TPT) is Teradata's newest utility. It is designed to provide the capabilities of all the previously mentioned utilities in one scripting language. It also improves on the other utilities by taking advantage of Teradata's parallel processing. It may soon be the only utility you need, but that story hasn't been scripted quite yet.
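
The rules of thumb from these two sections can be summarized as a decision function. This is a hypothetical helper (choose_utility and Target are illustrative names). It sketches the chapter's guidance and is not an official selection rule. BTEQ and TPT are left out. It treats unique secondary indexes as ruling out both block load utilities, but it allows MultiLoad on tables with non-unique secondary indexes, which MultiLoad supports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Hypothetical description of the target table's features."""
    is_empty: bool
    has_usi: bool = False         # unique secondary index
    has_nusi: bool = False        # non-unique secondary index
    has_join_index: bool = False
    has_triggers: bool = False
    has_ri: bool = False          # referential integrity (foreign keys)


def choose_utility(operation: str, target: Optional[Target] = None,
                   concurrent_queries: bool = False) -> str:
    """Pick a utility using this chapter's rules of thumb."""
    if operation == "SELECT":
        return "FastExport"  # block export off Teradata
    if target is None:
        raise ValueError("a load needs a target table")
    blocks_not_allowed = (target.has_usi or target.has_join_index
                          or target.has_triggers or target.has_ri)
    if concurrent_queries or blocks_not_allowed:
        return "TPump"  # row at a time; users keep querying the table
    if operation == "INSERT" and target.is_empty and not target.has_nusi:
        return "FastLoad"  # empty table, INSERT only, no secondary indexes
    if operation in ("INSERT", "UPDATE", "DELETE", "UPSERT"):
        return "MultiLoad"  # populated table or DML beyond INSERT
    raise ValueError(f"unsupported operation: {operation}")


print(choose_utility("INSERT", Target(is_empty=True)))                     # FastLoad
print(choose_utility("UPSERT", Target(is_empty=False)))                    # MultiLoad
print(choose_utility("INSERT", Target(is_empty=False, has_triggers=True)))  # TPump
```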

Considerations for using Block at a Time Utilities

"Every block of stone has a statue inside it and it is the task of the sculptor to discover it." - Michelangelo

I had no idea that Michelangelo was a fan of Teradata block utilities. Back then the ceiling was the limit, but with the Block Utilities of today the sky is the limit! As mentioned above, there are efficiencies associated with using large blocks of data when transferring between computers, so logic might suggest that this is always the best approach. However, there is never one best approach. You will learn that efficiency comes at the price of other database capabilities. For instance, when using large blocks to transfer and incorporate data into Teradata, the following are not allowed:

- Secondary indexes (MultiLoad does allow non-unique secondary indexes, but not unique ones)
- Join indexes
- Triggers
- Referential integrity
- More than 15 concurrent utilities running at the same time

Therefore, it is important to understand when and where these considerations are present. So, as important as it is to know the language of the utility and database, it is also important to understand when to use the appropriate utility. The capabilities and considerations are covered in conjunction with the commands.

Maximum Number of Block Utilities Has Changed!

"A book that is shut is but a block." - Thomas Fuller

Thomas Fuller must have been a Tera-Tom fan, because what he really meant to say is that a Tera-Tom Utility book that is open talks about Block Utilities!

Before Teradata V2R6.0, when Tera-Tom was just a baby in the crib, the DBS Control parameter MaxLoadTasks had a maximum limit of 15. This meant that no more than 15 block utility jobs in total (FastLoad, MultiLoad and FastExport combined) could run simultaneously. Many companies set this to 5, because these Block Utilities have a major impact on a system and can greatly affect user query performance.

After Teradata V2R6.0, Teradata increased this number and changed how it is counted. It no longer includes FastExport. Let me explain. Now there can be up to 30 concurrent FastLoad and MultiLoad jobs, but remember that each company must decide whether this is too many because of the performance hit. Up to 60 FastExport jobs can run concurrently, with one caveat: the 60 is reduced by the number of FastLoad and MultiLoad jobs that are also running.

This new feature is controlled by a new DBS Control parameter named MaxLoadAWT, which controls AMP Worker Tasks (AWTs). When MaxLoadAWT is set to zero, it is like going back in time to pre-V2R6.0, when a combined maximum of 15 FastLoad, MultiLoad and FastExport jobs could run. When MaxLoadAWT is greater than zero, the new feature is active. Each AMP has 80 AMP Worker Tasks, which means it can perform 80 things at once. MaxLoadAWT should never exceed 48, or the AMPs would not be able to do much else during the load.
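
The limits described above can be expressed as a small admission check. This is a hypothetical model (ActiveJobs and can_start are made-up names). It uses only the maximums quoted in this section: 15 combined jobs when MaxLoadAWT is zero, and otherwise 30 FastLoad/MultiLoad jobs and 60 FastExport jobs minus the active loads. Many sites configure lower limits. This sketch does not reproduce how Teradata's own throttles are implemented.

```python
from dataclasses import dataclass

# Maximums quoted in this section (sites often configure less).
LEGACY_LIMIT = 15   # MaxLoadAWT = 0: FastLoad + MultiLoad + FastExport combined
LOAD_LIMIT = 30     # MaxLoadAWT > 0: FastLoad + MultiLoad combined
EXPORT_LIMIT = 60   # MaxLoadAWT > 0: FastExport, reduced by active loads


@dataclass
class ActiveJobs:
    """Hypothetical snapshot of running block-utility jobs."""
    fastload: int = 0
    multiload: int = 0
    fastexport: int = 0


def can_start(job: str, active: ActiveJobs, max_load_awt: int) -> bool:
    """Return True if one more job of the given kind fits under the limits."""
    if job not in ("fastload", "multiload", "fastexport"):
        raise ValueError(f"unknown block utility: {job}")
    loads = active.fastload + active.multiload
    if max_load_awt == 0:
        # Pre-V2R6.0 behaviour: all three block utilities share one limit.
        return loads + active.fastexport < LEGACY_LIMIT
    if job == "fastexport":
        # 60 FastExport slots minus the FastLoad/MultiLoad jobs already running.
        return active.fastexport < EXPORT_LIMIT - loads
    return loads < LOAD_LIMIT


running = ActiveJobs(fastload=10, multiload=10, fastexport=39)
print(can_start("fastexport", running, max_load_awt=48))           # True: 39 < 60 - 20
print(can_start("fastload", ActiveJobs(5, 5, 5), max_load_awt=0))  # False: 15 already running
```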

Considerations for using Row at a Time Utilities

"Call on God, but row away from the rocks." - Hunter S. Thompson

I had no idea that Hunter S. Thompson understood the value of the row-level utilities. He made a little typo, though, because he meant to say "Call on God, but row away from the Blocks!" Block level utilities have speed, but so many restrictions. The opposite of sending a large block of rows at the same time is sending a single row at a time. The primary difference between these approaches is speed: it is always faster to send multiple rows in one operation than to send one row per operation. If it is slower, why would anyone ever use this approach? The reason is that it provides more flexibility with fewer considerations. By this, we mean that the row at a time utilities allow the following:

- Secondary indexes
- Join indexes
- Triggers
- Referential integrity
- More than 15 concurrent utilities running at the same time

As you can see, they allow all the things that the block utilities do not. With that in mind and for more information, continue reading about the individual utilities and open up a new world of capabilities in working with the Teradata RDBMS. Welcome to the world of the Teradata Utilities.

Fast Path Inserts inside the Teradata Database

"My mother always used to say, 'There is no path to peace. Peace is the path.'" - Donald Freed

I had no idea that Don's mom understood the value of the fast path in Teradata. Using this path will definitely bring you peace, which is peace of mind. The load utilities such as BTEQ, FastLoad, MultiLoad, TPump and TPT are designed to import or export data to and from Teradata. It is also important to understand that once the data is inside Teradata, you can use an INSERT SELECT from one Teradata table to another and get great speed. If the target table you are loading starts empty, there isn't a large amount of writing to the Transient Journal. The Transient Journal is designed to roll back bad transactions, but since the table starts empty there is only one write to the Transient Journal, and then it is idle. If the Transient Journal needs to roll back the table, it just empties it, returning it to how it started. For the Fast Path to be taken, the target and the source table must have the same Primary Index. That way no data has to be moved across the AMPs via the BYNET, and Teradata can just copy and insert the blocks directly. This is why it is called the Fast Path. You can also use to utilize the Fast Path. We will discuss this later.
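
Here is a minimal sketch of the two conditions named in this section: an empty target, and the same Primary Index on source and target. The function names are hypothetical. Comparing Primary Index column names case-insensitively is a simplification, because the optimizer's actual decision depends on more than column names. The generated SELECT * also assumes that the two tables have the same column layout.

```python
from typing import Sequence


def fast_path_insert_select(source_pi: Sequence[str], target_pi: Sequence[str],
                            target_row_count: int) -> bool:
    """Rough check of the two fast-path conditions named in the text."""
    # Simplification: PI columns compared by name only (case-insensitive).
    same_pi = [c.lower() for c in source_pi] == [c.lower() for c in target_pi]
    return target_row_count == 0 and same_pi


def insert_select_sql(source_table: str, target_table: str) -> str:
    """Build the INSERT SELECT that copies every row from source to target."""
    return f"INSERT INTO {target_table} SELECT * FROM {source_table};"


print(fast_path_insert_select(["Employee_No"], ["employee_no"], 0))  # True
print(fast_path_insert_select(["Employee_No"], ["Dept_No"], 0))      # False
print(insert_select_sql("Stage_DB.Employee_Stage", "Prod_DB.Employee_Table"))
```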

Fast Path DELETE inside the Teradata Database

"All speech is vain and empty unless it be accompanied by action." - Demosthenes

I had no idea that Demosthenes understood the value of the fast path for deletes in Teradata. Picture the speech: "My fellow Americans, DELETE that Teradata table's data! Thank you!" This will not be vain and empty. Oops, I mean it will not be in vain, but the table will be empty. That is the idea behind the Fast Path Delete. It deletes neither a row at a time nor a block at a time. Instead, it uses the Teradata Master Index and Cylinder Indexes to delete the blocks almost instantaneously. The blocks aren't really physically deleted, but logically deleted. The syntax differs depending on whether you are using ANSI mode or Teradata mode, as you can see on the following page. I have also included multiple statements so you have your choice in both modes.
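
The author's statements appear on a page that is not reproduced here. As an assumption based on commonly documented Teradata forms, which are not necessarily the ones on that page, an unconstrained DELETE sent as its own request can take the fast path in Teradata mode. In ANSI mode, the DELETE must be followed immediately by COMMIT. The hypothetical helper below only builds the statement text.

```python
from typing import List


def fast_path_delete_sql(table: str, session_mode: str) -> List[str]:
    """Return statement text for an unconstrained (all rows) DELETE."""
    mode = session_mode.upper()
    if mode == "TERADATA":
        # Sent as its own request, the DELETE is an implicit transaction.
        return [f"DELETE FROM {table} ALL;"]
    if mode == "ANSI":
        # In ANSI mode the DELETE must be followed immediately by COMMIT.
        return [f"DELETE FROM {table};", "COMMIT;"]
    raise ValueError("session_mode must be 'TERADATA' or 'ANSI'")


for mode in ("TERADATA", "ANSI"):
    print(mode, fast_path_delete_sql("Employee_Table", mode))
```

In both modes the DELETE has no WHERE clause. A constrained DELETE has to find the individual rows and therefore cannot simply drop whole blocks.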

Freespace Percent and Loading Tables

"Where the press is free and every man able to read, all is safe." - Thomas Jefferson

I had no idea that Thomas Jefferson understood the value of freedom inside Teradata cylinders: where the cylinder is free and every user is able to INSERT, all is fast. Suppose a cylinder is filled completely during a load and someone then inserts even a single row. The AMP will complain, and Teradata will move data to another cylinder. This is called a cylinder split. It isn't a big deal, and Teradata does it in the background, but it does take additional time. That is why Teradata invented Freespace Percent. It is set in a DBS Control parameter called FreeSpacePercent, which becomes the default when loading a table. Users can use the CREATE TABLE or ALTER TABLE commands to override the default and specify the Freespace Percent they want for a particular table. If nobody on your system ever did an INSERT, you would set no free space and fill each cylinder completely. If there are individual inserts, however, you want free space left so that you avoid cylinder splits. The following page will show you the utilities that honor and don't honor the Freespace Percent rule.
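
The effect of Freespace Percent on a load is simple arithmetic: each cylinder is filled only to (100 - FreeSpacePercent) percent. This sketch uses a hypothetical 2,000,000-byte cylinder purely for round numbers, because real cylinder sizes depend on the system. It also ignores block and row overhead. The function name is illustrative.

```python
def cylinders_needed(data_bytes: int, cylinder_bytes: int,
                     free_space_percent: int) -> int:
    """Cylinders a load fills when it leaves free_space_percent of each one empty."""
    if not 0 <= free_space_percent < 100:
        raise ValueError("free_space_percent must be in [0, 100)")
    usable = cylinder_bytes * (100 - free_space_percent) // 100
    return -(-data_bytes // usable)  # integer ceiling division


CYLINDER_BYTES = 2_000_000  # hypothetical cylinder size, chosen for round numbers
print(cylinders_needed(1_000_000_000, CYLINDER_BYTES, 0))   # 500
print(cylinders_needed(1_000_000_000, CYLINDER_BYTES, 10))  # 556
```

The extra 56 cylinders are the price paid up front, so that later single-row INSERTs can land in existing cylinders instead of causing cylinder splits.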

Referential Integrity and Load Utility Problems

"A single lie destroys a whole reputation of integrity." - Baltasar Gracian

In Teradata, a single lie won't destroy any Referential Integrity (RI). This is because Teradata provides a Referential Integrity Error Table. For example, suppose you populated the Employee_Table and then added Referential Integrity by creating the Foreign Key references. If there were any Referential Integrity errors, a table called Employee_Table_0 would be created to show all of the RI errors. There is a problem, however, if you use FastLoad or MultiLoad and expect the utility to take care of any Referential Integrity problems. They won't! The following page will show you multiple ways to get around this situation.
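
Conceptually, the rows reported in an RI error table such as Employee_Table_0 are child rows whose foreign key value has no matching parent key. This sketch models that check in memory with hypothetical names (find_ri_violations, Dept_No). It treats NULL foreign keys as valid, as SQL referential integrity does. It says nothing about the actual layout of Teradata's error table.

```python
from typing import Any, Dict, Iterable, List


def find_ri_violations(child_rows: Iterable[Dict[str, Any]], fk_column: str,
                       parent_keys: Iterable[Any]) -> List[Dict[str, Any]]:
    """Return child rows whose foreign key value has no matching parent key."""
    parents = set(parent_keys)
    return [row for row in child_rows
            if row.get(fk_column) is not None and row[fk_column] not in parents]


employees = [
    {"Employee_No": 1, "Dept_No": 10},
    {"Employee_No": 2, "Dept_No": 30},    # no department 30: an RI error
    {"Employee_No": 3, "Dept_No": None},  # NULL foreign key: not an error
]
print(find_ri_violations(employees, "Dept_No", parent_keys=[10, 20]))
# [{'Employee_No': 2, 'Dept_No': 30}]
```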

V13 No Primary Index Tables

"No one is so generous as he who has nothing to give." - French Proverb

New in Teradata V13, the DBA has the ability to CREATE tables without a Primary Index! These tables are designed merely to spread the rows randomly and evenly. They are called NoPI tables, which stands for No Primary Index tables. NoPI tables are designed to be ETL staging tables, so that data can be quickly transferred from flat files taken from operational systems such as Oracle or DB2. This might be data that needs to be massaged or transformed. Once the transformation has been completed, the DBA can write an INSERT/SELECT command and quickly load the data in the staging table into a Teradata table that has a Primary Index. You can query a NoPI table, or JOIN it with a traditional table that has a Primary Index. However, NoPI tables are really meant for quickly importing data into Teradata temporarily, so that it can be transformed inside Teradata and then loaded into the data warehouse tables.

NoPI CREATE Statement

"The Constitution only gives people the right to pursue happiness. You have to catch it yourself." - Ben Franklin

On the following page you can see the NoPI CREATE statement. A table is defined as a NoPI table when it is created. This can be done with normal SQL, as seen on the following page, or with the FastLoad or TPump load utility. The key phrase to focus on in the following page is NO PRIMARY INDEX, highlighted for your convenience.
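
The CREATE statement itself appears on a page that is not reproduced here, so this sketch builds a representative NoPI DDL string. The table and column names are hypothetical. The statement uses CREATE MULTISET TABLE because NoPI tables cannot be SET tables. The NO PRIMARY INDEX clause follows the column list.

```python
from typing import Dict


def nopi_staging_ddl(table: str, columns: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement for a NoPI staging table."""
    if not columns:
        raise ValueError("a table needs at least one column")
    body = ",\n  ".join(f"{name} {data_type}" for name, data_type in columns.items())
    # NoPI tables are MULTISET; NO PRIMARY INDEX goes after the column list.
    return f"CREATE MULTISET TABLE {table}\n(\n  {body}\n)\nNO PRIMARY INDEX;"


print(nopi_staging_ddl("Stage_DB.Employee_Stage",
                       {"Employee_No": "INTEGER", "Last_Name": "VARCHAR(30)"}))
```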

NoPI Row-ID Increments the Uniqueness Value

"It's not the size of the dog in the fight, but the size of the fight in the dog." - Archie Griffin

Each AMP will receive an equal number of rows, because the Parsing Engine attempts to spread the data evenly. Notice the picture on the following page. On any given AMP, every row in the NoPI table has the same Row Hash; only the Uniqueness Value is incremented.

NoPI Row-Hash Different on each AMP

"When all you have is a hammer, you tend to see every problem as a nail." - Abraham Maslow

The example on the next page shows that the Row Hash is different on each AMP. Once the Row Hash is established on an AMP, however, all of that AMP's rows contain that exact same Row Hash, and the AMP only increments the Uniqueness Value. NoPI tables don't need to be sorted, which is another main advantage if you want to CREATE a staging table.
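
The Row-ID behavior described in the last two sections can be simulated. This is a toy model, not Teradata internals. The hash values passed in are made up, and round-robin distribution stands in for the Parsing Engine's even spreading. Each Row-ID is modeled as a (row hash, uniqueness value) pair.

```python
from typing import Dict, List, Tuple

RowId = Tuple[int, int]  # (row hash, uniqueness value)


def assign_nopi_row_ids(row_count: int, amp_row_hashes: List[int]) -> Dict[int, List[RowId]]:
    """Toy model of NoPI Row-IDs: one fixed row hash per AMP, rising uniqueness."""
    if not amp_row_hashes:
        raise ValueError("need at least one AMP")
    rows_by_amp: Dict[int, List[RowId]] = {amp: [] for amp in range(len(amp_row_hashes))}
    for i in range(row_count):
        amp = i % len(amp_row_hashes)  # round-robin stands in for "evenly"
        uniqueness = len(rows_by_amp[amp]) + 1
        rows_by_amp[amp].append((amp_row_hashes[amp], uniqueness))
    return rows_by_amp


for amp, row_ids in assign_nopi_row_ids(5, [0xA1, 0xB2]).items():
    print(amp, [(hex(h), u) for h, u in row_ids])
# 0 [('0xa1', 1), ('0xa1', 2), ('0xa1', 3)]
# 1 [('0xb2', 1), ('0xb2', 2)]
```

Because new rows are simply appended with the next uniqueness value, nothing has to be sorted by hash, which matches the staging-table advantage described above.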

NoPI Options and Facts

"Failure accepts no alibis. Success requires no explanation." - Robert Rose

The example on the next page describes the options and facts about NoPI tables.

NoPI Restrictions

"He who asks a question may be a fool for five minutes, but he who never asks a question remains a fool forever." - Tom Connelly

The example on the next page shows the restrictions of NoPI tables.

The Nexus Query Chameleon from Coffing Data Warehousing may be the most sophisticated piece of software ever built in the data warehouse industry. After evaluating hundreds of tools, Microsoft chose the Nexus as the tool of choice for its Parallel Data Warehouse (PDW) customers for three years straight, and Microsoft still partners with CoffingDW to deliver it internally at Microsoft and to all PDW customers. This is because the Nexus is pretty, easy to use, and the only query tool that works universally with all major database vendor platforms, including Teradata, Netezza, Oracle, DB2, Greenplum, SQL Server and SQL Server Parallel Data Warehouse. Download a FREE Trial at www.CoffingDW.com. Not only can users query each system simultaneously, but they can also perform advanced analytics, graphing and charting, pivoting, cube building, database administration, ETL and thousands of other functions on every database simultaneously!

The end goal of the Nexus Query Chameleon is to be the only enterprise software tool needed to perform all functions on all databases. Imagine having to pay for only one tool, and that tool being the best ever seen by the user community: the developers, power users, Database Administrators, load experts and managers. That is where Nexus gets its name. A nexus is the point where everything connects, so Times Square is the Nexus of the New York Subway system. The Nexus Query Chameleon is the Nexus for all databases, and the Query Chameleon allows Nexus to literally change colors and fit into any enterprise environment. See the Nexus Query Chameleon User Guide at: http://www.coffingdw.com/data/Nexus_Product_Info.pdf

Nexus - A Brilliant Systems Tree

Users can choose their colors for each system and see their system trees as they see fit. Some DBAs need to see all databases and users, while users can choose the databases or users they want listed by right-clicking on the system and choosing My Databases! You can also right-click on any table in the systems tree and choose Quick Select, which was done in the query example below.
