
KNOWLEDGE SHARING ON TERADATA

TERADATA OVERVIEW & UTILITIES

Agenda

Introduction to Teradata
Walk-through of the Teradata utilities with lab examples:
BTEQ
FastLoad
FastExport
MultiLoad
TPump
Comparative study of the Teradata loading utilities

What is Teradata?

Teradata is a Relational Database Management System (RDBMS) that drives a company's data warehouse. Teradata is an open system, compliant with industry ANSI standards. It is currently available for the following operating systems:
UNIX MP-RAS
Windows 2000
The ability to manage terabytes of data is accomplished using the concept of parallelism, wherein many individual processors perform smaller tasks concurrently to accomplish an operation against a huge repository of data. To date, only parallel architectures can handle databases of this size.

Why Teradata?

There are many reasons to choose Teradata as the preferred platform for enterprise data warehousing:
Supports easy scalability from a small (10 GB) to a massive (100+ TB) database.
Automatic and even data distribution eliminates complex indexing schemes or time-consuming reorganizations.
Designed and built with parallelism from day one.
Single operational view of the entire MPP system and single point of control for the DBA (Teradata Manager).
Teradata has been doing data warehousing longer than any other vendor.

Scalability in a Production Environment

Start Smaller and Grow: One Experience

Measure              At the start        After growth
Users                200-300             Over 7,500
Concurrent users     30                  Over 2,000
Data volume          300 GB disk space   Over 50 TB user data
Largest table        1.7 billion rows    Over 7.5 billion rows
Queries per day      200                 Over 20,000
Nightly batch        30M rows            Over 500M rows
Applications         1 main application  Over 30 applications

BUT ONE THING REMAINS CONSTANT, i.e., THROUGHPUT

ADVANTAGE of TERADATA

Ease of setup and maintenance
No reorganization of data needed
Most robust utilities in the industry
Low cost of disk to data ratio
Ease in expanding the system

TERADATA Architecture(Shared Nothing)

SMP & MPP platforms
BYNET
Disk Arrays
Cliques
Hot Standby nodes
Virtual processors
Request processing
Teradata Database RASUI

Node Architecture (Shared Nothing)

Each Teradata node is made up of hardware and software.
Each node has CPUs, a system disk, memory and adapters.
Each node runs a copy of the OS and the database software.

BYNET in TERADATA

BYNET

The BYNET performs the internal communication of the Teradata RDBMS. All communication between PEs and AMPs is done via the BYNET.

Boardless BYNET

Single-node SMP systems use Boardless BYNET (or virtual BYNET) software to simulate the BYNET hardware driver.

Disk Arrays & Clique

Disk Arrays

A Disk Array is a configuration of disk drives that utilizes specialized controllers to manage and distribute data and parity across the Disks while providing fast access and data integrity.

Clique

A Clique is a set of Teradata nodes that share a common set of disk arrays. In the event of failure, all virtual processors can migrate to another available node in the clique. All nodes in the clique must have access to the same disk arrays.

Hot Standby Nodes

The Hot Standby Node feature allows spare nodes to be incorporated into the production environment so that Teradata Database can take advantage of the presence of the spare nodes to improve availability and maintain performance levels. A hot standby node:
Is a member of a clique.
Does not normally participate in the trusted parallel application (TPA).
Can be brought into the TPA to compensate for the loss of a node in the clique.

Virtual Processors

The versatility of Teradata Database is based on virtual processors (vprocs) that eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under Teradata Parallel Database Extensions (PDE) within the multitasking environment of the operating system.

The two types of vprocs are:
PE (Parsing Engine)
AMP (Access Module Processor)

Vproc

PE

A Parsing Engine (PE) is a virtual processor that manages the dialogue between the client application and the RDBMS. It interprets the SQL requests, receives input records and passes data. It is made of the following software components: Session Control, the Parser, the Optimizer and the Dispatcher

AMP

The AMP is a virtual processor designed for and dedicated to managing a portion of the entire database. An AMP will control some portion of each table on the system. It performs all database management functions such as sorting, aggregating and formatting data. The AMP receives data from the PE, formats rows and distributes them to the disk storage units it controls. The AMP also retrieves the rows requested by the PE.

Data Storage on Disks

[Figure: four AMPs, each holding a portion of the rows of Table A and of Table B]

The rows of every table are distributed among all AMPs.
Each AMP is responsible for a subset of the rows of each table.
Ideally, each table will be evenly distributed among all AMPs.
Evenly distributed tables result in evenly distributed workloads.
The uniformity of distribution of the rows of a table depends on the choice of the Primary Index.
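
The effect of the Primary Index choice on row distribution can be illustrated with a small Python sketch. This is only a simulation under stated assumptions: MD5 stands in for Teradata's row-hash function (which is not reproduced here), and the AMP count, function names and sample values are hypothetical.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # assumed system size, for illustration only


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number.

    MD5 is a stand-in for the real row-hash function; only the idea
    (hash the index value, then map the hash to an AMP) is shown.
    """
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(primary_index_values, num_amps=NUM_AMPS):
    """Count how many rows each AMP would receive."""
    counts = Counter(amp_for(v, num_amps) for v in primary_index_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# A unique index value per row (e.g. an employee number) spreads rows widely.
unique_pi = rows_per_amp(range(10_000))
# An index with only two distinct values can reach at most two AMPs.
skewed_pi = rows_per_amp(["M", "F"] * 5_000)

print("unique primary index:", unique_pi)
print("two-valued index    :", skewed_pi)
```

With the two-valued index, at least half of the AMPs receive no rows at all, which is the uneven workload the slide warns about.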

Request processing

[Figure: an SQL request arrives at a Parsing Engine on the node and is passed over the BYNET to the AMPs, each with its own disk storage; the answer set response is returned to the requester.]

TERADATA UTILITIES INTRODUCTION

The major Teradata utilities that assist in data warehouse management and maintenance alongside the Teradata RDBMS are:
BTEQ
FastLoad
FastExport
MultiLoad
TPump

BTEQ - Basic Teradata Query

A general-purpose, command-based program that allows users on a workstation to communicate with one or more Teradata Database systems.

Runs sets of SQL statements that insert, update or delete rows in Teradata tables.

Imports data into a Teradata database from a file.

Exports data from a table, formats the results and returns them to the screen, a file, or a designated printer.

Reports errors as they occur but does not capture them in a log.

Capabilities in BTEQ

Enter Teradata SQL statements to view, add, modify, and delete data.
Enter operating system commands.
Create and use Teradata stored procedures.
BTEQ supports Teradata-specific SQL functions for complex analytical querying and data mining.
All database requests in BTEQ are expressed in Teradata SQL.
BTEQ also supports conditional logic (i.e., "IF ... THEN ...") based on activity count or error code.
It is useful for batch mode export / import processing.
Error handling is available in BTEQ: an error level can be assigned to each error code, and decisions can be made based on the level assigned.
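
The idea of assigning a level to each error code and branching on it can be sketched in Python as follows. This is not BTEQ syntax; the error code numbers, the level values and the function names are placeholders chosen for illustration.

```python
# Hypothetical mapping of database error codes to severity levels.
# The code numbers are placeholders, not a statement about real codes.
ERROR_LEVELS = {3807: 4, 2631: 8}
DEFAULT_LEVEL = 12  # assumed level for any error code not listed above


def error_level(error_code):
    """Return the severity level assigned to an error code (0 = success)."""
    if error_code == 0:
        return 0
    return ERROR_LEVELS.get(error_code, DEFAULT_LEVEL)


def run_steps(step_results, abort_above=4):
    """Walk through step outcomes the way a batch script might.

    step_results: list of (step_name, error_code, activity_count).
    Stops at the first step whose level exceeds `abort_above`.
    """
    highest = 0
    empty_steps = []
    for name, error_code, activity_count in step_results:
        level = error_level(error_code)
        highest = max(highest, level)
        if level > abort_above:
            return {"stopped_at": name, "exit_code": highest, "empty": empty_steps}
        if level == 0 and activity_count == 0:
            # Branching on the activity count: note steps that touched no rows.
            empty_steps.append(name)
    return {"stopped_at": None, "exit_code": highest, "empty": empty_steps}


tolerated = run_steps([("select", 0, 5), ("drop", 3807, 0), ("create", 0, 0)])
aborted = run_steps([("insert", 0, 1), ("update", 2631, 0), ("delete", 0, 1)])
print(tolerated)
print(aborted)
```

The first run tolerates a low-level error and finishes; the second stops at the step whose level exceeds the threshold and reports that level as the exit code.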

OPERATING MODES in BTEQ

Interactive mode
In interactive mode, you start a BTEQ session by entering bteq at the system prompt on your terminal, log on, and submit SQL commands to the database as needed. Format of the logon command:
bteq
.logon server_name/user_name, password

Batch mode
In batch mode, you prepare BTEQ scripts or macros, and then submit them to BTEQ from a scheduler or manually for processing. A BTEQ script is a set of SQL statements and BTEQ commands saved in a file with the extension ".bteq". The script can be run from the command line in UNIX or Windows.

EXPORT in BTEQ

By default, BTEQ responds to every SQL query with a status message and diagnostic information such as the time taken to perform the query. If all of this is captured in a single output file, the mixed output is typically unsuitable for other purposes. The .EXPORT feature therefore writes the report or output data to a separate file: the standard output of the script then contains only the messages, while the data goes to the export file, which can be used for other purposes.
Export types are: export record, export data, export reset, export indicdata, export dif.

IMPORT in BTEQ

Imports data from the host to Teradata as a series of inserts, updates and deletes.

Import types supported are: import data, import record, import indicdata.

BTEQ COMMANDS

All BTEQ commands must be preceded by a dot (.) character; they may or may not end with a semicolon (;).

They are of four types:

Session control
File control
Format control
Sequence control

BTEQ Advantages

Report formatting.
Ad hoc query tool.
Database administration.
Best for small data volumes.

BTEQ Lab Exercise

Lab.sh

#!/bin/sh
.logon tdprd/username, pwd;
.Export report File=lab.txt
.set record vartext "|";
.BEGIN LOADING emp ERRORFILES Error_1, Error_2;
DEFINE empno (VARCHAR (50)),
       empname (VARCHAR (50)),
       doj (VARCHAR (30))
FILE = /ngs/app/asrdedwp/SCRIPTS/emp.txt;
.Set Underline Off;
.Set Titledashes Off;
.Set Errorout Stdout;
.Set Width 4000;
select * From table_name where ...;
Delete from table_name where ...;
Insert into table_name values (...);
Update table_name set ... where ...;
Call macro, procedure etc.
.if errorcode <> 0 then .exit 2
.export reset
.logoff

FL-FASTLOAD

FastLoad (FLoad or FL) is a multi-sessioned parallel load utility for initial table load in bulk mode on a Teradata Database.

It is a command-driven utility that loads large volumes of data into an empty table, with no secondary indexes, on a Teradata RDBMS.
It uses multiple database sessions to load data.

FASTLOAD Capability

Full restart capability.
Checkpoints are provided for restart. Checkpoints slow FastLoad processing, so set the checkpoint interval large enough that one is taken only every 10 to 15 minutes.
Two error tables and error limits, accessible using SQL.
The first error table receives rows that failed due to constraint or translation errors.
The second error table captures duplicate rows for UPIs (unique primary indexes).
Rows are written to the error table one at a time, so errors slow down FastLoad performance.
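
How rows end up in one error table or the other can be sketched in Python. This is a simplified model, not FastLoad itself: the field names, the date format and the function name are assumptions made for the example.

```python
from datetime import datetime


def load_rows(records):
    """Split input records into the target table and two error lists.

    records: list of dicts with text fields 'empno' (treated here as the
    unique primary index) and 'doj' (assumed to be YYYY-MM-DD).
    """
    target = {}             # rows accepted into the target table, keyed by empno
    conversion_errors = []  # plays the role of the first error table
    duplicate_upi = []      # plays the role of the second error table
    for rec in records:
        try:
            empno = int(rec["empno"])
            doj = datetime.strptime(rec["doj"], "%Y-%m-%d").date()
        except (KeyError, TypeError, ValueError) as exc:
            conversion_errors.append((rec, str(exc)))
            continue
        if empno in target:
            duplicate_upi.append(rec)
            continue
        target[empno] = doj
    return target, conversion_errors, duplicate_upi


sample = [
    {"empno": "1", "doj": "2020-01-15"},
    {"empno": "x2", "doj": "2020-01-15"},  # empno is not a number
    {"empno": "1", "doj": "2021-03-01"},   # duplicate unique primary index
    {"empno": "3", "doj": "15/01/2020"},   # date in an unexpected format
]
target, conversion_errors, duplicate_upi = load_rows(sample)
print(len(target), len(conversion_errors), len(duplicate_upi))
```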

FastLoad operates in two phases

Phase 1

FastLoad uses one SQL session to define AMP steps.
The PE sends a block to each AMP.
AMPs hash each record and redistribute it to the AMP responsible for the hash value.
Records are written to the target table in unsorted blocks.

Phase 2

Phase 2 starts after the .END LOADING command. If this command is not specified, FastLoad is paused rather than terminated.
When loading completes, each AMP sorts the target table, puts the rows into blocks, and writes the blocks to disk.
Fallback rows are then generated if required.
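
The two phases can be pictured with a short Python simulation. It is a sketch only: CRC32 stands in for the real row-hash function, and the function names, AMP count and sample data are hypothetical.

```python
import zlib


def row_hash(key):
    """Stand-in hash for illustration; not Teradata's actual row hash."""
    return zlib.crc32(str(key).encode("utf-8"))


def phase1_acquire(blocks, num_amps):
    """Phase 1: hash each record and redistribute it to the owning AMP.

    Rows are appended in arrival order, so each AMP's rows stay unsorted.
    """
    amps = [[] for _ in range(num_amps)]
    for block in blocks:
        for key, payload in block:
            h = row_hash(key)
            amps[h % num_amps].append((h, key, payload))
    return amps


def phase2_sort(amps):
    """Phase 2: each AMP sorts its own rows by hash before writing them."""
    return [sorted(rows, key=lambda row: row[0]) for rows in amps]


# Three input blocks of five records each.
blocks = [[(i, f"row{i}") for i in range(start, start + 5)] for start in (0, 5, 10)]
unsorted_amps = phase1_acquire(blocks, num_amps=4)
sorted_amps = phase2_sort(unsorted_amps)
for amp_no, rows in enumerate(sorted_amps):
    print(amp_no, [key for _, key, _ in rows])
```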

OPERATING MODES in FastLoad

Interactive mode
In interactive mode, Teradata FastLoad uses the terminal screen and keyboard as the standard output and input streams. For interactive mode:
fastload
.logon tdprd/user_id, pwd

Batch mode
In batch mode, FastLoad uses > and < to redirect the standard output / input streams. For batch mode:
fastload [options] < infile > outfile
Here, infile is a Teradata FastLoad job script file and outfile is the FastLoad output stream file.

SQL Statements Supported in FastLoad

CREATE TABLE
Defines the columns, index and other qualities of a table

DATABASE
Changes the default database

DELETE
Deletes rows from a table

DROP TABLE
Removes a table and all of its rows from a database

INSERT
Inserts rows into a table

Lab Exercise FastLoad

.LOGON TDP/username,pwd;
errlimit 1;
tenacity 4;
sleep 6;
DROP TABLE ;
SET RECORD UNFORMATTED;
.begin loading filename errorfiles filename_ref1, filename_ref2;
Define fields file=stg_sref_service_price.out;
show;
checkpoint 0;
INSERT INTO table name ( ) VALUES ( );
end loading;
logoff;

FE - FastExport

Teradata FastExport, also called "FastExport" or "FE," is a multi-sessioned, command-driven utility for exporting data in bulk mode from tables and views of the Teradata Database to a client-based application. It is the reverse of the Teradata FastLoad utility. Teradata FastExport processes a series of FastExport commands and Teradata SQL statements written in a batch mode job script or entered interactively. The FastExport commands provide the session control and data handling specifications for the data transfer operations, and the Teradata SQL statements perform the actual data export functions on the Teradata RDBMS tables and views.

Capability of FastExport

Fully automated restart.
Export from multiple tables.

There are two techniques for providing variable inputs to FastExport for selection control:

ACCEPT from a parameter file; accepts from a single record only.
IMPORT from a data file; each import record is applied to every SELECT.
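
The difference between the two techniques can be sketched in Python, with the built-in sqlite3 module standing in for the database. This is not FastExport syntax; the table, column names, sample rows and function names are all hypothetical.

```python
import sqlite3

# In-memory stand-in database with a small, made-up charges table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charges (emp_no INTEGER, proj_id TEXT, wk_end TEXT, hours REAL)"
)
conn.executemany(
    "INSERT INTO charges VALUES (?, ?, ?, ?)",
    [
        (101, "PRJ00001", "2024-01-05", 40.0),
        (102, "PRJ00001", "2024-01-05", 32.5),
        (101, "PRJ00002", "2024-01-12", 8.0),
    ],
)

# Two SELECT statements that take their selection values from variables.
selects = [
    "SELECT emp_no, hours FROM charges "
    "WHERE proj_id = :proj_id AND wk_end = :wk_end ORDER BY emp_no",
    "SELECT COUNT(*) FROM charges WHERE proj_id = :proj_id",
]


def export_with_accept(params):
    """ACCEPT-style: one parameter record, used once by each SELECT."""
    rows = []
    for sql in selects:
        rows.extend(conn.execute(sql, params).fetchall())
    return rows


def export_with_import(records):
    """IMPORT-style: every input record is applied to every SELECT."""
    rows = []
    for record in records:
        for sql in selects:
            rows.extend(conn.execute(sql, record).fetchall())
    return rows


accepted = export_with_accept({"proj_id": "PRJ00001", "wk_end": "2024-01-05"})
imported = export_with_import(
    [
        {"proj_id": "PRJ00001", "wk_end": "2024-01-05"},
        {"proj_id": "PRJ00002", "wk_end": "2024-01-12"},
    ]
)
print(accepted)
print(imported)
```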

Operating Modes in FastExport

Interactive mode
In interactive mode, Teradata FastExport uses the terminal screen and keyboard as the standard output and input streams. Interactive mode for Microsoft Windows: c:\ncr\fexp

Batch mode
In batch mode, FastExport uses > and < to redirect the standard output / input streams. Batch mode for Microsoft Windows: c:\ncr\fexp [options] < infile > outfile

SQL Statements Supported in Fast Export


CREATE TABLE
Defines the columns, index and other qualities of a table

DATABASE
Changes the default database

DELETE
Deletes rows from a table

DROP TABLE
Removes a table and all of its rows from a database

INSERT
Inserts rows into a table

SQL Statements Supported in Fast Export


ALTER TABLE
Changes the column configuration or options of an existing table

COLLECT STATISTICS
Collects statistical data for one or more columns of a table

COMMENT
Stores or retrieves comment string associated with a database object

CREATE DATABASE,MACRO,TABLE,VIEW
Creates a new database, macro, table, or view

DATABASE
Specifies a new default database for the current session

DELETE
Removes rows from a table

SQL Statements Supported in FastExport

DELETE DATABASE
Removes all tables, views, and macros from a database

DROP DATABASE
Drops the definition for an empty database from the Data Dictionary

DROP TABLE
Removes a table from the database

GIVE
Transfers ownership of a database to another user

GRANT
Grants access privileges to a database object

INSERT
Inserts new rows to a table

SQL Statements Supported in FastExport

RENAME
Changes the name of an existing table, view, or macro

REPLACE MACRO,VIEW
Redefines an existing macro or view

REVOKE
Rescinds access privileges to a database object

UPDATE
Changes the column values of an existing row in a table


Lab Exercise Fast Export

.LOGTABLE utillog;
.LOGON tdpz/user,pswd;
.BEGIN EXPORT SESSIONS 20;
.LAYOUT UsingData;
.FIELD ProjId * Char(8);
.FIELD WkEnd * Date;
.IMPORT INFILE ddname1 LAYOUT UsingData;
.EXPORT OUTFILE ddname2;
/* these input variables are referenced from the imported input file */
SELECT EmpNo, Hours
FROM CHARGES
WHERE WkEnd = :WkEnd
AND Proj_ID = :ProjId
ORDER BY EmpNo;
.END EXPORT;
.LOGOFF;

Multi Load

MultiLoad (MLoad or ML) is a command-driven parallel load utility for high-volume batch maintenance on multiple tables and views of the Teradata Database.

Features of MultiLoad

Teradata MultiLoad executes a series of MultiLoad commands and Teradata SQL statements written in a batch mode job script or entered interactively.
Supports up to five populated tables.
FastLoad-like technology.
TPump-like functionality.
Multiple operations with one pass of the input files.
Conditional logic for applying changes.
Supports INSERTs, UPDATEs, DELETEs and UPSERTs.
Full restart capability.
Error reporting via error tables.
Support for INMODs.

SQL Statements Supported in MultiLoad


ALTER TABLE
Changes the column configuration or options of an existing table

COLLECT STATISTICS
Collects statistical data for one or more columns of a table

COMMENT
Stores or retrieves comment string associated with a database object

CREATE DATABASE,MACRO,TABLE,VIEW
Creates a new database, macro, table, or view

DATABASE
Specifies a new default database for the current session

DELETE
Removes rows from a table

DELETE DATABASE
Removes all tables, views, and macros from a database

SQL Statements Supported in MultiLoad


DROP DATABASE
Drops the definition for an empty database from the Data Dictionary

DROP TABLE
Removes a table from the database

GIVE
Transfers ownership of a database to another user

GRANT
Grants access privileges to a database object

INSERT
Inserts new rows to a table

RENAME
Changes the name of an existing table, view, or macro

REPLACE MACRO,VIEW
Redefines an existing macro or view

REVOKE
Rescinds access privileges to a database object

UPDATE
Changes the column values of an existing row in a table

Operating Modes in MultiLoad

Interactive mode
In interactive mode, Teradata MultiLoad uses the terminal screen and keyboard as the standard output and input streams. Interactive mode for Microsoft Windows: c:\ncr\bin\MultiLoad

Batch mode
In batch mode, MultiLoad uses > and < to redirect the standard output / input streams. Batch mode for Microsoft Windows: c:\ncr\bin\MultiLoad [options] < infile > outfile
Here, infile is a Teradata MultiLoad job script file and outfile is the output stream file.

MULTILOAD TASKS

IMPORT task
Import tasks intermix a number of different SQL/DML statements and apply them to up to five different tables, depending on the APPLY conditions.
Import tasks are always primary index operations, but they are not allowed to change the value of a table's primary index.
Restart and checkpoint are allowed during each operating phase.
Import tasks cannot be run on tables with USIs, referential integrity, join indexes, hash indexes, or triggers.

The phases involved in this task are:
Preliminary phase: basic setup
DML phase: get the DML steps down onto the AMPs
Acquisition phase: send the input data to the AMPs and sort it
Application phase: apply the input data to the target tables
End phase: basic cleanup

Basic setup involves validating all SQL, starting all sessions, creating work tables (one per target), error tables (two per target) and the restart log table (one per table), and applying locks to the target tables (to prevent access to a target while it is loading).
Basic cleanup involves session logoff, dropping the error and work tables, and releasing table locks.

DELETE task
A DELETE task executes a single DELETE statement on a single table.

MultiLoad Advantages

Each MultiLoad import task can do multiple data insert, update, and delete functions on up to five different tables or views.
Each MultiLoad import task can have up to 100 DML steps.
Each MultiLoad delete task can remove large numbers of rows from a single table.
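
The way an import task applies different DML operations to several tables in one pass of the input can be sketched in Python. This is a toy model, not MultiLoad: the rule format, record fields, table names and function name are assumptions for illustration, and in-memory dictionaries keyed by a primary index value play the role of tables.

```python
MAX_TARGET_TABLES = 5  # limit stated for an import task


def run_import_task(records, apply_rules, tables):
    """Apply every matching rule to every input record in a single pass.

    apply_rules: list of (condition, table_name, operation) tuples, where
    condition is a function of the record and operation is one of
    'insert', 'update', 'delete' or 'upsert'.
    Returns a list of (operation, key, reason) entries for rejected rows.
    """
    if len(tables) > MAX_TARGET_TABLES:
        raise ValueError("an import task supports at most five target tables")
    errors = []
    for rec in records:
        key = rec["id"]
        for condition, table_name, operation in apply_rules:
            if not condition(rec):
                continue
            table = tables[table_name]
            if operation == "insert":
                if key in table:
                    errors.append((operation, key, "duplicate row"))
                else:
                    table[key] = rec["value"]
            elif operation == "update":
                if key in table:
                    table[key] = rec["value"]
                else:
                    errors.append((operation, key, "row not found"))
            elif operation == "delete":
                if table.pop(key, None) is None:
                    errors.append((operation, key, "row not found"))
            elif operation == "upsert":
                table[key] = rec["value"]  # update if present, else insert
    return errors


tables = {"orders": {1: "open"}, "audit": {}}
rules = [
    (lambda r: r["code"] == "I", "orders", "insert"),
    (lambda r: r["code"] == "U", "orders", "update"),
    (lambda r: r["code"] == "D", "orders", "delete"),
    (lambda r: True, "audit", "upsert"),
]
records = [
    {"code": "I", "id": 2, "value": "open"},
    {"code": "U", "id": 1, "value": "closed"},
    {"code": "U", "id": 9, "value": "closed"},  # no such row: goes to errors
    {"code": "D", "id": 2, "value": "n/a"},
]
errors = run_import_task(records, rules, tables)
print(tables["orders"], errors)
```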

MultiLoad Lab Exercise

.LOGTABLE dwlogtable;
.logon TDPROD/username,pwd;
.begin import mload tables FRAUD_CHECK;
.layout UsingData;
.field Claimed_Fg_Serial_Nr * VARCHAR(50);
.field Claimed_Module_Serial_Nr * VARCHAR(50);
.field Request_Ts * VARCHAR(30);
.field Notif_Id * VARCHAR(30);
.field Claimed_Module_Part_Nr * VARCHAR(50);
.field Fraud_Request_Source_Id * VARCHAR(30);
.Field Claimed_Fg_Warranty_Cd * VARCHAR(30);
.Field Fraud_Mode_Cd * VARCHAR(30);
.Field Commodity_Cd * VARCHAR(10);
.Field Dispatch_Id * VARCHAR(20);
.Field Create_Ts * VARCHAR(30);
.dml label FRAUD_CHECKdml;
insert into FRAUD_CHECK.*;
.import inFILE FRAUD_CHECK_Request_LOG.txt
format VARtext ';'
layout UsingData
apply FRAUD_CHECKdml;
.end mload;
.logoff;

TPUMP

Teradata TPump, short for "Teradata Parallel Data Pump," is a continuous data-loading utility used to move data into the Teradata Database without locking the affected tables. Instead of updating Teradata Databases overnight, or in batches throughout the day, TPump updates information in near real time or real time, acquiring data from the client system with low processor utilization. This parallel utility features stream-mode loading: its protocol is SQL-based rather than block-based.

SQL Statements Supported in TPump

DATABASE
Changes the default database qualification for all DML statements

DELETE
Removes specified rows from a table

EXECUTE
Specifies a user-created (predefined) macro for execution. The macro named in this statement resides in the Teradata Database and specifies the type of DML statement (INSERT, UPDATE, or DELETE) being handled by the macro

INSERT
Adds new rows to a table by directly specifying the row data to be inserted

UPDATE
Changes field values in existing rows of a table

Operation Modes in TPump

Interactive mode
In interactive mode, Teradata TPump uses the terminal screen and keyboard as the standard output and input streams, involving the more or less continuous participation of the user.

Batch mode
In batch mode, Teradata TPump processes data in discrete groups of previously scheduled operations, typically in a separate operation, rather than interactively or in real time.

TPUMP Advantages

Its setup does not require staging of data, intermediary files, or special hardware.
Its operation is not affected by database restarts, dirty data, or network slowdowns.
Its jobs restart without intervention.
Fast, scalable continuous data loads.
Row-hash locks enable concurrent queries.
Best for small data volumes.
Multiple sessions and multistatement requests are typically used to increase throughput.
Dynamic throttling: TPump can run all out during batch windows, but within limits when it may impact other business uses of the Teradata RDBMS. Operators can specify the number of statements run per minute, or may alter throttling minute-by-minute, if necessary.
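
The throttling idea (a cap on statements per minute that can change from one minute to the next) can be sketched in Python. This only models the arithmetic; the function names and the rates are hypothetical and do not correspond to TPump parameters.

```python
def throttle_plan(pending, rate_for_minute):
    """Return how many statements are sent in each successive minute.

    pending: number of statements waiting to be sent.
    rate_for_minute: function mapping a minute number (0, 1, 2, ...) to
    the statements-per-minute cap in force during that minute.
    """
    sent_per_minute = []
    minute = 0
    while pending > 0:
        rate = rate_for_minute(minute)
        if rate <= 0:
            raise ValueError("rate must be positive to make progress")
        batch = min(pending, rate)
        sent_per_minute.append(batch)
        pending -= batch
        minute += 1
    return sent_per_minute


# Fixed cap of 100 statements per minute.
steady = throttle_plan(250, lambda minute: 100)

# Cap held low for the first two minutes, then opened up (as in a batch window).
stepped = throttle_plan(500, lambda minute: 50 if minute < 2 else 200)

print(steady)
print(stepped)
```

Lowering the cap stretches the same workload over more minutes, which is the trade-off an operator makes when other business uses of the system take priority.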

TPUMP Lab Exercise

.LOGON TDMNT/username,pwd;
.LOGTABLE cs_irepair_log;
.ROUTE MESSAGES WITH ECHO TO FILE 'cs_irepair.out.ldrlog';
DROP TABLE ET_cs_irepair;
.BEGIN LOAD
ERRORTABLE ET_cs_irepair
ERRLIMIT 0
CHECKPOINT 15
TENACITY 4
SESSIONS 1
SLEEP 6
SERIALIZE OFF
PACK 20
ROBUST OFF
NOMONITOR;
/* Begin Layout Section */
.Layout InputFileLayout;
.Field Warr_Code * varchar(6);
.Field Prod_hier * varchar(1);
.Field Descript * varchar(50);
.Field Serv_Matl * varchar(50);
.Field Reference * varchar(20);
/* End Layout Section */
/* begin DML Section */
.DML Label tagDML;
INSERT INTO cs_irepair ( Warr_Code, Prod_hier, Descript, Serv_Matl, Reference )
VALUES (

Comparative study of the Teradata loading utilities