TERADATA

Teradata is a relational database management system designed to run the world's largest commercial databases. To put the scale in perspective:

1 kilobyte = 10^3 bytes
1 megabyte = 10^6 bytes
1 gigabyte = 10^9 bytes
1 terabyte = 10^12 bytes
1 petabyte = 10^15 bytes
1 million inches is about 15.7 miles; 1 trillion inches is about 15,700,000 miles (roughly 30 round trips to the moon).

Teradata Architecture:

[Architecture diagram: a client submits an SQL request to the Parsing Engine (Parser, Optimizer, Dispatcher); the optimized steps travel across the Message Passing Layer to the AMPs, and the answer set response is returned to the client.]

Parsing Engine: This component interprets SQL requests, receives input records and passes the data. It is mainly responsible for:
Parsing and optimizing SQL requests
Dispatching the optimized plan to the AMPs
Sending the answer set back to the requesting client

Note: A PE manages up to 120 individual sessions.

Message Passing Layer (MPL): This component is responsible for transferring messages between the AMPs and the PEs. It distributes records to multiple AMPs based on the hash map; the hash values are generated in the Parsing Engine from the primary index column(s).

Access Module Processor (AMP): Each AMP is responsible for managing a portion of the database and controls the same portion of each table on the system. AMPs perform all the physical work associated with generating an answer set, including sorting, aggregating and converting. The AMPs are responsible for:
Finding the requested rows
Sorting rows
Aggregating columns
Join processing
Creating the answer set for the client
Disk space management

Teradata Parallelism:
[Parallelism diagram: several PEs each run Sessions A and B in parallel; through the Message Passing Layer, AMPs 1-4 each execute several tasks (Tasks 1-12) concurrently.]

Each PE can handle up to 120 sessions in parallel.
Each session can handle multiple requests.
The MPL can handle all messages in parallel.
Each AMP can perform up to 80 tasks in parallel.
All AMPs can work together in parallel to service any request.

Creating a database user (Teradata user):


CREATE USER new_user FROM existing_user AS
  PERMANENT = 10e6,
  SPOOL = 20e6,
  PASSWORD = lucky_day;

When we create a Teradata database user we have to specify the following space limits:
PERM Space
SPOOL Space
TEMP Space

PERMANENT Space: This is the space where objects are created and stored. Objects are databases, users, tables, etc.

SPOOL Space: This is the space used to store intermediate result sets while executing queries.

TEMP Space: This is the space used to store global temporary tables. This data remains available to the user until the session is terminated.
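The space allocated to and consumed by a database can be checked from the data dictionary. A minimal sketch, assuming the dictionary view DBC.DiskSpace (DBC.DiskSpaceV on newer releases) and the new_user name from the example above; space is reported per AMP, so the figures are summed:

SELECT DatabaseName,
       SUM(MaxPerm)     AS max_perm_bytes,     /* allocated PERM space */
       SUM(CurrentPerm) AS current_perm_bytes, /* PERM space in use */
       SUM(MaxSpool)    AS max_spool_bytes     /* SPOOL limit */
FROM DBC.DiskSpace
WHERE DatabaseName = 'new_user'
GROUP BY 1;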

Comparison of PERM Space & SPOOL Space:

PERM Space allocation (each child's PERM space is carved out of its parent's):

DBC
  SYSDBA: 500 GB (drops to 300 GB after HR is created)
    HR: 200 GB (drops to 100 GB after Salary is created)
      Salary: 100 GB

SPOOL Space allocation (each child's SPOOL limit can be as large as its parent's; nothing is subtracted):

DBC
  SYSDBA: 500 GB
    HR: 500 GB
      Salary: 500 GB

For PERM space, if we create a child database from a parent database, the amount of PERM space given to the child is subtracted from the parent's PERM space. For example, the database SYSDBA is allotted 500 GB of PERM space. If we now create a child database HR from SYSDBA and allot 200 GB of PERM space to it, those 200 GB are subtracted from the parent database SYSDBA, leaving it 300 GB. Similarly, if we define another child database Salary from HR and allot it 100 GB of PERM space, that amount is deducted from HR. The SPOOL space limit for a child database, by contrast, is not subtracted from its immediate parent; the child's SPOOL space can simply be as large as that of its immediate parent.
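A minimal SQL sketch of the hierarchy above (database names and sizes are taken from the example; the option list is abbreviated):

CREATE DATABASE HR FROM SYSDBA AS
  PERMANENT = 200e9,   -- subtracted from SYSDBA's PERM space
  SPOOL = 500e9;       -- not subtracted; bounded by SYSDBA's own SPOOL limit

CREATE DATABASE Salary FROM HR AS
  PERMANENT = 100e9,   -- subtracted from HR's PERM space
  SPOOL = 500e9;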

Primary Index:
It is the mechanism used to distribute records to the available AMPs, by applying a hash algorithm to one or more columns. The hash algorithm is applied in the Parsing Engine, and the rows are then distributed to the AMPs via the Message Passing Layer.
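The hashing can be inspected with Teradata's built-in hash functions. A minimal sketch, assuming a hypothetical table my_table with primary index column cust_id:

SELECT cust_id,
       HASHROW(cust_id)                      AS row_hash,    -- 32-bit row hash of the PI value
       HASHBUCKET(HASHROW(cust_id))          AS hash_bucket, -- bucket within the hash map
       HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS owning_amp   -- AMP that owns the row
FROM my_table;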

Creating a Primary Index:

CREATE TABLE Sample1
( Column_a INT,
  Column_b INT,
  Column_c INT )
UNIQUE PRIMARY INDEX (Column_b);

A primary index can be classified into two types:
1. UPI (Unique Primary Index) - doesn't allow duplicate values
2. NUPI (Non-Unique Primary Index) - allows duplicate values

Note: Changing the choice of PI requires dropping and recreating the table. If we specify only PRIMARY INDEX when creating a table, it is treated as a NUPI. If we don't define a primary index at all, Teradata chooses the primary index column based on the following criteria:
Check for a primary key column
Check for a unique key column
Otherwise, use the first column

A primary index should be defined during table creation and can consist of up to 16 columns.
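For comparison, a minimal sketch of a NUPI table (Sample2 is an illustrative name):

CREATE TABLE Sample2
( Column_a INT,
  Column_b INT )
PRIMARY INDEX (Column_a);   -- no UNIQUE keyword, so Column_a is a NUPI and may hold duplicate values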

Difference between Primary Key and Primary Index:

Primary Key                               Primary Index
Must be unique                            Unique or non-unique
No limit on the number of columns         Limited to 16 columns
Primary key = unique + not null           NULL values are allowed
Optional in CREATE TABLE statements       Defined in the CREATE TABLE statement
Teradata does not need to recognize it    Each table must have exactly one PI

Secondary Index:
It is an alternate path to the rows of a table. A table can have 0-32 secondary indexes. Secondary indexes do not affect table distribution; each one creates a subtable on every AMP. Secondary indexes can be added or dropped dynamically.

Advantages: If we need to fire SQL statements based on columns that are not part of the PI, we can create a secondary index (SI) on those columns. With a secondary index in place, the query scans only the subtable instead of the full base table, which improves performance. If we drop the SI, only the subtable is deleted.

Disadvantages: Additional overhead in terms of disk space and maintenance.

Note: A secondary index also supports up to 16 columns.
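Because secondary indexes can be added and dropped dynamically, they are managed with plain DDL. A minimal sketch on the Sample1 table defined earlier:

CREATE INDEX (Column_c) ON Sample1;   -- builds the secondary index subtable on every AMP

DROP INDEX (Column_c) ON Sample1;     -- removes only the subtable; base table data is untouched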

Comparison of Primary Index and Secondary Index:

Index feature                   Primary Index    Secondary Index
Mandatory                       Yes              No
Number per table                1                0-32
Maximum no. of columns          16               16
Affects row distribution        Yes              No
Created/dropped dynamically     No               Yes
Extra processing overhead       No               Yes
Separate physical structure     No               Yes

Teradata Objects:
Tables
Views
Macros
Triggers
Procedures
Join/Hash indexes
Journals

Tables: A table is a container that holds data in the form of rows and columns.

Views: Views are predefined subsets of existing tables, consisting of specified columns and rows from those tables. Views are of two types:
Single-table views
Multi-table views

Multi-table views are also called join views and are read-only.

Macro: A macro is a predefined set of SQL statements that is stored in the database. Macros may be created for frequently occurring queries.

Benefits:
Simplify end-user access
Reduce the query size, which reduces network traffic

Creating a Macro:
CREATE MACRO customer_list AS
( SELECT customer_name FROM customer; );

Executing a Macro:
EXEC customer_list;

Replacing a Macro:
REPLACE MACRO customer_list AS
( SELECT customer_name, customer_no FROM customer; );

Triggers: Triggers are sets of SQL statements associated with a table, mainly used to react to DML operations. The general skeleton is:

CREATE TRIGGER trigger_name
BEFORE/AFTER/INSTEAD OF  INSERT/UPDATE/DELETE OF column_name ON table_name
WHEN condition
BEGIN
  -- SQL statements
END;
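As a concrete illustration of the skeleton above, here is a rough sketch of a row trigger that logs salary changes. The table and column names (employee, salary_log) are hypothetical, and the exact clause syntax may vary by Teradata release:

CREATE TRIGGER salary_audit
AFTER UPDATE ON employee
REFERENCING OLD AS oldrow NEW AS newrow
FOR EACH ROW
WHEN (newrow.salary <> oldrow.salary)
INSERT INTO salary_log
VALUES (newrow.emp_no, oldrow.salary, newrow.salary, CURRENT_DATE);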

Teradata Utilities:
BTEQ
FastLoad
MultiLoad
TPump
FastExport

[Diagram: each utility (BTEQ, FastLoad, MultiLoad, TPump, FastExport) runs on a host and communicates with the Teradata database.]

BTEQ Export: BTEQ is a batch-mode utility for submitting SQL requests to the Teradata database.
Use .EXPORT to select data from the Teradata database to another computer.
Use .IMPORT to process input from a data file residing on the host.
Use indicator variables to preserve NULLs.

BTEQ script (expacct.btq):
.LOGON tdp1/user1, passwd1
.EXPORT DATAFILE = /home/auo1/datafile.dat
SELECT account_no FROM accounts
WHERE balance_current LT 100;
.EXPORT RESET
.QUIT

Sample run output:
LOGON complete
1200 rows returned
Time was 15:25 secs

BTEQ writes the selected account numbers to datafile.dat (e.g. 45644656, 58879798, 21245646, 24564854).

Note: The file can also be exported in indicator mode. An indicator-variable bit pattern precedes each record and flags which fields are NULL, so the positions of NULL values can be identified:

Indvar   F1    F2    F3
010      42    -     96
100      -     32    50
111      -     -     -

BTEQ Import:
bteq
Enter your BTEQ command:
or
bteq < jobscript.btq > jobscript.out

jobscript.btq:
.LOGON tdp1/user1, passwd1
.IMPORT DATAFILE = datafile3.dat
.QUIET ON
.REPEAT *
USING in_custno (INTEGER),
      in_socsec (INTEGER)
UPDATE customer
SET social_security = :in_socsec
WHERE customer_number = :in_custno;

.REPEAT * causes BTEQ to read records until EOF. USING defines the input data fields.

FastLoad: This is also a batch-mode utility, used for loading new tables in the Teradata database. It is mainly used to load a large amount of data into an empty table at high speed.
Checkpoints can be taken for restarts (the default is 100000).
Only one empty table can be loaded per FastLoad job.
If one AMP goes down, FastLoad can't be restarted until it is back online.

The FastLoad utility executes in two phases.

Phase 1: FastLoad uses one SQL session to define the AMP steps. The PE sends a block to each AMP, which stores blocks of unsorted data records. The AMPs hash each record and redistribute the records to the AMP responsible for the hash value. At the end of Phase 1, each AMP has the rows it should have, but the rows are not in row-hash sequence.

Phase 2: When the FastLoad job receives the END LOADING statement, FastLoad starts Phase 2. Each AMP sorts the target table rows and writes them to disk. The table data is available when Phase 2 is complete.

FAST Load Phase1:


[Phase 1 diagram: the host sends data blocks (B1, B2, ...) to the PEs; each AMP receives unsorted blocks, hashes every record, and redistributes the rows over the BYNET to the AMP that owns the hash value. At the end of Phase 1 each AMP holds its own rows (e.g. R4, R2, R5 on one AMP and R3, R1, R6 on another), still unsorted.]

FAST Load Phase2:


[Phase 2 diagram: the host issues END LOADING; each AMP sorts its rows into row-hash sequence (e.g. R4, R2, R5 becomes R2, R4, R5 and R3, R1, R6 becomes R1, R3, R6) and writes them to disk.]

Sample Fast Load Script:


fastload < /home/job1.fld > /home/job1.out     (input script file and output/report file)

SESSIONS 8;
LOGON educ2/bank, bkpasswd;
BEGIN LOADING customer ERRORFILES custErr1, custErr2;     (starts Phase 1; customer is the name of the empty target table)
DEFINE in_custnum (INTEGER),
       in_socsec (INTEGER),
       filter (CHAR(40)),
       in_lname (CHAR(30)),
       in_fname (CHAR(30))
FILE = custdata.dat;                                      (defines the input records)
INSERT INTO customer
VALUES (:in_custnum, :in_lname, :in_fname, :in_socsec);   (SQL insert statement)
END LOADING;                                              (starts Phase 2; if omitted the utility pauses)
LOGOFF;

When we use the FastLoad mechanism we have to define two error tables. Error table 1 contains the records that failed to load due to constraint violations or translation errors. Error table 2 captures the rows that cause a UPI duplicate violation. The error table columns are:

Column Name      Data Type         Content
ErrorCode        INTEGER           The error code from DBC (the associated error message)
ErrorFieldName   VARCHAR(30)       The column that caused the error
DataParcel       VARBYTE(64000)    The data record sent by the host

Note: Error tables are automatically dropped if they are empty after a successful run. Rows are written into the error tables one row at a time, so errors slow down the FastLoad mechanism.
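A quick way to inspect failures after a run; a minimal sketch, assuming the error table name custErr1 from the sample script above:

SELECT ErrorCode, ErrorFieldName
FROM custErr1
ORDER BY ErrorCode;   -- one row per rejected input record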

MultiLoad: This is a batch-mode utility that can perform multiple import tasks or delete tasks in one job.
Supports up to 5 tables per script
Tables may be either empty or non-empty
Supports all DML operations

The MultiLoad mechanism runs in the following phases:
1) Preliminary (Basic Setup)
2) DML Transaction
3) Acquisition
4) Application
5) Clean Up

Basic Setup:
Validates all statements (MultiLoad and SQL)
Starts all sessions
Creates work tables, error tables and the log table
Applies locks on the target table

DML Transaction:
Stores the DML steps in work tables
Adds the USING modifier to the request

Acquisition:
Gets the host data into the appropriate AMP work tables
Sorts the records in the work tables

Application:
The AMPs independently apply the changes to the target table
Restartable based on the last checkpoint
No journal is needed (the log table is available)

Clean Up:
All locks are released, work tables are dropped, empty error tables are dropped, and the log table is dropped if the error code is zero (0)
The target tables are then available to other users

MultiLoad Delete Task:
.LOGTABLE logtable001_mld;
.LOGON tdp3/user2,tyler;
.BEGIN DELETE MLOAD TABLES employee;
DELETE FROM employee WHERE term_date > 0;
.END MLOAD;
.LOGOFF;
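For comparison, an import task follows the same structure. A rough sketch with illustrative table, layout and file names (options abbreviated):

.LOGTABLE logtable002_mld;
.LOGON tdp3/user2,tyler;
.BEGIN MLOAD TABLES customer;
.LAYOUT cust_layout;
.FIELD in_custno * INTEGER;
.FIELD in_lname  * CHAR(30);
.DML LABEL do_insert;
INSERT INTO customer (customer_number, last_name)
VALUES (:in_custno, :in_lname);
.IMPORT INFILE custdata.dat
    LAYOUT cust_layout
    APPLY do_insert;
.END MLOAD;
.LOGOFF;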

Collect Statistics:
When you collect statistics on a table, the Teradata engine records demographics such as the total number of rows, the number of distinct values in the indexed columns, constraints, etc. These statistics are very useful to the optimizer and are reflected when we use the EXPLAIN facility on queries against the same table (SELECT, INSERT, UPDATE, DELETE statements).
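A minimal sketch of collecting and viewing statistics (the employee table and its columns are illustrative):

COLLECT STATISTICS ON employee COLUMN (dept_no);   -- single-column statistics
COLLECT STATISTICS ON employee INDEX (emp_no);     -- statistics on an index
HELP STATISTICS employee;                          -- shows what has been collected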

EXPLAIN:
EXPLAIN SELECT * FROM <table_name>;

The EXPLAIN output describes the optimizer's plan, for example that the query is estimated to return 100 records in 2 seconds with high confidence. The confidence levels are:
1) High Confidence
2) Low Confidence
3) No Confidence

Note: The SHOW command gives you the DDL of a table (the statement used to create it). The HELP command gives you the column information of a table.
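For example (employee is an illustrative table name):

SHOW TABLE employee;      -- returns the CREATE TABLE DDL for employee
HELP TABLE employee;      -- lists the columns of employee and their attributes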

Teradata Maximums:
Maximum no. of journal tables per database:          1
Maximum no. of data tables per database:             4.2 x 10^9
Maximum no. of columns per table or view:            2048
Maximum no. of rows per table:                       limited by disk capacity
Maximum row/field size:                              approx. 64,000 bytes
Maximum database object name size:                   30 bytes
Maximum SQL request size:                            1 MB
Maximum no. of concurrent sessions a PE can handle:  120
Maximum no. of concurrent tasks an AMP can handle:   80
Maximum no. of characters in a string constant:      32,000
Maximum SQL title size:                              60 characters
