Teradata Day2

TERADATA - DAY 2
Teradata Indexes and Types of tables
Prepared By: AnilKumar P
-Primary Index
  -Unique Primary Index (UPI)
  -Non-Unique Primary Index (NUPI)
  -No Primary Index (NOPI)
  -Partitioned Primary Index (PPI)
-Secondary Index
  -Unique Secondary Index (USI)
  -Non-Unique Secondary Index (NUSI)
-Join Index
  -Single Table Join Index (STJI)
  -Multi-Table Join Index (MTJI)
  -Aggregate Join Index (AJI)
-Hash Index
-Types of tables
  -Set table
  -Multiset table
  -Derived table
  -Volatile table
  -Global Temporary Table
-Locks
Types of tables:
Derived tables are always local to a single SQL request. They are built dynamically
using an additional SELECT within the query. The rows of a derived table are stored
in spool and discarded as soon as the query finishes.
Volatile Temporary tables are local to a session rather than a specific query. This
means that the table may be used repeatedly within a user session. That is the
major difference between volatile temporary tables (multiple use) and derived
tables (single use). Like a derived table, a volatile temporary table is materialized
in spool space. However, it is not discarded until the session ends or the user
manually drops it.
Global Temporary tables are local to a session, like volatile tables, but they use
temporary space. The major difference is that the GTT definition is stored in the
Data Dictionary, but its data is not. When the user's session ends, the data is
deleted automatically, but the definition remains.
Derived Table Example :

Ex 1 : SELECT * FROM (SELECT AVG(SAL) AS Avgsalary FROM Emp) AS AvgTbl;
Ex 2 : A Derived Table that Joins to an Existing Table
SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary) FROM Employee_Table
GROUP BY Dept_No) AS TeraTom (Dno, AVGSAL)
ON Dept_No = Dno;
Show all employees and the average salary of their department.
The first THREE columns in the Answer Set came from the Employee_Table. AVGSAL came from
the derived table named TeraTom.
Derived Table Example :
Get the top three selling items across all stores:
We must first aggregate the sales by product ID using a derived table. Once this
aggregation is done in spool, we can apply the RANK function to answer the
question.
SELECT Prodid, Sumsales, RANK(Sumsales) AS "Rank"
FROM (SELECT Prodid, SUM(sales) FROM Salestbl GROUP BY 1)
AS tmp (Prodid, Sumsales)
QUALIFY RANK(Sumsales) <= 3;
Result :
Prodid  Sumsales    Rank
------  ----------  ----
A       170000.00   1
C       115000.00   2
D       110000.00   3
Derived table name is tmp.
The table is required for this query but no others.
The query will be run only one time with this data.
Derived column names are Prodid and Sumsales.
The table is created in spool using the inner SELECT.
The SELECT statement is always in parentheses following FROM.
Volatile Temporary Table :
Syntax:
CREATE VOLATILE TABLE Dept_Agg_Vol, NO LOG
( Dept_no INTEGER
,Sum_Salary DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;
NO LOG allows for better performance.
LOG indicates that a transaction journal is maintained.
PRESERVE ROWS keeps the table rows at transaction end.
DELETE ROWS (the default) deletes all table rows at transaction end.
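The Three Steps to Use a Volatile Table :
-- Step 1: create the table. PRESERVE ROWS keeps the inserted rows after the
-- INSERT's transaction commits; the default (DELETE ROWS) would empty the table.
CREATE VOLATILE TABLE Dept_Agg_Vol, NO LOG
( Dept_no INTEGER
,Sum_Salary DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;

-- Step 2: populate it with an INSERT/SELECT.
INSERT INTO Dept_Agg_Vol
SELECT Dept_no
,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no;

-- Step 3: query it as often as needed until logoff.
SELECT * FROM Dept_Agg_Vol
ORDER BY 1;
1) A user creates a volatile table,
2) populates it with an INSERT/SELECT statement, and then
3) queries it until logoff.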
HELP VOLATILE TABLE;
This command displays the names of all volatile temporary tables
active for the current user session.
SessionID  TableName     TableId  Protection  CreatorName  CommitOption  TransactionLog
1010       Dept_Agg_Vol  10C0C04  N           Anil         P             N
Global Temporary Table :
CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no INTEGER
,Sum_Salary DECIMAL(10,2)
);
GTTs also have the LOG/NO LOG and ON COMMIT PRESERVE ROWS/DELETE ROWS options.
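The Three Steps to Use a Global Temporary Table :
-- Step 1: create the table definition (stored in the Data Dictionary).
CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no INTEGER
,Sum_Salary DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;

-- Step 2: populate this session's instance with an INSERT/SELECT.
INSERT INTO Dept_Agg_GLO
SELECT Dept_no
,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no;

-- Step 3: query it; the rows disappear at session end, the definition stays.
SELECT * FROM Dept_Agg_GLO
ORDER BY 1;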
Primary Index :
A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP
and a location on the AMP's disks. It is also used to access rows without having to
search the entire table.
The rows of every table are distributed among all AMPs.
Each AMP is responsible for a subset of the rows of each table.
Ideally, each table will be evenly distributed among all AMPs.
Evenly distributed tables result in evenly distributed workloads.
The uniformity of distribution of the rows of a table depends on the choice of the
Primary Index.
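The effect of the PI choice on distribution can be sketched in Python. This is a minimal model, assuming a stand-in hash (CRC-32 from zlib, not Teradata's proprietary hashing algorithm), a hypothetical 4-AMP system whose hash map deals buckets out round-robin, and made-up Emp_No/Dept_No data. A unique column spreads rows roughly evenly; a column with only three distinct values can use at most three AMPs, however many exist.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def row_hash(value) -> int:
    """Stand-in 32-bit row hash (CRC-32); NOT Teradata's hashing algorithm."""
    return zlib.crc32(str(value).encode("utf-8"))


def amp_for(value) -> int:
    """Map a PI value to an AMP via its hash bucket (high-order 16 bits)."""
    bucket = row_hash(value) >> 16
    return bucket % NUM_AMPS  # hypothetical round-robin hash map


def rows_per_amp(pi_values) -> Counter:
    """Count how many rows each AMP would own for the given PI values."""
    return Counter(amp_for(v) for v in pi_values)


# Hypothetical 10,000-row employee table: a unique Emp_No versus a
# Dept_No column with only 3 distinct values.
emp_no = list(range(1, 10_001))
dept_no = [n % 3 for n in emp_no]

print("UPI on Emp_No  :", sorted(rows_per_amp(emp_no).items()))
print("NUPI on Dept_No:", sorted(rows_per_amp(dept_no).items()))
```

Every row with the same Dept_No value hashes to the same AMP, which is why a low-cardinality NUPI produces skew.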
Three purposes of the primary index:
1- Distribution of rows to the proper AMP.
2- Fastest way to retrieve a single row.
3- Join access (rows with the same PI value are stored on the same AMP).
Types of Primary Index :
- Unique Primary Index
- Non-Unique Primary Index
Syntax :
CREATE TABLE sample_1
(col_a INTEGER
,col_b INTEGER
,col_c INTEGER)
UNIQUE PRIMARY INDEX (col_b);

CREATE TABLE sample_2
(col_x INTEGER
,col_y INTEGER
,col_z INTEGER)
PRIMARY INDEX (col_x);
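Limitations:
1. Each table can have only one primary index, made up of up to 64 columns.
2. Once the table is created, the primary index cannot be dropped, and in general it
cannot be altered; changing it usually means recreating the table.
3. Primary index access (an equality condition on all PI columns) is always a one-AMP operation.
4. If no primary index is specified when the table is created, Teradata by default uses
the first column of the table as the primary index (unless a PRIMARY KEY or UNIQUE
constraint is defined).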
Physical Mechanism
Index value(s) -> Hashing algorithm -> Row Hash -> DSW or Hash Bucket # -> Hash Map -> AMP #
The hashing algorithm is designed to ensure even distribution of unique values
across all AMPs.
Different hashing algorithms are used for different international character sets.
A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.
The DSW (Destination Selection Word) or Hash Bucket is represented by the high-order
16 bits of the Row Hash.
A Hash Map is uniquely configured for each system.
It is an array of 65,536 entries (buckets) which associates bucket numbers with
specific AMPs.
Two systems with the same number of AMPs will have the same Hash Map.
Changing the number of AMPs in a system requires a change to the Hash Map.
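The Row Hash to AMP path described above can be sketched as follows. Assumptions: CRC-32 stands in for the real hashing algorithm, the bucket is the high-order 16 bits as stated in these notes, and the hash map is a hypothetical round-robin layout rather than a real system's map. Rebuilding the map for a different AMP count shows why reconfiguration requires a new Hash Map.

```python
import zlib

NUM_BUCKETS = 65_536  # 2**16 entries, one per value of the high-order 16 bits


def build_hash_map(num_amps: int) -> list[int]:
    """Hypothetical hash map: bucket number -> AMP number (round-robin)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value) -> int:
    """Stand-in 32-bit row hash (CRC-32); NOT Teradata's hashing algorithm."""
    return zlib.crc32(str(pi_value).encode("utf-8"))


def amp_for(pi_value, hash_map: list[int]) -> int:
    bucket = row_hash(pi_value) >> 16  # DSW / hash bucket = high-order 16 bits
    return hash_map[bucket]


map_16 = build_hash_map(16)
map_20 = build_hash_map(20)  # the same system after growing to 20 AMPs

moved = sum(amp_for(v, map_16) != amp_for(v, map_20) for v in range(1000))
print(f"{moved} of 1000 rows would land on a different AMP after reconfiguration")
```

A naive rebuild like this moves far more rows than a real reconfiguration needs to; the only point is that the map depends on the number of AMPs.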
A Hashing Example
Orders table (Order Number is the PK and the UPI):
Order     Customer  Order  Order
Number    Number    Date   Status
7325      2         4/13   O
7324      3         4/13   O
7415      3         4/13   O
7415      1         4/13   C
7103      1         4/10   O
7225      2         4/15   C
7384      1         4/12   C
7402      3         4/12   C
7188      1         4/13   C
7202      2         4/09   C
SELECT * FROM Orders
WHERE Order_Number = 7202;
Hashing the PI value 7202 (values in hexadecimal):
7202 -> Hashing Algorithm -> 32-bit Row Hash = 691B 14AE
Destination Selection Word (high-order 16 bits) = 691B = 0110 1001 0001 1011
Remaining 16 bits = 14AE = 0001 0100 1010 1110
The Hash Map
[Figure: partial Hash Map (rows 690-695, columns 0-F) in which bucket 691B maps to AMP 9]
Note: This partial Hash Map is based on a 16 AMP system and AMPs are shown in decimal format.
The row is found on AMP 9:
7202 2 4/09 C
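The bit arithmetic in this example can be checked with a few lines of Python. The row hash 691B 14AE and the AMP 9 result come from the example; the one-entry hash map excerpt is a hypothetical stand-in for the real 65,536-entry map.

```python
ROW_HASH = 0x691B14AE  # 32-bit row hash of PI value 7202, from the example

dsw = ROW_HASH >> 16           # high-order 16 bits: Destination Selection Word
remaining = ROW_HASH & 0xFFFF  # low-order 16 bits

print(f"Row hash : {ROW_HASH:08X}")
print(f"DSW      : {dsw:04X} = {dsw:016b}")
print(f"Remaining: {remaining:04X} = {remaining:016b}")

# Hypothetical one-entry excerpt of the 16-AMP hash map; only the 691B -> AMP 9
# mapping comes from the example.
HASH_MAP_EXCERPT = {0x691B: 9}
print("AMP #", HASH_MAP_EXCERPT[dsw])
```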
Identifying Rows
A row hash is not adequate to uniquely identify a row.
Consideration #1: Hash Synonyms
A Row Hash = 32 bits = 2^32 = 4,294,967,296 (about 4.3 billion) possible values.
Because there is an infinite number of possible data values, some data values will
have to share the same row hash.
Example: the data values 1254 and 7769 both hash to 10A2 2936.
Consideration #2: NUPI Duplicates
A Primary Index may be non-unique (NUPI).
Different rows can have the same PI value and thus the same row hash.
Example: 'Smith' (Dave) and 'Smith' (John) both hash to 0016 5557.
Conclusion
A row hash is not adequate to uniquely identify a row.
The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.
Row ID = Row Hash (32 bits) + Uniqueness Id (32 bits)
Each stored row has a Row ID as a prefix: | Row ID | Row Data |
Rows are logically maintained in Row ID sequence.
Row Hash    Unique ID   Emp_No  Last_Name  First_Name
3B11 5032   0000 0001   1018    Reynolds   Jane
3B11 5032   0000 0002   1020    Davidson   Evan
3B11 5032   0000 0003   1031    Green      Jason
3B11 5033   0000 0001   1014    Jacobs     Paul
3B11 5034   0000 0001   1012    Chevas     Jose
3B11 5034   0000 0002   1021    Carnet     Jean
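A minimal sketch of how uniqueness values turn shared row hashes into unique Row IDs, using the rows from the table above. It assumes the simplest scheme, where each new row gets the next counter value for its row hash (starting at 1); the real system's bookkeeping is more involved. Sorting by (row hash, uniqueness) reproduces the Row ID sequence shown.

```python
from collections import defaultdict


def assign_row_ids(rows):
    """Attach a (row_hash, uniqueness) Row ID to each (row_hash, data) pair.

    Simplified scheme: the first row with a given row hash gets uniqueness 1,
    the next gets 2, and so on. Rows are then kept in Row ID order.
    """
    next_uniq = defaultdict(int)
    stored = []
    for row_hash, data in rows:
        next_uniq[row_hash] += 1
        stored.append(((row_hash, next_uniq[row_hash]), data))
    stored.sort(key=lambda item: item[0])  # logical Row ID sequence
    return stored


# Rows from the example, arriving in an arbitrary order.
incoming = [
    (0x3B115034, (1012, "Chevas", "Jose")),
    (0x3B115032, (1018, "Reynolds", "Jane")),
    (0x3B115033, (1014, "Jacobs", "Paul")),
    (0x3B115032, (1020, "Davidson", "Evan")),
    (0x3B115034, (1021, "Carnet", "Jean")),
    (0x3B115032, (1031, "Green", "Jason")),
]

result = assign_row_ids(incoming)
for (rh, uniq), data in result:
    print(f"{rh:08X} {uniq:08X}", *data)
```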
Secondary Index :
There are 3 general ways to access a table:
Primary Index access (one AMP access)
Secondary Index access (two or all AMP access)
Full Table Scan (all AMP access)
A secondary index provides an alternate path to the rows of a table.
A table can have from 0 to 32 secondary indexes.
Secondary Indexes:
Do not affect table distribution.
Add overhead, both in terms of disk space and maintenance.
May be added or dropped dynamically as needed.
Are chosen to improve query performance.
Choosing a Secondary Index
A Secondary Index may be defined:
at table creation (CREATE TABLE)
following table creation (CREATE INDEX)
A secondary index can include up to 64 columns.
USI (Unique Secondary Index): if the index column(s) are unique, it is called a USI.
Accessing a row via a USI is a 2 AMP operation.
CREATE UNIQUE INDEX (Employee_Number) ON Employee;
NUSI (Non-Unique Secondary Index): if the index column(s) are non-unique, it is called a NUSI.
Accessing row(s) via a NUSI is an all AMP operation.
CREATE INDEX (Last_Name) ON Employee;
Notes:
Secondary Indexes cause an internal sub-table to be built.
Dropping the index causes the sub-table to be deleted.
Unique Secondary Index (USI) Access
Create USI:
CREATE UNIQUE INDEX (Cust) ON Customer;
Access via USI:
SELECT * FROM Customer WHERE Cust = 56;
Customer base table (Table ID = 100; Cust is the USI, Phone is the NUPI):
AMP  RowID   Cust  Name    Phone
1    107, 1  37    White   555-4444
1    536, 5  84    Rice    666-5555
1    638, 1  31    Adams   111-2222
1    640, 1  40    Smith   222-3333
2    471, 1  45    Adams   444-6666
2    555, 6  98    Brown   333-9999
2    717, 2  72    Adams   666-7777
2    884, 1  74    Smith   555-6666
3    147, 1  49    Smith   111-6666
3    147, 2  12    Young   777-4444
3    388, 1  27    Jones   222-8888
3    822, 1  62    Black   444-5555
4    639, 1  77    Jones   777-6666
4    778, 3  95    Peters  555-7777
4    778, 7  56    Smith   555-7777
4    915, 9  51    Marsh   888-2222
1. The PE hashes the USI value 56 (Table ID 100). The resulting row hash, 602, is sent
through the Message Passing Layer (MPL) to the AMP that owns it.
2. That AMP reads its USI subtable. The entry for Cust 56 (subtable Row ID 602, 1)
points to base-table Row ID 778, 7.
3. The request goes to the AMP that owns row hash 778 (AMP 4), which returns the
base row: 778, 7 | 56 | Smith | 555-7777.
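The USI access path can be modeled in Python with the Customer rows above. The base-table placement comes from the example, but the AMP that stores each USI subtable row is chosen with a hypothetical stand-in hash (CRC-32), so it will not match the figure. The point is the shape of the access: one AMP for the subtable entry, one for the base row, at most two in total.

```python
import zlib

# Customer base table from the example, per AMP, keyed by Row ID
# (row hash, uniqueness) -> (Cust, Name, Phone). Phone is the NUPI.
BASE = {
    1: {(107, 1): (37, "White", "555-4444"), (536, 5): (84, "Rice", "666-5555"),
        (638, 1): (31, "Adams", "111-2222"), (640, 1): (40, "Smith", "222-3333")},
    2: {(471, 1): (45, "Adams", "444-6666"), (555, 6): (98, "Brown", "333-9999"),
        (717, 2): (72, "Adams", "666-7777"), (884, 1): (74, "Smith", "555-6666")},
    3: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444"),
        (388, 1): (27, "Jones", "222-8888"), (822, 1): (62, "Black", "444-5555")},
    4: {(639, 1): (77, "Jones", "777-6666"), (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"), (915, 9): (51, "Marsh", "888-2222")},
}

# Owning AMP of each base-table row hash (plays the role of the hash map here).
OWNER = {row_hash: amp for amp, rows in BASE.items() for row_hash, _ in rows}


def usi_amp(cust: int) -> int:
    """Hypothetical AMP for a USI subtable row (stand-in hash, not Teradata's)."""
    return zlib.crc32(str(cust).encode("utf-8")) % len(BASE) + 1


# USI subtable: each entry lives on the AMP chosen by hashing the USI value,
# which is usually NOT the AMP that holds the base row.
USI_SUBTABLE = {amp: {} for amp in BASE}
for rows in BASE.values():
    for row_id, (cust, _name, _phone) in rows.items():
        USI_SUBTABLE[usi_amp(cust)][cust] = row_id


def usi_lookup(cust: int):
    """Return (base row or None, set of AMPs touched) for WHERE Cust = <cust>."""
    subtable_amp = usi_amp(cust)                   # 1st AMP: hash the USI value
    row_id = USI_SUBTABLE[subtable_amp].get(cust)  # read the subtable entry
    if row_id is None:
        return None, {subtable_amp}
    base_amp = OWNER[row_id[0]]                    # 2nd AMP: owner of base row hash
    return BASE[base_amp][row_id], {subtable_amp, base_amp}


row, amps = usi_lookup(56)
print(row, "AMPs touched:", sorted(amps))
```

When the subtable entry and the base row happen to sit on the same AMP, the lookup touches only one AMP.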
Non-Unique Secondary Index (NUSI) Access
Create NUSI:
CREATE INDEX (Name) ON Customer;
Access via NUSI:
SELECT * FROM Customer WHERE Name = 'Adams';
The base table is the same Customer table (Table ID = 100) used for USI access.
Each AMP keeps a NUSI subtable covering only its own base rows (Name -> base Row IDs):
AMP 1: Smith -> 640,1; White -> 107,1; Adams -> 638,1; Rice -> 536,5
AMP 2: Smith -> 884,1; Adams -> 471,1 and 717,2; Brown -> 555,6
AMP 3: Smith -> 147,1; Black -> 822,1; Jones -> 388,1; Young -> 147,2
AMP 4: Marsh -> 915,9; Peters -> 778,3; Smith -> 778,7; Jones -> 639,1
1. The PE hashes the NUSI value 'Adams' (Table ID 100) to row hash 567 and broadcasts
the request through the MPL to all AMPs.
2. Each AMP looks up row hash 567 / 'Adams' in its local NUSI subtable. AMP 1 finds
638,1; AMP 2 finds 471,1 and 717,2; AMPs 3 and 4 find no 'Adams' entry.
3. The AMPs with matches read their base rows locally:
31 Adams 111-2222, 45 Adams 444-6666, 72 Adams 666-7777.
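The NUSI access path can be modeled the same way, using the Customer rows from the example. Each AMP indexes only its own rows, so the request must be broadcast to every AMP, even those holding no 'Adams' rows. This is a simplified sketch that looks up the name directly instead of its row hash.

```python
from collections import defaultdict

# Customer base table from the example, per AMP, keyed by Row ID
# (row hash, uniqueness) -> (Cust, Name, Phone).
BASE = {
    1: {(107, 1): (37, "White", "555-4444"), (536, 5): (84, "Rice", "666-5555"),
        (638, 1): (31, "Adams", "111-2222"), (640, 1): (40, "Smith", "222-3333")},
    2: {(471, 1): (45, "Adams", "444-6666"), (555, 6): (98, "Brown", "333-9999"),
        (717, 2): (72, "Adams", "666-7777"), (884, 1): (74, "Smith", "555-6666")},
    3: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444"),
        (388, 1): (27, "Jones", "222-8888"), (822, 1): (62, "Black", "444-5555")},
    4: {(639, 1): (77, "Jones", "777-6666"), (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"), (915, 9): (51, "Marsh", "888-2222")},
}

# Each AMP builds a NUSI subtable over its OWN base rows only (AMP-local).
NUSI_SUBTABLE = {}
for amp, rows in BASE.items():
    local = defaultdict(list)
    for row_id, (_cust, name, _phone) in rows.items():
        local[name].append(row_id)
    NUSI_SUBTABLE[amp] = local


def nusi_lookup(name: str):
    """Broadcast WHERE Name = <name> to every AMP; return (rows, AMPs touched)."""
    found, amps_touched = [], 0
    for amp in BASE:                                   # all-AMP operation
        amps_touched += 1
        for row_id in NUSI_SUBTABLE[amp].get(name, []):
            found.append(BASE[amp][row_id])            # base row is on this AMP
    return found, amps_touched


adams_rows, touched = nusi_lookup("Adams")
print(adams_rows, "AMPs touched:", touched)
```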
Full Table Scans
Every row of the table must be read.
All AMPs scan their portion of the table in parallel.
Fast and efficient on Teradata due to parallelism.
Full table scans typically occur when either:
An index is not used in the query
An index is used in a non-equality test
Customer table: Cust_ID (USI), Cust_Name, Cust_Phone (NUPI)
Examples of Full Table Scans:
-- LIKE is not an equality test on the NUPI
SELECT * FROM Customer WHERE Cust_Phone LIKE '524-____';
-- Cust_Name is not indexed
SELECT * FROM Customer WHERE Cust_Name = 'Davis';
-- non-equality test on the USI
SELECT * FROM Customer WHERE Cust_ID > 1000;
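A full table scan can be sketched as every AMP filtering its own slice at the same time, with the results merged afterwards. Threads stand in for AMPs here and the Customer rows are hypothetical; the sketch shows the divide-and-merge shape of the work, not Teradata's scan engine (CPython threads do not actually speed up this kind of CPU work).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Customer rows already spread over 4 AMPs:
# (Cust_ID, Cust_Name, Cust_Phone).
AMP_SLICES = [
    [(1001, "Davis", "524-1111"), (17, "Lee", "333-2222")],
    [(2500, "Kim", "524-9876"), (3, "Davis", "111-0000")],
    [(999, "Ortiz", "777-1234")],
    [(1500, "Ng", "524-5555"), (42, "Park", "888-4444")],
]


def scan_slice(rows, predicate):
    """One AMP reads every row it owns and keeps the ones that qualify."""
    return [row for row in rows if predicate(row)]


def full_table_scan(predicate):
    """All AMPs scan their own slice concurrently; the answers are merged."""
    with ThreadPoolExecutor(max_workers=len(AMP_SLICES)) as pool:
        parts = list(pool.map(scan_slice, AMP_SLICES,
                              [predicate] * len(AMP_SLICES)))
    return [row for part in parts for row in part]


# WHERE Cust_ID > 1000: a non-equality test on the USI, so every row is read.
print(full_table_scan(lambda r: r[0] > 1000))
# WHERE Cust_Phone LIKE '524-____'
print(full_table_scan(lambda r: r[2].startswith("524-") and len(r[2]) == 8))
```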
Partitioned Primary Indexes (PPI)
What is a Partitioned Primary Index or PPI?
A new indexing mechanism in Teradata.
Data rows can be grouped into partitions at the AMP level.
What advantages does a PPI provide?
Increases the available options to improve the performance of certain types of
queries.
Only the rows of the qualified partitions in a query need to be accessed, avoiding
full table scans.
Types of Partitioned Primary Index:
Range-based partitioning (RANGE_N) and case-based partitioning (CASE_N).
As always, data is distributed among AMPs and automatically placed
within partitions.
In a table defined with a PPI, each row is uniquely identified by its Row Key.
Row Key = Partition # + Row Hash + Uniqueness Value
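A minimal sketch of how the Row Key changes the order of rows on an AMP, using four orders and their RH labels from the NPPI versus PPI example. The monthly partition numbering (2002-09 = 1 through 2002-12 = 4) is a hypothetical choice, and treating these four rows as one AMP's data is an assumption for illustration.

```python
def month_partition(year_month) -> int:
    """Hypothetical monthly partition number: 2002-09 -> 1 ... 2002-12 -> 4."""
    year, month = year_month
    return (year - 2002) * 12 + month - 8


# (row_hash, uniqueness, order_no, (year, month)) for four rows of the example.
rows = [
    (10, 1, 1034, (2002, 11)),
    (3, 1, 1016, (2002, 10)),
    (14, 1, 1001, (2002, 9)),
    (1, 1, 1028, (2002, 11)),
]

# NPPI: rows kept in Row ID order (row hash, uniqueness).
nppi_order = sorted(rows, key=lambda r: (r[0], r[1]))

# PPI: rows kept in Row Key order (partition #, row hash, uniqueness).
ppi_order = sorted(rows, key=lambda r: (month_partition(r[3]), r[0], r[1]))

print("NPPI order:", [r[2] for r in nppi_order])
print("PPI order :", [r[2] for r in ppi_order])
```

With the partition number leading the key, rows for the same month end up next to each other, which is what lets a query skip the other partitions.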
Logical Example of NPPI versus PPI
4 AMPs with the Orders table defined with an NPPI, and 4 AMPs with the Orders table
defined with a PPI on O_Date (O_# = order number, RH = row hash, O_Date shown as YY/MM).
The table holds 48 orders, 1001 to 1048, twelve in each month from 02/09 to 02/12.
SELECT ...
FROM Orders
WHERE O_Date BETWEEN '2002-11-01' AND '2002-11-30';
With the NPPI, each AMP keeps its rows in row hash order, so orders from all four
months are mixed together and the query must read every row on every AMP.
With the PPI, each AMP keeps its rows grouped by month partition (02/09, 02/10,
02/11, 02/12) and in row hash order within each partition, so the query reads only
the 02/11 partition: 12 of the 48 rows (orders 1025 to 1036).
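Partition elimination in this example can be counted with a short sketch. It uses the 48 order numbers and months from the figure, ignores how the rows are spread across the 4 AMPs, and counts rows read rather than modeling disk I/O.

```python
from collections import defaultdict

# The 48 orders from the example: 1001-1012 fall in 2002-09, 1013-1024 in
# 2002-10, 1025-1036 in 2002-11 and 1037-1048 in 2002-12.
orders = [(o_no, (2002, 9 + (o_no - 1001) // 12)) for o_no in range(1001, 1049)]

# PPI: rows grouped by month partition.
partitions = defaultdict(list)
for o_no, month in orders:
    partitions[month].append(o_no)


def nppi_query(month):
    """NPPI: every row is read to find the qualifying month."""
    hits = [o_no for o_no, m in orders if m == month]
    return hits, len(orders)


def ppi_query(month):
    """PPI: only the qualifying partition is read (partition elimination)."""
    hits = list(partitions.get(month, []))
    return hits, len(hits)


for query in (nppi_query, ppi_query):
    hits, rows_read = query((2002, 11))
    print(f"{query.__name__}: {len(hits)} rows returned, {rows_read} rows read")
```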
Partitioning with RANGE_N
Notes:
Partition the current sales table into monthly partitions.
Assume the current sales table only has data for the first 3 months of 2003,
but we have defined partitions for the entire year 2003.
It is relatively easy to ALTER the table to extend the partitions for 2004.
A UPI is allowed because the partitioning columns are part of the PI.
CREATE TABLE Sales
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER)
UNIQUE PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY RANGE_N (
sales_date BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
EACH INTERVAL '1' MONTH);
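A minimal sketch of the partition number RANGE_N would compute for the Sales table above, assuming partitions are numbered 1 to 12 from January to December 2003. A date outside the defined range (or a NULL) gets no partition here; with no NO RANGE or UNKNOWN partition defined, Teradata would be expected to reject such a row.

```python
from datetime import date

START, END = date(2003, 1, 1), date(2003, 12, 31)


def range_n_month(sales_date):
    """Partition number for RANGE_N(sales_date BETWEEN DATE '2003-01-01'
    AND DATE '2003-12-31' EACH INTERVAL '1' MONTH).

    Returns 1..12, or None when the date is NULL or outside every range.
    """
    if sales_date is None or not (START <= sales_date <= END):
        return None
    return (sales_date.year - START.year) * 12 + (sales_date.month - START.month) + 1


print(range_n_month(date(2003, 3, 15)))
```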
Partitioning with CASE_N
Notes:
Partition the data based on the total revenue for the products.
The NO CASE and UNKNOWN options allow for total_revenue >= 100,000 or unknown (NULL)
revenue.
A UPI is NOT allowed because the partitioning columns are NOT part of the PI.
CREATE TABLE Sales_Revenue
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER)
PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY CASE_N
( total_revenue < 2000 , total_revenue < 4000 ,
total_revenue < 6000 , total_revenue < 8000 ,
total_revenue < 20000 , total_revenue < 100000 ,
NO CASE,
UNKNOWN);
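A minimal sketch of how the CASE_N expression above assigns rows: the first condition that is true wins, values of 100,000 or more fall into NO CASE, and a NULL total_revenue lands in UNKNOWN. It assumes the partitions are numbered in the order written (conditions 1 to 6, then NO CASE = 7, UNKNOWN = 8).

```python
from decimal import Decimal

# Upper bounds of the CASE_N conditions above, in the order written.
UPPER_BOUNDS = [2000, 4000, 6000, 8000, 20000, 100000]
NO_CASE = len(UPPER_BOUNDS) + 1  # 7 (assumed numbering)
UNKNOWN = len(UPPER_BOUNDS) + 2  # 8 (assumed numbering)


def case_n_partition(total_revenue):
    """Return the partition CASE_N would assign: the first true condition wins."""
    if total_revenue is None:  # NULL makes every condition unknown
        return UNKNOWN
    for number, bound in enumerate(UPPER_BOUNDS, start=1):
        if total_revenue < bound:
            return number
    return NO_CASE  # total_revenue >= 100,000


print(case_n_partition(Decimal("5000.00")))
```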
Join Index :
