First Edition
Published by
Coffing Publishing
Mr. Coffing has also published over 20 data warehousing articles and has been a
contributing columnist to DM Review on the subject of data warehousing. He wrote a
monthly column for DM Review entitled, "Teradata Territory". He is a nationally known
speaker and gives frequent seminars on Data Warehousing. He is also known as "The
Speech Doctor" because of his presentation skills and sales seminars.
Tom Coffing has taken his expert speaking and data warehouse knowledge and
revolutionized the way technical training and consultant services are delivered. He
founded CoffingDW with the same philosophy more than a decade ago. Centered around
10 Teradata Certified Masters, this dynamic and growing company teaches every Teradata
class, provides world-class Teradata consultants, offers a suite of software products to
enhance Teradata data warehouses, and has eight books published on Teradata.
Tom has a bachelor's degree in Speech Communications and over 25 years of business
and technical computer experience. Tom is considered by many to be the best technical
and business speaker in the United States. He has trained and consulted at so many
Teradata sites that students affectionately call him Tera-Tom.
Teradata Certified Master
- Teradata Certified Professional
- Teradata Certified Administrator
- Teradata Certified Developer
- Teradata Certified Designer
Table of Contents
Chapter 1 The Rules of Data Warehousing
Teradata Facts
Teradata: Brilliant by Design
The Teradata Parallel Architecture
A Logical View of the Teradata Architecture
The Parsing Engine (PE)
The Access Module Processors (AMPs)
The BYNET
A Visual for Data Layout
Teradata is a shared nothing Architecture
Teradata has Linear Scalability
How Teradata handles data access
Teradata Cabinets, Nodes, VPROCs, and Disks
LAN Connection for Network Attached Clients
Mainframe Connection to Teradata
Chapter 2 Data Distribution Explained
Rows and Columns
The Primary Index
The Two Types of Primary Indexes
Unique Primary Index (UPI)
Non-Unique Primary Index
How Teradata Turns the Primary Index Value into the Row Hash
The Row Hash determines the Row's Destination
The Row is Delivered to the Proper AMP
The AMP will add a Uniqueness Value
An Example of a UPI Table
An Example of a NUPI Table
How Teradata Retrieves Rows with the Primary Index
Row Distribution
A Visual for Data Layout
Teradata accesses data in three ways
Data Layout Summary
Chapter 3 Teradata Space
How Permanent Space is calculated
How Permanent Space is Given
The Teradata Hierarchy
How Spool Space is calculated
A Spool Space Example
PERM, SPOOL and TEMP Space
Spool Space controls system time
A quiz on Perm and Spool Space
Another quiz on Perm and Spool Space
NUSI Subtable Example
How Teradata retrieves a NUSI query
Value Ordered NUSI
How Teradata retrieves a Value Ordered NUSI query
Secondary Index Summary
Chart for Primary and Secondary Access
Chapter 8 The Active Data Warehouse
OLTP Environments
The DSS environment
Mixing OLTP and DSS environments
Detail Data
Easy System Administration
Data Marts
Teradata Tools - SQL Assistant
TDQM
Index Wizard
Archive Recovery
Teradata Analyst Suite
Chapter 1
- Teradata customers account for more than 70% of the revenue generated by the top 20 global telecommunication companies
- Teradata customers account for more than 55% of the revenue generated by the top 30 global retailers
- Teradata customers account for more than 55% of the revenue generated by the top 20 global airlines
- More than 25% of the top 15 global insurance carriers use Teradata
Teradata Facts
[Figure: A logical view of the Teradata architecture. Parsing Engines (PEs) connect through the BYNET network to the AMPs, and each AMP has its own disk.]
The user submits SQL to the Parsing Engine (PE). The PE checks the syntax and then
the security and comes up with a plan for the AMPs. The PE communicates with the
AMPs across the BYNET. The AMPs act on the data rows as needed and required.
[Figure: The Parsing Engine (PE) at logon, with the captions "Welcome! I will be taking care of you this entire session." and "My wish is your Commands!"]
[Figure: The PE's plan travels over the BYNET to four AMPs, each holding its portion of Order_Table and Order_Item_Table.]
The BYNET
[Figure: Three PEs connected through BYNET 0 and BYNET 1 to eight AMPs.]
[Figure: A visual for data layout. AMP 1 through AMP 4 each hold a portion of the Employee, Order, Customer, and Student tables.]
[Figure: Shared-nothing architecture. Four AMPs, each with its own memory and its own disk, each holding its portion of Customer_Table, Order_Table, Employee_Table, and Dept_Table.]
[Figure: Linear scalability, shown as a growing row of AMPs.]
The AMPs retrieve and perform database functions on their requested rows
[Figure: The PE sends work over the BYNET to three AMPs, each holding its portion of Order_Table and Cust_Table.]
These statements are true about session control responsible for load balancing across the
BYNET:
- The Optimizer (PE) develops a new and separate plan to determine the best response.
- The Dispatcher takes steps from the parser and transmits them over the BYNET.
[Figure: A Teradata cabinet with dual power, BYNET connections, disk array cabinets (DACs), and Nodes 1 through 4, each node containing PEs, memory, and AMPs.]
[Figure: LAN connection for network-attached clients. Clients running CLI, MTDP, and MOSI connect over the LAN (labeled ETH in the figure) to the gateway software on Nodes 1 through 4, each node containing PEs and AMPs.]
[Figure: Mainframe connection to Teradata. Mainframe hosts running CLI and TDP connect through Bus/Tag cables or ESCON channels to host adapters on Nodes 1 through 4, each node containing PEs and AMPs.]
Chapter 2
EMP (UPI)  DEPT  LNAME   FNAME   SAL
---------  ----  ------  ------  --------
1          40    BROWN   CHRIS   95000.00
2          20    JONES   JEFF    70000.00
3          20    NGUYEN  XING    55000.00
4          ?     BROWN   SHERRY  34000.00
Teradata stores its information inside tables. A table consists of rows and columns. A
row is one instance of all columns. According to relational concepts, column positions
are arbitrary and a column always contains like data. Teradata does not care in what
order you define the columns, and it does not care about the order of rows in a
table. Rows are arbitrary also, but once a row format is established Teradata will
use that format, because a Teradata table can have only one row format.
There are many benefits of not requiring rows to be stored in order. Unordered data
does not have to be maintained to preserve the order. Unordered data is independent
of the query.
[Figure: A single row: 40, Brown, Chris, 95000.]
Every AMP will hold a portion of every table. Rows are sent to their destination AMP
based on the value of the column designated as the Primary Index.
Employee Table

EMP (UPI)  DEPT  LNAME   FNAME   SAL
---------  ----  ------  ------  --------
1          40    BROWN   CHRIS   95000.00
2          20    JONES   JEFF    70000.00
3          20    NGUYEN  XING    55000.00
4          ?     BROWN   SHERRY  34000.00
A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst
the AMPs. UPI access is always a one-AMP operation. It also requires no duplicate
row checking.
Employee Table (NUPI)

EMP  DEPT  LNAME   FNAME   SAL
---  ----  ------  ------  --------
1    40    BROWN   CHRIS   95000.00
2    20    JONES   JEFF    70000.00
3    20    NGUYEN  XING    55000.00
4    ?     BROWN   SHERRY  34000.00
New Teradata Row

EMP  DEPT  LNAME  FNAME   SAL
---  ----  -----  ------  -----
99   10    Hosh   Roland  50000

Hash the PI Value: 99 / HASH FORMULA = 00001111000011110000111100001111
A new row is going to be inserted into Teradata. The Primary Index is the column called
EMP. The value in EMP for this row is 99. Teradata runs the value of 99 through the
Hash Formula and the output is a 32-bit Row Hash. In this example, our 32-bit Row Hash
output is 00001111000011110000111100001111.
[Figure: The Hash Map, a grid of buckets in which each bucket holds an AMP number from 1 to 4.]
[Figure: The row for Roland (SAL 50000), accompanied by its Row Hash, arrives at AMP 4.]
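- The Primary Index Value for the row is put into the Hash Algorithm.
- The resulting Row Hash points to a bucket in the Hash Map, and that bucket references the destination AMP.
- The row, along with the Row Hash, is delivered to that AMP.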
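The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Teradata's real hash formula is not given in the text, so `zlib.crc32` stands in for it, and the 16-bucket hash map, the use of the high-order bits to choose a bucket, and the names `row_hash`, `amp_for`, and `HASH_MAP` are hypothetical.

```python
import zlib

NUM_AMPS = 4
NUM_BUCKETS = 16  # toy hash map size, chosen only for illustration

# Toy hash map: every bucket references an AMP (1-based), filled round-robin.
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(NUM_BUCKETS)]

def row_hash(pi_value) -> int:
    """Stand-in for the hash formula: returns a 32-bit unsigned integer."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def amp_for(pi_value) -> int:
    """Pick a bucket from the top 4 bits of the row hash, then read its AMP."""
    bucket = row_hash(pi_value) >> (32 - 4)  # 4 bits -> 16 buckets
    return HASH_MAP[bucket]

# The same Primary Index value always hashes to the same AMP.
print(format(row_hash(99), "032b"), "-> AMP", amp_for(99))
```

Because the mapping is deterministic, the same function that places a row at insert time can locate it again at query time.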
[Figure: On AMP 4, the row with Row Hash 00001111000011110000111100001111 receives a Uniqueness Value.]
[Figure: The portion of the Employee Table on AMP 4, sorted by Row ID. Rows shown include Wilson, Holland, and Davis.]

Row Hash                          Uniqueness Value
--------------------------------  ----------------
00001111000011110000111100001111  1
01010101010101010000000000000000  1
01010111111111111111111111111111  1
11111111111111111100000000000000  1
The above Employee Table has a Unique Primary Index on the column EMP. Notice that
Row ID sorts the portion of the table on AMP 4. Notice that the Uniqueness Value for
each row is 1.
[Figure: The portion of the Employee Table on AMP 4, sorted by Row ID. Rows shown include two Davis rows and one Allen row; a fourth Row Hash, 11111111110000000000000000000000, also appears.]

Row Hash                          Uniqueness Value
--------------------------------  ----------------
00000000000000000000000000111111  1
00000000000000000000000000111111  2
00000000000000000000000000111111  3
The above Employee Table has a Non-Unique Primary Index on the column LNAME.
Notice that each row with the LNAME of Davis has the exact same Row Hash. Notice
that the Uniqueness Value for each Davis is incremented by 1.
Each time the LNAME is Davis, the Hashing Algorithm generates the Row Hash:
00000000000000000000000000111111
That Row Hash points to the exact same bucket in the Hash Map. This particular bucket
in the Hash Map references (or points to) AMP 4.
The Row Hash accompanied each row to AMP 4. The AMP assigned Uniqueness Values
of 1, 2 and 3 to the three rows with the LNAME of Davis.
Notice that Row ID sorts the portion of the table on AMP 4.
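A minimal sketch of how an AMP might hand out Uniqueness Values follows. It assumes a per-Row-Hash counter and models the Row ID as the pair (Row Hash, Uniqueness Value); the class `Amp` and its fields are hypothetical names, and the pairing of the second Row Hash with Allen is an assumption.

```python
from collections import defaultdict

class Amp:
    """Toy AMP that assigns Uniqueness Values and keeps rows in Row ID order."""

    def __init__(self):
        self._last_uniqueness = defaultdict(int)  # row hash -> last value handed out
        self.rows = []                            # [((row_hash, uniqueness), row), ...]

    def insert(self, row_hash: int, row: dict):
        self._last_uniqueness[row_hash] += 1
        row_id = (row_hash, self._last_uniqueness[row_hash])
        self.rows.append((row_id, row))
        self.rows.sort(key=lambda entry: entry[0])  # sort by Row ID
        return row_id

DAVIS_HASH = 0b00000000000000000000000000111111  # Row Hash shown for Davis
ALLEN_HASH = 0b11111111110000000000000000000000  # assumed to belong to Allen

amp4 = Amp()
ids = [amp4.insert(DAVIS_HASH, {"lname": "Davis"}) for _ in range(3)]
ids.append(amp4.insert(ALLEN_HASH, {"lname": "Allen"}))
print(ids)
```

The three Davis rows share one Row Hash and differ only in the Uniqueness Value, which is what keeps every Row ID on the AMP distinct.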
SELECT *
FROM Employee
WHERE EMP = 99;

[Figure: The PE runs the value 99 through the HASH Formula to get Row Hash 00001111000011110000111100001111, finds the bucket in the Hash Map, and sends the request to AMP 4. AMP 4 locates the row by its Row Hash among the rows it holds (Row Hashes 00001111000011110000111100001111, 01010101010101010000000000000000, 01010111111111111111111111111111, and 11111111111111111100000000000000, each with a Uniqueness Value of 1).]
Row Distribution
In the examples below we see three different Teradata Systems. The first system has
used Last_Name as a Non-Unique Primary Index (NUPI). The second example has used
Sex_Code as a Non-Unique Primary Index (NUPI).
The last example uses
Employee_Number as a Unique Primary Index (UPI).
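The effect of the three Primary Index choices can be explored with a small simulation. This is a sketch under assumptions: `zlib.crc32` stands in for Teradata's hash, the system has 4 AMPs, and the twelve employee rows pair names and numbers arbitrarily for illustration.

```python
import zlib
from collections import Counter

NUM_AMPS = 4

def amp_for(value) -> int:
    """Stand-in hash: maps a Primary Index value to an AMP number 1..NUM_AMPS."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1

# Hypothetical rows: (employee_number, last_name, sex_code).
employees = [
    (1, "Davis", "M"), (5, "Davis", "F"), (77, "Woods", "M"),
    (22, "Jones", "F"), (9, "Smith", "M"), (15, "Johnson", "F"),
    (13, "Smith", "M"), (99, "Kelly", "F"), (2, "Kelly", "M"),
    (34, "Hanson", "F"), (16, "Hanson", "M"), (4, "Rex", "F"),
]

def rows_per_amp(column: int) -> Counter:
    """Count rows per AMP when the given column is the Primary Index."""
    return Counter(amp_for(row[column]) for row in employees)

for name, column in [("Last_Name (NUPI)", 1), ("Sex_Code (NUPI)", 2),
                     ("Employee_Number (UPI)", 0)]:
    print(name, dict(sorted(rows_per_amp(column).items())))
```

With only two distinct Sex_Code values, at most two AMPs can ever receive rows, no matter how many AMPs the system has. A unique column gives the hash the most distinct values to spread.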
[Figure: Example #1 (Last_Name as NUPI): rows with the same last name, such as Davis, Kelly, Hanson, and Smith, sit together on the same AMP. Example #2 (Sex_Code as NUPI): the Male rows sit on one AMP and the Female rows on another, leaving the other AMPs without rows. Example #3 (Employee_Number as UPI): the twelve employee numbers are spread across the AMPs.]
[Figure: A visual for data layout. AMP 1 through AMP 4 each hold a portion of the Employee, Order, Customer, and Student tables.]
Primary Index (fastest) - Whenever a Primary Index is utilized in the SQL WHERE
clause, the PE will be able to use the Primary Index to get the data with a one-AMP
operation.
Secondary Index (next fastest) - If the Primary Index is not utilized, Teradata
can sometimes utilize a secondary index. It is not as fast as the Primary Index, but it is
much faster than a full table scan.
Full Table Scan (FTS) (slowest) - Full Table Scans are a way to access
Teradata without using an index. Teradata handles full table scans brilliantly because,
thanks to parallel processing, each AMP reads only its own portion of the table and each
data row is accessed only once. Each data block per table is read only once.
AMP
Emp  Dept  Name       Sal
99   10    Vu Du      55000
88   20    Sue Lou    59000
75   30    Bill Lim   76000

AMP
Emp  Dept  Name       Sal
45   10    Ty Law     58000
56   20    Kim Hon    57000
83   30    Jela Rose  79000

AMP
Emp  Dept  Name       Sal
22   10    Al Jon     85000
38   40    Bee Lee    59000
25   30    Kit Mat    96000

AMP
Emp  Dept  Name       Sal
44   40    Sly Win    85000
57   40    Wil Mar    59000
93   10    Ken Dew    96000
When Teradata does a Full Table Scan of the above, how many
rows are read? 12. How many per AMP? 3.
[Table residue from the Data Layout Summary: the surviving cells are NUPI, 0-1, and 0-Many.]
Chapter 3
DBC owns 100% of the Permanent Space of a new system. If DBC owns 100 Gigabytes of
Perm Space, then it actually owns 25 GB per AMP on a 4-AMP system, because all space is
calculated on a per-AMP basis.

[Figure: 100 Gigabytes of Perm Space spread over four AMPs, 25 GB on each.]
DBC owns 100% of the Permanent Space of a new system (100 Gigabytes). If DBC gives
40 Gigabytes of Perm Space to MRKT, then out of the 100 Gigabytes DBC owns 60 GB and
MRKT owns 40 GB.
[Figure: The Teradata hierarchy. Starting from 100 Gigabytes (DBC owns 60 GB, MRKT owns 40 GB), space is passed further down the hierarchy: DBC 10 GB, SALES 50 GB, MRKT 20 GB, Advertising 20 GB.]
[Figure: Sales is assigned 10 Gigabytes of Spool and MRKT is assigned 20 Gigabytes of Spool. USER 1, USER 2, and USER 3 are shown beneath them.]
How much spool space can be assigned to the users in MRKT? Could they each run a
query simultaneously that reached 19.5 Gigabytes of spool? Yes!
Copyright Open Systems Services 2004
[Figure: Four AMPs, each with PERM space, SPOOL space, and TEMP space.]
Name   Perm              Spool
-----  ----------------  ----------------
MRKT   10,000,000 Bytes  10,000,000 Bytes
STEVE  1,000,000 Bytes   10,000,000 Bytes
SALES  5,000,000 Bytes   5,000,000 Bytes
Mandy  1,000,000 Bytes   10,000,000 Bytes
Answers:
_______10M_________
_______5M___________
Chapter 4
V2R4 Example
If you are on a V2R4 machine then each table is distributed to the AMPs based on
Primary Index Row Hash and then sorted on that AMP by Row ID. The example below
is also a Non-Partitioned Primary Index in V2R5.
Order Table portions on two AMPs (the figure labels one of them AMP 2), each sorted by Row Hash:

Order Table
Row Hash  Order Date  Order Number
--------  ----------  ------------
01        2-1-2003    99
05        1-1-2003    88
08        3-1-2003    95
09        1-2-2003    6
80        1-5-2003    77
87        2-4-2003    14
98        3-2-2003    17

Order Table
Row Hash  Order Date  Order Number
--------  ----------  ------------
02        2-2-2003    44
04        1-10-2003   53
12        3-5-2003    16
42        1-6-2003    100
52        3-6-2003    35
55        2-5-2003    15
88        1-22-2003   74
V2R5 Partitioning
Notice that the Primary Index is now a Partitioned Primary Index on ORDER_DATE. The
Order_Date was hashed and rows were distributed to the same exact AMP as before. The
only difference is that the data is now stored in partitions by Order_Date month and,
within each partition, sorted by Row Hash. A query that asks only for January orders does
not take a Full Table Scan because the January orders are all together in their partition.
Partitioned Primary Indexes (PPI) are best for queries that specify range constraints.
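The change in row order can be reproduced with a short sketch. It uses the seven rows of the first Order Table portion from the V2R4 example, reads the dates as month-day-year, and assumes one partition per month; the function name `partition_of` is hypothetical.

```python
from datetime import date

# (row_hash, order_date, order_number) for one AMP, from the V2R4 example.
rows = [
    (1, date(2003, 2, 1), 99), (5, date(2003, 1, 1), 88),
    (8, date(2003, 3, 1), 95), (9, date(2003, 1, 2), 6),
    (80, date(2003, 1, 5), 77), (87, date(2003, 2, 4), 14),
    (98, date(2003, 3, 2), 17),
]

def partition_of(order_date: date) -> int:
    """Assumed partitioning: one partition per month (1 = January)."""
    return order_date.month

# Non-partitioned: the AMP keeps its rows sorted by Row Hash only.
npi_order = sorted(rows, key=lambda r: r[0])

# Partitioned: sorted by partition first, then by Row Hash inside it.
ppi_order = sorted(rows, key=lambda r: (partition_of(r[1]), r[0]))

# Group the partitioned rows so one partition can be read on its own.
partitions = {}
for r in ppi_order:
    partitions.setdefault(partition_of(r[1]), []).append(r)

# Partition elimination: a January query touches only partition 1.
january = partitions[1]
print([r[2] for r in january])
```

The rows never move to a different AMP; only their order on the AMP changes, which is why a range query can skip whole partitions.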
Order Table portions on the same two AMPs, now partitioned by month and sorted by Row Hash within each partition:

Order Table
Row Hash  Order Date  Order Number
--------  ----------  ------------
05        1-1-2003    88
09        1-2-2003    6
80        1-5-2003    77
01        2-1-2003    99
87        2-4-2003    14
08        3-1-2003    95
98        3-2-2003    17

Order Table
Row Hash  Order Date  Order Number
--------  ----------  ------------
04        1-10-2003   53
42        1-6-2003    100
88        1-22-2003   74
02        2-2-2003    44
55        2-5-2003    15
12        3-5-2003    16
52        3-6-2003    35
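Employee_Table on AMP 1
Partition  Employee  Dept  Name
---------  --------  ----  ------
Part 1     99        10    Tom
Part 1     75        10    Mike
Part 1     56        10    Sandy
Part 2     30        20    Leona
Part 2     54        20    Robert
Part 2     40        20    Morgan

Employee_Table on AMP 2
Partition  Employee  Dept  Name
---------  --------  ----  ------
Part 1     13        10    Ray
Part 1     12        10    Jeff
Part 1     21        10    Randy
Part 2     16        20    Janie
Part 2     55        20    Chris
Part 2     70        20    Gareth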
SELECT *
FROM Employee_Table
WHERE Dept = 20;
Answer: 1
Partitioned Primary Indexes reduce the number of rows that are processed by using
partition elimination.
[Figure: The same Employee_Table on AMP 1 and AMP 2, with Dept 10 rows in Part 1 and Dept 20 rows in Part 2.]
Example 1:
Example 2:
SELECT *
FROM Employee_Table
WHERE employee = 99
AND Dept = 10;
Partition is Dept
Just like the CASE statement it evaluates a list of conditions picking only the first
condition met.
The data row will be placed into a partition associated with that condition.
The data row is placed into the partition that falls within the associated range.
In the example below please notice the arrows. They are designed to illustrate that you
can use a UNIQUE PRIMARY INDEX on a Partitioned table when the Partition is part
of the PRIMARY INDEX.
Partition by CASE_N
( Salary < 30000,
Salary < 50000,
Salary < 100000,
Salary < 1000000,
NO CASE OR UNKNOWN)
With NO CASE OR UNKNOWN, NULLs and all other rows that don't meet the CASE criteria
share a single partition, so this example has a total of 5 partitions.
If NO CASE and UNKNOWN are separated by a comma instead of the OR operand, as in the
example below, then NULLs will be placed in the UNKNOWN partition and all other rows
that don't meet the CASE criteria will be placed in the NO CASE partition. That example
has a total of 6 partitions.
Partition by CASE_N
( Salary < 30000,
Salary < 50000,
Salary < 100000,
Salary < 1000000,
NO CASE, UNKNOWN)
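The partition choice can be mimicked in Python. This is a simplified sketch, not Teradata's implementation: it assumes a NULL salary is represented by `None`, and the function name `case_n` and its `combine` flag (which switches between `NO CASE OR UNKNOWN` and `NO CASE, UNKNOWN`) are hypothetical.

```python
def case_n(salary, limits=(30000, 50000, 100000, 1000000), combine=False):
    """Return the partition for a salary: 1..4, or a catch-all label.

    Conditions are tested in order and only the first one met counts,
    just like a CASE statement.
    """
    if salary is None:  # NULL: no condition can be evaluated
        return "NO CASE OR UNKNOWN" if combine else "UNKNOWN"
    for number, limit in enumerate(limits, start=1):
        if salary < limit:
            return number
    return "NO CASE OR UNKNOWN" if combine else "NO CASE"

for value in (25000, 45000, 2000000, None):
    print(value, "->", case_n(value))
```

A salary of 45000 satisfies both `Salary < 50000` and `Salary < 100000`, but only the first match decides the partition.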
Chapter 5
Transaction Concept
Transient Journal
FALLBACK
RAID
Clustering
Cliques
Permanent Journaling
The Transient Journal is automatic and it takes a before picture of any update or delete
for rollback purposes.
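The before-picture idea can be sketched as a tiny in-memory table. This is an illustration only: the class `Table`, its methods, and the dictionary layout are hypothetical, and the Davis salary of 78,000 is borrowed from the book's Transient Journal picture.

```python
class Table:
    """Toy table that journals a before-image of every update or delete."""

    def __init__(self, rows):
        self.rows = dict(rows)  # key -> row value
        self.journal = []       # before-images of the open transaction

    def update(self, key, new_value):
        self.journal.append((key, self.rows.get(key)))  # snapshot first
        self.rows[key] = new_value

    def delete(self, key):
        self.journal.append((key, self.rows.get(key)))
        del self.rows[key]

    def commit(self):
        self.journal.clear()  # before-images are no longer needed

    def rollback(self):
        while self.journal:
            key, before = self.journal.pop()  # undo the newest change first
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before

salaries = Table({"Davis": 78000})
salaries.update("Davis", 85000)
salaries.delete("Davis")
salaries.rollback()
print(salaries.rows)
```

Undoing changes in reverse order matters when the same row is touched more than once in a transaction.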
[Figure: Each of four AMPs has its own Transient Journal. A camera takes the before picture of a row: Last Name Davis, Salary 78,000.]
FALLBACK Protection
AMP    Base Table Rows  Fallback Rows
-----  ---------------  -------------
AMP 1  1, 5, 9          10, 7, 4
AMP 2  2, 6, 10         1, 11, 8
AMP 3  3, 7, 11         5, 2, 12
AMP 4  4, 8, 12         9, 6, 3
Fallback Clusters
Fallback is always associated with CLUSTERS. Fallback can be specified at the
table level. Fallback is worth the price because when an AMP fails, users still have
access to the data even while the AMP is offline. Any data that changed while the AMP
was offline is automatically restored when the AMP comes back online.
If we can lose any one AMP/disk, what happens if we lose two? The chance of losing
two AMPs in a four-AMP system is small; however, some systems have nearly 2,000
AMPs, and the chance of losing two AMPs in a 2,000-AMP system is much
greater than in a four-AMP system. That's why Teradata designed Clustering. With
Clustering, Teradata can lose one AMP/disk per cluster. Let's look at this next
example with 8 AMPs in two clusters.
Notice that the data in the base table lays out evenly, with 24 records on 8 AMPs. What
is key to notice is that the fallback copy remains within the cluster. In other words, the
base table rows in cluster one are fallback protected within cluster one. The base table
rows in cluster two are fallback protected within cluster two. We can lose one
AMP/disk in both cluster one and cluster two and the system is fine.
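The clustering rule can be checked with a short simulation of 24 rows on 8 AMPs in two clusters of four. The placement formulas in `base_amp` and `fallback_amp` are assumptions chosen so that every fallback copy lands on a different AMP inside the same cluster; they are not taken from Teradata itself.

```python
NUM_AMPS, CLUSTER_SIZE = 8, 4
ROWS = range(1, 25)

def base_amp(row: int) -> int:
    """Rows 1..24 are laid out round-robin over AMPs 1..8."""
    return (row - 1) % NUM_AMPS + 1

def fallback_amp(row: int) -> int:
    """Place the fallback copy on another AMP of the same cluster."""
    amp = base_amp(row)
    cluster_start = (amp - 1) // CLUSTER_SIZE * CLUSTER_SIZE  # 0 or 4
    offset = (amp - 1) % CLUSTER_SIZE                         # position in cluster
    shift = (row - 1) // NUM_AMPS % (CLUSTER_SIZE - 1) + 1    # 1, 2, or 3, never 0
    return cluster_start + (offset + shift) % CLUSTER_SIZE + 1

def readable_rows(failed_amps: set) -> list:
    """A row stays readable while its base or fallback copy is on a live AMP."""
    return [r for r in ROWS
            if base_amp(r) not in failed_amps or fallback_amp(r) not in failed_amps]

print(len(readable_rows({1, 5})))  # one AMP down in each cluster
print(len(readable_rows({1, 2})))  # two AMPs down in the same cluster
```

Losing one AMP in each cluster leaves every row readable, while losing two AMPs in the same cluster makes some rows unavailable. That limit is the reason for keeping clusters small.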
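Cluster # 1
AMP    Base Table Rows  Fallback Rows
-----  ---------------  -------------
AMP 1  1, 9, 17         18, 11, 4
AMP 2  2, 10, 18        1, 19, 12
AMP 3  3, 11, 19        9, 2, 20
AMP 4  4, 12, 20        17, 10, 3

Cluster # 2
AMP    Base Table Rows  Fallback Rows
-----  ---------------  -------------
AMP 5  5, 13, 21        22, 15, 8
AMP 6  6, 14, 22        5, 23, 16
AMP 7  7, 15, 23        13, 6, 24
AMP 8  8, 16, 24        21, 14, 7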
[Figure: The same two clusters with one AMP in Cluster # 1 down. AMPs in that cluster are marked DARJ; Cluster # 2 is unchanged.]
[Figure: An AMP with its data disk and a mirror copy.]
Cliques
Teradata CLIQUES (pronounced cleeks) are a method of system protection against the
failure of an entire node. Each node contains in memory AMP VPROCs. Each AMP is
attached to one virtual disk (Vdisk) and that AMP is the only Vproc allowed access to
its Vdisk. A Clique utilizes access to a set of disks from another node. If a node fails the
AMP VPROCs can migrate to the node that has the backup access to its virtual disk. The
migrating AMP can continue to read and write to its Vdisk while its home node is down.
When the home node is fixed and available again the VPROCs return home.
If a Teradata system uses two-node cliques then when one node fails all of its AMP
VPROCs migrate to the other node. The system is now about 50% slower. To solve this
problem Teradata allows bigger cliques such as eight nodes. If one node fails, its
VPROCs split up and migrate amongst the seven other nodes in the clique without much
performance degradation.
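The arithmetic behind clique sizing can be sketched as follows. It rests on a simplifying assumption that is not stated in the text: a node's speed is inversely proportional to the number of AMPs it hosts, and AMPs from a failed node are shared equally by the survivors. Both function names are hypothetical.

```python
def amps_per_surviving_node(nodes_in_clique: int, amps_per_node: int) -> float:
    """AMPs each surviving node hosts after one node in the clique fails."""
    total_amps = nodes_in_clique * amps_per_node
    return total_amps / (nodes_in_clique - 1)

def relative_throughput(nodes_in_clique: int) -> float:
    """Throughput of a surviving node relative to normal operation."""
    return (nodes_in_clique - 1) / nodes_in_clique

for size in (2, 8):
    print(size, "node clique:",
          amps_per_surviving_node(size, 8), "AMPs per node,",
          f"{relative_throughput(size):.0%} of normal speed")
```

Under this model a two-node clique drops to half speed after a failure, which matches the "about 50% slower" figure, while an eight-node clique keeps seven eighths of its speed.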
[Figure: A two-node clique. Each node (Intel processors, AMPs, BYNET connection) is attached through clique cables to the disk array cabinets (DACs) of the other node as well as its own.]
[Figure: One node in the clique has failed (marked X). Its AMPs migrate to Node 2, which reaches their DACs through the clique cables.]
[Figure: A Teradata cabinet with dual power, BYNET connections, disk array cabinets (DACs), and Nodes 1 through 4, each node containing PEs, memory, and AMPs.]
Permanent Journal
Table create with Fallback and Permanent Journaling

CREATE TABLE Teratom.Employee,
FALLBACK,
BEFORE JOURNAL,
DUAL AFTER JOURNAL
(
 emp        INTEGER
,dept       INTEGER
,lname      CHAR(20)
,fname      VARCHAR(20)
,salary     DECIMAL(10,2)
,hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp);

The example created the table called Employee in the Teratom database, and it is
FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified.
Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that
if you don't specify this protection at either the table or database level, the default is NO
FALLBACK and NO JOURNALING.
Locks
[Figure: The four lock types (Exclusive, Write, Read, and Access) and the three levels at which a lock can be placed (Database, Table, and Row Hash).]
Teradata Lock   Compatible Locks
--------------  ----------------------------------
Exclusive Lock  No Compatibility
Write Lock      Access Lock
Read Lock       Read Lock, Access Lock
Access Lock     Access Lock, Read Lock, Write Lock
An ACCESS Lock is an excellent way to avoid waiting for a write lock currently on a
particular table. Two statements allow this:
Locking Row for Access
Locking Tablename for Access
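The compatibility rules can be written down as a small lookup. This sketch encodes the lock chart directly; the names `COMPATIBLE` and `must_wait` are hypothetical.

```python
# For each lock already held, the set of lock types that may be granted
# on the same object at the same time.
COMPATIBLE = {
    "EXCLUSIVE": set(),
    "WRITE": {"ACCESS"},
    "READ": {"READ", "ACCESS"},
    "ACCESS": {"ACCESS", "READ", "WRITE"},
}

def must_wait(held: str, requested: str) -> bool:
    """True when the requested lock has to queue behind the held lock."""
    return requested not in COMPATIBLE[held]

# A reader queues behind a writer, but an ACCESS lock does not.
print(must_wait("WRITE", "READ"))    # True
print(must_wait("WRITE", "ACCESS"))  # False
```

The second call shows why an ACCESS lock avoids the wait: it is the only lock type compatible with a WRITE lock.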
Chapter 6
warehouses in the world. The combination of FastLoad, MultiLoad, and TPump can load
millions, even billions of records in record time.
FastLoad is designed to load flat file data from a mainframe or LAN directly into an
empty Teradata table. This is how a Teradata table is populated the first time. I have
personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I
have seen Teradata load millions of rows in minutes. How is Teradata's speed and
performance accomplished? Once again, it's through the power of parallel processing.
Where FastLoad is meant to populate empty tables with INSERTs, MultiLoad is meant to
process INSERTs, UPDATEs, and DELETEs on tables that have existing data.
MultiLoad is extremely fast. One major Teradata data warehouse company processes
120 million inserts, updates, and deletes nightly during its batch window.
The TPump utility is designed to allow OLTP transactions to immediately load into a
data warehouse. When I started working with Teradata, more than 10 years ago, most
companies loaded data on a monthly basis. Suddenly, companies began to load data
weekly.
Today, most companies load data nightly, and industry leaders are loading data hourly.
TPump is the beginning step of an Active Data Warehouse (ADW). ADW combines
OLTP transactions with the power of a Decision Support System (DSS).
The TPump utility theoretically acts like a water faucet. TPump can be set to full throttle
to load millions of transactions during off peak hours or turned down to trickle small
amounts of data during the data warehouse daily rush hour. It can also be automatically
preset to load levels at certain times during the day, and can be modified at any time.
Also, TPump locks at a row level so users have access to the rest of the rows while the
table is being loaded. Another advantage of this load utility is that it allows for multiple
updates to be conducted on a table simultaneously.
When the utilities start, the Parsing Engine comes up with a plan for the AMPs. The
Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in
large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of
workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from
AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they
received and hash the Primary Index value sending the rows over the BYNET to their
destination AMP. Once this is done, each AMP sorts its data by Row ID and the table is
ready for business.
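The three phases described above (hand out blocks, redistribute by hash, sort) can be sketched as follows. This is a toy model under assumptions: `zlib.crc32` stands in for the hash formula, blocks hold 3 rows instead of 64K, blocks are dealt to the AMPs round-robin, and the names `load`, `row_hash`, and `amp_for` are hypothetical.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3  # toy block size; the text describes 64K blocks

def row_hash(pi_value) -> int:
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def amp_for(pi_value) -> int:
    return row_hash(pi_value) % NUM_AMPS  # 0-based AMP index

def load(rows, pi_column=0):
    # Phase 1: deal the incoming blocks to the AMPs without looking inside.
    staging = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    for n, block in enumerate(blocks):
        staging[n % NUM_AMPS].extend(block)
    # Phase 2: each AMP hashes the Primary Index value and forwards the row
    # to its destination AMP.
    final = [[] for _ in range(NUM_AMPS)]
    for received in staging:
        for row in received:
            final[amp_for(row[pi_column])].append(row)
    # Phase 3: each AMP sorts what it now owns by row hash.
    for amp_rows in final:
        amp_rows.sort(key=lambda row: row_hash(row[pi_column]))
    return final

rows = [(n, f"name{n}") for n in range(1, 21)]
amps = load(rows)
print([len(amp_rows) for amp_rows in amps])
```

Splitting the work this way lets every AMP take part in both receiving and redistributing, so no single component has to touch every row.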
FastLoad
FastLoad populates empty tables at the block level. Teradata LOADs using FastLoad.
[Figure: FastLoad Picture. Inside Teradata, the PE and four AMPs, each AMP holding its portion of an empty table.]
Multiload
Multiload loads to populated tables at the block level. Teradata UPDATEs using
MULTILOAD.
[Figure: Multiload Picture. Inside Teradata, the PE and four AMPs, each AMP holding its portion of a populated table.]
TPump
TPump is used for continuous updates to rows in a table. Teradata STREAMs using
TPump.
[Figure: TPump Picture. Inside Teradata, the PE and four AMPs, each AMP holding its portion of a populated table and applying row-level locks.]
FastExport
[Figure: FastExport Picture. The PE and four AMPs, each holding its portion of a populated table, send output to a host file on a mainframe or LAN.]
Chapter 7
USI subtable row (in the Secondary Index subtable) that references the actual data row,
which resides on the second AMP.
A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool
file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:
Processing aggregates
Value comparisons
Joining tables
Secondary Index Value  Secondary Index Row-ID  Primary Index Row-ID
---------------------  ----------------------  --------------------
(Actual Length)        8 Bytes                 8 Bytes
[Figure: USI subtable example. Two AMPs each hold three base-table rows (Row IDs 04,1, 18,1, 25,1 on one AMP and 14,1, 38,1, 45,1 on the other) and a USI subtable whose rows contain a Secondary Index Value such as 123-99-8888, a Secondary Index Row-ID, and a Base Table Row-ID.]
[Figure: the same base table rows and secondary index subtables on two AMPs, shown with a Hash Map (1, 2, 2, 1, 1, 2) and step labels pointing at each Index Subtable.]
STEP 1: Locate the Secondary Index Value in the subtable. Find the Base Table Row-ID.
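The lookup in the figure can be sketched in a few lines of Python. This is only an illustration built from the rows shown in the figure: the dictionaries stand in for the two AMPs, the function name usi_lookup is made up, and a plain search stands in for the hashing Teradata uses to pick an AMP. The second step (fetching the base table row from the AMP that owns the Row-ID) is assumed from the figure's layout.

```python
# Illustrative sketch only: dictionaries stand in for AMPs, and a plain
# search stands in for the hashing Teradata would use to pick an AMP.

# Base table rows per AMP: Row-ID -> (dept, first name, last name, SSN).
BASE = {
    1: {"04,1": (20, "John", "Marx", "276-68-2130"),
        "18,1": (10, "Mary", "Mavis", "235-83-8712"),
        "25,1": (30, "John", "Davis", "423-87-8653")},
    2: {"14,1": (10, "Max", "Wiles", "146-69-2650"),
        "38,1": (10, "Will", "Berry", "212-53-4532"),
        "45,1": (40, "Oki", "Ngu", "123-99-8888")},
}

# Secondary index subtable per AMP: index value -> base table Row-ID.
USI_SUBTABLE = {
    1: {"123-99-8888": "45,1", "146-69-2650": "14,1", "235-83-8712": "18,1"},
    2: {"276-68-2130": "04,1", "423-87-8653": "25,1", "212-53-4532": "38,1"},
}


def usi_lookup(ssn):
    """Return (subtable AMP, base table AMP, row), or None if not found."""
    # Step 1: locate the secondary index value in the subtable and
    # read the base table Row-ID stored with it.
    for sub_amp, subtable in USI_SUBTABLE.items():
        if ssn in subtable:
            row_id = subtable[ssn]
            break
    else:
        return None
    # Step 2 (assumed): go to the AMP that owns that Row-ID and
    # fetch the base table row.
    for base_amp, rows in BASE.items():
        if row_id in rows:
            return sub_amp, base_amp, rows[row_id]
    return None


result = usi_lookup("123-99-8888")
```

For the value 123-99-8888 the subtable row sits on the first AMP while the data row sits on the second, so the lookup touches two AMPs. For 235-83-8712 both happen to be on the first AMP.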
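[Figure: base table rows and secondary index subtable rows on two AMPs, with the index on first name. One subtable row can point to several base table rows.]

First AMP, base table
ROW ID  Dept  First  Last   SSN
04,1    20    John   Marx   276-68-2130
18,1    10    Mary   Mavis  235-83-8712
25,1    30    John   Davis  423-87-8653

First AMP, secondary index subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
John                   145,1                   04,1 25,1
Mary                   156,1                   18,1

Second AMP, base table
ROW ID  Dept  First  Last   SSN
14,1    10    Max    Wiles  146-69-2650
38,1    10    Will   Berry  212-53-4532
45,1    40    Oki    Ngu    123-99-8888

Second AMP, secondary index subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
Max                    134,1                   14,1
Will                   157,1                   38,1
Oki                    159,1                   45,1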
[Figure: Find John. The same base table rows and first-name subtables on two AMPs, with asterisks marking the rows for John: the subtable row on the first AMP (Secondary Index Row-ID 145,1) and the base table rows 04,1 and 25,1.]
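The "Find John" search can be sketched as follows. This is an illustration only, using the rows from the figure: the dictionaries stand in for AMPs, the name nusi_lookup is made up, and the list called spool is a simplified stand-in for a spool file.

```python
# Illustrative sketch only: every AMP checks its own subtable, because
# each AMP indexes just the base table rows it holds.

# Base table rows per AMP: Row-ID -> (dept, first name, last name).
BASE = {
    1: {"04,1": (20, "John", "Marx"),
        "18,1": (10, "Mary", "Mavis"),
        "25,1": (30, "John", "Davis")},
    2: {"14,1": (10, "Max", "Wiles"),
        "38,1": (10, "Will", "Berry"),
        "45,1": (40, "Oki", "Ngu")},
}

# NUSI subtable per AMP: first name -> list of local base table Row-IDs.
NUSI_SUBTABLE = {
    1: {"John": ["04,1", "25,1"], "Mary": ["18,1"]},
    2: {"Max": ["14,1"], "Will": ["38,1"], "Oki": ["45,1"]},
}


def nusi_lookup(first_name):
    """Return (number of AMPs searched, qualifying rows collected in spool)."""
    spool = []
    amps_searched = 0
    for amp in sorted(NUSI_SUBTABLE):
        amps_searched += 1  # all-AMP operation: no AMP can be skipped
        for row_id in NUSI_SUBTABLE[amp].get(first_name, []):
            spool.append((amp, row_id, BASE[amp][row_id]))
    return amps_searched, spool


amps, rows = nusi_lookup("John")
```

Both AMPs are searched, but only the subtable rows are read, and only the first AMP then touches its base table. That is why the access is still cheaper than reading every base table row.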
Employee Table with Value-Ordered Non-Unique Secondary Index on Dept

First AMP, Employee Base Table
ROW ID  Dept  First  Last   SSN
04,1    20    John   Marx   276-68-2130
18,1    10    Mary   Mavis  235-83-8712
25,1    30    John   Davis  423-87-8653

First AMP, secondary index subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
10                     145,1                   18,1
20                     156,1                   04,1
30                     158,1                   25,1

Second AMP, Employee Base Table
ROW ID  Dept  First  Last   SSN
14,1    10    Max    Wiles  146-69-2650
38,1    10    Will   Berry  212-53-4532
45,1    40    Oki    Ngu    123-99-8888

Second AMP, secondary index subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
10                     145,1                   14,1 38,1
40                     159,1                   45,1
Each AMP holds the secondary index values only for its own rows in the base table.
In our example, each AMP holds the Dept column for all employee rows in the base
table on that AMP (AMP local).
Each AMP-local Dept entry carries the Base Table Row-ID (pointer), so the AMP can
retrieve the row quickly if needed. This is excellent for range queries because the
subtable is sorted numerically by Dept.
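To see why the sort order helps range queries, here is a small Python sketch using the Dept subtables from the example. It is only an illustration: the lists stand in for each AMP's subtable, the name dept_range is made up, and a binary search from the standard bisect module stands in for however Teradata actually positions itself in the sorted subtable.

```python
from bisect import bisect_left, bisect_right

# Illustrative sketch only. Each AMP's subtable is a list sorted by Dept:
# (dept value, list of local base table Row-IDs).
VO_NUSI = {
    1: [(10, ["18,1"]), (20, ["04,1"]), (30, ["25,1"])],
    2: [(10, ["14,1", "38,1"]), (40, ["45,1"])],
}


def dept_range(low, high):
    """Return {AMP: Row-IDs} for rows with low <= Dept <= high."""
    hits = {}
    for amp, subtable in VO_NUSI.items():
        depts = [dept for dept, _ in subtable]
        # Because the values are sorted, the qualifying entries form one
        # contiguous slice; there is no need to inspect every entry.
        start = bisect_left(depts, low)
        stop = bisect_right(depts, high)
        hits[amp] = [rid for _, rids in subtable[start:stop] for rid in rids]
    return hits


in_range = dept_range(10, 20)
```

A query for Dept between 10 and 20 reads the first two entries on the first AMP and only the first entry on the second AMP.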
Index  AMPs  Rows Returned
UPI    1     0-1
NUPI   1     0-Many
USI    2     0-1
NUSI   All   0-Many
Chapter 8
OLTP Environments
Pre-defined Reports
Ad Hoc Queries
Data Mining
Analytical Modeling
[Figure: tactical queries taking write locks alongside a DSS query taking a read lock.]
An Active Data Warehouse consists of short, tactical, OLTP-type queries mixed with large
decision support queries. The OLTP queries tend to WRITE lock the data, which is bad
when other queries only need to READ it.
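The conflict can be sketched with a tiny compatibility check. This is an illustration only: it covers just the READ and WRITE locks named here, and the table and the function name can_grant are made up for the example.

```python
# Illustrative sketch only: compatibility of a requested lock with a
# lock that is already held, limited to the READ and WRITE locks above.
COMPATIBLE = {
    ("READ", "READ"): True,     # many readers can share the data
    ("READ", "WRITE"): False,   # a reader waits behind a writer
    ("WRITE", "READ"): False,   # a writer waits behind readers
    ("WRITE", "WRITE"): False,  # writers wait behind each other
}


def can_grant(requested, held_locks):
    """True when the requested lock is compatible with every held lock."""
    return all(COMPATIBLE[(requested, held)] for held in held_locks)


dss_blocked = not can_grant("READ", ["WRITE"])
```

A DSS query asking for a READ lock is granted immediately when only other readers hold the data, but has to wait as soon as a tactical query holds a WRITE lock on it.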
The evolution of a true data warehouse takes time, and its activities will naturally
evolve towards an active data warehouse. In the beginning the warehouse is used for
analyzing, which over time evolves into predicting and finally into operationalizing.
Detail Data
Data Marts
[Figure: a data warehouse holding tables of detail data, with a logical data mart drawn inside it.]
There are two types of data marts: logical and physical. A logical data mart is an
existing part of the data warehouse, whereas a physical data mart resides on
another platform.
TDQM
[Screenshot: TDQM menu with File, Configuration, Server, Information, Error Log, and Help.]
TDQM lets you establish a period of time during which it can execute scheduled
requests that are waiting to run. This is usually done during off-peak hours. TDQM
uses three terms. A job is an individual execution of a scheduled request. A request is
the definition of the parameters and text associated with a scheduled request. Finally,
a scheduled request is a stored script of SQL requests to be executed at a scheduled
time later in the day.
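The three terms and the execution window can be pictured with a small Python model. This is only a sketch for keeping the vocabulary straight: the class names, field names, and the in_window helper are hypothetical and are not part of TDQM, and the window times in the usage line are made up.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Request:
    """The parameters and text associated with a scheduled request."""
    sql_text: str
    parameters: dict = field(default_factory=dict)


@dataclass
class ScheduledRequest:
    """A stored script of SQL requests to run at a scheduled time."""
    name: str
    requests: list
    run_at: time


@dataclass
class Job:
    """One individual execution of a scheduled request."""
    scheduled_request: ScheduledRequest
    run_number: int


def in_window(now, start, end):
    """True when now falls inside the execution window.

    The window may wrap past midnight, as off-peak windows often do.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical off-peak window from 22:00 to 06:00.
nightly = ScheduledRequest("nightly_report", [Request("SELECT 1;")], time(23, 0))
job = Job(nightly, run_number=1)
may_run = in_window(nightly.run_at, time(22, 0), time(6, 0))
```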
Index Wizard
- Define a workload
- The workload is analyzed
- The Wizard recommends Secondary Indexes
- Reports are generated
- Indexes are validated
- Indexes can be applied
Archive Recovery
ARC provides data protection when data is lost on a failed AMP containing
Non-Fallback tables, or when multiple AMPs go down within the same cluster, rendering
Fallback useless for that cluster. It can also be used when objects are dropped, when
rows are deleted from a table, or when batch processing goes wrong. When you think of
ARC, think first of disaster recovery and second of accidental stupid mistakes. Either
way, ARC has got your back!
ARC does NOT work with Join Indexes or Hash Indexes. If you need to recover a Join
Index or Hash Index, first make sure the tables it was created on are intact, and then
drop and recreate the Join Index or Hash Index manually. Many DBAs save the DDL for
Join Index and Hash Index creation for this purpose.
There are several ways to invoke ARC, including NetVault, NetBackup, ASF2, the
ARCMAIN command line, or directly from the host or mainframe.