
Tera-Tom on Teradata

Basics for V2R5


Understanding is the key!

First Edition

Published by
Coffing Publishing

First Edition June, 2004


Web Page: www.Tera-Tom.com and www.CoffingDW.com
E-Mail address:
Tom.Coffing@CoffingDW.Com
Written by W. Coffing
Teradata, NCR, BYNET, V2R5 are registered trademarks of NCR Corporation,
Dayton, Ohio, U.S.A., IBM and DB2 are registered trademarks of IBM Corporation,
ANSI is a registered trademark of the American National Standards Institute. In
addition to these product names, all brands and product names in this document are
registered names or trademarks of their respective holders.
Coffing Data Warehousing shall have neither liability nor responsibility to any person or
entity with respect to any loss or damages arising from the information contained in this
book or from the use of programs or program segments that are included. The manual is
not a publication of NCR Corporation, nor was it produced in conjunction with NCR
Corporation.
Copyright July 2004 by Coffing Publishing
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system,
or transmitted by any means, electronic, mechanical, photocopying, recording, or
otherwise, without written permission from the publisher. No patent liability is assumed
with respect to the use of information contained herein. Although every precaution has
been taken in the preparation of this book, the publisher and author assume no
responsibility for errors or omissions, nor is any liability assumed for damages
resulting from the use of information contained herein. For information, address:
Coffing Publishing
7810 Kiester Rd.
Middletown, OH 45042
International Standard Book Number: ISBN 0-9704980-8-X

Printed in the United States of America


All terms mentioned in this book that are known to be trademarks or service marks have
been stated. Coffing Publishing cannot attest to the accuracy of this information. Use of a
term in this book should not be regarded as affecting the validity of any trademark or
service mark.

About Coffing Data Warehousing's CEO, Tom Coffing


Tom is President, CEO, and Founder of Coffing Data Warehousing. He is an
internationally known consultant, facilitator, speaker, trainer, and executive coach with
an extensive background in data warehousing. Tom has helped implement data
warehousing in over 40 major data warehouse accounts, spoken in over 20 countries, and
has provided consulting and Teradata training to over 8,000 individuals involved in data
warehousing globally.
Tom has co-authored over 20 books on Teradata and Data Warehousing. To name a few:

Secrets of the Best Data Warehouses in the World


Teradata SQL - Unleash the Power
Tera-Tom on Teradata Basics
Tera-Tom on Teradata E-business
Teradata SQL Quick Reference Guide - Simplicity by Design
Teradata Database Design - Giving Detailed Data Flight
Teradata Users Guide - The Ultimate Companion
Teradata Utilities - Breaking the Barriers

Mr. Coffing has also published over 20 data warehousing articles and has been a
contributing columnist to DM Review on the subject of data warehousing. He wrote a
monthly column for DM Review entitled, "Teradata Territory". He is a nationally known
speaker and gives frequent seminars on Data Warehousing. He is also known as "The
Speech Doctor" because of his presentation skills and sales seminars.
Tom Coffing has taken his expert speaking and data warehouse knowledge and
revolutionized the way technical training and consultant services are delivered. He
founded CoffingDW with the same philosophy more than a decade ago. Centered around
10 Teradata Certified Masters, this dynamic and growing company teaches every Teradata
class, provides world-class Teradata consultants, offers a suite of software products to
enhance Teradata data warehouses, and has eight books published on Teradata.
Tom has a bachelor's degree in Speech Communications and over 25 years of business
and technical computer experience. Tom is considered by many to be the best technical
and business speaker in the United States. He has trained and consulted at so many
Teradata sites that students affectionately call him Tera-Tom.
Teradata Certified Master
- Teradata Certified Professional
- Teradata Certified Administrator
- Teradata Certified Developer
- Teradata Certified Designer
- Teradata Certified SQL Specialist
- Teradata Certified Implementation Specialist

Table of Contents
Chapter 1 The Rules of Data Warehousing ................................................................... 1
Teradata Facts ..................................................................................................................... 2
Teradata: Brilliant by Design.............................................................................................. 3
The Teradata Parallel Architecture ..................................................................................... 4
A Logical View of the Teradata Architecture..................................................................... 6
The Parsing Engine (PE)..................................................................................................... 7
The Access Module Processors (AMPs)............................................................................. 8
The BYNET ........................................................................................................................ 9
A Visual for Data Layout.................................................................................................. 10
Teradata is a shared nothing Architecture ........................................................................ 11
Teradata has Linear Scalability......................................................................................... 12
How Teradata handles data access.................................................................................... 13
Teradata Cabinets, Nodes, VPROCs, and Disks............................................................... 14
LAN Connection for Network Attached Clients .............................................................. 15
Mainframe Connection to Teradata .................................................................................. 16
Chapter 2 Data Distribution Explained........................................................................ 17
Rows and Columns ........................................................................................................... 18
The Primary Index ............................................................................................................ 19
The Two Types of Primary Indexes.................................................................................. 20
Unique Primary Index (UPI)............................................................................................. 21
Non-Unique Primary Index............................................................................................... 22
How Teradata Turns the Primary Index Value into the Row Hash .................................. 23
The Row Hash determines the Row's Destination............................................................. 24
The Row is Delivered to the Proper AMP ........................................................................ 25
The AMP will add a Uniqueness Value............................................................................ 26
An Example of an UPI Table............................................................................................ 27
An Example of an NUPI Table......................................................................................... 28
How Teradata Retrieves Rows with the Primary Index.................................................... 29
Row Distribution............................................................................................................... 30
A Visual for Data Layout.................................................................................................. 31
Teradata accesses data in three ways ................................................................................ 32
Data Layout Summary ...................................................................................................... 33
Chapter 3 Teradata Space ............................................................................................ 35
How Permanent Space is calculated ................................................................................. 35
How Permanent Space is Given........................................................................................ 36
The Teradata Hierarchy .................................................................................................... 37
How Spool Space is calculated ......................................................................................... 38
A Spool Space Example.................................................................................................... 39
PERM, SPOOL and TEMP Space .................................................................................... 40
Spool Space controls system time..................................................................................... 41
A quiz on Perm and Spool Space...................................................................................... 42
Another quiz on Perm and Spool Space ........................................................................... 45


Chapter 4 V2R5 Partition Primary Indexes ................................................................. 47


V2R4 Example.................................................................................................................. 48
V2R5 Partitioning ............................................................................................................. 49
Partitioning doesn't have to be part of the Primary Index ................................................ 50
Partition Elimination can avoid Full Table Scans............................................................. 51
The Bad NEWS about Partitioning on a column that is not part of the Primary Index.... 52
Two ways to handle Partitioning on a column that is not part of the Primary Index ....... 53
Partitioning with CASE_N ............................................................................................... 54
Partitioning with RANGE_N............................................................................................ 55
NO CASE, NO RANGE, or UNKNOWN........................................................................ 56
Chapter 5 Data Protection............................................................................................ 57
Transaction Concept & Transient Journal ........................................................................ 58
How the Transient Journal Works .................................................................................... 59
FALLBACK Protection .................................................................................................... 60
How Fallback Works ........................................................................................................ 61
Fallback Clusters............................................................................................................... 62
Down AMP Recovery Journal (DARJ) ............................................................................ 63
Redundant Array of Independent Disks (RAID) .............................................................. 64
Cliques .............................................................................................................................. 65
Cliques - A two node example ......................................................................................... 66
Cliques - A four node example ........................................................................................ 67
Permanent Journal............................................................................................................. 68
Table create with Fallback and Permanent Journaling ..................................................... 69
Locks................................................................................................................................. 70
Teradata has 4 locks for 3 levels of Locking .................................................................... 71
Locks and their compatibility ........................................................................................... 72
Chapter 6 Loading the Data ......................................................................................... 73
FastLoad............................................................................................................................ 75
FastLoad Picture ............................................................................................................... 76
Multiload........................................................................................................................... 77
Multiload Picture .............................................................................................................. 78
TPump............................................................................................................................... 79
TPump Picture .................................................................................................................. 80
FastExport ......................................................................................................................... 81
FastExport Picture............................................................................................................. 82
Chapter 7 Secondary Indexes....................................................................................... 83
Unique Secondary Index (USI)......................................................................................... 85
USI Subtable Example...................................................................................................... 86
How Teradata retrieves an USI query............................................................................... 87

NUSI Subtable Example ................................................................................................... 88
How Teradata retrieves a NUSI query.............................................................................. 89
Value Ordered NUSI......................................................................................................... 90
How Teradata retrieves a Value Ordered NUSI query ..................................................... 91
Secondary Index Summary ............................................................................................... 92
Chart for Primary and Secondary Access ......................................................................... 93
Chapter 8 The Active Data Warehouse ....................................................................... 95
OLTP Environments ......................................................................................................... 96
The DSS environment....................................................................................................... 97
Mixing OLTP and DSS environments.............................................................................. 98
Detail Data ........................................................................................................................ 99
Easy System Administration........................................................................................... 100
Data Marts....................................................................................................................... 101
Teradata Tools - SQL Assistant...................................................................................... 102
TDQM............................................................................................................................. 103
Index Wizard................................................................................................................... 104
Archive Recovery ........................................................................................................... 105
Teradata Analyst Suite.................................................................................................... 106



Chapter 1 The Rules of Data Warehousing

Let me once again explain the rules.


Teradata rules!
Tera-Tom Coffing
The Teradata RDBMS was designed to eliminate the technical pitfalls of data
warehousing and it is parallel processing that allows Teradata to rule this industry. The
problem with Data Warehousing is that it is so big and so complicated that there literally
are no rules. Anything goes! Data Warehousing is not for the weak or faint of heart
because the terrain can be difficult and that is why 75% of all data warehouses fail.
Teradata data warehouses provide the users with the ability to build a data warehouse for
the business without having to compromise because their database is unable to meet the
challenges and requirements of constant change. That is why 90% of all Teradata data
warehouses succeed.
Teradata allows businesses to quickly respond to changing conditions. Relational
databases are more flexible than other database types and flexibility is Teradata's middle
name. Here is how Teradata Rules:

8 of the Top 13 Global Airlines use Teradata

10 of the Top 13 Global Communications Companies use Teradata

9 of the Top 16 Global Retailers use Teradata

8 of the Top 20 Global Banks use Teradata

40% of Fortune's "US Most Admired" companies use Teradata

Teradata customers account for more than 70% of the revenue generated by the
top 20 global telecommunication companies

Teradata customers account for more than 55% of the revenue generated by the
top 30 Global retailers

Teradata customers account for more than 55% of the revenue generated by the
top 20 global airlines

More than 25% of the top 15 global insurance carriers use Teradata

Copyright Open Systems Services 2004

Page 1


Teradata Facts

In the Sea of Information your Data


Warehouse Users can be powered by the
Winds of Chance or the Steam of
Understanding!
Tom Coffing
Teradata allows for maximum flexibility in selecting and using data and it therefore can
be designed to represent a business and its practices.
A data warehouse is one of the most exciting technologies of today. It can contain
Terabytes of detail data; have thousands of users, with each user simultaneously asking a
different question on any data at any time. Gathering the information is somewhat easy,
but querying the data warehouse is an art. Before you are ready to query you must first
understand the basics. This book will make it happen.
A data warehouse environment should be built with Christopher Columbus in mind.
When he set sail from Spain he did not know where he was going. When he got there he
didn't know where he was. And when he returned he didn't know where he'd been. A
world-class data warehouse must anticipate that users will ask different questions each
and every day. A good understanding will allow users to set sail today and navigate a
different route tomorrow.
Most database vendors designed their databases around Online Transaction Processing
(OLTP) where they already knew the questions. Teradata is designed for Decision
Support where different questions arise every day. Teradata is always ready to
perform even as your environment and users change and grow.


Teradata: Brilliant by Design

The man who has no imagination has no


wings.
Muhammad Ali
The Teradata database was originally designed in 1976, and it has been floating like a
butterfly and stinging like a bee ever since. It is the Muhammad Ali of data warehousing
because parallel processing is pretty and definitely The Greatest invention developed in
our computer industry. Nearly 25 years later, Teradata is still considered ahead of its
time. While most databases have problems getting their data warehouses off the ground,
Teradata provides wings to give detail data flight. Because Teradata handles the
technical issues, users can reach as far as their imagination will take them and it is the
queries that have a tendency to fly. Teradata was founded on mathematical set theory.
Teradata is easy to understand and allows customers to model the business.
In 1976, IBM mainframes dominated the computer business. However, the original
founders of Teradata noticed that it took about 4 years for IBM to produce a new
mainframe. At the same time, they also noticed a little company called Intel. Intel
created a new processing chip every 2 years. With mainframes moving forward every
4 years as compared to Intel's ability to produce a new microprocessor every 2 years,
Teradata envisioned a breakthrough design that would shake the pillars of the industry.
This vision was to network several microprocessor chips together enabling them to be
able to work in parallel. This vision provided another benefit, which was that the cost of
networking microprocessor chips would be hundreds of times cheaper than a mainframe.
IBM laughed out loud! They said, "Let's get this straight ... you are going to network a
bunch of PC chips together and overpower our mainframes? That's like plowing a field
with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as
just a bunch of PCs in a cabinet.
Even with this being stated, Teradata still believed they could produce a product that
could handle large amounts of data and achieve the impossible: replace mainframe
technology. The founders of Teradata believed in the Napoleon Bonaparte philosophy
that stated, "The word impossible is not in my dictionary." The Teradata founders set
two primary goals when they designed Teradata, which were:
Perform parallel processing
Accommodate Terabytes of data
In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant
force in data warehousing.


The Teradata Parallel Architecture

Fall seven times, stand up eight.


--Japanese Proverb
Teradata never falls, but it can stand up to incredible amounts of work because of parallel
processing. Most databases crumble under the extreme pressures of data warehousing.
Who could blame them with thousands of users, each asking a different question on
Terabytes of data? Most databases were born for OLTP processing, while Teradata was
born to be parallel. While most databases fall and don't get up, Teradata remains
outstanding and ready for more. Teradata has been parallel processing from the
beginning, which incredibly dates back to 1979, and is still the only database that loads
data in parallel, backs up data in parallel and processes data in parallel. The idea of
parallel processing gives Teradata the ability to have unlimited users, unlimited power,
and unlimited scalability. So, what is parallel processing? Here is a great analogy.
It was 12 a.m. on a Saturday night and two friends were out on the town. One of the
friends looked at his watch and said, "I have to get going." The other friend responded,
"What's the hurry?" His friend went on to tell him that he had to leave to do his laundry
at the Laundromat. The other friend could not believe his ears. He responded, "What!
You're leaving to do your laundry on a Saturday night? Why don't you do it
tomorrow?" His buddy went on to explain that there were only 10 washing machines at
the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one
washing machine. I have 10 loads of laundry, so I will be there all day. If I go now,
nobody will be there, and I can do all 10 loads at the same time. I'll be done in less than
an hour and a half."
This story describes what we call Parallel Processing. Teradata was born to be
parallel, and instead of allowing just 10 loads of wash to be done simultaneously,
Teradata allows for hundreds, even thousands, of loads to be done simultaneously.
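The arithmetic behind the laundry story can be sketched in a few lines of Python. This is a hypothetical model of the analogy only, not Teradata code:

```python
import math

def wash_time(loads, machines, minutes_per_load=90):
    """Total minutes to finish `loads` loads with `machines` machines running in parallel."""
    rounds = math.ceil(loads / machines)  # back-to-back rounds of washing needed
    return rounds * minutes_per_load

# One machine: the 10 loads run one after another.
print(wash_time(10, 1))   # 900 minutes -- all day at the Laundromat
# Ten machines in parallel: all 10 loads run at once.
print(wash_time(10, 10))  # 90 minutes -- done in an hour and a half
```

The same reasoning carries over to AMPs: the fixed pool of work finishes faster when more workers run side by side.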


Tera-Tom Parallel Processing Laundromat
"Only one customer allowed at a time"

"After enlightenment, the laundry."
- Zen Proverb

"After parallel processing the laundry, enlightenment!"
- Teradata Zen Proverb
Teradata was born to be parallel. The optimizer is Parallel Aware, so there is always
unconditional parallelism, and Teradata automatically distributes the data so each
table is automatically processed in parallel.


A Logical View of the Teradata Architecture

Kites rise highest against the wind, not


with it.
Sir Winston Churchill
Many of the largest data warehouses in the world are on Teradata. Teradata provides
customers a centrally located architecture. This provides a single version of the truth
and it minimizes synchronization. Having Teradata on your side is a sure win-ston. If
Churchill had been a data warehouse expert, he would agree that most data warehouses
eventually receive the blitz and stop working while Teradata has the strength from
parallel processing to never give up.
Many data warehouse environments have an architecture that is not designed for decision
support, yet companies often wonder why their data warehouse failed. The winds of
business change can be difficult and starting with the right database is the biggest key to
rising higher.

[Diagram: Parsing Engines (PEs) connect across the BYNET network to eight AMPs, and each AMP is attached to its own disk.]

The user submits SQL to the Parsing Engine (PE). The PE checks the syntax and then
the security and comes up with a plan for the AMPs. The PE communicates with the
AMPs across the BYNET. The AMPs act on the data rows as needed and required.


The Parsing Engine (PE)

The greatest weakness of most humans is


their hesitancy to tell others how much they
love them while they're alive.
O.A. Battista
If you haven't told someone lately how much you love them, you need to find a way.
Leadership through love is your greatest gift. Teradata has someone who greets you with
love at every logon. That person is the Parsing Engine (PE), which is often referred to
as the optimizer. When you logon to Teradata, the Parsing Engine is waiting with tears in
its eyes and love in its heart, ready to make sure your session is taken care of completely.
The Parsing Engine does three things every time you run an SQL statement.
Checks the syntax of your SQL
Checks the security to make sure you have access to the table
Builds a plan for the AMPs to follow
The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data.
The PE knows how many AMPs are in the system, how many rows are in the table, and
the best way to get to the data. The Parsing Engine is the best optimizer in the data
warehouse world because it has been continually improved for over 25 years at the top
data warehouse sites in the world.
The Parsing Engine verifies SQL requests for proper syntax, checks security, maintains
up to 120 individual user sessions, and breaks down the SQL requests into steps.
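As a rough sketch, the PE's three steps can be modeled in Python. This is purely illustrative; the function `parsing_engine` and the `table_access` dictionary are invented names, not real Teradata internals:

```python
# Illustrative model of the three things the PE does for every SQL request:
# check syntax, check security, and build a plan for the AMPs.
def parsing_engine(sql, user, table_access):
    words = sql.replace(";", "").split()
    upper = [w.upper() for w in words]
    # 1. Check the syntax of the SQL (here, just a toy check).
    if not words or upper[0] != "SELECT" or "FROM" not in upper:
        raise SyntaxError("invalid SQL request")
    # 2. Check security: may this user read the table?
    table = words[upper.index("FROM") + 1]
    if table not in table_access.get(user, set()):
        raise PermissionError(f"{user} has no access to {table}")
    # 3. Build a PLAN -- a list of steps for the AMPs to follow.
    return [f"retrieve all rows of {table} on every AMP",
            "pass the results back to the PE over the BYNET"]

plan = parsing_engine("SELECT * FROM Order_Table;", "tera_tom",
                      {"tera_tom": {"Order_Table"}})
```

A request that fails either check never reaches the AMPs; only a request that passes both produces a plan.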

[Cartoon: at logon, the PE greets the user: "Welcome! I will be taking care of you this entire session. My wish is your Commands!"]


The Access Module Processors (AMPs)

A true friend is one who walks in when the


rest of the world walks out.
Anonymous
The AMPs are truly man's best friend because they will work like a dog to read and write
the data. (Their bark is worse than their byte.) An AMP never walks out on a friend.
The AMPs are the worker bees of the Teradata system because the AMPs read and write
the data to their assigned disks. The Parsing Engine is the boss and the AMPs are the
workers. The AMPs merely follow the PE's plan and read or write the data.
The AMPs are always connected to a single virtual disk or Vdisk. The philosophy of
parallel processing revolves around the AMPs. Teradata takes each table and spreads the
rows evenly among all the AMPs. When data is requested from a particular table, each
AMP retrieves the rows of the table that it holds on its disk. If the data is spread
evenly, then each AMP should retrieve its rows simultaneously with the other AMPs.
That is what we mean when we say Teradata was born to be parallel.
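This even spreading can be sketched with a toy hash model in Python. The sketch is hypothetical: `crc32` here is only a stand-in, since Teradata's actual row-hash algorithm is not shown in this book:

```python
from collections import defaultdict
from zlib import crc32

def distribute(rows, n_amps):
    """Assign each (primary_index, row) pair to an AMP by hashing the index value."""
    amps = defaultdict(list)
    for pk, row in rows:
        amp_no = crc32(str(pk).encode()) % n_amps  # stand-in for Teradata's row hash
        amps[amp_no].append(row)
    return amps

rows = [(i, f"row-{i}") for i in range(1000)]
amps = distribute(rows, 4)
# A good hash spreads the 1,000 rows roughly evenly over the 4 AMPs,
# so each AMP can scan its own share at the same time as the others.
```

Because the hash of a primary index value is deterministic, the same value always lands on the same AMP, which is also how a row is found again later.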
The AMPs will also perform output conversion while the PE performs input
conversion. The AMPs do the physical work associated with retrieving an answer set.
The PE is the boss and the AMPs are the workers. Could you have a Teradata system
without AMPs? No, who would retrieve the data? Could you have a Teradata
system without PEs? Of course not; could you get along without your boss?!
USER SQL:

SELECT *
FROM Order_Table
ORDER BY Order_No;

PE's PLAN:

(1) Retrieve all Orders from the Order_Table.
(2) Sort them by Order_No in ascending order.
(3) Pass the results back to me over the BYNET.

[Diagram: the PE passes this plan over the BYNET to four AMPs, each holding its portion of the Order_Table and Order_Item_Table.]


The BYNET

Not all who wander are lost.


J. R. R. Tolkien
The BYNET is the communication network between AMPs and PEs. Data and
communication never wanders and is never lost. How well does the BYNET know
communication? It is the lord of the things! How often does the PE pass the plan to the
AMPs over the BYNET? Every time it makes it a hobbit!
The PE passes the PLAN to the AMPs over the BYNET. The AMPs then retrieve the
data from their disks and pass it to the PE over the BYNET.
The BYNET provides the communications between AMPs and PEs so no matter how
large the data warehouse physically gets, the BYNET makes each AMP and PE think that
they are right next to one another. The BYNET gets its name from the Banyan tree.
The Banyan tree has the ability to continually plant new roots to grow forever.
Likewise, the BYNET scales as the Teradata system grows in size. The BYNET is
scalable.
There are always two BYNETs for redundancy and extra bandwidth. AMPs and PEs can
use both BYNETs to send and retrieve data simultaneously. What a network!

The PE checks the user's SQL syntax;
The PE checks the user's security rights;
The PE comes up with a plan for the AMPs to follow;
The PE passes the plan along to the AMPs over the BYNET;
The AMPs follow the plan and retrieve the data requested;
The AMPs pass the data to the PE over the BYNET; and
The PE then passes the final data to the user.

[Diagram: three PEs and eight AMPs, all connected to both BYNET 0 and BYNET 1.]



A Visual for Data Layout

I saw the angel in the marble and carved


until I set him free.
--Michelangelo
Teradata saw the users in the warehouse and parallel processed until it set them free.
Free to ask any question at any time on any data. The Sistine Chapel wasn't painted in a
day and a true data warehouse takes time to carve. Sculpt your warehouse with love and
caring and you will build something that will allow your company to have limits that go
well beyond the ceiling. Below is a logical view of data on AMPs. Each AMP holds a
portion of every table. Each AMP keeps its tables in separate drawers.

[Diagram: AMPs 1 through 4 each hold a portion of the Employee, Order, Customer, and Student tables.]

Each AMP holds a portion of every table.
Each AMP keeps its tables in separate drawers.

Teradata's Parallel Architecture and a very mature optimizer (PE) make it
completely unique.


Teradata is a Shared Nothing Architecture

To have everything is to possess nothing.
--Buddha
Each AMP has its own processor, memory, and disk. Each AMP shares nothing with
the other AMPs. Each is connected over a network called the BYNET. This
architecture allows unlimited scalability and is called a shared nothing architecture.

To Parallel Process everything is to Share nothing.
--Tera-Tom Coffing

(Diagram: four AMPs, each with its own memory and disk; each disk holds that AMP's portion of the Customer_Table, Order_Table, Employee_Table, and Dept_Table.)


Teradata has Linear Scalability

The most important thing a father can do for his children is to love their mother.
--Anonymous
The most important thing a father can do for his children is to love their mother. As the
family grows so should the love. The most important thing a database can do for its data
warehouse children is grow. When a data warehouse stops growing it has reached its
potential. Teradata has the ability to start small and grow forever without losing any
power. This is called Linear Scalability.
A data warehouse should start small and focused with the end goal of evolving into an
Enterprise Data Warehouse. When your data is centralized in one area, you can take full
advantage of completely understanding and analyzing all your data. That is why it is
important to purchase a database that is ready for growth.
Anytime you want to double the speed, simply double the number of PE and AMP
processors (VPROCs). This is known as Linear Scalability. This ability to scale
permits unlimited growth potential and improved response times.
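The scaling arithmetic can be sketched in a few lines. This is a toy illustration, not a benchmark: the rows-per-second rate is invented, and it assumes perfectly even distribution across AMPs working in parallel:

```python
# Toy illustration of linear scalability: with rows spread evenly and
# AMPs working in parallel, scan time is driven by the work each AMP
# does, so doubling the AMPs halves the elapsed time.

def scan_time(total_rows, amps, rows_per_second=1000):
    rows_per_amp = total_rows / amps       # even distribution assumed
    return rows_per_amp / rows_per_second  # AMPs scan in parallel

before = scan_time(1_000_000, 4)   # 4 AMPs
after = scan_time(1_000_000, 8)    # double the AMPs
print(before, after)               # 250.0 125.0 -> twice as fast
```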

Double your AMPs and double your speed!



Teradata's Linear Scalability is excellent protection for Application Development.
Because the data warehouse environment can change so rapidly, it is imperative that
Teradata has the power and capability to scale up for increased workloads without
decreased throughput.


How Teradata handles data access

The PE handles session control functions

The AMPs retrieve and perform database functions on their requested rows

The BYNET sends communications between the nodes

(Diagram: the PE handles its users' sessions, checks syntax, checks security, and builds a plan for the AMPs; the BYNET is the communication highway that AMPs and PEs use; the AMPs retrieve and perform database functions on their requested rows, and each AMP has its own virtual disk where the table rows it owns reside.)

These statements are true about the PE, which handles session control and dispatches
work across the BYNET:

The Parser checks statements for proper syntax.

The Optimizer (PE) develops a separate plan for each request to determine the best
way to respond.

The Dispatcher takes the steps from the parser and transmits them over the
BYNET.


Teradata Cabinets, Nodes, VPROCs, and Disks

The best way to predict the future is to create it.
--Sophia Bedford-Pierce
Teradata predicted data warehousing 20 years before its time by creating it. Who could
have imagined 100 Terabyte systems back in the 1970s? Teradata did and created an
architecture that can scale indefinitely.
In the picture below we see a Teradata cabinet with four nodes. Each node has two
Intel processors of lightning speed. Inside each node's memory are the AMPs and PEs,
which are referred to as Virtual Processors or VPROCs. Each node is attached to both
BYNETs and each node is attached directly to a set of disks. Each AMP then has one
virtual disk where it stores its tables and rows. If you want to expand your system, then
merely buy another Node Cabinet and another Disk Cabinet.

(Diagram: a Teradata cabinet containing a System Management Chassis, connections to BYNET 0 and BYNET 1, four nodes (each with PEs, AMPs, and memory), a Disk Array Cabinet with eight disk array controllers (DACs), and dual power supplies.)


LAN Connection for Network Attached Clients


To connect to Teradata from a LAN the physical connections are PC to network to
Ethernet card to Gateway Software to PE. The software needed for the client (user's
PC) is CLI, MOSI, and MTDP. CLI talks directly to Teradata. MTDP tells Teradata
information about the client so Teradata can bring back the answer set to the correct PC
in the proper format. MOSI is used as the networking software. The Gateway is like a
gatekeeper for all LAN-connected users. The Gateway is where logons are enabled or
disabled for LAN users. To enable logons, just use the ENABLE LOGONS command.
The three software components on a Teradata node are the AMP, PE, and PDE
(Parallel Database Extensions) software. The AMPs and PEs are referred to as VPROCs.

(Diagram: client PCs running CLI, MTDP, and MOSI connect through Ethernet cards and the LAN to the Gateway software on the Teradata nodes.)

To connect to a Teradata network host you need:
PE
Gateway software
Ethernet Card
You can also attach your LAN connections directly to the node. Often customers connect
two different LAN connections for redundancy.

Mainframe Connection to Teradata


To connect to Teradata from a Mainframe the physical connections are Mainframe to
ESCON Connection or BUS/TAG Cables and then to a Host Channel Adapter and then to
a dedicated Parsing Engine (PE). The software needed for the client (mainframe) is CLI
and the Teradata Director Program (TDP). CLI talks directly to Teradata. TDP tells
Teradata information about the client so Teradata can bring back the answer set to the
correct terminal in the proper format.
You attach a mainframe host connection directly to a Teradata node and users can
access Teradata via the mainframe. All you need to make that happen is an ESCON
channel, Host Channel Adapter and a Parsing Engine (PE).

Mainframe Connection to Teradata
(Diagram: mainframes running CLI and TDP connect through ESCON channels or Bus/Tag cables to Host Channel Adapters on the Teradata nodes.)

To connect to Teradata via a MAINFRAME you need:
ESCON/BUS TAG CABLES
Host Adapter
PE


Chapter 2 Data Distribution Explained

There are three keys to selling real estate.
They are location, location, and location.
Teradata knows a little about real estate because the largest and best data warehouses
have been sold to the top companies in countries around the world. This is because
Teradata was meant for data warehousing. When Teradata is explained to the business
and they ask if they are interested in a purchase the word most often used is SOLD!

There are three keys to how Teradata spreads the data among the AMPs. They are
Primary Index, Primary Index, and Primary Index.
Every time I begin to teach a data warehousing class, an experienced Teradata manager
or DBA will come up to me and say, "Please explain to the students the importance of the
Primary Index." The Primary Index of each table lays out the data on the AMPs!


Rows and Columns

I never lost a game; time just ran out on me.
--Michael Jordan
Michael Jordan never lost a game; time just ran out on him; however, many data
warehouses lose their game because managing the data can become so intense that life
turns into sudden-death double overtime. Teradata allows the data to be placed by the
system and not the DBA. Talk about a slam dunk!

EMP   DEPT   LNAME    FNAME    SAL
(UPI)
 1     40    BROWN    CHRIS    95000.00
 2     20    JONES    JEFF     70000.00
 3     20    NGUYEN   XING     55000.00
 4      ?    BROWN    SHERRY   34000.00

Teradata stores its information inside tables. A table consists of rows and columns. A
row is one instance of all columns. According to relational concepts, column positions
are arbitrary and a column always contains like data. Teradata does not care in what
order you define the columns, and Teradata does not care about the order of rows in a
table. Row order is arbitrary also, but once a row format is established, Teradata will
use that format because a Teradata table can have only one row format.
There are many benefits of not requiring rows to be stored in order. Unordered data
does not have to be maintained to preserve the order. Unordered data is independent
of the query.

ROW:  40  Brown  Chris  95000

Every AMP will hold a portion of every table. Rows are sent to their destination AMP
based on the value of the column designated as the Primary Index.

Copyright Open Systems Services 2004

Page 18

Chapter 2

The Primary Index

Alone we can do so little; together we can do so much.
--Helen Keller
Helen Keller may have been blind, but she saw so much more than the rest of us. Can
you imagine living in a world of such darkness, yet becoming such a shining light?
Helen Keller was the ultimate leader and she helped millions realize that they should
continue to always learn, and that the journey of life is the ultimate destination.
Teradata uses the Primary Index of each table to provide a row its destination to the
proper AMP. This is why each table in Teradata is required to have a Primary Index.
The biggest key to a great Teradata Database Design begins with choosing the correct
Primary Index. The Primary Index will determine on which AMP a row will reside.
Because this concept is extremely important, let me state again that the Primary Index
value for a row is the only thing that will determine on which AMP a row will reside.
Many people new to Teradata assume that the most important concept concerning the
Primary Index is data distribution. INCORRECT! The Primary Index does determine
data distribution, but even more importantly, the Primary Index provides the fastest
physical path to retrieving data. The Primary Index also plays an incredibly important
role in how joins are performed. Remember these three important concepts of the
Primary Index and you are well on your way to a great Physical Database Design.

The Primary Index plays 3 roles:
Data Distribution
Fastest Way to Retrieve Data
Incredibly important for Joins
What needs to be known prior to selecting the Primary Index to ensure excellent
distribution? The columns that define the index. If they are unique or nearly unique,
then Teradata will spread the data evenly.

Copyright Open Systems Services 2004

Page 19

Chapter 2

The Two Types of Primary Indexes

A man who chases two rabbits catches none.
--Roman Proverb
Every table must have at least one column as the Primary Index. The Primary Index is
defined when the table is created. There are only two types of Primary Indexes, which
are a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI).

A man who chases two rabbits misses both by a HARE! A person who chases two
Primary Indexes misses both by an ERR!
--Tera-Tom Proverb
Every table must have one and only one Primary Index. Because Teradata distributes the
data based on the Primary Index column's value, it is quite obvious that you must have a
Primary Index and that there can be only one Primary Index per table.
The Primary Index is the physical mechanism used to retrieve and distribute data. A
Primary Index is made up of one or more columns: it can be as little as one column or a
multi-column Primary Index of up to 16 columns.
Most databases use the Primary Key as the physical mechanism. Teradata uses the
Primary Index. There are two reasons you might pick a different Primary Index than
your Primary Key. They are (1) performance reasons and (2) known access
paths.


Unique Primary Index (UPI)

Always remember that you are unique just like everyone else.
--Anonymous
A Unique Primary Index (UPI) is unique and can't have any duplicates. It is as unique
as you are. Nobody is like you and you are extremely beautiful and amazing. Not one
other person in the history of mankind has ever been exactly like you. You are the
creation of your beautiful parents and must realize how important you are to the world.
A Unique Primary Index is not as amazing as you are, but it is also special.
A Unique Primary Index means that the values for the selected column must be unique.
If you try to insert a row with a Primary Index value that is already in the table, the row
will be rejected. A Unique Primary Index will always spread the table rows evenly
amongst the AMPs. Please don't assume this is always the best thing to do. Below is a
table that has a Unique Primary Index. We have selected EMP to be our Primary Index.
Because we have designated EMP to be a Unique Primary Index, there can be no
duplicate employee numbers in the table.

Employee Table

EMP   DEPT   LNAME    FNAME    SAL
(UPI)
 1     40    BROWN    CHRIS    95000.00
 2     20    JONES    JEFF     70000.00
 3     20    NGUYEN   XING     55000.00
 4      ?    BROWN    SHERRY   34000.00

A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst
the AMPs. UPI access is always a one-AMP operation. It also requires no duplicate
row checking.


Non-Unique Primary Index

You miss 100 percent of the shots you never take.
--Wayne Gretzky
Take a shot at using a Non-Unique Primary Index in your Teradata tables. A
Non-Unique Primary Index (NUPI) means that the values for the selected column can be
non-unique. You can have many rows with the same value in the Primary Index. A
Non-Unique Primary Index will almost never spread the table rows evenly. Please
don't assume this is always a bad thing. Below is a table that has a Non-Unique Primary
Index. We have selected LNAME to be our Primary Index. Because we have designated
LNAME to be a Non-Unique Primary Index, we are anticipating that there will be
individuals in the table with the same last name.

EMP   DEPT   LNAME    FNAME    SAL
             (NUPI)
 1     40    BROWN    CHRIS    95000.00
 2     20    JONES    JEFF     70000.00
 3     20    NGUYEN   XING     55000.00
 4      ?    BROWN    SHERRY   34000.00

A Non-Unique Primary Index (NUPI) will almost NEVER spread the rows of the table
evenly amongst the AMPs.
A Non-Unique Primary Index (NUPI) will contain like data. There can be more than
one row with the same Primary Index value because it is non-unique.
An all-AMP operation will take longer if the data is unevenly distributed. You might
pick a NUPI over a UPI because the NUPI column may be more effective for query
access and joins.


How Teradata Turns the Primary Index Value into the Row Hash

The Primary Index is the only thing that determines where a row will reside. It is
important that you understand this process. Here are the fundamentals in the simplest
form. When a new row arrives into Teradata, the following steps occur:
Teradata's PE examines the Primary Index value of the row.
Teradata takes that Primary Index value and runs it through a Hashing Algorithm.
The output of the Hashing Algorithm (i.e., a formula) is a 32-bit Row Hash.
The 32-bit Row Hash will perform two functions:
1. The 32-bit Row Hash will point to a certain spot on the Hash Map, which will
indicate which AMP will hold the row.
2. The 32-bit Row Hash will always remain with the row as part of a Row
Identifier (Row ID).
Hashing is a mathematical process where an index value (UPI, NUPI) is converted into a
32-bit Row Hash value. The input to this hashing algorithm is the Primary Index value,
and the 32-bit output is called the Row Hash.

(Example: a new row [EMP 99, DEPT 10, LNAME Hosh, FNAME Roland, SAL 50000] arrives; the PE hashes the PI value: 99 / HASH FORMULA = 00001111000011110000111100001111.)

A new row is going to be inserted into Teradata. The Primary Index is the column called
EMP. The value in EMP for this row is 99. Teradata runs the value of 99 through the
Hash Formula and the output is a 32-bit Row Hash. In this example, our 32-bit Row Hash
output is: 00001111000011110000111100001111.
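The essential property of the hash step can be sketched in Python. Teradata's actual hash formula is proprietary; zlib.crc32 is used below only because it also produces a deterministic 32-bit value from an input:

```python
# Toy stand-in for the hashing step (NOT Teradata's real hash formula).
# What matters is the property: the same Primary Index value always
# produces the same 32-bit Row Hash.
import zlib

def row_hash(pi_value):
    return zlib.crc32(str(pi_value).encode())  # deterministic 32-bit value

h = row_hash(99)
print(format(h, "032b"))             # the Row Hash shown as 32 bits
assert row_hash(99) == row_hash(99)  # same input, same hash, every time
```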


The Row Hash determines the Row's Destination

The first 16 bits of the Row Hash (a.k.a., the Destination Selection Word) are used to
locate an entry in the Hash Map. This entry is called a Hash Map Bucket. The only thing
that resides inside a Hash Map Bucket is the AMP number where the row will reside.

(Diagram: a four-AMP Hash Map; the Row Hash 00001111000011110000111100001111 points to a bucket, and each bucket holds an AMP number from 1 to 4.)

The first 16 bits of the Row Hash of 00001111000011110000111100001111 are used to
locate a bucket in the Hash Map. A bucket will contain an AMP number. We now know
that employee 99, whose row hash is 00001111000011110000111100001111, will reside
on AMP 4. Note: The AMP uses the entire 32 bits in storing and accessing the row.
If we took employee 99 and ran it through the hashing algorithm again and again, we
would always get a row hash of 00001111000011110000111100001111.
If we take the row hash of 00001111000011110000111100001111 again and again, it
would always point to the same bucket in the hash map.
The above statement is true about the Teradata Hashing Algorithm. Every time employee
99 is run through the hashing algorithm, it returns the same Row Hash. This Row Hash
will point to the same Hash Bucket every time. That is how Teradata knows which AMP
will hold row 99. It does the math and it always gets what it always got!
Hash values are calculated using a hashing formula.
The Hash Map will change if you add additional AMPs.
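The bucket lookup can be sketched as code. The bucket count and the round-robin bucket-to-AMP assignment below are invented for illustration; only the mechanism (first 16 bits pick a bucket, the bucket stores an AMP number) mirrors the text:

```python
# Toy sketch of the Hash Map lookup. The first 16 bits of the Row Hash
# (the Destination Selection Word) select a bucket; the bucket holds
# the AMP number. The map layout here is made up for illustration.

NUM_AMPS = 4
# A tiny "hash map": bucket -> AMP, cycling over the four AMPs
HASH_MAP = [b % NUM_AMPS + 1 for b in range(65536)]

def destination_amp(row_hash_32):
    bucket = row_hash_32 >> 16   # first 16 bits pick the bucket
    return HASH_MAP[bucket]      # the bucket stores the AMP number

rh = 0b00001111000011110000111100001111
print(destination_amp(rh))       # prints 4 with this toy map
# Same hash -> same bucket -> same AMP, every single time:
assert destination_amp(rh) == destination_amp(rh)
```

With this toy map the example Row Hash happens to land on AMP 4, matching the example in the text; with a real Hash Map the AMP number depends on the map's contents.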


The Row is Delivered to the Proper AMP


Now that we know that Employee 99 is to be delivered to AMP 4, Teradata packs up the
row, places the Row Hash on the front of the row, and delivers it to AMP 4.

(Diagram: AMP 4 receives the row [EMP 99, DEPT 10, LNAME Hosh, FNAME Roland, SAL 50000] with the Row Hash 00001111000011110000111100001111 on the front.)

The entire row for employee 99 is delivered to the proper AMP accompanied by the Row
Hash, which will always remain with the row as part of the Row ID.
Review:

A row is to be inserted into a Teradata table
The Primary Index Value for the Row is put into the Hash Algorithm
The output is a 32-bit Row Hash
The Row Hash points to a bucket in the Hash Map
The bucket points to a specific AMP
The row, along with the Row Hash, is delivered to that AMP


The AMP will add a Uniqueness Value

When the AMP receives a row it will place the row into the proper table, and the AMP
checks if it has any other rows in the table with the same row hash. If this is the first row
with this particular row hash, the AMP will assign a 32-bit uniqueness value of 1. If this
is the second row with that particular row hash, the AMP will assign a uniqueness
value of 2. The 32-bit row hash and the 32-bit uniqueness value make up the 64-bit Row
ID. The Row ID is how tables are sorted on an AMP.
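The counting behavior is easy to sketch. This is a toy model, not Teradata internals; the class and method names are invented:

```python
# Toy sketch of how an AMP assigns Uniqueness Values: the first row with
# a given Row Hash gets 1, the next gets 2, and so on. Row Hash (32 bits)
# plus Uniqueness Value (32 bits) form the 64-bit Row ID.
from collections import defaultdict

class Amp:
    def __init__(self):
        self.seen = defaultdict(int)  # row hash -> rows received so far
        self.rows = []

    def receive(self, row_hash, row):
        self.seen[row_hash] += 1
        uniqueness = self.seen[row_hash]
        row_id = (row_hash << 32) | uniqueness  # 64-bit Row ID
        self.rows.append((row_id, row))
        return uniqueness

amp = Amp()
h = 0b111111                      # three "Davis" rows share one Row Hash
print(amp.receive(h, "Davis, Roland"))  # 1
print(amp.receive(h, "Davis, Sara"))    # 2
print(amp.receive(h, "Davis, Mary"))    # 3
```

Because the Row ID embeds the hash in its high bits, sorting rows by Row ID naturally groups rows with the same Row Hash together, which is what the NUPI example on the next pages shows.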

AMP 4
Row Hash                           Uniqueness Value   EMP  DEPT  LNAME  FNAME   SAL
00001111000011110000111100001111   1                  99   10    Hosh   Roland  50000

The Row Hash and the Uniqueness Value = Row ID

The Row Hash always accompanies a row when an AMP receives it.
The AMP will then assign a Uniqueness Value to the Row Hash. It assigns a 1 if the
Row Hash is unique, or a 2 if it is the second, or a 3 if the third, etc.


An Example of an UPI Table


Below is an example of a portion of a table on one AMP. The table has a Unique
Primary Index of EMP.

AMP 4
Row Hash                           Uniqueness   EMP  DEPT  LNAME    FNAME   SAL
00001111000011110000111100001111   1            99   10    Hosh     Roland  50000
01010101010101010000000000000000   1            21   10    Wilson   Barry   75000
01010111111111111111111111111111   1            --   20    Holland  Mary    86000
11111111111111111100000000000000   1            44   30    Davis    Sandy   54000

The above Employee Table has a Unique Primary Index on the column EMP. Notice that
Row ID sorts the portion of the table on AMP 4. Notice that the Uniqueness Value for
each row is 1.


An Example of an NUPI Table


Below is an example of a portion of a table on one AMP. The table has a Non-Unique
Primary Index (NUPI) on the Last Name called LNAME.

AMP 4
Row Hash                           Uniqueness   EMP  DEPT  LNAME   FNAME   SAL
                                                           (NUPI)
00000000000000000000000000111111   1            65   20    Davis   Roland  150000
00000000000000000000000000111111   2            77   10    Davis   Sara    75000
00000000000000000000000000111111   3            --   20    Davis   Mary    86000
11111111110000000000000000000000   1            --   10    Allen   Sandy   54000

The above Employee Table has a Non-Unique Primary Index on the column LNAME.
Notice that each row with the LNAME of Davis has the exact same Row Hash. Notice
that the Uniqueness Value for each Davis is incremented by 1.
Each time the LNAME is Davis, the Hashing Algorithm generates the Row Hash:
00000000000000000000000000111111
That Row Hash points to the exact same bucket in the Hash Map. This particular bucket
in the Hash Map references (or points to) AMP 4.
The Row Hash accompanied each row to AMP 4. The AMP assigned Uniqueness Values
of 1, 2 and 3 to the three rows with the LNAME of Davis.
Notice that Row ID sorts the portion of the table on AMP 4.


How Teradata Retrieves Rows with the Primary Index

In the example below, a user runs a query looking for information on Employee 99. The
PE sees that the Primary Index column EMP is used in the SQL WHERE clause. Because
this is a Primary Index access operation, the PE knows this is a one-AMP operation. The
PE hashes 99 and the Row Hash is 00001111000011110000111100001111. This points
to a bucket in the Hash Map that represents AMP 4. AMP 4 is sent a message to get the
Row Hash 00001111000011110000111100001111 and make sure it's EMP 99.

SQL:
SELECT *
FROM Employee
WHERE EMP = 99;

(Diagram: the PE runs 99 through the hash formula, producing the Row Hash 00001111000011110000111100001111; that hash points to a bucket in the four-AMP Hash Map that references AMP 4; AMP 4 uses the Row Hash to locate the row [EMP 99, DEPT 10, LNAME Hosh, FNAME Roland, SAL 50000].)


Row Distribution
In the examples below we see three different Teradata systems. The first system has
used Last_Name as a Non-Unique Primary Index (NUPI). The second example has used
Sex_Code as a Non-Unique Primary Index (NUPI). The last example uses
Employee_Number as a Unique Primary Index (UPI).

Example # 1: Non-Unique Primary Index using Last Names
AMP 1: Davis, Davis, Woods | AMP 2: Jones, Rex | AMP 3: Smith, Johnson, Smith | AMP 4: Kelly, Kelly, Hanson, Hanson, Tess
(Like last names hash to the same AMP, so the distribution is lumpy.)

Example # 2: Non-Unique Primary Index using Employee Sex Code
AMP 1: Male, Male, Male | AMP 2: (empty) | AMP 3: (empty) | AMP 4: Female, Female, Female
(With only two values, only two AMPs hold rows: a badly skewed distribution.)

Example # 3: Unique Primary Index using Employee Number
AMP 1: 1, 5, 77 | AMP 2: 22, 9, 15 | AMP 3: 13, 99, 2 | AMP 4: 34, 16, 4
(Unique values spread the rows evenly.)


A Visual for Data Layout


Below is a logical view of data on AMPs. Each AMP holds a portion of a table. Each
AMP keeps the tables in their own separate drawers. The Row ID is used to sort each
table on an AMP.

(Diagram: AMP 1 through AMP 4, each holding its own portion of the Employee, Order, Customer, and Student tables.)

Each AMP holds a portion of every table.
Each AMP keeps its tables in separate drawers.
Each table is sorted by Row ID.


Teradata accesses data in three ways

Primary Index (fastest)
Secondary Index (second fastest)
Full Table Scan (slowest)

Primary Index (fastest): Whenever a Primary Index is utilized in the SQL WHERE
clause, the PE will be able to use the Primary Index to get the data with a one-AMP
operation.
Secondary Index (next fastest): If the Primary Index is not utilized, sometimes Teradata
can utilize a Secondary Index. It is not as fast as the Primary Index, but it is much faster
than a full table scan.
Full Table Scan (FTS) (slowest): Teradata handles full table scans brilliantly because,
thanks to parallel processing, each data row is accessed only once. Full Table Scans are
a way to access Teradata without using an index. Each data block per table is read only
once.

AMP 1: (99, 10, Vu Du, 55000)   (88, 20, Sue Lou, 59000)   (75, 30, Bill Lim, 76000)
AMP 2: (45, 10, Ty Law, 58000)  (56, 20, Kim Hon, 57000)   (83, 30, Jela Rose, 79000)
AMP 3: (22, 10, Al Jon, 85000)  (38, 40, Bee Lee, 59000)   (25, 30, Kit Mat, 96000)
AMP 4: (44, 40, Sly Win, 85000) (57, 40, Wil Mar, 59000)   (93, 10, Ken Dew, 96000)
(Each row lists Emp, Dept, Name, Sal.)

When Teradata does a Full Table Scan of the above, how many
rows are read? 12. How many per AMP? 3.
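The full-table-scan arithmetic can be sketched in Python. This is a toy model of the idea, not Teradata code; the row data mirrors the example above:

```python
# Toy sketch of a full table scan: every AMP scans its own portion of
# the table in parallel, and each row is touched exactly once.

amps = [                                  # 4 AMPs, 3 rows each
    [(99, "Vu Du"), (88, "Sue Lou"), (75, "Bill Lim")],
    [(45, "Ty Law"), (56, "Kim Hon"), (83, "Jela Rose")],
    [(22, "Al Jon"), (38, "Bee Lee"), (25, "Kit Mat")],
    [(44, "Sly Win"), (57, "Wil Mar"), (93, "Ken Dew")],
]

def full_table_scan(amps):
    # each AMP reads its rows once; results merge into the answer set
    return [row for amp_rows in amps for row in amp_rows]

result = full_table_scan(amps)
print(len(result))                 # 12 rows read in total
print(max(len(a) for a in amps))   # 3 rows read per AMP
```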


Data Layout Summary


Teradata lays out data totally based on the Primary Index value. If the Primary Index is
Unique, the data layout will be spread equally among the AMPs. If the Primary Index is
Non-Unique, the data distribution across the AMPs may be skewed. A Non-Unique
Primary Index is acceptable if the data values provide reasonably even distribution.
Every table must have a Primary Index and it is created at CREATE TABLE time. When
users utilize a Unique Primary Index in the WHERE clause of their query, the query will
be a one-AMP operation. Why?
A Unique Primary Index value only uses one AMP to return at most one row.
A Non-Unique Primary Index value also uses one AMP to return zero to many rows. The
same values run through the Hashing Algorithm will return the exact same Row Hash.
Therefore, like values will go to the same AMP. The only difference will be the
Uniqueness Value.
Every row in a table will have a Row ID. The Row ID consists of the Row Hash of the
Primary Index value and the Uniqueness Value.

Primary    Number     Rows
Index      of AMPs    Returned
UPI        1          0-1
NUPI       1          0-Many


Chapter 3 Teradata Space


How Permanent Space is calculated

No one is so generous as he who has nothing to give.
--French Proverb
If you don't have any perm you are at the Merci of the DBA. There are three types of
space in Teradata and they are Perm, Spool, and Temp. It all starts with Perm Space.
Users will most likely not have any Perm space because Perm is for permanent tables,
secondary indexes, and the Permanent Journals. If a user is given Perm space it is not
allocated immediately, but is an upper limit of space for their tables.
Teradata permanent space is calculated by adding up all the available space on the AMPs'
attached disks; that total is the size of your Teradata warehouse. When a system is
delivered the user DBC owns all the Permanent Space. It is important to remember that
Teradata was born to be parallel, so Teradata always calculates all space on a per-AMP
basis. In the pictures below we see that the original system was 100 Gigabytes. DBC
owned all 100 Gigabytes. Since there are four AMPs, we actually calculate the space
as 25 Gigabytes per AMP.

(Diagram: on a new system, DBC owns 100% of the Permanent Space. If DBC owns 100 Gigabytes of Perm Space on a 4-AMP system, it actually owns 25 GB per AMP, because all space is calculated on a per-AMP basis.)
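The per-AMP accounting is simple division, sketched here as a tiny helper (the function name is invented for illustration):

```python
# Toy calculation of per-AMP Permanent Space: Teradata accounts for all
# space on a per-AMP basis, so a 100 GB, 4-AMP system gives each AMP
# 25 GB to manage.

def perm_per_amp(total_gb, num_amps):
    return total_gb / num_amps

print(perm_per_amp(100, 4))   # 25.0 GB per AMP
```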


How Permanent Space is Given

The Constitution only gives people the right to pursue happiness. You have to catch it yourself.
--Ben Franklin
The Teradata Constitution states, "We the Users, in order to form a more perfect UNION,
or INTERSECT, establish SQL as the holder of Truths to be self joined, that all users are
created equal, EXCEPT power users, with certain unalienable Access Rights, that among
these SELECTed are Life, Liberty, and the Pursuit of Management's Happiness." The
real truth is that when a system is delivered the user DBC owns all the Permanent Space,
so forget about "We the People." DBC sits on top of the hierarchy. It is up to DBC to
give up some of its Perm Space to others, so it is "Ye the DBC."
Until this system changes size there will always be 100 Gigabytes of PERM Space. It
will merely be owned by multiple users or databases. Permanent space defines the
upper limit of space and it is not allocated at Table Create time.

(Diagram: DBC owns 100% of the Permanent Space of a new 100-Gigabyte system. If DBC gives 40 Gigabytes of Perm Space to MRKT, then DBC owns 60 GB and MRKT owns 40 GB of the same 100 Gigabytes.)


The Teradata Hierarchy

In the end we'll remember not the words of our enemies, but the silence of our friends.
--Martin Luther King, Jr.
One of the greatest human beings of all-time in our opinion was Dr. Martin Luther King,
Jr. who had a dream. Teradata has a dream that all users can be judged not by the color
of their skin, but by the characters in their SQL.
Teradata is hierarchical in nature. Anyone above you in the hierarchy is your parent or
owner. Anyone below you is your child. The key point is that anytime you give away
some of your Permanent Space you lose it until the child is dropped or gives it back. It is
like money. If you give it away you have less in your account. Give it all away and you
are broke! Also notice that the total Permanent Space in the system below is still 100
Gigabytes.
Permanent Space is where objects (i.e., databases, users, tables) are created and stored.
Permanent Space is released when data is deleted or when objects are dropped.
Permanent space defines the upper limit of space for a database or user.

(Diagram: What would the hierarchy look like if DBC created SALES with 10 Gigabytes of Perm, and MRKT created Advertising and gave it 20 GB of Perm? The hierarchy would be: DBC with 50 GB, its children SALES with 10 GB and MRKT with 20 GB, and MRKT's child Advertising with 20 GB. The total is still 100 Gigabytes.)


How Spool Space is calculated

It's not the size of the dog in the fight, but the size of the fight in the dog.
--Archie Griffin
Spool space is a wonderful thing unless the query goes to Hies-man! Each user who runs
queries is allocated a certain amount of Spool Space for the query answer set. If your
answer set runs past its spool limit the query is aborted. Some users logon twice thinking
they can trick the system, but spool is calculated on a user basis and once you are over
you are stopped at the goal line. Running out of spool space makes users mad dog mean.
Spool space is literally unused permanent space. Spool space is system-wide, so
anywhere there is empty PERM space it can be used for spool. It is important to
remember that Teradata was born to be parallel, so Teradata always calculates all space
on a per-AMP basis. Calculating Spool Space is radically different from calculating
PERM Space. PERM Space always totals the total space available in the
system. Each database or user might own a portion of the space, but what counts is
whether or not the space has been filled with tables, secondary index subtables, or
Permanent Journals.
Spool and Temp space are nothing more than unused PERM. If tables are not filling the
disks then users can utilize this empty space for their Spool and Temp space. A user will
run out of spool space if they exceed their limit on a per-AMP basis.

[Diagram: four AMPs, each holding Tables.]

The total amount of PERM allocated is always 100%. However, the actual loading of the tables took up 50% of the disks. There is 50% of the disks available system wide for SPOOL.


A Spool Space Example

Speak in a moment of anger and you'll deliver the greatest speech you'll ever regret.
- Anonymous
When you are angry, hold your tongue and keep your cool. When your query aborts, hold your tongue and raise your spool. It is important to remember that Teradata was born to be parallel, so Teradata always calculates all space on a per-AMP basis. Calculating PERM Space is radically different from calculating Spool Space. PERM Space always totals to the total space available in the system. The total amount of spool space is whatever is left over from PERM once the tables have been loaded. Remember that if you totaled everyone's spool, it could be a thousand times more than the total perm. This is because it is assumed not everyone will be logged on at the same time.
In our example below we have 3 users in MRKT. Each could be assigned the maximum amount of Spool Space that MRKT is assigned, and that is 20GB. Each could run their queries simultaneously and all could be just under 20GB and the system would not care. Spool space is an upper limit for your query answer sets. You don't add or subtract when you are giving someone else spool. The total amount of spool in the system will always exceed the actual Perm space. There are two times you run out of Spool Space: when the system is completely out of free space, or when your query exceeds its spool limit.

[Diagram: SALES is assigned 10 Gigabytes of Spool. MRKT is assigned 20 Gigabytes of Spool, and USER 1, USER 2, and USER 3 reside in MRKT.]

How much spool space can be assigned to the users in MRKT? Could they each run a
query simultaneously that reached 19.5 Gigabytes of spool? Yes!
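Because spool is only an upper limit, an administrator can raise a user's ceiling without consuming any disk. Here is a minimal sketch, assuming a DBA logon with the right privileges (the user name and the new limit are invented for illustration):

```sql
-- Raise the spool ceiling for one user (hypothetical name and limit).
-- No disk is consumed by this statement; it only changes the upper limit.
MODIFY USER User1 AS SPOOL = 25E9;  -- 25 Gigabytes

-- Verify the new limit in the Data Dictionary.
SELECT UserName, SpoolSpace
FROM   DBC.Users
WHERE  UserName = 'User1';
```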

PERM, SPOOL and TEMP Space

Every sunrise is a second chance.
- Unknown
There is a tribe in Africa that awakes during darkness and prays for the sun to come up. They have been doing so for thousands of years, and every day their prayers are answered. We all owe them a debt of gratitude because every sunrise is a second chance. There are times when you might feel down, but don't forget to give yourself a second chance. Teradata will always give you another chance if you run out of Perm, Spool or Temp.
A user or database is assigned at least two types of space. They are Permanent Space and Spool Space. PERM space is used to store tables, and most users won't get any Perm. SPOOL space is for users to run their queries, and every user gets Spool. If your query exceeds your allocated Spool space, you will need a second chance because your query is immediately aborted. For users who want to utilize Global Temporary Tables, another space is used, and it is called TEMP space.
A user who is assigned no Permanent Space can't create tables in their user space. They could however create a view, macro, or trigger, because these objects don't use Perm space.

- Permanent Space is used for Tables, Secondary Indexes, and Permanent Journals.
- Spool Space is intermediate query results.
- Temp Space is intermediate query space for Global Temporary Tables.
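All three limits are set when a user is created. A minimal sketch follows; the user name, owner, password, and sizes are invented for illustration, and PERM and TEMPORARY default to zero if omitted:

```sql
-- Create a user with all three space limits (hypothetical values).
CREATE USER Mandy FROM MRKT AS
    PASSWORD  = secret123,
    PERM      = 1E6,   -- 1,000,000 bytes for tables the user owns
    SPOOL     = 10E6,  -- upper limit for intermediate answer sets
    TEMPORARY = 5E6;   -- upper limit for Global Temporary Table space
```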

[Diagram: four AMPs, each holding its share of PERM SPACE, SPOOL SPACE, and TEMP SPACE.]

Spool Space controls system time

Danger Will Robinson Danger!
- Robot on 1960s TV Show "Lost in Space"
Permanent Space and Spool Space were designed to control how data warehouse space is allocated and how long a user's query can run. Spool space limits are designed to handle runaway queries and control how much time a user's query can run before it is deemed to be "hogging" the system. When a query is run, the result set is produced on the AMP's disk in a Spool file until it is ready to be transmitted over the BYNET to the PE, which then passes the answer to the user. If a user exceeds their spool limit by one byte, the query is immediately aborted. Not every user has the same amount of spool space. Power users are often given more spool than someone who is a new user.

Danger Your Query has exceeded its limit and will be aborted before it does some ROBBING SON!
- Abort Button of 1960s TV Show "Lost in Spool Space"
Spool Space comes from PERM space that has not been allocated. Spool space is unused PERM. The primary reason to have SPOOL space available is to store intermediate and final results of queries that are being processed in Teradata. Spool Space is released when the query is over or when the query no longer needs it. Spool space is Permanent Space that is not currently being used.
Temporary Space is also Permanent Space that is not currently being used. Some users may be assigned Temporary Space, which defines the upper limit of space that the user can utilize in Global Temporary Tables.


A quiz on Perm and Spool Space

MRKT starts with 10,000,000 bytes of Perm and 10,000,000 bytes of Spool.
SALES starts with 5,000,000 bytes of Perm and 5,000,000 bytes of Spool.
Steve and Mandy are then created with 1,000,000 bytes of Perm and 10,000,000 bytes of Spool each.

The new hierarchy looks like this (STEVE and Mandy are created under MRKT):

    MRKT                           SALES
    10,000,000 Bytes of Perm       5,000,000 Bytes of Perm
    10,000,000 Bytes of Spool      5,000,000 Bytes of Spool
      |
      |-- STEVE: 1,000,000 Bytes of Perm, 10,000,000 Bytes of Spool
      |-- Mandy: 1,000,000 Bytes of Perm, 10,000,000 Bytes of Spool


Once Steve and Mandy are created:
(1) How much Perm in MRKT? ___________
(2) How much Spool in MRKT? ___________

If Steve is given to SALES, then:
(3) How much Perm and Spool in MRKT now? ___________ ___________
(4) How much Perm and Spool in Steve? ___________ ___________
(5) How much Perm and Spool in SALES? ___________ ___________

If Steve is then dropped from the system:
(6) How much Perm and Spool in MRKT now? ___________ ___________
(7) How much Perm and Spool in SALES? ___________ ___________


Answers:
(1) How much Perm in MRKT? 8M
(2) How much Spool in MRKT? 10M

If Steve is given to SALES, then:
(3) How much Perm and Spool in MRKT now? 8M / 10M
(4) How much Perm and Spool in Steve? 1M / 10M
(5) How much Perm and Spool in SALES? 5M / 5M

If Steve is then dropped from the system:
(6) How much Perm and Spool in MRKT now? 8M / 10M
(7) How much Perm and Spool in SALES? 6M / 5M


Another quiz on Perm and Spool Space

Question 1: A system has 200 Gigabytes of space, and User A is assigned 60 Gigabytes of permanent space. User A gives User B 40 Gigabytes of permanent space. How much space will User A have left?
Answer: 20 Gigabytes

Question 2: A system has 200 Gigabytes of permanent space in the system. 100 Gigabytes is reserved for spool. The system currently has 60 Gigabytes of user data. How much could be left for spool?
Answer: 140 Gigabytes
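You can verify allocations like these yourself against the Data Dictionary. A hedged sketch: DBC.DiskSpace reports per-AMP figures, so we aggregate across AMPs (column names as found in V2R5-era dictionaries):

```sql
-- Sum each database's allocated and used Perm across all AMPs.
SELECT   DatabaseName,
         SUM(MaxPerm)     AS TotalPerm,
         SUM(CurrentPerm) AS UsedPerm
FROM     DBC.DiskSpace
GROUP BY DatabaseName
ORDER BY DatabaseName;
```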


Chapter 4 V2R5 Partition Primary Indexes

Life is a succession of lessons, which must be lived to be understood.
- Ralph Waldo Emerson
As Ralph Waldo Emerson once said, "Life is a succession of lessons, which must be lived to be understood." Teradata has lived in and understood the data warehouse environment for decades over their competitors. One of the key fundamentals of the V2R5 release is the ability to allow the AMPs to access data quicker with Partition Primary Indexes.
In the past, Teradata has hashed the Primary Index, which produced a Row Hash. From the Row Hash, Teradata was able to send the row to a specific AMP. The AMP would place a uniqueness value on the row, and the Row Hash plus the uniqueness value made up the Row ID. The data on each AMP was grouped by table and sorted by Row ID.
Through years of experience working with data warehouse user queries, Teradata has decided to take the hashing to an additional level.
In the past you could choose a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). Now Teradata will also let you choose either a Partition Primary Index (PPI) or a Non-Partition Primary Index (NPPI).
This allows for fantastic flexibility because user queries often involve ranges or are specific to a particular department, location, region, or code of some sort. Now the AMPs can find the data quicker because the data is grouped together by partition, and you can avoid Full Table Scans more often.
An example is definitely called for here. I will show you a table that is hashed and another that has a Partition Primary Index.


V2R4 Example
If you are on a V2R4 machine then each table is distributed to the AMPs based on
Primary Index Row Hash and then sorted on that AMP by Row ID. The example below
is also a Non-Partitioned Primary Index in V2R5.

An Example of Teradata V2R4

AMP 1 - Order Table
Row Hash   Order Date   Order Number
01         2-1-2003     99
05         1-1-2003     88
08         3-1-2003     95
09         1-2-2003     6
80         1-5-2003     77
87         2-4-2003     14
98         3-2-2003     17

AMP 2 - Order Table
Row Hash   Order Date   Order Number
02         2-2-2003     44
04         1-10-2003    53
12         3-5-2003     16
42         1-6-2003     100
52         3-6-2003     35
55         2-5-2003     15
88         1-22-2003    74

Primary Index is Order_Date

Notice that the Primary Index is ORDER_DATE. The Order_Date was hashed, and rows were distributed to the proper AMP based on Row Hash and then sorted by Row ID. Unfortunately, the query below results in a full table scan to satisfy the query.

SELECT * FROM Order_Table
WHERE Order_Date BETWEEN '1-1-2003' AND '1-31-2003';


V2R5 Partitioning
Notice that the Primary Index is now a Partition Primary Index on ORDER_DATE. The Order_Date was hashed, and rows were distributed to the same exact AMPs as before. The only difference is that the data is placed in partitions of Order_Date months and then sorted by Row Hash within each partition. The query below does not take a Full Table Scan because the January orders are all together in their partition. Partitioned Primary Indexes (PPI) are best for queries that specify range constraints.

An Example of PPI on Teradata V2R5

AMP 1 - Order Table
Row Hash   Order Date   Order Number
05         1-1-2003     88
09         1-2-2003     6
80         1-5-2003     77
01         2-1-2003     99
87         2-4-2003     14
08         3-1-2003     95
98         3-2-2003     17

AMP 2 - Order Table
Row Hash   Order Date   Order Number
04         1-10-2003    53
42         1-6-2003     100
88         1-22-2003    74
02         2-2-2003     44
55         2-5-2003     15
12         3-5-2003     16
52         3-6-2003     35

Partition Primary Index is Order_Date

SELECT * FROM Order_Table
WHERE Order_Date BETWEEN '1-1-2003' AND '1-31-2003';


Partitioning doesn't have to be part of the Primary Index

A journey of a thousand miles begins with a single step.
- Lao Tzu

Understanding Teradata begins with a single step, and that is reading and understanding this book. You will soon be a Teradata master, and that is quite an accomplishment. Understanding partitioning is easy once you understand the basic steps. You do not have to partition by a column that is in the primary index. Here is an example:

CREATE SET TABLE EMPLOYEE_TABLE
(
  EMPLOYEE    INTEGER NOT NULL,
  DEPT        INTEGER,
  FIRST_NAME  VARCHAR(20),
  LAST_NAME   CHAR(20),
  SALARY      DECIMAL(10,2)
)
PRIMARY INDEX (EMPLOYEE)   -- a Non-Unique Primary Index
PARTITION BY DEPT;

You can NOT have a UNIQUE PRIMARY INDEX on a table that is partitioned by something not included in the Primary Index.
Remember, data is never distributed based on the partition. Data is only distributed based on the Primary Index of a table (even if it is a PPI table).


Partition Elimination can avoid Full Table Scans

AMP 1 - Employee_Table
  Part 1:
  Employee   Dept   First_Name
  99         10     Tom
  75         10     Mike
  56         10     Sandy
  Part 2:
  30         20     Leona
  54         20     Robert
  40         20     Morgan

AMP 2 - Employee_Table
  Part 1:
  Employee   Dept   First_Name
  13         10     Ray
  12         10     Jeff
  21         10     Randy
  Part 2:
  16         20     Janie
  55         20     Chris
  70         20     Gareth

Partition Primary Index is Dept


How many partitions on each AMP will need to be read for the following query?

SELECT *
FROM Employee_Table
WHERE Dept = 20;
Answer: 1
Partition Primary Indexes reduce the number of rows that are processed by using
partition elimination.
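You can watch partition elimination at work. This sketch assumes a PPI table like the Employee_Table above; PARTITION here is the system-derived column Teradata added in V2R5, and the EXPLAIN text reveals whether only one partition needs to be scanned:

```sql
-- Count how many rows landed in each partition of the PPI table.
SELECT   PARTITION, COUNT(*)
FROM     Employee_Table
GROUP BY 1
ORDER BY 1;

-- The EXPLAIN for a partition-eliminating query should mention
-- scanning a single partition rather than all partitions.
EXPLAIN
SELECT *
FROM   Employee_Table
WHERE  Dept = 20;
```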


The Bad NEWS about Partitioning on a column that is not part of the Primary Index

Before you get too excited about partitioning by a column that is not part of the primary index, you should remember The Alamo. This is because when you run queries that don't mention the PARTITION column in your SQL, Teradata has to check every partition, and that can be some serious battle. A table can have from 1 to 65,535 partitions. The example below will have to check every partition, so be careful.

AMP 1 - Employee_Table
  Part 1:
  Employee   Dept   First_Name
  99         10     Tom
  75         10     Mike
  56         10     Sandy
  Part 2:
  30         20     Leona
  54         20     Robert
  40         20     Morgan

AMP 2 - Employee_Table
  Part 1:
  Employee   Dept   First_Name
  13         10     Ray
  12         10     Jeff
  21         10     Randy
  Part 2:
  16         20     Janie
  55         20     Chris
  70         20     Gareth

Partition Primary Index is Dept


SELECT *
FROM Employee_Table
WHERE Employee = 99;

GREAT things about Partition Primary Indexes: PPI avoids full table scans without the overhead of a secondary index, and it allows for instantaneous dropping of old data and rapid addition of newer data.
Remember these rules: A Primary Key can't be changed. A Primary Index always distributes the data. A Partition Primary Index (PPI) partitions data to avoid full table scans.


Two ways to handle Partitioning on a column that is not part of the Primary Index

You have two ways to handle queries when you partition by a column that is not part of the Primary Index:
a. You can assign a Unique Secondary Index (when appropriate).
b. You can include the partition column in your SQL.

Example 1:

CREATE UNIQUE INDEX (Employee)
ON Employee_Table;

SELECT *
FROM Employee_Table
WHERE Employee = 99;

Example 2 (the partition column is Dept):

SELECT *
FROM Employee_Table
WHERE Employee = 99
AND Dept = 10;

In both examples above, only one partition would need to be read.

Partitioning with CASE_N

You cracked the case, honey.
- Vinny, My Cousin Vinny

Teradata now allows you to crack the CASE statement as a partitioning option. Here are the fundamentals. Use of CASE_N results in:
- Just like the CASE statement, it evaluates a list of conditions, picking only the first condition met.
- The data row will be placed into a partition associated with that condition.

CREATE TABLE Order_Table
(
  Order_Number     INTEGER NOT NULL,
  Customer_Number  INTEGER NOT NULL,
  Order_Date       DATE,
  Order_Total      DECIMAL(10,2)
)
PRIMARY INDEX (Customer_Number)
PARTITION BY CASE_N
  (Order_Total <  1000
  ,Order_Total <  5000
  ,Order_Total < 10000
  ,Order_Total < 50000
  ,NO CASE, UNKNOWN);

Note: We can't have a Unique Primary Index (UPI) here because we are partitioning by ORDER_TOTAL, and ORDER_TOTAL is not part of the Primary Index.
Data Distribution of a Partitioned Primary Index table is based only on the Primary
Index.


Partitioning with RANGE_N

Teradata also has a western theme, because they allow your partitions to go "Home on the Range" by using the RANGE_N function. Here are the fundamentals. Use of RANGE_N results in:
- The expression is evaluated and associated to one of a list of ranges.
- Ranges are always listed in increasing order and can't overlap.
- The data row is placed into the partition that falls within the associated range.
- The test value in the RANGE_N function must be an INTEGER or DATE. (This includes BYTEINT and SMALLINT.)

In the example below, notice that you can use a UNIQUE PRIMARY INDEX on a partitioned table when the partitioning column is part of the PRIMARY INDEX.

CREATE TABLE Order_Table
(
  Order_Number     INTEGER NOT NULL,
  Customer_Number  INTEGER NOT NULL,
  Order_Date       DATE,
  Order_Total      DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (Customer_Number, Order_Date)
PARTITION BY RANGE_N
  (Order_Date BETWEEN DATE '2003-01-01'
              AND     DATE '2003-06-30'
              EACH INTERVAL '1' DAY);

NO CASE, NO RANGE, or UNKNOWN

We only have one person to blame, and that's each other.
- Barry Beck, NY Ranger, on who started a fight during a hockey game

In no case does NO RANGE have anything to do with a New York Ranger, but if you specify NO CASE, NO RANGE, or UNKNOWN with partitioning, there will be no fighting amongst the partitions. These keywords tell Teradata in which penalty box, or partition, to place bad data.
You can specify a NO CASE or NO RANGE partition, as well as a partition for UNKNOWN.
A NO CASE or NO RANGE partition is for any value which isn't true for any previous CASE_N or RANGE_N expression.
If UNKNOWN is included as part of the NO CASE or NO RANGE option with an OR condition, then any values that are not true for any previous CASE_N or RANGE_N expression and any unknowns (e.g., NULL) will be put in the same partition. This example has a total of 5 partitions:

PARTITION BY CASE_N
  (Salary <   30000,
   Salary <   50000,
   Salary <  100000,
   Salary < 1000000,
   NO CASE OR UNKNOWN);

If you don't see the OR operand associated with UNKNOWN, then NULLs will be placed in the UNKNOWN partition, and all other rows that don't meet the CASE criteria will be placed in the NO CASE partition. This example has a total of 6 partitions:

PARTITION BY CASE_N
  (Salary <   30000,
   Salary <   50000,
   Salary <  100000,
   Salary < 1000000,
   NO CASE, UNKNOWN);


Chapter 5 Data Protection


"Age does not protect you from love. But love, to some extent, protects you from age."
- Jeanne Moreau, French Actress
As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now and it's not just one car. It's hundreds of them!"
How do you protect your data when things go the wrong way? Murphy's Law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.
A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.
System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or, what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.
The protection features we will discuss are:
- Transaction Concept
- Transient Journal
- FALLBACK
- RAID
- Clustering
- Cliques
- Permanent Journaling


Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected.
- Swedish Proverb
At any time something could go wrong with a transaction. An old proverb suggests, "The afternoon knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.
What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal knows what the Transaction never suspected.
- Swedish Proverb after a rollback
The Transient Journal's job is to ensure that if an insert, update, or delete fails, the rows affected can be reverted back to their original state. This is called a Rollback.
In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:
- The user receives immediate feedback in the form of a failure message;
- The entire transaction is rolled back, and any changes made to the database are reversed;
- Locks are released;
- Spool files are discarded.
The Transient Journal is automatic, and it takes a before picture of any update or delete for rollback purposes.


How the Transient Journal Works

Beware of the young doctor and the old barber.
- Benjamin Franklin
Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before they cut a single strand? Then after he or she cut your hair, asked if you liked it? If you didn't like it, then you could ask to have it restored? Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.
The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities! In other words, Transient Journal cleanliness is next to godliness. If it is clean, then things went well!
The Transient Journal provides two system events that occur automatically to ensure data integrity. An automatic rollback of changed rows occurs in the event of a transaction failure. This is done because before images are retained on each AMP as changes occur. Data is always returned to its original state after a transaction failure.

[Diagram: four AMPs, each with its own Transient Journal. The Transient Journal "camera" takes a before picture of the row being changed:]

Employee Number   Department Number   First Name   Last Name   Salary
99                10                  Bill         Davis       78,000

FALLBACK Protection

United we stand, divided we fall.
- Circular letter, Boston, during the American Revolution
FALLBACK is a table protection feature used in case an AMP fails. Fallback is similar
to mirroring in that a duplicate copy of a row is created and maintained on another
AMP for redundancy purposes. Essentially, anytime you define a table with Fallback
you are using twice the space. You can use FALLBACK on all tables, some tables or
no tables. You can also create a table with or without FALLBACK and then add or
drop the feature at any time.
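Adding or dropping Fallback is a one-line DDL change. A hedged sketch follows; the table name is invented and the column list is abbreviated for illustration:

```sql
-- Create a table with Fallback from the start (hypothetical table).
CREATE TABLE Employee_Copy, FALLBACK
(
  Employee  INTEGER NOT NULL,
  Dept      INTEGER
)
PRIMARY INDEX (Employee);

-- Drop the feature later...
ALTER TABLE Employee_Copy, NO FALLBACK;

-- ...or add it back at any time.
ALTER TABLE Employee_Copy, FALLBACK;
```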

Divided we stand, united we Fallback.
- AMP during the computer revolution
Fallback is similar to mirroring in that it creates and maintains a duplicate copy of each row, but it is designed in a revolutionary manner for performance purposes. With mirroring, if one disk goes down, another duplicate disk takes over. Fallback, however, will take all the rows that one AMP is responsible for in a fallback protected table and store them on multiple AMPs. If the AMP fails, then multiple AMPs will be responsible for delivering the failed AMP's rows.

We have the right to bear arms.
- 2nd Amendment of the Constitution

Teradata believes its constitution is to protect the data, and so a duplicate copy is always maintained on another AMP.

We have no access rights to bare AMPs.
- 2nd Amendment of the Teradata constitution


How Fallback Works

It's déjà vu all over again!
- Yogi Berra
Fallback is like déjà vu all over again, because when a table is fallback protected, the rows are duplicated on other AMPs. Fallback is similar to mirroring, but different. The similarity is that both provide a duplicate copy, but the difference is that Fallback places copies of its rows on multiple AMPs, so if a failure occurs, Teradata can use its parallelism to help the failed AMP.
Below is a diagram of four AMPs holding a base table. For example's sake, let's assume that the base table is the Employee Table. There are 12 employees with employee numbers ranging from 1 to 12. The data is spread evenly in the table, with each AMP responsible for 3 employees.
The Employee Table has been created with Fallback, so each row of the base table is duplicated on another AMP in the Fallback Table. Notice three very important features:
(1) No base table row is on the same AMP with its Fallback protected duplicate copy.
(2) Each AMP spreads its Fallback rows evenly to multiple AMPs.
(3) The perm space used for the table is doubled because of the Fallback.
The system can lose any single AMP or disk in this system. If multiple AMPs or disks fail in the picture below, then Teradata won't be able to run queries that ask for all the data.

                AMP 1     AMP 2     AMP 3     AMP 4
Base Table      1 5 9     2 6 10    3 7 11    4 8 12
Rows
Fallback        10 7 4    1 11 8    5 2 12    9 6 3
Rows


Fallback Clusters
Fallback is always associated with CLUSTERS. Fallback can be specified at the table level. Fallback is worth the price because when an AMP fails, users still have access to the data even while the AMP is offline. Any data that has changed is automatically restored during the AMP offline period.
If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2,000 AMPs. Therefore, the chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. With Clustering, Teradata can lose one AMP/disk per cluster. Let's look at this next example with 8 AMPs in two clusters.
Notice that the data in the base table lays out evenly with 24 records on 8 AMPs. What is key to notice is that the fallback copy remains within the cluster. In other words, the base table rows in cluster one are fallback protected within cluster one. The base table rows in cluster two are fallback protected within cluster two. We can lose one AMP/disk in both cluster one and cluster two, and the system is fine.

Cluster # 1
                AMP 1     AMP 2     AMP 3     AMP 4
Base Table      1 9 17    2 10 18   3 11 19   4 12 20
Rows
Fallback        18 11 4   1 19 12   9 2 20    17 10 3
Rows

Cluster # 2
                AMP 5     AMP 6     AMP 7     AMP 8
Base Table      5 13 21   6 14 22   7 15 23   8 16 24
Rows
Fallback        22 15 8   5 23 16   13 6 24   21 14 7
Rows


Down AMP Recovery Journal (DARJ)

Once the game is over, the king and the pawn go back in the same box.
- Italian Proverb
The Down AMP Recovery Journal (DARJ) is started on all AMPs in the cluster when an AMP is down. This allows the three other AMPs to check on their mate. Since there are four AMPs in most clusters, and all Fallback for a particular AMP remains within the cluster, there are three AMPs that will hold Fallback rows for a down AMP.
The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP by completing the missed transactions. Once everything is caught up, the DARJ is dropped.

Cluster # 1 (AMP 1 is down; AMPs 2, 3 and 4 each start a Down AMP Recovery Journal)

                AMP 1 (down)  AMP 2     AMP 3     AMP 4
Base Table      1 9 17        2 10 18   3 11 19   4 12 20
Rows
Fallback        18 11 4       1 19 12   9 2 20    17 10 3
Rows

DARJ Example for Online Catch-Up:
  Updated Base 1       Salary = 92000
  Updated Base 9       Last_Name = 'Smith'
  Updated Fallback 4   Dept_no = 10

Cluster # 2
                AMP 5     AMP 6     AMP 7     AMP 8
Base Table      5 13 21   6 14 22   7 15 23   8 16 24
Rows
Fallback        22 15 8   5 23 16   13 6 24   21 14 7
Rows


Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.
- Sign on Pentagon office wall
RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly! Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called transparent MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical. Each AMP has one virtual disk, meaning only that AMP can access its disks, but there are actually four physical disks. When data is written on the primary disk, it is also written on the mirror disk. The downside of RAID-1, like FALLBACK, is that it requires a 100% overhead of disk space.
With RAID-1, data is mirrored across paired disks. With RAID-5, data and parity are striped across a rank of disks, and data is reconstructed on a disk failure. Fallback and RAID-1 together provide the highest level of protection.

[Diagram: one AMP connects through a Disk Array Controller to four physical disks that the AMP sees as one virtual disk. Each row is written to a data disk and its mirror:]

Data          Mirror        Data          Mirror
2 Ben Hon     2 Ben Hon     10 Don Roy    10 Don Roy

Four Physical Disks - One Virtual Disk


Cliques
Teradata CLIQUES (pronounced "cleeks") are a method of system protection against the failure of an entire node. Each node contains AMP VPROCs in memory. Each AMP is attached to one virtual disk (Vdisk), and that AMP is the only VPROC allowed access to its Vdisk. A clique provides access to a set of disks from another node. If a node fails, the AMP VPROCs can migrate to the node that has the backup access to their virtual disks. A migrating AMP can continue to read and write to its Vdisk while its home node is down. When the home node is fixed and available again, the VPROCs return home.
If a Teradata system uses two-node cliques, then when one node fails, all of its AMP VPROCs migrate to the other node. The system is now about 50% slower. To solve this problem, Teradata allows bigger cliques, such as eight nodes. If one node fails, its VPROCs split up and migrate amongst the seven other nodes in the clique without much performance degradation.

[Diagram: Node 1 and Node 2 (Intel SMP nodes running AMPs) connected by the BYNET, with clique cables running from each node to both Disk Array Controllers (DACs). If a node fails, VPROCs can migrate to the other node and still have access to their Virtual Disks (Vdisks).]


Cliques: a two-node example


During a node failure, all AMPs migrate from the failing node to another node within
the clique. Vdisks can still be accessed by their AMPs during a failover.
Cliques help protect against node failures, but have nothing to do with how the data is
spread across the AMPs. In a two-, three-, or four-node CLIQUE Teradata system, data
is spread across all AMPs in the system.
In the same way, when a node goes down, the software AMPs and PEs migrate over the
BYNET to a temporary home on another node.

Diagram: Node 1 fails, so its Vprocs migrate over the BYNET to Node 2. Because the
clique cables give Node 2 a path to Node 1's Disk Array Controllers (DACs), the
migrated AMPs still have access to their Virtual Disks (Vdisks).


Cliques: a four-node example


Below is an example of a four-node clique. If a node goes down, the VPROCs in the
failed node will distribute evenly among the remaining nodes in the clique, so
degradation is minimal.
Access to all data is maintained during a failover, and performance degradation is
inversely proportional to clique size: the bigger the clique, the less the performance
degradation.

Diagram: a 4-node clique. Node 1 through Node 4, each with PEs, Memory, and AMPs,
connect to BYNET 0 and BYNET 1, and a 4-node clique cable ties every node to a shared
Disk Array Cabinet holding eight Disk Array Controllers (DACs). A System Mgmt
Chassis and dual power complete the cabinet.

Permanent Journal

The absent are always in the wrong.


English Proverb
If a system had five million rows and used FALLBACK protection, then it would have
five million FALLBACK rows. However, this would be quite costly because
FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the
same cluster. FALLBACK is used either because the system is mission critical or the
system is not backed up regularly. For customers who back up data regularly, another
option for data restoration is the Permanent Journal. When a company is not severely
impacted by the couple of hours needed for a restoration to complete, this is a very good
option. The Permanent Journal works in conjunction with backup procedures, plus it's a
lot more cost-effective than FALLBACK.

The absent are always in the write.


Permanent Journal Proverb
The Permanent Journal stores only images of rows that have been changed due to
an INSERT, UPDATE, or DELETE command. That is why, when data is lost or
absent, the permanent journal can write it back to the disks. The permanent journal keeps
track of all new, deleted, or modified data since the last Permanent Journal backup. This
option is usually less expensive than storing the additional five million FALLBACK
rows.
Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables
of your choosing or on no tables at all. It provides the flexibility to customize a Journal
to meet specific needs. The Permanent Journal must be manually purged from time to
time.
There are four image options for the Permanent Journal:
Before Journal
After Journal
Dual Before Journal
Dual After Journal
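To illustrate, journal options can also be set as database defaults when a database is created, so tables inherit them unless they say otherwise. The sketch below is a hedged example; the database name, space size, and journal table name are made-up assumptions:

```sql
CREATE DATABASE Teratom FROM DBC AS
    PERM = 20000000000                          -- permanent space in bytes (hypothetical)
  , DEFAULT JOURNAL TABLE = Teratom.journals    -- table that holds the journal images
  , DUAL AFTER JOURNAL ;                        -- keep two copies of each after-change image
```

A table created in this database without its own journal clause would then pick up the DUAL AFTER JOURNAL default.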


Table create with Fallback and Permanent Journaling
The example creates the table called Employee in the Teratom database, and it is
FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified.
Remember that both FALLBACK and JOURNALING default to NO, meaning that if
you don't specify this protection at either the table or database level, the default is NO
FALLBACK and NO JOURNALING.

CREATE TABLE Teratom.employee,
FALLBACK,
BEFORE JOURNAL,
DUAL AFTER JOURNAL
(
 emp        INTEGER
,dept       INTEGER
,lname      CHAR(20)
,fname      VARCHAR(20)
,salary     DECIMAL(10,2)
,hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp);


Locks

Some birds aren't meant to be caged, their feathers are just too bright. And when they
fly away, the part of you that knows it was a sin to lock them up does rejoice.
Shawshank Redemption
You don't lock up a bird, but you always lock a query. Teradata uses a lock manager to
automatically lock at the database, table, or row hash level. Teradata will lock objects
using four types of locks:
Exclusive - Exclusive locks are placed only on a database or table when the object is
going through a structural change. An Exclusive lock restricts access to the object by any
other user. This lock can also be explicitly placed using the LOCKING modifier.
Write - A Write lock happens on an INSERT, DELETE, or UPDATE request. A Write
lock restricts access by other users. The only exception is for users who are reading data,
are not concerned with data consistency, and override the applied lock by specifying an
Access lock. This lock can also be explicitly placed using the LOCKING modifier.
Read - This is placed in response to a SELECT request. A Read lock restricts access by
users who require Exclusive or Write locks. This lock can also be explicitly placed using
the LOCKING modifier. Read locks put the word "integrity" in data integrity. If you
have a multi-user environment with updates occurring and you need to keep data
consistent, you want a Read lock.
Access - Placed in response to a user-defined LOCKING FOR ACCESS phrase. An
Access lock permits the user to READ an object that may already be locked for
READ or WRITE. An Access lock does not restrict access by another user except when
an Exclusive lock is required. A user requesting access cannot be concerned with data
consistency.
When Teradata locks a resource for a user, the lifespan of the transaction lock is forever,
or until the user releases the lock. This is different than a deadlock situation: if two
transactions are deadlocked, the youngest query is always aborted.

Teradata has 4 locks for 3 levels of Locking

When you go into court you are putting your fate into the hands of twelve people
who weren't smart enough to get out of jury duty.
- Norm Crosby
Teradata uses a lock manager to be judge, jury, and executioner of SQL. There are four
locks placed on objects at the database, table, or row hash level.

Diagram: the four locks (Exclusive Lock, Write Lock, Read Lock, Access Lock) can
each be applied at three levels (Database, Table, Row Hash).

Copyright Open Systems Services 2004

Page 71

Chapter 5

Locks and their compatibility

Frankly, my dear, I don't give a damn.
- Rhett Butler, Gone with the Wind (1939)
Not everyone is compatible, and Teradata locks are no exception. Locks that are
compatible can lock the same object simultaneously. Clark Gable would have been a
great Teradata user because he always used a "Rhett Lock" and, according to Scarlett,
was almost never "Write"!
Locks that are compatible can share access to objects simultaneously, so READ locks
are great because one or a thousand users can read the same object at the same time.
Teradata will not allow a user to change a table while others are reading it. This prevents
database corruption.

Teradata Lock     Compatible Locks
Exclusive Lock    No compatibility
Write Lock        Access Lock
Read Lock         Read Lock, Access Lock
Access Lock       Read Lock, Write Lock, Access Lock

An ACCESS Lock is an excellent way to avoid waiting for a write lock currently on a
particular table. Two statements allow this:
Locking Row for Access
Locking Tablename for Access
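For example, a "dirty read" that steps around a Write lock on the Employee table might look like the following. This is an illustrative sketch; the table and column names are assumptions:

```sql
-- Read Employee without waiting for a current Write lock to finish.
-- The rows returned may be mid-update: no data-consistency guarantee.
LOCKING TABLE Teratom.employee FOR ACCESS
SELECT emp, lname, salary
FROM   Teratom.employee;
```

Substituting LOCKING ROW FOR ACCESS requests the same behavior at the row hash level instead of the whole table.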

Chapter 6 - Loading the Data

I don't know who my grandfather was. I am more interested in who his grandson will
become.
Abraham Lincoln, 16th president of the United States
My son once told me he did not feel like studying. I said to him, "When Abraham
Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham
Lincoln was your age, he was president."
Data within a warehouse environment is often historic in nature, so the sheer volume of
data can overwhelm many systems. But, not Teradata!

Abraham Lincoln will go down as one of


the greatest presidents in history, but
Teradata is even better because it will not
go down when it loads history.
Tom Coffing, 1st president of Coffing Data Warehousing
Teradata is so advanced in the data-loading department that other database vendors can't
hold a candle to it. A Teradata data warehouse brings enormous amounts of data into the
system. This is an area that most companies overlook when purchasing a data warehouse.
Most company officials think loading of data is simply that: just loading data. Some
people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and
get a stupid ANSI.
Data warehouses fail because customers cannot load the data fast enough once it reaches a
certain volume. As one Teradata developer said, "It is not the load that brings them
down, but the way they carry it." Even an experienced body builder must use good
technique to lift the weight over his head. While most database vendors are new to the
data warehouse game, Teradata has had 15 years of experience loading the largest data
warehouses in the world. The combination of FastLoad, MultiLoad, and TPump can load
millions, even billions of records in record time.
FastLoad is designed to load flat file data from a mainframe or LAN directly into an
empty Teradata table. This is how a Teradata table is populated the first time. I have
personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I
have seen Teradata load millions of rows in minutes. How are Teradata's speed and
performance accomplished? Once again, it's through the power of parallel processing.
Where FastLoad is meant to populate empty tables with INSERTs, MultiLoad is meant to
process INSERTs, UPDATEs, and DELETEs on tables that have existing data.
MultiLoad is extremely fast. One major Teradata data warehouse company processes
120 million inserts, updates, and deletes nightly during its batch window.
The TPump utility is designed to allow OLTP transactions to immediately load into a
data warehouse. When I started working with Teradata, more than 10 years ago, most
companies loaded data on a monthly basis. Suddenly, companies began to load data
weekly.
Today, most companies load data nightly, and industry leaders are loading data hourly.
TPump is the beginning step of an Active Data Warehouse (ADW). ADW combines
OLTP transactions with the power of a Decision Support System (DSS).
The TPump utility theoretically acts like a water faucet. TPump can be set to full throttle
to load millions of transactions during off peak hours or turned down to trickle small
amounts of data during the data warehouse daily rush hour. It can also be automatically
preset to load levels at certain times during the day, and can be modified at any time.
Also, TPump locks at a row level so users have access to the rest of the rows while the
table is being loaded. Another advantage of this load utility is that it allows for multiple
updates to be conducted on a table simultaneously.
When the utilities start, the Parsing Engine comes up with a plan for the AMPs. The
Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in
large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of
workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from
AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they
received and hash the Primary Index value sending the rows over the BYNET to their
destination AMP. Once this is done, each AMP sorts its data by Row ID and the table is
ready for business.


FastLoad

If you are all wrapped up in yourself, you are overdressed.
Kate Halverson
The Teradata FastLoad utility is wrapped up in your data, and even though it appears
underdressed without fancy dressings, it is one of the best utilities ever built. It may not
be dressed to kill, but it is designed to thrill!
FastLoad is actually designed to load flat file data from a mainframe or LAN directly
into an empty Teradata table. This is how a Teradata table is populated the first time. I
have personally seen Teradata load over one billion large rows in less than 6 hours. Plus,
I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to
solution and the most powerful performance in the data warehousing industry.
How are Teradata's speed and performance accomplished? It's done through parallel
processing.
FastLoad understands one SQL command - INSERT. It inserts rows into an empty table.
The process is as follows: a flat file is prepared for loading on a mainframe or LAN.
The FastLoad utility needs three pieces of information to process: where the flat file is
located, what its file definition is, and what table in Teradata the data should be loaded
into.
When the FastLoad utility starts, the Parsing Engine comes up with a plan for the AMPs.
The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded
in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of
workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from
AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they
received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP
over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is
ready for business.
FastLoad Basics:

Loads data to Teradata from a Mainframe or LAN flat file;


Only one table may be loaded at a time;
The table to be loaded must be empty;
There can be no secondary indexes, referential integrity, or triggers;
It locks at the table level.

FastLoad populates empty tables at the block level. Teradata LOADs using FastLoad.
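As a rough sketch, a FastLoad job script ties those three pieces of information together. This is a hedged example, not production code; the logon string, file name, field names, and record format are all assumptions:

```sql
LOGON tdpid/username,password;          /* where Teradata is and who you are      */
SET RECORD VARTEXT ",";                 /* flat file format: comma-delimited text */
DEFINE emp   (VARCHAR(11))              /* file definition, field by field        */
     , dept  (VARCHAR(11))
     , lname (VARCHAR(20))
FILE = employee.txt;                    /* where the flat file is located         */
BEGIN LOADING Teratom.employee          /* the empty target table                 */
      ERRORFILES Teratom.emp_err1, Teratom.emp_err2;
INSERT INTO Teratom.employee VALUES (:emp, :dept, :lname);
END LOADING;
LOGOFF;
```

The two error tables catch rows that fail conversion or violate the unique primary index, which is why FastLoad can keep streaming blocks without stopping on a bad row.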


FastLoad Picture

Diagram: an input file on a mainframe or LAN flows into Teradata as 64K blocks
through the PE to the AMPs, each of which loads its block into an empty table.
FastLoad inserts into empty tables at the block level. No Secondary Indexes,
Referential Integrity, or Triggers are allowed.

Multiload

No wonder nobody comes here. It's too crowded.
Yogi Berra
Tera-Tom has actually had dinner with Yogi, and he was a real pleasure. As an
All-American athlete who placed third in the NCAAs for the University of Arizona in 1979,
Tera-Tom got to spend some time with Yogi. Yogi is a lot like Multiload: he is fast on
his feet, is extremely versatile, and he knows a little bit about clean-up. Multiload can
handle the high heat or the curve when inserting, updating, or deleting data.
Where FastLoad is meant to populate empty tables with INSERTS, Multiload is meant to
process INSERTS, UPDATES, and DELETES on tables that have existing data.
Multiload is extremely fast. One major Teradata data warehouse company processes 120
million inserts, updates, and deletes during its nightly batch.
Multiload works similar to FastLoad. Data originates as a flat file on either a mainframe
or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the
AMPs to follow. The data is then passed to the AMPs, in parallel, in 64K blocks, and the
AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and
DELETES are applied.
In the Multiload picture, the mainframe/LAN is talking to the Parsing Engine. The PE
passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many
systems have hundreds to thousands of AMPs. The load takes place, continually, in
parallel as the 64K blocks are delivered to the AMPs. Multiload has been designed
for users who have a "need for speed." Multiload locks at the table level. Therefore,
while Multiload is running, the table is unavailable unless users utilize an Access Lock.
Multiload Basics:

Loads data to Teradata from a Mainframe or LAN flat file;


Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables;
Receiving tables are usually populated;
There can be no Unique secondary indexes, referential integrity, or triggers;
It locks at the table level.

Multiload loads to populated tables at the block level. Teradata UPDATEs using
MULTILOAD.
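A minimal Multiload script follows the same pattern as a FastLoad script but adds a restart log table and labeled DML. The sketch below is illustrative only; the log table, layout, field, and file names are assumptions:

```sql
.LOGTABLE Teratom.emp_mload_log;        /* restart log used for checkpointing      */
.LOGON tdpid/username,password;
.BEGIN MLOAD TABLES Teratom.employee;   /* up to 5 target tables may be listed     */
.LAYOUT emp_layout;                     /* describes the incoming flat-file record */
.FIELD emp    * VARCHAR(11);
.FIELD salary * VARCHAR(12);
.DML LABEL upd_sal;                     /* one of up to 20 labeled DML statements  */
UPDATE Teratom.employee SET salary = :salary WHERE emp = :emp;
.IMPORT INFILE employee.txt
        LAYOUT emp_layout
        APPLY upd_sal;                  /* apply the labeled DML to each record    */
.END MLOAD;
.LOGOFF;
```

The log table is what lets an interrupted Multiload job restart from its last checkpoint instead of starting over.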


Multiload Picture

Diagram: an input file on a mainframe or LAN flows into Teradata as 64K blocks
through the PE to the AMPs and their populated tables. Multiload inserts, updates,
upserts, and deletes rows in populated tables at the block level. It does not allow
Triggers, Unique Secondary Indexes (USIs), or Referential Integrity.

TPump

You don't drown by falling into the water; you drown by staying in the water.
- Edwin Louis Cole
The TPump utility is designed to allow OLTP transactions to immediately load into a
data warehouse. When I started working with Teradata, more than 10 years ago, most
companies loaded data on a monthly basis. Suddenly, companies began to load data
weekly. Today, most companies load data nightly, and industry leaders are loading data
hourly. TPump is the beginning step of an Active Data Warehouse (ADW). An ADW
combines OLTP transactions with a Decision Support System (DSS).
If the data is not flowing, a company can drown in it! The utility is called TPump because
it theoretically acts like a water faucet. TPump can be set to full throttle to load millions
of transactions during off peak hours or turned down to trickle small amounts of data
during the data warehouse rush hour. It can also be automatically preset to load different
levels at certain times during the day, and can be modified at any time.
Also, TPump locks at a row level so users have access to the rest of the rows while the
table is being loaded.
Basics:

Loads data to Teradata from a Mainframe or LAN flat file;


Processes INSERTS, UPDATES, or DELETES;
Tables are usually populated;
It can have secondary indexes, triggers, and referential integrity;
It locks at the row level.

TPump is used for continuous updates to rows in a table. Teradata STREAMs using
TPump.
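A TPump script looks much like a Multiload script, with extra knobs on the BEGIN LOAD statement that set the "faucet": PACK (statements bundled per request) and RATE (statements per minute). Again, this is a hedged sketch with assumed names:

```sql
.LOGTABLE Teratom.emp_tpump_log;
.LOGON tdpid/username,password;
.BEGIN LOAD SESSIONS 4
       ERRORTABLE Teratom.emp_tpump_err
       PACK 20                          /* rows packed into one multi-statement request */
       RATE 600;                        /* max statements per minute - the faucet       */
.LAYOUT emp_layout;
.FIELD emp  * VARCHAR(11);
.FIELD dept * VARCHAR(11);
.DML LABEL ins_emp;
INSERT INTO Teratom.employee (emp, dept) VALUES (:emp, :dept);
.IMPORT INFILE employee.txt
        LAYOUT emp_layout
        APPLY ins_emp;
.END LOAD;
.LOGOFF;
```

Turning RATE down is what produces the "trickle" feed described above; turning it up opens the faucet for off-peak hours.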


TPump Picture

Diagram: packets from a mainframe or LAN input file flow through the PE to the
AMPs and their populated tables, with row-level locks on each AMP. TPump inserts,
updates, upserts, and deletes rows in populated tables at the row level. It supports
Triggers, all Secondary Indexes, and Referential Integrity.

FastExport

The most exciting phrase to hear in science, the one that heralds the most
discoveries, is not "Eureka!" but "That's funny..."
Isaac Asimov
The most exciting words when loading or unloading data are "That fast!" Put a seat belt
on before running FastExport, because this utility will blow your socks off.
FastExport is designed to export Teradata data to a flat file on a mainframe or LAN.
FastExport merely takes an SQL SELECT command and places the output on a host.
FastExport can export data from multiple tables and exports the data to a host file.
Teradata LOADs using FASTLOAD
Teradata UPDATEs using MULTILOAD
Teradata STREAMs using TPump
Teradata Exports using FASTEXPORT
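A FastExport job is essentially a SELECT wrapped in export commands. The sketch below is illustrative; the log table, session count, and file names are assumptions:

```sql
.LOGTABLE Teratom.emp_fexp_log;
.LOGON tdpid/username,password;
.BEGIN EXPORT SESSIONS 4;               /* parallel sessions pulling the answer set */
.EXPORT OUTFILE employee_out.txt;       /* host file on the mainframe or LAN        */
SELECT e.emp, e.lname, e.salary
FROM   Teratom.employee e;              /* any SELECT, including multi-table joins  */
.END EXPORT;
.LOGOFF;
```

Because the AMPs build the answer set in parallel and stream it out in blocks, FastExport is the unload-side mirror of FastLoad.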


FastExport Picture

Diagram: FastExport uses a SELECT statement to retrieve rows from one or more
populated tables; the AMPs return the result set through the PE, and the output is
written to a host file on a mainframe or LAN.

Chapter 7 - Secondary Indexes

I don't skate to where the puck is, I skate to where I want the puck to be.
Wayne Gretzky
What Wayne Gretzky is saying is that he finds the best path to the goal and expects the
puck to be there when he arrives for the shot. Secondary indexes are similar because they
define a path that will deliver the data quickly to meet the user's expected goals. A
secondary index is an alternate path to the data. Secondary indexes can be defined as a
Unique Secondary Index (USI) or a Non-Unique Secondary Index (NUSI). Without any
secondary indexes, your data warehouse could be skating on thin ice!
When it comes to working with large amounts of centrally located data, performance in
accessing that data is key. So what can a user do to influence the way data is accessed?
The first rule of thumb, which is essential when working with centralized databases
today, is to know your data. Second, understand how Teradata manages data distribution
and what a user can do to enhance performance. A query that utilizes a Primary Index in
the WHERE clause is the fastest path to the data. A query that utilizes a Secondary Index
provides an alternate path to the data and is the second fastest access method. This
chapter is dedicated to secondary indexes.
Secondary Indexes
Secondary Indexes provide another path to access data. Let's say that you were planning
a road trip to your hometown. To determine the best way to get there, you utilize a map.
This map will give you many alternatives for planning your trip. In this case, you need
to get there ASAP, so you choose the route that gets you there in the shortest period of
time. Secondary indexes work very similarly to the example above because they provide
another path to the data. Teradata allows up to 32 secondary indexes per table. Keep in
mind that the base table data rows aren't redistributed when secondary indexes are
defined. Secondary indexes reside in a subtable and are stored on all AMPs, which is
very different from how the primary index rows (part of the base table) are stored. Keep
in mind that Secondary Indexes (when defined) do take up additional space.
Secondary Indexes are frequently used in a WHERE clause. The Secondary Index can be
changed or dropped at any time. However, because of the overhead for index
maintenance, it is recommended that index values should not be frequently changed.
There are two different types of Secondary Indexes, Unique Secondary Index (USI), and
Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely
efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the
USI subtable row (in the Secondary Index subtable) that references the actual data row,
which resides on the second AMP.
A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool
file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:

Satisfying complex conditions

Processing aggregates

Value comparisons

Matching character combinations

Joining tables

Below is a general illustration of a secondary index subtable row:

Secondary Index Subtable Columns
Secondary Index Value    (actual length of the value)
Secondary Index Row-ID   (8 bytes)
Primary Index Row-ID     (8 bytes)

Unique Secondary Index (USI)

Measure a thousand times and cut once.


-Turkish Proverb
Secondary Indexes provide an alternate path to the data and should be used for queries
that run thousands of times. Teradata runs extremely well without secondary indexes, but
since secondary indexes use up space and overhead, they should only be used for
KNOWN QUERIES, or queries that are run over and over again. Once you know the
data warehouse environment, you can create secondary indexes to enhance its
performance.

Measure a thousand query times and


create a secondary index.
-Turkish Teradata Certified Professional
Whenever a secondary index is created, Teradata creates a secondary index subtable on
each AMP. All secondary index subtables contain:
Secondary Index Value
Secondary Index Row ID
Primary Index Row ID
A UNIQUE Secondary Index (USI) will improve data retrieval and can also be used to
enforce uniqueness on a primary key. Typically, only two AMPs are used on a
Unique Secondary Index (USI) access.
A Non-Unique Secondary Index (NUSI) is AMP local and is an All AMP operation,
but not a full table scan.
Four major index types in Teradata are the Join Index, Hash Index, Sparse Index, and
Value Ordered Index.
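Creating either secondary index type discussed in this chapter is a one-statement operation. The statements below are illustrative, using the Employee table from the examples that follow:

```sql
-- USI: enforces uniqueness and gives a two-AMP retrieval path
CREATE UNIQUE INDEX (soc_security) ON Teratom.employee;

-- NUSI: non-unique alternate path; retrieval is all-AMP but not a full table scan
CREATE INDEX (fname) ON Teratom.employee;
```

Either index can be dropped at any time with DROP INDEX, which removes its subtables from every AMP.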


USI Subtable Example


When a USI is designated on a table, each AMP will build a subtable to point back to
the base table. If you create 32 USI indexes on a table, then each AMP will build 32
separate subtables. Therefore, choose your Secondary Indexes wisely, because space is
used when these indexes are created. When a user inputs SQL that utilizes a USI in the
WHERE clause, Teradata knows that either one row or no rows can be returned. The
reason: the column in the WHERE clause is unique. The following example illustrates
how a USI subtable is created and how it works to speed up queries.

Employee Table with Unique Secondary Index (USI) on Soc_Security

AMP 1 - Employee Base Table
ROW ID  Emp  Dept  Fname  Lname  Soc_Security
04,1    88   20    John   Marx   276-68-2130
18,1    75   10    Mary   Mavis  235-83-8712
25,1    15   30    John   Davis  423-87-8653

AMP 1 - Secondary Index Subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
123-99-8888            102,1                   45,1
146-69-2650            118,1                   14,1
235-83-8712            134,1                   18,1

AMP 2 - Employee Base Table
ROW ID  Emp  Dept  Fname  Lname  Soc_Security
14,1    45   10    Max    Wiles  146-69-2650
38,1    32   10    Will   Berry  212-53-4532
45,1    65   40    Oki    Ngu    123-99-8888

AMP 2 - Secondary Index Subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
276-68-2130            121,1                   04,1
423-87-8653            138,1                   25,1
212-53-4532            144,1                   38,1

When a USI is created, Teradata will immediately build a secondary index subtable on
each AMP.
Each AMP will then hash the secondary index value for each of the rows in its portion
of the base table. In our example, each AMP hashes the Soc_Security column for all
employee rows it holds.
The output of the Soc_Security hash will utilize the hash map to point to a specific
AMP, and that AMP will hold the secondary index subtable row for the secondary
index value.


How Teradata retrieves a USI query

When a USI is used in the WHERE clause of an SQL statement, the PE Optimizer
recognizes the Unique Secondary Index. It will perform a two-AMP operation to find
the base row. Teradata knows it is looking for only one row, and it can find it easily. It
will hash the secondary index value, and the hash map will point to the AMP where the
row resides in the subtable. The subtable row holds the base table Row-ID, so Teradata
can then find the base row immediately.

SELECT * FROM Employee
WHERE Soc_Security = '123-99-8888';
Diagram: the two-step USI retrieval for Soc_Security '123-99-8888'.
Step 1: Hash the value '123-99-8888'. The row hash output points to a bucket in the
Hash Map, which locates the AMP holding the subtable row for that value.
Step 2: Go to the AMP where the Hash Map points. Locate the '123-99-8888' row in
the Secondary Index Subtable and read the Base Table Row-ID, then use the Base
Table Row-ID to retrieve the base row from the AMP that owns it.


NUSI Subtable Example


When a Non-Unique Secondary Index (NUSI) is designated on a table, each AMP will
build a subtable. The NUSI subtable is said to be AMP local because each AMP creates
its secondary index subtable to point to its own base rows. In other words, every row in
an AMP's NUSI subtable will reflect and point to the base rows it owns. When a user
inputs SQL that utilizes a NUSI in the WHERE clause, Teradata will have each AMP
check its subtable to see if it has any qualifying rows. Only the AMPs that contain the
needed values will be involved in the actual retrieve.

Employee Table with Non-Unique Secondary Index (NUSI) on Fname


AMP 1 - Employee Base Table
ROW ID  Emp  Dept  Fname  Lname  Soc_Security
04,1    88   20    John   Marx   276-68-2130
18,1    75   10    Mary   Mavis  235-83-8712
25,1    15   30    John   Davis  423-87-8653

AMP 1 - Secondary Index Subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
John                   145,1                   04,1  25,1
Mary                   156,1                   18,1

AMP 2 - Employee Base Table
ROW ID  Emp  Dept  Fname  Lname  Soc_Security
14,1    45   10    Max    Wiles  146-69-2650
38,1    32   10    Will   Berry  212-53-4532
45,1    65   40    Oki    Ngu    123-99-8888

AMP 2 - Secondary Index Subtable
Secondary Index Value  Secondary Index Row-ID  Base Table Row-ID
Max                    134,1                   14,1
Will                   157,1                   38,1
Oki                    159,1                   45,1

When a NUSI is created, Teradata will immediately build a secondary index subtable
on each AMP.
Each AMP will hold the secondary index values for its own base table rows only. In
our example, each AMP holds the Fname column for all employee rows in the base
table on that AMP (AMP local).
Each AMP-local Fname will have the Base Table Row-ID (pointer) so the AMP can
retrieve it quickly if needed. If an AMP contains duplicate first names, only one
subtable row for that name is built, with multiple Base Row-IDs.

How Teradata retrieves a NUSI query


When a NUSI is used in the WHERE clause of an SQL statement, the PE Optimizer
recognizes the Non-Unique Secondary Index. It will perform an all-AMP operation in
which each AMP looks into its subtable for the requested value. If an AMP contains the
value, it will continue participation; if it does not, it will no longer participate. A
NUSI query is an all-AMP operation, but not a Full Table Scan (FTS).

SELECT * FROM Employee
WHERE Fname = 'John';
Diagram: the NUSI retrieval for Fname 'John'.
Step 1: Hash the value 'John' for speed. Each AMP takes the row hash for John and
checks its subtable to see if it has a John.
Step 2: Any AMP that does not contain the name John no longer participates in the
query. Every AMP that does contain a John uses the Base Table Row-IDs in its
subtable row to retrieve its John rows.


Value Ordered NUSI


When a Value Ordered Non-Unique Secondary Index (Value Ordered NUSI) is
designated on a table, each AMP will build a subtable. The NUSI subtable is said to be
AMP local because each AMP will create its secondary index subtable to point to its own
base rows. In other words, every row in an AMP's NUSI subtable will reflect and point to
the base rows it owns. It is called a Value Ordered NUSI because instead of the subtable
being sorted by the hash of the Secondary Index Value, it is sorted numerically by the
value itself.
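Such an index might be created with a statement along these lines. This is a sketch assuming V2R5's ORDER BY VALUES clause; Employee and Dept come from the example below.

```sql
CREATE INDEX (Dept) ORDER BY VALUES (Dept) ON Employee;
```

The ORDER BY VALUES clause is what tells Teradata to sort the subtable by the Dept value instead of by row hash.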

Employee Table with Value Ordered Non-Unique Secondary Index on Dept
AMP 1 - Employee Base Table
Row-ID  Emp  Dept  Fname  Lname  Soc_Security
04,1    88   20    John   Marx   276-68-2130
18,1    75   10    Mary   Mavis  235-83-8712
25,1    15   30    John   Davis  423-87-8653

AMP 1 - Secondary Index Subtable (sorted by Dept value)
Secondary    Secondary     Base Table
Index Value  Index Row-ID  Row-ID
10           145,1         18,1
20           156,1         04,1
30           158,1         25,1

AMP 2 - Employee Base Table
Row-ID  Emp  Dept  Fname  Lname  Soc_Security
14,1    45   10    Max    Wiles  146-69-2650
38,1    32   10    Will   Berry  212-53-4532
45,1    65   40    Oki    Ngu    123-99-8888

AMP 2 - Secondary Index Subtable (sorted by Dept value)
Secondary    Secondary     Base Table
Index Value  Index Row-ID  Row-ID
10           145,1         14,1  38,1
40           159,1         45,1

When a Value Ordered NUSI is created, Teradata will immediately build a
secondary index subtable on each AMP and sort it by value.

Each AMP will hold the secondary index values for its rows in the base table
only. In our example, each AMP holds the Dept column for all employee rows in
the base table on that AMP (AMP local).

Each AMP-local Dept value will have the Base Table Row-ID (pointer) so the AMP
can retrieve the row quickly if needed. This is excellent for range queries
because the subtable is sorted numerically by Dept.


How Teradata retrieves a Value Ordered NUSI query


When a Value Ordered NUSI is used in the WHERE clause of an SQL statement, the PE
Optimizer recognizes the Value Ordered Non-Unique Secondary Index. It will perform
an all-AMP operation to look into the AMP-local subtable for the requested value. It is
excellent at checking ranges because all subtable rows are in order. If an AMP contains
the value or values requested it will continue participation. If it does not contain the
requested value or values it will no longer participate. A Value Ordered NUSI query is
an all-AMP operation, but very seldom a Full Table Scan (FTS). A Value Ordered NUSI
must be non-unique and it must be a numeric data type. A DATE column is
considered numeric and therefore may be a Value Ordered NUSI.

SELECT * FROM Employee
WHERE Dept BETWEEN 10 AND 20;

Step 1: Check the subtable for Dept values ranging from 10 to 20.

Step 2: If the AMP has qualifying rows then retrieve the rows in the range. If no
rows are found then the AMP should no longer participate in the query.

Employee Table with Value Ordered Non-Unique Secondary Index on Dept
AMP 1 - Employee Base Table
Row-ID  Emp  Dept  Fname  Lname  Soc_Security
04,1    88   20    John   Marx   276-68-2130
18,1    75   10    Mary   Mavis  235-83-8712
25,1    15   30    John   Davis  423-87-8653

AMP 1 - Secondary Index Subtable (sorted by Dept value)
Secondary    Secondary     Base Table
Index Value  Index Row-ID  Row-ID
10           145,1         18,1
20           156,1         04,1
30           158,1         25,1

AMP 2 - Employee Base Table
Row-ID  Emp  Dept  Fname  Lname  Soc_Security
14,1    45   10    Max    Wiles  146-69-2650
38,1    32   10    Will   Berry  212-53-4532
45,1    65   40    Oki    Ngu    123-99-8888

AMP 2 - Secondary Index Subtable (sorted by Dept value)
Secondary    Secondary     Base Table
Index Value  Index Row-ID  Row-ID
10           145,1         14,1  38,1
40           159,1         45,1


Secondary Index Summary


You can have up to 32 secondary indexes for a table.

Secondary Indexes provide an alternate path to the data.

The two types of secondary indexes are USI and NUSI.

Every secondary index defined causes each AMP to create a subtable.

USI subtables are hash distributed.

NUSI subtables are AMP local.

USI queries are Two-AMP operations.

NUSI queries are All-AMP operations, but not Full Table Scans.

Value-Ordered NUSIs can be any non-unique index of integer type.

Always Collect Statistics on all NUSI indexes.

The PE will decide if a NUSI is strongly selective and worth using over a
Full Table Scan.

Use the Explain function to see if a NUSI is being utilized or if bitmapping
is taking place.
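Two of the points above can be sketched in SQL; Employee and Fname come from this chapter's examples.

```sql
/* Give the PE demographics so it can judge NUSI selectivity */
COLLECT STATISTICS ON Employee INDEX (Fname);

/* Ask the PE for its plan without running the query */
EXPLAIN
SELECT * FROM Employee
WHERE Fname = 'John';
```

If the EXPLAIN output mentions the secondary index, the PE judged the NUSI worth using; otherwise it chose a Full Table Scan.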


Chart for Primary and Secondary Access


The chart below shows that Primary Index access is a one-AMP operation. For Unique
Secondary Index (USI) access, it is a two-AMP operation. For Non-Unique Secondary
Index (NUSI) access, it is an all-AMP operation, but not a Full Table Scan (FTS). Keep
this chart near and dear to your heart.

Index   Number of AMPs   Rows Returned
UPI     One              0-1
NUPI    One              0-Many
USI     Two              0-1
NUSI    All              0-Many


Chapter 8 The Active Data Warehouse

Only he who attempts the ridiculous may


achieve the impossible.
Don Quixote
For years it was believed that some computer systems were designed for Online
Transaction Processing (OLTP) and others for Decision Support. IBM's DB2 and
Oracle were originally designed for quick transactions in an OLTP world.
Teradata was originally designed for Decision Support (DSS) in the data warehousing
world. Teradata has attempted what many once felt was the ridiculous by combining
OLTP quick transactions with the power of DSS to achieve the impossible. This
incredible concept is called the Active Data Warehouse.
Here is how the Active Data Warehouse evolved. Back in the early 1990s
companies loaded new data into the data warehouse on a monthly basis. This was pretty
much the standard practice. As competition grew, companies decided they needed an
edge and began to load data on a weekly basis. It was only a matter of time before most
companies were doing nightly loads. Now, companies want to load data in near
real-time. What advantage does this bring?
The Active Data Warehouse allows companies to take their OLTP transactions and load
them into the data warehouse in near real-time so users can analyze data and make
decisions before their competitors.
Some of the characteristics of an active data warehouse environment are mission critical
applications, tactical queries and a need for 24/7 reliability.
Active data warehouses provide scalability in order to support large amounts of detail
data. Users are allowed to update the operational data store directly, and an integrated
environment supporting a wide mix of queries is created.


OLTP Environments

Always be a first-rate version of yourself,


instead of a second-rate version of
somebody else.
-Judy Garland
OLTP environments are quite different from DSS environments. OLTP environments
involve many quick transactions while DSS environments have long transactions. A
transaction is considered a logical unit of work.
In an OLTP environment transactions typically occur in seconds and not minutes. The
number of rows per transaction is also smaller. There is a great deal of writing, but
because the rows are small it is not considered write intensive.
OLTP applications will utilize very little I/O processing to complete transactions and
most often only access a few of many possible tables. For example, updating a checking
or savings account to reflect a deposit or withdrawal would only affect one or two
tables, and only one or two rows would be updated.
An example of an OLTP transaction is going to a retail store and buying a pencil or
making an ATM money withdrawal from your local bank.
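In Teradata's BT/ET (Begin Transaction/End Transaction) style, the ATM withdrawal above could be sketched like this; the table and column names are hypothetical illustrations, not from this book's examples.

```sql
BT;  /* Begin Transaction: the statements succeed or roll back together */

UPDATE Checking_Acct
SET Balance = Balance - 100.00
WHERE Account_Nbr = 12345678;

ET;  /* End Transaction: commit the withdrawal */
```

One or two rows in one table: quick, small, and typical of OLTP work.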
You know you have arrived at an active data warehousing environment when you have
Analytical Modeling, continuous updates and event-based triggering.


The DSS environment

We're going to have the best-educated


American people in the world.
Dan Quayle
If Dan Quayle had had a Teradata system he would probably be president.
Instead he is often considered the potatoe head of vice presidents.
Teradata is designed around Decision Support (DSS). If you were designing a data
warehouse for a customer to use for strategic long-range planning and answering
"what if" questions, you would definitely want a DSS system.
The DSS environment has many users asking a wide variety of questions. Most of the
questions involve reading records so READ locks are primarily used.
With DSS environments most queries take minutes to hours. The transaction usually
involves multiple tables and millions of rows. DSS environments can be brought to their
knees if they have to continually wait due to locks on the system.
A true data warehousing environment will need to support three types of environments
for Pre-defined Reports, Ad Hoc Queries, and Data Mining and Analytical Modeling.
Data Warehouse Environments

Pre-defined Reports

Ad Hoc Queries

Data Mining

Analytical Modeling


Mixing OLTP and DSS environments

Am I not destroying my enemies when I


make friends of them?
-Abraham Lincoln
Teradata makes friends of a data warehouse's worst enemy: OLTP transactions.
In an OLTP world the query times are predictable. In the DSS world the query times are
unpredictable. OLTP depends on throughput and DSS depends on power. Mixing the
environments is difficult. This is especially true because OLTP environments do a lot of
writes where DSS does a lot of reads.
OLTP queries run quickly and are often called Tactical Queries. An example of a tactical
query might be altering a campaign based on current results or determining the best
offer for a specific customer.
Diagram: Tactical queries place WRITE locks on the table while the DSS query
needs a READ lock.

An Active Data Warehouse consists of short tactical OLTP-type queries mixed with large
Decision Support queries. The OLTP queries like to WRITE lock the data, which is bad
when other queries only need to READ the data.
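One Teradata technique for letting a DSS read coexist with tactical writes is the ACCESS lock, which reads through WRITE locks at the cost of possibly seeing uncommitted data. A sketch, reusing the Employee table from earlier chapters:

```sql
/* ACCESS lock: do not wait behind tactical WRITE locks */
LOCKING TABLE Employee FOR ACCESS
SELECT Dept, COUNT(*)
FROM Employee
GROUP BY Dept;
```

The trade-off is a "dirty read": acceptable for broad analytic counts, not for balancing the books.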
The evolution of a true data warehouse takes time, and the data warehouse activities will
naturally evolve towards an active data warehouse. In the beginning the warehouse is
used for analyzing, which over time evolves into predicting and finally into
operationalizing.


Detail Data

Can't died when Could was born.


--- Author Unknown
Detail data is the foundation for a great data warehouse. For years most companies said
they Couldn't keep that much data when the real fact was that the database Couldn't
handle it. Teradata said Could while the others said Can't! Some companies are now
processing over 50 Terabytes of raw data.
The ability to use detail data and Ad Hoc Queries, as well as the decreased need for
summary data, are a few aspects of DSS environments that have gained importance.
Detail data is the cornerstone of a good warehouse. Without detail data users can't dig
into the details. If they ask a question and get a summarized answer they can check the
detail for the explanation.
In the past, detail data was not used as often because most systems did not have the
power to read millions of records, sort millions of records, execute full table scans, and
perform aggregations on millions of rows.
Teradata has always been great with the detail and continues the tradition today.


Easy System Administration

I have had dreams and I have had


nightmares. I overcame my nightmares
because of my dreams
--- Author Unknown
A data warehouse brings dreams of turning data into information and saving the
corporation millions of dollars. A data warehouse brings nightmares to someone who has
to administer and manage this dynamic and daily growing giant. Most data warehouses
require from 4 to 10 system administrators working frantically around the clock. I always
recommend two system administrators for Teradata data warehouses. Why two? In case
one gets hit by a bus! Now, this is a dream come true.
With most databases, the system administrator is responsible for setting up the database,
placing and partitioning data, running database reorganizations, and tuning queries. This
is a tremendous responsibility, especially when dealing with large amounts of data in a
complex data warehouse environment. Plus, data warehouses on average are doubling in
size each year. Teradata was designed to let the system manage these functions and the
larger the database, the more Teradata outshines the competition.
I have travelled around the globe from one corporation to another training the world on
data warehousing and thousands of people on Teradata. The topics include system
administration, load utilities, architecture, SQL, and operations. Students who have
experience with other databases literally think I am out of my mind when I explain the
Teradata database. They say things like, "If loading data and system administration is
that easy, why isn't everyone doing it?" The answer is simple. Teradata was originally
designed around parallel processing with hands-off operations to work in conjunction
with large amounts of mainframe data. This capability must be built into the original
design, and most databases overlooked it.
The DBA never has to do Database Reorganizations, never has to pre-prepare data
to load, and never has to pre-allocate table space.


Data Marts

I have found the best way to give advice to


your children is to find out what they want
and then advise them to do it.
--Harry S. Truman
Data Marts are always designed for a particular use and will contain either summary
data or detailed data for that use. Because they have a particular use they are
designed for speed.

Diagram: A Logical Data Mart carved out of the Data Warehouse's tables of
detail data and tables of summary data.

There are two types of data marts: logical and physical. A logical data mart is an
existing part of the data warehouse, but a physical data mart resides on another
platform.


Teradata Tools - SQL Assistant

He who asks a question may be a fool for


five minutes, but he who never asks a
question remains a fool forever.
Unknown
SQL Assistant is a tool that allows users to become cool. Nothing makes a user cooler
than positively affecting the company bottom line. SQL Assistant allows access to
Teradata and other databases as well. SQL Assistant is how users submit their SQL,
and soon questions are being answered and a data warehouse genius is born.


TDQM

Be not afraid of going slowly,


Be afraid of standing still.
- Chinese Proverb
The wrong mix of Teradata queries can make users afraid because the system can not
only slow down, but appear to be standing still. TDQM uses rules to make sure your
system doesn't stand still or go slowly. Teradata Database Query Manager (TDQM)
provides users with the ability to schedule SQL requests for a later time using the
Teradata DQM Scheduled Request Viewer. TDQM automatically manages system
workflow by stopping queries from executing if they violate predefined rules. TDQM
can limit certain types of joins and can even control access to certain database objects.
Queries can be delayed or cancelled based on predefined rules and set thresholds.
The TDQM server can be started or stopped through the Control Panel Services
application or from the TDQM Scheduled Requests Operations Utility.
The TDQM Scheduled Requests Operations utility has the following menus:
File, Configuration, Server, Information, Error Log, and Help.

TDQM allows a period of time to be established when TDQM can execute scheduled
requests that are waiting to run. This is usually done during off-peak hours. TDQM
schedules jobs, each of which is an individual execution of an instance of a
scheduled request. A request is a definition of the parameters and text
associated with a scheduled request. Finally, a scheduled request is a stored script of
SQL requests to be executed at a scheduled time later in the day.


Index Wizard

If the facts don't fit the theory, change the facts.


-Albert Einstein
The Index Wizard will allow Teradata to find the theory of relativity for Secondary
Indexes. Index Wizard is designed to help with Secondary Index recommendations
(not primary) and comes with a beautiful Graphical User Interface (GUI). The wizard
works by analyzing SQL statements in a defined workload and then recommends the
best secondary indexes to utilize based on "What If" analysis.
Index Wizard analyzes a workload of SQL and then creates a series of reports and index
recommendations describing the costs and statistics associated with the
recommendations. The reports help you back up your decision to apply an index.
Both Index Wizard and Statistics Wizard allow a user to import workloads from other
Teradata tools such as Database Query Log (DBQL) or Query Capture Database (QCD).
Here are the steps to using the Index Wizard in exact order:
1. Define a workload
2. The workload is analyzed
3. The Wizard recommends Secondary Indexes
4. Reports are generated
5. Indexes are validated
6. Indexes can be applied

You can define a workload:

Using DBQL Statements

Using QCD

Entering SQL Text

Importing a Workload

Creating a new workload from an existing one
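Capturing a workload through DBQL, for example, might be started and stopped with statements like the sketch below; this assumes the V2R5 DBQL syntax for logging all users with full SQL text.

```sql
/* Start logging queries, including their SQL text, for all users */
BEGIN QUERY LOGGING WITH SQL ON ALL;

/* ... run the candidate queries that make up the workload ... */

/* Stop logging once the workload has been captured */
END QUERY LOGGING ON ALL;
```

The logged statements can then be fed to the Index Wizard as a defined workload.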


Archive Recovery

A Diamond is a lump of coal that could


handle the pressure
-William Coffing
The Archive Recovery tool (ARC) is a diamond in the restore. The Archive Recovery
utility allows you to copy a table and restore it to another Teradata Database. The
Archive and Recovery (ARC) utility backs up and restores database tables, objects, and
the DBC database's Data Dictionary. The ARC utility performs three major tasks:

Archive: Dumps data onto portable storage (usually tape)

Restore: Reverses the archive process and moves data back from the stored media

Recovery: Uses information stored in the Permanent Journals for
Rollback/Rollforward operations

ARC provides data protection when there is a loss of data on a failed AMP containing
Non-Fallback tables or when multiple AMPs go down within the same cluster rendering
the Fallback useless for the cluster. It can also be used when objects are dropped or rows
are deleted from a table or even Batch Processing miscues. When you think of ARC
think first of Disaster Recovery and second think of accidental stupid mistakes. Either
way ARC has got your back!
ARC does NOT work with Join Indexes or Hash Indexes. If you need to recover a Join
Index or Hash Index, just make sure the tables the Join Index or Hash Index was
created on are intact, then drop and recreate the Join Index or Hash Index manually.
Many DBAs actually save the DDL for Join Index and Hash Index creation for this purpose.
There are several ways to invoke ARC including NetVault, NetBackup, ASF2, Command
Line of ARCMain, or directly from the host or Mainframe.
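An ARCMain script for a simple table archive might look like the following sketch; the logon string, table name, and file name are all hypothetical placeholders.

```text
LOGON prod/dbadmin,secret;
ARCHIVE DATA TABLES (Payroll.Employee),
  RELEASE LOCK,
  FILE = ARCHIVE1;
LOGOFF;
```

RELEASE LOCK frees the utility locks once the archive completes, and FILE names the archive destination defined to the host.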


Teradata Analyst Suite

What lies behind us and what lies before us


are tiny matters compared to what lies
within us
-Ralph Waldo Emerson
The Teradata Analyst Suite has three tools that don't lie because they use facts to provide
information so users can find the brilliance that lies within them. This allows analysis of
the Teradata system, which can make Teradata perform better, and it doesn't get any
sweeter than that.
The three tools and utilities that are part of the Teradata Analyst Suite are:

Teradata Index Wizard

Query Capture Database

Teradata System Emulation Tool

We hope you have enjoyed this book and its simple explanations of Teradata. The basics
are the foundation on which to build the rest of your Teradata knowledge. Now, go pass
the Teradata Certification test and become a Teradata Certified Professional. What lies
before you will be huge after you pass that test. This is the only book you need to study
and always remember, "Where you find bold you will find gold."
