Teradata Aster
Database Administration
Here is the proposed 3-day class schedule. Note that this may change based on student
interest in the topics.
Every student will have their own VMware cloud environment where they will have:
Aster Cluster
Servers
We will be using VMware images for our lab environment. Here's the big picture
1. From a Web browser (IE preferred): It is a good idea to bookmark this site now in your Web
browser, since you'll need to go back to it
every day of class
https://teradata.hostedtraining.com
Setting up a ReadyTech VMware environment is a custom process for each class. This
is a one-time setup and may require IT support from your company to complete
successfully. The following instructions are generic in nature and are not all-inclusive.
The goal is to get all those green checks. (OK if VNC is not found)
Next, click Continue to ActiveX Download or JAVA Download
Follow along with the Instructor as we set up the ReadyTech VMware environment.
Two things can stop you here. You must have Admin privileges to install this Plugin. If you repeatedly get a
RETRY message, log out and log back into Windows as someone with permissions to install apps.
Secondly, you may need to manually configure IP Binding as follows:
Username: student
Password: training
Housekeeping chores
Once all the images are started, we will SUSPEND HDP 2.1 since we won't need
this image until tomorrow. Click on the VMware Workstation icon, then right-click on
HDP 2.1, then select Power > SUSPEND
From Project Explorer tab, open 3)DBA folder > Testing 1-2-3 folder and double-click on
00-a-Aster-test-SQL. The code will appear in the upper right-hand pane
Highlight code, then right-click and select EXECUTE SELECTED TEXT
Confirm you get back a Result Set in the pane. Then repeat for 00-b-Teradata-test-SQL
3. If you get a 'Mule Exception Error', just go to the Data Source Explorer tab and Disconnect, then
Reconnect. Then run the query again
Run the simple SQL statement and confirm you get an Answer set.
Tomorrow when you re-connect, you will be back to the same exact place you left the
day before.
When you login the next day (via https://teradata.hostedtraining.com) you will
return right where you left off.
Did I mention you should Bookmark this URL right now ???
Mod 01
Aster Architecture
Next we'll talk about the Fault Tolerance functionality built into Aster, such as Replication Factor
and RAID. And we'll discuss sizing a Cluster based on your requirements.
Aster clients
Easy to manage:
Self-managing:
Active Queen
One or more Worker Nodes
One or more Loader Nodes
(Diagram: client Queries/Answers flow to the Queen, which coordinates the Worker Nodes)
1. Designed for Big Data using clusters of hardware boxes managed as a single database
2. Four independent (share-nothing) tiers for management, query processing, loading, and backup
The Queen is roughly analogous to a combination of the Teradata BYNET and Parsing Engine.
The Queen is a single point of failure, but recovery time can be minutes with a redundant Queen.
(Diagram: client Queries/Answers flow to the Queen; Worker-1 through Worker-6 are connected by the Intra Cluster Express (ICE))
Worker Nodes:
- Store data and interact with the Queen and other Workers
Using Loader nodes reduces the stream of data to the Queen during the load process. In
addition, if loading distributed tables, the Loader node also hashes those rows to the
correct vWorker.
(Diagram: Queries/Answers flow to the Queen; Data flows through the Loader Nodes to the Worker Nodes)
Loader Nodes:
- Run specialized software for high-speed bulk load of data
- Receive data from loader clients, segment the data into partitions, and move partitions directly to the appropriate Worker/vWorker, bypassing the Queen
- Fully parallel; take advantage of multi-core CPUs
Number of Workers = ( 3 x D ) / C
Where D is the amount of uncompressed user data and C is the storage capacity per Worker node (the multiplier of 3 accounts for RF=2 replication plus free-space headroom)
The different data types take up different amounts of storage, and have different ranges, as shown in the table below:
(If free space is < 30%, queries may fail and failover may not work)
Increasing the number of Workers with the same amount of data increases the ratio of
processing power to data, thus increasing performance, since you now have more
CPU processing power
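The sizing formula above can be sketched as a shell calculation. The values of D (uncompressed user data) and C (per-Worker capacity) below are purely hypothetical examples, not figures from the course:

```shell
# Hypothetical sizing example: D TB of uncompressed user data, C TB usable per Worker.
# The multiplier of 3 leaves room for RF=2 replication plus free-space headroom.
D=12   # example: 12 TB of uncompressed user data
C=2    # example: 2 TB of usable storage per Worker node
workers=$(( (3 * D) / C ))
echo "Workers needed: $workers"
```

With these example figures the cluster would need 18 Workers; doubling C to 4 TB per node would halve that.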
(Diagram: 1 Queen, 2 Worker Nodes, 0 Loaders)
In an 8-CPU Worker node, Aster comes preconfigured with 6 v-Workers per node
The replication of a given piece of data occurs in the background as soon as the transaction is
committed on the primary data. Txman ensures that the replica v-Worker gets a record of the
change.
Switch
(10 GbE or InfiniBand)
The slide shows general RAID recommendations for the Aster Data system node types.
The asterisk notes that Aster Data does not support Queens without redundant storage.
RAID is not really important for the Loader Nodes as data just passes through them.
The choice of RAID level determines how the disks will be combined
into a storage layer. It impacts both performance and reliability
through tradeoffs in capacity, cost, and risk
The Teradata Aster Big Analytics Appliance features a complete Aster Database, including the
patented Aster SQL-MapReduce framework, the Hortonworks Data Platform, and the Aster
MapReduce Analytics Portfolio with more than 50 analytical functions. It runs on proven
Teradata hardware, leverages the most current Intel processor technology, the SUSE Linux
operating system, and market-leading enterprise-class storage. It can be configured to store a
maximum of 5 petabytes of uncompressed user data for Aster and up to 10 petabytes of
uncompressed user data for Hadoop.
1. Make sure that you are running Python version 2.5.5 or higher.
2. Run the Aster Database installer from a command shell on the queen. The command
will be similar to the following (the installer file name below is just an example; replace it with
the appropriate name for your operating system):
# ./AsterInstaller__6-00-xx.R_xxxxx.bin
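As a sanity check before step 2, the Python prerequisite from step 1 can be verified with a generic shell sketch. This check is not part of the installer, and the current version string below is a hardcoded example rather than live output of `python -V`:

```shell
# Compare an installed Python version against the 2.5.5 minimum using sort -V.
required="2.5.5"
current="2.7.18"   # example value; in practice capture this from `python -V`
lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
if [ "$lowest" = "$required" ]; then
  echo "Python $current meets the $required minimum"
else
  echo "Python $required or higher is required (found $current)"
fi
```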
Queen
Number of Workers
Number of Virtual Workers
Number of CPUs
Other parameters
ACT commands
SQL commands
SQL-MR commands
Putty login
Putty password
ACT logon
ACT password
ACT Provides SQL, SQL-
MR and ACT commands
SQL command
ACT command
Closes window
If you copy the ACT software onto your PC, after initially getting your RSA signature, you can then
log on to ACT via a Windows command prompt (i.e.: CMD) using the following syntax:
Whenever you are at a prompt that does not have =>, and you cannot
get back to a => prompt, you can escape back to the => prompt by
typing:
CTRL + C
When you have more than one page of results from a query, hit the SPACE bar
to go to the next page.
# export PAGER=less
(scroll down- scroll up via keyboard)
To break out of the final page and return to a prompt, hit the q key
To change from one database to another, instead of logging out (\q) of the old
database (prod) and logging back in to the new database
(i.e.: act -d retail_sales -U beehive -w beehive), you can use the code below:
\c retail_sales beehive
where \c = connect, retail_sales = new database, beehive = password
If you wish to paste code into ACT, just right-click at the ACT prompt
Note: In the hands-on labs, we will typically be using ACT when an ACT
command needs to be run. To get a list of ACT commands, type: \?
All ACT commands start with: \ (backslash)
BI tools:
- Tableau
- MicroStrategy
- SAS
- Cognos
Mod 02
Management Consoles -
Aster and Viewpoint
The Aster Management Console (AMC) is a web-based interface that lets you manage,
configure, and monitor Aster Database activity. The AMC provides administrators with an
authoritative view of the system and mechanisms for invoking administrative actions. AMC
provides developers and other users with insight into Aster Database activity, such as details on
currently executing SQL statements and statement histories.
Portlets enable users across an enterprise to customize tasks and display options to their
specific business needs. You can view current data, run queries, and make timely business
decisions, reducing the database administrator workload by allowing you to manage your
work independently. Portlets are added to a portal page from a menu. The Teradata
Viewpoint Administrator configures access to portlets based on your role.
Teradata Viewpoint portlets let you monitor not only Teradata systems, but also Aster and
Hadoop systems. In the future the AMC functionalities will all migrate to Teradata Viewpoint.
1 Dashboard: summarizes
cluster status and activity
Top
Processes
Nodes
As shown in the image above, the top of the Dashboard consists of the following items.
Clockwise from the upper left, they are:
Status Lamp: The status lamp lights green to show the cluster is running correctly. The
legend next to the status lamp shows the name of the cluster and its status, and the current queen
time, converted to browser-local time.
Resource Center: Click this link to open the Teradata Aster Resource Center, a web
page where you can find documentation, videos, and downloadable client software for
various operating systems.
Help Link: Click this link to open an HTML page containing information about the
AMC page you are currently viewing.
Login Details: In the top right of the window is the Teradata Aster logo. Directly below that
is your current, logged-in AMC user account name. Your user account determines what
actions you can perform in the AMC.
Status Summary: In the upper right of the Dashboard tab is the status box. This box is a
fixture not only of the Dashboard, but of all AMC windows. The status box notifies you of
important events in Aster Database.
Message Board: In the upper left of the Dashboard tab is the message board. Here, you and
other Aster Database administrators can post messages to all AMC users. To add a
message, click the pencil icon, type the message in the dialog box that appears, and click
OK to post it. All AMC users on this cluster will see your message immediately on the
message board in their AMC session.
The Processes section of the dashboard shows an overview of the current and recent jobs in
the cluster, as well as statistics including the Most Active Users rankings and the Process
Execution Time graph. The Active Applications box shows currently installed applications
that run on the cluster. The Processes section corresponds to the Processes tab, and clicking
most labels in this section will take you to the Processes tab.
Top of Dashboard
Processes
The green summary box lists the counts of nodes in your cluster and summarizes the status of
the nodes.
This section shows the following (click any label to show its details):
Queen(s): Count of queen nodes in this cluster. The Active count is the number of active
queen nodes in this cluster. This can only be 1 or zero. The Passive count is the number of
passive (backup) queens in this cluster.
Loader(s): Count of the loader nodes in the cluster.
Worker Nodes: Count of worker machines in the cluster. Note this is the count of worker
machines, not the count of virtual workers. Below this are listed the counts of Active, New,
Suspect, and Failed nodes.
The center panel of the Nodes section shows the current replication factor of Aster Database. If
the current replication factor is below your target replication factor (your Aster Database
administrator specified this when installing Aster Database), a warning appears at the top of
this section.
The Replication Factor section shows, first, the cluster-wide current replication factor. Below
that, it shows how many virtual workers are at RF=2 (these are workers that have a valid
backup worker stored in Aster Database) and how many are lacking a backup (RF=1).
Teradata Aster's recommended setting is to maintain the cluster at RF=2.
The bottom of this section is the Hardware Statistics panel, showing current and recent CPU
usage, memory usage, network bandwidth usage, and disk I/O usage. Click the Nodes:
Hardware Stats tab for more hardware statistics.
The right side of the Nodes panel of the AMC Dashboard shows the Data Payload Panel. This
panel provides a cluster-wide view of the data capacity of your cluster and shows how much
disk space is currently being occupied by data and other system files.
(Callouts: # of Nodes (not counting Backup Nodes); Replication Factor; Hardware statistics; Cluster-wide disk capacity/usage)
Each process is a SQL command or a block of SQL statements (BEGIN ... END). The
statements can contain SQL-MapReduce functions.
The Processes list is useful for monitoring activity on your cluster, checking on the progress of
queries you have submitted, and finding performance issues such as statements that take
much longer to run than others.
To display processes:
2 To filter the display of processes in the Query Timeline, use the Change Filter button.
3 To display summary information about a process, move the mouse over the process ID.
4 To display detailed information about a connected process, click its ID.
Sometimes, you may need to cancel a running process on the cluster. For example, suppose a
user runs the query SELECT * from events. If the events table is large, the query could
easily take far too long to complete. Another operation that can be time-consuming is a
CREATE TABLE that inserts a large number of rows.
Some SQL statements are not cancellable. These are transaction-related SQL statements,
such as COMMIT, ROLLBACK, CLOSE cursor, and COPY-in SQL.
To cancel a running process, do one of the following:
In the Processes tab (Processes > Processes), if a Cancel icon is displayed for a process, click
the icon in the Cancel column (right-most column), then click OK when prompted.
In the Process Details tab, you can cancel the statement by clicking the Cancel Process
button.
Either action will place the process in Cancelling mode, which indicates that the cancellation
request has been received. Statement cancellation in Aster Database is an asynchronous,
best-effort operation. While executing a statement, the Aster Database back-end checks
periodically to see whether a cancellation request has been issued. If requested, the back-end
acknowledges the cancellation and triggers a best-effort service to cancel the ongoing
execution.
Processes
Tab/Page
Filter
Query /
process
details
A Process Detail tab for that process is displayed. In addition to the columns displayed in the
process list, this tab shows the following additional information.
Statement
Execution
steps
To see more detail, click the SHOW ALL STEPS hotlink (not shown)
Can find most
expensive v-Worker
2. To filter the display of processes in the Query Timeline, use the Change Filter button.
3. To display details about a process, move the mouse over it. A popup message appears
with additional information.
Session status
User
Database
Login Time
Login Duration
Node Failures
The queen node in Aster Database actively monitors all nodes participating in the system. If it
observes a node behaving in an unexpected or inappropriate manner, it will consider that node to
be suspicious and change its status to Suspect, and the node will appear yellow in the AMC.
A Suspect node status does not necessarily imply that the node has experienced a failure, only
that the queen is examining it in order to determine whether one has occurred. If the node
continues to demonstrate suspicious behavior while in Suspect status, the queen will consider
it to be Failed and change its status accordingly.
The presence of a Suspect node does not necessarily imply a decrease in performance, but it
typically means the cluster has fallen from RF=2 to RF=1, meaning that one or more vworkers
may not have a backup vworker. While a node is in the Suspect state, the queen monitors the
node's behavior and only considers it to be Failed if it continues to demonstrate such behavior.
If the behavior that was originally observed was a one-time event (e.g. a transient network
error between the queen and the node), the node will remain an active participant while being
considered Suspect.
In Aster Database, the queen will not automatically transition a node from Suspect to Active.
Instead, a node will be returned to Active status on the next activation or load balancing activity.
If the system continues to operate for a reasonable length of time after the node was originally
marked as Suspect, Teradata Aster recommends that the node be returned to Active status by
clicking the Balance Data button in the AMC. Allowing a node that is performing normally
(e.g. one that has continued to operate for at least 24 hours without transitioning to Failed) to
remain in a Suspect status for a lengthy period of time increases the chance that the node will
eventually be considered Failed, triggered by an event such as an unrelated transient error.
To see detailed descriptions of how and to what extent the disks are being used on individual
nodes, click the Nodes tab and click the Node Overview tab. Disk usage details appear in these
columns:
Uncompressed Active Data Size: This column shows the amount of data currently stored on
the node. The term active data refers to the raw, uncompressed data size before it is
stored on disk.
Storage (GB): This column shows a graph of the current usage of the node's disk, by
type of data stored (user data, replica data, and free space), and lists the amount of disk
space currently occupied by user and replica data, expressed in GB. This shows the actual
on-disk space that is used and free on the node. Hover your mouse cursor over the graph to
see these statistics for the node:
User Data is the amount of space occupied by primary copies of your data on the node.
Replica Data is the amount of space occupied by the replica copies of your data on the
node.
System represents the amount of the node's disk space consumed by operating system
files, Aster Database software files, and other files that do not contain your Aster
Database-stored data.
Available represents the amount of unused storage currently available on the node.
Total Space shows the total amount of disk space on the node.
% Full: This column indicates how much space has been used on this node. This graph
turns orange to indicate that more than 70% of the node's disk space has been used, and it
turns red to indicate that more than 90% has been used. If this graph is displayed in
orange or red, you must take action to free up disk space by calling Teradata Support.
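The orange/red thresholds can be made concrete with a small illustrative calculation; the usage numbers below are hypothetical:

```shell
# Mirror the AMC % Full coloring: orange above 70%, red above 90%.
used_gb=780
total_gb=1000
pct=$(( used_gb * 100 / total_gb ))
if [ "$pct" -gt 90 ]; then status="red"
elif [ "$pct" -gt 70 ]; then status="orange"
else status="normal"; fi
echo "${pct}% full -> $status"
```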
Replication Factor
The replication factor (RF) is the number of copies of your data that are stored in Aster
Database to provide tolerance against failures. Maintaining an RF of two ensures Aster
Database is resilient to node and queen failures. While you can run Aster Database at an RF of
one, Teradata Aster strongly recommends that you run with an RF of two. During operation
of the cluster, hardware failures can cause the RF to fall below two, at which point you must
take action to restore the RF.
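A side effect of RF=2 worth keeping in mind when sizing: each piece of data is stored twice, so usable capacity is roughly half the raw disk. A rough sketch with hypothetical numbers:

```shell
# With RF=2 every vworker's data has one replica, so divide raw capacity by the RF.
raw_tb=100   # example: 100 TB of raw cluster storage
rf=2
usable_tb=$(( raw_tb / rf ))
echo "Usable capacity at RF=${rf}: ${usable_tb} TB (before free-space headroom)"
```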
Overview of use of
database nodes in
a rack
Not Bal Prepared Failed Suspect
Summary of virtual
workers on node
Prepared
Replication Factor
Listing of each Node and their vWorkers
A Failed node means its v-Workers are not participating in the Cluster, so this screen shot may not be refreshed.
The Balance Process (see upcoming slide) has not been run, since 1 Worker shown has all Primary v-Workers (no Secondaries).
A Prepared node means either a New Worker with no v-Workers, or an existing Worker whose v-Workers are all Secondary.
A Suspect node typically means some v-Workers are Suspect but still participate in queries.
1. Cluster Management
2. Events
3. Executables
4. Backup
5. Configuration
6. Logs
1. Soft Restart* - Boots the Aster software only (* Queries cannot be processed during these activities)
2. Hard Restart* - Boots the operating system and then the Aster software
3. Activate Cluster* - Run when you add a New node or an existing Worker reboots
4. Balance Data - Secondary v-Workers are copied. Goal: ensure Primary/Secondary vWorkers are on different Workers
5. Balance Process* - Optimally locates vWorkers. Decides Primary/Secondary v-Workers (i.e.:
which v-Workers will be Active/Passive). Goal is an even v-Worker distribution across Workers
Can bounce individual nodes:
When a node is first added to Aster Database, or registered, it is considered to be a New node.
At this point, Aster Database is aware of the node's existence, but the node has not yet
contacted the queen in order to be prepared, or loaded with the Aster Database software.
Nodes are also shown as New immediately following a restart of Aster Database, before their
state can be determined.
Preparing
After the node contacts the queen to be prepared, its status changes to Preparing. While in this
status, it is loading the Aster Database software and preparing itself to become a participant in
Aster Database.
Prepared
Once the node completes preparation, its status becomes Prepared. At this point, the node is
ready to be incorporated into Aster Database so that it can host vworkers.
Active
Active and Passive are the acceptable states for nodes in a running cluster. Active nodes are
nodes that are available immediately to process queries in Aster Database.
Passive
Active and Passive are the acceptable states for nodes in a running cluster. A Passive node is a
standby that holds frequently updated copies of vworkers' data and can later be made Active to
take on query processing work as needed.
Suspect
Suspect nodes are nodes that have exhibited unusual behavior and are participating in the
Aster Database in a limited capacity while being investigated for potential failures by the
queen.
Failed
Failed nodes are nodes that are no longer participating in the Aster Database.
The Event Engine resides on the queen. It monitors and generates notification on states and
activities on each node. You create subscriptions to specific types of events in order to be
notified when they occur. These subscriptions are created through ncli, and may be viewed in
ncli or the AMC. When certain events occur, Aster Database will perform a remediation, such
as a soft shutdown automatically.
The Event Engine uses a subscription model to send notifications within the system
You can configure separate subscriptions to be notified of events based on various filters.
Some examples of filters you can create include:
To assist Administrators in detecting and managing situations where the cluster is running out
of disk space, a node is suspect or failed, a user is initiating actions in the AMC, or replication
factor issues exist, Aster Database provides the following subscribable events:
Partial listing
The events section provides commands to view and configure event subscriptions in the Aster
Database Event Engine. See Monitor Events with the Event Engine on page 140 for
information about event subscriptions.
When you set up event subscriptions, you're setting up a subscription to be notified via SNMP
or email whenever events of a particular type occur. The ncli is the only way to add and
manage subscriptions.
The commands in the events section will run against the queen, even if executed from a
worker node. The syntax to run a command in the events section looks like this example:
$ ncli events listsubscriptions
Event Subscriptions
+--------+------------+--------------+--------------+----------------
| Sub ID | Notif Type | Min Priority | Min Severity | Component Type
+--------+------------+--------------+--------------+----------------
| 9 | snmp | High | FATAL |
| 8 | snmp | Medium | ERROR |
| 7 | snmp | High | FATAL |
| 6 | snmp | High | FATAL |
+--------+------------+--------------+--------------+----------------
4 rows
table continued...
+-----------+---------------+----------------------+
| Event IDs | Throttle Secs | Notification Details |
+-----------+---------------+----------------------+
| ST0001 | 0 | manager=10.60.11.5 |
| SY0002 | 0 | manager=10.60.11.5 |
| SY0001 | 0 | manager=10.60.11.5 |
| ST0002 | 0 | manager=10.60.11.5 |
+-----------+---------------+----------------------+
To add a new event subscription, issue a command like:
$ ncli events addsubscription --eventIds ST0003 --type snmp --manager
10.60.11.5 --minPriority high --minSeverity fatal
This displays the event subscription that was added, returning a result like:
Event Subscriptions
+--------+------------+--------------+--------------+----------------+-----------+
| Sub ID | Notif Type | Min Priority | Min Severity | Component Type | Event IDs |
+--------+------------+--------------+--------------+----------------+-----------+
| 5 | snmp | High | FATAL | | ST0003 |
+--------+------------+--------------+--------------+----------------+-----------+
Events are configured from a command line interface using NCLI commands
To view all existing Subscriptions (and their subid): ncli events listsubscriptions
To view an existing Subscription: ncli events listsubscriptions <subid>
To delete an existing Subscription: ncli events deletesubscription <subid>
To create an email subscription when a CANCEL occurs, you would type the following:
The above code will send an E-mail to aster@freemail.com when a User attempts to cancel
a process from the AMC by clicking Cancel from the Processes list
4. From ReadyTech Desktop, under Apps caption, open Thunderbird Mail. Click on
Get Mail and you should receive an E-mail concerning the Cancelled query
ncli events listsubscriptions -- to view events
ncli events deletesubscription <sub id> -- to delete event
Aster provides five out-of-the-box scripts, which install automatically with a clean
install or upon upgrading. These scripts perform cluster administration tasks, such as
finding data skew and determining table information such as size. These scripts
cannot be modified or deleted, but they serve as a useful reference when creating
your own custom scripts. Many of the scripts cascade, which means that if they are
acting on a parent table, they will automatically act on all of its descendants as well.
Run the Table Info script on the clicks table using the following parameters, then go to the
Executables > Jobs tab to view the result via the Output hotlink. Note it may take a few
minutes to run
To make the Cluster aware of the Backup Manager, you add the IP address of
the Backup Manager here
When Backup Manager starts a Backup, those Backups will be recorded here
As administrator, you create rules that group database operations into workloads using criteria
such as:
Physical backups
ETL operations
All queries generated by members of the Sales department
Reports against the table daily_summary
Administrative operations
You create rules, known as service classes, to assign to each workload with:
a priority - a first-level control on admission to the queue for processing and resource
usage (CPU and disk I/O),
a weight - a second-level control on admission to the queue and resource usage, and
soft and hard memory limits that control the memory to be allocated for the workload.
These rules instruct Aster Database to run each type of job with the right level of urgency.
Based on your rules, Aster Database assigns an initial level of importance to each job and, if
warranted, re-ranks the job while it is running. For example, your rules can ensure high
resource allocation for a newly added query of a given type but throttle down resources for
that query if it runs so long that it is suspected of being a runaway query.
The Admin > Configuration > Workload panel lets you control Aster Database's workload
management rules to ensure proper allocation of the cluster's computing resources. In this
panel, you create the rules that allow Aster Database to identify higher- and lower-
importance jobs and run them with the right level of urgency.
Workloads will be covered in a later module
1 Log into the AMC as an amc_admin user. This is typically the db_superuser account in a
new Aster Database installation.
3 In the Roles & Privileges tab, the available AMC Roles (amc_admin, process_admin,
process_viewer, process_runner, node_admin, and node_viewer) are listed on the horizontal
axis of the table, and the individual privileges are listed on the vertical axis. Each privilege
is a combination of a section of the AMC and an action the user can perform there.
Roles and Privileges will be covered in a later module
You can set up host entries on all the nodes of an Aster Database cluster by editing the /etc/
hosts file on each Aster Database node manually (for UMOS clusters) or through the AMC
(for AMOS and UMOS clusters) by performing the following steps.
4. Create a host entry for each host you want to add by clicking on New Host Entry and filling
in the web form with its IP address and alias.
5. When you are finished adding entries for each node, click Save and Apply Changes.
6. Your changes will be written to the hosts file on each Aster Database node.
You can set up host entries on all the nodes of an Aster Database cluster by editing the
/etc/hosts file on each Aster Database node manually or through the AMC
HOSTS are commonly used to point to other Databases (ie: Teradata, Hadoop) so when
using the Connectors to these databases, the host names can be resolved to an IP address
You assign Aster Database functions to their own subnets by using the AMC Network settings.
To view and/or edit the Network settings:
1. Select the Admin tab, and then choose Configuration and Network from the drop-down
options.
2. The AMC Network Overview screen will appear, showing each node and its current settings.
For each node, you can assign an IP address or NIC for each of the following functions.
Note that if you do not assign an IP address or NIC for backups or loads, the default
(queries) setting will be used.
3. Click the Configure button on the far right hand side for the node whose network settings
you want to configure. In the network configuration window for the node, you will see
three tabs for AMOS: Current State, Edit Configuration, and Network Assignments. For UMOS,
the Edit Configuration tab does not appear.
Allows you to test Network Connectivity of your Backup and Loader nodes
You need to fill out the Server name (host name or IP address) and which version of Hadoop you
are using from a particular vendor (i.e.: Hortonworks or Cloudera).
You must configure the Aster SQL-H connector within the AMC first, or you will get the
following error message when using the connector
When an issue arises on a cluster, one of the first steps in finding the cause is to retrieve the
relevant log files. Aster Database is made up of a large array of distinct services, and it
produces more than 60 different logs spread across every node in the cluster. The AMC
provides an easy way for you to deal with all these different logs by creating diagnostic log
bundles. A diagnostic log bundle is a compressed tarball containing data used to determine the
system context and diagnose Aster Database issues. This data may come from system logs on
the queen and subordinate nodes (worker and loader).
(Source: Teradata Aster Big Analytics Appliance 3H, Aster Database Administrator Guide, page 213)
By using diagnostic log bundles, you can more easily send information to Teradata Aster tech
support for analysis, reducing the time and effort required to diagnose system problems.
Only AMC users with administrative privileges can create, download, and send diagnostic log
Bundles.
You can also add more Backup nodes although Backup nodes are not considered part of the
cluster. There is a separate process for doing this which is outside the AMC.
Prerequisites
Warning! If you wish to re-deploy a node that previously served as an Aster Database node,
make sure the machine does not contain any data you need, since you must delete all its Aster-stored
data before you re-deploy it. As a guideline, if your cluster is currently running at RF=2
(after removing the node that you will re-deploy), then it is probably safe to delete the node's
data as explained below.
1. Ensured that the operating system and any required patches are installed on the node;
4. If the prospective node machine has been previously used as an Aster Database node,
then you may wish to clean its file system. Alternatively, you can leave the old data in
place and tick the Clean Node checkbox to allow Aster Database to delete the old data
when adding the machine as a new node.
Partition splitting is an Aster Database feature that helps you add vworkers so that you can
maintain an optimal ratio of CPU cores to vworkers as your cluster grows.
To scale out your cluster, you add worker nodes. As you add worker nodes to the cluster, Aster Database
does not automatically increase the number of vworkers. In other words, the number of vworkers stays
constant as you add worker nodes (machines). This means that, as you add nodes to the cluster, the ratio
of CPU cores to vworkers will increase, and eventually your CPUs may become under-utilized. If this
happens, you can improve performance by increasing the number of vworkers (also known as splitting
partitions).
Teradata Aster recommends that you manage your cluster so that you have approximately two
CPU cores per vworker. For example, an 8-core node should typically host 4 to 6 vworkers. In order to
avoid having to split partitions, you may elect to set up your cluster with 6 vworkers per 8-core node and
then add nodes as your data grows, until your ratio falls below 4 vworkers per 8-core node. Once the ratio
falls below this point, it's a good idea to split partitions to make better use of the processing power of
your nodes.
Worker Nodes: 24

To increase the parallel processing power of your cluster, you add (Primary) v-Workers. This
is called partition splitting. It is done at the queen from a Unix command prompt. You
must have a quiet system, so set Concurrency=0 so no one can log on.

Once you increase the number of v-Workers, the data is re-hashed and re-shuffled among all v-Workers,
which requires an exclusive lock on the cluster. Afterwards, set Concurrency=100.

A count of 24 means there are 24 PRI and 24 SEC v-Workers in the cluster. The queen's PRI-SEC v-Worker pair is not counted.
System Health
Query Monitor
Capacity Heatmap
Metrics Graph/Analysis
Space Usage
Overview:
Mod 03
Databases and Schemas
Cluster Aster-Queen
Database - beehive
Schema - public
Tables/Views
Database - <name>
Schema - <xxx>
<Tables/Views>
2 Users/passwords
db_superuser/db_superuser - Can access all database objects
beehive/beehive - Has no admin rights, but owns beehive database
1 Database
beehive - Default database in Aster cluster
2 Schemas
public - All users have read/write access to public schema
nc_system - Houses data dictionary (system tables) for that database
To create a database, you must be a superuser or have the special db_admin privilege.
When you create a database, no other users have the right to use it. You must manage user
privileges as follows:
To grant users the right to use the new database, you must GRANT at least the CONNECT
privilege on the database to the users or roles who will use it.
To grant users the right to create tables in the new database, you must grant them at least
the CREATE privilege on one of the schemas in the database.
The user who created the database is the owner of the new database. It is solely the privilege of
the owner of a database to drop it later. Removing a database removes all the objects (e.g.
tables) within it, even if the individual object has a different owner than the database owner.
You need to be connected to the database server to execute the CREATE DATABASE
command. The first database is always created when Aster Database is initialized. This default
database is called beehive. To create the first ordinary database, you can connect to
beehive.
A database name may contain only alphanumeric characters and the underscore character
(A-Z, a-z, 0-9, and _).
The name must not start with the prefix "_bee" which is reserved for use in naming Aster
Database system objects.
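The steps above can be sketched in SQL. The database and user names (webstore, analyst1) are hypothetical, and the GRANT forms follow the PostgreSQL-style syntax Aster inherits, so verify them against your release's SQL reference:

```sql
-- As a superuser (or a user with the db_admin privilege), while connected to beehive:
CREATE DATABASE webstore;

-- Grant a user the right to connect to the new database...
GRANT CONNECT ON DATABASE webstore TO analyst1;

-- ...and the right to create tables in one of its schemas:
GRANT CREATE ON SCHEMA public TO analyst1;
```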
Drop a Database
Only the owner of the database can drop a database. Dropping a database removes all objects
contained within the database. The destruction of a database cannot be undone.
Out of the box, 2 different Aster databases within same Cluster cannot JOIN tables by
default. You will need AnyDatabase2Aster connector to do so
By default, it is possible for Aster to JOIN to both Teradata and Hadoop tables/views
Note that although you will have created one logical database, you actually create as many
physical databases as you have Primary v-Workers.
Note there are no space limitations on a database. It can consume as much hard drive space as is
available on the Workers.
It is also important to note there is no hierarchy to databases. In other words, a new database is
not created from a Parent database.
In most cases, an Aster DBA will create only 1 additional logical database (beehive being the
first database) and use Schemas to enforce security and permissions.
Note that if you want to join tables in different schemas in different databases in the same cluster, it is
possible using the AnyDatabase2Aster connector. However, for performance reasons, this should
be used as a last resort.
Users can join tables from one schema with tables in another
schema (within same Aster database) if they have proper privileges
for the schemas/tables
2 different schemas
From ACT (since you are pointed to the beehive database, you can only see tables in this database):

beehive=> SELECT * FROM sales_schema.sales_fact f
          INNER JOIN public.calendar cal
          ON cal.cal_date = f.sales_date;
You can specify a schema search path for a transaction or session (using SET), or as the default
for a user (using ALTER USER).
The first schema named in the search path is called the current schema. Aside from being the
first schema searched, it is also the schema in which new tables will be created if the CREATE
TABLE command does not specify a schema name.
To put a new schema in the search path, use the SET search_path command, as shown in
this example:
CREATE SCHEMA myschema;
CREATE TABLE myschema.mytable (
...
);
SET search_path TO myschema,public;
After doing this, we can access the table without schema qualification:
For unqualified queries, the schema search path determines the schema accessed. The
default schema search path for all users is the public schema.

When using ACT commands, \dt (display tables) shows only tables in the first schema of the
SET search_path command. To view tables in other schemas, use the \dt <SCHEMA>.* command.
2. From ACT, type this command to see which schema the system will
use if you don't specify one in your query:
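A minimal ACT session for inspecting and changing the search path. SHOW search_path is the underlying PostgreSQL-style command that Aster's ACT client inherits, so confirm it against your release:

```sql
beehive=> SHOW search_path;                      -- which schema unqualified names resolve to
beehive=> SET search_path TO sales_schema, public;
beehive=> \dt sales_schema.*                     -- list tables in a schema other than the first
```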
Goal:
Create new database
(retail_sales) and 4 schemas
(meta, views, stage, prod)
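One way to sketch the lab goal above, assuming a superuser session in ACT; note that you must reconnect to the new database before creating schemas inside it:

```sql
CREATE DATABASE retail_sales;
-- reconnect to the new database first, e.g. \c retail_sales
CREATE SCHEMA meta;
CREATE SCHEMA views;
CREATE SCHEMA stage;
CREATE SCHEMA prod;
```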
Mod 4
Data Modeling
To achieve these results, though, you must take the time to properly design your data model, so
that it suits the characteristics of your data and your queries. This does not mean you are
designing your data model around pre-canned queries! Instead, it means that you are providing
Aster Database with clues as to which data should be collocated with which other data, and
which data is likely to be queried more often, or used more often for joining or filtering results.
Project Initiation
Activity Modeling
Activity Volume
Modeling Usage Extended Logical Data Model (ELDM)
Frequency
Integrity
Physical
Modeling Physical Database Design & Creation CREATE TABLE, Indexes
Production Release
The following form is used for customer purchases. Based on the attributes
(columns) below, how many entities (tables) do we think we will initially need?
Customer ID __________
Last Name ___________ First Name __________ Middle Initial ___
Gender _____ Born days ago ______ City _______
Date | Product ID | Product Name | Product Category | Retail Price | Unit Cost | Sales Quantity | Discount Amt | Store Id | Store Name | Region Id | Store Sqft | Basket Id
(blank rows for five sample purchases)
Primary Key
One or more attributes that uniquely identifies an entity
nCluster does not require Primary Keys
Foreign Key
An attribute in common between entities making a relationship
nCluster uses the link but does not enforce referential integrity
Constraint Options
Null / Not Null (Not Null eliminates the instances of null values)
Check Values (Range constraints for logically partition tables)
Default Values (Default values set column to predefined value)
1. Do a CREATE TABLE statement with the SERIAL data type and the GLOBAL argument on a
DIMENSION table. This ensures all rows inserted into the table will have a unique
value.
2. If you need to put the contents of the DIMENSION table into a FACT table, do an
INSERT...SELECT that copies the contents of the DIMENSION
table into the new FACT table.
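A sketch of the two steps above. The SERIAL(GLOBAL) spelling is an assumption based on the description here (SERIAL with a GLOBAL argument), so check your Aster release's CREATE TABLE reference; the table and column names are illustrative:

```sql
-- Step 1: dimension table whose key is generated uniquely cluster-wide
CREATE TABLE store_dim (
    store_id   SERIAL(GLOBAL),   -- assumed syntax for SERIAL with the GLOBAL argument
    store_name VARCHAR(100)
)
DISTRIBUTE BY REPLICATION;

-- Step 2: copy the dimension contents into a fact table with INSERT...SELECT
INSERT INTO sales_fact (store_id, store_name)
SELECT store_id, store_name FROM store_dim;
```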
Use the star schema model (also known as dimensionalizing data) to put your most frequently
read columns into skinny fact tables, and to relegate the less frequently read columns to separate
dimension tables. The goal is to make your fact tables skinny. This lets your queries run faster,
because your queries don't have to read the less relevant dimension information. Put more
precisely, a query can scan more rows at a time, since each row is smaller, and this makes
lookups faster.
Now we need to decide which Schema model to use for our tables (Sales, Customer, Product, Store)
Star Schema
Classifies the attributes of an event into facts (measured numeric/time data), and
descriptive dimension attributes (product ID)
Care is taken to minimize the number and size of attributes in order to constrain the
overall table size and maintain performance
Star schemas are designed to optimize user ease-of-use and retrieval performance by
minimizing the number of tables to join to materialize a transaction
Snowflake schema
The snowflake schema is similar to the star schema. However, in the snowflake
schema, dimensions are normalized into multiple related tables, whereas the star
schema's dimensions are normalized with each dimension within a single table
Star and Snowflake Schemas are optimal for big data analytics:
The star schema approach is commonly used in data warehousing for databases that contain
large amounts of data. The star schema is efficient for large data sets because it relies on a
single, narrow table (called the fact table) that avoids storing descriptive values and repeated
values. Such columns are instead moved to helper tables called dimension tables. Time-sensitive
queries are run against the fact table only and can run very quickly because the
narrowness of the table allows fast scanning. Queries that have to join against dimension tables take
slightly longer to run (but note that Aster Database supports a number of
techniques, outlined earlier in this document, for making these joins run fast).
You can apply dimensionalization with verticalization to speed up query performance by further reducing
the size of tables that must be scanned to find the desired rows. If you're running SQL-MapReduce
functions, a dimensionalized schema provides a much smaller memory footprint for SQL-MapReduce
operations that do not need the dimension data, because in most cases these operations need
only deal with an integer ID number, rather than the dimension data.
For more information and examples on how to dimensionalize data, refer to The Data
Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Ralph Kimball and Margy
Ross.
Use a Star schema to make your Fact table(s) skinny. Skinny tables let your
queries run faster because they have less data to read. With a properly
dimensionalized schema, queries can run up to 20x faster. Alternative database
models include Snowflake and Third Normal Form.
Fact tables are usually very large (e.g. millions or billions of rows). These tables contain two
types of columns: the columns that contain facts and the columns that refer to the dimension
tables. Fact tables require a distribution key column to be declared.
FACT tables hold the metric values recorded for a specific event. Which columns would you
choose? We have the following columns:

Customer_id: 100
Last_name: Nimitz
First_name: Juli
Middle_initial: A
Gender: F
Born_days_ago: 20075
City_id: 456
Date: 2012-07-04
Store_id: 567
Store_name: NickNack
Region_id: 5
Store_Sq_ft: 50000
Product_id: 692
Product_name: Widget
Product_category: Home
Retail_price: 10.50
Unit_cost: 4.00
Basket_id: 125
Sales_quantity: 3
Discount_amount: .10
Fact Tables
Fact tables are usually very large (i.e. millions or billions of rows), with each row containing a
set of dimension values and a set of measures. These tables contain two types of columns: the
columns that contain the facts (the raw data you're tracking, such as units sold or pages
clicked) and the columns that are foreign keys to the dimension tables. (Note that Aster
Database does not enforce referential constraints; foreign keys are used mainly for joining
tables.) In Aster Database you must declare a distribution key column for each fact table using
DISTRIBUTE BY HASH. The distribution key tells Aster Database how to divide up the table's
contents so that they can be physically distributed across the vworkers. Distributing the
data in this way is called distribution or physical distribution in Aster Database. (Historically, it
was referred to as physical partitioning.) When creating a table, if DISTRIBUTE BY
HASH is used, the table is a FACT table by default, but may optionally be specified as
a DIMENSION table.
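A sketch of a fact-table definition per the description above; the table and column names are illustrative, not from a real schema:

```sql
CREATE TABLE sales_fact (
    sales_date     DATE,
    product_id     INTEGER,      -- foreign key to a product dimension (not enforced)
    store_id       INTEGER,
    customer_id    INTEGER,
    sales_quantity INTEGER,
    discount_amt   NUMERIC(8,2)
)
DISTRIBUTE BY HASH(customer_id);  -- the required distribution key for a fact table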
Dimension Tables
Dimension tables are usually smaller than fact tables (that is, a typical dimension table holds
only thousands of rows, rather than millions). Each dimension table specifies a set of known
values for a particular dimension. For example, a customers table is a dimension table that
contains detailed information about each customer (for example, customer_id, name,
address, and phone_number). Most dimension tables are replicated in Aster Database, meaning
that a copy of the table exists on every node in the cluster. Having a local copy on every node
makes it more likely that joins can run locally on each node, providing faster query results. To
declare your table as a replicated dimension table, include the DISTRIBUTE BY
REPLICATION clause in the CREATE TABLE statement.
Optionally, you can declare your dimension table as distributed, by declaring a distribution
key column using DISTRIBUTE BY HASH. In that case the table will be distributed across
nodes using the distribution key specified, rather than replicated on every node. There may be
a couple of advantages to having a distributed dimension table:
If the fact table is distributed on the column that will be used to perform joins to a
dimension table, it can be very practical to distribute that dimension table on the same
column, too. Joins between the fact and dimension tables on their respective distribution
key fields will be fast because the lookups will be local.
Additions and updates to a distributed dimension table will be faster because those
changes only need to be made in one place, rather than to every instance, as is the case
with a replicated table.
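The two dimension-table options above can be sketched as follows; the table names are hypothetical:

```sql
-- Replicated dimension: a full copy on every node, good for small tables;
-- no distribution key is declared
CREATE TABLE product_dim (
    product_id   INTEGER,
    product_name VARCHAR(100)
)
DISTRIBUTE BY REPLICATION;

-- Distributed dimension: hashed on the same column as the fact table it joins,
-- so fact-to-dimension joins on customer_id stay local to each v-Worker
CREATE TABLE customer_dim (
    customer_id INTEGER,
    last_name   VARCHAR(50)
)
DISTRIBUTE BY HASH(customer_id);
```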
In certain queries where only a small portion of data (in bytes) is retrieved from the
table, your queries will probably run faster if you create the table with a
columnar storage layout rather than the traditional row-wise layout.
The obvious benefit of columnar tables is the fact that, for a given query, only the required
columns will be fetched from disk. In some situations, this can substantially reduce the
amount of I/O required. So if a majority of your queries on a table access a low percentage of
its columns, then it may be a good candidate for columnar.
The performance gains you'll see will depend on the cost associated with transposing values
from a column-wise layout to a row-wise layout during retrieval. This cost goes up with the
total width of columns selected. As the total width of the selected columns increases, the cost
of I/O and transposition can exceed that of straight selection from a row table. Note that we
say column width, not number of columns.
For example, consider a table that consists of five columns, of which four are integer-typed
columns, and the other is a very wide, varchar-typed column. If most of your queries select
only the integer columns (and even if they select all of the integer columns), then it makes
sense to have the table be a columnar table. Doing so allows Aster Database to store the wide,
varchar values separately from the other columns, so that queries can load the other columns
without paying the price of scanning over the wide values.
Note that columnar storage tables are optimized for read operations, not for update
operations. They have more overhead on INSERT/UPDATE than row-based tables do. It is
preferable to perform updates to columnar tables in batch rather than updating only a few
tuples at a time. Also, DELETE operations may show degraded performance if there are many
TOAST entries for a table.
To summarize, it makes sense to consider using a columnar table if the majority of the queries
going against this table in your workload access only a few of the columns and the table is not
expected to have a lot of incremental updates.
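A sketch of the five-column example above, assuming the columnar layout is selected with a STORAGE COLUMN clause on CREATE TABLE; that clause spelling is an assumption to verify against your release's SQL reference:

```sql
CREATE TABLE pageviews (
    userid INTEGER,
    ip     VARCHAR(15),
    ts     TIMESTAMP,
    page   VARCHAR(100),
    notes  VARCHAR(10000)   -- the very wide column most queries skip
)
DISTRIBUTE BY HASH(userid)
STORAGE COLUMN;             -- assumed clause selecting the column-wise layout
```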
[Diagram: the Pageviews table stored row-wise versus column-wise, with column
projections for userid, ip, ts, page, domain, qs, and ref. The row store reads
1 MB of I/O per block regardless of which columns are needed; the column store
reads only the projections required, e.g. 500 KB of I/O.]

Row Storage - Must scan the entire row, so consuming I/O on all columns
Column Storage - Only scan the columns you need; I/O is minimized
Verticalization can be used for row-formatted tables to achieve some of the similar
performance benefits of columnar storage. The only caveat is that a verticalized table requires
more manual effort to maintain as the data changes.
If your schema contains a wide fact table that must remain so to satisfy some users, but you
wish to narrow the table to allow other queries to run more quickly, then you can improve
performance by creating materialized projection tables that include only those columns
needed by your high-performance queries. We call this verticalization because the
projections are more vertical in nature than the wide fact tables they represent. Verticalization
is useful, for example, if you have wide tables on which you run daily reporting queries that
select only a few of the columns and must run quickly.
Tip: In most cases, 60% is the magic number: if most of your queries hit less than 60% of the
fact table's columns, then you should consider creating a verticalized projection of the fact
table.
Costs: When you weigh the usefulness of Verticalization, bear in mind the costs:
You must build a two-step loading process to create both your Fact table and
its materialized projection(s). This typically means a standard load to the main
Fact table and a CREATE TABLE AS SELECT to create the projection
You must rewrite some of your existing queries to run against the materialized
projection, rather than the main Fact table
How to Verticalize: To verticalize your schema, you create new tables that
contain copies of only those columns that are frequently queried together. We
refer to such copy-tables as materialized projections
Maintaining materialized projections requires that you periodically update or
recreate the materialized projection with data from the source Fact table.
Your queries that use only the projected columns will now run faster, while
other queries that need data from the columns not in the projection can
continue to use the original, wider table
For example, in a clickstream tracking database that logs users' views of
website pages, you might precompute page_view summary statistics for
every combination of user, page_view, and domain.
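The two-step load described above might look like this; pageviews_fact and the projected column names are hypothetical, and the CREATE TABLE AS SELECT form follows the CTAS syntax shown elsewhere in this module:

```sql
-- Step 1: standard load into the wide fact table pageviews_fact (not shown)

-- Step 2: materialized projection holding only the frequently queried columns
CREATE TABLE pageviews_proj
DISTRIBUTE BY HASH(userid) AS
SELECT userid, ts, page
FROM pageviews_fact;
```

Queries that touch only userid, ts, and page are then rewritten to run against pageviews_proj; the projection must be refreshed or recreated as the fact table changes.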
When choosing the distribution key, you should also, as a secondary concern, bear in mind
the evenness of your data distribution. That is, make sure the distribution key column
contains enough unique values to allow Aster Database to distribute data so that no single
worker holds significantly more data than any other. When a worker holds too much or too
little data, relative to other workers, we say there is data skew. Avoiding data skew is important,
but it is really a secondary concern because matching your schema to your joins is more
important.
The rule is to keep data movement across the network to a minimum. Any time
you have to move data from one v-Worker to another (e.g., for a JOIN or GROUP BY),
you incur a performance penalty.
Choose join columns that match the hash column; then there is no need to copy
data between v-Workers, since matching rows are guaranteed to be hashed to the same v-Worker.
Some skew may be acceptable on a v-Worker if the column chosen as the
hash column is joined frequently to other tables.
Follow these guidelines to pick the distribution key column for a table:
1 Consider your joins. Choose the column that will be used in your most performance-critical
joins. When picking your distribution key, choose the column that is most
frequently used in joins or aggregations (GROUP BY or DISTINCT), in that order. Since
Aster Database is optimized for joins, it is cost effective to design table schemas so that as
many joins happen on the distribution key as possible. Aster Database is also optimized for
aggregation on the column specified as the distribution key. Therefore, when there are no
joins but only aggregations (via GROUP BY or DISTINCT), then using the column most
frequently involved in the aggregation as the distribution key provides better performance.
2 Data skew is secondary. Consider using a distribution key that avoids data skew, but
remember that when you're optimizing for performance, it's more important to pick a
distribution key that matches your joins than to pick one that avoids data skew. Data skew
occurs when the distribution key causes a disproportionately large number of rows to be
routed to a single worker in the cluster. As a result, one worker has a very large amount of
data, and all other workers have correspondingly smaller amounts of data. The one worker
with the majority of records performs slowly, as expected, and this can slow down queries
that need to access the skewed data.
3 What if there's no good candidate? If no appropriate distribution key column exists, you
may need to create a surrogate distribution key during loading. To do this, look for a
column or columns whose values might be transformed or combined to create more
useful distribution key values. As discussed in point 1 above, let your users' actual join
predicates guide you to find useful values to distribute on. You can use SQL-MapReduce
functions to perform the needed transformations during loading. If you have no existing
join predicates to guide you, then you can create distribution key values that just minimize
data skew. For example, you might define an id column of type UUID, and then, in
your data loading code, include an SQL-MapReduce function that uses a utility like
java.util.UUID to create a fairly universal identifier for every row. The broad distribution
of these values ensures good data distribution in Aster Database.
EXPLAIN SELECT f.c1 FROM factjoin f INNER JOIN repljoin r ON f.c1 = r.c1;

SELECT c.clickid, v.userid FROM click c, vendor v
WHERE c.userid = v.userid;

SELECT product_id, sum(sales_quantity) FROM sales_fact GROUP BY 1;

(Slide annotation: no shuffle needed; only one v-Worker gets involved.)

Any time you have a Replicated table in a JOIN condition, a shuffle of data will not be required
for that JOIN regardless of the other table type (Replicated or Distributed).
EXPLAIN SELECT f.c1 FROM factjoin f INNER JOIN repljoin r ON f.c1 = r.c1;

SELECT clickid, zip FROM click, zip
WHERE zip.userid = click.userid;

SELECT product_id, sum(sales_quantity) FROM sales_fact GROUP BY 1;

(Slide annotation: no shuffle needed.)
AdHeight INTEGER,
AdStatus CHAR(10),
CONSTRAINT DimAdPK PRIMARY KEY (AdID)
)
DISTRIBUTE BY REPLICATION;
Note there is no distribution key declaration for Replicated tables.
CREATE TABLE web_clicks
(customer_id INTEGER, session_id INTEGER,
 page VARCHAR(100), visitdate DATE)
DISTRIBUTE BY REPLICATION;

insert into web_clicks values (100, 1, 200, 'home', '2012-01-01');
insert into web_clicks values (100, 1, 300, 'mortgage', '2012-01-02');
insert into web_clicks values (200, 1, 400, 'fraud', '2012-01-03');
insert into web_clicks values (200, 1, 500, 'savings', '2012-01-04');
insert into web_clicks values (300, 1, 600, 'checking', '2012-01-05');
insert into web_clicks values (300, 1, 700, 'cd', '2012-01-06');
Making good use of analytic tables where appropriate can speed up query performance and
make multiple explorations on a specific set of data easier to perform. Analytic tables are not
replicated, so for very large tables that are based on derived data, they reduce the load on the
cluster. They have the benefit over temporary tables of not having to worry about losing the
data if a session or transaction is terminated before the user has finished doing the analysis.
Some common use cases for analytic tables are:
Create an analytic table to hold the output of a SQL-MR function, such as sessionize,
attribution or nPath. Then use the analytic table as input to other SQL-MR functions or
SQL queries. For example, nPath is sometimes used to filter web sessions based on the
behavior of shoppers in an online store (e.g. browsers, cherry pickers, price-sensitive
shoppers, etc.). Then further analysis can be done on just the sessions that fit that behavior
profile.
Use an analytic table to hold the results of a resource-intensive JOIN operation, so further
exploration can be done on the data without having to perform the JOIN again.
Employ analytic tables for a complex multistep process for which you need the highest
performance and want to keep the end results, but not the intermediate steps. In this case,
you can do most of the processing using analytic tables, and then write to a regular
(persistent) table at the very end of that process.
The reason these operations invalidate analytic tables has to do with replication. Analytic
tables are effectively unreplicated (RF=1) tables, because although their metadata is replicated,
the data itself is not. After a worker failover operation, the partition that was previously a
Secondary becomes a Primary. Since the data in the Analytic Tables was never replicated to the
Secondary, that Secondary (which is now the new Primary) does not have a copy of the data
rows - just an empty table. Therefore, a worker failover must invalidate the Analytic Tables to
force the user to recognize that the Analytic Tables don't have any data. Similarly, other
operations which may cause the Secondary to be used (e.g. balance data, node failover, etc.) will
also have the side effect of invalidating the analytic tables.
BEGIN;
CREATE TEMP TABLE temp1_hash
DISTRIBUTE BY HASH(emp) AS
SELECT e.emp, d.dept FROM emp e, dept d
WHERE e.dept = d.dept and e.mgr = 801;
SELECT * from temp1_hash;
END;
BEGIN;
CREATE TEMP TABLE temp1_repl
DISTRIBUTE BY REPLICATION AS
SELECT e.emp, d.dept FROM emp e, dept d
WHERE e.dept = d.dept and e.mgr = 801;
SELECT * from temp1_repl;
END;
Fact tables are usually very large (e.g. millions or billions of rows). These tables contain
two types of columns: the columns that contain facts and the columns that refer to the
dimension tables. Fact tables require a distribution key column to be declared.
Dimension tables are usually much smaller (e.g. tens to thousands of rows) than fact
tables. Each dimension table specifies a set of known descriptive values for a particular
dimension. For example, a customers table can be a dimension table that contains
detailed information about each customer: for example, a customer ID, name, address,
and phone number. A distribution key column is optional for dimension tables.
It is also possible to query the NC_ALL_TABLES view to determine the table type as well.

To copy the result set from an SQL-MR query to a permanent table, use the syntax shown above.
There is no change in query syntax for compressed tables. For all query purposes, a
compressed table will be treated the same as a normal table. Compression is currently not
supported for temporary tables. Compressed tables are replicated in their compressed form.
Before you alter existing table compression properties (compression levels, initial
compression of a table, decompression of a table), you should ensure that there is sufficient
disk space available for the operation.
Table compression occurs in an online fashion without disruption to Aster Database. One
useful application of compression is to combine it with Aster Database's logical partitioning
feature for information lifecycle management. As you recall, logical partitioning enables
creation of a hierarchy such that a large table can have partitions, which in turn can have their
own partitions, and so on. If the child partitions are range-partitioned (e.g. monthly
partitions), compression can be used to compress the monthly child partitions over time, as
they become less frequently accessed.
For example, assume it is November. You may leave the October and November child
partitions uncompressed as they are more frequently accessed. However, older data can be
compressed at increasing levels since query frequency may drop as data gets stale. For
example, Q3 data (July-Sept.) may be compressed LOW, Q2 data (April-June) may be
compressed MEDIUM, and Q1 data (Jan.-Mar.) may be compressed HIGH.
Realized compression ratios depend on the compression level selected by the user and the data
characteristics. While realized compression rates vary, typical ratios range from 3x to 12x.
At slight cost of CPU, get better Disk I/O since have more rows per Data block
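The quarterly scheme above might be sketched as follows. The LOW/MEDIUM/HIGH levels come from the text, but the ALTER TABLE ... SET COMPRESS spelling and the partition table names are assumptions to verify against your release's SQL reference:

```sql
-- Older child partitions compressed at increasing levels as queries become rarer
ALTER TABLE sales_fact_2012_q1 SET COMPRESS HIGH;    -- Jan.-Mar.
ALTER TABLE sales_fact_2012_q2 SET COMPRESS MEDIUM;  -- April-June
ALTER TABLE sales_fact_2012_q3 SET COMPRESS LOW;     -- July-Sept.
-- The October and November child partitions stay uncompressed
```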
Use Logical Partitioning to give the effect of smaller tables, which shortens
query runtimes. By using logical partitioning, query performance improves
nearly linearly with the number of partitions added
Logically Partitioned tables can take advantage of Partition Pruning. Provided
that the child partition structure matches the user's predicate in the WHERE
clause, Aster Database reads only the relevant child partitions, resulting in
fast query runtimes
2 Update performance is improved, since each partition of the table has indexes smaller than
an index on the entire data set would be.
3 You can effectively do a bulk delete by dropping a child partition, as long as your
partitioning design plan allows for it. ALTER TABLE ... DROP PARTITION is far faster
than a bulk DELETE.
4 Removing a large segment of data does not leave a big hole in the table as it would when
using only one large table.
With Logical Partitioning we can reap both access-performance and data-manageability
benefits in a Cluster environment:
2 ways to PARTITION BY
LIST
RANGE
PARTITION BY LIST is when you provide the list of values that belong to each partition
- If NULL is in the list, then the partition will include NULL values
[ e.g.: VALUES(1, 2, NULL) ]
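A minimal LIST sketch, with a hypothetical table and values, showing NULL placed in one partition's list:

```sql
-- Illustrative: rows with region_code 1, 2, or NULL land in partition east
CREATE TABLE orders (order_id int, region_code int)
DISTRIBUTE BY HASH(order_id)
PARTITION BY LIST(region_code)
( PARTITION east ( VALUES (1, 2, NULL) ),
  PARTITION west ( VALUES (3, 4) ) );
```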
START Value
Required? No, declaring a START value is not required. If you omit it, Aster Database will
create a default START value for you.
If you omit the START value from the first range partition, it is equivalent to declaring
START MINVALUE.
If you omit the START value from any subsequent range partition, that partition uses the
END value of the preceding partition as its start value.
Allowed values: For the END value, you may specify a constant or the keyword,
MAXVALUE. Specifying END MAXVALUE says that there is no upper bound in the
partition. MAXVALUE does not correspond to a real value. Conceptually, MAXVALUE is
greater than all possible values (including NULL).
The range partition can also specify NULLS FIRST or NULLS LAST. This says that the NULL
value is ordered before (NULLS FIRST) or after (NULLS LAST) all the other values in the
partition. Regardless of which option for NULL ordering is designated, any NULL values will
come after MINVALUE and before MAXVALUE. The default is NULLS LAST.
In range partitions, using NULLS FIRST or NULLS LAST only affects the interpretation of the
START and END values in a range partition.
If NULLS LAST is chosen (that is the default), then the ordering will be:
MINVALUE, <actual values, e.g. Albania, Zambia>, NULL, MAXVALUE
If you say NULLS FIRST:
(START MINVALUE END 0) will include NULL
(START NULL END 0) will be a valid range
(START 0 END NULL) is not valid and will be rejected by the system
(START 0 END MAXVALUE) will not contain NULL

Similarly, if you say NULLS LAST:
(START MINVALUE END 0) will not include NULL
(START NULL END 0) is not valid and will be rejected by the system
(START 0 END NULL) will be a valid range
(START 0 END MAXVALUE) will contain NULL
CREATE TABLE customer
( customer_id int,
  zipcode     int )
DISTRIBUTE BY HASH(customer_id)
PARTITION BY RANGE(zipcode NULLS FIRST)
( PARTITION zipcode_is_NULL      (START NULL END NULL INCLUSIVE),
  PARTITION zipcodes_00000_09999 (START 0 END 10000 EXCLUSIVE),
  PARTITION zipcodes_10000_19999 (END 20000 EXCLUSIVE),
  PARTITION zipcodes_20000_29999 (END 30000 EXCLUSIVE) );

The same scheme, demonstrated with data:

CREATE TABLE lp_null_first
( customer_id int,
  zipcode     int )
DISTRIBUTE BY HASH(customer_id)
PARTITION BY RANGE(zipcode NULLS FIRST)
( PARTITION zipcode_is_NULL      (START NULL END NULL INCLUSIVE),
  PARTITION zipcodes_00000_09999 (START 0 END 10000 EXCLUSIVE),
  PARTITION zipcodes_10000_19999 (END 20000 EXCLUSIVE),
  PARTITION zipcodes_20000_29999 (END 30000 EXCLUSIVE) );

insert into lp_null_first values (1, null);
insert into lp_null_first values (2, 00000);
insert into lp_null_first values (3, 10000);
insert into lp_null_first values (4, 20000);

select * from lp_null_first;
Confidential and proprietary. Copyright 2009 Aster Data Systems
When using ranges, ensure that ranges do not overlap. When using values, ensure that
each value is assigned to only one partition. If you attempt to insert rows with values that
do not fall within the defined list of ranges or values for any partition, the insert will fail.
Note that you can create multilevel partitioned tables in one CREATE TABLE command
by nesting PARTITION BY statements.
Compression is supported at every level in the logical partition hierarchy. That is, you may
compress the table itself, its index, and/or any of its child partitions. If compression is
specified for the table or one of its partitions, the compression will cascade to any partitions
below it in the hierarchy, unless they have their own compression explicitly specified.
CREATE FACT TABLE records(id int, country varchar, ts timestamp)
DISTRIBUTE BY HASH(id)
PARTITION BY RANGE(ts)
( PARTITION oldrecords( END '2010-01-01' COMPRESS LOW),
  PARTITION jan01_2010( END '2010-01-02' COMPRESS LOW),
  PARTITION jan02_2010( END '2010-01-03' COMPRESS LOW),
  PARTITION jan03_2010( END '2010-01-04' COMPRESS LOW),
  ...
  PARTITION dec31_2010( END '2011-01-01' COMPRESS LOW),
  PARTITION jan01_2011( END '2011-01-02'
    PARTITION BY LIST(country)
    ( PARTITION na ( VALUES ('usa', 'canada', 'mexico') ),
      PARTITION eu ( VALUES ('germany', 'spain') ) ) ) );
- This can improve query performance when the predicate for each
level is given in the query (in the WHERE clause)
- This can cause full table scans to run longer because now a
SELECT * FROM parent_table; query must open/read MULTIPLE
child tables!
So LP tables can provide increased performance. However, without a WHERE
clause on the LP columns, an LP table can be slower than a non-LP table,
since the query has to walk through each partition:

PARTITION BY RANGE (sales_date)                      -- 1 level deep
( partition Sales_Old     (END '2012-07-01')
 ,partition Sales_2012_07 (END '2012-08-01'          -- 2 levels deep
    PARTITION BY LIST (region)
    ( partition East (VALUES ('E'))
     ,partition West (VALUES ('W')) ) ) );

SELECT * FROM sales    -- must read every partition
The ALTER TABLE...ADD PARTITION operation allows adding a new partition to a logically
partitioned table. The new partition will have the same columns, indexes, permissions and
distribution key as the logically partitioned table to which it will be added.
Examples
The following example adds a partition south_america to an existing table of distributors
partitioned by a list of country names:
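The statement itself did not survive extraction; a sketch, with hypothetical country values, would be:

```sql
-- Illustrative: the list of VALUES is an assumption
ALTER TABLE distributors
  ADD PARTITION south_america ( VALUES ('brazil', 'argentina', 'chile') );
```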
The ALTER TABLE...DROP PARTITION operation drops an existing partition and all of its
data from a logically partitioned table. If the partition to be dropped is a subpartition, the
reference to it must include references to all partitions above it in the hierarchy (i.e.
partition_name.subpartition_name). If the partition to be dropped includes
subpartitions, they will be deleted as well, along with their data.
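A sketch of the corresponding drop, assuming a south_america partition exists on the distributors table:

```sql
-- Drops the partition and all of its data (and any subpartitions)
ALTER TABLE distributors DROP PARTITION south_america;
```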
Suppose we have the following LP table and want to detach the post_2008 partition and add a 200901 partition.
Users are complaining about the performance of a large fact table named
page_view_fact. You find they are doing queries based on dates
Use PROCESS tab in AMC to record times: Non-LP: _______ LP: _______
Most queries from the SALES_FACT table will be queried by MONTH so create
this table as a Logically Partitioned table for each Month of 2008
Indexes are created using the CREATE INDEX command, specifying the name of the index,
the underlying table to index, and the column(s) to index.
Indexes are useful for quickly accessing selective amounts of data. For example, suppose we
want to filter a table called pageviews by a highly selective criteria: the sessionid in (11, 12).
Then an index on the sessionid column of the pageviews table would speed up this query. In
contrast, suppose we have a non-selective criteria: find the average age of all users in the North
America region. Then a sequential scan is likely more efficient than index-based access which
results in many random I/Os.
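The selective case above can be sketched as:

```sql
-- Index on the highly selective column
CREATE INDEX idx_pageviews_sessionid ON pageviews (sessionid);

-- This filter can now use the index instead of a sequential scan
SELECT * FROM pageviews WHERE sessionid IN (11, 12);
```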
Teradata recommends the following guidelines for vworker-local indexes. (Note: Aster
Database does not support cluster-global indexes. Instead, indexes are per-vworker, which
means each index includes all rows stored on that vworker.)
It is more efficient to create indexes on fact tables after data has been loaded. If done the
other way, i.e., if an index exists before the load begins, the database will need to maintain
the index for every inserted row, slowing loads tremendously.
Low row selection: If a workload has queries that frequently access less than 10-15% of
the rows in a large table, an index might be appropriate. Of course, such a percentage
value depends largely on the relative speed of table-scan and the distribution of the row
data in relation to the order of the index key. The faster the table-scan, the lower the
above percentage; the more clustered the row data, the higher the percentage.
JOINs on multiple tables: Performance of JOINs across multiple tables could improve
with indexes, as the execution plan avoids sequential scans of all tables in the JOIN.
Suitability of columns for indexing: If a column contains many NULLs, and the workload has
frequent queries that access the non-NULL values, an index might be appropriate.
When creating an index with a composite (multi-column) key, the order of the columns
should be based on the general rule that the most frequently occurring columns in queries
should be placed first. For example, an index over columns <c1, c2, c3> will be used by
queries that access either c1, c1 and c2, or c1 and c2 and c3. Queries that access c2, c3, or
c2 and c3 will not leverage the index.
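A sketch of the composite key described above, on a hypothetical table t:

```sql
-- Most frequently queried column (c1) leads; usable by filters on
-- c1, (c1,c2), or (c1,c2,c3), but not by filters on c2 or c3 alone
CREATE INDEX idx_t_c1c2c3 ON t (c1, c2, c3);
```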
An index will speed up SELECTs, but will slow down DMLs such as INSERTs, DELETEs,
and UPDATEs, because index maintenance is a per-row operation and entails random I/O.
This trade-off between SELECTs and DMLs must be kept in mind when deciding how
many indexes to create for a table. Tables that are primarily read-only will benefit from
indexes. Tables that get modified very often will do better with fewer indexes.
Aster performs an index scan only if all of the following criteria are met:
Indexes are available; and
The query filters on a column that has an index; and
The optimizer thinks the query's filtering on the indexed column will remove
at least 80-90% of the rows from the result. Please note that what matters
is what the optimizer thinks, so always make sure you run ANALYZE on the
table after any significant change to the table's contents
Other considerations:
For many slightly complicated queries, the planner doesn't estimate the
cost of an operation exactly right. In such scenarios, the planner might
mistakenly use an index scan because it assumed high selectivity, when in
actuality the selectivity is not high enough
When your table contains a large number of rows but your typical queries
only select a small portion at any point, indexes are usually helpful
because they can reduce the amount of scanning needed to find a
query's results, but indexes are not the only way. In these situations,
using logical partitioning may also reduce scanning and should be
considered as an alternative to indexing
The key field(s) for the index are specified as column names. Multiple fields can be specified.
An index field can be an expression computed from the values of one or more columns of the
table row. This feature can be used to obtain fast access to data based on some transformation
of the basic data. For example, an index computed on upper(col) would allow the clause
WHERE upper(col) = 'JIM' to use an index.
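The upper(col) case reads as follows; the table name t is hypothetical:

```sql
-- Expression index: stores upper(col) so equality on the
-- transformed value can use the index
CREATE INDEX idx_t_upper_col ON t (upper(col));

SELECT * FROM t WHERE upper(col) = 'JIM';
```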
Aster Database supports the B-tree index method (btree) and the GiST method (gist).
Note that GiST indexes (used for columns of the IP4range datatype) are not supported for analytic
tables. So if you want to index a table with an IP4range column, you should create the table as
a regular or persistent table.
The keyword ASC (ascending) or DESC (descending) is optional. It specifies the ordering of the
values in the column. If not specified, ASC is assumed by default.
If NULLS LAST is specified, null values sort after all non-null values in the index; if NULLS
FIRST is specified, null values sort before all non-null values in the index. If neither is
specified, the default behavior is NULLS LAST when ASC is specified or implied, and NULLS
FIRST when DESC is specified (thus, the default is to act as though nulls are larger than
nonnulls).
When the WHERE clause is present, a partial index is created. A partial index is an index that
contains entries for only a portion of a table, usually a portion that is more useful for indexing
than the rest of the table. For example, if you have a table that contains both billed and
unbilled orders, where the unbilled orders take up a small fraction of the total table and yet
that is an often used section, you can improve performance by creating an index on just that
portion.
The expression used in the WHERE clause may refer only to columns of the underlying table,
but it can use all columns, not just the ones being indexed. Presently, subqueries and aggregate
expressions are also forbidden in WHERE. The same restrictions apply to index fields that are
expressions.
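The billed/unbilled example above might look like this; the orders table and its columns are assumptions:

```sql
-- Partial index covering only the small, frequently queried unbilled slice
CREATE INDEX idx_orders_unbilled ON orders (order_nr)
WHERE billed = false;
```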
Note we could have done this just as easily using a Logically Partitioned table
4. You can define a different Distribution Key when using the CREATE
TABLE AS syntax (T or F)
Mod 5
Loading
Now is a good time to RESUME
the HDP 2.1 VMware image
To load data into Aster Database, you can use the Aster Loader Tool, the COPY command, the
INSERT command, or a custom-defined SQL-MapReduce data loading function you have
written. This section provides tips for efficient loading and shows how to load using the Aster
Loader Tool.
This module will focus mainly on the Aster NCLUSTER_LOADER utility.
Input files are typically gathered together on a server called a Staging machine where the
ncluster_loader.exe utility is installed. Using the ncluster_loader syntax, a command can be
issued to load data using a Loader node if desired. By specifying a Loader node, this relieves the
Queen from doing the loading.
Rather than load a single, larger data file with a single instance of the loader, the input
file can be split into smaller chunks using an OS command such as split or csvgrep on
Linux. Then, to run multiple copies of ncluster_loader on Linux, put nohup before each
ncluster_loader command to cause it to run in the background.
To load table(s) from multiple files you must map source files to target
tables in a Mapping file
Can invoke multiple times on the same staging machine (i.e. nohup
ncluster_loader) so can load in parallel
where
arguments are the command-line flags that control how the loader runs. The flags are
explained in Argument Flags, below, or you can display the help by typing:
$ ncluster_loader -?
tablename is the name of the destination table (See Case-Sensitive Handling for
Table Names on page 180 if you wish to have Aster Database evaluate table names in a
case-sensitive manner.);
filename or dirname indicates the file or directory of files to be loaded. Files to be loaded
can optionally be compressed gzip or bzip2 files. These are extracted and their contents are
then loaded.
filename Qualified path of the file containing the data to be loaded. The contents of
the file must be in either CSV or text format, as described for the COPY statement.
Details of the encoding used (such as non-default values for null or delimiter) are
specified using the appropriate options, as described below; or
dirname Qualified path of the directory containing one or more data files to be
loaded. All data files found within this directory are expected to be in the same format
and will be loaded as a single transaction. Subdirectories will not be processed.
Generic syntax
$ ncluster_loader [flags] tablename { filename | dirname }
You can invoke the ncluster_loader utility multiple times on the same staging machine
by adding nohup before each ncluster_loader command.
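The split-then-parallel-load pattern can be sketched as follows; the table name, file names, and chunk size are assumptions, and connection flags are omitted:

```shell
# Split a large TSV into ~1M-line chunks, then load them in parallel
split -l 1000000 pageviews.tsv chunk_

for f in chunk_*; do
  nohup ncluster_loader prod.pageviews "$f" > "$f.log" 2>&1 &
done
wait   # block until all background loads finish
```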
In the table that follows, the argument flags are sorted based on the long-form, command-line
flag:
The left column lists the flag you use at the command line
The middle column lists the flag you can use in a map file
If no value appears in the middle column, then the argument is one that can only be passed at the
command line.
Note that you can always specify a loader node using its IP address. If you wish to specify it by
hostname, you first need to add the loader node to the Aster Database hosts file on all nodes
through the Hosts tab in the AMC.
See Teradata Aster Client User Guide for more information on loader arguments and their use.
The Aster Loader Tool supports the use of many Aster Loader nodes. For most loading tasks,
the queen is sufficient to handle all loading, but for high volume loading, you can add dedicated
loader nodes to your cluster.
To use a loader node, you invoke one or more ncluster_loader instances that will load through
that loader node. You may run many ncluster_loader sessions in parallel against one loader node,
and you may use many loader nodes in parallel (with each node handling loads from a number of
ncluster_loader instances).
To do this, you invoke each ncluster_loader instance with the -l (and optionally -f) argument to
specify the loader node. The required flags are:
the --loader flag (-l) provides the IP address of the desired loader node; and
Optionally, the --force-loader flag (-f) forces the use of the desired loader node.
Loader Node: optional (for small data volumes you can load via the Queen).
A loader node not only relieves the Queen of hashing rows for distributed tables, it
also offloads the work of receiving the stream of rows from the client and
distributing them to the v-workers.
Since you can invoke ncluster_loader multiple times on the staging machine, you can load in
parallel to speed the process.
[Diagram: Queen, Loaders, and Workers in the Cluster] Point ncluster_loader to the
loaders via the -l argument. Note each Cluster Loader Tool instance still has to point
to the Queen so jobs can be assigned, but the loaders now do the hashing and stream
the data to the workers.
Files are assumed to be in tab-separated value (TSV) format. If this is not the case, you can use
the -c flag to denote comma-separated value (CSV) format or the -D flag to explicitly spell out
the delimiter.
The column delimiter character to use when interpreting the input file (must be a string that
represents a valid single character, such as 'd' or '\n'). The default is a tab character ('\t').
When loading a very large amount of data, you may choose to create multiple map files that
each load their data files using a different loader. This can help speed up the process of loading
a large amount of data.
The map file is a text file containing a set of logical text blocks, each surrounded by curly braces.
Each block represents a file or directory to be loaded. The format is like this:
{
  "dbname" : "beehive",
  "username" : "beehive",
  "password" : "beehive_pwd",
  "loader" : "141.206.66.28",
  "force-loader" : true,
  "timeout" : 5,
  "loadconfig" :
  [
    {
      "table" : "schema1.targettable1",
      "file" : "data/insert1.txt",
      "errorlogging" : { "enabled" : true }
    },
    {
      "table" : "schema1.targettable1",
      "file" : "data/insert2.txt",
      "begin-script" : "input/mapfile/begin-script.sql",
      "end-script" : "input/mapfile/end-script.sql",
      "errorlogging" :
      {
        "enabled" : true,
        "discard-errors" : true,
        "label" : "insert2_log",
        "schema" : "nc_system",
        "table" : "nc_errortable_part"
      }
    }
  ]
}
In the above example, we assume the current directory (from which we invoke ncluster_loader)
contains a subdirectory, data, which has two files, insert1.txt and insert2.txt, and we load these
both into table targettable1 in schema1. Error logging is turned on. For the second table,
additional error logging parameters are supplied to log to a system table, label each row and skip
errors.
See Teradata Aster Client User Guide for more information on loader arguments and their use.
With the Cluster Loader tool you can load multiple data files into multiple tables in a
single invocation, using a map file passed via the corresponding flag:
  ...
  "errorlogging" : {"enabled":true,"label":"page200801","limit":100000,"schema":"prod","table":"load_page_err"}},
  {"table" : "prod.page_view_fact",    <-- could have pointed to a different table here
   "file" : "/home/lab07/pageviewdata200802.tsv",
   "errorlogging" : {"enabled":true,"label":"page200802","limit":100000,"schema":"prod","table":"load_page_err"}}
]}

MAP FILEs load in serial, so to achieve parallelism, invoke multiple ncluster_loader.exe instances, each with its own MAP FILE
--columns username,orderqty,timestamp
Must not have spaces after the commas (,) or else the load fails
Useful for:
-Shuffling columns around to fix any mismatches in the column ordering
between the input file and the destination table
-Telling the loader which columns will not receive data when an input file lacks
data for some columns in the destination table. In other words, omitting a column
name from the above command means NULL values for that column's rows (see next
slide for example)
-Alternatively, if you have 4 values in the file and 3 columns in the table, the
load will fail. The loader cannot ignore columns in input files; for that you need
(see two slides from here). A workaround is using a pre-script to load to a
staging table
By using -C, you can specify that the ordering of the columns in the input file is different from
the ordering in the table. You can also use -C to specify that this input file contains values for
only some of the columns in the table.
The input data is assumed to contain values for the columns in the order specified here. For
example, to load data into columns 'col1' and 'col2', one could specify "col1, col2" as the value
for this option. Column names not specified here are expected to get NULL values.
When using the -C option where the column list has any uppercase or special
characters, you must put the column list in double quotes. On Windows, this
additionally requires escaping the double quotes.
Example on Linux:
Example on Windows:
--columns c1,c2
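Putting the quoting rule into full invocations (table, file, and column names are illustrative):

```shell
# Linux: plain double quotes keep the list intact through the shell
ncluster_loader -C "c1,UpperCol" mytable data.tsv

# Windows: the double quotes themselves must be escaped
ncluster_loader -C \"c1,UpperCol\" mytable data.tsv
```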
Specifies the qualified path of the file containing SQL commands that should be executed when
the transaction starts, i.e. immediately after the BEGIN command is issued to Aster Database.
Note: data-returning statements such as SELECT are not allowed in scripts executed
by ncluster_loader.
Qualified path of the file containing SQL commands that should be executed when the
transaction is about to commit, i.e., immediately before the END/COMMIT command is issued
to Aster Database.
Goal: Show how to LOAD data from Hadoop into a new Aster table

01b (1 of 4):
  DELETE from ASTER_TARGET;
  SELECT count(*) from ASTER_TARGET;    -- 0 rows

03b:
  CREATE TABLE Aster_from_Hadoop distribute by hash(department_number) AS
  SELECT * FROM load_from_hcatalog (...);    -- Aster Function
--skip-rows arg
Specifies how many rows of the file to skip before starting to load data. If combined with
--header-included, --skip-rows will not start counting until after the first line in the
file. The default is 0.

--verbose
Run in verbose mode.

-z [ --auto-analyze ]
Runs ANALYZE on the table(s) after loading the data and sets the hint bits on the table(s). By
default, this is disabled. If data was loaded to child tables via autopartitioning, they will be
analyzed as well. Note that analyzing a columnar table may be slow if there are many columns.
To improve the speed of statistics collection, execute a separate ANALYZE command after the
load that only processes the columns involved in query row filters or grouping.
--el-enabled (map-file flag: errorlogging)
If present, turns on error logging for this invocation of the ncluster_loader. This needs to be
enabled for any other error logging option to be accepted. The default is disabled.
Use the --el-enabled flag (or the errorlogging flag inside a map file) to run the Aster
Loader Tool in a mode in which it tolerates poorly formatted input rows and logs each bad
row to a table. This differs from Aster Loader's normal mode of running:
Running normally, Aster Loader aborts the load immediately if it encounters a bad input
row, and it does not log the malformed input row to a table.
Running in --el-enabled mode, Aster Loader logs each malformed input row (that is,
any row it cannot interpret for loading or cannot load due to datatype mismatch or check
constraint violation) to an error table and continues to load the remaining rows in the
load job. We refer to this as error logging.
The --el-enabled flag is a master flag that operates with a set of sub-flags
(--el-discard-errors, --el-errfile, --el-label, --el-limit, --el-schema, and --el-table)
that fine-tune your handling of malformed rows. To use any of the sub-flags, you must first
have specified the --el-enabled flag. If you're using a map file, the syntax is different:
the master flag is errorlogging, and the sub-flags are discard-errors, errfile, label,
limit, schema, and table.
The --el-discard-errors flag discards all malformed rows, the --el-label tags failed
row data, the --el-limit flag sets a maximum number of allowed failed rows for the job, and
the --el-table flag specifies a custom error logging table.
To perform error logging, the Aster Loader Tool relies on the error handling features of the
Aster Database COPY command in SQL.
If data being loaded will cause duplicate values of a UNIQUE or PRIMARY KEY constraint on
the target table, it is considered an error. This particular error cannot be handled by error
logging, so the loading transaction will be aborted if any record causes a unique or primary
key constraint violation, even when error logging is enabled.
Error logging is turned off by default; you must enable it via the --el-enabled argument.
If error logging is not enabled, the load job will abort and roll back the data upon the
first error encountered.
If no error table is defined (you have --el-enabled but not --el-table <tablename>), then
a default error table is used. From ACT, run \dt nc_system.* to view these 2 default
error tables.
--el-label <labelname> is optional and adds an extra column to the error table. This is
so that when multiple loads insert into the same error table, you can recognize which
rows belong to which load job.
--el-limit <#> is optional. If it is not defined (but --el-enabled is), the load succeeds
even in the presence of malformed rows. If a limit is set (e.g. 100) and it is exceeded,
the transaction aborts and no rows are inserted into the error table or the target table.
Error logging is more general than just handling malformed rows. Essentially, any error
related to an individual row can be recorded by error logging and the load can continue,
except for the special case of UNIQUE/PRIMARY KEY violations. If there is such a violation,
the loader will abort the load. Also, any error not related to an individual row (such as
insufficient privileges) will abort the load operation.
--el-errfile is a sub-flag that can accompany --el-enabled. It introduces the pathname of
the optional error logging file. If you use error logging, you must have an error logging
table, and you can also have an error logging file. Upon completion of the load,
ncluster_loader writes the contents of the error logging table's rawdata column (and no
other columns) to the error logging file. Only the contents of this column are written to
the file. The filename will have a numeral appended to it in the form _0.
For this option to work, you must have also specified an --el-table.
Regardless of whether or not you specify that an error logging file should be used, the error
logging table will still contain all error rows upon completion of each load.
One way to correct errors, and get the data loaded into the Cluster, is
to use the --el-errfile flag. Upon completion of the load, ncluster_loader
writes the contents of the error logging table's rawdata column (and no
other columns) to the designated error logging file; the filename will
have a numeral appended to it in the form _0.
Inspect the contents of the error logging table to find the cause of
the errors, fix the problems you find there in the file, and finally
reload the fixed data from the file.
Make sure that even in the presence of malformed rows a given load operation succeeds:
This can be accomplished by enabling error logging but not setting an error logging limit
(set --el-enabled but do not set an --el-limit). If you are not interested in what errors
are present, malformed input rows can be discarded so that they are not stored in the
cluster (--el-enabled --el-discard-errors).
Abort the data load operation in the presence of too many malformed rows:
This is particularly useful if you want a given load operation to abort if too many
malformed rows are present in the input data (--el-enabled --el-limit 100). In order to
preserve atomicity for bulk load operations, the load operation fails as a transaction
when the error limit is exceeded. When the operation fails, any rows already written by
the transaction to the target table and error logging table are deleted.
If there is a violation of either constraint when loading, the load will always abort,
regardless of any error settings in ncluster_loader. You must find and fix these before
the load.
Making a load succeed despite malformed rows can be accomplished by enabling error
logging but not setting an error logging limit (set --el-enabled but do not set an
--el-limit).
If you are not interested in what errors are present, malformed input rows can be
discarded using --el-discard-errors.
If you want an exclusive error table for each load, you must define the error table
prior to loading and grant any necessary permissions.
When the operation fails, any rows already written by the transaction to the target
table and error logging table are deleted. With --el-enabled --el-limit <INT>, if the
limit is, say, 3 and 4 errors occur during the load, the load will fail and the good
rows are rolled back.
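An illustrative invocation combining the error-logging flags discussed above (table and file names are assumptions):

```shell
# Tolerate up to 100 malformed rows, tag them, and log to a custom error table
ncluster_loader --el-enabled --el-label page200801 --el-limit 100 \
  --el-schema prod --el-table load_page_err \
  prod.page_view_fact pageviewdata200801.tsv
```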
Quotes can be used on any portion of a data field, typically around special
characters. For example, with the default CSV mode, this is the usual way to
handle commas within a string:
The example below can introduce problems when working with varchar columns,
because many people put a space between the comma and the quote, and that
space is considered significant. For example:
In this example, the third column will be loaded with a space before the 'r'
character. Note you can only see the space in ACT; you will not see it in TD Studio.
Daily child tables offer a good example: When you load to daily child tables, run
ANALYZE on each child table after you load the days data into that table
If either the schema name, the table name or both names include
capital letters, you must surround each name in escaped quotation
marks, individually
Also note that a carriage return at the end of the source file will cause an error
SPLIT is a Linux command that can split a large file into smaller ones.
Use Linux commands to split larger files into multiple smaller ones, then
load in parallel via ncluster_loader
Load Speeds
From 10 GB/hour (replicated tables, especially if there are load errors) to 500 GB/hour for distributed tables.
(LP tables are in the 10-20 GB/hr range.) Loads tend to slow down on bigger files and on columnar tables
It is important to point out that no built-in error table is created by default when
loading into an Aster table. In other words, unless error logging is enabled, this is an
all-or-nothing proposition: any malformed row loading into an Aster table will cause an
ABORT and ROLLBACK of that data.
[Diagram: Teradata (Data Warehouse) and Teradata Aster (MapReduce Platform) workers,
with a connector that supports terabytes of data transfer per hour.]
Goal: Show how to LOAD data from Teradata into a new Aster table

01b (1 of 4):
  DELETE from ASTER_TARGET;
  SELECT count(*) from ASTER_TARGET;    -- 0 rows

  CREATE TABLE Aster_from_TD distribute by hash(employee_number) compress low AS
  SELECT * FROM load_from_teradata    -- Aster Function
    (... QUERY ('SELECT * FROM sql00.teradata_source'));
  -- The first error rolls back the Aster table contents
QueryGrid: Aster-Hadoop is an
intelligent connector to Hadoop that
selectively pulls data from Hadoop into
Aster for analytics
01b (2 of 4):
SELECT * from sql00.teradata_source;   -- 26 rows (source table on Teradata)

01c (3 of 4):
INSERT into ASTER_TARGET
SELECT * FROM load_from_teradata       -- Aster function
  (ON mr_driver
   tdpid ('dbc') username ('sql00') password ('sql00')   -- TD credentials used to logon
   QUERY ('SELECT * FROM sql00.teradata_source'));

01d (4 of 4):
SELECT * from ASTER_TARGET;            -- 26 rows

Goal: Show how to LOAD data from Hadoop into new Aster table
-- load_from_hcatalog arguments: server ('192.168.100.21'), dbname ('default'),
-- username ('hive'), tablename ('department')   -- source table on Hadoop
03c: SELECT * FROM Aster_from_Hadoop;  -- 9 rows
This transferred the contents of a table to a different cluster, running a different version of Aster,
and put it into a table with a different storage type (columnar vs. row). It was 17M rows
Objectives:
5a Bulk loading
5b Data validation (error logging)
5c Loading small datasets
1. Open your lab manuals to Page 22
2. Perform the steps in the labs
3. Notify the instructor when you are finished
4. Be prepared to discuss the labs
Before you begin:
SUSPEND the HDP 2.1 VMware image
When prompted later in the lab you will need to POWER ON the
LOADER. Do not do this step until prompted
Note in the following labs, all the files to be loaded are on the Queen
and we will invoke NCLUSTER_LOADER from the Queen. Typically
we would load from a Staging machine instead of using the Queen
Mod 06
Managing Tables
How to TRUNCATE
How to ANALYZE
[Slide graphic: an Employee table stored in three 32-KB datablocks. The table
grows in 32-KB datablock increments to accommodate the row size, which may
create some free space. After new INSERTs (e.g. rows 114 Ann and 115 Dave),
the table grew to 4 datablocks and free space went down. In addition, a hidden
column is created that flags the new rows as being able to be queried.]
The main difference between multiversion and lock models is that in MVCC locks acquired for
querying (reading) data don't conflict with locks acquired for writing data and so reading never
blocks writing and writing never blocks reading.
[Slide graphic: DELETE FROM employees. Rows are not removed when DML
operations are applied; they are simply marked as not visible (e.g. rows 108 Su,
203 Esther, 204 Jan, and 205 Mike flip from True to False in the hidden
visibility column).]
The TRUNCATE command empties a table or set of tables. TRUNCATE is a faster alternative
to performing an unqualified DELETE on a table. DELETE operates more slowly because it
does a full scan of each table before deleting the rows. TRUNCATE deletes the rows without
performing a scan.
If your table has child tables created through inheritance and you want to delete rows from the
entire hierarchy, remember to include the CASCADE option. Note that this usage of the
CASCADE option is different from the meaning of TRUNCATE CASCADE in Postgres.
If the table is a logically partitioned table, TRUNCATE automatically acts on the whole
hierarchy unless the ONLY keyword is used.
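A short sketch of the command; the table name is illustrative, with CASCADE included per the note above:

```sql
-- Empty a staging table and all of its inheritance children
TRUNCATE stage_sales CASCADE;
```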
Synopsis
Description
TRUNCATE removes all rows from a set of tables. It reclaims disk space, rather than requiring
a subsequent VACUUM operation. The reclaimed disk space may be available immediately, or
availability may be deferred. This is most useful on large tables.
TRUNCATE is typically used for STAGING tables and takes Table size to 0 kb
DELETE increases
DEAD_TUPLE_LEN. Large
values here are to be avoided
when possible
We will do an INSERT, UPDATE, and DELETE and confirm the table size on the
ASTER_TARGET table using the code below. Record the table size after each by
running the SQL-MR function NCLUSTER_STORAGESTAT
Dead Tuples
TRUNCATE aster_target;
SELECT * from ncluster_storagestat('aaf.aster_target'); 0
_________________
If your table has child tables created through inheritance, don't forget to include the CASCADE
option. If the table is a logically partitioned table, VACUUM automatically acts on the whole
hierarchy, unless you specify a partition using partitionname.
VACUUM ANALYZE performs a VACUUM and then an ANALYZE on the specified table or
partition. This updates the table's or partition's statistics for proper query planning. See
ANALYZE for details.
Synopsis
Run the VACUUM command on the queen, from the Aster Database Cluster Terminal (ACT).
The default Aster Database behavior requires that you pass a tablename argument:
VACUUM [ FULL ] tablename [ partition_reference ] [ CASCADE ]
When you run ANALYZE during a vacuum, you can also pass one or more columnname
arguments if you wish to update statistics for only those columns. You may also pass
an optional partition_reference argument if you wish to VACUUM a partition of a logically
partitioned table. Note that the partition reference always comes after the column list when
present. It is always okay to use a partition reference without a column list or vice versa:
Optional Aster Database behavior allows you to omit the tablename to VACUUM the whole
database. This behavior is not allowed in a default Aster Database installation; contact
Teradata Global Technical Support (GTS) if you wish to enable it. See Optional: Running
VACUUM on a Database on page 718. With this feature enabled, the following synopsis
applies in addition to the two above:
Parameters
FULL Selects "full" vacuum, which may reclaim more space, but takes much longer and
exclusively locks the table.
ANALYZE Updates statistics used by the planner to determine the most efficient way to
execute a query.
columnname The name of a column to ANALYZE. If omitted, all columns are ANALYZEd.
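A short sketch of the forms above; the schema-qualified table name aaf.aster_target is taken from the labs:

```sql
-- Reclaim dead space; takes longer and exclusively locks the table
VACUUM FULL aaf.aster_target;

-- Vacuum and refresh planner statistics in one pass
VACUUM ANALYZE aaf.aster_target;
```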
[Slide graphic: the Employee table (columns ID, Name, Visible) after we
DELETEd 3 rows. The deleted rows (e.g. 108 Su, 104 Pente) are marked
Visible = False, leaving free space; new INSERTs (e.g. 206 Al) can occupy the
newly freed space. Panels show the table after UPDATE and after DELETE.]
To find out whether a table would benefit from a VACUUM operation, you can check its dead
row percentages (as well as its uncompressed table size) using the nc_relationstats function.
You can use the function to find the number of dead rows, live rows, size of dead rows, size of
live rows, etc., so you can decide whether to run VACUUM FULL.
Note that in the case of a dimension table that is distributed by replication, the tuple count
returned reflects the total number of tuples on all workers.
Usage Recommendations
Before initiating a VACUUM FULL request on a database, use the ncli catalog
checklocks command to check for catalog locks in order to avoid conflicts with other
processes.
VACUUM generates a large amount of I/O traffic, which can slow other queries.
After adding or deleting a large number of rows, it's a good idea to issue a VACUUM
ANALYZE command for the affected table. This updates the system catalogs so that the query
planner can plan more efficient queries.
VACUUM recovered 35 MB of
dead space from DEL/UPD rows.
This space can now be used for
new INSERT rows. But notice
Table size is still >125 MB
Using ACT exclusively, log in to the BEEHIVE database and type the following:
SELECT * from ncluster_storagestat('aaf.clicks5');
The above requires re-assigning permissions to the table, since it has a new object ID.
You must also rebuild any indexes on the table. And if the table you want to DROP has
dependent views, you have to remove those dependent views first
- If the catalogs are not vacuumed regularly, Cluster system
performance will degrade over time
Vacuuming Indexes
Synopsis
Description
ANALYZE collects statistics about the contents of the specified table or partition in the database
and stores the results in internal tables. Subsequently, the query planner uses these statistics to
help determine the most efficient execution plans for queries. Note that the "partition
reference" always comes after the column list when present. It is always okay to use a partition
reference without a column list or vice versa.
You have the option of specifying one or more column names, in which case only the statistics
for those columns are collected. If your table has child tables created through inheritance,
don't forget to include the CASCADE option. If the table is a logically partitioned table,
ANALYZE automatically acts on the whole hierarchy, unless a partition is specified.
It is a good idea to run ANALYZE periodically, or just after making major changes in the
contents of a table. Accurate statistics will help the planner to choose the most appropriate
query plan, and thereby improve the speed of query processing. Also, the information
provided by the EXPLAIN command is only as current as the last running of ANALYZE.
Teradata recommends that you run ANALYZE after every batch of writes so that the statistics
are refreshed in bulk. You should run ANALYZE after any running of a CREATE TABLE AS
SELECT, INSERT, UPDATE, DELETE, or ALTER TABLE statement. A common strategy is
to run VACUUM and ANALYZE once a day during a low-usage time of day.
Unlike VACUUM FULL, the ANALYZE command requires only a read lock on the target table,
so it can run in parallel with other activity on the table.
The statistics collected by ANALYZE usually include a list of some of the most common values
in each column and a histogram showing the approximate data distribution in each column.
One or both of these may be omitted if ANALYZE deems them uninteresting (for example, in a
unique-key column, there are no common values) or if the column datatype does not support
the appropriate operators.
Best Practices:
VACUUM useful for slowly changing DIMENSION tables where
constant stream of UPDATES and INSERTS that could potentially
reuse the FREE SPACE
For tables loaded only once, no need to VACUUM
In Aster Database, the following SQL operations result in dead space being created on disk in
the data files for a given database table:
Dead space occurs when data rows are marked invisible but the space they take up is not
compacted or reused. For example, the SQL DELETE command is executed by marking all
qualifying rows as invisible. The SQL UPDATE command operates as follows:
when you update a row, the existing row is marked as invisible, and the updated row is
appended at the end of the table's data file. Dead space, such as the invisible row in this
example, is not automatically marked to be reclaimed or compacted. Reclaiming such space
requires the administrator to run specific commands, which we will discuss below.
It is important to be vigilant about dead space and proactively reuse or compact dead space.
Even though dead space does not contain live data for a table, it affects your cluster in these
ways:
Too much dead space may result in node failures: Excessive dead space may result in a full
disk on a worker node.
Too much dead space may result in slow query/commit performance: A sequential scan on
a table requires scanning through all data files for the table, including the dead space.
Cluster-wide replication at transaction commit time is performed at the file level, so dead
space also needs to be replicated over the network.
Dead tuples no longer in use (either due to a DELETE or an UPDATE) are not physically
deleted. This dead tuple space is only reclaimed after a VACUUM operation. In order to decide
if you need to perform a VACUUM operation, first determine the exact dead tuple count.
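The dead-tuple check can be run with the ncluster_storagestat SQL-MR function used in the labs (table name from the lab environment); large DEAD_TUPLE_LEN values suggest a VACUUM is worthwhile:

```sql
SELECT * FROM ncluster_storagestat('aaf.aster_target');
-- Inspect the DEAD_TUPLE_LEN column in the result
```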
There is no change in query syntax for compressed tables. For all query purposes, a
compressed table will be treated the same as a normal table. Compression is currently not
supported for temporary tables. Compressed tables are replicated in their compressed form.
Before you alter existing table compression properties (compression levels, initial
compression of a table, decompression of a table), you should ensure that there is sufficient
disk space available for the operation.
Table compression occurs in an online fashion without disruption to Aster Database. One
useful application of compression is to combine it with Aster Database's logical partitioning
feature for information lifecycle management. As you recall, logical partitioning enables
creation of a hierarchy such that a large table can have partitions, which in turn can have their
own partitions, and so on. If the child partitions are range-partitioned (e.g. monthly
partitions), compression can be used to compress the monthly child partitions over time, as
they become less frequently accessed.
When using the CREATE TABLE statement, specify LOW, since LOW-compressed tables are typically good
performers at a modest cost in CPU cycles. From ACT, to check a table's compression, execute \d <schema.tablename>
Using SQL Assistant, create 3 tables with the only change being different
Table names and Compression levels
Run the following to check the Disk space of each of the 3 compressed tables and the 1 Uncompressed table
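A sketch of that lab setup; COMPRESS LOW appears in the notes above, while MEDIUM and HIGH are assumed to be the other two levels, and the table and column names are illustrative:

```sql
-- Three copies of the same data at different compression levels
CREATE TABLE sales_low    DISTRIBUTE BY HASH(order_id) COMPRESS LOW    AS SELECT * FROM sales;
CREATE TABLE sales_medium DISTRIBUTE BY HASH(order_id) COMPRESS MEDIUM AS SELECT * FROM sales;
CREATE TABLE sales_high   DISTRIBUTE BY HASH(order_id) COMPRESS HIGH   AS SELECT * FROM sales;

-- Compare on-disk sizes against the uncompressed original
SELECT * FROM ncluster_storagestat('public.sales_low');
```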
Neither ACT (using \d <tablename>) nor TD Studio lets you see the
logical partitions for a table. Note TD Studio 14.02 has a rudimentary wizard
that attempts to show DDL, but it is not yet a complete solution
See next page for 2 solutions to view Logical Partitions of a Parent Table
Create an analytic table to hold the output of a SQL-MR function, such as sessionize,
attribution or nPath. Then use the analytic table as input to other SQL-MR functions or
SQL queries. For example, nPath is sometimes used to filter web sessions based on the
behavior of shoppers in an online store (i.e. browsers, cherry pickers, price-sensitive
shoppers, etc.). Then further analysis can be done on just the sessions that fit that behavior
profile.
Use an analytic table to hold the results of a resource-intensive JOIN operation, so further
exploration can be done on the data without having to perform the JOIN again.
Employ analytic tables for a complex multistep process for which you need the highest
performance and want to keep the end results, but not the intermediate steps. In this case,
you can do most of the processing using analytic tables, and then write to a regular
(persistent) table at the very end of that process.
This special table type was created to hold data for operations that span
several transactions, sessions, or days. Its persistence is between that
of a regular table and a temporary table
These tables are not replicated and will not survive a System restart
Should only be used for derived data and never for Source data
Analytic tables will not survive events like a soft/hard restart, node failover,
Balance Data, Balance Process, Activate, etc. If one of these occurs, it is recommended
you either TRUNCATE or DROP the table, then repopulate it
Views are read-only in Aster Database: the system will not allow an insert, update, or delete on
a view. Also, Aster Database does not provide session-level temporary views, which you may
be accustomed to using on other database platforms.
If an object is referenced by a view, and that object is renamed, then the view will continue to
reference that object using the old name. Even if a new object is created with the old name, the
view will continue to reference the original object.
schema
table
column
another view
Typically Views point to another table (or view). This makes the View a dependent object
which means you cannot DROP the table unless you first drop the dependent View which can be
a time-consuming process. The CASCADE argument can be used in the DROP TABLE
statement to drop any dependent Views.
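A minimal sketch of that dependency (the table and view names are illustrative):

```sql
CREATE VIEW v_sales AS SELECT * FROM sales;

DROP TABLE sales;          -- fails: v_sales depends on the table
DROP TABLE sales CASCADE;  -- drops the table and its dependent view
```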
In-line Lab:
create table sales_repl distribute by replication as
SQL-MR function
Mod 07
Unified Data Architecture (UDA)
and Teradata QueryGrid connectors
Before continuing, now is a good time to RESUME
the HADOOP nodes via the VMware Workstation icon
Teradata has been in the big data market longer than anyone, so we've leveraged our expertise to
tackle the "what's new" part of the big data phenomenon with five parallel engineering activities.
Beginning with our core product, the Teradata Integrated Data Warehouse, we:
1. Defined a new architecture called the Teradata Unified Data Architecture that adds a
discovery platform and a data platform to complement the Teradata Integrated Data
Warehouse. In the Teradata advocated solution, the discovery platform is Teradata Aster, while
the data platform can either be Hadoop or a Teradata Integrated Big Data Platform for large,
cost-effective storage and processing of big data.
2. Engineered new access connectors between the platforms and to external sources of big data,
and extended our hardware platform portfolio and interconnect options to provide even faster
data transfers.
3. Created a library of pre-built Teradata Aster analytic modules to speed up and simplify
discovery, all in an easy-to-use Teradata Aster SQL-MapReduce programming paradigm. We
also continue to add more technical partner analytics to the Teradata Aster and Teradata engines.
4. Extended our traditional systems management products (Viewpoint, Studio) for similar use
with Hadoop and Teradata Aster.
5. Developed new Professional and Customer Service offers to help customers quickly design
and deploy enterprise data architecture and analytics projects.
Informatica
SAS
SQLServer
Netezza
etc ..
"To deliver value from big data, customers should create an architecture that allows the
orchestration of analytic processes across parallel databases rather than federated
servers. Teradata QueryGrid is the most flexible solution with innovative software that gets the
job done," said Scott Gnau, president, Teradata Labs. "After the user selects an analytic engine
and a file system, Teradata software seamlessly orchestrates analytic processing across systems
with a single SQL query, without moving the data. In addition, Teradata allows for multiple file
systems and engines in the same workload."
"Teradata pioneered integration with Hadoop and HCatalog with Aster SQL-H to empower
customers to run advanced analytics directly on vast amounts of data stored in Hadoop," said Ari
Zilka, CTO, Hortonworks. "Now they are taking it to the next level with pushdown processing
into Hadoop, leveraging the Hive performance improvements from Hortonworks Stinger
initiative, delivering results at unprecedented speed and scale."
Teradata QueryGrid changes the rules of the game by giving users seamless, self-service access
to data and analytic processing across different systems from within a single Teradata Database
or Aster Database query. Teradata QueryGrid uses analytic engines and file systems to
concentrate their power on accessing and analyzing data without special tools or IT intervention.
It minimizes data movement and duplication by processing data where it resides.
[Slide diagram: QueryGrid connections — Teradata Database and Aster Database
each reach out to Hadoop (remote, push-down processing), Aster Database (Aster
functions such as SQL-MapReduce and graph), Teradata Database (Teradata RDBMS),
other databases, and other languages (leverage languages such as SAS, Perl,
Python, Ruby, R).]
When fully implemented, the QueryGrid will be able to intelligently use the functionality
and data of multiple heterogeneous processing engines
[Slide graphic: Hadoop (Ingest, Transform, Archive) serves roughly 5 concurrent
users, Aster roughly 25, and Teradata 100 or more.]
Hadoop serves as the data store for capturing and refining bulk data, the Aster Database acts
as the discovery platform, and the Teradata Database acts as the data warehouse
[Slide table: choosing an engine by schema stability —
Stable schema: Teradata or Teradata/Hadoop (SQL analytics)
Evolving schema: Aster or Aster/Hadoop (SQL + MapReduce analytics)
No schema (raw format): Hadoop, with Aster for MapReduce analytics]
Other options include creating an Aster view that points to the remote data store's table. The
Aster user can then perform analysis using that view.
1. Load Data — the table will be persisted in Aster Database; query or
analyze it as needed
CREATE TABLE aster_movieratings DISTRIBUTE BY HASH(userid) AS
(SELECT * FROM load_from_hcatalog (ON mr_driver SERVER ('hadoop1')
USERNAME ('huser') DBNAME ('default') TABLENAME
('hadoop_movieratings')));
2. Query Data via SQL — the table (or a subset via a WHERE clause) is
brought into Aster via an Aster view for the duration of the query
3. Analyze Data via SQL-MR — using Aster SQL-MR functions, pull in data on-
the-fly (from a table or view) for the duration of the transaction
via View: SELECT * FROM npath (ON (SELECT * FROM v_movieratings ..
via Table: SELECT * FROM npath (ON (SELECT * FROM load_from_hcatalog ..
On the Teradata side, you need connectivity for all nodes and an assigned database name for
any Teradata database(s) you will be accessing through the Connector. All the other network
configurations in this section apply to Aster Database only.
Because the Teradata Import and Export operations are executed in parallel across Aster
Database, every node in Aster Database will need to be able to access the source or target
Teradata database(s) by name using DNS. Depending on the network configuration in use,
this may require that the /etc/hosts and/or the /etc/resolv.conf files on each Aster
Database node be edited to include the necessary entries to access these gateways. It is
recommended that you manage these configurations centrally using the AMC, as described in
the Teradata Aster Big Analytics Appliance 3H Database Administrator Guide. You will make
the settings once in the AMC, and they will be copied to all Aster Database nodes
automatically.
The Teradata TPT client uses DNS to discover gateways to the Teradata database. Teradata
calls these gateways Communication Processors or cops. Each Teradata database is given a
database name, and all the cops have DNS names which use a very specific naming convention
(e.g. dbnamecop1, dbnamecop2, ..., dbnamecopn).
For example, because the Teradata Import and Export operations are
executed in parallel across an Aster Database, every node in Aster
Database will need to be able to access the source or target Teradata
database(s) by name using DNS. It is recommended that you manage
these configurations centrally using the AMC
The load_to_teradata SQL-MR function copies data from Aster Database to Teradata. The
function is invoked on Aster Database. A SELECT statement is supplied in the ON clause, to
specify the data to be loaded. The function outputs information about the data copied and any
errors.
You must first create a table in Teradata to hold the data being loaded. This table must exist, be
empty, and have a schema that's compatible with the data being exported from Aster
Database. Note that because the table must be empty, you cannot make two consecutive
load_to_teradata calls to the same target table. If you are making multiple consecutive calls to
the function, you must use a different target table for each one. After the data is loaded into
the target tables, it can be consolidated into a single Teradata table in a separate SQL operation
on Teradata.
If a datatype specified in the originating Aster Database schema does not match the datatype
in the target Teradata table, implicit datatype conversion will be performed by Teradata. For
Teradata, conversion rules are listed in the Teradata document SQL Functions, Operators,
Expressions, and Predicates which may be found at http://www.info.teradata.com/
edownload.cfm?itemid=102320046. After this table has been created, you can execute the
load_to_teradata function.
Why not just keep the data in Aster for users to query?
Because Aster is for Analytics by a limited number of Data Scientists.
You put the table on Teradata so thousands of users can access data
For larger result sets, it's a good idea to capture the output from load_from_teradata to a table
in Aster Database, to avoid the need to repeat the load if a query must be run again.
Goal: This lab demonstrates how to copy data from Teradata to Aster
Of course, to make it even easier for the Aster end-users, it is a common practice to hide the
complexities of the SQL-MR code by creating a View. Now the end-user can use common
ANSI standard SQL statements.
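A minimal sketch of such a view, reusing the load_from_teradata call and credentials shown in this module (the view name is illustrative):

```sql
-- Hide the SQL-MR complexity behind a view
CREATE VIEW v_td_source AS
  SELECT * FROM load_from_teradata
    (ON mr_driver
     tdpid ('dbc') username ('sql00') password ('sql00')
     QUERY ('SELECT * FROM sql00.teradata_source'));

-- End users then write plain ANSI SQL:
SELECT count(*) FROM v_td_source;
```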
SQL-MR code
Common arguments
TDPID('tdpid')
[USERNAME(' username ')]
[PASSWORD(' password ')]
[LOGON_MECHANISM('TD2' | 'LDAP')]
[LOGON_DATA(' mechanism-specific logon data')]
[ACCOUNT_ID(' account-id')]
[TRACE_LEVEL('trace-level ')]
[MAX_SESSIONS('max-sessions-number')]
[QUERY_TIMEOUT(' timeout_in_seconds ')]);
load_from_teradata only:
ON mr_driver
QUERY ('query')
NUM_INSTANCES ('instances-count')
PRESERVE_COLUMN_CASE ('Yes|No')
SPOOLMODE ('NoSpool | Spool')
SKIP_ERROR_RECORDS ('yes'|'no')

load_to_teradata only:
ON source_query
TARGET_TABLE ('Tablename')
ERROR_TABLES ('error table')
LOG_TABLE ('Tablename')
START_INSTANCE ('Instance')
NUM_INSTANCES ('Instance')
Teradata datatype | Aster datatype
bigint bigint
byte[(n)] bytea
byteint smallint
char[(n)] char[(n)]
date date
decimal[(s[, p])] numeric(s, p)
float double precision
integer integer
long varchar varchar
smallint smallint
time time
time with time zone time with time zone
timestamp timestamp
varbyte(n) bytea
varchar(n) varchar(n)
loaded_row_count
The loaded_row_count output indicates the total number of rows that were loaded into
the target Teradata table. Only one row will have the total row count, and the other rows
will have a value of 0. If the connector succeeded in loading rows into Teradata, but failed
to get statistics, this column will have the value of -1. Use sum(loaded_row_count) to
obtain the total number of rows loaded.
The value returned is equal to (actual number of rows returned) modulo 2^32. If the
number of rows to be loaded is expected to be greater than 2^32 (4,294,967,295) or the
row count is -1, please check the row count in the Teradata database by issuing a SELECT
COUNT(*) on the target table.
error_row_count
The error_row_count returns the number of rows in both of the Teradata error tables. To
see a total number of errors, issue sum(error_row_count).
The following code example includes a SELECT statement to access the output of the function
and provide 1) a count of rows successfully loaded, and 2) a count of rows with errors. In the
ON clause, there is a SELECT statement to indicate which rows are to be copied to Teradata.
SELECT sum(loaded_row_count), sum(error_row_count)
FROM load_to_teradata
(ON (SELECT * FROM ASTER_SOURCE)
tdpid ('dbc') username ('td01') password ('td01')
target_table ('td01.teradata_target'));
This example shows how to load data into Teradata when the number of Aster Database
virtual workers exceeds the number of AMPs in Teradata. To find out the number of AMPs in
Teradata:
$bteq
.logon dbc/UserID
password:
Select Count( distinct vproc) from dbc.AmpUsage;
*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.
Count(Distinct(Vproc))
----------------------
2
quit;
The output displays the number of Teradata AMPs. In the above example the number of
Teradata AMPs is 2.
The load_to_teradata function is invoked as many times as needed, to balance the data
transfer between the Aster Database vworkers and the Teradata AMPs. Note that as a best
practice, the functions should be run within the context of a single transaction to maintain
data integrity, as shown in the example.
Perform the following steps when the number of vworkers exceeds the number of AMPs:
1 Determine the number of load_to_teradata() calls to make, using the following formula:
CallCount = vworkerCount / AMPCount
If the result is not an integer, round up to the nearest integer.
2. In Teradata, create the required number of target tables (e.g. if CallCount is 2, then create 2
target tables with names like target_table1, target_table2)
3. Create and execute the load_to_teradata() statements, as follows:
BEGIN;
-- Example: # of v-Workers = 64, # of AMPs = 32
SELECT * FROM load_to_teradata (ON (SELECT * FROM aster_source_table)
TARGET_TABLE ('schema.target_table1') TDPID('...') USERNAME('...') PASSWORD('...')
START_INSTANCE('0') NUM_INSTANCES('64'));
Later, run: INSERT INTO target_table SELECT * FROM target_table1; INSERT INTO target_table SELECT * FROM target_table2;
The load_from_teradata SQL-MR function must be invoked on a fact table in Aster Database.
This is usually done by using a dummy table, which will be referred to as 'mr_driver', created
as follows: CREATE TABLE mr_driver (c1 INT) DISTRIBUTE BY HASH (c1)
Handling Upper Case Letters in Imported Column Names
In Aster Database, you must surround any uppercase or mixed-case name in quotation marks
(use double quotes), or it will be treated as if you had typed it in all lowercase characters.
In Teradata, no such quoting is needed. This difference can create confusion when you
retrieve tables and columns from Teradata and use them in an Aster query, because if you
specify PRESERVE_COLUMN_CASE ( ' YES ' ) the case of the Teradata table and column
names is preserved. As a result, if a retrieved table or column name contains any uppercase
characters, you must double-quote that name in your Aster Database query. To avoid
confusion, you should make a habit of enclosing in double quotes all table and column
names that you retrieve using load_from_teradata. For example:
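A hypothetical example (the table and column names are invented): with PRESERVE_COLUMN_CASE ('yes'), mixed-case names retrieved from Teradata must be double-quoted in the Aster query:

```sql
-- "EmployeeNumber" and "LastName" retain their Teradata casing,
-- so they must be double-quoted in Aster
SELECT "EmployeeNumber", "LastName"
FROM td_import
WHERE "LastName" = 'Smith';
```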
For both QueryGrid: Aster-Teradata functions, the most common problem is an
inconsistent schema mapping between the two tables
Ensure that both tables have the same structure (columns, data types, etc.)
Data inconsistencies (constraint violations, hidden characters, failure of data type
conversion) can also cause problems, so check your data when in doubt
In case of a Teradata database shutdown, a pre-existing query using the Teradata
Connector on Aster Database will continue to wait for up to four hours for a reply
from Teradata. The query will resume and run to completion upon a restart of the
Teradata database. During the wait period, the query will show a status of Running
on Aster Database
During execution of the Connector functions, an error message may be returned from
either Aster Database or Teradata. First, determine whether an error message is likely
to be based on a problem on the Aster Database side or on the Teradata side
(QueryGrid: Aster-Hadoop)
LOAD_FROM_HCATALOG
All required Hadoop packages and HCatalog jars for certified distributions are installed during
the normal Aster Database installation or upgrade. However, you do need to set up SQL-H
Configuration for each Hadoop cluster you will access.
QueryGrid: Aster-Hadoop is an
intelligent connector to Hadoop that
selectively pulls data from Hadoop
into Aster for analytics
Using ACT, you can query HCatalog directly without using the QueryGrid connectors. Here are
examples to view databases, tables and columns of a table.
Apache Hadoop is an open source platform for storing and managing big
data. Teradata Aster SQL-H is a software access method which provides a
bridge that enables users to easily analyze data stored in Hadoop through
standard ANSI SQL and Aster's SQL-MapReduce framework. The table and
storage management service for data stored in Apache Hadoop is HCatalog.
SQL-H provides deep metadata layer integration with the Apache Hadoop
HCatalog project
1. You can use Aster Database to access the HCatalog metadata directly.
For example, you can list all databases and tables in HCatalog from
Aster using ACT
2. Aster Database supports fetching data from HCatalog, and the
automatic mapping of HCatalog datatype value to the Aster datatype
value
3. You can query HCatalog from Aster Database. There is support for
partitions and Partition Pruning on HCatalog, to improve query
performance
Each Hadoop cluster you wish to access must have a SQL-H configuration. You can either use
the AMC or ncli to create a SQL-H configuration. To use the AMC:
Server: The hostname of your Hive server. Note that if the Hive server and namenode
have different hostnames, you must specify the Hive server hostname, not the
namenode hostname.
Port: The port on which the Hive server listens. This is generally port 9083.
6 You will see the entry you just made in the SQL-H Configuration list.
You must configure the QueryGrid connector within the AMC first, or you will get the following
error message when using the connector (note the message uses the older name 'SQL-H')
SELECT * FROM load_from_hcatalog
(on mr_driver
server('192.168.100.21')
dbname('default')
tablename('department')
username('hive')) limit 5;
It is a good idea to configure HOSTS in the AMC as well if you use names instead of IP addresses
Supported datatype mappings (HCatalog Datatype to Aster Datatype)
To view databases in HCatalog via ACT / To view tables in HCatalog via ACT
SELECT * FROM load_from_hcatalog
(on mr_driver
server('192.168.100.15')
dbname('default')
tablename('sales_fact')
username('hive'))
limit 5;
The WHERE clause in the SELECT statement must point to the partitioning columns that were
declared in the CREATE TABLE of the Hive table.
Hive table carpricedata on Hadoop. Notice this table is partitioned:
CREATE TABLE carpricedata (Price DOUBLE, Mileage BIGINT, Make STRING,
Model STRING, Trim STRING, Type STRING, Cylinder INT, Liter FLOAT, Doors INT,
Cruise TINYINT, Sound TINYINT, Leather TINYINT)
PARTITIONED BY (country STRING, brand STRING, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
The vH_carpricedata view hides the load_from_hcatalog call to Hadoop. An Aster user can perform ANSI
SQL queries as if the data were on Aster, or run SQL-MR code to do analysis on this data
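A sketch of how such a view might be defined. The server IP and argument values are carried over from the earlier load_from_hcatalog example and are assumptions for illustration:

```sql
-- Hide the Hadoop connector call behind an ordinary view (sketch).
CREATE VIEW vH_carpricedata AS
SELECT * FROM load_from_hcatalog (
    ON mr_driver
    SERVER('192.168.100.15')   -- Hive server from the earlier example
    DBNAME('default')
    TABLENAME('carpricedata')
    USERNAME('hive')
);
-- Users can now query it with plain ANSI SQL, for example:
-- SELECT make, model FROM vH_carpricedata WHERE country = 'USA';
```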
1. If you want total connectivity to all 3 query platforms today, you must initiate
from the Aster side (T or F)
2. This connector tells you how many good rows were loaded and how
many malformed rows were not loaded
3. The GUI Smart Loader can only pull entire contents to/from TD and
Hadoop
Objectives:
Module 8
Users, Privileges, Functions and
Security
From the VMware Workstation icon, SUSPEND the TD-box and Hadoop nodes
External Security
Teradata Wallet
Lightweight Directory Access Protocol (LDAP)
Secure Socket Layer (SSL)
Access Control
Limits on users' ability to read from and write to databases are governed as follows:
Aster Database security on database objects is managed through GRANT and REVOKE
statements.
GRANT statements grant privileges on database objects to one or more roles or individual
users.
Object-level security authorizations are stored locally in system tables on the coordinator.
Users' rights to use the AMC are also managed with GRANT and REVOKE.
Role
CREATE ROLE mktRole;
1 GRANT CONNECT on DATABASE beehive TO mktRole;
2 GRANT USAGE on aaf TO mktRole;
3 GRANT SELECT on aaf.employee TO mktRole;
New User assigned with Role
CREATE USER moe PASSWORD 'stooge' IN ROLE mktRole;
Add Role to existing User
GRANT mktRole TO curly;
Privileges assigned directly to a User
GRANT SELECT on aaf.employee TO larry;
Default users:
beehive: The user beehive (default password: beehive) owns the default database, also
called beehive. By default, the beehive user has no administrative rights.
Important! Immediately after you install Aster Database, you should change the default
password of db_superuser to one that is more secure.
db_superuser
- Has the powerful db_admin role and can access all database objects
in every way without restriction.
See CREATE USER in the Teradata Aster Big Analytics Appliance 3H SQL and Function
Reference for the list of options.
The following example demonstrates how to add a new user to Aster Database with the name
ryan in the group marketing with specified password ryan123.
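The statement might look like this. The IN GROUP clause is assumed from the PostgreSQL-style CREATE USER options that Aster Database derives from; see the SQL reference for the supported option list:

```sql
-- Create user ryan with a password, as a member of group marketing (sketch)
CREATE USER ryan PASSWORD 'ryan123' IN GROUP marketing;
```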
Database users are global across an Aster Database installation (and not per individual
database). To create user theadon with password 5t4g0l33, use the SQL command CREATE
USER:
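Using the names given above:

```sql
CREATE USER theadon PASSWORD '5t4g0l33';
```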
The user name must follow the rules for SQL identifiers: either double-quoted or without
special characters. The password must follow the Password Rules.
A User is an entity that can login into the database, can own database
objects, and can have database privileges
These password rules apply regardless of the authentication method. If you are automatically
generating passwords, ensure that only passwords that follow these rules can be generated. If
you use any tools which automatically generate passwords, you may need to modify them to
choose only Aster-supported passwords.
Local password authentication: Aster Database validates the username and password
against the user's record in its local repository on the queen (with backup on the cluster).
Passwords stored here are masked. The Aster Database user accounts are not shared with
the operating system user accounts and vice versa. This is the default. If you have activated
LDAP or AD authentication and wish to switch back to password authentication, you can
revert to the default configuration (described later in this module).
Lightweight Directory Access Protocol (LDAP) authentication: Aster Database passes the
username and password to the LDAP server for authentication. Your user accounts must
be stored in an LDAP-compatible directory server such as Active Directory or OpenLDAP.
Active Directory authentication: This uses the LDAP mechanism discussed in the
preceding bullet point.
Note that Active Directory can also be supported via LDAP support
The best practice for certain table access is to use fully qualified
schema-name.table-name queries
For unqualified queries the Schema Search Path will determine the
schema accessed. The default schema search path for all users is
simply the 'public' schema
To view your current schema search paths, from ACT type: show search_path;
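For example, using the aaf schema from this course's labs (the SET form mirrors the ALTER USER ... SET search_path used in the extra-credit lab later in this module):

```sql
-- View the current search path
show search_path;
-- Put schema aaf ahead of public for unqualified table and function names
SET search_path TO 'aaf', 'public';
```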
From TD Studio, open nc_all_users
If you are unable to drop a role, you may need to revoke privileges before dropping the role.
Any memberships in the group role are automatically revoked, but the individual members
(users or roles) are not otherwise affected. So if DROP ROLE admin occurred, user jstrummer
would no longer be a member of the group role admin since it was dropped. But jstrummer
as a user would still exist and not be affected.
If a USER or ROLE owns or has privileges to database objects, you will not be
able to drop it easily
Best Practice:
Find which objects a USER or ROLE owns or has access to in the system tables
(more on this in a few slides)
Again in the system tables, find all the roles this USER or ROLE has a
relationship with and REVOKE all of the related privileges. Also REVOKE
any access to any objects for that user
You may find it convenient to group users to ease management of privileges. That way,
privileges can be granted to, or revoked from, a group as a whole. In Aster Database this is
done by creating a role that represents the group, and then granting membership in the group
role to individual user roles.
New roles in Aster Database are defined through the CREATE ROLE command:
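For example, to create the group role used earlier in this module:

```sql
CREATE ROLE mktRole;
```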
See CREATE ROLE in the Teradata Aster Big Analytics Appliance 3H SQL and Function
Reference for a list of options.
Once the group role exists, you can add and remove members using the GRANT and
REVOKE commands:
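For example, using the role and user names from earlier in this module:

```sql
-- Add curly to the group role, then remove him again
GRANT mktRole TO curly;
REVOKE mktRole FROM curly;
```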
There isn't any real distinction between group roles and non-group roles, so you can grant
membership to other group roles, too. The only restriction is that you can't set up circular
membership loops. Member roles automatically have the privileges of the group roles to
which they belong.
You destroy a group role in the same manner with which you destroy any other role, using the
DROP ROLE command:
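For example:

```sql
DROP ROLE mktRole;
```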
If you are unable to drop a role, you may need to revoke privileges before dropping the role.
Users in Aster Database are authenticated before they can access the
various database components. A user has access to the various database
components based on the privileges they have been granted
Access Control limits on users' ability to read from and write to databases
are governed as follows:
Sometimes Roles are called 'Groups' or 'Group Roles'. All 3 terms are synonymous and used
interchangeably. For ease of use, we will use Role exclusively unless otherwise noted
catalog_admin is Aster Database's standard administrative role. This role has a minimal
set of administrative rights. The catalog_admin role has the privilege to view all system
tables. (However, the catalog_admin role does not have unrestricted access to everything
in the database. For example, the catalog administrator cannot arbitrarily modify user
tables unless explicitly granted permission to do so.) The catalog_admin role does not
provide AMC access.
amc_admin and similar roles determine what actions the user can undertake in the AMC.
The default administrative roles db_admin and catalog_admin, as well as the default
administrative user db_superuser cannot be dropped or altered in any way.
Note there are other default Roles that can be viewed from the AMC as well.
db_admin
- Members of this role are Cluster Database Administrators with
unrestricted access to everything within the database
catalog_admin
- Users with this role are privileged to view all system tables
- This role has a minimal set of administrative rights; it does not have
unrestricted access to everything in the database
PUBLIC
- All Users are automatically assigned to this Role by system
The above Roles cannot be Dropped. To view other Roles, type: SELECT *
from nc_all_roles; or use AMC
Another Default Role, process_runner can be given to all Users so they can
have access to the AMC in order to Cancel their queries
To assign privileges, use the GRANT command. In Aster Database, privileges can be granted
only at the database or table level. Note that rolename can be either a role group or a user. Here's
a subset of the GRANT syntax supported in Aster Database:
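A sketch of that subset, modeled on the PostgreSQL-style GRANT that Aster Database derives from. Treat the exact option list as an assumption and check the SQL and Function Reference:

```sql
GRANT { { SELECT | INSERT | UPDATE | DELETE } [, ...] | ALL [ PRIVILEGES ] }
    ON [ TABLE ] tablename [, ...]
    TO { username | GROUP rolename | PUBLIC } [, ...]
    [ WITH GRANT OPTION ]
```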
Assign Privileges
to Objects
When an object (such as a database, schema, or table) is created, it is assigned an owner. The
owner is the user that created the object. For most kinds of objects, only the owner can do
anything with the object initially. To allow other users and roles to use the object, privileges
must be granted. There are many different privileges, including: SELECT, INSERT, UPDATE,
DELETE, and CREATE. For more details, see the Teradata Aster Big Analytics Appliance 3H
SQL and Function Reference.
To achieve this, grant specific privileges to all the tables and views, and
use the WITH GRANT OPTION clause sparingly
If WITH GRANT OPTION is specified, the recipient of the privilege may in turn grant
it to others. Without a grant option, the recipient cannot do that. Grant options
cannot be granted to PUBLIC role
Can grant a Role to another Role (nesting Roles; can nest at least 9 levels deep).
Example: role mkt_report_writer inherits the privileges of role mkt_readonly:
GRANT mkt_readonly TO mkt_report_writer;
GRANT CREATE on a database gives the user/role the right to create new schemas in the
database. Granting CREATE on a database does not confer the right to create tables. To do
that, you must do the following:
GRANT CREATE on a schema gives the user or role the right to create new tables and
objects in the schema.
GRANT USAGE on a schema gives the user or role the right to access objects in the schema.
GRANT CREATE
- For Databases, gives the user/role the right to create new Schemas in the
database. Granting CREATE on a database does not confer the right to
create tables. To do that, you must grant the user CREATE on a schema in
the database
- For Schemas, granting CREATE on a Schema gives the user/role the right to
create new Tables and objects in the schema
GRANT CONNECT - Gives a User/Role the ability to connect to a Database.
This privilege is checked at connection startup. For new databases you create,
only you have the CONNECT privilege
GRANT USAGE - Gives a user/role the ability to access data objects stored in a
Schema. It is checked on schema access
GRANT INSTALL FILE, GRANT CREATE FUNCTION - Allows a user to upload
and install files and SQL-MR functions in the schema
GRANT EXECUTE - Allows a user to run a SQL-MR function
ALL [PRIVILEGES] - Grants all privileges at once
The keyword PUBLIC specifies that the privileges are to be granted to all
USERs, including those that might be created later. PUBLIC can be
thought of as an implicitly defined Role that always includes all USERs.
Any particular role will have the sum of privileges granted directly to it,
privileges granted to any role it is presently a member of, and privileges
granted to PUBLIC
4. Logon to another ACT session as joe (act -d beehive -U joe) and then:
SELECT * from aaf.employee; Were you successful? No
5. What command do you type (from the 1st ACT session) so Joe can do Step 4?
GRANT SELECT ON aaf.employee to joe;
Note: The special privileges of an object's owner (the right to modify or destroy the object)
are always implicit and cannot be granted or revoked.
Note: Some types of objects can be assigned to a new owner with an object-appropriate ALTER
command. A user/role can reassign ownership of an object only if she is both the current owner
of the object (or a member of the owning role) and a member of the new owning role.
You grant and revoke users' rights to functions using the commands shown below. For more
complete descriptions of these commands, see the reference documentation for GRANT and
REVOKE in the Teradata Aster Big Analytics Appliance 3H SQL and Function Reference.
To give a user or group the right to install files and create functions in Aster Database, an Aster
Database administrator must use one of GRANT commands that gives the user privileges in
the context of a schema:
GRANT { INSTALL FILE | CREATE FUNCTION } [, ...] [ PRIVILEGE ]
ON SCHEMA schemaname [, ...]
TO { username | GROUP rolename | PUBLIC } [, ...]
Note that there is no support for delegating privilege management, because the WITH
GRANT OPTION clause is not supported.
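A concrete instance of this syntax, using the schema and role names from this module's examples:

```sql
-- Let members of mktRole upload files and create SQL-MR functions in schema aaf
GRANT INSTALL FILE, CREATE FUNCTION ON SCHEMA aaf TO GROUP mktRole;
```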
To give a user or group the right to run a function in Aster Database, an Aster Database
administrator or the function's owner must use the GRANT EXECUTE command that gives
the user the privilege for the specific function:
GRANT EXECUTE [ PRIVILEGE ]
ON FUNCTION [schemaname.]funcname
TO { username | GROUP rolename | PUBLIC } [, ...]
Note that there is no support for delegating privilege management, because the WITH
GRANT OPTION clause is not supported.
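A concrete instance, using the user and function names that appear in the lab exercise in this module:

```sql
-- Allow joe to run the antiselect function installed in schema aaf
GRANT EXECUTE ON FUNCTION aaf.antiselect TO joe;
```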
REVOKE INSTALL
To deny a user or group the right to install functions and files in Aster Database, an Aster
Database administrator must use the REVOKE INSTALL command:
REVOKE [ GRANT OPTION FOR ]
INSTALL { FILE | FUNCTION } [,...] [ PRIVILEGES ]
ON SCHEMA schemaname [, ...]
FROM { [ GROUP ] rolename | PUBLIC } [, ...]
REVOKE EXECUTE
To deny a user or group the right to run a function in Aster Database, an Aster Database
administrator or the function's owner must use the REVOKE EXECUTE command:
REVOKE [ GRANT OPTION FOR ]
EXECUTE [ PRIVILEGES ]
ON FUNCTION [schemaname.]funcname
FROM{ [ GROUP ] rolename | PUBLIC } [, ...]
Once you INSTALL, the Queen distributes a copy of the file to all the
v-Workers. When you execute a SQL-MR statement, that function is
copied into the JVM along with table rows for processing
2. Open another ACT and logon as joe and attempt to run the ANTISELECT function:
SELECT * from aaf.ANTISELECT (on aaf.employee exclude ('job_code')); Fails
EXTRA CREDIT: What do you do so Joe doesn't have to qualify both the
function and the table? Hint: Need to do two things
From db_superuser ACT:
ALTER USER moe SET search_path to 'aaf', 'public';
Then have moe log back in as moe
If you want a user group in a deployment to be able to SELECT but not add or alter data or
tables in the public schema, you can revoke the default rights to the public schema and grant
limited rights as shown here. Let's assume the users we want to give read-only access to are in
a group called "ANALYSTS":
By default, all users in Aster Database have read/write access to schema public, so first we must
revoke that:
Next, we grant back the rights we want the group to have. For example, let's assume we want
them to be able to select from table1:
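The statements might look like this. The privilege lists are a sketch, and table1 is the example table named above:

```sql
-- Revoke the default read/write access to schema public
REVOKE ALL ON SCHEMA public FROM PUBLIC;
-- Grant back read-only access for the ANALYSTS group
GRANT USAGE ON SCHEMA public TO GROUP "ANALYSTS";
GRANT SELECT ON public.table1 TO GROUP "ANALYSTS";
```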
You can REVOKE the default rights to the PUBLIC schema and grant limited
rights
Assume you have a Role = Analysts and 10 users are in this Role. To setup
for read-only access to PUBLIC schema for Analysts Role:
Above will revoke all privileges from all Users in the default PUBLIC role for
the PUBLIC schema. It then GRANTS back SELECT permissions on the 2
tables within the PUBLIC schema
(these accounts will be used later for the Workload Policy labs)
1. etl
2. strategic
3. tactical
2. Teradata Wallet
Rather than using the passwords in scripts, you can use their corresponding names defined in
your wallet.
ACT
Loader Tool
Export
JDBC
ODBC
First ensure there is an Aster user named 'trm' with password 'trm'.
If not, create this user
TD Wallet Commands
Open TD Wallet, create a master password (tdwallet1) and a <name, value> pair for the
trm user account (name = trm, value = trm's password, which is trm)
In this example, we export data from the Aster Database table mytable to the file
mydata.txt as the user mjones. TD Wallet replaces $tdwallet(mjones) with the
corresponding password stored on the client TD Wallet.
$ ./ncluster_export -U mjones -w $tdwallet(mjones) -h 10.50.52.100 -d
mydb mytable mydata.txt
In this example, we connect to Aster Database as the user beehive. TD Wallet replaces
$tdwallet(beehive) with the corresponding password stored on the client TD Wallet.
$ act -d beehive -h 10.42.52.100 -U beehive -w $tdwallet(beehive)
In this query, TD Wallet replaces $tdwallet(abc) with the corresponding password stored on the
client TD Wallet.
select * from cfilter(
on (select 1)
partition by 1
database('beehive')
userid('beehive')
LDAP is specified in a series of Internet Engineering Task Force (IETF) Standard Track
publications called Request for Comments (RFCs), using the description language ASN.1. The
latest specification is Version 3, published as RFC 4511. For example, here is an LDAP search
translated into plain English: "Search in the company email directory for all people located in
Nashville whose name contains 'Jesse' that have an email address. Please return their full name,
email, title, and description."[3]
A common usage of LDAP is to provide a "single sign on" where one password for a user is
shared between many services, such as applying a company login code to web pages (so that
staff log in only once to company computers, and then are automatically logged into the
company intranet).[3]
To use LDAP authentication in Aster Database, each user must have two corresponding user
accounts: one in LDAP and one in Aster Database. The usernames must match, and the user
account in Aster Database must have connect privileges to the databases the user will use. If
the user will use the AMC, you must grant an AMC-capable role to his or her Aster Database
user account. Stated more explicitly, this means:
1 If an existing Aster Database user does not also have an account in LDAP, he or she will not
be able to log in to Aster Database after LDAP is enabled! Create user accounts in LDAP for
all Aster Database users.
2 If an LDAP user does not also have an account in Aster Database, he or she will not be able
to log in to Aster Database! Even with LDAP authentication enabled, every Aster Database
user must also have an account in Aster Database.
3 If the user's Aster Database user name does not match his or her global LDAP user name,
he or she cannot connect to Aster Database. To fix this without creating new user accounts,
create an alias in the global LDAP server that matches the Aster Database user name.
To revert the LDAP configuration to the default (authentication through Aster Database
only):
1 Perform a soft shutdown on the cluster. Log in to the queen as root and run the following
command:
2 Change the working directory to where the Aster Database configuration utility is located:
# cd /home/beehive/bin/lib/configure
#./ConfigureNCluster.py --auth_type=PASSWD
Module 9
Backup and Restore
Data from Aster Database can be backed up to two types of storage targets:
Backup Cluster: The Backup Nodes themselves can store backups using their
direct-attached storage, with hardware-level disk mirroring (e.g. RAID 1) for enhanced data
protection. Storage-heavy servers can provide high-density storage at a low cost per
terabyte, an important consideration for data volumes associated with data warehousing
applications. The Backup Cluster design follows incremental scalability principles similar
to those of the Aster Database architecture: more servers can be incrementally
provisioned to add backup capacity. Thus, backup storage costs can be managed in a
granular manner as data volumes grow. Backup files can subsequently be moved to tapes
or VTLs (virtual tape libraries) from Backup Nodes if required by IT or corporate
governance policies.
Network Storage: Aster Database Backup also provides the flexibility to store backups on
network storage (SAN/NAS) if required by an organization's IT policies. If this option is
chosen, Backup Nodes can run backup and restore processes in a massively-parallel
manner while using a networked storage array to store backup data.
These are the steps that occur when performing a backup operation:
1 The Backup Manager first contacts the queen node in Aster Database.
2 The queen coordinates the backup request with the worker nodes.
3 Worker nodes connect directly to the Backup Nodes and stream data in a massively
parallel manner to the Backup Nodes.
The archive blob(s) on each node are named automatically, using this convention:
AsterArchive_<backup_id>_Node<node_id>_<BLOBIndexNumber>_<BLOBCount>.tar
By looking at the name of any archive file, you can infer details such as which backup object
this blob belongs to, which backup node the blob was created on, the index number of this
blob, and the total number of blobs that make up the archive for this backup object.
Storing a copy of the archive off site helps in implementing a disaster recovery plan. After the
archive has been created, you may move the archive blobs to offline storage or tape. When
moving the files, ensure that the owner, group and permissions remain the same by using the
mv instead of the cp command. The permissions on the archive files should remain as:
The backups and archive files on the Aster Backup cluster can then be removed to free up disk
space.
rm -rf .metadata
1. On all Backup servers, create the UNIX account 'beehive' with the home directory
/home/beehive
2. Create UNIX 'beehive' group and add UNIX 'beehive' user to the 'beehive' group
3. Create the directory /home/beehive/data on the Backup Manager and Backup
Nodes. Give full access to 'beehive' user on the /home/beehive/data directory
4. Set up passwordless SSH for user 'beehive' and user 'root' among Backup Nodes
You can perform this installation from any Linux workstation that has Python
2.5.2 or a later version of Python 2.x installed and has access to the machines
that will form your Backup Cluster. In these instructions we will assume you
are working in a command shell on the machine that will be the Backup
Manager. After logging in as ROOT user:
1. Create the directory /home/beehive/data (this folder must remain empty prior to install)
This produces the 5 files you will need (put in /home directory):
- install_backup.py
- install_backup_node.py and install_backup_node.pyc
- backup-sw.tar.gz
- tc_backup_x86_64.tar.gz
4. Save these five files to a single directory on the Backup Manager. The
machine you place them on must have network connectivity to all Backup
Nodes
5. Log in as root, change to the directory where the installer files
are located, and run the script install_backup.py with the following options:
1. Open VMware image for Backup node. If needed login as: root/root
2. Click on Computer button in lower-left hand corner, and open up both the
Nautilus (file browser) and GNOME Terminal
3. From the File Browser, in the left-hand pane click on File System, then double-click
on the Home folder. Confirm you have the files needed to install (backup-sw.tar.gz,
install_backup_node.py, install_backup_node.pyc, tc_backup_x86_64.tar.gz)
4. Minimize the VMware Workstation icon
5. From PUTTY, login to Backup-Mgr. Use root / root to login. From prompt,
type: cd /home, then type:
To launch ncluster_backup:
1 Open a command shell on any machine that has Aster Backup installed.
2 Log in as beehive.
3 If ncluster_backup is not in your path, add it now, or change the working directory to
the executable directory. By default, the executables are installed in:
$ cd /home/beehive/bin/exec
4 Type ncluster_backup, passing the -h flag followed by the IP address or DNS name of
the Backup Manager:
$ ncluster_backup -h <mgr_IP>
For example, if you are working on the manager itself, you will type
$ ncluster_backup -h localhost
ncluster_backup command
From the Unix prompt, to start the software, type:
/home/beehive/bin/exec/ncluster_backup -h <Backup Mgr IP addr>
1 Log in to the AMC for the Aster Database cluster you want to back up.
5 Click OK.
Once the Backup Manager has been added, you will see a confirmation message in the upper
right part of the window. You won't see any entries in the Cluster Backups table until a backup
has been started.
The output shows all backup nodes along with their used/total storage capacity and current
status.
show storage
Shows all backup nodes along with their used/total storage capacity
and their current status.
After a DELETE, USE% may not immediately reflect the actual disk space used
Register/unregister node
- The Backup Manager needs to be registered with itself
Note that during the original install, the installer automatically REGISTERs any nodes defined
after the -n argument. So the commands above are for after you have installed the Backup
software and want to add/delete Backup nodes
Physical Backup (Incremental) backs up anything that has changed in the Aster Database
cluster since the last physical backup, saving space and bandwidth compared to a full
physical backup. When restored, an incremental backup does a full physical restore,
automatically using the last full physical backup and any incremental backups taken up to
the point of the incremental backup you have chosen to restore.
All types of backups are online operations, meaning your cluster remains up and can service
queries while the backup runs. All may be scheduled to occur automatically at a specified
interval.
Online
Full physical backups
Incremental physical backups
Physical restore
Restore FULL and any required INCREMENTALS
Compression at Backup node, not Worker node for both FULL and
INCREMENTAL (for cost and performance)
Backups Can
Pause/Resume
Cancel
Full Physical Backup
1 Backup command issued via CLI to Backup Manager
2 Backup Manager communicates with Queen
3 Queen returns list of v-Workers
During RESTORE, files are uncompressed on Backup nodes and then transferred to Worker nodes
(Diagram: a Full Backup stores every file from each v-Worker; an Incremental Backup
stores only changed or new files, keeping unchanged files as links back to the copies in
the Full Backup.)
During Restore, point to FULL. The INCREMENTALS are automatically applied in proper Order
Logical backup restoration is an online operation, meaning that it does not interrupt the
operation of Aster Database. Physical backup restoration, on the other hand, requires a restart
of Aster Database.
2 Type the start backup command, specifying physical full and the IP address of the
Aster Database queen:
3 Use the show backups command to check the progress of your backup:
2 Type the start backup command, specifying physical incremental and the IP
address of the Aster Database queen:
3 Use the show backups command to check the progress of your backup:
Queen
Starting a Full Physical Backup:
Can execute from PUTTY or from the VMware image via the UNIX prompt. Note you
execute commands from the Backup Mgr, but always point to the Queen in the command
2 Type the start backup command, specifying logical, the table name, and the IP
address of the Aster Database queen:
3 Use the show backups command to check the progress of your backup:
2 Type the start backup command, specifying logical, the table name, the partition
reference, and the IP address of the Aster Database queen:
3 Use the show backups command to check the progress of your backup:
The above example schedules a physical backup of the Aster Database with queen IP address
10.50.200.100. The scheduled backup will start at 6:30 AM on July 31, 2008 UTC
(Coordinated Universal Time). The repeat keyword is used to indicate if the backup activity is
recurring. In the above example, a full backup will be taken at the start time and would be
automatically taken once every two weeks (represented by 2w). An incremental backup will
be taken every other day (represented by 2d).
To schedule a physical backup without incrementals, specify repeat time for incremental
backups as 0 days:
The above example schedules a logical backup of table testTable from database TestDB on
queen 10.50.200.100 starting at 1:00 AM on July 31, 2008 (UTC). A logical backup will be
taken at that time and will be taken every day (represented by repeat 1d).
The REPEAT keyword indicates if the backup is Recurring. So a Full backup will occur
on Oct 23, 2013 starting at 2:30 AM and will automatically be taken once every week.
An Incremental will be taken every day thereafter
Here we take a FULL backup each week starting on Oct 22nd and have no Incremental
backups whatsoever
A Full backup will occur on Jan 1, 2013 starting at 2:30 AM and will
automatically be taken once every week. No incremental backups will be taken
A Logical backup will occur on Jan 2, 2013 starting at 1:00 AM and will automatically
be taken once every day
While backups can execute concurrently with queries and loads, running the backup still
consumes system resources. In some cases, you may want to pause the current backup so the
system can allocate all its resources to queries or data loading. The pause and resume
commands let you do this. Note that only physical backups (either full or incremental) can be
paused. Logical backups cannot be paused.
All of these operations require that you know the unique ID of the backup you wish to act
upon. If you do not know the backup ID, you can use the show backups command to find
the backup you wish to pause, resume, or cancel.
Pausing a Backup
Pausing a backup stops a backup that is in progress, until you resume it. Any data that has
already been transferred from the subject Aster Database to the Backup cluster will not be
transferred again. In other words, any incremental work that the backup performed prior to
being paused is preserved.
2 Type the pause backup command, specifying the ID of the backup to pause:
nCluster Backup> pause backup <backup_id>
3 Use the show backups command to check that your backup has been paused:
nCluster Backup> show backups
Resuming a Backup
Resuming a backup starts a paused backup process from where it left off.
To resume a backup that was paused:
2 Type the resume backup command, specifying the ID of the backup to resume:
nCluster Backup> resume backup <backup_id>
3 Use the show backups command to check that your backup has been resumed:
nCluster Backup> show backups
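The pause/resume semantics above can be sketched as a small state machine. This is a hypothetical model for illustration only, not the real Backup Manager: the point is that work transferred before a pause is preserved, and resume picks up where the backup left off.

```python
# Minimal sketch (illustrative names, not an Aster API) of pause/resume:
# chunks already transferred to the Backup cluster survive a pause.
class PhysicalBackup:
    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self.transferred = 0          # chunks already on the Backup cluster
        self.state = "running"

    def transfer(self, chunks):
        if self.state != "running":
            return                    # paused: no new work is done
        self.transferred = min(self.total_chunks, self.transferred + chunks)
        if self.transferred == self.total_chunks:
            self.state = "succeeded"

    def pause(self):
        if self.state == "running":
            self.state = "paused"     # transferred chunks are NOT discarded

    def resume(self):
        if self.state == "paused":
            self.state = "running"    # continues from where it left off
```

For example, a backup of 10 chunks that is paused after 4 chunks keeps those 4 chunks, ignores transfer attempts while paused, and needs only the remaining 6 after resume.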
Cancelling a Backup
2 Type the cancel backup command, specifying the ID of the backup to cancel:
nCluster Backup> cancel backup <backup_id>
3 Use the show backups command to check that your backup has been cancelled:
nCluster Backup> show backups
The storage used for a backup or archive can be reclaimed by issuing a delete backup or
delete archive command. After the delete command finishes, the data associated with
the given backup or archive will have been removed.
Before you can delete a physical backup, you must delete all its subsequent incremental
backups. That is, if there is an incremental backup with a timestamp later than the backup you
are trying to delete, your attempt to delete will fail.
2 Type the delete command, specifying the ID of the backup you want to delete:
To delete a backup:
To delete an archive:
Note that it is possible to delete an archive, but leave its corresponding backup intact and
vice versa. The two are stored and managed independently of one another.
Before you can DELETE a Full Physical backup, you must delete its subsequent Incremental
backups. That is, if there is an Incremental backup with a timestamp later than the backup
you are trying to delete, your attempt to delete will fail.
To do so, first run SHOW BACKUPS to view the backup IDs, then run the DELETE commands
in the proper timestamp order
Delete backups newest-first (3, then 2, then 1). Deleting a backup removes it from
the AMC and removes its directories and files.
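The required ordering can be sketched as a small helper. This is a hypothetical utility, not part of the Aster tooling: given the (backup_id, timestamp) pairs read from SHOW BACKUPS, it emits the DELETE commands newest-first, so every later incremental is removed before the earlier backup it depends on.

```python
# Hypothetical helper: compute a safe DELETE order from SHOW BACKUPS output.
def delete_order(backups):
    """backups: list of (backup_id, iso_timestamp) tuples."""
    # Newest timestamps first: later incrementals go before earlier backups.
    return [bid for bid, ts in sorted(backups, key=lambda b: b[1], reverse=True)]

chain = [("full-1", "2013-10-23T02:30"),
         ("incr-2", "2013-10-24T01:00"),
         ("incr-3", "2013-10-25T01:00")]
for backup_id in delete_order(chain):
    print("delete backup " + backup_id)   # incr-3, then incr-2, then full-1
```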
6 Make a note of the Restore ID that appears when the restore begins.
7 Check the status of the restore job: nCluster Backup> show restores
Once your restore shows a status of Succeeded, proceed to the next step.
This will bring up the cluster and replicate the data to restore the original replication factor.
1 Make sure your Aster Database on the target cluster (i.e. the cluster to which you wish to
apply the restore) has the same version number as the cluster from which the backup was
made. Cross-version restores are not supported.
2 Make sure your target Aster Database has the same partition count as the backup you will
restore. If the cluster is active and no administrator has performed a partition split since
the backup was taken, then the partition count is the same. You can check the partition
count as shown in Prepare for Partition Splitting on page 436.
3 Because physical restore is an offline operation, you should notify users that the system
will be unavailable during the restore operation.
4 Log in as root on the target Aster Database queen.
5 Perform a soft shutdown of Aster Database: # ncli system softshutdown
6 Clean all data off of the target Aster Database by running the following command:
# ConfigureNCluster.py --clean_data
7 Perform a soft startup of Aster Database: # ncli system softstartup
8 Activate the Aster Database cluster: # ncli system activate
9 Next, perform the restore.
Restoring from a logical backup is an online operation that can be performed while Aster
Database is running. Logical restore takes longer than logical backup because of the replication
overhead. A logical restore operation is executed as a single modifying transaction, with
replication of the data included as part of the transaction commit.
There is some overhead associated with restoration of formatting from logical backups. While
data blocks associated with a physical backup can be readily used after being transferred back
into Aster Database, data in a logical backup needs to be reloaded. The data has to be
converted back from the text format to the native format used by Aster Database, which causes
some performance overhead. This process is not executed by the queen or loader nodes. It occurs
automatically on the worker nodes after the data has been transferred.
1 Make sure your Aster Database has the same version number as the cluster from which the
backup was made. Cross-version restores are not supported.
2 The table (or partition) you are restoring must not exist in the database.
If you are restoring a table that already exists, drop it before attempting to restore.
If you are restoring a partition, you can either drop it or detach it from its parent table
before attempting to restore.
3 If the table you are restoring is a child table or partition, its parent table or partition must exist
in the database. If the parent is missing, create it before attempting to restore the child.
4 If you are restoring a partition, its parent must have the same partition format (i.e. LIST/
RANGE option and partition key) that it had at the time of the backup operation. After
the partition is restored, it will have the same structure and be attached at the same point
in the hierarchy as it was before the backup.
5 If you are restoring a partition, the parent table should have the same structure as it had
when the partition was backed up.
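The precondition checklist above can be expressed as a small preflight function. This is a hedged sketch with illustrative names, not an Aster API: it only shows the shape of the checks (version match, target absent, parent present).

```python
# Hypothetical preflight for a logical restore; names are illustrative.
def restore_preflight(cluster_version, backup_version,
                      target_exists, parent_required, parent_exists):
    errors = []
    if cluster_version != backup_version:
        errors.append("cross-version restores are not supported")
    if target_exists:
        errors.append("target table/partition exists; drop or detach it first")
    if parent_required and not parent_exists:
        errors.append("parent table/partition is missing; create it first")
    return errors   # an empty list means the restore may proceed
```

A restore of a child partition into a matching-version cluster with the parent in place returns no errors; restoring into a mismatched cluster with the target still present fails on multiple counts.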
On-line operation
Target table must not exist. DROP it if it exists
Ensure the target cluster has the same Aster version number as the one from the backup
Parent of Target should exist (if using parent/child inheritance)
Includes restore of metadata
Running
Failed since table existed
Succeeded
Workload Management
Verify priority of the Backup class in AMC
Backup nodes
Verify incoming network throughput (iperf)
Verify IO writes (atop, iostat -m -d -x 60)
Verify CPU usage (sar -u 60)
Possible Actions
Add more network cards
Enable or correct NIC bonding
Check routing
Setup dedicated backup network (Network Assignments
feature)
Change RAID level
Enable write back cache
Replace faulty disks
Optimize TCP parameters
Add Backup Nodes
Collect the following log files (from time of error) from the Backup
Manager node:
- /home/beehive/data/logs/backupExec.log
- /home/beehive/data/logs/cluster.log
Collect the following log files (from time of error) from the Queen node:
- /primary/logs/sysmanExec.log
- /primary/logs/cluster.log
- /primary/logs/queenExec.log
Many of the same tasks that you can perform from the Aster backup command
line can also be performed from the AMC
6. You can define 3 different levels of Backup Compression (Hi, Med, Lo)
Mod 10
Workload Management and
Admission Controls
Now is a good time to SUSPEND
the Backup VMware image
Report Writers
Frontline
Applications
BI Analysts
Management
Ad Hoc Users
Workload Management
Resource prioritization based on user-provided guidelines
Workload: Set of SQL statements and activities with shared
properties from the user's perspective
Main objective: predictable running times
More resources given to higher-priority Workloads at the expense
of lower-priority ones
Administrators have the option to enforce admission control by setting Admission Limits
to determine when and if tasks (transactions, jobs, or queries) are allowed to be admitted into
the system for processing. This is especially important if you have a particular type of query
upon which other transactions depend. As an example, if you have call center or point of sale
transactions that depend on other transactions, the administrator has the ability to control:
These rules instruct Aster Database to run each type of job with the right level of urgency.
Based on your rules, Aster Database assigns an initial level of importance to each job and, if
warranted, re-ranks the job while it is running. For example, your rules can ensure high
resource allocation for a newly added query of a given type but throttle down resources for
that query if it runs so long that it is suspected of being a runaway query.
It consists of 2 parts:
These rules instruct Aster to run each type of job with the right level of
urgency. Based on your rules, Aster Database assigns an initial level of
importance to each job and, if warranted, re-ranks the job while it is running
Statements are optimally executed by controlling the running of, and resources allocated to, all tasks,
including: loading, reporting, mining, applications, compression, backup, scale-out
1 2 3 4
Priority 0 (zero) indicates a job that will not be allowed to run, priority 1 a very low-importance
job, priority 2 a medium-priority job, and priority 3 a high-importance job. You
can prevent a query from starting by having it map to a priority-zero service class at the outset,
but you cannot use priority zero to stop an already-running query. Priority levels 1, 2, and 3
can be applied to a running query.
Reasons for using priority 0 (deny) might be that you wish to disallow any queries against a
particular table during certain hours when the daily sales reports are run. Or you may wish to
block certain categories of users from running queries during peak hours.
Priority: 1st-level setting that governs admission to the queue. The
priority value is an integer between 0 and 3, inclusive, that establishes,
at the coarsest level, how important a job is. A higher priority value
indicates a job of higher importance.
Priority 0 = Deny (a job that will not be allowed to run), Priority 1 = Low
importance, Priority 2 = Medium, and Priority 3 = High importance
Reasons for using Priority 0 (deny) might be that you wish to disallow
any queries against a particular table during certain hours when the
daily sales reports are run. Or you may wish to block certain categories
of users from running queries during peak hours.
Within a priority level, the weight value dictates the ratio of resource allocation. For example,
if two statements execute with the same priority, but with weight values of 80 and 20, the
system will aim to allocate resources in a 4:1 ratio, with most of the resources allocated to the
statement with higher weight. Allocation of I/O-related resources is less accurate than
allocation of CPU shares, so in this example, the CPU share ratio would be very close to 4:1,
while the disk I/O shares cannot be guaranteed as precisely.
Weight: 2nd-level setting that governs admission to the queue. After
being evaluated by Priority, statements are then ranked by Weight. Taken
together, these 2 settings map to a per-node CPU and Disk I/O share
The Weight value is an integer between 1 and 100, inclusive, that
establishes how important a job is, relative to other jobs executing with
the same priority
Within a Service class, the Weight value dictates the ratio of resource
allocation. For example, if two statements execute with the same
priority, but with weight values of 80 and 20, the system will aim to
allocate resources in a 4:1 ratio, with most of the resources allocated to
the statement with higher weight. Allocation of I/O-related resources is
less precise, so the exact ratio cannot be guaranteed
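The 80/20 example above amounts to a simple proportion. A minimal sketch, assuming only the behavior described here (within a single priority level, CPU shares track weight; disk I/O is only approximated):

```python
# Sketch of weight-proportional CPU shares within one priority level.
def cpu_shares(weights):
    """weights: {statement_name: weight (1..100)} at the same priority."""
    total = sum(weights.values())
    return {stmt: w / total for stmt, w in weights.items()}

shares = cpu_shares({"stmt_a": 80, "stmt_b": 20})
# stmt_a receives 4x the CPU share of stmt_b (0.8 vs 0.2);
# disk I/O shares only approximate this ratio, per the note above.
```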
Under extreme memory pressure, service classes using more memory than their soft limits are
considered to be over quota. In such situations, queries and other activities executing under
that service class may be canceled
Under extreme memory pressure, service classes using more memory than
their soft limits are considered to be over quota. In such situations, queries
and other activities executing under that service class may be canceled.
This will be discussed in 2 more slides
Like a soft limit, a hard limit is defined as a percentage of physical memory (RAM) on a per
physical node basis. Again because of swap, hard limits and/or their sum can be higher than
100%.
Queries in service classes with a hard limit are canceled when the service class swap usage
goes above 1 GB.
Queries in service classes without a hard limit are canceled when the service class swap
usage goes above 10 GB.
Queries without an assigned service class fall under the default service class, which is
required for WLM.
Queries in a given Service Class will use swap space when the service
class hard limit is reached, or when the node is under memory pressure.
Queries are automatically cancelled by the system when they use
substantial amounts of swap space. Here are the rules:
1. Queries in service classes with a Hard limit are canceled when the
service class swap usage goes above 1 GB
1 GB: If a HARD LIMIT is defined and the query exceeds it, the query spills
into SWAP and is cancelled if SWAP > 1 GB
10 GB: If no HARD LIMIT is defined and the query exceeds its Soft limit, the
query spills into SWAP and is cancelled if SWAP > 10 GB
10 GB: If there is no SOFT or HARD LIMIT, the query can use all the available
RAM (at that point in time) on that Worker, and if the query spills into SWAP,
it will be cancelled if SWAP > 10 GB
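The cancellation rule reduces to one threshold check. A minimal sketch of the rules as stated above (thresholds in GB; function name is illustrative):

```python
# Sketch of the swap-based cancellation rules: >1 GB of swap cancels a
# query in a service class with a hard limit, >10 GB cancels otherwise.
def should_cancel(swap_used_gb, has_hard_limit):
    threshold_gb = 1 if has_hard_limit else 10
    return swap_used_gb > threshold_gb
```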
For example, consider the simple service class configuration shown in the slide.
Despite the fact that they have the same priority and weight settings, we have separated
interactive queries from the statements issued by the administrator. If these are the only two
service classes active in the system at a given point in time (in other words, no active statement
maps to any other service class), the configuration above stipulates that each gets the same
share of resources, or 50% each. For the allocation of CPU resources this is the case even if the
administrator is issuing a single SQL statement while 99 concurrent interactive queries are
executing. Instead of receiving only 1% of the available resources, that single admin query will
in fact receive roughly the same share as all the interactive queries put together! Note that I/O
resource allocation is not as fine-grained as CPU time allocation, so Aster Database performs
this in a best-effort manner.
Similarly, if only the CEO and the administrator have active statements executing in the
system, all admin statements would collectively receive 3 times more resources than all CEO
queries put together.
If these are the only 2 ACTIVE Service Classes (ie: InteractiveQueries and
AdminStmts) in the system at a given point in time, the configuration above
stipulates that each gets the same share of resources, or 50% each
For the allocation of CPU resources this is the case even if:
InteractiveQueries: 99 active queries
AdminStmts: 1 active query
Instead of receiving only 1% of the available resources, that single Admin
query will in fact receive roughly the same share as all the Interactive queries
put together
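The 99-vs-1 arithmetic above can be sketched directly. This is an illustrative model, not the scheduler itself: with equal priority and weight, CPU is split evenly between the active service classes first, then among the queries inside each class.

```python
# Sketch: per-query CPU share when equal-weight classes split resources
# at the class level first, then per query within each class.
def per_query_share(active_queries):
    """active_queries: {service_class: number_of_active_queries}"""
    active = {c: n for c, n in active_queries.items() if n > 0}
    class_share = 1.0 / len(active)          # equal split between classes
    return {c: class_share / n for c, n in active.items()}

shares = per_query_share({"InteractiveQueries": 99, "AdminStmts": 1})
# the single admin statement gets ~50% of CPU; each interactive query ~0.5%
```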
Before any statement starts executing, it is mapped to a workload policy. Workload policies
contain a predicate attribute that specifies a boolean clause that evaluates to true for all
activities that are to be mapped to that workload. You define this predicate to evaluate the
attributes of the statement and its context. For instance, a workload containing all queries
issued by user beehive could be defined with the predicate username='beehive'.
The workload policy then specifies the service class under which the statement will be
executed. In the examples in this document we focus on SQL statements, but the mechanism
described here also applies to other activities including physical backup and restore
operations.
The Workload Policy then specifies the Service Class under which the
statement will be executed
When you install Aster Database, no workload policies are provided. Before you define any
other workload policy, you must first define the default workload policy.
Workload Policies are enforced against all User accounts, including db_superuser
User-based policies
Many customers may rely mostly on userName, roles
Time-based policies
Variable currentTime can be used to implement different policies
for business hours and during the night
Object-based policies
Using table and database names
IP-based policies
May be useful for large companies; branches, office vs. home
office
Periodic Re-evaluation
Allows the same statement/activity to be mapped to different
service classes over time
Dynamic statement reprioritization
Example: change workloads based on statement execution time
You build the workload predicate using the pre-defined WLM attributes listed in the table
below. WLM attributes are SQL-typed values that contain information about the query itself
or about the user or session that ran the query. When you write a predicate, the datatype of
your test value(s) must match the datatype of the attribute being tested.
The WLM attributes and their associated values are assigned to the query when it is planned
by the system. For example, the attribute 'userName' has its value assigned during connection
establishment and contains the username provided by the user, while the attribute 'stmtType'
contains the type of statement being executed (e.g., a SELECT or an INSERT statement).
You build the workload predicate using the pre-defined WLM attributes
listed in the table below. WLM attributes are SQL-typed values that
contain information about the query itself or about the user or session
that ran the query. When you write a predicate, the datatype of your test
value(s) must match the datatype of the attribute being tested
Allows SQL-MR
functions to be run
at different WLM
returns true for any statement executed by any user on the analyticsTeam group but only after
the statement has been running for more than thirty minutes.
The attributes you can evaluate in your predicate, listed in the following table, are also listed in
the Aster Database system table, nc_qos_workload_variables.
Like in a WHERE clause, any construct that is compatible with the attributes
type is accepted. WLM predicates may include AND and OR conjunctions,
with as much complexity as desired. For example, the predicate:
returns true for any statement executed by any user in the analyticsTeam
role, but only after the statement has been running for more than 30 minutes
It is a best practice to immediately SAVE your Workload Policy after creating
it. If there is a syntax error, you will get an Error message
Missing Parens
At evaluation time, the first match wins. In other words, when a user submits a query, the first
workload policy whose predicate matches the user's query will be used. Because of this, you
will always want the default policy to appear last in the evaluation order. Follow the steps
below to set the workload policy evaluation order:
1 In the Workload Policies tab, click on a policys row and drag it up or down.
2 Repeat for the other policies, dragging each row to the right place in the order.
At evaluation time, the Query walks the Workload Policy Tree and the
first match wins. Because of this, you will always want the Default Policy
to appear last in the evaluation order as the Catch-all Policy for queries
that match no other Policy
Simply Drag and Drop the Workload Policies to the order you wish them
Query
Re-order
Queries can match more than 1 Policy so Policy placement order is important
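The first-match-wins evaluation above can be sketched as an ordered scan over policies. Policy names and predicates here are illustrative, not the real workload tree; note the catch-all default sits last, and that VARCHAR attributes are case sensitive (as covered under Case Sensitivity above), so a lowercase 'alter' does not match the 'Alter' constant.

```python
# Sketch of first-match-wins policy evaluation against a statement context.
def map_to_service_class(policies, ctx):
    for name, predicate, service_class in policies:
        if predicate(ctx):
            return name, service_class      # first matching policy wins

# Illustrative policies; the default (catch-all) policy must come last.
policies = [
    ("adminPolicy",   lambda c: c["userName"] == "beehive", "AdminStmts"),
    ("alterPolicy",   lambda c: c["stmtType"] == "Alter",   "ddlClass"),
    ("defaultPolicy", lambda c: True,                       "defaultClass"),
]

# Lowercase 'alter' does NOT match 'Alter', so this falls to the default.
ctx = {"userName": "jsmith", "stmtType": "alter"}
```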
You cannot force re-prioritization of a running query, but you can write workload rules that
will re-prioritize running and queued jobs.
From the AMC, go to the Processes tab to confirm your Queries are picking up
the desired Workload Policy
Overlapping Workloads
It is possible for a single statement to match multiple workloads, so sometimes the mapping may
not happen as you expect (and still be correct). To ensure that mapping is working as intended,
you can use the nc_qos_active_workloads view to see the ordered list of active workloads.
Invalid Predicates
The type of each QoS context attribute defines the expressions and operators that are allowed
by the system. For instance, given that userName is of type VARCHAR, an expression such as
userName like 'daniel%' is valid and should be accepted, while something like userName
< 100 will be rejected by the system with a message as shown below.
beehive=> insert into nc_qos_workload values (10, 'newWorkload', 't', 'userName < 100',
'defaultClass');
ERROR: Predicate error: operator does not exist: character varying < integer (SQLSTATE:
2883)
beehive=>
Improper Quoting
The predicate column in the nc_qos_workload table is of type VARCHAR, so you need to
properly quote constants when inserting new entries into the table. For example, note how the
given username is quoted below:
beehive=> insert into nc_qos_workload values (10, 'newWorkload', 't',
'userName=''jsmith''', 'defaultClass');
INSERT 1
Case Sensitivity
Values of type VARCHAR are case sensitive! Keep this in mind when defining predicates that
use attributes of this type. For example, although the predicate stmtType='alter' will be
accepted as a valid predicate (that is, it's a valid expression using a VARCHAR attribute), it will
not match any ALTER ... SQL statements because Aster Database recognizes each operation
only by the Aster Database constant used to represent it, which in this case is Alter with a
capital A instead of the all-lowercase alter.
3. Overlapping Workloads
Query can match multiple Workloads, so be careful when defining
4. Invalid Predicates
userName predicate is VARCHAR, so 'userName < 100' is invalid
Admission limits are configured using either the AMC or the command line to set specific
admission limits and to set the global admission threshold or limit.
Admission limits are created using an arbitrarily ordered list of rules to apply admission limits
to a particular transaction, executable, or query. These rules define the maximum number of
queries of a specific type (those that match a specific predicate) that are allowed to run
concurrently. Every admission limit is tied directly to one predicate (which must be a valid
SQL WHERE clause) and requires each task (transaction, job, or query) to pass all predicates
and admission limit counts before being admitted into the system.
The side effect of this is that a global admission threshold or limit can be set. The ncli qos
setconcurrency <concurrency> command sets and then displays the maximum query
concurrency.
This setting can be used as a global admission threshold to hold all tasks under a certain
concurrency limit. If the number of running tasks is at or above the set nc_qos_concurrency
value, no new tasks are admitted to the system. These tasks are not denied, but rather queued.
If this global admission threshold is not reached, admission limits determine if and when a
task is admitted into the system.
Admission limits are created using an arbitrarily ordered list of rules to apply
admission limits to a particular transaction, task, job, or query. These rules
define the maximum number of queries of a specific type (those that match a
specific predicate) that are allowed to run concurrently. Every admission
limit is tied directly to one predicate (which must be a valid SQL WHERE
clause) and requires each task (transaction, job, or query) to pass all
predicates and admission limit counts before being admitted into the system
For example, when administrators want to restrict certain users from submitting queries
(during specific business hours) the limit for that user would be set to 0 and all queries for that
user would be denied.
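The two-stage check described above can be sketched as follows. This is a hedged model of the described behavior, not the QosManager implementation: the global concurrency threshold is consulted first, then every matching admission limit; a limit of 0 denies outright, while a full queue holds the task rather than rejecting it.

```python
# Sketch of admission control: global threshold first, then per-predicate
# admission limits for every limit whose predicate matches the task.
def admit(running, global_limit, matching_limits):
    """matching_limits: (current_count, max_count) per matching limit."""
    if running >= global_limit:
        return "queued"          # held under the global threshold, not denied
    for current, maximum in matching_limits:
        if maximum == 0:
            return "denied"      # e.g. a user blocked during business hours
        if current >= maximum:
            return "queued"      # limit full: wait for a slot
    return "admitted"
```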
Admission control is performed at the transaction level, but is based on the context at that
particular moment in time for that session. When a user makes a connection, it is a
continuous session until that connection is ended. Within a session, one or more transactions
may happen.
The context of all three (session, transaction, and statement) at that particular moment in
time constitutes the context against which predicates are evaluated.
A transaction that starts with a BEGIN can only be evaluated against that statement and it
implies one or more statements may follow before the transaction progresses to the COMMIT,
ABORT, or END phase. However, a single statement is still its own transaction and can be
evaluated based on attributes within the statement when it is outside of an explicit transaction.
Statements in the same transaction will all execute once the transaction is admitted; however,
multiple transactions in the same session must each pass their own admission routine.
Note! Because predicate evaluation uses the context at that particular moment in time for
that session (which includes some values from the statement) a statement of BEGIN or
END may not work as one would expect against the table name checks or SQL-MR function
names. This means that in the context of admission limits, transactions with multiple
statements may not be evaluated as expected.
1. etl
2. strategic
3. tactical
a. etl can only run from midnight to 2:00 AM as Medium, else Deny
b. tactical runs High 1st minute, then Low thereafter
c. strategic has no limitations (High)
Mod 11
DD, scripts, ncli and ganglia
Using Ganglia
This view tells you which users have USAGE permissions to the schemas.
The system view nc_user_schemas will display all of the schemas for
which the currently logged in user has the USAGE privilege
This view tells you table information such as the table owner, table type, compression type
(if any), and access privileges.
You can validate the constraint definitions for your tables (ie:
CHECK, PK) using:
This set of system views, sometimes referred to as the Stats DB, maintains
information and statistics about various activities in the database cluster. This set of
system views is accessible only to members of the roles catalog_admin and
db_admin and all are read-only
This system view contains information about current and past sessions
This table contains information about all transactions. Individual statements are implemented as
stand-alone transactions, while everything in a BEGIN ... END block forms a single explicit transaction.
This table contains information about the phases of transactions (wait for admission,
executing, worker-to-worker (shuffle) transfer)
- check_table_last_vacuum_analyze.sh
- vacuum_catalogs.sh
- check_catalogs_deadspace_v2.sh
- nc_relationstats replaces the following:
nc_tablesize
nc_tablesize_details
getTableSize
ncluster_storagestat
- gen_drop_sql.sh
At this time you must contact a Teradata Aster consultant to purge the dead space for the PG
tables.
Output
When you run this script, you can point it at an object (such as a table) and it will return all the
views that are dependent on that table. The script creates a DROP script of all the dependent
objects, which must be executed before you can DROP that object.
1. Tables
2. Views
3. Users
Open gen_drop_sql.sh and edit the following variables for your Cluster
(this has already been done for you):
#-------------------------------------------------------------------------------------
# The following variables must be set for your environment.
#-------------------------------------------------------------------------------------
ASTER_DB_HOSTNAME="10.XXX.XXX.100"
ASTER_DB_DATABASE="dbname"
ASTER_DB_LOGON="db_superuser"
ASTER_DB_PASSWORD="XXXXX" # password for account
# ASTER_DB_PASSWORD="\$tdwallet(db_superuser_passwd)" # tdwallet
ASTER_DBA_SCHEMA="dba_maint" # working DBA schema
FULL_ASTER_CLIENT_DIR="/home/beehive/clients"
DepTable
v_depTable1
v_depTable10
v_depTable2
But when you attempt to DROP DepTable, you get the following ERROR
Using Bruno's script, point to the object type (-t) that has dependencies on it
(a table in our case) and to the object name (-n), which in our case is the table
named 'aaf.deptable'
bash gen_drop_sql.sh -t table -n aaf.deptable
DepTable
v_depTable1
v_depTable10
v_depTable2
Syntax
For more information on this function, see the Aster Database User Guide.
ncli allows you to generate output (such as cluster system statistics) in a format that you can
later analyze. ncli functionality includes a way to look at node status, vworker configuration,
I/O configuration, replication status, and process management job status. Operations may be
performed on one, a group, or all of the nodes. Output may be formatted in tables for screen
viewing, piped to another UNIX command, or saved to a file.
Partial listing:
See Appendix for
NCLI examples.
Plus go to Aster
Database Guide
for more details
To run most ncli commands, you should log in as the UNIX user, beehive.
This is in contrast to the AMC (Aster Database Management Console), which is focused on
setting up, managing, and scaling out Aster Database. The AMC is used by your in-house
Aster Database administrators and DBAs.
ncli works even when the Cluster is down. So when the AMC is unavailable because the
Cluster is down, NCLI is still functional
$ ncli --help
The capabilities of ncli are divided into sections, which are groups of commands with related
functions. The following table lists the sections:
A command line interface that can perform many of the AMC's duties, plus others that
the AMC cannot do
When you set up event subscriptions, you're setting up a subscription to be notified via SNMP
or email whenever events of a particular type occur. The ncli is the only way to add and
manage subscriptions.
The commands in the events section will run against the queen, even if executed from a
worker node. The syntax to run a command in the events section looks like this example:
It provides general tools for reporting and running UNIX commands on one or
more nodes in the cluster
Shows how much free memory and free swap are available on the Queen and all
Worker nodes
The Workload Management and Admission Limits commands enable you to connect with the
QosManager to access (to set, edit, remove, or show) the statistics, settings, and rules for
concurrency, workload management, and admission limits. This allows you to query the
admission queue to show why a particular task is still queued and not yet admitted.
The following workload management and admission limit commands are available:
The syntax to run a command in the qos section looks like this example:
Note that if you run the following commands as beehive, your session will be dropped:
The tables section provides general tools for returning information about tables
On the worker that seems to be the bottleneck, you can use Linux utilities such as the 'top'
command to find the process with high CPU usage or high I/O wait times
http://192.168.100.100/ganglia
Goal
Use Data Dictionary tables
Use Ganglia
Mod 12
Explain Plans, Joins and Table
Scans
1 EXPLAIN prints columns of output that are too wide to read easily on most screens.
A quick way to view more readable output is to switch the formatting of ACT before you
run EXPLAIN. To do this, type \x at the ACT prompt. This turns on 'expanded output,'
which pivots the results of the display to show each column on a new row, providing a
cleaner view. See Formatting-related commands in ACT in the Aster Database User's
Guide.
2 Always invoke ACT with the -h <queen hostname> argument. (Use the queen's actual
hostname or IP address; never use "localhost" as the queen hostname, even when running
EXPLAIN from the queen node.) If you fail to provide the queen's hostname or IP address,
you get a very verbose explain plan. With -h, it gives you a local plan, which provides a
better picture of the real plan.
3 When you generate an EXPLAIN plan in an ACT session that is running directly on the
queen, it provides more detail than you would get in an ACT session running on your local
workstation.
Query:
EXPLAIN SELECT * from EMP;
EXPLAIN shows the estimated cost (startup cost and total cost) for each of the phases in the
query's execution plan. Phases in the plan are organized into tasks and sub-tasks called
nodes. Cost values are shown for each node. The cost shown for a parent node includes all
the costs of its children, grandchildren, and so on.
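The parent-includes-children rule can be sketched as a recursive sum. This is an illustrative reading aid, not the planner's actual cost model; the numbers mirror the Aggregate over Seq Scan example shown later in this module.

```python
# Sketch of how nested EXPLAIN costs roll up: a parent node's total cost
# is its own work plus the total costs of all of its children.
def total_cost(node):
    own_cost, children = node
    return own_cost + sum(total_cost(child) for child in children)

seq_scan  = (96.10, [])            # leaf: Seq Scan on _tmp_0
aggregate = (21.54, [seq_scan])    # Aggregate adds its own 21.54
# total_cost(aggregate) rolls up to 117.64, the top-level EXPLAIN cost
```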
EXPLAIN also estimates the size of the result set and shows the network data transfers
involved (if any), and the purpose of each. Pay close attention to the estimated result set sizes.
Misestimating these is one of the planner's most common errors.
There are two levels of EXPLAIN used with Aster Database. The main plan is the parallel
execution plan, often referred to as the queen explain plan. This is the plan that the queen
will execute to complete the query. Next are the Postgres execution plans, often referred to as
the vworker explain plans. These are the plans the vworkers will follow in order to execute
the queries that have been sent to them by the queen. You can examine the queen-level plans
yourself. If you suspect a problem in an individual worker, your Teradata support
representative can check vworker-level explain plans for possible inefficiencies in processing.
When you run a query in Aster Database, the cluster components do the following:
1 Queen parses the query and does syntax checking.
2 Queen creates the parallel execution plan.
3 Queen sends the SQL to the vworkers to process.
4 vworkers send back the results.
5 Queen assembles and returns the finished query results.
Cost-based Optimizer
Rule-based Optimizer
COST is a logical unit; a combination of CPU cost, Memory cost and Disk cost
Two slide annotations accompany this record: (1) this is the query that the Queen will
execute to get the result set back to the user — it takes the intermediate result sets
from all the v-Workers and SUMs them up; (2) the plan is executed from the bottom up,
so first Append the child partitions of the partitioned table together, then Aggregate.
-[ RECORD 7 ]---------------------+-----------------------------------------------------------------
Number                            | 7
Operation Type                    | Actual statement execution
Location                          | Queen
Statement Type                    | Query
Result Table                      |
Finish Action Type                |
Final Result?                     | Y
Transaction Id                    |
Transfer Type                     |
Partition Attribute Offsets       |
Target Table Attributes           |
Statement                         | SELECT sum( "_c5" ) AS "count(1)" FROM "_tmp_0" AS "aggregateInp_"
Replacement Parameters            |
Query Plan and Estimates          | localCost=117.63..117.64 rows=1 width=8 networkCost=0.00
  : Aggregate (cost=117.63..117.64 rows=1 width=8)
  :   -> Seq Scan on _tmp_0 "aggregateInp_" (cost=0.00..96.10 rows=8610
  :   -> Seq Scan on _bee_p576_sales_fact_may_2008 "projInp_" (cost=0.00..536.77 rows=44377 width=0)
  :   -> Seq Scan on _bee_p577_sales_fact_jun_2008 "projInp_" (cost=0.00..529.79 rows=43779 width=0)
  :   -> Seq Scan on _bee_p578_sales_fact_jul_2008 "projInp_" (cost=0.00..1079.04 rows=89304 width=0)
  :   -> Seq Scan on _bee_p579_sales_fact_aug_2008 "projInp_" (cost=0.00..1093.60 rows=90460 width=0)
  :   -> Seq Scan on _bee_p580_sales_fact_sep_2008 "projInp_" (cost=0.00..1042.34 rows=86234 width=0)
  :   -> Seq Scan on _bee_p581_sales_fact_oct_2008 "projInp_" (cost=0.00..539.38 rows=44638 width=0)
  :   -> Seq Scan on _bee_p582_sales_fact_nov_2008 "projInp_" (cost=0.00..521.40 rows=43140 width=0)
  :   -> Seq Scan on _bee_p583_sales_fact_dec_2008 "projInp_" (cost=0.00..544.97 rows=45097 width=0)
  :   -> Seq Scan on _bee_p597_sales_fact_pre_2008 "projInp_" (cost=0.00..62.80 rows=5280 width=0)
  :   -> Seq Scan on _bee_p599_sales_fact_post_2008 "projInp_" (cost=0.00..62.80 rows=5280 width=0)
  :
Data Size Distribution (in bytes) | mean size=0 standard deviation=0
Data Size Distribution (in bytes) | mean size=0 standard deviation=0
3 Operation Type: Repartition tuples (Column_name(s)) -- This indicates that rows
(tuples) are being shuffled across different workers. The column name(s) are the
column(s) on which the data is being partitioned. Here are the exact reasons for
repartitioning data:
4 Operation Type: Broadcast tuples <reason for broadcast> This indicates that
the result set of rows (tuples) is being sent to all the workers. The following are the cases
that could cause broadcast of rows:
Location: This column indicates where this phase is being executed. This can be
Queen, Workers, or AnyWorker.
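Both Repartition and Broadcast move tuples between v-workers; repartitioning routes each row by hashing its partition column so that equal values land on the same v-worker. A toy sketch of that routing (the `cksum % NWORKERS` hash and the worker count are illustrative only, not Aster's actual distribution function):

```shell
# Toy repartition: route each row to a v-worker by hashing its key column.
# The hash (cksum % NWORKERS) stands in for Aster's internal hash function.
NWORKERS=4
route() {   # route <key>  -> prints the destination v-worker number
  printf '%s' "$1" | cksum | awk -v n="$NWORKERS" '{print $1 % n}'
}
# Rows with equal keys always land on the same v-worker, which is what
# makes a distributed GROUP BY or JOIN on that column possible.
route "dept_101"
route "dept_101"    # same bucket as the previous call
route "dept_202"
```

Because the hash is deterministic, every row carrying `dept_101` arrives at the same destination, no matter which v-worker it started on.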
Active Transports
+------------+---------------------+--------------------+
| Node | SessionId | TransportId |
+------------+---------------------+--------------------+
| 10.60.11.5 | 2327674903724048181 | 387487523833891928 |
| 10.60.11.6 | 2327674903724048181 | 387487523833891928 |
| 10.60.11.7 | 2327674903724048181 | 387487523833891928 |
+------------+---------------------+--------------------+
3 rows
[Diagram: the queenExec process on the queen (queendb) connects via ICE to v-Workers
v-W1 and v-W2 on Worker-1 and v-W3 and v-W4 on Worker-2.]
ICE is always a DataTransfer operation, but not all DataTransfer operations are ICE.
Suppose you have the following two tables (one Distributed and one Replicated) and
do a JOIN between them.
An Operation of Data Transfer means a shuffle is occurring. In this case we are copying
(actually hashing) rows from one v-Worker to another v-Worker. The table being shuffled
is the CLICKS_PAGE table.
No table is being shuffled (note that only columns are shown), so the Data Transfer
(ICE) destination will be the Queen.

create table users (userid int) distribute by replication;
create table users2 (user_id int) distribute by replication;

Step 3 declares the replicated table. Step 4 shows the Queen executing the final query.
In the top example, Repartitioning in the Explain Plan means there is a shuffle going on. In this
case, it's shuffling data in order to do an aggregation. That's because the table being
aggregated is a FACT table, which means like values must be copied to the same v-Worker in order
to aggregate. So although we save space using a FACT table (as opposed to a DIMENSION
table), the shuffle must be performed, which slows performance somewhat.
In the bottom example, there is no Repartitioning because the aggregation is being done on a
DIMENSION table so there is no need to copy data since every v-Worker has a copy of the
complete table. The bad news is you lose parallelism for the aggregation since only one v-
Worker will do the task.
The above shows 2 Repartitions (shuffles), resulting in network traffic. If the SALES_FACT table
were a Replicated table instead of a Distributed table, the Repartitioning could be eliminated.
When you JOIN 2 hashed tables and the JOIN columns don't match the hash columns, one of the tables
must be shuffled (via hash). The d.dept rows will be hashed and copied to get them onto the same
v-Workers as the e.dept rows.
This table is going to be shuffled.
A Repartition is a potential network bottleneck: Workers are sending their rows to other Workers.
Materialize = true means the data is stored in a TEMP (tmp) table and then passed to the next
node.
True: we materialize the intermediate result set into a TEMP table, where it can then be
queried by the next process.
EXPLAIN select e.last_name, d.dept from employee e FULL OUTER JOIN
dept d on e.department_number = d.dept;
The Explain Plan at the vWorker level is read from the bottom up
EXPLAIN select count(*) from sales_fact_lp;
Query Plan and Estimates | localCost=9733.10..9875.15 rows=2 width=0 networkCost=0
  : Aggregate (cost=9875.14..9875.15 rows=1 width=0)
  :
  :   -> Append (cost=0.00..8182.31 rows=677131 width=0)
  :     -> Seq Scan on sales_fact "projInp_" (cost=0.00..62.80 rows=5280 width=0)
  :     -> Seq Scan on _bee_p572_sales_fact_jan_2008 "projInp_" (cost=0.00..542.08 rows=44808 width=0)
  :     -> Seq Scan on _bee_p573_sales_fact_feb_2008 "projInp_" (cost=0.00..491.52 rows=40652 width=0)
  :     -> Seq Scan on _bee_p574_sales_fact_mar_2008 "projInp_" (cost=0.00..545.02 rows=45102 width=0)
  :     -> Seq Scan on _bee_p575_sales_fact_apr_2008 "projInp_" (cost=0.00..528.00 rows=43700 width=0)
  :     -> Seq Scan on _bee_p576_sales_fact_may_2008 "projInp_" (cost=0.00..536.77 rows=44377 width=0)
  :     -> Seq Scan on _bee_p577_sales_fact_jun_2008 "projInp_" (cost=0.00..529.79 rows=43779 width=0)
  :     -> Seq Scan on _bee_p578_sales_fact_jul_2008 "projInp_" (cost=0.00..1079.04 rows=89304 width=0)
  :     -> Seq Scan on _bee_p579_sales_fact_aug_2008 "projInp_" (cost=0.00..1093.60 rows=90460 width=0)
  :     -> Seq Scan on _bee_p580_sales_fact_sep_2008 "projInp_" (cost=0.00..1042.34 rows=86234 width=0)
  :     -> Seq Scan on _bee_p581_sales_fact_oct_2008 "projInp_" (cost=0.00..539.38 rows=44638 width=0)
  :     -> Seq Scan on _bee_p582_sales_fact_nov_2008 "projInp_" (cost=0.00..521.40 rows=43140 width=0)
  :     -> Seq Scan on _bee_p583_sales_fact_dec_2008 "projInp_" (cost=0.00..544.97 rows=45097 width=0)
  :     -> Seq Scan on _bee_p597_sales_fact_pre_2008 "projInp_" (cost=0.00..62.80 rows=5280 width=0)
  :     -> Seq Scan on _bee_p599_sales_fact_post_2008 "projInp_" (cost=0.00..62.80 rows=5280 width=0)
The Append node sums the rows of its child partitions.
(A second plan was shown alongside this one on the slide; only fragments survive:
Seq Scan on sales_fact (cost=0.00..23.10 rows=1310 width=0),
Seq Scan on sales_fact_200801 (cost=0.00..806.50 rows=43950 width=0), and
Seq Scan on sales_fact_200812 (cost=0.00..812.84 rows=44284 width=0).)
Here we see the vWorker is going to perform full table (sequential) scans of
all of the logical partitions of the sales_fact table
The total cost for scanning the July 2008 logical partition is 0.00 to read the
first row, while the cost to read all rows is 1079.04. We will return 89,304
rows (based on ANALYZE). Width = 0 tells us that we are not returning any
columns. (Width = # of bytes returned)
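A quick sanity check on the plan above: the Append node's row estimate is exactly the sum of its children's estimates. The fifteen Seq Scan estimates total 677,131, matching the Append's rows=677131. A small awk check, with the row counts copied from the plan output:

```shell
# Sum the per-partition row estimates from the EXPLAIN output above and
# confirm they equal the Append node's estimate (rows=677131).
total=$(printf '%s\n' 5280 44808 40652 45102 43700 44377 43779 \
                      89304 90460 86234 44638 43140 45097 5280 5280 |
        awk '{ s += $1 } END { print s }')
echo "sum of child estimates: $total"   # 677131
```

If the children do not sum to the parent's estimate, you are probably reading two different plans (or a truncated one).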
The cost of getting the first row is 0. The cost to read all the rows is 34.00. The number of rows
returned will be 9600. The average width of the row is 4, and the network cost to transfer these
rows from the workers to the queen is 37.
From the second line onward, the output is from the node where the query is going to be
executed. In case the query executes at multiple workers, output from the slowest worker is
displayed to the user. One can see which low-level algorithms will be applied at the nodes. In
phase 1 for example, the slowest worker will perform a sequential scan on explain_t1 to
satisfy the query.
Data Size Distribution (in bytes): This column gives an estimate on the mean and standard
deviation of data coming from worker nodes. This is only applicable for queries that require
transfer of data from either the worker nodes to the queen node or amongst the workers.
The first line of the output displays the cost summary of the entire query on
the vWorker. Here is how to interpret the query plan and estimates in the
EXPLAIN output from the vWorkers:
- The cost of getting the first row is 9581.10.
- The cost to read all the rows is 9723.15. (There are costs for Join, Sort,
  Append, Aggregate, etc.)
- The number of rows returned will be 2 (the # of vWorkers).
- The average width of the row is 0 (bytes).
- The network cost to transfer these rows from the Workers to the Queen is 0.
The basic unit of cost is a decimal value where one unit represents the cost
of fetching 8 KB of sequential data from disk. The cost of transferring 1
KB of network data is also set to 1 unit.
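The per-scan numbers decompose the same way Postgres prices a sequential scan: total cost = (pages × seq_page_cost) + (rows × cpu_tuple_cost), with defaults seq_page_cost = 1.0 and cpu_tuple_cost = 0.01. Taking the July 2008 partition above (total cost 1079.04, 89,304 rows), the page component comes out to a whole number of pages, which is a handy way to confirm you are reading the numbers correctly:

```shell
# Decompose a Seq Scan cost into its page and per-tuple components:
#   total = pages*seq_page_cost + rows*cpu_tuple_cost  (defaults 1.0, 0.01)
pages=$(awk 'BEGIN { total = 1079.04; rows = 89304
                     printf "%.2f\n", total - rows * 0.01 }')
echo "estimated pages: $pages"   # 186.00 -> 186 pages of 8 KB each
```

Try the same arithmetic on the other partitions in the plan; each should also yield an integral page count.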
The statement that is executed on the Queen is shown. This is the query
that will be executed to return results to the User
Take a look at the examples when executing the same query against a non-LP table and an LP
table. Performance improves (as denoted by a lower cost) where Partition Pruning occurs. But when
there is no WHERE clause, the non-LP table is the better performer.
Here is what to look for when examining the Explain plan output:
For Scan type of operations, there are only 3 possible operation types by
which a table can be accessed:
1 Sequential scan Performs full table scan, visits all data blocks
2 Index scan (only if indices are available AND the query specifies
the column being indexed AND the optimizer thinks that the
query's filtering on the column will select out < 10-20% of the
base fact table) (note: again, what matters is what the Optimizer
thinks, and so ANALYZE ANALYZE ANALYZE)
3 Bitmap Index scan (if multiple indices are available AND the query
filters on columns on which these indices have been constructed
AND the optimizer thinks that the query's filtering on these
columns will select out <20% of the base fact table)
Most queries in Aster will be Sequential scans. All data blocks are read once.
- Fast to startup
- Sequential I/O is much faster than random access if more than 80% of the
records are fetched from the table
- Only has to read each data block once
- Produces Unordered output
begin;
set enable_seqscan to 'on';      -- hints to force the Optimizer
set enable_indexscan to 'off';   -- to do our bidding
set enable_bitmapscan to 'off';
explain select customer_id, product_id from sales_fact where
customer_id = 467 and product_id = 42;
end;
- Since I was only looking for a single row from the table, there might be a
better way to retrieve this row than a Full Table Scan. See next page
Index scans can be very fast. If the query has the indexed column in its WHERE clause and
Aster Database estimates that the predicate will select less than 20% of the rows in the table
(based on Aster Databases table statistics) then Aster Database is likely to choose an index
scan rather than a sequential scan.
begin;
set enable_seqscan to 'off';
set enable_indexscan to 'on';
set enable_bitmapscan to 'off';
explain select customer_id, product_id from sales_fact where
customer_id = 467 and product_id = 42;
end;
begin;
set enable_seqscan to 'off';
set enable_indexscan to 'off';
set enable_bitmapscan to 'on';
explain select customer_id, product_id from sales_fact where
customer_id = 467 and product_id = 42;
end;
1- HASH JOIN
2- MERGE JOIN
A Hash join is typically faster than a merge join, given enough memory. This is the join
method most typically observed. The Postgres optimizer picks the smaller table for
constructing the hash table by looking at the unique distribution of values in the smaller table.
If the table hasn't been analyzed, then the Postgres optimizer could mistakenly assume that the
hash table will fit in memory when, in fact, the table could require more memory than the
available RAM. In such cases, there will be Swap activity. That is the surest sign
that the estimates are wrong.
EXPLAIN select e.last_name, d.dept from employee e INNER JOIN dept d on e.department_number = d.dept;
Hash Joins are efficient because they are single-pass, whereas sorting in a
Merge join may be multi-pass.
Works great when the joining column of the smaller of the two tables will
fit into a Hash table in work_mem amount of memory.
It might be that the Optimizer thinks it will fit but, because ANALYZE has not
been run, the Hash table actually doesn't fit; in that case the Hash table
spills to virtual memory (the swap file) and performance will deteriorate.
A hash join tends to be the more efficient join type to use when the joining column of the
smaller of the two tables will fit into a hash table in the allocated (work_mem) amount of
memory. A hash join is efficient because it takes place in a single pass, whereas sorting in a
merge join may require more than one pass.
The optimizer picks the smaller table for constructing the hash table by looking at the
unique distribution of values in the smaller table.
It's important that table statistics are up to date. Make sure you run ANALYZE after any
significant change to a table's contents. Otherwise, the optimizer may mistakenly guess
that a column's contents will fit into an in-memory hash table when, in fact, they will not.
In such cases the hash table spills to virtual memory, and performance is likely to be poor.
If this happens, you will see disk swap activity on the worker node. That is a sign the plan's
estimates might be wrong.
- The Optimizer picks the smaller table for constructing the Hash table by
looking at the unique distribution of values in the smaller table
- If the table hasn't been analyzed, then the Optimizer could mistakenly
assume that the Hash table will fit in memory when, in fact, the table
could require more memory than the available RAM. In such cases,
there will be Swap activity happening. That is the surest sign that the
estimates are wrong (e.g., check with vmstat 5 5)
EXPLAIN select e.last_name, d.dept from employee e inner join dept d on e.department_number = d.dept;
In this case, DEPT is the Inner table and will be the HASH table
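The build-then-probe pattern behind a hash join can be sketched in awk, which makes it easy to see why the smaller (inner) table becomes the hash table: it is held in memory once, and the larger table streams past it in a single pass. Table contents here are invented for illustration:

```shell
# Hash join sketch: build an in-memory hash on the smaller table (dept),
# then stream the larger table (employee) past it, probing row by row.
# The CSV contents are made-up sample data.
cat > /tmp/dept.csv <<'EOF'
401,Marketing
402,Engineering
EOF
cat > /tmp/emp.csv <<'EOF'
Smith,401
Jones,402
Brown,401
EOF
awk -F, '
  NR == FNR  { hash[$1] = $2; next }        # build phase: hash the inner table
  $2 in hash { print $1 "," hash[$2] }      # probe phase: one pass over outer
' /tmp/dept.csv /tmp/emp.csv
```

The build side is touched once and the probe side once, which is the single-pass property the notes above credit for the hash join's speed.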
A Merge join is the most scalable and widely usable method of joining. That also typically
makes it the slowest.
The Merge join is implemented by sorting both the tables on the columns being joined and
then streaming the top few rows from each table to do the join. This makes for a very low
memory utilization during the join operation. But the sorting step requires memory, and
again misestimation causes problems. If Postgres thinks a table sorting will fit in memory,
then it will use a quicksort algorithm. Else, it will use an external disk-based sorting algorithm
(using at most work_mem amount of memory at any point during the sort).
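The sort-then-stream pattern described above is exactly what the coreutils `join` command implements, so it makes a convenient toy illustration: both inputs must be sorted on the join column, after which `join` streams the top rows of each. Table contents are invented:

```shell
# Merge join sketch with coreutils: sort both inputs on the join key,
# then `join` streams matching rows from the top of each sorted file.
printf '402,Jones\n401,Smith\n'            | sort -t, -k1,1 > /tmp/emp_sorted.csv
printf '401,Marketing\n402,Engineering\n'  | sort -t, -k1,1 > /tmp/dept_sorted.csv
join -t, /tmp/emp_sorted.csv /tmp/dept_sorted.csv
# -> 401,Smith,Marketing
#    402,Jones,Engineering
```

Note that the expensive part is the two sorts; the merge itself only ever holds a few rows from each side in memory, which is the low-memory property the notes describe.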
Merge Join is the most scalable and widely usable way of joining. That also
typically makes it the slowest
The Merge join is implemented by Sorting both the tables on the columns
being joined and then streaming the top few rows from each table to do the
join
It is the only practical way to do Joins on large tables that do not fit in memory.
Works great for all cases where the tables are large and the Optimizer
thinks they cannot fit in work_mem amount of memory.
begin;
set enable_hashjoin to 'off';
EXPLAIN select p1.userid, p2.page from page_view_fact p1 inner join
page_view_fact p2 on p1.refdomain = p2.refdomain;
end;
Nested loop joins are the fastest but can't always be used. Since the default statistics for a table
that has not been analyzed tend to produce table size estimates that are too small, and since
nested loop joins are the joins of choice for smaller tables, the Postgres planner, by default,
tends to pick nested loops. For this reason, all Aster Database deployments ship with the
enable_nestedloop parameter set to off. Turn it on with care, and only after you have done
an EXPLAIN ANALYZE directly on the worker Postgres instances! (Aster Database does not
support EXPLAIN ANALYZE, but Postgres does, so you can run it by connecting directly to a
Postgres instance.)
Nested loop joins are very useful for performing a star-schema join between a large fact table
and a very small dimension table (either the dimension table is itself small or the dimension
table is being filtered down to a small set of rows). In such cases, consider creating an index on
the large fact table on the column being joined with the very small dimension table. Turn on
nested loops, and then check to see that the Postgres optimizer is scanning the very small
dimension table first and then using its joining column values as probes into the large fact
table using the index.
Nested Loops Join can be fastest but only where it makes sense
Since the default statistics for a table that has NOT been analyzed produce
very small table size estimates, the Optimizer by default has a tendency to
pick nested loops. That's why all Cluster deployments today ship with
enable_nestedloop set to off
Works great when the Optimizer thinks both the tables are really small OR
One table is much, much smaller than the other AND
the larger table has an Index on the joining column
Nested loop joins come in really handy when you are doing a star-schema
join between a large Fact table and very small Dimension table
In such cases, consider creating an Index on the large Fact table on the
column being joined with the very small Dimension table
Turn on nested loops, and then check to see that the Optimizer is scanning
the very small dimension table first and then using its joining column values
as it probes the large fact table using the Index
begin;
set enable_mergejoin to 'off';
set enable_hashjoin to 'off';
set enable_nestloop to 'on';
EXPLAIN select p1.userid, p2.page from page_view_fact p1 inner join
page_view_fact p2 on p1.refdomain = p2.refdomain;
end;
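The probe pattern the nested loop uses can be sketched directly in shell: for every row of the outer table, scan the inner table looking for matches. Without an index that is O(outer × inner) work, which is why it only pays off when one side is tiny. Data invented for illustration:

```shell
# Nested loop join sketch: for every row of the outer table, scan every
# row of the inner table (O(outer x inner) without an index).
cat > /tmp/outer.csv <<'EOF'
Smith,401
Jones,402
EOF
cat > /tmp/inner.csv <<'EOF'
401,Marketing
402,Engineering
EOF
result=$(
  while IFS=, read -r name dept; do          # outer loop: employee rows
    while IFS=, read -r d dname; do          # inner loop: dept rows
      [ "$dept" = "$d" ] && echo "$name,$dname"
    done < /tmp/inner.csv
  done < /tmp/outer.csv
)
echo "$result"
```

With an index on the inner table, the inner loop collapses to a lookup per outer row, which is the fast star-schema case the notes describe.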
- Slowest Join in theory
- But fast to produce the first record
- In practice, it's usually desirable for OLTP queries
- Performs poorly if the second child is slow
- Only Join capable of executing a CROSS JOIN
- Only Join capable of non-equi JOIN conditions
- Used when the HASH table can't fit in memory
For all Join operations, the most important cluster parameter to think
about is amount of physical memory in the machine (and the number of
v-Workers in that physical worker)
Mod 13
Bottlenecks and Tuning
Proactive
Reactive
1. Viewpoint
2. AMC
3. NCLI
4. Linux utilities
Bottleneck
Bottleneck
Bottleneck
Measuring your system is the first step in determining where the bottleneck exists.
Several bottlenecks may exist simultaneously.
If you need to move big data, make it small first, and then move small data.
Prepare the data model in advance to ensure that queries touch the least amount of data.
Prepare your queries such that each computation is done exactly once, and never again.
Hardware resources
Software
SQL
CPU bottlenecks can be viewed via: AMC, Ganglia, and Linux utilities.
Typical symptoms:
All CPUs are 100% busy
One CPU is constantly 100% busy but others are idle
[CPU utilization gauge: >90% / >70% / Normal]
https://192.168.100.100/ganglia
It can display system summary information as well as a list of tasks currently being
managed by the Linux kernel.
The types of system summary information shown and the types, order and size of
information displayed for tasks are all user configurable and that configuration can be
made persistent across restarts.
Run 'top' from the Linux command line. Press 'h' at any time to toggle online help.
The 'top' command displays a variety of information about the processor state. The display is
updated every five seconds by default, but you can change that with the 'd' command-line
option or the interactive command, 's'.
When you run top, it displays the following information in the console:
Up (uptime) This line displays the time the system has been up, and the three load averages
for the system. The load averages are the average number of processes ready to run during the
last 1, 5, and 15 minutes. This line is just like the output of the 'uptime' command. The uptime
display may be toggled by the interactive command, 'l'.
Tasks / processes Shows the total number of processes running at the time of the last update.
This is also broken down into the number of tasks which are running, sleeping, stopped, or
undead. The processes and states display may be toggled by the interactive command, 't'.
Cpu(s) Shows the percentage of CPU time in user mode, system mode, 'niced' tasks, iowait
and idle. ('Niced' tasks are only those whose nice value is positive.) Time spent in niced tasks
will also be counted in system and user time, so the total will be more than 100%. The processes
and states display may be toggled by the interactive command, 't'.
Mem Statistics on memory usage, including total available memory, free memory, used
memory, shared memory, and memory used for buffers. The display of memory information may
be toggled by the interactive command, 'm'.
Swap Statistics on swap space, including total swap space, available swap space, and used
swap space. The contents of this field and the Mem field are just like the output of the 'free'
command.
top -07:01:51 up 5 days, 12:17, 1 user, load average: 0.38, 0.29, 0.16
Tasks: 178 total, 2 running, 176 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2% us, 0.2% sy, 0.0% ni, 99.3% id, 0.0% wa, 0.0% hi, 0.3% si
Mem: 6046832k total, 4075068k used, 1971764k free, 156340k buffers
Swap: 11807348k total, 0k used, 11807348k free, 3510632k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22485 beehive 20 0 654m 13m 11m S 74 0.2 0:04.58 postgres
22486 beehive 20 0 654m 13m 11m R 54 0.2 0:03.72 postgres
22495 beehive 20 0 78360 11m 1656 S 23 0.2 0:00.62 IWTServerExec
5968 beehive 20 0 31900 1404 572 S 4 0.0 0:20.92 postgres
22497 beehive 20 0 75288 8772 1188 S 4 0.1 0:00.02 IWTServerExec
22509 beehive 20 0 10668 1296 892 R 4 0.0 0:00.02 top
5293 beehive 20 0 268m 24m 1628 S 2 0.4 142:59.20 python
1 root 20 0 2612 580 492 S 0 0.0 0:04.98 init
On the worker node, look for the Postgres processes. There should be one for each vworker on
that worker node. When you view the 'top' output during query execution, seeing fewer
running Postgres processes than you have vworkers on the node may indicate processing
skew: some vworkers have completed processing the query before the others.
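Counting the Postgres backends against the node's v-worker count is easy to script. A sketch using process-listing lines pasted from the sample top output above (on a live node you would pipe `ps` or `top -b -n 1` instead; the v-worker count of 3 is a hypothetical value for this node):

```shell
# Count postgres processes in a process listing and compare to the
# expected v-worker count. Sample lines pasted from the top output above.
cat > /tmp/top_sample.txt <<'EOF'
22485 beehive 20 0 654m 13m 11m S 74 0.2 0:04.58 postgres
22486 beehive 20 0 654m 13m 11m R 54 0.2 0:03.72 postgres
22495 beehive 20 0 78360 11m 1656 S 23 0.2 0:00.62 IWTServerExec
5968 beehive 20 0 31900 1404 572 S 4 0.0 0:20.92 postgres
EOF
VWORKERS=3   # hypothetical v-worker count for this node
running=$(awk '$NF == "postgres" { n++ } END { print n+0 }' /tmp/top_sample.txt)
echo "postgres processes: $running (expected $VWORKERS)"
```

A count persistently below the v-worker total during a long query is the skew signal described above.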
Taken together, the Postgres processes should consume almost all available CPU. A properly
configured Aster Database should be CPU bound when processing its typical large queries,
rather than I/O bound or network bound.
Look for the IceServer process on current versions of Aster Database, or the IWTServerExec
process on pre-4.5 versions of Aster Database. This is the process that shuffles data between
vworkers.
Look for high amounts of swap activity. The 'Swap' field appears near the top of your console:
Mem: 6046832k total, 6006724k used, 40108k free, 786540k buffers
Swap: 12096936k total, 228k used, 12096708k free, 3591048k cached
While running 'top', pressing the '1' key will toggle the per-processor CPU usage display.
top -08:10:29 up 5 days, 13:26, 1 user, load average: 0.00, 0.02, 0.00
Tasks: 167 total, 1 running, 166 sleeping, 0 stopped, 0 zombie
Cpu0: 0.3% us, 1.0% sy, 0.0% ni, 95.7% id, 0.0% wa, 0.0% hi, 3.0% si
Cpu1: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu3: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 6046832k total, 4066752k used, 1980080k free, 156516k buffers
Swap: 11807348k total, 0k used, 11807348k free, 3518552k cached
Look at the wait (wa) percentage. Long wait times can indicate swapping. One CPU running
at a higher use percent ('us') than the rest can indicate processing skew.
On the Worker node, look for the Postgres processes. There should be one for
each v-Worker on that worker node. If you see:
Fewer running Postgres processes than you have v-workers on the node,
this may indicate processing skew
Taken together, the Postgres processes should consume almost all available
CPU. A properly configured Database should be CPU bound when processing
its large queries, rather than I/O bound or Network bound
vmstat syntax
vmstat -a -n 5 4
-a = Active/Inactive Memory switch
-n = Display the header 1 time
5 = Delay between system output
4 = Number of iterations
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free inact active si so bi bo in cs us sy id wa st
0 0 368160 84528 403120 346772 3 9 101 150 159 343 1 1 95 2 0
0 0 368160 84288 403148 346816 0 0 0 258 95 136 0 1 96 3 0
0 0 368160 83164 403156 348264 0 0 0 0 75 155 1 0 99 0 0
0 0 368160 82660 403156 348264 0 0 0 4 62 139 0 0 99 0 0
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
The UNIX vmstat command is useful for finding out how busy a worker node is.
Specifically, vmstat prints statistics showing memory usage, disk paging, I/O wait
times, and CPU activity
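Sustained nonzero si/so columns are the swapping signal to watch for. A small awk filter over vmstat output flags them; the sample rows are pasted from the output above, and the first vmstat row is skipped because it reports averages since boot:

```shell
# Flag vmstat samples that show swap activity (nonzero si or so).
# With this output layout, si is field 7 and so is field 8.
cat > /tmp/vmstat_sample.txt <<'EOF'
0 0 368160 84528 403120 346772 3 9 101 150 159 343 1 1 95 2 0
0 0 368160 84288 403148 346816 0 0 0 258 95 136 0 1 96 3 0
0 0 368160 83164 403156 348264 0 0 0 0 75 155 1 0 99 0 0
EOF
swapping=$(awk 'NR > 1 && ($7 > 0 || $8 > 0) { n++ } END { print n+0 }' \
           /tmp/vmstat_sample.txt)
echo "samples (after the since-boot row) with swap activity: $swapping"
```

Here the post-boot samples show no swapping; on a memory-starved worker you would see the count climb with every interval.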
[Two vmstat captures were shown: one at rest and one under load.]
For example, take three big tables T1, T2, T3, each 100GB in size.
Sequential: one user runs the following:
BEGIN; Select * from T1; Select * from T2; Select * from T3; END;
This is an all-or-nothing query: one error causes a ROLLBACK.
It is highly possible for one v-Worker to use only one CPU
(this assumes all 3 tables are REPL tables).
Ensure you have 100-GB bandwidth and high-end switches that can handle the increased bandwidth.
If possible, ensure only query traffic is on the link: configure Virtual LANs (VLANs) so that only
query traffic is allowed on the link.
It is also possible to enable NIC bonding. See the Aster Database User Guide for more details.
Potential solution: NIC bonding ties 2 NICs together for double the
throughput, assuming the network wire is not over-consumed.
https://192.168.100.100/ganglia
Typical symptom:
Utilization (%util) shown by iostat is high (> 90%)
https://192.168.100.100/ganglia
Table skew
Join skew
GROUP BY (aggregate) skew
For example, if your statement has GROUP BY city, gender, then only two v-Workers would
hold the two genders. If you instead did GROUP BY gender, city, then you have a much
better chance of distributing the rows across more v-Workers.
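Whatever column set the rows end up being redistributed on, its number of distinct values caps how many v-workers can participate. A toy demonstration (the `cksum % NWORKERS` hash is illustrative only, not Aster's real distribution function):

```shell
# Toy demo: hashing on a 2-value column (gender) can use at most 2 of the
# available buckets, so most v-workers sit idle during the aggregation.
# A compound key with more distinct values can spread across more buckets.
NWORKERS=8
buckets_used() {   # buckets_used <values...> -> distinct buckets hit
  for v in "$@"; do
    printf '%s' "$v" | cksum | awk -v n="$NWORKERS" '{print $1 % n}'
  done | sort -u | awk 'END { print NR }'
}
echo "gender alone:         $(buckets_used M F) bucket(s) used"
echo "gender+city compound: $(buckets_used M,NYC F,NYC M,LA F,LA) bucket(s) used"
```

No matter how many v-workers the cluster has, the two-valued key can never engage more than two of them.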
Start from the Cluster EXPLAIN Plan step whose disk characteristics you want to
understand. Typically this is the first phase, which scans the base fact tables that are
being queried.
The Linux du command is used to look for data skew on a worker node. Its syntax is:
Data on the Aster Database workers is stored in directories with names like /primary/w5z
(vworker number 5), /primary/w12z (vworker number 12), and so on. Collectively, we refer
to these directories as the "w*z" directories. We can examine these directories to check for data
skew. For example, to show space usage for all the virtual workers on a node, you type:
du -sh /primary/w*z
Note that vworker number 38 has 60 MB more data than the lowest vworker!
Skew on one v-Worker can cause disk I/O bottlenecks as the v-Worker
on that Worker node must process more data compared to others
See left-hand page for how to use Linux du and SQL-MR function nc_relationstats to find skew
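The du check can be scripted to report the spread between the largest and smallest v-worker directly. A sketch using pasted `du -sk`-style sample sizes (in KB, invented to match the roughly 60 MB spread noted above); on a live node you would feed it `du -sk /primary/w*z`:

```shell
# Detect data skew across v-worker directories: report min, max, and the
# spread. The sample sizes below are illustrative stand-ins for `du -sk`.
cat > /tmp/du_sample.txt <<'EOF'
102400 /primary/w5z
104448 /primary/w12z
163840 /primary/w38z
EOF
report=$(awk '
  NR == 1 || $1 < min { min = $1 }
  $1 > max            { max = $1; big = $2 }
  END { printf "min=%dKB max=%dKB spread=%dKB (largest: %s)\n", min, max, max - min, big }
' /tmp/du_sample.txt)
echo "$report"
```

A large spread (here about 60 MB) flags the v-worker that will finish last on every scan of that data.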
How does the Optimizer estimate how much data a filtering condition would
filter out?
- Given a condition like 'WHERE a>10', how does the Optimizer estimate
that <10% of the base Fact data set is selected and so it is time to use
the Index on a?
WHERE clause to reduce the size of the Intermediate Result set before further processing
Logically Partitioned tables to reduce I/O on certain queries
Columnar tables when warranted
Datablock size = 32 KB. So if you can fit more rows into a datablock, you
naturally read fewer datablocks.
If the /primary file system is filling up there are several possible reasons.
We need to check them one by one:
The ncli node runonall command may be used to run any executable on multiple nodes.
It can also be used to run a command from a file. The executable must exist on all nodes prior
to the command being run. For some commands (like df), the command already exists on all
nodes. If a user-written script is being executed, then it must be copied to all nodes using ncli
node clonefile or a similar mechanism. This effectively allows you to run commands in
parallel over SSH on the cluster.
You can pass the following runtime parameter names as the name clause of a SET statement.
For each, we list the set or range of allowed values. When assigning a value, enclose the value
in single quotes, unless otherwise specified.
enable_bitmapscan
Enable or disable the Local Planner's use of bitmap-scan plan types. Default value is 'on'.
enable_seqscan
Enable or disable Local Planner's use of sequential-scan plan types. Default value is 'on'.
random_page_cost
Sets the Local Planner's estimate of the cost of a disk page fetched non-sequentially from
disk. The default value is 40. Reducing this value relative to 'seq_page_cost' will cause the local
planner to prefer index scans over sequential scans. Increasing this value will lead to sequential
scans being preferred over index scans.
effective_cache_size
Sets the Local Planner's assumption about the effective size of the disk cache that is available to
a single query. This is factored into estimates of the costs of specific plans, i.e. whether to use an
index or not. A higher value makes it more likely to use an index scan, whereas a lower value
makes it more likely that sequential scans will be used. The default value is 2GB.
Tip: you would rarely turn off 'seqscan', and you would
rarely tune 'random_page_cost' or 'effective_cache_size'
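As a sketch of the SET syntax and quoting convention described above (values in single quotes; shown only to illustrate the statement form, since turning off seqscan is rarely advisable):

```sql
-- Discourage sequential scans for the current session
SET enable_seqscan = 'off';

-- Restore the default
SET enable_seqscan = 'on';
```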
work_mem
Specifies the amount of memory to be used by internal sort operations and hash tables before
switching to temporary on-disk files. The default value, per virtual worker, is 64MB. Note that for
a complex query, several sort or hash operations might be running in parallel; each one will be
allowed to use as much memory as this value specifies before it starts to put data into temporary
files. Also, several running sessions could be doing such operations concurrently. Lastly, there
are multiple virtual workers per worker node. So the total memory used could be many times the
value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort
operations are used for ORDER BY, DISTINCT, and merge joins.
vmstat syntax
In-line lab: ncli node runonall "vmstat 5 5" shows vmstat output from the Queen
and all Worker and Loader nodes
Query tuning: in general, keep the default. Instead, use transaction-scoped SET
commands like the following to increase work_mem in an effort to increase the chance of a
Hash Join instead of a Merge Join
begin;
set work_mem = '196000'; (about 191 MB, but only for the life of the transaction)
SELECT * FROM <table>; (table name is a placeholder)
end;
1. Check for bottlenecks. Use AMC and Ganglia to find the overworked
worker
2. Run ANALYZE regularly to ensure the most optimal query plans
3. Run VACUUM to ensure queries achieve best performance
4. Make sure you're using transactions. When UPDATING or INSERTING
data into Aster Database, wrap statements in a transaction
5. Make sure you've written your queries in a way that allows Aster
Database to parallelize the work as much as possible
6. Check the EXPLAIN plan to make sure Aster Database has chosen a
good plan for running the query
7. If you suspect Aster Database has chosen the wrong join technique,
you can drop hints via SET ENABLE commands
8. If you suspect too much data is being shuffled among workers, refer to
Aster User Guide
9. If none of the techniques listed above solve the problem, try to isolate
the problem using the Linux tools
Tuning is only as good as the data model: get that right and optimized for
Aster!
If you can express the same results using either GROUP BY or DISTINCT (hash
to v-Worker, then dedup), use the GROUP BY. This gives the Optimizer
more options to choose from.
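For example, these two queries return the same result set (table and column names are illustrative), but the GROUP BY form gives the Optimizer more latitude:

```sql
-- DISTINCT form: hash to v-Worker, then dedup
SELECT DISTINCT user_id FROM clicks;

-- Equivalent GROUP BY form, which the Optimizer can plan more flexibly
SELECT user_id FROM clicks GROUP BY user_id;
```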
less /primary/tmp/iostat.write.directio.log
Time: 10:23:47
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 2.82 0.00 0.00 97.18
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 10.56 0.00 2.82 0.00 0.05 38.00 0.00 0.00 0.00 0.00
iostat displays a single history-since-boot report for all CPUs/devices
############################
Wed Sep 23 13:12:10 PDT 2009
Write test directio
1857+0 records in
1857+0 records out
1947205632 bytes (1.9 GB) copied, 7.36374 s, 264 MB/s
How to find queries that bottleneck the system (on less busy systems the
AMC can be used to identify slow queries)
1. Identify the point in time the cluster is really busy. You could look at
Ganglia to figure this out
2. Identify one of the nodes where the resource utilization is heavy
3. ssh to that node and run atop -r /var/log/atop.log.1 (atop stores
logs for about 7 days)
4. Travel back to the point in time when the cluster was busy
5. Get one of the postgres backend PIDs whose resource utilization
(CPU, Disk, RAM) is heavy.
6. Associate this postgres backend PID on the worker with the queenExec
PID on the queen
You can tie the postgres pid ( from atop output ) to the queenExec by searching
queenExec logs for backendPid=<pid>
The queenExec PID will be listed as: [fromPid:32653]
7. Get the query from the queenExec PID. (displayed in queenExec.log)
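Step 6 above can be done with a grep over the queenExec log. The sample log line below is illustrative only, built from the field names given above (backendPid=, [fromPid:...]); it is not the exact queenExec.log layout:

```shell
# Illustrative only: a queenExec.log-style line tying a worker backend PID
# to the queenExec PID shown as [fromPid:...]
printf 'queenExec [fromPid:32653] backendPid=18042 ...\n' > /tmp/queenExec.sample

# On a real cluster you would grep /primary/logs/queenExec.log instead
grep -o 'fromPid:[0-9]*' /tmp/queenExec.sample
# prints fromPid:32653
```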
Solution: Add battery for cache, add NIC card, take advantage of
InfiniBand by configuring Backup node in cabinet, enable NIC
bonding
Distributed table created with a distribution key that is not distributed evenly
among the v-Workers (column = gender)
nc_tablesize_details
1. From the AMC, go to the Worker node that had skew, then go to Node Hardware Stats
and determine which component spiked (CPU, Memory, Network I/O, Disk I/O).
Note: you may have to wait a few minutes for it to register in the graph
Product_id ________
DROP table clicks_skew;
________
3. One more thing. This table will be joined to the Product_dim table at the end of
each month to determine month-end sales figures per Product
4. Another query will be to find any NULL customers at month end
5. What are your recommendations to prevent bottlenecks at all costs?
Create new CLICKS_PROD table with PRODUCT_ID as Distribution key
Create INDEX on USER_ID in CLICKS_Skew table
ANALYZE both columns
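The three recommendations above might look like this in SQL. Table and column names follow the lab; the DISTRIBUTE BY HASH form is a sketch of Aster's CREATE TABLE syntax, and table-level ANALYZE is shown as an approximation of "ANALYZE both columns":

```sql
-- Redistribute on a higher-cardinality key to avoid skew
CREATE TABLE clicks_prod
  DISTRIBUTE BY HASH(product_id)
AS SELECT * FROM clicks_skew;

-- Index the alternate join/filter column
CREATE INDEX idx_clicks_user ON clicks_skew (user_id);

-- Refresh statistics so the Optimizer sees the new layout
ANALYZE clicks_prod;
ANALYZE clicks_skew;
```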
2. Name two Linux utilities that can help you diagnose bottlenecks
4. This cost tends to be the most expensive (Disk I/O, Network, Memory,
CPU)
Module 14
Errors and Logs
Monitoring Replication
What to do when ...
3. If client software is involved, collect the name and version number of the client software and the OS the client
package is running on.
4. Is the above problem related to a partner software application (SAS, Informatica, Tableau, MicroStrategy, R, etc.)?
5. If partner software is involved, collect the name and version of the partner application along with the OS the
partner application is running on.
6. Gather any relevant logs that have the error messages and post the error message statement(s) into the incident.
By using diagnostic log bundles, you can more easily send information to Teradata Global
Technical Support (GTS) for analysis, reducing the time and effort required to diagnose
system problems.
Only AMC users with administrative privileges can create, download, and send diagnostic log
bundles.
Aster Database automatically tracks its activity in a variety of log files. The log files
are useful when you need to find the cause of an error or unexpected behavior, or
when you just want to confirm that an operation has taken place
There are two ways to access logs in Aster Database:
preparation log (the log of events related to the process of preparing a node for
participation in Aster Database);
system log (the contents of the Linux syslog file /var/log/messages); and
kernel log (the contents of the Linux kernel buffer provided through dmesg).
The log appears, showing the latest 1000 lines. Click Refresh at any time to load the latest
1000 lines.
When an issue arises on a cluster, one of the first steps in finding the cause is to retrieve the
relevant log files. Aster Database is made up of a large array of distinct services, and it
produces more than 60 different logs spread across every node in the cluster. The AMC
provides an easy way for you to deal with all these different logs by creating diagnostic log
bundles. A diagnostic log bundle is a compressed tarball containing data used to determine the
system context and diagnose Aster Database issues. This data may come in system logs from
the queen and subordinate nodes (worker and loader).
By default, a diagnostic log bundle contains only system logs from the
Queen. If you want to create a complete bundle that includes logs from the
other nodes as well, you can create what is called a cluster bundle by
clicking the Prepare link
Type: Queen or Cluster. A queen-type bundle includes only log files and information
from the queen. A cluster-type bundle includes log files and information from all
nodes, including the queen.
Submitted by: Tells what initiated the job. System means the job was run automatically by the
AMC. If the job was manually initiated, the username of the person who
submitted the job is displayed.
Start Time: Start time of the log content. That is, the time of the first logged event included in
the bundle.
Filename: Name of the log bundle file. The name indicates the time the bundle creation job
was initiated.
Prepare Cluster Bundle: Click Prepare to create a complete bundle that includes logs from the
other nodes as well.
Send to Aster Support: Click Send to send the log bundle to Teradata Global
Technical Support (GTS).
Open an ssh session to the Queen machine and look in the directory
/primary/diagbundles . Look for a file with the same name shown in the
list of diagnostic log bundle jobs. The name will look similar to
YYYYMMDD_
Another way to include all nodes in a bundle is to click the Manually
Initiate Diagnostic Bundle button. This displays a dialog that provides
many more choices, including the choice to include queen and cluster
nodes in the bundle, set a time window, and add custom commands
Your SQL-MapReduce functions can emit debugging messages written to the standard output
or the standard error. During or after execution, access to the standard output (stdout) and
standard error (stderr) is provided through the AMC. To see this:
1 Open the AMC in a browser window by typing http://<IP address of the queen>
2 Go to the Processes tab.
3 Find your query in the Processes list. To do this, it may be helpful to sort based on Type or
User. Click a column to sort based on that column.
4 Click the ID of the query you wish to view. In the Process Detail tab that appears, click the
View Logs button.
According to the standard, the first two characters of an error code denote a class of errors,
while the last three characters indicate a specific condition within that class. Thus, an
application that does not recognize the specific error code can still infer what to do
from the error class.
Partial listing
http://<QueenIP>:1990
[Process architecture diagram: Queen and Worker node processes]
- procman: job creator; creates the job for a SQL-MR function and tasks a TEMP directory
- txman / qos executor: tasks the QosSlave for Workload Management
- queenExec: statement executor process on the Queen
- workerd: slave for txman, sysman, and procman; hosts the JVM-based Java MR execution engines
- SQL vworkers (a..N): the v-worker databases on each Worker node
- IOInterceptor: FUSE server for compression (ICE)
- Tuple Mover
- replicationd: Replication Server; maintains passive v-workers holding up-to-date copies of filesystem state
Collect the following log files (from time of error) from Queen node:
1 - /primary/logs/sysmanExec.log
2 - /primary/logs/cluster.log
3 - /primary/logs/queenExec.log
4 - /primary/logs/alerts.log
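A quick way to grab all four at once (a hypothetical one-liner; adjust the output name and path as needed):

```shell
# Bundle the four Queen-side logs into one tarball for support
tar czf /tmp/queen_logs_$(date +%Y%m%d).tgz \
    /primary/logs/sysmanExec.log \
    /primary/logs/cluster.log \
    /primary/logs/queenExec.log \
    /primary/logs/alerts.log
```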
All warning, error, and fatal messages from all the nodes in the cluster are
collected in the file /primary/logs/cluster.log
Note that each log entry has an associated session id that can be used to
put together the messages for a specific session
queenExec
This process handles the planning, preprocessing and execution of
all statements. Each statement in the AMC has a corresponding
queenExec process active on the queen. This queenExec
communicates with the PostGres processes in each Worker node
Note: for data loads through a loader node, the queenExec process
will reside on the loader node; there will not be a queenExec process
active on the Queen
StatsServer
This process collects statistics and populates the NC_ views such
as nc_all_statements, nc_all_statement_phases
WorkerDaemon
This process receives commands from sysMan on the queen to
perform certain PostGres related tasks: vworker
activate/deactivate, commit/rollback of transactions
ReplicationDaemon
This process initiates replication of data from the primary v-Workers to
the secondary v-Workers, including replication from the queenDb to the
secondary queenDb
IoInterceptor
This process handles the compressed data in Aster. Since PostGres
does not support high levels of compression, Aster built a module
that sits between the file system and PostGres
IceServer
- This process mainly handles repartitioning of data.
- Each v-Worker has a bridge (.so file linked into PG) to extract data and
send it through ICE (InterConnect Exchange, which reads/writes in raw PG
format) to another v-Worker
Process URL
queenExec http://<queen_ip>:1990
Workload Mapper http://<queen_ip>:2011/workloadmapper
StatsServer http://<worker_ip>:6543
sysMan http://<queen_ip>:2105
FileStreamServer http://<worker_ip>:2113 (On Worker nodes)
Backups http://<backup manager IP address>:1991/stats
IceServer http://<worker_ip>:2115
URL : http://<queen_ip>:2105/asyncActivityStats
- This page shows any pending activities related to replication. Any time a
user performs a commit, the AMC will show the step 'Prepare
Transaction', which generates WAL (Write-Ahead Logs) that are
replicated to the secondary v-Workers. If replication fails, the Prepare
Transaction step will be marked Failed
- Under normal conditions the activities listed on this web page should
process (and disappear from the list) every 15-30 seconds. If activities
remain in this list for more than 30 min, there should be several errors
reported in /primary/logs/sysmanExec.log on the Queen
The bottom of the page shows replication stats. The 'Replications Failed'
field should be zero or close to zero
Replication diagnostics
- File Stream Service: transfers files from one node to another via AFTP
http://<QueenIP>:2113
- Replication Helper Service: tracks files that have been modified since the
last replication http://<QueenIP>:2111
When a node is marked Suspect in the AMC, it means that one or more of
the v-workers on that node have failed, but one or more are still running.
When the node is marked Failed, it means that all the v-workers have
failed (often as a result of the node itself having crashed, gone off the
network, or otherwise failed)
Collect the following log files (from time of error) from the Backup
Manager node:
- /home/beehive/data/logs/backupExec.log
- /home/beehive/data/logs/cluster.log
Collect the following log files (from time of error) from the Queen node:
- /primary/logs/sysmanExec.log
- /primary/logs/cluster.log
- /primary/logs/queenExec.log
3. You do a RESTORE on your Aster nCluster. But now the AMC won't
display. What command(s) would you type?
The conclusion slides that were presented in class are omitted in this book.
Thank you for attending the Teradata Aster Database Administration class!