
Teradata SQL Tuning Evergreens

Ensure Completeness and Correctness of Teradata Statistics

This is what I always teach in my Teradata SQL tuning workshops, and it has never
disappointed my clients:

The most important optimization task is to provide complete and correct statistics. This
statement is valid for the statistics of tables and indexes (secondary indexes, join indexes, hash
indexes, etc.).

Detecting missing statistics is easier than one might expect. In SQL Assistant, run the
following command:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

This turns on the optimizer's statistics hints and adds information about required statistics to the
explain output. At the very end of the output, a list of statistics suggestions is shown. These
are the statistics the optimizer considers helpful for creating a better execution plan. Add them
one by one, and re-check the execution plan after each change.
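
If, for example, the list suggests single-column statistics, they can be collected with a statement like the following sketch (the table and column names here are placeholders for illustration, not actual suggestions):

COLLECT STATISTICS ON table_1 COLUMN (column_1);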

Several methods can be used to detect stale statistics. The easiest is to split the SQL
statement into parts and test each part separately, comparing the estimated row count (as shown in
the explain output) with the real number of records returned by the query.

This approach is particularly suitable if the full SQL statement cannot be executed in a reasonable
time. Here is one example:

SELECT t01.*
FROM table_1 t01
INNER JOIN table_2 t02
    ON t01.PK = t02.PK
WHERE t01.column_1 = 1
  AND t02.column_a = 2;

The above query can be split into two parts for testing:

SELECT * FROM table_1 WHERE column_1 = 1;

and

SELECT * FROM table_2 WHERE column_a = 2;

Execute and EXPLAIN both queries, and compare the number of rows returned with the estimated
row counts shown in the explain output.

If there is a big difference, the cause could be stale statistics or similar problems related to
data distribution.
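
A minimal sketch of this check, using the hypothetical table_1 from above:

EXPLAIN SELECT * FROM table_1 WHERE column_1 = 1;
-- note the estimated row count in the explain text

SELECT COUNT(*) FROM table_1 WHERE column_1 = 1;
-- compare the actual count with the estimate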

The Primary Index (PI) Choice


Whenever possible, design your queries in a way that the Primary Index can be used in the joins.
Being able to join on the complete set of Primary Index columns is the least costly way of joining
(apart from the Nested Join). The join condition can include additional columns that are not
part of the Primary Index (they will be applied as residual conditions), but missing Primary Index
columns rule out a rowhash match scan.

If you cannot change the Primary Indexes for whatever reason, or you would need a different
Primary Index only for certain SQL statements, use a volatile table or a global temporary table with
the same structure and content as the original table but with a different Primary Index.
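
A minimal sketch of the volatile table approach, assuming the hypothetical table_1 from above and an alternative Primary Index on column_1:

CREATE VOLATILE TABLE table_1_repi AS
(
    SELECT * FROM table_1
) WITH DATA
PRIMARY INDEX (column_1)   -- the Primary Index the join needs
ON COMMIT PRESERVE ROWS;   -- keep the rows across transactions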

Teradata Indexing Techniques

Using indexes can be another option to improve the performance of a query. For
example, secondary indexes can be helpful if you have highly selective WHERE conditions.
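
For illustration, a non-unique secondary index on the hypothetical selective column from the earlier examples could be created like this:

CREATE INDEX (column_1) ON table_1;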

Although not usually considered indexing, you could also try join indexes or work with partitioning.
The difference between an index and partitioning is that indexes are stored in sub-tables, while
partitioning is just another way of structuring the base table.
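
As a sketch, a row-partitioned table could look like the following; the sales table, its columns, and the date range are assumptions made for illustration:

CREATE TABLE sales
(
    sale_id   INTEGER NOT NULL,
    sale_date DATE NOT NULL,
    amount    DECIMAL(18,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N(sale_date BETWEEN DATE '2023-01-01'
                               AND     DATE '2024-12-31'
                               EACH INTERVAL '1' MONTH);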

Whenever working with indexing techniques you have to keep the overall data warehouse
architecture in mind, and determine if your solution fits into it.

If potentially useful indexes are not used by the Optimizer, and they are not useful in the overall
PDM design, drop them immediately. They will only waste space.

You can find more details about indexing here.

Query Rewriting

Many times, query performance can be improved by rewriting the query.


One example that comes to mind is using DISTINCT instead of GROUP BY on columns with many
distinct values.
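
The two statements below are logically equivalent; which one runs faster depends on how many duplicates have to be removed (the sales table and customer_id column are assumed for illustration):

SELECT DISTINCT customer_id FROM sales;

SELECT customer_id FROM sales GROUP BY customer_id;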

UNION statements can be used to break up a large SQL statement into several smaller pieces,
which can be executed in parallel.
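
A sketch of this idea on the assumed sales table; UNION ALL is used here because the two pieces cannot contain overlapping rows, so no duplicate removal is needed:

SELECT sale_date, SUM(amount) AS total
FROM sales
WHERE sale_date < DATE '2024-01-01'
GROUP BY sale_date

UNION ALL

SELECT sale_date, SUM(amount) AS total
FROM sales
WHERE sale_date >= DATE '2024-01-01'
GROUP BY sale_date;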

Query rewriting often solves performance issues, even in cases where all other
techniques have failed.

While query rewriting is a very powerful way of improving performance, it often requires
understanding the business logic (can this left join be replaced by an inner join?). Usually,
understanding the business logic of a query reveals additional tuning options.

Real Time Monitoring

Watch the query running in real time. Observing your query while it is running in Viewpoint or
PMON will help you to detect the critical steps.

Most performance issues are caused either by query steps with heavy skewing, or by a bad
execution plan caused by stale and/or missing statistics.

Stale and missing statistics typically lead to wrong decisions in join preparation (duplicating a table
to all AMPs instead of rehashing it) and in the join type used (product join instead of merge join).

Comparison of Resource Usage

Another very important task in performance optimization is to measure the resource usage
before and after the optimization takes place. Query run times can be misleading, as they are
influenced by concurrent workload, blocking, etc.
Here is one example query that gives you detailed insight into how good or bad your query
performs. In order to run this query, you need SELECT access to the DBC.DBQLOGTBL table.

Set a different QUERYBAND for each version of the query you are running:

SET QUERY_BAND = 'Version=1;' FOR SESSION;

SELECT
    AMPCPUTIME,
    (FIRSTRESPTIME - STARTTIME DAY(2) TO SECOND(6)) AS RUNTIME,
    SPOOLUSAGE / 1024**3 AS SPOOL_IN_GB,
    CAST(100 - ((AMPCPUTIME / (HASHAMP()+1)) * 100
        / NULLIFZERO(MAXAMPCPUTIME)) AS INTEGER) AS CPU_SKEW,
    MAXAMPCPUTIME * (HASHAMP()+1) AS CPU_IMPACT,
    AMPCPUTIME * 1000 / NULLIFZERO(TOTALIOCOUNT) AS LHR
FROM
    DBC.DBQLOGTBL
WHERE
    QUERYBAND = 'Version=1;';

The query will return:

- the total CPU usage
- the spool space needed
- the LHR (the ratio between CPU and I/O usage)
- the CPU skew
- the skew impact on the CPU

The goal is to reduce the total CPU usage, the consumed spool space, and the skew impact on the
CPU. Furthermore, the LHR is optimally around 1.00.
