Tuning

Teradata SQL Tuning Evergreens
Ensure Completeness and Correctness of Teradata Statistics
This is what I always teach in my performance Teradata SQL tuning workshops without ever
disappointing my clients:
The most important optimization task is to provide complete and correct statistics. This
statement is valid for the statistics of tables and indexes (secondary indexes, join indexes, hash
indexes etc.).
Detecting the missing statistics is easier than one might expect; In the SQL assistant one runs the
following command:
DIAGNOSTIC HELPSTATS ON FOR SESSION;
This turns on the optimizers statistic hints, and adds information about required statistics to the
explain output. At the very end of the output, a list of statistics suggestions will be shown. These
are the statistics the optimizer considers helpful in creating a better execution plan. Add them
one by one, and re-check the execution plan after each change.
Several methods can be used to detect stale statistics. The easiest way is to divide the SQL
statement, and to test each part separately, by comparing the estimated row count (as shown in
the explain output) with the real number of records being returned by the query.
This approach is particularly suitable, if the SQL statement cannot be executed in a reasonable
time. Here is one example:
SELECT t01.*
FROM
table_1 t01
INNER JOIN
table_2 t02
ON
t01.PK = t02.PK
WHERE
t01.column_1 = 1
AND t02.column_a = 2
Above query can be split into two parts for testing:
SELECT * FROM table_1 WHERE column_1 = 1;
and
SELECT * FROM table_2 WHERE column_a = 2;
Execute and EXPLAIN both queries, and compare the number of returned rows with the number
of rows the explain output delivers.
If there is a big difference, it could be related to stale statistics or similar problems, which are
related to data distribution.
The Primary Index (PI) Choice

Whenever possible, design your queries in a way, that the Primary Index can be used in the joins.
Being able to join on the complete set of Primary Index columns is the least costly way of joining
(apart from the Nested Join). The join condition can include additional columns, which are not
part of the Primary Index (they will be applied as residual conditions), but missing Primary Index
columns withdraw the possibility of a rowhash match scan.
If you cannot change the Primary Indexes for whatever reason, or you would need a different
Primary Index only for certain SQL statements, use a volatile table or a real temporary table with
the same structure and content like the original table but with a different Primary Index.
Teradata Indexing Techniques
The usage of indexes can be another option to improve the performance of a query. For
example, secondary indexes could be helpful if you have highly selective WHERE conditions.
Although usually not considered indexing, you could try join indexes or work with partitioning.
The difference between an index and partitioning is that indexes are stored in sub-tables,
partitioning is just another way of structuring the base table.
Whenever working with indexing techniques you have to keep the overall data warehouse
architecture in mind, and determine if your solution fits into it.
If potentially useful indexes are not used by the Optimizer, and they are not useful in the overall
PDM design, drop them immediately. They only will waste space.
More details about indexing you can find here
Query Rewriting
Many times, queries performance can be improved by rewriting the query.

Examples like using DISTINCT instead of GROUP BY on columns with many different values come
to my mind.
Union statements could be used to break up a large SQL statements into several smaller pieces,
which can be executed in parallel.
Query rewriting often allows to solve performance issues, even in the cases when all other
techniques failed.
While query rewriting is a very powerful way of improving performance, it often requires to
understand the business logic (can this left join be replaced by an inner join?). Usually,
understanding the business logic of a query, reveals additional tuning options.
Real Time Monitoring
Watch the query running in real-time. Observing your query while its running in Viewpoint or
PMON, will help you to detect the critical steps.
Most performance issues are caused either by query steps with heavy skewing, or by a bad
execution plan caused by stale and/or missing statistics.
Stale and missing statistics typically lead to wrong decisions in join preparation (copying a table
to all AMPs instead of rehashing) and join types used (product join instead of merge join).
Comparison of Resource Usage
Another very important task in performance optimization is to measure the resource usage
before and after the optimization took place. Query run times can be misleading, as they are
influenced by concurrent workload , blocking etc.
Here is one example query, which gives you detailed insight about how good or bad each step of
your query is. In order to be able to run this query you need to have select access to the
DBC.DBQLOGTBL table.
Set a different QUERYBAND for each version of the query you are running:
SET QUERY_BAND = Version=1; FOR SESSION;
SELECT
AMPCPUTIME,
(FIRSTRESPTIME-STARTTIME DAY(2) TO SECOND(6)) RUNTIME,
SPOOLUSAGE/1024**3 AS SPOOL_IN_GB,
CAST(100-((AMPCPUTIME/(HASHAMP()+1))*100/NULLIFZERO(MAXAMPCPUTIME)) AS
INTEGER) AS CPU_SKEW,
MAXAMPCPUTIME*(HASHAMP()+1) AS CPU_IMPACT,
AMPCPUTIME*1000/NULLIFZERO(TOTALIOCOUNT) AS LHR
FROM
DBC.DBQLOGTBL
WHERE
QUERYBAND = Version=1;
The query will return:
The total CPU Usage
The Spool Space needed
The LHR (ratio between CPU and IO usage)

The CPU Skew
The Skew Impact on the CPU
Goal is to reduce total CPU usage, consumed spool space and skew impact on the CPU.
Furthermore, the LHR is optimally around 1.00

Tuning

Загружено:

Сведения о документе

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Tuning

Загружено:

Авторское право:

Доступные форматы

Teradata SQL Tuning Evergreens

Ensure Completeness and Correctness of Teradata Statistics

DIAGNOSTIC HELPSTATS ON FOR SESSION;

Above query can be split into two parts for testing:

SELECT * FROM table_1 WHERE column_1 = 1;

SELECT * FROM table_2 WHERE column_a = 2;

The Primary Index (PI) Choice

Teradata Indexing Techniques

More details about indexing you can find here

Many times, queries performance can be improved by rewriting the query.

Real Time Monitoring

Comparison of Resource Usage

SET QUERY_BAND = Version=1; FOR SESSION;

(FIRSTRESPTIME-STARTTIME DAY(2) TO SECOND(6)) RUNTIME,

The query will return:

The total CPU Usage

The Spool Space needed

The LHR (ratio between CPU and IO usage)

The Skew Impact on the CPU

Вам также может понравиться