This is what I always teach in my Teradata SQL performance tuning workshops, and it has never disappointed my clients:
The most important optimization task is to provide complete and correct statistics. This applies to the statistics of both tables and indexes (secondary indexes, join indexes, hash indexes, etc.).
Detecting missing statistics is easier than one might expect. In SQL Assistant, one runs the following command:
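The command itself is missing from the source; judging from the description that follows, it is Teradata's statistics diagnostic:

DIAGNOSTIC HELPSTATS ON FOR SESSION;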
This turns on the optimizer's statistics hints and adds information about required statistics to the EXPLAIN output. At the very end of the output, a list of statistics suggestions is shown. These are the statistics the optimizer considers helpful for creating a better execution plan. Add them one by one, and re-check the execution plan after each change.
Several methods can be used to detect stale statistics. The easiest is to divide the SQL statement and test each part separately, comparing the estimated row count (as shown in the EXPLAIN output) with the real number of records returned by the query.
This approach is particularly suitable if the full SQL statement cannot be executed in a reasonable time. Here is one example:
SELECT t01.*
FROM table_1 t01
INNER JOIN table_2 t02
    ON t01.PK = t02.PK
WHERE t01.column_1 = 1
  AND t02.column_a = 2;
Split this statement into its single-table parts, then execute and EXPLAIN each part, comparing the number of rows actually returned with the row count estimated in the EXPLAIN output.
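A sketch of the single-table parts, reusing the table and column names from the example above:

SELECT t01.*
FROM table_1 t01
WHERE t01.column_1 = 1;

SELECT t02.*
FROM table_2 t02
WHERE t02.column_a = 2;

If the estimate for one of these probes is far off from the actual row count, the statistics on that table's filter column are the likely culprit.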
If there is a big difference, it may be caused by stale statistics or by similar problems related to data distribution.
If you cannot change the Primary Index for whatever reason, or you would need a different Primary Index only for certain SQL statements, use a volatile table or a global temporary table with the same structure and content as the original table, but with a different Primary Index.
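A minimal sketch of this technique, assuming a copy of table_1 re-distributed on column_1 (names taken from the example above):

CREATE VOLATILE TABLE table_1_tmp AS (
    SELECT * FROM table_1
) WITH DATA
PRIMARY INDEX (column_1)
ON COMMIT PRESERVE ROWS;

Joins against table_1_tmp can then exploit the new Primary Index; the volatile table disappears automatically at the end of the session.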
Using indexes can be another option to improve the performance of a query. For example, secondary indexes can be helpful if you have highly selective WHERE conditions. Although not usually considered indexing, you can also try join indexes or work with partitioning. The difference between an index and partitioning is that indexes are stored in sub-tables, while partitioning is just another way of structuring the base table.
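As an illustration of both techniques in Teradata syntax (the table and column names here are hypothetical):

-- Secondary index for a selective WHERE condition
CREATE INDEX (column_1) ON table_1;

-- Row partitioning on a date column of the base table
CREATE TABLE sales
( sale_id   INTEGER,
  sale_date DATE
) PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N(sale_date BETWEEN DATE '2023-01-01'
                     AND DATE '2024-12-31' EACH INTERVAL '1' MONTH);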
Whenever working with indexing techniques, you have to keep the overall data warehouse architecture in mind and determine whether your solution fits into it.
If potentially useful indexes are not used by the Optimizer, and they are not useful in the overall PDM design, drop them immediately. They will only waste space.
Query Rewriting
UNION statements can be used to break up a large SQL statement into several smaller pieces that can be executed in parallel.
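A sketch of the idea, reusing the example table from above (the filter values are assumptions):

SELECT t01.* FROM table_1 t01 WHERE t01.column_1 = 1
UNION ALL
SELECT t01.* FROM table_1 t01 WHERE t01.column_1 = 2;

UNION ALL is preferable to UNION when the branches are known to be disjoint, because it avoids the duplicate-elimination (sort) step.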
Query rewriting often solves performance issues even when all other techniques have failed.
While query rewriting is a very powerful way of improving performance, it often requires an understanding of the business logic (can this LEFT JOIN be replaced by an INNER JOIN?). Usually, understanding the business logic of a query reveals additional tuning options.
Watch the query running in real time. Observing your query while it runs in Viewpoint or PMON will help you to detect the critical steps.
Most performance issues are caused either by query steps with heavy skewing or by a bad execution plan resulting from stale and/or missing statistics.
Stale and missing statistics typically lead to wrong decisions in join preparation (copying a table to all AMPs instead of rehashing) and in the join type used (product join instead of merge join).
Another very important task in performance optimization is to measure the resource usage before and after the optimization. Query run times can be misleading, as they are influenced by concurrent workload, blocking, etc.
Here is one example query, which gives you detailed insight into how good or bad each step of your query is. To run it, you need SELECT access to the DBC.DBQLOGTBL table.
Set a different query band for each version of the query you are running:
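Before running each version, tag the session with a query band (the version string here is an assumption; any key/value pair works):

SET QUERY_BAND = 'Version=1;' FOR SESSION;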
SELECT
    AMPCPUTIME,
    SPOOLUSAGE/1024**3 AS SPOOL_IN_GB,
    CAST(100 - ((AMPCPUTIME/(HASHAMP()+1)) * 100
        / NULLIFZERO(MAXAMPCPUTIME)) AS INTEGER) AS CPU_SKEW,
    MAXAMPCPUTIME * (HASHAMP()+1) AS CPU_IMPACT,
    AMPCPUTIME * 1000 / NULLIFZERO(TOTALIOCOUNT) AS LHR
FROM
    DBC.DBQLOGTBL
WHERE
    QUERYBAND = 'Version=1;';
The goal is to reduce the total CPU usage, the consumed spool space, and the skew impact on the CPU. Furthermore, the LHR (the ratio of CPU time to logical I/Os) is optimally around 1.00.