TERADATA SQL PERFORMANCE TUNING

1 TOP N option:
Using the TOP N option in a SELECT statement restricts the number of rows returned, which improves performance. For best performance, use the TOP N option instead of the QUALIFY clause with RANK or ROW_NUMBER. In the best case, TOP N performs better; in the worst case, it performs the same. If a SELECT statement using the TOP N option does not also specify an ORDER BY clause, it performs better with BTEQ than with FastExport.
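
As a rough illustration of this comparison, the two queries below both return the ten rows with the highest amounts, first with TOP N and then with QUALIFY and ROW_NUMBER. The table sales_fact and its columns are hypothetical names used only for this sketch.

```sql
-- Hypothetical table: sales_fact(order_id, sale_amt)

-- Preferred form: TOP N with ORDER BY
SELECT TOP 10 order_id, sale_amt
FROM sales_fact
ORDER BY sale_amt DESC;

-- Same result set expressed with QUALIFY and ROW_NUMBER
SELECT order_id, sale_amt
FROM sales_fact
QUALIFY ROW_NUMBER() OVER (ORDER BY sale_amt DESC) <= 10;
```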

2 CASE statements:
CASE expressions can significantly improve performance in the following situations:
- For multiple aggregates that filter distinct ranges of values. For example, total sales for several time periods.
- To create two-dimensional reports directly from Teradata Database. For example, balances in individual accounts held by all bank customers.

CASE expressions help increase performance because they return multiple results in a single pass over the data, rather than making multiple passes over the data and then using the client application to combine them into a single report.

The performance improvement from CASE expressions grows as the following increase:
- Number of queries against the same source table(s)
- Volume of data in the source table
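
A minimal sketch of the "total sales for several time periods" case: one scan of the table produces all three totals, where separate queries would each scan it again. The table sales_fact, its columns and the date ranges are assumptions made for illustration.

```sql
-- Hypothetical table: sales_fact(sale_date, sale_amt)
-- A single pass over the data returns one total per time period.
SELECT
    SUM(CASE WHEN sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-03-31'
             THEN sale_amt ELSE 0 END) AS q1_sales,
    SUM(CASE WHEN sale_date BETWEEN DATE '2023-04-01' AND DATE '2023-06-30'
             THEN sale_amt ELSE 0 END) AS q2_sales,
    SUM(sale_amt)                      AS first_half_sales
FROM sales_fact
WHERE sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-06-30';
```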

3 SI:
Secondary indexes (SIs) supply alternate access paths, which increases performance. For best results, base SIs on frequently used set selections and on equality searches.
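
For example, a non-unique SI on a column that is often used in equality searches might look like the sketch below. The table customer, the index name and the column names are hypothetical.

```sql
-- Hypothetical table: customer(cust_id, cust_name, zip_code)
-- Non-unique secondary index on a column used in equality searches.
CREATE INDEX idx_cust_zip (zip_code) ON customer;

-- A query of this shape can use the SI as an alternate access path.
SELECT cust_id, cust_name
FROM customer
WHERE zip_code = '94105';
```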

4 ANALYTICAL FUNCTIONS:

Analytical functions are very useful for improving performance when used properly. With analytical functions, you can perform the data analysis within the data warehouse itself. This provides several advantages, including:
- Improved processing performance.
- Less data movement across the network.
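
As one possible illustration, the query below computes a seven-row moving average inside the database, so only the finished result crosses the network. The table daily_sales and its columns are assumed names for this sketch.

```sql
-- Hypothetical table: daily_sales(store_id, sale_date, sale_amt)
-- Moving average over the current row and the six preceding rows per store.
SELECT store_id,
       sale_date,
       sale_amt,
       AVG(sale_amt) OVER (PARTITION BY store_id
                           ORDER BY sale_date
                           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7
FROM daily_sales;
```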

5 PI:
Choosing an appropriate primary index (PI) improves performance considerably. The more unique the PI, the more uniform the row distribution, so that no AMP is overloaded.
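
A sketch of both ideas, assuming a hypothetical orders table in which order_id is unique: the first statement declares a unique PI, and the second counts rows per AMP for a candidate PI column so that skew can be spotted before the column is chosen.

```sql
-- Hypothetical table; order_id is assumed to be unique.
CREATE TABLE orders (
    order_id   INTEGER NOT NULL,
    cust_id    INTEGER,
    order_date DATE
) UNIQUE PRIMARY INDEX (order_id);

-- Rows per AMP if cust_id were used as the PI.
-- Roughly equal counts suggest an even distribution.
SELECT HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS amp_no,
       COUNT(*)                              AS row_cnt
FROM orders
GROUP BY 1
ORDER BY 2 DESC;
```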

6 COMPRESSION:
Choosing the most frequent values and compressing those values will save space and improve performance.
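
A minimal sketch of compressing frequent values in a column definition. The table, the columns and the assumption that 'CA', 'NY' and 'TX' are the most frequent values are all hypothetical.

```sql
-- Hypothetical table; 'CA', 'NY' and 'TX' are assumed to be
-- the most frequent values of state_cd.
CREATE TABLE customer_addr (
    cust_id  INTEGER NOT NULL,
    state_cd CHAR(2) COMPRESS ('CA', 'NY', 'TX'),
    city     VARCHAR(50)
) PRIMARY INDEX (cust_id);
```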

7 STATISTICS:
Statistics need to be collected, and kept current, for columns whose data changes often.
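
For instance, statistics on a single column can be collected and then inspected as sketched below. The table sales_fact and the column sale_date are hypothetical names.

```sql
-- Hypothetical table: sales_fact(sale_date, sale_amt)
-- Collect (or refresh) statistics on a frequently changing column.
COLLECT STATISTICS ON sales_fact COLUMN (sale_date);

-- List the statistics that exist on the table and when they were collected.
HELP STATISTICS sales_fact;
```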

8 STEP BY STEP PROCEDURE:


1. Run the explain plan (press F6, or prefix the query with EXPLAIN, as in "EXPLAIN sel * ..."). Then look for potential problem indicators such as:
   - No or low confidence
   - Product join conditions
   - By way of an all-rows scan (FTS)
   - Translate
   Also check for:
   - DISTINCT or GROUP BY keywords in the SQL query
   - IN / NOT IN keywords, and the list of values generated for them

APPROACHES

A. In case of product join scenarios, check for:
   - Proper usage of aliases
   - Joining on matching columns
   - Usage of join keywords, such as specifying the type of join (e.g. inner or outer)
   - Using UNION in "OR" scenarios
   - Statistics collected on the join columns. This is especially important if the columns you are joining on are not unique.

B. Collect stats
   - Run the command "diagnostic helpstats on for session"
   - Gather information on the columns on which stats have to be collected
   - Collect stats on the suggested columns
   - Also check for stats missing on PI, SI or columns used in joins: "help stats <databasename>.<tablename>"
   - Make sure stats are re-collected when at least 10% of the data changes
   - Remove unwanted stats, or stats that hardly improve the performance of the queries
   - Collect stats on columns instead of indexes, since dropping an index drops its stats as well
   - Collect stats on indexes having multiple columns; this can help when these columns are used in join conditions
   - Check that stats are re-created for tables whose structures have changed

C. Full table scan scenarios
   - Try to avoid FTS scenarios, as it might take a very long time to access all the data in every AMP in the system
   - Make sure an SI is defined on the columns that are used in joins or as an alternate access path
   - Collect stats on SI columns; otherwise the optimizer might choose an FTS even when an SI is defined on that column

2. If intermediate tables are used to store results, make sure that they have the same PI as the source and destination tables.

3. Tune to get the optimizer to join on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed across AMPs.

4. For a large list of values, avoid using IN / NOT IN in SQL. Write the large list of values to a temporary table and use this table in the query.

5. Be deliberate about when to use EXISTS / NOT EXISTS conditions, since they ignore unknown comparisons (for example, a NULL value in the column results in unknown). This can lead to results that differ from the equivalent IN / NOT IN query.

6. Inner vs outer joins: check which join works efficiently in the given scenario. Some examples:
   - Outer joins can be used when a large table is joined with small tables (such as a fact table joined with a dimension table on a reference column)
   - Inner joins can be used when only the matching rows are needed, so that no extra data is loaded into spool for processing
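
The statistics diagnostic from approach B and the temporary-table rewrite from step 4 can be sketched together as follows. The orders table, the volatile table vt_cust_ids and the sample values are hypothetical; with the diagnostic switched on, the recommended statistics are listed at the end of the EXPLAIN output.

```sql
-- Hypothetical table: orders(order_id, cust_id, order_amt)

-- Approach B: ask the optimizer to list recommended statistics
-- at the end of each EXPLAIN in this session.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Step 4: put the long value list into a temporary (volatile) table ...
CREATE VOLATILE TABLE vt_cust_ids (
    cust_id INTEGER NOT NULL
) PRIMARY INDEX (cust_id)
ON COMMIT PRESERVE ROWS;

INSERT INTO vt_cust_ids (cust_id) VALUES (1001);
INSERT INTO vt_cust_ids (cust_id) VALUES (1002);
-- (further values would be loaded the same way)

-- ... and join to it instead of writing a long IN (...) list.
EXPLAIN
SELECT o.order_id, o.cust_id, o.order_amt
FROM orders o
INNER JOIN vt_cust_ids v
    ON o.cust_id = v.cust_id;
```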

Please note for outer join conditions:
1. The filter condition for the inner table should be in the "ON" condition.
2. The filter condition for the outer table should be in the "WHERE" condition.
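
A minimal sketch of these two rules for a LEFT OUTER JOIN, where the left table is the outer table and the right table is the inner table. The tables sales_fact and store_dim and their columns are hypothetical.

```sql
-- Hypothetical tables: sales_fact (outer table), store_dim (inner table)
SELECT f.store_id, f.sale_amt, d.store_name
FROM sales_fact f
LEFT OUTER JOIN store_dim d
    ON  f.store_id  = d.store_id
    AND d.region_cd = 'WEST'              -- inner-table filter: ON condition
WHERE f.sale_date >= DATE '2023-01-01';   -- outer-table filter: WHERE condition
```

If the inner-table filter were moved to the WHERE clause instead, the unmatched rows (where d.region_cd is NULL) would be discarded, and the query would behave like an inner join.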