Contents
1. Introduction
    Reduce the Workload
    Balance the Workload
    Parallelize the Workload
    Upgrade
2.3 Eliminate the use of Temporary Tables
2.4 Avoid Mixed Type Expressions
3. Indexes
3.1 Secondary Indexes
3.2 Join Index
3.2.1 Single table Join Index
3.2.2 Aggregate Join Index
3.2.3 Sparse Index
3.2.4 Global (Join) Index
3.2.5 Join Index performance
1. Introduction
The performance of a Teradata SQL-based application depends on several factors, including database design, network latency, query optimization, and hardware specifications. Poorly tuned queries are a frequent cause of performance problems. The objective of tuning a system is either to reduce the response time for end users or to reduce the resources used to process the same work. Both objectives can be achieved in several ways:
Upgrade
A software or hardware upgrade is another means of attaining the desired performance levels. It should be considered only after the options for SQL tuning, data model restructuring, and database tuning have been exhausted. To ensure performance after a major or minor software or hardware upgrade, perform the following tasks on the production system. Recollect statistics where possible.
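As a minimal sketch of recollecting statistics, the statements below refresh and then list the statistics on one table. The table name orders is borrowed from the examples later in this document and is only illustrative; the first statement assumes statistics have already been defined on the table.

```sql
-- Refresh every statistic that has already been defined on the table.
-- (orders is an illustrative table name, not a required one.)
COLLECT STATISTICS ON orders;

-- List the statistics on the table, including when each was last collected.
HELP STATISTICS orders;
```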
Consider the predicates in the SQL statement and the number of rows in the table. Look for suspicious activity, such as full table scans on tables with a large number of rows when the WHERE clause contains selective predicates. Determine why an index is not used for such a selective predicate. A full table scan does not necessarily mean inefficiency. It might be more efficient to perform a full table scan on a small table, or to perform a full table scan to leverage a better join method (for example, a hash join) for the number of rows returned. If any of these conditions are not optimal, consider restructuring the SQL statement or the indexes available on the tables.
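One way to carry out this check is to prefix the statement with EXPLAIN and read the plan text. The sketch below assumes the orders table and customer_id column used in Example 1. In Teradata plan text, a full table scan is typically described as an "all-rows scan", while indexed access names the index being used.

```sql
-- Show the Optimizer's plan without running the query.
-- Look for "all-rows scan" on large tables that have selective predicates.
EXPLAIN
SELECT o.sales_rep_id
FROM orders o
WHERE o.customer_id = 144;
```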
Example 1: This example demonstrates how rewriting a query to use IN can improve performance. The query identifies all employees who have placed orders on behalf of customer 144.

The following SQL statement uses EXISTS:

    SELECT e.employee_id, e.first_name, e.last_name, e.salary
    FROM employees e
    WHERE EXISTS (SELECT 1
                  FROM orders o
                  WHERE e.employee_id = o.sales_rep_id
                  AND o.customer_id = 144);

Rewriting the statement using IN results in significantly fewer resources being used. The SQL statement using IN:

    SELECT e.employee_id, e.first_name, e.last_name, e.salary
    FROM employees e
    WHERE e.employee_id IN (SELECT o.sales_rep_id
                            FROM orders o
                            WHERE o.customer_id = 144);

Explanation: In the query using EXISTS, the parent query performs an unnecessary extra step. The correlated subquery already equijoins employees and orders on employee_id = sales_rep_id and then filters the result for customer_id = 144. That result is the table needed, yet employee_id is compared once more when the parent query is evaluated. In the query using IN, the subquery returns only the orders rows filtered by the specified customer_id, and the join is performed once, by the parent query. The query using IN therefore does much less work than the query using EXISTS.
Example 2: This example demonstrates how rewriting a query to use EXISTS can improve performance. The query identifies all employees from department 80 who are sales reps and have placed orders. The following SQL statement uses IN:
3. Indexes
3.1 Secondary Indexes
Secondary Indexes (SI) supply alternate access paths, and the use of appropriate secondary indexes can increase retrieval performance. For best results, secondary indexes should be based on frequently used set selections and on equality searches. A table can have up to 32 Secondary Indexes, which can be created and dropped dynamically. However, it is not a good idea to create a number of SIs for each table just to speed up set selection, because SIs consume the following extra resources:
- SIs require additional storage to hold their subtables. In the case of a Fallback table, the SI subtables are also Fallback, so twice the additional storage space is required.
- SIs require additional I/O to maintain these subtables.
There are other considerations when deciding whether to define a Non-Unique Secondary Index (NUSI). The Optimizer may choose to do a Full Table Scan rather than use the NUSI in two cases:
- When the NUSI is not selective enough.
- When no COLLECTed STATISTICS are available.
As a guideline, choose only those columns having frequent access as NUSI candidates. After the table has been loaded, create the NUSI indexes, COLLECT STATISTICS on the indexes, and then do an EXPLAIN referencing each NUSI. If the Optimizer chooses a Full Table Scan over using the NUSI, drop the index.
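The guideline above can be sketched as the following sequence. It assumes the orders table and customer_id column from Example 1, and the index name ord_cust_nusi is hypothetical.

```sql
-- 1. After the table has been loaded, create the NUSI
--    (ord_cust_nusi is a hypothetical index name).
CREATE INDEX ord_cust_nusi (customer_id) ON orders;

-- 2. Collect statistics on the index so the Optimizer can judge its selectivity.
COLLECT STATISTICS ON orders INDEX (customer_id);

-- 3. Check the plan for a query that references the indexed column.
EXPLAIN
SELECT o.sales_rep_id
FROM orders o
WHERE o.customer_id = 144;

-- 4. Only if the plan still shows a full table scan, drop the index.
DROP INDEX ord_cust_nusi ON orders;
```

Step 4 is conditional: run it only when the EXPLAIN output shows that the index is being ignored.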
An aggregate join index can be created using:
- SUM function
- COUNT function
- GROUP BY clause
The use of an aggregate join index entails:
- Initial time consumed to calculate and create the index.
- Time to recalculate the aggregate and update the index whenever a value in a join-index column of the base table is updated.
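A minimal sketch of an aggregate join index that uses SUM, COUNT, and GROUP BY is shown below. The index name ord_cust_aji and the order_total column are assumptions made for illustration; only orders and customer_id appear elsewhere in this document.

```sql
-- Pre-aggregate order totals and counts per customer.
-- ord_cust_aji and order_total are hypothetical names.
CREATE JOIN INDEX ord_cust_aji AS
SELECT customer_id,
       SUM(order_total) AS total_amount,
       COUNT(*)         AS order_count
FROM orders
GROUP BY customer_id
PRIMARY INDEX (customer_id);
```

A query that aggregates order_total by customer_id could then be satisfied from the index instead of the base table, at the maintenance cost described above.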
Two functions, RANGE_N and CASE_N, can be used to simplify the specification of a partitioning expression. A Partitioned Primary Index (PPI) improves performance as follows:
- Automatic optimization occurs for queries that specify a restrictive condition on the partitioning column.
- Partition elimination improves the efficiency of range searches when, for example, the table is range partitioned.
- Only the rows of the partitions that qualify for a query need to be accessed, which avoids full table scans.
- A PPI provides an access path to the rows in the base table while still providing efficient join strategies.
- If the same partition is consistently targeted, the part of the table being updated may fit largely in cache, significantly boosting performance.
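As an illustration of RANGE_N and partition elimination, the sketch below defines a table partitioned by month and then queries a single month. The table, its columns, and the date range are all hypothetical.

```sql
-- Hypothetical table partitioned into one partition per month of 2023.
CREATE TABLE sales_history (
    sale_id   INTEGER NOT NULL,
    sale_date DATE NOT NULL,
    amount    DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
              EACH INTERVAL '1' MONTH
);

-- The restrictive condition on the partitioning column lets the Optimizer
-- read only the March partition instead of scanning the whole table.
SELECT SUM(amount)
FROM sales_history
WHERE sale_date BETWEEN DATE '2023-03-01' AND DATE '2023-03-31';
```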
The benefits of using a PPI vary based on:
- The number of partitions defined.
- The number of partitions that can be eliminated given the query workloads.
- Whether or not the user follows an update strategy that takes advantage of partitioning.
However, the main disadvantages of using a PPI are:
- The PI access disadvantage occurs only when the partitioning column is not part of the PI. In this situation, a query specifying a PI value, but no value for the partitioning column, must look in each partition for that value instead of positioning directly to the first row for the PI value.
- The direct join disadvantage occurs when another table with the same PI is joined with an equality condition on every PI column. For two non-PPI tables, the rows of the two tables are ordered the same way, and the join can be performed directly. If one of the tables is partitioned, the rows are not ordered the same way, and the task in effect becomes a set of sub-joins, one for each partition of the PPI table.
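The PI access disadvantage can be sketched with a hypothetical table whose partitioning column is not part of the PI. All names and values below are illustrative.

```sql
-- Hypothetical table: PI on shipment_id, partitioned by ship_date,
-- so the partitioning column is not part of the PI.
CREATE TABLE shipments (
    shipment_id INTEGER NOT NULL,
    ship_date   DATE NOT NULL
)
PRIMARY INDEX (shipment_id)
PARTITION BY RANGE_N (
    ship_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
              EACH INTERVAL '1' MONTH
);

-- PI value only: the row could be in any partition, so each one is probed.
SELECT * FROM shipments
WHERE shipment_id = 1001;

-- PI value plus partitioning column: access goes to a single partition.
SELECT * FROM shipments
WHERE shipment_id = 1001
  AND ship_date = DATE '2023-03-15';
```

Where the workload allows it, supplying a value for the partitioning column alongside the PI value avoids the per-partition probing.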