
Collecting Statistics

The Teradata Parsing Engine (PE), or optimizer, follows the saying "If you fail to PLAN, you PLAN to fail." The PE takes a user's SQL, optimizes it, and comes up with a PLAN for the AMPs to follow. The PE is the boss and the AMPs are the workers. Ask yourself two questions:

The Teradata Parsing Engine (PE) is the best optimizer in the data warehouse world, but it needs you to COLLECT STATISTICS so it can optimize its work. The statistics allow the optimizer to use its vast experience to PLAN the best way to fulfill the query request. Accurate table demographics are particularly important to the optimizer when data is skewed.

The purpose of the COLLECT STATISTICS command is to gather and store demographic data for one or more columns or indices of a table or join index. The command computes a statistical profile of the collected data and stores the synopsis in the Data Dictionary (DD) for use during the PE's optimizing phase of SQL statement parsing. The optimizer uses this synopsis to generate efficient table access and join plans.

Let's review: the Parsing Engine Processor (PEP), also referred to as the optimizer, takes SQL requests from a user and comes up with a plan for the Access Module Processors (AMPs) to execute. The PEP uses statistics to come up with the most cost-efficient plan. You must COLLECT STATISTICS on any columns or indices of a table that you want the optimizer to use with high confidence.

If statistics are not collected, the PE randomly chooses one AMP and asks it a series of questions. The PEP then extrapolates from that AMP's answers, using the total number of AMPs, to estimate the number of rows in the entire table. This "guess-timate" can be inaccurate, especially if the data is skewed.

You should COLLECT STATISTICS on all tables. You can also COLLECT STATISTICS on global temporary tables, but not on volatile tables.

We recommend that you refresh the statistics whenever the number of rows in a table changes by 10% or more. For example, a MultiLoad job may INSERT a million records into a 9 million-row table. Since the table has grown by roughly 11%, it is definitely time to refresh the statistics. In practice, we refresh statistics by running the COLLECT STATISTICS command again any time the table changes by more than 10%.

The first time you collect statistics, you collect them at the index or column level. After that, you collect statistics at the table level, and every column and index collected previously is collected again.

It is a mistake to collect statistics only once and never again. In reality, it is better to have no statistics than ridiculously incorrect statistics, because the optimizer is gullible and believes the statistics no matter how inaccurate they are. Collecting statistics is rough on system resources, so it is best done at night in a batch job or during other off-peak times.

You can see which statistics have been collected on a table, and the date and time they were last collected, with the HELP STATISTICS command.
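As a minimal sketch of checking what has been collected, assuming the TomC.Employee table used in the examples later in this section, the HELP STATISTICS request would look like this:

```sql
-- List the statistics collected on the table, including the date and time
-- of the last collection for each column or index.
-- TomC.Employee is borrowed from the later examples in this section.
HELP STATISTICS TomC.Employee;
```

The exact columns in the output can vary by release, so treat the comment above as a description of intent, not of the precise report layout.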

Here are some excellent guidelines on what you should collect statistics on:

- All non-unique indices
- Non-index join columns
- The Primary Index of small tables
- The Primary Index of a join index
- Secondary indices defined on any join index
- Join index columns that frequently appear in WHERE search conditions
- Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins

The two key commands for collecting and dropping statistics are COLLECT STATISTICS and DROP STATISTICS.
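To illustrate the DROP side, here is a minimal sketch, assuming the TomC.Employee table and the dept column used in the examples elsewhere in this section:

```sql
-- Drop the statistics on a single column (dept is taken from the later examples).
DROP STATISTICS ON TomC.Employee COLUMN dept;

-- Drop every statistic that has been collected on the table.
DROP STATISTICS ON TomC.Employee;
```

Dropping at the table level removes all collected statistics at once, so a later table-level refresh would have nothing left to recollect.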

Release V2R5 allows the collection of statistics on multiple non-indexed columns in a table. This feature helps the optimizer better estimate the number of qualifying rows when these columns are used in the WHERE clause. Prior to V2R5, you could collect statistics on only a single column in a single COLLECT STATISTICS command.

Here is the syntax to collect statistics for V2R5:
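The original syntax diagram is not reproduced here. As a rough sketch of the general shape, assuming the commonly documented form in which ON tablename comes first, and using the TomC.Employee table from the later examples:

```sql
-- General shape (square brackets mark optional parts, they are not literal syntax):
--   COLLECT STATISTICS [USING SAMPLE] ON tablename COLUMN columnname;
--   COLLECT STATISTICS [USING SAMPLE] ON tablename COLUMN (col1, col2);
--   COLLECT STATISTICS [USING SAMPLE] ON tablename INDEX (col1, col2);

-- One concrete instance of the first shape:
COLLECT STATISTICS ON TomC.Employee COLUMN dept;
```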

Or, in CREATE INDEX style:
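In this style the column or index comes first and the table name follows ON, as in a CREATE INDEX statement. A short sketch, assuming the TomC.Employee table and the dept, lname, and fname columns from the examples in this section:

```sql
-- CREATE INDEX style: the object being collected precedes ON tablename.
COLLECT STATISTICS COLUMN dept ON TomC.Employee;
COLLECT STATISTICS INDEX (lname, fname) ON TomC.Employee;
```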

Here is an example of collecting statistics on the columns dept and emp_no and on the multi-column index (lname, fname) of the Employee table (V2R4):
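The statements would look roughly like the following. This is a reconstruction, assuming the table is the TomC.Employee table named later in this section and the older one-column-per-statement form:

```sql
-- V2R4 style: one COLLECT STATISTICS statement per column or index.
COLLECT STATISTICS ON TomC.Employee COLUMN dept;
COLLECT STATISTICS ON TomC.Employee COLUMN emp_no;
COLLECT STATISTICS ON TomC.Employee INDEX (lname, fname);
```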

The above commands still work; however, in V2R5 they can be run as:
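A plausible reconstruction of the V2R5 form, assuming the multi-column COLUMN (...) syntax and the TomC.Employee table named later in this section:

```sql
-- V2R5 style: several columns named in a single statement.
-- Assumption: the original example grouped dept and emp_no this way.
COLLECT STATISTICS ON TomC.Employee COLUMN (dept, emp_no);
COLLECT STATISTICS ON TomC.Employee INDEX (lname, fname);
```

Whether the grouped form produces the same per-column statistics as the separate statements, or statistics on the combination of the two columns, is worth verifying against the documentation for your release.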

Notice that with V2R5, you can now collect statistics on more than one column in a single statement. This saves time because the statistics for all of the columns are collected in a single pass of the table instead of one pass per column. Additionally, the USING SAMPLE option is available only from V2R5 onward. The next example, which collects sampled statistics at the column level, is for V2R5 and later:

COLLECT STATISTICS USING SAMPLE column lname on TomC.Employee;

You will always COLLECT STATISTICS on a column or index one at a time initially. You must use the COLLECT STATISTICS command for each column or index you want to collect in a table. In the above examples, we collected statistics on the column dept and the index (lname, fname).

You can collect statistics at either the column or the index level. It is best to COLLECT STATISTICS at the column level unless you are dealing with a multi-column index: collect at the index level only for multi-column indices, and collect columns and single-column indices at the column level. Statistics on a single-column index are equivalent to statistics collected at the column level on the same column. Plus, if you collect at the index level and later drop the index, you lose the statistics.

The Employee table now has statistics defined on it. Although you must collect statistics the first time at the column or index level, you collect at the TABLE LEVEL for every refresh. Here is an example of COLLECTING STATISTICS at the table level (V2R5 and prior):
COLLECT STATISTICS on TomC.Employee;

The system will refresh the statistics on the columns and indices it had previously collected on the table. You cannot request sampling at the table level, only at the column or index level.
