
Oracle Data Architecture

Introduction to Performance Tuning

Performance tuning focuses primarily on writing efficient SQL, allocating appropriate computing resources, and analyzing wait events and contention in the system. Unlike several other areas of Oracle database management, performance tuning isn't a cut-and-dried subject with clear prescriptions and rules for every type of problem you may face. This is one area where your technical knowledge must be used together with constant experimentation and observation.

A Systematic Approach to Performance Tuning

It's important to follow a systematic approach to tuning database performance. Performance problems commonly come to the fore only after a large number of users start working on a new production database. The system seems fine during development and rigorous testing, but it slows down to a crawl when it goes to production. This could be because the application isn't easily scalable for a number of reasons.

The seeds of the future performance potential of your database are planted when you design your database. You need to know the nature of the applications the database is going to support. The more you understand your application, the better you can prepare for it by creating your database with the right configuration parameters. If major mistakes were made during the design stage and the database is already built, you're left with tuning application code on the one hand and database resources such as memory, CPU, and I/O on the other. Oracle suggests a specific design approach with the following steps:

1. Design the application correctly.
2. Tune the application SQL code.
3. Tune memory.
4. Tune I/O.
5. Tune contention and other issues.

Proactive and Reactive Performance Tuning

Although the preceding performance tuning steps suggest that you can follow the sequence in an orderly fashion, the reality is completely different. Performance tuning is an iterative process, not a sequential one where you start at the top and end up with a fully tuned database as the product. As a DBA, you may be involved in a new project from the outset, when you have just the functional requirements. In this case, you have an opportunity to be involved in the tuning effort from the beginning stages of the application, a phase that is somewhat misleadingly dubbed proactive tuning by some. Alternatively, you may come in after the application has already been designed and implemented and is in production. In this case, your performance efforts are categorized as reactive performance tuning. What you can do to improve the performance of the database depends on the stage at which you can have input, and on the nature of the application itself.

Optimizing Oracle Query Processing

When a user starts a data-retrieval operation, the user's SQL statement goes through several sequential steps that together constitute query processing. One of the great benefits of the SQL language is that it isn't a procedural language in which you have to specify the steps to be followed to achieve the statement's goal. In other words, you don't have to state how to do something; rather, you just state what you need from the database.

Query processing is the transformation of your SQL statement into an efficient execution plan to return the requested data from the database. Query optimization is the process of choosing the most efficient execution plan. The goal is to achieve the result with the least cost in terms of resource usage. Resources include the I/O and CPU usage on the server where your database is running. This also means that the goal is to reduce the total execution time of the query, which is simply the sum of the execution times of all the component operations of the query. This optimization of throughput may not be the same as minimizing response time. If you want to minimize the time it takes to get the first n rows of a query instead of the entire output of the query, the Optimizer may choose a different plan. If you want to minimize the response time for all the query data, you may also choose to parallelize the operation.

A user's SQL statement goes through the parsing, optimizing, and execution stages. If the SQL statement is a query, data has to be retrieved, so there's an additional fetch stage before the SQL statement processing is complete.

Parsing

Parsing primarily consists of checking the syntax and semantics of SQL statements. The end product of the parse stage of query compilation is the creation of the parse tree, which represents the query's structure. The SQL statement is decomposed into a relational algebra query that's analyzed to see whether it's syntactically correct. The query then undergoes semantic checking. The data dictionary is consulted to ensure that the tables and the individual columns that are referenced in the query do exist, as do all the required object privileges. In addition, the column types are checked to ensure that the data matches the column definitions. The statement is normalized so it can be processed more efficiently. The query is rejected if it is incorrectly formulated. Once the parse tree passes all the syntactic and semantic checks, it's considered a valid parse tree, and it's sent to the logical query plan generation stage. All these operations take place in the library cache portion of the SGA.

Optimization

During the optimization phase, Oracle uses its optimizer, which is a cost-based optimizer (CBO), to choose the best access method for retrieving data for the tables and indexes referred to in the query. Using statistics that you provide and any hints specified in the SQL queries, the CBO produces an optimal execution plan for the SQL statement.

Execution Plan Generation Phase

During this phase, Oracle transforms the logical query plan into a physical query plan. The Optimizer may be faced with a choice of several algorithms to resolve a query. It needs to choose the most efficient algorithm to answer a query, and it needs to determine the most efficient way to implement the operations. In addition to deciding on the best operational steps, the Optimizer determines the order in which it will perform these steps. For example, the Optimizer may decide that a join between table A and table B is called for. It then needs to decide on the type of join and the order in which it will perform the table join. The physical query or execution plan takes into account the following factors:

- The various operations (for example, joins) to be performed during the query
- The order in which the operations are performed
- The algorithm to be used for performing each operation
- The best way to retrieve data from disk or memory

The Optimizer may generate several valid physical query plans, all of which are potential execution plans.
The Optimizer then chooses among them by estimating the cost of each possible physical plan based on the table and index statistics available to it, and selecting the plan with the lowest estimated cost. This evaluation of the possible physical query plans is called cost-based query optimization. The cost of executing a plan is directly proportional to the amount of resources, such as I/O, memory, and CPU, necessary to execute the proposed plan. The Optimizer passes this low-cost physical query plan to Oracle's query execution engine.

An Example of Cost Optimization

Let's say you want to run the following query, which seeks to find all the supervisors who work in Dallas. The query looks like this:
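The query itself isn't reproduced in this copy. A plausible form, assuming an employee table with job and dept_no columns and a dept table with dept_no and city columns (all names here are illustrative), would be:

```sql
-- Hypothetical schema: employee(emp_id, job, dept_no), dept(dept_no, city)
SELECT e.emp_id
FROM   employee e, dept d
WHERE  e.dept_no = d.dept_no
AND    e.job     = 'SUPERVISOR'
AND    d.city    = 'Dallas';
```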

Now, you have several ways to arrive at the list of the supervisors. Let's consider three ways to arrive at this list, and compute the cost of accessing the results in each of the three ways. Make the following simplifying assumptions for the cost computations:

- You can only read and write data one row at a time (in the real world, you do I/O at the block level, not the row level).
- The database writes each intermediate step to disk (again, this may not be the case in the real world).
- No indexes are on the tables.
- The employee table has 2,000 rows.
- The dept table has 40 rows.
- The number of supervisors is also 40 (one for each department).
- Ten departments are in the city of Dallas.

In the following sections, you'll see three different queries that retrieve the same data, but that use different access methods. For each query, a crude cost is calculated, so you can compare how the three queries stack up in terms of resource cost. The first query uses a Cartesian join.
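The details of the first query's strategy are not shown in this copy, but under the stated assumptions its cost can be sketched. A Cartesian join pairs every employee row with every dept row before any filtering, producing 2,000 × 40 = 80,000 intermediate rows:

```sql
-- Query 1 strategy (sketch): materialize the full Cartesian product first,
-- then apply all the predicates (including the join condition) afterwards.
SELECT *
FROM   employee e, dept d;   -- reading both tables: 2,000 + 40 = 2,040 reads
-- Writing the Cartesian product to disk: 2,000 x 40 = 80,000 writes
-- Reading it back to filter on dept_no, job, and city: 80,000 reads
-- Total I/O cost: 2,040 + 80,000 + 80,000 = 162,040
```

The arithmetic follows directly from the row counts assumed above; the point is simply that the Cartesian strategy dwarfs the costs of the two queries that follow.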

Query 2: A Join of Two Tables

The second query uses a join of the employee and dept tables. First, join the employee and dept tables on the dept_no column. From this join, select all rows where e.job = 'SUPERVISOR' and city = 'Dallas'. The following would be the total cost of performing the query:

- Joining the employee and dept tables first requires a read of all the rows in both tables: 2,000 + 40 = 2,040

- Creating the join of the employee and dept tables: 2,000 writes
- Reading the join results: 2,000 reads
- Total I/O cost: 2,040 + 2,000 + 2,000 = 6,040

Query 3: A Join of Reduced Relations

The third query also uses a join of the employee and dept tables, but not all the rows in the two tables; only selected rows from the two tables are joined. Here's how this query would proceed to retrieve the needed data. First, read the employee table to get all supervisor rows. Next, read the dept table to get all Dallas departments. Finally, join the rows you derived from the employee and the dept tables. The following would be the total cost of performing the query:

- Reading the employee table to get the supervisor rows: 2,000 reads
- Writing the supervisor rows derived in the previous step: 40 writes
- Reading the dept table to get all Dallas departments: 40 reads
- Writing the Dallas department rows derived from the previous step: 10 writes
- Joining the supervisor rows and department rows derived in the previous steps: a total of 40 + 10 = 50 writes
- Reading the join result from the previous step: 50 reads
- Total I/O cost: 2,000 + 2(40) + 10 + 2(50) = 2,190

This example, simplified as it may be, shows you that Cartesian products are more expensive than more restrictive joins. Even a selective join operation, the results show, is more expensive than a selection operation. Although there is a join operation in query 3, it's a join of two reduced relations; the size of the join is much smaller than the join in query 2. Query optimization often involves early selection (picking only some rows) and projection (picking only some columns) operations to reduce the size of the resulting outputs or row sources.
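Query 3's early-selection strategy can be expressed directly in SQL by filtering each table before the join (column names are assumed for illustration):

```sql
-- Filter each table down first, then join the two small row sources
SELECT e.emp_id, d.dept_no
FROM   (SELECT * FROM employee WHERE job  = 'SUPERVISOR') e  -- 40 rows survive
JOIN   (SELECT * FROM dept     WHERE city = 'Dallas')     d  -- 10 rows survive
ON     e.dept_no = d.dept_no;
```

Whether or not you write the query this way, a cost-based optimizer will generally try to push the job and city predicates down before performing the join, for exactly the cost reasons computed above.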

Query Execution

During the final stage of query processing, the optimized query (the physical query plan that has been selected) is executed. If it's a SELECT statement, the rows are returned to the user. If it's an INSERT, UPDATE, or DELETE statement, the rows are modified. The SQL execution engine takes the execution plan provided by the optimization phase and executes it.

Of the three steps involved in SQL statement processing, the optimization process is the crucial one, because it determines the all-important question of how fast your data will be retrieved. Understanding how the Optimizer works is at the heart of query optimization. It's important to know what the common access methods, join methods, and join orders are in order to write efficient SQL. The next section presents a detailed discussion of the all-powerful Oracle CBO.

Query Optimization and the Oracle Cost-Based Optimizer

In most cases, you have multiple ways to execute a SQL query. You can get the same results from doing a full table scan or using an index. You can also retrieve the same data by accessing the tables and indexes in a different order. The job of the Optimizer is to find the optimal or best plan to execute your DML statements, such as SELECT, INSERT, UPDATE, and DELETE. Oracle uses the CBO to help determine efficient methods to execute queries. The CBO uses statistics on tables and indexes, the order of tables and columns in the SQL statements, available indexes, and any user-supplied access hints to pick the most efficient way to access them. The most efficient way, according to the CBO, is the least costly access method, cost being defined in terms of the I/O and the CPU expended in retrieving the rows. Accessing the necessary rows means Oracle reads the database blocks from the file system into the buffer pool. The resulting I/O cost is the most expensive part of SQL statement execution because it involves reading from disk. You can examine these access paths by using tools such as EXPLAIN PLAN.

Choosing Your Optimization Mode

In older versions of Oracle, you had a choice between a rule-based and a cost-based optimizer. In the rule-based approach, all the access paths were assigned a rank, and the path with the lowest rank was chosen. The operations with a lower rank usually executed faster than those with a higher rank. For example, a query that uses the ROWID to search for a row has a cost of 1. This is expected because identifying a row with the help of the ROWID, an Oracle pointer-like mechanism, is the fastest way to locate a row. On the other hand, a query that uses a full table scan has a cost of 19, the highest possible cost under rule-based optimization. The CBO method almost always performs better than the older rule-based approach because, among other things, it takes into account the latest statistics about the database objects.

Providing Statistics to the Optimizer

In Oracle Database 10g, the database itself automatically collects the necessary Optimizer statistics. Every night, the database schedules a statistics collection job during the maintenance window of the Oracle Scheduler. The maintenance window, by default, extends from 10 PM to 6 AM on weekdays and all weekend as well. The job is named GATHER_STATS_JOB and runs by default in every Oracle Database 10g database. You have the ability to disable the GATHER_STATS_JOB if you wish. You can get details about this default GATHER_STATS_JOB by querying the DBA_SCHEDULER_JOBS view.

The GATHER_STATS_JOB collects statistics for all tables that either don't have Optimizer statistics or have stale (outdated) statistics. Oracle considers an object's statistics stale if more than 10 percent of its data has changed since the last time it collected statistics for that object. By default, Oracle monitors all DML changes, such as inserts, updates, and deletes, made to all database objects. You can also view information about these changes in the DBA_TAB_MODIFICATIONS view. Based on this default object monitoring, Oracle decides whether to collect new statistics for an object.
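For example, a query along these lines shows the DML activity Oracle has tracked since statistics were last gathered (the schema name is just an example):

```sql
SELECT table_owner, table_name, inserts, updates, deletes, timestamp
FROM   dba_tab_modifications
WHERE  table_owner = 'HR'          -- example schema
ORDER  BY table_name;
```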

To check that the GATHER_STATS_JOB is indeed collecting statistics on a regular basis, use the following:
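The original query isn't reproduced in this copy; one simple check (a sketch, not necessarily the author's exact query) is to look at when each table was last analyzed and how large the sample was:

```sql
SELECT table_name, last_analyzed, sample_size
FROM   dba_tables
WHERE  owner = 'HR'                -- example schema
ORDER  BY last_analyzed;
```

If the GATHER_STATS_JOB is running normally, the LAST_ANALYZED dates should be recent for tables whose data has been changing.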

Note the following points about the preceding output:

- The job collects statistics during the maintenance window of the database, which is, by default, scheduled between 10 PM and 6 AM on weekdays and all weekend.
- The statistics are collected by the nightly GATHER_STATS_JOB run by the Scheduler.
- If a table is created that day, the job uses all the rows of the table the first time it collects statistics for the table.
- The sampling percentage varies from less than 1 percent to 100 percent.
- The size of the table and the percentage of the sample aren't correlated.
- The job doesn't collect statistics for all the tables each day.
- If a table's data doesn't change after it's created, the job never collects statistics for it a second time.

Oracle determines the sample size for each object based on its internal algorithms; there is no standard sample size for all objects.

The statistics collected include the following:

- The number of rows in a table
- The number of rows per database block
- The average row length
- The total number of database blocks in a table
- The number of levels in each index
- The number of leaf blocks in each index
- The number of distinct values in each column of a table
- Data distribution histograms
- The number of distinct index keys
- Cardinality (the number of rows with similar values for each column)
- The minimum and maximum values for each column
- System statistics, which include I/O characteristics of your system, and CPU statistics, which include CPU speed and other related statistics

The key to the CBO's capability to pick the best possible query plan is its capability to correctly estimate the cost of the individual operations of the query plan. These cost estimates are derived from knowledge about the I/O, CPU, and memory resources needed to execute each operation, based on the table and index statistics. The database stores the Optimizer statistics that it collects in its data dictionary. The DBA_TAB_STATISTICS view shows Optimizer statistics for all the tables in your database. You can also see column statistics by querying the DBA_TAB_COL_STATISTICS view.
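The query output discussed next isn't reproduced here; a query along these lines against the PERSONNEL table mentioned in the following paragraph would show such column statistics:

```sql
SELECT column_name, num_distinct, num_nulls, density
FROM   dba_tab_col_statistics
WHERE  table_name = 'PERSONNEL';
```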

As you can see, more than 22 million PERSON_ID numbers are in the PERSONNEL table. However, there are only 7,281 distinct last names and 1,729 distinct first names. Of course, the GENDER column has only two distinct values. The Optimizer takes into account these types of information regarding your table data before deciding on the best plan of execution for a SQL statement that involves the table's columns.

Tip: Optimizer statistics include both object (table and index) statistics and system statistics. Without accurate system statistics, the Optimizer can't come up with valid cost estimates to evaluate alternative execution plans.

What Does the Optimizer Do?

The CBO performs several intricate steps to arrive at the optimal execution plan for a user's query. The original SQL statement is most likely transformed, and the CBO evaluates alternative access paths (for example, full-table or index-based scans). If table joins are necessary, the Optimizer evaluates all possible join methods and join orders. The Optimizer evaluates all the possibilities and arrives at the execution plan it deems the cheapest in terms of total cost, which includes both I/O and CPU resource usage cost.

SQL Transformation

Oracle hardly ever executes your query in its original form. If the CBO determines that a different SQL formulation will achieve the same results more efficiently, it transforms the statement before executing it. A good example is where you submit a query with an OR condition, and the CBO transforms it into a statement using UNION or UNION ALL. Or your statement may include an index, but the CBO might transform the statement so it can do a full table scan, which can be more efficient under some circumstances. In any case, it's good to remember that the query a user wishes to be executed may not be executed in the same form by Oracle, but the query's results are still the same. Here are some common transformations performed by the Oracle CBO:

- Transform IN into OR statements.
- Transform OR into UNION or UNION ALL statements.
- Transform noncorrelated nested selects into more efficient joins.
- Transform outer joins into more efficient inner joins.
- Transform complex subqueries into joins, semi-joins, and anti-joins.
- Perform star transformation for data warehouse tables based on the star schema.
- Transform BETWEEN into greater-than-or-equal-to and less-than-or-equal-to statements.
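As an illustration of the OR transformation (table and column names are hypothetical), an OR condition on two different columns may be rewritten as a UNION ALL, with a guard on the second branch so no row is returned twice:

```sql
-- Original statement
SELECT * FROM employee
WHERE  dept_no = 10 OR job = 'SUPERVISOR';

-- A transformation of the kind the CBO might apply (sketch)
SELECT * FROM employee
WHERE  dept_no = 10
UNION ALL
SELECT * FROM employee
WHERE  job = 'SUPERVISOR'
AND    (dept_no <> 10 OR dept_no IS NULL);  -- exclude rows the first branch returned
```

The rewritten form can use a separate index on each branch, where the original OR predicate might have forced a full table scan.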

Choosing the Access Path

Oracle can often access the same data through different paths. For each query, the Optimizer evaluates all the available paths and picks the least expensive one in terms of resource usage. If joins are involved, the join order and the join method are evaluated to finally arrive at the best execution plan. You'll take a brief look at the steps the Optimizer goes through before deciding on its choice of execution path.

Full Table Scans

Oracle scans the entire table during a full table scan. Oracle reads each block in the table sequentially, so the full table scan can be efficient if the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter is set high enough. However, for large tables, full table scans are inefficient in general.
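For example, you can raise the number of blocks read per I/O call during full scans like this (the value 16 is just an illustration; the right value depends on your I/O subsystem):

```sql
-- Each multiblock I/O call during a full table scan can then read up to 16 blocks
ALTER SYSTEM SET db_file_multiblock_read_count = 16;
```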

Table Access by ROWID

Accessing a table by ROWID retrieves rows using their unique ROWIDs. A ROWID in Oracle specifies the exact location in the data file and the data block where the row resides, so ROWID access is the fastest way to retrieve a row in Oracle. Often, Oracle obtains the ROWIDs through an index scan of the table's indexes. Using these ROWIDs, Oracle swiftly fetches the rows.

Index Scans

An index stores two things: the value of the column on which the index is based and the ROWID of the rows in the table that contain that column value. An index scan retrieves data from an index using the values of the index columns. If the query requests only the indexed column values, Oracle will return those values. If the query requests other columns outside the indexed columns, Oracle will use the ROWIDs to get the rows of the table.
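For example, with an index on last_name (hypothetical table and columns), the first query below can be answered from the index alone, while the second also needs the ROWIDs to visit the table for the salary column:

```sql
CREATE INDEX emp_lname_idx ON employee (last_name);

-- Index-only access: the requested column is stored in the index itself
SELECT last_name FROM employee WHERE last_name = 'Smith';

-- Index scan, then table access by ROWID for the extra column
SELECT last_name, salary FROM employee WHERE last_name = 'Smith';
```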

Choosing the Join Method

When you need to access data that's in two or more tables, Oracle joins the tables based on a common column. However, there are several ways to join the row sets returned from the execution plan steps. For each statement, Oracle evaluates the best join method based on the statistics and the type of unique or primary keys on the tables. After Oracle has evaluated the join methods, the CBO picks the join method with the least cost. The following are the common join methods used by the CBO:

- Nested-loop join: A nested-loop join involves the designation of one table as the driving table (also called the outer table) in the join loop. The other table in the join is called the inner table. Oracle fetches the matching rows of the inner table for every row in the driving table.
- Hash join: When you join two tables, Oracle uses the smaller table to build a hash table on the join key. Oracle then scans the larger table, probing the hash table to find the joined rows.
- Sort-merge join: If both the tables in a join are sorted on the join key, the sorted lists are merged together.
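You can also ask the CBO for a particular join method with optimizer hints; for instance (tables and columns are hypothetical):

```sql
-- Request a hash join between employee and dept
SELECT /*+ USE_HASH(e d) */ e.last_name, d.city
FROM   employee e, dept d
WHERE  e.dept_no = d.dept_no;

-- Request a nested-loop join with employee as the driving table
SELECT /*+ LEADING(e) USE_NL(d) */ e.last_name, d.city
FROM   employee e, dept d
WHERE  e.dept_no = d.dept_no;
```

Hints override the CBO's own choice, so they are best reserved for cases where you have verified that the hinted plan really is faster.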

Choosing the Join Order

Once the Optimizer chooses the join method, it determines the order in which the tables are joined. The goal of the Optimizer is always to join tables in such a way that the driving table eliminates the largest number of rows. A query with four tables has a maximum of 4 factorial, or 24, possible orders in which the tables can be joined. Each such join order would lead to a number of different execution plans, based on the available indexes and the access methods. The search for an optimal join strategy could take a long time in a query with a large number of tables, so Oracle uses an adaptive search strategy to limit the time it takes to find the best execution plan. An adaptive search strategy means that the time taken for optimization is always a small percentage of the total time taken for execution of the query itself.

Drawbacks of the CBO

The CBO is systematic, but the Optimizer is not guaranteed to follow the same plan in similar cases. Moreover, the CBO isn't always perfect, and you need to watch out for the following:

- Application developers may know more than the CBO when it comes to choosing the best access path. Application developers know the needs of the users, of which the CBO is completely unaware. This could lead to a situation where the CBO is optimizing for throughput when the users would rather have a quick set of results on their screen. By using hints such as FIRST_ROWS_n, you can overcome this drawback.
- The CBO depends enormously on correct statistics gathering. If the statistics are absent or outdated, the Optimizer can make poor decisions.
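For example, to ask the Optimizer to favor a fast return of the first ten rows rather than overall throughput (table and column names are hypothetical):

```sql
SELECT /*+ FIRST_ROWS(10) */ last_name, job
FROM   employee
WHERE  dept_no = 10;
```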

Providing Statistics to the CBO

The CBO can follow optimal execution paths only if it has detailed knowledge of the database objects. In Oracle Database 10g, the recommended way to provide these statistics is by letting the database automatically collect statistics for you. This is known as the Automatic Optimizer Statistics Collection feature. You can also manually provide statistics to the Optimizer with the DBMS_STATS package.

Using DBMS_STATS to Collect Statistics

Although letting the database automatically collect Optimizer statistics is the recommended approach under Oracle Database 10g, you can still manually collect Optimizer statistics using the DBMS_STATS package.

Collecting the Statistics

The DBMS_STATS package has several procedures that let you collect data at different levels. The main data collection procedures for database table and index data are as follows:
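The procedure list isn't reproduced in this copy. For table and index data, the commonly used procedures are GATHER_TABLE_STATS and GATHER_INDEX_STATS (with GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS as the wider-scoped variants). For example (schema, table, and index names are illustrative):

```sql
-- Collect table statistics; CASCADE => TRUE also gathers the table's index statistics
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'HR',
    tabname          => 'EMPLOYEES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/

-- Collect statistics for a single index
BEGIN
  DBMS_STATS.GATHER_INDEX_STATS(ownname => 'HR', indname => 'EMP_LNAME_IDX');
END;
/
```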

Frequency of Statistics Collection

Theoretically, if your data is static, you may only need to collect statistics once. If your database performs only a small amount of DML activity, you may collect statistics at relatively long intervals, say weekly or monthly. However, if your database objects go through constant change on a daily basis, you need to schedule the statistics collection jobs much more frequently, say daily or even more often. One of the best Oracle Database 10g new features is that you can let the database decide the frequency of statistics collection. Remember that the database bases its statistics collection on whether the statistics are fresh or stale. Thus, you can relax and let the database be the arbiter of how often to collect statistics.

What Happens When You Don't Have Statistics

You've seen how the Oracle database can automatically collect Optimizer statistics for you. You've also learned how to use the DBMS_STATS package to collect the statistics manually yourself. But what happens if you disable the automatic statistics collection process, or if you don't collect statistics in a timely fashion? Even with automatic statistics collection, under which the necessary statistics are collected on a nightly basis, you may have a situation where table data is altered after the statistics collection process is over. In situations such as this, Oracle uses data such as the number of blocks taken up by the table data, along with other ancillary information, to figure out the Optimizer execution plan.

You can also use the initialization parameter OPTIMIZER_DYNAMIC_SAMPLING to let Oracle estimate Optimizer statistics on the fly, when no statistics exist for a table, or when the statistics exist but are too old or otherwise unreliable. Of course, sampling statistics dynamically means that the compile time for the SQL statement involved will be longer. Oracle smartly figures out whether the increased compile time is worth it when it encounters objects without statistics. If it's worth it, Oracle will sample a portion of the object's data blocks to estimate statistics. You need to set the value of the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter to 2 or higher to enable dynamic sampling of all unanalyzed tables. Because the default for this parameter is 2, dynamic sampling is turned on by default in your database. Thus, you need not spend sleepless nights worrying about objects with missing or outdated statistics.

In any case, if you adhere to Oracle's recommendation and use the Automatic Optimizer Statistics Collection feature, the GATHER_STATS_JOB will automatically collect your database's statistics. The GATHER_STATS_JOB is created at database creation time and is managed by the Oracle Scheduler, which runs the job when the maintenance window is opened. By default, the maintenance window opens every night from 10 PM to 6 AM, and all day on weekends. Oracle will collect statistics for all objects that need them if you adopt the Automatic Optimizer Statistics Collection feature. The feature is turned on by default in a new Oracle 10g database or when you upgrade to the 10g release from an older version.
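For example, you can inspect and adjust the dynamic sampling level like any other initialization parameter (SHOW PARAMETER is a SQL*Plus command):

```sql
-- Check the current level (default is 2 in Oracle Database 10g)
SHOW PARAMETER optimizer_dynamic_sampling

-- Enable dynamic sampling of all unanalyzed tables
ALTER SYSTEM SET optimizer_dynamic_sampling = 2;
```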
