Oracle Partitioning For Developers

Lewis R Cunningham, EnterpriseDB

Who is this presentation for?

In many of the most recent clients I have worked with and for, it has been left to the developers to design database schemas and make the performance of the application meet user requirements. These are both tasks that have traditionally been left to DBAs but as the DBA job changes, so does the developer job. This presentation is geared primarily towards developers or those who might need partitioning but have not used it as of yet. I will only briefly touch on the maintenance and availability aspects of partitioning, which mainly benefit DBAs, in favor of the advantages to applications.
What is Partitioning?
First, I want to say that different databases do things with partitioning very differently. In this presentation, I am speaking specifically about Oracle partitioning, not any other database.

A database partition is a way to physically separate the data in large tables without having to rewrite your application. The main benefit to an application is performance. Once a table gets over a certain size, queries and other DML begin to slow. Even with judicious indexing, large tables just take more time to scan.

To explain a partition adequately, there are some key concepts that need to be understood.

Data file
A data file is the physical OS disk file that stores data. All data in an Oracle database ends up in a data file. A tablespace may be made up of 1 or more data files.

Segment
A segment is a logical container for an object in an Oracle database. A normal table (i.e. non-partitioned) is exactly one segment. A regular index is also a segment. A partitioned table will be made up of as many segments as it has partitions. The same is true of partitioned indexes.

Tablespace
A tablespace is a logical container for segments. A tablespace may be empty but it will most likely hold one or more segments.

Data Blocks & Extents
Data blocks and extents are not really pertinent to a discussion of partitioning. I am throwing these in just for the curious. A data block is a physical chunk of data stored in a data file and an extent is a set of contiguous data blocks. A segment is made up of extents. While a segment can span multiple data files, an extent must be contained within a single data file. If this doesn't make sense, that's ok. I won't mention these two objects again.
www.odtug.com
ODTUG Kaleidoscope 2008
Oracle Partitioning for Developers
Cunningham
Figure 1 Tablespaces, Data Files and Segments

So, with that understanding, a tablespace is made up physically of data files and logically of segments. Because a normal table is a single segment, it will reside within a single tablespace. That same table will probably exist in multiple data files. This is an important performance concept because this setup allows Oracle to spread a single table across multiple physical or logical disks.

When you perform a full table scan, you scan the entire segment that is that table. An index is also a segment, so when you scan an index, you are scanning that segment. For a large table, that scan (whether table or index) can take time.

As I said, a regular table, a single segment, exists completely within a single tablespace. A partitioned table, however, is made up of multiple segments. Each segment may reside in the same tablespace as the other partitions in the table or in a different one.

NOTE: There is a myth (a partial holdover from the past) that I hear fairly frequently. That myth says that partitioning is not usable or useful unless each partition exists in its own tablespace. This myth is usually used on existing systems as a reason to not partition existing tables (who wants to create a bunch of tablespaces on an old database?). This myth is not true. In many cases, especially from a maintenance or availability perspective, you may want to put each partition in its own tablespace, but it is in no way required and you can still get significant performance improvements without that step.
Why is Partitioning Important?

Partitioning has traditionally been used in Very Large Databases (VLDB). A VLDB is usually a data warehouse or other OLAP/reporting database. Not that long ago, a few gigabytes (GB) was considered a VLDB. Today, OLTP databases can easily reach the GB level and even go way beyond. I have personally seen more reporting instances being combined with OLTP applications, creating a kind of hybrid database. This makes it even more important to make the best use of your data.

Oracle recommends that any table over 2GB be considered as a partitioning candidate. That makes sense because if you can put your data in different segments, then it may be possible to scan only those segments and not the entire table. The same can be said for indexes. If an index is partitioned, and if Oracle can figure out what partitions will hold the data without looking at every segment, you will scan less data, making your statements return faster.

With the appropriate parallel configuration (which is beyond the scope of this presentation), even full scans of a partitioned table are faster than those of a non-partitioned table. The true benefit, though, comes from Oracle optimizing partition access by only accessing those partitions that contain required data (or at least only those that potentially contain the data). This is called partition pruning and I will delve into that a bit more later.
Partition Keys
A partition key may or may not be the primary key of the table. In my experience, except for hash partitioning, it is rarely the primary key. A partition key will be one or more columns in your table. In the same way a primary key identifies a distinct row of data in a table, a good partition key will be something that distinctly identifies a category of data. For example, in a retail environment, a good key might be SALE_DATE. In a geographic application, a good key might be country or state.

The partition key is something that should exist in most, if not all, of the where clauses in queries against that table. This will allow the Oracle optimizer to only look at those partitions that might include the required data (partition pruning).

When you tell Oracle to create the table as a partitioned table, you identify the key(s). I will show examples of this later on. You also tell Oracle how many partitions to create (how you do that is dependent on the type of partition). When data comes in, in the form of inserts, Oracle looks at the value in the key and puts the data in the correct partition. Then when you update, delete or select data from the table, and include the partition key in the where clause, Oracle will only look at the impacted partitions.
Figure 2 Partition Data Operations (Partition Key sale_date)
In addition to the key, when you create your partitions, you must identify the data that will exist in each partition. If we stick with the sale_date example in the figure above, then sale_date is the key (left side of where clause). "All records from 2005" and "all records from before 2005" would be the key values (right side of where clause).

A partition key's values must also make sure that a record can only exist in a single partition. In our example above, we could not have created partitions like "all records from 1-jan-2005 through 31-jan-2005" and "all records from 15-jan-2005 through 15-feb-2005". Oracle would not know which partition to put the data in and it would not know which partition to query on retrieval.

All of the partition types below are dependent on the partition key (or keys) that exist in your data. The partition key itself is what drives the type of partition you will use. Identifying the right partition key is critical to effective use of partitioning. Remember that it should be a key that you primarily query by. This is not an absolute rule, but is one to keep in mind.

New in 11g is the ability to use virtual columns as your partition key. A virtual column is defined by an expression rather than stored as data in the table. For example, you can add to dates, consolidate using last_day(order_date), or even substring out a piece of a column, and the result is a virtual column. Prior releases of Oracle did not allow virtual columns as partition keys. You needed to populate the data (usually via a trigger) on insert into a real column, and that hidden column would become the partition key.
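To make the virtual column idea concrete, here is a minimal sketch, assuming Oracle 11g or later. The table orders_by_month, its columns and its partition names are hypothetical and exist only for this illustration; the expression is the last_day(order_date) example mentioned above.

```sql
-- Hypothetical table for illustration only; not one of the presentation's tables.
CREATE TABLE orders_by_month (
  order_id        NUMBER NOT NULL,
  order_date      DATE   NOT NULL,
  -- Virtual column: derived from order_date by Oracle, so the application
  -- never inserts or updates it and no trigger is needed.
  order_month_end DATE GENERATED ALWAYS AS (LAST_DAY(order_date)) VIRTUAL
)
PARTITION BY RANGE (order_month_end) (
  PARTITION m2008_01 VALUES LESS THAN (TO_DATE('01/02/2008', 'DD/MM/YYYY')),
  PARTITION m2008_02 VALUES LESS THAN (TO_DATE('01/03/2008', 'DD/MM/YYYY')),
  PARTITION m_later  VALUES LESS THAN (MAXVALUE)
);
```

A query would need to filter on order_month_end (or an expression the optimizer can match to it) to benefit from pruning, so it is worth checking the plan before relying on this.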
Partition Types
Oracle offers 6 major types of partitions and a set of composite partitions. A composite partition allows you to combine partition types into partitions and sub-partitions.

List Partitions
A list partition is probably the simplest partition type to implement and understand. If your data has a column that would make sense looking at it as a list (or as a look up table), it would probably make a good list partition. State, country, color, product type, SIC code, or other industry codes all would make good list partition keys.

For a list partition, you must enumerate the list. By that I mean you must specifically say what the elements in the list are. You can create a default partition to accept all values not included in any other partition's list.

I don't want to get into the code yet but a pseudo partition declaration for a list type (using state as an example) would be something like:

Partition by LIST (state_column)
  Partition 1 stores ('AZ', 'AK', 'AL')
  Partition 2 stores ('NM', 'NY', 'MD')
  Partition 3 stores ('LA', 'TX', 'MS')
  Partition 4 stores (ALL OTHER VALUES) <- Default partition
Range Partitions
In my experience, range partitions tend to be the most common. A range partition key will be a value that falls naturally into ranges, such as a date. The example above, based on sale_date, is an example of a range partition. Like a list partition, all values must be enumerated. Unlike a list though, you specify the data as a range using a less than operator. Range partitioning allows a catch-all partition to store any records that are greater than the highest defined range.

A pseudo declaration of a range partition might look like:

Partition by RANGE (sale_date)
  Partition 1 stores sale_date < 01-jan-2005
  Partition 2 stores sale_date < 01-jan-2006
  Partition 3 stores sale_date < 01-jan-2007
  Partition 4 stores sale_date < 01-jan-2008
  Partition 5 stores sale_date < INFINITY
Hash Partitions
Hash partitioning allows you to store your data across multiple files when you don't otherwise have an obvious partition key. If your table is not storing historical data and is not queried by a list column, you may want to use a hash partition. A table with a sequence as a primary key can use that key as the partition key, but the key is more likely to be a customer ID or product ID. A hash partition is less likely to be needed now that Oracle has added reference partitions.

When using a hash partition, Oracle tries to balance the data evenly across the number of partitions defined. This can be a dramatic benefit to maintenance as a DBA can operate on individual partitions instead of a single huge table.

A pseudo declaration of a hash partition might look like:

Partition by hash (seq_pk)
  Create 5 Partitions

Reference Partitions
One of the most significant enhancements in Oracle 11g is the addition of reference partitions. A reference partition is a partition based on a foreign key.

A good example is an order entry system. Our orders table may be partitioned by order date. Let's say we have 1 billion rows in our table (we are exceptionally successful). Now, we have a child table, order lines, that has 10 times the number of rows but does not have a good partition key. We normally join orders to order_lines on order ID.

In the past we would add order date to the child table so that we could partition using the same key. The benefit of this, in addition to breaking the table into smaller chunks, is that we can do a partition-wise join. When we do partition pruning, we would know to only join with matching child partitions. The downside is that we make our table bigger by adding order date and we now have to maintain order date in two places.

In Oracle 11g, we can declare a reference partition. As the developer, you no longer need to maintain the order date in the order lines table.
As long as both tables have active primary keys and an enforced foreign key constraint, you can define the parent partition using the best method available and then define the child table as partitioned by reference to that constraint. Oracle will maintain the metadata of the relationship and allow you to do partition-wise joins.

A pseudo declaration of a reference partition might look like:

Partition by reference (foreign_key_name)

System Partitions
System partitioning allows third party development access to partitioning. That means that it allows an application to control data placement within a partition. System partitioning is outside the scope of this presentation.

Interval Partitions
An interval partition is not really a different partition type as much as it is an extension to a range partition. Because we need to list all of the potential ranges, we have regular maintenance on range partitioned tables. If we partition by month, we may create 12 months in advance but when those run out, we need to create more. Rather than list all of the potential ranges, interval partitions let us tell Oracle how we want the ranges created and we let Oracle figure out when we need a new partition.

A pseudo declaration of an interval partition might look like:

Partition by RANGE (sale_date)
  Interval of 1 year
  Start with partition 1 sale_date < 01-jan-2008
If we insert a record with a sale_date on or after 01-jan-2008, which is beyond the range of the starting partition, Oracle will automatically create a new partition for it.

Composite Partitions
A composite partition is not a new type of partition. For performance and management reasons, you might need to partition by one method and then sub-partition your data even further.

An example might be a worldwide sales database. The majority of queries are by year but local departments may also want to see data by region. You can range partition by sale date and sub-partition by region code. A pseudo code of this is:

Partition by RANGE (sale_date)
Subpartition by LIST (region_code)
  Subpartition region1 region_code = 1
  Subpartition region2 region_code = 2
  Subpartition region3 region_code = 3
  Subpartition region4 region_code = 4
  Partition 1 stores sale_date < 01-jan-2005
  Partition 2 stores sale_date < 01-jan-2006
  Partition 3 stores sale_date < 01-jan-2007
  Partition 4 stores sale_date < 01-jan-2008
  Partition 5 stores sale_date < INFINITY

I won't get much into the details of composite partitions. They are basically just an extension of the partition types that I will be talking about. I will provide one syntactically correct example below.
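The hash, interval and composite pseudo declarations in this section could be written in real syntax roughly as follows. This is a sketch under assumptions: the table names (customer_notes, sales_interval, sales_by_region), their columns and the partition names are hypothetical, the interval example needs Oracle 11g or later, and storage options are omitted.

```sql
-- Hash: Oracle spreads rows across the partitions by hashing cust_id.
-- A power-of-two partition count tends to give the most even distribution.
CREATE TABLE customer_notes (
  cust_id   NUMBER NOT NULL,
  note_text VARCHAR2(200)
)
PARTITION BY HASH (cust_id) PARTITIONS 4;

-- Interval (11g): one starting range partition; Oracle adds a new
-- one-year partition when a row arrives beyond the highest existing range.
CREATE TABLE sales_interval (
  sale_id   NUMBER NOT NULL,
  sale_date DATE   NOT NULL
)
PARTITION BY RANGE (sale_date)
INTERVAL (NUMTOYMINTERVAL(1, 'YEAR')) (
  PARTITION p_before_2008
    VALUES LESS THAN (TO_DATE('01/01/2008', 'DD/MM/YYYY'))
);

-- Composite range-list: range partitions by year, each sub-partitioned
-- by region using a template so the sub-partition list is written once.
CREATE TABLE sales_by_region (
  sale_id     NUMBER NOT NULL,
  sale_date   DATE   NOT NULL,
  region_code NUMBER NOT NULL
)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY LIST (region_code)
SUBPARTITION TEMPLATE (
  SUBPARTITION region1 VALUES (1),
  SUBPARTITION region2 VALUES (2),
  SUBPARTITION region3 VALUES (3),
  SUBPARTITION region4 VALUES (4)
) (
  PARTITION y2004     VALUES LESS THAN (TO_DATE('01/01/2005', 'DD/MM/YYYY')),
  PARTITION y2005     VALUES LESS THAN (TO_DATE('01/01/2006', 'DD/MM/YYYY')),
  PARTITION y2006     VALUES LESS THAN (TO_DATE('01/01/2007', 'DD/MM/YYYY')),
  PARTITION y2007     VALUES LESS THAN (TO_DATE('01/01/2008', 'DD/MM/YYYY')),
  PARTITION yInfinite VALUES LESS THAN (MAXVALUE)
);
```

Note that real range partitions are declared in ascending order of their upper bounds, with the MAXVALUE partition last. An interval-partitioned table cannot also declare a MAXVALUE partition, since Oracle creates the higher ranges itself.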
Indexing Partitions
Partitioned tables, like any other table, can be indexed for faster access. With a partitioned table, you can either index the entire table (a global index) or you can index the partitions (a local index). When you create a global index, you may create a global partitioned index or a global non-partitioned index.

Global Non-Partitioned Index
A global index is an index across the entire table. Unless the partition key columns are part of the primary key, your table's primary key will be a global index. A non-partitioned index is the same kind of regular index that would exist on a non-partitioned table. With a global non-partitioned index, maintenance and availability are sacrificed in favor of transactional performance. For this reason, global indexes tend to be preferred for OLTP applications.

Global Partitioned Index
Like a global non-partitioned index, a global partitioned index is an index across the entire table, but the index itself is partitioned, with a partition key that is independent of the table's. Global partitioned indexes can only be partitioned by hash or range. The index partitioning method is completely separate from the type of table partition. A table can be partitioned by LIST and still have a global range partitioned index.

Local Index
Local indexes are indexes created on each partition in a table. A local index automatically creates an index partition for each partition in the table, and it is always partitioned by the same partition key as the parent table. You cannot add or remove partitions in a local index directly. You must add and remove partitions from the parent table. A local index does not need to include the partition key in the list of indexed columns. Local indexes provide the best query throughput and are used primarily in OLAP and DSS type environments.
Choosing an Index
Rather than try to come up with a new way to say this, I will let Oracle's documentation say it for me.

Just like partitioned tables, partitioned indexes improve manageability, availability, performance, and scalability. They can either be partitioned independently (global indexes) or automatically linked to a table's partitioning method (local indexes). In general, you should use global indexes for OLTP applications and local indexes for data warehousing or DSS applications. Also, whenever possible, you should try to use local indexes because they are easier to manage.

When deciding what kind of partitioned index to use, you should consider the following guidelines in order:

1. If the table partitioning column is a subset of the index keys, use a local index. If this is the case, you are finished. If this is not the case, continue to guideline 2.
2. If the index is unique and does not include the partitioning key columns, then use a global index. If this is the case, then you are finished. Otherwise, continue to guideline 3.
3. If your priority is manageability, use a local index. If this is the case, you are finished. If this is not the case, continue to guideline 4.
4. If the application is an OLTP one and users need quick response times, use a global index. If the application is a DSS one and users are more interested in throughput, use a local index.
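Local indexes are shown later in this presentation, so here is a hedged sketch of the two global partitioned forms described above. It assumes the customers and orders tables from the "Creating Partitions" section already exist. The index names, partition names and the split point 'N' are made up for illustration.

```sql
-- Global hash-partitioned index: partitioned independently of the
-- RANGE partitioning on orders.order_date.
CREATE INDEX ord_cust_gidx ON orders (cust_id)
  GLOBAL PARTITION BY HASH (cust_id) PARTITIONS 4;

-- Global range-partitioned index on a LIST-partitioned table.
-- The highest index partition must be bounded by MAXVALUE.
CREATE INDEX cust_name_gidx ON customers (cust_name)
  GLOBAL PARTITION BY RANGE (cust_name) (
    PARTITION names_a_to_m VALUES LESS THAN ('N'),
    PARTITION names_rest   VALUES LESS THAN (MAXVALUE)
  );
```

Both indexes are partitioned on their leading indexed column, which is what a global partitioned index requires.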
When should I use Partitioning?

Oracle recommends that any table over 2GB be considered for partitioning. That is a very good baseline number. Partitioning is a separately licensed option that adds cost to your database, so you should consider that also. If you are already paying for partitioning, I would recommend that any table in a data warehouse with a logical partition key (except for very small tables, meaning fewer than 100k rows) be considered for partitioning regardless of disk size.

In an OLTP database, I would recommend being a bit stricter on choosing tables to partition. Like parallelization, partitioning can actually hurt you in a transaction processing database. In some cases, I have seen very hot tables that were not very large benefit from hash and list partitioning.

Very large tables will almost always benefit from partitioning. When scanning a partitioned table, Oracle is able to make the best use of the parallel features in the database. It's beyond the scope of this presentation but combining parallelism with partitioning on data loads (using things like merge from external tables) can dramatically improve overall load times.
How do I use Partitioning?

Now we get to the code portion of this presentation. I am not going to try to duplicate the Oracle documentation (which is actually very good). There are many options that may be used that I will not show here. You can choose storage options for each partition just like you can for a regular table. Before implementing partitioning, you should take a read through partitioning syntax. I am going to use the orders example from above and show several different aspects of partitioning.
Creating Partitions
Partitioned tables are created like a regular table with some additional syntactical sugar. Below are the tables that I will be using for the rest of this presentation.

List Partition

CREATE TABLE customers (
  cust_id     NUMBER NOT NULL PRIMARY KEY,
  cust_name   VARCHAR2(30),
  region_code VARCHAR2(3)
)
PARTITION BY LIST (region_code) (
  PARTITION asia_region  VALUES ('AS', 'CN', 'JP', 'RS', 'CK'),
  PARTITION euro_region  VALUES ('GB', 'FR', 'GR', 'IT', 'SC'),
  PARTITION amer_region  VALUES ('US', 'MX', 'CD', 'GU', 'BR'),
  PARTITION null_region  VALUES (NULL),
  PARTITION error_region VALUES (DEFAULT)
);
The customers table is a LIST partitioned table. There are three primary partitions and two special partitions. The null region will hold any records where region_code is null. The final partition uses the keyword DEFAULT which creates a default partition. This partition will catch any records that do not fall into the other partitions.

We can run a query against the data dictionary to see the partitions we just created:

SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'CUSTOMERS';

PARTITION_NAME                 HIGH_VALUE
------------------------------ ------------------------------
ASIA_REGION                    'AS', 'CN', 'JP', 'RS', 'CK'
EURO_REGION                    'GB', 'FR', 'GR', 'IT', 'SC'
AMER_REGION                    'US', 'MX', 'CD', 'GU', 'BR'
NULL_REGION                    NULL
ERROR_REGION                   DEFAULT
Range Partition

CREATE TABLE orders (
  order_id     NUMBER NOT NULL PRIMARY KEY,
  order_date   DATE NOT NULL,
  cust_id      NUMBER,
  extra_column VARCHAR2(10),
  CONSTRAINT ordfk1 FOREIGN KEY (cust_id) REFERENCES customers(cust_id)
)
PARTITION BY RANGE (order_date) (
  PARTITION y2005     VALUES LESS THAN (TO_DATE('01/01/2006', 'DD/MM/YYYY')),
  PARTITION y2006     VALUES LESS THAN (TO_DATE('01/01/2007', 'DD/MM/YYYY')),
  PARTITION y2007     VALUES LESS THAN (TO_DATE('01/01/2008', 'DD/MM/YYYY')),
  PARTITION y2008     VALUES LESS THAN (TO_DATE('01/01/2009', 'DD/MM/YYYY')),
  PARTITION yInfinite VALUES LESS THAN (MAXVALUE)
);
The orders table is a RANGE partitioned table. It is range partitioned using order_date as the partition key. Any records inserted with an order date less than 1/1/2006 will be inserted into partition y2005. That means that a record with an order date of 1/1/1995 will be inserted into y2005. yInfinite uses the keyword MAXVALUE, which represents a value higher than any possible date. Any record with an order_date greater than or equal to 1/1/2009 will be inserted into yInfinite. That means that a record with an order_date of 31/12/4712 will be inserted into yInfinite. The orders table also has a foreign key into the customers table, which is of no import to this particular discussion.

We can run our query again to see what was created, partitionwise:
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'ORDERS';

PARTITION_NAME  HIGH_VALUE
--------------  ----------
Y2006           TO_DATE(' 2007-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
Y2007           TO_DATE(' 2008-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
Y2008           TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
YINFINITE       MAXVALUE
Y2005           TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
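To see the routing described above for yourself, a small check along these lines may help. It is only a sketch: it assumes the orders table is empty, the order_id values are arbitrary, and cust_id is left null so that no customers row is needed.

```sql
-- Three test rows: one far in the past, one in 2007, one far in the future.
INSERT INTO orders (order_id, order_date)
  VALUES (1, TO_DATE('01/01/1995', 'DD/MM/YYYY'));
INSERT INTO orders (order_id, order_date)
  VALUES (2, TO_DATE('15/06/2007', 'DD/MM/YYYY'));
INSERT INTO orders (order_id, order_date)
  VALUES (3, TO_DATE('31/12/4712', 'DD/MM/YYYY'));

-- Count the rows that landed in each partition of interest.
SELECT 'Y2005' AS partition_name, COUNT(*) AS row_count
FROM   orders PARTITION (y2005)
UNION ALL
SELECT 'Y2007', COUNT(*) FROM orders PARTITION (y2007)
UNION ALL
SELECT 'YINFINITE', COUNT(*) FROM orders PARTITION (yInfinite);

-- Undo the test rows.
ROLLBACK;
```

With an empty table beforehand, each of the three partitions should report one row: the 1995 order falls below the y2005 bound and the 4712 order falls into the MAXVALUE partition.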
Reference Partition

CREATE TABLE order_lines (
  order_id NUMBER NOT NULL,
  line_num NUMBER NOT NULL,
  line_amt NUMBER NOT NULL,
  CONSTRAINT ol_pk PRIMARY KEY (order_id, line_num),
  CONSTRAINT order_fk FOREIGN KEY (order_id) REFERENCES orders(order_id)
)
PARTITION BY REFERENCE (order_fk);

The order_lines table is a REFERENCE partitioned table linked to the orders table via the order_fk foreign key constraint. We can see that the same partitions as the orders table have been created for the order_lines table by running our query again:

SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'ORDER_LINES';

PARTITION_NAME                 HIGH_VALUE
------------------------------ --------------------
Y2005
Y2006
Y2007
Y2008
YINFINITE

SQL>
Notice that the high_value column is null. That's because this table's partitions reference the parent table's partitions, and Oracle ensures that each child row is stored in the partition that matches its parent row.

Indexes
We've already created several indexes on these tables. The primary keys for each table were created as global, non-partitioned indexes. We can also create global, partitioned indexes or we can create local indexes. Below are two local indexes on the orders table. The first index is indexing the partition key (order_date) and the second is indexing a different column.

CREATE INDEX ord_date_ind1 ON orders (order_date) LOCAL;
CREATE INDEX extra_col_ind1 ON orders (extra_column) LOCAL;

We can see the indexes we created by running this query:

SQL> select index_name, partition_name, high_value
  2  from user_ind_partitions
  3  order by index_name, partition_name;

INDEX_NAME      PARTITION_ HIGH_VALUE
--------------- ---------- ----------
EXTRA_COL_IND1  Y2005      TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
EXTRA_COL_IND1  Y2006      TO_DATE(' 2007-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
EXTRA_COL_IND1  Y2007      TO_DATE(' 2008-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
EXTRA_COL_IND1  Y2008      TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
EXTRA_COL_IND1  YINFINITE  MAXVALUE
ORD_DATE_IND1   Y2005      TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
ORD_DATE_IND1   Y2006      TO_DATE(' 2007-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
ORD_DATE_IND1   Y2007      TO_DATE(' 2008-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
ORD_DATE_IND1   Y2008      TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
ORD_DATE_IND1   YINFINITE  MAXVALUE

10 rows selected.

The primary key indexes do not show here because they are not partitioned.
Partition Maintenance
Oracle provides many partition maintenance options. You can alter tables to split partitions into smaller partitions or merge partitions to combine them. You can drop a table partition when you no longer need the data. This is usually done on RANGE partitioned tables as the data ages out.
The most important thing to remember when doing partition maintenance is how it affects your indexes, particularly global indexes. While maintenance is beyond the scope of a one hour presentation, plenty has been written about it on the web. One of the goals of partitioning is availability and not thinking about maintenance beforehand can really hurt that availability in the long run.
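As a rough illustration of the split, merge and drop operations mentioned above, and of how to keep indexes usable while doing them, here is a sketch against the orders table from this presentation. The new partition names (y2009, y2005_2006) are hypothetical, and these statements should be tried on a test copy first, particularly when child tables are reference partitioned on the table being changed.

```sql
-- Split the catch-all partition so that 2009 gets its own partition.
-- UPDATE INDEXES keeps local and global indexes usable during the change.
ALTER TABLE orders
  SPLIT PARTITION yInfinite
  AT (TO_DATE('01/01/2010', 'DD/MM/YYYY'))
  INTO (PARTITION y2009, PARTITION yInfinite)
  UPDATE INDEXES;

-- Merge two adjacent range partitions into one.
ALTER TABLE orders
  MERGE PARTITIONS y2005, y2006
  INTO PARTITION y2005_2006
  UPDATE INDEXES;

-- Age out old data by dropping the partition that holds it.
-- Without an UPDATE ... INDEXES clause, global indexes are left UNUSABLE.
ALTER TABLE orders
  DROP PARTITION y2005_2006
  UPDATE GLOBAL INDEXES;
```

The index clauses trade a slower maintenance statement for indexes that stay available afterward, which is usually the right choice when availability is the goal.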
Queries Against Partitions

You normally query a table directly and don't even worry about partitions. Oracle will utilize the partitions for you transparently. However, there may be times when you want to query the partitions directly. This is easily accomplished by including the partition keyword.

For example, I would normally query the orders table like this:

SELECT *
FROM orders
WHERE order_date BETWEEN to_date('01/02/2007', 'DD/MM/YYYY')
                     AND to_date('15/02/2007', 'DD/MM/YYYY');

If I wanted to force my query to the y2007 partition, I can do that like this:

SELECT *
FROM orders PARTITION (y2007)
WHERE order_date BETWEEN to_date('01/02/2007', 'DD/MM/YYYY')
                     AND to_date('15/02/2007', 'DD/MM/YYYY');

Notice the partition keyword in the second query.
Explain Plan
A detailed explanation of explain plan is outside the scope of the presentation. Oracle provides a decent explanation of tuning partitioned table queries in the performance tuning guide. A couple of things to note, though. First, make sure you are gathering statistics on your tables (and partitions) using DBMS_STATS. This is critical to get the performance you are looking for. Second, a couple of things you are looking for in your explain plan are a PARTITION ITERATOR step such as PARTITION RANGE ITERATOR (possibly INLIST ITERATOR), which means Oracle is selecting specific partitions to iterate through rather than scanning the entire table, and a PARTITION SINGLE scan (like PARTITION RANGE SINGLE or PARTITION HASH SINGLE), which means Oracle decided it could look at just a single partition. Both of these are examples of partition pruning.
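The two checks above can be sketched as follows, assuming the orders table from this presentation lives in the current schema. The granularity and cascade settings are one reasonable choice for a small test, not a recommendation for every system.

```sql
-- Gather table, partition and index statistics for ORDERS.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'ORDERS',
    granularity => 'ALL',   -- global plus partition-level statistics
    cascade     => TRUE     -- include the table's indexes
  );
END;
/

-- Ask the optimizer how it would run a query that filters on the partition key.
EXPLAIN PLAN FOR
SELECT *
FROM   orders
WHERE  order_date BETWEEN TO_DATE('01/02/2007', 'DD/MM/YYYY')
                      AND TO_DATE('15/02/2007', 'DD/MM/YYYY');

-- Show the plan, including the Pstart and Pstop columns for partitioned objects.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Because both dates fall inside the y2007 partition, you would typically expect a PARTITION RANGE SINGLE step, with Pstart and Pstop showing the same partition number. If the step reads PARTITION RANGE ALL instead, pruning did not happen and the where clause is worth a second look.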
Partition Advisor
New in 11g is the Partition Advisor. This new tool is part of the SQL Access advisor. It's not the easiest tool in the world to use but it may help you decide where you can use partitioning and what kind of partitioning you need.
Summary
In summary, Oracle Partitioning improves performance in most instances on large data and may even help in certain instances on smaller data sets. It is important to keep in mind that partitioning is not a silver bullet but it is very, very useful. Choosing the right partition key is important and the right partition key will decide the right partition type. Sub-partitions may improve performance even more.