Partitioning with Oracle 11G

Bert Scalzo, Domain Expert, Oracle Solutions


Bert.Scalzo@Quest.com

Copyright 2006 Quest Software

About the Author


Domain Expert & Product Architect for Quest Software
Oracle Background:

Worked with Oracle databases for over two decades (starting with version 4)

Work history includes time at both Oracle Education and Oracle Consulting
Academic Background:

Several Oracle Masters certifications

BS, MS and PhD in Computer Science

MBA (general business)

Several insurance industry designations


Key Interests:

Data Modeling

Database Benchmarking

Database Tuning & Optimization

"Star Schema" Data Warehouses

Oracle on Linux and specifically: RAC on Linux

This presentation draws heavily on these areas

Articles for:

Oracle Technology Network (OTN)

Oracle Magazine

Oracle Informant

PC Week (eWeek)

Dell Power Solutions Magazine

The Linux Journal

www.linux.com

www.orafaq.com

Books by Author

Coming in 2008

Agenda
Partitioning Benefits
Partitioning History
Partitioning Options
Partitioning Advisor (if you're licensed)
Typical Data Warehousing Environment
TPC-H Data Warehouse Benchmark
Results: TPC-H with Various Partition Strategies
What about OLTP Environments and the TPC-C/E?
Lessons Learned (and their relevance/application)
Questions & Answers

Partitioning Benefits: Facts


Manageability
Classic Divide & Conquer technique
More granular storage allocation options
Keeps otherwise time-consuming options viable

Availability
More granular online/offline options
More granular rebuild/reorganization options
More granular object level backup/restore options

Capacity Management
Enables a Tiered Storage Architecture approach
More granular storage cost management decision points

Performance
Partition Pruning
Partition-Wise Joins

Partitioning Benefits: Opinion (Mine)

Why to Partition:
Manageability 40%
Availability 20%
Capacity Management 20%
Performance 20%

Don't over-sell/over-expect the performance aspect
Need to experiment to find the best approach for a database
Better to take longer at the start to get it right, because very
often it's far too expensive to change afterwards
Examples demonstrate very positive performance, but
better to be conservative and err on the side of caution,
then be very pleasantly surprised

Partition Pruning (Restriction Based)


From Docs: In partition pruning, the optimizer analyzes FROM and
WHERE clauses in SQL statements to eliminate unneeded partitions
when building the partition access list. This enables Oracle Database to
perform operations only on those partitions that are relevant ...

Divide and Conquer for performance


Sometimes can yield order of magnitude improvement
But once again, best not to oversell and/or over-expect
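
To make the pruning idea concrete, here is a minimal Python sketch (not Oracle code) of how a range predicate can be mapped to a short partition access list. The partition names and yearly bounds are hypothetical, chosen only for illustration; a real optimizer considers far more than this.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical range partitions: each one holds keys below its upper bound
# and at or above the previous partition's bound.
UPPER_BOUNDS = [date(1995, 1, 1), date(1996, 1, 1), date(1997, 1, 1), date(1998, 1, 1)]
NAMES = ["p1994", "p1995", "p1996", "p1997"]

def prune(lo, hi):
    """Return the partitions that may hold keys in the closed range [lo, hi]."""
    first = bisect_right(UPPER_BOUNDS, lo)  # first partition whose bound is above lo
    last = bisect_right(UPPER_BOUNDS, hi)   # partition that would contain hi
    return NAMES[first:last + 1]

# A predicate covering part of 1995 touches one partition instead of four.
print(prune(date(1995, 3, 1), date(1995, 9, 30)))
```

Only the partitions returned by `prune` would need to be scanned; everything else is eliminated before any I/O happens, which is where the "divide and conquer" gain comes from.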

Some Potential Issues to be aware of:


SQL*Plus Auto-Trace can sometimes miss partition pruning
Old Style Explain Plans via simple SELECT have issues too
Best to always use DBMS_XPLAN and/or SQL_TRACE
Note: Trace file analysis is much easier these days: SQL Developer +
free Hotsos plug-in, Metalink trace analysis scripts, Quest Toad DBA

Partition-Wise Join (Multi-Object Based)


From Docs: Partition-wise joins reduce query response time by
minimizing the amount of data exchanged among parallel execution
servers when joins execute in parallel. This significantly reduces response
time & improves the use of both CPU & memory resources.

Different Flavors:

Full: Single to Single
Full: Composite to Single
Full: Composite to Composite
Partial: Single
Partial: Composite

Indexing Strategy Counts

Local Prefixed/Non-Prefixed
Global

All of these affect the explain plan

Picture Worth 1000 Words (from Docs)


Simple Mantra: Subdivide the work into equally paired
chunks, then perform all that work using many parallel
processes

Make sure not to over-allocate CPUs; remember there will
also be a concurrent workload
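
The mantra can be sketched in a few lines of Python. This is an illustration of the idea only, not of Oracle's implementation: both inputs are hash-partitioned on the join key with the same partition count, so partition i of one table only ever needs to be joined with partition i of the other. `N_PARTS`, the tuple layout, and the `workers` cap are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

N_PARTS = 4  # hypothetical partition count, identical for both tables

def hash_partition(rows, key_index):
    """Split rows into N_PARTS buckets by the (integer) join key."""
    parts = [[] for _ in range(N_PARTS)]
    for row in rows:
        parts[row[key_index] % N_PARTS].append(row)
    return parts

def join_pair(pair):
    """Hash-join one orders partition with its matching line-item partition."""
    orders_part, items_part = pair
    build = defaultdict(list)
    for order in orders_part:
        build[order[0]].append(order)
    return [(o, i) for i in items_part for o in build.get(i[0], [])]

def partition_wise_join(orders, items, workers=2):
    """Join equally paired partitions independently, with a capped worker pool."""
    pairs = zip(hash_partition(orders, 0), hash_partition(items, 0))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(join_pair, pairs)
    return [match for part in results for match in part]

orders = [(1, "a"), (2, "b"), (5, "c")]
items = [(1, "x"), (1, "y"), (5, "z"), (7, "w")]
print(partition_wise_join(orders, items))
```

No rows cross between partition pairs, which is the point of the technique. The `workers` cap stands in for the advice about not over-allocating CPUs; Python threads are used here only to show the structure, not to demonstrate real CPU parallelism.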

Partitioning History (from Oracle 11G training+)


Oracle 5: Before Tablespaces we had partitions
Oracle 7: Partition Views, really more of a cheat

Partitioning Options Part 1


IOTs can be partitioned as
well in later versions of Oracle,
so the basic choices are even
more complex than this


Partitioning Options Part 2


Prior to 11G:

Oracle White Paper: 2007 Partitioning in Oracle Database 11g


Partitioning Options Part 3


Post 11G:

Oracle White Paper: 2007 Partitioning in Oracle Database 11g

Very exciting new options

Partitioning Advisor (if you're licensed)


Advisor Central -> SQL Advisors -> SQL Access Advisor


Typical Data Warehouse Architecture

TPC-H


Typical Environments
                 OLTP                 ODS                    OLAP             DM/DW
Business Focus   Operational          Operational, Tactical  Tactical         Tactical, Strategic
End User Tools   Client Server, Web   Client Server, Web     Client Server    Client Server, Web
DB Technology    Relational           Relational             Cubic            Relational
Trans Count      Large                Medium                 Small            Small
Trans Size       Small                Medium                 Medium           Large
Trans Time       Short                Medium                 Long             Long
Size in Gigs     10 - 200             50 - 400               50 - 400         400 - 4000
Normalization    3NF                  3NF                    N/A              0NF
Data Modeling    Traditional ER       Traditional ER         N/A              Dimensional

We'll come back to this picture

TPC-H Benchmark
Industry Standard Data Warehouse Benchmark
URL: www.tpc.org/tpch
Spec: http://tpc.org/tpch/spec/tpch2.7.0.pdf
8 Tables
22 Queries (answer complex business questions)
Database scaling:
Factor = 1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000
Size GB = 1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000


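TPC-H Data Model

[Diagram: TPC-H schema. Table row counts by scale factor (SF): SF * 10,000; SF * 150,000; SF * 200,000; SF * 800,000; SF * 1,500,000; SF * 6,000,000; one table fixed at 25 rows. Callouts: Partitions, Sub-Partitions]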
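
As a rough sizing aid, the scaling rule can be written out in Python. The multipliers are the ones on the slide; pairing each multiplier with a table name, and the fixed 5-row REGION table, follow the TPC-H specification rather than anything shown here, so treat that mapping as an assumption of this sketch.

```python
# Rows per unit of scale factor (SF). The LINEITEM figure is approximate:
# the spec's actual LINEITEM count is not an exact multiple of SF.
ROWS_PER_SF = {
    "SUPPLIER": 10_000,
    "CUSTOMER": 150_000,
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "ORDERS": 1_500_000,
    "LINEITEM": 6_000_000,
}
# These two tables do not grow with the scale factor.
FIXED_ROWS = {"NATION": 25, "REGION": 5}

def tpch_rows(scale_factor):
    """Return approximate row counts per table for a given scale factor."""
    rows = {table: n * scale_factor for table, n in ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    return rows

for table, count in tpch_rows(10).items():
    print(f"{table:<10}{count:>14,}")
```

At SF = 10 this gives about 60 million LINEITEM rows and 15 million ORDERS rows, which is the kind of volume where the choice of partitioning key starts to matter.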


TPC-H Permits Partitioning

But what to do, what to do ???


Disclosure Reports

http://tpc.org/tpch/results/tpch_perf_results.asp

Disclosure Report: Lots of Info

This is where people document exactly what advanced database
features and storage parameters they used; this info is invaluable

Disclosure Report: Appendix B

Sample Expensive Query


Example Explain Plan


Explain complete.
Plan hash value: 2545634784

------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |            | 42533 |  5648K|       |   641K  (1)| 01:57:41 |
|   1 |  SORT GROUP BY            |            | 42533 |  5648K|   105M|   641K  (1)| 01:57:41 |
|*  2 |   HASH JOIN               |            |   715K|    92M|       |   631K  (1)| 01:55:51 |
|   3 |    TABLE ACCESS FULL      | H_NATION   |    25 |   725 |       |         (0)| 00:00:01 |
|*  4 |    HASH JOIN              |            |   715K|    72M|       |   631K  (1)| 01:55:51 |
|   5 |     TABLE ACCESS FULL     | H_SUPPLIER |   100K|   781K|       |   646   (1)| 00:00:08 |
|*  6 |     HASH JOIN             |            |   720K|    68M|    68M|   631K  (1)| 01:55:44 |
|*  7 |      HASH JOIN            |            |   751K|    59M|   232M|   589K  (1)| 01:48:10 |
|*  8 |       HASH JOIN           |            |  3004K|   197M|  4984K|   485K  (1)| 01:28:57 |
|*  9 |        TABLE ACCESS FULL  | H_PART     |   100K|  3808K|       | 11805   (1)| 00:02:10 |
|  10 |        TABLE ACCESS FULL  | H_LINEITEM |    60M|  1716M|       |   342K  (1)| 01:02:52 |
|  11 |       TABLE ACCESS FULL   | H_ORDER    |    15M|   200M|       | 72122   (1)| 00:13:14 |
------------------------------------------------------------------------------------------------

Method of Attack
Since many data warehouses are utilized for data
mining, we can't always know every possible query
likely to run; thus we need an aggregate measure for success
Thus we'll compare the benchmark's weighted
performance scores for the TPC-H using various
partitioning schemes (all within spec of course)
Goal will be to find the best overall partitioning
Then we'll examine some specific explain plans

10G Sample Test cases

10G Simple Approach (Just Huge Tables)


Range: ORDER (order date)
Hash: LINEITEM (order key)

10G Basic Approach (Single Level Partitions)


Range: ORDER (order date)
Hash: LINEITEM (order key)
List: CUSTOMER (nation key)
Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)

10G Complex Approach (Composite Partitions)


Range-Hash: ORDER (order date & cust key)
Multi-Hash: LINEITEM (part, supp & order keys)
List: CUSTOMER (nation key)
Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)

11G Sample Test cases

11G Simple Approach (+Interval)


Interval-Hash: ORDER (order date & cust key)
Multi-Hash: LINEITEM (part, supp & order keys)
List: CUSTOMER (nation key)
Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)

11G Basic Approach (+Virtual)


Interval-Hash: ORDER (virtualized order date & cust key)
Multi-Hash: LINEITEM (part, supp & order keys)
List: CUSTOMER (nation key)
Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)

11G Complex Approach (+REF)


Interval-Hash: ORDER (virtualized order date & cust key)
REF: LINEITEM (order key)
List: CUSTOMER (nation key)
Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)
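
For readers new to the 11G interval option, the core idea is that a partition comes into existence the first time a row arrives for its interval, rather than being pre-created by the DBA. The toy Python model below shows only that behavior, assuming a monthly interval on the order date; the class name is hypothetical, and Oracle specifics such as the transition point and system-generated partition names are deliberately not modeled.

```python
from datetime import date

class IntervalPartitionedTable:
    """Toy model: monthly interval partitions created on first insert."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of rows

    def insert(self, order_date, row):
        key = (order_date.year, order_date.month)
        # setdefault creates the partition the first time a month is seen.
        self.partitions.setdefault(key, []).append(row)
        return key

orders = IntervalPartitionedTable()
orders.insert(date(2008, 1, 15), "order-1")
orders.insert(date(2008, 1, 20), "order-2")
orders.insert(date(2008, 3, 1), "order-3")
print(sorted(orders.partitions))
```

February never receives a row, so no February partition exists; only the months that actually hold data consume a partition.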


Is that It?
No, just six very obvious high-level scenarios
Your selections and actual mileage will vary
Experimentation usually yields the best results
Always trust empirical results over conjecture
So improved response time beats a better explain plan
Remember, DWs usually have unpredictable queries
So don't tune for just a few queries, look for the best
overall and/or more generic performance solution

Intermediate Results can be Misleading

TPC-H Power score seemingly implies that every partitioning
scheme is incrementally better

TPC-H Throughput score seems to show that non-partitioned
is equal to the best

Final Results tell the Real Truth

TPC-H Query/Hour score shows that some partitioning
schemes are better, and some are not

TPC-H $/Query/Hour score confirms the inverse in terms
of dollars per unit of work

Why such seemingly Opposite Results ???


Run times and explain plans apply to a single measurable operation
Even aggregate & averaged run times don't relate the entire truth!
Need an answer based upon sound mathematics (reliable & repeatable)
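
One way to see why the intermediate scores can disagree with the final one: as I read the TPC-H specification, the composite Query/Hour metric (QphH) is the geometric mean of the Power and Throughput scores, and the price/performance metric divides total system price by that composite. The short Python sketch below uses made-up numbers purely for illustration; they are not results from these tests.

```python
from math import sqrt

def qphh(power, throughput):
    """Composite queries-per-hour: geometric mean of the two intermediate scores."""
    return sqrt(power * throughput)

def price_per_qphh(system_price, power, throughput):
    """Dollars per unit of composite work."""
    return system_price / qphh(power, throughput)

# Hypothetical scores: scheme A wins on Power yet loses on the composite.
scheme_a = qphh(power=400, throughput=100)
scheme_b = qphh(power=250, throughput=250)
print(scheme_a, scheme_b)
```

Because the geometric mean rewards balance, a scheme that looks best on one intermediate score can still lose overall, and for the same system price its $/Query/Hour figure is correspondingly worse.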


TPC-C Benchmark
Historical Industry Standard OLTP Benchmark
URL: www.tpc.org/tpcc
Spec: http://tpc.org/tpcc/spec/tpcc_current.pdf
Probably the most used & widely quoted benchmark
But suffers from overly simplistic design & code logic
Generally considered unreliable with modern RDBMS
But still a decent rough sounding board for many.
Being replaced by the newer TPC-E (later slides)

TPC-C Data Model


[Diagram: TPC-C schema. Labels: Base Scaling Unit; # Terminals/Warehouse (i.e. concurrent users); Clustered; Partitions; Sub-Partitions]

TPC-E Benchmark
Emerging Industry Standard OLTP Benchmark
URL: www.tpc.org/tpche
Spec: http://tpc.org/tpce/spec/TPCE-v1.5.1.pdf
Very new and still evolving but highly promising
Not too many published TPC-E results as of yet
Design not compromised by RDBMS features
Much more realistic (i.e. real world) in nature
Nowhere near as easy as the old TPC-C test

TPC-E Data Model


TPS is Moot, Average Response Time is King


tpmC: 584, 582, 579, 578

But wait: adding cluster & partitioning yields a negative result. Why ???

Look to Stats Pack, AWR and ADDM Reports to investigate

Single block reads 32 ms! DISTRICT table needs to be clustered

Suggest fixing one major item per test iteration, so made the
choice to address this first
Single block read: 32 ms
Clustering worked; it made SQL the #1 performance
issue, as was expected
Single block read: 5 ms

Partitioning did not shine through just yet, possibly
skewed by the first issue

Switch MEMORY_TARGET to SGA/PGA_TARGETS


-13%

Notice that manual memory management resulted in a 13% gain !!!

But wait, there's more (isn't that almost always the case)

Now it's all just SQL, so time for SQL Tuning Advisor & SQL Tuning Sets

SQL Tuning Sets & 11G Results/Client Cache


-9%

Going to stop: have quickly reached the "good enough" point

Architecture Findings
                 OLTP                            ODS    OLAP    DM/DW
Business Focus   Operational                                    Tactical, Strategic
End User Tools   Client Server, Web                             Client Server, Web
DB Technology    Relational                                     Relational
Trans Count      Large                                          Small
Trans Size       Small                                          Large
Trans Time       Short                                          Long
Size in Gigs     10 - 200                                       400 - 4000
Normalization    3NF                                            0NF
Data Modeling    Traditional ER                                 Dimensional
Finding          Mostly Partition Elimination                   Mostly Partition-Wise Join
Design Goal      Design to eliminate per object                 Design to parallelize across objects

Other Interesting Findings


64-bit scales much more reliably than 32-bit, even when
on the same hardware and using the <= 4GB memory model
If you know the application code's nature and it's well
definable, manual memory management may be better
Using manual SGA/PGA targets with floors yields much
more scalable results and also more predictable patterns
Partitioning is not an automatic bonus; must experiment
to identify the optimal partitioning scheme per situation
Don't forget older technologies like clusters; they still can
add a positive to the overall equation in certain cases
Don't forget SQL Tuning Sets & SQL Advisor, or the
explain plans may not in fact be the best obtainable
Don't forget 11G's Result and Client Caches

Questions and Answers

Thank you
Presenter: Bert Scalzo
E-mail: Bert.Scalzo@Quest.com

Note: these slides should be available on the Open World web site,
but I'll also make sure to post them on my company's web site:
www.toadworld.com/Experts/BertScalzosToadFanaticism/tabid/318/Default.aspx
