
Deep Dive into DB2 10 Query Performance Optimization:

Star Schemas and Multi-core Query Parallelism

John Hornibrook IBM Canada

Information Management

© 2012 IBM Corporation


Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

Agenda

New DB2 10.1 features

Star schema query optimization
– Zig-zag join

Multi-core query parallelism
– Intra-partition query parallelism
– Existing functionality, significantly improved

Star Schema Query Optimization

Provides improved performance for ‘star schema’ queries

Star schemas are typically found in data marts or some data warehouses

Introduces new star schema join method (zig-zag join)
– Complementary to existing star schema join methods

Improves existing star schema detection algorithms
– Supports wider range of queries

Star Schemas

• Logical DB design resembles a star
• Central table contains business ‘facts’
  • Sales prices, cost, quantities, etc.
• Surrounding tables contain ‘dimensional’ data
  • Time, location, characteristics, etc.
• Each dimension is a ‘parent’ of the fact table
  • 1:N from a dimension to the fact

[Figure: star schema with the Daily Sales fact table in the center and the dimension tables around it]

Daily Sales (fact): perkey, prodkey, storekey, promokey, custkey, quantity_sold, price, cost
Period: perkey, year, month
Product: prodkey, category, upc_number
Store: storekey, storenumber, region
Promotion: promokey, promotype, promodesc
Customer: custkey, name, address

Star joins

Queries performed against star schemas

SELECT ITEM_DESC, SUM(QUANTITY_SOLD), AVG(PRICE), AVG(COST)
FROM PERIOD, DAILY_SALES, PRODUCT, STORE
WHERE PERIOD.PERKEY = DAILY_SALES.PERKEY
  AND PRODUCT.PRODKEY = DAILY_SALES.PRODKEY
  AND STORE.STOREKEY = DAILY_SALES.STOREKEY
  AND CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
  AND STORE_NUMBER = '03'
  AND CATEGORY = 72
GROUP BY ITEM_DESC

Aggregate on dimension attribute, sum on fact measures

Join fact to some subset of the dimensions

Join fact foreign keys to dimension primary keys

Constrain on dimension attributes


Star join dilemma

No single dimension may filter the fact table well

But a combination of dimensions may filter well

– 840,000 category 72 products sold during Jan. to April 2005
– 53,000 category 72 products sold during Jan. to April 2005 in store #3

How do we filter with a combination of dimensions?

[Figure: the Daily Sales fact table (750M rows) joined to three filtering dimensions: Period (CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'), Product (CATEGORY=72) and Store (STORE_NUMBER='03'). The individual filters are labeled 30M (Period), 50M and 20M rows.]

Star join solutions

Specialized join methods (pre-DB2 10.1):

– Semi-join with index ANDing
  • Use combinations of fact table indexes to avoid accessing data pages
– Hub join
  • Use Cartesian product of dimension rows to provide better fact keys
– Query must meet star join criteria
– Both methods can be built and competed
– Regular join plans are still built and competed with either method
  • Costing decides
  • Specialized methods aren’t always the winners

Semi-join index ANDing star join

Produce filtered fact table (Daily_Sales) with foreign key indexes
• Execute "semi-join" with each dimension that filters the fact table
• "AND" RID-maps from each semi-join with next semi-join
• Retrieve fact table columns via RIDs

[Figure: access plan with FETCH on Daily Sales at the top, fed by a chain of NLJOIN semi-joins between the Period, Product and Store dimensions and the Daily Sales foreign key indexes. The RID bitmap is passed up the chain and each semi-join eliminates bits. The bitmaps shown are:]

100100101001001010011001
101101101001111011011001
000100001000000010001000
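The index ANDing step can be sketched in a few lines of Python. This is only an illustration of the bitmap arithmetic, not DB2's implementation: the function name `and_rid_bitmaps` is hypothetical, and the three bit strings are simply the ones shown in the figure, treated here as one bitmap per semi-join (which dimension produced which bitmap is not recoverable from the slide).

```python
def and_rid_bitmaps(bitmaps):
    """AND together equal-length RID bitmaps given as strings of '0'/'1'."""
    width = len(bitmaps[0])
    result = int(bitmaps[0], 2)
    for bits in bitmaps[1:]:
        result &= int(bits, 2)  # a RID survives only if every semi-join kept it
    return format(result, f"0{width}b")


# Bit strings copied from the figure; treating them as one bitmap per
# semi-join is an assumption made for this example.
semi_join_bitmaps = [
    "100100101001001010011001",
    "101101101001111011011001",
    "000100001000000010001000",
]

surviving = and_rid_bitmaps(semi_join_bitmaps)
# Bit positions that are still set identify the fact rows to fetch.
qualifying_rids = [pos for pos, bit in enumerate(surviving) if bit == "1"]
```

Only the rows whose bit is still set after the last AND need to be fetched from the fact table, which is why the plan can avoid touching most data pages.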


Hub star join

Form a Cartesian join of filtering dimensions
– Cartesian join -> no join predicates
– Cartesian join result should be small to be effective

Join the Cartesian join result to the fact table using a multi-column fact table index.

Probe fact table with multi-column index on: PRODKEY, STOREKEY, PERIODKEY

[Figure: nested NLJOIN plan. Product (PRODKEY 10, 20) is joined to Store (STOREKEY 30, 40), that result is joined to Period (PERIODKEY 50), and the final Cartesian result probes Daily Sales.]

Product x Store:

PRODKEY | STOREKEY
10 | 30
20 | 30
10 | 40
20 | 40

(Product x Store) x Period:

PRODKEY | STOREKEY | PERIODKEY
10 | 30 | 50
20 | 30 | 50
10 | 40 | 50
20 | 40 | 50

Hub star join

Works well if Cartesian result is small

Cartesian may contain many key combinations that don’t exist in the fact table
– Results in unnecessary fact table index probes.

[Figure: NLJOIN probing Daily Sales with every row of the Cartesian result]

PRODKEY | STOREKEY | PERIODKEY
10 | 30 | 50
20 | 30 | 50
10 | 40 | 50
20 | 40 | 50
10 | 30 | 60
20 | 30 | 60
10 | 40 | 60
20 | 40 | 60
10 | 30 | 70
20 | 30 | 70
10 | 40 | 70
20 | 40 | 70
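To make the cost of non-productive probes concrete, here is a minimal Python sketch of a hub join. The dimension keys are the ones in the table above; the contents of `fact_index` and the function name `hub_join` are hypothetical and chosen only for illustration.

```python
from itertools import product


def hub_join(dimension_keys, fact_index):
    """Probe the fact index with every row of the dimension Cartesian product.

    dimension_keys: one list of qualifying keys per dimension, in index column order.
    fact_index: set of key tuples that exist in the multi-column fact table index.
    Returns (matching keys, number of probes issued).
    """
    matches, probes = [], 0
    for key in product(*dimension_keys):
        probes += 1
        if key in fact_index:
            matches.append(key)
    return matches, probes


# Qualifying keys from the slide: PRODKEY, STOREKEY, PERIODKEY.
dims = [[10, 20], [30, 40], [50, 60, 70]]
# Hypothetical fact table contents: only two of the 12 combinations exist.
fact_index = {(10, 30, 50), (20, 40, 70)}

matches, probes = hub_join(dims, fact_index)
wasted = probes - len(matches)
```

With these assumed fact keys, 10 of the 12 probes find nothing, and the number of wasted probes grows multiplicatively with every additional filtering dimension.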


DB2 10.1 Star Schema Highlights

Introduces a new zigzag join method that builds upon the zigzag join technology available in Red Brick, which has shown a unique performance advantage in the industry.

Provides consistent performance for warehouse queries.

Adds a new star detection method that is more reliable.

Supports star schema queries in single and multiple subject areas with snowflakes.

Exploits indexes even when there is a gap in probing key, reducing the number of indexes that need to be created.

Works seamlessly for range partitioned tables and in serial, SMP and DPF environments.

Can use MDC block indexes on the fact table for enabling zigzag join.

Recommends multi-column indexes to enable zigzag join through explain diagnostics and index advisor in Optim Query Tuner (OQT)


Enhancing the star detection in DB2 pre-10.1

DB2 (pre-10.1) recognizes a star
– By analysis of sizes of tables and join predicates.
– A star is detected after application of local filtering and snowflake joins.

The New Star Detection in DB2 10.1:
– Only requirement: joining dimension column(s) must be unique
– Detects multiple stars per query block
– Allows a star to be detected with fewer restrictions
– Much more reliable
– The new star detection method also enables pre-DB2 10.1 star schema plans.
– Pre-DB2 10.1 detection is invoked if the new star detection fails to detect any star.

Comparison of old and new star detection methods:

No. | Requirement/Restriction | Before DB2 10.1 | DB2 10.1
1 | Minimum of three base tables | Necessary to form a star. | Necessary to form a star.
2 | Minimum of two equijoin predicates | Necessary to form a star. | Necessary to form a star.
3 | Multi-column index on fact table | Used by the Cartesian Hub plan, if available. | Used by the Zigzag join plan, if available.
4 | Number of fact tables allowed | One | Unlimited
5 | Non-deterministic or side-effect predicates | Star cannot be formed in the query block in the presence of this SQL feature. | Star can be formed in the query block in the presence of this feature and may include the feature in the star.
6 | Non-equijoin predicates | Same as row 5. | Same as row 5.
7 | Sub-query predicates | Same as row 5. | Same as row 5.
8 | Correlation among tables in a snowflake | Same as row 5. | Same as row 5.
9 | Simple XML predicates | Same as row 5. | Same as row 5.
10 | Derived (non-base) tables | Excluded from the star. | Can be included in the star.

The new zigzag join method for star schema based queries

How does it work?
– First forms the virtual Cartesian product of dimensions.
– Avoids most non-productive probes from the Cartesian product into the fact table.
– Fact table index provides feedback to dimensions.
– Zigzags through the dimensions and the fact table.

Pre-requisite: A multi-column index on the fact table on columns that join with the dimensions.
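The feedback idea can be sketched for two dimensions in Python. This is a simplified model under stated assumptions, not the DB2 operator: the dimensions are plain sorted key lists, the fact index is a sorted list of `(k1, k2)` tuples, and the name `zigzag_join` and the sample data are hypothetical. When a probe misses, the index entry it lands on is used to skip directly to the next dimension combination that could still match.

```python
from bisect import bisect_left


def zigzag_join(d1, d2, fact_index):
    """Join two dimensions to a fact index on (k1, k2), skipping dead combinations.

    d1, d2: sorted lists of qualifying dimension keys.
    fact_index: sorted list of (k1, k2) entries from the fact table index.
    Returns (matching index entries, number of index probes).
    """
    results, probes = [], 0
    i = j = 0
    while i < len(d1) and j < len(d2):
        candidate = (d1[i], d2[j])
        probes += 1
        pos = bisect_left(fact_index, candidate)
        if pos == len(fact_index):
            break  # no index entry at or beyond the candidate
        found = fact_index[pos]
        if found == candidate:
            while pos < len(fact_index) and fact_index[pos] == candidate:
                results.append(fact_index[pos])
                pos += 1
            j += 1  # productive probe: move to the next combination
            if j == len(d2):
                i, j = i + 1, 0
        else:
            # Feedback: the index holds nothing between candidate and found,
            # so advance to the smallest combination that is >= found.
            f1, f2 = found
            i = bisect_left(d1, f1)
            if i < len(d1) and d1[i] == f1:
                j = bisect_left(d2, f2)
                if j == len(d2):
                    i, j = i + 1, 0
            else:
                j = 0
    return results, probes


# Hypothetical data: the full Cartesian product has 3 x 3 = 9 combinations.
rows, probes = zigzag_join([10, 20, 30], [1, 2, 3], [(10, 1), (10, 3), (30, 2)])
```

In this example the zigzag version issues 6 probes where a hub join would issue 9; the saving grows as the fact table becomes sparser relative to the Cartesian product.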

Using a multi-column index in a zigzag join

Pre-requisite
– Columns that participate in the join are included in the index
– Index columns from at least two dimension tables are completely covered by join predicates

Consider this star schema based query:
– D1 has primary key A
– D2 has a composite primary key (B,C)
– D3 has primary key D
– These PK columns are used in equi-join operations with the fact table

[Figure: Fact table joined to D1 on (A), to D2 on (B,C) and to D3 on (D)]

Fact table index definition | Qualified? | Why?
(A,D), (A,B,C), (B,C,D), (C,B,D) | YES | The index completely covers two dimensions.
(A,B,C,D), (A,C,B,D) | YES | The index completely covers three dimensions.
(A,B), (C,D) | NO | The index does not completely cover the dimension D2.
(B,A,C) | NO | The columns B and C in the composite index are not in contiguous positions in the index.
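The qualification table can be reproduced with a small Python check. It is a simplification that encodes only the two reasons given in the table (a dimension counts as covered when all of its key columns appear in contiguous index positions, and at least two dimensions must be covered); the function names are hypothetical and the real optimizer applies further rules, such as gap processing.

```python
def covered_dimensions(index_cols, dimensions):
    """Return the dimensions whose key columns sit contiguously in the index."""
    covered = []
    for name, key in dimensions.items():
        positions = sorted(index_cols.index(c) for c in key if c in index_cols)
        complete = len(positions) == len(key)
        # Contiguous means the key columns occupy adjacent index positions.
        if complete and positions[-1] - positions[0] == len(key) - 1:
            covered.append(name)
    return covered


def qualifies_for_zigzag(index_cols, dimensions):
    """Simplified rule: at least two dimensions are completely covered."""
    return len(covered_dimensions(index_cols, dimensions)) >= 2


# Dimensions and primary keys from the slide.
DIMS = {"D1": ("A",), "D2": ("B", "C"), "D3": ("D",)}

yes_examples = [("A", "D"), ("A", "B", "C"), ("B", "C", "D"), ("C", "B", "D"),
                ("A", "B", "C", "D"), ("A", "C", "B", "D")]
no_examples = [("A", "B"), ("C", "D"), ("B", "A", "C")]
```

Running `qualifies_for_zigzag` over `yes_examples` and `no_examples` gives the same YES/NO answers as the table.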

Zigzag join with index key gap processing

Gap processing allows a single multi-column index to be used for a bigger set of queries.

Greatly reduces the number of fact table indexes

E.g., a fact table index on (A, C, B) allows zigzag join when there is no join on C

[Figure: Fact table joined to D1 on (A) and to D2 on (B)]

Gap processing is implemented using new jump scan technology

Explain facility indicates when gap processing is used
– New JUMPSCAN argument on IXSCAN operator
– Gap columns identified

Arguments:
---------
JUMPSCAN: (JumpScan Plan)
        TRUE

Gap Info:                Status
---------                ------
  Index Column 0:        No Gap
  Index Column 1:        Positioning Gap
  Index Column 2:        No Gap
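A jump scan over an index on (A, C, B) with no predicate on C can be sketched as follows. This is an illustrative model only: the index is a sorted Python list of tuples, keys are assumed to be integers so that "the next C value" can be expressed as `c + 1`, and the function name `jump_scan` is hypothetical.

```python
from bisect import bisect_left


def jump_scan(index, a, b):
    """Find entries with A == a and B == b in an index sorted on (A, C, B).

    C is the gap column: for each distinct C value under A == a, the scan
    positions directly on (a, c, b) and then jumps to the next C value.
    Assumes integer key values (needed for the c + 1 jump below).
    """
    matches = []
    pos = bisect_left(index, (a,))
    while pos < len(index) and index[pos][0] == a:
        c = index[pos][1]
        # Position within the current C group on the wanted B value.
        pos = bisect_left(index, (a, c, b), pos)
        while pos < len(index) and index[pos] == (a, c, b):
            matches.append(index[pos])
            pos += 1
        # Jump over the rest of this C group to the next distinct C value.
        pos = bisect_left(index, (a, c + 1), pos)
    return matches


# Hypothetical index entries (A, C, B).
index = sorted([(1, 10, 5), (1, 10, 6), (1, 11, 5), (1, 12, 7), (2, 10, 5)])
hits = jump_scan(index, a=1, b=5)
```

The scan touches one position per distinct value of the gap column instead of reading every entry under A = 1, which is what lets one index serve queries that do not join on C.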

Multi-column index recommendations

New explain diagnostic message recommending multi-column fact table indexes

The optimizer analyzes the primary/unique keys and equi-join predicates in the query and issues the message when it detects that:
– the query is based on a star schema, and
– a multi-column index does not exist, or a different multi-column index might provide better performance

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details:    EXP0256I  Analysis of the query shows that the
                       query might execute faster if an additional index
                       was created. Schema name: "STAR". Table name:
                       "FACT". Column list: "(F3, F2, F1, F0)".

Optim Query Tuner provides a workload-based index advisor that uses the above feature to determine a consolidated set of index recommendations.

Understanding ZZJOIN plan components

Plan operators, from the top of the plan down:

ZZJOIN(2)
  Performs back-join to get dimension table columns required for subsequent operations if fact table access is all-probes list-prefetch.

FETCH, RIDSCAN, SORT
  Perform data prefetch of the fact table for an all-probes list-prefetch.

ZZJOIN(1)
  Performs the zigzag join operation
  1) Last leg is the fact table
  2) Preceding legs are dimensions

TBSCAN (one per dimension leg)
  Scans either:
  1) Index over temp or
  2) Fast integer sort array

TEMP (one per dimension leg)
  Builds either:
  1) Index over temp or
  2) Fast integer sort

plan for snowflake 1, plan for snowflake 2 (below each TEMP)
  Snowflake plans could either be:
  1) Access of a single table or
  2) Joins of multiple tables

access plan for fact table (last leg of ZZJOIN(1))
  Could be one of the following:
  1) Index scan
  2) Single-probe list-prefetch
  3) All-probes list-prefetch

Accessing a dimension in a zigzag join plan

A dimension leg must have TBSCAN-TEMP on top of the base dimension access plan.

[Figure: ZZJOIN(1) with three legs: TBSCAN over TEMP over the plan for snowflake 1, TBSCAN over TEMP over the plan for snowflake 2, and the access plan for the fact table]

The TEMP operator shows the following information (new operator argument):

RANDOM_ACCESS (Random Access on temp table is available using Fast Integer Sort method or Index over Temp).

To simplify the query plans in the following discussion, please assume the TBSCAN-TEMP operators exist on top of the base dimension access plan.

Fast integer sort and index-over-temp

Two new dimension access methods are implemented to ensure efficient random access of the dimensions by the zigzag join operator.

– An index is created over the TEMP operator (IOT) using dimension join columns. Additional columns may be included in the index as ‘include’ columns

– A fast integer sort (FIS) data structure is built using the join key from the dimension. This method has an extension to allow additional columns if the join key is of type INTEGER.

In order for the optimizer to pick fast integer sort, the dimension must not have a composite key and the joining column must be of type INTEGER or BIGINT.

– If the join column is of type BIGINT, fast integer sort can be used only if no other dimension column is required for subsequent operations.

The TBSCAN operator (input to ZZJOIN(1) operator) shows the following:

IDXOVTMP: (A temporary index will be created and used on this temp)
• TRUE - the scan builds an index over the temporary table for random access.
• FALSE - the scan builds a fast integer sort structure for random access.

– The feedback predicates applicable to that dimension are displayed in the form of start-stop key conditions.
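The eligibility rules for fast integer sort stated on this slide can be written as a small predicate. This is a restatement of the slide's rules only: the function name and parameters are hypothetical, and the optimizer's actual choice between the two methods is still cost-based.

```python
def fis_eligible(join_key_types, needs_other_columns):
    """Return True if a dimension may use fast integer sort (FIS).

    join_key_types: SQL types of the dimension's join key columns.
    needs_other_columns: True if non-join dimension columns are needed later.
    """
    if len(join_key_types) != 1:
        return False  # composite keys are not eligible for FIS
    key_type = join_key_types[0]
    if key_type == "INTEGER":
        return True  # the INTEGER variant can carry additional columns
    if key_type == "BIGINT":
        return not needs_other_columns
    return False


def dimension_access_method(join_key_types, needs_other_columns):
    """Fall back to index-over-temp (IOT) when FIS is not eligible."""
    return "FIS" if fis_eligible(join_key_types, needs_other_columns) else "IOT"
```

For example, a dimension joined on a single BIGINT key that must also return a description column would fall back to index-over-temp under these rules.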

Fact table index access strategies

– Index scan and data page fetch
– Single-probe list-prefetch
– All-probes list-prefetch

Fact table index access

IXSCAN-FETCH plan:

– The index scan accesses the index over the fact table to retrieve RIDs from the fact table matching the input probe values.

– These fact table RIDs are then used to fetch the necessary fact table data.

[Figure: ZZJOIN with three legs: any access on D1, any access on D2, and FETCH over IXSCAN on FACT]

Fact table access using single-probe list-prefetch plan

The list prefetch plan executes for every probe row from the combination of dimension tables/snowflakes.

The index scan over the fact table finds fact table RIDs matching the input probe values.

The SORT, RIDSCAN and FETCH operators sort RIDs according to data page ids and start off list prefetchers to get the fact table data.

[Figure: ZZJOIN with three legs: any access on D1, any access on D2, and FETCH over RIDSCAN over SORT over IXSCAN on FACT]

Fact table access using all-probes list-prefetch plan

All matching RIDs from all the probes are sorted together in the order of the fact table data pages, and the list prefetchers are started to retrieve the necessary fact table data.

The benefit of sorting all the RIDs in this fashion is that it helps achieve better prefetching and can lower the number of physical I/Os.

A back-join with each of the dimension tables is necessary to retrieve the dimension table columns required for subsequent operations
– Dimension columns do not flow through list-prefetch operation
– Back-join represented as a second ZZJOIN operator

[Figure: ZZJOIN(2) over FETCH over RIDSCN over SORT over ZZJOIN(1); the legs of ZZJOIN(1) are any access on D1, any access on D2, and IXSCAN on FACT]
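The difference between the single-probe and all-probes strategies can be illustrated by counting page visits. This is a rough model under stated assumptions: a RID is a hypothetical `(page, slot)` pair, buffer pool caching is ignored, and the function names are made up for the example.

```python
def page_visits_single_probe(rids_per_probe):
    """Each probe sorts and fetches its own RIDs, so pages can be revisited."""
    return sum(len({page for page, _slot in rids}) for rids in rids_per_probe)


def page_visits_all_probes(rids_per_probe):
    """RIDs from every probe are sorted together; each page is visited once."""
    all_rids = sorted(rid for rids in rids_per_probe for rid in rids)
    return len({page for page, _slot in all_rids})


# Hypothetical RIDs (page, slot) returned by three separate index probes.
probes = [
    [(1, 0), (2, 0)],
    [(1, 1), (3, 0)],
    [(2, 1)],
]

single = page_visits_single_probe(probes)
combined = page_visits_all_probes(probes)
```

Here the per-probe approach visits pages 5 times while the combined sort visits each of the 3 distinct pages once. The price is that dimension columns are lost in the sort, hence the back-join.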

Multi-core Query Parallelism

Also known as ‘intra-partition parallelism’

Supported in DB2 since V5

Query parallelism within a database partition

Parallelism achieved without the use of the database partitioning feature
– Does not require any form of data partitioning

Exploits symmetric multi-processor and/or multi-core processors

DB2 10.1:
– Extend the existing implementation
– Remove scalability bottlenecks

Multi-core Query Parallelism Use Cases

Large OLTP reporting systems
– Reporting jobs can often be a large part of the batch processing
– Workloads are normally running on large multi-processor machines
  • SMP, with multiple cores, sometimes with hyper-threading
– Improve multi-core query parallelism to reduce the time the reporting jobs take within the batch windows

C-Class warehouse workloads
– Targeting warehouses and marts that are up to 4-5 TB
– Will be running on x or p servers with anywhere from 8 to 32 cores
– Simple setup using ESE (i.e. no database partitioning)
– Improve query response through multi-core parallelism

Current intra-partition parallelism architecture

Combination of data and functional parallelism

Data parallelism
– Dynamically partition data
  • Assign partition to query task
  • Easier to load balance
  • User not required to partition data e.g. range, hash, etc.
– Data dynamically assigned to query tasks
  • Assign range of pages or rows (range is a fixed size prior to DB2 10.1)
  • Assign new range when range is consumed
  • Provides dynamic load balancing
  • Support table and index scans

Dynamic data allocation – “straw scans”

Degree=4

[Figure: four subagents draw page ranges from the same scan. The ranges shown are pages 0-1, 2-3, 4-5, 6-7, 8-9, etc. The subagent labels appear in the order 1, 2, 3, 4, 3, 2, i.e. a subagent takes the next range as soon as it has consumed its current one.]
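A straw scan can be modeled with threads pulling fixed-size page ranges from a shared counter. This is a toy sketch, not DB2 code: the function name `straw_scan` is hypothetical, subagents are Python threads, and the range size of 2 pages simply mirrors the figure.

```python
import threading


def straw_scan(num_pages, degree, range_size=2):
    """Let `degree` subagents claim page ranges dynamically from one scan.

    Returns a dict mapping subagent number to the (first, last) page ranges
    it claimed. Which subagent gets which range depends on thread timing.
    """
    next_page = 0
    lock = threading.Lock()
    claimed = {agent: [] for agent in range(1, degree + 1)}

    def subagent(agent):
        nonlocal next_page
        while True:
            with lock:  # claiming the next range is the only shared step
                if next_page >= num_pages:
                    return
                first = next_page
                next_page = min(first + range_size, num_pages)
                last = next_page - 1
            claimed[agent].append((first, last))  # "scan" the claimed pages

    threads = [threading.Thread(target=subagent, args=(a,)) for a in claimed]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


claimed = straw_scan(num_pages=10, degree=4)
all_ranges = sorted(r for ranges in claimed.values() for r in ranges)
```

Every range is claimed exactly once, but a fast subagent ends up with more ranges than a slow one, which is the dynamic load balancing the slide describes.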

Functional parallelism
– Divide query task by function
– Assign functional task to different execution units
– Doesn't require data partitioning
– Harder to load balance
  • Must ensure execution units are equally busy

DB2 implementation
– Single co-ordinator process services application requests
– Multiple sub-agent processes return data through local table queue
– Only 1 parallelized functional unit (section)

Functional parallelism

• Query contains only 2 subsections and 1 local table queue
• Runtime operators coordinated using latches, semaphores, shared memory control blocks

[Figure: the co-ordinator runs RETURN (9) over LTQ (8). Subagents 1 to 4 each run a copy of the same subsection: LTQ (8) over MSJOIN (7), whose two inputs are TBSCAN (3) over SORT (2) over TBSCAN (1) on PRODUCT, and TBSCAN (6) over SORT (5) over TBSCAN (4) on PRODATR.]

Intra-partition parallelism example

select p.name, p.prod_id, pa.attribute
from product p, prodatr pa
where p.prod_id = pa.prod_id;

              LTQ
              (8)
               |
            MSJOIN
              (7)
          /----+----\
      TBSCAN       TBSCAN
       (3)          (6)
        |            |
      SORT         SORT
       (2)          (5)
        |            |
      TBSCAN       TBSCAN
       (1)          (4)
        |            |
     PRODUCT      PRODATR

Reading the plan from the top down:
– LTQ (8): results returned via shared memory table queue to co-ordinator agent
– MSJOIN (7): join processed in parallel by each agent by joining corresponding partitions
– TBSCAN (3), (6): each agent scans a sort partition
– SORT (2), (5): hash partitioned sorts on prod_id, one partition per agent
– TBSCAN (1), (4): parallel table scans ("straw" scans)
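The plan above can be mimicked sequentially in Python to show how the pieces fit together. This is a sketch under stated assumptions: agents are simulated by a loop rather than real threads, partitioning uses `prod_id % degree` in place of DB2's hash function, `prod_id` is assumed unique in PRODUCT, and all names and sample rows are hypothetical.

```python
def merge_join(left, right):
    """Merge join on prod_id; `left` must have unique prod_id values."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            # (name, prod_id, attribute), matching the select list
            out.append((left[i][1], left[i][0], right[j][1]))
            j += 1
    return out


def parallel_msjoin(product, prodatr, degree):
    """Hash-partition both inputs on prod_id, then sort and join per agent."""
    parts_p = [[] for _ in range(degree)]
    parts_a = [[] for _ in range(degree)]
    for row in product:  # stands in for the parallel scan of PRODUCT
        parts_p[row[0] % degree].append(row)
    for row in prodatr:  # stands in for the parallel scan of PRODATR
        parts_a[row[0] % degree].append(row)

    table_queue = []  # stands in for the LTQ feeding the co-ordinator
    for agent in range(degree):
        left = sorted(parts_p[agent], key=lambda r: r[0])
        right = sorted(parts_a[agent], key=lambda r: r[0])
        table_queue.extend(merge_join(left, right))
    return table_queue


# Hypothetical rows: (prod_id, name) and (prod_id, attribute).
product = [(1, "bolt"), (2, "nut"), (3, "gear")]
prodatr = [(1, "steel"), (3, "brass"), (3, "large"), (4, "unused")]
joined = parallel_msjoin(product, prodatr, degree=2)
```

Because both inputs are partitioned on the join column, matching rows always land in the same partition, so each agent can join its partition without talking to the others.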

Intra-partition parallelism architecture

[Figure: at compile time, the SQL query passes through the query optimizer, which produces the best query plan as threaded code; at run time, the threaded code is executed by multiple agents, supported by prefetchers]

Single query involves
– 1 coordinating agent
– n sub agents
– m prefetchers (shared)
– All executing in parallel on available processors

Combination of
– Data parallelism
  • Each agent works on subset of data
  • Data dynamically assigned so user not required to partition data
– Functional parallelism
  • Each agent works on different query function, e.g. scan, sort

User can control "degree" of parallelism

Also benefits I/O bound uniprocessors

DB2 10.1 Multi-core query parallelism

Improved scalability

– Within the current architecture
– Scale near-linearly to degree 32
– Achieved by:
  1. Improved load balance
     • New rebalance (REBAL) access plan operator
  2. More efficient parallelization techniques
     • Move LTQ ‘higher’ in the access plan
  3. Reduce latch contention

Improved scalability

•Load imbalance results in poor scalability

•REBAL redistributes rows to ensure all subagents do equal work

•Optimizer performs load balance analysis to determine REBAL placement

                   6.77122e+06
                     NLJOIN
                     (   6)
                     713706
                       63
            /---------+----------\
         292.2                  23173.3
         REBAL                   FETCH
         (   7)                  (   9)
        325.265                 2456.85
           11                      2
           |                  /---+----\
         292.2           23173.3     6.77122e+07
         TBSCAN          IXSCAN    TABLE: DB2USER
         (   8)          (  10)      DAILY_SALES
        325.265          1605.23         Q1
           11               1
           |                |
          2922         6.77122e+07
    TABLE: DB2USER    INDEX: SYSIBM
         PERIOD     SQL091218161022180
           Q2               Q1

[Figure: "Multi-core Query Parallelism" chart comparing results before and after, plotted by degree]

Improved scalability

More efficient parallelization techniques
– Partial-final UNIQUE
– GRPBY on unique key
  • Can perform complete GRPBY without a partitioned SORT
– Improved access plan parallelization transformation costing
– Improved exploitation of stream partitioning
  • Avoid partitioned SORT

Reduce latch contention
– Dynamic straw scan unit (straw “gulp” size)
– Improved NLJOIN inner access
– Improved HSJOIN
– Improved partitioned SORT
– Prefetcher queues
– Various others

DB2 10.1 Multi-core query parallelism externals

Support mixed workloads

– Parallelize report queries in an OLTP system
– Reduce parallel ‘infrastructure’ overhead on OLTP queries
  • Pre DB2 10.1 there is a 10-15% impact just by setting INTRA_PARALLEL=ON (in ESE only; DPF unconditionally enables the parallel infrastructure)
  • DB2 10.1: Use Workload Manager (WLM) to toggle INTRA_PARALLEL and maximum DEGREE for a workload
– Improved automatic degree determination
  • degree=ANY
  • Avoid parallelizing queries that won’t benefit
  • Improved automatic runtime degree reduction

Controlling query parallelism

WLM workload control:

– An OLTP workload that doesn’t use parallelism
  • MAXIMUM DEGREE = 1 means INTRA_PARALLEL=NO

  CREATE WORKLOAD banking_wl APPLNAME('banking') MAXIMUM DEGREE 1;

– A BI workload using parallelism
  • MAXIMUM DEGREE > 1 means INTRA_PARALLEL=YES
  • Also specifies the degree upper limit
  • The application specifies the requested degree using existing external controls

  CREATE WORKLOAD report_wl APPLNAME('cognos') MAXIMUM DEGREE 8;
  ALTER WORKLOAD report_wl MAXIMUM DEGREE 4;

Application control:

  CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')

Toggles intra-partition parallelism at transaction boundaries
– Must not have open cursors across transaction boundaries e.g. WITH HOLD cursors
Pre-DB2 10.1 intra-partition parallelism external controls

Parameter | Value | Default | Scope | Comment
INTRA_PARALLEL | NO, YES | NO | Instance | DBM configuration
MAX_QUERYDEGREE | ANY, 1~32,767 | ANY | Instance | DBM configuration. Valid only if INTRA_PARALLEL is ON
DFT_DEGREE | ANY, 1~32,767 | 1 | Database | DB configuration. Initial value for CURRENT DEGREE special register or package bind DEGREE option
CURRENT DEGREE | ANY, 1~32,767 | DFT_DEGREE | Application | Special register, the degree of parallelism considered by the SQL compiler for dynamic SQL access plans
Bind DEGREE | ANY, 1~32,767 | DFT_DEGREE | Package | DB2 bind option, the degree of parallelism considered by the SQL compiler for static SQL access plans
SET RUNTIME DEGREE command | 1~32,767 | (not given) | Application | CLP command, the degree of parallelism allowed at runtime for any access plans (dynamic or static SQL)

Appendix

Additional material


Star schemas

Dimension tables

– Contain descriptive information to augment fact rows
– Used to filter fact rows
– Query results are aggregated on dimension attributes
– Contain a primary key
  • possibly multiple columns
  • generated, meaningless numeric value
– Typically contain far fewer rows than the fact table
– May be represented as a hierarchy of tables or a ‘snowflake’
  • e.g. product is further normalized to product, brand and category
  • but this requires extra joins

[Figure: the Product dimension snowflaked into Brand and Category tables]

Star schemas

Fact table

– Contains numeric measures of business information

– Queries perform computation (sum, avg, etc.) on measures

– Contains primary key columns from each dimension

• Represent foreign keys referencing each parent dimension

• Can have explicit referential integrity, but not necessary for DB2

– May have a primary key

• Composite of the foreign keys or

• Single, generated, meaningless numeric value

– Number of rows depends on fact granularity

• hourly, daily, etc.

• finer granularity -> more rows

• coarser granularity -> limits drill down ability

– Typically, local predicates aren’t applied directly


Star schemas

Data Marts
– Can contain multiple fact tables
– Each fact usually denotes a separate star
– Dimensions can be shared across stars
  • e.g. Daily_Sales and Daily_Forecast facts can share the Store and Product dimensions
– Queries may join multiple fact tables

ZZJOIN(1) operator

An n-ary join method that joins together the dimension table/snowflakes and the fact table.

Drives the process of forming probe rows from dimension tables/snowflakes,

– Probes the fact table to find matching fact table rows

– Uses the feedback from the fact table to advance to next rows on the temporary table over the dimension tables/snowflakes.

Feedback predicates identified in explain information

– New EXPLAIN_PREDICATE.HOW_APPLIED value: FEEDBACK

– Displayed in the ZZJOIN operator details in db2exfmt

Predicates:
----------
2) Feedback Predicate used in Join,
   Comparison Operator:        Equal (=)
   Subquery Input Required:    No
   Filter Factor:              0.25

   Predicate Text:
   --------------
   (Q3.D2FK = Q1.D2PK)

3) Feedback Predicate used in Join,
   Comparison Operator:        Equal (=)
   Subquery Input Required:    No
   Filter Factor:              0.25

   Predicate Text:
   --------------
   (Q3.D1FK = Q2.D1PK)

ZZJOIN (2) operator

Only required for all-probes list-prefetch.

Uses the join columns to locate the matching row in the temporary table so that the required non-join columns from the dimension table can be retrieved.

Makes use of an efficient random access method such as FIS or IOT to retrieve the dimension table columns required for subsequent operations.
– Also known as ‘backjoin’

Indicated in explain by BACKJOIN argument of ZZJOIN operator

Star schema plans in DB2 pre-10.1

Type of plan | How does the plan work? | Pre-requisite
Hub join | Cartesian product of dimensions. Each row in the Cartesian product probes the multi-column fact table index. | Multi-column index on the fact table on columns that join with the dimensions.
Semi-join with index ANDing | Pre-filtering of the fact table by dimensions (semi-joins). Index ANDing the results of the dimension filtering. Completing the dimension join. | Indexes on the fact table on each of the columns that joins with the dimensions (typically, the foreign keys).
Regular (2-way) join | Most likely plan is to join the most filtering dimension with the fact table first, then join in the rest of the dimensions using a suitable join method such as hash join. Other plans are possible. | None.