
Cloud products DB Optimizer

This document lists the optimizations that MySQL performs, as described in the MySQL internals
manual, and what DB in cloud products needs to do with respect to the cloud data model.

1. Primary Optimization
1.1. Optimizing Constant Relations

1.1.1. Constant Propagation

The transitivity law is applied and any constant that can be propagated is propagated. For
example, "column1 = column2 AND column2 = 'x'" becomes "column1 = 'x' AND column2 = 'x'".
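As a rough illustration of the transitivity rule, the sketch below derives every constant that follows from a set of column-to-column and column-to-constant equalities. The function name and the input shapes are assumptions made for this example; they are not MySQL's or the product's API.

```python
def propagate_constants(column_equalities, constant_equalities):
    """Derive constants through transitivity.

    column_equalities: list of (col_a, col_b) pairs meaning col_a = col_b.
    constant_equalities: dict mapping a column name to its constant.
    Returns a new dict that also holds every derived column = constant.
    """
    known = dict(constant_equalities)
    changed = True
    while changed:  # repeat until no further constant can be derived
        changed = False
        for col_a, col_b in column_equalities:
            for source, target in ((col_a, col_b), (col_b, col_a)):
                if source in known and target not in known:
                    known[target] = known[source]
                    changed = True
    return known
```

Calling it with the pair ("column1", "column2") and the constant {"column2": "x"} yields a constant for column1 as well.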

DB in cloud products:

Mickey/CPA/CD developers are highly unlikely to send queries in which one field is equated to
another field while one of those fields is also equated to a constant in another criterion. Mostly,
criteria equate fields with constants.

1.1.2. Eliminating Dead Code

Conditions that are always true or always false are removed (an always-false condition is
sometimes reported as an impossible WHERE). Those conditions are:

1. Boolean expressions that compare constants with constants, along with the other conditions they enclose.

2. IS NULL and IS NOT NULL criteria whose outcome is fixed by whether the column is nullable or NOT NULL.

MySQL cannot detect all 'always false' criteria. For example, if the size of a char column is less
than the length of the char sequence in the criteria, the condition is always false, but MySQL
does not detect it.

DB in cloud products:

Mickey/CPA/CD developers are highly unlikely to send query criteria that compare one constant
with another constant.

All data table fields are nullable in the MySQL schema, but some fields could have been set as
not nullable during custom module creation. So this check needs to be carried out by DB in cloud
products.

DB in cloud products could also eliminate an impossible WHERE using the metadata we have:

• Datatype mismatch between the constant and the column type in the metadata

• Char field size

• Enum field choices (int, char)

The allowed-values intelligence in the Mickey Data Dictionary is not going to be available to us,
so we have to do this validation ourselves.
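The three metadata checks listed above could be sketched as follows. FieldMeta and is_impossible_equality are hypothetical names invented for this example, and treating a type mismatch as "can never match" is the assumption stated in this section rather than MySQL's own coercion behaviour.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical metadata record for one module field."""
    py_type: type                      # expected Python type of stored values
    max_length: Optional[int] = None   # char field size, if the field has one
    choices: Optional[Tuple] = None    # enum choices, if the field has them


def is_impossible_equality(meta, constant):
    """Return True when `field = constant` can never match a stored value."""
    if not isinstance(constant, meta.py_type):
        return True                    # datatype mismatch
    if (meta.max_length is not None and isinstance(constant, str)
            and len(constant) > meta.max_length):
        return True                    # longer than the char field allows
    if meta.choices is not None and constant not in meta.choices:
        return True                    # not one of the enum choices
    return False
```

A criterion flagged by this check can be treated as an impossible WHERE before any query is sent to MySQL.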

1.1.3. Folding of Constants

Arithmetic expressions in criteria are transformed to their final value. For example:

From: WHERE column1 = 1 + 2

To: WHERE column1 = 3

Such an expression might not be part of the criteria to begin with, but could be the result of
propagating constants.

DB in cloud products:


Any arithmetic expression given in a criteria object is evaluated by the JVM before it is attached
to the field, so we would not normally need this. If constant propagation is implemented, folding
would need to be done as well, so this depends on what is decided for constant propagation.

1.1.4. Constants and Constant Tables

The word “constant” in the MySQL context means not only a primitive value or a literal string, but
also a table expression that satisfies one of the following conditions.

1. A table with zero rows, or with only one row

2. A table expression that is restricted with a WHERE condition, containing expressions of the
form column = constant, for all the columns of the table's primary key, or for all the columns
of any of the table's unique keys (provided that the unique columns are also defined as NOT
NULL).

These rules mean that a constant table has at most one row value. MySQL will evaluate a
constant table in advance, to find out what that value is. Then MySQL will “plug” that value into
the query.

DB in cloud products:

We will need to do this in DB in cloud products when there are criteria on Primary Keys, Serial
Fields, or Unique Fields. Note: if other criteria are in conjunction with those, the table is still a
valid constant. If other criteria are in disjunction with them, it might not be a constant, because
the result might contain more than one row. Example:

FROM Table1 ... WHERE unique_not_null_column=5

when Table1 definition contains

... unique_not_null_column INT NOT NULL UNIQUE

The above is a constant, but not when the query is:

FROM Table1 ... WHERE unique_not_null_column=5 OR some_random_column=3
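The rule above can be sketched as a check over the top-level AND-ed predicates of a query. The tuple shape (column, operator, value) and the function name are assumptions made for this example; an OR group is passed as one opaque predicate, so it never pins a column.

```python
def is_constant_table(conjuncts, unique_not_null_keys):
    """Return True when the AND-ed predicates pin a whole unique NOT NULL key.

    conjuncts: list of (column, operator, value) joined by AND at top level.
    unique_not_null_keys: list of non-empty column tuples, one per primary
    key or unique key whose columns are all NOT NULL.
    """
    pinned = {column for column, operator, _ in conjuncts if operator == "="}
    return any(set(key) <= pinned for key in unique_not_null_keys)
```

With the Table1 definition above, the first query passes the check and the second one, whose only top-level predicate is an OR group, does not.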

1.2. Optimizing JOINS

1.2.1. Determining the JOIN type (But it really means Access Type)

Access/Join Type means “In what way the data in the table is retrieved/accessed”. Best to worst:

1. system: a system table which is a constant table

2. const: a constant table

3. eq_ref: a unique or primary index with an equality relation

4. ref: an index with an equality relation, where the index value cannot be NULL

5. ref_or_null: an index with an equality relation, where it is possible for the index value to be
NULL

6. range: an index with a relation such as BETWEEN, IN, >=, LIKE, and so on.

7. index: a sequential scan on an index

8. ALL: a sequential scan of the entire table

The optimizer picks the driver expression according to the best available join type. Scanning the
whole table is the worst option; an indexed search is better than a full table scan.
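A minimal sketch of ranking by access type, assuming the best-to-worst order listed above. The names ACCESS_TYPES, RANK and pick_driver are made up for this example.

```python
# Access types from best to worst, in the order listed in this section.
ACCESS_TYPES = ["system", "const", "eq_ref", "ref", "ref_or_null",
                "range", "index", "ALL"]
RANK = {name: position for position, name in enumerate(ACCESS_TYPES)}


def pick_driver(access_by_table):
    """Return the table whose available access type ranks best."""
    return min(access_by_table, key=lambda table: RANK[access_by_table[table]])
```

For instance, given one table reachable only by ALL and another reachable by ref, the ref table is chosen as the driver.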

DB in cloud products:

The module creator might have created an index for a field that MySQL is unaware of, in which
case MySQL might do a full table scan on the data table. So we must check for indexes (better
access paths) for a table or its rows. If we use an index, we get one or more record IDs from the
index table through whichever access type among [const, eq_ref, ref, ref_or_null, range, index,
ALL] the criteria allow, and with those record IDs we do eq_ref or range access on the data table.

1.2.2. Joins and Access Methods (Access Path - multiple tables)

A Query Execution Plan (QEP) is a set of tables, the order in which they are joined, and the access
types of those tables. A cost is assigned to each part of a plan, and the QEP with the least cost
is chosen as the optimal QEP.

MySQL starts calculating costs for the tables in the order given in the SELECT, working bottom-up:
it calculates all the plans for one table, then all the plans for two tables, and so on. It first
calculates the cost of one complete QEP; for subsequent combinations, it continues only while the
current cost (of the partial plan so far) is less than the best cost found. Otherwise it moves on to
the next combination.

The cost of an access is calculated using a complex formula involving the following factors. The
correlation is given next to each in parentheses. The factors in blue are not considered all the time.
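The pruning idea can be illustrated with a small search over join orders that abandons a partial plan as soon as its cost reaches the best complete plan found so far. This is a simplified sketch, not MySQL's implementation: best_join_order and the step_cost callback are hypothetical, and the row-count product used in the usage lines is an invented stand-in for the real cost formula.

```python
import math


def best_join_order(tables, step_cost):
    """Search join orders, pruning partial plans that already cost too much.

    step_cost(prefix, table) returns the cost of adding `table` after the
    tables already in `prefix`.
    """
    best = {"cost": math.inf, "order": None}

    def extend(prefix, remaining, cost_so_far):
        if cost_so_far >= best["cost"]:
            return                      # partial plan is no better: prune
        if not remaining:
            best["cost"], best["order"] = cost_so_far, prefix
            return
        for table in remaining:
            rest = [other for other in remaining if other != table]
            extend(prefix + [table], rest,
                   cost_so_far + step_cost(prefix, table))

    extend([], list(tables), 0)
    return best["order"], best["cost"]


# Usage with an assumed cost model: rows examined grow with the fan-out
# of the tables already joined.
rows = {"a": 100, "b": 10, "c": 1}


def fan_out_cost(prefix, table):
    return math.prod(rows[name] for name in prefix) * rows[table]


order, cost = best_join_order(["a", "b", "c"], fan_out_cost)
```

Under this toy cost model the smallest table ends up first in the chosen order.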

1. How many rows in table (More rows -> bad)

2. How many Key parts in common with tables (join criteria indexed on both table?) (More match
-> better)

3. How selective is restriction in WHERE part (More selective - better)

4. Long Key (Good, Varchar bad?)

5. Unique/Primary Key (Good)

6. Text Key (Bad)

7. Multi Column (covering) Keys (Good)

8. Short Key Length (Good)

9. Table file (Smaller -> Better)

10. Index Level (B+Tree, Smaller -> Good)

11. All Sort, Aggregate columns in this table (Good)

DB in cloud products:

Index details, constraint (unique) details, and the number of rows in the “module” are now held by
DB in cloud products. So we will have to decide the access path and ask MySQL to execute along
the path provided.

[…] More analysis + observations required.

1.2.3. Range Join Type (Access Type)

This access type works with indexes when the relation is >, <, >=, <=, IN, LIKE, or BETWEEN ...
AND. If the range is very wide, MySQL chooses ALL instead of the range join type, which saves the
overhead of going back and forth between index and table pages. MySQL also treats

“column1 IN (1,2,3)”

the same as

“column1 = 1 OR column1 = 2 OR column1 = 3”

and hence such queries need not be rewritten manually. BETWEEN ... AND is likewise changed to a
pair of >= and <= comparisons.

The MySQL optimizer will use an index range scan for

“column1 LIKE 'x%'”

but not for

“column1 LIKE '%x'”

The first character of the pattern cannot be a wildcard if the index is to be used.

DB in cloud products:

We should always use our index table (if available) instead of scanning the data table.

>, <, >=, <=, <>, BETWEEN, AND, LIKE

We can extract a range condition from the given condition, similar to how MySQL does (Range
Access Method for Single-Part Indexes), fetch only those records from the data table, and apply
the WHERE criteria to them instead of to the whole table.

IN

Find the min and max values in the list and do a range access like the one above?
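One possible shape for the range extraction is sketched below. It returns inclusive bounds that may be wider than the exact condition (for example, > is treated like >=, and a LIKE prefix only gives a lower bound), on the assumption that the original WHERE criteria are re-applied to the fetched rows. The function name to_range and the operator spellings are assumptions for this example.

```python
def to_range(operator, value):
    """Return (low, high) inclusive bounds for an index range access.

    None as a bound means unbounded. Returns None when no range can be
    extracted (for example <>, or LIKE with a leading wildcard).
    """
    if operator == "=":
        return (value, value)
    if operator in (">", ">="):
        return (value, None)
    if operator in ("<", "<="):
        return (None, value)
    if operator == "BETWEEN":
        low, high = value
        return (low, high)
    if operator == "IN":
        return (min(value), max(value))   # min/max idea from this section
    if operator == "LIKE":
        # Keep only the literal prefix before the first wildcard.
        prefix = value.split("%", 1)[0].split("_", 1)[0]
        return (prefix, None) if prefix else None
    return None
```

For an IN list with widely spread values the min/max range can cover far more rows than the list itself, which is one reason the idea above is still phrased as a question.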

1.2.4. Index Join Type

MySQL chooses to return the result set from the index itself, instead of reading the table pages,
when the index contains all the required columns.

DB in cloud products:

We must return the values from the index table, without reading the data table, if the result set
needs only the field in the index. If we support composite indexes, this case must be extensible,
and the data table should not be read unless required.

1.2.5. Index Merging

Index merging happens when the query criteria are in the form

“cond_1 OR cond_2 ... OR cond_N”

Every column in a predicate of this form must have an index, and no two or more of the columns
may belong to the same index.

It works for queries such as the following:

SELECT * FROM t WHERE key1=c1 OR key2<c2 OR key3 IN (c3,c4);

SELECT * FROM t WHERE (key1=c1 OR key2<c2) AND nonkey=c3;

MySQL selects row IDs from each of the index keys separately and puts them into an object
(Unique), which removes duplicate row IDs. It then uses those row IDs to retrieve the rows.

MySQL creates a SEL_IMERGE object that contains a disjunction of SEL_TREE objects. The optimizer
collects all possible ways of row access and chooses the one with minimal cost.

DB in cloud products:

We must implement an index merge feature, where we

1. Take the indexed column criteria and return record ids.

2. Combine the result with other index criteria result record ids using set theory.

3. Return the rows from data table, with record ids in final result.

If the query contains criteria only on indexed columns, then we must use the implemented index
merge to get the record IDs and, from them, the rows.

If the query contains criteria on both indexed and non-indexed columns, then we must use the
indexed part of the criteria to get the record IDs, fetch the rows, and then apply the non-indexed
column criteria. The result will be the query result.

If the query is a join between two tables, with index criteria on both tables and a join condition,
then the rel record ID relation can be selected first, followed by the record IDs from the first
table and the record IDs from the second table. The row IDs that need to be selected from the data
table are

Result row ID pairs = (rel table pairs) intersected with (cross product of the first and second
tables' record IDs)?

Compared with a full table scan:

Best case: all the criteria columns are indexed.

Final result record IDs from index table -> records from data table using record IDs.

Worst case: only one criterion is on an indexed column, and it is in conjunction with the rest.

Record IDs from that index table -> records from data table -> criteria on result set -> final
result set

No case: there are no indexed criteria, or one or more indexed criteria are in disjunction with
non-indexed criteria.

Index merging cannot be used here, as we would have to do a full table scan for the other criteria
anyway.
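The three steps of the proposed index merge (record IDs per index, set operations, fetch from the data table) could look roughly like this on the app server. The in-memory dict standing in for the data table, and the names merge_record_ids and fetch_rows, are assumptions made for this example.

```python
def merge_record_ids(id_lists, mode="OR"):
    """Combine per-index record-ID lists: union for OR, intersection for AND."""
    sets = [set(ids) for ids in id_lists]
    if not sets:
        return set()
    if mode == "OR":
        return set().union(*sets)
    return set.intersection(*sets)


def fetch_rows(data_table, record_ids, residual=None):
    """Fetch rows by record ID, then apply any non-indexed criteria.

    data_table: dict mapping record ID -> row (stand-in for the data table).
    residual: optional function row -> bool for the non-indexed criteria.
    """
    rows = [data_table[record_id] for record_id in sorted(record_ids)]
    if residual is None:
        return rows
    return [row for row in rows if residual(row)]
```

The worst case described above corresponds to calling fetch_rows with the record IDs of a single index and a residual function for the remaining criteria.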

1.3. Transpositions

MySQL supports transpositions (reversing the order of operands around a relational operator), but
not transpositions that require arithmetic.

“WHERE -5 = column1”

becomes

“WHERE column1 = -5”

But

“WHERE 5 = -column1”

is not transformed.
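A transposition of this kind only swaps the operands and mirrors the operator. The sketch below assumes a predicate is a (left, operator, right) triple and that the caller supplies an is_column test; both are conventions invented for this example.

```python
# Mirror image of each relational operator when its operands are swapped.
FLIPPED = {"=": "=", "<>": "<>", "<": ">", ">": "<", "<=": ">=", ">=": "<="}


def transpose(left, operator, right, is_column):
    """Move the column to the left-hand side when it is on the right."""
    if not is_column(left) and is_column(right):
        return (right, FLIPPED[operator], left)
    return (left, operator, right)
```

An expression such as -column1 is not a bare column, so a predicate like 5 = -column1 is returned unchanged, matching the limitation described above.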

DB in cloud products:
Users are not expected to use arithmetic expressions in queries built through code, and even if
they do use one on the RHS (or LHS), it will be evaluated on the stack before the query is sent to
MySQL.

1.3.1. AND relations

For the following situation:

“WHERE column1 = 'x' AND column2 = 'y'”

MySQL decides as follows.

1. If neither column1 nor column2 is indexed: table scan

2. If one of them has a better join type, then pick that index to drive the query (get the row IDs,
then the rows, and apply the other column's condition to them)

3. If both of them are indexed, then pick the index that was created first OR use index merge

DB in cloud products:

Using index merge is more efficient for us, because the unions, intersections, differences, etc.
can be computed on the app server.

1.3.2. OR relations

“WHERE column1 = 'x' OR column2 = 'y'”

MySQL decides

1. Table scan for both

2. Or Index merge (union)

If column1 and column2 are the same column rather than different ones, then there is no table
scan or index merge; range access is used instead.

DB in cloud products:

Same observation as 1.3.1.

1.3.3. UNION queries

“SELECT * FROM Table1 WHERE column1 = 'x'

UNION ALL

SELECT * FROM TABLE1 WHERE column2 = 'y'”

This produces the same result as 1.3.2. (OR relations), except that a row matching both conditions
is returned twice by UNION ALL.

DB in cloud products:

Same observation as 1.3.1.

1.3.4. NOT relations

“column1 <> 5”

…will produce same result as…

“column1 < 5 OR column1 > 5”

But MySQL will NOT transform this case. If the developer feels range access is better, the query
should be written accordingly.

DB in cloud products:

We could build the intelligence to transform this query into DB in cloud products. If the row count
is large, use range access, but if it is small, an index scan? Or always a range scan?

1.4. ORDER BY Clauses


MySQL removes a dead ORDER BY clause. For example:

“SELECT column1 FROM Table1 ORDER BY 'x';”

MySQL also removes/skips the ORDER BY if it knows the rows will be in order anyway.

“SELECT column1 FROM Table1 ORDER BY column1;”

The above query uses the index on column1 if one is available.

“SELECT column1 FROM Table1 ORDER BY column1+1;”

The above query also uses the index on column1, but only to find the rows. An index scan is always
a better join type than a table scan.

DB in cloud products:

If the ORDER BY is on an indexed field, we can get the record IDs in the order we want by ordering
on the index table. Then query the data table as follows:

select Zcdb_DataTable.<columns>
from Zcdb_DataTable
where MODULE_ID = <MODULE_ID>
  and RECORD_ID in (<ORDERED RECORD_ID WE GOT FROM INDEXED TABLE ORDERING>)
order by field(RECORD_ID, <ORDERED RECORD_ID WE GOT FROM INDEXED TABLE ORDERING>);

More analysis is needed to determine whether this method is more efficient than a filesort on the
data table.
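The query above could be assembled with placeholders as in the following sketch. The function name ordered_fetch is hypothetical, the %s placeholder style is the one commonly used by Python MySQL drivers, and column names are assumed to come from trusted metadata because they are interpolated directly.

```python
def ordered_fetch(columns, module_id, ordered_record_ids):
    """Build the SQL text and parameters for fetching rows in index order.

    ordered_record_ids: record IDs already sorted via the index table.
    Returns (sql, params) suitable for a DB-API cursor.execute call.
    """
    if not ordered_record_ids:
        raise ValueError("ordered_record_ids must not be empty")
    column_list = ", ".join(f"Zcdb_DataTable.{name}" for name in columns)
    marks = ", ".join(["%s"] * len(ordered_record_ids))
    sql = (
        f"SELECT {column_list} FROM Zcdb_DataTable "
        f"WHERE MODULE_ID = %s AND RECORD_ID IN ({marks}) "
        f"ORDER BY FIELD(RECORD_ID, {marks})"
    )
    # The record IDs are passed twice: once for IN, once for FIELD.
    params = [module_id, *ordered_record_ids, *ordered_record_ids]
    return sql, params
```

Passing the ID list twice doubles the parameter count, which is one of the costs to weigh against a filesort.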

1.5. GROUP BY and related conditions


Related items are HAVING, COUNT(), MIN(), MAX(), SUM(), AVG(), DISTINCT(). GROUP BY will use
an index if one exists. If there is no index, GROUP BY will sort and use a hash table, so that equal
elements fall into the same bucket. The result of GROUP BY is always in sorted order.

“SELECT COUNT(*) FROM Table1;” returns the row count directly from the meta cache for MyISAM,
and the table is never read. “SELECT COUNT(column1) FROM Table1;” is subject to the same
optimization only when column1 is NOT NULL.

Optimizations for MAX(), MIN() exist.

“SELECT MAX(column1)

FROM Table1

WHERE column1 < 'a';”

If column1 is indexed, MySQL finds 'a' in the index and goes back one key. The reverse applies to
MIN(), and similarly for the other comparison operators and their combinations.
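On a sorted list of index keys, "find the bound and step back one key" is a binary search. The sketch below uses Python's bisect module on an in-memory list as a stand-in for the index; max_below and min_above are names chosen for this example.

```python
import bisect


def max_below(sorted_keys, bound):
    """MAX(key) WHERE key < bound, or None when no key qualifies."""
    position = bisect.bisect_left(sorted_keys, bound)  # first key >= bound
    return sorted_keys[position - 1] if position > 0 else None


def min_above(sorted_keys, bound):
    """MIN(key) WHERE key > bound, or None when no key qualifies."""
    position = bisect.bisect_right(sorted_keys, bound)  # first key > bound
    return sorted_keys[position] if position < len(sorted_keys) else None
```

Neither function touches the rows themselves, which is the point of answering MIN() and MAX() from the index alone.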

“SELECT DISTINCT column1 FROM Table1;” is transformed to

“SELECT column1 FROM Table1 GROUP BY column1;” only when

• The GROUP BY can be done with an index (only one table in the FROM clause, and no WHERE clause)

• There is no LIMIT clause

DISTINCT will return an ordered result like GROUP BY when this transformation occurs. GROUP BY
always returns ordered results unless there is an ORDER BY NULL clause.

DB in cloud products:

Once we decide to maintain count metadata, we can return the count of rows in a table, or of a
not-null column, from that metadata instead of querying.

We can also perform the MAX() and MIN() optimizations directly from the index.

We can use the index table to do the GROUP BY and get record IDs, then do a shorter range scan
and perform the aggregate action on the data table.

For example, imagine the index table contains:

Record ID    Field    Value
1            F1       722
2            F1       675
3            F1       923
4            F1       675
5            F1       722
6            F1       722
7            F1       722
8            F1       675
9            F1       675

The index table grouped will be:

Record IDs      Value
{1, 5, 6, 7}    722
{2, 4, 8, 9}    675
{3}             923

With these record IDs (after also applying criteria, if any) we can do a range scan on the data
table and perform the GROUP BY, which could be efficient if the module's row count is sufficiently
large and we have reduced the width of the range scan. If the module is small, this need not be done.
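The grouping of the index table shown above can be reproduced with a dictionary keyed by value. The (record_id, value) row shape and the function name are assumptions for this sketch.

```python
from collections import defaultdict


def group_record_ids(index_rows):
    """Group record IDs by indexed value.

    index_rows: iterable of (record_id, value) pairs from the index table.
    Returns a dict mapping each value to its record IDs in input order.
    """
    groups = defaultdict(list)
    for record_id, value in index_rows:
        groups[value].append(record_id)
    return dict(groups)


# The nine index rows for field F1 from the example above.
index_rows = [(1, 722), (2, 675), (3, 923), (4, 675), (5, 722),
              (6, 722), (7, 722), (8, 675), (9, 675)]
grouped = group_record_ids(index_rows)
```

Aggregates such as COUNT() per group follow directly from the length of each list, without reading the data table.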

2. Other Optimizations
2.1. NULLs filtering for ref and eq_ref access

2.1.1. Early NULL filtering

This is done after choosing the join order. Suppose there is a join order

…, …, table1, table2, …

And the condition in which they are joined is

table1.key_column = table2.column

or

… (table1.composite_key_partN = table2.columnN) AND (table1.composite_key_partN+1 =
table2.columnN+1) …

Then table1 gets ref or eq_ref access, because the column is a key. A row in which table2.column
(or table2.columnN) is NULL can never satisfy the equality. Therefore MySQL adds the predicate

table2.column IS NOT NULL

This predicate can be checked and rows can be filtered after reading the rows of table2.
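The effect of the added IS NOT NULL predicate can be shown with a small nested-loop join in which rows of table2 with a NULL join column are skipped before any lookup on table1 is attempted. The dict used as the table1 key index and the function name are assumptions for this example.

```python
def join_with_early_null_filter(table2_rows, column, table1_index):
    """Join table2 rows to table1 through a key lookup, skipping NULL keys.

    table2_rows: list of dict rows read from table2.
    column: name of the table2 column used in the join condition.
    table1_index: dict mapping a key value to the matching table1 rows.
    """
    joined = []
    for row in table2_rows:
        key = row[column]
        if key is None:          # table2.column IS NOT NULL
            continue             # no index lookup is made for this row
        for match in table1_index.get(key, []):
            joined.append((row, match))
    return joined
```

The saving is the skipped lookup for every NULL row; the result is the same as without the filter.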

DB in cloud products:

For any equality between an indexed or not-null column and any other column (non-indexed, not
null), we can add this predicate so that filtering is improved.

2.1.2. Late NULL filtering

If there is a query plan with the ref access method on a table, and the criteria are

table.key_part_1 = expr1 AND table.key_part_2 = expr2 AND ..

then if any expr_i is NULL, the query returns no result without executing.

DB in cloud products:

Consider a predicate P with NULL on the RHS and a non-null indexed column on the LHS. P can never
be true, so if P is in conjunction with the other predicates we can return no result without
executing, and if P is in disjunction with other predicates we can simply remove it. Refer to
1.1.2. Eliminating Dead Code.

2.2. Partitioning Related Optimization


Not in scope currently.

