This document covers the optimizations that MySQL performs, as described in its internals manual, and what we in DB in cloud products need to do with respect to the cloud data model.
1. Primary Optimization
1.1. Optimizing Constant Relations
The law of transitivity is applied, and any constant that can be propagated is propagated: for example, given column1 = column2 and column2 = 5, the constant 5 can be propagated into the first criterion.
DB in cloud products:
It is highly unlikely that Mickey/CPA/CD developers will send a query whose criteria equate one field to another field while one of those fields is also equated to a constant in another criterion. Mostly, criteria equate fields with constants.
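If we ever do implement constant propagation on the app side, it can be sketched as a fixed-point pass over the field-to-field equalities. This is a minimal illustration; the shape of the input (a list of field-name pairs plus a map of fields already bound to constants) is hypothetical, not the real cloud data model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of transitive constant propagation: given "a = b" among the
// criteria and "b = 5" already known, transitivity lets us bind "a" to 5 too.
public class ConstantPropagation {
    // equalities: pairs of field names equated to each other (field = field)
    // constants:  fields already equated to a constant (field = literal)
    public static Map<String, Integer> propagate(List<String[]> equalities,
                                                 Map<String, Integer> constants) {
        Map<String, Integer> resolved = new HashMap<>(constants);
        boolean changed = true;
        while (changed) {                    // iterate until a fixed point is reached
            changed = false;
            for (String[] eq : equalities) {
                String a = eq[0], b = eq[1];
                if (resolved.containsKey(a) && !resolved.containsKey(b)) {
                    resolved.put(b, resolved.get(a));
                    changed = true;
                } else if (resolved.containsKey(b) && !resolved.containsKey(a)) {
                    resolved.put(a, resolved.get(b));
                    changed = true;
                }
            }
        }
        return resolved;
    }
}
```

Running `propagate(List.of(new String[]{"column1", "column2"}), Map.of("column2", 5))` binds column1 to 5 as well.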
Removing conditions that are always true or always false (sometimes an “impossible WHERE”).
MySQL cannot detect all “always false” criteria. For example, if a CHAR column’s size is smaller than the string literal in the criterion, the criterion is always false.
DB in cloud products:
It is highly unlikely that Mickey/CPA/CD developers will send query criteria that compare one constant with another constant.
All data-table fields are nullable in the MySQL schema, but during custom module creation some fields may have been set to not nullable, so this check needs to be carried out by DB in cloud products.
DB in cloud products could also eliminate an impossible WHERE with the metadata we have.
The allowed-values intelligence in the Mickey Data Dictionary is not going to be available to us, so we have to do that validation ourselves.
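A minimal sketch of the metadata checks this enables, assuming a hypothetical FieldMeta record that carries a field's declared CHAR size and the nullability set at module creation:

```java
// Sketch of detecting an "impossible WHERE" from metadata alone: an equality is
// always false when the literal cannot fit in the declared CHAR size, or when a
// NOT NULL field is compared with NULL. FieldMeta is an illustrative stand-in.
public class ImpossibleWhere {
    public static final class FieldMeta {
        final int charSize;      // declared CHAR(n) size
        final boolean nullable;  // nullability from module-creation metadata
        public FieldMeta(int charSize, boolean nullable) {
            this.charSize = charSize;
            this.nullable = nullable;
        }
    }

    // true when "field = literal" can never match any row
    public static boolean alwaysFalse(FieldMeta meta, String literal) {
        if (literal == null) return !meta.nullable;   // NOT NULL field = NULL
        return literal.length() > meta.charSize;      // literal cannot fit the column
    }
}
```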
To: WHERE column1 = 3
This might not be part of the criteria to begin with, but could be the result of propagating constants.
DB in cloud products:
Any arithmetic expression given in a criteria object is evaluated by the JVM before it is attached to the field, so normally we would not need this. If we implement constant propagation, though, this folding would need to be done; it depends on how constant propagation shapes up.
In the MySQL context, the word “constant” means not only a primitive value or a literal string; a table expression is also a constant when it satisfies the following conditions.
These rules mean that a constant table has at most one row value. MySQL will evaluate a constant table in advance to find out what that value is, and will then “plug” that value into the query.
DB in cloud products:
We will need to do this in DB in cloud products if there are criteria on primary keys, serial fields, or unique fields. Note: if other criteria are in conjunction with those, the table is still a valid constant. If criteria are in disjunction, it might not be a constant, since the result might contain more than one row. Example:
when Table1 definition contains
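Whatever the concrete table definition, the conjunction/disjunction rule can be sketched as follows; the Criterion shape and the Combine enum are illustrative stand-ins for our criteria objects:

```java
import java.util.List;

// Sketch of the "at most one row" check: a top-level conjunction containing at
// least one equality on a primary key, serial field, or unique field pins the
// result to at most one row, so the table can be treated as a constant table.
// A top-level disjunction gives no such guarantee.
public class ConstantTableCheck {
    public enum Combine { AND, OR }

    public static final class Criterion {
        final String field;
        final boolean onUniqueField; // equality on a PK / serial / unique field
        public Criterion(String field, boolean onUniqueField) {
            this.field = field;
            this.onUniqueField = onUniqueField;
        }
    }

    // true when the criteria guarantee at most one matching row
    public static boolean isConstant(Combine combine, List<Criterion> criteria) {
        if (combine == Combine.OR) return false;  // disjunction may match many rows
        return criteria.stream().anyMatch(c -> c.onUniqueField);
    }
}
```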
1.2.1. Determining the JOIN type (But it really means Access Type)
Access/Join Type means “In what way the data in the table is retrieved/accessed”. Best to worst:
1. system: a system table which is a constant table
2. const: a constant table
3. eq_ref: a unique or primary index with an equality relation
4. ref: an index with an equality relation, where the index value cannot be NULL
5. ref_or_null: an index with an equality relation, where it is possible for the index value to be
NULL
6. range: an index with a relation such as BETWEEN, IN, >=, LIKE, and so on.
Page 2 of 9
7. index: a sequential scan on an index
8. ALL: a sequential scan of the entire table
The optimizer uses the best join type to pick the driver expression. Scanning the whole table is bad; an indexed search is better than a full table scan.
DB in cloud products:
There might be an index created by the module creator for a field; MySQL is unaware of it and might do a full table scan on the data table. So we must check for indexes (better access paths) for a table. If we use an index, we will get one or more record IDs from the index table through one of the access types [const, eq_ref, ref, ref_or_null, range, index, ALL], based on the criteria, and with those record IDs we will do eq_ref or range access on the data table.
A Query Execution Plan (QEP) is a set of tables, the order in which they are joined, and the access types of those tables. A cost is assigned to each part of a plan, and the QEP with the least cost is chosen as the optimal QEP.
MySQL starts the cost calculation with the tables in the order given in the SELECT, and calculates in a bottom-up manner: all the plans for one table, then all the plans for two tables, and so on. It first calculates the cost of one full QEP, and for subsequent combinations it continues only while the current cost (of the partial plan so far) is less than the best; otherwise it moves on to the next combination.
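The bottom-up search with pruning can be sketched like this; the cost model used here (cumulative product of per-table row estimates) is a deliberately simplified stand-in for MySQL's real formula:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of bottom-up plan search with cost pruning: enumerate join orders,
// accumulate a cost for each partial plan, and abandon a branch as soon as its
// partial cost reaches the best complete cost found so far.
public class PlanSearch {
    private double best = Double.MAX_VALUE;
    private List<Integer> bestOrder = null;

    public List<Integer> cheapestOrder(double[] rowEstimates) {
        search(new ArrayList<>(), new boolean[rowEstimates.length], 0.0, 1.0, rowEstimates);
        return bestOrder;
    }

    private void search(List<Integer> order, boolean[] used,
                        double cost, double fanout, double[] rows) {
        if (cost >= best) return;                 // prune: partial plan already worse
        if (order.size() == rows.length) {        // complete plan, new best
            best = cost;
            bestOrder = new ArrayList<>(order);
            return;
        }
        for (int t = 0; t < rows.length; t++) {
            if (used[t]) continue;
            used[t] = true;
            order.add(t);
            double newFanout = fanout * rows[t];  // rows flowing into the next join
            search(order, used, cost + newFanout, newFanout, rows);
            order.remove(order.size() - 1);
            used[t] = false;
        }
    }
}
```

For row estimates {100, 1, 10} the search settles on the order [1, 2, 0], driving from the smallest table.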
The cost of an access is calculated using a complex formula involving the following factors; the correlation is given next to each in parentheses. The factors in blue are not considered every time.
2. How many key parts are in common with the joined table (is the join criterion indexed on both tables?) (more matches -> better)
DB in cloud products:
Index details, constraint (unique) details, and the number of rows in the “module” are now available to DB in cloud products. So we will have to decide the access path and ask MySQL to execute along the path we provide.
This works with indexes where the relation is >, <, >=, <=, IN, LIKE, or BETWEEN ... AND. MySQL chooses to do ALL instead of the range join type if the range is very wide; this saves the time overhead of going back and forth between index and table pages. Also, MySQL treats
“column1 IN (1,2,3)”
the same as
“column1 = 1 OR column1 = 2 OR column1 = 3”
and hence it need not be changed manually; BETWEEN ... AND is likewise changed to >= and <=.
DB in cloud products:
We should always use our index table (if available) instead of scanning the data table.
We can extract the range condition from the given condition, similar to how MySQL does (the Range Access Method for a single-part index), fetch those records from the data table, and apply the WHERE criteria to them instead of to the whole table.
IN
Find the min and max values in the list of values and do a range access like the one above?
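The min/max idea for IN can be sketched directly (illustrative only; whether the wider range plus post-filtering beats separate lookups still needs a cost analysis):

```java
// Sketch of turning an IN list into range bounds: scan the index once for
// min <= key <= max, then post-filter the fetched rows against the exact list.
public class InToRange {
    public static int[] bounds(int[] inList) {
        int min = inList[0], max = inList[0];
        for (int v : inList) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new int[]{min, max};   // range-scan the index between these bounds
    }
}
```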
MySQL chooses to use the index itself to return the result set, instead of reading the table pages, when the index itself contains the required columns.
DB in cloud products:
We must return the values from the index table, without reading the data table, if the result set demands only the fields in the index. If we support composite indexes, this case must be extensible, and the data table should not be read unless required.
Every column in a predicate of this form must have an index, and no pair (or larger combination) of those columns may be in the same index.
MySQL selects row IDs from all the index keys separately, puts them into a Unique object so that duplicate row IDs are removed, and uses the remaining keys to retrieve the rows.
MySQL creates a SEL_IMERGE object that contains a disjunction of SEL_TREE objects. The optimizer collects all possible ways of accessing rows and chooses the one with minimal cost.
DB in cloud products:
1. Get the record IDs from the index table for each indexed criterion separately.
2. Combine the result with the record IDs from the other index criteria using set theory.
3. Return the rows from the data table for the record IDs in the final result.
If the query contains criteria only on indexed columns, we must use the implemented index merge and map record IDs -> rows.
If the query contains criteria on both indexed and non-indexed columns, we must use the indexed part of the criteria to get record IDs, fetch those rows, and then apply the non-indexed criteria; the result is the query result.
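Steps 2 and 3 amount to plain set operations on record-ID sets, which can run entirely on the app server; a sketch with hypothetical long record IDs:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of index merge on the app server: record-ID sets coming from separate
// index criteria are combined with set theory (union for OR, intersection for
// AND) before the data table is touched.
public class IndexMerge {
    public static Set<Long> union(List<Set<Long>> perIndexIds) {
        TreeSet<Long> out = new TreeSet<>();      // Unique-object analogue:
        perIndexIds.forEach(out::addAll);         // duplicate row IDs collapse here
        return out;
    }

    public static Set<Long> intersection(List<Set<Long>> perIndexIds) {
        TreeSet<Long> out = new TreeSet<>(perIndexIds.get(0));
        for (Set<Long> ids : perIndexIds.subList(1, perIndexIds.size())) {
            out.retainAll(ids);                   // keep IDs matching every criterion
        }
        return out;
    }
}
```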
If the query is a join between two tables, with index criteria on both tables and a join condition, then first the relation table's record-ID pairs can be selected; after that, the record IDs from the first table and the record IDs from the second table are selected. The row-ID pairs that need to be selected from the data table are
Result Row-ID Pairs = Rel table pairs (intersection) cross product between first and second?
Final result record IDs from the index table -> records from the data table using those record IDs.
Record IDs from that index table -> records from the data table -> criteria on the result set -> final result set.
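The pair-selection step can be sketched as filtering the relation table's pairs against the two per-table ID sets, which avoids materialising the full cross product; the "left:right" string encoding of a pair is purely illustrative:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the join case: record IDs satisfying the index criteria are selected
// per table, then the relation table's (leftId, rightId) pairs are intersected
// with the implied cross product of the two ID sets.
public class JoinPairs {
    public static Set<String> resultPairs(Set<Long> leftIds, Set<Long> rightIds,
                                          Set<String> relPairs) {
        Set<String> out = new HashSet<>();
        for (String pair : relPairs) {            // iterate rel pairs: cheaper than
            String[] p = pair.split(":");         // building the cross product
            long l = Long.parseLong(p[0]);
            long r = Long.parseLong(p[1]);
            if (leftIds.contains(l) && rightIds.contains(r)) {
                out.add(pair);                    // pair survives the intersection
            }
        }
        return out;
    }
}
```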
No-index case: no index criteria, or one or more indexed criteria in disjunction with other non-indexed criteria.
We cannot use index merging here, since we will have to do a full table scan for the other criteria anyway.
1.3. Transpositions
MySQL supports transpositions (reversing the order of operands around a relational operator), but transposing arithmetic is not supported.
“WHERE - 5 = column1”
becomes
“WHERE column1 = -5”
But
“WHERE 5 = -column1”
is not supported.
DB in cloud products:
Users are not expected to use arithmetic expressions in queries issued through code, and even if they do, such an expression on the RHS (or LHS) will be evaluated on the stack before the query is sent to MySQL.
2. If one of them has a better join type, pick that index to drive the query (get the row IDs, consequently the rows, and apply the column2 criterion to them).
3. If both of them are indexed, pick the one that was created first, OR use index merge.
DB in cloud products:
Using index merge is more efficient here, since the computation of unions, intersections, differences, etc. can be done on the app server.
1.3.2. OR relations
MySQL decides
If column1 and column2 are not different columns but the same column, then there is no table scan or index merge; range access is used.
DB in cloud products:
UNION ALL
DB in cloud products:
“column1 <> 5”
“column1 < 5 OR column1 > 5”
But MySQL will NOT transform this case. If the developer feels range access is better, the query must be written that way.
DB in cloud products:
We could build the intelligence for transforming this query into DB in cloud products. If the row count is large, use range access; if it is small, an index scan? Or always a range scan?
The above query also uses the index on column1, but only to find rows. An index scan is always a better join type than a table scan.
DB in cloud products:
We can get the record IDs in the order we want by ordering through the index table, if the ORDER BY is on an indexed field, and then query the data table as follows:
select Zcdb_DataTable.<columns>
from Zcdb_DataTable
where
MODULE_ID = <MODULE_ID>
and
order by
We just need to do more analysis on whether this method is more efficient than a filesort on the data table.
“SELECT COUNT(*) FROM Table1;” returns the row count directly from the metadata cache for MyISAM; the table is never read. “SELECT COUNT(column1) FROM Table1;” is subject to the same optimization only when the column is NOT NULL.
“SELECT MAX(column1)
FROM Table1
WHERE column1 < 'a';”
If column1 is indexed, this is answered by finding 'a' in the index and going back one key. Vice versa for MIN() with >, and for the other combinations.
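The back-one-key trick maps naturally onto an ordered index structure; here a TreeSet stands in for the index:

```java
import java.util.TreeSet;

// Sketch of answering MAX(column1) with a "column1 < bound" criterion from the
// index alone: locate the bound and step back one key, instead of scanning rows.
// A TreeSet is an illustrative stand-in for the ordered index.
public class IndexMax {
    public static String maxBelow(TreeSet<String> index, String bound) {
        return index.lower(bound);   // greatest key strictly less than bound, or null
    }

    public static String minAbove(TreeSet<String> index, String bound) {
        return index.higher(bound);  // the MIN() counterpart with >, or null
    }
}
```

Both methods return null when no key qualifies, which corresponds to MAX()/MIN() over an empty result.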
• The GROUP BY can be done with an index (only one table in the FROM clause, and no WHERE clause).
DISTINCT will return an ordered result like GROUP BY when this transformation occurs. GROUP BY will always return ordered results unless there is an ORDER BY NULL clause.
DB in cloud products:
We can return the count of rows in a table, or of a non-null column, from metadata (once we decide to maintain a count in metadata) instead of querying.
We can use the index table to do the GROUP BY and get record IDs, then do a shorter range scan and perform the aggregate action on the data table.
Record ID  Field  Value
1          F1     722
2          F1     675
3          F1     923
4          F1     675
5          F1     722
6          F1     722
7          F1     722
8          F1     675
9          F1     675

Record IDs    Value
{1, 5, 6, 7}  722
{2, 4, 8, 9}  675
{3}           923
With these record IDs (also applying criteria, if any) we can do a range scan on the data table and perform the GROUP BY, which could be efficient if that module's row count is sufficiently large and we have reduced the range-scan width. If the module is small, this need not be done.
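The index-table grouping that produces the buckets above can be sketched as a single pass over (record ID, value) index entries; the shapes are illustrative:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of GROUP BY via the index table: walk the index entries once, bucket
// record IDs by indexed value, and only then touch the data table per bucket.
public class IndexGroupBy {
    // indexEntries: recordId -> indexed field value
    public static Map<Integer, TreeSet<Integer>> group(Map<Integer, Integer> indexEntries) {
        Map<Integer, TreeSet<Integer>> buckets = new TreeMap<>(); // value -> record IDs
        indexEntries.forEach((recordId, value) ->
            buckets.computeIfAbsent(value, v -> new TreeSet<>()).add(recordId));
        return buckets;
    }
}
```

Fed the nine F1 entries above, this yields exactly the buckets {1, 5, 6, 7} -> 722, {2, 4, 8, 9} -> 675, and {3} -> 923.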
2. Other Optimizations
2.1. NULLs filtering for ref and eq_ref access
This is done after choosing the join order. Suppose there is a join order
…, …, table1, table2, …
with the join condition
table1.key_column = table2.column
or a similar equality involving table2.columnN. Then we get ref or eq_ref access on table1, since it is a key. This implies that table2.column (or table2.columnN) cannot be NULL, so MySQL adds the predicate
“table2.column IS NOT NULL”
This predicate can be checked, and rows filtered out, right after reading the rows of table2.
DB in cloud products:
For any equality between an indexed (or NOT NULL) column and a non-indexed nullable column, we can add this IS NOT NULL predicate so that filtering is improved.
If there is a query plan with the ref access method on a table, and the criterion is
DB in cloud products:
We can return an empty result without executing the query if a predicate P has NULL on the RHS and a NOT NULL indexed column on the LHS and P is in conjunction with the rest of the criteria (the whole WHERE is impossible). If P is in disjunction with other predicates, then we can just remove P. Refer to 1.1.2, Eliminating Dead Code.
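A sketch of this dead-code elimination, with predicates as plain strings and a set of predicates already proven always false (e.g. a NOT NULL indexed column equated with NULL); the representation is illustrative:

```java
import java.util.List;
import java.util.Set;

// Sketch of eliminating an always-false predicate: inside a conjunction it makes
// the whole WHERE impossible (empty result, no execution); inside a disjunction
// the dead branch is simply dropped.
public class NullPredicateElimination {
    public enum Combine { AND, OR }

    // Returns the surviving predicates; an empty result under AND means the
    // whole WHERE is impossible and the query need not be executed at all.
    public static List<String> simplify(Combine combine, List<String> predicates,
                                        Set<String> provablyFalse) {
        boolean anyFalse = predicates.stream().anyMatch(provablyFalse::contains);
        if (combine == Combine.AND && anyFalse) {
            return List.of();                          // impossible WHERE
        }
        return predicates.stream()
                .filter(p -> !provablyFalse.contains(p))
                .toList();                             // drop dead OR branches
    }
}
```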