CT004-3.5-3-Advanced Database Systems
Database Performance & QO

Topic & Structure of Lesson
In this lecture we will look at:
- Database performance issues
- Indexing
- Steps involved in query optimization
  - Already covered most of this
- Tips for SQL performance tuning
- An overview of distributed query processing

Learning Outcomes
By the end of this lesson you should be able to:
- Discuss database performance issues
- Explain how a query can be optimized
- Use tips for tuning SQL

Introduction to Performance Tuning
You need an understanding/awareness of:
- Transforming models to implementation
- The relational model (for a number of reasons)
- Technology factors

Introduction to Performance Tuning
Database design starts with modeling the requirement and producing a conceptual model.
In implementation, a number of trade-offs need to be considered to:
- Satisfy today's needs for information
- Satisfy the above in reasonable time (performance requirements)
- Satisfy anticipated or unanticipated user demands, e.g. ad hoc queries
- Be capable of being extended
- Be easy to modify in changing hardware & software environments

Tuning
- Application (Analyst/Programmer)
- DBA - Systems Level Tuning
- Vendor - Product specific
Investigation
- Monitoring DB Statistics, e.g. logs, transaction times
- Simulation
- Describes how the system evaluates the query
  - Explain facility (Oracle)
  - Execution Plan facility (SQL Server)

Indexing

Database Management Systems - TMC Computer School, August 1999

Tuning: Table Indexing
- Index is a means of expediting retrieval of data
  - e.g. Find all students with gpa > 3.3
  - May need to scan entire table
- Index enables finding data quickly without having to scan the whole table
- Indexes are built on a column or columns (the search key)
  - 1 column or a combination of columns of a table
  - By default, the primary key field is indexed
  - Fields marked as 'unique' are indexed
- Index consists of a set of entries pointing to the locations of each search key

B-Tree Index Example
- Commonly used with attribute tables as well as graphic-attribute tables (CAD data structures)
- Binary coding reduces the search list by streaming down the tree
- A balanced tree is best
[Figure: B-tree index on Primary Key #, holding the keys 3, 12, 19, 37, 44, 49, 59, with a Number axis running from Low to High]

Index
- Indexes take up storage, so choose carefully
  - On popular queried columns
  - Can be added after understanding query patterns
- WHERE condition should be tuned to take advantage of the indexes
- Rethink: attributes with a high badness factor, e.g. gender
  - Full table scan may be better if hit rate > 20%
- Indexes need maintenance
  - Indexes need to be periodically refreshed
  - Indexes are normally refreshed during downtime
  - Indexes aren't technically necessary for operation
  - Indexes must be maintained by the DB administrator
  - Pruned, refreshed, analyzed...

Create Index
CREATE INDEX index_name ON table_name (column_name)
CREATE INDEX IDX_CUSTOMER_LAST_NAME ON CUSTOMER (Last_Name)
CREATE INDEX IDX_CUSTOMER_LOCATION ON CUSTOMER (City, Country)
CREATE TABLE employee_records ( name VARCHAR(50), employeeID INT, INDEX (employeeID) )
ALTER, DROP

Types of Indexes
- Clustered vs. Unclustered
  - Clustered: ordering of data records is the same as the ordering of data entries in the index
  - Unclustered: data records are in a different order from the index
- Primary vs. Secondary
  - Primary: index on fields that include the primary key
  - Secondary: other indexes
- Unique vs. Non-unique
  - Non-unique, e.g. Lastname

Example: Clustered Index
Sorted by sid

sid    name     gpa
50000  Dave     3.3
53650  Smith    3.8
53666  Jones    3.4
53688  Smith    3.2
53831  Madayan  1.8
53832  Guldu    2.0

Index entries (sid): 50000, 53600, 53800

Example: Unclustered Index
Sorted by sid but index on gpa

sid    name     gpa
50000  Dave     3.3
53650  Smith    3.8
53666  Jones    3.4
53688  Smith    3.2
53831  Madayan  1.8
53832  Guldu    2.0

Index entries (gpa): 1.8, 2.0, 3.2, 3.3, 3.4, 3.8

Using Indexes
- Need to choose attributes to index wisely!
- Examine transaction requirements
  - Update vs. query?
  - Volume of rows updated or queried
  - Frequency of a query
  - Transaction rates
  - User priorities
- Index usage (Oracle)
  - Not used with Nulls in the WHERE clause
  - Nor used if mathematics is used with the indexed attribute
- What queries could benefit most from an index?
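The effect of an index on the access path can be checked directly. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than Oracle or MySQL, the table name student and index name idx_student_gpa are made up for the demo, and the plan wording differs between products and versions. It loads the sid/name/gpa rows from the examples, then asks SQLite for its plan before and after creating an index on gpa, and once more with arithmetic applied to the indexed column.

```python
import sqlite3

# Hypothetical demo table mirroring the sid/name/gpa example rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(50000, "Dave", 3.3), (53650, "Smith", 3.8), (53666, "Jones", 3.4),
     (53688, "Smith", 3.2), (53831, "Madayan", 1.8), (53832, "Guldu", 2.0)],
)


def plan(sql):
    """Return the 'detail' text of each row SQLite reports for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


query = "SELECT name FROM student WHERE gpa > 3.3"

# No index on gpa yet, so the only option is to scan the table.
plan_before = plan(query)

# Secondary (unclustered) index on gpa; idx_student_gpa is an assumed name.
conn.execute("CREATE INDEX idx_student_gpa ON student (gpa)")
plan_after = plan(query)

# Arithmetic on the indexed attribute hides it from the index lookup.
plan_math = plan("SELECT name FROM student WHERE gpa + 0 > 3.3")

names = sorted(row[0] for row in conn.execute(query))

print(plan_before, plan_after, plan_math, names, sep="\n")
```

Comparing the three plans shows the point made on the slide for Oracle: the index is only considered when the indexed attribute appears bare in the WHERE clause.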
Query Optimization

Query Optimization
- Objective: find the optimum set of access paths to retrieve the required data
  - Applies to updates and queries
- Reasons for automating the optimization process:
  - Machine can use more information
  - Re-optimization is easier following data re-organization
  - Optimizer can evaluate more solutions than a non-automated process (i.e. the user)
  - Automation makes expertise more widely available

Query Optimization
- Still scope for human intervention
- Still very dependent on programmer skills, because query syntax dramatically affects the access path choices, e.g. whether or not an index is used
- The term optimization can be an overclaim

Example
Get names of suppliers who supply part P2:
SELECT DISTINCT s.sname FROM s, sp WHERE s.s# = sp.s# AND sp.p# = 'P2'
- The database contains 100 suppliers and 10,000 shipments, 50 of which are for part P2
- Consider: how would the query be evaluated without optimization?
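A rough tally of tuple I/O for this example can be worked out from the three figures given (100 suppliers, 10,000 shipments, 50 shipments of P2). The sketch below compares two evaluation orders, product-then-restrict and restrict-then-join, plus a variant that assumes an index or hash on sp.p#. The variable names are illustrative, and counting tuples rather than blocks is a simplification.

```python
# Figures from the example.
suppliers = 100
shipments = 10_000
p2_shipments = 50

# Strategy 1: cartesian product first, then restrict.
product_reads = suppliers * shipments    # sp is read once per supplier
product_writes = suppliers * shipments   # product is written back to disk
restrict_reads = suppliers * shipments   # product is re-read to apply WHERE
unoptimized = product_reads + product_writes + restrict_reads

# Strategy 2: restrict sp to P2 first, then join the 50 rows to s.
optimized = shipments + suppliers

# Strategy 2 with an index or hash on sp.p# (assumed): only matching rows are read.
indexed = p2_shipments + suppliers

print(unoptimized, optimized, indexed)
print(round(unoptimized / optimized), unoptimized // indexed)
```

The two ratios printed at the end are the speed-ups of each restrict-first variant over the product-first plan.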
Unoptimized
1. Compute the cartesian product of s and sp
   - Involves reading the 10,000 sp tuples 100 times, resulting in 1,000,000 tuple reads
   - Product will contain 1,000,000 tuples, which will need to be written back to disk
2. Apply the restriction in the WHERE clause
   - Involves 1,000,000 tuple reads but gives a 50-row result, which can stay in memory
3. Project the result of step 2 over sname to give the final result, containing at most 50 tuples, which again can remain in memory

Optimized
1. Restrict SP to those tuples containing P2
   - Involves 10,000 tuple reads, but the result has 50 rows, which stay in memory
2. Join the result of step 1 to relation S over s#
   - 100 tuple reads, resulting in 50 tuples, still in memory
3. Project the result of step 2 over sname to give a final result of at most 50 tuples
- Optimized version is about 300 times faster in terms of tuple I/O: the unoptimized version needs about 3,000,000 I/Os, whereas the optimized version needs around 10,100
- A restriction followed by a join, instead of a product followed by the restriction, has produced a dramatic improvement

Example: Indexing
- If SP were indexed or hashed on p#, the tuples read in step 1 would be 50 rather than 10,000, and the optimized version would be around 20,000 times faster
- Also, I/Os in step 2 could be reduced to at most 50
- In practice, block I/Os are what count

SQL Tuning

SQL Optimizations: Basics
- Use column names instead of * in SELECT
- Try to minimize the number of subquery blocks in your query
- Try to use UNION ALL in place of UNION
- Avoid != or NOT or <>
  - Unable to use indexes even if one exists
- Avoid DISTINCT
- Avoid HAVING, e.g.
Write the query as:
SELECT subject, count(subject) FROM student_details WHERE subject != 'Science' AND subject != 'Maths' GROUP BY subject;
Instead of:
SELECT subject, count(subject) FROM student_details GROUP BY subject HAVING subject != 'Science' AND subject != 'Maths';
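The two forms return the same rows; the difference is that the WHERE version discards unwanted rows before grouping instead of after. A small check of that equivalence, using Python's built-in sqlite3 module and a handful of made-up student_details rows (the names and subjects are assumptions for the demo), might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details (name TEXT, subject TEXT)")
# Hypothetical sample rows, not taken from the slides.
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?)",
    [("Ann", "Science"), ("Bob", "Maths"), ("Cal", "History"),
     ("Dee", "History"), ("Eli", "Art")],
)

# Preferred form: filter rows first, then group what is left.
where_sql = """
    SELECT subject, COUNT(subject)
    FROM student_details
    WHERE subject != 'Science' AND subject != 'Maths'
    GROUP BY subject
    ORDER BY subject
"""

# Form to avoid: group everything, then throw groups away.
having_sql = """
    SELECT subject, COUNT(subject)
    FROM student_details
    GROUP BY subject
    HAVING subject != 'Science' AND subject != 'Maths'
    ORDER BY subject
"""

where_rows = conn.execute(where_sql).fetchall()
having_rows = conn.execute(having_sql).fetchall()
print(where_rows == having_rows, where_rows)
```

HAVING is still needed when the condition refers to an aggregate such as COUNT, which cannot be tested in WHERE.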
http://beginner-sql-tutorial.com/sql-query-tuning.htm

SQL Optimizations: More
- Use IN instead of OR for non-indexed column
  Better: .... WHERE <column> IN (val1, val2, val3)
  Than: .... WHERE <column> = val1 OR <column> = val2...
- Use LIMIT 1 or EXISTS instead of IN or TOP
- Preferably use 'Prefix' pattern matches
- Avoid use of functions on the left side of a comparison
  Better: ... WHERE first_name LIKE 'Cha%';
  Than: ... WHERE SUBSTR(first_name,1,3) = 'Cha';
- Avoid functions on indexed columns
  Better: WHERE event_date >= '2011/03/15' - INTERVAL 7 DAY
  Than: WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(event_date) <= 7
- General SQL rules
  - Use single case for all SQL verbs
  - Begin all SQL verbs on a new line
  - Separate all words with a single space
  - Right or left align verbs within the initial SQL verb

Improve Query Processing
- Verify that appropriate statistics are being collected
- Use INDEXed columns in WHERE clauses
- Use EXPLAIN to understand the query execution plan
- Use MySQL 'Query' cache
  - Caches the result set and returns it if an identical query arrives again
  - Does not work for prepared statements
  http://dev.mysql.com/doc/refman/5.0/en/query-cache.html
- Sequence WHERE clause predicates from most restrictive to least restrictive by table, by predicate type
- Adjust internal variables
  - Index Buffer Size, Table Buffer Size
  - Number of max open tables, Time limit for long queries
http://www.infoworld.com/d/data-management/7-performance-tips-faster-sql-queries-262
http://www.codeforest.net/8-great-mysql-performance-tips

Tips for MySQL/DB performance
- Choose the right data type
- (Almost) always have an id field
- Use ENUM (vs. Varchar) if appropriate
- Use NOT NULL constraints
- Fixed length tables (static) are faster
- Use vertical partitioning
- Choose the right storage engine
  - MyISAM for read-heavy, limited updates
  - InnoDB for more scale, has row-based locking
- Use PROCEDURE_ANALYSE for column type recommendation
- Use persistent connections to DB
- Do big inserts, updates and deletes in (small) batches
http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
http://www.infoworld.com/d/data-management/10-essential-performance-tips-mysql-192815

Prepared Statements
- Prepare a query once and inform the query engine
- Reuse query -> get performance benefit
- Does not need to be 're-parsed' each time
- Protects application against SQL injection attacks
- (In MySQL) Transmitted in a native binary form -> more efficient & helps reduce network delays
- BUT cannot be used by query cache (in MySQL)
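The reuse and injection points can be illustrated with a parameterized statement. This is a sketch using Python's built-in sqlite3 module, not MySQL, so the binary protocol and query cache remarks above do not apply to it; the customer table, its rows and the helper find_ids are assumptions for the demo. The statement text stays constant and only the bound value changes between calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT)")
# Hypothetical sample rows; ids are assigned 1, 2, 3 in insertion order.
conn.executemany(
    "INSERT INTO customer (last_name) VALUES (?)",
    [("Chan",), ("Smith",), ("Jones",)],
)

# One statement text with a placeholder, reused for every lookup.
FIND_BY_NAME = "SELECT id FROM customer WHERE last_name = ?"


def find_ids(last_name):
    """Run the parameterized statement with last_name bound as data."""
    return [row[0] for row in conn.execute(FIND_BY_NAME, (last_name,))]


smith_ids = find_ids("Smith")

# A classic injection payload is compared as a plain string, not parsed as SQL.
attack_ids = find_ids("x' OR '1'='1")

print(smith_ids, attack_ids)
```

Had the value been concatenated into the SQL string instead, the second call would have matched every row.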
http://www.roseindia.net/jdbc/jdbc-mysql/TwicePreparedStatement.shtml

Demos & Walkthrus
- View queries executed
- Check the query plan
- Change query and re-execute
http://www.mysql.com/products/enterprise/demo.html
- View Query Execution Plan
http://www.codeproject.com/Articles/9990/SQL-Tuning-Tutorial-Understanding-a-Database-Execu
- How to find if a query is worth optimizing?
http://www.mysqlperformanceblog.com/2012/09/11/how-to-find-mysql-queries-worth-optimizing/
- Slow Query Log and indexing walkthru
http://www.dreamhost.com/dreamscape/2013/08/19/mysql-checking-the-slow-query-log-and-simple-indexing/

Distributed Query Processing

Distributed Query Processing
- Consider "get London suppliers of red parts"
  - The user is at the New York site, data is in London
  - n suppliers satisfy the criteria
- A relational system involves 2 messages
- A non-relational system
[Figure: two NY-London message diagrams, the first labelled A and n, the second labelled A(n) and n]

Distributed Query Processing
- Optimization is an important issue
- There may be many ways of moving the data around
  - Rx at X, Ry at Y
  - Rx -> Y
  - Ry -> X
  - Rx, Ry -> Z

Distributed Query Processing

Relation       Attributes    Tuples     Site
Suppliers (S)  (S#, CITY)    10,000     A
Parts (P)      (P#, COLOUR)  100,000    B
Supplies (SP)  (S#, P#)      1,000,000  A

- Every tuple is 200 bits long
- There are 10 red parts
- 100,000 shipments by London suppliers
- Data transfer at 50,000 bps
- Access delay of 0.1 second
- Query: Find London suppliers of red parts.
- Total time (t) = access delay + (data vol. / data rate)
Distributed Query Processing
- Move relation P to site A and process
  0.1 + (100,000 x 200) / 50,000 = 400.1 s (about 6.67 minutes)
- Move relations S and SP to B and process
  0.1 + ((10,000 + 1,000,000) x 200) / 50,000 = 4,040.1 s (about 1.12 hrs)
- Restrict P at site B (to give 10 red parts), then move the result to site A
  0.1 + (10 x 200) / 50,000 = 0.14 second!

Distributed Query Optimization
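The three strategies can be recomputed from the formula total time = access delay + (data volume / data rate). The sketch below follows the slide's simplification of charging one access delay per strategy; the function and constant names are illustrative only.

```python
TUPLE_BITS = 200         # every tuple is 200 bits long
DATA_RATE_BPS = 50_000   # data transfer rate in bits per second
ACCESS_DELAY_S = 0.1     # access delay in seconds, charged once per strategy


def transfer_time(tuples):
    """Seconds needed to ship the given number of tuples between sites."""
    return ACCESS_DELAY_S + (tuples * TUPLE_BITS) / DATA_RATE_BPS


# Strategy 1: move all 100,000 tuples of P to site A.
move_p_to_a = transfer_time(100_000)

# Strategy 2: move S (10,000) and SP (1,000,000) to site B.
move_s_sp_to_b = transfer_time(10_000 + 1_000_000)

# Strategy 3: restrict P at site B first, then move only the 10 red parts.
move_red_parts = transfer_time(10)

for label, seconds in [
    ("Move P to A", move_p_to_a),
    ("Move S and SP to B", move_s_sp_to_b),
    ("Restrict P at B, move result", move_red_parts),
]:
    print(f"{label}: {seconds:.2f} s")
```

Restricting before shipping dominates here because the data volume term, not the access delay, accounts for almost all of the time in the first two strategies.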