
SQL Server Query Performance Guidelines

Overview
In my years as a DBA I've seen many common mistakes (and made a few myself) when reviewing the SQL queries that run against the systems I maintain. From this experience I've found that there are some general guidelines that should be followed when writing queries and also when designing a database schema. In this tutorial we will take a look at a few different areas where these common mistakes are made and what can be done to fix them. These areas include:

- Query writing
- Indexing
- Schema design

Explanation
In each section of this tutorial we will take a look at specific examples that illustrate things to avoid when it comes to performance in SQL Server. For each of these items I will provide a solution or alternative that provides better performance. Please keep in mind that these are general guidelines and there will be exceptions to these examples, but in general, following these basic principles should get you off to a fast start performance-wise.
The specific topics that will be covered in this tip are as follows:

- Query writing:
  - How Join Order Can Affect the Query Plan
  - Remove Function Calls From the SELECT List
  - Avoid Using Not Equal in WHERE Clause
  - Avoid Using Functions in WHERE Clause
  - Avoid Using Wildcard Characters to Start Search Criteria
  - Use a Derived Table in Place of IN Predicate With Aggregate Functions
- Indexing:
  - Make Sure All JOIN Columns are Indexed
  - Use WHERE, JOIN, ORDER BY, SELECT Column Order When Creating Indexes
  - Make Sure All Tables Have a Clustered Index Defined
- Schema design:
  - Use DELETE CASCADE Option to Handle Child Key Removal in Foreign Key Relationships
  - Denormalize For Performance

Sample Table and Data Setup


So that you can work through these examples in your own environment, below is a sample schema with some test data you can use to test each topic.
-- NOTE: You have to recreate these tables after each section of the tutorial
-- table creation logic
-- parent table
CREATE TABLE [dbo].[Parent](
[ParentID] [bigint] NOT NULL,
[IntDataColumn] [bigint] NULL,
[VarcharDataColumn] [varchar](1000) NULL,
[DateDataColumn] [datetime] NULL,
CONSTRAINT [PK_Parent] PRIMARY KEY CLUSTERED
([ParentID] ASC)
)
GO
-- child table
CREATE TABLE [dbo].[Child](
[ChildID] [bigint] NOT NULL,
[ParentID] [bigint] NULL,
[IntDataColumn] [bigint] NULL,
[VarcharDataColumn] [varchar](10) NULL,
CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED
([ChildID] ASC)
)
GO
-- foreign key constraint
ALTER TABLE [dbo].[Child] WITH CHECK
ADD CONSTRAINT [FK_Child_Parent] FOREIGN KEY([ParentID])
REFERENCES [dbo].[Parent] ([ParentID])
ON DELETE CASCADE
GO

-- child detail table


CREATE TABLE [dbo].[ChildDetail](
[ChildDetailID] [bigint] NOT NULL,
[ChildID] [bigint] NOT NULL,
[ExtraDataColumn] [bigint] NULL,
CONSTRAINT [PK_ChildDetail] PRIMARY KEY CLUSTERED
([ChildDetailID],[ChildID] ASC)
)
GO
-- foreign key constraint
ALTER TABLE [dbo].[ChildDetail] WITH CHECK
ADD CONSTRAINT [FK_ChildDetail_Child] FOREIGN KEY([ChildID])
REFERENCES [dbo].[Child] ([ChildID])
ON DELETE CASCADE
GO

-- data load
DECLARE @val BIGINT
DECLARE @val2 BIGINT
SELECT @val=1
WHILE @val < 100000
BEGIN
   INSERT INTO dbo.[Parent] VALUES(@val,@val,'TEST' + CAST(@val AS VARCHAR),getdate()-(@val/24.0))
   SELECT @val2=1
   WHILE @val2 < 20
   BEGIN
      INSERT INTO dbo.[Child] VALUES ((@val*100000)+@val2,@val,@val,'TEST' + CAST(@val AS VARCHAR))
      INSERT INTO dbo.[ChildDetail] VALUES (1,(@val*100000)+@val2,9999)
      INSERT INTO dbo.[ChildDetail] VALUES (2,(@val*100000)+@val2,1111)
      INSERT INTO dbo.[ChildDetail] VALUES (3,(@val*100000)+@val2,3333)
      INSERT INTO dbo.[ChildDetail] VALUES (4,(@val*100000)+@val2,7777)
      SELECT @val2=@val2+1
   END
   SELECT @val=@val+1
END
GO
GO

-- small table for joins
CREATE TABLE [dbo].[Small](
[SmallID] [bigint] NOT NULL,
[IntDataColumn] [bigint] NULL,
[VarcharDataColumn] [varchar](100) NULL,
CONSTRAINT [PK_Small] PRIMARY KEY CLUSTERED
([SmallID] ASC)
)
GO

-- data load
INSERT INTO dbo.[Small] VALUES(50,80,'TEST5080')
INSERT INTO dbo.[Small] VALUES(510,810,'TEST510810')
INSERT INTO dbo.[Small] VALUES(7001,9030,'TEST70019030')
INSERT INTO dbo.[Small] VALUES(12093,10093,'TEST1209310093')
INSERT INTO dbo.[Small] VALUES(48756,39843,'TEST48756,39843')
INSERT INTO dbo.[Small] VALUES(829870,57463,'TEST82987057463')
GO

-- cleanup statements
--DROP TABLE [dbo].[Small]
--DROP TABLE [dbo].[ChildDetail]
--DROP TABLE [dbo].[Child]
--DROP TABLE [dbo].[Parent]

How Join Order Can Affect the Query Plan


Overview
The order in which the tables in your queries are joined can have a dramatic effect on how
the query performs. If your query happens to join all the large tables first and then joins to a
smaller table later this can cause a lot of unnecessary processing by the SQL engine.
Explanation
Generally speaking, the SQL Server optimizer will try to first join the tables that allow it to work with the smallest result set possible. To do this the optimizer tries to figure out which join order provides the smallest result set early in the processing, but for very complex queries it does not try all possible combinations, as this can become quite resource intensive. As a best practice you should order your table joins so that the join that reduces the result set the most comes first. Before we start, let's add an index to the column in the table we'll be using as the join condition (you'll see more on this in a later topic).
CREATE NONCLUSTERED INDEX idxChild_ParentID
ON [dbo].[Child] ([ParentID])

-- cleanup statements
--DROP INDEX Child.idxChild_ParentID
Since in most cases this issue arises when queries become really complex and the optimizer has a lot of possible plans to evaluate (e.g., multiple table joins), we'll use the FORCE ORDER hint with a simple query in order to illustrate this point more clearly. Here is the code to illustrate our poor join order.
SELECT P.ParentID,C.ChildID,S.SmallID
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON C.ParentID=P.ParentID INNER JOIN
[dbo].[Small] S ON S.SmallID=C.ParentID
OPTION (FORCE ORDER)
Looking at the explain plan for this query we can see that the Parent and Child tables are joined first, resulting in 1,899,980 rows, which is then joined to the Small table, reducing the final recordset to 95 rows.

And now let's join them in the proper order so the smallest table is joined first. Here is the
SQL statement.
SELECT P.ParentID,C.ChildID,S.SmallID
FROM [dbo].[Small] S INNER JOIN
[dbo].[Parent] P ON S.SmallID=P.ParentID INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
Looking at the explain plan for this query we see that the Parent table is first joined to the Small table, resulting in 5 rows, which is then joined to the Child table, producing the final recordset of 95 rows (as above).

Just looking at the explain plans should be enough for us to see that the second query will perform better, but let's take a look at the SQL Profiler statistics just to confirm. As we see below, joining the Small table first significantly reduces the amount of data the query has to process, thereby reducing the resources required to execute this query.

                  CPU    Reads    Writes   Duration
Poor Join Order   265    5935     -        309
Good Join Order   -      35       -        -

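If you don't have SQL Profiler handy, a lightweight way to collect comparable numbers (my addition, not part of the original measurements) is to turn on the standard session statistics before running each query:

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- run the query under test; logical reads per table plus CPU and
-- elapsed time are then reported on the Messages tab
SELECT P.ParentID,C.ChildID,S.SmallID
FROM [dbo].[Small] S INNER JOIN
[dbo].[Parent] P ON S.SmallID=P.ParentID INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID

SET STATISTICS IO OFF
SET STATISTICS TIME OFF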
Additional Information

Joining tables in SQL Server

SQL Server Join Hints

Remove Function Calls From the SELECT List


Overview
Using functions in any type of programming is usually a good practice, as it generally makes code more readable and allows you to reuse it over and over again. This is also true for SQL queries, except that there are cases where running the same statement over and over again might not be the most efficient way to get your result.
Explanation
Let's take a look, at a high level, at what happens when you use a function in the SELECT list of a query. Basically, that function needs to be called for every record returned by the query. If this function contains a multi-table join to do some sort of lookup, this process can get quite expensive. In some cases it makes sense to remove the function call and simply join the tables from the function directly to the other tables in your query. To illustrate this point let's take a simple query that does a lookup into another table. Below is some SQL we will need to run before our test in order to create our function and add an index on the lookup column.
CREATE FUNCTION fn_getParentDate (@ParentID bigint) RETURNS datetime AS
BEGIN
DECLARE @DateData datetime
SELECT @DateData = DateDataColumn from [dbo].[Parent] where ParentID=@ParentID
RETURN @DateData
END
GO

CREATE NONCLUSTERED INDEX idxChild_ParentID
ON [dbo].[Child] ([ParentID])

-- cleanup statements
--DROP INDEX Child.idxChild_ParentID
--DROP FUNCTION fn_getParentDate
Now we can write a simple query that calls this function, which does a lookup in the Parent table. Here is the statement.
SELECT dbo.fn_getParentDate(ParentID),ChildID
FROM [dbo].[Child]
Looking at the explain plan for this query we can see it's going to do a scan of the Child table, which makes sense since there is no WHERE clause, and for each row returned it uses the index on the Parent table to do a seek for the lookup.

Now let's rewrite this query and instead of using the function call let's just join the Parent
table in our query and add the DateDataColumn to our SELECT list. Here is the statement.
SELECT P.DateDataColumn,ChildID
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID

Looking at the explain plan for this query we can see it only has to access the Parent table
once but it now has to do a scan of this table before it performs a merge join.

It's not entirely clear from just looking at the above explain plans which statement will perform better. The index seek in the query with the function might lead you to believe it would be faster, but let's run the statements and take a look at the SQL Profiler results below. We can see from these results that the query without the function actually ran more than twice as fast and used considerably fewer resources than the one that uses the function call.
              CPU     Reads     Writes   Duration
Function      14985   5705126   -        25982
No Function   578     5933      -        11964

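If you want to keep the reusability of a function without paying the per-row cost, one option worth knowing about is an inline table-valued function, which the optimizer can expand into the main query plan much like the manual join above. This is a sketch of the idea (the function name is my own, not part of the original tip):

CREATE FUNCTION fn_getParentDateInline (@ParentID bigint)
RETURNS TABLE
AS
RETURN (SELECT DateDataColumn FROM [dbo].[Parent] WHERE ParentID=@ParentID)
GO

-- behaves like the inner join version, but keeps the lookup logic reusable
SELECT PD.DateDataColumn, C.ChildID
FROM [dbo].[Child] C
CROSS APPLY dbo.fn_getParentDateInline(C.ParentID) PD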
Additional Information

Removing function calls for better performance

Avoid Using Not Equal in WHERE Clause


Overview
In almost all cases, when we use the <> operator (or any other operator in conjunction with the NOT operator, e.g., NOT IN) index seeks will not be performed and instead a table/index scan is required.
Explanation
For this example let's make an update to one of our test tables to skew the data a little.
We'll also add an index to the table on the column that will be used in our WHERE clause.
UPDATE [dbo].[Child] SET IntDataColumn=60000
UPDATE [dbo].[Child] SET IntDataColumn=3423 WHERE ParentID=4788
UPDATE [dbo].[Child] SET IntDataColumn=87347 WHERE ParentID=34268
UPDATE [dbo].[Child] SET IntDataColumn=93423 WHERE ParentID=84938
UPDATE [dbo].[Child] SET IntDataColumn=5564 WHERE ParentID=74118

CREATE NONCLUSTERED INDEX idxChild_IntDataColumn
ON [dbo].[Child] ([IntDataColumn],[ParentID]) INCLUDE ([ChildID])

-- cleanup statements
--DROP INDEX Child.idxChild_IntDataColumn
Now let's look at a simple query which would return all the records where IntDataColumn
<> 60000. Here is what that would look like.
SELECT P.ParentID,C.ChildID,C.IntDataColumn
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
WHERE C.IntDataColumn <> 60000
Looking at the explain plan for this query we see something really interesting. Since the optimizer has some statistics on the data in this column, it has rewritten the query to use separate < and > clauses. We can see this in the details of the Index Seek under the Seek Predicate heading.

Now let's see what happens if we have two <> clauses as follows.
SELECT P.ParentID,C.ChildID,C.IntDataColumn
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
WHERE C.IntDataColumn <> 60000 and C.IntDataColumn <> 5564
Looking at the explain plan for this query we also see that the optimizer has done some
manipulation to the WHERE clause. It is now using the new value we added in the Seek
Predicate and the original value as the other Predicate. Both have been changed to use
separate < and > clauses.

Although the changes the optimizer has made have certainly helped the query by avoiding an index scan, it's always best to use an equality operator, like = or IN, in your query if you want the best performance possible. One thing to consider before making a change like this is that you need a good understanding of your data, since changes in the table data can then affect your query results. With that said, and given that we know our table has very few records that satisfy the WHERE condition, let's flip it to an equality operator and see the difference in performance. Here is the new query.
SELECT P.ParentID,C.ChildID,C.IntDataColumn
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
WHERE C.IntDataColumn IN (3423,87347,93423)

Looking at the explain plan for this query we can see that it's also doing an index seek but
looking deeper into the Seek Predicate we can now see it's using the equality operator
which should be much faster given the number of records that satisfy the WHERE condition.

Now let's take a look at the SQL Profiler results for these two queries. We can see below that the example using the equality operator runs faster and requires far fewer resources. Note: both queries returned the same result set.
Clause       CPU    Reads    Writes   Duration
Inequality   250    110901   -        255
Equality     15     654      -        15

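As a side note, when a single skewed value like 60000 dominates the column and your queries always exclude it, SQL Server 2008 and later also let you build a filtered index that leaves those rows out entirely. This is my own suggestion, not part of the original test, so treat it as a sketch:

-- the index stores only the rare rows, so it stays tiny even though
-- the table is large, and a query filtering on <> 60000 can use it
CREATE NONCLUSTERED INDEX idxChild_IntDataColumn_NotDefault
ON [dbo].[Child] ([IntDataColumn],[ParentID]) INCLUDE ([ChildID])
WHERE IntDataColumn <> 60000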
Additional Information

How to find outdated SQL Server statistics

Introduction to SQL Server statistics

Avoid Using Functions in WHERE Clause


Overview
As was mentioned in an earlier topic, function calls do make code more readable, but in some cases they cause other unwanted effects that reduce the performance of our queries. Using a function in a WHERE clause is one of those cases.
Explanation
There are two reasons why you want to avoid having a function call in your WHERE clause, and more specifically on the columns you are filtering on. The first is the same as in the previous topic: the function needs to be called for every record in the result set, which can slow down your query. The second reason, which can have an even bigger impact on performance, is that if a function surrounds the column you are trying to filter on, any indexes on that column cannot be used. Let's take a look at a simple example. First we'll need to create an index on the column in our WHERE clause so we can show how it's not used when we add a function around it. Here is the code.
CREATE NONCLUSTERED INDEX idxParent_DateDataColumn
ON [dbo].[Parent] ([DateDataColumn])

-- cleanup statements
--DROP INDEX Parent.idxParent_DateDataColumn
Now let's look at a simple query which would return all the records in the Parent table that
are less than 30 days old. Here is one way that we could write the SQL statement.
SELECT ParentID
FROM [dbo].[Parent]
WHERE dateadd(d,30,DateDataColumn) > getdate()
Looking at the explain plan for this query we can see that the index on the DateDataColumn
that we created is ignored and an index scan is performed.

Now let's rewrite this query and move the function to the other side of the > operator. Here
is the SQL statement.
SELECT ParentID
FROM [dbo].[Parent]
WHERE DateDataColumn > dateadd(d,-30,getdate())
Looking at the explain plan for this query we can see that the optimizer is now using the
index and performs a seek rather than a scan.

To confirm that it is indeed faster, let's take a look at the SQL Profiler results for these two queries. We can see below that when using an index, as is usually the case, we use fewer resources and our statement executes faster.
              CPU   Reads   Writes   Duration
Function      -     274     -        43
No Function   -     -       -        -

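When the expression can't simply be moved to the other side of the comparison, another option is a computed column: SQL Server can index a deterministic expression and then match that expression when it appears in a WHERE clause. A sketch of the idea (the column and index names are mine, not from the original tip):

-- a deterministic computed column over the expression we were filtering on
ALTER TABLE [dbo].[Parent] ADD DatePlus30 AS dateadd(d,30,DateDataColumn)

CREATE NONCLUSTERED INDEX idxParent_DatePlus30
ON [dbo].[Parent] (DatePlus30)

-- the original predicate dateadd(d,30,DateDataColumn) > getdate()
-- can now be matched against the indexed computed column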
Additional Information

More examples on functions in WHERE clause causing issues with performance

Avoid Using Wildcard Characters to Start Search Criteria


Overview
When using the LIKE operator with a wildcard character, either % or _, as the first character in your search string, the SQL optimizer will be forced to do a table/index scan when executing your query.
Explanation

Before we get into the details of our explanation let's first create an index on the column
that we are going to use in the WHERE clause of our query. Here is the code to create that
index on the Child table.
CREATE NONCLUSTERED INDEX idxChild_VarcharDataColumn
ON [dbo].[Child] ([VarcharDataColumn])

-- cleanup statements
--DROP INDEX Child.idxChild_VarcharDataColumn
So why does it have to perform a table/index scan? Since all SQL Server indexes are stored in a B-tree structure, when we begin our search criteria with a wildcard character the optimizer is not able to use an index to perform a seek to find the data quickly. It either performs a scan of the table, or a scan of an index if all the columns required for the query are part of the index. Now, I understand that there are some cases where this is not possible based on your requirements, but the following example shows why you should try to avoid doing this whenever possible. Let's write a simple query that performs a search on the column we indexed above. Here is the code for this simple SQL statement.
SELECT * FROM [dbo].[Child]
WHERE VarcharDataColumn LIKE '%EST5804%'
Looking at the explain plan for this query we can see that the index on the
VarcharDataColumn that we created is ignored and a clustered index scan (essentially a
table scan) has to be performed.

Now let's change the search string in this query to remove the leading wildcard so the string we are searching for begins with a literal character. Here is the updated SQL statement. Note: I picked the search criteria so that both queries return the same result set and the results are not skewed by one query returning more rows.
SELECT * FROM [dbo].[Child]
WHERE VarcharDataColumn LIKE 'TEST5804%'
Looking at the explain plan for this query we can see that the optimizer is now using the
index we created and performs a seek rather than a scan.

Although we should be able to tell just from comparing the explain plans that the second query will perform better, let's confirm that it indeed uses fewer resources and executes faster than our initial query by looking at the SQL Profiler results. We can see below that by removing the wildcard character from the start of the search string we do in fact see quite a big improvement.
                       CPU   Reads   Writes   Duration
Wildcard at Start      328   7042    -        404
No Wildcard at Start   -     670     -        64

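If your requirements really do call for searching inside the string, Full Text Search (referenced below) is the usual escape hatch. Here is a minimal sketch, assuming the full-text feature is installed; the catalog name is my own, and note that CONTAINS does word/prefix matching rather than true infix matching, so it isn't a drop-in replacement for every leading-wildcard LIKE:

CREATE FULLTEXT CATALOG ftDemoCatalog AS DEFAULT
GO
CREATE FULLTEXT INDEX ON [dbo].[Child] ([VarcharDataColumn]) KEY INDEX PK_Child
GO

-- prefix search against the full-text index
SELECT * FROM [dbo].[Child]
WHERE CONTAINS(VarcharDataColumn, '"TEST5804*"')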
Additional Information

Investigate using Full Text Search as an alternative to "LIKE '%abc'"

Use a Derived Table in Place of IN Predicate With Aggregate Functions


Overview
Using a derived table in place of the IN predicate when we are aggregating data lets us process certain table records only once, thereby reducing the amount of resources required to execute a query.
Explanation
When we use the IN predicate we first have to process the data in our subquery, and then we process a lot of the same data again (depending on the WHERE clause) in our main query. If we can use a derived table to do most of the work, we can avoid this double processing of data. Before we take a look at an example to illustrate this point, we'll need to add an index to our Parent table so the results are not skewed by having to do a table scan. Here is the code to create this index.
CREATE NONCLUSTERED INDEX idxParentID_IntDataColumnParentID
ON [dbo].[Parent] ([IntDataColumn],[ParentID])

-- cleanup statements
--DROP INDEX Parent.idxParentID_IntDataColumnParentID
Let's look at a query that uses the IN predicate to return the second largest value from a
table. One way to do this would be as follows.
SELECT MIN(IntDataColumn)
FROM [dbo].[Parent]
WHERE ParentID IN (SELECT TOP 2 ParentID
FROM [dbo].[Parent]
ORDER BY IntDataColumn DESC)
Just by looking at the query we can see we are going to access the Parent table twice to get this result. From the explain plan we can see that the second access does use an index seek, so it might not be too much of an issue.

Now let's rewrite this query and use a derived table to generate the result. Here is that SQL
statement.
SELECT MIN(IntDataColumn)
FROM (SELECT TOP 2 IntDataColumn
FROM [dbo].[Parent]
ORDER BY IntDataColumn DESC) AS A
Notice that the query references the Parent table only once, and the explain plan confirms that we no longer have to access the Parent table a second time, even with an index.

We can also see from the SQL Profiler results below that we do get some significant
resource savings even for this simple query. Although the CPU and total duration were the
same, we only had to perform 2 reads as opposed to the 8 required by the original query.
                CPU   Reads   Writes   Duration
IN Predicate    -     8       -        -
Derived Table   -     2       -        -

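As an aside, on SQL Server 2012 and later there is an even more direct way to grab the second largest value; I mention it here only because it postdates the approach shown in this tip:

-- skip the largest value and fetch the next one
SELECT IntDataColumn
FROM [dbo].[Parent]
ORDER BY IntDataColumn DESC
OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY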
Additional Information

SQL Server aggregate functions

Make Sure All JOIN Columns are Indexed


Overview
There are many different scenarios where an index can help the performance of a query, and ensuring that the columns that make up your JOIN predicates are indexed is an important one.
Explanation
In order to illustrate this point let's take a look at a simple query that joins the Parent and
Child tables. We'll keep this result set small by including a ParentID filter in
the WHERE clause. Here is the statement.
SELECT *
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
WHERE P.ParentID=32433
Looking at the explain plan for this query we can see that the SQL Optimizer has to perform
an index scan on the Child table even though we are only looking for a specific ParentID
from the Parent table.

Let's add an index on the join column, Child.ParentID, and see how this affects the explain plan. Here is the SQL statement.
CREATE NONCLUSTERED INDEX idxChild_ParentID
ON [dbo].[Child] ([ParentID])

-- cleanup statements
--DROP INDEX Child.idxChild_ParentID
Using the same query as above, if we regenerate the explain plan after adding the index, we see below that the SQL optimizer is now able to access the Child table using an index seek, which will more than likely run much faster and use fewer resources.

Let's confirm our assumption by taking a look at the SQL Profiler output of both queries. We see below that we were correct: with the index added the query ran much faster, used much less CPU and performed far fewer reads.
           CPU   Reads   Writes   Duration
No Index   110   14217   -        110
Index      -     63      -        -

There is one other thing I'd like to mention when it comes to adding indexes on join columns. As a general guideline I usually start out by indexing all of my foreign key columns and only remove an index if I find that it has a negative impact. I recommend this practice because, more often than not, these are the columns the tables are joined on, and you tend to see a pretty good performance benefit from having these columns indexed. A query to help apply this guideline is sketched below.
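This helper query is my own addition, not part of the original tip: it lists foreign key columns that are not the leading key column of any index, which is a reasonable starting point for the "index all foreign keys" guideline.

SELECT OBJECT_NAME(fkc.parent_object_id) AS TableName,
   COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS ColumnName
FROM sys.foreign_key_columns fkc
WHERE NOT EXISTS (SELECT 1
   FROM sys.index_columns ic
   WHERE ic.object_id = fkc.parent_object_id
   AND ic.column_id = fkc.parent_column_id
   AND ic.key_ordinal = 1)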
Additional Information

Choosing non-clustered indexes

Why foreign keys are important

Use WHERE, JOIN, ORDER BY, SELECT Column Order When Creating Indexes
Overview
The order that the columns are specified in your indexes has an effect on whether or not the
entire index can be used when the SQL Optimizer parses your query.
Explanation
When looking at an explain plan for a query, you'll notice that the SQL optimizer first parses the WHERE clause, then the JOIN clause, followed by the ORDER BY clause, and finally it processes the data being selected. Based on this, it makes sense that you would need to specify the columns in your index in this order if you want the entire index to be used. This is especially true if you are trying to create a covering index. Let's look at the following simple query as an example.
SELECT P.ParentID,C.ChildID,C.IntDataColumn,C.VarcharDataColumn
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
WHERE C.IntDataColumn=32433
ORDER BY ChildID
And we'll use the following index statements to show how progressively adding columns to the index in the order we mentioned above, WHERE-JOIN-ORDER BY-SELECT, will improve the query's performance. A couple of things to note. First, I've included the entire index statement here, but you can add the columns one at a time to see the difference at each step. Second, the second CREATE INDEX statement is just an alternative to adding the SELECT columns directly to the index key; instead they are part of an INCLUDE clause.
CREATE NONCLUSTERED INDEX idxChild_JOINIndex
ON [dbo].[Child] ([IntDataColumn],[ParentID],[ChildID],[VarcharDataColumn])

-- alternative: same key columns with the SELECT column in an INCLUDE clause
CREATE NONCLUSTERED INDEX idxChild_JOINIndex
ON [dbo].[Child] ([IntDataColumn],[ParentID],[ChildID]) INCLUDE ([VarcharDataColumn])

-- cleanup statements
--DROP INDEX Child.idxChild_JOINIndex
Let's first take a look at the explain plans for each of these queries as we progressively add columns to the index.
No Index

WHERE Index

WHERE,JOIN Index

WHERE,JOIN,ORDER BY Index

WHERE,JOIN,ORDER BY, SELECT Index

WHERE,JOIN,ORDER BY, INCLUDE Index

It's hard to tell from the explain plans alone whether each step brings an improvement, except perhaps for the initial index, which eliminated the index scan, so let's take a look at the SQL Profiler results to see the actual performance benefit.
Index                                  CPU   Reads   Writes   Duration
No Index                               110   14271   -        103
WHERE Index                            -     129     -        -
WHERE, JOIN Index                      -     117     -        -
WHERE, JOIN, ORDER BY Index            -     117     -        -
WHERE, JOIN, ORDER BY, SELECT Index    -     60      -        -
WHERE, JOIN, ORDER BY, INCLUDE Index   -     60      -        -

We can see from these results that as we add each column the SQL engine has to perform fewer reads to execute the query, thereby executing a little faster. The only exception to this is the step where we added the ORDER BY column to the index, but this can be attributed to the fact that we are ordering by ChildID, which is a primary key, so the data is already sorted. The other thing we should note is that there isn't really a performance difference between adding the SELECT column directly to the index key and using the INCLUDE clause.
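If you're curious about the storage cost of the two variants, a quick check (my addition; create one index, measure, drop it, then measure the other) is to compare their footprints in sys.dm_db_partition_stats. INCLUDEd columns are stored only at the leaf level, so the INCLUDE version typically has a slightly smaller B-tree above the leaf.

-- report the reserved size of the index in KB
SELECT i.name, SUM(ps.reserved_page_count) * 8 AS SizeKB
FROM sys.dm_db_partition_stats ps INNER JOIN
   sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE i.name = 'idxChild_JOINIndex'
GROUP BY i.name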
Additional Information

SQL Server index column order - Does it matter?

Make Sure All Tables Have a Clustered Index Defined


Overview
There are two different storage types for tables in SQL Server: heap tables and clustered tables. Heap tables are tables that do not have a clustered index defined on them, and clustered tables are tables that do. There are a few reasons why it's recommended that most, if not all, tables have a clustered index defined.
Explanation
The first benefit of having a clustered index defined on a column is that it saves a lookup when you query the table based on the indexed column, because the data is part of the index. It's because of this that it's recommended you create your clustered index on the columns most heavily used in WHERE clauses, not necessarily the primary key column, if the primary key isn't used that often to access the table. For example, if you had an OrderDetail table where the primary key was an identity column, OrderDetailID, but most queries accessed the table using the OrderID column, then it would be better for the clustered index to be on the OrderID column, as this would produce more row level locks when the table is accessed. Let's take a look at a simple query of the Parent table. First we'll have to create an index on one of the secondary columns and also remove the clustered index from this table. Here are the SQL statements that do both of these operations.
ALTER TABLE dbo.Parent DROP CONSTRAINT PK_Parent

ALTER TABLE dbo.Parent ADD CONSTRAINT
PK_Parent PRIMARY KEY NONCLUSTERED (ParentID)

CREATE NONCLUSTERED INDEX idxParent_IntDataColumn
ON [dbo].[Parent] ([IntDataColumn])
And here is our query of the Parent table.
SELECT * FROM [dbo].[Parent] P
WHERE P.IntDataColumn=32433
Looking at the explain plan for this query we can see that the SQL Optimizer has to perform
a lookup on the Heap table after finding the record in the index.

Now let's recreate this index on the IntDataColumn as a clustered index. Here are the SQL statements.
DROP INDEX Parent.idxParent_IntDataColumn

CREATE CLUSTERED INDEX idxParent_IntDataColumn
ON [dbo].[Parent] ([IntDataColumn])
Checking the explain plan now, we can see that the SQL optimizer just has to do an index seek.

Looking at the SQL Profiler results for this query we can confirm that having a clustered
index does in fact allow SQL Server to execute the query using less resources, specifically the
number of reads it has to perform to process the data.
Table Type   CPU   Reads   Writes   Duration
Heap         -     -       -        -
Clustered    -     -       -        -

The second benefit of having a clustered index on a table is that it provides a way to reorganize the table data when it becomes fragmented. Let's run an update on our table so it becomes a little bit fragmented. We'll also put the table back to its original state, with only the clustered primary key, to make it easier to view the results. Here are the SQL statements to perform these tasks.
DROP INDEX Parent.idxParent_IntDataColumn

ALTER TABLE dbo.Parent DROP CONSTRAINT PK_Parent

ALTER TABLE dbo.Parent ADD CONSTRAINT
PK_Parent PRIMARY KEY CLUSTERED (ParentID)

DECLARE @x BIGINT
SELECT @x=1
WHILE @x < 100000
BEGIN
   -- pad each row with a long repeating string (the original used a literal
   -- of repeated 'TEST' strings; REPLICATE is used here for readability)
   -- so the rows grow and the pages split, fragmenting the table
   UPDATE [dbo].[Parent]
   SET VarcharDataColumn=REPLICATE('TEST',240) + CAST(@x AS VARCHAR)
   WHERE ParentID=@x
   SELECT @x=@x+1
END
We can double check the fragmentation level of our table using the following query.
SELECT index_level, avg_fragmentation_in_percent, fragment_count,
   avg_fragment_size_in_pages, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(N'master'), OBJECT_ID(N'dbo.Parent'),
   NULL, NULL, 'DETAILED')
We can see from the following results that after executing the update above we have some fragmentation in our table.
index_level   avg_fragmentation_in_percent   fragment_count   avg_fragment_size_in_pages   page_count
0             14.3                           3507             6.9                          24394
1             5.3                            111              1.0                          112

Now if our table did not have a clustered index we would have to create a temporary table, reload the data into it, recreate all of the indexes, then drop the original table and rename the temporary table. We would also have to disable any referential integrity constraints before doing any of this and add them back when we were done. All of these tasks would also require downtime for the application. Since our table does have a clustered index, we can simply rebuild this index to reorganize the table data. Doing a regular rebuild would require some downtime, but we would avoid all the extra steps required by the reload. If we don't have the luxury of being able to take our application offline to do maintenance, SQL Server does provide the ability to perform this task online, while the table is being accessed. Here is the SQL statement to do an online rebuild (note: simply remove the WITH clause or replace ON with OFF to perform a regular offline rebuild).
ALTER INDEX PK_Parent ON Parent REBUILD WITH (ONLINE=ON)
After running the index rebuild statement we can again check the fragmentation in our table using the sys.dm_db_index_physical_stats query from earlier.
index_level   avg_fragmentation_in_percent   fragment_count   avg_fragment_size_in_pages   page_count
0             0.01                           18               694.4                        12500
1             7.5                            30               -                            -

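To round out this topic, if you want to apply this guideline across an entire database, something like the following query (a sketch of my own, not part of the original tip) lists the user tables still stored as heaps, i.e. with no clustered index:

-- heaps are recorded in sys.indexes with type 0
SELECT OBJECT_NAME(object_id) AS HeapTable
FROM sys.indexes
WHERE type = 0
AND OBJECTPROPERTY(object_id, 'IsUserTable') = 1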
Additional Information

Find a better candidate for your SQL Server clustered indexes

Rebuilding indexes using the ONLINE option

Use DELETE CASCADE Option to Handle Child Key Removal in Foreign Key
Relationships
Overview
Using the DELETE CASCADE option in your foreign key constraint definitions means better
performance and less code when removing records from tables that have a parent-child
relationship defined.
Explanation
Let's first confirm that our current schema does indeed have the DELETE CASCADE option defined on the foreign key between the Parent and Child tables. Here is the SQL statement to check this, as well as the result.
SELECT name,delete_referential_action_desc
FROM sys.foreign_keys

name              delete_referential_action_desc
FK_Child_Parent   CASCADE

Now that we've confirmed we have this option defined let's delete a record from the Parent
table using the following SQL statement.
DELETE FROM [dbo].[Parent] where ParentID=82433
Looking at the explain plan for this query, note that the SQL optimizer first removes the child records and then performs the delete on the Parent table. Because of this it only needs to access each table once.

Now let's remove the DELETE CASCADE option from our foreign key definition and see if
there are any differences. In order to do this we'll need to drop and recreate the foreign key
without the DELETE CASCADE option. Here are the SQL statements to make this change.
ALTER TABLE [dbo].[Child] DROP CONSTRAINT [FK_Child_Parent]

ALTER TABLE [dbo].[Child] WITH CHECK
ADD CONSTRAINT [FK_Child_Parent] FOREIGN KEY([ParentID])
REFERENCES [dbo].[Parent] ([ParentID])
Once the foreign key has been recreated we can run a second delete to see if there is any
difference in performance. One thing to note here is that without the DELETE CASCADE
option defined we need to run an additional delete statement to remove the records from
the Child table first. Here are the SQL statements to perform the delete.
DELETE FROM [dbo].[Child] where ParentID=62433
DELETE FROM [dbo].[Parent] where ParentID=62433
Looking at the explain plan for these statements we see that they are quite similar, the only difference being that, because we are executing separate delete statements, the Child table needs to be accessed a second time to check the foreign key constraint when deleting from the Parent table.

Using the SQL Profiler results from each query we can confirm this extra scan of the Child
table does indeed mean that the DELETE CASCADE option performs better. We can see
below that the DELETE CASCADE option uses less resources in every category and runs
about 20% faster.
                    CPU   Reads   Writes   Duration
No Delete Cascade   344   28488   -        399
Delete Cascade      250   14249   -        312

Additional Information

Using the DELETE CASCADE option for foreign keys

The importance of SQL Server foreign keys

You can't define a foreign key constraint that contains multiple cascade paths

Denormalize For Performance


Overview
Even though it might mean storing a bit of redundant data, schema denormalization can sometimes provide better query performance. The only question then becomes whether the extra space used is worth the performance benefit.
Explanation
Before we get started on the example let's make sure our join columns are indexed (as we
saw in an earlier topic) so the performance results are not skewed by any scans. Here are
the SQL statements to create these indexes.
CREATE NONCLUSTERED INDEX idxChild_ParentID
ON [dbo].[Child] ([ParentID])

CREATE NONCLUSTERED INDEX idxChildDetail_ChildID
ON [dbo].[ChildDetail] ([ChildID])
To test the performance of both our normalized and denormalized schemas we'll use the
following simple 3 table join query.
SELECT *
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID INNER JOIN
[dbo].[ChildDetail] CD ON C.ChildID=CD.ChildID
WHERE P.ParentID=32433
Looking at the explain plan for this query, it behaves just as we would expect, using index seeks and lookups as it joins each table in the query.

Now let's do some denormalization by moving the ChildDetail table data into the Child
table. We'll first need to add the required columns to the Child table. Then before we can
migrate any data we'll need to remove the primary and foreign key constraints and once the
data is migrated we can recreate them. The following SQL statements perform these tasks.
ALTER TABLE [dbo].[Child] ADD [ChildDetailID] [bigint] NOT NULL DEFAULT 0,
   [ExtraDataColumn] [bigint]

ALTER TABLE [dbo].[ChildDetail] DROP CONSTRAINT [FK_ChildDetail_Child]

ALTER TABLE [dbo].[Child] DROP CONSTRAINT [PK_Child]

INSERT INTO [dbo].[Child]
SELECT C.ChildID,C.ParentID,C.IntDataColumn,C.VarcharDataColumn,
   CD.ChildDetailID,CD.ExtraDataColumn
FROM [dbo].[Child] C INNER JOIN
   [dbo].[ChildDetail] CD ON C.ChildID=CD.ChildID

DELETE FROM Child WHERE ChildDetailID=0

ALTER TABLE [dbo].[Child] ADD CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED
   ([ChildID] ASC, [ChildDetailID] ASC)

DROP TABLE [dbo].[ChildDetail]


We'll also need to update the query used above as we no longer need to access the
ChildDetail table. Here is the updated query.
SELECT *
FROM [dbo].[Parent] P INNER JOIN
[dbo].[Child] C ON P.ParentID=C.ParentID
WHERE P.ParentID=32433
The explain plan for this query looks very similar to the original one, just without the join to the ChildDetail table.

Looking at the SQL Profiler results from these two queries we do see a pretty big benefit
from removing the join of the ChildDetail table. SQL Server performed fewer reads and the
total execution time was also improved.

               CPU   Reads   Writes   Duration
Normalized     -     365     -        75
Denormalized   -     250     -        -

We should also take a look at how much extra space we are using, as this is important in deciding whether or not to implement this type of change. The following SQL statement will tell you the amount of disk space each of your tables is consuming.
SELECT o.name,SUM(reserved_page_count) * 8.0 / 1024 AS 'Size (MB)'
FROM sys.dm_db_partition_stats ddps INNER JOIN
sys.objects o ON ddps.object_id=o.object_id
WHERE o.name in ('Parent','Child','ChildDetail')
GROUP BY o.name
The following table shows the results of the above query for both the normalized and denormalized table schemas. As we can see, the denormalized schema uses about 18MB more disk space. The only question now is whether the performance benefit is worth the space this redundant data consumes.
Table         Normalized Size (MB)   Denormalized Size (MB)
Parent        5.9                    5.9
Child         151.6                  679.2
ChildDetail   509.6                  N/A
Total         667.1                  685.1

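One alternative worth weighing before physically merging tables is an indexed view, which materializes the join while leaving the normalized base tables intact; it consumes comparable extra space but requires no schema migration or code changes to the base tables. This is a sketch of my own against the original normalized schema, not part of the tip:

-- schema binding and two-part names are required for an indexed view
CREATE VIEW dbo.vChildWithDetail
WITH SCHEMABINDING
AS
SELECT C.ChildID, C.ParentID, C.IntDataColumn, CD.ChildDetailID, CD.ExtraDataColumn
FROM dbo.Child C INNER JOIN
   dbo.ChildDetail CD ON C.ChildID = CD.ChildID
GO

-- the unique clustered index is what materializes the view's data
CREATE UNIQUE CLUSTERED INDEX idxvChildWithDetail
ON dbo.vChildWithDetail (ChildID, ChildDetailID)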
Additional Information

Optimizing the Database Design by Denormalizing
