Abstract
This short article introduces database indexes and defines clustered and non-clustered indexes,
with reference to Microsoft SQL Server.
1. Introduction
In day-to-day business applications, a huge amount of transaction
information is stored in the database. In the beginning, when the
number of records is small, everything works fine. Then, after some
time, the support engineer reports that queries are taking a long time
to execute and that the system needs a huge amount of time to process
the records. Indexes help prevent this: we anticipate our SQL queries
and look at how the database generates the execution plan for each one.
An index can bring a tremendous improvement in query performance if used
properly; used improperly, it can just as easily hurt performance.
2. Indexes
Indexes are used to speed up the execution of SQL statements on a table. An
index points to the location of the rows that contain a given value.
The default index type in the database is the B-tree index.
By default, when we execute a SELECT command with a search criterion on a table
that has no primary key and no index, the database searches sequentially. For
example, take a set of numbers as Ids and write a SELECT command with the Id in
the WHERE clause. To execute the query, the database reads the Ids row by row
from the beginning of the table to find the desired row. We can imagine how this
behaves on a table with billions of rows.
This sequential search problem is where the B-tree comes into the picture. A
B-tree is a self-balancing tree data structure, a generalization of the binary
search tree in which a node can have more than two children. It keeps data sorted
and allows searches, sequential access, insertions and deletions in logarithmic time.
Figure 2: B-Tree
In SQL Server, a sequential search over a table is called a Table Scan. When rows
are located by navigating the B-tree, the operation is called an Index Seek; when
the whole index is read, it is called an Index Scan.
Let's do a demo to check what kind of scan SQL Server performs when a table has
no index and no primary key.
Below we have a table called CityName, which doesn't have any index and
contains information about cities.
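The original table definition is not shown, so the following is only a minimal sketch of what such a table might look like. The column names, the types, and the sample rows are assumptions for illustration; only the table name CityName and the existence of a city id column come from the text.

```sql
-- Hypothetical definition: column names and types are assumptions.
-- No PRIMARY KEY and no index are declared, so SQL Server stores
-- the rows in a heap.
CREATE TABLE CityName
(
    CityId int,
    City   varchar(50)
);

-- Made-up sample rows, inserted in no particular order.
INSERT INTO CityName (CityId, City) VALUES (3, 'City C');
INSERT INTO CityName (CityId, City) VALUES (1, 'City A');
INSERT INTO CityName (CityId, City) VALUES (2, 'City B');
```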
Let's run a SELECT command that filters on one particular Cityid and use
Display Estimated Execution Plan, as shown in the diagram.
The execution plan tells us how SQL Server performs the query when the table
has no index.
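Besides the graphical plan in Management Studio, the estimated plan can also be requested as text. The sketch below assumes a column named CityId and an arbitrary search value of 10; on a heap without indexes, the plan is expected to contain a Table Scan operator.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch,
-- hence the GO separators (understood by SSMS and sqlcmd).
SET SHOWPLAN_TEXT ON;
GO

-- The query is not executed; only its estimated plan is returned.
-- CityId and the value 10 are assumptions for illustration.
SELECT * FROM CityName WHERE CityId = 10;
GO

SET SHOWPLAN_TEXT OFF;
GO
```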
The I/O statistics reported for the query contain the following counters:
Scan count: The number of scans or seeks started to retrieve the data. Here
the optimizer has chosen a plan that caused a scan count of 1.
Logical reads: The number of pages read from the data cache. This is the
most important counter to focus on, and it can be reduced with a suitable
index structure.
Physical reads: The number of pages actually read from the disk, that is,
pages that were not in the data cache. SQL Server performs all work on the
cache; if a requested page is not present, it reads the page from the disk,
puts it in the cache, and then uses it.
LOB logical reads: This number grows if the query requests a large object such
as an image, varchar(max) or nvarchar(max) value.
LOB physical reads: The same as physical reads, but for LOB pages.
LOB read-ahead reads: The number of text, ntext, image or large value type
pages placed into the cache for the query.
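These counters are reported when I/O statistics are switched on for the session. A minimal sketch, again assuming a CityId column and an arbitrary search value:

```sql
-- Report page reads for every statement that follows in this session.
SET STATISTICS IO ON;

-- CityId and the value 10 are assumptions for illustration.
SELECT * FROM CityName WHERE CityId = 10;

-- Switch the reporting off again.
SET STATISTICS IO OFF;
```

In Management Studio the counters appear on the Messages tab, one line per table touched by the query.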
Now let's follow the steps below to add an index to the table and check the
performance.
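The original steps were shown as screenshots that are not available here. An equivalent statement could look like the sketch below; the index name, the column name CityId, and the choice of a non-clustered index are all assumptions.

```sql
-- Hypothetical index name and column; the kind of index used in the
-- original demo is not known, a non-clustered index is assumed here.
CREATE NONCLUSTERED INDEX IX_CityName_CityId
ON CityName (CityId);
```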
Now let's look at the execution plan again and check whether it still uses a Table Scan.
Then let's run the performance statistics and check whether there is any difference.
(1 row(s) affected)
Table 'CityName'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
We can see the difference: before, the logical reads were 26, and now they have
decreased to 2. So the index has improved the performance of the query.
3. Clustered Index
A clustered index determines the physical order of data in the table: the rows
are stored on disk in the same order as the index. Therefore there can be only
one clustered index per table, because the records can be stored in only one
order.
The statement that a clustered index determines the physical order of data
means that rows inserted into the table are ordered by the index column. Even if
we insert data in an unordered way, SQL Server stores it in the order of the
index. If the table has no clustered index, its rows are stored in an unordered
structure known as a heap.
Let's take an example: create two identical tables with different names, create
an index on one and leave the other without, then insert values into both tables
and fetch them.
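-- Table without an index. The opening lines of this statement are missing in the
-- source; the name tblOrderWithoutIndex is an assumed placeholder.
CREATE TABLE tblOrderWithoutIndex
(
order_id int,
sku_id int,
description varchar(50),
mrp decimal,
sp decimal
)
-- Table with a primary key, which creates a clustered index by default.
CREATE TABLE tblOrderWithIndex
(
order_id int primary key,
sku_id int,
description varchar(50),
mrp decimal,
sp decimal
)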
To check the indexes on a table, we can use the following command:
EXEC sp_helpindex 'tableName'
Here we have created two tables, one with a primary key and the other without
any index. Let's now insert some values into the tables.
Inserting values in tblOrderWithIndex table
insert into tblOrderWithIndex values(10,1234,'Apple Iphone 5',23000,22000);
insert into tblOrderWithIndex values(1,1235,'Samsung Note 3',21000,20000);
Let's run the SELECT command to check whether the records are retrieved in order.
Figure 12: Order wise records are retrieved from the database.
That is, the clustered index stores the records in key order, while a table
without an index does not store them in any particular order. Note that the order
of a result set is only guaranteed when the query has an ORDER BY clause.
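The check can be sketched as follows, using only the table and columns defined earlier. The first query typically returns the rows in clustered key order, and the second makes that order explicit.

```sql
-- No ORDER BY: rows usually come back in clustered index order
-- (order_id), but SQL Server does not guarantee it.
SELECT order_id, sku_id, description, mrp, sp
FROM tblOrderWithIndex;

-- ORDER BY: the order is guaranteed, and the clustered index on
-- order_id lets SQL Server deliver it without an extra sort.
SELECT order_id, sku_id, description, mrp, sp
FROM tblOrderWithIndex
ORDER BY order_id;
```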
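We can have only one clustered index per table, but that index can contain
multiple columns; it is then called a composite index.
To create a composite index, we just add the column names to the index
definition. Because the primary key of tblOrderWithIndex has already created a
clustered index, that constraint has to be dropped before the statement below
can succeed.
CREATE CLUSTERED INDEX In_tblOrderWithIndex_skui_id_description
on tblOrderWithIndex(description asc,sku_id asc)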
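The primary key declared in the CREATE TABLE statement is backed by a clustered index by default, so the table has no free clustered slot until that constraint is removed. The following sketch, to be run before the CREATE CLUSTERED INDEX statement, looks up the auto-generated constraint name and drops it. It assumes the table lives in the dbo schema and that losing the primary key is acceptable for the demo.

```sql
DECLARE @pk  sysname;
DECLARE @sql nvarchar(400);

-- Find the name of the primary key constraint on the table
-- (the dbo schema is an assumption).
SELECT @pk = name
FROM sys.key_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.tblOrderWithIndex')
  AND type = 'PK';

-- Drop it only if one was found; the name is quoted because
-- auto-generated names contain unpredictable characters.
IF @pk IS NOT NULL
BEGIN
    SET @sql = N'ALTER TABLE dbo.tblOrderWithIndex DROP CONSTRAINT '
             + QUOTENAME(@pk) + N';';
    EXEC sys.sp_executesql @sql;
END;
```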
Now let's execute our SELECT command to retrieve the rows and check how the
records are now stored.
We can see that the records now come back ordered by description and then by
sku_id, both ascending, which is the column order of the index key.
4. Non-clustered Index
A non-clustered index is a pointer to the data. The data is stored in one place
and the index in another, and each index entry holds a pointer to the location
of the data. Since the index is stored separately from the actual data, a table
can have more than one non-clustered index. This is illustrated below, where a
pointer table holds a row address for each row of the actual data.
order_id  sku_id  Description     mrp    sp
2         1233    Apple Iphone 4  11000  10000
10        1234    Apple Iphone 5  23000  22000
1         1235    Samsung Note 3  21000  20000

order_id  Row locator
2         ROW ADDRESS
10        ROW ADDRESS
1         ROW ADDRESS

Table 2. Reference table pointing to the row addresses of the actual table.
A table can have only one clustered index, but it can have many non-clustered indexes.
A clustered index is generally faster than a non-clustered index for retrieving
rows, because the clustered index determines the physical order of the records
and no extra lookup is needed to reach the row.
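As a minimal sketch, a non-clustered index could be added to the demo table as shown below. The index name and the choice of sku_id as the key column are assumptions for illustration.

```sql
-- Hypothetical index name; sku_id is chosen as an example key column.
CREATE NONCLUSTERED INDEX IX_tblOrderWithIndex_sku_id
ON tblOrderWithIndex (sku_id ASC);

-- List the indexes on the table to confirm that it was created.
EXEC sp_helpindex 'tblOrderWithIndex';

-- A query filtering on sku_id can now use the new index to locate
-- the matching row instead of reading the whole table.
SELECT order_id, sku_id, description
FROM tblOrderWithIndex
WHERE sku_id = 1234;
```

Whether the optimizer actually picks the index depends on table size and statistics; on a table with only a few rows it may still prefer a scan.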