
http://sqlblogcasts.com/blogs/tonyrogerson/search.aspx?q=index

What is a page split? What happens? Why does it happen?


Why worry?
You’ve probably heard the term bandied around, but do you know what it means and what it means for the
performance of your application? I’m going to demonstrate a page split, use DBCC PAGE to show what
happens, use SQL Profiler to show the locks, and talk through what’s going on.

Terms

IAM      Index Allocation Map
         (See http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/24/645803.aspx for a good talk about these)
GAM      Global Allocation Map
SGAM     Shared Global Allocation Map
PFS      Page Free Space
         (See http://blogs.msdn.com/sqlserverstorageengine/archive/2006/07/08/under-the-covers-gam-sgam-and-pfs-pages.aspx for a good talk about these)
Page     8 Kbytes of data
Extent   8 pages (64 Kbytes in total)

Background

A page is 8 Kbytes of data which can hold index entries, data rows, large object (LOB) data and so on.

When you insert rows into a table they go onto a page, into ‘slots’; each row has a row length and you
can only fit so many rows on an 8 Kbyte page. What happens when a row’s length increases, for instance
because you entered a bigger product name in a varchar column? SQL Server needs to move the other rows
along in order to make room for your modification. If the combined new length of all the rows on the
page will no longer fit on that page, SQL Server grabs a new page and moves the rows to the right or
left of your modification onto it – that is called a ‘page split’.
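
Before diving into the example, it is worth knowing you can get a rough feel for how much splitting is
happening instance-wide from the performance counters. A minimal sketch using the
sys.dm_os_performance_counters DMV (SQL 2005 and later; not part of the original walkthrough, and the
counter is cumulative since the instance started, so sample it twice and difference the values):

-- Instance-wide page split activity; cumulative since the instance started.
select object_name, counter_name, cntr_value
from sys.dm_os_performance_counters
where counter_name = 'Page Splits/sec'
and object_name like '%Access Methods%'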

Example

Create a new database, a new table and put some rows in it.
use master
go
if db_id( 'pagesplit' ) is not null    -- only drop it if it already exists
    drop database pagesplit
go
 
create database pagesplit
go
use pagesplit
go
 
create table mytest (
    something_to_see_in_data char(5) not null
        constraint pk_mytest primary key clustered,
    filler varchar(3000) not null
)
go
 
insert mytest ( something_to_see_in_data, filler ) values( '00001', replicate( 'A', 3000 ) )
insert mytest ( something_to_see_in_data, filler ) values( '00002', replicate( 'B', 1000 ) )
insert mytest ( something_to_see_in_data, filler ) values( '00003', replicate( 'C', 3000 ) )
go
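
As a quick sanity check (not in the original walkthrough), the three rows carry roughly
3,000 + 1,000 + 3,000 = 7,000 bytes of filler, which plus row overhead still fits on a single 8 Kbyte
data page; sp_spaceused shows how little space the table is using at this point:

-- The three rows should still fit on a single data page at this point.
exec sp_spaceused 'mytest'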

Now let's look at the contents of one of your data pages; use DBCC IND to identify which pages are allocated
to our object.
DBCC IND ( 0, 'mytest', 1);
GO
 

 
We can see that pages 80 and 73 have data pertaining to our object; looking at the output, a PageType of
10 indicates an IAM page and a PageType of 1 a data page. I’m not going to go into IAMs because I’d
lose the focus of what I’m talking about, but I’ve put a reference at the top for further reading.
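
DBCC IND is undocumented; if you happen to be on SQL Server 2012 or later, the (also undocumented)
sys.dm_db_database_page_allocations function gives similar information. A sketch only – the function and
the column names below may differ between versions, so treat them as an assumption to verify:

-- Roughly equivalent to DBCC IND on SQL Server 2012 and later (undocumented).
select allocated_page_page_id, page_type_desc, is_iam_page,
       previous_page_page_id, next_page_page_id
from sys.dm_db_database_page_allocations( db_id(), object_id('mytest'), 1, null, 'DETAILED' )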
 
So here we will concentrate on the data page (page 73).
 
dbcc traceon( 3604 ) -- output to the console so we can see the results
go
 
dbcc page( 0, 1, 73, 1 ) -- page( <0 for current db>, <file>, <page>, <level of detail> )
go

Page 73 contains all our data; the row offset table is shown below...
OFFSET TABLE:
 
Row - Offset
2 (0x2) - 4128 (0x1020)
1 (0x1) - 3112 (0xc28)
0 (0x0) - 96 (0x60)
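
The offsets tie up with the row sizes: row 0 starts at offset 96, immediately after the 96-byte page
header; 3,112 − 96 = 3,016 bytes for that row (3,000 bytes of filler plus 16 bytes of row overhead), and
4,128 − 3,112 = 1,016 bytes for row 1, which brings us to offset 4,128 where row 2 starts.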

Before we update our middle row to force a page split we should get SQL Profiler running, so we can
capture the locks used and identify the point at which the split occurs.

Start SQL Profiler, create a New Trace and select the ‘blank’ template. On the events selection tab,
under Locks choose Lock:Acquired, Lock:Released and Lock:Escalation; under TSQL choose SQL:StmtStarting
and SQL:StmtCompleted. Now remember to set a filter on the SPID of the connection you are going to run
the test in.

First, let’s update a row that does not cause a page split...
update mytest
set filler = replicate( 'B', 1000 )
where something_to_see_in_data = '00002'

Looking at the Profiler trace (below) we can see there is a simple exclusive lock on the page and key
being updated. The important part here is that no other pages are locked; in reality we may have
indexes on the columns being updated that would cause additional locks, but I’m keeping it simple!

Let’s now update the middle row, but this time make it so that the combined length of the rows currently
on the page will no longer fit into 8 Kbytes.
update mytest
set filler = replicate( 'B', 3000 )
where something_to_see_in_data = '00002'

Before looking at the Profiler output, let’s take a look at what pages we now have allocated to our
object...
DBCC IND ( 0, 'mytest', 1);
GO
 

Interestingly we now have two more pages, page 89 and page 109. Looking at the PageType again, page 109
is a data page (type 1) and page 89 is an index page (type 2); before the split, with a single data page,
there was no balanced tree to speak of because there was a one-to-one relationship between the root node
of the index and the data page. Also, look at the page chain (columns NextPagePID and PrevPagePID):
109 links to 73 and vice versa. Our IAM is still page 80.

Visualisation of what’s going on

Looking at the Profiler trace we see a number of things have happened...

1)    The exclusive lock is taken on the page and key being updated.
2)    The IAM page (page 80) is locked – well, one of the slots in it – because we are being allocated
new data pages and the IAM keeps track of these. You will notice that the lock is acquired and
released within the transaction rather than at the end; this keeps contention low. Imagine how
much slower inserts would be on a table if you also held a lock on the IAM page until the end of
the transaction – it would be a disaster.
3)    The new index page (page 89) is locked; again, notice how that lock is also released before the
statement commits. Why? You aren’t updating the index keys, so effectively there is no change and
a rollback wouldn’t mean anything. One of the recommendations I make is that you be careful which
columns you index – be aware of the additional locking time incurred when you update columns that
are in an index.
4)    The new data page (page 109) is locked, and notice how that lock is kept until after the update
has finished (the transaction completes) – see the sketch after this list.
5)    The page is split – some of the rows (ids 2 and 3) are moved from page 73 to page 109. Locks
aren’t taken on the keys; in fact, if we start a transaction and update row ‘3’ so as to prevent
another connection from updating or deleting that row, SQL Server still (under the covers)
moves the row to the other page – clever.
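
If you don’t have Profiler to hand, you can also see the difference between the locks released early and
the ones held to the end of the transaction with sys.dm_tran_locks (SQL 2005 and later). A rough sketch,
assuming you have reset the table to its original three rows so the update below splits the page again:

-- Connection 1: hold the splitting update open in an explicit transaction.
begin tran
update mytest
set filler = replicate( 'B', 3000 )
where something_to_see_in_data = '00002'
-- (don't commit yet)
 
-- Connection 2: the page and key locks still held; the IAM and index page
-- locks taken during the split have already been released by now.
select request_session_id, resource_type, resource_description,
       request_mode, request_status
from sys.dm_tran_locks
where resource_database_id = db_id('pagesplit')
 
-- Connection 1: tidy up.
rollback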

Summing Up

A page split does require resources, and on a system that splits pages continually that is going to
affect performance and scalability.
Concurrency (blocking) problems are kept to a minimum by some internal tricks, so splits aren’t as
expensive as they were in previous editions of SQL Server.

How do you avoid them? There are strategies for doing this; one is to use a low fill factor when you
create an index, but that comes at the expense of read operations, and because you usually have more
reads than writes that can be a bad trade-off.
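
As an example of the fill factor approach on our test table (SQL 2005 syntax; on SQL 2000 you would use
DBCC DBREINDEX with a fill factor instead – and 70 is just an illustrative figure, not a recommendation):

-- Leave roughly 30% free space on each leaf page so rows can grow in place
-- without forcing a split.
alter index pk_mytest on mytest
rebuild with ( fillfactor = 70 )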

Personally, the usual thing I recommend (and by no means is this a one-size-fits-all recommendation) is
that you defragment indexes using DBCC INDEXDEFRAG (SQL 2000) or the equivalent ALTER INDEX ...
REORGANIZE (SQL 2005). Other things you can do are to pre-populate columns so the row length doesn't grow
when you update, and not to use NULLs everywhere. If you have, say, a varchar column that is frequently
updated to different sizes, then perhaps consider putting it in its own table.
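
As a sketch of those two defragmentation options against our test table (index id 1 is the clustered
index pk_mytest):

-- SQL 2000: defragment the clustered index (0 = current database).
dbcc indexdefrag ( 0, 'mytest', 1 )
 
-- SQL 2005: the equivalent reorganise.
alter index pk_mytest on mytest reorganize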
