
SQL Server 2005

Physical Database
Architecture

All the contents of this file are based on the information published in MSDN,
TechNet and SQL Server 2005 Online Tutorials.
Introduction
SQL Server 2005 is a set of components that work together to meet the data
storage and analysis needs of the largest Web sites and enterprise data processing
systems. The following components work together to manage data effectively in
SQL Server 2005.

Physical Database Architecture


Describes the logical components defined in SQL Server databases and how
they are physically implemented in database files.

Relational Database Engine Architecture


Describes the features of the database engine that make it efficient at
processing large numbers of concurrent requests for data from many users.

Physical Database Architecture


The physical database architecture contains

• Pages and Extents


• Physical Database Files and Filegroups
• Space Allocation and Reuse
• Table and Index Architecture
• Transaction Log Architecture

Pages and Extents


The fundamental unit of data storage in SQL Server is the page. The disk space
allocated to a data file (.mdf or .ndf) in a database is logically divided into pages
numbered contiguously from 0 to n. Disk I/O operations are performed at the page
level. That is, SQL Server reads or writes whole data pages.

Extents are a collection of eight physically contiguous pages and are used to
efficiently manage the pages. All pages are stored in extents.

Pages
In SQL Server, the page size is 8 KB. This means SQL Server databases have 128
pages per megabyte. Each page begins with a 96-byte header that is used to store
system information about the page. This information includes the page number,
page type, the amount of free space on the page, and the allocation unit ID of the
object that owns the page.

The following table shows the page types used in the data files of a SQL Server
database.
Page type: Contents

Data: Data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.

Index: Index entries.

Text/Image: Large object data types (text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data), and variable length columns when the data row exceeds 8 KB (varchar, nvarchar, varbinary, and sql_variant).

Global Allocation Map, Shared Global Allocation Map: Information about whether extents are allocated.

Page Free Space: Information about page allocation and free space available on pages.

Index Allocation Map: Information about extents used by a table or index per allocation unit.

Bulk Changed Map: Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.

Differential Changed Map: Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit.

Note: Log files do not contain pages; they contain a series of log records.

Data rows are put on the page serially, starting immediately after the header. A row
offset table starts at the end of the page, and each row offset table contains one
entry for each row on the page. Each entry records how far the first byte of the row
is from the start of the page. The entries in the row offset table are in reverse
sequence from the sequence of the rows on the page.
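The page header and the row offset table described above can be inspected with the undocumented (and unsupported) DBCC PAGE command. This is a sketch only; the database name and page number below are placeholders.

```sql
-- Send DBCC output to the client session instead of the error log.
DBCC TRACEON (3604);
GO
-- DBCC PAGE (database, file ID, page number, print option).
-- Print option 3 dumps the 96-byte page header plus per-row detail,
-- including the row offset table at the end of the page.
-- 'MyDB' and page 154 are placeholder values.
DBCC PAGE ('MyDB', 1, 154, 3);
GO
```

Because DBCC PAGE is undocumented, its output format can change between versions and it should be used only for exploration, not in production code.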

Large Row Support


Rows cannot span pages in SQL Server 2005; however, portions of a row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. In SQL Server 2005, this restriction is relaxed for tables that contain varchar,
nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and
variable columns in a table exceeds the 8,060 byte limitation, SQL Server
dynamically moves one or more variable length columns to pages in the
ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest
width. This is done whenever an insert or update operation increases the total size
of the row beyond the 8060 byte limit. When a column is moved to a page in the
ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the
IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the
row size, SQL Server dynamically moves the columns back to the original data page.

Extents
Extents are the basic unit in which space is managed. An extent is eight physically
contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per
megabyte.
To make its space allocation efficient, SQL Server does not allocate whole extents to
tables with small amounts of data. SQL Server has two types of extents:

• Uniform extents are owned by a single object; all eight pages in the extent
can only be used by the owning object.
• Mixed extents are shared by up to eight objects. Each of the eight pages in
the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the
table or index grows to the point that it has eight pages, it then switches to use
uniform extents for subsequent allocations. If you create an index on an existing
table that has enough rows to generate eight pages in the index, all allocations to
the index are in uniform extents.

Physical Database Files and Filegroups


SQL Server 2005 maps a database over a set of operating-system files. Data and
log information are never mixed in the same file, and individual files are used only
by one database. Filegroups are named collections of files and are used to help with
data placement and administrative tasks such as backup and restore operations.

Database Files
SQL Server 2005 databases have three types of files:
• Primary data files

The primary data file is the starting point of the database and points to the
other files in the database. Every database has one primary data file. The
recommended file name extension for primary data files is .mdf.
• Secondary data files
Secondary data files make up all the data files, other than the primary data
file. Some databases may not have any secondary data files, while others
have several secondary data files. The recommended file name extension for
secondary data files is .ndf.
• Log files
Log files hold all the log information that is used to recover the database.
There must be at least one log file for each database, although there can be
more than one. The recommended file name extension for log files is .ldf.

SQL Server 2005 does not enforce the .mdf, .ndf, and .ldf file name extensions, but
these extensions help you identify the different kinds of files and their use.

In SQL Server 2005, the locations of all the files in a database are recorded in the
primary file of the database and in the master database. The Database Engine uses
the file location information from the master database most of the time. However,
the database engine uses the file location information from the primary file to
initialize the file location entries in the master database in the following situations:

• When attaching a database using the CREATE DATABASE statement with either the FOR ATTACH or FOR ATTACH_REBUILD_LOG options.
• When upgrading from SQL Server version 2000 or version 7.0 to SQL Server
2005.
• When restoring the master database.

Logical and Physical File Names


SQL Server 2005 files have two names:

logical_file_name
The logical_file_name is the name used to refer to the physical file in all Transact-
SQL statements. The logical file name must comply with the rules for SQL Server
identifiers and must be unique among logical file names in the database.

os_file_name
The os_file_name is the name of the physical file including the directory path. It
must follow the rules for the operating system file names.
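For example, the logical and physical names of the files in the current database can be listed from the sys.database_files catalog view (the column aliases below are illustrative):

```sql
-- name          = logical_file_name, used in Transact-SQL statements
-- physical_name = os_file_name, the full path on disk
SELECT name AS logical_file_name,
       physical_name AS os_file_name,
       type_desc
FROM sys.database_files;
```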

The following illustration shows examples of the logical file names and the physical
file names of a database created on a default instance of SQL Server 2005:
SQL Server data and log files can be put on either FAT or NTFS file systems. NTFS is recommended because of its security features. Read/write data filegroups and log files cannot be placed on an NTFS compressed file system. Only read-only databases and read-only secondary filegroups can be put on an NTFS compressed file system.

When multiple instances of SQL Server are run on a single computer, each instance
receives a different default directory to hold the files for the databases created in
the instance.

Data File Pages


Pages in a SQL Server 2005 data file are numbered sequentially, starting with zero
(0) for the first page in the file. Each file in a database has a unique file ID number.
To uniquely identify a page in a database, both the file ID and the page number are
required. The following example shows the page numbers in a database that has a
4-MB primary data file and a 1-MB secondary data file.
The first page in each file is a file header page that contains information about the
attributes of the file. Several of the other pages at the start of the file also contain
system information, such as allocation maps. One of the system pages stored in
both the primary data file and the first log file is a database boot page that contains
information about the attributes of the database.

File Size
SQL Server 2005 files can grow automatically from their originally specified size.
When you define a file, you can specify a specific growth increment. Every time the
file is filled, it increases its size by the growth increment. If there are multiple files in
a filegroup, they will not autogrow until all the files are full. Growth then occurs in a
round-robin fashion.

Each file can also have a maximum size specified. If a maximum size is not
specified, the file can continue to grow until it has used all available space on the
disk. This feature is especially useful when SQL Server is used as a database
embedded in an application where the user does not have convenient access to a
system administrator. The user can let the files autogrow as required to reduce the
administrative burden of monitoring free space in the database and manually
allocating additional space.
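The growth increment and maximum size can also be changed after a file is created, using ALTER DATABASE with the MODIFY FILE clause. The sketch below uses placeholder database and logical file names:

```sql
-- Change the autogrow settings of an existing data file.
-- MyDB and MyDB_Primary are placeholder names.
ALTER DATABASE MyDB
MODIFY FILE
    ( NAME = 'MyDB_Primary',
      MAXSIZE = 20MB,
      FILEGROWTH = 2MB );
```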

Database Filegroups
Database objects and files can be grouped together in filegroups for allocation and
administration purposes. There are two types of filegroups:

Primary
The primary filegroup contains the primary data file and any other files not
specifically assigned to another filegroup. All pages for the system tables are
allocated in the primary filegroup.

User-defined
User-defined filegroups are any filegroups that are specified by using the
FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement.
Log files are never part of a filegroup. Log space is managed separately from data
space.
No file can be a member of more than one filegroup. Tables, indexes, and large
object data can be associated with a specified filegroup. In this case, all their pages
will be allocated in that filegroup, or the tables and indexes can be partitioned. The
data of partitioned tables and indexes is divided into units each of which can be
placed in a separate filegroup in a database. For more information about partitioned
tables and indexes, see Partitioned Tables and Indexes.

One filegroup in each database is designated the default filegroup. When a table or
index is created without specifying a filegroup, it is assumed all pages will be
allocated from the default filegroup. Only one filegroup at a time can be the default
filegroup. Members of the db_owner fixed database role can switch the default
filegroup from one filegroup to another. If no default filegroup is specified, the
primary filegroup is the default filegroup.

File and Filegroup Example


The following example creates a database on an instance of SQL Server. The
database has a primary data file, a user-defined filegroup, and a log file. The
primary data file is in the primary filegroup and the user-defined filegroup has two
secondary data files. An ALTER DATABASE statement makes the user-defined
filegroup the default. A table is then created specifying the user-defined filegroup.

USE master;
GO
-- Create the database with the default data
-- filegroup and a log file. Specify the
-- growth increment and the max size for the
-- primary data file.
CREATE DATABASE MyDB
ON PRIMARY
( NAME='MyDB_Primary',
FILENAME=
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_Prm.mdf',
SIZE=4MB,
MAXSIZE=10MB,
FILEGROWTH=1MB),
FILEGROUP MyDB_FG1
( NAME = 'MyDB_FG1_Dat1',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_1.ndf',
SIZE = 1MB,
MAXSIZE=10MB,
FILEGROWTH=1MB),
( NAME = 'MyDB_FG1_Dat2',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_2.ndf',
SIZE = 1MB,
MAXSIZE=10MB,
FILEGROWTH=1MB)
LOG ON
( NAME='MyDB_log',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB.ldf',
SIZE=1MB,
MAXSIZE=10MB,
FILEGROWTH=1MB);
GO

ALTER DATABASE MyDB
MODIFY FILEGROUP MyDB_FG1 DEFAULT;
GO
-- Create a table in the user-defined filegroup.
USE MyDB;
CREATE TABLE MyTable
( cola int PRIMARY KEY,
colb char(8) )
ON MyDB_FG1;
GO

The following illustration summarizes the results of the previous example.
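One way to check the result (a sketch, assuming the MyDB example above has been run) is to query the sys.filegroups and sys.database_files catalog views:

```sql
USE MyDB;
GO
-- is_default shows which filegroup new objects are allocated from.
SELECT name, is_default
FROM sys.filegroups;
-- data_space_id links each file to its filegroup (0 = log file).
SELECT name, physical_name, data_space_id
FROM sys.database_files;
```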

Space Allocation and Reuse


SQL Server 2005 is effective at quickly allocating pages to objects and reusing
space that is made available by deleted rows. These operations are internal to the
system and use data structures that are not visible to users. However, these
processes and structures are still occasionally referenced in SQL Server messages.

This section is an overview of the space allocation algorithms and the data
structures. It also provides users and administrators with the knowledge they
require to understand the references to the terms in the messages generated by
SQL Server.

Managing Extent Allocations and Free Space


The SQL Server 2005 data structures that manage extent allocations and track free
space have a relatively simple structure. This has the following benefits:

• The free space information is densely packed, so relatively few pages contain this information.
This increases speed by reducing the number of disk reads that are required to retrieve allocation information. This also increases the chance that the allocation pages will remain in memory and not require more reads.
• Most of the allocation information is not chained together. This simplifies the
maintenance of the allocation information.

Each page allocation or deallocation can be performed quickly. This decreases the
contention between concurrent tasks having to allocate or deallocate pages.

SQL Server uses two types of allocation maps to record the allocation of extents:

• Global Allocation Map (GAM)
GAM pages record what extents have been allocated. Each GAM covers
64,000 extents, or almost 4 GB of data. The GAM has one bit for each extent
in the interval it covers. If the bit is 1, the extent is free; if the bit is 0, the
extent is allocated.
• Shared Global Allocation Map (SGAM)
SGAM pages record which extents are currently being used as mixed extents
and also have at least one unused page. Each SGAM covers 64,000 extents,
or almost 4 GB of data. The SGAM has one bit for each extent in the interval it
covers. If the bit is 1, the extent is being used as a mixed extent and has a
free page. If the bit is 0, the extent is not used as a mixed extent, or it is a
mixed extent and all its pages are being used.

Each extent has the following bit patterns set in the GAM and SGAM, based on its
current use.

Current use of extent: GAM bit setting / SGAM bit setting
Free, not being used: 1 / 0
Uniform extent, or full mixed extent: 0 / 0
Mixed extent with free pages: 0 / 1

This results in simple extent management algorithms. To allocate a uniform extent, the Database Engine searches the GAM for a 1 bit and sets it to 0. To find a mixed
extent with free pages, the Database Engine searches the SGAM for a 1 bit. To
allocate a mixed extent, the Database Engine searches the GAM for a 1 bit, sets it to
0, and then also sets the corresponding bit in the SGAM to 1. To deallocate an
extent, the Database Engine makes sure that the GAM bit is set to 1 and the SGAM
bit is set to 0. The algorithms that are actually used internally by the Database
Engine are more sophisticated than what is described in this topic, because the
Database Engine distributes data evenly in a database. However, even the real
algorithms are simplified by not having to manage chains of extent allocation
information.

Tracking Free Space


Page Free Space (PFS) pages record the allocation status of each page, whether an
individual page has been allocated, and the amount of free space on each page.

The PFS has one byte for each page, recording whether the page is allocated, and if
so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent
full, or 96 to 100 percent full.

After an extent has been allocated to an object, the Database Engine uses the PFS
pages to record which pages in the extent are allocated or free. This information is
used when the Database Engine has to allocate a new page. The amount of free
space in a page is only maintained for heap and Text/Image pages. It is used when
the Database Engine has to find a page with free space available to hold a newly
inserted row. Indexes do not require that the page free space be tracked, because
the point at which to insert a new row is set by the index key values.

A PFS page is the first page after the file header page in a data file (page number 1). This is followed by a GAM page (page number 2), and then an SGAM page (page 3). There is another PFS page approximately 8,000 pages after the first PFS page, another GAM page 64,000 extents after the first GAM page, and another SGAM page 64,000 extents after the first SGAM page. The following illustration shows the sequence of pages used by the Database Engine to allocate and manage extents.

Managing Space Used by Objects


An Index Allocation Map (IAM) page maps the extents in a 4-GB part of a database
file used by an allocation unit. An allocation unit is one of three types:

• IN_ROW_DATA
Holds a partition of a heap or index.
• LOB_DATA
Holds large object (LOB) data types, such as xml, varbinary(max), and
varchar(max).
• ROW_OVERFLOW_DATA
Holds variable length data stored in varchar, nvarchar, varbinary, or
sql_variant columns that exceed the 8,060 byte row size limit.

Each partition of a heap or index contains at least an IN_ROW_DATA allocation unit. It may also contain a LOB_DATA or ROW_OVERFLOW_DATA allocation unit,
depending on the heap or index schema. For more information about allocation
units, see Table and Index Organization.

An IAM page covers a 4-GB range in a file and is the same coverage as a GAM or
SGAM page. If the allocation unit contains extents from more than one file, or more
than one 4-GB range of a file, there will be multiple IAM pages linked in an IAM
chain. Therefore, each allocation unit has at least one IAM page for each file on
which it has extents. There may also be more than one IAM page on a file, if the range of the extents on the file allocated to the allocation unit exceeds the range that a single IAM page can record.
IAM pages are allocated as required for each allocation unit and are located
randomly in the file. The system view, sys.system_internals_allocation_units,
points to the first IAM page for an allocation unit. All the IAM pages for that
allocation unit are linked in a chain.

Note: The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

An IAM page has a header that indicates the starting extent of the range of extents
mapped by the IAM page. The IAM page also has a large bitmap in which each bit
represents one extent. The first bit in the map represents the first extent in the
range, the second bit represents the second extent, and so on. If a bit is 0, the
extent it represents is not allocated to the allocation unit owning the IAM. If the bit
is 1, the extent it represents is allocated to the allocation unit owning the IAM page.

When the Database Engine has to insert a new row and no space is available in the
current page, it uses the IAM and PFS pages to find a page to allocate, or, for a heap
or a Text/Image page, a page with sufficient space to hold the row. The Database
Engine uses the IAM pages to find the extents allocated to the allocation unit. For
each extent, the Database Engine searches the PFS pages to see if there is a page
that can be used. Each IAM and PFS page covers many data pages, so there are few
IAM and PFS pages in a database. This means that the IAM and PFS pages are
generally in memory in the SQL Server buffer pool, so they can be searched quickly.
For indexes, the insertion point of a new row is set by the index key. In this case, the
search process previously described does not occur.

The Database Engine allocates a new extent to an allocation unit only when it cannot quickly find a page in an existing extent with sufficient space to hold the row
being inserted. The Database Engine allocates extents from those available in the
filegroup using a proportional allocation algorithm. If a filegroup has two files and
one has twice as much free space as the other, two pages will be allocated from the file with more available space for every one page allocated from the other file. This
means that every file in a filegroup should have a similar percentage of space used.

Tracking Modified Extents


SQL Server 2005 uses two internal data structures to track extents modified by bulk
copy operations and extents modified since the last full backup. These data
structures greatly speed up differential backups. They also speed up the logging of
bulk copy operations when a database is using the bulk-logged recovery model. Like
the Global Allocation Map (GAM) and Shared Global Allocation Map (SGAM) pages,
these structures are bitmaps in which each bit represents a single extent.

• Differential Changed Map (DCM)
This tracks the extents that have changed since the last BACKUP DATABASE
statement. If the bit for an extent is 1, the extent has been modified since the
last BACKUP DATABASE statement. If the bit is 0, the extent has not been
modified.

Differential backups read just the DCM pages to determine which extents have been
modified. This greatly reduces the number of pages that a differential backup must
scan. The length of time that a differential backup runs is proportional to the
number of extents modified since the last BACKUP DATABASE statement and not the
overall size of the database.

• Bulk Changed Map (BCM)
This tracks the extents that have been modified by bulk logged operations
since the last BACKUP LOG statement. If the bit for an extent is 1, the extent
has been modified by a bulk logged operation after the last BACKUP LOG
statement. If the bit is 0, the extent has not been modified by bulk logged
operations.

Although BCM pages appear in all databases, they are only relevant when the
database is using the bulk-logged recovery model. In this recovery model, when a
BACKUP LOG is performed, the backup process scans the BCMs for extents that have
been modified. It then includes those extents in the log backup.

This lets the bulk logged operations be recovered if the database is restored from a
database backup and a sequence of transaction log backups. BCM pages are not
relevant in a database that is using the simple recovery model, because no bulk
logged operations are logged. They are not relevant in a database that is using the
full recovery model, because that recovery model treats bulk logged operations as
fully logged operations.
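In practice, the DCM pages are what make a differential backup fast: only the extents whose DCM bit is 1 need to be read. A minimal sketch (the database name and backup paths are placeholders):

```sql
-- A full backup resets the DCM bits.
BACKUP DATABASE MyDB
TO DISK = 'c:\backups\MyDB_full.bak';
GO
-- The differential backup reads the DCM pages and copies only
-- the extents changed since the last full backup.
BACKUP DATABASE MyDB
TO DISK = 'c:\backups\MyDB_diff.bak'
WITH DIFFERENTIAL;
GO
```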

The interval between DCM pages and BCM pages is the same as the interval between GAM and SGAM pages: 64,000 extents. The DCM and BCM pages are located after the GAM and SGAM pages in a physical file:
Table and Index Architecture
Objects in the SQL Server 2005 database are stored as a collection of 8-KB pages.
This section describes the way the pages for tables and indexes are organized,
stored, and accessed.

Table and Index Organization


The following illustration shows the organization of a table. A table is contained in
one or more partitions and each partition contains data rows in either a heap or a
clustered index structure. The pages of the heap or clustered index are managed in
one or more allocation units, depending on the column types in the data rows.

Partitions
In SQL Server 2005, table and index pages are contained in one or more partitions.
A partition is a user-defined unit of data organization. By default, a table or index
has only one partition that contains all the table or index pages. The partition
resides in a single filegroup. A table or index with a single partition is equivalent to
the organizational structure of tables and indexes in earlier versions of SQL Server.

When a table or index uses multiple partitions, the data is partitioned horizontally
so that groups of rows are mapped into individual partitions, based on a specified
column. The partitions can be put on one or more filegroups in the database. The
table or index is treated as a single logical entity when queries or updates are
performed on the data.
To view the partitions used by a table or index, use the sys.partitions (Transact-
SQL) catalog view.
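For example, the partitions and row counts of a single table can be listed as follows (the table name is a placeholder):

```sql
-- One row per partition; index_id 0 = heap, 1 = clustered index.
SELECT partition_number, index_id, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.MyTable');
```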

Clustered Tables, Heaps, and Indexes


SQL Server 2005 tables use one of two methods to organize their data pages within
a partition:

• Clustered tables are tables that have a clustered index.
The data rows are stored in order based on the clustered index key. The
clustered index is implemented as a B-tree index structure that supports fast
retrieval of the rows, based on their clustered index key values. The pages in
each level of the index, including the data pages in the leaf level, are linked
in a doubly-linked list. However, navigation from one level to another is
performed by using key values. For more information, see Clustered Index
Structures.
• Heaps are tables that have no clustered index.
The data rows are not stored in any particular order, and there is no particular
order to the sequence of the data pages. The data pages are not linked in a
linked list.

Indexed views have the same storage structure as clustered tables.


When a heap or a clustered table has multiple partitions, each partition has a heap
or B-tree structure that contains the group of rows for that specific partition. For
example, if a clustered table has four partitions, there are four B-trees; one in each
partition.

Nonclustered Indexes
Nonclustered indexes have a B-tree index structure similar to the one in clustered
indexes. The difference is that nonclustered indexes do not affect the order of the
data rows. The leaf level contains index rows. Each index row contains the
nonclustered key value, a row locator and any included, or nonkey, columns. The
locator points to the data row that has the key value.

XML Indexes
One primary and several secondary XML indexes can be created on each xml
column in the table. An XML index is a shredded and persisted representation of the
XML binary large objects (BLOBs) in the xml data type column. XML indexes are
stored as internal tables. To view information about xml indexes, use the
sys.xml_indexes or sys.internal_tables catalog views.
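For example, the XML indexes defined on a table can be listed as follows (the table name is a placeholder):

```sql
-- secondary_type_desc is NULL for the primary XML index and
-- PATH, VALUE, or PROPERTY for secondary XML indexes.
SELECT name, secondary_type_desc
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID(N'dbo.MyXmlTable');
```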

Allocation Units
An allocation unit is a collection of pages within a heap or B-tree used to manage
data based on their page type. The following table lists the types of allocation units
used to manage data in tables and indexes.

Allocation unit type: Is used to manage

IN_ROW_DATA: Data or index rows that contain all data, except large object (LOB) data. Pages are of type Data or Index.

LOB_DATA: Large object data stored in one or more of these data types: text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT). Pages are of type Text/Image.

ROW_OVERFLOW_DATA: Variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060 byte row size limit. Pages are of type Data.

A heap or B-tree can have only one allocation unit of each type in a specific
partition. To view the table or index allocation unit information, use the
sys.allocation_units catalog view.

IN_ROW_DATA Allocation Unit


For every partition used by a table (heap or clustered table), index, or indexed view,
there is one IN_ROW_DATA allocation unit that is made up of a collection of data
pages. This allocation unit also contains additional collections of pages to
implement each nonclustered and XML index defined for the table or view. The page
collections in each partition of a table, index, or indexed view are anchored by page
pointers in the sys.system_internals_allocation_units system view.

Note: The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Each table, index, and indexed view partition has a row in sys.system_internals_allocation_units uniquely identified by a container ID
(container_id). The container ID has a one-to-one mapping to the partition_id in the
sys.partitions catalog view that maintains the relationship between the table, index,
or the indexed view data stored in a partition and the allocation units used to
manage the data within the partition.

The allocation of pages to a table, index, or an indexed view partition is managed by a chain of IAM pages. The column first_iam_page in
sys.system_internals_allocation_units points to the first IAM page in the chain of IAM
pages managing the space allocated to the table, index, or the indexed view in the
IN_ROW_DATA allocation unit.

sys.partitions returns a row for each partition in a table or index.

• A heap has a row in sys.partitions with index_id = 0.
The first_iam_page column in sys.system_internals_allocation_units
points to the IAM chain for the collection of heap data pages in the specified
partition. The server uses the IAM pages to find the pages in the data page
collection, because they are not linked.
• A clustered index on a table or a view has a row in sys.partitions with
index_id = 1.
The root_page column in sys.system_internals_allocation_units points to
the top of the clustered index B-tree in the specified partition. The server
uses the index B-tree to find the data pages in the partition.

• Each nonclustered index created for a table or a view has a row in sys.partitions with index_id > 1.
The root_page column in sys.system_internals_allocation_units points to
the top of the nonclustered index B-tree in the specified partition.
• Each table that has at least one LOB column also has a row in sys.partitions
with index_id > 250.
The first_iam_page column points to the chain of IAM pages that manage
the pages in the LOB_DATA allocation unit.

ROW_OVERFLOW_DATA Allocation Unit


For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8,060 byte row size limit. When the size limit is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.

Text/Image pages in the ROW_OVERFLOW_DATA allocation unit are managed in the same way pages in the LOB_DATA allocation unit are managed. That is, the Text/Image pages are managed by a chain of IAM pages.

LOB_DATA Allocation Unit


When a table or index has one or more LOB data types, one LOB_DATA allocation
unit per partition is allocated to manage the storage of that data. The LOB data
types include text, ntext, image, xml, varchar(max), nvarchar(max),
varbinary(max), and CLR user-defined types.

Partition and Allocation Unit Example


The following example returns partition and allocation unit data for two tables:
DatabaseLog, a heap with LOB data and no nonclustered indexes, and Currency, a
clustered table without LOB data and one nonclustered index. Both tables have a
single partition.

USE AdventureWorks;
GO
SELECT o.name AS table_name, p.index_id, i.name AS index_name, au.type_desc AS allocation_type,
    au.data_pages, p.partition_number
FROM sys.allocation_units AS au
    JOIN sys.partitions AS p ON au.container_id = p.partition_id
    JOIN sys.objects AS o ON p.object_id = o.object_id
    JOIN sys.indexes AS i ON p.index_id = i.index_id AND i.object_id = p.object_id
WHERE o.name = N'DatabaseLog' OR o.name = N'Currency'
ORDER BY o.name, p.index_id;

Here is the result set. Notice that the DatabaseLog table uses all three allocation
unit types, because it contains both data and Text/Image page types. The Currency
table does not have LOB data, but does have the allocation unit required to manage
data pages. If the Currency table is later modified to include a LOB data type
column, a LOB_DATA allocation unit is created to manage that data.

table_name  index_id index_name               allocation_type   data_pages partition_number
----------- -------- ------------------------ ----------------- ---------- ----------------
Currency    1        PK_Currency_CurrencyCode IN_ROW_DATA       1          1
Currency    3        AK_Currency_Name         IN_ROW_DATA       1          1
DatabaseLog 0        NULL                     IN_ROW_DATA       160        1
DatabaseLog 0        NULL                     ROW_OVERFLOW_DATA 0          1
DatabaseLog 0        NULL                     LOB_DATA          49         1
(5 row(s) affected)

Heap Structures
A heap is a table without a clustered index. Heaps have one row in sys.partitions,
with index_id = 0 for each partition used by the heap. By default, a heap has a
single partition. When a heap has multiple partitions, each partition has a heap
structure that contains the data for that specific partition. For example, if a heap
has four partitions, there are four heap structures; one in each partition.

Depending on the data types in the heap, each heap structure will have one or more
allocation units to store and manage the data for a specific partition. At a minimum,
each heap will have one IN_ROW_DATA allocation unit per partition. The heap will
also have one LOB_DATA allocation unit per partition, if it contains large object (LOB)
columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition, if
it contains variable length columns that exceed the 8,060 byte row size limit.

The column first_iam_page in the sys.system_internals_allocation_units system view


points to the first IAM page in the chain of IAM pages that manage the space
allocated to the heap in a specific partition. SQL Server 2005 uses the IAM pages to
move through the heap. The data pages and the rows within them are not in any
specific order and are not linked. The only logical connection between data pages is
the information recorded in the IAM pages.

The sys.system_internals_allocation_units system view is for internal use only


and is subject to change. Compatibility is not guaranteed.

Table scans or serial reads of a heap can be performed by scanning the IAM pages
to find the extents that are holding pages for the heap. Because the IAM represents
extents in the same order that they exist in the data files, this means that serial
heap scans progress sequentially through each file. Using the IAM pages to set the
scan sequence also means that rows from the heap are not typically returned in the
order in which they were inserted.
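The IAM-driven scan order can be shown with a toy model. This is an illustration only, not SQL Server code; page numbers and row values are invented, and the model reduces the IAM to a set of allocated page numbers read in file order.

```python
# Toy model of an IAM-driven heap scan. Rows land on whatever page has
# space; the IAM records allocated pages in file order, so a scan returns
# rows grouped by page, not in insert order.

heap_pages = {}   # page number -> list of rows on that page
iam = set()       # page numbers the IAM records as allocated to the heap

def insert(row, page):
    heap_pages.setdefault(page, []).append(row)
    iam.add(page)

# Inserts land on out-of-order pages (e.g. pages with reusable free space).
insert("row1", page=7)
insert("row2", page=2)
insert("row3", page=7)
insert("row4", page=5)

def heap_scan():
    # The IAM is read in file (page-number) order, not insert order.
    return [row for page in sorted(iam) for row in heap_pages[page]]

print(heap_scan())   # -> ['row2', 'row4', 'row1', 'row3']
```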

The following illustration shows how the SQL Server Database Engine uses IAM
pages to retrieve data rows in a single partition heap.

Clustered Index Structures
In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is
called an index node. The top node of the B-tree is called the root node. The bottom
level of nodes in the index is called the leaf nodes. Any index levels between the
root and the leaf nodes are collectively known as intermediate levels. In a clustered
index, the leaf nodes contain the data pages of the underlying table. The root and
intermediate-level nodes contain index pages holding index rows. Each index row contains a key
value and a pointer to either an intermediate level page in the B-tree, or a data row
in the leaf level of the index. The pages in each level of the index are linked in a
doubly-linked list.

Clustered indexes have one row in sys.partitions, with index_id = 1 for each
partition used by the index. By default, a clustered index has a single partition.
When a clustered index has multiple partitions, each partition has a B-tree structure
that contains the data for that specific partition. For example, if a clustered index
has four partitions, there are four B-tree structures; one in each partition.

Depending on the data types in the clustered index, each clustered index structure
will have one or more allocation units in which to store and manage the data for a
specific partition. At a minimum, each clustered index will have one IN_ROW_DATA
allocation unit per partition. The clustered index will also have one LOB_DATA
allocation unit per partition if it contains large object (LOB) columns. It will also have
one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length
columns that exceed the 8,060 byte row size limit. For more information about
allocation units, see Table and Index Organization.

The pages in the data chain and the rows in them are ordered on the value of the
clustered index key. All inserts are made at the point where the key value in the
inserted row fits in the ordering sequence among existing rows. The page
collections for the B-tree are anchored by page pointers in the
sys.system_internals_allocation_units system view.
The sys.system_internals_allocation_units system view is for internal use only
and is subject to change. Compatibility is not guaranteed.

For a clustered index, the root_page column in sys.system_internals_allocation_units


points to the top of the clustered index for a specific partition. SQL Server moves
down the index to find the row corresponding to a clustered index key. To find a
range of keys, SQL Server moves through the index to find the starting key value in
the range and then scans through the data pages using the previous or next
pointers. To find the first page in the chain of data pages, SQL Server follows the
leftmost pointers from the root node of the index.
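The descent and range scan just described can be sketched with a minimal two-level model. This is purely illustrative: the root is reduced to a list of (first key on child page, leaf index) separators, leaves are singly linked via a next pointer, and all key values are invented.

```python
# Minimal sketch of a clustered-index lookup and range scan, assuming a
# two-level B-tree with linked leaf pages as described above.

leaves = [
    {"keys": [1, 3, 5], "next": 1},
    {"keys": [7, 9, 11], "next": 2},
    {"keys": [13, 15, 17], "next": None},
]
root = [(1, 0), (7, 1), (13, 2)]  # (lowest key on child page, leaf index)

def find_leaf(key):
    # Walk the root separators to pick the child whose range covers the key.
    child = root[0][1]
    for first_key, leaf in root:
        if key < first_key:
            break
        child = leaf
    return child

def range_scan(start, end):
    # Descend once for the starting key, then follow next-page pointers.
    result, leaf = [], find_leaf(start)
    while leaf is not None:
        result += [k for k in leaves[leaf]["keys"] if start <= k <= end]
        leaf = leaves[leaf]["next"]
    return result

print(range_scan(4, 14))   # -> [5, 7, 9, 11, 13]
```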

This illustration shows the structure of a clustered index in a single partition.

Nonclustered Index Structures


Nonclustered indexes have the same B-tree structure as clustered indexes, except
for the following significant differences:
• The data rows of the underlying table are not sorted and stored in order
based on their nonclustered keys.
• The leaf layer of a nonclustered index is made up of index pages instead of
data pages.
Nonclustered indexes can be defined on a table or view with a clustered index or a
heap. Each index row in the nonclustered index contains the nonclustered key value
and a row locator. This locator points to the data row in the clustered index or heap
having the key value.

The row locator in a nonclustered index row is either a pointer to the row or the
clustered index key for the row, as described in the following:

• If the table is a heap, which means it does not have a clustered index, the
row locator is a pointer to the row. The pointer is built from the file identifier
(ID), page number, and number of the row on the page. The whole pointer is
known as a Row ID (RID).
• If the table has a clustered index, or the index is on an indexed view, the row
locator is the clustered index key for the row. If the clustered index is not a
unique index, SQL Server 2005 makes any duplicate keys unique by adding
an internally generated value called a uniqueifier. This four-byte value is not
visible to users. It is only added when required to make the clustered key
unique for use in nonclustered indexes. SQL Server retrieves the data row by
searching the clustered index using the clustered index key stored in the leaf
row of the nonclustered index.
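The two row-locator forms can be sketched as follows. This is an illustrative model, not SQL Server's physical format: the tuple layouts, function names, and key values are assumptions; only the ingredients (file ID, page number, row number; clustered key plus uniqueifier for duplicates) come from the text above.

```python
# Sketch of the two row-locator forms described above (illustrative only).

def heap_locator(file_id, page_num, slot):
    # Row ID (RID): file identifier, page number, and row number on the page.
    return (file_id, page_num, slot)

def clustered_locators(keys):
    # Duplicate clustered keys receive an internally generated uniqueifier;
    # the first occurrence of a key needs none.
    seen, locators = {}, []
    for key in keys:
        n = seen.get(key, 0)
        locators.append((key, n) if n else (key,))
        seen[key] = n + 1
    return locators

print(heap_locator(1, 2012, 3))                       # -> (1, 2012, 3)
print(clustered_locators(["US", "UK", "US", "US"]))   # duplicates get 1, 2, ...
```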

Nonclustered indexes have one row in sys.partitions with index_id >1 for each
partition used by the index. By default, a nonclustered index has a single partition.
When a nonclustered index has multiple partitions, each partition has a B-tree
structure that contains the index rows for that specific partition. For example, if a
nonclustered index has four partitions, there are four B-tree structures, with one in
each partition.

Depending on the data types in the nonclustered index, each nonclustered index
structure will have one or more allocation units in which to store and manage the
data for a specific partition. At a minimum, each nonclustered index will have one
IN_ROW_DATA allocation unit per partition that stores the index B-tree pages. The
nonclustered index will also have one LOB_DATA allocation unit per partition if it
contains large object (LOB) columns. Additionally, it will have one
ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length
columns that exceed the 8,060 byte row size limit. For more information about
allocation units, see Table and Index Organization. The page collections for the B-
tree are anchored by root_page pointers in the
sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only


and is subject to change. Compatibility is not guaranteed.
Included Column Indexes
In SQL Server 2005, the functionality of nonclustered indexes can be extended by
adding included columns, called nonkey columns, to the leaf level of the index.
While the key columns are stored at all levels of the nonclustered index, nonkey
columns are stored only at the leaf level.

Transaction Log Architecture


Every SQL Server 2005 database has a transaction log that records all the
transactions and database modifications made by each transaction. The transaction
log is a critical component of any database. This section contains the architectural
information required to understand how the transaction log is used to guarantee the
data integrity of the database and how it is used for data recovery.

Transaction Log Fundamentals


Every SQL Server 2005 database has a transaction log that records all transactions
and the database modifications made by each transaction. The transaction log is a
critical component of the database and, in the case of a system failure, can be the
only source of recent data. It should never be deleted or moved unless the
consequences of doing that are fully understood.

Operations Supported by the Transaction Log


The transaction log supports the following operations:

• Recovery of individual transactions


If an application issues a ROLLBACK statement, or if the Database Engine
detects an error such as the loss of communication with a client, the log
records are used to roll back the modifications made by an incomplete
transaction.
• Recovery of all incomplete transactions when SQL Server is started
If a server that is running SQL Server fails, the databases may be left in a
state where some modifications were never written from the buffer cache to
the data files, and there may be some modifications from incomplete
transactions in the data files. When an instance of SQL Server is started, it
runs a recovery of each database. Every modification recorded in the log
which may not have been written to the data files is rolled forward. Every
incomplete transaction found in the transaction log is then rolled back to
make sure the integrity of the database is preserved.
• Rolling a restored database, file, filegroup, or page forward to the point of
failure
After a hardware loss or disk failure affecting the database files, you can
restore the database to the point of failure. You first restore the last full
backup and the last full differential backup, and then restore the subsequent
sequence of the transaction log backups to the point of failure. As you restore
each log backup, the Database Engine reapplies all the modifications
recorded in the log to roll forward all the transactions. When the last log
backup is restored, the Database Engine then uses the log information to roll
back all transactions that were not complete at that point.
• Supporting transactional replication
The Log Reader Agent monitors the transaction log of each database
configured for transactional replication and copies the transactions marked
for replication from the transaction log into the distribution database. For
more information, see How Transactional Replication Works.
• Supporting standby server solutions
The standby-server solutions, database mirroring and log shipping, rely
heavily on the transaction log. In a log shipping scenario, the primary server
sends the active transaction log of the primary database to one or more
destinations. Each secondary server restores the log to its local secondary
database. For more information, see Understanding Log Shipping.

In a database mirroring scenario, every update to a database, the principal


database, is immediately reproduced in a separate, full copy of the database, the
mirror database. The principal server instance sends each log record immediately to
the mirror server instance which applies the incoming log records to the mirror
database, continually rolling it forward. For more information, see Overview of
Database Mirroring.

Transaction Log Characteristics


Following are the characteristics of the SQL Server Database Engine transaction log:
• The transaction log is implemented as a separate file or set of files in the
database. The log cache is managed separately from the buffer cache for
data pages, which results in simple, fast, and robust code within the database
engine.
• The format of log records and pages is not constrained to follow the format of
data pages.
• The transaction log can be implemented in several files. The files can be
defined to expand automatically by setting the FILEGROWTH value for the
log. This reduces the potential of running out of space in the transaction log,
while at the same time reducing administrative overhead. For more
information, see ALTER DATABASE (Transact-SQL).
• The mechanism to reuse the space within the log files is quick and has
minimal effect on transaction throughput.

Transaction Log Logical Architecture


The SQL Server 2005 transaction log operates logically as if the transaction log is a
string of log records. Each log record is identified by a log sequence number (LSN).
Each new log record is written to the logical end of the log with an LSN that is higher
than the LSN of the record before it.

Log records are stored in a serial sequence as they are created. Each log record
contains the ID of the transaction that it belongs to. For each transaction, all log
records associated with the transaction are individually linked in a chain using
backward pointers that speed the rollback of the transaction.

Log records for data modifications record either the logical operation performed or
they record the before and after images of the modified data. The before image is a
copy of the data before the operation is performed; the after image is a copy of the
data after the operation has been performed.
The steps to recover an operation depend on the type of log record:

• Logical operation logged


• To roll the logical operation forward, the operation is performed again.
• To roll the logical operation back, the reverse logical operation is
performed.

• Before and after image logged


• To roll the operation forward, the after image is applied.
• To roll the operation back, the before image is applied.
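The before/after-image case can be shown concretely. This is an illustrative model, not SQL Server's log record format; the page, column name, and values are invented for the example.

```python
# Sketch of redo/undo using before and after images, as described above.

page = {"balance": 100}
log_record = {
    "column": "balance",
    "before": 100,   # image of the data before the operation
    "after": 150,    # image of the data after the operation
}

def roll_forward(page, rec):
    page[rec["column"]] = rec["after"]    # redo: apply the after image

def roll_back(page, rec):
    page[rec["column"]] = rec["before"]   # undo: apply the before image

roll_forward(page, log_record)   # page["balance"] becomes 150
roll_back(page, log_record)      # page["balance"] returns to 100
```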

Many types of operations are recorded in the transaction log. These operations
include:

• The start and end of each transaction.


• Every data modification (insert, update, or delete). This includes changes to
system tables made by system stored procedures or data definition language
(DDL) statements.

• Every extent and page allocation or deallocation.


• Creating or dropping a table or index.
Rollback operations are also logged. Each transaction reserves space in the
transaction log to make sure that enough log space exists to support a rollback
caused either by an explicit ROLLBACK statement or by an error. The amount of
space reserved depends on the operations performed in the transaction, but
generally it is equal to the amount of space used to log each operation. This
reserved space is freed when the transaction is completed.

The section of the log file from the first log record that must be present for a
successful database-wide rollback to the last-written log record is called the active
part of the log, or the active log. This is the section of the log required to do a full
recovery of the database. No part of the active log can ever be truncated.

Transaction Log Physical Architecture


The transaction log in a database maps over one or more physical files.
Conceptually, the log file is a string of log records. Physically, the sequence of log
records is stored efficiently in the set of physical files that implement the
transaction log.

The SQL Server Database Engine divides each physical log file internally into a
number of virtual log files. Virtual log files have no fixed size, and there is no fixed
number of virtual log files for a physical log file. The database engine chooses the
size of the virtual log files dynamically while it is creating or extending log files. The
Database Engine tries to maintain a small number of virtual files. The size of the
virtual files after a log file has been extended is the sum of the size of the existing
log and the size of the new file increment. The size or number of virtual log files
cannot be configured or set by administrators.

The only time virtual log files affect system performance is if the log files are
defined by small size and growth_increment values. If these log files grow to a large
size because of many small increments, they will have lots of virtual log files. This
can slow down database startup and also log backup and restore operations. We
recommend that you assign log files a size value close to the final size required, and
also have a relatively large growth_increment value.

The transaction log is a wrap-around file. For example, consider a database with one
physical log file divided into four virtual log files. When the database is created, the
logical log file begins at the start of the physical log file. New log records are added
at the end of the logical log and expand toward the end of the physical log. Log
records in the virtual logs that appear in front of the minimum recovery log
sequence number (MinLSN) are deleted, as truncation operations occur. The
transaction log in the example database would look similar to the one in the
following illustration.
When the end of the logical log reaches the end of the physical log file, the new log
records wrap around to the start of the physical log file.

This cycle repeats endlessly, as long as the end of the logical log never reaches the
beginning of the logical log. If the old log records are truncated frequently enough
to always leave sufficient room for all the new log records created through the next
checkpoint, the log never fills. However, if the end of the logical log does reach the
start of the logical log, one of two things occurs:

• If the FILEGROWTH setting is enabled for the log and space is available on the
disk, the file is extended by the amount specified in growth_increment and
the new log records are added to the extension. For more information about
the FILEGROWTH setting, see ALTER DATABASE (Transact-SQL).
• If the FILEGROWTH setting is not enabled, or the disk that is holding the log
file has less free space than the amount specified in growth_increment, a
9002 error is generated.
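The wrap-around and growth behavior can be modeled with a toy class. This is an illustration only: the class name, slot-based sizing, and growth rule are assumptions; the model captures just the behavior described above (append at the logical end, grow when FILEGROWTH allows, raise error 9002 when it does not).

```python
# Toy model of the wrap-around transaction log described above.

class WrapAroundLog:
    def __init__(self, capacity, growth_increment=0):
        self.capacity = capacity        # slots in the physical log
        self.growth = growth_increment  # 0 models FILEGROWTH disabled
        self.start = 0                  # logical start (oldest record)
        self.used = 0                   # records currently in the log

    def append(self, n=1):
        if self.used + n > self.capacity:   # logical end reaches the start
            if self.growth:
                self.capacity += self.growth    # FILEGROWTH extends the file
            else:
                raise RuntimeError("error 9002: the transaction log is full")
        self.used += n

    def truncate(self, n):
        # Free the oldest records; the logical start moves forward and
        # wraps modulo the physical capacity.
        self.start = (self.start + n) % self.capacity
        self.used -= n

log = WrapAroundLog(capacity=4)
log.append(3)
log.truncate(2)   # old records freed by truncation, so...
log.append(3)     # ...new records wrap around past the physical end
```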

If the log contains multiple physical log files, the logical log will move through all the
physical log files before it wraps back to the start of the first physical log file.

Write-Ahead Transaction Log


SQL Server 2005 uses a write-ahead log (WAL). A write-ahead log guarantees that
no data modifications are written to disk before the associated log record is written
to disk. This maintains the ACID properties for a transaction.

To understand how the write-ahead log works, it is important for you to know how
modified data is written to disk. SQL Server maintains a buffer cache into which it
reads data pages when data must be retrieved. Data modifications are not made
directly to disk, but are made to the copy of the page in the buffer cache. The
modification is not written to disk until a checkpoint occurs in the database, or the
modification must be written to disk so the buffer can be used to hold a new page.
Writing a modified data page from the buffer cache to disk is called flushing the
page. A page modified in the cache, but not yet written to disk, is called a dirty
page.
At the time a modification is made to a page in the buffer, a log record is built in the
log cache that records the modification. This log record must be written to disk
before the associated dirty page is flushed from the buffer cache to disk. If the dirty
page is flushed before the log record is written, the dirty page will create a
modification on the disk that cannot be rolled back if the server fails before the log
record is written to disk. SQL Server has logic that prevents a dirty page from being
flushed before the associated log record is written. Log records are written to disk
when the transactions are committed.
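The write-ahead rule can be sketched in a few lines. This is an illustrative model, not SQL Server code: the function names and the single-LSN-per-page simplification are assumptions; the invariant shown (force the log before flushing the page) is the rule described above.

```python
# Minimal sketch of the write-ahead rule: a dirty page may not be flushed
# until its associated log record is on stable storage.

log_on_disk = []   # log records already written to disk
log_cache = []     # log records still in memory

def write_log_record(rec):
    log_cache.append(rec)

def flush_log():
    log_on_disk.extend(log_cache)
    log_cache.clear()

def flush_page(page_lsn):
    # The WAL check: the page's log record must already be on disk;
    # if it is still in the log cache, force the log out first.
    if page_lsn in log_cache:
        flush_log()
    assert page_lsn in log_on_disk, "WAL violation"
    return "page flushed"

write_log_record("lsn-1")      # the modification is logged in the log cache
result = flush_page("lsn-1")   # the log is forced to disk before the page
```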

Checkpoints and the Active Portion of the Log


Checkpoints flush dirty data pages from the buffer cache of the current database to
disk. This minimizes the active portion of the log that must be processed during a
full recovery of a database. During a full recovery, two types of actions are
performed:

• The log records of modifications not flushed to disk before the system
stopped are rolled forward.
• All modifications associated with incomplete transactions, such as
transactions for which there is no COMMIT or ROLLBACK log record, are rolled
back.

Checkpoint Operation
A checkpoint performs the following processes in the current database:

• Writes a record to the log file marking the start of the checkpoint.
• Stores information recorded for the checkpoint in a chain of checkpoint log
records.

One piece of information recorded in the checkpoint records is the LSN of the first
log record that must be present for a successful database-wide rollback. This LSN is
called the Minimum Recovery LSN (MinLSN) and is the minimum of the:

• LSN of the start of the checkpoint


• LSN of the start of the oldest active transaction
• LSN of the start of the oldest replication transaction that has not yet been
delivered to the distribution database

Another piece of information recorded in the checkpoint records is a list of all the
active transactions that have modified the database.

• Marks for reuse the space that precedes the MinLSN, if the database uses the
simple recovery model.
• Writes all dirty log and data pages to disk.
• Writes a record marking the end of the checkpoint to the log file.
• Writes the LSN of the start of this chain to the database boot page.
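The MinLSN rule above reduces to a simple minimum over up to three LSNs. The following sketch is illustrative (the function name and argument shapes are assumptions); the example values mirror the Active Log discussion later in this section, where a checkpoint at LSN 147 coexists with a transaction that began at LSN 142.

```python
# Sketch of the MinLSN computation described above: the minimum of the
# checkpoint-start LSN, the start LSN of the oldest active transaction, and
# the start LSN of the oldest undelivered replication transaction.

def min_lsn(checkpoint_start, active_tx_starts, repl_tx_starts):
    candidates = [checkpoint_start]
    if active_tx_starts:
        candidates.append(min(active_tx_starts))
    if repl_tx_starts:
        candidates.append(min(repl_tx_starts))
    return min(candidates)

# A checkpoint starts at LSN 147 while a transaction begun at LSN 142 is
# still active, so nothing at or after LSN 142 may be overwritten.
print(min_lsn(147, active_tx_starts=[142], repl_tx_starts=[]))   # -> 142
```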

Activities That Cause a Checkpoint


Checkpoints occur in the following situations:

• A CHECKPOINT statement is explicitly executed. A checkpoint occurs in the
current database for the connection.
• A minimally logged operation is performed in the database; for example, a
bulk-copy operation is performed on a database that is using the Bulk-Logged
recovery model.
• Database files have been added or removed by using ALTER DATABASE.
• The recovery model is changed to simple. A checkpoint occurs as part of the
log truncation process that this change triggers.
• An instance of SQL Server is stopped by a SHUTDOWN statement or by
stopping the SQL Server (MSSQLSERVER) service. Either will checkpoint each
database in the instance of SQL Server.
• An instance of SQL Server periodically generates automatic checkpoints in
each database to reduce the time that the instance would take to recover the
database.
• A database backup is taken.
• An activity requiring a database shutdown is performed. For example,
AUTO_CLOSE is ON and the last user connection to the database is closed, or
a database option change is made that requires a restart of the database.

Automatic Checkpoints
The SQL Server Database Engine generates automatic checkpoints. The interval
between automatic checkpoints is based on the amount of log space used and the
time elapsed since the last checkpoint. The time interval between automatic
checkpoints can be highly variable and long, if few modifications are made in the
database. Automatic checkpoints can also occur frequently if lots of data is
modified.

The interval between automatic checkpoints is calculated for all the databases on a
server instance from the recovery interval server configuration option. This option
specifies the maximum time the Database Engine should use to recover a database
during a system restart. The Database Engine estimates how many log records it
can process in the recovery interval during a recovery operation.

The interval between automatic checkpoints also depends on the recovery model:

• If the database is using either the full or bulk-logged recovery model, an


automatic checkpoint is generated whenever the number of log records
reaches the number the database engine estimates it can process during the
time specified in the recovery interval option.
• If the database is using the simple recovery model, an automatic checkpoint
is generated whenever the first of these two events occurs:
• The log becomes 70 percent full.
• The number of log records reaches the number the Database Engine
estimates it can process during the time specified in the recovery
interval option.
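The simple-recovery trigger can be expressed as a two-condition check. This sketch is illustrative: the function name and the record-count framing of both thresholds are assumptions; the 70 percent figure and the recovery-interval estimate come from the text above.

```python
# Sketch of the automatic-checkpoint trigger under the simple recovery
# model: a checkpoint is due at whichever comes first, the log reaching
# 70 percent full or the recovery-interval record-count estimate.

def checkpoint_due(log_records, log_capacity, recovery_interval_records):
    seventy_percent_full = log_records >= 0.7 * log_capacity
    interval_reached = log_records >= recovery_interval_records
    return seventy_percent_full or interval_reached

# 800 records in a 1,000-record log is 80 percent full, so a checkpoint is
# due even though the recovery-interval estimate (5,000) is not yet reached.
print(checkpoint_due(800, 1000, 5000))   # -> True
```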
Automatic checkpoints truncate the unused section of the transaction log if the
database is using the simple recovery model. However, if the database is using the
full or bulk-logged recovery models, the log is not truncated by automatic
checkpoints.

The CHECKPOINT statement now provides an optional checkpoint_duration


argument that specifies the requested period of time, in seconds, for checkpoints to
finish.

Active Log
The section of the log file from the MinLSN to the last-written log record is called the
active portion of the log, or the active log. This is the section of the log required to
do a full recovery of the database. No part of the active log can ever be truncated.
Log records can be truncated only from the parts of the log before the MinLSN.

The following figure shows a simplified version of the end-of-a-transaction log with
two active transactions. Checkpoint records have been compacted to a single
record.

LSN 148 is the last record in the transaction log. At the time that the recorded
checkpoint at LSN 147 was processed, Tran 1 had been committed and Tran 2 was
the only active transaction. That makes the first log record for Tran 2 the oldest log
record for a transaction active at the time of the last checkpoint. This makes LSN
142, the Begin transaction record for Tran 2, the MinLSN.

Long-Running Transactions
The active log must include every part of all uncommitted transactions. An
application that starts a transaction and does not commit it or roll it back prevents
the Database Engine from advancing the MinLSN. This can cause two types of
problems:

• If the system is shut down after the transaction has performed many
uncommitted modifications, the recovery phase of the subsequent restart can
take much longer than the time specified in the recovery interval option.
• The log might grow very large, because the log cannot be truncated past the
MinLSN. This occurs even if the database is using the simple recovery model
where the transaction log is generally truncated on each automatic
checkpoint.

Replication Transactions
The Log Reader Agent monitors the transaction log of each database configured for
transactional replication and copies the transactions marked for replication from the
transaction log into the distribution database. The active log must contain all
transactions that are marked for replication, but that have not yet been delivered to
the distribution database. If these transactions are not replicated in a timely
manner, they can prevent the truncation of the log.

Truncating the Transaction Log


If log records were never deleted from the transaction log, the logical log would
grow until it filled all the available space on the disks holding the physical log files.
To reduce the size of the logical log, the transaction log is periodically truncated. In
very early versions of SQL Server, truncating the log meant physically deleting the
log records that were no longer needed for recovering or restoring a database.
However, in recent versions, the truncation process just marks for reuse the space
that was used by the old log records. The log records in this space are eventually
overwritten by new log records.

Truncation does not reduce the size of a physical log file. Instead, it reduces the size
of the logical log file and frees disk space for reuse.

The active portion of the transaction log, the active log, can never be truncated. The
active portion of the log is the part of the log that is used to recover the database
and must always be present in the database. The record at the start of the active
portion of the log is identified by the minimum recovery log sequence number
(MinLSN). The log records before the MinLSN are only needed to maintain a
sequence of the transaction log backups.

The recovery model selected for a database determines how much of the
transaction log in front of the MinLSN must be retained in the database, as shown in
the following:

• In the simple recovery model, a sequence of transaction log backups is not
being maintained. All log records before the MinLSN can be truncated at any
time, except while a BACKUP statement is being processed.
• In the full and bulk-logged recovery models, a sequence of transaction log
backups is being maintained. The part of the logical log in front of the MinLSN
cannot be truncated until the transaction log has been backed up.

Operations That Truncate the Log


Log truncation occurs at these points:

• After a BACKUP LOG statement is completed and it does not specify NO
TRUNCATE.
• Every time a checkpoint is processed, if the database is using the simple
recovery model. This includes both explicit checkpoints that result from a
CHECKPOINT statement and implicit checkpoints that are generated by the
system. The exception is that the log is not truncated if the checkpoint occurs
when a BACKUP statement is still active.

Log Truncation Example



Transaction logs are divided internally into sections called virtual log files. Virtual log
files are the unit of space that can be reused. When a transaction log is truncated,
the log records in front of the virtual log file containing the MinLSN are overwritten
as new log records are generated.

This illustration shows a transaction log that has four virtual log files. The log has
never been truncated since the database was created. The logical log starts at the
front of the first virtual log file, and the part of virtual log 4 beyond the end of
the logical log has never been used.

This illustration shows how the log appears after truncation. The space before the
start of the virtual log that contains the MinLSN record has been marked for reuse.

Shrinking the Transaction Log

The size of a log file is physically reduced in the following situations:

• A DBCC SHRINKDATABASE statement is executed.
• A DBCC SHRINKFILE statement referencing a log file is executed.
• An autoshrink operation occurs.

Shrinking a log depends on first truncating the log. Log truncation does not reduce
the size of a physical log file. However, it does reduce the size of the logical log and
marks as inactive the virtual logs that do not hold any part of the logical log. A log
shrink operation removes enough inactive virtual logs to reduce the log file to the
requested size.

The unit of the size reduction is a virtual log file. For example, if you have a 600-MB
log file that has been divided into six 100-MB virtual logs, the size of the log file can
only be reduced in 100-MB increments. The file size can be reduced to sizes such as
500 MB or 400 MB, but the file cannot be reduced to sizes such as 433 MB or 525
MB.

The size of a virtual log file is chosen dynamically by the Database Engine when
log files are created or extended.
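The virtual log files inside a log file can be inspected with DBCC LOGINFO, an undocumented but long-available diagnostic command that returns one row per virtual log file. This is a sketch, assuming a hypothetical database named MyDB:

```sql
USE MyDB;
DBCC LOGINFO;
-- Returns one row per virtual log file, including FileSize, StartOffset,
-- and Status. A Status of 2 marks a virtual log file that is still active
-- (holds part of the logical log) and therefore cannot be freed by a shrink.
```

Because the command is undocumented, its output columns are not guaranteed across versions; treat it as a diagnostic aid rather than a supported API.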

Virtual log files that hold part of the logical log cannot be freed. If all the virtual log
files in a log file hold parts of the logical log, the file cannot be shrunk until a
truncation marks as inactive one or more of the virtual logs at the end of the
physical log.
When any file is shrunk, the space freed must come from the end of the file. When a
transaction log file is shrunk, enough virtual log files from the end of the log file are
freed to reduce the log to the size requested by the user. The target_size specified
by the user is rounded to the next highest virtual log file boundary. For example, if a
user specifies a target_size of 325 MB for our sample 600-MB file that contains
100-MB virtual log files, the last two virtual log files are removed and the new file size is
400 MB.
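This rounding to a virtual log file boundary can be seen with DBCC SHRINKFILE; the logical file name MyDB_log is a placeholder for the actual logical name of the log file:

```sql
-- Ask for 325 MB. With 100-MB virtual log files, the target is rounded up
-- to the next virtual log file boundary, so the file ends up at 400 MB.
DBCC SHRINKFILE (MyDB_log, 325);  -- target_size is specified in megabytes
```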

A DBCC SHRINKDATABASE or DBCC SHRINKFILE operation immediately tries to
shrink the physical log file to the requested size:

• If no part of the logical log in the virtual log files extends beyond the
  target_size mark, the virtual log files that come after the target_size mark are
  freed, and the successful DBCC statement completes with no messages.
• If part of the logical log in the virtual log files does extend beyond the
  target_size mark, the SQL Server Database Engine frees as much space as possible
  and issues an informational message. The message tells you what actions you must
  perform to remove the logical log from the virtual logs at the end of the file.
  After you perform this action, you can reissue the DBCC statement to free the
  remaining space.

For example, assume that a 600-MB log file that contains six virtual log files has a
logical log that starts in virtual log 3 and ends in virtual log 4 when you run a DBCC
SHRINKFILE statement with a target_size of 275 MB:

Virtual log files 5 and 6 are freed immediately, because they do not contain part of
the logical log. However, to meet the specified target_size, virtual log file 4 should
also be freed, but it cannot because it holds the end portion of the logical log. After
freeing virtual log files 5 and 6, the Database Engine fills the remaining part of
virtual log file 4 with dummy records. Because virtual log file 4 is now full, new
log records wrap around to the start of the physical file, into virtual log file 1.
In most systems, all transactions starting in virtual log file 4 will be committed
within seconds, so the entire active portion of the log moves to virtual log file 1.
The log file now looks similar to this:

The DBCC SHRINKFILE statement also issues an informational message that states
that it could not free all the space requested and that you can run a BACKUP LOG
statement to free the remaining space. After the active portion of the log moves to
virtual log file 1, a BACKUP LOG statement will truncate the entire logical log that is
in virtual log file 4:

Because virtual log file 4 no longer holds any portion of the logical log, you can now
run the same DBCC SHRINKFILE statement with a target_size of 275 MB. Virtual log
file 4 will then be freed and the size of the physical log file will be reduced to the
size you requested.
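The sequence just described might be scripted as follows; the database name MyDB, the logical file name MyDB_log, and the backup path are placeholders:

```sql
-- First attempt: frees virtual log files 5 and 6, but virtual log file 4
-- still holds the end of the logical log, so the target is not reached.
DBCC SHRINKFILE (MyDB_log, 275);

-- Back up the log. After the active portion has wrapped to virtual log
-- file 1, this truncates the logical log remaining in virtual log file 4.
BACKUP LOG MyDB TO DISK = 'C:\Backups\MyDB_log.bak';

-- Second attempt: virtual log file 4 can now be freed, and the physical
-- file shrinks to the requested size (rounded to a virtual log boundary).
DBCC SHRINKFILE (MyDB_log, 275);
```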

