Author: Sharon Dooley
1-800-843-8733 • www.learningtree.ca
Table of Contents

Introduction
1. Enhancements for the Developer
   1.1. The MERGE Statement
   1.2. Change Data Capture
   1.3. Row Constructors
   1.4. Beyond Relational Data
        1.4.1. Spatial Data Types
        1.4.2. FILESTREAM
        1.4.3. DATE and TIME Data Types
   1.5. Additional Smaller (but Useful) Features
        1.5.1. IntelliSense
        1.5.2. Stored Procedure Debugger
2. Enhancements for the DBA
   2.1. Policy-Based Management
   2.2. Auditing
   2.3. Transparent Data Encryption
   2.4. External Key Management
   2.5. Availability
   2.6. Additional Smaller (but Useful) Features
        2.6.1. Hot-Add Hardware
        2.6.2. Backup Compression
        2.6.3. PowerShell Integration
3. Performance Enhancements
   3.1. Management Data Warehouse and the Data Collector
   3.2. Sparse Columns
   3.3. Filtered Indexes
   3.4. Resource Governor
   3.5. Data Compression

Introduction

Flashback to late October 2005: “It” finally arrived! “It” was the long-awaited release of SQL Server 2005. The magnitude of the changes made to the software was almost frightening, and the SQL Server community was quite vocal in demanding that Microsoft never repeat that scenario again. Microsoft listened and promised that they would aim for a two- to three-year interval between releases in the future.

Now it’s early 2009, and SQL Server 2008 has been available for four months. Many members of that same SQL Server community who demanded more frequent releases are now complaining that it’s too soon!

Is it too soon for you? Is SQL Server 2008 a “must-have” upgrade? In this paper, I’ll describe the features I personally find most compelling, focusing on the core database engine and covering three basic topics: enhancements for the developer, enhancements for the Database Administrator (DBA), and performance enhancements. Of course, these topics overlap, so if you don’t find what you are looking for in a particular section, try one of the others.

Many of the new features that I’ll describe are available only in the Enterprise Edition of SQL Server 2008. If some of these interest you, you’ll want to compare the features of the various editions at http://www.microsoft.com/sqlserver/2008/en/us/editions-compare.aspx

I hope that this paper will help you decide whether or not SQL Server 2008 is a “must-have” upgrade for you.
SQL Server 2008 Database Engine
LEARNING TREE INTERNATIONAL White Paper
©2009 Learning Tree International. All Rights Reserved.

1. Enhancements for the Developer

1.1. The MERGE Statement
The MERGE statement is extremely powerful, but also quite complex. To use the MERGE statement, you need to understand
two tables:
1. The Source table. The table or table-expression from which the new and changed data comes.
2. The Target table. This is the table that will be modified by the data in the source table.
When you are comparing two tables, there are three possible results:
1. There is a match between a row in the Source table and a row in the Target table based on some key.
2. There is a row in the Source table with no matching row in the Target table.
3. There is a row in the Target table with no matching row in the Source table.
When you use the MERGE statement, you may need to deal with all three possibilities.
Example 1
There are two tables involved in this example. Purchases contains details of purchases by product. PurchaseHistory contains historical information about product purchases. Periodically, the contents of the Purchases table need to be placed in the PurchaseHistory table.
ProductID  PurchaseDate  Quantity  Amount
1          20080507      25        125.00
3          20080509      50        750.99
4          20080509      43        400.00
1          20080506      5         400.95
3          20080407      30        401.32
Observe the italicized rows in the two tables. You should see that the Purchases table contains an entry for ProductID 2 on 20080409 and that the PurchaseHistory table contains an entry for the same day and product. You should also notice that there is an entry for Product 6 in the Purchases table and that there are no entries for that product in the PurchaseHistory table.
We have two of the three cases listed above:
1. There is a match between a row in the Source table and a row in the Target table.
2. There is a row in the Source table with no matching row in the Target table.
The business requirements are that if there is a match, the values for Quantity and Amount from the Purchases table are to
replace the amounts in the PurchaseHistory table, and that a row that exists in the Purchases table with no matching row in
the Target table should be inserted.
Now that we understand the data and the business rules, we are ready to look at the MERGE statement that will accomplish
the task.
MERGE dbo.PurchaseHistory AS Target
USING dbo.Purchases AS Source
   ON Target.ProductID = Source.ProductID
  AND Target.PurchaseDate = Source.PurchaseDate
WHEN MATCHED THEN
    UPDATE SET Target.Quantity = Source.Quantity,
               Target.Amount = Source.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, PurchaseDate, Quantity, Amount)
    VALUES (Source.ProductID, Source.PurchaseDate,
            Source.Quantity, Source.Amount)
;
Notice that “;” at the end? It’s not optional; each MERGE statement must be terminated with a semi-colon.
After the MERGE, the PurchaseHistory table contains:

ProductID  PurchaseDate  Quantity  Amount
1          20080506      5         400.95
1          20080507      25        125.00
3          20080506      30        401.32
3          20080409      50        750.99
4          20080509      43        400.00
This simple example only begins to demonstrate the power of the new MERGE statement. Furthermore, when combined with SQL Server 2008 Integration Services, applying incremental changes to a data warehouse becomes much easier. If this is a capability you need, I suggest you refer to Books Online for additional information. (You can download the documentation and install it without actually installing SQL Server 2008.)
1.2. Change Data Capture

Change Data Capture records INSERT, UPDATE and DELETE activity against a table in a corresponding Change table, which includes the following metadata columns:

Column           Contents
__$start_lsn     The log sequence number (LSN) assigned to the transaction when it was committed. This ties together all the rows affected by a transaction and can be used to put the transactions in the order that they happened.
__$operation     The change that was made: 1 = DELETE, 2 = INSERT, 3 = UPDATE (before image), 4 = UPDATE (after image).
__$update_mask   A bit mask specifying the columns that were changed. Every bit is set for INSERT and DELETE operations. For UPDATE operations, only the bits for the columns mentioned in the SET clause of the UPDATE statement are set.
The Change table includes both the before and after images for an update operation. To use this feature, you must first enable Change Data Capture for the database. Once you have done that, you can enable Change Data Capture for the tables where you need it. The capture of the changed data is a “behind the scenes” operation: the capture process reads the transaction log and writes the data to the Change tables. Several different functions are available for retrieving data from the Change tables, as well as functions for converting times of day to Log Sequence Numbers (LSNs) so that it’s easier to define the interval for which you want to retrieve changes. Note that there is no function that will purge the change tables; you will have to develop this code yourself.
If you plan to use Change Data Capture, you need to keep in mind that it will have a negative effect on performance. There will be an additional write for INSERT and DELETE operations and two additional writes for UPDATE operations. In addition, you’ll need extra disk space for storing the change data.
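As a rough sketch of the setup, here is how you might enable Change Data Capture and read a range of changes. I’m reusing the dbo.Purchases table from the MERGE example; note that the name of the change-retrieval function incorporates the capture instance name, so yours will differ.

```sql
-- Enable CDC for the current database, then for one table
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Purchases',
     @role_name     = NULL;   -- no gating role for reading changes

-- Convert a time interval to LSNs, then read the changes
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_map_time_to_lsn(
                    'smallest greater than or equal',
                    '2009-01-01 00:00');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

SELECT __$start_lsn, __$operation, __$update_mask, *
FROM cdc.fn_cdc_get_all_changes_dbo_Purchases(@from_lsn, @to_lsn, N'all');
```

The capture process itself runs as a SQL Agent job, so SQL Server Agent must be running for changes to appear.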
1.4.1. Spatial Data Types

SQL Server 2008 introduces two spatial data types: geometry and geography. Geometry data is two-dimensional, usually referred to as “flat earth” data. With flat earth data, space is defined by coordinate points, similar to the X and Y coordinates of a graph. This type of data is used to represent data as it is presented on a two-dimensional map. The coordinates can represent inches, meters, feet, miles or any other measure, as long as the same measure is used for both coordinates. So you might use geometry data to store information about the distance between two cities or to define the polygon created by multiple points. For example, connecting three points forms a triangle and connecting four points forms a quadrilateral, such as a rectangle or a square. You could also use this data to show the restaurants located within a certain number of miles of a given location. SQL Server 2008’s geometry data conforms to the Open Geospatial Consortium (OGC) Simple Features for SQL Specification, version 1.1.0.
Geography data handles ellipsoidal, or “round earth,” data. Points in this system are defined by latitude and longitude and describe positions on the curved surface of the earth. This is the type of data collected by GPS systems.
Both the geometry and geography data types are implemented as .NET data types, which means that unlike a scalar data type such as an integer, they are manipulated with methods. For example, if you have two integers, you can compute the distance between them by subtracting one from the other, and you can compute the area of a rectangle by multiplying its sides: 2 in. by 4 in. gives an area of 8 sq. in. To compute the area of a polygon defined by geometry data, you use the STArea() method. Suppose you have a Cities table that contains the points defining the boundaries of cities.
To find the area of Philadelphia, Pennsylvania, you’d use a query that looks like this:
SELECT City, Region, Boundaries.STArea() AS Area
FROM Cities
WHERE City = 'Philadelphia'
If your applications require spatial data, you’ll want to delve further into these complex data types.
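As a sketch of how such a table might be built, here is a minimal example. The table name and columns follow the query above; the city name and coordinates are purely illustrative, not a real boundary.

```sql
-- Hypothetical Cities table matching the query above
CREATE TABLE Cities (
    City       nvarchar(50),
    Region     nvarchar(50),
    Boundaries geometry        -- polygon defining the city limits
);

-- Insert a simple square "city" using Well-Known Text (WKT);
-- the first and last points of a polygon ring must be the same
INSERT INTO Cities (City, Region, Boundaries)
VALUES ('Sampleville', 'PA',
        geometry::STGeomFromText(
            'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0));

-- STArea() returns the area in squared coordinate units
SELECT City, Boundaries.STArea() AS Area
FROM Cities
WHERE City = 'Sampleville';
```

Because the coordinates above span a 10-by-10 square, STArea() returns 100 in whatever unit of measure the coordinates represent.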
1.4.2. FILESTREAM
Since the inception of SQL Server, there has been an ongoing debate about whether large objects (images and text) should be stored in the database or remain in the file system with the database containing the location of the file. Proponents of storing these objects in the database pointed to the fact that when the objects are in the database, they’re backed up with the database and that pointers can’t become stale since there are no pointers. The other side pointed to the convenience of having the file in the file system, since you can easily open it with the application that created it. If you ever had to load a Visual Basic image control with data from a SQL Server image column, you know that it was not a pretty process. Even though SQL Server 2005’s varbinary(max) and varchar(max) data types made the process easier, it still wasn’t ideal, and you were limited to a maximum of 2GB for each large object.
With SQL Server 2008, the debate is finally over, thanks to the FILESTREAM column property. This gives you the best of
both worlds—data lives in the file system but is managed through the database. It gets backed up with the database, but you
can also retrieve the data using the standard Windows API calls that underlie ADO.NET. These manipulations can still retain
transactional consistency like any other database operations, and you can also manipulate the data with Transact-SQL as
you would the varbinary(max) and varchar(max) data types. In addition, the only size limit becomes the maximum size of an
operating system file.
You can enable the FILESTREAM property when you install SQL Server or you can enable it later. In either case, you will
need to consider what types of retrieval you want to permit. You can allow access to FILESTREAM data from Transact-SQL.
If you want to have access from Windows for clients running on the SQL Server machine (for example, CLR functions
and procedures), you need to enable FILESTREAM for file I/O streaming access. If remote clients are going to access
FILESTREAM data from Windows (as a Windows Forms or Active Server Page application), you also need to allow remote
clients to have streaming access to FILESTREAM data. If you are enabling FILESTREAM after installation, you’ll need to
use SQL Server Configuration Manager to set these options. You’ll also need to use sp_configure to set the file stream
access level.
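Enabling FILESTREAM after installation and creating a table that uses it might look roughly like this. The database name, filegroup name and file paths are illustrative; the Windows-level share and access settings still come from SQL Server Configuration Manager, as described above.

```sql
-- Set the Transact-SQL access level with sp_configure
-- (0 = disabled, 1 = T-SQL access, 2 = T-SQL + file I/O streaming)
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;

-- A database with a FILESTREAM filegroup
CREATE DATABASE DemoFS
ON PRIMARY (NAME = DemoFS_data, FILENAME = 'C:\Data\DemoFS.mdf'),
FILEGROUP FSGroup CONTAINS FILESTREAM
    (NAME = DemoFS_fs, FILENAME = 'C:\Data\DemoFS_fs')
LOG ON (NAME = DemoFS_log, FILENAME = 'C:\Data\DemoFS.ldf');
GO

-- A FILESTREAM table requires a uniqueidentifier ROWGUIDCOL column
USE DemoFS;
CREATE TABLE dbo.Documents (
    DocID   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE
            DEFAULT NEWID(),
    DocData varbinary(max) FILESTREAM NULL
);
```

From here, the DocData column can be manipulated with ordinary Transact-SQL, even though the bytes actually live in the file system under the FILESTREAM container.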
1.4.3. DATE and TIME Data Types

SQL Server 2008 adds new date and time data types: DATE and TIME (which finally separate the date and time portions), DATETIME2 (with a greater range and precision) and DATETIMEOFFSET (which is time-zone aware). All the familiar date manipulation functions (GETDATE, DATENAME, DATEPART, DATEADD, etc.) work with the new data types as well as the old data types (which are still supported). The old data types can easily be converted to the new ones, and the new types have the same flexible input formats as the old ones. In addition, there are some new functions that make it much easier to get consistent dates regardless of the actual time zone.
I think these new data types are going to eliminate the problems we’ve had working with the datetime data type in the past,
and I am looking forward to using them.
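A quick illustration of the new types and the time-zone-aware functions (the values are illustrative):

```sql
-- SQL Server 2008 also allows DECLARE with an initial value
DECLARE @d  date           = '2009-02-14';          -- date only
DECLARE @t  time(7)        = '13:45:30.1234567';    -- time only, 100 ns precision
DECLARE @d2 datetime2(7)   = '2009-02-14 13:45:30.1234567';
DECLARE @o  datetimeoffset = '2009-02-14 13:45:30 -05:00';

-- The familiar functions work with the new types
SELECT DATEPART(year, @d) AS Yr,
       DATEADD(day, 7, @d) AS NextWeek;

-- New functions: the current date/time with its time-zone offset,
-- and conversion of an offset value to another time zone
SELECT SYSDATETIMEOFFSET()        AS NowWithOffset,
       SWITCHOFFSET(@o, '+00:00') AS AsUTC;
```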
2. Enhancements for the DBA

2.1. Policy-Based Management

When you build a condition, you start with a “facet.” A facet is some aspect of a database or a server and is a collection of properties of that aspect. For example, if you want to prevent people from starting the names of stored procedures with sp_, you’d use the Stored Procedure facet and select the @Name property, building a condition such as @Name NOT LIKE 'sp_%'.
Once you have created the condition(s) for your policy, you can create the policy itself. We’ll create a policy that applies to all
databases with On Change Prevent enforcement.
Now, if someone tries to create a stored procedure whose name starts with sp_, it will not work. Policy-based management
implements this with the Data Definition Language (DDL) triggers introduced in SQL Server 2005. This is an extremely
powerful capability. I’ve only shown you the simplest possible policy, but I hope that this gives you an understanding of
what this feature can provide.
2.2. Auditing
SQL Server 2008 provides an auditing capability. It is based on the SQL Profiler technology and allows you to audit exactly
the same events provided by that tool. However, the setup is more automatic than trying to do the same thing with Profiler.
You can audit all security-related changes: adding or deleting logins, granting database access, granting object permissions,
etc. In addition, you can audit SELECT, INSERT, UPDATE, DELETE and EXECUTE operations. The audit trail can be written
to a file, the Windows Application Event Log or the Windows Security Log. An audit is defined with a Server Audit. This
gives the audit a name and specifies the destination for the audit trail. You can set it up so that the SQL Server will stop if
there is an audit failure.
For each Server Audit, you can create one or more Server Audit Specifications. Here, you define the server level events (such
as creating logins or creating databases) that you want to audit. You can also create Database Audit Specifications. Here, you
can specify the database security activities (authorizing users, granting permissions, etc.) as well as the Data Manipulation
Language (DML) actions that you want to audit. You can limit both specifications to a specific login or user. While useful,
this capability does not provide everything that I think it should provide. For example, when an update statement is recorded,
the audit trail does not show what the data was changed to nor does it show the value (if any) used in the WHERE clause.
Reading the audit trail from the logs is difficult—it’s simply a continuous stream of text, but if you save the audit trail in a file,
you can use a supplied function, fn_get_audit_file, which will present the audit trail in a more readable fashion.
The GUI is very difficult to use, but it does make sure you get all the pieces in the right order. The Transact-SQL commands
are straightforward and are well documented. I think they are easier to use than the GUI, particularly since they allow you to
specify more than one principal. You’ll still have to have a separate line for each securable that you want to audit.
The audit does capture most of the details—who did it, what they did, when they did it. And it’s available to you with only
some relatively tedious setup work. It may save you from having to buy a third party product, and if you start using it now,
I’m sure that we’ll see many improvements in the next release of SQL Server.
2.5. Availability
Database mirroring and peer-to-peer replication were introduced in SQL Server 2005, and both have been enhanced in
SQL Server 2008. Database mirroring can now recover a damaged page in the principal from the undamaged copy in
the mirror. This happens automatically when SQL Server detects the damaged page. The log records used for database
mirroring are now compressed also, which reduces the load on the network and speeds up the process, and peer-to-peer
replication now has built-in conflict detection. When SQL Server sees that there is a conflicting update on one of the
peer servers, it stops sending new data to that server. When the conflict is resolved (a manual, rather than an automated,
process), replication to that server will resume.
3. Performance Enhancements
3.1. Management Data Warehouse and the Data Collector
The Management Data Warehouse (MDW) is a SQL Server database that serves as a repository for historical performance
data. A data collector process runs on a regular basis to collect the data and store it in the database. The repository can be
created on the server where the Data Collector runs or it can be on a different server. You may want to put it on a server of
its own if the process of writing to the repository is affecting the performance of your production server. The same repository
can hold data from multiple servers.
The MDW and Data Collector are very easy to set up with the Wizard in Management Studio, and once the Data Collector
has been running for a short time, you’ll be able to view the results in the three built-in reports provided. All three reports let
you drill down to deeper levels of information. Below are illustrations of the initial page of each report.
Server Activity Report
The Data Collector can only collect data from SQL 2008 servers. However, one of the MVPs (Microsoft Most Valuable
Professionals) has posted a collector for SQL 2005 servers on Codeplex (www.codeplex.com), a free, open source
download named the “SQL Monitor Tool.”
The Management Data Warehouse will give you more insight into what’s going on with your SQL Server than you’ve ever
been able to gain with SQL Server tools, and it’s easily a rival to some of the third party monitoring tools available.
3.2. Sparse Columns

Sparse columns have also been introduced in SQL Server 2008 to provide optimized storage for null values. You create a table containing sparse columns in exactly the same way that you create any other table. The only difference is the keyword SPARSE in the column definition:
CREATE TABLE …
MySparseColumn nvarchar(50) SPARSE null, …
You can refer to the columns in SELECT, INSERT, UPDATE and DELETE statements just as you refer to any other column. You can also define a column set, which is simply a name given to all of the sparse columns in the table. This allows you to retrieve all the sparse columns, or to insert or update data in the sparse columns. You need to keep in mind, however, that the column set will be returned as an XML string, not as discrete values. If you insert or update the column set, you must provide the data in XML format.
The main benefit of sparse columns is the storage savings they provide. The Books Online site has some useful charts that will help
you estimate how much space you would save.
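Putting the pieces above together, a sparse table with a column set might look like this (the table and column names are illustrative):

```sql
-- A table with sparse columns and a column set
CREATE TABLE dbo.ProductProperties (
    ProductID int PRIMARY KEY,
    Color     nvarchar(20) SPARSE NULL,
    Wattage   int          SPARSE NULL,
    AllProps  xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Sparse columns can be used like any others...
INSERT INTO dbo.ProductProperties (ProductID, Color)
VALUES (1, N'Red');

-- ...or written through the column set, as XML
INSERT INTO dbo.ProductProperties (ProductID, AllProps)
VALUES (2, N'<Wattage>60</Wattage>');

-- Selecting the column set returns one XML string per row,
-- containing only the non-null sparse values
SELECT ProductID, AllProps FROM dbo.ProductProperties;
```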
3.3. Filtered Indexes
The new filtered indexes capability is often grouped with sparse columns, but while there is a relationship between sparse columns
and filtered indexes, you may want to use filtered indexes for non-sparse columns as well. A filtered index includes a condition
in its definition. Only values that conform to that definition will be included in the index. This will reduce the index size and the
amount of index maintenance that is required. Because of the smaller index size, filtered indexes also make queries more efficient.
Only non-clustered indexes can be filtered. Consider the Production.Product table in AdventureWorks. It looks like this:
CREATE TABLE [Production].[Product](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[Name] [dbo].[Name] NOT NULL,
[ProductNumber] [nvarchar](25) NOT NULL,
[MakeFlag] [dbo].[Flag] NOT NULL,
[FinishedGoodsFlag] [dbo].[Flag] NOT NULL,
[Color] [nvarchar](15) NULL,
…
[ListPrice] [money] NOT NULL,
…
[ProductSubcategoryID] [int] NULL,
[ProductModelID] [int] NULL,
[SellStartDate] [datetime] NOT NULL,
[SellEndDate] [datetime] NULL,
[DiscontinuedDate] [datetime] NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[ModifiedDate] [datetime] NOT NULL,
)
Suppose that we have a lot of queries for accessories, which are ProductSubcategoryIDs 27 – 36. To optimize those queries, we create a filtered index that only includes those subcategory IDs:
CREATE NONCLUSTERED INDEX ProductAccessories
ON Production.Product (Name, ListPrice)
WHERE ProductSubcategoryID >= 27
AND ProductSubcategoryID <= 36;
GO
When you issue a query whose WHERE clause references ProductSubcategoryIDs in that range, the filtered index can be used.
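For instance, a query like the following falls entirely within the index’s filter condition and could be satisfied by ProductAccessories (assuming the optimizer chooses it):

```sql
-- Both predicates are covered by the filtered index:
-- the filter matches the subcategory range, and Name and
-- ListPrice are the index keys
SELECT Name, ListPrice
FROM Production.Product
WHERE ProductSubcategoryID BETWEEN 27 AND 36
  AND ListPrice < 50.00;
```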
3.4. Resource Governor

The Resource Governor lets you divide server resources among competing workloads. It has three components:

• Resource Pools: A container for CPU and memory resources. When you create a resource pool, you specify the minimum and maximum CPU and memory that the pool is allowed to have.
• Workload Groups: A container for sessions with similar characteristics. Each workload group is assigned to a resource pool.
• Classifier Function: A function that determines which workload group a connection should be placed in.
Let’s see how the Resource Governor works to handle the mixed OrderEntry/Reporting scenario described above. We will have two resource pools as shown below:

Pool             Min CPU %   Max CPU %   Min Memory %   Max Memory %
OrderEntryPool   30          50          30             50
ReportingPool    10          50          10             50
We’ll also have two workload groups: OrderEntryGroup and ReportingGroup. The OrderEntryGroup will be assigned to the OrderEntryPool and the ReportingGroup will be assigned to the ReportingPool. Now we have to create a classifier function. Our function will be based on login name and will look like this:
CREATE FUNCTION DemoClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @groupname AS sysname
    IF SUSER_NAME() = 'ReportingUser'
        SET @groupname = 'ReportingGroup'
    IF SUSER_NAME() = 'OrderEntryUser'
        SET @groupname = 'OrderEntryGroup'
    RETURN @groupname
END
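The pools and groups themselves are created with DDL. Using the values from the table above, the setup might look like this (the classifier is assumed to be in the default dbo schema):

```sql
CREATE RESOURCE POOL OrderEntryPool
WITH (MIN_CPU_PERCENT = 30, MAX_CPU_PERCENT = 50,
      MIN_MEMORY_PERCENT = 30, MAX_MEMORY_PERCENT = 50);

CREATE RESOURCE POOL ReportingPool
WITH (MIN_CPU_PERCENT = 10, MAX_CPU_PERCENT = 50,
      MIN_MEMORY_PERCENT = 10, MAX_MEMORY_PERCENT = 50);

-- Assign each workload group to its pool
CREATE WORKLOAD GROUP OrderEntryGroup USING OrderEntryPool;
CREATE WORKLOAD GROUP ReportingGroup  USING ReportingPool;

-- Register the classifier and activate the configuration
ALTER RESOURCE GOVERNOR
WITH (CLASSIFIER_FUNCTION = dbo.DemoClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```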
Any connections that aren’t for either OrderEntryUser or ReportingUser will be placed in the default workload group, which is predefined by SQL Server itself. With these three things in place, and a small amount of setup to enable the Resource Governor, it will begin doing its work. Microsoft supplies several dynamic management views that will allow you to determine how well the Resource Governor is working for you. Here’s a sample query:
SELECT s.group_id, CAST(g.name as nvarchar(20)) AS GroupName,
s.session_id, s.login_time,
CAST(s.host_name as nvarchar(20)) AS HostName,
CAST(s.program_name AS nvarchar(20)) AS ProgramName
FROM sys.dm_exec_sessions AS s
INNER JOIN sys.dm_resource_governor_workload_groups AS g
ON g.group_id = s.group_id
WHERE g.name NOT IN ('internal', 'default')
ORDER BY g.name
The results shown were based on a slightly different set of resource pools and workload groups than those I described above, but
as you can see, this feature may really help you if you’re struggling with the needs of diverse workloads on the same server.
3.5. Data Compression

SQL Server 2008 offers two levels of data compression: row compression and page compression. Row compression is not compression in the sense of WinZip or Windows compressed files. Rather, it stores data types in smaller-than-normal formats. The vardecimal storage format Microsoft introduced in SQL Server 2005 Service Pack 2 is an example of this. With row compression, data types may take less than their usual number of bytes. Take the integer data type: by default, an integer requires four bytes of storage, but with row compression, if the actual value will fit in one byte, only one byte will be used. Another example is the way fixed-length character strings are handled. By default, a CHAR(5) column occupies five bytes of storage, padded with blanks as needed to make five bytes. Row compression strips those trailing blanks away, using only the number of bytes needed for the non-blank data.
Page compression goes further, in three steps. The first step is row compression, using the techniques described above.
In Prefix compression, prefix strings are extracted from the data and placed in the page header. Consider the following data:
aaabb aaaab
The first value matches three characters of the prefix aaaa, while the second matches all four. Once the prefix is stored in the page header, the data values are modified so that they look like this:

3bb 4b

which means that the first n characters are to be taken from the prefix. In the above example, 10 characters are replaced by 9 (the 4 a’s stored in the page header, 3 for the first string and 2 for the second string). That doesn’t seem like much, but in actual practice, with more data, the savings are much greater.
The final step is the Dictionary compression. In this step, SQL Server looks for repeated values anywhere in the page, without
regard to which column they are in. The repeated values will be placed in the page header and replaced with a pointer to that
value in the actual data. This could mean substantial space savings—think about how many times the word “street” is going
to appear in a table of addresses. From the outside, retrieval and update of the data is unchanged, but there is now the cost
of reassembling the compressed data when it is being presented.
Row compression will be less CPU-intensive than page compression, and you probably want to try it first if you’re thinking about using compression. Both types of compression are specified in the CREATE TABLE or ALTER TABLE statements. Microsoft provides a stored procedure named sp_estimate_data_compression_savings that you may want to use before deciding that compression is the right thing for you.
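For example, estimating the savings and then enabling compression might look like this (using the AdventureWorks table from the filtered-index example):

```sql
-- Estimate the savings before committing to compression
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'Production',
     @object_name      = 'Product',
     @index_id         = NULL,   -- all indexes
     @partition_number = NULL,   -- all partitions
     @data_compression = 'PAGE';

-- Apply row compression to an existing table
ALTER TABLE Production.Product
REBUILD WITH (DATA_COMPRESSION = ROW);
```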
If you’ve heard about a new feature that I didn’t talk about and want to learn more or if you want to dig more deeply into the
features I have talked about, you can download the SQL Server 2008 Books Online from
http://www.microsoft.com/downloads/results.aspx?pocld=&freetext=%22SQL%20Server%20
2008%22&DisplayLang=en
You don’t have to have SQL Server 2008 to download the Books Online.
I’m sold on SQL Server 2008, but I know a database upgrade is a considerable task that many people aren’t eager to do,
particularly when a release is so new. However, if you’re already using SQL 2005, then you’ve really done the hard part—
you’ve gotten over the learning curve of that release. For those of you who are still using SQL 2000, you’ll have a lot of things
to catch up on, but, at this point, there’s no longer any good reason to upgrade to 2005. Just “leapfrog” to 2008. I hope you’ll
start exploring it. Build yourself a test environment (my friends all use virtual machines for this purpose), download the
evaluation edition and see whether or not you find some of the features as appealing as I do.