Abstract
Users are faced with many options and tradeoffs when choosing a backup strategy
for Microsoft SQL Server databases. This paper maps out those tradeoffs and examines
how Data Domain deduplication storage preserves data integrity, meets stringent
RTO/RPO objectives, and integrates easily into a multitude of active SQL or third-party
backup environments.
DEDUPLICATION STORAGE
Table of Contents
1. Introduction
1.1 Basic concepts
1.2 Additional concepts
2. Executive summary
3. SQL background
3.1 Recovery models
3.2 Recovery techniques
4. Terminology
4.1 Types of backups
4.2 Selected SQL backup definitions
5. Data Domain product background
5.1 Advantages of Data Domain in an SQL environment
5.2 Data transfer rates
6. Integration
6.1 Planning
6.2 Important Options
6.3 Third-Party Backup Applications
7. Microsoft recommendations
8. Summary
9. Appendix A Backup compression
9.1 Bottlenecks addressed by compression
9.2 Compression challenges
9.3 Pick one form of compression, but not both
10. Appendix B Index Fragmentation
10.1 Addressing the challenge
1. Introduction
Many database administrators prefer native SQL Server backups to third-party backup applications for the following reasons:
• There is no reliance on the backup administrative team to perform backups or play a role in database recovery.
• There is no need for a database administrator to become proficient in deploying, configuring, administering, or maintaining a third-party backup application.
Historically, native SQL backups have been criticized for a number of reasons:
• Native SQL backup facilities provide little to no automated media management capabilities. While backups to disk media
eliminated the challenge of manually managing tape cartridges, they also introduced the need for additional disk capacity, and disk was
significantly more expensive than removable tape media.
• In addition, many users must retain an off-site copy of database backups as part of a disaster recovery strategy. Native backup
facilities fell short of providing a viable solution for this requirement.
Deployed as database backup media, Data Domain deduplication storage addresses the historical pitfalls of performing native database
backups:
• Backups to disk are no longer cost prohibitive, thanks to the deduplication ratios that Data Domain systems achieve on backup data.
• Data Domain replication software enables users to create off-site backup copies that are easily retained for disaster recovery purposes.
• In addition, users may eliminate any need for third-party backup application SQL Server agents and their associated maintenance fees.
This paper provides information about the use of Data Domain deduplication storage as backup media for Microsoft SQL Server backups.
The target audience includes data protection architects, SQL Server database administrative staff, and backup administrators seeking
information about integrating Data Domain deduplication storage as a component in a comprehensive data protection strategy.
Figure 1: Native database backup tool. The native database backup tool
is easy to use and provides a feature set that addresses many business
requirements. Figure 1 depicts the native database backup tool being used to
perform a full database backup to disk.
The second backup methodology uses third-party backup application software that interfaces with Microsoft SQL Server to
perform SQL database backups based on the Virtual Backup Device
Interface (VDI). This solution is typically packaged as a database
agent specifically for Microsoft SQL Server and a particular backup
application. When VDI is used, the backup application allows
setting customized backup and recovery parameters similar to
those that can be employed when using native Microsoft SQL tools
and utilities.
Third-party backup software may also use available snapshot technologies designed to enhance functionality or otherwise add value
to backup and recovery processes (Figure 2). When the snapshot
type is based on Microsoft Volume Shadow Copy Service (VSS), the
backup application is the VSS requestor, the SQL Server is the VSS
writer, and backup is coordinated with a VSS provider. Advanced
backup and recovery features such as disk staging and instant
recovery may be available with these implementations depending
on the backup application and agent being used. Drawbacks to
this strategy may include a user interface foreign to the database
administrative staff and substantial third-party backup application
license fees.
2. Executive Summary
For those already well briefed on both Microsoft SQL Server and
Data Domain, Table 1 presents a summary of the suggested best
practices. Explanations and reasoning for these suggestions are
discussed later in this paper.
Table 1: Summary of suggested best practices (setting values were lost in extraction where none is shown)

Parameters Affecting Deduplication
    SQL Server 2008 native compression: NO_COMPRESSION / Disabled / None
Performance
    BLOCKSIZE
    BUFFERCOUNT
    MAXTRANSFERSIZE
    Stripes
IP Network
Mount Options
    UNC path
Miscellaneous Options
    Replication
3. SQL Background
A Microsoft SQL Server instance includes system and user databases. System databases are created at installation and include:
• The master database, which records all of the system-level
information for a Microsoft SQL Server. It contains records for
all login accounts and all system configuration settings. The
master database records the existence and location of all other
databases.
• The model database, which is used as a template that
contains the default settings for all databases created within the
Microsoft SQL Server instance.
• The msdb database, which is used for scheduling, alerts, and
jobs.
• The tempdb database, which serves as a global resource that
contains all temporary tables and temporary stored procedures.
It is re-created every time the Microsoft SQL Server instance is
started.
File backups:
• File Backup: This consists of a full backup of all the data in one
or more files or filegroups.
• Differential File Backup: This is a backup of one or more files
containing data extents changed since the prior full backup of
each file.
Copy-Only backups:
• Database backups usually change the database in some
way; for example, a full database backup resets the differential
base, and a transaction log backup truncates the log. Copy-Only
backups can be used in cases where a backup of a database is
required without affecting the normal backup and restore sequence.
4. Terminology
Entire databases, specific database files, file groups, and transaction
log backups are among the supported backup types with Microsoft
SQL Server. This section defines the terminology associated with a
given backup type.
COMPRESSION
Backup compression, available in SQL Server 2008 Enterprise and
later versions, can be enabled or disabled. The default product
installation does not compress backups. A server-level compression
setting can be applied to change this default behavior. The
COMPRESSION keyword in a BACKUP statement explicitly
enables backup compression, and the NO_COMPRESSION
keyword explicitly disables it.
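As a rough illustration of the keywords described above, the following Python sketch assembles a native full backup statement that always states the compression choice explicitly, so the result does not depend on the server-level default. The helper name, the database name, and the share path are hypothetical and are not taken from this paper.

```python
def backup_statement(database: str, target: str, compress: bool = False) -> str:
    """Build a T-SQL full backup statement with an explicit compression keyword.

    Hypothetical helper for illustration; it only builds text and does not
    connect to SQL Server.
    """
    keyword = "COMPRESSION" if compress else "NO_COMPRESSION"
    # Single quotes inside a T-SQL string literal are escaped by doubling them.
    safe_target = target.replace("'", "''")
    # A closing bracket inside a bracketed identifier is escaped by doubling it.
    safe_database = database.replace("]", "]]")
    return (
        f"BACKUP DATABASE [{safe_database}] "
        f"TO DISK = N'{safe_target}' WITH {keyword};"
    )


# Example with assumed names: an uncompressed backup to a CIFS share.
print(backup_statement("SalesDB", r"\\dd-system\backup\SalesDB_full.bak"))
```

Leaving `compress` at its default produces a statement that ends in `WITH NO_COMPRESSION;`, which matches the recommendation made elsewhere in this paper for backups written to Data Domain systems.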
Partial backups:
Partial backups provide flexibility for backing up databases that
contain some number of read-only filegroups.
• Partial Backup: This is a partial backup of all data in the primary
filegroup, each read/write filegroup, and any optionally specified
read-only files or filegroups.
• Differential Partial Backup: This backup contains only the
extents modified since the prior partial backup of the same set
of filegroups.
Figure 10: SQL Server 2008 native compression. The Compress backup
server-level property. This property is used for backup jobs that do not
explicitly enable or disable compression.
BLOCKSIZE
The BLOCKSIZE keyword can be used to alter the physical block size
used when writing to backup media. By default, the backup process
automatically selects a block size appropriate for the backup
device. Supported sizes are 512, 1024, 2048, 4096, 8192, 16384,
32768, and 65536 bytes. The default value used for disk backup is
512 bytes.
The default 512-byte size yields excellent performance with
Data Domain systems. Third-party backup applications may
substitute their own default value. This parameter is mentioned
here for reference only. Larger sizes may improve or degrade
performance, so users are encouraged to test which value
provides the best results in their environment.
BUFFERCOUNT
The BUFFERCOUNT keyword specifies the total number of I/O
buffers used for the backup process. Any positive integer value can
be specified.
A minimum of two buffers per stripe is recommended. This provides
one buffer that a reader thread can fill from the database while a
writer thread drains a second buffer to the storage device.
Buffers consume memory on the Microsoft
SQL Server based on the BUFFERCOUNT and MAXTRANSFERSIZE
keyword parameters.
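The relationship between buffers, stripes, and memory can be sketched with a little arithmetic. The Python example below assumes that the memory used by backup buffers is approximately BUFFERCOUNT multiplied by MAXTRANSFERSIZE, which is how Microsoft's BACKUP documentation describes it; the function names and the sample values are illustrative assumptions.

```python
def min_buffercount(stripes: int) -> int:
    """Smallest BUFFERCOUNT that gives every stripe two buffers."""
    if stripes < 1:
        raise ValueError("stripes must be at least 1")
    return 2 * stripes


def buffer_memory_bytes(buffercount: int, maxtransfersize: int) -> int:
    """Approximate memory used by backup buffers (assumed to be the product)."""
    if buffercount < 1 or maxtransfersize < 1:
        raise ValueError("both values must be positive")
    return buffercount * maxtransfersize


# Assumed example: four stripes with a 1 MB unit of transfer.
stripes = 4
buffers = min_buffercount(stripes)
memory = buffer_memory_bytes(buffers, 1024 * 1024)
print(f"{buffers} buffers use about {memory // (1024 * 1024)} MB")
```

Running the same calculation before raising either parameter gives a quick sense of how much additional memory a backup job may claim on the SQL Server.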
Figure 12: Full database backup with BUFFERCOUNT = 2. The same full
database backup represented in Figure 11 using the optional BUFFERCOUNT
keyword with a parameter value equal to 2. The use of two buffers
increased backup data transfer rate performance by approximately 9% when
compared to using a single buffer.
MAXTRANSFERSIZE
The MAXTRANSFERSIZE keyword specifies the unit of transfer in
bytes used between SQL Server and the backup media. Values can
range from 64 KB to 4 MB, in multiples of 64 KB.
Larger units of transfer are generally preferable to smaller ones.
However, a large number of buffers combined with a large unit of transfer consumes Microsoft SQL Server memory. Care should be taken to avoid
memory-related errors as the result of using these parameters.
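A simple range check can catch an invalid MAXTRANSFERSIZE before a backup job is scheduled. The sketch below uses the 64 KB to 4 MB range given above; the requirement that the value be a multiple of 64 KB comes from Microsoft's BACKUP documentation, and the function name is a hypothetical one chosen for this example.

```python
KB_64 = 64 * 1024        # 65536 bytes, the smallest unit of transfer
MB_4 = 4 * 1024 * 1024   # 4194304 bytes, the largest unit of transfer


def is_valid_maxtransfersize(value: int) -> bool:
    """Return True if value is a multiple of 64 KB between 64 KB and 4 MB."""
    return KB_64 <= value <= MB_4 and value % KB_64 == 0


for candidate in (65536, 1048576, 4194304, 100000, 8388608):
    status = "valid" if is_valid_maxtransfersize(candidate) else "invalid"
    print(f"MAXTRANSFERSIZE = {candidate}: {status}")
```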
STRIPES
While not a keyword within the context of Microsoft SQL Server,
the term stripes refers to the number of simultaneous backup
streams created for a given backup operation. In the case
of disk backups with SQL Server, multi-streamed backups are
performed by specifying multiple backup disk targets in the
BACKUP command.
The recommended use of SQL stripes is as a speed-matching
technique. Multiple backup streams from a given database can be
written simultaneously to a target Data Domain system to
achieve an aggregate data transfer rate that aligns with business
requirements.
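To make the idea of stripes concrete, the following Python sketch generates a BACKUP statement that lists one disk target per stripe, which is how a multi-streamed native backup is requested. The share path, the file naming scheme, and the helper name are assumptions made for illustration only.

```python
def striped_backup_statement(database: str, share: str, stripes: int) -> str:
    """Build a BACKUP statement with one DISK target per stripe.

    Hypothetical helper: file names follow an assumed
    <database>_stripeN.bak pattern under the given share.
    """
    if stripes < 1:
        raise ValueError("stripes must be at least 1")
    targets = [
        f"DISK = N'{share}\\{database}_stripe{number}.bak'"
        for number in range(1, stripes + 1)
    ]
    # Each additional DISK clause adds one simultaneous backup stream.
    return f"BACKUP DATABASE [{database}]\nTO " + ",\n   ".join(targets) + ";"


# Assumed example: three stripes written to a CIFS share by UNC path.
print(striped_backup_statement("SalesDB", r"\\dd-system\backup", 3))
```

All stripe files produced by one backup are needed together at restore time, so a naming scheme that keeps them recognizable as a set is worth the small effort.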
Figure 14: Database backup to a null device part 1. The results of the query
indicate that the theoretical maximum rate at which the SQL Server backup
function can extract data from this database using a single stripe is approximately 80 MB/s. Regardless of the data transfer rate at which backup media
can accept data, backing up this database as it currently stands will be speed
limited to 80 MB/s when using a single stripe.
Figure 15: Database backup to a null device part 2. Improved results as the
single stripe database backup to a null disk device now executes at more than
twice the initial data transfer rate.
6.1 Planning
Figure 16: Database backup to multiple null devices. The use of multiple null
disk devices. Similar to multi-striped backups, the use of multiple null disk
devices increases the number of readers used during the backup process.
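The null device measurements in Figures 14 through 16 suggest a simple planning estimate: divide the required aggregate rate by the rate measured for a single stripe. The Python sketch below assumes that throughput scales linearly with the number of stripes, which is an optimistic simplification rather than a result from this paper; the 200 MB/s target is an invented example, while the 80 MB/s figure comes from the Figure 14 caption.

```python
import math


def stripes_needed(target_mb_per_s: float, per_stripe_mb_per_s: float) -> int:
    """Estimate the stripe count for a target rate, assuming linear scaling."""
    if target_mb_per_s <= 0 or per_stripe_mb_per_s <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(target_mb_per_s / per_stripe_mb_per_s)


# Assumed target of 200 MB/s with the 80 MB/s single-stripe rate.
print(stripes_needed(200, 80))
```

Because real scaling is rarely linear, the estimate is best treated as a starting point to be confirmed with another null device test.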
6. Integration
Direct integration with Microsoft SQL Server, where the Data
Domain system is used as disk backup media, is accomplished
by using the Data Domain system as a CIFS (Common Internet
File System) share. As a general rule, the UNC path to the share
should be used instead of a mapped drive because: a) scheduled
backups may execute when no user is logged on to the server, and
b) when Sqlservr.exe runs as a service, it is not tied to any
login session and therefore cannot see drive letters mapped in one.
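A backup script can guard against mapped-drive targets with a small check. The Python sketch below uses the standard library's `PureWindowsPath` to tell a UNC path from a drive-letter path; the function name and the sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if path starts with a \\\\server\\share UNC prefix."""
    # For UNC paths, PureWindowsPath reports the \\server\share prefix as the
    # drive; for mapped drives it reports a letter such as 'Z:'.
    return PureWindowsPath(path).drive.startswith("\\\\")


for target in (r"\\dd-system\backup\SalesDB.bak", r"Z:\backup\SalesDB.bak"):
    kind = "UNC path" if is_unc_path(target) else "not a UNC path"
    print(f"{target}: {kind}")
```

`PureWindowsPath` only parses the text of the path, so the check runs on any platform and does not confirm that the share is reachable.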
Third-party backup applications used to protect Microsoft SQL
Server can also take advantage of Data Domain systems employed
as backup media. Data Domain systems are easily configured as
supported backup media types including VTL, CIFS share, NFS
mount, or OpenStorage disk pool (OpenStorage requires an
6.2.2 Compression
Data Domain recommends NOT using Microsoft SQL Server-based
compression in conjunction with backups written to Data Domain
systems. This topic is covered in greater detail in Appendix A.
6.2.3 Multiplexing
When the Data Domain system is integrated as a backup device
with a backup application that supports multiplexed backups, Data
Domain recommends disabling multiplexed backups. Multiplexing
limits the ability of the Data Domain system to deduplicate incoming data.
Multiplexing was historically used as a speed-matching solution in which multiple
slower data streams were combined into a single stream to keep
a faster tape drive busy. Backups to disk derive
no advantage from multiplexing. Whether deployed as a CIFS
share, NFS mount, VTL, or OpenStorage disk pool, Data Domain
systems accommodate writing multiple backup streams in parallel
without multiplexing.
6.2.4 Network
When Data Domain systems are deployed as a CIFS backup share,
Data Domain recommends interconnecting SQL Servers and Data
Domain systems using a dedicated backup area network.
7. Microsoft Recommendations
A comprehensive collection of resources that address Microsoft SQL
Server backup and restore is available online. This section includes
a brief sampling of technical articles that can be referenced as
required.
• SQL Server 2000 Backup and Restore
http://technet.microsoft.com/en-us/library/cc966495.aspx
• Backing Up and Restoring Databases in SQL Server, from
SQL Server 2005 Books Online
http://msdn.microsoft.com/en-us/library/ms187048(SQL.90).aspx
• Backing Up and Restoring Databases in SQL Server, from SQL
Server 2008 Books Online
http://msdn.microsoft.com/en-us/library/ms187048.aspx
• Optimizing Backup and Restore Performance in SQL Server
http://msdn.microsoft.com/en-us/library/ms190954(SQL.90).aspx
8. Summary
A Data Domain system makes an excellent target for Microsoft SQL
Server backups because it:
• Integrates easily and seamlessly into existing Microsoft SQL
Server environments
• Allows the database administrative team to retain a greater
number of full backup images online, thereby optimizing
recovery options while occupying a minimal footprint in the data
center
• Greatly reduces dependence on tape
Figure 21: CPU utilization with MS SQL Server 2008 native compression
Figure 22: CPU utilization with a third-party solution using level 5
compression
Figure 23: Data Domain system status. Sample output of the sysstat command on a Data Domain system captured during a database backup. When
compression and deduplication are performed on the Data Domain system,
CPU usage on the SQL Server platform is greatly reduced compared to
server-based compression. Also worth noting is that the Net in
data transfer rate is near the theoretical maximum that can be achieved with
two GbE network connections. The next logical step to eliminate this bottleneck
would be to add GbE interfaces or to use a single 10
GbE network connection.
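The theoretical maximum mentioned in the Figure 23 caption can be estimated from the raw line rate. The Python sketch below converts link speed to megabytes per second using decimal units and ignores Ethernet, IP, and CIFS protocol overhead, so real transfer rates will be somewhat lower; the function name is a hypothetical one.

```python
def link_max_mb_per_s(links: int, gbit_per_link: float = 1.0) -> float:
    """Raw aggregate line rate in MB/s (decimal), ignoring protocol overhead."""
    if links < 1 or gbit_per_link <= 0:
        raise ValueError("links and speed must be positive")
    # 1 Gb/s is 1000 Mb/s, and 8 bits make one byte.
    return links * gbit_per_link * 1000 / 8


print(link_max_mb_per_s(2))        # two GbE connections
print(link_max_mb_per_s(1, 10.0))  # one 10 GbE connection
```

Under these assumptions, two GbE connections top out at 250 MB/s and a single 10 GbE connection at 1250 MB/s, which is why moving to 10 GbE removes the network as the first bottleneck.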
Figure 24: DBCC showcontig command output. This graphic includes extent
scan fragmentation data indicating that index C_Customer_I1 does not
require defragmentation at this time.
Data Domain | 2421 Mission College Blvd., Santa Clara, CA 95054 | 866-WE-DDUPE, 408-980-4800
Copyright 2010 Data Domain LLC. All rights reserved.
Data Domain LLC believes information in this publication is accurate as of its publication date. This publication could include technical inaccuracies or typographical errors. The information is subject to change without notice. Changes are periodically added to the information herein; these
changes will be incorporated in new editions of the publication. Data Domain LLC may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time. Reproduction of this publication without prior written permission is forbidden.
The information in this publication is provided as is. Data Domain LLC makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Data Domain and Global Compression are trademarks of Data Domain LLC. All other brands, products, service names, trademarks, or registered service marks are used to identify the products or services of their respective owners. WP-MSSQL-0210
www.datadomain.com