
White Paper

Microsoft SQL Server

Best Practices with Data Domain Deduplication Storage

Abstract
Users are faced with many options and tradeoffs when choosing a backup strategy
for Microsoft SQL Server databases. This paper maps out those tradeoffs and examines
how Data Domain deduplication storage preserves data integrity, meets stringent
RTO/RPO objectives, and integrates easily into a multitude of active SQL or third-party
backup environments.


Table of Contents
1. Introduction
   1.1 Basic concepts
   1.2 Additional concepts
2. Executive summary
3. SQL background
   3.1 Recovery models
   3.2 Recovery techniques
4. Terminology
   4.1 Types of backups
   4.2 Selected SQL backup definitions
5. Data Domain product background
   5.1 Advantages of Data Domain in an SQL environment
   5.2 Data transfer rates
6. Integration
   6.1 Planning
   6.2 Important options
   6.3 Third-party backup applications
7. Microsoft recommendations
8. Summary
9. Appendix A: Backup compression
   9.1 Bottlenecks addressed by compression
   9.2 Compression challenges
   9.3 Pick one form of compression, but not both
10. Appendix B: Index fragmentation
   10.1 Addressing the challenge

MICROSOFT SQL SERVER: BEST PRACTICES WITH DATA DOMAIN

1. Introduction
Many database administrators prefer native SQL Server backups over third-party backup applications, for several reasons:
• There is no reliance on the backup administrative team to perform backups or play a role in database recovery.
• There is no need for a database administrator to become proficient in deploying, configuring, administering, or maintaining a third-party backup application.
Historically, native SQL backups have been the target of criticism for a number of reasons:
• Native SQL backup facilities provide little to no automated media management capability. While backups performed to disk media eliminated the challenge of manually managing tape cartridges, they also introduced the need for additional disk, and the cost of disk versus removable tape media was significant.
• In addition, many users require retaining an off-site copy of database backups as part of a disaster recovery strategy. Native backup facilities fell short of providing a viable solution for this requirement.
Deployed as database backup media, Data Domain deduplication storage addresses the historical pitfalls of performing native database backups:
• Backups to disk are no longer cost prohibitive, thanks to the cost-effective deduplication ratios of Data Domain systems.
• Data Domain replication software enables users to create off-site backup copies that are easily retained for disaster recovery purposes.
• In addition, users may eliminate any need for third-party backup application SQL Server agents and their associated maintenance fees.
This paper provides information about the use of Data Domain deduplication storage as backup media for Microsoft SQL Server backups. The target audience includes data protection architects, SQL Server database administrative staff, and backup administrators seeking information about integrating Data Domain deduplication storage as a component in a comprehensive data protection strategy.

1.1 Basic Concepts


Microsoft SQL backup methodology falls into one of two generic categories. The first consists of native SQL Server database backups. This technique creates SQL database backups using tools and utilities native to Microsoft SQL Server and does not rely on third-party backup application software (Figure 1). Benefits include the use of backup and recovery interfaces familiar to the database administrative staff. This capability is included with Microsoft SQL Server, and there are no additional third-party software license fees.

Figure 1: Native database backup tool. The native database backup tool
is easy to use and provides a feature set that addresses many business
requirements. Figure 1 depicts the native database backup tool being used to
perform a full database backup to disk.


The second backup methodology uses third-party backup application software that interfaces with Microsoft SQL Server to perform SQL database backups based on the Virtual Device Interface (VDI). This solution is typically packaged as a database agent specific to Microsoft SQL Server and a particular backup application. When VDI is used, the backup application allows setting customized backup and recovery parameters similar to those that can be employed when using native Microsoft SQL tools and utilities.
Third-party backup software may also use available snapshot technologies designed to enhance functionality or otherwise add value
to backup and recovery processes (Figure 2). When the snapshot
type is based on Microsoft Volume Shadow Copy Service (VSS), the
backup application is the VSS requestor, the SQL Server is the VSS
writer, and backup is coordinated with a VSS provider. Advanced
backup and recovery features such as disk staging and instant
recovery may be available with these implementations depending
on the backup application and agent being used. Drawbacks to
this strategy may include a user interface foreign to the database
administrative staff and substantial third-party backup application
license fees.

Figure 2: NetBackup MS SQL Client backup GUI. The NetBackup MS SQL Client graphical user interface is an example of a third-party backup application that uses VDI to interface with Microsoft SQL Server.

1.2 Additional concepts

Many customers utilizing the native Microsoft SQL Server database backup methodology augment the solution with third-party backup client agents that effectively protect the native backup data as a flat file. This two-phased methodology is effectively backing up a backup. Among the perceived benefits of the augmented solution is that it allows segregation of the SQL database administrative staff from the data protection staff while providing a means to retain database backups in conformance with sound business practices. (Figure 3)

Figure 3: Augmented backup solution. This methodology uses two backup solutions in conjunction to satisfy business objectives. SQL native database backups are performed to a Data Domain system and are subsequently backed up by a third-party backup application and written to the same Data Domain system.

2. Executive Summary

For those already well briefed on both Microsoft SQL Server and Data Domain, Table 1 presents a summary of the suggested best practices. Explanations and reasoning for these suggestions are discussed later in this paper.

Parameters Affecting Deduplication Performance
• SQL Server 2008 native compression: NO_COMPRESSION
• Third-party backup application SQL Server local compression: Disabled
• Third-party backup application multiplexing: None

Parameters Affecting Backup and Recovery Performance
• BLOCKSIZE: Default 512 bytes, or higher based on performance improvements
• BUFFERCOUNT: Minimum 2 buffers per stripe (requires available memory based on the MAXTRANSFERSIZE value)
• MAXTRANSFERSIZE: 4194304 (requires available memory based on the BUFFERCOUNT value)
• Stripes: Consider the use of multiple stripes to improve backup and restore data transfer rates
• Server disk subsystem: Database and log files should be placed on disk storage with performance attributes facilitating required transaction and backup performance metrics
• IP network: Dedicated backup network that meets or exceeds bandwidth requirements for the desired data transfer rates
• Data Domain system: Sized to meet or exceed ingest rate and backup retention requirements

Mount Options
• When performing native database backups: UNC path
• When using a third-party backup server: Dependent on backup application and server OS type

Miscellaneous Options
• Co-mingling native and third-party backup application database backups: Yes (this should have a negligible impact on deduplication ratios)
• Replication: Yes, use the Data Domain system to replicate database backups to the remote DR site

Table 1: Summary of recommendations

3. SQL Background
A Microsoft SQL Server instance includes system and user databases. System databases are created at installation and include:
• The master database, which records all of the system-level information for a Microsoft SQL Server instance. It contains records for all login accounts and all system configuration settings. The master database records the existence and location of all other databases.
• The model database, which is used as a template and contains the default settings for all databases created within the Microsoft SQL Server instance.
• The msdb database, which is used for scheduling, alerts, and jobs.
• The tempdb database, which serves as a global resource containing all temporary tables and temporary stored procedures. It is re-created every time the Microsoft SQL Server instance is started.

Figure 4: System and user databases. The master, model, msdb, and tempdb system databases as shown in the Microsoft SQL Server Management Studio interface.
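Beyond the Management Studio view shown in Figure 4, the system databases and each database's current recovery model can be listed with a simple catalog query. The sketch below uses the sys.databases catalog view, which is available in SQL Server 2005 and later:

```sql
-- List every database on the instance with its recovery model;
-- master, model, msdb, and tempdb are always present.
SELECT name, recovery_model_desc
FROM sys.databases
ORDER BY database_id;
```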

Data protection strategies for the system databases are dependent on the database being protected. For instance, transaction log backups are not supported for the master database. The master database cannot be recovered if a functional version of it does not already exist. Recovery procedures for the master database may include re-installing Microsoft SQL Server such that a backup of the pre-disaster master database can then be restored.
The model and msdb databases can contain customized data such as user-specific templates, scheduling information, and backup and restore history. Without a data protection strategy, these items will need to be manually reconstructed in the event of a disaster.
The tempdb database is empty when the SQL instance is shut
down, and does not require protection as it is re-created at startup.
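As a hedged sketch of the master recovery procedure described above: once a functional instance exists (reinstalled or rebuilt), it is started in single-user mode and the pre-disaster master backup is restored. The path below is hypothetical:

```sql
-- Run while the instance is started in single-user mode (-m startup option).
-- The instance shuts down automatically once the restore of master completes.
RESTORE DATABASE master
FROM DISK = N'\\datadomain\sqlbackups\master_full.bak'  -- hypothetical UNC path
WITH REPLACE;
```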


3.1 Recovery models


Microsoft SQL Server includes three recovery models: simple, bulk-logged, and full. The desired recovery model can be deployed based on requirements. Functionally, each recovery model differs with regard to how backup and recovery strategies are executed.
• The full recovery model includes log backups. This model typically has no exposure to data loss. Point-in-time recovery is possible, up to and including the last committed transaction.
• The bulk-logged recovery model requires log backups. This model permits high-performance bulk copy operations. Recovery to the end of any backup is possible; point-in-time recovery is not supported.
• The simple recovery model consists of performing full backups only. Logs are not backed up. In the event database recovery is required, the most recent full backup can be restored. Any changes that occurred subsequent to the last full backup must be redone. From a transactional perspective, the database can only be recovered to the point of the prior full backup.

Figure 5: Recovery model selection. Selection of the desired recovery model via the Database Properties dialog window.
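Besides the Database Properties dialog shown in Figure 5, the recovery model can also be set with T-SQL. A minimal sketch, using the Test_One database name that appears in later examples:

```sql
-- Select the full recovery model to enable log backups and
-- point-in-time recovery for this database.
ALTER DATABASE Test_One SET RECOVERY FULL;

-- Alternatives: SET RECOVERY BULK_LOGGED;  or  SET RECOVERY SIMPLE;
```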

3.2 Recovery techniques

Figure 6: Restore database dialog window. General database restore attributes. Note that by default the full backup and subsequent transaction log backups are all checked. Clicking the OK button would initiate recovery to the most recent possible point in time. Alternatively, recovery to a specific point in time is also possible.

Figure 7: Restore database options. Available database recovery options. Note that by default an existing database will not be overwritten. Also note that by default the recovery state is RESTORE WITH RECOVERY, which leaves the recovered database in an online and usable state after the restore process completes.

The technique used to restore a database will vary based on the recovery model being used as well as the backup types being performed. Figures 6-9 provide a brief look at restoring a database that was protected using the full recovery model with full and transaction log backups. A single full backup was performed, followed by five transaction log backups.

Figure 8: Recovery query. An example of a recovery transaction that restores the initial full backup followed by the first transaction log backup. The remaining transaction logs were not included in this query for brevity.
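A recovery query of the kind Figure 8 describes follows this pattern: every restore except the last specifies NORECOVERY so that subsequent log backups can still be applied, and the final restore specifies RECOVERY to bring the database online. This is a sketch with hypothetical file names:

```sql
-- Restore the full backup, leaving the database in the restoring state.
RESTORE DATABASE Test_One
FROM DISK = N'\\datadomain\sqlbackups\Test_One_full.bak'
WITH NORECOVERY;

-- Apply the first transaction log backup. The remaining logs would follow
-- the same pattern, with RECOVERY only on the final restore.
RESTORE LOG Test_One
FROM DISK = N'\\datadomain\sqlbackups\Test_One_log1.trn'
WITH RECOVERY;
```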


Third-party backup applications will each have a unique recovery interface for databases. Many automate and coordinate the recovery of full and transaction log backups similar to the way native Microsoft SQL Server tools and utilities do.

Figure 9: NetBackup MS SQL Client restore GUI. In this example a single full database backup and five transaction log backups are available for recovery.

4. Terminology
Entire databases, specific database files, filegroups, and transaction log backups are among the supported backup types with Microsoft SQL Server. This section defines the terminology associated with each backup type.

4.1 Types of backups

Database backups:
• Database Backup: This is a full backup of an entire database and represents the state of the database at the point when the backup is completed.
• Differential Database Backup: This is a backup of all the files within a database, and contains only the extents modified since the most recent full backup of each file. Restoring a database protected with full and differential backups to the most recent point in time includes recovering the most recent full and differential backup.

File backups:
• File Backup: This consists of a full backup of all the data in one or more files or filegroups.
• Differential File Backup: This is a backup of one or more files containing data extents changed since the prior full backup of each file.

Partial backups:
Partial backups provide flexibility for backing up databases that contain some number of read-only filegroups.
• Partial Backup: This is a backup of all data in the primary filegroup, each read/write filegroup, and any optionally specified read-only files or filegroups.
• Differential Partial Backup: This backup contains only the extents modified since the prior partial backup of the same set of filegroups.

Transaction log backups:
• Regular transaction log backups are required when using the full or bulk-logged recovery models. This backup contains all log records that have not been backed up previously.

Copy-only backups:
• Database backups usually change the database in some way, such as truncating a transaction log in the case of a full database backup. Copy-only backups can be used in cases where a backup of a database is required without changing the database.

4.2 Selected SQL backup definitions

A subset of Microsoft SQL backup definitions, as they relate to the use of Data Domain deduplication storage, is detailed in this section. Examples shown here use the Microsoft SQL Server Management Studio query interface. The use of these keywords and associated parameters can impact backup performance in terms of both data deduplication and data transfer rates.

COMPRESSION
Specific to SQL Server 2008 Enterprise and later versions, backup compression can be enabled or disabled. The default product installation does not compress backups. A server-level compression setting can be applied that alters default behavior. The use of the COMPRESSION keyword within a backup SQL transaction explicitly enables backup compression. The use of the NO_COMPRESSION keyword within a backup SQL transaction explicitly disables backup compression.
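The backup types defined in this section map onto the BACKUP statement as follows. This is a hedged sketch; the database and path names are hypothetical:

```sql
-- Full database backup.
BACKUP DATABASE Test_One
TO DISK = N'\\datadomain\sqlbackups\Test_One_full.bak';

-- Differential database backup: only extents changed since the last full.
BACKUP DATABASE Test_One
TO DISK = N'\\datadomain\sqlbackups\Test_One_diff.bak'
WITH DIFFERENTIAL;

-- Transaction log backup (full or bulk-logged recovery models only).
BACKUP LOG Test_One
TO DISK = N'\\datadomain\sqlbackups\Test_One_log.trn';

-- Copy-only backup: does not disturb the existing backup sequence.
BACKUP DATABASE Test_One
TO DISK = N'\\datadomain\sqlbackups\Test_One_copy.bak'
WITH COPY_ONLY;
```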

Figure 10: SQL Server 2008 native compression. The Compress backup server-level property. This property is used for backup jobs that do not explicitly enable or disable compression.

The use of native SQL Server compression is not recommended as a best practice when using Data Domain deduplication storage.

BLOCKSIZE
The BLOCKSIZE keyword can be used to alter the physical block size used when writing to backup media. By default the backup process will automatically select a block size appropriate for the backup device. Supported sizes are 512, 1K, 2K, 4K, 8K, 16K, 32K, and 64K bytes. The default value used for disk backup is 512 bytes.
The default 512-byte size yields excellent performance with Data Domain systems. Third-party backup applications may substitute their own default value. The fact that this parameter can be adjusted is included as reference. The use of larger sizes may improve or degrade performance. Users are encouraged to investigate further to determine what value may provide optimal results in their environment.

Figure 11: Full database backup with BUFFERCOUNT = 1. A full backup of the Test_One database using the optional BUFFERCOUNT keyword with a parameter value equal to 1. This query executed in 118.238 seconds with a data transfer rate equal to 49.748 MB/s.

BUFFERCOUNT
The BUFFERCOUNT keyword specifies the total number of I/O
buffers used for the backup process. Any positive integer value can
be specified.
The practice of using a minimum of 2 buffers per stripe is recommended. This practice simultaneously provides one buffer that
can be written into from the database (a reader thread) and one
buffer that can be read out of for data transfer to a storage device
(a writer thread). Buffers consume memory on the Microsoft
SQL Server based on the BUFFERCOUNT and MAXTRANSFERSIZE
keyword parameters.


Figure 12: Full database backup with BUFFERCOUNT = 2. The same full
database backup represented in figure 11 using the optional BUFFERCOUNT
keyword with a parameter value equal to 2. The use of two buffers
increased backup data transfer rate performance by approximately 9% when
compared to using a single buffer.

MAXTRANSFERSIZE
The MAXTRANSFERSIZE keyword specifies the unit of transfer in bytes used between SQL Server and the backup media. Values can range from 64 KB to 4 MB.
Larger units of transfer are generally preferred to smaller values. Excessive use of buffers combined with larger units of transfer consumes Microsoft SQL Server memory. Care should be taken to avoid memory-related errors as the result of using these parameters.
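Combining the keywords above into a single statement, a tuned backup reflecting the Table 1 recommendations might look like the following sketch. Values should be validated against available server memory; the path is hypothetical:

```sql
-- Approximate buffer memory = BUFFERCOUNT x MAXTRANSFERSIZE;
-- here 2 x 4 MB = 8 MB for a single-stripe backup.
BACKUP DATABASE Test_One
TO DISK = N'\\datadomain\sqlbackups\Test_One_full.bak'
WITH NO_COMPRESSION,            -- leave data reduction to the Data Domain system
     BLOCKSIZE = 512,           -- default disk block size
     BUFFERCOUNT = 2,           -- minimum 2 buffers per stripe
     MAXTRANSFERSIZE = 4194304; -- 4 MB unit of transfer
```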

STRIPES
While not a keyword within the context of Microsoft SQL Server, the term stripes correlates to the number of simultaneous backup streams to be created for a given backup operation. In the case of disk backups with SQL Server, multi-streamed backups are performed by specifying a number of backup disk targets with the BACKUP command.
The recommended use of SQL stripes is as a speed-matching technology. Multiple backup streams from a given database can be simultaneously written to a target Data Domain system in an effort to achieve an aggregate data transfer rate that aligns with business requirements.
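Striping is expressed by listing multiple disk targets in one BACKUP statement; SQL Server then writes the streams in parallel. A sketch with hypothetical paths, using two stripes for brevity:

```sql
-- Two-stripe backup: each TO DISK target receives one backup stream,
-- and all stripes are required together at restore time.
BACKUP DATABASE Test_One
TO DISK = N'\\datadomain\sqlbackups\Test_One_stripe1.bak',
   DISK = N'\\datadomain\sqlbackups\Test_One_stripe2.bak'
WITH BUFFERCOUNT = 4;  -- 2 buffers per stripe
```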

5. Data Domain product background
Data Domain deduplication storage systems minimize backup and recovery times, storage and network bandwidth, and risk of data loss. Data Domain offers a comprehensive range of products to meet the backup and archive storage needs of companies of all sizes as they seek to reduce costs and simplify data management. Data Domain systems also offer replication that is extremely easy to deploy. The primary advantage of Data Domain system replication is that the data is all deduplicated and compressed prior to being sent over the network.

5.1 Advantages of Data Domain in an SQL environment
Data Domain systems can be directly integrated into Microsoft SQL Server environments as disk backup media. In addition, Data Domain systems support all leading enterprise backup and archive applications for seamless integration into existing IT infrastructures.
The use of different backup methodologies with Microsoft SQL Server and Data Domain systems typically has a negligible effect on overall data deduplication ratios. This enables performing native database backups in conjunction with database backups controlled by a third-party backup application without affecting deduplication efficiency. This includes third-party backup applications that use an SQL agent, with or without VSS snapshots. Additionally, the use of different numbers of stripes or different BLOCKSIZE values also has a negligible impact on deduplication ratios.
Data Domain replication can be used to create off-site copies of SQL backups faster and more economically than legacy tape-based strategies. Data Domain replication makes advanced disaster recovery preparedness for MS SQL Server a reality.

5.2 Data transfer rates
Multiple business objectives are considered when determining required backup and recovery data transfer rates. Decision criteria include backup window duration, log growth, and recovery time. By definition, slow backups are those that fail to meet or exceed business objectives. Understanding the factors that can limit performance is critical to removing them from the environment.

Figure 13: Multi-striped database backup to disk. A database backup that uses 8 stripes in an effort to improve backup data transfer rate performance. Multiple stripes can be used to better match data transfer rate capabilities between source and destination media.

A reasonable place to start any backup performance investigation is to understand the theoretical maximum speed at which SQL Server can process a given database backup. Performing a database backup to a null disk device provides an estimate of that maximum achievable speed in a given environment. (Figure 14)

Figure 14: Database backup to a null device, part 1. The results of the query indicate that the theoretical maximum rate at which the SQL Server backup function can extract data from this database using a single stripe is approximately 80 MB/s. Regardless of the data transfer rate at which the backup media can accept data, backing up this database as it currently stands will be speed limited to 80 MB/s when using a single stripe.

Factors that can contribute to slow database backups to a null disk device include disk file fragmentation and the underlying disk storing the database. In the example noted in Figure 14, the .mdf and .ldf files associated with the Test_One database resided on a single-spindle 250 GB 7200 RPM SATA disk drive. Relocating the .mdf and .ldf files to a higher-performance striped external RAID array volume increased the rate at which SQL Server was able to process the database backup.

Figure 15: Database backup to a null device, part 2. Improved results as the single-stripe database backup to a null disk device now executes at more than twice the initial data transfer rate.
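The null-device measurement described above can be reproduced with the Windows NUL pseudo-device, a widely used diagnostic technique (not formally supported by Microsoft). Timing and throughput appear in the query's Messages output; the database name is taken from the earlier examples:

```sql
-- Backup to the NUL device: data is read from the database but discarded,
-- isolating the read side of the backup pipeline.
BACKUP DATABASE Test_One
TO DISK = N'NUL'
WITH COPY_ONLY;  -- avoid disturbing the real backup sequence
```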

Figure 16: Database backup to multiple null devices. The use of multiple null disk devices. Similar to multi-striped backups, the use of multiple null disk devices increases the number of readers used during the backup process.

Consider the use of non-default values for BUFFERCOUNT and MAXTRANSFERSIZE, in addition to the use of multiple backup stripes, when investigating database backup performance with one or more null disk devices. (Figure 16) Once an acceptable null device backup data transfer rate is achieved, additional steps can be taken to understand and remove other bottlenecks from the remainder of the backup process.

Figure 17: Nominal database backup performance improvement. This query shows a moderately tuned 8-stripe SQL database backup with an aggregate data transfer rate of approximately 172 MB/s, indicating that the network-attached backup devices are now limiting throughput.

6. Integration
Direct integration with Microsoft SQL Server, where the Data Domain system is used as disk backup media, is accomplished by using the Data Domain system as a CIFS (Common Internet Filesystem) share. As a general rule, the UNC path to the share should be used instead of a mapped drive because: a) scheduled backups may execute when no user is logged on to the server and b) when Sqlservr.exe is executed as a service, it has no relation to a login session.
Third-party backup applications used to protect Microsoft SQL Server can also take advantage of Data Domain systems employed as backup media. Data Domain systems are easily configured as supported backup media types including VTL, CIFS share, NFS mount, or OpenStorage disk pool (OpenStorage requires an OST-compliant backup application such as Veritas NetBackup from Symantec). Additionally, OpenStorage adds enhanced backup image replication capabilities known as optimized duplication. In this scenario, backup images are replicated from one Data Domain system to another under the direct control of NetBackup. NetBackup monitoring, reporting, and cataloging of duplicates can be used to architect a comprehensive disaster recovery plan.

6.1 Planning
Capacity and performance planning play a critical role in both successful deployment and ongoing production usage of a Data Domain system. A detailed capacity analysis should be performed by a knowledgeable Data Domain system engineer. The analysis considers database sizes, growth rates, change rates, and retention periods as input criteria. Performance analysis considers data points such as the required aggregate data transfer rate for backups, connection topology requirements to support the data transfer rate, and the Data Domain system required to meet or exceed the required data transfer rate.
Beyond capacity and performance planning are additional considerations for Data Domain system replication.
• What database backups should be replicated?
Replicating all database backups is certainly possible. However, many users will want to implement replication at a more granular level. Production database backups are usually excellent replication candidates, whereas development and test database backups are less critical. An analysis of network bandwidth and destination disk space requirements should be performed by a knowledgeable Data Domain system engineer.
• Will database backups be replicated to a disaster recovery site, or between multiple production sites?
Backups are typically replicated to serve as a second backup copy for recovery in the event of a disaster. When backups from a primary site are being replicated to a secondary site, planning is relatively straightforward. Users with multiple primary sites may decide to implement a bidirectional replication solution where database backups from either site are replicated to the alternate site. Proper planning should render an outline detailing which database backups are being replicated to each location.
• Will tape-based backup copies be required?
Some users replicate backup images to a central location for disaster recovery purposes while also using the solution as a vehicle that enables centralized tape creation. The third-party backup application used to create tape-based backup copies will dictate any additional considerations or restrictions that this solution involves. A knowledgeable Data Domain system engineer will be able to assist with this planning task.
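Following the UNC-path guidance in section 6, a native backup to a Data Domain CIFS share addresses the share directly rather than through a mapped drive. A sketch with hypothetical host and share names:

```sql
-- The SQL Server service account needs read/write permission
-- on the CIFS share exported by the Data Domain system.
BACKUP DATABASE Test_One
TO DISK = N'\\dd-system\backup\Test_One_full.bak';
```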

6.2 Important Options


6.2.1 Backup types
The goal of backups is to satisfy recovery time and recovery point objectives. Outlining a strategy of full, differential, and transaction log backups is beyond the scope of this paper. That stated, there are a few key points worth noting:
• Performing full backups frequently with Data Domain deduplication storage does not create a storage usage penalty, as redundant database segments do not consume additional disk space. While this may appear to enable performing full backups more frequently, the load full backups place on the Microsoft SQL Server and the connection topology to the Data Domain system should be taken into consideration.
• When split-mirror or snapshot backups are performed and controlled by a third-party backup application, the Data Domain system is easily integrated as a backup storage device. The features provided by these backup techniques (low-impact backups, instant recovery, etc.) do not preclude the use of Data Domain technology.

6.2.2 Compression
Data Domain recommends NOT using Microsoft SQL Server-based
compression in conjunction with backups written to Data Domain
systems. This topic is covered in greater detail in appendix A.

6.2.3 Multiplexing
When the Data Domain system is integrated as a backup device with a backup application that supports multiplexed backups, Data Domain recommends disabling multiplexed backups. Multiplexing limits the ability of the Data Domain system to deduplicate incoming data.

Multiplexing was historically used as a speed-matching technique in which multiple slower data streams were interleaved into a single stream to take advantage of a somewhat faster tape drive; backups to disk derive no advantage from it. Whether deployed as a CIFS share, NFS mount, VTL, or OpenStorage disk pool, Data Domain systems accommodate writing multiple backup streams in parallel without multiplexing.

6.2.4 Network
When Data Domain systems are deployed as a CIFS backup share, Data Domain recommends interconnecting SQL Servers and Data Domain systems using a dedicated backup area network.

When deployment is in conjunction with a backup application as a CIFS share, NFS mount, or OpenStorage disk pool, Data Domain similarly recommends interconnecting backup application media servers and Data Domain systems using a dedicated backup area network.

Whenever possible, the network used for backup and recovery communications should be segregated from other production networks. This best practice seeks to ensure that network bandwidth is available for backup and restore jobs to meet or exceed business objectives.

Network bandwidth requirements may dictate the need for a topology that supports data transfers in excess of 125 MB/s. All Data Domain systems support the use of multiple GbE network interfaces, and many support 10 GbE network interfaces. A knowledgeable Data Domain system engineer can assist with planning the deployment based on user requirements and available resources.

6.3 Third-Party Backup Applications
When Data Domain systems are integrated with third-party backup applications, Microsoft SQL Server backup parameters are handled the same way as in a native SQL Server backup implementation. The COMPRESSION, BLOCKSIZE, BUFFERCOUNT, and MAXTRANSFERSIZE keywords, as well as any striping, remain valid parameters, although some of these settings may be unavailable when using a third-party backup application. (Figures 18, 19)

Figure 18: SQL backup parameters, NetBackup 6.5.3. Third-party backup applications may allow the use of keyword parameters similar to native Microsoft SQL Server backup tools. Figure 18 shows the NetBackup MS SQL Client interface. Note that as of NetBackup version 6.5.3 there is no ability to override the MS SQL 2008 server-level compression setting.
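In a native deployment, these parameters come together in a single T-SQL BACKUP statement. The following sketch assumes SQL Server 2008 or later; the database name and the UNC paths to a Data Domain CIFS share are placeholders:

```sql
-- Hypothetical database and share names. NO_COMPRESSION leaves the
-- data stream uncompressed so the Data Domain system can compress
-- and deduplicate it effectively.
BACKUP DATABASE SalesDb
TO DISK = '\\dd-system\sqlbackup\SalesDb_stripe1.bak',
   DISK = '\\dd-system\sqlbackup\SalesDb_stripe2.bak'  -- two-way striping
WITH NO_COMPRESSION,
     BLOCKSIZE = 65536,
     BUFFERCOUNT = 32,           -- tune to available server memory
     MAXTRANSFERSIZE = 1048576,  -- 1 MB transfers
     STATS = 10;                 -- progress message every 10 percent

```

Third-party backup applications expose some, but not always all, of these knobs through their own interfaces, as Figures 18 and 19 illustrate.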

Figure 19: SQL backup parameters, Backup Exec 12.5. Details of the ability to set MS SQL Server 2008 compression on a per-job basis with Backup Exec version 12.5. Other parameters such as BUFFERCOUNT and MAXTRANSFERSIZE are absent from the Backup Job Properties dialog window.

Users of third-party backup applications seeking to exploit the full complement of available Microsoft SQL Server backup options should contact their software provider in the event additional information is required.

MICROSOFT SQL SERVER: BEST PRACTICES WITH DATA DOMAIN

11

7. Microsoft Recommendations
A comprehensive collection of resources addressing Microsoft SQL Server backup and restore is available online. This section includes a brief sampling of technical articles that can be referenced as required.
• SQL Server 2000 Backup and Restore
http://technet.microsoft.com/en-us/library/cc966495.aspx
• Backing Up and Restoring Databases in SQL Server, from SQL Server 2005 Books Online
http://msdn.microsoft.com/en-us/library/ms187048(SQL.90).aspx
• Backing Up and Restoring Databases in SQL Server, from SQL Server 2008 Books Online
http://msdn.microsoft.com/en-us/library/ms187048.aspx
• Optimizing Backup and Restore Performance in SQL Server
http://msdn.microsoft.com/en-us/library/ms190954(SQL.90).aspx

8. Summary
A Data Domain system makes an excellent target for Microsoft SQL Server backups because it:
• Integrates easily and seamlessly into existing Microsoft SQL Server environments
• Allows the database administrative team to retain a greater number of full backup images online, optimizing recovery options while occupying a minimal footprint in the data center
• Greatly reduces dependence on tape

9. Appendix A: Backup Compression
Performing compression on the Microsoft SQL Server when backups are executed can reduce the overall size of the backup. Smaller backups require fewer I/O operations to write to backup devices, consume less backup media, and may execute faster than uncompressed backups.

This appendix examines the tradeoffs associated with server-based compression: the use of Microsoft SQL Server CPU cycles versus implementing a backup infrastructure that reduces the impact on transactional processing performance.

9.1 Bottlenecks addressed by compression
Examining the backup data transfer path helps identify the bottlenecks that compression can circumvent:
• SQL Server to directly connected disk storage
• SQL Server to directly connected tape storage
• SQL Server network-connected to a backup application media server
Bandwidth constraints between the Microsoft SQL Server and the destination storage device are mitigated by server-based compression, since less data is transferred between the server and the storage device. Likewise, write-speed limitations of the storage device are mitigated by writing less data during a backup.

9.2 Compression challenges
While the benefits of compression are understood, there are potential drawbacks that should be noted:
• Compression consumes SQL Server resources
• Disk I/O related to reading database content is unchanged
CPU resources on the Microsoft SQL Server are used to perform compression at backup time. If online transactions are already impacted by the backup process, adding compression to the equation may induce a severe performance impact.
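The CPU cost can be observed while a backup is running by querying the dynamic management views. A minimal sketch, using standard SQL Server 2005/2008 DMVs:

```sql
-- CPU time and progress of any in-flight backup operations.
SELECT r.session_id,
       r.command,
       r.percent_complete,
       r.cpu_time            -- milliseconds of CPU consumed so far
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE 'BACKUP%';
```

Comparing cpu_time for the same backup run with and without the COMPRESSION option makes the tradeoff concrete.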

Figure 20: CPU utilization without server-based compression

Figure 21: CPU utilization with MS SQL Server 2008 native compression

Figure 22: CPU utilization with a third-party solution using level 5 compression

Figures 20 through 22 detail % Processor Time for the same database backup on a dual quad-core 2.66 GHz server platform with Microsoft SQL Server 2008. Data transfer rates and CPU usage vary widely in these examples. No SQL transactions or other activity beyond the single backup were executing at the time these metrics were captured. Note that all three backups used the same Data Domain system as a disk backup device. Also note that the performance monitor sampling rate for Figure 22 was decreased to accommodate the longer-running backup job.

With server-based compression, CPU usage is elevated to the point where non-backup transactions may be elongated. Backing up multiple databases simultaneously may not be practical with server-based compression.

• Compression and deduplication
Native Microsoft SQL Server 2008 compression, or compression provided by a third-party backup application, occurs on the Microsoft SQL Server platform. Data Domain deduplication technology is different: Microsoft SQL backups are compressed and deduplicated on the Data Domain system.

Figure 23: Data Domain system status. Sample output of the sysstat command on a Data Domain system captured during a database backup. When compression and deduplication are performed on the Data Domain system, CPU usage on the SQL Server platform is greatly reduced compared to server-based compression. Also worth noting is that the "Net in" data transfer rate is near the theoretical maximum achievable with 2 GbE network connections. The next logical step to eliminate this bottleneck would be to use additional GbE interfaces or a single 10 GbE network connection.

9.3 Pick one form of compression, but not both
The recommended best practice is to architect a solution that compresses database backup data once. There are multiple reasons for this. First, compressing data that has already been compressed usually yields a larger result than compressing the data once. Second, multiple compression operations reduce the effectiveness of Data Domain deduplication and therefore the efficient use of disk.

In short, the Data Domain system is designed specifically to optimize compression and deduplication. To get the full value from the Data Domain system, letting it perform the compression will always return the best results.
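In practice this means ensuring that neither SQL Server nor the backup application compresses the stream. A sketch for SQL Server 2008, where the database name and share path are placeholders:

```sql
-- Inspect the instance-wide default; 1 means backups are
-- compressed unless a job overrides it.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'backup compression default';

-- Override it for a single backup so the data reaches the
-- Data Domain system uncompressed.
BACKUP DATABASE SalesDb
TO DISK = '\\dd-system\sqlbackup\SalesDb_full.bak'
WITH NO_COMPRESSION;
```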


10. Appendix B: Index Fragmentation
Index fragmentation affects the I/O performance of queries whose data pages do not reside in the Microsoft SQL Server data cache. A variety of techniques are commonly used to reduce index fragmentation, including but not limited to DBCC INDEXDEFRAG, DBCC DBREINDEX, and CREATE INDEX with DROP_EXISTING.

While these techniques are effective in reducing index fragmentation, they can also have a negative impact on deduplication. Database administrative teams that routinely defragment all indexes at some predetermined frequency may notice reduced deduplication rates on their Data Domain systems. The end result is reduced storage efficiency.

Index defragmentation reorganizes the pages within a database such that Data Domain deduplication sees the backup data stream as new, unique data. In addition to the inefficient use of backup device storage space, this can also impact the ability to replicate database backups using Data Domain replication: a greater quantity of unique data blocks equates to replicating more data over what may be a bandwidth-limited WAN.

Database administrative teams may therefore find themselves in a situation where index fragmentation impacts query performance, while frequent index defragmentation impacts the backup storage device in terms of deduplication and replication rates.
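For reference, the defragmentation commands named above are invoked along these lines; the table name is illustrative and the index name is taken from Figure 24:

```sql
-- Report fragmentation for a single index (SQL Server 2000-era command).
DBCC SHOWCONTIG ('dbo.Customer', 'C_Customer_I1');

-- Defragment the index leaf level online; 0 means the current database.
DBCC INDEXDEFRAG (0, 'dbo.Customer', 'C_Customer_I1');

-- Rebuild all indexes on the table.
DBCC DBREINDEX ('dbo.Customer');
```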

10.1 Addressing the challenge
Data Domain recommends addressing these challenges with a balanced approach. For instance, instead of defragmenting all indexes on a schedule, consider defragmentation based on thresholds. Additionally, Data Domain recommends the use of index keys that are less prone to fragmentation in the first place.

• Is index fragmentation the only issue impacting transaction performance?
I/O subsystem performance, memory usage, and CPU utilization can all have a negative impact on query performance. These issues should be diagnosed and resolved rather than masked by frequent automatic index defragmentation.
File fragmentation can also impact performance. Many small databases sharing the same logical disk volume, combined with use of the autogrowth property, can cause logically sequential database files to allocate non-sequential physical storage on disk. Ideally, administrators should set the size of database files at deployment to accommodate potential future growth. While it may be impossible to anticipate the size of a given database three years into the future, doing so helps reduce the possibility that file fragmentation will impact query performance. If automatically growing database files is a requirement, consider growing in large chunks rather than small ones. It may be impractical to locate each database on a unique logical volume, but consider doing so for databases that are expected to grow considerably over time. Finally, disk file fragmentation can be reduced with Windows file system defragmentation utilities such as the Windows Disk Defragmenter.

• Do all indexes need to be defragmented, or just a subset?
Data Domain recommends the use of index defragmentation tools based on thresholds and limits, rather than automatically defragmenting every index on every table whether it is required or not. The suggestion is to understand which indexes, at which fragmentation levels, actually impact performance. These indexes should be monitored against a specific fragmentation threshold and defragmented only when necessary. Selective index defragmentation has less impact on production and helps preserve the ability to efficiently deduplicate database backups.

Figure 24: DBCC SHOWCONTIG command output. This graphic includes extent scan fragmentation data indicating that index C_Customer_I1 does not require defragmentation at this time.

• Indexes and keys
Structuring indexes and keys so as to minimize fragmentation may not be realistic in all cases, but it should be considered, as it potentially reduces the need to defragment indexes frequently. Index and key inserts that occur at the end of the table and index are likely to reduce fragmentation. Deletes that occur in contiguous chunks also assist in reducing fragmentation.
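The threshold-driven approach recommended above can be scripted against the sys.dm_db_index_physical_stats DMV (SQL Server 2005 and later). The 30 percent cutoff below is the commonly cited general guideline, not a Data Domain measurement, and the table/index names are placeholders:

```sql
-- List indexes in the current database whose fragmentation
-- exceeds a threshold, so only those are acted upon.
SELECT OBJECT_NAME(s.object_id)        AS table_name,
       i.name                          AS index_name,
       s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.index_id > 0                   -- skip heaps
  AND s.avg_fragmentation_in_percent > 30
ORDER BY s.avg_fragmentation_in_percent DESC;

-- For a flagged index, reorganize (lighter) or rebuild (heavier):
ALTER INDEX C_Customer_I1 ON dbo.Customer REORGANIZE;
-- ALTER INDEX C_Customer_I1 ON dbo.Customer REBUILD;
```

Acting only on flagged indexes limits the amount of backup data that deduplication sees as new.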

Data Domain | 2421 Mission College Blvd., Santa Clara, CA 95054 | 866-WE-DDUPE, 408-980-4800
Copyright 2010 Data Domain LLC. All rights reserved.
Data Domain LLC believes the information in this publication is accurate as of its publication date. This publication could include technical inaccuracies or typographical errors. The information is subject to change without notice. Changes are periodically added to the information herein; these changes will be incorporated in new editions of the publication. Data Domain LLC may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time. Reproduction of this publication without prior written permission is forbidden.
The information in this publication is provided "as is." Data Domain LLC makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Data Domain and Global Compression are trademarks of Data Domain LLC. All other brands, products, service names, trademarks, or registered service marks are used to identify the products or services of their respective owners. WP-MSSQL-0210

DEDUPLICATION STORAGE

www.datadomain.com
