Network Bandwidth Implications of Oracle Data Guard
Introduction
Oracle Data Guard is Oracle's data protection and disaster recovery solution. One of the frequent
questions that customers ask the Data Guard team is how much network bandwidth is required by Data
Guard. A variation of the same question is: Can the network link between the production (primary) data
center and the disaster recovery (DR, or secondary) data center support a Data Guard configuration?
At a high level, the answer is simple enough. It depends on how busy the production database is. Let's
look into it in a bit more detail.
It's the Redo
What is the basis of Data Guard's operation? Well, Data Guard sends the redo data generated by the
primary database to one or more secondary, or standby databases. That's how Data Guard keeps the
standby databases transactionally consistent with the primary database. The more redo data the primary
database generates, the more redo data Data Guard needs to transmit to the standby database. In other
words, the faster the primary database generates redo, the faster Data Guard needs to send the redo to
the standby database, otherwise either the standby database may fall behind, or processing on the
primary database may slow down (the exact behavior depends on the Data Guard protection mode
chosen - more on it later). This means that the available network between the primary and standby
databases must be capable of supporting this redo generation rate, or more precisely - the peak redo
generation rate.
Since the amount of redo generated by a database is proportional to the transactional, or the write activity
in the database, this implies that for a very busy OLTP (on-line transaction processing) system (e.g. the
leading e-commerce websites), the network bandwidth required by Data Guard will be higher than that
required by a non-OLTP system, or a system which supports primarily read-intensive transactions (e.g.
systems for the technical support knowledge bank of a hi-tech company, or systems that allow you to view
your present/previous credit card statements, your bank account balances, etc.).
What are the typical redo generation rates of Oracle production databases out there? The following
graph, which summarizes the responses to the question "What is your peak redo generation rate?" from
approximately 100 Oracle customers interested in Data Guard and attending OracleWorld San Francisco
2003, provides some insights:
It shows that more than 70% of Oracle customers report a peak redo rate of less than 500 KB/sec.
Measuring the Peak Redo Rate
How does one measure the peak redo generation rate for a database? Use the Oracle Statspack utility for
an accurate measurement of the redo rate.
Based on your business you should have a good idea as to what your peak periods of normal business
activity are. For example, you may be running an online store which historically sees the peak activity for
4 hours every Monday between 10:00 am - 2:00 pm. Or, you may be running a merchandising database
which batch-loads a new catalog every Thursday for 2 hours between 1 am - 3 am. Note that we say
"normal" business activity - this means that in certain days of the year you may witness much heavier
business volume than usual, e.g. the 2-3 days before Mother's Day or Valentine's Day for an online florist
business. Just for those days, perhaps you may allocate higher bandwidth than usual, and you may not
consider those as "normal" business activity. However, if such periodic surges of traffic are regularly
expected as part of your business operations, you must consider them in your redo rate calculation.
During the peak duration of your business, run a Statspack snapshot at periodic intervals. For example,
you may run it three times during your peak hours, each time for a five-minute duration. The Statspack
snapshot report will include a "Redo size" line under the "Load Profile" section near the beginning of the
report. This line includes the "Per Second" and "Per Transaction" measurements for the redo size in bytes
during the snapshot interval. Make a note of the "Per Second" value. Take the highest "Redo size" "Per
Second" value of these three snapshots, and that is your peak redo generation rate. For example, this
highest "Per Second" value may be 394,253 bytes.
Note that if your primary database is a RAC database, you must run the Statspack snapshot on every
RAC instance. Then, for each Statspack snapshot, sum the "Redo Size Per Second" value of each
instance, to obtain the net peak redo generation rate for the primary database. Remember that for a RAC
primary database, each node generates its own redo and independently sends that redo to the standby
database - hence the reason to sum up the redo rates for each RAC node, to obtain the net peak redo
rate for the database.
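The measurement procedure above can be sketched in a few lines of Python. This is only an illustration, not part of Statspack: the function name, the instance names and all sample values other than 394,253 are hypothetical, and it assumes that the per-instance figures within one snapshot cover the same time interval.

```python
from typing import Dict, List


def peak_redo_rate(snapshots: List[Dict[str, float]]) -> float:
    """Return the peak redo rate in bytes per second.

    Each snapshot maps an instance name to the "Redo size" "Per Second"
    value read from that instance's Statspack report.
    """
    # Sum across instances within each snapshot, then keep the highest total.
    return max(sum(per_instance.values()) for per_instance in snapshots)


# Single-instance primary: three snapshots taken during peak hours.
# Only 394,253 comes from the article; the other values are made up.
single = [{"prod": 310_400.0}, {"prod": 394_253.0}, {"prod": 352_118.0}]
print(peak_redo_rate(single))  # 394253.0

# Two-node RAC primary (hypothetical values): per-snapshot totals are
# 350,000 and 390,000 bytes/sec, so the peak is the second snapshot.
rac = [
    {"rac1": 200_000.0, "rac2": 150_000.0},
    {"rac1": 180_000.0, "rac2": 210_000.0},
]
print(peak_redo_rate(rac))  # 390000.0
```

Summing before taking the maximum matters for RAC: the busiest moment for the database as a whole is not necessarily the busiest moment of any single node.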
Redo Generation Rate and the Required Network Bandwidth
The paper titled "Oracle9i Data Guard: Primary Site and Network Configuration Best Practices" available
at http://otn.oracle.com/deploy/availability/htdocs/maa.htm, is part of Oracle Maximum Availability
Architecture (MAA) series of white papers, and provides a useful framework to show the correlation
between the peak redo rate and the required bandwidth (ref. Appendix F: Network Throughput and Peak
Redo Rates). This article will not go into the details of the formula calculation since it is already explained
in the paper. The formula used in the paper (assuming a conservative TCP/IP network overhead of 30%)
is:
Required bandwidth = ((Redo rate bytes per sec. / 0.7) * 8) / 1,000,000 = bandwidth in Mbps
Thus, our example peak rate of 394,253 bytes/sec (about 385 KB/sec) would require an available network
bandwidth of at least ((394253 / 0.7) * 8) / 1,000,000 = 4.5 Mbps.
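For readers who want to plug in their own numbers, the formula can be written as a small Python function. This is a minimal sketch; the function and parameter names are our own, and the 30% overhead default simply mirrors the conservative assumption quoted above.

```python
def required_bandwidth_mbps(redo_bytes_per_sec: float,
                            overhead: float = 0.30) -> float:
    """Bandwidth in Mbps needed to carry a given redo rate.

    overhead is the fraction of the link assumed to be consumed by
    TCP/IP network overhead (0.30 gives the 0.7 divisor in the formula).
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in the range [0, 1)")
    usable_fraction = 1.0 - overhead
    # bytes/sec -> bits/sec (x 8) -> megabits/sec (/ 1,000,000)
    return (redo_bytes_per_sec / usable_fraction) * 8 / 1_000_000


# The article's example peak rate of 394,253 bytes per second.
print(round(required_bandwidth_mbps(394_253), 1))  # 4.5
```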
For this Data Guard configuration, a standard T1 line primary-standby connection providing up to 1.544
Mbps will not be adequate. However, a T3 connection (typically providing up to 44.736 Mbps) may be
more than adequate, provided of course this connection is not heavily shared by other applications that
may reduce the effective bandwidth for the primary-standby connection. This means that while the peak
redo generation rate is a good indication of your Data Guard-related network requirements, make sure
that while specifying your network requirements with your network service provider, you also consider
other applications and their Service Level Agreements (SLAs) that may be sharing this network.
Remember - the formula above indicates the network bandwidth that should be available to Data Guard; it
does not indicate what the entire network bandwidth should be between your primary and DR data
centers.
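The effect of sharing a link can be illustrated with a short Python check. The link capacities are the T1 and T3 figures quoted above; the function name and the share values are hypothetical, chosen only to show how quickly a shared T3 can become inadequate.

```python
# Nominal link capacities in Mbps, as quoted in this article.
LINK_CAPACITY_MBPS = {"T1": 1.544, "T3": 44.736}


def link_is_adequate(required_mbps: float, link_mbps: float,
                     share_for_data_guard: float = 1.0) -> bool:
    """True if the part of the link left for Data Guard covers the need.

    share_for_data_guard is the fraction of the link that other
    applications leave available (1.0 means a dedicated link).
    """
    return link_mbps * share_for_data_guard >= required_mbps


required = 4.5  # Mbps, the result of the example calculation above

print(link_is_adequate(required, LINK_CAPACITY_MBPS["T1"]))       # False
print(link_is_adequate(required, LINK_CAPACITY_MBPS["T3"]))       # True
# If other applications consume 90% of the T3, only about 4.47 Mbps remains.
print(link_is_adequate(required, LINK_CAPACITY_MBPS["T3"], 0.1))  # False
```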
If this network link may be shared with other critical apps, consider configuring a higher bandwidth
network e.g. dark fibre, OC1, or OC3, and/or using Quality of Service (QoS) to prioritize network traffic or
to allocate dedicated bandwidth to a particular class of traffic, to prevent bursty traffic adversely affecting
your latency-sensitive traffic (such as Data Guard redo traffic).
Data Guard Protection Modes and the Network
Data Guard can be configured in one of three protection modes - Maximum Protection, Maximum
Availability or Maximum Performance. These protection modes essentially differ in the following:
their recommended redo data transport settings,
the behavior of the primary when the last standby in the chosen protection mode is unavailable, and
their capabilities for zero data loss in the event of a disaster at the primary site.
For Maximum Protection and Maximum Availability, the redo data transport setting requires the LGWR
SYNC AFFIRM attributes in the log_archive_dest_n entry for the particular standby. For the
Maximum Performance mode, the redo transport is set to the LGWR ASYNC, or alternatively, ARCH
attributes.
Synchronous transport, as implied by LGWR SYNC AFFIRM attributes, means that primary database
transactions are not committed till they are also available on disk on the standby. This implies that a
possible impact on the production transactions is correlated to the latency of the network link between the
primary and the standby. Since the latency or round trip time for a network is usually correlated to the
length of the network, or the physical distance between the two end points (in this case the primary and
standby), Maximum Protection and Maximum Availability modes are not recommended for Data Guard
deployments over a Wide Area Network (WAN). Note that this recommendation is driven by the laws of
physics (speed of light limitation) - the greater the distance of a network, the longer it will take for data
packets to traverse the network, and hence the longer it will take for primary database transactions to
commit.
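To get a feel for the speed-of-light limitation, the following Python sketch estimates the lowest possible round trip time for a given distance. It is a rough lower bound only: it assumes that signals travel through optical fibre at roughly 200,000 km per second (about two thirds of the speed of light in a vacuum), that the cable follows the straight-line distance, and that switches and routers add no delay. Real networks will be slower. The 2,500-mile figure is a hypothetical value for a US coast-to-coast link.

```python
FIBER_KM_PER_SEC = 200_000.0  # assumption: roughly 2/3 of c in a vacuum
KM_PER_MILE = 1.609344


def min_round_trip_ms(distance_miles: float) -> float:
    """Lower bound on network round trip time in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    # The data travels to the standby and the acknowledgement travels back.
    return 2 * distance_km / FIBER_KM_PER_SEC * 1000


# 345 miles is the distance used in the MAA tests cited in this article.
print(round(min_round_trip_ms(345), 2))    # 5.55
# Hypothetical coast-to-coast distance.
print(round(min_round_trip_ms(2500), 2))   # 40.23
```

With synchronous transport, every commit on the primary waits for at least one such round trip, which is why the distance matters far more for Maximum Protection and Maximum Availability than for Maximum Performance.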
For WAN deployments of Data Guard, the Maximum Performance protection mode is recommended. All
three protection modes can however be used for Local Area Network (LAN) or Metropolitan Area Network
(MAN) deployments of Data Guard. As demonstrated in the previously mentioned MAA paper titled
"Oracle9i Data Guard: Primary Site and Network Configuration Best Practices", Maximum
Protection/Availability modes are viable for a Data Guard deployment of approximately 345 miles, with
minimal performance impact (no more than 3% in Oracle's internal tests) on the primary. A US
coast-to-coast (i.e. WAN) deployment of Data Guard using the Maximum Performance mode has almost no
performance impact (1% in tests) on the primary.
Network Bandwidth Issues During Standby Creation
If you are creating the standby database from a backup of your multi-terabyte production database, an
issue that you have to resolve is how to ship the initial backup to the standby site. Sending this initial
multi-terabyte backup to the standby site over the network may not be feasible. You may be better off
shipping the backup tape(s) to the standby site and subsequently using the network to copy incremental
backups to the standby site.
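A quick estimate shows why the network may not be feasible for the initial copy. The Python sketch below is illustrative only: the 2 TB backup size is hypothetical, terabytes are taken as decimal (10^12 bytes), and the 0.7 efficiency factor reuses the 30% network overhead assumption from the bandwidth formula earlier in this article.

```python
def transfer_time_days(size_terabytes: float, link_mbps: float,
                       efficiency: float = 0.7) -> float:
    """Days needed to copy a backup over a dedicated link.

    efficiency is the fraction of the nominal link speed that is
    available for payload after network overhead.
    """
    bits = size_terabytes * 1e12 * 8               # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400                        # seconds per day


# A hypothetical 2 TB backup over a dedicated T3 and a dedicated T1.
print(round(transfer_time_days(2, 44.736), 1))  # 5.9
print(round(transfer_time_days(2, 1.544), 1))   # 171.3
```

Even on a dedicated T3 the copy takes several days, during which the link has no capacity left for redo traffic, so shipping tapes is often the more practical choice.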
Data Guard provides an important optimization in this regard. While the backup tapes are in transit, the
standby database may be mounted and started, based on the standby control file and initialization file
sent to the standby site over the network. In such a situation, the standby database acts as an archive log
repository. For example, any archive logs generated at the primary server since the backup of the primary
database can be manually copied to the standby site over the network. Also, after redo shipping is
enabled on the primary, any new redo data generated on the primary can automatically be sent to the
standby server by Data Guard. This redo data is not applied to the standby database since it is not yet
fully restored with the backups, but at least the archive logs will be available at the standby site. This way,
Data Guard minimizes any risk of data loss in the event of a severe outage at the primary server while the
backup tapes are in transit, and enables faster synchronization of the standby database with the primary
since the required archive logs are already available locally at the standby site.
Once the backup tapes arrive at the standby site and the backups (full and incremental) are restored at
the standby database, the standby database and the apply process can be started. All accumulated redo
data at the standby site will now be automatically applied to the standby database. If necessary, Data
Guard will use the network to automatically send any new primary database archive logs, or any missing
archive logs, to the standby site and rapidly bring the standby up-to-date with the primary database.
What if I have a Slow Network?
If you have a slow network link between the production and DR data centers, seriously consider
upgrading the network. Remember, Disaster Recovery is not an area where you would want to cut
corners, especially if your business has strict availability requirements. In case there is a severe outage at
the production site and your business operations are down, the last thing that you want to do at that
critical moment is to figure out how much data you might have lost because redo data was not shipped to
the standby because of a slow network, or figure out how much the standby database is behind the
currently unavailable primary database.
Data Guard does provide you with some additional options in case you want to reduce the demands on
your network resources for a highly active production database. If you have configured multiple standbys,
consider the Cascaded Redo Log Destinations feature, with which you can have one standby database
sending redo data to one or more standby databases, instead of requiring the primary database to send
this redo to all standbys. This feature not only saves network resource consumption around the
production data center, but also saves valuable processing cycles for the production database.
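The saving at the production site can be illustrated with simple arithmetic in Python. This sketch makes a simplifying assumption that is not spelled out above: in the cascaded case the primary sends redo to exactly one standby, which forwards it to all the others. The function name and the standby count are hypothetical.

```python
def primary_outbound_mbps(per_standby_mbps: float, standbys: int,
                          cascaded: bool) -> float:
    """Redo bandwidth leaving the production data center, in Mbps."""
    if standbys < 1:
        raise ValueError("at least one standby is required")
    # Assumption: with cascading, only one standby is fed by the primary.
    direct_destinations = 1 if cascaded else standbys
    return per_standby_mbps * direct_destinations


# 4.5 Mbps per destination (the earlier example) and three standbys.
print(primary_outbound_mbps(4.5, 3, cascaded=False))  # 13.5
print(primary_outbound_mbps(4.5, 3, cascaded=True))   # 4.5
```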
Another option that you may evaluate is configuring the link with SSH port forwarding with compression.
For a high latency low bandwidth network, SSH port forwarding is recommended for Maximum
Performance mode. Oracle's internal tests in a high latency WAN showed that using SSH with
compression made a significant reduction in network traffic and reduction in redo data transfer time. Refer
to the "Oracle9i Data Guard: Primary Site and Network Configuration Best Practices" paper for further
details on the test results. Please also refer to the MetaLink Note 225633.1 "Implementing SSH port
forwarding with 9i Data Guard" for configuration guidelines.
For additional guidelines related to tuning the relevant parameters for Data Guard, Oracle Net Services
and your operating system, refer to the "Oracle9i Data Guard: Primary Site and Network Configuration
Best Practices" paper as well as the following MetaLink Notes:
MetaLink Note 241925.1 "Troubleshooting 9i Data Guard Network Issues"
MetaLink Note 260040.1 "Refining Remote Archival Over a Slow Network with the ARCH Process"
A question that we do commonly get for this slow network issue is whether there is any way Data Guard
can filter out selected transactions before sending the redo data to the standby sites. The answer is no.
Every bit of redo data that is generated on the primary database will be sent over to the standby site - no
filtering is possible. Make sure you understand the rationale here - Data Guard is a disaster recovery
mechanism, so the general goal should be to keep the standby databases transactionally consistent with
the primary, such that during a switchover or a failover, a chosen standby database may easily be
transitioned into a primary role. If you need to transform/filter your redo data before sending that over to
the standby site, consider an alternative solution such as Oracle Streams. Unlike Data Guard, Streams
also allows the replication of a subset of the tables on the source database to the target database, and
that could be another way to ensure that only the data that needs to be protected is transmitted across
the network, especially when the available network bandwidth is not enough to keep up with the redo
generation rate.
Note that after the redo data reaches the standby site, Data Guard SQL Apply (logical standby database)
offers some flexibility in that it allows you to skip applying that redo for certain tables. This is not possible
with Data Guard Redo Apply (physical standby database), which by definition is a block-for-block copy of
the primary database.
A follow-up question is whether one can do NOLOGGING operations on the primary database in a Data
Guard configuration, to reduce the load in the network. The answer is that one shouldn't do it. The redo
data is the basis of Data Guard's operations. Since nologging operations write directly to the data files
and bypass the redo logs, Data Guard will not be able to keep the standby database consistent with the
primary during nologging operations. In fact, to ensure this doesn't happen, Oracle9i introduced the
command ALTER DATABASE FORCE LOGGING; to make sure that all database write operations are
logged. It is always a good idea to run this command on your production database so that you are
protected against any application that may have NOLOGGING operations in-built in its code. Refer to the
MetaLink Note 216211.1 "Nologging In The E-Business Suite" for further details in this matter.
Conclusion
This article focused on the network bandwidth implications of a Data Guard configuration. The objective of
the article was to convey the most relevant issues in a concise manner and provide readers with helpful
pointers for further reading.
Network bandwidth management is not a one-off exercise. It needs careful planning, review and
understanding of SLAs for the supported applications, as well as SLAs promised by the network service
provider, and continuous monitoring of the network to ensure that the business operations goals and
availability requirements are being met. The good thing for administrators is that several bandwidth
management and monitoring tools are available in the market, that allow administrators to extract the
maximum value of their network connectivity investments, instead of buying extra bandwidth that
ultimately may not be necessary.
Data Guard is an excellent choice for data protection and disaster recovery not just because of its
comprehensive functionality, but also because of the way it is optimally architected to handle data
transmission issues over a network. It is based on standard TCP/IP protocols, which means organizations
can leverage existing resources, and not buy extra hardware, or incur extra training. The redo
transmission is optimal - even though a write transaction affects the redo log files, archive log files and
data files, Data Guard sends only the redo data to keep the standby databases synchronized with the
primary. This is in contrast to certain storage-level remote mirroring solutions which may send all of those
changes, requiring up to 3 times more network resource consumption. Data Guard also offers
administrators the flexibility to configure their desired redo transport mechanism based on their business
requirements. Finally, Data Guard comes with a rich set of configuration guidelines and best practice
blueprints that make it easy to implement and use.
References
1. Oracle Data Guard Overview
2. Oracle Data Guard Concepts and Administration Manual
3. Oracle9i Database Performance Tuning Guide and Reference Manual - Chap. 21: Using
Statspack
4. MetaLink Note 94224.1 - "FAQ - Statspack Complete Reference"
5. Oracle Maximum Availability Architecture
6. Oracle9i Data Guard: Primary Site and Network Configuration Best Practices
7. MetaLink Note 225633.1 - "Implementing SSH port forwarding with 9i Data Guard"
8. MetaLink Note 241925.1 - "Troubleshooting 9i Data Guard Network Issues"
9. MetaLink Note 260040.1 - "Refining Remote Archival Over a Slow Network with the ARCH
Process"
10. Oracle Streams Overview
11. MetaLink Note 216211.1 - "Nologging In The E-Business Suite"
