Introduction
Bronze Summary
Gold Summary
Oracle GoldenGate
Edition-Based Redefinition
Platinum Summary
Oracle GoldenGate
Conclusion
Database cloud transformation drives cost savings by dramatically improving system utilization and
reducing management overhead. DBaaS drives cost savings and increased agility through the
standardization of IT infrastructure and processes. The Cloud enhances these benefits by enabling a
more efficient utility model for computing.
All of the above initiatives, however, also incur business risk by amplifying the impact of downtime and
data loss. The failure of a standalone environment used by a single developer or small work group is
usually of limited impact. The failure of a critical application running in a traditional standalone
environment is immediately felt by the business, but other applications can continue to run unaffected.
In contrast, an outage of a consolidated environment supporting an organization’s entire development
staff, or multiple applications used by numerous departments, has a crippling effect on the business.
Equally crippling would be an interruption in service at a cloud provider where such applications are
running.
The Oracle Maximum Availability Architecture (Oracle MAA) and the MAA reference architectures
provide the requisite level of standardization for all databases and Database-as-a-Service (DBaaS)
where higher stability, lower downtime and better data protection are of interest. MAA reference
architectures address the complete range of availability and data protection required by enterprises of
all sizes and lines of business. All reference architectures are based upon a common platform that can
be deployed on-premises or in the cloud. This approach makes a move to the cloud with Oracle MAA
simpler and less risky.
This paper describes Oracle MAA reference architectures and the service level requirements that they
address using the Oracle Cloud Infrastructure (OCI) Services. It also discusses some
performance and availability results leveraging the existing OCI resources. The paper is most
appropriate for a technical audience, that is, architects, directors of IT, and database administrators
responsible for designing and implementing DBaaS and moving implementations to the cloud.
• Bare Metal Single Instance database systems with restart capabilities and redundant local storage
• Virtual Machine single instance database systems with restart capabilities and triple mirrored block storage
• Exadata systems (with various shapes, including quarter, half, and full racks) – best database platform
• Regions, Availability Domains (ADs), and Fault Domains (FDs) to provide outage isolation
o Secure, high-bandwidth, low-latency networking within and across ADs with Virtual Cloud Network (VCN)
Peering with public and fully private subnets
Please refer to the OCI documentation for more information and the latest developments within OCI infrastructure.
All the Availability Domains in a region are connected to each other by a low latency, high bandwidth network, which
makes it possible to provide highly available connectivity to the Internet and to customer premises, and to build
replicated systems in multiple Availability Domains for both high availability and disaster recovery. Availability
Domains have Fault Domains. A Fault Domain is a grouping of hardware and infrastructure within an Availability Domain.
Oracle recommends using Remote VCN Peering, which connects two VCNs in different regions, when
configuring Oracle Data Guard across regions. The peering allows the VCNs' resources to communicate using
private IP addresses without routing the traffic over the internet or through your on-premises network. Without
peering, a given VCN requires an internet gateway and additional public IP addresses. Refer to the OCI
documentation for the latest information on regions that support Remote VCN Peering.
Each architecture uses an optimal set of Oracle HA capabilities to reliably achieve a given service level at the lowest
cost and complexity. They address all types of unplanned outages, including data corruption, component failure,
and system and site outages, as well as planned downtime due to maintenance, migrations, or other purposes. MAA
also provides reference architectures for applications using Oracle Sharding and can provide unlimited scalability
and unsurpassed availability within OCI if your application is designed and customized for sharded architectures.
The MAA Oracle Sharding architecture is described in a different paper.
MAA reference architectures are based on a common infrastructure optimized for the Oracle Database that enables
customers to dial in the level of HA appropriate for different service level requirements. This makes it simple to move
a database from one HA-tier to the next, should business requirements change, or from one hardware platform to
another, or from on-premises to the Oracle Cloud.
Bronze Reference Architecture is appropriate for databases where a simple restart of the database instance,
node, or VM and restore from backup is ‘HA and DR enough’. Bronze uses HA capabilities such as server and
Oracle Clusterware monitoring and restart capabilities included with OCI and Oracle Standard and Enterprise
Edition. Bronze is a single instance Oracle 11g database or single instance Oracle 12c or higher Multitenant
database for additional consolidation, simplicity, and pluggable database agility. With Multitenant and pluggable
databases, customers can relocate a PDB with PDB Relocate or refresh a PDB with PDB Hot Cloning. Oracle
Multitenant is an option for database consolidation (multiple pluggable databases in a single container to reduce
operational expenses by managing many databases as one, and to reduce capital costs by increasing consolidation
density). Bronze relies upon Oracle-optimized backups to OCI object storage using Oracle Recovery Manager
(RMAN) to provide data protection within the same region. Backups are automatically replicated to another AD for
additional isolation and protection.
Silver Reference Architecture is designed for databases that can’t afford to wait for a cold restart or a restore from
backup should there be an unrecoverable database server outage. Silver builds on the same functionality as the
Bronze architecture and adds capabilities that provide a choice of two different patterns for enhancing availability.
1. The primary pattern is to use Oracle RAC. Oracle RAC is an active-active clustering technology for minimal or
zero downtime in the event of database instance or server failure, and zero downtime for the most common
software updates (operating system, periodic DB/GI software updates). Silver implements best practices to
ensure application service failover. Just as with Bronze, RMAN provides database-optimized backups to protect
data and restore availability should there be a complete database or cluster outage. A two-node Oracle RAC
configuration is available in VM database systems; Oracle’s premier Exadata Database Machine provides
quarter, half, and full Exadata rack options. If Fault Domains exist in a given AD, then the Oracle RAC compute
nodes are placed in different FDs for additional fault isolation for two-node RAC VMs. Exadata remains Oracle’s
best MAA database platform with Oracle RAC and its additional HA, data protection, HA quality-of-service, and
management benefits not found in OCI RAC VM. For more information, refer to:
http://www.oracle.com/technetwork/database/features/availability/exadata-maa-best-practices-155385.html
Gold Reference Architecture is well suited for service level requirements that cannot tolerate downtime from
database, cluster, data corruption, or site failures, or from major database upgrades. The Gold reference architecture
builds upon the Silver reference architecture Oracle RAC pattern by adding a standby database with Oracle Active
Data Guard across availability domains, or across regions if regional protection is required for DR. The primary and
standby database systems should be configured symmetrically to ensure that performance service levels are similar
after Data Guard role transitions. Some customers may start with fewer licensed cores on the standby during
recovery and burst after role transition, but the trade-off is a delay in meeting performance SLAs. Data Guard
Fast-Start failover must be configured to maintain the lowest Recovery Time Objective (RTO). You can configure Data Guard
Fast-Start failover with zero data loss across ADs or across regions by using SYNC or Far SYNC transport.
Alternatively, data loss is minimal with ASYNC transport and Data Guard Max Performance protection mode.
Oracle Active Data Guard across ADs or across regions exists today. You can configure Oracle RAC primary and
standby using Oracle RAC VMs or Oracle Exadata Database Machines. Local and Remote VCN (Virtual Cloud
Network) Peering provides a secure, high bandwidth network across Availability Domains and regions. Refer to the
OCI documentation for the latest information about ADs, Fault Domains, and Regions, and their associated VCN
peering support. If the Remote VCN peering option is not available, then the public Internet backbone can be used
together with Oracle Net and TDE encryption; however, all regions should have this capability now or very soon.
Platinum Reference Architecture builds on the existing Gold architecture with Oracle GoldenGate for zero
downtime upgrades and migrations, and Edition-Based Redefinition for zero downtime application upgrades. The
Platinum reference architecture delivers substantial value for the most critical applications where downtime is not an
option. Platinum requires the same on-premises features and products as Gold, plus Edition-Based Redefinition and
Oracle GoldenGate.
The following sections describe all reference architectures specific to Oracle Cloud Infrastructure.
Bronze is based on a Single Instance Oracle Database with database server and database instance restart
capabilities. For OCI, it can be a bare metal physical server or single instance VM. Each database installation has
Oracle Clusterware installed. If the server or VM fails, a restart attempt is automatic. Once the database server is
restarted, Oracle Clusterware restarts the Oracle instance, listener and associated service. When a machine
becomes unusable or the database unrecoverable, the recovery time objective (RTO) is a function of how quickly a
replacement system can be provisioned and a backup is restored. In a worst-case scenario of a complete AD
outage there will be additional time required to perform these tasks to restore in a remote location.
Oracle Recovery Manager (RMAN) is used to perform regular backups of the Oracle Database to OCI object
storage. The database backups are automatically replicated across ADs. The potential for data loss, also referred to
as the recovery point objective (RPO), if there is an unrecoverable outage, is the data generated after the last available
backup and the contiguous set of archives that can be applied to it. Daily backups and frequent archive backups to Cloud
object storage can reduce RPO. Database backups to the Oracle Cloud can be leveraged should a disaster strike
your existing availability domain.
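As a rough illustration of the Bronze RTO and RPO reasoning above, the following Python sketch models RTO as the time to provision a replacement system plus the time to restore and recover, and RPO as the gap between the failure and the last archive that was backed up. The function names and the sample times are hypothetical and are not part of any Oracle tooling.

```python
from datetime import datetime, timedelta

def bronze_rpo(failure_time: datetime, last_archive_backup: datetime) -> timedelta:
    """Data-loss window: changes made after the last backed-up archive are lost."""
    if last_archive_backup > failure_time:
        raise ValueError("the last backup cannot be newer than the failure")
    return failure_time - last_archive_backup

def bronze_rto(provision: timedelta, restore: timedelta, recover: timedelta) -> timedelta:
    """Downtime: provision a replacement, restore the backup, then apply archives."""
    return provision + restore + recover

# Sample values (assumptions for illustration only).
failure = datetime(2024, 1, 10, 14, 45)
last_archive = datetime(2024, 1, 10, 14, 0)
rpo = bronze_rpo(failure, last_archive)
rto = bronze_rto(timedelta(hours=1), timedelta(hours=3), timedelta(minutes=30))
print(rpo, rto)
```

More frequent archive backups shrink the first number; faster provisioning and restore shrink the second.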
Bronze uses the following capabilities included with the Oracle Database Enterprise Edition:
» Oracle Clusterware automatically restarts the database, the listener, and other Oracle components after a
hardware or software failure, or whenever a database host computer restarts. Oracle Clusterware is pre-
installed in all OCI database instances. MAA recommends using Oracle Clusterware managed services for
all applications that connect to the database using these services.
» Oracle corruption protection checks for physical corruption and logical intra-block corruptions. In-memory
corruptions are detected and prevented from being written to disk, and in many cases can be repaired
automatically. For details see Preventing, Detecting, and Repairing Block Corruption for the Oracle
Database. For OCI, default database configurations have DB_BLOCK_CHECKSUM=FULL enabled. MAA
recommends setting DB_BLOCK_CHECKING to MEDIUM or FULL if the performance impact is minimal.
» Automatic Storage Management (ASM) is an Oracle-integrated file system and volume manager that
includes local mirroring to protect against disk failure. MAA recommends that all Bare Metal deployments
The above practices become more relevant with higher service level MAA reference architectures,
especially any architecture that contains Oracle RAC or Data Guard Fast-Start Failover with zero data loss.
Bronze Summary
Table 1 summarizes RTO and RPO service level requirements for the Bronze reference architecture.
TABLE 1: BRONZE RECOVERY TIME (RTO) AND POTENTIAL DATA LOSS (RPO)
Event | Downtime (RTO) | Potential Data Loss (RPO)
Data corruption, unrecoverable instance, server, database, or site failure | Hours to a day | Since last backup
Application upgrades that modify back-end database objects | Hours to a day | Zero
The MAA recommended Silver pattern uses Oracle RAC to enable automatic failover to a second active Oracle
instance for HA, and can provide zero downtime for the most common set of software updates. Oracle RAC
is available on OCI with Oracle RAC VMs and with Exadata.
The alternative pattern uses Data Guard database replication with automatic failover to a completely synchronized
copy of the production database in a different availability domain for HA. Similar to Bronze, backups will be sent to
the local OCI object storage and replicated to another AD object storage automatically.
Silver uses the following capabilities in addition to Bronze, included with Oracle Database Enterprise Edition:
• Oracle RAC enables an Oracle database to run across a cluster of servers, providing fault tolerance,
performance, and scalability with no application changes necessary. When there’s an instance or node
failure, database downtime is essentially zero. Furthermore, Oracle Clusterware automatically restarts
failed Oracle RAC instances and managed resources like Oracle listeners. For OCI, Oracle RAC is
available as a 2-node Oracle RAC VM or with Exadata.
• With Oracle RAC, the customer has the ability to manage planned maintenance without user interruption
and reduce service brownout for instance and node failures. Refer to Application Checklist for Continuous
Service for more information.
TABLE 2: SILVER RAC RECOVERY TIME (RTO) AND POTENTIAL DATA LOSS (RPO)
Event | Downtime (RTO) | Potential Data Loss (RPO)
Data corruptions, unrecoverable database, Availability Domain or Regional failure | Hours to a day | Since last backup
Fault Domain failure (RAC nodes can be configured on separate fault domains within an AD) | Seconds | Zero
Application upgrades that modify back-end database objects | Hours to a day | Zero
Silver Requirements (Oracle RAC): Cloud deployments require a minimum of Oracle Enterprise DBaaS (PaaS) and
Oracle Cloud Infrastructure Object Storage. Similar to Bronze, if Oracle Multitenant is used for database
consolidation, then on-premises deployment also requires a license for Oracle Multitenant and cloud deployment
requires a minimum of Oracle High Performance DBaaS (PaaS).
An alternative Silver MAA pattern uses Data Guard Fast-Start failover to maintain a local but separate synchronized
copy of the production database for HA across availability domains, or across Fault Domains when only one AD exists.
TABLE 3: ALTERNATIVE SILVER WITH ADG FSFO RECOVERY TIME (RTO) AND POTENTIAL DATA LOSS (RPO)
Event | Downtime (RTO) | Potential Data Loss (RPO)
Recoverable or unrecoverable RAC instance failure | Seconds to a minute | Zero with SYNC
Recoverable or unrecoverable RAC server failure | Seconds to a minute | Zero with SYNC
Data corruptions, unrecoverable database, Availability Domain or Regional failure (depends if standby is in another region) | Seconds to a minute | Zero with SYNC
Application upgrades that modify back-end database objects | Hours to a day | Zero
Alternative Silver Requirements (Data Guard Fast-Start Failover): Cloud deployments require a minimum of Oracle
Enterprise DBaaS (PaaS) and Oracle Cloud Infrastructure Object Storage. Similar to Bronze, if Oracle
Multitenant is used for database consolidation, then on-premises deployment also requires a license for Oracle
Multitenant, and cloud deployment requires a minimum of Oracle High Performance DBaaS (PaaS).
• Data Guard synchronous replication and the Maximum Availability protection mode provide the zero data
loss protection required for an HA solution. Data Guard transmits changes made on a primary database to a
standby database in real-time. Changes are transmitted directly from the log buffer of the primary to minimize
propagation delay and overhead, and in order to completely isolate the standby database from corruptions that
can occur in the I/O stack of a production database.
• The primary database and its standby copy can be deployed locally in the same region but in different
Availability Domains or data centers. Each Availability Domain has its own power, cooling, network, servers,
and storage.
• In addition to providing failover options in case there’s a database, storage, or availability domain failure, Data
Guard performs continuous Oracle validation to ensure that corruption is not propagated from the primary to the
standby database. It detects physical and logical intra-block corruptions that can occur independently at either
primary or standby databases. It is also unique in enabling run-time detection of silent lost-write corruptions
(lost or stray writes that are acknowledged by the I/O subsystem as successful). For more details see My
Oracle Support Note 1302539.1 – Best Practices for Corruption Detection, Prevention, and Automatic Repair.
• Data Guard Fast-Start Failover provides automatic database failover. A Data Guard standby is a running
Oracle database; it does not need to be restarted to transition to the primary role. An automatic database
failover can complete in less than 60 seconds, even on heavily loaded systems. Fast-Start Failover provides
HA by eliminating the delay required for an administrator to be notified and respond to an outage.
• Data Guard uses role-specific database services and the same Oracle client notification framework used
by Oracle RAC to ensure that applications quickly drop their connections to a failed database and
automatically reconnect to the new primary database. Role transitions can also be executed manually
using either a command line interface or in the cloud console. To achieve the integrated transparent client
• Data Guard performs complete, one-way physical replication of an Oracle database with the following
characteristics: high performance, simple to manage, support for all data types, applications, and workloads
such as DML, DDL, OLTP, batch processing, data warehouse, and consolidated databases. Data Guard is
closely integrated with Oracle RAC, ASM, RMAN, and Oracle Flashback technologies.
• Primary and standby systems are exact physical replicas, enabling backups to be offloaded (in the future) from
the primary to the standby database. A backup taken at the standby can be used to restore either the primary
or standby database. This provides administrators with flexible recovery options without burdening production
systems with the overhead of performing backups. Currently, OCI backups cannot be executed on the
standby.
• Standby databases can be used to upgrade to new Oracle Patch Sets (for example, patch release
12.1.0.2.180417 to 12.2.0.1.180417) or new Oracle releases (for example, release 12.2 to 18.1) in a rolling
manner. This is done by first upgrading the standby and then switching production to run on the new version.
Total downtime is limited to the time required to switch the standby database to the primary production role and
transition users to the new primary after maintenance has been completed. The new optimized transient logical
standby automated process incurs less than 15 seconds of downtime. Refer to Database Rolling Upgrade
using Data Guard or Automated Database Upgrades using Oracle Active Data Guard and DBMS_ROLLING for
12.2 and later database versions.
For more background on why Oracle recommends database replication using Data Guard or Active Data Guard
rather than storage-based remote mirroring solutions (for example, SRDF, Hitachi TrueCopy, and so on) refer to an
in-depth discussion in Oracle Active Data Guard vs. Storage Remote Mirroring.
Oracle Active Data Guard is a superset of the capabilities that are provided by Oracle Data Guard. The Gold
reference architecture uses the following advanced features of Oracle Active Data Guard:
• Choice of zero or near-zero data loss protection. If zero data loss upon a failover is not a requirement, you can
choose to deploy Oracle Active Data Guard with asynchronous replication to the remote DR site with Remote
VCN peering. If zero data loss is required, then an Oracle Active Data Guard Far Sync instance can be
deployed. A far sync instance uses a light-weight forwarding mechanism to enable zero data loss failover even
when primary and standby databases are hundreds or thousands of miles apart, without impacting primary
database performance. Far sync instances are simple to deploy and transparent to operate. A far sync instance
can also be used in combination with the Oracle Advanced Compression Option to enable off-host transport
compression to conserve network bandwidth and reduce RPO.
• Offload of read-only workload to an Oracle Active Data Guard standby database open read-only while
replication is active. An up-to-date active standby database is ideal for offloading ad-hoc queries and reporting
workloads from the production database. This increases ROI in standby systems and improves performance for
all workloads by using capacity that would otherwise be idle. It also provides continuous application validation
that standby databases are ready to support production workload should an outage occur.
• Fast incremental backups from the standby database using an RMAN block change tracking file. Fast
incremental backups complete up to 20x faster than traditional incremental backups. Currently, OCI backups
cannot be executed on the standby.
• Automatic repair of block-level corruption caused by intermittent random I/O errors that can occur
independently at either primary or standby databases. Oracle Active Data Guard retrieves a good copy of the
block from the opposite database to perform the repair. No application changes are required and impact of the
corruption is transparent to the user.
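The automatic block repair described in the last bullet can be pictured with a small Python sketch. This is a toy model only: it uses CRC32 as a stand-in for Oracle's internal block checksum, and the function names and the callback that fetches the peer's copy are hypothetical.

```python
import zlib

def checksum(block: bytes) -> int:
    """Stand-in checksum (CRC32); the real database uses its own block checksum."""
    return zlib.crc32(block)

def repair_block(local: bytes, expected_crc: int, fetch_remote) -> bytes:
    """Return a good copy of the block, pulling it from the opposite database if needed."""
    if checksum(local) == expected_crc:
        return local                      # local copy is intact, nothing to repair
    remote = fetch_remote()               # ask the primary/standby peer for its copy
    if checksum(remote) != expected_crc:
        raise IOError("both copies are corrupt; restore from backup instead")
    return remote

good = b"row data v1"
crc = checksum(good)
corrupted = b"row dat\x00 v1"             # simulated physical corruption
repaired = repair_block(corrupted, crc, fetch_remote=lambda: good)
print(repaired == good)
```

Because the repair only needs a verified copy from the other side, the same logic applies whether the corrupt block is found on the primary or on the standby.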
TABLE 4: GOLD RECOVERY TIME (RTO) AND POTENTIAL DATA LOSS (RPO)
Event | Downtime (RTO) | Potential Data Loss (RPO)
Data corruptions, unrecoverable database, Availability Domain or Regional failure (depends if standby is in another region) | Seconds to a minute | Zero with SYNC
Application upgrades that modify back-end database objects | Hours to a day | Zero
Gold Requirements: On premises deployment as a DR site requires Oracle Enterprise Edition, Oracle Active Data
Guard, Oracle Multitenant (optional for database consolidation) and Oracle Enterprise Manager life-cycle
management, diagnostic, and tuning packs. Cloud deployment requires a minimum of Oracle Extreme Performance
DBaaS (PaaS) or Exadata cloud services. Gold also uses Oracle Database Backup Cloud services.
Platinum uses Oracle GoldenGate and Edition-Based Redefinition to enable zero downtime maintenance,
migrations, and application upgrades.
Oracle GoldenGate
Oracle GoldenGate enables logical replication to maintain a synchronized copy (target database) of the production
database (source database). The target database contains the same data, but is a different database from the
source (for example, backups are not interchangeable). Oracle GoldenGate logical replication is a more
sophisticated process that has a number of prerequisites that do not apply to Data Guard physical replication. In
return for these prerequisites, Oracle GoldenGate provides unique capabilities to address advanced replication
requirements. Refer to MAA Best Practices: Oracle Active Data Guard and Oracle GoldenGate for additional
insights on the tradeoffs of each replication technology and requirements that may favor the use of one versus the
other, or the use of both technologies in a complementary manner.
The Platinum reference architecture uses Oracle GoldenGate bi-directional replication to implement zero downtime
maintenance and migrations. In such a scenario:
Bi-directional replication can also be used to increase availability service levels when a continuous read-write
connection to multiple copies of the same data is required. It is important to note that bi-directional replication is not
application transparent. It requires conflict detection and resolution when changes are made to the same record at
the same time in multiple databases. It also requires careful consideration of the impact of different failure states and
replication lag.
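To make the conflict detection and resolution requirement concrete, here is a minimal Python sketch of one common policy: detect a conflict when the local row no longer matches the before-image carried by the incoming change, and resolve it by letting the latest timestamp win. The policy choice, the Change class, and apply_change are assumptions for illustration; this is not Oracle GoldenGate configuration syntax.

```python
from dataclasses import dataclass

@dataclass
class Change:
    key: str
    before: str   # value the source site saw before its update
    after: str    # value the source site wrote
    ts: float     # commit timestamp at the source site

def apply_change(rows: dict, row_ts: dict, change: Change) -> str:
    """Apply a replicated change locally, resolving conflicts by latest timestamp."""
    if rows.get(change.key) == change.before:
        outcome = "applied"                        # no conflict: before-image matches
    elif change.ts > row_ts.get(change.key, float("-inf")):
        outcome = "conflict: incoming change wins"
    else:
        return "conflict: local change wins"       # keep the newer local value
    rows[change.key] = change.after
    row_ts[change.key] = change.ts
    return outcome

# The local site already changed acct1 from "100" to "150" at time 10.
rows, row_ts = {"acct1": "150"}, {"acct1": 10.0}
result = apply_change(rows, row_ts, Change("acct1", before="100", after="120", ts=12.0))
print(result, rows)
```

Latest-timestamp-wins silently discards one of the two updates, which is why the application owner, not the replication layer, has to decide whether such a rule is acceptable for each table.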
Many of our Platinum MAA customers use Data Guard Fast-Start Failover and GoldenGate together to
achieve a zero downtime upgrade and migration solution while still maintaining zero data loss for database failures. Refer
to Transparent Role Transitions with Oracle Data Guard and Oracle GoldenGate for more details.
There is no concern for data loss during planned maintenance when Oracle GoldenGate is used, as long as the
production copy of the database is protected by a Data Guard standby.
Edition-Based Redefinition
Edition-Based Redefinition enables online application upgrades that require changes to database objects that would
otherwise require the database to be offline. EBR enables all changes to be implemented while the previous version
of the application and the database remain online. When the upgrade process is complete, the pre-upgrade
application and the post-upgrade application can be used at the same time against the same copy of the Oracle
database. Existing sessions can continue to use the pre-upgrade version until their users decide to end them, and
new sessions can use the post-upgrade version. When there are no longer any sessions using the pre-upgrade
version of the application, the pre-upgrade version can be retired.
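The session hand-over described above can be modeled in a few lines of Python. This is a conceptual sketch, not the database's edition API: the EditionTracker class and its methods are hypothetical, and it only tracks which application version each session is bound to.

```python
class EditionTracker:
    """Toy model of pre-upgrade and post-upgrade sessions sharing one database."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new
        self.sessions = {}              # session id -> edition in use
        self.upgrade_complete = False

    def connect(self, sid: int) -> str:
        # New sessions get the post-upgrade edition once the upgrade is complete.
        edition = self.new if self.upgrade_complete else self.old
        self.sessions[sid] = edition
        return edition

    def disconnect(self, sid: int) -> None:
        self.sessions.pop(sid, None)

    def can_retire_old(self) -> bool:
        # The old edition can be retired only when no session still uses it.
        return self.upgrade_complete and self.old not in self.sessions.values()

tracker = EditionTracker("pre_upgrade", "post_upgrade")
first = tracker.connect(1)              # connects before the upgrade finishes
tracker.upgrade_complete = True
second = tracker.connect(2)             # connects after the upgrade finishes
before_logout = tracker.can_retire_old()
tracker.disconnect(1)
after_logout = tracker.can_retire_old()
print(first, second, before_logout, after_logout)
```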
Platinum Summary
RTO and RPO service level requirements addressed by the Platinum reference architecture are summarized in
Table 5. The assumption is that application continuity is required and can mask outages. Also, Oracle GoldenGate
and Edition-Based Redefinition are leveraged for zero application downtime.
TABLE 5: PLATINUM RECOVERY TIME (RTO) AND POTENTIAL DATA LOSS (RPO)
Event | Downtime (RTO) | Potential Data Loss (RPO)
Data corruptions, unrecoverable database, Availability Domain or Regional failure (depends if standby is in another region) | Zero or seconds | Zero with SYNC
Platinum Requirements: On-premises deployment as a DR site requires Oracle Enterprise Edition, Oracle RAC,
Oracle Active Data Guard, Oracle GoldenGate, Oracle Multitenant (optional for database consolidation), and Oracle
Enterprise Manager life-cycle management, diagnostic, and tuning packs. Cloud deployments require a minimum of
Oracle Extreme Performance DBaaS (PaaS) or Exadata cloud services and Oracle GoldenGate cloud service.
Platinum also uses Oracle Database Backup Cloud services.
• Manual checks are initiated by the administrator or at regular intervals by a scheduled job that performs
periodic checks.
• Runtime checks are automatically executed on a continuous basis by background processes while the
database is open.
• Background checks are run on a regularly scheduled interval, but only during periods when resources would
otherwise be idle.
Type | Reference architecture | Capability | Physical block corruption | Logical block corruption
Manual | All | Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
Automatic with OCI backup APIs | All | RMAN | Physical block checks during backup and restore | Intra-block logical checks
Runtime | Silver – Pattern 2, Gold and Platinum | Data Guard, Active Data Guard | Physical block checking at standby; strong isolation between primary and standby eliminates single point of failure; automatic database failover | Detect lost-write corruption, auto shutdown and failover; intra-block logical checks at standby
Runtime | Gold and Platinum | Active Data Guard | Automatic repair of physical corruptions | -
Runtime | All | Oracle block checksum and block checking (enabled by default for all OCI Data Guard deployed systems) | In-memory block and redo checksum | In-memory intra-block logical checks
Note that HARD validation and the Automatic Disk Scrub and Repair are unique to Exadata storage. HARD
validation ensures that Oracle Database does not write physically corrupt blocks to disk. Automatic Hard Disk Scrub
and Repair inspects and repairs hard disks with damaged or worn out disk sectors (cluster of storage) or other
physical or logical defects periodically when there are idle resources. Exadata sends a request to ASM to repair the
bad sectors by reading the data from another mirror copy. By default the hard disk scrub runs every two weeks.
• Ensure you have installed the Oracle Cloud Backup module from OTN and you configure your RMAN
environment properly. With automatic backups, this is already taken care of by the backup agent.
RMAN> CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS
'SBT_LIBRARY=/home/oracle/OPC/lib/libopc.so,
ENV=(OPC_PFILE=/u01/products/db/12.1/dbs/opcodbs.ora)';
b. Set RMAN PARALLELISM equivalent to 4 per database server. For 2 node Oracle RAC, set
PARALLELISM to 8.
RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 8 BACKUP TYPE TO
BACKUPSET;
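The parallelism guidance above is simple arithmetic, sketched here in Python. The figure of 4 channels per database server comes from the text; the helper names are hypothetical, and the generated string simply mirrors the RMAN command shown above.

```python
CHANNELS_PER_SERVER = 4  # per-server value recommended in the text above

def rman_parallelism(db_servers: int) -> int:
    """Total RMAN parallelism for a cluster with the given number of database servers."""
    if db_servers < 1:
        raise ValueError("at least one database server is required")
    return CHANNELS_PER_SERVER * db_servers

def configure_command(db_servers: int) -> str:
    """Build the RMAN CONFIGURE command for the computed parallelism."""
    return (
        f"CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM {rman_parallelism(db_servers)} "
        "BACKUP TYPE TO BACKUPSET;"
    )

print(configure_command(2))  # two-node Oracle RAC
```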
a. Reduce the amount of time needed for daily backups. Since backup times are shorter, you have an
option to backup more frequently as well to reduce RPO.
b. Reduce network usage and network bandwidth requirements when backing up over a network.
c. Reduce backup overhead and read I/Os.
The trade-off is that restore and recovery take longer, because you must restore the previous cumulative
backup and the subsequent incrementals, and then apply redo to recover the database.
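The longer restore path can be pictured with a short Python sketch that, given a backup history, picks the base backup, the incrementals that follow it, and the redo range still to be applied. RMAN works this out automatically during restore and recovery; the function, the 'full' and 'incr' labels, and the hour-based timeline are assumptions made only for illustration.

```python
def restore_chain(backups, target):
    """Return (backups to restore in order, (redo_from, redo_to)) for a target time.

    backups: list of (time, kind) tuples, where kind is 'full' or 'incr'.
    """
    usable = sorted(b for b in backups if b[0] <= target)
    fulls = [b for b in usable if b[1] == "full"]
    if not fulls:
        raise ValueError("no base backup exists at or before the target time")
    base = fulls[-1]                                   # most recent base backup
    incrementals = [b for b in usable if b[1] == "incr" and b[0] > base[0]]
    chain = [base] + incrementals
    redo_range = (chain[-1][0], target)                # redo needed after the last backup
    return chain, redo_range

# Times are hours since an arbitrary start (sample data).
history = [(0, "full"), (24, "incr"), (48, "incr"), (168, "full"), (192, "incr")]
chain, redo = restore_chain(history, target=200)
print(chain, redo)
```

Every extra incremental in the chain is one more restore step, which is the cost paid for the shorter daily backup window.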
Archive backups are automatic and executed every hour on any archives that have not been backed up
previously. Since archives are automatically managed by the Fast Recovery Area, no user action is needed to
manage or purge the archives.
A more detailed Cloud OCI backup/restore write-up will be published in a subsequent paper. Figure 8 serves as an
example of MAA performance observations. OCI backup/restore APIs are being changed to use the MAA default
recommendations. Most of the performance numbers were based on Exadata on OCI. Smaller OCI
compute or Oracle RAC VM shapes may have network and object storage resource management controls, so
some backup/restore rates may be lower.
chicago =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS=(PROTOCOL= TCP)
(HOST=prmy-scan)(PORT=1521)))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = chicago)))
boston =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS=(PROTOCOL= TCP)
(HOST=stby-scan)(PORT=1521)))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = boston)))
LISTENER = (DESCRIPTION =
(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=host_name)
(PORT=port_num))))
SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=sid_name)
(GLOBAL_DBNAME=db_unique_name_DGMGRL.db_domain)
(ORACLE_HOME=oracle_home)
(ENVS="TNS_ADMIN=oracle_home/network/admin"))
You should also be aware of the following additional considerations in an Oracle Clusterware environment.
• The static service must be set in the listener.ora file in the GRID_HOME of all the nodes.
• The ORACLE_HOME listener.ora parameter in the static service definition must be set to the
ORACLE_HOME of the instance, not the GRID_HOME.
• The ENVS listener.ora parameter must be used in the static service definition to explicitly set the
TNS_ADMIN environment variable to the appropriate network admin directory, which is usually the
network admin directory of the Oracle Home for the Oracle database.
• In an Oracle RAC One Node or Policy Managed Oracle RAC environment, the SID_NAME of each
possible instance must be specified in SID_LIST. The SID_NAME of each instance must match the
INSTANCE_NAME database initialization parameter of that instance.
3. Restart or reload all of the listeners where the above modification was made (primary and standby nodes).
5. Verify that the configuration was created successfully by using the SHOW CONFIGURATION command.
Configuration Status:
SUCCESS
Per Oracle’s recommendation, set DB_FLASHBACK_RETENTION_TARGET to 120 minutes (the default value is 1440)
if Flashback Database is used only for fast reinstatement of the failed primary following a Data Guard failover.
If Flashback Database also provides fast point-in-time recovery for protection against user error and
corruption, set a longer flashback retention period that covers the amount of time needed to
achieve those goals.
7. If the primary and standby are release 11.2.0.4 and you want to use Maximum Availability, set the LogXptMode
database property for both the primary and target standby databases to SYNC. If the primary and standby are
release 12.1.0.1 or 12.2.0.1, you can instead set the LogXptMode database property for both the primary and
target standby databases to FASTSYNC.
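As a minimal sketch of step 7, the commands below assume that the broker database names are chicago (primary) and boston (target standby), matching the connect descriptors shown earlier; the names in your configuration may differ. On release 12.1 or later, FASTSYNC can be substituted for SYNC.

```
-- Assumption: 'chicago' and 'boston' are the DB_UNIQUE_NAME values known to the broker.
DGMGRL> EDIT DATABASE 'chicago' SET PROPERTY LogXptMode='SYNC';
DGMGRL> EDIT DATABASE 'boston' SET PROPERTY LogXptMode='SYNC';
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
```

The property is set on both databases so that the transport mode is already correct after a role change.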
If you are more concerned about the performance of the primary database than zero data loss, or if the latency
between primary and standby is unpredictable and not consistently <2 ms, enable fast-start failover and set the
configuration protection mode to Maximum Performance. In this mode you must consider how much data loss is
acceptable in terms of seconds and set the FastStartFailoverLagLimit configuration property accordingly.
This property specifies the amount of data, in seconds, that the target standby database can lag behind the
primary database in terms of redo applied. If the standby database's redo apply point is within that many seconds
of the primary database's redo generation point, a fast-start failover is allowed. In the following example it is
permissible to lose up to 60 seconds worth of redo during a failover in this mode.
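A sketch of the Maximum Performance setup described above, allowing up to 60 seconds of lag. It assumes the commands are issued from a DGMGRL session already connected to the broker configuration.

```
-- Assumption: run before fast-start failover is enabled.
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxPerformance;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit=60;
```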
8. Set the FastStartFailoverThreshold property to specify the number of seconds you want the observer and
target standby database to wait (after detecting the primary database is unavailable) before initiating a failover.
A fast-start failover occurs when the Observer and the Standby database both lose contact with the production
database for a period of time that exceeds the value set for FastStartFailoverThreshold, and when both
parties agree that the state of the configuration is synchronized (Maximum Availability), or that the lag is not more
than the configured FastStartFailoverLagLimit (Maximum Performance). An optimum value for
FastStartFailoverThreshold weighs the trade-off between the fastest possible failover (thus minimizing
downtime), and unnecessarily triggering failover due to fleeting network irregularities or other short-lived events
that do not have material impact on availability. The default value set when Fast-Start Failover is enabled is 30
seconds. Recommended settings for FastStartFailoverThreshold are:
9. Enable fast-start failover with DGMGRL by issuing the ENABLE FAST_START FAILOVER command while
connected to any database in the broker configuration, including on the observer computer.
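Steps 8 and 9 can be sketched as follows. The value 30 is simply the default mentioned above, used here as a placeholder rather than a recommendation; choose a threshold appropriate to your environment.

```
-- Assumption: 30 seconds is a placeholder equal to the default threshold.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold=30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> SHOW FAST_START FAILOVER;
```

SHOW FAST_START FAILOVER reports the current threshold, target, observer, and lag limit, which makes it a convenient check after enabling the feature.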
Observers should be installed and run on a computer system that is separate from the primary and standby
systems. Observers are very lightweight and can be placed on the smallest OCI VM shape. For Oracle Database
releases 11.2 and 12.1, you must have a custom script that monitors the observer and attempts to restart it on
the current target or on an alternate target.
Ideally, you should run the observer on a system that is on a separate AD from the primary and standby databases.
You can also install the observer on the same network as the application to represent the same application
connectivity. If a third, independent location is not available, then locate the observer in the primary AD on a
separate fault domain and isolate the observer as much as possible from failures affecting the standby database.
To start an observer, you must be able to log in to DGMGRL with an account that has the SYSDG or SYSDBA
privilege. An observer is an OCI client that connects to the primary and target standby databases using the same
SYS credentials you used when you connected to the Oracle Data Guard configuration with DGMGRL. Note that the
following example starts the observer using the IN BACKGROUND clause that is only available in Oracle Database
release 12.2. If you are starting the observer in release 12.1 or 11.2, omit that clause.
DGMGRL> CONNECT sys/welcome1@boston
DGMGRL> START OBSERVER IN BACKGROUND
FILE IS /net/sales/dat/oracle/broker/fsfo.dat
If your primary and standby databases are release 12.2.0.1, you can register up to three observers to monitor a
single Data Guard broker configuration. Each observer is identified by a name that you supply when you issue the
START OBSERVER command. You can also start the observers as background processes.
DGMGRL> CONNECT sys/welcome1@boston
DGMGRL> start observer number_one in background;
On the same host or a different host you can start additional observers for High Availability (release 12.2.0.1):
DGMGRL> CONNECT sys/welcome1@boston
DGMGRL> start observer number_two in background;
Only the master observer can coordinate fast-start failover with the Data Guard broker. All other registered
observers are considered to be backup observers.
If the observer was not placed in the background (release 12.2 only) then the observer is a continuously executing
process that is created when the START OBSERVER command is issued. Thus, the command-line prompt on the
observer computer does not return until you issue the STOP OBSERVER command from another DGMGRL session.
To issue commands and interact with the broker configuration, you must connect through another DGMGRL client
session.
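As a sketch of working with observers from a second DGMGRL session on release 12.2, the commands below list the registered observers and stop one by name. The observer name number_two is the one used in the example above; the password is prompted for because none is given on the CONNECT line.

```
-- Assumption: release 12.2, with observers number_one and number_two already started.
DGMGRL> CONNECT sys@boston
DGMGRL> SHOW OBSERVER;
DGMGRL> STOP OBSERVER number_two;
```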
• Database instance failure (or last instance failure in an Oracle RAC configuration)
• Shutdown abort (or shutdown abort of the last instance in an Oracle RAC configuration)
• Data files taken offline due to I/O errors
• When both the Data Guard observer and the standby database lose their network connection to the production
database, and when the standby database confirms that it is in a synchronized state.
• A user-configurable condition, which is a condition that the user specifies to trigger a fast-start failover. It
is recommended that you leave these user-specified conditions at their default values. The conditions that can
be set up for automatic failover are:
a) Datafile offline (write error)
b) Corrupted Dictionary
c) Corrupted Controlfile
d) Inaccessible Logfile
e) Stuck Archiver
f) ORA-240 (control file enqueue timeout)
Should one of these conditions be detected, the Data Guard observer fails over to the standby, and the primary is
shut down, regardless of how the Data Guard broker attribute FastStartFailoverPmyShutdown is set. Note that
for user specified conditions the fast start failover threshold is ignored and the failover proceeds immediately.
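The current state of these conditions can be inspected with SHOW FAST_START FAILOVER. The ENABLE and DISABLE lines below are a syntax illustration only, using "Stuck Archiver" as an arbitrary example; they are not a recommendation to change the defaults.

```
DGMGRL> SHOW FAST_START FAILOVER;
-- Illustration only: enable a condition, then restore the previous setting.
DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Stuck Archiver";
DGMGRL> DISABLE FAST_START FAILOVER CONDITION "Stuck Archiver";
```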
Following unplanned downtime on a primary database that requires a failover, full fault tolerance is compromised
until the standby database is reestablished. Full database protection should be restored as soon as possible.
Reinstating databases is automated if you are using Data Guard fast-start failover, and failed instances are
automatically restarted. After a fast-start failover completes, the observer automatically attempts to reinstate the
original primary database as a standby database. Reinstatement restores high availability to the broker configuration
so that, if the new primary database fails, another fast-start failover can occur. The reinstated database can act as
the fast-start failover target for the new primary database.
To reinstate the original primary database, the database must be started and mounted, but it cannot be opened.
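If automatic reinstatement does not take place (for example, because the observer was not running), the original primary can be reinstated manually once it is mounted. The sketch below assumes that chicago was the original primary and that the session is connected to the new primary, boston; both names are taken from the earlier connect descriptors.

```
-- Assumption: 'chicago' is the failed original primary, now started in mount state.
DGMGRL> CONNECT sys@boston
DGMGRL> REINSTATE DATABASE 'chicago';
DGMGRL> SHOW CONFIGURATION;
```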
• Initial Primary Region will have the primary database in AD1 and HA far sync instances and HA observer in
AD2.
• Initial Standby Region will have the standby database in AD1 and HA far sync instances and HA observer
(used after role change) in AD2
a) Far sync uses Oracle RAC for the lowest brownout and fastest reconciliation to zero data loss (1.5 mins). If
using Oracle RAC for the far sync instance is not feasible, then use alternate destinations,
understanding that the time to get back to zero data loss after a far sync failure will be higher.
b) For the HA observer, MAA recommends using at least 2 observer targets in the same primary region
but in different ADs.
Deployment Configuration: 2 Regions and only 1 AD in each region
• Initial Primary Region will have the primary database and 2 servers to host far sync instances and observers.
Place the far sync servers in different fault domains from the primary database.
• Initial Standby Region will have the standby database and 2 servers to host far sync instances and observers
(when there is a role change). It is best to place the potential far sync servers in different fault domains from
the standby database.
For more information on far sync instance sizing and far sync Data Guard architectures, refer to Oracle Active Data
Guard Far Sync - Zero Data Loss at Any Distance.
Oracle Active Data Guard is always recommended and is a requirement for the Gold and Platinum MAA reference
architectures, for its additional benefits such as auto-block repair of physical data corruptions and ability to offload
backups and reads to the real-time physical standby database. As represented in Figure 10, Read-Write
transactions of the primary can incur a low downtime impact of 15 seconds while the Read-Only transactions on the
read-only standby can incur an even lower downtime impact of 6 seconds, maintaining very good availability for all
your business transactions.
In conclusion, deploying Data Guard FSFO achieves minimal read-write and read-only downtime after a
primary database failure.
Figure 11: Oracle Database Failover Performance on OCI single instance with single instance standby
MAA recommends using a higher default FSFO threshold for Oracle RAC (60 seconds) and Exadata (30 seconds).
The overall failover times can still be less than 1 minute for Exadata within OCI.
Oracle Data Guard SYNC transport with FSFO is typically configured to provide zero data loss HA. When tested
with a heavy Swingbench-generated OLTP workload, five times higher than a typical OLTP workload, Oracle OCI
easily supported synchronous transport with very little overhead on the primary and
near-zero data lag on the standby. Table 7 provides an overview of all test results.
Figure 11 shows that enabling SYNC transport had very little performance impact on a very intensive OLTP
application with a 15 MB/sec redo rate. A typical OLTP application generates less than 3 MB/sec. OLTP transactions
can be affected by SYNC transport because each commit must wait for the standby to acknowledge receipt of the
redo. However, in Oracle OCI, no application impact was observed.
The application downtime, measured from the time the service switches to the new primary database and including
the Data Guard switchover time, can be less than 2 minutes. With Oracle Database 12c, downtime was observed to
be less than 13 seconds in Oracle OCI. For additional planned maintenance documents, refer to "Role Transition
Best Practices: Data Guard and Active Data Guard", "Database Rolling Upgrade using Data Guard", and "Automated
Database Upgrades using Oracle Active Data Guard and DBMS_ROLLING".
On a primary database, the health check determines if the following conditions are met:
• Database is in the state specified by the user, as recorded in the broker configuration file
• Database is in the correct data protection mode
• Database is using a server parameter file
• Database is in the ARCHIVELOG mode
• Redo transport services do not have any errors
• Database settings match those specified by the broker configurable properties
• Redo transport settings match those specified by the redo transport-related properties of the standby
databases
• Current data protection level is consistent with configured data protection mode
• Primary database is able to resolve all gaps for all standby databases
On a standby database, the health check determines whether the following conditions are met:
• Database is in the state specified by the user, as recorded in the broker configuration file
• Database is using a server parameter file
• Database settings match those specified by the broker configurable properties
• Database guard is turned on when the database is a logical standby database
• Primary and target standby databases are synchronized or within lag limits if fast-start failover is enabled
To identify any warnings on the overall configuration, show the status using the SHOW CONFIGURATION command:
DGMGRL> show configuration;
Configuration - dg
Configuration Status:
SUCCESS (status updated 18 seconds ago)
If the configuration status is SUCCESS, everything in the broker configuration is working properly. However, if you
see a status of WARNING or ERROR, then something is wrong in the configuration. Additional error messages will
accompany the WARNING or ERROR status that should be used to identify current issues. The next step is to
examine each database in the configuration to narrow down what the specific error is related to.
DGMGRL> show database tin;
Database - tin
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
tin1
tin2
Database Status:
SUCCESS
If the database status is SUCCESS then the database is working properly. However, if you see a status of WARNING
or ERROR, then something is wrong in the database. Additional error messages will accompany the WARNING or
ERROR status that should be used to identify current issues. Repeat the same SHOW DATABASE command on the
standby database and assess any error messages.
In addition to the above commands, Data Guard broker features a VALIDATE DATABASE command with Oracle
Database 12c Release 1 and later:
DGMGRL> validate database tin;
Database Role: Primary database
Ready for Switchover: Yes
DGMGRL> validate database can;
Capacity Information:
  Database  Instances  Threads
  tin       2          2
  can       1          2
Warning: the target standby has fewer instances than the
primary database, this may impact application performance
The VALIDATE DATABASE command does not provide a SUCCESS or WARNING status and must be examined to
determine if any action needs to be taken.
It is recommended that you run the VALIDATE DATABASE command after the broker configuration has been
created, and prior to and after any role transition operation.
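A planned role transition that follows this recommendation could be sketched as shown below, using the tin (primary) and can (standby) database names from the output above. The sequence is an illustration of the ordering, not a complete switchover procedure.

```
-- Assumption: 'tin' is the current primary and 'can' is the switchover target.
DGMGRL> VALIDATE DATABASE 'can';
DGMGRL> SWITCHOVER TO 'can';
DGMGRL> VALIDATE DATABASE 'tin';
DGMGRL> SHOW CONFIGURATION;
```

Review the output of each VALIDATE DATABASE command before proceeding, since the command does not return an overall status.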
When using the Data Guard broker, the transport or apply lag can be viewed by using the SHOW DATABASE
command and referencing the standby database, as shown here.
Database Status:
SUCCESS
The Data Guard broker TransportDisconnectedThreshold database property (default of 0 in release 11.2
and 30 seconds for releases 12.1 and 12.2) can be used to generate a warning status for a standby when the time
since the last communication from the primary database exceeds the value specified by the property. The property
value is expressed in seconds. The following is an example of the warning when a disconnection has occurred:
Database - orclsb
Database Warning(s):
ORA-16857: member disconnected from redo source for longer than specified
threshold
• The ApplyLagThreshold configurable database property generates a warning status for a logical or physical
standby when the database's apply lag exceeds the value specified by the property. The property value is
expressed in seconds. A value of 0 seconds results in no warnings being generated when an apply lag exists.
As a best practice, Oracle recommends setting ApplyLagThreshold to at least 15 minutes (default of 0 in
version 11.2 and 30 seconds for versions 12.1 and 12.2).
• The TransportLagThreshold configurable database property can be used to generate a warning status for
a logical, physical, or snapshot standby when the database's transport lag exceeds the value specified by the
property. The property value is expressed in seconds. A value of 0 seconds results in no warnings being
generated when a transport lag exists. As a best practice, Oracle recommends setting
TransportLagThreshold (default of 0 in release 11.2 and 30 seconds in release 12.1 and 12.2) to at least
15 minutes.
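The threshold properties above can be set per standby database. The sketch below assumes a standby named can, as in the earlier VALIDATE DATABASE output, and uses 900 seconds (15 minutes) for the two lag thresholds; the TransportDisconnectedThreshold value of 30 is simply the 12.1 and 12.2 default, shown as a placeholder.

```
-- Assumption: 'can' is the standby; 900 seconds = 15 minutes.
DGMGRL> EDIT DATABASE 'can' SET PROPERTY ApplyLagThreshold=900;
DGMGRL> EDIT DATABASE 'can' SET PROPERTY TransportLagThreshold=900;
DGMGRL> EDIT DATABASE 'can' SET PROPERTY TransportDisconnectedThreshold=30;
DGMGRL> SHOW DATABASE 'can' 'ApplyLagThreshold';
```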
Oracle GoldenGate
With Oracle GoldenGate, you can set up an active/active replica for zero downtime migrations, upgrades, or online
changes. You can set up Oracle GoldenGate on Oracle OCI manually for deploying the Platinum MAA reference
architecture. Refer to Oracle GoldenGate Performance Best Practices and Transparent Role Transitions with Oracle
Data Guard and Oracle GoldenGate.
Conclusion
Enterprises need solutions that address the full continuum of requirements for data protection, availability, and
disaster recovery. Oracle MAA best practices define four HA reference architectures (BRONZE, SILVER, GOLD,
and PLATINUM) that address the most typical HA SLAs.
Oracle OCI provides a highly scalable and available network, compute, and storage environment to
support all Oracle applications and databases that require any of the above MAA reference architectures.
Moreover, Oracle OCI’s high-bandwidth, low-latency storage and network infrastructure, along with the ability to
deploy single instance databases, Oracle RAC databases, Exadata Database Machine, and Data Guard Fast-Start
Failover configurations, make OCI the best cloud infrastructure on which to deploy MAA. As more OCI features are
added, the MAA team will continue to ensure that MAA architectures, configurations, and life cycle operations are
incorporated.
CONNECT WITH US
blogs.oracle.com/oracle
facebook.com/oracle
twitter.com/oracle
oracle.com
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the
contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other
warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or
fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are
formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any
means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and
are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are
trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.