Application High Availability

with Oracle

Aychin Gasimov
02/2014

Application High Availability


An application must provide uninterrupted service to its end users.
It must be able to handle the following cases:
Failure of one member instance of the service
Failure of all instances of the service
Node or site failure
Planned downtime

Required components
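Oracle Clusterware, Oracle Restart, Oracle Data Guard
FAN (Fast Application Notification)
ONS (Oracle Notification Service)
Services
UCP (Universal Connection Pool)
LBA (Load Balancing Advisory) and the different types of load balancing
FCF (Fast Connection Failover)
TAF (Transparent Application Failover)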

FAN Fast Application Notification


FAN is the notification mechanism that Oracle Clusterware uses to notify other processes.
FAN publishes service, instance, and node state change events, such as UP and DOWN.
FAN also publishes load balancing advisory events.
FAN events are published using Oracle Notification Service and Oracle Streams Advanced Queuing.
Oracle Net Services listeners are integrated with FAN events.

FAN Fast Application Notification


FAN publishes service, instance, and node state change events, such as UP and DOWN
FAN notifies applications about configuration and service-level information, including service status changes such as UP or DOWN events. Applications can respond to FAN events and take immediate action. FAN UP and DOWN events can apply to instances, services, and nodes.
For cluster configuration changes, the Oracle RAC high availability framework publishes a FAN event immediately when a state change occurs in the cluster. Instead of waiting for the application to poll the database and detect a problem, applications can receive FAN events and react immediately. With FAN, in-flight transactions can be terminated immediately and the client notified when the instance fails.

FAN Fast Application Notification


FAN publishes load balancing advisory events
Applications can take advantage of the load balancing advisory FAN events to direct work requests to the instance in the cluster that is currently providing the best service quality.

Listeners are integrated with FAN events


Oracle Net Services listeners are integrated with FAN events, enabling the listener and CMAN to immediately de-register services provided by the failed instance and to avoid erroneously sending connection requests to failed instances.

This behavior is controlled by the listener.ora parameter:
SUBSCRIBE_FOR_NODE_DOWN_EVENT_listener_name=ON (default)

ONS Oracle Notification Service


A publish and subscribe service for communicating information about all FAN events.
Oracle Notification Service is included as part of the Oracle Clusterware and Client software (ons.jar).
Maintained as a Clusterware resource.
One ONS process per node.
Can communicate with ONS processes on other nodes and on the client side.

Services
A service is a named representation of one or more database instances. The service name for an Oracle database is normally its global database name. Clients use the service name to connect to one or more database instances.
Services are logical abstractions for managing workloads in Oracle Database.
Services are tightly integrated with Oracle Database and are maintained in the data dictionary.
Connection requests can include a database service name.
Services enable you to configure a workload, administer it, enable and disable it, and measure the workload as a single entity.
AWR records service performance. Each service has quality-of-service thresholds for response time and CPU consumption.
Database Resource Manager can map services to consumer groups, so you can automatically manage the priority of one service relative to others.
Services can be created with the DBMS_SERVICE package or the srvctl utility.

Services
[Figure: Oracle cluster CLS1 serving applications through the services Srv1_db, Srv2_db, and Srv3_db on three nodes. Node 1 (Instance1): 30% RTPC 0.5s CPUPC 0.3s and 70% RTPC 0.7s CPUPC 0.5s. Node 2 (Instance2): 100% RTPC 0.3s CPUPC 0.2s. Node 3 (Instance3): 60% RTPC 0.8s CPUPC 0.6s and 40% RTPC 0.5s CPUPC 0.3s.]

Resource Manager is used to distribute resources between services.
Thresholds on response time per sec and CPU per sec are set for the services.

UCP Universal Connection Pool


UCP for JDBC provides a connection pool implementation for caching JDBC connections. Java applications that are database-intensive use the connection pool to improve performance and better utilize system resources.
A UCP JDBC connection pool can use any JDBC driver to create physical connections that are then maintained by the pool.
The pool also leverages many high availability and performance features available through an Oracle Real Application Clusters (RAC) database. These features include Fast Connection Failover (FCF), run-time connection load balancing, and connection affinity.
Documented in the Oracle Universal Connection Pool for JDBC Developer's Guide.

Requirements for UCP


JRE 1.5 or higher
A JDBC driver or a connection factory class capable of returning a java.sql.Connection and javax.sql.XAConnection object
Oracle drivers from releases 10.1 or higher are supported. Advanced Oracle Database features, such as Oracle RAC and Fast Connection Failover, require the Oracle Notification Service library (ons.jar) that is included with the Oracle Client software.
The ucp.jar library must be included in the CLASSPATH of an application.

LBA Load Balancing Advisory


The Load Balancing Advisory provides information to applications or clients about the current service levels that the Oracle RAC database instances are providing (v$servicemetric.goodness).
The load balancing advisory is integrated with AWR, which measures response time and CPU consumption for each service.
The advice given by the LBA takes into account the power of the server and the current workload of the service.
Integrated with Oracle 11g JDBC, ODP.NET, and OCI.
Applications can take advantage of the load balancing FAN events to direct work requests to the instance in the cluster that provides the best performance based on the workload management directives defined for that service.
Configured by defining a service-level goal for the service. Doing so enables the LBA for that service and the publication of FAN load balancing events.
The listener can also use the load balancing advisory when it balances connection loads, if the LBA is enabled and clb_goal is set to SHORT for the service.

RLB Run-time Load Balancing


RLB is a feature of Oracle connection pools that distributes client work requests across the instances of an Oracle RAC database, based on the LBA information. It allocates connections according to current performance levels, which provides load balancing at the transaction level.
There are two types of service-level goals for Run-time Connection Load Balancing:
Service Time (SERVICE_TIME): Attempts to direct work requests to instances according to response time. Load balancing advisory data is based on elapsed time for work done in the service plus available bandwidth to the service. An example for the use of SERVICE_TIME is a workload such as internet shopping, where the rate of demand changes. (v$servicemetric.dbtimepercall)
srvctl modify service -d DB -s app_srvc -B SERVICE_TIME -j SHORT

Throughput (THROUGHPUT): Attempts to direct work requests according to throughput. The load balancing advisory is based on the rate at which work is completed in the service plus available bandwidth to the service. An example for the use of THROUGHPUT is a workload such as batch processing, where the next job starts when the last job completes. (v$servicemetric.callspersec)
srvctl modify service -d DB -s batch_srvc -B THROUGHPUT -j LONG
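
One way to picture how a connection pool could act on load balancing advice is a weighted random choice between instances. The following is a minimal sketch under stated assumptions: the advice is taken to be a percentage per instance, and the class name, method name, instance names, and values are all hypothetical. It is not UCP's internal algorithm, which this deck does not describe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Illustrative only: weighted choice of an instance from advisory percentages.
final class RlbSketch {

    // Picks an instance with probability proportional to its weight.
    static String pick(Map<String, Integer> percentByInstance, Random rnd) {
        int total = 0;
        for (int weight : percentByInstance.values()) {
            total += weight;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("no positive weights");
        }
        int roll = rnd.nextInt(total); // 0 <= roll < total
        for (Map.Entry<String, Integer> e : percentByInstance.entrySet()) {
            roll -= e.getValue();
            if (roll < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable when total > 0");
    }

    public static void main(String[] args) {
        Map<String, Integer> advice = new LinkedHashMap<>();
        advice.put("inst1", 60); // hypothetical advisory values
        advice.put("inst2", 30);
        advice.put("inst3", 10);
        System.out.println(pick(advice, new Random()));
    }
}
```

Over many requests, an instance advertised at 60% would receive roughly 60% of the work, which is the transaction-level balancing effect described above.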

CLB Connection Load Balancing


Provides load balancing at the time of the initial database connection.
The listener directs a connection request to the best instance currently providing the service.
For each service, you can define the method the listener uses for load balancing by setting the connection load balancing goal:
SHORT: Connection load balancing uses the Load Balancing Advisory when it is enabled (either goal_service_time or goal_throughput). When GOAL=NONE (LBA disabled), connection load balancing uses an abridged advice based on CPU utilization.
LONG: Balances the number of connections per instance using the session count per service. This setting is recommended for applications with long connections, such as forms.

Controlled by the clb_goal property of the service.

Client-Side Load Balancing


Client-side load balancing balances the connection requests across the listeners.
It is defined in the client connection definition by setting the parameter LOAD_BALANCE=ON.
The Oracle client randomly selects an address from the address list and connects to that node's listener.
Client-side load balancing includes connection failover.
LOAD_BALANCE is ON by default for a DESCRIPTION_LIST only. By default the parameter is OFF for an address list within a DESCRIPTION. Setting it to ON for a SCAN-based address means that new connections are randomly assigned to one of the 3 SCAN-based IP addresses resolved by DNS.
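
The effect of LOAD_BALANCE on the order in which addresses are tried can be modelled in a few lines. This is a simplified sketch, assuming the random selection can be represented as a shuffle of the address list; the class and method names are hypothetical and the code does not reproduce Oracle Net internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative model of the address order a client would try.
final class AddressOrderSketch {

    // LOAD_BALANCE=ON: random order. LOAD_BALANCE=OFF: order as listed.
    static List<String> tryOrder(List<String> addresses, boolean loadBalance, Random rnd) {
        List<String> order = new ArrayList<>(addresses); // never mutate the input
        if (loadBalance) {
            Collections.shuffle(order, rnd);
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("scan1", "scan2", "scan3");
        System.out.println(tryOrder(addresses, false, new Random())); // [scan1, scan2, scan3]
        System.out.println(tryOrder(addresses, true, new Random()));  // some permutation
    }
}
```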

Client-Side Failover and Load Balancing

[Figure: three SCAN IP addresses: 100.125.200.21, 100.125.200.22, 100.125.200.23]

DB =
  (DESCRIPTION =
    (FAILOVER = on)
    (LOAD_BALANCE = off)
    (CONNECT_TIMEOUT = 5)
    (TRANSPORT_CONNECT_TIMEOUT = 2)
    (RETRY_COUNT = 2)
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan1)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan2)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan3)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = myservice)
    )
  )

Client-side load balancing occurs on the client side

FAILOVER is set to ON, which is the default value.
The connection is first tried against the first SCAN address.
Within the 3 IPs of that SCAN address, the first IP is tried; if it fails, the second IP is tried, and so on. Each try has a 2 sec TCP timeout (TRANSPORT_CONNECT_TIMEOUT) and a 5 sec overall timeout to connect (CONNECT_TIMEOUT), and the 3 IPs are traversed 3 times: 1 time + 2 for RETRY_COUNT. This means that if all 3 SCAN IPs fail, it takes up to 2 * 3 * 3 = 18 sec before the next SCAN address is tried. If the TCP connection succeeds within 2 sec, the remaining time (CONNECT_TIMEOUT - TRANSPORT_CONNECT_TIMEOUT) is available to establish the connection to the instance.
If the next SCAN address succeeds, the connection is established.
If all subsequent addresses fail, all addresses are tried 2 more times, so the overall number of tries is 3.
Addresses are tried one by one in sequential order (LOAD_BALANCE=off). In this particular case, load balancing between the 3 SCAN IPs is not performed either: the client tries to connect to the first IP returned from DNS.
To enable client-side load balancing, set LOAD_BALANCE=ON. An address is then chosen randomly from the 3 addresses, and the client also chooses randomly between the 3 SCAN IPs.
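
The timing arithmetic above can be written out as a small calculation. This is a sketch that only restates the reasoning from this slide with the values from the connect descriptor (3 IPs per SCAN name, RETRY_COUNT = 2, TRANSPORT_CONNECT_TIMEOUT = 2, CONNECT_TIMEOUT = 5); the class and method names are hypothetical.

```java
// Restates the slide's worst-case timing estimate for one SCAN address.
final class ConnectTimingSketch {

    // Seconds spent on one SCAN address when every TCP connect times out.
    static int worstCaseSeconds(int ipsPerScanName, int retryCount, int transportConnectTimeout) {
        int passes = 1 + retryCount; // first pass plus RETRY_COUNT retries
        return transportConnectTimeout * ipsPerScanName * passes;
    }

    // Seconds left to finish the connection after the TCP connect itself succeeded.
    static int remainingSeconds(int connectTimeout, int transportConnectTimeout) {
        return connectTimeout - transportConnectTimeout;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSeconds(3, 2, 2)); // 18
        System.out.println(remainingSeconds(5, 2));    // 3
    }
}
```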

How it works together


pds.setURL("jdbc:oracle:thin:@(DESCRIPTION="
    + "(LOAD_BALANCE=ON)"
    + "(ADDRESS = (host=db-scan))"
    // ... (further parameters elided in the original slide)
    + "(CONNECT_DATA=(SERVICE_NAME=service1)))");

[Figure: the application's UCP pool connects through the SCAN listeners to service1 and service2 running on Instance 1, Instance 2, and Instance 3; the ONS process on each node sends FAN LBA events to UCP.]

UCP creates physical connections to the instances using the provided connection description. Client-side load balancing distributes new connection requests between the different SCAN listeners (3 IPs), because LOAD_BALANCE=ON.
When a connection request arrives at the listener, the listener redirects it to the appropriate instance according to the service's clb_goal value; this is server-side connection load balancing. If clb_goal is SHORT and LBA is enabled for the service, the listener uses the service's GOODNESS information, which it receives from the serving instances, to decide which instance to redirect the connection to. If clb_goal is LONG, the listener balances connections by the number of sessions per service. If the number of physical connections in the pool is constant, clb_goal=LONG can be used with UCP; if the number is dynamic, clb_goal=SHORT must be used, because each new connection request from UCP must be accurately redirected according to the LBA advice and goal (the goal can be SERVICE_TIME or THROUGHPUT).
ONS on each node periodically sends LBA FAN events to UCP. This way UCP, like the listener, is aware of the current service levels on each instance. Based on this information, the run-time load balancing mechanism distributes the workload between the instances during the life of the application.

How it works together


Run-time load balancing and connection load balancing are related if the clb_goal of the service is set to SHORT, in that:
They both use the Load Balancing Advisory.
They both use the same balancing goal defined in the service definition by the -B option, i.e. SERVICE_TIME or THROUGHPUT.
Using AWR data, the database calculates the GOODNESS for each service based on the run-time load balancing goal or the clb_goal for that service. The current GOODNESS number can be found in the V$SERVICEMETRIC.GOODNESS column.
If clb_goal is:
LONG, LBA is not used for server-side load balancing, and GOODNESS contains just the number of current sessions for this service in the current instance.
SHORT, LBA is used for server-side load balancing, and GOODNESS is calculated based on the load balancing goal, SERVICE_TIME or THROUGHPUT.
If clb_goal is SHORT and LBA is not enabled (-B NONE), the listener considers the node load, to equalize CPU usage, when distributing connections.
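
To make the role of GOODNESS concrete, the sketch below picks the instance a listener might prefer. It rests on an assumption not stated in this deck: that a lower GOODNESS value is more attractive. That reading is at least consistent with the LONG case above, where GOODNESS is the session count and the listener balances by number of sessions. The class name, method name, and sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: choose the instance with the lowest GOODNESS value.
final class GoodnessSketch {

    // Assumption: lower GOODNESS is better; ties go to the first instance seen.
    static String best(Map<String, Integer> goodnessByInstance) {
        String best = null;
        int bestValue = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : goodnessByInstance.entrySet()) {
            if (best == null || e.getValue() < bestValue) {
                best = e.getKey();
                bestValue = e.getValue();
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no instances");
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> sessions = new LinkedHashMap<>();
        sessions.put("inst1", 40); // clb_goal=LONG: GOODNESS is the session count
        sessions.put("inst2", 12);
        sessions.put("inst3", 25);
        System.out.println(best(sessions)); // inst2
    }
}
```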

FCF Fast Connection Failover


FCF is designed for fast instance and database failover and switchover with Oracle RAC and Oracle Data Guard.
FCF receives FAN availability events and immediately clears affected connections from the pool.
For Java applications it requires an Oracle JDBC driver and an Oracle RAC database or Oracle Restart.
It can also be used with the session and connection pools of OCI applications.
It was introduced as part of the Implicit Connection Cache pooling feature, available from JDBC 10g.
Starting with 11gR2, Implicit Connection Cache is deprecated in favor of UCP, so UCP must now be used to benefit from FCF and RLB.
FCF supports planned outages (instance relocation or shutdown in a RAC database) and unplanned outages.
Application logic must be used to make outages transparent to the end users.

FCF
Planned outage
Stale borrowed connections are marked and removed after they are returned to the pool
On-going transactions proceed to completion

Unplanned outage
Stale connections are detected and removed from the pool
Borrowed connections are immediately aborted and closed
On-going transactions immediately receive an exception

FCF supports RAC databases, Data Guard, and single-instance databases with Oracle Restart; all of them can publish FAN messages.
Set the oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR property in milliseconds.
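
The difference between the planned and unplanned cases can be modelled with a toy pool. This is a minimal sketch that follows only the behavior listed on this slide; every class, field, and method name is hypothetical, and it does not represent UCP's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how a pool could react to a DOWN event for one instance.
final class FcfPoolSketch {

    static final class Conn {
        final String instance;
        boolean borrowed;
        boolean closed;
        boolean closeOnReturn;

        Conn(String instance, boolean borrowed) {
            this.instance = instance;
            this.borrowed = borrowed;
        }
    }

    private final List<Conn> conns = new ArrayList<>();

    Conn add(String instance, boolean borrowed) {
        Conn c = new Conn(instance, borrowed);
        conns.add(c);
        return c;
    }

    void onDown(String instance, boolean planned) {
        for (Conn c : conns) {
            if (!c.instance.equals(instance) || c.closed) {
                continue;
            }
            if (planned && c.borrowed) {
                c.closeOnReturn = true; // planned: let on-going work finish
            } else {
                c.closed = true;        // idle, or unplanned: remove immediately
            }
        }
    }

    // The application returns a borrowed connection to the pool.
    void giveBack(Conn c) {
        c.borrowed = false;
        if (c.closeOnReturn) {
            c.closed = true;
        }
    }

    int openCount() {
        int n = 0;
        for (Conn c : conns) {
            if (!c.closed) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        FcfPoolSketch pool = new FcfPoolSketch();
        Conn busy = pool.add("inst1", true);
        pool.add("inst1", false);
        pool.add("inst2", false);
        pool.onDown("inst1", true);
        System.out.println(pool.openCount()); // 2: the idle inst1 connection is gone
        pool.giveBack(busy);
        System.out.println(pool.openCount()); // 1: the marked connection closes on return
    }
}
```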

FCF planned outage


[Figure: an application using UCP holds borrowed connections to the Service on Instance 1 (Node 1) and Instance 2 (Node 2); each node runs ONS, and the nodes are linked by the interconnect.]

The application uses UCP, and there are 9 physical connections in the pool.
The connections are distributed between the 2 RAC instances.
Now execute:
srvctl stop service -d DB -s Service -i Instance1

FCF planned outage


The Service on Instance 1 goes down, and evmd publishes a service DOWN event.

FCF planned outage


ONS publishes a FAN availability event reporting the service DOWN on Instance 1.

FCF planned outage


UCP receives the FAN event and immediately marks the borrowed connections to Instance 1 to be cleared. Connections that are not borrowed are cleared and, if needed, reestablished to the available instance.
The physical connections still exist, because borrowed connections are in use. This is possible because a normal service shutdown does not disconnect already active connections; it is up to the client (UCP) to decide when to disconnect.

FCF planned outage


As soon as the application closes a borrowed connection, UCP clears it.

FCF planned outage


If the pool reaches its minimum size, new connections are immediately established to the available instance.
After the Service on Node 1 is started again, server-side load balancing (SLB) places new connections on it.

FCF unplanned outage


[Figure: an application using UCP holds borrowed connections to the Service on Instance 1 (Node 1) and Instance 2 (Node 2); each node runs ONS, and the nodes are linked by the interconnect.]

The application uses UCP, and there are 9 physical connections in the pool.
The connections are distributed between the 2 RAC instances.

FCF unplanned outage


Node 1 fails, and evmd publishes a DOWN event.

FCF unplanned outage


Connections to Instance 1 fall into a TCP retransmission cycle and stay in that state until the TCP timeout expires, which can take several minutes, but ...

FCF unplanned outage


ONS distributes the FAN DOWN event immediately.

FCF unplanned outage


UCP receives the DOWN event and immediately breaks the affected connections out of their TCP timeouts by disconnecting the physical connections.
The application immediately receives an error. All uncommitted work has already been rolled back by Instance 2, so the application does not need to execute a rollback.
The application must:
Retry the connection request, because the old connection is no longer open
Replay the transaction

Usage model of UCP/FCF


1. Get a connection from the pool
2. Perform activity on it
3. Get an exception from the failure of some component
4. Check with the isValid() function whether the connection is still valid
5. If not, reconnect and recover the lost actions
For information about how to configure UCP in your Java application, refer to:
Oracle Universal Connection Pool for JDBC Developer's Guide 11g Release 2 (11.2)
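
The five steps above might be coded roughly as follows. This is a sketch under stated assumptions: it uses only the standard javax.sql.DataSource and java.sql.Connection interfaces (a UCP pool data source would be passed in as the DataSource), and the class name, the Work interface, and the retry limit are hypothetical. The decision to rethrow when the connection is still valid is a design choice of this sketch, not something the slide prescribes.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Illustrative retry wrapper following the usage model steps.
final class FcfUsageSketch {

    // A unit of work that can be replayed on a fresh connection.
    interface Work<T> {
        T run(Connection conn) throws SQLException;
    }

    static <T> T runWithRetry(DataSource pool, int maxAttempts, Work<T> work) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Connection conn = pool.getConnection();      // step 1
            try {
                conn.setAutoCommit(false);
                T result = work.run(conn);               // step 2
                conn.commit();
                return result;
            } catch (SQLException e) {                   // step 3
                last = e;
                if (stillValid(conn)) {                  // step 4
                    conn.rollback();
                    throw e; // connection is fine, so this is an ordinary SQL error
                }
                // step 5: connection is dead; loop to reconnect and replay the work
            } finally {
                closeQuietly(conn);
            }
        }
        throw last;
    }

    private static boolean stillValid(Connection conn) {
        try {
            return conn.isValid(1); // timeout in seconds
        } catch (SQLException e) {
            return false;
        }
    }

    private static void closeQuietly(Connection conn) {
        try {
            conn.close();
        } catch (SQLException ignored) {
            // nothing useful can be done here
        }
    }
}
```

A caller would wrap each transaction in a lambda, for example `runWithRetry(pool, 3, conn -> { /* statements */ return null; })`, so that the whole transaction is what gets replayed.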

FCF with Data Guard failover


1. The primary site is lost. Connections fall into a hung state.
2. After the failover completes, the respective database services start and DG Broker publishes a FAN availability event.
3. FCF breaks the connections out of the TCP timeout, clears the stale connections, and throws an error to the application.
4. The application retries the connection and replays lost transactions, if any.

FCF not needed for DG switchover

For a DG switchover FCF is not needed, because its primary role is to break connections out of TCP timeouts, which is not the case when a planned switchover occurs.
Switchover steps:
1. The primary converts to a physical standby and disconnects all sessions.
2. Client sessions receive ORA-3113 and begin going through their retry logic (TAF for OCI, application logic for JDBC).
3. The standby is converted to the primary database.
4. As the new primary opens, the respective services are started; clients now see the services as available and connect. Replay lost actions, if any.

TAF Transparent Application Failover

Client-side feature of the OCI driver
Transparently fails over read-only sessions
Can use FAN events distributed by Streams AQ
Does not restore session state (ALTER SESSION)
Does not support DML
Provides callback functions to manage failover steps
Can be configured on the client as well as on the server side, using database services

TAF Transparent Application Failover


To use FAN with OCI, the following conditions must be met:
Initialize the OCI environment in OCI_EVENTS mode
Connect to a service that has AQ HA notifications enabled
Link with a thread library

TAF has 2 failover types

SESSION: new sessions are reestablished by TAF, but select operations are not recovered
SELECT: new sessions are reestablished, and users with open cursors can continue fetching after failover. This involves overhead on the client side in normal select operations

TAF Transparent Application Failover

Sessions with active update transactions (UPDATE, INSERT, DELETE) at the time of the failure:
Will be reconnected to a new session
Uncommitted transactions will be rolled back
An error message will be returned to the application, stating that a rollback must be issued
The application must roll back and reissue the transaction

With the RETRIES and DELAY parameters, TAF can also automatically retry reconnecting on failover.
Example of creating a TAF-configured service:
srvctl add service -d DB -s taf_service -q TRUE -e SESSION -m BASIC -w 10 -z 50

-q TRUE enables AQ HA notifications
-e SESSION sets the failover type to SESSION
-m BASIC sets the failover method to BASIC
-w 10 sets the failover delay to 10 sec
-z 50 sets the failover retries to 50
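
The RETRIES and DELAY semantics can be illustrated with a plain retry loop. TAF performs this retry itself inside the OCI driver, so the code below is only a model of the behavior, with hypothetical class and method names. Under this model, a delay of 10 sec and 50 retries would keep trying for roughly 50 * 10 = 500 sec at most.

```java
import java.util.function.BooleanSupplier;

// Illustrative model of retrying a reconnect with a fixed delay.
final class TafRetrySketch {

    // Returns the number of the attempt that succeeded, or -1 if all attempts failed.
    static int reconnect(BooleanSupplier attempt, int retries, long delayMillis)
            throws InterruptedException {
        for (int i = 1; i <= retries; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i < retries) {
                Thread.sleep(delayMillis); // wait before the next attempt
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Hypothetical reconnect that only succeeds on the third call.
        int used = reconnect(() -> ++calls[0] == 3, 5, 10);
        System.out.println(used); // 3
    }
}
```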

Oracle 12c Application Continuity

Restores the full session, including all states, cursors, variables, and the last transaction, if there was one.
Supports planned and unplanned outages.
Performed automatically, with minimal application change.
Supported for Oracle RAC, Data Guard, Active Data Guard, and WebLogic Server in conjunction with the JDBC Thin Driver or UCP.
Applies only to JDBC Thin connections (JDBC OCI is not supported).
Requires the JDBC Replay driver.
Service properties: FAILOVER_TYPE=TRANSACTION, COMMIT_OUTCOME=TRUE, NOTIFICATION=TRUE

Oracle 12c Application Continuity


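[Figure: Application Continuity architecture. The application uses UCP with the JDBC Replay Driver, which holds the LTXID and replay context for borrowed connections; Instance 1 (Node 1) and Instance 2 (Node 2) each run ONS and the Service and keep the LTXID in a Continuity Directory; the nodes are linked by the interconnect.]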
