
Mokum is the only full-time Oracle virtualization integrator with the expertise to help you virtualize your Production, Test and DR Oracle workloads.

sales@mokumsolutions.com
Copyright 2014 Mokum Solutions, Inc. All rights reserved.
Distribution of the Oracle Cloud Cookbook or a derivative of the work in any form is prohibited unless prior permission is obtained from the copyright holder.
About Mokum Solutions, Inc.
Founded in March 2011, Mokum Solutions, Inc. specializes in the implementation, delivery and support of Oracle technologies in private and public clouds. Mokum's corporate headquarters are located in San Francisco, CA. Visit http://mokumsolutions.com or call +1 415 252 9164.
About the Author
The author of the Oracle Cloud Cookbook is none other than the owner of Mokum Solutions, Inc., Roddy Rodstein. Roddy is one of the most respected Oracle Cloud Computing experts, having designed and managed many of the world's largest and most complex Oracle private clouds. Before establishing Mokum in March 2011, Roddy spent three years at Oracle on the Oracle VM and Oracle Linux team, designing and supporting Oracle's largest and most complex customer environments. Before Oracle, Roddy spent six years at Citrix, designing and supporting Citrix's largest and most complex customer environments, including Oracle's. With Mr. Rodstein's rich background and knowledge, there can be no better resource for revealing the Oracle Cloud recipe.
Audience
The Oracle Cloud Cookbook is a comprehensive, field-tested reference design that guides you through each step of moving your Oracle software portfolio to an elastic Oracle cloud using the Oracle VM product line, Oracle Linux and Oracle Engineered Systems managed by Oracle Enterprise Manager 12c, with total control over Oracle processor licensing.


Last update: 01/03/14


This chapter of the Oracle Cloud Cookbook describes how to fault test Oracle VM. This chapter applies to
all Oracle VM releases.

Table of Contents
Oracle VM Fault Testing Introduction
Oracle VM Architectural Review and Oracle VM Fault Testing
Oracle VM Network Fault Testing
Oracle VM Storage Fault Testing
Oracle VM Master Server VIP Failover Testing

Change Log
Revision    Change Description    Updated By        Date
1.0         Document Creation     Roddy Rodstein    01/03/14

Oracle VM Fault Testing Introduction


Before an Oracle VM server pool is placed into production, both network and storage fault tests should be conducted on each Oracle VM server to find compatible o2cb timeout values, a NIC bond mode (1, 4 or 6) and network and storage configurations that provide predictable failure response. For example, Oracle VM Servers should be able to lose a bond port, a redundant network or storage switch and/or an HBA without node evictions. Incompatible o2cb timeout values, bond modes, network switch and storage configurations can trigger node evictions and unexpected server reboots.
Tip: The only way to know if an Oracle VM server pool is production ready is to fault test it.

Oracle VM Architectural Review and Oracle VM Fault Testing


Oracle VM uses OCFS2 to manage up to 32 clustered Oracle VM Servers in an Oracle VM server pool. OCFS2 monitors the status of each server pool member using a network and storage heartbeat. If a server pool member fails to update or respond to network and/or storage heartbeats, the server pool member is fenced from the pool, promptly reboots, then all HA-enabled virtual machines are restarted on a live node in the pool. Fencing forcefully removes dead servers from a pool to ensure that active servers are not obstructed from accessing a fenced server's cluster resources. The term node eviction is also used to describe server fencing and reboots. A best practice is to design Oracle VM Server pools with dedicated network and storage network channels to avoid contention and unexpected server reboots.
A slightly modified version of OCFS2 (o2dlm) is bundled with Oracle VM. The OCFS2 file system and cluster stack are installed and configured as part of an Oracle VM Server installation. The o2cb service manages the cluster stack and the ocfs2 service manages the OCFS2 file system. The o2cb cluster service is a set of modules and in-memory file systems that manage the ocfs2 file system service, network and storage heartbeats and node evictions.
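As a quick sanity check of the cluster stack described above, the services can be inspected from the command line on each Oracle VM Server. This is a minimal sketch, assuming a standard Oracle VM Server shell; verify the exact service names and output on your release.
# Confirm the o2cb cluster stack is loaded and the cluster is online
service o2cb status
# Confirm the ocfs2 file system service is running
service ocfs2 status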
The Oracle Cluster File System 2 (OCFS2) is a general-purpose journaling file system developed by Oracle. Oracle released OCFS2 under the GNU General Public License (GPL), version 2. The OCFS2 source code and its tool set are part of the mainline Linux 2.6 kernel and above. The OCFS2 source code and its tool set can be downloaded from kernel.org, the Oracle Public Yum Server and from the Unbreakable Linux Network.
Oracle VM facilitates centralized server pool management using an agent-based architecture. The Oracle VM agent is a Python application that is installed by default with Oracle VM Server. Oracle VM Manager dispatches commands using XML RPC over a dedicated network, called the Server Management network channel, using TCP/8899 to each server pool's Master Server agent. Each Master Server agent dispatches commands to subordinate agent servers using TCP/8899. The Oracle VM agent is also responsible for propagating the /etc/ocfs2/cluster.conf file to subordinate agent servers. There is only one Master Server in a server pool at any one point in time. The Master Server is the only server in a server pool to communicate with Oracle VM Manager.
Note: When Oracle VM Server is installed, the IP address entered during the installation is assigned to the
Server Management network channel.
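To verify the agent plumbing described above, the following checks can be run on an Oracle VM Server. This is a hedged sketch; output formats vary by release.
# Confirm the Oracle VM agent is running
service ovs-agent status
# Confirm the agent is listening on TCP/8899 on the Server Management channel
netstat -tlnp | grep 8899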
Once an Oracle VM server pool is created, two cluster configuration files are shared across all nodes in the server pool that maintain the cluster layout and cluster timeout configurations. The /etc/ocfs2/cluster.conf file maintains the cluster layout and the /etc/sysconfig/o2cb file maintains the cluster timeouts. Both configuration files are read by the user-space utility configfs. configfs communicates the list of nodes in the /etc/ocfs2/cluster.conf file to the in-kernel node manager, along with the resource used for the heartbeat to the in-kernel heartbeat thread.
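For reference, the sketch below shows the general shape of an /etc/ocfs2/cluster.conf for a two node pool. The node names, IP addresses and cluster name are hypothetical placeholders; on Oracle VM the real file is generated and propagated by the Oracle VM agent and should not be edited by hand.
node:
        ip_port = 7777
        ip_address = 10.0.0.11
        number = 0
        name = ovmsrv01
        cluster = ocfs2pool
node:
        ip_port = 7777
        ip_address = 10.0.0.12
        number = 1
        name = ovmsrv02
        cluster = ocfs2pool
cluster:
        node_count = 2
        name = ocfs2pool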
An Oracle VM server must be online to be a member of an Oracle VM pool/cluster. Once the cluster is online, each pool member starts a process, o2net. The o2net process creates TCP/IP intra-cluster node communication channels on port 7777 and sends regular keepalive packets to each node in the cluster to validate that the nodes are alive. The intra-cluster node communication uses the Cluster Heartbeat network channel. If a pool member loses network connectivity, the keepalive connection becomes silent, causing the node to self-fence. The keepalive connection timeout value is managed in each node's /etc/sysconfig/o2cb file's O2CB_IDLE_TIMEOUT_MS setting.
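A quick, hedged way to confirm that the o2net keepalive channel is up on a pool member is to look for TCP/7777 sessions on the Cluster Heartbeat network and to review the current network heartbeat timeout; adjust the commands to your tooling.
# List the o2net listener and established intra-cluster sessions on port 7777
netstat -an | grep 7777
# Review the current network heartbeat timeout on this node
grep O2CB_IDLE_TIMEOUT_MS /etc/sysconfig/o2cb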
Along with the keepalive packets that check for node connectivity, the cluster stack also employs a disk heartbeat check. o2hb is the process responsible for the disk heartbeat component of the cluster stack that actively monitors the status of all pool members. The heartbeat system uses a file on the OCFS2 file system to which each pool member periodically writes a block, along with a time stamp. The time stamps are read by each pool member and are used to check whether a pool member is alive or dead. If a pool member's block stops getting updated, the node is considered dead and self-fences. The disk heartbeat timeout value is managed in each node's /etc/sysconfig/o2cb file's O2CB_HEARTBEAT_THRESHOLD setting.
The OCFS2 network and storage heartbeat timeout values are managed in each Oracle VM Server's /etc/sysconfig/o2cb file. Each pool member must have the same /etc/sysconfig/o2cb values. The default timeout values should be tested and tuned to your network and storage infrastructure to provide predictable failure response.
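A simple way to confirm that every pool member carries identical timeout values is to read /etc/sysconfig/o2cb on each node. The host names below are hypothetical placeholders for your pool members.
# Compare o2cb timeout values across all pool members
for host in ovmsrv01 ovmsrv02 ovmsrv03; do
  echo "== $host =="
  ssh root@$host 'grep -E "O2CB_(IDLE_TIMEOUT_MS|HEARTBEAT_THRESHOLD|RECONNECT_DELAY_MS|KEEPALIVE_DELAY_MS)" /etc/sysconfig/o2cb'
done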
Tip: If a SAN storage controller failover takes 120 seconds, and OCFS2 is set to the default value of 60 seconds, Oracle VM Servers will reboot halfway through the controller failover. The O2CB_HEARTBEAT_THRESHOLD timeout value must be longer than the SAN storage controller failover timeout value.
The next example shows the default Oracle VM 3.2.x /etc/sysconfig/o2cb timeout values.
O2CB_IDLE_TIMEOUT_MS=60000 (60 secs)
O2CB_HEARTBEAT_THRESHOLD=91 (180 secs)
O2CB_RECONNECT_DELAY_MS=2000 (2 secs)
O2CB_KEEPALIVE_DELAY_MS=2000 (2 secs)
The next list explains each Oracle VM 3.2.x o2cb timeout value.
1- O2CB_IDLE_TIMEOUT_MS: Default setting is 60000 = 60 secs
Time in ms before a network connection is considered dead.
2- O2CB_HEARTBEAT_THRESHOLD: Default 91 = 180 secs
The disk heartbeat timeout is the number of two-second iterations before a node is considered dead. The exact formula used to convert the timeout in seconds to the number of iterations is:

O2CB_HEARTBEAT_THRESHOLD = (((timeout in seconds) / 2) + 1)
For example, to specify a 60 sec timeout, set it to 31. For 120 secs, set it to 61. The default for Oracle VM 3.2.x is 91 (180 secs), the default for Oracle VM 3.1.x is 31 (60 secs). A quick way to check this conversion is shown in the sketch after this list.
Note: The O2CB_HEARTBEAT_THRESHOLD should be configured using the Oracle VM Manager GUI. The max setting for 3.2.x and above in Oracle VM Manager is 300 seconds, i.e. O2CB_HEARTBEAT_THRESHOLD = 151 in /etc/sysconfig/o2cb.
3- O2CB_RECONNECT_DELAY_MS: Default 2000 = 2 secs
Min time in ms between connection attempts
4- O2CB_KEEPALIVE_DELAY_MS: Default 2000 = 2 secs
Max time in ms before a keepalive packet is sent
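The threshold formula from item 2 can be sanity-checked with simple shell arithmetic. This is only an illustrative sketch of the conversion, not a configuration step; the actual value should be set through Oracle VM Manager as noted above.
# Convert a desired disk heartbeat timeout in seconds to an O2CB_HEARTBEAT_THRESHOLD value
TIMEOUT_SECS=180
echo $(( TIMEOUT_SECS / 2 + 1 ))   # prints 91, matching the Oracle VM 3.2.x default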
Note: If reboots are occurring and the root cause has not yet been identified, the following timeout values may provide a temporary solution.
O2CB_IDLE_TIMEOUT_MS=90000 (90 secs)
O2CB_HEARTBEAT_THRESHOLD=91 (180 secs)
O2CB_RECONNECT_DELAY_MS=4000 (4 secs)
O2CB_KEEPALIVE_DELAY_MS=4000 (4 secs)
A minimum of one Ethernet network interface card (NIC) is required to install Oracle VM, although at least four 10G NICs, or four or more 1G NICs, are recommended for fault testing. Trunk ports/802.1Q and/or access ports with NIC bonding modes 1, 4 and 6 are supported and configured post Oracle VM Server installation with Oracle VM Manager and/or Oracle Enterprise Manager. Oracle VM supports two NIC ports per network bond and a total of five network bonds per Oracle VM Server.
The next figures show two different Oracle VM networking strategies.
Figure 1 shows a four port 10G 802.1q/LACP trunk port design with two mode 4 bonds.

Figure 2 shows a ten port 1G or 10G access port design with five mode 1, 4 or 6 bonds.


Tip: I highly recommend the four port 10G 802.1q/LACP trunk port design with two mode 4 bonds. Trunk
ports can have two or more VLANs per port. An access port is limited to one VLAN per port.
The Cluster Heartbeat, Storage and Virtual Machine network channels' NIC bond modes (1, 4 and 6) should be fault tested with various network switch settings to confirm which NIC bonding mode and network switch setting combination provides predictable failure response. Table 1 shows each of the Oracle VM network channels, NIC bonding modes, and a variety of network switch options that should be fault tested.
Table 1: Oracle VM network channels, NIC bonding modes and network switch options to fault test.

Network Channel: iLO (Integrated Lights Out Manager)
Description: iLO ports enable browser-based access to servers for installations and management. Note: iLO is not managed by Oracle VM Manager; iLO is included in this table for completeness.
Network Type: Class A, B or C
Bond Modes: Not applicable
Network Switch Options: Trunk ports and/or access ports. An access port is limited to one VLAN per port. A trunk port can have two or more VLANs per port. iLO ports should be on a network isolated from the server payload networks. Most network switches support the following standards: EtherChannel, Port Channels, Link Aggregation Control Protocol (LACP)/802.3ad, etc. Consult Cisco for network switch configuration details: http://www.cisco.com/en/US/tech/tk389/tk213/tsd_technology_support_proto...

Network Channel: Server Management
Description: The Server Management network is the communication channel for Oracle VM Manager and Oracle VM Server agents, as well as administrative ssh access to Oracle VM Servers and http/https/VNC access to and from Oracle VM Manager. Oracle VM Manager dispatches commands using XML RPC over the Server Management network using TCP/8899 to each server pool's Master Server agent. Each Master Server agent dispatches commands to subordinate agent servers using the Server Management network with TCP/8899. The Server Management ports should be on an isolated routable network. Note: The Server Management network should be dedicated to Oracle VM, not a shared corporate-wide server management network.
Network Type: Class A, B or C
Bond Modes: Mode 1, 4 or 6. Modes 1 and 6 do not require special network switch settings. Mode 4 requires the network switch to support 802.3ad/LACP.
Network Switch Options: Trunk ports and/or access ports. An access port is limited to one VLAN per port, i.e. the port can carry traffic for one VLAN. A trunk port can have two or more VLANs per port, i.e. the port can carry traffic for multiple simultaneous VLANs. Most network switches support the following standards: EtherChannel, Port Channels, Link Aggregation Control Protocol (LACP)/802.3ad, etc. Consult Cisco for network switch configuration details: http://www.cisco.com/en/US/tech/tk389/tk213/tsd_technology_support_proto...

Network Channel: Cluster Heartbeat
Description: An Oracle VM server must be online to be a member of an Oracle VM pool/cluster. Once the cluster is online, each pool member starts a process, o2net. The o2net process creates TCP/IP intra-cluster node communication channels on port 7777 and sends regular keepalive packets to each node in the cluster to validate that the nodes are alive. The intra-cluster node communication uses the Cluster Heartbeat network channel. If a pool member loses network connectivity, the keepalive connection becomes silent, causing the node to self-fence. The keepalive connection timeout value is managed in each node's /etc/sysconfig/o2cb file's O2CB_IDLE_TIMEOUT_MS setting.
Network Type: Class A, B or C. The network could be a private non-routable network.
Bond Modes: Mode 1, 4 or 6. Modes 1 and 6 do not require special network switch settings. Mode 4 requires the network switch to support 802.3ad/LACP.
Network Switch Options: Trunk ports and/or access ports. An access port is limited to one VLAN per port, i.e. the port can carry traffic for one VLAN. A trunk port can have two or more VLANs per port, i.e. the port can carry traffic for multiple simultaneous VLANs. Most network switches support the following standards: EtherChannel, Port Channels, Link Aggregation Control Protocol (LACP)/802.3ad, etc. Consult Cisco for network switch configuration details: http://www.cisco.com/en/US/tech/tk389/tk213/tsd_technology_support_proto...

Network Channel: Live Migration
Description: The Oracle VM Live Migration feature moves running virtual machines between server pool members across a LAN without loss of availability. Oracle VM uses an iterative precopy method to migrate running virtual machines between two pool members over the Live Migration network channel. A Live Migration event starts when the source server sends a migration request to the target server, which contains the virtual machine's resource requirements. If the target accepts the migration request, the source starts the iterative precopy phase. The iterative precopy phase starts by iteratively copying the guest's memory pages from the source to the target server over the Live Migration network channel. If a memory page changes during the precopy phase, it is marked dirty and resent. Once the majority of the pages are copied, the stop-and-copy phase begins. The stop-and-copy phase starts by pausing the guest while the remaining dirty pages are copied to the target, which usually takes 60 to 300 milliseconds. Once the pages are copied to the target, the virtual machine is started on the target server.
Network Type: Class A, B or C. The network could be a private non-routable network.
Bond Modes: Mode 1, 4 or 6. Modes 1 and 6 do not require special network switch settings. Mode 4 requires the network switch to support 802.3ad/LACP.
Network Switch Options: Trunk ports and/or access ports. An access port is limited to one VLAN per port, i.e. the port can carry traffic for one VLAN. A trunk port can have two or more VLANs per port, i.e. the port can carry traffic for multiple simultaneous VLANs. Most network switches support the following standards: EtherChannel, Port Channels, Link Aggregation Control Protocol (LACP)/802.3ad, etc. Consult Cisco for network switch configuration details: http://www.cisco.com/en/US/tech/tk389/tk213/tsd_technology_support_proto...

Network Channel: Storage
Description: The Storage network channel is only used for iSCSI and NFS storage. FC/SAN uses a dedicated fibre fabric.
Network Type: Class A, B, C or FC/SAN
Bond Modes: Mode 1, 4 or 6. Modes 1 and 6 do not require special network switch settings. Mode 4 requires the network switch to support 802.3ad/LACP.
Network Switch Options: Trunk ports and/or access ports. An access port is limited to one VLAN per port, i.e. the port can carry traffic for one VLAN. A trunk port can have two or more VLANs per port, i.e. the port can carry traffic for multiple simultaneous VLANs. Most network switches support the following standards: EtherChannel, Port Channels, Link Aggregation Control Protocol (LACP)/802.3ad, etc. Consult Cisco for network switch configuration details: http://www.cisco.com/en/US/tech/tk389/tk213/tsd_technology_support_proto...

Network Channel: Virtual Machine(s)
Description: The virtual machine network channels provide access to one or more networks provisioned for virtual machines.
Network Type: Class A, B or C
Bond Modes: Mode 1, 4 or 6. Modes 1 and 6 do not require special network switch settings. Mode 4 requires the network switch to support 802.3ad/LACP.
Network Switch Options: Trunk ports and/or access ports. An access port is limited to one VLAN per port, i.e. the port can carry traffic for one VLAN. A trunk port can have two or more VLANs per port, i.e. the port can carry traffic for multiple simultaneous VLANs. Most network switches support the following standards: EtherChannel, Port Channels, Link Aggregation Control Protocol (LACP)/802.3ad, etc. Consult Cisco for network switch configuration details: http://www.cisco.com/en/US/tech/tk389/tk213/tsd_technology_support_proto...

Tip: A best practice is to use a minimum of three nodes per cluster to ensure quorum and to be able to generate meaningful network heartbeat (O2CB_IDLE_TIMEOUT_MS timeout) fault tests. A known limitation with two-node clusters is that a network failure causes the node with the higher node number to self-fence.

Oracle VM Network Fault Testing


Network Bond Mode, Network Bond failover, and O2CB_IDLE_TIMEOUT_MS Fault Tests:
If a server pool member fails to respond to its network heartbeat, the server pool member is fenced from the pool, promptly reboots, then all HA-enabled virtual machines are restarted on a live node in the pool. The network heartbeat should be placed on a routable or non-routable dedicated class A, B or C network. A best practice is to provision a dedicated network channel for the network heartbeat to avoid network contention and unexpected reboots. The network heartbeat is referred to as the Cluster Heartbeat network channel in Oracle VM Manager.
Before an Oracle VM server pool is placed into production, network fault testing should be conducted on the Cluster Heartbeat, Storage and Virtual Machine network channels to find a suitable O2CB_IDLE_TIMEOUT_MS o2cb timeout value, bond mode (1, 4 or 6) and network switch configuration that provides predictable failure response. For example, Oracle VM Servers should be able to lose a bond port/NIC and/or a redundant network switch without node evictions. Incompatible o2cb timeout values, bond modes and network switch configurations can trigger node evictions and unexpected server reboots.
Tip: When fault testing a two-node cluster's O2CB_IDLE_TIMEOUT_MS timeout value, the node with the higher node number will reboot when the network fails. A best practice is to use a minimum of three nodes per cluster to ensure quorum and to be able to fault test the O2CB_IDLE_TIMEOUT_MS timeout value.
The next table shows each fault test with the expected failure results. The example is for a four port 10G 802.1q/LACP trunk port design with two mode 4 bonds. Modify the table to reflect your design and your fault tests.
1. Disable switch ports using the following fault patterns to test NIC, bond and switch failures and OCFS2 compatibility (a host-side alternative to disabling switch ports is sketched below).
2. Use the following commands to confirm the test results.
watch cat /proc/net/bonding/bond0
watch cat /proc/net/bonding/bond1
tail -f /var/log/messages
Note: While fault testing, run an HA-enabled VM on the tested Oracle VM server to confirm that the VM is not interrupted during the single port fault tests. When both Server Management bond ports fail at the same time, the Oracle VM server should fence and the HA-enabled VM should restart on a live node.
Complete and document each test to confirm which settings provide the expected failure results.
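If switch ports cannot be administratively disabled during a test window, a host-side alternative is to take the bond slave down with ip link, as referenced in step 1 above. This is a hedged sketch: it does not exercise the switch side of the failure, and the interface names are taken from the example design.
# Simulate a single bond port failure from the Oracle VM Server (example design: eth0 is a slave of bond0)
ip link set eth0 down
watch cat /proc/net/bonding/bond0
# Restore the port after the test
ip link set eth0 up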
Use Case 1: Disable the port and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value. Enable the port and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value.
Port(s): eth0 (bond0)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: Bond stays active, the node does not self-fence.
Results:

Use Case 2: Disable the port and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value. Enable the port and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value.
Port(s): eth1 (bond0)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: Bond stays active, the node does not self-fence.
Results:

Use Case 3: Disable the port and wait 60 secs. Enable the port and wait 60 secs.
Port(s): eth2 (bond1)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: 1) Bond stays active, the node does not self-fence. 2) VM does not lose connectivity.
Results:

Use Case 4: Disable the port and wait 60 secs. Enable the port and wait 60 secs.
Port(s): eth3 (bond1)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: 1) Bond stays active, the node does not self-fence. 2) VM does not lose connectivity.
Results:

Use Case 5: Disable both ports and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value. Enable both ports and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value.
Port(s): eth0 (bond0) and eth2 (bond1)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: 1) Bonds stay active, the node does not self-fence. 2) VM does not lose connectivity.
Results:

Use Case 6: Disable both ports and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value. Enable both ports and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value.
Port(s): eth1 (bond0) and eth3 (bond1)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: 1) Bonds stay active, the node does not self-fence. 2) VM does not lose connectivity.
Results:

Use Case 7: Disable both ports and wait 60 secs. Enable both ports and wait 60 secs.
Port(s): eth2 and eth3 (bond1)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: Bond loses connectivity, VMs lose connectivity, the node does not self-fence.
Results:

Use Case 8: Disable both ports and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value. Enable both ports and wait +20 secs over the O2CB_IDLE_TIMEOUT_MS timeout value.
Port(s): eth0 and eth1 (bond0)
Bond Mode: Mode x | Switch Mode: LACP
Expected Results: 1) At the O2CB_IDLE_TIMEOUT_MS timeout value the node self-fences and reboots. 2) After ~60 secs the VM restarts on a live node. 3) The node successfully reboots and rejoins the pool.
Results:

Oracle VM Storage Fault Testing


If a server pool member fails to read and write storage heartbeats, the server pool member is fenced from the pool, promptly reboots, then all HA-enabled virtual machines are restarted on a live node in the pool. The storage heartbeat should be on a dedicated IP or Fibre Channel network. A best practice is to use dedicated, not shared, storage with performance monitoring and storage quota alerts to avoid contention and full disks that may cause unexpected reboots. A best practice with iSCSI and NFS storage is to provision the storage on a dedicated class A, B or C network channel to avoid network contention and unexpected server reboots.
Oracle VM uses two different types of storage repositories. The first type of storage repository, called a pool file system, is used to host a server pool's cluster configurations, including the storage heartbeat. There can only be one pool file system per server pool. The other type of storage repository, called a virtual machine file system, is used to host virtual machine configuration files and disks. There can be one or more virtual machine file system repositories in a server pool. Virtual machine file system repositories do not have a storage heartbeat.
The storage heartbeat, also known as a quorum disk, is used to monitor the status of each Oracle VM Server in a pool. With a quorum disk, every Oracle VM Server in a pool regularly reads and writes a small amount of status data to a reserved section of the pool file system. Each Oracle VM Server writes its own status and reads the status of all the other Oracle VM Servers in the pool. If any Oracle VM Server in a pool fails to update its status within its O2CB_HEARTBEAT_THRESHOLD o2cb timeout value, the Oracle VM Server is fenced from the pool, promptly reboots, then all HA-enabled virtual machines are restarted on a live Oracle VM Server in the pool.
Before an Oracle VM server pool is placed into production, storage fault testing should be conducted on the quorum disk to find a suitable O2CB_HEARTBEAT_THRESHOLD o2cb timeout value and SAN and FC switch configurations that provide predictable failure response. For example, Oracle VM Servers should be able to lose one HBA without node evictions. Incompatible o2cb timeout values and SAN and FC switch configurations can trigger node evictions and unexpected server reboots.
Note: While fault testing, run an HA-enabled VM on the Oracle VM server that will be tested to confirm that the VM is not interrupted during the single port fault tests. When both HBAs fail at the same time, the Oracle VM server should fence and the HA-enabled VM should restart on a live node.
The next table shows each fault test with the expected failure results. Modify the table to reflect your design and your fault tests.
Use the following commands to confirm the test results.
watch multipath -ll
tail -f /var/log/messages
In the following example, each host has two HBAs installed with a single path to each fabric. Each fabric
then has two connections (one to each SP) to the array for a total of four paths on the array-side of the
fabric.
Complete and document each test to confirm which settings provide the expected failure results.
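Before and after each HBA test, the host-side path state can be reviewed with the commands below. This is a hedged sketch: the sysfs paths are typical for Fibre Channel HBAs on an Oracle Linux based dom0, but verify them on your hardware.
# List FC HBAs and their current link state
ls /sys/class/fc_host
cat /sys/class/fc_host/host*/port_state
# Confirm that multipath still shows the expected number of active paths
multipath -ll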
Use Case 1: Disable the port and wait +20 secs over the O2CB_HEARTBEAT_THRESHOLD timeout value.
Port(s): HBA0
Bond Mode: ALUA Mode 4
Expected Results: HBA0 paths go down, HBA1 paths stay active, the node does not self-fence.
Results:

Use Case 2: Disable the port and wait +20 secs over the O2CB_HEARTBEAT_THRESHOLD timeout value.
Port(s): HBA1
Bond Mode: ALUA Mode 4
Expected Results: HBA1 paths go down, HBA0 paths stay active, the node does not self-fence.
Results:

Use Case 3: Disable both ports and wait +20 secs over the O2CB_HEARTBEAT_THRESHOLD timeout value.
Port(s): HBA0 and HBA1
Bond Mode: ALUA Mode 4
Expected Results: After the O2CB_HEARTBEAT_THRESHOLD timeout value the node self-fences.
Results:

Oracle VM Master Server VIP Failover Testing


Oracle VM facilitates centralized server pool management using an agent-based architecture. The Oracle VM agent is a Python application that is installed by default with Oracle VM Server. Oracle VM Manager dispatches commands using XML RPC over a dedicated network using TCP/8899 to each server pool's Master Server. Each Master Server dispatches commands to subordinate agent servers using TCP/8899. There is only one Master Server agent in a server pool at any one point in time. The Master Server agent is the only server in a server pool to communicate with Oracle VM Manager. Agent intra-component traffic should be isolated to a dedicated class A, B or C Server Management network channel.
To address the single point of failure for the Master Server agent, the server pool "Virtual IP" feature was introduced. The Virtual IP feature detects the loss of the Master Server agent and automatically fails over the Master Server role to the first node that can lock the cluster.
To test Virtual IP failover, first confirm which node is the Master Server by accessing Oracle VM Manager => Servers and VMs => Right Click the desired Server Pool => Confirm the host name in the Master Server drop-down list. Next, as root, access the Master Server and stop the ovs-agent service by typing service ovs-agent stop. After 60 seconds, ssh to the Virtual IP address to confirm that the Master Server agent failed over to a new node.
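The failover test described above can be captured as a short command sequence. This is a hedged sketch; the Virtual IP below is a documentation placeholder and should be replaced with your server pool's Virtual IP.
# On the current Master Server, stop the agent to trigger a Master Server failover
service ovs-agent stop
# After roughly 60 seconds, connect to the pool's Virtual IP and confirm a new node answers
ssh root@192.0.2.10 'hostname; service ovs-agent status'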

