Академический Документы
Профессиональный Документы
Культура Документы
Content EXECUTIVE SUMMARY THIRD-PARTY LEGAL NOTICES LICENSING AND REGISTRATION TECHNICAL SUPPORT SCOPE OF DOCUMENT AUDIENCE BACKGROUND INTRODUCTION INTRODUCTION TO VCS TOPOLOGY SPLIT BRAIN OUTLINED WHAT IS THE PROBLEM? There are three types of split brain conditions: Regular split brain Serial Split brain Wide Area Split brain W HAT ARE THE MOST COMMON CASES FOR A SPLIT BRAIN TO HAPPEN? GENERAL NOTES OF I/O FENCING MEMBERSHIP ARBITRATION DATA PROTECTION SCSI3 Persistent Reservations for Private DiskGroups: SCSI3 Persistent Group Reservations Shared DiskGroups (CVM/CFS) AGENTS RELATED TO I/O FENCING DiskGroup Agent notes and attributes MonitorReservation (Boolean-scalar) Reservation CoordPoint Agent notes I/O FENCING CAN BE ENABLED FOR ALL ENVIRONMENTS NON SCSI3 BASED FENCING PREFERRED FENCING DEPLOYING I/O FENCING W ORKFLOW HOW TO DEPLOY I/O FENCING IN YOUR ENVIRONMENT. CHOOSING COORDINATION POINT TECHNOLOGY Disk-based coordination points CP Server based coordination points A combination of CP Servers and Coordinator using SCSI3-PR CHOOSING COORDINATION POINT PLACEMENT DEPLOYING I/O FENCING DEPLOYING PREFERRED FENCING (OPTIONAL) CP SERVER CONSIDERATIONS CP Server scalability requirements Clustering the CP-Server itself I/O FENCING DEPLOYMENT SCENARIOS SCENARIO 1: ALL NODES IN THE SAME DATA CENTER USING DISK BASED COORDINATION POINTS. SCENARIO 2: ALL CLUSTER NODES IN THE SAME DATACENTER, WHILE REDUCING THE
AMOUNT OF STORAGE USED FOR COORDINATOR DISKS SCENARIO 3: CAMPUS CLUSTER CONFIGURATION USING THREE SITES SCENARIO 4: REPLACING ALL COORDINATION DISKS WITH CP SERVERS AVAILABILITY SCENARIO 5: REPLACING ALL COORDINATION DISKS WITH CP SERVERS FLEXIBILITY
3 3 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 8 9 10 10 10 10 10 10 11 11 11 12 12 13 13 13 14 14 14 14 15 15 15 16 16 17 18 19 20 21
Page 2
Executive Summary
I/O Fencing provides protection against data corruption and can guarantee data consistency in a clustered environment. Data is the most valuable component in todays enterprises. Having data protected and therefore consistent at all times is a number one priority. This White Paper describes the different deployment methods and strategies available for I/O Fencing. It is designed to illustrate configuration options and provide examples where they are appropriate. Symantec has led the way in solving the potential data corruption issues that are associated with clusters. We have developed and adopted industry standards (SCSI-3 Persistent Group Reservations [PGR]) that leverage modern disk-array controllers that integrate tightly into the overall cluster communications framework.
Technical support
For technical assistance, visit: http://www.symantec.com/enterprise/support/assistance_care.jsp. Select phone or email support. Use the Knowledge Base search feature to access resources such as TechNotes, product alerts, software downloads, hardware compatibility lists, and our customer email notification service.
Scope of document
This document is intended to explain and clarify I/O Fencing for Veritas Cluster Server (VCS) Clusters. It will provide information to assist with adoption and configuration of I/O Fencing for VCS. Note that VCS is included in several product bundles from Symantec, including but not limited to Storage Foundation, Storage Foundation Cluster File System and Storage Foundation for Oracle RAC. The document describes how I/O Fencing operates, is deployed as well as an provides an outline of the available functionality. Installation and Administration procedures are well covered in public available documentation. This document focuses on the I/O Fencing functionality provided in VCS 5.1 SP1. However, information may or may not be applicable to earlier and later releases.
Audience
This document is targeted for technical users and architects who wish to deploy VCS with I/O Fencing. The reader should have a basic understanding of VCS. More information around VCS can be found here: http://www.symantec.com/business/cluster-server. I/O Fencing Deployment Considerations Whitepaper Page 3
Background
Clustering, in its nature, exposes a risk of data corruption, due to the fact that independent nodes have access to the same data. In its infancy, this technology caused data corruptions. As clustering evolved, different technologies have been developed to prevent data corruption. In short, the problem arises when two nodes are accessing the same data, independently of each other. This is termed split brain and is outlined in the Introduction chapter of this document (next page). Preventing data corruption during a split brain scenario is relatively easy. However, some cluster solutions handle this situation by forcing downtime to the applications. This is not acceptable for todays demanding environments. As VCS evolved, several methods of avoiding split brain and data corruption have been put into place. Since VCS 3.5 (Released in 2001), a feature named I/O Fencing has been available. I/O Fencing can eliminate the risk of data corruption in a split brain scenario. This document focuses on the various options for I/O Fencing, implementation considerations and guidelines. It provides a comparison between the different deployment methods. NOTE: VCS for Windows (known as Storage Foundation for Windows HA) uses another method to prevent data corruptions in split brain scenarios. Please refer to public documentation for the Storage Foundation for Windows HA release for more information.
Introduction
Introduction to VCS topology
Node-to-Node communication, also called a heartbeat link, is an essential factor in a cluster design. A single VCS cluster consists of multiple systems that are connected via specific heartbeat networks. In most cases, two independent heartbeat networks are used. Protecting the heartbeat networks is crucial for cluster stability. Sometimes, the heartbeat networks are referred to as private networks. VCS replicates the current state of all cluster resources from each cluster node to all other nodes in the cluster. State information is transferred over the heartbeat networks; hence all nodes have the same information about all cluster resources. VCS also recognizes active nodes, nodes joining and leaving the cluster, and faulted nodes over the heartbeat networks. Note that shared storage isnt required when using VCS. However, VCS is most commonly configured with shared storage.
Page 4
If all cluster heartbeat links fail simultaneously, it is possible for one cluster to separate into two or more subclusters. In this situation, each subcluster will not be aware of the status of the other subclusters. Each subcluster could carry out recovery actions for the departed systems. For example, a passive node can start to online service groups, despite that the application is online on the primary node. This concept is known as split brain.
There are three types of split brain conditions: Regular split brain
Given a local cluster, with possible mirroring of shared storage, if no protection is in place, its very likely that split brain will likely lead to corruption of data. I/O Fencing can provide data protection against this scenario.
global clusters is a manual operation, although some global clusters can be configured to perform automatic failovers. There are two ways for a wide area split brain to occur: If using manual failover: Heartbeat link between two or more clusters is down, and a System Administrator is bringing up applications that actually are online in other clusters. If using automatic failovers: When the heartbeat links between two or more clusters go down, the cluster can automatically bring up service groups on the remote cluster. As global heartbeats usually are deployed over networks that span multiple countries or even continents, its difficult to get appropriate reliability on those networks. In addition, global clusters are usually deployed for DR purposes. Many companies prefer to have manual DR operations. In global cluster configurations with VCS and Global Cluster Option, a steward process can be utilized. The steward process is run on a server on a third site to be used when heartbeat communication is lost between the primary site and the DR site. The DR site checks with the steward process, which is located outside of both the primary and DR sites, to determine if the primary site is down. NOTE: Wide area split brains are not handled by I/O Fencing. I/O Fencing operates on the individual cluster level, and is only for local clusters.
What are the most common cases for a split brain to happen?
Heartbeat networks disconnected, dividing cluster nodes into subclusters A cluster node hangs. Other nodes expects that this node is down, and starts action to prevent downtime (bringing up services) Operating System Break/Pause and Resume. If the break feature of an OS is used, the cluster expects that this node is down, and will start actions to prevent downtime. If the OS is resumed soon after, this can introduce risk of a data corruption. In addition, some virtualization technologies also support the ability to Pause a running Virtual Machine. I/O Fencing under VCS should be deployed to protect from all scenarios described above.
Membership Arbitration
Membership arbitration is necessary to ensure that when there is an issue with the cluster heartbeat network and the cluster members are unable to communicate, only a single subcluster should remain online. Arbitration is the process of determining which nodes are to remain online. The ultimate goal is to have a process to guarantee I/O Fencing Deployment Considerations Whitepaper Page 6
multiple servers in the same cluster are not attempting to startup the same application at the same time. It is also to be done rapidly, in a timely fashion, so as to avoid any chance of data corruption. Another reason membership arbitration is necessary is because systems may falsely appear to be down. If the cluster heartbeat network fails, a cluster node can appear to be faulted when it actually is not. Limited bandwidth, configuration issues, driver bugs, improper configuration or power outage of switches can cause heartbeat networks to fail. Even if no SPOFs (Single Points of Failure) should exist in the heartbeat configuration, human mistakes are still possible. Therefore, the membership arbitration functionality in the I/O Fencing feature is critical to cluster integrity. Membership arbitration guarantees against such split brain conditions. The key components for membership arbitration in VCS are coordination points. Coordination points provide a lock mechanism to determine which nodes will be forced to leave the cluster in the event of a split brain. A node must eject a peer from the coordination points before it can fence the peer from the data drives. The number of coordination points is required to be an odd number, three or greater. Most commonly, three coordination points are deployed. When a cluster node starts up, a component known as the vxfen kernel module will register to all coordination point. With Symantec fencing technologies, coordination points can be either disk devices (coordinator disks) or distributed server nodes (coordination point servers or, CP Servers). We will discuss these reasons more thoroughly in the following sections. Here is a chart with information to help you decide which I/O Fencing technology to implement.
Coordinator Disk Communication Benefits SAN Connection using SCSI3-PR SCSI3-PR based data Protection Disk based Membership Arbitration Guaranteed Data Protection Coordination Point Server Network to a CP Server Non-SCSI3 Fencing Network Membership Arbitration Wastes less storage space Designed to help with Campus Clusters split site problem with SCSI3-PR disks Drawbacks Uses Dedicated LUN per Coordinator Disk Some low-end storage arrays do not support SCSI3-PR Some Virtualization technologies do not support SCSI3-PR Primary use case Need Guaranteed Data Availability Campus Cluster across two sites Requires an additional server to run the CPS process Networks are not as Fault Tolerant as SANs
Both of these technologies can be used together to provide I/O Fencing for the same cluster. This can only occur when using SCSI3-PR based fencing. We can see an example of this configuration in the following diagram.
Page 7
In our diagram we have an example of a 2-node VCS cluster configured with 3 coordination points. The yellow balls on the CP Servers and the Coordinator Disk each represents a Fencing Key. When there are registrations on the coordination points, then the node can join the cluster, which is represented by the yellow ball next to the cluster node. The same is true for the second cluster node and the green balls. When at least one Coordinator Disk is used, SCSI3 based fencing is in use.
Data Protection
I/O Fencing uses SCSI3 Persistent Reservations (PR) for data protection. SCSI3-PR supports device access from multiple systems, or from multiple paths from a single system. At the same time it blocks access to the device from other systems, or other paths. It also ensures persistent reservations across SCSI bus resets. Note that SCSI3-PR needs to be supported by the disk array. Using SCSI3-PR eliminates the risk of data corruption in a split brain scenario by fencing off nodes from the protected data disks. If a node has been fenced off from the data disks, there is no possibility for that node to write data to the disks. The data disks will simply be in a read-only state. Note: SCSI3-PR also protects against accidental use of LUNs. For example, if a LUN is used on one system, and is unintentionally provisioned to another server, there is no possibility for corruption, the LUN will simply not be writable, and hence there is no possibility of corruption. Membership arbitration alone doesnt give a 100% guarantee against corruptions in split brain scenarios. 1. Kernel Hangs. If a system is hung, VCS will interpret this as Node Faulted and will take action to prevent downtime 2. Operating System break/resume used 3. Very busy cluster node will not allow heartbeating I/O Fencing Deployment Considerations Whitepaper Page 8
All those scenarios are rare, but they do happen. Lets take the first scenario, Kernel hang. 1. Within the VCS cluster, node1 hangs 2. VCS on node2 recognize this as a failure of node1, and brings online service groups that previously were running on node1 3. All service groups online on node2 4. Two minutes after the initial kernel hang on node1, the node wakes up. Most probably, data will be in the buffers, ready to be written down to the disk If we do not have SCSI3 Persistent Reservations here, there is a risk of corruption. Both nodes are writing to the same file system in an uncoordinated fashion, which will lead to data corruption If SCSI3-PR is enabled, the following will happen: 1. When the kernel hang is released, and node1 wakes up, it will try to write to the protected disks, but will be prevented, as VCS already has fenced off the node. Write errors will be present in the system log. A few seconds later, VCS will panic the node In this situation, data will be consistent and online, accessed by node2. To provide a further example for both Membership Arbitration and Data Protection let us look at when a cluster node/subcluster has ejected another cluster node/subcluster from the coordination point. One of the following things will happen: If SCSI3-PR Fencing is enabled (default and recommended): 1. Data disks will be fenced off from the leaving node(s) preventing them from accessing the disk, including flushing their cache. 2. A panic will be issued by VCS on the leaving node(s) 3. VCS will immediately bring online service groups which previously were online on the now leaving node(s), according to the configured failover policies. If Non-SCSI3 Fencing is enabled: 1. A panic will be issued by VCS on the leaving node(s) 2. VCS on the remaining nodes will wait for a significant amount of time (110 seconds in 5.1 SP1), before bring up faulted service groups (which were previously online) on the now remaining node(s). If no I/O Fencing is enabled: 1. Since each subcluster does not know the state of the other cluster nodes, VCS does not automatically startup service groups that are offline. In this case, you have a significant risk of corruption because each subcluster node could think they are the only remaining node and startup application and import disks that are already online and active on another node.
MonitorReservation (Boolean-scalar)
Symantec has noted that some array operations, for example online firmware upgrades, have removed the reservations. This attribute enables monitoring of the reservations. If the value is 1, and SCSI3 Based Fencing is configured, the agent monitors the SCSI reservations on the disks in the disk group. If a reservation is missing, the monitor agent function takes the resource offline. This attribute is set to 0 by default.
Reservation
The Reservation attribute determines if you want to enable SCSI-3 reservation. This attribute was added in VCS 5.1 SP1 to enable granular reservation configuration, for individual disk groups This attribute can have one of the following three values: ClusterDefault - The disk group is imported with SCSI-3 reservation if the value of the cluster-level UseFence attribute is SCSI3. If the value of the cluster-level UseFence attribute is NONE, the disk group is imported without reservation. SCSI3 - The disk group is imported with SCSI-3 reservation if the value of the clusterlevel UseFence attribute is SCSI3. NONE - The disk group is imported without SCSI-3 reservation. Default value for this attribute is ClusterDefault.
within their cluster service SG to ensure that the coordination points are currently active. Any issue with the coordination points will be logged in the Engine_A.log and if notification is enabled a message will be sent. Please see the bundled agent guide for more information on implementation. Here is an example of the agent configured within a main.cf cluster configuration: group vxfen ( SystemList = { sysA = 0, sysB = 1 } Parallel = 1 AutoStartList = { sysA, sysB } ) CoordPoint coordpoint ( FaultTolerance=0 )
Preferred Fencing
The I/O fencing driver uses coordination points to prevent complete split brain in the event of a VCS cluster communication breakdown. At the time of a network (private interconnects, heartbeat links) partition, the fencing driver in each subcluster races for the coordination points. The subcluster that grabs the majority of coordination points survives whereas the fencing driver causes a system panic on nodes from all other subclusters whose racer node lost the race.
Page 11
By default, the fencing driver favors the subcluster with maximum number of nodes during the race for coordination points. If the subclusters are equal in number, then VCS will decide. Note that this behavior doesnt take VCS service groups or applications into consideration. It is possible that a passive node survives, and that an active node is fenced off and will panic, leaving the passive node to possibly go to an active state. Using Preferred Fencing, you can favor one of the subclusters using predefined policies. As a note, just because the node or the node containing the preferred Service Group is preferred, that does not mean it will win the Fencing Race. If the preferred node does not have access to the Coordination Points, it will lose the race regardless of the Preferred Fencing settings. Preferred Fencing is controlled by the PreferredFencingPolicy attribute, found at the cluster level. The following values are possible for the PreferredFencingPolicy attribute: Disabled (default) Enables the standard node count and node number based fencing policy as described above. Preferred fencing is disabled. Group Enables Preferred Fencing based on Service Groups. Preferred Fencing using Service Group priority will favor the subcluster running with most critical service groups in an online state. Criticality of a Service Groups is configured by weight. Each service group should be configured with a weight using the Priority attribute. Note that weight from all service groups currently online in the subcluster will be combined on the racer node. The racer node with the highest combined weight will be favored. System Enables Preferred Fencing based on Systems. Preferred Fencing using System priority will prioritize certain cluster nodes. For example, if one cluster node is more powerful in terms of CPU/Memory than others or if it location makes it higher priority, then we can give it a higher priority from a fencing perspective. Preferred Fencing for systems is controlled by the System level FencingWeight attribute for each individual node. The calculation of combining the FencingWeight values for the subcluster and comparing the racer nodes to determine who should win continues. Note: Giving a Node or Service Group Priority does not mean that it will win the race. If the subcluster loses access to the SAN or the network and is unable to obtain the fencing reservations, it will not be able to win the race. Preferred Fencing is giving preference to a specific system or service group, it does not guarantee a winner.
Page 13
CP Server considerations
This section contains general considerations for CP Server deployments.
Page 15
Scenario 1: All nodes in the same Data Center using Disk based coordination points.
This is the simplest scenario, suitable for customers who are deploying clustering within the same datacenter at the same location. In this scenario, you can place all coordination points on LUNs in the array. This is the most common configuration as it was previously the only option customers had to receive SCSI3-PR protection. Customers take three LUNs and create a coordinator disk group after validating SCSI3-PR compatibility with the vxfentsthdw command. Please see the VCS Administrator Guide for more information.
Page 16
Scenario 2: All cluster nodes in the same datacenter, while reducing the amount of storage used for coordinator disks
2-node cluster using SCSI3 Fencing with 2-CPS and 1 Coordinator Disk
In this scenario, the goal is to ensure that SPOFs are reduced in the configuration while reducing the amount of storage used for coordination points. This configuration has the two of the three coordination points as CP Servers. It continues to provide SCSI3-PR data protection. The CP Servers can service up to 128 clusters. This scenario uses a single Coordinator Disk along with 2 Coordination Point Servers. It reduces the amount of disk space used for coordination points, which still continuing to provide data protection and membership availability with server based fencing.
Page 17
Campus cluster using SCSI3 Fencing with a Coordinator Disk on each site and a CPS on a 3rd site
Campus Cluster environment with two main sites and a third site In this scenario, the cluster is stretched between two main sites. Campus cluster requirements apply here, and those can be found in the VCS Administration Guide. Preferably would be to put one coordination point in each of the two sites, and a third coordination point in a remote site. Typically, a Campus Cluster is stretched between two data centers, with a distance of 5-80 kilometers or 3-50 miles. Note that this limitation is due to the ability of writing data in a synchronous manner with decent performance. The distance may be less or more depending on network latency. The CP Server was originally designed to protect this scenario. If a campus cluster was to utilize all coordinator disks, then one site would have a majority of the disks. If there was a site failure, the secondary site would not be able to come online because it would not have access to more than of the coordination points. The CPS on the rd 3 site resolves this issue.
Page 18
4 CP Servers spread throughout the environment to provide high availability to the CPS
If an enterprise determines to replace all coordination disks with CP servers, availability of the CP servers is crucial. Guaranteeing availability for the CP servers can be done with VCS as described earlier. In this scenario, the cluster environment is located on one site. Each coordination point server will be made highly available within a production cluster. In this configuration 4 CP Servers would be needed. Each cluster requires access to 3 CP Servers and they are unable to use a CPS contained within their cluster configuration. The architecture should distribute the CP Servers throughout the computing environment and place the instances in clusters with uptime requirements to ensure the CPS instance is available to service the other clusters. It is recommended to spread out the CP Servers throughout the environment to reduce Single Points of Failure (SPOF). Also, it is recommended not to have all of the CP Servers on the same VMware ESX host.
Page 19
If an enterprise determines to replace all coordination disks with CP servers, and computing resources are scarce each CPS can run in one-node cluster as described earlier. VCS can ensure that the CPS application remains online, guaranteeing availability and access to the VCS cluster nodes. Note: Using 3 CP Servers does not prevent clusters from implementing SCSI3-PR for data protection. SCSI3 would be implemented in the DiskGroup agent within the VCS configuration.
Page 20
Page 21
About Symantec Symantec is a global leader in providing security, storage and systems management solutions to help businesses and consumers secure and manage their information. Headquartered in Mountain View, Calif., Symantec has operations in 40 countries. More information is available at www.symantec.com.
For specific country offices and contact numbers, please visit our Web site. For product information in the U.S., call toll-free 1 (800) 745 6054.
Symantec Corporation World Headquarters 350 Ellis Street Mountain View, CA 94043 USA +1 (408) 517 8000 1 (800) 721 3934 www.symantec.com
Copyright 2008 Symantec Corporation. All rights reserved. Symantec and the Symantec logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. 02/08