
Failover Cluster

By Jacqueline Emigh

A failover cluster is a set of computer servers that work together to provide either high availability (HA)
or continuous availability (CA). If one of the servers goes down, another node in the cluster can
assume its workload with either minimum or no downtime through a process referred to as failover.

Some failover clusters use physical servers only, whereas others involve virtual machines (VMs).

The main purpose of a failover cluster is to provide either CA or HA for applications and services. Also
referred to as fault tolerant (FT) clusters, CA clusters allow end users to keep utilizing applications and
services without experiencing any timeouts if a server fails. With HA clusters, on the other hand, a
user might undergo a brief interruption in service, but the system will recover automatically with no
data loss and minimum downtime.

A cluster is made up of two or more nodes, or servers, which are generally connected through physical
cables in addition to software. Other kinds of clustering technology can be used for purposes such as
load balancing, storage, and concurrent or parallel processing. Some implementations combine
failover clusters with additional clustering technology.

To protect your data, a dedicated network connects the failover cluster nodes, providing essential CA
or HA backup.

How Failover Clusters Work
While CA failover clusters are designed for 100 percent availability, HA clusters aim for 99.999
percent availability, also known as “five nines,” which amounts to no more than 5.26 minutes of
downtime per year. As a trade-off for their greater availability, though, CA clusters are more costly
to implement, due to increased hardware requirements.
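The 5.26-minute figure follows directly from the availability percentage. The short Python sketch below shows the arithmetic; the function name is only illustrative, and it assumes a 365.25-day year, which is the convention that produces 5.26 minutes for five nines.

```python
# Assumption: a year is taken as 365.25 days, the convention that
# yields roughly 5.26 minutes of downtime for "five nines".
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare a few common targets, including the CA goal of 100 percent.
    for target in (99.9, 99.99, 99.999, 100.0):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.2f} minutes per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which helps explain why the last step from HA to CA is the expensive one.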

High Availability Failover Clusters


In a high availability cluster, groups of independent servers are loosely coupled to share resources
and data throughout the system. All nodes in a failover cluster have access to shared storage. High
availability clusters also include a monitoring connection which servers use to check the “heartbeat” or
health of one another. At least one of the nodes in a cluster is active, while at least one is passive.

In a simple two-node configuration, for example, if Node 1 fails, Node 2 uses the heartbeat connection
to recognize the failure and then configures itself as the active node. Clustering software installed on
every node in the cluster makes sure that clients connect to an active node.
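The two-node case can be sketched in a few lines of Python. This is a simplified simulation, not the behavior of any particular clustering product: the `Node` class, the `check_peer` function, and the three-second timeout are all assumptions made for illustration, and time is passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active: bool = False
    last_heartbeat: float = 0.0  # time at which this node last sent a heartbeat


def check_peer(standby: Node, peer: Node, now: float, timeout: float = 3.0) -> bool:
    """Promote the standby if the peer's heartbeat is older than the timeout.

    Returns True when a failover happened. The timeout value is hypothetical.
    """
    if peer.active and now - peer.last_heartbeat > timeout:
        peer.active = False
        standby.active = True
        return True
    return False


if __name__ == "__main__":
    node1 = Node("node1", active=True)
    node2 = Node("node2")

    # Node 1 sends heartbeats at t=0, 1, 2 and then fails silently.
    for t in (0.0, 1.0, 2.0):
        node1.last_heartbeat = t
        check_peer(node2, node1, now=t)

    # Node 2 keeps polling; once the silence exceeds the timeout it takes over.
    for t in (3.0, 4.0, 5.0, 6.0):
        if check_peer(node2, node1, now=t):
            print(f"t={t}: {node2.name} is now active")
            break
```

A real cluster also has to guard against both nodes believing they are active at once, which this sketch does not attempt to model.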

In larger configurations, cluster management can be performed by dedicated servers. A cluster
management server constantly sends out heartbeat signals to determine if any of the nodes is failing,
and if so, to direct another node to assume the load.

Some cluster management software provides HA for virtual machines (VMs) by pooling them and the
physical servers they reside on into a cluster. If failure occurs, the VMs on the failed host are restarted
on alternate hosts.
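As a rough illustration of that restart step, the Python sketch below moves the VMs of a failed host onto the surviving hosts. The function name and the "fewest VMs first" placement rule are assumptions chosen for simplicity; real cluster managers weigh CPU, memory, and other constraints.

```python
from typing import Dict, List


def restart_vms(placement: Dict[str, List[str]], failed_host: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed host's VMs restarted elsewhere.

    Each orphaned VM goes to the surviving host that currently runs the
    fewest VMs (a deliberately naive, hypothetical placement rule).
    """
    survivors = {host: list(vms) for host, vms in placement.items() if host != failed_host}
    if not survivors:
        raise RuntimeError("no surviving host available")
    for vm in placement.get(failed_host, []):
        target = min(survivors, key=lambda host: len(survivors[host]))
        survivors[target].append(vm)
    return survivors


if __name__ == "__main__":
    before = {"host1": ["vm-a", "vm-b"], "host2": ["vm-c"], "host3": []}
    after = restart_vms(before, failed_host="host1")
    for host, vms in after.items():
        print(host, vms)
```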

Shared storage does pose a risk as a potential single point of failure. However, the use of RAID 6
together with RAID 10 can help to ensure that service will continue even if two hard drives fail.

If all servers are plugged into the same power grid, electrical power can represent another single point
of failure. Yet the nodes can be safeguarded by equipping each with a separate uninterruptible power
supply (UPS).

Continuous Availability Failover Clusters


In contrast, a fault-tolerant cluster consists of multiple systems, which share a single copy of a
computer's OS. Software commands issued by one system are also executed on the other system. CA
can only be achieved by using a continuously available and nearly exact copy of a physical or virtual
machine running the service. This redundancy model is called 2N.

CA requires the organization to use formatted computer equipment, plus a secondary UPS. CA
systems can also compensate for many different sorts of failures.

A fault tolerant system can automatically detect a failure of not just a hard drive but a central
processing unit (CPU), I/O subsystem, power supply, or network component, for instance. The failure point
can be immediately identified, and a backup component or procedure can take its place instantly
without interruption in service.

In a CA failover cluster, the operating system (OS) is outfitted with an interface permitting a software
programmer to do checkpoints of critical data at predetermined points in a transaction.
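The checkpointing idea can be illustrated with a small Python sketch. It is not an actual OS interface: `checkpoint_store` is a hypothetical in-memory stand-in for storage that a surviving node could read, and `fail_at` exists only to simulate a crash.

```python
import copy
from typing import Callable, Dict, List

# Hypothetical stand-in for checkpoint storage visible to every node.
checkpoint_store: Dict[str, dict] = {}


def run_transaction(tx_id: str, steps: List[Callable[[dict], None]], fail_at: int = -1) -> dict:
    """Run the steps in order, saving a checkpoint after each one.

    A node that picks up the same tx_id resumes after the last saved
    checkpoint. fail_at simulates a crash just before that step index.
    """
    saved = checkpoint_store.get(tx_id, {"next_step": 0, "data": {}})
    state = copy.deepcopy(saved)
    for index in range(state["next_step"], len(steps)):
        if index == fail_at:
            raise RuntimeError(f"node failed before step {index}")
        steps[index](state["data"])
        state["next_step"] = index + 1
        checkpoint_store[tx_id] = copy.deepcopy(state)
    return state["data"]


if __name__ == "__main__":
    def debit(data: dict) -> None:
        data["debited"] = 100

    def credit(data: dict) -> None:
        data["credited"] = data["debited"]

    try:
        run_transaction("tx-1", [debit, credit], fail_at=1)  # first node crashes
    except RuntimeError as error:
        print("failure:", error)
    print(run_transaction("tx-1", [debit, credit]))  # second node resumes
```

Because the second run starts from the saved checkpoint, the debit step is not repeated after the simulated failure.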

Clustering software can also be used to group together two or more servers to act as a single virtual
server. You can also create many other CA failover setups. For example, a cluster might be configured
so that if one of the virtual servers fails, the others respond by temporarily removing the virtual server
from the cluster. It then automatically redistributes the workload among the remaining servers until the
downed server is ready to go online again.
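The remove-and-redistribute behavior described above might look like the following Python sketch. The `VirtualServer` class and its round-robin routing are illustrative assumptions, not the design of any specific clustering product.

```python
from typing import List, Set


class VirtualServer:
    """Several servers presented as one; failed members are skipped until restored."""

    def __init__(self, members: List[str]) -> None:
        self.members = list(members)
        self.down: Set[str] = set()
        self._next = 0

    def mark_down(self, member: str) -> None:
        """Temporarily remove a failed member from rotation."""
        self.down.add(member)

    def mark_up(self, member: str) -> None:
        """Return a recovered member to rotation."""
        self.down.discard(member)

    def route(self) -> str:
        """Pick the next healthy member in round-robin order."""
        healthy = [m for m in self.members if m not in self.down]
        if not healthy:
            raise RuntimeError("no healthy members")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice


if __name__ == "__main__":
    cluster = VirtualServer(["srv1", "srv2", "srv3"])
    cluster.mark_down("srv2")
    print([cluster.route() for _ in range(4)])  # srv2 receives no work
    cluster.mark_up("srv2")
    print([cluster.route() for _ in range(3)])  # srv2 is back in rotation
```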

An alternative to CA failover clusters is use of a “double” hardware server in which all physical
components are duplicated. Calculations are done independently and simultaneously on the same
hardware system. Yet this option can be even more expensive.

These “double” hardware systems perform synchronization by using a dedicated node that keeps tabs
on the results coming from both physical servers. Stratus, a maker of these specialized fault tolerant
hardware servers, promises that system downtime won’t amount to more than 32 seconds each year.
However, the cost of a Stratus server with dual CPUs in each synchronized module is estimated at
approximately $160,000 per module.
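To put the 32-second figure in context, it can be converted into an availability percentage. The Python sketch below does the conversion; the function name is illustrative and a 365.25-day year is assumed.

```python
# Assumption: a year is taken as 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def availability_percent(downtime_seconds_per_year: float) -> float:
    """Convert a yearly downtime figure into an availability percentage."""
    return 100.0 * (1.0 - downtime_seconds_per_year / SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"32 s/year     -> {availability_percent(32):.5f}%")
    print(f"5.26 min/year -> {availability_percent(5.26 * 60):.5f}%")
```

Thirty-two seconds of downtime per year works out to just under 99.9999 percent, roughly one order of magnitude better than the five nines targeted by HA clusters.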

Practical Applications of Failover Clusters


Ongoing Availability of Mission Critical Applications
Fault tolerant systems are a necessity for computers used in online transaction processing (OLTP)
systems. OLTP, which demands 100 percent availability, is used in airline reservations systems,
electronic stock trading, and ATM banking, for example.

Many other types of organizations also use either CA clusters or fault tolerant computers for mission
critical applications, such as businesses in the fields of manufacturing, logistics, and retailing.
Applications include e-commerce, order management, and employee time clock systems, for example.

For clustering applications and services requiring only “five nines” uptime, though, high availability
clusters are generally regarded as adequate.

Disaster Recovery
Disaster recovery is another practical application for failover clusters. Of course, it’s highly advisable
for failover servers to be housed at remote sites in the event that a disaster such as a fire or flood
takes out all physical hardware and software in the primary data center.

In Windows Server 2016 and 2019, for example, Microsoft provides Storage Replica, a technology
allowing replication of volumes between servers for disaster recovery. The technology includes a
“stretch failover” feature for failover clusters spanning two geographic sites.

By stretching failover clusters, organizations can replicate among multiple data centers. If a disaster
strikes at one location, all data continues to exist on failover servers at other sites.

Database Replication
According to Microsoft, the company originally introduced Windows Server Failover Cluster (WSFC) in
Windows Server 2008 mainly to protect “mission-critical” applications such as its SQL Server database
and Microsoft Exchange communications server.

Other database providers, too, offer failover cluster technology for database replication. MySQL
Cluster, for example, includes a heartbeat mechanism that detects failures and fails over to other
nodes in the cluster, typically within one second, with no service interruption to clients. A
geographic replication feature enables databases to be mirrored to remote locations.

Failover Cluster Types


VMware Failover Clusters
Among the virtualization products available, VMware offers several virtualization tools for VM clusters.
vSphere 6 Fault Tolerance provides a CA architecture that exactly replicates a VMware virtual
machine on an alternate physical host in case the main host server goes down.

A second product, VMware HA, follows the approach of providing HA for VMs by pooling them and
their hosts into a cluster for automatic failover. Using VMware HA in conjunction with VMware’s
Distributed Resource Scheduler (DRS) adds load balancing, for faster rebalancing of VMs after
VMware HA has moved the VMs to other hosts.

Windows Server Failover Cluster (WSFC)


You can create Hyper-V failover clusters with the use of WSFC, a feature in Windows Server 2016 and
2019 that monitors clustered physical servers, providing failover if needed. WSFC also monitors
clustered roles, formerly referred to as clustered applications and services. If a clustered role isn’t
working correctly, it is either restarted or moved to another node.
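The restart-or-move decision can be pictured with the Python sketch below. It is not WSFC's API or its actual policy: `recover_role` and the `max_restarts` threshold are hypothetical names used only to show the shape of the logic.

```python
from typing import List, Tuple


def recover_role(current_node: str, healthy_nodes: List[str],
                 restarts_so_far: int, max_restarts: int = 1) -> Tuple[str, str]:
    """Decide how to recover a failed role: restart in place or move it.

    Returns (action, node), where action is "restart" or "move".
    max_restarts is a hypothetical threshold, not a documented setting.
    """
    if current_node in healthy_nodes and restarts_so_far < max_restarts:
        return ("restart", current_node)
    candidates = [node for node in healthy_nodes if node != current_node]
    if not candidates:
        raise RuntimeError("no other healthy node can host the role")
    return ("move", candidates[0])


if __name__ == "__main__":
    nodes = ["node1", "node2"]
    print(recover_role("node1", nodes, restarts_so_far=0))  # first failure
    print(recover_role("node1", nodes, restarts_so_far=1))  # repeated failure
```

Restarting in place first keeps the role where its resources already are; moving it is the fallback when the restart does not help or the node itself is unhealthy.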

WSFC includes Microsoft’s earlier Cluster Shared Volume (CSV) technology to provide a
consistent, distributed namespace for accessing shared storage from all nodes. In addition, WSFC
supports CA file share storage for SQL Server and Microsoft Hyper-V cluster VMs. It also supports HA
roles running on physical servers and Hyper-V cluster VMs. Here is a Hyper-V cluster diagram.

SQL Server Failover Clusters


In SQL Server 2012, Microsoft introduced Always On, an HA solution that uses WSFC as a platform
technology, registering SQL Server components as WSFC cluster resources. According to Microsoft,
related resources are combined into a role which is dependent on other WSFC resources. WSFC can
then identify and communicate the need to either restart a SQL Server instance or automatically fail it
over to a different node.

Red Hat Linux Failover Clusters


OS makers other than Microsoft also provide their own failover cluster technologies. For example, Red
Hat Enterprise Linux (RHEL) users can create HA failover clusters with the High Availability Add-On
and Red Hat Global File System (GFS/GFS2). Support is provided for single-cluster stretch clusters
spanning multiple sites as well as multi-site or “disaster-tolerant” clusters. The multi-site clusters
generally use storage area network (SAN)-enabled data storage replication.

This article was originally published on May 15, 2019
