
OS ENVIRONMENT

Architecting Linux High-Availability Clusters

By Tau Leng; Jenwei Hsieh, Ph.D.; and Edward Yardumian

This article, the first in a series on Linux high-availability (HA) clusters, provides an overview of Linux HA clustering. It describes the two common types of clusters: HA IP clusters, which use the Linux Virtual Server for IP high availability and load balancing, and HA application clusters. Future articles in the series will cover product-specific implementations and features.

Linux is known as a stable operating system. However, a
Linux client/server configuration can have several points
of failure, including the server hardware, the networking
components, and the server-based applications. As more
administrators choose Linux for critical applications, the
demand for high-availability (HA) clustering for the Linux
platform is increasing.
In response, a number of Linux distributors have
designed and implemented bundled HA solutions in their
products, and numerous third-party add-ons are now available. However, several aspects of these technologies are not
always clear, such as how the technologies work, the types of
applications for which they are suitable, and the kinds of
hardware required.

HA Clustering Provides Availability, Performance, and Capacity
For networked computers, clustering is the process of connecting multiple systems together to provide greater overall system
availability, performance, capacity, or some combination of
these. Because the term clustering itself is so broad, other terms, such as load balancing, failover, parallel, and Beowulf, are used to describe specific cluster implementations. For example, Beowulf clusters are designed to provide scalability and parallel processing for computational functions. HA clustering solutions, however, seek to provide enhanced availability for a service or application.
Common Types of HA Clusters in Linux
Two common types of HA clusters are emerging in the
Linux environment: HA IP clusters and HA application
clusters. HA IP clusters ensure availability for network
access points, which typically are IP addresses that clients
use to access network services. HA IP clusters on the Linux
platform achieve high availability using the Linux Virtual
Server (LVS) mechanism. By using this mechanism to support virtual IP addresses, some HA IP cluster implementations can also load-balance certain types of applications if
their contents are completely replicated on a pool of application servers. Applications that the HA IP cluster can load-balance include static Web and File Transfer Protocol (FTP) servers and video streaming servers.

Figure 1. Monitoring the Health of Application Servers (heartbeat links connect the active LVS server, the passive LVS server, and each application server)
HA application clusters, on the other hand, are more
suitable for stateful, transactional applications, such as database servers, Web application servers, file servers, and print
servers. HA application clusters ensure availability through
the failover of applications, along with all of the resources
that the applications need, such as disks, IP addresses, and software modules, to remaining servers.

HA IP Clusters: The LVS Presents a Single System
Most HA IP cluster implementations, such as Piranha, Red Hat High Availability Server 1.0, TurboCluster from TurboLinux, and Ultra Monkey (supported by VA Linux), use an LVS mechanism as well as a group, or "pool," of cloned application servers.

The LVS presents the pool of application servers to network clients as if it were a single system. The LVS is represented by the virtual IP address or addresses clients use to access the clustered services, including the specific port and protocol, either UDP/IP or TCP/IP. An LVS server maintains the LVS identity and dispatches client requests to a physically separate application server or pool of application servers. Network clients are kept unaware of the unique physical IP addresses used by the LVS server nodes or any of the application nodes; the clients access only a virtual IP address managed by the LVS.

The LVS server is responsible for routing client requests to cloned application servers. To accomplish this task, the LVS is configured with scheduling policies that allocate and forward incoming connections to the application servers. High availability is achieved by having multiple destinations capable of processing requests. If one of the application servers fails, one or more application servers will be available to continue service through the same virtual IP address.

Heartbeats Monitor Server Health
To provide uninterrupted service, the LVS continuously monitors the health of the application servers (see Figure 1). Health monitoring between the LVS and application servers ensures timely failure detection and cluster membership status. This monitoring is performed via a heartbeat mechanism managed by the LVS. Heartbeat packets are sent between cluster nodes at regular intervals (on the order of seconds). If a heartbeat is not received after a predefined period of time, typically a few heartbeat intervals, the absent machine is presumed failed. If this machine is an application node, the LVS server stops routing client requests to it until it is restored. Depending on the implementation, the heartbeat protocol may run through a TTY serial port, UDP/IP over Ethernet, or even over shared storage connectivity.

Providing continuous service requires that the applications are installed locally in each of the application servers. For this reason, the application servers are often referred to as "clones." Any data, such as Web or FTP content, must be completely replicated to all of the application servers to ensure a consistent response from the application server pool. In the event of a server failure, the LVS heartbeat mechanism detects the failure, makes the necessary changes to the cluster's membership, and continues forwarding requests to the remaining server or servers.

HA IP Clusters Support Load Balancing
HA IP clusters not only provide high availability for an IP address, but the manner in which client requests are forwarded from the LVS server to the cloned application servers also supports load balancing. In fact, load balancing, simply by adding additional application nodes when demand increases, can help the appropriate applications achieve tremendous scalability.

To help spread client requests, or workload, across the pool of application servers, each LVS implementation uses a set of basic scheduling policies. The most commonly used policies are round robin and least connections. Round robin simply forwards requests to each application server one at a time and perpetually repeats the process in the same order. With least connections, the LVS assesses the current number of connections on each application server and forwards the request to the server with the fewest.
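To make these two policies concrete, the following sketch (in Python, purely illustrative and not taken from any LVS implementation) shows how a dispatcher might choose among a pool of cloned application servers; the server names are hypothetical.

    from itertools import cycle

    # Hypothetical pool of cloned application servers.
    POOL = ["app1.example.com", "app2.example.com", "app3.example.com"]

    # Round robin: hand out servers in a fixed, repeating order.
    _rotation = cycle(POOL)

    def pick_round_robin():
        """Return the next server in the rotation."""
        return next(_rotation)

    # Least connections: track how many connections each server currently holds.
    active_connections = {server: 0 for server in POOL}

    def pick_least_connections():
        """Return the server currently holding the fewest connections."""
        return min(active_connections, key=active_connections.get)

    def dispatch(policy="rr"):
        """Choose a server under the given policy and record the new connection."""
        server = pick_round_robin() if policy == "rr" else pick_least_connections()
        active_connections[server] += 1
        return server

    def connection_closed(server):
        """Release a connection when a client disconnects."""
        active_connections[server] = max(0, active_connections[server] - 1)

A real LVS makes this decision in the kernel for each new connection; the sketch only shows the selection logic.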

Figure 2. Hosting the LVS Mechanism on an Application Server (a virtual IP on the client network; the active LVS and a passive LVS run on application servers that each hold the application and data)

Figure 3. Redundant LVS Server Mechanism (dedicated active and passive LVS servers present virtual IPs to the client network and forward requests over a private network to three application servers, each holding its application and data)

Some Linux distributions or products also use more advanced algorithms that can actually examine the load on each application server and distribute the incoming requests accordingly. For Web sites that do not maintain state information outside the Web server itself, most LVS implementations have a persistency mode that directs a given client back to the same application server throughout a session.
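A minimal sketch of how such a persistency mode can work, assuming a simple affinity table keyed by client IP address with a timeout; the timeout value is an assumption, and the code does not reflect the internals of any particular LVS release.

    import time

    PERSISTENCE_TIMEOUT = 300  # seconds a client stays pinned to one server (assumed value)

    # Maps client IP -> (assigned server, time of last request)
    affinity = {}

    def choose_server(client_ip, pick_new_server):
        """Return the server this client is pinned to, or pin it to a newly chosen one."""
        now = time.time()
        entry = affinity.get(client_ip)
        if entry is not None and now - entry[1] < PERSISTENCE_TIMEOUT:
            server = entry[0]
        else:
            server = pick_new_server()       # for example, round robin or least connections
        affinity[client_ip] = (server, now)  # refresh the timer on every request
        return server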
At a minimum, an HA IP cluster can be implemented by
using one of the application servers for the LVS mechanism
(see Figure 2). All requests are first handled by the server
running the LVS, which initially determines whether the
request will be handled locally or shipped to another application server. In this configuration, however, the LVS mechanism can become a bottleneck because one server handles
all routing functions and application requests.
An Active/Passive LVS Helps to Prevent Bottlenecks
Assigning smaller loads to the server running the LVS can
mitigate the risk of a bottleneck. Ideally, the LVS should be
built with a pair of dedicated servers, one actively functioning as the LVS, the other acting as its hot standby (see Figure 3).
This configuration is often referred to as an active/passive
LVS, because one server actively serves as the LVS and routes
requests to the application servers while the passive LVS
server waits to assume control only if the active server fails. In
a failure scenario, the standby server assumes the virtual IP
address of the LVS, while retaining its own unique physical IP
address, through a process referred to as IP failover. Clients are
automatically reconnected to the LVS running on the other
server without reconfiguration.
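The sketch below outlines the standby side of such an IP failover, assuming a helper that listens for heartbeats and another that plumbs the virtual IP locally; both helpers, along with the timing values, are hypothetical, and production packages handle many additional cases (split-brain protection, gratuitous ARP announcements, and so on).

    HEARTBEAT_INTERVAL = 2   # seconds between heartbeats (assumed)
    MISSED_LIMIT = 3         # missed intervals before the peer is presumed failed (assumed)

    def standby_loop(wait_for_heartbeat, acquire_virtual_ip):
        """Passive LVS node: watch the active node and take over its virtual IP on failure.

        wait_for_heartbeat(timeout) -> True if a heartbeat arrived within timeout seconds.
        acquire_virtual_ip()        -> hypothetical helper that configures the virtual IP
                                       on a local interface and announces it to the network.
        """
        missed = 0
        while True:
            if wait_for_heartbeat(timeout=HEARTBEAT_INTERVAL):
                missed = 0
            else:
                missed += 1
            if missed >= MISSED_LIMIT:
                acquire_virtual_ip()   # assume the LVS identity; clients reconnect transparently
                return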
Currently, most HA IP cluster distributions do not support active/active configurations. An active/active configuration would include two or more LVS servers, each active and
responsible for a different LVS in addition to being available
in the case of a failover. Furthermore, no LVS implementations currently support load-balancing configurations in
which multiple LVS servers share routing responsibilities for
the same virtual IP address.
As traffic to the site grows and the LVS server routes an
increasing number of requests, the LVS server may have to
be upgraded or replaced to ensure that CPU, memory, or
network bandwidth is adequate. To ease the load on an LVS
server that typically routes all application server responses
back to clients using network address translation (NAT),
responses from application servers can be sent directly to
clients by creating another physical route and using IP tunneling or direct routing techniques.
Data Replication is a Challenge
Data replication often is the greatest challenge when implementing HA IP clusters with a pool of application servers. If
just two application servers are required, a shared storage system
can be built using Dell's cluster-ready PERC 2/DC RAID controllers and PowerVault 200S or 210S storage systems with
Enclosure Services Expander Module (ESEM) or SCSI
Enclosure Management Module (SEMM) cluster modules (see
Figure 4). In shared storage configurations, both servers can
access the same set of files using the global file system to share
files between application servers. If more than two nodes are
required or if shared storage is not desired, InterMezzo's distributed file system enables directory tree replication and can be used to replicate files to the application servers' internal disks.
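Where a distributed file system is not in place, even a crude push of the content tree illustrates both the idea and the overhead of replication. The sketch below assumes the rsync utility is reachable over SSH from a management node; the clone host names and content path are placeholders.

    import subprocess

    # Placeholder clone servers and content path.
    CLONES = ["app1.example.com", "app2.example.com", "app3.example.com"]
    CONTENT = "/var/www/html/"

    def push_content():
        """Copy the static content tree to every clone after each update.

        Every change is copied to every node, which is why this style of
        replication becomes expensive for write-heavy or transactional data.
        """
        for host in CLONES:
            subprocess.run(
                ["rsync", "-a", "--delete", CONTENT, host + ":" + CONTENT],
                check=True,
            )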
When transactions are involved or when mirroring
must be nearly instantaneous, complex distributed locking
techniques often are required to maintain data integrity
and consistency. While replication and mirroring may
work for some applications that involve writes, the replication can incur substantial overhead. In the cases of
databases, messaging, and most application services, it is
difficult to implement HA IP clustering because of their

read/write and transactional natures and the complexities


in replicating or synchronizing their content. Therefore,
these applications are better suited for Linux HA application clusters.

Figure 4. Two-Node Shared Storage Configuration (two application servers exposing App 1 and App 2 virtual IPs to the client network and sharing external SCSI or Fibre Channel storage that holds the data of each application)

HA Application Clusters
HA application clusters, such as LifeKeeper from SteelEye, Convolo Cluster from Mission Critical Linux, RSF-1 from High-Availability.com, and VERITAS Cluster Server, are appropriate for transactional applications, such as databases, groupware, file systems, and other applications containing business logic. While the LVS mechanism is the enabling technology for HA IP clusters, HA application clusters take the concept of the LVS a step further to ensure the availability of applications. HA application clusters achieve this availability by continuously monitoring the health of an application and the resources the application depends on for normal operation, including the server it is running on. Should any of the application resources fail, the HA application cluster will restart, or fail over, the application on one of the remaining servers.

To ensure that all of the resources an application needs are closely monitored and will fail over to one of the remaining servers, they are grouped together in a resource hierarchy. In resource hierarchies, the resources are grouped and arranged in dependency trees so that they can be moved (between physical resources) or restarted on different servers. A dependency tree ensures that the resources come online in the right order. Lower-level resources such as disks and IP addresses are brought online first, and application modules are brought up last, after all the dependent resources are ready.

For example, the resource group hierarchy for a file share could include a disk where the files are stored, an IP address, a server name, and file share resources (see Figure 5). The disk resource and the IP address have no dependencies and are brought online first. The disk is among the first resources to come online because it would be futile to connect to the file share before the disk the files are stored on is online and mounted. Likewise, it is important to bring up the IP address before bringing up the server's name. Finally, after all of the required dependencies are brought online, the application services are made available on the network.

Figure 5. Example Resource Group Hierarchy (a file share application resource that depends on a network name, an IP address, and a disk)

Key elements that assist application failover are the resource manager and application recovery kits. The resource manager enables the user to define the resource hierarchy for applications and specify the dependencies among resources. Application recovery kits are tools, or a set of scripts, that provide the mechanism to automatically restart an application and all of its resources, in the proper order, on one of the remaining servers should a failure occur. Application recovery kits are usually provided by vendors of HA application clusters for packaged software, including databases and the most commonly used applications on Linux servers, such as Apache Web server, sendmail, and print services. Moreover, recovery kits at the system level for the Linux file system are becoming more widely available. For applications that have no associated recovery kit, users can create custom scripts by employing application programming interface (API) commands and utilities.
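As a rough illustration of the dependency ordering described above, the following sketch brings a hypothetical file-share resource group online from the bottom of its dependency tree upward; the resource names and start actions are invented for the example, and a real cluster manager or recovery kit would also stop the group in the reverse order.

    # Each resource lists the resources it depends on; names and actions are hypothetical.
    RESOURCE_GROUP = {
        "disk":       {"depends_on": [],                   "start": lambda: print("mount disk")},
        "ip_address": {"depends_on": [],                   "start": lambda: print("plumb IP address")},
        "net_name":   {"depends_on": ["ip_address"],       "start": lambda: print("register server name")},
        "file_share": {"depends_on": ["disk", "net_name"], "start": lambda: print("export file share")},
    }

    def bring_online(group):
        """Start resources so that every dependency is online before its dependents."""
        started, order = set(), []

        def start(name):
            if name in started:
                return
            for dependency in group[name]["depends_on"]:
                start(dependency)          # dependencies come up first
            group[name]["start"]()
            started.add(name)
            order.append(name)

        for name in group:
            start(name)
        return order                       # the reverse of this list is the shutdown order

    # On failover, the surviving node would run bring_online(RESOURCE_GROUP)
    # after assuming ownership of the shared disk.
    print(bring_online(RESOURCE_GROUP))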
Generally, three approaches exist to ensure that application data or storage remain available to the remaining servers after failover. Figure 6 summarizes the advantages and disadvantages of these approaches.

Figure 6. Approaches to Ensure Data or Storage Availability After Failover

Mirroring
  Description: Each server owns its own data storage; servers perform constant copying operations to achieve data synchronization.
  Advantages: High disaster recovery rate.
  Disadvantages: High network and system overhead due to data synchronization operations; potential data loss when failover occurs.

Shared Nothing
  Description: Servers are cabled to a shared storage enclosure, but each server owns its own set of RAID volumes; during failover, the surviving node assumes ownership of the disks and the service of the failed server.
  Advantages: Minimum network overhead for maintaining data consistency in the cluster.
  Disadvantages: Requires redundant connectivity and RAID technology to prevent the storage enclosure from becoming a single point of failure for the entire configuration.

Shared Everything
  Description: Servers share simultaneous access to the same disks.
  Advantages: Minimum network overhead for maintaining data consistency in the cluster.
  Disadvantages: Requires redundant connectivity, RAID technology, and complicated lock manager software to maintain data integrity.

Passive Standby and Active/Active Modes Ensure High Availability
Similar to the LVS concepts of active/passive and active/active, HA application clusters use the passive standby mode and active/active mode terminology. A straightforward approach to achieve high availability is the passive standby
mode, in which one server acts as the primary server, while a
secondary server remains available for use should the primary
server fail. In this passive backup mode, the secondary server
is not used for any other processing; it simply stands by to
take over if the primary server fails. This configuration
enables maximum resources to be available to the application in the event of a failure. However, this configuration is
expensive to implement because it requires twice the
amount of hardware to be purchased.
For all but the most mission-critical applications,
active/active configurations can be highly effective. In
active/active configurations, each server performs useful
processes, while still possessing the ability to take over for
another server in the event of a failure. Drawbacks of
active/active configurations include increased design complexity and the potential introduction of performance
issues upon failover.

Multinode Solutions Are Now Available


Although the most common HA application cluster configurations are currently for two nodes, several multinode
(more than two nodes) failover solutions for Linux have
recently become available. With multinode clusters, configurations such as N+1 and cascaded failover help administrators meet high-availability needs in a complex environment while also providing better resource utilization.
For example, in an N+1 configuration, a single dedicated
server runs in passive mode while the rest of the servers
actively process requests for their applications. If a server
completely fails, the passive node provides all the resources of an unused server, rather than squeezing the
application onto another server already responsible for
several applications.
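A minimal sketch of how a cluster manager might pick a failover target in such a configuration, preferring the dedicated passive node and falling back to the least-loaded active node (cascaded failover); the node names and group counts are hypothetical.

    # Hypothetical cluster membership: one dedicated standby plus active nodes.
    NODES = {
        "standby1": {"passive": True,  "healthy": True, "groups": 0},
        "app1":     {"passive": False, "healthy": True, "groups": 2},
        "app2":     {"passive": False, "healthy": True, "groups": 1},
    }

    def choose_failover_target(nodes, failed):
        """Pick the node on which a failed server's resource groups should restart."""
        candidates = {name: info for name, info in nodes.items()
                      if name != failed and info["healthy"]}
        # N+1: prefer the idle passive node so active servers are not overloaded.
        for name, info in candidates.items():
            if info["passive"]:
                return name
        # Cascaded failover: otherwise use the active node carrying the fewest groups.
        return min(candidates, key=lambda name: candidates[name]["groups"])

    print(choose_failover_target(NODES, failed="app1"))   # -> standby1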
Multinode configurations can be implemented either
by mirroring content locally to each server or through a
switched storage fabric. Mirroring often requires complex
replication techniques and network overhead to push and
pull the content to all of the servers. The technology is
now widely available to build switched fabrics by using
storage area networks (SANs). SANs (see Figure 7) provide multinode clusters with excellent server-to-storage
performance, even as additional servers are added to the SAN, as well as the ability to effectively scale the
amount of storage the cluster nodes use.

Combine Both HA Cluster Types for a Multitier Solution
High availability and scalability are equally important to the
construction of an e-commerce or a business-critical system.
Providing continuous service for a distributed, multitier
application is possible by deploying HA IP and HA application clusters together. Figure 8 shows an e-commerce
configuration in which a pair of LVS servers are responsible
for two Web-based applications, one for static content running on a pair of servers, and one for commerce applications running on three servers. Both of these sites are load
balanced through IP HA clustering and the active LVS
server. High availability for the database component of
the site is achieved with an active/active HA application
clustering solution. One node of the cluster runs a database

used for the site's catalog and inventory, while the other node runs the orders database. Although the two servers cannot process the same database concurrently, each server can run all of the databases simultaneously if one of the two servers fails.

Figure 7. A Four-Node Switched SAN Configuration (four servers with dual Fibre Channel HBAs connected through redundant switch fabrics A and B to storage processors A and B in a Fibre Channel storage enclosure)

Figure 8. Using HA IP and HA Application Clusters Together (an active/passive LVS server pair load-balances two static-content servers in HA IP Cluster 1 and three commerce application servers in HA IP Cluster 2, while an HA application cluster of two database servers on external SCSI or Fibre Channel storage hosts the inventory and orders databases)

Support is Growing
Support for high availability and scalable services under Linux is growing. As these technologies mature, we will cover more specific implementations and features in future articles. Presently, Linux is more widely used as the front end of distributed, multitier configurations for stateless mode operations, such as load-balanced Web serving. While the need for high availability and scalability expands far beyond Web farms, the technologies mentioned in this article provide a good starting point for advanced solutions to come. As HA implementations continue their migration from UNIX to Linux, the number of proven options developed in the UNIX space will continue to expand for Linux.

Tau Leng (leng_tau@dell.com) is a system engineer in the Scale Out Systems Group. His product development responsibilities include Dell cluster solutions, including Linux high-performance and high-availability clusters. Tau earned an M.S. in Computer Science from Utah State University. Currently he is a Ph.D. candidate in Computer Science at the University of Houston.

Jenwei Hsieh, Ph.D. (jenwei_hsieh@dell.com) is a member of the Scale Out Systems Group at Dell. He has published extensively in the areas of multimedia computing and communications, high-speed networking, serial storage interfaces, and distributed network computing. Jenwei has a Ph.D. in Computer Science from the University of Minnesota.

Edward Yardumian (edward_yardumian@dell.com) is a technologist specializing in distributed systems, cluster computing, and
Internet infrastructure in the Scale Out Systems Group in the
Enterprise Server Products division at Dell. Previously, Ed
was a lead engineer for Dell PowerEdge Clusters.
FOR MORE INFORMATION

High-Availability Linux Project: http://linux-ha.org/
Linux Virtual Server Project: http://www.linuxvirtualserver.org/
Red Hat High Availability Server: http://www.redhat.com/products/software/linux/haserver/details.html
SteelEye LifeKeeper: http://www.steeleye.com/
TurboLinux Cluster Server: http://www.turbolinux.com/products/tcs/
Mission Critical Convolo Cluster: http://www.missioncriticallinux.com/products/convolo/

