
Based on conferences that I have attended and e-mails that I receive, it always seems to me that when it comes to clustering, quorums are one of the most commonly misunderstood topics. In order to effectively administer a cluster, though, you need to understand what a quorum is and you need to know about the various types of quorums. In this article, I will explain what a quorum is and what it does. Since this tends to be a confusing topic for a lot of people, I will attempt to keep my explanations as simple as I can.

Clustering Basics
Before I can really talk about what a quorum is and what it does, you need to know a little bit about how a cluster works. Microsoft server products support two main types of clustering: server clusters and Network Load Balancing (NLB). The design philosophies behind these two types of clusters couldn't be more different, but the one thing that both designs share is the concept of a virtual server. There are several different meanings to the term virtual server, but in clustering it has a specific meaning: users (and other computers) see the cluster as a single machine even though it is made up of multiple servers. The single machine that the users see is the virtual server. The physical servers that make up the virtual server are known as cluster nodes.

Network Load Balancing


These two different types of clusters serve two completely different purposes. Network Load Balancing is known as a share-all cluster. It gets this name because an application can run across all of the cluster's nodes simultaneously. In this type of cluster, each server runs its own individual copy of an application, although each server can also connect to a shared database. Network Load Balancing clusters are most often used for hosting high-demand Web sites. In a Network Load Balancing architecture, each of the cluster's nodes maintains its own copy of the Web site. If one of the nodes goes down, the other nodes in the cluster pick up the slack, and if performance starts to dwindle as demand increases, you can simply add additional servers to the cluster and those servers will share the workload. A Network Load Balancing cluster distributes the current workload evenly across all of the cluster's active nodes: users access the virtual server defined by the cluster, and each request is serviced by the node that is the least busy.
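To make the "least busy node" idea concrete, here is a minimal sketch. It is not Microsoft's actual NLB algorithm (which uses a distributed filtering scheme); the node names are invented for the example.

# Toy illustration of dispatching requests to the least-busy online node.
class Node:
    def __init__(self, name):
        self.name = name
        self.active_requests = 0
        self.online = True

def dispatch(nodes, request_id):
    """Send a request to the least-busy node that is still online."""
    candidates = [n for n in nodes if n.online]
    if not candidates:
        raise RuntimeError("no active nodes: the virtual server is down")
    target = min(candidates, key=lambda n: n.active_requests)
    target.active_requests += 1
    return target.name

cluster = [Node("web1"), Node("web2"), Node("web3")]
for i in range(5):
    print(f"request {i} handled by {dispatch(cluster, i)}")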

Server Clusters
The other type of cluster is simply known as a server cluster. A server cluster uses a share-nothing architecture. This type of cluster is appropriate for applications that cannot be distributed across multiple servers. For example, you couldn't run a database server across multiple nodes, because each node would receive updates independently and the databases would fall out of synchronization. In a server cluster, only one node is active at a time. The other node or nodes are placed in a sort of standby mode, waiting to take over if the active node should fail. As you may recall, I said that server clusters are used for applications that cannot be distributed across multiple nodes. The reason that a node can take over running an application when the active node fails is that all of the nodes in the cluster are connected to a shared storage mechanism. This shared storage mechanism might be a RAID array, it might be a storage area network, or it might be something else. The actual media type is irrelevant, but the concept of shared storage is extremely important in understanding what a quorum is. In fact, server clusters are the only type of clustering that uses quorums; Network Load Balancing does not use them. Therefore, the remainder of this discussion will focus on server clusters.
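Here is a minimal sketch of that active/standby failover idea. The node names and the is_alive() health check are placeholders invented for the example, not a real cluster API.

# Sketch of "shared nothing" failover: one active node, the rest standing by.
class ServerCluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)     # e.g. ["nodeA", "nodeB"]
        self.active = self.nodes[0]  # only one node is active at a time
        self.failed = set()

    def is_alive(self, node):
        # Stand-in for a real health check (heartbeats over the cluster network).
        return node not in self.failed

    def fail_over(self):
        """If the active node has failed, promote the first healthy standby."""
        if self.is_alive(self.active):
            return self.active
        for node in self.nodes:
            if node != self.active and self.is_alive(node):
                # The standby attaches to the shared storage and restarts the
                # application; clients keep using the same virtual server name.
                self.active = node
                break
        return self.active

cluster = ServerCluster(["nodeA", "nodeB"])
cluster.failed.add("nodeA")
print(cluster.fail_over())   # nodeB takes over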

What is a Quorum?
OK, now that I have given you all of the necessary background information, let's move on to the big question: what is a quorum? To put it simply, a quorum is the cluster's configuration database. The database resides in a file named \MSCS\quolog.log, and the quorum is sometimes also referred to as the quorum log.

Although the quorum is just a configuration database, it has two very important jobs. First of all, it tells the cluster which node should be active. Think about it for a minute: in order for a cluster to work, all of the nodes have to function in a way that allows the virtual server to behave in the desired manner, and for this to happen, each node must have a crystal clear understanding of its role within the cluster. This is where the quorum comes into play. The quorum tells the cluster which node is currently active and which node or nodes are in standby. It is extremely important for nodes to conform to the status defined by the quorum. It is so important, in fact, that Microsoft has designed the clustering service so that if a node cannot read the quorum, that node will not be brought online as a part of the cluster.

The other thing that the quorum does is to intervene when communications fail between nodes. Normally, each node within a cluster can communicate with every other node over a dedicated network connection. If this network connection were to fail, the cluster would be split into two pieces, each containing one or more functional nodes that cannot communicate with the nodes on the other side of the communications failure. When this type of communications failure occurs, the cluster is said to have been partitioned. The problem is that both partitions have the same goal: to keep the application running. The application can't be run on multiple servers simultaneously, though, so there must be a way of determining which partition gets to run the application. This is where the quorum comes in. The partition that "owns" the quorum is allowed to continue running the application; the other partition is removed from the cluster.
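As a rough illustration of that arbitration rule (the node names are invented for the example and this is not a real cluster API):

# The partition that owns the quorum keeps running; the rest are evicted.
def arbitrate(partitions, quorum_owner):
    """partitions: list of sets of node names;
    quorum_owner: the node that currently owns the quorum resource."""
    survivors, evicted = [], []
    for partition in partitions:
        if quorum_owner in partition:
            survivors.append(partition)   # keeps running the virtual server
        else:
            evicted.append(partition)     # removed from the cluster
    return survivors, evicted

keep, drop = arbitrate([{"node1"}, {"node2", "node3"}], quorum_owner="node2")
print("continues:", keep, " evicted:", drop)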

Types of Quorums
So far in this article, I have been describing a quorum type known as a standard quorum. The main idea behind a standard quorum is that it is a configuration database for the cluster and is stored on a shared hard disk accessible to all of the cluster's nodes. In Windows Server 2003, Microsoft introduced a new type of quorum called the Majority Node Set quorum (MNS). The thing that really sets an MNS quorum apart from a standard quorum is the fact that each node has its own, locally stored copy of the quorum database. At first, each node having its own copy of the quorum database might not seem like a big deal, but it really is, because it opens the door to long-distance clustering. Standard clusters are not usually practical over long distances because of the issues involved in accessing a central quorum database efficiently. However, when each node has its own copy of the database, geographically dispersed clusters become much more practical.

Although MNS quorums offer some interesting possibilities, they also have some serious limitations that you need to be aware of. The key to understanding MNS is to know that everything works based on majorities. One example of this is that when the quorum database is updated, each copy of the database needs to be updated. The update isn't considered to have actually been made until over half of the databases have been updated ((number of nodes / 2) + 1). For example, if a cluster has five nodes, then three nodes would be considered the majority. If an update to the quorum was being made, the update would not be considered valid until three nodes had been updated. Otherwise, if two or fewer nodes had been updated, then the majority of the nodes would still have the old quorum information, and therefore the old quorum configuration would still be in effect.

The other way that an MNS quorum depends on majorities is in starting the nodes. A majority of the nodes ((number of nodes / 2) + 1) must be online before the cluster will start the virtual server. If fewer than the majority of nodes are online, then the cluster is said to "not have quorum". In such a case, the necessary services will keep restarting until a sufficient number of nodes are present. One of the most important things to know about MNS is that you must have at least three nodes in the cluster. Remember that a majority of nodes must be running at all times. If a cluster has only two nodes, then the majority is calculated to be 2 ((2 nodes / 2) + 1). Therefore, if one node were to fail, the entire cluster would go down because it would not have quorum.
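The majority arithmetic described above is simple enough to check directly; here is a small sketch of it:

# MNS rule: an update (or a cluster start) only counts once
# (number_of_nodes / 2) + 1 nodes agree.
def mns_majority(node_count):
    return node_count // 2 + 1

for n in (2, 3, 4, 5):
    print(f"{n} nodes -> majority is {mns_majority(n)}")

# 5 nodes: 3 copies of the quorum database must be updated before the change
#          takes effect.
# 2 nodes: the majority is 2, so losing a single node loses quorum, which is
#          why an MNS cluster needs at least three nodes.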

Split-brain, Quorum, and Fencing - updated


In some ways, an HA system is pretty simple - it starts services, it stops them, and it sees if they and the computers that run them are still running. But there are a few bits of important "rocket science" hiding in there among all these apparently simple tasks. Much of the rocket science that's there centers around trying to solve a single thorny problem - split brain. The methods that are used to solve this problem are quorum and fencing. Unfortunately, if you manage an HA system you need to understand these issues. So this post will concentrate on these three topics: split-brain, quorum, and fencing.

If you have three computers and some way for them to communicate with each other, you can make a cluster out of them, and each can monitor the others to see if their peer has crashed. Unfortunately, there's a problem here - you can't distinguish a crash of a peer from broken communications with the peer. All you really know is that you can't hear anything from them. You're really stuck in a Dunn's law[1] situation - where you really don't know very much, but desperately need to. Maybe you don't feel too desperate yet. Perhaps you think that you don't need to be able to distinguish these two cases. The truth is that sometimes you don't need to, but much of the time you very much need to be able to tell the difference.

Let's see if I can make this clearer with an illustration. Let's say you have three computers, paul, silas, and mark, and paul and silas can't hear anything from mark and vice versa. Let's further suppose that mark had a filesystem /importantstuff from a SAN volume mounted on it when we lost contact with it, and that mark is alive but out of contact. What happens if we just go ahead and mount /importantstuff up on paul? The short answer is that bad things will happen[2]: /importantstuff will be irreparably corrupted as two different computers update the disk independently. The next question you'll ask yourself is "Where are those backup tapes?". That's the kind of question that's been known to be career-ending.

Split-Brain

"his problem of a subset of computers in a cluster beginning to operate autonomously from each other is called Split Brain)4+. In our e3ample above, the cluster has split into two subclusters& 5 paul, silas6 and 5mark6, and each subset is unaware of the others. "his is the perhaps most difficult problem to deal with in high-availability clustering. Although this situation does not occur fre#uently in practice, it does occur more often than one would guess. As a result, it!s vital that a clustering system have a way to safely deal with this situation. 7arlier I mentioned that there was information you really want to know, but don!t know. 73actly what information did I mean1 /hat I wanted to know was "is it safe to mount up /importantstuff somewhere else1". In turn, you could figure that out if you knew the answer to one of these two #uestions& "Is mark really dead1" which is one way of figuring out "Is mark going to write on the volume any more1" But, of course, since we can!t communicate with mark, this is pretty hard to figure out. %o, cluster developers came out with a kind of clever way of ensuring that this #uestion can be answered. /e call that answer fencing.

Fencing
Fencing is the idea of putting a fence around a subcluster so that it can't access cluster resources, like /importantstuff. If you put a fence between it and its resources, then suddenly you know the answer to the question "Is mark going to write on the volume any more?" - and the answer is no - because that's what the fence is designed to prevent. So, instead of passively wondering what the answer to the safeness question is, fencing takes action to ensure the "right" answer to the question. This sort of abstract idea of fencing is fine enough, but how is this fencing stuff actually done? There are basically two general techniques: resource fencing[4] and node fencing[5].

Resource fencing is the idea that if you know what resources a node might be using, then you can use some method of keeping it from accessing those resources. For example, if one has a disk which is accessed through a Fibre Channel switch, then one can talk to the Fibre Channel switch and tell it to deny the errant node access to the SAN.

Node fencing is the idea that one can keep a node from accessing all resources, without knowing what kind of resources it might be accessing or how one might deny access to them. A common way of doing this is to power off or reset the errant node. This is a very effective, if somewhat inelegant, method of keeping it from accessing anything at all. This technique is also called STONITH[6] - a graphic and colorful acronym standing for Shoot The Other Node In The Head.
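As a rough sketch of how the two approaches differ in shape (the FibreChannelSwitch and PowerController classes below are hypothetical stand-ins for real fencing agents, not an actual API):

# Resource fencing cuts a node off from a specific resource; node fencing
# (STONITH) keeps it from accessing anything at all.
class FibreChannelSwitch:
    def deny_access(self, node):
        print(f"switch: {node} can no longer reach the SAN")

class PowerController:
    def power_off(self, node):
        print(f"power controller: {node} has been powered off")

class ResourceFence:
    """Resource fencing: block the errant node at the storage switch."""
    def __init__(self, switch):
        self.switch = switch
    def fence(self, node):
        self.switch.deny_access(node)

class NodeFence:
    """Node fencing (STONITH): power the errant node off entirely."""
    def __init__(self, power_controller):
        self.power = power_controller
    def fence(self, node):
        self.power.power_off(node)

ResourceFence(FibreChannelSwitch()).fence("mark")
NodeFence(PowerController()).fence("mark")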

With fencing, we can easily keep errant nodes from accessing resources, and we can now keep the world safe for democracy - or at least keep our little corner of it safe for clustering. An important aspect of good fencing techniques is that they're performed without the cooperation of the node being fenced off, and that they give positive confirmation that the fencing was done. Since errant nodes are suspect, it's far better to rely on positive confirmation from a correctly operating fencing component than to rely on errant cluster nodes you can't communicate with to police themselves.

Although fencing is sufficient to ensure safe resource access, it is not typically considered sufficient for happy cluster operation, because without some other mechanism there are some behaviors it can get into which can be significantly annoying (even if your data really is safe). To discuss this, let's return to our sample cluster. Earlier we talked about how paul or silas could use fencing to keep the errant node mark from accessing /importantstuff. But what about mark? If mark is still alive, then it is going to regard paul and silas as errant, not itself. So, it would also proceed to fence paul and silas - and progress in the cluster would stop. If it is using STONITH, then one could get into a sort of infinite reboot loop, with nodes declaring each other errant and rebooting each other, coming back up and doing it all over again. Although this is kind of humorous the first time you see it in a test environment, in production with important services the humor of the situation probably wouldn't be your first thought. To solve this problem, we introduce another mechanism - quorum.

Quorum
One way to solve the mutual fencing dilemma described above is to somehow select only one of these two subclusters to carry on and fence the subclusters it can't communicate with. Of course, you have to solve it without communicating with the other subclusters - since that's the problem: you can't communicate with them. The idea of quorum represents the process of selecting a unique (or, for the mathematically inclined, distinguished) subcluster. The most classic solution to selecting a single subcluster is a majority vote: if you choose a subcluster with more than half of the members in it, then (barring bugs) you know there can't be any other subcluster like this one. So this looks like a simple and elegant solution to the problem. For many cases, that's true. But what if your cluster only has two nodes in it? Now, if you have a single node fail, then you can't do anything - no one has quorum. If this is the case, then two machines have no advantage over a single machine - it's not much of an HA cluster. Since 2-node HA clusters are by far the most common size of HA cluster, it's kind of an important case to handle well. So, how are we going to get out of this problem?
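The majority rule itself is a one-liner; here is a small sketch, using the node names from the example above:

# A subcluster has quorum only if it holds strictly more than half the nodes.
def has_quorum(subcluster_size, cluster_size):
    return subcluster_size > cluster_size / 2

print(has_quorum(2, 3))   # True:  {paul, silas} out of three nodes wins
print(has_quorum(1, 3))   # False: {mark} must not fence the others
print(has_quorum(1, 2))   # False: in a two-node cluster a lone survivor never
                          #        has quorum, which is the problem discussed next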

Quorum Variants and Improvements


What you need in this case is some kind of a third-party arbitrator to help select who can fence off the other nodes and allow you to bring up resources - safely. To solve this problem, there are a variety of other methods available to act as this arbitrator, either software or hardware. Although there are several methods available to use as an arbitrator, we'll only talk about one each of the hardware and software methods: SCSI reserve and the quorum daemon.

SCSI reserve: In hardware, we fall back on our friend SCSI reserve. In this usage, both nodes try to reserve a disk partition available to both of them, and the SCSI reserve mechanism ensures that only one of the two can succeed. Although I won't go into all the gory details here, SCSI reserve creates its own set of problems, including the fact that it won't work reliably over geographic distances. A disk which one uses in this way with SCSI reserve to determine quorum is sometimes called a quorum disk. Some HA implementations (notably Microsoft's) require a quorum disk.

Quorum daemon: In Linux-HA[7], we have implemented a quorum daemon, whose sole purpose in life is to arbitrate quorum disputes between cluster members. One could argue that for the purposes of quorum this is basically SCSI reserve implemented in software - and such an analogy is a reasonable one. However, since it is designed for only this purpose, it has a number of significant advantages over SCSI reserve - one of which is that it can conveniently and reliably operate over geographic distances, making it ideal for disaster recovery (DR) type situations. I'll cover the quorum daemon and why it's a good thing in more detail in a later posting. Both HP and Sun have similar implementations, although I have security concerns about them, particularly over long distances. Other than the security concerns (which might or might not concern you), both HP's and Sun's implementations are also good ideas.
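To illustrate only the core idea of such an arbitrator, here is a toy sketch. It is not the Linux-HA quorum daemon's actual protocol, and it ignores timeouts, security, and crashed holders:

# Grant "quorum" to the first subcluster that asks; refuse everyone else
# until it is released.
class QuorumArbitrator:
    def __init__(self):
        self.holder = None

    def request(self, subcluster_id):
        if self.holder in (None, subcluster_id):
            self.holder = subcluster_id
            return True                # this subcluster may fence and proceed
        return False                   # someone else already holds quorum

    def release(self, subcluster_id):
        if self.holder == subcluster_id:
            self.holder = None

arb = QuorumArbitrator()
print(arb.request("paul+silas"))   # True:  may fence mark and carry on
print(arb.request("mark"))         # False: must not bring up /importantstuff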

Arguably the best way to use these alternative techniques is not directly as a quorum method, but rather as a way of breaking ties when the number of nodes in a subcluster is exactly half the number of nodes in the cluster. Otherwise, these mechanisms can become single points of failure - that is, if they fail, the cluster cannot recover.
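A small sketch of that arrangement, where ask_arbitrator stands in for a hypothetical call to a quorum daemon or SCSI-reserve-style tie-breaker:

# Use normal majority voting when possible; consult the arbitrator only when
# the subcluster holds exactly half the nodes, so it is not a single point of
# failure.
def has_quorum(subcluster_size, cluster_size, ask_arbitrator):
    if 2 * subcluster_size > cluster_size:
        return True                   # clear majority: no arbitrator needed
    if 2 * subcluster_size == cluster_size:
        return ask_arbitrator()       # exact half: let the arbitrator decide
    return False                      # minority: never has quorum

# Two-node cluster split into two single-node partitions: only the partition
# the arbitrator favours may proceed.
print(has_quorum(1, 2, ask_arbitrator=lambda: True))    # this side wins
print(has_quorum(1, 2, ask_arbitrator=lambda: False))   # this side stays down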

Alternatives to Fencing

"here are times when it is impossible to use normal 4rd-party fencing techni#ues. 8or e3ample, in a split-site configuration =a cluster which is split across geographically distributed sites>, when inter-site communication fails, then attempts to fence will also fail. In these cases, there are a few self-fencing alternatives which one can use when the more normal third-party fencing methods aren!t available. "hese include&

Node suicide. If a node is running resources and it loses quorum, then it can power itself off or reboot itself (a sort of self-STONITH). The remaining nodes wait "long enough" for the other node to notice and kill itself. The problem is that a node which is sick might not succeed in killing itself, might not notice that it had a membership change, or might not notice that it had lost quorum. It is equally bad if notification of these events is simply delayed "too long". Since there is a belief that the node in question is, or at least might be, malfunctioning, this is not a trivial concern. In this case, the use of hardware or software watchdog timers becomes critical.

Self-shutdown. This self-fencing method is a variant on suicide, except that resources are stopped gracefully. It has many of the same problems, except that it is somewhat less reliable because the time needed to shut down resources can be quite long. As in the case above, the use of hardware or software watchdog timers becomes critical.
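A rough sketch of the node-suicide idea with a watchdog as the backstop follows; every function here is a placeholder invented for the example, not a real watchdog or cluster-membership API.

# The node reboots itself when it loses quorum; if it is too sick even to run
# this loop, the watchdog stops being petted and resets it anyway.
import time

WATCHDOG_TIMEOUT = 30      # seconds without a pet before the hardware watchdog fires

def still_have_quorum():
    return True            # placeholder for the real membership/quorum check

def reboot_self():
    print("lost quorum: rebooting (self-STONITH)")

def pet_watchdog():
    pass                   # placeholder: tell the watchdog we are still sane

def suicide_loop():
    while True:
        if not still_have_quorum():
            reboot_self()
            return
        pet_watchdog()     # never reached if this loop hangs, so the watchdog resets us
        time.sleep(1)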

Note that without fencing, the membership and quorum algorithms are extremely critical. You've basically lost a layer of protection, and you've switched from relying on a component which gives positive confirmation to relying on a probably faulty component to fence itself - and then hoping, without confirmation, that you've waited long enough before continuing.

Summary
Split-brain is the idea that a cluster can have communication failures which can cause it to split into subclusters. Fencing is the way of ensuring that one can safely proceed in these cases, and quorum is the idea of determining which subcluster can fence the others and proceed to recover the cluster services.

An Important Final Note


It is fencing which best guarantees the safety of your resources. Nothing else works quite as well. If you have fencing in your cluster software, and you have irreparable resources (i.e., resources that would be irreparably damaged in a split-brain situation), then you must configure fencing. If your HA software doesn't support (third-party) fencing, then I suggest that you consider getting a different HA package.
