
Red Hat Clustering: Best Practices & Pitfalls

Lon Hohberger
Principal Software Engineer
Red Hat
May 2013
Red Hat Clustering: Best Practices & Pitfalls

- Why Cluster?
- I/O Fencing and Your Cluster
- 2-Node Clusters and Why they are Special
- Quorum Disks
- Service Structure
- Multipath Considerations in a clustered environment
- GFS2 – Cluster File System


Why Cluster?

- Application/Service Failover
  - Reduce MTTR
  - Meet business needs and SLAs
  - Protect against software and hardware faults
- Virtual machine management
  - Allow for planned maintenance with minimal downtime
- Load Balancing
  - Scale out workloads
  - Improve application response times


Why not Cluster?

- Often requires additional hardware
- Increases total system complexity
  - More possible parts that can fail
  - More failure scenarios to evaluate
  - Harder to configure
  - Harder to debug problems


Component Overview

- corosync – Totem SRP/RRP-based membership, VS messaging, closed process groups
- cman – quorum, voting, quorum disk
- fenced – handles I/O fencing for joined members
- Fencing agents – carry out fencing operations
- DLM – distributed lock manager (kernel)
- clvmd – cluster logical volume manager
- gfs2 – cluster file system
- rgmanager – cold failover for applications
- Pacemaker (TP) – next-generation CRM


Failure Recovery Overview

- corosync – Totem token is lost; Totem forms a new ring
- fenced enters recovery state – the quorate partition initiates fencing of dead node(s)
- DLM enters recovery state – locks held by dead node(s) are dropped
- clvmd, gfs2 enter recovery state – recover / replay journals
- rgmanager initiates cold failover of user applications


I/O Fencing

- An active countermeasure taken by a functioning host to isolate a misbehaving or presumed-dead host from shared data
- The most critical part of a cluster utilizing SAN or other shared storage technology
  - Despite this, not everyone uses it
  - How much is your data worth?
- Required by gfs2, clvmd, and the cold failover software shipped by Red Hat
- Utilized by RHEV, too – fencing is not a cluster-specific technology

I/O Fencing

- Protects data in the event of planned or unplanned system downtime
  - Kernel panic
  - System freeze
  - Live hang / recovery
- Enables nodes to safely assume control of shared resources when booted in a network partition situation

I/O Fencing

- SAN fabric and SCSI fencing are not fully recoverable
  - The node must typically be rebooted manually
  - Enables an autopsy of the node
  - Sometimes does not require additional hardware
- Power fencing is usually fully recoverable
  - Your system can reboot and rejoin the cluster – thereby restoring capacity – without administrator intervention
  - This is a reduction in MTTR


I/O Fencing – Drawbacks

- Difficult to configure
  - No automated way to "discover" fencing devices
  - Fencing devices are all very different and have different permission schemes and requirements
- Typically requires additional hardware
  - The additional cost is often not considered when purchasing systems
  - A given "approved" IHV may not sell the hardware you want to use

I/O Fencing – Best Practices

- Integrated power management
  - Use servers with dual power supplies
  - Use a backup fencing device
  - IPMI-over-LAN fencing usually requires disabling acpid (see the sketch after this list)
- Single-rail switched PDUs
  - Use 2 switched PDUs
  - Use a PDU with two power rails
  - Use a backup fencing device

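A minimal sketch of the acpid workaround on RHEL 5/6 (assuming SysV init); without it, an IPMI power-off request may be delivered as an ACPI soft shutdown instead of an immediate power-off:

    # stop acpid now and keep it from starting at boot
    service acpid stop
    chkconfig acpid off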


Integrated Power Management Pitfall

[Diagram: two hosts, each with an integrated fencing device sharing the host's power and network connection]

- Host (and fencing device) lose power
  - Safe to recover; host is off
- Host and fencing device lose network connectivity
  - NEVER safe to recover!
- The two cases are indistinguishable
  - A timeout does not ensure data integrity in this case
- Not all integrated power management devices suffer this problem

Single Rail Pitfall

[Diagram: two hosts powered through a single-rail fencing device]

- One power cord = single point of failure

Best Practice: Dual Rail Fencing Device

[Diagram: Rail A and Rail B feeding a dual-rail fencing device and both hosts, plus the cluster interconnect]

- Dual power sources, two rails in the fencing device, two power supplies in the cluster nodes
- Fencing device electronics run off of either rail

Best Practice: Dual Single Rail Fencing Devices

[Diagram: Device A and Device B, each on its own power source, plus the cluster interconnect]

- Dual power sources, two fencing devices

I/O Fencing – Pitfalls

- SAN fabric fencing
  - Full recovery is typically not automatic
  - Unfencing in RHEL 6 allows a host to turn on its ports after reboot
- SCSI-3 PR fencing
  - Not all devices support it
  - The quorum disk may not reside on a LUN managed by SCSI fencing, due to a quorum "chicken and egg" problem

I/O Fencing – Pitfalls

- SCSI-3 PR fencing (cont'd)
  - The preempt-and-abort command is not required by the SCSI-3 specification
  - Not all SCSI-3 compliant devices support it
  - LUN detection can be done by querying CLVM, looking for volume groups with the cluster tag set (see the sketch below)
  - On RHEL 6, a watchdog script allows reboot after fencing

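From the command line, clustered volume groups can be spotted by their attribute bits; a sketch (the clustered flag is the sixth character of vg_attr):

    # list volume groups with attributes; clustered VGs show "c"
    # as the sixth vg_attr character, e.g. "wz--nc"
    vgs -o vg_name,vg_attr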


2-Node Clusters

- The most common use case in high availability / cold failover clusters
- Inexpensive to set up; several can fit in a single rack
- Red Hat has had two-node failover clustering since 2002

Why 2-Node Clusters are Special

- The cluster operates using a simple majority quorum algorithm
  - Best predictability with respect to node failure counts compared to other quorum algorithms (ex: Grid)
- There is never a majority with one node out of two
- Simple solution: two_node="1" mode (see the sketch below)
  - When a node boots, it assumes quorum
  - Services, gfs2, etc. are prevented from operating until fencing completes

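In cluster.conf, two-node mode is a one-line sketch:

    <!-- two_node requires expected_votes="1"; fencing is still required -->
    <cman two_node="1" expected_votes="1"/>
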
2-Node Pitfalls: Fence Loops

- If the two nodes become partitioned, a fence loop can occur
  - Node A kills node B, who reboots and kills node A... etc.
- Solutions
  - Correct network configuration
  - Fencing devices on the same network used for cluster communication
  - Use fencing delays (a sketch follows this list)
  - Use a quorum disk
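A fencing delay goes on the device entry of the node that should lose a mutual fencing race; a cluster.conf sketch with hypothetical device and port names:

    <clusternode name="node1.example.com" nodeid="1">
      <fence>
        <method name="pdu">
          <!-- fencing node1 waits 15s, so node1 wins any
               simultaneous fencing race against node2 -->
          <device name="apc-pdu" port="1" delay="15"/>
        </method>
      </fence>
    </clusternode>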


Fence Loop

[Diagram sequence: Node 1 and Node 2 on a cluster interconnect; the fencing device sits on a separate network]

1. A cable is pulled, or the switch loses power
2. Both nodes issue fencing requests; one request is blocked, because the device allows only one user at a time
3. Node 1 is power cycled
4. Node 1 boots and issues a fencing request of its own
5. Node 2 is power cycled
6. Node 2 boots and issues a fencing request... and the loop continues

Immune to Fence Loops

[Diagram: Node 1 and Node 2, with the fencing device on the cluster interconnect network]

- On a cable pull, the node without connectivity cannot fence
- If the interconnect dies and comes back later, the fencing device serializes access so that only one node is fenced

2-Node Pitfalls: Fence Death

- A combined pitfall when using integrated power in two-node clusters
- If a two-node cluster becomes partitioned, a fence death can occur if the fencing devices are still accessible
  - The two nodes tell each other's fencing device to turn off the other node at the same time
  - No one is alive to turn either host back on!
- Solutions
  - Same as fence loops
  - Use a switched PDU which serializes access


Fence Death

[Diagram sequence: Node 1 and Node 2, each with its own fencing device, on a shared network and cluster interconnect]

1. The cluster interconnect is lost (cable pull, switch turned off, etc.)
2. Both nodes issue fencing requests at the same time; both nodes fence each other
3. No one is alive to turn the other back on.

Immune to Fence Death

[Diagram: Node 1 and Node 2 sharing a single fencing device on the cluster interconnect]

- A single power fencing device serializes access
- A cable pull ensures one node "loses"

2-Node Pitfalls: Crossover Cables

- Cause both nodes to lose link on the cluster interconnect when only one link has failed
- Indeterminate state for the quorum disk without very clever heuristics (use master_wins)
- Fencing can't be placed on the same network
- We don't test this


2-Node Clusters: Pitfall avoidance

- Network / fencing configuration evaluation
- Use a quorum disk
- Create a 3-node cluster :)
  - Simple to configure, increased working capacity, etc.


Quorum Disk – Benefits

- Prevents fence-loop and fence-death situations
  - An existing cluster member retains quorum until it fails or cluster connectivity is restored
- Heuristics ensure that the administrator-defined "best-fit" node continues operation in a network partition
- Provides all-but-one or last-man-standing failure modes (see the sketch after this list)
  - Examples:
    - A 4-node cluster, and 3 nodes fail
    - A 4-node cluster, and 3 nodes lose access to a critical network path as decided by the administrator
  - Note: Ensure the capacity of the remaining node is adequate for all cluster operations before trying this

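Last-man-standing falls out of the vote math when the quorum disk carries N-1 votes; a sketch for a 4-node cluster (label and timings are placeholder assumptions):

    <!-- 4 nodes x 1 vote + 3 qdisk votes = 7 total; quorum = 4,
         so one surviving node plus the quorum disk stays quorate -->
    <quorumd label="qdisk" votes="3" interval="1" tko="10"/>
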
Quorum Disk – Drawbacks

- It used to be complex to configure, but RHEL 6.3 fixes most of this
- Heuristics need to be written by administrators for their particular environments
- Incorrect configuration can reduce availability
- The algorithm used is non-traditional
  - A backup membership algorithm vs. an ownership algorithm or simple "tie-breaker"

Quorum Disk Timing Pitfall (RHEL5)

[Timing diagram]

Quorum Disk Made "Simple" (RHEL5)

- Quorum disk failure recovery should be a bit less than half of CMAN's failure time
  - This allows the quorum-disk arbitration node to fail over before CMAN times out
- Quorum disk failure recovery should be approximately 30% longer than a multipath failover. Example [1]:
  - x = multipath failover
  - x × 1.3 = quorum disk failover
  - x × 2.7 = CMAN failover

[1] http://kbase.redhat.com/faq/docs/DOC-2882

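Worked through with numbers, as a sketch: if multipath failover takes x = 30s, target roughly 1.3 × 30 ≈ 39s for the quorum disk and 2.7 × 30 ≈ 81s for CMAN. Assuming qdiskd's timeout is interval × tko and CMAN's failure time is governed by the Totem token timeout, that maps to something like:

    <!-- hypothetical values derived from x = 30s; not tuning advice -->
    <quorumd label="qdisk" interval="1" tko="39"/>  <!-- ~1.3x = 39s -->
    <totem token="81000"/>                          <!-- ~2.7x = 81s -->
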
Quorum Disk Best Practices

- Don't use it if you don't need it
  - Fencing delays can usually provide adequate decision-making
- If required, use heuristics for your environment (see the sketch below)
  - Prefer master_wins over heuristics
- I/O Scheduling
  - deadline scheduler
  - cfq scheduler with realtime priority:
    ionice -c 1 -n 0 -p `pidof qdiskd`

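Two hedged quorumd sketches for cluster.conf; master_wins and heuristics are mutually exclusive, and the gateway address below is a hypothetical stand-in for a network path that matters to your workload:

    <!-- Preferred: no heuristics; the current qdisk master survives a tie -->
    <quorumd label="qdisk" interval="1" tko="10" master_wins="1"/>

    <!-- Alternative: an environment-specific heuristic (hypothetical gateway) -->
    <quorumd label="qdisk" interval="1" tko="10">
      <heuristic program="ping -c1 -w1 192.168.1.254" interval="2" score="1"/>
    </quorumd>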


Clustered Services – Best Practices

- Service structure should be as flat as possible (see the sketch after this list)
  - Improves readability / maintainability
  - Reduces configuration file footprint
  - Rgmanager fixes the most common ordering mistakes
  - The resources block is not required
- Virtual machines should not exceed the memory limits of a host after a failover, for best predictability

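A sketch of a flat service definition (name, address, device, and paths are all hypothetical); rgmanager applies sensible start/stop ordering between resource classes on its own, which is why deep nesting is unnecessary:

    <service name="web" autostart="1" recovery="relocate">
      <ip address="10.1.1.10" monitor_link="1"/>
      <fs name="webdata" device="/dev/vg_web/lv_www"
          mountpoint="/var/www" fstype="ext4"/>
      <script name="httpd" file="/etc/init.d/httpd"/>
    </service>
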
On Multipath

- With SCSI-3 PR fencing, multipath works, but only when using device-mapper
- When using multiple paths and SAN fencing, you must ensure all paths to all storage are fenced for a given host
- When using multipath with a quorum disk, you must not use no_path_retry = queue
- When using multipath with GFS2, you should not use no_path_retry = queue (see the sketch below)

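A multipath.conf sketch for LUNs backing a quorum disk or GFS2 (not a complete configuration; device-mapper multipath assumed):

    defaults {
        # fail I/O when all paths are lost instead of queueing forever;
        # "no_path_retry queue" stalls qdiskd and GFS2 rather than
        # letting the cluster react
        no_path_retry fail
    }
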
On Multipath

- Do not place /var on a multipath device without relocating the bindings file to the root partition
- Not all SAN fabrics behave the same way in the same failure scenarios
  - Test all failure scenarios you expect the cluster to handle
- Use device-mapper multipath rather than vendor-supplied versions for the best support from Red Hat

GFS2 – Shared Disk Cluster File System

- Provides uniform views of a file system in a cluster
- POSIX compliant (as much as Linux is, anyway)
- Allows easy management of things like virtual machine images
- Good for getting lots of data to several nodes quickly


GFS2 Considerations

- Journal count (cluster size)
  - One journal per node
- File system size
  - Online extend is supported
  - Shrinking is not supported
- Workload requirements & planned usage
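At creation time the journal count maps to an mkfs.gfs2 option; a sketch with hypothetical cluster, file system, and device names:

    # one journal per mounting node: -j 4 for a 4-node cluster;
    # lock_dlm is the clustered locking protocol
    mkfs.gfs2 -p lock_dlm -t mycluster:gfs0 -j 4 /dev/vg_san/lv_gfs0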


GFS2 Pitfalls

- Making a file system with lock_nolock as the locking protocol
- Failure to allocate enough journals at file-system creation time and then adding nodes to the cluster (GFS only; see the note below)
- NFS lock failover does not work!
- Never use a cluster file system on top of an md-raid device
  - Use of local file systems on md-raid for failover is also not supported

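GFS2 (unlike GFS) can grow the journal count after the fact, provided the file system has free space; a sketch against a hypothetical mount point:

    # add 2 journals to a mounted GFS2 file system
    gfs2_jadd -j 2 /mnt/gfs0
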
Other Topics

- Stretch clustering – multiple buildings on the same campus in the same cluster
  - Minimal support for this
- Geographic clustering / disaster tolerance – longer-distance
  - Typically evaluated on a case-by-case basis; requires site-to-site storage replication and a backup cluster
- Active/active clustering across sites is not supported


Troubleshooting corosync & CMAN

- corosync does not have an easy tool to assist troubleshooting; check system logs (it is very verbose if problems occur)
- The most common problem w/ corosync is incorrect multicast configuration on the switch
  - UDPU (6.2+) is more reliable (see the sketch below)
- cman_tool status
  - Shows cluster states (incl. votes)
- cman_tool nodes
  - Shows cluster node states
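Switching to UDPU is a single attribute in cluster.conf (RHEL 6.2 and later); a sketch:

    <!-- UDP unicast avoids switch multicast/IGMP snooping problems -->
    <cman transport="udpu"/>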


Troubleshooting Fencing

- group_tool ls – the fence group should be in NONE (or "run", depending on version)
  - If it is in another state (FAIL_STOP_WAIT, FAIL_START_WAIT), check logs on the low node ID
- cman_tool nodes -f – shows nodes and the last time each was fenced (if ever)
- fence_ack_manual -e -n <node> – emergency fencing override. Use it if you are sure the host is dead and the fencing device is inaccessible (or if fencing is incorrectly configured) to allow the cluster to recover.

Summary

- Choose a fencing configuration which works in the failure cases you expect
- Test all failure cases you expect the cluster to recover from
- The more complex the system, the more likely a single component will fail
  - Use the simplest configuration whenever possible
- When using clustered file systems, tune according to your workload

References

- https://access.redhat.com/knowledge/solutions/17784
- https://access.redhat.com/knowledge/node/28603
- https://access.redhat.com/knowledge/node/29440
- https://access.redhat.com/knowledge/articles/40051
- http://people.redhat.com/lhh/ClusterPitfalls.pdf

Complex No-SPOF Cluster

[Diagram: two hosts with dual power rails (Rail A, Rail B), a dual-rail fencing device, two networks, a quorum disk, and the cluster interconnect]

- Any single failure in the system either allows recovery or continued operation
- Bordering on insane

Simpler No-SPOF configuration

[Diagram: three hosts with dual power rails (Rail A, Rail B), a fencing device, and two switches joined by an ISL]
