Вы находитесь на странице: 1из 12

Two Node MYSQL HA Cluster with Heartbeat and MON

This document describes how to build a two node MYSQL Failover Cluster using Heartbeat and MON. I have tested this on Linux ( Redhat ) , but this should work on any *NIX distributions.

What this document is for ?


Now this setup can be used in small to medium organizations which use MYSQL as a backend for various operations like websites, accounts, etc, and who are currently using MYSQL in a Master/Slave replicated setup, for redundancy, so that if the Master fails the Slave can be brought up to take over the Master, but in this case there is manual intervention involved, so if the master fails during the night at 2am, the whole application would be down until the Slave is brought back online to take over the Master. Now what this document provides is a way to setup that failover without manual intervention, that is if for some reason the Master fails the Slave would take over the IP address of the master and start serving clients without any downtime, and this is completely transparent to the end users, and the switchover can take less than 2 seconds.

What you need for this setup ?


I am starting off with the assumption that you already have a working Master/Slave replication in your network, if you don't I strongly advice you to read the documentation at http://mysql.com to setup replication in Mysql. 1) You will either need a Null Modem Cable or a Crossover Ethernet Cable to connect the two machines for privately exchanging heartbeats between the two, this is highly recommended, so if you want to use the Crossover cable then you need a free ethernet port on each of the machines, or a free serial port on each of them if you are using a null modem cable.... 2) Heartbeat software from http://www.linux-ha.org/Heartbeat 3) MON from http://www.kernel.org/software/mon/

Installing and configuring the Software Heartbeat (1.2.3)


Download all the required rpm's for your distribution into a folder and install them as follows ... # rpm -ivh heartbeat-ldirectord-1.2.3-2.rh.9.i386.rpm # rpm -ivh heartbeat-pils-1.2.3-2.rh.9.i386.rpm # rpm -ivh heartbeat-stonith-1.2.3-2.rh.9.i386.rpm # rpm -ivh heartbeat-1.2.3-2.rh.9.i386.rpm Now this install would create a directory called /etc/ha.d/ ( on Redhat) and this is where all your configuration files are for HB to work properly...

The three main configuration files to make <HB >happy are as follows
<"/etc/ha.d/authkeys"> <"/etc/ha.d/haresources"> <"/etc/ha.d/ha.cf">

Create or edit the "/etc/ha.d/authkeys" file on both the servers to look like this

auth 1 1 crc

Create or edit the "/etc/ha.d/haresources" file as follows on both the boxes to look like thishost1.example.com 192.168.100.81 mysql mon

Note: Please note that this file "SHOULD BE IDENTICAL" on both the machines ..... Note: This line basically tells HB that host1.example.com is the Active machine and the resources it is monitoring it is keeping are mysql and mon , also 192.168.100.81 is the virtual IP address of the Cluster....so all your clients would be connecting to this IP Note: "mysql" and "mon" described here are the startup scripts for Mysql and Mon usually in /etc/init.d/ ,so if you dont have them place them appropriately ( it is described later in this document ) under the "Resource scripts section"

Create or edit the "/etc/ha.d/ha.cf" file on both the machines to look like this

#This is the start of ha.cf file logfile /var/log/ha-log

# Sets the time between heartbeats to 2 seconds. keepalive 2

#Time in seconds before issuing a "late heartbeat" warning in the logs. warntime 30 # the time to wait before declaring a node to be dead , you should tune this to avoid DeadLocks... deadtime 80 # this is the time before heartbeat will start the resources the first time it starts ie mon,mysql....a initdead 90 # if you are using the crossover cable bcast eth1

# if you are using the null modem cable (I would recommend using both for redundancy) serial /dev/ttyS0 # nodes in the cluster # Please make sure that you add this entry to /etc/hosts on both the machines # do a #hostname on each of the machines and add the following to /etc/hosts ########################################################## # /etc/hosts file on BOTH machines should contain these entries # 192.168.100.82 host1.example.com # 192.168.100.83 host2.example.com node host1.example.com node host2.example.com # cannot set auto_failback to on as we are not implementing Multi-Master Replication in Mysql auto_failback off # will ping the default switch(s)/nodes to see if this node has network connectivity at all # thus taking care of failure for the Primary NIC (S) itself , also it is done every keepalive #seconds ( Note: we have a ping group here for redundancy ) ping_group ping-cluster-test 192.168.1.1 192.168.1.125

# This will run ipfail as the user hacluster , this will listen to the replies from the switch # this will make sure that the network is in good condition ... respawn hacluster /usr/lib/heartbeat/ipfail <end of file ha.cf>

MON (0.99.2)
Download the MON software and install in under /usr/local/mon on both the machines .... For Mon to function you will also need the following perl modules

* DBI * DBD::Mysql * Time::Period * Time::HiRes * Convert::BER * Mon::* you should install these modules before using MON usually as follows # gunzip <module>.tar.gz # tar -xvf <module>.tar # cd <module-dir> # perl Makefile.pl # make # make test # make install This will install MON in /usr/local/mon/ directory , you will need to config MON now First you will need a file called "mon.cf", which is MON"s main configuration file .... you can create it anywhere, but I would create it in /usr/local/mon dir as follows

Create or edit the "/usr/local/mon/mon.cf" file to look like this

#This is the start of mon.cf file alertdir = /usr/local/mon/alert.d mondir = /usr/local/mon/mon.d statedir = /usr/local/mon/state.d maxprocs = 20 histlength = 100 randstart = 60s ## Define groups of hosts to monitor Mysql Server ## This is the virtual IP of the Active/Master

hostgroup mysql_servers 192.168.100.81

## Changed the interval to 2m from 15sec, as sometimes mysql reaches Max_Connections for a short spike ## Also changed the alert after to 3 , so in 6mins the slave should take up if really the master mysql fails ##instead of a short spike changing it back to 1m and 3 failures as we have given the monitor super priv in ##mysql which would allow it to connect even

if max_connections are reached ..... watch mysql_servers service mysql interval 1m monitor mysql.monitor period wd {Mon-Sun} alert bring-ha-down.alert alert mail.alert -S "Host1 MYSQL is down" admin@example.com upalert mail.alert -S "Host1 MYSQL server is back up" admin@example.com alertevery 600s alertafter 3 <end of file mon.cf>

Now you will notice that there a new alert in "mon.cf" called "bring-ha-down.alert " , "This is very important,as this alert is the one which would stop HB on the master if MYSQL has failed on the Master".

Note: This is the main reason we are using MON , that is for Monitoring MYSQL'S Health. HB only monitors the Server itself and not any applications like MYSQL

so now create a "/usr/local/mon/alert.d/bring-ha-down.alert" file with the following line

/etc/rc.d/init.d/heartbeat stop

Now the other thing you notice in "mon.cf" is "mysql.monitor", this is actually a mysql monitor resource script in the directory "/usr/local/mon/mon.d" , this directory will initially have a script called mysql-msql.monitor , please change its name to mysql.monitor so that Mon knows that the resource to be monitered is Mysql and not Msql.

After you have changed the name of the script , edit the script (pretty self explanatory) to include the username,password, host,database etc of the user which Mon is supposed to use to check the health of the Mysql server ( it actually does so by connecting to mysql and issuing a simple query like "SHOW TABLES" ) Thats pretty much it with MON ....

Resource Scripts
Now you might have noticed that in /etc/ha.d/haresources file which reads host1.example.com 192.168.100.81 mysql mon here "mysql" and "mon" are the startup scripts for "mysql" and "mon" I am assuming you have the startup script for Mysql in may be /etc/init.d/ (/etc/rc.d/init.d) SL Note: Upon startup, Heartbeat will look in /etc/rc.d/init.d or /etc/ha.d/resource.d for it's resource start scripts. The Mon startup script could be as follows , so please create it and name it "mon" and place it in /etc/init.d (/etc/rc.d/init.d)

#!/bin/bash MON_HOME=/usr/local/mon case "$1" in start) if [ -f $MON_HOME/mon.pid ]; then echo "mon already started" exit fi echo "Starting Mon" $MON_HOME/mon -c $MON_HOME/mon.cf -L $MON_HOME -P $MON_HOME/mon.pid & ;; stop) if [ -f $MON_HOME/mon.pid ]; then echo "Stopping Mon" kill -9 `cat $MON_HOME/mon.pid` rm -f $MON_HOME/mon.pid else echo "no server pid, server doesn't seem to run" fi ;; status) echo "doing good" ;; *)

echo "Usage: $0 {start|stop|status|reload|restart}" exit 1 esac exit 0 <end of mon file>


so now HB knows where its resource scripts are and would start/shutdown them accordingly ....

Prime Time

( Testing )

Now if you have come so far then you are ready to start the cluster..... First start HB on the Master as follows as follows

/etc/init.d/heartbeat start This will start heartbeat on the master and you will find these logs in /var/log/ha-log
heartbeat: 2006/03/16_00:28:48 info: Configuration validated. Starting heartbeat 1.2.3 heartbeat: 2006/03/16_00:28:48 info: heartbeat: version 1.2.3 heartbeat: 2006/03/16_00:28:49 info: Heartbeat generation: 17 heartbeat: 2006/03/16_00:28:49 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0 heartbeat: 2006/03/16_00:28:49 info: ping heartbeat started. heartbeat: 2006/03/16_00:28:49 info: pid 3015 locked in memory. heartbeat: 2006/03/16_00:28:49 info: Local status now set to: 'up' heartbeat: 2006/03/16_00:28:50 info: pid 3018 locked in memory. heartbeat: 2006/03/16_00:28:50 info: pid 3022 locked in memory. heartbeat: 2006/03/16_00:28:50 info: pid 3021 locked in memory. heartbeat: 2006/03/16_00:28:50 info: pid 3019 locked in memory. heartbeat: 2006/03/16_00:28:50 info: pid 3020 locked in memory. heartbeat: 2006/03/16_00:28:50 info: Link 192.168.100.1:192.168.100.1 up. heartbeat: 2006/03/16_00:28:50 info: Status update for node 192.168.100.1: status ping heartbeat: 2006/03/16_00:28:50 info: Link host1.example.com:eth0 up. heartbeat: 2006/03/16_00:29:09 WARN: node host2.example.com: is dead heartbeat: 2006/03/16_00:29:09 info: Local status now set to: 'active' heartbeat: 2006/03/16_00:29:09 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90) heartbeat: 2006/03/16_00:29:09 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 3026)

heartbeat: 2006/03/16_00:29:09 WARN: No STONITH device configured. heartbeat: 2006/03/16_00:29:09 WARN: Shared disks are not protected. heartbeat: 2006/03/16_00:29:09 info: Resources being acquired from host2.example.com. heartbeat: 2006/03/16_00:29:09 info: Running /etc/ha.d/rc.d/status status heartbeat: 2006/03/16_00:29:09 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired heartbeat: 2006/03/16_00:29:09 info: Initial resource acquisition complete (T_RESOURCES(us)) heartbeat: 2006/03/16_00:29:09 info: mach_down takeover complete. heartbeat: 2006/03/16_00:29:09 info: mach_down takeover complete for node host2.example.com. heartbeat: 2006/03/16_00:29:09 info: Local Resource acquisition completed. heartbeat: 2006/03/16_00:29:09 info: Running /etc/ha.d/rc.d/ip-request-resp iprequest-resp heartbeat: 2006/03/16_00:29:09 received ip-request-resp 192.168.100.81 OK yes heartbeat: 2006/03/16_00:29:09 info: Acquiring resource group: host1.example.com 192.168.100.81 mysql mon heartbeat: 2006/03/16_00:29:09 info: Running /etc/ha.d/resource.d/IPaddr 192.168.100.81 start heartbeat: 2006/03/16_00:29:09 info: /sbin/ifconfig sw0:0 192.168.100.81 netmask 255.255.255.0 broadcast 192.168.100.255 heartbeat: 2006/03/16_00:29:09 info: Sending Gratuitous Arp for 192.168.100.81 on sw0:0 [sw0] heartbeat: 2006/03/16_00:29:09 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.100.81 sw0 192.168.100.81 auto 192.168.100.81 ffffffffffff heartbeat: 2006/03/16_00:29:09 info: Running /etc/init.d/mysql start heartbeat: 2006/03/16_00:29:09 info: Running /etc/init.d/mon start heartbeat: 2006/03/16_00:29:21 info: Local Resource acquisition completed. (none) heartbeat: 2006/03/16_00:29:21 info: local resource transition completed.

You will also see that the VIP 192.168.100.81 has been created on the Master , you can check it as follows... #/sbin/ifconfig -a sw0 Link encap:Ethernet HWaddr 00:06:5B:8C:26:2A inet addr:192.168.100.82 Bcast:192.168.100.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:792936520 errors:0 dropped:0 overruns:0 frame:0 TX packets:1331822244 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0 RX bytes:896274855 (854.7 Mb) TX bytes:676292302 (644.9 Mb) sw0:0 Link encap:Ethernet HWaddr 00:06:5B:8C:26:2A inet addr:192.168.100.81 Bcast:192.168.100.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:43614 errors:0 dropped:0 overruns:0 frame:0 TX packets:43618 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2982264 (2.8 Mb) TX bytes:1744622 (1.6 Mb)

Now start HB on the slave and you will find the log as follows in /var/log/ha-log
heartbeat: 2006/03/16_00:30:17 info: ************************** heartbeat: 2006/03/16_00:30:17 info: Configuration validated. Starting heartbeat 1.2.3 heartbeat: 2006/03/16_00:30:17 info: heartbeat: version 1.2.3 heartbeat: 2006/03/16_00:30:17 info: Heartbeat generation: 21 heartbeat: 2006/03/16_00:30:17 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0 heartbeat: 2006/03/16_00:30:17 info: ping heartbeat started. heartbeat: 2006/03/16_00:30:17 info: pid 29545 locked in memory. heartbeat: 2006/03/16_00:30:17 info: Local status now set to: 'up' heartbeat: 2006/03/16_00:30:18 info: pid 29552 locked in memory. heartbeat: 2006/03/16_00:30:18 info: pid 29548 locked in memory. heartbeat: 2006/03/16_00:30:18 info: pid 29551 locked in memory. heartbeat: 2006/03/16_00:30:18 info: pid 29550 locked in memory. heartbeat: 2006/03/16_00:30:18 info: Link 192.168.100.1:192.168.100.1 up. heartbeat: 2006/03/16_00:30:18 info: Status update for node 192.168.100.1: status ping heartbeat: 2006/03/16_00:30:18 info: pid 29549 locked in memory. heartbeat: 2006/03/16_00:30:18 info: Link host1.example.com:eth0 up. heartbeat: 2006/03/16_00:30:18 info: Status update for node host1.example.com: status active heartbeat: 2006/03/16_00:30:18 info: Local status now set to: 'active' heartbeat: 2006/03/16_00:30:18 info: Starting child client "/usr/lib/heartbeat/ipfail " (90,90) heartbeat: 2006/03/16_00:30:18 info: Starting "/usr/lib/heartbeat/ipfail " as uid 90 gid 90 (pid 29554) heartbeat: 2006/03/16_00:30:18 info: Link host2.example.com:eth0 up. heartbeat: 2006/03/16_00:30:18 info: remote resource transition completed. heartbeat: 2006/03/16_00:30:18 info: remote resource transition completed. heartbeat: 2006/03/16_00:30:18 info: Local Resource acquisition completed. (none) heartbeat: 2006/03/16_00:30:18 info: Initial resource acquisition complete (T_RESOURCES(them))

heartbeat: 2006/03/16_00:30:18 info: Running /etc/ha.d/rc.d/status status

Now you have a working two node HA cluster..., you can test the cluster failover by doing the following
1) power down the Master ( init 5 , shutdown -h) 2) remove the ethernet link (eth0) on Master 3) stop Mysql on Master 4) etc ...

And you will see that the VIP ( 192.168.100.81 ) is now taken over by the slave, thus redirecting the client requests to the slave which is now the master, you will find something like this in the HA log for the Slave
heartbeat: 2006/03/15_13:21:23 WARN: node host1.example.com: is dead heartbeat: 2006/03/15_13:21:23 WARN: No STONITH device configured. heartbeat: 2006/03/15_13:21:23 WARN: Shared disks are not protected. heartbeat: 2006/03/15_13:21:23 info: Resources being acquired from host1.example.com. heartbeat: 2006/03/15_13:21:23 info: Link host1.example.com:eth0 dead. heartbeat: 2006/03/15_13:21:23 info: Running /etc/ha.d/rc.d/status status heartbeat: 2006/03/15_13:21:23 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys host2.example.com to acquire. heartbeat: 2006/03/15_13:21:23 info: Taking over resource group 192.168.100.81 heartbeat: 2006/03/15_13:21:23 info: Acquiring resource group: host1.example.com 192.168.100.81 mysql mon heartbeat: 2006/03/15_13:21:24 info: Running /etc/ha.d/resource.d/IPaddr 192.168.100.81 start heartbeat: 2006/03/15_13:21:24 info: /sbin/ifconfig sw0:0 192.168.100.81 netmask 255.255.255.0 broadcast 192.168.100.255 heartbeat: 2006/03/15_13:21:24 info: Sending Gratuitous Arp for 192.168.100.81 on sw0:0 [sw0] heartbeat: 2006/03/15_13:21:24 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.100.81 sw0 192. 168.100.81 auto 192.168.100.81 ffffffffffff heartbeat: 2006/03/15_13:21:24 info: Running /etc/init.d/mysql start heartbeat: 2006/03/15_13:21:24 info: Running /etc/init.d/mon start heartbeat: 2006/03/15_13:21:25 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired heartbeat: 2006/03/15_13:21:25 info: mach_down takeover complete. heartbeat: 2006/03/15_13:21:25 info: mach_down takeover complete for node host1.example.com.

Hope this helps somebody to setup their Mysql failover clusters..........

Kishore Jalleda http://kjalleda.googlepages.com

Вам также может понравиться