2 and later)
===================================================
kdump configuration
===================
1. Install the kexec-tools package, which provides the kdump service:
# rpm -ivh kexec-tools-1.102pre-75.el5.x86_64.rpm
2. Add the following parameter to the kernel line(s) in /boot/grub/grub.conf:
crashkernel=128M@16M
For example:
============
title Red Hat Enterprise Linux (2.6.32-358.el6.i686)
root (hd0,0)
kernel /vmlinuz-2.6.32-358.el6.i686 ro root=/dev/mapper/VolGroup-lv_root nomodeset rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M@16M rd_LVM_LV=VolGroup/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
initrd /initramfs-2.6.32-358.el6.i686.img
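After rebooting with the edited kernel line, it can be useful to confirm that the
crashkernel parameter actually reached the running kernel. The following is a
minimal sketch, not part of the original procedure; the function name
crashkernel_value is hypothetical, and it assumes a POSIX shell.

```shell
# Hypothetical helper: print the value of the crashkernel= parameter found in
# a kernel command line string, or return non-zero if it is absent.
crashkernel_value() {
    # $1 = kernel command line, e.g. the contents of /proc/cmdline
    for word in $1; do
        case $word in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage on a running system:
#   crashkernel_value "$(cat /proc/cmdline)" || echo "crashkernel is not set"
```

With the grub.conf entry shown above, the helper would be expected to print the
128M@16M reservation once the node has been rebooted.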
fence_kdump configuration
=========================
In Red Hat Enterprise Linux 6.2 and later, the fence_kdump agent can be used to
detect that a failed cluster node has entered the kdump crash recovery service
and mark the node as fenced.
Using the fence_kdump agent results in a significantly shorter recovery time
than using post_fail_delay. With post_fail_delay, recovery does not complete
until kdump core collection has completed. The fence_kdump agent reduces
recovery time because fencing completes as soon as the cluster is notified that
a failed node has entered the kdump crash recovery service, allowing the
cluster to recover before kdump core collection finishes.
The fence_kdump agent must be used in conjunction with another fence agent; it
must not be used by itself. The fence_kdump agent is only capable of detecting
that a cluster node has entered the kdump kernel. Other events that require
fencing (e.g. a network outage) must be handled by other fencing methods.
The fence_kdump fence agent has two components:
===============================================
1. fence_kdump: The fencing agent. When fencing occurs, this agent listens for
a message from the node that is being fenced. If the agent does not receive a
message from the failed node within a certain amount of time, the agent returns
failure and other fencing methods should be attempted. If the agent does
receive a message from the failed node, the agent returns success and the node
is considered to be fenced.
2. fence_kdump_send: The utility that sends the message. This is normally run
from within the kdump kernel while the kdump crash recovery service is
performing core collection. Messages are sent continuously at a regular
interval to all nodes in the cluster.
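The "return failure, then try the next method" behaviour described above can be
illustrated with a small shell sketch. This is only a model of the ordering, not
how the cluster's fence daemon is implemented; try_fence_methods and the wrapper
names in the usage comment are hypothetical.

```shell
# Hypothetical model: run each fence method in order and stop at the first
# one that reports success (exit status 0). Prints the name of the method
# that succeeded; returns non-zero if every method failed.
try_fence_methods() {
    for method in "$@"; do
        if "$method"; then
            printf '%s\n' "$method"
            return 0
        fi
    done
    return 1
}

# Usage, assuming run_fence_kdump and run_backup_agent are wrapper functions
# you have defined around the real agents:
#   try_fence_methods run_fence_kdump run_backup_agent
```

The sketch shows why fence_kdump is listed first: the backup method only runs
once fence_kdump has given up waiting for a message.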
To use the fence_kdump agent, first edit the /etc/cluster/cluster.conf file.
# ccs -h node2 --getconf
<cluster config_version="43" name="Cluster_BD">
<fence_daemon post_fail_delay="5" post_join_delay="10"/>
<clusternodes>
<clusternode name="node1" nodeid="1" votes="1">
<fence>
<method name="kdump">
<device name="kdump"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2" votes="1">
<fence>
<method name="kdump">
<device name="kdump"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_kdump" name="kdump" port="8452" timeout="180"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="failover1" nofailback="0" ordered="1" restricted="0">
<failoverdomainnode name="node1" priority="1"/>
<failoverdomainnode name="node2" priority="2"/>
</failoverdomain>
</failoverdomains>
<resources>
<fs device="/dev/CLUSTERVG/lv1" fstype="ext4" mountpoint="/cluster/fs1" name="fs"/>
</resources>
<service domain="failover1" name="FS" recovery="relocate">
<fs ref="fs"/>
</service>
</rm>
</cluster>
4. It is important to note that if a node fails for any reason other than a
kernel panic, the total recovery time is delayed by the time that fence_kdump
waits for a message. For example, if "node1" fails for any reason other than a
kernel panic, a secondary fence agent such as fence_apc (not defined in the
listing above) will not attempt to fence the node until fence_kdump has
returned failure after its timeout, which is 180 seconds with the fence device
configured above.
Once the /etc/cluster/cluster.conf file has been modified to use fence_kdump,
restart the kdump service.
This step is required so that the kdump service detects that it should send
messages to the cluster nodes. The kdump service will detect that the
/etc/cluster/cluster.conf file has changed and rebuild the kexec initrd image.
When this occurs, kdump extracts the list of cluster nodes that should receive
notification messages when the node enters the kdump crash recovery service.
For this reason, the kdump service should be restarted whenever
/etc/cluster/cluster.conf is modified.
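Because the restart has to be repeated after every change to cluster.conf, one
option is to script the check. The following is a minimal sketch under stated
assumptions: the helper names and the stamp file path are hypothetical, and the
restart itself is left in a comment because it must run as root on a cluster
node.

```shell
# Hypothetical helpers: detect whether a config file differs from the copy
# last recorded in a stamp file, using the POSIX cksum utility.
conf_changed() {
    # $1 = config file, $2 = stamp file holding the last recorded checksum
    new=$(cksum < "$1")
    old=$(cat "$2" 2>/dev/null) || old=""
    [ "$new" != "$old" ]
}

record_conf() {
    # $1 = config file, $2 = stamp file to update
    cksum < "$1" > "$2"
}

# Usage (as root; the stamp path is an assumption):
#   conf=/etc/cluster/cluster.conf
#   stamp=/var/run/cluster.conf.cksum
#   if conf_changed "$conf" "$stamp"; then
#       service kdump restart && record_conf "$conf" "$stamp"
#   fi
```

Recording the checksum only after a successful restart means a failed restart
is retried on the next run.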
5. By default, the fence_kdump agent listens for UDP messages on port 7410, and
the default timeout is 60 seconds. These values can be modified by setting
parameters for the fence_kdump fence device:
<fencedevice name="kdump" agent="fence_kdump" timeout="180" port="8452"/>
In this example, the fence_kdump agent listens for UDP messages on port 8452
and times out if no message is received within 180 seconds.
A fence device with the same settings (named kdump1 here), together with a
fence method and fence instance for node1, can be added with ccs:
# ccs -h node1 --addfencedev kdump1 agent="fence_kdump" port="8452" timeout="180"
# ccs -h node1 --addmethod kdump1 node1
# ccs -h node1 --addfenceinst kdump1 node1 kdump1
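To see which port and timeout a given cluster.conf would give fence_kdump, the
attribute lookup and the documented defaults (7410 and 60) can be combined in a
short sketch. The function name fence_kdump_setting is hypothetical, and the
sketch assumes that the fencedevice element sits on a single line, as in the
listing above; it is not a general XML parser.

```shell
# Hypothetical helper: print the port or timeout configured for the
# fence_kdump fence device, falling back to the documented defaults.
fence_kdump_setting() {
    # $1 = attribute name (port or timeout), $2 = path to cluster.conf
    value=$(sed -n "/agent=\"fence_kdump\"/s/.*[[:space:]]$1=\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
        return 0
    fi
    case $1 in
        port)    echo 7410 ;;   # default UDP port stated in step 5
        timeout) echo 60 ;;     # default timeout in seconds stated in step 5
        *)       return 1 ;;
    esac
}

# Usage:
#   fence_kdump_setting port /etc/cluster/cluster.conf
#   fence_kdump_setting timeout /etc/cluster/cluster.conf
```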
6. With the kdump service enabled and running, bring up the cluster by
starting the cman service on each node, or restarting it if it is already
running.
# service cman start
7. The fence_kdump agent can now be tested by forcing a node to panic.
For this example, assume that the node being forced to panic is "node1" with an
IP address of 192.168.1.4.
# echo c > /proc/sysrq-trigger
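Before forcing a panic, it is worth checking that a crash kernel is actually
loaded; otherwise the node will not enter the kdump kernel. The following is a
minimal sketch, not part of the original procedure: crash_kernel_loaded is a
hypothetical name, and it reads the kernel's /sys/kernel/kexec_crash_loaded
flag, which contains 1 when a crash kernel is loaded.

```shell
# Hypothetical pre-flight check: succeed only if the kexec crash kernel flag
# reads "1".
crash_kernel_loaded() {
    # $1 = optional path to the flag file (defaults to the sysfs location)
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

# Usage (as root, on the node to be tested):
#   crash_kernel_loaded && echo c > /proc/sysrq-trigger
```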
8. Once the node has entered the kdump kernel, fence_kdump_send will begin
sending messages to all cluster nodes. Of the remaining cluster nodes, the node
with the lowest node ID will be responsible for fencing the failed node
"node1". Inspecting /var/log/messages on the node performing the fence
operation should show the following:
# tail -f /var/log/messages