
VCS QUESTIONS & ANSWERS

1. How do you check the status of Veritas Cluster Server (VCS)?


Ans: hastatus -sum
2. Which is the main config file for VCS and where it is located?
Ans: main.cf is the main configuration file for VCS and it is located in
/etc/VRTSvcs/conf/config.
3. Which command you will use to check the syntax of the main.cf ?
Ans: hacf -verify /etc/VRTSvcs/conf/config
4. How will you check the status of individual resources of VCS cluster?
Ans: hares -state
5. What is the service group in VCS ?
Ans: A service group is made up of the resources and their dependency links
that are required to maintain the high availability of an application.
6. What is the use of halink command ?
Ans: halink is used to link resource dependencies
7. What is the difference between switchover and failover ?
Ans: Switchover is a manual task, whereas failover is automatic. You
can switch over a service group from the node where it is online to another cluster
node in case of a power outage, hardware failure, or a scheduled shutdown and
reboot. Failover, on the other hand, moves the service group to the other node
automatically when the VCS heartbeat links are down, damaged, or broken because
of some disaster, or when the system hangs.
8. What is the use of hagrp command ?
Ans: hagrp is used for doing administrative actions on service groups like
online, offline, switch etc.
9. How to switchover the service group in VCS ?
Ans: hagrp -switch <service group> -to <system>
10. How to online the service groups in VCS ?
Ans: hagrp -online <service group> -sys <system>
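
Putting questions 1, 9 and 10 together, a typical session could look like the
following (the group name appsg and the node names node1 and node2 are only
illustrative):
#hastatus -sum [overall state of systems and service groups]
#hagrp -online appsg -sys node1 [bring the service group online on node1]
#hagrp -switch appsg -to node2 [later, switch it over to node2]
#hagrp -state appsg [confirm where the group is now online]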

Unix Questions and Answers


Q: How do you display your running kernel version? (Solaris, AIX, Linux)
A: Linux # uname -r , Solaris # showrev
Q: Which command do you use to display a table of running processes?
(Solaris, AIX, Linux)
A: Linux # ps -ef and top , Solaris # prstat
Q: Which file do you modify to configure a domain name resolver? (Solaris,
AIX, Linux)
A: Linux # /etc/resolv.conf , Solaris # /etc/resolv.conf
Q: Which file contains a list of locally defined hostnames and corresponding
IP addresses? (Solaris, AIX, Linux)
A: Linux # /etc/hosts , Solaris # /etc/hosts and linked file /etc/inet/hosts
Q: How do you display a routing table? (Solaris, AIX, Linux)
A: Linux # ip route show or # netstat -nr or # route -n and Solaris # netstat
-nr and # route -n
Q: Which command would you use to view partitions and their sizes on
Solaris?
A: # df -kh
Q: Which OpenBoot command would you use to print/view OpenBoot
environment variables on a SUN server?
A: #printenv
Q: What does ypwhich command do? (Solaris, AIX, Linux)
A: It displays the NIS server to which the client is connected, and which NIS
server is master for a particular map specified with this command
Q: which command would you use to create an OS user on Solaris and
Linux?
A: Linux # useradd and Solaris #useradd
Q: Which file contains passwords for local users on Solaris, Linux and on AIX?
A: Linux # /etc/shadow , Solaris # /etc/shadow and AIX # /etc/security/passwd
Q: Which command would you use to list partitions on Linux?
A: Linux # mount -l or # df -kh
Q: Which command/commands would you use to manage installed packages
on RedHat Linux?
A: Linux # rpm
Q: What is the default port for SSH server?
A: 22
Q: Which command/commands would you use to manage installed packages
on Solaris?
A: #pkginfo #pkgrm # pkgadd #pkgchk
Q: What command would you use to install an OS patch on Solaris?
A: # patchadd <patch_id> (use # showrev -p or # patchadd -p to list installed patches)
Q: Which Veritas command would you use to display a list of Veritas
volumes?
A: # vxprint
Q: Which Veritas command would you use to display a list of disks on a
system?
A: # vxdisk list

Q: Which file has a list of filesystems to be mounted at boot time on Solaris, Linux and AIX?
A: Linux # /etc/fstab , Solaris # /etc/vfstab and AIX # /etc/filesystems
Q: Which Veritas Cluster Server command would you use to display the
status of a cluster and its resources?
A: # hastatus -summary (clustat and clusvcadm are Red Hat Cluster Suite commands, not VCS)
Q: Which command would you use to rename a disk for VMware Guest
virtual machine on ESX server 3 storage volume?
A: The best way is to clone the VM to a different datastore, or to the same datastore
with a different name:
vmkfstools -i /vmfs/volumes/<datastore>/old_vm.vmdk /vmfs/volumes/<datastore>/new_vm.vmdk
this will take care of it all
Q: Which command would you use on a VMware ESX 3 server to display the virtual
switch configuration?
A: esxcfg-vswitch -l , or use esxcfg-vswitch -h to see all options
1. How do you replace a failed boot disk under Meta in
Solaris? Step by step explanation?
Ans: 1) Check the disks available: #metastat -p
2) Detach the one with errors: #metadetach d1 d2
If you get an error here: #metadetach -f d1 d2
3) Now check the status again: #metastat -p
.........................................................
2. How do you remove Meta only for the root slice?
Remaining slices should run under Meta?
Ans: Run metaroot with the underlying root slice (for example
#metaroot /dev/dsk/c0t0d0s0); this updates /etc/vfstab and /etc/system
so the system boots from the plain slice. Reboot, then metaclear the
root mirror and its submirrors. The remaining metadevices are untouched
and keep running under SVM.
.........................................................
3. What you would do if you want to replace a slice using
metareplace option?
Ans: We usually use metareplace when we have faulty
submirrors; once the disk is replaced, metareplace resyncs the failed
components.
1) Find the meta state databases on the slice
#metadb -i
2) If any meta state databases exist remove them
#metadb -d c0t0d0sX (where X is the slice number)
3) Once the meta state databases are removed we can use
cfgadm to unconfigure the device
#cfgadm -c unconfigure diskname
Once unconfigured replace the disk and configure it as
below
4) #cfgadm -c configure diskname
5) Now copy the Volume Table Of Contents (VTOC) to the new
disk
#prtvtoc /dev/rdsk/devicename | fmthard -s - /dev/rdsk/devicename
6) Once the VTOC is in place now use metareplace command to
replace the faulty meta devices
#metareplace -e d11 devicename (where d11 is the metadevice)
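As a consolidated sketch of the same procedure (the device names c0t0d0,
c0t1d0 and the submirror d11 are only examples; adjust them to your layout):
# metadb -i [locate state database replicas on the failing disk]
# metadb -d c0t0d0s7 [remove any replicas that live on the failing disk]
# cfgadm -c unconfigure c0::dsk/c0t0d0 [then physically swap the disk]
# cfgadm -c configure c0::dsk/c0t0d0
# prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2 [copy the VTOC from the surviving disk]
# metadb -a -c 2 c0t0d0s7 [recreate the replicas that were removed]
# metareplace -e d11 c0t0d0s0 [resync the faulted component]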
...........................................................
4. What is the significance of 51% state database replicas
in SVM?
Ans: A state database is a collection of multiple,
replicated database copies, and each copy is considered a
state database replica.
If the Solaris box loses a state database replica, SVM
has to figure out which state database replicas still
contain valid data and boot using the valid ones. This is
achieved with a majority consensus algorithm: more than
half (half + 1) of the state database replicas must agree
before the data is considered valid.
For that reason we need to create at least three
state database replicas when we set up a disk configuration.
If all three database replicas are corrupted, we effectively
lose all data stored on the SVM volumes.
Hence it is good practice to create several replicas spread across separate
drives and controllers.
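For reference, replicas are created and checked with metadb; the slices
below are only examples:
# metadb -a -f -c 2 c0t0d0s7 [-f creates the initial replicas, two copies on this slice]
# metadb -a -c 2 c1t0d0s7 [two more copies on a slice on a second controller]
# metadb -i [verify the replica status]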
..........................................................
5. What are the common errors you find in Solaris Volume
manager?
Ans: 1) Disk failures (boot device failures) 2)
Insufficient state database replica issues 3) Wrong
entries in the /etc/vfstab file
..........................................................
6. You have a boot disk under svm, the machine fails to
boot and remains in ok prompt? what could be the possible
reason?
Ans: 1) There may be an issue with the /etc/system file 2) Root file
system corrupted 3) Wrong entries in the /etc/vfstab file


..........................................................
7. metastat -p shows that a metavolume needs replacement. The
metavolume is a one-way mirror only. You actually find that the
disk and metavolumes are OK and I/O is happening to the
file systems. How will you remove the replacement message
that comes out of metastat?
Ans: Attach a second submirror to make it a two-way mirror and check metastat -p again.
(If you still see the message, detach and re-attach the submirror
using metadetach and metattach.)
..........................................................
8. How to create a shared disk group in VxVM?
Ans: Creating a shared DG from an existing DG:
1) List all the DGs available: #vxdg list
2) Find out which node is master or slave: #vxdctl -c mode
3) Deport the disk group to be shared: #vxdg deport <dg name>
4) Import the DG as shared: #vxdg -s import <dg name> (do this on the master node)
5) Check the shared disk groups: #vxdg list
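For example, with a hypothetical disk group named appdg, the sequence
would be (run the import on the master node):
#vxdctl -c mode [identify which node is the CVM master]
#vxdg deport appdg [deport the private disk group]
#vxdg -s import appdg [re-import it as a shared disk group, on the master]
#vxdg list [the disk group should now be flagged as shared]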
...........................................................
9. What is the difference between private and public
regions in Veritas Volume manager?
Ans: Private region: It holds metadata about the disk. A
copy of the disk group's configuration database is kept in each
private region within the disk group. It keeps 5 copies of
the configuration database. It is configured as slice 3.
Once the private region is created its size cannot be changed.
Public region: This is the area where all user data is
stored. Usually it is configured as slice 4.
...........................................................
10. What would you do if the private region of a
particular disk group is full? What are the design
considerations for the size of the private region in a VxVM disk
group?
Ans: Reinitialize the disk with a new private region length:
#vxdisksetup -i c1t0d0 privlen=2048
(the default is 1024)
...........................................................
11. How to replace a corrupt private region? in vxvm 3.5
and greater versions
Ans: This link might give you some information:
http://www.symantec.com/business/support/index?page=content&id=TECH1197
...........................................................
12. How would you convert a volume from gen to fsgen? why
should you do that?
Ans: 1) #vxprint -g dgname -rhmvps simplevol > simplevol.vxout
2) Open simplevol.vxout in your favourite editor and
change the use_type field alone from gen to fsgen.
3) Save the file; make sure you edit only the use_type field.
4) Unmount the filesystem: umount /simplevol
5) Stop the volume: vxvol -g dgname stop simplevol
6) Remove the volume: vxedit -g dgname -rf rm simplevol
7) Rebuild the volume from the saved file using vxmake:
vxmake -g dgname -d simplevol.vxout
8) Check the vxprint output: vxprint -g dgname -hrt ; check
the usage type, it should now be fsgen
9) Start the volume: vxvol -g dgname start simplevol
10) Run fsck on the volume: fsck -y /dev/vx/rdsk/dgname/simplevol
11) Mount the filesystem: mount /simplevol
gen assumes the volume does not contain a filesystem and fsgen
assumes the volume contains a file system. vxassist uses
fsgen as the default usage type and vxmake uses gen.
...........................................................

13. How can you unencapsulate a boot disk in VxVM?


Ans:
http://www.brandonhutchinson.com/Unencapsulating_a_root_disk.html
...........................................................
14. How to identify multiple paths for a disk.
Ans: #vxdisk list <disk name>
...........................................................
15. What is the difference between Vxdmp and EMC powerpath?
Ans: vxdmp: load balancing is done using a round-robin
approach across the paths.
EMC PowerPath: it balances by sending I/O down the path that is
least loaded.
PowerPath will fail back to a path once it becomes available again.
With the older versions of DMP we were using on Solaris, the failed
path had to be brought back manually; this may just have been a
limitation on Solaris. We ran PowerPath on AIX and DMP on Sun.
...........................................................
16. vxdisk -o alldgs list o/p shows some disk groups in
braces What does that signify?
Ans: Those are disk groups that are seen on the disks but are not
currently imported on this host (for example, deported disk groups).
...........................................................
17. What are the various layouts that are available in
VxVM?
Ans: 1) Mirroring (RAID-1)
2) Striping (RAID-0)
3) Concatenation and Spanning
4) Striping plus Mirroring (Mirrored-stripe or RAID
0+1)
5) Mirroring plus Striping (Striped mirror, RAID
1+0, OR RAID-10)

6) RAID-5 (Striping with parity).


............................................................
18. What is a layered volume? How to create it using
vxmake?
Ans: Using vxmake we need to build each object ourselves until we
end up with a volume; it is much easier to use vxassist instead of
vxmake.
1) Create the subdisk: #vxmake -g dgname sd sdname
2) Create the plex: #vxmake -g dgname plex plexname sd=sdname
3) Create the volume: #vxmake -g dgname -U fsgen vol volname plex=plexname
...........................................................
19. How to quickly mirror a volume, if the volume is empty?
Ans: With DRL: #vxassist -g <diskgroup> make <volname>
<length> layout=mirror,log <diskname> <diskname>
Without DRL: #vxassist -g <diskgroup> make <volname>
<length> layout=mirror,nolog <diskname> <diskname>
...........................................................
20. How to grow a volume?
Ans: First check how much space you can increase using
#vxassist -g <diskgroup> maxgrow <volume>
After checking how much you can increase we can use the
below commands to increase volume
#vxassist -g <diskgroup> growby <volume> <len_to_grow_by>
or
#vxassist -g <diskgroup> growto <volume> <new_len>
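For example, with a hypothetical disk group datadg and volume datavol
(vxresize, if installed, also grows a VxFS file system on the volume at
the same time):
#vxassist -g datadg maxgrow datavol [shows the maximum size the volume can grow to]
#vxassist -g datadg growby datavol 2g [grow the volume by 2 GB]
#/etc/vx/bin/vxresize -g datadg datavol +2g [alternative that also grows the file system]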
...........................................................
21. What is the difference between failing and failed
disks?
Ans:
Failing disk

1) It shows read/write errors in /var/adm/messages.
2) As time passes we can see an increasing number of hard
and transport errors when checked with iostat -En.
3) We can still see the disk when format is used.
Failed disk
1) It shows a "disk not responding to selection" message.
2) It only shows increasing transport errors.
3) The format command displays a "disk not available" message.
...........................................................
22. How to replace a failed disk in Veritas?
Ans: It can be replaced using vxdiskadm by selecting option 5
(Replace a failed or removed disk).
23. Plex is in a disabled state. How will you recover? what
are the steps to follow?
Ans: 1) Place the good plex in the CLEAN state: #vxmend -g dg fix clean plexname
2) To recover the other plexes in the volume from the CLEAN plex:
#vxmend -g dg fix stale plexname
3) Start the volume from the CLEAN plex: #vxvol -g dg start volume
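A worked example of the same steps, assuming a hypothetical disk group
datadg, volume datavol, a good plex datavol-01 and a disabled plex datavol-02:
#vxmend -g datadg fix clean datavol-01 [mark the known-good plex CLEAN]
#vxmend -g datadg fix stale datavol-02 [mark the other plex STALE so it resyncs from the CLEAN plex]
#vxvol -g datadg start datavol [start the volume; stale plexes are recovered from the clean one]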
...........................................................
24. what is the difference between detached and
disassociate state of plexes?
Ans: Source: Symantec Docs
Detach State
Detaching a plex leaves the plex associated with its
volume, but prevents normal volume I/O from being directed
to the plex. This operation can be applied to plexes that
are enabled or disabled. The rules for performing the
detach depend upon the usage types of the volumes involved.
The operation does not apply to dissociated plexes.
Disassociate State
Dissociate each of the named plexes. Dissociating a plex
breaks the link between the plex and its volume. A
dissociated plex is inaccessible until it is reassociated,

which can be done either with vxplex att or with vxmake.


Any checks and synchronizations that apply to the det
operation also apply to the dis operation.
Plex dissociation is the normal means of unmirroring a
volume, or reducing the mirror count for a volume. To
support this use, -o rm can be used to dissociate and
remove the plex (and its associated subdisks) in the same
operation. This makes the space used by those subdisks
usable for new allocations (such as with vxassist or with
vxmake).
Plex dissociation can also be used for file system backups
of volumes that are normally mirrored. Plex devices are not
directly mountable, so the backup method described for the
det operation will not work if the backup program requires
a mounted file system. To support such backup programs, a
plex can be dissociated and can then be allocated to a new
volume, such as with the command:
vxmake -U gen vol volume plex=plex
The created volume can then be started and mounted for use
by the backup program.
...........................................................
25. Whats the boot process of VxVM?
Ans: During the Solaris boot process the kernel reads the /etc/system
file; if the system is supposed to boot from Veritas volumes, the two
lines below must be present in /etc/system so that it boots using the
Veritas root device:
1. rootdev:/pseudo/vxio@0:0
2. set vxio:vol_rootdev_is_volume=1
...........................................................
26. Whats the difference between SVM and VxVM? What would
you recommend to your clients? why?
Ans:
SVM
1) Comes by default with Solaris 9/10
2) We cannot shrink a volume in SVM
VxVM
1) Third-party software that has to be installed separately
2) We can shrink a volume in VxVM
............................................................
27. What are the various clusters you have worked on?
Ans: You may not have worked on all of them, but you can give the
answer below for whichever of them you have worked on:
1) Sun Cluster
2) Veritas Cluster
3) HACMP (High Availability Cluster Multiprocessing)
...........................................................
28. Which cluster is better VCS or Sun cluster?
Ans: It depends on personal preference and on the budget available
to implement the cluster.
............................................................
29. Compare and contrast VCS and Sun Cluster.
Ans: Source: azlabs blog
Sun Cluster
1) Kernel-based; faster in failure detection
2) It runs only on Solaris systems (platform dependent)
Veritas Cluster
1) Software-based
2) Works on multiple OSes (platform independent)
...........................................................
30. How will you start the VCS service? What are the
configuration files in VCS?
Ans: To start an agent: #haagent -start agent_name -sys system_name
To start the cluster: #hastart
Configuration file: /etc/VRTSvcs/conf/config/main.cf
...........................................................
31. How would you switch a service group?
Ans: #hagrp -switch <service group name> -to <system name>
............................................................
32. How would you freeze a service group?
Ans: #haconf -makerw
#hagrp -freeze <group name> -persistent
#haconf -dump -makero
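To unfreeze the group again once maintenance is done, the same pattern applies:
#haconf -makerw
#hagrp -unfreeze <group name> -persistent
#haconf -dump -makero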
...........................................................
33. What is a Split brain scenario ?
Ans: The situation where two or more cluster nodes cannot
communicate with each other and each node thinks that it
owns the resources is called a split-brain scenario.
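The heartbeat and membership state that split brain depends on can be
checked with the commands used elsewhere in this document:
#lltstat -nvv [shows the state of each LLT link to every node]
#gabconfig -a [port a (GAB) and port h (HAD) memberships should list all cluster nodes]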


VCS Troubleshooting
Cluster Not Up -- HELP
The normal debugging steps include: checking the status, restarting
if there are no faults, checking licenses, clearing faults if needed, and checking
logs.
To find out Current Status:
/opt/VRTSvcs/bin/hastatus -summary
This will give the general status of each machine and processes
/opt/VRTSvcs/bin/hares -display
This gives much more details - down to the resource level.
If hastatus fails on both machines (it returns that the cluster is not up
or returns nothing), try to start the cluster
/opt/VRTSvcs/bin/hastart
/opt/VRTSvcs/bin/hastatus -summary
will tell you if processes started properly. It will NOT start processes on
a FAULTED system.
Starting Single System NOT Faulted
If the system is NOT FAULTED and only one system is up, the cluster
probably needs to have gabconfig manually started. Do this by
running:
/sbin/gabconfig -c -x
/opt/VRTSvcs/bin/hastart
/opt/VRTSvcs/bin/hastatus -summary
If the system is faulted, check licenses and clear the faults as
described next.
To check licenses:
vxlicense -p

Make sure all licenses are current - and NOT expired! If they are
expired, that is your problem. Call VERITAS to get temporary licenses.
There is a BUG with Veritas licenses. Veritas will not run if there are
ANY expired licenses -- even if you have the valid ones you need. To
get Veritas to run, you will need to MOVE the expired licenses. [Note:
you will minimally need VXFS, VxVM and RAID licenses to NOT be
expired from what I understand.]
vxlicense -p
Note the NUMBER after the license (ie: Feature name:
DATABASE_EDITION [100])
cd /etc/vx/elm
mkdir old
mv lic.number old [do this for all expired licenses]
vxlicense -p [Make sure there are no expired licenses AND your good
licenses are there]
hastart
If it still fails, call Veritas for temporary licenses. Otherwise, be certain to do
the same on your second machine.
To clear FAULTS:
hares -display
For each resource that is faulted run:
hares -clear resource-name -sys faulted-system
If all of these clear, then run hastatus -summary and make sure that
these are clear. If some don't clear you MAY be able to clear them on
the group level. Only do this as last resort:
hagrp -disableresources groupname
hagrp -flush group -sys sysname
hagrp -enableresources groupname
To get a group to go online:
hagrp -online group -sys desired-system

If it did NOT clear, did you check licenses?


Bringing up Machines when fault will NOT clear:
System has the following EXACT status:
gedb002# hastatus -summary
-- SYSTEM STATE
-- System State Frozen
A gedb001 RUNNING 0
A gedb002 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B oragrp gedb001 Y N OFFLINE
B oragrp gedb002 Y N OFFLINE
gedb002# hares -display | grep ONLINE
nic-qfe3 State gedb001 ONLINE
nic-qfe3 State gedb002 ONLINE
gedb002# vxdg list
NAME STATE ID
rootdg enabled 957265489.1025.gedb002
gedb001# vxdg list
NAME STATE ID
rootdg enabled 957266358.1025.gedb001
Recovery Commands:
hastop -all
on one machine hastart
wait a few minutes
on other machine hastart
Reviewing Log Files:

If you are still having troubles, look at the logs in /var/VRTSvcs/log.


Look at the most recent ones for debugging purposes (ls -ltr). Here is
a short description of the logs in /var/VRTSvcs/log:
hashadow-log_A: hashadow checks to see if the ha cluster daemon
(had) is up and restarts it if needed. This is the log of that process.
engine.log_A: primary log, usually what you will be reading for
debugging
Oracle_A: oracle process log (related to cluster only)
Sqlnet_A: sqlnet process log (related to cluster only)
IP_A: related to shared IP
Volume_A: related to Volume manager
Mount_A: related to mounting the actual filesystems
DiskGroup_A: related to Volume Manager/Cluster Server
NIC_A: related to actual network device
By looking at the most recent logs, you can know what failed last (or
most recently). You can also tell what did NOT run, which may be just as
much of a clue. Of course, if none of this helps, open a call with Veritas
tech support.
Calling Tech Support:
If you have tried the previously described debugging methods, call
Veritas tech support: 800-634-4747. Your company needs to have a
Veritas support contract.
Restarting Services:
If a system is gracefully shutdown and it was running oracle or other
high availability services, it will NOT transfer them. It only transfers
services when the system crashes or has an error.
hastart
hastatus -summary
will tell you if processes started properly. It will NOT start processes on
a FAULTED system. If the system is faulted, clear the faults as
described above.
Doing Maintenance on DBs:

BEFORE working on DB
Run hastop -all -force
AFTER working on Dbs:
You MUST bring up Oracle on the same machine.
Once Oracle is up, run:
hastart on the same machine you did the work on (the first
system, with Oracle running)
wait 3-5 minutes, then run hastart on the other system
If you need the instance to run on the other system, you can run:
hagrp -switch oragrp -to othersystem
Shutting down db machines:
If you shutdown the machine that is running veritas cluster, it will NOT
start on the other machine. It only fails over if the machine crashes.
You need to manually switch the services if you shutdown the
machine. To switch processes:
Find out groups to transfer over
hagrp -display
Switch over each group
hagrp -switch group-to-move -to new-system
Then shutdown machine as desired. When rebooted will start cluster
daemon automatically.
Doing Maintenance on Admin Network:
If the admin network is brought down (that the veritas cluster uses),
veritas WILL fault both machines AND bring down oracle (nicely). You
will need to do the following to recover:
hastop -all
On ONE machine: hastart
wait 5 minutes
On other machine: hastart
Manual start/stop WITHOUT veritas cluster:
THIS IS ONLY USED WHEN THERE ARE DB FAILURES
If possible, use the section on DB Maintenance. Only use this if system

fails on coming up AND you KNOW that it is due to a db configuration


error. If you manually startup filesystems/oracle -- manually shut
them down and restart using hastart when done.
To startup:
Make sure ONLY rootdg volume group is active on BOTH NODEs. This
is EXTREMELY important as if it is active on both nodes corruption
occurs. [ie. oradg or xxoradg is NOT present]
vxdg list
hastatus (stop on both as you are faulted on both machines )
hastop -all (if either was active make sure you are truly shutdown!)
Once you have confirmed that the oracle datagroup is not active, on
ONE machine do the following:
vxdg import oradg [this may be xxoradg where xx is the client 2 char
code]
vxvol -g oradg startall
mount -F vxfs /dev/vx/dsk/oradg/name /mountpoint [Find volumes
and mount points in /etc/VRTSvcs/conf/config/main.cf]
Let DBAs do their stuff
To shutdown:
umount /mountpoint [for each mountpoint]
vxvol -g oradg stopall
vxdg deport oradg
clear faults; start cluster as described above



Sample Exercise to Create a High Availability Service
Group in VCS:
Sample Process Script:
Copy and Save the below script under your mount point (Eg:
/mohi/loopy)
#!/bin/ksh

# Loopy script for VCS class.


#############################
#
# $1 is Service Group name
# $0 is name of shell script being executed
#
while true
do
echo `date` ${1} Loopy is alive >> ${0}out
sleep 4
echo `date` ${1} Loopy is still alive >> ${0}out
sleep 4
done

Service Group Name: mohisg


Participating Nodes: sys11 and sys12
Resources:
Network Resources:
NIC Resource Name: mohinic
IP Resource Name: mohiip
Disk Resources:
DiskGroup Name: mohidg
Volume Name: mohivol
Mount Point: mohimount
Process Resource:
Process Name: mohiprocess
Creating Service Group.
+++++++++++++++++
# haconf -makerw
# hagrp -add mohisg
# hagrp -modify mohisg SystemList sys11 0 sys12 1
# hagrp -modify mohisg AutoStartList sys11
# hagrp -display mohisg
# haconf -dump
# view /etc/VRTSvcs/conf/config/main.cf

Adding a NIC resource to a service group


+++++++++++++++++++++++++++
# hares -add mohinic NIC mohisg
# hares -modify mohinic Critical 0
# hares -modify mohinic Device hme0
# hares -modify mohinic NetworkHosts 192.168.1.11
# hares -modify mohinic Enabled 1
# hares -state mohinic
Adding a IP resource
++++++++++++++++
# hares -add mohiip IP mohisg
# hares -modify mohiip Critical 0
# hares -modify mohiip Device hme0
# hares -modify mohiip Address 192.168.1.92
# hares -modify mohiip Address 192.168.1.93
# hares -modify mohiip Enabled 1
# hares -online mohiip -sys sys11
# hares -state mohiip
# hastatus -sum
# haconf -dump
Adding a DiskGroup resource
++++++++++++++++++++++
# hares -add mohidg DiskGroup mohisg
# hares -modify mohidg Critical 0
# hares -modify mohidg DiskGroup mohidg
# hares -modify mohidg Enabled 1
# hares -online mohidg -sys sys11
# hares -state mohidg
# vxdg list | grep mohidg
# haconf -dump
# view main.cf
Adding Volume resource
++++++++++++++++++
# hares -add mohivol Volume mohisg
# hares -modify mohivol Critical 0
# hares -modify mohivol Volume mohivol

# hares -modify mohivol DiskGroup mohidg


# hares -modify mohivol Enabled 1
# hares -display mohivol
# hares -online mohivol -sys sys11
# hares -state mohivol
# vxprint -g mohidg
# haconf -dump
Adding a MountPoint resource
+++++++++++++++++++++++
# hares -add mohimount Mount mohisg
# hares -modify mohimount Critical 0
# hares -modify mohimount MountPoint /mohi
# hares -modify mohimount BlockDevice /dev/vx/dsk/mohidg/mohivol
# hares -modify mohimount FSType vxfs
# hares -modify mohimount FsckOpt %-y
# hares -modify mohimount Enabled 1
# hares -display mohimount
# hares -online mohimount -sys sys11
# hares -state mohimount
# haconf -dump
# view main.cf
Adding a Process Resource
++++++++++++++++++++
# hares -add mohiprocess Process mohisg
# hares -modify mohiprocess Critical 0
# hares -modify mohiprocess PathName /bin/sh
# hares -modify mohiprocess Arguments "/mohi/loopy mohisg"
# hares -modify mohiprocess Enabled 1
# hares -display mohiprocess
# hares -online mohiprocess -sys sys11
# hares -state mohiprocess
# ps -ef | grep loopy
# haconf -dump
# view main.cf
Linking Resources in the service group
+++++++++++++++++++++++++++++
# hares -link mohiip mohinic

# hares -link mohivol mohidg


# hares -link mohimount mohivol
# hares -link mohiprocess mohiip
# hares -link mohiprocess mohimount
# hares -dep | grep mohisg
# haconf -dump -makero
Testing the Service Group
++++++++++++++++++++
#hastatus -sum
#hagrp -switch mohisg -to sys12
#hagrp -state mohisg
#hagrp -switch mohisg -to sys11
#hagrp -state mohisg
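After the links above, the dependency section at the bottom of main.cf
should look roughly like this (a sketch derived from the hares -link
commands in this exercise):
mohiip requires mohinic
mohivol requires mohidg
mohimount requires mohivol
mohiprocess requires mohiip
mohiprocess requires mohimount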

Things to Remember
Service Group: Collection of dependent Resources
Resource: Anything that the end user requires
Resource Type: Collection of resources of the same type
Agents: To manage the Resource Types (Start, Stop and Monitor)
Service Group Online: Child Resource to Parent Resource
Service Group Offline: Parent Resource to Child Resource
LLT Files
/etc/llthosts
/etc/llttab
GAB Files:
/etc/gabtab
Manipulating Service Groups:
1. hagrp -offline AppSG -sys S1 -localclus --> Offline the AppSG only
in S1 system (node)
2. hagrp -offline OracleSG -any --> Offline the OracleSG in all the
systems
3. hagrp -online AppSG -sys S2 -localclus --> Online the AppSG in
node S2
4. hagrp -switch AppSG -to S1 --> AppSG will be moved to node S1
Manipulating Resources:

1. hares -offline Oralistener -sys S3 --> Bring offline the Oralistener


resource in node S3
2. hares -online ipres -sys S2 -> Bring online the ipres resource in
node S2
Handling VCS services:
haconf -dump -makero --> syncs the in-memory main.cf to the on-disk
main.cf and makes the configuration read-only
hastop -all --> Stop the applications and the cluster
hastop -all -force --> the applications continue running but the
cluster service is stopped
hastop -local --> stop the cluster service on the local node

Useful Commands
SERVICE GROUPS AND RESOURCE OPERATIONS:
Configuring service groups
hagrp -add|-delete|-online|-offline group_name
Modifying resources
hares -add|-delete res_name type group
hares -online|-offline res_name -sys system_name
Modifying agents
haagent -start|-stop agent_name -sys system_name
BASIC CONFIGURATION OPERATIONS:
Service Groups
hagrp -modify group_name attribute_name value
hagrp -list
hagrp -value group_name attribute_name
Resources
hares -modify res_name attribute_name value
hares -link res_name res_name
Agents
haagent -display agent_name -sys system_name
hatype -modify

VCS ENGINE OPERATIONS:


Starting had
hastart [-force|-stale]
hasys -force system_name
Stopping had
hastop -local|-all|-force|-evacuate
hastop -sys system_name
Adding Users
hauser -add user_name
STATUS AND VERIFICATION:
Group Status/Verification
hagrp -display group_name | hagrp -state group_name | hagrp -resources group_name
Resources Status/Verification
hares -display res_name
hares -list
hares -probe res_name -sys system_name
Agents Status/Verification
haagent -list
haagent -display agent_name -sys system_name
ps -ef | grep agent_name
VCS Status
hastatus -group group_name
LLT Status/Verification
lltconfig -a list
lltstat | lltshow | lltdump
GAB Status/Verification
gabconfig -a
gabdiskhb -l
COMMUNICATION:
Starting and Stopping LLT
lltconfig -U
lltconfig -c
lltconfig -a list
Starting and Stopping GAB

gabconfig -c -n seed_number (eg: gabconfig -c -n 2)
gabconfig -U
Administering Group Services
hagrp -clear|-flush|-switch group_name -sys system_name
Administering Resources
hares -clear|-probe res_name -sys system_name
Administering Agents
haagent -list
haagent -display agent_name -sys system_name
Verify Configuration
hacf -verify /etc/VRTSvcs/conf/config

VCS/VxVM vs HACMP/AIX/LVM
(Comparison table not reproduced in this text version.)

VCS Concepts
Concepts
VCS is built on three components: LLT, GAB, and HAD.
LLT (Low-Latency Transport)
Veritas uses a high-performance, low-latency protocol for cluster
communications. LLT runs directly on top of the data link provider
interface (DLPI) layer over Ethernet and has several major functions:
sending and receiving heartbeats
monitoring and transporting network traffic over multiple network
links to every active system within the cluster
load-balancing traffic over multiple links
maintaining the state of communication
providing a nonroutable transport mechanism for cluster
communications.
Group membership services/Atomic Broadcast (GAB)
GAB provides the following:
Group Membership Services - GAB maintains the overall cluster
membership by way of its Group Membership Services function.
Heartbeats are used to determine if a system is an active member,
joining or leaving a cluster. GAB determines the position of a
system within the cluster.
Atomic Broadcast - Cluster configuration and status information is
distributed dynamically to all systems within the cluster using GAB's
Atomic Broadcast feature. Atomic Broadcast ensures all active systems
receive all messages, for every resource and service group in the
cluster. Atomic means that all systems receive the update; if one fails,
then the change is rolled back on all systems.
High Availability Daemon (HAD)
The HAD tracks all changes within the cluster configuration and
resource status by communicating with GAB. Think of HAD as the
manager of the resource agents. A companion daemon called
hashadow monitors HAD and if HAD fails hashadow attempts to restart
it. Likewise, if the hashadow daemon dies, HAD will restart it. HAD
maintains the cluster state information. HAD uses the main.cf file to
build the cluster information in memory and is also responsible for
updating the configuration in memory.
VCS architecture
So putting the above altogether we get:
Agents monitor resources on each system and provide status to HAD
on the local system
HAD on each system sends status information to GAB
GAB broadcasts configuration information to all cluster members
LLT transports all cluster communications to all cluster nodes
HAD on each node takes corrective action, such as failover, when
necessary
Service Groups
There are three types of service groups:
Failover - The service group runs on one system at any one time.
Parallel - The service group can run simultaneously on more than
one system at any time.
Hybrid - A hybrid service group is a combination of a failover service
group and a parallel service group used in VCS 4.0 replicated data

clusters, which are based on Veritas Volume Replicator.


When a service group appears to be suspended while being brought
online you can flush the service group to enable corrective action.
Flushing a service group stops VCS from attempting to bring resources
online or take them offline and clears any internal wait states.
Resources
Resources are objects that relate to hardware and software; VCS
controls these resources through these actions:
Bringing resource online (starting)
Taking resource offline (stopping)
Monitoring a resource (probing)
When you link a parent resource to a child resource, the dependency
becomes a component of the service group configuration. You can view
the dependencies at the bottom of the main.cf file.
Proxy Resource
A proxy resource allows multiple service groups to monitor the same
network interface. This reduces the network traffic that would result
from having multiple NIC resources in different service groups
monitoring the same interface.
Example for Proxy Resource:
Proxy PreProd_proxy (
Critical = 0
TargetResName = PreProd_MultiNICB
)
Phantom Resource
The phantom resource is used to report the actual status of a service
group that consists of only persistent resources. A service group shows
an online status only when all of its nonpersistent resources are online.
Therefore, if a service group has only persistent resources (network
interface), VCS considers the group offline, even if the persistent
resources are running properly. By adding a phantom resource, the
status of the service group is shown as online.
Example for Phantom:

Phantom Phantom_NIC (
)
Configuration
VCS configuration is fairly simple. The three configurations to worry
about are LLT, GAB, and VCS resources.
LLT
LLT configuration requires two files: /etc/llttab and /etc/llthosts. llttab
contains information on node-id, cluster membership, and heartbeat
links. It should look like this:
# llttab -- low-latency transport configuration file
# this sets our node ID, must be unique in cluster
set-node 0
# set the heartbeat links
link hme1 /dev/hme:1 - ether - -
# link-lowpri is for public networks
link-lowpri hme0 /dev/hme:0 - ether - -
# set cluster number, must be unique
set-cluster 0
start
The "link" directive should only be used for private links. "link-lowpri"
is better suited to public networks used for heartbeats, as it uses less
bandwidth. VCS requires at least two heartbeat signals (although one
of these can be a communication disk) to function without complaints.
The "set-cluster" directive tells LLT which cluster to listen to. The llttab
needs to end in "start" to tell LLT to actually run.
The second file is /etc/llthosts. This file is just like /etc/hosts, except
instead of IP->hostnames, it does llt node numbers (as set in set-node). You need this file for VCS to start. It should look like this:
0 daldev05
1 daldev06
GAB
GAB requires only one configuration file, /etc/gabtab. This file lists the
number of nodes in the cluster and also, if there are any
communication disks in the system, configuration for them. Ex:
/sbin/gabconfig -c -n2
tells GAB to start GAB with 2 hosts in the cluster. To specify VCS
communication disks:

/sbin/gabdisk -a /dev/dsk/cXtXdXs2 -s 16 -p a
/sbin/gabdisk -a /dev/dsk/cXtXdXs2 -s 144 -p h
/sbin/gabdisk -a /dev/dsk/cYtYdYs2 -s 16 -p a
/sbin/gabdisk -a /dev/dsk/cYtYdYs2 -s 144 -p h
-a specifies the disk, -s specifies the start block for each
communication region, and -p specifies the port to use, "a" being the
GAB seed port and "h" the VCS port. The ports are the same as the
network ports used by LLT and GAB, but are simulated on a disk.
VCS
The VCS configuration file(s) are in /etc/VRTSvcs/conf/config. The
two most important files are main.cf and types.cf. I like to set
$VCSCONF to that directory to make my life easier. main.cf contains
the actual VCS configuration for Clusters, Groups, and Resources,
while types.cf contains C-like prototypes for each possible Resource.
The VCS configuration is very similar to the C language, but all you are
doing is defining variables. Comments are "//" (if you try to use #'s,
you'll be unhappy with the result), and you can use "include"
statements if you want to break up your configuration to make it more
readable. One file you must include is types.cf.
In main.cf, you need to specify a Cluster definition:
cluster iMS ( )
You can specify variables within this cluster definition, but for the most
part, the defaults are acceptable. Cluster variables include maximum
number of groups per cluster, link monitoring, log size, maximum
number of resources, maximum number of types, and a list of user
names for the GUI that you will never use and shouldn't install.
You then need to specify the systems in the cluster:
system daldev05 ( )
system daldev06 ( )
These systems must be in /etc/llthosts for VCS to start.
You can also specify SNMP settings for VCS:
snmp vcs (
Enabled = 1
IPAddr = 0.0.0.0
TrapList = { 1 = "A new system has joined the VCS Cluster",
2 = "An existing system has changed its state",
3 = "A service group has changed its state",
4 = "One or more heartbeat links has gone down",
5 = "An HA service has done a manual restart",
6 = "An HA service has been manually idled",
7 = "An HA service has been successfully started" }
)
IPAddr is the IP address of the trap listener. Enabled defaults to 0, so

you need to include this if you want VCS to send traps. You can also
specify a list of numerical traps; listed above are the VCS default
traps.
Each cluster can have multiple Service Group definitions. The most
basic Service Group looks like this:
group iMS5a (
SystemList = { daldev05, daldev06 }
AutoStartList = { daldev05 }
)
You can also set the following variables (not a complete list):
FailOverPolicy - you can set which policy is used to determine which
system to fail over to, choose from Priority (numerically based on
node-id), Load (system with the lowest system load gets failover), or
RoundRobin (system with the least number of active services is
chosen).
ManualOps - whether VCS allows manual (CLI) operation on this
Group
Parallel - indicates if the service group is parallel or failover
Inside each Service Group you need to define Resources. These are
the nuts and bolts of VCS. A full description of the bundled Resources
can be found in the Install Guide and a full description of the
configuration language can be found in the User's Guide.
Here are a couple of Resource examples:
NIC networka (
Device = hme0
NetworkType = ether
)
IP logical_IPa (
Device = hme0
Address = "10.10.30.156"
)
The first line begins with a Resource type (e.g. NIC or IP) and then a
globally unique name for that particular resource. Inside the paren
block, you can set the variables for each resource.
Once you have set up resources, you need to build a resource
dependency tree for the group. The syntax is "parent_resource requires
child_resource." A dependency tree for the above resources would
look like this:
logical_IPa requires networka
The dependency tree tells VCS which resources need to be started
before other resources can be activated. In this case, VCS knows that
the NIC hme0 has to be working before resource logical_IPa can be
started. This works well with things like volumes and volume groups;
without a dependency tree, VCS could try to mount a volume before
importing the volume group. VCS deactivates all VCS-controlled
resources when it shuts down, so all virtual interfaces (resource type
IP) are unplumbed and volumes are unmounted/exported at VCS
shutdown.
Once the configuration is built, you can verify it by running
/opt/VRTSvcs/bin/hacf -verify and then you can start VCS by running
/opt/VRTSvcs/bin/hastart.
Commands and Tasks
Here are some important commands in VCS. They are in
/opt/VRTSvcs/bin unless otherwise noted. It's a good idea to set
your PATH to include that directory.
Manpages for these commands are all installed in /opt/VRTS/man.
hastart starts VCS using the current seeded configuration.
hastop stops VCS. -all stops it on all VCS nodes in the cluster, -force
keeps the service groups up but stops VCS, and -local stop VCS on the
current node, and -sys systemname stop VCS on a remote system.
hastatus shows VCS status for all nodes, groups, and resources. It
waits for new VCS status, so it runs forever unless you run it with the -summary option.
/sbin/lltstat shows network statistics (for only the local host) much
like netstat -s. Using the -nvv option shows detailed information on all
hosts on the network segment, even if they aren't members of the
cluster.
/sbin/gabconfig sets the GAB configuration just like in /etc/gabtab.
/sbin/gabconfig -a show current GAB port status. Output should look
like this:
daldev05 # /sbin/gabconfig -a
GAB Port Memberships
===========================================
====================
Port a gen f6c90005 membership 01
Port h gen 3aab0005 membership 01
The last digits in each line are the node IDs of the cluster members.
Any mention of "jeopardy" ports means there's a problem with that
node in the cluster.
haclus displays information about the VCS cluster. It's not
particularly useful because there are other, more detailed tools you
can use:
hasys controls information about VCS systems. hasys -display shows
each host in the cluster and its current status. You can also use this to
add, delete, or modify existing systems in the cluster.
hagrp controls Service Groups. It can offline, online (or swing)

groups from host to host. This is one of the most useful VCS tools.
hares controls Resources. This is the finest granular tool for VCS, as
it can add, remove, or modify individual resources and resource
attributes.
Here are some useful things you can do with VCS:
Activate VCS: run "hastart" on one system. All members of the
cluster will use the seeded configuration. All the resources come up.
Swing a whole Group administratively:
Assuming the system you're running GroupA on is sysa, and you want
to swing it to sysb
hagrp -switch GroupA -to sysb
Turn off a particular resource (say, ResourceA on sysa):
hares -offline ResourceA -sys sysa
In a failover Group, you can only online the resource on system on
which the group is online, so if ResourceA is a member of GroupA, you
can only bring ResourceA online on the system that is running GroupA.
To online a resource:
hares -online ResourceA -sys sysa
If you get a fault on any resource or group, you need to clear the Fault
on a system before you can bring that resource/group up on it. To
clear faults:
hagrp -clear GroupA
hares -clear ResourceA
Caveats
Here are some tricks for VCS:
VCS likes to have complete control of all its resources. It brings up all
its own virtual interfaces, so don't bother to do that in your init scripts.
VCS also likes to have complete control of all the Veritas volumes and
groups, so you shouldn't mount them at boot. VCS will fail to mount a
volume unless it is responsible for importing the Volume Group; if you
import the VG and then start VCS, it will fail after about 5 minutes and
drop the volume without cleaning the FS. So make sure all VCS-controlled VGs are exported before starting VCS.
Resource and Group names have no scope in VCS, so each must be a
unique identifier or VCS will fail to load your new configuration. There
is no equivalent to perl's my or local. VCS is also very case sensitive,
so all Types, Groups, Resources, and Systems must be the same every
time. To make matters worse, most of the VCS bundled types use
random capitalization to try to fool you. Copy and paste is your friend.
Make sure to create your resource dependency tree before you start
VCS, or it will create problems for your whole cluster.
The default time-out for LLT/GAB communication is 15 seconds. If VCS
detects a system is down on all communications channels for 15

seconds, it fails all of that system's resource groups over to a new


system.
If you use Veritas VM, VCS can't manage volumes in rootdg, so what I
do is encapsulate the root disk into rootdg and create new volumes in
their own VCS-managed VGs. Don't put VCS and non-VCS volumes in
the same VG.
Don't let VCS manage non-virtual interfaces. I did this in testing, and if
you fail a real interface, VCS will unplumb it, fail it over to a virtual on
the fail-over system. Then when you try to swing it back, it will fail.
Notes on how the configuration is loaded
Because VCS doesn't have any determination of primary/slave for the
cluster, VCS needs to determine who has the valid configuration for
the cluster. As far as I can tell (because of course it's not
documented), this is how it works: When VCS starts, GAB waits a
predetermined timeout for the number of systems in /etc/gabtab to
join the cluster. At this point, all the systems in the cluster compare
local configurations, and the system with the newest config tries to
load it. If it's invalid, it pulls down the second newest valid config. If it
is valid, all the systems in VCS load that config.

VCS Intro
Veritas Cluster Server:
Veritas Cluster Server is the industry's leading cross-platform
clustering solution for minimizing application downtime. Through
central management tools, automated failover, features to test
disaster recovery plans without disruption, and advanced failover
management based on server capacity, Cluster Server allows IT
managers to maximize resources by moving beyond reactive recovery
to proactive management of application availability.
System Requirements:
Solaris
Solaris 9 & 10 on SPARC
Solaris 10 on x64
AIX
AIX 5.3
AIX 6.1
LINUX
Red Hat Enterprise Linux (RHEL) 5 on x86 and IBM system p

Novell SUSE Linux Enterprise Server (SLES) 10 & 11 on x86 & IBM
system p
Oracle Enterprise Linux (OEL) 5 on x86
HP-UX
HP-UX 11i version 1/2/3
Windows (Requires Veritas Storage Foundation)
Windows Server 2003 SP2 (x86): Web Edition
Windows Server 2003 SP2 (x86, x64, IA64): Standard, Enterprise,
Datacenter Editions
Windows Server 2003 R2 SP2 (x86, x64): Standard, Enterprise,
Datacenter Editions
Windows Server 2008 SP1 or SP2 (x86, x64): Web, Standard (without
Hyper-V or in guest), Enterprise (without Hyper-V or in guest),
Datacenter (without Hyper-V or in guest) Editions
Windows Server 2008 for Itanium-based Systems SP1 or SP2 (IA64)
Windows Server 2008 R2 (x64): Web, Standard (without Hyper-V or in
guest), Enterprise (without Hyper-V or in guest), Datacenter (without
Hyper-V or in guest) Editions
Windows Server 2008 R2 for Itanium-based Systems (IA64)
Windows 7 (x86, x64): (use with SFWHA client components)
Windows Vista SP1 or SP2 (x86, x64): Ultimate, Enterprise, Business
Editions (use with SFWHA client components)
Windows XP SP2 or SP3 (x86, x64): (use with SFWHA client
components)
Veritas Cluster Server for Windows (Standalone)
Windows Server 2003 SP2 (x86, x64, IA64): Standard, Enterprise,
Datacenter Editions
Windows Server 2003 R2 SP2 (x86, x64): Standard, Enterprise,
Datacenter Editions
Windows 7 (x86, x64): (use with SFWHA client components)
Windows Vista SP1 or SP2 (x86, x64): Ultimate, Enterprise, Business
Editions (use with SFWHA client components)
Windows XP SP2 or SP3 (x86, x64): (use with SFWHA client
components)

Concepts
VCS is built on three components: LLT, GAB, and VCS itself. LLT handles
kernel-to-kernel communication over the LAN heartbeat links, GAB handles
shared disk communication and messaging between cluster members, and VCS
handles the management of services.

Once cluster members can communicate via LLT and GAB, VCS is started. In
the VCS configuration, each Cluster contains systems, Service Groups, and
Resources. Service Groups contain a list of systems belonging to that group, a
list of systems on which the Group should be started, and Resources. A
Resource is something controlled or monitored by VCS, like network
interfaces, logical IPs, mount points, physical/logical disks, processes, files, etc.
Each resource corresponds to a VCS agent which actually handles VCS control
over the resource.
VCS configuration can be set either statically through a configuration file,
dynamically through the CLI, or both. LLT and GAB configurations are
primarily set through configuration files.

