HACMP II Administration Student Notebook ERC 1.2

HACMP II: Administration

(Course Code QV125)
Student Notebook
ERC 1.2
UNIX Software Service Enablement
July 2007 Edition

The information contained in this document has not been submitted to any formal IBM test and is distributed on an "as is" basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
Copyright International Business Machines Corporation 2007. All rights reserved.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.
Contents
Course Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Unit 1. HACMP Concept Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Fundamental HACMP Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
HACMP's Topology Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
HACMP's Resource Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
Networking Review: IPAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
Networking Review: Configuration Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
Just What Does HACMP Do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
What Happens When Something Fails? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12
What Happens When a Problem is Fixed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-13
Resource Group Behavior? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
So, What is HACMP Really? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-16
Additional Features of HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
Some Assembly Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19
HACMP V5.4 Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20
Things HACMP Does Not Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-21
When HACMP Is Not The Correct Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
Sources of HACMP Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-24
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-26
Unit 2. Configuring Shared Storage for HACMP . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Data and Storage Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
LVM Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
LVM Volume Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
High Availability Data/Storage Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Configuring a Mirrored File System for HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Shared Storage Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Serial Access Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
Reserve/Release Voluntary VG Takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
Reserve/Release Involuntary VG Takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21
RSCT Based Voluntary VG Takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23
RSCT Based Involuntary VG Takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
Synchronizing Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25
Quorum Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28
Quorum/Mirror Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
HACMP Forced Varyon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34
Recommendations for Forced Varyon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36
Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-37
OEM VG and File System Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-39

OEM Disk Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-42
Virtual Storage (VIO) and HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-47
Checkpoint 1 of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-51
Checkpoint 2 of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-52
Checkpoint 3 of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-53
Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-54
Lab Exercises: Exercise 1 and Exercise 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-55
Unit 3. HACMP Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3
3.1 HACMP Status and Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Topic 1: HACMP Status and Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
Useful AIX Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7
Useful HACMP Status Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-8
Summary of Main HACMP Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
Where are the Log Files? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-15
Let's Review: Topic 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-16
3.2 Topology and Resource Group Management . . . . . . . . . . . . . . . . . . . . . . . . . 3-17
Topic 2: Topology and Resource Group Management . . . . . . . . . . . . . . . . . . . . . .3-18
Yet Another Resource Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-19
Adding a Third Resource Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-20
Adding a Third Service IP Label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-21
Adding a Third Application Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-23
Adding Resources to the Third RG (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-24
Adding Resources to the Third RG (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-25
Synchronize Your Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-26
Expanding the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-27
Adding a New Cluster Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-28
Add Node -- Versus Extended Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-29
7. Define the Non-IP rs232 Networks (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-31
Define the Non-IP rs232 Networks (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-33
8-9. Synchronize and Start Cluster Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-34
Final Steps: Add the Node to a Resource Group, Synchronize, and Test . . . . . . .3-35
Shrinking the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-36
Removing a Cluster Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-37
Removing an Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-38
Removing a Resource Group (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-39
Removing a Resource Group (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-41
Let's Review: Topic 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-42
3.3 Cluster Single Point of Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-43
Topic 3: Cluster Single Point of Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-44
Administering a High Availability Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-45
Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-46
Cluster Single Point of Control (C-SPOC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-48
The Top-Level C-SPOC Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-50
Starting Cluster Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-51
Verifying Cluster Services Have Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-53

Stopping Cluster Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-55
Verifying Cluster Services Have Stopped: Stopping Without Unmanaged Resource
Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-57
Verifying Cluster Services Have Stopped: Stopping With Unmanaged Resource
Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-59
LVM Change Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-61
LVM Changes, Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-63
LVM Changes, Lazy Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-64
LVM Changes, C-SPOC Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-65
Enhanced Concurrent Mode Volume Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-66
Managing Shared LVM Components with C-SPOC . . . . . . . . . . . . . . . . . . . . . . . 3-67
Creating a Shared Volume Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-69
Discover, Add VG to a Resource Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-70
Creating a Shared File System (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-71
Creating a Shared File System (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-73
LVM Changes, Select Your Filesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-74
Update the Size of a Filesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-75
HACMP Resource Group Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-76
Priority Override Location (POL) Old . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-77
Priority Override Location (POL) New . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-79
Moving a Resource Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-81
Bring a Resource Group Offline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-82
Bring a Resource Group Back Online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-83
Let's Review: Topic 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-84
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-85
Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-86
Unit 4. Cluster Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
How Does HACMP Communicate? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
HACMP Security Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Standard Connection Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Using IPSec VPN Tunnels for Communications (1 of 2) . . . . . . . . . . . . . . . . . . . . . 4-8
Using IPSec VPN Tunnels (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Create Additional IP Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
HACMP Message Authentication and Encryption (1 of 3) . . . . . . . . . . . . . . . . . . . 4-11
Message Authentication and Encryption (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
Message Authentication and Encryption (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . 4-15
A Holistic Approach to Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-20
Lab Exercises: Exercise 3 and Optional Exercises . . . . . . . . . . . . . . . . . . . . . . . . 4-21
Appendix A. Checkpoint Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1

Appendix B. Integrating NFS into HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
So, What is NFS? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
NFS Background Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
Combining NFS With HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
NFS Fallover With HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-6
Configuring NFS for High Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7
Cross-mounting NFS Filesystems (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-9
Cross-mounting NFS Filesystems (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-11
Cross-mounting NFS Filesystems (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-12
Choosing the Network for Cross-Mounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-13
Configuring HACMP for Cross-Mounting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-14
Syntax for Specifying Cross-Mounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-15
Ensuring the VG Major Number is Unique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-16
NFS with HACMP Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-17
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18
Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-19
Appendix C. Using WebSMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1
Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Web-Enabled SMIT (WebSMIT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
WebSMIT Main Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-5
WebSMIT Context Menu Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-7
WebSMIT Associations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
WebSMIT Online Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-9
WebSMIT Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-10
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-15
Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-16
Course Description
Purpose
This course is part of an HACMP curriculum designed to prepare students to support
customers who are using HACMP. This course teaches how to administer a highly
available cluster using HACMP Version 5.4 on an IBM pSeries server running AIX 5L
V5.2 or V5.3.
Audience
This course is intended for AIX technical support personnel and AIX system
administrators.
Prerequisites
Students attending this course are expected to have:
- AIX TCP/IP, LVM storage and disk hardware implementation skills
- An understanding of basic HACMP concepts and the ability to install and configure a
basic two-node cluster in standby configuration
These skills are addressed in the following course and its prerequisites, or can be
obtained through equivalent education and experience:
- AHQV120: HACMP-I: Installation and Initial Configuration

Course Objectives
On completion of this course, students should be able to:
- Configure AIX shared storage for HACMP
- Configure HACMP for two resource groups in a two-node mutual takeover
configuration
- Use the SMIT Standard and Extended menus to make topology and resource group
changes
- Perform cluster administration using C-SPOC
- Identify the cluster status commands
- List the cluster log files and their locations and describe the type of information
which can be found in each
Curriculum relationships
This course is the second course in our HACMP support curriculum:
- HACMP-I: Installation and Initial Configuration
HACMP-I is an introductory course designed to prepare students to install and
configure a highly available cluster using HACMP Version 5.4 on an IBM pSeries
server running AIX 5L V5.2 or V5.3.
- HACMP-II: Administration
HACMP-II teaches how to administer a highly available cluster using HACMP
Version 5.4 on an IBM pSeries server running AIX 5L V5.2 or V5.3.
- HACMP-III: Extended Configuration
HACMP-III teaches more advanced HACMP administration, including extended
configuration, cluster event flow and monitoring cluster status.
- HACMP-IV: Application Integration
HACMP-IV describes the requirements for successful application integration and
monitoring. Students will integrate a real application into HACMP and will resolve
application problems.
- HACMP-V: Problem Determination
HACMP-V introduces HACMP problem determination concepts and techniques,
including: common failures, strategies, tools and log files. Students will resolve LVM
and C-SPOC problems, networking and RSCT problems, and event script problems.
Agenda
(1:00) Welcome
(1:00) Unit 1 - HACMP Concept Review
(2:30) Unit 2 - Configuring Shared Storage for HACMP
(0:30) Exercise 1 - Configure Shared Storage for HACMP
(3:00) Exercise 2 - Create a Mutual Takeover Cluster
(2:30) Unit 3 - HACMP Administration
(1:00) Unit 4 - HACMP Security
(3:00) Exercise 3 - HACMP Administration
(OPTIONAL) Exercise 4 - HACMP Security
Appendix B - Integrating NFS into HACMP
Appendix C - Using WebSMIT
Text highlighting
The following text highlighting conventions are used throughout this book:
Bold
Identifies file names, file paths, directories, user names,
principals, menu paths and menu selections. Also identifies
graphical objects such as buttons, labels and icons that the
user selects.
Italics
Identifies links to web sites and publication titles, marks a
word or phrase that is meant to stand out from the surrounding
text, and identifies parameters whose actual names or values
are to be supplied by the user.
Monospace
Identifies attributes, variables, file listings, SMIT menus, code
examples and command output that you would see displayed
on a terminal, and messages from the system.
Monospace bold
Identifies commands, subroutines, daemons, and text the user
would type.

Unit 1. HACMP Concept Review

What This Unit Is About
This unit reviews the fundamental concepts of HACMP for AIX.
What You Should Be Able to Do

After completing this unit, you should be able to:
Discuss basic fundamental concepts of HACMP for AIX
Outline the features of HACMP for AIX
Review the features, components, and limits of an HACMP for AIX
cluster
Explain how HACMP for AIX operates in typical cases
Describe some of the considerations and limits of an HACMP
cluster
Locate HACMP sources of information
How You Will Check Your Progress

Accountability:
Checkpoint questions
Lab exercises
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4861-09 HACMP for AIX, Version 5.4 Planning Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
HACMP for AIX manuals: www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html

Unit Objectives
After completing this unit, you should be able to:
Discuss basic fundamental concepts of HACMP for AIX
Outline the features of HACMP for AIX
Review the features, components, and limits of an HACMP for AIX
cluster
Explain how HACMP for AIX operates in typical cases
Describe some of the considerations and limits of an HACMP
cluster
Locate HACMP sources of information
Figure 1-1. Unit Objectives
Fundamental HACMP Concepts

- Topology: Physical networking-centric components
- Resources: Entities which are being made highly available
- Resource group: A collection of resources which HACMP
  controls as a single unit
- Resource group policies:
  - startup policy: determines on which node the resource
    group is activated
  - fallover policy: determines target when there is a
    failure
  - fallback policy: determines fallback behavior
- Customization: The process of augmenting HACMP,
  typically via implementing scripts
Figure 1-2. Fundamental HACMP Concepts
Notes
Terminology
A clear understanding of the above concepts and terms is important as they appear
over and over again both in the remainder of the course and throughout the HACMP
documentation, log files, and SMIT screens.

HACMP's Topology Components
[Figure: diagram labeled Cluster, Node, IP Network, Non-IP Network, Communication Interface, and Communication Device]
The topology components consist of a cluster, nodes, and the
network technology which connects them together.
Figure 1-3. HACMP's Topology Components
Notes
Topology components
An HACMP cluster's topology encompasses nodes (System p servers or LPARs) and the IP
and non-IP networks that connect them. IP networks consist of communication interfaces
(for example, Ethernet or token-ring network adapters); non-IP networks consist of
communication devices (for example, /dev/tty for RS232).
Nodes
In the context of HACMP, the term node means any IBM System p server that is a member
of a high availability cluster running HACMP. This includes a logical partition
(LPAR) running AIX and HACMP. A node can be a member of at most one cluster.
Networks
Networks consist of IP and non-IP networks. The non-IP networks ensure that cluster
monitoring can continue if there is a total loss of IP communication. Configuring
non-IP networks in an HACMP cluster is strongly recommended.

HACMP's Resource Components
[Figure: diagram labeled Resource Group, Nodes, Runtime Policies, and Resources; the resources shown are Application Server, Service IP Address, Volume Group, and File System]
Figure 1-4. HACMP's Resource Components
Notes
Resource group
A resource group is a collection of resources treated as a unit, together with the nodes
on which it can potentially be activated and the policies the cluster manager uses to
decide which node to choose during startup, fallover, and fallback. A cluster may have
more than one resource group (usually one for each application), which allows for very
flexible configurations.
Resources
Resources are logical components that are made highly available by HACMP. Because
they are logical components, they can be moved without human intervention.
The visual shows a typical set of resources used in resource groups:
- Service IP Address - Users need to be able to connect to the application. Typically,
the users are given an IP address or label to connect to the application. This IP
address/label becomes a resource in the resource group as it must be associated
with the same node that is running the application.
- Volume Group - If the application requires shared disk storage, this storage is
contained within volume groups.
- Filesystem - An application often requires certain filesystems to be mounted.
- Application Server - The application itself must be part of the resource group
(strictly speaking, the application server actually consists of scripts which start and
stop the application as required by HACMP).
In addition to the resources listed in the figure, the SMIT Extended Configuration
path offers further, less commonly used options such as NFS mounts and X.25
communication links.
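The relationship between a resource group, its resources, its node list, and its policies can be pictured as a small data model. The following Python sketch is illustrative only: the class, the field names, the policy strings, and the `choose_node` rule (pick the first available node in the list) are assumptions made for this example, not HACMP data structures or HACMP's actual selection logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ResourceGroup:
    """Illustrative model only; not an HACMP data structure."""
    name: str
    nodes: List[str]  # participating nodes, highest priority first
    service_ip_labels: List[str] = field(default_factory=list)
    volume_groups: List[str] = field(default_factory=list)
    filesystems: List[str] = field(default_factory=list)
    application_servers: List[str] = field(default_factory=list)
    # Placeholder policy descriptions, not real HACMP policy names.
    startup_policy: str = "first available node in list"
    fallover_policy: str = "next available node in list"
    fallback_policy: str = "never fall back"


def choose_node(rg: ResourceGroup, available: Set[str]) -> Optional[str]:
    """Return the first node in the group's list that is currently available."""
    for node in rg.nodes:
        if node in available:
            return node
    return None


# Hypothetical resource group for one application.
rg = ResourceGroup(
    name="app_rg",
    nodes=["nodeA", "nodeB"],
    service_ip_labels=["app_svc"],
    volume_groups=["app_vg"],
    filesystems=["/app_data"],
    application_servers=["app_server"],
)

print(choose_node(rg, {"nodeA", "nodeB"}))  # nodeA
print(choose_node(rg, {"nodeB"}))           # nodeB
print(choose_node(rg, set()))               # None
```

The point of the model is that all resources in the group move together: whichever node is chosen hosts the service IP label, the volume group, the filesystem, and the application server as one unit.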

Networking Review: IPAT

HACMP uses IP Address Takeover (IPAT) to keep
networking resources (service IP labels, persistent labels)
highly available
There are two types of IPAT:
IPAT via IP Aliasing:
HACMP adds the service IP address alongside an (AIX) interface's existing
IP address using AIX's IP aliasing feature:
ifconfig en0 alias 192.168.1.2
IPAT via IP Replacement:
HACMP replaces an (AIX) interface IP address with the service IP
address:
ifconfig en0 192.168.1.2
Figure 1-5. Networking Review: IPAT
Notes:
IP Address Takeover (IPAT)
HACMP keeps service and persistent addresses and labels highly available using IP
Address Takeover or IPAT. This allows HACMP to move an address to another NIC or
node when the component supporting the address fails.
An HACMP network can be configured to use either IPAT via IP Aliasing or IPAT via IP
Replacement. When aliasing is used, service labels are aliased onto interfaces,
maintaining the existing configuration (the non-service addresses are still available from
the affected interfaces). When replacement is used, service labels replace the
non-service address configured on an interface.
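The difference between the two IPAT methods can be shown with a toy model of an interface's address list. This Python sketch does not touch any real network configuration; the `Interface` class, the function names, and the non-service addresses are hypothetical and exist only to contrast aliasing with replacement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interface:
    """Toy stand-in for an AIX network interface such as en0."""
    name: str
    addresses: List[str] = field(default_factory=list)


def ipat_via_aliasing(nic: Interface, service_ip: str) -> None:
    """Add the service address; existing addresses stay configured."""
    nic.addresses.append(service_ip)


def ipat_via_replacement(nic: Interface, service_ip: str) -> None:
    """Swap the interface's addresses for the service address."""
    nic.addresses = [service_ip]


aliased = Interface("en0", ["192.168.10.1"])   # hypothetical non-service address
ipat_via_aliasing(aliased, "192.168.1.2")
print(aliased.addresses)    # ['192.168.10.1', '192.168.1.2']

replaced = Interface("en0", ["192.168.1.1"])   # hypothetical non-service address
ipat_via_replacement(replaced, "192.168.1.2")
print(replaced.addresses)   # ['192.168.1.2']
```

With aliasing the non-service address remains reachable on the interface; with replacement it is gone until the service address is removed again.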
Networking Review: Configuration Rules

Non-service IP addresses
- Define these addresses in the /etc/hosts file and configure them in
  HACMP topology as communication interfaces
- Using heartbeat over IP interfaces
  - To enable accurate diagnosis of network component failures, each IP address
    defined on a node's interfaces must be in a different logical IP subnet (this address
    is configured in AIX)
  - There must be at least one subnet in common with all nodes
- Using heartbeat over IP alias
  - Removes subnet restrictions on all addresses
Service IP addresses
- Define service addresses in /etc/hosts and in HACMP
  resources
- HACMP will configure them to AIX when needed
- IPAT via IP Aliasing:
  - They must not be in the same logical IP subnet as any of the non-service IP
    addresses
- IPAT via IP Replacement:
  - Each service IP label must be in the same subnet as a non-service label
  - There must be at least as many NICs on each node as there are service IP labels
  - All service IP labels must be in the same subnet
Figure 1-6. Networking Review: Configuration Rules
Notes:
Non-service address rules
When heartbeating over IP interfaces is used, in order for topology services to
accurately diagnose network component failures (using heartbeat rings), all interfaces
on a node must be configured with IP addresses that are on different subnets. Using
heartbeating over IP alias removes the subnet restrictions. With this method you specify
a base address for the heartbeat subnets and HACMP configures heartbeat rings using
IP aliasing.
You define non-service IP addresses and labels using AIX (smitty mktcpip, smitty
chinet). A node will boot with non-service addresses configured on its interfaces by AIX.
These addresses and labels are listed in the /etc/hosts file on each node, along with
any service labels and addresses.
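When heartbeating over IP interfaces, the two non-service rules (a different subnet for every interface on a node, and at least one subnet common to all nodes) can be checked on paper before configuring anything. The sketch below uses Python's standard `ipaddress` module; the function names, node names, addresses, and /24 masks are assumptions chosen for illustration, and this is not a substitute for HACMP's own cluster verification.

```python
import ipaddress
from itertools import combinations


def networks_of(cidrs):
    """Convert 'address/prefix' strings into their containing networks."""
    return [ipaddress.ip_interface(c).network for c in cidrs]


def subnets_are_distinct(cidrs):
    """True if no two addresses on one node fall in overlapping subnets."""
    nets = networks_of(cidrs)
    return not any(a.overlaps(b) for a, b in combinations(nets, 2))


def common_subnets(cluster):
    """Return the set of subnets that every node has an interface on."""
    per_node = [set(networks_of(cidrs)) for cidrs in cluster.values()]
    return set.intersection(*per_node) if per_node else set()


# Hypothetical two-node cluster, two non-service interfaces per node.
cluster = {
    "nodeA": ["192.168.10.1/24", "192.168.11.1/24"],
    "nodeB": ["192.168.10.2/24", "192.168.11.2/24"],
}

for node, cidrs in cluster.items():
    print(node, subnets_are_distinct(cidrs))   # True for both nodes

print(sorted(str(n) for n in common_subnets(cluster)))
# ['192.168.10.0/24', '192.168.11.0/24']

# Two interfaces in the same subnet break the first rule.
print(subnets_are_distinct(["192.168.10.1/24", "192.168.10.3/24"]))  # False
```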

Service address rules

The rules for configuring service IP addresses depend on the type of IPAT used. When
service addresses will be aliased, they must be configured on a different subnet than
any of the non-service addresses. When service addresses will replace non-service
addresses, they must reside in the same subnet as a non-service address, and that
subnet must be accessible by all nodes.
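The subnet rules above can be checked mechanically before a cluster is configured. The following is a minimal Python sketch, not an HACMP tool: the function names, the /24 prefix length, and the 10.10.x.x addresses are all assumptions made for illustration.

```python
import ipaddress

def subnet_of(address, prefix_len=24):
    """Return the logical IP subnet that contains the address (prefix length is assumed)."""
    return ipaddress.ip_interface(f"{address}/{prefix_len}").network

def interfaces_on_distinct_subnets(addresses, prefix_len=24):
    """Heartbeat over IP interfaces: each address on one node needs its own subnet."""
    subnets = [subnet_of(a, prefix_len) for a in addresses]
    return len(subnets) == len(set(subnets))

def valid_alias_service_address(service, non_service, prefix_len=24):
    """IPAT via IP aliasing: the service address must avoid every non-service subnet."""
    used = {subnet_of(a, prefix_len) for a in non_service}
    return subnet_of(service, prefix_len) not in used

def valid_replacement_service_address(service, non_service, prefix_len=24):
    """IPAT via IP replacement: the service address must share a non-service subnet."""
    used = {subnet_of(a, prefix_len) for a in non_service}
    return subnet_of(service, prefix_len) in used

# Hypothetical non-service addresses for one node.
node_a = ["10.10.1.1", "10.10.2.1"]
print(interfaces_on_distinct_subnets(node_a))
print(valid_alias_service_address("10.10.3.10", node_a))
print(valid_replacement_service_address("10.10.1.10", node_a))
```

The sketch only compares subnets. It does not check the other IPAT via IP Replacement rules, such as the number of NICs per node.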
Just What Does HACMP Do?
HACMP functions:
- Monitor the states of nodes, networks, network adapters/devices
- Strive to keep resource groups highly available
- Optionally, HACMP can monitor the state of the application(s) and can be customized to react to every possible failure
Figure 1-7. Just What Does HACMP Do?
Notes
HACMP basic functions
HACMP directly detects four kinds of failures:
- A communications adapter or device failure
- A node failure
- A network failure (all communication adapters/devices on a given network)
- Application failure (requires application monitors).
Most other failures are handled outside HACMP, either by AIX or LVM, and can be
handled in HACMP via customization. Customization that allows HACMP to react when
loss of quorum for a volume group occurs is built-in.

What Happens When Something Fails?
How the cluster responds to a failure depends on what has failed, what the resource group's fallover policy is, and if there are any resource group dependencies:
- Typically another equivalent component takes over duties of the failed component (for example, another node takes over from a failed node)
Figure 1-8. What Happens When Something Fails?
Notes
How HACMP responds to a failure
HACMP generally responds to a failure by using an equivalent but still available
component to take over the duties of the failed component. For example, if a node fails,
then HACMP initiates a fallover (for non-concurrent resource groups), an action which
consists of moving the resource groups which were previously on the failed node to a
surviving node. If a Network Interface Card (NIC) fails, HACMP usually moves any IP
addresses being used by clients to another available NIC. If there are no remaining
available NICs, HACMP initiates a fallover. If only one resource group is affected, then
only the one resource group is moved to another node.
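The decision described above can be summarized as a small lookup. This is a simplified Python sketch of the notes only; the function name, the argument names, and the returned strings are hypothetical, and real behavior also depends on fallover policies and resource group dependencies.

```python
def respond_to_failure(component, spare_nics_on_node=0):
    """Return the recovery action described in the notes (simplified sketch)."""
    if component == "node":
        # Non-concurrent resource groups move to a surviving node.
        return "fallover: move the resource groups of the failed node to a surviving node"
    if component == "nic":
        if spare_nics_on_node > 0:
            return "move the client IP addresses to another available NIC"
        # No NIC left on this node, so the affected resource groups must move.
        return "fallover: move the affected resource groups to another node"
    raise ValueError(f"not covered by this sketch: {component}")

print(respond_to_failure("nic", spare_nics_on_node=1))
print(respond_to_failure("nic", spare_nics_on_node=0))
```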

What Happens When a Problem is Fixed?
How the cluster responds to the recovery of a failed component depends on what has recovered, what the resource group's fallback policy is, and what resource group dependencies there are:
- Typically, administrators need to indicate/confirm that the fixed component is approved for use. Some components are integrated automatically, for instance when a communication interface recovers.
Figure 1-9. What Happens When a Problem is Fixed?
Notes
How HACMP responds to a recovery
When a previously failed component recovers, it must be reintegrated into the
cluster (reintegration is the process of HACMP recognizing that the component is
available for use again). Some components, like NICs, are automatically reintegrated
when they recover. Other components, like nodes, are usually not reintegrated
until the cluster administrator explicitly requests the reintegration (by starting the
HACMP daemons on the recovered node).

Resource Group Behavior?

Non-concurrent
- Standby with/without fallback
- Mutual takeover (very popular)
[Diagram: nodes neo and trinity with applications A and B]
Concurrent
- Application must be designed to run simultaneously on multiple nodes
- This has the potential for essentially zero downtime and is designed for fault tolerance and high performance
- The application must be specifically written for the environment
[Diagram: nodes neo, trinity, and zion with application A]
Figure 1-10. Resource Group Behavior?
Notes
Non-concurrent mode
In this mode, HACMP runs an application on a single node and falls over to a standby
node in case of a failure. This method is used to build mutual takeover clusters, in which
each node runs an application. Mutual takeover configurations are very popular
configurations for HACMP since they support two highly available applications at a cost
which is not that much more than would be required to run the two applications in
separate stand-alone configurations.
Each cluster node probably needs to be somewhat larger than the stand-alone nodes
as they must each be capable of running both applications, possibly in a slightly
degraded mode, should one of the nodes fail.

Concurrent mode
HACMP also supports resource groups in which the application is active on multiple
nodes simultaneously (online on all available nodes). In such a resource group, all
nodes run a copy of the application and share simultaneous access to the disk. This
style of cluster is often referred to as a concurrent access cluster or concurrent access
environment.

So, What is HACMP Really?

An application which:
- Controls where resource groups run
- Monitors and reacts to events
- Provides tools for cluster-wide configuration and synchronization
- Relies on other AIX subsystems (ODM, LVM, RSCT, TCP/IP, SRC, and so on)
[Diagram components: Cluster Manager Subsystem (clstrmgrES) containing the topology manager, resource manager, event manager, and SNMP manager; clcomdES; RSCT (topsvcs, grpsvcs, RMC subsystems); snmpd; clinfoES; clstat]
Figure 1-11. So, What is HACMP Really?
Notes
HACMP core components
HACMP comprises a number of software components:
- The cluster manager, clstrmgrES, is the core process which monitors cluster
membership. The cluster manager includes a topology manager to manage the
topology components, a resource manager to manage resource groups, and an event
manager that runs event scripts and works through the RMC facility and RSCT to react
to failures.
- In HACMP v5.3/5.4, the cluster manager also contains an SNMP manager which
allows for SNMP-based monitoring to be done using an SNMP manager such as
Tivoli NetView.
- The clinfo process provides an API for communicating between cluster manager
and your application. clinfo also provides remote monitoring capabilities and can
run a script in response to a status change in the cluster. clinfo is an optional
process which can run on both servers and clients (the source code is provided on
request). The clstat command uses clinfo to display status via ASCII, X Window, or
Web browser interfaces.
- In HACMP v5.x, clcomdES provides a secure node communication path which allows
the cluster nodes to communicate in a secure manner without using rsh and .rhosts
files.

Additional Features of HACMP

HACMP is shipped with utilities to simplify configuration, monitoring, customization, and cluster administration.
[Diagram: clstrmgrES surrounded by Configuration assistant, OLPW, WebSMIT, Verification/Auto correction, CTT, CSPOC, DARE, SNMP, Tivoli Integration, and Application Monitoring]
Figure 1-12. Additional Features of HACMP
Notes
Additional features
HACMP also has additional software to provide facilities for administration, testing,
remote monitoring, auto-correction, and verification.

Some Assembly Required

HACMP can be used out of the box; however, some assembly is required
- Minimum:
  - Application Start/Stop/Monitor scripts
- Optional:
  - Customized pre/post event scripts
  - Reaction to events
  - Error notification methods
  - User Defined Events (UDEs)
  - Cluster State Change
HACMP's flexibility allows for complex customization in order to meet availability goals
Figure 1-13. Some Assembly Required
Notes
Customization required
Minimally, you will have to create application start and stop scripts. It is strongly
suggested that you create application monitors also, to allow HACMP to handle failure
of the application.
Optional customization
HACMP is shipped with event scripts (Korn Shell scripts) which handle default failure
scenarios. If you have a requirement to customize some special behavior, then this can
be achieved through pre- and post-event scripts or error notification methods and User
Defined Events (UDEs).

HACMP V5.4 Limits

Cluster limits:
- 32 nodes in a cluster
- 64 resource groups per cluster
- 256 IP addresses known to HACMP (for example, service and boot IP labels)
- 128 application monitors (no limit per application server)
- 2 sites (minimum of 1 node per site)
RSCT limit:
- 48 heartbeat rings
Figure 1-14. HACMP V5.4 Limits
Notes
RSCT limit
HACMP uses the Topology Services component of RSCT for monitoring networks and
network interfaces. Topology Services organizes all the interfaces in the topology into
different heartbeat rings. The current version of RSCT Topology Services has a limit of
48 heartbeat rings, which is usually sufficient to monitor networks and network
interfaces. Roughly speaking, the number of heartbeat rings is (usually) very close to
the number of network adapters on the node with the most adapters.
These limits do not tend to be a major concern in most clusters. Refer to the HACMP
documentation for additional information if you are planning a cluster which might
approach some of these limits.
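When planning a larger cluster, the limits listed in the visual and the rough heartbeat ring estimate from these notes can be compared against the planned configuration. The sketch below is an illustration in Python under stated assumptions: the dictionary keys, function names, and the sample plan are hypothetical, and the ring estimate is only the approximation given above (adapters on the node with the most adapters).

```python
# Limits taken from the HACMP V5.4 Limits visual.
HACMP_54_LIMITS = {
    "nodes": 32,
    "resource_groups": 64,
    "ip_addresses": 256,
    "application_monitors": 128,
    "sites": 2,
}
RSCT_MAX_HEARTBEAT_RINGS = 48

def estimate_heartbeat_rings(adapters_per_node):
    """Rough estimate: the adapter count of the node with the most adapters."""
    return max(adapters_per_node.values())

def check_plan(plan, adapters_per_node):
    """Return a list of planned values that exceed a documented limit."""
    problems = []
    for item, limit in HACMP_54_LIMITS.items():
        planned = plan.get(item, 0)
        if planned > limit:
            problems.append(f"{item}: {planned} exceeds the limit of {limit}")
    rings = estimate_heartbeat_rings(adapters_per_node)
    if rings > RSCT_MAX_HEARTBEAT_RINGS:
        problems.append(f"heartbeat_rings: about {rings} exceeds the limit of {RSCT_MAX_HEARTBEAT_RINGS}")
    return problems

# Hypothetical plan for a two-node cluster.
plan = {"nodes": 2, "resource_groups": 70, "ip_addresses": 40}
adapters = {"neo": 6, "trinity": 4}
print(check_plan(plan, adapters))
```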

Things HACMP Does Not Do
- Backup and restoration
- Time synchronization
- Application-specific configuration
- System administration tasks unique to each node
Figure 1-15. Things HACMP Does Not Do
Notes
Things HACMP does not do
HACMP does not automate your backups, keep time in sync between the cluster
nodes, or tune your application configuration (for example, DB2). These tasks require
additional configuration and software. For example, you can use Tivoli Storage Manager
(TSM) as an enterprise backup solution and a time protocol such as NTP (xntpd on AIX)
for time synchronization.

When HACMP Is Not The Correct Solution

Zero downtime required:
- Maybe a fault-tolerant system is the correct choice
- 7 x 24 x 365: HACMP occasionally needs to be shut down for maintenance
- Life-critical environments
Security issues:
- Too little security: lots of people with the ability to change the environment
- Too much security: may not allow HACMP to function as designed
Unstable environments:
- HACMP cannot make an unstable and poorly-managed environment stable
- HACMP increases the availability of well-managed systems
Figure 1-16. When HACMP Is Not The Correct Solution
Notes
Zero downtime
An example of an environment that requires zero downtime is an intensive care unit.
Also, HACMP is not designed to handle many failures at once.
Security issues
One security issue that is now addressed is the need to eliminate .rhosts files. Better
encryption is also possible for inter-node communications, but this may not be
enough for some security environments.
Unstable environments
The prime cause of problems with HACMP is poor design, planning, implementation,
and administration. If you have an unstable environment, with poorly trained
administrators, easy access to the root password, and a lack of change control,
HACMP is not the solution for you.
With HACMP, the only thing more expensive than employing a professional to plan,
design, install, configure, customize, and administer the cluster is employing an
amateur.
Other characteristics of poorly managed systems are:
- Lack of change control
- Failure to treat the cluster as a single entity
- Lack of documented operational procedures

Sources of HACMP Information

HACMP manuals come with the product:
- cluster.doc.en_US.es.html
- cluster.doc.en_US.es.pdf
Release notes contain important information about the version release:
- /usr/es/sbin/cluster/release_notes
This course is the second course in our HACMP support curriculum:
- HACMP-I: Installation and Initial Configuration
- HACMP-II: Administration
- HACMP-III: Extended Configuration
  HACMP-III teaches more advanced HACMP administration, including extended configuration, cluster event flow and monitoring cluster status.
- HACMP-IV: Application Integration
  HACMP-IV describes the requirements for successful application integration and monitoring. Students will integrate a real application into HACMP and will resolve application problems.
- HACMP-V: Problem Determination
  HACMP-V introduces HACMP problem determination concepts and techniques, including: common failures, strategies, tools and log files. Students will resolve LVM and CSPOC problems, networking and RSCT problems and event script problems.
IBM HACMP web sites:
- http://www.ibm.com/systems/p/ha/
- http://www.ibm.com/systems/p/software/hacmp.html
- http://www-03.ibm.com/systems/p/ha/resources.html
Figure 1-17. Sources of HACMP Information
Notes
Sources of information
There are many excellent sources of HACMP information. Manuals and release notes
come with the product; read them. You can also find the manuals (for all supported
versions of HACMP) online, as well as Redpapers, Redbooks, and whitepapers that
cover many topics.

Checkpoint
1. True or False: HWAT is compatible with IPAT over Aliasing.
2. If node1 has NICs configured with the addresses 192.168.20.1 and 192.168.21.1 and node2 has NICs with the IP addresses 192.168.20.2 and 192.168.21.2, then which of the following are valid service IP addresses when using IPAT via Aliasing:
   a. (192.168.20.3 and 192.168.20.4) OR (192.168.21.3 and 192.168.21.4)
   b. 192.168.20.3 and 192.168.20.4 and 192.168.21.3 and 192.168.21.4
   c. 192.168.22.3 and 192.168.22.4
   d. 192.168.23.3 and 192.168.20.3
3. On reboot of a failed node, HACMP will:
   a. Do nothing
   b. Issue a clRGmove for all RGs which belong to that node
   c. Bring on-line RGs which are in ERROR state only
   d. It depends
4. True or False: A Resource may belong to more than one Resource group.
5. A /dev/hdisk device when used by HACMP as a non-IP heartbeat network is referred to as a
   a. Communication interface
   b. Communication device
   c. Communication adapter
   d. Non-IP network
Figure 1-18. Checkpoint
Unit Summary
Key points from this unit:
- Basic fundamental concepts of HACMP for AIX
  - Topology, resources, customization
- HACMP networks
  - IPAT, configuration rules
- Features of HACMP for AIX
  - Planning and configuration tools and assistants
- Components and limits of an HACMP for AIX cluster
  - RSCT, SNMP, clstrmgr, clcomd, clinfo
  - HACMP keeps resource groups and applications highly available
  - Cluster Manager initiates fallover and fallback according to policies and conditions
- Considerations and limits of an HACMP cluster
  - No data backup, time synchronization, application configuration
  - Not fault-tolerant
  - Security and environment stability considerations
- Locate HACMP sources of information
  - With the product, in courses, and on the web
Figure 1-19. Unit Summary
Unit 2. Configuring Shared Storage for HACMP

This unit discusses shared storage in a high-availability environment
with a particular emphasis, of course, on shared storage in an HACMP
context.

Discuss the issues to make data and storage highly available.
Describe how access to shared storage is controlled in an HACMP
cluster
Explain how enhanced concurrent mode volume groups are used
Explain the issue of PVID consistency within an HACMP cluster
Discuss how LVM aids cluster availability
Describe the quorum issues associated with HACMP
Set up LVM for maximum availability
Configure a new shared volume group, filesystem, and jfslog

Lab exercises
References
http://www-03.ibm.com/systems/p/library/hacmp_docs.html
HACMP manuals

Unit Objectives
- Discuss the issues to make data and storage highly available
- Describe how access to shared storage is controlled in an HACMP cluster
- Explain how enhanced concurrent mode volume groups are used
- Explain the issue of PVID consistency within an HACMP cluster
- Discuss how LVM aids cluster availability
- Describe the quorum issues associated with HACMP
- Set up LVM for maximum availability
- Configure a new shared volume group, file system, and jfslog
Data and Storage Basics

AIX LVM
- Components: VG, LV, FS
- Definitions: CuDv (pvid), /etc/filesystems
- Commands: chvg, varyonvg
- Protection: quorum, mirroring
Device support
- hdisks, vpath
- Driver: SDD, MPIO
- Hardware adapter: SCSI, SSA, HBA/FC (SAN), OEM (EMC)
Storage system
- DS8000, DS6000, DS4000, SAN Volume Controller, 2104, ESS 2105
- DISKs (LUNs): VGDA, PVID
Determine HACMP compatibility levels
[Diagram: node with its disks (LUNs), showing this lspv output]
node1/# lspv
hdisk0   00013c26f4222080   rootvg    active
hdisk1   00013c26be8aabbe   appB_vg
hdisk2   00013c260ce205d2   glvm_vg
hdisk3   00013c26beea7727   None
Figure 2-2. Data and Storage Basics
Notes:
Introduction
It is assumed in this course that you have had experience with AIX LVM management
and the storage systems that you will be using. The purpose of this unit is to bring out
the information that is relevant to an HACMP environment.
Single system data management

Managing data on a single system involves the combination of AIX LVM constructs and
hardware components. The LVM constructs consist of volume groups which contain a
collection of disks (LUNs), logical volumes which represent data partitions and file
systems which make the partitions available to an application via the mount command.
The storage hardware is represented by hdisks and/or vpaths with device drivers to
manage the hardware adapters and access to the storage systems. HACMP supports
both hdisks and vpath devices.

It is important to remember that some information is kept both in AIX and on the disk
(LUN). This information includes the VGDA and especially the PVID.
PVIDs and their use in AIX

For AIX to use a disk (LUN), it requires that the disk (LUN) be assigned a unique
physical volume ID (PVID). This is stored in the ODM and on the disk (LUN), and linked
to a logical construct in AIX called an hdisk. hdisks are numbered sequentially as
discovered by the configuration manager (cfgmgr). If a disk (LUN) has no PVID, one is
assigned when the disk (LUN) is defined to a volume group, or manually by a user via
the chdev command. If a disk (LUN) already has a PVID assigned, it will be recognized by
AIX when cfgmgr runs (manually or at system boot) and stored in the ODM.
Storage systems
SAN (SDD,HBA)
IBM Storage Subsystems currently supported include:
- DS8000 / DS6000 families
- DS4000 family
- SAN Volume Controller (SVC)
IBM Storage Subsystem support with HACMP is announced via Flash:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Flashes
Determine the HACMP compatibility levels for the following:
- HBA device driver
- AIX patch levels
- Multi-pathing software (SDD, RDAC, MPIO PCM, and so on)
- Device microcode/firmware
With most IBM SAN Storage devices, the multi-pathing software will be the Subsystem
Device Driver (SDD). It is supported with HACMP (with appropriate PTFs).
To use C-SPOC with VPATH disks, SDD 1.3.1.3, or later, is required.
For levels and maintenance, check:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S4000065&loc=en_US&cs=utf-8&lang=en
SCSI
It is most likely you will be using an IBM 2104 Expandable Storage Plus device if you
are attaching via SCSI. It is also possible, though unlikely, that you would connect
an ESS (2105) to your pSeries system using SCSI.
SSA
- SSA is no longer marketed.
- SSA uses a loop technology which offers multiple data paths to disk. There are
restrictions on the number and type of adapters on each loop. For example:
  - SSA loops can support eight adapters per loop (maximum of eight HACMP nodes sharing SSA disks)
  - Adapters used in RAID mode are limited to two per loop
For additional information see:
- Redbook, Understanding SSA Subsystems in Your Environment,
SG24-5750-00
- http://www-03.ibm.com/servers/storage/support/disk/7133/index.html
- You can use IBM 7133 and 7131-405 SSA disk subsystems as shared external disk
storage devices to provide concurrent access in an HACMP cluster configuration.
- SSA adapters
  The capabilities of SSA adapters have improved over time:
  - Only 6215, 6219, 6225 and 6230 adapters support Target Mode SSA and RAID5.
  - Only the 6230 adapter with the 6235 Fast Write Cache Option feature code supports enabling the write cache with HACMP.
  - Compatible adapters: 6214 + 6216 or 6217 + 6218 or 6219 + 6215 + 6225 + 6230
  For more information and microcode updates (go to SSA downloadable files):
  http://www-03.ibm.com/servers/storage/support/disk/7133/downloading.html
  Features and functionality of otherwise identical adapters and drives can vary depending
  upon the level of microcode installed on the devices, so be careful.
  Note: AIX V5.2+ does not support the MCA 6214, 6216, 6217 and 6219 SSA adapters.
  It is always a good idea to contact IBM support.

HACMP compatibility
Compatibility Flashes can be found at
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Flashes
Hints, Tips and Technotes can be found at
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Technotes
HACMP release notes are shipped with the product
LVM Components
LVM manages the components of the disk subsystem. Applications talk to the disks through LVM.
This example shows an application writing to a file system which has its LVs mirrored in a volume group physically residing on separate hdisks.
[Diagram: an application writes to a file system (FS); the file system uses the logical partitions of a mirrored logical volume (LV); the volume group maps these to physical partitions on separate hdisks]
Figure 2-3. LVM Components
Notes:
LVM relationships
An application writes to a filesystem. A filesystem provides the directory structure and is
used to map the application data to logical partitions of a logical volume. Because of
the LVM, the application is isolated from the physical disks. The LVM can be configured
to map a logical partition to up to three physical partitions and have each physical
partition (copy) reside on a different disk. The disks can be of different types and sizes.

LVM Volume Groups

Classic: hardware-based access control
- Non-concurrent mode
  Designed for single node (serial) access
- Concurrent mode (can no longer create them)
  Designed for multi-node (parallel) access
Enhanced Concurrent Mode (ECM): RSCT access control
- Single or multi-node access
- HACMP support for:
  - varyonvg in passive mode (read-only lsvg, lslv commands)
  - Fast Disk Takeover, non-IP networks
- Displaying passive state (cannot use lsvg -o):
toronto # lsvg ecmvg
VOLUME GROUP:   ecmvg              VG IDENTIFIER:   0009314700004c00000000fe2eaa2d6d
VG STATE:       active             PP SIZE:         8 MB
VG PERMISSION:  passive-only       TOTAL PPs:       537 (4296 MB)
...
Concurrent:     Enhanced-Capable   Auto-Concurrent: Disabled
...
Figure 2-4. LVM Volume Groups
Notes:
Classic and enhanced concurrent mode (ECM) volume groups
History
Concurrent mode volume groups were created to allow multiple nodes to access the
same logical volumes concurrently.
The original concurrent mode volume groups are only supported on Serial DASD and
SSA disks in conjunction with the 32-bit kernel.
Beginning with AIX V5.1, the enhanced concurrent mode volume group was introduced
to extend the concurrent mode support to all other disk types and to the 64-bit kernel.
Enhanced concurrent volume groups can also be used in a non-concurrent
environment to provide RSCT-based shared storage protection.
Normal access environment

While both normal and classical concurrent volume groups are supported for
reserve/release-based shared storage protection, usually you would use normal volume
groups.
Concurrent access environment
If you need concurrent access, you must use concurrent volume groups. You should
convert classical concurrent volume groups to enhanced concurrent mode whenever
possible to make use of its flexibility. Also, support for classical concurrent volume
groups is being withdrawn (see below).
Support for the classical concurrent volume groups is being removed
- AIX V5.1 introduced enhanced concurrent volume groups, but still allowed you to
create and use the classical concurrent volume groups. When concurrent volume
groups are created on AIX V5.1 and up, they are created as enhanced concurrent
mode volume groups by default.
- AIX V5.2 does not allow you to create classical concurrent volume groups, but you
can still use them in AIX V5.2.
- AIX V5.3 removes the support for classical concurrent volume groups entirely; only
enhanced concurrent volume groups are supported.
What is passive mode

With enhanced concurrent mode (ECM) volume groups, a VG may be varied on in
passive mode or active mode. Active mode is equivalent to normal varyon and will be
displayed with the lsvg -o command.
Passive mode (which should be used under the control of HACMP) allows read-only
access to the LVM data via commands such as lsvg and lslv. It is implemented using
a group services subsystem called gsclvmd. You cannot determine the passive varyon
state from the lsvg -o command. As the visual shows, you must use the lsvg vg_name
command to determine this state.
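Because the passive state only shows up in the VG PERMISSION field of the lsvg vg_name output, a monitoring script has to read that field. The following Python sketch parses a captured copy of the output; the sample text is modeled on the visual, and the function names are hypothetical. It does not run lsvg itself.

```python
import re

# Sample text modeled on the lsvg ecmvg output in the visual (spacing is assumed).
SAMPLE_LSVG = """\
VOLUME GROUP:   ecmvg              VG IDENTIFIER:   0009314700004c00000000fe2eaa2d6d
VG STATE:       active             PP SIZE:         8 MB
VG PERMISSION:  passive-only       TOTAL PPs:       537 (4296 MB)
"""

def vg_permission(lsvg_text):
    """Return the value of the VG PERMISSION field, or None if it is absent."""
    match = re.search(r"VG PERMISSION:\s+(\S+)", lsvg_text)
    return match.group(1) if match else None

def is_passive_varyon(lsvg_text):
    """True when the volume group is varied on in passive mode."""
    return vg_permission(lsvg_text) == "passive-only"

print(vg_permission(SAMPLE_LSVG))
print(is_passive_varyon(SAMPLE_LSVG))
```

Note that VG STATE reads active in both cases, which is why the sketch ignores it.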
One big benefit of passive mode will be seen when we discuss the shared storage
environment. Changes to the LVM constructs (except filesystem changes) on an active
node will automatically be synched to the passive nodes using the gsclvmd daemon.
What is fast disk takeover

Normally, varyonvg takes time because it must access the disks. Switching a volume
group from passive to active state (or the reverse) is a very fast operation because it
only updates LVM's internal state of the volume group in an AIX kernel data structure.
We will see a little later in this unit how HACMP makes use of this in a shared storage
environment.

High Availability Data/Storage Issues

Storage Adapter failure
- Duplicate adapters
Data/Disk (LUN) Access failure
- LVM mirroring
- RAID
  - RAID 1 or 10 (AIX or Disk subsystem)
  - RAID 5 (Disk subsystem only)
Storage system Access failure
- Multiple paths
- Dual power
Total storage failures (Node, all adapters, or all disks (LUNs))
- Another Node with shared storage
Figure 2-5. High Availability Data/Storage Issues
Notes:
Data access failure requires redundancy
HACMP does not provide data redundancy. Data must be mirrored, or striped with parity,
across multiple physical drives (generally presented to AIX as a LUN). The redundancy
can be provided by the storage system (for example, using RAID 5) or, if using JBOD
(Just a Bunch of Disks), by AIX using LVM mirroring. HACMP is not aware of the
method being used.
AIX: LVM mirroring

LVM mirroring is normally used if the storage system is set up to use Just a Bunch of
Disks (JBOD).
Some of the features of LVM mirroring are:

- Data can be mirrored on three disks rather than having just two copies of data. This
provides higher availability in the case of multiple failures, but does require more
disks for the three copies.
- The disks used in the physical volumes could be of mixed attachment types.
- Instead of entire disks, individual logical volumes are mirrored. This provides
somewhat more flexibility in how the mirrors are organized. It also allows for an odd
number of disks to be used and provides protection for disk failures when more than
one disk is used.
- The disks can be configured so that mirrored pairs are in separate sites or in
different power domains. In this case, after a total power failure on one site,
operations can continue using the disks on the other site that still has power. No
information is displayed on the physical location of each disk when mirrored logical
volumes are being created, unlike when creating RAID 1 or RAID 0+1 arrays, so
allocating disks on different sites requires considerable care and attention.
- Mirrored pairs can be on different adapters.
- Read performance is good for short length operations as data can be read from
either of two disks, so the one with the shortest queue of commands can be used.
Write performance requires a write to two disks.
- Extra mirrored copies can be created and then split off for backup purposes.
- Data can be striped across several mirrored disks, an approach which avoids hot
spots caused by excessive activity on a few disks by distributing the I/O operations
across all the member disks.
- There are parameters such as Mirror Write Consistency, Scheduling Policy, and
Enable Write Verify which can help maximize performance and reliability.
Storage system
RAID 5 can be used within the storage system. Hardware features must be checked for
compatibility with HACMP. Multiple paths from the server to the data are
provided by multi-pathing software. That software must also be checked for
compatibility with HACMP.
Although not in the scope of this class, the selected storage subsystem will be affected
by the factors listed below (among others). The selected storage subsystem will then
determine what you will look for in terms of compatibility with the chosen HACMP
version and features.
- Data access performance requirements
- Capacity
- Support for multi-pathing
- Price

Configuring a Mirrored File System for HACMP
Step 1. Create shared volume group
    Options: name the VG something meaningful like shared_vg1
Step 2. Change auto varyon flag
    Options: chvg -an shared_vg1
Step 3. Create a jfslog lv "sharedlvlog"
    Options: type=jfslog, size=1pp, separate physical volumes=yes, scheduling=sequential, copies=2
Step 4. Initialize the jfslog
    Options: logform /dev/sharedlvlog
Step 5. Create a data lv "sharedlv"
    Options: type=jfs, size=?, separate physical volumes=yes, copies=2, scheduling=sequential, write verify=??
Step 6. Create a file system on a previously created lv
    Options: pick the lv = sharedlv to create the file system on, automount = no, assign desired mount point
Step 7. Mount filesys, verify the log file is in use
    Options: lsvg -l shared_vg1 should show 1 lv type jfslog, 1 lp, 2pp.
Figure 2-6. Configuring a Mirrored File System for HACMP
Notes
Introduction
This visual describes a procedure for creating a mirrored filesystem for use in HACMP.
There is an easier-to-use method provided by an HACMP facility called C-SPOC which
is discussed later in the course. The C-SPOC method cannot be used until the HACMP
cluster's topology and at least one resource group have been configured.
The procedure described in the visual permits the creation of shared file systems before
performing any HACMP related configuration (an approach favored by some cluster
configurators).
Detailed procedure
Here are the steps in somewhat more detail:
a. Use the smit mkvg fastpath to create the volume group.
b. Make sure that the volume group is created with the Activate volume group
AUTOMATICALLY at system restart parameter set to no (or use smit chvg to
set it to no). This gives HACMP control over when the volume group is brought
online. It is also necessary to prevent, for example, a backup node from attempting
to bring the volume group online at a point in time when it is already online on a
primary node. This step is not necessary for ECM volume groups because no is the
default for them.
c. Use the smit mklv fastpath to create a logical volume for the jfslog with the
parameters indicated in the figure above (make sure that you specify a type of jfslog
or AIX ignores the logical volume and creates a new one that is not mirrored when
you create the filesystem below).
d. Use the logform command to initialize the logical volume for use as a JFS log
device.
e. Use the smit mklv fastpath again to create a logical volume for the filesystem with
the parameters indicated in the figure above.
f. Use the smit crjfslv fastpath (not crjfs) to create a JFS filesystem in the now
existing logical volume.
g. Verify by mounting the filesystem and using the lsvg command. Notice that if
copies were set to 2, then the number for PPs should be twice the number for LPs
and that if you specified separate physical volumes then the values for PVs should
be 2 (the number of copies).
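The verification in step g can be automated by reading the LPs, PPs, and PVs columns of captured lsvg -l output. The Python sketch below illustrates the arithmetic only; the sample output, the /sharedfs mount point, the size of 10 LPs, and the function name are assumptions. The PVs check assumes each copy sits on exactly one disk, as in the procedure above.

```python
# Sample text modeled on lsvg -l shared_vg1 output (values are hypothetical).
SAMPLE_LSVG_L = """\
shared_vg1:
LV NAME       TYPE     LPs   PPs   PVs   LV STATE     MOUNT POINT
sharedlvlog   jfslog   1     2     2     open/syncd   N/A
sharedlv      jfs      10    20    2     open/syncd   /sharedfs
"""

def check_mirroring(lsvg_l_text, copies=2):
    """Return a list of logical volumes whose PP or PV counts do not match the copies."""
    problems = []
    # Skip the volume group name line and the column header line.
    for line in lsvg_l_text.splitlines()[2:]:
        fields = line.split()
        if len(fields) < 5:
            continue
        name = fields[0]
        lps, pps, pvs = (int(value) for value in fields[2:5])
        if pps != copies * lps:
            problems.append(f"{name}: expected {copies * lps} PPs, found {pps}")
        if pvs != copies:
            problems.append(f"{name}: expected {copies} PVs, found {pvs}")
    return problems

print(check_mirroring(SAMPLE_LSVG_L))
```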
The procedure for creating a JFS2 filesystem is quite similar although there are a few
differences:
- The type of the JFS2 log logical volume should be jfs2log
- The logform command requires an additional parameter to cause it to create a
JFS2 log
# logform -V jfs2log <lvname>
- The type of the JFS2 filesystem logical volume should be jfs2
- The fastpath for creating a JFS2 filesystem in an existing logical volume is
smit crjfs2lvstd

Shared Storage Considerations

[Diagram: Node 1 and Node 2, each with its own private rootvg and its own LVM (ODM, PVIDs), device (hdisk) and adapter layers; both nodes have access to the shared disks, which hold the VGDA and PVIDs]
Adapters: connect to same disks; compatible (microcode, PTF levels for drivers)
Device: may be different hdisk numbers but better to match
LVM: definitions, PVIDs must be in synch
Access: Private vs. Shared
Storage system must connect to both nodes
Shared may be serial (non-concurrent) or parallel (concurrent)
Figure 2-7. Shared Storage Considerations
Notes:
Shared storage
The answer to the loss of a node is shared storage: storage that can be accessed
from more than one node. Shared storage requires that LVM components be in synch
on all nodes. Adapters and microcode on all the systems should also be at the same
level.
Shared storage and application storage requirements

A computer application always requires at least a certain amount of disk storage space.
When such an application is placed into a high-availability cluster, any of the
application's data which changes must be stored in a location which is accessible to
whichever node the application is currently running on.
Some application related storage need not be shared if accessed from only one system
(such as rootvg shown above). We refer to this as private storage.

LVM PVIDs
Each AIX system that is sharing a volume group will need to have access to the same
disks (LUNs). This is either done through zoning and masking in the SAN or via twin-tail
cabling for non-SAN implementations. If the zoning/masking/cabling is done correctly,
each system will see the same disks (LUNs).
Note: for SCSI adapters in a shared storage environment, avoid SCSI ID 7, as AIX may
assign it during a maintenance or diag operation and you could accidentally end up with
two devices at SCSI ID 7.

Serial Access Requirements

Controlling Access
Only the node running the application should be able to access the data
Facilities:
- reserve/release:
  used with classic vg
  varyonvg, varyoffvg (or HACMP low level code)
- gsclvmd (RSCT):
  used with ECM VGs
  invoked with varyonvg in passive mode
  used for fast disk takeover; needs no disk access
Synchronizing Changes
Changes made to one side must be propagated to the other side
Facilities:
- importvg command (normally requires varyoffvg on other node)
- RSCT (Enhanced Concurrent VG: LV -- not file system changes)
- HACMP C-SPOC (preferred method -- does not require varyoffvg)
- HACMP Lazy Update
Figure 2-8. Serial Access Requirements
Notes:
Why?
The shared storage is physically connected to each node that the application might run
on. In a serial (non-concurrent) access environment, the application actually runs on
only one node at a time and modification or even access to the data from any other
node during this time could be catastrophic (the data could be corrupted in ways which
take days or even weeks to notice).
Any LVM changes in shared storage must be synchronized.
Controlling access using reserve/release

Reserve/release-based shared storage protection relies on the disk technology
supporting a mechanism called disk reservation. Disks which support this mechanism
can be, in effect, told to refuse to accept almost all commands from any node other than
the one which issued the reservation. AIX's LVM automatically issues a reservation
request for each disk in a volume group when the volume group is varied online by the
varyonvg command. The varyonvg command fails for any disks which are currently
reserved by other nodes. If it fails for enough disks, the varyon of the volume group
fails. This is almost certainly the outcome, because if one disk is reserved by another
node, the others presumably are too. HACMP can, if necessary during a fallover, execute
the low level routines to unreserve a disk.
Controlling access using gsclvmd (RSCT) and fast disk takeover

- Description
AIX V5.1 introduced a new mechanism to be used with enhanced concurrent volume
groups. This mechanism uses an AIX component called Reliable Scalable Cluster
Technology (RSCT). A special subsystem, gsclvmd, runs on all nodes and uses the
Group Services component of RSCT to allow varyonvg in passive mode. This
eliminates the need for hardware reserve/release and is disk independent. HACMP 5.x
uses this mechanism when enhanced concurrent volume groups are in use.
- Fast disk takeover details
The ability to use varyonvg in passive mode and then switch a volume group from
passive to active state (or the reverse) is referred to as fast disk takeover because it is a
very fast operation. It only updates the LVM's internal state of the volume group in an
AIX kernel data structure and does not require any actual disk access operations.
It is automatically enabled as long as all nodes are at HACMP 5.x and the VG is an
ECM volume group.
Caution: Fast disk takeover requires all systems accessing the disk to be under the
control of HACMP. HACMP uses varyonvg in passive mode, which allows Group Services
to prevent access if there is a problem with Group Services. If a system is not under the
control of HACMP, it is possible to vary on a VG from two different nodes, because there
is no hardware reserve/release.
Synchronization of LVM data

- Lazy update
When using reserve/release-based shared storage protection, HACMP provides a
last-chance mechanism called lazy update to update the ODM on the takeover node at
the time of fallover. This is meant to be a final attempt at synchronizing the VGDA
content with a takeover node's ODM at fallover time. Because lazy update can't
overcome some VGDA/ODM mismatches, relying on it should be avoided.
Lazy update works by using the volume group timestamp in the ODM. When HACMP
needs to varyon a volume group, it compares the ODM timestamp to the timestamp in
the VGDA. If the timestamps disagree, lazy update does an exportvg/importvg to
recreate the ODM on the node. If the timestamps agree, no extra steps are required. It
is, of course, possible to update the ODM on inactive nodes when the change to the
meta-data is made. In this way, extra time at fallover is avoided. The ODM can be
updated manually or you can use Cluster Single Point of Control (C-SPOC) which can
automate this task. Lazy update and the various options for updating ODM information
on inactive nodes are discussed in detail in a later unit in this course.
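The timestamp comparison described above can be modeled with a few lines of shell. This is only an illustrative sketch of the decision, not HACMP's actual code: the function name needs_lazy_update and the timestamp values are hypothetical, and how the timestamps are read from the ODM and the VGDA is deliberately left out.

```shell
# Sketch of the lazy update decision. The two timestamps are passed in
# as plain strings; reading them from the ODM and the VGDA is not shown.
# Succeeds (returns 0) when the ODM must be refreshed before the varyon.
needs_lazy_update() {
    odm_ts=$1
    vgda_ts=$2
    [ "$odm_ts" != "$vgda_ts" ]
}

# Hypothetical timestamp values, for illustration only.
if needs_lazy_update 45f1c2a80d3b7e10 45f1c9b31a22c4f8; then
    echo "timestamps differ: exportvg/importvg before varyon"
else
    echo "timestamps agree: varyon directly"
fi
```

The point of the model is that a mismatch costs an exportvg/importvg at fallover time, which is exactly the delay that updating the ODM on inactive nodes in advance avoids.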
Concurrent access also requires care

Some clusters have instances of the application active on more than one node at a time
(for example, parallel databases). Such clusters require simultaneous access to the
shared disks, so access to the shared data must be carefully controlled, or at least
coordinated. This mechanism must be provided by the application.

Reserve/Release Voluntary VG Takeover

[Diagram: Node 1 and Node 2 sharing the volume groups httpvg and dbvg; httpvg is passed from Node 2 to Node 1]
Node2: varyoffvg httpvg
Node1: varyonvg httpvg
Figure 2-9. Reserve/Release Voluntary VG Takeover
Notes
Voluntary takeover
With reserve/release-based shared storage protection, HACMP passes volume groups
between nodes by issuing a varyoffvg command on one node and a varyonvg
command on the other node. The coordination of these commands (ensuring that the
varyoffvg is performed before the varyonvg) is the responsibility of HACMP.

Reserve/Release Involuntary VG Takeover
[Diagram: Node 1 and Node 2 sharing httpvg; the node holding httpvg fails and the surviving node varies httpvg on]
Figure 2-10. Reserve/Release Involuntary VG Takeover
Notes
Involuntary disk takeover
The right node has failed with the shared disks still reserved to the right node. When
HACMP encounters a reserved disk in this context, it uses a special utility program to
break the disk reservation. It then varies on the volume group which causes the disks to
be reserved to the takeover node.
Implications
Note that if the right node had not really failed then it would lose its reserves on the
shared disks (rather abruptly) when the left node varied them on. This will be seen in
the left node's error log and should be acted on immediately, as it indicates that you are in
a situation where both nodes can access and update the data on the disks (each
believing that it is the only node accessing and updating the data). An involuntary
takeover isn't possible unless all paths used by HACMP to communicate between the
two nodes have been severed.

How do we know the other node has failed?

Involuntary disk takeover will only take place when a node believes that the active node
has failed. HACMP uses communication between the nodes to determine if each node
is still active. It is therefore important to build enough redundancy into these
communication paths that loss of all communication with another node implies that
the other node has truly failed.

RSCT Based Voluntary VG Takeover

[Diagram: three panels showing Node 1 and Node 2 with httpvg and dbvg; each volume group is in active varyon on one node and passive varyon on the other]
1. A decision is made to move httpvg from the right node to the left
2. Right node puts httpvg into passive mode
3. Left node puts httpvg into active mode
Figure 2-11. RSCT Based Voluntary VG Takeover
Notes
Voluntary VG takeover with fast disk takeover
With RSCT-based takeover there is no need to check for lazy update, to place disk
reserves, or to perform much of the usual varyonvg processing. This is referred to in
HACMP as Fast Disk Takeover. The fast disk takeover mechanism handles a voluntary
VG takeover by first putting the volume group into passive state on the node which is
giving it up. It then sets the active varyon state on the node which is taking over the
volume group. The coordination of these operations is managed by HACMP 5.x and
AIX RSCT.

RSCT Based Involuntary VG Takeover

[Diagram: three panels showing Node 1 and Node 2 with httpvg and dbvg in active and passive varyon states]
Active varyon state and passive varyon state are concepts which don't apply to failed nodes
1. Right node fails
2. Left node realizes that right node has failed
3. Left node obtains active mode varyon of httpvg
Figure 2-12. RSCT Based Involuntary VG Takeover
Notes
Involuntary with fast disk takeover
A node has failed. Once the remaining node (or nodes) realize that the node has failed,
the takeover node sets the volume group's varyon state to be active.
There is no need to break disk reservations as no disk reservations are in place. The
only action required is that the takeover node ask its local LVM to mark the volume
group's varyon state as active.
If Topology Services fails (that is, there is no communication between the nodes) then
Group Services fails and it is not possible to activate the volume group. This makes it
very safe to use. It is recommended, however, to attach the disks in an enhanced
concurrent volume group only to systems running HACMP 5.x.

Synchronizing Changes
Without C-SPOC
[Diagram: Node 1 and Node 2, each with its own ODM, attached to a disk array holding the VGDA]
#1 mkvg, mklv (log), logform, mklv (data), crfs (OR chvg, chlv, chfs)
#2 unmount, varyoffvg
#3 (cfgmgr), importvg, chvg
#4 varyoffvg
May require stopping application
With C-SPOC
does not require stopping application
only supported method for ECM VGs
Figure 2-13. Synchronizing Changes
Notes
Introduction
The steps to add a shared volume group are:
1) Ensure common PVIDs
2) Create a new VG and its contents
3) Varyoff VG on Node1
4) Import VG on Node2 and set VG characteristics correctly
5) Varyoff VG on Node2
6) Start HACMP
Please note that the slide presents only a high-level view of the commands required to
perform these steps. More details are provided below.

1. Ensure common PVIDs across all nodes that will share volume group
As discussed earlier, HACMP does not require that hdisk names be consistent across
the nodes, but it does require that all the nodes have access to the same disks and have
discovered the PVIDs.
a. Ensure disks are zoned/masked so that the disks will be seen by both nodes.
b. Add the shared disk(s) to AIX on the primary node (Node1 in the example):
cfgmgr
c. Assign a PVID to the disk(s)
chdev -a pv=yes -l disk_name
where disk_name is hdisk#, hdiskpower# or vpath#.
d. Add the disks to AIX on the secondary node (Node2)
cfgmgr
e. Verify that the necessary PVIDs are seen on both nodes. If not, correct the
problem.
lspv
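The PVID check in step e can be scripted. The following is a minimal sketch, assuming that the output of lspv has been saved to a file on each node and that the PVID is the second column of each line; the function name missing_pvids and the file names in the usage note are hypothetical.

```shell
# Print every required PVID that is absent from saved lspv output.
# $1 = file holding "lspv" output; remaining arguments = PVIDs of the
# shared disks, as reported on the node where they were assigned.
missing_pvids() {
    file=$1
    shift
    for pvid in "$@"; do
        # Column 2 of each lspv line is the PVID.
        awk -v p="$pvid" '$2 == p { found = 1 } END { exit !found }' "$file" \
            || echo "$pvid"
    done
}
```

For example, after running lspv > /tmp/lspv.node2 on Node2, calling missing_pvids /tmp/lspv.node2 with the PVIDs assigned on Node1 prints nothing if Node2 sees all of them, and prints each PVID that Node2 does not see otherwise.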
2. Create a new VG on Node1

a. Create the shared volume group
Use smit mkvg or C-SPOC. Remember to pick a unique major number for the VG.
b. Change the auto vary on flag using:
chvg -an <vgname>
(C-SPOC does this automatically. Also, this step is unnecessary if you are using an
enhanced concurrent mode VG)
c. Create and Initialize the jfslog using:
mklv or smit mklv
logform <jfslogname>
(C-SPOC handles this automatically)
d. Create the logical volume
use smit mklv or C-SPOC
e. Create the filesystem using one of the following options:
crfs or smit jfs or C-SPOC
using SMIT, select
Add a Journaled File System on a previously defined logical volume
3. Varyoff VG from Node1

a. Unmount (umount <File_System>) any file systems that are part of the VG which was
just created.
b. varyoffvg <vgname>, where <vgname> is the new volume group created in step 2.
4. Import VG on Node2 and set VG characteristics correctly

a. On the second cluster node perform the following commands:
importvg -V <major#> -y <vgname> <hdisk#>

chvg -an <vgname>
If using C-SPOC, you can skip this step as it will do this automatically for you.
5. Varyoff the VG on Node2

a. varyoffvg <vgname>
If using C-SPOC, you can skip this step as it will do this automatically for you.
6. Start HACMP
a. Start HACMP, which varies on the VG and mounts the filesystems. You can
then resume processing.
C-SPOC
Fortunately, there is an easier way.
These steps will be done automatically if the cluster is active and C-SPOC is used.
Otherwise, you can use the commands listed here in the notes.
Unfortunately, we are not looking at the easier way until we get to the C-SPOC unit.

Quorum Issues
AIX performs quorum checking on volume groups in order to ensure that the volume
group remains consistent
the quorum rules are intended to ensure that structural changes to the
volume group (for example, adding or deleting a logical volume) are
consistent across an arbitrary number of varyon-varyoff cycles
When mirroring in AIX, quorum checking is an issue because losing access to 50% of the
disks in a volume group takes the volume group offline
How can you lose access to 50% of the disks?
logical volumes are mirrored across two things

the two things can be two disk enclosures or two sites
one of the two things goes away
VG status   Quorum checking enabled   Quorum checking disabled
            for volume group          for volume group
Running     >50% VGDAs                >1 VGDAs
varyonvg    >50% VGDAs                100% VGDAs, or >50% VGDAs
                                      if MISSINGPV_VARYON=TRUE
Figure 2-14. Quorum Issues
Notes
Introduction
If you plan to mirror your data at the AIX level to provide redundancy, you will need to
consider AIX quorum checking on a volume group. If you aren't mirroring your data at
the AIX level, quorum isn't an issue.
Quorum
Quorum is the check used by the LVM at the volume group level to resolve possible
data conflicts and to prevent data corruption. Quorum requires that more than 50% of
the VGDAs in a volume group be available before any LVM actions can continue.
Note: For a VG with 3 or more disks, there is one copy of the VGDA on each disk. For a
one disk VG, there are two copies of the VGDA. For a two disk VG, the first disk has two
copies and the second has one copy of the VGDA. The VGDA is identical for all disks in
the VG.
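The VGDA placement rule in the note above and the >50% quorum rule can be worked through with a small amount of shell arithmetic. This is an illustrative sketch only; the function names vgdas_on_disk and has_quorum are hypothetical and nothing here queries a real volume group.

```shell
# Number of VGDA copies held by disk number $2 (counting from 1) in a
# volume group of $1 disks, following the rule in the note above.
vgdas_on_disk() {
    ndisks=$1
    disk=$2
    if [ "$ndisks" -eq 1 ]; then
        echo 2
    elif [ "$ndisks" -eq 2 ] && [ "$disk" -eq 1 ]; then
        echo 2
    else
        echo 1
    fi
}

# Succeeds if more than 50% of the VGDAs are available.
# $1 = VGDAs still available, $2 = total VGDAs in the volume group.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Two-disk VG: 3 VGDAs in total (2 on the first disk, 1 on the second).
has_quorum 2 3 && echo "second disk lost: quorum kept"
has_quorum 1 3 || echo "first disk lost: quorum lost"
```

The two-disk case shows why the disks are not equivalent: losing the disk that holds two VGDAs leaves only one of three, which is not more than 50%.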
Quorum is especially important in an HA cluster. If LVM can varyon a volume group with
half or less of the disks, it might be possible for two nodes to varyon the same VG at the
same time, using different subsets of the disks in the VG. This is a very bad situation
which we will discuss in the next visual.
Normally LVM verifies quorum when the VG is varied on and continuously while the VG
is varied on.
50% of the disks go away

This is the reason you worry about quorum. As the visual indicates, the loss of access
to 50% of the disks will cause quorum checking to take the volume group offline. This is
not good when you consider that you are buying extra hardware to provide greater
availability for the end-user. But what does it mean to lose access to 50% of the disks?
If you're mirroring within a site, this will happen if you're mirroring across disk
enclosures. If one enclosure loses power or the adapter that the AIX system is using to
access the enclosure goes offline, you have lost access to 50% of the disks. If you're
mirroring cross-site, losing access to 50% of the disks means losing access to the other
site's storage subsystem. This could be a problem with just the storage subsystem at
the other site, a problem with the communications to the other site, or an outage of the
entire other site.
If you are mirroring within a site, consider disabling quorum. If you are using cross-site
LVM mirroring, consider using HACMP to handle the loss of access. In that case:
- Enable the volume group for cross-site mirroring verification (when adding the
volume group via C-SPOC).
- Add the disks in the volume group to the list of cross-site mirrored disks (Add
Disk/Site Definition for Cross-Site LVM Mirroring, via smitty cl_xslvmm).
- Set the forced varyon flag in the resource group that contains all cross-site mirrored
volume groups.
On recovery, if the stale partition synchronization encounters a problem, you may have
to use the manual process of synchronizing the mirrors (C-SPOC menu item
Synchronize Shared LVM Mirrors).
AIX errlog entry for quorum loss

If quorum is lost the following is an example of an AIX errlog entry:
Id        Label             Type  CL  Description
91F9700D  LVM_SA_QUORCLOSE  UNKN  H   QUORUM LOST, VOLUME GROUP CLOSING
How HACMP reacts to quorum loss

HACMP 4.5 and up automatically reacts to a loss of quorum (LVM_SA_QUORCLOSE)
error associated with a volume group going offline on a cluster node. In response to this
error, a non-concurrent resource group goes offline on the node where the error
occurred. If the AIX Logical Volume Manager takes a volume group in the resource
group offline due to a loss of quorum for the volume group on the node, HACMP
selectively moves the resource group to another node.
You can change this default behavior by customizing resource recovery to use a notify
method instead of fallover. For more information, see Chapter 4: Configuring HACMP
Cluster Topology and Resources (Extended) in the HACMP for AIX V5.4 Administration
Guide.
Note: HACMP launches selective fallover and moves the affected resource group only
in the case of the LVM_SA_QUORCLOSE error. This error can occur if you use mirrored
volume groups with quorum enabled. However, other types of volume group failure
errors could occur. HACMP does not react to any other type of volume group errors
automatically. In these cases, you still need to configure customized error notification
methods, or use AIX Automatic Error Notification methods to react to volume group
failures.

Quorum/Mirror Choices
Don't mirror in the AIX node
Use external storage subsystem (DS8000/DS6000, EMC, etc) or
RAID arrays
Mirror with quorum disabled
It may be possible for each side of a two-node cluster to have
different parts of the same volume group varied online
It is possible that volume group structural changes (for example, add
or delete of a logical volume) made during the last varyon are
unknown during the current varyon
It is possible that volume group structural changes are made to one
part of the volume group which are inconsistent with a different set of
structural changes which are made to another part of the volume
group
Use HACMP Forced Varyon
Figure 2-15. Quorum/Mirror Choices
Notes
Introduction
Eliminating quorum issues is done either by mirroring with quorum disabled, or by not
mirroring at the AIX level.
Eliminating quorum problems

In order to enhance the availability of a volume group you should think about the
following:
- Using more than one disk adapter prevents the loss of access to the disks if a single
adapter fails. This can be used with an external disk subsystem to provide multiple
paths (using multipathing software) to the LUNs, or with mirroring so that different
copies of the data are accessed through different adapters.
- For higher availability you should use two external power sources.

- If there are only two disks in the volume group then you lose access to the volume
group if the disk with two VGDAs is lost.
- If you are mirrored across two disk subsystems, consider a quorum buster disk to
prevent loss of quorum if you lose access to one subsystem. This is discussed later in
the notes.
- Distribute hard disks across more than one bus:
  Use multipathing software and two Fibre Channel adapters
  Use three adapters per node in SCSI
  Use two adapters per node, per loop in SSA
- Use different power sources:
  Connect each power supply in the storage device to a different power source
Don't mirror at the AIX level

This is the option most configurations use today. The data redundancy is provided in the
external storage subsystem. Quorum is not an issue in this case.
Disabling quorum - nonquorum volume groups

Quorum checking can be disabled on a per-volume group basis. If quorum checking is
disabled, LVM will not varyoff a volume group if quorum is lost while the VG is running.
However, in this case, 100% of the VGDAs must be available when the volume group is
varied on. Disabling the quorum checking will only ensure that the volume group stays
varied on even in the case of loss of quorum.
Why disable quorum checking?
Disabling quorum checking may seem like a good idea from an availability point of view.
For example, consider a volume group mirrored across two disk cabinets. If access to
one disk cabinet is lost, only half of the VGDAs are available. With quorum checking
enabled, quorum is lost and the VG is varied off. This would seem to defeat the purpose
of mirroring. However, there are real risks associated with disabling quorum. We will
discuss ways to handle the quorum problem in the next few visuals.
Risks of disabling quorum checking
Disabling quorum checking is an option. However, considerable care must be taken to
ensure that a consistent set of VGDAs is used on an ongoing basis. In addition,
exceptional care must be taken to ensure that one half of the cluster isn't running with
one half of all the mirrored logical volumes while the other half is running with the other
half of all the mirrored logical volumes, as this leads to a phenomenon known as data
divergence.
Sometimes it may be necessary to disable quorum in a cluster. In this case, take care
that you do not end up with data divergence. The primary strategy for avoiding data
divergence is to avoid partitioned clusters, although careful design of the cluster's
shared storage is also important.
Quorum buster disk

Although not mentioned in the visual, another solution is to add a disk to the volume
group without putting data on it; this is called a quorum buster disk. The extra disk need
not contain any data, but as a member of the shared VG it holds a copy of the VGDA
and hence is counted in the quorum check.
Note: In order to be effective, the quorum buster disk must not rely on any component
that either of the two halves of the rest of the volume group relies on. In other words, the
quorum buster must have its own disk adapter (in each node), its own source of power
and its own cabling and cooling. If, for example, the quorum buster shares a disk
adapter or a power supply with one of the two halves then the loss of that disk adapter
or power supply results in the loss of the half and of the quorum buster which, in turn,
results in the loss of quorum and the volume group goes offline.
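The effect of a quorum buster disk can be checked with simple arithmetic. The sketch below assumes a layout that is not in the original notes: two enclosures with two disks each, one VGDA per disk, and the quorum rule that more than 50% of the VGDAs must remain available. The function name quorum_ok is hypothetical.

```shell
# Succeeds if more than 50% of the VGDAs remain available.
# $1 = VGDAs still available, $2 = total VGDAs in the volume group.
quorum_ok() { [ $(( $1 * 2 )) -gt "$2" ]; }

# Assumed layout: 2 disks in each of two enclosures, one VGDA per disk.
PER_SIDE=2

# No quorum buster: losing one enclosure leaves 2 of 4 VGDAs.
quorum_ok "$PER_SIDE" $(( 2 * PER_SIDE )) \
    && echo "no buster: VG stays online" || echo "no buster: VG goes offline"

# Independent quorum buster: losing one enclosure leaves 3 of 5 VGDAs.
quorum_ok $(( PER_SIDE + 1 )) $(( 2 * PER_SIDE + 1 )) \
    && echo "buster: VG stays online" || echo "buster: VG goes offline"

# Buster shares a failed component with the lost half: 2 of 5 VGDAs remain.
quorum_ok "$PER_SIDE" $(( 2 * PER_SIDE + 1 )) \
    && echo "shared buster: VG stays online" || echo "shared buster: VG goes offline"
```

The third case is the one the note warns about: a quorum buster that fails together with one half of the volume group does not help.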

HACMP Forced Varyon

Or you can allow HACMP to handle it
Involves downtime when a mirror copy is lost (reducing availability)
HACMP 5.x provides a per resource group forced varyon:

Each resource group has a flag which can be set to cause HACMP to
perform a careful forced varyon of the resource group's VGs
If normal varyonvg fails and this flag is set:
HACMP verifies that at least one complete copy of each logical volume is
available
If verification succeeds, HACMP forces the volume group online
This is not a complete and perfect solution to quorum issues:

If the cluster is partitioned then the rest of the volume group might still
be online on a node in the other partition
HACMP 4.5 introduced forced varyon for all shared VGs:

Still available in HACMP 5.x
If the HACMP_MIRROR_VARYON environment variable is set to TRUE,
forced varyon is enabled for all shared VGs in the cluster
If set, HACMP_MIRROR_VARYON overrides the per resource group
forced varyon flag
Figure 2-16. HACMP Forced Varyon
Notes
Introduction
If you decide to mirror at the AIX level and to leave quorum checking on, you will want to
have HACMP handle the loss of access to a volume group if half the disks are lost. Be
sure you understand what you're deciding to do though. If you allow HACMP to handle
the loss of access to the volume group, this means that the loss of half the disks (only
one of your two copies of the data) will result in the user's loss of access to the
application until it can be taken over by another cluster node. You've purchased the
additional hardware and set up the mirroring precisely to avoid downtime if you lose
access to part of the hardware, but this strategy will result in downtime. You make the
call (see disabling quorum in the previous visual).
varyonvg -f
AIX provides the ability to varyon a volume group if a quorum of disks is not available.
This is called forced varyon. The varyonvg -f command allows a volume group to be
made active that does not currently have a quorum of available disks. All disks that
cannot be brought to an active state will be put in a removed state. At least one disk
must be available for use in the volume group.
Per resource group forced varyon

HACMP 5.x provides a flag in each resource group which allows you to enable forced
varyon of the VGs in that resource group, as described in the visual.
Forced varyon of all shared volume groups

The HACMP_MIRROR_VARYON environment variable, introduced in HACMP 4.5, when set
to TRUE, enables the forced varyon mechanism for all shared volume groups in the
cluster.
In contrast, the HACMP 5.x forced varyon mechanism applies to the volume groups of
specific resource groups.
The HACMP_MIRROR_VARYON variable is still supported by HACMP 5.x and, if set to TRUE,
overrides any per-resource group settings for the forced varyon feature.
If the HACMP_MIRROR_VARYON variable is used, it should probably be defined by inserting
the following line into /etc/environment on each cluster node:
HACMP_MIRROR_VARYON=TRUE
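The precedence between the two settings can be summarized as a small model. This sketch is not HACMP code: the function name forced_varyon_enabled and the "true"/"false" representation of the per-resource group flag are assumptions made for illustration.

```shell
# Succeeds if forced varyon applies to a resource group.
# $1 = that resource group's own forced varyon flag ("true" or "false").
# HACMP_MIRROR_VARYON=TRUE in the environment overrides the flag.
forced_varyon_enabled() {
    if [ "${HACMP_MIRROR_VARYON:-}" = "TRUE" ]; then
        return 0
    fi
    [ "$1" = "true" ]
}
```

With the variable unset, only resource groups whose own flag is set get forced varyon; with the variable set to TRUE, every resource group does, whatever its flag says.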
MISSINGPV_VARYON environment variable

An approach commonly used in the past to deal with quorum-related issues involves
the use of the MISSINGPV_VARYON environment variable. This AIX provided environment
variable, if set to TRUE in /etc/environment, enables the forced varyon of any VGs
which are missing disks.
Clusters which use the MISSINGPV_VARYON variable should probably be updated to use
either the HACMP_MIRROR_VARYON variable or HACMP 5.x's forced varyon feature.

Recommendations for Forced Varyon

Before enabling HACMP's forced varyon feature for a volume
group or the HACMP_MIRROR_VARYON variable for the entire
cluster, ensure that:
The affected volume groups are mirrored across disk enclosures
The affected volume groups are set to super-strict allocation
There are redundant heartbeat networks between all nodes
Administrative policies are in effect to prevent volume group structural
changes when the cluster is running degraded (that is, failed over or
with disks missing)
Figure 2-17. Recommendations for Forced Varyon
Notes
Be careful when using forced varyon
Failure to follow each and every one of these recommendations could result in either
data divergence or inconsistent VGDAs. Either problem can be very difficult if not
impossible to resolve in any sort of satisfactory way, so be careful!
More information
Refer to the HACMP for AIX Administration Guide Version 5.4 (chapter 15) and the
HACMP for AIX Planning Guide Version 5.4 (chapter 5) for more information about
forced varyon and quorum issues.

Guidelines
Following these simple guidelines helps keep the configuration
easier to administer:
All LVM constructs must have unique names in the cluster
For example, httplv, httploglv, httpfs and httpvg
Mirror or otherwise provide redundancy for critical logical volumes

Don't forget the jfslog
If it isn't worth mirroring then consider deleting it now rather than having to
wait to lose the data when the wrong disk fails someday
Even data which is truly temporary is worth mirroring as it avoids an
application crash when the wrong disk fails
External disk subsystems (like the DS8000 or EMC Symmetrix) or RAID-5
storage devices are alternative ways to provide redundancy
The VG major device numbers should be the same

Mandatory for clusters exporting NFS file systems, but it is a good habit for
any cluster
Shared data on internal disks is a bad idea

Focus on the elimination of single points of failure
Figure 2-18. Guidelines
Notes
Unique names
Since your LVM definitions are used on multiple nodes in the cluster, you must make
sure that the names created on one node are not in use on another node. The safest
way to do this generally is to explicitly create and name each entity (do not forget to
explicitly create, name and format the jfslog logical volumes using logform).
Mirror or otherwise provide redundancy

For availability, you should mirror (or use hardware RAID) for all your shared logical
volumes including the jfslog logical volume.
- If it is worth keeping then it is worth mirroring. If it is not worth mirroring then it is not
worth keeping and should be deleted.
- It is important to mirror even pure scratch space (in other words, space whose
contents are worthless after a restart of the application). Failure to mirror scratch
space could cause an outage if the wrong disk fails. In order to avoid the outage,
mirror the scratch space!
The mirrorvg command provides an easy way to mirror all the logical volumes on a
given volume group. This same functionality may also be accomplished manually if you
execute the mklvcopy command for each individual logical volume in a volume group.
Volume group major numbers

If you are using NFS, you must be sure to use the same major number on all nodes.
Even if not using NFS, this is good practice, and makes it easy to begin using NFS with
this volume group in the future.
Use the lvlstmajor command on each node to determine a free major number
common to all nodes.
Use external disks for shared data

External disks should be used for shared volume groups. If internal disks were
configured for shared volume groups and the owning node needed to be powered down
for any reason, it would render the shared volume groups unavailable - clearly a bad
idea.
Eliminate single points of failure

The focus of cluster design must always be eliminating single points of failure.

OEM VG and File System Support

- OEM volume groups and filesystems can be used with HACMP
- HACMP 5.3 automatically detects and provides the methods for Veritas volume groups (VxVM) and filesystems (VxFS)
- Can configure custom volume group or filesystem processing methods using SMIT:
  Extended Configuration
    Extended Resource Configuration
      HACMP Extended Resources Configuration
        Configure Custom Volume Methods
        Configure Custom Filesystem Methods
- Limitations and more information
Figure 2-19. OEM VG and File System Support
Notes
Introduction
You can configure OEM volume groups and file systems in AIX and use HACMP as an
IBM high-availability solution to manage such volume groups.
Note: Different OEMs may use different terminology to refer to similar constructs. For
example, the Veritas Volume Manager (VxVM) term Disk Group is analogous to the AIX
LVM term Volume Group. We will use the term volume groups to refer to OEM and
Veritas volume groups.
Veritas volume manager

Among other OEM volume groups and filesystems, HACMP 5.3 supports volume
groups and filesystems created with VxVM in Veritas Foundation Suite v.4.0. To make it
easier for you to accommodate Veritas volume groups in the HACMP cluster, the
methods for Veritas volume groups support are predefined in HACMP and are used
automatically. After you add Veritas volume groups to HACMP resource groups, you
can select the methods for the volume groups from the pick lists in HACMP SMIT
menus for OEM volume groups support.
Note: Veritas Foundation Suite is also referred to as Veritas Storage Foundation (VSF).
Veritas file systems

Among other OEM volume groups and filesystems, HACMP 5.3 supports volume
groups and filesystems created with VxVM in Veritas Foundation Suite v.4.0. To make it
easier for you to accommodate Veritas filesystems in the HACMP cluster, the methods
for Veritas filesystems support are predefined in HACMP. As with volume groups, after
you add Veritas file systems to HACMP resource groups, you can select the methods
for the file systems from the pick lists in HACMP SMIT menus for OEM file systems
support.
Configuring custom volume group processing methods using SMIT

When HACMP identifies OEM volume groups of a particular type, it can be configured
to provide the volume group processing functions shown in the visual.
You can add, change, and remove custom volume group processing methods for a
specific OEM volume group using SMIT. You can select existing custom volume group
methods that are supported by HACMP, or you can use your own custom methods.
Using SMIT, you can perform the following functions for OEM volume groups:
- Add Custom Volume Group Methods
- Change/Show Custom Volume Group Methods
- Remove Custom Volume Group Methods
Configuring custom filesystem processing methods using SMIT

When HACMP identifies OEM file systems of a particular type, it can be configured to
provide the file system processing functions shown in the visual.
You can add, change, and remove custom file system processing methods for a
specific OEM file system using SMIT. You can select existing custom file system
methods that are supported by HACMP, or you can use your own custom methods.
- Add Custom Filesystem Methods
- Change/Show Custom Filesystem Methods
- Remove Custom Filesystem Methods

Additional considerations
The custom volume group or file system processing methods that you specify for a
particular OEM volume group or file system are added to the local node only. This
information is not propagated to other nodes; you must copy the custom processing
methods to each node manually. Alternatively, you can use the HACMP File Collections
facility to make the disk, volume, and file system methods available on all nodes.
Limitations and more information

There are some limitations to using OEM volume groups with HACMP. For example,
HACMP supports a number of extended functions for LVM volume groups that are not
available for OEM volume groups, such as enhanced concurrent mode, active and
passive varyon process, heartbeating over disk, selective fallover upon volume group
loss and others. In addition, there are a number of other limitations.
For complete details on using OEM volume groups/filesystems with HACMP, see
Appendix B in the HACMP for AIX Version 5.4 Installation Guide.

OEM Disk Support

- HACMP lets you use either IBM disks or OEM disks
- EMC Considerations
- Treat an unknown disk type the same way as a known type
  - /etc/cluster/disktype.lst
  - /etc/cluster/lunreset.lst
  - /etc/cluster/conraid.dat
- Use custom disk processing methods
  - Identifying ghost disks
  - Determining whether a disk reserve is being held by another node in the cluster
  - Breaking a disk reserve
  - Making a disk available for use by another node
- Enhanced concurrent VGs
Figure 2-20. OEM Disk Support
Notes
Introduction
HACMP lets you use either physical storage disks manufactured by IBM or by an
Original Equipment Manufacturer (OEM) as part of a highly available infrastructure.
Depending on the type of OEM disk, custom methods allow you (or an OEM disk
vendor) to either
- tell HACMP that an unknown disk should be treated the same way as a known and
supported disk type, or
- specify the custom methods that provide the low-level disk processing functions
supported by HACMP for that particular disk type

EMC support
IBM does not provide the requirements for HACMP compatibility with non-IBM storage.
You must contact the storage vendor's support organization, or consult the vendor's
online reference materials.
Be sure to consider the multi-pathing software version and maintenance (PowerPath,
HDLM, MPIO PCM).
For EMC planning see their support matrix:
http://www.emc.com/interoperability/matrices/EMCSupportMatrix.pdf
Search for HACMP. You will get many hits; look in the sections that apply to your
storage devices. Then look for the HACMP version that you are installing. Finally, look
for device driver, PowerPath and AIX patch information for your configuration.
Treat an unknown disk the same way as a known type

HACMP provides mechanisms that will allow you, while configuring a cluster, to direct
HACMP to treat an unknown disk exactly the same way as another disk it supports. The
following three files can be edited to perform this configuration. (There is no SMIT menu
to edit these files.)
- /etc/cluster/disktype.lst
This file is referenced by HACMP during disk takeover.
You can use this file to tell HACMP that it can process a particular type of disk the
same way it processes a disk type that it supports. The file contains a series of lines
of the following form:
<PdDvLn field of the hdisk><tab><supported disk type>
To determine the value of the PdDvLn field for a particular hdisk, enter the following
command:
# lsdev -Cc disk -l <hdisk name> -F PdDvLn
The known and supported disk types are:
Disk Name in HACMP    Disk Type
SCSIDISK              SCSI-2 Disk
SSA                   IBM Serial Storage Architecture
FCPARRAY              Fibre Attached Disk Array
ARRAY                 SCSI Disk Array
FSCSI                 Fibre Attached SCSI Disk
For example, to have a disk whose PdDvLn field was disk/fcal/HAL9000 be treated
the same as IBM fibre SCSI disks, a line would be added that read:
disk/fcal/HAL9000	FSCSI
A sample disktype.lst file, which contains comments, is provided.


- /etc/cluster/lunreset.lst
This file is referenced by HACMP during disk takeover.
HACMP will use either a target ID reset or a LUN reset for parallel SCSI devices
based on whether a SCSI inquiry of the device returns a 2 or a 3. Normally, only
SCSI-3 devices support LUN reset. However, some SCSI-2 devices also support a
LUN reset, so HACMP checks the Vendor Identification returned by a SCSI
inquiry against the lines of this file. If the device is listed in this file, then a LUN reset
is used. This file is intended to be customer modifiable.
For example, if the "HAL 9000" disk subsystem returned an ANSI level of '2' to
inquiry, but supported LUN reset, and its Vendor ID was HAL and its Product ID
was 9000, then this file should be modified to add a line which was either:
HAL
or
HAL     9000
depending on whether vendor or vendor plus product match was desired. Note the
use of padding of Vendor ID to 8 characters.
A sample /etc/cluster/lunreset.lst file, which contains comments, is provided.
- /etc/cluster/conraid.dat
This file is referenced by HACMP during varyon of a concurrent volume group.
You can use this file to tell HACMP that a particular disk is a RAID disk that can be
used in classical concurrent mode. The file contains a list of disk types, one disk
type per line.
The value of the Disk Type field for a particular hdisk is returned by the following
command:
# lsdev -Cc disk -l <hdisk name> -F type
Note: This file only applies to classical concurrent volume groups. Thus this file has
no effect in AIX V5.3, which does not support classical concurrent VGs.
HACMP does not include a sample conraid.dat file. The file is referenced by the
/usr/sbin/cluster/events/utils/cl_raid_vg script, which does include some
comments.
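The line formats described above for disktype.lst and lunreset.lst can be sketched with two small shell functions. This is an illustration only: the function names are hypothetical and not part of HACMP, the list of supported types is copied from the table of known disk types, and the HAL 9000 device is the example used in the text. The functions only print a line; appending it to the file on each node is left to the administrator.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HACMP) that print entries in the formats
# described in the text. They do not modify any file.

# disktype_entry <PdDvLn> <supported disk type>
# Prints "<PdDvLn><tab><supported disk type>" for /etc/cluster/disktype.lst.
disktype_entry() {
    case "${2:-}" in
        SCSIDISK|SSA|FCPARRAY|ARRAY|FSCSI) ;;
        *) echo "unsupported disk type: ${2:-}" >&2; return 1 ;;
    esac
    printf '%s\t%s\n' "$1" "$2"
}

# lunreset_entry <Vendor ID> [Product ID]
# Prints a vendor-only line, or the Vendor ID padded with spaces to
# 8 characters followed by the Product ID, for /etc/cluster/lunreset.lst.
lunreset_entry() {
    if [ -n "${2:-}" ]; then
        printf '%-8s%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Examples:
#   disktype_entry disk/fcal/HAL9000 FSCSI
#   lunreset_entry HAL 9000
```

On an AIX node, the first argument to disktype_entry would normally come from the lsdev -F PdDvLn command shown earlier. Because entries are per disk type, one line per type is enough no matter how many such disks are attached.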
The previously described files in /etc/cluster are not modified by HACMP after they
have been configured and are not removed if the product is uninstalled. This ensures
that customized modifications are unaffected by the changes in HACMP. By default, the
files initially contain comments explaining their format and usage.

Keep in mind that the entries in these files are classified by disk type, not by the number
of disks of the same type. If there are several disks of the same type attached to a
cluster, there should be only one file entry for that disk type.
Finally, unlike other configuration information, HACMP does not automatically
propagate these files across nodes in a cluster. It is your responsibility to ensure that
these files contain the appropriate content on all cluster nodes. You can use the
HACMP File Collections facility to propagate this information to all cluster nodes.
Use custom disk processing methods

Some disks may behave sufficiently differently from those supported by HACMP so that
it is not possible to achieve proper results by telling HACMP to process these disks
exactly the same way as supported disk types. For these cases, HACMP provides finer
control.
While doing cluster configuration, you can either
- Select one of the specific methods to be used for the steps in disk processing
- Specify a custom method
HACMP supports the following disk processing steps:
- Identifying ghost disks
- Determining whether a disk reserve is being held by another node in the cluster
- Breaking a disk reserve in HACMP
- Making a disk available for use by another node
HACMP allows you to specify any of its own methods for each step in disk processing,
or to use a customized method, which you define.
- Add Custom Disk Methods
- Change/Show Custom Disk Methods
- Remove Custom Disk Methods
What is a ghost disk?

Although ghost disks no longer show up with IBM disks, they may with OEM disks.
During the AIX boot sequence, the configuration manager (cfgmgr) accesses all the
shared disks (and all other disks and devices). Each time it accesses a physical volume
at a particular hardware address, it tries to determine if the physical volume is the same
actual physical volume that was last seen at the particular hardware address. It does
this by attempting to read the physical volume's ID (PVID) from the disk. This operation
fails if the disk is currently reserved to another node. Consequently, the configuration
manager is not sure if the physical volume is the one it expects or is a different physical
volume. In order to be safe, it assumes that it is a different physical volume and assigns
it a temporary hdisk name. This temporary hdisk name is called a ghost disk. When the
volume group is eventually brought online by Cluster Services, the question of whether
each physical volume is the expected physical volume is resolved. If it is, then the ghost
disk is deleted. If it isn't, then the ghost disk remains. Whether or not the volume group
is ultimately brought online successfully depends on whether or not the LVM can
determine the identity of the disk.
This is not a problem with IBM disks, because they can be identified even when there is a
reserve.
Ghost disk issues

- Time
Dealing with ghost disks takes time with the result that a volume group with ghost
disks takes longer to varyon than one without. For example, in one customer cluster
where ghost disks were found, they added about twenty seconds per ghost disk to
the time required to varyon the volume group. In volume groups that contain a large
number of physical volumes (LUNs), this can result in a significant delay during
fallovers.
- Don't delete ghost disks
It is very important that if ghost disks occur, they be left in the AIX device
configuration as their presence is necessary for the correct operation of the LVM
when the volume group is ultimately brought online by Cluster Services.
Additional considerations for custom methods

The custom disk processing method that you add, change or remove for a particular
OEM disk is changed only on the local node. This information is not propagated to other
nodes; you must copy this custom disk processing method to each node manually or
use the HACMP File Collections facility.
OEM disks and enhanced concurrent volume groups

OEM disks can be used in enhanced concurrent volume groups, either for concurrent
access mode or, in non-concurrent access mode, for fast disk takeover. In this case,
you would need to edit the /etc/cluster/disktype.lst file and associate the OEM disk
with a supported disk type.
More information
For detailed information about configuring OEM disks for use with HACMP, see
Appendix B in the HACMP for AIX V5.4 Installation Guide.

Virtual Storage (VIO) and HACMP

[Diagram: FRAME 1 and FRAME 2 share one storage device (Stg Dev). Each frame has two Virtual I/O Servers (VIOS 1, VIOS 2) and one HACMP node (HACMP Node1 in FRAME 1, HACMP Node2 in FRAME 2). Labels on each VIOS: HBA, MPIO, hdisk0 (no_reserve), vhost0. Labels on each HACMP node, connected through the Hypervisor: vscsi0, vscsi1, MPIO, hdisk0, sharedvg.]
- Enhanced concurrent mode volume groups required on HACMP nodes
- MPIO or other (supported) multi-pathing software on VIO server
- MPIO on HACMP nodes
Figure 2-21. Virtual Storage (VIO) and HACMP
Notes
Overview
This type of configuration is becoming prevalent with the adoption of the Virtualization
capabilities of the Power5 (and follow-on) architecture. A full discussion of the
implementation of this configuration is beyond the scope of the class. The intent is to
indicate that this is a supported configuration and to introduce some of the terms to learn,
the requirements, and a configuration overview. Always consult the IBM Sales Manual and IBM Support
(and anyone else you can find who will talk to you about this from an experienced
standpoint) for the latest requirements and considerations.
Legend
Stg Dev - Storage Subsystem providing access to disks, like a DS8300, DS4000, EMC,
HDS, SSA, and so on.

VIOS - Virtual I/O Server, the special LPAR on a Power5 system that provides
virtualized storage (and networking) devices for use by client LPARs.
HBA - Host Bus Adapter, also known as Fibre Channel Adapter. This is the connection to
the SAN, giving the VIOS access to storage in the SAN (LUNs).
MPIO - Multipath I/O, built into AIX since v5.1, creates path devices for each instance of
a disk/LUN that is recognized by AIX, presenting only a single hdisk device from these
multiple paths.
vhost0 - Virtual SCSI (server) adapter on the Virtual IO Server that provides the client
LPARs with access to virtual SCSI disks.
vscsi0 - Virtual SCSI (client) adapter on the client LPAR that provides the client access
to the VIOS's Virtual SCSI (server) adapter and therefore access to the virtual SCSI
disks.
Hypervisor - the Power5 component that manages access between the vhost and
vscsi adapters.
Minimum requirements
As of the writing of this version of the course, the minimum requirements for HACMP
with Virtual SCSI (VSCSI) and Virtual LAN (VLAN) on POWER5 (eServer p5 and
eServer i5) models were:
- AIX V5.3 Maintenance Level 5300-002 with APARs IY70082 and IY72974
- VIO Server V1.1 with VIOS Fix pack 6.2 and iFIX IY71303
- HACMP V5.3 (or later), or HACMP V5.2 with APAR IY68370 (or higher) and APAR
IY68387, or HACMP V5.1 with APAR IY66556 (or higher)
All the details on requirements and specifications are in this FLASH:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10390
Configuration overview
Configuration is mostly performed on the VIOS and Hardware Management Console.
The use of MPIO at the AIX level is also key to ensuring data availability in the event
access to a VIOS is lost. Ensure that you can reactivate any path in MPIO that was lost
after it is recovered so as to avoid total loss of access to data on a subsequent path
failure. The HACMP consideration, in addition to the correct software levels as outlined
above, is that enhanced concurrent volume groups are used in this configuration.
Otherwise, to the Cluster Manager, this is just another volume group to be managed
in a resource group.
On storage device
Map LUNs to the two corresponding VIO servers

On Hardware Management Console

Define Mappings (vhost & vscsi)
On VIO Server 1
Set no_reserve attribute
chdev -l <hdisk#> -a reserve_policy=no_reserve -a algorithm=round_robin
Export the LUNs out to each client
mkvdev -vdev hdisk# -vadapter vhost0
mkvdev -f -vdev hdisk# -vadapter vhost1
On VIO Server 2
Set no_reserve attribute
chdev -l <hdisk#> -a reserve_policy=no_reserve
Export the LUNs out to each client
mkvdev -vdev hdisk# -vadapter vhost0
mkvdev -f -vdev hdisk# -vadapter vhost1
On Clients
- Install MPIO SDDPCM
- Create the shared volume group as enhanced concurrent VG on first Client
(bos.clvm.enh required)
- Varyoffvg on Client 1
- Import VG onto Client 2
- Define to HACMP as a shared resource in a resource group
References
Courses that address this configuration:
- Q1373, Logical Partitioning (LPAR) and Virtualization on System p POWER5
Systems
- Q1378, Advanced POWER Virtualization Implementation and Best Practices
Redbooks and Redpapers (www.redbooks.ibm.com):
- REDP-4194-00: IBM System p Advanced POWER Virtualization Best Practices
- REDP-4027-00: HACMP 5.3, Dynamic LPAR and Virtualization
Provides details later in the document on HACMP and Virtualization along with
failure scenarios in the VIO infrastructure and performance considerations.
- SG24-5768-01: Advanced POWER Virtualization on IBM eServer p5 Servers:
Architecture and Performance Considerations
- SG24-7940-02: Advanced POWER Virtualization on IBM System p5: Introduction
and Configuration

Checkpoint (1 of 3)
1. Which of the following statements is TRUE (pick the best answer)?
a. Static application data should always reside on private storage.
b. Dynamic application data should always reside on shared
storage.
c. Shared storage must always be simultaneously accessible in
read-write mode to all cluster nodes.
d. Application binaries should only be placed on shared storage.
2. True or False?
Using RSCT-based shared disk protection results in slower
fallovers.
3. Which of the following disk technologies are supported by HACMP?
a. SCSI
b. SSA
c. FC
d. All of the above
Figure 2-22. Checkpoint 1 of 3
Checkpoint (2 of 3)
4. True or False?
You should check the vendor's website for supported HACMP
configurations when using SAN based storage units (DS8000,
ESS, EMC, HDS, and so forth).
5. True or False?
hdisk numbers must map to the same PVIDs across an entire
HACMP cluster.
6. True or False?
Lazy update attempts to keep VGDA constructs in sync between
cluster nodes (reserve/release-based shared storage protection)
7. Which of the following commands will bring a volume group online?
a. mountvg vgA
b. getvtg vgA
c. attachvg vgA
d. varyonvg vgA
Checkpoint (3 of 3)
8. True or False?
Quorum should always be disabled on shared volume groups.
9. True or False?
File system and logical volume attributes cannot be changed while
the cluster is operational.
10. True or False?
An enhanced concurrent volume group is required for the
heartbeat over disk feature.
Unit Summary
- Access to shared storage must be controlled
  - Non-concurrent (serial) access
    - Reserve/release-based protection: slower and may result in ghost disks
    - RSCT-based protection (fast disk takeover): faster, no ghost disks, and some risk of partitioned cluster in the event of communication failure
    - Careful planning is needed for both methods of shared storage protection to prevent fallover due to communication failures
  - Concurrent access
    - Access must be managed by the parallel application
- HACMP supports several disk technologies
  - Must be well understood to eliminate single points of failure
- Shared storage should be protected with redundancy
  - LVM mirroring
    - LVM configuration options must be understood to ensure availability
    - LVM quorum checking and forced varyon must be understood to ensure availability
  - Hardware RAID
Lab Exercises: Exercise 1 and Exercise 2

Exercise 1: Configuring Shared Storage for HACMP
Estimated time: 30 minutes
Configure a shared disk with an enhanced concurrent
mode volume group to be used by one of the cluster
applications
Exercise 2: Create a Cluster
Estimated time: 3 hours
Configure a 2-node hot standby cluster, and extend to a
mutual takeover cluster
Figure 2-26. Lab Exercises: Exercise 1 and Exercise 2
Unit 3. HACMP Administration

This unit describes administration tasks for HACMP for AIX. It
discusses how to monitor an HACMP cluster with status commands
and log files, how to change the configuration of a cluster topology and
cluster resources, and how to perform cluster-wide configuration using
the Cluster Single Point of Control (C-SPOC).

Topic 1: HACMP Status and Log Files
- Display cluster configuration and status
- Locate and use HACMP log files
Topic 2: Topology and Resource Group Management
- Use the SMIT standard and extended menus to make topology
and resource group changes
Topic 3: Cluster Single Point of Control
- Describe the benefits and capabilities of C-SPOC
- Perform routine administrative changes using C-SPOC
- Start and stop cluster services
- Perform resource group move operations

Accountability:
Checkpoint
Machine exercises
References
HACMP manuals
Unit Objectives
Topic 1: HACMP Status and Log Files
- Display cluster configuration and status
- Locate and use HACMP log files
Topic 2: Topology and Resource Group Management
- Use the SMIT standard and extended menus to make topology and resource group changes
Topic 3: Cluster Single Point of Control
- Describe the benefits and capabilities of C-SPOC
- Perform routine administrative changes using C-SPOC
- Start and stop cluster services
- Perform resource group move operations
3.1 HACMP Status and Log Files


After completing this topic, you should be able to:
Display cluster configuration and status
Locate and use HACMP log files
Figure 3-2. Topic 1: HACMP Status and Log Files
Useful AIX Commands

Command                 Description
ps -ef                  Is application running?
./myappcheckscript
mount                   Are file systems mounted? Are they full?
df
lsvg -o                 Which VGs are active?
lsvg vgname             Check VG details.
lspv                    Are disks in consistent state?
netstat -i              Where are the IP labels?
ifconfig -a
netstat -rn             Can I reach the network?
ping -R
lssrc -g cluster        Are HACMP subsystems running?
lssrc -a | grep cl
lssrc -ls clstrmgrES    Have cluster services been started? Are any events running?
Figure 3-3. Useful AIX Commands
Notes:
Useful AIX commands
Here is a list of useful AIX commands that are frequently used in cluster administration.
For additional commands or general reference purposes, consult one of the following:
- AIX man pages
- The pSeries and AIX Information Center:
http://publib16.boulder.ibm.com/pseries/index.htm

Useful HACMP Status Commands

Command                 Description
clstat                  Displays topology and resource group status.
                        Two modes: ASCII and X Windows.
                        Ongoing monitor, or one time status.
                        clinfoES and snmpd must be running.
cldump                  Displays topology and resource group status and some configuration.
                        snmpd must be running.
cldisp                  Displays application and topology status and some configuration.
                        snmpd must be running.
cltopinfo (cllsif)      Displays topology configuration.
clRGinfo (clfindres)    Displays resource group status.
clshowres               Displays resource group configuration.
clshowsrv               Calls lssrc to display status of:
                        HACMP subsystems (clshowsrv -a)
                        HACMP and RSCT subsystems (clshowsrv -v)
Figure 3-4. Useful HACMP Status Commands
Notes:
clstat
The clstat utility uses the clinfo library routines to display all node, interface and
resource group information for a selected cluster. clinfoES and snmpd must be running.
ASCII or X mode
The clstat utility is supported in two modes: ASCII mode and X Window mode. ASCII
mode can run on any physical or virtual ASCII terminal, including xterm or aixterm
windows. If the DISPLAY variable is set, the program will run in X Window mode, unless
you specify the -a flag when issuing the command.
Monitor or one time status
Specifying the -o flag will execute the ASCII mode one time and exit. This is useful for
capturing clstat output from a shell script or cron job. Otherwise, clstat provides a
regularly updated display of cluster status.
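A cron-friendly capture of one-time clstat output might look like the following sketch. The wrapper name capture_clstat, the CLSTAT path and the OUTDIR location are assumptions made for illustration, and the error branch assumes that clstat exits with a non-zero status when it cannot obtain cluster information (for example, when clinfoES or snmpd is not running).

```shell
#!/bin/sh
# Hypothetical wrapper for capturing one-time clstat output from a script or
# cron job. CLSTAT and OUTDIR are assumptions; adjust them for the local node.
CLSTAT="${CLSTAT:-/usr/es/sbin/cluster/clstat}"
OUTDIR="${OUTDIR:-/tmp}"

# capture_clstat
# Runs clstat once in ASCII mode (-o), saves the output to a time-stamped
# file, and prints the file name on success.
capture_clstat() {
    out="$OUTDIR/clstat.$(date +%Y%m%d%H%M%S).out"
    if "$CLSTAT" -o > "$out" 2>&1; then
        printf '%s\n' "$out"
    else
        echo "clstat failed; are clinfoES and snmpd running? See $out" >&2
        return 1
    fi
}
```

Because -o runs the ASCII mode once and exits, the wrapper returns instead of staying in the regularly updated display.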
Refresh interval
Use -r to specify the refresh interval - the frequency with which clstat queries clinfo
for new cluster information. In ASCII mode, the command interprets the value of interval
in seconds. The default interval is 1 second. In X display mode, clstat interprets the
value of interval in tenths of seconds. The default interval is .1 of a second.
cldump
cldump uses SNMP to gather cluster status and sends the results to standard out.
cldisp
This script uses SNMP and prints an application-centric summary of the cluster to
standard output.
cltopinfo
cltopinfo displays cluster topology information in an easy to read format. There are
several flags to select which information is displayed.
cltopinfo shows configuration, not status. For example:
- It shows where service labels can be configured, not where they are currently
configured
- It shows the addresses configured for each interface, but does not show interface
state (UP or DOWN)
cllsif is a link to cltopinfo and displays the topology in a slightly different format.
clRGinfo
The clRGinfo command displays a resource group's attributes. With no flags, it just
shows where each resource group is running. With various options, it will show
additional resource group attributes. You can specify a list of one or more resource
groups, or, if the command is invoked without any resource groups on the command line,
information about all the configured resource groups is displayed.
If cluster services are not running on the local node, the command determines a node
where the cluster services are active and obtains the resource group information from
the active Cluster Manager.
clshowres
The clshowres command retrieves information from the HACMP resource ODM object
class and lists the resources defined for all resource groups or for a given group or
node.
It does not show where each resource group is currently running.

clshowsrv
The clshowsrv command displays the status of HACMP subsystems. Status includes
subsystem name, group name, process ID, and status. The status of a daemon can be
any one of the states reflected by the SRC subsystem (active, inoperative, warned to
stop, etc).
Flags
- -a
Displays all HACMP daemons.
- -v
Displays all RSCT, HACMP and optional HACMP daemons.
- subsystem
Shows the status of the specified HACMP subsystem. Valid values are clstrmgrES,
clinfoES, and clcomdES. If you specify more than one subsystem, separate the
entries with a space (no commas).

Summary of Main HACMP Log Files

Log File                                        Description
/usr/es/adm/cluster.log                         High level view of cluster activity from clstrmgrES,
                                                clinfoES, startup and reconfiguration scripts.
                                                Usually a good place to start.
                                                (Start here to get time of cluster event.)
/tmp/hacmp.out                                  Detailed tracing information from HACMP event scripts.
/tmp/hacmp.out.[1-7]                            Cycled daily. Keeps 7 files + today.
                                                (Use time to locate details in hacmp.out.)
/usr/es/sbin/cluster/history/cluster.mmddyyyy   Daily history log. Generated by HACMP scripts.
/tmp/cspoc.log                                  Log of C-SPOC activity.
/var/hacmp/clverify/clverify.log                Verbose messages from clverify (cluster verification
                                                utility).
/tmp/emuhacmp.out                               Log of emulated HACMP events.
/tmp/clstrmgr.debug                             Highly detailed debug output from clstrmgrES.
/var/hacmp/clcomd/                              clcomd logs:
  clcomd.log                                    Log of incoming and outgoing connection requests.
  clcomddiag.log                                Diagnostic information from clcomd.
/var/adm/clavan.log                             The application availability analysis tool uses this
                                                file to analyze application availability.
/var/hacmp/log/                                 Misc. logs:
  clconfigassist.log                            Two-node assistant log.
  clutils.log                                   Utilities and file propagation log.
  cl_testtool.log                               Cluster test tool log.
/var/ha/log/                                    RSCT logs:
  grpsvcs*                                      Log of Group Services daemon.
  topsvcs*                                      Log of Topology Services daemon.
Figure 3-5. Summary of Main HACMP Log Files
Notes:
HACMP log files
Your first approach to diagnosing a problem affecting your cluster should be to examine
the cluster log files for messages output by the HACMP subsystems. These messages
provide valuable information for understanding the current state of the cluster.
Which log files should I look at?

HACMP has MANY log files in several directories. How do you know where to start?
Generally the two most useful logs will be cluster.log and hacmp.out.
cluster.log provides a high level overview of HACMP activity and is a good starting
point. For most troubleshooting, the /tmp/hacmp.out file will be the most helpful log file.
However, hacmp.out is a very long and detailed file. It's usually helpful to start with the
high level cluster.log file to get an overview of event flow and to get a time for an event,
then use that time to locate the relevant entries in hacmp.out.
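The two-step approach described above can be sketched with a pair of shell functions. The function names are hypothetical, and the sketch assumes that high-level events are marked with the string EVENT START and that both logs use the same "Mon DD HH:MM:SS" time stamp layout; check a few lines of your own logs before relying on either assumption.

```shell
#!/bin/sh
# Hypothetical helpers for moving from cluster.log to hacmp.out.
# Default paths are the ones named in the text; the "EVENT START" marker and
# the time stamp layout are assumptions about the log format.
CLUSTER_LOG="${CLUSTER_LOG:-/usr/es/adm/cluster.log}"
HACMP_OUT="${HACMP_OUT:-/tmp/hacmp.out}"

# list_events
# Shows the high-level event start messages, including their time stamps.
list_events() {
    grep 'EVENT START' "$CLUSTER_LOG"
}

# details_at <time stamp>
# Prints the line numbers in hacmp.out where the given time stamp occurs,
# so that the detailed trace can be opened at that point.
details_at() {
    grep -n -F -- "$1" "$HACMP_OUT" | cut -d: -f1
}

# Example: list_events, pick a time such as "Jul 10 10:12:33", then run
#   details_at 'Jul 10 10:12:33'
```

The line numbers returned by details_at can then be used to open hacmp.out at the right place, for example with vi +<line number> /tmp/hacmp.out.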

/usr/es/adm/cluster.log
Contains time-stamped, formatted messages generated by HACMP scripts and
daemons.
Recommended Use: Because this log file provides a high-level view of current cluster
status, check this file first when diagnosing a cluster problem.
/tmp/hacmp.out
Contains time-stamped, formatted messages generated by HACMP scripts on the
current day.
In verbose mode (the default), this log file contains a line-by-line record of every
command executed by scripts, including the values of all arguments to each command.
An event summary of each high-level event is included at the end of each event's
details.
Recommended Use: Because the information in this log file supplements and expands
upon the information in the /usr/es/adm/cluster.log file, it is the primary source of
information when investigating a problem.
Note: With recent changes in the way resource groups are handled and prioritized in
fallover circumstances, the hacmp.out file and its event summaries have become even
more important in tracking the activity and resulting location of your resource groups.
/usr/es/sbin/cluster/history/cluster.mmddyyyy
Contains time-stamped, formatted messages generated by HACMP scripts. The
system creates a cluster history file every day, identifying each file by its file name
extension, where mm indicates the month, dd indicates the day, and yyyy the year.
Recommended Use: Use the cluster history log files to get an extended view of cluster
behavior over time.
Note: This log is not a good tool for tracking resource groups processed in parallel. In
parallel processing, certain steps formerly run as separate events are now processed
differently and these steps will not be evident in the cluster history log. Use the
hacmp.out file to track parallel processing activity.
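
The cluster.mmddyyyy naming rule described above can be turned into a small helper when you want to pull the history files for a range of days. This is a minimal sketch: the function name history_file is hypothetical, and the directory is the default location given above, which may have been changed on your cluster.

```python
from datetime import date

# Default history directory as given above; it can be relocated.
HISTORY_DIR = "/usr/es/sbin/cluster/history"


def history_file(day, directory=HISTORY_DIR):
    """Build the cluster history file path for one day (cluster.mmddyyyy)."""
    return f"{directory}/cluster.{day:%m%d%Y}"


print(history_file(date(2007, 7, 4)))
```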
/tmp/cspoc.log
Contains time-stamped, formatted messages generated by HACMP C-SPOC
commands. The /tmp/cspoc.log file resides on the node that invokes the C-SPOC
command.
Recommended Use: Use the C-SPOC log file when tracing a C-SPOC command's
execution on cluster nodes.

/var/hacmp/clverify/clverify.log
The /var/hacmp/clverify/clverify.log file contains the verbose messages output by the
cluster verification utility. The messages indicate the node(s), devices, command, etc. in
which any verification error occurred.
/tmp/emuhacmp.out
Contains time-stamped, formatted messages generated by the HACMP Event
Emulator. The messages are collected from output files on each node of the cluster, and
cataloged together into the /tmp/emuhacmp.out log file.
In verbose mode (recommended), this log file contains a line-by-line record of every
event emulated. Customized scripts within the event are displayed, but commands
within those scripts are not executed.
/tmp/clstrmgr.debug
Contains time-stamped, formatted messages generated by the clstrmgrES daemon.
/var/hacmp/clcomd/clcomd.log
Contains time-stamped, formatted messages generated by Cluster Communications
daemon (clcomd) activity. The log shows information about incoming and outgoing
connections, both successful and unsuccessful. It also displays a warning if the file
permissions for /usr/es/sbin/cluster/etc/rhosts are not set correctly: users on the
system should not be able to write to the file.
Recommended Use: Use information in this file to troubleshoot inter-node
communications, and to obtain information about attempted connections to the daemon
(and therefore to HACMP).
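
The rhosts permission rule mentioned above (ordinary users should not be able to write to the file) can be checked by hand. The sketch below is an assumption about what "set correctly" means, namely no group or other write bits; the exact test clcomd applies is not stated here, and the function name writable_by_others is hypothetical.

```python
import os
import stat


def writable_by_others(path):
    """True if group or other users have write permission on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


# Example use on a cluster node (path taken from the text above):
# writable_by_others("/usr/es/sbin/cluster/etc/rhosts")
```

A result of True means the permissions should be tightened before you look for other causes of the warning.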
/var/hacmp/clcomd/clcomddiag.log
Contains time-stamped, formatted, diagnostic messages generated by clcomd.
/var/adm/clavan.log
Contains the state transitions of applications managed by HACMP: for example, when
each application is started or stopped, and when a node on which an application is
running stops.
Each node has its own instance of the file. Each record in the clavan.log file consists of
a single line. Each line contains a fixed portion and a variable portion.
Recommended Use: By collecting the records in the clavan.log file from every node in
the cluster, the Application Availability Analysis utility (clavan) can determine how long
each application has been up, as well as compute other statistics describing application
availability time.
/var/hacmp/log/clconfigassist.log
Contains debugging information for the Two-Node Cluster Configuration Assistant. The
Assistant stores up to ten copies of the numbered log files to assist with troubleshooting
activities.
/var/hacmp/log/clutils.log
Contains information about the date, time, results, and which node performed an
automatic cluster configuration verification.
It also contains information for the file collection utility, the two-node cluster
configuration assistant, the cluster test tool and the Online Planning Worksheet (OLPW)
conversion tool.
/var/hacmp/log/cl_testtool.log
Includes excerpts from the hacmp.out file. The Cluster Test Tool saves up to three log
files and numbers them so that you can compare the results of different cluster tests.
The tool also rotates the files with the oldest file being overwritten.
RSCT logs: /var/ha/log/grpsvcs*

These files contain time-stamped messages in ASCII format. They track the execution
of internal activities of the RSCT Group Services daemon. IBM support personnel use
this information for troubleshooting. The files get trimmed regularly. Therefore, please
save them promptly if there is a chance you may need them.
RSCT logs: /var/ha/log/topsvcs*

These files contain time-stamped messages in ASCII format. They track the execution
of internal activities of the RSCT Topology Services daemon. IBM support personnel
use this information for troubleshooting. The files get trimmed regularly. Therefore,
please save them promptly if there is a chance you may need them.
More details
These notes provide an overview of the HACMP log files. They will be discussed in
detail in later HACMP courses. In addition, see the following manual for more
information about using the HACMP log files:

Where are the Log Files?

HACMP has a LOT of logs
Users can change the location of the logs (HACMP logs, not RSCT logs)
Can't remember where they are?
Use the following command to list default and current locations:
node1/# odmget HACMPlogs
HACMPlogs:
        name = "clstrmgr.debug"
        description = "Generated by the clstrmgr daemon"
        defaultdir = "/tmp"
        value = "/tmp"
        rfs = ""
HACMPlogs:
        name = "cluster.log"
        description = "Generated by cluster scripts and daemons"
        defaultdir = "/usr/es/adm"
        value = "/usr/es/adm"
        rfs = ""
HACMPlogs:
        name = "cluster.mmddyyyy"
        description = "Cluster history files generated daily"
        defaultdir = "/usr/es/sbin/cluster/history"
        value = "/usr/es/sbin/cluster/history"
        rfs = ""
HACMPlogs:
        name = "cspoc.log"
        description = "Generated by CSPOC commands"
        defaultdir = "/tmp"
        value = "/tmp"
        rfs = ""
. . .
Figure 3-6. Where are the Log Files?
Notes:
Finding the log files
HACMP has many log files in many different directories. In addition, users can change
the location of one or more of the HACMP log files.
Fortunately, you can use odmget, as shown in the visual, to display the location of the
HACMP log files.
The RSCT log files cannot be relocated and will always be found in /var/ha/log.
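
If you save the odmget HACMPlogs output to a file, a short script can pick out the logs whose current location (value) differs from the default (defaultdir). The following is a sketch under stated assumptions: it only handles the simple name = "value" stanza layout shown in the visual, the function names are hypothetical, and the relocated cspoc.log entry in the sample is invented for illustration.

```python
def parse_odm_stanzas(text):
    """Parse odmget-style stanza output into a list of dictionaries."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A line such as "HACMPlogs:" starts a new stanza.
            current = {}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return stanzas


def relocated_logs(stanzas):
    """Map log name to current directory for logs moved from the default."""
    return {
        s["name"]: s["value"]
        for s in stanzas
        if s.get("value") != s.get("defaultdir")
    }


# Sample input; the second stanza's value is a made-up relocation.
sample = '''
HACMPlogs:
        name = "cluster.log"
        description = "Generated by cluster scripts and daemons"
        defaultdir = "/usr/es/adm"
        value = "/usr/es/adm"
        rfs = ""
HACMPlogs:
        name = "cspoc.log"
        description = "Generated by CSPOC commands"
        defaultdir = "/tmp"
        value = "/var/hacmp/log"
        rfs = ""
'''

print(relocated_logs(parse_odm_stanzas(sample)))
```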

Let's Review: Topic 1

1. What's the fastest way to locate the cluster.log file?
   a. Consult the HACMP Troubleshooting Guide
   b. odmget HACMPlogs
   c. find / -name cluster.log -print
   d. Open a service call
2. True or False?
   cldump does not require clinfoES.
3. True or False?
   clstat does not require clinfoES.
Figure 3-7. Let's Review: Topic 1
3.2 Topology and Resource Group Management

Topic 2: Topology and Resource Group Management

Add a resource group and resources to an existing cluster
Remove a resource group from a cluster
Add a new node to an existing cluster
Remove a node from an existing cluster
Configure a non-IP heartbeat network
Figure 3-8. Topic 2: Topology and Resource Group Management
Yet Another Resource Group

The users have asked that a third application be added to the cluster
The application uses very little CPU or memory and there's money in
the budget for more disk drives in the disk enclosure
Minimizing downtime is particularly important for this application
The resource group is called zwebgroup
usa
uk
X
Figure 3-9. Yet Another Resource Group
Notes
Introduction
We're now going to embark on a series of hypothetical scenarios to illustrate a number
of routine cluster administration tasks. Some of these scenarios are more realistic than
others.
Add a resource group

In this first scenario, we're going to add a resource group to the cluster. This new
resource group is called zwebgroup. This resource group's application has been
reported to use very few system resources, and there is a strong desire to
avoid unnecessary zwebgroup outages.

Adding a Third Resource Group

We'll change the startup policy to "Online On First Available Node" so that
the resource group comes up if uk is started when usa is down.
Add a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
* Resource Group Name                                 [zwebgroup]
* Participating Node Names (Default Node Priority)    [usa uk]                +
  Startup Policy                                      Online On First Avail>  +
  Fallover Policy                                     Fallover To Next Prio>  +
  Fallback Policy                                     Never Fallback          +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Startup Policy: avoid startup delay by starting on first available node
Fallback Policy: avoid fallback outage by never falling back
Figure 3-10. Adding a Third Resource Group
Notes
Add a resource group
We use the Extended Configuration path.
The resource group will be configured to start up on whichever node is available first
and to never fallback when a node rejoins the cluster. The combination of these two
parameters should go a long way towards minimizing this resource group's downtime.
If you're familiar with the older terminology of cascading and rotating resource groups,
this resource group's policies make it essentially identical to a cascading without
fallback resource group.

Adding a Third Service IP Label

The extended configuration path screen for adding a service IP label
provides more options. We choose those which mimic the standard path.
Configure HACMP Service IP Labels/Addresses

Move cursor to desired item and press Enter.

  Add a Service IP Label/Address
  Change/Show a Service IP Label/Address
  Remove Service IP Label(s)/Address(es)

  +--------------------------------------------------------------------------+
       Select a Service IP Label/Address type

       Configurable on Multiple Nodes
       Bound to a Single Node

       F1=Help       F2=Refresh     F3=Cancel
       F8=Image      F10=Exit       Enter=Do
       /=Find        n=Find Next
  +--------------------------------------------------------------------------+
Figure 3-11. Adding a Third Service IP Label
Notes
Introduction
We need to define a service IP label for the zwebgroup resource group.
IPAT via IP aliasing required

Creating a third resource group on a cluster with one network and two nodes requires
the use of IPAT via IP aliasing. A cluster which only uses IPAT via IP replacement is for
all practical purposes restricted to one resource group with a service IP label per node
per IP network. Since our cluster has only one IP network, it would not be able to
support three different resource groups with service IP labels if it used IPAT via
replacement.
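
The arithmetic behind this restriction is simple enough to write down. The sketch below applies the rule of thumb from the paragraph above (one resource group with a service IP label per node per IP network) to our cluster; the function name is hypothetical, and the count of three groups assumes the two existing resource groups plus zwebgroup.

```python
def max_groups_with_service_label(nodes, ip_networks):
    """Practical limit under IPAT via IP replacement, per the rule above."""
    return nodes * ip_networks


needed = 3  # two existing resource groups plus zwebgroup
limit = max_groups_with_service_label(nodes=2, ip_networks=1)
fits = needed <= limit

print(f"limit={limit}, needed={needed}, fits={fits}")
```

With two nodes and one IP network the limit is two, so the third resource group only fits if the network uses IPAT via IP aliasing.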

Resource group limits

HACMP V5.2 and above supports a maximum of 64 resource groups and 256 IP
addresses known to HACMP (for example, service and interface IP addresses). There
are no other limits on the number of resource groups with service labels that can be
configured on an IPAT via IP aliasing network (although, eventually, you run out of CPU
power or memory or something for all the applications associated with these resource
groups).
Service IP label/address type

Bound to a Single Node is used with IBM's General Parallel File System (GPFS).
Network name
The next step is to associate this Service Label with one of the HACMP networks. This
is not shown in the visual.
Alternate HW address
When you configure a service label, you can associate a hardware address with the IP
label and address for hardware address takeover, but only if you are using IPAT via
replacement.

Adding a Third Application Server

The Add Application Server screen is identical in both
configuration paths.
Add Application Server

                                    [Entry Fields]
* Server Name                       [zwebserver]
* Start Script                      [/usr/local/scripts/startzweb]
* Stop Script                       [/usr/local/scripts/stopzweb]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do
Figure 3-12. Adding a Third Application Server
Notes
Add an application server
You must give it a name and specify a start and stop script that you have already tested
on each node that will support the application.

Adding Resources to the Third RG (1 of 2)

The extended path's SMIT screen for updating the contents of a
resource group is MUCH more complicated!
Change/Show All Resources and Attributes for a Resource Group

[TOP]                                                 [Entry Fields]
  Resource Group Name                                 zwebgroup
  Resource Group Management Policy                    custom
  Inter-site Management Policy                        ignore
  Participating Node Names (Default Node Priority)    uk usa

  Startup Behavior                                    Online On First Avail>
  Fallover Behavior                                   Fallover To Next Prio>
  Fallback Behavior                                   Fallback To Higher Pr>
  Fallback Timer Policy (empty is immediate)          []                      +

  Service IP Labels/Addresses                         [zweb]                  +
  Application Servers                                 [zwebserver]            +

  Volume Groups                                       [zwebvg]                +
  Use forced varyon of volume groups, if necessary    false                   +
  Automatically Import Volume Groups                  false                   +
  Filesystems (empty is ALL for VGs specified)        []                      +
  Filesystems Consistency Check                       fsck                    +
[MORE...17]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do
Figure 3-13. Adding Resources to the Third RG (1 of 2)
Notes
Adding resources to a resource group (extended path)
This is the first of two screens to show the Extended Path menu for adding attributes.
Unlike the Standard path, it contains a listing of all the possible attributes.

Adding Resources to the Third RG (2 of 2)

Even more choices!
Fortunately, only a handful tend to be used in any given context.
[MORE...17]                                           [Entry Fields]
  Filesystems Consistency Check                       fsck                    +
  Filesystems Recovery Method                         sequential              +
  Filesystems mounted before IP configured            false                   +
  Filesystems/Directories to Export                   []                      +
  Filesystems/Directories to NFS Mount                []                      +
  Network For NFS Mount                               []                      +

  Tape Resources                                      []                      +
  Raw Disk PVIDs                                      []                      +

  Fast Connect Services                               []                      +
  Communication Links                                 []                      +

  Primary Workload Manager Class                      []                      +
  Secondary Workload Manager Class                    []                      +

  Miscellaneous Data                                  []
[BOTTOM]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do
Figure 3-14. Adding Resources to the Third RG (2 of 2)
Notes
Adding resources to a resource group (extended path)
Unlike the menu you see on the standard path, here you can see all of the options
available for configuring resources and attributes for a resource group. This includes
NFS exports and mounts, which are covered in Appendix B, with an accompanying
exercise in Appendix A in the exercise book.

Synchronize Your Changes

The extended configuration path provides verification and synchronization
options.
HACMP Verification and Synchronization

                                                   [Entry Fields]
* Verify, Synchronize or Both                      [Both]         +
  Force synchronization if verification fails?     [No]           +
* Verify changes only?                             [No]           +
* Logging                                          [Standard]     +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Don't forget to verify that you actually implemented what was
planned by executing your test plan.
Figure 3-15. Synchronize Your Changes
Notes
Extended path synchronization
This extended path screen shows the synchronization menu options, which are
not shown in the standard path. An additional option, Automatically correct
errors found during verification, is available when cluster services are down on all
nodes.

Expanding the Cluster

The users decide to improve the availability of two of the
applications by adding another node to support them
usa
uk
india
Figure 3-16. Expanding the Cluster
Notes
Expanding the cluster
In this scenario, we'll look at adding a node to a cluster.

Adding a New Cluster Node

1. Physically connect the new node
   - Connect to IP networks
   - Connect to the shared storage subsystem
   - Connect to non-IP networks to create a ring encompassing all nodes
2. Configure the shared volume groups on the new node
3. Add the new node's IP labels to /etc/hosts on one existing node
4. Copy /etc/hosts from this node to all other nodes
5. Install AIX, HACMP and application software on the new node:
   - Install patches required to bring the new node up to the same level as the
     existing cluster nodes
   - Reboot the new node (always reboot after installing or patching HACMP)
6. Add the new node to the existing cluster (from one of the existing nodes)
7. Add non-IP networks for the new node
8. Synchronize your changes
9. Start HACMP on the new node
10. Add the new node to the appropriate resource groups
11. Synchronize your changes again
12. Run through your (updated) test plan
Figure 3-17. Adding a New Cluster Node
Notes
Adding a new cluster node
Adding a node to an existing cluster isn't all that difficult from the HACMP perspective
(as we see shortly). The hard work involves integrating the node into the cluster from an
AIX and from an application perspective.
We'll be discussing the HACMP part of this work (starting at step 6 in the visual).

Add Node: Standard Versus Extended Path

Configure Nodes to an HACMP Cluster (standard)

                                                   [Entry Fields]
* Cluster Name                                     [qv125cluster]
  New Nodes (via selected communication paths)     []               +
  Currently Configured Node(s)                     usa uk

F1=Help       F2=Refresh
F5=Reset      F6=Command
F9=Shell      F10=Exit

Add a Node to the HACMP Cluster

                                                   [Entry Fields]
* Node Name                                        [india]
  Communication Path to Node                       [indiaboot1]     +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do
Figure 3-18. Add Node -- Standard Versus Extended Path
Notes
Add node -- standard versus extended path
The extended path is a little different than the standard path in this case.
Standard path
From the standard path, you would select the menu Configure Nodes to an HACMP
Cluster (standard), which allows you to set the cluster name, and add additional
nodes via discovery using their communication paths. When you hit Enter from this
screen in the standard path, the network configuration for the added node would be
discovered automatically and added to the cluster configuration.
Extended path
From the extended path, you can specify the new node name (you type this in, there is
no selection from F4), and you can use F4 to select the boot IP label that you will use
for the communication path to the node (and which you have already added to the
/etc/hosts files on all nodes). Be aware that at this point you've only configured the
node definition. You must also configure the adapter definitions (boot adapter
definitions). To do this you use the extended path (Extended Topology,
Communications Interfaces/Devices). If you run cltopinfo at this point from the
administrative node, you will see the new node, but you won't see any of its interfaces.

7. Define the Non-IP rs232 Networks (1 of 2)

We've added (and tested) a fully wired rs232 null modem cable
between india's tty1 and usa's tty2 so we define that as a non-IP
rs232 network.
Configure HACMP Communication Interfaces/Devices

+-------------------------------------------------------------------------+
  Select Point-to-Point Pair of Discovered Communication Devices to Add

  Move cursor to desired item and press F7. Use arrow keys to scroll.
  ONE OR MORE items can be selected.
  Press Enter AFTER making all selections.

  # Node     Device    Device Path    Pvid
    usa      tty0      /dev/tty0
    uk       tty0      /dev/tty0
    india    tty0      /dev/tty0
    usa      tty1      /dev/tty1
    uk       tty1      /dev/tty1
  > india    tty1      /dev/tty1
  > usa      tty2      /dev/tty2
    uk       tty2      /dev/tty2
    india    tty2      /dev/tty2

  F1=Help       F2=Refresh     F3=Cancel
  F7=Select     F8=Image       F10=Exit
  Enter=Do      /=Find         n=Find Next
+-------------------------------------------------------------------------+
Figure 3-19. 7. Define the Non-IP rs232 Networks (1 of 2)
Notes
Introduction
This visual, and the next one, show how to add two more non-IP networks to our cluster.
Make sure that the topology of the non-IP networks that you describe to HACMP
corresponds to the actual topology of the physical rs232 cables.
In the following notes, we discuss why we need to add two more non-IP RS-232 links.
Note that if you are using heartbeat on disk the same two steps are required. There
must be one shared disk between india and usa and a different shared disk between
india and uk in order to define the two heartbeat on disk networks (one between india
and usa, the other between india and uk). You can't use a single hdisk on one node for
heartbeat on disk networks with two different nodes.

Minimum non-IP network configuration: ring

At minimum, the non-IP networks in a cluster with more than two nodes should form a
ring encompassing all the nodes, that is each node is connected to its two directly
adjacent neighbors. A ring provides redundancy (two non-IP heartbeat paths for every
node) and is simple to implement.
Mesh configuration
The most redundant configuration would be a mesh, each node connected to every
other node. However, if you have more than three nodes, this means extra complexity
and can mean a lot of extra hardware, depending on which type of non-IP network you
are using.
Note: For a three node cluster, a ring and a mesh are the same.
Star configuration not recommended

While the HACMP for AIX Planning and Installation Guide discusses using a star, ring
or mesh configuration for non-IP networks, a star is not a good choice. A star means
that the center node is a SPOF for the non-IP networks; losing the center node means
that all the other nodes lose non-IP network connectivity.
Three-node example
In the example in the visual, we already have a non-IP network between usa and uk so
we need to configure one between india and usa (on this page) and another one
between uk and india (on the next page).
If, for example, we left out the uk and india non-IP network then the loss of the usa
node would leave the uk and india nodes without a non-IP path between them.
Five-node example
In even larger clusters, it is still only necessary to configure a ring of non-IP networks.
For example, if the nodes are A, B, C, D and E then five non-IP networks would be the
minimum requirement: A to B, B to C, C to D, D to E and E to A being one possibility. Of
course, other possibilities exist like A to B, B to D, D to C, C to E and E to A.
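
The ring, mesh and star arguments above can be checked with a few lines of Python. This is a sketch for reasoning about the layouts, not something HACMP provides: all function names are hypothetical, each non-IP network is modelled as an unordered pair of node names, and the ring helper assumes a cluster of more than two nodes.

```python
from itertools import combinations


def ring(nodes):
    """Pair each node with the next one, wrapping around at the end."""
    n = len(nodes)
    return [frozenset((nodes[i], nodes[(i + 1) % n])) for i in range(n)]


def mesh(nodes):
    """Connect every node to every other node."""
    return [frozenset(pair) for pair in combinations(nodes, 2)]


def star(nodes):
    """Connect every node to the first node only."""
    centre = nodes[0]
    return [frozenset((centre, other)) for other in nodes[1:]]


def connected(nodes, links):
    """True if every node can reach every other node over the links."""
    if not nodes:
        return True
    seen = {nodes[0]}
    frontier = [nodes[0]]
    while frontier:
        node = frontier.pop()
        for link in links:
            if node in link:
                for other in link - {node}:
                    if other not in seen:
                        seen.add(other)
                        frontier.append(other)
    return seen == set(nodes)


def survives_any_single_loss(nodes, links):
    """True if the surviving nodes stay connected after any one node fails."""
    for failed in nodes:
        alive = [n for n in nodes if n != failed]
        remaining = [link for link in links if failed not in link]
        if not connected(alive, remaining):
            return False
    return True


three = ["usa", "uk", "india"]
five = ["A", "B", "C", "D", "E"]

print(set(ring(three)) == set(mesh(three)))        # ring equals mesh for three nodes
print(len(ring(five)), len(mesh(five)))            # networks needed: ring versus mesh
print(survives_any_single_loss(five, ring(five)))  # ring tolerates one node loss
print(survives_any_single_loss(five, star(five)))  # star does not
```

Dropping the uk to india link from the three-node ring and rerunning survives_any_single_loss reproduces the problem described in the three-node example.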

Define the Non-IP rs232 Networks (2 of 2)

We've also added (and tested) a fully wired rs232 null-modem cable
between uk's tty2 and india's tty2 so we define that as a non-IP rs232
network.
Configure HACMP Communication Interfaces/Devices

+--------------------------------------------------------------------------+
  Select Point-to-Point Pair of Discovered Communication Devices to Add

  Move cursor to desired item and press F7. Use arrow keys to scroll.
  ONE OR MORE items can be selected.
  Press Enter AFTER making all selections.

  # Node     Device    Device Path    Pvid
    usa      tty0      /dev/tty0
    uk       tty0      /dev/tty0
    india    tty0      /dev/tty0
    usa      tty1      /dev/tty1
    uk       tty1      /dev/tty1
    india    tty1      /dev/tty1
    usa      tty2      /dev/tty2
  > uk       tty2      /dev/tty2
  > india    tty2      /dev/tty2

  F1=Help       F2=Refresh     F3=Cancel
  F7=Select     F8=Image       F10=Exit
  Enter=Do      /=Find         n=Find Next
+--------------------------------------------------------------------------+
Figure 3-20. Define the Non-IP rs232 Networks (2 of 2)
Notes
Define non-IP networks
Make sure that the topology of the non-IP networks that you describe to HACMP
corresponds to the actual topology of the physical rs232 cables.

8-9. Synchronize and Start Cluster Services

Synchronize the cluster on the administrative node, where you added
the new node and non-IP network
When synchronization is successful, the new node is part of the
cluster
Start cluster services on the newly added node
You can do this from the new node, or any other node in the
cluster
Figure 3-21. 8-9. Synchronize and Start Cluster Services
Notes
Synchronize
At this point, all this configuration exists only on the node where the data was entered.
To populate the other nodes' HACMP ODMs, you must synchronize. Once we've
synchronized our changes, the india node is an official member of the cluster.
Start cluster services

You can start cluster services from your administrative node. Now the node is available
to take over if another node fails, or to take some of the application load.

Final Steps: Add the Node to a Resource Group, Synchronize, and Test

Add the node to a resource group
Use the Change/Show a Resource Group menu from the Extended
Resources path
Modify the node list
Repeat for all resource groups that will be supported by the new node
Remember to synchronize the cluster again
Synchronize after any cluster changes!
Test cluster changes using your updated test plan
Figure 3-22. Final Steps: Add the Node to a Resource Group, Synchronize, and Test
Notes
Add the node to a resource group
Remember that adding the new india node to the HACMP configuration is the easy
part. You would not perform any of the SMIT HACMP operations shown so far in this
scenario until you were CERTAIN that the india node was actually capable of running
the application.
Synchronize and test

Although the HACMP configuration work is now done, the task of adding the new india
node to the cluster is not finished until the (updated) cluster test plan has been
executed successfully.

Shrinking the Cluster

The Auditors aren't impressed with the latest investment and
force the removal of the india node from the cluster so that it
can be transferred to a new project
usa
uk
X
india
Figure 3-23. Shrinking the Cluster
Notes
Removing a node
In this scenario, we take a look at how to remove a node from an HACMP cluster.

Removing a Cluster Node

1. Using any cluster node, move resource groups to other nodes
2. Remove the departing node from all resource groups and synchronize your changes
   - Ensure that each resource group is left with at least two nodes
3. Stop HACMP on the departing node
4. Using one of the cluster nodes which is not being removed:
   - Remove the departing node from the cluster's topology using
     Remove a Node from the HACMP Cluster (Extended Configuration)
   - Synchronize
   - Once the synchronization is completed successfully, the departing node is
     no longer a member of the cluster
5. Remove the departed node's IP addresses from
   /usr/es/sbin/cluster/etc/rhosts on the remaining nodes
   - Prevents departed node from interfering with HACMP on remaining nodes
6. Physically disconnect the (correct) rs232 cables
7. Disconnect the departing node from the shared storage subsystem
   - Strongly recommended as it makes it impossible for the departed node to
     screw up the cluster's shared storage
8. Run through your (updated) test plan

Figure 3-24. Removing a Cluster Node
Notes
Removing a node
While removing a node from a cluster is another fairly involved process, some of the
work has little if anything to do with HACMP itself.
Use HACMP to move resource groups to other nodes before taking any other steps.
Next remove the node from membership in any resource groups. Remember that each
resource group must be associated with at least two nodes, so you may have to make
additional changes to your configuration.
After you stop HACMP on the departing node, you must remove it from the cluster
topology from another node. Synchronizing the cluster makes the removal of the node
complete.
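
Before removing a node, it can help to list which resource groups would drop below the two-node minimum mentioned above. The sketch below works on a plain dictionary of node lists that you fill in yourself; the function name groups_left_short and the node lists shown are hypothetical and are not read from the cluster configuration.

```python
def groups_left_short(resource_groups, departing, minimum=2):
    """List resource groups that would have fewer than `minimum` nodes."""
    short = []
    for name, nodes in resource_groups.items():
        remaining = [n for n in nodes if n != departing]
        if len(remaining) < minimum:
            short.append(name)
    return sorted(short)


# Hypothetical node lists for illustration only.
groups = {
    "xwebgroup": ["usa", "uk", "india"],
    "ywebgroup": ["uk", "india"],
    "zwebgroup": ["usa", "uk"],
}

print(groups_left_short(groups, "india"))
```

Any group the function reports needs another node added to its node list (or some other configuration change) before the departing node is removed.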

Removing an Application
The zwebserver application has been causing problems and
a decision has been made to move it out of the cluster
usa
uk
X
Figure 3-25. Removing an Application
Notes
Removing an application
In this scenario, we will remove an application from the control of HACMP. This means
we must remove the resource group that contains the application, and unconfigure the
application's resources.

Removing a Resource Group (1 of 2)

1. Take the resource group offline
2. OPTIONAL: Take a cluster snapshot
3. Using any cluster node and either configuration path:
   - Remove the departing resource group using the
     Remove a Resource Group SMIT screen
   - Remove any service IP labels previously used by the departing resource group
     using the Remove Service IP Labels/Addresses SMIT screen
   - Synchronize your changes
     This will shut down the resource group's applications using the application
     server's stop script and release any resources previously used by the
     resource group
4. Clean out anything that is no longer needed by the cluster:
   - Export any shared volume groups previously used by the application
   - Consider deleting service IP labels from the /etc/hosts file
   - Uninstall the application
5. Run through your (updated) test plan
Figure 3-26. Removing a Resource Group (1 of 2)
Notes
Introduction
The procedure for removing a resource group is actually fairly straightforward.
Cluster snapshot
HACMP supports something called a cluster snapshot. This would be an excellent time
to take a cluster snapshot, just in case we decide to go back to the old configuration.
We will discuss snapshots later in this unit.
Remove unused resources

Do not underestimate the importance of removing unused resources like service IP
labels and volume groups. They will only clutter up the cluster's configuration and, in
the case of shared volume groups, tie up physical resources which could presumably
be better used elsewhere.
A cluster should not have any useless resources or components as anything which
simplifies the cluster tends to improve availability by reducing the likelihood of human
error.

Removing a Resource Group (2 of 2)

HACMP Extended Resource Group Configuration

  Add a Resource Group
  Change/Show a Resource Group
  Change/Show Resources and Attributes for a Resource Group
  Remove a Resource Group
  Show All Resources by Node or Resource Group

  +--------------------------------------------------------------------------+
       Select a Resource Group

       xwebgroup
       ywebgroup
       zwebgroup

       F1=Help       F2=Refresh     F3=Cancel
       F8=Image      F10=Exit       Enter=Do
       /=Find        n=Find Next
  +--------------------------------------------------------------------------+

Synchronize the changes and run through the test plan.

Figure 3-27. Removing a Resource Group (2 of 2)
Notes
Removing a resource group
Make sure that you delete the correct resource group
Are you sure?

Pause to make sure you know what you are doing. If you aren't sure, it's easy to go
back and step through the process again.


1. True or False?
   Creating a third resource group on a cluster that has only one IP
   network with two interfaces on each node requires using IPAT via
   aliasing.
2. True or False?
   It is NOT possible to add a node while HACMP is running.
3. You've decided to add a third node to your existing two-node HACMP
   cluster. What very important step, which will help prevent a partitioned
   cluster, follows adding the node definition to the cluster configuration?
   a. Install HACMP software
   b. Configure a non-IP network
   c. Start Cluster Services on the new node
   d. Add a resource group for the new node
4. What should you do first when removing a node from a cluster?
   a. Uninstall HACMP software
   b. Move (or take offline) any resource groups online on the node
   c. Remove the node's IP address from the rhosts file
3.3 Cluster Single Point of Control


Discuss the need for change management when using
HACMP
Describe the benefits and capabilities of C-SPOC
Perform routine administrative changes using C-SPOC
Figure 3-29. Topic 3: Cluster Single Point of Control
Administering a High Availability Cluster

Administering a HA cluster is different from administering a
stand-alone server:
- Changes made to one node need to be reflected on the other node
- Poorly considered changes can have far reaching implications
- Beware the law of unintended consequences
- Aspects of the cluster's configuration could be quite subtle and yet
  critical
- Scheduling downtime to install and test changes can be challenging
Figure 3-30. Administering a High Availability Cluster
Notes
Introduction
You must develop good change management procedures for managing an HACMP
cluster. As you will see, C-SPOC utilities can help, but they do not do the job by
themselves. Having well documented and tested procedures to follow, and
restricting who can make changes (for example, no more than two or three people
should have root privileges), minimizes loss of availability when making changes.
Take a snapshot with the snapshot utility before making any change.

Recommendations
- Implement and adhere to a change control/management process
- Wherever possible, use HACMP's C-SPOC facility to make
  changes to the cluster (details to follow)
- Document routine operational procedures in a step-by-step
  list fashion (for example, shutdown, startup, increasing size
  of a filesystem)
- Restrict access to the root password to trained High
  Availability cluster administrators
- Always take a snapshot of your existing configuration before
  making a change
Figure 3-31. Recommendations
Notes
Some beginning recommendations
These recommendations should probably be considered to be the minimum acceptable
level of cluster administration. There are additional measures and issues which should
probably be carefully considered (for example, problem escalation procedures should
be documented, and both hardware and software support contracts should either be
kept current or a procedure developed for authorizing the purchase of time and
materials support during off hours should an emergency arise).
Importance of change management

A real change control or management process requires a serious commitment on the
part of the entire organization:
- Every change must be carefully considered
  - As the cluster administrator you should make yourself part of every change
    meeting that occurs on your HACMP systems
  - Think about the implications of the change on the cluster configuration and
    function, keeping in mind the networking concepts we've discussed as well as
    any changes to the application's data organization or start/stop procedures
- The onus should be on the requester of the change to demonstrate that it is
  necessary
  - Not on the cluster administrators to demonstrate that it is unwise
- Management must support the process
  - Defend cluster administrators against unreasonable requests or pressure
  - Not allow politics to affect a change's priority or schedule
- Every change, even the minor ones, must follow the process
  - No system/cluster/database administrator can be allowed to sneak
    changes past the process
  - The notion that a change might be permitted without following the process must
    be considered to be absurd
Other recommendations
Ensure that you request sufficient time during the maintenance window for testing the
cluster. If this isn't possible, advise all parties of the risks of running without testing.
If anything changes, update the documentation as soon as possible after the change
to reflect the new configuration or function of the cluster.

Cluster Single Point of Control (C-SPOC)

- C-SPOC provides facilities for performing common cluster
  wide administration tasks from any node within the cluster
- Relies on the clcomdES socket based subsystem for secure
  node-to-node communications
- C-SPOC operations may fail if any target node is down at the
  time of execution or selected resource is not available
- Any change to a shared VGDA is synchronized automatically if
  C-SPOC is used to change a shared LVM component
- C-SPOC uses a script parser called the Command Execution
  Language
[Diagram: one initiating node and four target nodes]
Figure 3-32. Cluster Single Point of Control (C-SPOC)
Notes
C-SPOC command execution
C-SPOC commands first execute on the initiating node. Then the HACMP command
cl_rsh is used to propagate the command (or a similar command) to the target nodes.
Secure distributed communications between the nodes

The clcomdES subsystem (the clcomd daemon) provides secure communication
between cluster nodes for all cluster utilities, such as verification and
synchronization and system management (C-SPOC). The daemon is started
automatically at boot time by the init process.
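Because every C-SPOC operation depends on clcomdES, it can be worth confirming that the subsystem is active on each node before making a change. The following is a minimal sketch, not part of the HACMP software: the function name subsys_status is hypothetical, and the parsing assumes the usual lssrc -s layout in which Status is the last column.

```shell
# subsys_status NAME
# Read `lssrc -s NAME` style output on stdin and print the Status column
# (the last field) of the row for the named subsystem.
# Assumption: the subsystem name is the first field of its row.
subsys_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}
```

On a cluster node this could be used as: lssrc -s clcomdES | subsys_status clcomdES, expecting the word active.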
More details
All the nodes in the resource group must be available or the C-SPOC command may be
performed partially across the cluster, only on the active nodes. This can lead to
problems later when nodes are brought up and are out of sync with the other nodes in
the cluster.
As you saw in the LVM unit, LVM changes, if made through C-SPOC, may be
synchronized automatically (for enhanced concurrent mode volume groups, but only for
the LV information, not the filesystem information).
C-SPOC capabilities
You can use C-SPOC to do most cluster tasks, including managing users and security,
managing resources and resource group configurations, managing cluster services,
and managing physical and logical volume changes (including changes to volume
groups, logical volumes, and filesystems). You can use C-SPOC to add a user to the
cluster, synchronize passwords, add a physical volume, shared volume group, logical
volume, or filesystem to the cluster, or make changes to filesystems and logical
volumes.
Using C-SPOC will decrease the likelihood that you will make an error performing
cluster tasks, but is not a replacement for a good change management plan.
C-SPOC command line

C-SPOC commands can be executed from the command line (or through SMIT, of
course).
Error messages and warnings returned by the commands are based on the underlying
AIX-related commands.
Appendix C: HACMP for AIX Commands in the HACMP for AIX Administration Guide
provides a list of all C-SPOC commands provided with the HACMP for AIX software.
Command Execution Language (CEL)

C-SPOC commands are written as execution plans in CEL. Each plan contains
constructs to handle one or more underlying AIX tasks (a command, executable, or
script) with a minimum of user input.
An execution plan becomes a C-SPOC command when the
/usr/es/sbin/cluster/utilities/celpp utility converts it into a cluster aware ksh
script, meaning the script uses the C-SPOC distributed mechanism (the C-SPOC
Execution Engine) to execute the underlying AIX commands on cluster nodes to
complete the defined tasks.
CEL is a programming language that lets you integrate dsh's distributed functionality
into each C-SPOC script the CEL preprocessor (celpp) generates. When you invoke a
C-SPOC script from a single cluster node to perform an administrative task, the script is
automatically executed on all nodes in the cluster. The language is described further in
Appendix B of the HACMP for AIX Troubleshooting Guide Version 5.4.

The Top-Level C-SPOC Menu

System Management (C-SPOC)
Manage HACMP Services
HACMP Communication Interface Management
HACMP Resource Group and Application Management
HACMP Log Viewing and Management
HACMP File Collection Management
HACMP Security and Users Management
HACMP Logical Volume Management
HACMP Concurrent Logical Volume Management
HACMP Physical Volume Management
Open a SMIT Session on a Node
F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do
Figure 3-33. The Top-Level C-SPOC Menu
Notes
Top-level C-SPOC menu
The top-level C-SPOC menu is one of the four top-level HACMP menus.
C-SPOC scripts are used for users, LVM, concurrent LVM, and physical volume
management.
clRGmove is used for resource group management.
The other functions are included here as a logical place to put these system
management facilities. We will look at Managing Cluster Services and the Logical
Volume Management tasks.
The fast path is smitty cl_admin.

Starting Cluster Services

# smit clstart

                         Start Cluster Services

                                                     [Entry Fields]
* Start now, on system restart or both                now             +
  Start Cluster Services on these nodes               [usa,uk]        +
* Manage Resource Groups                              Automatically   +
  BROADCAST message at startup?                       true            +
  Startup Cluster Information Daemon?                 true            +
  Ignore verification errors?                         false           +
  Automatically correct errors found during           Interactively   +
  cluster start?

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do
Figure 3-34. Starting Cluster Services
Notes
Briefly, how did we get here?
The first choice in the C-SPOC menu is Manage HACMP Services. This option brings up
another menu containing three choices: Start Cluster Services, Stop Cluster
Services and Show Cluster Services. The screen shown appears when we choose
Start Cluster Services. Better yet, just use the fast path, smitty clstart.
Starting cluster services

We saw this in the previous unit. Now for the details.
You have the option to start cluster services now, at system boot time, or both. Choosing
to start cluster services at boot time adds an entry to /etc/inittab. Choosing
to start them now invokes cl_rc.cluster. Think carefully about starting
cluster services at system boot time, as this may result in resource group movement,
depending on your fallback policies.

You have a choice of any or all nodes in the cluster to start services. Use F4 to get a
pick list. If the field is left blank, services will be started on all nodes.
When cluster services is started, it acquires resources in resource groups as configured
and makes applications available. Beginning with HACMP V5.4, the function of
managing resource groups can be deferred if you choose Manually for the option
Manage Resource Groups. To allow cluster services to acquire resources and make
applications available if so configured (pre-HACMP v5.4 behavior), choose the default,
Automatically.
You can broadcast a message that cluster services are being started.
You have the option to start the Cluster Information Daemon, clinfo, along with the start of
cluster services. This is usually a good idea as it allows you to use the clstat cluster
monitor utility.
Finally, there are options regarding verification. Before cluster services is started, a
verification is run to ensure that you are not starting a node with an inconsistent
configuration. You can choose to ignore verification errors and start anyway. This is not
something that you would do unless you are very aware of the reason for the
verification error, you understand the ramifications of starting with the error and you
must activate cluster services. An alternative that is safer would be to choose to
Interactively correct errors found during verification. Not all errors can be corrected, but
you have a better chance of getting cluster services activated in a clean configuration
with this option.
The options that you choose here are retained in the HACMP ODM and repopulated on
reentry.

Verifying Cluster Services Have Started

Have patience! - it can take a few minutes
You have several options: clstat, clcheck_server, lssrc -l, cldump
clstat (requires clinfoES)
usa # clstat -a
                clstat - HACMP Cluster Status Monitor
                ------------------------------------
Cluster: ibmcluster (1156578448)          Wed Aug 30 11:16:19 2006
                State: UP                 Nodes: 2
                SubState: STABLE

        Node: usa               State: UP
           Interface: usaboot1 (2)          Address: 192.168.15.29
                                            State:   UP
           Interface: usaboot2 (2)          Address: 192.168.16.29
                                            State:   UP
           Interface: usa_hdisk5_01 (0)     Address: 0.0.0.0
                                            State:   UP
           Interface: xweb (2)              Address: 192.168.5.92
                                            State:   UP
           Resource Group: xwebgroup        State:  On line
clcheck_server (only return code)

usa # clcheck_server grpsvcs;print $?
1
Note: rc=1 means cluster services are active
lssrc -ls clstrmgrES

usa # lssrc -ls clstrmgrES
Current state: ST_STABLE
. . .
cldump (uses SNMP directly)

usa # cldump
. . .
Cluster State: UP
Cluster Substate: STABLE
. . .
Figure 3-35. Verifying Cluster Services Have Started
Notes
Remember patience
Patience is key with HACMP tasks. There are many things going on under the covers
when you ask the Cluster Manager to do something. Getting the OK in SMIT does
NOT mean that the task has been completely performed. It's just the beginning in many
cases.
Did I mention patience?
The Cluster Manager daemon queues events. It doesn't forget (usually anyway). So
keep in mind that if you launch a task with the Cluster Manager, don't verify its
status closely, and then attempt to give the process a boost by launching another task
(like following a resource group move with an offline), you have just queued the second
task. Once the Cluster Manager completes the first task, provided it's in a state where it
can continue processing, it will perform the second task. This might not be what you
wanted.

What to look at, what to look for

Documentation for HACMP V5.3 indicated that the clcheck_server utility was to be
used given that the Cluster Manager daemon was a long running process. This method
still works. Run it with grpsvcs as the only parameter and then look at the return code.
A return code of 1 indicates that the Cluster Manager is a member of a group services
group, which implies that cluster services are active.
Although you may find the output to be unreliable at times, the clstat utility is a good
mechanism to use. If you're not a fan of clstat, consider using cldump, which relies on
SNMP directly.
Another option is to use lssrc. This is to be used with caution. You must understand
what state is expected and then be patient, retrying the command to ensure that the
state changes are no longer occurring. A state of ST_STABLE is a tricky indication. It may
mean that cluster services are active or it may mean that cluster services was forced
down on this node. Pay close attention to the Forced down node list: portion of the
output of lssrc -ls clstrmgrES. Know what state to expect.
Finally, although not shown (due to lack of space on the visual), another option is to use
WebSMIT. This is the solution for those of you who want to see a graphical
representation of cluster status. You can learn more about WebSMIT in Appendix C.
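The lssrc checks described above can be scripted. The following is a minimal sketch under stated assumptions, not an HACMP utility: the function names are hypothetical, and the parsing assumes that lssrc -ls clstrmgrES prints the Current state: and Forced down node list: lines as they appear on the visuals.

```shell
# cm_state: read `lssrc -ls clstrmgrES` output on stdin and print the value
# of the "Current state:" line (for example ST_STABLE or ST_INIT).
cm_state() {
    sed -n 's/^[[:space:]]*Current state:[[:space:]]*//p' | head -n 1
}

# forced_down_nodes: print the nodes named on the "Forced down node list:"
# line, or nothing if the line is absent or empty.
forced_down_nodes() {
    sed -n 's/^[[:space:]]*Forced down nodes* list:[[:space:]]*//p' | head -n 1
}

# state_is_settled FIRST SECOND: succeed only if two samples, taken some
# time apart, report the same non-empty state.
state_is_settled() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

A possible use is to capture lssrc -ls clstrmgrES | cm_state twice, a minute or so apart, and pass both values to state_is_settled. Even when the state has settled at ST_STABLE, check forced_down_nodes before concluding that cluster services are active on the node.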

Stopping Cluster Services

# smit clstop

                         Stop Cluster Services

                                                     [Entry Fields]
* Stop now, on system restart or both                 now                      +
  Stop Cluster Services on these nodes                [usa]                    +
  BROADCAST cluster shutdown?                         true                     +
* Select an Action on Resource Groups                 Bring Resource Groups>   +

+--------------------------------------------------------------------------+
                             Shutdown mode

  Bring Resource Groups Offline
  Move Resource Groups
  Unmanage Resource Groups

  F1=Help       F2=Refresh     F3=Cancel
  F8=Image      F10=Exit       Enter=Do
  /=Find        n=Find Next
+--------------------------------------------------------------------------+
Figure 3-36. Stopping Cluster Services
Notes
Briefly, how did we get here?
From the Manage HACMP Services C-SPOC menu. This menu appears when we choose
Stop Cluster Services. You can use the fast path, smitty clstop.
Stopping cluster services

Remember that this is not stopping the Cluster Manager daemon. It runs all the time.
Actually, when you stop cluster services, the Cluster Manager daemon dies gracefully
and is respawned by the System Resource Controller.
You have the option to stop cluster services when you run through this menu, remove
the option to start cluster services at system start (removes entry from /etc/inittab), or
both. Note that the system start option is a reversal of the setting made for system start
when starting cluster services.

You have a choice of any or all nodes in the cluster to stop services. Use F4 to get a
pick list. If the field is left blank, services will be stopped on all nodes.
You can broadcast a message that cluster services are being stopped.
Finally, the options regarding resource group management. Prior to HACMP V5.4 the
options were graceful, takeover and forced. Graceful meant to bring resource groups
offline prior to stopping cluster services. Takeover meant to move resource groups to
other available nodes, if applicable, according to the current locations and fallover
policies of the resource groups. As you can see, these options map directly to the
current options and their functions are self-explanatory.
But what about forced down, you say? Prior to HACMP V5.4, forcing down cluster
services was supported only in some scenarios and resulted in an environment
that was potentially unstable (that is, potentially unavailable). Forcing cluster services
down when using enhanced concurrent mode volume groups was not supported
because Group Services and gsclvmd were brought down as part of the forced down
operation. Group Services and gsclvmd are the components that maintain the volume
group's VGDA/VGSA integrity across all nodes. With HACMP V5.4 and later, forcing
down cluster services is supported by moving the resource groups to an unmanaged
state. In addition, the Cluster Manager and the RSCT infrastructure remain active,
permitting this action with enhanced concurrent mode volume groups. Thus the option
in the menu above, Unmanage Resource Groups. While in this state, the Cluster
Manager remains in the ST_STABLE state. It doesn't die gracefully and respawn as
stated earlier, and it doesn't return to the ST_INIT state. This allows the Cluster Manager
to participate in cluster activities and keep track of changes that occur in the cluster.
As with starting cluster services, the options that you choose here are retained in the
HACMP ODM and repopulated on reentry.

Verifying Cluster Services Have Stopped:
Stopping Without Unmanaged Resource Groups

usa # tail -2 /tmp/hacmp.out
clexit.rc : Normal termination of clstrmgrES. Restart now.
0513-059 The clstrmgrES Subsystem has been started. Subsystem PID is 483466.

lssrc -ls clstrmgrES
Current state: ST_INIT
. . .

uk # clstat -a
                ------------------------------------
Cluster: ibmcluster (1156578448)          Wed Aug 30 10:44:20 2006
                State: UP                 Nodes: 2
                SubState: STABLE

        Node: usa               State: DOWN
           Interface: usaboot1 (2)          Address: 192.168.15.29
                                            State:   DOWN
           Interface: usaboot2 (2)          Address: 192.168.16.29
                                            State:   DOWN

Have patience! It can take a few minutes.

usa # tail -1 /tmp/clstrmgr.debug.1
Wed Aug 30 10:31:54 code is 0 - exhale our dying breath and count on the good graces of SRC to reincarnate us!
(Note: When the clstrmgr restarts, clstrmgr.debug is renamed to clstrmgr.debug.1
and a new clstrmgr.debug is created)
Figure 3-37. Verifying Cluster Services Have Stopped: Stopping Without Unmanaged Resource Groups
Notes
Stop of cluster services without going to unmanaged
This means you've chosen to stop cluster services with either the Bring Resource
Groups Offline or the Move Resource Groups option. In other words, it's not a forced
down.
As with starting cluster services, remember that patience is key.

As stated above, stopping cluster services results in the Cluster Manager daemon
being respawned by the System Resource Controller. The surest way to verify that
cluster services has stopped completely is the following message in /tmp/hacmp.out,
indicating that cluster services has stopped and the Cluster Manager Daemon has been
respawned:
clexit.rc: Normal termination of clstrmgrES. Restart now.

0513-059 The clstrmgrES Subsystem has been started. Subsystem PID is nnnnnn.
Although you may find the output to be unreliable at times, the clstat utility is a good
mechanism to use. Note that it was run on another system, not the one where cluster
services was stopped. If you're not a fan of clstat, consider using cldump, which relies
on SNMP directly.
Another option is to use lssrc. This is to be used with caution. You must understand
what state is expected and then be patient, retrying the command to ensure that the
state changes are no longer occurring. A state of ST_INIT is the indication that cluster
services has stopped on this node. This is the resulting state from a respawn of the
Cluster Manager daemon. As you will see in the next visual, stopping cluster services
with unmanaged resource groups leaves the Cluster Manager daemon in ST_STABLE.
Know what state to expect.
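The log check described above can be wrapped in a small function. This is a sketch only: the function name clean_stop_logged is hypothetical, and it assumes the two messages are the last two lines of the log, as on the visual.

```shell
# clean_stop_logged FILE
# Succeed if the last two lines of FILE show the Cluster Manager
# terminating normally and then being restarted by the SRC.
clean_stop_logged() {
    tail -n 2 "$1" | grep -q 'Normal termination of clstrmgrES' || return 1
    tail -n 2 "$1" | grep -q '0513-059 The clstrmgrES Subsystem has been started'
}
```

On the node where cluster services was stopped, clean_stop_logged /tmp/hacmp.out returns success once both messages have been written. If it fails, be patient and try again before assuming that something went wrong.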

Verifying Cluster Services Have Stopped:
Stopping With Unmanaged Resource Groups

usa # clRGinfo
-------------------------------------
Group Name     Group State     Node
-------------------------------------
xwebgroup      UNMANAGED       usa
               UNMANAGED       uk

lssrc -ls clstrmgrES
Current state: ST_STABLE
Forced down node list: usa

uk # clstat -a
                -------------------------------------
Cluster: ibmcluster (1156578448)          Wed Aug 30 11:16:19 2006
                State: UP                 Nodes: 2
                SubState: STABLE

        Node: usa               State: UP
           Interface: usaboot1 (2)          Address: 192.168.15.29
                                            State:   UP
           Interface: xweb (2)              Address: 192.168.5.92
                                            State:   UP
           Resource Group: xwebgroup        State:  Unmanaged

Have patience! It can take a few minutes.
Figure 3-38. Verifying Cluster Services Have Stopped:
Stopping With Unmanaged Resource Groups
Notes
Stop of cluster services with unmanaged resource groups
This means you've chosen to force down cluster services.
One more time, remember that patience is key. Did I mention that getting the OK in
SMIT does NOT mean that the task has been completely performed?

In the case of unmanaged resource groups, stopping cluster services does NOT result
in the Cluster Manager daemon dying gracefully and being respawned by the System
Resource Controller. The Cluster Manager daemon stays up and should remain in the
ST_STABLE state. Using lssrc -ls clstrmgrES is still useful for determining
which nodes have been forced down, as it provides a list as shown on the visual.
Again, the clstat utility can be a good mechanism to use. Note that it was run on
another system, not the one where cluster services was stopped. Notice that it shows
the resource group state as Unmanaged and the service IP label is available. You only
stopped cluster services, not the resources.
The quickest way to see that there are unmanaged resources is to use clRGinfo. Note
that it shows the state of the resource group as unmanaged on both nodes. In fact, it
will show unmanaged on any node where that resource group can be acquired, as long as
this isn't a concurrent resource group. If the startup policy is Online on All Available
Nodes, it will show unmanaged only on the node where cluster services was stopped.
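To pick the unmanaged entries out of a longer clRGinfo listing, the output can be filtered. The following is a sketch that assumes the three-column layout shown on the visual (group name, group state, node, with the group name left blank on continuation rows); the function name unmanaged_rows is hypothetical and the real output may differ between releases.

```shell
# unmanaged_rows: read clRGinfo output on stdin and print "group node"
# for every row whose state column is UNMANAGED.
unmanaged_rows() {
    awk '
        /^-/ || $1 == "Group" { next }     # skip rule lines and the header row
        $0 ~ /^[^ \t]/ { group = $1 }      # a row starting in column 1 names a group
        NF >= 2 && $(NF-1) == "UNMANAGED" { print group, $NF }
    '
}
```

For example, clRGinfo | unmanaged_rows would list each resource group and node still in the unmanaged state after a forced down.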
How do I get a resource group out of the unmanaged state?

Change the resource group to the offline state in order to move it to another node. This
clearly involves application downtime. Or, restart cluster services on the forced node,
specifying Automatically for the Manage Resource Groups option. Understand that
this will cause the application server start script to be run again, unless an application
monitor is configured for the application that indicates the application is currently
running. In the case where the application monitor detects the running application, the
application server start script is not invoked. A similar option is to start cluster services
on the forced node, but specify Manually for the Manage Resource Groups option.
Then use C-SPOC to bring the resource group online at your discretion. The same
warning applies about a respawn of the application server start script in this scenario.

LVM Change Management

Historically, lack of LVM change management has been a
major cause of cluster failure during fallover. There are
several methods available to ensure LVM changes are
correctly synced across the cluster.
Manual updates to each node to synchronize the ODM
records
Lazy update
C-SPOC synchronization of ODM records
RSCT for enhanced concurrent volume groups
C-SPOC LVM operations - cluster enabled equivalents of the
standard SMIT LVM functions
VGDA = ODM
Figure 3-39. LVM Change Management
Notes
The importance of LVM change management
LVM change management is critical for successful takeover in the event of a node
failure.
Information regarding LVM constructs is held in a number of different locations:
- physical disks: VGDA, LVCB
- AIX files: primarily the ODM, but also /usr/sbin/cluster/etc/vg, files in the /dev
directory and /etc/filesystems
- physical RAM: kernel memory space
This information must be kept in sync on all nodes which may access the shared
volume group(s) in order for takeover to work.

How to keep LVM synchronized across the cluster

There are a number of ways to ensure this information is kept in sync:
a. Manual update
b. Lazy Update
c. C-SPOC VG synchronization utility
d. C-SPOC LVM operations
e. RSCT (for enhanced concurrent mode volume groups)

LVM Changes, Manual

To perform manual changes the volume group must be
active on one of the nodes
1. Make necessary changes to the volume group or filesystem
2. Unmount filesystems and varyoff the vg (or stop cluster services)

# mklv -y'db10lv' -t'jfs2' sharedvg 10
# crfs -v jfs2 -d'db10lv' -m'/db10'
# unmount /sharedfs
# varyoffvg sharedvg

On all the other nodes that share the volume group
1. Export the volume group from the ODM
2. Import the information from the VGDA
3. Change the auto vary on flag (if necessary)
4. Correct the permissions and ownerships on the logical volumes as
   required
5. Repeat to all other nodes

# exportvg sharedvg
# importvg -V123 -y sharedvg hdisk3
# chvg -an sharedvg
# varyoffvg sharedvg
Figure 3-40. LVM Changes, Manual
Notes
Making manual changes to the LVM
After making a change to an LVM component such as creating a new logical volume
and file system as shown in the figure, you must propagate the change to the other
nodes in the cluster which are sharing the volume group using the steps above. Make
sure that the auto activate is turned off (chvg -an sharedvg) after the importvg
command is executed since the Cluster Manager will control the use of the varyonvg
command on the node where the volume group should be varied on.
Other than the sheer complexity of this procedure, the real problem with it is that it
requires that the resource group be down while the procedure is being carried out.
Fortunately, there are better ways...
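The per-node steps above can be collected into one function so that they are always run in the same order. This is a sketch under stated assumptions: the names run, refresh_vg_definition and DRY_RUN are hypothetical, the volume group is assumed to be varied off everywhere, and the step that corrects logical volume permissions and ownership is left out because it depends on the application. By default the function only prints the commands.

```shell
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# refresh_vg_definition VG MAJOR DISK
# Rebuild this node's ODM definition of a shared volume group from the VGDA.
refresh_vg_definition() {
    vg=$1 major=$2 disk=$3
    run exportvg "$vg"                          # drop the stale ODM definition
    run importvg -V "$major" -y "$vg" "$disk"   # rebuild it from the VGDA
    run chvg -an "$vg"                          # the Cluster Manager, not AIX, varies the VG on
    run varyoffvg "$vg"                         # leave the VG offline on this node
}
```

Running refresh_vg_definition sharedvg 123 hdisk3 in the default dry-run mode prints the four commands from the visual, which can be reviewed before setting DRY_RUN=0 on each of the other nodes in turn.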

LVM Changes, Lazy Update

At fallover time, lazy update compares the time stamp
value in the VGDA with one stored in the ODM. If the time
stamps are the same, then the varyonvg proceeds.
If the timestamps do not agree, then HACMP does the
export/import cycle similar to a manual update.
Note: HACMP does change the VG auto vary on flag AND it
preserves permissions and ownership of the logical volumes.
Figure 3-41. LVM Changes, Lazy Update
Notes
The lazy administrator's solution
HACMP has a facility called Lazy Update that it uses to attempt to synchronize LVM
changes during a fallover.
HACMP uses a copy of the timestamp kept in the ODM and a timestamp from the
volume group's VGDA. AIX updates the ODM timestamp whenever the LVM component
is modified on that system. When a cluster node attempts to vary on the volume group,
HACMP for AIX compares the timestamp from the ODM with the timestamp in the
VGDA on the disk (use /usr/es/sbin/cluster/utilities/clvgdata hdiskn to find
the VGDA timestamp for a volume group). If the values are different, HACMP exports
and re-imports the volume group before activating it.
This method requires no downtime, although it does increase the fallover time minimally
for the first fallover after the LVM change was made. Realize, though, that this isn't the
best solution and will not fix every situation where nodes are out-of-sync.
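The decision that lazy update makes can be expressed in a few lines. This sketch only illustrates the comparison described above and is not HACMP's implementation; the function name lazy_update_action is hypothetical, and the two timestamps are assumed to have been read already (one from the ODM, one from the VGDA).

```shell
# lazy_update_action ODM_TIMESTAMP VGDA_TIMESTAMP
# Print the sequence of operations the comparison calls for.
lazy_update_action() {
    if [ "$1" = "$2" ]; then
        echo "varyonvg"                      # definitions agree: just vary on
    else
        echo "exportvg importvg varyonvg"    # refresh the ODM from the VGDA first
    fi
}
```

The second branch is why the first fallover after an LVM change takes slightly longer: the export/import cycle runs before the volume group is activated.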

LVM Changes, C-SPOC Synchronization

Manually make your change to the LVM on one node
Use C-SPOC to propagate the changes to all nodes in the
resource group
Filesystem updates (imfs) are not performed using this function if the
volume group is an enhanced concurrent mode volume group
smitty hacmp --> System Management (C-SPOC) --> HACMP Logical
Volume Management --> Synchronize a Shared Volume Group
Definition
[Diagram: update vg constructs; use C-SPOC syncvg; C-SPOC updates ODM and the time stamp file]
Figure 3-42. LVM Changes, C-SPOC Synchronization
Notes
Using C-SPOC to synchronize manual LVM changes
In this method, you manually make your change to the LVM on one node and then
invoke C-SPOC to propagate the change. Most likely the reason you are using this
C-SPOC task is because someone who is unfamiliar with cluster node management
made a change to a shared LVM component without using C-SPOC, creating an
out-of-sync condition between a node in the cluster and the rest of the nodes. This task
allows you to use C-SPOC to clean-up after-the-fact.
Note: If you are using an enhanced concurrent mode volume group and a filesystem has
been added to an existing logical volume without using C-SPOC, the imfs is not done,
so this function is ineffective in that case. For this reason (among many others), you are
strongly encouraged to use C-SPOC to perform the LVM add/remove/update and not to
use this mechanism to synchronize after the fact.

Enhanced Concurrent Mode Volume Groups
Another synchronization method is the use of ECMVGs
(Enhanced Concurrent Mode Volume Groups)
RSCT updates LVM information automatically for
ECMVGs
Happens immediately on all nodes running cluster services
Nodes that are not running cluster services will be updated when
cluster services are started
Limitations
Incomplete
/etc/filesystems not updated
Incompatible
Must be careful using ECMVGs if any product that is running on the
system places SCSI reserves on the disks as part of its function
Figure 3-43. Enhanced Concurrent Mode Volume Groups
Notes
RSCT as LVM change management
With enhanced concurrent mode (ECM) volume groups, RSCT will automatically
update the ODM on all the nodes which share the volume group when an LVM change
occurs on one node.
However, since it is limited to only ECM volume groups and since /etc/filesystems is
not updated, it's better to explicitly use C-SPOC to make LVM changes.

Managing Shared LVM Components with C-SPOC

HACMP Logical Volume Management

  Shared Volume Groups
  Shared Logical Volumes
  Shared File Systems
  Synchronize Shared LVM Mirrors
  Synchronize a Shared Volume Group Definition

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

(Callout: Make non-enhanced concurrent mode volume groups. Manage volume groups in
home node or first available resource groups.)

HACMP Concurrent Logical Volume Management

  Concurrent Volume Groups
  Concurrent Logical Volumes
  Synchronize Concurrent LVM Mirrors

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

(Callout: Make enhanced concurrent mode volume groups. Manage online on all nodes
volume groups.)
Figure 3-44. Managing Shared LVM Components with C-SPOC
Notes
Introduction
This is the menu for using C-SPOC to perform LVM change management and
synchronization. As was mentioned in the LVM unit, you can make changes in AIX
directly and then synchronize OR, you can make the changes utilizing C-SPOC utilities
where the synchronization is automatic.
C-SPOC simplifies the process

Once you've configured the cluster's topology and added a resource group, you can
configure your shared disks using this part of the C-SPOC hierarchy (available directly
from the top level C-SPOC SMIT menu). You will generally find that shared disk
configuration and maintenance is considerably easier and less prone to errors if you
use C-SPOC for this work.

How it works
Once you create a shared volume group, you must rerun the discovery mechanism
(refer to top-level menu in the Enhanced Configuration path) to get HACMP to know
about the volume group. You must then add the volume group to a resource group
before you can use C-SPOC to add shared logical volumes or filesystems.
Synchronization
Note that you only need to add the volume group to a resource group using SMIT from
one of the cluster nodes, and then you can start working with C-SPOC from the same
node. You do not need to synchronize the cluster between adding the volume group to a
resource group and working with it using C-SPOC unless you want to use C-SPOC
from some other node. Keep in mind that the volume group is not really a part of the
resource group until you synchronize that change.
Concurrent versus non-concurrent

The C-SPOC menus shown above are the two menus on the main C-SPOC menu for
Logical Volume Management. What's the difference, you ask? The Concurrent
Logical Volume Management menus are used for two things: first, to create enhanced
concurrent mode volume groups and second, most importantly, to manage volume
groups that are in resource groups configured with Online on All Available
Nodes for their startup policy. These are sometimes referred to as concurrent mode
resource groups or, if you've been around HACMP a long time, "Mode 3" resource
groups. You don't see any options for adding filesystems to these volume groups. They
are expected to be used in true concurrent mode across all the nodes in the resource
group (using raw logical volumes). The HACMP Logical Volume Management menus
are for managing volume groups in the serial access resource group types. It is
supported and generally recommended to use enhanced concurrent mode volume
groups for these types of resource groups as well as for concurrent resource groups.

Creating a Shared Volume Group

Create a Concurrent Volume Group
                                                   [Entry Fields]
Node Names                                         usa,uk
PVID                                               00055207bbf6edab 0000>
VOLUME GROUP name                                  [xwebvg]
Physical partition SIZE in megabytes               64                  +
Volume group MAJOR NUMBER                          [207]               #
Enhanced Concurrent Mode                           true                +
Enable Cross-Site LVM Mirroring Verification       false               +

Warning:
Changing the volume group major number may result
in the command being unable to execute
successfully on a node that does not have the
major number currently available. Please check
for a commonly available major number on all nodes
before changing this setting.
Figure 3-45. Creating a Shared Volume Group
Notes
Creating a shared volume group
You can use C-SPOC to create a volume group but be aware that you must then add
the volume group name to a resource group and synchronize. This is one case of using
C-SPOC where synchronization is not automatic.
Before creating a shared volume group for the cluster using C-SPOC, check that:
- All disk devices are properly attached to the cluster nodes
- All disk devices are properly configured on all cluster nodes and the device is listed
as available on all nodes
- Disks have a PVID
(C-SPOC lists the disks by their PVIDs. This ensures that we are using the same
disk on all nodes, even if the hdisk names are not consistent across the nodes).
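The PVID check in the list above can be scripted. The following is a minimal sketch, assuming that lspv prints the disk name in the first column and the PVID (or the word none) in the second. The function names and the sample disk list are illustrative only and are not part of HACMP or C-SPOC; only the PVID 00055207bbf6edab is taken from the visual.

```shell
#!/bin/sh
# Sketch: checks on `lspv`-style output read from stdin.
# Assumed column layout: disk name, PVID, volume group, state.

# Print the names of disks that have no PVID assigned.
disks_without_pvid() {
    awk '$2 == "none" { print $1 }'
}

# Succeed only if the given PVID appears in the listing.
has_pvid() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# Canned sample so the sketch runs anywhere (hdisk names are hypothetical).
# On a cluster node, pipe in the real output of lspv instead.
sample='hdisk0 00055207aa11bb22 rootvg active
hdisk1 00055207bbf6edab None
hdisk2 none None'

printf '%s\n' "$sample" | disks_without_pvid
printf '%s\n' "$sample" | has_pvid 00055207bbf6edab && echo "PVID visible on this node"
```

Running the same check on every node confirms that each node sees the disk under the same PVID, whatever hdisk name it has locally.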
This menu was reached through the Concurrent Logical Volume Management option
on the main C-SPOC menu.

Discover, Add VG to a Resource Group

Extended Configuration
Discover HACMP-related Information from Configured Nodes
Extended Topology Configuration
Extended Resource Configuration
Extended Event Configuration
Extended Cluster Service Settings
Extended Performance Tuning Parameters Configuration
Security and Users Configuration
Snapshot Configuration
Export Definition File for Online Planning Worksheets
Extended Verification and Synchronization
HACMP Cluster Test Tool
Figure 3-46. Discover, Add VG to a Resource Group
Notes
Discover and add VG to resource group
After creating a volume group, you must discover it so that the new volume group will be
available in pick lists for future actions, such as adding it to a resource group.
You must use the Extended Configuration menu for both of these actions. You'll find
the discovery action at the top of the Extended Configuration menu shown in the visual.
To add the volume group to a resource group, you'll use the Extended Resource
Configuration menu to get to the HACMP Extended Resource Group Configuration
menu.

Creating a Shared File System (1 of 2)

First create logical volumes for the filesystem and jfslog. Do not forget to logform
the jfslog logical volume. Mirrored LV shown, use if appropriate.

Add a Shared Logical Volume
[TOP]                                              [Entry Fields]
Resource Group Name                                xwebgroup
VOLUME GROUP name                                  xwebvg
Reference node                                     usa
* Number of LOGICAL PARTITIONS                     [200]
PHYSICAL VOLUME names
Logical volume NAME                                [xweblv]
Logical volume TYPE                                [jfs]
POSITION on physical volume                        middle
RANGE of physical volumes                          minimum
MAXIMUM NUMBER of PHYSICAL VOLUMES                 []
to use for allocation
Number of COPIES of each logical
partition
[MORE...11]

The volume group must be in a resource group that is online or it does not appear in
the pop-up list.
Figure 3-47. Creating a Shared File System (1 of 2)
Notes
Creating a shared file system using C-SPOC
It is generally preferable to control the names of all of your logical volumes.
Consequently, it is generally best to explicitly create a logical volume for the file system.
If the volume group does not already have a JFS log, then you must also explicitly
create a logical volume for the JFS log and format it with logform. The same applies
if you are creating a JFS2 filesystem (unless you plan to use inline logs, in which case
the jfs2log won't be needed).
The volume group to which you wish to add the filesystem must be online. You can
either vary it on manually with varyonvg or bring it online by starting cluster services.
However, C-SPOC allows you to add a journaled file system to either:
- A shared volume group (no previously defined cluster logical volume)
SMIT checks the list of nodes that can own the resource group that contains the
volume group, creates the logical volume (on an existing log logical volume if
present, otherwise it creates a new log logical volume) and adds the file system to
the node where the volume group is varied on (whether it was varied on by the
C-SPOC utility or it was already online). All other nodes in the resource group run an
importvg -L for non-enhanced concurrent mode volume groups, or an imfs for
enhanced concurrent mode volume groups.
- A previously defined cluster logical volume (in a shared volume group)
SMIT checks the list of nodes that can own the resource group which contains the
volume group where the logical volume is located. It adds the file system to the node
where the volume group is varied on (whether it was varied on by the C-SPOC utility
or it was already online). All other nodes in the resource group run an importvg -L
for non-enhanced concurrent mode volume groups, or an imfs for enhanced
concurrent mode volume groups.
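For orientation, the two-step approach (logical volumes first, then the file system) corresponds to the following plain AIX commands on the node where the volume group is varied on. This is a sketch only: it prints the commands rather than running them, the log logical volume name xwebloglv is hypothetical, and it does not perform the cluster-wide updates that C-SPOC drives on the other nodes.

```shell
#!/bin/sh
# Sketch: print (not run) the single-node AIX commands behind the two-step
# approach. Names other than xwebloglv come from the visuals in this topic.
plan_shared_jfs() {
    vg=$1 lv=$2 lps=$3 mnt=$4 loglv=$5
    echo "mklv -y $loglv -t jfslog $vg 1"     # one-partition JFS log LV
    echo "logform /dev/$loglv"                # asks for confirmation when run
    echo "mklv -y $lv -t jfs $vg $lps"        # data LV with an explicit name
    echo "crfs -v jfs -d $lv -m $mnt -A no"   # no automatic mount at restart
}

plan_shared_jfs xwebvg xweblv 200 /xwebfs xwebloglv
```

The file system is created with -A no because the cluster, not the AIX boot sequence, is expected to mount it.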

Creating a Shared File System (2 of 2)

Then create the filesystem on the now "previously defined logical volume"

Add a Standard Journaled File System
                                                   [Entry Fields]
Node Names                                         usa,uk
LOGICAL VOLUME name                                xweblv
* MOUNT POINT                                      [/xwebfs]
PERMISSIONS                                        read/write          +
Mount OPTIONS                                      []                  +
Start Disk Accounting?                             no                  +
Fragment Size (bytes)                              4096                +
Number of bytes per inode                          4096                +
Allocation Group Size (MBytes)                     8                   +
Figure 3-48. Creating a Shared File System (2 of 2)
Notes
Creating a shared file system, step 2
Once you've created the logical volume, create a file system on it. Use the path
that allows creating a file system on a previously defined logical volume.

LVM Changes, Select Your Filesystem

Journaled File Systems
Add a Journaled File System
Add a Journaled File System on a Previously Defined Logical Volume
List All Shared File Systems
Change / Show Characteristics of a Shared File System
Remove a Shared File System

File System Name
# Resource Group      File System
xwebgroup             /xwebfs
Figure 3-49. LVM Changes, Select Your Filesystem
Notes
Changing a shared file system using C-SPOC
We have to provide the name of the file system which we want to change. The file
system must be in a volume group which is currently online somewhere in the cluster
and is already configured into a resource group.

Update the Size of a Filesystem

Change/Show Characteristics of a Shared File System in the Cluster
                                                   [Entry Fields]
Resource Group Name                                discovery
File system name                                   /xwebfs
NEW mount point                                    [/xwebfs]
SIZE of file system                                [4000000]
Mount GROUP                                        []
Mount AUTOMATICALLY at system restart?             no
PERMISSIONS                                        read/write
Mount OPTIONS                                      []
Start Disk Accounting?                             no
Fragment Size (bytes)                              4096
Number of bytes per inode                          4096
Compression algorithm                              no
Figure 3-50. Update the Size of a Filesystem
Notes
Changing file system size
Specify a new file system size, in 512-byte blocks, and press Enter. The file system is
resized and the relevant LVM information is updated on all cluster nodes configured to
use the file system's volume group.
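Because the size field is expressed in 512-byte blocks, it helps to convert before typing a value. The sketch below shows the arithmetic (1 MB = 2048 blocks of 512 bytes); the function names are illustrative only.

```shell
#!/bin/sh
# Sketch: convert between megabytes and 512-byte blocks.
mb_to_blocks() { echo $(( $1 * 2048 )); }
blocks_to_mb() { echo $(( $1 / 2048 )); }   # integer division, rounds down

mb_to_blocks 1024        # prints 2097152
blocks_to_mb 4000000     # prints 1953
```

The value 4000000 shown in the visual therefore corresponds to a file system of a little under 2 GB.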

HACMP Resource Group Operations

Bring a Resource Group Online
Bring a Resource Group Offline
Move a Resource Group to Another Node
Suspend/Resume Application Monitoring
Application Availability Analysis
Figure 3-51. HACMP Resource Group Operations
Notes
HACMP resource group and application management
This visual shows the selections for managing resource groups. We can control if and
where resource groups are running, control application monitoring, and perform
application availability analysis.
In this section, we'll examine the choices for managing the state and running location of
a resource group using C-SPOC.

Priority Override Location (POL) Old

Old, problem behavior
Assigned during a resource group move operation
The destination node for a resource group online, offline or move request
becomes the resource group's POL
Represents the location a resource group goes to regardless of cluster
events
Meant to honor the administrator's wish to have the resource group on a
specific node
Truly an override of the resource group policy setting
Restore_Node_Priority_Order caused resource group movement,
regardless of Fallback policy
(for example, RG moved to original highest priority node, even if fallback policy was
Never Fallback)
POL is viewed with the command:
clRGinfo -p
Information maintained in a file
Manual manipulation possible by changing the file
Obvious problem is that the behavior of the resource group may be
unexpected in that it may contradict the policy in the resource group
Figure 3-52. Priority Override Location (POL) Old
Notes
Priority override location (POL)
HACMP V5.x introduced the notion of a priority override location. A POL overrides all
other fallover/fallback policies and possible locations for the resource group.
A resource group does not normally have a priority override location. The destination
node that you specify for a resource group move, online or offline request (see next
couple of visuals) becomes the priority override location for the resource group. The
resource group remains on that node in an online state (if you moved or brought it
online there) or offline state (if you took it offline there) until the POL is cancelled.
POL - old problem behavior
Problem behavior is in the following levels:
- Before HACMP V5.3 PTF IY84883 May 2006
- Before HACMP V5.2 PTF IY82989 April 2006
- Before HACMP V5.1 PTF IY84646 May 2006
The problem with POL is that restoring the original node priority always resulted in the
resource group moving to the original highest priority node, even if the fallback policy
was Never Fallback. This caused problems if you did not expect this behavior. Also,
there was no way to cancel the POL without resource group movement.
Persistent and non-persistent POL

Priority override locations can be persistent or non-persistent.
- A persistent priority override location remains in effect until explicitly cancelled.
- A non-persistent priority override location is cancelled either explicitly, or
implicitly when the HACMP services are shut down on all the nodes in the cluster
simultaneously.

Priority Override Location (POL) New

Pre-HACMP V5.4
Restore_Node_Priority_Order resets POL, then moves RG back to
highest priority node (RG policies do not control)
HACMP V5.4 and later
Destination node is now the new home node
Function is strictly internal
No Restore_Node_Priority_Order SMIT choice
Original highest priority node is remembered and flagged in SMIT on later
moves
Persist across cluster reboot is no longer supported
(For more permanent changes, change the resource group)
Changes to clRGinfo -p:
Now shows location of temporary highest priority and timestamp of move
Resource Group Name: appKgroup
Primary instance(s):
The following node temporarily has the highest priority for this instance:
node2, user-requested rg_move performed on Mon Jun 4 00:39:45 2007
Node                         Group State
---------------------------- ---------------
node1                        OFFLINE
node2                        ONLINE
Figure 3-53. Priority Override Location (POL) New
Notes
New POL behavior in older versions of HACMP
New behavior is in the following levels and later:
- HACMP V5.3 PTF IY84883 May 2006
- HACMP V5.2 PTF IY82989 April 2006
- HACMP V5.1 PTF IY84646 May 2006
In the levels shown above, the problem where the resource group moved on
Restore_Node_Priority_Order regardless of fallback policy settings is fixed.
Now the Restore_Node_Priority_Order only resets the POL setting, without resource
group movement, unless the fallback policy is Fallback to Higher Priority Node In the
List. In that case, the behavior is the same as the old way.

POL behavior in HACMP V5.4

For HACMP V5.4 and later, the function is strictly internal and the resource group move
operation is treated as temporary. If more permanent changes are desired, make the
changes in the resource group. The original highest priority node is flagged in SMIT
when subsequent resource group moves are initiated.
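If you want to pick the temporary highest priority node out of the clRGinfo -p output in a script, something like the following may help. It is a sketch that assumes the output layout shown in the visual for this topic (a line containing "temporarily has the highest priority" followed by a line that starts with the node name and a comma); the function name is illustrative and the layout may differ at other HACMP levels.

```shell
#!/bin/sh
# Sketch: extract the node named on the line that follows the
# "temporarily has the highest priority" message.
temp_priority_node() {
    awk '
        /temporarily has the highest priority/ { grab = 1; next }
        grab { sub(/,.*/, ""); print; grab = 0 }
    '
}

# Sample text copied from the visual; on a node, pipe in clRGinfo -p instead.
sample='Resource Group Name: appKgroup
Primary instance(s):
The following node temporarily has the highest priority for this instance:
node2, user-requested rg_move performed on Mon Jun 4 00:39:45 2007'

printf '%s\n' "$sample" | temp_priority_node    # prints node2
```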

Moving a Resource Group

Move Resource Groups to Another Node
Move Resource Groups to Another Site
Select a Destination Node
# *Denotes Originally Configured Highest Priority Node
*usa
uk
india
Figure 3-54. Moving a Resource Group
Notes
Moving a resource group
You can request that a resource group be moved to any node that is in the resource
group's list of nodes.
The clRGmove utility program is used, which can also be invoked from the command
line. See the man page for details.
The destination node that you specify becomes the resource group's priority override
location. On a subsequent move, the original highest priority node is marked with an
asterisk (*).
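A command-line move can be guarded by the same rule that SMIT enforces, namely that the destination must be in the resource group's node list. The sketch below only builds and prints the command. The -g (group), -n (node) and -m (move) flags are an assumption here and should be checked against the clRGmove man page for your HACMP level; the function name and the way the node list is passed in are illustrative.

```shell
#!/bin/sh
# Sketch: print a clRGmove command only if the destination node is in the
# resource group's node list. Flags -g/-n/-m are assumed, not verified.
rg_move_cmd() {
    rg=$1 dest=$2 nodes=$3
    for n in $nodes; do
        if [ "$n" = "$dest" ]; then
            echo "clRGmove -g $rg -n $dest -m"
            return 0
        fi
    done
    echo "node $dest is not in the node list of $rg" >&2
    return 1
}

rg_move_cmd xwebgroup uk "usa uk india"
```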


Show the Current State of Applications and Resource Groups
Move a Resource Group to Another Node / Site

Select a Resource Group
# Resource Group      State       Node(s) / Site
xwebgroup             ONLINE      usa /
Figure 3-55. Bring a Resource Group Offline
Notes
Bring a resource group offline -> select a resource group
To start, you must select the resource group you wish to take offline. Only resource
groups that are currently online will be shown.
Choose the node
Then you'll select an online node where you want the resource group brought offline.
The choice is obvious for a resource group that is only active on one node at a time
(OHNO or OFAN). For resource groups that can be online on more than one node at
once (Online on All Available), you can choose all or just one of the active nodes.

Bring a Resource Group Back Online

Show the Current State of Applications and Resource Groups
Move a Resource Group to Another Node / Site
Application Availability Analysis
Select a Destination Node
# *Denotes Originally Configured Highest Priority Node
usa
uk
Figure 3-56. Bring a Resource Group Back Online
Notes
Bring a resource group online
Bringing a resource group online will activate the resources in it on the target node. You
may want to manually bring resource groups online after performing verification of a
node that rejoins the cluster after a forced down.


1. True or False?
Using C-SPOC reduces the likelihood of an outage by reducing the
likelihood that you will make a mistake.
2. True or False?
C-SPOC reduces the need for a change management process.
3. C-SPOC cannot do which of the following administration tasks?
a. Add a user to the cluster.
b. Change the size of a filesystem.
c. Add a physical disk to the cluster.
d. Add a shared volume group to the cluster.
e. Synchronize existing passwords.
f. None of the above.
4. True or False?
It does not matter which node in the cluster is used to initiate a
C-SPOC operation.
5. True or False?
Priority Override Location behavior changed in HACMP V5.4 to
prevent actions that conflict with desired resource group fallback
behavior.
Notes

Checkpoint
1. True or False?
A star configuration is a good choice for your non-IP networks.
2. True or False?
RSCT will automatically update /etc/filesystems when using
enhanced concurrent mode volume groups.
3. True or False?
With HACMP V5.4, a resource group's priority override location can
be cancelled by selecting a destination node of
Restore_Node_Priority_Order.
4. You want to create an enhanced concurrent mode volume group that
will be used in a resource group that will have an Online on Home
Node Only Startup policy. Which C-SPOC menu should you use?
a. HACMP Logical Volume Management
b. HACMP Concurrent Logical Volume Management
5. You want to add a logical volume to the volume group you created in
the question above. Which C-SPOC menu should you use?
Figure 3-58. Checkpoint
Notes

Unit Summary
There are many tools and log files that can be used for
monitoring a cluster
Cluster tools: clstat, cldump, cltopinfo, clRGinfo
AIX commands: lssrc, lsvg, mount, netstat
Use odmget HACMPlogs to find log files
The SMIT standard and extended menus are used to make
topology and resource group changes
Implementing procedures for change management is a critical
part of administering an HACMP cluster
C-SPOC provides facilities for performing common cluster wide
administration tasks from any node within the cluster
Perform routine administrative changes
Notes

Unit 4. Cluster Security

This unit describes the options for securing cluster communications.

Describe the HACMP options for securing cluster communications
- Connection authentication method
- VPN tunnels for cluster communications
- Message authentication and encryption

Accountability:
Lab exercises
References
www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html
HACMP for AIX manuals

Unit Objectives
After completing this unit, you should be able to:
Describe the HACMP options for securing cluster communications
Connection authentication method
VPN tunnels for cluster communications
Message authentication and encryption
Notes
How Does HACMP Communicate?

The clcomdES runs on all nodes to transparently manage
inter-node communications for HACMP
Started at boot from inittab
Requires only one communication path
Supports C-SPOC and DARE functionality
Not all communication goes through the clcomdES
Cluster Manager communications, heartbeating and
messaging:
RSCT infrastructure
Cluster information program (clinfo):
SNMP protocol
Figure 4-2. How Does HACMP Communicate?
Notes
Cluster communications
A Cluster Communications daemon (clcomd) runs on each HACMP node to
transparently manage inter-node communications for HACMP. This daemon
consolidates communication mechanisms in HACMP and decreases management
traffic on the network. This communication infrastructure requires only one common
communication path, rather than multiple TCP connections, between each pair of
nodes.
Most components communicate through the Cluster Communications daemon, but
some components use a different mechanism for inter-node communications:
- Cluster Manager: RSCT
- Heartbeat messaging: RSCT
- Cluster Information Program (clinfo): SNMP
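Since C-SPOC and DARE depend on the Cluster Communications daemon, a quick status check of the clcomdES subsystem is often useful before making changes. The sketch below reads lssrc-style output and prints the status column; it assumes that the status is the last field on the subsystem's line. The function name and the sample listing are illustrative.

```shell
#!/bin/sh
# Sketch: print the status of one subsystem from `lssrc -s <name>` style
# output on stdin. Assumes the status is the last field of the line.
subsystem_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Illustrative sample; on a node use: lssrc -s clcomdES | subsystem_status clcomdES
sample='Subsystem         Group            PID          Status
 clcomdES         clcomdES         315598       active'

printf '%s\n' "$sample" | subsystem_status clcomdES    # prints active
```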

HACMP Security Options

Connection authentication for inter-node communication:
Standard
Kerberos (SP only)
IPSec (VPN) tunnels for cluster communications
Is an Internet standard
Made of a set of cryptographic protocols for:
Securing packet flows
Key exchange
Encapsulating Security Payload (ESP) Protocol provides:
Authentication, data confidentiality and message integrity
Only one key exchange protocol currently defined
Internet Key Exchange (IKE)
(http://www.ietf.org/rfc/rfc2409.txt)
HACMP (Using AIX ctsec) Services offers:
Message authentication and encryption:
Authentication only
Authentication and encryption
Figure 4-3. HACMP Security Options
Notes
Security options
There are three ways that you can configure security in an HACMP cluster:
Connection authentication is based around the clcomd HACMP authentication
process, or Kerberos. Kerberos is a network authentication protocol that is based on
a secret key encryption scheme that is used only on SP systems.
Cluster communications
IPSec (IP security) is a standardized framework for securing Internet Protocol (IP)
communications by encrypting and/or authenticating each IP packet in a data stream.
There are two modes of IPSec operation: transport mode and tunnel mode.
In transport mode, only the payload (message) of the IP packet is encrypted. It is
fully routable since the IP header is sent as plain text; however, it cannot cross
Network Address Translation (NAT) interfaces, as this will invalidate its hash value.
Transport mode is used for host-to-host communications over a LAN.
In tunnel mode, the entire IP packet is encrypted. It must then be encapsulated into
a new IP packet for routing to work. Tunnel mode is used for network-to-network
communications (secure tunnels between routers) or host-to-network and
host-to-host communications over the Internet.
IPSec is implemented by a set of cryptographic protocols for (1) securing packet
flows and (2) Internet key exchange. Of the former, there are two:
- Authentication Header (AH), which provides authentication and payload (message) and
IP header integrity, and with some cryptographic algorithms also non-repudiation, but
does not offer confidentiality
- Encapsulating Security Payload (ESP), which provides data confidentiality and payload
(message) integrity, and with some cryptographic algorithms also authentication
Originally AH was used only for integrity and ESP was used only for encryption;
authentication functionality was added subsequently to ESP. Currently only one key
exchange protocol is defined, the IKE (Internet Key Exchange) protocol.
IPSec protocols operate at the network layer, layer 3 of the OSI model. Other
Internet security protocols in widespread use, such as SSL and TLS, operate from
the transport layer up (OSI layers 4-7). This makes IPSec more flexible, as it can be
used for protecting both TCP and UDP-based protocols, but increases its complexity
and processing overhead, as it cannot rely on TCP (layer 4 OSI model) to manage
reliability and fragmentation.
Message authentication and encryption
Message authentication and encryption rely on secret key technology. For
authentication, the message is signed and the signature is encrypted by a key when
sent, and the signature is decrypted and verified when received. For encryption, the
encryption algorithm uses the key to make data unreadable. The message is
encrypted when sent and decrypted when received.
You can enable message authentication alone, or both message authentication and
encryption.

Standard Connection Authentication

Default method
Basic authentication based upon incoming IP address, HACMP node
name, hostname, and the cluster rhosts file
Data is matched using HACMPnode/HACMPadapter ODM classes
No encryption
Spells danger for options such as Change a user's password in
C-SPOC
Auto Discovery populates /usr/es/sbin/cluster/etc/rhosts on Create an
HACMP cluster function and also Add new node to the cluster
Connectivity log
/var/hacmp/clcomd.log
# clrsh trinity /usr/es/sbin/cluster/utilities/cldump
# tail /var/hacmp/clcomd.log
.
Wed Jul 26 17:02:39 2006: RSH: Command='SNMPINFO address',pid=655614
Wed Jul 26 17:02:39 2006: RSH:COMPLETED: exit code = 0, pid=655614
Wed Jul 26 17:02:40 2006: RSH: ACCEPTED: trinity: 192.168.2.1->192.168.2.2
Wed Jul 26 17:02:40 2006: looking for service type = 1
Wed Jul 26 17:02:40 2006: RSH: Command='/usr/es/sbin/cluster/utilities/clRGinfo -v',pid=606398
Wed Jul 26 17:02:40 2006: RSH:COMPLETED: exit code = 0, pid=606398
Figure 4-4. Standard Connection Authentication
Notes
How standard authentication works
The clcomd daemon authenticates each inbound session by checking the session's
source IP address against a list of addresses in /usr/es/sbin/cluster/etc/rhosts and the
addresses configured into the cluster itself (in other words, in the HACMPadapter and
HACMPnode ODM classes). In order to defeat any attempt at IP spoofing (a very
timing-dependent technique which involves faking a session's source IP address), each
non-call-back session is checked by connecting back to the source IP address and
verifying who the sender is.
The action taken in response to a request depends on the state of
/usr/es/sbin/cluster/etc/rhosts. If
a cluster node is being moved to a new cluster or if the entire cluster configuration is
being redone from scratch, it may be necessary to empty /usr/es/sbin/cluster/etc/rhosts
or manually populate it with the appropriate IP addresses for the new cluster.
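The connectivity log is the quickest way to see which sessions clcomd actually accepted. The sketch below lists the distinct source->destination address pairs from ACCEPTED entries; it assumes the line layout shown in the visual for this topic, where the address pair is the last field. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: list the distinct address pairs of accepted clcomd sessions,
# reading clcomd.log-style lines on stdin.
accepted_connections() {
    awk '/RSH: ACCEPTED:/ { print $NF }' | sort -u
}

# Sample lines copied from the visual; on a node use:
#   accepted_connections < /var/hacmp/clcomd.log
sample="Wed Jul 26 17:02:39 2006: RSH:COMPLETED: exit code = 0, pid=655614
Wed Jul 26 17:02:40 2006: RSH: ACCEPTED: trinity: 192.168.2.1->192.168.2.2
Wed Jul 26 17:02:40 2006: looking for service type = 1"

printf '%s\n' "$sample" | accepted_connections
```

An address pair that you do not recognize in this list is a reason to review the rhosts file and the cluster's configured addresses.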
Security hole at installation time

The empty /usr/es/sbin/cluster/etc/rhosts file provides a window of opportunity between
installation and when HACMP is configured. If this window is a concern, you can
shorten it by editing the file immediately after installation.
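A simple check of the file's state can be added to a post-installation procedure. The sketch below reports whether an rhosts file is empty and whether a given address is listed; it assumes one address per line, and the function names and the temporary demonstration file are illustrative.

```shell
#!/bin/sh
# Sketch: inspect an rhosts-style file (assumed: one address per line).

# Print "empty" if the file is missing or has no content, else "populated".
rhosts_state() {
    if [ -s "$1" ]; then echo populated; else echo empty; fi
}

# Succeed only if the address appears as a complete line in the file.
rhosts_lists() {
    grep -Fxq -- "$2" "$1"
}

# Demonstration on a scratch file; on a node, point these functions at
# /usr/es/sbin/cluster/etc/rhosts instead.
demo=${TMPDIR:-/tmp}/rhosts_demo.$$
: > "$demo"
rhosts_state "$demo"                     # prints empty
printf '192.168.2.1\n192.168.2.2\n' > "$demo"
rhosts_state "$demo"                     # prints populated
rm -f "$demo"
```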

Using IPSec VPN Tunnels for Communications (1 of 2)
1. Install and configure AIX IP Security filesets
bos.crypto-priv
bos.net.ipsec.websm
bos.net.ipsec.rte
bos.msg.LANG.net.ipsec
bos.net.ipsec.keymgt
2. Ensure the cluster has persistent IP addresses defined and active
3. Create the VPN tunnel
Detailed subject covered in AIX 5L Version 5.3 Security Guide (SC23-4907-03)
a) Edit the IKE XML templates provided by IBM to configure the VPN tunnel
b) On each node load the XML file to create the IKE database
# ikedb -pF /tmp/IKEtun.xml
c) Activate the tunnel
# ike cmd=activate
d) List the tunnel to check that it's active
# ike cmd=list
Figure 4-5. Using IPSec VPN Tunnels for Communications (1 of 2)
Notes
Setting up cluster communications over VPN
VPN support relies on the IP Security feature in AIX. There are a number of additional
filesets which need to be installed that are listed in the visual. Choose the desired
bos.msg.LANG.net.ipsec filesets, and the bos.crypto-priv fileset for your country.
You can configure VPNs in AIX using SMIT or Web-based System Manager. For more
information about VPNs, you can go to
http://www.ibm.com/servers/aix/products/ibmsw/security/vpn/techref/
An ESP host-to-host transport VPN tunnel over the persistent address network is
recommended in this example. The topic of IPSec itself is well beyond the scope of this
course. Further reading and education can be found in the AIX 5L Version 5.3 Security
Guide and the AU42 Security course.
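Before building the tunnel, it is worth confirming that the filesets from the visual are present on every node. The sketch below compares a list of installed fileset names (one per line on stdin) with the required names and prints whatever is missing. The language-specific bos.msg.LANG.net.ipsec fileset is left out because its name varies. How you produce the list is up to you; `lslpp -Lc | cut -d: -f2` is one possibility, offered here as an assumption about the lslpp colon-separated layout rather than a verified recipe. The function and variable names are illustrative.

```shell
#!/bin/sh
# Sketch: report required IP Security filesets that are not installed.
# Input: installed fileset names, one per line, on stdin.
required_ipsec_filesets="bos.net.ipsec.rte bos.net.ipsec.keymgt bos.net.ipsec.websm bos.crypto-priv"

missing_filesets() {
    installed=$(cat)
    for fs in $required_ipsec_filesets; do
        printf '%s\n' "$installed" | grep -Fxq -- "$fs" || echo "$fs"
    done
}

# Illustrative input in which the key management fileset is absent.
printf '%s\n' bos.net.ipsec.rte bos.net.ipsec.websm bos.crypto-priv | missing_filesets
```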
Using IPSec VPN Tunnels (2 of 2)

4. Configure HACMP to use persistent labels for VPN tunnels
smitty hacmp -> Extended Configuration
-> Security and Users Configuration -> HACMP Cluster
Security -> Configure Connection Authentication Mode
5. Synchronize the cluster

Figure 4-6. Using IPSec VPN Tunnels (2 of 2)
Notes
Configure HACMP to use VPN tunnels
Once the tunnel has been created, you need to instruct HACMP to use it with the SMIT
menu shown in the visual. You will select Yes for the field Use Persistent Labels for
VPN Tunnels, and then synchronize the cluster.

Create Additional IP Security

6. Optional: Create additional IP Security filter rules which implicitly deny
port 6191 (clcomdES) on the base IP addresses
smitty ipsec4 -> Advanced IP Security Configuration ->
Configure IP Security Filter Rules ->Add an IP Security
Filter Rule
Figure 4-7. Create Additional IP Security
Notes
Additional IP security
Optionally, you can configure IP filter rules to implicitly deny port 6191 across the
HACMP boot IP networks. To do this, add an IP security filter rule that denies access to
the port for boot IP addresses.

HACMP Message Authentication and Encryption (1 of 3)
1. Install rsct.crypt.<symmetric crypt algorithm>
DES: Low encryption
3DES: Medium encryption
AES: High encryption
Can be configured through SMIT or the command line
smitty hacmp -> C-SPOC -> Security and Users -> HACMP Cluster
Security -> Configure Message Authentication Mode and Key
Management
Command line is easier (recommended)
Figure 4-8. HACMP Message Authentication and Encryption (1 of 3)
Notes
Cluster Security Services
Message authentication and encryption rely on Cluster Security (CtSec) Services in
AIX, and use the encryption keys available from Cluster Security Services. HACMP
message authentication uses message digest version 5 (MD5) to create the digital
signatures for the message digest. Message authentication uses the following types of
keys to encrypt and decrypt signatures and messages (if selected):
- Data encryption standard (DES)
- Triple DES
- Advanced encryption standard (AES)
The message authentication mode is based on the encryption algorithm. Your selection
of a message authentication mode depends on the security requirements for your
HACMP cluster.

Authenticating and encrypting messages increases the overhead required to process
messages and may impact HACMP performance. Processing more sophisticated
encryption algorithms may take more time than less complex algorithms. For example,
processing AES messages may take more time than processing DES messages.
You can configure message authentication and encryption using SMIT menus or the
command line. It is recommended that you configure them from the command line.
Prerequisites
The HACMP product does not include encryption libraries. Before you can use
message authentication and encryption, the following AIX 5L filesets must be installed
on each cluster node:
- For data encryption with DES message authentication: rsct.crypt.des
- For data encryption with Triple DES message authentication: rsct.crypt.3des
- For data encryption with Advanced Encryption Standard (AES) message
authentication: rsct.crypt.aes256
You can install these filesets from the AIX 5L Expansion Pack CD-ROM.
If you install the AIX 5L encryption filesets after you have HACMP running, restart the
Cluster Communications daemon to enable HACMP to use these filesets. To restart the
Cluster Communications daemon:
stopsrc -s clcomdES
startsrc -s clcomdES
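The prerequisite list above can be captured in a small lookup. This is a minimal sketch, not part of HACMP or AIX: the function name crypt_fileset is hypothetical, and it only maps an algorithm name to the fileset named in these notes. On a cluster node, the result could be passed to lslpp -l to confirm that the fileset is installed.

```shell
#!/bin/sh
# Map a symmetric algorithm name to the AIX fileset listed in the prerequisites.
# crypt_fileset is a hypothetical helper, not an HACMP or AIX command.
crypt_fileset() {
    case "$1" in
        des)  echo "rsct.crypt.des" ;;
        3des) echo "rsct.crypt.3des" ;;
        aes)  echo "rsct.crypt.aes256" ;;
        *)    echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
}

# Example use on an AIX node (not run here): lslpp -l "$(crypt_fileset 3des)"
crypt_fileset 3des
```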

Message Authentication and Encryption (2 of 3)
4. If you trust your network then HACMP can distribute the secure key.
Enable key distribution on all nodes:
# clkeygen e Enabled
0513-077 Subsystem has been changed.
0513-044 The clcomdES Subsystem was requested to stop.
0513-059 The clcomdES Subsystem has been started. Subsystem
PID is 315598.
The key distribution was Enabled
5. Generate and distribute a key. In this example we will use MD5
signature for authentication and 3DES for encryption
# clkeygen gmd5_3des -d
6. Activate the key on all nodes
# clkeygen kc
Keys are located in /usr/es/sbin/cluster/etc named
key_md5_<symmetric algorithm>
Keys can also be distributed manually using a method such as
Secure Copy (scp)
Figure 4-9. Message Authentication and Encryption (2 of 3)
Notes
Managing keys
HACMP cluster security uses a shared common (symmetric) key. This means that each
node must have a copy of the same key for inter-node communications to be
successful. You control when keys change and how keys are distributed.
The steps above show only the commands for enabling message authentication and
encryption, assuming that we trust our network in allowing HACMP to automatically
distribute the key. SMIT can also be used to accomplish this, but using the command
line is much easier. In the lab you will explore both automatic and manual key
distribution methods.
If you want HACMP to distribute keys automatically, you have to enable key distribution
on each node, as shown in the visual (or using the Extended Configuration ->
Security and Users Configuration -> HACMP Cluster Security -> Configure
Message Authentication Mode and Key Management -> Enable/Disable
Automatic Key Distribution SMIT path).

Disable key distribution when done

Remember to disable key distribution on each node after keys have been distributed
and activated. Leaving key distribution enabled might allow an unwelcome user to
distribute a spurious key to cluster nodes and compromise cluster security.
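For manual distribution, it helps to know exactly which file has to be copied. The following is a minimal sketch under stated assumptions: the directory and the key_md5_<symmetric algorithm> naming pattern come from the visual, while key_path and print_scp_commands are hypothetical helpers and the node name is a placeholder. The script only prints the scp commands; it does not run them.

```shell
#!/bin/sh
# Directory and naming pattern are taken from the visual:
# /usr/es/sbin/cluster/etc/key_md5_<symmetric algorithm>
KEY_DIR=/usr/es/sbin/cluster/etc

# key_path is a hypothetical helper: "md5_3des" -> "$KEY_DIR/key_md5_3des"
key_path() {
    echo "$KEY_DIR/key_$1"
}

# Print (do not run) one scp command per target node.
# The node names passed in are placeholders for your own cluster nodes.
print_scp_commands() {
    mode=$1
    shift
    for node in "$@"; do
        echo "scp $(key_path "$mode") $node:$(key_path "$mode")"
    done
}

print_scp_commands md5_3des node2
```

Printing the commands first gives you a chance to review them before copying a key over the network.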

Message Authentication and Encryption (3 of 3)
6. Set HACMP to use Message Authentication and Encryption
# clchclstr -m 'md5_3des' e
Cluster Name: myapp_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: md5_3des
Cluster Message Encryption: Enabled
Use Persistent Labels for Communication: No
7. Synchronize the cluster. Done!

Create and distribute new keys as required by your
security policy
Figure 4-10. Message Authentication and Encryption (3 of 3)
Notes
Configure HACMP and synchronize
Either before or after you create, distribute and activate keys on all nodes, you must
configure HACMP to use message authentication and encryption. You can do this with
the command shown in the visual, or from the Extended Configuration -> Security
and Users Configuration -> HACMP Cluster Security -> Configure
Message Authentication Mode and Key Management -> Configure Message
Authentication Mode SMIT menu path.
After you configure HACMP to use message authentication and encryption,
synchronize the cluster.
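Before synchronizing, it can be useful to confirm the settings reported by the configuration command. The following is a minimal sketch that only parses text: cluster_setting is a hypothetical helper, and the sample text is copied from the visual rather than captured from a live cluster.

```shell
#!/bin/sh
# Extract the value of one "Label: value" line from settings text on stdin.
# cluster_setting is a hypothetical helper; it only parses text.
cluster_setting() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Sample text copied from the visual.
sample='Cluster Name: myapp_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: md5_3des
Cluster Message Encryption: Enabled
Use Persistent Labels for Communication: No'

printf '%s\n' "$sample" | cluster_setting "Cluster Message Authentication Mode"
```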
Key maintenance
It may be necessary to periodically create, distribute and activate new keys to satisfy
your security requirements.

A Holistic Approach to Security

Outside HACMP, standard operating system practices apply:
Read the AIX Security Guide
Available from the AIX Information Center
Harden the system using the AIX Security Expert or the aixpert command
A system security hardening tool that automatically configures over 300
security configuration settings based on a specified level of security
Allows automatically replicating configuration on other systems
Eliminate unnecessary services
tn, rsh, rexec, ftp, and so on
Switch to more secure protocols/implementations
OpenSSH, F-Secure, TCP Wrappers, and so on
Minimize access to the cluster nodes
Stay current with security patches
Sign up for the IBM security advisories
Monitor the cluster carefully
Assume that you will be compromised (some day)
Have a response plan
Figure 4-11. A Holistic Approach to Security
Notes
The bigger picture
A holistic approach to security is a general approach to system hardening. There are
numerous security configuration settings in any operating
system, and mastering all of them is no small task. A good start is reading the AIX
Security Guide. It describes settings and tools for securing the operating system and
network, and also describes the AIX Security Expert, available in AIX 5.3.
AIX Security Expert

AIX Security Expert is a system security hardening tool that provides a center for all
security settings (TCP, NET, IPSEC, system, and auditing). It provides simple menu
settings for High Level Security, Medium Level Security, Low Level Security, and AIX
Standard Settings security that integrate over 300 security configuration settings while
still providing control over each security element for advanced administrators. AIX
Security Expert can be used to implement the appropriate level of security, without the
necessity of reading a large number of papers on security hardening and then
individually implementing each security element.
AIX Security Expert can be used to take a security configuration snapshot. This
snapshot can be used to set up the same security configuration on other systems. This
both saves time and ensures that all systems have the proper security configuration in
an enterprise environment. AIX Security Expert can be run from Web-based System
Manager, from SMIT, or with the aixpert command.
Securing a clustered environment

Most of the work in securing a clustered environment is the same as securing a
standalone system. For example you should:
- Keep security fixes current on all nodes in the cluster
- Use secure services for all node communications, and eliminate unnecessary
services
- Minimize system and cluster administrator access to all nodes
- Monitor all nodes in the cluster
- Assume you will be compromised and have a plan to recover

Checkpoint (1 of 2)
1. Which daemon, which uses the
/usr/es/sbin/cluster/etc/rhosts file for authentication, do
most inter-node communications use:
a. RSCT
b. clcomd
c. SNMP
d. clinfo
2. True or False: HACMP supports two connection
authentication methods, Standard and Kerberos.
3. True or False: Use of VPN tunnels for cluster
communications requires that nodes are configured
with persistent IP labels, and that HACMP is configured
to use them.
Figure 4-12. Checkpoint (1 of 2)
Checkpoint (2 of 2)
4. True or False: You can enable message encryption
without enabling message authentication.
5. Which of the following is TRUE about configuring
message encryption in HACMP:
a. It is a simple, one-step process
b. It only requires AIX base install and HACMP filesets
c. It requires installing rsct.crypt and performing tasks
on all nodes to enable and implement key
distribution and activation
d. It can only be configured on the command line
6. True or False: AIX Security Expert provides automatic
configuration of security settings, including those for
TCP, NET, IPSEC, system, and auditing.
Figure 4-13. Checkpoint (2 of 2)
Unit Summary
Key points from this Unit:
There are several security options that can be configured for HACMP:
connection authentication method, VPN tunnels, and message authentication and encryption
Connection authentication methods
- Standard, Kerberos
VPN tunnels
- Requires AIX IP security filesets
- Use persistent labels for VPN tunnels
Message authentication and encryption
- Requires rsct.crypt
- Enable distribution, create keys, distribute and activate them
Keep the big picture in mind and use a holistic approach to security, or cluster security won't make a difference
Lab Exercises: Exercise 3 and Optional Exercises
Exercise 3: Basic HACMP Administration
Estimated time: 3 hours
Use C-SPOC to make changes to the running cluster and
observe how resource group policies affect where an
application runs in the cluster
(Optional) Exercise 4: Cluster Security
Estimated time: 1 hour
Configure a VPN tunnel and message authentication and
encryption
(Optional) Appendix A: Network File System (NFS)
Estimated time: 1 hour
Configure a highly available NFS export and cross-mount
Figure 4-15. Lab Exercises: Exercise 3 and Optional Exercises
Appendix A. Checkpoint Solutions

Unit 1
Checkpoint Solutions
1. True or False: HWAT is compatible with IPAT over Aliasing
2. If node1 has NICs configured with the addresses 192.168.20.1 and
192.168.21.1 and node2 has NICs with the IP addresses 192.168.20.2 and
192.168.21.2, then which of the following are valid service IP addresses
when using IPAT via Aliasing:
a. (192.168.20.3 and 192.168.20.4) OR (192.168.21.3 and 192.168.21.4)
b. 192.168.20.3 and 192.168.20.4 and 192.168.21.3 and 192.168.21.4
c. 192.168.22.3 and 192.168.22.4
d. 192.168.23.3 and 192.168.20.3
3. On reboot of a failed node, HACMP will:
a. Do nothing
b. Issue a clRGmove for all RGs which belong to that node
c. Bring on-line RGs which are in ERROR state only
d. It depends on whether HACMP starts at boot, the default is "Do nothing"
4. True or False: A Resource may belong to more than one Resource group.
5. A /dev/hdisk device when used by HACMP as a non-IP heartbeat network is
referred to as a
a. Communication interface
b. Communication device
c. Communication adapter
d. Non-IP network

Unit 2
Checkpoint Solutions (1 of 3)
1. Which of the following statements is TRUE (pick the best answer)?
a. Static application data should always reside on private storage.
b. Dynamic application data should always reside on shared
storage.
c. Shared storage must always be simultaneously accessible in
read-write mode to all cluster nodes.
d. Application binaries should only be placed on shared storage.
2. True or False?
Using RSCT-based shared disk protection results in slower
fallovers.
3. Which of the following disk technologies are supported by
HACMP?
a. SCSI
b. SSA
c. FC
d. All of the above
Unit 2
4. True or False?
You should check the vendor's website for supported
HACMP configurations when using SAN based storage units
(DS8000, ESS, EMC, HDS, and so forth).
5. True or False?
hdisk numbers must map to the same PVIDs across an entire
HACMP cluster.
6. True or False?
Lazy update attempts to keep VGDA constructs in sync
between cluster nodes (reserve/release-based shared
storage protection)
7. Which of the following commands will bring a volume group
named vgA online?
a. mountvg vgA
b. getvg vgA
c. attachvg vgA
d. varyonvg vgA

Unit 2
8. True or False?
Quorum should always be disabled on shared volume groups.
9. True or False?
File system and logical volume attributes cannot be changed while
the cluster is operational.
10. True or False?
An enhanced concurrent volume group is required for the
heartbeat over disk feature.
Unit 3
Let's Review: Topic 1 Solutions

1. What's the fastest way to locate the cluster.log file?
a. Consult the HACMP Troubleshooting Guide
b. odmget HACMPlogs
c. find / -name cluster.log -print
d. Open a service call
2. True or False?
cldump does not require clinfoES.
3. True or False?
clstat does not require clinfoES.

Unit 3

1. True or False?
Creating a third resource group on a cluster that has only one IP
network with two interfaces on each node requires using IPAT via
aliasing.
2. True or False?
It is NOT possible to add a node while HACMP is running.
3. You've decided to add a third node to your existing two-node HACMP
cluster. What very important step follows adding the node definition to
the cluster configuration (whether through standard or extended path)?
a. Install HACMP software
b. Configure a non-IP network
c. Start Cluster Services on the new node
d. Add a resource group for the new node
4. What should you do first when removing a node from a cluster?
a. Uninstall HACMP software
b. Move (or take offline) any resource groups online on the node
c. Remove the node's IP address from the rhosts file
Unit 3

1. True or False?
Using C-SPOC reduces the likelihood of an outage by reducing the
likelihood that you will make a mistake.
2. True or False?
C-SPOC reduces the need for a change management process.
3. C-SPOC cannot do which of the following administration tasks?
a. Add a user to the cluster.
b. Change the size of a filesystem.
c. Add a physical disk to the cluster.
d. Add a shared volume group to the cluster.
e. Synchronize existing passwords.
f. None of the above. (e was correct for previous versions)
4. True or False?
It does not matter which node in the cluster is used to initiate a
C-SPOC operation.
5. True or False?
Priority Override Location behavior changed in HACMP V5.4 to
prevent actions that conflict with desired resource group fallback
behavior.

Unit 3
1. True or False?
A star configuration is a good choice for your non-IP networks.
2. True or False?
RSCT will automatically update /etc/filesystems when using
enhanced concurrent mode volume groups
3. True or False?
With HACMP V5.4, a resource group's priority override location can
be cancelled by selecting a destination node of
Restore_Node_Priority_Order.
4. You want to create an enhanced concurrent mode volume group that
will be used in a resource group that will have an Online on Home
Node Startup policy. Which C-SPOC menu should you use?
5. You want to add a logical volume to the volume group you created in
the question above. Which C-SPOC menu should you use?
Unit 4
1. Which daemon, which uses the
/usr/es/sbin/cluster/etc/rhosts file for authentication, do
most inter-node communications use:
a. RSCT
b. clcomd
c. SNMP
d. clinfo
2. True or False: HACMP supports two connection
authentication methods, Standard and Kerberos.
3. True or False: Use of VPN tunnels for cluster
communications requires that nodes are configured with
persistent IP labels, and that HACMP is configured to
use them.

Unit 4
4. True or False: You can enable message encryption
without enabling message authentication.
5. Which of the following is TRUE about configuring
message encryption in HACMP:
a. It is a simple, one-step process
b. It only requires AIX base install and HACMP filesets
c. It requires installing rsct.crypt and performing
tasks on all nodes to enable and implement key
distribution and activation
d. It can only be configured on the command line
6. True or False: AIX Security Expert provides automatic
configuration of security settings, including those for
TCP, NET, IPSEC, system, and auditing.
Appendix B
1. True or False?
HACMP supports all NFS export configuration options.
(/usr/es/sbin/cluster/etc/exports must be used to specify NFS export options if the default of
"read-write to the world" is not acceptable.)
2. Which of the following is a special consideration when using HACMP to NFS
export filesystems? (select all that apply)
a. NFS exports must be read-write.
b. Secure RPC must be used at all times.
c. A cluster may not use NFS Cross-mounts if there are client
systems accessing the NFS exported filesystems.
d. A volume group which contains filesystems which are NFS
exported must have the same major device number on all
cluster nodes in the resource group.
3. What does [/abc;/xyz] mean when specifying a directory to cross-mount?
a. /abc is the name of the filesystem which is exported and /xyz
is where it should be mounted at
b. /abc is where the filesystem should be mounted at and /xyz is
the name of the filesystem which is exported
4. True or False?
HACMP's NFS exporting feature only supports clusters of two
nodes. (Resource groups larger than two nodes which export NFS filesystems do not provide full
NFS functionality; for example, NFS file locks are not preserved across a fallover.)
5. True or False?
IPAT is required in resource groups which export NFS
filesystems.

Appendix C
1. True or False?
In HACMP 5.4, the configuration of WebSMIT is simplified by a new
utility (websmit_config) that configures WebSMIT to be independent
of the system-wide Web server configuration.
2. True or False?
The /usr/es/sbin/cluster/wsm/README file describes the use of the
websmit_config utility.
3. True or False?
Only HACMP SMIT panels can be accessed using Web SMIT.
4. What file controls security settings for Web SMIT?
a. /usr/es/sbin/cluster/wsm/wsm_smit.conf
b. /usr/es/sbin/cluster/wsm/wsm_smit.redirect
c. /usr/es/sbin/cluster/wsm/wsm_smit.log
d. /usr/es/sbin/cluster/wsm/wsm_smit.script
Appendix B. Integrating NFS into HACMP

This unit covers the concepts of using Sun's Network File System in a
highly available cluster. You learn how to configure NFS in an HACMP
environment for maximum availability.

Explain the concepts of Network File System (NFS)
Configure HACMP to support NFS
Discuss why Volume Group major numbers must be unique when
using NFS with HACMP
Outline the NFS configuration parameters for HACMP

Accountability:
Checkpoint
Machine exercises
References
HACMP manuals

Unit Objectives
Explain the concepts of Network File System (NFS)
Configure HACMP to support NFS
Discuss why Volume Group major numbers must be unique
when using NFS with HACMP
Outline the NFS configuration parameters for HACMP
Figure B-1. Unit Objectives
Notes
Objectives
In this unit, we examine how NFS can be integrated into HACMP in order to provide a
Highly Available Network File System.
So, What is NFS?

The Network File System (NFS) is a client/server
application that lets a computer user view and optionally
store and update files on a remote computer as though they
were on the user's own computer
[Diagram labels: NFS Client, NFS Server, NFS Client and Server; NFS mount (read-write), NFS mount (read-only), JFS mount; shared_vg]
Figure B-2. So, What is NFS?
Notes
NFS
NFS is a suite of protocols which allow file sharing across an IP network. An NFS server
is a provider of file service (that is, a file, a directory or a file system). An NFS client is a
recipient of a remote file service. A system can be both an NFS client and server at the
same time.

NFS Background Processes

NFS uses TCP/IP and a number of background processes to
allow clients to access disk resource on a remote server
Configuration files are used on the client and server to
specify export and mount options
[Diagram: NFS Server (n x nfsd and mountd, /etc/exports); NFS Client (n x biod, /etc/filesystems); NFS Client and Server (n x biod, n x nfsd and mountd)]
Figure B-3. NFS Background Processes
Notes
NFS processes
The NFS server uses a process called mountd to allow remote clients to mount a local
disk or CD resource across the network. One or more nfsd processes handle I/O on the
server side of the relationship.
The NFS client uses the mount command to establish a mount to a remote storage
resource which is offered for export by the NFS server. One or more block I/O
daemons, biod, run on the client to handle I/O on the client side.
The server maintains details of data resources offered to clients in the /etc/exports file.
Clients can automatically mount network file systems using the /etc/filesystems file.
Combining NFS With HACMP

NFS exports can be made highly available by using the HACMP
resource group to specify NFS exports and mounts
The A resource group specifies:
- aservice as a service IP label resource
- /fsa as a filesystem resource
- /fsa as a NFS filesystem to export
[Diagram: cluster nodes Bondar and Hudson; the node holding resource group A runs "# mount /fsa" and exports /fsa through the aservice service IP label; the client system runs "# mount aservice:/fsa /a" and so sees /fsa as /a]
Figure B-4. Combining NFS With HACMP
Notes
Combining NFS with HACMP
We can combine NFS with HACMP in order to achieve a Highly Available Network File
System. One node in the cluster mounts the disk resource locally and offers that disk
resource for export across the IP network. Clients optionally mount the disk resource. A
second node is configured to take over the NFS export in the event of node failure.
There is one unusual aspect to the above configuration which should be discussed. The
HACMP cluster is exporting the /fsa file system via the aservice service IP label. The
client is mounting the aservice:/fsa file system on the local mount point /a. This is
somewhat unusual in the sense that client systems usually use a local mount point
which is the same as the NFS file system's name on the server.
In the configuration shown above, there is no particularly good reason why the client is
using a different mount point than /fsa and, in fact, the client is free to use whatever
mount point it wishes to use including, of course, /fsa. Why this example is using a
local mount point of /a will become clear shortly.

NFS Fallover With HACMP

In this scenario, the resource group moves to the surviving node in the
cluster, which exports /fsa. Clients see NFS server not responding
during fallover
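[Diagram: cluster nodes Bondar and Hudson after fallover; resource group A, the aservice service IP label, and the /fsa export are on the surviving node, which runs "# mount /fsa"; the client system still "sees" /fsa as /a]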
Figure B-5. NFS Fallover With HACMP
Notes
Fallover
If the node offering the NFS export should fail, a standby node takes over the shared
disk resource, locally mounts the file system, and exports the file system or directory for
remote mount.
If the client was not accessing the disk resource during the period of the fallover, then it
is not aware of the change in which node is serving the NFS export.
Note that the aservice service IP label is in the resource group which is exporting /fsa.
The HACMP NFS server support requires that resource groups which export NFS
filesystems be configured to use IPAT since the client system is not capable of dealing
with two different IP addresses for its NFS server depending on which node the NFS
server service happens to be running on.
Configuring NFS for High Availability

[SMIT screen (field labels lost in extraction): Volume Groups [aaavg]; remaining entry values, in the order extracted: false, false, [], fsck, sequential, true, [/fsa], [], []]
Figure B-6. Configuring NFS for High Availability
Notes
Configuring NFS for high availability
The visual shows the resource group attributes which are important for configuring an
NFS file system.
- Filesystems/Directories to Export
Specifies the filesystems to be NFS exported.
- Filesystems mounted before IP configured
When implementing NFS support in HACMP, you should also set this option. This
prevents access from a client before the filesystems are ready.
- Filesystem (empty is ALL for VGs specified)
This particular example also explicitly lists the /fsa filesystem as a resource to be
included in the resource group (see the Filesystem (empty is ALL for VGs specified)
field). This is not necessary as this field could have been left blank to indicate that all
the filesystems in the aaavg volume group should be treated as resources within the
resource group.
Only non-concurrent access resource groups

The resource group startup policy cannot be concurrent (Online On All Available Nodes).
Cross-Mounting NFS Filesystems (1 of 3)

A filesystem configured in a resource group can be made
available to all the nodes in the resource group:
One node has the resource group and acts as an NFS server
- Mounts the filesystem (/fsa)
- Exports the filesystem (/fsa)
All nodes act as NFS clients
- Mount the NFS filesystem (aservice:/fsa) onto a local mount point (/a)
[Diagram: the node holding aservice and /fsa acts as an NFS server (exports /fsa) and has /a mounted; the other node acts as an NFS client with /a mounted]
Figure B-7. Cross-mounting NFS Filesystems (1 of 3)
Notes
Cross-mounting
We can use HACMP to mount an NFS exported filesystem locally on all the nodes
within the cluster. This allows two or more nodes to have access to the same disk
resource in parallel. An example of such a configuration might be a shared repository
for the product manuals (read only) or a shared /home filesystem (read-write). One
node mounts the filesystem locally, then exports the filesystem. All nodes within the
resource group then NFS mount the filesystem.
By having all nodes in the resource group act as an NFS client including the node which
holds the resource group, it is not necessary for the takeover node to unmount the
filesystem before becoming the NFS server.

Concurrent access limitations

While the NFS file system can be mounted read-write by multiple nodes, it should be
noted that all of the NFS caching issues that exist with a regular NFS configuration (one
not involving HACMP in any way) still exist. Parallel or concurrent writes are not
supported. For example, applications running on the two cluster nodes should not
attempt to update the same NFS served file: only one of them is likely to succeed, and
the other will see either stale NFS file handle problems or a mysterious loss of changes
made to the file. This is a fundamental issue with NFS.
True concurrent access

Clusters wishing to have true concurrent access to the same filesystem for reading and
writing purposes should use the IBM GPFS (General Parallel File System) product
instead of NFS to share the filesystem across the cluster nodes.

When a fallover occurs, the role of NFS server moves with
the resource group
All (surviving) nodes continue to be NFS clients
[Diagram: after fallover, the surviving node holds aservice and /fsa, acts as an NFS server (exports /fsa), and continues to act as an NFS client with /a mounted]
Notes
Fallover with a cross-mounted file system
If the left-hand node fails, then HACMP on the right-hand node initiates a fallover of the
resource group. This primarily consists of:
- Assigning or aliasing (depending on which flavor of IPAT is being used) the
aservice service IP label to a NIC
- Varying on the shared volume group and mounting the /fsa journaled filesystem
- NFS exporting the /fsa filesystem
Note that the right-hand node already has the aservice:/fsa filesystem NFS mounted
on /a.


Here's a more detailed look at what is going on:
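[Diagram: cluster nodes Bondar and Hudson; resource group A lists /fsa as a NFS filesystem to mount on /a; the node holding the resource group runs "# mount /fsa" and exports /fsa through aservice; each node runs "# mount aservice:/fsa /a"; the client system "sees" /fsa as /a]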
Notes
Cross-mounting details
The key change, compared to the configuration which did not use cross-mounting, is
that this configuration's resource group lists /fsa as a NFS filesystem and specifies that
it is to be mounted on /a. This causes every node in the resource group to act as an
NFS client with aservice:/fsa mounted at /a. Only the node which actually has the
resource group is acting as an NFS server for the /fsa filesystem.

Choosing the Network for Cross-Mounts

In a cluster with multiple IP networks, it may be useful to specify
which network should be used by HACMP for cross-mounts
This is usually done as a performance enhancement
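[Diagram: cluster nodes Bondar and Hudson on two IP networks, net_ether_01 and net_ether_02, with service IP labels aservice and aGservice; the resource group lists /fsa as a NFS filesystem to mount on /a, with net_ether_01 as the network for NFS mounts; the node holding the resource group runs "# mount /fsa" and exports /fsa]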
Figure B-10. Choosing the Network for Cross-Mounts
Notes
Network for NFS mount
HACMP allows you to specify which network should be used for NFS exports from this
resource group.
In this scenario, we have an NFS cross-mount within a cluster which has two IP
networks. The cluster administrator has decided to force the cross-mount traffic to flow
over the net_ether_01 network, probably because that network is either a faster
networking technology or under a lighter load.
This field is relevant only if you have filled in the Filesystems/Directories to NFS
Mount field. The Service IP Labels/IP Addresses field should contain a service label
which is on the network you select.
If the network you have specified is unavailable when the node is attempting to NFS
mount, it will seek other defined, available IP networks in the cluster on which to
establish the NFS mount.

Configuring HACMP for Cross-Mounting

[SMIT screen (field labels lost in extraction): Volume Groups [aaavg]; remaining entry values, in the order extracted: false, false, sequential, [], fsck, true, [/fsa], [/a;/fsa], [net_ether_01]]
Figure B-11. Configuring HACMP for Cross-Mounting
Notes
Configuring HACMP for cross-mounting
The directory or directories to be cross-mounted are specified in the
Filesystems/Directories to NFS Mount field. The network to be used for NFS
cross-mounts is optionally specified in the Network for NFS Mount field.
Cross-mount syntax
Note the /a;/fsa syntax for specifying the directory to be
cross-mounted. This unusual syntax is explained in the next foil.
Note that the resource group must include a service IP label which is on the
net_ether_01 network (aservice in the previous foil).

Syntax for Specifying Cross-Mounts

/a;/fsa
- /a: where the filesystem should be mounted over
- /fsa: what the filesystem is exported as
What HACMP does (on each node in the resource group)
Figure B-12. Syntax for Specifying Cross-Mounts
Notes
Syntax for specifying cross-mounts
The inclusion of a semi-colon in the Filesystems/Directories to NFS Mount field
indicates that the newer (and easier to work with) approach to NFS cross-mounting
described in this unit is in effect. The local mount point to be used by all the nodes in the
resource group when they act as NFS clients is specified before the semi-colon. The
NFS filesystem which they are to NFS mount is specified after the semi-colon.
Since the configuration specified in the last HACMP SMIT screen uses net_ether_01 for
cross-mounts and the service IP label on the net_ether_01 network is aservice (see
the diagram a couple of foils back showing the two IP networks), each node in the
resource group will mount aservice:/fsa on its local /a mount point directory.
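The split around the semi-colon can be sketched in a few lines of Python. This is only an illustration of the syntax described above, not HACMP code; the function name parse_cross_mount and the returned field names are hypothetical.

```python
def parse_cross_mount(entry, service_label):
    """Split a 'Filesystems/Directories to NFS Mount' entry such as '/a;/fsa'.

    Returns the local mount point, the exported filesystem, and the NFS
    source (service_label:exported_fs) that each node would mount.
    """
    if ";" not in entry:
        raise ValueError("no semi-colon: entry does not use the cross-mount syntax")
    mount_point, exported_fs = entry.split(";", 1)
    return {
        "mount_point": mount_point,   # before the semi-colon
        "exported_fs": exported_fs,   # after the semi-colon
        "nfs_source": f"{service_label}:{exported_fs}",
    }


# Values taken from the example in this unit
result = parse_cross_mount("/a;/fsa", "aservice")
print(result["nfs_source"], "is mounted over", result["mount_point"])
```

With the values from the visual, the NFS source is aservice:/fsa and the local mount point is /a.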

Ensuring the VG Major Number is Unique

Any Volume Group which contains a filesystem that is offered for NFS export to
clients or other cluster nodes must use the same VG major number on every node
in the cluster
To display the current VG major numbers, use:
# ls -l /dev/*webvg
crw-rw----   1 root     system      201,  0 Sep 04 23:23 /dev/xwebvg
crw-rw----   1 root     system      203,  0 Sep 05 18:27 /dev/ywebvg
crw-rw----   1 root     system      205,  0 Sep 05 23:31 /dev/zwebvg
The command lvlstmajor lists the major numbers which are available on the node
where it is run; run it on each node in the cluster
For example:
# lvlstmajor
43...200,202,206...
The VG major number may be set at the time of creating the VG using SMIT
mkvg or by using the -V flag on the importvg command, for example:
# importvg -V100 -y shared_vg_a hdisk2
C-SPOC will "suggest" a VG major number which is unique across the nodes
when it is used to create a shared volume group
Figure B-13. Ensuring the VG Major Number is Unique
QV1251.2
Notes
VG major numbers
Volume group major numbers must be the same for any given volume group across all
nodes in the cluster. This is a requirement for any volume group that has filesystems
which are NFS exported to clients (either inside or outside the cluster).
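Picking a major number that is free on every node means intersecting the lvlstmajor output of each node. The following Python sketch assumes the output format shown in the visual (comma-separated numbers and ranges, with a trailing "..." meaning "and up"); the function names are hypothetical, and collecting the output from each node is left out.

```python
def parse_lvlstmajor(output):
    """Parse e.g. '43...200,202,206...' into [(43, 200), (202, 202), (206, None)].

    A high value of None marks an open-ended range ("206 and up").
    """
    ranges = []
    for item in output.strip().split(","):
        item = item.strip()
        if not item:
            continue
        if "..." in item:
            low, high = item.split("...", 1)
            ranges.append((int(low), int(high) if high else None))
        else:
            ranges.append((int(item), int(item)))
    return ranges


def is_free(ranges, major):
    """True if major falls inside one of the free ranges."""
    return any(low <= major and (high is None or major <= high)
               for low, high in ranges)


def lowest_common_free(outputs):
    """Lowest major number that is free on every node, or None.

    The lowest common value must be the start of a free range on some
    node, so only range starts need to be tried.
    """
    parsed = [parse_lvlstmajor(text) for text in outputs]
    candidates = sorted({low for ranges in parsed for low, _ in ranges})
    for major in candidates:
        if all(is_free(ranges, major) for ranges in parsed):
            return major
    return None


# First string is the example from the visual; the second is a made-up second node
print(lowest_common_free(["43...200,202,206...", "201..."]))
```

The number found this way could then be passed to importvg -V or entered in SMIT mkvg on each node.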

NFS With HACMP Considerations

Some points to note...
- Resource groups which export NFS filesystems MUST implement IPAT.
- The Filesystems mounted before IP configured resource group attribute must be
  set to true.
- HACMP does not use /etc/exports and the default is to export filesystems rw to
  the world. Specify NFS export options in /usr/es/sbin/cluster/etc/exports if
  you want better control (AIX 5.2 provides an option to specify this path)
- HACMP only preserves NFS locks if the NFS exporting resource group has no more
  than two nodes.
Figure B-14. NFS with HACMP Considerations
Notes
HACMP exports file
As mentioned in the visual, if you need to specify NFS options, you must use the
HACMP exports file, not the standard AIX exports file. You can use AIX smit mknfsexp
to build the HACMP exports file:
Add a Directory to Exports List
* PATHNAME of directory to export                   []                                   /
* MODE to export directory                          read-write
  HOSTS & NETGROUPS allowed client access           []
  Anonymous UID                                     [-2]
  HOSTS allowed root access                         []
  HOSTNAME list. If exported read-mostly            []
  Use SECURE option?                                no                                   +
  Public filesystem?                                no                                   +
* EXPORT directory now, system restart or both      both                                 +
  PATHNAME of alternate Exports file                [/usr/es/sbin/cluster/etc/exports]
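As an illustration of what an entry in the HACMP exports file might look like, the Python sketch below builds one line in the standard AIX exports format (directory, then a dash and a comma-separated option list, with host lists separated by colons). It assumes the HACMP exports file uses the same format as /etc/exports; the helper name and the host names are hypothetical.

```python
def format_export_entry(directory, **options):
    """Build one exports-file line such as '/fsa -root=nodea:nodeb,access=client1'.

    An option given as True is written as a bare flag (for example ro);
    an option given as a list of hosts is written as name=host1:host2.
    """
    parts = []
    for name, value in options.items():
        if value is True:
            parts.append(name)
        else:
            parts.append(f"{name}={':'.join(value)}")
    return f"{directory} -{','.join(parts)}" if parts else directory


# Hypothetical hosts: two cluster nodes with root access, one client
print(format_export_entry("/fsa", root=["nodea", "nodeb"], access=["client1"]))
```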

Checkpoint
1. True or False?
   HACMP supports all NFS export configuration options.
2. Which of the following is a special consideration when using HACMP to NFS
   export filesystems? (select all that apply)
   a. NFS exports must be read-write.
   b. Secure RPC must be used at all times.
   c. A cluster may not use NFS Cross-mounts if there are client
      systems accessing the NFS exported filesystems.
   d. A volume group which contains filesystems which are NFS
      exported must have the same major device number on all
      cluster nodes in the resource group.
3. What does [/abc;/xyz] mean when specifying a directory to cross-mount?
   a. /abc is the name of the filesystem which is exported and /xyz
      is where it should be mounted at
   b. /abc is where the filesystem should be mounted at and /xyz is
      the name of the filesystem which is exported
4. True or False?
   HACMP's NFS exporting feature only supports clusters of two nodes.
5. True or False?
   IPAT is required in resource groups which export NFS filesystems.
Figure B-15. Checkpoint
Notes

Unit Summary
HACMP provides a means to make Network File System (NFS) highly available
- Configure Filesystem/Directory to Export and Filesystems mounted before IP
  started in resource group
- VG major number must be the same on all nodes
- Clients NFS mount using service address
- In case of node failure, takeover node acquires the service address,
  acquires the disk resource, mounts the file system and NFS exports
  the file system
- Clients see NFS server not responding during the fallover
NFS file systems can be cross-mounted across all nodes
- Faster takeover: takeover node does not have to unmount the file system
- A preferred network can be selected
- Really only for read only file systems: NFS cross-mounted file systems
  can be mounted read-write, but concurrent write attempts will produce
  inconsistent results
- Use GPFS for true concurrent access
Non-default export options can be specified in /usr/es/sbin/cluster/etc/exports
Figure B-16. Unit Summary
Notes

Appendix C. Using WebSMIT

This unit describes how to configure and use WebSMIT.

What You Should Be Able to Do
- Configure and use WebSMIT

How You Will Check Your Progress
Accountability:
- Checkpoint

References
- HACMP manuals
Unit Objectives
Configure and use WebSMIT
Figure C-1. Unit Objectives
Notes:
Web-Enabled SMIT (WebSMIT)

HACMP V5.2 and up includes a web-enabled user interface
that provides easy access to:
- HACMP configuration and management functions
- Interactive cluster status display and manipulation
- HACMP online documentation
The WebSMIT interface is similar to the ASCII SMIT
interface. You do not need to learn a new user interface or
terminology and can easily switch between ASCII SMIT and
WebSMIT
To use the WebSMIT interface, you must configure and run
a Web server process on the cluster nodes to be
administered
- The configuration has been made simpler with HACMP 5.4 and later
- Use websmit_config utility
Figure C-2. Web-Enabled SMIT (WebSMIT)
Notes:
Introduction
WebSMIT combines the advantages of SMIT with the ease of access from any system
which runs a browser.
For those looking for a graphical interface for managing and monitoring HACMP,
WebSMIT provides those capabilities via a web browser. It provides real-time graphical
status of the cluster components, similar to clstat.cgi. It also provides context menus
on those components; choosing an item launches a WebSMIT menu containing the
action(s) to take. There are multiple views: Node-by-node, Resource Group,
Associations, component Details, and so on.
Configuration
This utility uses SNMP, so it is imperative that the SNMP interface to the
Cluster Manager is functioning. To test that, run the cldump command on the system
where you will be running the WebSMIT utility. A configuration utility
(websmit_config) is provided; it requires only that a supported http server is
installed in order to configure the system for use as a WebSMIT server. A control
tool, websmitctl, is provided as well to control the http server.

WebSMIT Main Page
HACMP SMIT access
Figure C-3. WebSMIT Main Page
Notes:
Introduction
To connect to WebSMIT, point your browser to the cluster node that you have
configured for WebSMIT.
WebSMIT uses port 42267 by default.
After authentication, this will be the first screen that you see. Note the Navigation Frame
(left side) and the Activity Frame (right side). Also, note that we're looking at
configuration options only. Each frame has tabs which provide access to different status,
functions or controls.
Navigation Frame Tabs
- SMIT - access to HACMP SMIT
- N&N - a node-by-node relationship and status view of the cluster (if SNMP can get
cluster information)
- RGs - a resource group relationship and status view of the cluster
Expand All / Collapse All links can be used to get the full view or clean up the view.
Activity Frame Tabs
- Configuration - permanent access to HACMP SMIT from Activity Frame
- Details - comes to top when a component is selected in Navigation Frame, and
displays configuration information about the component
- Associations - shows component relationship to other HACMP components for
component that is selected in the Navigation Frame
- Doc - If the HACMP pubs were installed (html or pdf version), this tab will display
links to access them
Don't attempt to navigate using the browser's Back/Forward buttons. Note the FastPath
box at the bottom of the Configuration Tab. This allows you to go directly to any SMIT
panel (HACMP or other) if you know the fastpath.
WebSMIT Context Menu Controls
Activity Frame changes

Right mouse click on app_server
Choose an item from the context menu
Figure C-4. WebSMIT Context Menu Controls
Notes:
Using the context menus
Right-click the object in the Navigation Frame. Choose the item you want to control from
the context menu and watch the Activity Frame change to the task you're trying to
perform. Remember this is still SMIT, so you'll get HACMP SMIT menus as a result of
the context menu selections.
Status
Notice that the icons (on the screen anyway) are color coded. This is real-time status.
More to come on the next visual, regarding the associations.

WebSMIT Associations
Figure C-5. WebSMIT Associations
Notes:
Associations
To see associations, go to the RGs tab, select (left mouse click) a resource group, then
select the Associations tab.
If you don't click fast enough (that is, if you pause between selecting the
resource group and clicking on the Associations tab), you'll see the Details tab come to
the top of the Activity Frame with the configuration details of the resource group.
WebSMIT Online Documentation
Figure C-6. WebSMIT Online Documentation
Notes:
Online documentation
This screen allows you to view the HACMP manuals in either HTML or PDF format. You
must install the HACMP documentation filesets.

WebSMIT Configuration
/usr/es/sbin/cluster/wsm/README
Setting up WebSMIT online documentation
- Install cluster.doc.en_US.es.html and cluster.doc.en_US.es.pdf
Configure and run a Web server on cluster nodes
- websmit_config takes it from there
Security considerations
- wsm_smit.conf
- wsm_cmd_exec
Log files
- wsm_smit.log
- wsm_smit.script
Controlling which SMIT panels can be used
- wsm_smit.allow
- wsm_smit.deny
- wsm_smit.redirect
Figure C-7. WebSMIT Configuration
Notes:
Documentation
The primary source for information on configuring WebSMIT is the WebSMIT README
file as shown in the visual. The HACMP Installation Guide provides some additional
information on installation and the HACMP Administration Guide provides information
on using WebSMIT.
Web server
To use WebSMIT, you must configure one (or more) of your cluster nodes as a Web
server. You must use either IBM HTTP Server (IBMIHS) V6.0 (or later) or Apache 1.3
(or later). Refer to the specific documentation for the Web server you choose.
This configuration is done using the websmit_config utility, located in
/usr/es/sbin/cluster/wsm. See the README file for details.
WebSMIT security
Since WebSMIT gives you root access to all the nodes in your cluster, you must
carefully consider the security implications.
WebSMIT uses a configuration file, wsm_smit.conf, that contains settings for
WebSMIT's security related features. This file is installed as
/usr/es/sbin/cluster/wsm/wsm_smit.conf, and it may not be moved to another
location. The default settings used provide the highest level of security in the default
AIX/Apache environment. However, you should carefully consider the security
characteristics of your system before putting WebSMIT to use. It may be possible to use
different combinations of security settings for AIX, Apache, and WebSMIT to improve
the security of the application in your environment.
WebSMIT uses the following mechanisms to implement a secure environment:
- Non-standard port
- Secure http (https)
- User authentication
- Session time-out
- wsm_cmd_exec setuid program
Use non-standard port

WebSMIT can be configured to allow access only over a specified port using the
wsm_smit.conf AUTHORIZED_PORT setting. If you do not specify an AUTHORIZED_PORT,
or specify a port of 0, then any connections via any port will be accepted. It is strongly
recommended that you explicitly specify the AUTHORIZED_PORT, and that you use a
non-standard port. The default setting for this configuration variable is 42267.
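The AUTHORIZED_PORT rule described above can be expressed as a small predicate. This Python sketch only models the behavior stated in the text (no port specified, or a port of 0, accepts any port); it is not the WebSMIT implementation, and the function name is hypothetical.

```python
DEFAULT_AUTHORIZED_PORT = 42267  # default stated in the text


def port_is_accepted(connection_port, authorized_port=DEFAULT_AUTHORIZED_PORT):
    """Return True if a connection on connection_port would be accepted.

    authorized_port of None (not specified) or 0 means any port is accepted.
    """
    if authorized_port is None or authorized_port == 0:
        return True
    return connection_port == authorized_port


for port in (42267, 80):
    print(port, port_is_accepted(port))
```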
Allow only secure http
If your http server supports secure http, it is strongly recommended that you require all
WebSMIT connections to be established via https. This will ensure that you are not
transmitting sensitive information about your cluster over the Internet in plain text.
WebSMIT can be configured to require secure http access using the wsm_smit.conf
REDIRECT_TO_HTTPS setting. If the value for this setting is 1, then users connecting to
WebSMIT via an insecure connection will be redirected to a secure http connection.
The default value for REDIRECT_TO_HTTPS is 1.
Note: Regarding the REDIRECT_TO_HTTPS variable, the README file states:
This variable will only function correctly if the AUTHORIZED_PORT feature is disabled.
This did not appear to be true in our testing.
Require user authentication
If Apache's built-in authentication is not being used, WebSMIT can be configured to use
AIX authentication using the wsm_smit.conf file REQUIRE_AUTHENTICATION setting. If
the value for this setting is 1 and there is no .htaccess file controlling access to
WebSMIT, the user will be required to provide AIX authentication information before
gaining access.
(Refer to the documentation included with Apache for more details about Apache's
built-in authentication.)
The default value for REQUIRE_AUTHENTICATION is 1. If REQUIRE_AUTHENTICATION is
set, then the HACMP administrator must specify one or more users who are allowed to
access the system. This can be done using the wsm_smit.conf ACCEPTED_USERS
setting. Only users whose names are specified will be allowed access to WebSMIT, and
all ACCEPTED_USERS will be provided with root access to the system. By default, only the
root user is allowed access via the ACCEPTED_USERS setting.
Because AIX authentication mechanisms are in use, login failures can cause an account to
be locked. It is recommended that a separate user be created for the sole purpose of
accessing WebSMIT. If the root user has a login failure limit, failed WebSMIT login attempts
could quickly lock the root account.
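The interaction of REQUIRE_AUTHENTICATION, the .htaccess file and ACCEPTED_USERS can be summarized in a short Python sketch. It models only the rules stated above; the function names are hypothetical and the user name wsmadmin is a made-up example of a dedicated WebSMIT user.

```python
def aix_authentication_required(require_authentication, htaccess_present):
    """AIX authentication is requested when REQUIRE_AUTHENTICATION is 1
    and no .htaccess file controls access to WebSMIT."""
    return require_authentication == 1 and not htaccess_present


def user_may_access(user, accepted_users=("root",)):
    """Only users named in ACCEPTED_USERS are allowed in (default: root only)."""
    return user in accepted_users


print(aix_authentication_required(1, htaccess_present=False))
print(user_may_access("wsmadmin", accepted_users=("wsmadmin",)))
```

Keep in mind that every accepted user is given root access, so the list should stay short.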
Session time-out
Continued access to WebSMIT is controlled through the use of a non-persistent session
cookie. Cookies must be enabled in the client browser in order to use AIX
authentication for access control. If the session is used continuously, then the cookie
will not expire. However, the cookie is designed to time out after an extended period of
inactivity. WebSMIT allows the user to adjust the time-out period using the
wsm_smit.conf SESSION_TIMEOUT setting. This configuration setting must have a value
expressed in minutes. The default value for SESSION_TIMEOUT is 20 (minutes).
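The inactivity rule can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and treating a session as expired only when the idle time is strictly greater than SESSION_TIMEOUT is an assumption, since the text does not describe the boundary case.

```python
from datetime import datetime, timedelta


def session_expired(last_activity, now, session_timeout_minutes=20):
    """True if the idle time exceeds SESSION_TIMEOUT (minutes, default 20)."""
    return (now - last_activity) > timedelta(minutes=session_timeout_minutes)


start = datetime(2007, 7, 1, 9, 0)  # arbitrary example timestamp
print(session_expired(start, start + timedelta(minutes=19)))
print(session_expired(start, start + timedelta(minutes=21)))
```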
Controlling access to wsm_cmd_exec (setuid)
A setuid program is supplied with WebSMIT that allows non-root users to execute
commands with root permissions (wsm_cmd_exec). The setuid bit for this program must
be turned on in order for the WebSMIT system to function.
It is also very important for security reasons that wsm_cmd_exec does not have read
permission for non-root users. It should not be made possible for a non-root user to
copy the executable to another location or to decompile the program.
Thus the utility wsm_cmd_exec (located in /usr/es/sbin/cluster/wsm/cgi-bin/) must be
set with 4511 permissions.
See the README for details.
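To see why 4511 satisfies both requirements, the mode bits can be decoded with Python's standard stat module. The helper name describe_mode is hypothetical; the sketch only inspects a numeric mode and does not touch any file.

```python
import stat


def describe_mode(mode):
    """Decode the permission bits that matter for wsm_cmd_exec."""
    bits = stat.S_IMODE(mode)
    return {
        "octal": format(bits, "04o"),
        "setuid": bool(bits & stat.S_ISUID),
        # read permission for group or other would allow copying the binary
        "readable_by_non_owner": bool(bits & (stat.S_IRGRP | stat.S_IROTH)),
        "executable_by_non_owner": bool(bits & stat.S_IXGRP) and bool(bits & stat.S_IXOTH),
    }


print(describe_mode(0o4511))
```

On a cluster node the same helper could be applied to os.stat(path).st_mode for the installed wsm_cmd_exec.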
Care must be taken to limit access to this executable. WebSMIT allows the user to
dictate the list of users who are allowed to use the wsm_cmd_exec program using the
wsm_smit.conf REQUIRED_WEBSERVER_UID setting. The real user ID of the process
must match the UID of one of the users listed in wsm_smit.conf in order for the
program to carry out any of its functionality. The default value for
REQUIRED_WEBSERVER_UID is nobody.
By default, a Web server CGI process runs as user nobody, and by default it is not
possible for non-root users to execute programs as user nobody. If your http server
configuration executes CGI programs as a different user, it is important to ensure that
the REQUIRED_WEBSERVER_UID value matches the configuration of your Web server. It is
strongly recommended that the http server be configured to run CGI programs as a user
who is not authorized to open a login shell (as with user nobody).
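The REQUIRED_WEBSERVER_UID check amounts to comparing the real UID of the calling process with the UIDs of the listed users. The Python sketch below models that comparison only; the function name, the lookup callback and the UID values are hypothetical (on a real system the lookup could be done with pwd.getpwnam(name).pw_uid).

```python
def caller_is_authorized(real_uid, required_users, lookup_uid):
    """True if real_uid matches the UID of one of the listed user names.

    lookup_uid maps a user name to a numeric UID and raises KeyError
    for an unknown name.
    """
    for name in required_users:
        try:
            if lookup_uid(name) == real_uid:
                return True
        except KeyError:
            continue  # unknown user names never match
    return False


# Made-up UID table standing in for the system user database
example_uids = {"nobody": 4294967294, "webuser": 48}
print(caller_is_authorized(4294967294, ["nobody"], example_uids.__getitem__))
```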
Log files
All operations of the WebSMIT interface are logged to the wsm_smit.log file and are
equivalent to the logging done with smitty -v. Script commands are also captured in
the wsm_smit.script log file.
WebSMIT log files are created by the CGI scripts using a relative path of <../logs>. If
you copy the CGI scripts to the default location for the IBM HTTP Server, the final path
to the logs is /usr/IBMIHS/logs.
The WebSMIT logs are not subject to manipulation by the HACMP Log Viewing and
Management SMIT panel. Also, just like smit.log and smit.script, the files grow
indefinitely.
The snap -e utility captures the WebSMIT log files if you leave them in the default
location (/usr/es/sbin/cluster/wsm/logs); but if you install WebSMIT somewhere else,
snap -e will not find them.
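Because the log directory is resolved relative to the CGI scripts, its final location follows from wherever the scripts are placed. A minimal Python sketch of that resolution follows; the function name is hypothetical, and /usr/IBMIHS/cgi-bin is an assumed script location chosen to match the /usr/IBMIHS/logs path mentioned above.

```python
import posixpath


def websmit_log_dir(cgi_bin_dir):
    """Resolve the relative path ../logs against the CGI script directory."""
    return posixpath.normpath(posixpath.join(cgi_bin_dir, "../logs"))


print(websmit_log_dir("/usr/es/sbin/cluster/wsm/cgi-bin"))
print(websmit_log_dir("/usr/IBMIHS/cgi-bin"))  # assumed IBM HTTP Server location
```

Only the first result is a directory that snap -e collects, according to the text above.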
Customizing the WebSMIT status panel

wsm_clstat.cgi displays cluster information in the WebSMIT status panel. You can
customize wsm_clstat.cgi by changing the
/usr/es/sbin/cluster/wsm/cgi-bin/wsm_smit.conf file. This file allows you to configure
logging and the popup menus for the WebSMIT status panel.
Controlling which SMIT screens can be used

As mentioned earlier, WebSMIT will process just about any valid SMIT panel. You can
limit the set of panels that WebSMIT will process by configuring one or more of these
files.
- wsm_smit.allow
If this file exists on the server, it will be checked before any SMIT panel is
processed. If the SMIT panel id (fast path) is not contained in the file, the http
request will be rejected. Use this file to limit WebSMIT to a specific set of SMIT
panels. A sample file is provided which contains all the SMIT panel ids for HACMP.
Simply rename this file to wsm_smit.allow if you want to limit access to just the
HACMP SMIT panels.
- wsm_smit.deny
Entering a SMIT panel id in this file will cause WebSMIT to deny access to that
panel. If the same SMIT panel id is stored in both the .allow and .deny files, .deny
processing takes precedence.

- wsm_smit.redirect
Instead of simply rejecting access to a specific page, you can redirect the user to a
different page. The default .redirect file has entries to redirect the user from specific
HACMP SMIT panels that are not supported by WebSMIT.
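The way the three files combine can be sketched as a small decision function in Python. It models only the rules stated above (an existing .allow file restricts access to the listed panels, and .deny takes precedence over .allow); checking .redirect after .deny is an assumption, the function name is hypothetical, and the panel ids other than hacmp are made-up placeholders.

```python
def decide(panel_id, allow=None, deny=(), redirect=None):
    """Return ('allow' | 'deny' | 'redirect', target) for a SMIT panel id.

    allow=None models the case where wsm_smit.allow does not exist.
    """
    redirect = redirect or {}
    if panel_id in deny:
        return ("deny", None)            # .deny wins over .allow
    if allow is not None and panel_id not in allow:
        return ("deny", None)            # not in an existing .allow file
    if panel_id in redirect:
        return ("redirect", redirect[panel_id])
    return ("allow", None)


allow_list = {"hacmp", "example_panel_a"}
deny_list = {"example_panel_a"}
print(decide("hacmp", allow=allow_list, deny=deny_list))
print(decide("example_panel_a", allow=allow_list, deny=deny_list))
```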
Using the online documentation feature

To use the online documentation feature, you must install the file sets shown in the
visual.
See the README file for details.

Checkpoint
1. True or False?
   In HACMP 5.4, the configuration of WebSMIT is simplified by a new utility
   (websmit_config) that configures WebSMIT to be independent of the
   system-wide Web server configuration.
2. True or False?
   The /usr/es/sbin/cluster/wsm/README file describes the use of the
   websmit_config utility.
3. True or False?
   Only HACMP SMIT panels can be accessed using WebSMIT.
4. What file controls security settings for WebSMIT?
   a. /usr/es/sbin/cluster/wsm/wsm_smit.conf
   b. /usr/es/sbin/cluster/wsm/wsm_smit.redirect
   c. /usr/es/sbin/cluster/wsm/wsm_smit.log
   d. /usr/es/sbin/cluster/wsm/wsm_smit.script
Figure C-8. Checkpoint
Notes:

Unit Summary
WebSMIT provides a graphical user interface for HACMP configuration,
management, and monitoring from a browser
- It uses SNMP to provide information about the cluster
- Requires that a Web server is installed
- It uses port 42267 by default
- A configuration utility called websmit_config provides automatic
  configuration if Apache or IBM HTTP Server is installed
The WebSMIT interface provides access to documentation if it is installed
Security is configured in the /usr/es/sbin/cluster/wsm/wsm_smit.conf file
- REDIRECT_TO_HTTPS, AUTHORIZED_PORT, REQUIRE_AUTHENTICATION, ACCEPTED_USERS
Figure C-9. Unit Summary
Notes:
