Student Notebook
ERC 1.2
V3.1.0.1
Contents
Course Description
Agenda
Unit 1. HACMP Concept Review
Unit Objectives
Fundamental HACMP Concepts
HACMP's Topology Components
HACMP's Resource Components
Networking Review: IPAT
Networking Review: Configuration Rules
Just What Does HACMP Do?
What Happens When Something Fails?
What Happens When a Problem is Fixed?
Resource Group Behavior?
So, What is HACMP Really?
Additional Features of HACMP
Some Assembly Required
HACMP V5.4 Limits
Things HACMP Does Not Do
When HACMP Is Not The Correct Solution
Sources of HACMP Information
Checkpoint
Unit Summary
Unit 2. Configuring Shared Storage for HACMP
Unit Objectives
Data and Storage Basics
LVM Components
LVM Volume Groups
High Availability Data/Storage Issues
Configuring a Mirrored File System for HACMP
Shared Storage Considerations
Serial Access Requirements
Reserve/Release Voluntary VG Takeover
Reserve/Release Involuntary VG Takeover
RSCT Based Voluntary VG Takeover
RSCT Based Involuntary VG Takeover
Synchronizing Changes
Quorum Issues
Quorum/Mirror Choices
HACMP Forced Varyon
Recommendations for Forced Varyon
Guidelines
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Course Description
HACMP II: Administration
Purpose
This course is part of an HACMP curriculum designed to prepare students to support
customers who are using HACMP. This course teaches how to administer a highly
available cluster using HACMP Version 5.4 on an IBM pSeries server running AIX 5L
V5.2 or V5.3.
Audience
This course is intended for AIX technical support personnel and AIX system
administrators.
Prerequisites
Students attending this course are expected to have:
- AIX TCP/IP, LVM storage, and disk hardware implementation skills
- An understanding of basic HACMP concepts and the ability to install and configure a
basic two-node cluster in standby configuration
These skills are addressed in the following course and its prerequisites, or can be
obtained through equivalent education and experience:
- AHQV120: HACMP-I: Installation and Initial Configuration
Curriculum relationships
This course is the second course in our HACMP support curriculum:
- HACMP-I: Installation and Initial Configuration
HACMP-I is an introductory course designed to prepare students to install and
configure a highly available cluster using HACMP Version 5.4 on an IBM pSeries
server running AIX 5L V5.2 or V5.3.
- HACMP-II: Administration
HACMP-II teaches how to administer a highly available cluster using HACMP
Version 5.4 on an IBM pSeries server running AIX 5L V5.2 or V5.3.
- HACMP-III: Extended Configuration
HACMP-III teaches more advanced HACMP administration, including extended
configuration, cluster event flow and monitoring cluster status.
- HACMP-IV: Application Integration
HACMP-IV describes the requirements for successful application integration and
monitoring. Students will integrate a real application into HACMP and will resolve
application problems.
- HACMP-V: Problem Determination
HACMP-V introduces HACMP problem determination concepts and techniques,
including: common failures, strategies, tools and log files. Students will resolve LVM
and CSPOC problems, networking and RSCT problems and event script problems.
Agenda
(1:00) Welcome
(1:00) Unit 1 - HACMP Concept Review
(2:30) Unit 2 - Configuring Shared Storage for HACMP
(0:30) Exercise 1 - Configure Shared Storage for HACMP
(3:00) Exercise 2 - Create a Mutual Takeover Cluster
(2:30) Unit 3 - HACMP Administration
(1:00) Unit 4 - HACMP Security
(3:00) Exercise 3 - HACMP Administration
(OPTIONAL) Exercise 4 - HACMP Security
Appendix B - Integrating NFS into HACMP
Appendix C - Using WebSMIT
Text highlighting
The following text highlighting conventions are used throughout this book:
Bold
Italics
Monospace
Monospace bold
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4861-09 HACMP for AIX, Version 5.4 Planning Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html
HACMP for AIX manuals
Unit Objectives
After completing this unit, you should be able to:
Discuss basic fundamental concepts of HACMP for AIX
Outline the features of HACMP for AIX
Review the features, components, and limits of an HACMP for AIX cluster
Explain how HACMP for AIX operates in typical cases
Describe some of the considerations and limits of an HACMP cluster
Locate HACMP sources of information
QV1251.2
Notes
Terminology
A clear understanding of the above concepts and terms is important as they appear
over and over again both in the remainder of the course and throughout the HACMP
documentation, log files, and SMIT screens.
[Figure: HACMP's Topology Components. Labels: Cluster, Node, IP Network, Non-IP Network, Communication Interface, Communication Device]
Notes
Topology components
An HACMP cluster's topology comprises nodes (System p servers or LPARs) and the IP and
non-IP networks that connect them. IP networks are built from communication
interfaces (for example, Ethernet or token-ring network adapters); non-IP networks are
built from communication devices (for example, /dev/tty for RS232).
Nodes
In the context of HACMP, the term node means any IBM System p server that is a member
of a high availability cluster running HACMP. This includes a logical partition
(LPAR) running AIX and HACMP. A node can be a member of at most one cluster.
Networks
Networks consist of IP and non-IP networks. The non-IP networks ensure that cluster
monitoring can continue if there is a total loss of IP communication. Configuring at
least one non-IP network in an HACMP cluster is strongly recommended.
[Figure: HACMP's Resource Components. Labels: Resource Group, Node, Runtime Policies, Resources, Application Server, Service IP Address, Volume Group, File System]
Copyright IBM Corporation 2007
Notes
Resource group
A resource group is a collection of resources treated as a unit, together with the list
of nodes on which it can potentially be activated and the policies the cluster manager
uses to decide which node to choose during startup, fallover, and fallback. A cluster
may have more than one resource group (usually one for each application), which allows
for very flexible configurations.
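The following Python sketch models the definition above as a data structure. It is illustrative only: the class, its field names, the policy strings, and the node-selection rule (first listed node that is available) are assumptions made for this example, not HACMP structures or policy names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ResourceGroup:
    """Hypothetical model of a resource group; not an HACMP data structure."""
    name: str
    resources: List[str]   # resources that move together as a unit
    nodes: List[str]       # nodes the group may be activated on, in preference order (assumed)
    policies: Dict[str, str] = field(default_factory=dict)  # free-form labels (assumed)

    def choose_node(self, available: Set[str]) -> Optional[str]:
        """Return the first listed node that is currently available, if any."""
        for node in self.nodes:
            if node in available:
                return node
        return None


app_a = ResourceGroup(
    name="appA_rg",
    resources=["service IP address", "volume group", "file system", "application server"],
    nodes=["neo", "trinity"],
    policies={"startup": "prefer first node", "fallover": "next node in list", "fallback": "never"},
)

print(app_a.choose_node({"neo", "trinity"}))  # neo
print(app_a.choose_node({"trinity"}))         # trinity
```

The point of the sketch is that the resources, the candidate nodes, and the policies travel together: the cluster manager acts on the group, not on individual resources.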
Resources
Resources are logical components that are made highly available by HACMP. Because
they are logical components, they can be moved without human intervention.
The visual shows a typical set of resources used in resource groups: service IP
addresses, volume groups, file systems, and application servers.
Notes:
IP Address Takeover (IPAT)
HACMP keeps service and persistent addresses and labels highly available using IP
Address Takeover or IPAT. This allows HACMP to move an address to another NIC or
node when the component supporting the address fails.
An HACMP network can be configured to use either IPAT via IP Aliasing or IPAT via IP
Replacement. When aliasing is used, service labels are aliased onto interfaces,
maintaining the existing configuration (the non-service addresses are still available from
the affected interfaces). When replacement is used, service labels replace the
non-service address configured on an interface.
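The difference between the two IPAT methods can be sketched as a toy model in Python. The function names and the example addresses are hypothetical; the model only tracks which addresses are configured on one interface and does not reflect how HACMP or AIX implement the change.

```python
def ipat_via_aliasing(interface_addrs, service_addr):
    """Aliasing: the service address is added; existing addresses stay configured."""
    return interface_addrs + [service_addr]


def ipat_via_replacement(interface_addrs, service_addr):
    """Replacement: the service address replaces the non-service address."""
    return [service_addr]


en0 = ["192.168.1.1"]     # non-service address (example value)
service = "10.10.10.5"    # service address (example value)

print(ipat_via_aliasing(en0, service))     # ['192.168.1.1', '10.10.10.5']
print(ipat_via_replacement(en0, service))  # ['10.10.10.5']
```

With aliasing, the non-service address remains reachable on the interface; with replacement, it does not.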
Service IP addresses
Notes:
Non-service address rules
When heartbeating over IP interfaces is used, all interfaces on a node must be
configured with IP addresses that are on different subnets, so that topology services
can accurately diagnose network component failures (using heartbeat rings).
Heartbeating over IP aliases removes this subnet restriction: you specify a base
address for the heartbeat subnets, and HACMP configures the heartbeat rings using
IP aliasing.
You define non-service IP addresses and labels using AIX (smitty mktcpip, smitty
chinet). A node will boot with non-service addresses configured on its interfaces by AIX.
These addresses and labels are listed in the /etc/hosts file on each node, along with
any service labels and addresses.
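As a planning aid, the different-subnets rule for heartbeating over IP interfaces can be checked with a few lines of Python. This is a minimal sketch using the standard ipaddress module; the function name, interface names, and addresses are made up for the example, and it is not an HACMP verification tool.

```python
import ipaddress
from itertools import combinations


def interfaces_sharing_a_subnet(interfaces):
    """Return pairs of interface names whose addresses fall in overlapping subnets.

    `interfaces` maps an interface name to an 'address/prefix-length' string.
    """
    networks = {
        name: ipaddress.ip_interface(cidr).network
        for name, cidr in interfaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Example non-service addresses on one node (hypothetical values).
node_interfaces = {
    "en0": "192.168.1.1/24",
    "en1": "192.168.2.1/24",
    "en2": "192.168.1.77/24",  # same subnet as en0: violates the rule
}

print(interfaces_sharing_a_subnet(node_interfaces))  # [('en0', 'en2')]
```

An empty result means every interface on the node is on its own subnet.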
HACMP functions:
Monitor the states of nodes, networks, network
adapters/devices
Strive to keep resource groups highly available
Optionally, HACMP can monitor the state of the application(s)
and can be customized to react to every possible failure
Notes
HACMP basic functions
HACMP directly detects four kinds of failures:
- A communications adapter or device failure
- A node failure
- A network failure (all communication adapters/devices on a given network)
- An application failure (requires application monitors)
Most other failures are handled outside HACMP, either by AIX or LVM, and can be
handled in HACMP via customization. Customization that allows HACMP to react to
loss of quorum for a volume group is built in.
Notes
How HACMP responds to a failure
HACMP generally responds to a failure by using an equivalent but still available
component to take over the duties of the failed component. For example, if a node fails,
HACMP initiates a fallover (for non-concurrent resource groups), which moves the
resource groups that were on the failed node to a surviving node. If a Network
Interface Card (NIC) fails, HACMP usually moves any IP addresses being used by
clients to another available NIC. If no available NICs remain, HACMP initiates a
fallover. If only one resource group is affected, only that resource group is moved
to another node.
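The NIC-failure case described above can be sketched as a small decision function. This is a simplified illustration in Python, not HACMP's event logic: the function name, the action tuples, and the choice of the first surviving NIC are assumptions made for the example.

```python
def respond_to_nic_failure(failed_nic, nic_is_up, service_ips):
    """Return the actions taken on one node after `failed_nic` fails.

    nic_is_up   -- maps each NIC on the node to True (up) or False (down)
    service_ips -- list of (ip_address, nic, resource_group) tuples
    """
    state = dict(nic_is_up)
    state[failed_nic] = False
    surviving = [nic for nic, up in state.items() if up]
    affected = [(ip, rg) for ip, nic, rg in service_ips if nic == failed_nic]

    if not affected:
        return []
    if surviving:
        # Another NIC is available: move the client-facing addresses to it.
        return [("move_ip", ip, surviving[0]) for ip, _ in affected]
    # No NIC left: fall over only the resource groups that are affected.
    groups = sorted({rg for _, rg in affected})
    return [("fallover", rg) for rg in groups]


ips = [("10.10.10.5", "en0", "appA_rg")]
print(respond_to_nic_failure("en0", {"en0": True, "en1": True}, ips))
print(respond_to_nic_failure("en0", {"en0": True, "en1": False}, ips))
```

The first call moves the address to en1; the second finds no surviving NIC and falls over only appA_rg.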
Notes
How HACMP responds to a recovery
When a previously failed component recovers, it must be reintegrated into the
cluster (reintegration is the process of HACMP recognizing that the component is
available for use again). Some components, like NICs, are reintegrated automatically
when they recover. Other components, like nodes, are usually not reintegrated
until the cluster administrator explicitly requests the reintegration (by starting the
HACMP daemons on the recovered node).
[Figure: Resource Group Behavior. Diagram labels: applications A and B; nodes neo, trinity, and zion]
Concurrent
Application must be designed to run simultaneously on multiple nodes
This has the potential for essentially zero downtime and is designed for fault
tolerance and high performance
The application must be specifically written for the environment
Notes
Non-concurrent mode
In non-concurrent mode, HACMP runs an application on a single node and falls over to a
standby node if a failure occurs. This mode is used to build mutual takeover clusters,
in which each node runs an application. Mutual takeover configurations are very
popular for HACMP because they support two highly available applications at a cost
that is not much higher than running the two applications in separate stand-alone
configurations.
Each cluster node probably needs to be somewhat larger than a stand-alone node,
because each must be capable of running both applications, possibly in a slightly
degraded mode, should one of the nodes fail.
Concurrent mode
HACMP also supports resource groups in which the application is active on multiple
nodes simultaneously (online on all available nodes). In such a resource group, all
nodes run a copy of the application and share simultaneous access to the disk. This
style of cluster is often referred to as a concurrent access cluster or concurrent access
environment.
[Figure: So, What is HACMP Really? Components shown: clcomdES; topology manager; resource manager; event manager; SNMP manager; RSCT (topsvcs, grpsvcs, RMC subsystems); snmpd; clinfoES; clstat]
Notes
HACMP core components
HACMP comprises a number of software components:
- The cluster manager, clstrmgrES, is the core process which monitors cluster
membership. The cluster manager includes a topology manager to manage the
topology components, a resource manager to manage resource groups, and an event
manager with event scripts that works through the RMC facility and RSCT to react
to failures.
- In HACMP v5.3/5.4, the cluster manager also contains an SNMP manager which
allows for SNMP-based monitoring to be done using an SNMP manager such as
Tivoli NetView.
- The clinfo process provides an API for communicating between cluster manager
and your application. clinfo also provides remote monitoring capabilities and can
run a script in response to a status change in the cluster. clinfo is an optional
process which can run on both servers and clients (the source code is provided on
request). The clstat command uses clinfo to display status via ASCII, X Window, or
Web browser interfaces.
- In HACMP v5.x, clcomdES provides a secure communication path between the cluster
nodes, which allows them to communicate without using rsh and .rhosts files.
[Figure: Additional Features of HACMP. Labels: ClstrmgrES, OLPW, WebSMIT, Verification/Auto correction, CTT, CSPOC, DARE, SNMP, Tivoli Integration, Application Monitoring]
Notes
Additional features
HACMP also has additional software to provide facilities for administration, testing,
remote monitoring, auto-correction, and verification.
Optional:
Customized pre/post event scripts
Reaction to events
Error notification Methods
User Defined Events (UDEs)
Cluster State Change
Notes
Customization required
Minimally, you will have to create application start and stop scripts. It is strongly
suggested that you create application monitors also, to allow HACMP to handle failure
of the application.
Optional customization
HACMP is shipped with event scripts (Korn Shell scripts) which handle default failure
scenarios. If you have a requirement to customize some special behavior, then this can
be achieved through pre- and post-event scripts or error notification methods and User
Defined Events (UDEs).
Notes
RSCT limit
HACMP uses the Topology Services component of RSCT for monitoring networks and
network interfaces. Topology Services organizes all the interfaces in the topology into
different heartbeat rings. The current version of RSCT Topology Services has a limit of
48 heartbeat rings, which is usually sufficient to monitor networks and network
interfaces. Roughly speaking, the number of heartbeat rings is (usually) very close to
the number of network adapters on the node with the most adapters.
These limits do not tend to be a major concern in most clusters. Refer to the HACMP
documentation for additional information if you are planning a cluster which might
approach some of these limits.
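The rule of thumb above (the number of heartbeat rings is roughly the adapter count of the node with the most adapters) can be turned into a quick planning check. The Python sketch below is only an approximation of that rule; the function name, node names, and adapter counts are invented for the example, and the real ring count is determined by RSCT Topology Services.

```python
MAX_HEARTBEAT_RINGS = 48  # limit quoted above for RSCT Topology Services


def estimated_heartbeat_rings(adapters_per_node):
    """Rough estimate: the adapter count of the node with the most adapters."""
    return max(adapters_per_node.values())


# Hypothetical cluster: node name -> number of network adapters.
cluster = {"neo": 4, "trinity": 6, "zion": 5}

rings = estimated_heartbeat_rings(cluster)
print(rings, rings <= MAX_HEARTBEAT_RINGS)  # 6 True
```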
Notes
Things HACMP does not do
HACMP does not automate your backups, keep time synchronized between the cluster
nodes, or tune your DB2 configuration. These tasks require additional
configuration and software. For example, you can use Tivoli Storage Manager (TSM) as
an enterprise backup solution and a time protocol such as NTP (the xntpd daemon on
AIX) for time synchronization.
Unstable environments:
HACMP cannot make an unstable and poorly-managed
environment stable
HACMP increases the availability of well-managed
systems
Notes
Zero downtime
An example of an environment that requires zero downtime is an intensive care room.
HACMP is also not designed to handle many failures at once.
Security issues
One security issue that is now addressed is the need to eliminate .rhosts files.
Better encryption of inter-node communication is also possible, but this may not be
enough for some security environments.
Unstable environments
The prime cause of problems with HACMP is poor design, planning, implementation,
and administration. If you have an unstable environment, with poorly trained
administrators, easy access to the root password, and a lack of change control,
HACMP is not the solution for you.
With HACMP, the only thing more expensive than employing a professional to plan,
design, install, configure, customize, and administer the cluster is employing an
amateur.
Other characteristics of poorly managed systems are:
- Lack of change control
- Failure to treat the cluster as a single entity
- Lack of documented operational procedures
Notes
Sources of information
There are many excellent sources of HACMP information. Manuals and release notes
come with the product; read them. You can also find the manuals (for all supported
versions of HACMP) online, as well as Redpapers, Redbooks, and whitepapers that
cover many topics.
Checkpoint
1.
2.
3.
4. True or False: A resource may belong to more than one resource group.
5.
Unit Summary
Key points from this unit:
Basic fundamental concepts of HACMP for AIX
Topology, resources, customization
HACMP networks
IPAT, configuration rules
Features of HACMP for AIX
Planning and configuration tools and assistants
Components and limits of an HACMP for AIX cluster
RSCT, SNMP, clstrmgr, clcomd, clinfo
HACMP keeps resource groups and applications highly available
Cluster Manager initiates fallover and fallback according to policies
and conditions
Considerations and limits of an HACMP cluster
No data backup, time synchronization, application configuration
Not fault-tolerant
Security and environment stability considerations
Locate HACMP sources of information
With the product, in courses, and on the web
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4861-09 HACMP for AIX, Version 5.4 Planning Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html
HACMP manuals
Unit Objectives
After completing this unit, you should be able to:
Discuss the issues to make data and storage highly
available.
Describe how access to shared storage is controlled in an
HACMP cluster
Explain how enhanced concurrent mode volume groups are
used
Explain the issue of PVID consistency within an HACMP
cluster
Discuss how LVM aids cluster availability
Describe the quorum issues associated with HACMP
Set up LVM for maximum availability
Configure a new shared volume group, file system, and
jfslog
[Figure: Data and Storage Basics. Layers shown: LVM; device support (hdisks, vpath; drivers: SDD, MPIO); hardware adapter (SCSI, SSA, HBA/FC (SAN), OEM (EMC)); disks (LUNs) carrying the PVID and VGDA; storage system (DS8000, DS6000, DS4000, SAN Volume Controller, 2104, ESS 2105). Sample lspv output on node1 lists hdisk0 through hdisk3 with PVIDs 00013c26f4222080, 00013c26be8aabbe, 00013c260ce205d2, and 00013c26beea7727, and volume group entries rootvg (active), appB_vg, glvm_vg, and None]
Determine HACMP compatibility levels
Notes:
Introduction
It is assumed in this course that you have had experience with AIX LVM management
and the storage systems that you will be using. The purpose of this unit is to bring out
the information that is relevant to an HACMP environment.
It is important to remember that some information is kept both in AIX and on the disk
(LUN). This information includes the VGDA and especially the PVID.
Storage systems
SAN (SDD,HBA)
IBM Storage Subsystems currently supported include:
DS8000 / DS6000 families
DS4000 family
SAN Volume Controller (SVC)
IBM Storage Subsystem support with HACMP is announced via Flash
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Flashes
Determine the HACMP compatibility levels for the following:
HBA device driver
AIX patch levels
Multi-pathing software (SDD, RDAC, MPIO PCM, and so on)
Device microcode/firmware
With most IBM SAN Storage devices, the multi-pathing software will be the Subsystem
Device Driver (SDD). It is supported with HACMP (with appropriate PTFs).
To use C-SPOC with VPATH disks, SDD 1.3.1.3, or later, is required.
For levels and maintenance, check:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S4000065&loc=en_US&cs=utf-8&lang=en
SCSI
It is most likely you will be using an IBM 2104 Expandable Storage Plus device if you
are attaching via SCSI. It is also possible, though unlikely, that you would connect
an ESS (2105) to your pSeries system using SCSI.
SSA
- SSA is no longer marketed.
- SSA uses a loop technology which offers multiple data paths to disk. Each loop has
restrictions on the number and type of adapters. For example:
SSA loops can support eight adapters per loop (Maximum of eight HACMP
nodes sharing SSA disks)
Adapters used in RAID mode are limited to two per loop
For additional information see:
- Redbook, Understanding SSA Subsystems in Your Environment,
SG24-5750-00
- http://www-03.ibm.com/servers/storage/support/disk/7133/index.html
- You can use IBM 7133 and 7131-405 SSA disk subsystems as shared external disk
storage devices to provide concurrent access in an HACMP cluster configuration.
- SSA adapters
The capabilities of SSA adapters have improved over time:
- Only 6215, 6219, 6225 and 6230 adapters support Target Mode SSA and RAID5.
- Only the 6230 adapter with the 6235 Fast Write Cache Option feature code supports
enabling the write cache with HACMP.
Compatible adapters: 6214 + 6216 or 6217 + 6218 or 6219 + 6215 + 6225 + 6230
For more information and microcode updates (go to SSA downloadable files):
http://www-03.ibm.com/servers/storage/support/disk/7133/downloading.html
Features and functionality of otherwise identical adapters and drives can vary
depending on the level of microcode installed on the devices, so be careful!
Note: AIX V5.2+ does not support the MCA 6214, 6216, 6217 and 6219 SSA adapters.
It is always a good idea to contact IBM support.
HACMP compatibility
Compatibility Flashes can be found at
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Flashes
Hints, Tips and Technotes can be found at
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Technotes
HACMP Release Notes are shipped with the product
LVM Components
LVM manages the components of the disk subsystem. Applications talk to the
disks through LVM.
This example shows an application writing to a file system which has its LVs
mirrored in a volume group physically residing on separate hdisks.
[Figure: LVM Components. An application writes to a file system (FS) on a mirrored logical volume (LV); the logical partitions map to physical partitions on separate hdisks within a volume group]
Notes:
LVM relationships
An application writes to a filesystem. A filesystem provides the directory structure and
is used to map the application data to logical partitions of a logical volume. Because
LVM sits between the two, the application is isolated from the physical disks. LVM can
be configured to map a logical partition to up to three physical partitions and have
each physical partition (copy) reside on a different disk. The disks can be of
different types and sizes.
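The mapping rule above (one logical partition, up to three physical partition copies, each on a different disk) can be illustrated with a toy allocator in Python. The function name, the disk names, and the "most free partitions first" choice are assumptions for this sketch; LVM's actual allocation policies are more involved.

```python
def place_copies(free_pps_per_disk, copies):
    """Choose `copies` different disks to hold the copies of one logical partition.

    free_pps_per_disk -- maps a disk name to its number of free physical partitions
    copies            -- number of copies (mirrors) requested, from 1 to 3
    """
    if not 1 <= copies <= 3:
        raise ValueError("a logical partition maps to one, two, or three physical partitions")
    candidates = [disk for disk, free in free_pps_per_disk.items() if free > 0]
    if len(candidates) < copies:
        raise ValueError("not enough disks with free partitions for separate copies")
    # Assumed tie-break for the sketch: prefer the disks with the most free partitions.
    candidates.sort(key=lambda disk: -free_pps_per_disk[disk])
    return candidates[:copies]


free = {"hdisk1": 10, "hdisk2": 8, "hdisk3": 0}
print(place_copies(free, 2))  # ['hdisk1', 'hdisk2']
```

Requesting three copies with the same input raises an error, because only two disks have free partitions and each copy must sit on a different disk.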
[Figure: LVM Volume Groups. lsvg output excerpt: VG IDENTIFIER; PP SIZE: 8 MB; TOTAL PPs: 537 (4296 MB); ...; Auto-Concurrent: Disabled]
Notes:
Classic and enhanced concurrent mode (ECM) volume groups
History
Concurrent mode volume groups were created to allow multiple nodes to access the
same logical volumes concurrently.
The original concurrent mode volume groups are only supported on Serial DASD and
SSA disks in conjunction with the 32-bit kernel.
Beginning with AIX V5.1, the enhanced concurrent mode volume group was introduced
to extend the concurrent mode support to all other disk types and to the 64-bit kernel.
Enhanced concurrent volume groups can also be used in a non-concurrent
environment to provide RSCT-based shared storage protection.
Notes:
Data access failure requires redundancy
HACMP does not provide data redundancy. Data must be striped or mirrored across
multiple physical drives (generally presented to AIX as a LUN). The redundancy can be
provided by the storage system (for example, using RAID 5) or by AIX using LVM
mirroring; with JBOD (Just a Bunch of Disks), it is provided by AIX using LVM
mirroring. HACMP is not aware of the method being used.
- Data can be mirrored on three disks rather than having just two copies of data. This
provides higher availability in the case of multiple failures, but does require more
disks for the three copies.
- The disks used in the physical volumes could be of mixed attachment types.
- Instead of entire disks, individual logical volumes are mirrored. This provides
somewhat more flexibility in how the mirrors are organized. It also allows for an odd
number of disks to be used and provides protection for disk failures when more than
one disk is used.
- The disks can be configured so that mirrored pairs are in separate sites or in
different power domains. In this case, after a total power failure on one site,
operations can continue using the disks on the other site that still has power. No
information is displayed on the physical location of each disk when mirrored logical
volumes are being created, unlike when creating RAID 1 or RAID 0+1 arrays, so
allocating disks on different sites requires considerable care and attention.
- Mirrored pairs can be on different adapters.
- Read performance is good for short length operations as data can be read from
either of two disks, so the one with the shortest queue of commands can be used.
Write performance requires a write to two disks.
- Extra mirrored copies can be created and then split off for backup purposes.
- Data can be striped across several mirrored disks, an approach which avoids hot
spots caused by excessive activity on a few disks by distributing the I/O operations
across all the member disks.
- There are parameters such as Mirror Write Consistency, Scheduling Policy, and
Enable Write Verify which can help maximize performance and reliability.
Storage system
RAID 5 can be used within the storage system. Hardware features must be checked for
compatibility with HACMP. Multiple paths from the server to the data are provided by
multi-pathing software. That software must also be checked for
compatibility with HACMP.
Although not in the scope of this class, the selected storage subsystem will be affected
by the factors listed below (among others). The selected storage subsystem will then
determine what you will look for in terms of compatibility with the chosen HACMP
version and features.
- Data access performance requirements
- Capacity
- Support for multi-pathing
- Price
Configuring a Mirrored File System for HACMP
[Table: columns Step, Description, Options. Surviving entries: name the VG something meaningful like shared_vg1; create a jfslog lv "sharedlvlog"]
QV1251.2
Notes
Introduction
This visual describes a procedure for creating a mirrored filesystem for use in HACMP.
There is an easier-to-use method provided by an HACMP facility called C-SPOC which
is discussed later in the course. The C-SPOC method cannot be used until the HACMP
cluster's topology and at least one resource group have been configured.
The procedure described in the visual permits the creation of shared file systems before
performing any HACMP related configuration (an approach favored by some cluster
configurators).
Detailed procedure
Here are the steps in somewhat more detail:
a. Use the smit mkvg fastpath to create the volume group.
b. Make sure that the volume group is created with the Activate volume group
AUTOMATICALLY at system restart parameter set to no (or use smit chvg to
set it to no). This gives HACMP control over when the volume group is brought
online. It is also necessary to prevent, for example, a backup node from attempting
to online the volume group at a point in time when it is already online on a primary
node. This is not necessary for ECM volume groups -- it is the default.
c. Use the smit mklv fastpath to create a logical volume for the jfslog with the
parameters indicated in the figure above (make sure that you specify a type of jfslog;
otherwise AIX ignores the logical volume and creates a new, unmirrored one when
you create the filesystem below).
d. Use the logform command to initialize the logical volume for use as a JFS log
device.
e. Use the smit mklv fastpath again to create a logical volume for the filesystem with
the parameters indicated in the figure above.
f. Use the smit crjfslv fastpath (not crjfs) to create a JFS filesystem in the now
existing logical volume.
g. Verify by mounting the filesystem and using the lsvg command. Notice that if
copies were set to 2, then the number for PPs should be twice the number for LPs
and that if you specified separate physical volumes then the values for PVs should
be 2 (the number of copies).
The procedure for creating a JFS2 filesystem is quite similar although there are a few
differences:
- The type of the JFS2 log logical volume should be jfs2log
- The logform command requires an additional parameter to cause it to create a
JFS2 log
# logform -V jfs2log <lvname>
- The type of the JFS2 filesystem logical volume should be jfs2
- The fastpath for creating a JFS2 filesystem in an existing logical volume is
smit crjfs2lvstd
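The JFS procedure above can also be expressed with the underlying AIX commands rather than SMIT. The following is a minimal sketch, not the course's own procedure: the function name, the disk names hdisk2 and hdisk3, the data LV name sharedlv, its size of 10 logical partitions, and the mount point /sharedfs are all assumptions made for illustration. The function only prints the commands so that they can be reviewed before anything is run as root.

```shell
# Print (do not run) a command sequence that builds a mirrored JFS
# filesystem on two disks. All names and sizes are illustrative assumptions.
plan_mirrored_jfs() {
    vg=${1:-shared_vg1}      # volume group name (from the visual)
    log=${2:-sharedlvlog}    # jfslog LV name (from the visual)
    lv=${3:-sharedlv}        # data LV name (assumed)
    mnt=${4:-/sharedfs}      # mount point (assumed)
    d1=${5:-hdisk2}          # first disk (assumed)
    d2=${6:-hdisk3}          # second disk (assumed)

    # Steps a and b: create the VG; -n means no automatic activation at restart
    echo "mkvg -n -y $vg $d1 $d2"
    # Step c: one-LP jfslog LV with two copies, one on each disk
    echo "mklv -y $log -t jfslog -c 2 $vg 1 $d1 $d2"
    # Step d: initialize the log device (logform asks for confirmation)
    echo "logform /dev/$log"
    # Step e: data LV with two copies; 10 LPs is an arbitrary size
    echo "mklv -y $lv -t jfs -c 2 $vg 10 $d1 $d2"
    # Step f: filesystem in the existing LV; -A no means no mount at restart
    echo "crfs -v jfs -d $lv -m $mnt -A no"
    # Step g: verify; PPs should be twice LPs and PVs should be 2
    echo "mount $mnt"
    echo "lsvg -l $vg"
}
```

Calling `plan_mirrored_jfs` with no arguments prints the sequence for the default names; passing other names as arguments adapts it to a different volume group.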
[Visual: two nodes, each with private rootvg disks and its own LVM, ODM, device (hdisk), and adapter layers, both with access to shared disks that carry the VGDA and PVIDs.]
Adapters: connect to same disks; compatible (microcode, PTF levels for drivers)
Device: may be different hdisk numbers but better to match
LVM: definitions, PVIDs must be in synch
Access: Private vs. Shared
Storage system must connect to both nodes
Shared may be serial (non-concurrent) or parallel (concurrent)
Notes:
Shared storage
The answer to the loss of a node is the concept of shared storage. In this case we have
access to the storage from more than one node. Shared storage requires that LVM
components be in synch on all nodes. Also, adapters and microcode on all the systems
must be at the same level.
LVM PVIDs
Each AIX system that is sharing a volume group will need to have access to the same
disks (LUNs). This is either done through zoning and masking in the SAN or via twin-tail
cabling for non-SAN implementations. If the zoning/masking/cabling is done correctly,
each system will see the same disks (LUNs).
Note: for SCSI adapters in a shared storage environment, avoid SCSI ID 7, because AIX may
assign it during a maintenance or diagnostic operation, and you could accidentally end up
with two devices at SCSI ID 7.
Facilities:
- reserve/release:
  used with classic VGs
  varyonvg, varyoffvg (or HACMP low-level code)
- gsclvmd (RSCT):
  used with ECM VGs
  invoked with varyonvg in passive mode
  used for fast disk takeover; needs no disk access
Synchronizing Changes
Changes made to one side must be propagated to the other side
Facilities:
[List garbled in the source; only the fragments "node)", "changes)", and "varyoffvg)" survive.]
Notes:
Why?
The shared storage is physically connected to each node that the application might run
on. In a serial (non-concurrent) access environment, the application actually runs on
only one node at a time and modification or even access to the data from any other
node during this time could be catastrophic (the data could be corrupted in ways which
take days or even weeks to notice).
Any LVM changes in shared storage must be synchronized.
request for each disk in a volume group when the volume group is varied online by the
varyonvg command. The varyonvg command fails for any disks which are currently
reserved by other nodes. If it fails for enough disks, then the varyon of the volume
group fails. This is almost certain to happen, because if one disk is reserved by another
node, the others presumably are also. If necessary during a fallover, HACMP can execute
the low-level routines to unreserve a disk.
is, of course, possible to update the ODM on inactive nodes when the change to the
meta-data is made. In this way, extra time at fallover is avoided. The ODM can be
updated manually or you can use Cluster Single Point of Control (C-SPOC) which can
automate this task. Lazy update and the various options for updating ODM information
on inactive nodes are discussed in detail in a later unit in this course.
[Visual: voluntary takeover with reserve/release, shown in stages. Node 1 and Node 2 each hold ODM definitions for the shared volume groups httpvg and dbvg. httpvg is handed over as follows:]
Node2: varyoffvg httpvg
Node1: varyonvg httpvg
Notes
Voluntary takeover
With reserve/release-based shared storage protection, HACMP passes volume groups
between nodes by issuing a varyoffvg command on one node and a varyonvg
command on the other node. The coordination of these commands (ensuring that the
varyoffvg is performed before the varyonvg) is the responsibility of HACMP.
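The ordering that HACMP enforces can be made concrete with a small sketch. The function below is hypothetical and only prints the steps in the required order; the mount point argument and the unmount and mount steps are assumptions added for completeness, since a volume group cannot be varied off while its filesystems are mounted.

```shell
# Print the ordered steps for handing a volume group from one node to
# another under reserve/release. Arguments: from-node, to-node, VG name,
# mount point. All values are supplied by the caller; none are discovered.
plan_vg_handoff() {
    from=$1; to=$2; vg=$3; mnt=$4
    # Release on the current owner first ...
    echo "$from: umount $mnt"
    echo "$from: varyoffvg $vg"
    # ... and only then acquire on the new owner.
    echo "$to: varyonvg $vg"
    echo "$to: mount $mnt"
}
```

For the move shown in the visual, `plan_vg_handoff Node2 Node1 httpvg /http` lists the varyoffvg on Node2 before the varyonvg on Node1 (the mount point /http is made up for the example).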
[Visual: involuntary disk takeover with reserve/release. One node has failed while holding httpvg, and the surviving node varies on httpvg.]
Notes
Involuntary disk takeover
The right node has failed with the shared disks still reserved to the right node. When
HACMP encounters a reserved disk in this context, it uses a special utility program to
break the disk reservation. It then varies on the volume group which causes the disks to
be reserved to the takeover node.
Implications
Note that if the right node had not really failed then it would lose its reserves on the
shared disks (rather abruptly) when the left node varied them on. This will be seen in
the left node's error log and should be acted on immediately, as it indicates that you are in
a situation where both nodes can access and update the data on the disks (each
believing that it is the only node accessing and updating the data). An involuntary
takeover isn't possible unless all paths used by HACMP to communicate between the
two nodes have been severed.
Copyright IBM Corp. 2007
[Visual: voluntary VG takeover with fast disk takeover. On Node 1 and Node 2, httpvg and dbvg are each in active varyon state on one node and passive varyon state on the other.]
1. A decision is made to move httpvg from the right node to the left
Notes
Voluntary VG takeover with fast disk takeover
With RSCT based takeover there is no need to check for lazy update or to do the
reserves and a lot of the varyonvg processing. This is referred to in HACMP as Fast
Disk Takeover. The fast disk takeover mechanism handles a voluntary VG takeover by
first putting the volume group on the node which is giving up the volume group into
passive state. It then sets the active varyon state on the node which is taking over the
volume group. The coordination of these operations is managed by HACMP 5.x and
AIX RSCT.
[Visual: involuntary takeover with fast disk takeover. httpvg and dbvg are in active or passive varyon state on Node 1 and Node 2; after a node fails, the surviving node holds the volume groups in active varyon state.]
Notes
Involuntary with fast disk takeover
A node has failed. Once the remaining node (or nodes) realize that the node has failed,
the takeover node sets the volume group's varyon state to be active.
There is no need to break disk reservations as no disk reservations are in place. The
only action required is that the takeover node ask its local LVM to mark the volume
group's varyon state as active.
If Topology Services fail (that is, no communication between the nodes) then group
services fail and it is not possible to activate the volume group. This makes it very safe
to use. It is recommended, however, to attach the disks in an enhanced volume group
only to systems running HACMP 5.x.
Synchronizing Changes
Without C-SPOC
#1 mkvg, mklv (log), logform, mklv (data), crfs
   OR chvg,chlv,chfs May require stopping application
#2 unmount, varyoffvg
#3 (cfgmgr), importvg, chvg
#4 varyoffvg
With C-SPOC
does not require stopping application
only supported method for ECM VGs
[Visual: Node 1 and Node 2, each with its own ODM, attached to a disk array that holds the VGDA.]
Notes
Introduction
The steps to add a shared volume group are:
1)
2)
3)
4)
5)
6)
Please note that the slide presents only a high-level view of the commands required to
perform these steps. More details are provided below.
1. Ensure common PVIDs across all nodes that will share volume group
As discussed earlier, HACMP does not require hdisk names to be consistent across the
nodes. It requires only that all the nodes have access to the same disks and have
discovered the PVIDs.
a. Ensure disks are zoned/masked so that the disks will be seen by both nodes.
b. Add the shared disk(s) to AIX on the primary node (Node1 in the example):
cfgmgr
c. Assign a PVID to the disk(s)
chdev -a pv=yes -l disk_name
where disk_name is hdisk#, hdiskpower# or vpath#.
d. Add the disks to AIX on the secondary node (Node2)
cfgmgr
e. Verify that the necessary PVIDs are seen on both nodes. If not,
correct the problem.
lspv
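The comparison in step e can be done by eye, but with many LUNs a small filter helps. The sketch below assumes that the `lspv` output of each node has been saved to a file; the function name and the temporary file names are hypothetical. It ignores disks without a PVID and disks in rootvg, since private disks legitimately differ between nodes, and prints every remaining PVID that is seen on only one node.

```shell
# List PVIDs that appear in one node's saved lspv output but not the other's.
# $1 and $2 are files holding `lspv` output from each node.
# Column 2 of lspv output is the PVID ("none" if unassigned);
# column 3 is the volume group name.
pvid_diff() {
    a=/tmp/pvid_a.$$
    b=/tmp/pvid_b.$$
    awk '$2 != "none" && $3 != "rootvg" { print $2 }' "$1" | sort -u > "$a"
    awk '$2 != "none" && $3 != "rootvg" { print $2 }' "$2" | sort -u > "$b"
    # comm -3 keeps lines unique to either file; remove its tab indent
    comm -3 "$a" "$b" | tr -d '\t'
    rm -f "$a" "$b"
}
```

An empty result means both nodes see the same set of shared PVIDs. Any PVID printed should be traced back to zoning, masking, or cabling before the volume group is created.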
6. Start HACMP
a. Restart HACMP, which varies on the VG and mounts the filesystems; you can
then resume processing.
C-SPOC
Fortunately, there is an easier way.
These steps will be done automatically if the cluster is active and C-SPOC is used.
Otherwise, you can use the commands listed here in the notes.
Unfortunately, we are not looking at the easier way until we get to the C-SPOC unit.
Quorum Issues
AIX performs quorum checking on volume groups in order to ensure that the volume
group remains consistent
the quorum rules are intended to ensure that structural changes to the
volume group (for example, adding or deleting a logical volume) are
consistent across an arbitrary number of varyon-varyoff cycles
When mirroring in AIX, quorum checking is an issue because losing access to 50% of the
disks in a volume group takes the volume group offline
How can you lose access to 50% of the disks?
VG status   Quorum checking enabled   Quorum checking disabled
Running     >50% VGDAs                >=1 VGDA
varyonvg    >50% VGDAs                100% VGDAs, or >50% VGDAs if MISSINGPV_VARYON=TRUE
Notes
Introduction
If you plan to mirror your data at the AIX level to provide redundancy, you will need to
consider AIX quorum checking on a volume group. If you aren't mirroring your data at
the AIX level, quorum isn't an issue.
Quorum
Quorum is the check used by the LVM at the volume group level to resolve possible
data conflicts and to prevent data corruption. Quorum requires that more than 50% of the
VGDAs in a volume group be available before any LVM actions can continue.
Note: For a VG with 3 or more disks, there is one copy of the VGDA on each disk. For a
one disk VG, there are two copies of the VGDA. For a two disk VG, the first disk has two
copies and the second has one copy of the VGDA. The VGDA is identical for all disks in
the VG.
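The VGDA placement rule in the note, combined with the more-than-50% test, can be worked through with a short shell sketch. The function names are hypothetical, and disks are simply numbered from 1, with disk 1 standing for the "first disk" of the note (an assumption about ordering). This is arithmetic only; it does not query LVM.

```shell
# Number of VGDA copies held by disk $2 (1-based) in a VG of $1 disks,
# following the rule in the note: 1 disk -> 2 copies; 2 disks -> 2 and 1;
# 3 or more disks -> 1 copy each.
vgda_on_disk() {
    total=$1; disk=$2
    if [ "$total" -eq 1 ]; then echo 2
    elif [ "$total" -eq 2 ] && [ "$disk" -eq 1 ]; then echo 2
    else echo 1
    fi
}

# has_quorum TOTAL_DISKS DISK...
# Succeeds if the listed surviving disks hold more than 50% of the VGDAs.
has_quorum() {
    n=$1; shift
    all=0; have=0; i=1
    while [ "$i" -le "$n" ]; do
        all=$((all + $(vgda_on_disk "$n" "$i")))
        i=$((i + 1))
    done
    for d in "$@"; do
        have=$((have + $(vgda_on_disk "$n" "$d")))
    done
    # strictly more than half: 2 * have > all
    [ $((2 * have)) -gt "$all" ]
}
```

For example, `has_quorum 2 1` succeeds (2 of 3 VGDAs survive), `has_quorum 2 2` fails (1 of 3), and `has_quorum 4 1 2` fails because exactly half is not more than 50%, which is why losing one side of a two-subsystem mirror takes the volume group offline.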
2-28 HACMP II: Administration
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Quorum is especially important in an HA cluster. If LVM can varyon a volume group with
half or less of the disks, it might be possible for two nodes to varyon the same VG at the
same time, using different subsets of the disks in the VG. This is a very bad situation
which we will discuss in the next visual.
Normally LVM verifies quorum when the VG is varied on and continuously while the VG
is varied on.
Error type: UNKN   Class: H   Description: QUORUM LOST, VOLUME GROUP CLOSING
group offline due to a loss of quorum for the volume group on the node, HACMP
selectively moves the resource group to another node.
You can change this default behavior by customizing resource recovery to use a notify
method instead of fallover. For more information, see Chapter 4: Configuring HACMP
Cluster Topology and Resources (Extended) in the HACMP for AIX V5.4 Administration
Guide.
Note: HACMP launches selective fallover and moves the affected resource group only
in the case of the LVM_SA_QUORCLOSE error. This error can occur if you use mirrored
volume groups with quorum enabled. However, other types of volume group failure
errors could occur. HACMP does not react to any other type of volume group errors
automatically. In these cases, you still need to configure customized error notification
methods, or use AIX Automatic Error Notification methods to react to volume group
failures.
Quorum/Mirror Choices
Don't mirror in the AIX node
Use external storage subsystem (DS8000/DS6000, EMC, etc) or
RAID arrays
Mirror with quorum disabled
It may be possible for each side of a two-node cluster to have
different parts of the same volume group varied online
It is possible that volume group structural changes (for example, add
or delete of a logical volume) made during the last varyon are
unknown during the current varyon
It is possible that volume group structural changes are made to one
part of the volume group which are inconsistent with a different set of
structural changes which are made to another part of the volume
group
Use HACMP Forced Varyon
Notes
Introduction
Eliminating quorum issues is done either by mirroring with quorum disabled, or by not
mirroring at the AIX level.
- If there are only two disks in the volume group then you lose access to the volume
group if the disk with two VGDAs is lost.
- If you are mirrored across two disk subsystems, consider a quorum buster disk to
prevent loss of quorum if you lose access to one subsystem. This is discussed
later in the notes.
Distribute hard disks across more than one bus
Use multipathing software and two Fibre Channel adapters
Use three adapters per node in SCSI
Use two adapters per node, per loop in SSA
Use different power sources
Connect each power supply in the storage device to a different power source
Notes
Introduction
If you decide to mirror at the AIX level and to leave quorum checking on, you will want to
have HACMP handle the loss of access to a volume group if half the disks are lost. Be
sure you understand what you're deciding to do, though. If you allow HACMP to handle
the loss of access to the volume group, this means that the loss of half the disks (only
one of your two copies of the data) will result in the user's loss of access to the
application until it can be taken over by another cluster node. You've purchased the
additional hardware and set up the mirroring precisely to avoid downtime if you lose
access to part of the hardware, but this strategy will result in downtime. You make the
call (see disabling quorum in the previous visual).
varyonvg -f
AIX provides the ability to varyon a volume group if a quorum of disks is not available.
This is called forced varyon. The varyonvg -f command allows a volume group to be
made active that does not currently have a quorum of available disks. All disks that
cannot be brought to an active state will be put in a removed state. At least one disk
must be available for use in the volume group.
Notes
Be careful when using forced varyon
Failure to follow each and every one of these recommendations could result in either
data divergence or inconsistent VGDAs. Either problem can be very difficult if not
impossible to resolve in any sort of satisfactory way, so be careful!
More information
Refer to the HACMP for AIX Administration Guide Version 5.4 (chapter 15) and the
HACMP for AIX Planning Guide Version 5.4 (chapter 5) for more information about
forced varyon and quorum issues.
Guidelines
Following these simple guidelines helps keep the configuration
easier to administer:
All LVM constructs must have unique names in the cluster
For example, httplv, httploglv, httpfs and httpvg
Notes
Unique names
Since your LVM definitions are used on multiple nodes in the cluster, you must make
sure that the names created on one node are not in use on another node. The safest
way to do this generally is to explicitly create and name each entity (do not forget to
explicitly create, name and format the jfslog logical volumes using logform).
space could cause an outage if the wrong disk fails. In order to avoid the outage,
mirror the scratch space!
The mirrorvg command provides an easy way to mirror all the logical volumes on a
given volume group. This same functionality may also be accomplished manually if you
execute the mklvcopy command for each individual logical volume in a volume group.
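The manual route can be sketched as a filter over `lsvg -l` output. The function below is hypothetical: it reads the listing on standard input, skips the two header lines, and prints a `mklvcopy` command for each logical volume whose PPs count equals its LPs count, that is, each logical volume that still has only one copy. It prints commands rather than running them, and it assumes the standard column order of `lsvg -l` (LV NAME, TYPE, LPs, PPs, ...).

```shell
# Read `lsvg -l <vg>` output on stdin and print one mklvcopy command for
# each logical volume that has a single copy (PPs equal to LPs).
plan_mklvcopy() {
    awk 'NR > 2 && $3 == $4 { print "mklvcopy " $1 " 2" }'
}
```

A typical use would be `lsvg -l shared_vg1 | plan_mklvcopy`. After the printed commands have been reviewed and run, the new copies still need to be synchronized, for example with `syncvg -v shared_vg1`.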
Notes
Introduction
You can configure OEM volume groups and file systems in AIX and use HACMP as an
IBM high-availability solution to manage such volume groups.
Note: Different OEMs may use different terminology to refer to similar constructs. For
example, the Veritas Volume Manager (VxVM) term Disk Group is analogous to the AIX
LVM term Volume Group. We will use the term volume groups to refer to OEM and
Veritas volume groups.
automatically. After you add Veritas volume groups to HACMP resource groups, you
can select the methods for the volume groups from the pick lists in HACMP SMIT
menus for OEM volume groups support.
Note: Veritas Foundation Suite is also referred to as Veritas Storage Foundation (VSF).
Additional considerations
The custom volume group processing or filesystem methods that you specify for a
particular OEM volume group are added to the local node only. This information is not
propagated to other nodes; you must copy these custom methods
to each node manually. Alternatively, you can use the HACMP File Collections
facility to make the disk, volume, and file system methods available on all nodes.
Notes
Introduction
HACMP lets you use either physical storage disks manufactured by IBM or by an
Original Equipment Manufacturer (OEM) as part of a highly available infrastructure.
Depending on the type of OEM disk, custom methods allow you (or an OEM disk
vendor) to either
- tell HACMP that an unknown disk should be treated the same way as a known and
supported disk type, or
- specify the custom methods that provide the low-level disk processing functions
supported by HACMP for that particular disk type
EMC support
IBM does not provide the requirements for HACMP compatibility with non-IBM storage.
You must contact the support organization or online reference materials for the vendor
of the non-IBM storage.
Be sure to consider the multi-pathing software version and maintenance (PowerPath,
HDLM, MPIO PCM).
For EMC planning see their support matrix:
http://www.emc.com/interoperability/matrices/EMCSupportMatrix.pdf
Search for HACMP. You will get many hits; look in the sections that apply to your
storage devices. Then look for the HACMP version that you are installing. Finally, look
for device driver, PowerPath and AIX patch information for your configuration.
Disk Type
SCSI -2 Disk
IBM Serial Storage Architecture
Fibre Attached Disk Array
SCSI Disk Array
Fibre Attached SCSI Disk
For example, to have a disk whose PdDvLn field was disk/fcal/HAL9000 be treated
the same as IBM fibre SCSI disks, a line would be added that read:
disk/fcal/HAL9000    FSCSI
- /etc/cluster/lunreset.lst
This file is referenced by HACMP during disk takeover.
HACMP will use either a target ID reset or a LUN reset for parallel SCSI devices
based on whether a SCSI inquiry of the device returns a 2 or a 3. Normally, only
SCSI-3 devices support LUN reset. However, some SCSI-2 devices will support an
LUN reset. So, HACMP will check the Vendor Identification returned by a SCSI
inquiry against the lines of this file. If the device is listed in this file, then a LUN reset
is used. This file is intended to be customer modifiable.
For example, if the "HAL 9000" disk subsystem returned an ANSI level of '2' to
inquiry, but supported LUN reset, and its Vendor ID was HAL and its Product ID
was 9000, then this file should be modified to add a line which was either:
HAL
or
HAL     9000
depending on whether vendor or vendor plus product match was desired. Note the
use of padding of Vendor ID to 8 characters.
A sample /etc/cluster/lunreset.lst file, which contains comments, is provided.
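Because the padding is easy to get wrong by hand, an entry can be generated with `printf`. This is a minimal sketch with a hypothetical function name; it assumes, as the example above describes, that an entry is the Vendor ID left-justified in 8 characters, optionally followed directly by the Product ID.

```shell
# Format a lunreset.lst entry: Vendor ID padded with spaces to
# 8 characters, optionally followed by the Product ID.
# With only a vendor argument, the bare vendor string is printed.
lunreset_entry() {
    if [ -n "$2" ]; then
        printf '%-8s%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}
```

`lunreset_entry HAL 9000` prints HAL, five spaces, then 9000; the result can be appended to /etc/cluster/lunreset.lst on each node after review.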
- /etc/cluster/conraid.dat
This file is referenced by HACMP during varyon of a concurrent volume group.
You can use this file to tell HACMP that a particular disk is a RAID disk that can be
used in classical concurrent mode. The file contains a list of disk types, one disk
type per line.
The value of the Disk Type field for a particular hdisk is returned by the following
command:
# lsdev -Cc disk -l <hdisk name> -F type
Note: This file only applies to classical concurrent volume groups. Thus this file has
no effect in AIX V5.3, which does not support classical concurrent VGs.
HACMP does not include a sample conraid.dat file. The file is referenced by the
/usr/sbin/cluster/events/utils/cl_raid_vg script, which does include some
comments.
Additional considerations
The previously described files in /etc/cluster are not modified by HACMP after they
have been configured and are not removed if the product is uninstalled. This ensures
that customized modifications are unaffected by the changes in HACMP. By default, the
files initially contain comments explaining their format and usage.
Keep in mind that the entries in these files are classified by disk type, not by the number
of disks of the same type. If there are several disks of the same type attached to a
cluster, there should be only one file entry for that disk type.
Finally, unlike other configuration information, HACMP does not automatically
propagate these files across nodes in a cluster. It is your responsibility to ensure that
these files contain the appropriate content on all cluster nodes. You can use the
HACMP File Collections facility to propagate this information to all cluster nodes.
HACMP allows you to specify any of its own methods for each step in disk processing,
or to use a customized method, which you define.
Using SMIT, you can perform the following functions for OEM disks:
- Add Custom Disk Methods
- Change/Show Custom Disk Methods
- Remove Custom Disk Methods
volume group is eventually brought online by Cluster Services, the question of whether
each physical volume is the expected physical volume is resolved. If it is, then the ghost
disk is deleted. If it isn't, then the ghost disk remains. Whether or not the online of the
volume group ultimately succeeds depends on whether or not the LVM can determine
the identity of the disk.
This is not a problem with IBM disks; they can be identified even when there is a
reserve.
More information
For detailed information about configuring OEM disks for use with HACMP, see
Appendix B in the HACMP for AIX V5.4 Installation Guide.
[Visual: HACMP with virtual SCSI across two frames. Each frame contains two VIO servers (VIOS 1, VIOS 2) and one HACMP node (Node1, Node2). Each VIOS reaches the storage device (Stg Dev) through HBAs with MPIO, sees the LUN as hdisk0 with no_reserve set, and exports it through vhost0. Through the Hypervisor, each HACMP node reaches both VIO servers via vscsi0 and vscsi1, uses MPIO to present a single hdisk0, and places it in sharedvg.]
Notes
Overview
This type of configuration is becoming prevalent with the adoption of the Virtualization
capabilities of the Power5 (and follow-on) architecture. A full discussion of the
implementation of this configuration is beyond the scope of the class. The intent is to
indicate that this is a supported configuration, some of the terms to learn, requirements
and a configuration overview. Always consult the IBM Sales Manual and IBM Support
(and anyone else you can find who will talk to you about this from an experienced
standpoint) for the latest requirements and considerations.
Legend
Stg Dev - Storage Subsystem providing access to disks, like a DS8300, DS4000, EMC,
HDS, SSA, and so on.
VIOS - Virtual I/O Server, the special LPAR on a Power5 system that provides
virtualized storage (and networking) devices for use by client LPARs
HBA - Host Bus Adapter also known as Fibre Channel Adapter, this is the connection to
the SAN, giving the VIOS access to storage in the SAN (LUNs).
MPIO - Multipath I/O, built into AIX since v5.1, creates path devices for each instance of
a disk/LUN that is recognized by AIX, presenting only a single hdisk device from these
multiple paths.
vhost0 - Virtual SCSI (server) adapter on the Virtual IO Server that provides the client
LPARs with access to virtual SCSI disks.
vscsi0 - Virtual SCSI (client) adapter on the client LPAR that provides the client access
to the VIOS's Virtual SCSI (server) adapter and therefore access to the virtual SCSI
disks.
Hypervisor - the Power5 component that manages access between the vhost and
vscsi adapters.
Minimum requirements
As of the writing of this version of the course, the minimum requirements for HACMP
with Virtual SCSI (VSCSI) and Virtual LAN (VLAN) on POWER5 (eServer p5 and
eServer i5) models were:
- AIX V5.3 Maintenance Level 5300-002 with APARs IY70082 and IY72974
- VIO Server V1.1 with VIOS Fix pack 6.2 and iFIX IY71303
- HACMP V5.3 (or later), or HACMP V5.2 with APAR IY68370 (or higher) and APAR
IY68387, or HACMP V5.1 with APAR IY66556 (or higher)
All the details on requirements and specifications are in this FLASH:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10390
Configuration overview
Configuration is mostly performed on the VIOS and Hardware Management Console.
The use of MPIO at the AIX level is also key to ensuring data availability in the event
access to a VIOS is lost. Ensure that you can reactivate any path in MPIO that was lost
after it is recovered so as to avoid total loss of access to data on a subsequent path
failure. The HACMP consideration, in addition to the correct software levels as outlined
above, is that enhanced concurrent volume groups are used in this configuration.
Otherwise, to the Cluster Manager this is just another volume group to be managed in a
resource group.
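The advice about reactivating recovered MPIO paths can be sketched as a filter over `lspath` output on the client LPAR. The function name is hypothetical, and the sketch assumes the default three-column `lspath` listing (status, device, parent adapter). It only prints candidate `chpath` commands for paths that are not in the Enabled state; whether a given path can be re-enabled depends on the underlying fault having been fixed first.

```shell
# Read `lspath` output on stdin (status, device, parent per line) and
# print a chpath command for every path whose status is not Enabled.
plan_path_enable() {
    awk '$1 != "Enabled" { print "chpath -l " $2 " -p " $3 " -s enable" }'
}
```

Running `lspath | plan_path_enable` after a VIOS has been restored lists the paths that still need attention; no output means every path is Enabled.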
On storage device
Map LUNs to the two corresponding VIO servers
References
Courses that address this configuration:
- Q1373, Logical Partitioning (LPAR) and Virtualization on System p POWER5
Systems
- Q1378, Advanced POWER Virtualization Implementation and Best Practices
Redbooks and Redpapers (www.redbooks.ibm.com):
- REDP-4194-00: IBM System p Advanced POWER Virtualization Best Practices
- REDP-4027-00: HACMP 5.3, Dynamic LPAR and Virtualization
Provides details later in the document on HACMP and Virtualization along with
failure scenarios in the VIO infrastructure and performance considerations.
- SG24-5768-01: Advanced POWER Virtualization on IBM eServer p5 Servers:
Architecture and Performance Considerations
Checkpoint (1 of 3)
1. Which of the following statements is TRUE (pick the best answer)?
a. Static application data should always reside on private storage.
b. Dynamic application data should always reside on shared
storage.
c. Shared storage must always be simultaneously accessible in
read-write mode to all cluster nodes.
d. Application binaries should only be placed on shared storage.
2. True or False?
Using RSCT-based shared disk protection results in slower
fallovers.
3. Which of the following disk technologies are supported by HACMP?
a. SCSI
b. SSA
c. FC
d. All of the above
Checkpoint (2 of 3)
4. True or False?
You should check the vendor's website for supported HACMP
configurations when using SAN based storage units (DS8000,
ESS, EMC, HDS, and so forth).
5. True or False?
hdisk numbers must map to the same PVIDs across an entire
HACMP cluster.
6. True or False?
Lazy update attempts to keep VGDA constructs in sync between
cluster nodes (reserve/release-based shared storage protection)
7. Which of the following commands will bring a volume group online?
a. mountvg vgA
b. getvtg vgA
c. attachvg vgA
d. varyonvg vgA
Checkpoint (3 of 3)
8. True or False?
Quorum should always be disabled on shared volume groups.
9. True or False?
File system and logical volume attributes cannot be changed while
the cluster is operational.
10. True or False?
An enhanced concurrent volume group is required for the
heartbeat over disk feature.
Unit Summary
Key points from this unit:
Access to shared storage must be controlled
Non-concurrent (serial) access
Reserve/release-based protection:
Slower and may result in ghost disks
RSCT-based protection (fast disk takeover):
Faster, no ghost disks, and some risk of partitioned cluster in the event of
communication failure
Careful planning is needed for both methods of shared storage protection
to prevent fallover due to communication failures
Concurrent access
Access must be managed by the parallel application
Hardware RAID
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4861-09 HACMP for AIX, Version 5.4 Planning Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html
HACMP manuals
Unit Objectives
After completing this unit, you should be able to:
Topic 1: HACMP Status and Log Files
Display cluster configuration and status
Locate and use HACMP log files
Command              Description
ps -ef               Is application running?
./myappcheckscript   Is application running?
mount
df
lsvg -o
lsvg vgname          Check VG details.
lspv
netstat -i
ifconfig -a
netstat -rn
ping -R
lssrc -g cluster
lssrc -a | grep cl
Useful AIX commands
Here is a list of useful AIX commands that are frequently used in cluster administration.
For additional commands or general reference purposes, consult one of the following:
- AIX man pages
- The pSeries and AIX Information Center:
http://publib16.boulder.ibm.com/pseries/index.htm
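The commands listed above can be strung together into a quick first-look script. The following is a minimal sketch, not part of the course material: the helper name `run_check`, the wrapper `first_look`, and the section labels are hypothetical, and the site-specific `myappcheckscript` is left out.

```sh
#!/bin/sh
# run_check (hypothetical helper): print a header, then run the command
# only if it exists on this system, so a missing tool is reported, not fatal.
run_check() {
    label=$1
    shift
    printf '=== %s (%s) ===\n' "$label" "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf 'skipped: %s not found\n' "$1"
    fi
}

# first_look (hypothetical wrapper): a handful of the commands from the list.
first_look() {
    run_check "Varied-on volume groups" lsvg -o
    run_check "File systems"            df
    run_check "Network interfaces"      netstat -i
    run_check "Routing table"           netstat -rn
    run_check "Cluster subsystems"      lssrc -g cluster
}
```

Calling `first_look` on a cluster node prints one labeled section per command; which commands to include is a matter of local preference.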
Command        Related form
clstat
cldump
cldisp
cltopinfo      (cllsif)
clRGinfo       (clfindres)
clshowres
clshowsrv      (clshowsrv -a), (clshowsrv -v)
clstat
The clstat utility uses the clinfo library routines to display all node, interface and
resource group information for a selected cluster. clinfoES and snmpd must be running.
ASCII or X mode
The clstat utility is supported in two modes: ASCII mode and X Window mode. ASCII
mode can run on any physical or virtual ASCII terminal, including xterm or aixterm
windows. If the DISPLAY variable is set, the program will run in X Window mode, unless
you specify the -a flag when issuing the command.
Monitor or one time status
Specifying the -o flag will execute the ASCII mode one time and exit. This is useful for
capturing clstat output from a shell script or cron job. Otherwise, clstat provides a
regularly updated display of cluster status.
Refresh interval
Use -r to specify the refresh interval, that is, the frequency with which clstat queries
clinfo for new cluster information. In ASCII mode, the command interprets the value of
interval in seconds. The default interval is 1 second. In X display mode, clstat interprets
the value of interval in tenths of seconds. The default interval is 0.1 second.
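As noted above, the -o flag suits shell scripts and cron jobs. The following is a minimal sketch under stated assumptions: the function name `capture_clstat`, the `CLSTAT` and `OUTDIR` variables, and the output file naming are hypothetical, and the default path assumes clstat is installed under /usr/es/sbin/cluster.

```sh
#!/bin/sh
# Assumed locations; override them in the environment if your site differs.
CLSTAT=${CLSTAT:-/usr/es/sbin/cluster/clstat}
OUTDIR=${OUTDIR:-/tmp}

# capture_clstat (hypothetical helper): run clstat once in ASCII mode (-o)
# and save the report to a timestamped file. Prints the file name on success.
capture_clstat() {
    out="$OUTDIR/clstat.$(date +%Y%m%d%H%M%S).out"
    if "$CLSTAT" -o >"$out" 2>&1; then
        printf '%s\n' "$out"
    else
        printf 'clstat failed; see %s\n' "$out" >&2
        return 1
    fi
}
```

Remember that clstat depends on clinfoES and snmpd, so a failed capture may simply mean that one of them is not running.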
cldump
cldump uses SNMP to gather cluster status and sends the results to standard out.
cldisp
This script uses SNMP and prints an application-centric summary of the cluster to
standard output.
cltopinfo
cltopinfo displays cluster topology information in an easy to read format. There are
several flags to select which information is displayed.
cltopinfo shows configuration, not status. For example:
- It shows where service labels can be configured, not where they are currently
configured
- It shows the addresses configured for each interface, but does not show interface
state (UP or DOWN)
cllsif is a link to cltopinfo and displays the topology in a slightly different format.
clRGinfo
The clRGinfo command displays a resource group's attributes. With no flags, it just
shows where each resource group is running. With various options, it will show
additional resource group attributes. You can specify a list of one or more resource
groups; if no resource groups are given on the command line, information about all the
configured resource groups is displayed.
If cluster services are not running on the local node, the command determines a node
where the cluster services are active and obtains the resource group information from
the active Cluster Manager.
clshowres
The clshowres command retrieves information from the HACMP resource ODM object
class and lists the resources defined for all resource groups or for a given group or
node.
It does not show where each resource group is currently running.
clshowsrv
The clshowsrv command displays the status of HACMP subsystems. Status includes
subsystem name, group name, process ID, and status. The status of a daemon can be
any one of the states reflected by the SRC subsystem (active, inoperative, warned to
stop, etc).
Flags
- -a: Displays all HACMP daemons.
- -v: Displays all RSCT, HACMP and optional HACMP daemons.
- subsystem: Shows the status of the specified HACMP subsystem. Valid values are
  clstrmgrES, clinfoES, and clcomdES. If you specify more than one subsystem, separate
  the entries with a space (no commas).
Log file                                         Description
/usr/es/adm/cluster.log                          Start here to get time of cluster event
/tmp/hacmp.out
/tmp/hacmp.out.[1-7]
/usr/es/sbin/cluster/history/cluster.mmddyyyy
/tmp/cspoc.log
/var/hacmp/clverify/clverify.log
/tmp/emuhacmp.out
/tmp/clstrmgr.debug
/var/hacmp/clcomd/                               clcomd logs:
  clcomd.log                                       Log of incoming and outgoing connection requests.
  clcomddiag.log                                   Diagnostic information from clcomd.
/var/adm/clavan.log
/var/hacmp/log/                                  Misc. logs:
  clconfigassist.log                               Two-node assistant log.
  clutils.log                                      Utilities and file propagation log.
  cl_testtool.log                                  Cluster test tool log.
/var/ha/log/                                     RSCT logs:
  grpsvcs*                                         Log of Group Services daemon.
  topsvcs*                                         Log of Topology Services daemon.
HACMP log files
Your first approach to diagnosing a problem affecting your cluster should be to examine
the cluster log files for messages output by the HACMP subsystems. These messages
provide valuable information for understanding the current state of the cluster.
/usr/es/adm/cluster.log
Contains time-stamped, formatted messages generated by HACMP scripts and
daemons.
Recommended Use: Because this log file provides a high-level view of current cluster
status, check this file first when diagnosing a cluster problem.
/tmp/hacmp.out
Contains time-stamped, formatted messages generated by HACMP scripts on the
current day.
In verbose mode (the default), this log file contains a line-by-line record of every
command executed by scripts, including the values of all arguments to each command.
An event summary of each high-level event is included at the end of each event's
details.
Recommended Use: Because the information in this log file supplements and expands
upon the information in the /usr/es/adm/cluster.log file, it is the primary source of
information when investigating a problem.
Note: With recent changes in the way resource groups are handled and prioritized in
fallover circumstances, the hacmp.out file and its event summaries have become even
more important in tracking the activity and resulting location of your resource groups.
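Because verbose mode records every command, a quick filter on event boundaries helps when scanning hacmp.out. The following is a minimal sketch: the function name `list_events` is hypothetical, and the marker strings "EVENT START:", "EVENT COMPLETED:" and "EVENT FAILED:" are an assumption about the log format that you should confirm against your own hacmp.out.

```sh
#!/bin/sh
# list_events (hypothetical helper): print only the lines that mark the
# start, completion, or failure of an event.
# $1 is the log to scan; it defaults to /tmp/hacmp.out.
list_events() {
    grep -E 'EVENT (START|COMPLETED|FAILED):' "${1:-/tmp/hacmp.out}"
}
```

Running `list_events /tmp/hacmp.out.1` applies the same filter to the previous day's file.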
/usr/es/sbin/cluster/history/cluster.mmddyyyy
Contains time-stamped, formatted messages generated by HACMP scripts. The
system creates a cluster history file every day, identifying each file by its file name
extension, where mm indicates the month, dd indicates the day, and yyyy the year.
Recommended Use: Use the cluster history log files to get an extended view of cluster
behavior over time.
Note: This log is not a good tool for tracking resource groups processed in parallel. In
parallel processing, certain steps formerly run as separate events are now processed
differently and these steps will not be evident in the cluster history log. Use the
hacmp.out file to track parallel processing activity.
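The mmddyyyy naming convention described above can be reproduced with date. The following is a minimal sketch; the function name `history_file` is hypothetical, and the directory is the default location given in these notes (it may have been relocated on your cluster).

```sh
#!/bin/sh
# history_file (hypothetical helper): print the cluster history file name
# for a given mmddyyyy stamp, defaulting to today's date.
history_file() {
    stamp=${1:-$(date +%m%d%Y)}
    printf '/usr/es/sbin/cluster/history/cluster.%s\n' "$stamp"
}
```

For example, `ls -l "$(history_file)"` checks whether today's history file exists yet.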
/tmp/cspoc.log
Contains time-stamped, formatted messages generated by HACMP C-SPOC
commands. The /tmp/cspoc.log file resides on the node that invokes the C-SPOC
command.
Recommended Use: Use the C-SPOC log file when tracing a C-SPOC command's
execution on cluster nodes.
/var/hacmp/clverify/clverify.log
The /var/hacmp/clverify/clverify.log file contains the verbose messages output by the
cluster verification utility. The messages indicate the node(s), devices, command, etc. in
which any verification error occurred.
/tmp/emuhacmp.out
Contains time-stamped, formatted messages generated by the HACMP Event
Emulator. The messages are collected from output files on each node of the cluster, and
cataloged together into the /tmp/emuhacmp.out log file.
In verbose mode (recommended), this log file contains a line-by-line record of every
event emulated. Customized scripts within the event are displayed, but commands
within those scripts are not executed.
/tmp/clstrmgr.debug
Contains time-stamped, formatted messages generated by the clstrmgrES daemon.
/var/hacmp/clcomd/clcomd.log
Contains time-stamped, formatted messages generated by Cluster Communications
daemon (clcomd) activity. The log shows information about incoming and outgoing
connections, both successful and unsuccessful. It also displays a warning if the file
permissions for /usr/es/sbin/cluster/etc/rhosts are not set correctly: users on the
system should not be able to write to the file.
Recommended Use: Use information in this file to troubleshoot inter-node
communications, and to obtain information about attempted connections to the daemon
(and therefore to HACMP).
/var/hacmp/clcomd/clcomddiag.log
Contains time-stamped, formatted, diagnostic messages generated by clcomd.
/var/adm/clavan.log
Contains the state transitions of applications managed by HACMP. For example, it
records when each application managed by HACMP is started or stopped, and when a
node on which an application is running stops.
Each node has its own instance of the file. Each record in the clavan.log file consists of
a single line. Each line contains a fixed portion and a variable portion.
Recommended Use: By collecting the records in the clavan.log file from every node in
the cluster, the Application Availability Analysis utility (clavan) can determine how long
each application has been up, as well as compute other statistics describing application
availability time.
/var/hacmp/utilities/cl_configassist.log
Contains debugging information for the Two-Node Cluster Configuration Assistant. The
Assistant stores up to ten copies of the numbered log files to assist with troubleshooting
activities.
/var/hacmp/log/clutils.log
Contains information about the date, time, results, and which node performed an
automatic cluster configuration verification.
It also contains information for the file collection utility, the two-node cluster
configuration assistant, the cluster test tool and the Online Planning Worksheet (OLPW)
conversion tool.
/var/hacmp/utilities/cl_testtool.log
Includes excerpts from the hacmp.out file. The Cluster Test Tool saves up to three log
files and numbers them so that you can compare the results of different cluster tests.
The tool also rotates the files with the oldest file being overwritten.
More details
These notes provide an overview of the HACMP log files. They will be discussed in
detail in later HACMP courses. In addition, see the following manual for more
information about using the HACMP log files:
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
Finding the log files
HACMP has many log files in many different directories. In addition, users can change
the location of one or more of the HACMP log files.
Fortunately, you can use odmget, as shown in the visual, to display the location of the
HACMP log files.
The RSCT log files cannot be relocated and will always be found in /var/ha/log.
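The visual with the odmget command is not reproduced in these notes. The following is a minimal sketch of the idea under stated assumptions: it assumes the ODM object class is named HACMPlogs and that each stanza carries `name` and `value` attributes, and the function name `parse_log_dirs` is hypothetical. Check the class and attribute names on your own system before relying on it.

```sh
#!/bin/sh
# parse_log_dirs (hypothetical helper): read odmget stanza output on stdin
# and print one "logname directory" pair per stanza.
# Assumes stanza lines of the form:   name = "hacmp.out"   value = "/tmp"
parse_log_dirs() {
    awk -F' = ' '
        $1 ~ /^[ \t]*name$/  { gsub(/"/, "", $2); name = $2 }
        $1 ~ /^[ \t]*value$/ { gsub(/"/, "", $2); print name, $2 }
    '
}

# Usage on a cluster node (class name is an assumption):
#   odmget HACMPlogs | parse_log_dirs
```

The parsing is kept separate from the odmget call so that it can be tried on saved output first.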
True or False?
clstat does not require clinfoES.
Introduction
We're now going to embark on a series of hypothetical scenarios to illustrate a number
of routine cluster administration tasks. Some of these scenarios are more realistic than
others.
[SMIT screen: Add a Resource Group. Entry fields: [zwebgroup], [usa uk]]
Add a resource group
We use the Extended Configuration path.
The resource group will be configured to start up on whichever node is available first
and to never fall back when a node rejoins the cluster. The combination of these two
parameters should go a long way towards minimizing this resource group's downtime.
If you're familiar with the older terminology of cascading and rotating resource groups,
this resource group's policies make it essentially identical to a cascading without
fallback resource group.
Introduction
We need to define a service IP label for the zwebgroup resource group.
Network name
The next step is to associate this Service Label with one of the HACMP networks. This
is not shown in the visual.
Alternate HW address
When you configure a service label, you can associate a hardware address with the IP
label and address for hardware address takeover, but only if you are using IPAT via
replacement.
[SMIT screen: Add Application Server. Required fields: Server Name, Start Script, Stop Script]
Add an application server
You must give it a name and specify a start and stop script that you have already tested
on each node that will support the application.
[SMIT screen: resources and attributes for resource group zwebgroup (extended path), first screen]
  Header values: zwebgroup, custom, ignore, uk usa
  Startup Behavior
  Fallover Behavior
  Fallback Behavior
  Fallback Timer Policy (empty is immediate)
  Service IP Labels/Addresses                         [zweb]
  Application Servers                                 [zwebserver]
  Volume Groups                                       [zwebvg]
  Use forced varyon of volume groups, if necessary    false
  Automatically Import Volume Groups                  false
  Filesystems (empty is ALL for VGs specified)        []
  Filesystems Consistency Check                       fsck
  [MORE...17]
Adding resources to a resource group (extended path)
This is the first of two screens to show the Extended Path menu for adding attributes.
Unlike the Standard path, it contains a listing of all the possible attributes.
[SMIT screen: resources and attributes for a resource group (extended path), second screen. Visible fields include Tape Resources, Raw Disk PVIDs, and Miscellaneous Data.]
Adding resources to a resource group (extended path)
Unlike the menu you see on the standard path, here you can see all of the options
available for configuring resources and attributes for a resource group. This includes
NFS exports and mounts, which are covered in Appendix B, with an accompanying
exercise in Appendix A in the exercise book.
[SMIT screen: extended path verification and synchronization. Entry field values: [Both], [No], [No], [Standard]]
Extended path synchronization
This is the extended path screen to show the synchronization menu options which are
not shown in the standard path. An additional option to Automatically correct
errors found during verification is available when cluster services is down on all
nodes.
Expanding the cluster
In this scenario, we'll look at adding a node to a cluster.
2.
3.
4.
5.
6. Add the new node to the existing cluster (from one of the existing nodes)
7. Add non-IP networks for the new node
8. Synchronize your changes
9. Start HACMP on the new node
10. Add the new node to the appropriate resource groups
11. Synchronize your changes again
12. Run through your (updated) test plan
Adding a new cluster node
Adding a node to an existing cluster isn't all that difficult from the HACMP perspective
(as we see shortly). The hard work involves integrating the node into the cluster from an
AIX and from an application perspective.
We'll be discussing the HACMP part of this work (starting at step 6 in the visual).
[SMIT screens: the standard path fields are Cluster Name, New Nodes (via selected communication paths), and Currently Configured Node(s); the extended path fields are Node Name [india] and Communication Path to Node [indiaboot1]]
Add node -- standard versus extended path
The extended path is a little different than the standard path in this case.
Standard path
From the standard path, you would select the menu Configure Nodes to an HACMP
Cluster (standard), which allows you to set the cluster name, and add additional
nodes via discovery using their communication paths. When you hit Enter from this
screen in the standard path, the network configuration for the added node would be
discovered automatically and added to the cluster configuration.
Extended path
From the extended path, you can specify the new node name (you type this in; there is
no selection from F4), and you can use F4 to select the boot IP label that you will use
for the communication path to the node (and which you have already added to the
/etc/hosts files on all nodes). Be aware that at this point you've only configured the
node definition. You must also configure the adapter definitions (boot adapter
definitions). To do this you use the extended path (Extended Topology,
Communications Interfaces/Devices). If you run cltopinfo at this point from the
administrative node, you will see the new node, but you won't see any of its interfaces.
Move cursor to desired item and press F7. Use arrow keys to scroll.
# Node     Device   Device Path   Pvid
  usa      tty0     /dev/tty0
  uk       tty0     /dev/tty0
  india    tty0     /dev/tty0
  usa      tty1     /dev/tty1
  uk       tty1     /dev/tty1
> india    tty1     /dev/tty1
> usa      tty2     /dev/tty2
  uk       tty2     /dev/tty2
  india    tty2     /dev/tty2
Introduction
This visual, and the next one, show how to add two more non-IP networks to our cluster.
Make sure that the topology of the non-IP networks that you describe to HACMP
corresponds to the actual topology of the physical rs232 cables.
In the following notes, we discuss why we need to add two more non-IP RS-232 links.
Note that if you are using heartbeat on disk, the same two steps are required. There
must be a unique disk shared between india and usa, and another between india and uk,
in order to define the two heartbeat on disk networks (one between india and usa, the
other between india and uk). You can't use an hdisk on one node for a heartbeat on disk
network with two different nodes.
Mesh configuration
The most redundant configuration would be a mesh, each node connected to every
other node. However, if you have more than three nodes, this means extra complexity
and can mean a lot of extra hardware, depending on which type of non-IP network you
are using.
Note: For a three node cluster, a ring and a mesh are the same.
Three-node example
In the example in the visual, we already have a non-IP network between usa and uk so
we need to configure one between india and usa (on this page) and another one
between uk and india (on the next page).
If, for example, we left out the uk and india non-IP network then the loss of the usa
node would leave the uk and india nodes without a non-IP path between them.
Five-node example
In even larger clusters, it is still only necessary to configure a ring of non-IP networks.
For example, if the nodes are A, B, C, D and E then five non-IP networks would be the
minimum requirement: A to B, B to C, C to D, D to E and E to A being one possibility. Of
course, other possibilities exist like A to B, B to D, D to C, C to E and E to A.
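The ring and mesh arithmetic above can be checked with a few lines of shell. The following is a minimal sketch; the function names `ring_links` and `mesh_count` are hypothetical, and the ring simply follows the order in which the node names are given.

```sh
#!/bin/sh
# ring_links (hypothetical helper): print the point-to-point non-IP links
# needed to join the given nodes in a ring, in the order listed.
ring_links() {
    [ "$#" -ge 2 ] || return 0
    first=$1
    prev=$1
    shift
    for node in "$@"; do
        printf '%s-%s\n' "$prev" "$node"
        prev=$node
    done
    # Close the ring; with only two nodes the single link is already complete.
    if [ "$#" -ge 2 ]; then
        printf '%s-%s\n' "$prev" "$first"
    fi
}

# mesh_count (hypothetical helper): number of links in a full mesh of
# $1 nodes, which is n * (n - 1) / 2.
mesh_count() {
    echo $(( $1 * ($1 - 1) / 2 ))
}
```

`ring_links A B C D E` lists the five links of the first ring described above, while `mesh_count 5` shows that a full mesh of the same nodes would need ten. For three nodes both approaches need three links, which is why a ring and a mesh are the same in that case.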
Move cursor to desired item and press F7. Use arrow keys to scroll.
# Node     Device   Device Path   Pvid
  usa      tty0     /dev/tty0
  uk       tty0     /dev/tty0
  india    tty0     /dev/tty0
  usa      tty1     /dev/tty1
  uk       tty1     /dev/tty1
  india    tty1     /dev/tty1
  usa      tty2     /dev/tty2
> uk       tty2     /dev/tty2
> india    tty2     /dev/tty2
Define non-IP networks
Make sure that the topology of the non-IP networks that you describe to HACMP
corresponds to the actual topology of the physical rs232 cables.
Synchronize
At this point, all this configuration exists only on the node where the data was entered.
To populate the other nodes' HACMP ODMs, you must synchronize. Once we've
synchronized our changes, the india node is an official member of the cluster.
Figure 3-22. Final Steps: Add the Node to a Resource Group, Synchronize, and Test
Add the node to a resource group
Remember that adding the new india node to the HACMP configuration is the easy
part. You would not perform any of the SMIT HACMP operations shown so far in this
scenario until you were CERTAIN that the india node was actually capable of running
the application.
Removing a node
In this scenario, we take a look at how to remove a node from an HACMP cluster.
(Extended Configuration)
Synchronize
Once the synchronization is completed successfully, the departing node is
no longer a member of the cluster
Removing a node
While removing a node from a cluster is another fairly involved process, some of the
work has little if anything to do with HACMP itself.
Use HACMP to move resource groups to other nodes before taking any other steps.
Next remove the node from membership in any resource groups. Remember that each
resource group must be associated with at least two nodes, so you may have to make
additional changes to your configuration.
After you stop HACMP on the departing node, you must remove it from the cluster
topology from another node. Synchronizing the cluster makes the removal of the node
complete.
Removing an Application
The zwebserver application has been causing problems and
a decision has been made to move it out of the cluster
Removing an application
In this scenario, we will remove an application from the control of HACMP. This means
we must remove the resource group that contains the application, and unconfigure the
application's resources.
Introduction
The procedure for removing a resource group is actually fairly straightforward.
Cluster snapshot
HACMP supports something called a cluster snapshot. This would be an excellent time
to take a cluster snapshot, just in case we decide to go back to the old configuration.
We will discuss snapshots later in this unit.
the case of shared volume groups, tie up physical resources which could presumably
be better used elsewhere.
A cluster should not have any useless resources or components as anything which
simplifies the cluster tends to improve availability by reducing the likelihood of human
error.
[SMIT pick list of resource groups: xwebgroup, ywebgroup, zwebgroup]
Removing a resource group
Make sure that you delete the correct resource group.
1. True or False?
Creating a third resource group on a cluster that has only one IP
network with two interfaces on each node requires using IPAT via
aliasing.
2. True or False?
It is NOT possible to add a node while HACMP is running.
3. You've decided to add a third node to your existing two-node HACMP
cluster. What very important step, which will help prevent a partitioned
cluster, follows adding the node definition to the cluster configuration?
a. Install HACMP software
b. Configure a non-IP network
c. Start Cluster Services on the new node
d. Add a resource group for the new node
4. What should you do first when removing a node from a cluster?
a. Uninstall HACMP software
b. Move (or take offline) any resource groups online on the node
c. Remove the node's IP address from the rhosts file
Introduction
You must develop good change management procedures for managing an HACMP
cluster. As you will see, C-SPOC utilities can be used to help, but do not do the job by
themselves. Having well-documented and tested procedures to follow, as well as
restricting who can make changes (for example, you should not have more than two or
three persons with root privileges), minimizes loss of availability when making changes.
The snapshot utility should be used before any change is made.
Recommendations
- Implement and adhere to a change control/management process
- Wherever possible, use HACMP's C-SPOC facility to make changes to the cluster
  (details to follow)
- Document routine operational procedures in a step-by-step list fashion (for example,
  shutdown, startup, increasing size of a filesystem)
- Restrict access to the root password to trained High Availability cluster administrators
- Always take a snapshot of your existing configuration before making a change
Some beginning recommendations
These recommendations should probably be considered to be the minimum acceptable
level of cluster administration. There are additional measures and issues which should
probably be carefully considered (for example, problem escalation procedures should
be documented, and both hardware and software support contracts should either be
kept current or a procedure developed for authorizing the purchase of time and
materials support during off hours should an emergency arise).
As the cluster administrator you should make yourself part of every change
meeting that occurs on your HACMP systems.
Think about the implications of the change on the cluster configuration and
function, keeping in mind the networking concepts we've discussed as well as
any changes to the application's data organization or start/stop procedures.
- The onus should be on the requester of the change to demonstrate that it is
  necessary, not on the cluster administrators to demonstrate that it is unwise.
- Management must support the process:
  - Defend cluster administrators against unreasonable requests or pressure
  - Not allow politics to affect a change's priority or schedule
- Every change, even the minor ones, must follow the process:
  - No system/cluster/database administrator can be allowed to sneak
    changes past the process
  - The notion that a change might be permitted without following the process must
    be considered to be absurd
Other recommendations
Ensure that you request sufficient time during the maintenance window for testing the
cluster. If this isn't possible, advise all parties of the risks of running without testing.
Update any documentation as soon as possible after the change is made to reflect the
new configuration/function of the cluster, if anything changes.
[Figure: an initiating node and three target nodes]
C-SPOC command execution
C-SPOC commands first execute on the initiating node. Then the HACMP command
cl_rsh is used to propagate the command (or a similar command) to the target nodes.
More details
All the nodes in the resource group must be available or the C-SPOC command may be
performed partially across the cluster, only on the active nodes. This can lead to
problems later when nodes are brought up and are out of sync with the other nodes in
the cluster.
As you saw in the LVM unit, LVM changes, if made through C-SPOC, may be
synchronized automatically (for enhanced concurrent mode volume groups, but only for
the LV information, not the filesystem information).
C-SPOC capabilities
You can use C-SPOC to do most cluster tasks, including managing users and security,
managing resources and resource group configurations, managing cluster services,
and managing physical and logical volume changes (including changes to volume
groups, logical volumes, and filesystems). You can use C-SPOC to add a user to the
cluster, synchronize passwords, add a physical volume, shared volume group, logical
volume, or filesystem to the cluster, or make changes to filesystems and logical
volumes.
Using C-SPOC will decrease the likelihood that you will make an error performing
cluster tasks, but is not a replacement for a good change management plan.
Top-level C-SPOC menu
The top-level C-SPOC menu is one of the four top-level HACMP menus.
C-SPOC scripts are used for users, LVM, concurrent LVM, and physical volume
management.
clRGmove is used for resource group management.
The other functions are included here as a logical place to put these system
management facilities. We will look at Managing Cluster Services and the Logical
Volume Management tasks.
The fast path is smitty cl_admin.
[SMIT screen: Start Cluster Services. Entry field values: now, [usa,uk], Automatically, true, true, false, Interactively]
Briefly, how did we get here?
The first choice in the C-SPOC menu is Manage HACMP Services. This option brings up
another menu containing three choices: Start Cluster Services, Stop Cluster
Services and Show Cluster Services. The menu in the visual appears when we choose
Start Cluster Services. Better yet, just use the fast path, smitty clstart.
You have a choice of any or all nodes in the cluster to start services. Use F4 to get a
pick list. If the field is left blank, services will be started on all nodes.
When cluster services is started, it acquires resources in resource groups as configured
and makes applications available. Beginning with HACMP V5.4, the function of
managing resource groups can be deferred if you choose Manually for the option
Manage Resource Groups. To allow cluster services to acquire resources and make
applications available if so configured (pre-HACMP v5.4 behavior), choose the default,
Automatically.
You can broadcast a message that cluster services are being started.
You have the option to start the Client Information Daemon, clinfo, along with the start of
cluster services. This is usually a good idea as it allows you to use the clstat cluster
monitor utility.
Finally, there are options regarding verification. Before cluster services is started, a
verification is run to ensure that you are not starting a node with an inconsistent
configuration. You can choose to ignore verification errors and start anyway. This is not
something that you would do unless you are very aware of the reason for the
verification error, you understand the ramifications of starting with the error and you
must activate cluster services. An alternative that is safer would be to choose to
Interactively correct errors found during verification. Not all errors can be corrected, but
you have a better chance of getting cluster services activated in a clean configuration
with this option.
The options that you choose here are retained in the HACMP ODM and repopulated on
reentry.
[clstat display: interface addresses 192.168.15.29, 192.168.16.29, 0.0.0.0, and 192.168.5.92, each in state UP; resource group state: On line]
Remember patience
Patience is key with HACMP tasks. There are many things going on under the covers
when you ask the Cluster Manager to do something. Getting the OK in SMIT does
NOT mean that the task has been completely performed. It's just the beginning in many
cases.
Did I mention patience?
The Cluster Manager daemon queues events. It doesn't forget (usually anyway). So
keep in mind that if you launch a task with the Cluster Manager, don't verify its
status closely, and then attempt to give the process a boost by launching another task
(like following a resource group move with an offline), you have just queued the second
task. Once the Cluster Manager completes the first task, provided it's in a state where it
can continue processing, it will perform the second task. This might not be what you
wanted.
[Visual: SMIT Stop Cluster Services panel. Entry fields shown: now, [usa], true, Bring Resource Groups>. A pop-up list titled Shutdown mode is open.]
Notes
Briefly, how did we get here?
From the Manage HACMP Services C-SPOC menu. This menu appears when we choose
Stop Cluster Services. You can use the fast path, smitty clstop.
You have a choice of any or all nodes in the cluster to stop services. Use F4 to get a
pick list. If the field is left blank, services will be stopped on all nodes.
You can broadcast a message that cluster services are being stopped.
Finally, the options regarding resource group management. Prior to HACMP V5.4 the
options were graceful, takeover and forced. Graceful meant to bring resource groups
offline prior to stopping cluster services. Takeover meant to move resource groups to
other available nodes, if applicable, according to the current locations and fallover
policies of the resource groups. As you can see, these options map directly to the
current options and their functions are self-explanatory.
But what about forced down, you say? Prior to HACMP V5.4, forcing down cluster
services was supported only in some scenarios and resulted in an environment
that was potentially unstable (that is, potentially unavailable). Forcing cluster services
down when using enhanced concurrent mode volume groups was not supported
because Group Services and gsclvmd were brought down as part of the forced down
operation. Group Services and gsclvmd are the components that maintain the volume
group's VGDA/VGSA integrity across all nodes. With HACMP V5.4 and later, forcing
down cluster services is supported by moving the resource groups to an unmanaged
state. In addition, the Cluster Manager and the RSCT infrastructure remain active,
permitting this action with enhanced concurrent mode volume groups. Hence the option
in the menu above, Unmanage Resource Groups. While in this state, the Cluster
Manager remains in the ST_STABLE state. It doesn't die gracefully and respawn as
stated earlier, and it doesn't return to the ST_INIT state. This allows the Cluster Manager
to participate in cluster activities and keep track of changes that occur in the cluster.
As with starting cluster services, the options that you choose here are retained in the
HACMP ODM and repopulated on reentry.
uk # clstat -a

                clstat - HACMP Cluster Status Monitor
                -------------------------------------
Cluster: ibmcluster (1156578448)          Wed Aug 30 10:44:20 2006
                State: UP                 Nodes: 2
                SubState: STABLE

        Node: usa               State: DOWN
           Interface: usaboot1 (2)              Address: 192.168.15.29
                                                State:   DOWN
           Interface: usaboot2 (2)              Address: 192.168.16.29
                                                State:   DOWN

Callout: Have patience! It can take a few minutes.
Figure 3-37. Verifying Cluster Services Have Stopped: Stopping Without Unmanaged Resource Groups
Notes
Stop of cluster services without going to unmanaged
This means you've chosen to stop cluster services with either the Bring Resource
Groups Offline or the Move Resource Groups option. In other words, it's not a forced
down.
As with starting cluster services, remember that patience is key.
0513-059 The clstrmgrES Subsystem has been started. Subsystem PID is nnnnnn.
Although you may find the output to be unreliable at times, the clstat utility is a good
mechanism to use. Note that it was run on another system, not the one where cluster
services was stopped. If you're not a fan of clstat, consider using cldump, which relies
on SNMP directly.
Another option is to use lssrc. This is to be used with caution. You must understand
what state is expected and then be patient, retrying the command to ensure that the
state changes are no longer occurring. A state of ST_INIT is the indication that cluster
services has stopped on this node. This is the resulting state from a respawn of the
Cluster Manager daemon. As you will see in the next visual, stopping cluster services
with unmanaged resource groups leaves the Cluster Manager daemon in ST_STABLE.
Know what state to expect.
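The lssrc check described above can be sketched as a small shell helper. This is only an illustration, assuming that the output of lssrc -ls clstrmgrES contains a line of the form "Current state: ST_xxx"; the function names cluster_state and state_is are hypothetical and are not HACMP utilities.

```shell
#!/bin/sh
# Sketch: pull the Cluster Manager state out of lssrc output read on stdin.
# Assumption: the output has a line such as "Current state: ST_STABLE".

# Print the first state token found (for example ST_INIT or ST_STABLE).
cluster_state() {
    sed -n 's/^Current state: *\(ST_[A-Z_]*\).*$/\1/p' | head -n 1
}

# Succeed (exit status 0) if the state read on stdin equals the expected state $1.
state_is() {
    [ "$(cluster_state)" = "$1" ]
}

# Hypothetical sample text, used only to exercise the functions off-cluster.
sample='Current state: ST_INIT
(other lines of lssrc output would follow here)'

printf '%s\n' "$sample" | cluster_state
```

On a cluster node, one might pipe the real command into the helper, for example lssrc -ls clstrmgrES | state_is ST_INIT, and repeat it with a sleep in between until the result stops changing. Expect ST_INIT after a normal stop and ST_STABLE after a stop with unmanaged resource groups.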
[Visual: clstat -a run on node uk. Node usa, interface usaboot1 (2), address 192.168.15.29 and address 192.168.5.92 show State: UP, followed by State: Unmanaged. Callout: Have patience! It can take a few minutes.]
Notes
Stop of cluster services with unmanaged resource groups
This means you've chosen to force down cluster services.
One more time, remember that patience is key. Did I mention that getting the OK in
SMIT does NOT mean that the task has been completely performed?
the resource group state as Unmanaged and the service IP label is available. You only
stopped cluster services, not the resources.
The quickest way to see that there are unmanaged resources is to use clRGinfo. Note
that it shows the state of the resource group as unmanaged on both nodes. In fact, it
will show unmanaged on any node where that resource group can be acquired, as long as
this isn't a concurrent resource group. If the startup policy is Online on All Available
Nodes, it will show unmanaged only on the node where cluster services was stopped.
Notes
The importance of LVM change management
LVM change management is critical for successful takeover in the event of a node
failure.
Information regarding LVM constructs is held in a number of different locations:
- physical disks: VGDA, LVCB
- AIX files: primarily the ODM, but also /usr/sbin/cluster/etc/vg, files in the /dev
directory and /etc/filesystems
- physical RAM: kernel memory space
This information must be kept in sync on all nodes which may access the shared
volume group(s) in order for takeover to work.
exportvg sharedvg
importvg -V123 -y sharedvg hdisk3
chvg -an sharedvg
varyoffvg sharedvg
Notes
Making manual changes to the LVM
After making a change to an LVM component such as creating a new logical volume
and file system as shown in the figure, you must propagate the change to the other
nodes in the cluster which are sharing the volume group using the steps above. Make
sure that the auto activate is turned off (chvg -an sharedvg) after the importvg
command is executed since the Cluster Manager will control the use of the varyonvg
command on the node where the volume group should be varied on.
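As a sketch of the steps shown in the visual, the helper below only prints the four commands for a given volume group, major number and disk, so that they can be reviewed before being run on each of the other nodes. The function name print_vg_resync_steps is hypothetical; the commands and flags are the ones from the visual.

```shell
#!/bin/sh
# Sketch: print (do not run) the manual propagation steps for one volume group.
# Arguments: volume group name, major number, hdisk name on this node.
print_vg_resync_steps() {
    vg=$1
    major=$2
    disk=$3
    echo "exportvg $vg"                     # drop the stale definition
    echo "importvg -V$major -y $vg $disk"   # re-read the definition from disk
    echo "chvg -an $vg"                     # turn auto activate off
    echo "varyoffvg $vg"                    # leave varyon to the Cluster Manager
}

# Values taken from the visual.
print_vg_resync_steps sharedvg 123 hdisk3
```

Printing rather than executing is a deliberate choice here: hdisk names are not always consistent across nodes, so the disk argument should be checked on each node before the commands are run.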
Other than the sheer complexity of this procedure, the real problem with it is that it
requires that the resource group be down while the procedure is being carried out.
Fortunately, there are better ways...
Notes
The lazy administrator's solution
HACMP has a facility called Lazy Update that it uses to attempt to synchronize LVM
changes during a fallover.
HACMP uses a copy of the timestamp kept in the ODM and a timestamp from the
volume group's VGDA. AIX updates the ODM timestamp whenever the LVM component
is modified on that system. When a cluster node attempts to vary on the volume group,
HACMP for AIX compares the timestamp from the ODM with the timestamp in the
VGDA on the disk (use /usr/es/sbin/cluster/utilities/clvgdata hdiskn to find
the VGDA timestamp for a volume group). If the values are different, HACMP exports
and re-imports the volume group before activating it.
This method requires no downtime, although it does slightly increase the fallover time
for the first fallover after the LVM change was made. Realize, though, that this isn't the
best solution and will not fix every situation where nodes are out of sync.
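The Lazy Update decision can be pictured with the following sketch. It is not HACMP's implementation: the timestamps are treated as opaque strings, and the function name lazy_update_action and the sample values are hypothetical.

```shell
#!/bin/sh
# Sketch of the Lazy Update decision made when a node varies on a volume group.
# $1 = timestamp held in the ODM, $2 = timestamp read from the VGDA on disk.
lazy_update_action() {
    odm_ts=$1
    vgda_ts=$2
    if [ "$odm_ts" = "$vgda_ts" ]; then
        # Local definition is current: activate directly.
        echo "varyon"
    else
        # Local definition is stale: export, re-import, then activate.
        echo "export-import-varyon"
    fi
}

# Hypothetical timestamp values for illustration only.
lazy_update_action 4f3a2b1c0d9e8f70 4f3a2b1c0d9e8f70
lazy_update_action 4f3a2b1c0d9e8f70 4f3b7c220a1b2c3d
```

The second call shows the case that adds time to the first fallover after an LVM change.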
[Visual: update vg constructs; use C-SPOC syncvg]
Notes
Using C-SPOC to synchronize manual LVM changes
In this method, you manually make your change to the LVM on one node and then
invoke C-SPOC to propagate the change. Most likely the reason you are using this
C-SPOC task is because someone who is unfamiliar with cluster node management
made a change to a shared LVM component without using C-SPOC, creating an
out-of-sync condition between a node in the cluster and the rest of the nodes. This task
allows you to use C-SPOC to clean up after the fact.
Note: If you are using an enhanced concurrent mode volume group and a filesystem has been
added to an existing logical volume without using C-SPOC, the imfs is not done,
which makes this function ineffective. For this reason (among many others), you are
strongly encouraged to use C-SPOC to perform the LVM add/remove/update and not
use this mechanism to synchronize after the fact.
Enhanced Concurrent Mode Volume Groups
- Another synchronization method is the use of ECMVGs (Enhanced Concurrent Mode Volume Groups)
- RSCT updates LVM information automatically for ECMVGs
  - Happens immediately on all nodes running cluster services
  - Nodes that are not running cluster services will be updated when cluster services are started
- Limitations
  - Incomplete: /etc/filesystems not updated
  - Incompatible: Must be careful using ECMVGs if any product that is running on the system places SCSI reserves on the disks as part of its function
Notes
RSCT as LVM change management
With enhanced concurrent mode (ECM) volume groups, RSCT will automatically
update the ODM on all the nodes which share the volume group when an LVM change
occurs on one node.
However, since it is limited to only ECM volume groups and since /etc/filesystems is
not updated, it's better to explicitly use C-SPOC to make LVM changes.
Notes
Introduction
This is the menu for using C-SPOC to perform LVM change management and
synchronization. As was mentioned in the LVM unit, you can make changes in AIX
directly and then synchronize OR, you can make the changes utilizing C-SPOC utilities
where the synchronization is automatic.
How it works
Once you create a shared volume group, you must rerun the discovery mechanism
(refer to top-level menu in the Enhanced Configuration path) to get HACMP to know
about the volume group. You must then add the volume group to a resource group
before you can use C-SPOC to add shared logical volumes or filesystems.
Synchronization
Note that you only need to add the volume group to a resource group using SMIT from
one of the cluster nodes, and then you can start working with C-SPOC from the same
node. You do not need to synchronize the cluster between adding the volume group to a
resource group and working with it using C-SPOC unless you want to use C-SPOC
from some other node. Keep in mind that the volume group is not really a part of the
resource group until you synchronize that change.
                                                  [Entry Fields]
  Node Names                                      usa,uk
  PVID                                            00055207bbf6edab 0000>
  VOLUME GROUP name                               [xwebvg]
  Physical partition SIZE in megabytes            64
  Volume group MAJOR NUMBER                       [207]
  Enhanced Concurrent Mode                        true
  Enable Cross-Site LVM Mirroring Verification    false

Warning: Changing the volume group major number may result in the command being
unable to execute successfully on a node that does not have the major number
currently available. Please check for a commonly available major number on all
nodes before changing this setting.
Notes
Creating a shared volume group
You can use C-SPOC to create a volume group but be aware that you must then add
the volume group name to a resource group and synchronize. This is one case of using
C-SPOC where synchronization is not automatic.
Before creating a shared volume group for the cluster using C-SPOC check that:
- All disk devices are properly attached to the cluster nodes
- All disk devices are properly configured on all cluster nodes and the device is listed
as available on all nodes
- Disks have a PVID
(C-SPOC lists the disks by their PVIDs. This ensures that we are using the same
disk on all nodes, even if the hdisk names are not consistent across the nodes).
This menu was reached through the Concurrent Logical Volume Management option
on the main C-SPOC menu.
Notes
Discover and add VG to resource group
After creating a volume group, you must discover it so that the new volume group will be
available in pick lists for future actions, like adding it to a resource group, and so forth.
You must use the Extended Configuration menu for both of these actions. You'll find
the discovery action at the top of the Extended Configuration menu shown in the visual.
To add the volume group to a resource group, you'll use the Extended Resource
Configuration menu to get to the HACMP Extended Resource Group Configuration
menu.
[Visual: C-SPOC SMIT panel for adding a shared logical volume. Entry fields shown: xwebgroup, xwebvg, usa, [200], [xweblv], [jfs], middle, minimum, [].]
The volume group must be in a resource group that is online or it does not appear in
the pop-up list.
Notes
Creating a shared file system using C-SPOC
It is generally preferable to control the names of all of your logical volumes.
Consequently, it is generally best to explicitly create a logical volume for the file system.
If the volume group does not already have a JFS log, then you must also explicitly
create a logical volume for the JFS log and format it with logform. The same applies
if you are creating a JFS2 filesystem (unless you plan to use inline logs, in which case the jfs2log
won't be needed).
The volume group to which you wish to add the filesystem must be online. It's your choice:
either varyonvg the volume group manually, or start cluster services.
However, C-SPOC allows you to add a journaled file system to either:
- A shared volume group (no previously defined cluster logical volume)
SMIT checks the list of nodes that can own the resource group that contains the
volume group, creates the logical volume (on an existing log logical volume if
present, otherwise it creates a new log logical volume) and adds the file system to
the node where the volume group is varied on (whether it was varied on by the
C-SPOC utility or it was already online). All other nodes in the resource group run an
importvg -L for non-enhanced concurrent mode volume groups, or an imfs for
enhanced concurrent mode volume groups.
- A previously defined cluster logical volume (in a shared volume group)
SMIT checks the list of nodes that can own the resource group which contains the
volume group where the logical volume is located. It adds the file system to the node
where the volume group is varied on (whether it was varied on by the C-SPOC utility
or it was already online). All other nodes in the resource group run an importvg -L
for non-enhanced concurrent mode volume groups, or an imfs for enhanced
concurrent mode volume groups.
                                                  [Entry Fields]
  Node Names                                      usa,uk
  LOGICAL VOLUME name                             xweblv
* MOUNT POINT                                     [/xwebfs]
  PERMISSIONS                                     read/write
  Mount OPTIONS                                   []
  Start Disk Accounting?                          no
  Fragment Size (bytes)                           4096
  Number of bytes per inode                       4096
  Allocation Group Size (MBytes)                  8
Notes
Creating a shared file system, step 2
Once you've created the logical volume, create a file system on it. Use the path
that allows creating a file system on a previously defined logical volume.
[Visual: pick list with columns Resource Group and File System, showing xwebgroup and /xwebfs.]
Notes
Changing a shared file system using C-SPOC
We have to provide the name of the file system which we want to change. The file
system must be in a volume group which is currently online somewhere in the cluster
and is already configured into a resource group.
[Visual: C-SPOC SMIT panel for changing a shared file system. Entry fields shown: discovery, /xwebfs, [/xwebfs], [4000000], [], no, read/write, [], no, 4096, 4096, no.]
Notes
Changing file system size
Specify a new file system size, in 512-byte blocks, and press Enter. The file system is
resized and the relevant LVM information is updated on all cluster nodes configured to
use the file system's volume group.
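Because the size field is expressed in 512-byte blocks, a quick conversion helps avoid mistakes. The sketch below uses plain shell arithmetic; the function names blocks_to_mb and mb_to_blocks are hypothetical, and the result is truncated to whole megabytes.

```shell
#!/bin/sh
# Sketch: convert between 512-byte blocks and megabytes (1 MB = 1048576 bytes).

# 512-byte blocks -> whole megabytes (integer division, remainder dropped).
blocks_to_mb() {
    echo $(( $1 * 512 / 1048576 ))
}

# Megabytes -> 512-byte blocks (2048 blocks per megabyte).
mb_to_blocks() {
    echo $(( $1 * 2048 ))
}

# The value 4000000 is the size shown in the visual.
blocks_to_mb 4000000
mb_to_blocks 2048
```

The size of 4000000 blocks shown in the visual is therefore a little under 2 GB.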
Notes
HACMP resource group and application management
This visual shows the selections for managing resource groups. We can control if and
where resource groups are running, control application monitoring, and perform
application availability analysis.
In this section, we'll examine the choices for managing the state and running location of
a resource group using C-SPOC.
Notes
Priority override location (POL)
HACMP V5.x introduced the notion of a priority override location. A POL overrides all
other fallover/fallback policies and possible locations for the resource group.
A resource group does not normally have a priority override location. The destination
node that you specify for a resource group move, online or offline request (see next
couple of visuals) becomes the priority override location for the resource group. The
resource group remains on that node in an online state (if you moved or brought it
online there) or offline state (if you took it offline there) until the POL is cancelled.
Notes
New POL behavior in older versions of HACMP
New behavior is in the following levels and later:
- HACMP V5.3 PTF IY84883 May 2006
- HACMP V5.2 PTF IY82989 April 2006
- HACMP V5.1 PTF IY84646 May 2006
In the levels shown above, the problem where the resource group moved on
Restore_Node_Priority_Order regardless of fallback policy settings is fixed.
Now the Restore_Node_Priority_Order only resets the POL setting, without resource
group movement, unless the fallback policy is Fallback to Higher Priority Node In the
List. In that case, the behavior is the same as the old way.
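The new behavior can be summarized as a simple decision, sketched below. This is only a restatement of the rule in the notes, not HACMP code; the function name restore_node_priority_effect is hypothetical, and "Never Fallback" is used here as an assumed example of another fallback policy.

```shell
#!/bin/sh
# Sketch: effect of selecting Restore_Node_Priority_Order at the fixed levels.
# $1 = fallback policy of the resource group.
restore_node_priority_effect() {
    case $1 in
        "Fallback to Higher Priority Node In the List")
            # Same as the old behavior: the POL is reset and the group may move.
            echo "reset POL; resource group may move" ;;
        *)
            # Any other policy: only the POL setting is reset.
            echo "reset POL; resource group stays where it is" ;;
    esac
}

restore_node_priority_effect "Fallback to Higher Priority Node In the List"
restore_node_priority_effect "Never Fallback"
```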
[Visual: destination node pick list showing *usa, uk and india.]
Notes
Moving a resource group
You can request that a resource group be moved to any node that is in the resource
group's list of nodes.
The clRGmove utility program is used, which can also be invoked from the command
line. See the man page for details.
The destination node that you specify becomes the resource group's priority override
location. On a subsequent move, the original highest priority node is marked with an
asterisk (*).
# Resource Group          State          Node(s) / Site
  xwebgroup               ONLINE         usa /
Notes
Bring a resource group offline -> select a resource group
To start, you must select the resource group you wish to take offline. Only resource
groups that are currently online will be shown.
[Visual: destination node pick list showing usa and uk.]
Notes
Bring a resource group online
Bringing a resource group online will activate the resources in it on the target node. You
may want to manually bring resource groups online after performing verification of a
node that rejoins the cluster after a forced down.
1. True or False?
Using C-SPOC reduces the likelihood of an outage by reducing the
likelihood that you will make a mistake.
2. True or False?
C-SPOC reduces the need for a change management process.
3. C-SPOC cannot do which of the following administration tasks?
a. Add a user to the cluster.
b. Change the size of a filesystem.
c. Add a physical disk to the cluster.
d. Add a shared volume group to the cluster.
e. Synchronize existing passwords.
f. None of the above.
4. True or False?
It does not matter which node in the cluster is used to initiate a
C-SPOC operation.
5. True or False?
Priority Override Location behavior changed in HACMP V5.4 to
prevent actions that conflict with desired resource group fallback
behavior.
Checkpoint
1. True or False?
A star configuration is a good choice for your non-IP networks.
2. True or False?
RSCT will automatically update /etc/filesystems when using
enhanced concurrent mode volume groups.
3. True or False?
With HACMP V5.4, a resource group's priority override location can
be cancelled by selecting a destination node of
Restore_Node_Priority_Order.
4. You want to create an enhanced concurrent mode volume group that
will be used in a resource group that will have an Online on Home
Node Only Startup policy. Which C-SPOC menu should you use?
a. HACMP Logical Volume Management
b. HACMP Concurrent Logical Volume Management
5. You want to add a logical volume to the volume group you created in
the question above. Which C-SPOC menu should you use?
a. HACMP Logical Volume Management
b. HACMP Concurrent Logical Volume Management
Unit Summary
Key points from this unit:
- There are many tools and log files that can be used for monitoring a cluster
  - Cluster tools: clstat, cldump, cltopinfo, clRGinfo
  - AIX commands: lssrc, lsvg, mount, netstat
  - Use odmget HACMPlogs to find log files
- The SMIT standard and extended menus are used to make topology and resource group changes
- Implementing procedures for change management is a critical part of administering an HACMP cluster
- C-SPOC provides facilities for performing common cluster-wide administration tasks from any node within the cluster
  - Perform routine administrative changes
  - Start and stop cluster services
  - Perform resource group move operations
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4861-09 HACMP for AIX, Version 5.4 Planning Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html
HACMP for AIX manuals
Unit Objectives
After completing this unit, you should be able to:
- Describe the HACMP options for securing cluster communications
  - Connection authentication method
  - VPN tunnels for cluster communications
  - Message authentication and encryption
Notes
Cluster communications
A Cluster Communications daemon (clcomd) runs on each HACMP node to
transparently manage inter-node communications for HACMP. This daemon
consolidates communication mechanisms in HACMP and decreases management
traffic on the network. This communication infrastructure requires only one common
communication path, rather than multiple TCP connections, between each pair of
nodes.
Most components communicate through the Cluster Communications daemon, but
some components use a different mechanism for inter-node communications:
- Cluster Manager: RSCT
- Heartbeat messaging: RSCT
SNMP
Standard
Kerberos (SP only)
IPSec (VPN) tunnels for cluster communications
- Is an Internet standard
- Made of a set of cryptographic protocols for:
  - Securing packet flows
  - Key exchange
Notes
Security options
There are three ways that you can configure security in an HACMP cluster:
Connection authentication
Connection authentication is based on either the clcomd HACMP authentication
process or Kerberos. Kerberos is a network authentication protocol based on
a secret key encryption scheme; it is used only on SP systems.
Cluster communications
IPSec (IP security) is a standardized framework for securing Internet Protocol (IP)
communications by encrypting and/or authenticating each IP packet in a data stream.
There are two modes of IPSec operation: transport mode and tunnel mode.
In transport mode only the payload (message) of the IP packet is encrypted. It is
fully routable since the IP header is sent as plain text; however, it cannot cross
Network Address Translation (NAT) interfaces, as this will invalidate its hash value.
Transport mode is used for host-to-host communications over a LAN.
In tunnel mode, the entire IP packet is encrypted. It must then be encapsulated into
a new IP packet for routing to work. Tunnel mode is used for network-to-network
communications (secure tunnels between routers) or host-to-network and
host-to-host communications over the Internet.
IPSec is implemented by a set of cryptographic protocols for (1) securing packet
flows and (2) Internet key exchange. Of the former, there are two:
- Authentication Header (AH), which provides authentication and integrity for the
payload (message) and IP header and, with some cryptographic algorithms, non-repudiation,
but does not offer confidentiality
- Encapsulating Security Payload (ESP), which provides data confidentiality and payload
(message) integrity and, with some cryptographic algorithms, authentication
Originally AH was used only for integrity and ESP was used only for encryption;
authentication functionality was added subsequently to ESP. Currently only one key
exchange protocol is defined, the IKE (Internet Key Exchange) protocol.
IPSec protocols operate at the network layer, layer 3 of the OSI model. Other
Internet security protocols in widespread use, such as SSL and TLS, operate from
the transport layer up (OSI layers 4-7). This makes IPSec more flexible, as it can be
used for protecting both TCP and UDP-based protocols, but increases its complexity
and processing overhead, as it cannot rely on TCP (layer 4 OSI model) to manage
reliability and fragmentation.
Message authentication and encryption
Message authentication and encryption rely on secret key technology. For
authentication, the message is signed and the signature is encrypted by a key when
sent, and the signature is decrypted and verified when received. For encryption, the
encryption algorithm uses the key to make data unreadable. The message is
encrypted when sent and decrypted when received.
You can enable message authentication alone, or both message authentication and
encryption.
Notes
How standard authentication works
The clcomd daemon authenticates each inbound session by checking the session's
source IP address against a list of addresses in /usr/es/sbin/cluster/etc/rhosts and the
addresses configured into the cluster itself (in other words, in the HACMPadapter and
HACMPnode ODM files). To defeat any attempt at IP spoofing (a very
timing-dependent technique that involves faking a session's source IP address), each
non-call-back session is checked by connecting back to the source IP address and
verifying who the sender is.
The action taken on a request depends on the state of the /usr/es/sbin/cluster/etc/rhosts file. If
a cluster node is being moved to a new cluster or if the entire cluster configuration is
being redone from scratch, it may be necessary to empty /usr/es/sbin/cluster/etc/rhosts
or manually populate it with the appropriate IP addresses for the new cluster.
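The list-membership part of this check can be illustrated with a few lines of shell. This is a sketch only: it covers neither the ODM lookup nor the call-back verification that clcomd performs, it assumes a file with one address per line, and the function name ip_in_list, the temporary file and the address 10.0.0.5 are hypothetical.

```shell
#!/bin/sh
# Sketch: is a source IP address present, as a whole line, in an address list?
# $1 = list file (one address per line), $2 = source IP address.
ip_in_list() {
    # -F fixed string, -x match the whole line, -q no output
    grep -Fqx -- "$2" "$1"
}

# Build a throwaway list instead of touching a real rhosts file.
list=${TMPDIR:-/tmp}/rhosts_sketch.$$
printf '%s\n' 192.168.15.29 192.168.16.29 > "$list"

for ip in 192.168.15.29 10.0.0.5; do
    if ip_in_list "$list" "$ip"; then
        echo "$ip: listed"
    else
        echo "$ip: not listed"
    fi
done

rm -f "$list"
```

Matching whole lines matters: a plain substring match would accept 192.168.15.2 because it is a prefix of 192.168.15.29.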
Notes
Setting up cluster communications over VPN
VPN support relies on the IP Security feature in AIX. There are a number of additional
filesets which need to be installed that are listed in the visual. Choose the desired
bos.msg.LANG.net.ipsec filesets, and the bos.crypto-priv fileset for your country.
You can configure VPNs in AIX using SMIT or Web-based System Manager. For more
information about VPNs, you can go to
http://www.ibm.com/servers/aix/products/ibmsw/security/vpn/techref/
An ESP host-to-host transport VPN tunnel over the persistent address network is
recommended in this example. The topic of IPSec itself is well beyond the scope of this
course. Further reading and education can be found in the AIX 5L Version 5.3 Security
Guide and the AU42 Security course.
Notes
Configure HACMP to use VPN tunnels
Once the tunnel has been created, you need to instruct HACMP to use it with the SMIT
menu shown in the visual. You will select Yes for the field Use Persistent Labels for
VPN Tunnels, and then synchronize the cluster.
Notes
Additional IP security
Optionally, you can configure IP filter rules to implicitly deny port 6191 across the
HACMP boot IP networks. To do this, add an IP security filter rule that denies access to
the port for boot IP addresses.
HACMP Message Authentication and Encryption (1 of 3)
1. Install rsct.crypt.<symmetric crypt algorithm>
   DES:   Low encryption
   3DES:  Medium encryption
   AES:   High encryption
smitty hacmp -> C-SPOC -> Security and Users -> HACMP Cluster
Security -> Configure Message Authentication Mode and Key
Management
Notes
Cluster Security Services
Message authentication and encryption rely on Cluster Security (CtSec) Services in
AIX, and use the encryption keys available from Cluster Security Services. HACMP
message authentication uses message digest version 5 (MD5) to create the digital
signatures for the message digest. Message authentication uses the following types of
keys to encrypt and decrypt signatures and messages (if selected):
- Data encryption standard (DES)
- Triple DES
- Advanced encryption standard (AES)
The message authentication mode is based on the encryption algorithm. Your selection
of a message authentication mode depends on the security requirements for your
HACMP cluster.
Prerequisites
The HACMP product does not include encryption libraries. Before you can use
message authentication and encryption, the following AIX 5L filesets must be installed
on each cluster node:
- For data encryption with DES message authentication: rsct.crypt.des
- For data encryption standard Triple DES message authentication: rsct.crypt.3des
- For data encryption with Advanced Encryption Standard (AES) message
authentication: rsct.crypt.aes256
You can install these filesets from the AIX 5L Expansion Pack CD-ROM.
If you install the AIX 5L encryption filesets after you have HACMP running, restart the
Cluster Communications daemon to enable HACMP to use these filesets. To restart the
Cluster Communications daemon:
stopsrc -s clcomdES
startsrc -s clcomdES
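The prerequisite check described above can be sketched in Python. This is only an illustration: the helper names (required_fileset, missing_filesets) and the way the installed filesets are supplied are assumptions, not part of HACMP. The fileset names and the restart commands are taken from the text above.

```python
from typing import Dict, Mapping, Set

# Fileset required for each encryption algorithm, as listed in the notes above.
FILESET_FOR_ALGORITHM: Dict[str, str] = {
    "des": "rsct.crypt.des",
    "3des": "rsct.crypt.3des",
    "aes": "rsct.crypt.aes256",
}

# Commands from the notes for restarting the Cluster Communications daemon
# after the filesets are installed on a running cluster.
RESTART_CLCOMD = ["stopsrc -s clcomdES", "startsrc -s clcomdES"]


def required_fileset(algorithm: str) -> str:
    """Return the fileset name needed for the given algorithm."""
    key = algorithm.strip().lower()
    if key not in FILESET_FOR_ALGORITHM:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return FILESET_FOR_ALGORITHM[key]


def missing_filesets(algorithm: str,
                     installed_by_node: Mapping[str, Set[str]]) -> Dict[str, str]:
    """Hypothetical helper: report nodes that lack the required fileset.

    installed_by_node maps a node name to the set of fileset names found on
    that node (how that set is collected is outside this sketch).
    """
    needed = required_fileset(algorithm)
    return {node: needed
            for node, installed in installed_by_node.items()
            if needed not in installed}
```

Every node must have the fileset, so an empty result from missing_filesets is the condition to look for before enabling message authentication.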
Message Authentication and Encryption (2 of 3)
4. If you trust your network then HACMP can distribute the secure key.
Enable key distribution on all nodes:
# clkeygen e Enabled
0513-077 Subsystem has been changed.
0513-044 The clcomdES Subsystem was requested to stop.
0513-059 The clcomdES Subsystem has been started. Subsystem PID is 315598.
The key distribution was Enabled
Notes
Managing keys
HACMP cluster security uses a shared common (symmetric) key. This means that each
node must have a copy of the same key for inter-node communications to be
successful. You control when keys change and how keys are distributed.
The steps above show only the commands for enabling message authentication and
encryption, assuming that we trust our network in allowing HACMP to automatically
distribute the key. SMIT can also be used to accomplish this, but using the command
line is much easier. In the lab you will explore both automatic and manual key
distribution methods.
If you want HACMP to distribute keys automatically, you have to enable key distribution
on each node, as shown in the visual (or using the Extended Configuration ->
Security and Users Configuration -> HACMP Cluster Security -> Configure
Message Authentication Mode and Key Management -> Enable/Disable
Automatic Key Distribution SMIT path).
Copyright IBM Corp. 2007
Message Authentication and Encryption (3 of 3)
6. Set HACMP to use Message Authentication and Encryption
# clchclstr -m 'md5_3des' e
Cluster Name: myapp_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: md5_3des
Cluster Message Encryption: Enabled
Use Persistent Labels for Communication: No
Notes
Configure HACMP and synchronize
Either before or after you create, distribute, and activate keys on all nodes, you must
configure HACMP to use message authentication and encryption. You can do this with
the command shown in the visual, or from the Extended Configuration -> Security
and Users Configuration -> HACMP Cluster Security -> Configure
Message Authentication Mode and Key Management -> Configure Message
Authentication Mode SMIT menu path.
After you configure HACMP to use message authentication and encryption,
synchronize the cluster.
Key maintenance
It may be necessary to periodically create, distribute and activate new keys to satisfy
your security requirements.
Notes
The bigger picture
A holistic approach to security is a general approach to system hardening. There are
numerous security configuration settings in any operating system, and mastering all of
them is no small task. A good start is reading the AIX Security Guide. It describes
settings and tools for securing the operating system and network, and also describes
the AIX Security Expert, available in AIX 5.3.
Checkpoint (1 of 2)
1. Which daemon, which uses the
/usr/es/sbin/cluster/etc/rhosts file for authentication, do
most inter-node communications use:
a. RSCT
b. clcomd
c. SNMP
d. clinfo
2. True or False: HACMP supports two connection
authentication methods, Standard and Kerberos.
3. True or False: Use of VPN tunnels for cluster
communications requires that nodes are configured
with persistent IP labels, and that HACMP is configured
to use them.
Copyright IBM Corporation 2007
Checkpoint (2 of 2)
4. True or False: You can enable message encryption
without enabling message authentication.
5. Which of the following is TRUE about configuring
message encryption in HACMP:
a. It is a simple, one-step process
b. It only requires AIX base install and HACMP filesets
c. It requires installing rsct.crypt and performing tasks
on all nodes to enable and implement key
distribution and activation
d. It can only be configured on the command line
6. True or False: AIX Security Expert provides automatic
configuration of security settings, including those for
TCP, NET, IPSEC, system, and auditing.
Unit Summary
Key points from this Unit:
There are several security options that can be configured for HACMP
  connection authentication method, VPN tunnels, and message authentication and encryption
Connection authentication methods
  Standard, Kerberos
VPN tunnels
  Requires AIX IP security filesets
  Use persistent labels for VPN tunnels
Message authentication and encryption
  Requires rsct.crypt
  Enable distribution, create keys, distribute and activate them
Keep the big picture in mind and use a holistic approach to security, or cluster security won't make a difference
Checkpoint Solutions
1.
2.
3.
4.
True or False: A Resource may belong to more than one Resource group.
5.
Unit 2
Checkpoint Solutions (1 of 3)
1. Which of the following statements is TRUE (pick the best answer)?
a. Static application data should always reside on private storage.
b. Dynamic application data should always reside on shared
storage.
c. Shared storage must always be simultaneously accessible in
read-write mode to all cluster nodes.
d. Application binaries should only be placed on shared storage.
2. True or False?
Using RSCT-based shared disk protection results in slower
fallovers.
3. Which of the following disk technologies are supported by
HACMP?
a. SCSI
b. SSA
c. FC
d. All of the above
Unit 2
Checkpoint Solutions (2 of 3)
4. True or False?
You should check the vendor's website for supported
HACMP configurations when using SAN-based storage units
(DS8000, ESS, EMC, HDS, and so forth).
5. True or False?
hdisk numbers must map to the same PVIDs across an entire
HACMP cluster.
6. True or False?
Lazy update attempts to keep VGDA constructs in sync
between cluster nodes (reserve/release-based shared
storage protection)
7. Which of the following commands will bring a volume group
named vgA online?
a. mountvg vgA
b. getvg vgA
c. attachvg vgA
d. varyonvg vgA
Unit 2
Checkpoint Solutions (3 of 3)
8. True or False?
Quorum should always be disabled on shared volume groups.
9. True or False?
File system and logical volume attributes cannot be changed while
the cluster is operational.
10. True or False?
An enhanced concurrent volume group is required for the
heartbeat over disk feature.
Unit 3
3.
True or False?
clstat does not require clinfoES.
Unit 3
1. True or False?
Creating a third resource group on a cluster that has only one IP
network with two interfaces on each node requires using IPAT via
aliasing.
2. True or False?
It is NOT possible to add a node while HACMP is running.
3. You've decided to add a third node to your existing two-node HACMP
cluster. What very important step follows adding the node definition to
the cluster configuration (whether through standard or extended path)?
a. Install HACMP software
b. Configure a non-IP network
c. Start Cluster Services on the new node
d. Add a resource group for the new node
4. What should you do first when removing a node from a cluster?
a. Uninstall HACMP software
b. Move (or take offline) any resource groups online on the node
c. Remove the node's IP address from the rhosts file
Unit 3
1. True or False?
Using C-SPOC reduces the likelihood of an outage by reducing the
likelihood that you will make a mistake.
2. True or False?
C-SPOC reduces the need for a change management process.
3. C-SPOC cannot do which of the following administration tasks?
a. Add a user to the cluster.
b. Change the size of a filesystem.
c. Add a physical disk to the cluster.
d. Add a shared volume group to the cluster.
e. Synchronize existing passwords.
f. None of the above. (e was correct for previous versions)
4. True or False?
It does not matter which node in the cluster is used to initiate a
C-SPOC operation.
5. True or False?
Priority Override Location behavior changed in HACMP V5.4 to
prevent actions that conflict with desired resource group fallback
behavior.
Unit 3
Checkpoint Solutions
1. True or False?
A star configuration is a good choice for your non-IP networks.
2. True or False?
RSCT will automatically update /etc/filesystems when using
enhanced concurrent mode volume groups.
3. True or False?
With HACMP V5.4, a resource group's priority override location can
be cancelled by selecting a destination node of
Restore_Node_Priority_Order.
4. You want to create an enhanced concurrent mode volume group that
will be used in a resource group that will have an Online on Home
Node Startup policy. Which C-SPOC menu should you use?
a. HACMP Logical Volume Management
b. HACMP Concurrent Logical Volume Management
5. You want to add a logical volume to the volume group you created in
the question above. Which C-SPOC menu should you use?
a. HACMP Logical Volume Management
b. HACMP Concurrent Logical Volume Management
Unit 4
Checkpoint Solutions (1 of 2)
1. Which daemon, which uses the
/usr/es/sbin/cluster/etc/rhosts file for authentication, do
most inter-node communications use:
a. RSCT
b. clcomd
c. SNMP
d. clinfo
2. True or False: HACMP supports two connection
authentication methods, Standard and Kerberos.
3. True or False: Use of VPN tunnels for cluster
communications requires that nodes are configured with
persistent IP labels, and that HACMP is configured to
use them.
Unit 4
Checkpoint Solutions (2 of 2)
4. True or False: You can enable message encryption
without enabling message authentication.
5. Which of the following is TRUE about configuring
message encryption in HACMP:
a. It is a simple, one-step process
b. It only requires AIX base install and HACMP filesets
c. It requires installing rsct.crypt and performing
tasks on all nodes to enable and implement key
distribution and activation
d. It can only be configured on the command line
6. True or False: AIX Security Expert provides automatic
configuration of security settings, including those for
TCP, NET, IPSEC, system, and auditing.
Appendix B
Checkpoint Solutions
1.
True or False?
2.
a.
b.
c.
d.
3.
a.
b.
4.
True or False?
5.
True or False?
Appendix C
Checkpoint Solutions
1.
True or False?
In HACMP 5.4, the configuration of WebSMIT is simplified by a new
utility (websmit_config) that configures WebSMIT to be independent
of the system-wide Web server configuration.
2.
True or False?
The /usr/es/sbin/cluster/wsm/README file describes the use of the
websmit_config utility.
3.
True or False?
Only HACMP SMIT panels can be accessed using Web SMIT.
4.
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4861-09 HACMP for AIX, Version 5.4 Planning Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html
HACMP manuals
Unit Objectives
After completing this unit, you should be able to:
Explain the concepts of Network File System (NFS)
Configure HACMP to support NFS
Discuss why Volume Group major numbers must be unique
when using NFS with HACMP
Outline the NFS configuration parameters for HACMP
Notes
Objectives
In this unit, we examine how NFS can be integrated into HACMP in order to provide a
highly available Network File System.
[Figure: NFS server and clients. Labels: NFS Server, JFS mount, NFS mount, read-write, read-only]
Notes
NFS
NFS is a suite of protocols which allow file sharing across an IP network. An NFS server
is a provider of file service (that is, a file, a directory or a file system). An NFS client is a
recipient of a remote file service. A system can be both an NFS client and server at the
same time.
[Figure: NFS processes. Labels: n x biod, /etc/exports, /etc/filesystems]
Notes
NFS processes
The NFS server uses a process called mountd to allow remote clients to mount a local
disk or CD resource across the network. One or more nfsd processes handle I/O on the
server side of the relationship.
The NFS client uses the mount command to establish a mount to a remote storage
resource which is offered for export by the NFS server. One or more block I/O
daemons, biod, run on the client to handle I/O on the client side.
The server maintains details of data resources offered to clients in the /etc/exports file.
Clients can automatically mount network file systems using the /etc/filesystems file.
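[Figure: cluster nodes Bondar and Hudson with resource group A. The node holding A runs "# mount /fsa" and "export /fsa" behind the aservice service IP label; the client runs "# mount aservice:/fsa /a".]
The A resource group specifies:
  aservice as a service IP label resource
  /fsa as a filesystem resource
  /fsa as an NFS filesystem to export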
Notes
Combining NFS with HACMP
We can combine NFS with HACMP in order to achieve a Highly Available Network File
System. One node in the cluster mounts the disk resource locally and offers that disk
resource for export across the IP network. Clients optionally mount the disk resource. A
second node is configured to take over the NFS export in the event of node failure.
There is one unusual aspect to the above configuration which should be discussed. The
HACMP cluster is exporting the /fsa file system via the aservice service IP label. The
client is mounting the aservice:/fsa file system on the local mount point /a. This is
somewhat unusual in the sense that client systems usually use a local mount point
which is the same as the NFS file system's name on the server.
In the configuration shown above, there is no particularly good reason why the client is
using a different mount point than /fsa and, in fact, the client is free to use whatever
mount point it wishes to use including, of course, /fsa. Why this example is using a
local mount point of /a will become clear shortly.
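[Figure: fallover between nodes Bondar and Hudson. The takeover node runs "# mount /fsa" and "export /fsa" for resource group A behind the aservice service IP label; the client system "sees" /fsa as /a through "# mount aservice:/fsa /a".]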
Notes
Fallover
If the node offering the NFS export should fail, a standby node takes over the shared
disk resource, locally mounts the file system, and exports the file system or directory for
remote mount.
If the client was not accessing the disk resource during the period of the fallover, then it
is not aware of the change in which node is serving the NFS export.
Note that the aservice service IP label is in the resource group which is exporting /fsa.
The HACMP NFS server support requires that resource groups which export NFS
filesystems be configured to use IPAT since the client system is not capable of dealing
with two different IP addresses for its NFS server depending on which node the NFS
server service happens to be running on.
                                                    [Entry Fields]
Volume Groups                                       [aaavg]  +
Use forced varyon of volume groups, if necessary    false    +
Automatically Import Volume Groups                  false    +
[]
fsck
sequential
true
[/fsa]
+
+
[]
[]
+
+
+
+
+
[MORE...10]
F1=Help     F2=Refresh    F3=Cancel   F4=List
F5=Reset    F6=Command    F7=Edit     F8=Image
F9=Shell    F10=Exit      Enter=Do
Notes
Configuring NFS for high availability
The visual shows the resource group attributes which are important for configuring an
NFS file system.
- Filesystems/Directories to Export
Specifies the filesystems to be NFS exported.
- Filesystems mounted before IP configured
When implementing NFS support in HACMP, you should also set this option. This
prevents access from a client before the filesystems are ready.
- Filesystem (empty is ALL for VGs specified)
This particular example also explicitly lists the /fsa filesystem as a resource to be
included in the resource group (see the Filesystem (empty is ALL for VGs specified)
field). This is not necessary as this field could have been left blank to indicate that all
the filesystems in the aaavg volume group should be treated as resources within the
resource group.
[Figure: cross-mounting. Labels: aservice, /fsa, /a (on each node)]
Notes
Cross-mounting
We can use HACMP to mount an NFS exported filesystem locally on all the nodes
within the cluster. This allows two or more nodes to have access to the same disk
resource in parallel. An example of such a configuration might be a shared repository
for the product manuals (read only) or a shared /home filesystem (read-write). One
node mounts the filesystem locally, then exports the filesystem. All nodes within the
resource group then NFS mount the filesystem.
By having all nodes in the resource group act as an NFS client including the node which
holds the resource group, it is not necessary for the takeover node to unmount the
filesystem before becoming the NFS server.
[Figure: fallover with a cross-mounted file system. Labels: aservice, /fsa, /a (on each node)]
Notes
Fallover with a cross-mounted file system
If the left-hand node fails, HACMP on the right-hand node initiates a fallover of the
resource group. This primarily consists of:
- Assigning or aliasing (depending on which flavor of IPAT is being used) the
aservice service IP label to a NIC
- Varying on the shared volume group and mounting the /fsa journaled filesystem
- NFS exporting the /fsa filesystem
Note that the right-hand node already has the aservice:/fsa filesystem NFS mounted
on /a.
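[Figure: cross-mounting details for nodes Bondar and Hudson and resource group A. The node holding A runs "# mount /fsa" and "export /fsa" behind the aservice service IP label; both nodes run "# mount aservice:/fsa /a".]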
Notes
Cross-mounting details
The key change, compared to the configuration which did not use cross-mounting, is
that this configuration's resource group lists /fsa as an NFS filesystem and specifies that
it is to be mounted on /a. This causes every node in the resource group to act as an
NFS client with aservice:/fsa mounted at /a. Only the node which actually has the
resource group is acting as an NFS server for the /fsa filesystem.
[Figure: cluster with two IP networks. Labels: aGservice, aservice, export /fsa, /fsa. The node holding the resource group runs "# mount /fsa"; Bondar and Hudson each run "# mount aservice:/fsa /a".]
Notes
Network for NFS mount
HACMP allows you to specify which network should be used for NFS exports from this
resource group.
In this scenario, we have an NFS cross-mount within a cluster which has two IP
networks. The cluster administrator has decided to force the cross-mount traffic to
flow over the net_ether_01 network, probably because that network uses a faster
networking technology or carries a lighter load.
This field is relevant only if you have filled in the Filesystems/Directories to NFS
Mount field. The Service IP Labels/IP Addresses field should contain a service label
which is on the network you select.
If the network you have specified is unavailable when a node attempts the NFS
mount, the node will seek other defined, available IP networks in the cluster on which to
establish the NFS mount.
                                                    [Entry Fields]
Volume Groups                                       [aaavg]  +
Use forced varyon of volume groups, if necessary    false    +
Automatically Import Volume Groups                  false    +
[]
fsck
+
+
true
[/fsa]
+
+
[/a;/fsa]
[net_ether_01] +
[MORE...10]
F1=Help     F2=Refresh    F3=Cancel   F4=List
F5=Reset    F6=Command    F7=Edit     F8=Image
F9=Shell    F10=Exit      Enter=Do
Notes
Configuring HACMP for cross-mounting
The directory or directories to be cross-mounted are specified in the
Filesystems/Directories to NFS Mount field. The network to be used for NFS
cross-mounts is optionally specified in the Network for NFS Mount field.
Cross-mount syntax
Note the unusual /a;/fsa syntax for specifying the directory to be
cross-mounted. This syntax is explained in the next foil.
Note that the resource group must include a service IP label which is on the
net_ether_01 network (aservice in the previous foil).
/a;/fsa
# mount aservice:/fsa /a
Notes
Syntax for specifying cross-mounts
The inclusion of a semi-colon in the Filesystems/Directories to NFS Mount field
indicates that the newer (and easier to work with) approach to NFS cross-mounting
described in this unit is in effect. The local mount point to be used by all the nodes in the
resource group when they act as NFS clients is specified before the semi-colon. The
NFS filesystem which they are to NFS mount is specified after the semi-colon.
Since the configuration specified in the last HACMP smit screen uses net_ether_01 for
cross-mounts and the service IP label on the net_ether_01 network is aservice (see
the diagram a couple of foils back showing the two IP networks), each node in the
resource group will mount aservice:/fsa on their local /a mount point directory.
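The semi-colon syntax can be illustrated with a small Python sketch. It only shows how an entry such as /a;/fsa splits into a local mount point and an exported filesystem, and which mount command that corresponds to on each node. The function names are hypothetical; HACMP performs this mount itself.

```python
from typing import Tuple


def parse_cross_mount(entry: str) -> Tuple[str, str]:
    """Split a 'local_mount_point;exported_filesystem' entry.

    The part before the semi-colon is the local mount point used by every
    node in the resource group; the part after it is the NFS filesystem.
    """
    local, sep, exported = entry.strip().partition(";")
    if not sep or not local.startswith("/") or not exported.startswith("/"):
        raise ValueError(f"not in 'local;exported' form: {entry!r}")
    return local, exported


def client_mount_command(entry: str, service_label: str) -> str:
    """Build the equivalent NFS client mount command for illustration."""
    local, exported = parse_cross_mount(entry)
    return f"mount {service_label}:{exported} {local}"


if __name__ == "__main__":
    # Values from the example in this unit.
    print(client_mount_command("/a;/fsa", "aservice"))
```

With the values used in this unit, the sketch produces the same command shown in the visual: mount aservice:/fsa /a.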
system
system
system
201,
203,
205,
The command lvlstmajor will list the available major numbers for each
node in the cluster
For example:
# lvlstmajor
43...200,202,206...
The VG major number may be set at the time of creating the VG using SMIT
mkvg or by using the -V flag on the importvg command, for example:
# importvg -V100 -y shared_vg_a hdisk2
C-SPOC will "suggest" a VG major number which is unique across the nodes
when it is used to create a shared volume group
Notes
VG major numbers
Volume group major numbers must be the same for any given volume group across all
nodes in the cluster. This is a requirement for any volume group that has filesystems
which are NFS exported to clients (either within or without the cluster).
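Choosing a major number that is free on every node can be sketched as follows. The sketch assumes that lvlstmajor output has the form shown in the visual (43...200,202,206...), where a trailing run of dots means "and everything above". The upper bound of 512 and the function names are assumptions made for the illustration only.

```python
import re
from typing import Iterable, Optional, Set

# One token of the listing: "202", "43...200", or an open-ended "206...".
_TOKEN = re.compile(r"(\d+)(\.{2,3})?(\d+)?")


def free_majors(listing: str, upper: int = 512) -> Set[int]:
    """Expand one node's lvlstmajor-style listing into a set of numbers.

    'upper' caps open-ended ranges; it is an arbitrary assumption.
    """
    free: Set[int] = set()
    for token in listing.strip().split(","):
        token = token.strip()
        if not token:
            continue
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        start = int(match.group(1))
        if match.group(2) is None:
            end = start                    # single number
        elif match.group(3) is None:
            end = upper                    # open-ended range
        else:
            end = int(match.group(3))      # closed range
        free.update(range(start, end + 1))
    return free


def lowest_common_free_major(listings: Iterable[str],
                             upper: int = 512) -> Optional[int]:
    """Return the lowest major number free on every node, or None."""
    common: Optional[Set[int]] = None
    for listing in listings:
        node_free = free_majors(listing, upper)
        common = node_free if common is None else common & node_free
    return min(common) if common else None
```

The result could then be passed to importvg with the -V flag on each node, as in the example in the visual. C-SPOC makes a similar suggestion automatically when it creates a shared volume group.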
Notes
HACMP exports file
As mentioned in the visual, if you need to specify NFS options, you must use the
HACMP exports file, not the standard AIX exports file. You can use AIX smit mknfsexp
to build the HACMP exports file:
Add a Directory to Exports List
* PATHNAME of directory to export                 []       /
* MODE to export directory                        read-write
  HOSTS & NETGROUPS allowed client access         []
  Anonymous UID                                   [-2]
  HOSTS allowed root access                       []
  HOSTNAME list. If exported read-mostly          []
  Use SECURE option?                              no       +
  Public filesystem?                              no       +
* EXPORT directory now, system restart or both    both     +
  PATHNAME of alternate Exports file              [/usr/es/sbin/cluster/etc/exports]
Checkpoint
1.
True or False?
2.
a.
b.
4.
True or False?
True or False?
Unit Summary
Key points from this unit:
HACMP provides a means to make Network File System (NFS) highly available
Faster takeover: takeover node does not have to unmount the file
system
A preferred network can be selected
Really only for read only file systems: NFS cross-mounted file systems
can be mounted read-write, but concurrent write attempts will produce
inconsistent results
Use GPFS for true concurrent access
Non-default export options can be specified in /usr/es/sbin/cluster/etc/exports
References
SC23-5209-00 HACMP for AIX, Version 5.4 Installation Guide
SC23-4864-09 HACMP for AIX, Version 5.4 Concepts and Facilities Guide
SC23-4862-09 HACMP for AIX, Version 5.4 Administration Guide
SC23-5177-03 HACMP for AIX, Version 5.4 Troubleshooting Guide
SC23-4867-08 HACMP for AIX, Version 5.4 Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html
HACMP manuals
Unit Objectives
After completing this unit, you should be able to:
Configure and use WebSMIT
Notes:
Introduction
WebSMIT combines the advantages of SMIT with the ease of access from any system
which runs a browser.
For those looking for a graphical interface for managing and monitoring HACMP,
WebSMIT provides those capabilities via a web browser. It provides real-time graphical
status of the cluster components, similar to clstat.cgi. Context menus on those
components launch a WebSMIT menu containing the available actions. There are
multiple views, including Node-by-node, Resource Group, Associations, and
component Details.
Configuration
WebSMIT uses SNMP, so the SNMP interface to the Cluster Manager must be
working. To test it, run the cldump command on the system where you will be running
WebSMIT. A configuration utility (websmit_config) is provided. It requires only that a
supported HTTP server be installed in order to configure the system as a WebSMIT
server. A control tool, websmitctl, is also provided to control the HTTP server.
Notes:
Introduction
To connect to WebSMIT, point your browser to the cluster node that you have
configured for WebSMIT.
WebSMIT uses port 42267 by default.
After authentication, this will be the first screen that you see. Note the Navigation Frame
(left side) and the Activity Frame (right side). Also, note that we're looking at
configuration options only. Each pane is tabbed to provide access to different status,
functions or controls.
Navigation Frame Tabs
- SMIT - access to HACMP SMIT
- N&N - a node-by-node relationship and status view of the cluster (if snmp can get
cluster information)
- RGs - a resource group relationship and status view of the cluster
Expand All / Collapse All links can be used to get the full view or clean up the view.
Activity Frame Tabs
- Configuration - permanent access to HACMP SMIT from Activity Frame
- Details - comes to top when a component is selected in Navigation Frame, and
displays configuration information about the component
- Associations - shows component relationship to other HACMP components for
component that is selected in the Navigation Frame
- Doc - If the HACMP pubs were installed (html or pdf version), this tab will display
links to access them
Don't attempt to navigate using the browser's Back/Forward buttons. Note the FastPath
box at the bottom of the Configuration Tab. This allows you to go directly to any SMIT
panel (HACMP or other) if you know the fastpath.
Notes:
Using the context menus
Right-click the object in the Navigation Frame. Choose the item you want to control from
the context menu and watch the Activity Frame change to the task you're trying to
perform. Remember this is still SMIT, so you'll get HACMP SMIT menus as a result of
the context menu selections.
Status
Notice that the icons (on the screen anyway) are color coded. This is real-time status.
More to come on the next visual, regarding the associations.
WebSMIT Associations
Notes:
Associations
To see associations, go to the RGs tab, select (left mouse click) a resource group, then
select the Associations tab.
If you don't click fast enough (or just pause long enough) between selecting the
resource group and clicking the Associations tab, you'll see the Details tab come to
the top of the Activity Frame with the configuration details of the resource group.
Notes:
Online documentation
This screen allows you to view the HACMP manuals in either HTML or PDF format. You
must install the HACMP documentation filesets.
WebSMIT Configuration
/usr/es/sbin/cluster/wsm/README
Setting up WebSMIT online documentation
Install cluster.doc.en_US.es.html and cluster.doc.en_US.es.pdf
Security considerations
wsm_smit.conf
wsm_cmd_exec
Log files
wsm_smit.log
wsm_smit.script
Notes:
Documentation
The primary source for information on configuring WebSMIT is the WebSMIT README
file as shown in the visual. The HACMP Installation Guide provides some additional
information on installation and the HACMP Administration Guide provides information
on using WebSMIT.
Web server
To use WebSMIT, you must configure one (or more) of your cluster nodes as a Web
server. You must use either IBM HTTP Server (IBMIHS) V6.0 (or later) or Apache 1.3
(or later). Refer to the specific documentation for the Web server you choose.
This configuration is done using the websmit_config utility, located in
/usr/es/sbin/cluster/wsm. See the README file for details.
WebSMIT security
Since WebSMIT gives you root access to all the nodes in your cluster, you must
carefully consider the security implications.
WebSMIT uses a configuration file, wsm_smit.conf, that contains settings for
WebSMIT's security related features. This file is installed as
/usr/es/sbin/cluster/wsm/wsm_smit.conf, and it may not be moved to another
location. The default settings used provide the highest level of security in the default
AIX/Apache environment. However, you should carefully consider the security
characteristics of your system before putting WebSMIT to use. It may be possible to use
different combinations of security settings for AIX, Apache, and WebSMIT to improve
the security of the application in your environment.
WebSMIT uses the following mechanisms to implement a secure environment:
- Non-standard port
- Secure http (https)
- User authentication
- Session time-out
- wsm_cmd_exec setuid program
gaining access.
(Refer to the documentation included with Apache for more details about Apache's
built-in authentication.)
The default value for REQUIRE_AUTHENTICATION is 1. If REQUIRE_AUTHENTICATION is
set, then the HACMP administrator must specify one or more users who are allowed to
access the system. This can be done using the wsm_smit.conf ACCEPTED_USERS
setting. Only users whose names are specified will be allowed access to WebSMIT, and
all ACCEPTED_USERS will be provided with root access to the system. By default, only the
root user is allowed access via the ACCEPTED_USERS setting.
Because AIX authentication mechanisms are in use, login failures can cause an account to
be locked. It is recommended that a separate user be created for the sole purpose of
accessing WebSMIT. If the root user has a login failure limit, failed WebSMIT login attempts
could quickly lock the root account.
Session time-out
Continued access to WebSMIT is controlled through the use of a non-persistent session
cookie. Cookies must be enabled in the client browser in order to use AIX
authentication for access control. If the session is used continuously, then the cookie
will not expire. However, the cookie is designed to time out after an extended period of
inactivity. WebSMIT allows the user to adjust the time-out period using the
wsm_smit.conf SESSION_TIMEOUT setting. This configuration setting must have a value
expressed in minutes. The default value for SESSION_TIMEOUT is 20 (minutes).
Controlling access to wsm_cmd_exec (setuid)
A setuid program is supplied with WebSMIT that allows non-root users to execute
commands with root permissions (wsm_cmd_exec). The setuid bit for this program must
be turned on in order for the WebSMIT system to function.
For security reasons, it is also very important that non-root users do not have read
permission on wsm_cmd_exec. A non-root user must not be able to copy the executable
to another location or to decompile the program.
Thus the utility wsm_cmd_exec (located in /usr/es/sbin/cluster/wsm/cgi-bin/) must be
set with 4511 permissions.
See the README for details.
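In the octal mode 4511, the leading 4 is the setuid bit, 5 gives the owner read and execute, and each 1 gives group and others execute only. The sketch below checks a mode value against those requirements. It is an illustrative helper (check_mode is a hypothetical name), and it assumes the file is owned by root, so that "non-owner" means "non-root".

```python
import stat

EXPECTED_MODE = 0o4511  # setuid, r-x for owner, --x for group and others


def check_mode(st_mode):
    """Return a list of problems found in the permission bits (empty if none)."""
    mode = stat.S_IMODE(st_mode)  # keep only the permission and setuid/setgid bits
    problems = []
    if not mode & stat.S_ISUID:
        problems.append("setuid bit is off")
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        problems.append("readable by non-owner")
    if mode != EXPECTED_MODE:
        problems.append("mode is %o, expected %o" % (mode, EXPECTED_MODE))
    return problems
```

On a live system the value would come from os.stat(path).st_mode for the wsm_cmd_exec path given above.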
Care must be taken to limit access to this executable. WebSMIT allows the user to
dictate the list of users who are allowed to use the wsm_cmd_exec program using the
wsm_smit.conf REQUIRED_WEBSERVER_UID setting. The real user ID of the process
must match the UID of one of the users listed in wsm_smit.conf in order for the
program to carry out any of its functionality. The default value for
REQUIRED_WEBSERVER_UID is nobody.
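The real-UID check can be pictured as follows. In a setuid program the effective UID is root, so it is the real UID that identifies the caller. This is a sketch of the idea only, not the logic of wsm_cmd_exec itself; uid_is_permitted is a hypothetical name, and the lookup parameter exists purely so the function can be exercised without real accounts.

```python
import pwd


def uid_is_permitted(real_uid, allowed_users, lookup=pwd.getpwnam):
    """Return True if real_uid belongs to one of the named users."""
    for name in allowed_users:
        try:
            if lookup(name).pw_uid == real_uid:
                return True
        except KeyError:
            continue  # unknown user name: it cannot match any caller
    return False
```

A call such as uid_is_permitted(os.getuid(), ["nobody"]) would compare the caller's real UID with that of the default REQUIRED_WEBSERVER_UID user.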
By default, a Web server CGI process runs as user nobody, and by default it is not
possible for non-root users to execute programs as user nobody. If your http server
Log files
All operations of the WebSMIT interface are logged to the wsm_smit.log file and are
equivalent to the logging done with smitty -v. Script commands are also captured in
the wsm_smit.script log file.
WebSMIT log files are created by the CGI scripts using a relative path of <../logs>. If
you copy the CGI scripts to the default location for the IBM HTTP Server, the final path
to the logs is /usr/IBMIHS/logs.
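The effect of the relative path can be worked through with a small sketch. It assumes the path is resolved against the directory that holds the CGI scripts, and that the IBM HTTP Server CGI directory is /usr/IBMIHS/cgi-bin (inferred from the log path above, not stated in the text). The function name log_dir is hypothetical.

```python
import posixpath


def log_dir(cgi_dir, relative="../logs"):
    """Resolve the relative log path against the CGI script directory."""
    return posixpath.normpath(posixpath.join(cgi_dir, relative))


# Default WebSMIT location versus scripts copied to the IBM HTTP Server tree
print(log_dir("/usr/es/sbin/cluster/wsm/cgi-bin"))
print(log_dir("/usr/IBMIHS/cgi-bin"))
```

The first call yields the default log location under /usr/es/sbin/cluster/wsm, the second the IBM HTTP Server location, which is why moving the scripts also moves the logs.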
The WebSMIT logs cannot be managed through the HACMP Log Viewing and Management
SMIT panel. Also, just like smit.log and smit.script, the files grow
indefinitely.
The snap -e utility captures the WebSMIT log files if you leave them in the default
location (/usr/es/sbin/cluster/wsm/logs); but if you install WebSMIT somewhere else,
snap -e will not find them.
- wsm_smit.redirect
Instead of simply rejecting access to a specific page, you can redirect the user to a
different page. The default .redirect file has entries to redirect the user from specific
HACMP SMIT panels that are not supported by WebSMIT.
Checkpoint
1. True or False?
In HACMP 5.4, the configuration of WebSMIT is simplified by a new utility
(websmit_config) that configures WebSMIT to be independent of the
system-wide Web server configuration.
2. True or False?
The /usr/es/sbin/cluster/wsm/README file describes the use of the
websmit_config utility.
3. True or False?
Only HACMP SMIT panels can be accessed using WebSMIT.
4. What file controls security settings for WebSMIT?
a. /usr/es/sbin/cluster/wsm/wsm_smit.conf
b. /usr/es/sbin/cluster/wsm/wsm_smit.redirect
c. /usr/es/sbin/cluster/wsm/wsm_smit.log
d. /usr/es/sbin/cluster/wsm/wsm_smit.script
Unit Summary
Key points from this unit:
- WebSMIT provides a graphical user interface for HACMP configuration,
  management, and monitoring from a browser
- It uses SNMP to provide information about the cluster
- It requires that a Web server be installed
- It uses port 42267 by default
- A configuration utility called websmit_config provides automatic
  configuration if Apache or IBM HTTP Server is installed
- The WebSMIT interface provides access to documentation if it is installed
- Security is configured in the /usr/es/sbin/cluster/wsm/wsm_smit.conf file
  - REDIRECT_TO_HTTPS, AUTHORIZED_PORT,
    REQUIRE_AUTHENTICATION, ACCEPTED_USERS