
MC91: High Availability for WebSphere MQ on UNIX platforms

Version 7

April 2008

WebSphere MQ Development

IBM Hursley

Property of IBM

Take Note!

Before using this document, be sure to read the general information under “Notices”.

Fourth Edition, April 2008

This edition applies to Version 7.0 of SupportPac MC91 and to all subsequent releases and modifications
unless otherwise indicated in new editions.

© Copyright International Business Machines Corporation 2000, 2008. All rights reserved. Note to US
Government Users—Documentation related to restricted rights—Use, duplication or disclosure is subject to
restrictions set forth in GSA ADP Schedule contract with IBM Corp.


Table of Contents
Take Note!
Table of Contents
Notices
Summary of Changes
  Trademarks
IMPORTANT: VERSIONS AND MIGRATION
Introduction
  Concepts
  Definition of the word “cluster”
  Functional Capabilities
Installation
  Installing the SupportPac
Configuration
  Step 1. Configure the HA Cluster
  Step 2. Configure the shared disks
  Step 3. Create the Queue Manager
  Step 4. Configure the movable resources
  Step 5. Configure the Application Server or Agent
  Step 6. Configure an Application Monitor
  Step 7. Removal of Queue Manager from Cluster
  Step 8. Deletion of Queue Manager
Upgrading WMQ software in a cluster
  Applying maintenance
Commands
  hacrtmqm
  halinkmqm
  hadltmqm
  hamqm_start
  hamqm_stop
  /MQHA/bin/rc.local
Working with other HA products
  Related products
Suggested Test
Appendix A. Sample Configuration Files for VCS
  types.cf
  main.cf
Appendix B. Messages produced by MQM agent for VCS


Notices
This report is intended to help the customer or IBM systems engineer configure WebSphere MQ (WMQ)
for UNIX platforms in a highly available manner using various High Availability products.

References in this report to IBM products or programs do not imply that IBM intends to make these
available in all countries in which IBM operates.

While the information may have been reviewed by IBM for accuracy in a specific situation, there is no
guarantee that the same or similar results will be obtained elsewhere.

The data contained in this report was determined in a controlled environment, and therefore results
obtained in other operating environments may vary significantly.

The following paragraph does not apply to any country where such provisions are inconsistent with local
law:

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS


IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain
transactions, therefore this statement may not apply to you.

References in this publication to IBM products, programs, or services do not imply that IBM intends to
make these available in all countries in which IBM operates. Any reference to an IBM product, program, or
service is not intended to state or imply that only that IBM product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any of the intellectual property
rights of IBM may be used instead of the IBM product, program, or service. The evaluation and verification
of operation in conjunction with other products, except those expressly designated by IBM, are the
responsibility of the user.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this one)
and (ii) the mutual use of the information which has been exchanged, should contact Laboratory Counsel,
Mail Point 151, IBM United Kingdom Laboratories, Hursley Park, Winchester, Hampshire SO21 2JN,
England. Such information may be available, subject to appropriate terms and conditions, including in
some cases, payment of a fee.

IBM may have patents or pending patent applications covering subject matter in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries,
in writing, to the IBM Director of Licensing, 500 Columbus Avenue, Thornwood, New York 10594,
U.S.A.


Summary of Changes
May 1997

• Version 1.0 Initial release of the HACMP SupportPac MC63

November 2000

• Version 2.0 Updated to reflect current versions of products

December 2005

• Version 6.0 Replacement of MC63, MC6A and MC6B with MC91 to combine HACMP,
MC/ServiceGuard and Veritas Cluster Server (VCS) documentation and scripts. Updated to
support WebSphere MQ V6. Version numbered to match current version of WMQ.

January 2007

• Version 6.0.1 Updated with various comments for clarity. Code changes for MC/ServiceGuard to
properly export environment variables. Code changes for VCS monitor script where queue
manager name includes a “.”. Added amqcrsta to the list of processes to kill.

April 2008

• Version 7.0 Updated for compatibility with WebSphere MQ V7. Removed migration script.

Trademarks
The following terms are trademarks of the IBM Corporation in the United States, or other countries, or
both:

o IBM
o MQ
o MQSeries
o AIX
o HACMP
o WebSphere MQ

The following are trademarks of Hewlett Packard in the United States, or other countries, or both:

o HP-UX
o Multi-Computer ServiceGuard (MC/ServiceGuard)

The following terms are trademarks of Sun Microsystems in the United States, or other countries, or both:

o Solaris

The following terms are trademarks of Symantec Corporation in the United States, or other countries, or
both:

o Veritas

UNIX is a registered trademark in the United States and other countries licensed exclusively through
X/Open Company Limited.


Other company, product, and service names, which may be denoted by a double asterisk (**), may be
trademarks or service marks of others.


IMPORTANT: VERSIONS AND MIGRATION


This SupportPac is designed for WebSphere MQ (WMQ) V7. In an attempt to simplify the scripts, which
had accumulated a lot of baggage from many previous versions of WMQ, the directories handled by the
halinkmqm and hadltmqm scripts have been reduced to only those required by WMQ V7.

Because there is little change between WMQ V6 and WMQ V7 in the areas relevant to HA, the scripts
can still be used with WMQ V6.

No script is provided in this package to automate migration from earlier versions of WMQ. Again this is
because of the variety of older versions that might have been in use, and because WMQ V5.3 is in any case
no longer a supported release. Providing meaningful error handling was considered too problematic while
keeping the scripts simple enough to understand. The only step required for V6 to V7 migration is
described here.

Ideally queue managers will be newly created or recreated for V7 using these scripts. SupportPac MS03
can be useful to rebuild definitions of objects from a queue manager. However, we recognise it is not an
ideal world! If you have a queue manager created with the V6 HA scripts and you wish to continue to use
them with WMQ V7 then there is just one small change in the directory layout that you need to implement
manually. This change should be done BEFORE upgrading the WMQ product version (or at least before
restarting a queue manager) as migration of queue manager data is done automatically during the first
restart after code upgrades. For older versions of WMQ, the approach we recommend for HA is to recreate
the queue manager.

To change an existing V6 HA queue manager into the V7 layout, you need to add the qmgrlocl
subdirectory to the local IPC directory for the queue manager, and create a symlink from the queue
manager’s data directory. The queue manager can be kept running through this modification, as it will not
be using the new directory until the product code is updated and the queue manager restarted.

On all nodes of the cluster, assuming you are running as root or mqm

mkdir /var/mqm/ipc/<QMGR>/qmgrlocl
chown mqm:mqm /var/mqm/ipc/<QMGR>/qmgrlocl
chmod 775 /var/mqm/ipc/<QMGR>/qmgrlocl

On the node that currently owns the queue manager data directory

ln -fs /var/mqm/ipc/<QMGR>/qmgrlocl /var/mqm/qmgrs/<QMGR>/qmgrlocl
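The steps above can be wrapped in a small script so that they are applied consistently on every node. This is only a sketch, not part of the SupportPac; the function name is invented, and the root directory is parameterised (it defaults to the standard /var/mqm). The chown is skipped if the mqm user does not exist (for example on a test machine), and the symlink is created only where the queue manager data directory is present, i.e. on the node that currently owns it.

```shell
#!/bin/sh
# Hypothetical helper for the manual V6 -> V7 layout change described above.
prepare_qmgrlocl() {
    qm=$1                     # queue manager name, e.g. ha.csq1
    root=${2:-/var/mqm}       # WMQ data root; /var/mqm is the standard location
    ipcdir="$root/ipc/$qm/qmgrlocl"

    # On all nodes: create the local IPC subdirectory with the usual ownership
    mkdir -p "$ipcdir"
    id mqm >/dev/null 2>&1 && chown mqm:mqm "$ipcdir"
    chmod 775 "$ipcdir"

    # Only on the node that owns the data directory: create the symlink
    if [ -d "$root/qmgrs/$qm" ]; then
        ln -fs "$ipcdir" "$root/qmgrs/$qm/qmgrlocl"
    fi
    return 0
}

# e.g. run "prepare_qmgrlocl ha.csq1" on every node in turn
```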


Introduction
Concepts
This SupportPac provides notes and sample scripts to assist with the installation and configuration of
WebSphere MQ (WMQ) V6 and V7 in High Availability (HA) environments. Three different platforms
and environments are described here, but they share a common design and this design can also be extended
for many other systems.

Specifically this SupportPac deals with the following HA products:

• HACMP (High Availability Cluster Multi Processing)

• Veritas Cluster Server (VCS)

• MC/ServiceGuard (MCSG)

The corresponding operating systems for which these HA products have been built are AIX, Solaris and
HP-UX respectively. There is a separate SupportPac MC41 that provides similar function for WMQ on
i5/OS, and support for Microsoft Cluster Services on Windows is built into the WMQ product. A section of
this document describes how designs for other platform/HA product combinations can be implemented. In
some cases, an HA product vendor might also include support for WMQ components, but we cannot
comment on their suitability.

This document shows how to create and configure WMQ queue managers such that they are amenable to
operation within an HA cluster. It also shows how to configure the HA product to take control of such
queue managers. This SupportPac does not include details of how to configure redundant power supplies,
redundant disk controllers, disk mirroring or multiple network or adapter configurations. The reader is
referred to the HA product’s documentation for assistance with these topics.

WMQ includes many functions to assist with availability. However, by using WMQ and HA products
together, it is possible to further enhance the availability of WMQ queue managers. With a suitably
configured HA cluster, it is possible for failures of power supplies, nodes, disks, disk controllers, networks,
network adapters or queue manager processes to be detected and automatically trigger recovery procedures
to bring an affected queue manager back online as quickly as possible. More information about WMQ’s
availability features can be found at

ibm.com/developerworks/websphere/library/techarticles/0505_hiscock/0505_hiscock.html

It is assumed that the reader of this document has already decided to use an HA cluster – we will not go
through the benefits of these systems again.

Definition of the word “cluster”


The word “cluster” has a number of different meanings within the computing industry. Throughout this
document, unless explicitly noted otherwise, the word “cluster” is used to describe an HA cluster, which is
a collection of nodes and resources (such as disks and networks) which cooperate to provide high
availability of services running within the cluster. It is worth making a clear distinction between such an
“HA cluster” and the use of the phrase “WMQ Cluster”, which refers to a collection of queue managers
which can allow access to their queues by other queue managers in the cluster. The relationship between
these two types of cluster is described in “Relationship to WMQ Clusters” later in this chapter.


Functional Capabilities
Cluster Configurations
This SupportPac can be used to help set up either standby or takeover configurations, including mutual
takeover where all cluster nodes are running WMQ workload. Throughout this document we try to use the
word “node” to refer to the entity that is running an operating system and the HA software; “system” or
“machine” or “partition” or “blade” might be considered synonyms in this usage.

A standby configuration is the most basic cluster configuration in which one node performs work whilst
the other node acts only as standby. The standby node does not perform work and is referred to as idle; this
configuration is sometimes called “cold standby”. Such a configuration requires a high degree of hardware
redundancy. To economise on hardware, it is possible to extend this configuration to have multiple worker
nodes with a single standby node, the idea being that the standby node can take over the work of any other
worker node. This is still referred to as a standby configuration and sometimes as an “N+1” configuration.

A takeover configuration is a more advanced configuration in which all nodes perform some kind of work
and critical work can be taken over in the event of a node failure. A “one sided takeover” configuration is
one in which a standby node performs some additional, non critical and non movable work. This is rather
like a standby configuration but with (non critical) work being performed by the standby node. A “mutual
takeover” configuration is one in which all nodes are performing highly available (movable) work. This
type of cluster configuration is also sometimes referred to as “Active/Active” to indicate that all nodes are
actively processing critical workload.

With the extended standby configuration or either of the takeover configurations it is important to consider
the peak load which may be placed on any node which can take over the work of other nodes. Such a node
must possess sufficient capacity to maintain an acceptable level of performance.

Cluster Diagram

[Figure: a two-node HA cluster. Node A and Node B are connected by a public network (e.g. ethernet) and
a private network (e.g. serial), and each has internal disks and adapters. Each node runs a highly available
queue manager (QMgr1 on Node A, QMgr2 on Node B) with its own service address; each queue
manager’s data resides on shared disks managed by the cluster, and each queue manager, its service
address and its disks can migrate to the other node. Remote WebSphere MQ clients and queue managers
connect over the public network. The cluster could also have additional nodes, public and private
networks, network adapters, disks and disk controllers.]


WMQ Monitoring
This SupportPac includes a monitor for WMQ, which will allow the HA product to monitor the health of
the queue manager and initiate recovery actions that you configure, including the ability to restart the queue
manager locally or move it to an alternate system.

Relationship to WMQ Clusters


WMQ Clusters reduce administration and provide load balancing of messages across instances of cluster
queues. They also offer higher availability than a single queue manager, because following a failure of a
queue manager, messaging applications can still access surviving instances of a cluster queue. However,
WMQ Clusters alone do not provide automatic detection of queue manager failure and automatic triggering
of queue manager restart or failover. HA clusters provide these features. The two types of cluster can be
used together to good effect.

WMQ Clients
WMQ Clients which are communicating with a queue manager that may be subject to a restart or takeover
should be written to tolerate a broken connection and should repeatedly attempt to reconnect. WebSphere
MQ Version 7 introduces features in the processing of the Client Channel Definition Table that assist with
connection availability and workload balancing; however these are not directly relevant when working with
a failover system.

The Extended Transactional Client, which allows a WMQ Client to participate in two-phase transactions,
must always connect to the same queue manager and cannot use techniques such as an IP load-balancer to
select from a list of queue managers. When an HA product is used, a queue manager maintains its identity
(name and address) whichever node it is running on, so the ETC can be used with queue managers that are
under HA control.


Installation
The operating system, the HA product and WebSphere MQ should already be installed on all nodes in the
cluster, using the normal procedures. You should install WMQ onto local disks on each of the nodes
(these might be on a SAN, provided they appear as local filesystems) and not attempt to share a single
installation on shared disks. It is important that under normal operating conditions you are running identical
versions of software on all cluster nodes; the only exception to this is during a rolling upgrade.

When installing WMQ, ignore the advice in the WMQ documentation about creating separate /var/mqm
and /var/mqm/log filesystems. This is not the preferred configuration in an HA environment. See under
“Chapter 3. Configuration” for more details.

When installing WMQ in a cluster, it is essential that the “mqm” username and “mqm” groupname have
been created and each have the same numeric value on all of the cluster nodes.
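One way to check this is a small helper, run on each node, that prints the numeric IDs for comparison by eye or by collecting the output centrally. The function below is an illustration, not part of the SupportPac.

```shell
#!/bin/sh
# Print a user's numeric uid and primary gid so the values can be compared
# across cluster nodes; typically run as "check_ids mqm" on every node.
check_ids() {
    u=$1
    if ! id "$u" >/dev/null 2>&1; then
        echo "$u: no such user"
        return 1
    fi
    echo "$u uid=$(id -u "$u") gid=$(id -g "$u")"
}

# e.g. run "check_ids mqm" on every node and verify the output is identical
```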

Installing the SupportPac


For HACMP and MC/ServiceGuard:
For each node in the cluster, log on as mqm or root. Create the /MQHA/bin directory. This is the working
directory assumed by the example scripts. Download the SupportPac onto each of the cluster nodes into a
temporary directory, uncompress and untar it. Then copy the files from the mcsg or hacmp subdirectory to
/MQHA/bin.

All of the scripts in the working directory need to have executable permission. The easiest way to do this is
to change to the working directory and run

chmod 755 ha*

You could use a different location than the default working directory if you wanted to, but you would have
to change the example scripts.

For VCS:
For each node in the cluster, log on as root. Create the /opt/VRTSvcs/bin/MQM directory. This is the
working directory assumed by VCS and the example scripts. Download the SupportPac onto each of the
cluster nodes into a temporary directory, uncompress and untar it, and then copy the files from the vcs
subdirectory to the /opt/VRTSvcs/bin/MQM directory.

Ensure that all the methods and utility scripts are executable, by issuing:

chmod +x online offline monitor clean ha* explain

The agent methods are written in perl. You need to copy or link the ScriptAgent binary (supplied as part of
VCS) into the MQM agent directory, as follows:

cp /opt/VRTSvcs/bin/ScriptAgent /opt/VRTSvcs/bin/MQM/MQMAgent

The MQM resource type needs to be added to the cluster configuration file. This can be done using the
VCS GUI or ha* commands while the cluster is running, or by editing the types.cf file with the cluster
stopped. If you choose to do this by editing the types.cf file, stop the cluster and edit
/etc/VRTSvcs/conf/config/types.cf file by appending the MQM type definition shown in Appendix A. For
convenience, this definition can be copied directly from the types.MQM file. This sets the
OnlineWaitLimit, OfflineTimeout and LogLevel attributes of the resource type to recommended values.
See Appendix A for more details.


Configure and restart the cluster and check that the new resource type is recognized correctly by issuing the
following command:

hatype -display MQM


Configuration
All HA products have the concept of a unit of failover. This is a set of definitions that contains all the
processes and resources needed to deliver a highly available service and ideally should contain only those
processes and resources. This approach maximises the independence of each service, providing flexibility
and minimising disruption during a failure or planned maintenance.

In HACMP, the unit of failover is called a resource group. On other HA products the name might be
different, but the concept is the same. On VCS, it is known as a service group, and on MC/ServiceGuard it
is a package.

The smallest unit of failover for WMQ is a queue manager, since you cannot move part of a queue manager
without moving the whole thing. It follows that the optimal configuration is to place each queue manager in
a separate resource group, with the resources upon which it depends. The resource group should therefore
contain:

• the shared disks used by the queue manager, in a volume group or disk group reserved exclusively for
the resource group

• the IP address used to connect to the queue manager (the service address)

• an object which represents the queue manager

You can put multiple queue managers into the same resource group, but if you do so they all have to
failover to another node together, even if the problem causing the failover is confined to one queue
manager. This causes unnecessary disruption to applications using the other queue managers.

HACMP/ES users who wish to use application monitoring should also note the restriction that only one
Application Server in a resource group can be monitored. If you were to place multiple queue managers
into the same group and wanted to monitor all of them, you would need to write a monitor capable of
monitoring multiple queue managers.

It is assumed that if mirroring or RAID are used to provide protection from disk failures then references in
the following text to physical disks should be taken to mean the disk or group of disks that are being used
to store the data being described.

A queue manager that is to be used in an HA cluster needs to have its recovery logs and data on shared
disks, so that they can be accessed by a surviving node in the event of a node failure. A node running a
queue manager must also maintain a number of files on non-shared disks. These files include files that
relate to all queue managers on the node, such as /var/mqm/mqs.ini, and queue manager specific files that
are used to generate internal control information. Files related to a queue manager are therefore divided
between local/private and shared disks.

Regarding the queue manager files that are stored on shared disk it would, in principle, be possible to use a
single shared disk for all the recovery data (logs and data) related to a queue manager. However, for
optimal performance, it is recommended practice to place logs and data in separate filesystems such that
they can be separately tuned for disk I/O. The example scripts included in this SupportPac use separate
filesystems. The layout is described in “Step 2. Configure the Shared Disks”.

If the HA cluster will contain multiple queue managers then, depending on your chosen cluster
configuration, two or more queue managers may need to run on the same node, after a takeover. To provide
correct routing of WMQ channel traffic to the queue managers, you should use a different TCP/IP port
number for each queue manager. The standard WMQ port is 1414. It is common practice to use a range of
port numbers immediately above 1414 for additional queue managers. Note that whichever port number
you assign to a queue manager, that port needs to be consistently defined on all cluster nodes that may host
the queue manager, and all channels to that queue manager need to refer to the port.

When configuring a listener for incoming WMQ connections you can choose between inetd and runmqlsr.
If you use inetd then you do not need to perform any start or stop of the listener from within the cluster
scripts. If you use runmqlsr then you must configure the node so that the listener is started along with the


queue manager. This can be done on HACMP and MC/ServiceGuard by a user exit called by the start
scripts in this SupportPac. The preferred way of starting the channel listener from WMQ V6 onwards is to
set up a Listener object which automatically starts the service along with the queue manager; this removes
the need for any additional configuration in the start scripts.
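As a sketch (not part of the SupportPac), such a Listener object could be defined with runmqsc. The listener name, queue manager name and port shown here are examples only; CONTROL(QMGR) is what makes the queue manager start and stop the listener automatically.

```shell
# Illustrative only: define a TCP listener that is started and stopped
# automatically with the queue manager (WMQ V6 and later). The queue manager
# name (ha.csq1), listener name and port (1415) are examples.
runmqsc ha.csq1 <<'EOF'
DEFINE LISTENER(HA.LISTENER) TRPTYPE(TCP) PORT(1415) CONTROL(QMGR) REPLACE
START LISTENER(HA.LISTENER)
EOF
```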

The example scripts and utilities provided in the SupportPac and the descriptions of the configuration steps
deal with one queue manager at a time. For additional queue managers, repeat Steps 2 through 8.

Step 1. Configure the HA Cluster


An initial configuration is straightforward and should present no difficulties for a trained implementer
using the product documentation:

For HACMP:
• Configure TCP/IP on the cluster nodes for HACMP. Remember to configure ~root/.rhosts,
/etc/rc.net, etc.

• Configure the cluster, cluster nodes and adapters to HACMP as usual.

• Synchronise the Cluster Topology.

For VCS:
• Configuration of the VCS cluster should be performed as described in the VCS documentation.

• Create a cluster and configure the networks and systems as usual.

For MC/ServiceGuard:
• Create and configure the template ASCII package configuration file:

o Set the PACKAGE_NAME


o Set the NODE_NAME
o Set the RUN_SCRIPT
o Set the HALT_SCRIPT
o Set the SERVICE_NAME
o Set the SUBNET
• Create and configure the template package control script:

o Set VG, LV and IP


o Set the SUBNET
o Set the SERVICE_NAME
• Set up the customer_defined_run_cmds function to use the supplied hamqm_start script.

• Set up the customer_defined_stop_cmds function to use the supplied hamqm_stop script.

• Disable Node Fail Fast. Node Fail Fast causes the machine to panic on failure, and is only necessary if
WMQ services are so tightly integrated with other services on the node that it is impractical to fail over
WMQ independently.

• Enable Package Switching so that packages can be moved between nodes


For all:
Once the initial configuration has been performed, test that the basic cluster is working - for example, that
you can create a filesystem and that it can be moved from one node to another and that the filesystems
mount correctly on each node.

Step 2. Configure the shared disks


This step creates the volume group (or disk group) and filesystems needed for the queue manager. The
suggested layout is based on the advice earlier that each queue manager should be put into a separate
resource group. You should perform this step and the subsequent steps for each queue manager that you
wish to make highly available.

So that this queue manager can be moved from one node to another without disrupting any other queue
managers, you should designate a group containing shared disks which is used exclusively by this queue
manager and no others.

For performance, it is recommended that a queue manager uses separate filesystems for logs and data. The
suggested layout therefore creates two filesystems within the volume group.

If you are using Veritas Volume Manager (VxVM) to control disks, you do not require the optional VxVM
cluster feature that allows concurrent access to a shared disk by multiple systems. This capability is not
needed for a failover service such as WMQ and it is recommended that a queue manager’s data or log
filesystems are stored on disks that are not concurrently accessible from multiple nodes.

You can optionally protect each of the filesystems from disk failures by using mirroring or RAID; this is
not shown in the suggested layout.

Mount points must all be owned by the mqm user.

You will need the following filesystems:

Per node:
/var on a local non-shared disk - this is a standard filesystem or directory which will already exist. You
only need one of these per node regardless of the number of queue managers that the node may host. It is
important that all queue managers that may run on this node use one filesystem for some of their internal
control information and the example scripts designate /var/mqm for this purpose. With the suggested
configuration, not much WMQ data is stored in /var, so it should not need to be extended.

Even with a simple active/passive setup it is still recommended that you have independent mounted
filesystems for queue manager data and logs, with /var/mqm continuing to be a node-specific directory, as
applying maintenance to the software sometimes requires access to /var/mqm. Installing updates to a
“passive” or standby node would not be possible if the whole directory is only accessible to the active node.
This configuration also ensures that there is a per-node copy of mqs.ini which will be updated on standby
nodes by the halinkmqm script.

Per queue manager:


/MQHA/<qmgr>/data on shared disks - this is where the queue manager data directory will reside.
/MQHA is the top level directory used in the example scripts.

/MQHA/<qmgr>/log on shared disks - this is where the queue manager recovery logs will reside.
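As a sketch, the per-queue-manager mount points could be prepared as follows. The paths match the examples above, but in this sketch MQHA_TOP defaults to a local directory so it can be dry-run without root; on the real nodes use /MQHA and run the chown as root.

```shell
# Prepare the mount points for one queue manager (illustrative sketch).
# MQHA_TOP defaults to a local path here so this can be dry-run; use /MQHA for real.
MQHA_TOP=${MQHA_TOP:-./MQHA}
QMGR=${QMGR:-ha.csq1}

mkdir -p "$MQHA_TOP/$QMGR/data" "$MQHA_TOP/$QMGR/log"

# Mount points must be owned by the mqm user (requires root on a real node):
# chown -R mqm:mqm "$MQHA_TOP/$QMGR"

ls "$MQHA_TOP/$QMGR"
```

The shared-disk filesystems are then mounted over these directories by the volume manager steps below.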


The filesystems are shown on the following diagram. The subdirectories and symlinks are all created
automatically in the next step. The diagram does not show ALL of the necessary symlinks but is an
indication of the structure. You only need to create the filesystems that are on shared disk (e.g.
/MQHA/ha.csq1/data and /MQHA/ha.csq1/log), then proceed to Step 3.

Filesystem organisation
This diagram shows the filesystem organisation for a single queue manager, called ha.csq1

[Diagram: Node A and Node B each have a local /var filesystem on internal, non-shared disks
containing /var/mqm, including the per-queue-manager IPC subdirectories (isem, esem and @ipcc)
under the mangled directory name ha!csq1. Symlinks connect /var/mqm/qmgrs/ha!csq1 on each node
to the queue manager data on the shared disks, which hold the /MQHA/ha.csq1/data and
/MQHA/ha.csq1/log filesystems.]

For HACMP:
1. Create the volume group that will be used for this queue manager’s data and log files.

2. Create the /MQHA/<qmgr>/data and /MQHA/<qmgr>/log filesystems using the volume


group created above.

3. For each node in turn, import the volume group, vary it on, ensure that the filesystems can be
mounted, unmount the filesystems and varyoff the volume group.

For VCS:
1. Create the disk group that will be used for this queue manager's data and log files, specifying
the nodes that may host the queue manager. Add sufficient disks to the disk group to support
the creation of volumes described below.

2. Create volumes within the disk group to support the creation of the /MQHA/<qmgr>/data and
/MQHA/<qmgr>/log filesystems.

3. For each node in turn, create the mount points for the filesystems, import the disk group
(temporarily), ensure that the filesystems can be mounted, unmount the filesystems and deport
the disk group.


For MC/ServiceGuard:
1. Create the volume group that will be used for this queue manager's data and log files (e.g.
/dev/vg01).

2. Create a logical volume in this volume group (e.g. /dev/vg01/<qmgr>).

3. Mount the logical volume to be shared at /MQHA/<qmgr>.

4. Create the /MQHA/<qmgr>/data and /MQHA/<qmgr>/log filesystems using the volume


group created above.

5. Unmount /MQHA/<qmgr>.

6. Issue a vgchange -a n /dev/vg01 to deactivate the volume group.

7. Issue a vgchange -c y /dev/vg01 to mark it as a cluster volume group.

8. Issue a vgexport -m /tmp/mq.map -s -p -v /dev/vg01.

9. Copy the mq.map file created in 8 above to /tmp on the adoptive node.

10. On the adoptive node create the same logical volume and volume group as in steps 1 and 2
above.

11. Issue a vgimport -m /tmp/mq.map -s -v /dev/vg01

12. Activate the volume group and mount the filesystems on the adoptive node to check that the
configuration is correct

Step 3. Create the Queue Manager


Select a node on which to create the queue manager. It does not matter on which node you do this; any of
the nodes that might host the queue manager can be used.

When you create the queue manager, it is strongly advised that you should use the hacrtmqm script
included in the SupportPac. It is possible to create the queue manager manually, but using hacrtmqm will
save a lot of effort. For example, hacrtmqm moves and relinks some subdirectories and for HACMP creates
an HACMP/ES Application Monitor for the queue manager. The move and relink of these subdirectories is
to ensure smooth coexistence of queue managers which may run on the same node.

1. Select a node on which to perform the following actions

2. Ensure the queue manager’s filesystems are mounted on the selected node.

3. Create the queue manager on this node, using the hacrtmqm script

4. Start the queue manager manually, using the strmqm command

5. Create any queues and channels

6. Test the queue manager

7. End the queue manager manually, using endmqm

8. On the other nodes, which may takeover the queue manager, run the halinkmqm script
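The steps above can be condensed into the following dry-run transcript. The run wrapper only echoes each command here, so nothing is executed; on a real node, run the commands directly (the hacrtmqm and halinkmqm invocations match the examples given in the Commands section later in this document).

```shell
# Dry-run of Step 3 for a queue manager called ha.csq1 (illustrative only).
run() { echo "# $*"; }    # stub: print the command instead of executing it

run export MQHAFSDATA=/MQHA/ha.csq1/data    # where hacrtmqm puts queue manager data
run export MQHAFSLOG=/MQHA/ha.csq1/log      # where hacrtmqm puts the recovery logs
run hacrtmqm -c '"Highly available queue manager"' ha.csq1   # create (as root)
run strmqm ha.csq1                          # start manually and test
run endmqm ha.csq1                          # end manually
# Then, on each node that may take over the queue manager:
run halinkmqm ha.csq1 'ha!csq1' /MQHA/ha.csq1/data
```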


Step 4. Configure the movable resources


The queue manager has been created and the standby/takeover nodes have been updated. You now need to
define a resource or service group which will contain the queue manager and all its associated resources.

For HACMP:
The resource group can be either cascading or rotating. Whichever you choose, remember that the resource
group will use the IP address as the service label. This is the address which clients and channels will use to
connect to the queue manager.

If you choose cascading, it is recommended that you consider disabling the automatic fallback facility by
setting Cascading Without Fallback to true. This is to avoid the interruption to the queue manager which
would be caused by the reintegration of the top priority node after a failure. Unless you have a specific
requirement which would make automatic fallback desirable in your configuration, then it is probably
better to manually move the queue manager resource group back to the preferred node when it will cause
minimum disruption.

1. Create a resource group and select the type as discussed above.

2. Configure the resource group in the usual way adding the service IP label, volume group and
filesystem resources to the resource group.

3. Synchronise the cluster resources.

4. Start HACMP on each cluster node in turn and ensure that the cluster stabilizes, that the
respective volume groups are varied on by each node and that the filesystems are mounted
correctly.

For VCS:
The service group will contain the queue manager resource and the disk group and IP address for the queue
manager. The IP address is the one which clients and channels will use to connect to the queue manager.

Set up the SystemList attribute for the service group.

Because a queue manager can only run on one node at a time, the service group will be a failover group,
which is the default setting (0) of the Parallel attribute.

You may wish to consider what settings you would prefer for the OnlineRetryLimit, OnlineRetryInterval,
FailoverPolicy, AutoStart, AutoRestart and AutoFailover attributes.

1. Create a service group with the properties discussed or chosen above.

2. Add the disk group and IP address to the service group.

3. Ensure that the service group can be switched to each of the nodes in the SystemList and that
on each node the filesystems created earlier are successfully mounted.

4. Verify that the service group behaves as you would expect for your chosen settings of
attributes that control retries, failovers and automatic start or restart.

For MC/ServiceGuard:
The following steps show how to configure a cluster under MC/ServiceGuard and how to configure nodes
into the cluster. This information was supplied by Hewlett Packard and you should consult your


MC/ServiceGuard documentation for a full explanation of the commands. The example commands are to
set up a cluster of 2 nodes called ptaca2 and ptaca3. Examples of the files mentioned below are contained
in the appendices of the Hewlett Packard documentation.

To configure the cluster:

1. Create the ascii template file:

cmquerycl -v -C /etc/cmcluster/cluster.ascii -n ptaca2 -n ptaca3


2. Modify this template to reflect the environment and then verify the cluster configuration:

cmcheckconf -v -C /etc/cmcluster/cluster.ascii
3. Apply the configuration file; this creates the cluster and automatically distributes the
“cmclconfig” file throughout the cluster:
cmapplyconf -v -C /etc/cmcluster/cluster.ascii
4. Start and stop the cluster to check that the above has worked.
cmruncl -v -n ptaca2 -n ptaca3
cmviewcl -v
cmhaltcl -f -v
cmruncl -n ptaca2 -n ptaca3
To configure the MC/ServiceGuard package called mq1 on the first node:

1. Create and tailor the mq1 package for your environment:


cd /etc/cmcluster
mkdir mq1
cmmakepkg -p mq1.conf
2. Edit the mq1.conf file to reflect your environment.
3. Change into the mq1 directory created above.
4. Issue the following command:
cmmakepkg -s mq1.cntl
5. Shutdown the cluster:
cmhaltcl -f -v
6. Distribute the configuration files:
cmapplyconf -v -C /etc/cmcluster/cluster.ascii -P /etc/cmcluster/mq1/mq1.conf
To test the cluster and package startup:

1. Shutdown all queue managers (if any are running).


2. Unmount all logical volumes in the volume group you created earlier (e.g. /dev/vg01)
3. Deactivate the volume group
4. Start the cluster:
cmruncl
5. Check that the package has started:
cmviewcl -v


To assign the dynamic IP address of the package:

1. Halt the package:


cmhaltpkg mq1
2. Edit the mq1.cntl script to add the package’s IP address.
3. Restart the package:
cmrunpkg -v mq1
4. Check the package has started and has clients:
cmviewcl -v
To add the second node to the package configuration:

1. Edit the mq1.conf file and add the following line:


NODE_NAME ptaca2
2. Apply the new configuration:
cmapplyconf -v -C /etc/cmcluster/cluster.ascii -P /etc/cmcluster/mq1/mq1.conf
3. Halt the cluster:
cmhaltcl -f -v
4. Restart the cluster:
cmruncl -v
To test package switching:

1. Halt the mq1 package:


cmhaltpkg mq1
2. Start the mq1 package on the node ptaca3:
cmrunpkg -n ptaca3 mq1
3. Enable package switching for the mq1 package on ptaca3:
cmmodpkg -e mq1
4. Halt the mq1 package:
cmhaltpkg mq1
5. Start the mq1 package on the node ptaca2:
cmrunpkg -n ptaca2 mq1
6. Enable package switching for the mq1 package on ptaca2:
cmmodpkg -e mq1

Step 5. Configure the Application Server or Agent


The queue manager is represented within the resource group by an application server or agent. The
SupportPac includes example server start and stop methods which allow the HA products to start and end a
queue manager, in response to cluster commands or cluster events.

For HACMP and MC/ServiceGuard the hamqm_start, hamqm_stop and hamqm_applmon programs are ksh
scripts. For VCS, similar function is provided by the online, offline, monitor and clean perl programs.


The start and stop scripts allow you to specify a user exit to be invoked just after the queue manager is
brought online or just before it is taken offline. Use of the user exit is optional. The purpose of the user exit
is to allow you to start or stop additional processes following the start of a queue manager or just before
ending it. For example, you may wish to start a listener, a trigger monitor or a command server. Note that
WMQ V6 allows the queue manager to start these services and any arbitrary user program automatically,
which might make use of the user exit redundant.
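For illustration, such a V6 service definition for a listener might look like the following MQSC fragment. The service name, the /opt/mqm/bin install path and port 1414 are assumptions for this sketch, and runmqsc is stubbed here so the fragment is self-contained; remove the stub to send the commands to a real queue manager.

```shell
# Stub so the sketch can be dry-run; the real runmqsc reads MQSC from stdin.
runmqsc() { cat; }

runmqsc ha.csq1 <<'EOF'
DEFINE SERVICE(LISTENER.TCP) +
       SERVTYPE(SERVER) +
       CONTROL(QMGR) +
       STARTCMD('/opt/mqm/bin/runmqlsr') +
       STARTARG('-m +QMNAME+ -t TCP -p 1414')
EOF
```

With CONTROL(QMGR), the listener is started and stopped with the queue manager, so the HA scripts need no extra steps for it.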

For HACMP:
1. Define an application server which will start and stop the queue manager. The start and stop
scripts contained in the SupportPac may be used unmodified, or may be used as a basis from
which you can develop customized scripts. The examples are called hamqm_start and
hamqm_stop.

2. Add the application server to the resource group definition created in the previous step.

3. Optionally, create a user exit in /MQHA/bin/rc.local

4. Synchronise the cluster configuration.

5. Test that the node can start and stop the queue manager, by bringing the resource group online
and offline.

For VCS:
The agent, which is called MQM, allows VCS to monitor and control the queue manager, in response to
cluster commands or cluster events. You could use the example agent exactly as it is, or you could use it as
a guide to develop your own customised agent by writing a set of scripts which implement the agent
interface. The example agent allows you to create multiple resources of resource type MQM, either in the
same or different service groups.

All the methods operate on the queue manager with the same name as the resource to which the operation is
being applied. The resource name is the first parameter in the ArgList passed to each method.

The online and offline methods are robust in that they make no assumptions about the state of the queue
manager on entry. The offline method uses the OfflineTimeout resource type attribute to determine how
quickly it needs to operate: it initially attempts a graceful shutdown and escalates to more severe means of
stopping the queue manager if it would run out of time. When the offline method is invoked by the cluster,
it issues an immediate stop of the queue manager (endmqm -i) and allows just under half of OfflineTimeout
seconds for it to complete. If the stop does not complete within that time, a preemptive stop (endmqm -p)
is issued, and a similar time is allowed. If the queue manager still hasn't stopped, the offline method
terminates it forcefully. Slightly less than half of OfflineTimeout is allowed for each of the immediate and
preemptive stops because the agent reserves a little time in case it needs to run the abrupt termination. By
the expiry of OfflineTimeout, the offline method will have brought the queue manager to a stop, forcefully
if necessary. It is better if the queue manager can be shut down gracefully, because a clean shutdown will
lead to a faster restart.

You could modify the online and offline scripts to include additional startup or shutdown tasks to be
performed just before or after an online or offline transition. Alternatively you could configure VCS
triggers to perform such actions. Either of these could be used to allow you to start or stop additional
processes following the start of a queue manager or just before ending it. For example, you may wish to
start a listener, a trigger monitor or a command server. You may also want to start some other application
that uses the queue manager. You may wish to send a notification to an application, a monitoring system,
or a human administrator. Again, remember that WMQ V6 allows the queue manager to start services,
including any arbitrary user program, automatically.


The MQM type should already have been added to the types.cf file, in the previous section on installation.
Resources of type MQM, which represent individual queue managers, are configured by editing the main.cf
configuration file. They could also be added using the VCS GUI or the ha* commands. An example of a
main.cf entry for a resource of type MQM is included in Appendix A of this document.

1. Add a resource entry into the /etc/VRTSvcs/conf/config/main.cf file. See Appendix A for an
example of a complete main.cf file. Set the resource attributes to your preferred values.

2. Create resource dependencies between the queue manager resource and the filesystems and IP
address. The main.cf in Appendix A provides an example.

3. Start the service group and check that it starts the queue manager successfully. You can test
the queue manager by using runmqsc and inspecting its queues.

4. Stop the service group and check that it stops the queue manager.

For MC/ServiceGuard:
To define the start command so that it can be run as the user mqm under MC/ServiceGuard control, create a
wrapper function in the package control script (/etc/cmcluster/mq1/mq1.cntl) that contains the following
line:

su mqm -c "/MQHA/bin/hamqm_start_su $qmgr"

To define the stop command so that it can be run as the user mqm under MC/ServiceGuard control, create a
wrapper function in the package control script (/etc/cmcluster/mq1/mq1.cntl) that contains the following
line:

su mqm -c "/MQHA/bin/hamqm_stop_su $qmgr 30"

Step 6. Configure a monitor


For HACMP:
If you are using HACMP/ES then you can configure an application monitor which will monitor the health
of the queue manager and trigger recovery actions as a result of MQ failures, not just node or network
failures. Recovery actions include the ability to perform local restarts of the queue manager (see below) or
to cause a failover of the resource group to another node.

To benefit from queue manager monitoring you must define an Application Monitor. If you created the
queue manager using hacrtmqm then one of these will have been created for you, in the /MQHA/bin
directory, and is called hamqm_applmon.$qmgr.

The example application monitor determines whether the queue manager is still starting or whether it
considers itself to be fully started. If the queue manager is still starting then the application monitor will
allow it to complete its startup processing. This is important because the startup time of the queue manager
can vary, depending on how much log replay needs to be performed. From WebSphere MQ V6, the queue
manager issues messages to the console during startup giving an indication of its progress through the
replay phase.

If the application monitor only tested whether the queue manager was running, then it would be difficult to
choose a stabilisation interval that was both short enough to allow sensitive monitoring and long enough to
cater for the cases where there is a lot of log replay to perform. There would be a risk that a genuine failure
could go undetected, or that a valid startup was abandoned during log replay. You may wish to incorporate
similar monitoring behaviour if you decided to write your own application monitor.


If you are using HACMP/ES, and have configured the application monitor, as described above, then the
recovery actions that you can configure include the ability to perform local restarts of the queue manager.
HACMP will attempt local restarts up to a maximum number of attempts within a specified period.
The maximum attempts threshold and the period are configurable. For WMQ, it is recommended that the
threshold is set to 1 so that only one restart is attempted, and that the time period is set to a small multiple
of the expected start time for the queue manager. With these settings, if successive restarts fail without a
significant period of stability between, then the queue manager resource group will be moved to a different
node. Attempting more restarts on a node on which a restart has just failed is unlikely to succeed.

1. To enable queue manager monitoring, define a custom application monitor for the Application
Server created in Step 5, providing the name of the monitor script and telling HACMP how
frequently to invoke it. Set the stabilisation interval to 10 seconds, unless your queue manager
is expected to take a long time to restart. This would normally be the case if your environment
has long-running transactions that might cause a substantial amount of recovery/replay to be
required.

2. To configure for local restarts, specify the Restart Count and Restart Interval.

3. Synchronise the cluster resources.

4. Test the operation of the application monitoring, and in particular verify that the local restart
capability is working as configured. A convenient way to provoke queue manager failures is
to identify the Execution Controller process (called amqzxma0) associated with the queue
manager, and kill it.

For VCS:
No special steps are needed for VCS as the monitoring process is automatically invoked as needed.

For MC/ServiceGuard:
Create a monitoring script file for use by MC/ServiceGuard. This script can initially be a renamed copy of
the /MQHA/bin/hamqm_applmon_su script supplied with this SupportPac, which checks the health of a
named queue manager using the PING QMGR command. You may wish to add extra features to your copy
of this script to check that other processes essential to your environment are functioning correctly.

If you wish to have one common script for all queue managers under MC/ServiceGuard control then they
can all use the same copy of the script, however if you wish the monitoring of different queue managers to
check different things then it is suggested that you call your script $qmgr.mon after the queue manager it is
used for. The monitoring script should ideally be run regularly but should also be a short lived process
doing a quick check or checks and then disappearing to avoid slowing down WMQ and the node it runs on.
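A minimal sketch of such a check is shown below, with runmqsc stubbed so the example is self-contained. The stub's reply text is invented for illustration; the supplied hamqm_applmon_su script should be your real starting point.

```shell
# Stub standing in for the real runmqsc; this reply text is invented.
runmqsc() { echo "Ping accepted (stub reply)"; }

check_qmgr() {
  qmgr=$1
  # Pipe a PING QMGR command into runmqsc and look for a healthy response.
  if echo "ping qmgr" | runmqsc "$qmgr" | grep -qi "accepted"; then
    echo "$qmgr is responding"
    return 0
  fi
  echo "$qmgr is not responding" >&2
  return 1
}

check_qmgr QMA
```

The check runs quickly and exits, matching the advice above that the monitor should be a short-lived process.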


The monitoring script is then called from the package control script for the queue manager
(/etc/cmcluster/mq1/mq1.cntl). This is achieved by adding the following lines to the mq1.cntl file created
during MC/ServiceGuard configuration:

SERVICE_NAME[n]=mq1
SERVICE_COMMAND[n]="su mqm -c \"/etc/cmcluster/mq1/mq1.mon QMA\""

where
mq1.mon is a renamed copy of the hamqm_applmon_su script
mq1 is the name of the package being monitored
QMA is the name of the queue manager to monitor
n is the number of the service being monitored.

Step 7. Removal of Queue Manager from the Cluster


Should you decide to remove the queue manager from the cluster, it is sufficient to remove the application
server (and application monitor, if configured) from the HA configuration. You may also decide to delete
the resource group. This does not destroy the queue manager, which will continue to function normally, but
under manual control.

Once the queue manager has been removed from the HA configuration, it will not be highly available, and
will remain on one node. Other nodes will still remember the queue manager and you may wish to tidy up
the other nodes. Refer to Step 8 for details.

Similar considerations apply to the IP address used by the queue manager. This is easiest if the mapping of
queue managers to disk groups, network interfaces and service groups is simple and each disk group and
service group is used exclusively by one queue manager.

For HACMP:
1. Delete the application monitor, if configured

2. Delete the application server

3. Remove the filesystem, service label and volume group resources from the resource group.

4. Synchronise the cluster resources configuration.

For VCS:
1. Stop the resource, by taking it offline.

2. Delete the resource from the cluster by using the VCS GUI, VCS ha* commands or by editing
the main.cf configuration file.

3. If you wish to keep the queue manager, either destroy the service group if you have no further
use for it, or modify it by removing the disk group and public network interface used by the
queue manager, provided they are not also used by any other services or applications.

For MC/ServiceGuard:
1. Delete the application server.

2. Delete the monitoring script.


3. Remove the filesystem, service label and volume group resources from the package.

Step 8. Deletion of Queue Manager


If you decide to delete the queue manager, then you should first remove it from the cluster configuration, as
described in the previous step. Then, to delete the QM, perform the following actions.

1. Make sure the queue manager is stopped, by issuing the endmqm command.

2. On the node which currently has the queue manager’s shared disks and has the queue
manager’s filesystems mounted, run the hadltmqm script provided in the SupportPac.

3. You can now destroy the filesystems /MQHA/<qmgr>/data and /MQHA/<qmgr>/log.

4. You can also destroy the volume group.

5. On each of the other nodes in the cluster,

a. Run the hadltmqm command as above, which will clean up the subdirectories related to
the queue manager.

b. Manually remove the queue manager stanza from the /var/mqm/mqs.ini file.

The queue manager has now been completely removed from the cluster and the nodes.
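Editing mqs.ini by hand is simplest, but the stanza removal in step 5b could also be scripted. The awk sketch below is an illustration only: it assumes the conventional stanza layout (a QueueManager: header line followed by indented Name=, Prefix= and Directory= attributes) and writes to a new file rather than editing in place.

```shell
# Remove the QueueManager stanza for one queue manager from a copy of mqs.ini.
# Assumes the conventional stanza layout; always keep a backup of the original.
strip_stanza() {
  awk -v qm="$1" '
    function flush(  i) {                 # emit a buffered stanza unless it matched
      if (n && !skip) for (i = 1; i <= n; i++) print buf[i]
      n = 0; skip = 0; buffering = 0
    }
    /^[A-Za-z]/ {                         # any unindented line starts a new stanza
      flush()
      if ($0 == "QueueManager:") { buffering = 1; buf[++n] = $0; next }
      print; next
    }
    buffering {
      buf[++n] = $0
      line = $0; sub(/^[ \t]*Name=/, "", line)
      if ($0 ~ /^[ \t]*Name=/ && line == qm) skip = 1
      next
    }
    { print }
    END { flush() }
  '
}

# Example: strip_stanza ha.csq1 < /var/mqm/mqs.ini > /tmp/mqs.ini.new
```

Inspect the new file before copying it over /var/mqm/mqs.ini.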


Upgrading WMQ software in a cluster


Applying maintenance
All nodes in a cluster should normally be running exactly the same version/release of the WMQ software.
Sometimes however it is necessary to apply updates, such as for service fixes or for new versions of the
product. This is best done by means of a “rolling upgrade”.

The principle of a rolling upgrade is to apply the new software to each node in turn, while continuing the
WMQ service on other nodes. Assuming a two-node active/active cluster, the steps are

1. Select one node to upgrade first

2. At a suitable time, when the moving of a queue manager will not cause a serious disruption to
service, manually force a migration of the active queue manager to its partner node

3. On the node that is now running both queue managers, disable the failover capabilities for the
queue managers.

4. Upgrade the software on the node that is not running any queue managers

5. Re-enable failover, and move both queue managers across to the newly upgraded node

6. Disable failover again

7. Upgrade the original box

8. Re-enable failover

9. When it will cause least disruption, move one of the queue managers across to balance the
workload

It should be obvious how to modify these steps for standby configurations and for “N+1” configurations.
The overriding rule that has to be observed is that once a queue manager has been running on a node with a
new level of software, it must not be transferred to a node running old software.

While there are times in this process when no failover is permitted, and service could thus be lost if a
failure occurred, these periods are comparatively short as the newer software is installed. More complex
HA configurations can minimise these windows.


Commands
This section gives details of the configuration commands used to set up and monitor queue managers.

hacrtmqm
Purpose
The hacrtmqm command creates the queue manager and ensures that its directories are arranged to allow
for HA operation.

This command makes use of two environment variables to determine where the data and log directories
should be created.

export MQHAFSDATA="/MQHA/<qmgr>/data"
export MQHAFSLOG="/MQHA/<qmgr>/log"

The invocation of the hacrtmqm command uses exactly the same parameters that you would normally use
for crtmqm. You do not need to set MQSPREFIX or specify the -ld parameter for the log directory as these
are both handled automatically by hacrtmqm.

Note: You must be root to run the hacrtmqm command.

Syntax

hacrtmqm <crtmqm parameters>

Parameters
• crtmqm parameters are exactly the same as for the regular WMQ crtmqm command

Example

# export MQHAFSDATA="/MQHA/ha.csq1/data"
# export MQHAFSLOG="/MQHA/ha.csq1/log"

# hacrtmqm -c "Highly available queue manager" ha.csq1


halinkmqm
Purpose
Internally, hacrtmqm uses a script called halinkmqm to relink the subdirectories used for IPC keys and
create a symbolic link from /var/mqm/qmgrs/<qmgr> to the /MQHA/<qmgr>/data/qmgrs/<qmgr>
directory.

As shown at the end of hacrtmqm, you must run halinkmqm on the remaining cluster nodes which will act
as standby nodes for this queue manager. Do not run halinkmqm on the node on which you created the
queue manager with hacrtmqm - it has already been run there.

The halinkmqm command creates the necessary links and inserts a stanza for the queue manager into the
mqs.ini file.

For HACMP, the halinkmqm command is also responsible for creating an HACMP/ES Application
Monitor on the standby/takeover nodes.

Note 1: You must be the mqm user or in the mqm group to run this command.

Note 2: The “mangled” queue manager directory might include a “!” character. In some shells, in
particular bash, this is considered a special character (like the “*” and “?” characters) and might be
expanded to unwanted values. To avoid this, make sure the parameter is enclosed in quotes on the
command line, as this will inhibit shell expansion.

Syntax
halinkmqm <qmgr name> <mangled qmgr directory> <qmgr data directory>

Parameters
• qmgr name - The name of the queue manager as you specified it to hacrtmqm (e.g. ha.csq1)
• mangled qmgr directory - The name of the directory under /var/mqm/qmgrs/ which closely
resembles the qmgr name.
• qmgr data directory - The directory you selected for the queue manager data, and to which you set
MQHAFSDATA before issuing hacrtmqm. (e.g. /MQHA/ha.csq1/data).

Example
$ halinkmqm ha.csq1 ha!csq1 /MQHA/ha.csq1/data
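The quoting advice in Note 2 can be checked quickly; when quoted, the mangled name passes through the shell unchanged:

```shell
# Quote the mangled directory name so the shell cannot expand the '!' character.
mangled='ha!csq1'
printf '%s\n' "$mangled"
```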


hadltmqm command
Purpose
The hadltmqm command deletes a queue manager. This destroys its log files and control files and, on the
owning node only, removes the definition of the queue manager from the /var/mqm/mqs.ini file. This is
similar to the behaviour of the dltmqm command, which the hadltmqm command uses internally. The
hadltmqm command also deletes the symbolic links used to locate the IPC subdirectories, and the
subdirectories themselves.

Note: You must be the mqm user or in the mqm group to run this command.

Syntax
hadltmqm <qmgr name>

Parameters
• qmgr name - the name of the queue manager to be deleted


hamqm_start
Purpose
The hamqm_start script starts the named queue manager under cluster control. It is robust in that it does
not assume anything about the state of the queue manager on entry and will forcefully kill any existing
processes that might be associated with the queue manager, to ensure a clean start.

One error that can occur on restart of a queue manager (especially if it is an attempt to restart on the same
node, instead of failing over to a different node) is that previously-connected application programs may not
have disconnected. If that happens, the queue manager will not restart, will return error 24, and will show
message AMQ8041. It might be appropriate for you to modify the supplied scripts to parse the output of
this message and kill the offending application programs. This is not part of the default scripts, as it could
be considered disruptive.
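A hedged sketch of detecting this condition is shown below. Here strmqm is stubbed to simulate the failure; the stub's message text is invented, and only the return code 24 and message number AMQ8041 come from the description above.

```shell
# Stub simulating a restart that fails because applications are still connected.
strmqm() { echo "AMQ8041 (stub message text)" >&2; return 24; }

qmgr=ha.csq1
rc=0
strmqm "$qmgr" || rc=$?
if [ "$rc" -eq 24 ]; then
  # A customised start script could parse the AMQ8041 output here and end
  # the offending applications before retrying strmqm.
  echo "$qmgr: connected applications prevented restart (rc=$rc)"
fi
```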

Syntax
/MQHA/bin/hamqm_start <qmgr>

Parameters
• qmgr - the name of the queue manager to be started

Example
/MQHA/bin/hamqm_start ha.csq1


hamqm_stop
Purpose
The stop script attempts to shut down the queue manager gracefully. When it is invoked, it issues an
immediate stop of the queue manager (endmqm -i) and allows a defined time for it to complete. If the stop
does not complete within that time, a pre-emptive stop (endmqm -p) is issued, and a further delay is
allowed. If the queue manager still hasn't stopped within that time, then this command terminates the
queue manager forcefully. It is clearly better if the queue manager can be shut down gracefully, because a
clean shutdown will lead to a faster restart and is less disruptive to clients and applications.

Syntax
/MQHA/bin/hamqm_stop <qmgr> <timeout>

Parameters
• qmgr - the name of the queue manager to be stopped

• timeout - the time in seconds to use on each of the levels of severity of stop

Example
/MQHA/bin/hamqm_stop ha.csq1 30

When the stop script is called, part of the processing is to forcefully kill all of the processes associated with
the queue manager if they do not stop properly. In previous versions of the HA SupportPacs, the list of
processes was hardcoded in the stop or restart scripts. For this version, the list of processes is in an external
file called hamqproc. As shipped, the list includes all previous and current known internal processes from
WMQ. The order in which the processes are killed is not especially important: if the queue manager has not
ended cleanly, there will probably be FDC files created regardless of the sequence of the kill operations.

The list does not include external commands such as user applications or trigger monitors, but you might
choose to add such commands to the file. Processing of this file requires that the queue manager name is
one of the visible parameters to any additional commands.
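The processing can be pictured with the sketch below. The process name (amqzxma0) and queue manager name are illustrative, and the filter assumes a `ps -eo pid,args` style listing so that the PID is the first column and the queue manager name is visible in the arguments, as the hamqproc convention requires.

```shell
#!/bin/sh
# Sketch of hamqproc-style processing: given a "ps -eo pid,args" style
# listing on stdin, print the PIDs of a named process that was started
# for a particular queue manager.  Process and queue manager names here
# are illustrative.

# pids_for <process_name> <qmgr_name>   (listing on stdin)
pids_for()
{
    awk -v proc="$1" -v qm="$2" \
        'index($0, proc) && index($0, qm) { print $1 }'
}

# Typical use (needs real processes):
#   ps -eo pid,args | pids_for amqzxma0 ha.csq1
```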


/MQHA/bin/rc.local
Purpose
For HACMP and MC/ServiceGuard, if you want to make use of the user exit capability, create a script
called /MQHA/bin/rc.local. The script can contain anything you like. Ensure that the rc.local script is
executable. The start and stop scripts will invoke the rc.local script as user "mqm". It is recommended that
you test the exit by invoking it manually, before putting it under cluster control.

Syntax
/MQHA/bin/rc.local <qmgr> <phase>

Parameters
• qmgr - the name of the queue manager which this invocation relates to
• phase – either of the strings "pre_offline" or "post_online", to indicate what is about to happen or
what has just happened.
The example start and stop scripts are written such that this script is invoked asynchronously (i.e. in the
background). This is a conservative policy that aims to reduce the likelihood that an errant script could
delay, possibly indefinitely, other cluster operations which the node needs to perform. The asynchronous
invocation policy does have the disadvantage that the exit script cannot make any assumptions about the
state of the queue manager, since it may change immediately after the script is invoked. The script should
therefore be written in a robust style.
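A minimal rc.local written in that robust style might look like the following sketch; the actions in each phase are placeholders, and the logic is structured as a function so the phases are easy to see.

```shell
#!/bin/sh
# Sketch of a minimal /MQHA/bin/rc.local user exit, invoked as
#   rc.local <qmgr> <phase>
# with phase set to "pre_offline" or "post_online".  Because the exit
# runs asynchronously it must not assume anything about the current
# state of the queue manager; the actions below are placeholders.

rc_local()
{
    qmgr=$1
    phase=$2

    case "$phase" in
    pre_offline)
        # About to go offline: e.g. quiesce dependent applications.
        echo "rc.local: $qmgr is about to go offline"
        ;;
    post_online)
        # Just came online: e.g. start trigger monitors.
        echo "rc.local: $qmgr has come online"
        ;;
    *)
        echo "rc.local: unknown phase '$phase' for $qmgr" >&2
        return 1
        ;;
    esac
}

# A real rc.local would end with:  rc_local "$1" "$2"
```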


Working with other HA products


There are many other HA products that could be used to control the operation of WebSphere MQ on
UNIX and Linux systems. They all appear to operate in essentially the same way, with methods to start,
stop and monitor arbitrary resources. The scripts in this SupportPac have been used as the basis of similar
scripts for some other HA products.

Creation and deletion of queue managers are likely to be identical regardless of the HA product. You can
take the hacrtmqm, halinkmqm and hadltmqm scripts and probably use them unchanged, for the platform
you are using.

The scripts provided in this SupportPac are separated into directories for the specific pairing of HA product
and operating system on which they were originally implemented. But for other combinations, you should
select the appropriate mix. For example, if you want to use VCS on AIX, you should use the scripts from
the hacmp directory for creating, relinking, and deleting the queue manager. For Linux, the scripts that are
used for MC/ServiceGuard are probably the best starting point – this is because WMQ for Linux systems
continues to use the “ssem” subdirectories which are no longer used on AIX and Solaris.

More differences occur in the runtime aspects of HA, but these tend simply to be matters of style. For
example: does the HA product issue the periodic monitoring calls, or does the "WMQ agent" have to do it
itself? What is the natural language for the scripts to be written in? Which return codes indicate errors?
How do you issue text messages for error conditions? How are dependencies configured? When you can
answer these questions, you should be able to take one or other of these sets of scripts and modify them to fit.
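As an illustration of the return-code question, a monitor wrapper for another HA product might translate a generic "is it running?" check into that product's convention. The VCS-style values used below (100 for offline, 110 for online) are quoted as an example only; check your HA product's agent documentation for the actual codes it expects.

```shell
#!/bin/sh
# Sketch: adapt a generic queue manager check (exit 0 = running,
# nonzero = not running) to an HA product's monitor return-code
# convention.  The 100/110 values are VCS-style examples, not a
# definitive mapping.

# monitor_wrapper <check_command...>
monitor_wrapper()
{
    if "$@"
    then
        return 110    # resource is online
    else
        return 100    # resource is offline
    fi
}

# Placeholder usage:  monitor_wrapper qm_is_running ha.csq1
```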

Related products
There are SupportPacs available for configuring WebSphere Message Broker in Highly Available
configurations. They follow the same model as this one, and recommend that the broker’s underlying queue
manager is put into an HA solution using this SupportPac.


Suggested Test
You may find that the following tests are helpful when determining whether your configuration is working
as intended.

Create a queue manager, e.g. QM1, and put it under HA control, as defined in the preceding chapters.

Start the QM1 queue manager and use runmqsc to define the following objects:

**************************************************************
* *
* Create the queues for clustered QMgr (QM1) *
* *
**************************************************************

**************************************************************
* Define the outbound xmit queue *
**************************************************************
DEFINE QLOCAL(XMITQ1) +
USAGE(XMITQ) +
DEFPSIST(YES) +
TRIGGER +
INITQ(SYSTEM.CHANNEL.INITQ) +
REPLACE

**************************************************************
* Define the inbound/outbound message queue *
**************************************************************
DEFINE QREMOTE(QM1_INBOUND_Q) +
RNAME(QM2_INBOUND_Q) +
RQMNAME(QM2) +
XMITQ(XMITQ1) +
DEFPSIST(YES) +
REPLACE

**************************************************************
* Define the channels between QM1 <-> QM2 *
**************************************************************

* Channel 1
DEFINE CHANNEL(QM2_SDR.TO.QM1_RCV) +
CHLTYPE(RCVR) +
TRPTYPE(TCP) +
HBINT(30) +
REPLACE

* Channel 2
DEFINE CHANNEL(QM1_SDR.TO.QM2_RCV) +
CHLTYPE(SDR) +
TRPTYPE(TCP) +
CONNAME(QM2_ip_address) +
XMITQ(XMITQ1) +
HBINT(30) +
REPLACE


Create another “out of cluster” queue manager, e.g. QM2, start it and create the following objects:

**************************************************************
* *
* Create the queues for "out-of-cluster" QMgr QM2 *
* *
**************************************************************

**************************************************************
* Define the inbound message queue *
**************************************************************
DEFINE QLOCAL(QM2_INBOUND_Q) +
DEFPSIST(YES) +
REPLACE

**************************************************************
* Define the outbound xmit queue *
**************************************************************
DEFINE QLOCAL(XMITQ1) +
USAGE(XMITQ) +
DEFPSIST(YES) +
INITQ(SYSTEM.CHANNEL.INITQ) +
REPLACE

**************************************************************
* Define the outbound message queue *
**************************************************************
DEFINE QREMOTE(QM2_OUTBOUND_Q) +
RNAME(QM1_INBOUND_Q) +
RQMNAME(QM1) +
XMITQ(XMITQ1) +
DEFPSIST(YES) +
REPLACE

**************************************************************
* Define the channels between QM2 <-> QM1 *
**************************************************************

* Channel 1
DEFINE CHANNEL(QM2_SDR.TO.QM1_RCV) +
CHLTYPE(SDR) +
TRPTYPE(TCP) +
CONNAME(QM1_ip_address) +
XMITQ(XMITQ1) +
HBINT(30) +
REPLACE

* Channel 2
DEFINE CHANNEL(QM1_SDR.TO.QM2_RCV) +
CHLTYPE(RCVR) +
TRPTYPE(TCP) +
HBINT(30) +
REPLACE


On the node running QM2, run a script similar to the following, which will prime the transmission queue
with a number of persistent messages.

#!/bin/sh
#
# $1 controls the size of the message buffer.
# The actual amount sent is between $1 and 2*$1 KBytes.
#
# e.g. QM2_put 10
#

rm -f message_buffer

# Construct the set of messages to send
SIZE=`expr $1 \* 1000`
MSG=0
date >> message_buffer
while [ $MSG -le $SIZE ]
do
    # double the previous buffer
    cp message_buffer .previous
    cat .previous .previous > message_buffer
    MSG=`ls -l message_buffer | awk '{ print $5 }'`
done
echo "Putting $MSG Bytes onto outbound queue"
cat message_buffer | /usr/lpp/mqm/samp/bin/amqsput QM2_OUTBOUND_Q QM2

Each line of text in the message_buffer file will be sent as a WMQ message. Run initially with a small
number of messages. The script will prime the transmission queue. When the transmission queue is primed,
use runmqsc to start the QM2_SDR.TO.QM1_RCV channel, and the messages will be transmitted to
QM1 and then routed via QM1’s transmission queue back to QM2’s inbound queue.

When you are happy that your definitions are correct and you get end-to-end transmission of messages, run
the script again with a much larger number of messages to be sent. When the transmission queue is primed,
note how many messages it contains.

Once again start the sender channel, but after only a few seconds of transmitting messages reset the node on
which QM1 is running so that it fails over to another cluster node. The QM1 queue manager should restart
on the takeover node and the channels should be restarted. All the messages should be received at QM2’s
inbound queue. You could use runmqsc to inspect the queue, or run the amqsget sample to retrieve the
messages into a file.
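One way to inspect the inbound queue is to extract its current depth from the runmqsc DISPLAY output and compare it with the number of messages sent. This sketch assumes the depth appears in the usual CURDEPTH(n) form.

```shell
#!/bin/sh
# Sketch: extract the CURDEPTH value from runmqsc DISPLAY output so a
# test script can compare messages received against messages sent.
# Assumes the attribute is displayed as CURDEPTH(n).

curdepth()
{
    sed -n 's/.*CURDEPTH(\([0-9]*\)).*/\1/p'
}

# Typical use (needs a running queue manager):
#   echo "DISPLAY QLOCAL(QM2_INBOUND_Q) CURDEPTH" | runmqsc QM2 | curdepth
```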


Appendix A. Sample Configuration Files for VCS


types.cf
The MQM resource type can be created by adding the following resource type definition to the types.cf file.
If you add the resource type in this way, make sure you have stopped the cluster and use hacf -verify to
check that the modified file is correct.

# Append the following type definition to types.cf

type MQM (
    static int OfflineTimeout = 60
    static str OnlineWaitLimit
    static str LogLevel = error
    static str ArgList[] = { QMName, UserID }
    NameRule = resource.QMName
    str QMName
    str UserID = mqm
)

As well as creating the MQM resource type, this also sets the values of the following resource type
attributes:

• OfflineTimeout

The VCS default of 300 seconds is quite long for a queue manager, so the suggested value for this
attribute is 60 seconds. You can adjust this attribute to suit your own configuration, but it is
recommended that you do not set it any shorter than approximately 15 seconds.

• OnlineWaitLimit
It is recommended that you configure the OnlineWaitLimit for the MQM resource type. The default
setting is 2, but to accelerate detection of start failures, this attribute should be set to 0.

• LogLevel
It is recommended that you run the MQM agent with LogLevel set to ‘error’. This will display any
serious error conditions (in the VCS log). If you want more detail of what the MQM agent is doing,
then you can increase the LogLevel to ‘debug’ or ‘all’, but this will produce far more messages and is
not recommended for regular operation.

main.cf
A resource of type MQM can be defined by adding a resource entry to the
/etc/VRTSvcs/conf/config/main.cf file. The following is a complete main.cf for a simple cluster (called
Kona) with two nodes (sunph1, sunph2) and one service group (vxg1), which includes resources for one
queue manager (VXQM1), an IP address (resource name vxip1), filesystems managed by Mount
resources (vxmnt1, vxmnt2), and a DiskGroup (resource name vxdg1).



include "types.cf"

cluster Kona (
    UserNames = { admin = "cDRpdxPmHpzS." }
    CounterInterval = 5
    Factor = { runque = 5, memory = 1, disk = 10, cpu = 25, network = 5 }
    MaxFactor = { runque = 100, memory = 10, disk = 100, cpu = 100, network = 100 }
)

system sunph1

system sunph2

snmp vcs (
    TrapList = { 1 = "A new system has joined the VCS Cluster",
                 2 = "An existing system has changed its state",
                 3 = "A service group has changed its state",
                 4 = "One or more heartbeat links has gone down",
                 5 = "An HA service has done a manual restart",
                 6 = "An HA service has been manually idled",
                 7 = "An HA service has been successfully started" }
)

group vxg1 (
    SystemList = { sunph1, sunph2 }
)

DiskGroup vxdg1 (
    DiskGroup = vxdg1
)

IP vxip1 (
    Device = hme0
    Address = "9.20.110.247"
)

MQM VXQM1 (
    QMName = VXQM1
)

Mount vxmnt1 (
    MountPoint = "/MQHA/VXQM1/data"
    BlockDevice = "/dev/vx/dsk/vxdg1/vxvol1"
    FSType = vxfs
)

Mount vxmnt2 (
    MountPoint = "/MQHA/VXQM1/log"
    BlockDevice = "/dev/vx/dsk/vxdg1/vxvol2"
    FSType = vxfs
)

NIC vxnic1 (
    Device = hme0
    NetworkType = ether
)

VXQM1 requires vxip1
VXQM1 requires vxmnt1
VXQM1 requires vxmnt2
vxip1 requires vxnic1
vxmnt1 requires vxdg1
vxmnt2 requires vxdg1

// resource dependency tree
//
// group vxg1
// {
//     MQM VXQM1
//     {
//         IP vxip1
//         {
//             NIC vxnic1
//         }
//         Mount vxmnt1
//         {
//             DiskGroup vxdg1
//         }
//         Mount vxmnt2
//         {
//             DiskGroup vxdg1
//         }
//     }
// }


Appendix B. Messages produced by MQM agent for VCS

The following is a numerically sorted list of the messages that are produced by the MQM agent.
These messages should appear in the VCS cluster log. A message is only produced by the MQM
agent if the current setting of LogLevel is at least as high as the level specified for that message.
The explanation of any message can be viewed on the cluster systems, by running:
/opt/VRTSvcs/bin/MQM/explain <msgid>

Message id 3005001
Severity trace
Text qmname <qmname>; userid <userid>; loglevel <loglevel>
Explanation: This message is output to the VCS log whenever an MQM agent method is
invoked. It records the parameters passed to the method.
This message has trace severity, so will only appear if LogLevel='all'

Message id 3005002
Severity trace
Text completed without error
Explanation: This message indicates that an MQM agent method completed and that no
errors were encountered.
This message has trace severity, so will only appear if LogLevel='all'

Message id 3005003
Severity trace
Text Queue Manager <qmname> is responsive
Explanation: The queue manager has been tested by issuing a ping command, and it
responded to it. This is taken as an indication that the queue manager is running
correctly.
This message has trace severity, so will only appear if LogLevel='all'

Message id 3005004
Severity trace
Text Queue Manager <qmname> is starting
Explanation: The queue manager did not respond to a ping test, but it appears to be still
starting up. This is not viewed as an error condition. If startup is taking a long
time then it is possible that there is a lot of log replay to perform.
This message has trace severity, so will only appear if LogLevel='all'

Message id 3005005
Severity trace
Text Queue Manager <qmname> not responding (ping=<result>)
Explanation: The queue manager did not respond to a ping test and is not in the process of
starting up. If the queue manager resource is currently supposed to be online
then this is an error condition which will be handled by VCS either restarting
the service group or performing a failover.
This message has trace severity, so will only appear if LogLevel='all'

Message id 3005006
Severity error
Text Problem with loglevels!!
Explanation: This message indicates a problem with either the MQM agent code or the VCS
cluster software. It is generated when the current setting of the LogLevel
attribute is not one of the values in the set documented in the VCS Agent
Developer's Guide. The MQM agent will only produce log messages in
accordance with the setting of LogLevel. An apparently invalid setting will
suppress the logging of messages.
This message has error severity and should be investigated/reported.

Message id 3005007
Severity error
Text Could not locate queue manager <qmname>
Explanation: The queue manager name supplied to the method as parameter <qmname> does
not identify a known queue manager. Please check the cluster configuration and
retry.

Message id 3005008
Severity debug
Text <qmname> not running normally, will be terminated
Explanation: The queue manager identified by <qmname> is the subject of an offline
operation. The queue manager was found to be not in the normal online state
and the offline method will ensure that the queue manager is fully stopped.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005009
Severity debug
Text <qmname> claims to be running, take offline
Explanation: The queue manager identified by <qmname> is the subject of an
offline operation. The status of the queue manager has been
checked and was found to be a normal online state. The offline
method will attempt to perform a graceful shutdown of the queue
manager.
This message has debug severity and should only appear if
LogLevel is set to 'debug' or 'all'.


Message id 3005010
Severity error
Text Invalid value for OfflineTimeout
Explanation: This message indicates that the current value of attribute OfflineTimeout is not
set to a valid value. Any value greater than 0 is valid. The MQM agent offline
method exits immediately and the clean method will terminate the queue
manager.
This message has error severity and should be investigated, using hatype.

Message id 3005011
Severity debug
Text attempting <severity> stop of <qmname>
Explanation: As a result of an offline operation, the MQM agent is about to issue an end
command of the specified severity for the queue manager identified by
<qmname>.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005012
Severity error
Text Could not validate userid, <userid>
Explanation: The userid supplied to the method as parameter <userid> does not identify a
known user. Please check the cluster configuration and retry.

Message id 3005013
Severity error
Text Could not fork a process
Explanation: The MQM agent tried to fork a process, but was unable to. This is indicative of
a serious problem with the system and the operation was abandoned.
This message has error severity and should be investigated/reported.

Message id 3005014
Severity trace
Text <qmname> online method scheduling monitor in <wait_time> seconds
Explanation: This message is for information only and indicates that the online method for
queue manager <qmname> has requested that VCS schedule the first monitor of
the queue manager to start in <wait_time> seconds.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005015
Severity error
Text Could not run hatype
Explanation: The MQM agent tried to use hatype to read an attribute value, but was unable
to. The operation had to be abandoned.
This message has error severity and should be investigated/reported.

Message id 3005016
Severity debug
Text waiting for <severity> stop of <qmname> to complete
Explanation: The MQM agent is waiting for the queue manager identified by <qmname> to
stop, as a result of an offline operation.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005017
Severity debug
Text <qmname> is still running...
Explanation: This message is for information only. The queue manager identified by
<qmname> is the subject of an offline operation and the MQM agent is
currently waiting for the queue manager to stop. If the queue manager fails to
stop within the time allowed by the agent, the agent will use a more forceful
stop to ensure that the queue manager is fully stopped within OfflineTimeout.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005018
Severity debug
Text <qmname> is stopping
Explanation: The queue manager identified by <qmname> is currently stopping as a result of
an offline operation. This is for information only.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005019
Severity debug
Text <qmname> has stopped
Explanation: The queue manager identified by <qmname> has now stopped as a result of an
offline operation. This is for information only.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005020
Severity error
Text ended with errors
Explanation: This message indicates that the method which reported it encountered and
detected a serious error condition and did not complete successfully. Preceding
messages should describe the nature of the error.
This message has error severity and should be investigated/reported.

Message id 3005021
Severity debug
Text strmqm for <qmname> completed
Explanation: This message is notification that a start command was issued for the queue
manager with the name <qmname> and that it completed successfully.
This message has debug severity and should only appear if LogLevel is set to
'debug' or 'all'.

Message id 3005022
Severity error
Text <qmname> could not be started (rc=<rc>)
Explanation: An attempt to start the queue manager did not succeed. If <rc> is 16 then check
that the queue manager exists. If <rc> is 25 then check that the directory
structure for the queue manager exists and is complete - this error could occur if
the queue manager has been deleted but still has a /var/mqm/mqs.ini entry on
one of the cluster systems. It could also indicate a problem with the content of
the service group. Check that the queue manager's filesystems are being
mounted and that the MQM resource has a resource dependency on the relevant
Mount resources. For all values of <rc> check the MQ error logs for details of
why the start failed.
This message has error severity and should be investigated/reported.

Message id 3005023
Severity error
Text Could not open file <filename>
Explanation: The file named <filename> could not be opened. The operation was abandoned.
Please check that the file exists and is readable.
This message has error severity and should be investigated/reported.

Message id 3005024
Severity error
Text Could not list running processes
Explanation: An attempt to list the processes which are currently running did not succeed.
This is indicative of a serious problem with the system and the operation was
abandoned.
This message has error severity and should be investigated/reported.


Message id 3005025
Severity error
Text Could not find directory for <qmname>
Explanation: An attempt to locate the queue manager directory for the queue manager named
<qmname> did not succeed. Please check the directory structure for the queue
manager.
This message has error severity and should be investigated/reported.
