White paper on Avaya Aura™ Application Enablement Services 5.2 High Availability (HA) Configurations


Issue 1.0
AE Services 5.2
November 2009

This white paper is intended for a software application developer or a systems engineer
who is responsible for deploying an application or an AE server in a HA configuration.

The HA configurations covered include:


- The new Avaya Aura™ Application Enablement Services on System Platform release
5.2 failover features.
- All AE Services offers’ interaction with Avaya Aura™ Communication Manager
(CM) Enterprise Survivable Server (ESS) and Local Survivable Processor (LSP).

It also covers CM Processor Ethernet support and improvements to DMCC service
recovery using the CM Time To Service (TTS) feature.

Uninterrupted telephony is important for many enterprises, especially for mission-critical
applications. Avaya Aura™ Application Enablement (AE) Services on System Platform
(SP) Release 5.2 supports a high availability (HA) cluster of two nodes. The active server
node automatically fails over to the standby node in the event of a hardware failure.
Client applications are able to re-establish communication with the AE Services cluster
when the failover is complete. This failover feature is not supported on the AE Services
5.2 software-only and bundled offerings.

Avaya Aura™ Communication Manager (CM) provides Enterprise Survivable Server
(ESS) and Local Survivable Processor (LSP) for failover from the main media server.
This feature provides the ability for media gateways, endpoints, application servers like
AE Services and its applications to continue their operations without a major outage. ESS
and LSP have been supported since AE Services 3.0 and are included in this paper for
completeness.

Avaya Aura™ Communication Manager (CM) provides the Processor Ethernet (PE)
interface for direct connection to the main media server. This feature reduces cost by not
requiring a CLAN for communications. However, a DMCC client application must re-
establish any H.323 registrations that are terminated when an interchange occurs between
a duplicated pair of CMs that are communicating to an AE server over PE, unless the
Time To Service feature is used. Furthermore, an AE Server 5.2 that communicates over
PE does not support the ESS and LSP configurations (only a single IP address is allowed
to be administered on the AE Server 5.2 for a PE connection, and ESS and LSP servers
will have their own (unique) IP addresses, which will always be different than that of the
main media server).

Avaya recommends the following:

• CM should be configured for H.323 registration using the Time To Service
feature.
• AE Services 5.2 should use the PE interface except in ESS/LSP environments.
• A local HA cluster of AE Services on System Platform release 5.2 servers should be used.
• An application that uses the Device, Media and Call Control (DMCC) service
should keep trying to reestablish the DMCC session when it loses its socket
communication link to the DMCC service because the runtime state is preserved.
This applies to all AE Services configurations.
• An application that uses the CVLAN, DLG or TSAPI service should reestablish
its socket connections and its monitors/associations if it loses the socket
connection to the service on the AE server because no runtime state is preserved
for these services. TSAPI applications also need to reestablish route registrations.
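
The per-service recommendations above can be summarized as a small lookup. The sketch below is illustrative only: the function name and the step labels are hypothetical and are not AE Services API calls.

```python
# Hypothetical summary of the client-side recovery steps recommended above.
# The step names are descriptive labels, not AE Services API calls.
RUNTIME_STATE_PRESERVED = {"DMCC": True, "TSAPI": False, "CVLAN": False, "DLG": False}


def recovery_steps(service):
    """Return the ordered recovery steps after the socket to the service is lost."""
    if service not in RUNTIME_STATE_PRESERVED:
        raise ValueError("unknown service: " + service)
    if RUNTIME_STATE_PRESERVED[service]:
        # DMCC: runtime state is preserved, so keep retrying the session.
        return ["retry_session_until_reestablished"]
    steps = ["reestablish_socket", "reestablish_monitors_or_associations"]
    if service == "TSAPI":
        # TSAPI applications must also restore their route registrations.
        steps.append("reestablish_route_registrations")
    return steps
```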

In this CM ESS configuration, the applications and associated AE Server at the remote
sites are always active and are supplying functionality for the local resources at the
remote site. As described in later sections, this type of configuration ensures the shortest
outage.

AE Services on System Platform 5.2 High Availability

Figure 1a below illustrates a HA cluster of AE Services on System Platform and Figure
1b shows a single AE Services (on System Platform, Software only or Bundled)
communicating with CM through the PE interface.

[Figure 1a: Headquarters with an application, an active AE Server, a primary S8720 and G650 gateways.]
[Figure 1b: Headquarters with an application, an AE Server, a primary S8720 and G650 gateways.]

Figure 1a Figure 1b

The AE Services on System Platform 5.2 release provides higher availability relative to
the software-only, bundled and earlier releases. This configuration monitors the server
nodes for loss of network connectivity and hardware failure events. This information is
used to detect faults and decide when to failover from the active node to the standby node
in the server cluster. The AE Services on the standby node are restarted when a failover
event occurs. This feature enables AE Services to continue to provide service to client
applications with reduced downtime when a hardware failure event occurs. In addition to
this, the System Platform will restart the AE Services virtual machine if it does not
maintain its sanity keep alive because of a software fault condition.

Device, Media and Call Control Service


Avaya recommends that applications reestablish DMCC sessions and verify that all
associations (monitors, registrations) are still active after a network interruption.

In addition to the SP failover feature, DMCC provides recovery from a software fault or a
shutdown that does not allow the DMCC Java Virtual Machine (JVM) process to exit
normally. The DMCC Service Recovery feature is available on all AE Services
configurations: software only, bundled and on system platform. When the DMCC JVM
process is restarted after an abnormal exit, the DMCC service is initialized from persisted
state information on the hard disk. This persisted state information is saved during normal
operation and represents the last known state of the DMCC service prior to a JVM
abnormal exit. The state information includes session, device, device/call monitor and
H.323 registration data.

From a client application’s point of view, the DMCC recovery appears as a temporary
network interruption that requires the client to re-establish any disconnected sessions.
When the client application re-establishes the session, the DMCC service will send
events for any resources that could not be recovered. These will include monitor stopped
and unregistered event messages and enable the client to determine what needs to be
restored through new service requests. Otherwise, the client will continue to operate as
usual.
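
A client can reconcile its own bookkeeping against the events it receives after the session is re-established. The following is a minimal sketch, assuming the client tracks its monitors and registrations by identifier; the event labels "monitor_stopped" and "unregistered" and all names in the code are hypothetical placeholders, not DMCC API types.

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    """What the client believed was active before the interruption."""
    monitors: set = field(default_factory=set)
    registrations: set = field(default_factory=set)


def resources_to_restore(state, events):
    """Given (event_kind, resource_id) pairs received after the session is
    re-established, return (monitors, registrations) to request again."""
    lost_monitors = {rid for kind, rid in events if kind == "monitor_stopped"}
    lost_regs = {rid for kind, rid in events if kind == "unregistered"}
    # Only restore resources the client actually held before the interruption.
    return state.monitors & lost_monitors, state.registrations & lost_regs
```

If no such events arrive, both returned sets are empty and the client carries on as usual.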

TSAPI, CVLAN, DLG and Transport Services


No runtime state information is persisted for these services. The client application must
restore any state that existed before the service was restarted.

ESS/LSP Considerations for AE Services deployment

Figure 2 (shown below) is an illustration of a sample network configuration with the
main S8700 media server and G650 media gateways at the headquarters site. One remote
site (Remote site A) has a G700 media gateway with an LSP and the other two remote
sites (B and C) have G650 media gateways and ESS servers. Each of the remote sites has
an AE Services server and associated application connected to Communication Manager
through the CLAN(s) of one or more local (or remote) G650 media gateways.

Avaya recommends that all applications have a local AE server. In this
configuration, the applications and associated AE Server at the remote sites are always
active and are supplying functionality for the local resources at the remote site. As
described in later sections, this type of configuration ensures the most seamless
survivability in an ESS configuration.

[Diagram: Normal operation. Headquarters (primary S8700, G650 gateways, active AE Server,
application) is connected over the WAN to Remote site A (G700 with LSP, active AE Server,
application) and to Remote sites B and C (each with G650 gateways, an ESS S8500, an active
AE Server and an application).]

Figure 2

In case of a WAN outage (as shown in Figure 3 below), each remote site becomes
independent and provides service without major interruption to endpoints and
applications. Remote site A with a G700 media gateway will have the LSP go online and
the G700 media gateway will connect to that local LSP. It is recommended to configure
the primary search list of the G700 media gateway such that it contains CLANs of only
one site (i.e. headquarters in this case). The secondary search list should contain the LSP
at the local site (site A in this case).

The AE server will detect connectivity failure with the main site (headquarters) and will
notify its applications. The applications will have to direct the AE server to move the
connectivity over to the LSP (described in detail further below).

The G650 media gateways at the remote sites (sites B and C) will connect to the local
ESS server in case of a WAN outage. The AE server will automatically get connected
with the ESS server through the G650 media gateways. This will be transparent to the
AE server and its applications except for what will appear to be a brief network outage
(described in detail further below).

The site at the headquarters will continue to function as it did previously in case of a
WAN outage.

Note: During a WAN outage, the headquarters site and the remote sites cannot access
each other’s resources.

[Diagram: WAN outage. The WAN connection is marked as down (X). Headquarters (primary
S8700, G650 gateways, AE Server, application), Remote site A (G700 with LSP, AE Server,
application) and Remote sites B and C (each with G650 gateways, an ESS S8500, an AE Server
and an application) operate independently.]

Figure 3

If the main headquarters site is completely down but the WAN is functional (as shown in
Figure 4 below), the remote sites will behave similarly to the WAN outage scenario
described above, but with one important exception. With the ESS feature, the system
will attempt to stay as “whole” as possible. Since the WAN is still intact, all of the G650
gateways end up being controlled by the same ESS server at Remote Site B. Since the
application and AE Server were configured to support only the local resources at the
remote sites, the application continues to function the same whether the sites operate
independently (WAN failure) or jointly (normal operation or site destruction at
headquarters).

[Diagram: Site destruction. Headquarters (primary S8700, G650 gateways, AE Server,
application) is marked as down (X). The WAN still connects Remote site A (G700 with LSP,
AE Server, application) and Remote sites B and C (each with G650 gateways, an ESS S8500,
an AE Server and an application).]

Figure 4

1. ESS (Enterprise Survivable Server) – Non PE Connectivity

The list below describes the behavior of the AE Services:

a. DMCC (Device and Media Control) Service:


As long as the application is configured to connect to CLANs in the local
gateways, recovery with an ESS server should be very straightforward. The
application will receive an unregistered event for each DMCC softphone when
connectivity is lost from the local gateways (like G600 or G650) to the
primary S8700. At this point, the application should begin attempts to
re-register the DMCC softphones with the same CLAN IP address(es) it was
using before. Note that it takes a little over 3 minutes for the media gateway
(like G600 or G650) to connect to an ESS server. For this reason, it is
recommended that the application keep trying to register with the same CLAN
(through the AE server) for that amount of time before it tries to register with
an LSP (if one exists). When the gateways (like G600 or G650) connect with
the ESS server, the registration attempts will begin to succeed. After the
application has successfully registered all DMCC softphones, it should
reestablish its previous state and resume operation.
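
The retry policy described above can be sketched as follows. This is a minimal illustration under stated assumptions: `register` is a hypothetical callable supplied by the application that returns True when a softphone registration succeeds, and the 200-second window and 10-second retry interval are assumed values chosen to cover "a little over 3 minutes"; neither figure comes from Avaya.

```python
import time


def register_with_fallback(register, clan_addrs, lsp_addr=None,
                           clan_window_s=200.0, retry_interval_s=10.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Retry the original CLAN addresses for clan_window_s seconds, then try
    the LSP once (if one exists). Return the address that succeeded, or None.

    register(addr) is a hypothetical callable returning True on success.
    """
    deadline = clock() + clan_window_s
    while True:
        for addr in clan_addrs:
            if register(addr):
                return addr
        if clock() >= deadline:
            break  # the ESS transition window has elapsed
        sleep(retry_interval_s)
    if lsp_addr is not None and register(lsp_addr):
        return lsp_addr
    return None
```

The clock and sleep functions are injectable so that the policy can be exercised without waiting in real time.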

b. CallInformation Services within DMCC, Call Control Services within
DMCC, and all other CTI services
The CallInformation and Call Control services within DMCC and all other
CTI Services (TSAPI, CVLAN, DLG and JTAPI) use the Transport (AEP)
link to communicate with Communication Manager. The transport links
(Switch Connections) on each AE Server should be administered to
communicate only with CLANs in gateways that are local to the AE Server’s
site. If the system is configured in this fashion, the application / AE Server
will not have to take any unusual action to recover in the event that a gateway
loses connectivity to the primary S8700 and transitions to an ESS server.

If a gateway loses connectivity to the primary server for an extended period of
time (more than 30 seconds), all AEP sockets that are established through
CLANs resident in that gateway will drop. If an AE Server loses all of its
AEP connections, it will notify any connected applications via a
LinkDownEvent (DMCC CallInformationServices) or a CTI link down
notification (CTI services). For Call Control Services within DMCC, Avaya
recommends that applications add a CallInformationListener and look for a
LinkDown event for indication that connectivity to the main site is down. (In
future releases, Call Control Services clients will receive a MonitorStop
request for all call control monitors if the link is lost to the main site.)
Depending on the CTI API, clients will receive an appropriate event when the
connectivity to the main site is down. CVLAN clients will receive an “abort”
for each association. TSAPI clients will receive a CSTAMonitorEnded event
if the client is monitoring a device and/or a CSTASysStatEvent with a link
down indication if the client is monitoring system status. Avaya JTAPI 5.2
and later clients will receive a “call event transmission ended” event if the
client has call listeners. Otherwise, an “observation ended” event will be
received if the client has call observers. DLG clients will receive a link status
event with a link down indication and a cause value.
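
For an application that uses more than one API, the notifications listed above can be kept in a single lookup. The sketch below only restates the paragraph; the dictionary keys and the function name are hypothetical, and the values are descriptions rather than exact API identifiers.

```python
# Link-down notifications per API, as described in the paragraph above.
# Values are plain descriptions, not exact class or constant names.
LINK_DOWN_NOTIFICATION = {
    "DMCC": "LinkDownEvent from CallInformationServices "
            "(Call Control clients should add a CallInformationListener)",
    "CVLAN": "abort for each association",
    "TSAPI": "CSTAMonitorEnded (device monitors) and/or CSTASysStatEvent "
             "with a link down indication (system status monitors)",
    "JTAPI": "call event transmission ended (call listeners), otherwise "
             "observation ended (call observers); Avaya JTAPI 5.2 and later",
    "DLG": "link status event with a link down indication and a cause value",
}


def link_down_notification(api):
    """Return the description of the link-down notification for an API."""
    key = api.strip().upper()
    if key not in LINK_DOWN_NOTIFICATION:
        raise KeyError("no link-down notification recorded for " + api)
    return LINK_DOWN_NOTIFICATION[key]
```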

The AE Server will then automatically attempt to reestablish the AEP links.
Note that it takes a little over 3 minutes for the media gateway (like G600 or
G650) to connect to an ESS server. Once the media gateway has registered
with the ESS server, the AE Server will succeed in establishing its AEP links
very soon thereafter (after around 30 seconds). As soon as an AEP link is
established, the application will be notified that the CTI link is back up, and
the application can begin to resume normal operations. Since there is no run-
time state preserved on a transition to an ESS server (as there is with an
interchange on an S8700) all application state must be reestablished. Note
that, from the AE server’s and application’s perspectives, the failure scenario
and recovery actions appear exactly the same as a long network outage
between the AE Server and the gateways.
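
The timings quoted above give a rough feel for the length of the CTI outage. The sketch below is back-of-the-envelope arithmetic only. It assumes that the 30-second socket timeout and the gateway's transition to the ESS are both measured from the moment connectivity to the primary is lost, and it rounds "a little over 3 minutes" down to 180 seconds, so real outages should be expected to run somewhat longer.

```python
def estimate_ess_transition(socket_drop_s=30, gateway_to_ess_s=180, aep_relink_s=30):
    """Rough timeline, in seconds after connectivity to the primary is lost.

    Defaults come from the figures quoted in the text; the assumption that
    all timers start at the same instant is ours.
    """
    link_down_at = socket_drop_s           # AEP sockets drop, link down is reported
    gateway_on_ess_at = gateway_to_ess_s   # gateway registers with the ESS server
    link_up_at = gateway_on_ess_at + aep_relink_s  # AEP link re-established
    return {
        "link_down_at": link_down_at,
        "gateway_on_ess_at": gateway_on_ess_at,
        "link_up_at": link_up_at,
        "cti_outage": link_up_at - link_down_at,
    }
```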

There is one important difference between the ESS behavior of current versions of AE
Services (i.e., AE Services 3.1 and above) and that of AE Services 3.0.
If an AE 3.0 Server ends up with AEP links to gateways that
are controlled by different ESS or primary call servers (i.e. a fragmented
system), the system will not behave in a sane fashion. Some messages will be
sent to one call server, and others will be sent to other call servers, with no
deterministic behavior with respect to where messages are being sent. Recall,
however, that the ESS feature attempts to keep as many gateways as possible
under the control of a single call server. Given that this is the case, it is
possible to configure the system such that it is extremely unlikely that a 3.0
AE Server will have AEP links to different fragments of a survivable system.
The safest configuration is to have the 3.0 AE Server talk only to CLANs
resident in a single gateway. Avaya recommends that wherever possible, all
gateways through which an AE Server connects are all on the same LAN,
preferably even on the same ethernet switch to avoid fragmentation. In such a
configuration, it is virtually certain that the gateways will all be controlled by
the same controller at all times, and the system will therefore always operate
in a sane fashion.

Unlike AE Services 3.0, however, in all newer versions of AE Services (i.e.,
AE Services 3.1 and above) ESS behavior is deterministic. AE Services 3.1
and above will only establish and use links to gateways that are controlled by
the same ESS or primary call server. Therefore, it is always known by the
application to which call server messages are being sent. More specifically,
the AE server will only use links to the first ESS or primary call server to
which it establishes a connection. Subsequent connections that are made to
any other servers, other than the primary server, will be immediately dropped.
If a connection to the primary server is (re)established, then any existing
connections to any ESS servers or LSPs will be dropped, and the primary
server will be used again (note that this will result in the loss of all
monitors/associations, and the behavior described above when an AE Server
loses all of its AEP connections will apply).
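
The link selection rule for AE Services 3.1 and above can be modelled as a small state machine. This is a toy model written from the description above, not AE Services code; the class and method names are hypothetical.

```python
class LinkSelector:
    """Toy model of the AE Services 3.1+ rule: use links to one call server only."""

    def __init__(self, primary):
        self.primary = primary
        self.active = None  # call server currently in use, if any

    def on_connect(self, server):
        """Return True if the new connection is kept, False if it is dropped."""
        if self.active is None or server == self.active:
            # The first server to connect (or another link to it) is used.
            self.active = server
            return True
        if server == self.primary:
            # The primary is back: ESS/LSP connections are dropped and all
            # monitors/associations are lost.
            self.active = server
            return True
        # A connection to a different, non-primary server is dropped at once.
        return False

    def on_all_links_lost(self):
        """Forget the active server once every AEP connection is gone."""
        self.active = None
```

For example, if the first connection is made to the ESS at site B, a later connection to the ESS at site C is dropped, while a later connection to the primary replaces site B.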

2. LSP (Local Survivable Processor)


A media gateway like G700, G350 or G250 can be controlled by an LSP running
Communication Manager if the main Communication Manager media server (S8700,
S8500 or S8300) is unavailable or down. The AE Server connects to either a CLAN
(S8700) or directly to the PE (S8300) to communicate with Communication Manager.
Typically, LSPs are configured for remote media gateways so that those media
gateways can get service in case the connectivity to the main site is down (e.g. WAN
connectivity failure as shown in Figure 3 or site destruction as shown in Figure 4).
Once the LSP detects a failure of connectivity to the main media server, the
Communication Manager running on that LSP comes online.

Starting with Communication Manager 3.1, new administration forms have been
created to control the behavior of survivable processors (i.e. LSPs and ESSs).
In particular, the Enabled field on the add/change survivable-processor forms
can be set to one of the following three values:

• "n" or no: This means that this processor channel will be disabled on the LSP or
ESS.

• "i" or inherit: This means that this link is to be inherited by the LSP or ESS
exactly as administered on the main. When set to "i" the remaining data on the
line is recopied from the translations from the main and may not be edited. Note
that this does not mean that the link will work. For example, if the link is
administered to a CLAN and an attempt is made to inherit this link on an LSP, the
link won’t work because the LSP has no CLAN. It is most appropriate to use "i"
for a link administered via procr or for an ESS.

• "o" or overwrite: This entry will cause the link field to change to "p" and be
uneditable. The data entered on this line will overwrite the processor channel
shown on this line when the data is file-synchronized to an LSP or ESS.

Avaya recommends different administration settings for the Enabled field depending
on the configuration of a system (as shown below).

Configuration            Administration
Only LSPs (no ESSs)      set Enabled to “o”
Both LSPs and ESSs       set Enabled to “n”

Table 1
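
Table 1 can be expressed as a small helper for deployment scripts or checklists. This is a sketch only; the function name is hypothetical, and configurations that Table 1 does not cover return None rather than a guessed value.

```python
def recommended_enabled(has_lsp, has_ess):
    """Return the Enabled value recommended in Table 1, or None when the
    configuration is not covered by the table."""
    if has_lsp and has_ess:
        return "n"  # both LSPs and ESSs: disable the link
    if has_lsp:
        return "o"  # only LSPs: overwrite
    return None     # not covered by Table 1
```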

2.1 Configurations with only LSPs

For configurations with LSPs and no ESSs, Avaya recommends setting the Enabled
field to “o” (overwrite). This will allow automatic transition to a local LSP after
detecting connectivity failure to the main site.

2.2 Configurations with both LSPs and ESSs

If both LSPs and ESS servers are configured, Avaya recommends setting the Enabled
field to “n” (disabled) for LSPs. Setting the Enabled field to “o” (overwrite) for LSPs
will most likely result in undesired behavior. Consider the scenario in Figure 4 where
the main headquarters site is completely down but the WAN is still functional. If the
Enabled field is set to “o” for the LSPs, then the AE Server will always connect to an
LSP first since it would be available for connections before any of the ESS servers.
Remember, it takes a little over 3 minutes for a media gateway (like a G600 or G650)
to connect to an ESS server. Additionally, while connected to the LSP, the AE Server
will deny (i.e. immediately drop) subsequent connections to any ESS servers.
However, in this scenario, it would have been preferable to connect to one of the ESS
servers first since it’s possible that the ESS server had connectivity and full control of
the system.

2.3 Transitioning to a local LSP when Enabled is set to “n”

Note that if the Enabled field is set to “n” (disabled), the AE server will detect
connectivity failure to the main site, but it will not automatically transition to a local
LSP. Depending on the link type, the applications using the AE server must perform
different actions, as described below.

a. DMCC (Device and Media Control) Service:


DMCC uses the H.323 link for each DMCC softphone extension to talk to
Communication Manager. When the connectivity to the main site is down the
DMCC service on the AE server detects it and sends an unregistered event to
the application for each DMCC extension. Avaya recommends that the
application then retry connecting to the main Communication Manager
(through the DMCC service on the AE server). If that fails, it should try
connecting to the Communication Manager (through the DMCC service on
the AE server) on the local LSP. If the LSP is up, the application will get
connectivity to Communication Manager (via the DMCC service on the AE
server).

When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually or automatically (if configured
properly). The DMCC service will detect connectivity failure to the LSP and
will send an unregistered event to the application for each DMCC extension.
Avaya recommends that the application then retry connecting to the main
Communication Manager through the DMCC service on the AE server.

AE Services has a feature in 3.0 that allows the use of a symbolic name
for a list of ip-addresses (i.e. Gatekeeper list). Once administered through
the AE Services OAM web-page, the application can then use the
symbolic name to get a DeviceID (i.e. DMCC softphone extension) for a
particular Communication Manager. This feature allows the application
to easily switch over the DMCC softphones to the LSP using the symbolic
name.

b. CallInformation Services within DMCC:


The CallInformation service within DMCC uses the Transport (AEP) link to
communicate with Communication Manager. When the connectivity to the
main site is down the CallInformation service on the AE server detects it and
sends a link down event to the application. Avaya recommends that the AE
server be pre-configured to have the LSP administered under the main site
switch name through the AE Services OAM web-page. This connection will
not be active as long as the LSP is not up. The application will have to use
System Management Services to dynamically configure the Transport (AEP)
link (using the change ip-services command) on Communication Manager
running on the LSP once it receives the Call Information link down event. The
application should use the WSDL defined in:
http://<ae-svcs-server-name>/sms/SystemManagementService.php?wsdl
with the IPService Model defined in:
http://<machine-name>/sms/ModelSchema.php?model=IPServices
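
The two URLs above follow a fixed pattern, so an application can build them from the server name. The sketch below assumes that both placeholders refer to the same AE Services host, which the text does not state explicitly; the function name and the example host name are hypothetical.

```python
from urllib.parse import urlencode


def sms_urls(ae_server, model="IPServices"):
    """Build the SMS WSDL and model schema URLs for an AE Services host.

    Assumes the WSDL and the model schema are served by the same host.
    """
    base = "http://" + ae_server + "/sms"
    return {
        "wsdl": base + "/SystemManagementService.php?wsdl",
        "model_schema": base + "/ModelSchema.php?" + urlencode({"model": model}),
    }
```

Calling `sms_urls("aes.example.com")` yields the two URLs with the placeholder replaced by that host name.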

When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually (by giving a “reset system 4”
command in Communication Manager) or automatically (if configured
properly through the “change system-parameters mg-recovery-rule” form
in Communication Manager). In either case, the Call Information service will
detect transport link connectivity failure to the LSP and will send a link down
event to the application. Also the transport link to the main Communication
Manager will be back up for which the application will receive a link up event
from Call Information services.

Note: a) If the Transport (AEP) link has multiple CLAN addresses configured,
the application will not receive a Call Information link down event unless
connectivity to all CLANs is lost.
b) If the Transport AEP link is connected to one CLAN and the DMCC H.323
link is connected to another CLAN, it is possible that one of the connections
could be down. In this case, if LSPs are being used, then one of the links
could be on the main server and the other could be on a LSP. This will cause
undesirable behavior.
c) Avaya recommends that for remote sites with G700 gateways and LSPs
(and without a G600/G650/MCC/SCC on the same site), the transport (AEP)
link from AE Server at that remote site be configured to link to a CLAN(s)
(on a G600/G650/MCC/SCC) at the main headquarters site.

c. Call Control Services within DMCC:


Call Control Services within DMCC uses the Transport (AEP) link to
communicate with Communication Manager. Avaya recommends that
applications add a CallInformationListener and look for a LinkDown event for
indication that connectivity to the main site is down. (In future releases, Call
Control Services clients will receive a MonitorStop request for all call control
monitors if the link is lost to the main site.) Avaya recommends that the AE
server be pre-configured to have the LSP administered under the main site
switch name through the AE Services OAM web-page. This connection will
not be active as long as the LSP is not up. The application will have to use
System Management Services to dynamically configure the Transport (AEP)
link (using the change ip-services command) on Communication Manager
running on the LSP once it receives the link down event. The application
should use the WSDL defined in:
http://<ae-svcs-server-name>/sms/SystemManagementService.php?wsdl
with the IPService Model defined in:
http://<machine-name>/sms/ModelSchema.php?model=IPServices

When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually (by giving a “reset system 4”
command in Communication Manager) or automatically (if configured
properly through the “change system-parameters mg-recovery-rule” form
in Communication Manager). The transport link to the LSP will be down and
the application will receive a link down event. Also the transport link to the
main Communication Manager will be back up for which the application will
receive a link up event.

Note: a) If the Transport (AEP) link has multiple CLAN addresses configured,
the application will not receive a link down event unless connectivity to all
CLANs is lost.
b) Avaya recommends that for remote sites with G700 gateways and LSPs
(and without a G600/G650/MCC/SCC on the same site), the transport (AEP)
link from AE Server at that remote site be configured to link to a CLAN(s)
(on a G600/G650/MCC/SCC) at the main headquarters site.

d. TSAPI, CVLAN, DLG Service and JTAPI:


The TSAPI, CVLAN, DLG Service and JTAPI on the AE server use the
Transport (AEP) link to communicate with Communication Manager.
Depending on the API, clients will receive an appropriate event when the
connectivity to the main site is down. CVLAN clients will receive an “abort”
for each association. TSAPI clients will receive a CSTAMonitorEnded event
if the client is monitoring a device and/or a CSTASysStatEvent with a link
down indication if the client is monitoring system status. Avaya JTAPI 5.2 and
later clients will receive a “call event transmission ended” event if the client
has call listeners. Otherwise, an “observation ended” event will be received if
the client has call observers. DLG clients will receive a link status event with
a link down indication and a cause value.

Avaya recommends that the AE server be pre-configured to have the LSP
administered under the main site switch name through the AE Services OAM
web-page. This connection will not be active as long as the LSP is not up.
The application will have to use System Management Services to dynamically
configure the Transport (AEP) link (using the change ip-services command)
on Communication Manager running on the LSP once it receives the link
down event. The application should use the WSDL defined in:
http://<ae-svcs-server-name>/sms/SystemManagementService.php?wsdl
with the IPService Model defined in:
http://<machine-name>/sms/ModelSchema.php?model=IPServices

When the connectivity to the main server is back up, the LSP would need to
be put in offline mode either manually (by giving a “reset system 4”
command in Communication Manager) or automatically (if configured
properly through the “change system-parameters mg-recovery-rule” form
in Communication Manager). The transport link to the LSP will be down and
the application will receive a link down event. Also the transport link to the
main Communication Manager will be back up for which the application will
receive a link up event.

Note: a) If the Transport (AEP) link has multiple CLAN addresses configured,
the application will not receive a link down event unless connectivity to all
CLANs is lost.
b) Avaya recommends that for remote sites with G700 gateways and LSPs
(and without a G600/G650/MCC/SCC on the same site), the transport (AEP)
link from AE Server at that remote site be configured to link to a CLAN(s)
(on a G600/G650/MCC/SCC) at the main headquarters site.

Terminology and Acronyms


Term          Meaning
AEP           Application Enablement Protocol
AE Services   Application Enablement Services
API           Application Programming Interface
ASAI          Adjunct Switch Application Interface
CLAN          Control Local Area Network interface card
CM            Communication Manager
CVLAN         CallVisor LAN
DLG           Definity LAN Gateway
DMCC          Device, Media and Call Control
ESS           Enterprise Survivable Server
HA            High Availability
JTAPI         Java Telephony API
LSP           Local Survivable Processor
MCC           Multi-Carrier Cabinet
PE            Processor Ethernet (also referred to as procr)
SCC           Single Carrier Cabinet
SP            System Platform
TSAPI         Telephony Services API
WSDL          Web Services Description Language
